
Reprints Available directly from the Editor. Printed in New Zealand.

Extensions to the Kruskal-Wallis Test and a Generalised Median Test with Extensions

J.C.W.RAYNER john [email protected]

Department of Applied Statistics, University of Wollongong, Northfields Avenue, NSW 2522, Australia

D.J.BEST [email protected]

CSIRO Mathematical and Information Sciences, PO Box 52, North Ryde, NSW 2113, Australia

Abstract. The data for the tests considered here may be presented in two-way contingency tables with all marginal totals fixed. We show that Pearson's test statistic $X_P^2$ (P for Pearson) may be partitioned into useful and informative components. The first detects location differences between the treatments, and the subsequent components detect dispersion and higher order moment differences. For Kruskal-Wallis-type data when there are no ties, the location component is the Kruskal-Wallis test. The subsequent components are the extensions. Our approach enables us to generalise to when there are ties, and to when there is a fixed number of categories and a large number of observations. We also propose a generalisation of the well-known median test. In this situation the location-detecting first component of $X_P^2$ reduces to the usual median test statistic when there are only two categories. Subsequent components detect higher moment departures from the null hypothesis of equal treatment effects.

Keywords: Categorical data, Components, Non-parametric tests, Orthonormal polynomial, Two-way data.

1. Introduction

The idea of decomposing a test into orthogonal contrasts, as in the analysis of variance, has long been appreciated by statisticians as a way of making hypothesis tests more informative. In the authors' smooth goodness of fit work (see Rayner and Best, 1989), a similar approach is pursued. Omnibus test statistics are partitioned into smooth components. We define the components of a test statistic to be asymptotically pairwise independent, with each asymptotically having the chi-squared distribution, and such that their sum gives the original test statistic.

The components provide powerful directional tests and permit a convenient and informative scrutiny of the data. This approach is applied to Spearman's test in Best and Rayner (1996) and Rayner and Best (1996a). Rayner and Best (1996b) gave an overview of this approach applied to several commonly used nonparametric tests, including the Friedman and Durbin tests.

Data for a generalisation of the median test that we subsequently propose, and for the Kruskal-Wallis test both with and without ties, may be presented in the form of two-way tables with fixed marginal totals. We derive the covariance matrix of entries in such tables and then partition a multiple of $X_P^2$ into components that detect location and higher moment differences between rows.

For Kruskal-Wallis-type data when there are no ties, the location component is the Kruskal-Wallis test. Our approach enables us to generalise to when there are ties, and to when there is a fixed number of categories and a large number of observations. We also propose a generalisation of the well-known median test. The location detecting first component of $X_P^2$ reduces to the usual median test statistic when there are only two categories. Using more categories allows components other than this location component to be calculated. These additional components, which detect dispersion and higher moment effects, are not available when using the usual median test.

The structure of this paper is as follows. In the next section the model for two-way contingency tables with fixed marginal totals is given, and Pearson's $X_P^2$ is derived as a test statistic for the null hypothesis of like rows. In section three a multiple of $X_P^2$ is partitioned into components. The material in sections 2 and 3 will be familiar to many readers, but is necessary background for the new work. In section four it is shown that when there are no ties the first component is the usual Kruskal-Wallis statistic. The non-location detecting components are our extensions.

Section five generalises the treatment to when there are ties. Section six introduces a generalisation of the usual median $X^2$ test, which is thus identified as a location detecting test; the extensions permit dispersion and other effects to be detected.

2. A Model and Pearson's $X^2$ Test

Suppose we have a two-way table of counts $N_{ij}$, with $i = 1, \ldots, r$ and $j = 1, \ldots, c$. The row and column totals, respectively $n_{i\cdot}$, $i = 1, \ldots, r$, and $n_{\cdot j}$, $j = 1, \ldots, c$, are known constants. Under the null hypothesis of simple random sampling, the likelihood was given by Roy and Mitra (1956) as

$$\left\{ \prod_{i=1}^{r} n_{i\cdot}! \right\} \left\{ \prod_{j=1}^{c} n_{\cdot j}! \right\} \Big/ \left\{ n_{\cdot\cdot}! \prod_{i=1}^{r} \prod_{j=1}^{c} n_{ij}! \right\}$$

in which $n_{\cdot\cdot} = \sum_i n_{i\cdot} = \sum_j n_{\cdot j}$ is the grand total of the observations. The models for tables with just one set of marginal totals fixed, or only the grand total fixed, are quite different from our model in which all row and column totals are fixed. See Lancaster (1969, chapter XI section 2, pp. 212-217). This likelihood can be expressed as a product of extended or multivariate hypergeometric probability functions:

$$\prod_{i=2}^{r} \left\{ \left[ \prod_{j=1}^{c} {}^{n_{1j} + \cdots + n_{ij}}C_{n_{ij}} \right] \Big/ \; {}^{n_{1\cdot} + \cdots + n_{i\cdot}}C_{n_{i\cdot}} \right\}.$$

To find moments of the $N_{ij}$, expectations may be taken with respect to the distribution of the second row conditional on knowledge of the column sums of the first two rows, then conditional on the column sums of the first three rows, and so on. It suffices to know the moments of the extended hypergeometric distribution. Details are given in the Appendix. We find

$$E[N_{ij}] = n_{i\cdot} n_{\cdot j} / n_{\cdot\cdot}, \quad i = 1, \ldots, r \text{ and } j = 1, \ldots, c.$$

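Write $N_i = (N_{i1}, \ldots, N_{ic})^T$, $i = 1, \ldots, r$, and $N^T = (N_1^T, \ldots, N_r^T)$, so that $N$ is the vector of all the cell counts. The joint covariance matrix of $N_i$ and $N_j$ is, for $i \neq j$,

$$\mathrm{cov}(N_i, N_j) = -\frac{n_{i\cdot} n_{j\cdot}}{n_{\cdot\cdot}^2} \left\{ \mathrm{diag}\left( \frac{n_{\cdot r} n_{\cdot\cdot}}{n_{\cdot\cdot} - 1} \right) - \left( \frac{n_{\cdot r} n_{\cdot s}}{n_{\cdot\cdot} - 1} \right) \right\}.$$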

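Write $f_j = n_{\cdot j}/n_{\cdot\cdot}$, $j = 1, \ldots, c$, and

$$R = \mathrm{diag}\left( \frac{n_{\cdot r} n_{\cdot\cdot}}{n_{\cdot\cdot} - 1} \right) - \left( \frac{n_{\cdot r} n_{\cdot s}}{n_{\cdot\cdot} - 1} \right).$$

The covariance matrix of $N$ is $\mathrm{cov}(N) = \{\mathrm{diag}(f_j) - (f_i f_j)\} \otimes R$, where $\otimes$ is the direct or Kronecker product. See Lancaster (1969) for details about direct or Kronecker sums and products. Now define the standardised cell counts

$$Z_{ij} = (N_{ij} - E[N_{ij}]) / \sqrt{E[N_{ij}]}, \quad i = 1, \ldots, r \text{ and } j = 1, \ldots, c,$$

$$Z = (Z_{11}, \ldots, Z_{1c}, \ldots, Z_{r1}, \ldots, Z_{rc})^T,$$

$I_a$ the $a$ by $a$ identity matrix and $1_a$ the $a$ by 1 vector with every element 1. Then

$$\mathrm{cov}(Z) = \{ I_r - (\sqrt{f_i f_j}) \} \otimes R.$$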

The matrix $\{ I_r - (\sqrt{f_i f_j}) \}$ has $r - 1$ latent roots 1 and one latent root zero. The latent roots of $R$ are difficult to find in general, but their asymptotic limits follow from Lancaster (1969, Chapter V.3). Lancaster showed that the quadratic form with vector the standardised cell counts and matrix essentially $R$ is the familiar Pearson goodness of fit statistic, with asymptotic distribution $\chi^2_{c-1}$. Hence the latent roots of $R$ are asymptotically one $c - 1$ times and zero once. So under the null hypothesis of simple random sampling, $Z$ has zero mean and covariance matrix $\mathrm{cov}(Z)$, which asymptotically has $(r-1)(c-1)$ latent roots one, and the remaining $r + c - 1$ latent roots zero.

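In the well known and often used `classical' model, $r$ and $c$ are fixed and the total count $n_{\cdot\cdot} \to \infty$. The test statistic $X_P^2$ is given by

$$X_P^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \left( N_{ij} - \frac{n_{i\cdot} n_{\cdot j}}{n_{\cdot\cdot}} \right)^2 \Big/ \frac{n_{i\cdot} n_{\cdot j}}{n_{\cdot\cdot}} = Z^T Z.$$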

We now confirm that our model leads to this test statistic. Suppose $H$ is orthogonal and diagonalises $\mathrm{cov}(Z)$. Asymptotically we then have

$$H^T \mathrm{cov}(Z) H = I_{(r-1)(c-1)} \oplus O_{(r+c-1)},$$

where $\oplus$ means direct or Kronecker sum. Define $Y = H^T Z$. Now $Z^T Z = Y^T Y$, in which $Y$, by the multivariate Central Limit Theorem, is asymptotically $N_{rc}(0, I_{(r-1)(c-1)} \oplus 0_{(r+c-1)})$ under the null hypothesis of simple random sampling.

It follows that under the null hypothesis, $X_P^2 = Z^T Z = Y^T Y$ asymptotically has the $\chi^2_{(r-1)(c-1)}$ distribution.
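As a minimal sketch of the quantities defined in this section, the following Python computes the expected counts, the standardised cell counts $Z_{ij}$ and Pearson's statistic for a small table. The function name `pearson_x2` and the 2 by 2 counts are illustrative assumptions and do not come from the paper.

```python
import math

def pearson_x2(table):
    """Return (X2, Z, df) for a two-way table of counts given as a list of rows."""
    r, c = len(table), len(table[0])
    row = [sum(table[i]) for i in range(r)]                       # n_i.
    col = [sum(table[i][j] for i in range(r)) for j in range(c)]  # n_.j
    n = sum(row)                                                  # n_..
    # Standardised cell counts Z_ij = (N_ij - E[N_ij]) / sqrt(E[N_ij]),
    # with E[N_ij] = n_i. * n_.j / n_..
    z = [[(table[i][j] - row[i] * col[j] / n) / math.sqrt(row[i] * col[j] / n)
          for j in range(c)] for i in range(r)]
    x2 = sum(v * v for z_row in z for v in z_row)                 # X2 = Z'Z
    return x2, z, (r - 1) * (c - 1)

# Hypothetical counts, chosen only for illustration.
x2, z, df = pearson_x2([[10, 20], [20, 10]])
print(round(x2, 4), df)
```

The returned degrees of freedom, $(r-1)(c-1)$, are those of the asymptotic chi-squared reference distribution derived above.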

3. Partitioning Pearson's Statistic

We now show that $X_P^2$ may be partitioned into components, the $s$th of which detects $s$th moment departures from the null hypothesis of similarly distributed rows (treatments).

The elements $Y_i$ of $Y$ are such that $X_P^2 = \sum_{i=1}^{rc} Y_i^2$. There is some choice in defining the $Y_i$, as $H$ is not yet fully specified. In doing so, our aim is to find $Y_i$ that can be easily and usefully interpreted. To achieve one such partition, first suppose that $\{g_s(j)\}$ is the set of polynomials orthonormal on $\{n_{\cdot j}/n_{\cdot\cdot}\}$. See the Appendix for the definitions of the first two polynomials and the derivation of subsequent polynomials. This approach results, when there are no ties, in the first component being the Kruskal-Wallis test. Write $g_s$ for the $c$ by 1 vector with elements $g_s(j)$.

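Define $G$ by

$$G = [\, G_1 \mid \cdots \mid G_c \,] / \sqrt{c},$$

in which $G_s$ is the $rc$ by $r$ matrix

$$G_s = \begin{bmatrix} g_s & 0 & \cdots & 0 \\ 0 & g_s & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & g_s \end{bmatrix}, \quad s = 1, \ldots, c-1,$$

and

$$G_c = \begin{bmatrix} 1_c & 0 & \cdots & 0 \\ 0 & 1_c & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1_c \end{bmatrix}$$

is also $rc$ by $r$.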

Define $Y = \sqrt{(n-1)/n}\; G^T Z$. The elements of $Y$ may be considered in blocks of $r$, the $s$th block corresponding to the polynomial of order $s$. These blocks are asymptotically mutually independent. Write $Y^T = (V_1^T, \ldots, V_c^T)$, in which

$$V_1 = (Y_1, \ldots, Y_r)^T, \; \ldots, \; V_{c-1} = (Y_{(c-2)r+1}, \ldots, Y_{(c-1)r})^T$$

and $V_c = 0$ (all the $V_s$ are $r$ by 1) so that

$$\frac{n-1}{n} X_P^2 = \frac{n-1}{n} Z^T Z = Y^T Y = V_1^T V_1 + \cdots + V_{c-1}^T V_{c-1}.$$

This partitions $\frac{n-1}{n} X_P^2$ into components $V_s^T V_s$, $s = 1, \ldots, c-1$. The $V_s$ are asymptotically mutually independent and asymptotically $N_r(0, I_{r-1} \oplus 0)$, so that the $V_s^T V_s$ are asymptotically mutually independent $\chi^2_{r-1}$. Explicitly we have, for $s = 1, \ldots, c-1$,

$$V_s = \frac{\sqrt{n-1}}{n}\, G_s^T Z = \frac{\sqrt{n-1}}{n} \left( \sum_{j=1}^{c} g_s(j) Z_{ij} \right).$$

Because $V_s$ involves, through $g_s$, a polynomial of order $s$, the elements of $V_s$ are polynomials of order $s$ in the elements of $N$. Under the null hypothesis $E[Z] = 0$, but when this is not true $E[V_s]$ involves moments up to order $s$ of $Z$. So for $s = 1, \ldots, c-1$, $V_s^T V_s$ detects $s$th moment departures from the null hypothesis of similarly distributed rows (treatments).

Instructors Example. See Conover (1980, p. 233). Three instructors assign grades in five categories according to the following table.

                         Grade
                A    B    C    D    E   Total
Instructor 1    4   14   17    6    2      43
Instructor 2   10    6    9    7    6      38
Instructor 3    6    7    8    6    1      28
Total          20   27   34   19    9     109

Conover (1980) found the Kruskal-Wallis statistic adjusted for ties to be 0.3209, which is to be compared with the $\chi^2_2$ (5%) point of 5.991. We find the location detecting component $V_1^T V_1$ to have P-value 0.85, confirming, as Conover reported, that "none of the instructors can be said to grade higher or lower than the others on the basis of the evidence presented". However the dispersion detecting component $V_2^T V_2$ has P-value 0.01, indicating a significant variability difference.

From the data it appears that the first instructor is less variable than the other two. In fact, $9.643 = (-2.113)^2 + (2.274)^2 + (-0.031)^2$, with the elements of $v_2 = (-2.113, 2.274, -0.031)^T$ being values of approximately standard normally distributed contributions from instructors 1, 2 and 3 respectively. The first instructor is less variable than the third, who is less variable than the second. This can be formalised by an LSD analysis. The residual $X_P^2 - V_1^T V_1 - V_2^T V_2$ has P-value 0.75, indicating no further effects in the data.

Partition of $X_P^2$ for instructors' data

Statistic                               Degrees of Freedom    Value    P-value
$V_1^T V_1$                                      2             0.324     0.85
$V_2^T V_2$                                      2             9.643     0.01
$X_P^2 - V_1^T V_1 - V_2^T V_2$                  4             2.021     0.75
$X_P^2$                                          8            11.985     0.15
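The partition can be sketched numerically for the instructors' table. The code below is an illustration under stated assumptions rather than the authors' own computation: the orthonormal polynomials are built by Gram-Schmidt instead of a recurrence, the grade categories are scored by their midranks (an assumption), and each component is written as $\sum_i \{\sum_j g_s(j) N_{ij}\}^2 / n_{i\cdot}$, a form in which the $c - 1$ components add exactly to $X_P^2$. The names `orthonormal_polys` and `partition` are hypothetical.

```python
import math

def orthonormal_polys(scores, weights):
    """Modified Gram-Schmidt on 1, x, x^2, ... under <u, v> = sum_j w_j u_j v_j."""
    mean = sum(w * x for w, x in zip(weights, scores))
    sd = math.sqrt(sum(w * (x - mean) ** 2 for w, x in zip(weights, scores)))
    x_std = [(x - mean) / sd for x in scores]   # standardise for numerical stability
    basis = []
    for s in range(len(scores)):
        v = [x ** s for x in x_std]
        for g in basis:
            proj = sum(w * a * b for w, a, b in zip(weights, v, g))
            v = [a - proj * b for a, b in zip(v, g)]
        norm = math.sqrt(sum(w * a * a for w, a in zip(weights, v)))
        basis.append([a / norm for a in v])
    return basis                                 # basis[0] is the constant polynomial

def partition(table, scores):
    """Return (X2, [component_1, ..., component_{c-1}]); the components sum to X2."""
    r, c = len(table), len(table[0])
    row = [sum(t) for t in table]
    col = [sum(table[i][j] for i in range(r)) for j in range(c)]
    n = sum(row)
    g = orthonormal_polys(scores, [cj / n for cj in col])
    comps = []
    for s in range(1, c):
        v = [sum(g[s][j] * table[i][j] for j in range(c)) / math.sqrt(row[i])
             for i in range(r)]
        comps.append(sum(x * x for x in v))
    x2 = sum((table[i][j] - row[i] * col[j] / n) ** 2 / (row[i] * col[j] / n)
             for i in range(r) for j in range(c))
    return x2, comps

grades = [[4, 14, 17, 6, 2], [10, 6, 9, 7, 6], [6, 7, 8, 6, 1]]
col_totals = [20, 27, 34, 19, 9]
midranks, seen = [], 0                           # assumption: midrank scores
for cj in col_totals:
    midranks.append(seen + (cj + 1) / 2)
    seen += cj
n = sum(col_totals)
x2, comps = partition(grades, midranks)
scaled = [(n - 1) / n * v for v in comps]        # components of ((n - 1) / n) X2
print(round(x2, 3), [round(v, 3) for v in scaled])
```

Under these assumptions the scaled location component appears to agree with Conover's tie-adjusted value of 0.3209, and the scaled dispersion component comes out close to the 9.643 in the table. Other choices of scores, such as $x_j = j$, would give different component values while leaving $X_P^2$ unchanged.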

4. The Kruskal-Wallis Test with No Ties

We now consider models that lead to the Kruskal-Wallis test when there are no ties. The latent roots of $\mathrm{cov}(Z)$ will be found explicitly rather than asymptotically as in section 2. We show that $X_P^2$ is not an appropriate test statistic, but nevertheless, its components are. The first component is the Kruskal-Wallis test statistic, and the subsequent components provide informative extensions.

Suppose we have distinct observations $x_{ij}$, being the $j$th of $n_i$ observations on the $i$th of $t$ treatments. All $n = n_1 + \cdots + n_t$ observations are combined, ordered, ranked, and the sums $R_i$ of the ranks obtained by the $i$th treatment calculated. The Kruskal-Wallis statistic is

$$H = \{ 12 / [n(n+1)] \} \sum_i R_i^2 / n_i - 3(n+1).$$

See, for example, Conover (1980, section 5.2). The data may be presented as a $t$ by $n$ contingency table of counts $\{N_{ij}\}$, with $N_{ij} = 1$ if rank $j$ is allotted to treatment $i$, and $N_{ij} = 0$ if rank $j$ is allotted to some other treatment. The row and column totals are all fixed: the row totals are the treatment sample sizes, so that $n_{i\cdot} = n_i$ for $i = 1, \ldots, t$, while the column totals are all one: $n_{\cdot j} = 1$ for $j = 1, \ldots, n$. Such a table has $X_P^2 = (t-1)n$ no matter what the $\{N_{ij}\}$. Since $X_P^2$ is constant, it has P-value 1. Clearly $X_P^2$ is not a suitable test statistic.
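The claim that $X_P^2 = (t-1)n$ for every such table is easy to check numerically. The sketch below builds the 0/1 table from an assignment of ranks to treatments; the rank assignments and the helper names are hypothetical, chosen only for illustration.

```python
def indicator_table(labels, t):
    """Row i has a 1 in column j when rank j + 1 is allotted to treatment i."""
    return [[1 if lab == i else 0 for lab in labels] for i in range(t)]

def pearson_x2(table):
    """Pearson's statistic for a two-way table given as a list of rows."""
    r, c = len(table), len(table[0])
    row = [sum(t_row) for t_row in table]
    col = [sum(table[i][j] for i in range(r)) for j in range(c)]
    n = sum(row)
    return sum((table[i][j] - row[i] * col[j] / n) ** 2 / (row[i] * col[j] / n)
               for i in range(r) for j in range(c))

# Two hypothetical allocations of ranks 1..7 to treatments 0, 1, 2.
first = pearson_x2(indicator_table([0, 1, 0, 2, 1, 2, 0], 3))
second = pearson_x2(indicator_table([2, 2, 0, 0, 0, 1, 1], 3))
print(first, second)   # both should equal (t - 1) * n = 14 up to rounding
```

Because the statistic takes the same value for every allocation, it carries no information about treatment differences, which is why the components rather than $X_P^2$ itself are used here.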

The model of section 2 holds, except that now

$$R = \frac{n}{n-1} I_n - \frac{1}{n-1} 1_n 1_n^T.$$

This matrix has one latent root zero and $n - 1$ latent roots $n/(n-1)$. It follows that $\mathrm{cov}(Z)$ has $(t-1)(n-1)$ latent roots $n/(n-1)$, and the remaining $t + n - 1$ latent roots zero. So if $H$ is orthogonal and diagonalises $\mathrm{cov}(Z)$ then

$$H^T \mathrm{cov}(Z) H = \frac{n}{n-1} I_{(t-1)(n-1)} \oplus 0_{(t+n-1)}.$$

Define

$$Y = \sqrt{\frac{n-1}{n}}\, H^T Z.$$

Then

$$Y^T Y = \frac{n-1}{n} Z^T Z = \frac{n-1}{n} X_P^2.$$

With $r$ replaced by $t$ and $c$ replaced by $n$, this is the same as in section three.

As in section 2, we are interested in the distribution theory as $n \to \infty$. However, there $Z$ was an $rc$ by 1 vector of fixed length; here $Z$ is a $tn$ by 1 vector. Fortunately, it is not the asymptotic distribution of $Z$ that is required. First recall that $X_P^2$ has a fixed value, $(t-1)n$, for all tables, and so is not available as a test statistic. Second, as in section three, the multivariate Central Limit Theorem shows that each $V_s$ is asymptotically $N_t(0, I_{t-1} \oplus 0)$. Moreover, consideration of all pairs $V_s$, $V_t$ shows that they are asymptotically jointly multivariate normal, and since their covariance matrix is zero, they are asymptotically pairwise independent. The $V_s^T V_s$ still partition $\frac{n-1}{n} X_P^2$. It is the pairwise independence and convenient $\chi^2_{t-1}$ distribution of each $V_s^T V_s$ that makes data analysis so informative and convenient. What is lost by the unavailability of $X_P^2$ is demonstrated in the Employees Example below: there is no residual available to assess if there are higher moment differences between the treatments.

We now show that provided there are no ties, $V_1^T V_1$ is the Kruskal-Wallis statistic, so that the subsequent $V_s^T V_s$ provide extensions to the Kruskal-Wallis test. First note that $\{g_s(j)\}$ is the set of polynomials orthonormal on the discrete uniform distribution, so that $g_1(j) = aj + b$, $j = 1, \ldots, n$, in which

$$a = \sqrt{12/(n^2 - 1)} \quad \text{and} \quad b = -\sqrt{3(n+1)/(n-1)} = -\{(n+1)/2\}\,a.$$
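These constants follow from the two orthonormality conditions under the discrete uniform weights $1/n$. The short check below is a worked step added for the reader and is not taken from the paper; it uses only the standard sums of the first $n$ integers and their squares.

```latex
\begin{align*}
\frac{1}{n}\sum_{j=1}^{n} (aj + b) &= a\,\frac{n+1}{2} + b = 0
  \;\Longrightarrow\; b = -\frac{n+1}{2}\,a, \\
\frac{1}{n}\sum_{j=1}^{n} (aj + b)^2
  &= a^2 \cdot \frac{1}{n}\sum_{j=1}^{n}\Bigl(j - \frac{n+1}{2}\Bigr)^2
   = a^2\,\frac{n^2 - 1}{12} = 1
  \;\Longrightarrow\; a = \sqrt{\frac{12}{n^2 - 1}}.
\end{align*}
```

The first line makes $g_1$ orthogonal to the constant polynomial $g_0(j) = 1$, and the second gives it unit norm.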

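The rank sum for treatment $i$, $R_i$, is $\sum_{j=1}^{n} j N_{ij}$, $i = 1, \ldots, t$. Now since $n_{\cdot j} = 1$ for $j = 1, \ldots, n$,

$$\sum_j g_1(j) \sqrt{E[N_{ij}]} = \sqrt{n_i n} \sum_j g_1(j) g_0(j) (1/n) = 0$$

and

$$\sum_j Z_{ij}\, g_1(j) = \sqrt{n/n_i} \sum_j N_{ij}(aj + b) = \sqrt{n/n_i}\, \{ a R_i + b n_i \} = a \sqrt{n/n_i} \left( R_i - \frac{n+1}{2}\, n_i \right).$$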

Now

$$V_1^T V_1 = Y_1^2 + \cdots + Y_t^2 = \frac{n-1}{n^2} \sum_{i=1}^{t} \left( \sum_{j=1}^{n} g_1(j) Z_{ij} \right)^2 = \frac{n-1}{n^2}\, a^2 n \sum_{i=1}^{t} \left( \frac{R_i}{\sqrt{n_i}} - \frac{n+1}{2} \sqrt{n_i} \right)^2 = \frac{12}{n(n+1)} \sum_{i=1}^{t} \frac{R_i^2}{n_i} - 3(n+1)$$

after some manipulation. This is the Kruskal-Wallis statistic, well known to be sensitive to location departures from the null hypothesis. Since $V_s$ assesses $s$th moment departures between treatments, we have partitioned the statistic $\frac{n-1}{n} X_P^2$ into asymptotically pairwise independent components, $V_s^T V_s$, $s = 1, \ldots, n-1$, each with the $\chi^2_{t-1}$ distribution, and such that the $s$th detects $s$th moment departures from the hypothesis of similarly distributed rows (treatments). Since the first of these is the Kruskal-Wallis statistic, the subsequent components provide extensions to the Kruskal-Wallis test.

Employees Example. Conover (1980, p. 238, exercise 2) gave an exercise in which 20 new employees are randomly assigned to four different job training programmes. At the end of their training the employees are ranked, with a low ranking reflecting a low job ability.

Programme    Ranks
1            2, 4, 6, 7, 10
2            1, 3, 8, 11, 12
3            5, 14, 16, 19, 20
4            9, 13, 15, 17, 18

The value of the Kruskal-Wallis statistic is 9.72, with $\chi^2_3$ P-value 0.021, but Monte Carlo permutation test P-value 0.010. The latter is more likely to be accurate as the sample size is small. Further components are not significant. An LSD analysis can be used to show that programmes 1 and 2 and programmes 3 and 4 are equally effective, with 3 and 4 being superior.
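The Kruskal-Wallis value quoted for the employees data, and its identity with the first component, can be reproduced with a few lines of Python. This is a sketch for illustration: the function names are hypothetical, and the chi-squared tail probability uses the closed form available for 3 degrees of freedom rather than a statistics library.

```python
import math

def kruskal_wallis(groups):
    """H = 12 / (n (n + 1)) * sum_i R_i^2 / n_i - 3 (n + 1), for untied ranks."""
    n = sum(len(g) for g in groups)
    return 12.0 / (n * (n + 1)) * sum(sum(g) ** 2 / len(g) for g in groups) - 3 * (n + 1)

def location_component(groups):
    """((n - 1) / n) * sum_i (a R_i + b n_i)^2 / n_i with g_1(j) = a j + b."""
    n = sum(len(g) for g in groups)
    a = math.sqrt(12.0 / (n * n - 1))
    b = -(n + 1) / 2 * a
    total = sum(sum(a * j + b for j in g) ** 2 / len(g) for g in groups)
    return (n - 1) / n * total

def chi2_sf_3df(x):
    """Upper tail probability of the chi-squared distribution with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

programmes = [[2, 4, 6, 7, 10], [1, 3, 8, 11, 12],
              [5, 14, 16, 19, 20], [9, 13, 15, 17, 18]]
h = kruskal_wallis(programmes)
v1 = location_component(programmes)
p = chi2_sf_3df(h)
print(round(h, 2), round(v1, 2), round(p, 3))
```

The Monte Carlo permutation P-value reported in the text is not reproduced here, since it depends on the simulation the authors ran.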

5. The Kruskal-Wallis Test with Ties

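If there are ties, the data may be presented as a $t$ by $n$ contingency table of counts $\{N_{ij}\}$, with the row totals fixed at the treatment sample sizes, so again $n_{i\cdot} = n_i$, $i = 1, \ldots, t$, while the column totals are no longer all one. The covariance matrix of $Z$ is

$$\mathrm{cov}(Z) = \{ I_t - (\sqrt{f_i f_j}) \} \otimes R \quad \text{and} \quad R = \mathrm{diag}\left( \frac{n_{\cdot u} n_{\cdot\cdot}}{n_{\cdot\cdot} - 1} \right) - \left( \frac{n_{\cdot u} n_{\cdot v}}{n_{\cdot\cdot} - 1} \right).$$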

As in section 2, the latent roots of $R$ are zero once and asymptotically one $n - 1$ times. It follows that $\mathrm{cov}(Z)$ has $(t-1)(n-1)$ latent roots asymptotically one, and the remaining $t + n - 1$ latent roots zero. With suitable modifications the partitioning of section three holds. For $s = 1, \ldots, n-1$,

$$V_s = G_s^T Z / \sqrt{n} = \sum_{j=1}^{n} g_s(j) Z_{ij} / \sqrt{n}.$$

Note that $\{g_s(j)\}$ is the set of polynomials orthonormal on $\{n_{\cdot j}/n_{\cdot\cdot}\}$, not on the discrete uniform as in the previous section when there were no ties. This is the partition derived in section 3 for $X_P^2$. So the first component of $X_P^2$ in the Instructors example is the Kruskal-Wallis statistic corrected for ties. The subsequent components are extensions to the Kruskal-Wallis test adjusted for ties. Note that for this example the model assumed in section 3, with fixed numbers of rows and columns, is more plausible than the model of this section, since $n = 5$ is hardly large.

6. Generalised Median Tests

Conover (1980, section 4.3) described the median test, in which random samples are taken from each of $r$ populations. The observations in each random sample are classified as above or below the grand median (the median of the combined random samples), forming an $r$ by 2 contingency table with fixed marginal totals. The usual chi-squared test, based on $X_P^2$, is then applied to this contingency table.

If instead of the grand median, a `grand quantile' is used, the resulting test is described as a quantile test: see Conover (1980, p. 174). These tests can be generalised by choosing $c$ instead of two categories for the combined random samples, and so forming an $r$ by $c$ contingency table of counts $N_{ij}$ of the number of observations for the $i$th sample in the $j$th category. This table has all row and column totals fixed and can be tested for row consistency using the results of sections 2 and 3. The first three components of $X_P^2$, say, are of particular interest, indicating location, dispersion and skewness differences between treatments.

It is routine to show that the location component $V_1^T V_1$ of $X_P^2$ reduces to the median test statistic when observations are classified into just two categories. This is shown in the Appendix. The result identifies the median test as a location detecting test. To detect up to $s$th moment differences between the populations requires categorisation into $s + 1$ categories and the use of the $V_2, \ldots, V_s$ components. If there are as many categories as observations and each category has one observation, the test based on the location component is the Kruskal-Wallis test, which is known to be more powerful than the median test. Using more than two categories will result in less loss of information due to categorisation compared to the median test, and will permit assessment of higher moment differences between the treatments.
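The reduction to the median test can be illustrated numerically. In the sketch below the location component is computed with scores $x_j = j$ and without the factor $(n-1)/n$, so that with two categories it should coincide with $X_P^2$ itself; the counts and function names are hypothetical and serve only as an illustration.

```python
import math

def margins(table):
    r, c = len(table), len(table[0])
    row = [sum(t_row) for t_row in table]
    col = [sum(table[i][j] for i in range(r)) for j in range(c)]
    return row, col, sum(row)

def pearson_x2(table):
    row, col, n = margins(table)
    return sum((table[i][j] - row[i] * col[j] / n) ** 2 / (row[i] * col[j] / n)
               for i in range(len(row)) for j in range(len(col)))

def location_component(table):
    """sum_i (sum_j g_1(j) N_ij)^2 / n_i. with g_1(j) = (j - mu) / sqrt(mu_2)."""
    row, col, n = margins(table)
    f = [cj / n for cj in col]
    mu = sum((j + 1) * fj for j, fj in enumerate(f))
    mu2 = sum((j + 1 - mu) ** 2 * fj for j, fj in enumerate(f))
    g1 = [(j + 1 - mu) / math.sqrt(mu2) for j in range(len(col))]
    return sum(sum(g * nij for g, nij in zip(g1, t_row)) ** 2 / ni
               for t_row, ni in zip(table, row))

# Hypothetical counts below / above the grand median for three samples.
balanced = [[8, 2], [5, 5], [2, 8]]
unbalanced = [[7, 3], [4, 9], [6, 6]]
for tab in (balanced, unbalanced):
    print(round(pearson_x2(tab), 6), round(location_component(tab), 6))
```

With more than two categories the two quantities differ, and the gap is what the higher order components account for.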

Corn Example. Conover (1980, p. 172) gave the example of four different methods of growing corn. He classified the data as greater than 89 and up to 88 and applied the median test. In this form this does not conform to the fixed margins model. If the objective were to divide the data into groups of the lowest 18 and highest 16 observations, it would conform to the fixed margins model. We now classify the data into four approximately equal groups.

Using the median test, Conover reported a P-value "slightly less than 0.001": the method median yields are clearly different. We calculate $X_P^2 = 49.712$ on 9 degrees of freedom. In addition $V_1^T V_1 = 25.723$, $V_2^T V_2 = 19.972$ and $V_3^T V_3 = 2.574$, all on 3 degrees of freedom. The location and dispersion components and $X_P^2$ are all significant, with P-values all zero to three decimal places. The residual or skewness component has $\chi^2_3$ P-value 0.45. The finer classification, compared to that employed by the median test, has uncovered a variability difference between the methods: methods 3 and 4 are significantly less variable than 1 and 2.

            First      Second     Third      Fourth     Total
            Quartile   Quartile   Quartile   Quartile
Method 1       0          3          4          2          9
Method 2       1          6          3          0         10
Method 3       0          0          1          6          7
Method 4       8          0          0          0          8
Total          9          9          8          8         34
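Two of the quoted values can be checked directly from the table. The sketch below assumes scores $x_j = j$ for the four quartile groups and applies the factor $(n-1)/n$ to the location component; the variable names are illustrative. The dispersion and skewness components are not computed here, as they need the second and third orthonormal polynomials.

```python
corn = [[0, 3, 4, 2], [1, 6, 3, 0], [0, 0, 1, 6], [8, 0, 0, 0]]
row = [sum(counts) for counts in corn]                    # method totals
col = [sum(counts[j] for counts in corn) for j in range(4)]  # quartile totals
n = sum(row)

# Pearson's statistic via the shortcut X2 = n * (sum_ij N_ij^2 / (n_i. n_.j) - 1).
x2 = n * (sum(corn[i][j] ** 2 / (row[i] * col[j])
              for i in range(4) for j in range(4)) - 1)

# Location component with scores x_j = j (an assumption), scaled by (n - 1) / n.
scores = [1, 2, 3, 4]
mu = sum(x * cj for x, cj in zip(scores, col)) / n
mu2 = sum((x - mu) ** 2 * cj for x, cj in zip(scores, col)) / n
dev = [sum(x * nij for x, nij in zip(scores, corn[i])) - row[i] * mu
       for i in range(4)]
v1 = (n - 1) / n * sum(d * d / ni for d, ni in zip(dev, row)) / mu2

print(round(x2, 3), round(v1, 3))
```

The shortcut used for $X_P^2$ follows from expanding $\sum (N_{ij} - E[N_{ij}])^2 / E[N_{ij}]$ with $E[N_{ij}] = n_{i\cdot} n_{\cdot j}/n_{\cdot\cdot}$.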

Appendix

The Orthogonal Polynomials

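The first two polynomials, defined on $x_1, \ldots, x_c$ and orthonormal with regard to the weights $p_1, \ldots, p_c$, are $g_1(x_j)$ and $g_2(x_j)$, given explicitly by:

$$g_1(x_j) = (x_j - \mu) / \sqrt{\mu_2}$$

and

$$g_2(x_j) = a \{ (x_j - \mu)^2 - \mu_3 (x_j - \mu)/\mu_2 - \mu_2 \}, \quad j = 1, \ldots, c,$$

in which

$$\mu = \sum_{j=1}^{c} x_j p_j, \quad \mu_r = \sum_{j=1}^{c} (x_j - \mu)^r p_j \quad \text{and} \quad a = (\mu_4 - \mu_3^2/\mu_2 - \mu_2^2)^{-0.5}.$$

The subsequent polynomials $g_3(x_j), \ldots, g_{c-1}(x_j)$ may be derived by using the useful recurrence relations in Emerson (1968). In the text we have taken, as in many applications, $x_j = j$, $j = 1, \ldots, c$.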
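The explicit formulas for $g_1$ and $g_2$ translate directly into code, and their orthonormality can be confirmed numerically. The sketch below uses the column proportions of the instructors' table as weights with $x_j = j$; the function names `first_two_polys` and `inner` are hypothetical.

```python
import math

def first_two_polys(x, p):
    """Return (g1, g2) evaluated at the points x, orthonormal for the weights p."""
    mu = sum(xj * pj for xj, pj in zip(x, p))

    def central_moment(order):
        return sum((xj - mu) ** order * pj for xj, pj in zip(x, p))

    mu2, mu3, mu4 = central_moment(2), central_moment(3), central_moment(4)
    a = (mu4 - mu3 ** 2 / mu2 - mu2 ** 2) ** -0.5
    g1 = [(xj - mu) / math.sqrt(mu2) for xj in x]
    g2 = [a * ((xj - mu) ** 2 - mu3 * (xj - mu) / mu2 - mu2) for xj in x]
    return g1, g2

def inner(u, v, p):
    """Weighted inner product sum_j p_j u_j v_j."""
    return sum(pj * uj * vj for pj, uj, vj in zip(p, u, v))

x = [1, 2, 3, 4, 5]
p = [20 / 109, 27 / 109, 34 / 109, 19 / 109, 9 / 109]
g1, g2 = first_two_polys(x, p)
ones = [1.0] * len(x)
print(inner(g1, g1, p), inner(g2, g2, p), inner(g1, g2, p))
```

Both polynomials should also be orthogonal to the constant polynomial, which is what makes the components insensitive to the fixed row totals.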

Derivation of the Covariance Matrix of the Cell Counts

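In section 2 the method used to find the moments of the $N_{ij}$ is described. To find $E[N_{21}]$, we take $E[N_{21} \mid N_{1j} + N_{2j},\ j = 1, \ldots, c]$, then the conditional expectation of this expression with the column sums of the first three rows being known, and so on. The successive expectations are

$$n_{2\cdot} (N_{11} + N_{21}) / (n_{1\cdot} + n_{2\cdot}),$$

$$\{ n_{2\cdot} / (n_{1\cdot} + n_{2\cdot}) \} \{ (n_{1\cdot} + n_{2\cdot})(N_{11} + N_{21} + N_{31}) / (n_{1\cdot} + n_{2\cdot} + n_{3\cdot}) \},$$

$$\vdots$$

$$\{ n_{2\cdot} / (n_{1\cdot} + n_{2\cdot}) \} \{ (n_{1\cdot} + n_{2\cdot}) / (n_{1\cdot} + n_{2\cdot} + n_{3\cdot}) \} \cdots \{ (n_{1\cdot} + \cdots + n_{(r-1)\cdot}) / (n_{1\cdot} + \cdots + n_{r\cdot}) \} \{ n_{\cdot 1} \} = n_{2\cdot} n_{\cdot 1} / n_{\cdot\cdot}.$$

By symmetry $E[N_{ij}] = n_{i\cdot} n_{\cdot j} / n_{\cdot\cdot}$, $i = 2, \ldots, r$ and $j = 1, \ldots, c$. By difference the expectations for the first row may be obtained, giving the familiar

$$E[N_{ij}] = n_{i\cdot} n_{\cdot j} / n_{\cdot\cdot}, \quad i = 1, \ldots, r \text{ and } j = 1, \ldots, c.$$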

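In the same way

$$E[N_{21}(N_{21} - 1)] = n_{2\cdot}(n_{2\cdot} - 1)\, n_{\cdot 1}(n_{\cdot 1} - 1) / \{ n_{\cdot\cdot}(n_{\cdot\cdot} - 1) \},$$

from which we obtain $\mathrm{var}(N_{21})$, and

$$\mathrm{var}(N_{ij}) = \frac{n_{i\cdot} n_{\cdot j}}{n_{\cdot\cdot}} \left( 1 - \frac{n_{\cdot j}}{n_{\cdot\cdot}} \right) \left( \frac{n_{\cdot\cdot} - n_{i\cdot}}{n_{\cdot\cdot} - 1} \right), \quad i = 1, \ldots, r \text{ and } j = 1, \ldots, c.$$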
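For a 2 by 2 table with all margins fixed, a single cell count follows the ordinary hypergeometric distribution, so the mean and variance formulas can be checked by direct enumeration. The margins below are hypothetical and the function name is illustrative.

```python
from math import comb

def n11_distribution(row1, col1, n):
    """P(N_11 = k) for a 2 x 2 table with first row total row1, first column total col1."""
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    denom = comb(n, row1)
    return {k: comb(col1, k) * comb(n - col1, row1 - k) / denom
            for k in range(lo, hi + 1)}

row1, col1, n = 5, 7, 12          # hypothetical margins n_1., n_.1 and n_..
dist = n11_distribution(row1, col1, n)
mean = sum(k * pk for k, pk in dist.items())
var = sum((k - mean) ** 2 * pk for k, pk in dist.items())

# Closed forms from the text, with i = 1 and j = 1.
mean_formula = row1 * col1 / n
var_formula = (row1 * col1 / n) * (1 - col1 / n) * (n - row1) / (n - 1)
print(mean, mean_formula, var, var_formula)
```

The same enumeration idea extends to larger tables through the product of extended hypergeometric probabilities, although the number of tables grows quickly.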

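Similarly

$$\mathrm{cov}(N_{ij}, N_{ik}) = -\frac{n_{i\cdot} n_{\cdot j}}{n_{\cdot\cdot}} \, \frac{n_{\cdot k}}{n_{\cdot\cdot}} \left( \frac{n_{\cdot\cdot} - n_{i\cdot}}{n_{\cdot\cdot} - 1} \right), \quad i = 1, \ldots, r \text{ and } j \neq k = 1, \ldots, c.$$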

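By symmetry

$$\mathrm{cov}(N_{rj}, N_{sj}) = -\frac{n_{\cdot j} n_{r\cdot}}{n_{\cdot\cdot}} \, \frac{n_{s\cdot}}{n_{\cdot\cdot}} \left( \frac{n_{\cdot\cdot} - n_{\cdot j}}{n_{\cdot\cdot} - 1} \right), \quad r \neq s = 1, \ldots, r \text{ and } j = 1, \ldots, c,$$

and by the expectation argument again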

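$$\mathrm{cov}(N_{ir}, N_{js}) = \frac{n_{i\cdot} n_{\cdot r}}{n_{\cdot\cdot}} \, \frac{n_{j\cdot} n_{\cdot s}}{n_{\cdot\cdot}} \left( \frac{1}{n_{\cdot\cdot} - 1} \right), \quad i \neq j = 1, \ldots, r \text{ and } r \neq s = 1, \ldots, c.$$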

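Write $N_i = (N_{i1}, \ldots, N_{ic})^T$, $i = 1, \ldots, r$, and $N^T = (N_1^T, \ldots, N_r^T)$. The joint covariance matrix of $N_i$ and $N_j$ is, for $i \neq j$,

$$\mathrm{cov}(N_i, N_j) = -\frac{n_{i\cdot}}{n_{\cdot\cdot}} \, \frac{n_{j\cdot}}{n_{\cdot\cdot}} \left\{ \mathrm{diag}\left( \frac{n_{\cdot r} n_{\cdot\cdot}}{n_{\cdot\cdot} - 1} \right) - \left( \frac{n_{\cdot r} n_{\cdot s}}{n_{\cdot\cdot} - 1} \right) \right\}.$$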

Now since the $\{N_{ij}\}$ are such that the row and column totals are known constants, $\mathrm{cov}(N_i, N_1 + \cdots + N_r) = 0$ for $i = 1, \ldots, r$. So if we write $f_j = n_{\cdot j}/n_{\cdot\cdot}$, $j = 1, \ldots, c$, and
