A RANK SUM TEST IN THE ANALYSIS OF VARIANCE*
BY RYUZO KANNO
1. Problem and Illustration. Consider the problem of n rankings of m objects: m objects are ranked independently for some characteristic by each of n judges, and an experimenter wishes to test the difference among the rankings assigned to the m objects. The test of significance for such ranked data has been studied by Friedman [1], Kendall-Smith [3], Jonckheere [2], Page [4], Thompson-Willke [5] and others.

An experimenter frequently has the following information: the m objects $A_1, \ldots, A_m$ are classified into two groups $G_1 = \{A_1, \ldots, A_k\}$ and $G_2 = \{A_{k+1}, \ldots, A_m\}$, with a difference between the two groups and no difference within each group. When n rankings of the m objects are obtained from n judges, the experimenter wishes to test the difference between the two sets of rankings assigned to the k objects in $G_1$ and the (m - k) objects in $G_2$. The null hypothesis to be tested is that the ranks are assigned at random by each of the judges. Under the null hypothesis the rank sums $R_1, \ldots, R_m$ ($R_i$ denotes the sum of the ranks for the i-th object) are not independent, but they are identically distributed, with equal mean and equal variance. Therefore the null hypothesis can be expressed by

$$H_0:\ \mu_1 = \mu_2 = \cdots = \mu_m,$$

where $\mu_i$ represents the mean of the distribution of $R_i$. The most general form of the alternative may be written

$$H_1:\ \mu_1 \neq \mu_2 \neq \cdots \neq \mu_m;$$

however, in our problem we shall consider the following alternative $H_2$ instead of $H_1$:

$$H_2:\ \mu_1 = \cdots = \mu_k \neq \mu_{k+1} = \cdots = \mu_m.$$

To make our problem clear, let us consider the following artificial example. Suppose that six objects are divided into two categories, $G_1 = \{A_1, A_2\}$ and $G_2 = \{A_3, A_4, A_5, A_6\}$, and that these six objects are ranked by twelve judges as shown in Table 1. Our problem is to test the difference between the two sets of rankings assigned to the two groups. For the data in Table 1, if we use the Friedman test

$$\chi_r^2 = \frac{12}{nm(m+1)} \sum_{i=1}^{m} (R_i - \bar{R})^2,$$

*Received September 4, 1974.
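Table 1
[Ranks assigned to the six objects $A_1, \ldots, A_6$ (rows) by judges 1 to 12 (columns), with a final column of rank sums $R_i$. The individual entries are not legible in this copy; only five six-digit sequences survive, and the judges they belong to cannot be determined: 214635, 435621, 512364, 125463, 413652.]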
where m = number of objects, n = number of judges, $R_i$ = sum of ranks for the i-th object, and $\bar{R} = \frac{1}{m}\sum_{i=1}^{m} R_i$, then we have here m = 6, n = 12, $R_1 = 30$, $R_2 = 32$, $R_3 = 45$, $R_4 = 48$, $R_5 = 47$, $R_6 = 50$, so that $\bar{R} = 42$ and $\sum (R_i - \bar{R})^2 = 378$, and hence $\chi_r^2 = 12 \cdot 378 / (12 \cdot 6 \cdot 7) = 9.0$. Since the value of n is large, referring to the table of the chi-square distribution, the probability associated with $\chi_r^2 \geq 9.0$ when df = m - 1 = 5 is larger than 0.1. Therefore with this data we could not reject the null hypothesis that the ranks are assigned at random by each of the judges at the 5% level of significance. Hence by using the Friedman test we could not detect the difference between the rankings assigned to the two groups $G_1$ and $G_2$.

The purpose of this paper is to give a more powerful test for the problem mentioned above.

2. Model and Analysis of Variance. Let $r_{ij}$ denote the rank of object $A_i$ assigned by judge j. Place $r_{ij}$ in the i-th row and j-th column of the table as follows:

Table 2
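$$
\begin{array}{cc|cccc|c}
\text{Group} & \text{Object} & \text{Judge } 1 & 2 & \cdots & n & \text{Rank sum} \\ \hline
      & A_1     & r_{11}    & r_{12}    & \cdots & r_{1n}    & R_1 \\
G_1   & \vdots  & \vdots    & \vdots    &        & \vdots    & \vdots \\
      & A_k     & r_{k1}    & r_{k2}    & \cdots & r_{kn}    & R_k \\ \hline
      & A_{k+1} & r_{k+1,1} & r_{k+1,2} & \cdots & r_{k+1,n} & R_{k+1} \\
G_2   & \vdots  & \vdots    & \vdots    &        & \vdots    & \vdots \\
      & A_m     & r_{m1}    & r_{m2}    & \cdots & r_{mn}    & R_m
\end{array}
$$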
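Each column of this table will contain a permutation of the first m integers, and hence $\sum_{i=1}^{m} r_{ij} = m(m+1)/2$. Let these n column vectors be denoted by $r_j = (r_{1j}, \ldots, r_{mj})'$, j = 1, 2, ..., n; then these n m-variate random variables are mutually independent. Furthermore we put $R_i = \sum_{j=1}^{n} r_{ij}$; evidently $\sum_{i=1}^{m} R_i = nm(m+1)/2$, and then under the null hypothesis $H_0$ we obtain

$$E(r_{ij}) = \frac{m+1}{2}, \qquad V(r_{ij}) = \frac{(m+1)(m-1)}{12}, \qquad \mathrm{Cov}(r_{\xi j}, r_{\eta j}) = -\frac{m+1}{12} \quad (\xi \neq \eta),$$

and hence

$$E(R_i) = \frac{n(m+1)}{2}, \qquad V(R_i) = \frac{n(m+1)(m-1)}{12}, \qquad \mathrm{Cov}(R_\xi, R_\eta) = -\frac{n(m+1)}{12}.$$

Now we consider the following linear model:

$$
r_j =
\begin{pmatrix}
1 & -1 \\
\vdots & \vdots \\
1 & -1 \\
1 & \frac{k}{m-k} \\
\vdots & \vdots \\
1 & \frac{k}{m-k}
\end{pmatrix}
\begin{pmatrix} \frac{m+1}{2} \\ \theta \end{pmatrix}
+
\begin{pmatrix}
\varepsilon_{1j} \\ \vdots \\ \varepsilon_{kj} \\ \varepsilon_{k+1,j} \\ \vdots \\ \varepsilon_{mj}
\end{pmatrix},
\qquad (j = 1, 2, \ldots, n)
\tag{2.1}
$$

where the first k rows of the coefficient matrix correspond to $G_1$ and the remaining m - k rows to $G_2$, $\sum_{i=1}^{m} \varepsilon_{ij} = 0$, and $\varepsilon_j = (\varepsilon_{1j}, \ldots, \varepsilon_{mj})'$ has the following mean vector and covariance matrix:

$$
E(\varepsilon_j) = 0, \qquad
V = \frac{m(m+1)}{12}
\begin{pmatrix}
\frac{m-1}{m} & -\frac{1}{m} & \cdots & -\frac{1}{m} \\
-\frac{1}{m} & \frac{m-1}{m} & \cdots & -\frac{1}{m} \\
\vdots & \vdots & \ddots & \vdots \\
-\frac{1}{m} & -\frac{1}{m} & \cdots & \frac{m-1}{m}
\end{pmatrix}.
\tag{2.2}
$$

Now $\varepsilon_1, \ldots, \varepsilon_n$ are mutually independent m-variate random variables with equal mean vector 0 and equal covariance matrix V as shown above, but normality is not required.

Next consider the random vector $R = \sum_{j=1}^{n} r_j$, that is,

$$R = A\boldsymbol{\theta} + \varepsilon, \tag{2.3}$$

where $R = (R_1, \ldots, R_m)'$, $\varepsilon = \sum_{j=1}^{n} \varepsilon_j$,

$$
A =
\begin{pmatrix}
1 & -1 \\
\vdots & \vdots \\
1 & -1 \\
1 & \frac{k}{m-k} \\
\vdots & \vdots \\
1 & \frac{k}{m-k}
\end{pmatrix},
\qquad
\boldsymbol{\theta} = \begin{pmatrix} \frac{n(m+1)}{2} \\ n\theta \end{pmatrix}.
\tag{2.4}
$$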
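The null moments quoted above can be checked by brute force for a small m. The following is a minimal sketch, not part of the original paper: it enumerates all m! equally likely rankings with exact rational arithmetic. The function names and the sizes m = 5, n = 3 are illustrative assumptions.

```python
from fractions import Fraction
from itertools import permutations


def null_rank_moments(m):
    """Exact E(r_ij), V(r_ij) and Cov(r_xj, r_yj) when all m! rankings are equally likely."""
    rankings = list(permutations(range(1, m + 1)))
    count = len(rankings)
    # By symmetry it is enough to look at the first object (and the first two for the covariance).
    mean = Fraction(sum(r[0] for r in rankings), count)
    var = Fraction(sum(r[0] ** 2 for r in rankings), count) - mean ** 2
    cov = Fraction(sum(r[0] * r[1] for r in rankings), count) - mean ** 2
    return mean, var, cov


def null_rank_sum_moments(m, n):
    """Moments of the rank sums R_i: the n judges are independent, so each moment scales by n."""
    mean, var, cov = null_rank_moments(m)
    return n * mean, n * var, n * cov


m, n = 5, 3  # small illustrative sizes (hypothetical, not from the paper)
mean, var, cov = null_rank_moments(m)
sum_mean, sum_var, sum_cov = null_rank_sum_moments(m, n)
print(mean, var, cov)              # compare with (m+1)/2, (m+1)(m-1)/12, -(m+1)/12
print(sum_mean, sum_var, sum_cov)  # compare with the corresponding moments of R_i
```

Because the arithmetic is exact, the comparison with the closed forms is an equality check rather than a numerical approximation; the enumeration grows as m!, so it is only practical for small m.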
Now it should be noted that when n tends to infinity, $\varepsilon$ has a degenerate normal distribution with mean vector 0 and covariance matrix $\Sigma = nV$; that is, when n is large, R is approximately distributed as a degenerate normal distribution with mean vector $A\boldsymbol{\theta}$ and covariance matrix $\Sigma$.

Our problem is to test the null hypothesis $H_0: \theta = 0$ against the alternative hypothesis $H_2: \theta \neq 0$ by using the analysis of variance. To decompose the quadratic form $R'R$, let us define the following notations:

$L^m$ : linear space of dimension m,
$L(A)$ : linear subspace of dimension s generated by the column vectors of the m × s matrix A,
$L_A^{\perp}$ : orthogonal complement of the linear subspace $L(A)$ in the m-dimensional linear space $L^m$,
$L_B^{\perp}(A)$ : orthogonal complement of the linear subspace $L(B)$ in the linear subspace $L(A)$ when $L(A) \supset L(B)$ for two matrices A and B,
$e$ : matrix of a projection operator, which has the properties $e' = e$, $e^2 = e$ (symmetric idempotent).

Now under the condition $\sum_{i=1}^{m} R_i = nm(m+1)/2$, the vector R generates an (m - 1)-dimensional hyperplane in the m-dimensional linear space $L^m$. Let $L(J)$ denote the subspace generated by the vector $J = (1, \ldots, 1)'$; then $L(J)$ is orthogonal to this hyperplane. In the geometrical sense, $L(J)$ is a perpendicular from the origin in $L^m$ to the hyperplane $\sum_{i=1}^{m} R_i = nm(m+1)/2$, hence this hyperplane is denoted by $L_J^{\perp}$. We have the direct sum decomposition of $L^m$ as follows:

$$L^m = L(J) \oplus L_J^{\perp}. \tag{2.5}$$

Here let $E_{mm}$ denote an m × m matrix with all components 1; then the matrix $e_1 = (1/m)E_{mm}$ is the projection operator from R to $L(J)$, and $(R'e_1R)^{1/2}$ shows the distance from the origin in $L^m$ to $L_J^{\perp}$. Therefore we may consider only the decomposition of $R'(I_m - e_1)R$ ($I_m$: m × m identity matrix) in the (m - 1)-dimensional subspace $L_J^{\perp}$. In this case, $I_m - e_1$ is the projection operator from R to $L_J^{\perp}$.

Now on the other hand, $L^m$ is also decomposed as follows:

$$L^m = L(A) \oplus L_A^{\perp} = L(J) \oplus L_J^{\perp}(A) \oplus L_A^{\perp}; \tag{2.6}$$

then we have from (2.5) and (2.6)

$$L_J^{\perp} = L_J^{\perp}(A) \oplus L_A^{\perp}, \tag{2.7}$$

and it can be shown that

$$e_2 = A(A'A)^{-1}A' - (1/m)E_{mm}, \qquad e_0 = I_m - A(A'A)^{-1}A' \tag{2.8}$$

are projection operators from R to $L_J^{\perp}(A)$ and $L_A^{\perp}$ respectively. In particular, it should be noted that the following relation exists among the projection operators:

$$I_m - e_1 = e_2 + e_0, \tag{2.9}$$

and these are shown as follows, where $E_{pq}$ denotes the p × q matrix with all components 1:

$$
I_m - e_1 =
\begin{pmatrix}
\frac{m-1}{m} & -\frac{1}{m} & \cdots & -\frac{1}{m} \\
-\frac{1}{m} & \frac{m-1}{m} & \cdots & -\frac{1}{m} \\
\vdots & \vdots & \ddots & \vdots \\
-\frac{1}{m} & -\frac{1}{m} & \cdots & \frac{m-1}{m}
\end{pmatrix},
\tag{2.10}
$$

$$
e_2 =
\begin{pmatrix}
\frac{m-k}{km} E_{kk} & -\frac{1}{m} E_{k,m-k} \\
-\frac{1}{m} E_{m-k,k} & \frac{k}{m(m-k)} E_{m-k,m-k}
\end{pmatrix},
\qquad
e_0 =
\begin{pmatrix}
I_k - \frac{1}{k} E_{kk} & 0 \\
0 & I_{m-k} - \frac{1}{m-k} E_{m-k,m-k}
\end{pmatrix}.
$$
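The projection operators can be checked numerically for the example of Section 1 (m = 6, k = 2, n = 12). The sketch below is an illustration, not the author's computation. It assumes that the second column of A is the group contrast with entries -1 on $G_1$ and k/(m - k) on $G_2$, builds $e_2$ from that contrast and $e_0$ in block-diagonal form, and then verifies idempotency, orthogonality and relation (2.9) with exact fractions. The helper names are hypothetical.

```python
from fractions import Fraction


def projectors(m, k):
    """Return e1, e2, e0 as m x m lists of Fractions (first k objects form G1)."""
    e1 = [[Fraction(1, m) for _ in range(m)] for _ in range(m)]
    # Assumed group contrast: -1 on G1 and k/(m-k) on G2; it is orthogonal to J = (1,...,1)'.
    a = [Fraction(-1)] * k + [Fraction(k, m - k)] * (m - k)
    norm2 = sum(x * x for x in a)
    e2 = [[a[i] * a[j] / norm2 for j in range(m)] for i in range(m)]
    # e0 is block diagonal: centring within G1 and centring within G2.
    e0 = [[Fraction(0) for _ in range(m)] for _ in range(m)]
    for i in range(m):
        for j in range(m):
            if i < k and j < k:
                e0[i][j] = Fraction(int(i == j)) - Fraction(1, k)
            elif i >= k and j >= k:
                e0[i][j] = Fraction(int(i == j)) - Fraction(1, m - k)
    return e1, e2, e0


def matmul(x, y):
    """Plain matrix product of two square lists of lists."""
    size = len(x)
    return [[sum(x[i][t] * y[t][j] for t in range(size)) for j in range(size)]
            for i in range(size)]


def quad(r, mat):
    """Quadratic form r' mat r."""
    size = len(r)
    return sum(r[i] * mat[i][j] * r[j] for i in range(size) for j in range(size))


m, k, n = 6, 2, 12
rank_sums = [30, 32, 45, 48, 47, 50]  # rank sums quoted in Section 1
e1, e2, e0 = projectors(m, k)
centre = [[Fraction(int(i == j)) - e1[i][j] for j in range(m)] for i in range(m)]  # I_m - e1

idempotent = matmul(e2, e2) == e2 and matmul(e0, e0) == e0
orthogonal = matmul(e2, e0) == [[0] * m for _ in range(m)]
relation_2_9 = centre == [[e2[i][j] + e0[i][j] for j in range(m)] for i in range(m)]

between = quad(rank_sums, e2)    # R'e2R
within = quad(rank_sums, e0)     # R'e0R
total = quad(rank_sums, centre)  # R'(I_m - e1)R
friedman = Fraction(12, n * m * (m + 1)) * total
print(idempotent, orthogonal, relation_2_9, between, within, total, friedman)
```

With these rank sums the total quadratic form is 378, which splits into 363 for the between-group part and 15 for the within-group part, and the scaled total reproduces the Friedman value 9.0 of Section 1.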
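Furthermore these operators are represented with respect to the characteristic roots in the following way:

$$
I_m - e_1 = P\,\mathrm{diag}(0, \underbrace{1, \ldots, 1}_{k-1}, 1, \underbrace{1, \ldots, 1}_{m-k-1})\,P', \quad
e_2 = P\,\mathrm{diag}(0, \underbrace{0, \ldots, 0}_{k-1}, 1, \underbrace{0, \ldots, 0}_{m-k-1})\,P', \quad
e_0 = P\,\mathrm{diag}(0, \underbrace{1, \ldots, 1}_{k-1}, 0, \underbrace{1, \ldots, 1}_{m-k-1})\,P',
\tag{2.11}
$$

where P is an orthogonal matrix of the following form:

$$
P =
\begin{pmatrix}
\frac{1}{\sqrt{m}} J_k & H_k & \sqrt{\frac{m-k}{km}}\, J_k & 0 \\
\frac{1}{\sqrt{m}} J_{m-k} & 0 & -\sqrt{\frac{k}{m(m-k)}}\, J_{m-k} & H_{m-k}
\end{pmatrix},
\tag{2.12}
$$

in which $J_q$ is the q-vector with all components 1 and $H_q$ is the q × (q - 1) matrix

$$
H_q =
\begin{pmatrix}
\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} & \cdots & \frac{1}{\sqrt{q(q-1)}} \\
-\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} & \cdots & \frac{1}{\sqrt{q(q-1)}} \\
0 & -\frac{2}{\sqrt{6}} & \cdots & \frac{1}{\sqrt{q(q-1)}} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & -\frac{q-1}{\sqrt{q(q-1)}}
\end{pmatrix}.
$$

Thus from above, we can obtain the decomposition of the sum of squares for the analysis of variance

$$R'(I_m - e_1)R = R'e_2R + R'e_0R, \tag{2.13}$$

where

$$
R'(I_m - e_1)R = \sum_{i=1}^{m} (R_i - \bar{R})^2, \qquad
R'e_2R = \frac{k(m-k)}{m} (\bar{R}_{(1)} - \bar{R}_{(2)})^2, \qquad
R'e_0R = \sum_{i=1}^{k} (R_i - \bar{R}_{(1)})^2 + \sum_{i=k+1}^{m} (R_i - \bar{R}_{(2)})^2,
\tag{2.14}
$$

$$
\bar{R} = \frac{1}{m} \sum_{i=1}^{m} R_i, \qquad
\bar{R}_{(1)} = \frac{1}{k} \sum_{i=1}^{k} R_i, \qquad
\bar{R}_{(2)} = \frac{1}{m-k} \sum_{i=k+1}^{m} R_i.
$$

It should be noted that (2.13) shows the decomposition of Friedman's $\chi_r^2 = [12/(nm(m+1))]\,R'(I_m - e_1)R$.

As mentioned before, when n is large, R is approximately distributed as a degenerate normal distribution with mean vector $A\boldsymbol{\theta}$ and covariance matrix

$$\Sigma = \frac{nm(m+1)}{12} (I_m - e_1).$$

If we put

$$
B_1 = \frac{12}{nm(m+1)} (I_m - e_1), \qquad
B_2 = \frac{12}{nm(m+1)}\, e_2, \qquad
B_3 = \frac{12}{nm(m+1)}\, e_0,
\tag{2.15}
$$

then evidently we have

$$B_1\Sigma = I_m - e_1, \qquad B_2\Sigma = e_2, \qquad B_3\Sigma = e_0. \tag{2.16}$$

Therefore we can see that (i) $B_1\Sigma$, $B_2\Sigma$, $B_3\Sigma$ are symmetric idempotent with $\mathrm{tr}\,B_1\Sigma = m - 1$, $\mathrm{tr}\,B_2\Sigma = 1$, $\mathrm{tr}\,B_3\Sigma = m - 2$ respectively, and hence when n is large, the quadratic forms $R'B_1R$, $R'B_2R$, $R'B_3R$ are approximately distributed as chi-square with df = m - 1, df = 1, df = m - 2 respectively, and (ii) $B_2\Sigma B_3 = 0$, hence the two quadratic forms $R'B_2R$ and $R'B_3R$ are statistically independent of each other. Hence we have the analysis of variance table as shown in Table 3.

Table 3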
$$
\begin{array}{c|c|c|c|c|c|c}
\text{Space} & \text{d.f.} & \text{S.S.} & \text{m.S.S.} & F & \text{Non-centrality parameter under } H_0 & \text{Non-centrality parameter under } H_2 \\ \hline
L_J^{\perp}(A) & 1 & R'e_2R & R'e_2R & \dfrac{R'e_2R}{R'e_0R/(m-2)} & 0 & \dfrac{12nk\,\theta^2}{(m+1)(m-k)} \\
L_A^{\perp} & m-2 & R'e_0R & \dfrac{R'e_0R}{m-2} & & 0 & 0 \\ \hline
L_J^{\perp} & m-1 & R'(I_m - e_1)R & & & &
\end{array}
$$

By this table we know that the ratio F can be used to test the hypothesis $H_0: \theta = 0$ against the alternative hypothesis $H_2: \theta \neq 0$. Thus we obtain the following test statistic:

$$
F_r = \frac{k(m-k)(m-2)\,(\bar{R}_{(1)} - \bar{R}_{(2)})^2}{m \left[ \sum_{i=1}^{k} (R_i - \bar{R}_{(1)})^2 + \sum_{i=k+1}^{m} (R_i - \bar{R}_{(2)})^2 \right]},
\tag{2.17}
$$

where $\bar{R}_{(1)} = \frac{1}{k} \sum_{i=1}^{k} R_i$ and $\bar{R}_{(2)} = \frac{1}{m-k} \sum_{i=k+1}^{m} R_i$.

When the number of judges (the size of n) is not too small, the test statistic $F_r$ is distributed approximately as the F-distribution with df = (1, m - 2).

3. Note on the Test Statistic $F_r$.

(a) Summary of Procedure for $F_r$. When ranking data are given in a two-way table of m rows (objects) and n columns (judges), these are the steps in the use of $F_r$:
(1) Determine the sum of the ranks in each row: $R_i$.
(2) Compute the value of $F_r$, using formula (2.17).
(3) The significance of the observed value of $F_r$ may be determined by reference to the table of the F-distribution, since $F_r$ is distributed approximately as the F-distribution with df = (1, m - 2).
If the level of significance has been predetermined at $\alpha$ and the observed value of $F_r$ equals or exceeds the selected value $F^{1}_{m-2}(\alpha)$ in the table, where $F^{1}_{m-2}(\alpha)$ is chosen so that the test has size $\alpha$, then we may reject $H_0$ at the level $\alpha$.

For example, in such cases as shown in Table 1 we have m = 6, n = 12, k = 2,
$\bar{R}_{(1)} = (30 + 32)/2 = 31$, $\bar{R}_{(2)} = (45 + 48 + 47 + 50)/4 = 47.5$,
$(\bar{R}_{(1)} - \bar{R}_{(2)})^2 = (31 - 47.5)^2 = 272.25$,
$\sum_{i=1}^{k} (R_i - \bar{R}_{(1)})^2 = (30 - 31)^2 + (32 - 31)^2 = 2$,
$\sum_{i=k+1}^{m} (R_i - \bar{R}_{(2)})^2 = (45 - 47.5)^2 + (48 - 47.5)^2 + (47 - 47.5)^2 + (50 - 47.5)^2 = 13$.

We may compute the value of $F_r$ by substituting these values in formula (2.17):

$$F_r = \frac{2(6-2)(6-2)(272.25)}{6(2 + 13)} = 96.8 > F^{1}_{4}(0.01) = 21.20.$$

Thus we may reject the null hypothesis that the ranks are assigned at random by each of the judges at the 0.01 level of significance and conclude that there exists a significant difference between the two groups $G_1$ and $G_2$.

(b) Small Sample Distribution. In the previous section, we saw that when n is not too small the test statistic $F_r$ is distributed approximately as the F-distribution with df = (1, m - 2). However, when n is too small the reliability of this approximation may be doubtful. From this point of view we attempted to compute the exact distributions of $F_r$ for m = 4, n = 2, k = 1, 2; m = 4, n = 3, k = 1, 2; m = 4, n = 4, k = 1 by using the IBM 1620. Table 4 presents a comparison of the probabilities yielded by the F-test and the exact distribution of $F_r$.

(c) Treatment of Ties. When there are sets of ties in the rankings, each of the tied objects is assigned the average of the ranks which they would possess if they were distinguishable. This is sometimes known as the "mid-rank method." Now we have to consider the effect of tied ranks on the statistic $F_r$. It is clear that when ties are present the means of $R_i$ (i = 1, ..., m) remain the same as for untied rankings, but the variances and covariances of $R_i$ (i = 1, ..., m) require modification.

After some calculations, it can be verified that, with the correction for ties incorporated, the variances and covariances of $R_i$ (i = 1, ..., m) are

$$
V(R_i) = \frac{n(m^2 - 1)}{12} - \frac{1}{m} \sum_{T} T, \qquad
\mathrm{Cov}(R_\xi, R_\eta) = -\frac{n(m+1)}{12} + \frac{1}{m(m-1)} \sum_{T} T,
\tag{3.1}
$$

where the summation $\sum_{T}$ takes place over the various rankings and T is expressed by

$$T = \frac{1}{12} \sum_{t} (t^3 - t),$$

where t denotes the number of observations in a set of ties for a given ranking and the summation $\sum_{t}$ takes place over the various ties within any one of the n rankings.

Therefore if ties are present in the rankings, R is approximately distributed as a degenerate normal distribution with mean vector $A\boldsymbol{\theta}$ and covariance matrix:
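Table 4

(a) m = 4, n = 2, k = 1

$$
\begin{array}{l|cccc}
\lambda & \infty & 16 & 12 & 4 \\ \hline
P(F_r = \lambda) & .020833 & .041666 & .041666 & .145833 \\
P(F_r \geq \lambda) & & .062499 & .104165 & .249998 \\
P(F \geq \lambda) & & .0626 & .0826 &
\end{array}
$$

(b) m = 4, n = 2, k = 2

$$
\begin{array}{l|ccccc}
\lambda & \infty & 18 & 16 & 8 & 4 \\ \hline
P(F_r = \lambda) & .027777 & .027777 & .027777 & .013888 & .055555 \\
P(F_r \geq \lambda) & & .055554 & .083331 & .097219 & .152774 \\
P(F \geq \lambda) & & .0526 & .0626 & .1027 &
\end{array}
$$

(c) m = 4, n = 3, k = 1

$$
\begin{array}{l|cccccc}
\lambda & \infty & 49 & 27 & 25 & 12.25 & 9 \\ \hline
P(F_r = \lambda) & .013888 & .013020 & .013020 & .023437 & .010416 & .005208 \\
P(F_r \geq \lambda) & & .026908 & .039928 & .063365 & .073782 & .078990 \\
P(F \geq \lambda) & & .0224 & .0394 & .0419 & .0814 & .0976
\end{array}
$$

(d) m = 4, n = 3, k = 2

$$
\begin{array}{l|cccccc}
\lambda & \infty & 72 & 32 & 25 & 14.4 & 9 \\ \hline
P(F_r = \lambda) & .024305 & .005208 & .017361 & .006944 & .003472 & .034722 \\
P(F_r \geq \lambda) & & .029513 & .046874 & .053818 & .057290 & .092012 \\
P(F \geq \lambda) & & .0166 & .0331 & .0419 & .0706 & .0976
\end{array}
$$

(e) m = 4, n = 4, k = 1

$$
\begin{array}{l|cccccccc}
\lambda & \infty & 100 & 64 & 48 & 25 & 16 & 14.286 & 12 \\ \hline
P(F_r = \lambda) & .007486 & .003906 & .008680 & .002170 & .003472 & .034502 & .005786 & .032478 \\
P(F_r \geq \lambda) & & .011392 & .020072 & .022242 & .025714 & .060216 & .066002 & .098480 \\
P(F \geq \lambda) & & .0096 & .0186 & .0226 & .0419 & .0626 & .0712 & .0826
\end{array}
$$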
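The exact null distribution of $F_r$ for very small m and n can be reproduced by direct enumeration of all $(m!)^n$ equally likely sets of rankings. The sketch below is an illustration in Python, not the IBM 1620 program used by the author. It computes $F_r$ from formula (2.17) with exact fractions and tabulates the distribution for m = 4, n = 2, k = 1. Two conventions are assumptions of this sketch: $F_r$ is taken as infinite when the within-group sum of squares is zero and the between-group term is positive, and the indeterminate case 0/0 is counted as 0. The function names are hypothetical.

```python
import math
from collections import Counter
from fractions import Fraction
from itertools import permutations, product


def f_r(rank_sums, k):
    """F_r of formula (2.17); the first k rank sums belong to G1, the rest to G2."""
    m = len(rank_sums)
    mean1 = Fraction(sum(rank_sums[:k]), k)
    mean2 = Fraction(sum(rank_sums[k:]), m - k)
    between = Fraction(k * (m - k), m) * (mean1 - mean2) ** 2   # R'e2R
    within = (sum((r - mean1) ** 2 for r in rank_sums[:k])
              + sum((r - mean2) ** 2 for r in rank_sums[k:]))   # R'e0R
    if within == 0:
        # Assumed convention: positive/0 is infinite, 0/0 is counted as 0.
        return math.inf if between > 0 else Fraction(0)
    return (m - 2) * between / within


def exact_distribution(m, n, k):
    """Exact null distribution of F_r: every judge picks one of the m! rankings at random."""
    rankings = list(permutations(range(1, m + 1)))
    counts = Counter()
    for judges in product(rankings, repeat=n):
        rank_sums = [sum(ranks) for ranks in zip(*judges)]  # one rank sum per object
        counts[f_r(rank_sums, k)] += 1
    total = len(rankings) ** n
    return {value: Fraction(count, total) for value, count in counts.items()}


dist = exact_distribution(4, 2, 1)                        # 24**2 = 576 outcomes
upper_tail = sum(p for value, p in dist.items() if value >= 12)
example = f_r([30, 32, 45, 48, 47, 50], 2)                # rank sums of the Section 1 example

print(float(dist[math.inf]), float(dist[16]), float(dist[12]), float(dist[4]))
print(float(upper_tail), float(example))
```

For this setting the enumeration gives probabilities 1/48, 1/24, 1/24 and 7/48 at the values infinity, 16, 12 and 4, in agreement with panel (a) of Table 4, and the example rank sums give 96.8. The cost grows as $(m!)^n$, so m = 4 with n = 4 already needs 331,776 evaluations.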