A RANK SUM TEST IN THE ANALYSIS OF VARIANCE


By

Ryuzo KANNO

(Received September 4, 1974)

1. Problem and Illustration. Consider the problem of n rankings of m objects: m objects are ranked independently for some characteristic by each of n judges, and an experimenter wishes to test the difference among the rankings assigned to the m objects. Tests of significance for such ranked data have been studied by Friedman [1], Kendall-Smith [3], Jonckheere [2], Page [4], Thompson-Willke [5] and others.

An experimenter frequently has the following information: the m objects $A_1, \dots, A_m$ are classified into two groups $G_1 = \{A_1, \dots, A_k\}$, $G_2 = \{A_{k+1}, \dots, A_m\}$, which differ from each other but show no difference within each group. When the rankings of the m objects are obtained from n judges, the experimenter wishes to test the difference between the two sets of rankings assigned to the k objects in $G_1$ and the (m-k) objects in $G_2$.

The null hypothesis to be tested is that the ranks are assigned at random by each of the judges. Under the null hypothesis the rank sums $R_1, \dots, R_m$ ($R_i$ denotes the sum of the ranks for the i-th object) are not independently but identically distributed, with equal mean and equal variance. Therefore the null hypothesis can be expressed by

\[ H_0:\ \mu_1 = \mu_2 = \cdots = \mu_m , \]

where $\mu_i$ represents the mean of the distribution of $R_i$. The most general form of alternative may be written

\[ H_1:\ \mu_1 \neq \mu_2 \neq \cdots \neq \mu_m ; \]

however, in our problem we shall consider the following alternative $H_2$ instead of $H_1$:

\[ H_2:\ \mu_1 = \cdots = \mu_k \neq \mu_{k+1} = \cdots = \mu_m . \]

To make our problem clear, let us consider the following artificial example. Suppose that six objects are divided into two categories $G_1 = \{A_1, A_2\}$, $G_2 = \{A_3, A_4, A_5, A_6\}$ and that these six objects are ranked by twelve judges as shown in Table 1. Our problem is to test the difference between the two sets of rankings assigned to the two groups. For the data in Table 1, if we use the Friedman test:

\[ \chi_r^2 = \frac{12}{nm(m+1)} \sum_{i=1}^{m} (R_i - \bar{R})^2 , \]

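Table 1. Ranks assigned to the six objects $A_1, \dots, A_6$ (rows) by the twelve judges (columns 1 to 12). Most individual ranks are illegible in the source; the rank sums survive.

Object          A_1   A_2   A_3   A_4   A_5   A_6
Rank sum R_i     30    32    45    48    47    50

Legible fragments of the judge columns (judge numbers not recoverable): 214635, 435621, 512364, 125463, 413652.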

where m = number of objects, n = number of judges, $R_i$ = sum of ranks for the i-th object, and $\bar{R} = \frac{1}{m}\sum_{i=1}^{m} R_i$, then we have here m = 6, n = 12, $R_1 = 30$, $R_2 = 32$, $R_3 = 45$, $R_4 = 48$, $R_5 = 47$, $R_6 = 50$, so that $\bar{R} = 42$, $\sum (R_i - \bar{R})^2 = 378$, and hence $\chi_r^2 = 378/42 = 9.0$. Since the value of n is large, referring to the table of the chi-square distribution, the probability associated with $\chi_r^2 \ge 9.0$ when df = m-1 = 5 is larger than 0.1. Therefore with these data we could not reject, at the 5% level of significance, the null hypothesis that the ranks are assigned at random by each of the judges. Hence by using the Friedman test we could not detect the difference between the rankings assigned to the two groups $G_1$ and $G_2$.

The purpose of this paper is to give a more powerful test for the problem mentioned above.

2. Model and Analysis of Variance. Let $r_{ij}$ denote the rank of object $A_i$ assigned by judge j. Place $r_{ij}$ in the i-th row and j-th column of the table as follows:

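Table 2

\[
\begin{array}{cc|cccc|c}
 & & \text{Judges} & & & & \\
\text{Group} & \text{Objects} & 1 & 2 & \cdots & n & \text{Rank sum} \\ \hline
 & A_1 & r_{11} & r_{12} & \cdots & r_{1n} & R_1 \\
G_1 & \vdots & \vdots & \vdots & & \vdots & \vdots \\
 & A_k & r_{k1} & r_{k2} & \cdots & r_{kn} & R_k \\ \hline
 & A_{k+1} & r_{k+1,1} & r_{k+1,2} & \cdots & r_{k+1,n} & R_{k+1} \\
G_2 & \vdots & \vdots & \vdots & & \vdots & \vdots \\
 & A_m & r_{m1} & r_{m2} & \cdots & r_{mn} & R_m
\end{array}
\]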

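Each column of this table contains a permutation of the first m integers, and hence $\sum_{i=1}^{m} r_{ij} = m(m+1)/2$. Let these n column vectors be denoted by $r_j = (r_{1j}, \dots, r_{mj})'$, j = 1, 2, ..., n; then these n m-variate random variables are mutually independent. Furthermore we put $R_i = \sum_{j=1}^{n} r_{ij}$; evidently $\sum_{i=1}^{m} R_i = nm(m+1)/2$, and under the null hypothesis $H_0$ we obtain

\[ E(r_{ij}) = \frac{m+1}{2}, \qquad V(r_{ij}) = \frac{(m+1)(m-1)}{12}, \qquad \mathrm{Cov}(r_{ij}, r_{i'j}) = -\frac{m+1}{12} \quad (i \neq i'), \]

and hence

\[ E(R_i) = \frac{n(m+1)}{2}, \qquad V(R_i) = \frac{n(m+1)(m-1)}{12}, \qquad \mathrm{Cov}(R_i, R_{i'}) = -\frac{n(m+1)}{12} \quad (i \neq i'). \]

Now we consider the following linear model:

\[
r_j =
\begin{pmatrix}
1 & -1 \\
\vdots & \vdots \\
1 & -1 \\
1 & \frac{k}{m-k} \\
\vdots & \vdots \\
1 & \frac{k}{m-k}
\end{pmatrix}
\begin{pmatrix} \frac{m+1}{2} \\ \theta \end{pmatrix}
+
\begin{pmatrix}
\varepsilon_{1j} \\ \vdots \\ \varepsilon_{kj} \\ \varepsilon_{k+1,j} \\ \vdots \\ \varepsilon_{mj}
\end{pmatrix},
\qquad j = 1, 2, \dots, n, \tag{2.1}
\]

where the first k rows of the matrix correspond to $G_1$ and the last m-k rows to $G_2$, $\sum_{i=1}^{m} \varepsilon_{ij} = 0$, and $\varepsilon_j = (\varepsilon_{1j}, \dots, \varepsilon_{mj})'$ has the following mean vector and covariance matrix:

\[
E(\varepsilon_j) = 0, \qquad
V = \frac{m(m+1)}{12}
\begin{pmatrix}
\frac{m-1}{m} & -\frac{1}{m} & \cdots & -\frac{1}{m} \\
-\frac{1}{m} & \frac{m-1}{m} & \cdots & -\frac{1}{m} \\
\vdots & \vdots & \ddots & \vdots \\
-\frac{1}{m} & -\frac{1}{m} & \cdots & \frac{m-1}{m}
\end{pmatrix}. \tag{2.2}
\]

Now $\varepsilon_1, \dots, \varepsilon_n$ are mutually independent m-variate random variables with equal mean vector 0 and equal covariance matrix V as shown above, but normality is not required.

Next consider the random vector $R = \sum_{j=1}^{n} r_j$, that is,

\[ R = A\boldsymbol{\theta} + \varepsilon, \tag{2.3} \]

where $R = (R_1, \dots, R_m)'$, $\varepsilon = \sum_{j=1}^{n} \varepsilon_j$, and

\[
A =
\begin{pmatrix}
1 & -1 \\
\vdots & \vdots \\
1 & -1 \\
1 & \frac{k}{m-k} \\
\vdots & \vdots \\
1 & \frac{k}{m-k}
\end{pmatrix},
\qquad
\boldsymbol{\theta} = \begin{pmatrix} \frac{n(m+1)}{2} \\ n\theta \end{pmatrix}. \tag{2.4}
\]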
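The null moments quoted above can be checked by brute force. The following is a minimal sketch, not part of the original derivation: it assumes that under $H_0$ all m! permutations of a judge's ranks are equally likely, and the function name `null_moments` is introduced here only for illustration.

```python
from fractions import Fraction
from itertools import permutations

def null_moments(m):
    """Exact E(r_1j), V(r_1j) and Cov(r_1j, r_2j) when all m! rankings are equally likely."""
    perms = list(permutations(range(1, m + 1)))
    total = len(perms)
    mean = Fraction(sum(p[0] for p in perms), total)
    var = Fraction(sum(p[0] ** 2 for p in perms), total) - mean ** 2
    # By symmetry the mean of the second object equals that of the first.
    cov = Fraction(sum(p[0] * p[1] for p in perms), total) - mean ** 2
    return mean, var, cov

m = 6
mean, var, cov = null_moments(m)
# Compare with (m+1)/2, (m+1)(m-1)/12 and -(m+1)/12.
assert mean == Fraction(m + 1, 2)
assert var == Fraction((m + 1) * (m - 1), 12)
assert cov == Fraction(-(m + 1), 12)
print(mean, var, cov)
```

Because the n judges are independent, multiplying the variance and covariance by n gives the moments of the rank sums $R_i$.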

Now it should be noted that when n tends to infinity, $\varepsilon$ has a degenerate normal distribution with mean vector 0 and covariance matrix $\Sigma = nV$; that is, when n is large, R is approximately distributed as a degenerate normal distribution with mean vector $A\boldsymbol{\theta}$ and covariance matrix $\Sigma$.

Our problem is to test the null hypothesis $H_0$: $\theta = 0$ against the alternative hypothesis $H_2$: $\theta \neq 0$ by using analysis of variance.

To decompose the quadratic form $R'R$, let us define the following notations:

$L^m$: linear space of dimension m,
$L(A)$: linear subspace of dimension s generated by the column vectors of the m×s matrix A,
$L_A^{\perp}$: orthogonal complement of the linear subspace $L(A)$ in the m-dimensional linear space $L^m$,
$L_B^{\perp}(A)$: orthogonal complement of the linear subspace $L(B)$ in the linear subspace $L(A)$ when $L(A) \supset L(B)$ for two matrices,
$e$: matrix of a projection operator; it has the following properties: $e' = e$, $e^2 = e$ (symmetric idempotent).

Now under the condition $\sum_{i=1}^{m} R_i = nm(m+1)/2$, the vector R generates an (m-1)-dimensional hyperplane in the m-dimensional linear space $L^m$. Let $L(J)$ denote the subspace generated by the vector $J = (1, \dots, 1)'$; then $L(J)$ is orthogonal to this hyperplane. In the geometrical sense, $L(J)$ is a perpendicular from the origin in $L^m$ to the hyperplane $\sum_{i=1}^{m} R_i = nm(m+1)/2$; hence this hyperplane is denoted by $L_J^{\perp}$. We have the direct sum decomposition of $L^m$ as follows:

\[ L^m = L(J) \oplus L_J^{\perp}. \tag{2.5} \]

Here let $E_{mm}$ denote an m×m matrix with all components 1; then the matrix $e_1 = (1/m)E_{mm}$ is the projection operator from R to $L(J)$, and $(R'e_1R)^{1/2}$ is the distance from the origin in $L^m$ to $L_J^{\perp}$. Therefore we may consider only the decomposition of $R'(I_m - e_1)R$ ($I_m$: m×m identity matrix) in the (m-1)-dimensional subspace $L_J^{\perp}$. In this case, $I_m - e_1$ is the projection operator from R to $L_J^{\perp}$.

On the other hand, $L^m$ is also decomposed as follows:

\[ L^m = L(A) \oplus L_A^{\perp} = L(J) \oplus L_J^{\perp}(A) \oplus L_A^{\perp}; \tag{2.6} \]

then we have from (2.5) and (2.6),

\[ L_J^{\perp} = L_J^{\perp}(A) \oplus L_A^{\perp}, \tag{2.7} \]

and it can be shown that

\[ e_2 = A(A'A)^{-1}A' - \frac{1}{m}E_{mm}, \qquad e_0 = I_m - A(A'A)^{-1}A' \tag{2.8} \]

are the projection operators from R to $L_J^{\perp}(A)$ and $L_A^{\perp}$ respectively. In particular, it should be noted that the following relation exists among the projection operators:

\[ I_m - e_1 = e_2 + e_0, \tag{2.9} \]

and these are shown as follows:

\[
I_m - e_1 =
\begin{pmatrix}
\frac{m-1}{m} & -\frac{1}{m} & \cdots & -\frac{1}{m} \\
-\frac{1}{m} & \frac{m-1}{m} & \cdots & -\frac{1}{m} \\
\vdots & \vdots & \ddots & \vdots \\
-\frac{1}{m} & -\frac{1}{m} & \cdots & \frac{m-1}{m}
\end{pmatrix}, \tag{2.10}
\]

\[
e_2 =
\begin{pmatrix}
\frac{m-k}{km} E_{k,k} & -\frac{1}{m} E_{k,m-k} \\
-\frac{1}{m} E_{m-k,k} & \frac{k}{m(m-k)} E_{m-k,m-k}
\end{pmatrix},
\qquad
e_0 =
\begin{pmatrix}
I_k - \frac{1}{k} E_{k,k} & 0 \\
0 & I_{m-k} - \frac{1}{m-k} E_{m-k,m-k}
\end{pmatrix},
\]

where $E_{p,q}$ denotes the p×q matrix with all components 1. Thus the upper-left block of $e_0$ has diagonal elements $(k-1)/k$ and off-diagonal elements $-1/k$, and the lower-right block has diagonal elements $(m-k-1)/(m-k)$ and off-diagonal elements $-1/(m-k)$.
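The block forms of $e_2$ and $e_0$ can be verified numerically. The sketch below is illustrative only: it uses exact fractions, writes `a` for the second column of A (a name introduced here), and relies on the fact that J and `a` are orthogonal, so that $A(A'A)^{-1}A' = JJ'/m + aa'/(a'a)$. The helper names `projections`, `matmul` and `trace` are hypothetical.

```python
from fractions import Fraction

def projections(m, k):
    """Return e1, e2, e0 of (2.8)-(2.10) as exact m x m matrices (lists of rows)."""
    # Second column of A: -1 for the k objects of G1, k/(m-k) for those of G2.
    a = [Fraction(-1)] * k + [Fraction(k, m - k)] * (m - k)
    aa = sum(x * x for x in a)  # a'a = km/(m-k)
    e1 = [[Fraction(1, m) for _ in range(m)] for _ in range(m)]
    # J and a are orthogonal, hence e2 = A(A'A)^{-1}A' - e1 = aa'/(a'a).
    e2 = [[a[i] * a[j] / aa for j in range(m)] for i in range(m)]
    e0 = [[Fraction(int(i == j)) - e1[i][j] - e2[i][j] for j in range(m)]
          for i in range(m)]
    return e1, e2, e0

def matmul(X, Y):
    """Plain matrix product for square matrices stored as lists of rows."""
    size = len(X)
    return [[sum(X[i][t] * Y[t][j] for t in range(size)) for j in range(size)]
            for i in range(size)]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

m, k = 6, 2
e1, e2, e0 = projections(m, k)
zero = [[Fraction(0)] * m for _ in range(m)]
assert matmul(e2, e2) == e2 and matmul(e0, e0) == e0   # idempotent
assert matmul(e2, e0) == zero                           # mutually orthogonal
assert trace(e2) == 1 and trace(e0) == m - 2            # ranks 1 and m-2
print(e2[0][0], e2[0][m - 1], e2[m - 1][m - 1])
```

For m = 6, k = 2 the three printed entries are the block values $(m-k)/(km)$, $-1/m$ and $k/(m(m-k))$ of $e_2$.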

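Furthermore these operators are represented with respect to their characteristic roots in the following way:

\[
I_m - e_1 = P \,\mathrm{diag}(0, 1, 1, \dots, 1)\, P', \qquad
e_2 = P \,\mathrm{diag}(0, 1, 0, \dots, 0)\, P', \qquad
e_0 = P \,\mathrm{diag}(0, 0, 1, \dots, 1)\, P', \tag{2.11}
\]

where P is an orthogonal matrix of the following form:

\[ P = (p_1, p_2, p_3, \dots, p_m). \tag{2.12} \]

Here the first column $p_1$ has all elements equal to $1/\sqrt{m}$; the second column $p_2$ has elements $-\sqrt{(m-k)/(km)}$ in the first k rows and $\sqrt{k/(m(m-k))}$ in the last m-k rows; and the remaining m-2 columns $p_3, \dots, p_m$ form an orthonormal basis of $L_A^{\perp}$, consisting of k-1 contrasts among the objects of $G_1$ (with zeros in the rows of $G_2$) and m-k-1 contrasts among the objects of $G_2$ (with zeros in the rows of $G_1$).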

Thus from the above, we can obtain the decomposition of the sum of squares for the analysis of variance:

\[ R'(I_m - e_1)R = R'e_2R + R'e_0R, \tag{2.13} \]

where

\[
\begin{aligned}
R'(I_m - e_1)R &= \sum_{i=1}^{m} (R_i - \bar{R})^2, \\
R'e_2R &= \frac{k(m-k)}{m} \left( R^{(1)} - R^{(2)} \right)^2, \\
R'e_0R &= \sum_{i=1}^{k} (R_i - R^{(1)})^2 + \sum_{i=k+1}^{m} (R_i - R^{(2)})^2,
\end{aligned} \tag{2.14}
\]

\[ \bar{R} = \frac{1}{m}\sum_{i=1}^{m} R_i, \qquad R^{(1)} = \frac{1}{k}\sum_{i=1}^{k} R_i, \qquad R^{(2)} = \frac{1}{m-k}\sum_{i=k+1}^{m} R_i. \]

It should be noted that (2.13) shows the decomposition of Friedman's $\chi_r^2 = [12/(nm(m+1))]\, R'(I_m - e_1)R$.

As mentioned before, when n is large, R is approximately distributed as a degenerate normal distribution with mean vector $A\boldsymbol{\theta}$ and covariance matrix

\[ \Sigma = \frac{nm(m+1)}{12} (I_m - e_1). \]

If we put

\[ B_1 = \frac{12}{nm(m+1)} (I_m - e_1), \qquad B_2 = \frac{12}{nm(m+1)}\, e_2, \qquad B_3 = \frac{12}{nm(m+1)}\, e_0, \tag{2.15} \]

then evidently we have

\[ B_1\Sigma = I_m - e_1, \qquad B_2\Sigma = e_2, \qquad B_3\Sigma = e_0. \tag{2.16} \]

Therefore we can see that (i) $B_1\Sigma$, $B_2\Sigma$, $B_3\Sigma$ are symmetric idempotent with tr $B_1\Sigma = m-1$, tr $B_2\Sigma = 1$, tr $B_3\Sigma = m-2$ respectively, and hence when n is large, the quadratic forms $R'B_1R$, $R'B_2R$, $R'B_3R$ are approximately distributed as chi-square with df = m-1, df = 1, df = m-2 respectively, and (ii) $B_2\Sigma B_3 = 0$, hence the two quadratic forms $R'B_2R$ and $R'B_3R$ are statistically independent of each other. Hence we have the analysis of variance table as shown in Table 3.
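The non-centrality parameter of $R'B_2R$ under $H_2$ can be worked out in a few lines. The derivation below is a sketch added for illustration; the symbol $a$ for the second column of A is introduced here and does not appear in the original.

```latex
\begin{aligned}
a &= (\underbrace{-1,\dots,-1}_{k},\ \underbrace{\tfrac{k}{m-k},\dots,\tfrac{k}{m-k}}_{m-k})',
\qquad a'J = -k + k = 0,
\qquad a'a = k + \frac{k^2}{m-k} = \frac{km}{m-k}, \\
e_2 &= A(A'A)^{-1}A' - \frac{1}{m}JJ' = \frac{aa'}{a'a},
\qquad E(R) = \frac{n(m+1)}{2}\,J + n\theta\,a, \\
E(R)'\,B_2\,E(R) &= \frac{12}{nm(m+1)}\,(n\theta)^2\,a'a
 = \frac{12\,n\,k}{(m+1)(m-k)}\,\theta^2 .
\end{aligned}
```

Since $e_0 J = 0$ and $e_0 a = 0$, the corresponding quantity for $R'B_3R$ is zero under both hypotheses.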

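Table 3

\[
\begin{array}{c|c|c|c|c|cc}
 & & & & & \text{Non-centrality parameter} & \\
\text{Space} & \text{d.f.} & \text{S.S.} & \text{m.S.S.} & F & H_0 & H_2 \\ \hline
L_J^{\perp}(A) & 1 & R'e_2R & R'e_2R & \dfrac{R'e_2R}{R'e_0R/(m-2)} & 0 & \dfrac{12nk}{(m+1)(m-k)}\,\theta^2 \\
L_A^{\perp} & m-2 & R'e_0R & \dfrac{R'e_0R}{m-2} & & 0 & 0 \\ \hline
L_J^{\perp} & m-1 & R'(I_m - e_1)R & & & &
\end{array}
\]

By this table we know that the ratio F can be used to test the hypothesis $H_0$: $\theta = 0$ against the alternative hypothesis $H_2$: $\theta \neq 0$. Thus we obtain the following test statistic:

\[
F_r = \frac{k(m-k)(m-2)}{m} \cdot
\frac{\left( R^{(1)} - R^{(2)} \right)^2}
{\sum_{i=1}^{k} (R_i - R^{(1)})^2 + \sum_{i=k+1}^{m} (R_i - R^{(2)})^2}, \tag{2.17}
\]

where

\[ R^{(1)} = \frac{1}{k}\sum_{i=1}^{k} R_i, \qquad R^{(2)} = \frac{1}{m-k}\sum_{i=k+1}^{m} R_i. \]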
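Formula (2.17) and the decomposition (2.13) are easy to evaluate from the rank sums alone. The following is a minimal sketch using the rank sums of the example in Section 1 (30, 32, 45, 48, 47, 50 with n = 12, k = 2); the function names `friedman_chi2` and `rank_sum_F` are introduced here for illustration, and the objects of $G_1$ are assumed to be listed first.

```python
def friedman_chi2(R, n):
    """Friedman's chi_r^2 = 12 / (n m (m+1)) * sum (R_i - Rbar)^2."""
    m = len(R)
    mean = sum(R) / m
    return 12.0 / (n * m * (m + 1)) * sum((r - mean) ** 2 for r in R)

def rank_sum_F(R, k):
    """Return (R'e2R, R'e0R, F_r) for rank sums R whose first k entries belong to G1."""
    m = len(R)
    g1, g2 = R[:k], R[k:]
    r1 = sum(g1) / k            # R^(1), mean rank sum in G1
    r2 = sum(g2) / (m - k)      # R^(2), mean rank sum in G2
    between = k * (m - k) / m * (r1 - r2) ** 2
    within = sum((r - r1) ** 2 for r in g1) + sum((r - r2) ** 2 for r in g2)
    return between, within, (m - 2) * between / within

R = [30, 32, 45, 48, 47, 50]
between, within, F_r = rank_sum_F(R, k=2)
total = sum((r - sum(R) / len(R)) ** 2 for r in R)
# Decomposition (2.13): total = between + within (378 = 363 + 15 here).
assert abs(total - (between + within)) < 1e-9
print(friedman_chi2(R, n=12), between, within, F_r)
```

With these data the Friedman statistic is 9.0 on 5 degrees of freedom, whereas $F_r$ is about 96.8 on (1, 4) degrees of freedom; the sketch returns inf-free results only when the within-group sum of squares is non-zero.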

‘＝1 i＝k十1 When the number of judges（the siZe of n）is not too Small， the test StatiStic F． is distr丑）uted approximately as F−distribution with df＝（1， m−2）． 3． Note on the Test Statistic Fr ．（のSummaiツof Procedureプbr」Fr． When rankhlg data are gwen m a twoエway table of m rows（objects）and n columns（judges）， these are the steps in the use of Fr：（1）Determine the sum of the ranks in each row：、R，（2）Compute the value of Fr， using fbrmula（2．17）．（3）The significance of the observed value of Fr may be dCtermined by reference to the table of F−diStril）ution， s垣ce Fr iS diStril）uteζ1 apProxirnately as F−distril）u− tion with df＝（1， m−2）． If the level of signi丘cance has been predetermined at dand the observed value of Fr equals or exceeds the selected value F”−21（α） in the table， where Fm＿21（α）is chosen so that the test has sizeα， then we may reject Ho at the levelα・ For example， in．such cases as shown in Table l we have m＝6，〃＝12， k＝2，、

禽儂1こ；9：61iiiii

(8)

62 R．KANNO

Rω＝（30十32）／2＝31， R《2｝＝（45十48十47十50）／4＝47．5，（R（1｝−R（2｝）2＝（31−47．5）2＝272．25，カ Σ（R《rR（・））2−（30−31）2＋（43−31）2＝2， ‘＝1 ’ が Σ（・R《一R（2｝）2＝（45−47．5）2＋（48−47．5）2＋（47−47．5）2＋（50−47．5）2＝13． ‘＝力＋l We may compute the value of Fr by substituting these values in． fbrmula（2．17）， Fr−2（6−2（6−2）・驚一96・・〉瓦・（・．・1）−21．・・． Thus we may r｛）ject the null hypothesis that the ranks are assigned at random by each of tbe j．udges at the O・011evel of sign姐can㏄and condude that there exists a sign血cant difference between the two groups Gl and G2．（b） Sma〃S醐ple Distribution． In the previous s㏄tion， we s㏄that when n is not too small the test stat誌tic君is distributed approximately as 1乙d誌tribution with df＝（1， m−2）．耳owever， when n is too small the reliab五ity of this approx㎞ation may be doubtf過． From this point we attempted to compute the exact distributions of Fr fbr〃1＝4，〃＝2， k＝1，2；〃2＝4，〃＝3， k＝1，2；〃1＝4， π＝4， k＝1 by using the IBM 1620． Table 4 presents a compariSon of the probab皿ties yielded by F−test and the exact d飴tr劃）ution of Fr． ’ （e） Treatment qプTies．「When there are sets of ties in the rankingS， each of them is assigned to the average of the ranks which they would possess if they were distin− guiShable． This iS sometime known as the“mid−rank method．”Now we have to consider the effect of tied ranks on the statistic Fr． It is clear that when ties are pr『sent the means． of Ri（i＝1，…， t7i）remain the same as untied rankhlg but the varian㏄s and covarian㏄s of R‘（i＝1，…，〃1）require mod述cation． After some calcUlations， it canもe ver近ed that with the corr㏄tion of t輌es inCor− porated， the variances and covarian㏄s of R5（i＝1，・∵， m）are （3．1）

γ㈹一耐

P㍗一1）−iltZT，

T C・v（R…R・）一」（

X；1）＋m（mと1）ΣT・

T 「1 翌??窒?@summationΣtakes place over the various rankings andτis expressed by T

（…） τ一☆Σ（’・一・，

ま wh訂e’denotes the number of observ就ions in a set of tied品r a given ranking

and summationΣtakes place over the various ties within any one of theπ

t エankingS． Therefore if ties are present in the r孤㎞gs， R誌approximate取diStributed as degenerate normal diStr迦tion with mean vector Ae and covariance mat山：

Table 4. Exact distribution of $F_r$ compared with the F-approximation, df = (1, m-2). Values of λ in square brackets are illegible in the source and have been restored from (2.17); "?" marks a probability that is illegible in the source. Cells left blank are blank in the source.

(a) m = 4, n = 2, k = 1

λ              ∞         16        12        [4]
P(F_r = λ)     .020833   .041666   .041666   .145833
P(F_r ≥ λ)               .062499   .104165   .249998
P(F ≥ λ)                 .0626     .0826     ?

(b) m = 4, n = 2, k = 2

λ              ∞         18        16        8         4
P(F_r = λ)     .027777   .027777   .027777   .013888   .055555
P(F_r ≥ λ)               .055554   .083331   .097219   .152774
P(F ≥ λ)                 .0526     .0626     .1027     ?

(c) m = 4, n = 3, k = 1

λ              ∞         49        27        [25]      12.25     9
P(F_r = λ)     .013888   .013020   .013020   .023437   .010416   .005208
P(F_r ≥ λ)               .026908   .039928   .063365   .073782   .078990
P(F ≥ λ)                 .0224     .0394     .0419     .0814     .0976

(d) m = 4, n = 3, k = 2

λ              ∞         72        32        [25]      14.4      9
P(F_r = λ)     .024305   .005208   .017361   .006944   .003472   .034722
P(F_r ≥ λ)               .029513   .046874   .053818   .057290   .092012
P(F ≥ λ)                 .0166     .0331     .0419     .0706     .0976

(e) m = 4, n = 4, k = 1

λ              ∞         100       64        [48]      25        16        14.286    12
P(F_r = λ)     .007486   .003906   .008680   .002170   .003472   .034502   .005786   .032478
P(F_r ≥ λ)               .011392   .020072   .022242   .025714   .060216   .066002   .098480
P(F ≥ λ)                 .0096     .0186     .0226     .0419     .0626     .0712     .0826
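The exact distributions in Table 4 were obtained on the IBM 1620; the original program is not given. The sketch below shows one way such a distribution could be enumerated today, under the assumption that each judge's m! rankings are equally likely under $H_0$. The function name `exact_distribution` and the labels "inf" and "undefined" (for the 0/0 case in which all rank sums coincide) are choices made here, not the author's.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations, product

def exact_distribution(m, n, k):
    """Exact null distribution of F_r of (2.17) over all (m!)^n rank tables."""
    perms = list(permutations(range(1, m + 1)))
    counts = Counter()
    for table in product(perms, repeat=n):      # one permutation per judge
        R = [sum(col[i] for col in table) for i in range(m)]
        r1 = Fraction(sum(R[:k]), k)             # R^(1)
        r2 = Fraction(sum(R[k:]), m - k)         # R^(2)
        between = Fraction(k * (m - k), m) * (r1 - r2) ** 2
        within = (sum((r - r1) ** 2 for r in R[:k])
                  + sum((r - r2) ** 2 for r in R[k:]))
        if within == 0:
            key = "inf" if between > 0 else "undefined"
        else:
            key = (m - 2) * between / within
        counts[key] += 1
    total = len(perms) ** n
    return {key: Fraction(c, total) for key, c in counts.items()}

dist = exact_distribution(m=4, n=2, k=2)
for lam in ("inf", Fraction(18), Fraction(16), Fraction(8), Fraction(4)):
    print(lam, dist[lam], float(dist[lam]))
```

For m = 4, n = 2, k = 2 the enumeration runs over 576 tables and gives probabilities 1/36, 1/36, 1/36, 1/72 and 1/18 for λ = ∞, 18, 16, 8 and 4, in agreement with the point probabilities of panel (b). The cost grows as $(m!)^n$, so this direct approach is practical only for small m and n.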


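\[ \Sigma^* = \left[ \frac{nm(m+1)}{12} - \frac{1}{m-1} \sum_{T} T \right] (I_m - e_1). \tag{3.3} \]

If we here put

\[
\begin{aligned}
C_1 &= \left[ \frac{nm(m+1)}{12} - \frac{1}{m-1} \sum_{T} T \right]^{-1} (I_m - e_1), \\
C_2 &= \left[ \frac{nm(m+1)}{12} - \frac{1}{m-1} \sum_{T} T \right]^{-1} e_2, \\
C_3 &= \left[ \frac{nm(m+1)}{12} - \frac{1}{m-1} \sum_{T} T \right]^{-1} e_0,
\end{aligned} \tag{3.4}
\]

then we have

\[ C_1\Sigma^* = I_m - e_1, \qquad C_2\Sigma^* = e_2, \qquad C_3\Sigma^* = e_0. \tag{3.5} \]

Thus we can see that (i) $C_1\Sigma^*$, $C_2\Sigma^*$, $C_3\Sigma^*$ are symmetric idempotent with tr $C_1\Sigma^* = m-1$, tr $C_2\Sigma^* = 1$, tr $C_3\Sigma^* = m-2$ respectively, so that when n is large, the quadratic forms $R'C_1R$, $R'C_2R$, $R'C_3R$ are approximately distributed as chi-square with df = m-1, df = 1, df = m-2 respectively, and (ii) $C_2\Sigma^*C_3 = 0$, hence the two quadratic forms $R'C_2R$ and $R'C_3R$ are statistically independent of each other. Therefore we have the same analysis of variance table as for untied rankings. Thus the test statistic $F_r$ may be used even though the rankings contain ties.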
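The tie correction can be illustrated on a single ranking. The sketch below is an assumption-labelled example rather than the author's computation: `midranks`, `tie_term` and `corrected_scale` are names introduced here, and the scores passed in are made-up illustrative values. It computes mid-ranks, the quantity T of (3.2), and the scalar factor of $\Sigma^*$ in (3.3), and checks the variance in (3.1) for n = 1.

```python
from collections import Counter
from fractions import Fraction

def midranks(scores):
    """Mid-rank method: tied scores share the average of the ranks they would occupy."""
    first = {}                                   # first 1-based position of each value
    for pos, s in enumerate(sorted(scores), start=1):
        first.setdefault(s, pos)
    count = Counter(scores)
    # A tie group starting at position f with c members averages f + (c - 1)/2.
    return [Fraction(2 * first[s] + count[s] - 1, 2) for s in scores]

def tie_term(scores):
    """T = (1/12) * sum over tie groups of (t^3 - t), as in (3.2)."""
    return Fraction(sum(t ** 3 - t for t in Counter(scores).values()), 12)

def corrected_scale(m, n, T_values):
    """Scalar factor of Sigma* in (3.3): nm(m+1)/12 - sum(T)/(m-1)."""
    return Fraction(n * m * (m + 1), 12) - sum(T_values, Fraction(0)) / (m - 1)

scores = [10, 20, 20, 40]                        # hypothetical judge with one tie of size 2
ranks = midranks(scores)                         # mid-ranks 1, 5/2, 5/2, 4
T = tie_term(scores)
m = len(scores)
mean = sum(ranks) / m
variance = sum((r - mean) ** 2 for r in ranks) / m   # variance of one randomly placed mid-rank
# (3.1) with n = 1: V = (m^2 - 1)/12 - T/m, which equals scale * (m - 1)/m.
assert variance == Fraction(m * m - 1, 12) - T / m
assert variance == corrected_scale(m, 1, [T]) * Fraction(m - 1, m)
print(ranks, T, variance)
```

Without ties every T is zero and the factor reduces to $nm(m+1)/12$, the factor of $\Sigma$ used in Section 2.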

Acknowledgements

The author wishes to express his deepest appreciation to Professor Dr. Yosiro Tumura for making this study possible and for his guidance and assistance in preparing this paper.

REFERENCES

[1] Friedman, M. (1937): The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J. Amer. Statist. Assoc., 32, 675-701.
[2] Jonckheere, A. R. (1954): A test of significance for the relation between m rankings and k ranked categories. Brit. J. Statist. Psych., 7, 93-100.
[3] Kendall, M. G. and Smith, B. B. (1939): The problem of m rankings. Ann. Math. Statist., 10, 275-287.
[4] Page, E. B. (1963): Ordered hypotheses for multiple treatments: a significance test for linear ranks. J. Amer. Statist. Assoc., 58, 216-230.
[5] Thompson, W. A. and Willke, T. A. (1963): On an extreme rank sum test for outliers. Biometrika, 50, 375-383.
