Asymptotic Expansion of the Distribution of LR Statistics for Testing the Intraclass Correlation under the Growth-Curve Model in High-Dimension and its Error Bound
Mathematics Course, KATO Naohiro (加藤 直広)
1 Introduction
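Let $\mathbf{x}$ be a $p$-dimensional random vector distributed as a $p$-variate normal distribution with an unknown mean vector $\boldsymbol{\mu}$ and an unknown covariance matrix $\Sigma$, denoted $\mathbf{x} \sim N_p(\boldsymbol{\mu}, \Sigma)$. Suppose that $\mathbf{x}_1, \ldots, \mathbf{x}_N$ is a sample of $N$ $(= n+1,\ n \ge p)$ independent observation vectors on $\mathbf{x}$. Consider the problem of testing the null hypothesis
$$H: \Sigma = \Sigma_I = \sigma^2\{(1-\rho)I + \rho\,\mathbf{1}\mathbf{1}'\} \tag{1}$$
against all alternatives, where $\mathbf{1} = (1, 1, \ldots, 1)'$; $\sigma^2$ and $\rho$ are parameters which satisfy $\sigma^2 > 0$ and $-1/(p-1) < \rho < 1$. The covariance structure (1) is known as the intraclass correlation model. The likelihood ratio criterion $\lambda$ for testing the hypothesis (1) is given by
$$\lambda = \left\{\frac{(p-1)^{p-1}|S|}{u\,v^{p-1}}\right\}^{N/2},$$
where
$$u = \frac{1}{p}\mathbf{1}'S\mathbf{1}, \qquad v = \mathrm{tr}(S) - u.$$
Here, $S$ denotes the sample covariance matrix based on $\mathbf{x}_1, \ldots, \mathbf{x}_N$.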
For large $N$ and fixed $p$, a Box-type asymptotic expansion of the null distribution of $\lambda^* = \lambda^{2/N}$ is available (see, e.g., Siotani et al., 1985). In this paper, we derive the asymptotic null distribution of $\lambda^*$ under the following high-dimensional framework:
$$A:\ p \to \infty,\quad N \to \infty,\quad \frac{p}{N} \to c \in (0,1).$$
Numerical simulations are carried out to demonstrate the accuracy of our approximation. Furthermore, we obtain a computable error bound between the exact distribution and the asymptotic distribution.
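As an illustration of the statistic defined above, the following minimal sketch evaluates $\log\lambda^* = (p-1)\log(p-1) + \log|S| - \log u - (p-1)\log v$ for a given matrix $S$. The function names `log_det_spd` and `log_lambda_star` are hypothetical helpers introduced only for this example, and $S$ is assumed to be symmetric positive definite with $p \ge 2$.

```python
import math

def log_det_spd(a):
    """Log-determinant of a symmetric positive definite matrix via Cholesky."""
    p = len(a)
    low = [[0.0] * p for _ in range(p)]
    log_det = 0.0
    for i in range(p):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][i] = math.sqrt(s)
                log_det += 2.0 * math.log(low[i][i])
            else:
                low[i][j] = s / low[j][j]
    return log_det

def log_lambda_star(s_mat):
    """log(lambda*) with lambda* = (p-1)^(p-1) |S| / (u v^(p-1))."""
    p = len(s_mat)
    u = sum(sum(row) for row in s_mat) / p           # u = 1'S1 / p
    v = sum(s_mat[i][i] for i in range(p)) - u       # v = tr(S) - u
    return ((p - 1) * math.log(p - 1) + log_det_spd(s_mat)
            - math.log(u) - (p - 1) * math.log(v))
```

Because $\lambda^*$ is invariant under rescaling of $S$, the same value is obtained from the matrix of sums of squares and products. When $S$ has exactly the intraclass structure (1), $\lambda^* = 1$, so the function returns zero up to rounding error.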
2 Main Result
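In this section, we derive the asymptotic null distribution of $\lambda^*$ under framework A. The $h$-th moment of $\lambda^*$ is given by
$$E(\lambda^{*h}) = K (p-1)^{(p-1)h}\, \frac{\prod_{j=1}^{p-1}\Gamma\!\left[\frac{n}{2} + h - \frac{j}{2}\right]}{\Gamma\!\left[(p-1)\left(\frac{n}{2} + h\right)\right]},$$
where $K = \Gamma\!\left[\frac{n}{2}(p-1)\right] \big/ \prod_{j=1}^{p-1}\Gamma\!\left[\frac{n}{2} - \frac{j}{2}\right]$. The cumulant-generating function of $\log\lambda^*$ is expanded as follows:
$$\log E[\exp(it\log\lambda^*)] = it\,\mu_{n,p} + \frac{1}{2}(it)^2\sigma^2_{n,p} + \sum_{r=3}^{\infty}\frac{1}{r!}(it)^r\gamma_{r,n,p}, \tag{2}$$
where
$$\mu_{n,p} = (p-1)\log(p-1) + \psi_{(p-1)}\!\left(\frac{n}{2} - \frac{1}{2}\right) - \psi\!\left(\frac{n}{2}(p-1)\right)(p-1),$$
$$\sigma^2_{n,p} = \psi'_{(p-1)}\!\left(\frac{n}{2} - \frac{1}{2}\right) - \psi'\!\left(\frac{n}{2}(p-1)\right)(p-1)^2,$$
$$\gamma_{r,n,p} = \psi^{(r-1)}_{(p-1)}\!\left(\frac{n}{2} - \frac{1}{2}\right) - \psi^{(r-1)}\!\left(\frac{n}{2}(p-1)\right)(p-1)^r.$$
Here, $\psi$ is the digamma function defined by $\psi(z) = (d/dz)\log\Gamma(z)$, $\psi^{(r)}$ denotes its $r$-th derivative, and $\psi_{(p-1)}(a) = \sum_{j=1}^{p-1}\psi\!\left(a - \frac{1}{2}(j-1)\right)$, with $\psi^{(r)}_{(p-1)}$ defined in the same way from $\psi^{(r)}$. Let
$$z_{n,p} = \frac{\log\lambda^* - \mu_{n,p}}{(\sigma^2_{n,p})^{1/2}}.$$
From (2), the $r$-th cumulant of $z_{n,p}$ for $r \ge 3$ can be expressed as $\gamma_{r,n,p}/(\sigma^2_{n,p})^{r/2}$. The characteristic function of $z_{n,p}$ can be expressed as follows.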
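$$E[\exp(itz_{n,p})] = \exp\!\left(-\frac{t^2}{2}\right)\left\{1 + \sum_{k=1}^{\infty}\frac{(it)^{3k}}{k!}\sum_{j=0}^{\infty}\gamma_{k,j,n,p}\,(it)^j\right\},$$
where
$$\gamma_{k,j,n,p} = \sum_{j_1+\cdots+j_k=j}\frac{\gamma_{j_1+3,n,p}\cdots\gamma_{j_k+3,n,p}}{(j_1+3)!\cdots(j_k+3)!\;\sigma_{n,p}^{\,j+3k}}.$$
Let
$$\phi_s(t) = \exp\!\left(-\frac{t^2}{2}\right)\left\{1 + \sum_{k=1}^{s}\frac{(it)^{3k}}{k!}\sum_{j=0}^{s-k}\gamma_{k,j,n,p}\,(it)^j\right\}. \tag{3}$$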
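The quantities $\mu_{n,p}$, $\sigma^2_{n,p}$ and $\gamma_{r,n,p}$ in (2) can be evaluated numerically. The sketch below is one possible way to do so using only the Python standard library; it is not taken from the paper. The helper `polygamma` (a recurrence followed by the usual asymptotic series with Bernoulli numbers) and the function `cumulant` are hypothetical names introduced here, and the shift threshold of 15 is an assumption chosen for double-precision accuracy at small orders $k$.

```python
import math

# Bernoulli numbers B_2, B_4, ..., B_10 used in the asymptotic series.
_BERNOULLI = [1 / 6, -1 / 30, 1 / 42, -1 / 30, 5 / 66]

def polygamma(k, x):
    """psi^{(k)}(x) for x > 0 (k = 0 is the digamma function)."""
    sign = -1.0 if k % 2 == 0 else 1.0          # (-1)^(k+1)
    shift = 0.0
    while x < 15.0:
        # Recurrence: psi^{(k)}(x) = psi^{(k)}(x+1) + (-1)^(k+1) k! / x^(k+1)
        shift += sign * math.factorial(k) / x ** (k + 1)
        x += 1.0
    if k == 0:
        tail = math.log(x) - 0.5 / x
        for i, b in enumerate(_BERNOULLI, start=1):
            tail -= b / (2 * i * x ** (2 * i))
        return shift + tail
    tail = math.factorial(k - 1) / x ** k + math.factorial(k) / (2.0 * x ** (k + 1))
    for i, b in enumerate(_BERNOULLI, start=1):
        tail += (b * math.factorial(2 * i + k - 1)
                 / (math.factorial(2 * i) * x ** (2 * i + k)))
    return shift + sign * tail

def cumulant(r, n, p):
    """r-th cumulant of log(lambda*): r=1 gives mu, r=2 gives sigma^2,
    r>=3 gives gamma_r. Requires n >= p >= 2."""
    a = 0.5 * n - 0.5
    summed = sum(polygamma(r - 1, a - 0.5 * (j - 1)) for j in range(1, p))
    value = summed - (p - 1) ** r * polygamma(r - 1, 0.5 * n * (p - 1))
    if r == 1:
        value += (p - 1) * math.log(p - 1)
    return value
```

For example, `cumulant(3, n, p) / cumulant(2, n, p) ** 1.5` gives the third cumulant of $z_{n,p}$, which is the quantity entering the leading correction term of (3).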
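Inverting (3), we obtain the Edgeworth expansion of $z_{n,p}$ up to the order $O(m^{-s})$ as
$$\Phi_s(x) = \Phi(x) - \phi(x)\sum_{k=1}^{s}\frac{1}{k!}\sum_{j=0}^{s-k}\gamma_{k,j,n,p}\,h_{3k+j-1}(x),$$
where $\Phi$ and $\phi$ are the distribution function of the standard normal distribution and its derivative (the density function), respectively; $h_r(x)$ denotes the $r$-th order Hermite polynomial defined by
$$\left(\frac{d}{dx}\right)^{r}\exp\!\left(-\frac{x^2}{2}\right) = (-1)^r h_r(x)\exp\!\left(-\frac{x^2}{2}\right).$$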
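The expansion $\Phi_s(x)$ can be evaluated once the standardized cumulants $\gamma_{r,n,p}/\sigma_{n,p}^{r}$ are known. The following is a minimal sketch under that assumption: `kappa` is a hypothetical dictionary mapping $r$ to $\gamma_{r,n,p}/\sigma_{n,p}^{r}$ for $r = 3, \ldots, s+2$, and the function names are introduced only for illustration. The coefficients $\gamma_{k,j,n,p}$ are obtained as the coefficients of the $k$-th power of the series $\sum_j \kappa_{j+3}\,y^j/(j+3)!$, which is equivalent to the sum over $j_1 + \cdots + j_k = j$.

```python
import math

def hermite(r, x):
    """Hermite polynomial h_r(x): h_0 = 1, h_1 = x, h_{r+1} = x h_r - r h_{r-1}."""
    h_prev, h = 1.0, x
    if r == 0:
        return h_prev
    for i in range(1, r):
        h_prev, h = h, x * h - i * h_prev
    return h

def gamma_kj(k, j_max, kappa):
    """Coefficients gamma_{k,0}, ..., gamma_{k,j_max} as a list."""
    a = [kappa[j + 3] / math.factorial(j + 3) for j in range(j_max + 1)]
    coef = [1.0] + [0.0] * j_max
    for _ in range(k):
        # Truncated polynomial product coef(y) * a(y).
        coef = [sum(coef[i] * a[j - i] for i in range(j + 1))
                for j in range(j_max + 1)]
    return coef

def edgeworth_cdf(x, s, kappa):
    """Phi_s(x) = Phi(x) - phi(x) sum_k (1/k!) sum_j gamma_{k,j} h_{3k+j-1}(x)."""
    phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    correction = 0.0
    for k in range(1, s + 1):
        g = gamma_kj(k, s - k, kappa)
        inner = sum(g[j] * hermite(3 * k + j - 1, x) for j in range(s - k + 1))
        correction += inner / math.factorial(k)
    return cdf - phi * correction
```

With $s = 0$ the function reduces to the normal distribution function $\Phi(x)$; with $s = 1$ only the third standardized cumulant is needed, and the correction term is $-\phi(x)\,\kappa_3\,(x^2 - 1)/6$.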
3 Numerical Comparison
This section presents the results of numerical simulations to demonstrate the effectiveness of the asymptotic normality of $z_{n,p}$ for some values of $p$ and $n$. In all our simulations, we took $\sigma^2 = 1$, $\rho = 1/2$. In Table 1, we list the estimated significance levels for $z_{n,p}$ for $N = 100$, calculated using 1,000,000 repetitions, at nominal significance levels of 0.01, 0.05, 0.50, 0.95 and 0.99.
Table 1. Actual probabilities of $z_{n,p}$ for $N = 100$

Nominal   0.01     0.05     0.50     0.95     0.99
p=10      0.0046   0.0408   0.5261   0.9366   0.9794
p=20      0.0068   0.0445   0.5127   0.9440   0.9855
p=30      0.0078   0.0463   0.5089   0.9461   0.9870
p=40      0.0081   0.0471   0.5063   0.9471   0.9879
p=50      0.0085   0.0479   0.5046   0.9475   0.9883
p=60      0.0087   0.0477   0.5048   0.9478   0.9884
p=70      0.0089   0.0483   0.5049   0.9481   0.9887
p=80      0.0088   0.0480   0.5044   0.9482   0.9887
p=90      0.0087   0.0480   0.5054   0.9480   0.9886

From Table 1, we find that $z_{n,p}$ gives a good approximation for $p \ge 50$.
For large $N$ and fixed $p$, a chi-square approximation of the LR statistic is available. We list the significance levels for $-2\tau\log(\lambda^*)^{n/2}$ in Table 2, using the same setting as for the simulations presented in Table 1, where
$$\tau = 1 - \frac{p(2p^3 + p^2 - 4p - 3)}{6n(p-1)(p^2 + p - 4)}$$
is the Bartlett correction factor.
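The correction factor and the corrected statistic are simple to compute. The short sketch below is only an illustration; the function names `bartlett_tau` and `corrected_statistic` are hypothetical, and the value of $\log\lambda^*$ is assumed to be supplied by the caller.

```python
def bartlett_tau(n, p):
    """Bartlett correction factor tau for the intraclass correlation test."""
    numerator = p * (2 * p**3 + p**2 - 4 * p - 3)
    denominator = 6.0 * n * (p - 1) * (p**2 + p - 4)
    return 1.0 - numerator / denominator

def corrected_statistic(log_lambda_star, n, p):
    """-2 tau log (lambda*)^(n/2), which equals -n tau log(lambda*)."""
    return -n * bartlett_tau(n, p) * log_lambda_star
```

The cubic in the numerator factors as $2p^3 + p^2 - 4p - 3 = (p+1)^2(2p-3)$, which gives a quick consistency check on the implementation.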
Table 2. Actual probabilities of $-2\tau\log(\lambda^*)^{n/2}$ for $N = 100$

Nominal   0.01     0.05     0.50     0.95     0.99
p=10      0.0100   0.0502   0.5013   0.9502   0.9902
p=20      0.0107   0.0527   0.5121   0.9533   0.9909
p=30      0.0127   0.0607   0.5419   0.9606   0.9928
p=40      0.0189   0.0821   0.6077   0.9739   0.9958
p=50      0.0385   0.1407   1.000    1.000    1.000
p=60      1.0000   1.0000   1.0000   1.0000   1.0000
p=70      1.0000   1.0000   1.0000   1.0000   1.0000
p=80      1.0000   1.0000   1.0000   1.0000   1.0000
p=90      1.0000   1.0000   1.0000   1.0000   1.0000
From Table 2, the approximation based on $-2\tau\log(\lambda^*)^{n/2}$ is accurate for $p \le 20$.
4 Error Bound
In this section, we find an upper bound on the error between the distribution of $z_{n,p}$ and the asymptotic distribution. The following inequality gives an upper bound:
$$\sup_x |P(z_{n,p} \le x) - \Phi_s(x)| \le \int_{-\infty}^{\infty}\frac{1}{|t|}\,\bigl|E[\exp(itz_{n,p})] - \phi_s(t)\bigr|\,dt.$$
We obtain the following error bound.
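$$
\begin{aligned}
&\sup_x |P(z_{n,p} \le x) - \Phi_s(x)| \\
&\quad< \sum_{k=1}^{s}\frac{2^{3k/2}}{k!}\left\{\frac{4p^2}{\sigma_{n,p}^{3k}\,mn(n-1)}B[2v/\sigma_{n,p}]^{k} - \sum_{j=0}^{s-k}\gamma_{k,j,n,p}\right\}\left\{\Gamma\!\left(\frac{3k}{2}\right) - \Gamma\!\left(\frac{3k}{2}, \frac{m^2v^2}{2}\right)\right\} \\
&\qquad+ \frac{B[2v/\sigma_{n,p}]^{s+1}\,2^{(7s+7)/2}\,p^{2s+2}}{m^{s+1}n^{s+1}(n-1)^{s+1}\sigma_{n,p}^{3s+3}}\left(1 - \frac{8B[2v/\sigma_{n,p}]\,vp^2}{n(n-1)\sigma_{n,p}^{3}}\right)^{-(3s+3)/2} \\
&\qquad\quad\cdot\left\{\Gamma\!\left(\frac{3}{2}(s+1)\right) - \Gamma\!\left(\frac{3}{2}(s+1),\ \frac{m^2v^2}{2}\left(1 - \frac{8B[2v/\sigma_{n,p}]\,vp^2}{n(n-1)\sigma_{n,p}^{3}}\right)\right)\right\} \\
&\qquad+ \frac{\left(\frac{m}{2} + \left[\frac{p+1}{2}\right] - 1\right)^2}{m^2v^2\left(\frac{p-1}{2}\left\{\left[\frac{p+1}{2}\right] - 1\right\} - 1\right)}\cdot\left(1 + \frac{m^2v^2}{\sigma^2_{n,p}\left(\frac{m}{2} + \left[\frac{p+1}{2}\right] - 1\right)^2}\right)^{-\frac{p-1}{2}\left\{\left[\frac{p+1}{2}\right] - 1\right\} + 1} \\
&\qquad+ \frac{2}{m^2v^2}e^{-m^2v^2/2} + \sum_{k=1}^{s}\frac{1}{k!}\sum_{j=0}^{s-k}2^{(3k+j)/2}\,\gamma_{k,j,n,p}\left\{\Gamma\!\left(\frac{3k+j}{2}\right) - \Gamma\!\left(\frac{3k+j}{2}, \frac{m^2v^2}{2}\right)\right\}.
\end{aligned}
$$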
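Here, $m = n - p + 1$, $0 < v < (\sigma^2_{n,p})^{1/2}/2$ is a constant, $[\cdot]$ denotes the Gauss symbol (integer part), $\Gamma(z, a) = \int_a^{\infty}x^{z-1}e^{-x}\,dx$ is the incomplete gamma function, and
$$B[v] = -\frac{1}{2v} - \frac{1}{v^2} - \frac{1}{v^3}\log(1-v) + \frac{1}{m}\,\frac{1}{1-v}.$$
We computed the values of the error bound for $N = 100$ and $s = 0$.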
p             30         50         70          90
Error bound   0.180594   0.101753   0.0816023   0.123004
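The function $B[\cdot]$ appearing in the bound can be evaluated directly from its definition. The sketch below is only an illustration, with the hypothetical name `b_function`. In the bound, $B$ is evaluated at $2v/\sigma_{n,p}$, and the condition $0 < v < \sigma_{n,p}/2$ keeps this argument inside $(0, 1)$, where $\log(1 - \cdot)$ is defined.

```python
import math

def b_function(v, m):
    """B[v] = -1/(2v) - 1/v^2 - log(1-v)/v^3 + 1/(m(1-v)) for 0 < v < 1."""
    if not 0.0 < v < 1.0:
        raise ValueError("the argument of B must lie in (0, 1)")
    # log1p(-v) computes log(1 - v) accurately.
    return (-1.0 / (2.0 * v) - 1.0 / v**2 - math.log1p(-v) / v**3
            + 1.0 / (m * (1.0 - v)))
```

Expanding $\log(1-v)$ in powers of $v$ shows that the first three terms equal $\sum_{i \ge 0} v^i/(i+3)$, so they tend to $1/3$ as $v \to 0$. The closed form suffers from cancellation when the argument is very close to zero; in that case the series is the safer choice.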
Acknowledgement: The author would like to thank Professor Yasunori Fujikoshi of Chuo University for his continuous guidance. In addition, the author is grateful to the senior members of the Sugiyama laboratory for their help with this study.
References
[1] Siotani, M., Hayakawa, T. and Fujikoshi, Y. (1985). Modern Multivariate Statistical Analysis: A Graduate Course and Handbook, American Sciences Press, Columbus, OH.