
Studentized Asymptotic Distribution of the Contribution Ratio in High-Dimensional Principal Component Analysis

Academic year: 2021


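(Japanese title: Studentization of the Cumulative Contribution Ratio in High-Dimensional Principal Component Analysis and Its Asymptotic Distribution)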


Masashi Hyodo

Department of Mathematics

Abstract

Consistent estimators of the asymptotic variances of the sample cumulative contribution ratio and its logit transformation are derived when the covariance matrix follows a spiked model in a high-dimensional setting where the dimension and the sample size are both large. We derive the asymptotic distributions of the studentized statistics in the high-dimensional case and in the corresponding large-sample case. These results generalize those of Fujikoshi et al. (2008). Numerical simulations show that, in the high-dimensional case, only the studentized statistic based on the logit transformation attains reasonable accuracy.

1 Introduction

Let $x_1, \ldots, x_N$ be a sample of observation vectors drawn from a $p$-variate normal population $N_p(\mu, \Sigma)$ with unknown mean $\mu$ and covariance matrix $\Sigma$. The usual unbiased estimator of $\Sigma$ is the sample covariance matrix

$$S = \frac{1}{n}\sum_{i=1}^{N}(x_i - \bar{x})(x_i - \bar{x})^\top, \qquad n = N - 1,$$

where $\bar{x}$ is the sample mean vector. Let $\lambda_1 \ge \cdots \ge \lambda_p > 0$ and $\ell_1 > \cdots > \ell_p \ge 0$ denote the latent roots of $\Sigma$ and $S$, respectively. The population cumulative contribution ratio of the first $k$ $(k = 1, \ldots, p)$ principal components is

$$\rho_{(k)} = \frac{\lambda_1 + \cdots + \lambda_k}{\lambda_1 + \cdots + \lambda_p},$$

and the corresponding sample ratio is

$$r_{(k)} = \frac{\ell_1 + \cdots + \ell_k}{\ell_1 + \cdots + \ell_p}.$$

Assume that the latent roots of $\Sigma$ follow a spiked model:

A0. $\lambda_1 \ge \cdots \ge \lambda_k > \lambda_{k+1} = \cdots = \lambda_p = \lambda$,

where $\lambda$ is a positive constant (cf. Johnstone [4]). Fujikoshi et al. [1] showed the asymptotic normality of $r_{(k)}$ under the following high-dimensional framework:

A1. $k$ is fixed,

A2. $\lambda_i/m = O(1)$, $i = 1, \ldots, k$, where $m = p - k$,

A3. $p/n \to c \in (0, \infty)$.
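The quantities defined so far can be computed directly from data. The following is a minimal NumPy sketch, assuming the divisor $n = N - 1$ in $S$ as above; the function names `sample_covariance` and `contribution_ratio`, and the particular spiked $\Sigma$ in the usage lines, are illustrative choices rather than anything prescribed by the paper.

```python
import numpy as np


def sample_covariance(x: np.ndarray) -> np.ndarray:
    """S = (1/n) * sum_i (x_i - xbar)(x_i - xbar)^T with n = N - 1.

    `x` has shape (N, p): one observation vector per row.
    """
    n = x.shape[0] - 1
    centered = x - x.mean(axis=0)
    return centered.T @ centered / n


def contribution_ratio(roots: np.ndarray, k: int) -> float:
    """Share of the k largest roots in the total sum of roots."""
    ordered = np.sort(np.asarray(roots, dtype=float))[::-1]   # descending
    return float(ordered[:k].sum() / ordered.sum())


# Illustrative spiked model A0 with k = 3 and lambda = 1 (values chosen for the example).
rng = np.random.default_rng(0)
p, N, k = 20, 101, 3
lam = np.array([100.0, 30.0, 10.0] + [1.0] * (p - k))
x = rng.standard_normal((N, p)) * np.sqrt(lam)    # rows ~ N_p(0, diag(lam))

S = sample_covariance(x)
ell = np.linalg.eigvalsh(S)                        # latent roots of S
rho_k = contribution_ratio(lam, k)                 # population rho_(k) = 140/157
r_k = contribution_ratio(ell, k)                   # sample r_(k)
print(f"rho_(k) = {rho_k:.4f}, r_(k) = {r_k:.4f}")
```

Because $S$ uses the divisor $n = N - 1$, it coincides with NumPy's default `np.cov(x, rowvar=False)`.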


In this paper, we derive the asymptotic distribution of the studentized contribution ratio in the high-dimensional case where assumptions A0-A3 hold. The corresponding large-sample result is also derived. In Section 2, we derive a consistent estimator of the asymptotic variance of $r_{(k)}$ in the high-dimensional case and consider the asymptotic distribution of the resulting studentized statistic.

These results are generalizations of Fujikoshi et al. [1]. In Section 3, we derive the asymptotic distribution of the studentized statistic in the large-sample case. In Section 4, we conduct a simulation study to investigate the behavior of these asymptotic distributions.

2 Asymptotic distribution of the studentized cumulative contribution ratio in the high-dimensional case

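In this section, we consider the asymptotic distribution of the studentized statistic of $r_{(k)}$ in the high-dimensional case. Fujikoshi et al. [1] showed that $r_{(k)}$ is asymptotically normal in this case; the asymptotic variance of $\sqrt{n}\,(r_{(k)} - \rho_{(k)})$ is

$$\tau^2 = \frac{2m^2(\tilde\lambda_1^2 + \cdots + \tilde\lambda_k^2)}{(\tilde\lambda_1 + \cdots + \tilde\lambda_k + m)^4} = \frac{2m^2\lambda^2\,\mathrm{tr}(\Lambda_{11}^2)}{(\mathrm{tr}\,\Lambda)^4},$$

where $\tilde\lambda_i = \lambda_i/\lambda$, $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_k, \lambda, \ldots, \lambda)$ and $\Lambda_{11} = \mathrm{diag}(\lambda_1, \ldots, \lambda_k)$.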
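As a quick numerical check that the two expressions for $\tau^2$ agree, here is a small sketch computing both from population quantities. It assumes $\tilde\lambda_i = \lambda_i/\lambda$; the function names are hypothetical.

```python
import numpy as np


def tau2(spikes, lam, p):
    """tau^2 = 2 m^2 lambda^2 tr(Lambda_11^2) / (tr Lambda)^4 (population quantities)."""
    spikes = np.asarray(spikes, dtype=float)
    m = p - spikes.size
    tr_lambda = spikes.sum() + m * lam          # tr(Lambda)
    tr_lambda11_sq = np.sum(spikes**2)          # tr(Lambda_11^2)
    return 2 * m**2 * lam**2 * tr_lambda11_sq / tr_lambda**4


def tau2_normalized(spikes, lam, p):
    """Same quantity written with lambda_tilde_i = lambda_i / lambda."""
    t = np.asarray(spikes, dtype=float) / lam
    m = p - t.size
    return 2 * m**2 * np.sum(t**2) / (t.sum() + m) ** 4


print(tau2([100, 30, 10], 1.0, 20))
```

For $\Sigma = \mathrm{diag}(100, 30, 10, 1, \ldots, 1)$ with $p = 20$ (so $m = 17$, $\mathrm{tr}\,\Lambda = 157$, $\mathrm{tr}\,\Lambda_{11}^2 = 11000$), this gives $\tau^2 = 6358000/157^4 \approx 0.0105$.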

A consistent estimator of $\tau^2$ is given in the following lemma.

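Lemma 1 Under the high-dimensional framework A0-A3, it holds that $\hat\tau^2 \xrightarrow{p} \tau^2$, where

$$\hat\tau^2 = \frac{2\left\{\sum_{i=k+1}^{p}\ell_i\right\}^2\left\{n^2\,\mathrm{tr}\,S^2 - n(\mathrm{tr}\,S)^2 - \dfrac{n^2+n-2}{m}\left\{\sum_{i=k+1}^{p}\ell_i\right\}^2\right\}}{(n^2+n-2)(\mathrm{tr}\,S)^4}.$$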
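One way to read the estimator: since $(n-1)(n+2) = n^2 + n - 2$, the bracket divided by $n^2+n-2$ equals $\frac{n^2}{(n-1)(n+2)}\{\mathrm{tr}\,S^2 - (\mathrm{tr}\,S)^2/n\} - m^{-1}\{\sum_{i>k}\ell_i\}^2$. The first term is the well-known unbiased estimator of $\mathrm{tr}\,\Sigma^2$ when $nS$ is Wishart with $n$ degrees of freedom, and the second term is roughly $m\lambda^2$, so their difference plausibly targets $\mathrm{tr}\,\Lambda_{11}^2$. The sketch below simply transcribes the formula; the name `tau2_hat` and the simulation settings are hypothetical.

```python
import numpy as np


def tau2_hat(S, n, k):
    """Estimator tau_hat^2 of Lemma 1; S is the sample covariance with divisor n."""
    p = S.shape[0]
    m = p - k
    ell = np.sort(np.linalg.eigvalsh(S))[::-1]     # l_1 >= ... >= l_p
    tail = ell[k:].sum()                           # sum_{i=k+1}^p l_i
    tr_s, tr_s2 = np.trace(S), np.trace(S @ S)
    c = n**2 + n - 2                               # = (n - 1)(n + 2)
    bracket = n**2 * tr_s2 - n * tr_s**2 - c * tail**2 / m
    return 2 * tail**2 * bracket / (c * tr_s**4)


if __name__ == "__main__":
    # Illustrative comparison with the population tau^2 (lambda = 1).
    rng = np.random.default_rng(1)
    p, n, k = 40, 400, 3
    lam = np.array([100.0, 30.0, 10.0] + [1.0] * (p - k))
    x = rng.standard_normal((n + 1, p)) * np.sqrt(lam)
    S = np.cov(x, rowvar=False)                    # divisor n = N - 1
    m = p - k
    tau2 = 2 * m**2 * np.sum(lam[:k] ** 2) / lam.sum() ** 4
    print(tau2_hat(S, n, k), tau2)
```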

Combining Lemma 1 with Slutsky's theorem yields studentized versions of the results of Fujikoshi et al. [1], as stated in the following theorem.

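Theorem 1 Under the assumptions of Lemma 1, it holds that

(i) $\sqrt{n}\,\bigl(r_{(k)} - \rho_{(k)}\bigr)/\hat\tau \xrightarrow{d} N(0, 1)$,

(ii) $\sqrt{n}\,\bigl(r_{(k)} - \hat\rho_{(k)}\bigr)/\hat\tau \xrightarrow{d} N(0, 1)$,

(iii) $\sqrt{n}\left\{\log\dfrac{r_{(k)}}{1-r_{(k)}} - \log\dfrac{\rho_{(k)}}{1-\rho_{(k)}}\right\}\Big/\hat{\tilde\tau} \xrightarrow{d} N(0, 1)$,

(iv) $\sqrt{n}\left\{\log\dfrac{r_{(k)}}{1-r_{(k)}} - \left(\log\dfrac{\rho_{(k)}}{1-\rho_{(k)}} + \hat{\tilde b}_{(k)}\right)\right\}\Big/\hat{\tilde\tau} \xrightarrow{d} N(0, 1)$,

where $\xrightarrow{d}$ denotes convergence in distribution, $\hat\tau^2$ is the same as in Lemma 1, $\hat\rho_{(k)} = \rho_{(k)} + \hat b_{(k)}$,

$$\hat{\tilde\tau}^2 = \{r_{(k)}(1-r_{(k)})\}^{-2}\,\hat\tau^2,$$

$$\hat{\tilde b}_{(k)} = \{r_{(k)}(1-r_{(k)})\}^{-1}\left\{\hat b_{(k)} + \frac{1}{n}\cdot\frac{2r_{(k)}-1}{2r_{(k)}(1-r_{(k)})}\,\hat\tau^2\right\}.$$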


Here,

$$\hat b_{(k)} = \frac{1-r_{(k)}}{n}\cdot\frac{mn^2\,\mathrm{tr}\,S^2 - mn(\mathrm{tr}\,S)^2 - (n^2+n-2)\left\{\sum_{i=k+1}^{p}\ell_i\right\}^2}{m(n^2+n-2)(\mathrm{tr}\,S)^2}.$$
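The estimators $\hat\tau^2$ and $\hat b_{(k)}$ share the factor $(n^2\,\mathrm{tr}\,S^2 - n(\mathrm{tr}\,S)^2)/(n^2+n-2) - m^{-1}\{\sum_{i>k}\ell_i\}^2$, so they can be computed together. Below is a minimal sketch transcribing Lemma 1 and Theorem 1, assuming $S$ has divisor $n$ and that $\rho_{(k)}$ is known (as in a simulation); the helper names `theorem1_pieces`, `studentized_stats` and `logit` are hypothetical.

```python
import numpy as np


def logit(u):
    return np.log(u / (1.0 - u))


def theorem1_pieces(S, n, k):
    """Return r_(k), tau_hat^2, b_hat_(k), tilde-tau_hat^2 and tilde-b_hat_(k)."""
    p = S.shape[0]
    m = p - k
    ell = np.sort(np.linalg.eigvalsh(S))[::-1]            # l_1 >= ... >= l_p
    tr_s, tr_s2 = np.trace(S), np.trace(S @ S)
    r = ell[:k].sum() / tr_s
    tail = ell[k:].sum()                                  # sum_{i>k} l_i
    c = n**2 + n - 2
    # Common factor of tau_hat^2 and b_hat_(k).
    common = (n**2 * tr_s2 - n * tr_s**2) / c - tail**2 / m
    tau2 = 2 * tail**2 * common / tr_s**4                 # Lemma 1
    b = (1 - r) * common / (n * tr_s**2)                  # b_hat_(k)
    w = r * (1 - r)
    tau2_logit = tau2 / w**2                              # tilde-tau_hat^2
    b_logit = (b + (2 * r - 1) * tau2 / (2 * n * w)) / w  # tilde-b_hat_(k)
    return r, tau2, b, tau2_logit, b_logit


def studentized_stats(S, n, k, rho):
    """Statistics (i)-(iv) of Theorem 1 for a known rho_(k)."""
    r, tau2, b, tau2_logit, b_logit = theorem1_pieces(S, n, k)
    tau, tau_l = np.sqrt(tau2), np.sqrt(tau2_logit)
    return {
        "(i)": np.sqrt(n) * (r - rho) / tau,
        "(ii)": np.sqrt(n) * (r - (rho + b)) / tau,
        "(iii)": np.sqrt(n) * (logit(r) - logit(rho)) / tau_l,
        "(iv)": np.sqrt(n) * (logit(r) - (logit(rho) + b_logit)) / tau_l,
    }
```

Under A0-A3, each of the four returned values should behave approximately like a draw from $N(0, 1)$ when $n$ and $p$ are both large.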

3 Asymptotic distribution of the studentized contribution ratio in the large-sample case

In this section, we consider the asymptotic distribution of the studentized statistic of $r_{(k)}$ in the large-sample case, where $p$ is fixed and $n$ is large. We use the asymptotic normality of $r_{(k)}$ established in Fujikoshi et al. [1]. Estimating the asymptotic variance yields the asymptotic distribution of the studentized statistic of $r_{(k)}$, given in the following theorem.

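Theorem 2 Under the model A0, it holds that

(i) $\sqrt{n}\,\bigl(r_{(k)} - \rho_{(k)}\bigr)/\hat\sigma \xrightarrow{d} N(0, 1)$,

(ii) $\sqrt{n}\left\{\log\dfrac{r_{(k)}}{1-r_{(k)}} - \log\dfrac{\rho_{(k)}}{1-\rho_{(k)}}\right\}\Big/\hat\sigma_1 \xrightarrow{d} N(0, 1)$,

where

$$\hat\sigma_1^2 = \{r_{(k)}(1-r_{(k)})\}^{-2}\,\hat\sigma^2, \qquad \hat\sigma^2 = \frac{2\left\{(1-r_{(k)})^2\sum_{i=1}^{k}\ell_i^2 + r_{(k)}^2\sum_{i=k+1}^{p}\ell_i^2\right\}}{(\mathrm{tr}\,S)^2}.$$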

We also show that $\hat\sigma^2$ is asymptotically unbiased.
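The large-sample estimator only needs the sample roots. The following sketch transcribes $\hat\sigma^2$ and $\hat\sigma_1^2$ of Theorem 2, assuming $S$ as in Section 1. The function `logit_interval` is a hypothetical illustration of how Theorem 2 (ii) could be inverted into an approximate 95% interval for $\rho_{(k)}$; it is not a procedure proposed in the paper. Working on the logit scale keeps the interval inside $(0, 1)$.

```python
import numpy as np


def sigma2_hat(S, k):
    """Return r_(k), sigma_hat^2 and sigma_hat_1^2 of Theorem 2 (p fixed, n large)."""
    ell = np.sort(np.linalg.eigvalsh(S))[::-1]            # l_1 >= ... >= l_p
    tr_s = ell.sum()
    r = ell[:k].sum() / tr_s
    s2 = 2 * ((1 - r) ** 2 * np.sum(ell[:k] ** 2) + r**2 * np.sum(ell[k:] ** 2)) / tr_s**2
    s2_logit = s2 / (r * (1 - r)) ** 2                     # sigma_hat_1^2
    return r, s2, s2_logit


def expit(t):
    return 1.0 / (1.0 + np.exp(-t))


def logit_interval(S, n, k, z=1.959963984540054):
    """Approximate 95% interval for rho_(k) from Theorem 2 (ii) (illustrative)."""
    r, _, s2_logit = sigma2_hat(S, k)
    centre = np.log(r / (1 - r))
    half = z * np.sqrt(s2_logit / n)
    return expit(centre - half), expit(centre + half)
```

For example, with $S = \mathrm{diag}(4, 3, 2, 1)$ and $k = 2$, one gets $r_{(k)} = 0.7$ and $\hat\sigma^2 = 2(0.09 \cdot 25 + 0.49 \cdot 5)/100 = 0.094$.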

4 Simulation Results

This section presents numerical simulations that examine the accuracy of the asymptotic approximations for the studentized statistics in the high-dimensional case. We generated a sample $x_1, \ldots, x_N$ of size $N = n + 1$ from $N_p(\mu, \Sigma)$. For the high-dimensional case, we assume the following spiked model:

$$\Sigma = \mathrm{diag}(100, 30, 10, 1, \ldots, 1), \quad \text{i.e.,} \quad \lambda_1 = 100,\ \lambda_2 = 30,\ \lambda_3 = 10,\ \lambda_4 = \cdots = \lambda_p = 1.$$

We set $k = 3$. In the high-dimensional case, we considered asymptotic approximations for the following statistics:

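· Approx(1): $Y_{h0} = \sqrt{n}\,\bigl(r_{(k)} - \rho_{(k)}\bigr)/\hat\tau \approx N(0, 1)$

· Approx(2): $\tilde Y_{h0} = \sqrt{n}\,\bigl(r_{(k)} - \hat\rho_{(k)}\bigr)/\hat\tau \approx N(0, 1)$

· Approx(3): $\tilde Y_{h1} = \sqrt{n}\left\{\log\dfrac{r_{(k)}}{1-r_{(k)}} - \log\dfrac{\hat\rho_{(k)}}{1-\hat\rho_{(k)}}\right\}\Big/\hat{\tilde\tau} \approx N(0, 1)$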


where $\hat\tau$ and $\hat{\tilde\tau}$ are given in Theorem 1, and $\hat\rho_{(k)}$ is the bias-adjusted mean given in Theorem 1. We examined the effectiveness of these approximations by checking how close the distributions of the statistics in Approx(1)-(3) are to the standard normal distribution.
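A minimal Monte Carlo sketch of this experiment for Approx(3), assuming $\mu = 0$ (the statistics do not depend on $\mu$), the spiked $\Sigma$ above, and $\hat\rho_{(k)} = \rho_{(k)} + \hat b_{(k)}$ as in Theorem 1. The function names and the number of replications are hypothetical; the paper does not state its replication count, so quantiles from this sketch will differ from the reported ones by Monte Carlo error.

```python
import numpy as np


def logit(u):
    return np.log(u / (1.0 - u))


def approx3_statistic(x, k, rho):
    """Approx(3) statistic for one sample x of shape (N, p); rho is the true rho_(k)."""
    N, p = x.shape
    n, m = N - 1, p - k
    S = np.cov(x, rowvar=False)                      # divisor n = N - 1
    ell = np.sort(np.linalg.eigvalsh(S))[::-1]       # l_1 >= ... >= l_p
    tr_s, tr_s2 = ell.sum(), np.sum(ell**2)          # tr S and tr S^2
    r, tail = ell[:k].sum() / tr_s, ell[k:].sum()
    c = n**2 + n - 2
    common = (n**2 * tr_s2 - n * tr_s**2) / c - tail**2 / m
    tau2 = 2 * tail**2 * common / tr_s**4            # Lemma 1
    b = (1 - r) * common / (n * tr_s**2)             # b_hat_(k)
    rho_hat = rho + b                                # adjusted mean
    tau_logit = np.sqrt(tau2) / (r * (1 - r))        # tilde-tau_hat
    return np.sqrt(n) * (logit(r) - logit(rho_hat)) / tau_logit


def simulate_quantiles(p, N, spikes, reps, seed=0, alphas=(0.95, 0.99)):
    """Empirical alpha-quantiles of Approx(3) under Sigma = diag(spikes, 1, ..., 1)."""
    spikes = np.asarray(spikes, dtype=float)
    k = spikes.size
    lam = np.concatenate([spikes, np.ones(p - k)])
    rho = lam[:k].sum() / lam.sum()
    rng = np.random.default_rng(seed)
    stats = np.array([
        approx3_statistic(rng.standard_normal((N, p)) * np.sqrt(lam), k, rho)
        for _ in range(reps)
    ])
    return {a: float(np.quantile(stats, a)) for a in alphas}


if __name__ == "__main__":
    for p in (10, 20, 30, 40, 50):
        print(p, simulate_quantiles(p, N=100, spikes=[100.0, 30.0, 10.0], reps=2000))
```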

Simulated α-quantiles of Approx(3) (k = 3) and the corresponding N(0, 1) quantiles

                  N = 100                            N = 200
α       p=10  p=20  p=30  p=40  p=50      p=10  p=20  p=30  p=40  p=50      N(0, 1)
0.95    1.86  1.74  1.71  1.70  1.69      1.85  1.71  1.69  1.67  1.68      1.64
0.99    2.59  2.40  2.36  2.34  2.34      2.58  2.38  2.34  2.32  2.32      2.33

The simulation results show the following tendencies:

(i) The accuracy of Approx(1)-(3) improves as p increases.

(ii) Approx(3) is more accurate than Approx(1) and Approx(2).

5 Conclusion and discussion

In this paper, the asymptotic distribution of the studentized cumulative contribution ratio is derived in both the high-dimensional case and the large-sample case. In particular, we estimate the asymptotic variance under the high-dimensional framework A0-A3. Based on the simulation results, we recommend Approx(3) when p (< n) is large. Future work includes improving the asymptotic distribution of the studentized cumulative contribution ratio in the large-sample case and, in the high-dimensional case, examining the accuracy of the asymptotic distribution when p > n.

References

[1] Fujikoshi, Y., Satoh, T. and Sugiyama, T. (2008). Asymptotic distribution of the contribution ratio in high-dimensional principal component analysis. Submitted.

[2] Fujikoshi, Y. (1980). Asymptotic expansions for distributions of the sample roots under nonnormality. Biometrika, 67, 45-51.

[3] Schott, J. R. (2006). A high dimensional test for the equality of the smallest eigenvalues of a covariance matrix. J. Multivariate Anal., 97, 827-843.

[4] Johnstone, I. M. (2001). On the distribution of the largest eigenvalue in principal components analysis. The Annals of Statistics, 29, 295-327.
