Academic year: 2021


Information criteria for redundancy models in canonical correlation analysis

Shingo Kanda, Mathematics Major

1 Introduction

In discriminant analysis, Rao's [8] additional information hypothesis is known as a hypothesis concerning the relevance of a specified subset of variables. Likewise, in canonical correlation analysis, Siotani, Hayakawa and Fujikoshi [9] formulated an analogous hypothesis. This paper deals with the problem of selecting the best subset of variables under this hypothesis; we call the model under an additional information hypothesis a redundancy model.

Since a redundancy model is defined by a structure imposed on the covariance matrix, we consider estimating the covariance matrix under that structure.

2 One set of redundancy model

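Let $x_u = (x_{u1}, \dots, x_{up})'$ and $x_v = (x_{v1}, \dots, x_{vq})'$ be two random vectors jointly distributed as a $(p+q)$-variate normal distribution with the following mean vector and covariance matrix:

$$\mu = \begin{pmatrix} \mu_u \\ \mu_v \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_{uu} & \Sigma_{uv} \\ \Sigma_{vu} & \Sigma_{vv} \end{pmatrix}.$$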

Let $\bar{x}_u$, $\bar{x}_v$ and $S$ be the sample mean vectors and the sample covariance matrix formed from $N = n + 1$ observations $X_u = [x_{u1}, \dots, x_{uN}]$ and $X_v = [x_{v1}, \dots, x_{vN}]$ on $x_u$ and $x_v$. Our problem is to find the best subset of $\{x_{u1}, \dots, x_{up}\}$ when we want to predict $x_v$ from $x_u$, and we consider the case $x_1 = (x_{u1}, \dots, x_{uk})'$, that is, $\{x_{u1}, \dots, x_{uk}\} \subset \{x_{u1}, \dots, x_{up}\}$. Corresponding to the partition $x_u = (x_1', x_2')'$, we write

$$\mu_u = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad \Sigma_{uu} = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}, \qquad \Sigma_{uv} = \begin{pmatrix} \Sigma_{1v} \\ \Sigma_{2v} \end{pmatrix}.$$

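Similar notation is used for the submatrices of $S$. We consider a selection method based on AIC and introduce a redundancy model $M_k$ through the hypothesis that $x_2$ has no additional information about $x_v$ in the presence of $x_1$, i.e.,

$$M_k : \Sigma_{2v\cdot 1} = 0,$$

where

$$\begin{pmatrix} \Sigma_{vv\cdot 1} & \Sigma_{v2\cdot 1} \\ \Sigma_{2v\cdot 1} & \Sigma_{22\cdot 1} \end{pmatrix} = \begin{pmatrix} \Sigma_{vv} & \Sigma_{v2} \\ \Sigma_{2v} & \Sigma_{22} \end{pmatrix} - \begin{pmatrix} \Sigma_{v1} \\ \Sigma_{21} \end{pmatrix} \Sigma_{11}^{-1} \begin{pmatrix} \Sigma_{1v} & \Sigma_{12} \end{pmatrix}.$$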
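As a numerical illustration of this definition, the following Python sketch (NumPy assumed; the helper name `partial_cov` and the variable ordering $(x_1, x_2, x_v)$ are our own illustrative choices) computes $\Sigma_{2v\cdot 1}$ from a covariance matrix. It builds a hypothetical $\Sigma$ in which $x_2$ and $x_v$ are related only through $x_1$, so that $M_k$ holds while $\Sigma_{2v}$ itself is nonzero.

```python
import numpy as np


def partial_cov(sigma, a, b, given):
    """Sigma_{ab.given} = Sigma_ab - Sigma_{a,given} Sigma_{given}^{-1} Sigma_{given,b}."""
    s_ag = sigma[np.ix_(a, given)]
    s_gg = sigma[np.ix_(given, given)]
    s_gb = sigma[np.ix_(given, b)]
    return sigma[np.ix_(a, b)] - s_ag @ np.linalg.solve(s_gg, s_gb)


# Hypothetical sizes: p = 3, q = 2, k = 1; variables ordered as (x_1, x_2, x_v).
p, q, k = 3, 2, 1
idx1 = list(range(k))
idx2 = list(range(k, p))
idxv = list(range(p, p + q))

# x_1 ~ N(0, 1); x_2 = D x_1 + e_2; x_v = C x_1 + e_v with independent unit errors,
# so x_2 carries no additional information about x_v once x_1 is known.
D = np.array([[1.0], [2.0]])
C = np.array([[0.5], [-1.0]])
loadings = np.vstack([np.eye(k), D, C])
sigma = loadings @ loadings.T + np.diag([0.0, 1.0, 1.0, 1.0, 1.0])

print(sigma[np.ix_(idx2, idxv)])                 # Sigma_2v is not zero
print(partial_cov(sigma, idx2, idxv, idx1))      # Sigma_2v.1 is (numerically) zero
```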

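Let $\hat{\mu}$ and $\hat{\Sigma}_k$ be the unbiased estimators of $\mu$ and $\Sigma$ under $M_k$, respectively. Using these estimators, it is known that

$$R_k = -n \log \left( \frac{\begin{vmatrix} S_{vv\cdot 1} & S_{v2\cdot 1} \\ S_{2v\cdot 1} & S_{22\cdot 1} \end{vmatrix}}{|S_{vv\cdot 1}| \cdot |S_{22\cdot 1}|} \right) + b_k.$$

Let $E^{(0)}$ denote the expectation under the true model $M_0$. Then the bias term is

$$b_k = E^{(0)}\left[ n \operatorname{tr} \hat{\Sigma}_k^{-1} \Sigma \right] - n(p+q), \tag{1}$$

where we note that

$$\operatorname{tr} \hat{\Sigma}_k^{-1} \Sigma = \operatorname{tr} \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix}^{-1} \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} + \operatorname{tr} \begin{pmatrix} S_{11} & S_{1v} \\ S_{v1} & S_{vv} \end{pmatrix}^{-1} \begin{pmatrix} \Sigma_{11} & \Sigma_{1v} \\ \Sigma_{v1} & \Sigma_{vv} \end{pmatrix} - \operatorname{tr} S_{11}^{-1} \Sigma_{11}.$$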
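The trace decomposition above suggests that $\hat{\Sigma}_k^{-1}$ is assembled from the inverted blocks $(1,2)$ and $(1,v)$ of $S$ minus the inverted shared block $S_{11}$. The Python sketch below assumes this is the usual decomposable-model construction (our reading, not spelled out in the text) and checks three consequences numerically: $\hat{\Sigma}_k$ reproduces $S$ on those two blocks, satisfies $\hat{\Sigma}_{2v\cdot 1} = 0$, and gives $\operatorname{tr} \hat{\Sigma}_k^{-1} S = p + q$. The function names and the simulated data are hypothetical.

```python
import numpy as np


def padded_inverse(S, idx):
    """Zero matrix of S's size with (S[idx, idx])^{-1} placed on the idx block."""
    out = np.zeros_like(S)
    block = np.ix_(idx, idx)
    out[block] = np.linalg.inv(S[block])
    return out


def estimate_under_mk(S, p, k):
    """Sigma_hat_k whose inverse is pad(S_(12)^-1) + pad(S_(1v)^-1) - pad(S_11^-1).

    Variables are assumed ordered as (x_1, x_2, x_v) with 1 <= k < p.
    """
    m = S.shape[0]
    i1 = list(range(k))
    i12 = list(range(p))
    i1v = i1 + list(range(p, m))
    precision = padded_inverse(S, i12) + padded_inverse(S, i1v) - padded_inverse(S, i1)
    return np.linalg.inv(precision)


rng = np.random.default_rng(1)
p, q, k = 3, 2, 1
X = rng.standard_normal((60, p + q))     # hypothetical data, N = 60
S = np.cov(X, rowvar=False)              # unbiased sample covariance (divisor n = N - 1)
sigma_k = estimate_under_mk(S, p, k)

# Sigma_hat_{2v.1} computed directly from the fitted matrix.
i1, i2, iv = [0], [1, 2], [3, 4]
s2v1 = sigma_k[np.ix_(i2, iv)] - sigma_k[np.ix_(i2, i1)] @ np.linalg.solve(
    sigma_k[np.ix_(i1, i1)], sigma_k[np.ix_(i1, iv)])
print(np.abs(s2v1).max(), np.trace(np.linalg.solve(sigma_k, S)))
```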


Therefore, AIC and MAIC take the following forms:

$$\mathrm{AIC}_k = L(S, \hat{\Sigma}_k) + (p+q)(p+q+1) - 2q(p-k).$$

If a candidate model includes the true model, then

$$\mathrm{MAIC}_k = L(S, \hat{\Sigma}_k) - n(p+q) + n^2 \left( \frac{k+q}{n-k-q-1} + \frac{p}{n-p-1} - \frac{k}{n-k-1} \right),$$

as shown by Fujikoshi [4].
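To compare the two penalties, the Python sketch below (NumPy assumed; function names hypothetical) evaluates the AIC penalty and the MAIC bias term $b_k$, and checks the latter against a Monte Carlo estimate of (1) in the special case $\Sigma = I$, which satisfies $M_k$ for every $k$. It assumes normal data and uses the trace decomposition of $\operatorname{tr} \hat{\Sigma}_k^{-1} \Sigma$ given above.

```python
import numpy as np


def aic_penalty(p, q, k):
    """Penalty added to L(S, Sigma_hat_k) in AIC_k."""
    return (p + q) * (p + q + 1) - 2 * q * (p - k)


def maic_bias(n, p, q, k):
    """Normal-theory bias term, so that MAIC_k = L(S, Sigma_hat_k) + maic_bias(...)."""
    return n**2 * ((k + q) / (n - k - q - 1) + p / (n - p - 1)
                   - k / (n - k - 1)) - n * (p + q)


def mc_bias(n, p, q, k, reps=4000, seed=0):
    """Monte Carlo estimate of (1) when the true Sigma is the identity (so M_k holds)."""
    rng = np.random.default_rng(seed)
    m = p + q
    i1 = list(range(k))
    i12 = list(range(p))
    i1v = i1 + list(range(p, m))
    total = 0.0
    for _ in range(reps):
        X = rng.standard_normal((n + 1, m))    # N = n + 1 observations
        S = np.cov(X, rowvar=False)            # unbiased estimator, divisor n
        # With Sigma = I, each trace term reduces to the trace of an inverted block of S.
        t = (np.trace(np.linalg.inv(S[np.ix_(i12, i12)]))
             + np.trace(np.linalg.inv(S[np.ix_(i1v, i1v)]))
             - np.trace(np.linalg.inv(S[np.ix_(i1, i1)])))
        total += n * t
    return total / reps - n * m


n, p, q, k = 30, 3, 2, 1
print(aic_penalty(p, q, k), maic_bias(n, p, q, k), mc_bias(n, p, q, k))
```

Choosing $\Sigma = I$ keeps the example short, since every trace term then involves only $S$; any other $\Sigma$ satisfying $M_k$ would do.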

However, since these criteria may be biased as estimators of the risk when the true distribution is nonnormal, we now consider EIC, proposed by Ishiguro, Sakamoto and Kitagawa [6].

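Let $X_u^*$, $X_v^*$ be a bootstrap sample generated from the empirical distribution $G$ of $X_u$, $X_v$, where

$$X_u^* = \begin{pmatrix} X_1^* \\ X_2^* \end{pmatrix} = \begin{pmatrix} x_{11}^* & \cdots & x_{1N}^* \\ x_{21}^* & \cdots & x_{2N}^* \end{pmatrix}, \qquad X_v^* = [x_{v1}^*, \dots, x_{vN}^*].$$

Then the candidate model is

$$M : \begin{pmatrix} x_{1j}^* \\ x_{2j}^* \\ x_{vj}^* \end{pmatrix} \overset{\text{i.i.d.}}{\sim} G, \qquad j = 1, \dots, N.$$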

Here, let $\hat{\mu}^*$ and $\hat{\Sigma}_k^*$ be the bootstrap estimators of $\mu$ and $\Sigma$. By replacing, in (1), $\Sigma$ with the maximum likelihood estimator $(n/N)S$ and $\hat{\Sigma}_k$ with its bootstrap counterpart $\hat{\Sigma}_k^*$ computed from the bootstrap sample covariance matrix $S^*$, we derive the bias term for EIC:

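$$\begin{aligned} \tilde{b}_k^{(i)} &= E^{(0)}\left[ D^{(i)} \right] = E^{(0)}\left[ L\left( \tfrac{n}{N} S, \hat{\Sigma}_k^* \right) - L\left( S^*, \hat{\Sigma}_k^* \right) \right] \\ &= \frac{n^2}{N} E^{(0)}\Bigg[ \operatorname{tr} \begin{pmatrix} S_{11}^* & S_{12}^* \\ S_{21}^* & S_{22}^* \end{pmatrix}^{-1} \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix} + \operatorname{tr} \begin{pmatrix} S_{11}^* & S_{1v}^* \\ S_{v1}^* & S_{vv}^* \end{pmatrix}^{-1} \begin{pmatrix} S_{11} & S_{1v} \\ S_{v1} & S_{vv} \end{pmatrix} \\ &\qquad - \operatorname{tr} S_{11}^{*-1} S_{11} \Bigg] - n(p+q), \end{aligned}$$

where $i$ indexes the $i$th of $B$ bootstrap samples. Then

$$b_k \approx \frac{1}{B} \sum_{i=1}^{B} D^{(i)},$$

where $B$ is the number of bootstrap replications. Therefore,

$$\mathrm{EIC}_k = L(S, \hat{\Sigma}_k) + b_k.$$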
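The bootstrap bias estimate can be sketched in Python as follows. This is an illustrative implementation under our own assumptions: the rows of `X` are the $N$ observations with columns ordered as $(x_1, x_2, x_v)$, resampling is ordinary row resampling from the empirical distribution, and only the bias term $b_k$ is computed, since the text does not spell out $L(\cdot, \cdot)$. Function names and data are hypothetical.

```python
import numpy as np


def trace_term(S_star, S, idx):
    """tr((S*_idx)^{-1} S_idx) for the sub-block of variables idx."""
    block = np.ix_(idx, idx)
    return np.trace(np.linalg.solve(S_star[block], S[block]))


def eic_bias_one_set(X, p, k, B=200, seed=0):
    """Average of D^(i) over B bootstrap samples for the one-set model M_k."""
    rng = np.random.default_rng(seed)
    N, m = X.shape
    n = N - 1
    i1 = list(range(k))
    i12 = list(range(p))
    i1v = i1 + list(range(p, m))
    S = np.cov(X, rowvar=False)
    D = np.empty(B)
    for i in range(B):
        X_star = X[rng.integers(0, N, size=N)]   # resample rows with replacement
        S_star = np.cov(X_star, rowvar=False)
        t = (trace_term(S_star, S, i12) + trace_term(S_star, S, i1v)
             - trace_term(S_star, S, i1))
        D[i] = n**2 / N * t - n * m
    return D.mean()


rng = np.random.default_rng(2)
X = rng.standard_normal((100, 5))     # hypothetical data with p = 3, q = 2
print(eic_bias_one_set(X, p=3, k=1))
```

One way to use the sketch is to compare its output with the MAIC bias term on normal data and on heavy-tailed data, where the normal-theory correction is not guaranteed to hold.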

3 Two sets of redundancy models

As in Section 2, let $x_u = (x_{u1}, \dots, x_{up})'$ and $x_v = (x_{v1}, \dots, x_{vq})'$ be two random vectors jointly distributed as a $(p+q)$-variate normal distribution with the following mean vector and covariance matrix:

$$\mu = \begin{pmatrix} \mu_u \\ \mu_v \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_{uu} & \Sigma_{uv} \\ \Sigma_{vu} & \Sigma_{vv} \end{pmatrix}.$$

In order to formulate two sets of redundancy models, we partition $x_u$ and $x_v$ as

$$x_u = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \qquad x_v = \begin{pmatrix} x_3 \\ x_4 \end{pmatrix},$$


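where $x_1 = (x_{u1}, \dots, x_{ur_1})'$ and $x_3 = (x_{v1}, \dots, x_{vr_2})'$, i.e., $\{x_{u1}, \dots, x_{ur_1}\} \subset \{x_{u1}, \dots, x_{up}\}$ and $\{x_{v1}, \dots, x_{vr_2}\} \subset \{x_{v1}, \dots, x_{vq}\}$. Conformably,

$$\begin{pmatrix} \mu_u \\ \mu_v \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \\ \mu_4 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} & \Sigma_{13} & \Sigma_{14} \\ \Sigma_{21} & \Sigma_{22} & \Sigma_{23} & \Sigma_{24} \\ \Sigma_{31} & \Sigma_{32} & \Sigma_{33} & \Sigma_{34} \\ \Sigma_{41} & \Sigma_{42} & \Sigma_{43} & \Sigma_{44} \end{pmatrix}.$$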

Assuming that $x_2$ is redundant for $x_v$ and $x_4$ is redundant for $x_u$, the candidate models are

$$M_r : \Sigma_{2v\cdot 1} = 0, \quad \Sigma_{4u\cdot 3} = 0.$$

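Then the conditional distribution of $(x_2', x_4')'$ given $(x_1', x_3')'$ is a $(p+q-r)$-variate normal distribution with mean vector

$$E\left[ \begin{pmatrix} x_2 \\ x_4 \end{pmatrix} \middle| \begin{pmatrix} x_1 \\ x_3 \end{pmatrix} \right] = \begin{pmatrix} \mu_2 \\ \mu_4 \end{pmatrix} + \begin{pmatrix} B_{21} & B_{23} \\ B_{41} & B_{43} \end{pmatrix} \begin{pmatrix} x_1 - \mu_1 \\ x_3 - \mu_3 \end{pmatrix}$$

and covariance matrix

$$V\left[ \begin{pmatrix} x_2 \\ x_4 \end{pmatrix} \middle| \begin{pmatrix} x_1 \\ x_3 \end{pmatrix} \right] = \begin{pmatrix} \Sigma_{22\cdot 13} & \Sigma_{24\cdot 13} \\ \Sigma_{42\cdot 13} & \Sigma_{44\cdot 13} \end{pmatrix},$$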

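where $r = r_1 + r_2$,

$$B = \begin{pmatrix} B_{21} & B_{23} \\ B_{41} & B_{43} \end{pmatrix} = \begin{pmatrix} \Sigma_{21} & \Sigma_{23} \\ \Sigma_{41} & \Sigma_{43} \end{pmatrix} \begin{pmatrix} \Sigma_{11} & \Sigma_{13} \\ \Sigma_{31} & \Sigma_{33} \end{pmatrix}^{-1},$$

$$\begin{pmatrix} \Sigma_{22\cdot 13} & \Sigma_{24\cdot 13} \\ \Sigma_{42\cdot 13} & \Sigma_{44\cdot 13} \end{pmatrix} = \begin{pmatrix} \Sigma_{22} & \Sigma_{24} \\ \Sigma_{42} & \Sigma_{44} \end{pmatrix} - \begin{pmatrix} \Sigma_{21} & \Sigma_{23} \\ \Sigma_{41} & \Sigma_{43} \end{pmatrix} \begin{pmatrix} \Sigma_{11} & \Sigma_{13} \\ \Sigma_{31} & \Sigma_{33} \end{pmatrix}^{-1} \begin{pmatrix} \Sigma_{12} & \Sigma_{14} \\ \Sigma_{32} & \Sigma_{34} \end{pmatrix}.$$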

The redundancy model $M_r$ can be expressed in terms of the conditional set-up as

$$M_r : B_{23} = 0, \quad B_{41} = 0, \quad \Sigma_{24\cdot 13} = 0, \quad \Sigma_{42\cdot 13} = 0.$$
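The equivalence between $M_r$ and its conditional set-up can be checked numerically. The Python sketch below (NumPy assumed; all sizes and parameter values are hypothetical) builds $\Sigma$ from the conditional set-up with $B_{23} = 0$, $B_{41} = 0$ and $\Sigma_{24\cdot 13} = 0$, recovers $B$ and the conditional covariance matrix from $\Sigma$ using the formulas above, and confirms that $\Sigma_{2v\cdot 1}$ and $\Sigma_{4u\cdot 3}$ vanish.

```python
import numpy as np


def sub(S, a, b):
    """Sub-block of S with rows a and columns b."""
    return S[np.ix_(a, b)]


def partial_cov(S, a, b, given):
    """S_{ab.given} = S_ab - S_{a,given} S_{given}^{-1} S_{given,b}."""
    return sub(S, a, b) - sub(S, a, given) @ np.linalg.solve(sub(S, given, given), sub(S, given, b))


# Hypothetical sizes: p = 3, q = 2, r1 = r2 = 1; variables ordered as (x1, x2, x3, x4).
i1, i2, i3, i4 = [0], [1, 2], [3], [4]
i13, i24 = i1 + i3, i2 + i4

sigma_13 = np.array([[2.0, 0.5], [0.5, 1.0]])   # Cov of (x1, x3)
B = np.array([[1.0, 0.0],                        # rows: x2 (two rows), x4 (one row)
              [-0.5, 0.0],                       # columns: x1, x3
              [0.0, 0.8]])                       # so B23 = 0 and B41 = 0
omega = np.diag([1.0, 0.7, 0.5])                 # Cov of (x2, x4) given (x1, x3); Sigma_24.13 = 0

# Joint covariance in the order (x1, x3, x2, x4), then placed in the order (x1, x2, x3, x4).
joint = np.block([[sigma_13, sigma_13 @ B.T],
                  [B @ sigma_13, B @ sigma_13 @ B.T + omega]])
order = i13 + i24
sigma = np.empty_like(joint)
sigma[np.ix_(order, order)] = joint

# Recover B and the conditional covariance matrix from Sigma.
B_hat = sub(sigma, i24, i13) @ np.linalg.inv(sub(sigma, i13, i13))
cond_cov = partial_cov(sigma, i24, i24, i13)

iu, iv = i1 + i2, i3 + i4
print(np.abs(partial_cov(sigma, i2, iv, i1)).max(),
      np.abs(partial_cov(sigma, i4, iu, i3)).max())
```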

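Let $\hat{\mu}$ and $\hat{\Sigma}_r$ be the unbiased estimators of $\mu$ and $\Sigma$ under $M_r$, respectively. Then we have

$$R_r = -n \log \left( \frac{\begin{vmatrix} S_{22\cdot 13} & S_{24\cdot 13} \\ S_{42\cdot 13} & S_{44\cdot 13} \end{vmatrix}}{|S_{22\cdot 1}| \cdot |S_{44\cdot 3}|} \right) + b_r.$$

Moreover, the bias term is

$$b_r = E^{(0)}\left[ n \operatorname{tr} \hat{\Sigma}_r^{-1} \Sigma \right] - n(p+q), \tag{2}$$

where the trace can be decomposed as follows:

$$\operatorname{tr} \hat{\Sigma}_r^{-1} \Sigma = \operatorname{tr} \begin{pmatrix} S_{11} & S_{13} \\ S_{31} & S_{33} \end{pmatrix}^{-1} \begin{pmatrix} \Sigma_{11} & \Sigma_{13} \\ \Sigma_{31} & \Sigma_{33} \end{pmatrix} + \operatorname{tr} \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix}^{-1} \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} + \operatorname{tr} \begin{pmatrix} S_{33} & S_{34} \\ S_{43} & S_{44} \end{pmatrix}^{-1} \begin{pmatrix} \Sigma_{33} & \Sigma_{34} \\ \Sigma_{43} & \Sigma_{44} \end{pmatrix} - \operatorname{tr} S_{11}^{-1} \Sigma_{11} - \operatorname{tr} S_{33}^{-1} \Sigma_{33}.$$

Therefore, AIC and MAIC take the following forms: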

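$$\mathrm{AIC}_r = L(S, \hat{\Sigma}_r) + (p+q)(p+q+1) - 2pq + 2r_1 r_2.$$

If a candidate model includes the true model, then, from Fujikoshi [5],

$$\mathrm{MAIC}_r = L(S, \hat{\Sigma}_r) - n(p+q) + n^2 \left( \frac{r_1 + r_2}{n - r_1 - r_2 - 1} + \frac{p}{n-p-1} + \frac{q}{n-q-1} - \frac{r_1}{n - r_1 - 1} - \frac{r_2}{n - r_2 - 1} \right).$$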
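A quick consistency check on the two-set formulas, written as a Python sketch (function names hypothetical): when $r_2 = q$, so that no redundancy is imposed on $x_v$, the $\mathrm{AIC}_r$ penalty and the $\mathrm{MAIC}_r$ bias term should reduce to those of Section 2 with $k = r_1$. This reduction is our own observation from the formulas, not a statement made in the text.

```python
def aic_penalty_two_sets(p, q, r1, r2):
    """Penalty added to L(S, Sigma_hat_r) in AIC_r."""
    return (p + q) * (p + q + 1) - 2 * p * q + 2 * r1 * r2


def maic_bias_two_sets(n, p, q, r1, r2):
    """MAIC_r minus L(S, Sigma_hat_r), i.e. the normal-theory bias term b_r."""
    r = r1 + r2
    return n**2 * (r / (n - r - 1) + p / (n - p - 1) + q / (n - q - 1)
                   - r1 / (n - r1 - 1) - r2 / (n - r2 - 1)) - n * (p + q)


def maic_bias_one_set(n, p, q, k):
    """MAIC_k minus L(S, Sigma_hat_k), from Section 2."""
    return n**2 * ((k + q) / (n - k - q - 1) + p / (n - p - 1)
                   - k / (n - k - 1)) - n * (p + q)


# With r2 = q, the two-set quantities should match the one-set ones with k = r1.
n, p, q, r1 = 30, 3, 2, 1
print(maic_bias_two_sets(n, p, q, r1, q), maic_bias_one_set(n, p, q, r1))
print(aic_penalty_two_sets(p, q, r1, q), (p + q) * (p + q + 1) - 2 * q * (p - r1))
```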

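Let $X_u^*$, $X_v^*$ be a bootstrap sample generated from the empirical distribution $G$ of $X_u$, $X_v$, where

$$X_u^* = \begin{pmatrix} X_1^* \\ X_2^* \end{pmatrix} = \begin{pmatrix} x_{11}^* & \cdots & x_{1N}^* \\ x_{21}^* & \cdots & x_{2N}^* \end{pmatrix}, \qquad X_v^* = \begin{pmatrix} X_3^* \\ X_4^* \end{pmatrix} = \begin{pmatrix} x_{31}^* & \cdots & x_{3N}^* \\ x_{41}^* & \cdots & x_{4N}^* \end{pmatrix}.$$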


Then the candidate model is

$$M : \begin{pmatrix} x_{1j}^* \\ x_{2j}^* \\ x_{3j}^* \\ x_{4j}^* \end{pmatrix} \overset{\text{i.i.d.}}{\sim} G, \qquad j = 1, \dots, N.$$

Here, let $\hat{\mu}^*$ and $\hat{\Sigma}_r^*$ be the bootstrap estimators of $\mu$ and $\Sigma$. By replacing, in (2), $\Sigma$ with the maximum likelihood estimator $(n/N)S$ and $\hat{\Sigma}_r$ with its bootstrap counterpart $\hat{\Sigma}_r^*$ computed from the bootstrap sample covariance matrix $S^*$, we derive the bias term for EIC:

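$$\begin{aligned} \tilde{b}_r^{(i)} &= E^{(0)}\left[ D^{(i)} \right] = E^{(0)}\left[ L\left( \tfrac{n}{N} S, \hat{\Sigma}_r^* \right) - L\left( S^*, \hat{\Sigma}_r^* \right) \right] \\ &= \frac{n^2}{N} E^{(0)}\Bigg[ \operatorname{tr} \begin{pmatrix} S_{11}^* & S_{13}^* \\ S_{31}^* & S_{33}^* \end{pmatrix}^{-1} \begin{pmatrix} S_{11} & S_{13} \\ S_{31} & S_{33} \end{pmatrix} + \operatorname{tr} \begin{pmatrix} S_{11}^* & S_{12}^* \\ S_{21}^* & S_{22}^* \end{pmatrix}^{-1} \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix} \\ &\qquad + \operatorname{tr} \begin{pmatrix} S_{33}^* & S_{34}^* \\ S_{43}^* & S_{44}^* \end{pmatrix}^{-1} \begin{pmatrix} S_{33} & S_{34} \\ S_{43} & S_{44} \end{pmatrix} - \operatorname{tr} S_{11}^{*-1} S_{11} - \operatorname{tr} S_{33}^{*-1} S_{33} \Bigg] - n(p+q), \end{aligned}$$

where $i$ indexes the $i$th of $B$ bootstrap samples. Then

$$b_r \approx \frac{1}{B} \sum_{i=1}^{B} D^{(i)},$$

where $B$ is the number of bootstrap replications. Therefore,

$$\mathrm{EIC}_r = L(S, \hat{\Sigma}_r) + b_r.$$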
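The bootstrap bias for the two-set model can be sketched in Python under the same kind of assumptions as a one-set version would use: NumPy, ordinary row resampling, columns ordered as $(x_1, x_2, x_3, x_4)$, and hypothetical function names. The blocks whose trace terms are added or subtracted are read off the expression for $\tilde{b}_r^{(i)}$ in this section; $L(\cdot, \cdot)$ itself is not computed.

```python
import numpy as np


def trace_term(S_star, S, idx):
    """tr((S*_idx)^{-1} S_idx) for the sub-block of variables idx."""
    block = np.ix_(idx, idx)
    return np.trace(np.linalg.solve(S_star[block], S[block]))


def eic_bias_two_sets(X, p, r1, r2, B=200, seed=0):
    """Average of D^(i) over B bootstrap samples for the two-set model M_r."""
    rng = np.random.default_rng(seed)
    N, m = X.shape
    n = N - 1
    i1 = list(range(r1))
    i2 = list(range(r1, p))
    i3 = list(range(p, p + r2))
    i4 = list(range(p + r2, m))
    plus = [i1 + i3, i1 + i2, i3 + i4]    # blocks entering with a + sign
    minus = [i1, i3]                      # blocks entering with a - sign
    S = np.cov(X, rowvar=False)
    D = np.empty(B)
    for i in range(B):
        S_star = np.cov(X[rng.integers(0, N, size=N)], rowvar=False)  # row resampling
        t = (sum(trace_term(S_star, S, idx) for idx in plus)
             - sum(trace_term(S_star, S, idx) for idx in minus))
        D[i] = n**2 / N * t - n * m
    return D.mean()


rng = np.random.default_rng(3)
X = rng.standard_t(df=5, size=(100, 5))   # hypothetical heavy-tailed data, p = 3, q = 2
print(eic_bias_two_sets(X, p=3, r1=1, r2=1))
```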

4 Simulation

To give an impression of the relative performance of AIC, MAIC and EIC, we simulate these criteria under several settings and examine how well each approximates the true risk and how often each model is selected.

References

[1] Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. 2nd International Symposium on Information Theory, Eds. B. N. Petrov and F. Csáki, pp. 267-281, Budapest: Akadémiai Kiadó.

[2] Anderson, T. W. (2003). An Introduction to Multivariate Statistical Analysis, Wiley-Interscience, 3rd Edition.

[3] Fujikoshi, Y. (1982). A test for additional information in canonical correlation analysis. Ann. Inst. Statist. Math., Part A, 34, 523-530.

[4] Fujikoshi, Y. (1985). Selection of variables in discriminant analysis and canonical correlation analysis. Multivariate Analysis VI, Ed. P. R. Krishnaiah, 219-236, Elsevier Science Publishers B.V.

[5] Fujikoshi, Y. (2007). Corrected AIC for selecting of variables in canonical correlation analysis and some conditional independence structures. Submitted for publication.

[6] Ishiguro, M., Sakamoto, Y. and Kitagawa, G. (1996). Bootstrapping log-likelihood and EIC, an extension of AIC. Inst. Statist. Math.

[7] Konishi, S. and Kitagawa, G. (1996). Generalised information criteria in model selection. Biometrika, 83, 4, pp. 875-890.

[8] Rao, C. R. (1973). Linear Statistical Inference and Its Applications, John Wiley, New York.

[9] Siotani, M., Hayakawa, T. and Fujikoshi, Y. (1985). Modern Multivariate Statistical Analysis: A Graduate Course and Handbook, American Sciences Press, Ohio.
