10 Asymptotic Theory


Academic year: 2021


1. Definition: Convergence in Distribution (分布収束)

A sequence of random variables $X_1, X_2, \cdots, X_n, \cdots$ has distribution functions $F_1, F_2, \cdots, F_n, \cdots$, respectively.

If

$$\lim_{n\to\infty} F_n(x) = F(x)$$

at every point $x$ where $F$ is continuous, then we say that the sequence $X_1, X_2, \cdots$ converges to $F$ in distribution.

2. Consistency (一致性):

(a) Definition: Convergence in Probability (確率収束). Let $\{Z_n : n = 1, 2, \cdots\}$ be a sequence of random variables.

If the following holds for any positive $\epsilon$,

$$\lim_{n\to\infty} P(|Z_n - \theta| < \epsilon) = 1,$$

then we say that $Z_n$ converges to $\theta$ in probability.

$\theta$ is called the probability limit (確率極限) of $Z_n$, written $\mathrm{plim}\, Z_n = \theta$.

(b) Let $\hat\theta_n$ be an estimator of a parameter $\theta$.

If $\hat\theta_n$ converges to $\theta$ in probability, we say that $\hat\theta_n$ is a consistent estimator of $\theta$.

3. A General Case of Chebyshev's Inequality:

For $g(X) \ge 0$,

$$P(g(X) \ge k) \le \frac{E(g(X))}{k},$$

where $k$ is a positive constant.
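The bound can be checked numerically. The Python sketch below is an illustration under assumptions that are not in the notes: $X$ is taken to be Exponential with rate 1 and $g(X) = X^2$, so $g(X) \ge 0$, $E(g(X)) = 2$ and $P(g(X) \ge k) = e^{-\sqrt{k}}$ exactly. The function name `tail_prob_and_bound`, the seed and the number of draws are hypothetical choices.

```python
import random

def tail_prob_and_bound(k, draws=100_000, seed=0):
    """Monte Carlo estimates of P(g(X) >= k) and of the bound E(g(X)) / k.

    Illustrative assumption: X ~ Exponential(rate 1) and g(X) = X**2, so that
    g(X) >= 0, E(g(X)) = 2 and P(g(X) >= k) = exp(-sqrt(k)).
    """
    rng = random.Random(seed)
    g_values = [rng.expovariate(1.0) ** 2 for _ in range(draws)]
    tail = sum(g >= k for g in g_values) / draws      # empirical P(g(X) >= k)
    bound = sum(g_values) / draws / k                 # empirical E(g(X)) / k
    return tail, bound

for k in (1.0, 4.0, 9.0):
    tail, bound = tail_prob_and_bound(k)
    print(f"k={k}: P(g(X) >= k) ~ {tail:.4f} <= E(g(X))/k ~ {bound:.4f}")
```

The bound is loose for large $k$ (at $k = 9$ the exact tail is $e^{-3} \approx 0.05$, against a bound of about $0.22$), because it uses nothing about $g(X)$ beyond its mean.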

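4. Example: For a random vector $X$ with $E(X) = \mu$ and $V(X) = \Sigma$, set $g(X) = (X-\mu)'(X-\mu)$.

Then, we have the following inequality:

$$P\big((X-\mu)'(X-\mu) \ge k\big) \le \frac{\mathrm{tr}(\Sigma)}{k}.$$

This follows because

$$E\big((X-\mu)'(X-\mu)\big) = E\Big(\mathrm{tr}\big((X-\mu)'(X-\mu)\big)\Big) = E\Big(\mathrm{tr}\big((X-\mu)(X-\mu)'\big)\Big) = \mathrm{tr}\Big(E\big((X-\mu)(X-\mu)'\big)\Big) = \mathrm{tr}(\Sigma),$$

where the first equality holds because a scalar equals its own trace, the second uses $\mathrm{tr}(AB) = \mathrm{tr}(BA)$, and the third uses the linearity of the trace and of the expectation.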
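A quick simulation can confirm $E\big((X-\mu)'(X-\mu)\big) = \mathrm{tr}(\Sigma)$ and the resulting bound. This is a sketch assuming a bivariate normal $X$ with the arbitrary $\mu$ and $\Sigma$ shown (so $\mathrm{tr}(\Sigma) = 3$); the inequality does not require normality, which is used here only as a convenient way to draw correlated vectors with the standard library. The names `MU`, `SIGMA` and `quad_form_draws` are hypothetical.

```python
import math
import random

# Hypothetical example values: any mean and positive definite covariance would do.
MU = (1.0, -2.0)
SIGMA = ((2.0, 0.6),
         (0.6, 1.0))

def quad_form_draws(draws=100_000, seed=1):
    """Return draws of (X - mu)'(X - mu) for X ~ N(MU, SIGMA).

    X is built as MU + L z with z ~ N(0, I) and L the 2x2 Cholesky factor of SIGMA.
    """
    rng = random.Random(seed)
    l11 = math.sqrt(SIGMA[0][0])
    l21 = SIGMA[1][0] / l11
    l22 = math.sqrt(SIGMA[1][1] - l21 ** 2)
    out = []
    for _ in range(draws):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x1 = MU[0] + l11 * z1
        x2 = MU[1] + l21 * z1 + l22 * z2
        d1, d2 = x1 - MU[0], x2 - MU[1]
        out.append(d1 * d1 + d2 * d2)   # (X - mu)'(X - mu)
    return out

q = quad_form_draws()
trace = SIGMA[0][0] + SIGMA[1][1]
mean_q = sum(q) / len(q)
print(f"mean of (X-mu)'(X-mu) ~ {mean_q:.3f}, tr(Sigma) = {trace}")
for k in (3.0, 6.0, 12.0):
    tail = sum(v >= k for v in q) / len(q)
    print(f"k={k}: P((X-mu)'(X-mu) >= k) ~ {tail:.4f} <= tr(Sigma)/k = {trace / k:.4f}")
```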

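5. Example 1 (Univariate Case):

Suppose that $X_i \sim (\mu, \sigma^2)$, $i = 1, 2, \cdots, n$, are mutually independent.

Then, the sample average $\bar X$ is a consistent estimator of $\mu$.

Proof:

Apply Chebyshev's inequality with $g(\bar X) = (\bar X - \mu)^2$ and $k = \epsilon^2$, noting that $E(g(\bar X)) = V(\bar X) = \sigma^2/n$.

As $n \longrightarrow \infty$,

$$P(|\bar X - \mu| \ge \epsilon) \le \frac{\sigma^2}{n\epsilon^2} \longrightarrow 0$$

for any $\epsilon > 0$. That is, for any $\epsilon > 0$,

$$\lim_{n\to\infty} P(|\bar X - \mu| < \epsilon) = 1.$$

$\Longrightarrow$ Chebyshev's inequality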
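The convergence in Example 1 can be watched directly. The sketch below assumes, for illustration only, i.i.d. Uniform(0, 1) observations ($\mu = 1/2$, $\sigma^2 = 1/12$) and $\epsilon = 0.1$; it estimates $P(|\bar X - \mu| \ge \epsilon)$ by repeated sampling and compares it with the Chebyshev bound $\sigma^2/(n\epsilon^2)$. The function `exceed_prob` and its defaults are hypothetical.

```python
import random

def exceed_prob(n, eps=0.1, reps=1000, seed=2):
    """Estimate P(|Xbar_n - mu| >= eps) by repeated sampling.

    Illustrative assumption: X_i are i.i.d. Uniform(0, 1), so mu = 1/2, sigma^2 = 1/12.
    """
    rng = random.Random(seed)
    mu = 0.5
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.random() for _ in range(n)) / n
        hits += abs(xbar - mu) >= eps
    return hits / reps

SIGMA2, EPS = 1.0 / 12.0, 0.1
results = {}
for n in (10, 100, 1000):
    p = exceed_prob(n, EPS)
    bound = SIGMA2 / (n * EPS ** 2)       # Chebyshev bound sigma^2 / (n eps^2)
    results[n] = (p, bound)
    print(f"n={n:>4}: P(|Xbar - mu| >= eps) ~ {p:.4f} <= {bound:.4f}")
```

Both the estimated probability and the bound shrink as $n$ grows; the bound is conservative but is all the proof needs.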

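6. Example 2 (Multivariate Case):

Suppose that $X_i \sim (\mu, \Sigma)$, $i = 1, 2, \cdots, n$, are mutually independent.

Then, the sample average $\bar X$ is a consistent estimator of $\mu$.

Proof:

Apply Chebyshev's inequality with $g(\bar X) = (\bar X - \mu)'(\bar X - \mu)$ and $k = \epsilon^2$, noting that $E(g(\bar X)) = \mathrm{tr}\big(V(\bar X)\big) = \mathrm{tr}\big(\tfrac{1}{n}\Sigma\big)$.

As $n \longrightarrow \infty$,

$$P\big((\bar X - \mu)'(\bar X - \mu) \ge k\big) = P(|\bar X - \mu| \ge \epsilon) \le \frac{\mathrm{tr}(\Sigma)}{n\epsilon^2} \longrightarrow 0$$

for any positive $\epsilon$. That is, for any positive $\epsilon$, $\lim_{n\to\infty} P\big((\bar X - \mu)'(\bar X - \mu) < k\big) = 1$.

Note that $|\bar X - \mu| = \sqrt{(\bar X - \mu)'(\bar X - \mu)}$, which is the distance between $\bar X$ and $\mu$.

$\Longrightarrow$ Chebyshev's inequality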

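7. Some Formulas:

Let $X_n$ and $Y_n$ be random variables that satisfy $\mathrm{plim}\, X_n = c$ and $\mathrm{plim}\, Y_n = d$. Then,

(a) $\mathrm{plim}\,(X_n + Y_n) = c + d$

(b) $\mathrm{plim}\, X_n Y_n = cd$

(c) $\mathrm{plim}\, X_n / Y_n = c/d$ for $d \ne 0$

(d) $\mathrm{plim}\, g(X_n) = g(c)$ for a function $g(\cdot)$ that is continuous at $c$

$\Longrightarrow$ Slutsky's Theorem (スルツキー定理)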
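The formulas can be illustrated with sample means, whose probability limits are known from the examples above. In the sketch below, $X_n$ is the mean of $n$ Exponential draws with mean $c = 2$ and $Y_n$ is the mean of $n$ Normal$(d, 1)$ draws with $d = 4$; these distributions, the function name `plim_demo` and the choice $g = \log$ (continuous at $c > 0$) are assumptions made for illustration.

```python
import math
import random

def plim_demo(n, seed=3):
    """Compare functions of X_n and Y_n with their claimed probability limits.

    Illustrative assumptions: X_n is the mean of n Exponential draws with mean c = 2,
    Y_n is the mean of n Normal(d, 1) draws with d = 4, and g = log.
    """
    rng = random.Random(seed)
    c, d = 2.0, 4.0
    x_n = sum(rng.expovariate(1.0 / c) for _ in range(n)) / n   # plim X_n = c
    y_n = sum(rng.gauss(d, 1.0) for _ in range(n)) / n          # plim Y_n = d
    return {
        "(a) X_n + Y_n": (x_n + y_n, c + d),
        "(b) X_n * Y_n": (x_n * y_n, c * d),
        "(c) X_n / Y_n": (x_n / y_n, c / d),
        "(d) log X_n": (math.log(x_n), math.log(c)),
    }

for label, (value, limit) in plim_demo(100_000).items():
    print(f"{label}: {value:.4f}  (limit {limit:.4f})")
```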

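8. Central Limit Theorem (中心極限定理)

Univariate Case: $X_1, X_2, \cdots, X_n$ are mutually independently and identically distributed as $X_i \sim (\mu, \sigma^2)$.

Then,

$$\frac{\bar X - E(\bar X)}{\sqrt{V(\bar X)}} = \frac{\bar X - \mu}{\sigma/\sqrt{n}} \longrightarrow N(0, 1),$$

which implies

$$\sqrt{n}(\bar X - \mu) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} (X_i - \mu) \longrightarrow N(0, \sigma^2).$$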
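A small simulation shows the univariate CLT at work even for a skewed distribution. The sketch assumes i.i.d. Exponential(1) data ($\mu = \sigma = 1$), $n = 200$ and 5000 replications, all hypothetical choices; if the normal approximation is good, the standardized means should have mean near 0, variance near 1, and about 95% of their values inside $\pm 1.96$.

```python
import math
import random

def standardized_means(n, reps=5000, seed=4):
    """Return draws of (Xbar - mu) / (sigma / sqrt(n)).

    Illustrative assumption: X_i are i.i.d. Exponential(1), so mu = sigma = 1.
    """
    rng = random.Random(seed)
    mu = sigma = 1.0
    out = []
    for _ in range(reps):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        out.append((xbar - mu) / (sigma / math.sqrt(n)))
    return out

z = standardized_means(n=200)
mean_z = sum(z) / len(z)
var_z = sum((v - mean_z) ** 2 for v in z) / (len(z) - 1)
coverage = sum(abs(v) <= 1.96 for v in z) / len(z)
print(f"mean ~ {mean_z:.3f}, variance ~ {var_z:.3f}, P(|Z| <= 1.96) ~ {coverage:.3f}")
```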

Multivariate Case: $X_1, X_2, \cdots, X_n$ are mutually independently and identically distributed as $X_i \sim (\mu, \Sigma)$.

Then,

$$\frac{1}{\sqrt{n}} \sum_{i=1}^{n} (X_i - \mu) \longrightarrow N(0, \Sigma).$$

9. Central Limit Theorem (Generalization)

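$X_1, X_2, \cdots, X_n$ are mutually independently (but not necessarily identically) distributed as $X_i \sim (\mu, \Sigma_i)$.

Then, under suitable regularity conditions (for example, a Lindeberg-type condition),

$$\frac{1}{\sqrt{n}} \sum_{i=1}^{n} (X_i - \mu) \longrightarrow N(0, \Sigma), \qquad \text{where} \quad \Sigma = \lim_{n\to\infty} \left( \frac{1}{n} \sum_{i=1}^{n} \Sigma_i \right).$$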
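To see the role of $\Sigma = \lim (1/n)\sum \Sigma_i$, the sketch below mixes two distributions with different variances: odd-indexed $X_i$ are Uniform$(-\sqrt{3}, \sqrt{3})$ (variance 1) and even-indexed $X_i$ are Normal$(0, 3)$ (variance 3), so $\mu = 0$ and $\Sigma = (1 + 3)/2 = 2$. This heterogeneous design and the name `scaled_sum` are illustrative assumptions, not part of the notes.

```python
import math
import random

def scaled_sum(n, rng):
    """Return (1/sqrt(n)) * sum_{i=1}^{n} (X_i - mu) with mu = 0, where the X_i are
    independent but not identically distributed (an illustrative assumption):
    odd i: Uniform(-sqrt(3), sqrt(3)), variance 1; even i: Normal(0, sd=sqrt(3)), variance 3.
    """
    total = 0.0
    for i in range(1, n + 1):
        if i % 2 == 1:
            total += rng.uniform(-math.sqrt(3.0), math.sqrt(3.0))
        else:
            total += rng.gauss(0.0, math.sqrt(3.0))
    return total / math.sqrt(n)

rng = random.Random(5)
n, reps = 200, 5000
s = [scaled_sum(n, rng) for _ in range(reps)]
sigma_limit = (1.0 + 3.0) / 2.0               # lim (1/n) sum Sigma_i for this design
var_s = sum(v * v for v in s) / reps          # the scaled sum has mean exactly 0
coverage = sum(abs(v) <= 1.96 * math.sqrt(sigma_limit) for v in s) / reps
print(f"variance ~ {var_s:.3f} (Sigma = {sigma_limit}), "
      f"within +-1.96*sqrt(Sigma): {coverage:.3f}")
```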

10. Definition: Let $\hat\theta_n$ be a consistent estimator of $\theta$.

Suppose that $\sqrt{n}(\hat\theta_n - \theta)$ converges to $N(0, \Sigma)$ in distribution.

Then, we say that $\hat\theta_n$ has the asymptotic distribution (漸近分布) $N(\theta, \Sigma/n)$.

10.1 MLE: Asymptotic Properties

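1. $X_1, X_2, \cdots, X_n$ are mutually independent random variables, each with density function $f(x; \theta)$.

Let $\hat\theta_n$ be the maximum likelihood estimator of $\theta$.

Then, under some regularity conditions, $\hat\theta_n$ is a consistent estimator of $\theta$, and the asymptotic distribution of $\sqrt{n}(\hat\theta_n - \theta)$ is given by:

$$N\left(0, \left(\lim_{n\to\infty} \frac{I(\theta)}{n}\right)^{-1}\right).$$

2. Regularity Conditions: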

(a) The domain (support) of $X_i$ does not depend on $\theta$.

(b) $f(x; \theta)$ has derivatives with respect to $\theta$ up to at least the third order, and these derivatives are finite.

3. Thus, the MLE is

(i) consistent,

(ii) asymptotically normal, and

(iii) asymptotically efficient.

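Proof: The log-likelihood function is given by:

$$\log L(\theta) = \log \prod_{i=1}^{n} f(X_i; \theta) = \sum_{i=1}^{n} \log f(X_i; \theta).$$

Note that the MLE $\tilde\theta$ satisfies:

$$\frac{\partial \log L(\tilde\theta)}{\partial \theta} = \sum_{i=1}^{n} \frac{\partial \log f(X_i; \tilde\theta)}{\partial \theta} = 0.$$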

Since each $X_i$ is a random variable, $\partial \log L(\theta)/\partial \theta$ is also a random variable.

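On the other hand, the integral of $L(\theta)$ with respect to $x = (x_1, x_2, \cdots, x_n)$ is one, because $L(\theta)$ is the joint density of $x_1, x_2, \cdots, x_n$. Therefore, we have:

$$\int L(\theta)\, dx = 1.$$

Taking the first derivative of both sides of the above equation with respect to $\theta$ (the regularity conditions allow differentiation under the integral sign), we obtain:

$$\int \frac{\partial L(\theta)}{\partial \theta}\, dx = 0,$$

which, using $\partial L(\theta)/\partial\theta = \big(\partial \log L(\theta)/\partial\theta\big) L(\theta)$, is rewritten as:

$$\int \frac{\partial L(\theta)}{\partial \theta}\, dx = \int \frac{\partial \log L(\theta)}{\partial \theta} L(\theta)\, dx = E\left(\frac{\partial \log L(\theta)}{\partial \theta}\right) = 0.$$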

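Taking the derivative with respect to $\theta$ again (i.e., the second derivative of $\int L(\theta)\, dx = 1$ on both sides with respect to $\theta$), we have:

$$\int \frac{\partial^2 \log L(\theta)}{\partial \theta \partial \theta'} L(\theta)\, dx + \int \frac{\partial \log L(\theta)}{\partial \theta} \frac{\partial \log L(\theta)}{\partial \theta'} L(\theta)\, dx = 0,$$

which is rewritten as follows:

$$-\int \frac{\partial^2 \log L(\theta)}{\partial \theta \partial \theta'} L(\theta)\, dx = \int \frac{\partial \log L(\theta)}{\partial \theta} \frac{\partial \log L(\theta)}{\partial \theta'} L(\theta)\, dx.$$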

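That is, we can derive the following:

$$-E\left(\frac{\partial^2 \log L(\theta)}{\partial \theta \partial \theta'}\right) = E\left(\frac{\partial \log L(\theta)}{\partial \theta} \frac{\partial \log L(\theta)}{\partial \theta'}\right) = V\left(\frac{\partial \log L(\theta)}{\partial \theta}\right) \equiv I(\theta),$$

where the second equality holds because $E\left(\frac{\partial \log L(\theta)}{\partial \theta}\right) = 0$.

$I(\theta)$ is called Fisher's information matrix (or simply, the information matrix).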
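The identity $-E(\partial^2 \log L/\partial\theta\partial\theta') = V(\partial \log L/\partial\theta)$ can be verified exactly for a simple model. The sketch below assumes a single Bernoulli$(p)$ observation, $\log f(x; p) = x \log p + (1 - x)\log(1 - p)$, chosen only because its two outcomes allow the expectations to be computed by direct summation; for $n$ independent observations, $I(p)$ is $n$ times the single-observation value. The function name is hypothetical.

```python
def bernoulli_information_check(p):
    """Exact expectations for one Bernoulli(p) observation (illustrative model):
    log f(x; p) = x*log(p) + (1 - x)*log(1 - p), x in {0, 1}.

    Returns (E[score], V(score), -E[second derivative]).
    """
    def score(x):        # d log f / dp
        return x / p - (1 - x) / (1 - p)

    def second(x):       # d^2 log f / dp^2
        return -x / p ** 2 - (1 - x) / (1 - p) ** 2

    probs = {0: 1 - p, 1: p}
    e_score = sum(probs[x] * score(x) for x in (0, 1))
    var_score = sum(probs[x] * score(x) ** 2 for x in (0, 1)) - e_score ** 2
    minus_e_second = -sum(probs[x] * second(x) for x in (0, 1))
    return e_score, var_score, minus_e_second

for p in (0.1, 0.5, 0.8):
    e, v, h = bernoulli_information_check(p)
    print(f"p={p}: E[score]={e:+.1e}, V(score)={v:.4f}, "
          f"-E[d2 log f]={h:.4f}, 1/(p(1-p))={1 / (p * (1 - p)):.4f}")
```

Both $V(\text{score})$ and $-E(\text{second derivative})$ equal $1/(p(1-p))$, the information in one Bernoulli observation, and the expected score is zero up to rounding.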

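Thus, the first derivative of $\log L(\theta)$ is distributed with mean zero and variance $I(\theta)$, i.e.,

$$\frac{\partial \log L(\theta)}{\partial \theta} = \sum_{i=1}^{n} \frac{\partial \log f(X_i; \theta)}{\partial \theta} \sim (0, I(\theta)).$$

Note that we do not know the exact distribution of the first derivative of $\log L(\theta)$, because we have not specified the functional form of $f(\cdot)$.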

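Using the central limit theorem (generalization) shown above, we asymptotically obtain the following distribution:

$$\frac{1}{\sqrt{n}} \frac{\partial \log L(\theta)}{\partial \theta} = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{\partial \log f(X_i; \theta)}{\partial \theta} \longrightarrow N(0, \Sigma), \qquad \text{where} \quad \Sigma = \lim_{n\to\infty} \left(\frac{1}{n} I(\theta)\right).$$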

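Let $\tilde\theta$ be the maximum likelihood estimator.

Linearizing $\partial \log L(\tilde\theta)/\partial \theta$ around $\tilde\theta = \theta$, we obtain:

$$0 = \frac{1}{\sqrt{n}} \frac{\partial \log L(\tilde\theta)}{\partial \theta} \approx \frac{1}{\sqrt{n}} \frac{\partial \log L(\theta)}{\partial \theta} + \frac{1}{\sqrt{n}} \frac{\partial^2 \log L(\theta)}{\partial \theta \partial \theta'} (\tilde\theta - \theta),$$

where the remaining terms (i.e., the second-order term, the third-order term, ...) are ignored. This implies that the distribution of $\frac{1}{\sqrt{n}} \frac{\partial \log L(\theta)}{\partial \theta}$ is asymptotically equivalent to that of $-\frac{1}{\sqrt{n}} \frac{\partial^2 \log L(\theta)}{\partial \theta \partial \theta'} (\tilde\theta - \theta)$.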

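We already know the distribution of $\frac{1}{\sqrt{n}} \frac{\partial \log L(\theta)}{\partial \theta}$, so we have:

$$\frac{1}{\sqrt{n}} \frac{\partial \log L(\theta)}{\partial \theta} \approx -\frac{1}{\sqrt{n}} \frac{\partial^2 \log L(\theta)}{\partial \theta \partial \theta'} (\tilde\theta - \theta) = \left(-\frac{1}{n} \frac{\partial^2 \log L(\theta)}{\partial \theta \partial \theta'}\right) \sqrt{n}(\tilde\theta - \theta) \longrightarrow N(0, \Sigma).$$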

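Note as follows:

$$-\frac{1}{n} \frac{\partial^2 \log L(\theta)}{\partial \theta \partial \theta'} \longrightarrow \lim_{n\to\infty} \frac{1}{n} E\left(-\frac{\partial^2 \log L(\theta)}{\partial \theta \partial \theta'}\right) = \lim_{n\to\infty} \frac{1}{n} I(\theta) = \Sigma,$$

where the convergence is in probability. Thus, $\left(-\frac{1}{n} \frac{\partial^2 \log L(\theta)}{\partial \theta \partial \theta'}\right) \sqrt{n}(\tilde\theta - \theta)$ asymptotically has the same distribution as $\Sigma \sqrt{n}(\tilde\theta - \theta)$.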

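Therefore,

$$V\big(\Sigma \sqrt{n}(\tilde\theta - \theta)\big) = \Sigma\, V\big(\sqrt{n}(\tilde\theta - \theta)\big)\, \Sigma' \longrightarrow \Sigma.$$

Note that $\Sigma = \Sigma'$. Thus, the asymptotic variance of $\sqrt{n}(\tilde\theta - \theta)$ is:

$$V\big(\sqrt{n}(\tilde\theta - \theta)\big) \longrightarrow \Sigma^{-1} \Sigma \Sigma^{-1} = \Sigma^{-1}.$$

Finally, we obtain:

$$\sqrt{n}(\tilde\theta - \theta) \longrightarrow N(0, \Sigma^{-1}).$$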
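The result $\sqrt{n}(\tilde\theta - \theta) \longrightarrow N(0, \Sigma^{-1})$ can be checked by simulation for a model where everything is explicit. Assume, for illustration, i.i.d. Exponential observations with rate $\lambda$: then $\log L(\lambda) = n\log\lambda - \lambda\sum_{i=1}^{n} X_i$, the MLE is $\tilde\lambda = 1/\bar X$, $I(\lambda) = n/\lambda^2$, and so $\Sigma^{-1} = \lambda^2$. The values $\lambda = 2$, $n = 400$ and the function name `mle_sampling_draws` are hypothetical.

```python
import math
import random

def mle_sampling_draws(lam=2.0, n=400, reps=4000, seed=6):
    """Return draws of sqrt(n) * (lam_tilde - lam) for the Exponential(rate lam) MLE.

    Illustrative model: d log L / d lam = n / lam - sum(x) = 0 gives lam_tilde = 1 / Xbar,
    and I(lam) = n / lam**2, so Sigma^{-1} = (lim I(lam) / n)^{-1} = lam**2.
    """
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        xbar = sum(rng.expovariate(lam) for _ in range(n)) / n
        lam_tilde = 1.0 / xbar
        out.append(math.sqrt(n) * (lam_tilde - lam))
    return out

d = mle_sampling_draws()
m = sum(d) / len(d)
v = sum((x - m) ** 2 for x in d) / (len(d) - 1)
print(f"mean ~ {m:.3f} (-> 0), variance ~ {v:.3f} (-> Sigma^-1 = lam^2 = 4)")
```

The simulated mean is slightly positive because $1/\bar X$ has a finite-sample bias of order $1/n$; the asymptotic result describes only the limit.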

11 Consistency and Asymptotic Normality of OLSE

Regression model: $y = X\beta + u$, $u \sim (0, \sigma^2 I_n)$.

Consistency:

1. Let $\hat\beta_n = (X'X)^{-1}X'y$ be the OLS estimator computed from a sample of size $n$.

Consistency means that $\hat\beta_n$ converges to $\beta$ in probability as $n$ grows large.

2. Assume the stationarity condition for $X$, i.e., $\frac{1}{n}X'X \longrightarrow M_{xx}$ for a finite nonsingular matrix $M_{xx}$, and no correlation between $X$ and $u$, i.e., $\frac{1}{n}X'u \longrightarrow 0$.

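3. Note that $\frac{1}{n}X'X \longrightarrow M_{xx}$ implies $\left(\frac{1}{n}X'X\right)^{-1} \longrightarrow M_{xx}^{-1}$.

$\Longrightarrow$ Slutsky's Theorem

(*) Slutsky's Theorem: $g(\hat\theta) \longrightarrow g(\theta)$ when $\hat\theta \longrightarrow \theta$, for a continuous function $g(\cdot)$.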

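4. The OLS estimator is given by:

$$\hat\beta_n = \beta + (X'X)^{-1}X'u = \beta + \left(\frac{1}{n}X'X\right)^{-1}\left(\frac{1}{n}X'u\right).$$

Therefore,

$$\hat\beta_n \longrightarrow \beta + M_{xx}^{-1} \times 0 = \beta.$$

Thus, the OLSE is a consistent estimator.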
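The consistency argument can be illustrated with a one-regressor model without an intercept, $y_i = \beta x_i + u_i$, where $(X'X)^{-1}X'y$ reduces to $\sum x_i y_i / \sum x_i^2$. The design below draws $x_i \sim$ Uniform(0, 2) and $u_i \sim N(0, 1)$ independently with $\beta = 1.5$; this design and the function `ols_slope` are assumptions for illustration. In it, $\frac{1}{n}X'X \longrightarrow E(x_i^2) = 4/3 = M_{xx}$ and $\frac{1}{n}X'u \longrightarrow 0$.

```python
import random

def ols_slope(n, beta=1.5, seed=7):
    """OLS slope for y_i = beta * x_i + u_i (no intercept), computed as
    sum(x_i * y_i) / sum(x_i ** 2), i.e. (X'X)^{-1} X'y with one regressor.

    Illustrative design: x_i ~ Uniform(0, 2) and u_i ~ N(0, 1), independent.
    """
    rng = random.Random(seed)
    sxx = sxy = 0.0
    for _ in range(n):
        x = rng.uniform(0.0, 2.0)
        y = beta * x + rng.gauss(0.0, 1.0)
        sxx += x * x
        sxy += x * y
    return sxy / sxx

for n in (10, 1_000, 100_000):
    print(f"n={n:>6}: beta_hat = {ols_slope(n):.4f}  (beta = 1.5)")
```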

Asymptotic Normality:

1. Asymptotic Normality of OLSE

$$\sqrt{n}(\hat\beta_n - \beta) \longrightarrow N(0, \sigma^2 M_{xx}^{-1}), \quad \text{when } n \longrightarrow \infty.$$

2. Central Limit Theorem: Greenberg and Webster (1983)

$Z_1, Z_2, \cdots, Z_n$ are mutually independent. $Z_i$ is distributed with mean $\mu$ and variance $\Sigma_i$ for $i = 1, 2, \cdots, n$.

Then, we have the following result:

$$\frac{1}{\sqrt{n}} \sum_{i=1}^{n} (Z_i - \mu) \longrightarrow N(0, \Sigma), \qquad \text{where} \quad \Sigma = \lim_{n\to\infty} \left(\frac{1}{n} \sum_{i=1}^{n} \Sigma_i\right).$$

Note that the distribution of $Z_i$ is not assumed.

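3. Define $Z_i = x_i' u_i$, where $x_i$ is the $i$th row of $X$. Then, $E(Z_i) = 0$ and $\Sigma_i = V(Z_i) = \sigma^2 x_i' x_i$.

4. $\Sigma$ is defined as:

$$\Sigma = \lim_{n\to\infty} \left(\frac{1}{n} \sum_{i=1}^{n} \sigma^2 x_i' x_i\right) = \sigma^2 \lim_{n\to\infty} \left(\frac{1}{n} X'X\right) = \sigma^2 M_{xx},$$

where

$$X = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}.$$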

5. Applying the Central Limit Theorem of Greenberg and Webster (1983), we obtain the following:

$$\frac{1}{\sqrt{n}} \sum_{i=1}^{n} x_i' u_i = \frac{1}{\sqrt{n}} X'u \longrightarrow N(0, \sigma^2 M_{xx}).$$

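On the other hand, $\hat\beta_n = \beta + (X'X)^{-1}X'u$ can be rewritten as:

$$\sqrt{n}(\hat\beta_n - \beta) = \left(\frac{1}{n}X'X\right)^{-1} \frac{1}{\sqrt{n}} X'u.$$

Its variance is

$$\begin{aligned} V\left(\left(\frac{1}{n}X'X\right)^{-1} \frac{1}{\sqrt{n}}X'u\right) &= E\left(\left(\frac{1}{n}X'X\right)^{-1} \frac{1}{\sqrt{n}}X'u \left(\left(\frac{1}{n}X'X\right)^{-1} \frac{1}{\sqrt{n}}X'u\right)'\right) \\ &= \left(\frac{1}{n}X'X\right)^{-1} \frac{1}{n}X'E(uu')X \left(\frac{1}{n}X'X\right)^{-1} \\ &= \sigma^2 \left(\frac{1}{n}X'X\right)^{-1} \frac{1}{n}X'X \left(\frac{1}{n}X'X\right)^{-1} \\ &\longrightarrow \sigma^2 M_{xx}^{-1} M_{xx} M_{xx}^{-1} = \sigma^2 M_{xx}^{-1}. \end{aligned}$$

Therefore,

$$\sqrt{n}(\hat\beta_n - \beta) \longrightarrow N(0, \sigma^2 M_{xx}^{-1}).$$

$\Longrightarrow$ Asymptotic normality (漸近的正規性) of OLSE. Note that the distribution of $u_i$ is not assumed.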
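The remark that the distribution of $u_i$ is not assumed can be illustrated with deliberately non-normal errors. The sketch uses a one-regressor model without an intercept, $y_i = \beta x_i + u_i$, with $x_i \sim$ Uniform(0, 2) (so $M_{xx} = E(x_i^2) = 4/3$) and $u_i \sim$ Uniform$(-\sqrt{3}, \sqrt{3})$, which has $\sigma^2 = 1$; the variance of $\sqrt{n}(\hat\beta_n - \beta)$ should then be close to $\sigma^2 M_{xx}^{-1} = 0.75$. The design, the function name, $n = 200$ and the replication count are illustrative choices.

```python
import math
import random

def scaled_ols_errors(n=200, reps=4000, seed=8):
    """Return draws of sqrt(n) * (beta_hat - beta) for y_i = beta * x_i + u_i.

    Illustrative design: x_i ~ Uniform(0, 2) and non-normal errors
    u_i ~ Uniform(-sqrt(3), sqrt(3)), so sigma^2 = 1. Because
    beta_hat - beta = (X'X)^{-1} X'u, the value of beta itself is not needed.
    """
    rng = random.Random(seed)
    half_width = math.sqrt(3.0)
    out = []
    for _ in range(reps):
        sxx = sxu = 0.0
        for _ in range(n):
            x = rng.uniform(0.0, 2.0)
            u = rng.uniform(-half_width, half_width)
            sxx += x * x
            sxu += x * u
        # sqrt(n)(beta_hat - beta) = ((1/n) X'X)^{-1} (1/sqrt(n)) X'u
        out.append((sxu / math.sqrt(n)) / (sxx / n))
    return out

draws = scaled_ols_errors()
m_xx = 4.0 / 3.0                  # E(x_i^2) for Uniform(0, 2)
target = 1.0 / m_xx               # sigma^2 * Mxx^{-1} with sigma^2 = 1
# The mean is 0 because E(u_i) = 0 and u_i is independent of x_i.
var_hat = sum(v * v for v in draws) / len(draws)
coverage = sum(abs(v) <= 1.96 * math.sqrt(target) for v in draws) / len(draws)
print(f"variance ~ {var_hat:.3f} (target {target:.3f}), within +-1.96 sd: {coverage:.3f}")
```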
