10 Asymptotic Theory


1. Definition: Convergence in Distribution

Let a sequence of random variables $X_1, X_2, \cdots, X_n, \cdots$ have distribution functions $F_1, F_2, \cdots, F_n, \cdots$, respectively.

If

$$\lim_{n\to\infty} F_n(x) = F(x)$$

at every point $x$ where $F$ is continuous, then we say that the sequence $X_1, X_2, \cdots$ converges to $F$ in distribution.
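As an illustrative sketch (the two example sequences are assumptions chosen for illustration, not from the notes), the functions below evaluate distribution functions exactly. For $X_n \sim$ Uniform$(0, 1 + 1/n)$, $F_n(x)$ approaches the Uniform$(0, 1)$ distribution function at every $x$. For the degenerate $X_n = 1/n$, $F_n(x)$ approaches the distribution function of a point mass at $0$ everywhere except at $x = 0$, the jump point of the limit, which is why convergence is only required where $F$ is continuous. All function names are hypothetical.

```python
def F_uniform_n(x: float, n: int) -> float:
    """CDF of X_n ~ Uniform(0, 1 + 1/n) (illustrative sequence)."""
    return min(max(x / (1.0 + 1.0 / n), 0.0), 1.0)


def F_uniform(x: float) -> float:
    """CDF of the limit X ~ Uniform(0, 1)."""
    return min(max(x, 0.0), 1.0)


def F_point_n(x: float, n: int) -> float:
    """CDF of the degenerate X_n = 1/n (illustrative sequence)."""
    return 1.0 if x >= 1.0 / n else 0.0


def F_point(x: float) -> float:
    """CDF of the degenerate limit X = 0."""
    return 1.0 if x >= 0.0 else 0.0


grid = [-0.5, 0.25, 0.5, 0.9, 1.0, 1.5]
for n in [1, 10, 1000]:
    gap = max(abs(F_uniform_n(x, n) - F_uniform(x)) for x in grid)
    print(f"n={n:5d}  max |F_n(x) - F(x)| on grid = {gap:.5f}")

# At the jump point x = 0, F_n(0) = 0 for every n while F(0) = 1,
# but F_n(x) -> F(x) at every x != 0, where the limit F is continuous.
print(F_point_n(0.0, 10**6), F_point(0.0))    # 0.0 1.0
print(F_point_n(0.01, 10**6), F_point(0.01))  # 1.0 1.0
```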

2. Consistency:

(a) Definition: Convergence in Probability. Let $\{Z_n : n = 1, 2, \cdots\}$ be a sequence of random variables.


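If the following holds,

$$\lim_{n\to\infty} P(|Z_n - \theta| < \epsilon) = 1,$$

for any positive $\epsilon$, then we say that $Z_n$ converges to $\theta$ in probability.

$\theta$ is called the probability limit of $Z_n$, written $\operatorname{plim} Z_n = \theta$.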

(b) Let $\hat\theta_n$ be an estimator of the parameter $\theta$.

If $\hat\theta_n$ converges to $\theta$ in probability, we say that $\hat\theta_n$ is a consistent estimator of $\theta$.

3. A General Case of Chebyshev's Inequality:

For $g(X) \ge 0$,

$$P\big(g(X) \ge k\big) \le \frac{E\big(g(X)\big)}{k},$$


where $k$ is a positive constant.
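A small numerical check of the inequality, assuming $X \sim$ Exponential(1) so that $E(X) = 1$ and $V(X) = 1$ (an illustrative choice, not from the notes). With $g(X) = X$ the exact tail $P(X \ge k) = e^{-k}$ is compared with $E(X)/k$; with $g(X) = (X - 1)^2$ and $k = c^2$ the exact $P(|X - 1| \ge c)$ is compared with $V(X)/c^2$. The helper names are hypothetical.

```python
import math

# Illustrative assumption: X ~ Exponential(1), so E(X) = 1 and V(X) = 1.


def markov_case(k: float) -> tuple[float, float]:
    """g(X) = X: exact P(X >= k) = exp(-k) versus the bound E(X)/k = 1/k."""
    return math.exp(-k), 1.0 / k


def chebyshev_case(c: float) -> tuple[float, float]:
    """g(X) = (X - 1)^2 with k = c^2: exact P(|X - 1| >= c) versus V(X)/c^2."""
    upper = math.exp(-(1.0 + c))                          # P(X >= 1 + c)
    lower = 1.0 - math.exp(-(1.0 - c)) if c < 1 else 0.0  # P(X <= 1 - c); zero once 1 - c <= 0
    return upper + lower, 1.0 / c**2


for k in [0.5, 1.0, 2.0, 5.0]:
    exact, bound = markov_case(k)
    print(f"g(X)=X        k={k}: P={exact:.4f} <= bound={bound:.4f}")
for c in [0.5, 1.5, 3.0]:
    exact, bound = chebyshev_case(c)
    print(f"g(X)=(X-1)^2  c={c}: P={exact:.4f} <= bound={bound:.4f}")
```

The bound is loose for large $k$ because it uses only $E(g(X))$, which is exactly why it holds for any distribution.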

4. Example: For a random vector $X$ with $E(X) = \mu$ and $V(X) = \Sigma$, set $g(X) = (X - \mu)'(X - \mu)$.

Then, we have the following inequality:

$$P\big((X - \mu)'(X - \mu) \ge k\big) \le \frac{\operatorname{tr}(\Sigma)}{k}.$$

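Note as follows (a scalar equals its own trace, $\operatorname{tr}(AB) = \operatorname{tr}(BA)$, and expectation and trace can be interchanged because both are linear):

$$\begin{aligned}
E\big((X - \mu)'(X - \mu)\big) &= E\Big(\operatorname{tr}\big((X - \mu)'(X - \mu)\big)\Big) \\
&= E\Big(\operatorname{tr}\big((X - \mu)(X - \mu)'\big)\Big) \\
&= \operatorname{tr}\Big(E\big((X - \mu)(X - \mu)'\big)\Big) \\
&= \operatorname{tr}(\Sigma).
\end{aligned}$$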


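5. Example 1 (Univariate Case):

Suppose that $X_1, X_2, \cdots, X_n$ are mutually independently distributed as $X_i \sim (\mu, \sigma^2)$, $i = 1, 2, \cdots, n$.

Then, the sample average $\overline{X}$ is a consistent estimator of $\mu$.

Proof:

Note that $g(\overline{X}) = (\overline{X} - \mu)^2$, $\epsilon^2 = k$, and $E\big(g(\overline{X})\big) = V(\overline{X}) = \sigma^2/n$. Use Chebyshev's inequality, noting that $|\overline{X} - \mu| \ge \epsilon$ is equivalent to $(\overline{X} - \mu)^2 \ge \epsilon^2$.

As $n \longrightarrow \infty$,

$$P(|\overline{X} - \mu| \ge \epsilon) \le \frac{\sigma^2}{n\epsilon^2} \longrightarrow 0,$$

for any positive $\epsilon$. That is, for any positive $\epsilon$,

$$\lim_{n\to\infty} P(|\overline{X} - \mu| < \epsilon) = 1.$$

$\Longrightarrow$ Chebyshev's inequality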
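A Monte Carlo sketch of Example 1, assuming $X_i \sim$ Uniform(0, 1) independently, so $\mu = 0.5$ and $\sigma^2 = 1/12$ (an illustrative choice). It estimates $P(|\overline{X} - \mu| \ge \epsilon)$ for growing $n$ and prints it next to the Chebyshev bound $\sigma^2/(n\epsilon^2)$ (capped at 1). The function name `tail_prob` is hypothetical.

```python
import random
import statistics

# Illustrative assumption: X_i ~ Uniform(0, 1) independently, so mu = 0.5, sigma^2 = 1/12.
MU, SIGMA2 = 0.5, 1.0 / 12.0


def tail_prob(n: int, eps: float, reps: int = 1000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(|Xbar - mu| >= eps) for samples of size n."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x_bar = statistics.fmean(rng.random() for _ in range(n))
        hits += abs(x_bar - MU) >= eps
    return hits / reps


eps = 0.05
for n in [10, 50, 200, 800]:
    bound = SIGMA2 / (n * eps**2)  # Chebyshev bound sigma^2 / (n eps^2)
    print(f"n={n:4d}  empirical={tail_prob(n, eps):.3f}  bound={min(bound, 1.0):.3f}")
```

The empirical probability shrinks toward zero and stays below the bound, although the bound itself is far from tight.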


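6. Example 2 (Multivariate Case):

Suppose that $X_1, X_2, \cdots, X_n$ are mutually independently distributed as $X_i \sim (\mu, \Sigma)$, $i = 1, 2, \cdots, n$.

Then, the sample average $\overline{X}$ is a consistent estimator of $\mu$.

Proof:

Note that $g(\overline{X}) = (\overline{X} - \mu)'(\overline{X} - \mu)$, $\epsilon^2 = k$, and $E\big(g(\overline{X})\big) = \operatorname{tr}\big(V(\overline{X})\big) = \operatorname{tr}\big(\tfrac{1}{n}\Sigma\big)$. Use Chebyshev's inequality.

As $n \longrightarrow \infty$,

$$P\big((\overline{X} - \mu)'(\overline{X} - \mu) \ge k\big) = P(|\overline{X} - \mu| \ge \epsilon) \le \frac{\operatorname{tr}(\Sigma)}{n\epsilon^2} \longrightarrow 0,$$

for any positive $\epsilon$. That is, for any positive $\epsilon$, $\lim_{n\to\infty} P\big((\overline{X} - \mu)'(\overline{X} - \mu) < k\big) = 1$.

Note that $|\overline{X} - \mu| = \sqrt{(\overline{X} - \mu)'(\overline{X} - \mu)}$, which is the distance between $\overline{X}$ and $\mu$.

$\Longrightarrow$ Chebyshev's inequality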
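A Monte Carlo sketch of the multivariate case, assuming bivariate $X_i = (Z_1, Z_1 + Z_2)$ with $Z_1, Z_2$ independent $N(0, 1)$, so $\mu = (0, 0)'$, $\Sigma = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}$ and $\operatorname{tr}(\Sigma) = 3$ (all illustrative). It estimates the tail probability next to the bound $\operatorname{tr}(\Sigma)/(n\epsilon^2)$, and checks that $n E\big(g(\overline{X})\big) = \operatorname{tr}(\Sigma)$. Function names are hypothetical.

```python
import random

# Illustrative assumption: X_i = (Z1, Z1 + Z2) with Z1, Z2 iid N(0, 1), so mu = (0, 0),
# Sigma = [[1, 1], [1, 2]] and tr(Sigma) = 3.
TR_SIGMA = 3.0


def squared_distance(n: int, rng: random.Random) -> float:
    """g(Xbar) = (Xbar - mu)'(Xbar - mu) for one sample of size n."""
    s1 = s2 = 0.0
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        s1 += z1
        s2 += z1 + z2
    return (s1 / n) ** 2 + (s2 / n) ** 2


def tail_and_scaled_mean(n: int, eps: float, reps: int = 1000, seed: int = 1) -> tuple[float, float]:
    """Estimate P(|Xbar - mu| >= eps) and n * E(g(Xbar)), whose exact value is tr(Sigma)."""
    rng = random.Random(seed)
    d2 = [squared_distance(n, rng) for _ in range(reps)]
    tail = sum(d >= eps**2 for d in d2) / reps
    return tail, n * sum(d2) / reps


eps = 0.2
for n in [10, 100, 400]:
    p, m = tail_and_scaled_mean(n, eps)
    bound = TR_SIGMA / (n * eps**2)
    print(f"n={n:3d}  empirical={p:.3f}  bound={min(bound, 1.0):.3f}  n*E[g]~{m:.2f}")
```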


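7. Some Formulas:

Let $X_n$ and $Y_n$ be random variables which satisfy $\operatorname{plim} X_n = c$ and $\operatorname{plim} Y_n = d$. Then,

(a) $\operatorname{plim}(X_n + Y_n) = c + d$

(b) $\operatorname{plim} X_n Y_n = cd$

(c) $\operatorname{plim} X_n / Y_n = c/d$ for $d \ne 0$

(d) $\operatorname{plim} g(X_n) = g(c)$ for a function $g(\cdot)$ that is continuous at $c$

$\Longrightarrow$ Slutsky's Theorem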
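A simulation sketch of formulas (a) to (d), assuming $X_n$ is the mean of $n$ draws from an exponential distribution with mean 2 (so $c = 2$), $Y_n$ is the mean of $n$ draws from Uniform(0, 1) (so $d = 0.5$), and $g(x) = \sqrt{x}$; these choices and the helper name `sample_means` are illustrative assumptions.

```python
import math
import random
import statistics

# Illustrative assumption: X_n = mean of n Exponential(mean 2) draws (c = 2),
# Y_n = mean of n Uniform(0, 1) draws (d = 0.5), and g(x) = sqrt(x).


def sample_means(n: int, seed: int = 2) -> tuple[float, float]:
    rng = random.Random(seed)
    x_n = statistics.fmean(rng.expovariate(0.5) for _ in range(n))  # rate 0.5 means mean 2
    y_n = statistics.fmean(rng.random() for _ in range(n))
    return x_n, y_n


for n in [10, 1000, 100000]:
    x, y = sample_means(n)
    print(
        f"n={n:6d}  X+Y={x + y:.3f} (c+d=2.5)  XY={x * y:.3f} (cd=1)  "
        f"X/Y={x / y:.3f} (c/d=4)  sqrt(X)={math.sqrt(x):.3f} (g(c)={math.sqrt(2):.3f})"
    )
```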


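8. Central Limit Theorem

Univariate Case: $X_1, X_2, \cdots, X_n$ are mutually independently and identically distributed as $X_i \sim (\mu, \sigma^2)$.

Then,

$$\frac{\overline{X} - E(\overline{X})}{\sqrt{V(\overline{X})}} = \frac{\overline{X} - \mu}{\sigma/\sqrt{n}} \longrightarrow N(0, 1),$$

which implies

$$\sqrt{n}(\overline{X} - \mu) = \frac{1}{\sqrt{n}}\sum_{i=1}^n (X_i - \mu) \longrightarrow N(0, \sigma^2).$$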
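A simulation sketch of the univariate case, assuming $X_i \sim$ Exponential(1), a skewed distribution with $\mu = 1$ and $\sigma^2 = 1$ (an illustrative choice). It draws many replications of $\sqrt{n}(\overline{X} - \mu)$ and compares their mean, variance and one tail share with $N(0, \sigma^2)$. The function name `scaled_errors` is hypothetical.

```python
import random
import statistics

# Illustrative assumption: X_i ~ Exponential(1), a skewed distribution with mu = 1, sigma^2 = 1.
MU, SIGMA = 1.0, 1.0


def scaled_errors(n: int, reps: int = 2000, seed: int = 3) -> list[float]:
    """Simulate reps draws of sqrt(n) * (Xbar - mu) from samples of size n."""
    rng = random.Random(seed)
    return [
        n**0.5 * (statistics.fmean(rng.expovariate(1.0) for _ in range(n)) - MU)
        for _ in range(reps)
    ]


z = scaled_errors(n=200)
print("mean     ~", round(statistics.fmean(z), 3), "(limit 0)")
print("variance ~", round(statistics.variance(z), 3), "(limit sigma^2 = 1)")
share = sum(v <= 1.96 * SIGMA for v in z) / len(z)
print("share <= 1.96 sigma ~", round(share, 3), "(N(0, sigma^2) gives 0.975)")
```

Even though each $X_i$ is far from normal, the scaled sample mean already behaves approximately like $N(0, \sigma^2)$ at $n = 200$.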


Multivariate Case: $X_1, X_2, \cdots, X_n$ are mutually independently and identically distributed as $X_i \sim (\mu, \Sigma)$.

Then,

$$\frac{1}{\sqrt{n}}\sum_{i=1}^n (X_i - \mu) \longrightarrow N(0, \Sigma).$$

9. Central Limit Theorem (Generalization)

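$X_1, X_2, \cdots, X_n$ are mutually independently (but not necessarily identically) distributed as $X_i \sim (\mu, \Sigma_i)$.

Then,

$$\frac{1}{\sqrt{n}}\sum_{i=1}^n (X_i - \mu) \longrightarrow N(0, \Sigma), \quad \text{where} \quad \Sigma = \lim_{n\to\infty}\Big(\frac{1}{n}\sum_{i=1}^n \Sigma_i\Big).$$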


10. Definition: Let $\hat\theta_n$ be a consistent estimator of $\theta$.

Suppose that $\sqrt{n}(\hat\theta_n - \theta)$ converges to $N(0, \Sigma)$ in distribution.

Then, we say that $\hat\theta_n$ has the asymptotic distribution $N(\theta, \Sigma/n)$.

10.1 MLE: Asymptotic Properties

1. $X_1, X_2, \cdots, X_n$ are mutually independent random variables, each with density function $f(x; \theta)$.

Let $\hat\theta_n$ be a maximum likelihood estimator of $\theta$.

Then, under some regularity conditions, $\hat\theta_n$ is a consistent estimator of $\theta$, and the asymptotic distribution of $\sqrt{n}(\hat\theta_n - \theta)$ is given by:

$$N\bigg(0, \Big(\lim_{n\to\infty}\frac{I(\theta)}{n}\Big)^{-1}\bigg).$$

2. Regularity Conditions:

(a) The domain of $X_i$ does not depend on $\theta$.


(b) The derivatives of $f(x; \theta)$ with respect to $\theta$ exist at least up to the third order, and they are finite.

3. Thus, the MLE is (i) consistent, (ii) asymptotically normal, and (iii) asymptotically efficient.

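Proof: The log-likelihood function is given by:

$$\log L(\theta) = \log \prod_{i=1}^n f(X_i; \theta) = \sum_{i=1}^n \log f(X_i; \theta).$$

Note that the MLE $\tilde\theta$ satisfies:

$$\frac{\partial \log L(\tilde\theta)}{\partial\theta} = \sum_{i=1}^n \frac{\partial \log f(X_i; \tilde\theta)}{\partial\theta} = 0.$$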


Note that $X_i$ is a random variable, so $\partial \log L(\theta)/\partial\theta$ is also a random variable.

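On the other hand, the integral of $L(\theta)$ with respect to $x = (x_1, x_2, \cdots, x_n)$ is one, because $L(\theta)$ is the joint density of $x_1, x_2, \cdots, x_n$. Therefore, we have:

$$\int L(\theta)\,dx = 1.$$

Taking the first derivative of both sides of the above equation with respect to $\theta$ (differentiation and integration can be interchanged under the regularity conditions above), we obtain:

$$\int \frac{\partial L(\theta)}{\partial\theta}\,dx = 0,$$

which, since $\partial L(\theta)/\partial\theta = \big(\partial \log L(\theta)/\partial\theta\big) L(\theta)$, is rewritten as:

$$\int \frac{\partial L(\theta)}{\partial\theta}\,dx = \int \frac{\partial \log L(\theta)}{\partial\theta} L(\theta)\,dx = E\Big(\frac{\partial \log L(\theta)}{\partial\theta}\Big) = 0.$$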

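Taking the derivative with respect to $\theta$ again (i.e., the second derivative of $\int L(\theta)\,dx = 1$ on both sides with respect to $\theta$), we have:

$$\int \frac{\partial^2 \log L(\theta)}{\partial\theta\,\partial\theta'} L(\theta)\,dx + \int \frac{\partial \log L(\theta)}{\partial\theta}\frac{\partial \log L(\theta)}{\partial\theta'} L(\theta)\,dx = 0,$$

which is rewritten as follows:

$$-\int \frac{\partial^2 \log L(\theta)}{\partial\theta\,\partial\theta'} L(\theta)\,dx = \int \frac{\partial \log L(\theta)}{\partial\theta}\frac{\partial \log L(\theta)}{\partial\theta'} L(\theta)\,dx.$$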

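That is, we can derive the following:

$$-E\Big(\frac{\partial^2 \log L(\theta)}{\partial\theta\,\partial\theta'}\Big) = E\Big(\frac{\partial \log L(\theta)}{\partial\theta}\frac{\partial \log L(\theta)}{\partial\theta'}\Big) = V\Big(\frac{\partial \log L(\theta)}{\partial\theta}\Big) \equiv I(\theta),$$

where the second equality holds because $E\big(\partial \log L(\theta)/\partial\theta\big) = 0$.

$I(\theta)$ is called Fisher's information matrix (or simply, the information matrix).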
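An exact check of the two identities $E(\partial \log f/\partial\theta) = 0$ and $E\big((\partial \log f/\partial\theta)^2\big) = -E(\partial^2 \log f/\partial\theta^2)$, assuming a single Bernoulli($p$) observation, $\log f(x; p) = x \log p + (1 - x)\log(1 - p)$ for $x \in \{0, 1\}$ (an illustrative model, not one used in the notes). Rational arithmetic makes the equalities exact; both sides equal $1/\big(p(1 - p)\big)$. Function names are hypothetical.

```python
from fractions import Fraction

# Illustrative assumption: one Bernoulli(p) observation,
# log f(x; p) = x*log(p) + (1 - x)*log(1 - p), x in {0, 1}.


def score(x: int, p: Fraction) -> Fraction:
    """First derivative of log f(x; p) with respect to p."""
    return Fraction(x) / p - Fraction(1 - x) / (1 - p)


def hessian(x: int, p: Fraction) -> Fraction:
    """Second derivative of log f(x; p) with respect to p."""
    return -Fraction(x) / p**2 - Fraction(1 - x) / (1 - p) ** 2


def expect(func, p: Fraction) -> Fraction:
    """E[h(X)] = h(1) * p + h(0) * (1 - p)."""
    return func(1, p) * p + func(0, p) * (1 - p)


p = Fraction(3, 10)
print("E(score)     =", expect(score, p))                              # 0
print("E(score^2)   =", expect(lambda x, q: score(x, q) ** 2, p))      # 100/21
print("-E(Hessian)  =", -expect(hessian, p))                           # 100/21
print("1/(p(1 - p)) =", 1 / (p * (1 - p)))                             # 100/21
```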

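Thus, the first derivative of $\log L(\theta)$ is distributed with mean zero and variance $I(\theta)$, i.e.,

$$\frac{\partial \log L(\theta)}{\partial\theta} = \sum_{i=1}^n \frac{\partial \log f(X_i; \theta)}{\partial\theta} \sim \big(0, I(\theta)\big).$$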


Note that we do not know the exact distribution of the first derivative of $\log L(\theta)$, only its mean and variance, because we do not specify the functional form of $f(\cdot)$.

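Using the central limit theorem (generalization) shown above, asymptotically we obtain the following distribution:

$$\frac{1}{\sqrt{n}}\frac{\partial \log L(\theta)}{\partial\theta} = \frac{1}{\sqrt{n}}\sum_{i=1}^n \frac{\partial \log f(X_i; \theta)}{\partial\theta} \longrightarrow N(0, \Sigma), \quad \text{where} \quad \Sigma = \lim_{n\to\infty}\Big(\frac{1}{n}I(\theta)\Big).$$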

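Let $\tilde\theta$ be the maximum likelihood estimator.

Linearizing $\partial \log L(\tilde\theta)/\partial\theta$ around $\tilde\theta = \theta$, we obtain:

$$0 = \frac{1}{\sqrt{n}}\frac{\partial \log L(\tilde\theta)}{\partial\theta} \approx \frac{1}{\sqrt{n}}\frac{\partial \log L(\theta)}{\partial\theta} + \frac{1}{\sqrt{n}}\frac{\partial^2 \log L(\theta)}{\partial\theta\,\partial\theta'}(\tilde\theta - \theta),$$

where the remaining terms (i.e., the second-order term, the third-order term, ...) are ignored. This implies that the distribution of $\frac{1}{\sqrt{n}}\frac{\partial \log L(\theta)}{\partial\theta}$ is asymptotically equivalent to that of $-\frac{1}{\sqrt{n}}\frac{\partial^2 \log L(\theta)}{\partial\theta\,\partial\theta'}(\tilde\theta - \theta)$.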

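We already know the distribution of $\frac{1}{\sqrt{n}}\frac{\partial \log L(\theta)}{\partial\theta}$, so we have:

$$\frac{1}{\sqrt{n}}\frac{\partial \log L(\theta)}{\partial\theta} \approx -\frac{1}{\sqrt{n}}\frac{\partial^2 \log L(\theta)}{\partial\theta\,\partial\theta'}(\tilde\theta - \theta) = \Big(-\frac{1}{n}\frac{\partial^2 \log L(\theta)}{\partial\theta\,\partial\theta'}\Big)\sqrt{n}(\tilde\theta - \theta) \longrightarrow N(0, \Sigma).$$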

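Note as follows (convergence in probability, by a law of large numbers):

$$-\frac{1}{n}\frac{\partial^2 \log L(\theta)}{\partial\theta\,\partial\theta'} \longrightarrow \lim_{n\to\infty}\frac{1}{n}E\Big(-\frac{\partial^2 \log L(\theta)}{\partial\theta\,\partial\theta'}\Big) = \lim_{n\to\infty}\frac{1}{n}I(\theta) = \Sigma.$$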

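Thus, $\Big(-\frac{1}{n}\frac{\partial^2 \log L(\theta)}{\partial\theta\,\partial\theta'}\Big)\sqrt{n}(\tilde\theta - \theta)$ asymptotically has the same distribution as $\Sigma\sqrt{n}(\tilde\theta - \theta)$.

Therefore,

$$V\big(\Sigma\sqrt{n}(\tilde\theta - \theta)\big) = \Sigma\, V\big(\sqrt{n}(\tilde\theta - \theta)\big)\,\Sigma' \longrightarrow \Sigma.$$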


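Note that $\Sigma = \Sigma'$. Thus, we have the asymptotic variance of $\sqrt{n}(\tilde\theta - \theta)$ as follows:

$$V\big(\sqrt{n}(\tilde\theta - \theta)\big) \longrightarrow \Sigma^{-1}\Sigma\Sigma^{-1} = \Sigma^{-1}.$$

Finally, we obtain:

$$\sqrt{n}(\tilde\theta - \theta) \longrightarrow N(0, \Sigma^{-1}).$$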
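A simulation sketch of the result, assuming $X_i$ are independent draws from the exponential density $f(x; \lambda) = \lambda e^{-\lambda x}$ (an illustrative model). Then $\log L(\lambda) = n\log\lambda - \lambda\sum_i x_i$, the MLE is $\tilde\lambda = 1/\overline{X}$, $I(\lambda) = n/\lambda^2$, so $\Sigma = 1/\lambda^2$ and the limiting variance of $\sqrt{n}(\tilde\lambda - \lambda)$ should be $\Sigma^{-1} = \lambda^2$. Function names are hypothetical.

```python
import random
import statistics

# Illustrative assumption: X_i ~ Exponential with density f(x; lam) = lam * exp(-lam * x).
# Then the MLE is lam_tilde = 1 / Xbar, I(lam) = n / lam**2, Sigma = 1 / lam**2,
# and the asymptotic variance Sigma^{-1} = lam**2.
LAM = 2.0


def mle(sample: list[float]) -> float:
    """Solves d log L / d lam = n / lam - sum(x_i) = 0."""
    return 1.0 / statistics.fmean(sample)


def scaled_mle_errors(n: int, reps: int = 2000, seed: int = 5) -> list[float]:
    """Simulate reps draws of sqrt(n) * (lam_tilde - lam)."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        sample = [rng.expovariate(LAM) for _ in range(n)]
        out.append(n**0.5 * (mle(sample) - LAM))
    return out


z = scaled_mle_errors(n=400)
print("variance of sqrt(n)(lam_tilde - lam) ~", round(statistics.variance(z), 3))
print("Sigma^{-1} = lam^2 =", LAM**2)
```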


11 Consistency and Asymptotic Normality of OLSE

Regression model: $y = X\beta + u$, $u \sim (0, \sigma^2 I_n)$.

Consistency:

1. Let $\hat\beta_n = (X'X)^{-1}X'y$ be the OLS estimator with sample size $n$.

Consistency: as $n$ becomes large, $\hat\beta_n$ converges to $\beta$ in probability.

2. Assume the stationarity condition for $X$, i.e., $\frac{1}{n}X'X \longrightarrow M_{xx}$, where $M_{xx}$ is nonsingular, and no correlation between $X$ and $u$, i.e.,

$$\frac{1}{n}X'u \longrightarrow 0.$$


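3. Note that $\frac{1}{n}X'X \longrightarrow M_{xx}$ results in $\big(\frac{1}{n}X'X\big)^{-1} \longrightarrow M_{xx}^{-1}$.

$\Longrightarrow$ Slutsky's Theorem

(*) Slutsky's Theorem: $g(\hat\theta) \longrightarrow g(\theta)$ when $\hat\theta \longrightarrow \theta$, for a continuous function $g(\cdot)$.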

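4. The OLSE is given by:

$$\hat\beta_n = \beta + (X'X)^{-1}X'u = \beta + \Big(\frac{1}{n}X'X\Big)^{-1}\Big(\frac{1}{n}X'u\Big).$$

Therefore,

$$\hat\beta_n \longrightarrow \beta + M_{xx}^{-1} \times 0 = \beta.$$

Thus, the OLSE is a consistent estimator.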
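A simulation sketch of OLS consistency, assuming $y_i = 1.0 + 0.5x_i + u_i$ with $x_i \sim$ Uniform(0, 10) and $u_i \sim$ Uniform(-1, 1) independent of $x_i$ (illustrative values), so that $\frac{1}{n}X'u \to 0$ and $\frac{1}{n}X'X \to M_{xx}$. It computes $\hat\beta_n = (X'X)^{-1}X'y$ for rows $x_i = (1, x_i)$ using the closed-form inverse of the $2 \times 2$ matrix $X'X$. The function name is hypothetical.

```python
import random

# Illustrative assumption: y_i = 1.0 + 0.5 * x_i + u_i with x_i ~ Uniform(0, 10) and
# u_i ~ Uniform(-1, 1) independent of x_i.
BETA = (1.0, 0.5)


def ols_intercept_slope(n: int, seed: int = 6) -> tuple[float, float]:
    """beta_hat = (X'X)^{-1} X'y for X with rows (1, x_i), via the closed-form 2x2 inverse."""
    rng = random.Random(seed)
    sx = sxx = sy = sxy = 0.0
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        y = BETA[0] + BETA[1] * x + rng.uniform(-1.0, 1.0)
        sx += x
        sxx += x * x
        sy += y
        sxy += x * y
    # X'X = [[n, sx], [sx, sxx]] and X'y = [sy, sxy].
    det = n * sxx - sx * sx
    b0 = (sxx * sy - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1


for n in [20, 2000, 200000]:
    b0, b1 = ols_intercept_slope(n)
    print(f"n={n:6d}  beta_hat=({b0:.4f}, {b1:.4f})  beta=(1.0, 0.5)")
```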


Asymptotic Normality:

1. Asymptotic Normality of OLSE

$$\sqrt{n}(\hat\beta_n - \beta) \longrightarrow N(0, \sigma^2 M_{xx}^{-1}), \quad \text{when } n \longrightarrow \infty.$$

2. Central Limit Theorem: Greenberg and Webster (1983)

$Z_1, Z_2, \cdots, Z_n$ are mutually independent. $Z_i$ is distributed with mean $\mu$ and variance $\Sigma_i$ for $i = 1, 2, \cdots, n$.

Then, we have the following result:

$$\frac{1}{\sqrt{n}}\sum_{i=1}^n (Z_i - \mu) \longrightarrow N(0, \Sigma), \quad \text{where} \quad \Sigma = \lim_{n\to\infty}\Big(\frac{1}{n}\sum_{i=1}^n \Sigma_i\Big).$$

Note that the distribution of $Z_i$ is not assumed.


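3. Define $Z_i = x_i'u_i$. Then, $\Sigma_i = V(Z_i) = \sigma^2 x_i'x_i$.

4. $\Sigma$ is defined as:

$$\Sigma = \lim_{n\to\infty}\Big(\frac{1}{n}\sum_{i=1}^n \sigma^2 x_i'x_i\Big) = \sigma^2\lim_{n\to\infty}\Big(\frac{1}{n}X'X\Big) = \sigma^2 M_{xx},$$

where $x_i$ denotes the $i$th row of $X$, i.e.,

$$X = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}.$$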

5. Applying the central limit theorem of Greenberg and Webster (1983), we obtain the following:

$$\frac{1}{\sqrt{n}}\sum_{i=1}^n x_i'u_i = \frac{1}{\sqrt{n}}X'u \longrightarrow N(0, \sigma^2 M_{xx}).$$


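On the other hand, from $\hat\beta_n = \beta + (X'X)^{-1}X'u$, we can rewrite as:

$$\sqrt{n}(\hat\beta_n - \beta) = \Big(\frac{1}{n}X'X\Big)^{-1}\frac{1}{\sqrt{n}}X'u.$$

Its variance is:

$$\begin{aligned}
V\Big(\Big(\tfrac{1}{n}X'X\Big)^{-1}\tfrac{1}{\sqrt{n}}X'u\Big)
&= E\bigg(\Big(\tfrac{1}{n}X'X\Big)^{-1}\tfrac{1}{\sqrt{n}}X'u\,\Big(\Big(\tfrac{1}{n}X'X\Big)^{-1}\tfrac{1}{\sqrt{n}}X'u\Big)'\bigg) \\
&= \Big(\tfrac{1}{n}X'X\Big)^{-1}\tfrac{1}{n}X'E(uu')X\Big(\tfrac{1}{n}X'X\Big)^{-1} \\
&= \sigma^2\Big(\tfrac{1}{n}X'X\Big)^{-1}\Big(\tfrac{1}{n}X'X\Big)\Big(\tfrac{1}{n}X'X\Big)^{-1} \\
&\longrightarrow \sigma^2 M_{xx}^{-1}M_{xx}M_{xx}^{-1} = \sigma^2 M_{xx}^{-1}.
\end{aligned}$$

Therefore,

$$\sqrt{n}(\hat\beta_n - \beta) \longrightarrow N(0, \sigma^2 M_{xx}^{-1}).$$

$\Longrightarrow$ Asymptotic normality of OLSE. Note that the distribution of $u_i$ is not assumed.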
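A simulation sketch of asymptotic normality with deliberately non-normal errors, assuming $y_i = 1.0 + 0.5x_i + u_i$, $x_i \sim$ Uniform(0, 10) and $u_i \sim$ Uniform(-1, 1), so $\sigma^2 = 1/3$ (illustrative values). For rows $(1, x_i)$, $M_{xx} = \begin{pmatrix} 1 & 5 \\ 5 & 100/3 \end{pmatrix}$, whose inverse has $(2,2)$ element $1/V(x_i) = 0.12$, so $\sqrt{n}(\hat\beta_{2} - 0.5)$ should have variance near $\sigma^2 \times 0.12 = 0.04$. The slope is computed with the centered formula, which gives the same slope component as $(X'X)^{-1}X'y$. The function name is hypothetical.

```python
import random
import statistics

# Illustrative assumption: y_i = 1.0 + 0.5 * x_i + u_i, x_i ~ Uniform(0, 10), and non-normal
# errors u_i ~ Uniform(-1, 1) with sigma^2 = 1/3. The (2,2) element of sigma^2 * M_xx^{-1}
# is sigma^2 / Var(x_i) = (1/3) / (100/12) = 0.04.
SIGMA2, VAR_X = 1.0 / 3.0, 100.0 / 12.0


def slope_error(n: int, rng: random.Random) -> float:
    """Return sqrt(n) * (b1_hat - 0.5) for one simulated sample of size n."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [1.0 + 0.5 * x + rng.uniform(-1.0, 1.0) for x in xs]
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return n**0.5 * (sxy / sxx - 0.5)


rng = random.Random(7)
errors = [slope_error(200, rng) for _ in range(2000)]
print("simulated variance ~", round(statistics.variance(errors), 4))
print("sigma^2 * [M_xx^{-1}]_22 =", round(SIGMA2 / VAR_X, 4))
```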