10 Asymptotic Theory
1. Definition: Convergence in Distribution (分布収束)
A sequence of random variables $X_1, X_2, \cdots, X_n, \cdots$ has distribution functions $F_1, F_2, \cdots, F_n, \cdots$, respectively.
If
$$\lim_{n\to\infty} F_n(x) = F(x)$$
at every point $x$ where $F$ is continuous, then we say that the sequence $X_1, X_2, \cdots$ converges to $F$ in distribution.
2. Consistency (一致性):
(a) Definition: Convergence in Probability (確率収束)
Let $\{Z_n : n = 1, 2, \cdots\}$ be a sequence of random variables. If the following holds,
$$\lim_{n\to\infty} P(|Z_n - \theta| < \epsilon) = 1$$
for any positive $\epsilon$, then we say that $Z_n$ converges to $\theta$ in probability.
$\theta$ is called a probability limit (確率極限) of $Z_n$, written $\operatorname{plim} Z_n = \theta$.
(b) Let $\hat\theta_n$ be an estimator of a parameter $\theta$.
If $\hat\theta_n$ converges to $\theta$ in probability, we say that $\hat\theta_n$ is a consistent estimator of $\theta$.
3. A General Case of Chebyshev's Inequality:
For $g(X) \ge 0$,
$$P(g(X) \ge k) \le \frac{E(g(X))}{k},$$
where $k$ is a positive constant.
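As a quick sanity check of the inequality, the sketch below computes both sides exactly for a discrete random variable. The fair die, the choice $g(X) = X$, $k = 5$, and the helper name `markov_check` are illustrative assumptions, not part of the notes.

```python
from fractions import Fraction

def markov_check(values, probs, g, k):
    """Return (P(g(X) >= k), E(g(X)) / k) for a discrete X; g must be non-negative."""
    tail = sum(p for x, p in zip(values, probs) if g(x) >= k)
    mean_g = sum(p * g(x) for x, p in zip(values, probs))
    return tail, mean_g / k

# Hypothetical example: fair six-sided die, g(X) = X, k = 5
die = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6
tail, bound = markov_check(die, probs, lambda x: x, 5)
print(tail, bound)  # 1/3 7/10
```

Exact fractions avoid rounding, so the comparison $1/3 \le 7/10$ is checked without floating-point error.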
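4. Example: For a random vector $X$ with $E(X) = \mu$ and $V(X) = \Sigma$, set $g(X) = (X-\mu)'(X-\mu)$.
Then, we have the following inequality:
$$P\big((X-\mu)'(X-\mu) \ge k\big) \le \frac{\operatorname{tr}(\Sigma)}{k}.$$
This holds because a scalar equals its own trace, $\operatorname{tr}(AB) = \operatorname{tr}(BA)$, and the trace is linear (so it commutes with $E$):
$$E\big((X-\mu)'(X-\mu)\big) = E\Big(\operatorname{tr}\big((X-\mu)'(X-\mu)\big)\Big) = E\Big(\operatorname{tr}\big((X-\mu)(X-\mu)'\big)\Big) = \operatorname{tr}\Big(E\big((X-\mu)(X-\mu)'\big)\Big) = \operatorname{tr}(\Sigma).$$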
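The identity $E((X-\mu)'(X-\mu)) = \operatorname{tr}(\Sigma)$ and the resulting bound can be verified exactly on a small example. The four-point bivariate distribution and the threshold $k = 9$ below are hypothetical choices made only for illustration.

```python
from fractions import Fraction

# Hypothetical bivariate discrete distribution: ((x1, x2), probability)
dist = [((0, 0), Fraction(1, 4)), ((1, 2), Fraction(1, 4)),
        ((2, 1), Fraction(1, 4)), ((3, 5), Fraction(1, 4))]

mu = [sum(p * x[j] for x, p in dist) for j in range(2)]

# Sigma[j][l] = E[(X_j - mu_j)(X_l - mu_l)]
sigma = [[sum(p * (x[j] - mu[j]) * (x[l] - mu[l]) for x, p in dist)
          for l in range(2)] for j in range(2)]

def sq_dist(x):
    """(x - mu)'(x - mu): squared Euclidean distance from the mean."""
    return sum((x[j] - mu[j]) ** 2 for j in range(2))

lhs = sum(p * sq_dist(x) for x, p in dist)   # E[(X - mu)'(X - mu)]
rhs = sigma[0][0] + sigma[1][1]              # tr(Sigma)

k = 9
tail = sum(p for x, p in dist if sq_dist(x) >= k)  # P((X - mu)'(X - mu) >= k)
print(lhs, rhs, tail, rhs / k)  # 19/4 19/4 1/4 19/36
```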
5. Example 1 (Univariate Case):
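Suppose that $X_i \sim (\mu, \sigma^2)$, $i = 1, 2, \cdots, n$, are mutually independent.
Then, the sample average $\bar X$ is a consistent estimator of $\mu$.
Proof:
Apply Chebyshev's inequality with $g(\bar X) = (\bar X - \mu)^2$ and $k = \epsilon^2$, noting that $E(g(\bar X)) = V(\bar X) = \sigma^2/n$.
As $n \longrightarrow \infty$,
$$P(|\bar X - \mu| \ge \epsilon) \le \frac{\sigma^2}{n\epsilon^2} \longrightarrow 0$$
for any $\epsilon > 0$. That is, for any $\epsilon > 0$,
$$\lim_{n\to\infty} P(|\bar X - \mu| < \epsilon) = 1.$$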
=⇒ Chebyshev’s inequality
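A small Monte Carlo sketch of Example 1, assuming $X_i \sim$ Uniform(0, 1) (so $\mu = 1/2$, $\sigma^2 = 1/12$) and $\epsilon = 0.05$; these choices, the seed, and the replication count are illustrative. It estimates $P(|\bar X - \mu| \ge \epsilon)$ for growing $n$ and prints it next to the Chebyshev bound $\sigma^2/(n\epsilon^2)$.

```python
import random

def tail_prob(n, eps, reps=1000, seed=0):
    """Monte Carlo estimate of P(|Xbar - mu| >= eps), Xbar the mean of n Uniform(0,1) draws."""
    rng = random.Random(seed)
    mu = 0.5
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.random() for _ in range(n)) / n
        if abs(xbar - mu) >= eps:
            hits += 1
    return hits / reps

sigma2 = 1 / 12          # variance of Uniform(0, 1)
eps = 0.05
results = {}
for n in (10, 100, 1000):
    bound = sigma2 / (n * eps ** 2)   # Chebyshev bound sigma^2 / (n eps^2)
    results[n] = (tail_prob(n, eps), bound)
    print(n, results[n])
```

The bound exceeds 1 for $n = 10$, so it is uninformative there; it is loose but valid, and both the estimate and the bound shrink toward zero as $n$ grows.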
6. Example 2 (Multivariate Case):
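Suppose that $X_i \sim (\mu, \Sigma)$, $i = 1, 2, \cdots, n$, are mutually independent.
Then, the sample average $\bar X$ is a consistent estimator of $\mu$.
Proof:
Apply Chebyshev's inequality with $g(\bar X) = (\bar X - \mu)'(\bar X - \mu)$ and $k = \epsilon^2$, noting that $E(g(\bar X)) = \operatorname{tr}\big(V(\bar X)\big) = \operatorname{tr}\big(\frac{1}{n}\Sigma\big)$.
As $n \longrightarrow \infty$,
$$P\big((\bar X - \mu)'(\bar X - \mu) \ge k\big) = P(|\bar X - \mu| \ge \epsilon) \le \frac{\operatorname{tr}(\Sigma)}{n\epsilon^2} \longrightarrow 0$$
for any positive $\epsilon$. That is, for any positive $\epsilon$, $\lim_{n\to\infty} P\big((\bar X - \mu)'(\bar X - \mu) < k\big) = 1$.
Note that $|\bar X - \mu| = \sqrt{(\bar X - \mu)'(\bar X - \mu)}$, which is the (Euclidean) distance between $\bar X$ and $\mu$.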
=⇒ Chebyshev’s inequality
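The multivariate version can be sketched the same way. Here the data-generating process is an assumption chosen for illustration: $X_i = (U_1, U_1 + U_2)$ with independent Uniform(0, 1) components, so $\mu = (1/2, 1)$ and $\operatorname{tr}(\Sigma) = 1/12 + 2/12$. The estimate of $P(|\bar X - \mu| \ge \epsilon)$ uses the Euclidean distance.

```python
import math
import random

def tail_prob_2d(n, eps, reps=500, seed=1):
    """Monte Carlo estimate of P(|Xbar - mu| >= eps) for X_i = (U1, U1 + U2)."""
    rng = random.Random(seed)
    mu = (0.5, 1.0)
    hits = 0
    for _ in range(reps):
        s1 = s2 = 0.0
        for _ in range(n):
            u1, u2 = rng.random(), rng.random()
            s1 += u1
            s2 += u1 + u2
        # Euclidean distance |Xbar - mu| = sqrt((Xbar - mu)'(Xbar - mu))
        if math.hypot(s1 / n - mu[0], s2 / n - mu[1]) >= eps:
            hits += 1
    return hits / reps

trace_sigma = 1 / 12 + 2 / 12   # tr(Sigma) = Var(U1) + Var(U1 + U2)
eps = 0.1
results = {}
for n in (10, 100, 1000):
    results[n] = (tail_prob_2d(n, eps), trace_sigma / (n * eps ** 2))
    print(n, results[n])
```

Only the diagonal of $\Sigma$ enters the bound, even though the two components are correlated.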
7. Some Formulas:
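Let $X_n$ and $Y_n$ be random variables which satisfy $\operatorname{plim} X_n = c$ and $\operatorname{plim} Y_n = d$. Then,
(a) $\operatorname{plim}(X_n + Y_n) = c + d$
(b) $\operatorname{plim} X_n Y_n = cd$
(c) $\operatorname{plim} X_n / Y_n = c/d$ for $d \ne 0$
(d) $\operatorname{plim} g(X_n) = g(c)$ for a function $g(\cdot)$ that is continuous at $c$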
=⇒ Slutsky’s Theorem (スルツキー定理)
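To illustrate these rules numerically, the sketch below takes $X_n$ to be the mean of $n$ Exponential draws with mean $c = 2$ and $Y_n$ the mean of $n$ Uniform(0, 2) draws ($d = 1$), and uses $g = \sqrt{\cdot}$ for rule (d). All of these are illustrative assumptions.

```python
import math
import random

rng = random.Random(42)

def plims(n):
    """Return (X_n + Y_n, X_n * Y_n, X_n / Y_n, sqrt(X_n)) for sample means of size n."""
    xbar = sum(rng.expovariate(0.5) for _ in range(n)) / n   # expovariate(0.5) has mean 2: plim = c = 2
    ybar = sum(rng.uniform(0.0, 2.0) for _ in range(n)) / n  # plim = d = 1
    return xbar + ybar, xbar * ybar, xbar / ybar, math.sqrt(xbar)

s, prod, ratio, root = plims(100_000)
print(s, prod, ratio, root)  # close to 3, 2, 2, sqrt(2)
```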
8. Central Limit Theorem (中心極限定理)
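Univariate Case: $X_1, X_2, \cdots, X_n$ are mutually independent and identically distributed as $X_i \sim (\mu, \sigma^2)$.
Then,
$$\frac{\bar X - E(\bar X)}{\sqrt{V(\bar X)}} = \frac{\bar X - \mu}{\sigma/\sqrt{n}} \longrightarrow N(0, 1),$$
which implies
$$\sqrt{n}(\bar X - \mu) = \frac{1}{\sqrt{n}} \sum_{i=1}^n (X_i - \mu) \longrightarrow N(0, \sigma^2).$$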
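A simulation sketch of the univariate case, assuming skewed Exponential(1) data ($\mu = 1$, $\sigma^2 = 1$), $n = 200$ and 2000 replications (all illustrative). If $\sqrt{n}(\bar X - \mu)$ is approximately $N(0, \sigma^2)$, its sample mean should be near 0, its sample variance near 1, and about 95% of draws should fall within $\pm 1.96\sigma$.

```python
import math
import random
import statistics

def clt_draws(n, reps, seed=3):
    """Draws of sqrt(n) * (Xbar - mu) with X_i i.i.d. Exponential(1) (mu = 1, sigma^2 = 1)."""
    rng = random.Random(seed)
    mu = 1.0
    return [math.sqrt(n) * (sum(rng.expovariate(1.0) for _ in range(n)) / n - mu)
            for _ in range(reps)]

sigma = 1.0
z = clt_draws(n=200, reps=2000)
coverage = sum(abs(v) <= 1.96 * sigma for v in z) / len(z)  # about 0.95 under N(0, sigma^2)
print(statistics.mean(z), statistics.variance(z), coverage)
```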
Multivariate Case: $X_1, X_2, \cdots, X_n$ are mutually independent and identically distributed as $X_i \sim (\mu, \Sigma)$.
Then,
$$\frac{1}{\sqrt{n}} \sum_{i=1}^n (X_i - \mu) \longrightarrow N(0, \Sigma).$$
9. Central Limit Theorem (Generalization)
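$X_1, X_2, \cdots, X_n$ are mutually independently distributed as $X_i \sim (\mu, \Sigma_i)$, i.e., with a common mean but possibly different variances.
Then,
$$\frac{1}{\sqrt{n}} \sum_{i=1}^n (X_i - \mu) \longrightarrow N(0, \Sigma), \quad \text{where} \quad \Sigma = \lim_{n\to\infty} \Big(\frac{1}{n} \sum_{i=1}^n \Sigma_i\Big).$$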
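A sketch of the generalized case with non-identical variances. The setup is an assumption: $\operatorname{Var}(X_i)$ alternates between 1 and 3, and the noise is Uniform$(-\sqrt{3}, \sqrt{3})$ (mean 0, variance 1) scaled by $\sigma_i$. Then $\frac{1}{n}\sum_i \Sigma_i = 2$, so the scaled sum should look like $N(0, 2)$.

```python
import math
import random
import statistics

def scaled_sum(n, rng):
    """(1/sqrt(n)) * sum of (X_i - mu) for independent, non-identically distributed X_i."""
    total = 0.0
    for i in range(n):
        sigma_i = 1.0 if i % 2 == 0 else math.sqrt(3.0)  # Var(X_i) alternates 1, 3
        e = rng.uniform(-math.sqrt(3.0), math.sqrt(3.0))  # mean 0, variance 1
        total += sigma_i * e                             # this is X_i - mu
    return total / math.sqrt(n)

n = 200
avg_var = sum(1.0 if i % 2 == 0 else 3.0 for i in range(n)) / n  # (1/n) sum of Sigma_i
rng = random.Random(7)
draws = [scaled_sum(n, rng) for _ in range(2000)]
coverage = sum(abs(d) <= 1.96 * math.sqrt(avg_var) for d in draws) / len(draws)
print(avg_var, statistics.mean(draws), statistics.variance(draws), coverage)
```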
10. Definition: Let $\hat\theta_n$ be a consistent estimator of $\theta$.
Suppose that $\sqrt{n}(\hat\theta_n - \theta)$ converges to $N(0, \Sigma)$ in distribution.
Then, we say that $\hat\theta_n$ has the asymptotic distribution (漸近分布) $N(\theta, \Sigma/n)$.
10.1 MLE: Asymptotic Properties
1. $X_1, X_2, \cdots, X_n$ are mutually independent random variables, each with density function $f(x; \theta)$.
Let $\hat\theta_n$ be the maximum likelihood estimator of $\theta$.
Then, under some regularity conditions, $\hat\theta_n$ is a consistent estimator of $\theta$ and the asymptotic distribution of $\sqrt{n}(\hat\theta_n - \theta)$ is given by:
$$N\Big(0, \Big(\lim_{n\to\infty} \frac{I(\theta)}{n}\Big)^{-1}\Big).$$
2. Regularity Conditions:
(a) The domain of $X_i$ does not depend on $\theta$.
(b) The derivatives of $f(x; \theta)$ with respect to $\theta$ exist up to at least the third order, and they are finite.
3. Thus, MLE is
(i) consistent,
(ii) asymptotically normal, and
(iii) asymptotically efficient.
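Proof: The log-likelihood function is given by:
$$\log L(\theta) = \log \prod_{i=1}^n f(X_i; \theta) = \sum_{i=1}^n \log f(X_i; \theta).$$
Note that the MLE $\tilde\theta$ satisfies:
$$\frac{\partial \log L(\tilde\theta)}{\partial \theta} = \sum_{i=1}^n \frac{\partial \log f(X_i; \tilde\theta)}{\partial \theta} = 0.$$
Here each $X_i$ is a random variable, so $\partial \log L(\theta)/\partial \theta$ is itself a random variable.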
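When the likelihood equation has no closed-form solution, it is typically solved numerically. The sketch below uses Newton-Raphson on a logistic location model, $f(x; \theta) = e^{-(x-\theta)}/(1 + e^{-(x-\theta)})^2$, which is an illustrative choice not taken from the notes; for this model $\partial \log f/\partial \theta = \tanh((x-\theta)/2)$. The sample size, seed, and true $\theta = 1.5$ are also assumptions.

```python
import math
import random

def score(theta, xs):
    """d log L / d theta for the logistic location model: sum of tanh((x_i - theta) / 2)."""
    return sum(math.tanh((x - theta) / 2.0) for x in xs)

def hessian(theta, xs):
    """d^2 log L / d theta^2 = -(1/2) * sum of sech^2((x_i - theta) / 2); always negative."""
    return -0.5 * sum(1.0 / math.cosh((x - theta) / 2.0) ** 2 for x in xs)

def logistic_mle(xs, tol=1e-10, max_iter=100):
    """Solve the likelihood equation score(theta) = 0 by Newton-Raphson."""
    theta = sorted(xs)[len(xs) // 2]      # start from the sample median
    for _ in range(max_iter):
        step = score(theta, xs) / hessian(theta, xs)
        theta -= step
        if abs(step) < tol:
            break
    return theta

rng = random.Random(11)
true_theta = 1.5
# Inverse-CDF draws from the logistic distribution centred at true_theta
us = [rng.uniform(1e-12, 1.0 - 1e-12) for _ in range(500)]
xs = [true_theta + math.log(u / (1.0 - u)) for u in us]
theta_hat = logistic_mle(xs)
print(theta_hat, score(theta_hat, xs))
```

Because the Hessian is negative everywhere, the log-likelihood is strictly concave in $\theta$. Starting from the median keeps Newton's method close to the root.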
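On the other hand, the integral of $L(\theta)$ with respect to $x = (x_1, x_2, \cdots, x_n)$ is one, because $L(\theta)$ is the joint density of $x_1, x_2, \cdots, x_n$. Therefore, we have:
$$\int L(\theta)\,dx = 1.$$
Taking the first derivative of both sides of the above equation with respect to $\theta$, we obtain:
$$\int \frac{\partial L(\theta)}{\partial \theta}\,dx = 0,$$
which is rewritten as:
$$\int \frac{\partial L(\theta)}{\partial \theta}\,dx = \int \frac{\partial \log L(\theta)}{\partial \theta} L(\theta)\,dx = E\Big(\frac{\partial \log L(\theta)}{\partial \theta}\Big) = 0.$$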
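Taking the derivative with respect to $\theta$ again (i.e., the second derivative of both sides of $\int L(\theta)\,dx = 1$ with respect to $\theta$), we have:
$$\int \frac{\partial^2 \log L(\theta)}{\partial \theta \partial \theta'} L(\theta)\,dx + \int \frac{\partial \log L(\theta)}{\partial \theta} \frac{\partial \log L(\theta)}{\partial \theta'} L(\theta)\,dx = 0,$$
which is rewritten as follows:
$$-\int \frac{\partial^2 \log L(\theta)}{\partial \theta \partial \theta'} L(\theta)\,dx = \int \frac{\partial \log L(\theta)}{\partial \theta} \frac{\partial \log L(\theta)}{\partial \theta'} L(\theta)\,dx.$$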
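That is, we can derive the following:
$$-E\Big(\frac{\partial^2 \log L(\theta)}{\partial \theta \partial \theta'}\Big) = E\Big(\frac{\partial \log L(\theta)}{\partial \theta} \frac{\partial \log L(\theta)}{\partial \theta'}\Big) = V\Big(\frac{\partial \log L(\theta)}{\partial \theta}\Big) \equiv I(\theta),$$
where the second equality holds because $E\big(\partial \log L(\theta)/\partial \theta\big) = 0$.
$I(\theta)$ is called Fisher's information matrix (or simply, the information matrix).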
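The two expressions for $I(\theta)$ can be checked exactly for a single Bernoulli($p$) observation, an illustrative model not used in the notes. There $\log f(x; p) = x \log p + (1-x)\log(1-p)$, so the score is $x/p - (1-x)/(1-p)$. The sketch confirms $E(\text{score}) = 0$ and $E(\text{score}^2) = -E(\text{Hessian}) = 1/(p(1-p))$ at the assumed value $p = 3/10$.

```python
from fractions import Fraction

p = Fraction(3, 10)
support = [(1, p), (0, 1 - p)]  # Bernoulli(p): (value, probability)

def score(x):
    """d log f(x; p) / dp."""
    return Fraction(x) / p - Fraction(1 - x) / (1 - p)

def hess(x):
    """d^2 log f(x; p) / dp^2."""
    return -Fraction(x) / p ** 2 - Fraction(1 - x) / (1 - p) ** 2

mean_score = sum(prob * score(x) for x, prob in support)
outer = sum(prob * score(x) ** 2 for x, prob in support)   # E[score^2]
neg_hess = -sum(prob * hess(x) for x, prob in support)     # -E[Hessian]
print(mean_score, outer, neg_hess)  # 0 100/21 100/21
```

For $n$ independent observations both sides are multiplied by $n$, giving $I(p) = n/(p(1-p))$.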
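Thus, the first derivative of $\log L(\theta)$ is distributed with mean zero and variance $I(\theta)$, i.e.,
$$\frac{\partial \log L(\theta)}{\partial \theta} = \sum_{i=1}^n \frac{\partial \log f(X_i; \theta)}{\partial \theta} \sim (0, I(\theta)).$$
Note that we do not know the exact distribution of the first derivative of $\log L(\theta)$, because the functional form of $f(\cdot)$ is not specified here.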
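Using the central limit theorem (generalization) shown above, asymptotically we obtain the following distribution:
$$\frac{1}{\sqrt{n}} \frac{\partial \log L(\theta)}{\partial \theta} = \frac{1}{\sqrt{n}} \sum_{i=1}^n \frac{\partial \log f(X_i; \theta)}{\partial \theta} \longrightarrow N(0, \Sigma), \quad \text{where} \quad \Sigma = \lim_{n\to\infty} \Big(\frac{1}{n} I(\theta)\Big).$$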
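Let $\tilde\theta$ be the maximum likelihood estimator.
Linearizing $\partial \log L(\tilde\theta)/\partial \theta$ around $\tilde\theta = \theta$, we obtain:
$$0 = \frac{1}{\sqrt{n}} \frac{\partial \log L(\tilde\theta)}{\partial \theta} \approx \frac{1}{\sqrt{n}} \frac{\partial \log L(\theta)}{\partial \theta} + \frac{1}{\sqrt{n}} \frac{\partial^2 \log L(\theta)}{\partial \theta \partial \theta'} (\tilde\theta - \theta),$$
where the remaining terms (i.e., the second-order term, the third-order term, ...) are ignored. This implies that the distribution of $\frac{1}{\sqrt{n}} \frac{\partial \log L(\theta)}{\partial \theta}$ is asymptotically equivalent to that of $-\frac{1}{\sqrt{n}} \frac{\partial^2 \log L(\theta)}{\partial \theta \partial \theta'} (\tilde\theta - \theta)$.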
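We already know the distribution of $\frac{1}{\sqrt{n}} \frac{\partial \log L(\theta)}{\partial \theta}$, so:
$$\frac{1}{\sqrt{n}} \frac{\partial \log L(\theta)}{\partial \theta} \approx -\frac{1}{\sqrt{n}} \frac{\partial^2 \log L(\theta)}{\partial \theta \partial \theta'} (\tilde\theta - \theta) = \Big(-\frac{1}{n} \frac{\partial^2 \log L(\theta)}{\partial \theta \partial \theta'}\Big) \sqrt{n}(\tilde\theta - \theta) \longrightarrow N(0, \Sigma).$$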
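Note as follows (the first convergence is in probability):
$$-\frac{1}{n} \frac{\partial^2 \log L(\theta)}{\partial \theta \partial \theta'} \longrightarrow \lim_{n\to\infty} \frac{1}{n} E\Big(-\frac{\partial^2 \log L(\theta)}{\partial \theta \partial \theta'}\Big) = \lim_{n\to\infty} \Big(\frac{1}{n} I(\theta)\Big) = \Sigma.$$
Thus, $\Big(-\frac{1}{n} \frac{\partial^2 \log L(\theta)}{\partial \theta \partial \theta'}\Big) \sqrt{n}(\tilde\theta - \theta)$ asymptotically has the same distribution as $\Sigma \sqrt{n}(\tilde\theta - \theta)$.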
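Therefore,
$$V\big(\Sigma \sqrt{n}(\tilde\theta - \theta)\big) = \Sigma\, V\big(\sqrt{n}(\tilde\theta - \theta)\big)\, \Sigma' \longrightarrow \Sigma.$$
Note that $\Sigma = \Sigma'$. Thus, we have the asymptotic variance of $\sqrt{n}(\tilde\theta - \theta)$ as follows:
$$V\big(\sqrt{n}(\tilde\theta - \theta)\big) \longrightarrow \Sigma^{-1} \Sigma \Sigma^{-1} = \Sigma^{-1}.$$
Finally, we obtain:
$$\sqrt{n}(\tilde\theta - \theta) \longrightarrow N(0, \Sigma^{-1}).$$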
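A simulation sketch of the final result, assuming an Exponential model $f(x; \lambda) = \lambda e^{-\lambda x}$ with $\lambda = 2$ (an illustrative choice whose domain does not depend on $\lambda$). The score $\sum_i (1/\lambda - x_i) = 0$ gives $\tilde\lambda = n/\sum_i x_i$, and $I(\lambda) = n/\lambda^2$, so $\Sigma = 1/\lambda^2$ and the predicted variance of $\sqrt{n}(\tilde\lambda - \lambda)$ is $\Sigma^{-1} = \lambda^2 = 4$.

```python
import math
import random
import statistics

def sqrt_n_mle_error(n, lam, rng):
    """sqrt(n) * (lam_hat - lam) for the exponential model f(x; lam) = lam * exp(-lam * x)."""
    xs = [rng.expovariate(lam) for _ in range(n)]
    # Score: sum(1/lam - x_i) = 0  =>  lam_hat = n / sum(x_i)
    lam_hat = n / sum(xs)
    return math.sqrt(n) * (lam_hat - lam)

lam = 2.0
sigma = 1.0 / lam ** 2          # Sigma = lim (1/n) I(lam), with I(lam) = n / lam^2
rng = random.Random(13)
draws = [sqrt_n_mle_error(400, lam, rng) for _ in range(2000)]
print(statistics.mean(draws), statistics.variance(draws), 1.0 / sigma)
```

The small positive mean reflects the finite-sample bias of $\tilde\lambda$, which vanishes after scaling as $n$ grows.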
11 Consistency and Asymptotic Normality of OLSE
Regression model: $y = X\beta + u$, $u \sim (0, \sigma^2 I_n)$.
Consistency:
1. Let $\hat\beta_n = (X'X)^{-1}X'y$ be the OLS estimator with sample size $n$.
Consistency: as $n$ becomes large, $\hat\beta_n$ converges to $\beta$ in probability.
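2. Assume the stationarity condition for $X$, i.e., $\frac{1}{n} X'X \longrightarrow M_{xx}$ with $M_{xx}$ nonsingular, and no correlation between $X$ and $u$, i.e.,
$$\frac{1}{n} X'u \longrightarrow 0.$$
3. Note that $\frac{1}{n} X'X \longrightarrow M_{xx}$ results in $\big(\frac{1}{n} X'X\big)^{-1} \longrightarrow M_{xx}^{-1}$.
=⇒ Slutsky's Theorem
(*) Slutsky's Theorem: $g(\hat\theta) \longrightarrow g(\theta)$ when $\hat\theta \longrightarrow \theta$, for a continuous function $g(\cdot)$.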
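4. OLS is given by:
$$\hat\beta_n = \beta + (X'X)^{-1}X'u = \beta + \Big(\frac{1}{n} X'X\Big)^{-1} \Big(\frac{1}{n} X'u\Big).$$
Therefore,
$$\hat\beta_n \longrightarrow \beta + M_{xx}^{-1} \times 0 = \beta.$$
Thus, OLSE is a consistent estimator.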
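A sketch of OLS consistency for a regression with an intercept and one regressor. The true $\beta = (1, 2)$, regressors drawn from Uniform(0, 3) independently of non-normal Uniform$(-\sqrt{3}, \sqrt{3})$ errors, and the sample sizes are all illustrative assumptions. The estimator solves the $2 \times 2$ normal equations $(X'X)b = X'y$ directly.

```python
import math
import random

def ols_2(xs, ys):
    """OLS for y = b0 + b1 * x, solving the 2x2 normal equations (X'X) b = X'y."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx              # det(X'X), nonzero unless all x are equal
    b0 = (sxx * sy - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1

rng = random.Random(5)
beta = (1.0, 2.0)
half_width = math.sqrt(3.0)              # Uniform(-sqrt(3), sqrt(3)) errors: mean 0, variance 1
estimates = {}
for n in (20, 200, 20000):
    xs = [rng.uniform(0.0, 3.0) for _ in range(n)]
    ys = [beta[0] + beta[1] * x + rng.uniform(-half_width, half_width) for x in xs]
    estimates[n] = ols_2(xs, ys)
    print(n, estimates[n])
```

Here $\frac{1}{n}X'X$ converges to $M_{xx} = \begin{pmatrix} 1 & 3/2 \\ 3/2 & 3 \end{pmatrix}$, which is nonsingular, so the assumptions above hold.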
Asymptotic Normality:
1. Asymptotic Normality of OLSE
$$\sqrt{n}(\hat\beta_n - \beta) \longrightarrow N(0, \sigma^2 M_{xx}^{-1}), \quad \text{when } n \longrightarrow \infty.$$
2. Central Limit Theorem: Greenberg and Webster (1983)
$Z_1, Z_2, \cdots, Z_n$ are mutually independent. $Z_i$ is distributed with mean $\mu$ and variance $\Sigma_i$ for $i = 1, 2, \cdots, n$.
Then, we have the following result:
$$\frac{1}{\sqrt{n}} \sum_{i=1}^n (Z_i - \mu) \longrightarrow N(0, \Sigma), \quad \text{where} \quad \Sigma = \lim_{n\to\infty} \Big(\frac{1}{n} \sum_{i=1}^n \Sigma_i\Big).$$
Note that the distribution of $Z_i$ is not assumed.
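3. Define $Z_i = x_i' u_i$, so that $E(Z_i) = 0$. Then, $\Sigma_i = V(Z_i) = \sigma^2 x_i' x_i$.
4. $\Sigma$ is defined as:
$$\Sigma = \lim_{n\to\infty} \Big(\frac{1}{n} \sum_{i=1}^n \sigma^2 x_i' x_i\Big) = \sigma^2 \lim_{n\to\infty} \Big(\frac{1}{n} X'X\Big) = \sigma^2 M_{xx},$$
where
$$X = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix},$$
i.e., $x_i$ is the $i$th row of $X$.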
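The step $\sum_i x_i' x_i = X'X$ can be checked on a small matrix. The $4 \times 2$ design matrix and $\sigma^2 = 2$ below are hypothetical values. The sketch compares $\frac{1}{n}\sum_i \sigma^2 x_i' x_i$ (a sum of outer products of rows) with $\sigma^2 \frac{1}{n} X'X$ computed by an explicit matrix product.

```python
from fractions import Fraction

# Hypothetical 4 x 2 design matrix: rows x_1, ..., x_4 (intercept + one regressor)
X = [[1, 2], [1, 0], [1, 5], [1, 3]]
sigma2 = 2
n, k = len(X), len(X[0])

def outer(row):
    """x_i' x_i: a k x k matrix, since x_i is a 1 x k row of X."""
    return [[row[a] * row[b] for b in range(k)] for a in range(k)]

def transpose_times(A):
    """A'A computed by explicit matrix multiplication."""
    At = [list(col) for col in zip(*A)]
    return [[sum(At[a][i] * A[i][b] for i in range(len(A))) for b in range(len(A[0]))]
            for a in range(len(At))]

# (1/n) * sum_i sigma^2 x_i' x_i  versus  sigma^2 * (1/n) X'X
avg_sigma_i = [[Fraction(sum(sigma2 * outer(r)[a][b] for r in X), n) for b in range(k)]
               for a in range(k)]
xtx = transpose_times(X)
scaled_xtx = [[Fraction(sigma2 * xtx[a][b], n) for b in range(k)] for a in range(k)]
print(avg_sigma_i == scaled_xtx, scaled_xtx)
```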
5. Applying the central limit theorem (Greenberg and Webster (1983)), we obtain the following:
$$\frac{1}{\sqrt{n}} \sum_{i=1}^n x_i' u_i = \frac{1}{\sqrt{n}} X'u \longrightarrow N(0, \sigma^2 M_{xx}).$$
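On the other hand, from $\hat\beta_n = \beta + (X'X)^{-1}X'u$, we can rewrite as:
$$\sqrt{n}(\hat\beta_n - \beta) = \Big(\frac{1}{n} X'X\Big)^{-1} \frac{1}{\sqrt{n}} X'u.$$
Its variance is
$$\begin{aligned}
V\Big(\Big(\frac{1}{n} X'X\Big)^{-1} \frac{1}{\sqrt{n}} X'u\Big)
&= E\Big(\Big(\frac{1}{n} X'X\Big)^{-1} \frac{1}{\sqrt{n}} X'u \Big(\Big(\frac{1}{n} X'X\Big)^{-1} \frac{1}{\sqrt{n}} X'u\Big)'\Big) \\
&= \Big(\frac{1}{n} X'X\Big)^{-1} \frac{1}{n} X' E(uu') X \Big(\frac{1}{n} X'X\Big)^{-1} \\
&= \sigma^2 \Big(\frac{1}{n} X'X\Big)^{-1} \Big(\frac{1}{n} X'X\Big) \Big(\frac{1}{n} X'X\Big)^{-1} \\
&\longrightarrow \sigma^2 M_{xx}^{-1} M_{xx} M_{xx}^{-1} = \sigma^2 M_{xx}^{-1}.
\end{aligned}$$
Therefore,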
$$\sqrt{n}(\hat\beta_n - \beta) \longrightarrow N(0, \sigma^2 M_{xx}^{-1})$$
=⇒ Asymptotic normality (漸近的正規性) of OLSE. The distribution of $u_i$ is not assumed.
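A simulation sketch of the result for a single regressor without intercept, under illustrative assumptions: fixed regressors cycling through 1, 2, 3 (so $\frac{1}{n}X'X = M_{xx} = 14/3$ whenever $n$ is a multiple of 3), true $\beta = 0.5$, and non-normal Uniform$(-\sqrt{3}, \sqrt{3})$ errors with $\sigma^2 = 1$. The predicted variance of $\sqrt{n}(\hat\beta_n - \beta)$ is then $\sigma^2 M_{xx}^{-1} = 3/14$.

```python
import math
import random
import statistics

def sqrt_n_error(n, rng, beta=0.5):
    """sqrt(n) * (b_hat - beta) for y_i = beta * x_i + u_i with OLS b_hat = sum(x y) / sum(x^2)."""
    xs = [1 + (i % 3) for i in range(n)]                  # fixed regressors 1, 2, 3, 1, 2, 3, ...
    # Non-normal errors: Uniform(-sqrt(3), sqrt(3)) has mean 0 and variance sigma^2 = 1
    us = [rng.uniform(-math.sqrt(3.0), math.sqrt(3.0)) for _ in range(n)]
    ys = [beta * x + u for x, u in zip(xs, us)]
    b_hat = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return math.sqrt(n) * (b_hat - beta)

predicted_var = 1.0 / (14.0 / 3.0)                       # sigma^2 * M_xx^{-1} = 3/14
rng = random.Random(9)
draws = [sqrt_n_error(300, rng) for _ in range(2000)]
coverage = sum(abs(d) <= 1.96 * math.sqrt(predicted_var) for d in draws) / len(draws)
print(statistics.variance(draws), predicted_var, coverage)
```

The errors are uniform rather than normal, which mirrors the remark that the distribution of $u_i$ is not assumed.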