TA session note#11
Shouto Yonekura
June 29, 2016
Abstract
Although I will use this note in the session on 12 July, reading it beforehand may help you understand the assignments.
Contents
1 Modes of convergence
2 Some useful tools for asymptotic theory
3 LLN & CLT
4 Consistency
5 Asymptotic properties of the OLSE
1 Modes of convergence
Def11.1
For a random vector X := (X_1, X_2, ..., X_n) ∈ R^n, the distribution function of X, defined for x := (x_1, x_2, ..., x_n) ∈ R^n, is F_X(x) := P(X ≤ x). Let {X_n} be random vectors with values in R^n.
(1) X_n converges almost surely to X, X_n →a.s. X, if P(lim_{n→∞} X_n = X) = 1.
(2) For a real number r > 0, X_n converges in the rth mean to X, X_n →r X, if E[|X_n − X|^r] → 0 as n → ∞.
(3) X_n converges in probability to X, X_n →p X, if for every ε > 0, lim_{n→∞} P(|X_n − X| ≤ ε) = 1.
(4) X_n converges in law, or in distribution, to X, X_n →d X, if F_{X_n}(x) → F_X(x) as n → ∞ for all points x at which F_X(x) is continuous.
Example1
We say that a random vector X ∈ R^n is degenerate at a point c ∈ R^n if P(X = c) = 1. Let X_n ∈ R be degenerate at 1/n for n = 1, 2, ..., and let X ∈ R be degenerate at 0. Since 1/n → 0 as n → ∞, it might be expected that X_n →d X. The distribution function of X_n is F_{X_n} = 1_[1/n,∞), and that of X is F_X = 1_[0,∞). Then F_{X_n}(x) → F_X(x) for all x except x = 0, while at x = 0 we get F_{X_n}(0) = 0 ≠ 1 = F_X(0). However, since F_X(x) is not continuous at x = 0, we nevertheless have X_n →d X.
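Since both distribution functions here are explicit step functions, the example can be checked directly in code. A minimal sketch (plain Python; the function names are mine):

```python
# Distribution functions from Example 1: X_n degenerate at 1/n, X degenerate at 0.
def F_Xn(x, n):
    return 1.0 if x >= 1.0 / n else 0.0  # F_{X_n} = 1_[1/n, inf)

def F_X(x):
    return 1.0 if x >= 0.0 else 0.0      # F_X = 1_[0, inf)

# At any continuity point of F_X (x != 0), F_{X_n}(x) -> F_X(x):
for x in (-0.5, 0.25, 1.0):
    print(x, F_Xn(x, n=10**6), F_X(x))

# At the discontinuity x = 0 the pointwise limit fails, but that is allowed:
print(F_Xn(0.0, n=10**6), F_X(0.0))  # 0.0 vs 1.0
```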
Prop11.2
(a) X_n →a.s. X ⟹ X_n →p X
(b) X_n →r X ⟹ X_n →p X
(c) X_n →p X ⟹ X_n →d X
Proof
(a) Let ε > 0. Then
P(|X_n − X| > ε) = E[1_(ε,∞)(|X_n − X|)]
holds. Since X_n →a.s. X, using the dominated convergence theorem, we get
lim_{n→∞} E[1_(ε,∞)(|X_n − X|)] = E[lim_{n→∞} 1_(ε,∞)(|X_n − X|)] = 0.
(b) By Markov's inequality (Prop11.3 below) with h(x) = |x|^r and a = ε^r, we get
P(|X_n − X| > ε) ≤ ε^{-r} E[|X_n − X|^r]
→ 0 as n → ∞.
(c) Let ε > 0 and let ι ∈ R^n denote the vector whose components are all 1. If X_n ≤ x_0, then either X ≤ x_0 + ει or |X − X_n| > ε holds. In other words, {X_n ≤ x_0} ⊂ {X ≤ x_0 + ει} ∪ {|X − X_n| > ε}. Hence
F_{X_n}(x_0) ≤ F_X(x_0 + ει) + P(|X − X_n| > ε).
Similarly,
F_X(x_0 − ει) ≤ F_{X_n}(x_0) + P(|X − X_n| > ε).
Therefore, since P(|X_n − X| > ε) → 0,
F_X(x_0 − ει) ≤ liminf F_{X_n}(x_0) ≤ limsup F_{X_n}(x_0) ≤ F_X(x_0 + ει).
If F_X(x) is continuous at x_0, then the left and right ends of this inequality both converge to F_X(x_0) as ε → 0. This means F_{X_n}(x_0) → F_X(x_0). Q.E.D.
2 Some useful tools for asymptotic theory
Prop11.3 (Markov's inequality)
Let X be a random variable and h(·) a non-negative function. If E[h(X)] < ∞, then
P(h(X) ≥ a) ≤ (1/a) E[h(X)] for all a > 0
holds.
Proof
Let F_X be the distribution function of X. Since h(x) ≥ 0 for all x,
E[h(X)] = ∫ h(x) dF_X(x)
≥ ∫_{h(x)≥a} h(x) dF_X(x)
= ∫ 1_{h(x)≥a} h(x) dF_X(x)
≥ ∫ 1_{h(x)≥a} a dF_X(x)
= a P(h(X) ≥ a). Q.E.D.
Note that 1_A(x) is called an indicator function, defined as 1_A(x) = 1 if x ∈ A and 1_A(x) = 0 if x ∈ A^c. This function has the following property:
E[1_A(X)] = P(A) × 1 + P(A^c) × 0 = P(A).
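As a quick sanity check, Markov's inequality can be verified by simulation. A sketch, assuming X ~ Exp(1) and h(x) = x, so the bound is E[X]/a (the sample choices are mine):

```python
import random

random.seed(0)
N = 200_000
a = 3.0
xs = [random.expovariate(1.0) for _ in range(N)]  # X ~ Exp(1), so E[X] = 1

lhs = sum(x >= a for x in xs) / N  # Monte Carlo estimate of P(X >= a)
rhs = (sum(xs) / N) / a            # Markov bound E[X]/a, roughly 1/3
print(lhs, "<=", rhs)              # true P(X >= 3) = e^{-3}, about 0.0498
```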
Lem11.4 (Chebyshev's inequality)
Let X be a random variable with mean µ and variance σ². Then
P(|X − µ| ≥ ε) ≤ σ²/ε² for all ε > 0.
Proof
Just set h(X) = |X − µ|² and a = ε² in Prop11.3. Q.E.D.
Thm11.5 (Continuous mapping theorem)
(a) Let {X_n} be random variables or random vectors such that X_n →p X, and let h(·) be a continuous real-valued function. Then h(X_n) →p h(X).
(b) Let {X_n} be random variables or random vectors such that X_n →d X, and let h(·) be a continuous real-valued function. Then h(X_n) →d h(X).
Proof
Without loss of generality, I only provide the proof for random variables, in the case where h(·) is uniformly continuous. Let ε > 0. Then there exists δ > 0 such that |x − y| < δ ⟹ |h(x) − h(y)| < ε; equivalently,
|h(x) − h(y)| ≥ ε ⟹ |x − y| ≥ δ.
Setting x = X_n and y = X, we get
P(|h(X_n) − h(X)| ≥ ε) ≤ P(|X_n − X| ≥ δ)
→ 0 as n → ∞.
(b) follows from Prop11.2. Q.E.D.
Remark
The theorem above says, for example, that if X_n →p X, then
X_n^{-1} →p X^{-1} (provided P(X = 0) = 0) and X_n² →p X²
hold.
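The second claim can be illustrated numerically: add a vanishing perturbation to X and watch h(X_n) = X_n² track X². A sketch, with X ~ N(0,1) held fixed across n (the setup is mine):

```python
import random

random.seed(1)
N = 100_000
x = [random.gauss(0.0, 1.0) for _ in range(N)]   # draws of X

gaps = []
for n in (10, 100, 10_000):
    xn = [xi + 1.0 / n for xi in x]              # X_n = X + 1/n, so X_n ->p X
    # largest discrepancy |X_n^2 - X^2| over the sample shrinks with n
    gap = max(abs(a * a - b * b) for a, b in zip(xn, x))
    gaps.append(gap)
    print(n, gap)
```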
When you need to show that some random variables converge in probability, the following theorem might be useful.
Thm11.6
X_n →p X iff lim_{n→∞} E[|X_n − X| / (1 + |X_n − X|)] = 0.
Proof
Without loss of generality, we can assume that X = 0. Thus we need to show that X_n →p 0 iff lim_{n→∞} E[|X_n| / (1 + |X_n|)] = 0.
Suppose that X_n →p 0. Then for any ε > 0,
|X_n|/(1 + |X_n|) ≤ (|X_n|/(1 + |X_n|)) 1_{|X_n|>ε} + ε 1_{|X_n|≤ε} ≤ 1_{|X_n|>ε} + ε,
so that
E[|X_n|/(1 + |X_n|)] ≤ P(|X_n| > ε) + ε,
and hence
limsup_{n→∞} E[|X_n|/(1 + |X_n|)] ≤ ε.
Since ε was arbitrary, lim_{n→∞} E[|X_n|/(1 + |X_n|)] = 0.
Next suppose that lim_{n→∞} E[|X_n|/(1 + |X_n|)] = 0 holds. Since x/(1 + x) is an increasing function, we get
(ε/(1 + ε)) 1_{|X_n|>ε} ≤ (|X_n|/(1 + |X_n|)) 1_{|X_n|>ε} ≤ |X_n|/(1 + |X_n|).
Taking expectations and limits gives
(ε/(1 + ε)) lim_{n→∞} P(|X_n| > ε) ≤ lim_{n→∞} E[|X_n|/(1 + |X_n|)] = 0. Q.E.D.
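The expectation in Thm11.6 is easy to estimate by simulation. A sketch with X_n ~ N(0, 1/n), which converges in probability to 0 (the example distribution is mine):

```python
import random

random.seed(2)
N = 100_000

def metric(n):
    # Monte Carlo estimate of E[ |X_n| / (1 + |X_n|) ] for X_n ~ N(0, 1/n)
    sd = n ** -0.5
    return sum(abs(z) / (1 + abs(z))
               for z in (random.gauss(0.0, sd) for _ in range(N))) / N

vals = [metric(n) for n in (1, 10, 100, 1000)]
print(vals)  # decreasing towards 0, as the theorem predicts
```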
Thm11.7 (Slutsky's theorem)
Let {X_n}, {Y_n} be random variables or random vectors. Suppose that X_n →d X and Y_n →p c, where c is a fixed real number. Then
(a) X_n + Y_n →d X + c;
(b) Y_n X_n →d cX;
(c) X_n / Y_n →d X/c if c ≠ 0.
Proof
I only provide the proof of (a), in the case of random variables. Let t ∈ R and ε > 0. Then
F_{X_n+Y_n}(t) = P(X_n + Y_n ≤ t)
≤ P({X_n + Y_n ≤ t} ∩ {|Y_n − c| < ε}) + P(|Y_n − c| ≥ ε)
≤ P(X_n ≤ t − c + ε) + P(|Y_n − c| ≥ ε),
and, similarly,
F_{X_n+Y_n}(t) ≥ P(X_n ≤ t − c − ε) − P(|Y_n − c| ≥ ε).
If t − c + ε and t − c − ε are continuity points of F_X, then it follows from X_n →d X and P(|Y_n − c| ≥ ε) → 0 that
F_X(t − c − ε) ≤ liminf F_{X_n+Y_n}(t) ≤ limsup F_{X_n+Y_n}(t) ≤ F_X(t − c + ε).
Letting ε → 0 through such continuity points, we get, at every continuity point t − c of F_X,
lim_{n→∞} F_{X_n+Y_n}(t) = F_X(t − c).
The result follows from F_{X+c}(t) = F_X(t − c). Q.E.D.
Example2
Let x ∼ t(n). Then x →d N(0, 1) as n → ∞.
Proof
Let z ∼ N(0, 1) and y ∼ χ²(n) be independent, so that x = z/√(y/n). If v_i ∼iid χ²(1), i = 1, 2, ..., n, then Σ_{i=1}^n v_i ∼ χ²(n). Since E[v_i] = 1 (Prop5.2), we can show that (1/n) Σ_i v_i →p 1 by using the law of large numbers. Thus √(y/n) →p 1 (continuous mapping theorem). From these results, we can finally obtain z/√(y/n) →d N(0, 1) (Slutsky's theorem), as required. Q.E.D.
Example3
Let y ∼ F(l, m). Then ly →d χ²(l) as m → ∞.
Proof
Let x ∼ χ²(l) and z ∼ χ²(m) be mutually independent. By the definition of the F-distribution, y = (x/l)/(z/m), i.e. x/(z/m) = ly. Since z has the same distribution as Σ_{i=1}^m v_i with v_i ∼iid χ²(1) and E[v_i] = 1, the LLN gives z/m →p 1. Thus x/(z/m) →d x ∼ χ²(l) by Slutsky's theorem. Hence we get ly →d χ²(l). Q.E.D.
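Example3 can be checked by simulation: build F(l, m) draws from independent χ² draws and compare the moments of ly with those of χ²(l) (mean l, variance 2l). A sketch with l = 4 (the sample sizes are mine):

```python
import random

random.seed(3)
l, R = 4, 20_000

def chi2(k):
    # chi^2(k) as a sum of k squared standard normals
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(k))

stats = {}
for m in (5, 200):
    # y ~ F(l, m) built as (x/l)/(z/m); l*y should approach chi^2(l) as m grows
    draws = [l * (chi2(l) / l) / (chi2(m) / m) for _ in range(R)]
    mean = sum(draws) / R
    var = sum((d - mean) ** 2 for d in draws) / R
    stats[m] = (mean, var)
    print(m, mean, var)  # chi^2(4) has mean 4 and variance 8
```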
3 LLN & CLT
Thm11.8 ((Weak) Law of Large Numbers)
Let {X_n} be iid random variables with mean µ and variance σ² < ∞, and let X̄ := n^{-1} Σ_i X_i. Then
X̄ →p E[X_1] = µ.
Proof
First we calculate E[X̄] and V[X̄]:
E[X̄] = E[n^{-1} Σ_i X_i] = n^{-1} E[Σ_i X_i] = n^{-1} nµ = µ;
V[X̄] = V[n^{-1} Σ_i X_i] = n^{-2} Σ_i V[X_i] = n^{-2} nσ² = σ²/n.
Next we apply Chebyshev's inequality to the above:
P(|X̄ − µ| > ε) ≤ ε^{-2} σ²/n
→ 0 as n → ∞. Q.E.D.
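A minimal sketch of the WLLN in action, using Bernoulli(0.3) draws so that µ = 0.3 (the distribution choice is mine):

```python
import random

random.seed(4)
mu = 0.3  # mean of a Bernoulli(0.3) variable

errs = {}
for n in (10, 1_000, 100_000):
    # sample mean of n iid Bernoulli(0.3) draws
    xbar = sum(random.random() < mu for _ in range(n)) / n
    errs[n] = abs(xbar - mu)
    print(n, xbar)  # approaches 0.3 as n grows
```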
Thm11.9 (Central Limit Theorem (Lindeberg–Lévy))
Let {X_n} be iid random variables with mean µ and variance σ² < ∞. Then
(X̄ − µ)/√(σ²/n) →d N(0, 1),
or equivalently √n(X̄ − µ) →d N(0, σ²), holds.
Proof
Let T_n := (X_1 + X_2 + ... + X_n − nµ)/√(nσ²) and Z_i := (X_i − µ)/σ, i = 1, 2, ..., n. By the assumptions, {Z_i} are iid and
T_n = n^{-1/2} Σ_{i=1}^n Z_i
holds. The characteristic function of T_n is then
φ_{T_n}(t) = E[e^{itT_n}] = E[e^{itn^{-1/2} Σ_{i=1}^n Z_i}] = Π_{i=1}^n E[e^{itn^{-1/2} Z_i}] = Π_{i=1}^n φ_{Z_i}(t/n^{1/2}).
Since {Z_i} are identically distributed,
φ_{T_n}(t) = {φ_{Z_1}(t/n^{1/2})}^n
also holds. Next we apply a Taylor expansion to f(x) = e^{ix} around 0:
e^{ix} = 1 + ix − x² ∫_0^1 (1 − s) e^{isx} ds.
Let x = tZ_1/n^{1/2}. Then we get the following:
e^{itn^{-1/2}Z_1} = 1 + (it/n^{1/2}) Z_1 − (t²/n) Z_1² ∫_0^1 (1 − s) e^{istn^{-1/2}Z_1} ds
= 1 + (it/n^{1/2}) Z_1 − (t²/2n) Z_1² + (t²/n) Z_1² ∫_0^1 (1 − s)(1 − e^{istn^{-1/2}Z_1}) ds.
Since for any i
E[Z_i] = E[(X_i − µ)/σ] = 0 and E[Z_i²] = V[Z_i] = V[(X_i − µ)/σ] = 1
hold, taking expectations gives
φ_{Z_1}(t/n^{1/2}) = 1 − t²/2n + (t²/n) E[Z_1² ∫_0^1 (1 − s)(1 − e^{istn^{-1/2}Z_1}) ds].
Next we evaluate the remainder. Define
α_n(s; t) := −E[Z_1²(e^{istn^{-1/2}Z_1} − 1)],
so that the last term equals −(t²/n) ∫_0^1 (1 − s) α_n(s; t) ds. Since
|Z_1²(e^{istn^{-1/2}Z_1} − 1)| ≤ 2Z_1²
is valid and E[Z_1²] = 1 < ∞, the dominated convergence theorem gives α_n(s; t) → 0 as n → ∞ for each fixed s and t. Therefore we can show that
φ_{T_n}(t) = {1 − t²/2n + o(1/n)}^n,
and as n → ∞, φ_{T_n}(t) → e^{−t²/2}. This is the characteristic function of the standard normal distribution. Q.E.D.
Example4
If {X_n} ∼iid U(0, 1), then the CLT says that
(Σ_i X_i − n/2)/√(n/12) →d N(0, 1),
since E[X_1] = 1/2 and V[X_1] = 1/12. The figure below shows how (Σ_i X_i − n/2)/√(n/12) converges to N(0, 1) as n becomes large.
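The experiment behind the figure is easy to rerun. A sketch that standardizes sums of U(0, 1) draws and compares the empirical mean, standard deviation, and two-sided tail with their N(0, 1) values (the replication count is mine):

```python
import random, statistics

random.seed(5)
R = 20_000  # replications

def standardized_sum(n):
    # (sum of n U(0,1) draws - n*1/2) / sqrt(n*1/12)
    s = sum(random.random() for _ in range(n))
    return (s - n / 2) / (n / 12) ** 0.5

summary = {}
for n in (2, 30):
    draws = [standardized_sum(n) for _ in range(R)]
    tail = sum(abs(d) > 1.96 for d in draws) / R
    summary[n] = (statistics.mean(draws), statistics.stdev(draws), tail)
    print(n, summary[n])  # mean near 0, sd near 1, tail near 0.05 for large n
```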
Example5
If {X_n} ∼iid Be(α, β), then the CLT implies that
(Σ_i X_i − nα/(α + β)) / √(nαβ/((α + β + 1)(α + β)²)) →d N(0, 1),
since E[X_1] = α/(α + β) and V[X_1] = αβ/((α + β + 1)(α + β)²). The figure below shows how this standardized sum converges to N(0, 1) as n becomes large, in the case α = 1, β = 2.
4 Consistency
Def11.10 (Consistency)
Let {X_n} be random variables and let θ̂_n be an estimator of θ ∈ Θ ⊆ R^k based on {X_n}. If
θ̂_n →p θ for every θ ∈ Θ
holds, then θ̂_n is said to be a consistent estimator of θ.
Example6
Let {X_i} ∼iid N(µ, σ²) and X̄ := n^{-1} Σ_i X_i. Then X̄ is a consistent estimator of µ.
Proof
By the LLN, we can show that n^{-1} Σ_i X_i →p E[X_1] = µ. Q.E.D.
Example7
Let {X_i} ∼iid N(µ, σ²) and σ̂² := n^{-1} Σ_i (X_i − X̄)². Then σ̂² is a consistent estimator of σ².
Proof
First we can decompose σ̂² as follows:
σ̂² := n^{-1} Σ_i (X_i − X̄)²
= n^{-1} Σ_i ((X_i − µ) − (X̄ − µ))²
= n^{-1} Σ_i {(X_i − µ)² − 2(X_i − µ)(X̄ − µ) + (X̄ − µ)²}
= n^{-1} Σ_i (X_i − µ)² − (X̄ − µ)²
= n^{-1} Σ_i (X_i² − 2X_iµ + µ²) − (X̄ − µ)²
→p (σ² + µ²) − 2µ² + µ² − 0
= σ². Q.E.D.
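Example7 can also be checked by simulation. A sketch with µ = 1 and σ² = 4 (the parameter values are mine):

```python
import random

random.seed(6)
mu, sigma2 = 1.0, 4.0

est = {}
for n in (10, 1_000, 100_000):
    xs = [random.gauss(mu, sigma2 ** 0.5) for _ in range(n)]
    xbar = sum(xs) / n
    # hat sigma^2 with divisor n, as in Example 7
    est[n] = sum((x - xbar) ** 2 for x in xs) / n
    print(n, est[n])  # approaches 4 as n grows
```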
5 Asymptotic properties of the OLSE
Prop11.11
Suppose that assumptions A1–A5 hold. Under these assumptions,
(a) β̂ →p β;
(b) s² := e′e/(n − k) →p σ²
hold, where β̂ is the OLSE of β and e := y − Xβ̂.
Proof
(a) First we can decompose β̂ as follows:
β̂ = (X′X)^{-1} X′y
= (X′X)^{-1} X′(Xβ + u)
= β + (X′X)^{-1} X′u
= β + (n^{-1} X′X)^{-1} (n^{-1} X′u)
= β + Q_{xx}^{-1} Q_{xu}.
Using the LLN, we can easily show that Q_{xu} →p 0, since
n^{-1} X′u = n^{-1} Σ_i x_i u_i →p E[x_i u_i] = 0.
Suppose that Q_{xx} converges to some finite nonsingular matrix M_{xx}. Then Q_{xx}^{-1} Q_{xu} →p 0. Hence we get β̂ →p β.
(b) Since e = y − Xβ̂ = u − X(β̂ − β), we can rewrite s² as follows:
s² = (n − k)^{-1} Σ_i e_i²
= (n/(n − k)) (n^{-1} u′u − 2(β̂ − β)′ n^{-1} X′u + (β̂ − β)′ n^{-1} X′X (β̂ − β))
= (n/(n − k)) (n^{-1} Σ_i u_i² − 2(β̂ − β)′ n^{-1} Σ_i x_i u_i + (β̂ − β)′ (n^{-1} Σ_i x_i x_i′)(β̂ − β)).
We have already shown that β̂ →p β; suppose again that Q_{xx} converges to some finite matrix M_{xx}. Then the last two terms vanish in probability, and since n/(n − k) → 1 as n → ∞, this means that
s² →p E[u_i²] = σ². Q.E.D.
The figure below shows how β̂ converges to β in the case of Example 1 in TA note#7.
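The DGP of Example 1 in TA note#7 is not restated in this note, so the sketch below uses a hypothetical simple regression y_i = b1 + b2 x_i + u_i with (b1, b2) = (1, 2); only the convergence pattern matters, not the particular values:

```python
import random

random.seed(7)
b1, b2 = 1.0, 2.0  # hypothetical true coefficients

def ols(n):
    # simple regression y_i = b1 + b2*x_i + u_i estimated by least squares
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    ys = [b1 + b2 * x + random.gauss(0.0, 1.0) for x in xs]
    xbar, ybar = sum(xs) / n, sum(ys) / n
    b2_hat = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
              / sum((x - xbar) ** 2 for x in xs))
    b1_hat = ybar - b2_hat * xbar
    return b1_hat, b2_hat

for n in (20, 2_000, 200_000):
    print(n, ols(n))  # both estimates approach (b1, b2) = (1, 2)
```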
Prop11.12
Suppose that assumptions A1–A5 hold. In addition, E[u_i⁴] and E[(x_i x_i′)²] are finite for any i, where x_i ∈ R^k. Under these assumptions,
E[‖x_i u_i‖²] < ∞;
n^{-1/2} Σ_i x_i u_i →d N(0, Ω)
hold, where Ω := E[x_i x_i′ u_i²].
Proof
By the triangle inequality and Jensen's inequality, we can show that
‖E[x_i x_i′ u_i²]‖ ≤ E[‖x_i x_i′ u_i²‖] = E[‖x_i‖² u_i²].
Using the Cauchy–Schwarz inequality, we obtain
E[‖x_i‖² u_i²] ≤ (E[‖x_i‖⁴])^{1/2} (E[u_i⁴])^{1/2}
< ∞.
Therefore, we finally get n^{-1/2} Σ_i x_i u_i →d N(0, Ω) by the CLT. Q.E.D.
Note that √n X̄ = √n n^{-1} Σ_i X_i = n^{-1/2} Σ_i X_i.
Prop11.13
Suppose that assumptions A1–A5 hold. In addition, E[u_i⁴] and E[(x_i x_i′)²] are finite for any i, where x_i ∈ R^k. Under these assumptions,
√n(β̂ − β) →d N_k(0, V)
holds, where V := M_{xx}^{-1} Ω M_{xx}^{-1}.
Proof
First, we can rewrite √n(β̂ − β) as follows:
√n(β̂ − β) = √n (X′X)^{-1} X′u
= (n^{-1} X′X)^{-1} (n^{-1/2} X′u)
= Q_{xx}^{-1} (n^{-1/2} X′u).
From Prop11.12, n^{-1/2} X′u = n^{-1/2} Σ_i x_i u_i →d N(0, Ω). Suppose that Q_{xx} converges to some finite matrix M_{xx}. Then, using Slutsky's theorem, we obtain Q_{xx}^{-1} (n^{-1/2} X′u) →d M_{xx}^{-1} N(0, Ω). Hence we can finally show that
√n(β̂ − β) →d N_k(0, M_{xx}^{-1} Ω M_{xx}^{-1})   (since M_{xx}′ = M_{xx})
= N_k(0, V). Q.E.D.
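Prop11.13 can be visualized by simulation: with a single regressor and homoskedastic errors, V reduces to σ²/E[x_i²]. A sketch checking the mean and standard deviation of √n(β̂ − β) against N(0, 1) (a hypothetical DGP of mine, in which σ² = E[x_i²] = 1, so V = 1):

```python
import random, statistics

random.seed(8)
beta, n, R = 2.0, 400, 5_000

def root_n_error():
    # one draw of sqrt(n)*(beta_hat - beta) in the model y_i = beta*x_i + u_i
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    ys = [beta * x + random.gauss(0.0, 1.0) for x in xs]
    beta_hat = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return n ** 0.5 * (beta_hat - beta)

draws = [root_n_error() for _ in range(R)]
m, sd = statistics.mean(draws), statistics.stdev(draws)
print(m, sd)  # should be close to 0 and 1, since V = 1 here
```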