6.1 Introduction
The most widely-used tool in sampling theory is large sample asymptotics. By “asymptotics” we mean approximating a finite-sample sampling distribution by taking its limit as the sample size diverges to infinity. In this chapter we provide a brief review of the main results of large sample asymptotics. It is meant as a reference, not as a teaching guide. Asymptotic theory is covered in detail in Chapters 7-9 of Introduction to Econometrics. If you have not previously studied asymptotic theory in detail you should study these chapters before proceeding.
6.2 Modes of Convergence
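Definition 6.1 A sequence of random vectors $Z_n \in \mathbb{R}^k$ converges in probability to $Z$ as $n \to \infty$, denoted $Z_n \xrightarrow{p} Z$ or alternatively $\operatorname{plim}_{n\to\infty} Z_n = Z$, if for all $\delta > 0$,
$$\lim_{n\to\infty} P\left[\|Z_n - Z\| \le \delta\right] = 1. \tag{6.1}$$
We call $Z$ the probability limit (or plim) of $Z_n$.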
The above definition treats random variables and random vectors simultaneously using the vector norm. It is useful to know that for a random vector, (6.1) holds if and only if each element in the vector converges in probability to its limit.
Definition 6.2 Let $Z_n$ be a sequence of random vectors with distributions $F_n(u) = P[Z_n \le u]$. We say that $Z_n$ converges in distribution to $Z$ as $n \to \infty$, denoted $Z_n \xrightarrow{d} Z$, if for all $u$ at which $F(u) = P[Z \le u]$ is continuous, $F_n(u) \to F(u)$ as $n \to \infty$. We refer to $Z$ and its distribution $F(u)$ as the asymptotic distribution, large sample distribution, or limit distribution of $Z_n$.
6.3 Weak Law of Large Numbers
Theorem 6.1 Weak Law of Large Numbers (WLLN). If $Y_i \in \mathbb{R}^k$ are i.i.d. and $E\|Y\| < \infty$, then as $n \to \infty$,
$$\overline{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i \xrightarrow{p} E[Y].$$
The WLLN shows that the sample mean $\overline{Y}$ converges in probability to the true population expectation $\mu$. The result applies to any transformation of a random vector with a finite mean.
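Theorem 6.2 If $Y_i \in \mathbb{R}^k$ are i.i.d., $h(y) : \mathbb{R}^k \to \mathbb{R}^q$, and $E\|h(Y)\| < \infty$, then $\widehat{\mu} = \frac{1}{n}\sum_{i=1}^{n} h(Y_i) \xrightarrow{p} \mu = E[h(Y)]$ as $n \to \infty$.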
An estimator which converges in probability to the population value is called consistent.
Definition 6.3 An estimator $\widehat{\theta}$ of $\theta$ is consistent if $\widehat{\theta} \xrightarrow{p} \theta$ as $n \to \infty$.
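As a rough numerical illustration of the WLLN (not part of the original text), the sketch below estimates $P[|\overline{Y} - \mu| > \delta]$ by simulation for i.i.d. Exponential(1) draws, for which $\mu = 1$. The distribution, the tolerance $\delta = 0.1$, the sample sizes, and the helper name `prob_far_from_mean` are all illustrative assumptions.

```python
import numpy as np

def prob_far_from_mean(n, delta, reps, rng):
    """Monte Carlo estimate of P[|Ybar_n - mu| > delta] for Exponential(1) data."""
    # Exponential(1) has expectation mu = 1 (assumed example distribution).
    y = rng.exponential(scale=1.0, size=(reps, n))
    ybar = y.mean(axis=1)                      # one sample mean per replication
    return np.mean(np.abs(ybar - 1.0) > delta)

rng = np.random.default_rng(0)
for n in (10, 100, 1000):
    print(n, prob_far_from_mean(n, delta=0.1, reps=2000, rng=rng))
```

The estimated probability should shrink toward zero as the sample size grows, which is what convergence in probability asserts for any fixed $\delta > 0$.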
6.4 Central Limit Theorem
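Theorem 6.3 Multivariate Lindeberg-Lévy Central Limit Theorem (CLT). If $Y_i \in \mathbb{R}^k$ are i.i.d. and $E\|Y\|^2 < \infty$, then as $n \to \infty$
$$\sqrt{n}\left(\overline{Y} - \mu\right) \xrightarrow{d} N(0, V)$$
where $\mu = E[Y]$ and $V = E\left[(Y - \mu)(Y - \mu)'\right]$.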
The central limit theorem shows that the distribution of the sample mean is approximately normal in large samples. For some applications it may be useful to notice that Theorem 6.3 does not impose any restrictions on $V$ other than that the elements are finite. Therefore this result allows for the possibility of singular $V$.
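The normal approximation can be checked informally by simulation. The sketch below, which is an illustration rather than part of the original text, draws many samples of i.i.d. Exponential(1) data (so $\mu = 1$ and $V = 1$), forms $\sqrt{n}(\overline{Y} - \mu)$ for each, and compares the simulated variance and the share of draws within $\pm 1.96$ with their $N(0, 1)$ counterparts. The distribution, sample size, and function name `scaled_mean_deviations` are assumptions chosen for the example.

```python
import numpy as np

def scaled_mean_deviations(n, reps, rng):
    """Return reps draws of sqrt(n) * (Ybar - mu) for i.i.d. Exponential(1) data."""
    # Exponential(1) has mu = 1 and V = 1 (assumed example distribution).
    y = rng.exponential(scale=1.0, size=(reps, n))
    return np.sqrt(n) * (y.mean(axis=1) - 1.0)

rng = np.random.default_rng(0)
z = scaled_mean_deviations(n=400, reps=5000, rng=rng)

# Under the N(0, 1) approximation, about 95% of draws lie in [-1.96, 1.96].
coverage = np.mean(np.abs(z) <= 1.96)
print("simulated variance:", z.var())
print("share within +/-1.96:", coverage)
```

Even though the underlying data are strongly skewed, the simulated variance should be close to 1 and the coverage close to 0.95 at this sample size.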
The following two generalizations allow for heterogeneous random variables.
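Theorem 6.4 Multivariate Lindeberg CLT. Suppose that for all $n$, $Y_{ni} \in \mathbb{R}^k$, $i = 1, \ldots, r_n$, are independent but not necessarily identically distributed with expectations $E[Y_{ni}] = 0$ and variance matrices $V_{ni} = E\left[Y_{ni} Y_{ni}'\right]$. Set $V_n = \sum_{i=1}^{r_n} V_{ni}$. Suppose $\nu_n^2 = \lambda_{\min}(V_n) > 0$ and for all $\epsilon > 0$
$$\lim_{n\to\infty} \frac{1}{\nu_n^2} \sum_{i=1}^{r_n} E\left[\|Y_{ni}\|^2 \, \mathbb{1}\left\{\|Y_{ni}\|^2 \ge \epsilon \nu_n^2\right\}\right] = 0. \tag{6.2}$$
Then as $n \to \infty$
$$V_n^{-1/2} \sum_{i=1}^{r_n} Y_{ni} \xrightarrow{d} N(0, I_k).$$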
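Theorem 6.5 Suppose $Y_{ni} \in \mathbb{R}^k$ are independent but not necessarily identically distributed with expectations $E[Y_{ni}] = 0$ and variance matrices $V_{ni} = E\left[Y_{ni} Y_{ni}'\right]$. Suppose
$$\frac{1}{n} \sum_{i=1}^{n} V_{ni} \to V > 0$$
and for some $\delta > 0$
$$\sup_{n,i} E\|Y_{ni}\|^{2+\delta} < \infty. \tag{6.3}$$
Then as $n \to \infty$
$$\sqrt{n}\, \overline{Y} \xrightarrow{d} N(0, V).$$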
6.5 Continuous Mapping Theorem and Delta Method
Continuous functions are limit-preserving. There are two forms of the continuous mapping theorem, for convergence in probability and convergence in distribution.
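Theorem 6.6 Continuous Mapping Theorem (CMT). Let $Z_n \in \mathbb{R}^k$ and $g(u) : \mathbb{R}^k \to \mathbb{R}^q$. If $Z_n \xrightarrow{p} c$ as $n \to \infty$ and $g(u)$ is continuous at $c$ then $g(Z_n) \xrightarrow{p} g(c)$ as $n \to \infty$.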
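Theorem 6.7 Continuous Mapping Theorem. If $Z_n \xrightarrow{d} Z$ as $n \to \infty$ and $g : \mathbb{R}^m \to \mathbb{R}^k$ has the set of discontinuity points $D_g$ such that $P\left[Z \in D_g\right] = 0$, then $g(Z_n) \xrightarrow{d} g(Z)$ as $n \to \infty$.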
Differentiable functions of asymptotically normal random estimators are asymptotically normal.
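Theorem 6.8 Delta Method. Let $\mu \in \mathbb{R}^k$ and $g(u) : \mathbb{R}^k \to \mathbb{R}^q$. If $\sqrt{n}\left(\widehat{\mu} - \mu\right) \xrightarrow{d} \xi$, where $g(u)$ is continuously differentiable in a neighborhood of $\mu$, then as $n \to \infty$
$$\sqrt{n}\left(g(\widehat{\mu}) - g(\mu)\right) \xrightarrow{d} G'\xi \tag{6.4}$$
where $G(u) = \frac{\partial}{\partial u} g(u)'$ and $G = G(\mu)$. In particular, if $\xi \sim N(0, V)$ then as $n \to \infty$
$$\sqrt{n}\left(g(\widehat{\mu}) - g(\mu)\right) \xrightarrow{d} N\left(0, G'VG\right). \tag{6.5}$$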
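To make (6.5) concrete, the following sketch (an illustration under stated assumptions, not taken from the text) considers scalar data from an Exponential distribution with scale 2, so that $\mu = 2$ and $V = 4$, together with the assumed transformation $g(u) = u^2$. Then $G = 2\mu = 4$ and the asymptotic variance is $G'VG = 64$. The simulation compares this value with the variance of $\sqrt{n}(g(\overline{Y}) - g(\mu))$ across replications. The function name `delta_method_check` is hypothetical.

```python
import numpy as np

def delta_method_check(n, reps, rng):
    """Compare the simulated variance of sqrt(n)*(g(Ybar) - g(mu)) with G'VG."""
    mu, V = 2.0, 4.0                      # Exponential with scale 2: mean 2, variance 4
    g = lambda u: u**2                    # assumed smooth transformation
    G = 2.0 * mu                          # derivative of g evaluated at mu
    y = rng.exponential(scale=2.0, size=(reps, n))
    stat = np.sqrt(n) * (g(y.mean(axis=1)) - g(mu))
    return stat.var(), G * V * G          # (simulated, asymptotic) variances

rng = np.random.default_rng(0)
simulated, asymptotic = delta_method_check(n=1000, reps=3000, rng=rng)
print("simulated:", simulated, "asymptotic:", asymptotic)
```

The two numbers should be close but not identical, since the delta method is a first-order approximation and the simulation itself has sampling error.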
6.6 Smooth Function Model
The smooth function model is $\theta = g(\mu)$ where $\mu = E[h(Y)]$ and $g(\mu)$ is smooth in a suitable sense.

The parameter $\theta = g(\mu)$ is not a population moment so it does not have a direct moment estimator. Instead, it is common to use a plug-in estimator formed by replacing the unknown $\mu$ with its point estimator $\widehat{\mu}$ and then “plugging” this into the expression for $\theta$. The first step is the sample mean $\widehat{\mu} = n^{-1}\sum_{i=1}^{n} h(Y_i)$. The second step is the transformation $\widehat{\theta} = g(\widehat{\mu})$. The hat “^” indicates that $\widehat{\theta}$ is a sample estimator of $\theta$. The smooth function model includes a broad class of estimators including sample variances and the least squares estimator.
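Theorem 6.9 If $Y_i \in \mathbb{R}^m$ are i.i.d., $h(u) : \mathbb{R}^m \to \mathbb{R}^k$, $E\|h(Y)\| < \infty$, and $g(u) : \mathbb{R}^k \to \mathbb{R}^q$ is continuous at $\mu$, then $\widehat{\theta} \xrightarrow{p} \theta$ as $n \to \infty$.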
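Theorem 6.10 If $Y_i \in \mathbb{R}^m$ are i.i.d., $h(u) : \mathbb{R}^m \to \mathbb{R}^k$, $E\|h(Y)\|^2 < \infty$, $g(u) : \mathbb{R}^k \to \mathbb{R}^q$, and $G(u) = \frac{\partial}{\partial u} g(u)'$ is continuous in a neighborhood of $\mu$, then as $n \to \infty$
$$\sqrt{n}\left(\widehat{\theta} - \theta\right) \xrightarrow{d} N(0, V_\theta)$$
where $V_\theta = G'VG$, $V = E\left[(h(Y) - \mu)(h(Y) - \mu)'\right]$, and $G = G(\mu)$.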
Theorem 6.9 establishes the consistency of $\widehat{\theta}$ for $\theta$ and Theorem 6.10 establishes its asymptotic normality. It is instructive to compare the conditions. Consistency requires that $h(Y)$ has a finite expectation; asymptotic normality requires that $h(Y)$ has a finite variance. Consistency requires that $g(u)$ be continuous; asymptotic normality requires that $g(u)$ be continuously differentiable.
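The variance of a scalar $Y$ fits the smooth function model with $h(Y) = (Y, Y^2)'$ and $g(\mu_1, \mu_2) = \mu_2 - \mu_1^2$, so that $G = (-2\mu_1, 1)'$. The sketch below is a minimal illustration of the two-step plug-in estimator for this case. Replacing $V$ and $G$ by sample analogs to approximate $V_\theta = G'VG$ is an assumption made for the example and is not discussed in the text above; the function name `plug_in_variance` and the simulated $N(1, 2^2)$ data are likewise hypothetical.

```python
import numpy as np

def plug_in_variance(y):
    """Plug-in estimate of var(Y) = g(mu) with h(Y) = (Y, Y^2), g(m1, m2) = m2 - m1^2."""
    h = np.column_stack([y, y**2])               # n x 2 matrix with rows h(Y_i)
    mu_hat = h.mean(axis=0)                      # first step: sample mean of h(Y_i)
    theta_hat = mu_hat[1] - mu_hat[0]**2         # second step: theta_hat = g(mu_hat)
    V_hat = np.cov(h, rowvar=False, bias=True)   # sample analog of V (assumed estimator)
    G_hat = np.array([-2.0 * mu_hat[0], 1.0])    # gradient of g evaluated at mu_hat
    V_theta_hat = G_hat @ V_hat @ G_hat          # sample analog of G'VG
    return theta_hat, V_theta_hat

rng = np.random.default_rng(0)
y = rng.normal(loc=1.0, scale=2.0, size=100_000)
theta_hat, V_theta_hat = plug_in_variance(y)
print("theta_hat:", theta_hat, "V_theta_hat:", V_theta_hat)
```

For this design the population values are $\theta = 4$ and $V_\theta = 2\sigma^4 = 32$, so both printed numbers can be compared against known targets. The plug-in estimate coincides with the usual sample variance computed with divisor $n$.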
6.7 Best Unbiased Estimation
This section presents an efficiency bound for estimation of the mean. The result is finite-sample rather than asymptotic, but it is convenient to introduce at this point since the bound is identical to the asymptotic variance.
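Theorem 6.11 Suppose $Y_i$ are i.i.d., $\mu = E[h(Y)]$, and $E\|h(Y)\|^2 < \infty$. If $\widetilde{\mu}$ is unbiased for $\mu$ then $\operatorname{var}\left[\widetilde{\mu}\right] \ge n^{-1} V$ where $V = E\left[(h(Y) - \mu)(h(Y) - \mu)'\right]$.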
For details and a proof see Section 11.6 of Introduction to Econometrics. Theorem 6.11 is an analog of the Cramér-Rao lower bound for semiparametric estimation. The result shows that the asymptotic variance from Theorem 6.3 is the best possible in any finite sample among unbiased estimators. Theorem 6.11 is sharp, since the sample mean has the finite sample variance $n^{-1}V$.
6.8 Stochastic Order Symbols
It is convenient to have simple symbols for random variables and vectors which converge in probability to zero or are stochastically bounded. In this section we introduce some of the most common notation.

Let $Z_n$ and $a_n$, $n = 1, 2, \ldots$ be sequences of random variables and constants. The notation
$$Z_n = o_p(1)$$
(“small oh-P-one”) means that $Z_n \xrightarrow{p} 0$ as $n \to \infty$. We also write
$$Z_n = o_p(a_n)$$
if $a_n^{-1} Z_n = o_p(1)$.

Similarly, the notation $Z_n = O_p(1)$ (“big oh-P-one”) means that $Z_n$ is bounded in probability. Precisely, for any $\epsilon > 0$ there is a constant $M_\epsilon < \infty$ such that
$$\limsup_{n\to\infty} P\left[|Z_n| > M_\epsilon\right] \le \epsilon.$$
Furthermore, we write
$$Z_n = O_p(a_n)$$
if $a_n^{-1} Z_n = O_p(1)$.

$O_p(1)$ is weaker than $o_p(1)$ in the sense that $Z_n = o_p(1)$ implies $Z_n = O_p(1)$ but not the reverse. However, if $Z_n = O_p(a_n)$ then $Z_n = o_p(b_n)$ for any $b_n$ such that $a_n / b_n \to 0$.
A random sequence with a bounded moment is stochastically bounded.
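Theorem 6.12 If $Z_n$ is a random vector which satisfies $E\|Z_n\|^\delta = O(a_n)$ for some sequence $a_n$ and $\delta > 0$, then
$$Z_n = O_p\left(a_n^{1/\delta}\right).$$
Similarly, $E\|Z_n\|^\delta = o(a_n)$ implies $Z_n = o_p\left(a_n^{1/\delta}\right)$.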
There are many simple rules for manipulating $o_p(1)$ and $O_p(1)$ sequences which can be deduced from the continuous mapping theorem. For example,
$$\begin{aligned}
o_p(1) + o_p(1) &= o_p(1) \\
o_p(1) + O_p(1) &= O_p(1) \\
O_p(1) + O_p(1) &= O_p(1) \\
o_p(1)\, o_p(1) &= o_p(1) \\
o_p(1)\, O_p(1) &= o_p(1) \\
O_p(1)\, O_p(1) &= O_p(1).
\end{aligned}$$
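A simple way to see the difference between the two symbols is to look at a sample mean of mean-zero data: $\overline{Y} = o_p(1)$ by the WLLN, while $\sqrt{n}\,\overline{Y} = O_p(1)$ by the CLT, that is, $\overline{Y} = O_p(n^{-1/2})$. The sketch below is an illustration only; it assumes i.i.d. $N(0, 1)$ data and uses the simulated 0.95 quantile of $|\overline{Y}|$ as a rough stand-in for the constant $M_\epsilon$ with $\epsilon = 0.05$. The function name `quantile_of_abs_mean` is hypothetical.

```python
import numpy as np

def quantile_of_abs_mean(n, reps, rng, q=0.95):
    """Simulated q-quantile of |Ybar_n| across reps samples of n i.i.d. N(0, 1) draws."""
    ybar = rng.normal(size=(reps, n)).mean(axis=1)
    return np.quantile(np.abs(ybar), q)

rng = np.random.default_rng(0)
results = {}
for n in (100, 1000, 5000):
    q_raw = quantile_of_abs_mean(n, reps=400, rng=rng)
    results[n] = (q_raw, np.sqrt(n) * q_raw)   # (unscaled, scaled by sqrt(n))
    print(n, results[n])
```

The unscaled quantile should shrink toward zero as the sample size grows, while the quantile scaled by $\sqrt{n}$ should stay roughly constant, consistent with a sequence that is bounded in probability without converging to zero.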
6.9 Convergence of Moments
We give a sufficient condition for the existence of the mean of the asymptotic distribution, define uniform integrability, provide a primitive condition for uniform integrability, and show that uniform integrability is the key condition under which $E[Z_n]$ converges to $E[Z]$.
Theorem 6.13 If $Z_n \xrightarrow{d} Z$ and $E\|Z_n\| \le C$ then $E\|Z\| \le C$.
Definition 6.4 The random vector $Z_n$ is uniformly integrable as $n \to \infty$ if
$$\lim_{M\to\infty} \limsup_{n\to\infty} E\left[\|Z_n\| \, \mathbb{1}\left\{\|Z_n\| > M\right\}\right] = 0.$$
Theorem 6.14 If for some $\delta > 0$, $E\|Z_n\|^{1+\delta} \le C < \infty$, then $Z_n$ is uniformly integrable.
Theorem 6.15 If $Z_n \xrightarrow{d} Z$ and $Z_n$ is uniformly integrable then $E[Z_n] \to E[Z]$.
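A standard counterexample, added here for illustration and not taken from the text, shows why uniform integrability cannot be dropped. Let $Z_n$ equal $n$ with probability $1/n$ and $0$ otherwise. Then $P[Z_n \ne 0] = 1/n \to 0$, so $Z_n$ converges in probability (and hence in distribution) to $Z = 0$, yet $E[Z_n] = 1$ for every $n$ while $E[Z] = 0$. The sketch below computes the relevant quantities exactly using rational arithmetic. The function name `spike_moments` and the truncation level `M = 50` are assumptions for the example.

```python
from fractions import Fraction

def spike_moments(n, M):
    """Exact moments of Z_n, which equals n with probability 1/n and 0 otherwise."""
    p = Fraction(1, n)                        # P[Z_n = n], which tends to 0
    mean = n * p                              # E[Z_n] = 1 for every n
    tail = n * p if n > M else Fraction(0)    # E[Z_n 1{Z_n > M}]
    return p, mean, tail

for n in (10, 1000, 100000):
    p, mean, tail = spike_moments(n, M=50)
    print(n, float(p), mean, tail)
```

For any fixed $M$ the tail expectation equals 1 once $n > M$, so the limit in Definition 6.4 is 1 rather than 0 and the sequence is not uniformly integrable.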
6.10 Uniform Stochastic Bounds
Theorem 6.16 If $|Y_i|^r$ is uniformly integrable, then as $n \to \infty$
$$n^{-1/r} \max_{1 \le i \le n} |Y_i| \xrightarrow{p} 0. \tag{6.6}$$
Equation (6.6) implies that if $Y$ has $r$ finite moments then the largest observation will diverge at a rate slower than $n^{1/r}$. The higher the moments, the slower the rate of divergence.
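The rate in (6.6) can be examined numerically. The sketch below is an illustration under the assumption of i.i.d. $N(0, 1)$ data (which has finite moments of all orders) with $r = 2$; it averages $n^{-1/r}\max_{1\le i\le n}|Y_i|$ over a small number of simulated samples. The function name `scaled_max` and the chosen sample sizes are hypothetical.

```python
import numpy as np

def scaled_max(n, r, reps, rng):
    """Average over reps samples of n^(-1/r) * max_i |Y_i| for i.i.d. N(0, 1) draws."""
    y = np.abs(rng.normal(size=(reps, n)))
    return np.mean(n ** (-1.0 / r) * y.max(axis=1))

rng = np.random.default_rng(0)
for n in (100, 10_000, 100_000):
    print(n, scaled_max(n, r=2, reps=20, rng=rng))
```

The largest observation itself grows with the sample size, but the scaled maximum should fall toward zero, in line with (6.6).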