TA9 Econometrics I 2016 TA session


TA session note#9

Shouto Yonekura

June 20, 2016

Abstract: The TA session on 5th July will be canceled.

1 MLE

Suppose that the data $x$ are the observed value of a random vector $X$ from some parametric family of densities or mass functions, $X \sim f(x; \theta)$, where in general $\theta \in \Theta \subseteq \mathbb{R}^k$. Let $X := (X_1, X_2, \dots, X_n)$ and $x := (x_1, x_2, \dots, x_n)$. After observing $x$, the likelihood function is defined by

\[
L(\theta) := f(x; \theta),
\]

viewed as a function of $\theta$. If $X_1, \dots, X_n \overset{iid}{\sim} f(x; \theta)$, then $L(\theta) = \prod_{i} f(x_i; \theta)$. Usually we work with the log-likelihood function $l(\theta) := \ln L(\theta)$.

Example 1

Let $X$ be a single observation taking values in $\{0, 1, 2\}$ according to $f(x; \theta)$, where $\theta = \theta_0$ or $\theta_1$, and the values of $f(x; \theta_j)$ are given below:

\[
\begin{array}{c|ccc}
 & x = 0 & x = 1 & x = 2 \\ \hline
\theta = \theta_0 & 0.8 & 0.1 & 0.1 \\
\theta = \theta_1 & 0.2 & 0.3 & 0.5
\end{array}
\]

If $X = 0$ is observed, it is more plausible that it came from $f(x; \theta_0)$, since $f(0; \theta_0) = 0.8$ is much larger than $f(0; \theta_1) = 0.2$. We then estimate $\theta$ by $\theta_0$. On the other hand, if $X = 1$ or $2$, it is more plausible that it came from $f(x; \theta_1)$. This suggests the following estimator of $\theta$:

\[
T(X) =
\begin{cases}
\theta_0 & \text{if } X = 0, \\
\theta_1 & \text{if } X \neq 0.
\end{cases}
\]

This leads to the following natural definition.

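Def 9.1 The Maximum Likelihood Estimator (MLE)

Suppose that $X \sim f(x; \theta)$, $\theta \in \Theta \subseteq \mathbb{R}^k$. Let $L(\theta)$ be the likelihood function. Then the MLE $\hat\theta$ is defined by

\[
\hat\theta := \arg\max_{\theta \in \Theta} L(\theta),
\]

or equivalently $\hat\theta := \arg\min_{\theta \in \Theta} \{-L(\theta)\}$.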


In most cases $l$ is differentiable and $\hat\theta$ is obtained by solving the likelihood equation $l'(\theta) = 0$.

Since $\ln x$ is a strictly increasing function and $L(\theta)$ can be assumed to be positive without loss of generality, $\hat\theta$ is an MLE if and only if it maximizes the log-likelihood function $l(\theta)$.
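Def 9.1 applied to the table of Example 1 can be sketched in a few lines of Python. This is only an illustration: the dictionary `pmf` and the function name `mle` are hypothetical names introduced here, and the parameter set is the two-point set $\{\theta_0, \theta_1\}$ from the example.

```python
# Likelihood table from Example 1: pmf[theta][x] = f(x; theta).
pmf = {
    "theta0": {0: 0.8, 1: 0.1, 2: 0.1},
    "theta1": {0: 0.2, 1: 0.3, 2: 0.5},
}


def mle(x):
    """Return the parameter label that maximizes L(theta) = f(x; theta)."""
    return max(pmf, key=lambda theta: pmf[theta][x])


for x in (0, 1, 2):
    print(x, mle(x))
```

The result agrees with the estimator $T(X)$ of Example 1: $\theta_0$ is chosen only when $X = 0$.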

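Example 2

Let $X_1, \dots, X_n \overset{iid}{\sim} N(\mu, \sigma^2)$ and $\theta = (\mu, \sigma^2)'$. Then the log-likelihood function is given by

\[
l(\theta) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}\sum_i (x_i - \mu)^2 .
\]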

The likelihood equation becomes:

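\[
\partial_\mu l(\theta) = \frac{1}{\sigma^2}\sum_i (x_i - \mu) = 0,
\]

\[
\partial_{\sigma^2} l(\theta) = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_i (x_i - \mu)^2 = 0.
\]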

By solving these equations, we get

\[
\hat\mu = n^{-1}\sum_i x_i, \qquad \hat\sigma^2 = n^{-1}\sum_i (x_i - \hat\mu)^2 .
\]
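As a quick numerical sanity check of Example 2, the following sketch evaluates both likelihood equations at the closed-form solution on a simulated sample. The sample itself (200 draws with mean 1 and standard deviation 2, fixed seed) and the function names are assumptions made only for illustration.

```python
import math
import random


def log_likelihood(mu, sigma2, xs):
    """Normal log-likelihood l(theta) with theta = (mu, sigma^2)."""
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    return (-0.5 * n * math.log(2 * math.pi)
            - 0.5 * n * math.log(sigma2)
            - ss / (2 * sigma2))


def score(mu, sigma2, xs):
    """Partial derivatives of l(theta) with respect to mu and sigma^2."""
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    d_mu = sum(x - mu for x in xs) / sigma2
    d_sigma2 = -n / (2 * sigma2) + ss / (2 * sigma2 ** 2)
    return d_mu, d_sigma2


random.seed(0)
xs = [random.gauss(1.0, 2.0) for _ in range(200)]  # hypothetical sample

# Closed-form MLE from Example 2.
mu_hat = sum(xs) / len(xs)
sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / len(xs)

# Both components should vanish up to floating-point rounding.
print(score(mu_hat, sigma2_hat, xs))
```

Evaluating `log_likelihood` at nearby parameter values gives smaller values than at `(mu_hat, sigma2_hat)`, as expected for a maximizer.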

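Example 3

Consider the following regression model:

\[
y = X\beta + u, \qquad u \sim N_n(0, \sigma^2 I_n),
\]

where $y$ is $n \times 1$ and $X$ is $n \times k$. First, the density function of $u$ is given by

\[
f_u(u) = (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{u'u}{2\sigma^2}\right).
\]

By a change of variables from $u$ to $y$, whose Jacobian determinant $|\partial u / \partial y'|$ equals 1, we get

\[
f_y(y) = f_u(y - X\beta)\left|\frac{\partial u}{\partial y'}\right|
= (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{(y - X\beta)'(y - X\beta)}{2\sigma^2}\right).
\]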

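Therefore the log-likelihood function of $y$ is given by

\[
l(\theta) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln(\sigma^2) - \frac{(y - X\beta)'(y - X\beta)}{2\sigma^2}.
\]

The likelihood equations become:

\[
\partial_\beta l(\theta) = \frac{X'y - X'X\beta}{\sigma^2} = 0,
\]

\[
\partial_{\sigma^2} l(\theta) = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}(y - X\beta)'(y - X\beta) = 0.
\]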


By solving these equations, we get

\[
\hat\beta = (X'X)^{-1}X'y, \qquad
\hat\sigma^2 = \frac{(y - X\hat\beta)'(y - X\hat\beta)}{n} = \frac{\hat e'\hat e}{n}.
\]

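Note that in the case of OLS, $\hat\sigma^2 = \hat e'\hat e / (n - k)$, and this is an unbiased estimator of $\sigma^2$. In the case of MLE, however,

\[
E[\hat\sigma^2] = E\left[\frac{\hat e'\hat e}{n}\right] = \frac{(n - k)\sigma^2}{n} < \sigma^2,
\]

so the MLE of $\sigma^2$ is not unbiased. This is called small sample bias; it vanishes as $n \to \infty$ with $k$ fixed.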
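The size of this bias can be illustrated by a small Monte Carlo experiment. The sketch below assumes a simple regression with an intercept and one fixed regressor (so $k = 2$), $n = 10$, $\sigma^2 = 4$, and arbitrary coefficients 1.0 and 0.5; all of these values and the function name are illustrative choices, not part of the notes.

```python
import random


def ssr_simple_regression(xs, ys):
    """Residual sum of squares from OLS of y on an intercept and x (k = 2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))


random.seed(0)
n, k = 10, 2
sigma2 = 4.0
xs = [float(i) for i in range(n)]  # fixed (non-random) regressor
reps = 20000

total = 0.0
for _ in range(reps):
    ys = [1.0 + 0.5 * x + random.gauss(0.0, sigma2 ** 0.5) for x in xs]
    total += ssr_simple_regression(xs, ys)
mean_ssr = total / reps

mle_mean = mean_ssr / n        # Monte Carlo estimate of E[e'e / n]
ols_mean = mean_ssr / (n - k)  # Monte Carlo estimate of E[e'e / (n - k)]

print("MLE:", mle_mean, "theory:", (n - k) * sigma2 / n)
print("OLS:", ols_mean, "theory:", sigma2)
```

Under these assumptions the simulated mean of the MLE should be close to $(n-k)\sigma^2/n = 3.2$, while the OLS estimator should average close to $\sigma^2 = 4$.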

2 The Fisher information matrix

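Prop 9.2

B1 $f(x; \theta)$, $\theta \in \Theta \subseteq \mathbb{R}^k$, is twice differentiable with respect to $\theta$.
B2 There exists an integrable function $\phi(x)$ such that $|\partial_{\theta_i} f(x; \theta)| < \phi(x)$ for all $i$ and $x$.

Under these conditions,

\[
E[\partial_{\theta_i} \ln f(X; \theta)] = 0 \quad \forall i
\]

holds.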

Proof

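Without loss of generality, let $k = 1$. Then

\[
\begin{aligned}
E[\partial_\theta \ln f(X; \theta)] &= \int \partial_\theta \ln f(x; \theta)\, f(x; \theta)\,dx \\
&= \int \frac{\partial_\theta f(x; \theta)}{f(x; \theta)}\, f(x; \theta)\,dx \\
&= \int \partial_\theta f(x; \theta)\,dx \\
&= \partial_\theta \int f(x; \theta)\,dx \\
&= \partial_\theta 1 \\
&= 0,
\end{aligned}
\]

where the interchange of differentiation and integration in the fourth equality is justified by B2. Q.E.D.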

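Def 9.3 The Fisher information matrix

Let $\{X_n\} \sim f(x; \theta)$, $\theta \in \Theta \subseteq \mathbb{R}^k$, with $V[X_n] < \infty$ for all $n$. Then the Fisher information matrix $I(\theta)$ is defined as follows:

\[
I(\theta) := E\left[(\partial_\theta \ln f(X; \theta))(\partial_\theta \ln f(X; \theta))'\right],
\]

where the $(i, j)$ component of $I(\theta)$ is

\[
I(\theta)_{ij} = E\left[\partial_{\theta_i} \ln f(X; \theta)\, \partial_{\theta_j} \ln f(X; \theta)\right], \quad i, j = 1, 2, \dots, k.
\]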


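If $k = 1$, then

\[
\begin{aligned}
E[(\partial_\theta \ln f(X; \theta))^2] &= E\left[\left(\frac{\partial_\theta f(X; \theta)}{f(X; \theta)}\right)^2\right] \\
&= \int \left(\frac{\partial_\theta f(x; \theta)}{f(x; \theta)}\right)^2 f(x; \theta)\,dx \\
&= \int \frac{(\partial_\theta f(x; \theta))^2}{f(x; \theta)}\,dx.
\end{aligned}
\]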

Prop 9.4

B3 B1 and B2 hold.
B4 There exists an integrable function $\phi(x)$ such that $|\partial^2_{\theta_i} f(x; \theta)| < \phi(x)$ for all $i$ and $x$.

Under these conditions,

\[
I(\theta)_{ii} = -E[\partial^2_{\theta_i} \ln f(X; \theta)]
\]

holds.

Proof

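Without loss of generality, let $k = 1$, and write $l(\theta; x) := \ln f(x; \theta)$. Then

\[
l''(\theta; x) = \partial_\theta\left(\frac{\partial_\theta f(x; \theta)}{f(x; \theta)}\right)
= \frac{\partial^2_\theta f(x; \theta)}{f(x; \theta)} - \left(\frac{\partial_\theta f(x; \theta)}{f(x; \theta)}\right)^2
\]

holds. Multiplying both sides by $f(x; \theta)$ and integrating with respect to $x$, we get

\[
\begin{aligned}
E[l''(\theta; X)] &= \int \frac{\partial^2_\theta f(x; \theta)}{f(x; \theta)}\, f(x; \theta)\,dx - \int \left(\frac{\partial_\theta f(x; \theta)}{f(x; \theta)}\right)^2 f(x; \theta)\,dx \\
&= \int \partial^2_\theta f(x; \theta)\,dx - I(\theta) \\
&= \partial^2_\theta \int f(x; \theta)\,dx - I(\theta) \\
&= -I(\theta).
\end{aligned}
\]

Therefore, $I(\theta) = -E[\partial^2_\theta \ln f(X; \theta)]$ holds. Q.E.D.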
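Prop 9.2 and Prop 9.4 can be checked exactly in a case where the integrals reduce to finite sums. The sketch below assumes a Bernoulli($p$) mass function with $p = 0.3$, which is not an example from these notes; it is chosen only because the expectations can be computed without any simulation error. All function names are hypothetical.

```python
def pmf(x, p):
    """Bernoulli(p) mass function f(x; p) for x in {0, 1}."""
    return p if x == 1 else 1.0 - p


def score(x, p):
    """First derivative of ln f(x; p) with respect to p."""
    return x / p - (1 - x) / (1.0 - p)


def second_derivative(x, p):
    """Second derivative of ln f(x; p) with respect to p."""
    return -x / p ** 2 - (1 - x) / (1.0 - p) ** 2


def expect(g, p):
    """E[g(X, p)] as an exact sum over the support {0, 1}."""
    return sum(g(x, p) * pmf(x, p) for x in (0, 1))


p = 0.3
mean_score = expect(score, p)                            # Prop 9.2: should be 0
info_outer = expect(lambda x, q: score(x, q) ** 2, p)    # E[(d ln f)^2]
info_hessian = -expect(second_derivative, p)             # -E[d^2 ln f]

print(mean_score, info_outer, info_hessian, 1.0 / (p * (1.0 - p)))
```

For the Bernoulli family both expressions for the information equal $1/(p(1-p))$, so the last three printed numbers should coincide up to rounding.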

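Example 4

Let $X_1, \dots, X_n \overset{iid}{\sim} N(\mu, \sigma^2)$. Then $I(\theta)$ can be calculated as follows:

\[
\begin{aligned}
\partial^2_{\mu\sigma^2} l(\theta) &= -\frac{1}{\sigma^4}\sum_i (x_i - \mu), \\
\partial^2_{\mu^2} l(\theta) &= -\frac{n}{\sigma^2}, \\
\partial^2_{(\sigma^2)^2} l(\theta) &= \frac{n}{2\sigma^4} - \frac{1}{\sigma^6}\sum_i (x_i - \mu)^2,
\end{aligned}
\]

\[
I(\theta) = -E
\begin{bmatrix}
-\dfrac{n}{\sigma^2} & -\dfrac{1}{\sigma^4}\sum_i (x_i - \mu) \\[2ex]
-\dfrac{1}{\sigma^4}\sum_i (x_i - \mu) & \dfrac{n}{2\sigma^4} - \dfrac{1}{\sigma^6}\sum_i (x_i - \mu)^2
\end{bmatrix}
=
\begin{bmatrix}
\dfrac{n}{\sigma^2} & 0 \\[2ex]
0 & \dfrac{n}{2\sigma^4}
\end{bmatrix}.
\]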
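The matrix in Example 4 can also be approximated from Def 9.3 directly, by averaging the outer product of the score over simulated samples. The following is a rough Monte Carlo sketch; the values $\mu = 0$, $\sigma^2 = 2$, $n = 5$, the number of replications, and the seed are arbitrary assumptions.

```python
import random


def score(mu, sigma2, xs):
    """Gradient of the normal log-likelihood with respect to (mu, sigma^2)."""
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    d_mu = sum(x - mu for x in xs) / sigma2
    d_sigma2 = -n / (2 * sigma2) + ss / (2 * sigma2 ** 2)
    return d_mu, d_sigma2


random.seed(0)
mu, sigma2, n, reps = 0.0, 2.0, 5, 40000

# Average the outer product of the score over simulated samples of size n.
acc = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(reps):
    xs = [random.gauss(mu, sigma2 ** 0.5) for _ in range(n)]
    s = score(mu, sigma2, xs)
    for i in range(2):
        for j in range(2):
            acc[i][j] += s[i] * s[j] / reps

# Closed form from Example 4.
expected = [[n / sigma2, 0.0], [0.0, n / (2 * sigma2 ** 2)]]

print(acc)
print(expected)
```

With these settings the closed form is a diagonal matrix with entries 2.5 and 0.625, and the simulated matrix should be close to it up to Monte Carlo error.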


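Example 5

Consider the following regression model:

\[
y = X\beta + u, \qquad u \sim N_n(0, \sigma^2 I_n).
\]

Then $I(\theta)$ can be calculated as follows:

\[
\begin{aligned}
\partial^2_{\beta\sigma^2} l(\theta) &= \frac{X'X\beta - X'y}{\sigma^4}, \\
\partial^2_{\beta\beta'} l(\theta) &= -\frac{X'X}{\sigma^2}, \\
\partial^2_{(\sigma^2)^2} l(\theta) &= \frac{n}{2\sigma^4} - \frac{1}{\sigma^6}(y - X\beta)'(y - X\beta),
\end{aligned}
\]

\[
\begin{aligned}
I(\theta) &= -E
\begin{bmatrix}
-\dfrac{X'X}{\sigma^2} & \dfrac{X'X\beta - X'y}{\sigma^4} \\[2ex]
\dfrac{(X'X\beta - X'y)'}{\sigma^4} & \dfrac{n}{2\sigma^4} - \dfrac{1}{\sigma^6}(y - X\beta)'(y - X\beta)
\end{bmatrix} \\[1ex]
&= -
\begin{bmatrix}
-\dfrac{X'X}{\sigma^2} & \dfrac{X'X\beta - X'X\beta}{\sigma^4} \\[2ex]
\dfrac{(X'X\beta - X'X\beta)'}{\sigma^4} & \dfrac{n}{2\sigma^4} - \dfrac{n\sigma^2}{\sigma^6}
\end{bmatrix} \\[1ex]
&=
\begin{bmatrix}
\dfrac{X'X}{\sigma^2} & 0 \\[2ex]
0 & \dfrac{n}{2\sigma^4}
\end{bmatrix}.
\end{aligned}
\]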

Prop 9.5

B5 B1 and B2 hold.
B6 $\{X_n\}$ are iid.

Under these conditions,

\[
n I_1(\theta) = I(\theta)
\]

holds, where $I_1(\theta)$ is the Fisher information matrix of $X_1$ and $I(\theta)$ is the Fisher information matrix of $\{X_n\}$.

Proof

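Without loss of generality, let $k = 1$. Since $E[l'(\theta)] = 0$, $I(\theta)$ can be rewritten as follows:

\[
\begin{aligned}
I(\theta) &= E[l'(\theta)^2] \\
&= \int l'(\theta)^2 f(x; \theta)\,dx \\
&= V[l'(\theta)].
\end{aligned}
\]

By assumption, $\{X_n\}$ are iid, and this implies $l'(\theta) = \sum_{i=1}^n l_1'(\theta; x_i)$. Therefore, using independence in the third equality,

\[
\begin{aligned}
I(\theta) &= V[l'(\theta)] \\
&= V\left[\sum_{i=1}^n l_1'(\theta; X_i)\right] \\
&= n V[l_1'(\theta; X_1)] \\
&= n I_1(\theta).
\end{aligned}
\]

Q.E.D.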
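The identity $I(\theta) = nI_1(\theta)$ can be verified exactly when the sample space is finite. The sketch below enumerates all $2^n$ outcomes of $n$ iid Bernoulli($p$) draws; the Bernoulli family, $p = 0.3$, $n = 6$, and the function names are assumptions made for illustration only.

```python
from itertools import product


def pmf(x, p):
    """Bernoulli(p) mass function for a single observation."""
    return p if x == 1 else 1.0 - p


def score(x, p):
    """l_1'(p; x): derivative of ln f(x; p) for a single observation."""
    return x / p - (1 - x) / (1.0 - p)


def information(n, p):
    """E[l'(p)^2] for n iid Bernoulli(p) draws, summed over all 2^n outcomes."""
    total = 0.0
    for xs in product((0, 1), repeat=n):
        prob = 1.0
        for x in xs:
            prob *= pmf(x, p)  # joint pmf under independence
        joint_score = sum(score(x, p) for x in xs)  # l'(p) = sum of l_1'
        total += joint_score ** 2 * prob
    return total


p, n = 0.3, 6
i_1 = information(1, p)
i_n = information(n, p)
print(i_n, n * i_1)
```

The two printed numbers should agree up to floating-point rounding, since the cross terms $E[l_1'(X_i)\,l_1'(X_j)]$, $i \neq j$, vanish by independence and Prop 9.2.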

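Example 6

Let $X_1, \dots, X_n \overset{iid}{\sim} N(\mu, \sigma^2)$ and let $l_1(\theta) := \ln f(x_1; \theta)$ be the log-likelihood of the single observation $x_1$. Then $I_1(\theta)$ can be calculated as follows:

\[
\begin{aligned}
\partial^2_{\mu\sigma^2} l_1(\theta) &= -\frac{1}{\sigma^4}(x_1 - \mu), \\
\partial^2_{\mu^2} l_1(\theta) &= -\frac{1}{\sigma^2}, \\
\partial^2_{(\sigma^2)^2} l_1(\theta) &= \frac{1}{2\sigma^4} - \frac{1}{\sigma^6}(x_1 - \mu)^2,
\end{aligned}
\]

\[
I_1(\theta) = -E
\begin{bmatrix}
-\dfrac{1}{\sigma^2} & -\dfrac{1}{\sigma^4}(x_1 - \mu) \\[2ex]
-\dfrac{1}{\sigma^4}(x_1 - \mu) & \dfrac{1}{2\sigma^4} - \dfrac{1}{\sigma^6}(x_1 - \mu)^2
\end{bmatrix}
=
\begin{bmatrix}
\dfrac{1}{\sigma^2} & 0 \\[2ex]
0 & \dfrac{1}{2\sigma^4}
\end{bmatrix}
= \frac{1}{n} I(\theta).
\]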

3 The Cramer-Rao Lower Bound

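Prop 9.6 The Cramer-Rao Lower Bound (CRLB)

Let $\{X_n\} \overset{iid}{\sim} f(x; \theta)$, $\theta \in \Theta \subseteq \mathbb{R}^k$, and $X := (X_1, X_2, \dots, X_n)$. Moreover, let $f_X(x; \theta)$ be the joint pdf of $X$ and let $T^i(X)$ be an unbiased estimator of $\theta_i$ for each $i$.

C1 $V[T^i(X)] < \infty$ $\forall i$.
C2 $f(x; \theta)$ is differentiable with respect to $\theta_i$ on $\Theta$ $\forall i$.
C3 $E[\partial^2_{\theta_i} \ln f(X; \theta)] < \infty$ $\forall i$.
C4 $E[\partial_{\theta_i} \ln f(X_j; \theta)] = 0$ $\forall i, j$.
C5 $E[T^i(X)\, \partial_{\theta_i} \ln f_X(X; \theta)] = 1$ $\forall i$.

Under these conditions,

\[
V[T^i(X)] \geq [I(\theta)^{-1}]_{ii} \quad \forall i
\]

holds.

Proof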

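Without loss of generality, let $k = 1$. Since $T(X)$ is unbiased, $E[T(X)] = \theta$, and differentiating both sides with respect to $\theta$ gives

\[
\begin{aligned}
1 = \partial_\theta E[T(X)] &= \partial_\theta \int T(x) f_X(x; \theta)\,dx \\
&= \int T(x)\, \partial_\theta f_X(x; \theta)\,dx \\
&= \int T(x)\, \partial_\theta \ln f_X(x; \theta)\, f_X(x; \theta)\,dx \\
&= E[T(X) l'(\theta)].
\end{aligned}
\]

Since $E[l'(\theta)] = 0$ (Prop 9.2), this can be rewritten as follows:

\[
\begin{aligned}
E[T(X) l'(\theta)] &= E[(T(X) - \theta) l'(\theta)] \\
&= \mathrm{Cov}(T(X), l'(\theta)).
\end{aligned}
\]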


This leads to

\[
1 = \mathrm{Cov}(T(X), l'(\theta))^2 \leq V[T(X)]\, V[l'(\theta)] = V[T(X)]\, I(\theta),
\]

where the inequality follows from $-1 \leq \dfrac{\mathrm{Cov}(X, Y)}{\sqrt{V[X]}\sqrt{V[Y]}} \leq 1$. Therefore, $V[T(X)] \geq 1/I(\theta)$ holds. Q.E.D.
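The two steps of this proof, $\mathrm{Cov}(T(X), l'(\theta)) = 1$ and $V[T(X)] \geq 1/I(\theta)$, can be traced exactly in a small discrete model. The sketch below assumes $n = 5$ iid Bernoulli($p$) observations with $p = 0.3$ and compares two unbiased estimators of $p$: the sample mean and the first observation alone. The model, the estimators, and all names are illustrative assumptions.

```python
from itertools import product

n, p = 5, 0.3


def expectation(g):
    """E[g(X)] for X = (X_1, ..., X_n) iid Bernoulli(p), by enumeration."""
    total = 0.0
    for xs in product((0, 1), repeat=n):
        prob = 1.0
        for x in xs:
            prob *= p if x == 1 else 1.0 - p
        total += g(xs) * prob
    return total


def joint_score(xs):
    """l'(p) for the joint Bernoulli log-likelihood."""
    return sum(x / p - (1 - x) / (1.0 - p) for x in xs)


def sample_mean(xs):
    return sum(xs) / len(xs)


def first_obs(xs):
    return float(xs[0])


info = expectation(lambda xs: joint_score(xs) ** 2)  # I(p)

results = {}
for name, t in (("mean", sample_mean), ("first", first_obs)):
    # Because E[l'(p)] = 0, E[T l'] equals Cov(T, l').
    cov = expectation(lambda xs: t(xs) * joint_score(xs))
    var = expectation(lambda xs: (t(xs) - p) ** 2)
    results[name] = (cov, var)
    print(name, cov, var, 1.0 / info)
```

Both estimators give a covariance of 1 with the score, as the proof requires for any unbiased estimator, but only the sample mean has variance equal to $1/I(p)$; the first observation alone has variance $n$ times larger.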

Def 9.7 Uniformly Minimum Variance Unbiased Estimator (UMVUE)

Let $T(X)$ and $T'(X)$ be unbiased estimators of $\theta$. If

\[
V[T'(X)] \geq V[T(X)] \quad \text{for any } T' \text{ and all } \theta
\]

holds, then $T(X)$ is said to be the Uniformly Minimum Variance Unbiased Estimator (UMVUE).

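Thm 9.8

Let $T(X)$ be an unbiased estimator of $\theta$. If

\[
V[T(X)] = I(\theta)^{-1} \quad \forall \theta
\]

holds, then $T(X)$ is UMVUE.

Proof

Obvious from Prop 9.6: every unbiased estimator $T'(X)$ satisfies $V[T'(X)] \geq I(\theta)^{-1} = V[T(X)]$. Q.E.D.

Example 7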

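Let $\{X_n\} \overset{iid}{\sim} N(\mu, \sigma^2)$ with $\sigma^2 < \infty$. Then $\bar X := n^{-1}\sum_i X_i$ is the UMVUE of $\mu$.

Proof

First we have to check assumptions C1 to C5 with $T(X) = \bar X$ and $\theta = \mu$.

C1 $V[\bar X] = n^{-1}\sigma^2 < \infty$.
C2 $f(x; \mu)$ is differentiable with respect to $\mu$ on $\Theta$.
C3 $E[\partial^2_{\mu} l(\theta)] = -\dfrac{n}{\sigma^2} < \infty$.
C4 $E[\partial_\mu l(\theta)] = \dfrac{1}{\sigma^2} E\left[\sum_i (X_i - \mu)\right] = 0$.
C5 $E[T\, \partial_\mu \ln f_X(X; \theta)] = E\left[\bar X \dfrac{n}{\sigma^2}(\bar X - \mu)\right] = \dfrac{n}{\sigma^2} E[\bar X^2 - \mu \bar X] = \dfrac{n}{\sigma^2} V[\bar X] = 1$.

Thus the CRLB is given by $1/I(\theta) = \sigma^2/n = V[T(X)]$, where $I(\theta) = n/\sigma^2$. Therefore $\bar X := n^{-1}\sum_i X_i$ is UMVUE by Thm 9.8. Q.E.D.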
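The conclusion of Example 7 can be illustrated by simulation. The sketch below compares the sampling variance of $\bar X$ with that of the sample median, which is also unbiased for $\mu$ under normality by symmetry. The values $\mu = 1$, $\sigma = 2$, $n = 25$, the number of replications, and the seed are arbitrary assumptions, and the median is a comparison estimator introduced here, not one discussed in the notes.

```python
import random
import statistics


def variance(values):
    """Population variance of a list of numbers."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


random.seed(0)
mu, sigma, n, reps = 1.0, 2.0, 25, 20000

means, medians = [], []
for _ in range(reps):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    means.append(sum(xs) / n)
    medians.append(statistics.median(xs))

crlb = sigma ** 2 / n           # 1 / I(mu) = sigma^2 / n
var_mean = variance(means)      # simulated V[sample mean]
var_median = variance(medians)  # simulated V[sample median]

print("CRLB:", crlb)
print("V[mean]:", var_mean)
print("V[median]:", var_median)
```

Up to Monte Carlo error, the variance of the sample mean should match the bound $\sigma^2/n = 0.16$, while the median, although unbiased, should show a clearly larger variance.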