
8 Generalized Least Squares Method (GLS)

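1. Regression model: $y = X\beta + u$, $u \sim N(0, \sigma^2\Omega)$

2. Heteroscedasticity (unequal variances across observations):

$$
\sigma^2\Omega =
\begin{pmatrix}
\sigma_1^2 & 0 & \cdots & 0 \\
0 & \sigma_2^2 & \ddots & \vdots \\
\vdots & \ddots & \ddots & 0 \\
0 & \cdots & 0 & \sigma_n^2
\end{pmatrix}
$$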
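A small numerical check of what the diagonal form means in practice. This is a sketch assuming NumPy, hypothetical variances, and $\sigma^2$ normalized to 1 (so $\Omega = \mathrm{diag}(\sigma_i^2)$): because $\Omega^{-1}$ is then diagonal, a quadratic form weighted by $\Omega^{-1}$, such as the GLS criterion in item 3, is just a sum of squares weighted by $1/\sigma_i^2$.

```python
import numpy as np

# Hypothetical values; sigma^2 is normalized to 1, so Omega = diag(sigma_i^2).
sig2_i = np.array([0.5, 1.0, 2.0, 4.0])   # per-observation variances sigma_i^2
Omega = np.diag(sig2_i)                   # heteroscedastic (diagonal) Omega

resid = np.array([1.0, -2.0, 0.5, 3.0])  # any residual vector y - Xb

# Quadratic form resid' Omega^{-1} resid, computed with a linear solve
quad = resid @ np.linalg.solve(Omega, resid)

# The same quantity as a weighted sum of squares with weights 1 / sigma_i^2
wssr = np.sum(resid**2 / sig2_i)

print(quad, wssr)  # 8.375 8.375
```

This is why GLS under pure heteroscedasticity is often described as weighted least squares: observations with larger $\sigma_i^2$ receive less weight.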

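First-order autocorrelation (serial correlation):

In the case of time series data, the subscript is conventionally $t$, not $i$.

$$
u_t = \rho u_{t-1} + \epsilon_t, \qquad \epsilon_t \sim \text{iid } N(0, \sigma_\epsilon^2), \qquad |\rho| < 1
$$

$$
\sigma^2\Omega = \frac{\sigma_\epsilon^2}{1-\rho^2}
\begin{pmatrix}
1 & \rho & \rho^2 & \cdots & \rho^{n-1} \\
\rho & 1 & \rho & \cdots & \rho^{n-2} \\
\rho^2 & \rho & 1 & \cdots & \rho^{n-3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\rho^{n-1} & \rho^{n-2} & \rho^{n-3} & \cdots & 1
\end{pmatrix}
$$

$$
V(u_t) = \sigma^2 = \frac{\sigma_\epsilon^2}{1-\rho^2}
$$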
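The AR(1) matrix above can be built directly from its $(i, j)$ element $\rho^{|i-j|}$. The sketch below assumes NumPy and hypothetical values of $n$, $\rho$ and $\sigma_\epsilon^2$; the helper names `ar1_omega` and `ar1_omega_inv` are illustrative. It also checks numerically a standard result: $\Omega^{-1}$ is tridiagonal, which is why GLS with AR(1) errors is cheap to compute.

```python
import numpy as np

def ar1_omega(n, rho):
    """Omega for AR(1) errors: (i, j) element rho^|i-j| (sigma^2 factored out)."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def ar1_omega_inv(n, rho):
    """Tridiagonal inverse of ar1_omega(n, rho); assumes n >= 2 and |rho| < 1."""
    main = np.full(n, 1.0 + rho**2)
    main[0] = main[-1] = 1.0
    inv = np.diag(main) - rho * (np.eye(n, k=1) + np.eye(n, k=-1))
    return inv / (1.0 - rho**2)

n, rho, sigma_eps2 = 5, 0.6, 2.0          # hypothetical values
Omega = ar1_omega(n, rho)
sigma2 = sigma_eps2 / (1.0 - rho**2)      # V(u_t) = sigma_eps^2 / (1 - rho^2)

print(np.allclose(Omega @ ar1_omega_inv(n, rho), np.eye(n)))  # True
```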

3. The generalized least squares (GLS) estimator of $\beta$, denoted by $b$, solves the following minimization problem:

$$
\min_b \; (y - Xb)'\Omega^{-1}(y - Xb)
$$

The GLS estimator (GLSE) of $\beta$ is:

$$
b = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y
$$
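The GLSE formula translates directly into code. This is a minimal sketch assuming NumPy, a known $\Omega$, and simulated data with hypothetical values; `gls` is an illustrative helper, not a library function. It uses linear solves instead of forming $\Omega^{-1}$ explicitly, and checks that $\Omega = I_n$ reduces GLS to OLS.

```python
import numpy as np

def gls(X, y, Omega):
    """Hypothetical helper: b = (X' Omega^{-1} X)^{-1} X' Omega^{-1} y via linear solves."""
    Oi_X = np.linalg.solve(Omega, X)   # Omega^{-1} X
    Oi_y = np.linalg.solve(Omega, y)   # Omega^{-1} y
    return np.linalg.solve(X.T @ Oi_X, X.T @ Oi_y)

rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
sig2_i = rng.uniform(0.5, 3.0, size=n)           # hypothetical heteroscedastic variances
Omega = np.diag(sig2_i)
beta = np.array([1.0, 2.0])
y = X @ beta + np.sqrt(sig2_i) * rng.normal(size=n)

b = gls(X, y, Omega)
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]

print(b)
print(np.allclose(gls(X, y, np.eye(n)), b_ols))  # Omega = I_n gives OLS: True
```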

4. In general, when $\Omega$ is symmetric, it can be decomposed as follows:

$$
\Omega = A'\Lambda A
$$

$\Lambda$ is a diagonal matrix whose diagonal elements are the eigenvalues of $\Omega$. $A$ is an orthogonal matrix ($A'A = AA' = I_n$) whose rows are the corresponding eigenvectors.

When $\Omega$ is positive definite, all the diagonal elements of $\Lambda$ are positive.
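NumPy's `numpy.linalg.eigh` returns the eigenvalues and a matrix $V$ whose columns are eigenvectors, with $\Omega = V\,\mathrm{diag}(\lambda)\,V'$. In the notation above that means $A = V'$. The sketch below uses a hypothetical $3 \times 3$ positive definite $\Omega$ and checks the decomposition, the orthogonality of $A$, and the positivity of the eigenvalues.

```python
import numpy as np

Omega = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])    # hypothetical symmetric positive definite Omega

lam, V = np.linalg.eigh(Omega)        # Omega = V diag(lam) V'
A = V.T                               # rows of A are eigenvectors: Omega = A' Lambda A
Lam = np.diag(lam)

print(np.allclose(A.T @ Lam @ A, Omega))    # decomposition holds
print(np.allclose(A @ A.T, np.eye(3)))      # A is orthogonal
print(np.all(lam > 0))                      # positive definite => positive eigenvalues
```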

5. There exists $P$ such that $\Omega = PP'$ (e.g., take $P = A'\Lambda^{1/2}$). $\Longrightarrow$ $P^{-1}\Omega P'^{-1} = I_n$.

Premultiply both sides of $y = X\beta + u$ by $P^{-1}$. We have:

$$
y^* = X^*\beta + u^*,
$$

where $y^* = P^{-1}y$, $X^* = P^{-1}X$, and $u^* = P^{-1}u$.

The variance of $u^*$ is:

$$
V(u^*) = V(P^{-1}u) = P^{-1}V(u)P'^{-1} = \sigma^2 P^{-1}\Omega P'^{-1} = \sigma^2 I_n,
$$

because $\Omega = PP'$, i.e., $P^{-1}\Omega P'^{-1} = I_n$.
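The construction of $P$ can be verified numerically. This sketch assumes NumPy and a randomly generated (hypothetical) positive definite $\Omega$; it builds $P = A'\Lambda^{1/2}$ from the eigendecomposition and checks $PP' = \Omega$ and $P^{-1}\Omega P'^{-1} = I_n$. It also shows that $P$ is not unique: the Cholesky factor $L$ with $\Omega = LL'$, a different choice swapped in here for comparison, satisfies the same identity.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(4, 4))
Omega = M @ M.T + 4 * np.eye(4)        # hypothetical positive definite Omega

lam, V = np.linalg.eigh(Omega)
A = V.T
P = A.T @ np.diag(np.sqrt(lam))        # P = A' Lambda^{1/2}, so P P' = Omega
P_inv = np.linalg.inv(P)

print(np.allclose(P @ P.T, Omega))                       # Omega = P P'
print(np.allclose(P_inv @ Omega @ P_inv.T, np.eye(4)))   # P^{-1} Omega P'^{-1} = I_n

# Alternative choice of P: the lower-triangular Cholesky factor
L = np.linalg.cholesky(Omega)
L_inv = np.linalg.inv(L)
print(np.allclose(L_inv @ Omega @ L_inv.T, np.eye(4)))
```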

Accordingly, the regression model is rewritten as:

$$
y^* = X^*\beta + u^*, \qquad u^* \sim (0, \sigma^2 I_n)
$$

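Apply OLS to the above model. Let $b$ be the OLS estimator of $\beta$ from the transformed model. That is, the minimization problem is given by:

$$
\min_b \; (y^* - X^*b)'(y^* - X^*b),
$$

which is equivalent to:

$$
\min_b \; (y - Xb)'\Omega^{-1}(y - Xb),
$$

because $(y^* - X^*b)'(y^* - X^*b) = (y - Xb)'P'^{-1}P^{-1}(y - Xb)$ and $P'^{-1}P^{-1} = (PP')^{-1} = \Omega^{-1}$.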

Solving the minimization problem above, we have the following estimator:

$$
b = (X^{*\prime}X^*)^{-1}X^{*\prime}y^* = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y,
$$

which is called the GLS (generalized least squares) estimator.
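The two routes to $b$ (OLS on the transformed data, and the GLS formula in the original variables) can be compared directly. This sketch assumes NumPy, simulated data, and a hypothetical positive definite $\Omega$; for convenience it uses the Cholesky factor as $P$, which is one valid choice of $P$ with $\Omega = PP'$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
M = rng.normal(size=(n, n))
Omega = M @ M.T / n + np.eye(n)        # hypothetical positive definite Omega
y = X @ np.array([0.5, -1.0, 2.0]) + rng.normal(size=n)

# Transform with P satisfying Omega = P P' (Cholesky factor here)
P = np.linalg.cholesky(Omega)
y_star = np.linalg.solve(P, y)         # y* = P^{-1} y
X_star = np.linalg.solve(P, X)         # X* = P^{-1} X

# OLS on the transformed model
b_star = np.linalg.lstsq(X_star, y_star, rcond=None)[0]

# GLS formula in the original variables
Oi = np.linalg.inv(Omega)
b_gls = np.linalg.solve(X.T @ Oi @ X, X.T @ Oi @ y)
print(np.allclose(b_star, b_gls))      # True

# The two objective functions agree for any candidate b
b_any = np.array([1.0, 0.0, 0.0])
lhs = np.sum((y_star - X_star @ b_any) ** 2)
rhs = (y - X @ b_any) @ Oi @ (y - X @ b_any)
print(np.isclose(lhs, rhs))            # True
```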

$b$ is rewritten as follows:

$$
b = \beta + (X^{*\prime}X^*)^{-1}X^{*\prime}u^* = \beta + (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}u
$$

The mean and variance of $b$ are given by:

$$
E(b) = \beta, \qquad V(b) = \sigma^2(X^{*\prime}X^*)^{-1} = \sigma^2(X'\Omega^{-1}X)^{-1}.
$$
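Both displayed identities are algebraic, so they can be checked without simulation error. The sketch below assumes NumPy, the AR(1) $\Omega$ of item 2 with a hypothetical $\rho = 0.5$, and $\sigma^2 = 1$; it draws $u \sim N(0, \Omega)$ via the Cholesky factor, then checks $b - \beta = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}u$ and that $(X^{*\prime}X^*)^{-1} = (X'\Omega^{-1}X)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=n)])
rho = 0.5                                            # hypothetical AR(1) parameter
idx = np.arange(n)
Omega = rho ** np.abs(idx[:, None] - idx[None, :])   # AR(1) Omega as in item 2
sigma2 = 1.0
beta = np.array([1.0, -0.5])

P = np.linalg.cholesky(Omega)                        # Omega = P P'
u = P @ rng.normal(size=n)                           # u ~ N(0, sigma^2 Omega) with sigma^2 = 1
y = X @ beta + u

Oi = np.linalg.inv(Omega)
XtOiX_inv = np.linalg.inv(X.T @ Oi @ X)
b = XtOiX_inv @ X.T @ Oi @ y

# b = beta + (X' Omega^{-1} X)^{-1} X' Omega^{-1} u
print(np.allclose(b, beta + XtOiX_inv @ X.T @ Oi @ u))

# V(b): sigma^2 (X*' X*)^{-1} equals sigma^2 (X' Omega^{-1} X)^{-1}
X_star = np.linalg.solve(P, X)
print(np.allclose(sigma2 * np.linalg.inv(X_star.T @ X_star), sigma2 * XtOiX_inv))
```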

6. Suppose that the regression model is given by:

$$
y = X\beta + u, \qquad u \sim N(0, \sigma^2\Omega).
$$

In this case, what happens when we use OLS?

$$
\hat\beta = (X'X)^{-1}X'y = \beta + (X'X)^{-1}X'u
$$

$$
V(\hat\beta) = \sigma^2(X'X)^{-1}X'\Omega X(X'X)^{-1}
$$

Compare GLS and OLS.

(a) Expectation:

$E(\hat\beta) = \beta$ and $E(b) = \beta$. Thus, both $\hat\beta$ and $b$ are unbiased estimators.

(b) Variance:

$$
V(\hat\beta) = \sigma^2(X'X)^{-1}X'\Omega X(X'X)^{-1}, \qquad V(b) = \sigma^2(X'\Omega^{-1}X)^{-1}
$$

Which is more efficient, OLS or GLS?

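$$
\begin{aligned}
V(\hat\beta) - V(b)
&= \sigma^2(X'X)^{-1}X'\Omega X(X'X)^{-1} - \sigma^2(X'\Omega^{-1}X)^{-1} \\
&= \sigma^2\Bigl((X'X)^{-1}X' - (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}\Bigr)\,\Omega\,
   \Bigl((X'X)^{-1}X' - (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}\Bigr)' \\
&= \sigma^2 A\Omega A',
\end{aligned}
$$

where $A = (X'X)^{-1}X' - (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}$ (a new $k \times n$ matrix, not the eigenvector matrix of item 4).

$\Omega$ is the variance-covariance matrix of $u$, which is positive definite. Therefore, $A\Omega A'$ is positive semidefinite, and it is nonzero unless $A = 0$ (which is the case, for example, when $\Omega = I_n$).

This implies that $V(\hat\beta_i) - V(b_i) \geq 0$ for the $i$-th element of $\beta$. Accordingly, $b$ is at least as efficient as $\hat\beta$, and whenever $A \neq 0$ at least one element of $b$ has strictly smaller variance.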
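The efficiency comparison can be checked numerically. This sketch assumes NumPy, simulated regressors, a hypothetical heteroscedastic $\Omega$, and $\sigma^2 = 1$. It verifies the identity $V(\hat\beta) - V(b) = \sigma^2 A\Omega A'$ and that the difference has no negative eigenvalues (up to rounding), i.e., it is positive semidefinite.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 25
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Omega = np.diag(rng.uniform(0.2, 5.0, size=n))   # hypothetical heteroscedastic Omega
sigma2 = 1.0

XtX_inv = np.linalg.inv(X.T @ X)
Oi = np.linalg.inv(Omega)
XtOiX_inv = np.linalg.inv(X.T @ Oi @ X)

V_ols = sigma2 * XtX_inv @ X.T @ Omega @ X @ XtX_inv
V_gls = sigma2 * XtOiX_inv

A = XtX_inv @ X.T - XtOiX_inv @ X.T @ Oi         # the k x n matrix A in the derivation
diff = V_ols - V_gls

print(np.allclose(diff, sigma2 * A @ Omega @ A.T))   # identity holds
eig = np.linalg.eigvalsh((diff + diff.T) / 2)        # symmetrize before eigvalsh
print(eig)                                           # nonnegative up to rounding
print(np.diag(diff))                                 # V(beta_hat_i) - V(b_i) >= 0
```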

7. If $u \sim N(0, \sigma^2\Omega)$, then $b \sim N\bigl(\beta, \sigma^2(X'\Omega^{-1}X)^{-1}\bigr)$.

Consider testing the hypothesis $H_0: R\beta = r$, where $R$ is $G \times k$ with $\mathrm{rank}(R) = G \leq k$.

$$
Rb \sim N\bigl(R\beta, \sigma^2 R(X'\Omega^{-1}X)^{-1}R'\bigr)
$$

Therefore, under $H_0$ (so that $R\beta = r$), the following quadratic form is distributed as:

$$
\frac{(Rb - r)'\bigl(R(X'\Omega^{-1}X)^{-1}R'\bigr)^{-1}(Rb - r)}{\sigma^2} \sim \chi^2(G)
$$
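A Monte Carlo sketch can make the $\chi^2(G)$ result concrete. It assumes NumPy, a known $\sigma^2$ and $\Omega$ (both hypothetical), and data generated under $H_0$ with $G = 2$ restrictions. The simulated statistics should have mean close to $G$, variance close to $2G$, and roughly 5% of draws above the 95% critical value, which for $G = 2$ has the closed form $-2\log(0.05)$.

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 30, 4000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta = np.array([1.0, 0.0, 0.0])
R = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])                  # H0: beta_2 = beta_3 = 0, so G = 2
r = np.zeros(2)
sigma2 = 2.0                                      # hypothetical known sigma^2
Omega = np.diag(rng.uniform(0.5, 2.0, size=n))    # hypothetical known Omega

Oi = np.linalg.inv(Omega)
XtOiX_inv = np.linalg.inv(X.T @ Oi @ X)
middle = np.linalg.inv(R @ XtOiX_inv @ R.T)
C = np.linalg.cholesky(sigma2 * Omega)            # used to draw u ~ N(0, sigma^2 Omega)

stats = np.empty(reps)
for j in range(reps):
    y = X @ beta + C @ rng.normal(size=n)         # data generated under H0
    b = XtOiX_inv @ X.T @ Oi @ y                  # GLSE
    d = R @ b - r
    stats[j] = d @ middle @ d / sigma2            # quadratic form above

crit = -2 * np.log(0.05)                          # chi^2(2) 95% quantile (G = 2 only)
print(stats.mean(), stats.var(), np.mean(stats > crit))
```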

8. Because $(y^* - X^*b)'(y^* - X^*b)/\sigma^2 \sim \chi^2(n - k)$, we obtain:

$$
\frac{(y - Xb)'\Omega^{-1}(y - Xb)}{\sigma^2} \sim \chi^2(n - k)
$$

9. Furthermore, since $b$ is independent of $y - Xb$, the following $F$ distribution can be derived under $H_0$ (the unknown $\sigma^2$ cancels):

$$
\frac{(Rb - r)'\bigl(R(X'\Omega^{-1}X)^{-1}R'\bigr)^{-1}(Rb - r)/G}{(y - Xb)'\Omega^{-1}(y - Xb)/(n - k)} \sim F(G, n - k)
$$

10. Let $b$ be the unrestricted GLSE and $\tilde b$ be the restricted GLSE.

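Their residuals are denoted by $e$ and $\tilde u$, respectively:

$$
e = y - Xb, \qquad \tilde u = y - X\tilde b
$$

Then, the $F$ test statistic can be written as follows (the distribution again holds under $H_0$):

$$
\frac{(\tilde u'\Omega^{-1}\tilde u - e'\Omega^{-1}e)/G}{e'\Omega^{-1}e/(n - k)} \sim F(G, n - k)
$$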
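The two forms of the $F$ statistic in items 9 and 10 give the same number. This sketch assumes NumPy, simulated data with a hypothetical AR(1) $\Omega$, and a hypothetical single restriction $\beta_2 + \beta_3 = 0$. The restricted GLSE is computed with the standard closed form $\tilde b = b - (X'\Omega^{-1}X)^{-1}R'\bigl(R(X'\Omega^{-1}X)^{-1}R'\bigr)^{-1}(Rb - r)$, the GLS analogue of restricted OLS, which these notes do not derive.

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
idx = np.arange(n)
Omega = 0.4 ** np.abs(idx[:, None] - idx[None, :])          # hypothetical AR(1) Omega
y = X @ np.array([1.0, 0.3, -0.2]) + np.linalg.cholesky(Omega) @ rng.normal(size=n)
R = np.array([[0.0, 1.0, 1.0]])                              # hypothetical H0: beta_2 + beta_3 = 0
r = np.array([0.0])
G = R.shape[0]

Oi = np.linalg.inv(Omega)
XtOiX_inv = np.linalg.inv(X.T @ Oi @ X)
b = XtOiX_inv @ X.T @ Oi @ y                                 # unrestricted GLSE
middle = np.linalg.inv(R @ XtOiX_inv @ R.T)
b_tilde = b - XtOiX_inv @ R.T @ middle @ (R @ b - r)         # restricted GLSE

e = y - X @ b
u_tilde = y - X @ b_tilde
ssr_u = e @ Oi @ e                                           # e' Omega^{-1} e
ssr_r = u_tilde @ Oi @ u_tilde                               # u~' Omega^{-1} u~

d = R @ b - r
F_wald = (d @ middle @ d / G) / (ssr_u / (n - k))            # form in item 9
F_ssr = ((ssr_r - ssr_u) / G) / (ssr_u / (n - k))            # form in item 10
print(F_wald, F_ssr)
```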

8.1 Example: Mixed Estimation (Theil and Goldberger Model)

A generalization of restricted OLS $\Longrightarrow$ stochastic linear restrictions:

$$
r = R\beta + v, \qquad E(v) = 0, \quad V(v) = \sigma^2\Psi
$$

$$
y = X\beta + u, \qquad E(u) = 0, \quad V(u) = \sigma^2 I_n
$$

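Using a matrix form,

$$
\begin{pmatrix} y \\ r \end{pmatrix} = \begin{pmatrix} X \\ R \end{pmatrix}\beta + \begin{pmatrix} u \\ v \end{pmatrix}, \qquad
E\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \qquad
V\begin{pmatrix} u \\ v \end{pmatrix} = \sigma^2\begin{pmatrix} I_n & 0 \\ 0 & \Psi \end{pmatrix}
$$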

For estimation, we do not need the normality assumption.

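Applying GLS, we obtain:

$$
\begin{aligned}
b &= \left[\begin{pmatrix} X' & R' \end{pmatrix}\begin{pmatrix} I_n & 0 \\ 0 & \Psi \end{pmatrix}^{-1}\begin{pmatrix} X \\ R \end{pmatrix}\right]^{-1}
\begin{pmatrix} X' & R' \end{pmatrix}\begin{pmatrix} I_n & 0 \\ 0 & \Psi \end{pmatrix}^{-1}\begin{pmatrix} y \\ r \end{pmatrix} \\
&= (X'X + R'\Psi^{-1}R)^{-1}(X'y + R'\Psi^{-1}r).
\end{aligned}
$$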

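Mean and variance of $b$: $b$ is rewritten as follows:

$$
\begin{aligned}
b &= \left[\begin{pmatrix} X' & R' \end{pmatrix}\begin{pmatrix} I_n & 0 \\ 0 & \Psi \end{pmatrix}^{-1}\begin{pmatrix} X \\ R \end{pmatrix}\right]^{-1}
\begin{pmatrix} X' & R' \end{pmatrix}\begin{pmatrix} I_n & 0 \\ 0 & \Psi \end{pmatrix}^{-1}\begin{pmatrix} y \\ r \end{pmatrix} \\
&= \beta + \left[\begin{pmatrix} X' & R' \end{pmatrix}\begin{pmatrix} I_n & 0 \\ 0 & \Psi \end{pmatrix}^{-1}\begin{pmatrix} X \\ R \end{pmatrix}\right]^{-1}
\begin{pmatrix} X' & R' \end{pmatrix}\begin{pmatrix} I_n & 0 \\ 0 & \Psi \end{pmatrix}^{-1}\begin{pmatrix} u \\ v \end{pmatrix}
\end{aligned}
$$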

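Therefore, the mean and variance are given by:

$E(b) = \beta$ $\Longrightarrow$ $b$ is unbiased.

$$
V(b) = \sigma^2\left[\begin{pmatrix} X' & R' \end{pmatrix}\begin{pmatrix} I_n & 0 \\ 0 & \Psi \end{pmatrix}^{-1}\begin{pmatrix} X \\ R \end{pmatrix}\right]^{-1} = \sigma^2(X'X + R'\Psi^{-1}R)^{-1}
$$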
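The mixed estimator can be computed either by running GLS on the stacked system or from the closed form. This sketch assumes NumPy, simulated data, and a hypothetical single stochastic restriction ($\beta_2 - \beta_3 \approx 0$ with prior variance $\sigma^2\Psi$, $\Psi = 0.1$); it checks that both routes agree.

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 0.5, 0.5]) + rng.normal(size=n)
R = np.array([[0.0, 1.0, -1.0]])     # hypothetical stochastic restriction on beta_2 - beta_3
r = np.array([0.0])
Psi = np.array([[0.1]])              # hypothetical Psi (V(v) = sigma^2 Psi)
g = R.shape[0]

# GLS on the stacked system with block-diagonal covariance diag(I_n, Psi)
Z = np.vstack([X, R])
w = np.concatenate([y, r])
W = np.block([[np.eye(n), np.zeros((n, g))],
              [np.zeros((g, n)), Psi]])
Wi = np.linalg.inv(W)
b_stacked = np.linalg.solve(Z.T @ Wi @ Z, Z.T @ Wi @ w)

# Closed form (X'X + R' Psi^{-1} R)^{-1} (X'y + R' Psi^{-1} r)
Psi_inv = np.linalg.inv(Psi)
b_closed = np.linalg.solve(X.T @ X + R.T @ Psi_inv @ R, X.T @ y + R.T @ Psi_inv @ r)

print(np.allclose(b_stacked, b_closed))  # True
```

A smaller $\Psi$ expresses more confidence in the restriction and pulls $Rb$ closer to $r$; as $\Psi$ grows, $b$ approaches the unrestricted OLS estimator.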

9 Maximum Likelihood Estimation (MLE)

→ Review

1. The joint distribution (density) function of $\{X_i\}_{i=1}^n$ is $f(x; \theta)$, where $x = (x_1, x_2, \cdots, x_n)$ and $\theta = (\mu, \Sigma)$.

Note that $X$ is a vector of random variables and $x$ is a vector of their realizations (i.e., observed data).

The likelihood function $L(\cdot)$ is defined as $L(\theta; x) = f(x; \theta)$.

Note that $f(x; \theta) = \prod_{i=1}^n f(x_i; \theta)$ when $X_1, X_2, \cdots, X_n$ are mutually independent and identically distributed.

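The maximum likelihood estimator (MLE) of $\theta$ is the $\theta$ that solves:

$$
\max_\theta L(\theta; X) \iff \max_\theta \log L(\theta; X).
$$

The MLE satisfies the following two conditions:

(a) $\dfrac{\partial \log L(\theta; X)}{\partial \theta} = 0$.

(b) $\dfrac{\partial^2 \log L(\theta; X)}{\partial \theta\, \partial \theta'}$ is a negative definite matrix.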
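Conditions (a) and (b) can be checked for a simple case. The sketch below assumes NumPy and simplifies $\theta = (\mu, \Sigma)$ to the scalar case $\theta = (\mu, \sigma^2)$ for iid normal data with hypothetical parameters. The closed-form MLEs are $\hat\mu = \bar x$ and $\hat\sigma^2 = \frac{1}{n}\sum_i (x_i - \bar x)^2$; at these values the score is zero and the Hessian, derived analytically in the helper functions (names are illustrative), has only negative eigenvalues.

```python
import numpy as np

def normal_loglik(mu, s2, x):
    """log L(theta; x) for iid N(mu, s2) data, theta = (mu, s2)."""
    n = x.size
    return -0.5 * n * np.log(2 * np.pi * s2) - np.sum((x - mu) ** 2) / (2 * s2)

def score(mu, s2, x):
    """Gradient of the log-likelihood with respect to (mu, s2)."""
    n = x.size
    return np.array([np.sum(x - mu) / s2,
                     -n / (2 * s2) + np.sum((x - mu) ** 2) / (2 * s2**2)])

def hessian(mu, s2, x):
    """Matrix of second derivatives of the log-likelihood."""
    n = x.size
    h11 = -n / s2
    h12 = -np.sum(x - mu) / s2**2
    h22 = n / (2 * s2**2) - np.sum((x - mu) ** 2) / s2**3
    return np.array([[h11, h12], [h12, h22]])

rng = np.random.default_rng(8)
x = rng.normal(loc=3.0, scale=2.0, size=200)    # hypothetical sample

mu_hat = x.mean()
s2_hat = np.mean((x - mu_hat) ** 2)             # MLE divides by n, not n - 1

print(score(mu_hat, s2_hat, x))                          # condition (a): approximately 0
print(np.linalg.eigvalsh(hessian(mu_hat, s2_hat, x)))    # condition (b): all negative
```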

2. Fisher's information matrix is defined as:
