8 Generalized Least Squares Method (GLS, 一般化最小自乗法)
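1. Regression model: $y = X\beta + u$, $u \sim N(0, \sigma^2 \Omega)$

2. Heteroscedasticity (不等分散, 不均一分散)
$$\sigma^2 \Omega = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \sigma_n^2 \end{pmatrix}$$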
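First-Order Autocorrelation (一階の自己相関, 系列相関)
In the case of time series data, the subscript is conventionally given by $t$, not $i$.
$$u_t = \rho u_{t-1} + \epsilon_t, \qquad \epsilon_t \sim \text{iid } N(0, \sigma_\epsilon^2)$$
$$\sigma^2 \Omega = \frac{\sigma_\epsilon^2}{1 - \rho^2} \begin{pmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{n-1} \\ \rho & 1 & \rho & \cdots & \rho^{n-2} \\ \rho^2 & \rho & 1 & \cdots & \rho^{n-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho^{n-1} & \rho^{n-2} & \rho^{n-3} & \cdots & 1 \end{pmatrix}$$
$$V(u_t) = \sigma^2 = \frac{\sigma_\epsilon^2}{1 - \rho^2}$$

3. The Generalized Least Squares (GLS, 一般化最小二乗法) estimator of $\beta$, denoted by $b$, solves the following minimization problem: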
$$\min_b \; (y - Xb)' \Omega^{-1} (y - Xb)$$
The GLSE of $\beta$ is:
$$b = (X' \Omega^{-1} X)^{-1} X' \Omega^{-1} y$$
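As a minimal sketch of the formula above, the following NumPy code builds $\Omega$ for first-order autocorrelated errors (element $(i,j)$ equal to $\rho^{|i-j|}$) and computes the GLSE. The function names `ar1_omega` and `gls`, the sample size, and the parameter values are illustrative assumptions, not part of the notes.

```python
import numpy as np

def ar1_omega(n, rho):
    """Omega for AR(1) errors: element (i, j) equals rho^|i-j|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def gls(y, X, omega):
    """GLS estimator b = (X' Omega^{-1} X)^{-1} X' Omega^{-1} y."""
    omega_inv_X = np.linalg.solve(omega, X)   # Omega^{-1} X
    omega_inv_y = np.linalg.solve(omega, y)   # Omega^{-1} y
    return np.linalg.solve(X.T @ omega_inv_X, X.T @ omega_inv_y)

rng = np.random.default_rng(0)
n, rho = 200, 0.7                             # assumed values for illustration
beta = np.array([1.0, 2.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])

# Simulate u_t = rho * u_{t-1} + eps_t, starting from the stationary distribution.
eps = rng.normal(size=n)
u = np.empty(n)
u[0] = eps[0] / np.sqrt(1.0 - rho**2)
for t in range(1, n):
    u[t] = rho * u[t - 1] + eps[t]

y = X @ beta + u
omega = ar1_omega(n, rho)
b = gls(y, X, omega)
print(b)   # GLS estimate of beta
```

Solving linear systems with `np.linalg.solve` avoids forming $\Omega^{-1}$ explicitly, which is usually preferable numerically; the result is the same estimator.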
4. In general, when Ω is symmetric, Ω is decomposed as follows.
$$\Omega = A' \Lambda A$$
$\Lambda$ is a diagonal matrix whose diagonal elements are the eigenvalues of $\Omega$.
$A$ is a matrix consisting of the eigenvectors.
When Ω is a positive definite matrix, all the diagonal elements of Λ are positive.
5. There exists $P$ such that $\Omega = PP'$ (i.e., take $P = A' \Lambda^{1/2}$). $\Longrightarrow$ $P^{-1} \Omega P'^{-1} = I_n$
Multiply both sides of $y = X\beta + u$ by $P^{-1}$ from the left.
We have:
$$y^* = X^* \beta + u^*,$$
where $y^* = P^{-1} y$, $X^* = P^{-1} X$, and $u^* = P^{-1} u$.
The variance of $u^*$ is:
$$V(u^*) = V(P^{-1} u) = P^{-1} V(u) P'^{-1} = \sigma^2 P^{-1} \Omega P'^{-1} = \sigma^2 I_n,$$
because $\Omega = PP'$, i.e., $P^{-1} \Omega P'^{-1} = I_n$.
Accordingly, the regression model is rewritten as:
$$y^* = X^* \beta + u^*, \qquad u^* \sim (0, \sigma^2 I_n)$$
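The construction $P = A' \Lambda^{1/2}$ can be checked numerically. The sketch below assumes a small AR(1)-type $\Omega$ with $\rho = 0.5$ purely for illustration; `np.linalg.eigh` returns the eigenvectors as columns of `V`, so $A$ in the notation above corresponds to `V.T`.

```python
import numpy as np

n, rho = 5, 0.5                                        # assumed illustrative values
idx = np.arange(n)
omega = rho ** np.abs(idx[:, None] - idx[None, :])     # symmetric, positive definite

# eigh gives Omega = V diag(w) V', with orthonormal eigenvectors in the columns of V.
w, V = np.linalg.eigh(omega)
A = V.T                                                # so that Omega = A' Lambda A
Lam = np.diag(w)

P = A.T @ np.diag(np.sqrt(w))                          # P = A' Lambda^{1/2}
P_inv = np.linalg.inv(P)

print(np.allclose(A.T @ Lam @ A, omega))               # checks Omega = A' Lambda A
print(np.allclose(P @ P.T, omega))                     # checks Omega = P P'
print(np.allclose(P_inv @ omega @ P_inv.T, np.eye(n))) # checks P^{-1} Omega P'^{-1} = I_n
```

All three checks are expected to hold up to floating-point tolerance, and the eigenvalues in `w` are positive because this $\Omega$ is positive definite.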
Apply OLS to the above model.
Let $b$ be an estimator of $\beta$ from the above model.
That is, the minimization problem is given by:
$$\min_b \; (y^* - X^* b)' (y^* - X^* b),$$
which is equivalent to:
$$\min_b \; (y - Xb)' \Omega^{-1} (y - Xb).$$
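The equivalence of the two objective functions follows from $\Omega = PP'$. The intermediate steps, written out using only $y^* = P^{-1} y$ and $X^* = P^{-1} X$, are:

```latex
\begin{aligned}
(y^* - X^* b)'(y^* - X^* b)
  &= \bigl(P^{-1}(y - Xb)\bigr)' \bigl(P^{-1}(y - Xb)\bigr) \\
  &= (y - Xb)' P'^{-1} P^{-1} (y - Xb) \\
  &= (y - Xb)' (PP')^{-1} (y - Xb) \\
  &= (y - Xb)' \Omega^{-1} (y - Xb).
\end{aligned}
```

The third line uses $(PP')^{-1} = P'^{-1} P^{-1}$, which holds because $P$ is nonsingular.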
Solving the minimization problem above, we have the following estimator:
$$b = (X^{*\prime} X^*)^{-1} X^{*\prime} y^* = (X' \Omega^{-1} X)^{-1} X' \Omega^{-1} y,$$
which is called the GLS (Generalized Least Squares) estimator.
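The equality between OLS on the transformed model and the direct GLS formula can be illustrated numerically. This sketch assumes a heteroscedastic (diagonal) $\Omega$ with made-up variances, and it uses the Cholesky factor as $P$ instead of $A' \Lambda^{1/2}$; any $P$ with $\Omega = PP'$ gives the same estimator.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 50, 3                                           # assumed illustrative sizes
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([1.0, -0.5, 2.0])

# Heteroscedastic Omega: diagonal with unequal variances (illustrative values).
omega = np.diag(rng.uniform(0.5, 4.0, size=n))
y = X @ beta + rng.normal(size=n) * np.sqrt(np.diag(omega))

# Cholesky factor: lower-triangular P with Omega = P P' (a convenient choice of P).
P = np.linalg.cholesky(omega)
y_star = np.linalg.solve(P, y)                         # P^{-1} y
X_star = np.linalg.solve(P, X)                         # P^{-1} X

# OLS on the transformed model y* = X* beta + u*.
b_transformed, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)

# Direct GLS formula (X' Omega^{-1} X)^{-1} X' Omega^{-1} y.
omega_inv = np.linalg.inv(omega)
b_direct = np.linalg.solve(X.T @ omega_inv @ X, X.T @ omega_inv @ y)

print(np.allclose(b_transformed, b_direct))            # the two estimates should agree
```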
b is rewritten as follows:
$$b = \beta + (X^{*\prime} X^*)^{-1} X^{*\prime} u^* = \beta + (X' \Omega^{-1} X)^{-1} X' \Omega^{-1} u$$
The mean and variance of $b$ are given by:
$$E(b) = \beta,$$
$$V(b) = \sigma^2 (X^{*\prime} X^*)^{-1} = \sigma^2 (X' \Omega^{-1} X)^{-1}.$$

6. Suppose that the regression model is given by:
$$y = X\beta + u, \qquad u \sim N(0, \sigma^2 \Omega).$$
In this case, when we use OLS, what happens?
$$\hat\beta = (X'X)^{-1} X' y = \beta + (X'X)^{-1} X' u$$
$$V(\hat\beta) = \sigma^2 (X'X)^{-1} X' \Omega X (X'X)^{-1}$$
Compare GLS and OLS.
(a) Expectation:
$E(\hat\beta) = \beta$ and $E(b) = \beta$. Thus, both $\hat\beta$ and $b$ are unbiased estimators.
(b) Variance:
$$V(\hat\beta) = \sigma^2 (X'X)^{-1} X' \Omega X (X'X)^{-1}$$
$$V(b) = \sigma^2 (X' \Omega^{-1} X)^{-1}$$
Which is more efficient, OLS or GLS?
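$$\begin{aligned} V(\hat\beta) - V(b) &= \sigma^2 (X'X)^{-1} X' \Omega X (X'X)^{-1} - \sigma^2 (X' \Omega^{-1} X)^{-1} \\ &= \sigma^2 \bigl( (X'X)^{-1} X' - (X' \Omega^{-1} X)^{-1} X' \Omega^{-1} \bigr) \, \Omega \, \bigl( (X'X)^{-1} X' - (X' \Omega^{-1} X)^{-1} X' \Omega^{-1} \bigr)' \\ &= \sigma^2 A \Omega A', \end{aligned}$$
where $A = (X'X)^{-1} X' - (X' \Omega^{-1} X)^{-1} X' \Omega^{-1}$ (this $A$ is not the eigenvector matrix of item 4).
$\Omega$ is, up to the factor $\sigma^2$, the variance-covariance matrix of $u$, which is a positive definite matrix.
Therefore, $A \Omega A'$ is a positive semidefinite matrix; it is the zero matrix when $\Omega = I_n$, because then $A = 0$.
This implies that $V(\hat\beta_i) - V(b_i) \ge 0$ for the $i$th element of $\beta$. Accordingly, $b$ is at least as efficient as $\hat\beta$.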
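The variance comparison can be illustrated with a small numerical check. The sketch below assumes an AR(1)-type $\Omega$ with $\rho = 0.8$ and $\sigma^2 = 1$ (illustrative values only), computes both variance matrices, and inspects the eigenvalues of their difference.

```python
import numpy as np

rng = np.random.default_rng(2)
n, rho, sigma2 = 40, 0.8, 1.0                          # assumed illustrative values
X = np.column_stack([np.ones(n), rng.normal(size=n)])
idx = np.arange(n)
omega = rho ** np.abs(idx[:, None] - idx[None, :])

XtX_inv = np.linalg.inv(X.T @ X)
omega_inv = np.linalg.inv(omega)
S_inv = np.linalg.inv(X.T @ omega_inv @ X)             # (X' Omega^{-1} X)^{-1}

V_ols = sigma2 * XtX_inv @ X.T @ omega @ X @ XtX_inv   # sandwich variance of OLS
V_gls = sigma2 * S_inv                                 # variance of GLS

# A as defined in the derivation of V(beta_hat) - V(b).
A = XtX_inv @ X.T - S_inv @ X.T @ omega_inv
diff = V_ols - V_gls

print(np.allclose(diff, sigma2 * A @ omega @ A.T))     # checks the factorization
print(np.linalg.eigvalsh((diff + diff.T) / 2))         # eigenvalues of the difference
```

The eigenvalues of the difference should be non-negative up to rounding error, so each diagonal element of `V_ols` is at least as large as the corresponding element of `V_gls`.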
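7. If $u \sim N(0, \sigma^2 \Omega)$, then $b \sim N(\beta, \sigma^2 (X' \Omega^{-1} X)^{-1})$.
Consider testing the hypothesis $H_0: R\beta = r$.
$R$: $G \times k$, $\mathrm{rank}(R) = G \le k$.
$$Rb \sim N(R\beta, \sigma^2 R (X' \Omega^{-1} X)^{-1} R').$$
Therefore, under $H_0$, the following quadratic form is distributed as:
$$\frac{(Rb - r)' \bigl( R (X' \Omega^{-1} X)^{-1} R' \bigr)^{-1} (Rb - r)}{\sigma^2} \sim \chi^2(G)$$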

8. Because $(y^* - X^* b)' (y^* - X^* b) / \sigma^2 \sim \chi^2(n - k)$, we obtain:
$$\frac{(y - Xb)' \Omega^{-1} (y - Xb)}{\sigma^2} \sim \chi^2(n - k)$$

9. Furthermore, from the fact that $b$ is independent of $y - Xb$, the following $F$ distribution can be derived:
$$\frac{(Rb - r)' \bigl( R (X' \Omega^{-1} X)^{-1} R' \bigr)^{-1} (Rb - r) / G}{(y - Xb)' \Omega^{-1} (y - Xb) / (n - k)} \sim F(G, n - k)$$

10. Let $b$ be the unrestricted GLSE and $\tilde b$ be the restricted GLSE.
Their residuals are given by $e$ and $\tilde u$, respectively.
$$e = y - Xb, \qquad \tilde u = y - X \tilde b$$
Then, the $F$ test statistic is written as follows:
$$\frac{(\tilde u' \Omega^{-1} \tilde u - e' \Omega^{-1} e) / G}{e' \Omega^{-1} e / (n - k)} \sim F(G, n - k)$$
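The two forms of the $F$ statistic (the quadratic form in $Rb - r$ and the form based on restricted and unrestricted residuals) can be compared numerically. The sketch below is an illustration under assumptions: the data, $\Omega$, and the hypothesis are made up, and the restricted GLSE is computed with the standard closed form $\tilde b = b - (X' \Omega^{-1} X)^{-1} R' \bigl( R (X' \Omega^{-1} X)^{-1} R' \bigr)^{-1} (Rb - r)$, which is not stated in the notes.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, G = 60, 3, 2                                     # assumed illustrative sizes
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
omega = np.diag(rng.uniform(0.5, 3.0, size=n))         # illustrative heteroscedastic Omega
omega_inv = np.linalg.inv(omega)
y = X @ np.array([1.0, 0.0, 0.0]) + rng.normal(size=n) * np.sqrt(np.diag(omega))

# H0: both slope coefficients are zero, written as R beta = r.
R = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
r = np.zeros(G)

S_inv = np.linalg.inv(X.T @ omega_inv @ X)             # (X' Omega^{-1} X)^{-1}
b = S_inv @ X.T @ omega_inv @ y                        # unrestricted GLSE
e = y - X @ b
d = R @ b - r
M = np.linalg.inv(R @ S_inv @ R.T)                     # (R (X' Omega^{-1} X)^{-1} R')^{-1}

# Restricted GLSE (assumed standard formula, not given in the text).
b_tilde = b - S_inv @ R.T @ M @ d
u_tilde = y - X @ b_tilde

denom = (e @ omega_inv @ e) / (n - k)
F_wald = (d @ M @ d / G) / denom
F_resid = ((u_tilde @ omega_inv @ u_tilde - e @ omega_inv @ e) / G) / denom
print(F_wald, F_resid)                                 # the two values should coincide
```

The agreement rests on $X' \Omega^{-1} e = 0$, which makes the increase in the weighted residual sum of squares equal to the quadratic form in $Rb - r$.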
8.1 Example: Mixed Estimation (Theil and Goldberger Model)
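A generalization of the restricted OLS $\Longrightarrow$ Stochastic linear restriction:
$$r = R\beta + v, \qquad E(v) = 0 \text{ and } V(v) = \sigma^2 \Psi$$
$$y = X\beta + u, \qquad E(u) = 0 \text{ and } V(u) = \sigma^2 I_n$$
Using a matrix form,
$$\begin{pmatrix} y \\ r \end{pmatrix} = \begin{pmatrix} X \\ R \end{pmatrix} \beta + \begin{pmatrix} u \\ v \end{pmatrix}, \qquad E \begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \text{ and } V \begin{pmatrix} u \\ v \end{pmatrix} = \sigma^2 \begin{pmatrix} I_n & 0 \\ 0 & \Psi \end{pmatrix}$$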
For estimation, we do not need the normality assumption.
Applying GLS, we obtain:
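$$\begin{aligned} b &= \left( \begin{pmatrix} X' & R' \end{pmatrix} \begin{pmatrix} I_n & 0 \\ 0 & \Psi \end{pmatrix}^{-1} \begin{pmatrix} X \\ R \end{pmatrix} \right)^{-1} \begin{pmatrix} X' & R' \end{pmatrix} \begin{pmatrix} I_n & 0 \\ 0 & \Psi \end{pmatrix}^{-1} \begin{pmatrix} y \\ r \end{pmatrix} \\ &= (X'X + R' \Psi^{-1} R)^{-1} (X'y + R' \Psi^{-1} r). \end{aligned}$$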
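The mixed estimator can be computed either by applying GLS to the stacked system or from the closed form $(X'X + R' \Psi^{-1} R)^{-1} (X'y + R' \Psi^{-1} r)$. The following sketch compares the two; the data, the single stochastic restriction on the slope, and the value of $\Psi$ are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 30, 2                                           # assumed illustrative sizes
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 2.0])
y = X @ beta + rng.normal(size=n)

# One stochastic restriction r = R beta + v: prior information on the slope.
R = np.array([[0.0, 1.0]])
r = np.array([2.1])
Psi = np.array([[0.25]])

# GLS on the stacked system (y, r) = (X, R) beta + (u, v).
Z = np.vstack([X, R])
w = np.concatenate([y, r])
W = np.block([[np.eye(n), np.zeros((n, 1))],
              [np.zeros((1, n)), Psi]])
W_inv = np.linalg.inv(W)
b_stacked = np.linalg.solve(Z.T @ W_inv @ Z, Z.T @ W_inv @ w)

# Closed form (X'X + R' Psi^{-1} R)^{-1} (X'y + R' Psi^{-1} r).
Psi_inv = np.linalg.inv(Psi)
b_closed = np.linalg.solve(X.T @ X + R.T @ Psi_inv @ R,
                           X.T @ y + R.T @ Psi_inv @ r)

print(np.allclose(b_stacked, b_closed))                # the two estimates should agree
```

A smaller $\Psi$ expresses more confidence in the restriction and pulls the estimate of the slope closer to the value in `r`.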
Mean and Variance of b: b is rewritten as follows:
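$$\begin{aligned} b &= \left( \begin{pmatrix} X' & R' \end{pmatrix} \begin{pmatrix} I_n & 0 \\ 0 & \Psi \end{pmatrix}^{-1} \begin{pmatrix} X \\ R \end{pmatrix} \right)^{-1} \begin{pmatrix} X' & R' \end{pmatrix} \begin{pmatrix} I_n & 0 \\ 0 & \Psi \end{pmatrix}^{-1} \begin{pmatrix} y \\ r \end{pmatrix} \\ &= \beta + \left( \begin{pmatrix} X' & R' \end{pmatrix} \begin{pmatrix} I_n & 0 \\ 0 & \Psi \end{pmatrix}^{-1} \begin{pmatrix} X \\ R \end{pmatrix} \right)^{-1} \begin{pmatrix} X' & R' \end{pmatrix} \begin{pmatrix} I_n & 0 \\ 0 & \Psi \end{pmatrix}^{-1} \begin{pmatrix} u \\ v \end{pmatrix} \end{aligned}$$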
Therefore, the mean and variance are given by:
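$$E(b) = \beta \quad \Longrightarrow \quad b \text{ is unbiased.}$$
$$V(b) = \sigma^2 \left( \begin{pmatrix} X' & R' \end{pmatrix} \begin{pmatrix} I_n & 0 \\ 0 & \Psi \end{pmatrix}^{-1} \begin{pmatrix} X \\ R \end{pmatrix} \right)^{-1} = \sigma^2 (X'X + R' \Psi^{-1} R)^{-1}$$

9 Maximum Likelihood Estimation (MLE, 最尤法)
$\longrightarrow$ Review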
1. The distribution function of $\{X_i\}_{i=1}^n$ is $f(x; \theta)$, where $x = (x_1, x_2, \cdots, x_n)$ and $\theta = (\mu, \Sigma)$.
Note that $X$ is a vector of random variables and $x$ is a vector of their realizations (i.e., observed data).
The likelihood function $L(\cdot)$ is defined as $L(\theta; x) = f(x; \theta)$.
Note that $f(x; \theta) = \prod_{i=1}^n f(x_i; \theta)$ when $X_1, X_2, \cdots, X_n$ are mutually independently and identically distributed.
The maximum likelihood estimator (MLE) of $\theta$ is the $\theta$ such that:
max
θ
L( θ ; X) . ⇐⇒ max
θ