Econometrics I: Solutions of Homework 1
Hiroki Kato * April 19, 2020
Contents
1 Solutions
1.1 Question 1
1.2 Question 2
1.3 Question 3
2 Review
2.1 Properties of Expectation and Variance
2.2 Optimization
1 Solutions
1.1 Question 1
Let $S(\alpha, \beta)$ be the sum of squared residuals:
\[
S(\alpha, \beta) = \sum_{t=1}^{T} u_t^2 = \sum_{t=1}^{T} (y_t - \alpha - \beta X_t)^2 \tag{1}
\]
*email: [email protected]. Room 503. All materials I made are published on GitHub:
https://github.com/KatoPachi/2020EconometricsTA.git. If you find any errors in the handouts and materials, please contact me via email or open an issue on GitHub.
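The ordinary least squares estimators (hereafter, OLS estimators) are derived by minimizing (1):
\[
(\hat{\alpha}, \hat{\beta}) \in \arg\min_{\alpha, \beta} S(\alpha, \beta) \tag{2}
\]
The first-order conditions of this problem are
\[
\frac{\partial S(\alpha, \beta)}{\partial \alpha} = -2 \sum_{t=1}^{T} (y_t - \hat{\alpha} - \hat{\beta} X_t) = 0 \tag{3}
\]
\[
\frac{\partial S(\alpha, \beta)}{\partial \beta} = -2 \sum_{t=1}^{T} X_t (y_t - \hat{\alpha} - \hat{\beta} X_t) = 0 \tag{4}
\]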
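Conditions (3) and (4) are two linear equations in $(\hat{\alpha}, \hat{\beta})$. As a minimal sketch, the Python snippet below solves them by Cramer's rule and then checks that the residuals satisfy both conditions. The data values and variable names are illustrative assumptions, not part of the homework.

```python
# Illustrative data (hypothetical, not from the homework).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
T = len(X)

# Sums that appear in the first-order conditions (3) and (4).
sum_x = sum(X)
sum_y = sum(y)
sum_xx = sum(x * x for x in X)
sum_xy = sum(x * yi for x, yi in zip(X, y))

# Conditions (3) and (4) rearranged as a linear system:
#   T * a     + sum_x  * b = sum_y
#   sum_x * a + sum_xx * b = sum_xy
det = T * sum_xx - sum_x ** 2  # nonzero when X is not constant
alpha_hat = (sum_y * sum_xx - sum_x * sum_xy) / det
beta_hat = (T * sum_xy - sum_x * sum_y) / det

# Residuals should sum to zero (3) and be orthogonal to X (4).
resid = [yi - alpha_hat - beta_hat * x for x, yi in zip(X, y)]
foc_alpha = sum(resid)
foc_beta = sum(x * r for x, r in zip(X, resid))
print(alpha_hat, beta_hat, foc_alpha, foc_beta)
```

Both `foc_alpha` and `foc_beta` should be zero up to floating-point rounding.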
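Note that the second-order condition holds because the Hessian matrix is positive definite:
\[
H =
\begin{pmatrix}
\dfrac{\partial^2 S(\alpha, \beta)}{\partial \alpha \partial \alpha} & \dfrac{\partial^2 S(\alpha, \beta)}{\partial \alpha \partial \beta} \\
\dfrac{\partial^2 S(\alpha, \beta)}{\partial \beta \partial \alpha} & \dfrac{\partial^2 S(\alpha, \beta)}{\partial \beta \partial \beta}
\end{pmatrix}
=
\begin{pmatrix}
2T & 2 \sum_t X_t \\
2 \sum_t X_t & 2 \sum_t X_t^2
\end{pmatrix}
\]
\[
\Rightarrow |H| = 2T \cdot 2 \sum_t X_t^2 - 2 \sum_t X_t \cdot 2 \sum_t X_t
= 4T^2 \left[ \frac{1}{T} \sum_t X_t^2 - \left( \frac{\sum_t X_t}{T} \right)^2 \right]
\]
\[
\Rightarrow |H| = 4T^2 \cdot V(X_t) > 0
\]
where $V(X_t)$ is the variance of $X_t$ (computed with divisor $T$), which is positive as long as $X_t$ is not constant. Since the leading entry $2T$ is also positive, $H$ is positive definite. By equation (3), we have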
\[
\hat{\alpha} = \frac{1}{T} \left( \sum_t y_t - \hat{\beta} \sum_t X_t \right) = \bar{y} - \hat{\beta} \bar{X} \tag{5}
\]
where $\bar{y} = \sum_t y_t / T$ and $\bar{X} = \sum_t X_t / T$ (sample means). Substituting equation (5) into equation (4) yields
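\begin{align*}
\hat{\beta} \left( \sum_t X_t^2 - \bar{X} \sum_t X_t \right) &= \sum_t y_t X_t - \bar{y} \sum_t X_t \\
\hat{\beta} &= \frac{\sum_t y_t X_t - \bar{y} \sum_t X_t}{\sum_t X_t^2 - \bar{X} \sum_t X_t} \\
\hat{\beta} &= \frac{\sum_t X_t (y_t - \bar{y})}{\sum_t X_t (X_t - \bar{X})} \tag{6}
\end{align*}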
Thus, the OLS estimators are
\[
(\hat{\alpha}, \hat{\beta}) = \left( \bar{y} - \hat{\beta} \bar{X}, \; \frac{\sum_t X_t (y_t - \bar{y})}{\sum_t X_t (X_t - \bar{X})} \right) \tag{7}
\]
Note that the estimator of $\beta$, $\hat{\beta}$, can be rewritten as follows:
\[
\hat{\beta} = \frac{Cov(y_t, X_t)}{V(X_t)} \tag{8}
\]
where $Cov(y_t, X_t)$ is the covariance between $y_t$ and $X_t$, and $V(X_t)$ is the variance of $X_t$. To prove it, we need to recall the definitions of covariance and variance. First, the definition of covariance is
\[
Cov(y_t, X_t) = E[(y_t - E(y_t))(X_t - E(X_t))].
\]
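Then, the sample covariance is
\begin{align*}
S_{y_t, X_t} &= \frac{1}{T-1} \sum_t (y_t - \bar{y})(X_t - \bar{X}) \\
&= \frac{1}{T-1} \sum_t (y_t X_t - y_t \bar{X} - \bar{y} X_t + \bar{X} \bar{y}) \\
&= \frac{1}{T-1} \left( \sum_t y_t X_t - \bar{X} \sum_t y_t - \bar{y} \sum_t X_t + T \bar{X} \bar{y} \right) \\
&= \frac{1}{T-1} \left( \sum_t y_t X_t - \bar{y} \sum_t X_t \right) \\
&= \frac{1}{T-1} \sum_t X_t (y_t - \bar{y}),
\end{align*}
where the fourth line uses $\bar{X} \sum_t y_t = T \bar{X} \bar{y}$.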
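Second, the definition of variance is
\[
V(X_t) = E[(X_t - E(X_t))^2].
\]
Then, the sample variance is
\begin{align*}
S^2_{X_t} &= \frac{1}{T-1} \sum_t (X_t - \bar{X})^2 \\
&= \frac{1}{T-1} \sum_t X_t^2 - \frac{2 \bar{X}}{T-1} \sum_t X_t + \frac{1}{T-1} T \bar{X}^2 \\
&= \frac{1}{T-1} \left( \sum_t X_t^2 - \bar{X} \sum_t X_t \right) \\
&= \frac{1}{T-1} \sum_t X_t (X_t - \bar{X})
\end{align*}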
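Finally, we have
\[
\hat{\beta} = \frac{S_{y_t, X_t}}{S^2_{X_t}} = \frac{\sum_t X_t (y_t - \bar{y})}{\sum_t X_t (X_t - \bar{X})}.
\]
Thus, the OLS estimator of $\beta$ is $\hat{\beta} = Cov(y_t, X_t) / V(X_t)$, or
\[
\hat{\beta} = \frac{\sum_t (y_t - \bar{y})(X_t - \bar{X})}{\sum_t (X_t - \bar{X})^2} \tag{9}
\]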
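The equivalence of equations (6) and (9) can be checked numerically. The sketch below uses a small made-up data set (the values and names are illustrative assumptions) and computes $\hat{\beta}$ both ways.

```python
from statistics import mean

# Illustrative data (hypothetical, not from the homework).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
T = len(X)
x_bar, y_bar = mean(X), mean(y)

# Equation (6): sums of X_t times deviations from the mean.
beta_eq6 = (sum(x * (yi - y_bar) for x, yi in zip(X, y))
            / sum(x * (x - x_bar) for x in X))

# Equation (9): sample covariance over sample variance, divisor T - 1.
s_xy = sum((x - x_bar) * (yi - y_bar) for x, yi in zip(X, y)) / (T - 1)
s_xx = sum((x - x_bar) ** 2 for x in X) / (T - 1)
beta_eq9 = s_xy / s_xx

print(beta_eq6, beta_eq9)
```

The divisor $T - 1$ cancels in the ratio, so using $T$ instead would give the same $\hat{\beta}$.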
1.2 Question 2
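From (9), we have
\[
\hat{\beta} = \frac{\sum_t y_t (X_t - \bar{X}) - \bar{y} \sum_t (X_t - \bar{X})}{\sum_t (X_t - \bar{X})^2}
= \frac{\sum_t y_t (X_t - \bar{X})}{\sum_t (X_t - \bar{X})^2}
\]
since $\sum_t (X_t - \bar{X}) = \sum_t X_t - T \bar{X} = \sum_t X_t - \sum_t X_t = 0$. Substituting $y_t = \alpha + \beta X_t + u_t$ into this equation yields
\begin{align*}
\hat{\beta} &= \frac{\sum_t (X_t - \bar{X})(\alpha + \beta X_t + u_t)}{\sum_t (X_t - \bar{X})^2} \\
&= \frac{\beta \sum_t (X_t - \bar{X}) X_t + \sum_t (X_t - \bar{X}) u_t}{\sum_t (X_t - \bar{X})^2} \\
&= \beta + \frac{\sum_t (X_t - \bar{X}) u_t}{\sum_t (X_t - \bar{X})^2} \\
&= \beta + \sum_t \omega_t u_t
\end{align*}
where $\omega_t = (X_t - \bar{X}) / \sum_t (X_t - \bar{X})^2$. The $\alpha$ term drops out because $\sum_t (X_t - \bar{X}) = 0$, and the third line uses $\sum_t (X_t - \bar{X}) X_t = \sum_t (X_t - \bar{X})^2$.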
Because $\beta$ and $X_t$ are not random variables and $E(u_t) = 0$,
\[
E(\hat{\beta}) = \beta + \sum_t \omega_t E(u_t) = \beta.
\]
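The variance of $\hat{\beta}$ is
\begin{align*}
V(\hat{\beta}) &= E[(\hat{\beta} - E(\hat{\beta}))^2] = E(\hat{\beta} - \beta)^2 = E\left( \sum_t \omega_t u_t \right)^2 \\
&= E\left[ \sum_t \omega_t^2 u_t^2 + \sum_t \sum_{t' \neq t} \omega_t \omega_{t'} u_t u_{t'} \right] \\
&= \sum_t \omega_t^2 E(u_t^2) + \sum_t \sum_{t' \neq t} \omega_t \omega_{t'} E(u_t u_{t'}) \\
&= \sum_t \omega_t^2 E[(u_t - E(u_t))^2] + \sum_t \sum_{t' \neq t} \omega_t \omega_{t'} E(u_t) E(u_{t'}) \\
&= \frac{\sigma^2 \sum_t (X_t - \bar{X})^2}{\left( \sum_t (X_t - \bar{X})^2 \right)^2} = \frac{\sigma^2}{\sum_t (X_t - \bar{X})^2}.
\end{align*}
To derive it, we use the following properties: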
• The mutual independence assumption implies $E(u_t u_{t'}) = E(u_t) E(u_{t'}) = 0$ for $t \neq t'$.
• Because $E(u_t) = 0$, $E(u_t^2) = E[(u_t - E(u_t))^2] = V(u_t) = \sigma^2$.
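Both results for $\hat{\beta}$ (unbiasedness and the variance formula) can be illustrated with a small Monte Carlo simulation. The sketch below assumes normally distributed errors and hypothetical parameter values. Neither is required by the derivation, which only uses $E(u_t) = 0$, a common variance $\sigma^2$, and mutual independence.

```python
import random

random.seed(0)

# Hypothetical true parameters and a fixed (non-random) regressor.
alpha, beta, sigma = 1.0, 2.0, 1.5
X = [float(t) for t in range(1, 21)]
T = len(X)
x_bar = sum(X) / T
s_xx = sum((x - x_bar) ** 2 for x in X)
omega = [(x - x_bar) / s_xx for x in X]  # weights omega_t

R = 20000  # number of simulated samples
draws = []
for _ in range(R):
    u = [random.gauss(0.0, sigma) for _ in range(T)]
    y = [alpha + beta * x + ut for x, ut in zip(X, u)]
    # beta_hat = sum_t omega_t * y_t, which is equivalent to equation (9)
    draws.append(sum(w * yi for w, yi in zip(omega, y)))

mc_mean = sum(draws) / R
mc_var = sum((b - mc_mean) ** 2 for b in draws) / (R - 1)
theory_var = sigma ** 2 / s_xx

print(mc_mean, mc_var, theory_var)
```

The simulated mean should be close to `beta` and the simulated variance close to `theory_var`, with the gap shrinking as `R` grows.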
1.3 Question 3
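Recall that the estimator of $\alpha$ is
\[
\hat{\alpha} = \bar{y} - \hat{\beta} \bar{X}.
\]
Substituting $\bar{y} = \alpha + \beta \bar{X} + \bar{u}$ (obtained by averaging both sides of $y_t = \alpha + \beta X_t + u_t$ over $t \in \{1, \ldots, T\}$) into this equation implies
\[
\hat{\alpha} = \alpha - (\hat{\beta} - \beta) \bar{X} + \bar{u}.
\]
Since $E(\hat{\beta}) = \beta$ and $E(\bar{u}) = E(\sum_t u_t)/T = \sum_t E(u_t)/T = 0$, we obtain
\[
E(\hat{\alpha}) = \alpha.
\]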
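The variance of $\hat{\alpha}$ is
\begin{align*}
V(\hat{\alpha}) &= E[(\hat{\alpha} - E(\hat{\alpha}))^2] = E(\hat{\alpha} - \alpha)^2 = E\left( -(\hat{\beta} - \beta) \bar{X} + \bar{u} \right)^2 \\
&= E\left[ (\hat{\beta} - \beta)^2 \bar{X}^2 - 2 (\hat{\beta} - \beta) \bar{X} \bar{u} + \bar{u}^2 \right] \\
&= \bar{X}^2 E(\hat{\beta} - \beta)^2 - 2 \bar{X} E[(\hat{\beta} - \beta) \bar{u}] + E(\bar{u}^2). \tag{10}
\end{align*}
The first term of equation (10) is
\[
\bar{X}^2 E(\hat{\beta} - \beta)^2 = \frac{\bar{X}^2 \sigma^2}{\sum_t (X_t - \bar{X})^2}.
\]
The third term of equation (10) is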
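\begin{align*}
E(\bar{u}^2) &= \frac{E\left( \sum_t u_t \right)^2}{T^2}
= \frac{E\left[ \sum_t u_t^2 + \sum_t \sum_{t' \neq t} u_t u_{t'} \right]}{T^2} \\
&= \frac{\sum_t E(u_t^2) + \sum_t \sum_{t' \neq t} E(u_t u_{t'})}{T^2} \\
&= \frac{\sum_t E[(u_t - E(u_t))^2] + \sum_t \sum_{t' \neq t} E(u_t) E(u_{t'})}{T^2} \\
&= \frac{T \sigma^2}{T^2} = \frac{\sigma^2}{T}.
\end{align*}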
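Up to its sign, the second term of equation (10) can be rewritten as follows:
\begin{align*}
2 \bar{X} E[(\hat{\beta} - \beta) \bar{u}]
&= 2 \bar{X} E\left[ \frac{\sum_t (X_t - \bar{X}) u_t}{\sum_t (X_t - \bar{X})^2} \cdot \frac{\sum_{t'} u_{t'}}{T} \right] \\
&= \frac{2 \bar{X}}{T \sum_t (X_t - \bar{X})^2} E\left[ \sum_t (X_t - \bar{X}) u_t \sum_{t'} u_{t'} \right] \\
&= \frac{2 \bar{X}}{T \sum_t (X_t - \bar{X})^2} E\left[ \sum_t (X_t - \bar{X}) \left( u_t^2 + \sum_{t' \neq t} u_t u_{t'} \right) \right] \\
&= \frac{2 \bar{X}}{T \sum_t (X_t - \bar{X})^2} \left[ \sum_t (X_t - \bar{X}) E(u_t^2) + \sum_t \sum_{t' \neq t} (X_t - \bar{X}) E(u_t u_{t'}) \right] \\
&= \frac{2 \bar{X}}{T \sum_t (X_t - \bar{X})^2} \left[ \sum_t (X_t - \bar{X}) E[(u_t - E(u_t))^2] + \sum_t \sum_{t' \neq t} (X_t - \bar{X}) E(u_t) E(u_{t'}) \right] \\
&= \frac{2 \bar{X}}{T \sum_t (X_t - \bar{X})^2} \, \sigma^2 \sum_t (X_t - \bar{X}) = 0.
\end{align*}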
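The three terms of equation (10) can also be checked without simulation. Since $\hat{\beta} - \beta = \sum_t \omega_t u_t$ and $\bar{u} = \sum_t u_t / T$, we can write $\hat{\alpha} - \alpha = \sum_t c_t u_t$ with $c_t = 1/T - \bar{X} \omega_t$. The weights $c_t$ are notation introduced here for illustration and do not appear in the handout. Under independent errors with common variance $\sigma^2$, the variance of such a weighted sum is $\sigma^2 \sum_t c_t^2$. A minimal sketch with hypothetical numbers:

```python
# Hypothetical error standard deviation and fixed regressor values.
sigma = 1.5
X = [1.0, 2.0, 4.0, 7.0, 11.0]
T = len(X)
x_bar = sum(X) / T
s_xx = sum((x - x_bar) ** 2 for x in X)
omega = [(x - x_bar) / s_xx for x in X]

# Cross term: E[(beta_hat - beta) * u_bar] = sigma^2 * sum_t omega_t / T,
# which vanishes because the omega_t sum to zero.
cross = sigma ** 2 * sum(w / T for w in omega)

# alpha_hat - alpha = sum_t c_t * u_t with c_t = 1/T - x_bar * omega_t,
# so its variance is sigma^2 * sum_t c_t^2.
c = [1.0 / T - x_bar * w for w in omega]
var_from_weights = sigma ** 2 * sum(ct ** 2 for ct in c)

# Sum of the first and third terms of equation (10).
var_from_terms = x_bar ** 2 * sigma ** 2 / s_xx + sigma ** 2 / T

print(cross, var_from_weights, var_from_terms)
```

The two variance values agree, and `cross` is zero up to rounding, which matches the term-by-term derivation.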
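Hence, the variance of $\hat{\alpha}$ is
\[
V(\hat{\alpha}) = \frac{\bar{X}^2 \sigma^2}{\sum_t (X_t - \bar{X})^2} + \frac{\sigma^2}{T}
= \frac{\sigma^2 \sum_t X_t^2}{T \sum_t (X_t - \bar{X})^2},
\]
where the last equality uses $\sum_t (X_t - \bar{X})^2 = \sum_t X_t^2 - T \bar{X}^2$.
2 Review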
2.1 Properties of Expectation and Variance
In this section, we review some properties of expectation and variance that are used to solve the homework.
Mutual Independence
Let $X$ and $Y$ be mutually independent random variables. Then, the following properties of expectation and covariance hold:
1. $E(XY) = E(X)E(Y)$;
2. $Cov(X, Y) = 0$.
Both properties mean that there is no correlation between $X$ and $Y$.
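Caveat: Independence is a sufficient condition for uncorrelatedness, not a necessary one: uncorrelated random variables need not be independent. An exception is the multivariate normal distribution, for which uncorrelatedness implies independence.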
Proof. Without loss of generality, we assume $X$ and $Y$ are continuous random variables. Let $f(x, y)$ be the joint density of $X$ and $Y$. By definition, mutual independence implies $f(x, y) = f_X(x) f_Y(y)$.
1. By definition,
\begin{align*}
E(XY) &= \iint xy f(x, y) \, dy \, dx \\
&= \iint xy f_X(x) f_Y(y) \, dy \, dx \\
&= \int x f_X(x) \, dx \int y f_Y(y) \, dy \\
&= E(X)E(Y).
\end{align*}
2. By definition,
\begin{align*}
Cov(X, Y) &= E[(X - E(X))(Y - E(Y))] \\
&= E[XY - XE(Y) - E(X)Y + E(X)E(Y)] \\
&= E(XY) - E(X)E(Y) \\
&= E(X)E(Y) - E(X)E(Y) = 0.
\end{align*}
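The caveat above can be illustrated with a standard textbook counterexample, which is not taken from the handout: let $X$ take the values $-1, 0, 1$ with probability $1/3$ each and set $Y = X^2$. The sketch below computes the moments exactly and shows that the covariance is zero even though $Y$ is a function of $X$.

```python
from fractions import Fraction

# X takes the values -1, 0, 1 with probability 1/3 each, and Y = X^2.
support = [-1, 0, 1]
p = Fraction(1, 3)

E_X = sum(p * x for x in support)
E_Y = sum(p * x ** 2 for x in support)
E_XY = sum(p * x * x ** 2 for x in support)
cov = E_XY - E_X * E_Y

# Dependence: P(X = 0, Y = 0) differs from P(X = 0) * P(Y = 0).
p_joint = p        # X = 0 forces Y = 0
p_product = p * p  # P(X = 0) = 1/3 and P(Y = 0) = 1/3

print(cov, p_joint, p_product)
```

Here `cov` is exactly zero while `p_joint` and `p_product` differ, so $X$ and $Y$ are uncorrelated but not independent.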
Additivity of Expectation and Variance
Let $X_1, \ldots, X_n$ be random variables, and let $a_1, \ldots, a_n$ and $b$ be constants. Then, the following properties hold:
1. $E\left( \sum_i a_i X_i + b \right) = \sum_i a_i E(X_i) + b$;
2. $V\left( \sum_i a_i X_i + b \right) = \sum_i a_i^2 V(X_i) + \sum_i \sum_{j \neq i} a_i a_j Cov(X_i, X_j)$.
Proof. Without loss of generality, we assume the $X_i$ are continuous random variables. Let $f(x_1, \ldots, x_n)$ be the joint density of $X_1, \ldots, X_n$.
1. By definition,
\begin{align*}
E\left( \sum_i a_i X_i + b \right)
&= \int \cdots \int (a_1 x_1 + \cdots + a_n x_n + b) f(x_1, \ldots, x_n) \, dx_1 \cdots dx_n \\
&= a_1 \int \cdots \int x_1 f(x_1, \ldots, x_n) \, dx_1 \cdots dx_n + \cdots + b \int \cdots \int f(x_1, \ldots, x_n) \, dx_1 \cdots dx_n \\
&= a_1 \int x_1 \left( \int \cdots \int f(x_1, \ldots, x_n) \, dx_2 \cdots dx_n \right) dx_1 + \cdots + b \\
&= a_1 \int x_1 f_{X_1}(x_1) \, dx_1 + \cdots + b = \sum_i a_i E(X_i) + b,
\end{align*}
where $f_{X_1}$ denotes the marginal density of $X_1$.
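2. By definition,
\begin{align*}
V\left( \sum_i a_i X_i + b \right)
&= E\left[ \left( \left( \sum_i a_i X_i + b \right) - E\left( \sum_i a_i X_i + b \right) \right)^2 \right] \\
&= E\left[ \left( \left( \sum_i a_i X_i + b \right) - \left( \sum_i a_i E(X_i) + b \right) \right)^2 \right] \\
&= E\left[ \left( \sum_i a_i (X_i - E(X_i)) \right)^2 \right] \\
&= E\left[ \sum_i a_i^2 (X_i - E(X_i))^2 + \sum_i \sum_{j \neq i} a_i a_j (X_i - E(X_i))(X_j - E(X_j)) \right] \\
&= \sum_i a_i^2 E\left[ (X_i - E(X_i))^2 \right] + \sum_i \sum_{j \neq i} a_i a_j E\left[ (X_i - E(X_i))(X_j - E(X_j)) \right] \\
&= \sum_i a_i^2 V(X_i) + \sum_i \sum_{j \neq i} a_i a_j Cov(X_i, X_j).
\end{align*}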
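The variance formula also holds exactly for sample moments, because these are the moments of the empirical distribution. The sketch below checks this for $n = 2$ with simulated data. The series, coefficients, and the helper `cov` are illustrative assumptions.

```python
import random

random.seed(1)
n_obs = 200

# Two correlated hypothetical series x1 and x2.
x1 = [random.gauss(0.0, 1.0) for _ in range(n_obs)]
x2 = [0.5 * v + random.gauss(0.0, 1.0) for v in x1]
a1, a2, b = 2.0, -3.0, 4.0

def cov(u, v):
    """Sample covariance with divisor n - 1 (cov(u, u) is the variance)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v)) / (len(u) - 1)

# Left-hand side: variance of the linear combination a1*x1 + a2*x2 + b.
z = [a1 * u + a2 * v + b for u, v in zip(x1, x2)]
lhs = cov(z, z)

# Right-hand side: variance terms plus both ordered covariance terms.
rhs = (a1 ** 2 * cov(x1, x1) + a2 ** 2 * cov(x2, x2)
       + a1 * a2 * cov(x1, x2) + a2 * a1 * cov(x2, x1))

print(lhs, rhs)
```

The double sum over $j \neq i$ visits each pair in both orders, which is why the two covariance terms appear separately in `rhs`. The constant `b` does not affect the variance.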
2.2 Optimization
Consider the problem of finding a point $x \in \mathbb{R}$ that maximizes or minimizes a function $y = f(x)$. If $x_0 \in \mathbb{R}$ attains the maximum or minimum, it satisfies the following first-order condition:
\[
\left. \frac{df(x)}{dx} \right|_{x = x_0} = 0. \tag{11}
\]
In addition, to determine whether the optimum is a maximum or a minimum, we use the following sufficient conditions:
\[
\left. \frac{d^2 f(x)}{dx^2} \right|_{x = x_0} < 0 \quad \text{for a maximum;} \tag{12}
\]
\[
\left. \frac{d^2 f(x)}{dx^2} \right|_{x = x_0} > 0 \quad \text{for a minimum.} \tag{13}
\]
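As a small worked example of conditions (11) to (13), consider $f(x) = x^3 - 3x$, a function chosen here purely for illustration. Its critical points are $x_0 = -1$ and $x_0 = 1$, and the sign of the second derivative classifies each one. The conditions identify local optima only, since this $f$ is unbounded.

```python
def f_prime(x):
    """First derivative of f(x) = x**3 - 3*x."""
    return 3 * x ** 2 - 3

def f_second(x):
    """Second derivative of f(x) = x**3 - 3*x."""
    return 6 * x

def classify(x0):
    """Classify a critical point x0 using conditions (12) and (13)."""
    if abs(f_prime(x0)) > 1e-12:
        raise ValueError("x0 does not satisfy the first-order condition (11)")
    if f_second(x0) < 0:
        return "local maximum"
    if f_second(x0) > 0:
        return "local minimum"
    return "inconclusive"

results = {x0: classify(x0) for x0 in (-1.0, 1.0)}
print(results)
```

When the second derivative is zero at a critical point, conditions (12) and (13) give no answer, which the sketch reports as "inconclusive".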
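Next, consider a function $y = g(x) \in \mathbb{R}$ where $x = (x_1, \ldots, x_n)' \in \mathbb{R}^n$, denoted as $g : \mathbb{R}^n \to \mathbb{R}$.
If $x_0 = (x_{01}, \ldots, x_{0n})' \in \mathbb{R}^n$ maximizes or minimizes $g(x)$, we apply the following theorem.
If a function $g : \mathbb{R}^n \to \mathbb{R}$ is maximized (minimized) at the point $x_0 = (x_{01}, \ldots, x_{0n})'$, then the following equation holds:
\[
\left. \frac{\partial g(x)}{\partial x} \right|_{x = x_0} =
\begin{pmatrix}
\dfrac{\partial g(x_0)}{\partial x_1} \\
\vdots \\
\dfrac{\partial g(x_0)}{\partial x_n}
\end{pmatrix}
= 0. \tag{14}
\]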
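Moreover, we use the following Hessian matrix to distinguish a maximum from a minimum.
The Hessian matrix of a function $g : \mathbb{R}^n \to \mathbb{R}$ is defined as follows:
\[
H = \frac{\partial^2 g(x)}{\partial x \partial x'} =
\begin{pmatrix}
\dfrac{\partial^2 g(x)}{\partial x_1 \partial x_1} & \cdots & \dfrac{\partial^2 g(x)}{\partial x_1 \partial x_n} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial^2 g(x)}{\partial x_n \partial x_1} & \cdots & \dfrac{\partial^2 g(x)}{\partial x_n \partial x_n}
\end{pmatrix}
\]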
Assume that $g_{x_1}(x_0) = g_{x_2}(x_0) = \cdots = g_{x_n}(x_0) = 0$ holds, where $g_{x_i}(x)$ for $i \in \{1, \ldots, n\}$ denotes the partial derivative of $g(x)$ with respect to $x_i$. The following theorem gives a way to determine whether $x_0$ attains a maximum or a minimum.
Suppose that a smooth function $g : \mathbb{R}^n \to \mathbb{R}$ satisfies $g_{x_1}(x_0) = \cdots = g_{x_n}(x_0) = 0$. Then, with the Hessian matrix $H$ evaluated at $x_0$, we can confirm that if:
1. $H$ is a negative definite matrix, $x_0$ is a (local) maximum point.
2. $H$ is a positive definite matrix, $x_0$ is a (local) minimum point.
As for the positive or negative definiteness of a matrix, we have the following theorem.
A necessary and sufficient condition for a symmetric matrix $A \in \mathbb{R}^{n \times n}$ to be positive (negative) definite is that all eigenvalues $\lambda$, that is, all solutions of $\det(A - \lambda I) = 0$, are positive (negative), where $I$ is the identity matrix:
\[
I =
\begin{pmatrix}
1 & \cdots & 0 \\
\vdots & \ddots & \vdots \\
0 & \cdots & 1
\end{pmatrix}
\]
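The eigenvalue criterion can be applied to the Hessian of $S(\alpha, \beta)$ from Question 1. For a symmetric $2 \times 2$ matrix, the eigenvalues solve $\lambda^2 - \mathrm{tr}(H) \lambda + |H| = 0$. The sketch below uses hypothetical regressor values and also checks that the determinant equals $4T^2$ times the variance of $X_t$ computed with divisor $T$.

```python
import math

# Hypothetical regressor values.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
T = len(X)
sum_x = sum(X)
sum_xx = sum(x * x for x in X)

# Hessian of S(alpha, beta): [[2T, 2*sum_x], [2*sum_x, 2*sum_xx]].
h11, h12, h22 = 2.0 * T, 2.0 * sum_x, 2.0 * sum_xx

# Eigenvalues of a symmetric 2x2 matrix from its characteristic polynomial
#   lambda^2 - trace * lambda + det = 0.
trace = h11 + h22
det = h11 * h22 - h12 ** 2
disc = math.sqrt(trace ** 2 - 4.0 * det)
eigenvalues = ((trace - disc) / 2.0, (trace + disc) / 2.0)

# Variance of X with divisor T.
x_bar = sum_x / T
var_x = sum_xx / T - x_bar ** 2

print(eigenvalues, det, 4 * T ** 2 * var_x)
```

Both eigenvalues are positive for this data, so the Hessian is positive definite and the first-order conditions identify a minimum of $S(\alpha, \beta)$.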