Econometrics I: Solutions of Homework 1



Hiroki Kato*

April 19, 2020

1 Solutions

1.1 Question 1
1.2 Question 2
1.3 Question 3

2 Review

2.1 Properties of Expectation and Variance
2.2 Optimization

1 Solutions

1.1 Question 1

Let $S(\alpha, \beta)$ be the sum of squared residuals:

$$S(\alpha, \beta) = \sum_{t=1}^{T} u_t^2 = \sum_{t=1}^{T} (y_t - \alpha - \beta X_t)^2 \qquad (1)$$

*Email: [email protected]. Room 503. All materials I made are published on GitHub: https://github.com/KatoPachi/2020EconometricsTA.git. If you find any errors in the handouts and materials, please contact me via email or open an issue on GitHub.


The Ordinary Least Squares estimators (hereafter, OLS estimators) can be derived by minimizing (1):

$$(\hat{\alpha}, \hat{\beta}) \in \arg\min_{\alpha, \beta} S(\alpha, \beta) \qquad (2)$$

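The first-order conditions of this problem, with the derivatives evaluated at $(\hat{\alpha}, \hat{\beta})$, are

$$\frac{\partial S(\alpha, \beta)}{\partial \alpha} = -2 \sum_{t=1}^{T} (y_t - \hat{\alpha} - \hat{\beta} X_t) = 0 \qquad (3)$$

$$\frac{\partial S(\alpha, \beta)}{\partial \beta} = -2 \sum_{t=1}^{T} X_t (y_t - \hat{\alpha} - \hat{\beta} X_t) = 0 \qquad (4)$$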

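Note that the second-order condition holds since the Hessian matrix is positive definite:

$$H = \begin{pmatrix} \frac{\partial^2 S(\alpha, \beta)}{\partial \alpha \partial \alpha} & \frac{\partial^2 S(\alpha, \beta)}{\partial \alpha \partial \beta} \\ \frac{\partial^2 S(\alpha, \beta)}{\partial \beta \partial \alpha} & \frac{\partial^2 S(\alpha, \beta)}{\partial \beta \partial \beta} \end{pmatrix} = \begin{pmatrix} 2T & 2 \sum_t X_t \\ 2 \sum_t X_t & 2 \sum_t X_t^2 \end{pmatrix}$$

$$\Rightarrow |H| = 2T \cdot 2 \sum_t X_t^2 - 2 \sum_t X_t \cdot 2 \sum_t X_t = 4T^2 \left[ \frac{1}{T} \sum_t X_t^2 - \left( \frac{\sum_t X_t}{T} \right)^2 \right]$$

$$\Rightarrow |H| = 4T^2 \cdot V(X_t) > 0,$$

where $V(X_t) = \frac{1}{T} \sum_t X_t^2 - \left( \frac{1}{T} \sum_t X_t \right)^2$ is the (sample) variance of $X_t$, which is positive as long as $X_t$ is not constant across $t$. Since the first leading principal minor $2T$ and the determinant $|H|$ are both positive, $H$ is positive definite.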
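As a quick numerical sanity check of the determinant formula, the sketch below builds $H$ for a short, made-up regressor series and compares $|H|$ with $4T^2 \cdot V(X_t)$, where $V$ uses divisor $T$. The values in `xs` and the helper names `hessian_ols`, `det2` and `var_T` are hypothetical illustrations, not part of the homework. Exact fractions avoid any rounding doubt.

```python
from fractions import Fraction

def hessian_ols(xs):
    """Hessian of S(alpha, beta) = sum (y_t - alpha - beta*X_t)^2; it does not depend on y."""
    T = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    return [[2 * T, 2 * sx], [2 * sx, 2 * sxx]]

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def var_T(xs):
    """Variance with divisor T: (1/T) sum X_t^2 - Xbar^2."""
    T = len(xs)
    xbar = Fraction(sum(xs), T)
    return Fraction(sum(x * x for x in xs), T) - xbar ** 2

xs = [1, 2, 4, 7, 11]  # hypothetical regressor values
T = len(xs)
H = hessian_ols(xs)
print(H, det2(H), 4 * T ** 2 * var_T(xs))  # [[10, 50], [50, 382]] 1320 1320
```

Both leading principal minors ($10$ and $1320$) are positive here, consistent with positive definiteness.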

By equation (3), we have

$$\hat{\alpha} = \frac{1}{T} \left( \sum_t y_t - \hat{\beta} \sum_t X_t \right) = \bar{y} - \hat{\beta} \bar{X} \qquad (5)$$

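where $\bar{y} = \sum_t y_t / T$ and $\bar{X} = \sum_t X_t / T$ are the sample means. Substituting equation (5) into equation (4) yields

$$\hat{\beta} \left( \sum_t X_t^2 - \bar{X} \sum_t X_t \right) = \sum_t y_t X_t - \bar{y} \sum_t X_t$$

$$\hat{\beta} = \frac{\sum_t y_t X_t - \bar{y} \sum_t X_t}{\sum_t X_t^2 - \bar{X} \sum_t X_t}$$

$$\hat{\beta} = \frac{\sum_t X_t (y_t - \bar{y})}{\sum_t X_t (X_t - \bar{X})} \qquad (6)$$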


Thus, the OLS estimators are

$$(\hat{\alpha}, \hat{\beta}) = \left( \bar{y} - \hat{\beta} \bar{X}, \ \frac{\sum_t X_t (y_t - \bar{y})}{\sum_t X_t (X_t - \bar{X})} \right) \qquad (7)$$
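The closed form (7) can be checked against the first-order conditions (3) and (4) on a small data set. The sketch below is illustrative only: the data `xs`, `ys` and the helper names `ols_simple` and `foc_residuals` are hypothetical. It computes $\hat{\beta}$ from (6), then $\hat{\alpha}$ from (5), and confirms with exact arithmetic that both normal equations are satisfied.

```python
from fractions import Fraction

def ols_simple(xs, ys):
    """OLS for y_t = alpha + beta*X_t + u_t via equations (6) and (5)."""
    T = len(xs)
    xbar = Fraction(sum(xs), T)
    ybar = Fraction(sum(ys), T)
    # Equation (6): sum X_t (y_t - ybar) / sum X_t (X_t - xbar)
    num = sum(x * (y - ybar) for x, y in zip(xs, ys))
    den = sum(x * (x - xbar) for x in xs)
    beta_hat = num / den
    # Equation (5): alpha_hat = ybar - beta_hat * xbar
    alpha_hat = ybar - beta_hat * xbar
    return alpha_hat, beta_hat

def foc_residuals(xs, ys, a, b):
    """Sums in (3) and (4) (without the factor -2); both are 0 at the OLS solution."""
    r = [y - a - b * x for x, y in zip(xs, ys)]
    return sum(r), sum(x * e for x, e in zip(xs, r))

xs = [1, 2, 4, 7, 11]   # hypothetical data
ys = [3, 4, 9, 14, 23]
a, b = ols_simple(xs, ys)
print(a, b, foc_residuals(xs, ys, a, b))
```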

Note that the estimator of $\beta$, $\hat{\beta}$, can be rewritten as follows:

$$\hat{\beta} = \frac{\mathrm{Cov}(y_t, X_t)}{V(X_t)} \qquad (8)$$

where $\mathrm{Cov}(y_t, X_t)$ is the covariance between $y_t$ and $X_t$, and $V(X_t)$ is the variance of $X_t$, each replaced by its sample counterpart. To prove it, we need to recall the definitions of covariance and variance. First, the definition of covariance is

$$\mathrm{Cov}(y_t, X_t) = E[(y_t - E(y_t))(X_t - E(X_t))].$$

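Then, the sample covariance is

$$
\begin{aligned}
S_{y_t, X_t} &= \frac{1}{T-1} \sum_t (y_t - \bar{y})(X_t - \bar{X}) \\
&= \frac{1}{T-1} \sum_t (y_t X_t - y_t \bar{X} - \bar{y} X_t + \bar{X} \bar{y}) \\
&= \frac{1}{T-1} \left( \sum_t y_t X_t - \bar{X} \sum_t y_t - \bar{y} \sum_t X_t + T \bar{X} \bar{y} \right) \\
&= \frac{1}{T-1} \left( \sum_t y_t X_t - \bar{y} \sum_t X_t \right) \\
&= \frac{1}{T-1} \sum_t X_t (y_t - \bar{y}),
\end{aligned}
$$

where the fourth line uses $\bar{X} \sum_t y_t = T \bar{X} \bar{y}$.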

Second, the definition of variance is

$$V(X_t) = E[(X_t - E(X_t))^2].$$


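Then, the sample variance is

$$
\begin{aligned}
S_{X_t}^2 &= \frac{1}{T-1} \sum_t (X_t - \bar{X})^2 \\
&= \frac{1}{T-1} \sum_t X_t^2 - \frac{2 \bar{X}}{T-1} \sum_t X_t + \frac{1}{T-1} T \bar{X}^2 \\
&= \frac{1}{T-1} \left( \sum_t X_t^2 - \bar{X} \sum_t X_t \right) \\
&= \frac{1}{T-1} \sum_t X_t (X_t - \bar{X}),
\end{aligned}
$$

where the third line uses $T \bar{X}^2 = \bar{X} \sum_t X_t$.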

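Finally, we have

$$\hat{\beta} = \frac{S_{y_t, X_t}}{S_{X_t}^2} = \frac{\sum_t X_t (y_t - \bar{y})}{\sum_t X_t (X_t - \bar{X})},$$

which coincides with (6) because the factor $1/(T-1)$ cancels. Thus, the OLS estimator of $\beta$ is the sample analogue of $\mathrm{Cov}(y_t, X_t) / V(X_t)$, or equivalently

$$\hat{\beta} = \frac{\sum_t (y_t - \bar{y})(X_t - \bar{X})}{\sum_t (X_t - \bar{X})^2} \qquad (9)$$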
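To see that (6), the covariance ratio (8) and (9) are the same number, the sketch below evaluates all three on hypothetical data with exact fractions. The data and the function name `beta_forms` are illustrative assumptions; the sample covariance and variance both use divisor $T - 1$, which cancels in the ratio.

```python
from fractions import Fraction

def beta_forms(xs, ys):
    """Compute beta_hat via equation (6), via (8) with sample Cov / sample Var, and via (9)."""
    T = len(xs)
    xbar = Fraction(sum(xs), T)
    ybar = Fraction(sum(ys), T)
    # Sample covariance and sample variance, both with divisor T - 1
    s_yx = sum((y - ybar) * (x - xbar) for x, y in zip(xs, ys)) / (T - 1)
    s_xx = sum((x - xbar) ** 2 for x in xs) / (T - 1)
    eq6 = (sum(x * (y - ybar) for x, y in zip(xs, ys))
           / sum(x * (x - xbar) for x in xs))
    eq8 = s_yx / s_xx
    eq9 = (sum((y - ybar) * (x - xbar) for x, y in zip(xs, ys))
           / sum((x - xbar) ** 2 for x in xs))
    return eq6, eq8, eq9

xs = [1, 2, 4, 7, 11]   # hypothetical data
ys = [3, 4, 9, 14, 23]
eq6, eq8, eq9 = beta_forms(xs, ys)
print(eq6, eq8, eq9)  # 133/66 133/66 133/66
```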

1.2 Question 2

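From (9), we have

$$\hat{\beta} = \frac{\sum_t y_t (X_t - \bar{X}) - \bar{y} \sum_t (X_t - \bar{X})}{\sum_t (X_t - \bar{X})^2} = \frac{\sum_t y_t (X_t - \bar{X})}{\sum_t (X_t - \bar{X})^2}$$

since $\sum_t (X_t - \bar{X}) = \sum_t X_t - T \bar{X} = \sum_t X_t - \sum_t X_t = 0$. Substituting $y_t = \alpha + \beta X_t + u_t$ into this equation yields

$$
\begin{aligned}
\hat{\beta} &= \frac{\sum_t (X_t - \bar{X})(\alpha + \beta X_t + u_t)}{\sum_t (X_t - \bar{X})^2} \\
&= \frac{\beta \sum_t (X_t - \bar{X}) X_t + \sum_t (X_t - \bar{X}) u_t}{\sum_t (X_t - \bar{X})^2} \\
&= \beta + \frac{\sum_t (X_t - \bar{X}) u_t}{\sum_t (X_t - \bar{X})^2} \\
&= \beta + \sum_t \omega_t u_t
\end{aligned}
$$

where $\omega_t = (X_t - \bar{X}) / \sum_t (X_t - \bar{X})^2$. The second line uses $\alpha \sum_t (X_t - \bar{X}) = 0$, and the third line uses $\sum_t (X_t - \bar{X}) X_t = \sum_t (X_t - \bar{X})^2$, which holds because $\sum_t (X_t - \bar{X}) \bar{X} = 0$.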

Because $\beta$ and $X_t$ (hence $\omega_t$) are not random variables and $E(u_t) = 0$, we have $E(\hat{\beta}) = \beta + \sum_t \omega_t E(u_t) = \beta$.
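The decomposition $\hat{\beta} = \beta + \sum_t \omega_t u_t$ holds sample by sample, not only in expectation. The sketch below checks it on one simulated sample; the regressor values, the true parameters $\alpha = 1$, $\beta = 2$, the standard normal errors and the function name `decomposition_check` are all hypothetical choices for illustration. It also checks the two identities behind the derivation, $\sum_t \omega_t = 0$ and $\sum_t \omega_t X_t = 1$.

```python
import random

def decomposition_check(xs, alpha, beta, seed=0):
    """Return (beta_hat, beta + sum_t omega_t u_t, sum_t omega_t, sum_t omega_t X_t)."""
    rng = random.Random(seed)
    T = len(xs)
    xbar = sum(xs) / T
    sxx = sum((x - xbar) ** 2 for x in xs)
    omega = [(x - xbar) / sxx for x in xs]        # weights omega_t
    u = [rng.gauss(0.0, 1.0) for _ in xs]         # hypothetical N(0, 1) errors
    ys = [alpha + beta * x + e for x, e in zip(xs, u)]
    ybar = sum(ys) / T
    # OLS slope via equation (9)
    beta_hat = sum((y - ybar) * (x - xbar) for x, y in zip(xs, ys)) / sxx
    rhs = beta + sum(w * e for w, e in zip(omega, u))
    return beta_hat, rhs, sum(omega), sum(w * x for w, x in zip(omega, xs))

xs = [1, 2, 4, 7, 11]  # hypothetical, fixed (non-random) regressor values
beta_hat, rhs, s_omega, s_omega_x = decomposition_check(xs, alpha=1.0, beta=2.0)
print(beta_hat, rhs, s_omega, s_omega_x)
```

The two printed estimates agree up to floating-point rounding, whatever the seed.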

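The variance of $\hat{\beta}$ is

$$
\begin{aligned}
V(\hat{\beta}) &= E[(\hat{\beta} - E(\hat{\beta}))^2] = E(\hat{\beta} - \beta)^2 = E\left( \sum_t \omega_t u_t \right)^2 \\
&= E\left[ \sum_t \omega_t^2 u_t^2 + \sum_t \sum_{t' \neq t} \omega_t \omega_{t'} u_t u_{t'} \right] \\
&= \sum_t \omega_t^2 E(u_t^2) + \sum_t \sum_{t' \neq t} \omega_t \omega_{t'} E(u_t u_{t'}) \\
&= \sum_t \omega_t^2 E[(u_t - E(u_t))^2] + \sum_t \sum_{t' \neq t} \omega_t \omega_{t'} E(u_t) E(u_{t'}) \\
&= \frac{\sigma^2 \sum_t (X_t - \bar{X})^2}{\left( \sum_t (X_t - \bar{X})^2 \right)^2} = \frac{\sigma^2}{\sum_t (X_t - \bar{X})^2}.
\end{aligned}
$$

To derive it, we use the following properties: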

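• The mutual independence assumption implies $E(u_t u_{t'}) = E(u_t) E(u_{t'})$ for $t' \neq t$, which equals $0$ because $E(u_t) = 0$.

• By $E(u_t) = 0$, $E(u_t^2) = E[(u_t - E(u_t))^2] = V(u_t) = \sigma^2$.

• By the definition of $\omega_t$, $\sum_t \omega_t^2 = \sum_t (X_t - \bar{X})^2 / \left( \sum_t (X_t - \bar{X})^2 \right)^2$.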
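A small Monte Carlo experiment can illustrate both results of Question 2. This is a sketch under assumed settings: the regressor values, $\alpha = 1$, $\beta = 2$, $\sigma = 2$, the number of replications and the choice of i.i.d. normal errors are hypothetical (the derivation itself only uses independence, $E(u_t) = 0$ and $V(u_t) = \sigma^2$). The simulated mean and variance of $\hat{\beta}$ should be close to $\beta$ and $\sigma^2 / \sum_t (X_t - \bar{X})^2$.

```python
import random

def simulate_beta(xs, alpha, beta, sigma, reps, seed=1):
    """Monte Carlo mean and variance of beta_hat, plus sigma^2 / sum (X_t - Xbar)^2."""
    rng = random.Random(seed)
    T = len(xs)
    xbar = sum(xs) / T
    sxx = sum((x - xbar) ** 2 for x in xs)
    draws = []
    for _ in range(reps):
        # X_t is held fixed across replications; only the errors are redrawn
        ys = [alpha + beta * x + rng.gauss(0.0, sigma) for x in xs]
        ybar = sum(ys) / T
        draws.append(sum((y - ybar) * (x - xbar) for x, y in zip(xs, ys)) / sxx)
    mean = sum(draws) / reps
    var = sum((b - mean) ** 2 for b in draws) / (reps - 1)
    return mean, var, sigma ** 2 / sxx

xs = [1, 2, 4, 7, 11]  # hypothetical regressor values
mean, var, theory = simulate_beta(xs, alpha=1.0, beta=2.0, sigma=2.0, reps=20000)
print(mean, var, theory)
```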

1.3 Question 3

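Recall that the estimator of $\alpha$ is

$$\hat{\alpha} = \bar{y} - \hat{\beta} \bar{X}.$$

Substituting $\bar{y} = \alpha + \beta \bar{X} + \bar{u}$ (obtained by averaging both sides of $y_t = \alpha + \beta X_t + u_t$ over $t \in \{1, \dots, T\}$) into this equation implies

$$\hat{\alpha} = \alpha - (\hat{\beta} - \beta) \bar{X} + \bar{u}.$$

Since $E(\hat{\beta}) = \beta$ and $E(\bar{u}) = E\left( \sum_t u_t \right) / T = \sum_t E(u_t) / T = 0$, we obtain

$$E(\hat{\alpha}) = \alpha.$$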


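The variance of $\hat{\alpha}$ is

$$
\begin{aligned}
V(\hat{\alpha}) &= E[(\hat{\alpha} - E(\hat{\alpha}))^2] = E(\hat{\alpha} - \alpha)^2 = E\left( -(\hat{\beta} - \beta) \bar{X} + \bar{u} \right)^2 \\
&= E\left[ (\hat{\beta} - \beta)^2 \bar{X}^2 - 2 (\hat{\beta} - \beta) \bar{X} \bar{u} + \bar{u}^2 \right] \\
&= \bar{X}^2 E(\hat{\beta} - \beta)^2 - 2 \bar{X} E[(\hat{\beta} - \beta) \bar{u}] + E(\bar{u}^2). \qquad (10)
\end{aligned}
$$

Using $E(\hat{\beta} - \beta)^2 = V(\hat{\beta})$ from Question 2, the first term of equation (10) is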

$$\bar{X}^2 E(\hat{\beta} - \beta)^2 = \frac{\bar{X}^2 \sigma^2}{\sum_t (X_t - \bar{X})^2}.$$

The third term of equation (10) is

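$$
\begin{aligned}
E(\bar{u}^2) &= \frac{E\left( \sum_t u_t \right)^2}{T^2} = \frac{E\left[ \sum_t u_t^2 + \sum_t \sum_{t' \neq t} u_t u_{t'} \right]}{T^2} \\
&= \frac{\sum_t E(u_t^2) + \sum_t \sum_{t' \neq t} E(u_t u_{t'})}{T^2} \\
&= \frac{\sum_t E[(u_t - E(u_t))^2] + \sum_t \sum_{t' \neq t} E(u_t) E(u_{t'})}{T^2} = \frac{T \sigma^2}{T^2} = \frac{\sigma^2}{T}.
\end{aligned}
$$

The second term of equation (10) is rewritten as follows: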

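$$
\begin{aligned}
2 \bar{X} E[(\hat{\beta} - \beta) \bar{u}] &= 2 \bar{X} E\left[ \frac{\sum_t (X_t - \bar{X}) u_t}{\sum_t (X_t - \bar{X})^2} \cdot \frac{\sum_{t'} u_{t'}}{T} \right] \\
&= \frac{2 \bar{X}}{T \sum_t (X_t - \bar{X})^2} E\left[ \sum_t (X_t - \bar{X}) u_t \sum_{t'} u_{t'} \right] \\
&= \frac{2 \bar{X}}{T \sum_t (X_t - \bar{X})^2} E\left[ \sum_t (X_t - \bar{X}) \left( u_t^2 + \sum_{t' \neq t} u_t u_{t'} \right) \right] \\
&= \frac{2 \bar{X}}{T \sum_t (X_t - \bar{X})^2} \left[ \sum_t (X_t - \bar{X}) E(u_t^2) + \sum_t \sum_{t' \neq t} (X_t - \bar{X}) E(u_t u_{t'}) \right] \\
&= \frac{2 \bar{X}}{T \sum_t (X_t - \bar{X})^2} \left[ \sum_t (X_t - \bar{X}) E[(u_t - E(u_t))^2] + \sum_t \sum_{t' \neq t} (X_t - \bar{X}) E(u_t) E(u_{t'}) \right] \\
&= \frac{2 \bar{X} \sigma^2}{T \sum_t (X_t - \bar{X})^2} \sum_t (X_t - \bar{X}) = 0,
\end{aligned}
$$

where the last equality uses $\sum_t (X_t - \bar{X}) = 0$.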


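Hence, we have the variance of $\hat{\alpha}$ as follows:

$$V(\hat{\alpha}) = \frac{\bar{X}^2 \sigma^2}{\sum_t (X_t - \bar{X})^2} + \frac{\sigma^2}{T} = \frac{\sigma^2 \sum_t X_t^2}{T \sum_t (X_t - \bar{X})^2},$$

where the last equality uses $\sum_t (X_t - \bar{X})^2 = \sum_t X_t^2 - T \bar{X}^2$.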
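The same kind of simulation can illustrate Question 3. This is a sketch under hypothetical settings (regressor values, $\alpha = 1$, $\beta = 2$, $\sigma = 2$, i.i.d. normal errors, replication count and the name `simulate_alpha` are all assumptions): it compares the simulated mean and variance of $\hat{\alpha}$ with $\alpha$ and with the formula above, and also checks that the two-term and one-term forms of $V(\hat{\alpha})$ agree.

```python
import random

def simulate_alpha(xs, alpha, beta, sigma, reps, seed=2):
    """Monte Carlo mean and variance of alpha_hat, plus both forms of V(alpha_hat)."""
    rng = random.Random(seed)
    T = len(xs)
    xbar = sum(xs) / T
    sxx = sum((x - xbar) ** 2 for x in xs)
    draws = []
    for _ in range(reps):
        ys = [alpha + beta * x + rng.gauss(0.0, sigma) for x in xs]
        ybar = sum(ys) / T
        beta_hat = sum((y - ybar) * (x - xbar) for x, y in zip(xs, ys)) / sxx
        draws.append(ybar - beta_hat * xbar)  # alpha_hat from equation (5)
    mean = sum(draws) / reps
    var = sum((a - mean) ** 2 for a in draws) / (reps - 1)
    theory = sigma ** 2 * sum(x * x for x in xs) / (T * sxx)
    two_terms = xbar ** 2 * sigma ** 2 / sxx + sigma ** 2 / T
    return mean, var, theory, two_terms

xs = [1, 2, 4, 7, 11]  # hypothetical regressor values
mean, var, theory, two_terms = simulate_alpha(xs, alpha=1.0, beta=2.0, sigma=2.0, reps=20000)
print(mean, var, theory, two_terms)
```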

2 Review

2.1 Properties of Expectation and Variance

In this section, we review some properties of expectation and variance that are used to solve the homework.

Mutual Independence

Let $X$ and $Y$ be mutually independent random variables. Then, the following properties of expectation and covariance hold:

1. $E(XY) = E(X)E(Y)$;

2. $\mathrm{Cov}(X, Y) = 0$.

These two properties mean that $X$ and $Y$ are uncorrelated.

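Caveat: Independence is a sufficient, but not necessary, condition for uncorrelatedness; uncorrelated random variables need not be independent. An exception is the multivariate normal distribution: if $X$ and $Y$ are jointly normally distributed and uncorrelated, they are independent.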

Proof. Without loss of generality, we assume that $X$ and $Y$ are continuous random variables. Let $f(x, y)$ be the joint density of $X$ and $Y$. By definition, mutual independence means $f(x, y) = f_X(x) f_Y(y)$.

1. By definition,

$$
\begin{aligned}
E(XY) &= \iint xy f(x, y) \, dy \, dx \\
&= \iint xy f_X(x) f_Y(y) \, dy \, dx \\
&= \int x f_X(x) \, dx \int y f_Y(y) \, dy \\
&= E(X) E(Y).
\end{aligned}
$$

2. By definition,

$$
\begin{aligned}
\mathrm{Cov}(X, Y) &= E[(X - E(X))(Y - E(Y))] \\
&= E[XY - X E(Y) - E(X) Y + E(X) E(Y)] \\
&= E(XY) - E(X) E(Y) \\
&= E(X) E(Y) - E(X) E(Y) = 0.
\end{aligned}
$$
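The caveat above says the converse fails: zero covariance does not imply independence. A standard counterexample, used here as a hypothetical illustration not taken from the homework, is $X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$. The sketch computes $\mathrm{Cov}(X, Y)$ exactly and shows that $P(X = 1, Y = 0) \neq P(X = 1) P(Y = 0)$.

```python
from fractions import Fraction

# Hypothetical example: X uniform on {-1, 0, 1} and Y = X**2.
support = [-1, 0, 1]
p = Fraction(1, 3)
joint = {}  # joint pmf: P(X = x, Y = y)
for x in support:
    key = (x, x * x)
    joint[key] = joint.get(key, 0) + p

def E(g):
    """Expectation of g(X, Y) under the joint pmf."""
    return sum(prob * g(x, y) for (x, y), prob in joint.items())

EX = E(lambda x, y: x)
EY = E(lambda x, y: y)
EXY = E(lambda x, y: x * y)
cov = EXY - EX * EY

p_x1 = sum(pr for (x, y), pr in joint.items() if x == 1)   # P(X = 1)
p_y0 = sum(pr for (x, y), pr in joint.items() if y == 0)   # P(Y = 0)
p_joint = joint.get((1, 0), 0)                              # P(X = 1, Y = 0)
print(cov, p_joint, p_x1 * p_y0)  # 0 0 1/9
```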

Additivity of Expectation and Variance

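Let $X_1, \dots, X_n$ be random variables, and let $a_1, \dots, a_n$ and $b$ be constants. Then, the following properties hold:

1. $E\left( \sum_i a_i X_i + b \right) = \sum_i a_i E(X_i) + b$;

2. $V\left( \sum_i a_i X_i + b \right) = \sum_i a_i^2 V(X_i) + \sum_i \sum_{j \neq i} a_i a_j \mathrm{Cov}(X_i, X_j)$.

Equivalently, the last term can be written as $2 \sum_i \sum_{j > i} a_i a_j \mathrm{Cov}(X_i, X_j)$.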

Proof. Without loss of generality, we assume that the $X_i$ are continuous random variables. Let $f(x_1, \dots, x_n)$ be the joint density of $X_1, \dots, X_n$.

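1. By definition,

$$
\begin{aligned}
E\left( \sum_i a_i X_i + b \right) &= \int \cdots \int (a_1 x_1 + \cdots + a_n x_n + b) f(x_1, \dots, x_n) \, dx_1 \cdots dx_n \\
&= a_1 \int \cdots \int x_1 f(x_1, \dots, x_n) \, dx_1 \cdots dx_n + \cdots + b \int \cdots \int f(x_1, \dots, x_n) \, dx_1 \cdots dx_n \\
&= a_1 \int x_1 \left( \int \cdots \int f(x_1, \dots, x_n) \, dx_2 \cdots dx_n \right) dx_1 + \cdots + b \\
&= a_1 \int x_1 f_{X_1}(x_1) \, dx_1 + \cdots + b = \sum_i a_i E(X_i) + b,
\end{aligned}
$$

where $f_{X_1}$ is the marginal density of $X_1$, and we use the fact that the joint density integrates to one.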

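2. By definition,

$$
\begin{aligned}
V\left( \sum_i a_i X_i + b \right) &= E\left[ \left( \sum_i a_i X_i + b - E\left( \sum_i a_i X_i + b \right) \right)^2 \right] \\
&= E\left[ \left( \sum_i a_i X_i + b - \left( \sum_i a_i E(X_i) + b \right) \right)^2 \right] \\
&= E\left[ \left( \sum_i a_i (X_i - E(X_i)) \right)^2 \right] \\
&= E\left[ \sum_i a_i^2 (X_i - E(X_i))^2 + \sum_i \sum_{j \neq i} a_i a_j (X_i - E(X_i))(X_j - E(X_j)) \right] \\
&= \sum_i a_i^2 E(X_i - E(X_i))^2 + \sum_i \sum_{j \neq i} a_i a_j E[(X_i - E(X_i))(X_j - E(X_j))] \\
&= \sum_i a_i^2 V(X_i) + \sum_i \sum_{j \neq i} a_i a_j \mathrm{Cov}(X_i, X_j).
\end{aligned}
$$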
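The variance formula can be verified exactly on a small discrete example. The sketch below assumes a hypothetical joint pmf for two correlated binary variables $(X_1, X_2)$ and hypothetical constants $a = (3, -2)$, $b = 5$; it computes $V(a_1 X_1 + a_2 X_2 + b)$ directly from the pmf and compares it with $\sum_i a_i^2 V(X_i) + \sum_i \sum_{j \neq i} a_i a_j \mathrm{Cov}(X_i, X_j)$.

```python
from fractions import Fraction

# Hypothetical joint pmf of (X1, X2) on {0, 1}^2; X1 and X2 are positively correlated.
pmf = {(0, 0): Fraction(4, 10), (0, 1): Fraction(1, 10),
       (1, 0): Fraction(1, 10), (1, 1): Fraction(4, 10)}

def E(g):
    """Expectation of g(X1, X2) under the joint pmf."""
    return sum(p * g(x) for x, p in pmf.items())

a = [3, -2]  # hypothetical constants a_1, a_2
b = 5        # hypothetical constant b
n = len(a)
mu = [E(lambda x, i=i: x[i]) for i in range(n)]
cov = [[E(lambda x, i=i, j=j: (x[i] - mu[i]) * (x[j] - mu[j])) for j in range(n)]
       for i in range(n)]

# Left-hand side: V(sum_i a_i X_i + b) computed directly from the pmf
m = E(lambda x: sum(a[i] * x[i] for i in range(n)) + b)
lhs = E(lambda x: (sum(a[i] * x[i] for i in range(n)) + b - m) ** 2)

# Right-hand side: sum_i a_i^2 V(X_i) + sum_i sum_{j != i} a_i a_j Cov(X_i, X_j)
rhs = (sum(a[i] ** 2 * cov[i][i] for i in range(n))
       + sum(a[i] * a[j] * cov[i][j] for i in range(n) for j in range(n) if j != i))
print(lhs, rhs)  # 29/20 29/20
```

Note that $b$ drops out, and the cross term here is counted once for each ordered pair $(i, j)$ with $j \neq i$.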

2.2 Optimization

Consider the problem of finding a point $x \in \mathbb{R}$ that maximizes or minimizes a function $y = f(x)$. If $x^0 \in \mathbb{R}$ attains the maximum or minimum and $f$ is differentiable, the following first-order condition must hold:

$$\left. \frac{df(x)}{dx} \right|_{x = x^0} = 0. \qquad (11)$$

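In addition, to determine whether such a stationary point is a maximum or a minimum, we use the following sufficient (second-order) conditions:

$$\left. \frac{d^2 f(x)}{dx^2} \right|_{x = x^0} < 0 \quad \text{for a (local) maximum;} \qquad (12)$$

$$\left. \frac{d^2 f(x)}{dx^2} \right|_{x = x^0} > 0 \quad \text{for a (local) minimum.} \qquad (13)$$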
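As a one-variable illustration of (11) to (13), the sketch below classifies the stationary points of the hypothetical function $f(x) = x^3 - 3x$ (not from the original), whose derivatives $f'(x) = 3x^2 - 3$ and $f''(x) = 6x$ are written out by hand. Since this $f$ is unbounded, the points found are only local extrema, and the helper `classify_1d` returns "inconclusive" when $f''(x^0) = 0$, where (12) and (13) give no answer.

```python
def classify_1d(d2f, x0):
    """Apply conditions (12)-(13) at a stationary point x0 of f."""
    s = d2f(x0)
    if s < 0:
        return "maximum"
    if s > 0:
        return "minimum"
    return "inconclusive"  # f''(x0) = 0: (12)-(13) say nothing

# Hypothetical example: f(x) = x**3 - 3*x
def df(x):
    return 3 * x ** 2 - 3   # f'(x)

def d2f(x):
    return 6 * x            # f''(x)

stationary = [-1.0, 1.0]    # solutions of the first-order condition f'(x) = 0
assert all(df(x) == 0 for x in stationary)
print({x: classify_1d(d2f, x) for x in stationary})
# {-1.0: 'maximum', 1.0: 'minimum'}
```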

Now consider a function $y = g(x) \in \mathbb{R}$ where $x = (x_1, \dots, x_n)' \in \mathbb{R}^n$, denoted as $g : \mathbb{R}^n \to \mathbb{R}$. If $x^0 = (x_1^0, \dots, x_n^0)' \in \mathbb{R}^n$ maximizes or minimizes $g(x)$, we apply the following theorem.


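If a differentiable function $g : \mathbb{R}^n \to \mathbb{R}$ is maximized (minimized) at the point $x^0 = (x_1^0, \dots, x_n^0)'$, then the following equation holds:

$$\left. \frac{\partial g(x)}{\partial x} \right|_{x = x^0} = \begin{pmatrix} \frac{\partial g(x^0)}{\partial x_1} \\ \vdots \\ \frac{\partial g(x^0)}{\partial x_n} \end{pmatrix} = 0. \qquad (14)$$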

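Moreover, we use the following Hessian matrix to distinguish a maximum from a minimum.

The Hessian matrix of a function $g : \mathbb{R}^n \to \mathbb{R}$ is defined as follows:

$$H = \frac{\partial^2 g(x)}{\partial x \partial x'} = \begin{pmatrix} \frac{\partial^2 g(x)}{\partial x_1 \partial x_1} & \cdots & \frac{\partial^2 g(x)}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 g(x)}{\partial x_n \partial x_1} & \cdots & \frac{\partial^2 g(x)}{\partial x_n \partial x_n} \end{pmatrix}$$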

Assume that $g_{x_1}(x^0) = g_{x_2}(x^0) = \cdots = g_{x_n}(x^0) = 0$ holds, where $g_{x_i}(x)$ for $i \in \{1, \dots, n\}$ denotes the partial derivative of $g(x)$ with respect to $x_i$. The following theorem gives a way to determine whether $x^0$ is a maximum or a minimum.

Suppose that a smooth function $g : \mathbb{R}^n \to \mathbb{R}$ satisfies $g_{x_1}(x^0) = \cdots = g_{x_n}(x^0) = 0$, and let $H$ be its Hessian matrix evaluated at $x^0$. Then:

1. If $H$ is a negative definite matrix, $x^0$ is a (local) maximum point.

2. If $H$ is a positive definite matrix, $x^0$ is a (local) minimum point.
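When second derivatives are tedious to obtain by hand, the Hessian in this theorem can be approximated numerically. The sketch below uses central finite differences (a standard numerical technique, not something the original notes use); the step size `h` and the function name `numerical_hessian` are assumptions. It is applied to the quadratic $f(x, y) = x^2 + 4xy + 5y^2 - 2x - 8y + 5$, the same function used in the worked example of this section, at the point $(-3, 2)$.

```python
def numerical_hessian(g, x, h=1e-3):
    """Central finite-difference approximation of the Hessian of g: R^n -> R at x."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(di, dj):
                # Evaluate g at x + di * e_i + dj * e_j
                z = list(x)
                z[i] += di
                z[j] += dj
                return g(z)
            H[i][j] = (shifted(h, h) - shifted(h, -h)
                       - shifted(-h, h) + shifted(-h, -h)) / (4 * h * h)
    return H

def f(z):
    x, y = z
    return x ** 2 + 4 * x * y + 5 * y ** 2 - 2 * x - 8 * y + 5

H_num = numerical_hessian(f, [-3.0, 2.0])
print(H_num)  # approximately [[2, 4], [4, 10]]
```

For a quadratic function the finite-difference formula is exact up to floating-point rounding; for general smooth functions it carries an $O(h^2)$ truncation error.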

As for the positive or negative definiteness of a matrix, we have the following theorem.


A necessary and sufficient condition for a symmetric matrix $A \in \mathbb{R}^{n \times n}$ to be positive (negative) definite is that all of its eigenvalues $\lambda$, i.e., the solutions of $\det(A - \lambda I) = 0$, are positive (negative), where $I$ is the identity matrix:

$$I = \begin{pmatrix} 1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 \end{pmatrix}$$

For example, in the case of $f(x, y) = x^2 + 4xy + 5y^2 - 2x - 8y + 5$, we have $f_x = 2x + 4y - 2$ and $f_y = 4x + 10y - 8$. By solving $f_x = f_y = 0$, we obtain a stationary point $(x, y) = (-3, 2)$.

Also, the Hessian matrix is given as follows:

$$H_f = \begin{pmatrix} 2 & 4 \\ 4 & 10 \end{pmatrix}.$$

To obtain the eigenvalues, we subtract $\lambda I$ from the Hessian matrix:

$$H_f - \lambda I = \begin{pmatrix} 2 - \lambda & 4 \\ 4 & 10 - \lambda \end{pmatrix}$$

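Then, the determinant of this matrix is

$$p(\lambda) = (2 - \lambda)(10 - \lambda) - 16 = \lambda^2 - 12 \lambda + 4.$$

Since $H_f$ is symmetric, both roots of $p(\lambda)$ are real, and since $p(\lambda)$ is a convex quadratic, $p(0) > 0$ implies that the two roots have the same sign (sketch a rough graph yourself). Their sum equals the trace $2 + 10 = 12 > 0$, so both eigenvalues are positive. Here, $p(0) = 2 \cdot 10 - 16 = 4 > 0$. Note that $p(0)$ equals the determinant of the Hessian matrix. Indeed, the eigenvalues are $\lambda = 6 \pm 4\sqrt{2} > 0$.

Thus, $(x, y) = (-3, 2)$ is a minimum point. The optimum of a function of more variables can be analyzed in the same manner.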
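The whole example can be reproduced in a few lines. This sketch assumes the helper names `solve_2x2` (Cramer's rule) and `eig_sym_2x2` (roots of $\lambda^2 - \mathrm{tr}(H)\lambda + \det(H) = 0$), which are illustrative, not from the original. Because $f$ is quadratic, the first-order conditions form the linear system $H_f (x, y)' = (2, 8)'$, so the Hessian also serves as the coefficient matrix.

```python
import math

def solve_2x2(A, c):
    """Solve A z = c for a 2x2 system by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    z1 = (c[0] * A[1][1] - A[0][1] * c[1]) / det
    z2 = (A[0][0] * c[1] - c[0] * A[1][0]) / det
    return z1, z2

def eig_sym_2x2(H):
    """Eigenvalues of a symmetric 2x2 matrix: roots of lambda^2 - tr*lambda + det = 0."""
    tr = H[0][0] + H[1][1]
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    disc = math.sqrt(tr * tr - 4 * det)  # nonnegative for symmetric matrices
    return (tr - disc) / 2, (tr + disc) / 2

H = [[2, 4], [4, 10]]
# First-order conditions: 2x + 4y = 2 and 4x + 10y = 8 (i.e., f_x = f_y = 0)
x_star = solve_2x2(H, [2, 8])
lams = eig_sym_2x2(H)
print(x_star, lams)  # x_star is (-3.0, 2.0); both eigenvalues are positive
```

Both eigenvalues, approximately $0.343$ and $11.657$, are positive, confirming that $(-3, 2)$ is a minimum point.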
