When these conditions hold, we can rewrite $\tilde\beta_2$ as:
$$\tilde\beta_2 = \beta_2 + \sum_{i=1}^n (\omega_i + d_i) u_i.$$
The variance of $\tilde\beta_2$ is derived as:
$$V(\tilde\beta_2) = V\Big(\beta_2 + \sum_{i=1}^n (\omega_i + d_i) u_i\Big) = V\Big(\sum_{i=1}^n (\omega_i + d_i) u_i\Big) = \sum_{i=1}^n V\big((\omega_i + d_i) u_i\big)$$
$$= \sum_{i=1}^n (\omega_i + d_i)^2 V(u_i) = \sigma^2 \Big(\sum_{i=1}^n \omega_i^2 + 2 \sum_{i=1}^n \omega_i d_i + \sum_{i=1}^n d_i^2\Big) = \sigma^2 \Big(\sum_{i=1}^n \omega_i^2 + \sum_{i=1}^n d_i^2\Big).$$
From unbiasedness of $\tilde\beta_2$, using $\sum_{i=1}^n d_i = 0$ and $\sum_{i=1}^n d_i x_i = 0$, we obtain:
$$\sum_{i=1}^n \omega_i d_i = \frac{\sum_{i=1}^n (x_i - \bar{x}) d_i}{\sum_{i=1}^n (x_i - \bar{x})^2} = \frac{\sum_{i=1}^n x_i d_i - \bar{x} \sum_{i=1}^n d_i}{\sum_{i=1}^n (x_i - \bar{x})^2} = 0,$$
which is utilized to obtain the variance of $\tilde\beta_2$ in the last equality of the above derivation.
From (15), the variance of $\hat\beta_2$ is given by: $V(\hat\beta_2) = \sigma^2 \sum_{i=1}^n \omega_i^2$. Therefore, we have:
$$V(\tilde\beta_2) \ge V(\hat\beta_2),$$
because of $\sum_{i=1}^n d_i^2 \ge 0$.
When $\sum_{i=1}^n d_i^2 = 0$, i.e., when $d_1 = d_2 = \cdots = d_n = 0$, we have the equality: $V(\tilde\beta_2) = V(\hat\beta_2)$.
Thus, in the case of $d_1 = d_2 = \cdots = d_n = 0$, $\hat\beta_2$ is equivalent to $\tilde\beta_2$.
As shown above, the least squares estimator $\hat\beta_2$ gives us the minimum variance linear unbiased estimator ( 最小分散線形不偏推定量 ), or equivalently the best linear unbiased estimator ( 最良線形不偏推定量、BLUE ), which is called the Gauss-Markov theorem ( ガウス・マルコフ定理 ).
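As a numerical illustration of the theorem, the following Python sketch (not part of the original text; the sample size, parameter values and the competing estimator are our own choices) compares the Monte Carlo variance of the OLS slope with that of another linear unbiased estimator, the slope through the two endpoint observations:

```python
import random

random.seed(0)

# Fixed regressors and true parameters (hypothetical values for illustration).
n = 20
x = [float(i) for i in range(1, n + 1)]
beta1, beta2, sigma = 2.0, 0.5, 1.0
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

ols, endpoint = [], []
for _ in range(20000):
    y = [beta1 + beta2 * xi + random.gauss(0.0, sigma) for xi in x]
    # OLS slope: sum of omega_i * y_i with omega_i = (x_i - xbar) / sxx
    ols.append(sum((xi - xbar) * yi for xi, yi in zip(x, y)) / sxx)
    # Another linear unbiased estimator: slope through the two endpoints.
    endpoint.append((y[-1] - y[0]) / (x[-1] - x[0]))

def var(a):
    m = sum(a) / len(a)
    return sum((v - m) ** 2 for v in a) / len(a)

print(var(ols) < var(endpoint))                  # OLS has the smaller variance
print(abs(var(ols) - sigma ** 2 / sxx) < 0.01)   # matches sigma^2 / sum (x_i - xbar)^2
```

Because the endpoint estimator uses only two observations, its $d_i$'s are far from zero, and its variance exceeds $\sigma^2 \sum_{i=1}^n \omega_i^2$, exactly as the inequality $V(\tilde\beta_2) \ge V(\hat\beta_2)$ predicts.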
Asymptotic Properties of $\hat\beta_2$: We assume that as $n$ goes to infinity we have the following:
$$\frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2 \longrightarrow m < \infty,$$
where $m$ is a constant value. From (12), we obtain:
$$n \sum_{i=1}^n \omega_i^2 = \frac{1}{(1/n) \sum_{i=1}^n (x_i - \bar{x})^2} \longrightarrow \frac{1}{m}.$$
Note that $f(x_n) \longrightarrow f(m)$ when $x_n \longrightarrow m$, called Slutsky's theorem ( スルツキー定理 ), where $m$ is a constant value and $f(\cdot)$ is a function.
We show both consistency of $\hat\beta_2$ and asymptotic normality of $\sqrt{n}(\hat\beta_2 - \beta_2)$.
● First, we prove that $\hat\beta_2$ is a consistent estimator of $\beta_2$. Chebyshev's inequality is given by:
$$P(|X - \mu| > \epsilon) \le \frac{\sigma^2}{\epsilon^2},$$
where $\mu = E(X)$ and $\sigma^2 = V(X)$.
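Chebyshev's inequality holds for any distribution with finite variance. The sketch below (the choice of distribution and the values of $\epsilon$ are illustrative, not from the original text) checks the bound empirically for a sum of uniform random variables:

```python
import random

random.seed(1)

# Empirical check of P(|X - mu| > eps) <= sigma^2 / eps^2 for a non-normal X.
draws = [sum(random.uniform(-1.0, 1.0) for _ in range(3)) for _ in range(100000)]
mu = 0.0
var_x = 3 * (1.0 / 3.0)   # each Uniform(-1,1) term has variance 1/3

for eps in (0.5, 1.0, 2.0):
    freq = sum(1 for d in draws if abs(d - mu) > eps) / len(draws)
    bound = var_x / eps ** 2
    print(eps, freq <= bound)
```

The bound is loose for small $\epsilon$ (it can exceed one) but it is all the consistency proof below needs, since it only requires the right-hand side to vanish as $n \longrightarrow \infty$.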
Replace $X$, $E(X)$ and $V(X)$ by:
$$\hat\beta_2, \qquad E(\hat\beta_2) = \beta_2, \qquad V(\hat\beta_2) = \sigma^2 \sum_{i=1}^n \omega_i^2 = \frac{\sigma^2}{\sum_{i=1}^n (x_i - \bar{x})^2},$$
respectively.
Then, when $n \longrightarrow \infty$, we obtain the following result:
$$P(|\hat\beta_2 - \beta_2| > \epsilon) \le \frac{\sigma^2 \sum_{i=1}^n \omega_i^2}{\epsilon^2} = \frac{\sigma^2 \, n \sum_{i=1}^n \omega_i^2}{n \epsilon^2} \longrightarrow 0,$$
where $\sum_{i=1}^n \omega_i^2 \longrightarrow 0$ because $n \sum_{i=1}^n \omega_i^2 \longrightarrow \frac{1}{m}$ from the assumption.
Thus, we obtain the result that $\hat\beta_2 \longrightarrow \beta_2$ as $n \longrightarrow \infty$.
Therefore, we can conclude that $\hat\beta_2$ is a consistent estimator of $\beta_2$.
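Consistency can also be seen numerically. In the sketch below (all parameter values and the Uniform$(0,10)$ design for $x_i$ are our own assumptions, chosen so that $(1/n)\sum_{i=1}^n (x_i - \bar{x})^2 \longrightarrow m = 100/12$), the estimation error of the OLS slope shrinks as $n$ grows:

```python
import random

random.seed(2)

# Illustrative values; the Uniform(0,10) design satisfies the assumption
# (1/n) * sum (x_i - xbar)^2 -> m = 100/12 < infinity.
beta1, beta2, sigma = 2.0, 0.5, 1.0

def ols_slope(n):
    x = [random.uniform(0.0, 10.0) for _ in range(n)]
    y = [beta1 + beta2 * xi + random.gauss(0.0, sigma) for xi in x]
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sum((xi - xbar) * yi for xi, yi in zip(x, y)) / sxx

errs = {n: abs(ols_slope(n) - beta2) for n in (10, 100, 10000)}
print(errs)   # the error tends to 0 as n grows
```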
● Next, we want to show that $\sqrt{n}(\hat\beta_2 - \beta_2)$ is asymptotically normal.
Note that $\hat\beta_2 = \beta_2 + \sum_{i=1}^n \omega_i u_i$ as in (13).
From the central limit theorem, asymptotic normality is shown as follows:
$$\frac{\sum_{i=1}^n \omega_i u_i - E(\sum_{i=1}^n \omega_i u_i)}{\sqrt{V(\sum_{i=1}^n \omega_i u_i)}} = \frac{\sum_{i=1}^n \omega_i u_i}{\sigma \sqrt{\sum_{i=1}^n \omega_i^2}} = \frac{\hat\beta_2 - \beta_2}{\sigma / \sqrt{\sum_{i=1}^n (x_i - \bar{x})^2}} \longrightarrow N(0, 1),$$
where $E(\sum_{i=1}^n \omega_i u_i) = 0$, $V(\sum_{i=1}^n \omega_i u_i) = \sigma^2 \sum_{i=1}^n \omega_i^2$, and $\sum_{i=1}^n \omega_i u_i = \hat\beta_2 - \beta_2$ are substituted in the first and second equalities.
Moreover, we can rewrite as follows:
$$\frac{\hat\beta_2 - \beta_2}{\sigma / \sqrt{\sum_{i=1}^n (x_i - \bar{x})^2}} = \frac{\sqrt{n}(\hat\beta_2 - \beta_2)}{\sigma / \sqrt{(1/n) \sum_{i=1}^n (x_i - \bar{x})^2}} \longrightarrow \frac{\sqrt{n}(\hat\beta_2 - \beta_2)}{\sigma / \sqrt{m}} \longrightarrow N(0, 1),$$
or equivalently,
$$\sqrt{n}(\hat\beta_2 - \beta_2) \longrightarrow N\Big(0, \frac{\sigma^2}{m}\Big).$$
Thus, the asymptotic normality of $\sqrt{n}(\hat\beta_2 - \beta_2)$ is shown.
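The limiting variance $\sigma^2/m$ can be checked by simulation. The sketch below (same illustrative parameter values and Uniform$(0,10)$ design as before, so $m = 100/12$) computes $\sqrt{n}(\hat\beta_2 - \beta_2)$ over many replications and compares its sample mean and variance with $0$ and $\sigma^2/m$:

```python
import random

random.seed(3)

beta1, beta2, sigma = 2.0, 0.5, 1.0
n, reps = 200, 2000
m = 100.0 / 12.0   # Var of Uniform(0,10): (1/n) * sum (x_i - xbar)^2 -> m

stats = []
for _ in range(reps):
    x = [random.uniform(0.0, 10.0) for _ in range(n)]
    y = [beta1 + beta2 * xi + random.gauss(0.0, sigma) for xi in x]
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b2 = sum((xi - xbar) * yi for xi, yi in zip(x, y)) / sxx
    stats.append(n ** 0.5 * (b2 - beta2))

mean = sum(stats) / reps
var = sum((s - mean) ** 2 for s in stats) / reps
print(abs(mean) < 0.05)                    # close to 0
print(abs(var - sigma ** 2 / m) < 0.05)    # close to sigma^2 / m = 0.12
```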
Finally, replacing $\sigma^2$ by its consistent estimator $s^2$, it is known as follows:
$$\frac{\hat\beta_2 - \beta_2}{s / \sqrt{\sum_{i=1}^n (x_i - \bar{x})^2}} \longrightarrow N(0, 1), \qquad (16)$$
where $s^2$ is defined as:
$$s^2 = \frac{1}{n - 2} \sum_{i=1}^n e_i^2 = \frac{1}{n - 2} \sum_{i=1}^n (y_i - \hat\beta_1 - \hat\beta_2 x_i)^2, \qquad (17)$$
which is a consistent and unbiased estimator of $\sigma^2$. $\longrightarrow$ Proved later.
Thus, using (16), in large samples we can construct the confidence interval and test the hypothesis.
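As a sketch of how (16) and (17) are used in practice (all data-generating values here are our own assumptions), the code below estimates both coefficients, computes $s^2$ from the residuals, and forms a large-sample 95% interval for $\beta_2$ using the $N(0,1)$ quantile $1.96$:

```python
import random

random.seed(4)

beta1, beta2, sigma = 2.0, 0.5, 1.0   # illustrative true values
n = 2000
x = [random.uniform(0.0, 10.0) for _ in range(n)]
y = [beta1 + beta2 * xi + random.gauss(0.0, sigma) for xi in x]

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b1 = ybar - b2 * xbar

# s^2 = (1/(n-2)) * sum of squared residuals, as in (17)
s2 = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
se = (s2 / sxx) ** 0.5

# Large-sample 95% interval from (16); with probability about 0.95
# it contains the true slope.
lo, hi = b2 - 1.96 * se, b2 + 1.96 * se
print(s2, (lo, hi))
```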
Exact Distribution of $\hat\beta_2$: We have shown asymptotic normality of $\sqrt{n}(\hat\beta_2 - \beta_2)$, which is one of the large sample properties.
Now, we discuss the small sample properties of $\hat\beta_2$.
In order to obtain the distribution of $\hat\beta_2$ in small samples, the distribution of the error term has to be assumed.
Therefore, the extra assumption is that $u_i \sim N(0, \sigma^2)$.
Writing (13) again, $\hat\beta_2$ is represented as:
$$\hat\beta_2 = \beta_2 + \sum_{i=1}^n \omega_i u_i.$$
First, we obtain the distribution of the second term in the above equation.
Using the moment-generating function, $\sum_{i=1}^n \omega_i u_i$ is distributed as:
$$\sum_{i=1}^n \omega_i u_i \sim N\Big(0, \ \sigma^2 \sum_{i=1}^n \omega_i^2\Big).$$
Therefore, $\hat\beta_2$ is distributed as:
$$\hat\beta_2 = \beta_2 + \sum_{i=1}^n \omega_i u_i \sim N\Big(\beta_2, \ \sigma^2 \sum_{i=1}^n \omega_i^2\Big),$$
or equivalently,
$$\frac{\hat\beta_2 - \beta_2}{\sigma \sqrt{\sum_{i=1}^n \omega_i^2}} = \frac{\hat\beta_2 - \beta_2}{\sigma / \sqrt{\sum_{i=1}^n (x_i - \bar{x})^2}} \sim N(0, 1),$$
for any $n$.
Moreover, replacing $\sigma^2$ by its estimator $s^2$ defined in (17), it is known that we have:
$$\frac{\hat\beta_2 - \beta_2}{s / \sqrt{\sum_{i=1}^n (x_i - \bar{x})^2}} \sim t(n - 2),$$
where $t(n - 2)$ denotes the $t$ distribution with $n - 2$ degrees of freedom.
Thus, under the normality assumption on the error term $u_i$, the $t(n - 2)$ distribution is used for the confidence interval and hypothesis testing in small samples.
Or, taking the square on both sides,
$$\left( \frac{\hat\beta_2 - \beta_2}{s / \sqrt{\sum_{i=1}^n (x_i - \bar{x})^2}} \right)^2 \sim F(1, n - 2),$$
which will be proved later.
Before going to the multiple regression model ( 重回帰モデル ),

2 Some Formulas of Matrix Algebra

1. Let
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1k} \\ a_{21} & a_{22} & \cdots & a_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ a_{l1} & a_{l2} & \cdots & a_{lk} \end{pmatrix} = [a_{ij}],$$
which is an $l \times k$ matrix, where $a_{ij}$ denotes the element in the $i$th row and $j$th column of $A$.
The transposed matrix ( 転置行列 ) of $A$, denoted by $A'$, is defined as:
$$A' = \begin{pmatrix} a_{11} & a_{21} & \cdots & a_{l1} \\ a_{12} & a_{22} & \cdots & a_{l2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1k} & a_{2k} & \cdots & a_{lk} \end{pmatrix} = [a_{ji}],$$
where the $i$th row of $A'$ is the $i$th column of $A$.
2. $(Ax)' = x' A'$,
where $A$ and $x$ are an $l \times k$ matrix and a $k \times 1$ vector, respectively.
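Rule 2 can be verified on a small example (the matrix and vector entries below are arbitrary illustrative values):

```python
# Numerical check of (Ax)' = x'A' for a 2x3 matrix A and a 3x1 vector x.
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]          # l x k with l = 2, k = 3
x = [1.0, -1.0, 2.0]           # k x 1 vector

Ax = [sum(A[i][j] * x[j] for j in range(3)) for i in range(2)]     # l x 1 column
At = [[A[i][j] for i in range(2)] for j in range(3)]               # k x l transpose
xAt = [sum(x[j] * At[j][i] for j in range(3)) for i in range(2)]   # 1 x l row

print(Ax == xAt)   # → True: the same numbers, read as a column vs a row
```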
3. $a' = a$,
where $a$ denotes a scalar.

4. $$\frac{\partial a'x}{\partial x} = a,$$
where $a$ and $x$ are $k \times 1$ vectors.

5. $$\frac{\partial x'Ax}{\partial x} = (A + A')x,$$
where $A$ and $x$ are a $k \times k$ matrix and a $k \times 1$ vector, respectively.
Especially, when $A$ is symmetric,
$$\frac{\partial x'Ax}{\partial x} = 2Ax.$$
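Rule 5 can be checked against a finite-difference gradient; the example values below are our own, and $A$ is deliberately non-symmetric so that $(A + A')x$ differs from $2Ax$:

```python
# Finite-difference check of d(x'Ax)/dx = (A + A')x.
A = [[2.0, 1.0],
     [0.0, 3.0]]               # not symmetric, so (A + A')x != 2Ax here
x = [1.0, -2.0]

def quad(v):
    # v'Av as a double sum
    return sum(v[i] * A[i][j] * v[j] for i in range(2) for j in range(2))

h = 1e-6
numeric = []
for i in range(2):
    xp = list(x); xp[i] += h
    xm = list(x); xm[i] -= h
    numeric.append((quad(xp) - quad(xm)) / (2 * h))   # central difference

analytic = [sum((A[i][j] + A[j][i]) * x[j] for j in range(2)) for i in range(2)]
print(all(abs(a - b) < 1e-4 for a, b in zip(numeric, analytic)))   # → True
```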
6. Let $A$ and $B$ be $k \times k$ matrices, and $I_k$ be a $k \times k$ identity matrix ( 単位行列 ) (one in the diagonal elements and zero in the other elements).
When $AB = I_k$, $B$ is called the inverse matrix ( 逆行列 ) of $A$, denoted by $B = A^{-1}$.
That is, $A A^{-1} = A^{-1} A = I_k$.
7. Let $A$ be a $k \times k$ matrix and $x$ be a $k \times 1$ vector.
If $A$ is a positive definite matrix ( 正定符号行列 ), for any $x$ except for $x = 0$ we have:
$$x'Ax > 0.$$
If $A$ is a positive semidefinite matrix ( 非負定符号行列 ), for any $x$ we have:
$$x'Ax \ge 0.$$
If $A$ is a negative definite matrix ( 負定符号行列 ), for any $x$ except for $x = 0$ we have:
$$x'Ax < 0.$$
If $A$ is a negative semidefinite matrix ( 非正定符号行列 ), for any $x$ we have:
$$x'Ax \le 0.$$
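Positive definiteness can be sanity-checked (though of course not proved) by sampling many nonzero vectors; the matrix below is our own example, with $x'Ax = (x_1 + x_2)^2 + x_1^2 + x_2^2 > 0$:

```python
import random

random.seed(5)

# Sampling check that x'Ax > 0 for a positive definite example matrix.
A = [[2.0, 1.0],
     [1.0, 2.0]]               # eigenvalues 1 and 3, hence positive definite

def quad_form(v):
    return sum(v[i] * A[i][j] * v[j] for i in range(2) for j in range(2))

ok = True
for _ in range(10000):
    x = [random.uniform(-1.0, 1.0), random.uniform(-1.0, 1.0)]
    if x != [0.0, 0.0] and quad_form(x) <= 0.0:
        ok = False
print(ok)   # → True
```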
Trace, Rank, etc.: $A: k \times k$, $B: n \times k$, $C: k \times n$.
1. The trace ( トレース ) of $A$ is: $\mathrm{tr}(A) = \sum_{i=1}^k a_{ii}$.