When these conditions hold, we can rewrite $\tilde\beta_2$ as:
$$\tilde\beta_2 = \beta_2 + \sum_{i=1}^n (\omega_i + d_i) u_i.$$
The variance of $\tilde\beta_2$ is derived as:
$$V(\tilde\beta_2) = V\Big(\beta_2 + \sum_{i=1}^n (\omega_i + d_i) u_i\Big) = V\Big(\sum_{i=1}^n (\omega_i + d_i) u_i\Big) = \sum_{i=1}^n V\big((\omega_i + d_i) u_i\big)$$
$$= \sum_{i=1}^n (\omega_i + d_i)^2 V(u_i) = \sigma^2 \Big(\sum_{i=1}^n \omega_i^2 + 2 \sum_{i=1}^n \omega_i d_i + \sum_{i=1}^n d_i^2\Big) = \sigma^2 \Big(\sum_{i=1}^n \omega_i^2 + \sum_{i=1}^n d_i^2\Big).$$
From unbiasedness of $\tilde\beta_2$, using $\sum_{i=1}^n d_i = 0$ and $\sum_{i=1}^n d_i x_i = 0$, we obtain:
$$\sum_{i=1}^n \omega_i d_i = \frac{\sum_{i=1}^n (x_i - \bar{x}) d_i}{\sum_{i=1}^n (x_i - \bar{x})^2} = \frac{\sum_{i=1}^n x_i d_i - \bar{x} \sum_{i=1}^n d_i}{\sum_{i=1}^n (x_i - \bar{x})^2} = 0,$$
which is utilized to obtain the variance of $\tilde\beta_2$ in the last equality of the above derivation.
From (15), the variance of $\hat\beta_2$ is given by: $V(\hat\beta_2) = \sigma^2 \sum_{i=1}^n \omega_i^2$. Therefore, we have:
$$V(\tilde\beta_2) \ge V(\hat\beta_2),$$
because of $\sum_{i=1}^n d_i^2 \ge 0$.
When $\sum_{i=1}^n d_i^2 = 0$, i.e., when $d_1 = d_2 = \cdots = d_n = 0$, we have the equality: $V(\tilde\beta_2) = V(\hat\beta_2)$.
Thus, in the case of $d_1 = d_2 = \cdots = d_n = 0$, $\hat\beta_2$ is equivalent to $\tilde\beta_2$.
As shown above, the least squares estimator $\hat\beta_2$ gives us the minimum variance linear unbiased estimator ( 最小分散線形不偏推定量 ), or equivalently the best linear unbiased estimator ( 最良線形不偏推定量、BLUE ), which is called the Gauss-Markov theorem ( ガウス・マルコフ定理 ).
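As a numerical illustration of the theorem, the following Python sketch (not part of the original text; the sample size, parameter values and the competing estimator are our own choices) compares the Monte Carlo variance of the OLS slope with that of another linear unbiased estimator, the slope through the two endpoint observations:

```python
import random

random.seed(0)

# Fixed regressors and true parameters (hypothetical values for illustration).
n = 20
x = [float(i) for i in range(1, n + 1)]
beta1, beta2, sigma = 2.0, 0.5, 1.0
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

ols, endpoint = [], []
for _ in range(20000):
    y = [beta1 + beta2 * xi + random.gauss(0.0, sigma) for xi in x]
    # OLS slope: sum of omega_i * y_i with omega_i = (x_i - xbar) / sxx
    ols.append(sum((xi - xbar) * yi for xi, yi in zip(x, y)) / sxx)
    # Another linear unbiased estimator: slope through the two endpoints.
    endpoint.append((y[-1] - y[0]) / (x[-1] - x[0]))

def var(a):
    m = sum(a) / len(a)
    return sum((v - m) ** 2 for v in a) / len(a)

print(var(ols) < var(endpoint))                  # OLS has the smaller variance
print(abs(var(ols) - sigma ** 2 / sxx) < 0.01)   # matches sigma^2 / sum (x_i - xbar)^2
```

Because the endpoint estimator uses only two observations, its $d_i$'s are far from zero, and its variance exceeds $\sigma^2 \sum_{i=1}^n \omega_i^2$, exactly as the inequality $V(\tilde\beta_2) \ge V(\hat\beta_2)$ predicts.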
Asymptotic Properties of $\hat\beta_2$: We assume that as $n$ goes to infinity we have the following:
$$\frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2 \longrightarrow m < \infty,$$
where $m$ is a constant value. From (12), we obtain:
$$n \sum_{i=1}^n \omega_i^2 = \frac{1}{(1/n) \sum_{i=1}^n (x_i - \bar{x})^2} \longrightarrow \frac{1}{m}.$$
Note that $f(x_n) \longrightarrow f(m)$ when $x_n \longrightarrow m$, called Slutsky's theorem ( スルツキー定理 ), where $m$ is a constant value and $f(\cdot)$ is a function.
We show both consistency of $\hat\beta_2$ and asymptotic normality of $\sqrt{n}(\hat\beta_2 - \beta_2)$.
● First, we prove that $\hat\beta_2$ is a consistent estimator of $\beta_2$. Chebyshev's inequality is given by:
$$P(|X - \mu| > \epsilon) \le \frac{\sigma^2}{\epsilon^2},$$
where $\mu = E(X)$ and $\sigma^2 = V(X)$.
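Chebyshev's inequality holds for any distribution with finite variance. The sketch below (the choice of distribution and the values of $\epsilon$ are illustrative, not from the original text) checks the bound empirically for a sum of uniform random variables:

```python
import random

random.seed(1)

# Empirical check of P(|X - mu| > eps) <= sigma^2 / eps^2 for a non-normal X.
draws = [sum(random.uniform(-1.0, 1.0) for _ in range(3)) for _ in range(100000)]
mu = 0.0
var_x = 3 * (1.0 / 3.0)   # each Uniform(-1,1) term has variance 1/3

for eps in (0.5, 1.0, 2.0):
    freq = sum(1 for d in draws if abs(d - mu) > eps) / len(draws)
    bound = var_x / eps ** 2
    print(eps, freq <= bound)
```

The bound is loose for small $\epsilon$ (it can exceed one) but it is all the consistency proof below needs, since it only requires the right-hand side to vanish as $n \longrightarrow \infty$.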
Replace $X$, $E(X)$ and $V(X)$ by:
$$\hat\beta_2, \qquad E(\hat\beta_2) = \beta_2, \qquad V(\hat\beta_2) = \sigma^2 \sum_{i=1}^n \omega_i^2 = \frac{\sigma^2}{\sum_{i=1}^n (x_i - \bar{x})^2},$$
respectively.
Then, when $n \longrightarrow \infty$, we obtain the following result:
$$P(|\hat\beta_2 - \beta_2| > \epsilon) \le \frac{\sigma^2 \sum_{i=1}^n \omega_i^2}{\epsilon^2} = \frac{\sigma^2 \, n \sum_{i=1}^n \omega_i^2}{n \epsilon^2} \longrightarrow 0,$$
where $\sum_{i=1}^n \omega_i^2 \longrightarrow 0$ because $n \sum_{i=1}^n \omega_i^2 \longrightarrow \frac{1}{m}$ from the assumption.
Thus, we obtain the result that $\hat\beta_2 \longrightarrow \beta_2$ as $n \longrightarrow \infty$.
Therefore, we can conclude that $\hat\beta_2$ is a consistent estimator of $\beta_2$.
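Consistency can also be seen numerically. In the sketch below (all parameter values and the Uniform$(0,10)$ design for $x_i$ are our own assumptions, chosen so that $(1/n)\sum_{i=1}^n (x_i - \bar{x})^2 \longrightarrow m = 100/12$), the estimation error of the OLS slope shrinks as $n$ grows:

```python
import random

random.seed(2)

# Illustrative values; the Uniform(0,10) design satisfies the assumption
# (1/n) * sum (x_i - xbar)^2 -> m = 100/12 < infinity.
beta1, beta2, sigma = 2.0, 0.5, 1.0

def ols_slope(n):
    x = [random.uniform(0.0, 10.0) for _ in range(n)]
    y = [beta1 + beta2 * xi + random.gauss(0.0, sigma) for xi in x]
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sum((xi - xbar) * yi for xi, yi in zip(x, y)) / sxx

errs = {n: abs(ols_slope(n) - beta2) for n in (10, 100, 10000)}
print(errs)   # the error tends to 0 as n grows
```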
● Next, we want to show that $\sqrt{n}(\hat\beta_2 - \beta_2)$ is asymptotically normal.
Note that $\hat\beta_2 = \beta_2 + \sum_{i=1}^n \omega_i u_i$ as in (13).
From the central limit theorem, asymptotic normality is shown as follows:
$$\frac{\sum_{i=1}^n \omega_i u_i - E(\sum_{i=1}^n \omega_i u_i)}{\sqrt{V(\sum_{i=1}^n \omega_i u_i)}} = \frac{\sum_{i=1}^n \omega_i u_i}{\sigma \sqrt{\sum_{i=1}^n \omega_i^2}} = \frac{\hat\beta_2 - \beta_2}{\sigma / \sqrt{\sum_{i=1}^n (x_i - \bar{x})^2}} \longrightarrow N(0, 1),$$
where $E(\sum_{i=1}^n \omega_i u_i) = 0$, $V(\sum_{i=1}^n \omega_i u_i) = \sigma^2 \sum_{i=1}^n \omega_i^2$, and $\sum_{i=1}^n \omega_i u_i = \hat\beta_2 - \beta_2$ are substituted in the first and second equalities.
Moreover, we can rewrite as follows:
$$\frac{\hat\beta_2 - \beta_2}{\sigma / \sqrt{\sum_{i=1}^n (x_i - \bar{x})^2}} = \frac{\sqrt{n}(\hat\beta_2 - \beta_2)}{\sigma / \sqrt{(1/n) \sum_{i=1}^n (x_i - \bar{x})^2}} \longrightarrow \frac{\sqrt{n}(\hat\beta_2 - \beta_2)}{\sigma / \sqrt{m}} \longrightarrow N(0, 1),$$
or equivalently,
$$\sqrt{n}(\hat\beta_2 - \beta_2) \longrightarrow N\Big(0, \frac{\sigma^2}{m}\Big).$$
Thus, the asymptotic normality of $\sqrt{n}(\hat\beta_2 - \beta_2)$ is shown.
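The limiting variance $\sigma^2/m$ can be checked by simulation. The sketch below (same illustrative parameter values and Uniform$(0,10)$ design as before, so $m = 100/12$) computes $\sqrt{n}(\hat\beta_2 - \beta_2)$ over many replications and compares its sample mean and variance with $0$ and $\sigma^2/m$:

```python
import random

random.seed(3)

beta1, beta2, sigma = 2.0, 0.5, 1.0
n, reps = 200, 2000
m = 100.0 / 12.0   # Var of Uniform(0,10): (1/n) * sum (x_i - xbar)^2 -> m

stats = []
for _ in range(reps):
    x = [random.uniform(0.0, 10.0) for _ in range(n)]
    y = [beta1 + beta2 * xi + random.gauss(0.0, sigma) for xi in x]
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b2 = sum((xi - xbar) * yi for xi, yi in zip(x, y)) / sxx
    stats.append(n ** 0.5 * (b2 - beta2))

mean = sum(stats) / reps
var = sum((s - mean) ** 2 for s in stats) / reps
print(abs(mean) < 0.05)                    # close to 0
print(abs(var - sigma ** 2 / m) < 0.05)    # close to sigma^2 / m = 0.12
```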
Finally, replacing $\sigma^2$ by its consistent estimator $s^2$, it is known as follows:
$$\frac{\hat\beta_2 - \beta_2}{s / \sqrt{\sum_{i=1}^n (x_i - \bar{x})^2}} \longrightarrow N(0, 1), \qquad (16)$$
where $s^2$ is defined as:
$$s^2 = \frac{1}{n - 2} \sum_{i=1}^n e_i^2 = \frac{1}{n - 2} \sum_{i=1}^n (y_i - \hat\beta_1 - \hat\beta_2 x_i)^2, \qquad (17)$$
which is a consistent and unbiased estimator of $\sigma^2$. $\longrightarrow$ Proved later.
Thus, using (16), in large samples we can construct the confidence interval and test the hypothesis.
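As a sketch of how (16) and (17) are used in practice (all data-generating values here are our own assumptions), the code below estimates both coefficients, computes $s^2$ from the residuals, and forms a large-sample 95% interval for $\beta_2$ using the $N(0,1)$ quantile $1.96$:

```python
import random

random.seed(4)

beta1, beta2, sigma = 2.0, 0.5, 1.0   # illustrative true values
n = 2000
x = [random.uniform(0.0, 10.0) for _ in range(n)]
y = [beta1 + beta2 * xi + random.gauss(0.0, sigma) for xi in x]

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b1 = ybar - b2 * xbar

# s^2 = (1/(n-2)) * sum of squared residuals, as in (17)
s2 = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
se = (s2 / sxx) ** 0.5

# Large-sample 95% interval from (16); with probability about 0.95
# it contains the true slope.
lo, hi = b2 - 1.96 * se, b2 + 1.96 * se
print(s2, (lo, hi))
```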
Exact Distribution of $\hat\beta_2$: We have shown asymptotic normality of $\sqrt{n}(\hat\beta_2 - \beta_2)$, which is one of the large sample properties.
Now, we discuss the small sample properties of $\hat\beta_2$.
In order to obtain the distribution of $\hat\beta_2$ in small samples, the distribution of the error term has to be assumed.
Therefore, the extra assumption is that $u_i \sim N(0, \sigma^2)$.
Writing (13) again, $\hat\beta_2$ is represented as:
$$\hat\beta_2 = \beta_2 + \sum_{i=1}^n \omega_i u_i.$$
First, we obtain the distribution of the second term in the above equation.
Using the moment-generating function, $\sum_{i=1}^n \omega_i u_i$ is distributed as:
$$\sum_{i=1}^n \omega_i u_i \sim N\Big(0, \ \sigma^2 \sum_{i=1}^n \omega_i^2\Big).$$
Therefore, $\hat\beta_2$ is distributed as:
$$\hat\beta_2 = \beta_2 + \sum_{i=1}^n \omega_i u_i \sim N\Big(\beta_2, \ \sigma^2 \sum_{i=1}^n \omega_i^2\Big),$$
or equivalently,
$$\frac{\hat\beta_2 - \beta_2}{\sigma \sqrt{\sum_{i=1}^n \omega_i^2}} = \frac{\hat\beta_2 - \beta_2}{\sigma / \sqrt{\sum_{i=1}^n (x_i - \bar{x})^2}} \sim N(0, 1),$$
for any $n$.
Moreover, replacing $\sigma^2$ by its estimator $s^2$ defined in (17), it is known that we have:
$$\frac{\hat\beta_2 - \beta_2}{s / \sqrt{\sum_{i=1}^n (x_i - \bar{x})^2}} \sim t(n - 2),$$
where $t(n - 2)$ denotes the $t$ distribution with $n - 2$ degrees of freedom.
Thus, under the normality assumption on the error term $u_i$, the $t(n - 2)$ distribution is used for the confidence interval and hypothesis testing in small samples.
Or, taking the square on both sides,
$$\left( \frac{\hat\beta_2 - \beta_2}{s / \sqrt{\sum_{i=1}^n (x_i - \bar{x})^2}} \right)^2 \sim F(1, n - 2),$$
which will be proved later.
Before going to the multiple regression model ( 重回帰モデル ),

2 Some Formulas of Matrix Algebra

1. Let
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1k} \\ a_{21} & a_{22} & \cdots & a_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ a_{l1} & a_{l2} & \cdots & a_{lk} \end{pmatrix} = [a_{ij}],$$
which is an $l \times k$ matrix, where $a_{ij}$ denotes the element in the $i$th row and $j$th column of $A$.
The transposed matrix ( 転置行列 ) of $A$, denoted by $A'$, is defined as:
$$A' = \begin{pmatrix} a_{11} & a_{21} & \cdots & a_{l1} \\ a_{12} & a_{22} & \cdots & a_{l2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1k} & a_{2k} & \cdots & a_{lk} \end{pmatrix} = [a_{ji}],$$
where the $i$th row of $A'$ is the $i$th column of $A$.
2. $(Ax)' = x' A'$,
where $A$ and $x$ are an $l \times k$ matrix and a $k \times 1$ vector, respectively.
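Rule 2 can be verified on a small example (the matrix and vector entries below are arbitrary illustrative values):

```python
# Numerical check of (Ax)' = x'A' for a 2x3 matrix A and a 3x1 vector x.
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]          # l x k with l = 2, k = 3
x = [1.0, -1.0, 2.0]           # k x 1 vector

Ax = [sum(A[i][j] * x[j] for j in range(3)) for i in range(2)]     # l x 1 column
At = [[A[i][j] for i in range(2)] for j in range(3)]               # k x l transpose
xAt = [sum(x[j] * At[j][i] for j in range(3)) for i in range(2)]   # 1 x l row

print(Ax == xAt)   # → True: the same numbers, read as a column vs a row
```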
3. $a' = a$,
where $a$ denotes a scalar.

4. $$\frac{\partial a'x}{\partial x} = a,$$
where $a$ and $x$ are $k \times 1$ vectors.

5. $$\frac{\partial x'Ax}{\partial x} = (A + A')x,$$
where $A$ and $x$ are a $k \times k$ matrix and a $k \times 1$ vector, respectively.
Especially, when $A$ is symmetric,
$$\frac{\partial x'Ax}{\partial x} = 2Ax.$$
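Rule 5 can be checked against a finite-difference gradient; the example values below are our own, and $A$ is deliberately non-symmetric so that $(A + A')x$ differs from $2Ax$:

```python
# Finite-difference check of d(x'Ax)/dx = (A + A')x.
A = [[2.0, 1.0],
     [0.0, 3.0]]               # not symmetric, so (A + A')x != 2Ax here
x = [1.0, -2.0]

def quad(v):
    # v'Av as a double sum
    return sum(v[i] * A[i][j] * v[j] for i in range(2) for j in range(2))

h = 1e-6
numeric = []
for i in range(2):
    xp = list(x); xp[i] += h
    xm = list(x); xm[i] -= h
    numeric.append((quad(xp) - quad(xm)) / (2 * h))   # central difference

analytic = [sum((A[i][j] + A[j][i]) * x[j] for j in range(2)) for i in range(2)]
print(all(abs(a - b) < 1e-4 for a, b in zip(numeric, analytic)))   # → True
```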
6. Let $A$ and $B$ be $k \times k$ matrices, and $I_k$ be a $k \times k$ identity matrix ( 単位行列 ) (one in the diagonal elements and zero in the other elements).
When $AB = I_k$, $B$ is called the inverse matrix ( 逆行列 ) of $A$, denoted by $B = A^{-1}$.
That is, $A A^{-1} = A^{-1} A = I_k$.
7. Let $A$ be a $k \times k$ matrix and $x$ be a $k \times 1$ vector.
If $A$ is a positive definite matrix ( 正定符号行列 ), for any $x$ except for $x = 0$ we have:
$$x'Ax > 0.$$
If $A$ is a positive semidefinite matrix ( 非負定符号行列 ), for any $x$ we have:
$$x'Ax \ge 0.$$
If $A$ is a negative definite matrix ( 負定符号行列 ), for any $x$ except for $x = 0$ we have:
$$x'Ax < 0.$$
If $A$ is a negative semidefinite matrix ( 非正定符号行列 ), for any $x$ we have:
$$x'Ax \le 0.$$
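Positive definiteness can be sanity-checked (though of course not proved) by sampling many nonzero vectors; the matrix below is our own example, with $x'Ax = (x_1 + x_2)^2 + x_1^2 + x_2^2 > 0$:

```python
import random

random.seed(5)

# Sampling check that x'Ax > 0 for a positive definite example matrix.
A = [[2.0, 1.0],
     [1.0, 2.0]]               # eigenvalues 1 and 3, hence positive definite

def quad_form(v):
    return sum(v[i] * A[i][j] * v[j] for i in range(2) for j in range(2))

ok = True
for _ in range(10000):
    x = [random.uniform(-1.0, 1.0), random.uniform(-1.0, 1.0)]
    if x != [0.0, 0.0] and quad_form(x) <= 0.0:
        ok = False
print(ok)   # → True
```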
Trace, Rank, etc.: $A: k \times k$, $B: n \times k$, $C: k \times n$.
1. The trace ( トレース ) of $A$ is: $\mathrm{tr}(A) = \sum_{i=1}^k a_{ii}$.