Econometrics I

(Thur., 8:50-10:20)

Room #4 (法経講義棟)

• The prerequisites of this class are Basic Statistics (統計基礎) (by Prof. Fukushige, Tue., 16:20-17:50, this semester) and Econometrics (エコノメトリックス) (undergraduate level, next semester, 『計量経済学』山本拓著，新世社).

• The class Special Lectures in Economics (Statistical Analysis), 経済学特論（統計解析） (by Prof. Oya, Wed., 10:30-12:00, this semester) should also be registered for.

TA Session (by Mr. Yonekura and Mr. Sakamoto):

Tue., 14:40-16:10

Room #505 (法経大学院総合研究棟)

Content: Basic Statistics, Matrix Algebra, etc.

1 Regression Analysis (回帰分析)

1.1 Setup of the Model

When $(x_1, y_1), (x_2, y_2), \cdots, (x_n, y_n)$ are available, suppose that there is a linear relationship between $y$ and $x$, i.e.,

$$y_i = \beta_1 + \beta_2 x_i + u_i, \qquad (1)$$

for $i = 1, 2, \cdots, n$. $x_i$ and $y_i$ denote the $i$th observations.

−→ Single (or simple) regression model (単回帰モデル)

$y_i$ is called the dependent variable (従属変数) or the explained variable (被説明変数), while $x_i$ is known as the independent variable (独立変数) or the explanatory (or explaining) variable (説明変数).

$\beta_1$ = Intercept (切片), $\beta_2$ = Slope (傾き)

$\beta_1$ and $\beta_2$ are unknown parameters (パラメータ，母数) to be estimated.

$\beta_1$ and $\beta_2$ are called the regression coefficients (回帰係数).

$u_i$ is the unobserved error term (誤差項), assumed to be a random variable with mean zero and variance $\sigma^2$.

$\sigma^2$ is also a parameter to be estimated.

$x_i$ is assumed to be nonstochastic (非確率的), but $y_i$ is stochastic (確率的) because $y_i$ depends on the error $u_i$.

The error terms $u_1, u_2, \cdots, u_n$ are assumed to be mutually independently and identically distributed, which is called iid. −→ discussed later.

Taking the expectation on both sides of (1), the expectation of $y_i$ is represented as:

$$E(y_i) = E(\beta_1 + \beta_2 x_i + u_i) = \beta_1 + \beta_2 x_i + E(u_i) = \beta_1 + \beta_2 x_i, \qquad (2)$$

for $i = 1, 2, \cdots, n$. Using $E(y_i)$ we can rewrite (1) as $y_i = E(y_i) + u_i$.

(2) represents the true regression line.

Let $\hat{\beta}_1$ and $\hat{\beta}_2$ be estimates of $\beta_1$ and $\beta_2$.

Replacing $\beta_1$ and $\beta_2$ by $\hat{\beta}_1$ and $\hat{\beta}_2$, (1) turns out to be:

$$y_i = \hat{\beta}_1 + \hat{\beta}_2 x_i + e_i, \qquad (3)$$

for $i = 1, 2, \cdots, n$, where $e_i$ is called the residual (残差).

The residual $e_i$ is taken as the experimental value (or realization) of $u_i$.

We define $\hat{y}_i$ as follows:

$$\hat{y}_i = \hat{\beta}_1 + \hat{\beta}_2 x_i, \qquad (4)$$

for $i = 1, 2, \cdots, n$, which is interpreted as the predicted value (予測値) of $y_i$.

(4) indicates the estimated regression line, which is different from (2).

Moreover, using $\hat{y}_i$ we can rewrite (3) as $y_i = \hat{y}_i + e_i$.

(2) and (4) are displayed in Figure 1.

Consider the case of $n = 6$ for simplicity. × indicates the observed data series.

The true regression line (2) is represented by the solid line, while the estimated regression line (4) is drawn with the dotted line.

Figure 1. True and Estimated Regression Lines (回帰直線)

[Figure not reproduced. Axes: $x$ (horizontal), $y$ (vertical). Labels in the figure: observed points × including $(x_i, y_i)$; "Distributions of the Errors"; "Error $u_i$"; "Residual $e_i$"; $\hat{y}_i = \hat{\beta}_1 + \hat{\beta}_2 x_i$ (Estimated Regression Line); $E(y_i) = \beta_1 + \beta_2 x_i$ (True Regression Line).]

In the next section, we consider how to obtain the estimates of $\beta_1$ and $\beta_2$, i.e., $\hat{\beta}_1$ and $\hat{\beta}_2$.

1.2 Ordinary Least Squares Estimation

Suppose that $(x_1, y_1), (x_2, y_2), \cdots, (x_n, y_n)$ are available.

For the regression model (1), we consider estimating $\beta_1$ and $\beta_2$.

Replacing $\beta_1$ and $\beta_2$ by their estimates $\hat{\beta}_1$ and $\hat{\beta}_2$, remember that the residual $e_i$ is given by:

$$e_i = y_i - \hat{y}_i = y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i.$$

The sum of squared residuals is defined as follows:

$$S(\hat{\beta}_1, \hat{\beta}_2) = \sum_{i=1}^n e_i^2 = \sum_{i=1}^n (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i)^2.$$

It might be plausible to choose the $\hat{\beta}_1$ and $\hat{\beta}_2$ which minimize the sum of squared residuals, i.e., $S(\hat{\beta}_1, \hat{\beta}_2)$.

−→ Least squares method (最小二乗法)

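To minimize $S(\hat{\beta}_1, \hat{\beta}_2)$ with respect to $\hat{\beta}_1$ and $\hat{\beta}_2$, we set the partial derivatives equal to zero:

$$\frac{\partial S(\hat{\beta}_1, \hat{\beta}_2)}{\partial \hat{\beta}_1} = -2 \sum_{i=1}^n (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i) = 0,$$

$$\frac{\partial S(\hat{\beta}_1, \hat{\beta}_2)}{\partial \hat{\beta}_2} = -2 \sum_{i=1}^n x_i (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i) = 0.$$

The second order condition for minimization is that

$$\begin{pmatrix} \dfrac{\partial^2 S(\hat{\beta}_1, \hat{\beta}_2)}{\partial \hat{\beta}_1^2} & \dfrac{\partial^2 S(\hat{\beta}_1, \hat{\beta}_2)}{\partial \hat{\beta}_1 \partial \hat{\beta}_2} \\ \dfrac{\partial^2 S(\hat{\beta}_1, \hat{\beta}_2)}{\partial \hat{\beta}_2 \partial \hat{\beta}_1} & \dfrac{\partial^2 S(\hat{\beta}_1, \hat{\beta}_2)}{\partial \hat{\beta}_2^2} \end{pmatrix} = \begin{pmatrix} 2n & 2\sum_{i=1}^n x_i \\ 2\sum_{i=1}^n x_i & 2\sum_{i=1}^n x_i^2 \end{pmatrix}$$

should be a positive definite matrix.

The diagonal elements $2n$ and $2\sum_{i=1}^n x_i^2$ are positive.

The determinant:

$$\begin{vmatrix} 2n & 2\sum_{i=1}^n x_i \\ 2\sum_{i=1}^n x_i & 2\sum_{i=1}^n x_i^2 \end{vmatrix} = 4n \sum_{i=1}^n x_i^2 - 4\Big(\sum_{i=1}^n x_i\Big)^2 = 4n \sum_{i=1}^n (x_i - \bar{x})^2$$

is positive (as long as the $x_i$ are not all equal). =⇒ The second-order condition is satisfied.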

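The first two equations yield the following two equations:

$$\bar{y} = \hat{\beta}_1 + \hat{\beta}_2 \bar{x}, \qquad (5)$$

$$\sum_{i=1}^n x_i y_i = n\bar{x}\hat{\beta}_1 + \hat{\beta}_2 \sum_{i=1}^n x_i^2, \qquad (6)$$

where $\bar{y} = \dfrac{1}{n}\sum_{i=1}^n y_i$ and $\bar{x} = \dfrac{1}{n}\sum_{i=1}^n x_i$.

Multiplying (5) by $n\bar{x}$ and subtracting (6), we can derive $\hat{\beta}_2$ as follows:

$$\hat{\beta}_2 = \frac{\sum_{i=1}^n x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^n x_i^2 - n\bar{x}^2} = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}. \qquad (7)$$

From (5), $\hat{\beta}_1$ is directly obtained as follows:

$$\hat{\beta}_1 = \bar{y} - \hat{\beta}_2 \bar{x}. \qquad (8)$$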

When the observed values are taken for $y_i$ and $x_i$ for $i = 1, 2, \cdots, n$, $\hat{\beta}_1$ and $\hat{\beta}_2$ are called the ordinary least squares estimates (or simply the least squares estimates, 最小二乗推定値) of $\beta_1$ and $\beta_2$.

When $y_i$ for $i = 1, 2, \cdots, n$ are regarded as the random sample, $\hat{\beta}_1$ and $\hat{\beta}_2$ are called the ordinary least squares estimators (or the least squares estimators, 最小二乗推定量) of $\beta_1$ and $\beta_2$.
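As a minimal sketch of the formulas in this section, the following Python code computes the least squares estimates from (7) and from (5) solved for the intercept. The function name `ols_simple` and the data are hypothetical and chosen only for illustration; they are not part of the lecture.

```python
def ols_simple(x, y):
    """OLS estimates (beta1_hat, beta2_hat) for y_i = beta1 + beta2 * x_i + u_i."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta2_hat = s_xy / s_xx                 # equation (7)
    beta1_hat = y_bar - beta2_hat * x_bar   # equation (5) solved for beta1_hat
    return beta1_hat, beta2_hat


# Hypothetical data with n = 6 (not taken from the lecture)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

beta1_hat, beta2_hat = ols_simple(x, y)
residuals = [yi - beta1_hat - beta2_hat * xi for xi, yi in zip(x, y)]

# The two first-order conditions say sum(e_i) = 0 and sum(x_i * e_i) = 0
sum_e = sum(residuals)
sum_xe = sum(xi * ei for xi, ei in zip(x, residuals))
print(beta1_hat, beta2_hat, sum_e, sum_xe)
```

Up to floating-point rounding, both printed sums should be zero, which is just a restatement of the two first-order conditions.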

1.3 Properties of Least Squares Estimator

Equation (7) is rewritten as:

$$\hat{\beta}_2 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2} = \frac{\sum_{i=1}^n (x_i - \bar{x}) y_i}{\sum_{i=1}^n (x_i - \bar{x})^2} - \frac{\bar{y}\sum_{i=1}^n (x_i - \bar{x})}{\sum_{i=1}^n (x_i - \bar{x})^2} = \sum_{i=1}^n \frac{x_i - \bar{x}}{\sum_{j=1}^n (x_j - \bar{x})^2}\, y_i = \sum_{i=1}^n \omega_i y_i. \qquad (9)$$

In the third equality, $\sum_{i=1}^n (x_i - \bar{x}) = 0$ is utilized because of $\bar{x} = \dfrac{1}{n}\sum_{i=1}^n x_i$.

In the fourth equality, $\omega_i$ is defined as:

$$\omega_i = \frac{x_i - \bar{x}}{\sum_{j=1}^n (x_j - \bar{x})^2}.$$

$\omega_i$ is nonstochastic because $x_i$ is assumed to be nonstochastic.

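$\omega_i$ has the following properties:

$$\sum_{i=1}^n \omega_i = \sum_{i=1}^n \frac{x_i - \bar{x}}{\sum_{j=1}^n (x_j - \bar{x})^2} = \frac{\sum_{i=1}^n (x_i - \bar{x})}{\sum_{i=1}^n (x_i - \bar{x})^2} = 0, \qquad (10)$$

$$\sum_{i=1}^n \omega_i x_i = \sum_{i=1}^n \omega_i (x_i - \bar{x}) = \frac{\sum_{i=1}^n (x_i - \bar{x})^2}{\sum_{i=1}^n (x_i - \bar{x})^2} = 1, \qquad (11)$$

$$\sum_{i=1}^n \omega_i^2 = \sum_{i=1}^n \left(\frac{x_i - \bar{x}}{\sum_{j=1}^n (x_j - \bar{x})^2}\right)^2 = \frac{\sum_{i=1}^n (x_i - \bar{x})^2}{\left(\sum_{i=1}^n (x_i - \bar{x})^2\right)^2} = \frac{1}{\sum_{i=1}^n (x_i - \bar{x})^2}. \qquad (12)$$

The first equality of (11) comes from (10).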
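The three properties (10), (11) and (12) can be checked numerically. The sketch below uses a hypothetical regressor vector `x` and a helper named `ols_weights`, both introduced only for this illustration.

```python
def ols_weights(x):
    """Weights omega_i = (x_i - x_bar) / sum_j (x_j - x_bar)^2."""
    n = len(x)
    x_bar = sum(x) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return [(xi - x_bar) / s_xx for xi in x]


# Hypothetical nonstochastic regressor values
x = [1.0, 2.0, 4.0, 7.0, 11.0]
w = ols_weights(x)

x_bar = sum(x) / len(x)
s_xx = sum((xi - x_bar) ** 2 for xi in x)

sum_w = sum(w)                                  # property (10): should be 0
sum_wx = sum(wi * xi for wi, xi in zip(w, x))   # property (11): should be 1
sum_w2 = sum(wi ** 2 for wi in w)               # property (12): should be 1 / s_xx
print(sum_w, sum_wx, sum_w2, 1.0 / s_xx)
```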

From now on, we focus only on $\hat{\beta}_2$, because usually $\beta_2$ is more important than $\beta_1$ in the regression model (1).

In order to obtain the properties of the least squares estimator $\hat{\beta}_2$, we rewrite (9) as:

$$\hat{\beta}_2 = \sum_{i=1}^n \omega_i y_i = \sum_{i=1}^n \omega_i (\beta_1 + \beta_2 x_i + u_i) = \beta_1 \sum_{i=1}^n \omega_i + \beta_2 \sum_{i=1}^n \omega_i x_i + \sum_{i=1}^n \omega_i u_i = \beta_2 + \sum_{i=1}^n \omega_i u_i. \qquad (13)$$

In the fourth equality of (13), (10) and (11) are utilized.

[Review] Random Variables:

Let $X_1, X_2, \cdots, X_n$ be $n$ random variables, which are mutually independently and identically distributed.

mutually independent =⇒ $f(x_i, x_j) = f_i(x_i) f_j(x_j)$ for $i \neq j$.

$f(x_i, x_j)$ denotes a joint distribution of $X_i$ and $X_j$. $f_i(x)$ indicates a marginal distribution of $X_i$.

identical =⇒ $f_i(x) = f_j(x)$ for $i \neq j$.

[End of Review]

[Review] Mean and Variance:

Let $X$ and $Y$ be random variables (continuous type), which are independently distributed.

Definition and Formulas:

• $E(g(X)) = \int g(x) f(x)\,dx$ for a function $g(\cdot)$ and a density function $f(\cdot)$.

• $V(X) = E((X - \mu)^2) = \int (x - \mu)^2 f(x)\,dx$ for $\mu = E(X)$.

• $E(aX + b) = aE(X) + b$ and $V(aX + b) = V(aX) = a^2 V(X)$ for constants $a$ and $b$.

• $E(X \pm Y) = E(X) \pm E(Y)$ and $V(X \pm Y) = V(X) + V(Y)$.

[End of Review]

Mean and Variance of $\hat{\beta}_2$: $u_1, u_2, \cdots, u_n$ are assumed to be mutually independently and identically distributed with mean zero and variance $\sigma^2$, but they are not necessarily normal.

Remember that we do not need the normality assumption to obtain the mean and variance, but the normality assumption is required to test a hypothesis.

From (13), the expectation of $\hat{\beta}_2$ is derived as follows:

$$E(\hat{\beta}_2) = E\Big(\beta_2 + \sum_{i=1}^n \omega_i u_i\Big) = \beta_2 + E\Big(\sum_{i=1}^n \omega_i u_i\Big) = \beta_2 + \sum_{i=1}^n \omega_i E(u_i) = \beta_2. \qquad (14)$$

It is shown from (14) that the ordinary least squares estimator $\hat{\beta}_2$ is an unbiased estimator of $\beta_2$.

From (13), the variance of $\hat{\beta}_2$ is computed as:

$$V(\hat{\beta}_2) = V\Big(\beta_2 + \sum_{i=1}^n \omega_i u_i\Big) = V\Big(\sum_{i=1}^n \omega_i u_i\Big) = \sum_{i=1}^n V(\omega_i u_i) = \sum_{i=1}^n \omega_i^2 V(u_i) = \sigma^2 \sum_{i=1}^n \omega_i^2 = \frac{\sigma^2}{\sum_{i=1}^n (x_i - \bar{x})^2}. \qquad (15)$$

The third equality holds because $u_1, u_2, \cdots, u_n$ are mutually independent.

The last equality comes from (12).

Thus, $E(\hat{\beta}_2)$ and $V(\hat{\beta}_2)$ are given by (14) and (15).
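A small Monte Carlo experiment can illustrate (14) and (15) without assuming normality. The sketch below is only an illustration under stated assumptions: the parameter values, the regressor values, the seed, and the choice of uniform errors (scaled to have variance $\sigma^2$) are all hypothetical.

```python
import random


def ols_slope(x, y):
    """Least squares slope, equation (7)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx


def simulate_slopes(x, beta1, beta2, sigma, n_rep, seed):
    """Draw n_rep samples with iid non-normal errors and return the slopes."""
    rng = random.Random(seed)
    half_width = sigma * 3 ** 0.5   # uniform(-a, a) has variance a^2 / 3
    slopes = []
    for _ in range(n_rep):
        y = [beta1 + beta2 * xi + rng.uniform(-half_width, half_width) for xi in x]
        slopes.append(ols_slope(x, y))
    return slopes


# Hypothetical design and true parameters
x = [float(i) for i in range(1, 11)]
beta1, beta2, sigma = 1.0, 0.5, 1.0
slopes = simulate_slopes(x, beta1, beta2, sigma, n_rep=20000, seed=0)

mc_mean = sum(slopes) / len(slopes)
mc_var = sum((b - mc_mean) ** 2 for b in slopes) / (len(slopes) - 1)
x_bar = sum(x) / len(x)
theory_var = sigma ** 2 / sum((xi - x_bar) ** 2 for xi in x)   # equation (15)
print(mc_mean, mc_var, theory_var)
```

The simulated mean should be close to the true slope and the simulated variance close to the value given by (15), up to Monte Carlo error.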

[Review] Three Good Properties on Estimator:

$\theta$: Parameter

$\hat{\theta}$: Estimator of $\theta$, i.e., $\hat{\theta} = \hat{\theta}(X_1, X_2, \cdots, X_n)$,

where $X_1, X_2, \cdots, X_n$ are mutually independent random variables.

(*) Estimate of $\theta$: $\hat{\theta} = \hat{\theta}(x_1, x_2, \cdots, x_n)$, where $x_i$ denotes the observed data of $X_i$.

• Unbiasedness (不偏性): $E(\hat{\theta}) = \theta$.

• Efficiency (有効性):

The minimum variance estimator within all the unbiased estimators.

(*) It is not easy to check efficiency in general. Instead, consider the best linear unbiased estimator (BLUE, 最良線型不偏推定量).

• Consistency (一致性): $\hat{\theta} \longrightarrow \theta$ as $n \longrightarrow \infty$. Note that $\hat{\theta}$ depends on the number of observations $n$.

[End of Review]

Gauss-Markov Theorem (ガウス・マルコフ定理): It has been discussed above that $\hat{\beta}_2$ is represented as (9), which implies that $\hat{\beta}_2$ is a linear estimator, i.e., linear in $y_i$.

In addition, (14) indicates that $\hat{\beta}_2$ is an unbiased estimator.

Therefore, summarizing these two facts, it is shown that $\hat{\beta}_2$ is a linear unbiased estimator (線形不偏推定量).

Furthermore, here we show that $\hat{\beta}_2$ has minimum variance within the class of linear unbiased estimators.

Consider the alternative linear unbiased estimator $\tilde{\beta}_2$ as follows:

$$\tilde{\beta}_2 = \sum_{i=1}^n c_i y_i = \sum_{i=1}^n (\omega_i + d_i) y_i,$$

where $c_i = \omega_i + d_i$ is defined and $d_i$ is nonstochastic.

Then, $\tilde{\beta}_2$ is transformed into:

$$\tilde{\beta}_2 = \sum_{i=1}^n c_i y_i = \sum_{i=1}^n (\omega_i + d_i)(\beta_1 + \beta_2 x_i + u_i)$$

$$= \beta_1 \sum_{i=1}^n \omega_i + \beta_2 \sum_{i=1}^n \omega_i x_i + \sum_{i=1}^n \omega_i u_i + \beta_1 \sum_{i=1}^n d_i + \beta_2 \sum_{i=1}^n d_i x_i + \sum_{i=1}^n d_i u_i$$

$$= \beta_2 + \beta_1 \sum_{i=1}^n d_i + \beta_2 \sum_{i=1}^n d_i x_i + \sum_{i=1}^n \omega_i u_i + \sum_{i=1}^n d_i u_i.$$

Equations (10) and (11) are used in the fourth equality.

Taking the expectation on both sides of the above equation, we obtain:

$$E(\tilde{\beta}_2) = \beta_2 + \beta_1 \sum_{i=1}^n d_i + \beta_2 \sum_{i=1}^n d_i x_i + \sum_{i=1}^n \omega_i E(u_i) + \sum_{i=1}^n d_i E(u_i) = \beta_2 + \beta_1 \sum_{i=1}^n d_i + \beta_2 \sum_{i=1}^n d_i x_i.$$

Since $\tilde{\beta}_2$ is assumed to be unbiased, we need the following conditions:

$$\sum_{i=1}^n d_i = 0, \qquad \sum_{i=1}^n d_i x_i = 0.$$

When these conditions hold, we can rewrite $\tilde{\beta}_2$ as:

$$\tilde{\beta}_2 = \beta_2 + \sum_{i=1}^n (\omega_i + d_i) u_i.$$

The variance of $\tilde{\beta}_2$ is derived as:

$$V(\tilde{\beta}_2) = V\Big(\beta_2 + \sum_{i=1}^n (\omega_i + d_i) u_i\Big) = V\Big(\sum_{i=1}^n (\omega_i + d_i) u_i\Big) = \sum_{i=1}^n V\big((\omega_i + d_i) u_i\big)$$

$$= \sum_{i=1}^n (\omega_i + d_i)^2 V(u_i) = \sigma^2 \Big(\sum_{i=1}^n \omega_i^2 + 2\sum_{i=1}^n \omega_i d_i + \sum_{i=1}^n d_i^2\Big)$$

$$= \sigma^2 \Big(\sum_{i=1}^n \omega_i^2 + \sum_{i=1}^n d_i^2\Big).$$

From unbiasedness of $\tilde{\beta}_2$, using $\sum_{i=1}^n d_i = 0$ and $\sum_{i=1}^n d_i x_i = 0$, we obtain:

$$\sum_{i=1}^n \omega_i d_i = \frac{\sum_{i=1}^n (x_i - \bar{x}) d_i}{\sum_{i=1}^n (x_i - \bar{x})^2} = \frac{\sum_{i=1}^n x_i d_i - \bar{x}\sum_{i=1}^n d_i}{\sum_{i=1}^n (x_i - \bar{x})^2} = 0,$$

which is utilized to obtain the variance of $\tilde{\beta}_2$ in the third line of the above equation.

From (15), the variance of $\hat{\beta}_2$ is given by: $V(\hat{\beta}_2) = \sigma^2 \sum_{i=1}^n \omega_i^2$. Therefore, we have:

$$V(\tilde{\beta}_2) \geq V(\hat{\beta}_2),$$

because of $\sum_{i=1}^n d_i^2 \geq 0$.

When $\sum_{i=1}^n d_i^2 = 0$, i.e., when $d_1 = d_2 = \cdots = d_n = 0$, we have the equality: $V(\tilde{\beta}_2) = V(\hat{\beta}_2)$.

As shown above, the least squares estimator $\hat{\beta}_2$ gives us the minimum variance linear unbiased estimator (最小分散線形不偏推定量), or equivalently the best linear unbiased estimator (最良線形不偏推定量，BLUE), which is called the Gauss-Markov theorem (ガウス・マルコフ定理).
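To make the argument concrete, the sketch below compares the least squares weights with one particular alternative linear unbiased estimator, the "endpoint" slope $(y_n - y_1)/(x_n - x_1)$. This alternative and the regressor values are hypothetical choices made for illustration only. The code confirms that $d_i = c_i - \omega_i$ satisfies both unbiasedness conditions and that the variance factor $\sum c_i^2$ exceeds $\sum \omega_i^2$ by exactly $\sum d_i^2$.

```python
# Hypothetical nonstochastic regressor values
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
n = len(x)
x_bar = sum(x) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)

# Least squares weights omega_i
w = [(xi - x_bar) / s_xx for xi in x]

# Alternative linear estimator: (y_n - y_1) / (x_n - x_1), written as sum c_i * y_i
c = [0.0] * n
c[0] = -1.0 / (x[-1] - x[0])
c[-1] = 1.0 / (x[-1] - x[0])

d = [ci - wi for ci, wi in zip(c, w)]

sum_d = sum(d)                                   # should be 0 (unbiasedness)
sum_dx = sum(di * xi for di, xi in zip(d, x))    # should be 0 (unbiasedness)

# Variances divided by sigma^2
var_factor_ols = sum(wi ** 2 for wi in w)
var_factor_alt = sum(ci ** 2 for ci in c)
sum_d2 = sum(di ** 2 for di in d)
print(var_factor_ols, var_factor_alt, var_factor_ols + sum_d2)
```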

Asymptotic Properties (漸近的性質) of $\hat{\beta}_2$: We assume that as $n$ goes to infinity we have the following:

$$\frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^2 \longrightarrow m < \infty,$$

where $m$ is a constant value. From (12), we obtain:

$$n \sum_{i=1}^n \omega_i^2 = \frac{1}{(1/n)\sum_{i=1}^n (x_i - \bar{x})^2} \longrightarrow \frac{1}{m}.$$

Note that $f(x_n) \longrightarrow f(m)$ when $x_n \longrightarrow m$, called Slutsky's theorem (スルツキー定理), where $m$ is a constant value and $f(\cdot)$ is a function.

We show both consistency (一致性) of $\hat{\beta}_2$ and asymptotic normality (漸近正規性) of $\sqrt{n}(\hat{\beta}_2 - \beta_2)$.

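● First, we prove that $\hat{\beta}_2$ is a consistent estimator of $\beta_2$.

[Review] Chebyshev's inequality (チェビシェフの不等式) is given by:

$$P(|X - \mu| > \epsilon) \leq \frac{\sigma^2}{\epsilon^2},$$

where $\mu = E(X)$, $\sigma^2 = V(X)$ and any $\epsilon > 0$.

[End of Review]

Replace $X$, $E(X)$ and $V(X)$ by:

$$\hat{\beta}_2, \qquad E(\hat{\beta}_2) = \beta_2, \qquad \text{and} \qquad V(\hat{\beta}_2) = \sigma^2 \sum_{i=1}^n \omega_i^2 = \frac{\sigma^2}{\sum_{i=1}^n (x_i - \bar{x})^2}.$$

Then, when $n \longrightarrow \infty$, we obtain the following result:

$$P(|\hat{\beta}_2 - \beta_2| > \epsilon) \leq \frac{\sigma^2 \sum_{i=1}^n \omega_i^2}{\epsilon^2} = \frac{\sigma^2\, n\sum_{i=1}^n \omega_i^2}{n\epsilon^2} \longrightarrow 0,$$

where $\sum_{i=1}^n \omega_i^2 \longrightarrow 0$ because $n\sum_{i=1}^n \omega_i^2 \longrightarrow \dfrac{1}{m}$ from the assumption.

Thus, we obtain the result that $\hat{\beta}_2 \longrightarrow \beta_2$ as $n \longrightarrow \infty$.

Therefore, we can conclude that $\hat{\beta}_2$ is a consistent estimator (一致推定量) of $\beta_2$.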
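The Chebyshev bound used in the consistency argument can be evaluated for a concrete design. The sketch below assumes the hypothetical regressor values $x_i = i/n$, for which $(1/n)\sum (x_i - \bar{x})^2$ converges to $1/12$, together with illustrative values of $\sigma^2$ and $\epsilon$; none of these come from the lecture.

```python
def chebyshev_bound(n, sigma2, eps):
    """Upper bound sigma^2 * sum(omega_i^2) / eps^2 on P(|beta2_hat - beta2| > eps)."""
    x = [i / n for i in range(1, n + 1)]   # hypothetical design x_i = i / n
    x_bar = sum(x) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return sigma2 / (eps ** 2 * s_xx)      # uses sum(omega_i^2) = 1 / s_xx, equation (12)


# The bound shrinks roughly like 1 / n as the sample size grows
bounds = {n: chebyshev_bound(n, sigma2=1.0, eps=0.5) for n in (10, 100, 1000, 10000)}
for n, b in bounds.items():
    print(n, b)
```

For small $n$ the bound may exceed one and is then uninformative; what matters for consistency is that it goes to zero as $n$ grows.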

● Next, we want to show that $\sqrt{n}(\hat{\beta}_2 - \beta_2)$ is asymptotically normal.

[Review] The Central Limit Theorem (中心極限定理, CLT) is: for random variables $X_1, X_2, \cdots, X_n$,

$$\frac{\bar{X} - E(\bar{X})}{\sqrt{V(\bar{X})}} = \frac{\sum_{i=1}^n X_i - E(\sum_{i=1}^n X_i)}{\sqrt{V(\sum_{i=1}^n X_i)}} \longrightarrow N(0, 1), \qquad \text{as } n \longrightarrow \infty,$$

where $\bar{X} = \dfrac{1}{n}\sum_{i=1}^n X_i$.

$X_1, X_2, \cdots, X_n$ are not necessarily iid, if $V(\bar{X})$ is finite as $n$ goes to infinity.

[End of Review]

Note that $\hat{\beta}_2 = \beta_2 + \sum_{i=1}^n \omega_i u_i$ as in (13), and $X_i$ is replaced by $\omega_i u_i$. From the central limit theorem, asymptotic normality is shown as follows:

$$\frac{\sum_{i=1}^n \omega_i u_i - E(\sum_{i=1}^n \omega_i u_i)}{\sqrt{V(\sum_{i=1}^n \omega_i u_i)}} = \frac{\sum_{i=1}^n \omega_i u_i}{\sigma\sqrt{\sum_{i=1}^n \omega_i^2}} = \frac{\hat{\beta}_2 - \beta_2}{\sigma / \sqrt{\sum_{i=1}^n (x_i - \bar{x})^2}} \longrightarrow N(0, 1),$$

where

• $E(\sum_{i=1}^n \omega_i u_i) = 0$,

• $V(\sum_{i=1}^n \omega_i u_i) = \sigma^2 \sum_{i=1}^n \omega_i^2$, and

• $\sum_{i=1}^n \omega_i u_i = \hat{\beta}_2 - \beta_2$

are substituted in the first and second equalities.

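Moreover, we can rewrite as follows:

$$\frac{\hat{\beta}_2 - \beta_2}{\sigma / \sqrt{\sum_{i=1}^n (x_i - \bar{x})^2}} = \frac{\sqrt{n}(\hat{\beta}_2 - \beta_2)}{\sigma / \sqrt{(1/n)\sum_{i=1}^n (x_i - \bar{x})^2}}.$$

Replacing $(1/n)\sum_{i=1}^n (x_i - \bar{x})^2$ by its converged value $m$, we have:

$$\frac{\sqrt{n}(\hat{\beta}_2 - \beta_2)}{\sigma / \sqrt{m}} \longrightarrow N(0, 1),$$

which implies

$$\sqrt{n}(\hat{\beta}_2 - \beta_2) \longrightarrow N\Big(0, \frac{\sigma^2}{m}\Big).$$

Thus, the asymptotic normality of $\sqrt{n}(\hat{\beta}_2 - \beta_2)$ is shown.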

Finally, replacing $\sigma^2$ by its consistent estimator $s^2$, it is known that:

$$\frac{\hat{\beta}_2 - \beta_2}{s / \sqrt{\sum_{i=1}^n (x_i - \bar{x})^2}} \longrightarrow N(0, 1), \qquad (16)$$

where $s^2$ is defined as:

$$s^2 = \frac{1}{n-2}\sum_{i=1}^n e_i^2 = \frac{1}{n-2}\sum_{i=1}^n (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i)^2, \qquad (17)$$

which is a consistent and unbiased estimator of $\sigma^2$. −→ Proved later.

Thus, using (16), in large samples we can construct the confidence interval and test the hypothesis.

[Review] Confidence Interval (信頼区間，区間推定):

Suppose $X_1, X_2, \cdots, X_n$ are iid with mean $\mu$ and variance $\sigma^2$. −→ No normality assumption.

From CLT,

$$\frac{\bar{X} - E(\bar{X})}{\sqrt{V(\bar{X})}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \longrightarrow N(0, 1).$$

Replacing $\sigma^2$ by $S^2 = \dfrac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2$, we have:

$$\frac{\bar{X} - \mu}{S / \sqrt{n}} \longrightarrow N(0, 1).$$

That is, for large $n$,

$$P\Big(-1.96 < \frac{\bar{X} - \mu}{S / \sqrt{n}} < 1.96\Big) = 0.95, \quad \text{i.e.,} \quad P\Big(\bar{X} - 1.96\frac{S}{\sqrt{n}} < \mu < \bar{X} + 1.96\frac{S}{\sqrt{n}}\Big) = 0.95.$$

Note that 1.96 is obtained from the normal distribution table.

Then, replacing the estimators $\bar{X}$ and $S^2$ by the estimates $\bar{x}$ and $s^2$, we obtain the 95% confidence interval of $\mu$ as follows:

$$\Big(\bar{x} - 1.96\frac{s}{\sqrt{n}},\ \bar{x} + 1.96\frac{s}{\sqrt{n}}\Big).$$

Going back to OLS, we have:

$$\frac{\hat{\beta}_2 - \beta_2}{s / \sqrt{\sum_{i=1}^n (x_i - \bar{x})^2}} \longrightarrow N(0, 1).$$

Therefore,

$$P\Big(-2.576 < \frac{\hat{\beta}_2 - \beta_2}{s / \sqrt{\sum_{i=1}^n (x_i - \bar{x})^2}} < 2.576\Big) = 0.99,$$

i.e.,

$$P\Big(\hat{\beta}_2 - 2.576\frac{s}{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2}} < \beta_2 < \hat{\beta}_2 + 2.576\frac{s}{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2}}\Big) = 0.99.$$

Note that 2.576 is the upper 0.005 point of $N(0, 1)$, which comes from the statistical table.

Thus, the 99% confidence interval of $\beta_2$ is:

$$\Big(\hat{\beta}_2 - 2.576\frac{s}{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2}},\ \hat{\beta}_2 + 2.576\frac{s}{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2}}\Big),$$

where $\hat{\beta}_2$ and $s^2$ should be replaced by the values computed from the observed data.
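The interval above can be computed as in the following sketch. The function name `slope_confidence_interval` and the data are hypothetical; with only six observations the large-sample normal approximation is not reliable, so the numbers serve only to show the mechanics.

```python
import math


def slope_confidence_interval(x, y, z=2.576):
    """Large-sample confidence interval for beta2; z = 2.576 gives the 99% level."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta2_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    beta1_hat = y_bar - beta2_hat * x_bar
    # s^2 from equation (17): sum of squared residuals divided by n - 2
    s2 = sum((yi - beta1_hat - beta2_hat * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    se = math.sqrt(s2 / s_xx)   # s / sqrt(sum (x_i - x_bar)^2)
    return beta2_hat - z * se, beta2_hat + z * se


# Hypothetical data (not from the lecture)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

lower, upper = slope_confidence_interval(x, y)
print(lower, upper)
```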

[Review] Testing the Hypothesis (仮説検定):

Suppose that $X_1, X_2, \cdots, X_n$ are iid with mean $\mu$ and variance $\sigma^2$. From CLT,

$$\frac{\bar{X} - \mu}{S / \sqrt{n}} \longrightarrow N(0, 1),$$

where $S^2 = \dfrac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2$, which is known as the unbiased estimator of $\sigma^2$.

• The null hypothesis $H_0$: $\mu = \mu_0$, where $\mu_0$ is a fixed number.

• The alternative hypothesis $H_1$: $\mu \neq \mu_0$

Under the null hypothesis, in large samples we have the following distribution:

$$\frac{\bar{X} - \mu_0}{S / \sqrt{n}} \sim N(0, 1).$$

Replacing $\bar{X}$ and $S^2$ by $\bar{x}$ and $s^2$, compare $\dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$ and $N(0, 1)$.

$H_0$ is rejected at significance level 0.05 when $\left|\dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}\right| > 1.96$.

In the case of OLS, the hypotheses are as follows:

• The null hypothesis $H_0$: $\beta_2 = \beta_2^*$

• The alternative hypothesis $H_1$: $\beta_2 \neq \beta_2^*$

Under $H_0$, in large samples,

$$\frac{\hat{\beta}_2 - \beta_2^*}{s / \sqrt{\sum_{i=1}^n (x_i - \bar{x})^2}} \sim N(0, 1).$$

Replacing $\hat{\beta}_2$ and $s^2$ by the values computed from the observed data, compare $\dfrac{\hat{\beta}_2 - \beta_2^*}{s / \sqrt{\sum_{i=1}^n (x_i - \bar{x})^2}}$ and $N(0, 1)$.

$H_0$ is rejected at significance level 0.05 when $\left|\dfrac{\hat{\beta}_2 - \beta_2^*}{s / \sqrt{\sum_{i=1}^n (x_i - \bar{x})^2}}\right| > 1.96$.
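The test can be sketched as follows. The function name `slope_z_statistic`, the data, and the two null values are hypothetical; the rejection rule is applied to the absolute value of the statistic, which corresponds to the two-sided alternative $\beta_2 \neq \beta_2^*$.

```python
import math


def slope_z_statistic(x, y, beta2_null):
    """Test statistic (beta2_hat - beta2_null) / (s / sqrt(sum (x_i - x_bar)^2))."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta2_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    beta1_hat = y_bar - beta2_hat * x_bar
    s2 = sum((yi - beta1_hat - beta2_hat * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    return (beta2_hat - beta2_null) / math.sqrt(s2 / s_xx)


# Hypothetical data (not from the lecture)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

z_zero = slope_z_statistic(x, y, beta2_null=0.0)   # H0: beta2 = 0
z_two = slope_z_statistic(x, y, beta2_null=2.0)    # H0: beta2 = 2

reject_zero = abs(z_zero) > 1.96   # two-sided test at the 0.05 level
reject_two = abs(z_two) > 1.96
print(z_zero, reject_zero, z_two, reject_two)
```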

Exact Distribution of $\hat{\beta}_2$: We have shown asymptotic normality of $\sqrt{n}(\hat{\beta}_2 - \beta_2)$, which is one of the large sample properties.

Now, we discuss the small sample properties of $\hat{\beta}_2$.

In order to obtain the distribution of $\hat{\beta}_2$ in small samples, the distribution of the error term has to be assumed.

Therefore, the extra assumption is that $u_i \sim N(0, \sigma^2)$.

Writing (13) again, $\hat{\beta}_2$ is represented as:

$$\hat{\beta}_2 = \beta_2 + \sum_{i=1}^n \omega_i u_i.$$

First, we obtain the distribution of the second term in the above equation.

[Review] Content of Special Lectures in Economics (Statistical Analysis)

Note that the moment-generating function (積率母関数, MGF) is given by $M(\theta) \equiv E(\exp(\theta X)) = \exp(\mu\theta + \frac{1}{2}\sigma^2\theta^2)$ when $X \sim N(\mu, \sigma^2)$.

$X_1, X_2, \cdots, X_n$ are mutually independently distributed as $X_i \sim N(\mu_i, \sigma_i^2)$ for $i = 1, 2, \cdots, n$.

MGF of $X_i$ is $M_i(\theta) \equiv E(\exp(\theta X_i)) = \exp(\mu_i\theta + \frac{1}{2}\sigma_i^2\theta^2)$.

Consider the distribution of $Y = \sum_{i=1}^n (a_i + b_i X_i)$, where $a_i$ and $b_i$ are constant.

$$M_y(\theta) \equiv E(\exp(\theta Y)) = E\Big(\exp\Big(\theta\sum_{i=1}^n (a_i + b_i X_i)\Big)\Big)$$

$$= \prod_{i=1}^n \exp(\theta a_i) E(\exp(\theta b_i X_i)) = \prod_{i=1}^n \exp(\theta a_i) M_i(\theta b_i)$$

$$= \prod_{i=1}^n \exp(\theta a_i)\exp\Big(\mu_i\theta b_i + \frac{1}{2}\sigma_i^2(\theta b_i)^2\Big) = \exp\Big(\theta\sum_{i=1}^n (a_i + b_i\mu_i) + \frac{1}{2}\theta^2\sum_{i=1}^n b_i^2\sigma_i^2\Big),$$

which implies that $Y \sim N\big(\sum_{i=1}^n (a_i + b_i\mu_i),\ \sum_{i=1}^n b_i^2\sigma_i^2\big)$.

[End of Review]

Substitute $a_i = 0$, $\mu_i = 0$, $b_i = \omega_i$ and $\sigma_i^2 = \sigma^2$. Then, using the moment-generating function, $\sum_{i=1}^n \omega_i u_i$ is distributed as:

$$\sum_{i=1}^n \omega_i u_i \sim N\Big(0,\ \sigma^2\sum_{i=1}^n \omega_i^2\Big).$$

Therefore, $\hat{\beta}_2$ is distributed as:

$$\hat{\beta}_2 = \beta_2 + \sum_{i=1}^n \omega_i u_i \sim N\Big(\beta_2,\ \sigma^2\sum_{i=1}^n \omega_i^2\Big),$$

or equivalently,

$$\frac{\hat{\beta}_2 - \beta_2}{\sigma\sqrt{\sum_{i=1}^n \omega_i^2}} = \frac{\hat{\beta}_2 - \beta_2}{\sigma / \sqrt{\sum_{i=1}^n (x_i - \bar{x})^2}} \sim N(0, 1),$$

for any $n$.

[Review 1] t Distribution:

$Z \sim N(0, 1)$, $V \sim \chi^2(k)$, and $Z$ is independent of $V$. Then, $\dfrac{Z}{\sqrt{V/k}} \sim t(k)$.

[End of Review 1]

[Review 2] t Distribution:

Suppose that $X_1, X_2, \cdots, X_n$ are mutually independently, identically and normally distributed with mean $\mu$ and variance $\sigma^2$.

$\bar{X} \sim N(\mu, \sigma^2/n)$, i.e., $\dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}} \sim N(0, 1)$.

Define $S^2 = \dfrac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2$, which is an unbiased estimator of $\sigma^2$.

It is known that $\dfrac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1)$ and $\bar{X}$ is independent of $S^2$. (The proof is skipped.)

Then, we obtain

$$\frac{\dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}}{\sqrt{\dfrac{(n-1)S^2}{\sigma^2} \Big/ (n-1)}} = \frac{\bar{X} - \mu}{S / \sqrt{n}} \sim t(n-1).$$

As a result, replacing $\sigma^2$ by $S^2$, $\dfrac{\bar{X} - \mu}{S / \sqrt{n}} \sim t(n-1)$.

[End of Review 2]

Back to OLS:

Replacing $\sigma^2$ by its estimator $s^2$ defined in (17), it is known that we have:

$$\frac{\hat{\beta}_2 - \beta_2}{s / \sqrt{\sum_{i=1}^n (x_i - \bar{x})^2}} \sim t(n-2),$$

where $t(n-2)$ denotes the t distribution with $n - 2$ degrees of freedom.

Thus, under the normality assumption on the error term $u_i$, the $t(n-2)$ distribution is used for the confidence interval and hypothesis testing in small samples.

Or, taking the square on both sides,

$$\left(\frac{\hat{\beta}_2 - \beta_2}{s / \sqrt{\sum_{i=1}^n (x_i - \bar{x})^2}}\right)^2 \sim F(1, n-2),$$

which will be proved later.

Before going to the multiple regression model (重回帰モデル), we review some formulas of matrix algebra.

2 Some Formulas of Matrix Algebra

1. Let

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1k} \\ a_{21} & a_{22} & \cdots & a_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ a_{l1} & a_{l2} & \cdots & a_{lk} \end{pmatrix} = [a_{ij}],$$

which is an $l \times k$ matrix, where $a_{ij}$ denotes the element in the $i$th row and $j$th column of $A$.

The transposed matrix (転置行列) of $A$, denoted by $A'$, is defined as:

$$A' = \begin{pmatrix} a_{11} & a_{21} & \cdots & a_{l1} \\ a_{12} & a_{22} & \cdots & a_{l2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1k} & a_{2k} & \cdots & a_{lk} \end{pmatrix} = [a_{ji}],$$

where the $i$th row of $A'$ is the $i$th column of $A$.

2. $(Ax)' = x'A'$,

where $A$ and $x$ are an $l \times k$ matrix and a $k \times 1$ vector, respectively.

3. $a' = a$,

where $a$ denotes a scalar.

4. $\dfrac{\partial a'x}{\partial x} = a$,

where $a$ and $x$ are $k \times 1$ vectors.

5. $\dfrac{\partial x'Ax}{\partial x} = (A + A')x$,

where $A$ and $x$ are a $k \times k$ matrix and a $k \times 1$ vector, respectively.

Especially, when $A$ is symmetric,

$$\frac{\partial x'Ax}{\partial x} = 2Ax.$$
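Formula 5 can be checked numerically by comparing $(A + A')x$ with a central finite-difference gradient of the quadratic form. The matrix `A`, the vector `x`, and the step size `h` below are hypothetical choices made only for this sketch; `A` is deliberately non-symmetric.

```python
def quad_form(A, x):
    """Quadratic form x' A x for a square matrix A (list of rows) and a vector x."""
    k = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(k) for j in range(k))


def grad_formula(A, x):
    """Gradient (A + A') x from formula 5."""
    k = len(x)
    return [sum((A[i][j] + A[j][i]) * x[j] for j in range(k)) for i in range(k)]


def grad_numeric(A, x, h=1e-6):
    """Central finite-difference approximation of the gradient of x' A x."""
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = x[:], x[:]
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((quad_form(A, x_plus) - quad_form(A, x_minus)) / (2 * h))
    return grad


# Hypothetical non-symmetric matrix and evaluation point
A = [[2.0, 1.0],
     [0.0, 3.0]]
x = [1.0, -2.0]

g_formula = grad_formula(A, x)
g_numeric = grad_numeric(A, x)
print(g_formula, g_numeric)
```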

6. Let $A$ and $B$ be $k \times k$ matrices, and $I_k$ be a $k \times k$ identity matrix (単位行列) (one in the diagonal elements and zero in the other elements).

When $AB = I_k$, $B$ is called the inverse matrix (逆行列) of $A$, denoted by $B = A^{-1}$.

That is, $AA^{-1} = A^{-1}A = I_k$.

7. Let $A$ be a $k \times k$ matrix and $x$ be a $k \times 1$ vector.

If $A$ is a positive definite matrix (正値定符号行列), for any $x$ except for $x = 0$ we have:

$$x'Ax > 0.$$

If $A$ is a positive semidefinite matrix (非負値定符号行列), for any $x$ except for $x = 0$ we have:

$$x'Ax \geq 0.$$

If $A$ is a negative definite matrix (負値定符号行列), for any $x$ except for $x = 0$ we have:

$$x'Ax < 0.$$

If $A$ is a negative semidefinite matrix (非正値定符号行列), for any $x$ except for $x = 0$ we have:

$$x'Ax \leq 0.$$

Trace, Rank, etc.: $A$: $k \times k$, $B$: $n \times k$, $C$: $k \times n$.

1. The trace (トレース) of $A$ is: $\mathrm{tr}(A) = \sum_{i=1}^k a_{ii}$, where $A = [a_{ij}]$.

2. The rank (ランク，階数) of $A$ is the maximum number of linearly independent column (or row) vectors of $A$, which is denoted by $\mathrm{rank}(A)$.

3. If $A$ is an idempotent matrix (べき等行列), $A = A^2$.

4. If $A$ is an idempotent and symmetric matrix, $A = A^2 = A'A$.

5. $A$ is idempotent if and only if the eigenvalues of $A$ consist of 1 and 0.

6. If $A$ is idempotent, $\mathrm{rank}(A) = \mathrm{tr}(A)$.

7. $\mathrm{tr}(BC) = \mathrm{tr}(CB)$
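As one concrete instance of a symmetric idempotent matrix, the sketch below uses the centering matrix $A = I_n - \frac{1}{n}\iota\iota'$, where $\iota$ is an $n \times 1$ vector of ones. This example is not from the lecture and is chosen only for illustration. The matrix is known to have rank $n - 1$, so its trace should also equal $n - 1$.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


n = 4
# Centering matrix: identity minus (1/n) times a matrix of ones (hypothetical example)
A = [[(1.0 if i == j else 0.0) - 1.0 / n for j in range(n)] for i in range(n)]

A2 = matmul(A, A)

# Idempotency: every element of A^2 - A should be (numerically) zero
max_diff = max(abs(A2[i][j] - A[i][j]) for i in range(n) for j in range(n))

# Trace: sum of the diagonal elements, expected to equal n - 1 = rank(A)
trace = sum(A[i][i] for i in range(n))
print(max_diff, trace)
```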

Distributions in Matrix Form:

1. Let $X$, $\mu$ and $\Sigma$ be $k \times 1$, $k \times 1$ and $k \times k$ matrices.

When $X \sim N(\mu, \Sigma)$, the density function of $X$ is given by:

$$f(x) = \frac{1}{(2\pi)^{k/2}|\Sigma|^{1/2}}\exp\Big(-\frac{1}{2}(x - \mu)'\Sigma^{-1}(x - \mu)\Big).$$

$E(X) = \mu$ and $V(X) = E\big((X - \mu)(X - \mu)'\big) = \Sigma$

The moment-generating function: $\phi(\theta) = E\big(\exp(\theta'X)\big) = \exp(\theta'\mu + \frac{1}{2}\theta'\Sigma\theta)$

(*) In the univariate case, when $X \sim N(\mu, \sigma^2)$, the density function of $X$ is:

$$f(x) = \frac{1}{(2\pi\sigma^2)^{1/2}}\exp\Big(-\frac{1}{2\sigma^2}(x - \mu)^2\Big).$$

2. If $X \sim N(\mu, \Sigma)$, then $(X - \mu)'\Sigma^{-1}(X - \mu) \sim \chi^2(k)$.

Note that $X'X \sim \chi^2(k)$ when $X \sim N(0, I_k)$.

3. $X$: $n \times 1$, $Y$: $m \times 1$, $X \sim N(\mu_x, \Sigma_x)$, $Y \sim N(\mu_y, \Sigma_y)$

$X$ is independent of $Y$, i.e., $E\big((X - \mu_x)(Y - \mu_y)'\big) = 0$ in the case of normal random variables. Then:

$$\frac{(X - \mu_x)'\Sigma_x^{-1}(X - \mu_x)/n}{(Y - \mu_y)'\Sigma_y^{-1}(Y - \mu_y)/m} \sim F(n, m)$$

4. If $X \sim N(0, \sigma^2 I_n)$ and $A$ is a symmetric idempotent $n \times n$ matrix of rank $G$, then $X'AX/\sigma^2 \sim \chi^2(G)$.

Note that $X'AX = (AX)'(AX)$ and $\mathrm{rank}(A) = \mathrm{tr}(A)$ because $A$ is idempotent.

5. If $X \sim N(0, \sigma^2 I_n)$, $A$ and $B$ are symmetric idempotent $n \times n$ matrices of rank $G$ and $K$, and $AB = 0$, then

$$\frac{X'AX}{G\sigma^2} \Big/ \frac{X'BX}{K\sigma^2} = \frac{X'AX/G}{X'BX/K} \sim F(G, K).$$

3 Multiple Regression Model (重回帰モデル)

Up to now, only one independent variable, i.e., $x_i$, is taken into the regression model.

We extend it to more independent variables, which is called the multiple regression model (重回帰モデル).

We consider the following regression model:

$$y_i = \beta_1 x_{i,1} + \beta_2 x_{i,2} + \cdots + \beta_k x_{i,k} + u_i = (x_{i,1}, x_{i,2}, \cdots, x_{i,k})\begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix} + u_i = x_i\beta + u_i,$$

for $i = 1, 2, \cdots, n$, where $x_i$ and $\beta$ denote a $1 \times k$ vector of the independent variables and a $k \times 1$ vector of the unknown parameters to be estimated, which are given by:

$$x_i = (x_{i,1}, x_{i,2}, \cdots, x_{i,k}), \qquad \beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix}.$$

$x_{i,j}$ denotes the $i$th observation of the $j$th independent variable.

The case of $k = 2$ and $x_{i,1} = 1$ for all $i$ is exactly equivalent to (1).

Therefore, the matrix form above is a generalization of (1).

Writing all the equations for $i = 1, 2, \cdots, n$, we have:

$$y_1 = \beta_1 x_{1,1} + \beta_2 x_{1,2} + \cdots + \beta_k x_{1,k} + u_1 = x_1\beta + u_1,$$

$$y_2 = \beta_1 x_{2,1} + \beta_2 x_{2,2} + \cdots + \beta_k x_{2,k} + u_2 = x_2\beta + u_2,$$

$$\vdots$$

$$y_n = \beta_1 x_{n,1} + \beta_2 x_{n,2} + \cdots + \beta_k x_{n,k} + u_n = x_n\beta + u_n,$$

which is rewritten as:

$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,k} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,k} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,k} \end{pmatrix}\begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix} + \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}\beta + \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}.$$

Again, the above equation is compactly rewritten as:

$$y = X\beta + u, \qquad (18)$$

where $y$, $X$ and $u$ are denoted by:

$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad X = \begin{pmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,k} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,k} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,k} \end{pmatrix} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \quad u = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}.$$

Utilizing the matrix form (18), we derive the ordinary least squares estimator of $\beta$, denoted by $\hat{\beta}$.

In (18), replacing $\beta$ by $\hat{\beta}$, we have the following equation:

$$y = X\hat{\beta} + e,$$

where $e$ denotes an $n \times 1$ vector of the residuals.

The $i$th element of $e$ is given by $e_i$.

The sum of squared residuals is written as follows:

$$S(\hat{\beta}) = \sum_{i=1}^n e_i^2 = e'e = (y - X\hat{\beta})'(y - X\hat{\beta}) = (y' - \hat{\beta}'X')(y - X\hat{\beta})$$

$$= y'y - y'X\hat{\beta} - \hat{\beta}'X'y + \hat{\beta}'X'X\hat{\beta} = y'y - 2y'X\hat{\beta} + \hat{\beta}'X'X\hat{\beta}.$$

In the last equality, note that $\hat{\beta}'X'y = y'X\hat{\beta}$ because both are scalars.

To minimize $S(\hat{\beta})$ with respect to $\hat{\beta}$, we set the first derivative of $S(\hat{\beta})$ equal to zero, i.e.,

$$\frac{\partial S(\hat{\beta})}{\partial \hat{\beta}} = -2X'y + 2X'X\hat{\beta} = 0.$$

Solving the equation above with respect to $\hat{\beta}$, the ordinary least squares estimator (OLS, 最小自乗推定量) of $\beta$ is given by:

$$\hat{\beta} = (X'X)^{-1}X'y. \qquad (19)$$

Thus, the ordinary least squares estimator is derived in the matrix form.
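A minimal pure-Python sketch of (19) is given below. Rather than forming $(X'X)^{-1}$ explicitly, it solves the normal equations $X'X\hat{\beta} = X'y$ by Gaussian elimination, which yields the same $\hat{\beta}$ when $X'X$ is nonsingular. The helper names and the data (with $k = 3$ and a constant in the first column) are hypothetical.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    k = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, k):
            factor = M[r][col] / M[col][col]
            for c in range(col, k + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * k
    for i in range(k - 1, -1, -1):
        z[i] = (M[i][k] - sum(M[i][j] * z[j] for j in range(i + 1, k))) / M[i][i]
    return z


def ols(X, y):
    """OLS estimate: solution of the normal equations X'X beta_hat = X'y."""
    n, k = len(X), len(X[0])
    XtX = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
    Xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    return solve(XtX, Xty)


# Hypothetical data: constant term plus two regressors
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
X = [[1.0, a, b] for a, b in zip(x1, x2)]

beta_hat = ols(X, y)
e = [yi - sum(xij * bj for xij, bj in zip(row, beta_hat)) for row, yi in zip(X, y)]

# The first-order condition -2X'y + 2X'X beta_hat = 0 is equivalent to X'e = 0
Xte = [sum(X[i][a] * e[i] for i in range(len(y))) for a in range(3)]
print(beta_hat, Xte)
```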

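(*) Remark

The second order condition for minimization:

$$\frac{\partial^2 S(\hat{\beta})}{\partial \hat{\beta}\,\partial \hat{\beta}'} = 2X'X$$

is a positive definite matrix.

Set $c = Xd$.

For any $d \neq 0$, we have $c'c = d'X'Xd > 0$, provided that the columns of $X$ are linearly independent so that $c = Xd \neq 0$.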

Now, in order to obtain the properties of $\hat{\beta}$ such as mean, variance, distribution and so on, (19) is rewritten as follows:

$$\hat{\beta} = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + u) = (X'X)^{-1}X'X\beta + (X'X)^{-1}X'u = \beta + (X'X)^{-1}X'u. \qquad (20)$$

Taking the expectation on both sides of (20), we have the following:

$$E(\hat{\beta}) = E(\beta + (X'X)^{-1}X'u) = \beta + (X'X)^{-1}X'E(u) = \beta,$$

because of $E(u) = 0$ by the assumption of the error term $u_i$.

Thus, unbiasedness of $\hat{\beta}$ is shown.

The variance of $\hat{\beta}$ is obtained as:

$$V(\hat{\beta}) = E\big((\hat{\beta} - \beta)(\hat{\beta} - \beta)'\big) = E\big((X'X)^{-1}X'u\,((X'X)^{-1}X'u)'\big)$$

$$= E\big((X'X)^{-1}X'uu'X(X'X)^{-1}\big) = (X'X)^{-1}X'E(uu')X(X'X)^{-1}$$

$$= \sigma^2(X'X)^{-1}X'X(X'X)^{-1} = \sigma^2(X'X)^{-1}.$$

The first equality is the definition of variance in the case of a vector.

In the fifth equality, $E(uu') = \sigma^2 I_n$ is used, which implies that $E(u_i^2) = \sigma^2$ for all $i$ and $E(u_i u_j) = 0$ for $i \neq j$.

Remember that $u_1, u_2, \cdots, u_n$ are assumed to be mutually independently and identically distributed with mean zero and variance $\sigma^2$.

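Under the normality assumption on the error term $u$, it is known that the distribution of $\hat{\beta}$ is given by:

$$\hat{\beta} \sim N(\beta, \sigma^2(X'X)^{-1}).$$

Proof:

First, when $X \sim N(\mu, \Sigma)$, the moment-generating function, i.e., $\phi(\theta)$, is given by:

$$\phi(\theta) \equiv E\big(\exp(\theta'X)\big) = \exp\Big(\theta'\mu + \frac{1}{2}\theta'\Sigma\theta\Big)$$

$\theta_u$: $n \times 1$, $u$: $n \times 1$, $\theta_\beta$: $k \times 1$, $\hat{\beta}$: $k \times 1$

The moment-generating function of $u$, i.e., $\phi_u(\theta_u)$, is:

$$\phi_u(\theta_u) \equiv E\big(\exp(\theta_u'u)\big) = \exp\Big(\frac{\sigma^2}{2}\theta_u'\theta_u\Big),$$

which is the moment-generating function of $N(0, \sigma^2 I_n)$.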

The moment-generating function of $\hat{\beta}$, i.e., $\phi_\beta(\theta_\beta)$, is:

$$\phi_\beta(\theta_\beta) \equiv E\big(\exp(\theta_\beta'\hat{\beta})\big) = E\big(\exp(\theta_\beta'\beta + \theta_\beta'(X'X)^{-1}X'u)\big)$$

$$= \exp(\theta_\beta'\beta)\,E\big(\exp(\theta_\beta'(X'X)^{-1}X'u)\big) = \exp(\theta_\beta'\beta)\,\phi_u\big(\theta_\beta'(X'X)^{-1}X'\big)$$

$$= \exp(\theta_\beta'\beta)\exp\Big(\frac{\sigma^2}{2}\theta_\beta'(X'X)^{-1}\theta_\beta\Big) = \exp\Big(\theta_\beta'\beta + \frac{\sigma^2}{2}\theta_\beta'(X'X)^{-1}\theta_\beta\Big),$$

which is equivalent to the normal distribution with mean $\beta$ and variance $\sigma^2(X'X)^{-1}$.

Note that $\theta_u = X(X'X)^{-1}\theta_\beta$. QED

Taking the $j$th element of $\hat{\beta}$, its distribution is given by:

$$\hat{\beta}_j \sim N(\beta_j, \sigma^2 a_{jj}), \quad \text{i.e.,} \quad \frac{\hat{\beta}_j - \beta_j}{\sigma\sqrt{a_{jj}}} \sim N(0, 1),$$

where $a_{jj}$ denotes the $j$th diagonal element of $(X'X)^{-1}$.

Replacing $\sigma^2$ by its estimator $s^2$, we have the following t distribution:

$$\frac{\hat{\beta}_j - \beta_j}{s\sqrt{a_{jj}}} \sim t(n - k),$$

where $t(n - k)$ denotes the t distribution with $n - k$ degrees of freedom.
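As a consistency check, the following worked equation specializes $a_{jj}$ to the simple regression case, i.e., $k = 2$ with $x_{i,1} = 1$ and $x_{i,2} = x_i$. It is a sketch added for illustration and uses only the $2 \times 2$ inverse formula together with $n\sum x_i^2 - (\sum x_i)^2 = n\sum (x_i - \bar{x})^2$.

```latex
X'X = \begin{pmatrix} n & \sum_{i=1}^n x_i \\ \sum_{i=1}^n x_i & \sum_{i=1}^n x_i^2 \end{pmatrix},
\qquad
(X'X)^{-1} = \frac{1}{n\sum_{i=1}^n (x_i - \bar{x})^2}
\begin{pmatrix} \sum_{i=1}^n x_i^2 & -\sum_{i=1}^n x_i \\ -\sum_{i=1}^n x_i & n \end{pmatrix},
\qquad
a_{22} = \frac{n}{n\sum_{i=1}^n (x_i - \bar{x})^2} = \frac{1}{\sum_{i=1}^n (x_i - \bar{x})^2}.
```

Hence $\sigma^2 a_{22} = \sigma^2 / \sum_{i=1}^n (x_i - \bar{x})^2$, which agrees with the variance of the slope estimator in (15), and $n - k = n - 2$ agrees with the degrees of freedom used in the simple regression case.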

[Review] Trace (トレース):

1. $A$: $n \times n$, $\mathrm{tr}(A) = \sum_{i=1}^n a_{ii}$, where $a_{ij}$ denotes an element in the $i$th row and the $j$th column of a matrix $A$.
