(1)

Econometrics I

(Thur., 8:50-10:20)

Room # 4 ( 法経講義棟 )

• The prerequisites of this class are Basic Statistics ( 統計基礎 ) (by Prof. Fukushige, Tue., 16:20-17:50, this semester) and Econometrics ( エコノメトリックス ) (undergraduate level, next semester, 『計量経済学』山本 拓 著,新世社 ).

• You should also register for Special Lectures in Economics (Statistical Analysis), 経済学特論(統計解析) (by Prof. Oya, Wed., 10:30-12:00, this semester).

(2)

TA Session (by Mr. Yonekura and Mr. Sakamoto):

Tue., 14:40 - 16:10

Room # 505 ( 法経大学院総合研究棟 )

Content: Basic Statistics, Matrix Algebra, etc.

(3)

1 Regression Analysis ( 回帰分析 )

1.1 Setup of the Model

When $(x_1, y_1), (x_2, y_2), \cdots, (x_n, y_n)$ are available, suppose that there is a linear relationship between $y$ and $x$, i.e.,

$$y_i = \beta_1 + \beta_2 x_i + u_i, \tag{1}$$

for $i = 1, 2, \cdots, n$. $x_i$ and $y_i$ denote the $i$th observations.

−→ Single (or simple) regression model ( 単回帰モデル )

$y_i$ is called the dependent variable ( 従属変数 ) or the explained variable ( 被説明変数 ), while $x_i$ is known as the independent variable ( 独立変数 ) or the explanatory (or explaining) variable ( 説明変数 ).

(4)

β 1 = Intercept ( 切片 ), β 2 = Slope ( 傾き )

β 1 and β 2 are unknown parameters ( パラメータ,母数 ) to be estimated.

β 1 and β 2 are called the regression coefficients ( 回帰係数 ).

u i is the unobserved error term ( 誤差項 ) assumed to be a random variable with mean zero and variance σ 2 .

σ 2 is also a parameter to be estimated.

x i is assumed to be nonstochastic ( 非確率的 ), but y i is stochastic ( 確率的 ) because y i depends on the error u i .

The error terms $u_1, u_2, \cdots, u_n$ are assumed to be mutually independently and identically distributed, which is called iid. −→ Discussed later.

(5)

Taking the expectation on both sides of (1), the expectation of y i is represented as:

E(y i ) = E( β 1 + β 2 x i + u i ) = β 1 + β 2 x i + E(u i )

= β 1 + β 2 x i , (2)

for i = 1 , 2 , · · · , n. Using E(y i ) we can rewrite (1) as y i = E(y i ) + u i . (2) represents the true regression line.

Let $\hat{\beta}_1$ and $\hat{\beta}_2$ be estimates of $\beta_1$ and $\beta_2$.

Replacing $\beta_1$ and $\beta_2$ by $\hat{\beta}_1$ and $\hat{\beta}_2$, (1) turns out to be:

$$y_i = \hat{\beta}_1 + \hat{\beta}_2 x_i + e_i, \tag{3}$$

for $i = 1, 2, \cdots, n$, where $e_i$ is called the residual ( 残差 ).

The residual $e_i$ is taken as the experimental value (or realization) of $u_i$.

(6)

We define $\hat{y}_i$ as follows:

$$\hat{y}_i = \hat{\beta}_1 + \hat{\beta}_2 x_i, \tag{4}$$

for $i = 1, 2, \cdots, n$, which is interpreted as the predicted value ( 予測値 ) of $y_i$.

(4) indicates the estimated regression line, which is different from (2).

Moreover, using $\hat{y}_i$ we can rewrite (3) as $y_i = \hat{y}_i + e_i$. (2) and (4) are displayed in Figure 1.

Consider the case of n = 6 for simplicity. × indicates the observed data series.

The true regression line (2) is represented by the solid line, while the estimated regression line (4) is drawn with the dotted line.

(7)

Figure 1. True and Estimated Regression Lines ( 回帰直線 )

[Figure: scatter plot with horizontal axis $x$ and vertical axis $y$. × marks the observed data points $(x_i, y_i)$. Two lines are drawn: the true regression line $E(y_i) = \beta_1 + \beta_2 x_i$ and the estimated regression line $\hat{y}_i = \hat{\beta}_1 + \hat{\beta}_2 x_i$. For one data point the error $u_i$ and the residual $e_i$ are marked, and a label points to the distributions of the errors.]

In the next section, we consider how to obtain the estimates of $\beta_1$ and $\beta_2$, i.e., $\hat{\beta}_1$ and $\hat{\beta}_2$.

(8)

1.2 Ordinary Least Squares Estimation

Suppose that (x 1 , y 1 ), (x 2 , y 2 ), · · · , (x n , y n ) are available.

For the regression model (1), we consider estimating β 1 and β 2 .

Replacing $\beta_1$ and $\beta_2$ by their estimates $\hat{\beta}_1$ and $\hat{\beta}_2$, remember that the residual $e_i$ is given by:

$$e_i = y_i - \hat{y}_i = y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i.$$

The sum of squared residuals is defined as follows:

$$S(\hat{\beta}_1, \hat{\beta}_2) = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i)^2.$$

It might be plausible to choose the $\hat{\beta}_1$ and $\hat{\beta}_2$ which minimize the sum of squared residuals, i.e., $S(\hat{\beta}_1, \hat{\beta}_2)$. This is the method of ordinary least squares ( 最小二乗法 ).

(9)

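To minimize $S(\hat{\beta}_1, \hat{\beta}_2)$ with respect to $\hat{\beta}_1$ and $\hat{\beta}_2$, we set the partial derivatives equal to zero:

$$\frac{\partial S(\hat{\beta}_1, \hat{\beta}_2)}{\partial \hat{\beta}_1} = -2 \sum_{i=1}^{n} (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i) = 0,$$

$$\frac{\partial S(\hat{\beta}_1, \hat{\beta}_2)}{\partial \hat{\beta}_2} = -2 \sum_{i=1}^{n} x_i (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i) = 0.$$

The second order condition for minimization is that

$$\begin{pmatrix} \dfrac{\partial^2 S(\hat{\beta}_1, \hat{\beta}_2)}{\partial \hat{\beta}_1^2} & \dfrac{\partial^2 S(\hat{\beta}_1, \hat{\beta}_2)}{\partial \hat{\beta}_1 \partial \hat{\beta}_2} \\ \dfrac{\partial^2 S(\hat{\beta}_1, \hat{\beta}_2)}{\partial \hat{\beta}_2 \partial \hat{\beta}_1} & \dfrac{\partial^2 S(\hat{\beta}_1, \hat{\beta}_2)}{\partial \hat{\beta}_2^2} \end{pmatrix} = \begin{pmatrix} 2n & 2\sum_{i=1}^{n} x_i \\ 2\sum_{i=1}^{n} x_i & 2\sum_{i=1}^{n} x_i^2 \end{pmatrix}$$

should be a positive definite matrix.

The diagonal elements $2n$ and $2\sum_{i=1}^{n} x_i^2$ are positive.

The determinant:

$$\begin{vmatrix} 2n & 2\sum_{i=1}^{n} x_i \\ 2\sum_{i=1}^{n} x_i & 2\sum_{i=1}^{n} x_i^2 \end{vmatrix} = 4n \sum_{i=1}^{n} x_i^2 - 4\Bigl(\sum_{i=1}^{n} x_i\Bigr)^2 = 4n \sum_{i=1}^{n} (x_i - \bar{x})^2$$

is positive. $\Longrightarrow$ The second-order condition is satisfied.

(10)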

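The first two equations yield the following two equations:

$$\bar{y} = \hat{\beta}_1 + \hat{\beta}_2 \bar{x}, \tag{5}$$

$$\sum_{i=1}^{n} x_i y_i = n\bar{x}\hat{\beta}_1 + \hat{\beta}_2 \sum_{i=1}^{n} x_i^2, \tag{6}$$

where $\bar{y} = \dfrac{1}{n}\sum_{i=1}^{n} y_i$ and $\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$.

Multiplying (5) by $n\bar{x}$ and subtracting it from (6), we can derive $\hat{\beta}_2$ as follows:

$$\hat{\beta}_2 = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}. \tag{7}$$

From (5), $\hat{\beta}_1$ is directly obtained as follows:

$$\hat{\beta}_1 = \bar{y} - \hat{\beta}_2 \bar{x}. \tag{8}$$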

(11)

When the observed values are taken for $y_i$ and $x_i$ for $i = 1, 2, \cdots, n$, $\hat{\beta}_1$ and $\hat{\beta}_2$ are called the ordinary least squares estimates (or simply the least squares estimates, 最小二乗推定値 ) of $\beta_1$ and $\beta_2$.

When $y_i$ for $i = 1, 2, \cdots, n$ are regarded as the random sample, $\hat{\beta}_1$ and $\hat{\beta}_2$ are called the ordinary least squares estimators (or the least squares estimators, 最小二乗推定量 ) of $\beta_1$ and $\beta_2$.
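As a rough illustration of how the estimates could be computed from observed data, the following Python sketch evaluates the slope formula (7) and then the intercept via (5). The function name `ols_simple` and the six data points are hypothetical, chosen only for this example.

```python
def ols_simple(x, y):
    """Return (b1_hat, b2_hat) for the simple regression y_i = b1 + b2 * x_i + u_i."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator and denominator of the slope formula (7)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2_hat = sxy / sxx
    # Intercept from (5): y_bar = b1_hat + b2_hat * x_bar
    b1_hat = y_bar - b2_hat * x_bar
    return b1_hat, b2_hat


# Hypothetical data with n = 6
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

b1_hat, b2_hat = ols_simple(x, y)
residuals = [yi - b1_hat - b2_hat * xi for xi, yi in zip(x, y)]
```

By construction the residuals satisfy the two first-order conditions: they sum to zero and are orthogonal to the $x_i$.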

1.3 Properties of Least Squares Estimator

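Equation (7) is rewritten as:

$$\begin{aligned} \hat{\beta}_2 &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x}) y_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2} - \frac{\bar{y}\sum_{i=1}^{n} (x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \\ &= \sum_{i=1}^{n} \frac{x_i - \bar{x}}{\sum_{j=1}^{n} (x_j - \bar{x})^2}\, y_i = \sum_{i=1}^{n} \omega_i y_i. \end{aligned} \tag{9}$$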

(12)

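In the third equality, $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$ is utilized because of $\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$. In the fourth equality, $\omega_i$ is defined as:

$$\omega_i = \frac{x_i - \bar{x}}{\sum_{j=1}^{n} (x_j - \bar{x})^2}.$$

$\omega_i$ is nonstochastic because $x_i$ is assumed to be nonstochastic.

$\omega_i$ has the following properties:

$$\sum_{i=1}^{n} \omega_i = \sum_{i=1}^{n} \frac{x_i - \bar{x}}{\sum_{j=1}^{n} (x_j - \bar{x})^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = 0, \tag{10}$$

$$\sum_{i=1}^{n} \omega_i x_i = \sum_{i=1}^{n} \omega_i (x_i - \bar{x}) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = 1, \tag{11}$$

$$\sum_{i=1}^{n} \omega_i^2 = \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{\sum_{j=1}^{n} (x_j - \bar{x})^2} \right)^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{\bigl(\sum_{i=1}^{n} (x_i - \bar{x})^2\bigr)^2} = \frac{1}{\sum_{i=1}^{n} (x_i - \bar{x})^2}. \tag{12}$$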

The first equality of (11) comes from (10).
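The three properties (10), (11) and (12) can be checked numerically. The sketch below is only an illustration: the helper `ols_weights` and the values of `x` are made up for the example.

```python
def ols_weights(x):
    """Return the weights w_i = (x_i - x_bar) / sum_j (x_j - x_bar)^2 and the denominator."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [(xi - x_bar) / sxx for xi in x], sxx


# Hypothetical nonstochastic regressor values
x = [1.0, 2.0, 4.0, 7.0, 11.0]
w, sxx = ols_weights(x)

sum_w = sum(w)                                    # property (10): equals 0
sum_wx = sum(wi * xi for wi, xi in zip(w, x))     # property (11): equals 1
sum_w2 = sum(wi ** 2 for wi in w)                 # property (12): equals 1 / sxx
```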

(13)

From now on, we focus only on ˆ β 2 , because usually β 2 is more important than β 1 in the regression model (1).

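In order to obtain the properties of the least squares estimator $\hat{\beta}_2$, we rewrite (9) as:

$$\begin{aligned} \hat{\beta}_2 &= \sum_{i=1}^{n} \omega_i y_i = \sum_{i=1}^{n} \omega_i (\beta_1 + \beta_2 x_i + u_i) \\ &= \beta_1 \sum_{i=1}^{n} \omega_i + \beta_2 \sum_{i=1}^{n} \omega_i x_i + \sum_{i=1}^{n} \omega_i u_i = \beta_2 + \sum_{i=1}^{n} \omega_i u_i. \end{aligned} \tag{13}$$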

In the fourth equality of (13), (10) and (11) are utilized.

(14)

[Review] Random Variables:

Let $X_1, X_2, \cdots, X_n$ be $n$ random variables, which are mutually independently and identically distributed.

mutually independent $\Longrightarrow$ $f(x_i, x_j) = f_i(x_i) f_j(x_j)$ for $i \ne j$.

$f(x_i, x_j)$ denotes a joint distribution of $X_i$ and $X_j$. $f_i(x)$ indicates a marginal distribution of $X_i$.

identical $\Longrightarrow$ $f_i(x) = f_j(x)$ for $i \ne j$.

[End of Review]

(15)

[Review] Mean and Variance:

Let $X$ and $Y$ be random variables (continuous type), which are independently distributed.

Definition and Formulas:

• $E(g(X)) = \displaystyle\int g(x) f(x)\,dx$ for a function $g(\cdot)$ and a density function $f(\cdot)$.

• $V(X) = E((X - \mu)^2) = \displaystyle\int (x - \mu)^2 f(x)\,dx$ for $\mu = E(X)$.

• $E(aX + b) = aE(X) + b$ and $V(aX + b) = V(aX) = a^2 V(X)$ for constant $a$ and $b$.

• $E(X \pm Y) = E(X) \pm E(Y)$ and $V(X \pm Y) = V(X) + V(Y)$.

[End of Review]

(16)

Mean and Variance of $\hat{\beta}_2$: $u_1, u_2, \cdots, u_n$ are assumed to be mutually independently and identically distributed with mean zero and variance $\sigma^2$, but they are not necessarily normal.

Remember that we do not need the normality assumption to obtain the mean and variance, but the normality assumption is required to test a hypothesis.

From (13), the expectation of $\hat{\beta}_2$ is derived as follows:

$$E(\hat{\beta}_2) = E\Bigl(\beta_2 + \sum_{i=1}^{n} \omega_i u_i\Bigr) = \beta_2 + E\Bigl(\sum_{i=1}^{n} \omega_i u_i\Bigr) = \beta_2 + \sum_{i=1}^{n} \omega_i E(u_i) = \beta_2. \tag{14}$$

It is shown from (14) that the ordinary least squares estimator ˆ β 2 is an unbiased estimator of β 2 .

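From (13), the variance of $\hat{\beta}_2$ is computed as:

$$\begin{aligned} V(\hat{\beta}_2) &= V\Bigl(\beta_2 + \sum_{i=1}^{n} \omega_i u_i\Bigr) = V\Bigl(\sum_{i=1}^{n} \omega_i u_i\Bigr) = \sum_{i=1}^{n} V(\omega_i u_i) = \sum_{i=1}^{n} \omega_i^2 V(u_i) \\ &= \sigma^2 \sum_{i=1}^{n} \omega_i^2 = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}. \end{aligned} \tag{15}$$

(17)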

The third equality holds because u 1 , u 2 , · · · , u n are mutually independent.

The last equality comes from (12).

Thus, E( ˆ β 2 ) and V( ˆ β 2 ) are given by (14) and (15).
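A small Monte Carlo experiment can make (14) and (15) concrete. The sketch below is an illustration under assumed values: the parameters, the regressor values and the function `simulate_b2` are all hypothetical, and the errors are drawn from a uniform distribution (mean zero, variance $\sigma^2$) to stress that normality is not needed for the mean and variance.

```python
import random


def simulate_b2(x, beta1, beta2, sigma, reps, seed=0):
    """Draw `reps` samples of the OLS slope with iid non-normal errors."""
    rng = random.Random(seed)
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    w = [(xi - x_bar) / sxx for xi in x]      # weights from (9)
    a = sigma * 3 ** 0.5                      # uniform(-a, a) has variance a^2 / 3 = sigma^2
    draws = []
    for _ in range(reps):
        y = [beta1 + beta2 * xi + rng.uniform(-a, a) for xi in x]
        draws.append(sum(wi * yi for wi, yi in zip(w, y)))
    return draws, sxx


# Assumed true values: beta1 = 1, beta2 = 2, sigma = 1, fixed x = 1, ..., 10
x = [float(i) for i in range(1, 11)]
draws, sxx = simulate_b2(x, beta1=1.0, beta2=2.0, sigma=1.0, reps=20000)

mean_b2 = sum(draws) / len(draws)
var_b2 = sum((d - mean_b2) ** 2 for d in draws) / (len(draws) - 1)
theory_var = 1.0 ** 2 / sxx                   # right-hand side of (15)
```

The simulated mean should be close to the assumed $\beta_2$ and the simulated variance close to `theory_var`, up to simulation noise.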

(18)

[Review] Three Good Properties on Estimator:

θ : Parameter

θ ˆ : Estimator of θ , i.e., ˆ θ = θ ˆ (X 1 , X 2 , · · · , X n ),

where X 1 , X 2 , · · · , X n are mutually independent random variables.

(*) Estimate of θ : ˆ θ = θ ˆ (x 1 , x 2 , · · · , x n ), where x i denotes the observed data of X i .

• Unbiasedness ( 不偏性 ): E(ˆ θ ) = θ .

• Efficiency ( 有効性 ):

The minimum variance estimator within all the unbiased estimators.

(*) It is not easy to check efficiency in general. Instead, consider the best linear unbiased estimator (BLUE, 最良線型不偏推定量 ).

• Consistency ( 一致性 ): ˆ θ −→ θ as n −→ ∞ . Note that ˆ θ depends on # of obs.

[End of Review]

(19)

Gauss-Markov Theorem ( ガウス・マルコフ定理 ): It has been discussed above that ˆ β 2 is represented as (9), which implies that ˆ β 2 is a linear estimator, i.e., linear in y i .

In addition, (14) indicates that ˆ β 2 is an unbiased estimator.

Therefore, summarizing these two facts, it is shown that ˆ β 2 is a linear unbiased estimator ( 線形不偏推定量 ).

Furthermore, here we show that ˆ β 2 has minimum variance within a class of the linear unbiased estimators.

Consider the alternative linear unbiased estimator $\tilde{\beta}_2$ as follows:

$$\tilde{\beta}_2 = \sum_{i=1}^{n} c_i y_i = \sum_{i=1}^{n} (\omega_i + d_i) y_i,$$

where $c_i = \omega_i + d_i$ is defined and $d_i$ is nonstochastic.

(20)

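Then, $\tilde{\beta}_2$ is transformed into:

$$\begin{aligned} \tilde{\beta}_2 &= \sum_{i=1}^{n} c_i y_i = \sum_{i=1}^{n} (\omega_i + d_i)(\beta_1 + \beta_2 x_i + u_i) \\ &= \beta_1 \sum_{i=1}^{n} \omega_i + \beta_2 \sum_{i=1}^{n} \omega_i x_i + \sum_{i=1}^{n} \omega_i u_i + \beta_1 \sum_{i=1}^{n} d_i + \beta_2 \sum_{i=1}^{n} d_i x_i + \sum_{i=1}^{n} d_i u_i \\ &= \beta_2 + \beta_1 \sum_{i=1}^{n} d_i + \beta_2 \sum_{i=1}^{n} d_i x_i + \sum_{i=1}^{n} \omega_i u_i + \sum_{i=1}^{n} d_i u_i. \end{aligned}$$

Equations (10) and (11) are used in the fourth equality.

Taking the expectation on both sides of the above equation, we obtain:

$$\begin{aligned} E(\tilde{\beta}_2) &= \beta_2 + \beta_1 \sum_{i=1}^{n} d_i + \beta_2 \sum_{i=1}^{n} d_i x_i + \sum_{i=1}^{n} \omega_i E(u_i) + \sum_{i=1}^{n} d_i E(u_i) \\ &= \beta_2 + \beta_1 \sum_{i=1}^{n} d_i + \beta_2 \sum_{i=1}^{n} d_i x_i. \end{aligned}$$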

(21)

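Since $\tilde{\beta}_2$ is assumed to be unbiased, we need the following conditions:

$$\sum_{i=1}^{n} d_i = 0, \qquad \sum_{i=1}^{n} d_i x_i = 0.$$

When these conditions hold, we can rewrite $\tilde{\beta}_2$ as:

$$\tilde{\beta}_2 = \beta_2 + \sum_{i=1}^{n} (\omega_i + d_i) u_i.$$

The variance of $\tilde{\beta}_2$ is derived as:

$$\begin{aligned} V(\tilde{\beta}_2) &= V\Bigl(\beta_2 + \sum_{i=1}^{n} (\omega_i + d_i) u_i\Bigr) = V\Bigl(\sum_{i=1}^{n} (\omega_i + d_i) u_i\Bigr) = \sum_{i=1}^{n} V\bigl((\omega_i + d_i) u_i\bigr) \\ &= \sum_{i=1}^{n} (\omega_i + d_i)^2 V(u_i) = \sigma^2 \Bigl(\sum_{i=1}^{n} \omega_i^2 + 2\sum_{i=1}^{n} \omega_i d_i + \sum_{i=1}^{n} d_i^2\Bigr) \\ &= \sigma^2 \Bigl(\sum_{i=1}^{n} \omega_i^2 + \sum_{i=1}^{n} d_i^2\Bigr). \end{aligned}$$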

(22)

From unbiasedness of $\tilde{\beta}_2$, using $\sum_{i=1}^{n} d_i = 0$ and $\sum_{i=1}^{n} d_i x_i = 0$, we obtain:

$$\sum_{i=1}^{n} \omega_i d_i = \frac{\sum_{i=1}^{n} (x_i - \bar{x}) d_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sum_{i=1}^{n} x_i d_i - \bar{x}\sum_{i=1}^{n} d_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = 0,$$

which is utilized to obtain the variance of $\tilde{\beta}_2$ in the third line of the above equation.

From (15), the variance of $\hat{\beta}_2$ is given by: $V(\hat{\beta}_2) = \sigma^2 \sum_{i=1}^{n} \omega_i^2$. Therefore, we have:

$$V(\tilde{\beta}_2) \ge V(\hat{\beta}_2),$$

because of $\sum_{i=1}^{n} d_i^2 \ge 0$.

When $\sum_{i=1}^{n} d_i^2 = 0$, i.e., when $d_1 = d_2 = \cdots = d_n = 0$, we have the equality: $V(\tilde{\beta}_2) = V(\hat{\beta}_2)$.

(23)

As shown above, the least squares estimator $\hat{\beta}_2$ gives us the minimum variance linear unbiased estimator ( 最小分散線形不偏推定量 ), or equivalently the best linear unbiased estimator ( 最良線形不偏推定量, BLUE), which is called the Gauss-Markov theorem ( ガウス・マルコフ定理 ).
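The variance comparison in the argument above can be reproduced with concrete numbers. In the sketch below, all values are hypothetical: a perturbation $d_i$ satisfying $\sum d_i = 0$ and $\sum d_i x_i = 0$ is built by taking an arbitrary vector `z` and removing its fitted line on $x$ (one convenient construction, not the only one), and the two variances are then compared.

```python
def detrend(z, x):
    """Residuals of z after regressing on (1, x); they sum to zero and are orthogonal to x."""
    n = len(x)
    x_bar = sum(x) / n
    z_bar = sum(z) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z)) / sxx
    a = z_bar - b * x_bar
    return [zi - a - b * xi for xi, zi in zip(x, z)]


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
z = [1.0, 0.0, 0.0, 0.0, 0.0, 1.0]   # arbitrary vector that is not a linear function of x
d = detrend(z, x)

x_bar = sum(x) / len(x)
sxx = sum((xi - x_bar) ** 2 for xi in x)
w = [(xi - x_bar) / sxx for xi in x]  # OLS weights

sigma2 = 1.0                          # assumed error variance
var_ols = sigma2 * sum(wi ** 2 for wi in w)
var_alt = sigma2 * sum((wi + di) ** 2 for wi, di in zip(w, d))
```

The gap `var_alt - var_ols` equals $\sigma^2 \sum d_i^2$, so any nonzero admissible $d_i$ strictly increases the variance.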

(24)

Asymptotic Properties ( 漸近的性質 ) of $\hat{\beta}_2$: We assume that as $n$ goes to infinity we have the following:

$$\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 \longrightarrow m < \infty,$$

where $m$ is a constant value. From (12), we obtain:

$$n \sum_{i=1}^{n} \omega_i^2 = \frac{1}{(1/n)\sum_{i=1}^{n} (x_i - \bar{x})^2} \longrightarrow \frac{1}{m}.$$

Note that $f(x_n) \longrightarrow f(m)$ when $x_n \longrightarrow m$, called Slutsky's theorem ( スルツキー定理 ), where $m$ is a constant value and $f(\cdot)$ is a continuous function.

We show both consistency ( 一致性 ) of $\hat{\beta}_2$ and asymptotic normality ( 漸近正規性 ) of $\sqrt{n}(\hat{\beta}_2 - \beta_2)$.

(25)

● First, we prove that ˆ β 2 is a consistent estimator of β 2 .

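[Review] Chebyshev's inequality ( チェビシェフの不等式 ) is given by:

$$P(|X - \mu| > \epsilon) \le \frac{\sigma^2}{\epsilon^2},$$

where $\mu = E(X)$, $\sigma^2 = V(X)$ and any $\epsilon > 0$.

[End of Review]

Replace $X$, $E(X)$ and $V(X)$ by:

$$\hat{\beta}_2, \qquad E(\hat{\beta}_2) = \beta_2, \qquad \text{and} \qquad V(\hat{\beta}_2) = \sigma^2 \sum_{i=1}^{n} \omega_i^2 = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}.$$

Then, when $n \longrightarrow \infty$, we obtain the following result:

$$P(|\hat{\beta}_2 - \beta_2| > \epsilon) \le \frac{\sigma^2 \sum_{i=1}^{n} \omega_i^2}{\epsilon^2} = \frac{\sigma^2\, n \sum_{i=1}^{n} \omega_i^2}{n \epsilon^2} \longrightarrow 0,$$

where $\sum_{i=1}^{n} \omega_i^2 \longrightarrow 0$ because $n \sum_{i=1}^{n} \omega_i^2 \longrightarrow \dfrac{1}{m}$ from the assumption.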

Thus, we obtain the result that ˆ β 2 −→ β 2 as n −→ ∞ .

Therefore, we can conclude that ˆ β 2 is a consistent estimator ( 一致推定量 ) of β 2 .

(26)

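● Next, we want to show that $\sqrt{n}(\hat{\beta}_2 - \beta_2)$ is asymptotically normal.

[Review] The Central Limit Theorem ( 中心極限定理 , CLT) is: for random variables $X_1, X_2, \cdots, X_n$,

$$\frac{\bar{X} - E(\bar{X})}{\sqrt{V(\bar{X})}} = \frac{\sum_{i=1}^{n} X_i - E\bigl(\sum_{i=1}^{n} X_i\bigr)}{\sqrt{V\bigl(\sum_{i=1}^{n} X_i\bigr)}} \longrightarrow N(0, 1), \quad \text{as } n \longrightarrow \infty,$$

where $\bar{X} = \dfrac{1}{n}\sum_{i=1}^{n} X_i$.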

X 1 , X 2 , · · · , X n are not necessarily iid, if V(X) is finite as n goes to infinity.

[End of Review]

(27)

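Note that $\hat{\beta}_2 = \beta_2 + \sum_{i=1}^{n} \omega_i u_i$ as in (13), and $X_i$ is replaced by $\omega_i u_i$. From the central limit theorem, asymptotic normality is shown as follows:

$$\frac{\sum_{i=1}^{n} \omega_i u_i - E\bigl(\sum_{i=1}^{n} \omega_i u_i\bigr)}{\sqrt{V\bigl(\sum_{i=1}^{n} \omega_i u_i\bigr)}} = \frac{\sum_{i=1}^{n} \omega_i u_i}{\sigma\sqrt{\sum_{i=1}^{n} \omega_i^2}} = \frac{\hat{\beta}_2 - \beta_2}{\sigma \big/ \sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}} \longrightarrow N(0, 1),$$

where

• $E\bigl(\sum_{i=1}^{n} \omega_i u_i\bigr) = 0$,

• $V\bigl(\sum_{i=1}^{n} \omega_i u_i\bigr) = \sigma^2 \sum_{i=1}^{n} \omega_i^2$, and

• $\sum_{i=1}^{n} \omega_i u_i = \hat{\beta}_2 - \beta_2$

are substituted in the first and second equalities.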

(28)

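Moreover, we can rewrite as follows:

$$\frac{\hat{\beta}_2 - \beta_2}{\sigma \big/ \sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}} = \frac{\sqrt{n}(\hat{\beta}_2 - \beta_2)}{\sigma \big/ \sqrt{(1/n)\sum_{i=1}^{n} (x_i - \bar{x})^2}}.$$

Replacing $(1/n)\sum_{i=1}^{n} (x_i - \bar{x})^2$ by its converged value $m$, we have:

$$\frac{\sqrt{n}(\hat{\beta}_2 - \beta_2)}{\sigma / \sqrt{m}} \longrightarrow N(0, 1),$$

which implies

$$\sqrt{n}(\hat{\beta}_2 - \beta_2) \longrightarrow N\Bigl(0, \frac{\sigma^2}{m}\Bigr).$$

Thus, the asymptotic normality of $\sqrt{n}(\hat{\beta}_2 - \beta_2)$ is shown.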

(29)

Finally, replacing $\sigma^2$ by its consistent estimator $s^2$, it is known that:

$$\frac{\hat{\beta}_2 - \beta_2}{s \big/ \sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}} \longrightarrow N(0, 1), \tag{16}$$

where $s^2$ is defined as:

$$s^2 = \frac{1}{n - 2}\sum_{i=1}^{n} e_i^2 = \frac{1}{n - 2}\sum_{i=1}^{n} (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i)^2, \tag{17}$$

which is a consistent and unbiased estimator of $\sigma^2$. −→ Proved later.

Thus, using (16), in large sample we can construct the confidence interval and test the hypothesis.

(30)

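[Review] Confidence Interval ( 信頼区間,区間推定 ):

Suppose $X_1, X_2, \cdots, X_n$ are iid with mean $\mu$ and variance $\sigma^2$. −→ No normality assumption is needed. From CLT,

$$\frac{\bar{X} - E(\bar{X})}{\sqrt{V(\bar{X})}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \longrightarrow N(0, 1).$$

Replacing $\sigma^2$ by $S^2 = \dfrac{1}{n - 1}\sum_{i=1}^{n} (X_i - \bar{X})^2$, we have: $\dfrac{\bar{X} - \mu}{S / \sqrt{n}} \longrightarrow N(0, 1)$.

That is, for large $n$,

$$P\Bigl(-1.96 < \frac{\bar{X} - \mu}{S / \sqrt{n}} < 1.96\Bigr) = 0.95, \quad \text{i.e.,} \quad P\Bigl(\bar{X} - 1.96\frac{S}{\sqrt{n}} < \mu < \bar{X} + 1.96\frac{S}{\sqrt{n}}\Bigr) = 0.95.$$

Note that 1.96 is obtained from the normal distribution table.

Then, replacing the estimators $\bar{X}$ and $S^2$ by the estimates $\bar{x}$ and $s^2$, we obtain the 95% confidence interval of $\mu$ as follows:

$$\Bigl(\bar{x} - 1.96\frac{s}{\sqrt{n}},\ \bar{x} + 1.96\frac{s}{\sqrt{n}}\Bigr).$$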

(31)

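Going back to OLS, we have:

$$\frac{\hat{\beta}_2 - \beta_2}{s \big/ \sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}} \longrightarrow N(0, 1).$$

Therefore,

$$P\Biggl(-2.576 < \frac{\hat{\beta}_2 - \beta_2}{s \big/ \sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}} < 2.576\Biggr) = 0.99,$$

i.e.,

$$P\Biggl(\hat{\beta}_2 - 2.576\frac{s}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}} < \beta_2 < \hat{\beta}_2 + 2.576\frac{s}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}}\Biggr) = 0.99.$$

Note that 2.576 is the upper 0.005 (0.5%) point of $N(0, 1)$, which comes from the statistical table.

Thus, the 99% confidence interval of $\beta_2$ is:

$$\Biggl(\hat{\beta}_2 - 2.576\frac{s}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}},\ \hat{\beta}_2 + 2.576\frac{s}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}}\Biggr),$$

where $\hat{\beta}_2$ and $s^2$ should be evaluated with the observed data.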
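The interval above can be computed in a few lines. The following sketch uses (7), (8), (17) and the large-sample result (16); the function name and the data are hypothetical, and with only six observations the normal approximation is purely illustrative. The critical value is taken from Python's standard-library `statistics.NormalDist` instead of a printed table.

```python
from statistics import NormalDist


def slope_confidence_interval(x, y, level=0.99):
    """Large-sample confidence interval for the slope, based on (16) and (17)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b1 = y_bar - b2 * x_bar
    # s^2 from (17): sum of squared residuals divided by n - 2
    s2 = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    se = (s2 / sxx) ** 0.5
    # Two-sided critical value; about 2.576 when level = 0.99
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return b2, b2 - z * se, b2 + z * se


# Hypothetical data
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
b2, lower, upper = slope_confidence_interval(x, y)
```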

(32)

[Review] Testing the Hypothesis ( 仮説検定 ):

Suppose that $X_1, X_2, \cdots, X_n$ are iid with mean $\mu$ and variance $\sigma^2$. From CLT, $\dfrac{\bar{X} - \mu}{S / \sqrt{n}} \longrightarrow N(0, 1)$, where $S^2 = \dfrac{1}{n - 1}\sum_{i=1}^{n} (X_i - \bar{X})^2$, which is known as the unbiased estimator of $\sigma^2$.

• The null hypothesis $H_0: \mu = \mu_0$, where $\mu_0$ is a fixed number.

• The alternative hypothesis $H_1: \mu \ne \mu_0$

Under the null hypothesis, in large sample we have the following distribution:

$$\frac{\bar{X} - \mu_0}{S / \sqrt{n}} \sim N(0, 1).$$

Replacing $\bar{X}$ and $S^2$ by $\bar{x}$ and $s^2$, compare $\dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$ and $N(0, 1)$.

$H_0$ is rejected at significance level 0.05 when $\left|\dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}\right| > 1.96$.

(33)

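In the case of OLS, the hypotheses are as follows:

• The null hypothesis $H_0: \beta_2 = \beta_2^*$, where $\beta_2^*$ is a fixed number.

• The alternative hypothesis $H_1: \beta_2 \ne \beta_2^*$

Under $H_0$, in large sample,

$$\frac{\hat{\beta}_2 - \beta_2^*}{s \big/ \sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}} \sim N(0, 1).$$

Replacing $\hat{\beta}_2$ and $s^2$ by the observed data, compare $\dfrac{\hat{\beta}_2 - \beta_2^*}{s \big/ \sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$ and $N(0, 1)$.

$H_0$ is rejected at significance level 0.05 when $\left|\dfrac{\hat{\beta}_2 - \beta_2^*}{s \big/ \sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}}\right| > 1.96$.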
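The decision rule can be written as a short function. This is a sketch under assumptions: the helper `slope_z_test` is hypothetical, and the slope estimate and its standard error passed to it are made-up numbers standing in for values computed from data.

```python
from statistics import NormalDist


def slope_z_test(b2_hat, se, beta2_null, alpha=0.05):
    """Two-sided large-sample test of H0: beta2 = beta2_null.

    se is the estimated standard error s / sqrt(sum (x_i - x_bar)^2).
    Returns the test statistic and whether H0 is rejected.
    """
    z_stat = (b2_hat - beta2_null) / se
    critical = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 when alpha = 0.05
    return z_stat, abs(z_stat) > critical


# Hypothetical estimate and standard error
z0, reject_zero = slope_z_test(b2_hat=1.977, se=0.040, beta2_null=0.0)
z2, reject_two = slope_z_test(b2_hat=1.977, se=0.040, beta2_null=2.0)
```

With these illustrative numbers the hypothesis of a zero slope is rejected, while a slope of 2 is not.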

(34)

Exact Distribution of $\hat{\beta}_2$: We have shown asymptotic normality of $\sqrt{n}(\hat{\beta}_2 - \beta_2)$, which is one of the large sample properties.

Now, we discuss the small sample properties of $\hat{\beta}_2$.

In order to obtain the distribution of $\hat{\beta}_2$ in small sample, the distribution of the error term has to be assumed.

Therefore, the extra assumption is that $u_i \sim N(0, \sigma^2)$.

Writing (13) again, $\hat{\beta}_2$ is represented as:

$$\hat{\beta}_2 = \beta_2 + \sum_{i=1}^{n} \omega_i u_i.$$

First, we obtain the distribution of the second term in the above equation.

(35)

[Review] Content of Special Lectures in Economics (Statistical Analysis)

Note that the moment-generating function ( 積率母関数 , MGF) is given by $M(\theta) \equiv E(\exp(\theta X)) = \exp(\mu\theta + \frac{1}{2}\sigma^2\theta^2)$ when $X \sim N(\mu, \sigma^2)$.

$X_1, X_2, \cdots, X_n$ are mutually independently distributed as $X_i \sim N(\mu_i, \sigma_i^2)$ for $i = 1, 2, \cdots, n$.

MGF of $X_i$ is $M_i(\theta) \equiv E(\exp(\theta X_i)) = \exp(\mu_i\theta + \frac{1}{2}\sigma_i^2\theta^2)$.

Consider the distribution of $Y = \sum_{i=1}^{n} (a_i + b_i X_i)$, where $a_i$ and $b_i$ are constant.

$$\begin{aligned} M_y(\theta) &\equiv E(\exp(\theta Y)) = E\Bigl(\exp\Bigl(\theta \sum_{i=1}^{n} (a_i + b_i X_i)\Bigr)\Bigr) \\ &= \prod_{i=1}^{n} \exp(\theta a_i) E(\exp(\theta b_i X_i)) = \prod_{i=1}^{n} \exp(\theta a_i) M_i(\theta b_i) \\ &= \prod_{i=1}^{n} \exp(\theta a_i) \exp\Bigl(\mu_i \theta b_i + \frac{1}{2}\sigma_i^2 (\theta b_i)^2\Bigr) = \exp\Bigl(\theta \sum_{i=1}^{n} (a_i + b_i \mu_i) + \frac{1}{2}\theta^2 \sum_{i=1}^{n} b_i^2 \sigma_i^2\Bigr), \end{aligned}$$

which implies that $Y \sim N\Bigl(\sum_{i=1}^{n} (a_i + b_i \mu_i),\ \sum_{i=1}^{n} b_i^2 \sigma_i^2\Bigr)$.

[End of Review]

(36)

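Substitute $a_i = 0$, $\mu_i = 0$, $b_i = \omega_i$ and $\sigma_i^2 = \sigma^2$. Then, using the moment-generating function, $\sum_{i=1}^{n} \omega_i u_i$ is distributed as:

$$\sum_{i=1}^{n} \omega_i u_i \sim N\Bigl(0,\ \sigma^2 \sum_{i=1}^{n} \omega_i^2\Bigr).$$

Therefore, $\hat{\beta}_2$ is distributed as:

$$\hat{\beta}_2 = \beta_2 + \sum_{i=1}^{n} \omega_i u_i \sim N\Bigl(\beta_2,\ \sigma^2 \sum_{i=1}^{n} \omega_i^2\Bigr),$$

or equivalently,

$$\frac{\hat{\beta}_2 - \beta_2}{\sigma\sqrt{\sum_{i=1}^{n} \omega_i^2}} = \frac{\hat{\beta}_2 - \beta_2}{\sigma \big/ \sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}} \sim N(0, 1),$$

for any $n$.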

(37)

[Review 1]   t Distribution:

$Z \sim N(0, 1)$, $V \sim \chi^2(k)$, and $Z$ is independent of $V$. Then, $\dfrac{Z}{\sqrt{V / k}} \sim t(k)$.

[End of Review 1]

[Review 2]   t Distribution:

Suppose that $X_1, X_2, \cdots, X_n$ are mutually independently, identically and normally distributed with mean $\mu$ and variance $\sigma^2$.

$\bar{X} \sim N(\mu, \sigma^2 / n)$, i.e., $\dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}} \sim N(0, 1)$.

Define $S^2 = \dfrac{1}{n - 1}\sum_{i=1}^{n} (X_i - \bar{X})^2$, which is an unbiased estimator of $\sigma^2$. It is known that $\dfrac{(n - 1)S^2}{\sigma^2} \sim \chi^2(n - 1)$ and $\bar{X}$ is independent of $S^2$. (The proof is skipped.)

(38)

Then, we obtain

$$\frac{\dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}}{\sqrt{\dfrac{(n - 1)S^2}{\sigma^2} \Big/ (n - 1)}} = \frac{\bar{X} - \mu}{S / \sqrt{n}} \sim t(n - 1).$$

As a result, replacing $\sigma^2$ by $S^2$, $\dfrac{\bar{X} - \mu}{S / \sqrt{n}} \sim t(n - 1)$.

[End of Review 2]

(39)

Back to OLS:

Replacing $\sigma^2$ by its estimator $s^2$ defined in (17), it is known that we have:

$$\frac{\hat{\beta}_2 - \beta_2}{s \big/ \sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}} \sim t(n - 2),$$

where $t(n - 2)$ denotes the $t$ distribution with $n - 2$ degrees of freedom.

Thus, under the normality assumption on the error term $u_i$, the $t(n - 2)$ distribution is used for the confidence interval and the hypothesis test in small sample.

Or, taking the square on both sides,

$$\Biggl(\frac{\hat{\beta}_2 - \beta_2}{s \big/ \sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}}\Biggr)^2 \sim F(1, n - 2),$$

which will be proved later.

Before going to the multiple regression model ( 重回帰モデル ), we review some formulas of matrix algebra.

(40)

2 Some Formulas of Matrix Algebra

1. Let $A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1k} \\ a_{21} & a_{22} & \cdots & a_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ a_{l1} & a_{l2} & \cdots & a_{lk} \end{pmatrix} = [a_{ij}]$,

which is an $l \times k$ matrix, where $a_{ij}$ denotes the element in the $i$th row and $j$th column of $A$.

The transposed matrix ( 転置行列 ) of $A$, denoted by $A'$, is defined as:

$$A' = \begin{pmatrix} a_{11} & a_{21} & \cdots & a_{l1} \\ a_{12} & a_{22} & \cdots & a_{l2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1k} & a_{2k} & \cdots & a_{lk} \end{pmatrix} = [a_{ji}],$$

where the $i$th row of $A'$ is the $i$th column of $A$.

(41)

2. $(Ax)' = x'A'$,

where $A$ and $x$ are an $l \times k$ matrix and a $k \times 1$ vector, respectively.

3. $a' = a$,

where $a$ denotes a scalar.

4. $\dfrac{\partial a'x}{\partial x} = a$,

where $a$ and $x$ are $k \times 1$ vectors.

5. $\dfrac{\partial x'Ax}{\partial x} = (A + A')x$,

where $A$ and $x$ are a $k \times k$ matrix and a $k \times 1$ vector, respectively.

Especially, when $A$ is symmetric,

$$\frac{\partial x'Ax}{\partial x} = 2Ax.$$

(42)

6. Let $A$ and $B$ be $k \times k$ matrices, and $I_k$ be a $k \times k$ identity matrix ( 単位行列 ) (one in the diagonal elements and zero in the other elements).

When $AB = I_k$, $B$ is called the inverse matrix ( 逆行列 ) of $A$, denoted by $B = A^{-1}$.

That is, $AA^{-1} = A^{-1}A = I_k$.

7. Let $A$ be a $k \times k$ matrix and $x$ be a $k \times 1$ vector.

If $A$ is a positive definite matrix ( 正値定符号行列 ), for any $x$ except for $x = 0$ we have:

$$x'Ax > 0.$$

If $A$ is a positive semidefinite matrix ( 非負値定符号行列 ), for any $x$ except for $x = 0$ we have:

$$x'Ax \ge 0.$$

(43)

If $A$ is a negative definite matrix ( 負値定符号行列 ), for any $x$ except for $x = 0$ we have:

$$x'Ax < 0.$$

If $A$ is a negative semidefinite matrix ( 非正値定符号行列 ), for any $x$ except for $x = 0$ we have:

$$x'Ax \le 0.$$

Trace, Rank, etc.: $A$: $k \times k$, $B$: $n \times k$, $C$: $k \times n$.

1. The trace ( トレース ) of $A$ is: $\mathrm{tr}(A) = \sum_{i=1}^{k} a_{ii}$, where $A = [a_{ij}]$.

2. The rank ( ランク,階数 ) of $A$ is the maximum number of linearly independent column (or row) vectors of $A$, which is denoted by $\mathrm{rank}(A)$.

(44)

3. If $A$ is an idempotent matrix ( べき等行列 ), $A = A^2$.

4. If $A$ is an idempotent and symmetric matrix, $A = A^2 = A'A$.

5. If $A$ is idempotent, the eigenvalues of $A$ consist of 1 and 0; for a symmetric matrix $A$ the converse also holds.

6. If $A$ is idempotent, $\mathrm{rank}(A) = \mathrm{tr}(A)$.

7. $\mathrm{tr}(BC) = \mathrm{tr}(CB)$

Distributions in Matrix Form:

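1. Let $X$, $\mu$ and $\Sigma$ be $k \times 1$, $k \times 1$ and $k \times k$ matrices.

When $X \sim N(\mu, \Sigma)$, the density function of $X$ is given by:

$$f(x) = \frac{1}{(2\pi)^{k/2} |\Sigma|^{1/2}} \exp\Bigl(-\frac{1}{2}(x - \mu)'\Sigma^{-1}(x - \mu)\Bigr).$$

(45)

$E(X) = \mu$ and $V(X) = E\bigl((X - \mu)(X - \mu)'\bigr) = \Sigma$

The moment-generating function: $\phi(\theta) = E\bigl(\exp(\theta'X)\bigr) = \exp(\theta'\mu + \frac{1}{2}\theta'\Sigma\theta)$

(*) In the univariate case, when $X \sim N(\mu, \sigma^2)$, the density function of $X$ is:

$$f(x) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\Bigl(-\frac{1}{2\sigma^2}(x - \mu)^2\Bigr).$$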

2. If $X \sim N(\mu, \Sigma)$, then $(X - \mu)'\Sigma^{-1}(X - \mu) \sim \chi^2(k)$.

Note that $X'X \sim \chi^2(k)$ when $X \sim N(0, I_k)$.

3. $X$: $n \times 1$, $Y$: $m \times 1$, $X \sim N(\mu_x, \Sigma_x)$, $Y \sim N(\mu_y, \Sigma_y)$

$X$ is independent of $Y$, i.e., $E\bigl((X - \mu_x)(Y - \mu_y)'\bigr) = 0$ in the case of normal random variables. Then,

$$\frac{(X - \mu_x)'\Sigma_x^{-1}(X - \mu_x) / n}{(Y - \mu_y)'\Sigma_y^{-1}(Y - \mu_y) / m} \sim F(n, m).$$

(46)

4. If $X \sim N(0, \sigma^2 I_n)$ and $A$ is a symmetric idempotent $n \times n$ matrix of rank $G$, then $\dfrac{X'AX}{\sigma^2} \sim \chi^2(G)$.

Note that $X'AX = (AX)'(AX)$ and $\mathrm{rank}(A) = \mathrm{tr}(A)$ because $A$ is idempotent.

5. If $X \sim N(0, \sigma^2 I_n)$, $A$ and $B$ are symmetric idempotent $n \times n$ matrices of rank $G$ and $K$, and $AB = 0$, then

$$\frac{X'AX}{G\sigma^2} \Big/ \frac{X'BX}{K\sigma^2} = \frac{X'AX / G}{X'BX / K} \sim F(G, K).$$

(47)

3 Multiple Regression Model ( 重回帰モデル )

Up to now, only one independent variable, i.e., x i , is taken into the regression model.

We extend it to more independent variables, which is called the multiple regression model ( 重回帰モデル ).

We consider the following regression model:

$$y_i = \beta_1 x_{i,1} + \beta_2 x_{i,2} + \cdots + \beta_k x_{i,k} + u_i = (x_{i,1}, x_{i,2}, \cdots, x_{i,k}) \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix} + u_i = x_i \beta + u_i,$$

for $i = 1, 2, \cdots, n$, where $x_i$ and $\beta$ denote a $1 \times k$ vector of the independent variables and a $k \times 1$ vector of the unknown parameters to be estimated, which are given by:

$$x_i = (x_{i,1}, x_{i,2}, \cdots, x_{i,k}), \qquad \beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix}.$$

(48)

$x_{i,j}$ denotes the $i$th observation of the $j$th independent variable.

The case of k = 2 and x i , 1 = 1 for all i is exactly equivalent to (1).

Therefore, the matrix form above is a generalization of (1).

Writing all the equations for $i = 1, 2, \cdots, n$, we have:

$$\begin{aligned} y_1 &= \beta_1 x_{1,1} + \beta_2 x_{1,2} + \cdots + \beta_k x_{1,k} + u_1 = x_1 \beta + u_1, \\ y_2 &= \beta_1 x_{2,1} + \beta_2 x_{2,2} + \cdots + \beta_k x_{2,k} + u_2 = x_2 \beta + u_2, \\ &\ \ \vdots \\ y_n &= \beta_1 x_{n,1} + \beta_2 x_{n,2} + \cdots + \beta_k x_{n,k} + u_n = x_n \beta + u_n, \end{aligned}$$

(49)

which is rewritten as:

$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,k} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,k} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,k} \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix} + \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} \beta + \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}.$$

Again, the above equation is compactly rewritten as:

$$y = X\beta + u, \tag{18}$$

(50)

where $y$, $X$ and $u$ are denoted by:

$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad X = \begin{pmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,k} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,k} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,k} \end{pmatrix} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \quad u = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}.$$

Utilizing the matrix form (18), we derive the ordinary least squares estimator of $\beta$, denoted by $\hat{\beta}$.

In (18), replacing $\beta$ by $\hat{\beta}$, we have the following equation:

$$y = X\hat{\beta} + e,$$

where $e$ denotes an $n \times 1$ vector of the residuals.

The $i$th element of $e$ is given by $e_i$.

(51)

The sum of squared residuals is written as follows:

$$\begin{aligned} S(\hat{\beta}) &= \sum_{i=1}^{n} e_i^2 = e'e = (y - X\hat{\beta})'(y - X\hat{\beta}) = (y' - \hat{\beta}'X')(y - X\hat{\beta}) \\ &= y'y - y'X\hat{\beta} - \hat{\beta}'X'y + \hat{\beta}'X'X\hat{\beta} = y'y - 2y'X\hat{\beta} + \hat{\beta}'X'X\hat{\beta}. \end{aligned}$$

In the last equality, note that $\hat{\beta}'X'y = y'X\hat{\beta}$ because both are scalars.

To minimize $S(\hat{\beta})$ with respect to $\hat{\beta}$, we set the first derivative of $S(\hat{\beta})$ equal to zero, i.e.,

$$\frac{\partial S(\hat{\beta})}{\partial \hat{\beta}} = -2X'y + 2X'X\hat{\beta} = 0.$$

Solving the equation above with respect to $\hat{\beta}$, the ordinary least squares estimator (OLS, 最小自乗推定量 ) of $\beta$ is given by:

$$\hat{\beta} = (X'X)^{-1}X'y. \tag{19}$$

Thus, the ordinary least squares estimator is derived in the matrix form.
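The matrix formula (19) can be evaluated with NumPy. The sketch below is illustrative only: the data are hypothetical, the first column of `X` is a column of ones so that the model reduces to (1) with $k = 2$, and the normal equations $X'X\hat{\beta} = X'y$ are solved directly with `np.linalg.solve` instead of forming the inverse, which is a common numerical choice and not something the derivation requires.

```python
import numpy as np


def ols(X, y):
    """Solve the normal equations X'X b = X'y, which gives b = (X'X)^{-1} X'y as in (19)."""
    return np.linalg.solve(X.T @ X, X.T @ y)


# Hypothetical data: k = 2 with a constant term
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])
X = np.column_stack([np.ones_like(x), x])

beta_hat = ols(X, y)
e = y - X @ beta_hat   # residual vector; the first-order condition implies X'e = 0
```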

(52)

(*) Remark

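The second order condition for minimization:

$$\frac{\partial^2 S(\hat{\beta})}{\partial \hat{\beta}\,\partial \hat{\beta}'} = 2X'X$$

is a positive definite matrix.

Set $c = Xd$.

For any $d \ne 0$, we have $c'c = d'X'Xd > 0$, given that the columns of $X$ are linearly independent so that $c \ne 0$.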

(53)

Now, in order to obtain the properties of $\hat{\beta}$ such as mean, variance, distribution and so on, (19) is rewritten as follows:

$$\begin{aligned} \hat{\beta} &= (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + u) = (X'X)^{-1}X'X\beta + (X'X)^{-1}X'u \\ &= \beta + (X'X)^{-1}X'u. \end{aligned} \tag{20}$$

Taking the expectation on both sides of (20), we have the following:

$$E(\hat{\beta}) = E(\beta + (X'X)^{-1}X'u) = \beta + (X'X)^{-1}X'E(u) = \beta,$$

because of $E(u) = 0$ by the assumption of the error term $u_i$.

Thus, unbiasedness of $\hat{\beta}$ is shown.

(54)

The variance of $\hat{\beta}$ is obtained as:

$$\begin{aligned} V(\hat{\beta}) &= E\bigl((\hat{\beta} - \beta)(\hat{\beta} - \beta)'\bigr) = E\bigl((X'X)^{-1}X'u\,((X'X)^{-1}X'u)'\bigr) \\ &= E\bigl((X'X)^{-1}X'uu'X(X'X)^{-1}\bigr) = (X'X)^{-1}X'E(uu')X(X'X)^{-1} \\ &= \sigma^2 (X'X)^{-1}X'X(X'X)^{-1} = \sigma^2 (X'X)^{-1}. \end{aligned}$$

The first equality is the definition of variance in the case of vector.

In the fifth equality, $E(uu') = \sigma^2 I_n$ is used, which implies that $E(u_i^2) = \sigma^2$ for all $i$ and $E(u_i u_j) = 0$ for $i \ne j$.

Remember that $u_1, u_2, \cdots, u_n$ are assumed to be mutually independently and identically distributed with mean zero and variance $\sigma^2$.

(55)

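Under the normality assumption on the error term $u$, it is known that the distribution of $\hat{\beta}$ is given by:

$$\hat{\beta} \sim N(\beta, \sigma^2 (X'X)^{-1}).$$

Proof:

First, when $X \sim N(\mu, \Sigma)$, the moment-generating function, i.e., $\phi(\theta)$, is given by:

$$\phi(\theta) \equiv E\bigl(\exp(\theta'X)\bigr) = \exp\Bigl(\theta'\mu + \frac{1}{2}\theta'\Sigma\theta\Bigr).$$

$\theta_u$: $n \times 1$, $u$: $n \times 1$, $\theta_\beta$: $k \times 1$, $\hat{\beta}$: $k \times 1$

The moment-generating function of $u$, i.e., $\phi_u(\theta_u)$, is:

$$\phi_u(\theta_u) \equiv E\bigl(\exp(\theta_u' u)\bigr) = \exp\Bigl(\frac{\sigma^2}{2}\theta_u'\theta_u\Bigr),$$

which is the moment-generating function of $N(0, \sigma^2 I_n)$.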

(56)

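The moment-generating function of $\hat{\beta}$, i.e., $\phi_\beta(\theta_\beta)$, is:

$$\begin{aligned} \phi_\beta(\theta_\beta) &\equiv E\bigl(\exp(\theta_\beta'\hat{\beta})\bigr) = E\bigl(\exp(\theta_\beta'\beta + \theta_\beta'(X'X)^{-1}X'u)\bigr) \\ &= \exp(\theta_\beta'\beta)\, E\bigl(\exp(\theta_\beta'(X'X)^{-1}X'u)\bigr) = \exp(\theta_\beta'\beta)\, \phi_u\bigl(X(X'X)^{-1}\theta_\beta\bigr) \\ &= \exp(\theta_\beta'\beta) \exp\Bigl(\frac{\sigma^2}{2}\theta_\beta'(X'X)^{-1}\theta_\beta\Bigr) = \exp\Bigl(\theta_\beta'\beta + \frac{\sigma^2}{2}\theta_\beta'(X'X)^{-1}\theta_\beta\Bigr), \end{aligned}$$

which is equivalent to the normal distribution with mean $\beta$ and variance $\sigma^2 (X'X)^{-1}$.

Note that $\theta_u = X(X'X)^{-1}\theta_\beta$, so that $\theta_u'\theta_u = \theta_\beta'(X'X)^{-1}\theta_\beta$. QED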

(57)

Taking the $j$th element of $\hat{\beta}$, its distribution is given by:

$$\hat{\beta}_j \sim N(\beta_j, \sigma^2 a_{jj}), \quad \text{i.e.,} \quad \frac{\hat{\beta}_j - \beta_j}{\sigma\sqrt{a_{jj}}} \sim N(0, 1),$$

where $a_{jj}$ denotes the $j$th diagonal element of $(X'X)^{-1}$.

Replacing $\sigma^2$ by its estimator $s^2$, we have the following $t$ distribution:

$$\frac{\hat{\beta}_j - \beta_j}{s\sqrt{a_{jj}}} \sim t(n - k),$$

where $t(n - k)$ denotes the $t$ distribution with $n - k$ degrees of freedom.
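To show how the standard errors $s\sqrt{a_{jj}}$ and the resulting $t$ ratios might be computed, here is a hedged NumPy sketch. It assumes that $s^2$ in the multiple regression is the residual sum of squares divided by $n - k$, the natural analogue of (17), which this section does not state explicitly. The function name, the data and the hypothesized values in `beta_null` are hypothetical.

```python
import numpy as np


def ols_t_ratios(X, y, beta_null):
    """Return OLS estimates, standard errors s * sqrt(a_jj), and t ratios."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    e = y - X @ beta_hat
    s2 = (e @ e) / (n - k)                 # assumed estimator of sigma^2 (analogue of (17))
    se = np.sqrt(s2 * np.diag(XtX_inv))    # a_jj is the jth diagonal element of (X'X)^{-1}
    t_ratios = (beta_hat - beta_null) / se
    return beta_hat, se, t_ratios


# Hypothetical data with a constant term and one regressor (k = 2)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])
X = np.column_stack([np.ones_like(x), x])

beta_hat, se, t_ratios = ols_t_ratios(X, y, beta_null=np.zeros(2))
```

Each ratio in `t_ratios` would then be compared with critical values of the $t(n - k)$ distribution, here with $n - k = 4$ degrees of freedom.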

(58)

[Review] Trace ( トレース ):

1. $A$: $n \times n$, $\mathrm{tr}(A) = \sum_{i=1}^{n} a_{ii}$, where $a_{ij}$ denotes an element in the $i$th row and the $j$th column of a matrix $A$.

2. $a$: scalar ($1 \times 1$), $\mathrm{tr}(a) = a$

3. $A$: $n \times k$, $B$: $k \times n$, $\mathrm{tr}(AB) = \mathrm{tr}(BA)$

4. $\mathrm{tr}(X(X'X)^{-1}X') = \mathrm{tr}((X'X)^{-1}X'X) = \mathrm{tr}(I_k) = k$

5. When $X$ is a square matrix of random variables, $E(\mathrm{tr}(AX)) = \mathrm{tr}(AE(X))$

[End of Review]
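Item 4 of the review can be verified numerically. In the sketch below the matrix `X` is randomly generated purely for illustration (the seed and the dimensions are arbitrary); the code checks that $P = X(X'X)^{-1}X'$ is symmetric and idempotent and that its trace and rank both equal $k$.

```python
import numpy as np

rng = np.random.default_rng(0)      # arbitrary seed, for reproducibility only
n, k = 8, 3                         # hypothetical dimensions
X = rng.normal(size=(n, k))         # a random n x k matrix, full column rank in practice

P = X @ np.linalg.inv(X.T @ X) @ X.T

is_symmetric = np.allclose(P, P.T)
is_idempotent = np.allclose(P @ P, P)
trace_P = np.trace(P)               # should equal k
rank_P = np.linalg.matrix_rank(P)   # should also equal k, since P is idempotent
```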
