Academic year: 2018

(1)

Econometrics

Linear Regression with Multiple Regressors

Keisuke Kawata

Hiroshima University

(2)

Linear Regression with Multiple Regressors

• In the matching approach, we often face the small-sample-size problem.

⇒ We use an alternative approach: linear regression with multiple regressors.

• In this approach, we assume that the conditional mean follows a linear function:

$E[Y \mid T, X_1, X_2, \dots, X_k] = \beta_0 + \beta_T T + \beta_1 X_1 + \dots + \beta_k X_k$

where $T$ is the treatment and $X_1, \dots, X_k$ are control variables.

⇒ We can estimate the parameters by least squares under a couple of assumptions.

(3)

Population model with multi-regressor

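• Suppose the (linear) population model:

$Y = \beta_0 + \beta_T T + \beta_1 X_1 + \dots + \beta_k X_k + u$

where $\beta_0, \beta_T, \beta_1, \dots, \beta_k$ are parameters, and $u$ captures the effects of other factors.

$\beta_T$ captures the causal effect of the treatment.

e.g.) Estimating the impact of education on income.

$\text{Income} = \beta_0 + \beta_T \times \text{Education} + u$ ⇒ $u$ includes the effects of all other factors.

$\text{Income} = \beta_0 + \beta_T \times \text{Education} + \beta_1 \times X_1 + u$ ⇒ $u$ does not include the effect of $X_1$.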

(4)

OLS estimators in multiple regression

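• As in the single-regressor case, the OLS estimators $\hat\beta_0, \hat\beta_T, \hat\beta_1, \dots, \hat\beta_k$ are chosen to minimize the following total squared gap:

$\sum_{i=1}^{n} (Y_i - \hat\beta_0 - \hat\beta_T T_i - \hat\beta_1 X_{1i} - \dots - \hat\beta_k X_{ki})^2$

• Using the estimated error terms (residuals), $\hat u_i = Y_i - \hat\beta_0 - \hat\beta_T T_i - \hat\beta_1 X_{1i} - \dots - \hat\beta_k X_{ki}$, the gap above can be rewritten as

$\sum_{i=1}^{n} \hat u_i^2$

⇒ OLS estimators minimize the total sum of squared estimated error terms.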
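The minimization described on this slide can be sketched numerically. The example below is only an illustration under stated assumptions: the data are simulated, the names `T`, `X1`, `X2` and the "true" coefficients are hypothetical, and `numpy.linalg.lstsq` is used as one convenient way to obtain the least squares solution.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

# Hypothetical data: one treatment T and two control variables X1, X2.
T = rng.normal(size=n)
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
u = rng.normal(size=n)
Y = 1.0 + 2.0 * T + 0.5 * X1 - 1.0 * X2 + u  # assumed true coefficients

# Design matrix: a column of ones for the intercept, then the regressors.
X = np.column_stack([np.ones(n), T, X1, X2])


def sum_sq(b):
    """Total squared gap between Y and the fitted line for coefficients b."""
    gap = Y - X @ b
    return float(gap @ gap)


# OLS estimates: the coefficient vector that minimizes sum_sq.
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
residuals = Y - X @ beta_hat
ssr = sum_sq(beta_hat)

# Moving any coefficient away from the OLS value increases the squared gap.
perturbed = beta_hat + np.array([0.0, 0.1, 0.0, 0.0])

print("OLS estimates:", beta_hat)
print("Squared gap at OLS estimates:", ssr)
print("Squared gap at perturbed coefficients:", sum_sq(perturbed))
```

The residuals are also orthogonal to every column of the design matrix, which is the first-order condition of this minimization.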

(5)

Least Squares Assumptions in multiple regression

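• If the following least squares assumptions hold, the OLS estimators

– are unbiased, and

– have a normal distribution when the sample size is large.

The least squares assumptions in multiple regression

1. Your data are pure random sampling data.

2. The mean of $u$ is zero: $E[u] = 0$.

3. For any $T$ and $T'$, $E[u \mid T, X_1, \dots, X_k] = E[u \mid T', X_1, \dots, X_k]$.

4. There is no perfect multicollinearity.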

(6)

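Interpretation: $E[u \mid T, X_1, \dots, X_k] = E[u \mid T', X_1, \dots, X_k]$

• $E[u \mid T, X_1, \dots, X_k] = E[u \mid T', X_1, \dots, X_k]$ is called conditional independence, which means that there are no covariates except for $X_1, \dots, X_k$.

If such covariates exist, $E[u \mid T, X_1, \dots, X_k] \neq E[u \mid T', X_1, \dots, X_k]$.

⇒ If all covariates are included in the data, conditional independence holds.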

(7)

e.g. The impacts of education on income

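• e.g.) Suppose that income depends only on education, gender, and luck.

– Education and gender are correlated.

– Luck is not correlated with education or gender.

• If you estimate $\text{Income} = \beta_0 + \beta_T \times \text{Education} + u$

where $u = \beta_1 \times \text{Gender} + \text{Luck}$

⇒ The OLS estimator has a bias, because $u$ includes the effects of gender.

• If you estimate $\text{Income} = \beta_0 + \beta_T \times \text{Education} + \beta_1 \times \text{Gender} + u$

where $u = \text{Luck}$

⇒ The OLS estimator has no bias, because $u$ includes only the effects of luck.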
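The education, gender, and luck example can be checked with a small simulation. This is a sketch under assumptions that are not in the slides: the numerical coefficients, the 0/1 coding of gender, and the helper `ols` are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Assumed data-generating process (all numbers are illustrative).
gender = rng.integers(0, 2, size=n).astype(float)      # 0/1 dummy
education = 12.0 + 2.0 * gender + rng.normal(size=n)   # correlated with gender
luck = rng.normal(size=n)                               # unrelated to both
income = 10.0 + 1.0 * education + 3.0 * gender + luck  # true education effect = 1.0


def ols(y, *regressors):
    """Return OLS coefficients (intercept first) of y on the given regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Regression without gender: the error term contains the effect of gender.
short = ols(income, education)

# Regression with gender: the error term contains only luck.
long = ols(income, education, gender)

print("Education coefficient without gender:", short[1])
print("Education coefficient with gender:   ", long[1])
```

Under these assumed values, the coefficient from the regression without gender converges to 1 + 3 × Cov(education, gender) / Var(education) = 1.75, while the regression that controls for gender recovers a value close to the true 1.0.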

(8)

e.g. The impacts of education on income

• Why can we relax the assumption about the error term?

• If you don't control for gender ⇒ You cannot distinguish the impact of education from the impact of gender, because education and gender co-move.

• If you control for gender ⇒ You can estimate the impact of education holding gender constant ⇒ You can distinguish the impact of education from the impact of gender.

(9)

Multicollinearity

• If there exists perfect multicollinearity between right-hand-side variables,

⇒ We cannot get the OLS estimators.

(10)

Multicollinearity

• If there exists a perfect linear relationship between explanatory variables, such as

$X_1 = a + b X_2$

where $a$ and $b$ are any values,

⇒ We cannot get the OLS estimators!

(11)

E.g.) The effects of birth year

You would like to estimate the effect of birth year on income.

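• Current age is correlated with birth year and may have an impact on income.

⇒ We should control for age, and then our population model is

$\text{Income} = \beta_0 + \beta_T \times \text{Birth year} + \beta_1 \times \text{Age} + u$

• If you can use only cross-section data, we cannot get the OLS estimators.

← The relationship between age and birth year is $\text{Age} = \text{Survey year} - \text{Birth year}$, and the survey year is the same for everyone.

⇒ There exists a perfect linear relationship!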
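The birth year example can be illustrated with a design matrix. This is a minimal sketch with simulated data; the survey year 2018 and the range of birth years are assumptions made only for the illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
survey_year = 2018.0  # assumed: a single cross section, same year for everyone

birth_year = rng.integers(1950, 2000, size=n).astype(float)
age = survey_year - birth_year  # exact linear function of birth year

# Design matrix: intercept, birth year, age.
X = np.column_stack([np.ones(n), birth_year, age])

# Three columns, but only two are linearly independent.
rank = np.linalg.matrix_rank(X)
print("columns:", X.shape[1], "rank:", rank)

# This combination of columns is identically zero:
# -survey_year * 1 + birth_year + age = 0 for every observation.
null_direction = np.array([-survey_year, 1.0, 1.0])

# Hence two different coefficient vectors give exactly the same fitted values,
# so the data cannot tell them apart and OLS has no unique solution.
b1 = np.array([5.0, 0.3, 0.1])  # arbitrary illustrative coefficients
b2 = b1 + 2.0 * null_direction
print("same fitted values:", np.allclose(X @ b1, X @ b2))
```

Because the squared gap is identical for `b1` and `b2`, the minimization that defines OLS cannot pick one of them over the other.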

(12)

Multicollinearity in Practice

• If there exists multicollinearity between $X_1$ and $X_2$, you cannot distinguish the effect of $X_1$ from the effect of $X_2$.

• You must drop $X_2$ (or $X_1$), and the coefficient of $X_1$ must then be interpreted as the effects of both $X_1$ and $X_2$.

e.g.) Birth year vs. age

If you drop age, the estimated coefficient of birth year captures the effect of an increase in birth year together with a decrease in age.

(13)

Multicollinearity in Practice

• If there exists multicollinearity between control variables ⇒ No serious problem.

← If you hold one variable constant by controlling for it, the other variable is automatically held constant.

• If there exists multicollinearity between explanatory variables ⇒ Serious problem. You should change ____.

(14)

The variance of OLS estimators

• The variance of the OLS estimator is large if

– The sample size is small.

– The variation of the regressor of interest, holding the other regressors constant, is small (if there is no variation, as under perfect multicollinearity, the variance is infinite).

– The number of regressors is large.
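The first two points can be illustrated with the textbook variance formula $\sigma^2 (X'X)^{-1}$. This formula assumes homoskedastic errors, which the slides do not state, so treat the sketch below as an illustration under that extra assumption; the function `var_beta_T` and its parameters are hypothetical names.

```python
import numpy as np


def var_beta_T(n, corr, sigma2=1.0, seed=0):
    """Variance of the OLS coefficient on T under homoskedastic errors.

    n      : sample size
    corr   : correlation between the treatment T and the control X1
    sigma2 : assumed variance of the error term
    """
    rng = np.random.default_rng(seed)
    X1 = rng.normal(size=n)
    # T shares a fraction of its variation with X1; the rest is independent.
    T = corr * X1 + np.sqrt(1.0 - corr**2) * rng.normal(size=n)
    X = np.column_stack([np.ones(n), T, X1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return float(cov[1, 1])


small_sample = var_beta_T(n=200, corr=0.0)
large_sample = var_beta_T(n=2_000, corr=0.0)
high_corr = var_beta_T(n=2_000, corr=0.95)

print("n = 200,  corr = 0.00:", small_sample)
print("n = 2000, corr = 0.00:", large_sample)
print("n = 2000, corr = 0.95:", high_corr)
```

The variance falls as the sample grows and rises sharply when little variation in $T$ remains after holding $X_1$ constant; as the correlation approaches 1 the matrix $X'X$ becomes singular and the variance is unbounded.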

(15)

Question

• True/False question. Suppose pure random sampling data.

1. If there exists correlation between the treatment and control variables, we cannot get unbiased OLS estimators.

2. If all covariates can be controlled, you can get unbiased OLS estimators.

3. If you can use repeated cross-section data (e.g. 1997 Bangladesh Household Survey), you can distinguish the effect of age from the effect of birth year.

4. To get the estimator of the causal effect of gender on income, you should control for years of education.

(16)

Bad control

• If a variable is determined by the explanatory variable, you should not control for it.

• The effect of $T$ through changes in $X$ should be interpreted as part of the causal effect of $T$.

• If you control for $X$, your OLS estimator is not equal to the causal effect.

[Figure: causal diagram; the surviving labels are (↑), (↑ or ↓), (↑), and "Other factors".]
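The bad control problem can be seen in a simulation where a variable $X$ is itself determined by the treatment. This is a sketch under assumed numbers: the randomized 0/1 treatment, the coefficients, and the helper `ols` are hypothetical and serve only as an illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000

# Assumed data-generating process (all numbers are illustrative).
T = rng.integers(0, 2, size=n).astype(float)   # randomly assigned treatment
X = 1.0 * T + rng.normal(size=n)               # X is determined by T
Y = 2.0 * T + 1.5 * X + rng.normal(size=n)     # T affects Y directly and via X

# Causal effect of T on Y: direct effect plus the effect through X.
total_effect = 2.0 + 1.5 * 1.0


def ols(y, *regressors):
    """Return OLS coefficients (intercept first) of y on the given regressors."""
    D = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    return coef


without_control = ols(Y, T)     # X left out of the regression
with_control = ols(Y, T, X)     # X included as a (bad) control

print("Causal effect of T:            ", total_effect)
print("Coefficient on T without X:    ", without_control[1])
print("Coefficient on T controlling X:", with_control[1])
```

Under these assumptions the regression without $X$ recovers the full causal effect of about 3.5, while controlling for $X$ leaves only the direct part of about 2.0.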

(17)

Guideline: Control variable

• What variables should we control for?

⇒ Variables that have impacts on both the explained and explanatory variables.

• What variables should we not control for?

⇒ Variables that are determined by the explanatory variable.

• Actually, there exists a trade-off:

– The benefit of control variables ⇒ Reducing the bias.

– The cost of control variables ⇒ Increasing the variance.

• If your sample size is large, you should control

• If your sample size is not large, you should not control any variables which have

(18)

The condition of good data

• Including the explained and explanatory variables of interest.

• Having a large sample size.

• Including

(19)

Conclusion

• Controlling for covariates is a strong tool for getting unbiased estimators.

• A large number of regressors reduces the efficiency of estimation. If the sample size is not large, you can control for only a few variables.

• You should pay attention to the multicollinearity problem.

Future issues

• If you cannot control for all covariates ⇒ Omitted variable bias still remains.

• What can we do? This is the main issue of the last part of this class.
