DID in regression model
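• Even if the treatment of interest is binary, we sometimes use a regression model:
$Y = \beta_0 + \beta_1 \cdot \mathit{Treat} + \beta_2 \cdot \mathit{Post} + \beta_3 \cdot \mathit{Treat} \cdot \mathit{Post} + u$
where $\mathit{Treat} = 1$ if treatment group and $= 0$ if control group, and $\mathit{Post} = 1$ if after treatment and $= 0$ if before treatment.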
Treatment status by group and period:
                          Post = 0 (before)    Post = 1 (after)
Treat = 1 (treatment)     Non-treatment        Treatment
Treat = 0 (control)       Non-treatment        Non-treatment
DID in regression model
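• The first differences are
$E[Y \mid \mathit{Treat} = 1, \mathit{Post} = 1] - E[Y \mid \mathit{Treat} = 1, \mathit{Post} = 0] = \beta_2 + \beta_3$
$E[Y \mid \mathit{Treat} = 0, \mathit{Post} = 1] - E[Y \mid \mathit{Treat} = 0, \mathit{Post} = 0] = \beta_2$
• The DID is $\beta_3$.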
• Post can be interpreted as a time dummy, Treat as an entity dummy, and $\mathit{Treat} \cdot \mathit{Post}$ as the treatment.
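A minimal Python sketch of the DID idea on simulated data. The coefficient values, the sample size, and the function names are assumptions made for illustration only. In this two-group, two-period model the OLS estimate of the interaction coefficient coincides with the difference-in-differences of the four cell means, so the sketch works with the cell means directly.

```python
import random


def simulate_did_data(n_per_cell=2000, seed=0):
    """Simulate Y = b0 + b1*Treat + b2*Post + b3*Treat*Post + u (hypothetical values)."""
    rng = random.Random(seed)
    b0, b1, b2, b3 = 1.0, 2.0, 0.5, 3.0  # assumed true coefficients; b3 is the DID
    rows = []
    for treat in (0, 1):
        for post in (0, 1):
            for _ in range(n_per_cell):
                u = rng.gauss(0.0, 1.0)  # error term
                y = b0 + b1 * treat + b2 * post + b3 * treat * post + u
                rows.append((treat, post, y))
    return rows


def cell_mean(rows, treat, post):
    """Sample analogue of E[Y | Treat = treat, Post = post]."""
    ys = [y for t, p, y in rows if t == treat and p == post]
    return sum(ys) / len(ys)


def did_estimate(rows):
    """Difference of the two first differences (treatment group minus control group)."""
    diff_treated = cell_mean(rows, 1, 1) - cell_mean(rows, 1, 0)  # estimates b2 + b3
    diff_control = cell_mean(rows, 0, 1) - cell_mean(rows, 0, 0)  # estimates b2
    return diff_treated - diff_control  # estimates b3


rows = simulate_did_data()
print("DID estimate:", round(did_estimate(rows), 3))
```

The estimate should be close to the assumed interaction coefficient, with sampling noise that shrinks as `n_per_cell` grows.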
Econometrics
Instrumental Variables Approach
Keisuke Kawata
Hiroshima University
Plan of talk
1. Concepts of the instrumental variables approach
2. Instrumental variable regression with one endogenous variable and one instrumental variable
3. General instrumental variable regression
4. Test for the assumptions
5. Some notes
Reminder: The source of bias
• Suppose a population model: $Y = \beta_0 + \beta_1 T + u$
• If $E[u \mid T] \neq E[u \mid T']$, the OLS estimators are biased.
⇒ Even if the sample size is very large, the estimators do not converge to the true values.
• One source of bias is covariates. ⇒ How can it be eliminated?
Control variable approach:
⇒ The bias from omitted variables cannot be eliminated.
Fixed effect approach:
⇒ The bias from unobservable heterogeneous time trends cannot be eliminated.
⇒ Panel data are needed.
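The claim that a larger sample does not remove the bias can be illustrated with a small simulation. This is a sketch under assumed values: an unobserved variable (called `ability` here, a hypothetical name) affects both the explanatory variable and the outcome, and the true coefficient is set to 2. With these assumed values the OLS slope converges to 2 + 3 · Cov(T, ability) / Var(T) = 3.5, not to 2.

```python
import random


def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    return s_xy / s_xx


def simulate_omitted_variable(n, seed=0):
    """Y = 1 + 2*T + 3*ability + noise, where ability also drives T but is not observed."""
    rng = random.Random(seed)
    t, y = [], []
    for _ in range(n):
        ability = rng.gauss(0.0, 1.0)          # omitted variable (part of the error term)
        t_i = ability + rng.gauss(0.0, 1.0)    # explanatory variable, correlated with ability
        y_i = 1.0 + 2.0 * t_i + 3.0 * ability + rng.gauss(0.0, 1.0)
        t.append(t_i)
        y.append(y_i)
    return t, y


for n in (200, 20000):
    t, y = simulate_omitted_variable(n)
    print(f"n = {n:6d}: OLS slope = {ols_slope(t, y):.3f} (true coefficient is 2.0)")
```

Increasing `n` makes the estimate more precise, but it settles around the biased value rather than the true one.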
Decomposition of variations
(Reminder) Causal effects are estimated from the variation in the explanatory variables.
⇒ Generally, the variation can be decomposed by its source.
e.g.) The effect of education on income: Why did (could) you go on to university?
Variation of education level:
• "I'm a genius" ⇒ coming from cognitive ability
• "Good education environment" ⇒ coming from parent income
• "I'm a lucky man!!!!" ⇒ used for estimation
Idea of instrumental variable approach
OLS regression: the causal effects are estimated using all of the variation in the explanatory variables.
⇒ This variation may be correlated with variation in the error term (omitted variables).
⇒ The OLS estimators are biased.
Instrumental variable (IV) regression: tries to estimate the parameters using only the variation in the explanatory variables that is uncorrelated with the error term.
• We should isolate such variation from the rest ⇒ Using instrumental variables.
Instrumental variables (IV): have information about the movements in treatments that are uncorrelated with the error term.
Example of IV: Randomized education voucher
• We try to estimate the causal effect of education on income.
• Suppose that the government distributes education vouchers for tertiary education.
• Because of budget limitations, vouchers are distributed purely at random.
⇒ A voucher dummy (=1 if the household receives one, =0 if not) may be a good IV to estimate the effect of education.
• We can suppose that a voucher increases the probability that a child goes on to university. ← This variation is not correlated with other factors that have impacts on income, because vouchers are distributed purely at random.
IV estimator with a single regressor
• We try to estimate the population model:
$Y_i = \beta_0 + \beta_1 T_i + u_i$,
but there exists correlation between $T_i$ and $u_i$.
⇒ To remove the resulting bias, we should use an IV.
Endogenous variables: are correlated with the error term.
Exogenous variables: are uncorrelated with the error term.
⇒ IVs should be exogenous variables, not endogenous variables.
Conditions for a valid IV
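• A valid IV must satisfy two conditions (writing $Z_i$ for the IV, $T_i$ for the explanatory variable, and $u_i$ for the error term):
Instrument relevance: $\mathrm{corr}(Z_i, T_i) \neq 0$
⇒ The IV is relevant to the explanatory variable.
⇒ The IV can generate variation in $T_i$.
Instrument exogeneity: $\mathrm{corr}(Z_i, u_i) = 0$
⇒ The error term does not depend on the IV.
⇒ The error term does not depend on the variation in $T_i$ that comes from the IV.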
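Why these two conditions are enough can be seen in a short derivation. It is a sketch under assumed notation: $Z_i$ is the IV, $T_i$ the explanatory variable, $u_i$ the error term, and the population model is $Y_i = \beta_0 + \beta_1 T_i + u_i$.

```latex
\begin{align*}
\mathrm{Cov}(Z_i, Y_i)
  &= \mathrm{Cov}(Z_i,\ \beta_0 + \beta_1 T_i + u_i) \\
  &= \beta_1 \mathrm{Cov}(Z_i, T_i) + \mathrm{Cov}(Z_i, u_i) \\
  &= \beta_1 \mathrm{Cov}(Z_i, T_i)
     && \text{(instrument exogeneity: } \mathrm{Cov}(Z_i, u_i) = 0\text{)} \\
\Rightarrow\quad \beta_1
  &= \frac{\mathrm{Cov}(Z_i, Y_i)}{\mathrm{Cov}(Z_i, T_i)}
     && \text{(instrument relevance: } \mathrm{Cov}(Z_i, T_i) \neq 0\text{)}
\end{align*}
```

Exogeneity removes the term involving the error, and relevance guarantees that the final ratio is well defined.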
Two stage least squares estimation: 1st stage
• The coefficient $\beta_1$ can be estimated by the IV regression called two stage least squares (TSLS or 2SLS).
1st stage: using the standard OLS method, we estimate the following population model:
$T_i = \pi_0 + \pi_1 Z_i + v_i$
where $\pi_0$ is the constant term, $\pi_1$ is the coefficient of the IV $Z_i$, and $v_i$ is the error term.
⇒ Supposing $T_i$ is determined by the IV $Z_i$ and other factors $v_i$.
Note: From the assumption of instrument exogeneity,
• the correlation between $Z_i$ and $u_i$ must be zero.
• the correlation between $u_i$ and the other factors $v_i$ may be positive or negative.
Two stage least squares estimation: 2nd stage
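• Using the estimators $\hat{\pi}_0$ and $\hat{\pi}_1$, we can get the predictor of $T_i$ as $\hat{T}_i = \hat{\pi}_0 + \hat{\pi}_1 Z_i$.
2nd stage: Using the predicted value $\hat{T}_i$, we try to estimate the population model of interest as
$Y_i = \beta_0 + \beta_1 \hat{T}_i + u_i$
Point: From instrument exogeneity, $\mathrm{corr}(\hat{T}_i, u_i) = 0$.
⇒ If the assumption of pure-random sampling holds, the estimator of $\beta_1$ is consistent.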
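The two stages can be sketched in plain Python on simulated data in the spirit of the voucher example. Everything numeric here is an assumption for illustration: the voucher is assigned by a fair coin flip, an unobserved `ability` variable drives both education and income, and the true effect of education is set to 2. The helper names are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def ols(x, y):
    """Return (intercept, slope) of a simple OLS regression of y on x."""
    mean_x, mean_y = mean(x), mean(y)
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return mean_y - slope * mean_x, slope


def simulate_voucher_data(n=20000, seed=1):
    """Randomized voucher Z, education T, income Y; ability is unobserved."""
    rng = random.Random(seed)
    z, t, y = [], [], []
    for _ in range(n):
        voucher = 1.0 if rng.random() < 0.5 else 0.0   # IV: assigned purely at random
        ability = rng.gauss(0.0, 1.0)                   # omitted variable
        educ = 2.0 * voucher + 0.8 * ability + rng.gauss(0.0, 1.0)
        income = 1.0 + 2.0 * educ + 3.0 * ability + rng.gauss(0.0, 1.0)
        z.append(voucher)
        t.append(educ)
        y.append(income)
    return z, t, y


def two_stage_least_squares(z, t, y):
    """1st stage: regress T on Z. 2nd stage: regress Y on the predicted T."""
    pi0, pi1 = ols(z, t)                       # first-stage estimates
    t_hat = [pi0 + pi1 * z_i for z_i in z]     # predicted treatment
    return ols(t_hat, y)                       # (beta0_hat, beta1_hat)


z, t, y = simulate_voucher_data()
print("OLS slope :", round(ols(t, y)[1], 3))
print("TSLS slope:", round(two_stage_least_squares(z, t, y)[1], 3))
```

With a single instrument the second-stage slope is numerically the same as the ratio Cov(Z, Y) / Cov(Z, T). Note that running the second stage by hand gives the right point estimate, but the standard errors from that second OLS run are not the correct TSLS standard errors; dedicated IV routines adjust for this.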
General IV estimator
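• Generally, the IV population model can be defined as
$Y_i = \beta_0 + \beta_1 T_i + \gamma_1 X_{1i} + \dots + \gamma_L X_{Li} + u_i$,
$T_i = \pi_0 + \pi_1 Z_{1i} + \dots + \pi_h Z_{hi} + \delta_1 X_{1i} + \dots + \delta_L X_{Li} + v_i$,
where $X_{1i}, \dots, X_{Li}$ are control variables and $Z_{1i}, \dots, Z_{hi}$ are the IVs.
• The instrument exogeneity can be rewritten as
$E[u_i \mid Z_{1i}, \dots, Z_{hi}, X_{1i}, \dots, X_{Li}] = E[u_i \mid Z_{1i}', \dots, Z_{hi}', X_{1i}, \dots, X_{Li}]$
• Given the control variables, the IVs are uncorrelated with the error term of the outcome equation.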
Graphical Intuition
(Figure: causal diagram with the nodes IV, Treatment, Outcome, and Omitted variables; the link from Treatment to Outcome is labeled "Causal effect".)
Summarize: IV regression with one regressor
The assumptions in IV regression ($Z_i$: IV, $T_i$: endogenous variable, $u_i$: error term)
1. Your data are pure-randomly drawn.
2. The mean of the error term is zero: $E[u_i] = 0$.
3. Instrument relevance: $\mathrm{corr}(Z_i, T_i) \neq 0$.
4. Instrument exogeneity: $\mathrm{corr}(Z_i, u_i) = 0$.
• If the above assumptions hold, the estimators $\hat{\beta}$
– are consistent (the bias vanishes in large samples).
– have normal distributions under a large sample size.
Problem of Instrument relevance
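• If the assumption of instrument relevance, $\mathrm{corr}(Z_i, T_i) \neq 0$ (with $Z_i$ the IV and $T_i$ the endogenous variable), does not hold
⇒ We cannot get estimators of the IV regression.
• If the correlation between the IV and the endogenous variable is very small
⇒ The IV is a weak instrument.
⇒ The distribution of the estimators cannot be approximated by the normal distribution.
⇒ Standard methods of statistical testing (and confidence intervals) cannot be used.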
Check for Instrument relevance
Rule of thumb for checking for weak instruments
• You don't need to worry about weak instruments if the first-stage regression F-statistic exceeds 10.
• What do you do if you have weak IVs?
1. You have a small number of strong IVs and many weak IVs
⇒ You will be better off using only the most relevant subset for your TSLS.
2. You have only weak IVs, or the coefficients are exactly identified
⇒ You should find additional, stronger IVs ← Difficult.
⇒ You should use a method other than TSLS.
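A first-stage F-statistic can be computed by hand when there is a single instrument, because it then equals the squared t-statistic of the instrument's coefficient. The sketch below uses the homoskedasticity-only formula on simulated data. The instrument strengths, the noise level, and the threshold of 10 (a commonly cited rule of thumb) are assumptions for illustration.

```python
import random

RULE_OF_THUMB = 10.0  # commonly cited threshold for the first-stage F (assumption here)


def first_stage_f(z, t):
    """Homoskedasticity-only F for H0: pi1 = 0 in T = pi0 + pi1*Z + v (one instrument)."""
    n = len(z)
    mean_z = sum(z) / n
    mean_t = sum(t) / n
    s_zz = sum((a - mean_z) ** 2 for a in z)
    pi1 = sum((a - mean_z) * (b - mean_t) for a, b in zip(z, t)) / s_zz
    pi0 = mean_t - pi1 * mean_z
    ssr = sum((b - pi0 - pi1 * a) ** 2 for a, b in zip(z, t))  # sum of squared residuals
    se_pi1 = (ssr / (n - 2) / s_zz) ** 0.5                      # standard error of pi1
    return (pi1 / se_pi1) ** 2                                  # F = t^2 with one instrument


def simulate_first_stage(strength, n=2000, seed=2):
    """Binary instrument Z and treatment T = strength*Z + noise (hypothetical values)."""
    rng = random.Random(seed)
    z = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(n)]
    t = [strength * z_i + rng.gauss(0.0, 1.3) for z_i in z]
    return z, t


for label, strength in (("strong", 2.0), ("weak", 0.02)):
    f_stat = first_stage_f(*simulate_first_stage(strength))
    verdict = "above" if f_stat > RULE_OF_THUMB else "not above"
    print(f"{label:6s} instrument: F = {f_stat:.1f} ({verdict} the rule-of-thumb threshold)")
```

With several instruments the F-statistic tests all instrument coefficients jointly, which requires a multiple regression rather than this one-regressor shortcut.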
Question
• True/False questions. Pure-random sampling of the data should be supposed.
1. The instrumental variables should be correlated with not only the endogenous variables but also the error terms.
2. Good control variables are also good instrumental variables.
Control variable or Instrumental variable?
• Should a variable be used as a control variable or as an instrumental variable?
The condition for a good control variable: it should be correlated with the explanatory variables and with the outcome.
The condition for a good instrumental variable: it should be correlated with the explanatory variables but not correlated with the outcome through any other channel.
⇒ It should be uncorrelated with the error term.
Conclusion
• Using the instrumental variable approach, we can eliminate the bias coming from the correlation between explanatory variables and error terms.
• The instrumental variables should be correlated with the endogenous variables but uncorrelated with the error term.
• You may test the validity of instrumental variables in overidentified cases, but descriptive justifications are still necessary.