Econometrics
Regression Discontinuity Design
Keisuke Kawata IDEC
Brush-up: Omitted variable bias
• The most serious source of bias is covariates that affect both the treatment and the outcome.
• Observable covariates ⇒ should be controlled for.
• Unobservable covariates (omitted variables) ⇒ try to eliminate them by using IV and/or fixed-effects approaches.
⇒The requirements of these approaches are demanding (they require panel data and/or good IVs).
• We should have an alternative weapon to solve bias problems: regression discontinuity design (RDD).
Idea of RDD approach
• Suppose our interest is the effect of college education on income.
• Our data consist of high school and college graduates.
• One simple estimator is the sample difference between them.
←Totally problematic; there are many covariates.
• Alternatives are exact matching or OLS with control variables.
←Bias still remains due to omitted variables.
• Suppose that the society has a common entrance exam; all applicants must take the exam, and the exam score is observable.
• There is a threshold score required to enroll in college.
Idea of RDD approach: Sharp discontinuity
• The distribution of the college dummy against the exam score is as below.
⇒College enrollment is determined by the examination score.
[Figure: college dummy (0 or 1) plotted against exam score, with the threshold marked]
Idea of RDD approach: Sharp discontinuity
• The distribution of income against the exam score is as below.
• It is reasonable to assume that the characteristics of covariates are very similar in the neighborhood of the threshold.
⇒The difference between college and high school graduates in the neighborhood is an estimate of the causal effect.
[Figure: income plotted against exam score, with the threshold marked]
RD approach with sharp discontinuity
• Let the potential outcome and the treatment be denoted by $Y_i(T_i)$ and $T_i$.
• Additionally, suppose there is a running variable $X_i$ which has an impact on $T_i$.
Sharp discontinuity: there exists a threshold $c$ such that
$T_i = 1$ if $X_i > c$ and $T_i = 0$ if $X_i < c$, or
$T_i = 1$ if $X_i < c$ and $T_i = 0$ if $X_i > c$.
$X_i \to T_i \to Y_i$
RDD estimator with sharp discontinuity
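• A simple RDD estimator of causal effects is (written for the case $T_i = 1$ if $X_i > c$)
$\hat{\tau} = \hat{E}[Y_i \mid c \le X_i \le x^+] - \hat{E}[Y_i \mid x^- \le X_i < c]$,
where $\hat{E}$ denotes a sample mean, and $x^+$ and $x^-$ are any values that define the neighborhood of the threshold $c$.
⇒Unbiased estimator if
$E[Y_i(t) \mid c \le X_i \le x^+] = E[Y_i(t) \mid x^- \le X_i < c]$ for $t = 0, 1$,
where $Y_i(t)$ is the potential outcome under treatment $t$ and $x^+ \ge c \ge x^-$.
⇒Within the range $[x^-, x^+]$, the distribution of covariates is the same between treatment and control groups ⇔The value of the treatment is as good as randomly assigned.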
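The simple estimator above can be sketched on simulated data as follows. This is only an illustration under stated assumptions: the function name `simple_rdd`, the threshold of 60, the window of two points on each side, and the income equation (with a true effect of 5) are all hypothetical choices, not taken from the slides.

```python
import numpy as np

def simple_rdd(y, x, c, x_minus, x_plus):
    """Mean outcome just above the threshold minus mean outcome just below it."""
    above = (x >= c) & (x <= x_plus)
    below = (x >= x_minus) & (x < c)
    return y[above].mean() - y[below].mean()

# Simulated data; every number here is an illustrative assumption.
rng = np.random.default_rng(0)
n = 20_000
score = rng.uniform(0, 100, n)          # running variable: exam score
c = 60.0                                # hypothetical admission threshold
college = (score >= c).astype(float)    # sharp rule: enrolled iff score >= c
income = 20 + 0.1 * score + 5.0 * college + rng.normal(0, 1, n)  # true effect = 5

estimate = simple_rdd(income, score, c, x_minus=58.0, x_plus=62.0)
print(f"simple RDD estimate: {estimate:.2f}")
```

In this simulation income also rises by 0.1 per exam point, and the mean scores of the two groups differ by about 2 points, so the simple difference carries a bias of roughly 0.2 on top of the true effect. Narrowing the window shrinks that bias but leaves fewer observations.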
RDD estimator with sharp discontinuity
• RDD estimator can be obtained by using OLS.
[Figure: income plotted against exam score, with the threshold marked]
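• Define a dummy $D_i$ ($=1$ if $X_i \ge c$, $=0$ if $X_i < c$). The population model is
$E[Y_i \mid X_i = x, D_i = d] = \beta_0 + \beta_1 x + \beta_2 d + \beta_3 d x$,
estimated using the samples within $[x^-, x^+]$.
⇒Using the estimated parameters, the RDD estimator can be obtained as the predicted jump at the threshold, $\hat{\beta}_2 + \hat{\beta}_3 c$.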
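A minimal sketch of the OLS version, assuming a linear specification with a separate slope on each side of the threshold (score, threshold dummy, and their interaction). The function name `rdd_ols`, the simulated data, and the window [50, 70] are hypothetical choices for illustration.

```python
import numpy as np

def rdd_ols(y, x, c, x_minus, x_plus):
    """Fit y = b0 + b1*x + b2*d + b3*d*x within [x_minus, x_plus]; return the jump at c."""
    keep = (x >= x_minus) & (x <= x_plus)
    y, x = y[keep], x[keep]
    d = (x >= c).astype(float)                       # threshold dummy
    design = np.column_stack([np.ones_like(x), x, d, d * x])
    b, *_ = np.linalg.lstsq(design, y, rcond=None)   # OLS coefficients
    return b[2] + b[3] * c                           # predicted jump at x = c

# Simulated data; every number here is an illustrative assumption.
rng = np.random.default_rng(0)
n = 20_000
score = rng.uniform(0, 100, n)
c = 60.0
college = (score >= c).astype(float)                 # sharp discontinuity
income = 20 + 0.1 * score + 5.0 * college + rng.normal(0, 1, n)  # true effect = 5

jump = rdd_ols(income, score, c, x_minus=50.0, x_plus=70.0)
print(f"OLS RDD estimate: {jump:.2f}")
```

Because the regression controls for the score itself, a wider window can be used here than with a plain difference in means, as long as the linear form is a reasonable approximation inside the window.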
Fuzzy Discontinuity: example
⇒College enrollment depends on the score.
⇒We should use the score as an IV.
[Figure: college dummy (0 or 1) plotted against exam score, with the threshold marked]
Fuzzy Discontinuity: Formal definition
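Fuzzy discontinuity: at the threshold, the probability of treatment jumps;
$\Pr[T_i = 1 \mid X_i = c + \epsilon] > \Pr[T_i = 1 \mid X_i = c - \epsilon]$ or
$\Pr[T_i = 1 \mid X_i = c + \epsilon] < \Pr[T_i = 1 \mid X_i = c - \epsilon]$, even as $\epsilon > 0$ converges to zero.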
Estimation with Fuzzy Discontinuity
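• We follow two-stage least squares.
1st stage: $T_i = \gamma_0 + \gamma_1 X_i + \gamma_2 D_i + \gamma_3 D_i X_i + u_i$,
using samples within $[x^-, x^+]$, where $D_i$ is the threshold dummy.
By using OLS, we can obtain the predicted treatment $\hat{T}_i = \hat{\gamma}_0 + \hat{\gamma}_1 X_i + \hat{\gamma}_2 D_i + \hat{\gamma}_3 D_i X_i$.
2nd stage: $Y_i = \alpha_0 + \alpha_1 X_i + \tau \hat{T}_i + e_i$.
⇒The estimated $\tau$ is also an estimator of causal effects.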
Note: Even if the treatment of interest is continuous, the above two-stage least squares strategy still works well.
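The two stages can be sketched by hand with two OLS fits. This is a minimal illustration, not a full IV routine: the second-stage specification (outcome on score and predicted treatment), the function names, and the simulated data with an unobserved `ability` confounder are all assumptions. Standard errors from a manual second stage are not valid and are not computed here.

```python
import numpy as np

def ols(design, y):
    """Return OLS coefficients of y on the columns of design."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def fuzzy_rdd_2sls(y, t, x, c, x_minus, x_plus):
    """Manual 2SLS within [x_minus, x_plus]; returns the coefficient on predicted treatment."""
    keep = (x >= x_minus) & (x <= x_plus)
    y, t, x = y[keep], t[keep], x[keep]
    d = (x >= c).astype(float)                       # threshold dummy
    ones = np.ones_like(x)
    # 1st stage: treatment on score, threshold dummy, and their interaction.
    first = np.column_stack([ones, x, d, d * x])
    t_hat = first @ ols(first, t)
    # 2nd stage: outcome on score and predicted treatment.
    second = np.column_stack([ones, x, t_hat])
    return ols(second, y)[2]

# Simulated data; every number here is an illustrative assumption.
rng = np.random.default_rng(1)
n = 100_000
score = rng.uniform(0, 100, n)
c = 60.0
ability = rng.normal(0, 1, n)                        # unobserved confounder
above = (score >= c).astype(float)
# Fuzzy rule: passing the threshold raises the enrollment probability
# but does not fully determine enrollment.
college = (-1.0 + 2.0 * above + ability + rng.normal(0, 1, n) > 0).astype(float)
income = 20 + 0.1 * score + 5.0 * college + 2.0 * ability + rng.normal(0, 1, n)

effect = fuzzy_rdd_2sls(income, college, score, c, 50.0, 70.0)
print(f"fuzzy RDD (2SLS) estimate: {effect:.2f}")
```

In the simulation, enrollment is correlated with `ability`, which also raises income, so a direct regression of income on the college dummy would be biased. The threshold dummy is unrelated to `ability` by construction, which is what lets the two-stage estimate recover the true effect of 5.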
Example of fuzzy discontinuity
We can find good discontinuities in actual rules:
• Micro-finance; Grameen Bank’s targets are households with landholdings of less than half an acre.
• Pension programs; the targeted population is above a certain age.
• Scholarships; targeted students are those with high scores on a standardized test.
• Highway tolls; the toll is ¥100 for 0km~10km and ¥200 for 11km~20km.
Practical issue: Weak discontinuity
• If the gap in treatment probability is small, you face problems similar to those with a weak IV.
⇒The predictive power of the 1st stage is not enough ⇒ The variation of the predicted treatment is small ⇒ The variance of the estimated causal effect is large, and its distribution cannot be approximated by the normal distribution.
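One way to gauge the strength of the discontinuity is an F statistic for the threshold terms in the 1st stage. The sketch below compares a hypothetical strong design (treatment probability jumps by 0.50) with a weak one (jump of 0.02). The function names, the first-stage specification, and the simulated probabilities are assumptions for illustration.

```python
import numpy as np

def rss(design, y):
    """Residual sum of squares from an OLS fit of y on design."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)

def first_stage_f(t, x, c, x_minus, x_plus):
    """F statistic for excluding the threshold dummy terms from the 1st stage."""
    keep = (x >= x_minus) & (x <= x_plus)
    t, x = t[keep], x[keep]
    d = (x >= c).astype(float)
    ones = np.ones_like(x)
    restricted = np.column_stack([ones, x])          # without the threshold terms
    full = np.column_stack([ones, x, d, d * x])      # with the threshold terms
    rss_r, rss_u = rss(restricted, t), rss(full, t)
    q = 2                                            # number of excluded terms
    dof = len(t) - full.shape[1]
    return ((rss_r - rss_u) / q) / (rss_u / dof)

# Simulated data; every number here is an illustrative assumption.
rng = np.random.default_rng(2)
n = 20_000
score = rng.uniform(0, 100, n)
c = 60.0
above = score >= c
u = rng.uniform(size=n)
treated_strong = (u < 0.3 + 0.50 * above).astype(float)  # large jump in Pr(T = 1)
treated_weak = (u < 0.3 + 0.02 * above).astype(float)    # tiny jump in Pr(T = 1)

f_strong = first_stage_f(treated_strong, score, c, 50.0, 70.0)
f_weak = first_stage_f(treated_weak, score, c, 50.0, 70.0)
print(f"first-stage F, strong design: {f_strong:.1f}")
print(f"first-stage F, weak design:   {f_weak:.1f}")
```

A small first-stage F means the predicted treatment barely varies across the threshold, which is the situation described above.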
Practical issue: Range settings
• We estimate causal effects by using sub-samples within a range $[x^-, x^+]$.
⇒How to determine the range? ←Trade-off
Wide range: unbiasedness is questionable because the assignment of treatments may not be purely random within a large range.
Narrow range: the efficiency of the estimator falls because we can use only a small sample.
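The trade-off can be seen by re-estimating the jump over several window widths. This is a sketch under stated assumptions: the half-width `h`, the cubic relationship between score and income, and the true effect of 5 are hypothetical, chosen so that a linear fit is a poor approximation in wide windows.

```python
import numpy as np

def rdd_ols(y, x, c, h):
    """Linear fit on each side of c within [c - h, c + h]; returns (jump at c, sample size)."""
    keep = np.abs(x - c) <= h
    y, x = y[keep], x[keep]
    d = (x >= c).astype(float)
    design = np.column_stack([np.ones_like(x), x, d, d * x])
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    return b[2] + b[3] * c, len(y)

# Simulated data; every number here is an illustrative assumption.
rng = np.random.default_rng(3)
n = 20_000
score = rng.uniform(0, 100, n)
c = 50.0
college = (score >= c).astype(float)
# Income is a nonlinear (cubic) function of the score plus a true jump of 5.
income = 20 + 0.0001 * (score - c) ** 3 + 5.0 * college + rng.normal(0, 1, n)

results = {}
for h in (2, 5, 10, 20, 40):
    estimate, n_used = rdd_ols(income, score, c, h)
    results[h] = (estimate, n_used)
    print(f"h = {h:>2}: estimate = {estimate:.2f}, n = {n_used}")
```

With narrow windows the estimates are close to the true effect but rest on few observations. With wide windows the sample is large, but the linear fit misses the curvature and the estimate drifts away from 5.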
Practical issue: Manipulation
• If the existence of the threshold is well known in society, individuals may try to manipulate the value of their running variable.
e.g.) My test score is just one point below the threshold ⇒ I may try to negotiate with the “Sensei”...
• Individuals above the threshold may have more power than those under the threshold, even if our target range is small enough.
⇒ The assumption of random assignment of treatments may be violated.
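A crude way to look for manipulation is to compare how many observations fall just below and just above the threshold. If scores are not manipulated and their density is smooth, the two counts should be roughly equal. The sketch below is only a rough check in the spirit of formal density tests, not a substitute for them. The function name `density_check`, the bin width, and the simulated manipulation (half of the near-miss students get bumped over the threshold) are assumptions.

```python
import numpy as np

def density_check(x, c, h):
    """Counts within h below/above c and a z statistic for equal counts."""
    n_below = int(np.sum((x >= c - h) & (x < c)))
    n_above = int(np.sum((x >= c) & (x < c + h)))
    # Under equal probabilities, n_above - n_below has standard deviation sqrt(total).
    z = (n_above - n_below) / np.sqrt(n_above + n_below)
    return n_below, n_above, z

# Simulated data; every number here is an illustrative assumption.
rng = np.random.default_rng(4)
n = 50_000
c = 60.0
score = rng.uniform(0, 100, n)                  # scores without manipulation

# Manipulation: half of those within one point below c are moved just above c.
near_miss = (score >= c - 1) & (score < c)
bumped = near_miss & (rng.uniform(size=n) < 0.5)
manipulated = score.copy()
manipulated[bumped] = c + rng.uniform(0, 1, int(bumped.sum()))

_, _, z_clean = density_check(score, c, h=2.0)
_, _, z_manip = density_check(manipulated, c, h=2.0)
print(f"z without manipulation: {z_clean:.2f}")
print(f"z with manipulation:    {z_manip:.2f}")
```

A large positive z indicates bunching just above the threshold, which suggests that individuals near the threshold are not comparable.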
Practical issue: Robustness
• The assumption of randomly assigned treatments requires that the expected values of covariates be very similar above and under the threshold ⇒ Values of covariates should not jump at the threshold.
⇒For the observable covariates, we can check by using regression or a scatter graph.
[Figure: scatter plot of a covariate against the running variable]
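The regression check can be sketched by using an observable covariate as the dependent variable in the same local regression: the estimated jump at the threshold should be close to zero. The covariates below are simulated, and their names and values are hypothetical. One is smooth in the score, the other jumps by 1 at the threshold for contrast.

```python
import numpy as np

def jump_at_threshold(v, x, c, h):
    """Linear fit of v on each side of c within [c - h, c + h]; returns the jump at c."""
    keep = np.abs(x - c) <= h
    v, x = v[keep], x[keep]
    d = (x >= c).astype(float)
    design = np.column_stack([np.ones_like(x), x, d, d * x])
    b, *_ = np.linalg.lstsq(design, v, rcond=None)
    return b[2] + b[3] * c

# Simulated data; every number here is an illustrative assumption.
rng = np.random.default_rng(5)
n = 20_000
score = rng.uniform(0, 100, n)                  # running variable
c = 60.0
above = (score >= c).astype(float)
balanced = 10 + 0.05 * score + rng.normal(0, 1, n)                  # smooth in score
unbalanced = 10 + 0.05 * score + 1.0 * above + rng.normal(0, 1, n)  # jumps at c

jump_balanced = jump_at_threshold(balanced, score, c, h=10.0)
jump_unbalanced = jump_at_threshold(unbalanced, score, c, h=10.0)
print(f"jump in balanced covariate:   {jump_balanced:.2f}")
print(f"jump in unbalanced covariate: {jump_unbalanced:.2f}")
```

A clear jump in a covariate, as in the second case, is a warning sign that treatment is not as good as randomly assigned near the threshold.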
Conclusion
• The RDD approach tries to estimate the causal effect by using the jump in treatments at a certain threshold of the running variable.
• In the fuzzy discontinuity case, the running variable can be used as the IV.
• In actual papers, non-parametric regression techniques are applied to obtain RDD estimators. ⇒ If you are interested, please come to my office.