Econometrics: Exact Matching
Keisuke Kawata
IDEC, Hiroshima University
Econometrics 1
Bias problem from covariate
• We have studied two approaches to estimating causal effects: the sub-sample mean difference and OLS estimation.
⇒ In both approaches (sub-sample mean difference and OLS estimation), we need to assume that no covariate affects both the outcome and the treatment.
⇒ One useful approach to reducing the resulting bias is to use control variables.
Example: Education program
• A government starts an education program: subsidies for high-school education.
• Due to budget limitations, only some households can receive the subsidy.
⇒ We can estimate the difference in educational attainment between the treatment group (receiving subsidies) and the control group (not receiving subsidies).
• Is it an unbiased estimator of the causal effect? ← We should collect more detailed information.
• Suppose there are two regions: a rich region and a poor region.
– Within each region, subsidies are distributed purely at random.
– The government distributes subsidies more intensively in the poor region than in the rich region: the share of treated households in the regional population differs.
Review: sample difference and causal effect
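• Let $Y_1$ and $Y_0$ denote the potential education outcomes with and without subsidies, respectively.
• The causal effect can be defined as $E[Y_1] - E[Y_0]$.
• The sample mean difference is the BLUE of $E[Y_1 \mid T = 1] - E[Y_0 \mid T = 0]$, where $T$ is the treatment dummy (=1 if treated).
← It is an unbiased estimator of the causal effect if and only if $E[Y_t \mid T = 1] = E[Y_t \mid T = 0]$, where $t \in \{0, 1\}$.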
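One way to see where this condition comes from is to split the quantity estimated by the sample mean difference into a causal part and a selection part. The decomposition below is a standard identity rather than something taken from the slides; $T$ is assumed to be the treatment dummy (=1 if treated).

```latex
\begin{align*}
E[Y_1 \mid T=1] - E[Y_0 \mid T=0]
 &= \underbrace{E[Y_1 \mid T=1] - E[Y_0 \mid T=1]}_{\text{causal effect among the treated}}
  + \underbrace{E[Y_0 \mid T=1] - E[Y_0 \mid T=0]}_{\text{selection term}}
\end{align*}
```

If $E[Y_t \mid T=1] = E[Y_t \mid T=0]$ for $t \in \{0, 1\}$, the selection term is zero and the first term equals $E[Y_1] - E[Y_0]$.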
Numerical example
• The simple average difference between the treatment and control groups is $200 - 400 = -200$.
ID  T_i  Region  Y_i    ID  T_i  Region  Y_i
1   0    Rich    500    5   1    Rich    500
2   0    Rich    500    6   1    Poor    100
3   0    Rich    500    7   1    Poor    100
4   0    Poor    100    8   1    Poor    100
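The simple difference for this table can be checked with a few lines of Python. This is only a sketch: the tuple layout `(id, treated, region, outcome)` and the variable names are illustrative choices, not part of the slides.

```python
# Data from the numerical example: (id, treated, region, outcome).
rows = [
    (1, 0, "Rich", 500), (2, 0, "Rich", 500), (3, 0, "Rich", 500), (4, 0, "Poor", 100),
    (5, 1, "Rich", 500), (6, 1, "Poor", 100), (7, 1, "Poor", 100), (8, 1, "Poor", 100),
]

def mean(values):
    """Arithmetic mean of any iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)

treated_mean = mean(y for _, t, _, y in rows if t == 1)   # 200.0
control_mean = mean(y for _, t, _, y in rows if t == 0)   # 400.0
simple_difference = treated_mean - control_mean
print(simple_difference)  # -200.0
```

The negative value reflects only the different regional composition of the two groups, since outcomes are identical within each region.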
Review: Source of bias
• In the example, we assume that more subsidies are distributed in the poor region than in the rich region.
⇒ Regional background differs between the treatment and control groups.
The share of poor-region residents is larger in the treatment group than in the control group.
[Diagram: education subsidies → income, with regional characteristics (e.g., infrastructure, local government policies, regional economic conditions) affecting both.]
Conditional average causal effects
• Let us define the average causal effects in each region and treatment group as
$E[Y_1 \mid T = 1, R = 1] - E[Y_0 \mid T = 1, R = 1]$
$E[Y_1 \mid T = 1, R = 0] - E[Y_0 \mid T = 1, R = 0]$
$E[Y_1 \mid T = 0, R = 1] - E[Y_0 \mid T = 0, R = 1]$
$E[Y_1 \mid T = 0, R = 0] - E[Y_0 \mid T = 0, R = 0]$
where $T$ is a dummy for treatment status (=1 if treated) and $R$ is a dummy for region (=1 if living in the rich region).
[Figure: conditional average causal effects. Two columns, "T=1 group" and "T=0 group", each showing conditional means of the form $E[Y \mid T, R]$ with the causal effect marked between them.]
Estimation of conditional average causal effects
• We can easily estimate the conditional average causal effects in each region $R \in \{0, 1\}$.
• Let us estimate $E[Y_1 \mid T = 1, R = 1] - E[Y_0 \mid T = 1, R = 1]$.
• The sub-sample mean income of the treatment group in the rich region, $\bar{Y}_{1,1}$, is the BLUE of $E[Y_1 \mid T = 1, R = 1]$.
• Because treatment status is determined purely at random within each region, $E[Y_0 \mid T = 1, R = 1] = E[Y_0 \mid T = 0, R = 1]$.
• The sub-sample mean of the control group in the rich region, $\bar{Y}_{0,1}$, is the BLUE of both $E[Y_0 \mid T = 0, R = 1]$ and $E[Y_0 \mid T = 1, R = 1]$.
⇒ The sub-sample difference within a region is the BLUE of the conditional average causal effect.
Estimation of average causal effects
• The average causal effect in the group $T = 1$ can also be estimated easily.
• The average causal effect in the group $T = 1$ can be rewritten as
$E[Y_1 \mid T = 1] - E[Y_0 \mid T = 1]$
$= \{E[Y_1 \mid T = 1, R = 1] - E[Y_0 \mid T = 1, R = 1]\} \, E[R \mid T = 1]$
$+ \{E[Y_1 \mid T = 1, R = 0] - E[Y_0 \mid T = 1, R = 0]\} \, (1 - E[R \mid T = 1])$
⇒ By using sample averages and differences, the BLUE can be obtained.
Estimation of average causal effects
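• Let the sub-sample mean of $Y$ among units with $T = t$ and $R = r$ be denoted by $\bar{Y}_{t,r}$, and the sub-sample averages of $R$ in the treatment and control groups by $\bar{R}_1$ and $\bar{R}_0$.
⇒ The matching estimator of $E[Y_1 \mid T = 1] - E[Y_0 \mid T = 1]$ can be defined as $(\bar{Y}_{1,1} - \bar{Y}_{0,1}) \, \bar{R}_1 + (\bar{Y}_{1,0} - \bar{Y}_{0,0}) \, (1 - \bar{R}_1)$.
⇒ Under the assumptions of pure-random sampling and $E[Y_t \mid T = 1, R = r] = E[Y_t \mid T = 0, R = r]$ for $t, r \in \{0, 1\}$, the above matching estimator is the BLUE.
⇒ From the estimators of the average causal effects in the treatment and control groups, we can obtain the BLUE of the average causal effect, $E[Y_1] - E[Y_0]$.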
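The final step, combining the effects in the treatment and control groups, can be written out with the law of total expectation. This is a sketch rather than a formula taken from the slides; $\Pr(T = 1)$ denotes the population share of treated units and would be replaced by the sample share in practice.

```latex
\begin{align*}
E[Y_1] - E[Y_0]
 &= \{E[Y_1 \mid T=1] - E[Y_0 \mid T=1]\}\,\Pr(T=1) \\
 &\quad + \{E[Y_1 \mid T=0] - E[Y_0 \mid T=0]\}\,\{1 - \Pr(T=1)\}
\end{align*}
```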
Numerical example
• The simple average difference between the treatment and control groups is $-200$.
• The average differences in the rich and poor regions are $500 - 500 = 0$ and $100 - 100 = 0$.
⇒ The matching estimator is also $0$.
ID  T_i  Region  Y_i    ID  T_i  Region  Y_i
1   0    Rich    500    5   1    Rich    500
2   0    Rich    500    6   1    Poor    100
3   0    Rich    500    7   1    Poor    100
4   0    Poor    100    8   1    Poor    100
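The matching estimate for this table can be reproduced with a short Python sketch. The function name `matching_estimate`, the `(T, R, Y)` tuple layout, and the coding `R = 1` for the rich region are illustrative assumptions; the weighting follows the region-share decomposition used for the treatment group, applied symmetrically to the control group.

```python
from collections import defaultdict

# (T, R, Y): T = 1 if treated, R = 1 if rich region, Y = outcome.
rows = [
    (0, 1, 500), (0, 1, 500), (0, 1, 500), (0, 0, 100),
    (1, 1, 500), (1, 0, 100), (1, 0, 100), (1, 0, 100),
]

def mean(values):
    return sum(values) / len(values)

def matching_estimate(rows, target):
    """Average the within-region differences, weighted by the region
    shares of the target group (1 = treated, 0 = control)."""
    cells = defaultdict(list)                     # (T, R) -> outcomes
    for t, r, y in rows:
        cells[(t, r)].append(y)
    target_regions = [r for t, r, _ in rows if t == target]
    estimate = 0.0
    for r in sorted(set(target_regions)):
        within_region_diff = mean(cells[(1, r)]) - mean(cells[(0, r)])
        weight = target_regions.count(r) / len(target_regions)
        estimate += weight * within_region_diff
    return estimate

effect_on_treated = matching_estimate(rows, target=1)
effect_on_controls = matching_estimate(rows, target=0)
share_treated = mean([t for t, _, _ in rows])
overall_effect = (share_treated * effect_on_treated
                  + (1 - share_treated) * effect_on_controls)
print(effect_on_treated, effect_on_controls, overall_effect)  # 0.0 0.0 0.0
```

The sketch assumes every region contains both treated and control units; an empty cell would raise a `ZeroDivisionError`.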
Exact matching estimator
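• The sample difference conditional on $X_i = x$ is the BLUE of $E[Y_1 \mid T = 1, X_i = x] - E[Y_0 \mid T = 0, X_i = x]$, which leads to the BLUE of the average causal effect $E[Y_1] - E[Y_0]$, if and only if $E[Y_t \mid T = 1, X_i = x] = E[Y_t \mid T = 0, X_i = x]$ for $t \in \{0, 1\}$ and every $x$.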
Intuition
• $E[Y_t \mid T = 1, X_i = x] = E[Y_t \mid T = 0, X_i = x]$ requires that, among the group with $X_i = x$, treatment status is determined purely at random.
• We can easily extend the above discussion to the case of multiple control variables.
e.g., we can obtain the matching estimator as the BLUE if treatment status is determined purely at random among people with the same parental education level, gender, religion, and race.
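With several control variables, each distinct combination of values becomes its own cell. The sketch below estimates the effect among the treated under that idea; the function `exact_matching_att`, the dictionary layout, and the data are all hypothetical and serve only to illustrate the mechanics.

```python
from collections import defaultdict

def exact_matching_att(records, controls):
    """Exact-matching estimate of the effect among the treated.

    records: list of dicts with keys "T" (0/1), "Y", and every name in controls.
    Each distinct combination of control values forms one cell.
    """
    cells = defaultdict(lambda: {0: [], 1: []})
    for rec in records:
        key = tuple(rec[c] for c in controls)
        cells[key][rec["T"]].append(rec["Y"])
    n_treated = sum(len(cell[1]) for cell in cells.values())
    estimate = 0.0
    for key, cell in cells.items():
        if not cell[1]:
            continue  # no treated units here, so the cell gets zero weight
        if not cell[0]:
            raise ValueError(f"no control unit matches treated cell {key}")
        diff = sum(cell[1]) / len(cell[1]) - sum(cell[0]) / len(cell[0])
        estimate += diff * len(cell[1]) / n_treated
    return estimate

# Hypothetical data: two control variables, two cells.
records = [
    {"T": 1, "gender": "F", "parent_edu": "high", "Y": 12},
    {"T": 1, "gender": "F", "parent_edu": "high", "Y": 14},
    {"T": 0, "gender": "F", "parent_edu": "high", "Y": 11},
    {"T": 1, "gender": "M", "parent_edu": "low", "Y": 9},
    {"T": 1, "gender": "M", "parent_edu": "low", "Y": 11},
    {"T": 0, "gender": "M", "parent_edu": "low", "Y": 6},
]
att = exact_matching_att(records, ["gender", "parent_edu"])
print(att)  # 3.0
```

Raising an error when a treated cell has no matching control is one possible design choice; it makes the sample-size requirement of exact matching visible instead of silently dropping units.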
Limitation
• The exact matching approach requires estimating sub-sample means for each combination of treatment and control variable values.
⇒ In many cases, we face the sample-size problem.
We cannot use the exact matching approach if there are
– continuous treatment variables,
– continuous control variables,
– many control variables.
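A quick count shows how fast the number of cells grows. The numbers of levels and the sample size below are hypothetical, chosen only to illustrate the arithmetic.

```python
from math import prod

def number_of_cells(levels_per_control, treatment_levels=2):
    """Number of sub-samples whose means exact matching must estimate."""
    return treatment_levels * prod(levels_per_control)

# Hypothetical controls: gender (2 levels), parental education (4),
# religion (5), race (6), with a binary treatment.
cells = number_of_cells([2, 4, 5, 6])
sample_size = 1000
print(cells)                 # 480
print(sample_size / cells)   # average observations per cell, about 2.08
```

With roughly two observations per cell on average, many cells would be empty or contain a single unit, and a continuous variable would make the number of cells effectively unbounded.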
Conclusion
• If, after conditioning on the values of the control variables, treatment status is determined purely at random, the matching estimator provides the BLUE.
• However, in many cases the exact matching estimator cannot be used because of small-sample problems ⇒ We should use OLS with control variables.
• A more serious problem: if we cannot observe all covariates, we cannot obtain the BLUE.
Question
• True/False questions. Suppose the data come from pure random sampling.
1. We can observe income, gender, and education level. To estimate the causal effect of gender, we should use the matching estimator with education level as the control variable.
2. In the original data, we observed microfinance status, income, and region name, and then used the matching estimator with region name as the control variable to estimate the causal effect of microfinance. We now obtain information on the regional average literacy rate. However, even in this case, we don't necessarily need to use the regional average literacy rate as an additional control variable.
Good control variable and Bad control variable
• To reduce bias, we should control only for covariates that affect both treatment and outcome.
⇔ We should not control for a mediator, which is determined by the treatment.
ID  Sex   Education    Y_i    ID  Sex     Education    Y_i
1   Male  High school  500    5   Female  High school  500
2   Male  High school  500    6   Female  Junior high  100
3   Male  High school  500    7   Female  Junior high  100
4   Male  Junior high  100    8   Female  Junior high  100
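The table can be used to see what happens when a mediator is controlled for. In the sketch below, the raw female-male gap is compared with the gap within each education level. Treating education as a mediator (determined partly by gender) is the assumption behind this reading, and the variable names are illustrative.

```python
# Rows from the table: (id, sex, education, outcome).
people = [
    (1, "Male", "High school", 500), (2, "Male", "High school", 500),
    (3, "Male", "High school", 500), (4, "Male", "Junior high", 100),
    (5, "Female", "High school", 500), (6, "Female", "Junior high", 100),
    (7, "Female", "Junior high", 100), (8, "Female", "Junior high", 100),
]

def mean(values):
    return sum(values) / len(values)

def gender_gap(rows):
    """Female mean outcome minus male mean outcome."""
    female = [y for _, sex, _, y in rows if sex == "Female"]
    male = [y for _, sex, _, y in rows if sex == "Male"]
    return mean(female) - mean(male)

raw_gap = gender_gap(people)
gap_by_education = {
    edu: gender_gap([p for p in people if p[2] == edu])
    for edu in ("High school", "Junior high")
}
print(raw_gap)           # -200.0
print(gap_by_education)  # {'High school': 0.0, 'Junior high': 0.0}
```

Matching on education gives a gap of zero in every cell, so any effect of gender that operates through education would be removed from the estimate.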
Redundant control
• Even if $Z$ is a covariate, you don't necessarily need to control for it if
$E[Y \mid T = t, X_i = x] = E[Y \mid T = t, X_i = x, Z_i = z]$ for any $t$, $x$, $z$.
⇒ The members of the group with $T = t$ and $X_i = x$ are the same as those of the group with $T = t$, $X_i = x$, and $Z_i = z$.
• Because the regional literacy rate is common within a region, the members of a group defined by region are the same as the members of a group defined by region and regional literacy rate.
⇒ With the region name alone, we automatically control for factors common within the region (e.g., average income, public facilities, and country GDP).
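The redundancy can be checked directly: grouping by region and grouping by region plus a region-level variable produce the same groups. The records below are hypothetical, and `partition` is an illustrative helper rather than part of any library.

```python
from collections import defaultdict

def partition(records, keys):
    """Group record ids by the values of the given keys."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[k] for k in keys)].append(rec["id"])
    return sorted(groups.values())

# Hypothetical data: the literacy rate is constant within each region.
records = [
    {"id": 1, "region": "North", "literacy": 0.9},
    {"id": 2, "region": "North", "literacy": 0.9},
    {"id": 3, "region": "South", "literacy": 0.6},
    {"id": 4, "region": "South", "literacy": 0.6},
]

by_region = partition(records, ["region"])
by_region_and_literacy = partition(records, ["region", "literacy"])
print(by_region == by_region_and_literacy)  # True
```

Because the cells are identical, the sub-sample means and therefore the matching estimate do not change when the literacy rate is added.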