Slide 7_distribution | Recent updates | Keisuke Kawata's HP

(1)

Econometrics: Exact Matching

Keisuke Kawata

IDEC, Hiroshima University

Econometrics 1

(2)

Bias problem from covariates


• We have studied two approaches to estimating causal effects: the sub-sample mean difference and OLS estimation.

⇒ In both approaches, we need to assume that no covariate affects both the outcome and the treatment.

⇒ One useful approach to reduce this bias is to use control variables.

Sub-sample mean difference:

OLS estimation:

(3)

Example: Education program


• A government starts an education program that subsidizes high-school education.

• Due to budget limitations, only some households can receive the subsidy.

⇒ We can estimate the difference in educational attainment between the treatment group (households receiving subsidies) and the control group (households not receiving subsidies).

• Is this an unbiased estimator of the causal effect? ← We should collect more detailed information.

• Suppose there are two regions: a rich region and a poor region.

– Within each region, subsidies are distributed pure-randomly.

– The government distributes subsidies more intensively in the poor region than in the rich region: the share of treated households in the regional population differs between the two regions.

(4)

Review: sample difference and causal effect


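• Let $Y_i(1)$ and $Y_i(0)$ denote the potential education outcomes with and without subsidies, respectively.

• The causal effect can be defined as $E[Y_i(1)] - E[Y_i(0)]$.

• The sample mean difference is the BLUE of $E[Y_i(1) \mid T_i = 1] - E[Y_i(0) \mid T_i = 0]$.

← It is an unbiased estimator of the causal effect if and only if $E[Y_i(t) \mid T_i = 1] = E[Y_i(t) \mid T_i = 0]$,

where $t \in \{0, 1\}$.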

(5)

Numerical example


• The simple average difference between the treatment and control groups is -200.

ID | $T_i$ | Region | $Y_i$
---|---|---|---
1 | 0 | Rich | 500
2 | 0 | Rich | 500
3 | 0 | Rich | 500
4 | 0 | Poor | 100
5 | 1 | Rich | 500
6 | 1 | Poor | 100
7 | 1 | Poor | 100
8 | 1 | Poor | 100
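A minimal Python sketch that reproduces the simple average difference from the table. The names `DATA` and `simple_difference` are illustrative and do not come from the slides. The treated mean is 200, the control mean is 400, and their difference is -200.

```python
# Observations from the numerical example: (ID, T_i, region, Y_i)
DATA = [
    (1, 0, "Rich", 500), (2, 0, "Rich", 500), (3, 0, "Rich", 500), (4, 0, "Poor", 100),
    (5, 1, "Rich", 500), (6, 1, "Poor", 100), (7, 1, "Poor", 100), (8, 1, "Poor", 100),
]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def simple_difference(data):
    """Mean outcome of treated units minus mean outcome of control units."""
    treated = [y for _, t, _, y in data if t == 1]
    control = [y for _, t, _, y in data if t == 0]
    return mean(treated) - mean(control)


print(simple_difference(DATA))  # -200.0
```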

(6)

Review: Source of bias


• In the example, we assume that subsidies are distributed more intensively in the poor region than in the rich region.

⇒ The regional background differs between the treatment and control groups.

The share of poor-region residents is larger in the treatment group than in the control group.

[Figure: Regional characteristics (e.g., infrastructure, local government policies, regional economic conditions) affect both Education subsidies and Income; Education subsidies → Income.]

(7)

Conditional average causal effects


• Let us define the average causal effect in each region as

$E[Y_i(1) \mid T_i = t, R_i = r] - E[Y_i(0) \mid T_i = t, R_i = r]$,

where $T_i$ is a treatment dummy (= 1 if treated), $R_i$ is a region dummy (= 1 if living in the rich region), and $t, r \in \{0, 1\}$.

(8)

Conditional average causal effects

[Figure: T=1 group and T=0 group. Within each group, the causal effect is the gap between $E[Y_i(1) \mid T_i = t, R_i = r]$ and $E[Y_i(0) \mid T_i = t, R_i = r]$.]

(9)

Conditional average causal effects

[Figure: T=1 group and T=0 group, each showing the conditional means $E[Y_i(1) \mid T_i = t, R_i = r]$ and $E[Y_i(0) \mid T_i = t, R_i = r]$ with the causal effect as the gap between them.]

(10)

Estimation of conditional average causal effects


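• We can easily estimate the conditional average causal effect in region $R_i = r$ ($r = 0, 1$).

• Consider estimating $E[Y_i(1) \mid T_i = 1, R_i = 1] - E[Y_i(0) \mid T_i = 1, R_i = 1]$.

• The sub-sample mean income of the treatment group in the rich region, $\bar{Y}_{1,1}$, is the BLUE of $E[Y_i(1) \mid T_i = 1, R_i = 1]$.

• Because treatment status is pure-randomly determined within a region, $E[Y_i(0) \mid T_i = 1, R_i = 1] = E[Y_i(0) \mid T_i = 0, R_i = 1]$.

• The sub-sample mean of the control group in the rich region, $\bar{Y}_{0,1}$, is the BLUE of both $E[Y_i(0) \mid T_i = 0, R_i = 1]$ and $E[Y_i(0) \mid T_i = 1, R_i = 1]$.

⇒ The sub-sample difference within a region, $\bar{Y}_{1,r} - \bar{Y}_{0,r}$, is the BLUE of the conditional average causal effect.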
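The within-region estimator can be sketched in Python on the numerical example. The helper names are hypothetical. Each cell mean plays the role of $\bar{Y}_{t,r}$, and the estimator takes the treated-minus-control difference inside one region.

```python
from collections import defaultdict

# Numerical example from the slides: (ID, T_i, region, Y_i)
DATA = [
    (1, 0, "Rich", 500), (2, 0, "Rich", 500), (3, 0, "Rich", 500), (4, 0, "Poor", 100),
    (5, 1, "Rich", 500), (6, 1, "Poor", 100), (7, 1, "Poor", 100), (8, 1, "Poor", 100),
]


def subsample_means(data):
    """Return {(t, region): mean of Y_i} for every treatment-region cell."""
    sums, counts = defaultdict(float), defaultdict(int)
    for _, t, region, y in data:
        sums[(t, region)] += y
        counts[(t, region)] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


def within_region_difference(data, region):
    """Ybar_{1,r} - Ybar_{0,r}: treated minus control mean within one region."""
    means = subsample_means(data)
    return means[(1, region)] - means[(0, region)]


for r in ("Rich", "Poor"):
    print(r, within_region_difference(DATA, r))
# Rich 0.0
# Poor 0.0
```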

(11)

Estimation of average causal effects


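• The average causal effects in the groups $T_i = t$ can also be easily estimated.

• The average causal effect in group $T_i = t$ can be rewritten as

$$
\begin{aligned}
E[Y_i(1) \mid T_i = t] - E[Y_i(0) \mid T_i = t]
&= \{E[Y_i(1) \mid T_i = t, R_i = 1] - E[Y_i(0) \mid T_i = t, R_i = 1]\}\,\Pr[R_i = 1 \mid T_i = t] \\
&\quad + \{E[Y_i(1) \mid T_i = t, R_i = 0] - E[Y_i(0) \mid T_i = t, R_i = 0]\}\,(1 - \Pr[R_i = 1 \mid T_i = t])
\end{aligned}
$$

⇒ By using sub-sample averages and differences, an estimator of this average causal effect can be obtained.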
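The sample analogue of this decomposition can be sketched on the numerical example. The helper names are hypothetical. Each within-region difference is weighted by the share of that region in group $T_i = t$. In the example, the treatment group is 1/4 rich and 3/4 poor, while the control group is 3/4 rich and 1/4 poor.

```python
from collections import defaultdict

# Numerical example from the slides: (ID, T_i, region, Y_i)
DATA = [
    (1, 0, "Rich", 500), (2, 0, "Rich", 500), (3, 0, "Rich", 500), (4, 0, "Poor", 100),
    (5, 1, "Rich", 500), (6, 1, "Poor", 100), (7, 1, "Poor", 100), (8, 1, "Poor", 100),
]


def cell_means(data):
    """Sub-sample means Ybar_{t,r}, keyed by (treatment status, region)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for _, t, region, y in data:
        sums[(t, region)] += y
        counts[(t, region)] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


def group_effect(data, group):
    """Matching estimate of E[Y_i(1) | T_i = group] - E[Y_i(0) | T_i = group].

    Each within-region difference Ybar_{1,r} - Ybar_{0,r} is weighted by the
    share of region r among units with T_i = group, the sample analogue of
    Pr[R_i = r | T_i = group]. A KeyError signals a region with no treated
    or no control units, where exact matching is not possible.
    """
    means = cell_means(data)
    regions = [region for _, t, region, _ in data if t == group]
    estimate = 0.0
    for region in sorted(set(regions)):
        share = regions.count(region) / len(regions)
        estimate += (means[(1, region)] - means[(0, region)]) * share
    return estimate


print(group_effect(DATA, 1))  # 0.0 (treatment group)
print(group_effect(DATA, 0))  # 0.0 (control group)
```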

(12)

Estimation of average causal effects


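• Let the sub-sample means of $Y_i$ be denoted by $\bar{Y}_{t,r}$ (treatment status $t$, region $r$), and let the sub-sample averages of $R_i$ in the treatment and control groups be denoted by $\bar{R}_1$ and $\bar{R}_0$.

⇒ The matching estimator of $E[Y_i(1) \mid T_i = t] - E[Y_i(0) \mid T_i = t]$ can be defined as

$$
(\bar{Y}_{1,1} - \bar{Y}_{0,1})\,\bar{R}_t + (\bar{Y}_{1,0} - \bar{Y}_{0,0})\,(1 - \bar{R}_t).
$$

⇒ Under the assumption of pure-random sampling and $E[Y_i(t) \mid T_i = 1, R_i = r] = E[Y_i(t) \mid T_i = 0, R_i = r]$, the above matching estimator is the BLUE.

⇒ From the estimators of the average causal effects in the treatment and control groups, we can obtain the BLUE of the average causal effect $E[Y_i(1)] - E[Y_i(0)]$.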
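The last step can be written out with the law of iterated expectations. In this sketch, the symbols are introduced here for illustration: $\tau_t$ is the average causal effect in group $T_i = t$, $\hat{\tau}_t$ is its matching estimator, and $\bar{T}$ is the sample share of treated units.

```latex
\begin{aligned}
E[Y_i(1)] - E[Y_i(0)]
  &= \Pr[T_i = 1]\,\tau_1 + \Pr[T_i = 0]\,\tau_0,
  \qquad \tau_t \equiv E[Y_i(1) \mid T_i = t] - E[Y_i(0) \mid T_i = t], \\
\hat{\tau} &= \bar{T}\,\hat{\tau}_1 + (1 - \bar{T})\,\hat{\tau}_0 .
\end{aligned}
```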

(13)

Numerical example


• The simple average difference between the treatment and control groups is -200.

• The average differences within the rich and poor regions are 500 - 500 = 0 and 100 - 100 = 0, respectively.

⇒ The matching estimator is also 0.

ID | $T_i$ | Region | $Y_i$
---|---|---|---
1 | 0 | Rich | 500
2 | 0 | Rich | 500
3 | 0 | Rich | 500
4 | 0 | Poor | 100
5 | 1 | Rich | 500
6 | 1 | Poor | 100
7 | 1 | Poor | 100
8 | 1 | Poor | 100

(14)

Exact matching estimator


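• The sample difference conditional on $X_i = x$ is the BLUE of $E[Y_i(1) \mid T_i = 1, X_i = x] - E[Y_i(0) \mid T_i = 0, X_i = x]$, which leads to the BLUE of the average causal effect $E[Y_i(1)] - E[Y_i(0)]$ if and only if $E[Y_i(t) \mid T_i = 1, X_i = x] = E[Y_i(t) \mid T_i = 0, X_i = x]$ for $t \in \{0, 1\}$ and all $x$.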

(15)

Intuition


• $E[Y_i(t) \mid T_i = 1, X_i = x] = E[Y_i(t) \mid T_i = 0, X_i = x]$ requires that, within the group with $X_i = x$, treatment status is pure-randomly determined.

• We can easily extend the above discussion to the case of multiple control variables.

e.g., The matching estimator is the BLUE if treatment status is pure-randomly determined within groups that share the same parental education level, gender, religion, and race.
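With several control variables, a cell is simply one combination of their values. The sketch below is minimal. The function `exact_matching` and its parameters are hypothetical and do not come from the slides. It weights the within-cell differences by the number of treated units, control units, or all units, which gives the average causal effect in the treatment group, in the control group, or overall.

```python
from collections import defaultdict


def exact_matching(rows, treat, outcome, controls, target="all"):
    """Exact matching estimate of an average causal effect.

    rows     : list of dicts, one per unit
    treat    : key of the 0/1 treatment dummy
    outcome  : key of the outcome
    controls : keys of the discrete control variables; one cell per
               combination of their values
    target   : "treated", "control" or "all" (which units weight the cells)
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for row in rows:
        key = (tuple(row[c] for c in controls), row[treat])
        sums[key] += row[outcome]
        counts[key] += 1

    estimate, total_weight = 0.0, 0
    for cell in {cell for cell, _ in counts}:
        n1, n0 = counts.get((cell, 1), 0), counts.get((cell, 0), 0)
        if n1 == 0 or n0 == 0:
            # Exact matching needs both groups in every cell
            raise ValueError(f"cell {cell} lacks treated or control units")
        diff = sums[(cell, 1)] / n1 - sums[(cell, 0)] / n0
        weight = {"treated": n1, "control": n0, "all": n1 + n0}[target]
        estimate += weight * diff
        total_weight += weight
    return estimate / total_weight


# Numerical example from the slides, stored as dictionaries
ROWS = [
    {"id": i, "T": t, "region": r, "Y": y}
    for i, t, r, y in [
        (1, 0, "Rich", 500), (2, 0, "Rich", 500), (3, 0, "Rich", 500), (4, 0, "Poor", 100),
        (5, 1, "Rich", 500), (6, 1, "Poor", 100), (7, 1, "Poor", 100), (8, 1, "Poor", 100),
    ]
]

print(exact_matching(ROWS, "T", "Y", controls=[]))          # -200.0
print(exact_matching(ROWS, "T", "Y", controls=["region"]))  # 0.0
```

With no controls, the single cell reproduces the simple difference of -200. Matching on region gives 0. Adding more keys to `controls` handles several control variables at once.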

(16)

Limitation


• The exact matching approach requires estimating sub-sample means for each combination of treatment status and control-variable values.

⇒ In many cases, we face the sample-size problem.

We cannot use the exact matching approach with

– continuous treatment variables,

– continuous control variables,

– many control variables.
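The sketch below shows how the sample-size problem appears in practice, reusing the numerical example. The helper `unmatched_cells` is hypothetical. With $k$ binary control variables there are already $2^k$ cells. A cell is unusable for exact matching unless it contains at least one treated unit and one control unit. Using the unit ID as a stand-in for a continuous control leaves every cell with a single observation.

```python
from collections import defaultdict

# Numerical example from the slides: (ID, T_i, region, Y_i)
DATA = [
    (1, 0, "Rich", 500), (2, 0, "Rich", 500), (3, 0, "Rich", 500), (4, 0, "Poor", 100),
    (5, 1, "Rich", 500), (6, 1, "Poor", 100), (7, 1, "Poor", 100), (8, 1, "Poor", 100),
]


def unmatched_cells(data, control):
    """Return the cells (values of control(obs)) that lack either a treated
    or a control unit, i.e. where exact matching breaks down."""
    statuses = defaultdict(set)
    for obs in data:
        statuses[control(obs)].add(obs[1])  # treatment statuses present in the cell
    return sorted(cell for cell, s in statuses.items() if s != {0, 1})


# Region as the control: every cell contains both groups.
print(unmatched_cells(DATA, lambda obs: obs[2]))       # []
# A control unique to each unit (the ID, mimicking a continuous variable).
print(len(unmatched_cells(DATA, lambda obs: obs[0])))  # 8
```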

(17)

Conclusion


• If treatment status is determined pure-randomly after conditioning on the values of the control variables, the matching estimator provides the BLUE.

• However, in many cases, the exact matching estimator cannot be used because of small-sample problems. ⇒ We should use OLS with control variables.

• A more serious problem: if we cannot observe all covariates, we cannot obtain the BLUE.
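As a rough check of the advice to use OLS with control variables, the sketch below assumes NumPy is available. The helper `ols` is hypothetical. It regresses $Y_i$ on $T_i$ alone and then on $T_i$ and the rich-region dummy $R_i$ in the numerical example.

```python
import numpy as np

# Numerical example: treatment dummy T_i, rich-region dummy R_i, outcome Y_i
T = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
R = np.array([1, 1, 1, 0, 1, 0, 0, 0], dtype=float)
Y = np.array([500, 500, 500, 100, 500, 100, 100, 100], dtype=float)


def ols(y, *regressors):
    """OLS coefficients (intercept first) computed by least squares."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


print(ols(Y, T))     # approx. [400, -200]: no control variable
print(ols(Y, T, R))  # approx. [100, 0, 400]: region dummy as control
```

Here OLS with the region dummy reproduces the matching result of 0. In general, the two estimators weight the cells differently, so they need not coincide.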

(18)

Question


• True/False questions. Suppose the data are obtained by pure random sampling.

1. We observe income, gender, and education level. To estimate the causal effect of gender, we should use the matching estimator with education level as the control variable.

2. In the original data, we observed microfinance status, income, and region name, and we used the matching estimator with region name as the control variable to estimate the causal effect of microfinance.

We now obtain information on the regional average literacy rate. However, even in this case, we don't necessarily need to use the regional average literacy rate as an additional control variable.

(19)

Good control variable and Bad control variable


• To reduce bias, we should control only for covariates that affect both the treatment and the outcome.

⇔ We should not control for mediators, which are determined by the treatment.

ID | Sex | Education | $Y_i$
---|---|---|---
1 | Male | High school | 500
2 | Male | High school | 500
3 | Male | High school | 500
4 | Male | Junior high | 100
5 | Female | High school | 500
6 | Female | Junior high | 100
7 | Female | Junior high | 100
8 | Female | Junior high | 100
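The table can be checked with a short sketch. The helper `gap` is hypothetical. In these data the raw female-minus-male gap is -200, while the gap within each education level is 0. If education is itself determined by sex, the within-education gaps of 0 do not estimate the causal effect of sex. Conditioning on the mediator removes the part of the effect that runs through education.

```python
# Sex/education example from the slide: (ID, sex, education, Y_i)
DATA = [
    (1, "Male", "High school", 500), (2, "Male", "High school", 500),
    (3, "Male", "High school", 500), (4, "Male", "Junior high", 100),
    (5, "Female", "High school", 500), (6, "Female", "Junior high", 100),
    (7, "Female", "Junior high", 100), (8, "Female", "Junior high", 100),
]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def gap(data, education=None):
    """Female mean minus male mean, optionally within one education level."""
    rows = [r for r in data if education is None or r[2] == education]
    female = [y for _, sex, _, y in rows if sex == "Female"]
    male = [y for _, sex, _, y in rows if sex == "Male"]
    return mean(female) - mean(male)


print(gap(DATA))                 # -200.0 (no control)
print(gap(DATA, "High school"))  # 0.0
print(gap(DATA, "Junior high"))  # 0.0
```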

(20)

Redundant control


• Even if $Z_i$ is a covariate, you don't necessarily need to control for it if

$E[Y_i \mid T_i = t, X_i = x] = E[Y_i \mid T_i = t, X_i = x, Z_i = z]$ for any $t, x, z$.

⇒ The members of the group with $T_i = t, X_i = x$ are the same as those of the group with $T_i = t, X_i = x, Z_i = z$.

• Because the regional literacy rate is common within a region, the groups defined by region contain the same members as the groups defined by region and regional literacy rate. ⇒ By using the region name alone, we automatically control for factors that are common within a region (e.g., average income, public facilities, and country GDP).
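The sketch below shows why a region-level variable is redundant once region is controlled for. The literacy rates are hypothetical values, not taken from the slides. Any values that are constant within a region give the same result, because the groups defined by $(T_i, \text{region})$ and by $(T_i, \text{region}, \text{literacy})$ contain exactly the same units.

```python
# Numerical example from the slides: (ID, T_i, region, Y_i)
DATA = [
    (1, 0, "Rich", 500), (2, 0, "Rich", 500), (3, 0, "Rich", 500), (4, 0, "Poor", 100),
    (5, 1, "Rich", 500), (6, 1, "Poor", 100), (7, 1, "Poor", 100), (8, 1, "Poor", 100),
]
# Hypothetical regional literacy rates (NOT from the slides)
LITERACY = {"Rich": 0.95, "Poor": 0.70}


def partition(data, key):
    """Group unit IDs by key(obs) and return the set of groups."""
    groups = {}
    for obs in data:
        groups.setdefault(key(obs), set()).add(obs[0])
    return {frozenset(ids) for ids in groups.values()}


by_region = partition(DATA, lambda obs: (obs[1], obs[2]))
by_region_and_literacy = partition(DATA, lambda obs: (obs[1], obs[2], LITERACY[obs[2]]))
print(by_region == by_region_and_literacy)  # True
```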
