Econometrics
Regression with a binary outcome
Keisuke Kawata IDEC
Binary explained variables
• In some cases, we would like to estimate the effect on a binary outcome, e.g.)
• The effects of class size on drop out.
• The effects of parent income on college graduation.
• The effects of the number of children on mother’s labor supply.
• The effects of business cycle in home country on migration.
⇒ How can we estimate such effects?
Binary treatments
• Let us denote the (binary) potential outcome by $Y_i(T)$ (= 0 or 1).
• If the outcome is a binary variable, $E[Y \mid T] = \Pr(Y = 1 \mid T)$.
• A good estimator of the average difference in the probability that $Y = 1$ between the $T = 1$ and $T = 0$ groups is the difference in sample means,
$$\hat{\beta} = \bar{Y}_{T=1} - \bar{Y}_{T=0}.$$
• If random assignment holds, this sample difference is the BLUE of the effect of the treatment on the probability that $Y = 1$.
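As an illustration of this difference-in-means estimator, here is a minimal sketch in Python with made-up data (both the language and all numbers are my choices for illustration; the course's R session uses other tools):

```python
# Difference in sample means of a binary outcome between the T=1 and
# T=0 groups. All data values are hypothetical.
import numpy as np

T = np.array([1, 1, 1, 0, 0, 0, 1, 0])   # treatment indicator
Y = np.array([1, 0, 1, 0, 1, 0, 1, 0])   # binary outcome

# Sample analogue of E[Y|T=1] - E[Y|T=0]: since Y is binary, each
# group mean is the share of Y=1, i.e. an estimate of Pr(Y=1|T).
beta_hat = Y[T == 1].mean() - Y[T == 0].mean()
print(beta_hat)  # 0.75 - 0.25 = 0.5
```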
Continuous treatments: Linear model
• With a continuous treatment, we need a population model for the conditional probability.
⇒ What type of model of the conditional probability should be applied?
• Linear probability model:
$$\Pr(Y = 1 \mid T = t) = \beta_0 + \beta_1 t$$
⇒ We can use the standard OLS technique to get the estimator $\hat{\beta}_1$, which can be interpreted as the effect of the treatment on the probability that $Y = 1$.
⇒ Under the least squares assumptions, these estimators are unbiased and consistent.
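A minimal sketch of estimating a linear probability model by OLS, using Python and simulated data (both are illustrative assumptions, not the course's setup):

```python
# Linear probability model by OLS on simulated data: the true model is
# Pr(Y=1|T=t) = 0.2 + 0.5*t, and OLS should recover these coefficients.
import numpy as np

rng = np.random.default_rng(0)
n = 5000
T = rng.uniform(0, 1, n)                 # continuous treatment
p = 0.2 + 0.5 * T                        # true success probability
Y = (rng.uniform(size=n) < p).astype(float)   # binary outcome

X = np.column_stack([np.ones(n), T])     # design matrix with intercept
b0_hat, b1_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
print(b0_hat, b1_hat)                    # close to 0.2 and 0.5
```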
Problem of linear model: Graphical example
[Figure: the fitted line $\Pr(Y = 1 \mid T = t) = \beta_0 + \beta_1 t$ plotted against $T$; for small or large values of $T$, the predicted "probability" falls below 0 or above 1.]
Non-linear population model
• We assume the population model as
$$\Pr(Y = 1 \mid T = t) = \Phi(\beta_0 + \beta_1 t),$$
where $\Phi$ is a cumulative distribution function (c.d.f.).
• Probit model: $\Phi$ is the c.d.f. of the standard normal distribution,
$$\Phi(\beta_0 + \beta_1 t) = \int_{-\infty}^{\beta_0 + \beta_1 t} \frac{1}{\sqrt{2\pi}} e^{-s^2/2}\, ds.$$
• Logit model: $\Phi$ is the c.d.f. of the logistic distribution,
$$\Phi(\beta_0 + \beta_1 t) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 t)}}.$$
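The two link functions can be compared numerically. A small Python sketch (the language choice is mine) using only the standard library:

```python
# Probit link: the standard normal c.d.f.; logit link: the logistic
# c.d.f. 1/(1+exp(-z)). Both map any real index z into (0, 1).
import math
from statistics import NormalDist

def probit(z):
    return NormalDist().cdf(z)          # Phi(z), standard normal c.d.f.

def logit(z):
    return 1 / (1 + math.exp(-z))       # logistic c.d.f.

for z in (-2, 0, 2):
    print(z, round(probit(z), 3), round(logit(z), 3))
# Both links give exactly 0.5 at z = 0; the logit has fatter tails.
```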
Graphical (rough) image
[Figure: the S-shaped curve $\Phi(\beta_0 + \beta_1 T)$ plotted against $T$, bounded between 0 and 1.]
The shape of the logit model is similar to that of the probit model.
Non-linear population model
• How to estimate the coefficients of probit and logit models?
← Because the population model is not additively separable, we cannot apply standard OLS estimation.
Approach 1) Nonlinear least squares estimation:
The estimators are chosen to minimize the sum of squared prediction mistakes,
$$\min_{b_0, b_1} \sum_i \left( Y_i - \Phi(b_0 + b_1 T_i) \right)^2.$$
⇒ Consistent, but less efficient than maximum likelihood estimation.
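The nonlinear least squares idea can be sketched as follows, on simulated probit data, with a deliberately crude grid search standing in for a real optimizer (Python, the data, and the grid are all illustrative assumptions):

```python
# Nonlinear least squares for a probit model: pick (b0, b1) minimizing
# the sum of squared prediction mistakes sum_i (Y_i - Phi(b0+b1*T_i))^2.
# All numbers are illustrative; a grid search replaces a real optimizer.
import numpy as np
from statistics import NormalDist

Phi = NormalDist().cdf                  # standard normal c.d.f.

rng = np.random.default_rng(1)
n = 2000
T = rng.normal(size=n)
p_true = np.array([Phi(z) for z in -0.4 + 1.0 * T])
Y = (rng.uniform(size=n) < p_true).astype(float)

def ssr(b0, b1):
    # sum of squared prediction mistakes for candidate (b0, b1)
    fitted = np.array([Phi(z) for z in b0 + b1 * T])
    return float(((Y - fitted) ** 2).sum())

grid = np.linspace(-2, 2, 21)           # step 0.2; true values lie on it
b0_hat, b1_hat = min(((a, b) for a in grid for b in grid),
                     key=lambda ab: ssr(*ab))
print(b0_hat, b1_hat)                   # near the true -0.4 and 1.0
```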
Maximum Likelihood Estimation: example
Maximum Likelihood Estimation: the estimators are determined to maximize the likelihood of drawing the observed data.
e.g.) Suppose the population distribution is binary:
$Y = 1$ with probability $p$,
$Y = 0$ with probability $1 - p$.
Our data set is:
ID  The value of y
1   1
2   0
3   1
• Let us try to estimate $p$ using maximum likelihood estimation.
Maximum Likelihood Estimation: example
If we use purely random sampling, the probability that our data set is drawn is
$$L(p) = p \times (1 - p) \times p.$$
⇒ This is called the likelihood function.
Maximum likelihood estimators are defined to maximize the likelihood function:
$$\max_p \; p \times (1 - p) \times p.$$
Then, $\hat{p} = 2/3$, which equals the sample mean of $y$.
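The maximization can be checked numerically. A tiny Python sketch (the language is my choice) confirming that $L(p) = p \times (1-p) \times p$ peaks near $p = 2/3$:

```python
# Maximize the likelihood L(p) = p * (1-p) * p for the observations
# (1, 0, 1) by a simple grid search over p in (0, 1).
likelihood = lambda p: p * (1 - p) * p

grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=likelihood)
print(p_hat)  # 0.667, i.e. approximately 2/3 = the sample mean
```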
Maximum Likelihood Estimation
Procedure of maximum likelihood estimation
1. Under the assumption about the distribution (probit: normal, logit: logistic), calculate the likelihood as a function of the unknown parameters $\beta_0, \dots, \beta_k$.
2. Estimate the unknown parameters by maximizing the likelihood function.
Under the assumption about the distribution, maximum likelihood estimators are consistent and have an asymptotically normal distribution in large samples
⇒ more efficient than nonlinear least squares estimation.
⇒ We can do statistical tests and construct confidence intervals.
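The two-step procedure can be sketched for a logit model as follows, with simulated data and plain gradient ascent standing in for a packaged optimizer (Python, the data, and the optimizer are all illustrative assumptions):

```python
# MLE for a logit model: (1) the Bernoulli log-likelihood is a function
# of (b0, b1); (2) maximize it. The logit log-likelihood is concave, so
# simple gradient ascent converges. All numbers are illustrative.
import numpy as np

rng = np.random.default_rng(2)
n = 3000
T = rng.normal(size=n)
p_true = 1 / (1 + np.exp(-(0.3 + 0.8 * T)))
Y = (rng.uniform(size=n) < p_true).astype(float)

X = np.column_stack([np.ones(n), T])    # intercept and treatment
b = np.zeros(2)                         # start (b0, b1) at zero
for _ in range(2000):
    p = 1 / (1 + np.exp(-X @ b))        # Pr(Y=1|T) at current parameters
    b += (X.T @ (Y - p)) / n            # score of the average log-likelihood
print(b)                                # near the true (0.3, 0.8)
```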
Marginal effects
• The coefficients of probit and logit do not equal marginal effects:
$$\beta_1 \neq \frac{\partial E[Y \mid T]}{\partial T}.$$
⇒ The magnitude of $\beta_1$ is not directly interpretable.
• Using statistical software, you can calculate marginal effects ← these should be reported in your paper.
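For the logit model the marginal effect has the closed form $\beta_1 \Phi'(\beta_0 + \beta_1 t) = \beta_1 p(1-p)$, which varies with $t$. A small Python sketch with assumed (not estimated) coefficients:

```python
# The logit coefficient b1 is not the marginal effect: the effect of T
# on Pr(Y=1) is b1 * p * (1-p), which changes along T. The coefficients
# and evaluation points below are assumed for illustration.
import numpy as np

b0, b1 = 0.3, 0.8                       # assumed logit coefficients
T = np.linspace(-3, 3, 601)             # evaluation points

p = 1 / (1 + np.exp(-(b0 + b1 * T)))    # Pr(Y=1|T) at each point
marginal = b1 * p * (1 - p)             # d Pr(Y=1|T) / dT at each point

print(round(marginal.max(), 3))         # 0.2: largest where p = 0.5
print(round(marginal.mean(), 3))        # a simple average marginal effect
```

Statistical software reports such averages directly; the point is that $\beta_1 = 0.8$ itself never equals the probability change.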
Conclusion: Linear VS Probit VS Logit
• Which type of specification is better? ⇒ No clear answer.
• You should first obtain estimates from a linear model as a benchmark.
• In many cases, the results of Probit and Logit are not so different.
• For the R session, let's install the package “mfx”.