Econometrics
Dummy variable and cross term
Keisuke Kawata
IDEC, Hiroshima University
Dummy variable: Revisit
(Reminder) If the population model includes binary treatment and/or control variables, you should use dummy variables.
e.g.) Gender dummy (T): T=1 if male, T=0 if female.
• If a discrete variable (category) has three or more outcomes, what should we do? e.g.) The effect of education level (junior high, high school, and college) on income. Can we use a single dummy variable with T=0 if junior high school, T=1 if high school, and T=2 if college? ⇒ No: it forces the income gap between junior high and high school to equal the gap between high school and college.
[Figure: Wrong dummy variable. Y plotted against X = 0 (junior high), 1 (high school), 2 (college), with a single fitted line $Y_i = \beta_0 + \beta_1 T_i$; annotation: "Too strong assumption".]
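To see why this assumption is too strong, here is a minimal numerical sketch in Python with NumPy. The mean incomes 200, 300, and 500 for junior high, high school, and college are hypothetical numbers chosen for illustration, and the data contain no noise. Regressing income on the single variable T = 0, 1, 2 forces the fitted means to be equally spaced, even though the true gaps differ.

```python
import numpy as np

# Hypothetical mean incomes (illustrative numbers): 0 = junior high, 1 = high school, 2 = college.
true_means = np.array([200.0, 300.0, 500.0])

# Ten people per level and no noise, so the fit reflects only the functional form.
T = np.repeat([0, 1, 2], 10)
Y = true_means[T]

# OLS of Y on an intercept and the single "wrong" dummy T.
X = np.column_stack([np.ones(T.size), T])
beta = np.linalg.lstsq(X, Y, rcond=None)[0]

fitted = beta[0] + beta[1] * np.array([0, 1, 2])
print("fitted means:", fitted.round(2))  # both gaps equal beta[1]
print("true means:  ", true_means)       # true gaps are 100 and 200
```

Here the single slope splits the difference (150 per step), so the fitted line understates the college premium and overstates the high school premium.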
Multiple dummy variables
If your population model includes discrete explanatory and control variables (and the sample size is large enough), you should use multiple dummy variables.
e.g.) We should use two dummy variables: the high school dummy $D_H$ and the college dummy $D_C$.
$D_H = 1$ if the highest level completed is high school, and $D_H = 0$ if not.
$D_C = 1$ if the highest level completed is college, and $D_C = 0$ if not.
⇒ We estimate the population model as
$E[Y \mid D_H, D_C] = \beta_0 + \beta_1 D_H + \beta_2 D_C$
⇒ We can use the standard approach (OLS), as in multiple regression.
Multiple dummy variables: Interpretation
• The expected income of junior high school graduates ($D_H = D_C = 0$) is
$E[Y \mid D_H = D_C = 0] = \beta_0$
• The expected income of high school graduates ($D_H = 1, D_C = 0$) is
$E[Y \mid D_H = 1, D_C = 0] = \beta_0 + \beta_1$
• The expected income of college graduates ($D_H = 0, D_C = 1$) is
$E[Y \mid D_H = 0, D_C = 1] = \beta_0 + \beta_2$
⇒ $\beta_1$: the income difference between high school and junior high school.
⇒ $\beta_2$: the income difference between college and junior high school.
⇒ $\beta_2 - \beta_1$: the income difference between college and high school.
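A minimal sketch that checks this interpretation on simulated data. The income process (level means 200, 300, 500 plus normal noise), the sample size, and the seed are assumptions for illustration only. Because the model has a dummy for every category except the reference, the OLS estimates reproduce the sample group means exactly, so $\hat\beta_1$, $\hat\beta_2$, and $\hat\beta_2 - \hat\beta_1$ equal the corresponding differences in sample means.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sample: education level 0 = junior high, 1 = high school, 2 = college.
educ = rng.integers(0, 3, size=300)
# Assumed income process (illustration only): level means 200, 300, 500 plus noise.
income = np.array([200.0, 300.0, 500.0])[educ] + rng.normal(0, 50, size=educ.size)

# Two dummies; junior high is the reference category.
D_H = (educ == 1).astype(float)
D_C = (educ == 2).astype(float)
X = np.column_stack([np.ones_like(D_H), D_H, D_C])

b0, b1, b2 = np.linalg.lstsq(X, income, rcond=None)[0]

# Sample mean income in each education level.
means = [income[educ == k].mean() for k in range(3)]
print(b0, means[0])                  # beta_0: junior high mean
print(b1, means[1] - means[0])       # beta_1: high school - junior high
print(b2 - b1, means[2] - means[1])  # beta_2 - beta_1: college - high school
```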
[Figure: Correct dummy variable. Y plotted separately for junior high, high school, and college.]
We don't assume any parametric relationship between income and education level.
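Note: reference category
Why don't we make a junior high school dummy $D_J$? ⇒ For every person, the dummy variables satisfy
$D_J + D_H + D_C = 1$,
so $D_J = 1 - D_H - D_C$ is an exact linear combination of the intercept and the other two dummies (perfect multicollinearity), and the OLS estimators cannot be computed.
• If a discrete variable has $K$ outcomes, we should make $K - 1$ dummy variables.
⇒ We should not make the dummy for one category (← called the reference category).
⇒ The coefficient of a dummy variable should be interpreted as the difference between that category and the reference category.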
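The reference-category rule can be checked by computing the rank of the regressor matrix. In this sketch (six hypothetical people, two per level), adding $D_J$ next to the intercept, $D_H$, and $D_C$ gives four columns but rank three, so $X'X$ is singular and $(X'X)^{-1}X'Y$ does not exist; dropping $D_J$ restores full column rank.

```python
import numpy as np

# Hypothetical education levels for six people: 0 = junior high, 1 = high school, 2 = college.
educ = np.array([0, 0, 1, 1, 2, 2])
D_J = (educ == 0).astype(float)
D_H = (educ == 1).astype(float)
D_C = (educ == 2).astype(float)
const = np.ones(educ.size)

# The three dummies always sum to the constant term.
assert np.allclose(D_J + D_H + D_C, const)

X_all = np.column_stack([const, D_J, D_H, D_C])  # intercept + all three dummies
X_ref = np.column_stack([const, D_H, D_C])       # junior high as reference

print(np.linalg.matrix_rank(X_all), X_all.shape[1])  # rank 3 < 4 columns
print(np.linalg.matrix_rank(X_ref), X_ref.shape[1])  # rank 3 = 3 columns
```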
Dummy variable VS Continuous variable
• In some cases, a categorical variable can be naturally converted into a continuous variable.
e.g.) The education level can be converted into years of education: junior high ⇒ 9, high school ⇒ 12, college ⇒ 16.
• The linearity assumption on the continuous variable may be acceptable.
⇒ There is a trade-off:
Dummy variable approach: the population model is more flexible ⇒ smaller bias.
Continuous variable approach: fewer parameters to estimate ⇒ smaller variance.
Dummy variable VS Continuous variable
• If the sample size is large enough, the advantage of the continuous variable approach is small, because the variance is already small enough.
⇒ We should use the dummy variable approach.
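The trade-off can be illustrated with a small Monte Carlo sketch. Everything here is an assumption for illustration: years of schooling 9, 12, 16 for the three levels, hypothetical non-linear mean incomes 200, 300, 500, normal noise with standard deviation 100, and m people per level. Both approaches estimate the expected income of high school graduates; the sketch reports the bias and mean squared error (MSE) of each over repeated samples.

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed years of schooling and hypothetical, non-linear mean incomes.
years = np.array([9.0, 12.0, 16.0])        # junior high, high school, college
true_mean = np.array([200.0, 300.0, 500.0])
sigma = 100.0                              # assumed noise standard deviation


def one_draw(m):
    """Simulate m people per level; return (dummy, continuous) estimates of the high school mean."""
    level = np.repeat([0, 1, 2], m)
    y = true_mean[level] + rng.normal(0.0, sigma, size=level.size)

    # Dummy variable approach: the fitted value b0 + b1 equals the high school sample mean.
    dummy_est = y[level == 1].mean()

    # Continuous variable approach: regress y on years, then predict at 12 years.
    X = np.column_stack([np.ones(level.size), years[level]])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    cont_est = b[0] + b[1] * years[1]
    return dummy_est, cont_est


def bias_and_mse(m, reps=1000):
    est = np.array([one_draw(m) for _ in range(reps)])  # shape (reps, 2)
    err = est - true_mean[1]
    return err.mean(axis=0), (err ** 2).mean(axis=0)


results = {m: bias_and_mse(m) for m in (10, 1000)}
for m, (bias, mse) in results.items():
    print(f"m={m:4d}  bias [dummy, cont] = {bias.round(1)}  MSE [dummy, cont] = {mse.round(1)}")
```

Under these assumed numbers, the continuous approach should win on MSE when m is small, but its bias (about 19 here) does not shrink as m grows, while the variance of the dummy approach does, so the dummy approach wins in large samples.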
Dummy variables and Multicollinearity
• In some cases, we would like to estimate or control for the effects of macro factors. e.g.) Survey data from some regions.
• You conduct a household survey in three villages (A, B, and C).
• Your data set includes information on the population size and the price of rice in each village.
Village | Population size | Price of rice
A | 1000 | 140
B | 1200 | 130
C | 800 | 200
• We suppose the population model:
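$Y_i = \beta_0 + \beta_1 \times \text{Price}_i + \beta_2 \times \text{Pop}_i + \beta_3 \times V_{B,i} + \beta_4 \times V_{C,i} + u_i$
where $\text{Price}$ is the price of rice, $\text{Pop}$ is the population size, and $V_B$, $V_C$ are the dummies for villages B and C.
Note: Village A is the reference category.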
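• Can we get the OLS estimators? ⇒ No.
⇒ Between the village dummies ($V_B$, $V_C$) and the village characteristic variables (rice price and population size), there are perfect linear relationships:
$\text{Price} = 140 - 10 \times V_B + 60 \times V_C$
$\text{Pop} = 1000 + 200 \times V_B - 200 \times V_C$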
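These perfect linear relationships can be verified directly from the table. The sketch also builds a regressor matrix for a hypothetical survey with four households per village (each household inherits its village's price and population size) and shows that its five columns have rank three, so OLS cannot separate the village dummies from the village characteristics.

```python
import numpy as np

# Village-level data from the table (village A is the reference category).
pop = np.array([1000.0, 1200.0, 800.0])
price = np.array([140.0, 130.0, 200.0])
V_B = np.array([0.0, 1.0, 0.0])
V_C = np.array([0.0, 0.0, 1.0])

# Village characteristics are exact linear functions of the village dummies.
assert np.allclose(price, 140 - 10 * V_B + 60 * V_C)
assert np.allclose(pop, 1000 + 200 * V_B - 200 * V_C)

# Hypothetical survey: 4 households per village, indexed by their village.
idx = np.repeat([0, 1, 2], 4)
X = np.column_stack([np.ones(idx.size), price[idx], pop[idx], V_B[idx], V_C[idx]])

# Five regressors, but only three linearly independent columns: X'X is singular.
print(X.shape[1], np.linalg.matrix_rank(X))
```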
Practical issue
• We can include either the village dummies or the village characteristic variables, but not both.
• Which should be included?
• If your interest is the effects of village factors
⇒ the village characteristic variables should be included.
• If your interest is not the effects of village factors
⇒ the village dummies should be included (controlled).
• The village dummies are linearly related to any factor common to people within the same village (e.g., the village's history, location, and weather).
If you control for the village dummies, such common factors are also controlled.
Interaction between factors
• In the linear specification, the effect of one treatment does not depend on the other variables.
e.g.) The effect of education year on income
To control for the effect of gender, we estimate the following population model ($\text{Male} = 1$ if male, $0$ if female):
$\text{Income} = \beta_0 + \beta_1 \times \text{Educ\_year} + \beta_2 \times \text{Male} + u$
⇒ $\beta_1$ captures the effect of education year holding gender constant.
⇒ This assumes that the effect of education year does not depend on gender.
⇒ No interaction effects among regressors.
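[Figure: e.g. The effect of education year on income. Income plotted against education year as two parallel lines: $E[\text{Income} \mid \text{Educ\_year}, \text{Male} = 0] = \beta_0 + \beta_1 \times \text{Educ\_year}$ and $E[\text{Income} \mid \text{Educ\_year}, \text{Male} = 1] = \beta_0 + \beta_1 \times \text{Educ\_year} + \beta_2$.]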
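A sketch of what the no-interaction model implies, on simulated data. The data-generating process is an assumption for illustration and deliberately gives men a larger return to education (30 per year versus 20 for women). The fitted model still predicts the same male-female gap, $\hat\beta_2$, at every education year, because it has no interaction term.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sample: education years 9 to 16; male = 1 for men, 0 for women.
n = 400
educ_year = rng.integers(9, 17, size=n).astype(float)
male = rng.integers(0, 2, size=n).astype(float)
# Assumed process (illustration only): the return to education is 20 for women, 30 for men.
income = 50 + 20 * educ_year + 10 * male * educ_year + rng.normal(0, 40, size=n)

# Fit the no-interaction model: Income = b0 + b1 * Educ_year + b2 * Male.
X = np.column_stack([np.ones(n), educ_year, male])
b0, b1, b2 = np.linalg.lstsq(X, income, rcond=None)[0]


def predict(educ, m):
    """Fitted income from the no-interaction model."""
    return b0 + b1 * educ + b2 * m


# The fitted male-female gap equals b2 at every education year: the lines are parallel.
for e in (9.0, 16.0):
    print(e, predict(e, 1.0) - predict(e, 0.0))
```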
Interaction between categorical variables
• Suppose there are interaction effects among categorical variables.
e.g.) The interaction effects of gender and nationality on income. We should estimate the following population model
$E[\text{Income} \mid M, N] = \beta_0 + \beta_1 M + \beta_2 N + \beta_3 M \times N$
where $M$ is the male dummy (= 1 if male, = 0 if female), and $N$ is the native dummy (= 1 if native, = 0 if non-native).
Interaction between categorical variables
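Male & Native: $E[\text{Income} \mid M = 1, N = 1] = \beta_0 + \beta_1 + \beta_2 + \beta_3$
Male & Non-native: $E[\text{Income} \mid M = 1, N = 0] = \beta_0 + \beta_1$
Female & Native: $E[\text{Income} \mid M = 0, N = 1] = \beta_0 + \beta_2$
Female & Non-native: $E[\text{Income} \mid M = 0, N = 0] = \beta_0$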
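• The impacts of nationality
Among males ⇒ $E[\text{Income} \mid M = 1, N = 1] - E[\text{Income} \mid M = 1, N = 0] = \beta_2 + \beta_3$
Among females ⇒ $E[\text{Income} \mid M = 0, N = 1] - E[\text{Income} \mid M = 0, N = 0] = \beta_2$
• $\beta_3$ captures the difference in the effect of nationality between males and females.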
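A minimal check on simulated data (the cell means, noise, and sample size are hypothetical). With the interaction term, the model has one parameter per gender-nationality cell, so OLS reproduces the four sample cell means, and $\hat\beta_3$ equals the nationality effect among males minus the nationality effect among females.

```python
import numpy as np

rng = np.random.default_rng(2)

n = 800
M = rng.integers(0, 2, size=n).astype(float)  # male dummy
N = rng.integers(0, 2, size=n).astype(float)  # native dummy
# Hypothetical cell means plus noise (illustration only).
income = 300 + 50 * M + 40 * N + 30 * M * N + rng.normal(0, 60, size=n)

X = np.column_stack([np.ones(n), M, N, M * N])
b0, b1, b2, b3 = np.linalg.lstsq(X, income, rcond=None)[0]


def cell(m, nat):
    """Sample mean income in the (male, native) cell."""
    return income[(M == m) & (N == nat)].mean()


effect_male = cell(1, 1) - cell(1, 0)    # nationality effect among males
effect_female = cell(0, 1) - cell(0, 0)  # nationality effect among females
print(b2, effect_female)
print(b2 + b3, effect_male)
print(b3, effect_male - effect_female)
```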
Interaction between categorical and continuous variables
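• Suppose there are interaction effects between categorical and continuous variables. e.g.) The interaction effects of gender and education year on income. We should estimate the following population model, where $M$ is the male dummy:
$E[\text{Income} \mid \text{Educ\_year}, M] = \beta_0 + \beta_1 \times \text{Educ\_year} + \beta_2 \times M + \beta_3 \times M \times \text{Educ\_year}$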
Interaction between categorical and continuous variables
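[Figure: Income plotted against Educ_year as two lines with different slopes: $E[\text{Income} \mid M = 0, \text{Educ\_year}] = \beta_0 + \beta_1 \times \text{Educ\_year}$ and $E[\text{Income} \mid M = 1, \text{Educ\_year}] = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) \times \text{Educ\_year}$.]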
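• $\beta_3$ captures the difference in the effect of education year on income between males and females.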
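A sketch on simulated data (the coefficients and noise are hypothetical): the fully interacted model gives exactly the same intercepts and slopes as running separate regressions for women and for men, so $\hat\beta_1$ is the female slope and $\hat\beta_1 + \hat\beta_3$ is the male slope.

```python
import numpy as np

rng = np.random.default_rng(3)

n = 500
educ_year = rng.integers(9, 17, size=n).astype(float)
male = rng.integers(0, 2, size=n).astype(float)
# Hypothetical process: slope 20 for women, 20 + 10 for men (illustration only).
income = 50 + 20 * educ_year + 15 * male + 10 * male * educ_year + rng.normal(0, 40, size=n)

# Interaction model: Income = b0 + b1*Educ_year + b2*M + b3*M*Educ_year.
X = np.column_stack([np.ones(n), educ_year, male, male * educ_year])
b0, b1, b2, b3 = np.linalg.lstsq(X, income, rcond=None)[0]


def simple_ols(x, y):
    """Intercept and slope from a one-regressor OLS fit."""
    Z = np.column_stack([np.ones(x.size), x])
    return np.linalg.lstsq(Z, y, rcond=None)[0]


f = male == 0
a_f, s_f = simple_ols(educ_year[f], income[f])    # women only
a_m, s_m = simple_ols(educ_year[~f], income[~f])  # men only

print("female:", b0, b1, "vs", a_f, s_f)
print("male:  ", b0 + b2, b1 + b3, "vs", a_m, s_m)
```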
Interaction between continuous variables
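• Suppose there are interaction effects between continuous variables. e.g.) The interaction effects of age and education year on income.
We should estimate the following population model
$E[\text{Income} \mid \text{Age}, \text{Educ\_year}] = \beta_0 + \beta_1 \text{Age} + \beta_2 \text{Educ\_year} + \beta_3 \text{Age} \times \text{Educ\_year}$
The effect of age on income is
$\frac{\partial E[\text{Income} \mid \text{Age}, \text{Educ\_year}]}{\partial \text{Age}} = \beta_1 + \beta_3 \text{Educ\_year}$
⇒ $\beta_3$ captures how the effect of age changes with education year.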
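A small numerical check of this derivative, assuming hypothetical coefficient values $\beta = (100, 5, 20, 0.5)$ rather than estimates: the analytical age effect $\beta_1 + \beta_3 \times \text{Educ\_year}$ matches a central finite difference taken at age 40.

```python
import numpy as np

# Hypothetical coefficients (illustration only), in the order beta_0..beta_3.
beta = np.array([100.0, 5.0, 20.0, 0.5])


def expected_income(age, educ_year):
    b0, b1, b2, b3 = beta
    return b0 + b1 * age + b2 * educ_year + b3 * age * educ_year


def age_effect(educ_year):
    """Analytical derivative of E[Income | Age, Educ_year] with respect to Age."""
    return beta[1] + beta[3] * educ_year


# Central finite difference at age 40 as a numerical check.
h = 1e-4
for e in (9.0, 12.0, 16.0):
    numeric = (expected_income(40 + h, e) - expected_income(40 - h, e)) / (2 * h)
    print(e, age_effect(e), round(numeric, 6))
```

With $\beta_3 = 0.5 > 0$, the age effect rises from 9.5 at 9 years of education to 13 at 16 years.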
Quiz
• True/False question
1. We estimate the population model as $E[Y \mid T, X] = \beta_0 + \beta_1 T + \beta_2 X$. If $\beta_2 > 0$, the effect of T is increased if X is large.
2. We estimate the population model as $E[Y \mid X_1, X_2] = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2$. If $\beta_3 < 0$, the effect of $X_1$ is increased if $X_2$ is small.
Conclusion
• We can use OLS to estimate a non-linear population model if the model is additively separable.
• Generally, if we assume a more complicated population model, the bias may be eliminated, but the efficiency goes down (the variance increases).
• For categorical variables, you should use the dummy variable approach unless your sample size is small.
• We should incorporate higher-order and interaction effects to obtain additional (and interesting) implications.