Example 5: Multinomial logit model:
The ith individual has m + 1 choices, i.e., j = 0, 1, ..., m.
P(y_i = j) = exp(X_i β_j) / ∑_{j=0}^{m} exp(X_i β_j) ≡ P_{ij},
for β_0 = 0 (a normalization). The case of m = 1 corresponds to the binary logit model (binary choice).
Note that
log(P_{ij} / P_{i0}) = X_i β_j .
The log-likelihood function is:
log L(β_1, ..., β_m) = ∑_{i=1}^{n} ∑_{j=0}^{m} d_{ij} log P_{ij},
where d_{ij} = 1 when the ith individual chooses the jth alternative, and d_{ij} = 0 otherwise.
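The probabilities and log-likelihood above translate directly into code. Below is a minimal NumPy sketch (the function names and the tiny example data are hypothetical, chosen only for illustration); the normalization β_0 = 0 is imposed by fixing the first column of coefficients at zero:

```python
import numpy as np

def mlogit_probs(X, B):
    """P_ij = exp(X_i b_j) / sum_{k=0}^{m} exp(X_i b_k).

    X : (n, k) regressors; B : (k, m+1) coefficients with B[:, 0] = 0,
    which imposes the normalization beta_0 = 0."""
    V = X @ B                             # systematic parts X_i beta_j
    V = V - V.max(axis=1, keepdims=True)  # subtract row max to avoid overflow
    eV = np.exp(V)
    return eV / eV.sum(axis=1, keepdims=True)

def mlogit_loglik(X, y, B):
    """log L = sum_i sum_j d_ij log P_ij, with d_ij = 1{y_i = j}."""
    P = mlogit_probs(X, B)
    return np.log(P[np.arange(len(y)), y]).sum()
```

With all coefficients zero, every alternative gets probability 1/(m + 1), so the log-likelihood equals n·log(1/(m + 1)) — a handy sanity check before handing the function to an optimizer.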
Example 6: Nested logit model:
(i) In the 1st step, choose YES or NO, with probabilities P_Y and P_N = 1 − P_Y.
(ii) Stop if NO is chosen in the 1st step; go to the next step if YES is chosen.
(iii) In the 2nd step, choose A or B (given that YES was chosen in the 1st step), with probabilities P_{A|Y} and P_{B|Y}.
For simplicity, the logistic distribution is usually assumed at each step; hence the name nested logit model.
The probability that the ith individual chooses NO is:
P_{N,i} = 1 / (1 + exp(X_i β)).
The probability that the ith individual chooses YES and A is:
P_{A|Y,i} P_{Y,i} = P_{A|Y,i} (1 − P_{N,i}) = [exp(Z_i α) / (1 + exp(Z_i α))] × [exp(X_i β) / (1 + exp(X_i β))].
The probability that the ith individual chooses YES and B is:
P_{B|Y,i} P_{Y,i} = (1 − P_{A|Y,i}) (1 − P_{N,i}) = [1 / (1 + exp(Z_i α))] × [exp(X_i β) / (1 + exp(X_i β))].
For example, in the 1st step the ith individual decides whether or not to buy a car, and in the 2nd step chooses between car A and car B.
X_i includes annual income, distance from the nearest station, and so on.
Z_i includes speed, fuel efficiency, car company, color, and so on.
The likelihood function is:
L(α, β) = ∏_{i=1}^{n} P_{N,i}^{I_{1i}} [ ((1 − P_{N,i}) P_{A|Y,i})^{I_{2i}} ((1 − P_{N,i}) (1 − P_{A|Y,i}))^{1 − I_{2i}} ]^{1 − I_{1i}}
= ∏_{i=1}^{n} P_{N,i}^{I_{1i}} (1 − P_{N,i})^{1 − I_{1i}} [ P_{A|Y,i}^{I_{2i}} (1 − P_{A|Y,i})^{1 − I_{2i}} ]^{1 − I_{1i}},
where
I_{1i} = 1 if the ith individual decides not to buy a car in the 1st step, and I_{1i} = 0 if the ith individual decides to buy a car in the 1st step;
I_{2i} = 1 if the ith individual chooses A in the 2nd step, and I_{2i} = 0 if the ith individual chooses B in the 2nd step.
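Taking logs, this likelihood splits into two binary-logit pieces: a 1st-step piece in β and a 2nd-step piece in α. A minimal NumPy sketch (the function name and the small arrays in the check are hypothetical):

```python
import numpy as np

def nested_logit_loglik(alpha, beta, X, Z, I1, I2):
    """log L(alpha, beta) for the two-step nested logit above.

    I1[i] = 1 if individual i chooses NO (does not buy) in the 1st step;
    I2[i] = 1 if A is chosen in the 2nd step (only used when I1[i] = 0)."""
    P_N = 1.0 / (1.0 + np.exp(X @ beta))                  # P(NO)
    P_A = np.exp(Z @ alpha) / (1.0 + np.exp(Z @ alpha))   # P(A | YES)
    ll = (I1 * np.log(P_N)
          + (1 - I1) * (np.log(1.0 - P_N)
                        + I2 * np.log(P_A)
                        + (1 - I2) * np.log(1.0 - P_A)))
    return ll.sum()
```

Because β enters only the 1st-step factor and α only the 2nd-step factor, the two pieces can be maximized separately in this simple specification.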
Remember that E(y_i) = F(X_i β*), where β* = β/σ. Therefore, the size of β* by itself has no direct interpretation.
The marginal effect is given by:
∂E(y_i)/∂X_i = f(X_i β*) β*.
Thus, the marginal effect depends on the height of the density function f(X_i β*).
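The formula ∂E(y_i)/∂X_i = f(X_i β*) β* can be computed directly for the probit case (f = φ) and the logit case (f is the logistic density); the function names and input vectors below are illustrative assumptions:

```python
import numpy as np

def probit_marginal_effect(x_i, beta_star):
    """Marginal effect f(X_i beta*) beta* with f = phi (standard normal density)."""
    xb = float(x_i @ beta_star)
    f = np.exp(-0.5 * xb**2) / np.sqrt(2.0 * np.pi)   # phi(X_i beta*)
    return f * beta_star

def logit_marginal_effect(x_i, beta_star):
    """Same marginal effect with the logistic density f(z) = P(z)(1 - P(z))."""
    xb = float(x_i @ beta_star)
    p = 1.0 / (1.0 + np.exp(-xb))
    return p * (1.0 - p) * beta_star
```

Both densities peak at X_i β* = 0, so the marginal effect is largest for individuals whose choice probability is near one half.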
7.2 Limited Dependent Variable Model
Truncated Regression Model: Consider the following model:
y_i = X_i β + u_i,  u_i ∼ N(0, σ²),  when y_i > a,
where a is a constant, for i = 1, 2, ..., n.
Consider the case of y_i > a (i.e., when y_i ≤ a, y_i is not observed).
E(u_i | X_i β + u_i > a) = ∫_{a − X_i β}^{∞} u_i [ f(u_i) / (1 − F(a − X_i β)) ] du_i .
Suppose that u_i ∼ N(0, σ²), i.e., u_i/σ ∼ N(0, 1).
Using the following standard normal density and distribution functions:
φ(x) = (2π)^{−1/2} exp(−x²/2),  Φ(x) = ∫_{−∞}^{x} (2π)^{−1/2} exp(−z²/2) dz = ∫_{−∞}^{x} φ(z) dz,
f(x) and F(x) are given by:
f(x) = (2πσ²)^{−1/2} exp(−x²/(2σ²)) = (1/σ) φ(x/σ),  F(x) = ∫_{−∞}^{x} (2πσ²)^{−1/2} exp(−z²/(2σ²)) dz = Φ(x/σ).
[Review — Mean of a Truncated Normal Random Variable:]
Let X be a normal random variable with mean µ and variance σ². Consider E(X | X > a), where a is known.
The truncated distribution of X given X > a is:
f(x | x > a) = (2πσ²)^{−1/2} exp(−(x − µ)²/(2σ²)) / ∫_{a}^{∞} (2πσ²)^{−1/2} exp(−(x − µ)²/(2σ²)) dx = (1/σ) φ((x − µ)/σ) / [1 − Φ((a − µ)/σ)].
E(X | X > a) = ∫_{a}^{∞} x f(x | x > a) dx = ∫_{a}^{∞} x (2πσ²)^{−1/2} exp(−(x − µ)²/(2σ²)) dx / ∫_{a}^{∞} (2πσ²)^{−1/2} exp(−(x − µ)²/(2σ²)) dx
= [ σ φ((a − µ)/σ) + µ (1 − Φ((a − µ)/σ)) ] / [1 − Φ((a − µ)/σ)]
= σ φ((a − µ)/σ) / [1 − Φ((a − µ)/σ)] + µ,
which is shown below. The denominator is:
∫_{a}^{∞} (2πσ²)^{−1/2} exp(−(x − µ)²/(2σ²)) dx = ∫_{(a−µ)/σ}^{∞} (2π)^{−1/2} exp(−z²/2) dz
= 1 − ∫_{−∞}^{(a−µ)/σ} (2π)^{−1/2} exp(−z²/2) dz = 1 − Φ((a − µ)/σ),
where x is transformed into z = (x − µ)/σ. Note that x > a ⟹ z = (x − µ)/σ > (a − µ)/σ.
The numerator is:
∫_{a}^{∞} x (2πσ²)^{−1/2} exp(−(x − µ)²/(2σ²)) dx = ∫_{(a−µ)/σ}^{∞} (σz + µ) (2π)^{−1/2} exp(−z²/2) dz
= σ ∫_{(a−µ)/σ}^{∞} z (2π)^{−1/2} exp(−z²/2) dz + µ ∫_{(a−µ)/σ}^{∞} (2π)^{−1/2} exp(−z²/2) dz
= σ ∫_{(1/2)((a−µ)/σ)²}^{∞} (2π)^{−1/2} exp(−t) dt + µ (1 − Φ((a − µ)/σ))
= σ φ((a − µ)/σ) + µ (1 − Φ((a − µ)/σ)),
where z is transformed into t = z²/2. Note that z > (a − µ)/σ ⟹ t = z²/2 > (1/2)((a − µ)/σ)².
[End of Review]
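The closed form just reviewed is easy to verify numerically. The sketch below (the parameter values µ = 2, σ = 1.5, a = 1 are arbitrary choices for illustration, not from the notes) compares the formula E(X | X > a) = µ + σφ(c)/(1 − Φ(c)), with c = (a − µ)/σ, against a Monte Carlo average:

```python
import math
import numpy as np

def truncated_normal_mean(mu, sigma, a):
    """E(X | X > a) = mu + sigma * phi(c) / (1 - Phi(c)),  c = (a - mu)/sigma."""
    c = (a - mu) / sigma
    phi = math.exp(-0.5 * c * c) / math.sqrt(2.0 * math.pi)   # standard normal pdf
    Phi = 0.5 * (1.0 + math.erf(c / math.sqrt(2.0)))          # standard normal cdf
    return mu + sigma * phi / (1.0 - Phi)

# Monte Carlo check with illustrative values mu = 2, sigma = 1.5, a = 1
rng = np.random.default_rng(0)
draws = rng.normal(2.0, 1.5, size=1_000_000)
mc_mean = draws[draws > 1.0].mean()   # average of the draws that survive truncation
```

For µ = 0, σ = 1, a = 0 the formula reduces to φ(0)/(1/2) = √(2/π) ≈ 0.798, the mean of a half-normal variable.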
Therefore, the conditional expectation of u_i given X_i β + u_i > a is:
E(u_i | X_i β + u_i > a) = ∫_{a − X_i β}^{∞} u_i [ f(u_i) / (1 − F(a − X_i β)) ] du_i = ∫_{a − X_i β}^{∞} u_i (1/σ) φ(u_i/σ) / [1 − Φ((a − X_i β)/σ)] du_i
= σ φ((a − X_i β)/σ) / [1 − Φ((a − X_i β)/σ)].
Accordingly, the conditional expectation of y_i given y_i > a is given by:
E(y_i | y_i > a) = E(y_i | X_i β + u_i > a) = E(X_i β + u_i | X_i β + u_i > a)
= X_i β + E(u_i | X_i β + u_i > a) = X_i β + σ φ((a − X_i β)/σ) / [1 − Φ((a − X_i β)/σ)],
for i = 1, 2, ..., n.
Estimation:
MLE: The likelihood function
L(β, σ²) = ∏_{i=1}^{n} f(y_i − X_i β) / (1 − F(a − X_i β)) = ∏_{i=1}^{n} (1/σ) φ((y_i − X_i β)/σ) / [1 − Φ((a − X_i β)/σ)]
is maximized with respect to β and σ².
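In practice one maximizes the logarithm: each observation contributes log{(1/σ)φ((y_i − X_i β)/σ)} − log{1 − Φ((a − X_i β)/σ)}. A minimal sketch under that formulation (function names are hypothetical; Φ is built from the error function to avoid extra dependencies):

```python
import math
import numpy as np

def log_phi(z):
    """log of the standard normal density."""
    return -0.5 * z**2 - 0.5 * np.log(2.0 * np.pi)

def std_Phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def trunc_loglik(beta, sigma, X, y, a):
    """Truncated-regression log-likelihood over the observed (y_i > a) sample:
    sum_i [ log{(1/sigma) phi((y_i - X_i beta)/sigma)}
            - log{1 - Phi((a - X_i beta)/sigma)} ]."""
    xb = X @ beta
    ll = 0.0
    for yi, xbi in zip(y, xb):
        ll += log_phi((yi - xbi) / sigma) - np.log(sigma)   # density term
        ll -= np.log(1.0 - std_Phi((a - xbi) / sigma))      # truncation correction
    return ll
```

This function can be handed to any numerical optimizer (minimizing its negative) to obtain the MLE of β and σ².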
Some Examples:
1. Buying a Car:
y_i = x_i β + u_i, where y_i denotes expenditure on a car, and x_i includes income, the price of the car, etc.
Only data on people who bought a car are observed; people who did not buy a car are not in the sample.
2. Working Hours of a Wife:
y_i represents the wife's working hours, and x_i includes the number of children, age, education, the husband's income, etc.
3. Stochastic Frontier Model:
y_i = f(K_i, L_i) + u_i, where y_i denotes production, K_i is capital stock, and L_i is the amount of labor.
We always have y_i ≤ f(K_i, L_i), i.e., u_i ≤ 0.
f(K_i, L_i) is the maximum output attainable with inputs K_i and L_i.
Censored Regression Model or Tobit Model:
y_i = X_i β + u_i, if y_i > a;  y_i = a, otherwise.
The probability that y_i takes the value a is given by:
P(y_i = a) = P(y_i ≤ a) = F(a) ≡ ∫_{−∞}^{a} f(x) dx,
where f(·) and F(·) denote the density function and cumulative distribution function of y_i, respectively.
Therefore, the likelihood function is:
L(β, σ²) = ∏_{i=1}^{n} F(a)^{I(y_i = a)} × f(y_i)^{1 − I(y_i = a)},
where I(y_i = a) denotes the indicator function, which takes one when y_i = a and zero otherwise.
When u_i ∼ N(0, σ²), the likelihood function is:
L(β, σ²) = ∏_{i=1}^{n} [ ∫_{−∞}^{a} (2πσ²)^{−1/2} exp(−(y_i − X_i β)²/(2σ²)) dy_i ]^{I(y_i = a)} × [ (2πσ²)^{−1/2} exp(−(y_i − X_i β)²/(2σ²)) ]^{1 − I(y_i = a)},
which is maximized with respect to β and σ².
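In logs, censored observations (y_i = a) contribute log Φ((a − X_i β)/σ), since the integral above equals Φ((a − X_i β)/σ), while uncensored observations contribute the usual normal log-density. A minimal sketch under these assumptions (function names are hypothetical):

```python
import math
import numpy as np

def norm_Phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tobit_loglik(beta, sigma, X, y, a):
    """Censored-regression (Tobit) log-likelihood:
    y_i = a contributes log Phi((a - X_i beta)/sigma);
    y_i > a contributes log{(1/sigma) phi((y_i - X_i beta)/sigma)}."""
    xb = X @ beta
    ll = 0.0
    for yi, xbi in zip(y, xb):
        if yi == a:                       # censored observation
            ll += math.log(norm_Phi((a - xbi) / sigma))
        else:                             # uncensored observation
            z = (yi - xbi) / sigma
            ll += -0.5 * z * z - 0.5 * math.log(2.0 * math.pi) - math.log(sigma)
    return ll
```

Unlike the truncated model, the censored model keeps the y_i = a observations in the sample, which is why its likelihood has the extra Φ factor instead of dropping them.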
Example of Truncated Regression Model:
Demand Function for Watermelon
Workers' households among households with two or more persons (