(1)

Naoki Wakamori

University of Tokyo

2017 A1 Term

(2)

1. Demand for Differentiated Products

2. Discrete Choice Models with Micro Data

3. Discrete Choice Models with Aggregate Data

(3)

▶ What is a demand function?

(4)

▶ Almost all products are differentiated!

▶ Automobiles, banking, beer, personal computers, etc.

▶ Other examples?

(5)

▶ Different from homogeneous product markets

▶ Demand function for product j ∈ J:

$$Q^D_j = Q^D_j(P_1, P_2, \dots, P_J;\ X_1, X_2, \dots, X_J)$$

▶ Supply function for product j ∈ J:

$$Q^S_j = Q^S_j(P_1, P_2, \dots, P_J;\ X_1, X_2, \dots, X_J)$$

i.e., there are demand and supply functions for each product

(6)

▶ For example, the constant elasticity framework:

$$\log Q_j = \beta_{j,0} + \sum_{k=1}^{J} \beta_{j,k} \log P_k + \varepsilon_j.$$

(7)

▶ Too many elasticities!!

▶ If we have 100 cars, we need to estimate 100 × 100 = 10,000 price elasticities!

▶ This can easily exceed the number of observations available

▶ How to solve this problem?

(8)

▶ Possible solutions are:

1. Put some restrictions on cross-price elasticities:

1.1 symmetric cross-price elasticities

1.2 zero cross-price elasticities.

2. Group several products together to reduce the number of products.

(9)

▶ These are too arbitrary and have no theory behind them

(10)

▶ Therefore we take a structural approach, specifying how such a demand function is generated

(11)

▶ A structural approach is characterized by two features:

(12)

1. Characteristics approach:

Each product is assumed to be a bundle of characteristics, and consumers derive utility from these characteristics

$$X_j = [x_{j1}, x_{j2}, \dots, x_{jK}], \quad \text{for all } j$$

(13)

2. Utility maximization (revealed preference):

Each consumer chooses the product which gives the highest utility

$$u_{ij} = X_j\beta > X_l\beta = u_{il} \quad \text{for all } l \neq j$$

(14)

▶ Although we cannot observe the utility function itself, it is possible to infer consumers' utility from observed data.

(15)

▶ Disaggregate (micro, individual, or consumer-level) data

▶ Easily incorporate interactions of household and product characteristics

▶ Rarely available to researchers

▶ Aggregate (macro, market-level) data

▶ Often easily available to researchers

▶ Somewhat harder to incorporate individual demographics

(16)

▶ Advantages:

▶ Reduce the number of parameters in demand

▶ Reduce computational burden

▶ Enable us to predict demand for new products (Petrin, 2002)

▶ Disadvantages:

▶ Rule out multiple purchases (cf. Gentzkow, 2007)

▶ Put parametric assumptions on demand structure (cf. non-parametric identification by Fox et al. (2012), Fox and Gandhi (2015), and Byrne, Imai, Jain and Hirukawa (2015))

(17)

1. Demand for Differentiated Products

2. Discrete Choice Models with Micro Data

3. Discrete Choice Models with Aggregate Data

(18)

▶ In the basic econometrics course, we only consider cases with continuous dependent variables

(19)

▶ Dependent variables can be discrete values!

(20)

1. Unordered cases (qualitative variable models)

▶ Transportation choice: Bus, Taxi, Car, Train, and so on

▶ Occupation choice: Banker, medical doctor, lawyer and so on

(21)

2. Ordered cases

▶ Student’s Grade: A, B, C, D, F

▶ # of Automobiles owned by one household: 1, 2, 3, ...

▶ other example?

(22)

▶ Revealed preference provides useful information!!

(23)

▶ i denotes an individual: i = 1, 2, · · · , N

▶ j denotes a choice element: j ∈ J = {0, 1, 2, · · · , J}

▶ $X_{ij}$ denotes a vector of characteristics of person i and choice j

(24)

▶ Assume each i chooses an option j that maximizes her utility: $u_{ij} = X_{ij}\beta + \varepsilon_{ij}$

(25)

▶ Note that this model is called

(26)

▶ binary/multinomial choice models when J = 1 or J > 1,

▶ probit/logit models when $\varepsilon_{ij}$ follows the standard normal or logistic distribution

▶ random coefficient models if $\beta$ becomes $\beta_i$

(27)

▶ Typically a normalization is required, e.g., $u_{i0} = \varepsilon_{i0}$

(28)

▶ Assume that ε follows the Type I extreme value distribution: $F(x) = e^{-e^{-x}}$

Then this class of model is called “Multinomial Logit Models”

▶ What is the probability of choosing j for person i, Pr(d_i = j)?

▶ Consider a case with three alternatives, J = {0, 1, 2}

(29)

▶ McFadden showed that we have a nice analytic formula!

$$\Pr(d_i = j) = \frac{\exp(X_{ij}\beta)}{\sum_{k \in \mathcal{J}} \exp(X_{ik}\beta)}$$

▶ Show an example on the blackboard
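As a minimal sketch of the three-alternative case, the following Python snippet computes multinomial logit choice probabilities. The utility values in `v` are hypothetical numbers chosen for illustration (alternative 0 is normalized to zero), and the function name `logit_probs` is an assumption, not something from the slides.

```python
import math

def logit_probs(v):
    """Multinomial logit choice probabilities from deterministic utilities v[k]."""
    m = max(v)  # subtract the maximum for numerical stability
    expv = [math.exp(x - m) for x in v]
    total = sum(expv)
    return [e / total for e in expv]

# Hypothetical deterministic utilities X_ik * beta for alternatives 0, 1, 2
v = [0.0, 1.0, 2.0]
probs = logit_probs(v)
print(probs)
```

The ratio of any two probabilities depends only on the utility difference between those two alternatives, which is the source of the substitution patterns discussed later.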

(30)

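▶ We can construct the likelihood function.

$$\mathcal{L} = \prod_{i=1}^{N} \prod_{j=1}^{J} \left\{ \frac{\exp(X_i\beta_j)}{\sum_{k=1}^{J} \exp(X_i\beta_k)} \right\}^{y_{ij}},$$

where

$$y_{ij} = \begin{cases} 1, & \text{if } i \text{ chooses } j, \\ 0, & \text{otherwise.} \end{cases}$$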

(31)

Individual   Raw data    Prediction when β = 1   Prediction when β = 2
1            1  0  0     .5  .25  .25            .7  .1  .2
2            1  0  0     .7  .2   .1             .8  .1  .1
3            0  0  1     .4  .4   .2             .1  .1  .8
4            0  1  0     .3  .5   .2             .1  .8  .1
...          ...         ...                     ...
N            0  1  0     .2  .5   .3             .2  .7  .1

(32)

β = 1: Likelihood = .5 × .7 × .2 × .5 × ⋯ × .5

β = 2: Likelihood = .7 × .8 × .8 × .8 × ⋯ × .7
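The comparison can be reproduced with a short Python sketch. It uses only the five individuals shown explicitly in the table (1, 2, 3, 4 and N) and ignores the elided rows, so the numbers are illustrative rather than the full-sample likelihood. The function name `likelihood` is an assumption.

```python
# Observed choices (one-hot rows) for the five individuals shown in the table
y = [[1, 0, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 0]]

# Predicted choice probabilities under each candidate parameter value
pred_beta1 = [[.5, .25, .25], [.7, .2, .1], [.4, .4, .2], [.3, .5, .2], [.2, .5, .3]]
pred_beta2 = [[.7, .1, .2], [.8, .1, .1], [.1, .1, .8], [.1, .8, .1], [.2, .7, .1]]

def likelihood(y, p):
    """Product over individuals and alternatives of p_ij ** y_ij."""
    L = 1.0
    for y_i, p_i in zip(y, p):
        for y_ij, p_ij in zip(y_i, p_i):
            L *= p_ij ** y_ij
    return L

L1 = likelihood(y, pred_beta1)
L2 = likelihood(y, pred_beta2)
print(L1, L2)
```

On these five rows β = 2 gives the larger likelihood, so maximum likelihood would prefer it to β = 1.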

(33)

▶ Implementation in Stata is soooooo easy!! Just use the following syntax:

mlogit DependentVar IndVar1 IndVar2

if you have two explanatory variables. By default, Stata automatically includes a constant term.

▶ You can also impose restrictions on parameters by defining them with the constraint command before estimating the model.

(34)

1. Demand for Differentiated Products

2. Discrete Choice Models with Micro Data

3. Discrete Choice Models with Aggregate Data

(35)

1. Demand for Differentiated Products

2. Discrete Choice Models with Micro Data

3. Discrete Choice Models with Aggregate Data

3.1 Multinomial Logit Models

3.2 Nested Logit Models

3.3 Random Coefficients Models (Mixed Logit)

(36)

Case I: Logit Model

(37)

$$u_{ij} = \underbrace{X_j\beta - \alpha p_j + \xi_j}_{\delta_j} + \varepsilon_{ij} = \delta_j + \varepsilon_{ij},$$

where

▶ $X_j$: a vector of product characteristics for product j

▶ $\beta$: a vector of coefficients

▶ $\alpha$: a price coefficient

▶ $p_j$: price for product j

▶ $\xi_j$: consumers' valuation of unobserved product characteristics

▶ $\varepsilon_{ij}$: i.i.d. utility shock across consumers and choices

▶ $\delta_j$: the mean utility of product j; assume $\delta_0 = 0$.

(38)

▶ Each consumer purchases at most one product which gives the highest utility.

▶ Let M denote the market size.

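▶ Given the density $f(\varepsilon)$, the market demand for product j is $q_j = M s_j$, where

$$s_j(\delta) = \int_{A_j(\delta)} f(\varepsilon)\,d\varepsilon,$$

and $A_j$ is the set of consumers who purchase product j:

$$A_j(\delta) = \{\varepsilon \mid \delta_j + \varepsilon_{ij} > \delta_k + \varepsilon_{ik},\ \forall k \neq j\}.$$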

(39)

▶ Assume that ε follows Type I extreme value distribution, F (ε) = exp(− exp(−ε)).

▶ Market share of product j is given by:

$$s_j(\delta) = \frac{\exp(\delta_j)}{1 + \sum_{k=1}^{J} \exp(\delta_k)}$$

▶ Market share of the outside option is given by:

$$s_0(\delta) = \frac{1}{1 + \sum_{k=1}^{J} \exp(\delta_k)}$$

(40)

▶ Invert out unobserved product characteristics, ξ_j.

▶ Taking logs of the market share of product j yields

$$\ln(s_j) - \ln(s_0) = \delta_j = X_j\beta - \alpha p_j + \xi_j.$$

▶ Assuming $\xi_j$ is an error term, we can estimate the model via regression!!

▶ The dependent variable is $\ln(s_j) - \ln(s_0)$

▶ The independent variables are $[X_j, p_j]$

▶ The error term is $\xi_j$.

▶ We would expect $\mathrm{Cov}(p_j, \xi_j) \neq 0$

▶ Need to use IV techniques. Discussed later.
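The share formula and its inversion can be checked numerically. The sketch below assumes hypothetical mean utilities `delta` (with the outside option normalized to zero); the function names are illustrative. It computes the logit shares and then recovers each mean utility from the log share difference.

```python
import math

def logit_shares(delta):
    """Return (s0, [s_1, ..., s_J]) for mean utilities delta, with delta_0 = 0."""
    denom = 1.0 + sum(math.exp(d) for d in delta)
    return 1.0 / denom, [math.exp(d) / denom for d in delta]

def invert_shares(s0, s):
    """Recover mean utilities: delta_j = ln(s_j) - ln(s_0)."""
    return [math.log(sj) - math.log(s0) for sj in s]

delta = [1.0, 0.5, -0.3]  # hypothetical mean utilities
s0, s = logit_shares(delta)
delta_hat = invert_shares(s0, s)
print(s0, s, delta_hat)
```

In an application, the recovered values would be regressed on the product characteristics and price, with instruments for price.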

(41)

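Own- and cross-price elasticities are given by:

$$\frac{\partial s_j}{\partial p_k}\frac{p_k}{s_j} = \begin{cases} -\alpha p_j (1 - s_j), & \text{if } k = j, \\ \alpha p_k s_k, & \text{otherwise.} \end{cases}$$

▶ Now all products are substitutes to some extent!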

▶ Are these price elasticities realistic?

(42)

1. Own-price elasticity is proportional to own price.

▶ $s_j$ is typically small, implying $\alpha(1 - s_j)$ is nearly constant.

▶ This implies that the lower the price, the lower the elasticity (in absolute value).

▶ This implies that markups should be higher for cheaper products.

(43)

2. Cross-price elasticities are proportional to market share.

▶ Substitution between products j and j′ depends only on market shares and prices, not on how similar the two products are.

▶ Reason: i.i.d. structure of the random shock.
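To see this restrictive substitution pattern, the sketch below evaluates the logit elasticity formula for hypothetical prices, shares, and price coefficient. The values and the function name `logit_elasticities` are assumptions for illustration.

```python
def logit_elasticities(alpha, p, s):
    """E[j][k] = elasticity of s_j with respect to p_k in the simple logit model."""
    J = len(p)
    return [[-alpha * p[j] * (1.0 - s[j]) if j == k else alpha * p[k] * s[k]
             for k in range(J)]
            for j in range(J)]

alpha = 0.5             # hypothetical price coefficient
p = [10.0, 20.0, 30.0]  # hypothetical prices
s = [0.2, 0.1, 0.05]    # hypothetical market shares
E = logit_elasticities(alpha, p, s)
for row in E:
    print(row)
```

Every off-diagonal entry in a given column is identical: a price increase for product k shifts demand to all other products in the same proportion. The own-price elasticities grow in absolute value with price.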

(44)

Case II: Nested Logit Model

(45)

The notation is based on Cardell (1991), Econometric Theory.

▶ Group the products into G + 1 exhaustive and mutually exclusive sets, g = 0, 1, · · · , G.

▶ Denote the set of products in group g as $\mathcal{G}_g$.

▶ The outside good, j = 0, is assumed to be the only member of group 0.

▶ Use blackboard to explain the concept.

(46)

▶ Maintained assumptions

▶ Homogeneous consumer taste

▶ The mean utility from the outside option is normalized to zero, i.e., $\delta_0 = 0$.

▶ Utility for consumer i choosing product j is given by:

$$u_{ij} = X_j\beta - \alpha p_j + \xi_j + \xi_{ig} + (1 - \sigma)\varepsilon_{ij} = \delta_j + \xi_{ig} + (1 - \sigma)\varepsilon_{ij}$$

▶ For consumer i, $\xi_{ig}$ is common to all products in group g.

▶ Assuming an extreme value distribution for $\varepsilon_{ij}$, $\xi_{ig} + (1 - \sigma)\varepsilon_{ij}$ is also an extreme value random variable.

(47)

▶ The feature of σ:

▶ As σ → 1, the within group correlation of utility levels goes to one.

▶ As σ → 0, the within group correlation of utility levels goes to zero.

▶ If $d_{jg}$ is a dummy variable equal to one if $j \in \mathcal{G}_g$ and equal to zero otherwise, we can rewrite the utility function as

$$u_{ij} = \delta_j + \sum_g d_{jg}\xi_{ig} + (1 - \sigma)\varepsilon_{ij}.$$

This can be viewed as a special case of random coefficients models.

(48)

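▶ The market share of product j within its group g is

$$s_{j/g}(\delta, \sigma) = \frac{\exp(\delta_j/(1 - \sigma))}{D_g},$$

where $D_g$ is given by $D_g \equiv \sum_{j \in \mathcal{G}_g} \exp(\delta_j/(1 - \sigma))$.

▶ Similarly, the probability of choosing one of the group g products is

$$s_g(\delta, \sigma) = \frac{D_g^{(1-\sigma)}}{\sum_k D_k^{(1-\sigma)}}$$

▶ Therefore the market share for product j is given by:

$$s_j(\delta, \sigma) = s_{j/g}(\delta, \sigma)\, s_g(\delta, \sigma) = \frac{\exp(\delta_j/(1 - \sigma))}{D_g^{\sigma} \left[ \sum_k D_k^{(1-\sigma)} \right]}$$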

(49)

▶ Also the market share for the outside option is given by:

$$s_0(\delta, \sigma) = \frac{1}{\sum_k D_k^{(1-\sigma)}}.$$

▶ Again, we can use the inversion technique:

$$\ln(s_j) - \ln(s_0) = \delta_j/(1 - \sigma) - \sigma \ln(D_g)$$

$$\ln(s_{j/g}) = \delta_j/(1 - \sigma) - \ln(D_g)$$

which gives us

$$\ln(s_j) - \ln(s_0) = \delta_j + \sigma \ln(s_{j/g}) = X_j\beta - \alpha p_j + \sigma \ln(s_{j/g}) + \xi_j.$$
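The nested logit shares and the inversion can be verified with a small numerical sketch. The mean utilities, nest labels, and nesting parameter below are hypothetical, and `nested_logit_shares` is an illustrative name. The outside good is treated as the only member of its own group, so its term in the sum equals one.

```python
import math

def nested_logit_shares(delta, group, sigma):
    """Return (s0, s, s_within) for inside goods with mean utilities delta.

    group[j] is the nest label of product j. The outside good sits alone in
    its own nest with delta_0 = 0, so its D_0^(1 - sigma) term equals 1.
    """
    D = {}
    for d, g in zip(delta, group):
        D[g] = D.get(g, 0.0) + math.exp(d / (1.0 - sigma))
    denom = 1.0 + sum(Dg ** (1.0 - sigma) for Dg in D.values())
    s_within = [math.exp(d / (1.0 - sigma)) / D[g] for d, g in zip(delta, group)]
    s = [sw * D[g] ** (1.0 - sigma) / denom for sw, g in zip(s_within, group)]
    return 1.0 / denom, s, s_within

delta = [1.0, 0.5, -0.3, 0.2]  # hypothetical mean utilities
group = ["a", "a", "b", "b"]   # hypothetical nests
sigma = 0.4                    # hypothetical nesting parameter
s0, s, s_within = nested_logit_shares(delta, group, sigma)

# Inversion: delta_j = ln(s_j) - ln(s_0) - sigma * ln(s_{j/g})
delta_hat = [math.log(sj) - math.log(s0) - sigma * math.log(sw)
             for sj, sw in zip(s, s_within)]
print(delta_hat)
```

Because the within-group share appears on the right-hand side of the estimating equation and is itself endogenous, it would also need an instrument in practice.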

(50)

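$$\frac{\partial s_j}{\partial p_k}\frac{p_k}{s_j} = \begin{cases} -\alpha p_j \left(1 - \sigma s_{j/g} - (1 - \sigma)s_j\right)/(1 - \sigma), & \text{if } k = j, \\ \alpha p_k \left(\sigma s_{k/g} + (1 - \sigma)s_k\right)/(1 - \sigma), & \text{if } k \neq j \text{ and } j, k \in \mathcal{G}_g, \\ \alpha p_k s_k, & \text{otherwise.} \end{cases}$$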

▶ The cross-price elasticity of two products in the same group is higher than the cross-price elasticity of two products in different groups when everything else is constant.

▶ Are these elasticities realistic now?
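The within-group versus across-group comparison can be checked by finite differences, without relying on any closed-form elasticity. The sketch below assumes a hypothetical three-product market in which products 0 and 1 share a nest and product 2 is in another; mean utility is taken to be `x[j] - alpha * p[j]`, and all numbers and function names are illustrative.

```python
import math

def nested_logit_shares(delta, group, sigma):
    """Nested logit shares of the inside goods (outside good alone in its nest)."""
    D = {}
    for d, g in zip(delta, group):
        D[g] = D.get(g, 0.0) + math.exp(d / (1.0 - sigma))
    denom = 1.0 + sum(Dg ** (1.0 - sigma) for Dg in D.values())
    return [math.exp(d / (1.0 - sigma)) / (D[g] ** sigma * denom)
            for d, g in zip(delta, group)]

def elasticity(j, k, x, p, group, alpha, sigma, h=1e-6):
    """Central finite-difference elasticity of s_j with respect to p_k."""
    def shares(prices):
        delta = [xi - alpha * pi for xi, pi in zip(x, prices)]
        return nested_logit_shares(delta, group, sigma)
    up = list(p)
    up[k] += h
    dn = list(p)
    dn[k] -= h
    ds = (shares(up)[j] - shares(dn)[j]) / (2.0 * h)
    return ds * p[k] / shares(p)[j]

x = [1.0, 1.2, 0.8]  # hypothetical quality indices
p = [1.0, 1.0, 1.0]  # hypothetical prices
group = [1, 1, 2]    # products 0 and 1 share a nest
alpha, sigma = 1.0, 0.5

e_within = elasticity(1, 0, x, p, group, alpha, sigma)  # same nest as product 0
e_across = elasticity(2, 0, x, p, group, alpha, sigma)  # different nest
s = nested_logit_shares([xi - alpha * pi for xi, pi in zip(x, p)], group, sigma)
print(e_within, e_across)
```

Both cross elasticities are with respect to the same price, so the comparison holds everything else constant. The across-group value coincides with the simple logit expression, while the within-group value is larger.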

(51)

▶ Pros: Allows us to model correlation among groups of similar products in a simple way and easy to estimate.

▶ Cons: Correlation patterns depend on grouping of products, which is determined prior to the estimation.

(52)

Case III: Random Coefficients Model

(53)

$$u_{ij} = X_j\beta_i + \alpha \ln(y_i - p_j) + \xi_j + \varepsilon_{ij}.$$

▶ Do we have any economic model behind this specification?

(55)

Yes. Consider the following Cobb-Douglas utility:

$$\max_{(C,j)}\ u^c(C)\,u^a(j)\exp(\varepsilon_{ij}) \quad \text{s.t. } C + p_j \le y_i,$$

where

$$u^c(C) = C^{\alpha}, \qquad \ln(u^a(j)) = X_j\beta_i + \xi_j.$$

▶ Since the budget constraint binds, $C = y_i - p_j$, and the maximization problem can be rewritten as

$$\max_j\ (y_i - p_j)^{\alpha}\exp(X_j\beta_i + \xi_j)\exp(\varepsilon_{ij}),$$

and taking logs gives the utility function above.

(56)

$$\begin{aligned} u_{ij} &= X_j\beta_i + \alpha \ln(y_i - p_j) + \xi_j + \varepsilon_{ij}, \\ &= \sum_k x_{jk}\beta_{ik} + \alpha \ln(y_i - p_j) + \xi_j + \varepsilon_{ij}, \\ &= \sum_k x_{jk}(\bar\beta_k + \sigma_k\nu_{ik}) + \alpha \ln(y_i - p_j) + \xi_j + \varepsilon_{ij} \end{aligned}$$

where

▶ $X_j$: a vector of product characteristics for product j

▶ $\beta_i$: a vector of coefficients which differs across consumers

▶ $p_j$: price for product j

▶ $\xi_j$: consumers' valuation of unobserved product characteristics

▶ $\varepsilon_{ij}$: i.i.d. utility shock across consumers and choices

▶ $\nu_{ik}$: random utility shock which follows $N(0, 1)$

(59)

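▶ Furthermore, we can rewrite the utility as:

$$\begin{aligned} u_{ij} &= \sum_k x_{jk}(\bar\beta_k + \sigma_k\nu_{ik}) + \alpha \ln(y_i - p_j) + \xi_j + \varepsilon_{ij}, \\ &= \underbrace{\sum_k x_{jk}\bar\beta_k + \xi_j}_{\delta_j:\ \text{mean utility}} + \underbrace{\sum_k x_{jk}\sigma_k\nu_{ik} + \alpha \ln(y_i - p_j)}_{\mu_{ij}:\ \text{deviation from the mean}} + \varepsilon_{ij}, \end{aligned}$$

▶ Assuming a Type I extreme value distribution for ε, the choice probability for product j is given by:

$$\Pr(d_i = j \mid \{\nu_{ik}\}_{k=1,\dots,K}) = \frac{\exp(\delta_j + \mu_{ij})}{1 + \sum_{l=1}^{J} \exp(\delta_l + \mu_{il})}.$$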

(60)

▶ A set of consumers who purchase product j is given by

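$$\begin{aligned} A_j &= \left\{ \left(\{\varepsilon_{ih}\}_{h=1}^{J}, \{\nu_{ik}\}_{k=1}^{K}\right) \mid u_{ij} \ge u_{il},\ \forall l \neq j \right\} \\ &= \left\{ \left(\{\varepsilon_{ih}\}_{h=1}^{J}, \{\nu_{ik}\}_{k=1}^{K}\right) \mid \varepsilon_{il} \le (\delta_j - \delta_l) + (\mu_{ij} - \mu_{il}) + \varepsilon_{ij},\ \forall l \neq j \right\}. \end{aligned}$$

▶ Therefore, integrating the choice probability over ν, we have the market share for product j as

$$s_j = \int \cdots \int \left[ \frac{\exp(\delta_j + \mu_{ij})}{1 + \sum_{l=1}^{J} \exp(\delta_l + \mu_{il})} \right] dF_{\nu_1}(\nu_{i1}) \cdots dF_{\nu_K}(\nu_{iK}).$$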

(61)

▶ Unfortunately, we cannot use the inversion technique that we learned last time. But we have J equations with J unknowns, where $S_j$ denotes the observed market share:

$$\begin{aligned} s_1(\delta_1, \dots, \delta_J) &= S_1 \\ s_2(\delta_1, \dots, \delta_J) &= S_2 \\ &\ \ \vdots \\ s_J(\delta_1, \dots, \delta_J) &= S_J. \end{aligned}$$

▶ This system of equations has a unique solution, and thus we can solve for δ using a contraction mapping!! Then, we have

$$\delta_j^* = X_j\bar\beta + \xi_j,$$

which enables us to have $\tilde\xi_j = \delta_j^* - X_j\bar\beta$.

(62)

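Now the price elasticity is given by:

$$\frac{\partial s_j}{\partial p_k}\frac{p_k}{s_j} = \begin{cases} -\dfrac{p_j}{s_j} \displaystyle\int \alpha s_{ij}(1 - s_{ij})\,dP(\nu), & \text{if } k = j, \\ \dfrac{p_k}{s_j} \displaystyle\int \alpha s_{ij}s_{ik}\,dP(\nu), & \text{otherwise.} \end{cases}$$

Do you remember the previous three models' elasticities?

1. Vertical differentiation model

2. Logit model

3. Nested logit model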

(63)

(64)

▶ Do we need IVs? - Yes, because

▶ price is endogenous in the model.

▶ price is usually positively correlated with the unobserved product characteristics, $\xi_j$. Thus, the estimated coefficient on price is biased upward (toward zero), so demand looks less price-elastic than it is.

▶ Instruments, zj, must satisfy the following conditions:

▶ $E[\xi_j \mid z_j] = 0$,

▶ $\mathrm{Cov}(z_j, p_j) \neq 0$.
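A small simulation illustrates why the instrument matters. The data-generating process below is entirely hypothetical: mean utility is `-alpha * p + xi`, price loads positively on the unobservable `xi`, and `z` is a cost shifter that moves price but is independent of `xi`. The helper names `ols_slope` and `iv_slope` are illustrative.

```python
import random

def ols_slope(y, x):
    """OLS slope of y on x, with a constant."""
    my, mx = sum(y) / len(y), sum(x) / len(x)
    num = sum((a - my) * (b - mx) for a, b in zip(y, x))
    return num / sum((b - mx) ** 2 for b in x)

def iv_slope(y, x, z):
    """Just-identified IV slope: Cov(z, y) / Cov(z, x)."""
    my, mx, mz = sum(y) / len(y), sum(x) / len(x), sum(z) / len(z)
    num = sum((c - mz) * (a - my) for c, a in zip(z, y))
    return num / sum((c - mz) * (b - mx) for c, b in zip(z, x))

rng = random.Random(0)
n = 5000
alpha_true = 1.0
xi = [rng.gauss(0.0, 1.0) for _ in range(n)]  # unobserved quality
z = [rng.gauss(0.0, 1.0) for _ in range(n)]   # cost shifter (instrument)
p = [2.0 + zj + 0.8 * xij + 0.5 * rng.gauss(0.0, 1.0) for zj, xij in zip(z, xi)]
delta = [-alpha_true * pj + xij for pj, xij in zip(p, xi)]

alpha_ols = -ols_slope(delta, p)
alpha_iv = -iv_slope(delta, p, z)
print(alpha_ols, alpha_iv)
```

In this design the OLS estimate understates price sensitivity, while the IV estimate is close to the true value of 1.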

(65)

▶ Variables that shift cost and are uncorrelated with the demand shock.

▶ IVs: cost shifters

▶ Average number of employees, capital structure, etc.

▶ Problem: Rarely observe cost shifters that vary by product.

(66)

▶ Prices in other markets

▶ E.g., use prices in New York, Delaware, New Jersey as instruments for price endogeneity in Pennsylvania.

▶ These instruments can pick up common cost shocks due to the common marginal cost, but they do not pick up market-specific demand shocks.

▶ However, if they also pick up common demand shocks, they are invalid.

(67)

▶ Instruments (zj) must satisfy:

$E[\xi_j \mid z_j] = 0$ and $\mathrm{Cov}(p_j, z_j) \neq 0$

▶ BLP propose the following three sets of IVs:

1. $x_{jk}$

2. $\sum_{r \neq j,\ r \in F_f} x_{rk}$

3. $\sum_{r \neq j,\ r \notin F_f} x_{rk}$

▶ The average product characteristics offered by other firms:

▶ is excluded from product j's utility

(68)

3. $\sum_{r \neq j,\ r \notin F_f} x_{rk}$

▶ is correlated with the price of product j via the FOC:

(69)

▶ Suppose X has only one dimension, which is size.

(70)

▶ $D_j(p \mid x_1, \dots, x_J)$ should be affected by $x_l$, $l \neq j$.

(71)

▶ Thus, the optimal price $p_j$ should be a function of $x_l$, $l \neq j$.

(72)

▶ A feature of oligopolistic markets allows this argument!!

(73)

(74)

Now we obtain the market-level unobservables: (ξ, ω).

▶ In principle, imposing the specific distributional assumptions for (ξ, ω), we can construct a maximum likelihood function.

(75)

▶ BLP, however, suggest using GMM with the following moment condition:

$$E\left[ H(z)\,(\xi, \omega)' \right] = 0,$$

where $H(z)$ is a set of appropriate instruments discussed previously.

▶ More specifically, define

$$G_J(\theta) = \frac{1}{J} \sum_{j=1}^{J} H_j(z) \begin{pmatrix} \tilde\xi_j(\theta) \\ \tilde\omega_j(\theta) \end{pmatrix}$$

(76)

The GMM estimator is defined by:

$$\theta_{GMM} = \arg\min_{\theta} \|G_J(\theta)\|_A,$$

where A is an optimal weighting matrix.

(77)

▶ For example, if we only have the demand side,

$$\theta_{GMM} = \arg\min_{\theta}\ \tilde\xi(\theta)' H A H' \tilde\xi(\theta).$$

▶ The estimator, $\theta_{GMM}$, is $\sqrt{N}$-consistent and asymptotically normal with variance-covariance matrix:

$$(\Gamma'\Gamma)^{-1}\Gamma' V \Gamma(\Gamma'\Gamma)^{-1}$$

where $\Gamma = \partial G(\theta_0)/\partial\theta'$ and $V$ is a variance-covariance matrix.

(78)

(79)

▶ Estimation results in Table 4.

▶ Own- and cross-price elasticities in Table 6.

▶ Substitution to the outside option in Table 7.

▶ Mark-ups (price cost margins) in Table 8.

(80)

(81)

are constant over the estimation procedure.

▶ Given the parameter values, we can calculate µ_ij for all (i, j).

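▶ Pick any starting value $\delta^0$. Given $\delta^0$, we can calculate the predicted market share:

$$s_j = \int \cdots \int \left[ \frac{\exp(\delta^0_j + \mu_{ij})}{1 + \sum_{l=1}^{J} \exp(\delta^0_l + \mu_{il})} \right] dF_{\nu_1}(\nu_{i1}) \cdots dF_{\nu_K}(\nu_{iK}).$$

▶ This is computed numerically using simulated draws $\{\nu_{ik}\}_{i=1,\dots,ns,\ k=1,\dots,K}$:

$$s_j = \frac{1}{ns} \sum_{i=1}^{ns} \left[ \frac{\exp(\delta^0_j + \mu_{ij})}{1 + \sum_{l=1}^{J} \exp(\delta^0_l + \mu_{il})} \right]$$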

(82)

$$\begin{aligned} s_1(\delta_1^*, \dots, \delta_J^*) &= S_1 \\ s_2(\delta_1^*, \dots, \delta_J^*) &= S_2 \\ &\ \ \vdots \\ s_J(\delta_1^*, \dots, \delta_J^*) &= S_J. \end{aligned}$$

▶ To obtain $\delta^*$, use the following contraction mapping:

$$\delta_j^{R+1} = \delta_j^{R} + \ln(S_j) - \ln(s_j(\delta^{R})),$$

and repeat updating δ until $\|\delta^{R+1} - \delta^{R}\|$ becomes sufficiently small, say less than 1.0E-6.

▶ After having $\delta^*$, you can obtain $\tilde\xi$ via

$$\tilde\xi = \delta^* - X\bar\beta + \alpha p,$$

and $\tilde\omega$ via

$$\tilde\omega = \ln(p - \Delta^{-1}s) - w\gamma,$$

which enable us to construct the objective function.
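The simulation and contraction steps can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the full BLP routine: there is a single product characteristic, the deviation term is taken to be `x[j] * sigma * nu_i` with the income term omitted, and the "observed" shares are generated from a known `delta_true` so that the recovered values can be compared with it. All names and numbers are hypothetical.

```python
import math
import random

def simulated_shares(delta, x, sigma, nu):
    """Average the logit choice probabilities over the simulated draws nu."""
    J = len(delta)
    s = [0.0] * J
    for nu_i in nu:
        # mu_ij = x_j * sigma * nu_i (one characteristic, no income term)
        ev = [math.exp(delta[j] + x[j] * sigma * nu_i) for j in range(J)]
        denom = 1.0 + sum(ev)
        for j in range(J):
            s[j] += ev[j] / denom
    return [sj / len(nu) for sj in s]

def contraction_mapping(S_obs, x, sigma, nu, tol=1e-10, max_iter=10000):
    """Iterate delta <- delta + ln(S) - ln(s(delta)) until the step is below tol."""
    delta = [0.0] * len(S_obs)  # arbitrary starting value
    for _ in range(max_iter):
        s = simulated_shares(delta, x, sigma, nu)
        new = [d + math.log(So) - math.log(sp) for d, So, sp in zip(delta, S_obs, s)]
        step = max(abs(a - b) for a, b in zip(new, delta))
        delta = new
        if step < tol:
            break
    return delta

rng = random.Random(0)
nu = [rng.gauss(0.0, 1.0) for _ in range(200)]  # draws held fixed throughout
x = [1.0, 0.5, 1.5]
sigma = 0.5
delta_true = [0.0, -0.5, 0.5]
S_obs = simulated_shares(delta_true, x, sigma, nu)
delta_hat = contraction_mapping(S_obs, x, sigma, nu)
print(delta_hat)
```

The draws are generated once and reused at every iteration. Redrawing them inside the loop would make the simulated shares change between iterations, and the fixed point would no longer be well defined.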

(83)

▶ Each firm f solves the following profit maximization problem:

$$\max_{\{p_j\}_{j \in F_f}} \sum_{j \in F_f} (p_j - mc_j)\,M s_j(p),$$

▶ $F_f$ is the subset of products which are produced by firm f

▶ $M s_j$ is the market demand for product j under price p

▶ Marginal cost is given by:

$$\ln(mc_j) = w_j\gamma + \omega_j$$

▶ $w_j$ is the vector of observed product characteristics for product j

▶ $\omega_j$ is the product-j-specific unobserved characteristic

(84)

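The first-order condition with respect to $p_j$ is:

$$s_j + \sum_{h \in F_f} (p_h - mc_h)\frac{\partial s_h}{\partial p_j} = 0.$$

Define the (j,h)-element of the substitution matrix ∆ as

$$\Delta_{j,h} = \begin{cases} -\dfrac{\partial s_h}{\partial p_j}, & \text{if } j \text{ and } h \text{ are produced by the same firm,} \\ 0, & \text{otherwise.} \end{cases}$$

Then we have the following matrix notation of the FOCs:

$$s - \Delta(p - mc) = 0.$$

If ∆ is invertible, we have

$$p - mc = \Delta^{-1}s.$$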

(85)

Now we have

$$p - mc = \Delta^{-1}s.$$

Remember that the marginal cost function is given by:

$$\ln(mc_j) = w_j\gamma + \omega_j, \quad \text{or} \quad \omega_j = \ln(mc_j) - w_j\gamma.$$

Therefore, we can solve the equation for ω:

$$\tilde\omega = \ln(p - \Delta^{-1}s) - w\gamma.$$
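As an illustration of the markup equation, the sketch below builds ∆ for the simple logit model and solves for the markups. It assumes the logit share derivatives (own: `-alpha * s_j * (1 - s_j)`, cross: `alpha * s_j * s_h`), and the shares, prices, ownership structure, and price coefficient are hypothetical. The `solve` helper is a hand-rolled Gaussian elimination used to keep the example dependency-free.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def logit_markups(alpha, s, firm):
    """Markups p - mc solving Delta (p - mc) = s under logit demand."""
    J = len(s)
    Delta = [[0.0] * J for _ in range(J)]
    for j in range(J):
        for h in range(J):
            if firm[j] == firm[h]:
                ds_h_dp_j = -alpha * s[j] * (1.0 - s[j]) if h == j else alpha * s[j] * s[h]
                Delta[j][h] = -ds_h_dp_j
    return solve(Delta, s)

alpha = 2.0                # hypothetical price coefficient
s = [0.2, 0.1, 0.25]       # hypothetical market shares
firm = ["A", "A", "B"]     # firm A owns products 0 and 1
p = [1.5, 1.2, 2.0]        # hypothetical prices

markup = logit_markups(alpha, s, firm)
mc = [pj - mj for pj, mj in zip(p, markup)]
log_mc = [math.log(c) for c in mc]  # subtract w * gamma to obtain omega
print(markup, mc, log_mc)
```

Under logit demand, every product of a given firm carries the same markup, equal to one over α times one minus the firm's total share, which gives a convenient check on the matrix calculation.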