Naoki Wakamori
University of Tokyo
2017 A1 Term
1. Demand for Differentiated Products
2. Discrete Choice Models with Micro Data
3. Discrete Choice Models with Aggregate Data
▶ What is a demand function?
▶ Almost all products are differentiated!
▶ Automobiles, banking, beer, personal computers, etc.
▶ Other examples?
▶ Different from homogeneous product markets
▶ Demand function for product $j \in \mathcal{J}$:
$$Q^D_j = Q^D_j(P_1, P_2, \dots, P_J;\ X_1, X_2, \dots, X_J)$$
▶ Supply function for product $j \in \mathcal{J}$:
$$Q^S_j = Q^S_j(P_1, P_2, \dots, P_J;\ X_1, X_2, \dots, X_J)$$
i.e., there is a demand function and a supply function for each product
▶ For example, the constant elasticity framework:
$$\log Q_j = \beta_{j,0} + \sum_{k=1}^{J} \beta_{j,k} \log P_k + \varepsilon_j.$$
▶ Too many elasticities!!
▶ If we have 100 cars, we need to estimate 10,000 price elasticities ($J^2 = 100^2$)!
▶ This can easily exceed the number of observations available to us
▶ How to solve this problem?
▶ Possible solutions are:
1. Put some restrictions on cross-price elasticities:
1.1 symmetric cross-price elasticities
1.2 zero cross-price elasticities.
2. Group several products together to reduce the number of products.
▶ These are too arbitrary and have no theory behind them
▶ Therefore we take a structural approach, specifying how such a demand function is generated
▶ A structural approach is characterized by two features:
1. Characteristics approach:
Each product is assumed to be a bundle of characteristics, and consumers derive utility from these characteristics
$$X_j = [x_{j1}, x_{j2}, \dots, x_{jK}], \quad \text{for all } j$$
2. Utility maximization (revealed preference):
Each consumer chooses the product that gives the highest utility
$$u_{ij} = X_j\beta > X_l\beta = u_{il} \quad \text{for all } l \neq j$$
▶ Although we cannot observe the utility function itself, it is possible to infer consumers' utility from observed data.
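The two features above can be illustrated with a small sketch: each product is a vector of characteristics, and the consumer picks the product with the highest $X_j\beta$. The characteristics, the taste weights, and the helper name `choose_product` are hypothetical values chosen only for illustration.

```python
def choose_product(X, beta):
    """Return (index of the utility-maximizing product, list of utilities X_j . beta)."""
    utilities = [sum(x * b for x, b in zip(x_j, beta)) for x_j in X]
    best = max(range(len(X)), key=lambda j: utilities[j])
    return best, utilities

# Hypothetical characteristics (two per product) for three products.
X = [[1.0, 3.0], [2.0, 1.0], [1.5, 2.5]]
beta = [1.0, 0.5]  # hypothetical taste weights

best, utilities = choose_product(X, beta)
print(best, utilities)
```

Observing which product is chosen tells us that its $X_j\beta$ is at least as large as that of every alternative, which is the information revealed preference exploits.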
▶ Disaggregate (micro, individual, or consumer-level) data
▶ Easily incorporate interactions of household and product characteristics
▶ Rarely available to researchers
▶ Aggregate (macro, market-level) data
▶ Often easily available to researchers
▶ A bit hard to reconcile individual demographics
▶ Advantages:
▶ Reduce the number of parameters in demand
▶ Reduce computational burden
▶ Enable us to predict demand for new products (Petrin, 2002)
▶ Disadvantages:
▶ Rule out multiple purchases (cf. Gentzkow, 2007)
▶ Put parametric assumptions on demand structure (cf. non-parametric identification by Fox et al. (2012), Fox and Gandhi (2015), and Byrne, Imai, Jain and Hirukawa (2015))
2. Discrete Choice Models with Micro Data
▶ In the basic econometrics course, we only consider cases with continuous dependent variables
▶ Dependent variables can take discrete values!
1. Unordered cases (qualitative variable models)
▶ Transportation choice: bus, taxi, car, train, and so on
▶ Occupation choice: banker, medical doctor, lawyer, and so on
2. Ordered cases
▶ Student's grade: A, B, C, D, F
▶ Number of automobiles owned by one household: 1, 2, 3, ...
▶ Other examples?
▶ Revealed preference provides useful information!!
▶ $i$ denotes an individual: $i = 1, 2, \dots, N$
▶ $j$ denotes a choice alternative: $j \in \mathcal{J} = \{0, 1, 2, \dots, J\}$
▶ $X_{ij}$ denotes a vector of person/choice characteristics
▶ Assume each $i$ chooses the option $j$ that maximizes her utility: $u_{ij} = X_{ij}\beta + \varepsilon_{ij}$
▶ Note that this model is called
▶ a binary/multinomial choice model when $J = 1$ or $J > 1$,
▶ a probit/logit model when $\varepsilon_{ij}$ follows the standard normal or logistic distribution,
▶ a random coefficients model if $\beta$ becomes $\beta_i$
▶ Typically a normalization is required, e.g., $u_{i0} = \varepsilon_{i0}$
▶ Assume that $\varepsilon$ follows the Type I extreme value distribution: $F(x) = \exp(-\exp(-x))$
Then this class of models is called "Multinomial Logit Models"
▶ What is the probability of choosing $j$ for person $i$, $\Pr(d_i = j)$?
▶ Consider a case with three alternatives, $\mathcal{J} = \{0, 1, 2\}$
▶ McFadden showed that we have a nice analytic formula!
$$\Pr(d_i = j) = \frac{\exp(X_{ij}\beta)}{\sum_{k \in \mathcal{J}} \exp(X_{ik}\beta)}$$
▶ Show an example on the blackboard
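As a stand-in for the blackboard example, here is a minimal sketch of the multinomial logit formula for the three-alternative case. The utilities in `v` are hypothetical values of $X_{ij}\beta$, with alternative 0 normalized to zero; `mnl_probabilities` is an illustrative helper name.

```python
import math

def mnl_probabilities(v):
    """Multinomial logit probabilities for deterministic utilities v[j] = X_ij * beta."""
    m = max(v)  # subtracting the max leaves the ratios unchanged and avoids overflow
    expv = [math.exp(x - m) for x in v]
    total = sum(expv)
    return [e / total for e in expv]

# Alternatives J = {0, 1, 2}; alternative 0 is the normalized one (utility 0).
v = [0.0, 1.0, 0.5]  # hypothetical values of X_ij * beta
probs = mnl_probabilities(v)
print(probs)
```

The probabilities sum to one, and alternatives with a higher $X_{ij}\beta$ receive a higher choice probability.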
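▶ We can construct the likelihood function:
$$L(\beta) = \prod_{i=1}^{N} \prod_{j=1}^{J} \left\{ \frac{\exp(X_i\beta_j)}{\sum_{k=1}^{J} \exp(X_i\beta_k)} \right\}^{y_{ij}}$$
where
$$y_{ij} = \begin{cases} 1, & \text{if } i \text{ chooses } j, \\ 0, & \text{otherwise.} \end{cases}$$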
Each triple lists alternatives 1, 2, 3.

Individual | Raw data | Prediction when β = 1 | Prediction when β = 2
1          | 1 0 0    | .5 .25 .25            | .7 .1 .2
2          | 1 0 0    | .7 .2 .1              | .8 .1 .1
3          | 0 0 1    | .4 .4 .2              | .1 .1 .8
4          | 0 1 0    | .3 .5 .2              | .1 .8 .1
...        | ...      | ...                   | ...
N          | 0 1 0    | .2 .5 .3              | .2 .7 .1

β = 1: Likelihood = .5 × .7 × .2 × .5 × .5
β = 2: Likelihood = .7 × .8 × .8 × .8 × .7
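The comparison in the table can be reproduced with a short sketch. It uses only the five individuals displayed in the table (the rows hidden behind "..." are ignored, as in the products written above); the function name `likelihood` is illustrative.

```python
# Observed choices (one-hot rows) and predicted probabilities from the table.
y = [[1, 0, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 0]]
pred_beta1 = [[.5, .25, .25], [.7, .2, .1], [.4, .4, .2], [.3, .5, .2], [.2, .5, .3]]
pred_beta2 = [[.7, .1, .2], [.8, .1, .1], [.1, .1, .8], [.1, .8, .1], [.2, .7, .1]]

def likelihood(y, pred):
    """Product over individuals and alternatives of pred_ij ** y_ij."""
    value = 1.0
    for y_i, p_i in zip(y, pred):
        for y_ij, p_ij in zip(y_i, p_i):
            value *= p_ij ** y_ij
    return value

L1 = likelihood(y, pred_beta1)
L2 = likelihood(y, pred_beta2)
print(L1, L2)
```

The likelihood is larger under β = 2, so among these two candidates maximum likelihood prefers β = 2.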
▶ Implementation in Stata is soooooo easy!! Just use the following syntax:
mlogit DependentVar IndVar1 IndVar2
if you have two explanatory variables. By default, Stata includes a constant term automatically.
▶ You can also impose restrictions on the parameters by using the constraint command before estimating the model.
3. Discrete Choice Models with Aggregate Data
3.1 Multinomial Logit Models
3.2 Nested Logit Models
3.3 Random Coefficients Models (Mixed Logit)
Case I: Logit Model
$$u_{ij} = \underbrace{X_j\beta - \alpha p_j + \xi_j}_{\delta_j} + \varepsilon_{ij} = \delta_j + \varepsilon_{ij},$$
where
▶ $X_j$: a vector of product characteristics for product $j$
▶ $\beta$: a vector of coefficients
▶ $\alpha$: a price coefficient
▶ $p_j$: price for product $j$
▶ $\xi_j$: consumers' valuation of an unobserved product characteristic
▶ $\varepsilon_{ij}$: utility shock, i.i.d. across consumers and choices
▶ $\delta_j$: the mean utility of product $j$; we assume $\delta_0 = 0$.
▶ Each consumer purchases at most one product which gives the highest utility.
▶ Let M denote the market size.
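▶ Given the density $f(\varepsilon)$, the market demand for product $j$ is $q_j = M s_j$, where
$$s_j(\delta) = \int_{A_j(\delta)} f(\varepsilon)\, d\varepsilon,$$
and $A_j$ is the set of consumers who purchase product $j$:
$$A_j(\delta) = \{\varepsilon \mid \delta_j + \varepsilon_{ij} > \delta_k + \varepsilon_{ik},\ \forall k \neq j\}.$$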
▶ Assume that ε follows Type I extreme value distribution, F (ε) = exp(− exp(−ε)).
▶ Market share of product $j$ is given by:
$$s_j(\delta) = \frac{\exp(\delta_j)}{1 + \sum_{k=1}^{J} \exp(\delta_k)}$$
▶ Market share of the outside option is given by:
$$s_0(\delta) = \frac{1}{1 + \sum_{k=1}^{J} \exp(\delta_k)}$$
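A minimal sketch of these share formulas, together with the implied demand $q_j = M s_j$. The mean utilities and the market size are hypothetical numbers, and `logit_shares` is an illustrative helper name.

```python
import math

def logit_shares(delta):
    """Return (s_0, [s_1, ..., s_J]) for mean utilities delta_1..delta_J, with delta_0 = 0."""
    expd = [math.exp(d) for d in delta]
    denom = 1.0 + sum(expd)  # the 1 is exp(delta_0) for the outside option
    return 1.0 / denom, [e / denom for e in expd]

delta = [1.0, 0.0, -1.0]  # hypothetical mean utilities
M = 1000                  # hypothetical market size

s0, s = logit_shares(delta)
q = [M * s_j for s_j in s]
print(s0, s, q)
```

A product with $\delta_j = 0$ gets exactly the same share as the outside option, which is what the normalization $\delta_0 = 0$ implies.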
▶ Invert out the unobserved product characteristics, $\xi_j$.
▶ Taking logs of the market shares of product $j$ and of the outside option, and taking the difference, yields
$$\ln(s_j) - \ln(s_0) = \delta_j = X_j\beta - \alpha p_j + \xi_j.$$
▶ Treating $\xi_j$ as an error term, we can estimate the model via regression!!
▶ The dependent variable is $\ln(s_j) - \ln(s_0)$
▶ The independent variables are $[X_j, p_j]$
▶ The error term is $\xi_j$.
▶ We would expect $\mathrm{Cov}(p_j, \xi_j) \neq 0$
▶ Need to use an IV technique. Discussed later.
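The inversion step can be checked numerically with a short sketch. The observed shares are hypothetical; the outside share is taken to be the remainder. `invert_logit_shares` is an illustrative name.

```python
import math

def invert_logit_shares(shares, s0):
    """Recover mean utilities: delta_j = ln(s_j) - ln(s_0)."""
    return [math.log(s_j) - math.log(s0) for s_j in shares]

S = [0.2, 0.3, 0.1]   # hypothetical observed shares of the inside goods
S0 = 1.0 - sum(S)     # share of the outside option
delta = invert_logit_shares(S, S0)

# Plugging delta back into the logit share formula reproduces the observed shares.
denom = 1.0 + sum(math.exp(d) for d in delta)
implied = [math.exp(d) / denom for d in delta]
print(delta, implied)
```

The recovered `delta` is the dependent variable of the regression on $[X_j, p_j]$; because price is expected to be correlated with $\xi_j$, that regression should be run with instruments rather than plain OLS.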
Own- and cross-price elasticities are given by:
$$\frac{\partial s_j}{\partial p_k} \frac{p_k}{s_j} = \begin{cases} -\alpha p_j (1 - s_j), & \text{if } k = j, \\ \alpha p_k s_k, & \text{otherwise.} \end{cases}$$
▶ Now all products are substitutes to some extent!
▶ Are these price elasticities realistic?
1. Own-price elasticity is proportional to own price.
▶ $s_j$ is typically small, implying $\alpha(1 - s_j)$ is nearly constant.
▶ This implies that the lower the price, the lower the elasticity (in absolute value).
▶ This implies that markups should be higher for cheap products.
2. Cross-price elasticities are proportional to market share.
▶ Substitution from product $j'$ toward product $j$ depends only on the price and market share of $j'$, not on how similar the two products are.
▶ Reason: i.i.d. structure of the random shock.
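The two properties above are easy to see in a small elasticity matrix. This is a sketch with hypothetical values for $\alpha$, prices, and shares; `logit_elasticities` is an illustrative name.

```python
def logit_elasticities(alpha, prices, shares):
    """E[j][k] = elasticity of s_j with respect to p_k in the plain logit model."""
    J = len(prices)
    E = [[0.0] * J for _ in range(J)]
    for j in range(J):
        for k in range(J):
            if j == k:
                E[j][k] = -alpha * prices[j] * (1.0 - shares[j])
            else:
                E[j][k] = alpha * prices[k] * shares[k]
    return E

alpha = 0.5                   # hypothetical price coefficient
prices = [10.0, 20.0, 30.0]   # hypothetical prices
shares = [0.2, 0.3, 0.1]      # hypothetical market shares

E = logit_elasticities(alpha, prices, shares)
for row in E:
    print(row)
```

Within each column, all off-diagonal entries are identical: a price increase for product $k$ raises the share of every other product by the same percentage, regardless of how close a substitute it is.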
Case II: Nested Logit Model
The notation is based on Cardell (1991), Econometric Theory.
▶ Group the products into G + 1 exhaustive and mutually exclusive sets, g = 0, 1, · · · , G.
▶ Denote the set of products in group g as Gg.
▶ The outside good, j = 0, is assumed to be the only member of group 0.
▶ Use blackboard to explain the concept.
▶ Maintained assumptions
▶ Homogeneous consumer taste
▶ The mean utility from the outside option is normalized to zero, i.e., δ0= 0.
▶ Utility for consumer $i$ choosing product $j$ is given by:
$$u_{ij} = X_j\beta - \alpha p_j + \xi_j + \xi_{ig} + (1 - \sigma)\varepsilon_{ij} = \delta_j + \xi_{ig} + (1 - \sigma)\varepsilon_{ij}$$
▶ For consumer $i$, $\xi_{ig}$ is common to all products in group $g$.
▶ Assuming an extreme value distribution for $\varepsilon_{ij}$, $\xi_{ig} + (1 - \sigma)\varepsilon_{ij}$ is also an extreme value random variable.
▶ The role of $\sigma$:
▶ As $\sigma \to 1$, the within-group correlation of utility levels goes to one.
▶ As $\sigma \to 0$, the within-group correlation of utility levels goes to zero.
▶ If $d_{jg}$ is a dummy variable equal to one if $j \in \mathcal{G}_g$ and equal to zero otherwise, we can rewrite the utility function as
$$u_{ij} = \delta_j + \sum_{g} d_{jg}\xi_{ig} + (1 - \sigma)\varepsilon_{ij}.$$
This can be viewed as a special case of random coefficients models.
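▶ The market share of product $j$ within its group $g$ is
$$s_{j/g}(\delta, \sigma) = \exp(\delta_j/(1 - \sigma)) / D_g,$$
where $D_g$ is given by $D_g \equiv \sum_{j \in \mathcal{G}_g} \exp(\delta_j/(1 - \sigma))$.
▶ Similarly, the probability of choosing one of the group $g$ products is
$$s_g(\delta, \sigma) = \frac{D_g^{(1-\sigma)}}{\sum_{k} D_k^{(1-\sigma)}},$$
where the sum runs over all groups, including group 0, for which $D_0 = 1$.
▶ Therefore the market share for product $j$ is given by:
$$s_j(\delta, \sigma) = s_{j/g}(\delta, \sigma)\, s_g(\delta, \sigma) = \frac{\exp(\delta_j/(1 - \sigma))}{D_g^{\sigma} \left[\sum_{k} D_k^{(1-\sigma)}\right]}$$
▶ Also the market share for the outside option is given by:
$$s_0(\delta, \sigma) = \frac{1}{\sum_{k} D_k^{(1-\sigma)}}.$$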
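▶ Again, we can use the inversion technique:
$$\ln(s_j) - \ln(s_0) = \delta_j/(1 - \sigma) - \sigma \ln(D_g)$$
$$\ln(s_{j/g}) = \delta_j/(1 - \sigma) - \ln(D_g)$$
which gives us
$$\ln(s_j) - \ln(s_0) = \delta_j + \sigma \ln(s_{j/g}) = X_j\beta - \alpha p_j + \sigma \ln(s_{j/g}) + \xi_j.$$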
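The nested logit inversion can be verified numerically. The sketch below computes shares from the formulas for $s_{j/g}$, $s_g$, and $s_0$ and then checks that $\ln(s_j) - \ln(s_0) - \sigma\ln(s_{j/g})$ returns $\delta_j$. The product labels, the grouping, and the parameter values are hypothetical.

```python
import math

def nested_logit_shares(delta, groups, sigma):
    """delta: product -> mean utility; groups: group id -> list of products.
    The outside good is the only member of group 0, so D_0 = 1."""
    D = {g: sum(math.exp(delta[j] / (1.0 - sigma)) for j in members)
         for g, members in groups.items()}
    denom = 1.0 + sum(D_g ** (1.0 - sigma) for D_g in D.values())  # 1.0 is D_0^(1-sigma)
    s0 = 1.0 / denom
    s, s_within = {}, {}
    for g, members in groups.items():
        for j in members:
            s_within[j] = math.exp(delta[j] / (1.0 - sigma)) / D[g]
            s[j] = s_within[j] * D[g] ** (1.0 - sigma) / denom
    return s0, s, s_within

delta = {"a": 1.0, "b": 0.5, "c": -0.5}   # hypothetical mean utilities
groups = {1: ["a", "b"], 2: ["c"]}        # hypothetical nests
sigma = 0.6                               # hypothetical nesting parameter

s0, s, s_within = nested_logit_shares(delta, groups, sigma)
recovered = {j: math.log(s[j]) - math.log(s0) - sigma * math.log(s_within[j])
             for j in delta}
print(recovered)
```

In estimation, $\ln(s_{j/g})$ appears on the right-hand side and is itself endogenous, so it needs instruments just as price does.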
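▶ Own- and cross-price elasticities are given by:
$$\frac{\partial s_j}{\partial p_k} \frac{p_k}{s_j} = \begin{cases} -\alpha p_j \left(1 - \sigma s_{j/g} - (1 - \sigma) s_j\right)/(1 - \sigma), & \text{if } k = j, \\ \alpha p_k \left(\sigma s_{k/g} + (1 - \sigma) s_k\right)/(1 - \sigma), & \text{if } k \neq j \text{ and } j, k \in \mathcal{G}_g, \\ \alpha p_k s_k, & \text{otherwise.} \end{cases}$$
▶ The cross-price elasticity between two products in the same group is higher than the cross-price elasticity between two products in different groups, everything else held constant.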
▶ Are these elasticities realistic now?
▶ Pros: Allows us to model correlation among groups of similar products in a simple way and easy to estimate.
▶ Cons: Correlation patterns depend on grouping of products, which is determined prior to the estimation.
Case III: Random Coefficients Model
$$u_{ij} = X_j\beta_i + \alpha \ln(y_i - p_j) + \xi_j + \varepsilon_{ij}.$$
▶ Do we have any economic model behind this specification? Yes. Consider the following Cobb-Douglas utility:
$$\max_{(C, j)}\ u_c(C)\, u_a(j) \exp(\varepsilon_{ij}) \quad \text{s.t. } C + p_j \leq y_i,$$
where
$$u_c(C) = C^{\alpha}, \qquad \ln(u_a(j)) = X_j\beta_i + \xi_j.$$
▶ Since the budget constraint binds, $C = y_i - p_j$, and the maximization problem can be rewritten as
$$\max_{j}\ (y_i - p_j)^{\alpha} \exp(X_j\beta_i + \xi_j) \exp(\varepsilon_{ij}),$$
and taking logs gives the utility function above.
$$u_{ij} = X_j\beta_i + \alpha \ln(y_i - p_j) + \xi_j + \varepsilon_{ij}$$
$$= \sum_{k} x_{jk}\beta_{ik} + \alpha \ln(y_i - p_j) + \xi_j + \varepsilon_{ij}$$
$$= \sum_{k} x_{jk}(\bar\beta_k + \sigma_k \nu_{ik}) + \alpha \ln(y_i - p_j) + \xi_j + \varepsilon_{ij}$$
where
▶ $X_j$: a vector of product characteristics for product $j$
▶ $\beta_i$: a vector of coefficients which differs across consumers
▶ $\alpha$: a price coefficient
▶ $p_j$: price for product $j$
▶ $\xi_j$: consumers' valuation of an unobserved product characteristic
▶ $\varepsilon_{ij}$: utility shock, i.i.d. across consumers and choices
▶ $\nu_{ik}$: random utility shock which follows $N(0, 1)$
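▶ Furthermore, we can rewrite the utility as:
$$u_{ij} = \sum_{k} x_{jk}(\bar\beta_k + \sigma_k \nu_{ik}) + \alpha \ln(y_i - p_j) + \xi_j + \varepsilon_{ij}$$
$$= \underbrace{\sum_{k} x_{jk}\bar\beta_k + \xi_j}_{\delta_j:\ \text{mean utility}} + \underbrace{\sum_{k} x_{jk}\sigma_k \nu_{ik} + \alpha \ln(y_i - p_j)}_{\mu_{ij}:\ \text{deviation from the mean}} + \varepsilon_{ij}$$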
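▶ Assuming a Type I extreme value distribution for $\varepsilon$, the choice probability for product $j$ is given by:
$$\Pr(d_i = j \mid \{\nu_{ik}\}_{k=1,\dots,K}) = \frac{\exp(\delta_j + \mu_{ij})}{1 + \sum_{l=1}^{J} \exp(\delta_l + \mu_{il})}.$$
▶ The set of consumers who purchase product $j$ is given by
$$A_j = \left\{ \left(\{\varepsilon_{ih}\}_{h=1}^{J}, \{\nu_{ik}\}_{k=1}^{K}\right) \mid u_{ij} \geq u_{il},\ \forall l \neq j \right\}$$
$$= \left\{ \left(\{\varepsilon_{ih}\}_{h=1}^{J}, \{\nu_{ik}\}_{k=1}^{K}\right) \mid \varepsilon_{il} \leq (\delta_j - \delta_l) + (\mu_{ij} - \mu_{il}) + \varepsilon_{ij},\ \forall l \neq j \right\}.$$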
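▶ Therefore, integrating the choice probability over the distribution of $\nu$, we have the market share for product $j$ as
$$s_j = \underbrace{\int \cdots \int}_{\nu} \left[ \frac{\exp(\delta_j + \mu_{ij})}{1 + \sum_{l=1}^{J} \exp(\delta_l + \mu_{il})} \right] dF_{\nu_1}(\nu_{i1}) \cdots dF_{\nu_K}(\nu_{iK}).$$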
▶ Unfortunately, we cannot use the inversion technique that we learned last time. But we have $J$ equations with $J$ unknowns, where $S_j$ denotes the observed market share of product $j$:
$$s_1(\delta_1, \dots, \delta_J) = S_1$$
$$s_2(\delta_1, \dots, \delta_J) = S_2$$
$$\vdots$$
$$s_J(\delta_1, \dots, \delta_J) = S_J.$$
▶ This system of equations has a unique solution, and thus we can solve for $\delta$ using a contraction mapping!! Then, we have
$$\delta_j^* = X_j\bar\beta + \xi_j,$$
which enables us to obtain $\tilde\xi_j = \delta_j^* - X_j\bar\beta$.
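Now the price elasticity is given by:
$$\frac{\partial s_j}{\partial p_k} \frac{p_k}{s_j} = \begin{cases} -\dfrac{p_j}{s_j} \displaystyle\int \alpha s_{ij}(1 - s_{ij})\, dP(\nu), & \text{if } k = j, \\ \dfrac{p_k}{s_j} \displaystyle\int \alpha s_{ij} s_{ik}\, dP(\nu), & \text{otherwise,} \end{cases}$$
where $s_{ij}$ is consumer $i$'s choice probability for product $j$.
Do you remember the elasticities of the previous three models?
1. Vertical differentiation model
2. Logit model
3. Nested logit model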
▶ Do we need IVs? Yes, because
▶ price is endogenous in the model.
▶ price is usually positively correlated with the unobserved product characteristic, $\xi_j$. Thus, the estimated effect of price on utility is biased upward (less negative than the truth), so demand looks less price-sensitive than it is.
▶ Instruments, $z_j$, must satisfy the following conditions:
▶ $E[\xi_j \mid z_j] = 0$,
▶ $\mathrm{Cov}(z_j, p_j) \neq 0$.
▶ Candidates: variables that shift cost and are uncorrelated with the demand shock.
▶ IVs: cost shifters
▶ Average number of employees, capital structure, etc.
▶ Problem: we rarely observe cost shifters that vary by product.
▶ Prices in other markets
▶ E.g., use prices in New York, Delaware, and New Jersey as instruments for the endogenous price in Pennsylvania.
▶ These instruments can pick up common cost shocks due to the common marginal cost, but they do not pick up market-specific demand shocks.
▶ However, if they also pick up common demand shocks, they are invalid.
▶ Instruments ($z_j$) must satisfy:
$$E[\xi_j \mid z_j] = 0 \quad \text{and} \quad \mathrm{Cov}(p_j, z_j) \neq 0$$
▶ BLP propose the following three sets of IVs:
1. $x_{jk}$
2. $\sum_{r \neq j,\ r \in \mathcal{F}_f} x_{rk}$
3. $\sum_{r \neq j,\ r \notin \mathcal{F}_f} x_{rk}$
▶ The characteristics of products offered by other firms
▶ are excluded from product $j$'s utility
▶ are correlated with the price of product $j$ via the FOC:
▶ Suppose $X$ has only one dimension, which is the size.
▶ $D_j(p \mid x_1, \dots, x_J)$ should be affected by $x_l$, $l \neq j$.
▶ Thus, the optimal price $p_j$ should be a function of $x_l$, $l \neq j$.
▶ A feature of oligopolistic markets allows this argument!!
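The three sets of instruments can be built mechanically from product characteristics and ownership. The sketch below does this for one characteristic; the data and the helper name `blp_instruments` are hypothetical, and the sums follow the three expressions listed above.

```python
def blp_instruments(x, firm):
    """For each product j return (own x_j,
    sum of x over the same firm's other products,
    sum of x over rival firms' products)."""
    J = len(x)
    rows = []
    for j in range(J):
        same_firm = sum(x[r] for r in range(J) if r != j and firm[r] == firm[j])
        rivals = sum(x[r] for r in range(J) if firm[r] != firm[j])
        rows.append((x[j], same_firm, rivals))
    return rows

x = [1.0, 2.0, 3.0, 4.0]        # hypothetical one-dimensional characteristic (e.g., size)
firm = ["A", "A", "B", "B"]     # hypothetical ownership: firm A makes products 0 and 1

z = blp_instruments(x, firm)
for row in z:
    print(row)
```

These instruments vary across products only through the configuration of competing products, which is why their validity rests on characteristics being uncorrelated with $\xi_j$.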
Now we obtain the market-level unobservables: $(\xi, \omega)$.
▶ In principle, by imposing specific distributional assumptions on $(\xi, \omega)$, we can construct a likelihood function.
▶ BLP, however, suggest using GMM with the following moment condition:
$$E\left[H(z)\,(\xi\ \ \omega)'\right] = 0,$$
where $H(z)$ is a set of appropriate instruments discussed previously.
▶ More specifically, define
$$G_J(\theta) = \frac{1}{J} \sum_{j=1}^{J} H_j(z) \begin{pmatrix} \tilde\xi_j(\theta) \\ \tilde\omega_j(\theta) \end{pmatrix}$$
▶ The GMM estimator is defined by:
$$\theta_{GMM} = \arg\min_{\theta} \|G_J(\theta)\|_A,$$
where $A$ is an optimal weighting matrix.
▶ For example, if we only have the demand side,
$$\theta_{GMM} = \arg\min_{\theta}\ \tilde\xi(\theta)' H A H' \tilde\xi(\theta).$$
▶ The estimator, $\theta_{GMM}$, is $\sqrt{N}$-consistent and asymptotically normal with variance-covariance matrix:
$$(\Gamma'\Gamma)^{-1} \Gamma' V \Gamma (\Gamma'\Gamma)^{-1}$$
where $\Gamma = \partial G(\theta_0)/\partial\theta'$ and $V$ is the variance-covariance matrix of the moment conditions.
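For the demand-side-only case, the objective function is a quadratic form in the sample moments. The sketch below evaluates it for given residuals; the instrument matrix, the residuals, and the identity weighting matrix are hypothetical, and the moments are averaged over products as in the definition of $G_J(\theta)$ (the scaling does not change the minimizer).

```python
def gmm_objective(xi, Z, A):
    """xi: J residuals xi_j(theta); Z: J rows of instruments; A: L x L weighting matrix.
    Returns g' A g with g = (1/J) * sum_j Z_j * xi_j."""
    J, L = len(xi), len(Z[0])
    g = [sum(Z[j][l] * xi[j] for j in range(J)) / J for l in range(L)]
    return sum(g[a] * A[a][b] * g[b] for a in range(L) for b in range(L))

Z = [[1.0, 1.0], [1.0, -1.0], [1.0, 1.0], [1.0, -1.0]]  # hypothetical instruments
A = [[1.0, 0.0], [0.0, 1.0]]                            # identity weighting (an assumption)

q_orthogonal = gmm_objective([1.0, -1.0, -1.0, 1.0], Z, A)  # residuals orthogonal to Z
q_violated = gmm_objective([1.0, 1.0, 1.0, 1.0], Z, A)      # residuals correlated with Z
print(q_orthogonal, q_violated)
```

In estimation, the residuals are recomputed for each candidate $\theta$ and a numerical optimizer searches for the value that drives this objective as close to zero as possible.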
▶ Estimation results in Table 4.
▶ Own- and cross-price elasticities in Table 6.
▶ Substitution to the outside option in Table 7.
▶ Mark-ups (price cost margins) in Table 8.
▶ Draw $\{\nu_{ik}\}_{i=1,\dots,ns,\ k=1,\dots,K}$ once; these draws are held constant over the estimation procedure.
▶ Given the parameter values, we can calculate µij for all (i, j).
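▶ Pick any starting value $\delta^0$. Given $\delta^0$, we can calculate the predicted market share:
$$s_j = \underbrace{\int \cdots \int}_{\nu} \left[ \frac{\exp(\delta_j^0 + \mu_{ij})}{1 + \sum_{l=1}^{J} \exp(\delta_l^0 + \mu_{il})} \right] dF_{\nu_1}(\nu_{i1}) \cdots dF_{\nu_K}(\nu_{iK}).$$
▶ This is done numerically using the draws $\{\nu_{ik}\}_{i=1,\dots,ns,\ k=1,\dots,K}$:
$$s_j = \frac{1}{ns} \sum_{i=1}^{ns} \left[ \frac{\exp(\delta_j^0 + \mu_{ij})}{1 + \sum_{l=1}^{J} \exp(\delta_l^0 + \mu_{il})} \right]$$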
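▶ We look for the $\delta^*$ that matches the predicted shares to the observed shares $S_j$:
$$s_1(\delta_1^*, \dots, \delta_J^*) = S_1$$
$$s_2(\delta_1^*, \dots, \delta_J^*) = S_2$$
$$\vdots$$
$$s_J(\delta_1^*, \dots, \delta_J^*) = S_J.$$
▶ To obtain $\delta^*$, use the following contraction mapping:
$$\delta_j^{R+1} = \delta_j^{R} + \ln(S_j) - \ln(s_j(\delta^{R})),$$
and keep updating $\delta$ until $\|\delta^{R+1} - \delta^{R}\|$ becomes sufficiently small, say less than 1.0E−6.
▶ After obtaining $\delta^*$, you can obtain $\tilde\xi$ via
$$\tilde\xi = \delta^* - X\bar\beta + \alpha p$$
(the $\alpha p$ term applies when price enters the mean utility linearly, $\delta_j = X_j\bar\beta - \alpha p_j + \xi_j$), and $\tilde\omega$ via
$$\tilde\omega = \ln(p - \Delta^{-1}s) - w\gamma,$$
which enable us to construct the objective function.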
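The share simulation and the contraction mapping can be sketched together. The example below generates "observed" shares from a known $\delta$ and then recovers it. All numbers are hypothetical, $\mu_{ij}$ contains only one random coefficient term $\sigma x_j \nu_i$ (the income term is omitted for brevity), and the max norm is used for the stopping rule.

```python
import math
import random

def simulated_shares(delta, mu):
    """Average the logit choice probabilities over simulated consumers.
    delta[j]: mean utility; mu[i][j]: deviation for simulated consumer i."""
    ns, J = len(mu), len(delta)
    s = [0.0] * J
    for i in range(ns):
        expu = [math.exp(delta[j] + mu[i][j]) for j in range(J)]
        denom = 1.0 + sum(expu)
        for j in range(J):
            s[j] += expu[j] / denom / ns
    return s

def contraction(S, mu, tol=1e-6, max_iter=5000):
    """Iterate delta <- delta + ln(S) - ln(s(delta)) until the update is below tol."""
    delta = [0.0] * len(S)  # any starting value works
    for _ in range(max_iter):
        s = simulated_shares(delta, mu)
        new = [d + math.log(S_j) - math.log(s_j) for d, S_j, s_j in zip(delta, S, s)]
        if max(abs(a - b) for a, b in zip(new, delta)) < tol:
            return new
        delta = new
    raise RuntimeError("contraction mapping did not converge")

random.seed(0)
x = [1.0, 2.0, 3.0]   # hypothetical product characteristic
sigma = 0.5           # hypothetical std. dev. of the random coefficient
nu = [random.gauss(0.0, 1.0) for _ in range(200)]          # draws, held fixed
mu = [[sigma * x_j * nu_i for x_j in x] for nu_i in nu]

true_delta = [0.5, -0.2, -1.0]
S = simulated_shares(true_delta, mu)   # "observed" shares generated from true_delta
delta_star = contraction(S, mu)
print(delta_star)
```

Because the same draws are used to generate and to invert the shares, the recovered `delta_star` matches `true_delta` up to the stopping tolerance; with real data the match is to the observed shares instead.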
▶ Each firm $f$ solves the following profit maximization problem:
$$\max_{\{p_j\}_{j \in \mathcal{F}_f}} \sum_{j \in \mathcal{F}_f} (p_j - mc_j)\, M s_j(p),$$
▶ $\mathcal{F}_f$ is the subset of products which are produced by firm $f$
▶ $M s_j$ is the market demand for product $j$ under price $p$
▶ Marginal cost is given by:
$$\ln(mc_j) = w_j\gamma + \omega_j$$
▶ $w_j$ is the vector of observed product characteristics for product $j$
▶ $\omega_j$ is the product $j$ specific unobserved characteristic
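▶ The first-order condition with respect to $p_j$, for each $j \in \mathcal{F}_f$, is
$$s_j + \sum_{h \in \mathcal{F}_f} (p_h - mc_h) \frac{\partial s_h}{\partial p_j} = 0.$$
Define the $(j, i)$-element of the substitution matrix $\Delta$ as
$$\Delta_{j,i} = \begin{cases} -\dfrac{\partial s_i}{\partial p_j}, & \text{if } j \text{ and } i \text{ are produced by the same firm,} \\ 0, & \text{otherwise.} \end{cases}$$
Then we have the following matrix notation of the FOCs:
$$s - \Delta(p - mc) = 0.$$
If $\Delta$ is invertible, we have
$$p - mc = \Delta^{-1}s.$$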
Now we have
$$p - mc = \Delta^{-1}s.$$
Remember that the marginal cost function is given by $\ln(mc_j) = w_j\gamma + \omega_j$, or $\omega_j = \ln(mc_j) - w_j\gamma$.
Therefore, we can solve the equation for $\omega$:
$$\tilde\omega = \ln(p - \Delta^{-1}s) - w\gamma.$$
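A special case makes the markup equation concrete. Assume, purely for illustration, that every firm sells a single product and that demand is the plain logit of Case I. Then $\Delta$ is diagonal with $\Delta_{jj} = \alpha s_j(1 - s_j)$, so $p_j - mc_j = 1/(\alpha(1 - s_j))$ and no matrix inversion is needed. All numbers and names below are hypothetical.

```python
import math

def logit_single_product_markups(alpha, shares):
    """Markup p_j - mc_j = s_j / Delta_jj with Delta_jj = alpha * s_j * (1 - s_j).
    Valid only for single-product firms under plain logit demand (an assumption)."""
    return [1.0 / (alpha * (1.0 - s)) for s in shares]

alpha = 0.5                   # hypothetical price coefficient
prices = [10.0, 20.0, 30.0]   # hypothetical prices
shares = [0.2, 0.3, 0.1]      # hypothetical market shares
w = [1.0, 2.0, 3.0]           # hypothetical cost shifter
gamma = 0.4                   # hypothetical cost parameter

markups = logit_single_product_markups(alpha, shares)
mc = [p - m for p, m in zip(prices, markups)]                   # implied marginal costs
omega = [math.log(c) - gamma * w_j for c, w_j in zip(mc, w)]    # cost-side residuals
print(markups, mc, omega)
```

With multi-product firms or richer demand, the off-diagonal elements of $\Delta$ are nonzero and the system $p - mc = \Delta^{-1}s$ has to be solved as a linear system.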