2014.7.29. (edit: 2015.4.27)
Semiparametric efficiency bound for linear quantile regression
Kengo Kato

Remark 1. The semiparametric efficiency bound for the linear quantile regression model is derived in [1] as a special case of that for the censored quantile regression model. Here we present a direct derivation of the efficiency bound, following Section 25.4 in [2]; indeed, the derivation is just a small modification of that in Example 25.28 for the mean regression.
Consider the quantile regression model
$$Y = X^{T}\beta + \epsilon, \qquad P(\epsilon \le 0 \mid X) = \tau,$$
where $Y$ is scalar and $X$ is $k$-dimensional. Let $f(\epsilon, x)$ be the joint density of $(\epsilon, X)$ with respect to $d\epsilon\, d\mu(x)$, where $\mu$ is some $\sigma$-finite measure on $\mathbb{R}^k$; we assume that $\lim_{|\epsilon| \to \infty} f(\epsilon, x) = 0$ for $\mu$-almost all $x \in \mathbb{R}^k$, together with other standard regularity conditions (we drop "$\mu$-almost all $x \in \mathbb{R}^k$" in the following discussion). Then the distribution of $(Y, X)$, denoted by $P_{\beta,f}$, is of the form
$$dP_{\beta,f}(y, x) = f(y - x^{T}\beta, x)\, dy\, d\mu(x).$$
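Denote by $f(\epsilon \mid x)$ the conditional density of $\epsilon$ given $X = x$, i.e., $f(\epsilon \mid x) = f(\epsilon, x) / \int f(\epsilon', x)\, d\epsilon'$. The conditional quantile restriction is written as
$$\int \varphi(\epsilon) f(\epsilon \mid x)\, d\epsilon = 0, \qquad \varphi(\epsilon) = \tau - 1(\epsilon \le 0),$$
which is equivalent to
$$\int \varphi(\epsilon) f(\epsilon, x)\, d\epsilon = 0.$$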
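As a quick numerical illustration of the quantile restriction, the following sketch evaluates the integral of phi(e) f(e | x) for one concrete error law. The scale-shift normal family used here, and the names `quantile_moment`, `sigma`, and `half_width`, are assumptions made only for this example and are not part of the derivation.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used only for this illustration


def quantile_moment(tau: float, sigma: float, half_width: float = 12.0, n: int = 20_000) -> float:
    """Approximate the integral of phi(e) * f(e | x) de by the midpoint rule.

    Hypothetical example: e = sigma * (Z - z_tau) with Z standard normal and
    z_tau its tau-quantile, so that P(e <= 0 | x) = tau for any sigma > 0.
    """
    z_tau = STD_NORMAL.inv_cdf(tau)
    cond = NormalDist(mu=-sigma * z_tau, sigma=sigma)  # assumed law of e given x
    lo = cond.mean - half_width * sigma
    hi = cond.mean + half_width * sigma
    total = 0.0
    # phi(e) = tau - 1 on (lo, 0] and phi(e) = tau on (0, hi); the range is
    # split at 0 so that the jump of phi sits on a cell boundary.
    for a, b, phi_value in ((lo, 0.0, tau - 1.0), (0.0, hi, tau)):
        h = (b - a) / n
        total += phi_value * h * sum(cond.pdf(a + (i + 0.5) * h) for i in range(n))
    return total


moment_low = quantile_moment(tau=0.25, sigma=1.0)
moment_high = quantile_moment(tau=0.9, sigma=3.0)
```

Both values should be zero up to quadrature error, whatever the scale `sigma`, which is the sense in which the restriction constrains only the conditional probability of the event e <= 0.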
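Consider a perturbation $f_t$ of $f$ with $t \in \mathbb{R}$, which must satisfy the relation
$$\int \varphi(\epsilon) f_t(\epsilon, x)\, d\epsilon = 0.$$
Taking the derivative with respect to $t$, we have
$$0 = \frac{d}{dt} \int \varphi(\epsilon) f_t(\epsilon, x)\, d\epsilon = \int \varphi(\epsilon) \frac{\partial}{\partial t} f_t(\epsilon, x)\, d\epsilon, \tag{1}$$
where we assumed that the derivative and the integral can be interchanged. Note that the score function for $f$ in the submodel $t \mapsto P_{\beta,f_t}$ is
$$g(y - x^{T}\beta, x) = \frac{\partial}{\partial t} \log f_t(y - x^{T}\beta, x) \Big|_{t=0} = \frac{\partial f_t(y - x^{T}\beta, x)/\partial t\, |_{t=0}}{f(y - x^{T}\beta, x)},$$
which, because of (1), satisfies
$$\int \varphi(\epsilon) g(\epsilon, x) f(\epsilon, x)\, d\epsilon = 0.$$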
This leads to the intuition that the $L_2(P_{\beta,f})$-closure of the set of score functions for $f$ is $H = \{(y, x) \mapsto g(y - x^{T}\beta, x) : g \in G\}$, where
$$G = \Big\{ (\epsilon, x) \mapsto g(\epsilon, x) : \iint g^2(\epsilon, x) f(\epsilon, x)\, d\epsilon\, d\mu(x) < \infty,\ \iint g(\epsilon, x) f(\epsilon, x)\, d\epsilon\, d\mu(x) = 0,\ \int \varphi(\epsilon) g(\epsilon, x) f(\epsilon, x)\, d\epsilon = 0 \Big\}.$$
Indeed, for a bounded $g \in G$, consider $f_t = (1 + tg)f$, for which $\partial \log f_t / \partial t\, |_{t=0} = g$. To verify that the map $(y, x) \mapsto g(y - x^{T}\beta, x)$ is a score function for $f$, we have to check that $t \mapsto P_{\beta,f_t}$ is a submodel for $t$ in a neighborhood of $0$. For sufficiently small $t$, $f_t$ is nonnegative and $\iint f_t\, d\epsilon\, d\mu = 1$ (the latter follows from the fact that $\iint g f\, d\epsilon\, d\mu = 0$), and $f_t$ satisfies $\int \varphi(\epsilon) f_t(\epsilon, x)\, d\epsilon = 0$, so that $t \mapsto P_{\beta,f_t}$ gives a submodel for $t$ in a neighborhood of $0$. Taking the $L_2(P_{\beta,f})$-closure, we obtain the desired assertion.
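The submodel check above can be reproduced numerically in a simplified setting. The sketch below assumes an error density that does not depend on x (a shifted standard normal with P(e <= 0) = tau), builds a bounded g by removing from tanh its components along the constant function and along phi, and then checks that f_t = (1 + t g) f still integrates to one and still satisfies the quantile restriction. The density, the choice of tanh, and the names `ERR`, `expect`, and `make_g` are illustrative assumptions.

```python
import math
from statistics import NormalDist

TAU = 0.3
# Hypothetical error density with P(e <= 0) = TAU: a shifted standard normal.
ERR = NormalDist(mu=-NormalDist().inv_cdf(TAU), sigma=1.0)


def phi(e: float) -> float:
    """phi(e) = tau - 1(e <= 0)."""
    return TAU - (1.0 if e <= 0.0 else 0.0)


def expect(fn, half_width: float = 12.0, n: int = 20_000) -> float:
    """Midpoint-rule approximation of the integral of fn(e) * f(e) de.

    The range is split at 0 so that the jump of phi sits on a cell boundary.
    """
    total = 0.0
    for a, b in ((ERR.mean - half_width, 0.0), (0.0, ERR.mean + half_width)):
        h = (b - a) / n
        total += h * sum(fn(a + (i + 0.5) * h) * ERR.pdf(a + (i + 0.5) * h) for i in range(n))
    return total


def make_g(b):
    """Remove from a bounded function b its components along 1 and phi.

    Since E[phi] = 0 and E[phi^2] = tau * (1 - tau), the result has mean zero
    and is orthogonal to phi, so it belongs to G in this x-free setting.
    """
    mean_b = expect(b)
    coef = expect(lambda e: b(e) * phi(e)) / (TAU * (1.0 - TAU))
    return lambda e: b(e) - mean_b - coef * phi(e)


g = make_g(math.tanh)  # bounded, mean zero, orthogonal to phi
t = 0.1                # small enough here that 1 + t * g stays positive
mass = expect(lambda e: 1.0 + t * g(e))               # integral of f_t
moment = expect(lambda e: phi(e) * (1.0 + t * g(e)))  # integral of phi * f_t
```

Up to quadrature error, `mass` should equal 1 and `moment` should equal 0, matching the two conditions that make t -> P_{beta, f_t} a submodel.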
The score function for $\beta$ is
$$\dot{\ell}_{\beta,f}(y, x) = -x\, \frac{\partial f(y - x^{T}\beta, x)/\partial \epsilon}{f(y - x^{T}\beta, x)}.$$
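The score formula can be sanity-checked against a finite difference of the log density in beta. The sketch below does this for scalar x and a normal error density that does not depend on x; that density and the names `ERR`, `score_beta`, and `score_beta_numeric` are assumptions chosen only for illustration.

```python
import math
from statistics import NormalDist

ERR = NormalDist(mu=0.5, sigma=2.0)  # hypothetical error density f, free of x


def score_beta(y: float, x: float, beta: float) -> float:
    """-x * f'(e) / f(e) at e = y - x * beta, for the normal density ERR."""
    e = y - x * beta
    d_log_f = -(e - ERR.mean) / ERR.variance  # f'(e) / f(e) for a normal density
    return -x * d_log_f


def score_beta_numeric(y: float, x: float, beta: float, h: float = 1e-5) -> float:
    """Central finite difference of beta -> log f(y - x * beta)."""
    def log_f(b: float) -> float:
        return math.log(ERR.pdf(y - x * b))
    return (log_f(beta + h) - log_f(beta - h)) / (2.0 * h)


analytic = score_beta(y=1.3, x=-0.7, beta=2.0)
numeric = score_beta_numeric(y=1.3, x=-0.7, beta=2.0)
```

The two values should agree to within finite-difference error, which reflects that beta enters the likelihood only through the shift y - x * beta.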
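The efficient score for $\beta$, denoted by $\tilde{\ell}_{\beta,f}$, is obtained by projecting (each element of) $\dot{\ell}_{\beta,f}$ onto the orthocomplement of $H$ in $L_2(P_{\beta,f})$. For any function $a(\epsilon, x)$ square integrable with respect to the distribution of $(\epsilon, X)$ such that $\iint a(\epsilon, x) f(\epsilon, x)\, d\epsilon\, d\mu(x) = 0$, the projection of $a(\epsilon, x)$ onto the orthocomplement of $H$ in $L_2(P_{\beta,f})$ is identical to that of $a(\epsilon, x)$ onto the set of functions of the form $\varphi(\epsilon) h(x)$, where $h$ is square integrable with respect to the marginal distribution of $X$, so that the desired projection is given by
$$\varphi(\epsilon)\, \frac{E[a(\epsilon, X)\varphi(\epsilon) \mid X = x]}{E[\varphi^2(\epsilon) \mid X = x]}.$$
Hence the efficient score $\tilde{\ell}_{\beta,f}$ for $\beta$ is computed as
$$\tilde{\ell}_{\beta,f}(y, x) = -x\, \varphi(\epsilon)\, \frac{\int \varphi(\epsilon') \{\partial f(\epsilon' \mid x)/\partial \epsilon\}\, d\epsilon'}{\int \varphi^2(\epsilon') f(\epsilon' \mid x)\, d\epsilon'} = x\, \frac{\varphi(\epsilon)}{\tau(1 - \tau)}\, f(0 \mid x), \qquad \epsilon = y - x^{T}\beta,$$
where the second equality uses $\int \varphi(\epsilon') \{\partial f(\epsilon' \mid x)/\partial \epsilon\}\, d\epsilon' = -f(0 \mid x)$, which follows from $\lim_{|\epsilon| \to \infty} f(\epsilon, x) = 0$, and $\int \varphi^2(\epsilon') f(\epsilon' \mid x)\, d\epsilon' = \tau(1 - \tau)$. Consequently, the semiparametric efficiency bound for estimation of $\beta$ is
$$(E[\tilde{\ell}_{\beta,f}(Y, X) \tilde{\ell}_{\beta,f}(Y, X)^{T}])^{-1} = \tau(1 - \tau) (E[f^2(0 \mid X) X X^{T}])^{-1},$$
provided that the inverse matrix on the right side exists.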
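To make the final formula concrete, the sketch below computes a sample analogue of the bound for a design with an intercept and one scalar regressor, replacing the expectation by an average over given points. The function name `efficiency_bound_2d`, the grid of regressor values, and the scale-shift normal error law (for which f(0 | x) equals the standard normal density at its tau-quantile divided by sigma(x)) are assumptions made for illustration, and the conditional density at zero is taken as known rather than estimated.

```python
from statistics import NormalDist


def efficiency_bound_2d(xs, f0s, tau):
    """Sample analogue of tau(1 - tau) * (E[f(0|X)^2 X X^T])^{-1} for X = (1, x).

    xs: scalar regressor values; f0s: values of f(0 | X) at those points.
    Returns the 2x2 bound as nested lists.
    """
    n = len(xs)
    # Entries of the weighted second-moment matrix J = mean of f0^2 * X X^T.
    j00 = sum(f * f for f in f0s) / n
    j01 = sum(f * f * x for x, f in zip(xs, f0s)) / n
    j11 = sum(f * f * x * x for x, f in zip(xs, f0s)) / n
    det = j00 * j11 - j01 * j01
    if det <= 0.0:
        raise ValueError("weighted second-moment matrix is singular")
    c = tau * (1.0 - tau) / det
    # Explicit inverse of the symmetric 2x2 matrix J, scaled by tau(1 - tau).
    return [[c * j11, -c * j01], [-c * j01, c * j00]]


# Hypothetical design: e | x = sigma(x) * (Z - z_tau), so f(0 | x) = pdf(z_tau) / sigma(x).
tau = 0.5
xs = [i / 10.0 for i in range(1, 51)]
pdf_at_q = NormalDist().pdf(NormalDist().inv_cdf(tau))
f0_homo = [pdf_at_q for _ in xs]                # sigma(x) = 1
f0_hetero = [pdf_at_q / (1.0 + x) for x in xs]  # sigma(x) = 1 + x
bound_homo = efficiency_bound_2d(xs, f0_homo, tau)
bound_hetero = efficiency_bound_2d(xs, f0_hetero, tau)
```

When f(0 | X) is a constant f(0), the constant factors out of the expectation and the bound reduces to tau(1 - tau) / f(0)^2 times the inverse of E[X X^T]; `bound_homo` is an instance of that special case, while `bound_hetero` shows how the weights f^2(0 | X) change the matrix when the error scale varies with x.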
References
[1] Newey, W.K. and Powell, J.L. (1990). Efficient estimation of linear and type I censored regression models under conditional quantile restrictions. Econometric Theory 6, 295-317.
[2] van der Vaart, A.W. Asymptotic Statistics. Cambridge University Press.