2013.5.7. Minimax lower bounds via Neyman-Pearson lemma
Kengo Kato Suppose that there is a scalar dependent variable Y and a scalar covari- ate X which we assume has support in [0, 1]. Consider the nonparametric regression model
Y = f (X) + ϵ, ϵ ⊥⊥ X, ϵ ∼ N(0, σ2), σ2 > 0.
We fix the distribution of X and σ2 > 0. Let {ϕj}∞j=1 be an orthonormal system in L2([0, 1]). We assume that
L := sup
j≥1E[ϕ
2j(X)] < ∞.
For given α > 0 and C1> 0, suppose that f belongs to the class F(α, C1) = {f ∈ L2([0, 1]) : |⟨f, ϕj⟩| ≤ C1j−α, ∀j ≥ 1},
where ⟨·, ·⟩ denotes the inner product in L2([0, 1]). Denote by ∥ · ∥ the L2([0, 1])-norm. Let (Y1, X1), . . . , (Yn, Xn) be i.i.d. observations of (Y, X).
The purpose of this note is to prove (in a self-contained manner) the following (well-known) theorems by means of a simple application of the Neyman-Pearson lemma.1
Theorem 1. Under the above setup, we have inffˆ f ∈F (α,Csup 1)Ef[∥ ˆf − f∥
2] ≳ n−(2α−1)/(2α),
where the infimum is taken over all estimators ˆf of f .
Remark 1. The idea of the proof of Theorem 1 is borrowed from [1, 2] where minimax lower bounds in the problem of estimating structural func- tions in nonparametric instrumental variables models and slope functions in functional linear models are derived [see also the proof of 4, Theorem 7]. However, in [1, 2], detailed proofs for the minimax lower bounds are not presented (though the proofs are correct). Hence I hope that this note would be of some help in understanding their proofs.
Alternatively, we have the following theorem.
Theorem 2. There exists a small constant c > 0 such that lim inf
n→∞ inffˆ f ∈F (α,Csup 1)Pf(∥ ˆf − f∥
2 > cn−(2α−1)/(2α)) > 0.
1For various techniques to derive minimax lower bounds in nonparametric statistical problems, see [3].
1
Proof of Theorem 1. Let Mn be the integer part of n1/(2α). For θMn = (θMn+1, . . . , θ2Mn)T ∈ RMn, define
fθMn(·) = C1
2Mn
∑
j=Mn+1
j−αθjϕj(·).
Clearly, fθMn ∈ F(α, C1) whenever θMn ∈ [0, 1]Mn. Lemma 1. We have
inffˆ f ∈F (α,Csup 1)Ef[∥ ˆf − f∥
2] ≥ inf θˆMn
sup
θMn∈{0,1}Mn
EθMn[∥fθˆMn − fθMn∥2],
where the infimum on the right side is taken over all estimators ˆθMn ∈ [0, 1]Mn of θMn.
Proof of Lemma 1. For arbitrary ˆf , we have sup
f ∈F (α,C1)Ef[∥ ˆf − f∥
2] ≥ sup θMn∈{0,1}Mn
EθMn[∥ ˆf − fθMn∥2]. Moreover, by Bessel’s inequality,
∥ ˆf − f∥2≥
∞
∑
j=1
(⟨ ˆf , ϕj⟩ − ⟨f, ϕj⟩)2,
so that when f = fθMn for some θMn ∈ {0, 1}Mn, it is enough to consider the estimator of the form
f (·) =ˆ
2Mn
∑
j=Mn+1
ˆ αjϕj(·),
where ˆαj are data-dependent. By defining ˆθj = C1−1jααˆj, ˆf is of the form f (·) = fˆ θˆMn(·) = C1
2Mn
∑
j=Mn+1
j−αθˆjϕj(·).
We need to show that we can restrict ˆθj in such a way that 0 ≤ ˆθj ≤ 1. For given ˆθj, define
θ˜j =
1, if ˆθj > 1, θˆj, if 0 ≤ ˆθj ≤ 1, 0, if ˆθj < 0. Then whenever θMn ∈ {0, 1}Mn,
∥fθˆMn − fθMn∥2≥ ∥fθ˜Mn − fθMn∥2.
This completes the proof of the lemma. □
For the notational convenience, write
θ−jMn = (θMn+1, . . . , θj−1, θj+1, . . . , θ2Mn)T, Mn+ 1 ≤ j ≤ 2Mn.
Observe that sup
θMn∈{0,1}MnEθ
Mn[∥fθˆMn − fθMn∥2]
≥ 2M1n ∑
θMn∈{0,1}Mn
EθMn[∥fθˆMn − fθMn∥2]
= C
12
2Mn
2Mn
∑
j=Mn+1
j−2α ∑
θMn
−j ∈{0,1} Mn−1
{EθMn
−j ,θj=0
[(ˆθj− θj)2]
+ EθMn
−j ,θj=1
[(ˆθj− θj)2]}. (1) We want to lower bound
EθMn
−j ,θj=0
[(ˆθj− θj)2] + EθMn
−j ,θj=1
[(ˆθj− θj)2].
To this end, we make use of a variant of Neyman-Pearson lemma combined with Le Cam’s inequality [3, Lemma 2.3].
Lemma 2. Let (S, S, µ) be a measure space, and let p, q be probability den- sity functions with respect to µ. Then
(i) (A variant of Neyman-Pearson lemma): inf
{∫
φpdµ +
∫
ψqdµ : φ ≥ 0, ψ ≥ 0, φ + ψ ≥ 1 }
≥
∫
(p ∧ q)dµ. (ii) (Le Cam’s inequality):
∫
(p ∧ q)dµ ≥ 12
(∫ √pqdµ)2.
Proof of Lemma 2. Part (i): Let φ ≥ 0, ψ ≥ 0, φ + ψ ≥ 1. Then
∫
φpdµ +
∫
ψqdµ ≥
∫
(φ ∧ 1)pdµ +
∫
(1 − φ ∧ 1)qdµ.
We lower bound the right side with respect to φ. Clearly, we may assume that φ ≤ 1. The desired conclusion follows from the inequality
∫
(p − q)(φ − 1(p < q))dµ ≥ 0. Part (ii): Since∫ (p ∨ q)dµ + ∫ (p ∧ q)dµ = 2, we have
(∫ √pqdµ)2=(∫ √(p ∧ q)(p ∨ q)dµ )2
≤
∫
(p ∧ q)dµ
∫
(p ∨ q)dµ
=
∫
(p ∧ q)dµ {
2 −
∫
(p ∨ q)dµ }
≤ 2
∫
(p ∧ q)dµ,
so that the desired inequality is obtained. □
For a while, fix Mn+1+1 ≤ j ≤ 2Mnand θM−jn ∈ {0, 1}Mn−1. Let pθj(y | x) denote the conditional density function of Y given X = x when f = fθMn
−j ,θj
:
pθj(y | x) = √ 1 2πσ2 exp
{
−2σ12 (y − fθMn
−j ,θj
(x))2 }
. Then
EθMn
−j ,θj=0
[ˆθ2j] + EθMn
−j ,θj=1
[(ˆθj− 1)2]
= E [∫
θˆ2j((y1, X1), . . . , (yn, Xn))
n
∏
i=1
pθj=0(yi | Xi)dy1· · · dyn
+
∫ {
1 − ˆθj((y1, X1), . . . , (yn, Xn))}2
n
∏
i=1
pθj=1(yi | Xi)dy1· · · dyn ]
. (2)
Note that ˆθ2j + (1 − ˆθj)2 ≥ 1/2, i.e., 2ˆθ2j + 2(1 − ˆθj)2≥ 1, and
pθj=0(y | x)pθj=1(y | x) = 1 2πσ2 exp
−σ12 (
y − fθMn
−j ,θj=0
(x) + fθMn
−j ,θj=1
(x) 2
)2
× exp {
−4σ12(fθMn
−j ,θj=1(x) − fθ Mn
−j ,θj=0
(x))2 }
, so that, by Lemma 2,
∫ θˆ2
j((y1, X1), . . . , (yn, Xn))
n
∏
i=1
pθj=0(yi | Xi)dy1· · · dyn
+
∫ {
1 − ˆθj((y1, X1), . . . , (yn, Xn))}2
n
∏
i=1
pθj=1(yi | Xi)dy1· · · dyn
≥ 14exp {
−C
12j−2α
8σ2
n
∑
i=1
ϕ2j(Xi) }
.
By convexity of the map x 7→ e−x, we have (2) ≥ 14exp
{
−C
12j−2αn
8σ2 E[ϕ
2j(X)]
}
≥ 14exp {
−C
12j−2αnL
8σ2 }
. For j ≥ Mn+ 1,
j−2αn ≤ (Mn+ 1)−2αn ≤ 1, so that whenever j ≥ Mn+ 1,
EθMn
−j ,θj=0
[ˆθ2j] + EθMn
−j ,θj=1
[(ˆθj− 1)2] ≥ 1 4exp
{
−C
12L
8σ2 }
.
Since Mn+ 1 ≤ j ≤ 2Mn and θM−jn ∈ {0, 1}Mn−1 are arbitrary, combining this inequality with (1), we have
ˆinf
θMn
sup
θMn∈{0,1}Mn
EθMn[∥fθˆMn − fθMn∥2]
≥ C
12
8 exp {
−C
12L
8σ2
} 2Mn
∑
j=Mn+1
j−2α. (3)
Since ∑2Mj=Mnn+1j−2α ∼ Mn−2α+1 ∼ n−(2α−1)/(2α), we obtain the desired
conclusion. □
Proof of Theorem 2. It is not difficult to see that inffˆ f ∈F (α,Csup
1)
Pf(∥ ˆf − f∥2 > cn−(2α−1)/(2α))
≥ inf
θˆMn
sup
θMn∈{0,1}MnPθ
Mn(∥fθˆMn − fθMn∥2> cn−(2α−1)/(2α)),
where infθˆMn is taken over all estimators ˆθMn ∈ [0, 1]Mn of θMn.
Denote by δ1n the right side on (3). Fix arbitrary estimator ˆθMn ∈ [0, 1]Mn−1 of θMn. Since {0, 1}Mn is a finite set and the supremum over {0, 1}Mn is attained, there exists a sequence θMn such that
EθMn[∥fθˆMn − fθMn∥2] ≥ δ1n.
Moreover,
∥fθˆMn − fθMn∥2≤ C1 2Mn
∑
j=Mn+1
j−2α(ˆθj − θj)2≤ C12
2Mn
∑
j=Mn+1
j−2α =: δ2n, so that E[∥fθˆMn − fθMn∥4] ≤ δ22n. We recall the Paley-Zygmund inequality. Lemma 3(Paley-Zygmund inequality). Let Z be a real valued random vari- able with finite second moment. Then for every λ ∈ (0, 1),
P(Z≥ λE[Z]) ≥ (1 − λ)2(E[Z])
2
E[Z2] .
Apply the Paley-Zygmund inequality with λ = 1/2 and Z = ∥fθˆMn −
fθMn∥2. Then
PθMn(∥fθˆMn − fθMn∥2 ≥ δ1n/2)
≥ PθMn
(
∥fθˆMn − fθMn∥2≥ 1
2EθMn[∥fθˆMn − fθMn∥
2]
)
≥ 14(E[∥fθˆMn − fθMn∥
2])2
E[∥fθˆMn − fθMn∥4] ≥ δ1n2 4δ22n ≥
1 256exp
{
−C
12L
4σ2 }
. Therefore, we conclude that
lim inf
n→∞ inffˆ f ∈F (α,Csup
1)
Pf(∥ ˆf − f∥2> δ1n/2) ≥ 2561 exp {
−C
12L
4σ2 }
.
Since δ1n∼ n−(2α−1)/(2α), we obtain the desired conclusion. □ References
[1] Hall, P. and Horowitz, J.L. (2005). Nonparametric methods for inference in the presence of instrumental variables. Ann. Statist. 33 2904-2929. [2] Hall, P. and Horowitz, J.L. (2007). Methodology and convergence rates
for functional linear regression. Ann. Statist. 35 70-91.
[3] Tsybakov, A.B. (2009). Introduction to Nonparametric Estimation. Springer.
[4] Yuan, M. and Cai, T. (2010). A reproducing kernel Hilbert space ap- proach to functional linear regression. Ann. Statist. 38 3412-3444.