

2013.5.7. Minimax lower bounds via Neyman-Pearson lemma

Kengo Kato

Suppose that there is a scalar dependent variable $Y$ and a scalar covariate $X$, which we assume has support in $[0,1]$. Consider the nonparametric regression model

$$Y = f(X) + \epsilon,\qquad \epsilon \perp\!\!\!\perp X,\qquad \epsilon \sim N(0,\sigma^2),\quad \sigma^2 > 0.$$

We fix the distribution of $X$ and $\sigma^2 > 0$. Let $\{\phi_j\}_{j=1}^\infty$ be an orthonormal system in $L^2([0,1])$. We assume that

$$L := \sup_{j\ge 1} E[\phi_j^2(X)] < \infty.$$

For given $\alpha > 0$ and $C_1 > 0$, suppose that $f$ belongs to the class

$$\mathcal F(\alpha, C_1) = \big\{f \in L^2([0,1]) : |\langle f, \phi_j\rangle| \le C_1 j^{-\alpha},\ \forall j \ge 1\big\},$$

where $\langle\cdot,\cdot\rangle$ denotes the inner product in $L^2([0,1])$. Denote by $\|\cdot\|$ the $L^2([0,1])$-norm. Let $(Y_1, X_1), \dots, (Y_n, X_n)$ be i.i.d. observations of $(Y, X)$.
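To make the setup concrete, here is a small Python sketch that draws a sample from the model. It is only an illustration under assumptions that are not part of this note. The orthonormal system is taken to be $\phi_j(x) = \sqrt2\cos(\pi j x)$. $X$ is uniform on $[0,1]$, so $E[\phi_j^2(X)] = 1$ and hence $L = 1$. Finally, $f$ is assumed to have finitely many nonzero coefficients. The helper names (`phi`, `in_class`, `regression_function`, `simulate`) are hypothetical.

```python
import math
import random


def phi(j, x):
    """Assumed orthonormal system on L^2([0, 1]): sqrt(2) * cos(pi * j * x), j >= 1."""
    return math.sqrt(2.0) * math.cos(math.pi * j * x)


def in_class(coeffs, alpha, C1):
    """Check |<f, phi_j>| <= C1 * j^(-alpha) for the listed coefficients <f, phi_j>, j = 1, 2, ...

    Coefficients that are not listed are zero and satisfy the constraint trivially.
    """
    return all(abs(c) <= C1 * j ** (-alpha) for j, c in enumerate(coeffs, start=1))


def regression_function(coeffs):
    """Return f(x) = sum_j coeffs[j-1] * phi_j(x)."""
    def f(x):
        return sum(c * phi(j, x) for j, c in enumerate(coeffs, start=1))
    return f


def simulate(n, coeffs, sigma, seed=0):
    """Draw i.i.d. (Y_i, X_i) from Y = f(X) + eps, eps ~ N(0, sigma^2) independent of X.

    X is drawn uniformly on [0, 1] here (an assumption); the note only needs supp(X) in [0, 1].
    """
    rng = random.Random(seed)
    f = regression_function(coeffs)
    sample = []
    for _ in range(n):
        x = rng.random()
        y = f(x) + rng.gauss(0.0, sigma)
        sample.append((y, x))
    return sample


# Usage: with alpha = 1 and C1 = 1, f = phi_1 + 0.3 * phi_2 lies in F(1, 1).
coeffs = [1.0, 0.3]
print(in_class(coeffs, alpha=1.0, C1=1.0))  # True, since 0.3 <= 1/2
data = simulate(5, coeffs, sigma=0.5, seed=1)
print(len(data))  # 5
```

Any other orthonormal system and design distribution could be substituted. The only property the note uses is $L < \infty$.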

The purpose of this note is to prove (in a self-contained manner) the following (well-known) theorems by means of a simple application of the Neyman-Pearson lemma.¹

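Theorem 1. Under the above setup, we have

$$\inf_{\hat f}\ \sup_{f \in \mathcal F(\alpha, C_1)} E_f\big[\|\hat f - f\|^2\big] \gtrsim n^{-(2\alpha-1)/(2\alpha)},$$

where the infimum is taken over all estimators $\hat f$ of $f$, $E_f$ denotes expectation when the regression function is $f$, and $a_n \gtrsim b_n$ means that $a_n \ge c\, b_n$ for all $n$ and some constant $c > 0$ independent of $n$.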

Remark 1. The idea of the proof of Theorem 1 is borrowed from [1, 2], where minimax lower bounds are derived for estimating structural functions in nonparametric instrumental variables models and slope functions in functional linear models (see also the proof of Theorem 7 in [4]). However, [1, 2] do not present detailed proofs of these minimax lower bounds (though their proofs are correct). I hope that this note will be of some help in understanding their proofs.

Alternatively, we have the following theorem.

Theorem 2. There exists a small constant $c > 0$ such that

$$\liminf_{n\to\infty}\ \inf_{\hat f}\ \sup_{f \in \mathcal F(\alpha, C_1)} P_f\big(\|\hat f - f\|^2 > c\, n^{-(2\alpha-1)/(2\alpha)}\big) > 0.$$

¹ For various techniques to derive minimax lower bounds in nonparametric statistical problems, see [3].


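Proof of Theorem 1. Let $M_n$ be the integer part of $n^{1/(2\alpha)}$. For $\theta^{M_n} = (\theta_{M_n+1}, \dots, \theta_{2M_n})^T \in \mathbb R^{M_n}$, define

$$f_{\theta^{M_n}}(\cdot) = C_1 \sum_{j=M_n+1}^{2M_n} j^{-\alpha}\theta_j \phi_j(\cdot).$$

By orthonormality, $\langle f_{\theta^{M_n}}, \phi_j\rangle = C_1 j^{-\alpha}\theta_j$ for $M_n+1 \le j \le 2M_n$ and $\langle f_{\theta^{M_n}}, \phi_j\rangle = 0$ otherwise. Hence $f_{\theta^{M_n}} \in \mathcal F(\alpha, C_1)$ whenever $\theta^{M_n} \in [0,1]^{M_n}$. Below, $E_{\theta^{M_n}}$ and $P_{\theta^{M_n}}$ denote expectation and probability when $f = f_{\theta^{M_n}}$.

Lemma 1. We have

$$\inf_{\hat f}\ \sup_{f \in \mathcal F(\alpha, C_1)} E_f\big[\|\hat f - f\|^2\big] \ge \inf_{\hat\theta^{M_n}}\ \sup_{\theta^{M_n} \in \{0,1\}^{M_n}} E_{\theta^{M_n}}\big[\|f_{\hat\theta^{M_n}} - f_{\theta^{M_n}}\|^2\big],$$

where the infimum on the right side is taken over all estimators $\hat\theta^{M_n} \in [0,1]^{M_n}$ of $\theta^{M_n}$.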

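Proof of Lemma 1. For an arbitrary $\hat f$, we have

$$\sup_{f \in \mathcal F(\alpha, C_1)} E_f\big[\|\hat f - f\|^2\big] \ge \sup_{\theta^{M_n} \in \{0,1\}^{M_n}} E_{\theta^{M_n}}\big[\|\hat f - f_{\theta^{M_n}}\|^2\big].$$

Moreover, by Bessel's inequality,

$$\|\hat f - f\|^2 \ge \sum_{j=1}^\infty \big(\langle \hat f, \phi_j\rangle - \langle f, \phi_j\rangle\big)^2 \ge \sum_{j=M_n+1}^{2M_n} \big(\langle \hat f, \phi_j\rangle - \langle f, \phi_j\rangle\big)^2.$$

Therefore, when $f = f_{\theta^{M_n}}$ for some $\theta^{M_n} \in \{0,1\}^{M_n}$, it is enough to consider estimators of the form

$$\hat f(\cdot) = \sum_{j=M_n+1}^{2M_n} \hat\alpha_j \phi_j(\cdot),$$

where the $\hat\alpha_j$ are data-dependent. Defining $\hat\theta_j = C_1^{-1} j^{\alpha}\hat\alpha_j$, we can write $\hat f$ as

$$\hat f(\cdot) = f_{\hat\theta^{M_n}}(\cdot) = C_1 \sum_{j=M_n+1}^{2M_n} j^{-\alpha}\hat\theta_j \phi_j(\cdot).$$

We need to show that we can restrict $\hat\theta_j$ so that $0 \le \hat\theta_j \le 1$. For given $\hat\theta_j$, define

$$\tilde\theta_j = \begin{cases} 1, & \text{if } \hat\theta_j > 1,\\ \hat\theta_j, & \text{if } 0 \le \hat\theta_j \le 1,\\ 0, & \text{if } \hat\theta_j < 0. \end{cases}$$

By orthonormality, $\|f_{\hat\theta^{M_n}} - f_{\theta^{M_n}}\|^2 = C_1^2 \sum_{j=M_n+1}^{2M_n} j^{-2\alpha}(\hat\theta_j - \theta_j)^2$. Moreover, $|\tilde\theta_j - \theta_j| \le |\hat\theta_j - \theta_j|$ whenever $\theta_j \in \{0,1\}$. Hence, whenever $\theta^{M_n} \in \{0,1\}^{M_n}$,

$$\|f_{\hat\theta^{M_n}} - f_{\theta^{M_n}}\|^2 \ge \|f_{\tilde\theta^{M_n}} - f_{\theta^{M_n}}\|^2.$$

This completes the proof of the lemma. □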
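The reduction in Lemma 1 can be illustrated numerically. The Python sketch below is only an illustration, and the function names `M`, `sq_dist` and `clip` are hypothetical. It computes $M_n$ and evaluates the squared distance $\|f_{\hat\theta^{M_n}} - f_{\theta^{M_n}}\|^2 = C_1^2\sum_{j=M_n+1}^{2M_n} j^{-2\alpha}(\hat\theta_j - \theta_j)^2$, which follows from orthonormality. It then checks, on one arbitrarily chosen $\hat\theta^{M_n}$, that the truncation $\tilde\theta_j$ never increases this distance to any vertex $\theta^{M_n} \in \{0,1\}^{M_n}$.

```python
import itertools
import math


def M(n, alpha):
    """M_n: the integer part of n^(1/(2 alpha)), guarded against floating-point error."""
    m = math.floor(n ** (1.0 / (2.0 * alpha)))
    while (m + 1) ** (2.0 * alpha) <= n:
        m += 1
    while m ** (2.0 * alpha) > n:
        m -= 1
    return m


def sq_dist(theta_hat, theta, M_n, alpha, C1):
    """C1^2 * sum_{j=M_n+1}^{2 M_n} j^(-2 alpha) * (theta_hat_j - theta_j)^2.

    Entry k of each vector corresponds to the index j = M_n + 1 + k.
    """
    return C1 ** 2 * sum(
        (M_n + 1 + k) ** (-2.0 * alpha) * (a - b) ** 2
        for k, (a, b) in enumerate(zip(theta_hat, theta))
    )


def clip(theta_hat):
    """The truncation tilde-theta_j from the proof of Lemma 1 (projection onto [0, 1])."""
    return [min(1.0, max(0.0, t)) for t in theta_hat]


# Example: n = 100, alpha = 1 gives M_n = 10; compare distances at all 2^10 vertices.
n, alpha, C1 = 100, 1.0, 1.0
M_n = M(n, alpha)
theta_hat = [1.7, -0.4, 0.3, 0.9, 2.0, -1.0, 0.5, 0.0, 1.0, 0.2]
ok = all(
    sq_dist(clip(theta_hat), v, M_n, alpha, C1) <= sq_dist(theta_hat, v, M_n, alpha, C1)
    for v in itertools.product([0, 1], repeat=M_n)
)
print(M_n, ok)  # 10 True
```

The check succeeds term by term. For $\theta_j \in \{0,1\}$, projecting $\hat\theta_j$ onto $[0,1]$ can only move it closer to $\theta_j$, and the weights $j^{-2\alpha}$ are positive.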


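For notational convenience, write

$$\theta^{M_n}_{-j} = (\theta_{M_n+1}, \dots, \theta_{j-1}, \theta_{j+1}, \dots, \theta_{2M_n})^T,\qquad M_n + 1 \le j \le 2M_n,$$

and write $E_{\theta^{M_n}_{-j},\theta_j}$ for $E_{\theta^{M_n}}$ when $\theta^{M_n}$ is split into $\theta^{M_n}_{-j}$ and its $j$-th coordinate $\theta_j$.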

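Observe that

$$\begin{aligned}
\sup_{\theta^{M_n} \in \{0,1\}^{M_n}} E_{\theta^{M_n}}\big[\|f_{\hat\theta^{M_n}} - f_{\theta^{M_n}}\|^2\big]
&\ge \frac{1}{2^{M_n}} \sum_{\theta^{M_n} \in \{0,1\}^{M_n}} E_{\theta^{M_n}}\big[\|f_{\hat\theta^{M_n}} - f_{\theta^{M_n}}\|^2\big]\\
&= \frac{C_1^2}{2^{M_n}} \sum_{j=M_n+1}^{2M_n} j^{-2\alpha} \sum_{\theta^{M_n}_{-j} \in \{0,1\}^{M_n-1}} \Big\{ E_{\theta^{M_n}_{-j},\theta_j=0}\big[(\hat\theta_j - \theta_j)^2\big] + E_{\theta^{M_n}_{-j},\theta_j=1}\big[(\hat\theta_j - \theta_j)^2\big] \Big\},
\end{aligned}\tag{1}$$

where the equality uses $\|f_{\hat\theta^{M_n}} - f_{\theta^{M_n}}\|^2 = C_1^2\sum_{j=M_n+1}^{2M_n} j^{-2\alpha}(\hat\theta_j - \theta_j)^2$ (orthonormality of $\{\phi_j\}$). We want to lower bound

$$E_{\theta^{M_n}_{-j},\theta_j=0}\big[(\hat\theta_j - \theta_j)^2\big] + E_{\theta^{M_n}_{-j},\theta_j=1}\big[(\hat\theta_j - \theta_j)^2\big].$$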

To this end, we make use of a variant of the Neyman-Pearson lemma combined with Le Cam's inequality [3, Lemma 2.3].

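Lemma 2. Let $(S, \mathcal S, \mu)$ be a measure space, and let $p, q$ be probability density functions with respect to $\mu$. Then:

(i) (A variant of the Neyman-Pearson lemma)

$$\inf\Big\{ \int \varphi p\, d\mu + \int \psi q\, d\mu : \varphi \ge 0,\ \psi \ge 0,\ \varphi + \psi \ge 1 \Big\} \ge \int (p \wedge q)\, d\mu,$$

where the infimum is taken over measurable functions $\varphi, \psi$ on $S$.

(ii) (Le Cam's inequality)

$$\int (p \wedge q)\, d\mu \ge \frac12 \Big( \int \sqrt{pq}\, d\mu \Big)^2.$$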

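Proof of Lemma 2. Part (i): Let $\varphi \ge 0$, $\psi \ge 0$, $\varphi + \psi \ge 1$. Since $\psi \ge (1 - \varphi) \vee 0 = 1 - \varphi \wedge 1$,

$$\int \varphi p\, d\mu + \int \psi q\, d\mu \ge \int (\varphi \wedge 1) p\, d\mu + \int (1 - \varphi \wedge 1) q\, d\mu.$$

We lower bound the right side with respect to $\varphi$. Replacing $\varphi$ by $\varphi \wedge 1$, we may assume that $\varphi \le 1$. Since

$$\int \varphi p\, d\mu + \int (1 - \varphi) q\, d\mu - \int (p \wedge q)\, d\mu = \int (p - q)\{\varphi - 1(p < q)\}\, d\mu,$$

the desired conclusion follows from the inequality

$$\int (p - q)\{\varphi - 1(p < q)\}\, d\mu \ge 0.$$

This inequality holds because the integrand is nonnegative. On $\{p < q\}$, both factors are nonpositive; on $\{p \ge q\}$, both are nonnegative.

Part (ii): Since $\int (p \vee q)\, d\mu + \int (p \wedge q)\, d\mu = 2$, the Cauchy-Schwarz inequality gives

$$\Big( \int \sqrt{pq}\, d\mu \Big)^2 = \Big( \int \sqrt{(p \wedge q)(p \vee q)}\, d\mu \Big)^2 \le \int (p \wedge q)\, d\mu \int (p \vee q)\, d\mu = \int (p \wedge q)\, d\mu \Big\{ 2 - \int (p \wedge q)\, d\mu \Big\} \le 2 \int (p \wedge q)\, d\mu,$$

so that the desired inequality is obtained. □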
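On a finite set with counting measure, both parts of Lemma 2 can be checked directly. The Python sketch below uses two arbitrarily chosen distributions $p, q$, and its helper names are hypothetical. It evaluates the objective of part (i) at $\varphi = 1(p<q)$, $\psi = 1 - \varphi$, the choice suggested by the proof. It also confirms that randomly drawn feasible pairs $(\varphi, \psi)$ never do better, and it compares $\int(p\wedge q)\,d\mu$ with $\frac12(\int\sqrt{pq}\,d\mu)^2$.

```python
import math
import random


def objective(phi, psi, p, q):
    """sum_x phi(x) p(x) + psi(x) q(x): the quantity minimized in Lemma 2 (i)."""
    return sum(a * pi + b * qi for a, b, pi, qi in zip(phi, psi, p, q))


def overlap(p, q):
    """int min(p, q) dmu for counting measure."""
    return sum(min(pi, qi) for pi, qi in zip(p, q))


def affinity(p, q):
    """int sqrt(p q) dmu (Bhattacharyya affinity) for counting measure."""
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))


p = [0.1, 0.4, 0.2, 0.3]
q = [0.3, 0.1, 0.5, 0.1]

# The test from the proof: phi = 1(p < q), psi = 1 - phi attains int min(p, q).
phi_star = [1.0 if pi < qi else 0.0 for pi, qi in zip(p, q)]
psi_star = [1.0 - a for a in phi_star]
best = objective(phi_star, psi_star, p, q)

# Random feasible pairs (phi, psi >= 0 with phi + psi >= 1) never go below the overlap.
rng = random.Random(0)
for _ in range(1000):
    phi = [rng.uniform(0.0, 2.0) for _ in p]
    psi = [max(0.0, 1.0 - a) + rng.uniform(0.0, 0.5) for a in phi]
    assert objective(phi, psi, p, q) >= overlap(p, q) - 1e-12

print(round(best, 10), round(overlap(p, q), 10))   # part (i): the two values coincide
print(overlap(p, q) >= 0.5 * affinity(p, q) ** 2)  # part (ii): True
```

With $\varphi = 1(p<q)$ the inequality in part (i) is attained, which is why the lemma is described as a variant of the Neyman-Pearson lemma.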


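For the time being, fix $M_n + 1 \le j \le 2M_n$ and $\theta^{M_n}_{-j} \in \{0,1\}^{M_n-1}$. Let $p_{\theta_j}(y \mid x)$ denote the conditional density function of $Y$ given $X = x$ when $f = f_{\theta^{M_n}_{-j},\theta_j}$:

$$p_{\theta_j}(y \mid x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Big\{ -\frac{1}{2\sigma^2}\big(y - f_{\theta^{M_n}_{-j},\theta_j}(x)\big)^2 \Big\}.$$

Then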

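$$\begin{aligned}
&E_{\theta^{M_n}_{-j},\theta_j=0}\big[\hat\theta_j^2\big] + E_{\theta^{M_n}_{-j},\theta_j=1}\big[(\hat\theta_j - 1)^2\big]\\
&\quad= E\Big[ \int \hat\theta_j^2\big((y_1, X_1), \dots, (y_n, X_n)\big) \prod_{i=1}^n p_{\theta_j=0}(y_i \mid X_i)\, dy_1 \cdots dy_n\\
&\qquad\quad + \int \big\{1 - \hat\theta_j\big((y_1, X_1), \dots, (y_n, X_n)\big)\big\}^2 \prod_{i=1}^n p_{\theta_j=1}(y_i \mid X_i)\, dy_1 \cdots dy_n \Big],
\end{aligned}\tag{2}$$

where the outer expectation $E$ is taken with respect to $X_1, \dots, X_n$, whose distribution does not depend on $\theta^{M_n}$.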

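Note that $\hat\theta_j^2 + (1 - \hat\theta_j)^2 \ge 1/2$, i.e., $2\hat\theta_j^2 + 2(1 - \hat\theta_j)^2 \ge 1$, and

$$p_{\theta_j=0}(y \mid x)\, p_{\theta_j=1}(y \mid x) = \frac{1}{2\pi\sigma^2} \exp\Big\{ -\frac{1}{\sigma^2}\Big( y - \frac{f_{\theta^{M_n}_{-j},\theta_j=0}(x) + f_{\theta^{M_n}_{-j},\theta_j=1}(x)}{2} \Big)^2 \Big\} \times \exp\Big\{ -\frac{1}{4\sigma^2}\big( f_{\theta^{M_n}_{-j},\theta_j=1}(x) - f_{\theta^{M_n}_{-j},\theta_j=0}(x) \big)^2 \Big\},$$

so that, by Lemma 2,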

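$$\begin{aligned}
&\int \hat\theta_j^2\big((y_1, X_1), \dots, (y_n, X_n)\big) \prod_{i=1}^n p_{\theta_j=0}(y_i \mid X_i)\, dy_1 \cdots dy_n\\
&\qquad + \int \big\{1 - \hat\theta_j\big((y_1, X_1), \dots, (y_n, X_n)\big)\big\}^2 \prod_{i=1}^n p_{\theta_j=1}(y_i \mid X_i)\, dy_1 \cdots dy_n\\
&\quad \ge \frac14 \exp\Big\{ -\frac{C_1^2 j^{-2\alpha}}{4\sigma^2} \sum_{i=1}^n \phi_j^2(X_i) \Big\}.
\end{aligned}$$

Indeed, for fixed $X_1, \dots, X_n$, take $\varphi = 2\hat\theta_j^2$ and $\psi = 2(1 - \hat\theta_j)^2$ in Lemma 2 (i), so that $\varphi + \psi \ge 1$. The densities are the products $\prod_{i=1}^n p_{\theta_j=0}(y_i \mid X_i)$ and $\prod_{i=1}^n p_{\theta_j=1}(y_i \mid X_i)$, and the left side above is one half of the corresponding objective. Since $f_{\theta^{M_n}_{-j},\theta_j=1} - f_{\theta^{M_n}_{-j},\theta_j=0} = C_1 j^{-\alpha}\phi_j$, the factorization above gives

$$\int \sqrt{\prod_{i=1}^n p_{\theta_j=0}(y_i \mid X_i)\, p_{\theta_j=1}(y_i \mid X_i)}\, dy_1 \cdots dy_n = \exp\Big\{ -\frac{C_1^2 j^{-2\alpha}}{8\sigma^2} \sum_{i=1}^n \phi_j^2(X_i) \Big\}.$$

Lemma 2 (ii) then bounds the overlap from below by one half of the square of this affinity.

By convexity of the map $x \mapsto e^{-x}$ (Jensen's inequality), the quantity in (2) satisfies

$$E_{\theta^{M_n}_{-j},\theta_j=0}\big[\hat\theta_j^2\big] + E_{\theta^{M_n}_{-j},\theta_j=1}\big[(\hat\theta_j - 1)^2\big] \ge \frac14 \exp\Big\{ -\frac{C_1^2 j^{-2\alpha} n}{4\sigma^2} E[\phi_j^2(X)] \Big\} \ge \frac14 \exp\Big\{ -\frac{C_1^2 j^{-2\alpha} n L}{4\sigma^2} \Big\}.$$

Since $M_n + 1 > n^{1/(2\alpha)}$, for $j \ge M_n + 1$ we have

$$j^{-2\alpha} n \le (M_n + 1)^{-2\alpha} n \le 1,$$

so that whenever $j \ge M_n + 1$,

$$E_{\theta^{M_n}_{-j},\theta_j=0}\big[\hat\theta_j^2\big] + E_{\theta^{M_n}_{-j},\theta_j=1}\big[(\hat\theta_j - 1)^2\big] \ge \frac14 \exp\Big\{ -\frac{C_1^2 L}{4\sigma^2} \Big\}.$$

Since $M_n + 1 \le j \le 2M_n$ and $\theta^{M_n}_{-j} \in \{0,1\}^{M_n-1}$ are arbitrary, we may combine this inequality with (1). The inner sum in (1) has $2^{M_n-1}$ terms, so we obtain

$$\inf_{\hat\theta^{M_n}}\ \sup_{\theta^{M_n} \in \{0,1\}^{M_n}} E_{\theta^{M_n}}\big[\|f_{\hat\theta^{M_n}} - f_{\theta^{M_n}}\|^2\big] \ge \frac{C_1^2}{8} \exp\Big\{ -\frac{C_1^2 L}{4\sigma^2} \Big\} \sum_{j=M_n+1}^{2M_n} j^{-2\alpha}. \tag{3}$$

Since $\sum_{j=M_n+1}^{2M_n} j^{-2\alpha} \sim M_n^{-2\alpha+1} \sim n^{-(2\alpha-1)/(2\alpha)}$, Lemma 1 yields the desired conclusion. □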
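Two quantitative steps in the proof above can be checked numerically. The Python sketch below is only an illustration, and the helper names are hypothetical. First, for normal densities $p_a, p_b$ with means $a, b$ and common variance $\sigma^2$, it compares a midpoint-rule value of $\int\sqrt{p_a p_b}\,dy$ with the closed form $\exp\{-(a-b)^2/(8\sigma^2)\}$. This is the one-observation version of the affinity used in the per-coordinate bound; Lemma 2 (ii) uses its square. Second, it evaluates $n^{(2\alpha-1)/(2\alpha)}\sum_{j=M_n+1}^{2M_n} j^{-2\alpha}$. If the final rate claim holds, this ratio stays bounded away from $0$ and $\infty$; for $\alpha \ge 1/2$ it lies in $[2^{-2\alpha}, 2^{2\alpha-1}]$.

```python
import math


def M(n, alpha):
    """M_n: the integer part of n^(1/(2 alpha)), guarded against floating-point error."""
    m = math.floor(n ** (1.0 / (2.0 * alpha)))
    while (m + 1) ** (2.0 * alpha) <= n:
        m += 1
    while m ** (2.0 * alpha) > n:
        m -= 1
    return m


def normal_pdf(y, mean, sigma):
    """Density of N(mean, sigma^2) at y."""
    return math.exp(-0.5 * ((y - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def affinity_numeric(a, b, sigma, half_width=12.0, steps=20_000):
    """Midpoint-rule approximation of int sqrt(p_a(y) p_b(y)) dy over a wide interval."""
    lo = min(a, b) - half_width * sigma
    hi = max(a, b) + half_width * sigma
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        y = lo + (k + 0.5) * h
        total += math.sqrt(normal_pdf(y, a, sigma) * normal_pdf(y, b, sigma))
    return h * total


def affinity_closed_form(a, b, sigma):
    """exp{-(a - b)^2 / (8 sigma^2)}; its square is exp{-(a - b)^2 / (4 sigma^2)}."""
    return math.exp(-((a - b) ** 2) / (8.0 * sigma ** 2))


def rate_ratio(n, alpha):
    """n^{(2 alpha - 1)/(2 alpha)} * sum_{j = M_n + 1}^{2 M_n} j^{-2 alpha}."""
    m = M(n, alpha)
    s = sum(j ** (-2.0 * alpha) for j in range(m + 1, 2 * m + 1))
    return n ** ((2.0 * alpha - 1.0) / (2.0 * alpha)) * s


# Usage: the affinity of N(0, 0.8^2) and N(1.5, 0.8^2), then the rate ratio for alpha = 1.
print(abs(affinity_numeric(0.0, 1.5, 0.8) - affinity_closed_form(0.0, 1.5, 0.8)) < 1e-8)  # True
for n in (10**2, 10**4, 10**6):
    print(n, rate_ratio(n, alpha=1.0))
```

For $n$ observations, the affinity of the product densities is the product of the one-observation affinities. This is how the sum $\sum_{i=1}^n \phi_j^2(X_i)$ enters the exponent.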

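Proof of Theorem 2. Arguing as in the proof of Lemma 1, it is not difficult to see that, for every $t > 0$,

$$\inf_{\hat f}\ \sup_{f \in \mathcal F(\alpha, C_1)} P_f\big(\|\hat f - f\|^2 > t\big) \ge \inf_{\hat\theta^{M_n}}\ \sup_{\theta^{M_n} \in \{0,1\}^{M_n}} P_{\theta^{M_n}}\big(\|f_{\hat\theta^{M_n}} - f_{\theta^{M_n}}\|^2 > t\big),$$

where $\inf_{\hat\theta^{M_n}}$ is taken over all estimators $\hat\theta^{M_n} \in [0,1]^{M_n}$ of $\theta^{M_n}$.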

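Denote by $\delta_{1n}$ the right side of (3). Fix an arbitrary estimator $\hat\theta^{M_n} \in [0,1]^{M_n}$ of $\theta^{M_n}$. By (3), $\sup_{\theta^{M_n} \in \{0,1\}^{M_n}} E_{\theta^{M_n}}[\|f_{\hat\theta^{M_n}} - f_{\theta^{M_n}}\|^2] \ge \delta_{1n}$. Since $\{0,1\}^{M_n}$ is a finite set, this supremum is attained, so there exists $\theta^{M_n} \in \{0,1\}^{M_n}$ such that

$$E_{\theta^{M_n}}\big[\|f_{\hat\theta^{M_n}} - f_{\theta^{M_n}}\|^2\big] \ge \delta_{1n}.$$

Moreover, by orthonormality and since $|\hat\theta_j - \theta_j| \le 1$,

$$\|f_{\hat\theta^{M_n}} - f_{\theta^{M_n}}\|^2 = C_1^2 \sum_{j=M_n+1}^{2M_n} j^{-2\alpha}(\hat\theta_j - \theta_j)^2 \le C_1^2 \sum_{j=M_n+1}^{2M_n} j^{-2\alpha} =: \delta_{2n},$$

so that $E_{\theta^{M_n}}[\|f_{\hat\theta^{M_n}} - f_{\theta^{M_n}}\|^4] \le \delta_{2n}^2$. We recall the Paley-Zygmund inequality.

Lemma 3 (Paley-Zygmund inequality). Let $Z$ be a nonnegative random variable with $0 < E[Z^2] < \infty$. Then for every $\lambda \in (0,1)$,

$$P\big(Z > \lambda E[Z]\big) \ge (1 - \lambda)^2 \frac{(E[Z])^2}{E[Z^2]}.$$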
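For a nonnegative random variable with finitely many values, both sides of Lemma 3 can be computed exactly. The Python sketch below uses an arbitrarily chosen distribution with a small atom far out, and its helper names are hypothetical. It compares $P(Z > \lambda E[Z])$ with $(1-\lambda)^2 (E[Z])^2 / E[Z^2]$ on a grid of $\lambda$ values that includes $\lambda = 1/2$.

```python
def moments(values, probs):
    """Return (E[Z], E[Z^2]) for a discrete Z taking values[i] with probability probs[i]."""
    m1 = sum(v * w for v, w in zip(values, probs))
    m2 = sum(v * v * w for v, w in zip(values, probs))
    return m1, m2


def tail(values, probs, t):
    """P(Z > t)."""
    return sum(w for v, w in zip(values, probs) if v > t)


def paley_zygmund_bound(values, probs, lam):
    """(1 - lam)^2 * (E[Z])^2 / E[Z^2]."""
    m1, m2 = moments(values, probs)
    return (1.0 - lam) ** 2 * m1 ** 2 / m2


# A nonnegative Z with a small atom at 4: the bound is loose but never violated.
values = [0.0, 0.5, 1.0, 4.0]
probs = [0.4, 0.3, 0.2, 0.1]
m1, _ = moments(values, probs)
for k in range(1, 10):
    lam = k / 10
    assert tail(values, probs, lam * m1) >= paley_zygmund_bound(values, probs, lam)
print(tail(values, probs, 0.5 * m1), paley_zygmund_bound(values, probs, 0.5))  # roughly 0.6 and 0.075
```

In the proof, the ratio $(E[Z])^2/E[Z^2]$ is controlled by $\delta_{1n}^2/\delta_{2n}^2$. This is why the lower bound on the first moment and the upper bound on the second moment are needed.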

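Apply the Paley-Zygmund inequality under $P_{\theta^{M_n}}$ with $\lambda = 1/2$ and $Z = \|f_{\hat\theta^{M_n}} - f_{\theta^{M_n}}\|^2$. Then

$$\begin{aligned}
P_{\theta^{M_n}}\big(\|f_{\hat\theta^{M_n}} - f_{\theta^{M_n}}\|^2 > \delta_{1n}/2\big)
&\ge P_{\theta^{M_n}}\Big( \|f_{\hat\theta^{M_n}} - f_{\theta^{M_n}}\|^2 > \frac12 E_{\theta^{M_n}}\big[\|f_{\hat\theta^{M_n}} - f_{\theta^{M_n}}\|^2\big] \Big)\\
&\ge \frac14 \frac{\big(E_{\theta^{M_n}}[\|f_{\hat\theta^{M_n}} - f_{\theta^{M_n}}\|^2]\big)^2}{E_{\theta^{M_n}}[\|f_{\hat\theta^{M_n}} - f_{\theta^{M_n}}\|^4]} \ge \frac{\delta_{1n}^2}{4\delta_{2n}^2} \ge \frac{1}{256} \exp\Big\{ -\frac{C_1^2 L}{2\sigma^2} \Big\}.
\end{aligned}$$

Since the estimator $\hat\theta^{M_n}$ was arbitrary, we conclude that

$$\liminf_{n\to\infty}\ \inf_{\hat f}\ \sup_{f \in \mathcal F(\alpha, C_1)} P_f\big(\|\hat f - f\|^2 > \delta_{1n}/2\big) \ge \frac{1}{256} \exp\Big\{ -\frac{C_1^2 L}{2\sigma^2} \Big\}.$$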


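Since $\delta_{1n} \sim n^{-(2\alpha-1)/(2\alpha)}$, there exists $c > 0$ such that $c\, n^{-(2\alpha-1)/(2\alpha)} \le \delta_{1n}/2$ for all $n$. The event $\{\|\hat f - f\|^2 > \delta_{1n}/2\}$ is then contained in $\{\|\hat f - f\|^2 > c\, n^{-(2\alpha-1)/(2\alpha)}\}$, and we obtain the desired conclusion. □

References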

[1] Hall, P. and Horowitz, J.L. (2005). Nonparametric methods for inference in the presence of instrumental variables. Ann. Statist. 33 2904-2929.

[2] Hall, P. and Horowitz, J.L. (2007). Methodology and convergence rates for functional linear regression. Ann. Statist. 35 70-91.

[3] Tsybakov, A.B. (2009). Introduction to Nonparametric Estimation. Springer.

[4] Yuan, M. and Cai, T. (2010). A reproducing kernel Hilbert space approach to functional linear regression. Ann. Statist. 38 3412-3444.