SUPPLEMENT TO "QUASI-BAYESIAN ANALYSIS OF NONPARAMETRIC INSTRUMENTAL VARIABLES MODELS"∗
By Kengo Kato, University of Tokyo
This supplemental file contains the additional technical proofs omitted in the main text, and some technical tools used in the proofs.
APPENDIX A: CDV WAVELET BASES AND BESOV SPACES

A.1. Wavelet bases for $L^2[0,1]$. We review wavelet theory on the compact interval $[0,1]$. We refer the reader to [4], [6], and [5, Chapter 7 and Appendix B] as useful general references on wavelet theory in the statistical (and signal processing) context.
Let $(\varphi, \psi)$ be a Daubechies pair of the scaling function and wavelet of a multiresolution analysis of the space $L^2(\mathbb{R})$ of order $N$, with $\psi$ having $N$ vanishing moments and support contained in $[-N+1, N]$, and $\varphi$ having support contained in $[0, 2N-1]$ [see 4, Remark 7.1]. We translate $\varphi$ so that its support is contained in $[-N+1, N]$. Define
\[
\varphi_{jk}(x) = 2^{j/2}\varphi(2^j x - k), \qquad \psi_{jk}(x) = 2^{j/2}\psi(2^j x - k).
\]
Then, for any fixed $J_0 \ge 0$, it is known that $\{\varphi_{J_0 k}, \psi_{jk} : j \ge J_0, k \in \mathbb{Z}\}$ forms an orthonormal basis for $L^2(\mathbb{R})$. We, however, need an orthonormal basis for $L^2[0,1]$, which we construct from the Daubechies pair $(\varphi, \psi)$ following Cohen et al. [2, Section 4]. See also Chapter 7.5 of [6] for wavelet bases on $[0,1]$.
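As a quick numerical sanity check of these definitions, one can verify that the factor $2^{j/2}$ preserves the unit $L^2$-norm under dilation and translation, and that distinct translates at the same level are orthogonal. The sketch below uses the Haar scaling function $\varphi = 1_{[0,1)}$ as a closed-form stand-in (Daubechies scaling functions of order $N \ge 2$ have no closed form); the function names are ours.

```python
import numpy as np

# phi_{jk}(x) = 2^{j/2} phi(2^j x - k): the factor 2^{j/2} makes every
# dilate-translate have the same L2 norm as phi itself.  We use the Haar
# scaling function phi = 1_{[0,1)} as a closed-form stand-in.

def phi(x):
    return ((x >= 0.0) & (x < 1.0)).astype(float)

def phi_jk(x, j, k):
    return 2 ** (j / 2) * phi(2 ** j * x - k)

x = np.linspace(-2.0, 4.0, 600001)
dx = x[1] - x[0]

norm_base = np.sum(phi(x) ** 2) * dx          # ||phi||_{L2}^2, should be 1
norm_32 = np.sum(phi_jk(x, 3, 2) ** 2) * dx   # ||phi_{3,2}||_{L2}^2, should be 1

# distinct translates at the same level have disjoint supports for Haar,
# hence are orthogonal
inner = np.sum(phi_jk(x, 3, 2) * phi_jk(x, 3, 5)) * dx

print(norm_base, norm_32, inner)
```

Both squared norms come out as 1 up to the Riemann-sum discretization error, and the inner product of the two translates vanishes.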
Take a fixed resolution level $j$ such that $2^j \ge 2N$. For $k = N, \ldots, 2^j - N - 1$, the $\varphi_{jk}$ are supported in $[0,1]$ and left unchanged: $\varphi^{\mathrm{int}}_{jk}(x) = \varphi_{jk}(x)$ for $x \in [0,1]$. At the boundaries, for $k = 0, \ldots, N-1$, construct suitable functions $\varphi^L_k$ with support $[0, N+k]$ and $\varphi^R_k$ with support $[-N-k, 0]$, and define
\[
\varphi^{\mathrm{int}}_{jk}(x) = 2^{j/2}\varphi^L_k(2^j x), \qquad \varphi^{\mathrm{int}}_{j,2^j-k-1}(x) = 2^{j/2}\varphi^R_k(2^j(x-1)), \qquad x \in [0,1].
\]
Note that both $\varphi^L_k$ and $\varphi^R_k$ have the same smoothness as $\varphi$. Define the multiresolution spaces $V_j = \mathrm{span}\{\varphi^{\mathrm{int}}_{jk}, k = 0, \ldots, 2^j - 1\}$, which satisfy the
∗Supported by the Grant-in-Aid for Young Scientists (B) (25780152) from the JSPS.
following properties: (i) $\dim(V_j) = 2^j$; (ii) $V_j \subset V_{j+1}$; (iii) each $V_j$ contains all polynomials of order at most $N-1$.
Turning to the wavelet spaces, define $W_j$ as the orthogonal complement of $V_j$ in $V_{j+1}$. Starting from the Daubechies wavelet $\psi$, construct $\psi^{\mathrm{int}}_{jk}$ similarly to $\varphi^{\mathrm{int}}_{jk}$. Then $W_j = \mathrm{span}\{\psi^{\mathrm{int}}_{jk}, k = 0, \ldots, 2^j - 1\}$, and for any $J_0 \ge 1$ with $2^{J_0} \ge 2N$ and $J > J_0$,
\[
V_J = V_{J_0} \oplus \bigoplus_{j=J_0}^{J-1} W_j, \qquad L^2[0,1] = V_{J_0} \oplus \bigoplus_{j \ge J_0} W_j.
\]
Therefore, $\{\varphi^{\mathrm{int}}_{J_0 k}\}_{k=0}^{2^{J_0}-1} \cup \{\psi^{\mathrm{int}}_{jk}, j \ge J_0, k = 0, \ldots, 2^j - 1\}$ forms an orthonormal basis for $L^2[0,1]$ [see Section 4 of 2, for formal proofs of these results].
Definition 1. Call the so-constructed basis $\{\varphi^{\mathrm{int}}_{J_0 k}\}_{k=0}^{2^{J_0}-1} \cup \{\psi^{\mathrm{int}}_{jk}, j \ge J_0, k = 0, \ldots, 2^j - 1\}$ the CDV (Cohen-Daubechies-Vial) wavelet basis for $L^2[0,1]$ generated from the Daubechies pair $(\varphi, \psi)$. If $(\varphi, \psi)$ is $S$-regular, i.e., if $(\varphi, \psi)$ are $S$-times continuously differentiable, then call the so-generated CDV wavelet basis $S$-regular.
Remark 1. For any given positive integer $S$, there is an $S$-regular Daubechies pair $(\varphi, \psi)$, obtained by taking the order $N$ sufficiently large [see 4, Remark 7.1].
A.2. Besov spaces. We recall the definition of Besov spaces.
Definition 2. Let $0 < s < S$, $s \in \mathbb{R}$, $S \in \mathbb{N}$, and $1 \le p, q \le \infty$. Let $\{\varphi^{\mathrm{int}}_{J_0 k}\}_{k=0}^{2^{J_0}-1} \cup \{\psi^{\mathrm{int}}_{jk}, j \ge J_0, k = 0, \ldots, 2^j - 1\}$ be an $S$-regular CDV wavelet basis for $L^2[0,1]$. Let
\[
\varphi^{\mathrm{int}}_{J_0 k}(f) = \int_0^1 f(x)\varphi^{\mathrm{int}}_{J_0 k}(x)\,dx, \qquad \psi^{\mathrm{int}}_{jk}(f) = \int_0^1 f(x)\psi^{\mathrm{int}}_{jk}(x)\,dx.
\]
Then the Besov space $B^s_{p,q}$ is defined as the set of functions $\{f \in L^p[0,1] : \|f\|_{s,p,q} < \infty\}$, where
\[
\|f\|_{s,p,q} := \Big(\sum_{0 \le k \le 2^{J_0}-1} |\varphi^{\mathrm{int}}_{J_0 k}(f)|^p\Big)^{1/p} + \bigg(\sum_{j \ge J_0} \Big[2^{j(s+1/2-1/p)}\Big(\sum_{0 \le k \le 2^j-1} |\psi^{\mathrm{int}}_{jk}(f)|^p\Big)^{1/p}\Big]^q\bigg)^{1/q},
\]
with the obvious modification in case $p = \infty$ or $q = \infty$.
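For illustration, the sequence-space norm above is straightforward to evaluate from a finite set of wavelet coefficients. The following sketch (the helper `besov_norm` is our own, for finite resolution and finite $p, q$ only) computes $\|f\|_{s,p,q}$ and checks that, for coefficients decaying like $2^{-j(s_0+1/2)}$, the norm is far smaller for a smoothness index $s$ below $s_0$ than for one above it.

```python
import numpy as np

# Illustrative evaluation of the wavelet-sequence Besov norm ||f||_{s,p,q}
# over levels J0 <= j < Jmax, from scaling coefficients
# alpha[k] = phi^int_{J0 k}(f) and wavelet coefficients beta[j][k] = psi^int_{jk}(f).
# Finite p, q only; function and argument names are ours.

def besov_norm(alpha, beta, s, p, q, J0):
    term1 = np.sum(np.abs(alpha) ** p) ** (1.0 / p)
    levels = []
    for i, bj in enumerate(beta):
        j = J0 + i
        levels.append(2.0 ** (j * (s + 0.5 - 1.0 / p))
                      * np.sum(np.abs(bj) ** p) ** (1.0 / p))
    term2 = np.sum(np.array(levels) ** q) ** (1.0 / q)
    return term1 + term2

J0 = 2
alpha = np.ones(2 ** J0)
# coefficients decaying at rate 2^{-j(s0 + 1/2)}: level terms behave like 2^{j(s - s0)}
s0 = 1.5
beta = [2.0 ** (-j * (s0 + 0.5)) * np.ones(2 ** j) for j in range(J0, 10)]

n1 = besov_norm(alpha, beta, s=1.0, p=2, q=2, J0=J0)  # s < s0: level terms shrink
n2 = besov_norm(alpha, beta, s=2.0, p=2, q=2, J0=J0)  # s > s0: level terms grow
print(n1, n2)
```

With $p = q = 2$ each level contributes $2^{j(s-s_0)}$, so the partial sums converge for $s < s_0$ and diverge as the resolution grows for $s > s_0$, consistent with the definition.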
Remark 2. Besov spaces cover commonly used smooth function spaces. For example, $B^s_{\infty,\infty}$ equals the Hölder-Zygmund space, which coincides with the classical Hölder space for non-integer $s$. For integer $s$ the two do not coincide, but the Hölder-Zygmund space contains the classical Hölder space. Moreover, $B^s_{2,2}$ equals the classical $L^2$-Sobolev space when $s$ is an integer. See [5], Appendix B.
APPENDIX B: PROOFS OF LEMMAS 1-3
Proof of Lemma 1. For part (ii), the lower bound on $s_{\min}(E[\phi^J(W)^{\otimes 2}])$ follows from Assumption 1 (iii); the upper bounds on $s_{\max}(E[\phi^J(W)^{\otimes 2}])$ and $s_{\max}(E[\phi^J(W)\phi^J(X)^T])$ follow from Assumption 1 (i) and the fact that $\{\phi_l\}$ is an orthonormal basis of $L^2[0,1]$ (see (8)). Part (iii) follows from Rudelson's [8] inequality and (i). For the reader's convenience, we state Rudelson's inequality in Appendix E. For part (v), we first note that, by (iii) and Weyl's perturbation theorem [1, Problem III.6.13], $s_{\min}(\hat{\Phi}_{WX}) \ge \tau_{J_n} - O_P(\sqrt{J_n 2^{J_n}/n})$. Since $\sqrt{J_n 2^{J_n}/n} = o(\tau_{J_n})$, we have $s_{\min}(\hat{\Phi}_{WX}) \ge (1 - o_P(1))\tau_{J_n}$.
For the proof of (i), denote by $N$ the order of the Daubechies pair $(\varphi, \psi)$ generating the CDV wavelet basis $\{\phi_l, l \ge 1\}$. Then, for each $x \in [0,1]$ and each $j \ge J_0$, the number of nonzero elements among $\phi_{2^j+1}(x), \ldots, \phi_{2^{j+1}}(x)$ is bounded by a constant depending only on $N$, and each $\phi_{2^j+k}(x)$ is bounded by a constant (depending only on $\psi$) times $2^{j/2}$ for all $k = 1, \ldots, 2^j$. Similarly, $\phi_1, \ldots, \phi_{2^{J_0}}$ are uniformly bounded. Therefore, there exists a constant $D$ depending only on $(\varphi, \psi)$ such that $\|\phi^J(x)\|^2_{\ell^2} \le D(2^{J_0} + \sum_{j=J_0}^{J-1} 2^j) = D2^J$ for all $x \in [0,1]$.
Finally, we show part (iv). First, observe that
\[
\|\mathbb{E}_n[\phi^{J_n}(W_i)R_i]\|^2_{\ell^2} \le 2\|\mathbb{E}_n[\phi^{J_n}(W_i)R_i] - E[\phi^{J_n}(W)R]\|^2_{\ell^2} + 2\|E[\phi^{J_n}(W)R]\|^2_{\ell^2}.
\]
By a simple moment calculation, the first term is $O_P(2^{J_n}/n)$. For the second term, by Assumptions 3 and 4 (ii),
\[
\|E[\phi^{J_n}(W)R]\|^2_{\ell^2} = \|E[\phi^{J_n}(W)(g_0 - P_{J_n}g_0)(X)]\|^2_{\ell^2} \lesssim \tau^2_{J_n}\|g_0 - P_{J_n}g_0\|^2 \lesssim \tau^2_{J_n}2^{-2J_n s}.
\]
This completes the proof.
Proof of Lemma 2. Step 1. We first show that $|\Sigma_n| = 1 + o(1)$, where $|\Sigma_n|$ denotes the determinant of $\Sigma_n$. Let $\lambda_{\min,n}$ and $\lambda_{\max,n}$ denote the minimum and maximum eigenvalues of $\Sigma_n$, respectively. Then, by Weyl's perturbation theorem, $1 - o(k_n^{-1}) \le \lambda_{\min,n} \le \lambda_{\max,n} \le 1 + o(k_n^{-1})$, so that
\[
(1 - o(k_n^{-1}))^{k_n} \le \lambda_{\min,n}^{k_n} \le |\Sigma_n| \le \lambda_{\max,n}^{k_n} \le (1 + o(k_n^{-1}))^{k_n}.
\]
Both sides converge to $1$.
Step 2. By Step 1, we have
\begin{align*}
\int |dN(0, \Sigma_n)(x) - dN(0, I_{k_n})(x)|\,dx &= \frac{1}{(2\pi)^{k_n/2}}\int \Big|\frac{1}{|\Sigma_n|^{1/2}}e^{-x^T\Sigma_n^{-1}x/2} - e^{-x^Tx/2}\Big|\,dx \\
&\le \Big|\frac{1}{|\Sigma_n|^{1/2}} - 1\Big| + \frac{1}{(2\pi)^{k_n/2}|\Sigma_n|^{1/2}}\int |e^{-x^T\Sigma_n^{-1}x/2} - e^{-x^Tx/2}|\,dx \\
&\le o(1) + \frac{1}{(2\pi)^{k_n/2}}(1 + o(1))\int e^{-x^Tx/2}|e^{-x^T(\Sigma_n^{-1} - I_{k_n})x/2} - 1|\,dx.
\end{align*}
By assumption, we have $\epsilon_n := \|\Sigma_n^{-1} - I_{k_n}\|_{\mathrm{op}} \le \|\Sigma_n^{-1}\|_{\mathrm{op}}\|I_{k_n} - \Sigma_n\|_{\mathrm{op}} = o(k_n^{-1})$. Now, $|e^{-x^T(\Sigma_n^{-1} - I_{k_n})x/2} - 1| \le e^{\epsilon_n x^Tx/2} - e^{-\epsilon_n x^Tx/2}$. By a direct calculation, the conclusion follows from the fact that $(1 \pm \epsilon_n)^{k_n} = 1 + o(1)$.
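The scalar facts driving Steps 1 and 2 — that eigenvalues within $1 \pm o(k_n^{-1})$ force $|\Sigma_n| \to 1$, since $(1 \pm \epsilon)^{k} \to 1$ whenever $k\epsilon \to 0$ — are easy to illustrate numerically. The sketch below (our own toy check; the choice $\epsilon = 1/(k\log k)$ is an arbitrary $o(1/k)$ sequence) confirms the determinant bound in growing dimension.

```python
import numpy as np

# If every eigenvalue of Sigma_n lies in [1 - eps, 1 + eps] with eps = o(1/k),
# then (1 - eps)^k <= det(Sigma_n) <= (1 + eps)^k, and both bounds tend to 1
# because k * eps -> 0.  We draw random eigenvalues in that band and check.

rng = np.random.default_rng(0)
for k in (10, 100, 1000):
    eps = 1.0 / (k * np.log(k))                  # eps = o(1/k)
    eigs = 1.0 + eps * rng.uniform(-1, 1, size=k)
    det = np.prod(eigs)                          # determinant = product of eigenvalues
    assert (1 - eps) ** k <= det <= (1 + eps) ** k
    print(k, det)
```

As $k$ grows, the printed determinants cluster ever more tightly around 1, mirroring $(1 \pm \epsilon_n)^{k_n} = 1 + o(1)$.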
Proof of Lemma 3. The first assertion follows from the assumption. Suppose now that $\hat{A}_n$ is non-singular. Then $\hat{A}_n^{-1}A_n = (\hat{A}_n - A_n + A_n)^{-1}A_n = (A_n^{-1}\hat{A}_n - I_{k_n} + I_{k_n})^{-1}$. Here $A_n^{-1}\hat{A}_n - I_{k_n} = A_n^{-1}(\hat{A}_n - A_n)$, so that $\|A_n^{-1}\hat{A}_n - I_{k_n}\|_{\mathrm{op}} \le \|A_n^{-1}\|_{\mathrm{op}}\|\hat{A}_n - A_n\|_{\mathrm{op}} = s_{\min}^{-1}(A_n)\|\hat{A}_n - A_n\|_{\mathrm{op}} = O_P(\epsilon_n^{-1}\delta_n)$. Let $\hat{\Delta} = I_{k_n} - A_n^{-1}\hat{A}_n$. Then $\hat{A}_n^{-1}A_n = (I_{k_n} - \hat{\Delta})^{-1} = I_{k_n} + \sum_{m=1}^\infty \hat{\Delta}^m$ (Neumann series). Therefore, we conclude that
\[
\|\hat{A}_n^{-1}A_n - I_{k_n}\|_{\mathrm{op}} = \Big\|\sum_{m=1}^\infty \hat{\Delta}^m\Big\|_{\mathrm{op}} \le \sum_{m=1}^\infty \|\hat{\Delta}\|_{\mathrm{op}}^m = \|\hat{\Delta}\|_{\mathrm{op}}\sum_{m=0}^\infty \|\hat{\Delta}\|_{\mathrm{op}}^m = O_P(\epsilon_n^{-1}\delta_n).
\]
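The Neumann-series step can be checked numerically: whenever $\|\hat{\Delta}\|_{\mathrm{op}} < 1$, the geometric-series bound gives $\|\hat{A}_n^{-1}A_n - I\|_{\mathrm{op}} \le \|\hat{\Delta}\|_{\mathrm{op}}/(1 - \|\hat{\Delta}\|_{\mathrm{op}})$. A small sketch with random matrices (the dimension and perturbation sizes are arbitrary choices of ours):

```python
import numpy as np

# Check the Neumann-series bound from Lemma 3: with
# Delta = I - A^{-1} Ahat and ||Delta||_op < 1, Ahat is invertible and
# ||Ahat^{-1} A - I||_op <= ||Delta||_op / (1 - ||Delta||_op),
# since Ahat^{-1} A = (I - Delta)^{-1} = I + sum_{m>=1} Delta^m.

rng = np.random.default_rng(1)
k = 20
A = np.eye(k) + 0.05 * rng.standard_normal((k, k))
Ahat = A + 0.005 * rng.standard_normal((k, k))   # small perturbation of A

Delta = np.eye(k) - np.linalg.solve(A, Ahat)
d = np.linalg.norm(Delta, ord=2)                 # operator (spectral) norm
assert d < 1.0                                   # Neumann series converges

lhs = np.linalg.norm(np.linalg.solve(Ahat, A) - np.eye(k), ord=2)
print(d, lhs, d / (1.0 - d))
```

The computed `lhs` sits below `d / (1 - d)`, as the series bound predicts.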
APPENDIX C: PROOF OF THEOREM 3

For notational convenience, define
\[
E^{\Pi}_n[\,\cdot \mid D_n] := \int \cdot\,\Pi_n(dg \mid D_n), \qquad E^{\tilde{\Pi}}_n[\,\cdot \mid D_n] := \int \cdot\,\tilde{\Pi}_n(db^{J_n} \mid D_n).
\]
Proof of Theorem 3. Define the event
\[
E_{3n} = \{D_n : \hat{\Phi}_{WX} \text{ and } \hat{\Phi}_{WW} \text{ are non-singular}\}.
\]
Then, by Lemma 1, $P\{1(E_{3n}) = 1\} = P(E_{3n}) \to 1$. Suppose that $1(E_{3n}) = 1$. Then, by (16), $\ell_{b^{J_n}}(D_n)$ defined in the proof of Proposition 4 is bounded from below by $\hat{c}\|b^{J_n}\|^2_{\ell^2}$ plus a term independent of $b^{J_n}$, for some positive random variable $\hat{c}$. Hence the integral $E^{\tilde{\Pi}}_n[\|b^{J_n}\|_{\ell^2} \mid D_n]$ is finite as soon as $1(E_{3n}) = 1$. This proves the first assertion.
In what follows, we prove the convergence rate result (14). First, by the triangle inequality and Jensen's inequality,
\begin{align*}
1(E_{3n})\|\hat{g}_{QB} - g_0\| &\le 1(E_{3n})\|\hat{g}_{QB} - P_{J_n}g_0\| + \|g_0 - P_{J_n}g_0\| \\
&= 1(E_{3n})\|E^{\Pi}_n[g - P_{J_n}g_0 \mid D_n]\| + \|g_0 - P_{J_n}g_0\| \\
&= 1(E_{3n})\|E^{\tilde{\Pi}}_n[b^{J_n} - b_0^{J_n} \mid D_n]\|_{\ell^2} + \|g_0 - P_{J_n}g_0\| \\
&\le 1(E_{3n})E^{\tilde{\Pi}}_n[\|b^{J_n} - b_0^{J_n}\|_{\ell^2} \mid D_n] + \|g_0 - P_{J_n}g_0\|.
\end{align*}
Since $\|g_0 - P_{J_n}g_0\| = O(2^{-J_n s})$, it suffices to show that there exists a constant $D > 0$ such that for every $M_n \to \infty$,
\[
P\Big[1(E_{3n})E^{\tilde{\Pi}}_n[\|b^{J_n} - b_0^{J_n}\|_{\ell^2} \mid D_n] \le D\max\{2^{-J_n s},\ \tau_{J_n}^{-1}\sqrt{2^{J_n}/n},\ \tau_{J_n}^{-1}\epsilon_n\varrho_n M_n\}\Big] \to 1.
\]
Let $\pi^*_n(\theta^{J_n} \mid D_n)$ be the (random) density defined in the proof of Theorem 1. Note that $\pi^*_n(\theta^{J_n} \mid D_n)$ is well-defined as soon as $1(E_{3n}) = 1$. Let $\delta_n := \epsilon_n + \tau_{J_n}2^{-J_n s}$. Then we have:
Lemma 1. There exists a constant $c_1 > 0$ such that for every sequence $M_n \to \infty$ with $M_n = o(L_n)$,
\[
P\Big\{1(E_{3n})\int \|\theta^{J_n}\|_{\ell^2} \cdot |\pi^*_n(\theta^{J_n} \mid D_n) - dN(\Delta_n, \hat{\Phi}_{WW})(\theta^{J_n})|\,d\theta^{J_n} \le e^{-c_1 M_n n\delta_n^2} + M_n\sqrt{n}\delta_n\varrho_n\Big\} \to 1,
\]
where $\Delta_n := \sqrt{n}\,\mathbb{E}_n[\phi^{J_n}(W_i)R_i]$.
We defer the proof of Lemma 1 until after the proof of this theorem. Note that
\[
\Big(1(E_{3n})\int \|\theta^{J_n}\|_{\ell^2}\,dN(\Delta_n, \hat{\Phi}_{WW})(\theta^{J_n})\,d\theta^{J_n}\Big)^2 \le 1(E_{3n})\int \|\theta^{J_n}\|^2_{\ell^2}\,dN(\Delta_n, \hat{\Phi}_{WW})(\theta^{J_n})\,d\theta^{J_n} \le \|\Delta_n\|^2_{\ell^2} + \mathrm{tr}(\hat{\Phi}_{WW}).
\]
By the proof of Theorem 2, there exists a constant $D_1 > 0$ such that $P\{\|\Delta_n\|^2_{\ell^2} + \mathrm{tr}(\hat{\Phi}_{WW}) \le D_1(n\tau^2_{J_n}2^{-2J_n s} + 2^{J_n})\} \to 1$. Hence for every sequence $M_n \to \infty$ with $M_n = o(L_n)$, with probability approaching one,
\begin{align*}
\sqrt{D_1(n\tau^2_{J_n}2^{-2J_n s} + 2^{J_n})} + e^{-c_1 M_n n\delta_n^2} + M_n\sqrt{n}\delta_n\varrho_n &\ge 1(E_{3n})\int \|\theta^{J_n}\|_{\ell^2}\,\pi^*_n(\theta^{J_n} \mid D_n)\,d\theta^{J_n} \\
&= 1(E_{3n})\sqrt{n}\int \|\hat{\Phi}_{WX}(b^{J_n} - b_0^{J_n})\|_{\ell^2}\,\tilde{\pi}_n(b^{J_n} \mid D_n)\,db^{J_n} \\
&\ge 1(E_{3n})\sqrt{n}\,s_{\min}(\hat{\Phi}_{WX})E^{\tilde{\Pi}}_n[\|b^{J_n} - b_0^{J_n}\|_{\ell^2} \mid D_n].
\end{align*}
Take $M_n \to \infty$ sufficiently slowly that $\varrho_n M_n \to 0$. Since the left side is then $\lesssim \max\{\sqrt{n}\tau_{J_n}2^{-J_n s}, \sqrt{2^{J_n}}, \sqrt{n}\epsilon_n\varrho_n M_n\}$, there exists a constant $D_2 > 0$ such that
\[
P\Big[1(E_{3n})s_{\min}(\hat{\Phi}_{WX})E^{\tilde{\Pi}}_n[\|b^{J_n} - b_0^{J_n}\|_{\ell^2} \mid D_n] \le D_2\max\{\tau_{J_n}2^{-J_n s},\ \sqrt{2^{J_n}/n},\ \epsilon_n\varrho_n M_n\}\Big] \to 1.
\]
Finally, by Lemma 1, $P(s_{\min}(\hat{\Phi}_{WX}) \ge 0.5\tau_{J_n}) \to 1$, by which we have
\[
P\Big[1(E_{3n})E^{\tilde{\Pi}}_n[\|b^{J_n} - b_0^{J_n}\|_{\ell^2} \mid D_n] \le 2D_2\max\{2^{-J_n s},\ \tau_{J_n}^{-1}\sqrt{2^{J_n}/n},\ \tau_{J_n}^{-1}\epsilon_n\varrho_n M_n\}\Big] \to 1.
\]
This leads to the desired conclusion (it is not difficult to see that the final expression holds for every sequence $M_n \to \infty$).
Proof of Lemma 1. As before, we say that a sequence of random variables $A_n$ is eventually bounded by another sequence of random variables $B_n$ if $P(A_n \le B_n) \to 1$.
Take any $M_n \to \infty$ with $M_n = o(L_n)$. Then,
\begin{align*}
1(E_{3n})\int \|\theta^{J_n}\|_{\ell^2} &\cdot |\pi^*_n(\theta^{J_n} \mid D_n) - dN(\Delta_n, \hat{\Phi}_{WW})(\theta^{J_n})|\,d\theta^{J_n} \\
&\le 1(E_{3n})\int_{\|\theta^{J_n}\|_{\ell^2} \le M_n\sqrt{n}\delta_n} \|\theta^{J_n}\|_{\ell^2} \cdot |\pi^*_n(\theta^{J_n} \mid D_n) - dN(\Delta_n, \hat{\Phi}_{WW})(\theta^{J_n})|\,d\theta^{J_n} \\
&\quad + 1(E_{3n})\int_{\|\theta^{J_n}\|_{\ell^2} > M_n\sqrt{n}\delta_n} \|\theta^{J_n}\|_{\ell^2}\,\pi^*_n(\theta^{J_n} \mid D_n)\,d\theta^{J_n} \\
&\quad + 1(E_{3n})\int_{\|\theta^{J_n}\|_{\ell^2} > M_n\sqrt{n}\delta_n} \|\theta^{J_n}\|_{\ell^2}\,dN(\Delta_n, \hat{\Phi}_{WW})(\theta^{J_n})\,d\theta^{J_n} \\
&=: I + II + III.
\end{align*}
We divide the rest of the proof into three steps.
Step 1. Claim: there exists $c_2 > 0$ such that $P(II \le e^{-c_2 M_n^2 n\delta_n^2}) \to 1$.
(Proof of Step 1): The assertion follows from the same line of argument as in the proof of Proposition 4, noting that for any $c > 0$, $xe^{-cx^2} \le e^{-cx^2/2}$ for all $x > 0$ sufficiently large. Hence the proof is omitted.
Step 2. Claim: there exists $c_3 > 0$ such that $P(III \le e^{-c_3 M_n n\delta_n^2}) \to 1$.
(Proof of Step 2): By the Cauchy-Schwarz inequality, the square of $III$ is bounded by
\[
\int \|\theta^{J_n}\|^2_{\ell^2}\,dN(\Delta_n, \hat{\Phi}_{WW})(\theta^{J_n})\,d\theta^{J_n}\int_{\|\theta^{J_n}\|_{\ell^2} > M_n\sqrt{n}\delta_n} dN(\Delta_n, \hat{\Phi}_{WW})(\theta^{J_n})\,d\theta^{J_n}.
\]
Here the first integral is bounded by $\|\Delta_n\|^2_{\ell^2} + \mathrm{tr}(\hat{\Phi}_{WW})$, which is eventually bounded by $D(n\tau^2_{J_n}2^{-2J_n s} + 2^{J_n})$ for some constant $D > 0$ by the proof of Theorem 2. On the other hand, by the proof of Theorem 1, the second integral is eventually bounded by $\int_{\|\theta^{J_n}\|_{\ell^2} > M_n\sqrt{n}\delta_n} dN(0, I_{2^{J_n}})(\theta^{J_n})\,d\theta^{J_n}$. By Borell's inequality for Gaussian measures (see (23)), the last integral is bounded by $e^{-c'M_n n\delta_n^2}$ for some small constant $c' > 0$. Taking these together, we obtain the conclusion of Step 2 by choosing the constant $c_3 > 0$ sufficiently small.
Step 3. Claim: there exists $c_4 > 0$ such that $P(I \le e^{-c_4 M_n^2 n\delta_n^2} + M_n\sqrt{n}\delta_n\varrho_n) \to 1$.
(Proof of Step 3): Let $C_n := \{\theta^{J_n} \in \mathbb{R}^{2^{J_n}} : \|\theta^{J_n}\|_{\ell^2} \le M_n\sqrt{n}\delta_n\}$. Let $\pi^*_{n,C_n}(\theta^{J_n} \mid D_n)$ and $dN_{C_n}(\Delta_n, \hat{\Phi}_{WW})(\theta^{J_n})$ denote the probability densities obtained by first restricting $\pi^*_n(\theta^{J_n} \mid D_n)$ and $dN(\Delta_n, \hat{\Phi}_{WW})(\theta^{J_n})$ to the ball $C_n$, respectively, and then renormalizing. Then, abbreviating $\pi^*_n(\theta^{J_n} \mid D_n)$ by $\pi^*_n$, $\pi^*_{n,C_n}(\theta^{J_n} \mid D_n)$ by $\pi^*_{n,C_n}$, $dN(\Delta_n, \hat{\Phi}_{WW})(\theta^{J_n})$ by $dN$, and $dN_{C_n}(\Delta_n, \hat{\Phi}_{WW})(\theta^{J_n})$ by $dN_{C_n}$, we have
\begin{align*}
I &\le 1(E_{3n})\int \|\theta^{J_n}\|_{\ell^2} \cdot |\pi^*_{n,C_n} - dN_{C_n}| + 1(E_{3n})\int_{\|\theta^{J_n}\|_{\ell^2} \le M_n\sqrt{n}\delta_n} \|\theta^{J_n}\|_{\ell^2} \cdot |\pi^*_{n,C_n} - \pi^*_n| \\
&\quad + 1(E_{3n})\int_{\|\theta^{J_n}\|_{\ell^2} \le M_n\sqrt{n}\delta_n} \|\theta^{J_n}\|_{\ell^2} \cdot |dN_{C_n} - dN| \\
&=: IV + V + VI.
\end{align*}
By the proof of Theorem 1, the term $IV$ is eventually bounded by
\[
1(E_{3n})M_n\sqrt{n}\delta_n\int |\pi^*_{n,C_n} - dN_{C_n}| \le M_n\sqrt{n}\delta_n\varrho_n.
\]
For the term $V$, we have
\[
V \le 1(E_{3n})M_n\sqrt{n}\delta_n\int_{\|\theta^{J_n}\|_{\ell^2} \le M_n\sqrt{n}\delta_n} |\pi^*_{n,C_n} - \pi^*_n| = 1(E_{3n})M_n\sqrt{n}\delta_n\int_{\|\theta^{J_n}\|_{\ell^2} > M_n\sqrt{n}\delta_n} \pi^*_n.
\]
By the proof of Proposition 4, there exists a constant $c_5 > 0$ such that the integral on the right side is eventually bounded by $e^{-c_5 M_n^2 n\delta_n^2}$, so that $P(V \le e^{-c_5 M_n^2 n\delta_n^2/2}) \to 1$. Likewise, by Borell's inequality for Gaussian measures, there exists a constant $c_6 > 0$ such that $P(VI \le e^{-c_6 M_n n\delta_n^2}) \to 1$. Taking these together, we obtain the conclusion of Step 3 by choosing the constant $c_4 > 0$ sufficiently small.
Finally, Steps 1-3 lead to the conclusion of Lemma 1.
APPENDIX D: PROOFS FOR SECTION 4
Proof of Proposition 2. We only consider the mildly ill-posed case; the proof for the severely ill-posed case is similar. For either product or isotropic priors, it suffices to check conditions P1) and P2) in Theorem 1. We do so with the choice $\epsilon_n = \sqrt{2^{J_n}(\log n)/n} \sim (\log n)^{1/2}n^{-(r+s)/(2r+2s+1)}$.
Case of product priors: Let $c_{\min} := \min_{x \in [-A,A]} q(x) > 0$. Since $\|b^{J_n} - b_0^{J_n}\|^2_{\ell^2} = \sum_{l=1}^{2^{J_n}}(b_l - b_{0l})^2 \le 2^{J_n}\max_{1 \le l \le 2^{J_n}}(b_l - b_{0l})^2$, we have
\[
\tilde{\Pi}_n(b^{J_n} : \|b^{J_n} - b_0^{J_n}\|_{\ell^2} \le \epsilon_n) \ge \tilde{\Pi}_n\Big(b^{J_n} : \max_{1 \le l \le 2^{J_n}}|b_l - b_{0l}| \le \epsilon_n/\sqrt{2^{J_n}}\Big) \ge \prod_{l=1}^{2^{J_n}}\tilde{\Pi}_n(b^{J_n} : |b_l - b_{0l}| \le \epsilon_n/\sqrt{2^{J_n}}).
\]
Since there exists $\epsilon \in (0, A)$ such that $b_{0l} \in [-A+\epsilon, A-\epsilon]$ for all $l \ge 1$, for all $n$ sufficiently large the last expression is bounded from below by
\[
\Big(\frac{c_{\min}\epsilon_n}{\sqrt{2^{J_n}}}\Big)^{2^{J_n}} = e^{-2^{J_n}\log(\sqrt{2^{J_n}}/(c_{\min}\epsilon_n))} \ge e^{-Cn\epsilon_n^2},
\]
where $C > 0$ is a sufficiently large constant, which verifies condition P1). Second, with this $\epsilon_n$, the $\gamma_n$ in condition P2) is $\sim (\log n)^{1/2}n^{-s/(2r+2s+1)}$. Let, say, $L_n \sim (\log n)^{1/2}$, so that $L_n\gamma_n \sim (\log n)n^{-s/(2r+2s+1)}$. Then $\{b^{J_n} : \|b^{J_n} - b_0^{J_n}\|_{\ell^2} \le L_n\gamma_n\} \subset [-A, A]^{2^{J_n}}$ for all $n$ sufficiently large, so that $\tilde{\pi}_n(b^{J_n}) = \prod_{l=1}^{2^{J_n}} q(b_l)$ is positive for all $\|b^{J_n} - b_0^{J_n}\|_{\ell^2} \le L_n\gamma_n$. Let $\|b^{J_n}\|_{\ell^2} \le L_n\gamma_n$ and $\|\tilde{b}^{J_n}\|_{\ell^2} \le L_n\gamma_n$. Then,
\begin{align*}
\frac{\tilde{\pi}_n(b_0^{J_n} + b^{J_n})}{\tilde{\pi}_n(b_0^{J_n} + \tilde{b}^{J_n})} &= \exp\Big(\sum_{l=1}^{2^{J_n}}\{\log q(b_{0l} + b_l) - \log q(b_{0l} + \tilde{b}_l)\}\Big) \\
&\le \exp\Big(L\sum_{l=1}^{2^{J_n}}|b_l - \tilde{b}_l|\Big) \le \exp\Big\{L\sqrt{2^{J_n}}\|b^{J_n} - \tilde{b}^{J_n}\|_{\ell^2}\Big\} \le e^{2L\sqrt{2^{J_n}}L_n\gamma_n} = e^{o(1)},
\end{align*}
where the last step is due to $s > 1/2$. Likewise, we have
\[
\frac{\tilde{\pi}_n(b_0^{J_n} + b^{J_n})}{\tilde{\pi}_n(b_0^{J_n} + \tilde{b}^{J_n})} \ge e^{-2L\sqrt{2^{J_n}}L_n\gamma_n} = e^{-o(1)}.
\]
Therefore, condition P2) is verified.
Case of isotropic priors: Let $c_{\min} := \min_{x \in [0,A]} r(x) > 0$. Then, for all $n$ sufficiently large,
\begin{align*}
\tilde{\Pi}_n(b^{J_n} : \|b^{J_n} - b_0^{J_n}\|_{\ell^2} \le \epsilon_n) &= \frac{\int_{\|b^{J_n} - b_0^{J_n}\|_{\ell^2} \le \epsilon_n} r(\|b^{J_n}\|_{\ell^2})\,db^{J_n}}{\int r(\|b^{J_n}\|_{\ell^2})\,db^{J_n}} = \frac{\int_{\|b^{J_n}\|_{\ell^2} \le \epsilon_n} r(\|b^{J_n} + b_0^{J_n}\|_{\ell^2})\,db^{J_n}}{\int r(\|b^{J_n}\|_{\ell^2})\,db^{J_n}} \\
&\ge \frac{c_{\min}\int_{\|b^{J_n}\|_{\ell^2} \le \epsilon_n} db^{J_n}}{\int r(\|b^{J_n}\|_{\ell^2})\,db^{J_n}} = c_{\min}\frac{\int_0^{\epsilon_n} x^{2^{J_n}-1}\,dx}{\int_0^\infty x^{2^{J_n}-1}r(x)\,dx} \\
&\ge c_{\min}\Big(\frac{\epsilon_n}{2^{J_n}}\Big)^{2^{J_n}} \times e^{-c''2^{J_n}\log(2^{J_n})} = c_{\min}e^{-2^{J_n}\log(2^{J_n}/\epsilon_n) - c''2^{J_n}\log(2^{J_n})} \ge e^{-Cn\epsilon_n^2},
\end{align*}
where $C > 0$ is a sufficiently large constant, which verifies condition P1). Second, with this $\epsilon_n$, the $\gamma_n$ in condition P2) is $\sim (\log n)^{1/2}n^{-s/(2r+2s+1)}$. Let $L_n \sim (\log n)^{1/2}$, so that $L_n\gamma_n \sim (\log n)n^{-s/(2r+2s+1)}$. Since $\|b_0^{J_n}\|_{\ell^2} \le \|g_0\| < A$ and $L_n\gamma_n \to 0$, $\{b^{J_n} : \|b^{J_n} - b_0^{J_n}\|_{\ell^2} \le L_n\gamma_n\} \subset \{b^{J_n} : \|b^{J_n}\|_{\ell^2} \le A\}$ for all $n$ sufficiently large, so that $\tilde{\pi}_n(b^{J_n}) \propto r(\|b^{J_n}\|_{\ell^2})$ is positive for all $\|b^{J_n} - b_0^{J_n}\|_{\ell^2} \le L_n\gamma_n$. Let $\|b^{J_n}\|_{\ell^2} \le L_n\gamma_n$ and $\|\tilde{b}^{J_n}\|_{\ell^2} \le L_n\gamma_n$. Then, by Parseval's identity,
\[
\|b_0^{J_n} + b^{J_n}\|_{\ell^2} \le \|b_0^{J_n}\|_{\ell^2} + L_n\gamma_n \to \|g_0\|, \qquad \|b_0^{J_n} + b^{J_n}\|_{\ell^2} \ge \|b_0^{J_n}\|_{\ell^2} - L_n\gamma_n \to \|g_0\|.
\]
Therefore, we conclude that, uniformly in $\|b^{J_n}\|_{\ell^2} \le L_n\gamma_n$ and $\|\tilde{b}^{J_n}\|_{\ell^2} \le L_n\gamma_n$,
\[
\frac{\tilde{\pi}_n(b_0^{J_n} + b^{J_n})}{\tilde{\pi}_n(b_0^{J_n} + \tilde{b}^{J_n})} = \frac{r(\|b_0^{J_n} + b^{J_n}\|_{\ell^2})}{r(\|b_0^{J_n} + \tilde{b}^{J_n}\|_{\ell^2})} \to \frac{r(\|g_0\|)}{r(\|g_0\|)} = 1.
\]
Hence condition P2) is verified.
Proof of Proposition 3. Given the proof of Proposition 2 and the discussion following Theorem 3, it suffices to verify that $\varrho_n$ is $o((\log n)^{-1/2})$ in the mildly ill-posed case (in the severely ill-posed case, it suffices to verify that $\varrho_n = o(1)$). This is readily verified by tracking the proof of Proposition 2.
APPENDIX E: TECHNICAL TOOLS
We state here Rudelson’s inequality for the reader’s convenience.
Theorem1 (Rudelson’s [8] inequality). LetZ1, . . . , Zn be i.i.d. random vectors in Rk with Σ := E[Z1⊗2]. Then for every k ≥ 2,
E
1 n
n
X
i=1
Zi⊗2− Σ op
≤ max{kΣk1/2op δ, δ2}, δ = D slog k
n E[ max1≤i≤nkZik 2 ℓ2],
where D is a universal constant.
Rudelson’s inequality implies the following corollary useful in our appli- cation.
Corollary 1. Let $(X_1, Y_1^T)^T, \ldots, (X_n, Y_n^T)^T$ be i.i.d. random vectors with $X_i \in \mathbb{R}^{k_1}$, $Y_i \in \mathbb{R}^{k_2}$, and $k_1 + k_2 \ge 2$. Let $\Sigma_X := E[X_1^{\otimes 2}]$, $\Sigma_Y := E[Y_1^{\otimes 2}]$, and $\Sigma_{XY} := E[X_1Y_1^T]$. Suppose that there exists a finite number $m$ such that $E[\max_{1 \le i \le n}\|X_i\|^2_{\ell^2}] \vee E[\max_{1 \le i \le n}\|Y_i\|^2_{\ell^2}] \le m$. Then
\[
E\Big\|\frac{1}{n}\sum_{i=1}^n X_iY_i^T - \Sigma_{XY}\Big\|_{\mathrm{op}} \le \max\{(\|\Sigma_X\|^{1/2}_{\mathrm{op}} \vee \|\Sigma_Y\|^{1/2}_{\mathrm{op}})\delta,\ \delta^2\}, \qquad \delta = D\sqrt{\frac{m\log(k_1 + k_2)}{n}},
\]
where $D$ is a universal constant.
Proof. Let $Z_i = (X_i, Y_i^T)^T$, and apply Rudelson's inequality to $Z_1, \ldots, Z_n$. Note that by the variational characterization of the operator norm, we have $\|n^{-1}\sum_{i=1}^n X_iY_i^T - \Sigma_{XY}\|_{\mathrm{op}} \le \|n^{-1}\sum_{i=1}^n Z_i^{\otimes 2} - E[Z_1^{\otimes 2}]\|_{\mathrm{op}}$, and by the Cauchy-Schwarz inequality, $\|E[Z_1^{\otimes 2}]\|_{\mathrm{op}} \le 2\|\Sigma_X\|_{\mathrm{op}} + 2\|\Sigma_Y\|_{\mathrm{op}}$.
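As an informal Monte-Carlo illustration of the corollary (not part of the proof; the dimensions, sample sizes, and seed are arbitrary choices of ours), the operator-norm error for independent bounded $X$ and $Y$, for which $\Sigma_{XY} = 0$, shrinks as $n$ grows, consistent with the $\sqrt{m\log(k_1+k_2)/n}$ rate:

```python
import numpy as np

# Monte-Carlo illustration: for i.i.d. bounded vectors (X_i, Y_i) with
# independent coordinates (so Sigma_XY = 0), the operator norm
# ||n^{-1} sum_i X_i Y_i^T - Sigma_XY||_op decays roughly like sqrt(1/n).

rng = np.random.default_rng(2)
k1, k2 = 8, 12
errs = {}
for n in (200, 3200):
    X = rng.uniform(-1, 1, size=(n, k1))
    Y = rng.uniform(-1, 1, size=(n, k2))   # independent of X => Sigma_XY = 0
    emp = X.T @ Y / n
    errs[n] = np.linalg.norm(emp, ord=2)   # largest singular value
print(errs)
```

A 16-fold increase in $n$ cuts the error by roughly a factor of 4, matching the $n^{-1/2}$ scaling in $\delta$.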
Lastly, we recall Talagrand's [9] concentration inequality for general empirical processes; the following version is due to [7]. Here, for a generic class $\mathcal{F}$ of measurable functions on some measurable space $\mathcal{X}$, we say that $\mathcal{F}$ is pointwise measurable if there exists a countable subclass $\mathcal{G} \subset \mathcal{F}$ such that for any $f \in \mathcal{F}$, there exists a sequence $\{g_m\} \subset \mathcal{G}$ with $g_m(x) \to f(x)$ for every $x \in \mathcal{X}$. See Chapter 2.3 of [10].
Theorem 2 (Massart’s form of Talagrand’s inequality). Let ξ1, . . . , ξn be i.i.d. random variables taking values in some measurable spaceX . Let F be a pointwise measurable class of functions on X such that E[f(ξ1)] = 0
for all f ∈ F and supf ∈Fsupx∈X|f(x)| ≤ B for some constant B > 0. Let σ2 be any positive constant such that σ2 ≥ supf ∈FE[f2(ξ1)]. Let Z := supf ∈F|Pni=1f (ξi)|. Then for every x > 0,
P{Z ≥ C(E[Z] + σ√nx + Bx)} ≤ e−x, where C > 0 is a universal constant.
REFERENCES
[1] Bhatia, R. (1997). Matrix Analysis. Springer.
[2] Cohen, A., Daubechies, I., and Vial, P. (1993). Wavelets on the interval and fast wavelet transforms. Appl. Comput. Harmon. Anal. 1 54-81.
[3] Ghosal, S., Ghosh, J. K. and van der Vaart, A. W. (2000). Convergence rates of posterior distributions. Ann. Statist. 28 500-531.
[4] Härdle, W., Kerkyacharian, G., Picard, D., and Tsybakov, A. (1998). Wavelets, Approximation, and Statistical Applications. Springer.
[5] Johnstone, I.M. (2011). Gaussian Estimation: Sequence and Multiresolution Models. Unpublished draft.
[6] Mallat, S. (2009). A Wavelet Tour of Signal Processing. Third Edition. Academic Press.
[7] Massart, P. (2000). About the constants in Talagrand’s concentration inequalities for empirical processes. Ann. Probab. 28 863-884.
[8] Rudelson, M. (1999). Random vectors in the isotropic position. J. Funct. Anal. 164 60-72.
[9] Talagrand, M. (1996). New concentration inequalities in product spaces. Invent. Math. 126 505-563.
[10] van der Vaart, A.W. and Wellner, J.A. (1996). Weak Convergence and Empirical Processes: With Applications to Statistics. Springer.
Graduate School of Economics, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan. E-mail: kkato@e.u-tokyo.ac.jp