
A Fuk-Nagaev type inequality

From the document "empirical process v3" (pages 96-106)

and because
\[
P\left\{ \left\| \sum_{i=1}^n \varepsilon_i (f 1\{F > \rho\})(X_i) \right\|_{\mathcal{F}} > 0 \right\} \le P(M > \rho) \le \rho^{-1} E[M] \le 1/24
\]
by our choice of \(\rho\), the Hoffmann-Jørgensen inequality (Proposition 2) yields that
\[
E[Z_2] \le 12 E[M] \le \frac{x}{4(1+\alpha)}.
\]
Hence we have
\[
P\{Z_1 \ge (1+\alpha)(E[Z_1] - E[Z_2]) + 3x/4\} \le P\{Z_1 \ge (1+\alpha)E[Z_1] + x/2\}.
\]
Applying Talagrand's inequality of the form (52) to \(Z_1\), we conclude that
\[
P\{Z_1 \ge (1+\alpha)E[Z_1] + x/2\} \le e^{-c_1 x^2/(n\sigma^2)} + e^{-c_2 x/E[M]}.
\]
In addition, using the Hoffmann-Jørgensen inequality again, we have
\[
(E[Z_2^p])^{1/p} \le c_3\{E[Z_2] + (E[M^p])^{1/p}\} \le c_4 (E[M^p])^{1/p},
\]
and so Markov's inequality yields that
\[
P\{Z_2 \ge x/4\} \le c_5 E[M^p]/x^p.
\]
Finally, since \(e^{-c_2 x/E[M]} \le e^{-c_2 x/(E[M^p])^{1/p}}\) and \(t^p e^{-c_2 t} \to 0\) as \(t \to \infty\), we have
\[
e^{-c_2 x/E[M]} \le c_6 E[M^p]/x^p
\]
whenever \(x/(E[M^p])^{1/p} \ge 1\). When \(x/(E[M^p])^{1/p} < 1\), the inequality (51) becomes trivial by taking \(c\) large enough. This completes the proof.

8 Rudelson’s inequality

The purpose of this section is to prove the following remarkable inequality by Rudelson (1999). For a matrix \(A\) with \(d\) columns, let \(\|A\|_{\mathrm{op}}\) be the operator norm of \(A\), that is, \(\|A\|_{\mathrm{op}} := \sup_{x \in \mathbb{R}^d, |x|=1} |Ax|\).

Theorem 32 (Rudelson (1999)). Let \(X\) be a random vector of dimension \(d \ge 2\) with \(\Sigma := E[XX^{\top}]\). Let \(X_1, \dots, X_n\) be independent copies of \(X\). Then we have
\[
E\left\| \frac{1}{n}\sum_{i=1}^n X_i X_i^{\top} - \Sigma \right\|_{\mathrm{op}} \le \max\{\|\Sigma\|_{\mathrm{op}}^{1/2}\,\delta,\ \delta^2\}, \qquad \delta := C\sqrt{\frac{\log d}{n}\, E\Big[\max_{1\le i\le n}|X_i|^2\Big]},
\]
where \(C > 0\) is a universal constant.
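As a sanity check (not part of the notes), the shape of the bound can be examined by simulation. The sketch below samples \(X\) uniformly on the sphere of radius \(\sqrt{d}\), a convenient toy case of our own choosing in which \(\Sigma = I_d\) and \(|X_i|^2 = d\) deterministically; since the universal constant \(C\) is unspecified, only orders of magnitude are compared.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, n_rep = 20, 500, 200

errs = []
for _ in range(n_rep):
    # Rows uniform on the sphere of radius sqrt(d): Sigma = I_d and |X_i|^2 = d.
    X = rng.standard_normal((n, d))
    X *= np.sqrt(d) / np.linalg.norm(X, axis=1, keepdims=True)
    Sigma_hat = X.T @ X / n
    errs.append(np.linalg.norm(Sigma_hat - np.eye(d), ord=2))
mean_err = float(np.mean(errs))

# Shape of the bound with the (unknown) universal constant C set to 1:
# delta = sqrt(log(d)/n * E[max_i |X_i|^2]) = sqrt(d * log(d) / n).
delta = np.sqrt(d * np.log(d) / n)
bound = max(delta, delta**2)  # here ||Sigma||_op = 1
print(mean_err, bound)
```

On this toy case the observed error and the bound (with \(C = 1\)) are of comparable size, which is all the theorem promises.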

The theorem indeed follows from the following proposition, which we will prove later.

Proposition 10. Let \(A_1, \dots, A_n\) be fixed \(d \times d\) symmetric matrices. Let \(\varepsilon_1, \dots, \varepsilon_n\) be independent Rademacher random variables. Let \(\sigma^2 := \|\sum_{i=1}^n A_i^2\|_{\mathrm{op}}\). Then we have
\[
P\left\{ \left\| \sum_{i=1}^n \varepsilon_i A_i \right\|_{\mathrm{op}} > t \right\} \le 2d\, e^{-t^2/(2\sigma^2)}, \quad \forall t > 0, \tag{53}
\]
and consequently
\[
E\left\| \sum_{i=1}^n \varepsilon_i A_i \right\|_{\mathrm{op}} \le C_d\,\sigma, \tag{54}
\]
where \(C_d := \sqrt{2\log(2d)} + \sqrt{\pi/2}\).
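Both assertions can be checked by simulation. The sketch below fixes one arbitrary set of symmetric \(A_i\) (random draws, our choice for illustration only) and estimates the left-hand sides of (53) and (54) over random sign vectors.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, n_rep = 5, 50, 2000

# A fixed set of d x d symmetric matrices (an arbitrary illustrative choice).
A = rng.standard_normal((n, d, d))
A = (A + A.transpose(0, 2, 1)) / 2
# sigma^2 = || sum_i A_i^2 ||_op
sigma = np.sqrt(np.linalg.norm(np.einsum('nij,njk->ik', A, A), ord=2))

t = 3.0 * sigma
norms = []
for _ in range(n_rep):
    eps = rng.choice([-1.0, 1.0], size=n)
    norms.append(np.linalg.norm(np.einsum('i,ijk->jk', eps, A), ord=2))
norms = np.array(norms)

tail_hat = float(np.mean(norms > t))                  # empirical LHS of (53)
tail_bound = 2 * d * np.exp(-t**2 / (2 * sigma**2))   # RHS of (53)
C_d = np.sqrt(2 * np.log(2 * d)) + np.sqrt(np.pi / 2)
print(tail_hat, tail_bound, float(norms.mean()), C_d * sigma)
```

In such experiments the empirical tail and mean sit well below the bounds; both (53) and (54) are not tight in general (note the dimensional factor \(2d\)).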

Proof of Theorem 32. By the variational characterization of the operator norm, together with the symmetrization inequality (Theorem 1), we have
\[
E\left\|\frac{1}{n}\sum_{i=1}^n X_iX_i^{\top} - \Sigma\right\|_{\mathrm{op}}
= E\left[\sup_{\alpha\in\mathbb{R}^d,\,|\alpha|=1}\left|\frac{1}{n}\sum_{i=1}^n\big\{(\alpha^{\top}X_i)^2 - E[(\alpha^{\top}X)^2]\big\}\right|\right]
\le 2E\left[\sup_{\alpha\in\mathbb{R}^d,\,|\alpha|=1}\left|\frac{1}{n}\sum_{i=1}^n \varepsilon_i(\alpha^{\top}X_i)^2\right|\right]
= 2E\left\|\frac{1}{n}\sum_{i=1}^n \varepsilon_i X_iX_i^{\top}\right\|_{\mathrm{op}}.
\]

We shall apply Proposition 10 to the right-hand side with \(A_i = X_iX_i^{\top}\), conditionally on \(X_1, \dots, X_n\). Then
\[
E_{\varepsilon}\left\|\sum_{i=1}^n \varepsilon_i X_iX_i^{\top}\right\|_{\mathrm{op}}
\le C_d\left\|\sum_{i=1}^n (X_iX_i^{\top})^2\right\|_{\mathrm{op}}^{1/2}
= C_d\left\|\sum_{i=1}^n |X_i|^2\, X_iX_i^{\top}\right\|_{\mathrm{op}}^{1/2}
\le C_d \max_{1\le i\le n}|X_i| \left\|\sum_{i=1}^n X_iX_i^{\top}\right\|_{\mathrm{op}}^{1/2}.
\]

Hence we have
\[
D := E\left\|\frac{1}{n}\sum_{i=1}^n X_iX_i^{\top} - \Sigma\right\|_{\mathrm{op}}
\le \frac{2C_d}{\sqrt{n}}\, E\left[\max_{1\le i\le n}|X_i| \left\|\frac{1}{n}\sum_{i=1}^n X_iX_i^{\top}\right\|_{\mathrm{op}}^{1/2}\right]
\le \frac{2C_d}{\sqrt{n}}\sqrt{E\Big[\max_{1\le i\le n}|X_i|^2\Big]}\,\sqrt{E\left\|\frac{1}{n}\sum_{i=1}^n X_iX_i^{\top}\right\|_{\mathrm{op}}}
\le \frac{2C_d}{\sqrt{n}}\sqrt{E\Big[\max_{1\le i\le n}|X_i|^2\Big]}\,\sqrt{D + \|\Sigma\|_{\mathrm{op}}}.
\]
Solving this inequality gives the desired conclusion.
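For completeness, here is one way to carry out this last step (a sketch; constants are not optimized). Write \(a := (2C_d/\sqrt{n})\big(E[\max_{1\le i\le n}|X_i|^2]\big)^{1/2}\) and \(b := \|\Sigma\|_{\mathrm{op}}\), so that the display reads \(D \le a\sqrt{D+b}\). Then

```latex
D \le a\sqrt{D + b}
\;\Longrightarrow\; D^2 \le a^2 D + a^2 b
\;\Longrightarrow\; D \le \frac{a^2 + \sqrt{a^4 + 4a^2 b}}{2}
\le a^2 + a\sqrt{b}
\le 2\max\{a^2,\, a\sqrt{b}\}.
```

Since \(C_d \asymp \sqrt{\log d}\) for \(d \ge 2\), \(a\) is of the same order as \(\delta\), and \(2\max\{a^2, a\sqrt{b}\}\) takes the form \(\max\{\|\Sigma\|_{\mathrm{op}}^{1/2}\delta, \delta^2\}\) once the universal constant \(C\) is adjusted.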

The remainder of this section is devoted to proving Proposition 10. The original proof uses the non-commutative Khinchin inequality. A simpler (to me) proof is given by Oliveira (2010). We shall follow Tropp (2012b) here.

To prove Proposition 10, we have to prepare some background material on matrix analysis. Let \(\mathrm{Sym}_d\) denote the space of all \(d \times d\) symmetric matrices, and let \(\mathrm{Sym}_d^+\) denote the space of all \(d \times d\) symmetric positive definite matrices. For \(A \in \mathrm{Sym}_d\), let \(A = Q\Lambda Q^{\top}\) be the spectral expansion of \(A\); that is, \(Q\) is an orthogonal matrix and \(\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_d)\) is a diagonal matrix whose diagonal entries are the eigenvalues of \(A\). A function \(f: \mathbb{R} \to \mathbb{R}\) (or \((0,\infty) \to \mathbb{R}\)) can be extended to a function on \(\mathrm{Sym}_d\) (or \(\mathrm{Sym}_d^+\)) by
\[
f(A) := Q\,\mathrm{diag}(f(\lambda_1), \dots, f(\lambda_d))\,Q^{\top}.
\]
For example, the exponential of \(A\), \(e^A\), is defined by
\[
e^A := Q\,\mathrm{diag}(e^{\lambda_1}, \dots, e^{\lambda_d})\,Q^{\top}.
\]

The Taylor expansion of \(x \mapsto e^x\) leads to
\[
e^A = \sum_{p=0}^{\infty} \frac{A^p}{p!}.
\]

Another example is the logarithm of \(A\), defined for \(A \in \mathrm{Sym}_d^+\) by
\[
\log A := Q\,\mathrm{diag}(\log\lambda_1, \dots, \log\lambda_d)\,Q^{\top}.
\]
Observe that \(\log(e^A) = A\) for all \(A \in \mathrm{Sym}_d\).
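These definitions translate directly into code. Below is a minimal numpy sketch (the helper name `mat_fun` is ours, not from the notes) implementing the spectral definition of \(f(A)\), checked against the truncated power series for \(e^A\) and the identity \(\log(e^A) = A\).

```python
import numpy as np

rng = np.random.default_rng(2)
d = 6
M = rng.standard_normal((d, d))
A = (M + M.T) / 2  # a symmetric matrix in Sym_d

def mat_fun(A, f):
    # Extend f: R -> R to Sym_d via the spectral expansion A = Q Lambda Q^T.
    lam, Q = np.linalg.eigh(A)
    return Q @ np.diag(f(lam)) @ Q.T

eA = mat_fun(A, np.exp)

# Compare with the power series e^A = sum_{p>=0} A^p / p!, truncated at p = 60.
series = np.eye(d)
term = np.eye(d)
for p in range(1, 60):
    term = term @ A / p
    series = series + term

max_diff = float(np.max(np.abs(eA - series)))
log_ok = bool(np.allclose(mat_fun(eA, np.log), A))  # log(e^A) = A
print(max_diff, log_ok)
```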

For \(A \in \mathrm{Sym}_d\), let \(\lambda_{\max}(A)\) denote the largest eigenvalue of \(A\). Observe that \(\|A\|_{\mathrm{op}} = \max\{\lambda_{\max}(A), \lambda_{\max}(-A)\}\).

We introduce the partial ordering \(\ge\) on the space of symmetric matrices by
\[
A \ge B \iff A - B \text{ is positive semi-definite}.
\]

We now move to proving Proposition 10. At first sight, one might think that the proposition could be proved by directly mimicking the proof of Hoeffding's inequality, using the fact that \(e^{\lambda_{\max}(A)} = \lambda_{\max}(e^A) \le \mathrm{Tr}(e^A)\) for \(A \in \mathrm{Sym}_d\). However, the situation is not so simple, as we discuss below. For the matrix exponential, although the equality \(\mathrm{Tr}\,e^{A+B} = \mathrm{Tr}(e^A e^B)\) does not hold in general, the one-sided inequality
\[
\mathrm{Tr}\,e^{A+B} \le \mathrm{Tr}(e^A e^B), \quad \forall A, B \in \mathrm{Sym}_d,
\]
is still valid, which is called the Golden-Thompson inequality (Bhatia, 1997, p. 261). However, the version of the Golden-Thompson inequality for three matrices is false; that is, \(\mathrm{Tr}\,e^{A+B+C} \le \mathrm{Tr}(e^A e^B e^C)\) can fail (Bhatia, 1997, Problem IX.8.4). A consequence of this fact is that we cannot directly extend the proof of Hoeffding's inequality to a proof of inequality (53). Instead, we make use of Lieb's concavity theorem.
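The Golden-Thompson inequality can be checked numerically on random pairs (a sanity check, not a proof; the failing three-matrix analogue requires specific triples, which blind random sampling will not reliably exhibit, so only the two-matrix case is tested here).

```python
import numpy as np

rng = np.random.default_rng(3)
d = 4

def expm_sym(A):
    # Exponential of a symmetric matrix via its spectral expansion.
    lam, Q = np.linalg.eigh(A)
    return Q @ np.diag(np.exp(lam)) @ Q.T

violations = 0
for _ in range(200):
    A = rng.standard_normal((d, d)); A = (A + A.T) / 2
    B = rng.standard_normal((d, d)); B = (B + B.T) / 2
    lhs = np.trace(expm_sym(A + B))
    rhs = np.trace(expm_sym(A) @ expm_sym(B))  # Golden-Thompson: lhs <= rhs
    if lhs > rhs * (1 + 1e-10):
        violations += 1
print(violations)
```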

Theorem 33 (Lieb (1973)). Let \(B \in \mathrm{Sym}_d\). Then the map \(\mathrm{Sym}_d^+ \ni A \mapsto \mathrm{Tr}\exp(B + \log A)\) is concave.

We do not prove this theorem here. See Tropp (2012a) for a simple proof. An important consequence of Lieb’s theorem is the following.

Lemma 43. Let \(Y_1, \dots, Y_n\) be independent random \(d \times d\) symmetric matrices. Then we have
\[
P\left\{\lambda_{\max}\left(\sum_{i=1}^n Y_i\right) > t\right\} \le \inf_{\theta>0}\left\{ e^{-\theta t}\,\mathrm{Tr}\exp\left(\sum_{i=1}^n \log E[e^{\theta Y_i}]\right)\right\}, \quad \forall t > 0.
\]

Proof. By Markov's inequality, for \(\theta > 0\), we have
\[
P\left\{\lambda_{\max}\left(\sum_{i=1}^n Y_i\right) > t\right\}
\le e^{-\theta t}\, E[e^{\theta\lambda_{\max}(\sum_{i=1}^n Y_i)}]
= e^{-\theta t}\, E[\lambda_{\max}(e^{\theta\sum_{i=1}^n Y_i})]
\le e^{-\theta t}\, E[\mathrm{Tr}\,e^{\theta\sum_{i=1}^n Y_i}].
\]

Applying Lieb's concavity theorem (Theorem 33) with \(A = e^{\theta Y_1}\) and \(B = \theta\sum_{i=2}^n Y_i\) conditionally on \(Y_2, \dots, Y_n\), we have
\begin{align*}
E[\mathrm{Tr}\,e^{\theta\sum_{i=1}^n Y_i}] &= E[\mathrm{Tr}\exp(B + \log A)] \\
&= E\big[E[\mathrm{Tr}\exp(B + \log A) \mid Y_2, \dots, Y_n]\big] \\
&\le E[\mathrm{Tr}\exp(B + \log E[A])] \quad \text{(Jensen's inequality)} \\
&= E\left[\mathrm{Tr}\exp\left(\theta\sum_{i=2}^n Y_i + \log E[e^{\theta Y_1}]\right)\right].
\end{align*}
Iterating this step leads to the inequality
\[
E[\mathrm{Tr}\,e^{\theta\sum_{i=1}^n Y_i}] \le \mathrm{Tr}\exp\left(\sum_{i=1}^n \log E[e^{\theta Y_i}]\right).
\]
This completes the proof.

Proof of Proposition 10. We make use of Lemma 43. For \(\theta > 0\), we have
\[
E[e^{\theta\varepsilon_i A_i}] = \frac{e^{\theta A_i} + e^{-\theta A_i}}{2} = I + \frac{\theta^2 A_i^2}{2!} + \frac{\theta^4 A_i^4}{4!} + \cdots \le e^{\theta^2 A_i^2/2},
\]
and hence
\[
\mathrm{Tr}\exp\left(\sum_{i=1}^n \log E[e^{\theta\varepsilon_i A_i}]\right)
\le \mathrm{Tr}\exp\left(\frac{\theta^2}{2}\sum_{i=1}^n A_i^2\right)
\le d\,\lambda_{\max}\left(\exp\left(\frac{\theta^2}{2}\sum_{i=1}^n A_i^2\right)\right)
= d\exp\left(\frac{\theta^2}{2}\lambda_{\max}\left(\sum_{i=1}^n A_i^2\right)\right)
= d\,e^{\theta^2\sigma^2/2}.
\]
Therefore, we have
\[
P\left\{\lambda_{\max}\left(\sum_{i=1}^n \varepsilon_i A_i\right) > t\right\} \le d\inf_{\theta>0} e^{-t\theta + \theta^2\sigma^2/2} = d\,e^{-t^2/(2\sigma^2)}.
\]

Likewise, we also have
\[
P\left\{\lambda_{\max}\left(-\sum_{i=1}^n \varepsilon_i A_i\right) > t\right\} \le d\inf_{\theta>0} e^{-t\theta + \theta^2\sigma^2/2} = d\,e^{-t^2/(2\sigma^2)}.
\]
The first assertion (53) follows from combining these two inequalities.

The second assertion (54) follows from the first assertion (53). Indeed, setting \(Z := \|\sum_{i=1}^n \varepsilon_i A_i\|_{\mathrm{op}}\), we have
\begin{align*}
E[(Z/\sigma - \sqrt{2\log(2d)})_+] &= \int_0^{\infty} P\{(Z/\sigma - \sqrt{2\log(2d)})_+ > t\}\,dt \\
&= \int_0^{\infty} P\{Z > (\sqrt{2\log(2d)} + t)\sigma\}\,dt \\
&\le 2d\int_0^{\infty} e^{-(\sqrt{2\log(2d)}+t)^2/2}\,dt \\
&\le 2d\int_0^{\infty} e^{-(2\log(2d)+t^2)/2}\,dt \\
&= \int_0^{\infty} e^{-t^2/2}\,dt = \sqrt{\pi/2}.
\end{align*}
The final conclusion follows from the inequality
\[
Z \le \sqrt{2\log(2d)}\,\sigma + (Z/\sigma - \sqrt{2\log(2d)})_+\,\sigma.
\]
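The chain of integral bounds above can be verified numerically (a crude Riemann-sum sketch; the grid and cutoff are arbitrary choices of ours):

```python
import numpy as np

# Check numerically that, for several d,
#   2d * int_0^inf exp(-(sqrt(2 log 2d) + t)^2 / 2) dt
#     <= int_0^inf exp(-t^2 / 2) dt = sqrt(pi/2).
t = np.linspace(0.0, 20.0, 200001)
dt = t[1] - t[0]
gauss_tail = float(np.sum(np.exp(-t**2 / 2)) * dt)  # approximately sqrt(pi/2)

lhs_vals = []
for d in (2, 10, 100):
    a = np.sqrt(2 * np.log(2 * d))
    lhs_vals.append(float(2 * d * np.sum(np.exp(-(a + t)**2 / 2)) * dt))
print(lhs_vals, gauss_tail, float(np.sqrt(np.pi / 2)))
```

The inequality \((\sqrt{2\log(2d)} + t)^2 \ge 2\log(2d) + t^2\) is strict for \(t > 0\), so the computed left-hand sides sit strictly below \(\sqrt{\pi/2}\).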

References

Adamczak, R. (2010). A few remarks on the operator norm of random Toeplitz matrices. J. Theoret. Probab. 23 85-108.

Adler, R.J. (1990). An Introduction to Continuity, Extrema, and Related Topics for General Gaussian Processes (IMS Lecture Notes-Monograph Series). Institute of Mathematical Statistics.

Andersen, N.T. and Dobrić, V. (1987). The central limit theorem for stochastic processes. Ann. Probab. 15 164-177.

Bartlett, P., Bousquet, O. and Mendelson, S. (2005). Local Rademacher complexities. Ann. Statist. 33 1497-1537.

Bhatia, R. (1997). Matrix Analysis. Springer.

Billingsley, P. (1968). Convergence of Probability Measures. Wiley.

Borell, C. (1975). The Brunn-Minkowski inequality in Gauss space. Invent. Math. 30 205-216.

Boucheron, S., Lugosi, G. and Massart, P. (2000). A sharp concentration inequality with applications. Random Structures Algorithms 16 277-292.

Boucheron, S., Lugosi, G. and Massart, P. (2003). Concentration inequalities using the entropy method. Ann. Probab. 31 1583-1614.

Boucheron, S., Lugosi, G. and Massart, P. (2013). Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press.

Bousquet, O. (2002). A Bennett concentration inequality and its application to suprema of empirical processes. C. R. Acad. Sci. Paris 334 495-500.

Chernozhukov, V., Chetverikov, D. and Kato, K. (2014). Gaussian approximation of suprema of empirical processes. Ann. Statist. 42 1564-1597.

Davydov, Y., Lifshits, M. and Smorodina, N. (1998). Local Properties of Distributions of Stochastic Functionals (Translations of Mathematical Monographs, Vol. 173). American Mathematical Society.

Dudley, R.M. (1967). The size of compact subsets of Hilbert space and continuity of Gaussian processes. J. Functional Anal. 1 290-330.

Dudley, R.M. (1978). Central limit theorems for empirical measures. Ann. Probab. 6 899-929. (Correction: Ann. Probab. 7 909-911.)

Dudley, R.M. (1999). Uniform Central Limit Theorems. Cambridge University Press.

Dudley, R.M. (2002). Real Analysis and Probability, second edition. Cambridge University Press.

Efron, B. and Stein, C. (1981). The jackknife estimate of variance. Ann. Statist. 9 586-596.

Einmahl, U. and Li, D. (2008). Characterization of LIL behavior in Banach space. Trans. Amer. Math. Soc. 360 6677-6693.

Erdős, P. and Stone, A.H. (1970). On the sum of two Borel sets. Proc. Amer. Math. Soc. 25 304-306.

Gikhman, I.I. and Skorohod, A.V. (1974). The Theory of Stochastic Processes I. Springer.

Giné, E. (2007). Empirical Processes and Some of Their Applications. Lecture notes available from the author's website.

Giné, E. and Guillou, A. (2001). On consistency of kernel density estimators for randomly censored data: rates holding uniformly over adaptive intervals. Ann. Inst. H. Poincaré Probab. Statist. 37 503-522.

Giné, E. and Nickl, R. (2009). Uniform central limit theorems for wavelet density estimators. Ann. Probab. 37 1605-1646.

Giné, E. and Nickl, R. (2016). Mathematical Foundations of Infinite-Dimensional Statistical Models. Cambridge University Press.

Giné, E. and Zinn, J. (1984). Some limit theorems for empirical processes. Ann. Probab. 12 929-989.

Gross, L. (1975). Logarithmic Sobolev inequalities. Amer. J. Math. 97 1061-1078.

Hoffmann-Jørgensen, J. (1991). Stochastic Processes on Polish Spaces. Various Publication Series 39, Aarhus University. (The manuscript was available in 1984.)

Hoffmann-Jørgensen, J., Shepp, L.A. and Dudley, R.M. (1979). On lower tails of Gaussian seminorms. Ann. Probab. 7 319-342.

Jain, N. and Marcus, M.B. (1975). Central limit theorems for C(S)-valued random variables. J. Functional Analysis 19 216-231.

Koltchinskii, V.I. (1981). On the central limit theorem for empirical measures. Theor. Probab. Math. Statist. 24 71-82.

Koltchinskii, V. (2011). Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems. Lecture Notes in Math. 2033. Springer.

Klein, T. and Rio, E. (2005). Concentration around the mean for maxima of empirical processes. Ann. Probab. 33 1060-1077.

Landau, H.J. and Shepp, L.A. (1970). On the supremum of Gaussian processes. Sankhyā 32 369-378.

Ledoux, M. (1996). On Talagrand's deviation inequalities for product measures. ESAIM: Probab. Statist. 1 63-87.

Ledoux, M. (2001). The Concentration of Measure Phenomenon. American Mathematical Society.

Ledoux, M. and Talagrand, M. (1991). Probability in Banach Spaces. Springer.

Li, W.V. and Shao, Q.-M. (2001). Gaussian Processes: Inequalities, Small Ball Probabilities and Applications. In: Stochastic Processes: Theory and Methods, Handbook of Statistics, Vol. 19, pp. 533-598.

Lieb, E.H. (1973). Convex trace functions and the Wigner-Yanase-Dyson conjecture. Adv. Math. 11 267-288.

Marcus, M.B. and Shepp, L.A. (1971). Sample behavior of Gaussian processes. Proc. Sixth Berkeley Symp. Math. Statist. Probab. 2 423-442.

Massart, P. (2000). About the constants in Talagrand's concentration inequalities for empirical processes. Ann. Probab. 28 863-884.

Massart, P. (2007). Concentration Inequalities and Model Selection. Lecture Notes in Math. 1893. Springer.

Oliveira, R.I. (2010). Sums of random Hermitian matrices and an inequality by Rudelson. Elec. Comm. in Probab. 15 203-212.

Panchenko, D. (2003). Symmetrization approach to concentration inequalities for empirical processes. Ann. Probab. 31 2068-2081.

Pisier, G. (1986). Some applications of the metric entropy condition to harmonic analysis. In: Banach Spaces, Harmonic Analysis, and Probability Theory. Lecture Notes in Mathematics 995 123-154. Springer.

Pisier, G. (1989). The Volume of Convex Bodies and Banach Space Geometry. Cambridge University Press.

Pollard, D. (1982). A central limit theorem for empirical processes. J. Austral. Math. Soc. Ser. A 33 235-248.

Rudelson, M. (1999). Random vectors in the isotropic position. J. Functional Anal. 164 60-72.

Slepian, D. (1962). The one-sided barrier problem for Gaussian noise. Bell Sys. Tech. J. 463-501.

