A tail inequality for quadratic forms of subgaussian random vectors


Electronic Communications in Probability, ISSN: 1083-589X


Daniel Hsu∗, Sham M. Kakade†, Tong Zhang‡

Abstract

This article proves an exponential probability tail inequality for positive semidefinite quadratic forms in a subgaussian random vector. The bound is analogous to one that holds when the vector has independent Gaussian entries.

Keywords: Tail inequality; quadratic form; subgaussian random vectors; subgaussian chaos.

AMS MSC 2010: 60F10.

Submitted to ECP on June 11, 2012, final version accepted on October 29, 2012.

Supersedes arXiv:1110.2842.

1 Introduction

Suppose that $x = (x_1, \dots, x_n)$ is a random vector. Let $A \in \mathbb{R}^{n \times n}$ be a fixed matrix. A natural quantity that arises in many settings is the quadratic form $\|Ax\|^2 = x^\top (A^\top A) x$. Throughout, $\|v\|$ denotes the Euclidean norm of a vector $v$, and $\|M\|$ denotes the spectral (operator) norm of a matrix $M$. We are interested in how close $\|Ax\|^2$ is to its expectation.

Consider the special case where $x_1, \dots, x_n$ are independent standard Gaussian random variables. The following proposition provides an (upper) tail bound for $\|Ax\|^2$.

Proposition 1.1. Let $A \in \mathbb{R}^{n \times n}$ be a matrix, and let $\Sigma := A^\top A$. Let $x = (x_1, \dots, x_n)$ be an isotropic multivariate Gaussian random vector with mean zero. For all $t > 0$,

$$\Pr\left[ \|Ax\|^2 > \operatorname{tr}(\Sigma) + 2\sqrt{\operatorname{tr}(\Sigma^2)\, t} + 2\|\Sigma\|\, t \right] \le e^{-t}.$$
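As a rough numerical illustration of Proposition 1.1, the following Python sketch estimates the tail probability by simulation. It uses the fact, noted in the proof in Appendix A.2, that $\|Ax\|^2$ has the same distribution as $\rho_1 z_1^2 + \cdots + \rho_n z_n^2$, where the $\rho_i$ are the eigenvalues of $\Sigma$. The spectrum `rho`, the value of `t`, and the function names are assumptions chosen for illustration only.

```python
import math
import random

def gaussian_threshold(rho, t):
    """tr(Sigma) + 2 sqrt(tr(Sigma^2) t) + 2 ||Sigma|| t, from the eigenvalues rho of Sigma."""
    tr1 = sum(rho)
    tr2 = sum(r * r for r in rho)
    return tr1 + 2.0 * math.sqrt(tr2 * t) + 2.0 * max(rho) * t

def empirical_tail(rho, t, trials=20000, seed=0):
    """Monte Carlo estimate of Pr[sum_i rho_i z_i^2 > threshold] for standard Gaussian z_i."""
    rng = random.Random(seed)
    thr = gaussian_threshold(rho, t)
    hits = 0
    for _ in range(trials):
        q = sum(r * rng.gauss(0.0, 1.0) ** 2 for r in rho)
        if q > thr:
            hits += 1
    return hits / trials

rho = [3.0, 1.0, 0.5, 0.25]  # hypothetical spectrum of Sigma
t = 2.0
tail = empirical_tail(rho, t)
print("empirical tail:", tail, " bound e^-t:", math.exp(-t))
```

In such experiments the empirical frequency is typically well below $e^{-t}$, since the bound holds uniformly over all spectra.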

The proof, given in Appendix A.2, is straightforward given the rotational invariance of the multivariate Gaussian distribution, together with a tail bound for linear combinations of χ² random variables from [2]. We note that a slightly weaker form of Proposition 1.1 can be proved directly using Gaussian concentration [3].

In this note, we consider the case where $x = (x_1, \dots, x_n)$ is a subgaussian random vector. By this, we mean that there exists a $\sigma \ge 0$ such that for all $\alpha \in \mathbb{R}^n$,

$$\mathbb{E}[\exp(\alpha^\top x)] \le \exp\left(\|\alpha\|^2 \sigma^2 / 2\right).$$

We provide a sharp upper tail bound for this case analogous to one that holds in the Gaussian case (indeed, the same as Proposition 1.1 when $\sigma = 1$).
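To make the subgaussian condition concrete, consider a vector of independent Rademacher signs. Its moment generating function is $\prod_i \cosh(\alpha_i)$, and the elementary inequality $\cosh(a) \le e^{a^2/2}$ shows that the condition holds with $\sigma = 1$. The sketch below checks this numerically; the function names and the test vectors are illustrative assumptions.

```python
import math

def rademacher_mgf(alpha):
    """Exact E[exp(alpha^T x)] when the x_i are independent Rademacher signs."""
    return math.prod(math.cosh(a) for a in alpha)

def subgaussian_bound(alpha, sigma=1.0):
    """Right-hand side exp(||alpha||^2 sigma^2 / 2) of the subgaussian condition."""
    return math.exp(sum(a * a for a in alpha) * sigma ** 2 / 2.0)

# A few hypothetical directions alpha.
examples = [[0.3, -1.2, 2.0], [5.0], [0.1] * 10]
for alpha in examples:
    print(rademacher_mgf(alpha), "<=", subgaussian_bound(alpha))
```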

∗Microsoft Research New England, USA.

†Microsoft Research New England, USA.

‡Department of Statistics, Rutgers University, USA.


Tail inequalities for sums of random vectors

One motivation for our main result comes from the following observations about sums of random vectors. Let $a_1, \dots, a_n$ be vectors in a Euclidean space, and let $A = [a_1 | \cdots | a_n]$ be the matrix with $a_i$ as its $i$th column. Consider the squared norm of the random sum

$$\|Ax\|^2 = \left\| \sum_{i=1}^n a_i x_i \right\|^2 \tag{1.1}$$

where $x := (x_1, \dots, x_n)$ is a martingale difference sequence with $\mathbb{E}[x_i \mid x_1, \dots, x_{i-1}] = 0$ and $\mathbb{E}[x_i^2 \mid x_1, \dots, x_{i-1}] = \sigma^2$. Under mild boundedness assumptions on the $x_i$, the probability that the squared norm in (1.1) is much larger than its expectation

$$\mathbb{E}[\|Ax\|^2] = \sigma^2 \sum_{i=1}^n \|a_i\|^2 = \sigma^2 \operatorname{tr}(A^\top A)$$

falls off exponentially fast. This can be shown, for instance, using the following lemma by taking $u_i = a_i x_i$ (see Appendix A.1).

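Proposition 1.2. Let $u_1, \dots, u_n$ be a martingale difference vector sequence, i.e., $\mathbb{E}[u_i \mid u_1, \dots, u_{i-1}] = 0$ for all $i = 1, \dots, n$, such that

$$\sum_{i=1}^n \mathbb{E}\left[ \|u_i\|^2 \mid u_1, \dots, u_{i-1} \right] \le v \quad \text{and} \quad \|u_i\| \le b$$

for all $i = 1, \dots, n$, almost surely. For all $t > 0$,

$$\Pr\left[ \left\| \sum_{i=1}^n u_i \right\| > \sqrt{v} + \sqrt{8vt} + (4/3)\, b t \right] \le e^{-t}.$$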

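After squaring the quantities in the stated probabilistic event, Proposition 1.2 gives the bound

$$\|Ax\|^2 \le \sigma^2 \cdot \operatorname{tr}(A^\top A) + \sigma^2 \cdot O\left( \operatorname{tr}(A^\top A)\,(\sqrt{t} + t) + \sqrt{\operatorname{tr}(A^\top A)}\, \max_i \|a_i\|\, (t + t^{3/2}) + \max_i \|a_i\|^2\, t^2 \right)$$

with probability at least $1 - e^{-t}$ when the $x_i$ are almost surely bounded by 1 (or any constant).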

Unfortunately, this bound obtained from Proposition 1.2 can be suboptimal when the $x_i$ are subgaussian. For instance, if the $x_i$ are Rademacher random variables, so $\Pr[x_i = +1] = \Pr[x_i = -1] = 1/2$, then it is known that

$$\|Ax\|^2 \le \operatorname{tr}(A^\top A) + O\left( \sqrt{\operatorname{tr}((A^\top A)^2)\, t} + \|A\|^2\, t \right) \tag{1.2}$$

with probability at least $1 - e^{-t}$. A similar result holds for any subgaussian distribution on the $x_i$ [1]. This is an improvement over the previous bound because the deviation terms (i.e., those involving $t$) can be significantly smaller, especially for large $t$.

In this work, we give a simple proof of (1.2) with explicit constants that match the analogous bound when the $x_i$ are independent standard Gaussian random variables.
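The gap between the two bounds can be seen in a small numerical comparison. The sketch below assumes Rademacher $x_i$ (so $\sigma = 1$). It squares the threshold of Proposition 1.2 with $v = \sum_i \|a_i\|^2$ and $b = \max_i \|a_i\|$, and compares it with the explicit-constant threshold $\operatorname{tr}(\Sigma) + 2\sqrt{\operatorname{tr}(\Sigma^2)t} + 2\|\Sigma\|t$ of Proposition 1.1. The choice $A = I_{100}$ and the function names are illustrative assumptions.

```python
import math

def squared_prop_1_2_bound(col_norms, t):
    """(sqrt(v) + sqrt(8 v t) + (4/3) b t)^2 with v = sum ||a_i||^2 and b = max ||a_i||."""
    v = sum(c * c for c in col_norms)
    b = max(col_norms)
    return (math.sqrt(v) + math.sqrt(8.0 * v * t) + (4.0 / 3.0) * b * t) ** 2

def quadratic_form_bound(rho, t):
    """tr(Sigma) + 2 sqrt(tr(Sigma^2) t) + 2 ||Sigma|| t from the eigenvalues rho of Sigma."""
    return sum(rho) + 2.0 * math.sqrt(sum(r * r for r in rho) * t) + 2.0 * max(rho) * t

n = 100
col_norms = [1.0] * n  # columns of A = I_n have unit norm
rho = [1.0] * n        # eigenvalues of Sigma = A^T A = I_n
for t in (1.0, 5.0, 20.0):
    print(t, squared_prop_1_2_bound(col_norms, t), quadratic_form_bound(rho, t))
```

For this choice of $A$ the squared martingale bound has a deviation term of order $nt$, whereas the quadratic-form bound has deviation terms of order $\sqrt{nt} + t$.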


2 Positive semidefinite quadratic forms

Our main theorem, given below, is a generalization of (1.2).

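Theorem 2.1. Let $A \in \mathbb{R}^{n \times n}$ be a matrix, and let $\Sigma := A^\top A$. Suppose that $x = (x_1, \dots, x_n)$ is a random vector such that, for some $\mu \in \mathbb{R}^n$ and $\sigma \ge 0$,

$$\mathbb{E}[\exp(\alpha^\top (x - \mu))] \le \exp\left( \|\alpha\|^2 \sigma^2 / 2 \right) \tag{2.1}$$

for all $\alpha \in \mathbb{R}^n$. For all $t > 0$,

$$\Pr\left[ \|Ax\|^2 > \sigma^2 \cdot \left( \operatorname{tr}(\Sigma) + 2\sqrt{\operatorname{tr}(\Sigma^2)\, t} + 2\|\Sigma\|\, t \right) + \operatorname{tr}(\Sigma \mu \mu^\top) \cdot \left( 1 + 2 \left( \frac{\|\Sigma\|^2}{\operatorname{tr}(\Sigma^2)}\, t \right)^{1/2} \right) \right] \le e^{-t}.$$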
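The threshold in Theorem 2.1 is straightforward to evaluate. The following sketch does so under the simplifying assumption that $A = \operatorname{diag}(d)$, so that the eigenvalues of $\Sigma$ are $d_i^2$ and $\operatorname{tr}(\Sigma\mu\mu^\top) = \sum_i d_i^2 \mu_i^2$. It then estimates the tail by simulation for $x = \mu + \sigma z$ with $z$ standard Gaussian, which satisfies (2.1) with equality. The values of `d`, `mu`, `sigma`, `t` and the function names are hypothetical.

```python
import math
import random

def theorem_threshold(d, mu, sigma, t):
    """Right-hand side of the event in Theorem 2.1, assuming A = diag(d)."""
    rho = [di * di for di in d]                        # eigenvalues of Sigma = A^T A
    tr1 = sum(rho)                                     # tr(Sigma)
    tr2 = sum(r * r for r in rho)                      # tr(Sigma^2)
    op = max(rho)                                      # ||Sigma||
    quad_mu = sum(r * m * m for r, m in zip(rho, mu))  # tr(Sigma mu mu^T) = ||A mu||^2
    noise = sigma ** 2 * (tr1 + 2.0 * math.sqrt(tr2 * t) + 2.0 * op * t)
    shift = quad_mu * (1.0 + 2.0 * math.sqrt(op * op * t / tr2))
    return noise + shift

def empirical_tail(d, mu, sigma, t, trials=20000, seed=0):
    """Monte Carlo estimate of Pr[||Ax||^2 > threshold] for x = mu + sigma * z."""
    rng = random.Random(seed)
    thr = theorem_threshold(d, mu, sigma, t)
    hits = 0
    for _ in range(trials):
        q = sum((di * (m + sigma * rng.gauss(0.0, 1.0))) ** 2 for di, m in zip(d, mu))
        if q > thr:
            hits += 1
    return hits / trials

d = [2.0, 1.0, 1.0, 0.5]     # hypothetical diagonal of A
mu = [0.5, -0.5, 1.0, 0.0]   # hypothetical mean vector
sigma, t = 1.0, 1.0
tail = empirical_tail(d, mu, sigma, t)
print("empirical tail:", tail, " bound e^-t:", math.exp(-t))
```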

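Remark 2.2. If $\mu = 0$, then the assumption (2.1) implies $\mathbb{E}[x] = 0$ and $\operatorname{cov}(x) \preceq \sigma^2 I$. In this case,

$$\mathbb{E}[\|Ax\|^2] = \operatorname{tr}(\Sigma \operatorname{cov}(x)) \le \sigma^2 \operatorname{tr}(\Sigma), \qquad \operatorname{var}(\|Ax\|^2) = O(\sigma^4 \operatorname{tr}(\Sigma^2)),$$

so the probability inequality may be interpreted as a Bernstein inequality. If $\mu = 0$ and $\sigma = 1$, then the probability inequality reads

$$\Pr\left[ \|Ax\|^2 > \operatorname{tr}(\Sigma) + 2\sqrt{\operatorname{tr}(\Sigma^2)\, t} + 2\|\Sigma\|\, t \right] \le e^{-t},$$

which is the same as Proposition 1.1.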

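Remark 2.3. Our proof (via (2.2), (2.4), and (2.5)) actually establishes the following upper bounds on the moment generating function of $\|Ax\|^2$ for $0 \le \eta < 1/(2\sigma^2 \|\Sigma\|)$:

$$\mathbb{E}\left[ \exp\left( \eta \|Ax\|^2 \right) \right] \le \mathbb{E}\left[ \exp\left( \sigma^2 \|A^\top z\|^2 \eta + \mu^\top A^\top z \sqrt{2\eta} \right) \right] \le \exp\left( \sigma^2 \operatorname{tr}(\Sigma)\, \eta + \frac{\sigma^4 \operatorname{tr}(\Sigma^2)\, \eta^2 + \|A\mu\|^2 \eta}{1 - 2\sigma^2 \|\Sigma\| \eta} \right)$$

where $z$ is a vector of $n$ independent standard Gaussian random variables.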

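Proof of Theorem 2.1. Let $z$ be a vector of $n$ independent standard Gaussian random variables (sampled independently of $x$). For any $\alpha \in \mathbb{R}^n$,

$$\mathbb{E}[\exp(z^\top \alpha)] = \exp\left( \|\alpha\|^2 / 2 \right). \tag{2.2}$$

Thus, for any $\lambda \in \mathbb{R}$ and $\varepsilon \ge 0$, we have the following decoupling (which holds, in fact, for any random vector $x$):

$$\mathbb{E}[\exp(\lambda z^\top A x)] \ge \mathbb{E}\left[ \exp(\lambda z^\top A x) \mid \|Ax\|^2 > \varepsilon \right] \cdot \Pr\left[ \|Ax\|^2 > \varepsilon \right] \ge \exp\left( \frac{\lambda^2 \varepsilon}{2} \right) \cdot \Pr\left[ \|Ax\|^2 > \varepsilon \right]. \tag{2.3}$$

Moreover, using (2.1),

$$\mathbb{E}[\exp(\lambda z^\top A x)] = \mathbb{E}\left[ \mathbb{E}\left[ \exp(\lambda z^\top A (x - \mu)) \mid z \right] \exp(\lambda z^\top A \mu) \right] \le \mathbb{E}\left[ \exp\left( \frac{\lambda^2 \sigma^2}{2} \|A^\top z\|^2 + \lambda \mu^\top A^\top z \right) \right]. \tag{2.4}$$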


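Let $U S V^\top$ be a singular value decomposition of $A$, where $U$ and $V$ are, respectively, matrices of orthonormal left and right singular vectors, and $S = \operatorname{diag}(\sqrt{\rho_1}, \dots, \sqrt{\rho_n})$ is the diagonal matrix of corresponding singular values. Note that

$$\|\rho\|_1 = \sum_{i=1}^n \rho_i = \operatorname{tr}(\Sigma), \qquad \|\rho\|_2^2 = \sum_{i=1}^n \rho_i^2 = \operatorname{tr}(\Sigma^2), \qquad \text{and} \qquad \|\rho\|_\infty = \max_i \rho_i = \|\Sigma\|.$$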

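By rotational invariance, $y := U^\top z$ is an isotropic multivariate Gaussian random vector with mean zero. Therefore $\|A^\top z\|^2 = z^\top U S^2 U^\top z = \rho_1 y_1^2 + \cdots + \rho_n y_n^2$ and $\mu^\top A^\top z = \nu^\top y = \nu_1 y_1 + \cdots + \nu_n y_n$, where $\nu := S V^\top \mu$ (note that $\|\nu\|^2 = \|S V^\top \mu\|^2 = \|A\mu\|^2$). Let $\gamma := \lambda^2 \sigma^2 / 2$. By Lemma 2.4,

$$\mathbb{E}\left[ \exp\left( \gamma \sum_{i=1}^n \rho_i y_i^2 + \frac{\sqrt{2\gamma}}{\sigma} \sum_{i=1}^n \nu_i y_i \right) \right] \le \exp\left( \|\rho\|_1 \gamma + \frac{\|\rho\|_2^2 \gamma^2 + \|\nu\|^2 \gamma / \sigma^2}{1 - 2\|\rho\|_\infty \gamma} \right) \tag{2.5}$$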

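for $0 \le \gamma < 1/(2\|\rho\|_\infty)$. Combining (2.3), (2.4), and (2.5) gives

$$\Pr\left[ \|Ax\|^2 > \varepsilon \right] \le \exp\left( -\varepsilon \gamma / \sigma^2 + \|\rho\|_1 \gamma + \frac{\|\rho\|_2^2 \gamma^2 + \|\nu\|^2 \gamma / \sigma^2}{1 - 2\|\rho\|_\infty \gamma} \right)$$

for $0 \le \gamma < 1/(2\|\rho\|_\infty)$ and $\varepsilon \ge 0$. Choosing

$$\varepsilon := \sigma^2 (\|\rho\|_1 + \tau) + \|\nu\|^2 \sqrt{1 + \frac{2\|\rho\|_\infty \tau}{\|\rho\|_2^2}} \qquad \text{and} \qquad \gamma := \frac{1}{2\|\rho\|_\infty} \left( 1 - \sqrt{\frac{\|\rho\|_2^2}{\|\rho\|_2^2 + 2\|\rho\|_\infty \tau}} \right),$$

we have

$$\Pr\left[ \|Ax\|^2 > \sigma^2 (\|\rho\|_1 + \tau) + \|\nu\|^2 \sqrt{1 + \frac{2\|\rho\|_\infty \tau}{\|\rho\|_2^2}} \right] \le \exp\left( -\frac{\|\rho\|_2^2}{2\|\rho\|_\infty^2} \left( 1 + \frac{\|\rho\|_\infty \tau}{\|\rho\|_2^2} - \sqrt{1 + \frac{2\|\rho\|_\infty \tau}{\|\rho\|_2^2}} \right) \right) = \exp\left( -\frac{\|\rho\|_2^2}{2\|\rho\|_\infty^2}\, h_1\!\left( \frac{\|\rho\|_\infty \tau}{\|\rho\|_2^2} \right) \right)$$

where $h_1(a) := 1 + a - \sqrt{1 + 2a}$, which has the inverse function $h_1^{-1}(b) = \sqrt{2b} + b$. The result follows by setting $\tau := 2\sqrt{\|\rho\|_2^2\, t} + 2\|\rho\|_\infty t = 2\sqrt{\operatorname{tr}(\Sigma^2)\, t} + 2\|\Sigma\|\, t$.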
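The final step of the proof can be sanity-checked numerically: with the stated choices of $\tau$, $\varepsilon$, and $\gamma$, the exponent in the combined bound should equal $-t$ up to rounding, because the terms involving $\|\nu\|^2$ cancel. The sketch below evaluates that exponent directly. The inputs `rho`, `nu_sq` (standing for $\|\nu\|^2$), `sigma`, and the function name are illustrative assumptions.

```python
import math

def chernoff_exponent(rho, nu_sq, sigma, t):
    """Exponent of the bound obtained by combining (2.3), (2.4), (2.5) at the proof's eps, gamma."""
    l1 = sum(rho)                      # ||rho||_1
    l2sq = sum(r * r for r in rho)     # ||rho||_2^2
    linf = max(rho)                    # ||rho||_inf
    tau = 2.0 * math.sqrt(l2sq * t) + 2.0 * linf * t
    s = math.sqrt(1.0 + 2.0 * linf * tau / l2sq)
    eps = sigma ** 2 * (l1 + tau) + nu_sq * s
    gamma = (1.0 - 1.0 / s) / (2.0 * linf)
    return (-eps * gamma / sigma ** 2 + l1 * gamma
            + (l2sq * gamma ** 2 + nu_sq * gamma / sigma ** 2) / (1.0 - 2.0 * linf * gamma))

rho = [3.0, 1.0, 0.5, 0.25]  # hypothetical eigenvalues of Sigma
for t in (0.1, 1.0, 7.5):
    print(t, chernoff_exponent(rho, nu_sq=2.0, sigma=1.5, t=t))
```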

The following lemma is a standard estimate of the logarithmic moment generating function of a quadratic form in standard Gaussian random variables, proved much along the lines of the estimate from [2].

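Lemma 2.4. Let $z$ be a vector of $n$ independent standard Gaussian random variables. Fix any non-negative vector $\alpha \in \mathbb{R}^n_+$ and any vector $\beta \in \mathbb{R}^n$. If $0 \le \lambda < 1/(2\|\alpha\|_\infty)$, then

$$\log \mathbb{E}\left[ \exp\left( \lambda \sum_{i=1}^n \alpha_i z_i^2 + \sum_{i=1}^n \beta_i z_i \right) \right] \le \|\alpha\|_1 \lambda + \frac{\|\alpha\|_2^2 \lambda^2 + \|\beta\|_2^2 / 2}{1 - 2\|\alpha\|_\infty \lambda}.$$

Proof. Fix $\lambda \in \mathbb{R}$ such that $0 \le \lambda < 1/(2\|\alpha\|_\infty)$, and let $\eta_i := 1/\sqrt{1 - 2\alpha_i \lambda} > 0$ for $i = 1, \dots, n$. We have

$$\mathbb{E}\left[ \exp\left( \lambda \alpha_i z_i^2 + \beta_i z_i \right) \right] = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left( -z_i^2 / 2 \right) \exp\left( \lambda \alpha_i z_i^2 + \beta_i z_i \right) dz_i = \eta_i \exp\left( \frac{\beta_i^2 \eta_i^2}{2} \right) \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi \eta_i^2}} \exp\left( -\frac{1}{2\eta_i^2} \left( z_i - \beta_i \eta_i^2 \right)^2 \right) dz_i$$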


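so

$$\log \mathbb{E}\left[ \exp\left( \lambda \sum_{i=1}^n \alpha_i z_i^2 + \sum_{i=1}^n \beta_i z_i \right) \right] = \frac{1}{2} \sum_{i=1}^n \beta_i^2 \eta_i^2 + \frac{1}{2} \sum_{i=1}^n \log \eta_i^2.$$

The right-hand side can be bounded using the inequalities

$$\frac{1}{2} \sum_{i=1}^n \log \eta_i^2 = -\frac{1}{2} \sum_{i=1}^n \log(1 - 2\alpha_i \lambda) = \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^\infty \frac{(2\alpha_i \lambda)^j}{j} \le \|\alpha\|_1 \lambda + \frac{\|\alpha\|_2^2 \lambda^2}{1 - 2\|\alpha\|_\infty \lambda}$$

and

$$\frac{1}{2} \sum_{i=1}^n \beta_i^2 \eta_i^2 \le \frac{\|\beta\|_2^2 / 2}{1 - 2\|\alpha\|_\infty \lambda}.$$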
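Since the proof gives the logarithmic moment generating function in closed form, Lemma 2.4 can be checked numerically by comparing that closed form with the stated upper bound over a grid of admissible $\lambda$. The following sketch does this; the vectors `alpha`, `beta` and the function names are illustrative assumptions.

```python
import math

def exact_log_mgf(alpha, beta, lam):
    """(1/2) sum beta_i^2 eta_i^2 + (1/2) sum log eta_i^2, with eta_i^2 = 1/(1 - 2 alpha_i lam)."""
    total = 0.0
    for a, b in zip(alpha, beta):
        eta_sq = 1.0 / (1.0 - 2.0 * a * lam)
        total += 0.5 * b * b * eta_sq + 0.5 * math.log(eta_sq)
    return total

def lemma_bound(alpha, beta, lam):
    """||alpha||_1 lam + (||alpha||_2^2 lam^2 + ||beta||_2^2 / 2) / (1 - 2 ||alpha||_inf lam)."""
    l1 = sum(alpha)
    l2sq = sum(a * a for a in alpha)
    linf = max(alpha)
    beta_sq = sum(b * b for b in beta)
    return l1 * lam + (l2sq * lam ** 2 + beta_sq / 2.0) / (1.0 - 2.0 * linf * lam)

alpha = [2.0, 1.0, 0.0, 0.5]    # hypothetical non-negative weights
beta = [0.3, -1.0, 2.0, 0.0]    # hypothetical linear coefficients
lam_max = 1.0 / (2.0 * max(alpha))
grid = [lam_max * k / 20.0 for k in range(20)]  # points in [0, lam_max)
for lam in grid:
    print(lam, exact_log_mgf(alpha, beta, lam), lemma_bound(alpha, beta, lam))
```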

Example: fixed-design regression with subgaussian noise

We give a simple application of Theorem 2.1 to fixed-design linear regression with the ordinary least squares estimator.

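Let $x_1, \dots, x_n$ be fixed design vectors in $\mathbb{R}^d$. Let the responses $y_1, \dots, y_n$ be random variables for which there exists $\sigma > 0$ such that

$$\mathbb{E}\left[ \exp\left( \sum_{i=1}^n \alpha_i (y_i - \mathbb{E}[y_i]) \right) \right] \le \exp\left( \sigma^2 \sum_{i=1}^n \alpha_i^2 / 2 \right)$$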

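for any $\alpha_1, \dots, \alpha_n \in \mathbb{R}$. This condition is satisfied, for instance, if $y_i = \mathbb{E}[y_i] + \varepsilon_i$ for independent subgaussian zero-mean noise variables $\varepsilon_1, \dots, \varepsilon_n$. Let $\Sigma := \sum_{i=1}^n x_i x_i^\top / n$, which we assume is invertible without loss of generality. Let

$$\beta := \Sigma^{-1} \left( \frac{1}{n} \sum_{i=1}^n x_i \mathbb{E}[y_i] \right)$$

be the coefficient vector of minimum expected squared error (i.e., $\beta$ minimizes $\mathbb{E}[n^{-1} \sum_{i=1}^n (x_i^\top \beta - y_i)^2]$). The ordinary least squares estimator is given by

$$\hat\beta := \Sigma^{-1} \left( \frac{1}{n} \sum_{i=1}^n x_i y_i \right).$$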

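The excess loss $R(\hat\beta)$ of $\hat\beta$ is the difference between the expected squared error of $\hat\beta$ and that of $\beta$:

$$R(\hat\beta) := \mathbb{E}\left[ \frac{1}{n} \sum_{i=1}^n (x_i^\top \hat\beta - y_i)^2 \right] - \mathbb{E}\left[ \frac{1}{n} \sum_{i=1}^n (x_i^\top \beta - y_i)^2 \right].$$

It is easy to see that

$$R(\hat\beta) = \left\| \Sigma^{1/2} (\hat\beta - \beta) \right\|^2 = \left\| \frac{1}{n} \sum_{i=1}^n \left( \Sigma^{-1/2} x_i \right) (y_i - \mathbb{E}[y_i]) \right\|^2.$$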

By Theorem 2.1,

$$\Pr\left[ R(\hat\beta) > \frac{\sigma^2 \left( d + 2\sqrt{dt} + 2t \right)}{n} \right] \le e^{-t}.$$

Note that if $\mathbb{E}[(y_i - \mathbb{E}[y_i])^2] = \sigma^2$ for each $i$, then

$$\mathbb{E}[R(\hat\beta)] = \frac{\sigma^2 d}{n};$$

so the tail inequality above is essentially tight when the $y_i$ are independent Gaussian random variables.
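A small simulation can illustrate the regression bound. The sketch below assumes a hypothetical design in $d = 2$ dimensions (an intercept plus a linear trend) with independent Gaussian noise of variance $\sigma^2$. It uses the identity $R(\hat\beta) = g^\top \Sigma^{-1} g$ with $g := n^{-1} \sum_i x_i (y_i - \mathbb{E}[y_i])$, which follows from $R(\hat\beta) = \|\Sigma^{1/2}(\hat\beta - \beta)\|^2$. All names and parameter values are illustrative.

```python
import math
import random

def excess_loss_samples(n=50, sigma=0.5, trials=5000, seed=0):
    """Simulate R(beta_hat) = g^T Sigma^{-1} g for a fixed two-dimensional design."""
    rng = random.Random(seed)
    xs = [(1.0, i / n) for i in range(1, n + 1)]  # hypothetical design vectors, d = 2
    # Entries of Sigma = (1/n) sum_i x_i x_i^T.
    s11 = sum(a * a for a, _ in xs) / n
    s12 = sum(a * b for a, b in xs) / n
    s22 = sum(b * b for _, b in xs) / n
    det = s11 * s22 - s12 * s12
    out = []
    for _ in range(trials):
        eps = [rng.gauss(0.0, sigma) for _ in range(n)]  # noise y_i - E[y_i]
        g1 = sum(x[0] * e for x, e in zip(xs, eps)) / n
        g2 = sum(x[1] * e for x, e in zip(xs, eps)) / n
        # g^T Sigma^{-1} g using the explicit inverse of a 2x2 matrix.
        out.append((s22 * g1 * g1 - 2.0 * s12 * g1 * g2 + s11 * g2 * g2) / det)
    return out

n, d, sigma, t = 50, 2, 0.5, 1.0
samples = excess_loss_samples(n=n, sigma=sigma)
threshold = sigma ** 2 * (d + 2.0 * math.sqrt(d * t) + 2.0 * t) / n
tail = sum(1 for r in samples if r > threshold) / len(samples)
mean_loss = sum(samples) / len(samples)
print("empirical tail:", tail, " bound e^-t:", math.exp(-t))
print("mean excess loss:", mean_loss, " sigma^2 d / n:", sigma ** 2 * d / n)
```

The excess loss does not depend on the values $\mathbb{E}[y_i]$, so the simulation only needs to draw the noise.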


A Standard tail inequalities

A.1 Martingale tail inequalities

The following is a standard form of Bernstein’s inequality stated for martingale difference sequences.

Lemma A.1 (Bernstein's inequality for martingales). Let $d_1, \dots, d_n$ be a martingale difference sequence with respect to random variables $x_1, \dots, x_n$ (i.e., $\mathbb{E}[d_i \mid x_1, \dots, x_{i-1}] = 0$ for all $i = 1, \dots, n$) such that $|d_i| \le b$ and $\sum_{i=1}^n \mathbb{E}[d_i^2 \mid x_1, \dots, x_{i-1}] \le v$. For all $t > 0$,

$$\Pr\left[ \sum_{i=1}^n d_i > \sqrt{2vt} + (2/3)\, b t \right] \le e^{-t}.$$
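As a simple illustration of Lemma A.1, independent Rademacher signs form a martingale difference sequence with $b = 1$ and $v = n$. The sketch below estimates the corresponding tail probability by simulation; the values of `n`, `t`, and the function names are assumptions made for the example.

```python
import math
import random

def bernstein_threshold(v, b, t):
    """sqrt(2 v t) + (2/3) b t, the deviation level in Lemma A.1."""
    return math.sqrt(2.0 * v * t) + (2.0 / 3.0) * b * t

def empirical_tail_sign_sum(n, t, trials=5000, seed=0):
    """Monte Carlo estimate of Pr[sum of n Rademacher signs > threshold], with v = n, b = 1."""
    rng = random.Random(seed)
    thr = bernstein_threshold(float(n), 1.0, t)
    hits = 0
    for _ in range(trials):
        s = sum(rng.choice((-1, 1)) for _ in range(n))
        if s > thr:
            hits += 1
    return hits / trials

n, t = 50, 2.0
tail = empirical_tail_sign_sum(n, t)
print("empirical tail:", tail, " bound e^-t:", math.exp(-t))
```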

Proposition 1.2 is an immediate consequence of the following folklore results, together with Jensen's inequality. Lemma A.2 is a straightforward application of Bernstein's inequality to a Doob martingale, and Lemma A.3 is proved by a simple induction argument.

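Lemma A.2. Let $u_1, \dots, u_n$ be random vectors such that $\sum_{i=1}^n \mathbb{E}[\|u_i\|^2 \mid u_1, \dots, u_{i-1}] \le v$ and $\|u_i\| \le b$ for all $i = 1, \dots, n$, almost surely. For all $t > 0$,

$$\Pr\left[ \left\| \sum_{i=1}^n u_i \right\| - \mathbb{E}\left\| \sum_{i=1}^n u_i \right\| > \sqrt{8vt} + (4/3)\, b t \right] \le e^{-t}.$$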

Lemma A.3. If $u_1, \dots, u_n$ is a martingale difference vector sequence (cf. Proposition 1.2), then $\mathbb{E}[\|\sum_{i=1}^n u_i\|^2] = \sum_{i=1}^n \mathbb{E}[\|u_i\|^2]$.

A.2 Gaussian quadratic forms and χ² tail inequalities

It is well-known that if $z \sim \mathcal{N}(0, 1)$ is a standard Gaussian random variable, then $z^2$ follows a $\chi^2$ distribution with one degree of freedom. The following inequality from [2] gives a bound on linear combinations of $\chi^2$ random variables.

Lemma A.4 ($\chi^2$ tail inequality; [2]). Let $q_1, \dots, q_n$ be independent $\chi^2$ random variables, each with one degree of freedom. For any vector $\gamma = (\gamma_1, \dots, \gamma_n) \in \mathbb{R}^n_+$ with non-negative entries, and any $t > 0$,

$$\Pr\left[ \sum_{i=1}^n \gamma_i q_i > \|\gamma\|_1 + 2\sqrt{\|\gamma\|_2^2\, t} + 2\|\gamma\|_\infty t \right] \le e^{-t}.$$

Proof of Proposition 1.1. Let $V \Lambda V^\top$ be an eigen-decomposition of $A^\top A$, where $V$ is a matrix of orthonormal eigenvectors, and $\Lambda := \operatorname{diag}(\rho_1, \dots, \rho_n)$ is the diagonal matrix of corresponding eigenvalues $\rho_1, \dots, \rho_n$. By the rotational invariance of the distribution, $z := V^\top x$ is an isotropic multivariate Gaussian random vector with mean zero. Thus, $\|Ax\|^2 = z^\top \Lambda z = \rho_1 z_1^2 + \cdots + \rho_n z_n^2$, and the $z_i^2$ are independent $\chi^2$ random variables, each with one degree of freedom. The claim now follows from a tail bound for $\chi^2$ random variables (Lemma A.4).

References

[1] D. L. Hanson and F. T. Wright, A bound on tail probabilities for quadratic forms in independent random variables, The Annals of Math. Stat. 42 (1971), no. 3, 1079–1083. MR-0279864

[2] B. Laurent and P. Massart, Adaptive estimation of a quadratic functional by model selection, The Annals of Statistics 28 (2000), no. 5, 1302–1338. MR-1805785

[3] G. Pisier, The volume of convex bodies and Banach space geometry, Cambridge University Press, 1989. MR-1036275

Acknowledgments. We thank the anonymous reviewers for their helpful comments.