• 検索結果がありません。

JJ II

N/A
N/A
Protected

Academic year: 2022

シェア "JJ II"

Copied!
14
0
0

読み込み中.... (全文を見る)

全文

(1)

volume 4, issue 1, article 15, 2003.

Received 13 December, 2002;

accepted 8 January, 2003.

Communicated by:T. Mills

Abstract Contents

JJ II

J I

Home Page Go Back

Close Quit

Journal of Inequalities in Pure and Applied Mathematics

A BOUND ON THE DEVIATION PROBABILITY FOR SUMS OF NON-NEGATIVE RANDOM VARIABLES

ANDREAS MAURER

Adalbertstr. 55

D-80799 Munich, GERMANY.

EMail:[email protected]

c

2000Victoria University ISSN (electronic): 1443-5756 145-02

(2)

A Bound on the Deviation Probability for Sums of

Non-Negative Random Variables Andreas Maurer

Title Page Contents

JJ II

J I

Go Back Close

Quit Page2of14

J. Ineq. Pure and Appl. Math. 4(1) Art. 15, 2003

http://jipam.vu.edu.au

Abstract

A simple bound is presented for the probability that the sum of nonnegative independent random variables is exceeded by its expectation by more than a positive number t. If the variables have the same expectation the bound is slightly weaker than the Bennett and Bernstein inequalities, otherwise it can be significantly stronger. The inequality extends to one-sidedly bounded martin- gale difference sequences.

2000 Mathematics Subject Classification:Primary 60G50, 60F10.

Key words: Deviation bounds, Bernstein’s inequality, Hoeffdings inequality.

I would like to thank David McAllester, Colin McDiarmid, Terry Mills and Erhard Seiler for encouragement and help.

Contents

1 Introduction. . . 3

2 Statement and Proof of the Main Result. . . 5

3 Comparison to Other Bounds. . . 8

4 Martingales . . . 10

5 Conclusion. . . 13 References

(3)

A Bound on the Deviation Probability for Sums of

Non-Negative Random Variables Andreas Maurer

Title Page Contents

JJ II

J I

Go Back Close

Quit Page3of14

J. Ineq. Pure and Appl. Math. 4(1) Art. 15, 2003

http://jipam.vu.edu.au

1. Introduction

Suppose that the{Xi}mi=1are independent random variables with finite first and second moments and use the notation S := P

iXi. Let t > 0. This note discusses the inequality

(1.1) Pr{E[S]−S ≥t} ≤exp

−t2 2P

iE[Xi2]

, valid under the assumption that theXiare non-negative.

Similar bounds have a history beginning in the nineteenth century with the results of Bienaymé and Chebyshev ([3]). Setσ2 = m1 P

i E[Xi2]−(E[Xi])2 . The inequality

Pr{|E[S]−S| ≥m} ≤ σ2 m2

requires minimal assumptions on the distributions of the individual variables and, if applied to identically distributed variables, establishes the consistency of the theory of probability: If theXi represent the numerical results of indepen- dent repetitions of some experiment, then the probability that the average result deviates from its expectation by more than a value ofdecreases to zero as as σ2/(m2), whereσ2 is the average variance of theXi.

If theXisatisfy some additional boundedness conditions the deviation prob- abilities can be shown to decrease exponentially. Corresponding results were obtained in the middle of the twentieth century by Bernstein [2], Cramér, Cher- noff [4], Bennett [1] and Hoeffding [7]. Their results, summarized in [7], have since found important applications in statistics, operations research and com- puter science (see [6]). A general method of proof, sometimes called the expo- nential moment method, is explained in [10] and [8].

(4)

A Bound on the Deviation Probability for Sums of

Non-Negative Random Variables Andreas Maurer

Title Page Contents

JJ II

J I

Go Back Close

Quit Page4of14

J. Ineq. Pure and Appl. Math. 4(1) Art. 15, 2003

http://jipam.vu.edu.au

Inequality (1.1) is of a similar nature and can be directly compared to one- sided versions of Bernstein’s and Bennett’s inequalities (see Theorem 3 in [7]) which also require theXito be bounded on only one side. It turns out that, once reformulated for non-negative variables, the classical inequalities are stronger than (1.1) if theXi are similar in the sense that their expectations are uniformly concentrated. If the expectations of the individual variables are very scattered and/or for large deviationstour inequality (1.1) becomes stronger.

Apart from being stronger than Bernstein’s theorem under perhaps somewhat extreme circumstances, the new inequality (1.1) appears attractive because of its simplicity. The proof (suggested by Colin McDiarmid) is very easy and direct and the method also gives a concentration inequality for martingales of one- sidedly bounded differences.

In Section2we give a first proof of (1.1) and list some simple consequences.

In Section 3our result is compared to Bernstein’s inequality, in Section4it is extended to martingales. All random variables below are assumed to be mem- bers of the algebra of measurable functions defined on some probability space (Ω,Σ, µ). Order and equality in this algebra are assumed to hold only almost everywhere w.r.t. µ, i.e. X ≥0meansX ≥0almost everywhere w.r.t. µonΩ.

(5)

A Bound on the Deviation Probability for Sums of

Non-Negative Random Variables Andreas Maurer

Title Page Contents

JJ II

J I

Go Back Close

Quit Page5of14

J. Ineq. Pure and Appl. Math. 4(1) Art. 15, 2003

http://jipam.vu.edu.au

2. Statement and Proof of the Main Result

Theorem 2.1. Let the{Xi}mi=1be independent random variables,E[Xi2]<∞, Xi ≥0. SetS =P

iXi and lett >0. Then (2.1) Pr{E[S]−S ≥t} ≤exp

−t2 2P

iE[Xi2]

.

Proof. We first claim that forx≥0

e−x ≤1−x+ 1 2x2.

To see this letf(x) = e−xandg(x) = 1−x+ (1/2)x2 and recall that for every realx

(2.2) ex ≥1 +x

so thatf0(x) =−e−x ≤ −1 +x=g0(x). Sincef(0) = 1 =g(0)this implies f(x)≤g(x)for allx≥0, as claimed.

It follows that for anyi∈ {1, . . . , m}and anyβ ≥0we have E

e−βXi

≤1−βE[Xi] +β2 2 E

Xi2

≤exp

−βE[Xi] + β2 2 E

Xi2

,

where (2.2) was used again in the second inequality. This establishes the bound

(2.3) lnE

e−βXi

≤ −βE[Xi] + β2 2 E

Xi2 .

(6)

A Bound on the Deviation Probability for Sums of

Non-Negative Random Variables Andreas Maurer

Title Page Contents

JJ II

J I

Go Back Close

Quit Page6of14

J. Ineq. Pure and Appl. Math. 4(1) Art. 15, 2003

http://jipam.vu.edu.au

Using the independence of theXithis implies lnE

e−βS

= lnY

i

E e−βXi

=X

i

lnE e−βXi

≤ −βE[S] +β2 2

X

i

E Xi2

. (2.4)

Letχ be the characteristic function of[0,∞). Then for anyβ ≥ 0, x ∈ Rwe must haveχ(x)≤exp (βx)so, using (2.4),

ln Pr{E[S]−S ≥t}= lnE[χ(−t+E[S]−S)]

≤lnE[exp (β(−t+E[S]−S))]

=−βt+βE[S] + lnE e−βS

≤ −βt+ β2 2

X

i

E Xi2

.

We minimize the last expression withβ =t/P

iE[Xi2]≥0to obtain ln Pr{E[S]−S ≥t} ≤ −t2

2P

iE[Xi2], which implies (2.1).

Some immediate and obvious consequences are given in

(7)

A Bound on the Deviation Probability for Sums of

Non-Negative Random Variables Andreas Maurer

Title Page Contents

JJ II

J I

Go Back Close

Quit Page7of14

J. Ineq. Pure and Appl. Math. 4(1) Art. 15, 2003

http://jipam.vu.edu.au

Corollary 2.2. Let the{Xi}mi=1be independent random variables,E[Xi2]<∞.

SetS =P

iXiand lett >0.

1. IfXi ≤bi and setσ2i =E[Xi2]−(E[Xi])2then

Pr{S−E[S]≥t} ≤exp −t2 2P

iσi2+ 2P

i(bi−E[Xi])2

! .

2. If0≤Xi ≤bithen

Pr{E[S]−S ≥t} ≤exp

−t2 2P

ibiE[Xi]

.

3. If0≤Xi ≤bithen

Pr{E[S]−S ≥t} ≤exp

−t2 2P

ib2i

Proof. (1) follows from application of Theorem 2.1 to the random variables Yi =bi−Xi since

2X E

Yi2

= 2X E

Xi2

−E[Xi]2+E[Xi]2−2biE[Xi] +b2i

= 2X

i

σi2+ 2X

i

(bi−E[Xi])2,

while (2) is immediate from Theorem2.1and (3) follows trivially from (2).

(8)

A Bound on the Deviation Probability for Sums of

Non-Negative Random Variables Andreas Maurer

Title Page Contents

JJ II

J I

Go Back Close

Quit Page8of14

J. Ineq. Pure and Appl. Math. 4(1) Art. 15, 2003

http://jipam.vu.edu.au

3. Comparison to Other Bounds

Observe that part (3) of Corollary 2.2 is similar to the familiar Hoeffding in- equality (Theorem 2 in [7]) but weaker by a factor of 4 in the exponent. If there is information on the expectations of the Xi andE[Xi] ≤ bi/4then (2) of Corollary2.2becomes stronger than Hoeffding’s inequality. If thebi are all equal then (2) is weaker than what we get from the relative-entropy Chernoff bound (Theorem 1 in [7]).

It is natural to compare our result to Bernstein’s theorem which also requires only one-sided boundedness. We state a corresponding version of the theorem (see [1] or [10] or [9])

Theorem 3.1 (Bernstein’s Inequality). Let {Xi}mi=1 be independent random variables with Xi −E[Xi] ≤ d for alli ∈ {1, . . . , m}. Let S = P

Xi and t >0. Then, withσi2 =E[Xi2]−E[Xi]2 we have

(3.1) Pr{S−E[S]≥t} ≤exp

−t2 2P

iσ2i + 2td/3

.

Now suppose we know Xi ≤ bi for all i. In this case we can apply part (1) of Corollary 2.2. On the other hand if we setd = maxi(bi−E[Xi])then Xi −E[Xi] ≤ dfor all iand we can apply Bernstein’s theorem as well. The latter is evidently tighter than part (1) of Corollary2.2if and only if

t 3max

i (bi−E[Xi])<X

i

(bi−E[Xi])2. We introduce the abbreviationsB= maxi(bi−E[Xi]),B1 =P

i(bi −E[Xi]) and B2 = P

i(bi−E[Xi])2. Both results are trivial unlesst < B1. Assume

(9)

A Bound on the Deviation Probability for Sums of

Non-Negative Random Variables Andreas Maurer

Title Page Contents

JJ II

J I

Go Back Close

Quit Page9of14

J. Ineq. Pure and Appl. Math. 4(1) Art. 15, 2003

http://jipam.vu.edu.au

t =B1, where0< <1, then Bernstein’s theorem is stronger in the interval 0< < 3B2

B1B

,

which is never empty. The new inequality is stronger in the interval 3B2

B1B

< < 1.

The latter interval may be empty, in which case Bernstein’s inequality is stronger for all nontrivial deviations. This is clearly the case if all the bi −E[Xi]are equal, for then B2/(B1B) = 1. This happens, for example, if the Xi are identically distributed. The fact that the new inequality can be stronger in a sig- nificant range of deviations may be seen if we setE[Xi] = 0andbi = 1/ifor i∈ {1, . . . , m}, then

3B2 B1B

< π2 2Pm

i=1(1/i) →0asm → ∞.

In this case, for every given deviation , the new inequality becomes stronger for sufficiently largem.

To summarize this comparison: If the deviation is small and/or the individ- ual variables have a rather uniform behaviour, then Bernstein’s inequality is stronger, otherwise weaker than the new result. A similar analysis applies to the stronger Bennett inequality and the yet stronger Theorem 3 in [7]. In all these cases a single uniform bound on the variablesXi−E[Xi]enters into the bound on the deviation probability.

(10)

A Bound on the Deviation Probability for Sums of

Non-Negative Random Variables Andreas Maurer

Title Page Contents

JJ II

J I

Go Back Close

Quit Page10of14

J. Ineq. Pure and Appl. Math. 4(1) Art. 15, 2003

http://jipam.vu.edu.au

4. Martingales

The key to the proof of Theorem2.1lies in inequality (2.3):

X ≥0,β ≥0 =⇒lnE e−βX

≤ −βE[X] +β2 2 E

X2 .

Apart from the inequality e−x ≤ 1 − x+ (1/2)x2 (for non-negative x) its derivation uses only monotonicity, linearity and normalization of the expecta- tion value. It therefore also applies to conditional expectations.

Lemma 4.1. Let X, W be random variables, W not necessarily real valued, β ≥0.

1. IfX ≥0then lnE

e−βX|W

≤ −βE[X|W] +β2 2 E

X2|W .

2. IfX ≤bandE[X|W] = 0andE[X2|W]≤σ2then lnE

eβX|W

≤ β2

2 σ2+b2 .

Proof. To see part 1 retrace the first part of the proof of Theorem 2.1. Part 2 follows from applying part 1 toY =b−X to get

lnE

eβX|W

=βb+ lnE

e−βY|W

≤βb−βE[Y|W] +β2 2 E

Y2|W

= β2 2 E

Y2|W

= β2 2 E

X2|W +b2

.

(11)

A Bound on the Deviation Probability for Sums of

Non-Negative Random Variables Andreas Maurer

Title Page Contents

JJ II

J I

Go Back Close

Quit Page11of14

J. Ineq. Pure and Appl. Math. 4(1) Art. 15, 2003

http://jipam.vu.edu.au

Part (2) of this lemma gives a concentration inequality for martingales of one-sidedly bounded differences, with less restrictive assumptions than [5], Corollary 2.4.7.

Theorem 4.2. LetXi be random variables ,Sn = Pn

i=1Xi, S0 = 0. Suppose that bi, σi > 0and that E[Xn|Sn−1] = 0, E[Xn2|Sn−1] ≤ σn2 andXn ≤ bn, then, forβ ≥0,

(4.1) lnE

eβSn

≤ β2 2

n

X

i=1

σi2+b2i

and fort >0,

(4.2) Pr{Sn≥t} ≤exp

−t2 2Pn

i=1i2+b2i)

.

Proof. We prove (4.1) by induction onn. The casen = 1is just part (2) of the lemma withW = 0. Assume that (4.1) holds for a given value ofn. IfΣnis the σ-algebra generated bySntheneβSnisΣn-measurable, so

E

eβSn+1|Sn

=E

eβSneβXn+1|Sn

=eβSnE

eβXn+1|Sn

almost surely. Thus, lnE

eβSn+1

= lnE E

eβSn+1|Sn

= lnE

eβSnE

eβXn+1|Sn

(12)

A Bound on the Deviation Probability for Sums of

Non-Negative Random Variables Andreas Maurer

Title Page Contents

JJ II

J I

Go Back Close

Quit Page12of14

J. Ineq. Pure and Appl. Math. 4(1) Art. 15, 2003

http://jipam.vu.edu.au

≤lnE eβSn

2

2 σn+12 +b2n+1 (4.3)

≤ β2 2

n+1

X

i=1

σi2+b2i (4.4) ,

where Lemma4.1, part 2 was used to get (4.3) and the induction hypothesis was used for (4.4).

To get (4.2), we proceed as in the proof of Theorem2.1: Forβ ≥0, ln Pr{Sn ≥t} ≤lnE

eβ(Sn−t)

≤ −βt+ β2 2

n

X

i=1

σi2+b2i .

Minimizing the last expression withβ =t/P

i2+b2i)gives (4.2).

(13)

A Bound on the Deviation Probability for Sums of

Non-Negative Random Variables Andreas Maurer

Title Page Contents

JJ II

J I

Go Back Close

Quit Page13of14

J. Ineq. Pure and Appl. Math. 4(1) Art. 15, 2003

http://jipam.vu.edu.au

5. Conclusion

It remains to be seen if our inequality has any interesting practical implications.

In view of the comparison to Bernstein’s theorem this would have to be in a situation where the random variables considered have a highly non-uniform be- haviour and the deviations to which the result is applied are large. Apart from its potential utility the new inequality may have some didactical value due to its simplicity.

(14)

A Bound on the Deviation Probability for Sums of

Non-Negative Random Variables Andreas Maurer

Title Page Contents

JJ II

J I

Go Back Close

Quit Page14of14

J. Ineq. Pure and Appl. Math. 4(1) Art. 15, 2003

http://jipam.vu.edu.au

References

[1] G. BENNETT, Probability inequalities for the sum of independent random variables, J. Amer. Statist. Assoc., 57 (1962), 33–45.

[2] S. BERNSTEIN, Theory of Probability, Moscow, 1927.

[3] P. CHEBYCHEV, Sur les valeurs limites des intégrales, J. Math. Pures Appl., Ser. 2, 19 (1874), 157–160.

[4] H. CHERNOFF, A measure of asymptotic efficiency for tests of a hypoth- esis based on the sum of observations, Annals of Mathematical Statistics, 23 (1952), 493–507.

[5] A. DEMBO ANDO. ZEITOUMI, Large Deviation Techniques and Appli- cations, Springer 1998.

[6] L. DEVROYE, L. GYÖRFI AND G. LUGOSI, A Probabilistic Theory of Pattern Recognition. Springer, 1996.

[7] W. HOEFFDING, Probability inequalities for sums of bounded random variables, J. Amer. Statist. Assoc., 58 (1963), 13–30.

[8] D. McALLESTER AND L. ORTIZ, Concentration inequalities for the missing mass and for histogram rule error, NIPS, 2002.

[9] C. McDIARMID, Concentration, in Probabilistic Methods of Algorithmic Discrete Mathematics, Springer, Berlin, 1998, p. 195–248.

[10] H. WITTING AND U. MÜLLER–FUNK, Mathematische Statistik, Teub- ner Stuttgart, 1995.

参照

関連したドキュメント

A total of 148 serum samples collected during the 2005 CHIK outbreak in the Union of Comoros, which had been previously tested using the CDC ELISA (13), were used to evaluate

These findings indicate that previous overseas assignments and pre-departure willingness to work in the country of assignment are important predictors of Japanese expatriate job

Although we have shown that all three methods show promise for parallel texts such as the AWK manual in English &amp; Japanese, much work remains to be done before we can reliably

As seen above, most articles published in the Bulletin were on political trends. Therefore we do not share the opinion that a close look at the information disseminated by the

[r]

According to the analysis, classes under the charge of teachers in the high-autonomy-improvement-level group showed an increase in the group of students “who were satisfied with their

 そこで、本研究では断面的にも考慮された空間づくりに

In the study, the model experiments were conducted to measure the wave overtopping rate of the long period swell on seawall and to propose the countermeasure to be able to