
Large Deviations for the Posterior Distributions under Conjugate Prior Distributions

Takuhisa Shikimi

Abstract

This paper takes up three parametric cases (the normal, Poisson and exponential cases) in order to study a large deviation upper bound for the posterior probability of the unknown parameter when, in each case, the prior distribution is assumed to belong to a conjugate family. The upper bound is given explicitly in each case.

Keywords: large deviations; posterior distributions; exchangeability.

1. Introduction

Let $X_1, X_2, \dots$ be i.i.d. random variables with unknown distribution belonging to a statistical model $(P_\theta : \theta \in \Theta)$, where $\Theta$ is a parameter space. In this paper, we focus on exponential rates of convergence of the posterior distributions in three parametric models (the normal, Poisson and exponential statistical models) when, in each case, the prior distribution is assumed to be in a conjugate family. There is comparatively little literature on the exponential rate of convergence of posterior distributions. Fu and Kass (1988) study the rate of convergence of posterior distributions in a neighborhood of the mode. In the nonparametric Bayesian framework, Shen and Wasserman (2001) study the rate at which the posterior distribution concentrates around the true parameter, and Ganesh and O'Connell (1999) prove the large deviation principle for posterior distributions given i.i.d. random variables taking values in a finite set.

We give a large deviation upper bound in explicit form for the posterior probability of the event $[\theta, \infty)$ given $X_1, \dots, X_n$ in each of the three parametric cases. In all cases, the basic tools are the law of large numbers for exchangeable random variables (Theorem A.3) together with the conditional Markov inequality.

2. Constructing the model

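Let $(\Theta, \mathcal{U})$ be a measurable space, and let $\mathcal{B}(\mathbb{R}^n)$ denote the Borel $\sigma$-algebra of $\mathbb{R}^n$ ($n = 1, 2, \dots, \infty$). A stochastic kernel from $(\Theta, \mathcal{U})$ to $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ is a family $(P_\theta : \theta \in \Theta)$ of probability measures on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ indexed by $\theta \in \Theta$ such that for each $A \in \mathcal{B}(\mathbb{R})$ the map $\theta \mapsto P_\theta(A) \in [0, 1]$ is measurable. As usual, $(P_\theta : \theta \in \Theta)$ is referred to as a statistical model. Let $P_\theta^{(n)}$ be the $n$-dimensional product measure $P_\theta \times \cdots \times P_\theta$. The infinite product probability measure $P_\theta^{(\infty)} = P_\theta \times P_\theta \times \cdots$, $\theta \in \Theta$, is the unique probability measure on $(\mathbb{R}^\infty, \mathcal{B}(\mathbb{R}^\infty))$ such that

$$P_\theta^{(\infty)}(A_1 \times \cdots \times A_n \times \mathbb{R} \times \mathbb{R} \times \cdots) = P_\theta(A_1) \cdots P_\theta(A_n) = P_\theta^{(n)}(A_1 \times \cdots \times A_n)$$

for all $n \ge 1$ and $A_1, \dots, A_n \in \mathcal{B}(\mathbb{R})$.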

Lemma 1. For each $n = 1, 2, \dots, \infty$, the family $(P_\theta^{(n)} : \theta \in \Theta)$ is a stochastic kernel from $(\Theta, \mathcal{U})$ to $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$.

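Proof. We only show that $(P_\theta^{(\infty)} : \theta \in \Theta)$ is a stochastic kernel, since $(P_\theta^{(n)} : \theta \in \Theta)$, $1 \le n < \infty$, can be shown to be stochastic kernels in the same manner. If we define

$$\mathcal{L} = \{B \in \mathcal{B}(\mathbb{R}^\infty) : \theta \mapsto P_\theta^{(\infty)}(B) \text{ is measurable}\},$$

then $\mathcal{L}$ is a $\lambda$-class containing the $\pi$-class

$$\mathcal{D} = \{A_1 \times \cdots \times A_n \times \mathbb{R} \times \mathbb{R} \times \cdots : n \ge 1,\ A_1, \dots, A_n \in \mathcal{B}(\mathbb{R})\}$$

(on $\mathcal{D}$ the map is $\theta \mapsto P_\theta(A_1) \cdots P_\theta(A_n)$, a product of measurable functions). It follows from Dynkin's $\pi$-$\lambda$ theorem that $\mathcal{B}(\mathbb{R}^\infty) = \sigma(\mathcal{D}) \subset \mathcal{L}$. □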

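For a prior distribution $\pi$ on $(\Theta, \mathcal{U})$, define $\mathbb{P}$ to be the probability measure on $(\Omega, \mathcal{F}) \triangleq (\Theta \times \mathbb{R}^\infty, \mathcal{U} \otimes \mathcal{B}(\mathbb{R}^\infty))$ satisfying

$$\mathbb{P}(U \times B) = \int_U P_\theta^{(\infty)}(B)\, \pi(d\theta) \tag{1}$$

for every $U \in \mathcal{U}$ and $B \in \mathcal{B}(\mathbb{R}^\infty)$. It is not difficult to show the existence and uniqueness of $\mathbb{P}$. Now let us introduce the coordinate mappings $\vartheta$, $X$ and $\xi_i$ defined by

$$\vartheta(\omega) = \vartheta(\theta, x) = \theta, \tag{2}$$

$$X(\omega) = X(\theta, x) = x, \tag{3}$$

$$\xi_i(x) = x_i \quad (i \ge 1)$$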

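for $\omega = (\theta, x) \in \Omega$ and $x = (x_i) \in \mathbb{R}^\infty$. The random element $X$ is a sequence of random variables $X_1, X_2, \dots$, where $X_i \triangleq \xi_i(X)$. We think of $\vartheta$ as the unknown parameter and of $X = (X_1, X_2, \dots)$ as the data, where the distribution of $X_i$ is specified by $\vartheta$. By (2), (3) and (1), the parameter $\vartheta$ has $\pi$ as its distribution:

$$\mathbb{P}(\vartheta \in U) = \pi(U);$$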

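the distribution $\mathbb{P}(X \in dx)$ of $X$ is given by the mixture

$$\mathbb{P}(X \in B) = \int_\Theta P_\theta^{(\infty)}(B)\, \pi(d\theta), \quad B \in \mathcal{B}(\mathbb{R}^\infty); \tag{4}$$

the distribution $\mathbb{P}((X_1, \dots, X_n) \in d(x_1, \dots, x_n))$ is given by the mixture

$$\mathbb{P}((X_1, \dots, X_n) \in B_n) = \int_\Theta P_\theta^{(n)}(B_n)\, \pi(d\theta), \quad B_n \in \mathcal{B}(\mathbb{R}^n); \tag{5}$$

and the distribution $\mathbb{P}(X_i \in dx_i)$ of $X_i$ is given by the mixture

$$\mathbb{P}(X_i \in A) = \int_\Theta P_\theta(A)\, \pi(d\theta), \quad A \in \mathcal{B}(\mathbb{R}). \tag{6}$$

In particular, $X_1, X_2, \dots$ are identically distributed (but not independent in general) under $\mathbb{P}$. The distributions defined by (4), (5) and (6) are called the prior predictive distributions of $X$, $(X_1, \dots, X_n)$ and $X_i$, respectively.

Lemma 2. The function $P_{\vartheta(\omega)}^{(\infty)}(B)$, defined on $\Omega \times \mathcal{B}(\mathbb{R}^\infty)$, is a regular conditional distribution for $X = (X_1, X_2, \dots)$ given $\vartheta$. For each $n < \infty$, the function $P_{\vartheta(\omega)}^{(n)}(B_n)$, defined for $(\omega, B_n) \in \Omega \times \mathcal{B}(\mathbb{R}^n)$, is a regular conditional distribution of $(X_1, \dots, X_n)$ given $\vartheta$. Moreover, $P_{\vartheta(\omega)}(A)$, defined for $(\omega, A) \in \Omega \times \mathcal{B}(\mathbb{R})$, is a regular conditional distribution of $X_i$ given $\vartheta$ for every $i \ge 1$.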

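Proof. For each $\omega \in \Omega$, $P_{\vartheta(\omega)}^{(\infty)}$ is a probability measure on $(\mathbb{R}^\infty, \mathcal{B}(\mathbb{R}^\infty))$. If $B \in \mathcal{B}(\mathbb{R}^\infty)$ and $U \in \mathcal{U}$, then

$$\int_{\{\vartheta \in U\}} P_{\vartheta(\omega)}^{(\infty)}(B)\, \mathbb{P}(d\omega) = \int_U P_\theta^{(\infty)}(B)\, \pi(d\theta) = \mathbb{P}(U \times B) = \mathbb{P}(\vartheta \in U, X \in B).$$

Thus $P_{\vartheta(\omega)}^{(\infty)}(B)$ is a version of $\mathbb{P}(X \in B \mid \vartheta)(\omega)$, because $P_{\vartheta(\omega)}^{(\infty)}(B)$ is $\sigma(\vartheta)$-measurable as a function of $\omega$ for each $B$.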

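Likewise, $P_{\vartheta(\omega)}^{(n)}(B_n)$ and $P_{\vartheta(\omega)}(A)$ are regular conditional distributions for $(X_1, \dots, X_n)$ and $X_i$ ($i = 1, 2, \dots$), respectively, given $\vartheta$, since they are $\sigma(\vartheta)$-measurable and almost surely

$$\begin{aligned}
\mathbb{P}((X_1, \dots, X_n) \in B_n \mid \vartheta)(\omega) &= \mathbb{P}(X \in B_n \times \mathbb{R} \times \mathbb{R} \times \cdots \mid \vartheta)(\omega) \\
&= P_{\vartheta(\omega)}^{(\infty)}(B_n \times \mathbb{R} \times \mathbb{R} \times \cdots) \\
&= P_{\vartheta(\omega)}^{(n)}(B_n), \quad B_n \in \mathcal{B}(\mathbb{R}^n),
\end{aligned}$$

$$\begin{aligned}
\mathbb{P}(X_i \in A \mid \vartheta)(\omega) &= \mathbb{P}(X \in \mathbb{R} \times \cdots \times \mathbb{R} \times A \times \mathbb{R} \times \cdots \mid \vartheta)(\omega) \\
&= P_{\vartheta(\omega)}^{(\infty)}(\mathbb{R} \times \cdots \times \mathbb{R} \times A \times \mathbb{R} \times \cdots) \\
&= P_{\vartheta(\omega)}(A), \quad A \in \mathcal{B}(\mathbb{R}).
\end{aligned}$$

□

Lemma 3. The random variables $X_1, X_2, \dots$ are conditionally i.i.d. given $\vartheta$.

Proof. For all $n \ge 1$ and all $A_1, \dots, A_n \in \mathcal{B}(\mathbb{R})$,

$$\begin{aligned}
\mathbb{P}(X_1 \in A_1, \dots, X_n \in A_n \mid \vartheta)(\omega) &= P_{\vartheta(\omega)}^{(n)}(A_1 \times \cdots \times A_n) \\
&= P_{\vartheta(\omega)}(A_1) \cdots P_{\vartheta(\omega)}(A_n) \\
&= \mathbb{P}(X_1 \in A_1 \mid \vartheta)(\omega) \cdots \mathbb{P}(X_n \in A_n \mid \vartheta)(\omega) \quad \text{a.s.},
\end{aligned}$$

where the first and third equalities follow from Lemma 2. Thus $X_1, X_2, \dots$ are conditionally independent given $\vartheta$. Since $\mathbb{P}(X_i \in A \mid \vartheta)(\omega) = P_{\vartheta(\omega)}(A) = \mathbb{P}(X_1 \in A \mid \vartheta)(\omega)$ a.s. for all $i \ge 1$, the variables $X_1, X_2, \dots$ are conditionally identically distributed. □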

Real-valued random variables $Y_1, Y_2, \dots$ are exchangeable if for all $n \ge 1$ and all permutations $\tau$ of $\{1, \dots, n\}$

$$(Y_1, \dots, Y_n) \overset{d}{=} (Y_{\tau(1)}, \dots, Y_{\tau(n)}). \tag{7}$$

Here $\overset{d}{=}$ stands for equality in distribution. De Finetti's theorem states that random variables $Y_1, Y_2, \dots$ are conditionally i.i.d. given some sub-$\sigma$-algebra if and only if they are exchangeable. By Lemma 3, $X_1, X_2, \dots$ are therefore exchangeable random variables. See Aldous (1982) for an abstract version of de Finetti's theorem.
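As a quick numerical illustration of "identically distributed but not independent", the following sketch simulates the hierarchical construction of this section in the normal-mean setting of Section 4 (prior $N(\mu, \sigma^2)$ for $\vartheta$, and $X_i \mid \vartheta$ i.i.d. $N(\vartheta, 1)$). Under the mixture $\mathbb{P}$ one has $\mathrm{Cov}(X_1, X_2) = \mathrm{Var}(\vartheta) = \sigma^2$, so the coordinates are correlated even though they are conditionally independent given $\vartheta$. The function names, parameter values, sample size and seed are illustrative assumptions.

```python
import random


def sample_pair(mu: float, sigma: float, rng: random.Random) -> tuple:
    """Draw theta ~ N(mu, sigma^2), then X1, X2 i.i.d. N(theta, 1) given theta."""
    theta = rng.gauss(mu, sigma)
    return rng.gauss(theta, 1.0), rng.gauss(theta, 1.0)


def mixture_covariance(mu: float = 0.0, sigma: float = 2.0,
                       n_draws: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of Cov(X1, X2) under the prior predictive mixture."""
    rng = random.Random(seed)
    pairs = [sample_pair(mu, sigma, rng) for _ in range(n_draws)]
    m1 = sum(a for a, _ in pairs) / n_draws
    m2 = sum(b for _, b in pairs) / n_draws
    # Unbiased sample covariance of the two coordinates.
    return sum((a - m1) * (b - m2) for a, b in pairs) / (n_draws - 1)


if __name__ == "__main__":
    # Expected to be close to sigma^2 = 4 rather than 0.
    print(mixture_covariance())
```

Shrinking $\sigma$ toward $0$ makes the prior concentrate at a point and the estimated covariance shrinks accordingly, recovering the i.i.d. case.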

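In what follows, we assume that $\Theta$ is a complete separable metric space, that is, a Polish space, and that $\mathcal{U}$ is its Borel $\sigma$-algebra. Accordingly, there exists a regular conditional distribution of $\vartheta$ given $X_1, \dots, X_n$ for all $n \ge 1$, which is termed a posterior distribution of $\vartheta$ given $X_1, \dots, X_n$ and denoted by $\pi_n^\omega(U)$, $(\omega, U) \in \Omega \times \mathcal{U}$. More precisely, there exists a function $\pi_n^\omega(U)$ on $\Omega \times \mathcal{U}$ such that

(a) for each $\omega \in \Omega$, $\pi_n^\omega(\cdot)$ is a probability measure on $(\Theta, \mathcal{U})$;

(b) for each $U \in \mathcal{U}$, $\pi_n^\omega(U)$ is a version of $\mathbb{P}(\vartheta \in U \mid X_1, \dots, X_n)(\omega)$.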

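Suppose that the statistical model $(P_\theta : \theta \in \Theta)$ is dominated by a $\sigma$-finite measure $\nu$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ with density function $f(x \mid \theta)$, $x \in \mathbb{R}$. We assume that $f(x \mid \theta)$ is measurable as a function of $(\theta, x) \in \Theta \times \mathbb{R}$. The marginal distribution $\mathbb{P}((X_1, \dots, X_n) \in d(x_1, \dots, x_n))$ of $(X_1, \dots, X_n)$ has the marginal density function

$$f_n(x_1, \dots, x_n) = \int_\Theta \prod_{i=1}^n f(x_i \mid \theta)\, \pi(d\theta)$$

with respect to $\nu^{(n)}$ (the $n$-fold product measure of $\nu$), i.e.,

$$\mathbb{P}((X_1, \dots, X_n) \in B_n) = \int_{B_n} f_n(x_1, \dots, x_n)\, \nu^{(n)}(d(x_1, \dots, x_n)), \quad B_n \in \mathcal{B}(\mathbb{R}^n).$$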

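This can be seen from the computation

$$\begin{aligned}
\mathbb{P}(X_1 \in A_1, \dots, X_n \in A_n) &= \int_\Theta P_\theta^{(n)}(A_1 \times \cdots \times A_n)\, \pi(d\theta) \\
&= \int_\Theta P_\theta(A_1) \cdots P_\theta(A_n)\, \pi(d\theta) \\
&= \int_\Theta \left[ \int_{A_1} f(x_1 \mid \theta)\, \nu(dx_1) \cdots \int_{A_n} f(x_n \mid \theta)\, \nu(dx_n) \right] \pi(d\theta) \\
&= \int_\Theta \int_{A_1 \times \cdots \times A_n} \prod_{i=1}^n f(x_i \mid \theta)\, \nu^{(n)}(d(x_1, \dots, x_n))\, \pi(d\theta) \\
&= \int_{A_1 \times \cdots \times A_n} \int_\Theta \prod_{i=1}^n f(x_i \mid \theta)\, \pi(d\theta)\, \nu^{(n)}(d(x_1, \dots, x_n)) \\
&= \int_{A_1 \times \cdots \times A_n} f_n(x_1, \dots, x_n)\, \nu^{(n)}(d(x_1, \dots, x_n)),
\end{aligned}$$

where the interchange of integrals is justified by Tonelli's theorem. Since the rectangles $A_1 \times \cdots \times A_n$ form a $\pi$-class generating $\mathcal{B}(\mathbb{R}^n)$, the two probability measures agree on all of $\mathcal{B}(\mathbb{R}^n)$. Note that $\mathbb{P}(f_n(X_1, \dots, X_n) = 0) = \int_{\{f_n = 0\}} f_n\, d\nu^{(n)} = 0$.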

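Lemma 4. If the statistical model $(P_\theta : \theta \in \Theta)$ is dominated by a $\sigma$-finite measure $\nu$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ with density $f(x \mid \theta)$, a measurable function on $\Theta \times \mathbb{R}$, then

$$\pi_n^\omega(U) \triangleq \left[ \int_U \frac{\prod_{i=1}^n f(X_i \mid \theta)}{f_n(X_1, \dots, X_n)}\, \pi(d\theta) \right] 1_{\{f_n > 0\}}(X_1, \dots, X_n) + \pi(U)\, 1_{\{f_n = 0\}}(X_1, \dots, X_n)$$

is a posterior distribution of $\vartheta$ given $X_1, \dots, X_n$.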

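Proof. It is easily seen that for each $\omega$, $\pi_n^\omega(\cdot)$ is a probability measure on $(\Theta, \mathcal{U})$ and that for each $U \in \mathcal{U}$, $\pi_n^\omega(U)$ is $\sigma(X_1, \dots, X_n)$-measurable. Thus it suffices to show that $\pi_n^\omega(U) = \mathbb{P}(\vartheta \in U \mid X_1, \dots, X_n)(\omega)$ a.s., that is, that for every $B_n \in \mathcal{B}(\mathbb{R}^n)$ the integral of $\pi_n^\omega(U)$ over $\{(X_1, \dots, X_n) \in B_n\}$ equals $\mathbb{P}(\vartheta \in U, (X_1, \dots, X_n) \in B_n)$. Since $\mathbb{P}(f_n(X_1, \dots, X_n) = 0) = 0$, this can be shown in the following way:

$$\begin{aligned}
\int_{\{(X_1, \dots, X_n) \in B_n\}} \pi_n^\omega(U)\, d\mathbb{P}
&= \int_{\{(X_1, \dots, X_n) \in B_n,\ f_n(X_1, \dots, X_n) > 0\}} \left[ \int_U \frac{\prod_{i=1}^n f(X_i \mid \theta)}{f_n(X_1, \dots, X_n)}\, \pi(d\theta) \right] d\mathbb{P} \\
&= \int_U \left[ \int_{\{(X_1, \dots, X_n) \in B_n,\ f_n(X_1, \dots, X_n) > 0\}} \frac{\prod_{i=1}^n f(X_i \mid \theta)}{f_n(X_1, \dots, X_n)}\, d\mathbb{P} \right] \pi(d\theta) \\
&= \int_U \left[ \int_{B_n \cap \{f_n > 0\}} \frac{\prod_{i=1}^n f(x_i \mid \theta)}{f_n(x_1, \dots, x_n)}\, f_n(x_1, \dots, x_n)\, \nu^{(n)}(d(x_1, \dots, x_n)) \right] \pi(d\theta) \\
&= \int_U \left[ \int_{B_n \cap \{f_n > 0\}} \prod_{i=1}^n f(x_i \mid \theta)\, \nu^{(n)}(d(x_1, \dots, x_n)) \right] \pi(d\theta) \\
&= \int_U P_\theta^{(n)}(B_n \cap \{f_n > 0\})\, \pi(d\theta) \\
&= \mathbb{P}(\vartheta \in U,\ (X_1, \dots, X_n) \in B_n,\ f_n(X_1, \dots, X_n) > 0) \\
&= \mathbb{P}(\vartheta \in U,\ (X_1, \dots, X_n) \in B_n,\ f_n(X_1, \dots, X_n) > 0) + \mathbb{P}(\vartheta \in U,\ (X_1, \dots, X_n) \in B_n,\ f_n(X_1, \dots, X_n) = 0) \\
&= \mathbb{P}(\vartheta \in U,\ (X_1, \dots, X_n) \in B_n).
\end{aligned}$$

Here the second equality is Tonelli's theorem, the third uses the fact that $(X_1, \dots, X_n)$ has density $f_n$ with respect to $\nu^{(n)}$, and the term added in the second-to-last step vanishes because $\mathbb{P}(f_n(X_1, \dots, X_n) = 0) = 0$. □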
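The formula of Lemma 4 can be checked numerically. The sketch below is a minimal illustration assuming the Poisson model with a gamma prior treated in Section 5: it evaluates prior times likelihood on a grid, normalizes by a numerical approximation of (a multiple of) $f_n$, and compares the resulting posterior mean and variance with those of the conjugate gamma posterior with shape $\alpha + \sum x_i$ and rate $\beta + n$. The grid size, the upper cut-off, the data and all function names are illustrative assumptions.

```python
import math


def log_gamma_prior(theta: float, alpha: float, beta: float) -> float:
    """Log density of the gamma prior with shape alpha and rate beta."""
    return (alpha * math.log(beta) - math.lgamma(alpha)
            + (alpha - 1.0) * math.log(theta) - beta * theta)


def log_poisson_lik(theta: float, n: int, s: int, c: float) -> float:
    """sum_i log f(x_i | theta) for f(x | theta) = e^{-theta} theta^x / x!,
    written in terms of n, s = sum x_i and c = sum log(x_i!)."""
    return -n * theta + s * math.log(theta) - c


def grid_posterior_moments(xs, alpha, beta, upper=30.0, m=20_000):
    """Posterior mean and variance from Lemma 4's formula, midpoint rule on (0, upper)."""
    n, s = len(xs), sum(xs)
    c = sum(math.lgamma(x + 1) for x in xs)
    h = upper / m
    grid = [h * (k + 0.5) for k in range(m)]
    logw = [log_gamma_prior(t, alpha, beta) + log_poisson_lik(t, n, s, c) for t in grid]
    top = max(logw)                       # subtract the max to avoid overflow
    w = [math.exp(v - top) for v in logw]
    z = sum(w)                            # proportional to f_n(x_1, ..., x_n)
    mean = sum(t * wk for t, wk in zip(grid, w)) / z
    var = sum((t - mean) ** 2 * wk for t, wk in zip(grid, w)) / z
    return mean, var


if __name__ == "__main__":
    xs, alpha, beta = [3, 1, 4, 1, 5, 9, 2, 6], 2.0, 1.0
    a_n, b_n = alpha + sum(xs), beta + len(xs)   # conjugate gamma posterior
    print(grid_posterior_moments(xs, alpha, beta), (a_n / b_n, a_n / b_n ** 2))
```

Working on the log scale and subtracting the maximum before exponentiating keeps the computation stable even when $\prod_i f(x_i \mid \theta)$ is extremely small.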


3. The large deviation principle

Let $S$ be a Polish space equipped with the Borel $\sigma$-algebra $\mathcal{B}(S)$. A function $I : S \to [0, \infty]$ is a rate function if for each $M < \infty$ the level set $\{x \in S : I(x) \le M\}$ is a compact subset of $S$. A rate function is necessarily lower semicontinuous, that is, a function with closed level sets. A family $(Q_n)$ of probability measures on $S$ is said to satisfy the large deviation principle with rate function $I$ if for each closed $F \subset S$

$$\limsup_{n \to \infty} \frac{1}{n} \log Q_n(F) \le -\inf_{x \in F} I(x)$$

and for each open $G \subset S$

$$\liminf_{n \to \infty} \frac{1}{n} \log Q_n(G) \ge -\inf_{x \in G} I(x).$$

Large deviation theory focuses on probability measures $Q_n$ for which $Q_n(A)$ converges to $0$ exponentially fast for a class of events $A$. The exponential decay rate of $Q_n(A)$ is characterized in terms of the rate function defined above.
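A standard example is Cramér's theorem: for i.i.d. $N(0, 1)$ variables, the laws $Q_n$ of the sample mean satisfy the large deviation principle with rate function $I(x) = x^2/2$. Since the sample mean is exactly $N(0, 1/n)$, the sketch below computes $\frac{1}{n} \log Q_n[a, \infty)$ in closed form with `math.erfc` and shows it approaching $-\inf_{x \ge a} I(x) = -a^2/2$. The value $a = 0.5$, the values of $n$ and the function names are illustrative choices.

```python
import math


def log_tail_sample_mean(a: float, n: int) -> float:
    """log P(Xbar_n >= a) when X_1, ..., X_n are i.i.d. N(0, 1), so Xbar_n ~ N(0, 1/n)."""
    z = a * math.sqrt(n)
    return math.log(0.5 * math.erfc(z / math.sqrt(2.0)))


def normalized_log_tail(a: float, n: int) -> float:
    """(1/n) log Q_n[a, infinity), which tends to -a^2 / 2 as n grows."""
    return log_tail_sample_mean(a, n) / n


if __name__ == "__main__":
    a = 0.5
    for n in (10, 100, 1000):
        print(n, normalized_log_tail(a, n), -a * a / 2)
```

The gap to $-a^2/2$ shrinks roughly like $\log n / n$, reflecting the subexponential prefactor that the large deviation principle ignores.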

General treatments of the theory of large deviations and a wide variety of applications may be found in Dembo and Zeitouni (1998) and Deuschel and Stroock (2000).

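In an analogous way, let us define the large deviation principle for regular conditional distributions. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space and $(\mathcal{F}_n)$ a filtration of sub-$\sigma$-algebras of $\mathcal{F}$. We define a function $I : \Omega \times S \to [0, \infty]$ to be a rate function if for each $\omega \in \Omega$, $I(\omega, \cdot)$ is a rate function on $S$.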

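Definition 5. Suppose that $Q_n^\omega(B)$, $n \ge 1$, is a family of regular conditional distributions for a random variable taking values in $S$ given $\mathcal{F}_n$. We say that $Q_n^\omega(B)$, $n \ge 1$, satisfies the large deviation principle with rate function $I$ if for each closed set $F$ of $S$

$$\limsup_{n \to \infty} \frac{1}{n} \log Q_n^\omega(F) \le -\inf_{x \in F} I(\omega, x) \quad \text{a.s.} \tag{8}$$

and for each open set $G$ of $S$

$$\liminf_{n \to \infty} \frac{1}{n} \log Q_n^\omega(G) \ge -\inf_{x \in G} I(\omega, x) \quad \text{a.s.}$$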

In this paper we restrict ourselves to the analysis of the large deviation upper bound (8) for the posterior distributions of $\vartheta$ given $X_1, \dots, X_n$. We examine the posterior distributions $\pi_n^\omega$ given $X_1, \dots, X_n$ in the normal, Poisson and exponential cases and give the large deviation upper bound (8) explicitly for the posterior probability of the closed set $[\theta, \infty)$ in each case.

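4. The normal case

Suppose that

$$P_\theta(dx) = f(x \mid \theta)\, dx \triangleq \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{(x - \theta)^2}{2} \right) dx, \quad \theta \in \Theta \triangleq \mathbb{R},$$

and assume that the prior distribution for the normal mean $\vartheta$ is the conjugate distribution

$$\pi(d\theta) \triangleq \frac{1}{\sqrt{2\pi}\, \sigma} \exp\left( -\frac{(\theta - \mu)^2}{2\sigma^2} \right) d\theta, \quad \sigma > 0,\ \mu \in \mathbb{R}.$$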

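It follows from Lemma 4 that the posterior distribution of $\vartheta$ given $X_1, \dots, X_n$ is given by

$$\pi_n^\omega(d\theta) = \frac{\prod_{i=1}^n f(X_i \mid \theta)}{f_n(X_1, \dots, X_n)}\, \pi(d\theta) = \frac{1}{\sqrt{2\pi}\, \sigma_n} \exp\left[ -\frac{(\theta - \mu_n(X_1, \dots, X_n))^2}{2\sigma_n^2} \right] d\theta,$$

where $\mu_n = \mu_n(x_1, \dots, x_n)$ and $\sigma_n^2$ are defined by

$$\mu_n(x_1, \dots, x_n) = \left( \frac{1}{1 + n\sigma^2} \right) \mu + \left( \frac{n\sigma^2}{1 + n\sigma^2} \right) \bar{x}_n, \qquad \bar{x}_n = \frac{x_1 + \cdots + x_n}{n},$$

$$\sigma_n^2 = \frac{\sigma^2}{1 + n\sigma^2}.$$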
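The update $(\mu, \sigma^2) \mapsto (\mu_n, \sigma_n^2)$ is simple to code. The following sketch (function and argument names are illustrative) computes $\mu_n$ and $\sigma_n^2$ exactly as defined above and cross-checks them against the equivalent precision-weighted form $\mu_n = (\mu/\sigma^2 + n\bar{x}_n)/(1/\sigma^2 + n)$, $\sigma_n^2 = 1/(1/\sigma^2 + n)$. It also shows $n\sigma_n^2 \to 1$, the limit needed in the proof of Theorem 6.

```python
def normal_posterior(xs, mu: float, sigma2: float):
    """Posterior N(mu_n, sigma_n^2) of the mean when X_i | theta ~ N(theta, 1)
    and the prior is N(mu, sigma2)."""
    n = len(xs)
    xbar = sum(xs) / n
    denom = 1.0 + n * sigma2
    mu_n = mu / denom + (n * sigma2 / denom) * xbar
    sigma2_n = sigma2 / denom
    return mu_n, sigma2_n


def normal_posterior_precision_form(xs, mu: float, sigma2: float):
    """Same posterior written with precisions: prior precision 1/sigma2, data precision n."""
    n = len(xs)
    prec = 1.0 / sigma2 + n
    return (mu / sigma2 + sum(xs)) / prec, 1.0 / prec


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0]
    print(normal_posterior(xs, 0.0, 1.0))                 # (1.5, 0.25)
    print(normal_posterior_precision_form(xs, 0.0, 1.0))  # (1.5, 0.25)
    n = 10 ** 6
    print(n * normal_posterior([0.0] * n, 0.0, 1.0)[1])   # n * sigma_n^2, close to 1
```

The precision form makes the weighting explicit: the prior contributes precision $1/\sigma^2$ and each observation contributes precision $1$.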

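Theorem 6. For each $\theta \in \Theta$,

$$\limsup_{n \to \infty} \frac{1}{n} \log \pi_n^\omega[\theta, \infty) \le -\frac{(\theta - \vartheta(\omega))^2}{2} \quad \text{on } \{\omega : \theta > \vartheta(\omega)\} \text{ a.s.}$$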

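Proof. By Markov's inequality for conditional expectations, for all $t > 0$,

$$\begin{aligned}
\pi_n^\omega[\theta, \infty) &= \mathbb{P}(\{\omega' : \vartheta(\omega') \ge \theta\} \mid X_1, \dots, X_n)(\omega) \\
&= \mathbb{P}(\{\omega' : e^{nt\vartheta(\omega')} \ge e^{nt\theta}\} \mid X_1, \dots, X_n)(\omega) \\
&\le e^{-nt\theta}\, \mathbb{E}(e^{nt\vartheta} \mid X_1, \dots, X_n)(\omega) \\
&= e^{-nt\theta} \exp\left[ \mu_n(X_1, \dots, X_n)\, nt + \frac{\sigma_n^2 n^2 t^2}{2} \right] \quad \text{a.s.},
\end{aligned}$$

where the last equality is the moment generating function of the posterior $N(\mu_n, \sigma_n^2)$, so that

$$\frac{1}{n} \log \pi_n^\omega[\theta, \infty) \le -t\theta + \mu_n(X_1, \dots, X_n)\, t + \frac{n\sigma_n^2 t^2}{2}.$$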

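Since $\bar{X}_n \to \mathbb{E}(X_1 \mid \vartheta) = \vartheta$ a.s. by Theorem A.3 and Lemma A.1, we have $\mu_n(X_1, \dots, X_n) \to \vartheta$ a.s.; moreover $n\sigma_n^2 = n\sigma^2/(1 + n\sigma^2) \to 1$. Hence

$$\limsup_{n \to \infty} \frac{1}{n} \log \pi_n^\omega[\theta, \infty) \le -t\theta + \vartheta(\omega)\, t + \frac{t^2}{2}.$$

Since $t > 0$ is arbitrary,

$$\limsup_{n \to \infty} \frac{1}{n} \log \pi_n^\omega[\theta, \infty) \le \inf_{t > 0} \left[ -t\theta + \vartheta(\omega)\, t + \frac{t^2}{2} \right] = -\frac{(\theta - \vartheta(\omega))^2}{2} \quad \text{on } \{\omega : \theta > \vartheta(\omega)\} \text{ a.s.}, \tag{9}$$

the infimum being attained at $t = \theta - \vartheta(\omega) > 0$. □

In the same manner, it follows that

$$\limsup_{n \to \infty} \frac{1}{n} \log \pi_n^\omega(-\infty, \theta] \le -\frac{(\theta - \vartheta(\omega))^2}{2} \quad \text{on } \{\omega : \theta < \vartheta(\omega)\} \text{ a.s.}$$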
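Because the posterior here is normal, $\pi_n^\omega[\theta, \infty)$ can be computed exactly, which gives a quick sanity check of Theorem 6. The sketch below simulates data for a fixed "true" value of $\vartheta(\omega)$ (an illustrative assumption, as are the prior parameters, $\theta$, $n$, the seed and the function name), evaluates the posterior tail with `math.erfc`, and compares $\frac{1}{n} \log \pi_n^\omega[\theta, \infty)$ with the bound $-(\theta - \vartheta(\omega))^2/2$. In this Gaussian example the two quantities come out close for large $n$.

```python
import math
import random


def posterior_tail_rate(vartheta: float, theta: float, n: int,
                        mu: float = 0.0, sigma2: float = 1.0, seed: int = 0) -> float:
    """(1/n) log pi_n[theta, inf) for simulated X_1..X_n ~ N(vartheta, 1), prior N(mu, sigma2)."""
    rng = random.Random(seed)
    xs = [rng.gauss(vartheta, 1.0) for _ in range(n)]
    denom = 1.0 + n * sigma2
    mu_n = mu / denom + (n * sigma2 / denom) * (sum(xs) / n)
    sd_n = math.sqrt(sigma2 / denom)
    # Posterior is N(mu_n, sd_n^2); its upper tail via the complementary error function.
    tail = 0.5 * math.erfc((theta - mu_n) / (sd_n * math.sqrt(2.0)))
    return math.log(tail) / n


if __name__ == "__main__":
    vartheta, theta = 0.0, 0.5
    for n in (100, 400, 1600):
        print(n, posterior_tail_rate(vartheta, theta, n), -(theta - vartheta) ** 2 / 2)
```

For much larger $n$ the tail underflows to zero in double precision, so a log-scale tail approximation would be needed there.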

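In Theorem 6 the rate function $I(\omega, \theta')$, $(\omega, \theta') \in \Omega \times \Theta$, is

$$I(\omega, \theta') = \frac{(\theta' - \vartheta(\omega))^2}{2} = K(\vartheta(\omega), \theta'),$$

where $K(\theta_1, \theta_2)$ is the Kullback-Leibler divergence

$$K(\theta_1, \theta_2) = \int_{-\infty}^{\infty} \log \frac{f(x \mid \theta_1)}{f(x \mid \theta_2)}\, f(x \mid \theta_1)\, dx = \frac{(\theta_1 - \theta_2)^2}{2}.$$

If $\theta > \vartheta(\omega)$, then

$$\frac{(\theta - \vartheta(\omega))^2}{2} = \inf_{\theta' \ge \theta} I(\omega, \theta'),$$

and so the large deviation upper bound (9) can be rewritten in terms of the rate function $I(\omega, \theta')$ as

$$\limsup_{n \to \infty} \frac{1}{n} \log \pi_n^\omega[\theta, \infty) \le -\inf_{\theta' \ge \theta} I(\omega, \theta') \quad \text{on } \{\omega : \theta > \vartheta(\omega)\} \text{ a.s.}$$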
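The closed form $K(\theta_1, \theta_2) = (\theta_1 - \theta_2)^2/2$ can be confirmed by direct numerical integration of the defining integral. The following sketch uses a midpoint rule on a truncated range; the truncation $[-40, 40]$, the number of nodes and the function names are illustrative choices.

```python
import math


def normal_pdf(x: float, mean: float) -> float:
    """Density f(x | mean) of N(mean, 1)."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def kl_numeric(t1: float, t2: float, lo: float = -40.0, hi: float = 40.0,
               m: int = 100_000) -> float:
    """Midpoint-rule approximation of K(t1, t2) = int log(f(x|t1)/f(x|t2)) f(x|t1) dx."""
    h = (hi - lo) / m
    total = 0.0
    for k in range(m):
        x = lo + (k + 0.5) * h
        # log(f(x|t1) / f(x|t2)) simplifies to ((x - t2)^2 - (x - t1)^2) / 2
        log_ratio = ((x - t2) ** 2 - (x - t1) ** 2) / 2.0
        total += log_ratio * normal_pdf(x, t1) * h
    return total


if __name__ == "__main__":
    print(kl_numeric(1.0, -0.5), (1.0 - (-0.5)) ** 2 / 2)   # both approximately 1.125
```

Computing the log-ratio algebraically, rather than dividing two densities, avoids $0/0$ far in the tails where both densities underflow.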

We now turn to the case where the samples are drawn from the normal distribution with mean $0$ and unknown precision. The precision is the reciprocal of the variance. Accordingly, we assume that

$$P_\theta(dx) \triangleq \left( \frac{\theta}{2\pi} \right)^{1/2} \exp\left( -\frac{\theta x^2}{2} \right) dx, \quad \theta \in \Theta \triangleq (0, \infty).$$

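If the prior distribution $\pi$ is specified by

$$\pi(d\theta) = \frac{\beta^\alpha}{\Gamma(\alpha)}\, \theta^{\alpha - 1} e^{-\beta\theta}\, 1_{(0, \infty)}(\theta)\, d\theta, \quad \alpha > 0,\ \beta > 0,$$

which is a gamma distribution with parameters $\alpha$ and $\beta$, then the posterior distribution of $\vartheta$ given $X_1, \dots, X_n$ is a gamma distribution with parameters

$$\alpha_n = \alpha + \frac{n}{2} \quad \text{and} \quad \beta_n = \beta_n(X_1, \dots, X_n) = \beta + \frac{1}{2} \sum_{i=1}^n X_i^2.$$

Theorem A.3 together with Lemma A.1 entails the convergence

$$\frac{\beta_n}{n} \to \frac{1}{2}\, \mathbb{E}(X_1^2 \mid \vartheta) = \frac{1}{2\vartheta} \quad \text{a.s.}$$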


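Theorem 7. For each $\theta \in \Theta$,

$$\limsup_{n \to \infty} \frac{1}{n} \log \pi_n^\omega[\theta, \infty) \le -\frac{1}{2\vartheta(\omega)} \left( \theta - \vartheta(\omega) - \vartheta(\omega) \log \frac{\theta}{\vartheta(\omega)} \right) \quad \text{on } \{\omega : \theta > \vartheta(\omega)\} \text{ a.s.}$$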

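Proof. For almost all $\omega \in \{\theta > \vartheta\}$ and $t \in (0, 1/(2\vartheta(\omega)))$, there is an $n_0$ such that $\beta_n/(\beta_n - nt) > 0$ for all $n \ge n_0$, since

$$\frac{\beta_n}{\beta_n - nt} = \frac{\beta_n/n}{\beta_n/n - t} \to \frac{1/(2\vartheta(\omega))}{1/(2\vartheta(\omega)) - t} = \frac{1}{1 - 2\vartheta(\omega)\, t}.$$

By Markov's inequality and the moment generating function of the gamma posterior, for all $n \ge n_0$

$$\frac{1}{n} \log \pi_n^\omega[\theta, \infty) \le -t\theta + \frac{1}{n} \log \mathbb{E}(e^{nt\vartheta} \mid X_1, \dots, X_n)(\omega) = -t\theta + \frac{\alpha_n}{n} \log \left( \frac{\beta_n(X_1, \dots, X_n)}{\beta_n(X_1, \dots, X_n) - nt} \right).$$

Since $\alpha_n/n \to 1/2$, it follows that

$$\limsup_{n \to \infty} \frac{1}{n} \log \pi_n^\omega[\theta, \infty) \le -t\theta + \frac{1}{2} \log \left( \frac{1}{1 - 2\vartheta(\omega)\, t} \right)$$

for almost all $\omega \in \{\theta > \vartheta\}$ and $t \in (0, 1/(2\vartheta(\omega)))$. Now we obtain

$$\limsup_{n \to \infty} \frac{1}{n} \log \pi_n^\omega[\theta, \infty) \le \inf_{0 < t < 1/(2\vartheta(\omega))} \left[ -t\theta + \frac{1}{2} \log \left( \frac{1}{1 - 2\vartheta(\omega)\, t} \right) \right] = -\frac{1}{2\vartheta(\omega)} \left( \theta - \vartheta(\omega) - \vartheta(\omega) \log \frac{\theta}{\vartheta(\omega)} \right) \quad \text{on } \{\theta > \vartheta\} \text{ a.s.},$$

the infimum being attained at $t = (\theta - \vartheta(\omega))/(2\theta\vartheta(\omega))$. □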
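The value of the infimum over $t \in (0, 1/(2\vartheta(\omega)))$ can be checked by brute-force minimization on a grid. The sketch below does this for a few illustrative pairs $(\theta, \vartheta(\omega))$ with $\theta > \vartheta(\omega)$ (the names and grid size are assumptions). It also checks that the bound equals $-\frac{1}{2}(r - 1 - \log r)$ with $r = \theta/\vartheta(\omega)$, which is minus the Kullback-Leibler divergence of $N(0, 1/\theta)$ from $N(0, 1/\vartheta(\omega))$, in line with the normal-mean case.

```python
import math


def objective(t: float, theta: float, vt: float) -> float:
    """-t*theta + (1/2) log(1 / (1 - 2*vt*t)) for 0 < t < 1/(2*vt); vt = vartheta(omega)."""
    return -t * theta - 0.5 * math.log(1.0 - 2.0 * vt * t)


def grid_infimum(theta: float, vt: float, m: int = 200_000) -> float:
    """Minimum of the objective over the grid t_k = k/(2*vt*m), k = 1..m-1."""
    upper = 1.0 / (2.0 * vt)
    return min(objective(upper * k / m, theta, vt) for k in range(1, m))


def closed_form(theta: float, vt: float) -> float:
    """-(1/(2*vt)) * (theta - vt - vt*log(theta/vt))."""
    return -(theta - vt - vt * math.log(theta / vt)) / (2.0 * vt)


if __name__ == "__main__":
    print(grid_infimum(3.0, 2.0), closed_form(3.0, 2.0))
```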

5. The Poisson case

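Let $\nu_0$ be the counting measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ and define $\nu(A) = \nu_0(A \cap \{0, 1, \dots\})$, $A \in \mathcal{B}(\mathbb{R})$. Then $\nu$ is a $\sigma$-finite measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. If

$$P_\theta(dx) = f(x \mid \theta)\, \nu(dx) \triangleq \frac{e^{-\theta} \theta^x}{x!}\, \nu(dx), \quad \theta \in \Theta = (0, \infty),$$

and the prior distribution $\pi$ is a gamma distribution with parameters $\alpha$ and $\beta$, then the posterior distribution of $\vartheta$ given $X_1, \dots, X_n$ is a gamma distribution with parameters $\alpha_n = \alpha_n(X_1, \dots, X_n)$ and $\beta_n$. Here we define

$$\alpha_n = \alpha_n(x_1, \dots, x_n) = \alpha + \sum_{i=1}^n x_i, \qquad \beta_n = \beta + n.$$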

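Theorem 8. For each $\theta \in \Theta$,

$$\limsup_{n \to \infty} \frac{1}{n} \log \pi_n^\omega[\theta, \infty) \le -(\theta - \vartheta(\omega)) + \vartheta(\omega) \log \frac{\theta}{\vartheta(\omega)} \quad \text{on } \{\omega : \theta > \vartheta(\omega)\} \text{ a.s.}$$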

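Proof. For all $t \in (0, 1)$, Markov's inequality yields

$$\pi_n^\omega[\theta, \infty) = \mathbb{P}(\{\omega' : \vartheta(\omega') \ge \theta\} \mid X_1, \dots, X_n)(\omega) \le e^{-nt\theta}\, \mathbb{E}(e^{nt\vartheta} \mid X_1, \dots, X_n)(\omega) = e^{-nt\theta} \left( \frac{\beta_n}{\beta_n - nt} \right)^{\alpha_n(X_1, \dots, X_n)} \quad \text{a.s.},$$

and hence for all $t \in (0, 1)$

$$\begin{aligned}
\limsup_{n \to \infty} \frac{1}{n} \log \pi_n^\omega[\theta, \infty) &\le -\theta t + \lim_{n \to \infty} \frac{\alpha_n(X_1, \dots, X_n)}{n} \log \left( \frac{\beta_n}{\beta_n - nt} \right) \\
&= -\theta t + \mathbb{E}(X_1 \mid \vartheta)(\omega) \log \left( \frac{1}{1 - t} \right) \\
&= -\theta t + \vartheta(\omega) \log \left( \frac{1}{1 - t} \right) \quad \text{a.s.}
\end{aligned}$$

Thus on $\{\omega : \theta > \vartheta(\omega)\}$

$$\limsup_{n \to \infty} \frac{1}{n} \log \pi_n^\omega[\theta, \infty) \le \inf_{0 < t < 1} \left[ -\theta t + \vartheta(\omega) \log \left( \frac{1}{1 - t} \right) \right] = -(\theta - \vartheta(\omega)) + \vartheta(\omega) \log \frac{\theta}{\vartheta(\omega)} \quad \text{a.s.},$$

the infimum being attained at $t = 1 - \vartheta(\omega)/\theta$. □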
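The final equality can be checked numerically. The sketch below minimizes $-\theta t + \vartheta(\omega) \log(1/(1 - t))$ over a fine grid in $(0, 1)$ and compares the result with $-(\theta - \vartheta(\omega)) + \vartheta(\omega) \log(\theta/\vartheta(\omega))$ for a few illustrative parameter pairs with $\theta > \vartheta(\omega)$; the function names and grid size are assumptions.

```python
import math


def poisson_objective(t: float, theta: float, vt: float) -> float:
    """-theta*t + vt*log(1/(1 - t)) for 0 < t < 1; vt stands for vartheta(omega)."""
    return -theta * t - vt * math.log(1.0 - t)


def poisson_grid_infimum(theta: float, vt: float, m: int = 100_000) -> float:
    """Minimum of the objective over the grid t_k = k/m, k = 1..m-1."""
    return min(poisson_objective(k / m, theta, vt) for k in range(1, m))


def poisson_bound(theta: float, vt: float) -> float:
    """-(theta - vt) + vt*log(theta/vt), attained at t = 1 - vt/theta."""
    return -(theta - vt) + vt * math.log(theta / vt)


if __name__ == "__main__":
    print(poisson_grid_infimum(3.0, 2.0), poisson_bound(3.0, 2.0))
```

The bound is minus the Kullback-Leibler divergence of the Poisson law with mean $\theta$ from the one with mean $\vartheta(\omega)$, mirroring the normal case.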


6. The exponential case

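Suppose that $\Theta = (0, \infty)$ and that for each $\theta \in \Theta$

$$P_\theta(dx) = \theta e^{-\theta x}\, 1_{(0, \infty)}(x)\, dx.$$

If the prior distribution $\pi$ is a gamma distribution with parameters $\alpha$ and $\beta$, then the posterior distribution given $X_1, \dots, X_n$ is a gamma distribution with parameters $\alpha_n$ and $\beta_n = \beta_n(X_1, \dots, X_n)$, where

$$\alpha_n = \alpha + n, \qquad \beta_n = \beta_n(x_1, \dots, x_n) = \beta + \sum_{i=1}^n x_i.$$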
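For this model $\mathbb{E}(X_1 \mid \vartheta) = 1/\vartheta$, so $\beta_n/n \to 1/\vartheta(\omega)$ and $\alpha_n/n \to 1$; the Markov-inequality argument therefore requires $t < 1/\vartheta(\omega)$. The following sketch (the true value, prior parameters, sample size, seed and function names are illustrative assumptions) simulates exponential data, checks $\beta_n/n \approx 1/\vartheta(\omega)$, and compares a grid minimization of $-\theta t + \log(1/(1 - \vartheta(\omega) t))$ with $1 - \theta/\vartheta(\omega) + \log(\theta/\vartheta(\omega))$.

```python
import math
import random


def exponential_posterior_params(xs, alpha: float, beta: float):
    """Gamma posterior (shape alpha_n, rate beta_n) for X_i with density theta*exp(-theta*x)."""
    return alpha + len(xs), beta + sum(xs)


def exp_objective(t: float, theta: float, vt: float) -> float:
    """-theta*t + log(1/(1 - vt*t)) for 0 < t < 1/vt; vt stands for vartheta(omega)."""
    return -theta * t - math.log(1.0 - vt * t)


def exp_grid_infimum(theta: float, vt: float, m: int = 100_000) -> float:
    """Minimum of the objective over the grid t_k = k/(vt*m), k = 1..m-1."""
    upper = 1.0 / vt
    return min(exp_objective(upper * k / m, theta, vt) for k in range(1, m))


def exp_bound(theta: float, vt: float) -> float:
    """1 - theta/vt + log(theta/vt)."""
    r = theta / vt
    return 1.0 - r + math.log(r)


if __name__ == "__main__":
    rng = random.Random(0)
    vt, n = 2.0, 100_000
    xs = [rng.expovariate(vt) for _ in range(n)]   # rate vt, mean 1/vt
    a_n, b_n = exponential_posterior_params(xs, 1.0, 1.0)
    print(a_n / n, b_n / n, 1.0 / vt)
    print(exp_grid_infimum(3.0, vt), exp_bound(3.0, vt))
```

Note that Python's `random.expovariate` takes the rate, not the mean, which matches the parametrization $\theta e^{-\theta x}$ used here.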

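Theorem 9. For each $\theta \in \Theta$,

$$\limsup_{n \to \infty} \frac{1}{n} \log \pi_n^\omega[\theta, \infty) \le 1 - \frac{\theta}{\vartheta(\omega)} + \log \frac{\theta}{\vartheta(\omega)} \quad \text{on } \{\omega : \theta > \vartheta(\omega)\} \text{ a.s.}$$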

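Proof. For almost all $\omega \in \{\theta > \vartheta\}$ and $t \in (0, 1/\vartheta(\omega))$, there is an $n_0$ such that $\beta_n(X_1, \dots, X_n)/(\beta_n(X_1, \dots, X_n) - nt) > 0$ for all $n \ge n_0$, since

$$\frac{\beta_n(X_1, \dots, X_n)}{\beta_n(X_1, \dots, X_n) - nt} \to \frac{\mathbb{E}(X_1 \mid \vartheta)(\omega)}{\mathbb{E}(X_1 \mid \vartheta)(\omega) - t} = \frac{1/\vartheta(\omega)}{1/\vartheta(\omega) - t} = \frac{1}{1 - \vartheta(\omega)\, t} > 0.$$

Thus for almost all $\omega \in \{\theta > \vartheta\}$ and all $t \in (0, 1/\vartheta(\omega))$, Markov's inequality gives

$$\frac{1}{n} \log \pi_n^\omega[\theta, \infty) \le -\theta t + \frac{\alpha_n}{n} \log \left( \frac{\beta_n(X_1, \dots, X_n)}{\beta_n(X_1, \dots, X_n) - nt} \right)$$

for all $n \ge n_0$, so that, since $\alpha_n/n \to 1$, for almost all $\omega \in \{\theta > \vartheta\}$ and $t \in (0, 1/\vartheta(\omega))$

$$\limsup_{n \to \infty} \frac{1}{n} \log \pi_n^\omega[\theta, \infty) \le -\theta t + \log \left( \frac{1}{1 - \vartheta(\omega)\, t} \right).$$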

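Consequently

$$\limsup_{n \to \infty} \frac{1}{n} \log \pi_n^\omega[\theta, \infty) \le \inf_{0 < t < 1/\vartheta(\omega)} \left[ -\theta t + \log \left( \frac{1}{1 - \vartheta(\omega)\, t} \right) \right] = 1 - \frac{\theta}{\vartheta(\omega)} + \log \frac{\theta}{\vartheta(\omega)},$$

the infimum being attained at $t = 1/\vartheta(\omega) - 1/\theta$. □

Appendix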

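Lemma A.1. Let $Y_1$ and $Y_2$ be random variables on $(\Omega, \mathcal{F}, \mathbb{P})$ with values in measurable spaces $(E_1, \mathcal{E}_1)$ and $(E_2, \mathcal{E}_2)$, respectively, and let $\mathcal{G}$ be a sub-$\sigma$-algebra with respect to which $Y_2$ is measurable. If $\mu$ is a regular conditional distribution for $Y_1$ given $\mathcal{G}$, then for every measurable function $h : E_1 \times E_2 \to \mathbb{R}$ such that $h(Y_1, Y_2) \in L^1(\Omega, \mathcal{F}, \mathbb{P})$,

$$\int_{E_1} h(y_1, Y_2(\omega))\, \mu(\omega, dy_1) \tag{A.1}$$

is $\mathcal{G}$-measurable and

$$\mathbb{E}(h(Y_1, Y_2) \mid \mathcal{G})(\omega) = \int_{E_1} h(y_1, Y_2(\omega))\, \mu(\omega, dy_1) \quad \text{a.s.} \tag{A.2}$$

In other words, (A.1) is a version of $\mathbb{E}(h(Y_1, Y_2) \mid \mathcal{G})$.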

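Proof. If $h = 1_{A_1 \times A_2}$, $A_i \in \mathcal{E}_i$, then (A.1) equals $\mu(\omega, A_1)\, 1_{A_2}(Y_2(\omega))$, which is $\mathcal{G}$-measurable, and (A.2) holds because $\mathbb{E}(1_{A_1}(Y_1)\, 1_{A_2}(Y_2) \mid \mathcal{G}) = 1_{A_2}(Y_2)\, \mathbb{P}(Y_1 \in A_1 \mid \mathcal{G})$ a.s. Since

$$\mathcal{H} = \left\{ A \in \mathcal{E}_1 \otimes \mathcal{E}_2 : \int_{E_1} 1_A(y_1, Y_2(\omega))\, \mu(\omega, dy_1) \text{ is a version of } \mathbb{P}((Y_1, Y_2) \in A \mid \mathcal{G})(\omega) \right\}$$

is a $\lambda$-class and $\mathcal{H}$ contains the $\pi$-class $\mathcal{D} = \{A_1 \times A_2 : A_i \in \mathcal{E}_i,\ i = 1, 2\}$, we have $\mathcal{E}_1 \otimes \mathcal{E}_2 \subset \mathcal{H}$. Thus (A.1) is a version of $\mathbb{E}(h(Y_1, Y_2) \mid \mathcal{G})$ whenever $h$ is an indicator function. By linearity, (A.1) is a version of $\mathbb{E}(h(Y_1, Y_2) \mid \mathcal{G})$ for all simple functions $h$, and hence for all nonnegative measurable functions by the monotone convergence theorem. For the general case, the result follows by splitting the function into positive and negative parts. □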


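Let $Y_1, Y_2, \dots$ be real-valued random variables defined on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and $\mathcal{G}$ a sub-$\sigma$-algebra. If for all $n \ge 1$ and $A_1, \dots, A_n \in \mathcal{B}(\mathbb{R})$

$$\mathbb{P}(Y_1 \in A_1, \dots, Y_n \in A_n \mid \mathcal{G}) = \prod_{i=1}^n \mathbb{P}(Y_i \in A_i \mid \mathcal{G}) \quad \text{a.s.},$$

then $Y_1, Y_2, \dots$ are said to be conditionally independent given $\mathcal{G}$. If $\mathcal{G} = \sigma(\eta)$ for some random element $\eta$, $Y_1, Y_2, \dots$ are called conditionally independent given $\eta$. If, in addition to conditional independence, $\mathbb{P}(Y_i \in A \mid \mathcal{G}) = \mathbb{P}(Y_1 \in A \mid \mathcal{G})$ a.s. for all $i \ge 1$ and $A \in \mathcal{B}(\mathbb{R})$, then $Y_1, Y_2, \dots$ are said to be conditionally independent and identically distributed (abbreviated to conditionally i.i.d.) given $\mathcal{G}$. If $Y_1, Y_2, \dots$ are conditionally i.i.d. and $\phi$ is a measurable function, then $\phi(Y_1), \phi(Y_2), \dots$ are conditionally i.i.d.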

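Lemma A.2. If $Y_1, Y_2, \dots$ are conditionally i.i.d. given $\mathcal{G}$, there exists a regular conditional distribution $\mu(\omega, B)$, $(\omega, B) \in \Omega \times \mathcal{B}(\mathbb{R}^\infty)$, for $Y = (Y_1, Y_2, \dots)$ given $\mathcal{G}$ such that for each $\omega \in \Omega$ the coordinate functions $\xi_1, \xi_2, \dots$ on $(\mathbb{R}^\infty, \mathcal{B}(\mathbb{R}^\infty), \mu(\omega, \cdot))$ are i.i.d. Moreover, if $Y_1$ is integrable, then $\xi_1, \xi_2, \dots$ are integrable with respect to $\mu(\omega, \cdot)$ for almost all $\omega \in \Omega$.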

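Proof. Since $\mathbb{R}^\infty$ is a Borel space, there is a regular conditional distribution $\nu_0(\omega, B)$ for $Y = (Y_1, Y_2, \dots)$ given $\mathcal{G}$. For each $i \ge 1$ and each $r \in \mathbb{Q}$ there is a null set $N_{i,r} \in \mathcal{G}$ such that for each $\omega \notin N_{i,r}$

$$\begin{aligned}
\nu_0(\omega, \xi_i \le r) &= \nu_0(\omega, \mathbb{R} \times \cdots \times \mathbb{R} \times (-\infty, r] \times \mathbb{R} \times \cdots) \\
&= \mathbb{P}(Y \in \mathbb{R} \times \cdots \times \mathbb{R} \times (-\infty, r] \times \mathbb{R} \times \cdots \mid \mathcal{G})(\omega) \\
&= \mathbb{P}(Y_i \le r \mid \mathcal{G})(\omega) = \mathbb{P}(Y_1 \le r \mid \mathcal{G})(\omega) \\
&= \nu_0(\omega, \xi_1 \le r),
\end{aligned}$$

and hence for all $\omega \notin N \triangleq \bigcup_{i \ge 1,\, r \in \mathbb{Q}} N_{i,r}$ and for all $i \ge 1$, $r \in \mathbb{Q}$, we have

$$\nu_0(\omega, \xi_i \le r) = \nu_0(\omega, \xi_1 \le r).$$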


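Since the sets of the form $(-\infty, r]$, $r \in \mathbb{Q}$, form a $\pi$-class generating $\mathcal{B}(\mathbb{R})$, it follows that for each $\omega \notin N$, $\nu_0(\omega, \xi_i \in \cdot)$ and $\nu_0(\omega, \xi_1 \in \cdot)$ agree as probability measures on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. For each $\omega$ define a measure $\nu^\omega$ by

$$\nu^\omega(\cdot) = \begin{cases} \nu_0(\omega, \xi_1 \in \cdot), & \omega \notin N, \\ \nu(\cdot), & \omega \in N, \end{cases}$$

where $\nu$ is any probability measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. Now we define a probability measure

$$\mu(\omega, \cdot) = (\nu^\omega \times \nu^\omega \times \cdots)(\cdot)$$

on $(\mathbb{R}^\infty, \mathcal{B}(\mathbb{R}^\infty))$ for each $\omega \in \Omega$. We will show that $\mu$ is a regular conditional distribution given $\mathcal{G}$ that satisfies the requirements of the lemma. Since $\mu(\omega, \cdot)$ is the infinite product measure of $\nu^\omega$ with itself, the coordinate functions $\xi_1, \xi_2, \dots$ are necessarily i.i.d. random variables on $(\mathbb{R}^\infty, \mathcal{B}(\mathbb{R}^\infty), \mu(\omega, \cdot))$ for each $\omega$, with distribution

$$\mu(\omega, \xi_i \in A) = \nu^\omega(A) = \begin{cases} \nu_0(\omega, \xi_1 \in A), & \omega \notin N, \\ \nu(A), & \omega \in N, \end{cases} \quad A \in \mathcal{B}(\mathbb{R}).$$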

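To show that $\mu(\omega, B)$ is a regular conditional distribution for $Y = (Y_1, Y_2, \dots)$ given $\mathcal{G}$, it suffices to verify that $\mu(\cdot, B)$ is a version of $\mathbb{P}(Y \in B \mid \mathcal{G})$ for each $B \in \mathcal{B}(\mathbb{R}^\infty)$, since $\mu(\omega, \cdot)$ is a probability measure by definition. If $A_1, \dots, A_n \in \mathcal{B}(\mathbb{R})$, $n \ge 1$, then

$$\begin{aligned}
\mu(\omega, A_1 \times \cdots \times A_n \times \mathbb{R} \times \cdots) &= \nu^\omega(A_1) \cdots \nu^\omega(A_n)\, 1_{N^c}(\omega) + \nu^\omega(A_1) \cdots \nu^\omega(A_n)\, 1_N(\omega) \\
&= \nu_0(\omega, \xi_1 \in A_1) \cdots \nu_0(\omega, \xi_1 \in A_n)\, 1_{N^c}(\omega) + \nu(A_1) \cdots \nu(A_n)\, 1_N(\omega),
\end{aligned}$$

and therefore $\mu(\omega, A_1 \times \cdots \times A_n \times \mathbb{R} \times \cdots)$ is $\mathcal{G}$-measurable. Moreover, outside the $\mathcal{G}$-null set $N$,

$$\begin{aligned}
\mu(\omega, A_1 \times \cdots \times A_n \times \mathbb{R} \times \cdots) &= \nu_0(\omega, \xi_1 \in A_1) \cdots \nu_0(\omega, \xi_1 \in A_n) \\
&= \nu_0(\omega, \xi_1 \in A_1) \cdots \nu_0(\omega, \xi_n \in A_n) \\
&= \mathbb{P}(Y_1 \in A_1 \mid \mathcal{G})(\omega) \cdots \mathbb{P}(Y_n \in A_n \mid \mathcal{G})(\omega) \\
&= \mathbb{P}(Y_1 \in A_1, \dots, Y_n \in A_n \mid \mathcal{G})(\omega) \\
&= \mathbb{P}(Y \in A_1 \times \cdots \times A_n \times \mathbb{R} \times \cdots \mid \mathcal{G})(\omega) \quad \text{a.s.}
\end{aligned}$$

Therefore $\mu(\cdot, A_1 \times \cdots \times A_n \times \mathbb{R} \times \cdots)$ is a version of $\mathbb{P}(Y \in A_1 \times \cdots \times A_n \times \mathbb{R} \times \cdots \mid \mathcal{G})$. Note that

$$\mathcal{D} = \{A_1 \times \cdots \times A_n \times \mathbb{R} \times \cdots : n \ge 1,\ A_i \in \mathcal{B}(\mathbb{R}),\ i = 1, \dots, n\}$$

is a $\pi$-class that generates $\mathcal{B}(\mathbb{R}^\infty)$. Since

$$\mathcal{H} = \{B \in \mathcal{B}(\mathbb{R}^\infty) : \mu(\cdot, B) \text{ is a version of } \mathbb{P}(Y \in B \mid \mathcal{G})\}$$

is a $\lambda$-class with $\mathcal{D} \subset \mathcal{H}$, we have $\mathcal{B}(\mathbb{R}^\infty) \subset \mathcal{H}$. This implies that $\mu(\cdot, B)$ is a version of $\mathbb{P}(Y \in B \mid \mathcal{G})$ for each $B \in \mathcal{B}(\mathbb{R}^\infty)$.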

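Finally, by Lemma A.1,

$$\int_{\mathbb{R}^\infty} |\xi_i(y)|\, \mu(\omega, dy) = \int_{\mathbb{R}^\infty} |\xi_1(y)|\, \mu(\omega, dy) = \mathbb{E}(|\xi_1(Y)| \mid \mathcal{G})(\omega) = \mathbb{E}(|Y_1| \mid \mathcal{G})(\omega) \quad \text{a.s.}$$

The integrability of $Y_1$ entails $\mathbb{E}(|Y_1| \mid \mathcal{G})(\omega) < \infty$ a.s., and hence the claim follows. This completes the proof. □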

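Theorem A.3. If $Y_1, Y_2, \dots$ are conditionally i.i.d. random variables given a sub-$\sigma$-algebra $\mathcal{G}$ and if $Y_1$ is integrable, then

$$\bar{Y}_n = \frac{Y_1 + \cdots + Y_n}{n} \to \mathbb{E}(Y_1 \mid \mathcal{G}) \quad \text{a.s. } (n \to \infty).$$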

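Proof. Let $\mu^\omega(B) = \mu(\omega, B)$, $(\omega, B) \in \Omega \times \mathcal{B}(\mathbb{R}^\infty)$, be a regular conditional distribution for $Y = (Y_1, Y_2, \dots)$ given $\mathcal{G}$ such that the coordinate functions $\xi_1, \xi_2, \dots$ are i.i.d. random variables on $(\mathbb{R}^\infty, \mathcal{B}(\mathbb{R}^\infty), \mu^\omega)$ for each $\omega$; such a $\mu$ exists by Lemma A.2. We will show that for all $\varepsilon > 0$

$$\mathbb{P}\left( \sup_{n \ge m} |\bar{Y}_n - \mathbb{E}(Y_1 \mid \mathcal{G})| > \varepsilon \right) \to 0 \quad (m \to \infty), \tag{A.3}$$

which is equivalent to the convergence $\bar{Y}_n \to \mathbb{E}(Y_1 \mid \mathcal{G})$ a.s. as $n \to \infty$. For all $\varepsilon > 0$

$$\begin{aligned}
\mathbb{P}\left( \sup_{n \ge m} |\bar{Y}_n - \mathbb{E}(Y_1 \mid \mathcal{G})| > \varepsilon \right)
&= \mathbb{E}\left[ \mathbb{P}\left( \sup_{n \ge m} |\bar{Y}_n - \mathbb{E}(Y_1 \mid \mathcal{G})| > \varepsilon \,\middle|\, \mathcal{G} \right) \right] \\
&= \mathbb{E}\left[ \mathbb{P}\left( \sup_{n \ge m} \left| \frac{1}{n} \sum_{i=1}^n \xi_i(Y) - \mathbb{E}(Y_1 \mid \mathcal{G}) \right| > \varepsilon \,\middle|\, \mathcal{G} \right) \right] \\
&= \mathbb{E}\left[ \mu^\omega\left\{ y \in \mathbb{R}^\infty : \sup_{n \ge m} \left| \frac{1}{n} \sum_{i=1}^n \xi_i(y) - \mathbb{E}(Y_1 \mid \mathcal{G})(\omega) \right| > \varepsilon \right\} \right].
\end{aligned}$$

The last equality follows from Lemma A.1, applied with $Y$ in place of $Y_1$ and $\mathbb{E}(Y_1 \mid \mathcal{G})$ in place of $Y_2$. Since $Y_1$ is assumed to be integrable, Lemma A.2 shows that $\xi_1, \xi_2, \dots$ are i.i.d. integrable random variables on $(\mathbb{R}^\infty, \mathcal{B}(\mathbb{R}^\infty), \mu^\omega)$ for almost all $\omega$. It follows from the strong law of large numbers and Lemma A.1 that

$$\frac{1}{n} \sum_{i=1}^n \xi_i \to \int_{\mathbb{R}^\infty} \xi_1\, d\mu^\omega = \mathbb{E}(\xi_1(Y) \mid \mathcal{G})(\omega) = \mathbb{E}(Y_1 \mid \mathcal{G})(\omega) \quad \mu^\omega\text{-a.s.}$$

for almost all $\omega$. It follows that

$$\mu^\omega\left\{ y \in \mathbb{R}^\infty : \sup_{n \ge m} \left| \frac{1}{n} \sum_{i=1}^n \xi_i(y) - \mathbb{E}(Y_1 \mid \mathcal{G})(\omega) \right| > \varepsilon \right\} \to 0 \quad (m \to \infty)$$

for almost all $\omega$. Now (A.3) is obtained by the dominated convergence theorem. □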
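The point of Theorem A.3, that the limit of $\bar{Y}_n$ is the random variable $\mathbb{E}(Y_1 \mid \mathcal{G})$ rather than a constant, is easy to see in simulation. The sketch below assumes, for illustration only, $\mathcal{G} = \sigma(\vartheta)$ with $\vartheta \sim N(0, 4)$ and $Y_i \mid \vartheta$ i.i.d. $N(\vartheta, 1)$, so that $\mathbb{E}(Y_1 \mid \mathcal{G}) = \vartheta$. Each replication draws its own $\vartheta$; the sample means track their own $\vartheta$ while still varying from one replication to the next. The function names, sample sizes and seed are assumptions.

```python
import random
import statistics


def replicate(n: int, rng: random.Random) -> tuple:
    """Draw vartheta ~ N(0, 2^2), then Y_1..Y_n i.i.d. N(vartheta, 1); return (vartheta, Ybar_n)."""
    vartheta = rng.gauss(0.0, 2.0)
    ybar = sum(rng.gauss(vartheta, 1.0) for _ in range(n)) / n
    return vartheta, ybar


def run(reps: int = 20, n: int = 5_000, seed: int = 7):
    """Independent replications of the hierarchical experiment."""
    rng = random.Random(seed)
    return [replicate(n, rng) for _ in range(reps)]


if __name__ == "__main__":
    results = run()
    worst = max(abs(v - y) for v, y in results)
    spread = statistics.stdev([y for _, y in results])
    print(f"max |Ybar_n - vartheta| = {worst:.4f}, stdev of Ybar_n across runs = {spread:.3f}")
```

A classical strong law would force all sample means toward the single value $\mathbb{E}(Y_1) = 0$; here their spread stays near the prior standard deviation.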

References

[1] Aldous, D. J. (1982). On exchangeability and conditional independence. In Koch, G. and Spizzichino, F., editors, Exchangeability in Probability and Statistics, 165–170, North-Holland, Amsterdam.

[2] Dembo, A. and Zeitouni, O. (1998). Large Deviations Techniques and Applications, 2nd ed., Springer-Verlag, New York.

[3] Deuschel, J. D. and Stroock, D. W. (2000). Large Deviations, AMS Chelsea Publishing, Amer. Math. Soc.

[4] Fu, J. C. and Kass, R. E. (1988). The exponential rate of convergence of posterior distributions, Ann. Inst. Statist. Math., 40, 683–691.

[5] Ganesh, A. and O'Connell, N. (1999). An inverse of Sanov's theorem, Statist. Probab. Lett., 42, 201–206.

[6] Shen, X. and Wasserman, L. (2001). Rates of convergence of posterior distributions, Ann. Statist., 29, 687–714.