Journal of Applied Mathematics & Decision Sciences, 2(2), 107-117 (1998)
Reprints Available directly from the Editor. Printed in New Zealand.
The Predictive Distribution in Decision Theory: A Case Study
GEOFF JONES
Institute of Information Sciences and Technology, College of Sciences, Massey University, New Zealand
Abstract. In the classical decision theory framework, the loss is a function of the decision taken and the state of nature as represented by a parameter $\theta$. Information about $\theta$ can be obtained via observation of a random variable $X$. In some situations, however, the loss will depend not directly on $\theta$ but on the observed value of another random variable $Y$ whose distribution depends on $\theta$. This adds an extra layer to the decision problem, and may lead to a wider choice of actions.
In particular there are now two sample sizes to choose, for $X$ and for $Y$, leading to a range of behaviours in the Bayes risk. We illustrate this with a problem arising from the cleanup of sites contaminated with radioactive waste. We also discuss some computational approaches.
Keywords: Decision Theory, Bayes Rule, Predictive Distribution, Monte Carlo Integration
1. Introduction
Consider the following consulting problem: the client is involved in the cleanup of sites contaminated with radioactive waste, which involves sending bins of radioactive material to a nuclear reprocessing plant. Two such plants are available, one of which is less expensive but which will only accept a bin if the level of radioactivity of the material is below a threshold level. The actual level of radioactivity in the bin is determined by sampling at the reprocessing plant; as far as the client is aware only one such sample is taken. If the measured level exceeds the threshold, the material is returned to the client and must then be sent to the second, more expensive reprocessing plant, which will accept material at any level of contamination. The client wishes to base the decision of which reprocessing plant to use on a sample or samples taken from each bin on site before the material is dispatched.
The cost of sampling is small relative to the difference in reprocessing costs, but is not negligible.
How many samples should be taken, and how should this information be used?
The Loss Function in this problem clearly has the form
$$L(a_1, y) = \begin{cases} S & \text{if } y < c \\ S + K + P & \text{otherwise} \end{cases} \qquad\qquad L(a_2, y) = S + K$$
Here $a_1$ is the decision to send to the less expensive reprocessing plant at a cost $S$, $K$ is the extra cost of sending to the other plant, and $P$ the extra penalty incurred by sending first to the less expensive plant but having the material rejected. This occurs if the sample taken at the plant has an observed value $y$ greater than the threshold level $c$. Since the value of $S$ plays no part in the decision, we can without loss of generality take $S = 0$.
The client expects that there will be considerable variation in levels of radioactivity of the material within each bin, much of it being at quite a low level but with some "hot spots" of high radioactivity, suggesting a highly skewed distribution.
It seems reasonable then to model the result for a single sample as a random variable $X$ with an Exponential distribution. We write $X \sim \mathrm{Exp}(\theta)$ to denote that $X$ has density
$$f(x \mid \theta) = \theta e^{-\theta x}, \quad x > 0 \qquad (1)$$
Here $\theta$ parametrises the "state of nature" and relates to the average level of radioactivity of the material in the bin. Note however that the mean level is $1/\theta$. The alternative parametrisation of the Exponential distribution, in terms of its mean, is more intuitive, but we keep the present form for mathematical convenience.
Because of the high skewness it is intuitively clear that a single sample will not provide reliable information about the average radioactivity level. It may be advantageous for the client to persuade the reprocessor, through financial inducement or otherwise, to take further samples before accepting or rejecting a bin. Thus there may be two sample sizes to consider, relating to the sampling before and after dispatch. Intuitively one might expect that increasing either sample size would be advantageous to the client, and that for a given cost of sampling the total sample size might be shared equally between the two stages. This turns out not to be the case.
In Section 2 we develop and analyse a Bayesian framework for this problem using a conjugate prior distribution for $\theta$. In Section 3 we consider the determination of optimal sample sizes at each stage of sampling. In Section 4 we reconsider the choice of prior and discuss some numerical strategies for incorporating a non-conjugate prior.
The basic principles of statistical decision theory, as used here, are described by DeGroot [2], although our notation is closer to that of Ferguson [3]. In the classical approach the loss incurred by the decision maker is a function of the action taken and the true value of an unknown parameter, information about which can be obtained by sampling. The situation in which the loss depends not on a parameter but on future observations was considered by Roberts [7] in the context of statistical prediction. Aitchison and Dunsmore [1] and Geisser [4] provide an overview and many applications of the predictive approach, some of which involve decision making but not the sample size determination problem considered here. A related problem in determining a single sample size in the classical framework, when there are two "adversaries" with different priors, was considered by Lindley and Singpurwalla [5], and an application in environmental monitoring of radiation levels is given by Wolfson et al. [8].
2. Bayesian Analysis
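We assume that the uncertainty about $\theta$ can be expressed as a Gamma distribution, $\Gamma(\alpha, \lambda)$, with prior density
$$\pi(\theta) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\,\theta^{\alpha-1} e^{-\lambda\theta}, \quad \theta > 0 \qquad (2)$$
If we wish to use a non-informative prior we might consider $\alpha = 1$ and $\lambda \to 0$, although this may not be sensible here (see Section 4).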
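The main advantage of the Gamma prior is that it is conjugate for the Exponential distribution, so that the posterior distribution for $\theta$ after observing one or more sample values of $X$ will also be Gamma. Specifically, for a single observation $X = x$ the joint distribution of $(X, \theta)$ has density
$$f(x, \theta) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\,\theta^{\alpha} e^{-(\lambda+x)\theta}, \quad x > 0,\ \theta > 0 \qquad (3)$$
so by inspection the posterior density satisfies $\pi_{\theta|X}(\theta \mid x) \propto \theta^{\alpha} e^{-(\lambda+x)\theta}$ and
$$(\theta \mid x) \sim \Gamma(\alpha + 1, \lambda + x).$$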
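The conjugate update above amounts to simple bookkeeping on the two Gamma parameters. The following is a minimal Python sketch, assuming the rate parametrisation used in the text; the function name `gamma_posterior` is illustrative and does not come from the paper, and the multi-observation form is the standard extension of the single-observation result.

```python
def gamma_posterior(alpha, lam, samples):
    """Posterior (shape, rate) for theta after iid Exp(theta) observations.

    Assumes the prior theta ~ Gamma(alpha, lam) in the rate parametrisation,
    so each observation adds 1 to the shape and its value to the rate.
    """
    n = len(samples)
    return alpha + n, lam + sum(samples)


# One observation x = 2.5 under a Gamma(3, 10) prior (illustrative values).
shape, rate = gamma_posterior(3, 10, [2.5])
print(shape, rate)
```

With a single observation this reproduces the $\Gamma(\alpha + 1, \lambda + x)$ posterior stated above.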
The usual procedure, when the loss is a function of $\theta$, would be to choose the action ($a_1$ or $a_2$) which minimizes the expected loss under this posterior distribution, giving the Bayes Rule
$$\delta(x) = \arg\min_{a} \left\{ E_x\left[ L(a, \theta) \mid x \right] \right\} \qquad (4)$$
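Here however the loss depends not on $\theta$ but on the observed value of a second random variable, say $Y$, representing the result of the sample taken at the reprocessing plant. The Bayes criterion is now $E_x[L(a, y) \mid x]$, the expected loss under the predictive distribution for $Y$ given $X = x$. If we assume that both the client and the reprocessor use the same sampling and measurement technique, then $Y$ has the same distribution as $X$ (for a given $\theta$), in which case the predictive distribution has density
$$f_{Y|X}(y \mid x) = \int_0^\infty f_{Y|\theta}(y \mid \theta)\, \pi_{\theta|X}(\theta \mid x)\, d\theta \qquad (5)$$
$$= \int_0^\infty \frac{(\lambda+x)^{\alpha+1}}{\Gamma(\alpha+1)}\, \theta^{\alpha+1} e^{-(\lambda+x+y)\theta}\, d\theta \qquad (6)$$
$$= \frac{(\alpha+1)(\lambda+x)^{\alpha+1}}{(\lambda+x+y)^{\alpha+2}}, \quad y > 0 \qquad (7)$$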
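The step from (6) to (7) rests on the standard Gamma integral $\int_0^\infty \theta^{k-1} e^{-b\theta}\,d\theta = \Gamma(k)/b^{k}$, here with $k = \alpha + 2$ and $b = \lambda + x + y$. As a worked check added for clarity (not taken from the original text), the derivation and the resulting tail probability of the predictive distribution can be written out as:

```latex
\begin{align*}
f_{Y|X}(y \mid x)
  &= \frac{(\lambda+x)^{\alpha+1}}{\Gamma(\alpha+1)}
     \int_0^\infty \theta^{\alpha+1} e^{-(\lambda+x+y)\theta}\,d\theta
   = \frac{(\lambda+x)^{\alpha+1}}{\Gamma(\alpha+1)} \cdot
     \frac{\Gamma(\alpha+2)}{(\lambda+x+y)^{\alpha+2}}
   = \frac{(\alpha+1)(\lambda+x)^{\alpha+1}}{(\lambda+x+y)^{\alpha+2}}, \\[4pt]
P_x[Y > c]
  &= \int_c^\infty \frac{(\alpha+1)(\lambda+x)^{\alpha+1}}{(\lambda+x+y)^{\alpha+2}}\,dy
   = \left( \frac{\lambda+x}{\lambda+x+c} \right)^{\alpha+1}.
\end{align*}
```

The second line is the probability that the bin is rejected by the cheaper reprocessor, given the client's observation $x$.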
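The expected loss for $a_2$, the expensive reprocessor, is fixed at $S + K$, and for $a_1$
$$E_x[L(a_1, y) \mid x] = S + (K+P)\,P_x[Y > c] = S + (K+P)\left(\frac{\lambda+x}{\lambda+x+c}\right)^{\alpha+1}$$
so the Bayes Rule is to choose $a_1$ if
$$\left(\frac{\lambda+x}{\lambda+x+c}\right)^{\alpha+1} < \frac{K}{K+P}, \quad \text{i.e. if} \quad x < \frac{c\,\xi}{1-\xi} - \lambda \quad \text{where} \quad \xi = \left(\frac{K}{K+P}\right)^{1/(\alpha+1)} \qquad (8)$$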
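The rule in Equation (8) is straightforward to evaluate. The sketch below is a minimal Python illustration; the function names are hypothetical, and the parameter values in the usage lines are those of the numerical example given later in this section.

```python
def single_sample_cutoff(alpha, lam, c, K, P):
    """Cutoff from Equation (8): a1 is chosen when the observed x is below it."""
    xi = (K / (K + P)) ** (1.0 / (alpha + 1))
    return c * xi / (1.0 - xi) - lam


def bayes_rule(x, alpha, lam, c, K, P):
    """Return 'a1' (cheaper plant) or 'a2' (expensive plant) for one sample x."""
    return "a1" if x < single_sample_cutoff(alpha, lam, c, K, P) else "a2"


# Parameter values of the paper's example: alpha=3, lambda=10, c=5, K=10, P=15.
print(single_sample_cutoff(3, 10, 5, 10, 15))
print(bayes_rule(4.0, 3, 10, 5, 10, 15), bayes_rule(12.0, 3, 10, 5, 10, 15))
```

For these values the cutoff comes out at about 9.42, in line with the value 9.423 quoted in (11) up to rounding.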
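The expected loss incurred by using this decision rule, say $\delta(x)$, can now be found by integrating the expected loss at fixed $X$, as given above, with respect to the marginal distribution $f_X$ of $X$. This gives the Bayes Risk $B(\pi, \delta)$ of the rule $\delta$ with respect to the prior distribution $\pi$. Formally we may write
$$B(\pi, \delta) = E_{\pi}\left[ E_{\theta}\left[ L(\delta(x), y) \mid \theta \right] \right] = E_{f_X}\left[ E_{x}\left[ L(\delta(x), y) \mid x \right] \right] \qquad (9)$$
to show two different ways of calculating the Bayes Risk corresponding to two different forms of iterated expectation. It is more convenient here to use the latter.
The marginal distribution for $X$ has density
$$f_X(x) = \int_0^\infty f(x, \theta)\, d\theta = \frac{\alpha \lambda^{\alpha}}{(\lambda+x)^{\alpha+1}}, \quad x > 0$$
so, taking $S = 0$ and writing $\hat{x} = c\xi/(1-\xi) - \lambda$ for the cutoff in (8), the Bayes Risk is
$$B(\pi, \delta) = \int_0^{\hat{x}} (K+P)\left(\frac{\lambda+x}{\lambda+x+c}\right)^{\alpha+1} \frac{\alpha\lambda^{\alpha}}{(\lambda+x)^{\alpha+1}}\, dx + \int_{\hat{x}}^{\infty} K\, \frac{\alpha\lambda^{\alpha}}{(\lambda+x)^{\alpha+1}}\, dx \qquad (10)$$
Note that $\lambda$ is a scale parameter for the marginal distribution of $X$ (and of $Y$), so that the problem is invariant to transformations $(\lambda, c) \to (k\lambda, kc)$ for $k > 0$. This transformation corresponds to a change in the unit of measurement of radioactivity. Similarly the decision made depends only on the ratio $K/P$, not on the individual values.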
Suppose then that with suitable units we take $\alpha = 3$, $\lambda = 10$, $c = 5$, $K = 10$, $P = 15$. The prior distribution for the mean level of radioactivity $1/\theta$ is shown in Figure 1, and the marginal distribution for $X$ (and $Y$) in Figure 2. We find that the Bayes Rule is
$$\delta(x) = \begin{cases} a_1 & \text{if } x < 9.423 \\ a_2 & \text{otherwise} \end{cases} \qquad (11)$$
It is clear from Figure 2 that $a_1$ will be chosen most (93%) of the time, even though the mean level of radioactivity is often above the critical level $c$. This occurs because of the extreme skewness of the sampling distribution for $Y \mid \theta$, which makes the "gamble" of using the cheaper reprocessor worthwhile even when the average level of radioactivity in a bin is quite high.
In the next section we consider the changes which occur when repeated sampling is used at both ends of the process (i.e. for $X$ and for $Y$).
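The Bayes Risk for this single-sample example can be evaluated directly. The sketch below assumes $S = 0$ and the conjugate $\Gamma(\alpha, \lambda)$ prior; it computes the risk once by Simpson's rule applied to the expected loss times the marginal density, and once through a closed form obtained by carrying out the integration analytically (that closed form is a simplification introduced here, not quoted from the paper). Function names are illustrative.

```python
def single_sample_cutoff(alpha, lam, c, K, P):
    """Cutoff of the single-sample Bayes Rule, Equation (8)."""
    xi = (K / (K + P)) ** (1.0 / (alpha + 1))
    return c * xi / (1.0 - xi) - lam


def risk_closed_form(alpha, lam, c, K, P):
    """Bayes Risk with S = 0, integrating both terms analytically."""
    x_hat = single_sample_cutoff(alpha, lam, c, K, P)
    # Cheaper plant chosen (x < x_hat) and bin rejected: cost K + P.
    rejected = (K + P) * lam**alpha * (
        (lam + c) ** -alpha - (lam + x_hat + c) ** -alpha
    )
    # Expensive plant chosen (x > x_hat): cost K.
    expensive = K * (lam / (lam + x_hat)) ** alpha
    return rejected + expensive


def risk_simpson(alpha, lam, c, K, P, steps=2000):
    """Same quantity, with the first integral done by composite Simpson's rule."""
    x_hat = single_sample_cutoff(alpha, lam, c, K, P)

    def integrand(x):
        p_reject = ((lam + x) / (lam + x + c)) ** (alpha + 1)
        f_x = alpha * lam**alpha / (lam + x) ** (alpha + 1)
        return (K + P) * p_reject * f_x

    h = x_hat / steps  # steps must be even
    total = integrand(0.0) + integrand(x_hat)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3.0 + K * (lam / (lam + x_hat)) ** alpha


print(risk_closed_form(3, 10, 5, 10, 15), risk_simpson(3, 10, 5, 10, 15))
```

Both routes give a risk of roughly 7.06 for the example values, which matches the $n = m = 1$ entry of Table 1 to the precision printed there.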
Figure 1. Prior inverted gamma density $g(1/\theta)$ for mean level of radioactivity ($1/\theta$) with $\alpha = 3$ and $\lambda = 10$.
Figure 2. Marginal density $f(x)$ for sampled level of radioactivity ($X$) with $\alpha = 3$ and $\lambda = 10$.
3. Optimal Sample Sizes
Using the framework established in the previous section, we now suppose that the client bases his decision on samples $X_1, X_2, \ldots, X_n$ taken from the bin, and assume that these are iid $\mathrm{Exp}(\theta)$. It is convenient now to work with the total $X = X_1 + X_2 + \cdots + X_n$, which is a sufficient statistic for $\theta$ and is distributed as $\Gamma(n, \theta)$. Similarly the total $Y$ of $m$ samples taken by the cheaper reprocessor, assuming the bin is sent there, will be $\Gamma(m, \theta)$. The decision to accept or reject the bin is then based on the mean of the $m$ samples, so that in the Loss Function $c$ is replaced by $mc$.
Proceeding as before, we now find that the posterior distribution for $\theta$ given $X = x$ is $\Gamma(n + \alpha, \lambda + x)$. The predictive distribution for $Y \mid X$, following the method of Equations (5)-(7), then has density
$$f_{Y|X}(y \mid x) = \frac{\Gamma(m+n+\alpha)}{\Gamma(m)\,\Gamma(n+\alpha)} \cdot \frac{(\lambda+x)^{n+\alpha}\, y^{m-1}}{(\lambda+x+y)^{m+n+\alpha}}, \quad y > 0 \qquad (12)$$
If we make the substitution $u = \frac{y}{\lambda+x+y}$ we find that the predictive density for $U = \frac{Y}{\lambda+x+Y}$, given $X = x$, has the form of a Beta distribution, and we write:
$$\frac{Y}{\lambda + X + Y} \,\Big|\, X = x \;\sim\; \mathrm{Be}(m, n+\alpha) \qquad (13)$$
for the predictive distribution of the transformed variable.
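A closed form for the predictive probability $P[Y < mc \mid X = x]$ is not possible, but the incomplete Beta distribution is easy to calculate numerically (see Press et al. [6]) so we can use Equation (13) to write
$$P[Y < mc \mid X = x] = IB_{\frac{mc}{\lambda+x+mc}}(m, n+\alpha) \qquad (14)$$
The Bayes Rule is then to choose $a_1$ if
$$IB_{\frac{mc}{\lambda+x+mc}}(m, n+\alpha) > \frac{P}{K+P} \qquad (15)$$
i.e. if
$$x < \hat{x} = \frac{mc\,\xi}{1-\xi} - \lambda \qquad (16)$$
where $1-\xi$ is the $\frac{P}{K+P}$ quantile of $\mathrm{Be}(m, n+\alpha)$.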
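The cutoff in (16) can be computed with a few lines of code. The sketch below uses only the Python standard library and therefore relies on an assumption: when both Beta parameters are positive integers (as in the paper's example, where $\alpha = 3$), the incomplete Beta function reduces to a finite binomial sum. For non-integer $\alpha$ a library routine for the incomplete Beta function would be needed instead. All function names are illustrative.

```python
from math import comb


def beta_cdf_int(u, a, b):
    """Incomplete Beta IB_u(a, b); valid only for positive integers a and b."""
    n = a + b - 1
    return sum(comb(n, j) * u**j * (1.0 - u) ** (n - j) for j in range(a, n + 1))


def beta_quantile_int(p, a, b):
    """Quantile of Be(a, b) by bisection on the monotone CDF."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if beta_cdf_int(mid, a, b) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def cutoff_total(n, m, alpha, lam, c, K, P):
    """Cutoff x_hat of Equation (16), expressed on the total X of n samples."""
    q = beta_quantile_int(P / (K + P), m, n + alpha)  # q plays the role of 1 - xi
    return m * c * (1.0 - q) / q - lam


# Cutoffs per sample (x_hat / n) for n = 1 and m = 1..6, paper's example values.
print([round(cutoff_total(1, m, 3, 10, 5, 10, 15) / 1, 3) for m in range(1, 7)])
```

Dividing the result by $n$ gives the cutoff on the sample-mean scale, which is the scale used for the tabulated cutoffs.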
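To determine the Bayes Risk $B(\pi, \delta)$ for fixed $m$ and $n$ we use the marginal density of $X$. Proceeding as before we find that
$$\frac{X}{\lambda + X} \sim \mathrm{Be}(n, \alpha) \qquad (17)$$
so from Equation (9)
$$B(\pi, \delta) = K\,P[X > \hat{x}] + (K+P)\,P[X < \hat{x} \text{ and } Y > mc]$$
$$= K\left(1 - IB_{\frac{\hat{x}}{\lambda+\hat{x}}}(n, \alpha)\right) + (K+P)\int_0^{\hat{x}} \left(1 - IB_{\frac{mc}{\lambda+x+mc}}(m, n+\alpha)\right) f_X(x)\, dx$$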
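The Bayes Risk expression above can be evaluated numerically as follows. This is a sketch under stated assumptions rather than the author's implementation: it uses composite Simpson's rule for the integral, takes the density of the total $X$ from the Beta form in (17), and, to stay within the Python standard library, assumes integer $\alpha$ so that the incomplete Beta function is a finite binomial sum. Function names are illustrative.

```python
from math import comb, gamma


def beta_cdf_int(u, a, b):
    """Incomplete Beta IB_u(a, b); valid only for positive integers a and b."""
    n = a + b - 1
    return sum(comb(n, j) * u**j * (1.0 - u) ** (n - j) for j in range(a, n + 1))


def cutoff_total(n, m, alpha, lam, c, K, P):
    """Cutoff x_hat on the total X, via bisection for the Beta quantile."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if beta_cdf_int(mid, m, n + alpha) < P / (K + P):
            lo = mid
        else:
            hi = mid
    q = 0.5 * (lo + hi)  # the P/(K+P) quantile of Be(m, n + alpha)
    return m * c * (1.0 - q) / q - lam


def marginal_density(x, n, alpha, lam):
    """Density of the total X implied by X / (lam + X) ~ Be(n, alpha)."""
    const = gamma(n + alpha) / (gamma(n) * gamma(alpha))
    return const * x ** (n - 1) * lam**alpha / (lam + x) ** (n + alpha)


def bayes_risk(n, m, alpha, lam, c, K, P, steps=4000):
    """Bayes Risk for sample sizes n and m, sampling costs excluded."""
    x_hat = cutoff_total(n, m, alpha, lam, c, K, P)
    p_expensive = 1.0 - beta_cdf_int(x_hat / (lam + x_hat), n, alpha)

    def integrand(x):
        p_reject = 1.0 - beta_cdf_int(m * c / (lam + x + m * c), m, n + alpha)
        return p_reject * marginal_density(x, n, alpha, lam)

    h = x_hat / steps  # steps must be even for Simpson's rule
    total = integrand(0.0) + integrand(x_hat)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return K * p_expensive + (K + P) * total * h / 3.0


for n, m in [(1, 1), (1, 2), (5, 5)]:
    print(n, m, round(bayes_risk(n, m, 3, 10, 5, 10, 15), 3))
```

Looping this function over a grid of $n$ and $m$ is one way to build a table of risks for comparing sampling plans.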
Although numerical integration is now required, this can be accomplished quite easily using standard routines (see Press et al. [6]), and evaluation over a range of values of $m$ and $n$ gives a criterion for choosing the optimal (from the client's viewpoint) sampling plan. Using the parameter values from Section 2, Table 1 gives the Bayes Risk for values of $m$ and $n$ ranging from 1 to 6; note that these values do not include the cost of obtaining the samples. The optimal choices for $m$ and $n$ will depend on the sampling cost: if for example each sample determination has a cost of 0.1, we add $0.1(n + m)$ to each value in the table and find that the optimum is $n = m = 5$. The advantage of not including the sampling cost explicitly in the table is that we can observe the behaviour of the Bayes Risk as $n$ and $m$ are varied. Notice that for $n = 1$ the Bayes Risk initially increases as $m$ increases from 1 to 2, the increased accuracy of determination by the reprocessor being disadvantageous to the client, but thereafter an increase in $m$ results in a lower expected cost. For $n$ however a higher value will always decrease the Bayes Risk, as one would expect: more information for the client should always result in a better decision.
Table 1. Bayes Risk for various sample sizes, $\theta \sim \Gamma(3, 10)$
        m=1     2       3       4       5       6
n=1     7.05    7.165   7.130   7.085   7.06    7.014
  2     6.885   6.878   6.782   6.699   6.633   6.580
  3     6.781   6.709   6.577   6.470   6.387   6.322
  4     6.711   6.596   6.439   6.316   6.221   6.147
  5     6.661   6.515   6.340   6.205   6.101   6.019
  6     6.623   6.453   6.266   6.120   6.009   5.922
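The choice of sampling plan described in the text, adding $0.1(n + m)$ to each entry of Table 1 and taking the smallest total, can be reproduced with a short script. The sketch below copies the Table 1 values as they are printed in this copy (two entries, 7.05 and 7.06, appear with only two decimals) and assumes rows index $n$ and columns index $m$; the function name `best_plan` is illustrative.

```python
# Table 1 as printed: rows are n = 1..6, columns are m = 1..6.
BAYES_RISK = [
    [7.05, 7.165, 7.130, 7.085, 7.06, 7.014],
    [6.885, 6.878, 6.782, 6.699, 6.633, 6.580],
    [6.781, 6.709, 6.577, 6.470, 6.387, 6.322],
    [6.711, 6.596, 6.439, 6.316, 6.221, 6.147],
    [6.661, 6.515, 6.340, 6.205, 6.101, 6.019],
    [6.623, 6.453, 6.266, 6.120, 6.009, 5.922],
]


def best_plan(table, cost_per_sample):
    """Return (total cost, n, m) minimising risk + cost_per_sample * (n + m)."""
    best = None
    for n, row in enumerate(table, start=1):
        for m, risk in enumerate(row, start=1):
            total = risk + cost_per_sample * (n + m)
            if best is None or total < best[0]:
                best = (total, n, m)
    return best


total, n, m = best_plan(BAYES_RISK, 0.1)
print(n, m, round(total, 3))
```

Within the tabulated range this selects $n = m = 5$ at a sampling cost of 0.1 per determination, as stated in the text; changing `cost_per_sample` shows how sensitive the choice is to that cost.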
Table 2. Cutoff point for $\delta$, $\theta \sim \Gamma(3, 10)$
        m=1     2       3       4       5       6
n=1     9.423   7.398   6.862   6.623   6.490   6.407
  2     7.430   6.158   5.828   5.684   5.605   5.557
  3     6.768   5.747   5.487   5.375   5.315   5.279
  4     6.438   5.543   5.318   5.223   5.173   5.142
  5     6.240   5.422   5.217   5.132   5.088   5.062
  6     6.109   5.341   5.151   5.073   5.033   5.009
Table 3. Probability of choosing $a_2$, $\theta \sim \Gamma(3, 10)$
        m=1     2       3       4       5       6
n=1     0.136   0.190   0.209   0.218   0.223   0.226
  2     0.182   0.239   0.257   0.266   0.271   0.274
  3     0.205   0.262   0.280   0.288   0.293   0.295
  4     0.219   0.276   0.293   0.301   0.305   0.307
  5     0.229   0.285   0.302   0.309   0.313   0.315
  6     0.235   0.291   0.308   0.315   0.318   0.321
Table 2 shows the cutoff point for the Bayes Rule, expressed in relation to the sample mean, i.e. if $x/n$ is greater than the tabulated value, the bin should be sent to the expensive reprocessor. As the number of samples increases, the cutoff converges quite quickly to the critical value $c$. Table 3 shows the proportion of bins which would be sent to the expensive reprocessor for each sampling plan.
Several different behaviours are possible, depending on the parameter values. In some cases the Bayes risk may decrease very slowly, or even increase, as $m$ increases from 1; in other cases it decreases quite markedly. It is important therefore to get accurate information about costs and the prior before deciding whether it is worthwhile obtaining extra samples, and whether the extra effort should be devoted to $X$ or $Y$ or both equally.
4. Non-conjugate Prior
The prior Gamma distribution for $\theta$ employed in Sections 2 and 3 was chosen mainly for mathematical convenience. We now re-examine its appropriateness and how a wider class of priors might be incorporated into the analysis.
To obtain a reasonable prior distribution from the client, he must be invited to speculate on the likelihood of a range of values of $\theta$. This may be difficult since $\theta$ is itself not a particularly meaningful parameter. A far more natural parametrization of the problem would be to use $1/\theta$, which represents the mean level of radioactivity in the bin; this is something about which the client might reasonably be expected to speculate. We could still proceed by showing the client graphs of the density of $1/\theta$ for various choices of $\alpha$ and $\lambda$, as in Figure 1, but even so we are restricting ourselves to a class of distributions, the Inverted Gamma, which might be thought inappropriate. These distributions are very long-tailed, having finite moments only of order less than $\alpha$.
Suppose that instead we decide to use a general prior distribution specified for $1/\theta$. We can still denote the implied prior for $\theta$ by $\pi(\theta)$, but the integrals needed to evaluate the marginal distribution for $X$ and the predictive distribution for $Y$ will not now involve simple special functions like the Gamma and Beta. It has become commonplace in such situations to employ some form of Monte Carlo integration.
There are essentially three stages to the calculation:
(i) Evaluate the risk for fixed cutoff $\hat{x}$ and fixed sample sizes $n$, $m$.
(ii) Choose $\hat{x}$ to minimize the risk for fixed $n$, $m$.
(iii) Choose $n$, $m$ to minimize the Bayes Risk.
If we denote the rule with cutoff $\hat{x}$ by $\delta_{\hat{x}}$, i.e.
$$\delta_{\hat{x}}(x) = \begin{cases} a_1 & \text{if } x < \hat{x} \\ a_2 & \text{otherwise} \end{cases} \qquad (18)$$
then we need to evaluate
$$R(\pi, \delta_{\hat{x}}) = K\,P[X > \hat{x}] + (K+P)\,P[X < \hat{x} \text{ and } Y > mc] \qquad (19)$$
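One approach would be to sample from the joint distribution of $(\theta, X, Y)$. Provided that the prior for $1/\theta$ is reasonably easy to simulate, we invert a randomly drawn value to get $\theta$, then draw $X$ and $Y$ from their conditional distributions $\Gamma(n, \theta)$ and $\Gamma(m, \theta)$ respectively. The conditional independence of $X$ and $Y$ given $\theta$ means that we do not need iterative methods such as the Gibbs sampler. Given a sample $(\theta_i, X_i, Y_i)$, $i = 1, \ldots, N$, we can approximate the risk for given $\hat{x}$ by
$$R(\pi, \delta_{\hat{x}}) \approx \frac{1}{N} \sum_{i=1}^{N} \left\{ K\,\mathcal{I}\{X_i > \hat{x}\} + (K+P)\,\mathcal{I}\{X_i < \hat{x} \text{ and } Y_i > mc\} \right\} \qquad (20)$$
where $\mathcal{I}$ is the indicator function.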
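The joint-sampling estimator in (20) might be sketched as below. This is an illustration under stated assumptions, not the author's code: the prior is passed in as a hypothetical callable `draw_theta` that returns one draw of $\theta$, and Python's `random.gammavariate(shape, scale)` is used, so a rate $\theta$ is supplied as the scale $1/\theta$. The usage lines draw $\theta$ from the conjugate $\Gamma(3, 10)$ prior so that the estimate can be compared with the exact Bayes Risk of Section 2.

```python
import random


def naive_mc_risk(x_hat, n, m, c, K, P, draw_theta, N, seed=0):
    """Estimate the risk of the cutoff rule by sampling (theta, X, Y) jointly."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(N):
        theta = draw_theta(rng)
        x = rng.gammavariate(n, 1.0 / theta)  # X | theta ~ Gamma(n, rate theta)
        y = rng.gammavariate(m, 1.0 / theta)  # Y | theta ~ Gamma(m, rate theta)
        if x > x_hat:
            total += K  # sent straight to the expensive plant
        elif y > m * c:
            total += K + P  # sent to the cheaper plant and rejected
    return total / N


def conjugate_prior(rng):
    """theta ~ Gamma(3, rate 10), the prior of the worked example."""
    return rng.gammavariate(3, 1.0 / 10.0)


estimate = naive_mc_risk(9.423, 1, 1, 5, 10, 15, conjugate_prior, N=100_000)
print(round(estimate, 2))
```

Because each draw contributes only 0, $K$ or $K + P$, the estimate is noisy, which illustrates why a large $N$ is needed with this approach.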
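Because of the dimensionality problem this method requires a huge sample size to achieve even reasonable accuracy, and repeated computation for varying $\hat{x}$ becomes very inefficient.
A better approach is to simulate for $\theta$ only and to calculate directly $P_\theta[X > \hat{x}]$ and $P_\theta[Y > mc]$. These probabilities are incomplete gamma functions and can be calculated quite efficiently (see Press et al. [6]), giving
$$R(\pi, \delta_{\hat{x}}) \approx \frac{1}{N} \sum_{i=1}^{N} \left\{ K\left(1 - IG_{\theta_i \hat{x}}(n)\right) + (K+P)\, IG_{\theta_i \hat{x}}(n) \left(1 - IG_{\theta_i mc}(m)\right) \right\} \qquad (21)$$
where $IG_x(k)$ denotes the incomplete gamma function
$$IG_x(k) = \frac{1}{\Gamma(k)} \int_0^x u^{k-1} e^{-u}\, du \qquad (22)$$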
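Equations (21) and (22), together with the search over the cutoff, can be sketched in Python as follows. Several choices here are assumptions made for illustration: the incomplete gamma function is computed from its finite-sum form, which is valid because $n$ and $m$ are integers; the cutoff is found by a golden-section search, which assumes the estimated risk is unimodal in $\hat{x}$ on the search interval; and all function names are hypothetical. The usage lines draw $1/\theta$ from a $\Gamma(4, 1)$ distribution as one example of a non-conjugate prior.

```python
import math
import random


def inc_gamma_int(x, k):
    """Incomplete gamma IG_x(k) of Equation (22) for a positive integer k."""
    term, partial = 1.0, 1.0
    for j in range(1, k):
        term *= x / j
        partial += term
    return 1.0 - math.exp(-x) * partial


def conditional_mc_risk(x_hat, thetas, n, m, c, K, P, p_reject=None):
    """Equation (21): average the exact conditional risk over draws of theta."""
    if p_reject is None:
        p_reject = [1.0 - inc_gamma_int(t * m * c, m) for t in thetas]
    total = 0.0
    for theta, p_y in zip(thetas, p_reject):
        p_cheap = inc_gamma_int(theta * x_hat, n)  # P_theta[X < x_hat]
        total += K * (1.0 - p_cheap) + (K + P) * p_cheap * p_y
    return total / len(thetas)


def best_cutoff(thetas, n, m, c, K, P, upper=50.0, iters=40):
    """Golden-section search for the cutoff; the Y probabilities are reused."""
    p_reject = [1.0 - inc_gamma_int(t * m * c, m) for t in thetas]

    def risk(x_hat):
        return conditional_mc_risk(x_hat, thetas, n, m, c, K, P, p_reject)

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = 0.0, upper
    for _ in range(iters):
        a = hi - inv_phi * (hi - lo)
        b = lo + inv_phi * (hi - lo)
        if risk(a) < risk(b):
            hi = b
        else:
            lo = a
    x_hat = 0.5 * (lo + hi)
    return x_hat, risk(x_hat)


rng = random.Random(0)
# Draw the mean level 1/theta from Gamma(4, 1) and invert to obtain theta.
thetas = [1.0 / rng.gammavariate(4, 1.0) for _ in range(10_000)]
print(best_cutoff(thetas, 1, 1, 5, 10, 15))
```

Repeating `best_cutoff` over a grid of $n$ and $m$ with the same `thetas` would cover the remaining stage of the calculation, the choice of sample sizes.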
Note that only the $X$ probabilities depend on $\hat{x}$, so the $Y$ probabilities for each $\theta_i$ may be stored and re-used. This method requires a much smaller sample of $\theta$ values to achieve reasonable accuracy, and is therefore more efficient than use of the full multivariate joint distribution.
Using the prior and parameter values from Section 2 it was found that $N = 10{,}000$ gave sufficient accuracy (2 dp) and a reasonable computation time (about 90s). Now however we are no longer restricted to a small class of priors. The calculation was repeated using a $\Gamma(4, 1)$ prior for $1/\theta$. This is shown in Figure 3 for comparison with Figure 1; it is similar but much less long-tailed. The estimated Bayes Risk and cutoff value using this prior are given in Tables 4 and 5. We now find that with a cost of sampling of 0.1 the best option is $n = m = 1$.
Table 4. Bayes Risk for various sample sizes, $1/\theta \sim \Gamma(4, 1)$
        m=1     2       3       4       5       6
n=1     6.509   6.606   6.554   6.495   6.443   6.400
  2     6.469   6.450   6.408   6.321   6.249   6.190
  3     6.435   6.418   6.298   6.193   6.107   6.037
  4     6.406   6.354   6.214   6.094   5.998   5.920
  5     6.382   6.302   6.146   6.016   5.911   5.827
  6     6.361   6.260   6.091   5.952   5.840   5.751
Figure 3. Prior inverted gamma density $g(1/\theta)$ for mean level of radioactivity ($1/\theta$) with $\alpha = 3$ and $\lambda = 10$.
Table 5. Cutoff point for $\delta$, $1/\theta \sim \Gamma(4, 1)$
        m=1     2       3       4       5       6
n=1     14.538  10.36   9.374   8.907   8.635   8.522
  2     9.813   7.494   6.909   6.685   6.561   6.469
  3     8.257   6.529   6.137   5.974   5.894   5.816
  4     7.514   6.085   5.758   5.619   5.541   5.487
  5     7.075   5.829   5.532   5.413   5.354   5.315
  6     6.777   5.645   5.390   5.283   5.240   5.200
5. Discussion
In the above analysis we have considered the problem only from the client's point of view, assuming that he can pay the reprocessor to take extra samples, as well as deciding to take more samples himself, if this is to his advantage. We have also assumed that the critical value $c$ used by the reprocessor is kept fixed for different sample sizes. If we now consider the reprocessor's point of view, he clearly does not want to accept material which has too high a level of radioactivity. We assume that he requires the mean level for each bin to be less than $c$, but that he does not allow for sampling variability in making his test. Were he to do so, he would want to adjust the critical value depending on the sample size $m$. It would also be to his advantage to take extra samples. Rather than use decision theory merely to improve the decision-making of one side in the process, as we have done here, it would be more appropriate to use an agreed decision theory framework as a negotiating tool in establishing an optimal sampling scheme which would be of benefit to both parties.
It has already been noted that the optimal solution for the problem we have considered seems to be quite sensitive to the parameter values and prior information. This shows the need for estimated costs to be as accurate as possible, and for prior data to be incorporated in choosing the prior distribution, possibly through an empirical Bayes approach. This sensitivity is probably due in part to the use of long-tailed distributions. There is a considerable range of behaviour for different parameter values and distributions. In particular the fact that increasing the sample size for $Y$ may either increase or decrease the risk is interesting, and is the subject of further investigation.
Acknowledgments
The author would like to thank the Editor and the referees for their support and helpful suggestions.
References
1. J. Aitchison and I. R. Dunsmore. Statistical Prediction Analysis. University Press, Cambridge, 1975.
2. M. H. DeGroot. Optimal Statistical Decisions. McGraw-Hill, New York, 1970.
3. T. S. Ferguson. Mathematical Statistics: a Decision Theoretic Approach. Academic Press, New York, 1967.
4. S. Geisser. Predictive Inference: an Introduction. Chapman and Hall, London, 1993.
5. D. V. Lindley and N. D. Singpurwalla. On the evidence needed to reach agreed action between adversaries, with application to acceptance sampling. J. Amer. Statist. Assoc., 86, 933-937, 1991.
6. W. H. Press, S. A. Teukolsky, W. T. Vetterling and B. P. Flannery. Numerical Recipes: The Art of Scientific Computing, 2nd ed. University Press, Cambridge, 1992.
7. H. V. Roberts. Probabilistic prediction. J. Amer. Statist. Assoc., 60, 50-62, 1965.
8. L. J. Wolfson, J. B. Kadane and M. J. Small. A subjective Bayesian approach to environmental sampling. In: Case Studies in Bayesian Statistics Vol. 3 (C. Gatsonis, J. S. Hodges, R. E. Kass, R. McCulloch, P. Rossi and N. D. Singpurwalla, eds.) Springer-Verlag, New York, 457-468, 1997.