On
the
effect
of variance-stabilizing
transformation
to wavelet-based
software reliability
assessment
首都大学東京システムデザイン研究科
肖
雷
Xiao Xiao
Graduate School of
System
Design,
Tokyo Metropolitan University
1
Introduction
Insoftware$1^{\cdot}$eliabilityengineering, ithas become
one
of themain issuesin thisareatoassess
thesoftwarerelia-bility quantitatively.Among
over
hundreds ofsoftware reliability models(SRMs) [11, 12,14],non-homogeneousPoissonprocess(NHPP)basedSRMs have gainedmuch popularityinactual software testingphases. People
are
interestedin estimating the
software
intensityfunction
ofNHPP-based SRMsfromsoftwarefaultcount datathatcan
be collectedinthesoftware testing phases. Thesoftware intensity function indiscretetimedenotes thenumber of softwarefaults detectedperunit time. Fromtheestimated software intensityfunction,it ispossible to predictthe number of remaining software faults andthe quantitative software reliability, which is defined
as
theproba-bility that softwaresystemdoesnotfail during
a
specified time periodundera
specified operationalenvironment.Therefore,
we
are
interested indevelopingahigh-accuracyestimation methodforthe software intensity function.We proposed waveletshrinkageestimation(WSE)
as
non-parametricestimationmethod for NHPP-based SRMs([15, 16, 17 The WSE does notrequiresolvinganyoptimizationproblem,
so
that the implementation ofes-timationalgorithms is rathereasy than the othernon-parametric methods. We compared
our
method with theconventional maximumlikelihoodestimation(MLE)andtheleastsquares estimation(LSE)through
goodness-of-fit test. Ithas beenshown that WSEcould providehighergoodness-of-fitperformance than MLE and LSB inmany
cases,inspite ofits non-parametricnature,through numerical experimentswithreal softwarefault count data. Thefundamental idea of WSEistoremovethe noise includedintheobserved software faultcountdatatoget
a
noise-freeestimateofthesoftware intensity function. Itisperformed through the following three-step procedure.
First,the noisevariance is stabilizedby applying the variance-stabilizingtransformationtothedata. This produces
a
time-series data inwhichthenoisecan
betreatedas
Gaussianwhitenoise. Second,the noise is removed usingthreshold methods. Third,
an
inverse variance-stabilizing transformation is applied tothe denoised time-seriesdata,obtaining theestimateofthe softwareintensityfunction. Thispaperfocusesonthefirst andthe third steps of WSE andaimsatidentifyingandemphasizingtheinfluence that thedatatransformation exertsontheaccuracy
of WSE. The remaining part of thispaperis planmed
as
follows. InSection2,wegiveapreliminaryonwavelet analysis, especially focusing on multiresolution representation. Section 3 describes the WSE for NHPP-basedSRMs in details. InSection 4,wecarryout the realproject dataanalysisand examinethe effectivenessof eight
2
Multiresolution Representation
Let$f(t\rangle$denote the targetfunctiononcontinuous time$t$
.
Inmultiresolution representation, targetfunctionisusuallyexpressed bythe linear combinationofitsapproximation component anddetailcomponent, i.e.,
$f(t) = f_{jc!}(t)+ \sum_{j=j(}^{+\infty}g_{j}(t)$, (1)
where$f_{j)}(t)$and$g_{j}(t)$arethe level-joapproximationcomponentand level-j detail component of$f(t)$,respectively.
Here,parameter$j(\geq 0)$is called resolution level,and$jo$ isprimary resolution level. The levei-japproximation
component$f_{j}(t)$andlevel-jdetail$g_{j}(t)$component
are
definedas
follows.$f_{j}(t) \sum_{k=-\infty}^{+\infty}a_{jf}\phi_{JJ}(t\rangle,$ (2)
$g_{j}(t) = \sum_{k=-\infty}^{+\infty}\beta_{j,k}\psi_{ノ^{}a,k}(t\rangle$
.
(3)Here,$\alpha_{j\lambda}$and$\beta_{j.k}$
are
the so-called scalingcoefficients
and waveletcoefficients,respectively.$\phi_{j,k}(t)$and$\psi_{J^{1k}},(t)$are
waveletbases that
are
generatedfromfather
waveletand motherwavelet,respectively. Ifwe choose Haarwavelet,ofwhich the father wavelet and mother wavelet
are
given by$\phi(t)$ $=$ $\{\begin{array}{ll}1 (0\leq t\leq 1)0 (otherwise),\end{array}$ $\langle$4)
and
$\psi(t)$ $=$ $\{\begin{array}{ll}1 (0\leq t<1/2)-f (1/2\leq t<1) ,0 (otherwise)\end{array}$ $\langle 5\rangle$
then$\phi_{j,k}(t)$and$\psi_{j,k}(t)$canbegenerated byintroducingascaling parameter$j$andashift parameter$k$
as
$\phi_{J^{k}}(t\rangle$ $=$ $2^{j/2}\phi(2^{j}t-k)=\{\begin{array}{ll}2^{j/2} (2^{-j}k\leq t\preceq 2^{-j}(k+1))0 (otherwise),\end{array}$ (6)
and
$\psi_{j_{i}}k(t\rangle$ $=$ $2^{J/2}\psi(2^{j}t-k\rangle=\{\begin{array}{ll}2^{j/2} (2^{-j}k\leq t<2^{-j}(k+1/2)\rangle-2^{j/2} (2^{-j}(k+1/2)\leq t<2^{-j}(k+1))0 (otherwise).\end{array}$
Because$\phi_{j,k}(l)$and$\psi_{ノ,k}(t)$
are
orthonormalbases,their coefficientscan
befoundby calculating theinnerproductofthetargetfunction and themselves.Therefore,the scaling coefficients andwaveletcoefficients
are
definedas
$\alpha_{jJ} = \int_{\infty}^{\infty}f(t)\phi_{ノ,k}^{*}(t)dt$, (7)
$\beta_{i^{k}}, = \int_{\infty}^{\infty}f(t)\psi_{j,k}^{*}(t)dt$, (8)
wherenotation $*$
means
complexconjugation.In practice,thehighest resolution level$J$dependsonthe length ofobserved data, say$n(=2^{J})$
.
Additionally,givenby
$f_{J}(t \rangle = f_{0}(t)+\sum_{j=\mathfrak{o}}^{J-1}g_{j}(t)$, (9)
$f_{j}(t)= \sum_{\succ-0}^{2^{j}-1}\alpha_{jJ}\phi_{j,k}(t)$, (10)
$g_{j}(t)= \sum_{k=\mathfrak{o}}^{2^{J}-1}\beta_{j,k}\psi_{J^{4}}f(/)$. (11)
Inotherwords,thelevel-J multiresolution representationof$f(t)$
on
continuous time$t$isgivenby$f_{J}(t) = \alpha 0.0\phi_{0,0}(t)+\sum_{j=0}^{J-1}\sum_{k4}^{2^{j}-1}\beta_{jJ}\psi_{J\grave{J}}(t\rangle$
.
(12) $\circ n$theother hand,in discrete time case,targetfunction$f_{f}$$(i=1,2, \cdots, n)$can
be expandedina
similar fasion:$f^{J},, = u0, \mathfrak{o}\phi_{0.0}(i)+\sum^{J-1}\sum_{-j=0\succ \mathfrak{o}}^{2^{J}-1}v_{j,k}\psi_{j,k}(l)$, (13)
where $u_{j,k}$ and$v_{jJ}$
are
discretescalingcoefficients
and discrete wavelet coefficients, respectively. Because theapproximation
error
between continuous and discrete coefficientscan
be adjusted byfactor $\sqrt{n}$([1]),we
have $u_{j}J = \frac{1}{\sqrt{n}}\sum_{j=1}^{n}f_{f}\phi_{jf}^{*}(i)$, (14)$v_{jJ}$ $=$ $\frac{1}{\sqrt{n}}\sum_{1=1}^{n}f\psi_{jj}^{*}\langle\iota)$
.
$(1S)$The mappingfromfunction$f_{i}$tocoefficients $(u_{j\prime}, v_{j_{t}k})$ iscalledthediscrete Haarwavelet
transform
(DHWT),whilethereconstruction offunction from coefficients$(u_{\dot{j}}J, v_{Jt})$ is called the inverse discrete Haar wavelet
transform
(IDHWT).FrominherencepropertyofHaarwavelet,$u_{\dot{j}}.k$and$v_{y.k}$
can
becalculated easilyas
follows.For$j=J(J=\log_{2}n)$, $k=0$, 1,$\cdots,$ $n-1,$
$u_{j,k} = \frac{1}{\sqrt{n}}f_{k+1}$
.
(16)For$j=J-1,$ $J-2,$$\cdots,$ $0,$ $k=0$, 1,$\cdots,$ $2^{j}-1,$
$u_{r\acute{J}} = \frac{1}{\sqrt{2}}(u_{j+1.2k}+u_{j+1,2k+1})$, (17)
$v_{j.k} = \frac{1}{\sqrt{2}}(u_{j+1,2k}-u_{j+1,2k+1})$
.
(18)Furthermore,for$j=0$, 1,$\cdots,$ $J-1,$ $k=0$, 1,$\cdots,$ $2^{j}-1$,the IDHWTisachieved by
$u_{J+Ip_{k}} = \frac{1}{\sqrt{2}}(u_{j,2k}+v_{j.2k+1})$, (19)
$u_{j+1,2k+1} = \frac{1}{\sqrt{2}}(u_{J2k}-v_{jlk+}$ (20)
Let $s_{i}$ be the observation of$f(i=1,2, \ldots.n)$
.
Then, the empiricaldiscretescalingcoefficients
$c_{j\lambda}$ and empirical$discre/e$waveletcoefficients
$d_{J\lambda}$can
becalculated usingequations(14)and(15)with replacedby$s_{i}.$Coefficients$c_{j,k}$and$d_{j\cdot k}$
are
calledtheempiricalones
because it isconsideredthatobservationerrors
are
included inside. Thatis,thenoises ofthe dataare
$considel\cdot ed$to be involvedinempiricaldiscretewavelet coefficients$d_{j,k},$and the threshold methods
can
beusedfordenoising. Up tonow,althougha
broadclassofthresholding schemesare
available in the literature,thecommon
choices include hard thresholding andsoft
thresholding, wherehardthresholdingis
a
keep’or
kill’rule,whilesoftthresholdingisa
$shri\alpha$or
kill’ rule. Thehardthresholdingisdefined
as
$\delta_{\tau}(d\rangle = d1_{|d|>\tau},$ (21)
and soft thresholdingis defined
as
$\delta_{\tau}(d) = s\mathfrak{R}(d)(|d|-\tau)_{+}$, (22)
forafixedthreshold level $r(>0)$,where $1_{A}$istheindicatorfunction of
an
event$A,$$sg\mathfrak{n}(d)$is the signfunctionof $d$,and$(d)_{+}= \max(O, d)$.
Letting$d_{j,k}’$denotethe denoised empiricaldiscrete waveletcoefficients,theestimatesoftargetfunction canbeobtainedbyapplyingIDHWT to$c_{j,k}$and$d_{j,k}’.$
3
Wavelet Shrinkage
Estimation
for NHPP-based
SRMs
Supposethat the number of software faults detectedthrough asystem test isobserved atdiscretetime $i=$
$O$, 1, 2, $\cdots$ Let
$Y_{i}$,and$N_{i}= \sum_{k=0}^{i}Y_{k}$respectivelydenote the number ofsoftwarefaultsdetectedat testingdate$i,$
andits cumulativevalue,where$Y_{0}=N_{0}=0$ is assumed withoutanyloss ofgenerality. Thestochastic process
$\{N_{i}:i=0, 1, 2, \cdots\}$issaidtobe
a
discretenon-homogeneousPoissonprocess($D$-NHPP)ifthe probabilitymass
functionattime$i$isgiven by
$Pr\langle N_{l}=m\} = \frac{\{\Lambda_{f}^{m}}{m}!\exp\{-A_{i}\}, m=0, 1, 2, \cdots$, (23)
where$\Lambda_{i}=E[N_{i}J$ is theexpectedcumulativenumber of software faults detected by testing date$i$
.
ThefUmction $\lambda_{j}=\Lambda_{i}-\Lambda_{i-1}(i\geq 1\rangle$iscalled the discrete intensityfunction,andi:npliestheexpectednumberoffaults$de\{$eciedattesting date$i$,say$\lambda_{i}=E[Y_{i}]$
.
Inetherwords,$Y, = \lambda_{i}+\eta_{7}, i=0, 1, 2\ldots$
.
(24)Thewavelet-basedtechniques havebeenwell established especially in several
areas
suchas
non-parametricregression,probability densityestimation,time-seriesanalysis,etc.Considel$\cdot$
thefollowingnon-parametric
regres-sionmodel:
$S, = z_{i}+\epsilon_{i}, i=1, 2, \cdots, n$, (25)
where$z_{\dot{f}}$
are
arbitrary targetfunctionsofdiscrete-time index ; and$\epsilon$,
are
independentGaussianrandomvariableswith$N(O, \sigma^{2})$and$\sigma(>0)$
.
One ofthe basicapproachestotheGaussian non-parametric regression istoconsiderthe unknown function$z_{i}$ expanded
as a
waveletseries and to transformthe original problemtoan
estimation ofthe wavelet coefficients from the data. DonohoandJohnstone$\mathfrak{l}6$,7,8]and Donohoetal. [9]proposednon-linear
waveletestimatorsof$z_{i}$,and suggestedtheextractionofthe significant wavelet coefficientsbythresholding,where
waveletcoefficients
are
set to$O$iftheirabsolutevalueisbelowa certainthreshold level. Then,applicationoftheinverse wavelet transform ofdenoised coefficientsbythresholding leads to
an
estimatoroftheunderlying targetfunction$z_{l}.$
Thisstandard wavelet-basedtechnique
can
beappliedtoanonparametricestimationofthediscretizedNHPP.Table1: RepresentativeDataTransformaions(the
case
ofvarianceequalsto1/4).Table 2: RepresentativeDataTransformaions(the
case
ofvarianceequalsto 1).([3,4,2,10 Table 1 and Table 2showrepresentivevariance-stabilizingtransforms with differentvariances. By
using eitherof them, the
sottware
faultcountdata, say $y_{l}$,which isthe observation of$Y_{i}(i=1,2, \ldots, n)$, isapproximately transformed to theGaussiandata. That is, thetransformed realizations$s_{j}(i=1,2,$$\ldots,$ $n\rangle$by data
transformations
can
be regardedas
theones
from the normallydistributed random variables:$S_{f} = \lambda_{i}’+v_{i}, i=1, 2, \cdots, n$, (26)
where$\lambda_{i}’$ isthe transformedsoftware intensity nation, and
$v_{t}$istheGaussianwhitenoisewith constantvariance. Then,$\lambda_{j}’$isthetargetfunctiontobe expandedbymultiresolution representation introduced in Section 2.
Forthresholding, too largethresholdmay cutoffimportant parts ofthe original function, whereas toosmall
thresholdretainsnoise in the selectivereconstruction.Hence,the choice ofthresholdis also
a
significant problem.Infact manymethodstodeterminethethresholdlevelhave beenproposed(e.g. see[5]). In WSE([15, 16, 17
We
use
theuniversal threshold[6],and the’leave out-halfcross
validation threshold[13]:$\tau = \frac{v}{\sqrt{n}}\sqrt{2\log n}$, (27)
$\tau = (1-\frac{\log 2}{\log n})^{-1/2}\tau(\frac{n}{2})$, (28)
where$v$is the standardvariationof empirical waveletcoefficients,and$n$isthe lengthofthe observation$y_{1}$
.
Refer to[13]for details of$\tau(n/2)$.4
Real
Data
Analysis
Although eight kinds of variance-stabiliz\’ingtransformations
were
usedinWSE,the most appropriateoneforNHPP-based SRMs
was
notbe identified. Inthisexperiment,welook intothis problem. Weuse
six setsofsoft-ware
fault countdata(groupdata)collectedfrom real projectdata sets([11 Let$(K,$ $n,$ $x_{n}\rangle$denote the triple of the softwaresize,thefinaltestingdateand the total number ofdetected fault. Thenthese datasets$(DS1 \sim DS6)$arepresented by$(14kloc, 62weeks, 133)$,$(15kloc, 41weeks, 351)$,$(103kloc, 73weeks, 367)$,$(null, 46days, 266)$, $(300kb, 81days, 463\rangle, (2$ookloc, $11 ldays, 481)$, respectively. The MSE(meansquareserror)and LL$(\log$
likeli-hood)
are
employedas
the goodness-of-fit measures, whereMSE$i$ $=$
$\frac{\sqrt{\sum_{i=(}^{n}(\Lambda_{i}-x;\rangle^{2}}}{n}$
, (29)
$MSE_{2} = \frac{\sqrt{\sum_{j=1}^{n}(\lambda_{i}-y_{j})^{2}}}{n}$, (30)
$LL$ $=$ $\sum_{:=1}^{n}(x_{i}-x_{i-i})\ln[\Lambda_{;}-\Lambda\vdash 1]-\Lambda_{n}-\sum_{j=1}^{/\iota}\ln[(x_{i}-x_{i-1})!].$ $(31\rangle$
We apply the WSE with two threshelding techniques (hard thresholding(h) v.s. softthresholding (s)), and
twothresholds(universalthreshold(ut)v.s. cross-validation threshold(cvt$\rangle$).Table 3presentsthegoodness-of-fit
results ofDS1,based
on
the fourthreshold methodsfor DS1 withdifferent variance-stabilizingtransformations.Due topagelimit,theresults shownhere
are
ofthecase
where the highest resolution level$i$issettobe 3.Forall the variance-stabilizingtransformations,it is observedthat whenuniversalthreshold isused, the
theaccuracyof the
case
of$l=1/4$is thesame
withor
better than that of thecase
of$\nu^{2}=1$ inmostcases.
Thereasons are
consideredtobeas
follows. Forthecase
ofuniversalthreshold, because the magnitude relationshipsbetweentransformed dataand theuniversal threshold
are
the same, thesame
empirical discrete waveletcoeffi-cients$d_{JJ}$
are
set tobe O. Therefore, the accuraciesare
thesame
fromboththeoretical and experimental pointsofview. That is, thevarianceof the variance-stabilizingtransforms doesnot affectthe accuracy of WSEwhen
universalthresholdispreferred. $\circ n$theotherhand,the
same
magnitude relationshipdoes notexistin thecase
ofcross-vaiidation threshold. Hence,theaccuracies
are
different whendifferent variance-stabilizingtransformationsare
used from theoretical point ofview.However,itisobserved in all the sixdata sets ofour experimentsthat,theaccuracyofthe
case
of $l=1/4$isthesame
withor
betterthan that ofthecase
of$f=1$inmostcases.
Therefore,a
variance-stabilizing transformationwith smallervariance ispreferred.5
Conclusion
Inthispaper,the effectivenessof variance-stabilizingtransformations to waveletshrinkageestimation
was
in-vestigated. Fromthe numerical evaluation,wefound thattheaccuracyofthe
case
of $l=1/4$was
thesame
withorbetter thanthat of the
case
of$l=1$ inmostcases.
Therefore, weconcluded that suchvariance-stabilizingtransformations thattransform originalvarianceto 1/4should be usedinwaveletshrinkage estimationto
assess
softwarereliability.
Acknowledgment
Thiswork
was
supported by JSPS KAKENHI GrantNumber26730039.References
[1] F. Abramovich, T. C. Bailey, and T. Sapatinas, “Wavelet analysis and its statistical applications The
Statistician,49(1),pp. 1-29(2000).
[2] F.J.Anscombe, “The transformationofPoisson,binomial andnegative binomialdata,“Biometrika, vol. 35,
no.
3-4,pp.246-254, 1948.[3] M.S.Bartlett, “Thesquareroottransformation in the analysisofvariance,”SupplementtotheJournal
of
heRoyal StatisticalSociety,vol. 3,no. 1,pp.68-78,1936.
[4] M. S. Bartlett, “The
use
oftransformation Biometrics,vol.3,no.
1,pp.39-52, 1947.[5] P.Besbeas,I.DeFeis and T. Sapatinas, “A comparative simulation studyofwaveletshrinkageestimators for
Poisson counts International StatisticalReview,vol.72,$\iota lo.$2,pp.209-237,2004.
[6] D. L.DonohoandI. M. Johnstone, “Idealspatialadaptationby waveletshrinkage Biometrika, vol. 81,
no.
3,pp.425-455, 1994.[7] D. L. Donohoand 1.M. Johnstone, $Adaptin_{g}$tounknown smoothnessvia waveletshrinkage Journal
of
the American StatisticalAssociation,vol.90,pp. $120\triangleright 1224$, 1995.
[8] D.L.Donoho andI. M.Johnstone, “Minimaxestimation viawaveletshrinkage Annals
of
Statistics,vol.26,
no.
3,pp.879-921, 1998.[9] D. L. Donoho, I. M.Johnstone, G. Kerkyacharianand D.Ricard, “Waveletshrinkage: asymptopia? (with
[10] M.$i^{i}$
.
Freemanand J. W. Tukey,“Transformations relatedto theangular andthesquareroot,” The Annals
of
MathematicalStatistics,vol.21,no.4,pp. 607$*$11, 1950.
[11] M. R. Lyu(ed.), Handbook
ofSofware
ReliabilityEngineering, McGraw-Hill, New York,1996.[12] J. D. Musa, A. lannino and K. Okumoto,
Software
Reliability, Measurement, Prediction, Application,McGraw-Hill, NewYork, 1987.
[13] G.P.Nason, “Waveletshrinkageusingcross-validation,” Journal
of
theRoyalStatisticalSociety: SeriesB,vol. 58,pp.463-479, 1996.
[14] H.Pham,
Software
Reliability, Springer,Singapore,2000.[15] X. Xiao andT. Dohi, “Wavelet-based approach forestimating software reliability Proceedings
of
20thInternational Symposiumon
SofAvare
Reliability Engineering(ISSRE’09),pp. 11-20, IEEE CSPress,2009.[163 X. Xiao andT.Dohi, “A comparativestudy ofdatatransformations for wavelet shrinkageestimation with
application to software reliability assessment Advances in
Software
Engineering, vol. 2012, ArticleID 524636,9pages,
May2012.[17] X.Xiao and T.Dohi, “Waveletshrinkageestimationfor non-homogenousPoisson processbased software