Mathematical Problems in Engineering

(1)

PII. S0161171204202319 http://ijmms.hindawi.com

A BAYESIAN MODEL FOR BINARY MARKOV CHAINS

SOUAD ASSOUDOU and BELKHEIR ESSEBBAR Received 20 February 2002

This note is concerned with Bayesian estimation of the transition probabilities of a binary Markov chain observed from heterogeneous individuals. The model is founded on Jeﬀreys’

prior which allows for transition probabilities to be correlated. The Bayesian estimator is approximated by means of Monte Carlo Markov chain (MCMC) techniques. The performance of the Bayesian estimates is illustrated by analyzing a small simulated data set.

2000 Mathematics Subject Classiﬁcation: 62M05.

1. Introduction. Markov chain models have been useful for the analysis of longi- tudinal data in many areas of research. In ecology, the model was used to study the migration behavior of animal population from capture-recapture data [2]; in pathology, the model was useful to describe the evolution of certain viral or infectious diseases [3,6]; in sociology, the model was used for the modeling of the behavior of smoking population [5].

In most applications, these models do not take into account possible correlations between diﬀerent rows of the transition matrix. As the observations are dependent, it seems more reasonable to consider prior distributions which incorporate a certain type of dependence between the components of the parameters.

In this note, we explore the Bayesian model, for binary Markov chains, using Jeﬀreys’

prior which has some advantages: the model has no extra parameters and permits a structure of correlation between the transition probabilities.

In the sequel,X=(X0,...,Xn)denotes a homogenous and stationary Markov chain with transition probabilities

pij=P

Xt+1=j|Xt=i

, i,j=0,1. (1.1)

The equilibrium probability of observing a 1, which we denote byp, represents the long- run proportion of time when the Markov chain is in state 1. From [1], this probability is given byp=p01/(p01+p10).

Lettingx =(x0,...,xn) denote a fully observed realization of X, conditionally to X⁰=1, the distribution of the observed sequence is then

f

x|x⁰=1,θ

=

1−p⁰¹n00

pⁿ01⁰¹p10ⁿ¹⁰

1−p¹⁰n11

, (1.2)

wherenijis the number of one-step transition from stateito statejuntil timenand θ=(p01,p10)∈]0,1[²is the unknown parameter which is the aim of this inference.

(2)

Using the distribution (1.2), the maximum likelihood estimates (MLEs) ofp01andp10

are

pˆ01= n01

n00+n01, pˆ10= n10

n10+n11, (1.3)

and the MLE ofpis ˆp=pˆ01/(pˆ01+pˆ10).

The remainder of the note is organized as follows. In the next section, we calculate Jeﬀreys’ prior and the correspondent posterior distribution. Next, we describe the way to approximate the Bayesian estimator via the independent Metropolis-Hasting (IMH) algorithm. Finally, we develop a numerical study by simulation in order to compare the Bayesian estimates with the MLEs.

2. Jeffreys’ prior. The goal here is to determine Jeffreys’ prior (see, e.g., [4]) and its correspondent posterior distribution. Jeffreys’ prior is obtained by taking the determinant of the information matrix which is defined according to Fisher as

Ᏽn(θ)=E

−∂²ln

θ|x0=1

∂pij∂pkh

, (2.1)

whereln(θ |x0=1)is the logarithm function of (1.2). To obtain (2.1), we take the second derivates ofln(θ|x0=1), and then take the expectation with negative sign to yield

E

−∂²ln

θ|x0=1

∂p²01

=E

n01|X0=1 p01²

+E

n00|X0=1 1−p01

2 ,

E

−∂²ln

θ|x0=1

∂p²10

=E

n10|X0=1 p10²

+E

n11|X0=1 1−p102 .

(2.2)

Considering the expectation of the suﬃcient statistics(n⁰⁰,n⁰¹,n¹⁰,n¹¹), we have

E

nij|X0=1

= n t=1

P

Xt−1=i,Xt=j|X0=1

= n t=1

P

Xt=j|Xt−1=i P

Xt−1=i|X0=1

=pij

n t=1

p1i^(t−1),

(2.3)

wherep^(k)_ij denotes the k-step transition probabilities. By the Chapman-Kolmogorov equation [1], these probabilities may be written in terms of the one-step transition probabilitiespij as

(3)

p^k00=

p⁰¹+p10−1

p¹⁰+p01

1−p⁰¹−p¹⁰k , p^k01=

p01+p10−1

p01−p01

1−p01−p10k , p^k10=

p01+p10−1

p10−p10

1−p01−p10k , p^k11=

p01+p10

₋₁

p01+p10

1−p01−p10

_k .

(2.4)

Then, we deduce that

E

n01|X0=1

=p01

p01+p10

₋₁

np10−p10

1−

1−p01−p10

_n p⁰¹+p10

,

E

n⁰⁰|X⁰=1

=p⁰⁰

p⁰¹+p¹⁰−1

np¹⁰−p¹⁰1−

1−p01−p10n

p01+p10

,

E

n10|X0=1

=p10

p01+p10

₋1

np01+p10

1−

1−p01−p10

n

p01+p10

,

E

n11|X0=1

=p11

p01+p10

₋₁

np01+p10

1−

1−p01−p10

_n p⁰¹+p10

.

(2.5)

Hence, the Fisher information matrix can be written as

Ᏽn(θ)=

A¹¹ 0

0 A22 , (2.6)

where

A11=p10

n

p01+p10

−1+

1−p01−p10

n p01

1−p01

p01+p10

2 ,

A22=np01

p01+p10

+p10

1−

1−p10−p01

n p10

1−p10

p10+p01

2 .

(2.7)

Since Jeﬀreys’ priorπ(θ)is deﬁned by

π(θ)∝ det

Ᏽn(θ)1/2

, (2.8)

where det(·)denotes the determinant, it follows that

π(θ)∝ n

p01+p10

−1+

1−p01−p10

n1/2

× np01

p01+p10

+p10

1−p01−p10

n1/2

×p^−1/201

1−p01

₋1/2 1−p10

₋1/2

p01+p10

₋2

,

(2.9)

(4)

and the posterior density is

π(θ|x)∝ n

p⁰¹+p10

−1+

1−p⁰¹−p10n1/2

× np01

p01+p10 +p10

1−p01−p10n1/2

×p01ⁿ⁰¹⁻^1/2

1−p01

n00−1/2

pⁿ10¹⁰

1−p10

n11−1/2

p01+p10

₋2

.

(2.10)

The main advantage of the density (2.9) is that it provides a convenient analysis when the transition probabilities may be correlated. Moreover, this prior has no extra parameters and it is a conjugate distribution forf (x|x0=1,θ)given by (1.2). Also notice that for the particular casep01+p10=1 (independent case), Jeﬀreys’ prior given by (2.9) is just the beta distributionᏮe(1/2,1/2).

3. Bayesian estimation of transition probabilities. Under the squared error loss, we know that the Bayes estimator coincides with the posterior mean, that is,

E(θ|x)=

θπ(θ|x)dθ. (3.1)

In the case of Jeﬀreys’ prior, the above integral is diﬃcult to calculate, so we propose an approximation of it by means of a Monte Carlo Markov chain (MCMC) algorithm;

namely, the IMH algorithm (see [7]).

The fundamental idea behind these algorithms is to construct a homogenous and ergodic Markov chain(θ^(l))with stationary measure π(θ|x). Form0large enough, θ^(m⁰⁾ is roughly distributed from π(θ|x)and the sample θ^(m⁰⁾,θ^(m⁰⁺¹⁾,... can be used to derive the posterior means. For instance, the Ergodic theorem (cf. [7]) justiﬁes the approximation of the integral (3.1) by the empirical average

1 m

m l=1

θ^(m⁰^+l) (3.2)

in the sense that (3.2) is converging to the integral (3.1) for almost every realization of the chain(θ^(l))under minimal conditions. Next, we give the description of the IMH algorithm.

Givenθ⁽⁰⁾=(p01⁽⁰⁾,p⁽⁰⁾10), the IMH algorithm at steplproceeds as follows.

Step1. Generatey^(l)=(y1^(l),y2^(l))∼U[0,1]×U[0,1]. Step2. Take

θ^(l+¹⁾=







y^(l) with probabilityσ (θ^(l),y^(l)),

θ^(l) with probability 1−σ (θ^(l),y^(l)), (3.3)

where

(i) σ (θ^(l),y^(l))=min(π(y^(l)|x)/π(θ^(l)|x),1), (ii) π(· |x)is the posterior density given by (2.10).

(5)

As convergence assessments, we use the cumulated sums method (cf. [7]) in the sense that a necessary condition for convergence is the stabilization of the empirical average (3.2). Also, this method of convergence control gives the minimal valuemof iterations that provides the approximation of the integral (3.1) by the empirical average (3.2).

4. Numerical study. In this section, we illustrate the performance of the Bayesian estimation based on Jeﬀreys’ prior by analyzing a small simulated data set.

On the one hand, the simulation study compares the proposed Bayesian estimators with the MLEs. On the other hand, it compares the two estimators for independently- chosen transition probabilities in both cases of the beta distributionᏮe(1/2,1/2)and the uniform distribution.

We recall that under the beta prior distribution

π p01,p10

∝p⁻10^1/2

1−p10

₋1/2

p⁻01^1/2

1−p01

₋1/2

, (4.1)

the Bayesian estimator is calculated explicitly by

p˜01= n01+1/2

n00+n01+1, p˜10= n10+1/2

n10+n11+1. (4.2) The Bayesian solution, using the uniform prior distribution

π p01,p10

=I[0,1]

p10

×I[0,1]

p01

, (4.3)

is given by

p˜01= n01+1

n⁰⁰+n⁰¹+2, p˜10= n10+1

n¹⁰+n¹¹+2. (4.4) 4.1. A simulated data set. Table 4.1displays a data set consisting of 20 independent Markov chains each with 21 observations; obviously, the chains may be of differing lengths. To generate this data set, transition probabilities for each chain are first drawn from Jeffreys’ prior given by (2.9) by using the IMH algorithm (seeSection 3). We as- sume, without loss of generality, that the first stateX0, in each chain, is equal to 1. The remaining observations in each chain are drawn in succession from Bernoulli distribution with successive probabilities given by the appropriate transition probabilities.

We recall that Jeﬀreys’ prior permits a certain type of dependence between the ran- dom vectorsP10andP01. Indeed,P10andP01are correlated with correlation coeﬃcient

ρ=E P10P01

−E P10

E P01

σ10σ01 , (4.5)

whereσ10(resp.,σ01) is the standard deviation ofP10(resp.,P01).

(6)

Table4.1.Estimationofthetransitionprobabilities. Bayesianestimates SimulatedchainsActualMLEJeﬀreys’priorBetapriorUniformprior p10p01ˆp10ˆp01˜p10˜p01˜p10˜p01˜p10˜p01 11010101010101010101010.97650.98531.0001.0000.95610.95510.95450.95450.91670.9167 21000000000000000000000.98420.03381.0000.00000.92410.03250.75000.02500.66670.0476 31000101010000011000000.75620.16590.83330.28570.78210.30740.78570.30000.75000.3125 41111111111110001000000.19710.32160.15380.14290.18220.33200.17860.18750.20000.2222 51000000000000100000100.79620.04401.0000.11760.81510.03910.87500.13890.80000.1579 61100010101001111011010.37890.55090.54550.66670.54480.56120.54170.65000.53850.6364 71101110101100111010100.59810.75970.53850.85710.56170.80710.53570.81250.53330.7778 81111111111111111111110.00260.30170.0000—0.03560.34120.02380.50000.04540.5000 91000001000000100010000.91370.26601.0000.18750.89050.25130.90000.20590.83330.2222 101010101010101010101010.99080.99311.0001.0000.96790.97430.95450.95450.91670.9167 111111001011010110100110.47310.79250.50000.75000.49120.73160.50000.72220.50000.7000 121111100111111111111110.07190.67220.05560.50000.09330.59370.07890.50000.10000.5000 131011000001111000110110.46890.47910.40000.40000.44420.45180.40910.40910.41670.4167 141100011001111111011110.31250.30560.21430.50000.42420.39010.23330.50000.25000.5000 151111111110000000000010.28050.12530.11110.09090.25170.12450.15000.12500.18180.1538 161100000000000000000000.76860.01610.50000.00000.68560.03410.50000.02630.50000.0500 171111111111011111111110.08360.73080.05261.0000.09160.70250.07500.75000.09090.6667 181010000000000100000000.66670.23761.0000.11760.71560.18170.87500.13890.80000.1579 191101111101111010101010.44440.91350.42861.0000.43910.92920.43330.92860.43750.8750 201111000110110000000000.31590.15160.37500.16670.34150.16970.38890.19230.40000.2143 Mean0.52400.44230.53540.46220.53190.44550.50710.45110.49390.4472 Meansquareerror(MSE)0.01600.01330.00300.00240.01220.00960.01310.0101

(7)

To approximate this coeﬃcient, we use the sampleθ⁽¹⁾,...,θ^(m), drawn from Jeﬀreys’

prior thanks to the IMH algorithm. Therefore, an approximation ofρis

ρ˜= m_m

l=1p10^(l)p^(l)01−_m

l=1p^(l)10

_m

l=1p^(l)01

m_m

l=1

p10^(l)

2

−m l=1p^(l)10

2 m_m

l=1

p^(l)01

2

−m l=1p^(l)01

2. (4.6)

Numerically, we have ˜ρ=0.36.

To obtain the Bayesian estimator ˜pij, based on Jeﬀreys’ prior, of the transition probabilities, given each chain, we apply the IMH algorithm to the posterior distribution given by (2.10). The MLE ˆpij is calculated from (1.3). The Bayesian estimator ˜pij founded on the beta distribution (resp., the uniform distribution) is obtained from (4.2) (resp., from (4.4)). The results of this experiment are provided inTable 4.1.

4.2. Simulation results. Table 4.1shows the actual transition probabilitiespijfrom the simulation, the MLE ˆpij, and the Bayesian estimates ˜pijof the transition probabilities for each chain.

Notice that for many chains, the MLE takes extreme values 0 or 1; this is explained by the restricted size of the simulated sample. In addition, for the chain no. 8, ˆp01does not exist because the chain never entered state 0, whereas the Bayesian estimates do not suﬀer from these problems because a common prior distribution is assumed.

Also shown inTable 4.1are the mean actual and the estimated transition probabilities, as well as mean square errors (MSE) for the estimates. The MSE are calculated by averaging the squared diﬀerence between the estimated probability and the actual probability used in simulation. Notice that the Bayes posterior means perform better than the MLEs. In particular, the MSE of the MLEs is clearly higher than that corresponding to the Bayes estimates.

This study also illustrates the usefulness of modeling the dependence among the transition probabilities. In particular, the resulting posterior distributions, under the assumption thatP10andP01are independent, may not be accurate. Indeed, using the beta prior distribution (resp., the uniform prior distribution), the resulting changes in the posterior means range from−0.1594 to 0.1909 (resp., from−0.0844 to 0.2574) for ˜p10 and from−0.1588 to 0.1445 (resp., from −0.1588 to 0.1098) for ˜p01. More- over, the MSEs corresponding to ˜p10and ˜p01become 0.0122 and 0.0096 (resp., 0.0131 and 0.0101). These are slightly higher than the results obtained by modeling the dependence. All these results lead to privilege the Bayesian solution based on Jeﬀreys’

prior.

For the previous experiment, a Pascal program is written to run the transition probabilities. The Bayesian estimator ˜pij, founded on Jeﬀreys’ prior, is obtained from a single chain including 10⁴iterations.

Figures4.1and4.2give an example of the convergence evaluation (seeSection 3).

Figure 4.1(resp.,Figure 4.2) describes the convergence of the estimator ˜p10(resp., ˜p01) as the numbermof iterations increases. The ﬁnal values are 0.4073 and 0.1742 for ˜p10

and ˜p01, respectively.

(8)

1000 800

600 400

IMH 200

0.30 0.4 0.5 0.6 p10

0.7 0.8 0.9 1

Figure4.1. Convergence of ˜p10.

1000 800

IMH 600 400

200 0.10

0.2 0.3 0.4 0.5 0.6 p01

0.7 0.8 0.9

Figure4.2. Convergence of ˜p01.

5. Conclusion. In this note, we studied the Bayesian estimation for the transition probabilities of a binary Markov chain under Jeﬀreys’ prior distribution. As shown, this prior has many advantages: it permits a certain type of dependence between the

(9)

components of the parameter. The absence of extra parameter in this prior is of great interest because we do not need to do more extra estimation. A numerical study by simulation is also carried out to evaluate the performance of the Bayesian estimates compared to the MLEs. The following stage of this note will be to generalize the sug- gested method in the case of missing data.

References

[1] D. R. Cox and H. D. Miller,The Theory of Stochastic Processes, John Wiley & Sons, New York, 1965.

[2] J. A. Dupuis,Bayesian estimation of movement and survival probabilities from capture- recapture data, Biometrika82(1995), no. 4, 761–772.

[3] R. C. Gentleman, J. F. Lawless, J. C. Lindsey, and P. Yan,Multi-state Markov models for analysing incomplete disease history data with illustrations for HIV disease, Statistics in Medicine13(1994), no. 3, 805–821.

[4] R. E. Kass and L. Wasserman,The selection of prior distributions by formal rules, J. Amer.

Statist. Assoc.91(1996), no. 435, 1343–1370.

[5] M. R. Meshkani and L. Billard, Empirical Bayes estimators for a finite Markov chain, Biometrika79(1992), no. 1, 185–193.

[6] S. Richardson, I. Deltour, and J. Y. Lehesran,Stochastic algorithms for Markov models esti- mation with intermittent missing data, Biometrics55(1999), 565–573.

[7] C. Robert,Méthodes de Monte Carlo par Chaînes de Markov[Markov Chain Monte Carlo Methods], Statistique Mathématique et Probabilité, Éditions Économica, Paris, 1996 (French).

Souad Assoudou: Department of Mathematics and Computer Sciences, Faculty of Sciences, Avenue Ibn Batouta, BP 1014, Rabat, Morocco

E-mail address:[email protected]

Belkheir Essebbar: Department of Mathematics and Computer Sciences, Faculty of Sciences, Avenue Ibn Batouta, BP 1014, Rabat, Morocco

E-mail address:[email protected]

(10)

Mathematical Problems in Engineering

Special Issue on

Modeling Experimental Nonlinear Dynamics and Chaotic Scenarios

Call for Papers

Thinking about nonlinearity in engineering areas, up to the 70s, was focused on intentionally built nonlinear parts in order to improve the operational characteristics of a device or system. Keying, saturation, hysteretic phenomena, and dead zones were added to existing devices increasing their behavior diversity and precision. In this context, an intrinsic nonlinearity was treated just as a linear approximation, around equilibrium points.

Inspired on the rediscovering of the richness of nonlinear and chaotic phenomena, engineers started using analytical tools from “Qualitative Theory of Di

ﬀ

erential Equations,”

allowing more precise analysis and synthesis, in order to produce new vital products and services. Bifurcation theory, dynamical systems and chaos started to be part of the mandatory set of tools for design engineers.

This proposed special edition of the Mathematical Prob-

lems in Engineering aims to provide a picture of the impor-

tance of the bifurcation theory, relating it with nonlinear and chaotic dynamics for natural and engineered systems.

Ideas of how this dynamics can be captured through precisely tailored real and numerical experiments and understanding by the combination of specific tools that associate dynamical system theory and geometric tools in a very clever, sophis- ticated, and at the same time simple and unique analytical environment are the subject of this issue, allowing new methods to design high-precision devices and equipment.

Authors should follow the Mathematical Problems in Engineering manuscript format described at

http://www .hindawi.com/journals/mpe/. Prospective authors should

submit an electronic copy of their complete manuscript through the journal Manuscript Tracking System at

http://

mts.hindawi.com/

according to the following timetable:

Manuscript Due February 1, 2009 First Round of Reviews May 1, 2009 Publication Date August 1, 2009

Guest Editors

José Roberto Castilho Piqueira,

Telecommunication and Control Engineering Department, Polytechnic School, The University of São Paulo, 05508-970 São Paulo, Brazil;

[email protected]

Elbert E. Neher Macau,

Laboratório Associado de Matemática Aplicada e Computação (LAC), Instituto Nacional de Pesquisas Espaciais (INPE), São Josè dos Campos, 12227-010 São Paulo, Brazil ; [email protected]

Celso Grebogi,

Department of Physics, King’s College, University of Aberdeen, Aberdeen AB24 3UE, UK;

[email protected]

Hindawi Publishing Corporation http://www.hindawi.com