185 FranciscoJ.Torres-Avilés ,GloriaIcaza ,ReinaldoB.Arellano-Valle UnaextensiónalamezcladeescaladenormalesparalaestimaciónBayesianaenpequeñasáreas AnExtensiontotheScaleMixtureofNormalsforBayesianSmall-AreaEstimation

(1)

Número especial en Bioestadística Junio 2012, volumen 35, no. 2, pp. 185 a 204

An Extension to the Scale Mixture of Normals for Bayesian Small-Area Estimation

Una extensión a la mezcla de escala de normales para la estimación Bayesiana en pequeñas áreas

Francisco J. Torres-Avilés^1,a, Gloria Icaza^2,b, Reinaldo B. Arellano-Valle^3,c

1Departamento de Matemática y Ciencia de la Computación, Facultad de Ciencia, Universidad de Santiago de Chile, Santiago, Chile

2Instituto de Matemática y Física, Universidad de Talca, Talca, Chile

3Departamento de Estadística, Facultad de Matemáticas, Pontificia Universidad Católica de Chile, Santiago, Chile

Abstract

This work considers distributions obtained as scale mixture of normal densities for correlated random variables, in the context of the Markov random field theory, which is applied in Bayesian spatial intrinsically autoregressive random effect models. Conditions are established in order to guarantee the posterior distribution existence when the random field is assumed as scale mixture of normal densities. Lung, trachea and bronchi cancer relative risks and childhood diabetes incidence in Chilean municipal districts are estimated to illustrate the proposed methods. Results are presented using appropriate thematic maps. Inference over unknown parameters is discussed and some extensions are proposed.

Key words:Disease mapping, Markov random field, Hierarchical model, Incidence rate, Relative risk.

Resumen

Este trabajo aborda las distribuciones obtenidas como mezcla de escala de normales para variables aleatorias correlacionadas, en el contexto de la teoría de los campos markovianos, la cual es aplicada a modelos bayesianos espa- ciales con efectos aleatorios autoregresivos intrínsecos. Se establecen condi- ciones para garantizar la existencia de la distribución a posteriori cuando se

aAssistant professor. E-mail: [email protected]

bAssociate professor. E-mail: [email protected]

cProfessor. E-mail: [email protected]

(2)

asume una distribución mezcla de escala de normales para el campo markoviano propuesto. Para ilustrar los métodos propuestos, se estiman los riesgos relativos de cáncer de tráquea, bronquios y pulmón, y tasas de incidencia de diabetes tipo 1 en distritos municipales de Chile. Los resultados son presen- tados usando mapas temáticos apropiados. Se discute la inferencia sobre los parámetros desconocidos y se proponen algunas extensiones.

Palabras clave:campo aleatorio markoviano, mapeo de enfermedades, mod- elo jerárquico, riesgo relativo, tasa de incidencia.

1. Introduction

Over the last two decades, Bayesian spatial models have become increasingly popular for epidemiologists and statisticians. In particular, small-area modeling is oriented to illustrate the behavior of rates or relative risks associated to each district that form a region or a country, that is, recognition of spatial patterns through maps is the main aim of these methodologies. The conventional assumption to estimate the standardized mortality ratios (SMRs) or incidence rates is based on the Poisson distribution. This assumption may cause several problems in this class of studies, mainly because of the extra-poisson variation. This extra-Poisson variation generally arises when the observed number of cases on each small-area are more variable than the variation contributed by the standard Poisson model (Mollié 2000). Bayesian models have been developed to solve this problem, intro- ducing random effects to account for unobserved spatial heterogeneity; even more, Markov chain Monte Carlo (MCMC) methods led to an explosive increment of the use of Bayesian analysis in these areas of application.

Important works that develop and use Bayesian theory are mentioned in the following lines. The pioneering work in this direction was done by Clayton &

Kaldor (1987) who proposed an empirical Bayes approach with application to lip cancer data in Scotland. In Ghosh, Natarajan, Stroud & Carlin (1998), conditions to demonstrate Bayesian generalized linear model (GLM) integrability are formal- ized under improper prior assumptions in order to represent lack of knowledge over unknown parameters. Best, Arnold, Thomas, Waller & Collon (1999) investigated several spatial prior distributions based on Markov random field (MRF) theory, and discussed methods for model comparison and diagnostics. Pascutto, Wake- field, Best, Richardson, Bernardinelli, Staines & Elliott (2000) examined some structural and functional assumptions of these models and illustrated their sensitivity through the presentation of results related to informal sensitivity analysis for prior distributions choices. They also explored the effect caused by outlying areas, assuming a Student-t distribution for the nonstructured effect.

Recently, Parent & Lesage (2008) proposed a linear Bayesian hierarchical model to study the knowledge spillovers in European countries, under different specifica- tions of the proximity structures. They also compared this effect through different strategies, for example allowing different prior distributions or Student-t errors, to include heterogeneity in the disturbances.

(3)

As it was previously mentioned, the class of spatial models has been related to GLM theory, considering random effects to represent the influence of geography.

Besag (1974) presented a pioneering work in the context of the MRF theory, with applications to regular lattice systems when spatial heterogeneity is considered.

Furthermore, the most used structure follows the work developed by Besag (1986), who presents a definition in the context of a MRF: Letu= (u1, u2, . . . , um)⁰ be a set of m random variables and u−i the random vector without the i-th component, where m represents a number of different and contiguous areas. If the joint distribution ofucan be expressed by each conditional distribution,u_i| u_−i, i= 1, . . . , m, then it is called a MRF. Intrinsically conditional autoregressive (CAR) random effects are defined as a particular MRF, initially proposed by Besag, York & Mollié (1991), which name is related to the impropriety of the joint distribution generated by the univariate conditional distributions ofui |u_−i, i= 1, . . . , m (see details in Banerjee, Carlin & Gelfand 2004).

In this work, an intrinsically Gaussian MRF is considered and its properties are extended to a more general family of continuous distributions. Scale mixture of normal (SMN) distributions have been proposed as robust extensions of the normal model. The genesis of this class of models is presented by Andrews & Mallows (1974). The SMN class of distributions is generated if the vector of interest, u, can be represented as

u=µ+ψ^−1/2z (1) where µ is a location vector parameter, and z and ψ are independent, with z following a multivariate zero centered normal distribution with covariance matrix Σand ψ being a non-negative random scale factor with c.d.f. F_ψ(· | ν), so that F_ψ(0|ν) = 0. Hereν is an additional or set of parameters controlling the kurtosis of the distribution ofψ. The SMN distributions have been shown to be a subclass of the elliptical distributions family by Fang, Kotz & Ng (1990). This subfamily presents properties similar to the normal distributions, except that their behavior allows capturing unusual patterns present in the data. In a Bayesian context, robust linear models have been studied since Zellner (1976). The multivariate Student-t and the multivariate Slash distributions are examples of this class of distributions.

Following the above ideas, heavier-tailed models will be assumed instead of working with the usual assumption of normality for u, through the Student-t and Slash distributions developed by Geweke (1993) and Lange & Sinsheimer (1993), respectively. Specifically, the Slash distribution considered in this work corresponds to the distribution of the random vector ψ^−1/2z, wherezand ψare independent, withzhaving a multivariate normal distribution as in (1) andψ|ν following a distributionBeta(ν/2,1),ν >0. The Student-t distribution is obtained through the same representation as the Slash distribution, with the difference that ψ|νfollows a Gamma distribution, where both parameters are equal toν/2,ν >0.

There are some potential problems with the Slash distribution that probably has resulted in more use of the Student-t. However, the Slash distribution may allow for fatter tails (more extreme values) than the Student-t.

(4)

From a MRF context, the class of SMN can be found in papers developed from a geological point of view, where prediction is the main focus. Student-t distributed MRF was treated by Roislien & Omre (2006) using a frequentist approach. Lyu

& Simoncelli (2007) made the extension of Gaussian MRF theory to what they called Gaussian scale mixture fields, for image reconstruction modeling.

In this work, Bayesian non-Gaussian spatial models are developed to detect unusual rates or relative risks in a particular area under the following scheme.

Standard small-area models are presented in Section 2. SMN theory is applied to extend the Gaussian MRF model (Besag 1974) in Section 3. In Section 4, non-Gaussian models are developed trough extensions of the spatial random effect following a Gaussian MRF to the scale mixture of Normal random field (SMN RF) proposed previously. Three different models are used to estimate the incidence rates of Insulin Dependent Diabetes Mellitus (IDDM) in the Chilean Metropolitan Region, and Respiratory Cancer mortality in the northern regions of Chile. These results are presented in Section 5. Finally, some comments and discussion are made in Section 6.

2. Spatial Models with Random Effects

Let y = (y₁, . . . , y_m)⁰ be a set of m random variables indexed to a specific region. A general formulation is assumed from the generalized linear mixed model theory (Breslow & Clayton 1993), which includes the following elements:

1. A specification of the likelihood function as member of the exponential family, namely

f(y|θ,φ) =

m

Y

i=1

exp{φ⁻¹_i (yiθi−g(θi)) +ρ(φi;yi)}, (2) where θ= (θ1, . . . , θm)⁰ is the vector of canonical parameters,φ= (φ1, . . . , φm)⁰ is a vector of known scale parameters,g is a known function that does not depend on the data, andρis a known function that does not depend on the unknown parameters.

2. A random specification for the link function,h(θi) =E(yi |θi), is typically represented by the normal linear mixed model

h(θi)|xi,β, ui, σ²^ind.∼ N ormal(x⁰_iβ+ui, σ²), (3) where thex_is are ap×1vectors of covariates associated to ap×1vector of coefficientsβ, theu_is represent spatial random effects, andσ² measures the nonstructured variability.

3. A model specification for the spatially structured random effectsuis. Typ- ically, Gaussian assumptions for the uis are made. In the literature it is recurrent to find that these spatial random effects are influenced by a prede- fined neighborhood represented by an adjacency matrixDw, controlling the

(5)

local variability. Hence, the mean is smoothed by the information given by its neighbors.

Let π(u | σ_u²,Dw) = π(u1, . . . , um | σ²_u,Dw) be the joint probability distribution derived from a MRF given a dispersion parameter σ_u² and a m×m m×madjacency matrixDw. A multivariate Gaussian distribution is then obtained when,

π(u|σ_u²,D_w)∝ 1 (σ²_u)^m/2exp

− 1

2σ²_uu⁰D_wu

. (4)

A specific case is considered in this work, where Dw has diagonal ele- mentsw_i+ representing the number of neighbors of thei-th component, and off-diagonal elements w_ij taking values −1 if the elements i and j share boundary, denoted by i∼j, and0in other case, i.e.,

wij=







wi+ i=j

−1 i6=j;i∼j 0 otherwise.

(5) Under 5, equation 4 is reduced to

π(u|σ²_u,D_w)∝ 1 (σ²_u)^m/2exp







− 1 2σ_u²

X

i∼j

(u_i−u_j)²







(6) A basic discussion and treatment of several proximity matrices can be found in Banerjee et al. (2004). A constraint will be imposed to this expression to guarantee integrability.

4. As a final step of the modeling, prior distributions are required for the unknown parameters to complete the hierarchical model. Usual non-informative prior distributions are represented by

i. π(β)∝constant

ii. σ⁻²∼Gamma(a/2, b/2) iii. σ_u⁻²∼Gamma(c/2, d/2)

(7) where the improper prior π(β), β = (β₁, β₂, . . . , β_p)⁰ ∈ R^p, is assumed according with Ghosh et al. (1998). The hyperparameters a, b, c, d > 0 are known constant. Here, both σ² and σ²_u represent the dispersion parameters included in the model; σ_u² is the local dispersion parameter related to a specific spatial structure. Another useful measure in spatial models is the percentage of spatial aggregation explained by the model, which usually is measured by the ratio

σ_u²

σ_u²+σ² ×100% (8)

Its interpretation is related to obtain the relative contribution given by the spatial aggregation effect. Here, a common estimation ofσ_u²is the empirical variances²_u, which can be obtained from the estimation ofufor each MCMC iteration.

(6)

3. Scale mixture of Intrinsically CAR Models

In this section an extension of the usual multivariate Gaussian MRF is proposed, assuming a multivariate SMN distribution. The next definition will provide an extension of (4) to the SMN random field (SMN RF).

Definition 1. A spatial random vectoru= (u₁, . . . , u_m)⁰ follows an SMN RF, if the kernel distribution can be obtained as

π(u|σ_u²,D_w, ν)∝ Z ∞

0

ψ σ²_u

m/2

exp

− ψ

2σ_u²u⁰D_wu

dF(ψ|ν) (9) whereF(· |ν)is the c.d.f. ofψ|ν,σ_u² is a dispersion parameter, andDwdenotes a adjacency matrix. A SMN RF with scale parameter σ²_u will be denoted as SM N RF(0, σ_u⁻²Dw, ν).

For the Gaussian case, it is known that specification of Dw in (5) makes (4) improper (Banerjee et al. 2004), since the matrixD_wis singular, so that its inverse does not exist, hence

Z

R^m

π(u|σ_u²,D_w, ν)du∝ Z

R^m

1 (σ²_u)^m/2exp

− 1

2σ²_uu⁰D_wu

du=∞

The last equation implies that a density function is available, but not inte- grable. This result is the intrinsic autoregressive model property, and it is usually relegated to the prior distribution elicitation. If additional assumptions are not considered, the improper condition will imply that if a multivariate SMN RF is assumed with kernel (9), then consistent property (Kano 1994) fails. Therefore, integration theory can not be applied.

In the same way as the joint distribution of the Gaussian MRF treated in the spatial literature, for every SMN RF, the joint distribution will also be improper.

In fact, this distribution will be proper only if the associated dispersion matrix is definite positive. Hence, some additional restrictions should be imposed to obtain a proper joint distribution, as discussed in Banerjee et al. (2004) and Assunção, Potter & Cavenaghi (2002). The next proposition establishes conditions to make proper the associated SMN RF. The proof of this proposition can be found in the appendix.

Proposition 1. Suppose that a set of spatial indexed random variables, represented by the vector u = (u₁, . . . , u_m)⁰, is available. Consider the SMN RF in (9) as the distribution of u. Additionally, let us suppose that F(· |ν)is a known positive c.d.f. IfPm

i=1u_i= 0 andE(ψ^1/2|ν)<∞, then (9) is proper.

Specific choices for F(· | ν) in (9) lead to different scale mixture probability distributions. Student-t and Slash MRFs will be used in this work, which can be obtained using stochastic representations which depend on the selected mixing distributionF(· |ν).

The SMN RF can be represented hierarchically in terms of two stages:

(7)

At the first stage of the hierarchy, a Gaussian MRF is specified with an additional random scale factorψ. At the second stage, a mixing distribution for the scale perturbationψis then specified. Specifically:

1. For the Student-tMRF:

i) u|σ_u², ψ,Dw∼N ormal 0, σ⁻²_u ψDw

(10)

ii) ψ|ν∼Gamma(ν/2, ν/2) (11)

In this case, the Student-t MRF withν degrees of freedom follows, which is denoted byu|Dw, σ_u², ν∼t(0, σ_u⁻²Dw, ν).

2. For the Slash MRF:

i) u|σ_u², ψ,Dw∼N ormal 0, σ⁻²_u ψDw

(12)

ii) ψ|ν∼Beta(ν/2,1) (13)

In this case, the Slash MRF, denoted byu|D_w, σ_u², ν∼Slash(0, σ⁻²_u D_w, ν), is obtained.

The model described above is useful to implement the MCMC method. It is important to mention that the distribution of both of the above random fields has the finite condition exposed in Proposition 1.

A prior distribution forνis required in order to assume a valid Bayesian model.

Usually, an exponential distribution prior is considered for this parameter, which is assumed independent of (6), that is,

ν |δ0∼exp(δ0), δ0>0 (14) Assuming (2), (3), (7), (9) and (14), the full joint posterior distribution is specified as,

π(θ,β,u, σ²_u, σ², ψ, ν|y,Dw, δ0)∝

m

Y

i=1

exp{φ⁻¹_i (yiθi−g(θi))}

×

m

Y

i=1

exp{−(1/2σ²)(h(θi)−x⁰_iβ−ui)²}h⁰(θi)

×exp{−(ψ/2σ²u)u⁰Dwu}ψ^m/2(σ²σ²u)^−m/2,

×exp{−a/2σu²}(σ²)^−(b/2+1)

exp{−c/2σ²}(σu²)^−(d/2+1)

×f(ψ|ν) exp{−δ0ν}

(15)

wherefψ(· |ν)represents the conditional density or probability function ofψ|ν.

See item 3 of the appendix for the computational aspects.

(8)

4. Proposed Bayesian Small-Area Models

An important point is to demonstrate the integrability of the proposed model.

Under the generalized linear model (2), link function (3) and prior assumption given by (7) and (14), Theorem 2 from the work developed by Ghosh et al. (1998) gives the conditions to obtain a proper posterior distribution forθ|ywhenP(ψ= 1 | ν) = 1 (the Gaussian MRF). Following that theorem, it is possible to find a generalization towards the SMN case.

The next proposition gives conditions when the spatial random effect follows an SMN RF. Its proof is given in the appendix.

Proposition 2. Consider the model (2), link function (3) and prior assumption given by (7) and (14). Consider also the assumptions of Proposition 1 and the following additional conditions:

i. θi∈(θ_i, θi), for some−∞< θ_i< θi<∞,i= 1, . . . , m;

ii. m−p+a−1>0;

iii. b >0,d >0,m+c >0 If the condition of integrability

Z θ_i θ_i

exp{φ⁻¹_i (yiθi−g(θi))}h⁰(θi)dθi<∞,

is verified for alli= 1, . . . , m, then posterior distribution π(θ|y) is proper.

The main interest is focused in establishing a non-Gaussian parametric spatial random effect. A MCMC structure seems to be adequate to make inferences from this class of model. Most full conditional distributions computed for this scheme are known distributions, therefore, a hybrid Gibbs sampling - metropolis Hastings algorithm is used to generate samples from the joint posterior distribution. The algorithm given in item 3 of the appendix presents the full conditional distributions for this particular model.

5. Applications

The proposed spatial Bayesian models will be applied assuming SMN random effects for two real data in the epidemiological framework to control for exces- sive smoothness in small areas with sparse data. One dataset is related to IDDM incidence rates in the Chilean municipal districts from Metropolitan region and the other dataset contains female lung, trachea and bronchi cancer standardized mortality ratios in the municipal districts of the country’s northern zone. The municipal district is the smallest administrative area in Chile. In this country there are only few published studies related to spatial epidemiology (Andia, Hs- ing, Andreotti & Ferreccio 2008, Ferreccio, Rollán, Harris, Serrano, Gederlini, Margozzini, Gonzalez, Aguilera, Venegas & Jara 2007, Icaza, Núñez, Torres, Díaz

& Varela 2007, Icaza, Núñez, Díaz & Varela 2006, Torres-Avilés, Icaza, Carrasco &

(9)

Pérez-Bravo 2010). Results from non-Gaussian spatial Bayesian modeling related to both diseases are presented in the next subsections.

The specific model that is considered for these two applications is the Poisson hierarchical model given by

yi |ei, λi

ind.∼ P oisson(eiλi) log(λi)|β0, ui, σ² ^ind.∼ N ormal(β0+ui, σ²)

i= 1, . . . , m, wherey= (y1, . . . , ym)⁰ represents the observed sample vector associated tomdifferent regions under study,e= (e1, . . . , em)⁰ represents the population at risk or the expected population associated to themdifferent regions, and u= (u1, . . . , um)⁰ is the vector of random effects which is assumed to have a SMN distribution constrained to sum zero. Diffuse prior distributions are considered for the location and scale parameters, as those presented in (7). For the variance parameters,σ²andσ_u², the hyperparametersa=b=c=d= 0.001were assumed.

Posterior estimations are obtained from a single run of the Gibbs sampler, with a burn-in of 1000 iterations followed by 10000 further cycles. Convergence have been checked through trace and autocorrelation plots. Three common ways to measure model assessment are taken into account. The first two are oriented to penalize the observed deviance: The deviance information criterion (DIC) (Spiegelhalter, Best, Carlin & Van der Linde 2002) and a modified BIC (Congdon 2003) will be used. A third model choice criterion is applied, proposed by Gelfand

& Ghosh (1998), which is based on a predictive check of the model, and measures the discrepancy between the observed data and predicted observations, taking into account quadratic loss measures. As was described in the introduction, the com- peting models are related to Gaussian, Student-t and Slash MRF. The percentage of spatial variability is computed using expression (8).

5.1. Insulin Dependent Diabetes Mellitus Incidence, Metropolitan Region, Chile

The objective of this study is to describe spatial patterns of type 1 diabetes in children under 15 of age, diagnosed between 2000 and 2005 with residence in the Metropolitan Region of Chile. The Metropolitan Region is located in the centre of Chile. According to the Chilean National Institute of Statistics (INE), this region represents an area of approximately 15,403 km². Total population living at Metropolitan Region was 6,061,185 inhabitants, according to the 2002 census. Metropolitan population represents 40%of the whole country. The region is divided into 52 districts, 18 are considered as rural and 34 as highly urbanized, known as Greater Santiago, in the centre of the region, with the 96.9% of the metropolitan population. With respect to the population at risk, children under 15 years of age represent the 24.9%of the metropolitan region population, which is composed by 1,509,218 children. A population-based registry of type 1 diabetes in children under than 15 years of age has been available in the Metropolitan Region since 2000. See Carrasco, Pérez-Bravo, Dorman, Mondragón & Santos (2006)

(10)

for details about the registry. Torres-Avilés et al. (2010) show an aggregation on incidence rates in urban areas of the Chilean Metropolitan Region, using the Bayesian methodology proposed by Mollié (2000).

Table 1: IDDM model selection criteria, DIC, BIC and predictive check.

Model DIC Dbar pD BIC Predictive (G & G)

Gaussian 846.778 534.886 311.892 1151.067 13240.408 Student-t 852.687 537.371 315.315 1160.315 13335.069 Slash 836.498 529.097 307.401 1136.405 13301.901

Model selection criteria results are presented in Table 1. According to previously mentioned goodness of fit criteria, small values imply better adjustment.

Therefore, a spatial model that includes Slash random effects with 7 d.f. is a strong candidate to model geographic dependence. This result seems to be adequate due to those extreme values, which match with the higher socioeconomic areas of the region, as is explained in next paragraphs. The predictive measureG&Gdisagrees with the other methods; this can be interpreted as a “failure of the model for prediction”, pointing out a better performance of the usual Gaussian MRF.

Table 2: Posterior mean, standard deviation and 95% HPD credibility intervals for unknown parameters when a Gaussian MRF, Student-t MRF and Slash MRF are assumed.

Gaussian MRF Student-t MRF Slash MRF β₀ −9.721 (0.004) −9.760 (0.006) −9.752 (0.002)

(−9.844,−9.634) (−9.876,−9.631) (−9.841,−9.656) σ² 0.346 (0.013) 0.291 (0.016) 0.275 (0.014)

(0.162,0.574) (0.089,0.537) (0.090,0.507) σ_u² 0.230 (0.016) 0.071 (0.001) 0.067 (0.001) (0.102,0.547) (0.035,0.117) (0.032,0.112)

% Spatial Variability

0.441 (0.011) 0.537 (0.0114) 0.546 (0.012) (0.242,0.649) (0.332,0.749) (0.338,0.749)

ν - 10.475 (16.482) 7.346 (6.226)

- (3.958,18.277) (3.038,12.389) Robust Bayesian models proposed in the previous section were applied to this problem. Inferences over unknown parameters are displayed in Table 2, when Gaussian MRF, Student-t MRF and Slash MRF are assumed to control spatial variability. Similar values are estimated for β0 and σ², under the three MRF models, showing the models’ robustness. In contrast,σ²_upresents different values, depending on the distribution assumed for the MRF. The non-Gaussian model (Slash MRF) increases the degree of spatial aggregation from 44.1% to 54.6 %, that is, the excess of spatial variability presented in these data seems mostly due

(11)

to a clustering effect. Notice that the estimated degrees of freedom are small, which implies that the excess of variability is better captured by one of the SMN RF model.

Figure 1: IDDM incidence rate (IR) variability: Raw estimates, Mollié’s convolution model (Gaussian MRF), Student-t convolution model (Student-t MRF) and Slash convolution model (Slash MRF).

Figure 1 shows that fully Bayesian estimates of IDDM incidence rates present less variation than raw incidence rate. The three Bayesian variation plots seem to have a similar behavior, due to the presence of several municipal districts with high incidence rates, which are considered as outliers. Comparing the four box-plots, the three fitted models (Gaussian, Student-t and Slash) present and additional municipal district, named Las Condes, as part of the higher incidence group. The normal MRF assumption leads to estimate smoother rates; however, Student-t, and Slash MRF’s present slight variability differences. Those differences allow controlling the excess of smoothness, i.e., non-Gaussian shrinkage gives a more adequate estimate of the pattern of underlying risk of disease than that provided by the Mollié’s convolution estimates.

From Figure 2, high incidence estimates remain in municipal districts with high socioeconomic level, such as Vitacura and Providencia, located at the northeast side of the map. These results were already found by Torres-Avilés et al. (2010).

Slight differences are seen when Slash MRF (d) and Student-t MRF (c) models are assumed, but these differences are clinically important since are in rural municipal districts with zero cases of diabetes located at southwest side of the map.

(12)

Figure 2: IDDM incidence rate by district: a) Raw incidence rates. b) Mollié’s convolution model (Gaussian MRF). c) Student-t convolution model (Student-t MRF). d) Slash convolution model (Slash MRF).

5.2. Female Trachea, Bronchi and Lung Cancer Mortality, Chilean Northern Regions

Bayesian methods that have been applied to several real problems to estimate relative risks of cancer mortality in small-areas can be found in the literature, e.g., Ghosh et al. (1998) and Pascutto et al. (2000), and Mollié (2000). In particular, this application is related to estimate female lung, bronchi and trachea cancer mortality relative risks in the northern regions of Chile. The northern region of Chile represents an area of approximately 300,904 km². According to the 2002 census there were 819,177 women inhabitants in this part of the country. The region is divided into 43 districts, many of them (20 or 47%) with less than 10,000 inhabitants. The aim of this study is to describe the geographical distribution of this class of mortality, which has presented smoothness problems in comparison with the usual model.

Mortality statistics for the years 1997-2004 published by the Chilean Ministry of Health were used. The SMR was calculated for 341 districts in the country.

Results show an excess of mortality caused by trachea, bronchi and lung cancer in the region. A previous work can be found, where the analysis for both sexes was done for the whole country and published by Icaza et al. (2007). The problem arised when Mollié’s model estimates for women cancer mortality risks were too smooth and high in municipal districts where zero cases occurred.

(13)

Table 3: Cancer mortality model selection criteria, DIC, BIC and predictive check.

Model DIC Dbar pD BIC Predictive (G & G)

Gaussian 4821.381 3064.272 1757.108 8187.896 381675.00 Student-t 4805.212 3058.344 1746.869 8152.110 381671.59 Slash 4792.151 3052.174 1739.977 8125.845 381950.00 Table 4: Posterior mean, standard deviation and 95% HPD credibility intervals for

unknown parameters when a Gaussian MRF, Student-t MRF and Slash MRF are assumed.

Gaussian MRF Student-t MRF Slash MRF β₀ −0.348 (0.001) −0.372 (0.001) −0.391 (0.001)

(−0.409,−0.300) (−0.441,−0.313) (−0.425,−0.331) σ² 0.092 (0.0003) 0.087 (0.0004) 0.085 (0.0003)

(0.060,0.129) (0.054,0.128) (0.055,0.128) σ_u² 0.197 (0.001) 0.203 (0.001) 0.203 (0.001) (0.153,0.238) (0.150,0.253) (0.153,0.244)

% Spatial Variability

0.770 (0.001) 0.788 (0.001) 0.788 (0.002) (0.708,0.841) (0.740,0.848) (0.715,0.863)

ν - 26.406 (116.944) 32.049 (87.516)

- (15.742,53.499) (15.585,50.462)

For this application, Table 3 shows a better fit for the model that includes the Slash spatial random effect with approximately 32 degrees of freedom, as can be seen in Table 4. Once again, the Slash can not be considered as a predictive alternative, in contrast to a parsimonious model such as the Student-t or the Gaussian MRF. One important result is referred to the 79%estimated proportion of spatial variability associated to this model. Notice that this proportion is almost the same for the three proposed models. This could be related to the estimated degrees of freedom. One important issue is related to the estimation for the other parameters, such asβ₀or baseline risk, which is not affected by the model.

Standardized mortality ratios and Risk estimations are compared in Figure 3. It is important to add that variability estimation is reduced when any of the Bayesian models is considered. All of them show an improvement in contrast to the SMR, and a district called Mejillones is separated from the rest of the distribution, showing the highest risk in the north for this mortality.

Figure 4 displays the cancer mortality relative risk estimation using three different models, with Mollié’s convolution model (b), Student-t MRF (c) and Slash MRF (d) as spatial random effects. Models were tested and the best fit was selected among the three different proposed spatial structures.

(14)

Figure 3: Female trachea, bronchi and lung cancer SMR variability: Standardized mortality ratio, Mollié’s convolution model (Gaussian MRF), Student-t convolution model (Student-t MRF) and Slash convolution model (Slash MRF).

Figure 4: Female trachea, bronchi and lung cancer SMR by district: a) Standardized mortality ratio (SMR). b) Mollié’s convolution model. c) Student-t convolution model (Student-t MRF). d) Slash convolution model (Slash MRF).

According to the DIC and BIC criteria, the selected Slash MRF model presented better fitted rates, even when Figure 4(d) shows that the first and darkest area in the extreme north, the most populated municipal district (Arica) in that

(15)

region, presents the highest rates compared to its closer neighbors. It was not possible to reduce the effect produced by the larger areas in the next darkest zones, which correspond to Tarapacá and Antofagasta regions, which are located in the Atacama Desert. The over-smoothing effect lead to flat true variations in risk, even by the selected model.

6. Concluding Remarks

In this work, a non-Gaussian Bayesian-small area estimation is proposed as an alternative to usual parametric models. This approach is particularly useful to obtain estimations of rates or relative risks when subjective geographical dependence is assumed and related results are too smooth for the region under study.

Conditions are required to ensure the propriety of these intrinsic spatial random effect posterior distributions, which must be associated to sum zero constraint and existence of mixing random variable expectations. When spatial correlation structure was available, Proposition 2 provided sufficient conditions to guarantee posterior distribution integrability for Bayesian GLM.

The general methodology is applicable to situations where small area parameters must be estimated. Variability parameters are of interest, since their in- corporation in the proposed hierarchical models allowed the computation of the marginal spatial proportion of variability, through the empirical marginal standard deviation function, to quantify excess of variability explained by the spatial effect.

This fact has direct relation with the spatial random effect contribution considered for the analysis. As mentioned in Banerjee et al. (2004, p. 166), differences may exist in percentage of variability estimation, when other prior distributions are considered. A prior sensitivity analysis is not studied in this work.

Considering the complex structure of Chilean geography, better results were obtained using our proposed strategy. Both applications were best modeled by Poisson regression with spatial random effects following a joint Slash distribution.

It can be seen thatβ0 does not produce changes when the three models are fitted to both applications. That is an important consideration that shows the non- Gaussian properties of the Student-t MRF and Slash MRF.

In the future, several topics can be explored in the spatial context. Diagnostic approaches and extensions of model assumptions which include asymmetry in the distribution of the random effects are related topics to be developed. Simulation studies to validate proposed models under different scenarios can also be made.

Bayesian space time models can be proposed, with the subsequent problem of sparseness of data that could affect estimation in municipal districts with low population. Therefore, non-Gaussian models will become more necessary. Tem- poral trends and geographical patterns are estimated simultaneously, allowing for additional random effects to represent temporal and spatio-temporal interaction variations.

(16)

Acknowledgements

The authors acknowledge the helpful and constructive comments of the two anonymous referees, which significantly improved the quality of this paper. Torres- Avilés’s research was partially supported by DICYT 040975TA and FONDECYT de Iniciación 11110119 grants. Arellano-Valle’s research was partially supported by FONDECYT 1085241-Chile grant. We are very grateful to Ms. Elena Carrasco, Dr. Francisco Pérez-Bravo, the Fundación de Diabetes Juvenil and the Depart- ment of Statistics of theMinisterio de Salud de Chilefor providing the data used in this article. We acknowledge the helpful initial discussions with Dr. Pilar Iglesias Z.

Recibido: septiembre de 2011 — Aceptado: febrero de 2012

References

Andia, M., Hsing, A. W., Andreotti, G. & Ferreccio, C. (2008), ‘Geographic variation of gallbladder cancer mortality and risk factors in Chile: A population- based ecologic study’,International Journal of Cancer123(6), 1411–1416.

Andrews, D. F. & Mallows, C. L. (1974), ‘Scale mixture of normal distributions’, Journal of the Royal Statistical Society Series B 36(1), 99–102.

Assunção, R. M., Potter, J. E. & Cavenaghi, S. M. (2002), ‘A Bayesian space varying parameter model applied to estimating fertility schedules’,Statistics in Medicine21, 2057–2075.

Banerjee, S., Carlin, B. & Gelfand, A. (2004),Hierarchical Modeling and Analy- sis for Spatial Data, Monographs on Statistics and Applied Probability 101.

Chapman and Hall, Boca Ratón, Florida.

Besag, J. (1974), ‘Spatial interaction and the statistical analysis of lattice systems’, Journal of the Royal Statistical Society Series B 36(2), 192–236.

Besag, J. (1986), ‘On the statistical analysis of dirty pictures’,Journal of the Royal Statistical Society Series B 48(3), 259–302.

Besag, J., York, J. & Mollié, A. (1991), ‘Bayesian image restoration, with two applications in spatial statistics’, Annals of the Institute of Statistical Math- ematics 43, 1–59.

Best, N., Arnold, R., Thomas, A., Waller, L. & Collon, E. (1999), Bayesian models for spatially correlated disease and exposure data, inJ. Bernardo, A. Smith, A. Dawid & J. Berger, eds, ‘Bayesian Statistics 6’, Oxford University Press, Oxford, pp. 131–156.

Breslow, N. & Clayton, D. (1993), ‘Approximate inference in generalized linear mixed models’,Journal of the American Statistical Association88, 9–25.

(17)

Carrasco, E., Pérez-Bravo, F., Dorman, J., Mondragón, A. & Santos, J. L. (2006),

‘Increasing incidence of type 1 diabetes in population from Santiago of Chile:

Trends in a period of 18 years (1986-2003)’, Diabetes/Metabolism Research and Reviews 22, 34–37.

Clayton, D. & Kaldor, J. (1987), ‘Empirical Bayes estimates of age-standardized relative risks for use in disease mapping’,Biometrics43, 671–681.

Congdon, P. (2003),Applied Bayesian Modelling, Wiley & Sons, Chichester.

Damien, P. & Walker, S. (2001), ‘Sampling truncated normal, beta and gamma densities’,Journal of Computational and Graphical Statistics10(2), 206–215.

Fang, K. T., Kotz, S. & Ng, K. W. (1990), Symmetric Multivariate and Related Distributions, Chapman and Hall, New York.

Ferreccio, C., Rollán, A., Harris, P., Serrano, C., Gederlini, A., Margozzini, P., Gonzalez, C., Aguilera, X., Venegas, A. & Jara, A. (2007), ‘Gastric cancer is related to early Helicobacter pylori infection in a high prevalence country’, Cancer Epidemiology, Biomarkers & Prevention16, 662–667.

Gelfand, A. E. & Ghosh, S. K. (1998), ‘Model choice: A minimum posterior predictive loss approach’, Biometrika85, 1–11.

Geweke, J. (1993), ‘Bayesian treatment of the independent Student-t linear model’, Journal of Applied Econometrics 8, 519–540.

Ghosh, M., Natarajan, K., Stroud, T. W. F. & Carlin, B. P. (1998), ‘Generalized linear models for small-area estimation’, Journal of the American Statistical Association93(441), 273–282.

Icaza, G., Núñez, L., Díaz, N. & Varela, D. (2006), Atlas de mortalidad por enfermedades cardiovasculares en Chile, 1997- 2003, Universidad de Talca y Ministerio de Salud, New York.

Icaza, G., Núñez, L., Torres, F., Díaz, N. & Varela, D. (2007), ‘Distribución geo- gráfica de mortalidad por tumores malignos de tráquea, bronquios y pulmón, Chile 1997-2004’,Revista Médica de Chile135(11), 1397–1405.

Kano, Y. (1994), ‘Consistency property of elliptical probability density functions’, Journal of Multivariate Analysis51, 139–147.

Lange, K. & Sinsheimer, J. S. (1993), ‘Normal/independent distributions and their applications in robust regression’, Journal of Computational and Graphical Statistics2(2), 175–198.

Lyu, S. & Simoncelli, E. P. (2007), Statistical modeling of images with fields of Gaussian scale mixtures, in B. Schölkopf, J. Platt & T. Hoffman, eds, ‘Ad- vances in Neural Information Processing Systems, 19’, MIT Press, Cambridge, pp. 945–952.

(18)

Mollié, A. (2000), Bayesian mapping of Hodgkin’s disease in France,inP. Elliott, J. Wakefield, N. G. Best & D. J. Briggs, eds, ‘Spatial Epidemiology: Methods and Applications’, Oxford University Press, New York, pp. 267–285.

Parent, O. & Lesage, J. P. (2008), ‘Using the variance structure of the conditional autoregressive specification to model knowledge spillovers’,Journal of Applied Economics23, 235–256.

Pascutto, C., Wakefield, J. C., Best, N. G., Richardson, S., Bernardinelli, L., Staines, A. & Elliott, P. (2000), ‘Statistical issues in the analysis of disease mapping data’,Statistics in Medicine19(17-18), 2493–519.

Roislien, J. & Omre, O. (2006), ‘T-distributed random fields: A parametric model for heavy-tailed well-log data’,Mathematical Geology38(7), 821–849.

Spiegelhalter, D. J., Best, N. G., Carlin, B. P. & Van der Linde, A. (2002),

‘Bayesian measures of model complexity and fit’,Journal of the Royal Statis- tical Society, Series B 64, 583–639.

Torres-Avilés, F., Icaza, G., Carrasco, E. & Pérez-Bravo, F. (2010), ‘Clustering of cases of type 1 diabetes in high socioeconomic communes in Santiago de Chile:

Spatio-temporal and geographical analysis.’, Acta Diabetologica47(3), 251–

257.

Zellner, A. (1976), ‘Bayesian and non-Bayesian analysis of the regression model with multivariate student-t error terms’,Journal of the American Statistical Association71(354), 400–405.

Appendix

1. Proof of Proposition 1. As was showed by Assunção et al. (2002), the Pm

i=1ui = 0 constraint makes the Gaussian kernel (4) proper; i.e., on the setC={u∈R^m:Pm

i=1ui= 0}, we have

Z

C

1 (σ_u²)^m/2exp

− 1

2σ²_uu⁰D_wu

du<∞

Hence, under thePm

i=1ui= 0 constraint, by applying the Fubini’s theorem and the change variabley=ψ^1/2x, we have in (9) that

Z

C

π(u|σ_u²,Dw, ν)du= Z ∞

0

ψ^m/2 Z

C

1 (σ_u²)^m/2exp

− ψ

2σ_u²u⁰Dwu

dudF(ψν)

∝ Z ∞

0

ψ^1/2dF(ψ|ν)<∞ >

(19)

2. Proof of Proposition 2. From (2), (3), (7), (9) and (14) we have for the full joint posterior distribution that

π(θ,β,u, σ²u, σ², ψ, ν|y,Dw, δ0)∝

m

Y

i=1

exp{φ⁻¹_i (yiθi−g(θi))}

×

m

Y

i=1

exp{−(1/2σ²)(h(θi)−x⁰_iβ−ui)²}h⁰(θi)

×exp{−(ψ/2σ²u)u⁰Dwu}ψ^m/2(σ²σ²_u)^−m/2,

×exp{−a/2σu²}(σ²)^−(b/2+1) exp{−c/2σ²}(σu²)^−(d/2+1)

wheref(· |ν)is the conditional density (or probability) function ofψ |ν.

Integrating with respect toβ,σ² andσ²_u, we obtain π(θ,u, ψ, ν|y,Dw, δ0)∝

m

Y

i=1

exp{φ⁻¹_i (yiθi−g(θi))}h⁰(θi)

×ψ^m/2(a+ψu⁰Dwu)^{−(m+b−1)/2}

Notice that this last result has a multivariate Student-t kernel. Now, integrating overu∈R^m under the constraintPm

i=1ui= 0, the following result is obtained,

π(θ, ψ, ν|y,Dw, δ0)≤K

m

Y

i=1

exp{φ⁻¹_i (yiθi−g(θi))}h⁰(θi)

×f(ψ|ν) exp{−δ₀ν}

where K is a constant that does not depend onθ or any of the parameters previously integrated. Finally, integration over ψ and then over ν leads to the desire result.>

3. Proposed MCMC Algorithm. To implement the Gibbs sampling, the full conditional distributions associated with the full joint posterior distribution (15) are given in the following, in whichh(θ) = (h(θ1), . . . , h(θm))⁰ denotes the link vector andXis them×pdesign matrix which has rowsx1, . . . ,xm.

a) β|X, σ²,u∼N ormal( ˆβ, σ²(X⁰X)⁻¹), where βˆ = (X⁰X)⁻¹X⁰(h(θ)−u)

(20)

b) u|θ,β, σ², σ_u², ψ,X,D_w∼N ormal(µ_u,V_u), where µ_u= 1

σ²V_u(h(θ)−Xβ),V_u= 1

σ²I_m+ ψ σ_u²D_w

−1

andI_mis the identity matrix of sizem c) σ⁻²|θ,β,X,u, c, d∼Gamma(a^∗, b^∗), where

a^∗= 1

2[m+a] andb^∗= 1

2[(h(θ)−X⁰β−u)⁰(h(θ)−X⁰β−u) +b]

d) σ⁻²_u |u, ψ,Dw, c, d∼Gamma(c^∗, d^∗)where, c^∗=m+c

2 andd^∗=1

2(ψ(u⁰Dwu) +d) e) Choice of a distribution for the scale random factorψ:

i. Ifψ|ν∼Gamma(ν/2, ν/2), then ψ|u, σ²_u,D_w, ν∼Gamma

1

2(ν+m), 1

2σ_u²(u⁰D_wu) +ν

ii. Ifψ|ν∼Beta(ν/2,1), then

ψ|u, σ²_u,D_w, ν∼Gamma 1

2(ν+m), 1

2σ_u²(u⁰D_wu)

1_(0,1)(ψ) where 1_A represents the indicator function. Notice the presence of a truncated Gamma distribution in the[0,1] interval. To draw from this distribution, the Damien & Walker (2001) algorithm can be performed.

f) Degrees of freedom are estimated from fa. Ifψ|ν∼Gamma(ν/2, ν/2), then

π(ν |ψ, δ₀)∝Γ(ν/2)⁻¹ν^ν/2exp{−ν(δ₀+ 0.5(ψ−ln(ψ)))}

fb. If ψ|ν∼Beta(ν/2,1), then

ν |ψ, δ0∼Gamma(2, δ0−ln(ψ/2))1_(0,1)(ψ)

g) π(θi | y,β,X, σ²,u) ∝ h⁰(θi) exp{φ⁻¹_i (yiθi +g(θi)−¹₂(h(θi)−x⁰_iβ− ui)²)}

The algorithm must be iterated until convergence is detected in order to start to take a sample.