Tohoku University Repository (TOUR)

Spatial Extension of Mixed Analysis of Variance Models

Authors: Sato Takaki, Matsuda Yasumasa

Journal or publication title: DSSR Discussion Papers

Number: 120

Page range: 1-31

Year: 2021-02

URL: http://hdl.handle.net/10097/00130485

Data Science and Service Research Discussion Paper

Discussion Paper No. 120

Spatial Extension of Mixed Analysis of Variance Models

Takaki Sato and Yasumasa Matsuda

February, 2021

Center for Data Science and Service Research

Graduate School of Economics and Management

Tohoku University

27-1 Kawauchi, Aobaku

Sendai 980-8576, JAPAN

Spatial Extension of Mixed Analysis of Variance Models

Takaki Sato∗1 and Yasumasa Matsuda2

1 Advanced Institute for Yotta Informatics, Tohoku University, Sendai, Japan

2 Graduate School of Economics and Management, Tohoku University, Sendai, Japan

Abstract

This paper proposes a spatial extension of mixed analysis of variance models for spatial multilevel data, in which each individual belongs to one of several spatial regions. We call the proposed models spatial error models for multilevel data (SEMM). Because SEMM models are defined by equations at two levels, the individual level and the regional level, they can be regarded as Bayesian hierarchical models, and we introduce a two-step empirical Bayes estimation method for them. The first step estimates the hyperparameters by quasi-maximum likelihood, and this estimator is justified asymptotically. The second step evaluates the posterior distributions of the parameters given the hyperparameters estimated in the first step. The proposed models are applied to happiness survey data in Japan to demonstrate their empirical properties.

Keywords: Spatial econometrics, Multilevel data, MANOVA model, Empirical Bayes method, Quasi-maximum likelihood estimation.

1 Introduction

This paper aims to extend mixed analysis of variance (MANOVA) models for multilevel data (see Demidenko (2013)) to spatial multilevel data. We call the extended models spatial error models for multilevel data (SEMM). Multilevel data are a kind of clustered data in which observations belong to nested clusters (e.g. students are members of one of several schools in school effectiveness research), and they are widely used in both the social and natural sciences (see De Leeuw et al. (2008) and Hox et al. (2017)). Spatial multilevel data, in which the clusters are spatial regions, are also used in many fields to capture spatial correlation between regions. Using spatial multilevel data, Fazio & Piacentino (2010) investigate the spatial variability of the productivity of small and medium enterprises across the Italian territory, and Pierewan & Tampubolon (2014) examine how spatial clusters explain variations in individual well-being across regions in Europe.

∗ Corresponding author. E-mail address: [email protected]

MANOVA models, which are linear regression models incorporating several kinds of random effect terms corresponding to the types of clusters, have been used to analyze multilevel data. By including random effect terms, we can evaluate features common to members of the same cluster as a grouping structure. Hartley and Rao (1967) and Miller (1977) discuss the asymptotic properties of the maximum likelihood estimator for MANOVA models.

To evaluate spatial correlations between random effects in multilevel data, this paper provides SEMM models as a spatial extension of MANOVA models. The conventional way to estimate spatial correlation is to include spatial lag terms in regression models (see Anselin (1988) and Arbia (2014)). We therefore combine spatial lag terms with the random effect terms of MANOVA models to propose new spatial econometric models.

Because SEMM models are defined by equations at two levels, the individual level and the regional level, they can be regarded as hierarchical Bayesian models whose parameters and hyperparameters can be estimated by a two-step empirical Bayes method. In the first step, the hyperparameters are estimated by quasi-maximum likelihood (QML), which makes it possible to apply methods developed in spatial econometrics (see Lee (2004), Liu and Yang (2015), Su and Yang (2015), and Yang (2018)). In the second step, posterior distributions of the parameters are derived given the hyperparameters estimated in the first step.

The interesting features of SEMM models are summarized as follows. Firstly, SEMM models can analyze region-specific effects on the dependent variable while accounting for the effects of individual characteristics. Note that regional effects are not the same as random effects: the regional effect of a region is defined as the effect of observed regional characteristics plus the sum of the random effects of the clusters to which the region belongs, one for each grouping. Existing spatial econometric models cannot account for individual characteristics when estimating regional effects in multilevel data, because applying them requires summarizing the data in regions where more than one individual is observed, and individual characteristics are lost in the process. Defining the SEMM model by equations at two levels allows both individual characteristics and regional effects to enter the analysis. Secondly, spatial correlations between random effects can be estimated. Some sources of random effects, such as the culture or customs specific to a region, tend to be similar to those in nearby regions. Random effects may then be spatially correlated, that is, regional effects in nearby areas may take similar values. Therefore, taking into account the spatial correlation between random effects allows for a more detailed analysis of spatial multilevel data. Thirdly, we can estimate regional effects in areas where there are no observed individuals by using the information from regions where individuals are observed. Because regional effects are estimated by a Bayesian method, properly estimating the prior information on regional effects lets us evaluate them for all regions, regardless of whether any individuals belong to them.

An application of SEMM models to happiness survey data in Japan demonstrates several interesting features of the effects of individual and region-specific characteristics on happiness. Firstly, individual characteristics are important factors in happiness studies. Happiness is U-shaped with respect to age, that is, it decreases until middle age and then increases. Moreover, females are basically happier than males. Happiness increases monotonically with household income and personal income, and being married greatly increases happiness. Secondly, the random effects of cities are spatially correlated. The similarity of culture or customs in nearby areas, which strongly affects how people feel about their happiness, may cause this spatial correlation. Finally, spatial clusters exist in the regional effects, which can be regarded as the average happiness of each region. The level of happiness in the southern and middle regions of Japan is higher than that in the eastern region, and the estimated happiness of the eastern coastal regions is the lowest. A possible reason is that the effects of the nuclear accident caused by the 2011 East Japan earthquake are still lingering.

This paper is organized as follows. In Section 2, we define SEMM models as a spatial extension of MANOVA models. Section 3 discusses a two-step method for estimating the parameters of SEMM models and the asymptotic properties of the first-step estimator. In Section 4, we apply the SEMM model to happiness survey data in Japan to demonstrate the empirical properties of the proposed models. Section 5 concludes the paper. All proofs for Section 3 are given in the Appendix.

2 Model specification

Let $n$ and $L$ be the numbers of individuals and regions, respectively. We assume that each individual belongs to exactly one region and allow for regions in which no individuals are observed. In this paper, we call a nested dataset whose grouping is based on spatial units spatial multilevel data. Moreover, we assume that the regions can be grouped into larger regional units by $p$ different groupings, and let $m_l$, $l = 1, \ldots, p$, be the number of larger regions obtained by the $l$-th grouping. For example, several cities are grouped together to form a prefecture in Japan.

Let $y_{i,j}$ denote the dependent variable of the $i$-th individual, who belongs to the $j$-th region, and let $Y$ be the $n \times 1$ vector of the $y_{i,j}$s. The spatial error model for multilevel data (SEMM) is given by

$$Y = X_1\beta_1 + Jd + \varepsilon, \tag{1}$$

$$d = X_2\beta_2 + u, \tag{2}$$

$$u = U_1(I_1 - \rho_1 W_1)^{-1} f_1 + \cdots + U_p(I_p - \rho_p W_p)^{-1} f_p, \tag{3}$$

where $X_1$ is an $n \times k_1$ matrix of individual-level explanatory variables, $J$ is an $n \times L$ matrix of regional dummy variables, and $X_2$ is an $L \times k_2$ matrix of regional-level explanatory variables. For $l = 1, \ldots, p$, $U_l$ is an $L \times m_l$ matrix for a random effect that consists only of zeros and ones, with exactly one 1 in each row and at least one 1 in each column, $I_l$ is the $m_l \times m_l$ identity matrix, and $W_l$ is an $m_l \times m_l$ spatial weight matrix that describes the spatial relationships among the regions of the $l$-th grouping. The random variables $\varepsilon_i$, $i = 1, \ldots, n$, are independent and identically distributed (i.i.d.) with mean 0 and variance $\sigma_0^2$, and $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)'$ is an $n \times 1$ vector. The random variables $f_{l,j}$, $j = 1, \ldots, m_l$, are also i.i.d. with mean 0 and variance $\sigma_l^2$, and the $m_l \times 1$ vector $f_l = (f_{l,1}, \ldots, f_{l,m_l})'$ is the random effect for the regions of the $l$-th grouping. The vectors $\beta_1$ and $\beta_2$ are the regression coefficients for the individual-level and regional-level explanatory variables, respectively, and $\rho_l$ is a spatial correlation parameter that describes the strength of spatial dependence between regions in the $l$-th grouping.
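To make the roles of the matrices in equations (1)-(3) concrete, the following is a minimal simulation sketch in Python. It assumes normally distributed errors, a single grouping in which every region is its own group, and a row-normalised line-graph weight matrix; the function name `simulate_semm` and all toy values are hypothetical choices for illustration and do not come from the paper.

```python
import numpy as np

def simulate_semm(X1, J, X2, U_list, W_list, beta1, beta2, rho, sigma2, sigma2_0, rng):
    """Draw (Y, d) from equations (1)-(3); normal errors are an illustrative assumption."""
    n, L = J.shape
    u = np.zeros(L)
    for U, W, r, s2 in zip(U_list, W_list, rho, sigma2):
        m = W.shape[0]
        f = rng.normal(0.0, np.sqrt(s2), size=m)        # random effects f_l
        u += U @ np.linalg.solve(np.eye(m) - r * W, f)  # U_l (I_l - rho_l W_l)^{-1} f_l
    d = X2 @ beta2 + u                                   # equation (2)
    eps = rng.normal(0.0, np.sqrt(sigma2_0), size=n)
    Y = X1 @ beta1 + J @ d + eps                         # equation (1)
    return Y, d

# Toy example: 4 regions on a line, 6 individuals, region 3 has no observations.
rng = np.random.default_rng(0)
L, n = 4, 6
region = np.array([0, 0, 1, 1, 2, 2])
J = np.zeros((n, L))
J[np.arange(n), region] = 1.0
X1 = rng.normal(size=(n, 2))
X2 = np.ones((L, 1))
U = np.eye(L)                                  # each region is its own group
W = np.zeros((L, L))
for j in range(L - 1):
    W[j, j + 1] = W[j + 1, j] = 1.0
W = W / W.sum(axis=1, keepdims=True)           # row-normalised weights

Y, d = simulate_semm(X1, J, X2, [U], [W],
                     beta1=np.array([1.0, -0.5]), beta2=np.array([2.0]),
                     rho=[0.4], sigma2=[1.0], sigma2_0=0.5, rng=rng)
```

Note that `d` has one entry per region, including the region without observations, while `Y` has one entry per individual.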

SEMM models are a spatial extension of mixed analysis of variance (MANOVA) models because they reduce to MANOVA models when the spatial correlation parameters $\rho_l$ are all equal to 0. In the analysis of spatial multilevel data, accounting for spatial correlation between random effects can improve the model fit. The random effect $f_{l,j}$ expresses the contribution of the $j$-th region of the $l$-th grouping to the regional effect $d$, and some sources of random effects may be the culture or customs of that larger region. Because the cultures and customs of nearby regions tend to be similar, the random effects $f_{l,j}$ may be spatially correlated, that is, regional effects in nearby areas may take similar values. Therefore, taking into account the spatial correlation between regional effects allows for a more detailed analysis of spatial multilevel data.

SEMM models are defined by equations at two levels, the individual-level equation (1) and the region-level equations (2) and (3), and this two-level modeling has some advantages. One advantage is that we can analyze the regional effect $d$ while accounting for the effects of individual characteristics. Existing spatial econometric models cannot account for individual characteristics when estimating regional effects in multilevel data because they assume that exactly one observation is available in each region. Thus, we need to summarize the data when more than one individual is observed in one region. A commonly used way to summarize the data is to take their average within the area, but individual characteristics are then lost. Defining the SEMM model by equations at two levels allows for both individual and regional effects, which in turn allows for more accurate estimation of regional effects.

Another advantage of modeling multilevel data with a hierarchical structure is that we can estimate the regional effects $d$ in areas where there are no observed individuals by using the information from regions where individuals are observed. Recall that $J$ is a regional dummy matrix that may have columns whose elements are all zero, because we allow for regions where no individuals are observed. Ordinary least squares therefore does not work, because $J$ is rank deficient. The proposed model can be regarded as a Bayesian hierarchical model in which equations (2) and (3) describe the prior information on the regional effects. As discussed in the estimation section, by properly estimating this prior information with a marginal likelihood based on the regions where individuals are observed, we can evaluate the regional effects for all regions, regardless of whether any individuals belong to them.
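The rank deficiency of $J$ can be checked directly on a toy example. The sketch below builds a dummy matrix for four hypothetical regions, one of which has no observed individuals, and confirms that the rank equals the number of observed regions rather than the number of columns.

```python
import numpy as np

# Region index of each of 5 individuals; region 2 has no observed individuals.
region = np.array([0, 0, 1, 3, 3])
L = 4

J = np.zeros((len(region), L))
J[np.arange(len(region)), region] = 1.0

rank = np.linalg.matrix_rank(J)
empty_regions = np.where(J.sum(axis=0) == 0)[0]
```

Here `rank` is 3 although `J` has 4 columns, and `empty_regions` identifies the all-zero column.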

3 Estimation

Let us consider a method to estimate the parameters of SEMM models and discuss the asymptotic properties of the proposed estimators when the sizes $m_l$, $l = 1, \ldots, p$, tend to infinity along with the sample size $n$. Because the proposed model can be regarded as a Bayesian hierarchical model, we propose a two-step empirical Bayes estimation procedure. The first step estimates the hyperparameters of the prior distributions by quasi-maximum likelihood (QML), and the second step calculates the posterior distributions given the hyperparameters estimated in the first step. Moreover, we introduce the asymptotic properties of the first-step estimators when the numbers of both individuals and regions become large.

3.1 Empirical Bayes Estimation

We introduce a two-step empirical Bayes estimation procedure. Let $\beta = (\beta_1, \beta_2)$, $\tau_l = \sigma_l^2/\sigma_0^2$, $l = 1, \ldots, p$, $\theta = (\beta_2, \tau_1, \ldots, \tau_p, \rho_1, \ldots, \rho_p)$, $\psi = (\beta, \sigma_0^2, \ldots, \sigma_p^2, \rho_1, \ldots, \rho_p)$ and $\delta = (\beta_1, d)$. In this paper, we call $\sigma_0^2$ and $\theta$ hyperparameters and $\delta$ parameters.

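The first step is the estimation of $\sigma_0^2$ and $\theta$ by a quasi-maximum likelihood (QML) method based on the marginal likelihood of $Y$. Let $f(Y \mid \beta_1, d, \sigma_0^2)$ denote the probability density function of the data $Y$ and $g(d \mid \beta_2, \rho_1, \ldots, \rho_p, \sigma_1^2, \ldots, \sigma_p^2)$ the prior density of $d$. For the purpose of QML estimation, we treat the random variables $\varepsilon_i$ and $f_{l,j}$, which may not be normally distributed, as if they follow normal distributions. Then the marginal distribution of $Y$,

$$m(Y \mid \psi) = \int f(Y \mid \beta_1, d, \sigma_0^2)\, g(d \mid \beta_2, \rho_1, \ldots, \rho_p, \sigma_1^2, \ldots, \sigma_p^2)\, \mathrm{d}d,$$

is a multivariate normal distribution. Thus, the marginal log-likelihood function is given by

$$\log L(\psi) = -\frac{n}{2}\log(2\pi\sigma_0^2) - \frac{1}{2}\log|\Omega(\theta)| - \frac{(Y - X\beta)'\Omega^{-1}(\theta)(Y - X\beta)}{2\sigma_0^2}, \tag{4}$$

where $X = (X_1, JX_2)$ and

$$\Omega(\theta) = I_n + \tau_1 J U_1 (I_1 - \rho_1 W_1)^{-1}(I_1 - \rho_1 W_1')^{-1} U_1' J' + \cdots + \tau_p J U_p (I_p - \rho_p W_p)^{-1}(I_p - \rho_p W_p')^{-1} U_p' J'.$$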

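We derive a concentrated marginal log-likelihood function to reduce the number of parameters in the numerical optimization. The first-order conditions of the marginal log-likelihood function with respect to $\beta$ and $\sigma_0^2$ give

$$\hat\beta(\theta) = (X'\Omega^{-1}(\theta)X)^{-1}X'\Omega^{-1}(\theta)Y, \qquad \hat\sigma_0^2(\theta) = \frac{1}{n}(Y - X\hat\beta(\theta))'\Omega^{-1}(\theta)(Y - X\hat\beta(\theta)).$$

By substituting $\hat\beta(\theta)$ and $\hat\sigma_0^2(\theta)$ into (4), we obtain the concentrated marginal log-likelihood function,

$$\log L(\theta) = -\frac{n}{2}(\log(2\pi) + 1) - \frac{n}{2}\log(\hat\sigma_0^2(\theta)) - \frac{1}{2}\log|\Omega(\theta)|.$$

Maximizing the concentrated marginal log-likelihood function gives the QML estimator $\hat\theta$ of $\theta$, and the QML estimators $\hat\beta$ and $\hat\sigma_0^2$ are then obtained as $\hat\beta = \hat\beta(\hat\theta)$ and $\hat\sigma_0^2 = \hat\sigma_0^2(\hat\theta)$, respectively.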
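The first step can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: it evaluates $\Omega(\theta)$, $\hat\beta(\theta)$, $\hat\sigma_0^2(\theta)$ and the concentrated log-likelihood for given $\tau_l$ and $\rho_l$, on hypothetical toy data. The function names `omega` and `concentrated_loglik` are assumptions, and the optimiser that would maximise the returned value is left out.

```python
import numpy as np

def omega(tau, rho, J, U_list, W_list):
    """Omega(theta) = I_n + sum_l tau_l J U_l (I - rho_l W_l)^{-1} (I - rho_l W_l')^{-1} U_l' J'."""
    n = J.shape[0]
    Om = np.eye(n)
    for t, r, U, W in zip(tau, rho, U_list, W_list):
        m = W.shape[0]
        half = J @ U @ np.linalg.inv(np.eye(m) - r * W)   # J U_l (I - rho_l W_l)^{-1}
        Om += t * (half @ half.T)
    return Om

def concentrated_loglik(tau, rho, Y, X, J, U_list, W_list):
    """Return (log L(theta), beta_hat(theta), sigma2_0_hat(theta))."""
    n = Y.shape[0]
    Om = omega(tau, rho, J, U_list, W_list)
    OmInvX = np.linalg.solve(Om, X)
    OmInvY = np.linalg.solve(Om, Y)
    beta_hat = np.linalg.solve(X.T @ OmInvX, X.T @ OmInvY)   # GLS estimator
    resid = Y - X @ beta_hat
    sigma2_hat = resid @ np.linalg.solve(Om, resid) / n
    _, logdet = np.linalg.slogdet(Om)
    ll = (-0.5 * n * (np.log(2 * np.pi) + 1)
          - 0.5 * n * np.log(sigma2_hat) - 0.5 * logdet)
    return ll, beta_hat, sigma2_hat

# Hypothetical toy data: 8 individuals in 3 regions, one grouping.
rng = np.random.default_rng(1)
n, L = 8, 3
region = np.array([0, 0, 0, 1, 1, 1, 2, 2])
J = np.zeros((n, L))
J[np.arange(n), region] = 1.0
U = np.eye(L)
W = np.array([[0, 1, 0], [0.5, 0, 0.5], [0, 1, 0]], dtype=float)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = rng.normal(size=n)

ll, beta_hat, sigma2_hat = concentrated_loglik([0.5], [0.3], Y, X, J, [U], [W])
# With tau = 0 the model has no random effects and beta_hat reduces to OLS.
ll0, beta0, s20 = concentrated_loglik([0.0], [0.0], Y, X, J, [U], [W])
```

In practice the value `ll` would be passed, with its sign flipped, to a numerical optimiser over $(\tau_l, \rho_l)$ subject to $\tau_l \ge 0$ and a range of $\rho_l$ for which $I_l - \rho_l W_l$ is invertible.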

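The second step is the Bayesian estimation of the parameters $\delta$ based on the estimated hyperparameters $\hat\beta_2$, $\hat\rho$, $\hat\sigma_0^2$ and $\hat\sigma_l^2 = \hat\tau_l\hat\sigma_0^2$, $l = 1, \ldots, p$. Let $\delta = (\beta_1, d)$ and $\tilde X = (X_1, J)$. The estimated posterior distribution of $\delta$ is given by

$$P(\delta \mid Y, \tilde X, \hat\beta_2, \hat\rho_1, \ldots, \hat\rho_p, \hat\sigma_0^2, \ldots, \hat\sigma_p^2, b) \propto L(Y \mid \tilde X, \delta, \hat\sigma_0^2)\, \pi(\delta \mid \hat\beta_2, \hat\rho, \hat\sigma_1^2, \ldots, \hat\sigma_p^2, b),$$

where $b$ is a hyperparameter of the prior for $\beta_1$, $L(Y \mid \tilde X, \delta, \hat\sigma_0^2)$ is the likelihood of the data $Y$, and $\pi(\delta \mid \hat\beta_2, \hat\rho, \hat\sigma_1^2, \ldots, \hat\sigma_p^2, b)$ is the prior distribution of the model parameters $\delta$. If the prior and posterior distributions of $\delta$ are conjugate, the posterior distribution can be calculated in explicit form, and if not, the posterior distribution can be evaluated by Markov chain Monte Carlo (MCMC) methods.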

As one example of conjugate distributions, we show the explicit form of the posterior distribution when the likelihood and the prior distribution are multivariate normal and the number of random effects is one. In this case, the estimated posterior distribution is also multivariate normal. We set the prior mean and the inverse of the prior covariance matrix of the multivariate normal prior as $\hat s_0 = (0_{k \times 1}', \hat\beta_2')'$ and

$$\hat S_0^{-1} = \begin{pmatrix} 0_{k \times k} & 0_{k \times m} \\ 0_{m \times k} & \dfrac{1}{\hat\sigma_1^2}(I_m - \hat\rho W)'(I_m - \hat\rho W) \end{pmatrix},$$

where $0_{n_1 \times n_2}$ is the $n_1 \times n_2$ matrix whose elements are all zero. Then, the posterior covariance matrix and mean vector are

$$S_1 = \left(\frac{1}{\hat\sigma_0^2}\tilde X'\tilde X + \hat S_0^{-1}\right)^{-1} \quad \text{and} \quad s_1 = S_1\left(\frac{1}{\hat\sigma_0^2}\tilde X' Y + \hat S_0^{-1}\hat s_0\right),$$

respectively. Thus, the estimated posterior distribution is given by

$$P(\delta \mid Y, \tilde X, \hat\beta_2, \hat\rho_1, \hat\sigma_0^2, \hat\sigma_1^2, b) \sim N(s_1, S_1),$$

where $N(s_1, S_1)$ denotes the multivariate normal distribution with mean $s_1$ and covariance matrix $S_1$. In other cases, the posterior distribution can be derived in the same way.
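The conjugate normal case can be sketched in a few lines. The code below is an illustration under stated assumptions: the first-step estimates (`rho_hat`, `sigma2_1_hat`, `sigma2_0_hat`) are hypothetical numbers, the prior mean is set to zero for simplicity, and the function name `normal_posterior` is not from the paper. It shows that the posterior covariance remains well defined for a region with no observations because the prior precision on $d$ is non-singular.

```python
import numpy as np

def normal_posterior(Y, X_tilde, sigma2_0, s0, S0_inv):
    """Posterior mean s1 and covariance S1 for a normal likelihood and normal prior."""
    precision = X_tilde.T @ X_tilde / sigma2_0 + S0_inv
    S1 = np.linalg.inv(precision)
    s1 = S1 @ (X_tilde.T @ Y / sigma2_0 + S0_inv @ s0)
    return s1, S1

# Toy example: k = 1 individual-level covariate, m = 3 regions, region 2 unobserved.
rng = np.random.default_rng(2)
region = np.array([0, 0, 1, 1, 1])
n, m, k = len(region), 3, 1
J = np.zeros((n, m))
J[np.arange(n), region] = 1.0
X1 = rng.normal(size=(n, k))
X_tilde = np.hstack([X1, J])
Y = rng.normal(size=n)

W = np.array([[0, 1, 0], [0.5, 0, 0.5], [0, 1, 0]], dtype=float)
rho_hat, sigma2_1_hat, sigma2_0_hat = 0.3, 1.0, 0.5   # hypothetical first-step estimates

A = np.eye(m) - rho_hat * W
S0_inv = np.zeros((k + m, k + m))          # flat prior on the individual-level coefficients
S0_inv[k:, k:] = A.T @ A / sigma2_1_hat    # spatial prior precision on d
s0 = np.zeros(k + m)                        # prior mean, zero here for simplicity

s1, S1 = normal_posterior(Y, X_tilde, sigma2_0_hat, s0, S0_inv)
```

The last entry of `s1` is the posterior mean of the regional effect for the unobserved region; it is informed only through the spatial prior linking it to its neighbour.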

3.2 Asymptotic properties

Here, we discuss the conditions under which the first-step QML estimators $\hat\theta = (\hat\beta_2, \hat\tau_1, \ldots, \hat\tau_p, \hat\rho_1, \ldots, \hat\rho_p)$ and $\hat\psi = (\hat\beta, \hat\sigma_0^2, \ldots, \hat\sigma_p^2, \hat\rho_1, \ldots, \hat\rho_p)$ are consistent and asymptotically normal when the sizes $m_l$, $l = 1, \ldots, p$, tend to infinity along with the sample size $n$. All of the proofs and lemmas for the asymptotic results are given in the Appendix.

Let $\theta_0 = (\delta_0, \tau_{10}, \ldots, \tau_{p0}, \rho_{10}, \ldots, \rho_{p0})$ and $\psi_0 = (\beta_0, \sigma_{00}^2, \theta_0)$ be the true values of $\theta$ and $\psi$. Assume the following conditions.

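Assumption 1 The true parameter $\theta_0$ lies in the interior of a compact parameter space $\Theta$.

Assumption 2 $\varepsilon_i$, $i = 1, \ldots, n$, and $f_{l,j}$, $l = 1, \ldots, p$, $j = 1, \ldots, m_l$, are i.i.d. with mean 0 and variances $\sigma_0^2$ and $\sigma_l^2$, respectively. Moreover, $E|\varepsilon_i|^{4+\delta} < \infty$ and $E|f_{l,j}|^{4+\delta} < \infty$ for some $\delta > 0$.

Assumption 3 The number of regions in the $l$-th grouping, $m_l$, tends to infinity along with the sample size $n$.

Assumption 4 The matrices $J$, $U_i$, $W_i$, $(I_i - \rho_{i0}W_i)^{-1}$ and $\Omega^{-1}(\theta)$ are uniformly bounded in both row and column sums. Moreover, $0 < \underline{c}_\omega \le \inf_{\theta \in \Theta}\gamma_{\min}(\Omega^{-1}(\theta)) \le \sup_{\theta \in \Theta}\gamma_{\max}(\Omega^{-1}(\theta)) \le \bar{c}_\omega < \infty$.

Assumption 5 $X$ has full column rank and its elements are uniformly bounded constants, and $\lim_{n \to \infty}\frac{1}{n}X'\Omega^{-1}(\theta)X$ exists and is non-singular.

Assumption 6 Let $A_i^{-1}(\rho_i) = (I_i - \rho_i W_i)^{-1}(I_i - \rho_i W_i')^{-1}$ and $B_i(\rho_i) = W_i'(I_i - \rho_i W_i) + (I_i - \rho_i W_i')W_i$. We assume that $\sup_{\theta \in \Theta}|\gamma_{\max}(J U_i A_i^{-1}(\rho_i) U_i' J')| < \infty$ and $\sup_{\theta \in \Theta}|\gamma_{\max}(J U_i A_i^{-1}(\rho_i) B_i(\rho_i) A_i^{-1}(\rho_i) U_i' J')| < \infty$, $i = 1, \ldots, p$.

Assumption 7 $\lim_{n \to \infty}\frac{1}{n}\left[\log|\sigma_{00}^2\Omega(\theta_0)| - \log|\tilde\sigma_0^2(\theta)\Omega(\theta)|\right] \ne 0$ for any $\theta \ne \theta_0$.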

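First, we establish the consistency of $\hat\theta$. The expected log-likelihood function of the proposed model is given by

$$E\log L(\psi) = -\frac{n}{2}\log(2\pi\sigma_0^2) - \frac{1}{2}\log|\Omega(\theta)| - \frac{E\left[(Y - X\beta)'\Omega^{-1}(\theta)(Y - X\beta)\right]}{2\sigma_0^2}.$$

The expected log-likelihood is maximized at

$$\tilde\beta(\theta) = \beta_0,$$

$$\tilde\sigma_0^2(\theta) = \frac{\sigma_{00}^2}{n}\mathrm{tr}(\Omega(\theta)^{-1}\Omega(\theta_0)) = \frac{1}{n}E\left[u_0'\Omega^{-\frac{1}{2}}(\theta)M(\theta)\Omega^{-\frac{1}{2}}(\theta)u_0\right] + \frac{1}{n}E\left[u_0'\Omega^{-\frac{1}{2}}(\theta)P(\theta)\Omega^{-\frac{1}{2}}(\theta)u_0\right],$$

where $u_0 = Y - X\beta_0$, $P(\theta) = I - M(\theta)$ and $M(\theta) = I - \Omega^{-\frac{1}{2}}(\theta)X(X'\Omega^{-1}(\theta)X)^{-1}X'\Omega^{-\frac{1}{2}}(\theta)$. Thus, the concentrated expected log-likelihood function is given by

$$E\log L(\theta) = -\frac{n}{2}(\log(2\pi) + 1) - \frac{n}{2}\log(\tilde\sigma_0^2(\theta)) - \frac{1}{2}\log|\Omega(\theta)|.$$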

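Consistency of $\hat\theta$ follows from two facts. The first is the identification uniqueness condition:

$$\limsup_{n \to \infty}\max_{\theta \in B^c(\theta_0, \varepsilon) \cap \Theta}\left[E\log L(\theta) - E\log L(\theta_0)\right] < 0 \quad \text{for any } \varepsilon > 0,$$

where $B^c(\theta_0, \varepsilon)$ is the complement of an $\varepsilon$-neighborhood of $\theta_0$. The second is the uniform convergence in probability:

$$\sup_{\theta \in \Theta}\left|\frac{1}{n}\log L(\theta) - \frac{1}{n}E\log L(\theta)\right| = o_p(1).$$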

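Next, let us consider the asymptotic distribution of the QML estimator $\hat\psi$. To derive asymptotic normality, we consider the Taylor expansion of $\frac{\partial \log L(\hat\psi)}{\partial\psi}$ at $\psi_0$. The first-order derivatives of the log-likelihood function at $\psi$ have the elements

$$\frac{\partial \log L(\psi)}{\partial\beta} = X'\Sigma^{-1}(\eta)(Y - X\beta),$$

$$\frac{\partial \log L(\psi)}{\partial\sigma_i^2} = -\frac{1}{2}\mathrm{tr}(\Sigma^{-1}(\eta)G_i(\rho_i)) + \frac{1}{2}(Y - X\beta)'\Sigma^{-1}(\eta)G_i(\rho_i)\Sigma^{-1}(\eta)(Y - X\beta),$$

$$\frac{\partial \log L(\psi)}{\partial\rho_i} = \frac{\sigma_i^2}{2}\mathrm{tr}(\Sigma^{-1}(\eta)H_i(\rho_i)) - \frac{\sigma_i^2}{2}(Y - X\beta)'\Sigma^{-1}(\eta)H_i(\rho_i)\Sigma^{-1}(\eta)(Y - X\beta),$$

where $\eta = (\sigma_0^2, \sigma_1^2, \ldots, \sigma_p^2, \rho_1, \ldots, \rho_p)$, $A_i^{-1}(\rho_i) = (I_i - \rho_i W_i)^{-1}(I_i - \rho_i W_i')^{-1}$, $B_i(\rho_i) = W_i'(I_i - \rho_i W_i) + (I_i - \rho_i W_i')W_i$, $G_i(\rho_i) = J U_i A_i^{-1}(\rho_i) U_i' J'$, and $H_i(\rho_i) = J U_i A_i^{-1}(\rho_i) B_i(\rho_i) A_i^{-1}(\rho_i) U_i' J'$. By the mean value theorem,

$$\sqrt{n}(\hat\psi - \psi_0) = -\left(\frac{1}{n}\frac{\partial^2 \log L(\bar\psi)}{\partial\psi\,\partial\psi'}\right)^{-1}\frac{1}{\sqrt{n}}\frac{\partial \log L(\psi_0)}{\partial\psi},$$

where $\bar\psi$ lies between $\hat\psi$ and $\psi_0$.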

The score function, that is, the vector of first-order derivatives of the log-likelihood function at $\psi_0$, $\frac{\partial \log L(\psi_0)}{\partial\psi}$, consists of linear and quadratic functions of $u_0 = Y - X\beta_0$. By applying the central limit theorem for linear-quadratic forms of Kelejian and Prucha (2001) to the score function, we obtain the asymptotic normality of the QMLE $\hat\psi$ under proper asymptotic behavior of the Hessian matrix and of the variance of the score function, whose explicit forms are given in the Appendix.

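Theorem 2. Under Assumptions 1-7, if the limits

$$\Sigma = -\lim_{n \to \infty}E\left[\frac{1}{n}\frac{\partial^2 \log L(\psi_0)}{\partial\psi\,\partial\psi'}\right] \quad \text{and} \quad \Gamma = \lim_{n \to \infty}\frac{1}{n}E\left[\frac{\partial \log L(\psi_0)}{\partial\psi}\frac{\partial \log L(\psi_0)}{\partial\psi'}\right]$$

exist and $\Sigma$ is positive definite, then

$$\sqrt{n}(\hat\psi - \psi_0) \xrightarrow{D} N(0, \Sigma^{-1}\Gamma\Sigma^{-1}).$$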
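Given estimates of the two limit matrices in Theorem 2, the sandwich form translates into approximate standard errors as sketched below. The matrices used here are hypothetical placeholders, and the function name `sandwich_cov` is an assumption; how the two matrices are estimated in practice is not shown.

```python
import numpy as np

def sandwich_cov(Sigma, Gamma, n):
    """Approximate covariance of psi_hat implied by sqrt(n)(psi_hat - psi_0) -> N(0, S^{-1} G S^{-1})."""
    Sigma_inv = np.linalg.inv(Sigma)
    return Sigma_inv @ Gamma @ Sigma_inv / n

# Hypothetical 2 x 2 estimates of Sigma and Gamma for a sample of size n = 1000.
Sigma_hat = np.array([[2.0, 0.3], [0.3, 1.0]])
Gamma_hat = np.array([[2.5, 0.2], [0.2, 1.4]])
cov = sandwich_cov(Sigma_hat, Gamma_hat, n=1000)
std_err = np.sqrt(np.diag(cov))
```

When the errors are in fact normal, $\Gamma = \Sigma$ and the expression collapses to the usual inverse-information form $\Sigma^{-1}/n$.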

4 Empirical Application

We apply the proposed model to happiness survey data in Japan to analyze the effect of individual characteristics on happiness and the spatial correlation of the regional effects, which are the random effects of each region. In this analysis, we use Macromill Co., Ltd., which is a market research company in Japan, to conduct a survey of 26,984 people living in 1534 cities. Respondents were selected so that the distributions of age, population, and area of residence match those of the Japanese census. The demographic information on the respondents contains gender, age, personal and family incomes, marital status, and presence of children.

Individual happiness, the dependent variable $Y$, was collected as the response to the question: "Currently, how happy do you feel? Score the degree of your happiness between 1 (very unhappy) and 10 (very happy)." Thus, happiness is measured in discrete values between 1 and 10.

We use dummy variables created from the demographic information as explanatory variables, $X$. Age and gender are combined into 22 categories: all respondents are separated into the two groups of female and male, each of which is divided into 11 mutually disjoint subgroups corresponding to (1) age < 20, (2) < 25, (3) < 30, ..., (10) < 65 and (11) ≥ 65. As a result, we obtain 22 disjoint groups in total and define the group of females younger than 20 as the base group. Personal income is categorized into 7 mutually disjoint groups, i.e. (1) < 2 million yen, (2) < 4 million, (3) < 6 million, (4) < 8 million, (5) < 10 million, (6) < 12 million and (7) ≥ 12 million yen, with the group below 2 million yen as the base. Family income is categorized into 10 groups, i.e. (1) < 2 million yen, (2) < 4 million, (3) < 6 million, (4) < 8 million, (5) < 10 million, (6) < 12 million, (7) < 15 million, (8) < 20 million, (9) ≥ 20 million yen and (10) no response, with the group below 2 million yen as the base. Personal and family income thus enter as categorical variables with 7 and 10 subgroups, respectively. Presence of children is summarized as a dummy variable that takes 1 if a respondent has more than one child and 0 otherwise. Marital status is recorded as a categorical variable with the three groups (1) single, (2) married and (3) divorced or widowed, with the single group taken as the base.
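The age-gender coding described above can be sketched as follows. This is an illustrative assumption about how such dummies might be built, not the authors' code: the bin edges follow the 11 age groups in the text, females younger than 20 form the base group, and the function name `age_gender_dummies` is hypothetical.

```python
import numpy as np

# Upper edges of the first 10 age groups; ages >= 65 fall in the 11th group.
AGE_EDGES = np.array([20, 25, 30, 35, 40, 45, 50, 55, 60, 65])

def age_gender_dummies(age, is_female):
    """Return the group index (0..21) and the 21 dummy columns excluding the base group."""
    age = np.asarray(age)
    is_female = np.asarray(is_female, dtype=bool)
    age_group = np.digitize(age, AGE_EDGES)          # 0 for age < 20, ..., 10 for age >= 65
    group = np.where(is_female, 0, 11) + age_group   # females 0..10, males 11..21
    dummies = np.zeros((age.size, 22))
    dummies[np.arange(age.size), group] = 1.0
    return group, dummies[:, 1:]                     # drop the base group (female, age < 20)

group, D = age_gender_dummies([18, 22, 70], [True, False, False])
```

A respondent in the base group has an all-zero row, so the estimated coefficients are contrasts with females younger than 20.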

The regional dummy matrix $J$ is a 26,984 × 1845 matrix. Note that the rank of $J$ is 1534, the number of cities to which at least one respondent belongs, which is less than 1845. Recall that the proposed models can also estimate the regional effects $d$ in areas where there are no observed individuals by using the information from surrounding regions where individuals are observed. Thus, the matrix $J$ contains all-zero columns corresponding to the areas where there are no respondents.

We use two spatial weight matrices (SWMs), one at the prefecture level and the other at the city level. The prefecture-level SWM, $W_1$, is a 46 × 46 matrix based on adjacency between prefectures in Japan: if prefectures $i$ and $j$ share a border, then the $(i, j)$ and $(j, i)$ elements of $W_1$ are 1, and otherwise they are 0. The city-level SWM, $W_2$, is a 1845 × 1845 matrix created in two steps. Firstly, if the distance between city $i$ and $j$

Figure 1: Estimated regional effects on individual happiness for each city, obtained by fitting the spatial error model for multilevel data with city-level and prefecture-level random effects to the happiness dataset in Japan.

within 30 km of a city, then the element of W2 corresponding to the three closest cities is set to 1.
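The border-sharing rule stated for the prefecture-level matrix can be sketched as follows. The list of border pairs is a hypothetical toy input, and the function name `contiguity_matrix` is an assumption; the city-level construction, which depends on distances, is not reproduced here.

```python
import numpy as np

def contiguity_matrix(n_units, borders):
    """Binary spatial weight matrix: entries (i, j) and (j, i) are 1 if units i and j share a border."""
    W = np.zeros((n_units, n_units))
    for i, j in borders:
        W[i, j] = 1.0
        W[j, i] = 1.0
    return W

# Four hypothetical units arranged in a chain 0 - 1 - 2 - 3.
W_toy = contiguity_matrix(4, [(0, 1), (1, 2), (2, 3)])
n_neighbours = W_toy.sum(axis=1)
```

The result is symmetric with a zero diagonal, and `n_neighbours` counts the bordering units of each unit.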

Table 1 reports the parameter estimates of the SEMM model together with their standard errors. As a benchmark for comparison, it also reports those of the MANOVA model, which is the special case of the SEMM model in which the spatial parameters of the random effects are zero, i.e. $\rho_i = 0$, $i = 1, 2$. Comparing the fits of the two models, the SEMM model accounts for happiness better than the MANOVA model in terms of the Akaike information criterion (AIC). This indicates that taking spatial correlation into account improves the model fit. We find from Table 1 that the spatial correlation between city-level random effects, $\rho_1$, is positive and significant at the 5% level, which indicates that the random effect of a city takes a value similar to the random effects of the surrounding cities. One source of spatial correlation between city-level random effects is the similarity of culture or customs, which strongly affects how people feel about their happiness. Figure 1 plots the estimated regional effects $d$ for each city, which represent the average happiness of the people living in that city. Spatial clusters exist in the regional effects, and the level of happiness in the southern and middle regions of Japan is higher than that in the eastern region. In particular, the eastern coastal regions show the lowest level of happiness. A possible reason is that the effects of the nuclear accident caused by the 2011 East Japan earthquake are still lingering. The estimated coefficients of age are U-shaped, that is, they decrease until middle age and then increase. Moreover, the coefficients of gender indicate that females are basically happier than males. Happiness increases monotonically as household income and personal income increase, and getting married greatly increases people's happiness.

5 Conclusion

In this paper, we have proposed spatial error models for multilevel data, which are a spatial extension of mixed analysis of variance models. Because the proposed model can be regarded as a Bayesian hierarchical model, we have introduced a two-step empirical Bayes method as the estimation strategy for its parameters. The first-step estimator determines the hyperparameters and has been justified asymptotically, and the second-step estimator of the parameters is derived by Bayes' formula given the hyperparameters estimated in the first step. By fitting the proposed model to happiness survey data in Japan, we can evaluate the effects of individual-level and regional-level explanatory variables on happiness and the spatial correlation of the regional effects, which are the random effects of each region. The empirical results suggest that happiness is U-shaped in age, that females' happiness is higher than males' happiness at all ages, and that regional effects on happiness are spatially correlated. The existence of spatial correlation between random effects indicates that unobserved features that affect individual happiness, such as culture and customs, tend to be similar in nearby regions.

For future study, several extensions are possible. In this analysis, we treat individual happiness as a continuous variable. However, this treatment creates a gap between the data and the model because individual happiness takes only discrete values between 1 and 10. Extending the proposed model to discrete choice models would fill this gap and allow for a more rigorous analysis of happiness. Another possibility is a panel extension of the proposed model. The proposed model can capture only spatial correlation, whereas individual happiness is said to be correlated over time. A panel extension would reveal more interesting

Table 1: Estimates and standard errors of $\beta_1$ and $\beta_2$, log likelihood (logL) and Akaike Information Criterion (AIC) for the spatial error model for multilevel data (SEMM2) and the mixed analysis of variance model (MANOVA2), both with city-level and prefecture-level random effects, and estimates and standard errors of the spatial parameters $\rho_1$ and $\rho_2$ in SEMM2, obtained by fitting the models to the happiness dataset in Japan.

                                      SEMM2             MANOVA2
Variable                           coef     s.e.      coef     s.e.
20 < Female < 25                 -0.007    0.076    -0.005    0.076
Female < 30                      -0.234    0.071    -0.232    0.071
Female < 35                      -0.522    0.069    -0.521    0.069
Female < 40                      -0.694    0.068    -0.691    0.068
Female < 45                      -0.718    0.064    -0.717    0.064
Female < 50                      -0.857    0.064    -0.856    0.064
Female < 55                      -0.811    0.067    -0.813    0.066
Female < 60                      -0.789    0.068    -0.790    0.068
Female < 65                      -0.504    0.067    -0.506    0.067
Female > 65                      -0.172    0.063    -0.172    0.062
Male < 20                         0.185    0.078     0.184    0.078
Male < 25                        -0.186    0.072    -0.183    0.072
Male < 30                        -0.836    0.070    -0.835    0.070
Male < 35                        -1.117    0.068    -1.118    0.068
Male < 40                        -1.409    0.069    -1.411    0.068
Male < 45                        -1.557    0.065    -1.558    0.065
Male < 50                        -1.628    0.065    -1.629    0.065
Male < 55                        -1.692    0.068    -1.694    0.068
Male < 60                        -1.799    0.069    -1.802    0.069
Male < 65                        -1.308    0.068    -1.311    0.068
Male > 65                        -0.915    0.065    -0.917    0.064
200 < Personal Income (PI) < 400  0.076    0.032     0.075    0.032
PI < 600                          0.263    0.041     0.264    0.041
PI < 800                          0.335    0.057     0.338    0.057
PI < 1000                         0.401    0.081     0.403    0.081
PI < 1200                         0.572    0.120     0.577    0.120
PI > 1200                         0.458    0.146     0.459    0.146
200 < Family Income (FI) < 400    0.055    0.048     0.055    0.048
FI < 600                          0.337    0.048     0.339    0.048
FI < 800                          0.494    0.052     0.495    0.052
FI < 1000                         0.619    0.058     0.621    0.058
FI < 1200                         0.769    0.069     0.770    0.069
FI < 1500                         0.881    0.087     0.882    0.087
FI < 2000                         1.181    0.117     1.183    0.117
FI > 2000                         1.111    0.162     1.109    0.163
FI Unknown                        0.137    0.045     0.138    0.045
Married                           0.961    0.041     0.961    0.041
Divorced                          0.390    0.057     0.391    0.057
Children                          0.109    0.035     0.110    0.035
rho1 (city)                       0.440    0.013
rho2 (Pref)                       0.252    1.217


Appendix A. Hessian and Covariance matrix

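Here, we give the detailed expressions of the Hessian matrix and the covariance matrix discussed in Theorem 2. Firstly, we give the Hessian matrix. For simplicity, we write $A_i^{-1}(\rho_i) = (I_i - \rho_i W_i)^{-1}(I_i - \rho_i W_i')^{-1}$, $B_i(\rho_i) = W_i'(I_i - \rho_i W_i) + (I_i - \rho_i W_i')W_i$, $G_i(\rho_i) = J U_i A_i^{-1}(\rho_i) U_i' J'$, $H_i(\rho_i) = J U_i A_i^{-1}(\rho_i) B_i(\rho_i) A_i^{-1}(\rho_i) U_i' J'$, $H_{1,i}(\rho_i) = J U_i A_i^{-1}(\rho_i) B_i(\rho_i) A_i^{-1}(\rho_i) B_i(\rho_i) A_i^{-1}(\rho_i) U_i' J'$, and $H_{2,i}(\rho_i) = J U_i A_i^{-1}(\rho_i) W_i' W_i A_i^{-1}(\rho_i) U_i' J'$, $i = 1, \ldots, p$. Moreover, we define $A_0^{-1}(\rho_0) = I_L$ and $U_0 = I_L$. Then, the variance matrix of the proposed model is given by $\Sigma(\eta) = \sum_{i=0}^{p}\sigma_i^2 G_i(\rho_i)$, where $\eta = (\sigma_0^2, \sigma_1^2, \ldots, \sigma_p^2, \rho_1, \ldots, \rho_p)$. Moreover, the derivatives of $G_i(\rho_i)$, $H_i(\rho_i)$ and $\Sigma(\eta)$ are given by

$$\frac{\partial G_i(\rho_i)}{\partial\rho_i} = -H_i(\rho_i), \quad \frac{\partial H_i(\rho_i)}{\partial\rho_i} = -2(H_{1,i}(\rho_i) + H_{2,i}(\rho_i)), \quad \frac{\partial\Sigma(\eta)}{\partial\sigma_i^2} = G_i(\rho_i) \quad \text{and} \quad \frac{\partial\Sigma(\eta)}{\partial\rho_i} = -\sigma_i^2 H_i(\rho_i),$$

respectively.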

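Using the above notation, the gradient of the log-likelihood function, $\frac{\partial \log L(\psi)}{\partial\psi}$, is given by

$$\frac{\partial \log L(\psi)}{\partial\beta} = X'\Sigma^{-1}(\eta)(Y - X\beta),$$

$$\frac{\partial \log L(\psi)}{\partial\sigma_i^2} = -\frac{1}{2}\mathrm{tr}(\Sigma^{-1}(\eta)G_i(\rho_i)) + \frac{1}{2}(Y - X\beta)'\Sigma^{-1}(\eta)G_i(\rho_i)\Sigma^{-1}(\eta)(Y - X\beta),$$

$$\frac{\partial \log L(\psi)}{\partial\rho_i} = \frac{\sigma_i^2}{2}\mathrm{tr}(\Sigma^{-1}(\eta)H_i(\rho_i)) - \frac{\sigma_i^2}{2}(Y - X\beta)'\Sigma^{-1}(\eta)H_i(\rho_i)\Sigma^{-1}(\eta)(Y - X\beta).$$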

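Moreover, the Hessian matrix of the log-likelihood function, $\partial^2 \log L(\psi)/\partial\psi\,\partial\psi'$, has the elements:

$$\begin{aligned}
\frac{\partial^2 \log L(\psi)}{\partial \beta\,\partial \beta'} &= -X'\Sigma^{-1}(\eta)X, \\
\frac{\partial^2 \log L(\psi)}{\partial \beta\,\partial \sigma_i^2} &= -X'\Sigma^{-1}(\eta)G_i(\rho_i)\Sigma^{-1}(\eta)(Y - X\beta), \\
\frac{\partial^2 \log L(\psi)}{\partial \beta\,\partial \rho_i} &= \sigma_i^2 X'\Sigma^{-1}(\eta)H_i(\rho_i)\Sigma^{-1}(\eta)(Y - X\beta), \\
\frac{\partial^2 \log L(\psi)}{\partial \sigma_i^2\,\partial \sigma_i^2} &= \frac{1}{2}\mathrm{tr}\big(\Sigma^{-1}(\eta)G_i(\rho_i)\Sigma^{-1}(\eta)G_i(\rho_i)\big) - (Y - X\beta)'\Sigma^{-1}(\eta)G_i(\rho_i)\Sigma^{-1}(\eta)G_i(\rho_i)\Sigma^{-1}(\eta)(Y - X\beta), \\
\frac{\partial^2 \log L(\psi)}{\partial \sigma_i^2\,\partial \sigma_j^2} &= \frac{1}{2}\mathrm{tr}\big(\Sigma^{-1}(\eta)G_j(\rho_j)\Sigma^{-1}(\eta)G_i(\rho_i)\big) - (Y - X\beta)'\Sigma^{-1}(\eta)G_j(\rho_j)\Sigma^{-1}(\eta)G_i(\rho_i)\Sigma^{-1}(\eta)(Y - X\beta), \\
\frac{\partial^2 \log L(\psi)}{\partial \sigma_i^2\,\partial \rho_i} &= \frac{\sigma_i^2}{2}\mathrm{tr}\big(\Sigma^{-1}(\eta)H_i(\rho_i)\Sigma^{-1}(\eta)G_i(\rho_i)\big) - \frac{1}{2}\mathrm{tr}\big(\Sigma^{-1}(\eta)H_i(\rho_i)\big) \\
&\quad + \sigma_i^2 (Y - X\beta)'\Sigma^{-1}(\eta)H_i(\rho_i)\Sigma^{-1}(\eta)G_i(\rho_i)\Sigma^{-1}(\eta)(Y - X\beta) - \frac{1}{2}(Y - X\beta)'\Sigma^{-1}(\eta)H_i(\rho_i)\Sigma^{-1}(\eta)(Y - X\beta), \\
\frac{\partial^2 \log L(\psi)}{\partial \sigma_i^2\,\partial \rho_j} &= -\frac{\sigma_j^2}{2}\mathrm{tr}\big(\Sigma^{-1}(\eta)H_j(\rho_j)\Sigma^{-1}(\eta)G_i(\rho_i)\big) + \sigma_j^2 (Y - X\beta)'\Sigma^{-1}(\eta)H_j(\rho_j)\Sigma^{-1}(\eta)G_i(\rho_i)\Sigma^{-1}(\eta)(Y - X\beta), \\
\frac{\partial^2 \log L(\psi)}{\partial \rho_i\,\partial \rho_i} &= \frac{\sigma_i^4}{2}\mathrm{tr}\big(\Sigma^{-1}(\eta)H_i(\rho_i)\Sigma^{-1}(\eta)H_i(\rho_i)\big) - \sigma_i^2\,\mathrm{tr}\big(\Sigma^{-1}(\eta)(H_{1,i}(\rho_i) + H_{2,i}(\rho_i))\big) \\
&\quad - \frac{\sigma_i^4}{2}(Y - X\beta)'\Sigma^{-1}(\eta)H_i(\rho_i)\Sigma^{-1}(\eta)H_i(\rho_i)\Sigma^{-1}(\eta)(Y - X\beta) \\
&\quad + \sigma_i^2 (Y - X\beta)'\Sigma^{-1}(\eta)\big(H_{1,i}(\rho_i) + H_{2,i}(\rho_i)\big)\Sigma^{-1}(\eta)(Y - X\beta), \\
\frac{\partial^2 \log L(\psi)}{\partial \rho_i\,\partial \rho_j} &= \frac{\sigma_i^2\sigma_j^2}{2}\mathrm{tr}\big(\Sigma^{-1}(\eta)H_j(\rho_j)\Sigma^{-1}(\eta)H_i(\rho_i)\big) - \frac{\sigma_i^2\sigma_j^2}{2}(Y - X\beta)'\Sigma^{-1}(\eta)H_j(\rho_j)\Sigma^{-1}(\eta)H_i(\rho_i)\Sigma^{-1}(\eta)(Y - X\beta).
\end{aligned}$$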

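Next, let us consider the variance matrix of the gradient of the log-likelihood function, $E\left(\frac{\partial \log L(\psi_0)}{\partial \psi}\frac{\partial \log L(\psi_0)}{\partial \psi'}\right)$. The explicit form of each element can be obtained from Lemma 4 in Appendix B:

$$\begin{aligned}
E\left[\frac{\partial \log L(\psi_0)}{\partial \beta}\frac{\partial \log L(\psi_0)}{\partial \beta'}\right] &= X'\Sigma^{-1}(\eta_0)X, \\
E\left[\frac{\partial \log L(\psi_0)}{\partial \beta}\frac{\partial \log L(\psi_0)}{\partial \sigma_i^2}\right] &= \frac{1}{2}X'\Sigma^{-1}(\eta_0)E\big(u_0 u_0'\Sigma^{-1}(\eta_0)G_i(\rho_{0i})\Sigma^{-1}(\eta_0)u_0\big), \\
E\left[\frac{\partial \log L(\psi_0)}{\partial \beta}\frac{\partial \log L(\psi_0)}{\partial \rho_i}\right] &= -\frac{\sigma_{0i}^2}{2}X'\Sigma^{-1}(\eta_0)E\big(u_0 u_0'\Sigma^{-1}(\eta_0)H_i(\rho_{0i})\Sigma^{-1}(\eta_0)u_0\big), \\
E\left[\frac{\partial \log L(\psi_0)}{\partial \sigma_i^2}\frac{\partial \log L(\psi_0)}{\partial \sigma_i^2}\right] &= -\frac{1}{4}\big[E\big(u_0'\Sigma^{-1}(\eta_0)G_i(\rho_{0i})\Sigma^{-1}(\eta_0)u_0\big)\big]^2 + \frac{1}{4}E\big[\big(u_0'\Sigma^{-1}(\eta_0)G_i(\rho_{0i})\Sigma^{-1}(\eta_0)u_0\big)^2\big], \\
E\left[\frac{\partial \log L(\psi_0)}{\partial \sigma_i^2}\frac{\partial \log L(\psi_0)}{\partial \sigma_j^2}\right] &= -\frac{1}{4}E\big(u_0'\Sigma^{-1}(\eta_0)G_i(\rho_{0i})\Sigma^{-1}(\eta_0)u_0\big)E\big(u_0'\Sigma^{-1}(\eta_0)G_j(\rho_{0j})\Sigma^{-1}(\eta_0)u_0\big) \\
&\quad + \frac{1}{4}E\big[u_0'\Sigma^{-1}(\eta_0)G_i(\rho_{0i})\Sigma^{-1}(\eta_0)u_0\,u_0'\Sigma^{-1}(\eta_0)G_j(\rho_{0j})\Sigma^{-1}(\eta_0)u_0\big], \\
E\left[\frac{\partial \log L(\psi_0)}{\partial \sigma_i^2}\frac{\partial \log L(\psi_0)}{\partial \rho_j}\right] &= \frac{\sigma_j^2}{4}E\big(u_0'\Sigma^{-1}(\eta_0)G_i(\rho_{0i})\Sigma^{-1}(\eta_0)u_0\big)E\big(u_0'\Sigma^{-1}(\eta_0)H_j(\rho_{0j})\Sigma^{-1}(\eta_0)u_0\big) \\
&\quad - \frac{1}{4}E\big[u_0'\Sigma^{-1}(\eta_0)G_i(\rho_{0i})\Sigma^{-1}(\eta_0)u_0\,u_0'\Sigma^{-1}(\eta_0)H_j(\rho_{0j})\Sigma^{-1}(\eta_0)u_0\big], \\
E\left[\frac{\partial \log L(\psi_0)}{\partial \rho_i}\frac{\partial \log L(\psi_0)}{\partial \rho_j}\right] &= -\frac{\sigma_j^2}{4}E\big(u_0'\Sigma^{-1}(\eta_0)H_i(\rho_{0i})\Sigma^{-1}(\eta_0)u_0\big)E\big(u_0'\Sigma^{-1}(\eta_0)H_j(\rho_{0j})\Sigma^{-1}(\eta_0)u_0\big) \\
&\quad + \frac{1}{4}E\big[u_0'\Sigma^{-1}(\eta_0)H_i(\rho_{0i})\Sigma^{-1}(\eta_0)u_0\,u_0'\Sigma^{-1}(\eta_0)H_j(\rho_{0j})\Sigma^{-1}(\eta_0)u_0\big],
\end{aligned}$$

where $\eta_0 = (\sigma_{00}^2, \sigma_{01}^2, \ldots, \sigma_{0p}^2, \rho_{01}, \ldots, \rho_{0p})'$.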

6 Appendix B. Some useful lemmas

We introduce some lemmas which are used in the proofs of the main results that follow. The lemmas are slight modifications of the lemmas in Lee (2004), adapted to non-square matrices.

Lemma 1 Let $A$ be an $n \times m$ non-square matrix whose column sums are uniformly bounded, $C$ be an $n \times k$ matrix whose elements are uniformly bounded, and $f_i$ be i.i.d. noise with mean 0 and variance $\sigma^2$. Then,

$$\frac{1}{\sqrt{m}}C'Af = O_p(1).$$

Proof. Let $B = C'A$, $b_{i,j}$ be the $(i,j)$-th element of $B$, and $b_i$ be the $i$-th column of $B$. Because the elements of $C$ are uniformly bounded and the column sums of $A$ are uniformly bounded, the elements of $B$ are uniformly bounded by the lemmas in Lee (2004). Let $b$ be a constant such that $|b_{i,j}| \le b$. Because $Bf = \sum_{i=1}^{m} b_i f_i$,

$$\mathrm{Var}(Bf) = E\Big(\sum_{i=1}^{m}\sum_{j=1}^{m} b_i f_i f_j b_j'\Big) = \sigma^2\sum_{i=1}^{m} b_i b_i' \le \sum_{i=1}^{m} b\,1_k 1_k' = O(m),$$

where $1_k$ is a $k \times 1$ vector whose elements are 1. Thus, $\frac{1}{\sqrt{m}}C'Af = O_p(1)$ by Chebyshev's inequality.
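The variance bound in the proof of Lemma 1 can be illustrated numerically. The following is a minimal sketch under assumed toy inputs that are not part of the paper: a banded square matrix $A$ (so $n = m$ and every column sum is at most 2), a matrix $C$ with entries $\cos(i + l)$, and the hypothetical function name `scaled_variance_trace`. It evaluates the trace of $\mathrm{Var}(m^{-1/2}C'Af)$ exactly and shows that it stays bounded as $m$ grows.

```python
import math

def scaled_variance_trace(m, sigma2=1.0, k=2):
    """Trace of Var(m^{-1/2} C'A f) for a toy banded A (n = m) and bounded C."""
    n = m
    total = 0.0
    for j in range(m):
        # Toy A: ones on the diagonal and the first subdiagonal,
        # so column j has nonzero rows j and j + 1 (column sum <= 2).
        rows = [j] + ([j + 1] if j + 1 < n else [])
        for l in range(k):
            # Toy C: entry (i, l) equals cos(i + l), bounded by 1 in absolute value.
            b_lj = sum(math.cos(i + l) for i in rows)  # (l, j) element of B = C'A
            total += b_lj ** 2
    # Var(Bf) = sigma^2 * sum_j b_j b_j', so its trace is sigma^2 times
    # the sum of squared entries of B; dividing by m gives the scaled variance.
    return sigma2 * total / m

for m in (10, 100, 1000):
    print(m, scaled_variance_trace(m))
```

Because each element of $B$ is bounded by 2 in this toy setting, the scaled trace cannot exceed $4k\sigma^2$ for any $m$, which is the boundedness that Chebyshev's inequality needs.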

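Lemma 2 Let $A$ be an $m_1 \times m_2$ non-square matrix whose column sums are uniformly bounded, and let $f_{1,i}$ and $f_{2,i}$ be i.i.d. noise with mean 0 and variances $\sigma_1^2$ and $\sigma_2^2$, respectively. Then,

• $E(f_1'Af_2) = 0$.

• $f_1'Af_2 = O_p(\sqrt{m_1})$.

Proof. $E(f_1'Af_2) = \sum_{i=1}^{m_1}\sum_{j=1}^{m_2} a_{i,j}E(f_{1,i}f_{2,j}) = 0$. Moreover,

$$V(f_1'Af_2) = \sum_{i_1=1}^{m_1}\sum_{i_2=1}^{m_1}\sum_{j_1=1}^{m_2}\sum_{j_2=1}^{m_2} a_{i_1,j_1}a_{i_2,j_2}E(f_{1,i_1}f_{1,i_2}f_{2,j_1}f_{2,j_2}) = \sigma_1^2\sigma_2^2\sum_{i=1}^{m_1}\sum_{j=1}^{m_2} a_{i,j}^2 \le \sigma_1^2\sigma_2^2\sum_{i=1}^{m_1}\Big(\sum_{j=1}^{m_2}|a_{i,j}|\Big)^2 \le \sigma_1^2\sigma_2^2\sum_{i=1}^{m_1} c^2 = O(m_1).$$

Thus, $f_1'Af_2 = O_p(\sqrt{m_1})$ by Chebyshev's inequality.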
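The two moment identities in the proof of Lemma 2, $E(f_1'Af_2) = 0$ and $V(f_1'Af_2) = \sigma_1^2\sigma_2^2\sum_{i,j}a_{i,j}^2$, can be checked exactly on a small example. The sketch below assumes, purely for illustration, that the entries of $f_1$ and $f_2$ are independent and take the values $\pm\sigma$ with probability 1/2 each, which lets the expectation be computed by enumerating all sign patterns. The matrix `A` and the function name `bilinear_moments` are hypothetical.

```python
from itertools import product

def bilinear_moments(A, sigma1, sigma2):
    """Exact mean and variance of f1' A f2 when the entries of f1 and f2 are
    independent and equal to +/- sigma with probability 1/2 (mean 0, variance sigma^2)."""
    m1, m2 = len(A), len(A[0])
    values = []
    # Enumerate every sign pattern of (f1, f2); all patterns are equally likely.
    for signs in product((-1.0, 1.0), repeat=m1 + m2):
        f1 = [sigma1 * s for s in signs[:m1]]
        f2 = [sigma2 * s for s in signs[m1:]]
        values.append(sum(f1[i] * A[i][j] * f2[j]
                          for i in range(m1) for j in range(m2)))
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var

# Toy 3 x 4 matrix (an assumption for illustration only).
A = [[1.0, 0.5, 0.0, 0.0],
     [0.0, 1.0, 0.5, 0.0],
     [0.0, 0.0, 1.0, 0.5]]
mean, var = bilinear_moments(A, sigma1=2.0, sigma2=0.5)
closed_form = (2.0 ** 2) * (0.5 ** 2) * sum(a * a for row in A for a in row)
print(mean, var, closed_form)
```

Only second moments and independence enter the variance identity, so the two-point distribution is enough to reproduce the closed form.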

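Lemma 3 Let $A_i$ be an $m_i \times m_i$ matrix for $i = 1, \ldots, p$, $B$ be an $n \times n$ matrix, $C$ be an $n \times k$ matrix, and $\varepsilon$ and $f_i$, $i = 1, \ldots, p$, be $n \times 1$ and $m_i \times 1$ random noise with means 0 and variances $\sigma_0^2$ and $\sigma_i^2$. Moreover, we define $U_i$ to be an $n \times m_i$ matrix which consists only of zeros and ones, with one 1 in each row and at least one 1 in each column, $i = 1, \ldots, p$. We denote $u = \varepsilon + \sum_{i=1}^{p} U_iA_if_i$ and $m = \min\{m_1, \ldots, m_p\}$. We assume that $U_i$ is uniformly bounded in column sums, the elements of $C$ are uniformly bounded, $B$ is uniformly bounded in both row and column sums, and $m_i$ is a function of $n$ that tends to infinity with $\lim_{n\to\infty} m_i/n = c_i \le 1$. Then,

• $\frac{1}{n}C'Bu = o_p(1)$.

• $\frac{1}{n}u'Bu = O_p(1)$.

• $\frac{1}{n}\big(u'Bu - E(u'Bu)\big) = o_p(1)$.

Proof. By Lemma 1,

$$\begin{aligned}
\frac{1}{n}C'Bu &= \frac{1}{n}C'B\varepsilon + \sum_{i=1}^{p}\frac{1}{n}C'BU_iA_if_i \\
&= \frac{1}{\sqrt{n}}\frac{1}{\sqrt{n}}C'B\varepsilon + \sum_{i=1}^{p}\frac{m_i}{n}\frac{1}{\sqrt{m_i}}\frac{1}{\sqrt{m_i}}C'BU_iA_if_i \\
&= o(1)O_p(1) + \sum_{i=1}^{p}O(1)o(1)O_p(1) \\
&= o_p(1).
\end{aligned}$$

We denote $f_0 = \varepsilon$, $A_0 = I_n$ and $U_0 = I_n$. Because $u = U_0A_0f_0 + U_1A_1f_1 + \cdots + U_pA_pf_p$,

$$\frac{1}{n}u'Bu = \sum_{i=0}^{p}\sum_{j=0}^{p}\frac{1}{n}f_i'A_i'U_i'BU_jA_jf_j.$$

Firstly, we will consider the case of $i = j$. Because $A_i'U_i'BU_iA_i$ is uniformly bounded in both row and column sums, by Lee (2004),

$$\frac{1}{n}f_i'A_i'U_i'BU_iA_if_i = \frac{m_i}{n}\frac{1}{m_i}f_i'A_i'U_i'BU_iA_if_i = O(1)O_p(1) = O_p(1).$$

Moreover, $\frac{m_i}{n}\frac{1}{m_i}\big(f_i'A_i'U_i'BU_iA_if_i - E(f_i'A_i'U_i'BU_iA_if_i)\big) = O(1)o_p(1) = o_p(1)$.

Secondly, we will consider the case of $i \ne j$. By Lemma 2,

$$\frac{1}{n}f_i'A_i'U_i'BU_jA_jf_j = \frac{m_i}{n}\frac{1}{\sqrt{m_i}}\frac{1}{\sqrt{m_i}}f_i'A_i'U_i'BU_jA_jf_j = O(1)o(1)O_p(1) = o_p(1).$$

It is clear that $\frac{m_i}{n}\frac{1}{m_i}\big(f_i'A_i'U_i'BU_jA_jf_j - E(f_i'A_i'U_i'BU_jA_jf_j)\big) = o_p(1)$.

Therefore, $\frac{1}{n}u'Bu = O_p(1)$ and $\frac{1}{n}\big(u'Bu - E(u'Bu)\big) = o_p(1)$.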

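Lemma 4 Let $A$ be an $m_1 \times m_2$ non-square matrix, $T_i = J U_i(I_i - \rho_iW_i)^{-1}$, and $f_i$, $i = 0, \ldots, p$, be $m_i \times 1$ i.i.d. random noise with mean 0 and variances $\sigma_i^2$, respectively. Moreover, the elements of each $f_i$ have more than fourth moments, i.e., $E|f_{i,j}|^{4+\delta} < \infty$ for some $\delta > 0$. Let us define $u = \sum_{i=0}^{p} T_if_i$. Then,

1. $E(uu') = \Sigma(\eta)$.

2. $E(u'Au) = \sum_{i=0}^{p}\sigma_i^2\,\mathrm{tr}(T_i'AT_i)$.

3. $E(uu'Au) = \sum_{i=0}^{p}\mu_{i,3}T_i\,\mathrm{diag}(T_i'AT_i)$.

4. $E(u'Auu'Bu) = \sum_{i=1}^{p}\Big[(\mu_{i,4} - 3\sigma_i^4)\sum_{j=1}^{n}(T_i'AT_i)_{j,j}(T_i'BT_i)_{j,j} + \sigma_i^4\big(\mathrm{tr}(T_i'AT_i)\mathrm{tr}(T_i'BT_i) + \mathrm{tr}(T_i'AT_i(T_i'(B + B')T_i))\big)\Big]$.

Proof.

$$E(uu') = E\Big[\Big(\sum_{i=0}^{p}T_if_i\Big)\Big(\sum_{i=0}^{p}f_i'T_i'\Big)\Big] = \sum_{i=0}^{p}T_iE(f_if_i')T_i' = \sum_{i=0}^{p}\sigma_i^2T_iT_i' = \Sigma(\eta).$$

$$E(u'Au) = E\Big[\sum_{i_1=0}^{p}\sum_{i_2=0}^{p}f_{i_1}'T_{i_1}'AT_{i_2}f_{i_2}\Big] = \sum_{i=0}^{p}E(f_i'T_i'AT_if_i) = \sum_{i=0}^{p}\sigma_i^2\,\mathrm{tr}(T_i'AT_i).$$

$$\begin{aligned}
E(uu'Au) &= E\Big[\sum_{i_1=0}^{p}\sum_{i_2=0}^{p}\sum_{i_3=0}^{p}T_{i_1}f_{i_1}f_{i_2}'T_{i_2}'AT_{i_3}f_{i_3}\Big] = \sum_{i=0}^{p}E(T_if_if_i'T_i'AT_if_i) \\
&= \sum_{i=0}^{p}E\Big[T_if_i\sum_{j_1=1}^{m_i}\sum_{j_2=1}^{m_i}(T_i'AT_i)_{j_1,j_2}f_{i,j_1}f_{i,j_2}\Big] = \sum_{i=0}^{p}\mu_{i,3}T_i\,\mathrm{diag}(T_i'AT_i).
\end{aligned}$$

$$\begin{aligned}
E(u'Auu'Bu) &= E\Big[\sum_{i_1}\sum_{i_2}\sum_{i_3}\sum_{i_4}f_{i_1}'T_{i_1}'AT_{i_2}f_{i_2}f_{i_3}'T_{i_3}'BT_{i_4}f_{i_4}\Big] \\
&= \sum_{i}E(f_i'T_i'AT_if_if_i'T_i'BT_if_i) + \sum_{i_1}\sum_{i_2 \ne i_1}E(f_{i_1}'T_{i_1}'AT_{i_1}f_{i_1}f_{i_2}'T_{i_2}'BT_{i_2}f_{i_2}) \\
&\quad + \sum_{i_1}\sum_{i_2 \ne i_1}E(f_{i_1}'T_{i_1}'AT_{i_2}f_{i_2}f_{i_1}'T_{i_1}'BT_{i_2}f_{i_2}) + \sum_{i_1}\sum_{i_2 \ne i_1}E(f_{i_1}'T_{i_1}'AT_{i_2}f_{i_2}f_{i_2}'T_{i_2}'BT_{i_1}f_{i_1}) \\
&= \sum_{i=1}^{p}\Big[(\mu_{i,4} - 3\sigma_i^4)\sum_{j=1}^{n}(T_i'AT_i)_{j,j}(T_i'BT_i)_{j,j} + \sigma_i^4\big(\mathrm{tr}(T_i'AT_i)\mathrm{tr}(T_i'BT_i) + \mathrm{tr}(T_i'AT_i(T_i'(B + B')T_i))\big)\Big] \\
&\quad + \sum_{i_1}\sum_{i_2}\sigma_{i_1}^2\sigma_{i_2}^2\,\mathrm{tr}(T_{i_1}'AT_{i_1})\mathrm{tr}(T_{i_2}'BT_{i_2}) + 2\sum_{i_1}\sum_{i_2}\sigma_{i_1}^2\sigma_{i_2}^2\,\mathrm{tr}(T_{i_1}'AT_{i_1}T_{i_2}'BT_{i_2}).
\end{aligned}$$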
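The first two statements of Lemma 4, $E(uu') = \sum_i \sigma_i^2 T_iT_i'$ and $E(u'Au) = \sum_i \sigma_i^2\,\mathrm{tr}(T_i'AT_i)$, can be verified exactly on a small example. The following sketch rests on assumptions made only for illustration: two components with hypothetical matrices `T0` and `T1` (any conformable $T_i$ would do, not necessarily of the form $JU_i(I_i - \rho_iW_i)^{-1}$), a square matrix `A`, and noise entries equal to $\pm\sigma_i$ with probability 1/2, so that expectations reduce to averages over all sign patterns.

```python
from itertools import product

def matvec(T, f):
    return [sum(T[r][c] * f[c] for c in range(len(f))) for r in range(len(T))]

def exact_moments(Ts, sigmas, A):
    """Exact E(uu') and E(u'Au) for u = sum_i T_i f_i, entries of f_i = +/- sigma_i."""
    n = len(A)
    dims = [len(T[0]) for T in Ts]
    Euu = [[0.0] * n for _ in range(n)]
    EuAu = 0.0
    combos = list(product((-1.0, 1.0), repeat=sum(dims)))
    for signs in combos:
        u, start = [0.0] * n, 0
        for T, s, d in zip(Ts, sigmas, dims):
            part = matvec(T, [s * x for x in signs[start:start + d]])
            u = [a + b for a, b in zip(u, part)]
            start += d
        for r in range(n):
            for c in range(n):
                Euu[r][c] += u[r] * u[c] / len(combos)
                EuAu += u[r] * A[r][c] * u[c] / len(combos)
    return Euu, EuAu

def sigma_matrix(Ts, sigmas, n):
    """Sigma = sum_i sigma_i^2 T_i T_i'."""
    return [[sum(s * s * sum(T[r][k] * T[c][k] for k in range(len(T[0])))
                 for T, s in zip(Ts, sigmas)) for c in range(n)] for r in range(n)]

def trace_formula(Ts, sigmas, A):
    """sum_i sigma_i^2 tr(T_i' A T_i)."""
    n = len(A)
    return sum(s * s * sum(T[r][k] * A[r][c] * T[c][k]
                           for k in range(len(T[0])) for r in range(n) for c in range(n))
               for T, s in zip(Ts, sigmas))

# Hypothetical toy inputs: n = 3 observations, m_0 = 3 and m_1 = 2 noise terms.
T0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
T1 = [[1.0, 0.5], [1.0, 0.5], [0.0, 1.0]]
A = [[2.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 3.0]]
Ts, sigmas = [T0, T1], [1.0, 0.5]

Euu, EuAu = exact_moments(Ts, sigmas, A)
Sigma = sigma_matrix(Ts, sigmas, 3)
print(EuAu, trace_formula(Ts, sigmas, A))
```

The third and fourth statements depend on the third and fourth moments $\mu_{i,3}$ and $\mu_{i,4}$ of the noise and are not covered by this check.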


Appendix C. Proofs of the theorems

Proof of Theorem 1

To prove the consistency of the QMLE $\hat\theta$, it is sufficient to show that the following two facts hold (see White (1994)). The first one is the identification uniqueness condition:

$$\limsup_{n\to\infty}\max_{\theta \in B^c(\theta_0,\varepsilon)\cap\Theta}\big[E\log L(\theta) - E\log L(\theta_0)\big] < 0$$

for any $\varepsilon > 0$, where $B^c(\theta_0, \varepsilon)$ is the complement of an $\varepsilon$-neighborhood of $\theta_0$. The second one is the uniform convergence in probability:

$$\sup_{\theta\in\Theta}\Big|\frac{1}{n}\log L(\theta) - \frac{1}{n}E\log L(\theta)\Big| = o_p(1).$$

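The identification uniqueness

Firstly, we will show that the identification uniqueness condition holds. From the definition of the concentrated expected log-likelihood function, we have

$$\begin{aligned}
\frac{1}{n}\big(E\log L(\theta) - E\log L(\theta_0)\big) &= -\frac{1}{2}\log\big(\tilde\sigma_0^2(\theta)\big) - \frac{1}{2n}\log|\Omega(\theta)| + \frac{1}{2}\log\big(\sigma_{00}^2\big) + \frac{1}{2n}\log|\Omega(\theta_0)| \\
&= -\frac{1}{2n}\log|\tilde\sigma_0^2(\theta)I_n| - \frac{1}{2n}\log|\Omega(\theta)| + \frac{1}{2n}\log|\sigma_{00}^2I_n| + \frac{1}{2n}\log|\Omega(\theta_0)| \\
&= \frac{1}{2n}\log|\sigma_{00}^2\Omega(\theta_0)| - \frac{1}{2n}\log|\tilde\sigma_0^2(\theta)\Omega(\theta)|.
\end{aligned}$$

By Assumption 7, for any $\theta \ne \theta_0$,

$$\lim_{n\to\infty}\frac{1}{n}\big[\log|\sigma_{00}^2\Omega(\theta_0)| - \log|\tilde\sigma_0^2(\theta)\Omega(\theta)|\big] \ne 0.$$

Thus, $\lim_{n\to\infty}\frac{1}{n}\big(E\log L(\theta) - E\log L(\theta_0)\big) \ne 0$.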

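Let $p_n(\beta, \sigma_0^2, \theta) = \exp(\log L(\beta, \sigma_0^2, \theta))$ be the quasi-joint p.d.f. of $u_0 = (Y - X\beta_0)$ and $p_n^0(\beta, \sigma_0^2, \theta)$ be the true joint p.d.f. We denote by $E_q$ the expectation with respect to $p_n(\beta, \sigma_0^2, \theta)$ and by $E$ the expectation with respect to $p_n^0(\beta, \sigma_0^2, \theta)$.

By Jensen's inequality,

$$0 = \log E_q\left[\frac{p_n(\beta, \sigma_0^2, \theta)}{p_n(\beta_0, \sigma_{00}^2, \theta_0)}\right] \ge E_q\log\left[\frac{p_n(\beta, \sigma_0^2, \theta)}{p_n(\beta_0, \sigma_{00}^2, \theta_0)}\right].$$

Thus,

$$E_q\log\left[\frac{p_n(\beta, \sigma_0^2, \theta)}{p_n(\beta_0, \sigma_{00}^2, \theta_0)}\right] = E\log\left[\frac{p_n(\beta, \sigma_0^2, \theta)}{p_n(\beta_0, \sigma_{00}^2, \theta_0)}\right].$$

This implies

$$E\log L(\theta) = \max_{\beta, \sigma_0^2}E[\log L(\beta, \sigma_0^2, \theta)] \le E[\log L(\beta_0, \sigma_{00}^2, \theta_0)] = E\log L(\theta_0).$$

Collecting the above results, we have

$$\limsup_{n\to\infty}\max_{\theta \in B^c(\theta_0,\varepsilon)\cap\Theta}\big[E\log L(\theta) - E\log L(\theta_0)\big] < 0$$

for any $\varepsilon > 0$, where $B^c(\theta_0, \varepsilon)$ is the complement of an $\varepsilon$-neighborhood of $\theta_0$. The identification uniqueness condition holds.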

Uniform convergence

Secondly, we will show that the uniform convergence condition holds. From the definition, we have

$$\frac{1}{n}\log L(\theta) - \frac{1}{n}E\log L(\theta) = -\frac{1}{2}\log\hat\sigma_0^2 + \frac{1}{2}\log\tilde\sigma_0^2.$$

By the mean value theorem,

$$|\log\hat\sigma_0^2 - \log\tilde\sigma_0^2| = \frac{1}{\bar\sigma_0^2}|\hat\sigma_0^2 - \tilde\sigma_0^2|,$$

where $\bar\sigma_0^2$ lies between $\hat\sigma_0^2$ and $\tilde\sigma_0^2$. It is sufficient to show the following two facts. The first is that $\tilde\sigma_0^2$ is uniformly bounded away from zero, and the second is the uniform convergence of $|\hat\sigma_0^2 - \tilde\sigma_0^2|$ in probability.

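Firstly, we will show that $\tilde\sigma_0^2$ is uniformly bounded away from zero. By Assumption 4,

$$\inf_{\theta\in\Theta}\tilde\sigma_0^2(\theta) = \inf_{\theta\in\Theta}\frac{\sigma_{00}^2}{n}\mathrm{tr}\big(\Omega(\theta)^{-1}\Omega(\theta_0)\big) \ge \sigma_{00}^2\inf_{\theta\in\Theta}\gamma_{\min}\big(\Omega^{-1}(\theta)\big)\frac{1}{n}\mathrm{tr}\big(\Omega(\theta_0)\big) \ge c_0c_\omega c_1 > 0,$$

where $c_0$ and $c_1$ are some constants. Therefore, $\tilde\sigma_0^2$ must be uniformly bounded away from zero.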

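Secondly, we will show that $\sup_{\theta\in\Theta}|\hat\sigma_0^2 - \tilde\sigma_0^2| = o_p(1)$. Because $M(\theta)\Omega^{-1/2}(\theta)X = 0$,

$$\begin{aligned}
\hat\sigma_0^2 - \tilde\sigma_0^2 &= \frac{1}{n}Y'\Omega^{-1/2}(\theta)M(\theta)\Omega^{-1/2}(\theta)Y - \frac{1}{n}E\big[u_0'\Omega^{-1/2}(\theta)M(\theta)\Omega^{-1/2}(\theta)u_0\big] - \frac{1}{n}E\big[u_0'\Omega^{-1/2}(\theta)P(\theta)\Omega^{-1/2}(\theta)u_0\big] \\
&= \frac{1}{n}\Big\{u_0'\Omega^{-1/2}(\theta)M(\theta)\Omega^{-1/2}(\theta)u_0 - E\big[u_0'\Omega^{-1/2}(\theta)M(\theta)\Omega^{-1/2}(\theta)u_0\big]\Big\} - \frac{1}{n}E\big[u_0'\Omega^{-1/2}(\theta)P(\theta)\Omega^{-1/2}(\theta)u_0\big].
\end{aligned}$$

We will consider the uniform convergence of the above two terms.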

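Let us consider the uniform convergence of the second term. We note that

$$0 < c_\omega c_x \le \inf_{\theta\in\Theta}\gamma_{\min}\big(\Omega^{-1}(\theta)\big)\gamma_{\min}\Big(\frac{X'X}{n}\Big) \le \gamma_{\min}\Big(\frac{X'\Omega^{-1}(\theta)X}{n}\Big) \le \gamma_{\max}\Big(\frac{X'\Omega^{-1}(\theta)X}{n}\Big) \le \sup_{\theta\in\Theta}\gamma_{\max}\big(\Omega^{-1}(\theta)\big)\gamma_{\max}\Big(\frac{X'X}{n}\Big) \le c_\omega c_x < \infty.$$

By Assumptions 4 and 5,

$$\begin{aligned}
\sup_{\theta\in\Theta}\Big|\frac{1}{n}E\big[u_0'\Omega^{-1/2}(\theta)P(\theta)\Omega^{-1/2}(\theta)u_0\big]\Big| &= \sup_{\theta\in\Theta}\Big|\frac{1}{n}\sigma_{00}^2\,\mathrm{tr}\big[\Omega^{-1}(\theta)X(X'\Omega^{-1}(\theta)X)^{-1}X'\Omega^{-1}(\theta)\Omega(\theta_0)\big]\Big| \\
&\le \frac{1}{n}\sup_{\theta\in\Theta}\sigma_{00}^2\,\gamma_{\max}\Big(\Big(\frac{X'\Omega^{-1}(\theta)X}{n}\Big)^{-1}\Big)\gamma_{\max}\big(\Omega^{-2}(\theta)\big)\gamma_{\max}\big(\Omega(\theta_0)\big)\frac{1}{n}\mathrm{tr}(X'X) \\
&\le \frac{1}{n}\sigma_{00}^2\sup_{\theta\in\Theta}\gamma_{\max}\Big(\Big(\frac{X'\Omega^{-1}(\theta)X}{n}\Big)^{-1}\Big)\sup_{\theta\in\Theta}\gamma_{\max}\big(\Omega^{-2}(\theta)\big)\gamma_{\max}\big(\Omega(\theta_0)\big)\frac{1}{n}\mathrm{tr}(X'X) \\
&= \frac{1}{n}O(1)O(1)O(1)O(1)O(1) = o(1).
\end{aligned}$$

This implies that the second term converges uniformly.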

To show the uniform convergence of the first term, we will show the pointwise convergence and the stochastic equicontinuity of the term (see Andrews (1992)).

Firstly, we will consider the pointwise convergence of the first term. From Assumptions 4 and 5, $\Omega^{-1}(\theta)$ and $X(X'\Omega^{-1}(\theta)X)^{-1}X'$ are uniformly bounded in both row and column sums. Therefore, $\Omega^{-1/2}(\theta)M(\theta)\Omega^{-1/2}(\theta) = \Omega^{-1}(\theta) - \Omega^{-1}(\theta)X(X'\Omega^{-1}(\theta)X)^{-1}X'\Omega^{-1}(\theta)$ is uniformly bounded in both row and column sums. By Lemma 3, it follows that

$$\frac{1}{n}\Big\{u_0'\Omega^{-1/2}(\theta)M(\theta)\Omega^{-1/2}(\theta)u_0 - E\big[u_0'\Omega^{-1/2}(\theta)M(\theta)\Omega^{-1/2}(\theta)u_0\big]\Big\} = o_p(1).$$

This implies that the first term converges pointwise.

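Secondly, we will consider the stochastic equicontinuity of the first term. By the mean value theorem,

$$\begin{aligned}
&\frac{1}{n}u_0'\Omega^{-1/2}(\theta_1)M(\theta_1)\Omega^{-1/2}(\theta_1)u_0 - \frac{1}{n}u_0'\Omega^{-1/2}(\theta_2)M(\theta_2)\Omega^{-1/2}(\theta_2)u_0 \\
&\quad = \frac{1}{n}\sum_{i=1}^{p}\frac{\partial\, u_0'\Omega^{-1/2}(\bar\theta)M(\bar\theta)\Omega^{-1/2}(\bar\theta)u_0}{\partial \rho_i}(\rho_{i,1} - \rho_{i,2}) + \frac{1}{n}\sum_{i=1}^{p}\frac{\partial\, u_0'\Omega^{-1/2}(\bar\theta)M(\bar\theta)\Omega^{-1/2}(\bar\theta)u_0}{\partial \tau_i^2}(\tau_{i,1}^2 - \tau_{i,2}^2),
\end{aligned}$$

where $\bar\theta$ lies between $\theta_1$ and $\theta_2$. Thus, it suffices to show that

$$\sup_{\theta\in\Theta}\Big|\frac{1}{n}\frac{\partial\, u_0'\Omega^{-1/2}(\theta)M(\theta)\Omega^{-1/2}(\theta)u_0}{\partial \rho_i}\Big| = O_p(1) \quad\text{and}\quad \sup_{\theta\in\Theta}\Big|\frac{1}{n}\frac{\partial\, u_0'\Omega^{-1/2}(\theta)M(\theta)\Omega^{-1/2}(\theta)u_0}{\partial \tau_i^2}\Big| = O_p(1)$$

(see Davidson (1994)).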

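Here, we note that the partial derivatives of $\Omega^{-1}(\theta)$ are given by

$$\frac{\partial \Omega^{-1}(\theta)}{\partial \rho_i} = -\tau_i^2\,\Omega^{-1}(\theta)J U_iA_i^{-1}(\rho_i)B_i(\rho_i)A_i^{-1}(\rho_i)U_i'J'\Omega^{-1}(\theta), \qquad \frac{\partial \Omega^{-1}(\theta)}{\partial \tau_i^2} = -\Omega^{-1}(\theta)J U_iA_i^{-1}(\rho_i)U_i'J'\Omega^{-1}(\theta),$$

where $B_i(\rho_i) = W_i'(I_i - \rho_iW_i) + (I_i - \rho_iW_i')W_i$.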
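These two derivative formulas can be checked against central finite differences. The sketch below is an illustration under assumptions that are not taken from the paper: a single random-effect component ($p = 1$), $J$ equal to the identity, $\Omega(\theta) = I_n + \tau^2 U A^{-1}(\rho) U'$, a toy row-normalised weight matrix `W` for three regions, and a membership matrix `U` for four individuals. All function names are hypothetical.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def add(X, Y, c=1.0):
    """Return X + c * Y."""
    return [[x + c * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def scale(X, c):
    return [[c * x for x in row] for row in X]

def eye(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def inverse(X):
    """Gauss-Jordan inverse with partial pivoting (adequate for small matrices)."""
    n = len(X)
    aug = [list(row) + e for row, e in zip(X, eye(n))]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Toy inputs (assumptions): 3 regions, 4 individuals, J = identity.
W = [[0.0, 1.0, 0.0], [0.5, 0.0, 0.5], [0.0, 1.0, 0.0]]
U = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

def g_and_h(rho):
    S = add(eye(3), W, -rho)                                     # I - rho W
    A_inv = matmul(inverse(S), inverse(transpose(S)))            # (I - rho W)^{-1}(I - rho W')^{-1}
    B = add(matmul(transpose(W), S), matmul(transpose(S), W))    # W'(I - rho W) + (I - rho W')W
    G = matmul(matmul(U, A_inv), transpose(U))
    H = matmul(matmul(U, matmul(matmul(A_inv, B), A_inv)), transpose(U))
    return G, H

def omega_inv(rho, tau2):
    G, _ = g_and_h(rho)
    return inverse(add(eye(4), G, tau2))                         # (I + tau^2 G)^{-1}

def check_derivatives(rho=0.3, tau2=0.8, h=1e-6):
    """Largest gap between the analytic formulas and central finite differences."""
    G, H = g_and_h(rho)
    Oi = omega_inv(rho, tau2)
    analytic_rho = scale(matmul(matmul(Oi, H), Oi), -tau2)
    analytic_tau2 = scale(matmul(matmul(Oi, G), Oi), -1.0)
    numeric_rho = scale(add(omega_inv(rho + h, tau2), omega_inv(rho - h, tau2), -1.0), 0.5 / h)
    numeric_tau2 = scale(add(omega_inv(rho, tau2 + h), omega_inv(rho, tau2 - h), -1.0), 0.5 / h)
    err = lambda X, Y: max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))
    return err(analytic_rho, numeric_rho), err(analytic_tau2, numeric_tau2)

print(check_derivatives())
```

Both gaps should be at the level of finite-difference error, which supports the signs and the placement of $B_i(\rho_i)$ in the two formulas under the stated assumptions.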

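Let us consider the uniform boundedness of $\frac{1}{n}\frac{\partial\, u_0'\Omega^{-1/2}(\theta)M(\theta)\Omega^{-1/2}(\theta)u_0}{\partial \rho_i}$. The matrix $\Omega^{-1/2}(\theta)M(\theta)\Omega^{-1/2}(\theta)$ consists of the two terms $\Omega^{-1}(\theta)$ and $\Omega^{-1}(\theta)X(X'\Omega^{-1}(\theta)X)^{-1}X'\Omega^{-1}(\theta)$. The uniform boundedness of $\frac{1}{n}\frac{\partial\, u_0'\Omega^{-1}(\theta)u_0}{\partial \rho_i}$ is given by

$$\begin{aligned}
\sup_{\theta\in\Theta}\Big|\frac{1}{n}\frac{\partial\, u_0'\Omega^{-1}(\theta)u_0}{\partial \rho_i}\Big| &= \sup_{\theta\in\Theta}\Big|\frac{1}{n}\tau_i^2\,u_0'\Omega^{-1}(\theta)J U_iA_i^{-1}(\rho_i)B_i(\rho_i)A_i^{-1}(\rho_i)U_i'J'\Omega^{-1}(\theta)u_0\Big| \\
&\le \sup_{\theta\in\Theta}\tau_i^2\,\gamma_{\max}\big(J U_iA_i^{-1}(\rho_i)B_i(\rho_i)A_i^{-1}(\rho_i)U_i'J'\big)\gamma_{\max}^2\big(\Omega^{-1}(\theta)\big)\frac{1}{n}u_0'u_0 \\
&= O(1)O(1)O(1)O_p(1) = O_p(1).
\end{aligned}$$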

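Next, we will show the uniform boundedness of $\frac{1}{n}\frac{\partial\, u_0'\Omega^{-1}(\theta)X(X'\Omega^{-1}(\theta)X)^{-1}X'\Omega^{-1}(\theta)u_0}{\partial \rho_i}$. The partial derivative is

$$\begin{aligned}
\frac{1}{n}\frac{\partial\, u_0'\Omega^{-1}(\theta)X(X'\Omega^{-1}(\theta)X)^{-1}X'\Omega^{-1}(\theta)u_0}{\partial \rho_i} &= \frac{1}{n}u_0'\frac{\partial \Omega^{-1}(\theta)}{\partial \rho_i}X(X'\Omega^{-1}(\theta)X)^{-1}X'\Omega^{-1}(\theta)u_0 \\
&\quad + \frac{1}{n}u_0'\Omega^{-1}(\theta)X\frac{\partial (X'\Omega^{-1}(\theta)X)^{-1}}{\partial \rho_i}X'\Omega^{-1}(\theta)u_0 \\
&\quad + \frac{1}{n}u_0'\Omega^{-1}(\theta)X(X'\Omega^{-1}(\theta)X)^{-1}X'\frac{\partial \Omega^{-1}(\theta)}{\partial \rho_i}u_0 \\
&= \varphi_1 + \varphi_2 + \varphi_3,
\end{aligned}$$

where $\varphi_1 = \frac{1}{n}u_0'\frac{\partial \Omega^{-1}(\theta)}{\partial \rho_i}X(X'\Omega^{-1}(\theta)X)^{-1}X'\Omega^{-1}(\theta)u_0$, $\varphi_2 = \frac{1}{n}u_0'\Omega^{-1}(\theta)X\frac{\partial (X'\Omega^{-1}(\theta)X)^{-1}}{\partial \rho_i}X'\Omega^{-1}(\theta)u_0$ and $\varphi_3 = \frac{1}{n}u_0'\Omega^{-1}(\theta)X(X'\Omega^{-1}(\theta)X)^{-1}X'\frac{\partial \Omega^{-1}(\theta)}{\partial \rho_i}u_0$.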

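By Lemma 3, the uniform boundedness of $\varphi_2$ is given by

$$\begin{aligned}
&\sup_{\theta\in\Theta}\Big|\frac{1}{n}u_0'\Omega^{-1}(\theta)X\frac{\partial (X'\Omega^{-1}(\theta)X)^{-1}}{\partial \rho_i}X'\Omega^{-1}(\theta)u_0\Big| \\
&\quad = \sup_{\theta\in\Theta}\Big|\frac{1}{n}\tau_i^2\,u_0'\Omega^{-1}(\theta)X(X'\Omega^{-1}(\theta)X)^{-1}X'\Omega^{-1}(\theta)J U_iA_i^{-1}(\rho_i)B_i(\rho_i)A_i^{-1}(\rho_i)U_i'J'\Omega^{-1}(\theta)X(X'\Omega^{-1}(\theta)X)^{-1}X'\Omega^{-1}(\theta)u_0\Big| \\
&\quad \le \sup_{\theta\in\Theta}\tau_i^2\,\gamma_{\max}\big(J U_iA_i^{-1}(\rho_i)B_i(\rho_i)A_i^{-1}(\rho_i)U_i'J'\big)\gamma_{\max}\big(\Omega^{-1}(\theta)\big)\gamma_{\max}\Big(\Big(\frac{X'\Omega^{-1}(\theta)X}{n}\Big)^{-1}\Big)\gamma_{\max}\Big(\frac{X'X}{n}\Big)\gamma_{\max}\big(\Omega^{-2}(\theta)\big)\frac{1}{n}u_0'u_0 \\
&\quad = O(1)O(1)O(1)O(1)O(1)O(1)O_p(1) = O_p(1).
\end{aligned}$$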

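Let us consider the uniform boundedness of $\varphi_1$. The term is

$$\begin{aligned}
\frac{1}{n}u_0'\frac{\partial \Omega^{-1}(\theta)}{\partial \rho_i}X(X'\Omega^{-1}(\theta)X)^{-1}X'\Omega^{-1}(\theta)u_0 &= \tau_i^2\frac{1}{n}u_0'\Omega^{-1}(\theta)J U_iA_i^{-1}(\rho_i)B_i(\rho_i)A_i^{-1}(\rho_i)U_i'J'\Omega^{-1}(\theta)X(X'\Omega^{-1}(\theta)X)^{-1}X'\Omega^{-1}(\theta)u_0 \\
&= \mathrm{tr}\big(a'(\theta)b(\theta)\big),
\end{aligned}$$

where $a'(\theta) = \tau_i^2\frac{1}{\sqrt{n}}u_0'\Omega^{-1}(\theta)J U_iA_i^{-1}(\rho_i)B_i(\rho_i)A_i^{-1}(\rho_i)U_i'J'$ and $b(\theta) = \frac{1}{\sqrt{n}}\Omega^{-1}(\theta)X(X'\Omega^{-1}(\theta)X)^{-1}X'\Omega^{-1}(\theta)u_0$.

It suffices to show that $\big(\sup_{\theta\in\Theta}|\mathrm{tr}(a'(\theta)b(\theta))|\big)^2 = O_p(1)$. Because $(\sup_{\theta\in\Theta}f(\theta))^2 = \sup_{\theta\in\Theta}f(\theta)^2$ for nonnegative $f$, by the Cauchy-Schwarz inequality,

$$\Big(\sup_{\theta\in\Theta}\big|\mathrm{tr}(a'(\theta)b(\theta))\big|\Big)^2 = \sup_{\theta\in\Theta}\mathrm{tr}^2\big(a'(\theta)b(\theta)\big) \le \sup_{\theta\in\Theta}\mathrm{tr}\big(a'(\theta)a(\theta)\big)\sup_{\theta\in\Theta}\mathrm{tr}\big(b'(\theta)b(\theta)\big).$$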
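The trace form of the Cauchy-Schwarz inequality used here, $\mathrm{tr}^2(a'b) \le \mathrm{tr}(a'a)\,\mathrm{tr}(b'b)$, can be illustrated with a short numerical check. The sketch below draws random $n \times 1$ vectors (the dimension 5, the uniform distribution and the function names are assumptions chosen only for illustration) and records the gap between the two sides.

```python
import random

def trace_inner(a, b):
    """tr(a'b) for two equally shaped matrices stored as lists of rows."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
gaps = []
for _ in range(200):
    a = random_matrix(5, 1, rng)   # n x 1 vectors, as in the proof
    b = random_matrix(5, 1, rng)
    # Right-hand side minus left-hand side of the inequality; should never be negative.
    gaps.append(trace_inner(a, a) * trace_inner(b, b) - trace_inner(a, b) ** 2)
print(min(gaps))
```

The inequality is what allows the cross term $\varphi_1$ to be bounded by handling $a(\theta)$ and $b(\theta)$ separately.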


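The uniform boundedness of $\mathrm{tr}(a'(\theta)a(\theta))$ is given by

$$\begin{aligned}
\sup_{\theta\in\Theta}\mathrm{tr}\big(a'(\theta)a(\theta)\big) &= \sup_{\theta\in\Theta}\tau_i^2\frac{1}{n}\mathrm{tr}\big(u_0'\Omega^{-1}(\theta)J U_iA_i^{-1}(\rho_i)B_i(\rho_i)A_i^{-1}(\rho_i)U_i'J'J U_iA_i^{-1}(\rho_i)B_i(\rho_i)A_i^{-1}(\rho_i)U_i'J'\Omega^{-1}(\theta)u_0\big) \\
&\le \sup_{\theta\in\Theta}\tau_i^2\,\gamma_{\max}^2\big(J U_iA_i^{-1}(\rho_i)B_i(\rho_i)A_i^{-1}(\rho_i)U_i'J'\big)\gamma_{\max}^2\big(\Omega^{-1}(\theta)\big)\frac{1}{n}u_0'u_0 \\
&= O(1)O(1)O(1)O_p(1) = O_p(1).
\end{aligned}$$

Similarly, the uniform boundedness of $\mathrm{tr}(b'(\theta)b(\theta))$ is given by

$$\begin{aligned}
\sup_{\theta\in\Theta}\mathrm{tr}\big(b'(\theta)b(\theta)\big) &= \sup_{\theta\in\Theta}\frac{1}{n}\mathrm{tr}\big(u_0'\Omega^{-1}(\theta)X(X'\Omega^{-1}(\theta)X)^{-1}X'\Omega^{-1}(\theta)\Omega^{-1}(\theta)X(X'\Omega^{-1}(\theta)X)^{-1}X'\Omega^{-1}(\theta)u_0\big) \\
&\le \sup_{\theta\in\Theta}\gamma_{\max}\big(\Omega^{-1}(\theta)\big)\gamma_{\max}\Big(\Big(\frac{X'\Omega^{-1}(\theta)X}{n}\Big)^{-1}\Big)\gamma_{\max}\Big(\frac{X'X}{n}\Big)\gamma_{\max}^2\big(\Omega^{-1}(\theta)\big)\frac{1}{n}u_0'u_0 \\
&= O(1)O(1)O(1)O(1)O_p(1) = O_p(1).
\end{aligned}$$

The uniform boundedness of $\frac{1}{n}\frac{\partial\, u_0'\Omega^{-1/2}(\theta)M(\theta)\Omega^{-1/2}(\theta)u_0}{\partial \tau_i^2}$ can be proved in a similar manner. Collecting the above results, we obtain the uniform convergence of

$$\frac{1}{n}\Big\{u_0'\Omega^{-1/2}(\theta)M(\theta)\Omega^{-1/2}(\theta)u_0 - E\big[u_0'\Omega^{-1/2}(\theta)M(\theta)\Omega^{-1/2}(\theta)u_0\big]\Big\}.$$

Therefore, $\sup_{\theta\in\Theta}\big|\frac{1}{n}\log L(\theta) - \frac{1}{n}E\log L(\theta)\big| = o_p(1)$, and the QMLE $\hat\theta$ is a consistent estimator of $\theta_0$ by White (1994).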

Proof of Theorem 2

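To derive the asymptotic normality of the proposed estimator, we will show the following three results:

1. $\frac{1}{n}\frac{\partial^2 \log L(\psi_0)}{\partial\psi\,\partial\psi'} \xrightarrow{p} E\Big[\frac{1}{n}\frac{\partial^2 \log L(\psi_0)}{\partial\psi\,\partial\psi'}\Big]$.

2. $\frac{1}{n}\frac{\partial^2 \log L(\bar\psi)}{\partial\psi\,\partial\psi'} \xrightarrow{p} \frac{1}{n}\frac{\partial^2 \log L(\psi_0)}{\partial\psi\,\partial\psi'}$.

3. $\frac{1}{\sqrt{n}}\frac{\partial \log L(\psi_0)}{\partial\psi} \xrightarrow{D} N(0, \Gamma)$.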