• 検索結果がありません。

東北大学機関リポジトリTOUR

N/A
N/A
Protected

Academic year: 2021

シェア "東北大学機関リポジトリTOUR"

Copied!
33
0
0

読み込み中.... (全文を見る)

全文

(1)

Spatial Extension of Mixed Analysis of

Variance Models

著者

Sato Takaki, Matsuda Yasumasa

journal or

publication title

DSSR Discussion Papers

number

120

page range

1-31

year

2021-02

URL

http://hdl.handle.net/10097/00130485

(2)



Data Science and Service Research

Discussion Paper

Discussion Paper No. 120

Spatial Extension of Mixed Analysis of

Variance Models

Takaki Sato and Yasumasa Matsuda

February, 2021

Center for Data Science and Service Research

Graduate School of Economic and Management

Tohoku University

27-1 Kawauchi, Aobaku

Sendai 980-8576, JAPAN

(3)

Spatial Extension of Mixed Analysis of Variance Models

Takaki Sato

∗1

and Yasumasa Matsuda

2

1

Advanced institute for Yotta informatics, Tohoku University, Sendai, Japan

2

Graduate School of Economics and Management, Tohoku University, Sendai, Japan

Abstract

This paper proposes a spatial extension of mixed analysis of variance models for spatial multilevel data in which individual belongs to one of spatial regions, which are called spatial error models for multilevel data (SEMM). We have introduced empirical bayes estimation methods in two steps because SEMM models which are defined by two level equations, individual and regional levels, can be regarded as a Bayesian hierarchal model. The first step estimator based on quasi-maximum likelihood estimation methods specifies the hyper parameters and has been justified in asymptotic situations, and posterior distributions for the parameters are evaluated with the hyperparameters estimated in the first step. The proposed models are applied to happiness survey data in Japan to demonstrate empirical properties.

Keywords: Spatial econometrics, Multilevel data, MANOVA model, Empirical Bayes method,

Quasi-maximum likelihood estimation.

1

Introduction

This paper aims to extend mixed analysis of variance (MANOVA) models for multilevel data (see Demidenko

(2013)) to those for spatial multilevel data, which we call spatial error models for multilevel data (SEMM).

Multilevel data is a kind of cluster data that is observations belong to some kinds of nested clusters (e.g.

students are members of one of schools in school effectiveness research) and widely used in both of social

and natural science (see De Leeuw et al. (2008) and Hox et al. (2017)). Spatial multilevel data in which

clusters are organized with spatial regions are also used in many fields to capture spatial correlation between Corresponding author. E-mail address: [email protected]

(4)

regions. With spatial multilevel data, Fazio & Piacentino (2010) investigates spatial variability of Small and

Medium Enterprises productivity across the Italian territory, and Pierewan & Tampubolon (2014) examines

how spatial clusters explain variations in individual well-begin across regions in Europe.

MANOVA models which are linear regression models incorporating several kinds of random effect terms

corresponding to the type of clusters have been used to analyze multilevel data. By including random effect

terms, we can evaluate common feature within same clusters as grouping structures. Hartley and Rao (1967)

and Miller (1977) discuss asymptotic properties of maximum likelihood estimator for MANOVA models.

To evaluate spatial correlations between random effects in multilevel data, we provide SEMM models as a

spatial extension of MANOVA models in this paper. The conventional way to estimate spatial correlation is to

include spatial lag terms into regression models (see Anselin (1988) and Arbia (2014)), and thus we combine

spatial lag terms with random effect terms in MANOVA models to propose new spatial econometrics models.

Because SEMM models are defined by two level equations, individual level and regional level equations,

the models can be regarded as hierarchical bayesian models whose parameters and hyperparameters can be

estimated by empirical bayesian estimation methods in two steps. In the first step, hyperparameters are

estimated with quasi-maximum likelihood (QML) estimation methods, which makes it possible to apply a

method developed in spatial econometrics studies (see Lee (2004), Liu and Yang (2015), Su and Yang (2015),

and Yang (2018)) Posterior distributions for parameters are derived with hyperparemters estimated in the

first step.

The interesting feature of SEMM models are summarized as follows. Firstly, SEMM models can analyze

regional specific effects for dependent variables, considering the effect of individual characteristics. Here, let

us note that regional effects are not the same as random effects. Regional effects of a region are defined by the

effect of observed regional characteristics and the sum of random effects for clusters, each of which corresponds

to the group in clusters which the region belongs to. Spatial econometrics models that have been considered

ever can’t take into account of the effect of individual characteristics in estimating regional effect in multilevel

data because we need to summarize the data on regions where more than one individuals are observed to

apply the models, and then individual characteristics are lost. Defining the SEMM model in two level

equations would allow for both individual characteristics and regional effects in analysis. Secondly, spatial

correlations between random effects can be estimated. Some sources of random effects such as cultures or

customs specific to a region may tend to be similar to them in nearby regions. Then, random effects may have

spatial correlation, namely, regional effects in nearby areas may take on similar values. Therefore, taking into

(5)

we can estimate regional effects in areas where there are no observed individuals by using the information of

the region where observed individuals exist. Because regional effects are estimated by Bayesian estimation

method, we can evaluate regional effects for all regions, regardless of whether the individuals belong to them

or not by properly estimating the prior information of regional effects.

Applications of SEMM models to happiness survey data in Japan demonstrate several interesting features

of the effect of individual characteristics and regional specific characteristics on happiness. Firstly, individual

characteristics are important factor for happiness studies. People’s happiness is U-shaped with respect to

age, namely, happiness decreases until middle age and then increases. Moreover, female is basically happier

than male. Happiness increases monotonically as household income and personal income increase and getting

married greatly increases people’s happiness. Secondly, random effects for each city have spatial correlation.

The similarity of culture or customs of nearby areas which greatly affects the way people feel about their

happiness may cause the spatial correlation. Finally, spatial cluster exists in the regional effects which can be

regarded as average happiness of each regions. The level of happiness in the southern and middle regions of

Japan is higher than that in the eastern region, and the estimated happiness of eastern coastal regions is the

lowest. The reason for this is thought to be that the effects of the nuclear accident caused by the East-Japan

earthquake which occurred in 2011 are still lingering.

This paper is organized as follow. In Section 2, we define SEMM models as a spatial extension of MANOVA

models. A two step estimation method to estimate the parameters in SEMM models and asymptotic

prop-erties of the first step estimator are discussed in Section 3. We apply the SEMM model to happiness survey

data in Japan to demonstrate empirical properties of the proposed models in Section 4. Section 5 concludes

the paper. All the proofs in Section 3 are discussed in Appendix.

2

Model specification

Let n and L be the number of individual and regions, respectively. We assume that each individuals belong

to one of regions and admit that there are regions where no individuals are observed. In this paper, we call

the nested dataset whose grouping is based on spatial units as spatial multilevel data. Moreover, we assume

that the regions can be grouped into larger regional units by p different groupings, and let ml, l = 1, . . . , p, be

the number of larger regions obtained by the l-th grouping. For examples, several cities are grouped together

to form a prefecture in Japan.

(6)

j-th region and Y is the n× 1vector of yi,js. Spatial error models for multilevel data (SEMM) is given by,

Y = X1β1+ J d + ε, (1)

d = X2β2+ u, (2)

u = U1(I1− ρ1W1)−1f1+· · · Up(Ip− ρpWp)−1fp, (3)

where X1 is an n× k1 matrix for individual level explanatory variables, J is an n× L matrix for regional

dummy variables, X2 is an L× k2 matrix for regional level explanatory variables, Ul is an L× ml matrix

for a random effect which consists only of zeros and ones and there is exactly one 1 in each row and at

least one 1 in each column, l = 1, . . . , p, Il is an ml× ml identity matrix, and Wl is an ml× ml spatial weight matrix which describes spatial relationships among the l-th grouped regions. A random variables

εi, i = 1, . . . , n, is independent and identically distributed (i.i.d.) with mean 0 and variance σ20and an n× 1 vector ε = (ε1, . . . , εn), and fl,j, j = 1, . . . , mlis also i.i.d. with mean 0 and variance σ2l and an ml×1 vector

fl = (fl1, . . . , flml) is a random effect for the l-the groped regions. The vector β1 and β2 are regression

coefficients for individual and regional level explanatory variables, respectively, and ρlis a spatial correlation parameter which describe the strength of spatial dependence between regions in the l-th grouping.

SEMM models is a spatial extension of mixed analysis of variance (MANOVA) models because SEMM

models reduce to MANOVA models when spatial correlation parameters, ρls are equal to 0. In the analysis of spatial multilevel data, consideration of spatial correlation between random effects can improve the accuracy

of the model fitting. Random effects fl,j express the effect of the j-th region in the l-th larger regional units on regional effect, d, and some sources of random effects may be cultures or customs in the larger region.

Then, because the cultures and customs of nearby regions tend to be similar, random effects fl,j may have spatial correlation, namely, regional effects in nearby areas may take on similar values. Therefore, taking

into account the spatial correlation between regional effects allows for more detailed analysis of the spatial

multilevel data.

SEMM models are defined by two level equations, individual level equations (1) and region level equations

(2) and (3), and this two level modeling has some advantages. One advantage is that we can analyze regional

effect, d, considering the effect of individual characteristics. Spatial econometrics models that have been

considered can’t take into account of individual characteristics when we estimate regional effect in multilevel

data because they assume that exactly one observation is observed each regions. Thus, we need to summarize

(7)

one region. A commonly used method to summarize data is to take the average of the data in the area, but

then individual characteristics are lost then. Defining the SEMM model in two level equations would allow

for both individual and regional effects, which would allow for more accurate estimation of regional effects.

Another advantage of modeling multilevel models with the hierarchical structure is that we can estimate

regional effects, d, in areas where there are no observed individuals by using the information of the region

where observed individuals exist. Let us remember that J is a regional dummy matrix which may have

columns whose elements are all zeros because we admit the existence of regions where no individuals are

observed. Thus, usual ordinary least squares does not work because J is rank deficient. The proposed

model can be regarded as a Bayesian hierarchical model and equation (2) and (3) describe prior information

of regional effects. As discussed in the estimation section, by properly estimating the prior information of

regional effects with marginal likelihood which is based on the information of regions where individuals are

observed, we can evaluate regional effects for all regions, regardless of whether the individuals belong to them

or not.

3

Estimation

Let us consider a method to estimate the parameters for SEMM models and discuss asymptotic properties

of proposed estimators the size of ml, l = 1, . . . , p, tends to be infinity along with the sample size n. Because

the proposed model can be regarded as a Bayesian hierarchical model, we propose empirical Bayes

estima-tion procedure in two steps. The first step is the estimaestima-tion of the hyperparameters in prior distribuestima-tions

with quasi-maximum likelihood (QML) estimation methods, and the second step is calculation of posterior

distributions with the hyperparameters estimated in the first step. Moreover, we introduce the asymptotic

properties of the first step estimators when the sample size of both individuals and regions tends to be large.

3.1

Empirical Bayes Estimation

We introduce empirical Bayes estimation procedure in two steps. Let β = (β1, β2), τl = σ

2

l σ2

0, l = 1, . . . , p,

θ = (β2, τ1, . . . , τp, ρ1, . . . , ρp), ψ = (β, σ20, . . . , σ2p, ρ1, . . . , ρp) and δ = (β1, d). In this paper, we call σ20 and

θ as hyperparameters and δ as parameters, respectively.

The first step is the estimation of σ20and θ by a quasi-maximum likelihood (QML) estimation method with a marginal likelihood of Y . Let us denote that f (Y|β1, d, σ20) is a probability density function for the data

(8)

variables εi and fl,j which may be not normally distributed random variables follows normal distribution, and then the marginal distribution of Y ,

m(Y|ψ) =



f (Y|β, σ2ε, d)g(d|β2, ρ1, . . . , ρp, σ21, . . . , σp2)dd,

follows a multivariate normal distribution. Thus, the marginal log-likelihood function is given by

log L(ψ) =−n 2log(2πσ 2 0) 1 2log|Ω(θ)| − (Y − Xβ)Ω−1(θ)(Y − Xβ) 20 , (4) where X = (X1, J X2), Ω(θ) = In+ τ1J U1(I1− ρ1W1)−1(I1− ρ1W)−1U1J+· · · + τpJ Up(Ip− ρpWp)−1(Ip− ρpWp)−1UpJ.

We will derive a concentrated marginal log-likelihood function to reduce the number of parameters for

numerical optimization. The first-order condition of the marginal log-likelihood function is

ˆ β(θ) = (XΩ−1(θ)X)−1XΩ−1(θ)Y, ˆ σ02(θ) = 1 n(Y − X ˆβ(θ)) Ω−1(θ)(Y − X ˆβ(θ)).

By substituting ˆβ(θ) and ˆσ20(θ) into (4), we obtain the concentrated marginal log-likelihood function,

log L(θ) =−n 2(log(2π) + 1)− n 2log(ˆσ 2 0(θ))− 1 2log|Ω(θ)|.

Maximizing the concentrated marginal log-likelihood function gives the ML estimator ˆθ of θ, and then the

ML estimators ˆβ and ˆσ02are obtained by ˆβ = ˆβ∗θ) and ˆσ02= ˆσ02(ˆθ), respectively.

The second step is the Bayesian estimation of the parameters δ based on the estimated hyperparameters ˆ

β2, ˆρ, ˆσ20 and ˆσl2= ˆτlσˆ02, l = 1, . . . , p. Let δ = (β, d) and ˜X = (X, J ). The estimated posterior distribution

for δ is given by

P (δ| Y, ˜X, ˆβ2, ˆρ1, . . . , ˆρp, ˆσ02, . . . , ˆp2f, b)∝ L(Y | ˜X, δ, ˆσ20)π(δ| ˆβ2, ˆρ, ˆσ12, . . . , ˆσp2, b),

where b is a hyperparameter for a prior information for β, L(Y| ˜X, θ, ˆσ20) is the likelihood of the data Y , and π(θ| ˆβ2, ˆρ, ˆσ21, . . . , ˆσp2, b) is the prior distribution of the model parameters, δ. If the prior and posterior

distribution for δ is conjugate distributions, then prior distribution can be calculated explicit form, and if

(9)

Carlo (MCMC) methods.

As one example of conjugate distributions, we will show the explicit form of the posterior distribution

when the likelihood and the prior distribution are multivariate normal distributions and the number of

random effect is one. Then, the estimated posterior distribution follows a multivariate normal distribution.

We set prior means and the inverse of prior variance matrices of the multivariate normal distribution for

the prior distribution as ˆs0 = (0k×1, ˆβ2) and ˆS0−1 =

⎛ ⎜ ⎝ 0k×k 0k×m 0m×k σˆ12 1(In− ˆρW )(In− ˆρW ) ⎞ ⎟ ⎠, where 0n1×n2

is the n1× n2 matrix whose elements are zeros. Then, the posterior covariance matrix and mean vector

is S1 = 1 ˆ σ2 0 ˜ XX + ˆ˜ S0−1 −1 and s1 = S1 1 ˆ σ2 0 ˜ XY + ˆS−10 sˆ0

, respectively. Thus, the estimated posterior

distribution is given by

P (δ| Y, ˜X, ˆβ2, ˆρ1, ˆσ02, ˆσ21, b)∼ N(s1, S1),

where N (s1, S1) means the multivariate normal distribution with mean s1and covariance matrix S1. In other

cases, the posterior distribution can be derived in the same way.

3.2

Asymptotic properties

Here, we discuss the conditions under which the QML estimators ˆθ = ( ˆβ2, ˆτ1, . . . , ˆτp, ˆρ1, . . . , ˆρp) and ˆψ =

( ˆβ, ˆσ20, . . . , ˆσp2, ˆρ1, . . . , ˆρp) in the first step is consistent and asymptotically normal when the size of ml, l =

1, . . . , p, tends to be infinity along with the sample size n. All of the proofs and Lemmas for the asymptotic

results are given in the Appendix.

Let θ0= (δ0, τ10, . . . , τp0, ρ10, . . . , ρp0) and ψ0= (β0, σ002, θ0)be the true values for θ and ψ. Assume the

following conditions.

Assumption 1 The true parameter θ0lies in the interior of a compact parameter space Θ.

Asuumption 2 εi, i = 1, . . . , n and fl,j, l = 1, . . . , p, j = 1, . . . , ml are i.i.d with mean 0 and variances σ02 and σ2j, respectively. And, E|εi|4+δ<∞ and E|fl,j|4+δ<∞ for some δ > 0.

Assumption 3 The number of regions in l-th grouping, ml, tends to infinity along with the sample size n.

Assumption 4 The matrices J , Ui, Wi, (Ij− ρj,0Wj)−1and Ω−1(θ) is uniformly bounded in both row and column sums. Moreover, 0 < cω ≤ infθ∈Θγmin(Ω−1(θ))≤ supθ∈Θγmax(Ω−1(θ))≤ cω<∞.

(10)

Assumption 5 X has full column rank and its elements are uniformly bounded constants, limn→∞n1XΩ−1(θ)X exists and is non-singular.

Assumption 6 Let A−1i i) = (Ii− ρi,0Wj)−1(Ii− ρi,0Wi)−1 and Bii) = (Ii− ρiWi)−1Wi+ Wi(Ii

ρiWi)−1. We assume that supθ∈Θmax(J UiA−1i (ρi)UiJ)| < ∞ and supθ∈Θ|γmax(J UiA−1i (ρi)Bi(ρi)A−1i (ρi)UiJ)| <

∞, i = 1, . . . , p. Assumption 7 lim n→∞ 1 n log002 Ω(θ0)| − log |˜σ2(θ)Ω(θ)| = 0, for any θ = θ0.

First, we introduce the consistency of ˆθ. The expected log-likelihood function for the proposed model is

given by E log L(ψ) =−n 2log(2πσ 2 0) 1 2log|Ω(θ)| − E (Y − Xβ)Ω−1(θ)(Y − Xβ) 02 .

The expected log-likelihood is maximized at

˜ β(θ) = β0, ˜ σ02(θ) = σ 2 00 n tr(Ω(θ) −1Ω(θ 0)), = 1 nE[u  1 2(θ)M (θ)Ω12(θ)u0] +1 nE[u  1 2(θ)P (θ)Ω12(θ)u0],

where P (θ) = I − M(θ) and M(θ) = I − Ω−12(θ)X(XΩ−1(θ)X)−1XΩ12(θ). Thus, the concentrated expected log-likelihood function is given by,

E log L(θ) =−n 2(log(2π) + 1)− n 2log(˜σ 2 0(θ))− 1 2log|Ω(θ)|.

Consistency of ˆθ is obtained by the following two facts. The first one is the identification uniqueness

condition: lim supn→∞

maxθ∈Bc0,ε)∩ΘE log L(θ)− E log L(θ0)

< 0 for any ε > 0, where Bc(θ0, ε) is

the compliment of an ε-neighborhood of θ0. The second one is the uniform convergence in probability:

supθ∈Θ n1log L(θ)− 1nE log L(θ) = op(1).

(11)

Next, let us consider the the asymptotic distribution of the QML estimator ˆψ. To derive the asymptotic

normality, we need to consider the the Taylor expansion of ∂ψ log Ln( ˆψ) at ψ0. The first-order derivatives of

the log-likelihood function at ψ has the elements

∂ log L(ψ) ∂β = X Σ−1(η)(Y − Xβ), ∂ log L(ψ) ∂σ2i = 1 2tr(Σ −1(η)G i(ρi)) +12(Y − Xβ)Σ−1(η)Gi(ρi−1(η)(Y − Xβ), ∂ log L(θ) ∂ρi = σ2i 2 tr(Σ −1(η)H i(ρi))−σ 2 i 2 (Y − Xβ) Σ−1(η)H i(ρi−1(η)(Y − Xβ), where η = (σ20, σ21, . . . , σp2, ρ1, . . . , ρp), A−1i (ρi) = (Ii− ρiWi)−1(Ii− ρiWi)−1, Bi(ρi) = Wi(Ii− ρiWi) +

(Ii− ρiWi)Wi, Gii) = J UiA−1i i)UiJ, and Hii) = J UiA−1i i)Bii)A−1i i)UiJ. By the mean value theorem, n( ˆψ− ψ0) = 1 n 2log L( ¯ψ) ∂ψ∂ψ −1 1 n ∂ log L(ψ0) ∂ψ ,

where ¯ψ lies between ˆψ and ψ0.

The score function which is the first-order derivatives of the log-likelihood function at ψ0, ∂ log L(ψ∂ψ 0), are

linear and quadratic functions of u0= (Y − Xβ0). By applying the central limit theorem for linear-quadratic

forms by Kelejian and Prucha (2001) to the score functions, we have the asymptotic normality for the QMLE ˆ

ψ under proper asymptotic behavior of the Hessian matrix and the variance of the score function whose

explicit forms are given in Appendix.

Theorem 2. Under Assumptions 1-7, if there exist Σ =− limn→∞E

1 n∂ 2log L(ψ 0) ∂ψ∂ψ and Γ = limn→∞n1E ∂ log L(ψ) ∂ψ ∂ log L(ψ)∂ψ

, and−Σ is positive definite, then

n( ˆψ− ψ0)−→ N(0, ΣD −1ΓΣ−1).

4

Empirical Application

We apply the proposed model to happiness survey data in Japan to analyze the effect of individual

charac-teristics on happiness and spatial correlation of regional effect which is random effects in each region. In this

analysis, we use

(12)

Macromill Co, LTD, which is a market research company in Japan, to conduct a survey of 26, 984 people

living in 1534 cities. Here, respondents were selected so that the distribution of age, population, and area of

residence would be the same as that of the Japanese census. The demographic information of the respondents

contains gender, age, personal and family incomes, marital status, and presence of children.

Individual happiness for dependent variables, Y , was collected as a response to the question: Currently,

how happy do you feel? Score the degree of your happiness between 1 (very unhappy) and 10 (very happy).

Thus, the happiness is measured discrete values between 1 and 10.

We use dummy variables created from the demographic information as explanatory variables, X. Age

and gender are divided into 22 categories, namely, all the respondents were separated into the two groups of

female and male, each of which is categorized as the 11 mutually disjoint subgroups corresponding with (1)

age < 20, (2) < 25, (3) < 30,. . . ,(10) < 65 and (11) ≥ 65. As the result, we obtain the 22 disjoint groups in total and define the group of female with age younger than 20 as the base group. Personal income is

categorized as the 7 mutually disjoint groups of income, i.e. (1) < 2 million yen, (2) < 4 million, (3) < 6

million, (4) < 8 million, (5) < 10 million, (6) < 12 million and (7)≥ 12 million yen, with the group less than 2 million yen as the base, while family income is categorized as the 10 groups of income, i.e. (1) < 2 million

yen, (2) < 4 million, (3) < 6 million, (4) < 8 million, (5) < 10 million, (6) < 12 million, (7) < 15 million, (8)

< 20 million, (9)≥ 20 million yen and (10) the group of no-response, with the group less than 2 million yen

as the base. We have personal and family income as the categorical variables with 7 and 10 subgroups as

the result, respectively. Presence of child is summarized as the dummy variable of taking 1 if a respondent

have more than one child and 0 otherwise. Martial status is recorded as the category variable with the three

groups of (1) single, (2) married and (3) divorced or widowed, with the single group taken as the base.

Regional dummy variables, J , is the 26, 984× 1845 matrix. Here, we note that the rank of J is less than 1845, at 1534 which is the number of cities which at least one respondent belong to. Let us remember that

our proposed models can also estimate regional effects, d, in areas where there are no observed individuals

by using the information of the surrounding regions where observed individuals exist. Thus, the matrix, J ,

contains the columns whose elements are zeros which correspond to the areas where there are no respondents.

We use two spatial weight matrix(SWM), one at the prefecture level and the other are the city level. The

prefecture level SWM, W1, is a 46× 46 matrix based on the adjacencies of each prefecture in Japan, namely,

if the prefecture i and j share a border, then the i, j and j, i elements of W1is 1, and otherwise 0. The city

level SWM, W2, is a 1845× 1845 matrix created in two-steps. Firstly, If the distance between city i and j

(13)

Figure 1: a plot of estimated regional effects for individual happiness for each cities with spatial error models for multilevel data which contains city level and prefecture level random effects by applying it them to happiness dataset in Japan.

within 30 km of a city, then the element of W2 corresponding to the three closest cities is set to 1.

Table 1 reports the estimates of the parameters for SEMM models with standard errors. As a benchmark

for comparison, those of MANOVA models which is a special case of SEMM models where spatial parameters

for random effects are equal to zeros, i.e. ρi = 0, i = 1, 2. In comparison between fittings of SMMM models and ANOVA models, the former model accounts for happiness better than the later model in terms of

Akaike information criterion (AIC). This indicated that taking into account of spatial correlation improve

the accuracy of the model fitting. We find from table 1 that spatial correlation between random effects for city

level, ρ1, are positively significant at 5 % level, which indicates that random effect on a city takes similar value

with random effects on surrounding cities of the city. One reason which derives spatial correlation between

city level random coefficient is the similarity of culture or customs which greatly affects the way people feel

about their happiness. Figure 1 is a plot of estimated regional effects for each cities, d, which mean average

happiness of people living in that city. We can find that spatial cluster in the regional effects exists and

the level of happiness in the southern and middle regions of Japan is higher than that in the eastern region.

Especially, eastern coastal regions show the lowest level of happiness compared to other regions. The reason

for this is thought to be that the effects of the nuclear accident caused by the East-Japan earthquake which

occurred in 2011 are still lingering. The estimated coefficients of age takes a U-Shaped, namely, coefficients

decreases until middle age and then increases. Moreover, the coefficients of gender indicates that female is

(14)

increase and getting married greatly increases people’s happiness.

5

Conclusion

We have proposed spatial error models for multilevel dataset which is a spatial extension of analysis of variance

models in this paper. Because the proposed model can be regarded as a Bayesian hierarchal model, we have

introduced empirical bayesian estimation methods in two steps as estimation strategy for the parameters

in the proposed models. The first step estimator specifies the hyper parameters and has been justified in

asymptotic situations, and the second step estimator for parameters are derived by the Bayes formula with

the hyperparameters estimated in the first step. Fitting the proposed model to happiness survey data in

Japan, we can evaluate the effect of individual and regional level explanatory variables on happiness and

spatial correlation of regional effects which are random effects in each regions. Empirical results suggest that

happiness is U-shaped with age, female’s happiness is higher than male’s happiness at all ages, and regional

effects on happiness are spatially correlated. The existence of spatial correlations between random effects

indicates that unobserved features which affect on individual happiness such as culture and custom tend to

be similar in nearby regions.

For future study, several extensions are possible. In this analysis, we regard individual happiness as

con-tinuous variables. However, the treatment creates a gap between the data and the model because individual

happiness takes only discrete values between 1 and 10. Thus, the extension of the proposed model to discrete

choice models fills the gap and allows for rigorous analysis of happiness. One more possibility is a panel

ex-tension of the proposed model. Our proposed model can capture only spatial correlation. However, it is said

that happiness on individual has a time series correlation. A panel extension would reveal more interesting

(15)

Table 1: Estimation values and their standard errors forβ1, β2, log likelihood (logL) and Akaike Information Criterion (AIC) in both spatial error models for multilevel data (SEMM2) and mixed analysis of variance models(MANOVA2) which contains city level and prefecture level random effects, and estimates and standard errors of spatial parameters ρ1 andρ2 in SEMM2 which are obtained by applying them to happiness dataset in Japan.

SEEM2 MANOVA2

coef s.e coef s.e

20< Female < 25 -0.007 0.076 -0.005 0.076 Female < 30 -0.234 0.071 -0.232 0.071 Female < 35 -0.522 0.069 -0.521 0.069 Female < 40 -0.694 0.068 -0.691 0.068 Female < 45 -0.718 0.064 -0.717 0.064 Female < 50 -0.857 0.064 -0.856 0.064 Female < 55 -0.811 0.067 -0.813 0.066 Female < 60 -0.789 0.068 -0.790 0.068 Female < 65 -0.504 0.067 -0.506 0.067 Female > 65 -0.172 0.063 -0.172 0.062 Male < 20 0.185 0.078 0.184 0.078 Male < 25 -0.186 0.072 -0.183 0.072 Male < 30 -0.836 0.070 -0.835 0.070 Male < 35 -1.117 0.068 -1.118 0.068 Male < 40 -1.409 0.069 -1.411 0.068 Male < 45 -1.557 0.065 -1.558 0.065 Male < 50 -1.628 0.065 -1.629 0.065 Male < 55 -1.692 0.068 -1.694 0.068 Male < 60 -1.799 0.069 -1.802 0.069 Male < 65 -1.308 0.068 -1.311 0.068 Male < 65 -0.915 0.065 -0.917 0.064

200 < Personal Income (PI) < 400 0.076 0.032 0.075 0.032

PI < 600 0.263 0.041 0.264 0.041

PI < 800 0.335 0.057 0.338 0.057

PI < 1000 0.401 0.081 0.403 0.081

PI < 1200 0.572 0.120 0.577 0.120

PI > 1200 0.458 0.146 0.459 0.146

200 < Family Income (FI) < 400 0.055 0.048 0.055 0.048

FI < 600 0.337 0.048 0.339 0.048 FI < 800 0.494 0.052 0.495 0.052 FI < 1000 0.619 0.058 0.621 0.058 FI < 1200 0.769 0.069 0.770 0.069 FI < 1500 0.881 0.087 0.882 0.087 FI < 2000 1.181 0.117 1.183 0.117 FI > 2000 1.111 0.162 1.109 0.163 FI Unknown 0.137 0.045 0.138 0.045 Married 0.961 0.041 0.961 0.041 Divorced 0.390 0.057 0.391 0.057 Children 0.109 0.035 0.110 0.035 rho1(city) 0.440 0.013 rho2(Pref) 0.252 1.217

(16)

Appendix A. Hessian and Covariance matrix

Here, we show the detailed expression of the Hessian matrix and covariance matrix which is discussed in

The-orem 2. Firstly, we show the Hessian matrix. For simplicity, we denote A−1i i) = (Ii−ρiWi)−1(Ii−ρiWi)−1,

Bii) = Wi(Ii−ρiWi) + (Ii−ρiWi)Wi, Gii) = J UiA−1i i)UiJ, Hii) = J UiA−1i i)Bii)A−1i i)UiJ,

H1,ii) = J UiA−1i i)Bii)A−1i i)Bii)A−1i i)UiJ, and H2,ii) = J UiA−1i i)WiWiA−1i i)UiJ, i =

1, . . . , p. Moreover, we define A00)−1 = IL and Ui = IL. Then, the variance matrix of the proposed

model is given by, Σ(η) =pi=0σi2Gii), where η = (σ20, σ12, . . . , σp2, ρ1, . . . , ρp). Moreover, the derivatives of

Gii), Hii), Σ(η) are given by, ∂Gi(ρi) ∂ρ2 i =−Hi(ρi), ∂Hi(ρi) ∂σ2 i = −2(H1,i(ρi) + H2,i(ρi)), ∂Σ(η) ∂σ2 i = Gi(ρi) and ∂Σ(η) ∂ρi =−σ 2 iHi(ρi), respectively.

By using above notations, the gradients of the log-likelihood function, ∂ log L(ψ)∂ψ , is given by

∂ log L(ψ) ∂β = X Σ−1(η)(Y − Xβ), ∂ log L(ψ) ∂σ2i = 1 2tr(Σ −1(η)G i(ρi)) +12(Y − Xβ)Σ−1(η)Gi(ρi−1(η)(Y − Xβ), ∂ log L(θ) ∂ρi = σ2i 2 tr(Σ −1(η)H i(ρi))−σ 2 i 2 (Y − Xβ) Σ−1(η)H i(ρi−1(η)(Y − Xβ).

Moreover, the hessian matrix of the log-likelihood function, 2∂ψ∂ψlog L(ψ) has the elements:

2log L(ψ) ∂β∂β =−XΣ −1(η)X, 2log L(ψ) ∂β∂σi2 =−X Σ−1(η)G i(ρi−1(η)(Y − Xβ), 2log L(ψ) ∂β∂ρi = σ 2 iXΣ−1(η)Hi(ρi−1(η)(Y − Xβ), 2log L(ψ) ∂σ2i∂σ2i = 1 2tr(Σ −1(η)G i(ρi−1(η)Gi(ρi)) − (Y − Xβ)Σ−1(η)G i(ρi−1(η)Gi(ρi−1(η)(Y − Xβ),

(17)

2log L(ψ) ∂σi2∂σj2 = 1 2tr(Σ −1(η)G j(ρj−1(η)Gi(ρi)) − (Y − Xβ)Σ−1(η)G j(ρj−1(η)Gi(ρi−1(η)(Y − Xβ), 2log L(ψ) ∂σ2i∂ρi = σi2 2 tr(Σ −1(η)H i(ρi−1(η)Gi(ρi))12tr(Σ−1(η)Hi(ρi)) + σi2(Y − Xβ)Σ−1(η)Hii−1(η)Gii−1(η)(Y − Xβ) 12(Y − Xβ)Σ−1(η)Hii−1(η)(Y − Xβ) 2log L(ψ) ∂σi2∂ρj = σ2j 2 tr(Σ −1(η)H j(ρj−1(η)Gi(ρi)) + σj2(Y − Xβ)Σ−1(η)Hjj−1(η)Gii−1(η)(Y − Xβ), 2log L(ψ) ∂ρi∂ρi = σi4 2 tr(Σ −1(η)H i(ρi−1(η)Hi(ρi))− σi2tr(Σ−1(η)(H1,i(ρi) + H2,i(ρi)) −σ4i 2 (Y − Xβ) Σ−1(η)H i(ρi−1(η)Hi(ρi−1(η)(Y − Xβ)

+ σi2(Y − Xβ)Σ−1(η)(H1,ii) + H2,ii))Σ−1(η)(Y − Xβ),

2log L(ψ) ∂ρi∂ρj = σi2σ2j 2 tr(Σ −1(η)H j(ρj−1(η)Hi(ρi)) −σ 2 iσj2 2 (Y − Xβ) Σ−1(η)H j(ρj−1(η)Hi(ρi−1(η)(Y − Xβ).

Next, let us consider the variance matrix of the log likelihood function, E(∂ log L(ψ0)

∂ψ ∂ log L(ψ∂ψ 0)). The

explicit form of each elements can be obtained form Lemma 4 in Appendix B:

E ∂ log L(ψ0) ∂β ∂ log L(ψ0) ∂β = XΣ−1(η0)X, E ∂ log L(ψ0) ∂β ∂ log L(ψ0) ∂σ2i = 1 2X Σ−1 0)E(u0u−1(η0)Gi(ρ0i−1(η0)u0), E ∂ log L(ψ0) ∂β ∂ log L(ψ0) ∂ρi =−σ 2 0i 2 X Σ−1 0)E(u0u−1(η0)Hi(ρ0i−1(η0)u0),

(18)

E ∂ log L(ψ0) ∂σi2 ∂ log L(ψ0) ∂σi2 =1 4[E(u −1(η0)Gi(ρ0i−1(η0)u0)]2+1 4E[(u −1(η0)Gi(ρ0i−1(η0)u0)2], E ∂ log L(ψ0) ∂σi2 ∂ log L(ψ0) ∂σj2 =1 4E(u −1(η0)Gi(ρ0i−1(η0)u0)E(u−1(η0)Gj(ρ0j−1(η0)u0) +1 4E[u −1(η0)Gi(ρ0i−1(η0)u0u−1(η0)Gj(ρ0j−1(η0)u0], E ∂ log L(ψ0) ∂σi2 ∂ log L(ψ0) ∂ρj = σ 2 j 4 E(u −1(η0)Gi(ρ0i−1(η0)u0)E(u−1(η0)Hj(ρ0j−1(η0)u0), 14E[u0Σ−1(η0)Gi(ρ0i−1(η0)u0u−1(η0)Hj(ρ0j−1(η0)u0], E ∂ log L(ψ0) ∂ρ2i ∂ log L(ψ0) ∂ρj =−σ 2 j 4 E(u −1(η0)Hi(ρ0i−1(η0)u0)E(u−1(η0)Hj(ρ0j−1(η0)u0), +1 4E[u −1(η0)Hi(ρ0i−1(η0)u0u−1(η0)Hj(ρ0j−1(η0)u0], where η0= (σ002 , σ012 , . . . , σ0p2 , ρ01, . . . , ρ0p).

6

Appendix B. Some useful lemmas

We introduce some lemmas which are used in the proofs of the following main results. The lemmas are a

little modifications of lemmas in Lee(2004) for non-square matrices.

Lemma 1 Let A be an n× m non-square matrix whose column sums are uniformly bounded, C be a n × k matrix whose elements are uniformly bounded, and fi be i.i.d noise with mean 0 and variance σ2. Then,

1

mCAf = Op(1).

Proof. Let B = CA, bi,j be the (i, j)-th element of B and bibe the i-th coumn of B. Because the elements of C are uniformly bounded and the column sums of A are uniformly bounded, the element of B is uniformly

bounded by Lemmas in Lee (2004). Let b be a constant such as |bi,j| ≤ b. Because Bf = mi=1bifi,

V ar(Bf ) = E(mi=1mi=1bififjbj) = σ2mi=1bibi i=1m b1k1k = O(m), where 1k is a k× 1 vector whose elements are 1. Thus, 1mCAf = Op(1) by Chebyshev’s inequality.

Lemma 2 Let A be an m1× m2 non-square matrix whose column sums are uniformly bounded, f1,i and

f2,i are i.i.d noise with mean 0 and variance σ1and σ2, respectively. Then,

• E(f1Af2) = 0.

(19)

• f1Af2= Op(√m1).

Proof. E(f1Af2) =mi=11 j=1m2 ai,jE(f1,if2,j) = 0. V (f1Af2) =

m1 i1=1 m1 i2=1 m2 j1=1 m2 j2=1ai1,j1ai2,j2E(f1,i1f1,j2f2,j1f2,j2) = σ21σ22 m1 i=1 m2 j=1a2i,j ≤ σ21σ22mi=11( n j=1|ai,j|)2

σ21σ22mi=11 c2= O(m1). Thus, f1Af2= Op(√m1) by Chebyshev’s inequality.

Lemma 3 Let Aibe an mi×mimatrix for i = 1, . . . , p, B be an n×n matrix, C be an n×k matrix, and ε and

fi, i = 1, . . . , p be an n×1 and mi×1 random noise with means 0 and variances σ02and σi2. Moreover, we define

Ui be an n× mimatrix which consists only of zeros and ones and there exist one 1 in each row and at least one 1 in each column, i = 1, . . . , p We denote u = ε +pi=1UiAifi and m = min{m1, . . . , mp}. We assume

Uiis uniformly bounded in column sums, the elements of C is uniformly bounded, B is uniformly bounded in both row and column sums, and mi is a function of n and tends to infinity and limn→∞mi

n = ci≤ 1. Then, 1 nCBu = op(1). 1 nuBu = Op(1). 1 n(uBu− E(uBu)) = op(1).

Proof. By the Lemma,

1 nC Bu = 1 nC Bε +p i=1 1 nC BU iAifi, = 1 n 1 nC Bε +p i=1 mi n 1 m i 1 m iC BU iAifi, = o(1)Op(1) + p  i=1 O(1)o(1)Op(1), = op(1).

We denote f0= ε0, A0= In and U0= In. Because u0= U0A0f0+ U1A1f1+· · · + UpApfp,

1 mu Bu =p i=0 p  j=0 1 mf  iAiUiBUjAjfj.

(20)

column sums, by Lee(2004), 1 nf  iAiUiBUiAifi= mnim1 if  iAiUiBUiAifi, = O(1)Op(1), = Op(1). Moreover, mi n m1i(f  iAiUiBUiAifi− E(fiAiUiBUiAifi) = O(1)op(1) = op(1).

Secondly, we will consider the case of i= j. By the Lemma, 1 nf  iAiUiBUjAjfj= mni√1m i 1 m if  iAiUiBUjAjfj, = O(1)o(1)Op(1), = op(1). It is clear that mi n m1i(f  iAimniUimnBUjAjfj− E(fiAimniUimnBUjAjfj) = op(1)

Therefore, 1nuBu = Op(1) and n1(uBu− E(uBu)) = op(1).

Lemma 4 Let A be an m1× m2non-square matrix, Ti= J Ui(Ii− ρiWi)−1, and fii = 0, . . . , p are mi× 1

i.i.d. random noise with mean 0 and variances σi2, respectively. Moreover, the elements of each fi has more than fourth moment, i.e. E|f1,i|4+δ<∞ for some δ > 0. Let us define u =pi=1Tifi. Then,

1. E(uu) = Σ(η).

2. E(uAu) =pi=0σi2tr(TiATi).

3. E(uuAu) =pi=0μi,3Tidiag(TiATi). 4. E(uAuuBu) =pi=1

i,4−3σ4i)nj=1(TiATi)j,j(TiBTi)j,j4i(tr(TiATi)tr(TiBTi)+tr(TiATi(Ti(B+

B)Ti))

(21)

Proof. E(uu) = E p i=0 Tifi p i=0 fiTi , = p  i=0 TiE(fifi)Ti, = p  i=0 σi2TiTi, = Σ(η). E(uAu) = E p i1=0 p  i2=0 fi1Ti1ATi2fi2 , = p  i=0 E(fiTiATifi), = p  i=0 σi2tr(TiATi), E(uuAu) = E p i1=0 p  i2=0 p  i3=0 Ti1fi1fi2Ti2ATi3fi3 , = p  i=0 E(TififiTiATifi), = p  i=0 E Tifi mi  j1=1 mi  j2=1 (TiATi)j1,j2fi,j1fi,j2 , = p  i=0 μi,3Tidiag(TiATi).

E(uAuuBu) = E

p i1 p  i2 p  i3 p  i4 fi1Ti1ATi2fi2fi3Ti3BTi4fi4 , = p  i E(fiTiATififiTiBTifi) + p  i1 p  i2 =i1 E(fi1Ti1ATi1fi1fi2Ti2BTi2fi2) + p  i1 p  i2 =i1 E(fi1Ti1ATi2fi2fi1Ti1BTi2fi2) + p  i1 p  i2 =i1 E(fi1Ti1ATi2fi2fi2Ti2BTi1fi1), = p  i=1 i,4− 3σi4) n  j=1 (TiATi)j,j(TiBTi)j,j+ σi4(tr(TiATi)tr(TiBTi) + tr(TiATi(Ti(B + B)Ti)) + p  i1 p  i2 σi21σ2i2tr(Ti1ATi1)tr(Ti2BTi2) + 2 p  i1 p  i2 σ2i1σ2i2tr(Ti1ATi1Ti2BTi2)

(22)

Appendix C. Proofs of the theorems

Proof of theorem 1

To prove the consistency of QMLE ˆθ, it is sufficient to show that the following two facts hold (See white

(1994)). The first one is the identification uniqueness condition: lim supn→∞

maxθ∈Bc0,ε)∩ΘE log L(θ)−

E log L(θ0)

< 0 for any ε > 0, where Bc(θ0, ε) is the compliment of an ε-neighborhood of θ0. The second

one is the uniform convergence in probability: supθ∈Θ n1log L(θ)− 1nE log L(θ) = op(1).

The identification uniqueness

Firstly, we will show that the identification uniqueness condition hold. From the definition of the concentrated

expected log-likelihood function, we have

1 n(E log L(θ)− E log L(θ0)) = 1 2log(˜σ 2 0(θ))− 1 2nlog|Ω(θ)| + 1 2log(σ 2 00(θ)) + 1 2nlog|Ω(θ0)|, = 1 2nlog|˜σ 2

0(θ)In| −2n1 log|Ω(θ)| +2n1 log002 (θ)In| +2n1 log|Ω(θ0)|,

= 1 2nlog 2 00Ω(θ0)| − 1 2nlog|ˆσ 2 0(θ)Ω(θ)|.

By Assumption 7, for any θ= θ0,

lim n→∞ 1 n log002 Ω(θ0)| − log |˜σ02(θ)Ω(θ)| = 0. Thus, lim n→∞ 1 n(E log L(θ)− E log L(θ0))= 0.

Let pn(β, σ02, θ) = exp(log L(β, σ20, θ)) be the quasi-joint p.d.f of u0= (Y − Xβ0) and p0n(β, σ20, θ) be the

true joint p.d.f. We denote Eq as the expectation with respect to pn(β, σ02, θ) and E as the expectation with

respect to p0n(β, σ02, θ).

By the Jensen’s inequality,

0 = log Eq pn(β, σ20, θ) pn0, σ200, θ0) ≥ Eqlog pn(β, σ02, θ) pn0, σ002 , θ0)

(23)

Thus, Eqlog pn(β, σ02, θ) pn0, σ200, θ0) = E log pn(β, σ20, θ) pn0, σ200, θ0) . This implies E log L(θ) = max β,σ2 0

E[log L(β, σ02, θ)]≤ E[log L(β0, σ200, θ0)] = E log L(θ0).

Collecting the above results, we have

lim n→∞ max θ∈Bc0,ε)∩ΘE log L(θ)− E log L(θ0) < 0,

for any ε > 0, where Bc(θ0, ε) is the compliment of an ε-neighborhood of θ0. The identification uniqueness

condition holds.

Uniform convergence

Secondly, we will show that the uniform convergence condition hold. From the definition, we have

1 nlog L(θ)− 1 nE log L(θ) =− 1 2log ˆσ 2 0+ 1 2log ˜σ 2 0.

By the mean value theorem,

| log ˆσ2 0− log ˜σ02| = 1 ¯ σ20|ˆσ 2 0− ˜σ02|,

where ¯σ02 lies between ˆσ02 and ˜σ20. It is sufficient to show the following two facts. The first one is ˜σ02 is uniformly bounded away from zero and the second one is uniform convergence of|ˆσ02− ˜σ20| in probability.

Firstly, we will show that ˜σ20 is uniformly bounded away from zero. By Assumption 4,

inf θ∈Θ˜σ 2 0(θ) = infθ∈Θ σ002 n tr(Ω(θ) −1Ω(θ 0)) , ≥ σ2 00θ∈Θinfmin(Ω−1(θ)))1 ntr(Ω(θ0)), ≥ c0cωc1, > 0,

(24)

where c0and c1are some constants. Therefore, ˜σ20 must be uniformly bounded away from zero.

Secondly, we will show that supθ∈Θ|ˆσ02− ˜σ02| = op(1). Because M (θ)Ω−12(θ)X = 0,

ˆ σ02− ˜σ02= 1 nY Ω1 2(θ)M (θ)Ω12(θ)Y 1 nE[u  1 2(θ)M (θ)Ω12(θ)u 0] 1 nE[u  1 2(θ)P (θ)Ω12(θ)u 0], = 1 n u0Ω12(θ)M (θ)Ω12(θ)u0− Eu 1 2(θ)M (θ)Ω12(θ)u0 n1E u0Ω12(θ)P (θ)Ω12(θ)u0 .

We will consider the uniform convergence of the above two terms.

Let us consider the uniform convergence of the second term. We note that 0 < cωcx≤ infθ∈Θγmin(Ω−1(θ))γmin

XX n ≤ γmin XΩ−1(θ)X n ≤ γmax XΩ−1(θ)X n

≤ supθ∈Θγmax(Ω−1(θ))γmax

XX n ≤ cωcx<∞. By assumption 4 and 5, sup θ∈Θ 1nE u0Ω12(θ)P (θ)Ω12(θ)u0 = sup θ∈Θ n1σ 2 00tr Ω−1(θ)X(XΩ−1(θ)X)−1XΩ−1(θ)Ω(θ0) , n1sup θ∈Θ σ2 00γmax XΩ−1(θ)X n −1 γmax(Ω−2(θ)) γmax(Ω(θ0))n1tr(XX), n1σ200sup θ∈Θ γmax XΩ−1(θ)X n −1 sup θ∈Θ γmax(Ω−2(θ)) γmax(Ω(θ0))1 ntr(XX ), = 1 nO(1)O(1)O(1)O(1)O(1), = o(1).

This implies that the second term converges uniformly.

To show that the uniform convergence of the first term, we will show that the pointwise convergence and

stochastic equicontinuity of the term (See Andrew (1992)).

Firstly, we will consider the pointwise convergence of the first term. From Assumption 4 and 5, Ω−1(θ) and

X(XΩ−1(θ)X)−1Xare uniformly bounded in both row and column sums. Therefore, Ω12(θ)M (θ)Ω12(θ) =

Ω−1(θ)−Ω−1(θ)X(XΩ−1(θ)X)−1XΩ−1(θ) is uniformly bounded in both row and column sums. By Lemma 3, it follows hat 1n u0Ω12(θ)M (θ)Ω−12(θ)u0− Eu 1 2(θ)M (θ)Ω−12(θ)u0

= op(1). This implies the first term converges pointwise.

(25)

theorem, 1 nu  1 21)M (θ1121)u0 1 nu  1 22)M (θ2122)u0 = 1 n p  i=1 ∂u0Ω12(¯θ)M (¯θ)Ω−12(¯θ)u0 ∂ρi (ρi,1− ρi,2) + 1 n p  i=1 ∂u0Ω12(¯θ)M (¯θ)Ω−12(¯θ)u0 ∂τi2 2 i,1− τi,22 ),

where ¯θ lies between θ1 and θ2. Thus, it is suffice to show that supθ∈Θ 1n∂u 

− 12(θ)M(θ)Ω− 12(θ)u0

∂ρi

= Op(1)

and supθ∈Θ n1∂u0Ω− 12(θ)M(θ)Ω− 12(θ)u0

∂τi

= Op(1) (See, Davidoson (1994)).

Here, we note that the partial derivatives of Ω−1(θ) are given by,

∂Ω−1(θ) ∂ρi =−τ 2 iΩ−1(θ)J UiA−1i (ρi)Bi(ρi)A−1i (ρi)UiJΩ−1(θ), ∂Ω−1(θ) ∂τi =−Ω −1(θ)J U iA−1i (ρi)UiJΩ−1(θ), where Bii) = Wi(Ii− ρiWi) + (Ii− ρiWi)Wi.

Let us consider the uniform boundedness of n1∂u− 12(θ)M(θ)Ω− 12(θ)u0

∂ρi . The matrix Ω 1

2(θ)M (θ)Ω−12(θ) consists of the two termes Ω−1(θ) and Ω−1(θ)X(XΩ−1(θ)X)−1XΩ−1(θ). The uniform boundedness of

1 n∂u −1(θ)u0 ∂ρi is given by, sup θ∈Θ n1 ∂u0Ω−1(θ)u0 ∂ρi = supθ∈Θ 1 2 iu−1(θ)J UiA−1i (ρi)Bi(ρi)A−1i (ρi)UiJΩ−1(θ)u0 , = sup θ∈Θ τ2 max(J UiA−1i (ρi)Bi−1(ρi)A−1i (ρi)UiJ)γmax2 (Ω−1(θ)) n1u  0u0, = O(1)O(1)O(1)Op(1), = Op(1).

Next, we will show that the uniform boundness of n1∂u−1(θ)X(XΩ−1(θ)X)−1XΩ−1(θ)u0

∂ρi . The partial

derivative of the matrix is

1 n ∂u0Ω−1(θ)X(XΩ−1(θ)X)−1XΩ−1(θ)u0 ∂ρi = 1 nu  0 ∂Ω−1(θ) ∂ρi X(X Ω−1(θ)X)−1XΩ−1(θ)u 0 + 1 nu −1(θ)X ∂(XΩ−1(θ)X)−1 ∂ρi X Ω−1(θ)u 0 + 1 nu −1(θ)X(XΩ−1(θ)X)−1X ∂Ω−1(θ) ∂ρi u0, = φ1+ φ2+ φ3,

(26)

where φ1= 1nu0∂Ω −1(θ) ∂ρi X(X Ω−1(θ)X)−1XΩ−1(θ)u 0, φ2= n1u−1(θ)X(XΩ−1(θ)X)−1X ∂Ω −1(θ) ∂ρi u0 and φ3= n1u−1(θ)X(XΩ−1(θ)X)−1X ∂Ω −1(θ) ∂ρi u0.

By Lemma 3, the uniform boundness of φ2 is given by

sup θ∈Θ n1u −1(θ)X ∂(XΩ−1(θ)X)−1 ∂ρi X Ω−1(θ)u 0 , = sup θ∈Θ n1τ 2 iu−1(θ)X(XΩ−1(θ)X)−1XΩ−1(θ)J UiA−1i (ρi)Bi(ρi)A−1i (ρi)UiJΩ−1(θ)X(XΩ−1(θ)X)−1XΩ−1(θ)u0 , ≤ sup θ∈Θ τ2

max(J UiA−1i (ρi)Bi(ρi)Ai−1(ρi)UiJ)γmax(Ω−1(θ))γmax((X

Ω−1(θ)X)−1 n max( XX n max(Ω −2(θ)) 1 nu  0u0, = O(1)O(1)O(1)O(1)O(1)O(1)Op(1), = Op(1).

Let us consider the uniform boundness of φ1. The term is

1 nu  0 ∂Ω−1(θ) ∂ρi X(X Ω−1(θ)X)−1XΩ−1(θ)u 0, = τi21 nu −1(θ)J UiA−1i (ρi)Bi(ρi)A−1i (ρi)UiJΩ−1(θ)X(XΩ−1(θ)X)−1XΩ−1(θ)u0, = tr(a(θ)b(θ)),

where a(θ) = τi21nu0Ω−1(θ)J UiA−1i i)Bii)Ai−1(ρi)UiJand b(θ) =√1nΩ−1(θ)X(XΩ−1(θ)X)−1XΩ−1(θ)u0

It suffices to show that

supθ∈Θ tr(a(θ)b(θ)) 2

= Op(1). Because of (supθ∈Θf (θ))2 = supθ∈Θf (θ)2, by Cauchy-Schwarz inequality, sup θ∈Θ tr(a(θ)b(θ)) 2= sup θ∈Θtr 2(a(θ)b(θ)), ≤ sup θ∈Θtr(a (θ)a(θ)) sup θ∈Θtr(b (θ)b(θ)).

(27)

The uniform boundenss of tr(a(θ)a(θ)) is given by, sup θ∈Θtr(a (θ)a(θ)) = sup θ∈Θ τi21 ntr(u −1(θ)J UiA−1i (ρi)Bi(ρi)A−1i (ρi)UiJJ UiA−1i (ρi)Bi(ρi)A−1i (ρi)UiJΩ−1(θ)u0) , ≤ sup θ∈Θ τi2γ2max(J UiA−1i i)Bii)A−1i i)UiJ)γmax2 (Ω−1(θ)) 1 nu  0u0, = O(1)O(1)O(1)Op(1), = Op(1).

Similarly, the uniform boundness of tr(b(θ)b(θ)) is given by,

sup θ∈Θtr(b (θ)b(θ)) = sup θ∈Θ 1 ntr(u −1(θ)X(XΩ−1(θ)X)−1XΩ−1(θ)Ω−1(θ)X(XΩ−1(θ)X)−1XΩ−1(θ)u0) , ≤ sup θ∈Θ γmax(Ω−1(θ))γmax((X Ω−1(θ)X)−1 n max( XX n 2 max(Ω−1(θ)) 1 nu  0u0, = O(1)O(1)O(1)O(1)Op(1), = Op(1).

The uniform boundedness of 1n∂u− 12(θ)M(θ)Ω− 12(θ)u0

∂τi can be proved by the similar manner. By collecting

above results, we can show that the uniform convergence of 1n

u0Ω12(θ)M (θ)Ω12(θ)u0−Eu0Ω12(θ)M (θ)Ω12(θ)u0

.

Therefore, supθ∈Θ n1log L(θ)−n1E log L(θ) = op(1), and the QMLE ˆθ is a consistent estimator of θ0by White

(1994).

Proof of Theorem 2

To derive the asymptotic normality of the proposed estimator, we will show that the following three results:

1. n12log L(ψ0) ∂ψ∂ψ p −→ E 1 n∂ 2log L(ψ0) ∂ψ∂ψ . 2. n12∂ψ∂ψlog L( ¯ψ) p −→ 1 n∂ 2log L(ψ0) ∂ψ∂ψ . 3. 1n∂ log L(ψ0) ∂ψ −→ N(0, Γ).D

Figure 1: a plot of estimated regional effects for individual happiness for each cities with spatial error models for multilevel data which contains city level and prefecture level random effects by applying it them to happiness dataset in Japan.
Table 1: Estimation values and their standard errors for β 1 , β 2 , log likelihood (logL) and Akaike Information Criterion (AIC) in both spatial error models for multilevel data (SEMM2) and mixed analysis of variance models(MANOVA2) which contains city le

参照

関連したドキュメント

Based on the asymptotic expressions of the fundamental solutions of 1.1 and the asymptotic formulas for eigenvalues of the boundary-value problem 1.1, 1.2 up to order Os −5 ,

We presented simple and data-guided lexisearch algorithms that use path representation method for representing a tour for the benchmark asymmetric traveling salesman problem to

If in the infinite dimensional case we have a family of holomorphic mappings which satisfies in some sense an approximate semigroup property (see Definition 1), and converges to

The numerical tests that we have done showed significant gain in computing time of this method in comparison with the usual Galerkin method and kept a comparable precision to this

It is also aimed to bring out the effect of body acceleration, stenosis shape parameter, yield stress, and pressure gradient on the physiologically important flow quantities such as

Then the strongly mixed variational-hemivariational inequality SMVHVI is strongly (resp., weakly) well posed in the generalized sense if and only if the corresponding inclusion

The group acts on this space by right translation of functions; the implied representation is smooth... We want to compute the cocy-

In Figure 6(a) we present the final drawing of Trans graph using Module Drawing, in Figure 6(b) we show its modular decomposition tree and in Figure 6(c) we present