Spatial Extension of Mixed Analysis of
Variance Models
著者
Sato Takaki, Matsuda Yasumasa
journal or
publication title
DSSR Discussion Papers
number
120
page range
1-31
year
2021-02
URL
http://hdl.handle.net/10097/00130485
Data Science and Service Research
Discussion Paper
Discussion Paper No. 120
Spatial Extension of Mixed Analysis of
Variance Models
Takaki Sato and Yasumasa Matsuda
February, 2021
Center for Data Science and Service Research
Graduate School of Economic and Management
Tohoku University
27-1 Kawauchi, Aobaku
Sendai 980-8576, JAPAN
Spatial Extension of Mixed Analysis of Variance Models
Takaki Sato
∗1and Yasumasa Matsuda
21
Advanced institute for Yotta informatics, Tohoku University, Sendai, Japan
2
Graduate School of Economics and Management, Tohoku University, Sendai, Japan
Abstract
This paper proposes a spatial extension of mixed analysis of variance models for spatial multilevel data in which individual belongs to one of spatial regions, which are called spatial error models for multilevel data (SEMM). We have introduced empirical bayes estimation methods in two steps because SEMM models which are defined by two level equations, individual and regional levels, can be regarded as a Bayesian hierarchal model. The first step estimator based on quasi-maximum likelihood estimation methods specifies the hyper parameters and has been justified in asymptotic situations, and posterior distributions for the parameters are evaluated with the hyperparameters estimated in the first step. The proposed models are applied to happiness survey data in Japan to demonstrate empirical properties.
Keywords: Spatial econometrics, Multilevel data, MANOVA model, Empirical Bayes method,
Quasi-maximum likelihood estimation.
1
Introduction
This paper aims to extend mixed analysis of variance (MANOVA) models for multilevel data (see Demidenko
(2013)) to those for spatial multilevel data, which we call spatial error models for multilevel data (SEMM).
Multilevel data is a kind of cluster data that is observations belong to some kinds of nested clusters (e.g.
students are members of one of schools in school effectiveness research) and widely used in both of social
and natural science (see De Leeuw et al. (2008) and Hox et al. (2017)). Spatial multilevel data in which
clusters are organized with spatial regions are also used in many fields to capture spatial correlation between ∗Corresponding author. E-mail address: [email protected]
regions. With spatial multilevel data, Fazio & Piacentino (2010) investigates spatial variability of Small and
Medium Enterprises productivity across the Italian territory, and Pierewan & Tampubolon (2014) examines
how spatial clusters explain variations in individual well-begin across regions in Europe.
MANOVA models which are linear regression models incorporating several kinds of random effect terms
corresponding to the type of clusters have been used to analyze multilevel data. By including random effect
terms, we can evaluate common feature within same clusters as grouping structures. Hartley and Rao (1967)
and Miller (1977) discuss asymptotic properties of maximum likelihood estimator for MANOVA models.
To evaluate spatial correlations between random effects in multilevel data, we provide SEMM models as a
spatial extension of MANOVA models in this paper. The conventional way to estimate spatial correlation is to
include spatial lag terms into regression models (see Anselin (1988) and Arbia (2014)), and thus we combine
spatial lag terms with random effect terms in MANOVA models to propose new spatial econometrics models.
Because SEMM models are defined by two level equations, individual level and regional level equations,
the models can be regarded as hierarchical bayesian models whose parameters and hyperparameters can be
estimated by empirical bayesian estimation methods in two steps. In the first step, hyperparameters are
estimated with quasi-maximum likelihood (QML) estimation methods, which makes it possible to apply a
method developed in spatial econometrics studies (see Lee (2004), Liu and Yang (2015), Su and Yang (2015),
and Yang (2018)) Posterior distributions for parameters are derived with hyperparemters estimated in the
first step.
The interesting feature of SEMM models are summarized as follows. Firstly, SEMM models can analyze
regional specific effects for dependent variables, considering the effect of individual characteristics. Here, let
us note that regional effects are not the same as random effects. Regional effects of a region are defined by the
effect of observed regional characteristics and the sum of random effects for clusters, each of which corresponds
to the group in clusters which the region belongs to. Spatial econometrics models that have been considered
ever can’t take into account of the effect of individual characteristics in estimating regional effect in multilevel
data because we need to summarize the data on regions where more than one individuals are observed to
apply the models, and then individual characteristics are lost. Defining the SEMM model in two level
equations would allow for both individual characteristics and regional effects in analysis. Secondly, spatial
correlations between random effects can be estimated. Some sources of random effects such as cultures or
customs specific to a region may tend to be similar to them in nearby regions. Then, random effects may have
spatial correlation, namely, regional effects in nearby areas may take on similar values. Therefore, taking into
we can estimate regional effects in areas where there are no observed individuals by using the information of
the region where observed individuals exist. Because regional effects are estimated by Bayesian estimation
method, we can evaluate regional effects for all regions, regardless of whether the individuals belong to them
or not by properly estimating the prior information of regional effects.
Applications of SEMM models to happiness survey data in Japan demonstrate several interesting features
of the effect of individual characteristics and regional specific characteristics on happiness. Firstly, individual
characteristics are important factor for happiness studies. People’s happiness is U-shaped with respect to
age, namely, happiness decreases until middle age and then increases. Moreover, female is basically happier
than male. Happiness increases monotonically as household income and personal income increase and getting
married greatly increases people’s happiness. Secondly, random effects for each city have spatial correlation.
The similarity of culture or customs of nearby areas which greatly affects the way people feel about their
happiness may cause the spatial correlation. Finally, spatial cluster exists in the regional effects which can be
regarded as average happiness of each regions. The level of happiness in the southern and middle regions of
Japan is higher than that in the eastern region, and the estimated happiness of eastern coastal regions is the
lowest. The reason for this is thought to be that the effects of the nuclear accident caused by the East-Japan
earthquake which occurred in 2011 are still lingering.
This paper is organized as follow. In Section 2, we define SEMM models as a spatial extension of MANOVA
models. A two step estimation method to estimate the parameters in SEMM models and asymptotic
prop-erties of the first step estimator are discussed in Section 3. We apply the SEMM model to happiness survey
data in Japan to demonstrate empirical properties of the proposed models in Section 4. Section 5 concludes
the paper. All the proofs in Section 3 are discussed in Appendix.
2
Model specification
Let n and L be the number of individual and regions, respectively. We assume that each individuals belong
to one of regions and admit that there are regions where no individuals are observed. In this paper, we call
the nested dataset whose grouping is based on spatial units as spatial multilevel data. Moreover, we assume
that the regions can be grouped into larger regional units by p different groupings, and let ml, l = 1, . . . , p, be
the number of larger regions obtained by the l-th grouping. For examples, several cities are grouped together
to form a prefecture in Japan.
j-th region and Y is the n× 1vector of yi,js. Spatial error models for multilevel data (SEMM) is given by,
Y = X1β1+ J d + ε, (1)
d = X2β2+ u, (2)
u = U1(I1− ρ1W1)−1f1+· · · Up(Ip− ρpWp)−1fp, (3)
where X1 is an n× k1 matrix for individual level explanatory variables, J is an n× L matrix for regional
dummy variables, X2 is an L× k2 matrix for regional level explanatory variables, Ul is an L× ml matrix
for a random effect which consists only of zeros and ones and there is exactly one 1 in each row and at
least one 1 in each column, l = 1, . . . , p, Il is an ml× ml identity matrix, and Wl is an ml× ml spatial weight matrix which describes spatial relationships among the l-th grouped regions. A random variables
εi, i = 1, . . . , n, is independent and identically distributed (i.i.d.) with mean 0 and variance σ20and an n× 1 vector ε = (ε1, . . . , εn), and fl,j, j = 1, . . . , mlis also i.i.d. with mean 0 and variance σ2l and an ml×1 vector
fl = (fl1, . . . , flml) is a random effect for the l-the groped regions. The vector β1 and β2 are regression
coefficients for individual and regional level explanatory variables, respectively, and ρlis a spatial correlation parameter which describe the strength of spatial dependence between regions in the l-th grouping.
SEMM models is a spatial extension of mixed analysis of variance (MANOVA) models because SEMM
models reduce to MANOVA models when spatial correlation parameters, ρls are equal to 0. In the analysis of spatial multilevel data, consideration of spatial correlation between random effects can improve the accuracy
of the model fitting. Random effects fl,j express the effect of the j-th region in the l-th larger regional units on regional effect, d, and some sources of random effects may be cultures or customs in the larger region.
Then, because the cultures and customs of nearby regions tend to be similar, random effects fl,j may have spatial correlation, namely, regional effects in nearby areas may take on similar values. Therefore, taking
into account the spatial correlation between regional effects allows for more detailed analysis of the spatial
multilevel data.
SEMM models are defined by two level equations, individual level equations (1) and region level equations
(2) and (3), and this two level modeling has some advantages. One advantage is that we can analyze regional
effect, d, considering the effect of individual characteristics. Spatial econometrics models that have been
considered can’t take into account of individual characteristics when we estimate regional effect in multilevel
data because they assume that exactly one observation is observed each regions. Thus, we need to summarize
one region. A commonly used method to summarize data is to take the average of the data in the area, but
then individual characteristics are lost then. Defining the SEMM model in two level equations would allow
for both individual and regional effects, which would allow for more accurate estimation of regional effects.
Another advantage of modeling multilevel models with the hierarchical structure is that we can estimate
regional effects, d, in areas where there are no observed individuals by using the information of the region
where observed individuals exist. Let us remember that J is a regional dummy matrix which may have
columns whose elements are all zeros because we admit the existence of regions where no individuals are
observed. Thus, usual ordinary least squares does not work because J is rank deficient. The proposed
model can be regarded as a Bayesian hierarchical model and equation (2) and (3) describe prior information
of regional effects. As discussed in the estimation section, by properly estimating the prior information of
regional effects with marginal likelihood which is based on the information of regions where individuals are
observed, we can evaluate regional effects for all regions, regardless of whether the individuals belong to them
or not.
3
Estimation
Let us consider a method to estimate the parameters for SEMM models and discuss asymptotic properties
of proposed estimators the size of ml, l = 1, . . . , p, tends to be infinity along with the sample size n. Because
the proposed model can be regarded as a Bayesian hierarchical model, we propose empirical Bayes
estima-tion procedure in two steps. The first step is the estimaestima-tion of the hyperparameters in prior distribuestima-tions
with quasi-maximum likelihood (QML) estimation methods, and the second step is calculation of posterior
distributions with the hyperparameters estimated in the first step. Moreover, we introduce the asymptotic
properties of the first step estimators when the sample size of both individuals and regions tends to be large.
3.1
Empirical Bayes Estimation
We introduce empirical Bayes estimation procedure in two steps. Let β = (β1, β2), τl = σ
2
l σ2
0, l = 1, . . . , p,
θ = (β2, τ1, . . . , τp, ρ1, . . . , ρp), ψ = (β, σ20, . . . , σ2p, ρ1, . . . , ρp) and δ = (β1, d). In this paper, we call σ20 and
θ as hyperparameters and δ as parameters, respectively.
The first step is the estimation of σ20and θ by a quasi-maximum likelihood (QML) estimation method with a marginal likelihood of Y . Let us denote that f (Y|β1, d, σ20) is a probability density function for the data
variables εi and fl,j which may be not normally distributed random variables follows normal distribution, and then the marginal distribution of Y ,
m(Y|ψ) =
f (Y|β, σ2ε, d)g(d|β2, ρ1, . . . , ρp, σ21, . . . , σp2)dd,
follows a multivariate normal distribution. Thus, the marginal log-likelihood function is given by
log L(ψ) =−n 2log(2πσ 2 0)− 1 2log|Ω(θ)| − (Y − Xβ)Ω−1(θ)(Y − Xβ) 2σ20 , (4) where X = (X1, J X2), Ω(θ) = In+ τ1J U1(I1− ρ1W1)−1(I1− ρ1W)−1U1J+· · · + τpJ Up(Ip− ρpWp)−1(Ip− ρpWp)−1UpJ.
We will derive a concentrated marginal log-likelihood function to reduce the number of parameters for
numerical optimization. The first-order condition of the marginal log-likelihood function is
ˆ β(θ) = (XΩ−1(θ)X)−1XΩ−1(θ)Y, ˆ σ02(θ) = 1 n(Y − X ˆβ(θ)) Ω−1(θ)(Y − X ˆβ(θ)).
By substituting ˆβ(θ) and ˆσ20(θ) into (4), we obtain the concentrated marginal log-likelihood function,
log L(θ) =−n 2(log(2π) + 1)− n 2log(ˆσ 2 0(θ))− 1 2log|Ω(θ)|.
Maximizing the concentrated marginal log-likelihood function gives the ML estimator ˆθ of θ, and then the
ML estimators ˆβ and ˆσ02are obtained by ˆβ = ˆβ∗(ˆθ) and ˆσ02= ˆσ02(ˆθ), respectively.
The second step is the Bayesian estimation of the parameters δ based on the estimated hyperparameters ˆ
β2, ˆρ, ˆσ20 and ˆσl2= ˆτlσˆ02, l = 1, . . . , p. Let δ = (β, d) and ˜X = (X, J ). The estimated posterior distribution
for δ is given by
P (δ| Y, ˜X, ˆβ2, ˆρ1, . . . , ˆρp, ˆσ02, . . . , ˆp2f, b)∝ L(Y | ˜X, δ, ˆσ20)π(δ| ˆβ2, ˆρ, ˆσ12, . . . , ˆσp2, b),
where b is a hyperparameter for a prior information for β, L(Y| ˜X, θ, ˆσ20) is the likelihood of the data Y , and π(θ| ˆβ2, ˆρ, ˆσ21, . . . , ˆσp2, b) is the prior distribution of the model parameters, δ. If the prior and posterior
distribution for δ is conjugate distributions, then prior distribution can be calculated explicit form, and if
Carlo (MCMC) methods.
As one example of conjugate distributions, we will show the explicit form of the posterior distribution
when the likelihood and the prior distribution are multivariate normal distributions and the number of
random effect is one. Then, the estimated posterior distribution follows a multivariate normal distribution.
We set prior means and the inverse of prior variance matrices of the multivariate normal distribution for
the prior distribution as ˆs0 = (0k×1, ˆβ2) and ˆS0−1 =
⎛ ⎜ ⎝ 0k×k 0k×m 0m×k σˆ12 1(In− ˆρW )(In− ˆρW ) ⎞ ⎟ ⎠, where 0n1×n2
is the n1× n2 matrix whose elements are zeros. Then, the posterior covariance matrix and mean vector
is S1 = 1 ˆ σ2 0 ˜ XX + ˆ˜ S0−1 −1 and s1 = S1 1 ˆ σ2 0 ˜ XY + ˆS−10 sˆ0
, respectively. Thus, the estimated posterior
distribution is given by
P (δ| Y, ˜X, ˆβ2, ˆρ1, ˆσ02, ˆσ21, b)∼ N(s1, S1),
where N (s1, S1) means the multivariate normal distribution with mean s1and covariance matrix S1. In other
cases, the posterior distribution can be derived in the same way.
3.2
Asymptotic properties
Here, we discuss the conditions under which the QML estimators ˆθ = ( ˆβ2, ˆτ1, . . . , ˆτp, ˆρ1, . . . , ˆρp) and ˆψ =
( ˆβ, ˆσ20, . . . , ˆσp2, ˆρ1, . . . , ˆρp) in the first step is consistent and asymptotically normal when the size of ml, l =
1, . . . , p, tends to be infinity along with the sample size n. All of the proofs and Lemmas for the asymptotic
results are given in the Appendix.
Let θ0= (δ0, τ10, . . . , τp0, ρ10, . . . , ρp0) and ψ0= (β0, σ002, θ0)be the true values for θ and ψ. Assume the
following conditions.
Assumption 1 The true parameter θ0lies in the interior of a compact parameter space Θ.
Asuumption 2 εi, i = 1, . . . , n and fl,j, l = 1, . . . , p, j = 1, . . . , ml are i.i.d with mean 0 and variances σ02 and σ2j, respectively. And, E|εi|4+δ<∞ and E|fl,j|4+δ<∞ for some δ > 0.
Assumption 3 The number of regions in l-th grouping, ml, tends to infinity along with the sample size n.
Assumption 4 The matrices J , Ui, Wi, (Ij− ρj,0Wj)−1and Ω−1(θ) is uniformly bounded in both row and column sums. Moreover, 0 < cω ≤ infθ∈Θγmin(Ω−1(θ))≤ supθ∈Θγmax(Ω−1(θ))≤ cω<∞.
Assumption 5 X has full column rank and its elements are uniformly bounded constants, limn→∞n1XΩ−1(θ)X exists and is non-singular.
Assumption 6 Let A−1i (ρi) = (Ii− ρi,0Wj)−1(Ii− ρi,0Wi)−1 and Bi(ρi) = (Ii− ρiWi)−1Wi+ Wi(Ii−
ρiWi)−1. We assume that supθ∈Θ|γmax(J UiA−1i (ρi)UiJ)| < ∞ and supθ∈Θ|γmax(J UiA−1i (ρi)Bi(ρi)A−1i (ρi)UiJ)| <
∞, i = 1, . . . , p. Assumption 7 lim n→∞ 1 n log|σ002 Ω(θ0)| − log |˜σ2(θ)Ω(θ)| = 0, for any θ = θ0.
First, we introduce the consistency of ˆθ. The expected log-likelihood function for the proposed model is
given by E log L(ψ) =−n 2log(2πσ 2 0)− 1 2log|Ω(θ)| − E (Y − Xβ)Ω−1(θ)(Y − Xβ) 2σ02 .
The expected log-likelihood is maximized at
˜ β(θ) = β0, ˜ σ02(θ) = σ 2 00 n tr(Ω(θ) −1Ω(θ 0)), = 1 nE[u 0Ω− 1 2(θ)M (θ)Ω−12(θ)u0] +1 nE[u 0Ω− 1 2(θ)P (θ)Ω−12(θ)u0],
where P (θ) = I − M(θ) and M(θ) = I − Ω−12(θ)X(XΩ−1(θ)X)−1XΩ−12(θ). Thus, the concentrated expected log-likelihood function is given by,
E log L(θ) =−n 2(log(2π) + 1)− n 2log(˜σ 2 0(θ))− 1 2log|Ω(θ)|.
Consistency of ˆθ is obtained by the following two facts. The first one is the identification uniqueness
condition: lim supn→∞
maxθ∈Bc(θ0,ε)∩ΘE log L(θ)− E log L(θ0)
< 0 for any ε > 0, where Bc(θ0, ε) is
the compliment of an ε-neighborhood of θ0. The second one is the uniform convergence in probability:
supθ∈Θ n1log L(θ)− 1nE log L(θ) = op(1).
Next, let us consider the the asymptotic distribution of the QML estimator ˆψ. To derive the asymptotic
normality, we need to consider the the Taylor expansion of ∂ψ∂ log Ln( ˆψ) at ψ0. The first-order derivatives of
the log-likelihood function at ψ has the elements
∂ log L(ψ) ∂β = X Σ−1(η)(Y − Xβ), ∂ log L(ψ) ∂σ2i =− 1 2tr(Σ −1(η)G i(ρi)) +12(Y − Xβ)Σ−1(η)Gi(ρi)Σ−1(η)(Y − Xβ), ∂ log L(θ) ∂ρi = σ2i 2 tr(Σ −1(η)H i(ρi))−σ 2 i 2 (Y − Xβ) Σ−1(η)H i(ρi)Σ−1(η)(Y − Xβ), where η = (σ20, σ21, . . . , σp2, ρ1, . . . , ρp), A−1i (ρi) = (Ii− ρiWi)−1(Ii− ρiWi)−1, Bi(ρi) = Wi(Ii− ρiWi) +
(Ii− ρiWi)Wi, Gi(ρi) = J UiA−1i (ρi)UiJ, and Hi(ρi) = J UiA−1i (ρi)Bi(ρi)A−1i (ρi)UiJ. By the mean value theorem, √ n( ˆψ− ψ0) =− 1 n ∂2log L( ¯ψ) ∂ψ∂ψ −1 1 √ n ∂ log L(ψ0) ∂ψ ,
where ¯ψ lies between ˆψ and ψ0.
The score function which is the first-order derivatives of the log-likelihood function at ψ0, ∂ log L(ψ∂ψ 0), are
linear and quadratic functions of u0= (Y − Xβ0). By applying the central limit theorem for linear-quadratic
forms by Kelejian and Prucha (2001) to the score functions, we have the asymptotic normality for the QMLE ˆ
ψ under proper asymptotic behavior of the Hessian matrix and the variance of the score function whose
explicit forms are given in Appendix.
Theorem 2. Under Assumptions 1-7, if there exist Σ =− limn→∞E
1 n∂ 2log L(ψ 0) ∂ψ∂ψ and Γ = limn→∞n1E ∂ log L(ψ) ∂ψ ∂ log L(ψ)∂ψ
, and−Σ is positive definite, then √
n( ˆψ− ψ0)−→ N(0, ΣD −1ΓΣ−1).
4
Empirical Application
We apply the proposed model to happiness survey data in Japan to analyze the effect of individual
charac-teristics on happiness and spatial correlation of regional effect which is random effects in each region. In this
analysis, we use
Macromill Co, LTD, which is a market research company in Japan, to conduct a survey of 26, 984 people
living in 1534 cities. Here, respondents were selected so that the distribution of age, population, and area of
residence would be the same as that of the Japanese census. The demographic information of the respondents
contains gender, age, personal and family incomes, marital status, and presence of children.
Individual happiness for dependent variables, Y , was collected as a response to the question: Currently,
how happy do you feel? Score the degree of your happiness between 1 (very unhappy) and 10 (very happy).
Thus, the happiness is measured discrete values between 1 and 10.
We use dummy variables created from the demographic information as explanatory variables, X. Age
and gender are divided into 22 categories, namely, all the respondents were separated into the two groups of
female and male, each of which is categorized as the 11 mutually disjoint subgroups corresponding with (1)
age < 20, (2) < 25, (3) < 30,. . . ,(10) < 65 and (11) ≥ 65. As the result, we obtain the 22 disjoint groups in total and define the group of female with age younger than 20 as the base group. Personal income is
categorized as the 7 mutually disjoint groups of income, i.e. (1) < 2 million yen, (2) < 4 million, (3) < 6
million, (4) < 8 million, (5) < 10 million, (6) < 12 million and (7)≥ 12 million yen, with the group less than 2 million yen as the base, while family income is categorized as the 10 groups of income, i.e. (1) < 2 million
yen, (2) < 4 million, (3) < 6 million, (4) < 8 million, (5) < 10 million, (6) < 12 million, (7) < 15 million, (8)
< 20 million, (9)≥ 20 million yen and (10) the group of no-response, with the group less than 2 million yen
as the base. We have personal and family income as the categorical variables with 7 and 10 subgroups as
the result, respectively. Presence of child is summarized as the dummy variable of taking 1 if a respondent
have more than one child and 0 otherwise. Martial status is recorded as the category variable with the three
groups of (1) single, (2) married and (3) divorced or widowed, with the single group taken as the base.
Regional dummy variables, J , is the 26, 984× 1845 matrix. Here, we note that the rank of J is less than 1845, at 1534 which is the number of cities which at least one respondent belong to. Let us remember that
our proposed models can also estimate regional effects, d, in areas where there are no observed individuals
by using the information of the surrounding regions where observed individuals exist. Thus, the matrix, J ,
contains the columns whose elements are zeros which correspond to the areas where there are no respondents.
We use two spatial weight matrix(SWM), one at the prefecture level and the other are the city level. The
prefecture level SWM, W1, is a 46× 46 matrix based on the adjacencies of each prefecture in Japan, namely,
if the prefecture i and j share a border, then the i, j and j, i elements of W1is 1, and otherwise 0. The city
level SWM, W2, is a 1845× 1845 matrix created in two-steps. Firstly, If the distance between city i and j
Figure 1: a plot of estimated regional effects for individual happiness for each cities with spatial error models for multilevel data which contains city level and prefecture level random effects by applying it them to happiness dataset in Japan.
within 30 km of a city, then the element of W2 corresponding to the three closest cities is set to 1.
Table 1 reports the estimates of the parameters for SEMM models with standard errors. As a benchmark
for comparison, those of MANOVA models which is a special case of SEMM models where spatial parameters
for random effects are equal to zeros, i.e. ρi = 0, i = 1, 2. In comparison between fittings of SMMM models and ANOVA models, the former model accounts for happiness better than the later model in terms of
Akaike information criterion (AIC). This indicated that taking into account of spatial correlation improve
the accuracy of the model fitting. We find from table 1 that spatial correlation between random effects for city
level, ρ1, are positively significant at 5 % level, which indicates that random effect on a city takes similar value
with random effects on surrounding cities of the city. One reason which derives spatial correlation between
city level random coefficient is the similarity of culture or customs which greatly affects the way people feel
about their happiness. Figure 1 is a plot of estimated regional effects for each cities, d, which mean average
happiness of people living in that city. We can find that spatial cluster in the regional effects exists and
the level of happiness in the southern and middle regions of Japan is higher than that in the eastern region.
Especially, eastern coastal regions show the lowest level of happiness compared to other regions. The reason
for this is thought to be that the effects of the nuclear accident caused by the East-Japan earthquake which
occurred in 2011 are still lingering. The estimated coefficients of age takes a U-Shaped, namely, coefficients
decreases until middle age and then increases. Moreover, the coefficients of gender indicates that female is
increase and getting married greatly increases people’s happiness.
5
Conclusion
We have proposed spatial error models for multilevel dataset which is a spatial extension of analysis of variance
models in this paper. Because the proposed model can be regarded as a Bayesian hierarchal model, we have
introduced empirical bayesian estimation methods in two steps as estimation strategy for the parameters
in the proposed models. The first step estimator specifies the hyper parameters and has been justified in
asymptotic situations, and the second step estimator for parameters are derived by the Bayes formula with
the hyperparameters estimated in the first step. Fitting the proposed model to happiness survey data in
Japan, we can evaluate the effect of individual and regional level explanatory variables on happiness and
spatial correlation of regional effects which are random effects in each regions. Empirical results suggest that
happiness is U-shaped with age, female’s happiness is higher than male’s happiness at all ages, and regional
effects on happiness are spatially correlated. The existence of spatial correlations between random effects
indicates that unobserved features which affect on individual happiness such as culture and custom tend to
be similar in nearby regions.
For future study, several extensions are possible. In this analysis, we regard individual happiness as
con-tinuous variables. However, the treatment creates a gap between the data and the model because individual
happiness takes only discrete values between 1 and 10. Thus, the extension of the proposed model to discrete
choice models fills the gap and allows for rigorous analysis of happiness. One more possibility is a panel
ex-tension of the proposed model. Our proposed model can capture only spatial correlation. However, it is said
that happiness on individual has a time series correlation. A panel extension would reveal more interesting
Table 1: Estimation values and their standard errors forβ1, β2, log likelihood (logL) and Akaike Information Criterion (AIC) in both spatial error models for multilevel data (SEMM2) and mixed analysis of variance models(MANOVA2) which contains city level and prefecture level random effects, and estimates and standard errors of spatial parameters ρ1 andρ2 in SEMM2 which are obtained by applying them to happiness dataset in Japan.
SEEM2 MANOVA2
coef s.e coef s.e
20< Female < 25 -0.007 0.076 -0.005 0.076 Female < 30 -0.234 0.071 -0.232 0.071 Female < 35 -0.522 0.069 -0.521 0.069 Female < 40 -0.694 0.068 -0.691 0.068 Female < 45 -0.718 0.064 -0.717 0.064 Female < 50 -0.857 0.064 -0.856 0.064 Female < 55 -0.811 0.067 -0.813 0.066 Female < 60 -0.789 0.068 -0.790 0.068 Female < 65 -0.504 0.067 -0.506 0.067 Female > 65 -0.172 0.063 -0.172 0.062 Male < 20 0.185 0.078 0.184 0.078 Male < 25 -0.186 0.072 -0.183 0.072 Male < 30 -0.836 0.070 -0.835 0.070 Male < 35 -1.117 0.068 -1.118 0.068 Male < 40 -1.409 0.069 -1.411 0.068 Male < 45 -1.557 0.065 -1.558 0.065 Male < 50 -1.628 0.065 -1.629 0.065 Male < 55 -1.692 0.068 -1.694 0.068 Male < 60 -1.799 0.069 -1.802 0.069 Male < 65 -1.308 0.068 -1.311 0.068 Male < 65 -0.915 0.065 -0.917 0.064
200 < Personal Income (PI) < 400 0.076 0.032 0.075 0.032
PI < 600 0.263 0.041 0.264 0.041
PI < 800 0.335 0.057 0.338 0.057
PI < 1000 0.401 0.081 0.403 0.081
PI < 1200 0.572 0.120 0.577 0.120
PI > 1200 0.458 0.146 0.459 0.146
200 < Family Income (FI) < 400 0.055 0.048 0.055 0.048
FI < 600 0.337 0.048 0.339 0.048 FI < 800 0.494 0.052 0.495 0.052 FI < 1000 0.619 0.058 0.621 0.058 FI < 1200 0.769 0.069 0.770 0.069 FI < 1500 0.881 0.087 0.882 0.087 FI < 2000 1.181 0.117 1.183 0.117 FI > 2000 1.111 0.162 1.109 0.163 FI Unknown 0.137 0.045 0.138 0.045 Married 0.961 0.041 0.961 0.041 Divorced 0.390 0.057 0.391 0.057 Children 0.109 0.035 0.110 0.035 rho1(city) 0.440 0.013 rho2(Pref) 0.252 1.217
Appendix A. Hessian and Covariance matrix
Here, we show the detailed expression of the Hessian matrix and covariance matrix which is discussed in
The-orem 2. Firstly, we show the Hessian matrix. For simplicity, we denote A−1i (ρi) = (Ii−ρiWi)−1(Ii−ρiWi)−1,
Bi(ρi) = Wi(Ii−ρiWi) + (Ii−ρiWi)Wi, Gi(ρi) = J UiA−1i (ρi)UiJ, Hi(ρi) = J UiA−1i (ρi)Bi(ρi)A−1i (ρi)UiJ,
H1,i(ρi) = J UiA−1i (ρi)Bi(ρi)A−1i (ρi)Bi(ρi)A−1i (ρi)UiJ, and H2,i(ρi) = J UiA−1i (ρi)WiWiA−1i (ρi)UiJ, i =
1, . . . , p. Moreover, we define A0(ρ0)−1 = IL and Ui = IL. Then, the variance matrix of the proposed
model is given by, Σ(η) =pi=0σi2Gi(ρi), where η = (σ20, σ12, . . . , σp2, ρ1, . . . , ρp). Moreover, the derivatives of
Gi(ρi), Hi(ρi), Σ(η) are given by, ∂Gi(ρi) ∂ρ2 i =−Hi(ρi), ∂Hi(ρi) ∂σ2 i = −2(H1,i(ρi) + H2,i(ρi)), ∂Σ(η) ∂σ2 i = Gi(ρi) and ∂Σ(η) ∂ρi =−σ 2 iHi(ρi), respectively.
By using above notations, the gradients of the log-likelihood function, ∂ log L(ψ)∂ψ , is given by
∂ log L(ψ) ∂β = X Σ−1(η)(Y − Xβ), ∂ log L(ψ) ∂σ2i =− 1 2tr(Σ −1(η)G i(ρi)) +12(Y − Xβ)Σ−1(η)Gi(ρi)Σ−1(η)(Y − Xβ), ∂ log L(θ) ∂ρi = σ2i 2 tr(Σ −1(η)H i(ρi))−σ 2 i 2 (Y − Xβ) Σ−1(η)H i(ρi)Σ−1(η)(Y − Xβ).
Moreover, the hessian matrix of the log-likelihood function, ∂2∂ψ∂ψlog L(ψ) has the elements:
∂2log L(ψ) ∂β∂β =−XΣ −1(η)X, ∂2log L(ψ) ∂β∂σi2 =−X Σ−1(η)G i(ρi)Σ−1(η)(Y − Xβ), ∂2log L(ψ) ∂β∂ρi = σ 2 iXΣ−1(η)Hi(ρi)Σ−1(η)(Y − Xβ), ∂2log L(ψ) ∂σ2i∂σ2i = 1 2tr(Σ −1(η)G i(ρi)Σ−1(η)Gi(ρi)) − (Y − Xβ)Σ−1(η)G i(ρi)Σ−1(η)Gi(ρi)Σ−1(η)(Y − Xβ),
∂2log L(ψ) ∂σi2∂σj2 = 1 2tr(Σ −1(η)G j(ρj)Σ−1(η)Gi(ρi)) − (Y − Xβ)Σ−1(η)G j(ρj)Σ−1(η)Gi(ρi)Σ−1(η)(Y − Xβ), ∂2log L(ψ) ∂σ2i∂ρi = σi2 2 tr(Σ −1(η)H i(ρi)Σ−1(η)Gi(ρi))−12tr(Σ−1(η)Hi(ρi)) + σi2(Y − Xβ)Σ−1(η)Hi(ρi)Σ−1(η)Gi(ρi)Σ−1(η)(Y − Xβ) −12(Y − Xβ)Σ−1(η)Hi(ρi)Σ−1(η)(Y − Xβ) ∂2log L(ψ) ∂σi2∂ρj =− σ2j 2 tr(Σ −1(η)H j(ρj)Σ−1(η)Gi(ρi)) + σj2(Y − Xβ)Σ−1(η)Hj(ρj)Σ−1(η)Gi(ρi)Σ−1(η)(Y − Xβ), ∂2log L(ψ) ∂ρi∂ρi = σi4 2 tr(Σ −1(η)H i(ρi)Σ−1(η)Hi(ρi))− σi2tr(Σ−1(η)(H1,i(ρi) + H2,i(ρi)) −σ4i 2 (Y − Xβ) Σ−1(η)H i(ρi)Σ−1(η)Hi(ρi)Σ−1(η)(Y − Xβ)
+ σi2(Y − Xβ)Σ−1(η)(H1,i(ρi) + H2,i(ρi))Σ−1(η)(Y − Xβ),
∂2log L(ψ) ∂ρi∂ρj = σi2σ2j 2 tr(Σ −1(η)H j(ρj)Σ−1(η)Hi(ρi)) −σ 2 iσj2 2 (Y − Xβ) Σ−1(η)H j(ρj)Σ−1(η)Hi(ρi)Σ−1(η)(Y − Xβ).
Next, let us consider the variance matrix of the log likelihood function, E(∂ log L(ψ0)
∂ψ ∂ log L(ψ∂ψ 0)). The
explicit form of each elements can be obtained form Lemma 4 in Appendix B:
E ∂ log L(ψ0) ∂β ∂ log L(ψ0) ∂β = XΣ−1(η0)X, E ∂ log L(ψ0) ∂β ∂ log L(ψ0) ∂σ2i = 1 2X Σ−1(η 0)E(u0u0Σ−1(η0)Gi(ρ0i)Σ−1(η0)u0), E ∂ log L(ψ0) ∂β ∂ log L(ψ0) ∂ρi =−σ 2 0i 2 X Σ−1(η 0)E(u0u0Σ−1(η0)Hi(ρ0i)Σ−1(η0)u0),
E ∂ log L(ψ0) ∂σi2 ∂ log L(ψ0) ∂σi2 =−1 4[E(u 0Σ−1(η0)Gi(ρ0i)Σ−1(η0)u0)]2+1 4E[(u 0Σ−1(η0)Gi(ρ0i)Σ−1(η0)u0)2], E ∂ log L(ψ0) ∂σi2 ∂ log L(ψ0) ∂σj2 =−1 4E(u 0Σ−1(η0)Gi(ρ0i)Σ−1(η0)u0)E(u0Σ−1(η0)Gj(ρ0j)Σ−1(η0)u0) +1 4E[u 0Σ−1(η0)Gi(ρ0i)Σ−1(η0)u0u0Σ−1(η0)Gj(ρ0j)Σ−1(η0)u0], E ∂ log L(ψ0) ∂σi2 ∂ log L(ψ0) ∂ρj = σ 2 j 4 E(u 0Σ−1(η0)Gi(ρ0i)Σ−1(η0)u0)E(u0Σ−1(η0)Hj(ρ0j)Σ−1(η0)u0), −14E[u0Σ−1(η0)Gi(ρ0i)Σ−1(η0)u0u0Σ−1(η0)Hj(ρ0j)Σ−1(η0)u0], E ∂ log L(ψ0) ∂ρ2i ∂ log L(ψ0) ∂ρj =−σ 2 j 4 E(u 0Σ−1(η0)Hi(ρ0i)Σ−1(η0)u0)E(u0Σ−1(η0)Hj(ρ0j)Σ−1(η0)u0), +1 4E[u 0Σ−1(η0)Hi(ρ0i)Σ−1(η0)u0u0Σ−1(η0)Hj(ρ0j)Σ−1(η0)u0], where η0= (σ002 , σ012 , . . . , σ0p2 , ρ01, . . . , ρ0p).
6
Appendix B. Some useful lemmas
We introduce some lemmas which are used in the proofs of the following main results. The lemmas are a
little modifications of lemmas in Lee(2004) for non-square matrices.
Lemma 1 Let A be an n× m non-square matrix whose column sums are uniformly bounded, C be a n × k matrix whose elements are uniformly bounded, and fi be i.i.d noise with mean 0 and variance σ2. Then,
1
√mCAf = Op(1).
Proof. Let B = CA, bi,j be the (i, j)-th element of B and bibe the i-th coumn of B. Because the elements of C are uniformly bounded and the column sums of A are uniformly bounded, the element of B is uniformly
bounded by Lemmas in Lee (2004). Let b be a constant such as |bi,j| ≤ b. Because Bf = mi=1bifi,
V ar(Bf ) = E(mi=1mi=1bififjbj) = σ2mi=1bibi ≤i=1m b1k1k = O(m), where 1k is a k× 1 vector whose elements are 1. Thus, √1mCAf = Op(1) by Chebyshev’s inequality.
Lemma 2 Let A be an m1× m2 non-square matrix whose column sums are uniformly bounded, f1,i and
f2,i are i.i.d noise with mean 0 and variance σ1and σ2, respectively. Then,
• E(f1Af2) = 0.
• f1Af2= Op(√m1).
Proof. E(f1Af2) =mi=11 j=1m2 ai,jE(f1,if2,j) = 0. V (f1Af2) =
m1 i1=1 m1 i2=1 m2 j1=1 m2 j2=1ai1,j1ai2,j2E(f1,i1f1,j2f2,j1f2,j2) = σ21σ22 m1 i=1 m2 j=1a2i,j ≤ σ21σ22mi=11( n j=1|ai,j|)2≤
σ21σ22mi=11 c2= O(m1). Thus, f1Af2= Op(√m1) by Chebyshev’s inequality.
Lemma 3 Let Aibe an mi×mimatrix for i = 1, . . . , p, B be an n×n matrix, C be an n×k matrix, and ε and
fi, i = 1, . . . , p be an n×1 and mi×1 random noise with means 0 and variances σ02and σi2. Moreover, we define
Ui be an n× mimatrix which consists only of zeros and ones and there exist one 1 in each row and at least one 1 in each column, i = 1, . . . , p We denote u = ε +pi=1UiAifi and m = min{m1, . . . , mp}. We assume
Uiis uniformly bounded in column sums, the elements of C is uniformly bounded, B is uniformly bounded in both row and column sums, and mi is a function of n and tends to infinity and limn→∞mi
n = ci≤ 1. Then, • 1 nCBu = op(1). • 1 nuBu = Op(1). • 1 n(uBu− E(uBu)) = op(1).
Proof. By the Lemma,
1 nC Bu = 1 nC Bε +p i=1 1 nC BU iAifi, = √1 n 1 √ nC Bε +p i=1 mi n 1 √m i 1 √m iC BU iAifi, = o(1)Op(1) + p i=1 O(1)o(1)Op(1), = op(1).
We denote f0= ε0, A0= In and U0= In. Because u0= U0A0f0+ U1A1f1+· · · + UpApfp,
1 mu Bu =p i=0 p j=0 1 mf iAiUiBUjAjfj.
column sums, by Lee(2004), 1 nf iAiUiBUiAifi= mnim1 if iAiUiBUiAifi, = O(1)Op(1), = Op(1). Moreover, mi n m1i(f iAiUiBUiAifi− E(fiAiUiBUiAifi) = O(1)op(1) = op(1).
Secondly, we will consider the case of i= j. By the Lemma, 1 nf iAiUiBUjAjfj= mni√1m i 1 √m if iAiUiBUjAjfj, = O(1)o(1)Op(1), = op(1). It is clear that mi n m1i(f iAimniUimnBUjAjfj− E(fiAimniUimnBUjAjfj) = op(1)
Therefore, 1nuBu = Op(1) and n1(uBu− E(uBu)) = op(1).
Lemma 4 Let A be an m1× m2non-square matrix, Ti= J Ui(Ii− ρiWi)−1, and fii = 0, . . . , p are mi× 1
i.i.d. random noise with mean 0 and variances σi2, respectively. Moreover, the elements of each fi has more than fourth moment, i.e. E|f1,i|4+δ<∞ for some δ > 0. Let us define u =pi=1Tifi. Then,
1. E(uu) = Σ(η).
2. E(uAu) =pi=0σi2tr(TiATi).
3. E(uuAu) =pi=0μi,3Tidiag(TiATi). 4. E(uAuuBu) =pi=1
(μi,4−3σ4i)nj=1(TiATi)j,j(TiBTi)j,j+σ4i(tr(TiATi)tr(TiBTi)+tr(TiATi(Ti(B+
B)Ti))
Proof. E(uu) = E p i=0 Tifi p i=0 fiTi , = p i=0 TiE(fifi)Ti, = p i=0 σi2TiTi, = Σ(η). E(uAu) = E p i1=0 p i2=0 fi1Ti1ATi2fi2 , = p i=0 E(fiTiATifi), = p i=0 σi2tr(TiATi), E(uuAu) = E p i1=0 p i2=0 p i3=0 Ti1fi1fi2Ti2ATi3fi3 , = p i=0 E(TififiTiATifi), = p i=0 E Tifi mi j1=1 mi j2=1 (TiATi)j1,j2fi,j1fi,j2 , = p i=0 μi,3Tidiag(TiATi).
E(uAuuBu) = E
p i1 p i2 p i3 p i4 fi1Ti1ATi2fi2fi3Ti3BTi4fi4 , = p i E(fiTiATififiTiBTifi) + p i1 p i2 =i1 E(fi1Ti1ATi1fi1fi2Ti2BTi2fi2) + p i1 p i2 =i1 E(fi1Ti1ATi2fi2fi1Ti1BTi2fi2) + p i1 p i2 =i1 E(fi1Ti1ATi2fi2fi2Ti2BTi1fi1), = p i=1 (μi,4− 3σi4) n j=1 (TiATi)j,j(TiBTi)j,j+ σi4(tr(TiATi)tr(TiBTi) + tr(TiATi(Ti(B + B)Ti)) + p i1 p i2 σi21σ2i2tr(Ti1ATi1)tr(Ti2BTi2) + 2 p i1 p i2 σ2i1σ2i2tr(Ti1ATi1Ti2BTi2)
Appendix C. Proofs of the theorems
Proof of theorem 1
To prove the consistency of QMLE ˆθ, it is sufficient to show that the following two facts hold (See white
(1994)). The first one is the identification uniqueness condition: lim supn→∞
maxθ∈Bc(θ0,ε)∩ΘE log L(θ)−
E log L(θ0)
< 0 for any ε > 0, where Bc(θ0, ε) is the compliment of an ε-neighborhood of θ0. The second
one is the uniform convergence in probability: supθ∈Θ n1log L(θ)− 1nE log L(θ) = op(1).
The identification uniqueness
Firstly, we will show that the identification uniqueness condition hold. From the definition of the concentrated
expected log-likelihood function, we have
1 n(E log L(θ)− E log L(θ0)) =− 1 2log(˜σ 2 0(θ))− 1 2nlog|Ω(θ)| + 1 2log(σ 2 00(θ)) + 1 2nlog|Ω(θ0)|, =− 1 2nlog|˜σ 2
0(θ)In| −2n1 log|Ω(θ)| +2n1 log|σ002 (θ)In| +2n1 log|Ω(θ0)|,
= 1 2nlog|σ 2 00Ω(θ0)| − 1 2nlog|ˆσ 2 0(θ)Ω(θ)|.
By Assumption 7, for any θ= θ0,
lim n→∞ 1 n log|σ002 Ω(θ0)| − log |˜σ02(θ)Ω(θ)| = 0. Thus, lim n→∞ 1 n(E log L(θ)− E log L(θ0))= 0.
Let pn(β, σ02, θ) = exp(log L(β, σ20, θ)) be the quasi-joint p.d.f of u0= (Y − Xβ0) and p0n(β, σ20, θ) be the
true joint p.d.f. We denote Eq as the expectation with respect to pn(β, σ02, θ) and E as the expectation with
respect to p0n(β, σ02, θ).
By the Jensen’s inequality,
0 = log Eq pn(β, σ20, θ) pn(β0, σ200, θ0) ≥ Eqlog pn(β, σ02, θ) pn(β0, σ002 , θ0)
Thus, Eqlog pn(β, σ02, θ) pn(β0, σ200, θ0) = E log pn(β, σ20, θ) pn(β0, σ200, θ0) . This implies E log L(θ) = max β,σ2 0
E[log L(β, σ02, θ)]≤ E[log L(β0, σ200, θ0)] = E log L(θ0).
Collecting the above results, we have
lim n→∞ max θ∈Bc(θ0,ε)∩ΘE log L(θ)− E log L(θ0) < 0,
for any ε > 0, where Bc(θ0, ε) is the compliment of an ε-neighborhood of θ0. The identification uniqueness
condition holds.
Uniform convergence
Secondly, we will show that the uniform convergence condition hold. From the definition, we have
1 nlog L(θ)− 1 nE log L(θ) =− 1 2log ˆσ 2 0+ 1 2log ˜σ 2 0.
By the mean value theorem,
| log ˆσ2 0− log ˜σ02| = 1 ¯ σ20|ˆσ 2 0− ˜σ02|,
where ¯σ02 lies between ˆσ02 and ˜σ20. It is sufficient to show the following two facts. The first one is ˜σ02 is uniformly bounded away from zero and the second one is uniform convergence of|ˆσ02− ˜σ20| in probability.
Firstly, we will show that ˜σ20 is uniformly bounded away from zero. By Assumption 4,
inf θ∈Θ˜σ 2 0(θ) = infθ∈Θ σ002 n tr(Ω(θ) −1Ω(θ 0)) , ≥ σ2 00θ∈Θinf(γmin(Ω−1(θ)))1 ntr(Ω(θ0)), ≥ c0cωc1, > 0,
where c0and c1are some constants. Therefore, ˜σ20 must be uniformly bounded away from zero.
Secondly, we will show that supθ∈Θ|ˆσ02− ˜σ02| = op(1). Because M (θ)Ω−12(θ)X = 0,
ˆ σ02− ˜σ02= 1 nY Ω−1 2(θ)M (θ)Ω−12(θ)Y − 1 nE[u 0Ω− 1 2(θ)M (θ)Ω−12(θ)u 0]− 1 nE[u 0Ω− 1 2(θ)P (θ)Ω−12(θ)u 0], = 1 n u0Ω−12(θ)M (θ)Ω−12(θ)u0− Eu 0Ω− 1 2(θ)M (θ)Ω−12(θ)u0 −n1E u0Ω−12(θ)P (θ)Ω−12(θ)u0 .
We will consider the uniform convergence of the above two terms.
Let us consider the uniform convergence of the second term. We note that 0 < cωcx≤ infθ∈Θγmin(Ω−1(θ))γmin
XX n ≤ γmin XΩ−1(θ)X n ≤ γmax XΩ−1(θ)X n
≤ supθ∈Θγmax(Ω−1(θ))γmax
XX n ≤ cωcx<∞. By assumption 4 and 5, sup θ∈Θ 1nE u0Ω−12(θ)P (θ)Ω−12(θ)u0 = sup θ∈Θ n1σ 2 00tr Ω−1(θ)X(XΩ−1(θ)X)−1XΩ−1(θ)Ω(θ0) , ≤ n1sup θ∈Θ σ2 00γmax XΩ−1(θ)X n −1 γmax(Ω−2(θ)) γmax(Ω(θ0))n1tr(XX), ≤ n1σ200sup θ∈Θ γmax XΩ−1(θ)X n −1 sup θ∈Θ γmax(Ω−2(θ)) γmax(Ω(θ0))1 ntr(XX ), = 1 nO(1)O(1)O(1)O(1)O(1), = o(1).
This implies that the second term converges uniformly.
To show that the uniform convergence of the first term, we will show that the pointwise convergence and
stochastic equicontinuity of the term (See Andrew (1992)).
Firstly, we will consider the pointwise convergence of the first term. From Assumption 4 and 5, Ω−1(θ) and
X(XΩ−1(θ)X)−1Xare uniformly bounded in both row and column sums. Therefore, Ω−12(θ)M (θ)Ω−12(θ) =
Ω−1(θ)−Ω−1(θ)X(XΩ−1(θ)X)−1XΩ−1(θ) is uniformly bounded in both row and column sums. By Lemma 3, it follows hat 1n u0Ω−12(θ)M (θ)Ω−12(θ)u0− Eu0Ω− 1 2(θ)M (θ)Ω−12(θ)u0
= op(1). This implies the first term converges pointwise.
theorem, 1 nu 0Ω− 1 2(θ1)M (θ1)Ω−12(θ1)u0− 1 nu 0Ω− 1 2(θ2)M (θ2)Ω−12(θ2)u0 = 1 n p i=1 ∂u0Ω−12(¯θ)M (¯θ)Ω−12(¯θ)u0 ∂ρi (ρi,1− ρi,2) + 1 n p i=1 ∂u0Ω−12(¯θ)M (¯θ)Ω−12(¯θ)u0 ∂τi2 (τ 2 i,1− τi,22 ),
where ¯θ lies between θ1 and θ2. Thus, it is suffice to show that supθ∈Θ 1n∂u
0Ω− 12(θ)M(θ)Ω− 12(θ)u0
∂ρi
= Op(1)
and supθ∈Θ n1∂u0Ω− 12(θ)M(θ)Ω− 12(θ)u0
∂τi
= Op(1) (See, Davidoson (1994)).
Here, we note that the partial derivatives of Ω−1(θ) are given by,
∂Ω−1(θ) ∂ρi =−τ 2 iΩ−1(θ)J UiA−1i (ρi)Bi(ρi)A−1i (ρi)UiJΩ−1(θ), ∂Ω−1(θ) ∂τi =−Ω −1(θ)J U iA−1i (ρi)UiJΩ−1(θ), where Bi(ρi) = Wi(Ii− ρiWi) + (Ii− ρiWi)Wi.
Let us consider the uniform boundedness of n1∂u0Ω− 12(θ)M(θ)Ω− 12(θ)u0
∂ρi . The matrix Ω −1
2(θ)M (θ)Ω−12(θ) consists of the two termes Ω−1(θ) and Ω−1(θ)X(XΩ−1(θ)X)−1XΩ−1(θ). The uniform boundedness of
1 n∂u 0Ω−1(θ)u0 ∂ρi is given by, sup θ∈Θ n1 ∂u0Ω−1(θ)u0 ∂ρi = supθ∈Θ 1nτ 2 iu0Ω−1(θ)J UiA−1i (ρi)Bi(ρi)A−1i (ρi)UiJΩ−1(θ)u0 , = sup θ∈Θ τ2 iγmax(J UiA−1i (ρi)Bi−1(ρi)A−1i (ρi)UiJ)γmax2 (Ω−1(θ)) n1u 0u0, = O(1)O(1)O(1)Op(1), = Op(1).
Next, we will show that the uniform boundness of n1∂u0Ω−1(θ)X(XΩ−1(θ)X)−1XΩ−1(θ)u0
∂ρi . The partial
derivative of the matrix is
1 n ∂u0Ω−1(θ)X(XΩ−1(θ)X)−1XΩ−1(θ)u0 ∂ρi = 1 nu 0 ∂Ω−1(θ) ∂ρi X(X Ω−1(θ)X)−1XΩ−1(θ)u 0 + 1 nu 0Ω−1(θ)X ∂(XΩ−1(θ)X)−1 ∂ρi X Ω−1(θ)u 0 + 1 nu 0Ω−1(θ)X(XΩ−1(θ)X)−1X ∂Ω−1(θ) ∂ρi u0, = φ1+ φ2+ φ3,
where φ1= 1nu0∂Ω −1(θ) ∂ρi X(X Ω−1(θ)X)−1XΩ−1(θ)u 0, φ2= n1u0Ω−1(θ)X(XΩ−1(θ)X)−1X ∂Ω −1(θ) ∂ρi u0 and φ3= n1u0Ω−1(θ)X(XΩ−1(θ)X)−1X ∂Ω −1(θ) ∂ρi u0.
By Lemma 3, the uniform boundness of φ2 is given by
sup θ∈Θ n1u 0Ω−1(θ)X ∂(XΩ−1(θ)X)−1 ∂ρi X Ω−1(θ)u 0 , = sup θ∈Θ n1τ 2 iu0Ω−1(θ)X(XΩ−1(θ)X)−1XΩ−1(θ)J UiA−1i (ρi)Bi(ρi)A−1i (ρi)UiJΩ−1(θ)X(XΩ−1(θ)X)−1XΩ−1(θ)u0 , ≤ sup θ∈Θ τ2
iγmax(J UiA−1i (ρi)Bi(ρi)Ai−1(ρi)UiJ)γmax(Ω−1(θ))γmax((X
Ω−1(θ)X)−1 n )γmax( XX n )γmax(Ω −2(θ)) 1 nu 0u0, = O(1)O(1)O(1)O(1)O(1)O(1)Op(1), = Op(1).
Let us consider the uniform boundness of φ1. The term is
1 nu 0 ∂Ω−1(θ) ∂ρi X(X Ω−1(θ)X)−1XΩ−1(θ)u 0, = τi21 nu 0Ω−1(θ)J UiA−1i (ρi)Bi(ρi)A−1i (ρi)UiJΩ−1(θ)X(XΩ−1(θ)X)−1XΩ−1(θ)u0, = tr(a(θ)b(θ)),
where a(θ) = τi2√1nu0Ω−1(θ)J UiA−1i (ρi)Bi(ρi)Ai−1(ρi)UiJand b(θ) =√1nΩ−1(θ)X(XΩ−1(θ)X)−1XΩ−1(θ)u0
It suffices to show that
supθ∈Θ tr(a(θ)b(θ)) 2
= Op(1). Because of (supθ∈Θf (θ))2 = supθ∈Θf (θ)2, by Cauchy-Schwarz inequality, sup θ∈Θ tr(a(θ)b(θ)) 2= sup θ∈Θtr 2(a(θ)b(θ)), ≤ sup θ∈Θtr(a (θ)a(θ)) sup θ∈Θtr(b (θ)b(θ)).
The uniform boundenss of tr(a(θ)a(θ)) is given by, sup θ∈Θtr(a (θ)a(θ)) = sup θ∈Θ τi21 ntr(u 0Ω−1(θ)J UiA−1i (ρi)Bi(ρi)A−1i (ρi)UiJJ UiA−1i (ρi)Bi(ρi)A−1i (ρi)UiJΩ−1(θ)u0) , ≤ sup θ∈Θ τi2γ2max(J UiA−1i (ρi)Bi(ρi)A−1i (ρi)UiJ)γmax2 (Ω−1(θ)) 1 nu 0u0, = O(1)O(1)O(1)Op(1), = Op(1).
Similarly, the uniform boundness of tr(b(θ)b(θ)) is given by,
sup θ∈Θtr(b (θ)b(θ)) = sup θ∈Θ 1 ntr(u 0Ω−1(θ)X(XΩ−1(θ)X)−1XΩ−1(θ)Ω−1(θ)X(XΩ−1(θ)X)−1XΩ−1(θ)u0) , ≤ sup θ∈Θ γmax(Ω−1(θ))γmax((X Ω−1(θ)X)−1 n )γmax( XX n )γ 2 max(Ω−1(θ)) 1 nu 0u0, = O(1)O(1)O(1)O(1)Op(1), = Op(1).
The uniform boundedness of 1n∂u0Ω− 12(θ)M(θ)Ω− 12(θ)u0
∂τi can be proved by the similar manner. By collecting
above results, we can show that the uniform convergence of 1n
u0Ω−12(θ)M (θ)Ω−12(θ)u0−Eu0Ω−12(θ)M (θ)Ω−12(θ)u0
.
Therefore, supθ∈Θ n1log L(θ)−n1E log L(θ) = op(1), and the QMLE ˆθ is a consistent estimator of θ0by White
(1994).
Proof of Theorem 2
To derive the asymptotic normality of the proposed estimator, we will show that the following three results:
1. n1∂2log L(ψ0) ∂ψ∂ψ p −→ E 1 n∂ 2log L(ψ0) ∂ψ∂ψ . 2. n1∂2∂ψ∂ψlog L( ¯ψ) p −→ 1 n∂ 2log L(ψ0) ∂ψ∂ψ . 3. √1n∂ log L(ψ0) ∂ψ −→ N(0, Γ).D