Tohoku University Institutional Repository TOUR


Three developments on spatial econometric models in data science perspective

Author

WU JUNYUE

Degree-granting institution

Tohoku University

Degree conferral number

11301


Tohoku University

Doctoral Thesis

Three developments on spatial econometric models in data science perspective

Author:

Junyue Wu

Supervisor:

Yasumasa Matsuda

A thesis submitted in fulfillment of the requirements

for the degree of Doctor of Philosophy

in the

Graduate School of Economics and Management


Abstract

Recent years have seen rapid development and increasing adoption of spatial econometric models. This thesis offers three extensions that tackle different subjects in this field: a threshold extension of the spatial dynamic panel data (SDPD) model; an illustration of the bias and inefficiency that arise when spatial models are estimated with sample data, together with a corresponding correction method; and a least absolute shrinkage and selection operator (LASSO) estimator for the spatial weight matrix.

Chapter 1 explains the motivation behind the studies included in this thesis and summarizes the contents of the following chapters. We introduce the spatial econometric model and then establish the three topics: the threshold extension of the SDPD model, the bias and inefficiency of estimates based on sample data and their correction, and the LASSO estimator for the spatial weight matrix.

Chapter 2 proposes a threshold extension of the spatial dynamic panel data (SDPD) model with fixed effects. We introduce a threshold variable to account for the regional dependencies of parameters in SDPD models. Moreover, we apply an extension of the unified M-estimation to estimate the parameters of the threshold SDPD models, and establish its consistency and asymptotic normality theoretically as the number of cross-sectional units tends to infinity. The M-estimation is compared with the conditional quasi-maximum likelihood estimation in Monte Carlo experiments, which show that the M-estimation yields estimates with less bias in short panels, with standard errors that are robust under non-normality. We illustrate an empirical application of the threshold SDPD model to U.S. state-level GDP and power usage growth data from 1998 to 2018, detecting non-trivial regional dependencies of the SDPD model parameters.

Chapter 3 explores bias problems that arise when spatial regression models are estimated with sample data points, and offers a solution under certain conditions. Often the individual observations used to estimate spatial regression models constitute only a sample of the theoretically observable data points. In many cases, such a sample does not even obey a specific design and is collected only by convenience criteria, as happens, e.g., when data are web-scraped or crowdsourced. In this situation, we expect to observe biases and inefficiencies in the estimation of the spatial regression parameters. In this chapter, we present the results of various Monte Carlo experiments aimed at assessing the extent of this problem in the estimation of a spatial econometric model by isolating the effects due to the sample size, those due to the pattern of the point distribution, and those related to the sampling criterion used in the data collection process. We also suggest an approach based on the Gibbs sampler that can be used to replace the unsampled data points. Our simulations and a real data case study confirm that the proposed strategy succeeds in reducing the distorting effects produced by sample observation, thus providing more reliable parameter estimates.

Chapter 4 proposes a maximum-likelihood-based least absolute shrinkage and selection operator (ML-LASSO) estimator for the estimation of the spatial weight matrix in spatial econometric models. A cyclic coordinate descent based algorithm is used for the optimization, and its effectiveness is examined by Monte Carlo experiments. We find that the estimator performs favorably for arbitrary weight matrices when independent variables are present in the model, and for symmetric weight matrices when there are none. An empirical use case is also illustrated with US state-level precipitation data from 1895 to 1997.


Acknowledgment

First, I would like to express my deepest appreciation to my supervisor, Professor Yasumasa Matsuda, who provided me with generous support during my graduate course. He guided me through my study with inspiring ideas and invaluable suggestions on my research. I would also like to extend my sincere thanks to Professor Giuseppe Arbia, who gave me the chance to study abroad and conduct joint research.

Second, I am also grateful to all the people I have met at Tohoku University. I would never have been able to have such a memorable experience during my time here without them.

Last but not least, I would like to thank my family, whose unwavering support greatly helped me pursue a PhD degree.


List of Tables

2.1 Simulation result of M-estimator and CQMLE

2.2 Simulation result of M-estimator, Gaussian error

2.3 Simulation result of M-estimator, Gaussian mix error

2.4 Simulation result of M-estimator, Chi-square error

2.5 Simulation result of M-estimator, Randomized parameters

2.6 Simulation result of M-estimator, Randomized weight matrices

2.7 Simulation result of M-estimator, N equals T

2.8 Descriptive statistics of U.S. state-level GDP and power consumption growth, 1998-2018

2.9 US state-level GDP and power consumption growth, 1998-2018, non-threshold models

2.10 US state-level GDP and power consumption growth, 1998-2018, threshold models

2.11 US state-level GDP and power consumption growth, 1998-2018, two-sided Wald test, p-values

2.12 Simulation result of M-estimator, SL submodel

2.13 Simulation result of M-estimator, SE submodel

2.14 Simulation result of M-estimator, SLTL submodel

3.1 Gibbs sampler, Simulation result

3.2 Gibbs sampler, Empirical result

4.1 ML-LASSO simulation results with independent variables

4.2 ML-LASSO simulation results without independent variables

4.3 ML-LASSO simulation results with different ρ

4.4 post-LASSO results

4.5 Descriptive statistics of US precipitation data

4.6 Comparison of estimation results of three models


List of Figures

2.1 US state-level income per capita in 1998, by median

3.1 Bias and inefficiency, Simulation 1

3.6 Random, Clustered, and Inhibitory point patterns

3.7 Bias and standard deviation of β at different sample proportion

3.8 Bias and standard deviation of ρ at different sample proportion

3.9 Illustration of four different sampling designs

3.10 Bias and standard deviation of β with different sampling designs

3.11 Bias and standard deviation of ρ with different sampling designs

3.12 Data points, land price data in western Tokyo, 2019

4.1 Non-zero elements identified by ML-LASSO

4.2 Data points, Continental US weather station 1895 - 1997

4.3 Estimated weight matrix


Chapter 1

Introduction

This thesis focuses on developing new methods to solve existing problems and expand the research possibilities in the field of spatial econometrics. Spatial econometrics is a relatively new sub-field of econometrics that deals with regression models with spatial interactions. Compared to previous models, the inclusion of various spatial correlations, such as spatial lag and spatial error, allows more accurate modeling and thus better quality in empirical studies.

A typical spatial econometric model can be expressed as follows:

$$
\begin{aligned}
Y &= c\,l_n + X\beta + \rho W Y + u, \\
u &= \alpha W u + \varepsilon,
\end{aligned}
\tag{1.1}
$$

where $Y$ is an $n \times 1$ vector of observations, $l_n$ an $n \times 1$ vector of ones, $X$ an $n \times k$ matrix of regressors, $W$ an $n \times n$ spatial weight matrix, $u$ an $n \times 1$ vector of disturbances, and $\varepsilon$ an $n \times 1$ vector of independent and identically distributed (i.i.d.) errors. This model is commonly referred to as the spatial autoregressive model with additional autoregressive error structure (SARAR, Kelejian and Prucha, 1998) or the spatial autoregressive confused (SAC, LeSage and Pace, 2009) model. The spatial weight matrix $W$ is what distinguishes the model from an ordinary regression model: it represents the spatial correlation among the observations $Y$. The spatial lag term $\rho W Y$ and the spatial error term $\alpha W u$ are common features of spatial models, representing the spatial structure among the observations and among the errors, respectively, where the model parameters $\rho$ and $\alpha$ indicate the intensity of the spatial effect.
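To make the data-generating process of model (1.1) concrete, the following minimal sketch simulates one draw through the reduced form $u = (I - \alpha W)^{-1}\varepsilon$, $Y = (I - \rho W)^{-1}(c\,l_n + X\beta + u)$. The ring-shaped neighbourhood, the standard normal errors, the parameter values, and the function name `simulate_sarar` are illustrative assumptions and do not come from the thesis.

```python
import numpy as np

def simulate_sarar(X, W, c, beta, rho, alpha, rng):
    """Draw (Y, u, eps) from model (1.1) via its reduced form."""
    n = W.shape[0]
    I = np.eye(n)
    eps = rng.standard_normal(n)              # i.i.d. N(0, 1) errors (assumption)
    u = np.linalg.solve(I - alpha * W, eps)   # u = (I - alpha W)^{-1} eps
    Y = np.linalg.solve(I - rho * W, c * np.ones(n) + X @ beta + u)
    return Y, u, eps

rng = np.random.default_rng(0)
n, k = 50, 2

# Hypothetical neighbourhood: units on a ring, each linked to its two neighbours.
idx = np.arange(n)
A = np.zeros((n, n))
A[idx, (idx + 1) % n] = 1.0
A[idx, (idx - 1) % n] = 1.0
W = A / A.sum(axis=1, keepdims=True)          # row-normalized, zero diagonal

X = rng.standard_normal((n, k))
beta = np.array([1.0, -0.5])
c, rho, alpha = 2.0, 0.4, 0.3                 # illustrative parameter values
Y, u, eps = simulate_sarar(X, W, c, beta, rho, alpha, rng)
```

With a row-normalized $W$ and $|\rho|, |\alpha| < 1$, both linear systems are well defined, which is why the sketch solves them directly instead of forming explicit inverses.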

Spatial models and their corresponding estimation methods have been developed for both cross-sectional and panel data sets in the past decades. Cliff and Ord (1973) proposed the cross-sectional spatial autoregressive (SAR) model, later also known as the spatial lag model, marking the beginning of the sub-field in econometrics. Models with different kinds of spatial interactions have been developed by Anselin (1988); Cressie (1993); Arbia (2006); LeSage and Pace (2009) since then. Meanwhile, Elhorst (2003); Baltagi et al. (2003); Kapoor et al. (2007); Lee and Yu (2010) extended spatial models to panel data with various settings. This thesis deals with both cross-sectional and panel data in the following chapters.

Chapter 2 introduces a threshold extension of the spatial dynamic panel data (SDPD) model with fixed effects. A threshold model is a statistical model whose parameters are allowed to differ across regimes, where the regimes are defined by one or more threshold values of a chosen variable. Tong and Lim (1980) pioneered threshold models in the time series literature with self-exciting threshold autoregressive (SETAR) models, where a lagged variable is used as a threshold on which the model switches. However, threshold models have seen only limited adoption in spatial econometrics. Aquaro et al. (2015) applied spatial econometric models with spatially dependent parameters, whereas Hansen (1999) extended the threshold techniques in time series to panel data, providing estimation and testing procedures for non-dynamic panels. Meanwhile, Majumdar et al. (2005) proposed a spatio-temporal model that allows certain parameters to shift at a given time point. We believe a more general threshold extension and its accompanying estimation method would be a valuable addition to the spatial econometric literature.

We consider an SDPD model with fixed effects as our base model, which can be regarded as a reduced version of the general spatial panel model summarized by Elhorst (2014). The threshold extension allows the model parameters to switch depending on predetermined groups, allowing researchers to examine regional differences. The estimation of our threshold model is based on the M-estimator proposed by Yang (2018). We have chosen this method because it incorporates a bias correction mechanism that achieves better accuracy and robustness than quasi-maximum likelihood (QML) methods in short panel settings. The M-estimator obtains the estimates by solving the adjusted score function of the model. Since (i) the adjusted score function for the threshold model can be expressed in a form similar to that of the original linear model, and (ii) the matrices composing the adjusted score in the quadratic or linear forms are still uniformly bounded in both row and column sums, we are able to extend the method to our model under the same set of assumptions and prove that the original theorems still hold. We also adapt the outer-product-of-martingale-difference (OPMD) method for the estimation of the asymptotic variance matrix accompanying the M-estimator.

Compared to past attempts, our model is based on a more generic SDPD model and, together with its submodels, offers more flexibility. Applying the threshold to the spatial terms is a first in the literature, allowing differences in the intensity of the spatial effect among regions. Moreover, adopting the M-estimator makes the estimation more robust, especially for short panels.

We conducted a series of simulations to validate the proposed estimator, including comparisons between the M-estimator and the QML method, Gaussian and non-Gaussian errors, fixed and randomized parameters, and shorter and longer panels. The results show smaller bias for the M-estimator, robustness under non-Gaussian errors, and overall good performance under different true parameter values, and they are comparable to those of the original study by Yang (2018). Moreover, we conducted an empirical illustration using U.S. state-level GDP and power usage growth data to demonstrate how threshold SDPD models account for spatial dependencies of parameters, leading to a deeper analysis than that using the usual SDPD models. We are able to identify differences in model parameters between groups of states with different levels of income.

Chapter 3 examines the bias and inefficiency in spatial econometric model estimation when using data sets sampled from a population, and offers a Gibbs sampling based correction method to mitigate the issue. In empirical statistical analysis, estimating a model with a sample from the population is common practice, and spatial econometrics is no exception. However, apart from the usual problem that the sample may not represent the population very well, there are additional challenges in spatial analysis. In the point pattern analysis literature, it is common to distinguish the situation in which all points are available (known as a mapped pattern) from the situation in which only a sample of them can be observed (referred to as a sample pattern); see Diggle (1983) and Illian et al. (2008). Nevertheless, the problem is usually overlooked in the field of spatial econometrics.

Some previous research has considered the missing data problem in spatial analysis. Bennett et al. (1984); Haining et al. (1984); Griffith et al. (1989) analyzed the effects of missing spatial data and compared the performance of different methods under such circumstances. More recently, Arbia et al. (2016) reappraised the problem, extending the study to the effects on the estimation of a spatial regression model. They show that missing data reduce the precision of the estimates of all the regression parameters, and that this loss of efficiency is accentuated by strong spatial correlation and by missing points that are clustered in space. The problem of missing data is well known in the statistical literature (Little, 1988; Little and Rubin, 2019; Rubin, 1976), where solutions have been suggested to replace the missing observations following different interpolation strategies (Dempster et al., 1977; Rubin, 2004), although with no explicit reference to the peculiarities of spatial data.

The issues of using sampled spatial data have their unique characteristics. Firstly, for the missing data problem we usually assume most observations are available, but in our case it is the opposite: we only have a small portion of the population available. Secondly, unlike usual missing data problems, we need to consider the spatial structure of the observation. To address these issues, we start by examining what kind of problem it may cause.

To examine the possible consequences of observing a sample of data when the observations are distributed in space following a certain point pattern, we conducted a series of Monte Carlo simulations. We assess different combinations of spatial patterns (complete spatial randomness (CSR), clustered, inhibition), sampling methods (random, quadrant, contagion, threshold), and sample proportions, and conclude that (i) biases and inefficiencies in the parameter estimates are observed in all simulations; (ii) the differences in bias are similar across the point patterns; (iii) the quadrant and contagion sampling schemes show the least bias and inefficiency, and the threshold scheme the most.

We then propose a Gibbs sampler based method to mitigate the inaccuracies. We consider the situation in which only a sample of the observations is available, while the positions of all data points and all independent variables are available. Under the SL model, the Gibbs sampler fills in the unknown observations with posterior samples at every iteration. We validate the method with a set of Monte Carlo simulations, whose results indicate that the method is effective under all four aforementioned sampling schemes. An empirical illustration is also put forward using western Tokyo land price data from 2019, showing that the proposed method brings the estimates obtained from sample data closer to those obtained from the population.

Chapter 4 proposes a maximum-likelihood-based least absolute shrinkage and selection operator (ML-LASSO) to estimate the spatial weight matrix in spatial lag (SL) models. In spatial analysis, the spatial weight matrix is usually predetermined, and its choice remains a problem for researchers; contiguity-based matrices and geographic or economic distance-based matrices are some popular candidates. Some previous studies have explored methods to estimate spatial weight matrices. Bhattacharjee and Jensen-Butler (2013) proposed a method to infer the weight matrix in spatial error models under the assumption that the matrix is symmetric, and Beenstock and Felsenstein (2012) used moments from the sample covariance matrix to estimate the weight matrix in spatial lag models under restrictions.

LASSO is a regression method used for variable selection and regularization, proposed by Tibshirani (1996) in the field of statistics. Recently, the method has been gaining popularity thanks to its widespread adoption in machine learning. Under the assumption that the spatial weight matrix is sparse, which is reasonable since contiguity-based and certain kinds of geographically based matrices are sparse in nature, we can use LASSO for its estimation. Ahrens and Bhattacharjee (2015) proposed a two-step LASSO estimator for spatial lag models with potentially large n and reasonable T. Lam and Souza (2019) introduced a LASSO-based method that allows a combination of a predetermined weight matrix and an estimated one.

We focus on the SL model in this study and propose a cyclic coordinate descent type method to obtain the maximum likelihood LASSO (ML-LASSO) estimation result. There are two points that make this strategy more practical: (i) the partial derivative of the likelihood function with respect to an element of the spatial weight matrix is quadratic; (ii) the active set strategy can be applied. The LASSO penalty level is chosen by minimizing the extended Bayesian information criteria (EBIC) proposed by Chen and Chen (2008), and the covariance matrix adaptation evolution strategy (CMA-ES, Hansen and Ostermeier (1996)) is employed to find the optimal value.
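As background for the cyclic coordinate descent idea, the sketch below applies it to the ordinary least-squares LASSO, minimizing $\frac{1}{2n}\lVert y - Xb\rVert^2 + \lambda\lVert b\rVert_1$ with a soft-thresholding update for one coefficient at a time. This is deliberately not the ML-LASSO estimator of the thesis: the likelihood-based objective, the quadratic partial derivative, the active set strategy, EBIC and CMA-ES are all omitted, and the data, the penalty level 0.1, and the function names are illustrative assumptions.

```python
import numpy as np

def soft_threshold(a, lam):
    """Soft-thresholding operator S(a, lam) = sign(a) * max(|a| - lam, 0)."""
    return np.sign(a) * max(abs(a) - lam, 0.0)

def lasso_cd(X, y, lam, n_sweeps=200, tol=1e-10):
    """Cyclic coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    r = y - X @ b                       # current residual
    z = (X ** 2).sum(axis=0) / n        # (1/n) * x_j' x_j for each column
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            old = b[j]
            # Partial correlation of x_j with the residual that excludes x_j.
            rho_j = X[:, j] @ r / n + z[j] * old
            b[j] = soft_threshold(rho_j, lam) / z[j]
            if b[j] != old:
                r -= X[:, j] * (b[j] - old)     # keep the residual up to date
                max_change = max(max_change, abs(b[j] - old))
        if max_change < tol:            # stop when a full sweep changes nothing
            break
    return b

rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.standard_normal((n, p))
b_true = np.array([2.0, 0.0, 0.0, -1.5, 0.0])   # sparse illustrative coefficients
y = X @ b_true + 0.1 * rng.standard_normal(n)
b_hat = lasso_cd(X, y, lam=0.1)
```

Each coordinate update has a closed form, which is what makes cycling through the coefficients cheap; coefficients whose partial correlation stays below the penalty are set exactly to zero.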

The performance of the ML-LASSO is tested by a series of Monte Carlo experiments. The specifications consist of combinations of a weight matrix with non-zero elements on the super-diagonal or on both the super-diagonal and the sub-diagonal, spatial unit numbers of 30, 50, and 100, sample sizes of 50, 100, and 150, and models with or without regressors. The results show that (i) the estimator works well overall, except in the case where there is no regressor and the underlying weight matrix is non-symmetric (with super-diagonal non-zero elements); (ii) the estimator performs better with regressors; (iii) the performance improves as the sparsity and the sample size increase. We also present an empirical illustration using US precipitation data from 1895 through 1997. The proposed estimator is able to identify a first-order-contiguity-like spatial pattern.


Bibliography

Ahrens, A. and Bhattacharjee, A. (2015). Two-step lasso estimation of the spatial weights matrix. Econometrics, 3(1):128–155.

Anselin, L. (1988). Spatial Econometrics: Methods and Models, volume 4. Springer Science & Business Media.

Aquaro, M., Bailey, N., and Pesaran, M. H. (2015). Quasi maximum likelihood estimation of spatial models with heterogeneous coefficients. USC-INET Research Paper, (15-17).

Arbia, G. (2006). Spatial econometrics: statistical foundations and applications to regional convergence. Springer Science & Business Media.

Arbia, G., Espa, G., and Giuliani, D. (2016). Dirty spatial econometrics. The Annals of Regional Science, 56(1):177–189.

Baltagi, B. H., Song, S. H., and Koh, W. (2003). Testing panel data regression models with spatial error correlation. Journal of econometrics, 117(1):123–150.

Beenstock, M. and Felsenstein, D. (2012). Nonparametric estimation of the spatial connectivity matrix using spatial panel data. Geographical Analysis, 44(4):386–397.

Bennett, R. J., Haining, R. P., and Griffith, D. A. (1984). The problem of missing data on spatial surfaces. Annals of the Association of American Geographers, 74(1):138–156.

Bhattacharjee, A. and Jensen-Butler, C. (2013). Estimation of the spatial weights matrix under structural constraints. Regional Science and Urban Economics, 43(4):617–634.

Chen, J. and Chen, Z. (2008). Extended bayesian information criteria for model selection with large model spaces. Biometrika, 95(3):759–771.

Cliff, A. and Ord, J. (1973). Spatial autocorrelation. Pion Ltd., London.

Cressie, N. A. (1993). Statistics for spatial data. Technical report, John Wiley & Sons, Inc.


Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the em algorithm. Journal of the Royal Statistical Society: Series B (Methodological), 39(1):1–22.

Diggle, P. J. (1983). Statistical analysis of spatial point patterns. Academic Press.

Elhorst, J. P. (2003). Specification and estimation of spatial panel data models. International regional science review, 26(3):244–268.

Elhorst, J. P. (2014). Spatial econometrics: from cross-sectional data to spatial panels, volume 479. Springer.

Griffith, D. A., Bennett, R. J., and Haining, R. P. (1989). Statistical analysis of spatial data in the presence of missing observations: a methodological guide and an application to urban census data. Environment and Planning A, 21(11):1511–1523.

Haining, R., Griffith, D. A., and Bennett, R. (1984). A statistical approach to the problem of missing spatial data using a first-order markov model. The Professional Geographer, 36(3):338–345.

Hansen, B. E. (1999). Threshold effects in non-dynamic panels: Estimation, testing, and inference. Journal of econometrics, 93(2):345–368.

Hansen, N. and Ostermeier, A. (1996). Adapting arbitrary normal mutation distributions in evolution strategies: The covariance matrix adaptation. In Proceedings of IEEE international conference on evolutionary computation, pages 312–317. IEEE.

Illian, J., Penttinen, A., Stoyan, H., and Stoyan, D. (2008). Statistical analysis and modelling of spatial point patterns, volume 70. John Wiley & Sons.

Kapoor, M., Kelejian, H. H., and Prucha, I. R. (2007). Panel data models with spatially correlated error components. Journal of econometrics, 140(1):97–130.

Kelejian, H. H. and Prucha, I. R. (1998). A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances. The Journal of Real Estate Finance and Economics, 17(1):99–121.

Lam, C. and Souza, P. C. (2019). Estimation and selection of spatial weight matrix in a spatial lag model. Journal of Business and Economic Statistics.

Lee, L. and Yu, J. (2010). Estimation of spatial autoregressive panel data models with fixed effects. Journal of Econometrics, 154(2):165–185.

LeSage, J. and Pace, R. (2009). Introduction to Spatial Econometrics. CRC Taylor & Francis Group.


Little, R. J. (1988). Missing-data adjustments in large surveys. Journal of Business & Economic Statistics, 6(3):287–296.

Little, R. J. and Rubin, D. B. (2019). Statistical analysis with missing data, volume 793. John Wiley & Sons.

Majumdar, A., Gelfand, A. E., and Banerjee, S. (2005). Spatio-temporal change-point modeling. Journal of Statistical Planning and Inference, 130(1-2):149–166.

Rubin, D. B. (1976). Inference and missing data. Biometrika, 63(3):581–592.

Rubin, D. B. (2004). Multiple imputation for nonresponse in surveys, volume 81. John Wiley & Sons.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1):267–288.

Tong, H. and Lim, K. (1980). Threshold autoregression, limit cycles and cyclical data-with discussion. Journal of the Royal Statistical Society. Series B: Statistical Methodology, 42(3):245–292.

Yang, Z. (2018). Unified m-estimation of fixed-effects spatial dynamic models with short panels. Journal of econometrics, 205(2):423–447.


Chapter 2

A threshold extension of spatial dynamic panel model with fixed effects

2.1 Introduction

Panel data models with spatial interactions have been gaining more attention in the field of econometrics since the works of Baltagi et al. (2003), Kapoor et al. (2007) and Yu et al. (2008). Many different settings, such as static or dynamic models with spatial lag (SL) or spatial error (SE) and fixed or random effects, have been explored together with their corresponding estimation methods. The spatial dynamic panel data (SDPD) model with fixed effects is one of the most popular models in spatial panel data analysis. See, for example, the studies by Lee and Yu (2010) and Elhorst (2014, Chapter 3).

This study focuses on a threshold extension of the SDPD model with fixed effects, especially in a short panel setting. Tong and Lim (1980) pioneered threshold models in the time series literature with self-exciting threshold autoregressive (SETAR) models, where a lagged variable is used as a threshold on which the model switches. Let $y_t$ be a time series and $R_1 \cup \cdots \cup R_Q = \mathbb{R}$ be a partition of the real line into mutually disjoint subsets. Using a threshold variable $y_{t-d}$, $d > 0$, they defined SETAR for $t$ such that $y_{t-d} \in R_q$, $q = 1, \ldots, Q$, as follows:

$$
y_t = \phi_0^{(q)} + \sum_{i=1}^{p} \phi_i^{(q)} y_{t-i} + \varepsilon_t,
$$

where $\varepsilon_t$ is a sequence of independent and identically distributed (i.i.d.) error terms. SETAR allows a temporal dependency of parameters through the threshold variable $y_{t-d}$.
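The regime switching in the SETAR definition can be illustrated with a short simulation. The sketch below is a minimal example that assumes standard normal errors, intervals $R_q$ defined by sorted cut points, and a hypothetical two-regime specification with $p = d = 1$; none of the numerical values come from Tong and Lim (1980) or from this thesis.

```python
import numpy as np

def simulate_setar(phi0, phi, thresholds, d, T, rng, burn_in=200):
    """Simulate a SETAR series and the regime index used at each period.

    phi0: (Q,) intercepts; phi: (Q, p) autoregressive coefficients;
    thresholds: sorted (Q-1,) cut points, so regime q covers
    (thresholds[q-1], thresholds[q]] with infinite outer bounds.
    """
    phi0 = np.asarray(phi0, dtype=float)
    phi = np.asarray(phi, dtype=float)
    p = phi.shape[1]
    start = max(p, d)
    y = np.zeros(T + burn_in + start)
    regimes = np.zeros(y.size, dtype=int)
    for t in range(start, y.size):
        q = int(np.searchsorted(thresholds, y[t - d]))   # regime of y_{t-d}
        lags = y[t - p:t][::-1]                          # y_{t-1}, ..., y_{t-p}
        y[t] = phi0[q] + phi[q] @ lags + rng.standard_normal()
        regimes[t] = q
    return y[-T:], regimes[-T:]                          # drop the burn-in

rng = np.random.default_rng(0)
# Two regimes split at 0: R_1 = (-inf, 0], R_2 = (0, inf); values are illustrative.
y, regimes = simulate_setar(
    phi0=[0.5, -0.5], phi=[[0.6], [-0.4]], thresholds=[0.0], d=1, T=300, rng=rng
)
```

Here `regimes[t]` records which parameter set generated `y[t]`, so the switching rule can be checked directly against the lagged series.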

Extensions of threshold models from time series to spatial data have not been conducted extensively except for a few empirical studies. For instance, Aquaro et al. (2015) applied spatial econometrics models with spatially dependent parameters, whereas Hansen (1999) extended the threshold techniques in time series to panel data, providing estimation and testing procedures for non-dynamic panels. Meanwhile, Majumdar et al. (2005) proposed a spatio-temporal model, which allows certain parameters to shift at a given time point.

This study proposes a threshold extension of the SDPD model with fixed effects by allowing the model parameters $\theta$ to switch depending on a threshold variable. We introduce an exogenous and time-independent variable $z_i$ at the $i$th spatial region as a threshold variable. Depending on the partition $R_q$, $q = 1, \ldots, Q$, in which $z_i$ is included, the dependent variable $y_{ti}$ follows an SDPD model with parameters $\theta_q$ chosen from among $\theta_1, \ldots, \theta_Q$. In other words, parameters in SDPD models can be spatially heterogeneous depending on a threshold variable, but they are time-independent. Threshold SDPD models reduce to the usual SDPD models when the threshold variable is constant.

The threshold variable and the corresponding threshold value need to be determined prior to the estimation due to their exogenous nature. There are several possible criteria for the choice of the threshold, such as (i) the one that provides the best forecasting performance, or (ii) the one that shows significant differences in the estimated parameter values among the groups.

We extend the unified M-estimation originally designed by Yang (2018) for the usual SDPD models to threshold SDPD models. Yang (2018) demonstrated the consistency and asymptotic normality of the M-estimators in panels of short temporal length when the number of spatial regions tends to infinity. We show that the M-estimation for threshold SDPD models retains consistency and asymptotic normality.

Additionally, we conduct Monte Carlo experiments to compare the M-estimation with the conditional quasi-maximum likelihood estimation (CQMLE) and to demonstrate the advantage of the M-estimation over CQMLE in finite samples. An empirical illustration using U.S. state-level GDP and power usage growth data is also conducted to demonstrate how threshold SDPD models account for spatial dependencies of parameters, leading to a deeper analysis than that using the usual SDPD models. Although existing studies (Fallahi, 2011; Mahalingam and Orman, 2018) have examined the relations between economic growth and power usage, this study is the first attempt to fit spatial models that account for spatial dependencies of parameters to such data.

In Section 2, we set forth our threshold SDPD models as an extension of SDPD models. Section 3 introduces the M-estimation for threshold SDPD models in relation to that for SDPD models, together with the asymptotic results on consistency and asymptotic normality. Monte Carlo experiments and applications to the real example are illustrated in Sections 4 and 5, respectively. Finally, Section 7 presents our conclusion.


2.2 A threshold extension of SDPD models

Let $y_{ti}$ be a spatial panel data at period $t$ and regional unit $i$ for $t = 1, \ldots, T$ and $i = 1, \ldots, n$. We consider a SDPD model with a fixed effect as our base model, which can be regarded as a reduced version of the general spatial panel model summarized by Elhorst (2014). Let $W_r$, $r = 1, 2, 3$ be a predetermined $n \times n$ spatial weight matrix whose $(i, j)$th element represents the spatial correlation between units $i$ and $j$; the diagonal elements are zero and the other elements are normalized to obtain a sum of 1 for every row. Then our base model is described by

$$
\begin{aligned}
y_{ti} &= \mu_i + \alpha_t + x_{ti}\beta + \rho y_{t-1,i} + \lambda_1 \sum_{j=1}^{n} w_{1,ij} y_{tj} + \lambda_2 \sum_{j=1}^{n} w_{2,ij} y_{t-1,j} + u_{ti}, \\
u_{ti} &= \lambda_3 \sum_{j=1}^{n} w_{3,ij} u_{tj} + v_{ti}, \qquad i = 1, \ldots, n, \quad t = 1, \ldots, T,
\end{aligned}
\tag{2.1}
$$

where $\{v_{ti}\}$ is i.i.d. across $i$ and $t$ with mean 0 and variance $\sigma_v^2$, $\{\mu_i\}$ is individual-specific effects, $\{\alpha_t\}$ is time-specific effects, and $x_{ti} = (x_{ti1}, \ldots, x_{tip})$ denotes a vector of explanatory variable at time $t$ that does not contain any time-invariant variable because of the model identification. Moreover, $\rho$, $\lambda_1$, $\lambda_2$ and $\lambda_3$ are all scalar parameters reflecting the strength of spatio-temporal dependence. Theoretical and empirical studies have extensively examined the spatial dynamic panel model with fixed effect (Lee and Yu, 2010).
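The two conventions imposed on each weight matrix, a zero diagonal and rows that sum to 1, can be sketched with a small construction. The k-nearest-neighbour rule, the random coordinates, and the function name `knn_weight_matrix` below are illustrative assumptions; the thesis only requires that the matrices be predetermined.

```python
import numpy as np

def knn_weight_matrix(coords, k):
    """Row-normalized k-nearest-neighbour weight matrix with a zero diagonal."""
    n = coords.shape[0]
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))      # pairwise Euclidean distances
    np.fill_diagonal(dist, np.inf)               # a unit is never its own neighbour
    nearest = np.argsort(dist, axis=1)[:, :k]    # indices of the k closest units
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return W / W.sum(axis=1, keepdims=True)      # each row sums to 1

rng = np.random.default_rng(0)
coords = rng.uniform(size=(30, 2))               # hypothetical unit locations
W = knn_weight_matrix(coords, k=4)

y_t = rng.standard_normal(30)
spatial_lag = W @ y_t                            # sum_j w_ij * y_tj for every unit i
```

With row normalization, the spatial lag of a variable is simply the average of its values over the neighbours of each unit, which helps in interpreting the $\lambda$ parameters.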

Some empirical studies have pointed out that regional differences in the regression parameters $\beta$ and the spatio-temporal parameters $\rho$, $\lambda_r$, $r = 1, 2, 3$ are often detected, and that suitable modeling could significantly improve the model performance (LeSage and Chih, 2018). Here we propose a threshold extension of the aforementioned base model to account for the regional differences. This extension is inspired by the threshold models in the time series literature (Tong and Lim, 1980), where the model switches according to a lagged dependent variable. Despite their popularity in the time series literature, threshold models have rarely been used in the field of spatial econometrics. Hansen (1999) proposed a threshold non-dynamic panel model with fixed effects, allowing the parameters $\beta$ for the regressors $x_{ti}$ to switch between two groups. Majumdar et al. (2005) proposed a spatio-temporal model with a mean shift at certain time points.

Let $z_i$, $i = 1, \ldots, n$ be an exogenous time-independent univariate threshold variable and $R_q$, $q = 1, 2, \ldots, Q$ be mutually disjoint subsets that satisfy

$$
R_1 \cup R_2 \cup \cdots \cup R_Q = \mathbb{R}.
$$


The model is described as follows, for units i such that zi∈ Rq, q = 1, . . . , Q, yti= µi+ αt+ xtiβq+ ρqyt−1,i+ λ1q n X j=1 w1,ijytj + λ2q n X j=1 w2,ijyt−1,i+ uti, uti= λ3q n X j=1 w3,ijutj+ vti, (2.2)

where $\beta_q$, $\rho_q$ and $\lambda_{rq}$, $r = 1, 2, 3$, are the parameters in the $q$th regime. In comparison with the existing approaches by Hansen (1999) or Majumdar et al. (2005), our model is more flexible in the sense that it allows for three spatial effects. Here we list three examples. First, by setting all $\lambda_{rq}$, $r = 2, 3$, to 0, we obtain the submodel that contains only the SL. Second, by setting all $\lambda_{rq}$, $r = 1, 2$, to 0, we are left with the model with only the SE. Finally, by setting all $\lambda_{3q}$ to 0, we obtain the submodel with SL and space-time lag (SLTL). Spatially lagged independent variables $\sum_j w_{r,ij} x_{tj}$ can be included as part of $x_{ti}$.
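To make the regime-switching structure of (2.2) concrete, the following is a minimal simulation sketch, not the code used in this chapter. It assumes Q = 2 regimes split at z = 0, a single regressor, identical weights W1 = W2 = W3 = W, no time effects, and standard normal fixed effects; the function name `simulate_threshold_sdpd` and all of these choices are illustrative assumptions.

```python
import numpy as np

def simulate_threshold_sdpd(W, z, beta, rho, lam1, lam2, lam3, T=5, burn=100, seed=0):
    """Simulate model (2.2) with two regimes (hypothetical helper).

    Each coefficient argument is a length-2 array: entry 0 applies to units
    with z <= 0 and entry 1 to units with z > 0.
    """
    rng = np.random.default_rng(seed)
    beta, rho, lam1, lam2, lam3 = (np.asarray(a, float) for a in (beta, rho, lam1, lam2, lam3))
    n = W.shape[0]
    q = (np.asarray(z) > 0).astype(int)          # regime index of each unit
    # Row i of every matrix uses the coefficients of unit i's regime.
    B1 = np.eye(n) - lam1[q][:, None] * W        # I - sum_q lam_1q W^(q)
    B3 = np.eye(n) - lam3[q][:, None] * W        # I - sum_q lam_3q W^(q)
    B2 = np.diag(rho[q]) + lam2[q][:, None] * W  # sum_q (rho_q W_0^(q) + lam_2q W^(q))
    mu = rng.normal(size=n)                      # fixed effects (assumed N(0, 1))
    y_prev = np.zeros(n)
    ys, xs = [], []
    for _ in range(burn + T):
        x = rng.uniform(-1.0, 1.0, size=n)
        v = rng.normal(size=n)
        u = np.linalg.solve(B3, v)               # spatial error term
        y = np.linalg.solve(B1, mu + x * beta[q] + B2 @ y_prev + u)
        ys.append(y)
        xs.append(x)
        y_prev = y
    # Keep only the last T periods, discarding the burn-in.
    return np.array(ys[-T:]), np.array(xs[-T:]), q
```

Because the regime enters only through row-wise coefficients, the same routine covers the SL, SE and SLTL submodels by setting the corresponding entries of `lam1`, `lam2` or `lam3` to zero.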

2.3 Estimation

2.3.1 M-estimation

The estimation of our threshold model is based on the M-estimator proposed by Yang (2018). We have chosen this method because it incorporates a bias-correction mechanism that achieves better accuracy and robustness than quasi-maximum likelihood (QML) methods, especially when $T$ is smaller than $n$. Note that the threshold is determined before the estimation rather than during the estimation process. In other words, $R_1, \dots, R_Q$ are assumed to be known in this section, although in the later empirical section we shall choose the threshold in a trial-and-error fashion.

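Following the procedure of Yang (2018), we eliminate all the time-invariant terms by first-differencing (2.2). For units $i$ such that $z_i \in R_q$, $q = 1, \dots, Q$,

$$
\begin{aligned}
\Delta y_{ti} &= \Delta\alpha_t + \Delta x_{ti}\beta_q + \rho_q \Delta y_{t-1,i} + \lambda_{1q} \sum_{j=1}^{n} w_{1,ij} \Delta y_{tj} + \lambda_{2q} \sum_{j=1}^{n} w_{2,ij} \Delta y_{t-1,j} + \Delta u_{ti}, \\
\Delta u_{ti} &= \lambda_{3q} \sum_{j=1}^{n} w_{3,ij} \Delta u_{tj} + \Delta v_{ti}, \qquad t = 2, \dots, T.
\end{aligned}
\tag{2.3}
$$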

Let us re-express (2.3) in matrix form with $\Delta y_t = (\Delta y_{t1}, \dots, \Delta y_{tn})'$, $\Delta u_t = (\Delta u_{t1}, \dots, \Delta u_{tn})'$ and $\Delta v_t = (\Delta v_{t1}, \dots, \Delta v_{tn})'$. For the explanatory variable matrix $x_t = (x_{t1}', \dots, x_{tn}')'$, let $x_t^{(q)}$ be $x_t$ whose $i$th rows are replaced with 0 for units $i$ such that $z_i \notin R_q$. Similarly, for the weight matrices $W_r = (w_{r,ij})$, $r = 1, 2, 3$, let $W_r^{(q)}$ be $W_r$ whose $i$th rows are replaced with 0 for units $i$ such that $z_i \notin R_q$. Additionally, define $W_0^{(q)}$ as the identity matrix $I_n$ whose $i$th rows are replaced with 0 for $z_i \notin R_q$. Notice that

$$
x_t = x_t^{(1)} + \dots + x_t^{(Q)}, \qquad I_n = W_0^{(1)} + \dots + W_0^{(Q)}, \qquad W_r = W_r^{(1)} + \dots + W_r^{(Q)}, \quad r = 1, 2, 3.
$$
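The row-masking construction above can be sketched in a few lines. This is an illustrative sketch only; the helper name `regime_masks` and the integer coding of regimes as 0, ..., Q-1 are assumptions, not part of the original text.

```python
import numpy as np

def regime_masks(W, regime, Q):
    """Return [W^(1), ..., W^(Q)]: copies of W in which the rows of units
    outside regime q are replaced with zeros (regimes coded 0..Q-1)."""
    regime = np.asarray(regime)
    return [W * (regime == q)[:, None] for q in range(Q)]

# Small example: three units, units 0 and 2 in the first regime.
W = np.array([[0.0, 0.5, 0.5],
              [1.0, 0.0, 0.0],
              [0.5, 0.5, 0.0]])
regime = np.array([0, 1, 0])
parts = regime_masks(W, regime, Q=2)

# The masked matrices add up to the original matrix, as noted in the text.
assert np.allclose(parts[0] + parts[1], W)
```

Passing the identity matrix instead of `W` gives the matrices $W_0^{(q)}$, and a weighted sum such as $\sum_q \lambda_{1q} W_1^{(q)}$ is then equivalent to scaling each row of $W_1$ by the coefficient of that unit's regime.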

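Then we have the matrix expression for (2.3) given by

$$
\begin{aligned}
\Delta y_t &= \Delta\alpha_t 1_n + \sum_{q=1}^{Q} \Delta x_t^{(q)} \beta_q + \Big(\sum_{q=1}^{Q} \rho_q W_0^{(q)}\Big) \Delta y_{t-1} + \Big(\sum_{q=1}^{Q} \lambda_{1q} W_1^{(q)}\Big) \Delta y_t + \Big(\sum_{q=1}^{Q} \lambda_{2q} W_2^{(q)}\Big) \Delta y_{t-1} + \Delta u_t, \\
\Delta u_t &= \Big(\sum_{q=1}^{Q} \lambda_{3q} W_3^{(q)}\Big) \Delta u_t + \Delta v_t, \qquad t = 2, \dots, T.
\end{aligned}
\tag{2.4}
$$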

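Let $\Delta Y = (\Delta y_2', \dots, \Delta y_T')'$, $\Delta Y_{-1} = (\Delta y_1', \dots, \Delta y_{T-1}')'$, $\Delta u = (\Delta u_2', \dots, \Delta u_T')'$ and $\Delta v = (\Delta v_2', \dots, \Delta v_T')'$. Let $d_i$ be the $n \times (T-1)$ matrix whose $i$th column consists of ones and whose other elements are 0. Define $\Delta X_t = (\Delta x_t^{(1)}, \dots, \Delta x_t^{(Q)}, d_{t-1})$, $\Delta X = (\Delta X_2', \dots, \Delta X_T')'$ and $\beta = (\beta_1', \dots, \beta_Q', \Delta\alpha_2, \dots, \Delta\alpha_T)'$. Let $\mathbf{W}_r^{(q)} = I_{T-1} \otimes W_r^{(q)}$, $r = 0, 1, 2, 3$, $q = 1, \dots, Q$, and

$$
B_r(\lambda_r) = I_n - \sum_{q=1}^{Q} \lambda_{rq} W_r^{(q)}, \quad r = 1, 3, \qquad B_2(\rho, \lambda_2) = \sum_{q=1}^{Q} \big(\rho_q W_0^{(q)} + \lambda_{2q} W_2^{(q)}\big),
$$

for $\lambda_r = (\lambda_{r1}, \dots, \lambda_{rQ})'$ and $\rho = (\rho_1, \dots, \rho_Q)'$. Then we can write the model (2.4) as

$$
\begin{aligned}
\Delta Y &= \Delta X \beta + \Big(\sum_{q=1}^{Q} \rho_q \mathbf{W}_0^{(q)}\Big) \Delta Y_{-1} + \Big(\sum_{q=1}^{Q} \lambda_{1q} \mathbf{W}_1^{(q)}\Big) \Delta Y + \Big(\sum_{q=1}^{Q} \lambda_{2q} \mathbf{W}_2^{(q)}\Big) \Delta Y_{-1} + \Delta u, \\
\Delta u &= \Big(\sum_{q=1}^{Q} \lambda_{3q} \mathbf{W}_3^{(q)}\Big) \Delta u + \Delta v.
\end{aligned}
\tag{2.5}
$$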

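In the following, we distinguish the true value of a parameter from its general value by adding a subscript 0, e.g. $\beta_0$ is the true value of $\beta$. Let $\rho = (\rho_1, \dots, \rho_Q)$ and $\lambda_r = (\lambda_{r1}, \dots, \lambda_{rQ})$, $r = 1, 2, 3$. It is easy to see that

$$
\mathrm{Var}(\Delta u) = \sigma_{v0}^2\, C \otimes [B_3'(\lambda_{30}) B_3(\lambda_{30})]^{-1} \equiv \sigma_{v0}^2\, \Omega(\lambda_{30}),
$$

where $C$ is the $(T-1) \times (T-1)$ constant matrix

$$
C = \begin{pmatrix}
2 & -1 & 0 & \cdots & 0 & 0 & 0 \\
-1 & 2 & -1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & -1 & 2 & -1 \\
0 & 0 & 0 & \cdots & 0 & -1 & 2
\end{pmatrix}.
$$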
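The tridiagonal form of $C$ comes directly from first-differencing i.i.d. errors: if $D$ is the $(T-1) \times T$ differencing operator, then $C = DD'$. A minimal numerical check, with the hypothetical helper name `first_diff_cov`, might look as follows.

```python
import numpy as np

def first_diff_cov(T):
    """Covariance (up to sigma_v^2) of the first differences of T i.i.d. errors."""
    # Row t of D has -1 in column t and +1 in column t + 1, so D @ v = (v_2 - v_1, ...).
    D = np.diff(np.eye(T), axis=0)
    return D @ D.T

C = first_diff_cov(5)
print(C)  # 4 x 4 matrix with 2 on the diagonal and -1 on the first off-diagonals
```

The determinant of this matrix equals $T$, which is convenient when evaluating the log-determinant term of the quasi-likelihood.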

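The quasi Gaussian log-likelihood for the parameter $\psi = (\beta, \sigma_v^2, \rho, \lambda_1, \lambda_2, \lambda_3)$ in terms of $\Delta y_2, \dots, \Delta y_T$, as if $\Delta y_1$ is exogenous, takes the form

$$
l(\psi) = -\frac{n(T-1)}{2} \log(\sigma_v^2) - \frac{1}{2} \log|\Omega(\lambda_3)| + \log|\mathbf{B}_1(\lambda_1)| - \frac{1}{2\sigma_v^2} \Delta u(\theta)' \Omega(\lambda_3)^{-1} \Delta u(\theta),
\tag{2.6}
$$

where $\theta = (\beta, \rho, \lambda_1, \lambda_2)$, $\Delta u(\theta) = \mathbf{B}_1(\lambda_1)\Delta Y - \mathbf{B}_2(\rho, \lambda_2)\Delta Y_{-1} - \Delta X\beta$, $\mathbf{B}_1(\lambda_1) = I_{T-1} \otimes B_1(\lambda_1)$ and $\mathbf{B}_2(\rho, \lambda_2) = I_{T-1} \otimes B_2(\rho, \lambda_2)$.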
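Because $\Omega(\lambda_3) = C \otimes [B_3'(\lambda_3) B_3(\lambda_3)]^{-1}$ is a Kronecker product and $|C| = T$, the log-determinant in (2.6) reduces to $n \log T - 2(T-1)\log|\det B_3(\lambda_3)|$, so only an $n \times n$ determinant is needed. The sketch below checks this identity numerically; the function names are illustrative assumptions.

```python
import numpy as np

def logdet_omega(B3, T):
    """log|Omega| using the Kronecker structure: only an n x n determinant is needed."""
    n = B3.shape[0]
    _, logabsdet_B3 = np.linalg.slogdet(B3)
    return n * np.log(T) - 2.0 * (T - 1) * logabsdet_B3

def logdet_omega_direct(B3, T):
    """Same quantity from the full n(T-1) x n(T-1) matrix (for checking only)."""
    D = np.diff(np.eye(T), axis=0)
    C = D @ D.T
    Omega = np.kron(C, np.linalg.inv(B3.T @ B3))
    return np.linalg.slogdet(Omega)[1]

# Example with a small row-normalized weight matrix and lambda_3 = 0.3.
W = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
B3 = np.eye(3) - 0.3 * W
assert np.isclose(logdet_omega(B3, 5), logdet_omega_direct(B3, 5))
```

The same argument gives $\log|\mathbf{B}_1(\lambda_1)| = (T-1)\log|\det B_1(\lambda_1)|$ for the Jacobian term.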

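Let $S(\psi) = \frac{\partial}{\partial\psi} l(\psi)$ be the conditional quasi score function. We have

$$
S(\psi) = \begin{cases}
\dfrac{1}{\sigma_v^2} \Delta X' \Omega^{-1}(\lambda_3) \Delta u(\theta), \\[2mm]
\dfrac{1}{2\sigma_v^4} \Delta u(\theta)' \Omega^{-1}(\lambda_3) \Delta u(\theta) - \dfrac{n(T-1)}{2\sigma_v^2}, \\[2mm]
\dfrac{1}{\sigma_v^2} \Delta u(\theta)' \Omega^{-1}(\lambda_3) \mathbf{W}_0^{(q)} \Delta Y_{-1}, & q = 1, \dots, Q, \\[2mm]
\dfrac{1}{\sigma_v^2} \Delta u(\theta)' \Omega^{-1}(\lambda_3) \mathbf{W}_1^{(q)} \Delta Y - \mathrm{tr}\big(\mathbf{B}_1^{-1}(\lambda_1) \mathbf{W}_1^{(q)}\big), & q = 1, \dots, Q, \\[2mm]
\dfrac{1}{\sigma_v^2} \Delta u(\theta)' \Omega^{-1}(\lambda_3) \mathbf{W}_2^{(q)} \Delta Y_{-1}, & q = 1, \dots, Q, \\[2mm]
\dfrac{1}{2\sigma_v^2} \Delta u(\theta)' \big(C^{-1} \otimes A_3^{(q)}(\lambda_3)\big) \Delta u(\theta) - (T-1)\,\mathrm{tr}\big(G_3^{(q)}(\lambda_3)\big), & q = 1, \dots, Q,
\end{cases}
\tag{2.7}
$$

where $A_3^{(q)}(\lambda_3) = W_3^{(q)\prime} B_3(\lambda_3) + B_3'(\lambda_3) W_3^{(q)}$ and $G_3^{(q)}(\lambda_3) = W_3^{(q)} B_3^{-1}(\lambda_3)$.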

The estimator $\hat\psi$ obtained by solving the equation $S(\psi) = 0$ is equivalent to the QML estimator that maximizes (2.6). It is inconsistent unless $T$ goes to infinity, because the necessary condition for consistency,

$$
\operatorname*{plim}_{n \to \infty} \frac{1}{nT} S(\psi_0) = 0,
$$

is not satisfied when $T$ is fixed. Following Yang (2018), we adjust the score function to overcome this inconsistency. In particular, we remove the bias caused by the initial conditions, which does not converge to 0 when $T$ is fixed. To evaluate this bias, we need the following assumption on the initial conditions for $y_0$ in (2.2).

Assumption 1 Under model (2.2), (i) the process started m periods before data collection begins, the 0th period, and (ii) if m ≥ 1, ∆y0 is independent of future


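Under Assumption 1, using (i) the error term $v_{ti}$ in (2.2), which is independent across $i$ and $t$ with mean 0 and variance $\sigma_v^2$, (ii) the regressor $X_t$, which is exogenous, and (iii) the existence of both $B_{10}^{-1}$ and $B_{30}^{-1}$, we evaluate $E\,S(\psi)$, the bias term, by reducing (2.4) to

$$
\Delta y_t = B_0 \Delta y_{t-1} + B_{10}^{-1} \Delta X_t \beta_0 + B_{10}^{-1} B_{30}^{-1} \Delta v_t, \qquad t = 2, \dots, T,
\tag{2.8}
$$

where $B = B(\rho, \lambda_1, \lambda_2) = B_1^{-1}(\lambda_1) B_2(\rho, \lambda_2)$. After tedious but straightforward calculations, we obtain

$$
E(\Delta Y_{-1} \Delta v') = -\sigma_{v0}^2 D_{-1,0} \mathbf{B}_{30}^{-1}, \qquad
E(\Delta Y \Delta v') = -\sigma_{v0}^2 D_0 \mathbf{B}_{30}^{-1},
$$

where $D_{-1} = D_{-1}(\rho, \lambda_1, \lambda_2)$ and $D = D(\rho, \lambda_1, \lambda_2)$ are given by

$$
\begin{aligned}
D \equiv D(\rho, \lambda_1, \lambda_2) &= \begin{pmatrix}
B - 2I_n & I_n & \cdots & 0 \\
(I_n - B)^2 & B - 2I_n & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
B^{T-3}(I_n - B)^2 & B^{T-4}(I_n - B)^2 & \cdots & B - 2I_n
\end{pmatrix} \mathbf{B}_1^{-1}, \\
D_{-1} \equiv D_{-1}(\rho, \lambda_1, \lambda_2) &= \begin{pmatrix}
I_n & 0 & \cdots & 0 & 0 \\
B - 2I_n & I_n & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
B^{T-4}(I_n - B)^2 & B^{T-5}(I_n - B)^2 & \cdots & B - 2I_n & I_n
\end{pmatrix} \mathbf{B}_1^{-1}.
\end{aligned}
\tag{2.9}
$$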

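Let $\mathbf{C} = C \otimes I_n$. It follows that, for $q = 1, \dots, Q$,

$$
\begin{aligned}
E(\Delta u' \Omega_0^{-1} \mathbf{W}_0^{(q)} \Delta Y_{-1}) &= -\sigma_{v0}^2\, \mathrm{tr}(\mathbf{C}^{-1} D_{-1,0} \mathbf{W}_0^{(q)}), \\
E(\Delta u' \Omega_0^{-1} \mathbf{W}_1^{(q)} \Delta Y) &= -\sigma_{v0}^2\, \mathrm{tr}(\mathbf{C}^{-1} D_0 \mathbf{W}_1^{(q)}), \\
E(\Delta u' \Omega_0^{-1} \mathbf{W}_2^{(q)} \Delta Y_{-1}) &= -\sigma_{v0}^2\, \mathrm{tr}(\mathbf{C}^{-1} D_{-1,0} \mathbf{W}_2^{(q)}),
\end{aligned}
\tag{2.10}
$$

which leads to the adjusted score function, obtained by removing the bias term from $S(\psi)$ in (2.7). The adjusted score function is as follows: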

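$$
S^*(\psi) = \begin{cases}
\dfrac{1}{\sigma_v^2} \Delta X' \Omega^{-1}(\lambda_3) \Delta u(\theta), \\[2mm]
\dfrac{1}{2\sigma_v^4} \Delta u(\theta)' \Omega^{-1}(\lambda_3) \Delta u(\theta) - \dfrac{n(T-1)}{2\sigma_v^2}, \\[2mm]
\dfrac{1}{\sigma_v^2} \Delta u(\theta)' \Omega^{-1}(\lambda_3) \mathbf{W}_0^{(q)} \Delta Y_{-1} + \mathrm{tr}(\mathbf{C}^{-1} D_{-1} \mathbf{W}_0^{(q)}), & q = 1, \dots, Q, \\[2mm]
\dfrac{1}{\sigma_v^2} \Delta u(\theta)' \Omega^{-1}(\lambda_3) \mathbf{W}_1^{(q)} \Delta Y + \mathrm{tr}(\mathbf{C}^{-1} D \mathbf{W}_1^{(q)}), & q = 1, \dots, Q, \\[2mm]
\dfrac{1}{\sigma_v^2} \Delta u(\theta)' \Omega^{-1}(\lambda_3) \mathbf{W}_2^{(q)} \Delta Y_{-1} + \mathrm{tr}(\mathbf{C}^{-1} D_{-1} \mathbf{W}_2^{(q)}), & q = 1, \dots, Q, \\[2mm]
\dfrac{1}{2\sigma_v^2} \Delta u(\theta)' \big(C^{-1} \otimes A_3^{(q)}(\lambda_3)\big) \Delta u(\theta) - (T-1)\,\mathrm{tr}\big(G_3^{(q)}(\lambda_3)\big), & q = 1, \dots, Q.
\end{cases}
\tag{2.11}
$$

Solving $S^*(\psi) = 0$ yields the M-estimator. To simplify the process, $\beta$ and $\sigma_v^2$ can be concentrated out by solving the first two equations for given $\delta = (\rho, \lambda_1, \lambda_2, \lambda_3)$:

$$
\begin{aligned}
\hat\beta_M(\delta) &= (\Delta X' \Omega^{-1} \Delta X)^{-1} \Delta X' \Omega^{-1} (\mathbf{B}_1 \Delta Y - \mathbf{B}_2 \Delta Y_{-1}), \\
\hat\sigma_{v,M}^2(\delta) &= \frac{1}{n(T-1)} \Delta\hat u(\delta)' \Omega^{-1} \Delta\hat u(\delta),
\end{aligned}
\tag{2.12}
$$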

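where $\Delta\hat u(\delta) = \Delta u(\hat\beta_M(\delta), \rho, \lambda_1, \lambda_2)$. Substituting them back into (2.11) gives the concentrated adjusted score function, which is given by, for $q = 1, \dots, Q$,

$$
S_c^*(\delta) = \begin{cases}
\dfrac{1}{\hat\sigma_{v,M}^2} \Delta\hat u(\delta)' \Omega^{-1}(\lambda_3) \mathbf{W}_0^{(q)} \Delta Y_{-1} + \mathrm{tr}(\mathbf{C}^{-1} D_{-1} \mathbf{W}_0^{(q)}), \\[2mm]
\dfrac{1}{\hat\sigma_{v,M}^2} \Delta\hat u(\delta)' \Omega^{-1}(\lambda_3) \mathbf{W}_1^{(q)} \Delta Y + \mathrm{tr}(\mathbf{C}^{-1} D \mathbf{W}_1^{(q)}), \\[2mm]
\dfrac{1}{\hat\sigma_{v,M}^2} \Delta\hat u(\delta)' \Omega^{-1}(\lambda_3) \mathbf{W}_2^{(q)} \Delta Y_{-1} + \mathrm{tr}(\mathbf{C}^{-1} D_{-1} \mathbf{W}_2^{(q)}), \\[2mm]
\dfrac{1}{2\hat\sigma_{v,M}^2} \Delta\hat u(\delta)' \big(C^{-1} \otimes A_3^{(q)}(\lambda_3)\big) \Delta\hat u(\delta) - (T-1)\,\mathrm{tr}\big(G_3^{(q)}(\lambda_3)\big).
\end{cases}
\tag{2.13}
$$

Solving the equations $S_c^*(\delta) = 0$, we obtain the M-estimator $\hat\delta_M$, from which we have $\hat\beta_M = \hat\beta_M(\hat\delta_M)$ and $\hat\sigma_{v,M}^2 = \hat\sigma_{v,M}^2(\hat\delta_M)$ by (2.12).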
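The concentration step (2.12) is a generalized least squares computation for a given $\delta$. The following sketch illustrates it under the assumption that $\Omega^{-1}$, $\Delta X$ and the transformed response $r = \mathbf{B}_1 \Delta Y - \mathbf{B}_2 \Delta Y_{-1}$ have already been formed as dense arrays; the function name `concentrate` is hypothetical.

```python
import numpy as np

def concentrate(dX, Omega_inv, r):
    """Concentrated estimators of beta and sigma_v^2 for a fixed delta, as in (2.12).

    dX        : (N, k) stacked differenced regressors, N = n(T-1)
    Omega_inv : (N, N) inverse of Omega(lambda_3)
    r         : (N,) vector B1 * dY - B2 * dY_{-1} evaluated at the given delta
    """
    XtOi = dX.T @ Omega_inv
    beta = np.linalg.solve(XtOi @ dX, XtOi @ r)   # GLS coefficient
    resid = r - dX @ beta                          # Delta u-hat(delta)
    sigma2 = resid @ Omega_inv @ resid / r.shape[0]
    return beta, sigma2

# Noise-free check: the true coefficients are recovered exactly.
rng = np.random.default_rng(0)
dX = rng.normal(size=(20, 3))
A = rng.normal(size=(20, 20))
Omega_inv = A @ A.T + 20.0 * np.eye(20)            # any symmetric positive definite matrix
beta0 = np.array([7.0, 3.0, -1.0])
beta_hat, sigma2_hat = concentrate(dX, Omega_inv, dX @ beta0)
assert np.allclose(beta_hat, beta0)
```

With $\beta$ and $\sigma_v^2$ concentrated out in this way, a root finder only has to search over the $4Q$ elements of $\delta$ when solving $S_c^*(\delta) = 0$.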

2.3.2 Asymptotic properties

In this section we present the asymptotic properties of the M-estimator for our threshold model. They can be proved in the same way as those of the original M-estimator of Yang (2018); therefore we only list the assumptions and the resulting theorems. Notice that (i) the subscript 0 indicates the true value; (ii) $\rho = (\rho_1, \dots, \rho_Q)$ and $\lambda_r = (\lambda_{r1}, \dots, \lambda_{rQ})$, $r = 1, 2, 3$; and (iii) $\gamma_{\min}(M)$ and $\gamma_{\max}(M)$ denote the smallest and largest eigenvalues of a real symmetric matrix $M$, respectively. In addition to Assumption 1 in the last section, we impose the following assumptions:

Assumption 2 The innovations $v_{ti}$ are i.i.d. for all $t$ and $i$ with $E(v_{ti}) = 0$, $\mathrm{Var}(v_{ti}) = \sigma_{v0}^2$, and $E|v_{ti}|^{4+\epsilon} < \infty$ for some $\epsilon > 0$.

Assumption 3 The time-varying regressors $\{x_t, t = 0, 1, \dots, T\}$ are exogenous, their values are uniformly bounded, and $\lim_{n \to \infty} \frac{1}{nT} \Delta X' \Delta X$ exists and is non-singular.

Assumption 4 The space $\Delta$ for the parameter $\delta = (\rho, \lambda_1, \lambda_2, \lambda_3)$ is compact, and the true parameter $\delta_0$ lies in its interior.

Assumption 5 (i) For $r = 1, 2, 3$, the elements $w_{r,ij}$ of $W_r$ are at most of order $h_n^{-1}$, uniformly in all $i$ and $j$, and $w_{r,ii} = 0$ for all $i$;

(ii) $h_n / n \to 0$ as $n \to \infty$;

(iii) $\{W_r, r = 1, 2, 3\}$ and $\{B_{r0}^{-1}, r = 1, 3\}$ are uniformly bounded in both row and column sums;

(iv) For $r = 1, 3$, $\{B_r^{-1}\}$ are uniformly bounded in either row or column sums, uniformly in $\lambda_r$ in a compact parameter space $\Lambda_r$, and $0 < \underline{c}_r \le \inf_{\lambda_r \in \Lambda_r} \gamma_{\min}(B_r' B_r) \le \sup_{\lambda_r \in \Lambda_r} \gamma_{\max}(B_r' B_r) \le \bar{c}_r < \infty$, where $\gamma_{\min}$ and $\gamma_{\max}$ denote minimum and maximum eigenvalues, respectively.

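Assumption 6 For an $n \times n$ matrix $\Phi$ uniformly bounded in either row or column sums, with elements of uniform order $h_n^{-1}$, and an $n \times 1$ vector $\phi$ with

(i) $\frac{h_n}{n} \Delta y_1' \Phi \Delta y_1 = O_p(1)$ and $\frac{h_n}{n} \Delta y_1' \Phi \Delta v_2 = O_p(1)$;

(ii) $\frac{h_n}{n} (\Delta y_1 - E(\Delta y_1))' \phi = o_p(1)$;

(iii) $\frac{h_n}{n} [\Delta y_1' \Phi \Delta y_1 - E(\Delta y_1' \Phi \Delta y_1)] = o_p(1)$;

(iv) $\frac{h_n}{n} [\Delta y_1' \Phi \Delta v_2 - E(\Delta y_1' \Phi \Delta v_2)] = o_p(1)$.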

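Let us define the population counterpart of (2.13). Let $\bar S^*(\psi) = E[S^*(\psi)]$, and partially solve $\bar S^*(\psi) = 0$. For a given $\delta$, we obtain the following:

$$
\begin{aligned}
\bar\beta_M(\delta) &= (\Delta X' \Omega^{-1} \Delta X)^{-1} \Delta X' \Omega^{-1} (\mathbf{B}_1 E\Delta Y - \mathbf{B}_2 E\Delta Y_{-1}), \\
\bar\sigma_{v,M}^2(\delta) &= \frac{1}{n(T-1)} E[\Delta\bar u(\delta)' \Omega^{-1} \Delta\bar u(\delta)],
\end{aligned}
\tag{2.14}
$$

where $\Delta\bar u(\delta) = \Delta u(\bar\beta_M(\delta), \rho, \lambda_1, \lambda_2) = \mathbf{B}_1 \Delta Y - \mathbf{B}_2 \Delta Y_{-1} - \Delta X \bar\beta_M(\delta)$. Substituting them back into $\bar S^*(\psi)$, we obtain, for $q = 1, \dots, Q$,

$$
\bar S_c^*(\delta) = \begin{cases}
\dfrac{1}{\bar\sigma_{v,M}^2} E[\Delta\bar u(\delta)' \Omega^{-1}(\lambda_3) \mathbf{W}_0^{(q)} \Delta Y_{-1}] + \mathrm{tr}(\mathbf{C}^{-1} D_{-1} \mathbf{W}_0^{(q)}), \\[2mm]
\dfrac{1}{\bar\sigma_{v,M}^2} E[\Delta\bar u(\delta)' \Omega^{-1}(\lambda_3) \mathbf{W}_1^{(q)} \Delta Y] + \mathrm{tr}(\mathbf{C}^{-1} D \mathbf{W}_1^{(q)}), \\[2mm]
\dfrac{1}{\bar\sigma_{v,M}^2} E[\Delta\bar u(\delta)' \Omega^{-1}(\lambda_3) \mathbf{W}_2^{(q)} \Delta Y_{-1}] + \mathrm{tr}(\mathbf{C}^{-1} D_{-1} \mathbf{W}_2^{(q)}), \\[2mm]
\dfrac{1}{2\bar\sigma_{v,M}^2} E[\Delta\bar u(\delta)' \big(C^{-1} \otimes A_3^{(q)}(\lambda_3)\big) \Delta\bar u(\delta)] - (T-1)\,\mathrm{tr}\big(G_3^{(q)}(\lambda_3)\big).
\end{cases}
\tag{2.15}
$$

The M-estimator $\hat\delta_M$ is a zero of $S_c^*(\delta)$, whereas $\delta_0$ is a zero of $\bar S_c^*(\delta)$, which is easy to see through $\bar\beta_M(\delta_0) = \beta_0$ and $\bar\sigma_{v,M}^2(\delta_0) = \sigma_{v0}^2$. Thus $\hat\delta_M$ is consistent for $\delta_0$ if $\sup_{\delta \in \Delta} \|S_c^*(\delta) - \bar S_c^*(\delta)\| \to 0$ in probability and the identifiability condition holds. Denote $\hat\psi_M = (\hat\beta_M', \hat\sigma_{v,M}^2, \hat\rho_M', \hat\lambda_M')'$.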

Assumption 7 $\inf_{\delta : d(\delta, \delta_0) \ge \epsilon} \|\bar S_c^*(\delta)\| > 0$ for every $\epsilon > 0$, where $d(\delta, \delta_0)$ is a measure of distance between $\delta_0$ and $\delta$.

We have the consistency and asymptotic normality for our M-estimator as an extension of those of Yang (2018).

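Theorem 1 Suppose Assumptions 1-7 hold. Assume further that

(i) $\gamma_{\max}[\mathrm{Var}(\Delta Y)]$ and $\gamma_{\max}[\mathrm{Var}(\Delta Y_{-1})]$ are bounded;

(ii) $\inf_{\delta \in \Delta} \gamma_{\min}[\mathrm{Var}(\mathbf{B}_1 \Delta Y - \mathbf{B}_2 \Delta Y_{-1})] \ge c_y > 0$.

Then, as $n \to \infty$, $\hat\psi_M \xrightarrow{p} \psi_0$.

Theorem 2 Under the assumptions of Theorem 1, we have, as $n \to \infty$,

$$
\sqrt{n(T-1)}\,(\hat\psi_M - \psi_0) \xrightarrow{D} N\Big[0, \lim_{n \to \infty} \Sigma^{-1}(\psi_0) \Gamma(\psi_0) \Sigma'^{-1}(\psi_0)\Big],
$$

where $\Sigma(\psi_0) = -\frac{1}{n(T-1)} E\big[\frac{\partial}{\partial\psi'} S^*(\psi_0)\big]$ and $\Gamma(\psi_0) = \frac{1}{n(T-1)} \mathrm{Var}[S^*(\psi_0)]$, both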

2.3.3 The robust estimation for the asymptotic variance matrix

The asymptotic variance-covariance matrix of the M-estimator $\hat\psi_M$ is given in Theorem 2. Statistical inference based on the M-estimation requires consistent estimators for $\Sigma(\psi_0)$ and $\Gamma(\psi_0)$. As $\Sigma(\psi_0)$ is the normalized negative expected derivative of $S^*(\psi)$ at $\psi_0$, it is estimated by its sample analogue at $\hat\psi_M$, namely by

$$
\hat\Sigma = -\frac{1}{n(T-1)} \frac{\partial}{\partial\psi'} S^*(\hat\psi_M),
\tag{2.16}
$$

whose consistency follows from arguments similar to those in the proof of Theorem 2. Meanwhile, it is difficult to obtain a consistent estimator for

$$
\Gamma(\psi_0) = \frac{1}{n(T-1)} \mathrm{Var}[S^*(\psi_0)],
$$

as it requires specifying a model for $\Delta y_1$. Yang (2018) proposed the outer-product-of-martingale-difference (OPMD) method, which does not require a model for the initial conditions; we extend his method to our threshold case.

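Let us re-express our model in (2.5) with the initial value $\Delta y_1$. Under Assumption 1, we apply the reduced form equation (2.8) to (2.5) to obtain

$$
\begin{aligned}
\Delta Y &= \mathbb{R} \Delta\mathbf{y}_1 + \eta + \mathbb{S} \Delta v, \\
\Delta Y_{-1} &= \mathbb{R}_{-1} \Delta\mathbf{y}_1 + \eta_{-1} + \mathbb{S}_{-1} \Delta v,
\end{aligned}
\tag{2.17}
$$

where

$$
\begin{aligned}
\Delta\mathbf{y}_1 &= (\Delta y_1', \dots, \Delta y_1')', \\
\mathbb{R} &= \mathrm{diag}(B_0, B_0^2, \dots, B_0^{T-1}), & \mathbb{R}_{-1} &= \mathrm{diag}(I_n, B_0, \dots, B_0^{T-2}), \\
\eta &= \mathbb{B} \mathbf{B}_{10}^{-1} \Delta X \beta_0, & \eta_{-1} &= \mathbb{B}_{-1} \mathbf{B}_{10}^{-1} \Delta X \beta_0, \\
\mathbb{S} &= \mathbb{B} \mathbf{B}_{10}^{-1} \mathbf{B}_{30}^{-1}, & \mathbb{S}_{-1} &= \mathbb{B}_{-1} \mathbf{B}_{10}^{-1} \mathbf{B}_{30}^{-1},
\end{aligned}
$$

$$
\mathbb{B} = \begin{pmatrix}
I_n & 0 & 0 & \cdots & 0 \\
B_0 & I_n & 0 & \cdots & 0 \\
B_0^2 & B_0 & I_n & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
B_0^{T-2} & B_0^{T-3} & B_0^{T-4} & \cdots & I_n
\end{pmatrix}, \qquad
\mathbb{B}_{-1} = \begin{pmatrix}
0 & 0 & 0 & \cdots & 0 & 0 \\
I_n & 0 & 0 & \cdots & 0 & 0 \\
B_0 & I_n & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
B_0^{T-3} & B_0^{T-4} & B_0^{T-5} & \cdots & I_n & 0
\end{pmatrix},
$$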

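which leads to the re-expression of the adjusted score function at $\psi_0$ in (2.11) given by, for $q = 1, \dots, Q$,

$$
S^*(\psi_0) = \begin{cases}
\Pi_1' \Delta v, \\[1mm]
\Delta v' \Phi_1 \Delta v - \dfrac{n(T-1)}{2\sigma_{v0}^2}, \\[2mm]
\Delta v' \Psi_1^{(q)} \Delta\mathbf{y}_1 + \Pi_2^{(q)\prime} \Delta v + \Delta v' \Phi_2^{(q)} \Delta v + \mathrm{tr}(\mathbf{C}^{-1} D_{-1,0} \mathbf{W}_0^{(q)}), \\[1mm]
\Delta v' \Psi_2^{(q)} \Delta\mathbf{y}_1 + \Pi_3^{(q)\prime} \Delta v + \Delta v' \Phi_3^{(q)} \Delta v + \mathrm{tr}(\mathbf{C}^{-1} D_0 \mathbf{W}_1^{(q)}), \\[1mm]
\Delta v' \Psi_3^{(q)} \Delta\mathbf{y}_1 + \Pi_4^{(q)\prime} \Delta v + \Delta v' \Phi_4^{(q)} \Delta v + \mathrm{tr}(\mathbf{C}^{-1} D_{-1,0} \mathbf{W}_2^{(q)}), \\[1mm]
\Delta v' \Phi_5^{(q)} \Delta v - (T-1)\,\mathrm{tr}(G_{30}^{(q)}),
\end{cases}
\tag{2.18}
$$

where, with $C_b = C^{-1} \otimes B_{30}$,

$$
\begin{aligned}
\Pi_1 &= \tfrac{1}{\sigma_{v0}^2} C_b \Delta X, &
\Pi_2^{(q)} &= \tfrac{1}{\sigma_{v0}^2} C_b \mathbf{W}_0^{(q)} \eta_{-1}, &
\Pi_3^{(q)} &= \tfrac{1}{\sigma_{v0}^2} C_b \mathbf{W}_1^{(q)} \eta, &
\Pi_4^{(q)} &= \tfrac{1}{\sigma_{v0}^2} C_b \mathbf{W}_2^{(q)} \eta_{-1}, \\
\Phi_1 &= \tfrac{1}{2\sigma_{v0}^4} (C^{-1} \otimes I_n), &
\Phi_2^{(q)} &= \tfrac{1}{\sigma_{v0}^2} C_b \mathbf{W}_0^{(q)} \mathbb{S}_{-1}, &
\Phi_3^{(q)} &= \tfrac{1}{\sigma_{v0}^2} C_b \mathbf{W}_1^{(q)} \mathbb{S}, &
\Phi_4^{(q)} &= \tfrac{1}{\sigma_{v0}^2} C_b \mathbf{W}_2^{(q)} \mathbb{S}_{-1}, \\
\Phi_5^{(q)} &= \tfrac{1}{2\sigma_{v0}^2} [C^{-1} \otimes (G_{30}^{(q)\prime} + G_{30}^{(q)})], &
\Psi_1^{(q)} &= \tfrac{1}{\sigma_{v0}^2} C_b \mathbf{W}_0^{(q)} \mathbb{R}_{-1}, &
\Psi_2^{(q)} &= \tfrac{1}{\sigma_{v0}^2} C_b \mathbf{W}_1^{(q)} \mathbb{R}, &
\Psi_3^{(q)} &= \tfrac{1}{\sigma_{v0}^2} C_b \mathbf{W}_2^{(q)} \mathbb{R}_{-1}.
\end{aligned}
$$

Define $\Psi_{t+} = \sum_{s=2}^{T} \Psi_{ts}$, $t = 2, \dots, T$, $\Theta = \Psi_{2+} (B_{30} B_{10})^{-1}$, $\Delta y_1^{\circ} = B_{30} B_{10} \Delta y_1$ and $\Delta y_{1t}^* = \Psi_{t+} \Delta y_1$. For a square matrix $A$, let $A^u$, $A^l$ and $A^d$ be the upper-triangular, lower-triangular and diagonal matrices of $A$, respectively, so that $A = A^u + A^l + A^d$. Let $\Pi_t$, $\Psi_{ts}$ and $\Phi_{ts}$ be the sub-matrices of the corresponding matrices partitioned according to $t, s = 2, 3, \dots, T$. Then define

$$
\begin{aligned}
g_{1i} &= \sum_{t=2}^{T} \Pi_{it}' \Delta v_{it}, \\
g_{2i} &= \sum_{t=2}^{T} (\Delta v_{it} \Delta\xi_{it} + \Delta v_{it} \Delta v_{it}^* - \sigma_{v0}^2 d_{it}), \\
g_{3i} &= \Delta v_{2i} \Delta\zeta_i + \Theta_{ii} (\Delta v_{2i} \Delta y_{1i}^{\circ} + \sigma_{v0}^2) + \sum_{t=3}^{T} \Delta v_{it} \Delta y_{1it}^*,
\end{aligned}
\tag{2.19}
$$

where $\Delta\xi_t = \sum_{s=2}^{T} (\Phi_{st}^{u\prime} + \Phi_{ts}^{l}) \Delta v_s$, $\Delta v_t^* = \sum_{s=2}^{T} \Phi_{ts}^{d} \Delta v_s$, $d_{it}$ is the diagonal element of $\Phi$, and $\Delta\zeta = (\Theta^u + \Theta^l) \Delta y_1^{\circ}$. Let $\mathcal{G}_{n,i}$ be the $\sigma$-field generated by $(v_{j1}, \dots, v_{jT}, j = 1, \dots, i)$, $i = 1, \dots, n$, $n \ge 1$, and $\mathcal{F}_{n,0}$ be the $\sigma$-field generated by $(v_0, \Delta y_0)$, and define $\mathcal{F}_{n,i} = \mathcal{F}_{n,0} \otimes \mathcal{G}_{n,i}$. Then we obtain

$$
\Pi' \Delta v = \sum_{i=1}^{n} g_{1i}, \qquad
\Delta v' \Phi \Delta v - E(\Delta v' \Phi \Delta v) = \sum_{i=1}^{n} g_{2i}, \qquad
\Delta v' \Psi \Delta\mathbf{y}_1 - E(\Delta v' \Psi \Delta\mathbf{y}_1) = \sum_{i=1}^{n} g_{3i},
$$

and $\{(g_{1i}', g_{2i}, g_{3i})', \mathcal{F}_{n,i}\}_{i=1}^{n}$ form a martingale difference (M.D.) sequence.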
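The stacked representation (2.17) is just the recursion (2.8) solved forward from $\Delta y_1$. The sketch below illustrates this numerically: it builds the block-diagonal matrix of powers of $B_0$ and the lower block-triangular matrix, and the accompanying check compares the result with the period-by-period recursion. Here `e[t]` stands for the combined term $B_{10}^{-1}\Delta X_t\beta_0 + B_{10}^{-1}B_{30}^{-1}\Delta v_t$; the function name and argument layout are assumptions made for illustration.

```python
import numpy as np

def stacked_reduced_form(B0, dy1, e):
    """Stack Delta y_2, ..., Delta y_T from the recursion dy_t = B0 @ dy_{t-1} + e_t.

    B0  : (n, n) matrix
    dy1 : (n,) initial difference Delta y_1
    e   : (T-1, n) array whose rows are e_2, ..., e_T
    """
    Tm1, n = e.shape
    powers = [np.linalg.matrix_power(B0, k) for k in range(Tm1 + 1)]
    R = np.zeros((Tm1 * n, Tm1 * n))    # diag(B0, B0^2, ..., B0^(T-1))
    BB = np.zeros_like(R)               # lower block-triangular matrix of powers of B0
    for t in range(Tm1):
        R[t * n:(t + 1) * n, t * n:(t + 1) * n] = powers[t + 1]
        for s in range(t + 1):
            BB[t * n:(t + 1) * n, s * n:(s + 1) * n] = powers[t - s]
    # np.tile repeats Delta y_1 once per period, matching the stacked initial vector.
    return R @ np.tile(dy1, Tm1) + BB @ e.ravel()
```

The representation for $\Delta Y_{-1}$ follows in the same way after shifting every block down by one period.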

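Let $g_{1ji}^{(q)}$ be $g_{1i}$ with $\Pi$ replaced by $\Pi_j^{(q)}$, for $j = 1, 2, 3, 4$ and $q = 1, \dots, Q$. Let $g_{2ji}^{(q)}$ and $g_{3ji}^{(q)}$ be constructed from $g_{2i}$ and $g_{3i}$ in the same way. Define

$$
g_i = \big(g_{11i}', g_{21i}, h_{0i}^{(1)}, \dots, h_{0i}^{(Q)}, h_{1i}^{(1)}, \dots, h_{1i}^{(Q)}, h_{2i}^{(1)}, \dots, h_{2i}^{(Q)}, h_{3i}^{(1)}, \dots, h_{3i}^{(Q)}\big)',
$$

where

$$
\begin{aligned}
h_{0i}^{(q)} &= g_{31i}^{(q)} + g_{12i}^{(q)} + g_{22i}^{(q)}, &
h_{1i}^{(q)} &= g_{32i}^{(q)} + g_{13i}^{(q)} + g_{23i}^{(q)}, \\
h_{2i}^{(q)} &= g_{33i}^{(q)} + g_{14i}^{(q)} + g_{24i}^{(q)}, &
h_{3i}^{(q)} &= g_{25i}^{(q)}.
\end{aligned}
$$

Then

$$
S^*(\psi_0) = \sum_{i=1}^{n} g_i,
$$

and $\{g_i, \mathcal{F}_{n,i}\}$ form a vector M.D. sequence. It follows that $\Gamma(\psi_0) = \frac{1}{n(T-1)} \mathrm{Var}(S^*(\psi_0))$ is estimated consistently by

$$
\hat\Gamma = \frac{1}{n(T-1)} \sum_{i=1}^{n} \hat g_i \hat g_i',
$$

where $\hat g_i$ is obtained by replacing $\psi_0$ with $\hat\psi_M$ and $\Delta v$ by its observed counterpart.

Theorem 3 Under the assumptions of Theorem 1, we have, as $n \to \infty$,

$$
\hat\Gamma - \Gamma(\psi_0) = \frac{1}{n(T-1)} \sum_{i=1}^{n} \{\hat g_i \hat g_i' - E(g_i g_i')\} \xrightarrow{p} 0,
$$

and hence $\hat\Sigma^{-1} \hat\Gamma \hat\Sigma'^{-1} - \Sigma^{-1}(\psi_0) \Gamma(\psi_0) \Sigma'^{-1}(\psi_0) \xrightarrow{p} 0$.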
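Once the vectors $\hat g_i$ and the matrix $\hat\Sigma$ are available, the OPMD variance estimate is a standard sandwich computation. The sketch below assumes they are supplied as arrays; the function name `opmd_sandwich` is hypothetical, and dividing by $N = n(T-1)$ to obtain standard errors for $\hat\psi_M$ itself is our reading of the $\sqrt{n(T-1)}$ scaling in Theorem 2 rather than something stated explicitly in the text.

```python
import numpy as np

def opmd_sandwich(Sigma_hat, g_hat, N):
    """Sandwich variance estimate from martingale-difference terms.

    Sigma_hat : (k, k) estimate of Sigma(psi_0)
    g_hat     : (n, k) array whose i-th row is g_i evaluated at the estimate
    N         : n(T-1), the normalizing sample size
    """
    g = np.asarray(g_hat, dtype=float)
    Gamma_hat = g.T @ g / N                    # (1/N) * sum_i g_i g_i'
    Sigma_inv = np.linalg.inv(Sigma_hat)
    V = Sigma_inv @ Gamma_hat @ Sigma_inv.T    # Sigma^{-1} Gamma Sigma'^{-1}
    se = np.sqrt(np.diag(V) / N)               # assumed scaling for psi-hat itself
    return V, se

# Toy example with two parameters and four cross-sectional units.
g = [[1.0, 0.0], [0.0, 2.0], [1.0, 0.0], [0.0, 2.0]]
V, se = opmd_sandwich(np.eye(2), g, N=4)
```

Replacing $\hat\Gamma$ by $\hat\Sigma$ in the middle of the sandwich would give the non-robust variance $\hat\Sigma^{-1}$, which is valid only when the information matrix equality holds.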

2.4 Simulation

This section conducts simulations under several settings to compare the M-estimation with the CQMLE. The M-estimator is obtained by solving (2.13), whereas the CQMLE is obtained by solving (2.7). The model for the simulation experiments is

$$
\begin{aligned}
y_{ti} &= \mu_i + x_{ti}\beta_q + \rho_q y_{t-1,i} + \lambda_{1q} \sum_{j=1}^{n} w_{1,ij} y_{tj} + \lambda_{2q} \sum_{j=1}^{n} w_{2,ij} y_{t-1,j} + u_{ti}, \\
u_{ti} &= \lambda_{3q} \sum_{j=1}^{n} w_{3,ij} u_{tj} + v_{ti},
\end{aligned}
\tag{2.20}
$$

where $x_{ti}$ is independently drawn from the uniform distribution on $[-1, 1]$, and the fixed effects $\mu_i$ are generated from $\frac{1}{T} \sum_{t=1}^{T} x_{ti} + \epsilon_i$, where $\epsilon_i \sim$ i.i.d. $N(0, 0.5)$. We simulated $y_{tj}$ for $t$ from 1 to 100, setting $y_{0j} = 0$, and used the last $T$ periods for the studies to guarantee Assumption 1. We simulated the threshold variables $z_i$, $i = 1, 2, \dots, n$, as i.i.d. standard normal variables and divided $\mathbb{R}$ into $R_1$ and $R_2$ at the origin, so that $Q = 2$. We established seven cases to validate the threshold SDPD model. Case 1 is designed to examine the M-estimation in comparison with the CQMLE under the following setting:

C1. $n = 50$, $T = 5$, standard normal errors and the identical weights $W_1 = W_2 = W_3$ of the first contiguity over a 5 × 10 grid.

Meanwhile, Cases 2-7 are established to check (1) the bias correction of the M-estimation for short panels, (2) the effects of non-Gaussian errors, and (3) the effects when the weight matrices $W_1$, $W_2$ and $W_3$ are not necessarily identical, under the following variety of settings:

C2. $n = 50$, $T = 5$, standard normal errors and the identical first contiguity weights;

C3. $n = 50$, $T = 5$, Gaussian mixture errors of 90% $N(0, 1)$ and 10% $N(0, 4^2)$, and the identical first contiguity weights;

C4. $n = 50$, $T = 5$, $\chi_3^2$ errors and the identical first contiguity weights;

C5. $n = 50$, $T = 5$, standard normal errors and the identical first contiguity weights, where the model parameters are simulated uniformly on $[-0.3, 0.3]$ for each iteration;

C6. $n = 50$, $T = 5$, standard normal errors and the randomized weight matrices, where for every row two random elements $w_{ij}$ ($j = 1, 2, \dots, n$ and $j \ne i$) are assigned 1, others 0, then row-normalized;

C7. $n = 20$, $T = 20$, standard normal errors and the identical first contiguity weights over a 5 × 4 grid.
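The two kinds of weight matrices used in these cases can be sketched as follows. The code assumes that "first contiguity" on a grid means sharing an edge (rook contiguity), which the text does not state explicitly; both function names are illustrative.

```python
import numpy as np

def rook_weights(nrow, ncol):
    """Row-normalized first-order contiguity weights on an nrow x ncol grid,
    assuming two cells are neighbors when they share an edge (rook contiguity)."""
    n = nrow * ncol
    W = np.zeros((n, n))
    for r in range(nrow):
        for c in range(ncol):
            i = r * ncol + c
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrow and 0 <= cc < ncol:
                    W[i, rr * ncol + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def random_two_neighbor_weights(n, rng):
    """Randomized weights as in Case 6: two random off-diagonal entries per row
    are set to 1, the rest to 0, then each row is normalized."""
    W = np.zeros((n, n))
    for i in range(n):
        others = np.delete(np.arange(n), i)            # exclude j = i
        W[i, rng.choice(others, size=2, replace=False)] = 1.0
    return W / W.sum(axis=1, keepdims=True)

W_grid = rook_weights(5, 10)                            # n = 50, as in Cases 1-5
W_rand = random_two_neighbor_weights(50, np.random.default_rng(0))
```

Both constructions give zero diagonals and unit row sums, matching the normalization required of the spatial weight matrices in the base model.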

We constructed the M-estimator and the CQMLE in Case 1, whereas in Cases 2-7 we constructed the M-estimator with the robust standard error, denoted as $\widehat{se}$, which is obtained from Theorem 3 as the square root of the diagonal elements of $\hat\Sigma^{-1} \hat\Gamma \hat\Sigma'^{-1}$; the standard error denoted as $\widehat{se}_H$, obtained from the diagonal elements of $\hat\Sigma^{-1}$, was also evaluated for comparison. Using R (R Core Team, 2016) and Konen and Hansen (2015), we evaluated the bias and root mean squared error (RMSE) of the M-estimator together with the averages of $\widehat{se}$ and $\widehat{se}_H$ over 100 iterations. The results are reported in Tables 2.1-2.7.
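For reference, the bias and RMSE summaries reported in the tables can be computed from the Monte Carlo replications as in the following sketch; the function name `bias_rmse` and the array layout are illustrative assumptions.

```python
import numpy as np

def bias_rmse(estimates, truth):
    """Monte Carlo bias and RMSE.

    estimates : (iterations, k) array of parameter estimates
    truth     : (k,) array of true parameter values
    """
    err = np.asarray(estimates, dtype=float) - np.asarray(truth, dtype=float)
    bias = err.mean(axis=0)
    rmse = np.sqrt((err ** 2).mean(axis=0))
    return bias, rmse

# Two replications of two parameters with true values (1, 2).
bias, rmse = bias_rmse([[1.1, 2.0], [0.9, 2.2]], [1.0, 2.0])
```

Since the squared RMSE equals the squared bias plus the variance of the estimates across replications, the sd, Bias and RMSE columns of the tables can be cross-checked against each other.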

Table 2.1 compares the bias and RMSE of the M-estimator and the CQMLE of the full model under Gaussian errors. The M-estimator performs better for $\rho_q$ and $\beta$, with much smaller bias. Moreover, Tables 2.2, 2.3 and 2.4 present the performance of the M-estimator under the three different error distributions, together with $\widehat{se}$ and $\widehat{se}_H$. Similar to the results of Yang (2018), there are no

| | TRUE | M-estimator Bias | M-estimator RMSE | CQMLE Bias | CQMLE RMSE |
|---|---|---|---|---|---|
| $\lambda_{11}$ | 0.3 | -0.0126 | 0.0677 | -0.0009 | 0.0833 |
| $\lambda_{21}$ | 0.1 | 0.0133 | 0.0742 | -0.0139 | 0.0643 |
| $\lambda_{31}$ | 0.1 | 0.0170 | 0.1992 | -0.0137 | 0.0804 |
| $\rho_1$ | 0.3 | -0.0026 | 0.0337 | -0.0175 | 0.0351 |
| $\beta_1$ | 7 | -0.0042 | 0.2225 | -0.0465 | 0.2413 |
| $\lambda_{12}$ | 0.1 | 0.0036 | 0.0702 | -0.0093 | 0.0798 |
| $\lambda_{22}$ | 0.2 | -0.0134 | 0.0741 | -0.0107 | 0.0642 |
| $\lambda_{32}$ | 0.3 | 0.0036 | 0.1902 | 0.0422 | 0.2061 |
| $\rho_2$ | 0.1 | 0.0011 | 0.0620 | -0.0830 | 0.1030 |
| $\beta_2$ | 3 | 0.0173 | 0.2138 | -0.0931 | 0.2314 |
| $\sigma^2$ | 1 | -0.0635 | 0.1338 | -0.0994 | 0.1497 |

Table 2.1: The M-estimator and CQMLE in Case 1: n = 50, T = 5, standard normal errors and the identical weights $W_1 = W_2 = W_3$ of the first contiguity over 5 × 10 grid.

| | TRUE | Mean | sd | Bias | RMSE | $\widehat{se}$ | $\widehat{se}_H$ |
|---|---|---|---|---|---|---|---|
| $\lambda_{11}$ | 0.3 | 0.2874 | 0.0669 | -0.0126 | 0.0677 | 0.0840 | 0.0717 |
| $\lambda_{21}$ | 0.1 | 0.1133 | 0.0734 | 0.0133 | 0.0742 | 0.1090 | 0.0698 |
| $\lambda_{31}$ | 0.1 | 0.1170 | 0.1995 | 0.0170 | 0.1992 | 0.2115 | 0.1988 |
| $\rho_1$ | 0.3 | 0.2974 | 0.0338 | -0.0026 | 0.0337 | 0.0328 | 0.0321 |
| $\beta_1$ | 7 | 6.9958 | 0.2236 | -0.0042 | 0.2225 | 0.2105 | 0.2159 |
| $\lambda_{12}$ | 0.1 | 0.1036 | 0.0705 | 0.0036 | 0.0702 | 0.0775 | 0.0702 |
| $\lambda_{22}$ | 0.2 | 0.1866 | 0.0732 | -0.0134 | 0.0741 | 0.0892 | 0.0700 |
| $\lambda_{32}$ | 0.3 | 0.3036 | 0.1911 | 0.0036 | 0.1902 | 0.2014 | 0.1931 |
| $\rho_2$ | 0.1 | 0.1011 | 0.0623 | 0.0011 | 0.0620 | 0.0705 | 0.0631 |
| $\beta_2$ | 3 | 3.0173 | 0.2141 | 0.0173 | 0.2138 | 0.2144 | 0.2102 |
| $\sigma^2$ | 1 | 0.9365 | 0.1184 | -0.0635 | 0.1338 | 0.1103 | 0.1108 |

Table 2.2: The M-estimator in Case 2: n = 50, T = 5, standard normal errors and the identical first contiguity weights over 5 × 10 grid.

Table 2.3: The M-estimator in Case 3: n = 50, T = 5, Gaussian mixture errors of 90% $N(0, 1)$ and 10% $N(0, 4^2)$, and the identical first contiguity weights over 5 × 10 grid.

significant differences between $\widehat{se}$ and $\widehat{se}_H$ for Gaussian errors, whereas in the two non-Gaussian error cases $\widehat{se}$ is much closer to the corresponding RMSEs, which shows its robustness under non-normality. Table 2.5 reports the results under randomized parameters. The biases and RMSEs are comparable with those in Table 2.2, which indicates that the M-estimator works well for different sets of parameters. Table 2.6 presents the results for randomized $W_1$, $W_2$, and $W_3$, which are also comparable with those of Table 2.2, indicating that the identical choice of $W_1$, $W_2$, and $W_3$ does not necessarily affect the estimation

| | TRUE | Mean | sd | Bias | RMSE | $\widehat{se}$ | $\widehat{se}_H$ |
|---|---|---|---|---|---|---|---|
| $\lambda_{11}$ | 0.3 | 0.2739 | 0.0790 | -0.0261 | 0.0828 | 0.0789 | 0.0707 |
| $\lambda_{21}$ | 0.1 | 0.0863 | 0.0749 | -0.0137 | 0.0758 | 0.1093 | 0.0690 |
| $\lambda_{31}$ | 0.1 | 0.1072 | 0.1978 | 0.0072 | 0.1970 | 0.2100 | 0.2023 |
| $\rho_1$ | 0.3 | 0.3008 | 0.0335 | 0.0008 | 0.0334 | 0.0333 | 0.0325 |
| $\beta_1$ | 7 | 6.9827 | 0.2366 | -0.0173 | 0.2361 | 0.2099 | 0.2161 |
| $\lambda_{12}$ | 0.1 | 0.1153 | 0.0771 | 0.0153 | 0.0783 | 0.0761 | 0.0693 |
| $\lambda_{22}$ | 0.2 | 0.2070 | 0.0667 | 0.0070 | 0.0667 | 0.0903 | 0.0705 |
| $\lambda_{32}$ | 0.3 | 0.2925 | 0.2020 | -0.0075 | 0.2011 | 0.2036 | 0.1954 |
| $\rho_2$ | 0.1 | 0.1006 | 0.0666 | 0.0006 | 0.0663 | 0.0674 | 0.0621 |
| $\beta_2$ | 3 | 2.9900 | 0.2340 | -0.0100 | 0.2330 | 0.2106 | 0.2070 |
| $\sigma^2$ | 1 | 0.9391 | 0.1819 | -0.0609 | 0.1910 | 0.1592 | 0.1112 |

Table 2.4: The M-estimator in Case 4: n = 50, T = 5, $\chi_3^2$ errors and the identical first contiguity weights over 5 × 10 grid.

| | TRUE | Mean | sd | Bias | RMSE |
|---|---|---|---|---|---|
| $\lambda_{11}$ | −0.3 ∼ 0.3 | - | - | -0.0090 | 0.0736 |
| $\lambda_{21}$ | −0.3 ∼ 0.3 | - | - | 0.0010 | 0.0715 |
| $\lambda_{31}$ | −0.3 ∼ 0.3 | - | - | -0.0069 | 0.2208 |
| $\rho_1$ | −0.3 ∼ 0.3 | - | - | 0.0020 | 0.0317 |
| $\beta_1$ | 7 | 6.9736 | 0.2241 | -0.0264 | 0.2245 |
| $\lambda_{12}$ | −0.3 ∼ 0.3 | - | - | -0.0138 | 0.0656 |
| $\lambda_{22}$ | −0.3 ∼ 0.3 | - | - | 0.0022 | 0.0753 |
| $\lambda_{32}$ | −0.3 ∼ 0.3 | - | - | 0.0226 | 0.1715 |
| $\rho_2$ | −0.3 ∼ 0.3 | - | - | 0.0197 | 0.0682 |
| $\beta_2$ | 3 | 2.9886 | 0.1899 | -0.0114 | 0.1893 |
| $\sigma^2$ | 1 | 0.9400 | 0.0971 | -0.0600 | 0.1137 |

Table 2.5: The M-estimator in Case 5: n = 50, T = 5, standard normal errors, and the identical first contiguity weights over 5 × 10 grid, where the model parameters are simulated uniformly on [−0.3, 0.3] for each iteration.

Table 2.6: The M-estimator in Case 6: n = 50, T = 5, standard normal errors and the randomized weight matrices, where for every row two random elements $w_{ij}$ ($j = 1, 2, \dots, n$ and $j \ne i$) are assigned 1, others 0, then row-normalized.

Table 2.2, although the RMSEs are improved because of the longer time period $T$. The submodels perform similarly; their results are shown in Appendix 2.A.4.

2.5 Empirical example

As an applied illustration, we examined state GDP and power usage growth based on panel data for the 48 contiguous U.S. states during 1998-2018. Mahalingam and Orman (2018) use this data set¹ to examine the Granger causality between

¹ The original study uses panel data from 1978 to 2014, but the authors mention that the

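| | TRUE | Mean | sd | Bias | RMSE | $\widehat{se}$ | $\widehat{se}_H$ |
|---|---|---|---|---|---|---|---|
| $\lambda_{11}$ | 0.3 | 0.2939 | 0.0488 | -0.0061 | 0.0490 | 0.0365 | 0.0406 |
| $\lambda_{21}$ | 0.1 | 0.1011 | 0.0425 | 0.0011 | 0.0423 | 0.0438 | 0.0404 |
| $\lambda_{31}$ | 0.1 | 0.0950 | 0.1259 | -0.0050 | 0.1253 | 0.1130 | 0.1209 |
| $\rho_1$ | 0.3 | 0.2991 | 0.0192 | -0.0009 | 0.0191 | 0.0171 | 0.0182 |
| $\beta_1$ | 7 | 7.0064 | 0.1464 | 0.0064 | 0.1458 | 0.1188 | 0.1321 |
| $\lambda_{12}$ | 0.1 | 0.1049 | 0.0378 | 0.0049 | 0.0380 | 0.0376 | 0.0394 |
| $\lambda_{22}$ | 0.2 | 0.1961 | 0.0475 | -0.0039 | 0.0474 | 0.0387 | 0.0388 |
| $\lambda_{32}$ | 0.3 | 0.2983 | 0.1265 | -0.0017 | 0.1259 | 0.1076 | 0.1189 |
| $\rho_2$ | 0.1 | 0.1001 | 0.0437 | 0.0001 | 0.0435 | 0.0351 | 0.0373 |
| $\beta_2$ | 3 | 3.0002 | 0.1408 | 0.0002 | 0.1401 | 0.1243 | 0.1299 |
| $\sigma^2$ | 1 | 0.9670 | 0.0710 | -0.0330 | 0.0780 | 0.0682 | 0.0727 |

Table 2.7: The M-estimator in Case 7: n = 20, T = 20, standard normal errors and the identical first contiguity weights over 5 × 4 grid.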

GDP and power consumption. They concluded that the causality exists but varies among the states, which suggests that the parameters depend on the state and that threshold SDPD models may be applicable. We obtain the Real State Gross Domestic Product (GDP, in millions) data from the Bureau of Economic Analysis (BEA) and total energy consumption from all sources (POW, in billions of BTUs) from the U.S. Energy Information Administration. Moreover, we transform these data into percentage growth by taking the first difference of logarithms. Table 2.8 reports the basic statistics of the data set.

| | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|
| $GDP_g$ | -0.0920 | 0.0072 | 0.0208 | 0.0203 | 0.0340 | 0.2023 |
| $POW_g$ | -0.1758 | -0.0153 | 0.0068 | 0.0046 | 0.0270 | 0.1296 |

Table 2.8: Descriptive statistics of U.S. state-level GDP and power consumption growth, 1998-2018.
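The growth transformation used for the data can be illustrated with a short sketch. The function name `log_growth` and the toy numbers are illustrative assumptions and are not taken from the BEA or EIA data.

```python
import numpy as np

def log_growth(levels):
    """Growth rates as first differences of logarithms along the time axis.

    levels : (T,) or (T, n) array of positive level data, one row per period
    """
    return np.diff(np.log(np.asarray(levels, dtype=float)), axis=0)

# Toy example: a series growing by 10% per period.
growth = log_growth([100.0, 110.0, 121.0])   # each entry equals log(1.1)
```

For small changes the log difference is close to the ordinary percentage change, which is why values such as those in Table 2.8 can be read as growth rates.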

We fit the threshold SDPD model with the threshold variable $z_i$ given by state-level income per capita in 1998 (obtained from BEA). Dividing $\mathbb{R}$ into two regions, $R_1$ and $R_2$, at the median of $z_1, \dots, z_n$, we fit the threshold SDPD model given by, for $z_i \in R_q$, $q = 1, 2$, the following:

$$
\begin{aligned}
GDP_{g,ti} &= \alpha_t + POW_{g,ti}\beta_q + \rho_q GDP_{g,t-1,i} + \lambda_{1q} \sum_{j=1}^{n} w_{ij} GDP_{g,tj} + \lambda_{2q} \sum_{j=1}^{n} w_{ij} GDP_{g,t-1,j} + u_{ti}, \\
u_{ti} &= \lambda_{3q} \sum_{j=1}^{n} w_{ij} u_{tj} + \varepsilon_{ti}, \qquad \varepsilon_{ti} \sim N(0, \sigma^2),
\end{aligned}
\tag{2.21}
$$

where $GDP_g$ and $POW_g$ are the percentage growth of GDP and POW, respectively, $\alpha_t$ is the fixed effect for time period $t$, and $w_{ij}$ are the elements of the spatial weight matrix defined by first-order contiguity of the states. See Figure 2.1 for the two groups of states separated by the threshold.

With Theorems 2 and 3, we can conduct a Wald test to see whether the parameters differ significantly between the two groups. For each parameter $\tau \in \{\lambda_1, \lambda_2, \lambda_3, \rho, \beta\}$, with $\tau_q$ denoting its value in regime $q$, we have the null hypothesis $H_0: \tau_1 = \tau_2$ and the alternative hypothesis $H_1: \tau_1 \ne \tau_2$. Under the null hypothesis, $\hat\tau_1 - \hat\tau_2$ approximately follows $N(0, \mathrm{Var}(\hat\tau_1) + \mathrm{Var}(\hat\tau_2) - 2\mathrm{Cov}(\hat\tau_1, \hat\tau_2))$.
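The test statistic described above can be computed as in the following sketch, which uses only the normal approximation stated in the text. The function name `wald_diff_pvalue` is hypothetical, and the variances and covariance are assumed to be taken from the estimated asymptotic variance matrix.

```python
import math

def wald_diff_pvalue(tau1_hat, tau2_hat, var1, var2, cov12=0.0):
    """Two-sided test of H0: tau_1 = tau_2 based on the normal approximation.

    var1, var2 : estimated variances of the two estimators
    cov12      : estimated covariance between them (0 if ignored)
    """
    se_diff = math.sqrt(var1 + var2 - 2.0 * cov12)
    z = (tau1_hat - tau2_hat) / se_diff
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Example call with made-up inputs (not values from the tables).
z, p = wald_diff_pvalue(0.30, 0.10, 0.05 ** 2, 0.05 ** 2)
```

Squaring `z` gives the usual chi-squared form of the Wald statistic with one degree of freedom, which leads to the same p-value.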

| | FULL | SL | SE | SLTL |
|---|---|---|---|---|
| $\lambda_1$ | 0.1212 (0.0614) | 0.3582 (0.0487) | | 0.3588 (0.0494) |
| $\lambda_2$ | 0.0046 (0.0738) | | | 0.0141 (0.0664) |
| $\lambda_3$ | 0.2437 (0.0584) | | 0.3580 (0.0509) | |
| $\rho$ | -0.0672 (0.0370) | -0.0653 (0.0352) | -0.0662 (0.0370) | -0.0684 (0.0375) |
| $\beta$ | 0.2419 (0.0678) | 0.2420 (0.0654) | 0.2359 (0.0693) | 0.2414 (0.0653) |
| $\sigma^2$ | 0.0006 (0.0001) | 0.0006 (0.0001) | 0.0006 (0.0001) | 0.0006 (0.0001) |

Table 2.9: US state-level GDP and power consumption growth, 1998-2018, non-threshold models

| | FULL | SL | SE | SLTL |
|---|---|---|---|---|
| $\lambda_{11}$ | 0.1159 (0.1049) | 0.3836 (0.0598) | | 0.3780 (0.0636) |
| $\lambda_{21}$ | -0.0886 (0.0837) | | | -0.0731 (0.0752) |
| $\lambda_{31}$ | 0.3408 (0.1365) | | 0.4322 (0.1134) | |
| $\rho_1$ | 0.0483 (0.0406) | 0.0062 (0.0446) | 0.0087 (0.0442) | 0.0470 (0.0397) |
| $\beta_1$ | 0.1966 (0.0813) | 0.1710 (0.0794) | 0.1673 (0.0804) | 0.1803 (0.0766) |
| $\lambda_{12}$ | 0.1753 (0.1214) | 0.3348 (0.0425) | | 0.3431 (0.0657) |
| $\lambda_{22}$ | 0.0860 (0.0828) | | | 0.0776 (0.0710) |
| $\lambda_{32}$ | 0.0955 (0.1666) | | 0.2800 (0.0969) | |
| $\rho_2$ | -0.1529 (0.0470) | -0.1239 (0.0408) | -0.1162 (0.0398) | -0.1557 (0.0453) |
| $\beta_2$ | 0.2665 (0.0829) | 0.2879 (0.0770) | 0.2763 (0.0751) | 0.2806 (0.0792) |
| $\sigma^2$ | 0.0006 (0.0001) | 0.0006 (0.0001) | 0.0006 (0.0001) | 0.0006 (0.0001) |

Table 2.10: US state-level GDP and power consumption growth, 1998-2018, threshold models

| | FULL | SL | SE | SLTL |
|---|---|---|---|---|
| $\lambda_1$ | 0.5732 | 0.4874 | | 0.6775 |
| $\lambda_2$ | 0.0490 | | | 0.0533 |
| $\lambda_3$ | 0.3243 | | 0.4156 | |
| $\rho$ | 0.0010 | 0.0249 | 0.0153 | 0.0006 |
| $\beta$ | 0.5001 | 0.2179 | 0.1818 | 0.2995 |

Table 2.11: US state-level GDP and power consumption growth, 1998-2018, two-sided Wald test, p-values

Figure 2.1: US state-level income per capita in 1998, by median

Tables 2.9 and 2.10 report the estimation results of the non-threshold (2.1) and threshold (2.2) SDPD models, respectively. Table 2.11 presents the p-values of the two-sided Wald test for the estimated parameters in the threshold model. The non-threshold results show the existence of spatial effects in addition to the correlation between power usage and GDP growth revealed by the previous study of Mahalingam and Orman (2018). The results of the threshold models give us further insight, showing significant differences in the spatio-temporal lag and dynamic parameters between the two groups of regions, with stronger spatio-temporal correlation for regions with higher income and significant negative temporal correlation for lower income regions. Meanwhile, power consumption growth tends to have a larger impact on GDP growth in regions with lower income, although the difference is not statistically significant.

2.6 Conclusion

We introduced a threshold extension of the SDPD model with fixed effects to account for the spatial dependencies of parameters often observed in empirical studies. Adapting the M-estimation for SDPD models of Yang (2018) to the threshold extension, we proposed an M-estimator that corrects the bias of the CQMLE in short time panels. The simulation experiments reveal that the M-estimator has less bias and that its standard error is robust against non-normality of the error terms. The empirical application to U.S. GDP and power consumption identifies the dependence of the parameters on per-capita income and thereby clarifies the relationship between them.

One significant restriction of this study is that the threshold needs to be known and time-invariant to guarantee the asymptotic properties of the M-estimation. Time-varying thresholds, which may bring more possibilities, are left for future studies.

2.A Appendix

2.A.1 Proof of Theorem 1

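By the equations in (2.13) and (2.15), we have, for $q = 1, \dots, Q$,

$$
S_c^*(\delta) - \bar S_c^*(\delta) = \begin{cases}
\dfrac{1}{\hat\sigma_{v,M}^2} \Delta\hat u(\delta)' \Omega^{-1}(\lambda_3) \mathbf{W}_0^{(q)} \Delta Y_{-1} - \dfrac{1}{\bar\sigma_{v,M}^2} E[\Delta\bar u(\delta)' \Omega^{-1}(\lambda_3) \mathbf{W}_0^{(q)} \Delta Y_{-1}], \\[2mm]
\dfrac{1}{\hat\sigma_{v,M}^2} \Delta\hat u(\delta)' \Omega^{-1}(\lambda_3) \mathbf{W}_1^{(q)} \Delta Y - \dfrac{1}{\bar\sigma_{v,M}^2} E[\Delta\bar u(\delta)' \Omega^{-1}(\lambda_3) \mathbf{W}_1^{(q)} \Delta Y], \\[2mm]
\dfrac{1}{\hat\sigma_{v,M}^2} \Delta\hat u(\delta)' \Omega^{-1}(\lambda_3) \mathbf{W}_2^{(q)} \Delta Y_{-1} - \dfrac{1}{\bar\sigma_{v,M}^2} E[\Delta\bar u(\delta)' \Omega^{-1}(\lambda_3) \mathbf{W}_2^{(q)} \Delta Y_{-1}], \\[2mm]
\dfrac{1}{2\hat\sigma_{v,M}^2} \Delta\hat u(\delta)' \big(C^{-1} \otimes A_3^{(q)}(\lambda_3)\big) \Delta\hat u(\delta) - \dfrac{1}{2\bar\sigma_{v,M}^2} E[\Delta\bar u(\delta)' \big(C^{-1} \otimes A_3^{(q)}(\lambda_3)\big) \Delta\bar u(\delta)].
\end{cases}
$$

Under Assumption 7, the consistency of $\hat\delta_M$ follows from, for $q = 1, \dots, Q$,

$\Omega^{-1}(\lambda_3) \mathbf{W}_2^{(q)} \Delta Y_{-1} - E[\Delta\bar u(\delta)' \Omega^{-1}(\lambda_3) \mathbf{W}_2^{(q)} \Delta Y_{-1}]| = o_p(1)$,

(f) $\sup_{\delta \in \Delta} \frac{1}{n(T-1)} \big|\Delta\hat u(\delta)' \big(C^{-1} \otimes A_3^{(q)}(\lambda_3)\big) \Delta\hat u(\delta) - E[\Delta\bar u(\delta)' \big(C^{-1} \otimes A_3^{(q)}(\lambda_3)\big) \Delta\bar u(\delta)]\big| = o_p(1)$.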

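With the following notations:

$$
P = \Omega^{-1/2} \Delta X (\Delta X' \Omega^{-1} \Delta X)^{-1} \Delta X' \Omega^{-1/2}, \qquad M = I_{n(T-1)} - P, \qquad
\tilde{B}_1 = \Omega^{-1/2} \mathbf{B}_1, \qquad \tilde{B}_2 = \Omega^{-1/2} \mathbf{B}_2,
$$

we have the identities:

$$
\Omega^{-1/2} \Delta\hat u(\delta) = M(\tilde{B}_1 \Delta Y - \tilde{B}_2 \Delta Y_{-1}),
\tag{2.22}
$$

$$
\Omega^{-1/2} \Delta\bar u(\delta) = M(\tilde{B}_1 E\Delta Y - \tilde{B}_2 E\Delta Y_{-1}) + \tilde{B}_1 (\Delta Y - E\Delta Y) - \tilde{B}_2 (\Delta Y_{-1} - E\Delta Y_{-1})
\tag{2.23}
$$

$$
= M(\tilde{B}_1 \Delta Y - \tilde{B}_2 \Delta Y_{-1}) + P\big\{\tilde{B}_1 (\Delta Y - E\Delta Y) - \tilde{B}_2 (\Delta Y_{-1} - E\Delta Y_{-1})\big\}.
\tag{2.24}
$$

Proof of (a). From (2.23), we have

$$
\bar\sigma_{v,M}^2 = \frac{1}{n(T-1)} \mathrm{tr}\big[\mathrm{Var}(\tilde{B}_1 \Delta Y - \tilde{B}_2 \Delta Y_{-1})\big] + \frac{1}{n(T-1)} (\tilde{B}_1 E\Delta Y - \tilde{B}_2 E\Delta Y_{-1})' M (\tilde{B}_1 E\Delta Y - \tilde{B}_2 E\Delta Y_{-1}).
$$

The second term is non-negative uniformly in $\delta \in \Delta$, since $M$ is non-negative definite. The first term is evaluated as

$$
\frac{1}{n(T-1)} \mathrm{tr}\big[\Omega^{-1} \mathrm{Var}(\mathbf{B}_1 \Delta Y - \mathbf{B}_2 \Delta Y_{-1})\big] \ge \frac{1}{n(T-1)} \gamma_{\min}(C^{-1}) \gamma_{\min}(B_3' B_3)\, \mathrm{tr}\big[\mathrm{Var}(\mathbf{B}_1 \Delta Y - \mathbf{B}_2 \Delta Y_{-1})\big] > c > 0,
$$

uniformly in $\delta \in \Delta$, by Assumption 5.

Proof of (b). By (2.22), we have

$$
\hat\sigma_{v,M}^2(\delta) = \frac{1}{n(T-1)} (\tilde{B}_1 \Delta Y - \tilde{B}_2 \Delta Y_{-1})' M (\tilde{B}_1 \Delta Y - \tilde{B}_2 \Delta Y_{-1}).
$$

It follows from (2.24) that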

ˆ
