Regularized estimation under multiple and mixed-rates asymptotics


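Kyushu University Institutional Repository

Regularized estimation under multiple and mixed-rates asymptotics (original Japanese title: 多重混合型漸近推測における正則化推定)

Shimizu, Yusuke (清水, 優祐)

https://doi.org/10.15017/1806828

Publication information: Kyushu University, 2016, Doctor of Mathematics (博士(数理学)), course doctorate. Version:

Rights: Fulltext available.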


Regularized estimation under multiple and mixed-rates asymptotics

Doctoral dissertation February 14, 2017

Yusuke Shimizu

Graduate School of Mathematics, Kyushu University.

744 Motooka, Nishi-ku, Fukuoka 819-0395, Japan.


Abstract

This doctoral dissertation is based on two original papers [22] and [28]. In M-estimation under standard asymptotics, the weak convergence combined with a large deviation estimate of the associated statistical random field provides us with a general tool for deriving not only the asymptotic distribution of the associated M-estimator, but also the convergence of its moments, where the latter plays an important role in theoretical statistics. Here the standard asymptotics refers to the situation where the statistical random field can be well investigated by a single matrix norming, which, however, may be impossible in several situations including sparsely regularized M-estimation.

Throughout this thesis, we consider the uniform tail-probability estimate of a class of scaled M-estimators under multiple and mixed-rates asymptotics in the sense of [26], where the associated statistical random fields may be non-differentiable and may fail to be partially locally asymptotically quadratic, so that the conventional approach through the polynomial type large deviation inequality (PLDI) developed by [36] does not work directly. To the best of my knowledge, there are no results deriving the PLDI under multiple and mixed-rates asymptotics.

In particular, our results are applied to regularized estimation of an ergodic diffusion process observed at high frequency. The model is described by the Wiener-driven stochastic differential equation

$$dX_t = a(X_t, \alpha)\,dw_t + b(X_t, \beta)\,dt,$$

and for the qualitatively different parameters $\alpha$ and $\beta$ we can estimate the former more quickly than the latter; hence the estimator we will deal with is of multiple-scaling type. In the literature, [8] studied an adaptive-lasso type regularized estimation of the same model, taking the quadratic approximation of the quasi-likelihood function into account, and deduced the oracle property of their estimator. In this study, we derive the asymptotic behavior of a regularized estimator of the diffusion process under a general regularization term, without resorting to convexity at all.

Further, our results enable us to deduce convergence of moments of a wide range of regularized M-estimators, which serves as a critical tool when, for example, analyzing the mean squared prediction error and the bias correction in AIC-type information criteria for tuning-parameter selection in sparse estimation (we refer to, for example, [7], [11], [16], [24], [27], [28], [29], [30], as well as [36]).


Acknowledgements

First, I would like to express my deepest gratitude to Professor Hiroki Masuda for his valuable guidance, suggestions and encouragement. I am grateful to Professor Ryuei Nishii, Professor Yoshihiko Maesono, Associate Professor Yoshiyuki Ninomiya and Associate Professor Kei Hirose for their valuable comments. I also thank my friends for their kind support during my studies. Last but not least, my gratitude goes to my family for their heartfelt assistance in my daily life.

Yusuke Shimizu January, 2017


Contents

1 Introduction

2 Setup

3 Basic asymptotics

3.1 Sparse case

3.1.1 Consistency

3.1.2 Rates of convergence

3.1.3 Sparse consistency

3.1.4 Asymptotic non-degenerate distribution

3.1.5 Uniform tail-probability estimate

3.2 Standard case

3.3 Discussion

3.3.1 Specific choice of regularization term

3.3.2 Prediction-related issues

4 Example: regularized estimation of discretely observed ergodic diffusion process

5 Generalization of the regularization terms

5.1 Settings and asymptotics

5.2 Regularized least-squares estimation for linear regression model


1 Introduction

Suppose that we observe data $X_n$, the distribution of which is indexed by a finite-dimensional parameter $\theta \in \Theta \subset \mathbb{R}^p$. In order to estimate $\theta$ based on $X_n$, we usually introduce an appropriate (quasi-)likelihood or contrast function $H_n : \Omega \times \Theta \to \mathbb{R}$, and estimate an optimal parameter value $\theta_0$ by any point $\hat\theta_n \in \operatorname{argmin} H_n$. For assessing asymptotic performance of $\hat\theta_n$ quantitatively, we look at the statistical random fields

$$\mathbb{M}_n(u;\theta_0) = H_n(\theta_0 + A_n(\theta_0)u) - H_n(\theta_0), \tag{1.1}$$

where $A_n(\theta_0)$ denotes the rate matrix such that $|A_n(\theta_0)| \to 0$ as $n \to \infty$ and the components may decrease at different rates; estimation with multiple rates of convergence has appeared in the literature of, for example, econometrics [4]. Throughout this paper, we use the notation $|A|^2 = \operatorname{tr}(AA^\top)$ for a matrix $A$, with $\top$ denoting the transpose. As is well known, the weak convergence of $\mathbb{M}_n$ to some $\mathbb{M}_0$ over compact sets, the identifiability condition on $\mathbb{M}_0$, and the tightness of the scaled estimator $\hat u_n := A_n(\theta_0)^{-1}(\hat\theta_n - \theta_0)$ make the "argmin" functional continuous for $\mathbb{M}_n$: $\hat u_n \in \operatorname{argmin} \mathbb{M}_n \xrightarrow{\mathcal{L}} \operatorname{argmin} \mathbb{M}_0$. See e.g., [33, Section 5]. Further, when concerned with moments of $\hat u_n$-dependent statistics, such as the mean squared error, more than the weak convergence is required. Then the polynomial type large deviation inequality (PLDI) of [36], which estimates the tail of $\mathcal{L}(\hat u_n)$ in such a way that

$$\sup_{r>0}\,\sup_{n>0}\, r^{L} P(|\hat u_n| \ge r) < \infty \tag{1.2}$$

for a given $L > 0$, plays an important role: we set $\hat u_0 \in \operatorname{argmin} \mathbb{M}_0$ for a random variable $\hat u_0$. The moment convergence

$$E[|\hat u_n|^{q}] \to E[|\hat u_0|^{q}], \qquad q > 0, \tag{1.3}$$

holds if there exists a $q' > q$ such that $\sup_{n>0} E[|\hat u_n|^{q'}] < \infty$. Let us assume that the PLDI (1.2) holds for some $L > q'$. Then we obtain

$$\sup_{n>0} E[|\hat u_n|^{q'}] = \sup_{n>0} \int_0^\infty P(|\hat u_n|^{q'} > s)\,ds < \infty.$$
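The finiteness claimed in the last display can be checked by a short calculation. The following is only a sketch; the constant $C_L$ is shorthand introduced here for the left-hand side of (1.2) and is not notation from the thesis.

```latex
% C_L := \sup_{r>0}\sup_{n>0} r^{L} P(|\hat u_n| \ge r) < \infty   (assumed shorthand for (1.2))
\begin{align*}
\sup_{n>0} E\big[|\hat u_n|^{q'}\big]
 &= \sup_{n>0} \int_0^\infty P\big(|\hat u_n| > s^{1/q'}\big)\,ds \\
 &\le 1 + \sup_{n>0} \int_1^\infty P\big(|\hat u_n| \ge s^{1/q'}\big)\,ds \\
 &\le 1 + C_L \int_1^\infty s^{-L/q'}\,ds
  = 1 + \frac{C_L\, q'}{L - q'} < \infty .
\end{align*}
% The last integral converges because L > q'.
```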

It has been known that the PLDI can be proved under modest conditions when $\mathbb{M}_n$ admits a locally asymptotically quadratic (LAQ) structure, which is satisfied in many situations including asymptotically mixed-normal type models under multi-scaling. Here, in the multi-scaling case where the random vector $\hat\theta_n$ converges at different rates, the LAQ structure at the "first" step takes the form¹

$$\mathbb{M}_n(u,\tau;\theta_0) = \Delta_n(\tau;\theta_0)[u] + \frac{1}{2}\,\Gamma_0(\tau;\theta_0)[u,u] + r_n(u,\tau;\theta_0), \tag{1.4}$$

where we are required to verify, among others, the following conditions, which are to hold uniformly in "the second- and the subsequent-step" parameter $\tau$, which is regarded as a nuisance parameter in the first step: sufficient integrability of the random linear form $\Delta_n(\tau;\theta_0)$; the non-degeneracy of the possibly random bilinear form $\Gamma_0(\tau;\theta_0)$; and a kind of "non-explosiveness" of the scaled remainder term $(1+|u|^2)^{-1} r_n(u,\tau;\theta_0)$, where the $u$-pointwise limit of $r_n(u,\tau;\theta_0)$, whenever it exists, typically equals zero. For notational convenience, here and in the sequel we write

$$A[b^1,\dots,b^m] = \sum_{i_1,\dots,i_m} A_{i_1\dots i_m}\, b^1_{i_1}\cdots b^m_{i_m}$$

for multilinear forms $A = \{A_{i_1\dots i_m}\}_{i_1,\dots,i_m}$ and $b^j = \{b^j_{i_k}\}_{i_k}$; sometimes the $b^j$ themselves may be tensors, hence the resulting form is also a multilinear form, e.g. $A[B,C] = \{\sum_{i,j} A_{ij}B_{ik}C_{jl}\}_{k,l}$ for $A = \{A_{ij}\}$, $B = \{B_{ik}\}$, and $C = \{C_{jl}\}$. See [36, Section 5] for a detailed account of the above-mentioned multistep procedures. In many standard statistical models, the form (1.1) is enough to find the asymptotic distribution of all the components of $\hat\theta_n$; this case may be called the standard asymptotics, to be briefly discussed in Section 3.2.
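The multilinear-form notation can be made concrete with a small sketch. The function names and the toy matrices below are illustrative assumptions; only the index conventions $A[u,v] = \sum_{i,j} A_{ij}u_i v_j$ and $A[B,C] = \{\sum_{i,j} A_{ij}B_{ik}C_{jl}\}_{k,l}$ come from the text.

```python
def bilinear(A, u, v):
    """A[u, v] = sum_{i,j} A[i][j] * u[i] * v[j] for a matrix A and vectors u, v."""
    return sum(A[i][j] * u[i] * v[j]
               for i in range(len(u)) for j in range(len(v)))


def bilinear_tensor(A, B, C):
    """A[B, C] = { sum_{i,j} A[i][j] * B[i][k] * C[j][l] }_{k,l} for matrices B, C."""
    n_k, n_l = len(B[0]), len(C[0])
    return [[sum(A[i][j] * B[i][k] * C[j][l]
                 for i in range(len(B)) for j in range(len(C)))
             for l in range(n_l)]
            for k in range(n_k)]


A = [[1, 2], [3, 4]]
I2 = [[1, 0], [0, 1]]
print(bilinear(A, [1, 0], [0, 1]))   # picks out the entry A[0][1]
print(bilinear_tensor(A, I2, I2))    # identity arguments return A itself
```

In matrix terms, `bilinear_tensor(A, B, C)` computes $B^\top A C$, and `bilinear(A, u, u)` is the quadratic form appearing in (1.4).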

In principle, any M-estimation procedure, typically producing an asymptotically mixed-normally distributed estimator, may have its "regularized" counterpart; we refer to [5] for some general background on statistical regularization. We are concerned here with extending the random-field structure to deal with possibly dependent data and a broader class of regularized M-estimation under the "mixed-rates" asymptotics. In particular, we will show how the PLDI of [36] can carry over to mixed-rates M-estimation where the target statistical random fields may have components converging at different rates; we refer to [22] and [28] for details in the case of linear regression with a general regularization term. We will adopt the very general theoretical framework developed by [26, Sections 2 and 3]. It will be shown that the PLDI criterion of [36] can apply to some mixed-rates cases, while it may require some modification when the key LAQ structure of the original statistical random field fails to hold; it may even happen that $r_n(u,\tau;\theta_0)$ diverges in probability. Indeed, most of the existing sparse estimation procedures may fall into this type of asymptotics. Consequently, with a true parameter being fixed, our moment-convergence result provides yet another theoretical insight about regularized estimation, the well-established methodology especially in variable and/or model selection.² The logic of sparse and, more generally, shrinkage estimation would be best and most clearly described in the context of multiple linear regression, with many deep theoretical interpretations such as geometrical (projection) characterization, variable selection, stabilized prediction performance, etc. See e.g. [14, Chapter 3].

¹ The sign in front of the quadratic term $(1/2)\Gamma_0(\tau;\theta_0)[u,u]$ is different from the original LAQ of [36] since we consider minimization of (1.1).

There exist many previous works on moment convergence of estimators. It serves as a fundamental tool when analyzing the asymptotic behavior of the expectations of statistics depending on the estimator, such as asymptotic bias and mean squared prediction error; to mention just a few, we refer to [7], [11], [16], [24], [27], [28], [29], [30], as well as [36]. Also, the convergence of moments of the regularized sparse maximum-likelihood estimator of a generalized linear model was deduced in [32] to verify AIC-type variable selection. Further, [1] recently discussed optimal selection of random and k-fold cross-validation estimators, the theoretical backbone of which involves some moment bounds of the estimators used; the related paper [2] studied the uniform integrability of the ordinary least-squares estimator in the linear regression setting.

This thesis is organized as follows. Section 2 describes our model setup, for which a series of basic asymptotic statements are given in Section 3, where, in particular, the polynomial type large deviation estimate of the underlying statistical random fields will play a crucial role for the uniform tail-probability estimate concerning the scaled M-estimator; although the asymptotics is classical, in the literature there seems to exist no unified tool that can handle general M-estimation of multiple-rates and possibly mixed-rates type, and importantly, of possibly non-differentiable and non-convex type; in Section 3.3.1, we will briefly discuss a naive yet formal example of component-wise tuning-parameter choice. The shrinkage effect is still useful for dependent-data models: it can effectively diminish non-significant factors involved in the model, resulting in a model-complexity assessment and/or selection. In Section 4 we will apply the foregoing results to regularized estimation of an ergodic diffusion process observed at high frequency. The model is described by the Wiener-driven stochastic differential equation

$$dX_t = a(X_t, \alpha)\,dw_t + b(X_t, \beta)\,dt,$$

and for the qualitatively different parameters $\alpha$ and $\beta$ we can estimate the former more quickly than the latter; hence the estimator we will deal with is of multiple-scaling type. In the literature, [8] previously studied an adaptive-lasso type regularized estimation of the same model, taking the quadratic approximation of the quasi-likelihood function into account, and deduced the oracle property of their estimator. In this study, we will derive the asymptotic behavior of a regularized estimator of the diffusion process under a general regularization term, without resorting to convexity at all. In Section 5, we generalize the regularization terms, and give some examples by considering the regularized least-squares estimator for the linear regression model. The moment convergence of the estimator can be derived from our results.

² It should be noted that sparse estimation has received a mixed reception because of a kind of estimation singularity similar to that of the classical Hodges' super-efficient estimator. The unpleasant feature of the sparse-type estimator essentially stems from non-uniformity in weak convergence with respect to the true value of the parameters; see [24] for details.
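As a rough numerical illustration of why the diffusion parameter can be estimated more quickly than the drift parameter, the following sketch simulates a hypothetical special case by the Euler-Maruyama scheme. The coefficients $a(x,\alpha) = \alpha$ and $b(x,\beta) = -\beta x$, the step size, the sample size, and all function names are assumptions chosen for illustration; the two estimators (realized quadratic variation and a least-squares drift fit) are standard textbook choices, not the regularized estimator studied in this thesis.

```python
import math
import random


def simulate_ou(alpha, beta, h, n, seed=0):
    """Euler-Maruyama path of dX = alpha dw - beta X dt on the grid 0, h, ..., n h."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n):
        dw = math.sqrt(h) * rng.gauss(0.0, 1.0)   # Brownian increment over one step
        x.append(x[-1] + alpha * dw - beta * x[-1] * h)
    return x


def estimate_alpha_sq(x, h):
    """Realized quadratic variation divided by the time horizon n h."""
    n = len(x) - 1
    return sum((x[j + 1] - x[j]) ** 2 for j in range(n)) / (n * h)


def estimate_beta(x, h):
    """Least-squares fit of the increments on -X h (a simple drift estimator)."""
    n = len(x) - 1
    num = -sum(x[j] * (x[j + 1] - x[j]) for j in range(n))
    den = h * sum(x[j] ** 2 for j in range(n))
    return num / den


path = simulate_ou(alpha=1.0, beta=1.0, h=0.01, n=20000)
print(estimate_alpha_sq(path, 0.01), estimate_beta(path, 0.01))
```

In this toy setting the error of the first estimator is of order $n^{-1/2}$, while that of the second is of order $(nh)^{-1/2}$, which is the larger of the two when $h \to 0$.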

2 Setup

Let us begin with a description of the basic model setup for Section 3. Throughout we are given an underlying probability space $(\Omega, \mathcal{F}, P)$. For the purpose of accelerating estimation performance, we consider M-estimation of an additive regularization type. We will focus on the case of two-scaling, where the target statistical parameter $\theta \in \Theta$ is divided into two parts, say $\theta = (\alpha, \beta)$; an extension to cases of more-than-two scaling is a trivial matter, though it makes the notation messy. We set $\alpha \in \mathbb{R}^p$ and $\beta \in \mathbb{R}^q$, and $\Theta = \Theta_\alpha \times \Theta_\beta$ to be a bounded convex domain in $\mathbb{R}^{p+q}$.

We are given a function $M_n : \Omega \times \Theta \to \mathbb{R}$, and (possibly random) regularization functions $R^a_n(\alpha)$ and $R^b_n(\beta)$. We then consider contrast functions $H_n : \Omega \times \Theta \to \mathbb{R}$ of the form

$$H_n(\theta) = H_n(\alpha, \beta) = M_n(\alpha, \beta) + R^a_n(\alpha) + R^b_n(\beta). \tag{2.1}$$

The associated regularized M-estimator is defined to be any element (for brevity, implicitly assumed to exist)

$$\hat\theta_n \in \operatorname*{argmin}_{\theta \in \Theta} H_n(\theta).$$
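To make the additive form (2.1) concrete, here is a minimal sketch in a hypothetical one-parameter setting: the principal part is $M_n(\theta) = \sum_i (x_i - \theta)^2$ and the only penalty is $\lambda|\theta|$ (there is no second component $\beta$). These choices, the toy data and the function names are illustrative assumptions, not the models treated in this thesis. In this toy case the minimizer has a closed form (soft-thresholding of the sample mean), which the sketch compares with a brute-force grid search.

```python
def contrast(theta, xs, lam):
    """H_n(theta) = M_n(theta) + R_n(theta) with a squared-error principal part."""
    return sum((x - theta) ** 2 for x in xs) + lam * abs(theta)


def soft_threshold_mean(xs, lam):
    """Closed-form minimizer: shrink the sample mean toward 0 by lam / (2 n)."""
    n = len(xs)
    mean = sum(xs) / n
    shrink = lam / (2 * n)
    if abs(mean) <= shrink:
        return 0.0                      # the penalty sets the estimate exactly to zero
    return mean - shrink if mean > 0 else mean + shrink


def grid_argmin(xs, lam, lo=-5.0, hi=5.0, steps=20000):
    """Brute-force minimizer of the contrast over an equally spaced grid."""
    grid = (lo + (hi - lo) * i / steps for i in range(steps + 1))
    return min(grid, key=lambda t: contrast(t, xs, lam))


xs = [0.9, 1.4, 0.7, 1.2, 0.8]           # toy data with sample mean 1.0
print(soft_threshold_mean(xs, lam=2.0))  # sample mean minus 2.0 / (2 * 5)
print(grid_argmin(xs, lam=2.0))
```

For a large enough penalty weight the estimate is exactly zero, which is the mechanism behind the sparse consistency discussed in Section 3.1.3.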

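We quantitatively distinguish zero parameters from non-zero ones. We denote by $\theta_0 = (\alpha_0, \beta_0)$ the value we want to estimate (typically the true value of $\theta$) and assume that it takes the form $\alpha_0 = (\alpha^\circ_0, \alpha^\star_0) = ((\alpha^\circ_{0,k'})_{k'}, (\alpha^\star_{0,k''})_{k''})$ and $\beta_0 = (\beta^\circ_0, \beta^\star_0) = ((\beta^\circ_{0,l'})_{l'}, (\beta^\star_{0,l''})_{l''})$ with

$$\alpha^\circ_{0,k'} = 0, \qquad \beta^\circ_{0,l'} = 0, \qquad \alpha^\star_{0,k''} \ne 0, \qquad \beta^\star_{0,l''} \ne 0.$$

We set $\alpha^\circ_0 \in \mathbb{R}^{p^\circ}$, $\beta^\circ_0 \in \mathbb{R}^{q^\circ}$, $\alpha^\star_0 \in \mathbb{R}^{p^\star}$ and $\beta^\star_0 \in \mathbb{R}^{q^\star}$ with $p^\circ, q^\circ, p^\star, q^\star \in \mathbb{N}$; then, $p = p^\circ + p^\star$ and $q = q^\circ + q^\star$. Correspondingly, we write $\theta = (\theta^\circ, \theta^\star)$ with $\theta^\circ = (\alpha^\circ, \beta^\circ)$ and $\theta^\star = (\alpha^\star, \beta^\star)$ in the obvious manner. We also write $\hat\theta_n = (\hat\alpha_n, \hat\beta_n) = (\hat\alpha^\circ_n, \hat\alpha^\star_n, \hat\beta^\circ_n, \hat\beta^\star_n)$ with $\hat\theta^\circ_n = (\hat\alpha^\circ_n, \hat\beta^\circ_n)$ and $\hat\theta^\star_n = (\hat\alpha^\star_n, \hat\beta^\star_n)$. For clarity we focus on the following regularization terms:

$$R^a_n(\alpha) = \sum_{k=1}^{p} \lambda^a_{n,k} R^a(\alpha_k), \qquad R^b_n(\beta) = \sum_{l=1}^{q} \lambda^b_{n,l} R^b(\beta_l). \tag{2.2}$$

This form subsumes many of the existing types, e.g., [12] and [37] for the linear regression model, although it is not essential for our basic asymptotic results given in Sections 3.1 and 3.2. In fact, we will generalize the regularization terms in Section 5. For convenience of reference in the regularity conditions given later, we write: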

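$$R^a_n(\alpha) = R^a_n(\alpha^\circ) + R^a_n(\alpha^\star) = \sum_{k'=1}^{p^\circ} \lambda^a_{n,k'} R^a(\alpha^\circ_{k'}) + \sum_{k''=1}^{p^\star} \lambda^a_{n,k''} R^a(\alpha^\star_{k''}), \tag{2.3}$$

$$R^b_n(\beta) = R^b_n(\beta^\circ) + R^b_n(\beta^\star) = \sum_{l'=1}^{q^\circ} \lambda^b_{n,l'} R^b(\beta^\circ_{l'}) + \sum_{l''=1}^{q^\star} \lambda^b_{n,l''} R^b(\beta^\star_{l''}), \tag{2.4}$$

where:

• $\lambda^a_{n,k'}$, $\lambda^a_{n,k''}$, $\lambda^b_{n,l'}$ and $\lambda^b_{n,l''}$ are non-negative random variables;

• $R^a(\cdot)$ and $R^b(\cdot)$ are non-random non-negative functions on $\mathbb{R}$ such that $R^a(0) = R^b(0) = 0$;

• For all $a_0, b_0 \ne 0$ and $k > 0$, there exists a constant $C = C(a_0, b_0, k) > 0$ such that

$$\sup_{(a,b):\,|a| \vee |b| \le k} \frac{|R^a(a) - R^a(a_0)| + |R^b(b) - R^b(b_0)|}{|a - a_0| + |b - b_0|} \le C. \tag{2.5}$$

The last condition (local Lipschitz continuity) is a technical one. Further conditions on the ingredients $M_n$, $R^a_n(\alpha)$ and $R^b_n(\beta)$ will be imposed later on; in Section 3.3.1, we will briefly discuss how to set the regularization terms in naive yet specific ways.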
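Condition (2.5) can be explored numerically for a given penalty. The sketch below treats a one-function analogue of (2.5) (only the ratio for a single penalty) with the non-convex bridge-type choice $R(x) = |x|^{1/2}$; this penalty, the grid and the function names are illustrative assumptions.

```python
def bridge_penalty(x, q=0.5):
    """Non-convex bridge-type penalty R(x) = |x|^q (q = 1/2 by default), with R(0) = 0."""
    return abs(x) ** q


def lipschitz_ratio_sup(penalty, a0, k, steps=4000):
    """Grid approximation of sup_{|a| <= k} |R(a) - R(a0)| / |a - a0| for a fixed a0 != 0."""
    best = 0.0
    for i in range(steps + 1):
        a = -k + 2.0 * k * i / steps
        if a == a0:
            continue                     # the ratio is undefined at a = a0
        best = max(best, abs(penalty(a) - penalty(a0)) / abs(a - a0))
    return best


print(lipschitz_ratio_sup(bridge_penalty, a0=1.0, k=2.0))
print(lipschitz_ratio_sup(bridge_penalty, a0=0.01, k=2.0))
```

For this penalty the ratio stays bounded for each fixed $a_0 \ne 0$, but the bound grows as $a_0$ approaches zero, which is consistent with (2.5) being required only for $a_0, b_0 \ne 0$ with a constant depending on them.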

We will deal with a situation where the non-zero part of the first component $\alpha$ can be estimated faster than that of the second component $\beta$; more specifically, we will suppose that the sequence

$$\big( s_n^{-1}(\hat\alpha^\star_n - \alpha^\star_0),\; t_n^{-1}(\hat\beta^\star_n - \beta^\star_0) \big)$$

has a non-trivial asymptotic distribution for some possibly different positive sequences $(s_n)$ and $(t_n)$, both tending to zero and satisfying $s_n = o(t_n)$.

Although not explicitly mentioned, we presuppose that the "principal" part $M_n(\theta)$ reasonably makes sense even without the regularization terms $R^a_n(\alpha) + R^b_n(\beta)$; most typically, the un-regularized case, where $H_n(\theta) = M_n(\theta)$, corresponds to the negative of a (quasi) log-likelihood. We should note that the additive regularization can be interpreted as incorporating prior information about the parameter of interest; see Section 3.3.1.

3 Basic asymptotics

Under the setting described in Section 2, Section 3.1 focuses on the sparse asymptotics, where the underlying statistical random field is of mixed-rates type. On the other hand, it is possible to treat the standard asymptotics as well, where the localization via matrix (1.1) can completely determine the asymptotic distribution of $\hat\theta_n$; we consider this standard case in Section 3.2.

3.1 Sparse case

In this section, we consider regularity conditions under which the following properties hold without assuming the convexity of $H_n$.

(1) The (weak) consistency of $\hat\theta_n = (\hat\theta^\circ_n, \hat\theta^\star_n)$.

(2) The asymptotic distributions:

(a) The sparse consistency of $\hat\theta^\circ_n$, i.e. $P(\hat\theta^\circ_n = 0) \to 1$;

(b) The asymptotic distribution of $\hat\theta^\star_n$ at possibly multiple rates of convergence (via a matrix norming).

(3) The uniform tail-probability estimate of $\hat\theta_n = (\hat\theta^\circ_n, \hat\theta^\star_n)$.

3.1.1 Consistency

We impose the uniform law of large numbers plus identifiability condition, and additionally some stochastic-order conditions on the regularization terms.

Recall that the parameter space Θ is bounded.

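Assumption 3.1

(1) $(s_n)$ and $(t_n)$ are positive nonrandom sequences such that $\max(s_n, t_n) \to 0$ and that $s_n = o(t_n)$.

(2) There exist continuous random functions $M^a_0 : \Omega \times \Theta_\alpha \to \mathbb{R}$ and $M^b_0 : \Omega \times \Theta \to \mathbb{R}$ such that:

(a) $\displaystyle \sup_{\alpha} \big| s_n^2 \{M_n(\alpha, \beta_0) - M_n(\alpha_0, \beta_0)\} - M^a_0(\alpha) \big| + \sup_{\theta} \big| t_n^2 \{M_n(\alpha, \beta) - M_n(\alpha, \beta_0)\} - M^b_0(\theta) \big| \xrightarrow{p} 0$;

(b) $\displaystyle \operatorname*{argmin}_{\alpha} M^a_0(\alpha) = \{\alpha_0\}$ a.s. and $\displaystyle \operatorname*{argmin}_{\beta} M^b_0(\alpha_0, \beta) = \{\beta_0\}$ a.s.

(3) $\displaystyle \sup_{\alpha} s_n^2 R^a_n(\alpha) + \sup_{\beta} t_n^2 R^b_n(\beta) \xrightarrow{p} 0$.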

We will take advantage of the general results given in [26] concerning mixed-rates asymptotics.

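Lemma 3.2 Assume that a random function

$$\mathbb{M}_n(u, v) = \bar a_n f_n(u) + \bar b_n g_n(u, v), \qquad (u, v) \in \mathbb{R}^p \times \mathbb{R}^q,$$

and random variables $(\hat u_n, \hat v_n)$ and $(\hat u_0, \hat v_0)$ satisfy the following conditions:

(L1) $\bar a_n$ and $\bar b_n$ are positive numbers such that $\bar b_n = o(\bar a_n)$;

(L2) $(f_n(\cdot), g_n(\cdot,\cdot)) \xrightarrow{\mathcal{L}} (f_0(\cdot), g_0(\cdot,\cdot))$ in $C(K_f \times K_g)$ for every compact $K_f \times K_g \subset \mathbb{R}^p \times \mathbb{R}^q$;

(L3) $\mathbb{M}_n(\hat u_n, \hat v_n) \le \inf_{(u,v)} \mathbb{M}_n(u, v) + o_p(\bar b_n)$;

(L4) $(\hat u_n, \hat v_n) = O_p(1)$;

(L5) $u \mapsto f_0(u)$ has a.s. unique minimum at $u = \hat u_0$;

(L6) $v \mapsto g_0(\hat u_0, v)$ has a.s. unique minimum at $v = \hat v_0$.

Then we have the following.

(1) $(\hat u_n, \hat v_n) \xrightarrow{\mathcal{L}} (\hat u_0, \hat v_0)$ under the conditions (L1) to (L6).

(2) $\hat u_n \xrightarrow{\mathcal{L}} \hat u_0$ under the conditions (L1) to (L5).

Proof The first claim is just a simplified version of [26, Theorem 1]. The second one follows on applying the usual argmax theorem to the rescaled function $u \mapsto \bar a_n^{-1} \mathbb{M}_n(u, \hat v_n) = f_n(u) + (\bar b_n / \bar a_n)\, g_n(u, \hat v_n)$, which admits an approximate minimizer $\hat u_n$ and weakly converges to $f_0$ in $C(K_f)$ for every compact $K_f \subset \mathbb{R}^p$. □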

Remark 3.3 Although we do not use the second claim of Lemma 3.2 in this study, it may be useful when considering stepwise estimation, where there is an original contrast (or quasi-likelihood) function but some step-by-step strategy is taken for estimating parameter components that can be estimated more quickly than the others. See [31] for an ergodic diffusion model. □

To apply the first claim of Lemma 3.2 under Assumption 3.1 with $(u, v) = (\alpha, \beta)$, we set $\mathbb{M}_n(\alpha, \beta) = H_n(\alpha, \beta) - H_n(\alpha_0, \beta_0)$, $\bar a_n = s_n^{-2}$ and $\bar b_n = t_n^{-2}$:

$$f_n(\alpha) = s_n^2 \big( M_n(\alpha, \beta_0) - M_n(\alpha_0, \beta_0) + R^a_n(\alpha) - R^a_n(\alpha_0) \big),$$

$$g_n(\alpha, \beta) = t_n^2 \big( M_n(\alpha, \beta) - M_n(\alpha, \beta_0) + R^b_n(\beta) - R^b_n(\beta_0) \big).$$

According to the a.s. continuity of the random functions $\alpha \mapsto M^a_0(\alpha)$ and $\theta \mapsto M^b_0(\theta)$, it is straightforward to verify all the conditions in Lemma 3.2 under Assumption 3.1, hence the following claim:

Theorem 3.4 We have $\hat\theta_n \xrightarrow{p} \theta_0$ under Assumption 3.1.
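The choice of $f_n$ and $g_n$ made before Theorem 3.4 can be checked directly. The following sketch uses only the form (2.1) of $H_n$ and the normings $\bar a_n = s_n^{-2}$, $\bar b_n = t_n^{-2}$; it confirms the decomposition required in Lemma 3.2 and condition (L1).

```latex
\begin{align*}
\bar a_n f_n(\alpha) + \bar b_n g_n(\alpha,\beta)
 &= \{M_n(\alpha,\beta_0) - M_n(\alpha_0,\beta_0)\} + \{M_n(\alpha,\beta) - M_n(\alpha,\beta_0)\} \\
 &\quad + \{R^a_n(\alpha) - R^a_n(\alpha_0)\} + \{R^b_n(\beta) - R^b_n(\beta_0)\} \\
 &= H_n(\alpha,\beta) - H_n(\alpha_0,\beta_0) = \mathbb{M}_n(\alpha,\beta), \\
\frac{\bar b_n}{\bar a_n}
 &= \Big(\frac{s_n}{t_n}\Big)^{2} \to 0 \quad \text{since } s_n = o(t_n).
\end{align*}
```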

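3.1.2 Rates of convergence

Next we prove $(\hat u_n, \hat v_n) = O_p(1)$, where

$$\hat u_n := s_n^{-1}(\hat\alpha_n - \alpha_0), \qquad \hat v_n := t_n^{-1}(\hat\beta_n - \beta_0), \tag{3.1}$$

under additional conditions. To this end we introduce the following general result, which is a simplified version of [26, Lemma 1], to deduce the "correct" convergence rate of $\hat\theta^\star_n$ and a "preliminary" convergence rate of $\hat\theta^\circ_n$. Moreover, the latter preliminary rate may also be used to verify the conditions for the sparse consistency of $\hat\theta^\circ_n$ (Section 3.1.3). Let $[a]_+ := \max(a, 0)$ for $a \in \mathbb{R}$.

Lemma 3.5 Let $\xi$ denote either $\alpha$ or $\beta$, and assume that the real-valued random function $\mathbb{H}_n(\xi)$ satisfies the following conditions:

(1) $\mathbb{H}_n(\hat\xi_n) \le \mathbb{H}_n(\xi_0)$ a.s.

(2) There exist random functions $U_n(\xi)$ and $V_n(\xi)$ such that

$$\mathbb{H}_n(\xi) - \mathbb{H}_n(\xi_0) = U_n(\xi) - V_n(\xi),$$

where, for some random variable $U_0 > 0$ a.s., constants $0 \le \gamma < \rho$, and a positive nonrandom sequence $(k_n)$ such that $k_n \to 0$, we have:

(a) $P\big( U_n(\hat\xi_n) \ge |\hat\xi_n - \xi_0|^{\rho}\, U_0 \big) \to 1$;

(b) $[V_n(\hat\xi_n)]_+ = O_p\big(k_n |\hat\xi_n - \xi_0|^{\gamma}\big)$.

Then $k_n^{-1/(\rho-\gamma)}(\hat\xi_n - \xi_0) = O_p(1)$.

To deduce Lemma 3.5, observe that on the set $\{|\hat\xi_n - \xi_0|^{\rho} \le U_0^{-1} U_n(\hat\xi_n)\}$ we have

$$|\hat\xi_n - \xi_0|^{\rho} \le U_0^{-1}\{V_n(\hat\xi_n) + \mathbb{H}_n(\hat\xi_n) - \mathbb{H}_n(\xi_0)\} \le U_0^{-1} V_n(\hat\xi_n) \le U_0^{-1}[V_n(\hat\xi_n)]_+ = L_n k_n |\hat\xi_n - \xi_0|^{\gamma},$$

where $L_n = O_p(1)$. By applying [26, Lemma 3], for any $\delta > 0$ we can find a tight sequence $(N_n)$ in $\mathbb{R}$ for which $L_n k_n |\hat\xi_n - \xi_0|^{\gamma} \le \delta |\hat\xi_n - \xi_0|^{\rho} + N_n k_n^{\rho/(\rho-\gamma)}$. Picking a $\delta < 1$ and absorbing the term $\delta|\hat\xi_n - \xi_0|^{\rho}$ into the left-hand side, we conclude that $|\hat\xi_n - \xi_0| \le k_n^{1/(\rho-\gamma)} N_n'$ for another tight sequence $(N_n')$, hence the assertion.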
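The absorbing inequality used in the argument above can be made explicit in the deterministic case. The sketch below uses elementary calculus (maximizing $x \mapsto Lkx^{\gamma} - \delta x^{\rho}$ over $x \ge 0$) to obtain one admissible constant; this explicit constant and the function names are an illustration added here under the assumption $0 \le \gamma < \rho$, and are not taken from [26, Lemma 3].

```python
def absorbing_constant(L, delta, gamma, rho):
    """One N with L*k*x**gamma <= delta*x**rho + N*k**(rho/(rho-gamma)) for all x, k >= 0."""
    if gamma == 0:
        return L                         # x**0 = 1, so the bound holds with N = L
    c = (1 - gamma / rho) * (gamma / (rho * delta)) ** (gamma / (rho - gamma))
    return c * L ** (rho / (rho - gamma))


def check_inequality(L, delta, gamma, rho, ks, xs):
    """Verify the inequality on a finite grid, allowing for rounding error."""
    N = absorbing_constant(L, delta, gamma, rho)
    return all(
        L * k * x ** gamma <= delta * x ** rho + N * k ** (rho / (rho - gamma)) + 1e-9
        for k in ks for x in xs
    )


ks = [10.0 ** (-j) for j in range(6)]
xs = [0.01 * i for i in range(1, 1001)]
print(absorbing_constant(L=3.0, delta=0.5, gamma=1, rho=2))   # equals L**2 / (4 * delta)
print(check_inequality(3.0, 0.5, 1, 2, ks, xs))
```

With $\gamma = 1$ and $\rho = 2$, the case used later with $k_n = s_n$, the constant reduces to $L^2/(4\delta)$, i.e. the usual arithmetic-geometric mean bound.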

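In what follows, for any square matrix $A$ we write $A^{\otimes 2} = AA^\top$ and denote by $\lambda_{\min}(A)$ the smallest eigenvalue of $A$. Coming back to our model, we next impose:

Assumption 3.6

(1) $M_n \in C^3(\Theta)$ a.s., and it holds that:

(a) $\displaystyle \sup_{\beta} |s_n \partial_\alpha M_n(\alpha_0, \beta)| + |t_n \partial_\beta M_n(\theta_0)| = O_p(1)$;

(b) $\displaystyle \sup_{\alpha} |s_n t_n \partial_\alpha \partial_\beta M_n(\alpha, \beta_0)| = O_p(1)$;

(c) $\displaystyle \sup_{\theta} |s_n^2 \partial_\zeta \partial_\alpha^2 M_n(\theta)| + \sup_{\theta} |t_n^2 \partial_\zeta \partial_\beta^2 M_n(\theta)| = O_p(1)$ for $\zeta = \alpha, \beta$;

(d) There exist symmetric random functions $\Gamma^\alpha_0 : \Omega \times \Theta_\alpha \to \mathbb{R}^p \otimes \mathbb{R}^p$ and $\Gamma^\beta_0 : \Omega \times \Theta \to \mathbb{R}^q \otimes \mathbb{R}^q$ such that

$$|s_n^2 \partial_\alpha^2 M_n(\theta_0) - \Gamma^\alpha_0(\alpha_0)| + |t_n^2 \partial_\beta^2 M_n(\theta_0) - \Gamma^\beta_0(\theta_0)| \xrightarrow{p} 0,$$

with $\lambda_{\min}(\Gamma^\alpha_0(\alpha_0)) \wedge \lambda_{\min}(\Gamma^\beta_0(\theta_0)) > 0$ a.s.

(2) $s_n \lambda^a_{n,k''} = O_p(1)$ and $t_n \lambda^b_{n,l''} = O_p(1)$ for each $k''$ and $l''$.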

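Remark 3.7 Note that Assumption 3.6.2 is only concerned with the non-zero parameter parts, which of course are unknown a priori in practice; such a situation will appear a few times later. Hence, as is well recognized as a common situation in adaptive type sparse estimation, some appropriate data-driven choices of the weights are desirable. There would be many possibilities for this issue. See Section 3.3.1 for more discussions. □

Let Assumptions 3.1 and 3.6 hold; then, $\hat\theta_n \xrightarrow{p} \theta_0$ by Theorem 3.4. We apply Lemma 3.5 separately for proving $\hat u_n = O_p(1)$ and $\hat v_n = O_p(1)$. To show $\hat u_n = O_p(1)$, set

$$\mathbb{H}_n(\alpha) = s_n^2 H_n(\alpha, \hat\beta_n).$$

For notational convenience, for a random function $F_n(\theta)$ we will write $F_n(\theta) = O_p(1)$ and $F_n(\theta) = o_p(1)$ if $\sup_\theta |F_n(\theta)| = O_p(1)$ and $\sup_\theta |F_n(\theta)| = o_p(1)$, respectively. The first condition in Lemma 3.5 is trivial. To deduce the second one, making use of a third-order Taylor expansion we derive $\mathbb{H}_n(\hat\alpha_n) - \mathbb{H}_n(\alpha_0) = U_n(\hat\alpha_n) - V_n(\hat\alpha_n)$ for

$$\begin{aligned}
U_n(\hat\alpha_n) &:= \frac{1}{2} s_n^2 \partial_\alpha^2 M_n(\tilde\alpha_n, \hat\beta_n)\big[(\hat\alpha_n - \alpha_0)^{\otimes 2}\big] + s_n^2 \sum_{k'=1}^{p^\circ} \lambda^a_{n,k'} R^a(\hat\alpha^\circ_{n,k'}) \\
&= \frac{1}{2} \Big( s_n^2 \partial_\alpha^2 M_n(\alpha_0, \hat\beta_n) + s_n^2 \partial_\alpha^3 M_n(\check\alpha_n, \hat\beta_n)[\tilde\alpha_n - \alpha_0] \Big) \big[(\hat\alpha_n - \alpha_0)^{\otimes 2}\big] + s_n^2 \sum_{k'=1}^{p^\circ} \lambda^a_{n,k'} R^a(\hat\alpha^\circ_{n,k'}) \\
&= \frac{1}{2} \Big\{ \Gamma^\alpha_0(\alpha_0) + O_p\big(|\hat\alpha_n - \alpha_0| \vee |\hat\beta_n - \beta_0|\big) \Big\} \big[(\hat\alpha_n - \alpha_0)^{\otimes 2}\big] + s_n^2 \sum_{k'=1}^{p^\circ} \lambda^a_{n,k'} R^a(\hat\alpha^\circ_{n,k'}) \\
&= \frac{1}{2} \big\{ \Gamma^\alpha_0(\alpha_0) + o_p(1) \big\} \big[(\hat\alpha_n - \alpha_0)^{\otimes 2}\big] + s_n^2 \sum_{k'=1}^{p^\circ} \lambda^a_{n,k'} R^a(\hat\alpha^\circ_{n,k'}),
\end{aligned} \tag{3.2}$$

$$V_n(\hat\alpha_n) := -s_n^2 \partial_\alpha M_n(\alpha_0, \hat\beta_n)[\hat\alpha_n - \alpha_0] - s_n^2 \sum_{k''=1}^{p^\star} \lambda^a_{n,k''} \big( R^a(\hat\alpha^\star_{n,k''}) - R^a(\alpha^\star_{0,k''}) \big),$$

where the points $\tilde\alpha_n$ and $\check\alpha_n$ are located on the segments connecting $\alpha_0$ and $\hat\alpha_n$, and $\alpha_0$ and $\tilde\alpha_n$, respectively. Note that the non-negativity of $R^a$ enables us to ignore the second term of the right-hand side of (3.2) when estimating $U_n(\hat\alpha_n)$ from below. Also, under the local Lipschitz continuity (2.5) of $R^a$, we see that the conditions of Lemma 3.5 are satisfied with $\gamma = 1$, $\rho = 2$, (e.g.) $U_0 = \lambda_{\min}(\Gamma^\alpha_0(\alpha_0))/4$, and $k_n = s_n$: we have $U_n(\hat\alpha_n) \ge (1/2)\{o_p(1) + \lambda_{\min}(\Gamma^\alpha_0(\alpha_0))\}|\hat\alpha_n - \alpha_0|^2$ and $|V_n(\hat\alpha_n)| \le O_p(s_n |\hat\alpha_n - \alpha_0|)$. Hence $\hat u_n = O_p(1)$ is proved. To deduce $\hat v_n = O_p(1)$, we can follow the same way as above along with

$$\mathbb{H}_n(\beta) = t_n^2 \{H_n(\hat\alpha_n, \beta) - H_n(\hat\alpha_n, \beta_0)\}$$

in place of $\mathbb{H}_n(\alpha) = s_n^2 H_n(\alpha, \hat\beta_n)$.

Theorem 3.8 We have $(\hat u_n, \hat v_n) = O_p(1)$ under Assumptions 3.1 and 3.6.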
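The bound on $V_n(\hat\alpha_n)$ used before Theorem 3.8 combines Assumption 3.6(1)(a), Assumption 3.6(2) and the local Lipschitz condition (2.5). The following is a sketch of that estimate; here $C$ denotes the constant from (2.5), and the mark $\star$ (a reconstruction of the thesis notation) indicates the non-zero components of $\alpha$, of which there are $p^\star$.

```latex
\begin{align*}
|V_n(\hat\alpha_n)|
 &\le s_n\, \big| s_n \partial_\alpha M_n(\alpha_0, \hat\beta_n) \big|\, |\hat\alpha_n - \alpha_0|
   + s_n \sum_{k''=1}^{p^\star} \big( s_n \lambda^a_{n,k''} \big)\,
     \big| R^a(\hat\alpha^\star_{n,k''}) - R^a(\alpha^\star_{0,k''}) \big| \\
 &\le s_n\, O_p(1)\, |\hat\alpha_n - \alpha_0|
   + s_n\, O_p(1)\, C \sum_{k''=1}^{p^\star} \big| \hat\alpha^\star_{n,k''} - \alpha^\star_{0,k''} \big|
  = O_p\big( s_n |\hat\alpha_n - \alpha_0| \big).
\end{align*}
```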


3.1.3 Sparse consistency

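The sparse consistency of $\hat\theta^\circ_n$ refers to the property $P(\hat\theta^\circ_n = 0) \to 1$; the asymptotic distribution of $\hat\theta^\circ_n$ then degenerates at the origin, with arbitrarily fast rate of convergence, i.e. $R_n \hat\theta^\circ_n = o_p(1)$ for any $R_n \to \infty$. The next general result is a variant of [26, Theorem 2], which is a tailor-made tool to establish the property.

Lemma 3.9 Let $\xi$ denote either $\alpha$ or $\beta$ (so $\xi = (\xi^\circ, \xi^\star)$ and $\xi_0 = (0, \xi^\star_0)$), and assume that the real-valued random function $\mathbb{H}_n(\xi) = \mathbb{H}_n(\xi^\circ, \xi^\star)$ satisfies the following conditions.

(1) $\mathbb{H}_n(\hat\xi^\circ_n, \hat\xi^\star_n) \le \mathbb{H}_n(0, \hat\xi^\star_n)$ a.s.

(2) There exist random functions $U_n(\xi)$ and $V_n(\xi)$ such that

$$\mathbb{H}_n(\xi) - \mathbb{H}_n(0, \xi^\star) = U_n(\xi) - V_n(\xi),$$

where it holds that for some random variable $U_0 > 0$ a.s. and a constant $\rho > 0$,

$$P\Big[ \big\{ [V_n(\hat\xi_n)]_+ = 0 \big\} \cap \big\{ U_n(\hat\xi_n) \ge |\hat\xi^\circ_n|^{\rho}\, U_0 \big\} \Big] \to 1.$$

Then $P(\hat\xi^\circ_n = 0) \to 1$.

Lemma 3.9 follows on observing the following: on the event $A_n := \{[V_n(\hat\xi_n)]_+ = 0\} \cap \{U_n(\hat\xi^\circ_n, \hat\xi^\star_n) \ge |\hat\xi^\circ_n|^{\rho}\, U_0\}$, we have

$$|\hat\xi^\circ_n|^{\rho} \le U_0^{-1}\{V_n(\hat\xi_n) + \mathbb{H}_n(\hat\xi_n) - \mathbb{H}_n(0, \hat\xi^\star_n)\} \le U_0^{-1}[V_n(\hat\xi_n)]_+ = 0.$$

Hence $P(|\hat\xi^\circ_n| = 0)$ can be bounded below by $P(A_n) \to 1$.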
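The property $P(\hat\xi^\circ_n = 0) \to 1$ can be seen explicitly in a toy case. The sketch below assumes a hypothetical Gaussian location model with unit variance and true value zero, in which the estimator is the sample mean soft-thresholded at $\lambda_n/(2n)$ (the minimizer of $\sum_i (x_i - \theta)^2 + \lambda_n|\theta|$). The model, the choice $\lambda_n = n^{3/4}$ and the function name are illustrative assumptions and not part of the thesis.

```python
import math


def prob_exact_zero(n, lam):
    """P(estimate == 0) when the sample mean is N(0, 1/n) and the threshold is lam / (2 n)."""
    c = lam / (2.0 * math.sqrt(n))       # threshold expressed in standard-normal units
    return math.erf(c / math.sqrt(2.0))  # P(|Z| <= c) for Z ~ N(0, 1)


for n in (16, 256, 4096, 65536):
    print(n, prob_exact_zero(n, lam=n ** 0.75))
```

The probability of an exact zero increases to one because $\lambda_n/\sqrt{n} \to \infty$ for this choice of $\lambda_n$.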

Remark 3.10 To conclude $P(\hat\xi^\circ_n = 0) \to 1$ we may replace the second condition of Lemma 3.9 by

$$P\Big[ \big\{ [V_n(\hat\xi_n)]_+ = 0 \big\} \cap \big\{ U_n(\hat\xi_n) \ge |\hat\xi^\circ_{n,m}|^{\rho}\, U_0 \big\} \Big] \to 1$$

for each $m$ running through $\{1, \dots, p^\circ\}$ or $\{1, \dots, q^\circ\}$ according as $\xi = \alpha$ or $\beta$; for each $m$, the same proof as above leads to $P(\hat\xi^\circ_{n,m} = 0) \to 1$. □

Let us go back to our main context. We keep imposing Assumptions 3.1 and 3.6. Denoting by Γα0(α) (resp. Γβ0(θ)) the upper left p ×p part of
