5.5 Proofs
5.5.3 Proof of Theorem 5.1.13
The PLDI for the ergodic diffusion model with $h=h_0$ known ($\tau=1$) was derived in [71, Section 6]. However, that argument does not carry over to our proof without additional considerations, because the random function $\tilde H_n(\theta)$ differs from the original GQLF $H_n(\theta;h)$.
$g^q$-exponential ergodicity. Let $\{P_t(x,dy)\}_{t\in\mathbb{R}_+}$ denote the family of transition functions of $X$. Given a function $\rho:\mathbb{R}^d\to\mathbb{R}_+$ and a signed measure $m$ on the $d$-dimensional Borel space, we define
$$\|m\|_\rho=\sup\Big\{\Big|\int f(y)\,m(dy)\Big| : f \text{ is } \mathbb{R}\text{-valued and measurable, with } |f|\le\rho\Big\}.$$
Lemma 5.5.1. Under Assumption 5.1.12, the following statements hold.
(1) There exist a probability measure $\pi_0$ and a nonnegative $C^2$ function $g$ such that $\limsup_{|x|\to\infty}(1+|x|^q)/g(x)=0$ for every $q>0$, and
$$\|P_t(x,\cdot)-\pi_0(\cdot)\|_{g^q}\lesssim e^{-at}g^q(x),\qquad x\in\mathbb{R}^d, \tag{5.5.29}$$
for some constant $a>0$.
(2) $\sup_tE(|X_t|^q)<\infty$ for every $q>0$.
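Two consequences of the definition of $\|\cdot\|_\rho$ may help in reading (5.5.29). For $\rho\equiv1$ the $\rho$-norm reduces to the total variation norm. For $\rho=g^q$, the bound (5.5.29) controls $E[f(X_t)\,|\,X_0=x]$ uniformly over all test functions dominated by $g^q$. When $g$ is bounded below by a positive constant, as for the exponential choices in the proof below, this class contains every polynomial up to a constant factor:
$$\|m\|_1=\sup_{|f|\le1}\Big|\int f\,dm\Big|=|m|(\mathbb{R}^d),\qquad \Big|\int f(y)P_t(x,dy)-\int f(y)\,\pi_0(dy)\Big|\le\|P_t(x,\cdot)-\pi_0(\cdot)\|_{g^q}\lesssim e^{-at}g^q(x)\quad\text{for all measurable } |f|\le g^q.$$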
Proof. (1) In view of the general results developed in [54, Section 6], under Assumption 5.1.12 it suffices for (5.5.29) to show the following: (i) every compact set in $\mathbb{R}^d$ is petite for the Markov chain $(X_{t_j})_{j=0}^n$ for any $h>0$ small enough; (ii) the drift condition
$$A_\theta g(x)\le-c_1'g(x)+c_2',\qquad x\in\mathbb{R}^d,$$
holds for some $c_1',c_2'>0$ and a nonnegative $C^2$ function $g$ such that $g(x)\to\infty$ as $|x|\to\infty$ faster than any polynomial, where $A_\theta$ denotes the infinitesimal generator of $X$. Writing $b=(b_k)$ and $S=(S_{kl})$,
$$A_\theta g(x):=\sum_kb_k(x,\theta)\,\partial_{x_k}g(x)+\frac12\sum_k\sum_lS_{kl}(x,\alpha)\,\partial_{x_k}\partial_{x_l}g(x). \tag{5.5.30}$$
Indeed, both conditions follow from [30, Propositions 1.1, 1.2 and 5.1]. Under Assumption 5.1.12, $X$ admits a transition density which is positive for every $(x,y,h)\in\mathbb{R}^d\times\mathbb{R}^d\times(0,1]$, ensuring the topological condition (i). The drift condition (ii) can be derived for $g$ equal to $\exp(c''|x|^2)$ or $\exp(c''|x|)$, for some $c''>0$, outside a neighborhood of the origin under Assumption 5.1.12(2)(a) or (b), respectively.
(2) follows from a standard application of the drift condition. See [30, Propositions 1.1(1) and 5.1(1)].
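The drift condition (ii) can be checked by a direct computation with (5.5.30). As an illustration only, take $g(x)=\exp(c|x|^2)$. For this sketch alone, assume a dissipativity bound $\langle b(x,\theta),x\rangle\le-c_0|x|^2$ for large $|x|$, together with $S[v,v]\le\bar\lambda|v|^2$ for all $v$; these are hypothetical, and the precise hypotheses are those of Assumption 5.1.12. Since $\partial_{x_k}g=2cx_kg$ and $\partial_{x_k}\partial_{x_l}g=(2c\delta_{kl}+4c^2x_kx_l)g$, we get, for large $|x|$,
$$A_\theta g(x)=g(x)\Big\{2c\langle b(x,\theta),x\rangle+c\,\mathrm{tr}\,S(x,\alpha)+2c^2S(x,\alpha)[x,x]\Big\}\le g(x)\Big\{-2c(c_0-c\bar\lambda)|x|^2+cd\bar\lambda\Big\}.$$
For $0<c\le c_0/(2\bar\lambda)$, the bracket is at most $-cc_0|x|^2+cd\bar\lambda$, which is $\le-1$ outside a compact set. Since $A_\theta g$ and $g$ are bounded on compact sets for continuous coefficients, $A_\theta g\le-c_1'g+c_2'$ follows.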
Bounding inverse moments. The next lemma will be used to deduce some moment bounds required later.
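Lemma 5.5.2. Under Assumption 5.1.12, for every $q>0$ we have
$$\sup_{n>2q/d}E\left\{\sup_{\alpha\in\Theta_\alpha}\left(\frac{1}{nh_0}\sum_{j=1}^nS_{j-1}^{-1}(\alpha)\big[(\Delta_jX)^{\otimes2}\big]\right)^{-q}\right\}<\infty.$$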
Proof. Write $\zeta_j=|h_0^{-1/2}\Delta_jX|^2$. First, observe that the expectation can be bounded from above by
$$\Big(\sup_{(x,\alpha)}\lambda_{\max}^q(S(x,\alpha))\Big)E\left\{\left(\frac1n\sum_{j=1}^n|h_0^{-1/2}\Delta_jX|^2\right)^{-q}\right\}\lesssim E\left\{\left(\frac1n\sum_{j=1}^n\zeta_j\right)^{-q}\right\}. \tag{5.5.31}$$
Under Assumption 5.1.12, we have the following modified Aronson-type bound, with possibly unbounded drift coefficient, for the transition density of $X$, say $p_{h_0}(x,y)$ (that is, $P(X_{h_0}\in dy\,|\,X_0=x)=p_{h_0}(x,y)\,dy$). There exist constants $A,B>1$ and $B_0\ge0$ such that for every $(x,y,h_0)\in\mathbb{R}^d\times\mathbb{R}^d\times(0,1]$,
$$\frac{1}{Ah_0^{d/2}}e^{-B_0h_0|x|^2}\exp\Big(-\frac{B|y-x|^2}{h_0}\Big)\le p_{h_0}(x,y)\le\frac{A}{h_0^{d/2}}e^{B_0h_0|x|^2}\exp\Big(-\frac{|y-x|^2}{Bh_0}\Big), \tag{5.5.32}$$
where we can take $B_0=0$ in particular when the drift $b$ is bounded (see [30, Proposition 1.2] for details). The bound (5.5.32) implies that the conditional density of $h_0^{-1/2}\Delta_jX$ given $X_{t_{j-1}}=x$, which we denote by $y\mapsto\bar p_{h_0}(y|x)$, satisfies
$$A^{-1}e^{-B_0h_0|x|^2}\exp(-B|y|^2)\le\bar p_{h_0}(y|x)\le Ae^{B_0h_0|x|^2}\exp(-|y|^2/B).$$
It follows that
$$\sup_y\bar p_{h_0}(y|x)\lesssim\exp(h_0B_0|x|^2). \tag{5.5.33}$$
In what follows, the constant $B_0\ge0$ may change at each appearance, subject to the rule that $B_0=0$ if $b$ is bounded (that is, if Assumption 5.1.12(2)(b) holds).
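For the reader's convenience, the scaled bounds and (5.5.33) follow from (5.5.32) by the change of variables $z=x+h_0^{1/2}y$. Under this change, the conditional density of $h_0^{-1/2}\Delta_jX$ given $X_{t_{j-1}}=x$ is
$$\bar p_{h_0}(y|x)=h_0^{d/2}\,p_{h_0}\big(x,\,x+h_0^{1/2}y\big),\qquad \frac{|(x+h_0^{1/2}y)-x|^2}{h_0}=|y|^2.$$
The factor $h_0^{-d/2}$ in (5.5.32) therefore cancels, and the Gaussian factors become $\exp(-B|y|^2)$ and $\exp(-|y|^2/B)$. Bounding $\exp(-|y|^2/B)\le1$ in the upper bound gives $\sup_y\bar p_{h_0}(y|x)\le A\exp(B_0h_0|x|^2)$, which is (5.5.33).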
Let $k\in\mathbb{N}$. We will prove that
$$\sup_{i\ge0}P\Big(\sum_{j=1}^k\zeta_{i+j}\le\epsilon\Big)\lesssim\epsilon^{kd/2},\qquad\epsilon>0. \tag{5.5.34}$$
To this end, we make use of the argument in the proof of [6, Eq. (2.2)], although our conditions are apparently weaker. Write $P_l(\cdot)$ for the conditional probability given $\mathcal{F}_{t_l}$. From (5.5.33) we have, a.s.,
$$P_{i+k-1}(\zeta_{i+k}\le\epsilon)=\int_{|y|\le\sqrt\epsilon}\bar p_{h_0}(y|X_{t_{i+k-1}})\,dy\lesssim\epsilon^{d/2}\exp(h_0B_0|X_{t_{i+k-1}}|^2).$$
This in turn implies that
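$$\begin{aligned}
I(\zeta_{i+k-1}\le\epsilon)P_{i+k-1}(\zeta_{i+k}\le\epsilon)
&\lesssim\epsilon^{d/2}I(\zeta_{i+k-1}\le\epsilon)\,I\big(|X_{t_{i+k-1}}|\le\sqrt{\epsilon h_0}+|X_{t_{i+k-2}}|\big)\exp(h_0B_0|X_{t_{i+k-1}}|^2)\\
&\lesssim\epsilon^{d/2}I(\zeta_{i+k-1}\le\epsilon)\exp(h_0B_0|X_{t_{i+k-2}}|^2).
\end{aligned}$$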
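The two displays above rest on two elementary facts. First, the ball $\{|y|\le\sqrt\epsilon\}$ in $\mathbb{R}^d$ has Lebesgue measure proportional to $\epsilon^{d/2}$, where $\Gamma(\cdot)$ below denotes the gamma function. Second, since $\zeta_{i+k-1}=h_0^{-1}|X_{t_{i+k-1}}-X_{t_{i+k-2}}|^2$, the triangle inequality and $(a+b)^2\le2a^2+2b^2$ give, on the event $\{\zeta_{i+k-1}\le\epsilon\}$,
$$\int_{|y|\le\sqrt\epsilon}dy=\frac{\pi^{d/2}}{\Gamma(d/2+1)}\,\epsilon^{d/2},\qquad |X_{t_{i+k-1}}|^2\le\big(|X_{t_{i+k-2}}|+\sqrt{\epsilon h_0}\big)^2\le2|X_{t_{i+k-2}}|^2+2\epsilon h_0.$$
It suffices to take $\epsilon\in(0,1]$ in (5.5.34), because for $\epsilon>1$ the left-hand side is at most $1\le\epsilon^{kd/2}$. Then $\exp(h_0B_0|X_{t_{i+k-1}}|^2)\le e^{2B_0h_0}\exp(2h_0B_0|X_{t_{i+k-2}}|^2)$, which is why $B_0$ is allowed to change from line to line.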
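Proceeding in the same manner and taking conditional expectations successively, we deduce
$$\begin{aligned}
P\Big(\sum_{j=1}^k\zeta_{i+j}\le\epsilon\Big)
&\le E\Big(\prod_{j=1}^kI(\zeta_{i+j}\le\epsilon)\Big)
=E\Big\{\Big(\prod_{j=1}^{k-1}I(\zeta_{i+j}\le\epsilon)\Big)P_{i+k-1}(\zeta_{i+k}\le\epsilon)\Big\}\\
&\lesssim\epsilon^{d/2}E\Big\{\Big(\prod_{j=1}^{k-2}I(\zeta_{i+j}\le\epsilon)\Big)I(\zeta_{i+k-1}\le\epsilon)\exp(h_0B_0|X_{t_{i+k-2}}|^2)\Big\}\\
&\lesssim\epsilon^{d/2}E\Big\{\Big(\prod_{j=1}^{k-2}I(\zeta_{i+j}\le\epsilon)\Big)\exp(h_0B_0|X_{t_{i+k-2}}|^2)P_{i+k-2}(\zeta_{i+k-1}\le\epsilon)\Big\}\\
&\lesssim(\epsilon^{d/2})^2E\Big\{\Big(\prod_{j=1}^{k-2}I(\zeta_{i+j}\le\epsilon)\Big)\exp(h_0B_0|X_{t_{i+k-3}}|^2)\Big\}\\
&\lesssim\cdots\lesssim\epsilon^{(k-1)d/2}E\big\{I(\zeta_{i+1}\le\epsilon)\exp(h_0B_0|X_{t_i}|^2)\big\}\\
&\lesssim\epsilon^{kd/2}\sup_tE\{\exp(h_0B_0|X_t|^2)\}\lesssim\epsilon^{kd/2},
\end{aligned}$$
the last estimate holding for every $h_0$ small enough. Again, note that we can take $B_0=0$ when $b$ is bounded. Thus we have verified (5.5.34), so that
$$\sup_{i\ge0}P\Big\{\Big(\sum_{j=1}^k\zeta_{i+j}\Big)^{-q}\ge r\Big\}\lesssim r^{-kd/(2q)},\qquad r>0.$$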
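The passage from this tail bound to (5.5.35) uses the layer-cake representation of the mean of a nonnegative random variable $V$, together with the integrability of $r^{-s}$ at infinity for $s>1$:
$$E(V)=\int_0^\infty P(V\ge r)\,dr\le1+\int_1^\infty P(V\ge r)\,dr,\qquad \int_1^\infty r^{-kd/(2q)}\,dr=\frac{1}{kd/(2q)-1}<\infty\iff k>\frac{2q}{d}.$$
Applied with $V=(\sum_{j=1}^k\zeta_{i+j})^{-q}$, this is exactly why $k$ must exceed $2q/d$.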
Now, letting $k>2q/d$, we obtain
$$\sup_{i\ge0}E\Big\{\Big(\sum_{j=1}^k\zeta_{i+j}\Big)^{-q}\Big\}\le1+\int_1^\infty r^{-kd/(2q)}\,dr\lesssim1. \tag{5.5.35}$$
Having (5.5.35) in hand, we can complete the proof in a similar manner to [26, Lemma A.1]. Let $m:=\lfloor n/k\rfloor$. Obviously it suffices to consider $n=km$ ($m\in\mathbb{N}$). Write $\lambda_l=\sum_{j=k(l-1)+1}^{kl}\zeta_j$. Then $\sum_{j=1}^n\zeta_j=\sum_{l=1}^m\lambda_l$, and (5.5.35) ensures that $\sup_lE(\lambda_l^{-q})\lesssim1$. By the convexity of the mapping $s\mapsto s^{-q}$ ($s>0$), Jensen's inequality gives
$$E\Big\{\Big(\frac1n\sum_{j=1}^n\zeta_j\Big)^{-q}\Big\}=k^qE\Big\{\Big(\frac1m\sum_{l=1}^m\lambda_l\Big)^{-q}\Big\}\lesssim k^q,$$
which combined with (5.5.31) completes the proof.
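Written out, with $n=km$ and the block sums $\lambda_l$ as above, the Jensen step reads
$$\Big(\frac1n\sum_{j=1}^n\zeta_j\Big)^{-q}=k^q\Big(\frac1m\sum_{l=1}^m\lambda_l\Big)^{-q}\le\frac{k^q}{m}\sum_{l=1}^m\lambda_l^{-q},\qquad\text{hence}\qquad E\Big\{\Big(\frac1n\sum_{j=1}^n\zeta_j\Big)^{-q}\Big\}\le k^q\sup_lE(\lambda_l^{-q}).$$
The point of blocking is that each $\lambda_l$ involves only $k$ consecutive increments, so (5.5.35) applies to it uniformly in $l$. A single increment $\zeta_j$, by contrast, need not have a finite $(-q)$-th moment.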
Now, in order to show (5.1.15) and (5.1.16), it is enough to check [A1′′], [A4′], and [A6] of [71]. The conditions [B1] and [B2] therein are given in Section 5.5.1 and Theorem 5.2.1, respectively. The claim (5.1.17) follows immediately from (5.1.15), (5.1.16), and the definition of $\tilde h'$. We set $\epsilon_1=\epsilon_0/2$ in the sequel.
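Proof of (5.1.15). We will verify the following conditions: for every $M>0$,
$$E\Big(\sup_\beta\Big|\frac{1}{\sqrt n}\partial_\alpha\tilde H_n(\alpha_0,\beta)\Big|^M\Big)<\infty; \tag{5.5.36}$$
$$E\Big\{\Big(\sup_\theta n^{\epsilon_1}\big|\tilde Y_{1,n}(\alpha,\beta)-\tilde Y_{1,0}(\alpha)\big|\Big)^M\Big\}<\infty; \tag{5.5.37}$$
$$E\Big\{\Big(\frac1n\sup_\theta\big|\partial_\alpha^3\tilde H_n(\theta)\big|\Big)^M\Big\}<\infty; \tag{5.5.38}$$
$$E\Big\{\sup_\beta\Big(n^{\epsilon_1}\Big|-\frac1n\partial_\alpha^2\tilde H_n(\alpha_0,\beta)-\tilde\Gamma_{1,0}\Big|\Big)^M\Big\}<\infty. \tag{5.5.39}$$
The conditions (5.5.36) to (5.5.39) imply [A1′′] and [A6]. The left-hand side of (5.5.37) satisfies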
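the following. For brevity, write
$$D_n(\alpha):=\frac{1}{2n}\sum_{j=1}^n\log\big|S_{j-1}^{-1}(\alpha)S_{j-1}(\alpha_0)\big|-\frac12\int_{\mathbb{R}^d}\log\big|S^{-1}(x,\alpha)S(x,\alpha_0)\big|\,\pi(dx),\qquad c(\alpha):=\frac{\tau}{d}\int_{\mathbb{R}^d}\mathrm{tr}\big(S^{-1}(x,\alpha)S(x,\alpha_0)\big)\,\pi(dx).$$
Then
$$\begin{aligned}
&E\Big\{\Big(\sup_\theta n^{\epsilon_1}\big|\tilde Y_{1,n}(\alpha,\beta)-\tilde Y_{1,0}(\alpha)\big|\Big)^M\Big\}\\
&\lesssim E\Big\{\Big(\sup_\theta n^{\epsilon_1}|D_n(\alpha)|\Big)^M\Big\}+E\Big\{\Big(\sup_\theta n^{\epsilon_1}\Big|\log\frac{h(\alpha)/h_0}{c(\alpha)}\Big|\Big)^M\Big\}\\
&\qquad+E\Big\{\Big(\sup_\theta n^{\epsilon_1}\Big|\frac1n\sum_{j=1}^nS_{j-1}^{-1}(\alpha)\big[\Delta_jX,\,b_{j-1}(\theta)\big]\Big|\Big)^M\Big\}+E\Big\{\Big(\sup_\theta n^{\epsilon_1}\Big|\frac{h(\alpha)}{2}\,\frac1n\sum_{j=1}^nS_{j-1}^{-1}(\alpha)\big[b_{j-1}(\theta)^{\otimes2}\big]\Big|\Big)^M\Big\}\\
&\lesssim E\Big\{\Big(\sup_\theta n^{\epsilon_1}|D_n(\alpha)|\Big)^M\Big\}+E\Big[\Big\{\sup_\alpha n^{\epsilon_1}\Big|\frac{h(\alpha)/h_0}{c(\alpha)}-1\Big|\times\Big|\frac{h(\alpha)/h_0}{c(\alpha)}\Big|^{-1}\Big\}^M\Big]+1\\
&\lesssim E\Big\{\Big(\sup_\theta n^{\epsilon_1}|D_n(\alpha)|\Big)^M\Big\}+E\Big[\Big\{\sup_\alpha n^{\epsilon_1}\Big|\frac{h(\alpha)}{h_0}\Big|^{-1}\Big|\frac{h(\alpha)}{h_0}-c(\alpha)\Big|\Big\}^M\Big]+1\\
&\le E\Big\{\Big(\sup_\theta n^{\epsilon_1}|D_n(\alpha)|\Big)^M\Big\}+E\Big(\sup_\alpha\Big|\frac{h(\alpha)}{h_0}\Big|^{-2M}\Big)^{1/2}E\Big\{\Big(\sup_\alpha n^{\epsilon_1}\Big|\frac{h(\alpha)}{h_0}-c(\alpha)\Big|\Big)^{2M}\Big\}^{1/2}+1,
\end{aligned}$$
where in the second step we used (4.5.2), Lemma 5.5.1(2), and Hölder's inequality for the third and fourth terms, and in the last step the Cauchy-Schwarz inequality. As in [50, Lemma 4.3], Lemma 5.5.1 ensures
$$E\Big\{\Big(\sup_\theta n^{\epsilon_1}|D_n(\alpha)|\Big)^M\Big\}<\infty,\qquad E\Big\{\Big(\sup_\alpha n^{\epsilon_1}\Big|\frac{h(\alpha)}{h_0}-c(\alpha)\Big|\Big)^{2M}\Big\}<\infty.$$
Further, Lemma 5.5.2 implies
$$E\Big(\sup_\alpha\Big|\frac{h(\alpha)}{h_0}\Big|^{-2M}\Big)<\infty.$$
Hence (5.5.37) is established. In a similar way, we have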
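$$\begin{aligned}
&E\Big(\sup_\beta\Big|\frac{1}{\sqrt n}\partial_\alpha\tilde H_n(\alpha_0,\beta)\Big|^M\Big)\\
&\lesssim1+E\Big(\Big|\frac{1}{2\sqrt n}\sum_{j=1}^n\mathrm{tr}\big(S_{j-1}^{-1}(\alpha_0)\partial_\alpha S_{j-1}(\alpha_0)\big)+\frac{1}{h(\alpha_0)}\frac{1}{2\sqrt n}\sum_{j=1}^n\partial_\alpha S_{j-1}^{-1}(\alpha_0)\big[(\Delta_jX)^{\otimes2}\big]\Big|^M\Big)\\
&\lesssim1+E\Big\{\Big|\frac{h_0}{h(\alpha_0)}\frac{1}{2\sqrt n}\sum_{j=1}^n\mathrm{tr}\big(S_{j-1}^{-1}(\alpha_0)\partial_\alpha S_{j-1}(\alpha_0)\big)\Big(\frac{h(\alpha_0)}{h_0}-\tau\Big)\Big|^M\Big\}\\
&\le1+E\Big(\Big|\frac{h_0}{h(\alpha_0)}\Big|^{3M}\Big)^{1/3}E\Big(\Big|\frac{1}{2n}\sum_{j=1}^n\mathrm{tr}\big(S_{j-1}^{-1}(\alpha_0)\partial_\alpha S_{j-1}(\alpha_0)\big)\Big|^{3M}\Big)^{1/3}\\
&\qquad\times E\Big(\Big|\frac{1}{\sqrt n\,h_0d}\sum_{j=1}^nS_{j-1}^{-1}(\alpha_0)\big[(\Delta_jX)^{\otimes2}-\tau h_0S_{j-1}(\alpha_0)\big]\Big|^{3M}\Big)^{1/3}<\infty,
\end{aligned}$$
$$\begin{aligned}
&E\Big\{\sup_\beta\Big(n^{\epsilon_1}\Big|-\frac1n\partial_\alpha^2\tilde H_n(\alpha_0,\beta)-\tilde\Gamma_{1,0}\Big|\Big)^M\Big\}\\
&\lesssim1+E\Big[\Big\{n^{\epsilon_1}\Big|\frac{1}{2n}\sum_{j=1}^n\partial_\alpha^2\big(\log S_{j-1}(\alpha_0)\big)+\frac12\Big(\frac1n\sum_{j=1}^n\partial_\alpha^2S_{j-1}^{-1}(\alpha_0)\big[S_{j-1}(\alpha_0)\big]\Big)\\
&\qquad\qquad-\frac{1}{2d}\Big(\frac1n\sum_{j=1}^n\partial_\alpha S_{j-1}^{-1}(\alpha_0)\big[S_{j-1}(\alpha_0)\big]\Big)^{\otimes2}-\tilde\Gamma_{1,0}\Big|\Big\}^M\Big]\\
&\qquad+E\Big[\Big\{n^{\epsilon_1}\Big|\frac12\Big(\frac{1}{n\tau h_0}\sum_{j=1}^n\partial_\alpha^2S_{j-1}^{-1}(\alpha_0)\big[(\Delta_jX)^{\otimes2}\big]\Big)-\frac{1}{2d}\Big(\frac{1}{n\tau h_0}\sum_{j=1}^n\partial_\alpha S_{j-1}^{-1}(\alpha_0)\big[(\Delta_jX)^{\otimes2}\big]\Big)^{\otimes2}-\frac d2\partial_\alpha^2\big(\log h(\alpha)\big)\Big|\Big\}^M\Big]\\
&\lesssim1+E\Big[\Big\{n^{\epsilon_1}\Big|\frac12\Big(\frac{1}{n\tau h_0}\sum_{j=1}^n\partial_\alpha^2S_{j-1}^{-1}(\alpha_0)\big[(\Delta_jX)^{\otimes2}\big]\Big)-\frac{1}{2d}\Big(\frac{1}{n\tau h_0}\sum_{j=1}^n\partial_\alpha S_{j-1}^{-1}(\alpha_0)\big[(\Delta_jX)^{\otimes2}\big]\Big)^{\otimes2}\\
&\qquad\qquad-\frac d2\Big(\frac{1}{nd}\sum_{j=1}^n\partial_\alpha^2S_{j-1}^{-1}(\alpha_0)\big[(\Delta_jX)^{\otimes2}\big]\Big)\frac{1}{h(\alpha)}+\frac d2\Big(\frac{1}{nd}\sum_{j=1}^n\partial_\alpha S_{j-1}^{-1}(\alpha_0)\big[(\Delta_jX)^{\otimes2}\big]\Big)^{\otimes2}\Big(\frac{1}{h(\alpha)}\Big)^2\Big|\Big\}^M\Big]\\
&\le1+E\Big[\Big\{n^{\epsilon_1}\Big|\frac{h_0}{h(\alpha)}\Big(\frac{1}{nh_0}\sum_{j=1}^n\partial_\alpha^2S_{j-1}^{-1}(\alpha_0)\big[(\Delta_jX)^{\otimes2}\big]\Big)\Big(\frac{h(\alpha)}{\tau h_0}-1\Big)\Big|\Big\}^M\Big]\\
&\qquad+E\Big[\Big\{n^{\epsilon_1}\Big(\frac{h_0}{h(\alpha)}\Big)^2\Big|\Big(\frac{1}{nh_0}\sum_{j=1}^n\partial_\alpha S_{j-1}^{-1}(\alpha_0)\big[(\Delta_jX)^{\otimes2}\big]\Big)^{\otimes2}\Big|\times\Big|\Big(\frac{h(\alpha)}{\tau h_0}-1\Big)^2+2\Big(\frac{h(\alpha)}{\tau h_0}-1\Big)\Big|\Big\}^M\Big]<\infty,
\end{aligned}$$
and
$$\begin{aligned}
&E\Big\{\Big(\frac1n\sup_\theta\big|\partial_\alpha^3\tilde H_n(\theta)\big|\Big)^M\Big\}\\
&\lesssim1+E\Big\{\Big(\sup_\alpha\big|\partial_\alpha^3\big(\log h(\alpha)\big)\big|\Big)^M\Big\}\\
&\lesssim1+E\Big[\Big\{\sup_\alpha\Big|\frac{1}{nh_0d}\sum_{j=1}^n\partial_\alpha^3S_{j-1}^{-1}(\alpha)\big[(\Delta_jX)^{\otimes2}\big]\Big|\,\Big|\frac{h_0}{h(\alpha)}\Big|\Big\}^M\Big]\\
&\qquad+E\Big[\Big\{\sup_\alpha\Big|\frac{1}{nh_0d}\sum_{j=1}^n\partial_\alpha^2S_{j-1}^{-1}(\alpha)\big[(\Delta_jX)^{\otimes2}\big]\Big|\times\Big|\frac{1}{nh_0d}\sum_{j=1}^n\partial_\alpha S_{j-1}^{-1}(\alpha)\big[(\Delta_jX)^{\otimes2}\big]\Big|\,\Big|\frac{h_0}{h(\alpha)}\Big|^2\Big\}^M\Big]\\
&\qquad+E\Big[\Big\{\sup_\alpha\Big|\frac{1}{nh_0d}\sum_{j=1}^n\partial_\alpha S_{j-1}^{-1}(\alpha)\big[(\Delta_jX)^{\otimes2}\big]\Big|^3\Big|\frac{h_0}{h(\alpha)}\Big|^3\Big\}^M\Big]<\infty.
\end{aligned}$$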
The proofs of the conditions (5.5.36), (5.5.38), and (5.5.39) are complete. The tuning-parameter condition [A4′] can be verified exactly in the same way as in [71, Section 6].
We thus obtain (5.1.15).
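The bounds for (5.5.38) and (5.5.39) rest on the derivatives of $\log h(\alpha)$ and on one algebraic identity. For scalar $\alpha$ (in the multivariate case the mixed term is symmetrized),
$$\partial_\alpha^2\log h=\frac{\partial_\alpha^2h}{h}-\frac{(\partial_\alpha h)^2}{h^2},\qquad \partial_\alpha^3\log h=\frac{\partial_\alpha^3h}{h}-3\,\frac{\partial_\alpha^2h\,\partial_\alpha h}{h^2}+2\,\frac{(\partial_\alpha h)^3}{h^3}.$$
These produce the powers $h_0/h(\alpha)$, $(h_0/h(\alpha))^2$ and $(h_0/h(\alpha))^3$ once each $\partial_\alpha^kh(\alpha)$ is normalized by $h_0$, and these negative powers are the ones controlled by Lemma 5.5.2. In the bound for (5.5.39), combining the two squared first-derivative terms uses, with $x=h(\alpha)/(\tau h_0)$,
$$1-x^2=-\big\{(x-1)^2+2(x-1)\big\},$$
so every remainder carries a factor $x-1$, which is small when $h(\alpha)$ is close to $\tau h_0$.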
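Proof of (5.1.16). We will prove the following conditions: for every $M>0$,
$$\sup_nE\Big(\Big|\frac{1}{\sqrt{n\tau h_0}}\partial_\beta\tilde H_n(\theta_0)\Big|^M\Big)<\infty; \tag{5.5.40}$$
$$\sup_nE\Big\{\Big(\sup_\beta(n\tau h_0)^{\epsilon_1}\big|\tilde Y_{2,n}(\beta;\alpha_0)-\tilde Y_{2,0}(\beta)\big|\Big)^M\Big\}<\infty; \tag{5.5.41}$$
$$\sup_nE\Big\{\Big(\frac{1}{n\tau h_0}\sup_\beta\big|\partial_\beta^3\tilde H_n(\alpha_0,\beta)\big|\Big)^M\Big\}<\infty; \tag{5.5.42}$$
$$\sup_nE\Big\{\Big((n\tau h_0)^{\epsilon_1}\Big|-\frac{1}{n\tau h_0}\partial_\beta^2\tilde H_n(\theta_0)-\tilde\Gamma_{2,0}\Big|\Big)^M\Big\}<\infty. \tag{5.5.43}$$
We have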
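$$\begin{aligned}
\Big|\frac{1}{\sqrt{n\tau h_0}}\partial_\beta\tilde H_n(\theta_0)\Big|
&\lesssim\Big|\frac{1}{\sqrt{n\tau h_0}}\sum_{j=1}^nS_{j-1}^{-1}(\alpha_0)\big[\Delta_jX-\tau h_0b_{j-1}(\theta_0),\,\partial_\beta b_{j-1}(\theta_0)\big]\Big|\\
&\quad+\sqrt{\frac{nh_0}{\tau}}\Big|\frac{1}{nh_0d}\sum_{j=1}^nS_{j-1}^{-1}(\alpha_0)\big[(\Delta_jX)^{\otimes2}-E_{j-1}\{(\Delta_jX)^{\otimes2}\}\big]\Big|\times\Big|\frac1n\sum_{j=1}^nS_{j-1}^{-1}(\alpha_0)\big[b_{j-1}(\theta_0),\,\partial_\beta b_{j-1}(\theta_0)\big]\Big|\\
&\quad+\sqrt{\frac{nh_0}{\tau}}\Big|\frac{1}{nh_0d}\sum_{j=1}^nS_{j-1}^{-1}(\alpha_0)\big[E_{j-1}\{(\Delta_jX)^{\otimes2}\}-\tau h_0S_{j-1}(\alpha_0)\big]\Big|\times\Big|\frac1n\sum_{j=1}^nS_{j-1}^{-1}(\alpha_0)\big[b_{j-1}(\theta_0),\,\partial_\beta b_{j-1}(\theta_0)\big]\Big|.
\end{aligned}$$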
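The second and third terms above come from splitting the centered squared increment into a martingale-difference part and a conditional-bias part:
$$(\Delta_jX)^{\otimes2}-\tau h_0S_{j-1}(\alpha_0)=\big[(\Delta_jX)^{\otimes2}-E_{j-1}\{(\Delta_jX)^{\otimes2}\}\big]+\big[E_{j-1}\{(\Delta_jX)^{\otimes2}\}-\tau h_0S_{j-1}(\alpha_0)\big].$$
The first bracket has zero $\mathcal{F}_{t_{j-1}}$-conditional mean. After multiplying by $S_{j-1}^{-1}(\alpha_0)$, its partial sums therefore form a martingale, to which Burkholder's inequality applies. For a martingale difference sequence $(\xi_j)$ and $p\ge2$,
$$E\Big|\sum_{j=1}^n\xi_j\Big|^p\le C_pE\Big\{\Big(\sum_{j=1}^n|\xi_j|^2\Big)^{p/2}\Big\}\le C_pn^{p/2-1}\sum_{j=1}^nE|\xi_j|^p.$$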
We can deduce (5.5.40) by using Lemma 8(b) of [71], Burkholder's inequality, and Hölder's inequality. From Lemmas 8(a) and 9 of [71], we can show the following inequalities in a similar way to the proof of (5.5.40):
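$$\begin{aligned}
&E\Big\{\Big(\sup_\beta(n\tau h_0)^{\epsilon_1}\big|\tilde Y_{2,n}(\beta;\alpha_0)-\tilde Y_{2,0}(\beta)\big|\Big)^M\Big\}\\
&\lesssim1+\tau^{\epsilon_1}E\Big\{\Big(\sup_\beta(nh_0)^{\epsilon_1}\Big|\frac{1}{nh_0}\sum_{j=1}^n\Big(S_{j-1}^{-1}(\alpha_0)\big[\Delta_jX,\,b_{j-1}(\alpha_0,\beta)-b_{j-1}(\theta_0)\big]\\
&\qquad\qquad-\frac{h_0}{2}S_{j-1}^{-1}(\alpha_0)\big[b_{j-1}(\alpha_0,\beta)^{\otimes2}-b_{j-1}(\theta_0)^{\otimes2}\big]\Big)-\tilde Y_{2,0}(\beta)\Big|\Big)^M\Big\}<\infty,
\end{aligned}$$
$$E\Big\{\Big(\frac{1}{n\tau h_0}\sup_\beta\big|\partial_\beta^3\tilde H_n(\alpha_0,\beta)\big|\Big)^M\Big\}<\infty,$$
$$\begin{aligned}
&E\Big\{\Big((n\tau h_0)^{\epsilon_1}\Big|-\frac{1}{n\tau h_0}\partial_\beta^2\tilde H_n(\theta_0)-\tilde\Gamma_{2,0}\Big|\Big)^M\Big\}\\
&\lesssim1+\tau^{\epsilon_1}E\Big\{\Big((nh_0)^{\epsilon_1}\Big|\frac1n\sum_{j=1}^nS_{j-1}^{-1}(\alpha_0)\big[\partial_\beta b_{j-1}(\theta_0),\,\partial_\beta b_{j-1}(\theta_0)\big]-\tilde\Gamma_{2,0}\Big|\Big)^M\Big\}\\
&\qquad+\tau^{\epsilon_1}E\Big\{\Big((nh_0)^{\epsilon_1}\Big|\Big(\frac{h(\alpha_0)}{\tau h_0}-1\Big)\frac1n\sum_{j=1}^nS_{j-1}^{-1}(\alpha_0)\big[b_{j-1}(\theta_0),\,\partial_\beta^2b_{j-1}(\theta_0)\big]\Big|\Big)^M\Big\}<\infty.
\end{aligned}$$
Hence we have established (5.5.41) to (5.5.43) as well. Finally, the tuning-parameter condition [A4′] can be verified as before, completing the proof of (5.1.16).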
Chapter 6
Implementation of model selection function in Yuima package
In this chapter, we explain the specification of the model selection function IC, which we implemented based on the study of model selection for stochastic differential equations in Chapter 4. This function calculates QBIC, BIC, and the contrast-based information criterion (CIC), an AIC-type criterion introduced by [60] under the rapidly increasing experimental design $nh_n^2\to0$ (see also [28] for CIC under the weaker sampling-design condition $nh_n^q\to0$ for some $q\ge2$). The function is called as follows:
IC(yuima, data, start, lower, upper, joint = FALSE, rcpp = FALSE, ...)
In the following, we describe the arguments of IC.
– yuima is a yuima object.
– data is the data to be used for the model selection.
– start is a named list of the initial values of the parameters for optimization.
– lower is a named list for specifying lower bounds of the parameters.
– upper is a named list for specifying upper bounds of the parameters.
– joint specifies whether the parameters are estimated jointly (TRUE) or by two-stage estimation (FALSE). Defaults to FALSE.
– rcpp specifies whether to use C++ code. Defaults to FALSE.
The function IC returns a list with the following components.
– $par contains the estimates of the parameters of the candidate model.
– $CIC is the value of the contrast-based information criterion.
– $BIC is the value of the Bayesian information criterion.
– $QBIC is the value of the quasi-Bayesian contrast-based information criterion.
Note that IC internally calls the function qmle with method = "L-BFGS-B".
Remark 6.0.1.
• Let $(X_{t_j})_{j=0}^N$ be the observations (i.e., length(Xt) = N+1). Then the argument n of setSampling should be N, and yuima@sampling@n should be set to N+1.
For example, the argument yuima of the candidate model is written as follows:
N <- length(Xt)-1
mod <- setModel(drift="alpha1*x",
                diffusion="exp((beta1*cos(x)+beta3)/2)")
samp <- setSampling(Terminal=N^(1/3), n=N)
yuima <- setYuima(model=mod, sampling=samp)
yuima@sampling@n <- N+1
• The argument start must contain the names of all parameters of the argument yuima, i.e., names(start) ⊇ yuima@model@parameter@all. The arguments lower and upper are specified in the same manner.
• If start includes extra elements, IC extracts the required elements and uses only those in the calculation. That is, when names(start) ⊃ yuima@model@parameter@all, IC uses only the elements whose names appear in yuima@model@parameter@all.
• If the Hessian matrix of the quasi-likelihood function at the estimate is not negative definite, QBIC takes the same value as BIC.
Listing 6.1 shows an example of the use of IC. In this example, we have a sample $X_n=(X_{t_j})_{j=0}^n$ with $t_j=jn^{-2/3}$ from the model
$$dX_t=\exp\Big\{\frac12(-2\cos X_t+1)\Big\}dw_t-X_t\,dt,\qquad t\in[0,T_n],\quad X_0=1,$$
where $T_n=n^{1/3}$ and $n=1000$. We consider the models
$$dX_t=\exp\Big\{\frac12(\alpha_1\cos X_t+\alpha_2)\Big\}dw_t+\beta_1X_t\,dt,\qquad dX_t=\exp\Big\{\frac12\alpha_1\cos X_t\Big\}dw_t+\beta_1X_t\,dt$$
in Ex.1 and Ex.2, respectively.
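In this design the sampling step and terminal time are $h_n=T_n/n=n^{-2/3}$ and $T_n=n^{1/3}$, so the rapidly increasing design required for CIC can be checked directly:
$$nh_n=n^{1/3}\to\infty,\qquad nh_n^2=n^{-1/3}\to0;\qquad n=1000:\ h_n=0.01,\ T_n=10,\ nh_n^2=0.1.$$
This is the same scaling as setSampling(Terminal=N^(1/3), n=N) in Remark 6.0.1.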
Listing 6.1: Example of use of IC
N <- 1000; Ter <- N^(1/3)  # number of data and terminal sampling time
## Data generation
set.seed(123)
mod <- setModel(drift="beta*x",
                diffusion="exp((alpha1*cos(x)+alpha2)/2)")
samp <- setSampling(Terminal=Ter, n=N)
yuima <- setYuima(model=mod, sampling=setSampling(Terminal=Ter, n=50*N))
simu.yuima <- simulate(yuima, xinit=1, true.parameter=list(alpha1=-2,
                       alpha2=1, beta=-1), subsampling=samp)
Xt <- NULL
for(i in 1:(N+1)){
  Xt <- c(Xt, simu.yuima@data@original.data[50*(i-1)+1])
}
## Parameter settings
para.init <- list(alpha1=runif(1,max=-1.5,min=-2.5), alpha2=runif(1,max=1.5,min=0.5),
                  beta=runif(1,max=-0.5,min=-1.5))
para.low <- list(alpha1=-7, alpha2=-4, beta=-6)
para.upp <- list(alpha1=3, alpha2=6, beta=-0.01)
## Ex.1 (dXt = (beta*x)*dt + exp((alpha1*cos(x)+alpha2)/2)*dWt)
mod1 <- setModel(drift="beta*x",
                 diffusion="exp((alpha1*cos(x)+alpha2)/2)")
samp1 <- setSampling(Terminal=Ter, n=N)
yuima1 <- setYuima(model=mod1, sampling=samp1)
yuima1@sampling@n <- length(Xt)
ic1 <- IC(yuima1, data=Xt, start=para.init, upper=para.upp, lower=para.low, rcpp=TRUE)
> ic1
$par
    alpha1     alpha2       beta
-1.8874815  0.7749077 -0.8515751
$BIC
[1] -2670.394
$QBIC
[1] -2676.686
$CIC
[1] -2680.512
## Ex.2 (dXt = (beta*x)*dt + exp(alpha1*cos(x)/2)*dWt)
mod2 <- setModel(drift="beta*x", diffusion="exp(alpha1*cos(x)/2)")
samp2 <- setSampling(Terminal=Ter, n=N)
yuima2 <- setYuima(model=mod2, sampling=samp2)
yuima2@sampling@n <- length(Xt)
ic2 <- IC(yuima2, data=Xt, start=para.init, upper=para.upp, lower=para.low, rcpp=TRUE)
> ic2
$par
    alpha1       beta
-1.0379374 -0.9202064
$BIC
[1] -2670.074
$QBIC
[1] -2671.756
$CIC
[1] -2675.285
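Assuming, as for AIC/BIC-type criteria, that the candidate with the smaller criterion value is preferred, the outputs of Listing 6.1 can be compared directly (Ex.1 versus Ex.2):
$$\text{BIC}:\ -2670.394<-2670.074,\qquad \text{QBIC}:\ -2676.686<-2671.756,\qquad \text{CIC}:\ -2680.512<-2675.285.$$
All three criteria therefore favor Ex.1, which has the same structure as the data-generating model. The BIC margin (0.320) is much narrower than the QBIC (4.930) and CIC (5.227) margins.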
Chapter 7
Appendix
Here we step away from the main context and present a set of conditions under which a quasi-marginal log-likelihood admits a Schwarz-type stochastic expansion, following the proof of Theorem 2.1 in the preprint [36].
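Let $H_n:\Theta\times\Omega\to\mathbb{R}$ be a $C^3(\Theta)$-random function, where $\Theta\subset\mathbb{R}^p$ is a bounded convex domain. We write $\theta=(\alpha,\beta)\in\mathbb{R}^{p_\alpha}\times\mathbb{R}^{p_\beta}$. Let $\theta_0=(\alpha_0,\beta_0)\in\Theta$ be a constant, and let $R_n=R_n(\theta_0)=\mathrm{diag}(r_{1,n}I_{p_\alpha},\,r_{2,n}I_{p_\beta})$, where $(r_{1,n})$ and $(r_{2,n})$ are positive sequences, possibly depending on $\theta_0$, such that $r_{1,n}\vee r_{2,n}\to0$ and $r_{1,n}/r_{2,n}\to0$ as $n\to\infty$. We then introduce the random field on $\mathbb{R}^p$: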
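$$Z_n(u):=\exp\{H_n(\theta_0+R_nu)-H_n(\theta_0)\};$$
we set $Z_n\equiv0$ outside the set $U_n=U_n(\theta_0):=R_n^{-1}(\Theta-\theta_0)\subset\mathbb{R}^p$. Let $\pi(\theta)$ be a prior probability density on $\Theta$, which is assumed to be continuous and positive at $\theta_0$. Let $\Delta_n(\theta_0):=R_n\partial_\theta H_n(\theta_0)$, let $\Gamma_{1,0}\in\mathbb{R}^{p_\alpha}\otimes\mathbb{R}^{p_\alpha}$ and $\Gamma_{2,0}\in\mathbb{R}^{p_\beta}\otimes\mathbb{R}^{p_\beta}$ be a.s. positive definite random matrices, and set
$$\Gamma_0:=\mathrm{diag}(\Gamma_{1,0},\Gamma_{2,0}).$$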
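Further, let
$$Y_{1,n}(\theta):=r_{1,n}^2\{H_n(\alpha,\beta)-H_n(\alpha_0,\beta)\},\qquad Y_{2,n}(\beta):=r_{2,n}^2\{H_n(\alpha_0,\beta)-H_n(\alpha_0,\beta_0)\},$$
and let $Y_1(\alpha)$ and $Y_2(\beta)$ be random functions. Finally, we introduce the quadratic random field
$$Z_n^0(u)=\exp\Big(\Delta_n(\theta_0)[u]-\frac12\Gamma_0[u,u]\Big).$$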
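Theorem 7.0.1. In addition to the aforementioned setting, suppose the following conditions.
• There exists an a.s. positive definite random matrix $\Sigma_0\in\mathbb{R}^p\otimes\mathbb{R}^p$ such that
$$\big(\Delta_n(\theta_0),\,-R_n\partial_\theta^2H_n(\theta_0)R_n\big)\xrightarrow{\mathcal{L}}\big(\Sigma_0^{-1/2}\eta,\,\Gamma_0\big), \tag{7.0.1}$$
where $\eta\sim N_p(0,I_p)$ is a random variable defined on an extension of the original probability space.
• We have
$$\sup_\beta\big|r_{1,n}\partial_\alpha H_n(\alpha_0,\beta)\big|=O_p(1), \tag{7.0.2}$$
$$\sup_\beta\big|-r_{1,n}^2\partial_\alpha^2H_n(\alpha_0,\beta)-\Gamma_{1,0}\big|=o_p(1), \tag{7.0.3}$$
$$\sup_\theta\big|R_n\partial_\theta^3H_n(\theta)R_n\big|=O_p(1). \tag{7.0.4}$$
• There exists a constant $q\in(0,1)$ for which
$$r_{1,n}^{-q}\sup_\theta|Y_{1,n}(\theta)-Y_1(\alpha)|\vee r_{2,n}^{-q}\sup_\beta|Y_{2,n}(\beta)-Y_2(\beta)|\xrightarrow{P}0. \tag{7.0.5}$$
• There exists an a.s. positive random variable $\chi_0$ such that for each $\kappa>0$,
$$\sup_{\alpha;\,|\alpha-\alpha_0|\ge\kappa}Y_1(\alpha)\vee\sup_{\beta;\,|\beta-\beta_0|\ge\kappa}Y_2(\beta)\le-\chi_0\kappa^2\quad\text{a.s.} \tag{7.0.6}$$
Then any $\hat\theta_n\in\operatorname{argmax}H_n$ satisfies $\hat\theta_n\xrightarrow{P}\theta_0$, and we have
$$\int\big|Z_n(u)\pi(\theta_0+R_nu)-Z_n^0(u)\pi(\theta_0)\big|\,du\xrightarrow{P}0.$$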
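The conclusion of Theorem 7.0.1 yields the Schwarz-type expansion because the integral of the quadratic random field is explicit. In the constant $(2\pi)^{p/2}$ below, $\pi$ denotes the number $3.14159\ldots$, not the prior. Completing the square, with $\Gamma_0$ a.s. positive definite, gives
$$\int_{\mathbb{R}^p}Z_n^0(u)\,du=(2\pi)^{p/2}\det(\Gamma_0)^{-1/2}\exp\Big(\frac12\Gamma_0^{-1}\big[\Delta_n(\theta_0)^{\otimes2}\big]\Big).$$
The change of variables $\theta=\theta_0+R_nu$, $d\theta=\det(R_n)\,du=r_{1,n}^{p_\alpha}r_{2,n}^{p_\beta}\,du$, then gives
$$\log\int_\Theta e^{H_n(\theta)}\pi(\theta)\,d\theta=H_n(\theta_0)+p_\alpha\log r_{1,n}+p_\beta\log r_{2,n}+\log\pi(\theta_0)+\frac p2\log(2\pi)-\frac12\log\det\Gamma_0+\frac12\Gamma_0^{-1}\big[\Delta_n(\theta_0)^{\otimes2}\big]+o_p(1).$$
The $o_p(1)$ term uses the conclusion of the theorem together with the lower bound $\int Z_n^0(u)\,du\ge(2\pi)^{p/2}\det(\Gamma_0)^{-1/2}>0$.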
Bibliography
[1] H. Akaike. Information theory and an extension of the maximum likelihood principle. In Second International Symposium on Information Theory (Tsahkadsor, 1971), pages 267–281. Akadémiai Kiadó, Budapest, 1973.
[2] H. Akaike. A new look at the statistical model identification. IEEE Trans. Automatic Control, AC-19:716–723, 1974. System identification and time-series analysis.
[3] K. Antonio and J. Beirlant. Actuarial statistics with generalized linear mixed models. Insurance Math. Econom., 40(1):58–76, 2007.
[4] Y. Baraud, F. Comte, and G. Viennet. Adaptive estimation in autoregression or β-mixing regression via model selection. Ann. Statist., 29(3):839–875, 2001.
[5] D. Berg. Bankruptcy prediction by generalized additive models. Appl. Stoch. Models Bus. Ind., 23(2):129–143, 2007.
[6] R. J. Bhansali and F. Papangelou. Convergence of moments of least squares estimators for the coefficients of an autoregressive process of unknown order. Ann. Statist., 19(3):1155–1162, 1991.
[7] H. P. Boswijk. Mixed normal inference on multicointegration. Econometric Theory, 26(5):1565–1576, 2010.
[8] H. Bozdogan. Model selection and Akaike’s information criterion (AIC): the general theory and its analytical extensions. Psychometrika, 52(3):345–370, 1987.
[9] A. Brouste, M. Fukasawa, H. Hino, S. M. Iacus, K. Kamatani, Y. Koike, H. Masuda, R. Nomura, T. Ogihara, Y. Shimizu, M. Uchida, and N. Yoshida. The yuima project: A computational framework for simulation and inference of stochastic differential equations. Journal of Statistical Software, 57(4):1–51, 2014.
[10] K. P. Burnham and D. R. Anderson. Model Selection and Multimodel Inference. Springer-Verlag, New York, second edition, 2002.
[11] G. Casella, F. J. Girón, M. L. Martínez, and E. Moreno. Consistency of Bayesian procedures for variable selection. Ann. Statist., 37(3):1207–1228, 2009.
[12] J. E. Cavanaugh and A. A. Neath. Generalizing the derivation of the Schwarz information criterion. Comm. Statist. Theory Methods, 28(1):49–66, 1999.
[13] N. H. Chan, S.-F. Huang, and C.-K. Ing. Moment bounds and mean squared prediction errors of long-memory time series. Ann. Statist., 41(3):1268–1298, 2013.
[14] N. H. Chan and C.-K. Ing. Uniform moment bounds of Fisher’s information with applications to time series. Ann. Statist., 39(3):1526–1550, 2011.
[15] J. Chen and Z. Chen. Extended Bayesian information criteria for model selection with large model spaces. Biometrika, 95(3):759–771, 2008.
[16] G. Claeskens and N. L. Hjort. Model Selection and Model Averaging. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, 2008.
[17] Y. A. Davydov. Mixing conditions for Markov chains. Theory of Probability and Its Applications, 18(2):312–328, 1973.
[18] I. Domowitz and H. White. Misspecified models with dependent observations. J. Econometrics, 20(1):35–58, 1982.
[19] P. Doukhan. Mixing: Properties and Examples. Springer, New York, 1994.
[20] J. J. Dziak, D. L. Coffman, S. T. Lanze, and R. Li. Sensitivity and specificity of information criteria. PeerJ PrePrints, 3, 2012.
[21] S. Eguchi. Model comparison for generalized linear models with dependent observations. Econom. Stat., 5:171–188, 2018.
[22] S. Eguchi and H. Masuda. Data driven time scale in Gaussian quasi-likelihood inference. arXiv:1801.10378v2, 2018.
[23] S. Eguchi and H. Masuda. Schwarz type model comparison for LAQ models. Bernoulli, 24(3):2278–2327, 2018.
[24] L. Fahrmeir and H. Kaufmann. Consistency and asymptotic normality of the maximum likelihood estimator in generalized linear models. Ann. Statist., 13(1):342–368, 1985.
[25] V. Fasen and S. Kimmig. Information criteria for multivariate CARMA processes. arXiv:1505.00901, to appear in Bernoulli, 2015.
[26] D. F. Findley and C.-Z. Wei. AIC, overfitting principles, and the boundedness of moments of inverse matrices for vector autoregressions and related models. J. Multivariate Anal., 83(2):415–450, 2002.
[27] D. P. Foster and E. I. George. The risk inflation criterion for multiple regression. Ann. Statist., 22(4):1947–1975, 1994.
[28] T. Fujii and M. Uchida. AIC type statistics for discretely observed ergodic diffusion processes. Stat. Inference Stoch. Process., 17(3):267–282, 2014.
[29] V. Genon-Catalot and J. Jacod. On the estimation of the diffusion coefficient for multi-dimensional diffusion processes. Ann. Inst. H. Poincaré Probab. Statist., 29(1):119–151, 1993.
[30] E. Gobet. LAN property for ergodic diffusions with discrete observations. Ann. Inst. H. Poincaré Probab. Statist., 38(5):711–737, 2002.
[31] C. Goutis and C. P. Robert. Model choice in generalised linear models: a Bayesian approach via Kullback-Leibler projections. Biometrika, 85(1):29–37, 1998.
[32] S. Haberman and A. E. Renshaw. Generalized linear models and actuarial science. The Statistician, pages 407–436, 1996.
[33] T. Hastie and R. Tibshirani. Generalized Additive Models. Chapman and Hall, London, 1990.
[34] N. Herrndorf. A functional central limit theorem for weakly dependent sequences of random variables. Ann. Probab., 12(1):141–153, 1984.
[35] C. C. Heyde. Quasi-likelihood and its application. Springer Series in Statistics. Springer-Verlag, New York, 1997. A general approach to optimal parameter estimation.
[36] A. Jasra, K. Kamatani, and H. Masuda. Bayesian inference for stable Lévy driven stochastic differential equations with high-frequency data. Preprint, arXiv:1707.08788, 2017.
[37] K. Kamatani and M. Uchida. Hybrid multi-step estimators for stochastic differential equations based on sampled data. Stat. Inference Stoch. Process., 18(2):177–204, 2015.
[38] R. L. Kashyap. Optimal choice of AR and MA parts in autoregressive moving average models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 4:99–104, 1982.
[39] M. Kessler. Estimation of an ergodic diffusion from discrete observations. Scand. J. Statist., 24(2):211–229, 1997.
[40] J.-Y. Kim. Large sample properties of posterior densities, Bayesian information criterion and the likelihood principle in nonstationary time series models. Econometrica, 66(2):359–380, 1998.
[41] S. Konishi, T. Ando, and S. Imoto. Bayesian information criteria and smoothing parameter selection in radial basis function networks. Biometrika, 91(1):27–43, 2004.
[42] S. Konishi and G. Kitagawa. Generalised information criteria in model selection. Biometrika, 83(4):875–890, 1996.
[43] S. Konishi and G. Kitagawa. Information criteria and statistical modeling. Springer Science & Business Media, 2008.
[44] Y. A. Kutoyants. Statistical inference for ergodic diffusion processes. Springer Series in Statistics. Springer-Verlag London, Ltd., London, 2004.
[45] M. Lavine and M. J. Schervish. Bayes factors: what they are and what they are not. Amer. Statist., 53(2):119–122, 1999.
[46] E. Liebscher. Towards a unified approach for proving geometric ergodicity and mixing properties of nonlinear autoregressive processes. J. Time Ser. Anal., 26(5):669–689, 2005.
[47] W. Liu and Y. Yang. Parametric or nonparametric? A parametricness index for model selection. Ann. Statist., 39(4):2074–2102, 2011.
[48] J. Lv and J. S. Liu. Model selection principles in misspecified models. J. R. Stat. Soc. Ser. B. Stat. Methodol., 76(1):141–167, 2014.
[49] J. R. Magnus and H. Neudecker. The commutation matrix: some properties and applications. Ann. Statist., 7(2):381–394, 1979.
[50] H. Masuda. Convergence of Gaussian quasi-likelihood random fields for ergodic Lévy driven SDE observed at high frequency. Ann. Statist., 41(3):1593–1641, 2013.
[51] H. Masuda and Y. Uehara. On stepwise estimation of Lévy driven stochastic differential equation (Japanese). Proc. Inst. Statist. Math., 65(1):21–38, 2017.
[52] P. McCullagh and J. A. Nelder. Generalized Linear Models. Chapman and Hall, London, 1989.
[53] A. J. McNeil and J. P. Wendin. Bayesian inference for generalized linear mixed models of portfolio credit risk. Journal of Empirical Finance, 14:131–149, 2007.
[54] S. P. Meyn and R. L. Tweedie. Stability of Markovian processes. III. Foster-Lyapunov criteria for continuous-time processes. Adv. in Appl. Probab., 25(3):518–548, 1993.
[55] R. Nishii. Asymptotic properties of criteria for selection of variables in multiple regression. Ann. Statist., 12(2):758–765, 1984.
[56] L. Pace and A. Salvan. Principles of statistical inference, volume 4 of Advanced Series on Statistical Science & Applied Probability. World Scientific Publishing Co., Inc., River Edge, NJ, 1997. From a neo-Fisherian perspective.
[57] G. Schwarz. Estimating the dimension of a model. Ann. Statist., 6(2):461–464, 1978.
[58] S. L. Sclove. Application of model-selection criteria to some problems in multivariate analysis. Psychometrika, 52(3):333–343, 1987.
[59] T. Sei and F. Komaki. Bayesian prediction and model selection for locally asymptotically mixed normal models. J. Statist. Plann. Inference, 137(7):2523–2534, 2007.
[60] M. Uchida. Contrast-based information criterion for ergodic diffusion processes from discrete observations. Ann. Inst. Statist. Math., 62(1):161–187, 2010.
[61] M. Uchida and N. Yoshida. Information criteria in model selection for mixing processes. Stat. Inference Stoch. Process., 4(1):73–98, 2001.
[62] M. Uchida and N. Yoshida. Asymptotic expansion and information criteria. SUT J. Math., 42(1):31–58, 2006.
[63] M. Uchida and N. Yoshida. Estimation for misspecified ergodic diffusion processes from discrete observations. ESAIM Probab. Stat., 15:270–290, 2011.
[64] M. Uchida and N. Yoshida. Adaptive estimation of an ergodic diffusion process based on sampled data. Stochastic Process. Appl., 122(8):2885–2924, 2012.
[65] M. Uchida and N. Yoshida. Quasi likelihood analysis of volatility and nondegeneracy of statistical random field. Stochastic Process. Appl., 123(7):2851–2876, 2013.
[66] M. Uchida and N. Yoshida. Model selection for volatility prediction. In The Fascination of Probability, Statistics and their Applications, pages 343–360. Springer, 2016.
[67] A. W. van der Vaart. Asymptotic statistics, volume 3 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, 1998.