Proof of Theorem 5.1.13 - Model comparison for locally asymptotically quadratic models with dependent observations

5.5 Proofs

5.5.3 Proof of Theorem 5.1.13

The PLDI for the ergodic diffusion model with $h = h_0$ ($\tau = 1$) known has been derived in [71, Section 6]. However, the same argument does not go through in our proof without additional considerations, because the random function $\tilde{H}_n(\theta)$ differs from the original GQLF $H_n(\theta; h)$.

$g_q$-exponential ergodicity. Let $\{P_t(x, dy)\}_{t \in \mathbb{R}_+}$ denote the family of transition functions of $X$. Given a function $\rho : \mathbb{R}^d \to \mathbb{R}_+$ and a signed measure $m$ on the $d$-dimensional Borel space, we define

\[
\|m\|_\rho = \sup\left\{ \left| \int f(y)\, m(dy) \right| : f \text{ is } \mathbb{R}\text{-valued and measurable, such that } |f| \le \rho \right\}.
\]

Lemma 5.5.1. Under Assumption 5.1.12, the following statements hold.

(1) There exist a probability measure $\pi_0$ and a nonnegative $C^2$ function $g$ such that

\[
\limsup_{|x| \to \infty} \frac{1 + |x|^q}{g(x)} = 0
\]

for every $q > 0$, and that

\[
\|P_t(x, \cdot) - \pi_0(\cdot)\|_{g_q} \lesssim e^{-at} g_q(x), \qquad x \in \mathbb{R}^d, \tag{5.5.29}
\]

for some constant $a > 0$.

(2) $\sup_t E(|X_t|^q) < \infty$ for every $q > 0$.

Proof. (1) In view of the general results developed in [54, Section 6], under Assumption 5.1.12 it suffices for (5.5.29) to show the following: (i) every compact set in $\mathbb{R}^d$ is petite for the Markov chain $(X_{t_j})_{j=0}^n$ for any $h > 0$ small enough; (ii) the drift condition

\[
A_\theta g(x) \le -c_1' g(x) + c_2', \qquad x \in \mathbb{R}^d,
\]

holds for some $c_1', c_2' > 0$ and a nonnegative $C^2$ function $g$ such that $g(x) \to \infty$ as $|x| \to \infty$ faster than any polynomial, where $A_\theta$ denotes the infinitesimal generator of $X$: writing $b = (b_k)$ and $S = (S_{kl})$,

\[
A_\theta g(x) := \sum_{k} b_k(x, \theta)\, \partial_{x_k} g(x) + \frac{1}{2} \sum_{k,l} S_{kl}(x, \alpha)\, \partial_{x_k} \partial_{x_l} g(x). \tag{5.5.30}
\]

Indeed, both conditions follow from [30, Propositions 1.1, 1.2 and 5.1]: under Assumption 5.1.12, $X$ admits a transition density which is positive for every $(x, y, h) \in \mathbb{R}^d \times \mathbb{R}^d \times (0, 1]$, ensuring the topological condition (i); the drift condition (ii) can be derived for $g$ equal to $\exp(c''|x|^2)$ or $\exp(c''|x|)$ for some $c'' > 0$ outside a neighborhood of the origin in case of Assumption 5.1.12(2)(a) or (b), respectively.

(2) This follows from a standard application of the drift condition; see [30, Propositions 1.1(1) and 5.1(1)].
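To make the "standard application of the drift condition" in part (2) concrete, here is a formal sketch. It assumes that Dynkin's formula can be applied to $g$ without further localization and that $E\{g(X_0)\} < \infty$; neither assumption is verified here, and the rigorous argument is the one in the cited propositions.

```latex
% Formal sketch: drift condition => uniform moment bound.
\begin{align*}
\frac{d}{dt} E\{g(X_t)\}
  &= E\{A_\theta g(X_t)\}
   \le -c_1' E\{g(X_t)\} + c_2' , \\
\text{hence}\quad
E\{g(X_t)\}
  &\le e^{-c_1' t} E\{g(X_0)\} + \frac{c_2'}{c_1'}
   \qquad \text{(Gronwall's inequality)} .
\end{align*}
% Since (1 + |x|^q)/g(x) -> 0 as |x| -> infinity, we have |x|^q \lesssim 1 + g(x), so
\[
\sup_t E(|X_t|^q) \lesssim 1 + \sup_t E\{g(X_t)\}
  \le 1 + E\{g(X_0)\} + \frac{c_2'}{c_1'} < \infty .
\]
```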

Bounding inverse moment. The next lemma will be used to deduce some moment bounds required later.

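Lemma 5.5.2. Under Assumption 5.1.12, for every $q > 0$ we have

\[
\sup_{n > 2q/d} E\left( \sup_{\alpha \in \Theta_\alpha} \left| \frac{1}{n h_0} \sum_{j=1}^{n} S_{j-1}^{-1}(\alpha)\left[ (\Delta_j X)^{\otimes 2} \right] \right|^{-q} \right) < \infty.
\]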

Proof. Write $\zeta_j = |h_0^{-1/2} \Delta_j X|^2$. First, observe that the expectation can be bounded from above by

\[
\left( \sup_{(x, \alpha)} \lambda_{\max}^{q}(S(x, \alpha)) \right) E\left\{ \left( \frac{1}{n} \sum_{j=1}^{n} \left| h_0^{-1/2} \Delta_j X \right|^2 \right)^{-q} \right\} \lesssim E\left\{ \left( \frac{1}{n} \sum_{j=1}^{n} \zeta_j \right)^{-q} \right\}. \tag{5.5.31}
\]

Under Assumption 5.1.12, we have the following modified Aronson-type bound, with possibly unbounded drift coefficient, for the transition density of $X$, say $p_{h_0}(x, y)$ (that is, $P(X_{h_0} \in dy \mid X_0 = x) = p_{h_0}(x, y)\,dy$): there exist constants $A, B > 1$ and $B_0 \ge 0$ such that for every $(x, y, h_0) \in \mathbb{R}^d \times \mathbb{R}^d \times (0, 1]$,

\[
\frac{1}{A h_0^{d/2}}\, e^{-B_0 h_0 |x|^2} \exp\left( -\frac{B |y - x|^2}{h_0} \right) \le p_{h_0}(x, y) \le \frac{A}{h_0^{d/2}}\, e^{B_0 h_0 |x|^2} \exp\left( -\frac{|y - x|^2}{h_0 B} \right), \tag{5.5.32}
\]

where we can take $B_0 = 0$ when the drift $b$ is bounded (see [30, Proposition 1.2] for details). The bound (5.5.32) implies that the conditional density of $h_0^{-1/2} \Delta_j X$ given $X_{t_{j-1}} = x$, which we denote by $y \mapsto \bar{p}_{h_0}(y|x)$, satisfies

\[
A^{-1} e^{-B_0 h_0 |x|^2} \exp(-B |y|^2) \le \bar{p}_{h_0}(y|x) \le A\, e^{B_0 h_0 |x|^2} \exp(-|y|^2 / B).
\]

It follows that

\[
\sup_{y} \bar{p}_{h_0}(y|x) \lesssim \exp\left( h_0 B_0 |x|^2 \right). \tag{5.5.33}
\]
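The passage from (5.5.32) to the bound on the rescaled density is a change of variables. The following short sketch uses only the notation introduced above (the symbol $c_d$ for the volume of the unit ball in $\mathbb{R}^d$ is introduced here for illustration).

```latex
% The rescaled increment z = h_0^{-1/2}(y - x) has conditional density
\[
\bar{p}_{h_0}(z|x) = h_0^{d/2}\, p_{h_0}\bigl(x,\; x + h_0^{1/2} z\bigr),
\]
% so the factor h_0^{-d/2} in (5.5.32) cancels and |y - x|^2 / h_0 = |z|^2.
% Integrating the uniform bound (5.5.33) over a ball of radius sqrt(eps):
\[
\int_{|z| \le \sqrt{\epsilon}} \bar{p}_{h_0}(z|x)\, dz
  \le c_d\, \epsilon^{d/2} \sup_{z} \bar{p}_{h_0}(z|x)
  \lesssim \epsilon^{d/2} \exp\bigl(h_0 B_0 |x|^2\bigr).
\]
```

This is the form in which (5.5.33) enters the small-ball estimate for $\zeta_j$.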

In what follows, the constant $B_0 \ge 0$ may change at each appearance, keeping the rule that $B_0 = 0$ if $b$ is bounded (that is, if Assumption 5.1.12(2)(b) holds).

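Let $k \in \mathbb{N}$. We will prove that

\[
\sup_{i \ge 0} P\left( \sum_{j=1}^{k} \zeta_{i+j} \le \epsilon \right) \lesssim \epsilon^{kd/2}, \qquad \epsilon > 0. \tag{5.5.34}
\]

To this end, we make use of the argument in the proof of [6, Eq. (2.2)], while our conditions are apparently weaker. Write $P_l(\cdot)$ for the conditional probability given $\mathcal{F}_{t_l}$. Observe that from (5.5.33) we have a.s.

\[
P_{i+k-1}(\zeta_{i+k} \le \epsilon) = \int_{|y| \le \sqrt{\epsilon}} \bar{p}_{h_0}(y | X_{t_{i+k-1}})\, dy \lesssim \epsilon^{d/2} \exp\left( h_0 B_0 |X_{t_{i+k-1}}|^2 \right).
\]

This in turn implies that

\begin{align*}
I(\zeta_{i+k-1} \le \epsilon)\, P_{i+k-1}(\zeta_{i+k} \le \epsilon)
&\lesssim \epsilon^{d/2}\, I(\zeta_{i+k-1} \le \epsilon)\, I\left( |X_{t_{i+k-1}}| \le \sqrt{\epsilon h_0} + |X_{t_{i+k-2}}| \right) \exp\left( h_0 B_0 |X_{t_{i+k-1}}|^2 \right) \\
&\lesssim \epsilon^{d/2}\, I(\zeta_{i+k-1} \le \epsilon) \exp\left( h_0 B_0 |X_{t_{i+k-2}}|^2 \right).
\end{align*}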

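Iterating in the same manner while taking conditional expectations successively, we can deduce

\begin{align*}
P\left( \sum_{j=1}^{k} \zeta_{i+j} \le \epsilon \right)
&\le E\left( \prod_{j=1}^{k} I(\zeta_{i+j} \le \epsilon) \right)
= E\left\{ \left( \prod_{j=1}^{k-1} I(\zeta_{i+j} \le \epsilon) \right) P_{i+k-1}(\zeta_{i+k} \le \epsilon) \right\} \\
&\lesssim \epsilon^{d/2} E\left\{ \left( \prod_{j=1}^{k-2} I(\zeta_{i+j} \le \epsilon) \right) I(\zeta_{i+k-1} \le \epsilon) \exp\left( h_0 B_0 |X_{t_{i+k-2}}|^2 \right) \right\} \\
&\lesssim \epsilon^{d/2} E\left\{ \left( \prod_{j=1}^{k-2} I(\zeta_{i+j} \le \epsilon) \right) \exp\left( h_0 B_0 |X_{t_{i+k-2}}|^2 \right) P_{i+k-2}(\zeta_{i+k-1} \le \epsilon) \right\} \\
&\lesssim (\epsilon^{d/2})^2 E\left\{ \left( \prod_{j=1}^{k-2} I(\zeta_{i+j} \le \epsilon) \right) \exp\left( h_0 B_0 |X_{t_{i+k-3}}|^2 \right) \right\} \\
&\lesssim \cdots \lesssim \epsilon^{(k-1)d/2} E\left( I(\zeta_{i+1} \le \epsilon) \exp\left( h_0 B_0 |X_{t_i}|^2 \right) \right) \\
&\lesssim \epsilon^{kd/2} \sup_{t} E\left\{ \exp\left( h_0 B_0 |X_t|^2 \right) \right\}
\lesssim \epsilon^{kd/2},
\end{align*}

the last estimate holding for every $h_0$ small enough; again note that we can take $B_0 = 0$ when $b$ is bounded. Thus we have verified the estimate (5.5.34), so that

\[
\sup_{i \ge 0} P\left\{ \left( \sum_{j=1}^{k} \zeta_{i+j} \right)^{-q} \ge r \right\} \lesssim r^{-kd/(2q)}, \qquad r > 0.
\]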
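The tail bound is (5.5.34) evaluated at a particular $\epsilon$; the substitution can be written out as follows.

```latex
% For r > 0 and q > 0 the map s -> s^{-q} is decreasing on (0, infinity), so
\[
\left\{ \Bigl( \sum_{j=1}^{k} \zeta_{i+j} \Bigr)^{-q} \ge r \right\}
  = \left\{ \sum_{j=1}^{k} \zeta_{i+j} \le r^{-1/q} \right\}.
\]
% Applying (5.5.34) with epsilon = r^{-1/q} gives
\[
P\left\{ \Bigl( \sum_{j=1}^{k} \zeta_{i+j} \Bigr)^{-q} \ge r \right\}
  \lesssim \bigl( r^{-1/q} \bigr)^{kd/2} = r^{-kd/(2q)} .
\]
```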

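Now, letting $k > 2q/d$ we obtain

\[
\sup_{i \ge 0} E\left\{ \left( \sum_{j=1}^{k} \zeta_{i+j} \right)^{-q} \right\} \le 1 + \int_{1}^{\infty} r^{-kd/(2q)}\, dr \lesssim 1. \tag{5.5.35}
\]

Having (5.5.35) in hand, we can complete the proof in a similar manner to [26, Lemma A.1]. Let $m := [n/k]$. Obviously it suffices to consider $n = km$ ($m \in \mathbb{N}$). Write $\lambda_l = \sum_{j=k(l-1)+1}^{kl} \zeta_j$; then $\sum_{j=1}^{n} \zeta_j = \sum_{l=1}^{m} \lambda_l$, and (5.5.35) ensures that $\sup_l E(\lambda_l^{-q}) \lesssim 1$.

By the convexity of the mapping $s \mapsto s^{-q}$ ($s > 0$), Jensen's inequality gives

\[
E\left\{ \left( \frac{1}{n} \sum_{j=1}^{n} \zeta_j \right)^{-q} \right\} = k^q E\left\{ \left( \frac{1}{m} \sum_{l=1}^{m} \lambda_l \right)^{-q} \right\} \lesssim k^q,
\]

which combined with (5.5.31) completes the proof.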
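Two steps of this argument are compressed in the text: why $k > 2q/d$ is needed, and how Jensen's inequality is applied. A worked version, using only the quantities defined above:

```latex
% (a) Integrability of the tail: for a nonnegative random variable Y,
%     E(Y) = \int_0^\infty P(Y \ge r) dr \le 1 + \int_1^\infty P(Y \ge r) dr, and
\[
\int_{1}^{\infty} r^{-kd/(2q)}\, dr = \frac{1}{kd/(2q) - 1} < \infty
  \quad\Longleftrightarrow\quad \frac{kd}{2q} > 1
  \quad\Longleftrightarrow\quad k > \frac{2q}{d}.
\]
% (b) Jensen's inequality for the convex map s -> s^{-q}, with n = km:
\begin{align*}
\left( \frac{1}{n} \sum_{j=1}^{n} \zeta_j \right)^{-q}
  &= \left( \frac{1}{k} \cdot \frac{1}{m} \sum_{l=1}^{m} \lambda_l \right)^{-q}
   = k^{q} \left( \frac{1}{m} \sum_{l=1}^{m} \lambda_l \right)^{-q}
   \le k^{q}\, \frac{1}{m} \sum_{l=1}^{m} \lambda_l^{-q}, \\
E\left\{ \left( \frac{1}{n} \sum_{j=1}^{n} \zeta_j \right)^{-q} \right\}
  &\le k^{q} \sup_{l} E\bigl( \lambda_l^{-q} \bigr) \lesssim k^{q}.
\end{align*}
```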

Now, in order to show (5.1.15) and (5.1.16), it is enough to check [A1″], [A4′], and [A6] of [71]; the conditions [B1] and [B2] therein are given in Section 5.5.1 and Theorem 5.2.1, respectively. The claim (5.1.17) follows trivially from (5.1.15), (5.1.16), and the definition of $\tilde{h}'$. We put $\epsilon_1 = \epsilon_0 / 2$ in the sequel.

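Proof of (5.1.15). We will verify the following conditions: for every $M > 0$,

\begin{align*}
& E\left( \sup_{\beta} \left| \frac{1}{\sqrt{n}} \partial_\alpha \tilde{H}_n(\alpha_0, \beta) \right|^M \right) < \infty; \tag{5.5.36} \\
& E\left\{ \left( \sup_{\theta} n^{\epsilon_1} \left| \tilde{Y}^1_n(\alpha, \beta) - \tilde{Y}^1_0(\alpha) \right| \right)^M \right\} < \infty; \tag{5.5.37} \\
& E\left\{ \left( \frac{1}{n} \sup_{\theta} \left| \partial_\alpha^3 \tilde{H}_n(\theta) \right| \right)^M \right\} < \infty; \tag{5.5.38} \\
& E\left\{ \sup_{\beta} \left( n^{\epsilon_1} \left| -\frac{1}{n} \partial_\alpha^2 \tilde{H}_n(\alpha_0, \beta) - \tilde{\Gamma}_{1,0} \right| \right)^M \right\} < \infty. \tag{5.5.39}
\end{align*}

The conditions (5.5.36) to (5.5.39) imply [A1″] and [A6]. The left-hand side of (5.5.37) satisfies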

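\begin{align*}
& E\left\{ \left( \sup_{\theta} n^{\epsilon_1} \left| \tilde{Y}^1_n(\alpha, \beta) - \tilde{Y}^1_0(\alpha) \right| \right)^M \right\} \\
&\lesssim E\left\{ \left( \sup_{\alpha} n^{\epsilon_1} \left| \frac{1}{2n} \sum_{j=1}^{n} \log\left| S_{j-1}^{-1}(\alpha) S_{j-1}(\alpha_0) \right| - \frac{1}{2} \int_{\mathbb{R}^d} \log\left| S^{-1}(x, \alpha) S(x, \alpha_0) \right| \pi(dx) \right| \right)^M \right\} \\
&\quad + E\left\{ \left( \sup_{\alpha} n^{\epsilon_1} \left| \log \frac{h(\alpha)/h_0}{\frac{\tau}{d} \int_{\mathbb{R}^d} \mathrm{tr}\left( S^{-1}(x, \alpha) S(x, \alpha_0) \right) \pi(dx)} \right| \right)^M \right\} \\
&\quad + E\left\{ \left( \sup_{\theta} n^{\epsilon_1} \left| \frac{1}{n} \sum_{j=1}^{n} S_{j-1}^{-1}(\alpha)\left[ \Delta_j X, b_{j-1}(\theta) \right] \right| \right)^M \right\} \\
&\quad + E\left\{ \left( \sup_{\theta} n^{\epsilon_1} \left| \frac{1}{2} h(\alpha) \left( \frac{1}{n} \sum_{j=1}^{n} S_{j-1}^{-1}(\alpha)\left[ b_{j-1}(\theta)^{\otimes 2} \right] \right) \right| \right)^M \right\} \\
&\lesssim E\left\{ \left( \sup_{\alpha} n^{\epsilon_1} \left| \frac{1}{2n} \sum_{j=1}^{n} \log\left| S_{j-1}^{-1}(\alpha) S_{j-1}(\alpha_0) \right| - \frac{1}{2} \int_{\mathbb{R}^d} \log\left| S^{-1}(x, \alpha) S(x, \alpha_0) \right| \pi(dx) \right| \right)^M \right\} \\
&\quad + E\left[ \left\{ \sup_{\alpha} n^{\epsilon_1} \left| \frac{h(\alpha)/h_0}{\frac{\tau}{d} \int_{\mathbb{R}^d} \mathrm{tr}\left( S^{-1}(x, \alpha) S(x, \alpha_0) \right) \pi(dx)} \right|^{-1} \left| \frac{h(\alpha)/h_0}{\frac{\tau}{d} \int_{\mathbb{R}^d} \mathrm{tr}\left( S^{-1}(x, \alpha) S(x, \alpha_0) \right) \pi(dx)} - 1 \right| \right\}^M \right] + 1 \\
&\lesssim E\left\{ \left( \sup_{\alpha} n^{\epsilon_1} \left| \frac{1}{2n} \sum_{j=1}^{n} \log\left| S_{j-1}^{-1}(\alpha) S_{j-1}(\alpha_0) \right| - \frac{1}{2} \int_{\mathbb{R}^d} \log\left| S^{-1}(x, \alpha) S(x, \alpha_0) \right| \pi(dx) \right| \right)^M \right\} \\
&\quad + E\left[ \left\{ \sup_{\alpha} n^{\epsilon_1} \left| \frac{h(\alpha)}{h_0} \right|^{-1} \left| \frac{h(\alpha)}{h_0} - \frac{\tau}{d} \int_{\mathbb{R}^d} \mathrm{tr}\left( S^{-1}(x, \alpha) S(x, \alpha_0) \right) \pi(dx) \right| \right\}^M \right] + 1 \\
&\le E\left\{ \left( \sup_{\alpha} n^{\epsilon_1} \left| \frac{1}{2n} \sum_{j=1}^{n} \log\left| S_{j-1}^{-1}(\alpha) S_{j-1}(\alpha_0) \right| - \frac{1}{2} \int_{\mathbb{R}^d} \log\left| S^{-1}(x, \alpha) S(x, \alpha_0) \right| \pi(dx) \right| \right)^M \right\} \\
&\quad + E\left( \sup_{\alpha} \left| \frac{h(\alpha)}{h_0} \right|^{-2M} \right)^{1/2} \times E\left\{ \left( \sup_{\alpha} n^{\epsilon_1} \left| \frac{h(\alpha)}{h_0} - \frac{\tau}{d} \int_{\mathbb{R}^d} \mathrm{tr}\left( S^{-1}(x, \alpha) S(x, \alpha_0) \right) \pi(dx) \right| \right)^{2M} \right\}^{1/2} + 1,
\end{align*}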

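where in the second step we used (4.5.2), Lemma 5.5.1(2), and Hölder's inequality for the third and fourth terms. As in [50, Lemma 4.3], Lemma 5.5.1 ensures

\begin{align*}
& E\left\{ \left( \sup_{\alpha} n^{\epsilon_1} \left| \frac{1}{2n} \sum_{j=1}^{n} \log\left| S_{j-1}^{-1}(\alpha) S_{j-1}(\alpha_0) \right| - \frac{1}{2} \int_{\mathbb{R}^d} \log\left| S^{-1}(x, \alpha) S(x, \alpha_0) \right| \pi(dx) \right| \right)^M \right\} < \infty, \\
& E\left\{ \left( \sup_{\alpha} n^{\epsilon_1} \left| \frac{h(\alpha)}{h_0} - \frac{\tau}{d} \int_{\mathbb{R}^d} \mathrm{tr}\left( S^{-1}(x, \alpha) S(x, \alpha_0) \right) \pi(dx) \right| \right)^{2M} \right\} < \infty.
\end{align*}

Further, Lemma 5.5.2 implies

\[
E\left( \sup_{\alpha} \left| \frac{h(\alpha)}{h_0} \right|^{-2M} \right) < \infty.
\]

Hence, (5.5.37) is established. In a similar way, we have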

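\begin{align*}
& E\left( \sup_{\beta} \left| \frac{1}{\sqrt{n}} \partial_\alpha \tilde{H}_n(\alpha_0, \beta) \right|^M \right) \\
&\lesssim 1 + E\left( \left| \frac{1}{2\sqrt{n}} \sum_{j=1}^{n} \mathrm{tr}\left( S_{j-1}^{-1}(\alpha_0) \left( \partial_\alpha S_{j-1}(\alpha_0) \right) \right) + \frac{1}{h(\alpha_0)} \frac{1}{2\sqrt{n}} \sum_{j=1}^{n} \partial_\alpha S_{j-1}^{-1}(\alpha_0)\left[ (\Delta_j X)^{\otimes 2} \right] \right|^M \right) \\
&\lesssim 1 + E\left\{ \left| \frac{h_0}{h(\alpha_0)} \frac{1}{2\sqrt{n}} \sum_{j=1}^{n} \mathrm{tr}\left( S_{j-1}^{-1}(\alpha_0) \left( \partial_\alpha S_{j-1}(\alpha_0) \right) \right) \left( \frac{h(\alpha_0)}{h_0} - \tau \right) \right|^M \right\} \\
&\le 1 + E\left( \left| \frac{h_0}{h(\alpha_0)} \right|^{3M} \right)^{1/3} E\left( \left| \frac{1}{n} \sum_{j=1}^{n} \mathrm{tr}\left( S_{j-1}^{-1}(\alpha_0) \left( \partial_\alpha S_{j-1}(\alpha_0) \right) \right) \right|^{3M} \right)^{1/3} \\
&\quad \times E\left( \left| \frac{1}{\sqrt{n}\, h_0 d} \sum_{j=1}^{n} S_{j-1}^{-1}(\alpha_0)\left[ (\Delta_j X)^{\otimes 2} - \tau h_0 S_{j-1}(\alpha_0) \right] \right|^{3M} \right)^{1/3} < \infty,
\end{align*}

\begin{align*}
& E\left\{ \sup_{\beta} \left( n^{\epsilon_1} \left| -\frac{1}{n} \partial_\alpha^2 \tilde{H}_n(\alpha_0, \beta) - \tilde{\Gamma}_{1,0} \right| \right)^M \right\} \\
&\lesssim 1 + E\left[ \left\{ n^{\epsilon_1} \left| \frac{1}{2n} \sum_{j=1}^{n} \partial_\alpha^2 \left( \log |S_{j-1}(\alpha_0)| \right) + \frac{1}{2} \left( \frac{1}{n} \sum_{j=1}^{n} \partial_\alpha^2 S_{j-1}^{-1}(\alpha_0)\left[ S_{j-1}(\alpha_0) \right] \right) - \frac{1}{2d} \left( \frac{1}{n} \sum_{j=1}^{n} \partial_\alpha S_{j-1}^{-1}(\alpha_0)\left[ S_{j-1}(\alpha_0) \right] \right)^{\otimes 2} - \tilde{\Gamma}_{1,0} \right| \right\}^M \right] \\
&\quad + E\left[ \left\{ n^{\epsilon_1} \left| \frac{1}{2} \left( \frac{1}{n \tau h_0} \sum_{j=1}^{n} \partial_\alpha^2 S_{j-1}^{-1}(\alpha_0)\left[ (\Delta_j X)^{\otimes 2} \right] \right) - \frac{1}{2d} \left( \frac{1}{n \tau h_0} \sum_{j=1}^{n} \partial_\alpha S_{j-1}^{-1}(\alpha_0)\left[ (\Delta_j X)^{\otimes 2} \right] \right)^{\otimes 2} - \frac{d}{2} \partial_\alpha^2 \left( \log h(\alpha) \right) \right| \right\}^M \right] \\
&\lesssim 1 + E\left[ \left\{ n^{\epsilon_1} \left| \frac{1}{2} \left( \frac{1}{n \tau h_0} \sum_{j=1}^{n} \partial_\alpha^2 S_{j-1}^{-1}(\alpha_0)\left[ (\Delta_j X)^{\otimes 2} \right] \right) - \frac{1}{2d} \left( \frac{1}{n \tau h_0} \sum_{j=1}^{n} \partial_\alpha S_{j-1}^{-1}(\alpha_0)\left[ (\Delta_j X)^{\otimes 2} \right] \right)^{\otimes 2} \right. \right. \right. \\
&\qquad \left. \left. \left. - \frac{d}{2} \left( \frac{1}{nd} \sum_{j=1}^{n} \partial_\alpha^2 S_{j-1}^{-1}(\alpha_0)\left[ (\Delta_j X)^{\otimes 2} \right] \right) \frac{1}{h(\alpha)} + \frac{d}{2} \left( \frac{1}{nd} \sum_{j=1}^{n} \partial_\alpha S_{j-1}^{-1}(\alpha_0)\left[ (\Delta_j X)^{\otimes 2} \right] \right)^{\otimes 2} \left( \frac{1}{h(\alpha)} \right)^2 \right| \right\}^M \right] \\
&\le 1 + E\left[ \left\{ n^{\epsilon_1} \left| \frac{h_0}{h(\alpha)} \left( \frac{1}{n h_0} \sum_{j=1}^{n} \partial_\alpha^2 S_{j-1}^{-1}(\alpha_0)\left[ (\Delta_j X)^{\otimes 2} \right] \right) \left( \frac{h(\alpha)}{\tau h_0} - 1 \right) \right| \right\}^M \right] \\
&\quad + E\left[ \left\{ n^{\epsilon_1} \left| \left( \frac{h_0}{h(\alpha)} \right)^2 \left( \frac{1}{n h_0} \sum_{j=1}^{n} \partial_\alpha S_{j-1}^{-1}(\alpha_0)\left[ (\Delta_j X)^{\otimes 2} \right] \right)^{\otimes 2} \left\{ \left( \frac{h(\alpha)}{\tau h_0} - 1 \right)^2 + 2 \left( \frac{h(\alpha)}{\tau h_0} - 1 \right) \right\} \right| \right\}^M \right] < \infty,
\end{align*}

\begin{align*}
& E\left\{ \left( \frac{1}{n} \sup_{\theta} \left| \partial_\alpha^3 \tilde{H}_n(\theta) \right| \right)^M \right\}
\lesssim 1 + E\left\{ \left( \sup_{\alpha} \left| \partial_\alpha^3 \left( \log h(\alpha) \right) \right| \right)^M \right\} \\
&\lesssim 1 + E\left[ \left\{ \sup_{\alpha} \left| \frac{1}{n h_0 d} \sum_{j=1}^{n} \partial_\alpha^3 S_{j-1}^{-1}(\alpha)\left[ (\Delta_j X)^{\otimes 2} \right] \right| \left| \frac{h_0}{h(\alpha)} \right| \right\}^M \right] \\
&\quad + E\left[ \left\{ \sup_{\alpha} \left| \frac{1}{n h_0 d} \sum_{j=1}^{n} \partial_\alpha^2 S_{j-1}^{-1}(\alpha)\left[ (\Delta_j X)^{\otimes 2} \right] \times \frac{1}{n h_0 d} \sum_{j=1}^{n} \partial_\alpha S_{j-1}^{-1}(\alpha)\left[ (\Delta_j X)^{\otimes 2} \right] \right| \left| \frac{h_0}{h(\alpha)} \right|^2 \right\}^M \right] \\
&\quad + E\left[ \left\{ \sup_{\alpha} \left| \frac{1}{n h_0 d} \sum_{j=1}^{n} \partial_\alpha S_{j-1}^{-1}(\alpha)\left[ (\Delta_j X)^{\otimes 2} \right] \right|^3 \left| \frac{h_0}{h(\alpha)} \right|^3 \right\}^M \right] < \infty.
\end{align*}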

The proofs of the conditions (5.5.36), (5.5.38), and (5.5.39) are complete. The tuning-parameter condition [A4′] can be verified in exactly the same way as in [71, Section 6]. We thus obtain (5.1.15).

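Proof of (5.1.16). We will prove the following conditions: for every $M > 0$,

\begin{align*}
& \sup_{n} E\left( \left| \frac{1}{\sqrt{n \tau h_0}} \partial_\beta \tilde{H}_n(\theta_0) \right|^M \right) < \infty; \tag{5.5.40} \\
& \sup_{n} E\left\{ \left( \sup_{\beta} (n \tau h_0)^{\epsilon_1} \left| \tilde{Y}^2_n(\beta; \alpha_0) - \tilde{Y}^2_0(\beta) \right| \right)^M \right\} < \infty; \tag{5.5.41} \\
& \sup_{n} E\left\{ \left( \frac{1}{n \tau h_0} \sup_{\beta} \left| \partial_\beta^3 \tilde{H}_n(\alpha_0, \beta) \right| \right)^M \right\} < \infty; \tag{5.5.42} \\
& \sup_{n} E\left\{ \left( (n \tau h_0)^{\epsilon_1} \left| -\frac{1}{n \tau h_0} \partial_\beta^2 \tilde{H}_n(\theta_0) - \tilde{\Gamma}_{2,0} \right| \right)^M \right\} < \infty. \tag{5.5.43}
\end{align*}

We have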

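\begin{align*}
\left| \frac{1}{\sqrt{n \tau h_0}} \partial_\beta \tilde{H}_n(\theta_0) \right|
&\lesssim \left| \frac{1}{\sqrt{n \tau h_0}} \sum_{j=1}^{n} S_{j-1}^{-1}(\alpha_0)\left[ \Delta_j X - \tau h_0 b_{j-1}(\theta_0),\ \partial_\beta b_{j-1}(\theta_0) \right] \right| \\
&\quad + \left| \sqrt{\frac{n h_0}{\tau}}\, \frac{1}{n h_0 d} \sum_{j=1}^{n} S_{j-1}^{-1}(\alpha_0)\left[ (\Delta_j X)^{\otimes 2} - E_{j-1}\{ (\Delta_j X)^{\otimes 2} \} \right] \times \frac{1}{n} \sum_{j=1}^{n} S_{j-1}^{-1}(\alpha_0)\left[ b_{j-1}(\theta_0),\ \partial_\beta b_{j-1}(\theta_0) \right] \right| \\
&\quad + \left| \sqrt{\frac{n h_0}{\tau}}\, \frac{1}{n h_0 d} \sum_{j=1}^{n} S_{j-1}^{-1}(\alpha_0)\left[ E_{j-1}\{ (\Delta_j X)^{\otimes 2} \} - \tau h_0 S_{j-1}(\alpha_0) \right] \times \frac{1}{n} \sum_{j=1}^{n} S_{j-1}^{-1}(\alpha_0)\left[ b_{j-1}(\theta_0),\ \partial_\beta b_{j-1}(\theta_0) \right] \right|.
\end{align*}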

We can deduce (5.5.40) by using Lemma 8(b) of [71], Burkholder's inequality, and Hölder's inequality. From Lemmas 8(a) and 9 of [71], we can show the following inequalities in a similar way to the proof of (5.5.40):

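\begin{align*}
& E\left\{ \left( \sup_{\beta} (n \tau h_0)^{\epsilon_1} \left| \tilde{Y}^2_n(\beta; \alpha_0) - \tilde{Y}^2_0(\beta) \right| \right)^M \right\} \\
&\lesssim 1 + \tau^{\epsilon_1} E\left\{ \left( \sup_{\beta} (n h_0)^{\epsilon_1} \left| \frac{1}{n h_0} \sum_{j=1}^{n} \left( S_{j-1}^{-1}(\alpha_0)\left[ \Delta_j X,\ b_{j-1}(\alpha_0, \beta) - b_{j-1}(\theta_0) \right] - \frac{h_0}{2} S_{j-1}^{-1}(\alpha_0)\left[ b_{j-1}(\alpha_0, \beta)^{\otimes 2} - b_{j-1}(\theta_0)^{\otimes 2} \right] \right) - \tilde{Y}^2_0(\beta) \right| \right)^M \right\} < \infty, \\
& E\left\{ \left( \frac{1}{n \tau h_0} \sup_{\beta} \left| \partial_\beta^3 \tilde{H}_n(\alpha_0, \beta) \right| \right)^M \right\} < \infty, \\
& E\left\{ \left( (n \tau h_0)^{\epsilon_1} \left| -\frac{1}{n \tau h_0} \partial_\beta^2 \tilde{H}_n(\theta_0) - \tilde{\Gamma}_{2,0} \right| \right)^M \right\} \\
&\lesssim 1 + \tau^{\epsilon_1} E\left\{ \left( (n h_0)^{\epsilon_1} \left| \frac{1}{n} \sum_{j=1}^{n} S_{j-1}^{-1}(\alpha_0)\left[ \partial_\beta b_{j-1}(\theta_0),\ \partial_\beta b_{j-1}(\theta_0) \right] - \tilde{\Gamma}_{2,0} \right| \right)^M \right\} \\
&\quad + \tau^{\epsilon_1} E\left\{ \left( (n h_0)^{\epsilon_1} \left| \left( \frac{h(\alpha_0)}{\tau h_0} - 1 \right) \frac{1}{n} \sum_{j=1}^{n} S_{j-1}^{-1}(\alpha_0)\left[ b_{j-1}(\theta_0),\ \partial_\beta^2 b_{j-1}(\theta_0) \right] \right| \right)^M \right\} < \infty.
\end{align*}

Hence, we have established (5.5.41) to (5.5.43) as well. Finally, the tuning-parameter condition [A4′] can be verified as before, completing the proof of (5.1.16).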

Chapter 6 Implementation of model selection function in Yuima package

In this chapter, we explain the specification of the model selection function IC. Based on the studies of model selection for stochastic differential equations (see Chapter 4), we create the function IC. This function can calculate QBIC, BIC and the contrast-based information criterion (CIC), which is an AIC-type criterion introduced by [60] under the rapidly increasing experimental design $n h_n^2 \to 0$ (see also [28] for CIC under a weaker sampling-design condition $n h_n^q \to 0$ for some $q \ge 2$). The arguments in IC are:

IC(yuima, data, start, lower, upper, joint = FALSE, rcpp = FALSE, ...)

In the following, we illustrate the arguments of IC.

– yuima is a yuima object.

– data is the data to be used for the model selection.

– start is a named list of the initial values of the parameters for optimization.

– lower is a named list for specifying lower bounds of the parameters.

– upper is a named list for specifying upper bounds of the parameters.

– joint specifies whether the parameters are estimated jointly or in two stages. Defaults to “FALSE”.

– rcpp specifies whether to use C++ code or not. Defaults to “FALSE”.

The function IC has the following return values.

– $par are the estimators of the parameters included in the candidate model.

– $CIC is a value of the contrast-based information criterion.

– $BIC is a value of the Bayesian information criterion.

– $QBIC is a value of the quasi-Bayesian information criterion.

Note that the function IC uses the function qmle with method="L-BFGS-B" internally.

Remark 6.0.1.

• Let $(X_{t_j})_{j=0}^{N}$ be the observations (i.e. length(Xt)=N+1). Then, the element n of setSampling is given by N, and yuima@sampling@n should be given by N + 1. For example, the argument yuima of the candidate model is written as follows:

N <- length(Xt)-1
mod <- setModel(drift="alpha1*x",
    diffusion="exp((beta1*cos(x)+beta3)/2)")
samp <- setSampling(Terminal=N^(1/3), n=N)
yuima <- setYuima(model=mod, sampling=samp)
yuima@sampling@n <- N+1

• The argument start must contain the names of all parameters of the argument yuima, i.e. names(start) ⊇ yuima@model@parameter@all. The arguments lower and upper are given in a similar manner.

• If the argument start includes extra elements, the function IC extracts the required elements and uses them for the calculation. That is, when names(start) ⊃ yuima@model@parameter@all, IC uses only the elements with the same names as yuima@model@parameter@all.

• If the Hessian matrix of the quasi-likelihood function is negative semidefinite, QBIC takes the same value as BIC.
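The name-matching rule for start, lower and upper described in the remark can be illustrated in base R. The helper pick_params below is hypothetical and is not part of yuima; it is only a minimal sketch of the stated behaviour (extra elements are dropped, missing parameters are an error), not the code that IC actually runs.

```r
# Hypothetical helper (NOT a yuima function): keep only the entries of `values`
# whose names are model parameters, and stop if a parameter is absent.
pick_params <- function(values, param_names) {
  absent <- setdiff(param_names, names(values))
  if (length(absent) > 0) {
    stop("missing parameters: ", paste(absent, collapse = ", "))
  }
  values[param_names]
}

# Parameter names of a candidate model with one diffusion and one drift
# parameter (in yuima these would come from yuima@model@parameter@all).
param_names <- c("alpha1", "beta")

# `start` contains an extra element (alpha2) that this candidate does not use.
start <- list(alpha1 = -2, alpha2 = 1, beta = -1)
used <- pick_params(start, param_names)
print(names(used))
```

A single list of initial values can therefore be shared across several candidate models, which is how para.init is reused for both examples in Listing 6.1.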

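Listing 6.1 shows an example of the use of IC. In this example, we have a sample $X_n = (X_{t_j})_{j=0}^{n}$ with $t_j = j n^{-2/3}$ from the model

\[
dX_t = \exp\left\{ \frac{1}{2}(-2 \cos X_t + 1) \right\} dw_t - X_t\, dt, \qquad t \in [0, T_n], \qquad X_0 = 1,
\]

where $T_n = n^{1/3}$ and $n = 1000$. We consider the models

\[
dX_t = \exp\left\{ \frac{1}{2}(\alpha_1 \cos X_t + \alpha_2) \right\} dw_t + \beta_1 X_t\, dt, \qquad
dX_t = \exp\left\{ \frac{1}{2}(\alpha_1 \cos X_t) \right\} dw_t + \beta_1 X_t\, dt
\]

in Ex.1 and Ex.2, respectively.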
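For readers without yuima at hand, the sampling design can be reproduced with a plain Euler-Maruyama scheme in base R. This is only an illustrative sketch: the number of Euler sub-steps per observation interval (sub = 50) mirrors the factor 50 used in Listing 6.1, but the scheme itself is an assumption and is not claimed to match what yuima's simulate does internally, so the simulated path will differ from the one in the listing.

```r
set.seed(123)

N   <- 1000          # number of increments; N + 1 observations in total
h   <- N^(-2/3)      # sampling step, so that N * h = N^(1/3) = T_n
sub <- 50            # Euler sub-steps per sampling interval (assumption)
dt  <- h / sub

x  <- 1              # initial value X_0 = 1
Xt <- numeric(N + 1)
Xt[1] <- x

for (j in 1:N) {
  for (s in 1:sub) {
    # True model: drift -x, diffusion coefficient exp((-2*cos(x) + 1)/2)
    sigma <- exp((-2 * cos(x) + 1) / 2)
    x <- x - x * dt + sigma * sqrt(dt) * rnorm(1)
  }
  Xt[j + 1] <- x     # keep only the value at the sampling time t_j = j * h
}

print(length(Xt))    # N + 1 observations
```

The vector Xt has length N + 1, which is why the remark requires yuima@sampling@n to be set to N + 1 while setSampling is given n = N.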

Listing 6.1: Example of use of IC

library(yuima)
N <- 1000; Ter <- N^(1/3) # number of data and terminal sampling time

## Data generate
set.seed(123)
mod <- setModel(drift="beta*x",
    diffusion="exp((alpha1*cos(x)+alpha2)/2)")
samp <- setSampling(Terminal=Ter, n = N)
yuima <- setYuima(model=mod, sampling=setSampling(Terminal=Ter, n=50*N))
simu.yuima <- simulate(yuima, xinit=1, true.parameter=list(alpha1=-2,
    alpha2=1, beta=-1), subsampling=samp)
# keep every 50th simulated point as an observation
Xt <- NULL
for(i in 1:(N+1)){
  Xt <- c(Xt, simu.yuima@data@original.data[50*(i-1)+1])
}

## Parameter settings
para.init <- list(alpha1=runif(1,max=-1.5,min=-2.5),
    alpha2=runif(1,max=1.5,min=0.5), beta=runif(1,max=-0.5,min=-1.5))
para.low <- list(alpha1=-7, alpha2=-4, beta=-6)
para.upp <- list(alpha1=3, alpha2=6, beta=-0.01)

## Ex.1 (dXt = (beta*x)*dt + exp((alpha1*cos(x)+alpha2)/2)*dWt)
mod1 <- setModel(drift="beta*x",
    diffusion="exp((alpha1*cos(x)+alpha2)/2)")
samp1 <- setSampling(Terminal=Ter, n = N)
yuima1 <- setYuima(model=mod1, sampling=samp1)
yuima1@sampling@n <- length(Xt)
ic1 <- IC(yuima1, data=Xt, start=para.init, upper=para.upp,
    lower=para.low, rcpp=TRUE)

> ic1
$par
    alpha1     alpha2       beta
-1.8874815  0.7749077 -0.8515751

$BIC
[1] -2670.394

$QBIC
[1] -2676.686

$CIC
[1] -2680.512

## Ex.2 (dXt = (beta*x)*dt + exp(alpha1*cos(x)/2)*dWt)
mod2 <- setModel(drift="beta*x", diffusion="exp(alpha1*cos(x)/2)")
samp2 <- setSampling(Terminal=Ter, n = N)
yuima2 <- setYuima(model=mod2, sampling=samp2)
yuima2@sampling@n <- length(Xt)
ic2 <- IC(yuima2, data=Xt, start=para.init, upper=para.upp,
    lower=para.low, rcpp=TRUE)

> ic2
$par
    alpha1       beta
-1.0379374 -0.9202064

$BIC
[1] -2670.074

$QBIC
[1] -2671.756

$CIC
[1] -2675.285
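The printed values can be compared directly in base R. The sketch below copies the numbers from the output of Listing 6.1 and picks, for each criterion, the model with the smaller value. It also checks a hypothesis about the penalty terms that is an assumption of this sketch and is not stated in the text: that BIC adds $p_\alpha \log n + p_\beta \log(n h_n)$ and CIC adds $2(p_\alpha + p_\beta)$ to the same fitted term, so that BIC minus CIC depends only on the parameter counts.

```r
# Values copied from the output printed in Listing 6.1.
crit <- rbind(
  Ex1 = c(BIC = -2670.394, QBIC = -2676.686, CIC = -2680.512),
  Ex2 = c(BIC = -2670.074, QBIC = -2671.756, CIC = -2675.285)
)

# For each criterion, the selected model is the one with the smallest value.
idx <- apply(crit, 2, which.min)
selected <- setNames(rownames(crit)[idx], colnames(crit))
print(selected)

# Assumed penalties (hypothesis of this sketch, not taken from the text):
#   BIC: p_alpha * log(n) + p_beta * log(n * h),   CIC: 2 * (p_alpha + p_beta)
n  <- 1000
Tn <- n^(1/3)   # n * h = T_n
pen_diff <- function(p_alpha, p_beta) {
  (p_alpha * log(n) + p_beta * log(Tn)) - 2 * (p_alpha + p_beta)
}

# Ex.1 has (p_alpha, p_beta) = (2, 1); Ex.2 has (1, 1).
printed_diff <- crit[, "BIC"] - crit[, "CIC"]
assumed_diff <- c(Ex1 = pen_diff(2, 1), Ex2 = pen_diff(1, 1))
print(round(cbind(printed_diff, assumed_diff), 3))
```

All three criteria prefer Ex.1, the model that contains the data-generating one, and the printed BIC minus CIC gaps agree with the assumed penalty difference up to the rounding of the printed output.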

Chapter 7 Appendix

We here step away from the main context and present a set of conditions under which a quasi-marginal log-likelihood admits a Schwarz-type stochastic expansion, following the proof of Theorem 2.1 in the preprint [36].

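Let $H_n : \Theta \times \Omega \to \mathbb{R}$ be a $C^3(\Theta)$-random function, where $\Theta \subset \mathbb{R}^p$ is a bounded convex domain. We set $\theta = (\alpha, \beta) \in \mathbb{R}^{p_\alpha} \times \mathbb{R}^{p_\beta}$. Let $\theta_0 = (\alpha_0, \beta_0) \in \Theta$ be a constant, and

\[
R_n = R_n(\theta_0) = \mathrm{diag}\left( r_{1,n} I_{p_\alpha},\ r_{2,n} I_{p_\beta} \right),
\]

where $(r_{1,n})$ and $(r_{2,n})$ are positive sequences possibly depending on $\theta_0$ and satisfying $r_{1,n} \vee r_{2,n} \to 0$ and $r_{1,n} / r_{2,n} \to 0$ as $n \to \infty$. We then introduce the random field on $\mathbb{R}^p$:

\[
Z_n(u) := \exp\{ H_n(\theta_0 + R_n u) - H_n(\theta_0) \};
\]

we set $Z_n \equiv 0$ outside the set $U_n = U_n(\theta_0) := R_n^{-1}(\Theta - \theta_0) \subset \mathbb{R}^p$. Let $\pi(\theta)$ be a prior probability density on $\Theta$, which is assumed to be continuous and positive at $\theta_0$. Let $\Delta_n(\theta_0) := R_n \partial_\theta H_n(\theta_0)$, let $\Gamma_{1,0} \in \mathbb{R}^{p_\alpha} \otimes \mathbb{R}^{p_\alpha}$ and $\Gamma_{2,0} \in \mathbb{R}^{p_\beta} \otimes \mathbb{R}^{p_\beta}$ be a.s. positive definite random matrices, and set

\[
\Gamma_0 := \mathrm{diag}(\Gamma_{1,0}, \Gamma_{2,0}).
\]

Further, let

\[
Y_{1,n}(\theta) := r_{1,n}^2 \{ H_n(\alpha, \beta) - H_n(\alpha_0, \beta) \}, \qquad
Y_{2,n}(\beta) := r_{2,n}^2 \{ H_n(\alpha_0, \beta) - H_n(\alpha_0, \beta_0) \},
\]

and let $Y_1(\alpha)$ and $Y_2(\beta)$ be random functions. Finally, we introduce the quadratic random field

\[
Z^0_n(u) = \exp\left( \Delta_n(\theta_0)[u] - \frac{1}{2} \Gamma_0[u, u] \right).
\]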

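Theorem 7.0.1. In addition to the aforementioned setting, suppose the following conditions.

• There exists an a.s. positive definite random matrix $\Sigma_0 \in \mathbb{R}^p \otimes \mathbb{R}^p$ such that

\[
\left( \Delta_n(\theta_0),\ -R_n \partial_\theta^2 H_n(\theta_0) R_n \right) \xrightarrow{\mathcal{L}} \left( \Sigma_0^{-1/2} \eta,\ \Gamma_0 \right), \tag{7.0.1}
\]

where $\eta \sim N_p(0, I_p)$ is a random variable defined on an extension of the original probability space.

• We have

\begin{align*}
\sup_{\beta} \left| r_{1,n} \partial_\alpha H_n(\alpha_0, \beta) \right| &= O_p(1), \tag{7.0.2} \\
\sup_{\beta} \left| -r_{1,n}^2 \partial_\alpha^2 H_n(\alpha_0, \beta) - \Gamma_{1,0} \right| &= o_p(1), \tag{7.0.3} \\
\sup_{\theta} \left| R_n \partial_\theta^3 H_n(\theta) R_n \right| &= O_p(1). \tag{7.0.4}
\end{align*}

• There exists a constant $q \in (0, 1)$ for which

\[
r_{1,n}^{-q} \sup_{\theta} |Y_{1,n}(\theta) - Y_1(\alpha)| \vee r_{2,n}^{-q} \sup_{\beta} |Y_{2,n}(\beta) - Y_2(\beta)| \xrightarrow{P} 0. \tag{7.0.5}
\]

• There exists an a.s. positive random variable $\chi_0$ such that for each $\kappa > 0$,

\[
\sup_{\alpha;\, |\alpha - \alpha_0| \ge \kappa} Y_1(\alpha) \vee \sup_{\beta;\, |\beta - \beta_0| \ge \kappa} Y_2(\beta) \le -\chi_0 \kappa^2 \quad \text{a.s.} \tag{7.0.6}
\]

Then, any $\hat{\theta}_n \in \mathrm{argmax}\, H_n$ satisfies $\hat{\theta}_n \xrightarrow{P} \theta_0$, and we have

\[
\int \left| Z_n(u)\, \pi(\theta_0 + R_n u) - Z^0_n(u)\, \pi(\theta_0) \right| du \xrightarrow{P} 0.
\]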
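To see why the conclusion is called a Schwarz-type expansion, the following heuristic sketch combines a change of variables with the Gaussian integral of the quadratic random field. Here $\pi(\cdot)$ denotes the prior density as in the last display, while the factor $(2\pi)^{p/2}$ involves the numerical constant; the passage from the $L^1$ convergence to the logarithm is stated informally and is not a substitute for the argument in [36].

```latex
% Change of variables theta = theta_0 + R_n u, with det(R_n) = r_{1,n}^{p_alpha} r_{2,n}^{p_beta}:
\[
\int_{\Theta} \exp\{H_n(\theta)\}\, \pi(\theta)\, d\theta
  = \exp\{H_n(\theta_0)\}\; r_{1,n}^{p_\alpha} r_{2,n}^{p_\beta}
    \int_{U_n} Z_n(u)\, \pi(\theta_0 + R_n u)\, du .
\]
% Gaussian integral of the quadratic random field:
\[
\int_{\mathbb{R}^p} Z^0_n(u)\, du
  = (2\pi)^{p/2} \det(\Gamma_0)^{-1/2}
    \exp\left( \tfrac{1}{2}\, \Gamma_0^{-1}\bigl[ \Delta_n(\theta_0)^{\otimes 2} \bigr] \right).
\]
% Taking logarithms (heuristically):
\begin{align*}
\log \int_{\Theta} \exp\{H_n(\theta)\}\, \pi(\theta)\, d\theta
  &= H_n(\theta_0) + p_\alpha \log r_{1,n} + p_\beta \log r_{2,n}
     + \log \pi(\theta_0) + \frac{p}{2} \log(2\pi) \\
  &\quad - \frac{1}{2} \log \det(\Gamma_0)
     + \frac{1}{2}\, \Gamma_0^{-1}\bigl[ \Delta_n(\theta_0)^{\otimes 2} \bigr] + o_p(1).
\end{align*}
```

For instance, with $r_{1,n} = n^{-1/2}$ the term $p_\alpha \log r_{1,n}$ equals $-(p_\alpha/2) \log n$, which is the classical Schwarz penalty.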

Bibliography

[1] H. Akaike. Information theory and an extension of the maximum likelihood principle. In Second International Symposium on Information Theory (Tsahkadsor, 1971), pages 267–281. Akadémiai Kiadó, Budapest, 1973.

[2] H. Akaike. A new look at the statistical model identification. IEEE Trans. Automatic Control, AC-19:716–723, 1974. System identification and time-series analysis.

[3] K. Antonio and J. Beirlant. Actuarial statistics with generalized linear mixed models. Insurance Math. Econom., 40(1):58–76, 2007.

[4] Y. Baraud, F. Comte, and G. Viennet. Adaptive estimation in autoregression or β-mixing regression via model selection. Ann. Statist., 29(3):839–875, 2001.

[5] D. Berg. Bankruptcy prediction by generalized additive models. Appl. Stoch. Models Bus. Ind., 23(2):129–143, 2007.

[6] R. J. Bhansali and F. Papangelou. Convergence of moments of least squares estimators for the coefficients of an autoregressive process of unknown order. Ann. Statist., 19(3):1155–1162, 1991.

[7] H. P. Boswijk. Mixed normal inference on multicointegration. Econometric Theory, 26(5):1565–1576, 2010.

[8] H. Bozdogan. Model selection and Akaike’s information criterion (AIC): the general theory and its analytical extensions. Psychometrika, 52(3):345–370, 1987.

[9] A. Brouste, M. Fukasawa, H. Hino, S. M. Iacus, K. Kamatani, Y. Koike, H. Masuda, R. Nomura, T. Ogihara, Y. Shimizu, M. Uchida, and N. Yoshida. The yuima project: A computational framework for simulation and inference of stochastic differential equations. Journal of Statistical Software, 57(4):1–51, 2014.

[10] K. P. Burnham and D. R. Anderson. Model Selection and Multimodel Inference. Springer-Verlag, New York, second edition, 2002.

[11] G. Casella, F. J. Girón, M. L. Martínez, and E. Moreno. Consistency of Bayesian procedures for variable selection. Ann. Statist., 37(3):1207–1228, 2009.

[12] J. E. Cavanaugh and A. A. Neath. Generalizing the derivation of the Schwarz information criterion. Comm. Statist. Theory Methods, 28(1):49–66, 1999.

[13] N. H. Chan, S.-F. Huang, and C.-K. Ing. Moment bounds and mean squared prediction errors of long-memory time series. Ann. Statist., 41(3):1268–1298, 2013.

[14] N. H. Chan and C.-K. Ing. Uniform moment bounds of Fisher’s information with applications to time series. Ann. Statist., 39(3):1526–1550, 2011.

[15] J. Chen and Z. Chen. Extended Bayesian information criteria for model selection with large model spaces. Biometrika, 95(3):759–771, 2008.

[16] G. Claeskens and N. L. Hjort. Model Selection and Model Averaging. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, 2008.

[17] Y. A. Davydov. Mixing conditions for Markov chains. Theory of Probability and Its Applications, 18(2):312–328, 1973.

[18] I. Domowitz and H. White. Misspecified models with dependent observations. J. Econometrics, 20(1):35–58, 1982.

[19] P. Doukhan. Mixing: Properties and Examples. Springer, New York, 1994.

[20] J. J. Dziak, D. L. Coffman, S. T. Lanza, and R. Li. Sensitivity and specificity of information criteria. PeerJ PrePrints, 3, 2012.

[21] S. Eguchi. Model comparison for generalized linear models with dependent observations. Econom. Stat., 5:171–188, 2018.

[22] S. Eguchi and H. Masuda. Data driven time scale in Gaussian quasi-likelihood inference. arXiv:1801.10378v2, 2018.

[23] S. Eguchi and H. Masuda. Schwarz type model comparison for LAQ models. Bernoulli, 24(3):2278–2327, 2018.

[24] L. Fahrmeir and H. Kaufmann. Consistency and asymptotic normality of the maximum likelihood estimator in generalized linear models. Ann. Statist., 13(1):342–368, 1985.

[25] V. Fasen and S. Kimmig. Information criteria for multivariate CARMA processes. arXiv:1505.00901, to appear in Bernoulli, 2015.

[26] D. F. Findley and C.-Z. Wei. AIC, overfitting principles, and the boundedness of moments of inverse matrices for vector autoregressions and related models. J. Multivariate Anal., 83(2):415–450, 2002.

[27] D. P. Foster and E. I. George. The risk inflation criterion for multiple regression. Ann. Statist., 22(4):1947–1975, 1994.

[28] T. Fujii and M. Uchida. AIC type statistics for discretely observed ergodic diffusion processes. Stat. Inference Stoch. Process., 17(3):267–282, 2014.

[29] V. Genon-Catalot and J. Jacod. On the estimation of the diffusion coefficient for multi-dimensional diffusion processes. Ann. Inst. H. Poincaré Probab. Statist., 29(1):119–151, 1993.

[30] E. Gobet. LAN property for ergodic diffusions with discrete observations. Ann. Inst. H. Poincaré Probab. Statist., 38(5):711–737, 2002.

[31] C. Goutis and C. P. Robert. Model choice in generalised linear models: a Bayesian approach via Kullback-Leibler projections. Biometrika, 85(1):29–37, 1998.

[32] S. Haberman and A. E. Renshaw. Generalized linear models and actuarial science. The Statistician, pages 407–436, 1996.

[33] T. Hastie and R. Tibshirani. Generalized Additive Models. Chapman and Hall, London, 1990.

[34] N. Herrndorf. A functional central limit theorem for weakly dependent sequences of random variables. Ann. Probab., 12(1):141–153, 1984.

[35] C. C. Heyde. Quasi-likelihood and its application. Springer Series in Statistics. Springer-Verlag, New York, 1997. A general approach to optimal parameter estimation.

[36] A. Jasra, K. Kamatani, and H. Masuda. Bayesian inference for stable Lévy driven stochastic differential equations with high-frequency data. Preprint, arXiv:1707.08788, 2017.

[37] K. Kamatani and M. Uchida. Hybrid multi-step estimators for stochastic differential equations based on sampled data. Stat. Inference Stoch. Process., 18(2):177–204, 2015.

[38] R. L. Kashyap. Optimal choice of AR and MA parts in autoregressive moving average models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 4:99–104, 1982.

[39] M. Kessler. Estimation of an ergodic diffusion from discrete observations. Scand. J. Statist., 24(2):211–229, 1997.

[40] J.-Y. Kim. Large sample properties of posterior densities, Bayesian information criterion and the likelihood principle in nonstationary time series models. Econometrica, 66(2):359–380, 1998.

[41] S. Konishi, T. Ando, and S. Imoto. Bayesian information criteria and smoothing parameter selection in radial basis function networks. Biometrika, 91(1):27–43, 2004.

[42] S. Konishi and G. Kitagawa. Generalised information criteria in model selection. Biometrika, 83(4):875–890, 1996.

[43] S. Konishi and G. Kitagawa. Information criteria and statistical modeling. Springer Science & Business Media, 2008.

[44] Y. A. Kutoyants. Statistical inference for ergodic diffusion processes. Springer Series in Statistics. Springer-Verlag London, Ltd., London, 2004.

[45] M. Lavine and M. J. Schervish. Bayes factors: what they are and what they are not. Amer. Statist., 53(2):119–122, 1999.

[46] E. Liebscher. Towards a unified approach for proving geometric ergodicity and mixing properties of nonlinear autoregressive processes. J. Time Ser. Anal., 26(5):669–689, 2005.

[47] W. Liu and Y. Yang. Parametric or nonparametric? A parametricness index for model selection. Ann. Statist., 39(4):2074–2102, 2011.

[48] J. Lv and J. S. Liu. Model selection principles in misspecified models. J. R. Stat. Soc. Ser. B. Stat. Methodol., 76(1):141–167, 2014.

[49] J. R. Magnus and H. Neudecker. The commutation matrix: some properties and applications. Ann. Statist., 7(2):381–394, 1979.

[50] H. Masuda. Convergence of Gaussian quasi-likelihood random fields for ergodic Lévy driven SDE observed at high frequency. Ann. Statist., 41(3):1593–1641, 2013.

[51] H. Masuda and Y. Uehara. On stepwise estimation of Lévy driven stochastic differential equation (in Japanese). Proc. Inst. Statist. Math., 65(1):21–38, 2017.

[52] P. McCullagh and J. A. Nelder. Generalized Linear Models. Chapman and Hall, London, 1989.

[53] A. J. McNeil and J. P. Wendin. Bayesian inference for generalized linear mixed models of portfolio credit risk. Journal of Empirical Finance, 14:131–149, 2007.

[54] S. P. Meyn and R. L. Tweedie. Stability of Markovian processes. III. Foster-Lyapunov criteria for continuous-time processes. Adv. in Appl. Probab., 25(3):518–548, 1993.

[55] R. Nishii. Asymptotic properties of criteria for selection of variables in multiple regression. Ann. Statist., 12(2):758–765, 1984.

[56] L. Pace and A. Salvan. Principles of statistical inference, volume 4 of Advanced Series on Statistical Science & Applied Probability. World Scientific Publishing Co., Inc., River Edge, NJ, 1997. From a neo-Fisherian perspective.

[57] G. Schwarz. Estimating the dimension of a model. Ann. Statist., 6(2):461–464, 1978.

[58] S. L. Sclove. Application of model-selection criteria to some problems in multivariate analysis. Psychometrika, 52(3):333–343, 1987.

[59] T. Sei and F. Komaki. Bayesian prediction and model selection for locally asymptotically mixed normal models. J. Statist. Plann. Inference, 137(7):2523–2534, 2007.

[60] M. Uchida. Contrast-based information criterion for ergodic diffusion processes from discrete observations. Ann. Inst. Statist. Math., 62(1):161–187, 2010.

[61] M. Uchida and N. Yoshida. Information criteria in model selection for mixing processes. Stat. Inference Stoch. Process., 4(1):73–98, 2001.

[62] M. Uchida and N. Yoshida. Asymptotic expansion and information criteria. SUT J. Math., 42(1):31–58, 2006.

[63] M. Uchida and N. Yoshida. Estimation for misspecified ergodic diffusion processes from discrete observations. ESAIM Probab. Stat., 15:270–290, 2011.

[64] M. Uchida and N. Yoshida. Adaptive estimation of an ergodic diffusion process based on sampled data. Stochastic Process. Appl., 122(8):2885–2924, 2012.

[65] M. Uchida and N. Yoshida. Quasi likelihood analysis of volatility and nondegeneracy of statistical random field. Stochastic Process. Appl., 123(7):2851–2876, 2013.

[66] M. Uchida and N. Yoshida. Model selection for volatility prediction. In The Fascination of Probability, Statistics and their Applications, pages 343–360. Springer, 2016.

[67] A. W. van der Vaart. Asymptotic statistics, volume 3 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, 1998.
