Stock, J.H. (1987) "Asymptotic Properties of Least Squares Estimators of Cointegrating Vectors," Econometrica, Vol. 55, pp. 1035–1056.
Proposition:
Let y_{1,t} be a scalar, y_{2,t} be a k × 1 vector, and (y_{1,t}, y_{2,t}')' be a g × 1 vector, where g = k + 1.
Consider the following model:
\[
y_{1,t} = \alpha + \gamma' y_{2,t} + z_t^*, \qquad
\Delta y_{2,t} = u_{2,t}, \qquad
\begin{pmatrix} z_t^* \\ u_{2,t} \end{pmatrix} = \Psi^*(L)\,\varepsilon_t .
\]
ε_t is a g × 1 i.i.d. vector with E(ε_t) = 0 and E(ε_t ε_t') = PP'. The OLSE is given by:
\[
\begin{pmatrix} \hat\alpha \\ \hat\gamma \end{pmatrix}
= \begin{pmatrix} T & \sum y_{2,t}' \\ \sum y_{2,t} & \sum y_{2,t} y_{2,t}' \end{pmatrix}^{-1}
\begin{pmatrix} \sum y_{1,t} \\ \sum y_{1,t} y_{2,t} \end{pmatrix}.
\]
Define λ*_1, which is a g × 1 vector, and Λ*_2, which is a k × g matrix, as follows:
\[
\Psi^*(1)\,P = \begin{pmatrix} \lambda_1^{*\prime} \\ \Lambda_2^* \end{pmatrix}.
\]
Then, we have the following results:
\[
\begin{pmatrix} T^{1/2}(\hat\alpha - \alpha) \\ T(\hat\gamma - \gamma) \end{pmatrix}
\longrightarrow
\begin{pmatrix}
1 & \bigl(\Lambda_2^* \int W(r)\,dr\bigr)' \\
\Lambda_2^* \int W(r)\,dr & \Lambda_2^* \bigl(\int W(r)W(r)'\,dr\bigr)\Lambda_2^{*\prime}
\end{pmatrix}^{-1}
\begin{pmatrix} h_1 \\ h_2 \end{pmatrix},
\]
where
\[
\begin{pmatrix} h_1 \\ h_2 \end{pmatrix}
= \begin{pmatrix}
\lambda_1^{*\prime} W(1) \\
\Lambda_2^* \bigl(\int W(r)\,dW(r)'\bigr)\lambda_1^* + \sum_{\tau=0}^{\infty} E(u_{2,t} z_{t+\tau}^*)
\end{pmatrix}.
\]
W(r) denotes a g-dimensional standard Brownian motion.
1) The OLSE of the cointegrating vector is consistent even though the error z*_t is serially correlated.
2) The consistency of the OLSE implies that T^{−1} Σ û_t² −→ σ².
3) Because T^{−1} Σ (y_{1,t} − ȳ_1)² goes to infinity, the coefficient of determination, R², goes to one.
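A quick simulation (a numpy sketch, not from the source; the coefficient values are made up) illustrates results 1) and 3): with a random-walk regressor and a serially correlated error, OLS still recovers γ, and R² approaches one.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 5000
alpha, gamma = 1.0, 2.0

# y_{2,t} is a random walk; z*_t is a serially correlated (AR(1)) error
y2 = np.cumsum(rng.normal(size=T))
z = np.zeros(T)
for t in range(1, T):
    z[t] = 0.5 * z[t - 1] + rng.normal()
y1 = alpha + gamma * y2 + z

# OLS of y1 on a constant and y2
X = np.column_stack([np.ones(T), y2])
b, *_ = np.linalg.lstsq(X, y1, rcond=None)
resid = y1 - X @ b
r2 = 1.0 - resid.var() / y1.var()
print(b[1], r2)  # gamma_hat is close to 2; R^2 is close to 1
```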
3.4 Testing Cointegration
3.4.1 Engle-Granger Test
y_t ∼ I(1)
y_{1,t} = α + γ'y_{2,t} + u_t
• u_t ∼ I(0) =⇒ Cointegration
• u_t ∼ I(1) =⇒ Spurious Regression
Estimate y_{1,t} = α + γ'y_{2,t} + u_t by OLS, and obtain û_t.
Estimate û_t = ρ û_{t−1} + δ_1 Δû_{t−1} + δ_2 Δû_{t−2} + · · · + δ_{p−1} Δû_{t−p+1} + e_t by OLS.
ADF Test:
• H_0: ρ = 1 (Spurious Regression)
• H_1: ρ < 1 (Cointegration)
=⇒ Engle-Granger Test
For example, see Engle and Granger (1987), Phillips and Ouliaris (1990) and Hansen
(1992).
Asymptotic Distribution of Residual-Based ADF Test for Cointegration

# of Regressors,      (a) Regressors have no drift      (b) Some regressors have drift
excluding constant     1%     2.5%    5%     10%         1%     2.5%    5%     10%
        1            −3.96   −3.64  −3.37  −3.07       −3.96  −3.67  −3.41  −3.13
        2            −4.31   −4.02  −3.77  −3.45       −4.36  −4.07  −3.80  −3.52
        3            −4.73   −4.37  −4.11  −3.83       −4.65  −4.39  −4.16  −3.84
        4            −5.07   −4.71  −4.45  −4.16       −5.04  −4.77  −4.49  −4.20
        5            −5.28   −4.98  −4.71  −4.43       −5.36  −5.02  −4.74  −4.46
J.D. Hamilton (1994), Time Series Analysis, p.766.
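The two-step procedure above can be sketched in numpy (an illustration on simulated data; the function name and data-generating values are mine, not from the source). The resulting t-statistic is compared with the critical values in the table above.

```python
import numpy as np

def engle_granger_stat(y1, y2, p=2):
    """Residual-based ADF t-statistic for cointegration (two-step)."""
    T = len(y1)
    # Step 1: cointegrating regression y1 = alpha + gamma'y2 + u; get residuals
    X = np.column_stack([np.ones(T), y2])
    u = y1 - X @ np.linalg.lstsq(X, y1, rcond=None)[0]
    # Step 2: ADF regression, written as
    # Delta u_t = (rho - 1) u_{t-1} + sum_j delta_j Delta u_{t-j} + e_t
    du = np.diff(u)
    Z = np.array([[u[t]] + [du[t - j] for j in range(1, p)]
                  for t in range(p, len(du))])
    dep = du[p:]
    b, *_ = np.linalg.lstsq(Z, dep, rcond=None)
    e = dep - Z @ b
    s2 = e @ e / (len(dep) - Z.shape[1])
    se = np.sqrt(s2 * np.linalg.inv(Z.T @ Z)[0, 0])
    return b[0] / se  # t-statistic for H0: rho = 1

# Example: a clearly cointegrated pair
rng = np.random.default_rng(1)
y2 = np.cumsum(rng.normal(size=500))
y1 = 0.5 + 1.0 * y2 + rng.normal(size=500)
stat = engle_granger_stat(y1, y2)
print(stat)  # far below the 5% critical value -3.37 (one regressor, no drift)
```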
3.4.2 Error Correction Representation
VAR(p) model:
y_t = α + φ_1 y_{t−1} + φ_2 y_{t−2} + · · · + φ_p y_{t−p} + ε_t,
where y_t, α and ε_t indicate g × 1 vectors for t = 1, 2, · · ·, T, and φ_s is a g × g matrix for s = 1, 2, · · ·, p.
Rewrite:
y_t = α + ρ y_{t−1} + δ_1 Δy_{t−1} + δ_2 Δy_{t−2} + · · · + δ_{p−1} Δy_{t−p+1} + ε_t, where
ρ = φ_1 + φ_2 + · · · + φ_p,
δ_s = −(φ_{s+1} + φ_{s+2} + · · · + φ_p), for s = 1, 2, · · ·, p − 1.
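The equivalence of the two representations can be checked numerically (a numpy sketch with arbitrary made-up coefficients and lagged values, not from the source):

```python
import numpy as np

rng = np.random.default_rng(1)
g, p = 2, 3
phi = [rng.normal(size=(g, g)) for _ in range(p)]   # phi_1, ..., phi_p
ylag = [rng.normal(size=g) for _ in range(p)]       # y_{t-1}, ..., y_{t-p}

# Original VAR(p) right-hand side: sum_s phi_s y_{t-s}
var_rhs = sum(phi[s] @ ylag[s] for s in range(p))

# Rewritten form: rho y_{t-1} + sum_s delta_s Delta y_{t-s}
rho = sum(phi)                               # phi_1 + ... + phi_p
delta = [-sum(phi[s:]) for s in range(1, p)] # delta_s = -(phi_{s+1}+...+phi_p)
ecm_rhs = rho @ ylag[0] + sum(
    delta[s - 1] @ (ylag[s - 1] - ylag[s]) for s in range(1, p))

print(np.allclose(var_rhs, ecm_rhs))  # True
```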
Again, rewrite:
Δy_t = α + δ_0 y_{t−1} + δ_1 Δy_{t−1} + δ_2 Δy_{t−2} + · · · + δ_{p−1} Δy_{t−p+1} + ε_t, where
δ_0 = ρ − I_g = −φ(1),
for φ(L) = I_g − φ_1 L − φ_2 L² − · · · − φ_p L^p.
If y_t has h cointegrating relations, we have the following error correction representation:
Δy_t = α − BA'y_{t−1} + δ_1 Δy_{t−1} + δ_2 Δy_{t−2} + · · · + δ_{p−1} Δy_{t−p+1} + ε_t,
where A'y_{t−1} is a stationary h × 1 vector (i.e., h I(0) processes), and B and A are g × h matrices.
Note that φ(1) = BA' for φ(L) = I_g − φ_1 L − φ_2 L² − · · · − φ_p L^p.
Each row of A' denotes a cointegrating vector, i.e., A' consists of h cointegrating vectors.
Suppose that ε_t ∼ N(0, Σ). The log-likelihood function is:
\[
\log l(\alpha, \delta_1, \cdots, \delta_{p-1}, B \mid A)
= -\frac{Tg}{2}\log(2\pi) - \frac{T}{2}\log|\Sigma|
- \frac{1}{2}\sum_{t=1}^{T}
(\Delta y_t - \alpha + BA'y_{t-1} - \delta_1 \Delta y_{t-1} - \cdots - \delta_{p-1}\Delta y_{t-p+1})'\,
\Sigma^{-1}\,
(\Delta y_t - \alpha + BA'y_{t-1} - \delta_1 \Delta y_{t-1} - \cdots - \delta_{p-1}\Delta y_{t-p+1}).
\]
Given A and h, maximize log l with respect to α, δ_1, · · ·, δ_{p−1}, B.
Then, given h, how do we estimate A? =⇒ Johansen (1988, 1991)
(*) Canonical Correlation
x' = (x_1, x_2, · · ·, x_n) and y' = (y_1, y_2, · · ·, y_m), where n ≤ m.
u = a'x = a_1 x_1 + a_2 x_2 + · · · + a_n x_n,
v = b'y = b_1 y_1 + b_2 y_2 + · · · + b_m y_m,
where V(u) = V(v) = 1 and E(x) = E(y) = 0 for simplicity.
Define:
V(x) = Σ_xx, E(xy') = Σ_xy, V(y) = Σ_yy, E(yx') = Σ_yx = Σ_xy'.
The correlation coefficient between u and v, denoted by ρ, is:
ρ = Cov(u, v) / (√V(u) √V(v)) = a'Σ_xy b,
where V(u) = a'Σ_xx a = 1 and V(v) = b'Σ_yy b = 1.
Maximize ρ = a'Σ_xy b subject to a'Σ_xx a = 1 and b'Σ_yy b = 1.
The Lagrangian is:
L = a'Σ_xy b − (1/2)λ(a'Σ_xx a − 1) − (1/2)µ(b'Σ_yy b − 1).
Take derivatives with respect to a and b:
∂L/∂a = Σ_xy b − λΣ_xx a = 0,
∂L/∂b = Σ_xy' a − µΣ_yy b = 0.
Using a'Σ_xx a = 1 and b'Σ_yy b = 1, we obtain:
λ = µ = a'Σ_xy b.
From the first equation, we obtain:
a = (1/λ) Σ_xx^{−1} Σ_xy b,
which is substituted into the second equation as follows:
(1/λ) Σ_xy' Σ_xx^{−1} Σ_xy b − λΣ_yy b = 0, i.e.,
(Σ_yy^{−1} Σ_xy' Σ_xx^{−1} Σ_xy − λ² I_m) b = 0, i.e.,
|Σ_yy^{−1} Σ_xy' Σ_xx^{−1} Σ_xy − λ² I_m| = 0.
The solution for λ² is given by the maximum eigenvalue of Σ_yy^{−1} Σ_xy' Σ_xx^{−1} Σ_xy, and b is the corresponding eigenvector.
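The eigenproblem above is easy to solve numerically. A numpy sketch (the simulated data and its cross-correlation design are mine, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, T = 2, 3, 2000

# Simulated centered data; y's first component is correlated with x's first
x = rng.normal(size=(T, n))
y = rng.normal(size=(T, m))
y[:, 0] += x[:, 0]

Sxx = x.T @ x / T
Syy = y.T @ y / T
Sxy = x.T @ y / T

# Eigenproblem from the derivation:
# |Syy^{-1} Sxy' Sxx^{-1} Sxy - lambda^2 I_m| = 0
M = np.linalg.inv(Syy) @ Sxy.T @ np.linalg.inv(Sxx) @ Sxy
vals, vecs = np.linalg.eig(M)
i = np.argmax(vals.real)
lam2 = vals.real[i]
b = vecs[:, i].real                 # weights for v = b'y
a = np.linalg.inv(Sxx) @ Sxy @ b    # proportional to (1/lambda) Sxx^{-1} Sxy b

rho = np.sqrt(lam2)                 # first canonical correlation
print(rho)  # near 1/sqrt(2) ~ 0.707 for this design
```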
Back to the Cointegration:
Estimate the following two regressions:
Δy_t = b_{1,0} + b_{1,1} Δy_{t−1} + b_{1,2} Δy_{t−2} + · · · + b_{1,p−1} Δy_{t−p+1} + u_{1,t}
y_{t−1} = b_{2,0} + b_{2,1} Δy_{t−1} + b_{2,2} Δy_{t−2} + · · · + b_{2,p−1} Δy_{t−p+1} + u_{2,t}
Obtain û_{i,t} for i = 1, 2 and t = 1, 2, · · ·, T, and compute as follows:
Σ̂_11 = (1/T) Σ_{t=1}^T û_{1,t} û_{1,t}',  Σ̂_22 = (1/T) Σ_{t=1}^T û_{2,t} û_{2,t}',
Σ̂_12 = (1/T) Σ_{t=1}^T û_{1,t} û_{2,t}',  Σ̂_21 = Σ̂_12'.
From Σ̂_22^{−1} Σ̂_21 Σ̂_11^{−1} Σ̂_12, compute the h biggest eigenvalues, denoted by λ̂_1, λ̂_2, · · ·, λ̂_h, and the corresponding eigenvectors, denoted by â_1, â_2, · · ·, â_h, where λ̂_1 > λ̂_2 > · · · > λ̂_h.
The estimate of A, Â, is given by Â = (â_1, â_2, · · ·, â_h).
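The auxiliary regressions and the eigenvalue computation above can be sketched in numpy (the simulated system, which shares one common random-walk trend so that h = 1, is my assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
p = 2

# Simulated bivariate system sharing one random-walk trend (so h = 1)
w = np.cumsum(rng.normal(size=502))
y = np.column_stack([w + rng.normal(size=502), w + rng.normal(size=502)])
N = len(y)

# Dependent variables of the two auxiliary regressions, and the regressors
D1 = np.array([y[t] - y[t - 1] for t in range(p, N)])   # Delta y_t
D2 = np.array([y[t - 1] for t in range(p, N)])          # y_{t-1}
Z = np.array([np.concatenate([[1.0],
                              *[y[t - j] - y[t - j - 1] for j in range(1, p)]])
              for t in range(p, N)])                    # const, Delta y_{t-1}, ...

# Residuals u_{1,t}, u_{2,t} and the Sigma-hat matrices
P = Z @ np.linalg.inv(Z.T @ Z) @ Z.T
u1, u2 = D1 - P @ D1, D2 - P @ D2
Tn = len(u1)
S11, S22, S12 = u1.T @ u1 / Tn, u2.T @ u2 / Tn, u1.T @ u2 / Tn

# Eigenvalues of Sigma22^{-1} Sigma21 Sigma11^{-1} Sigma12 (Sigma21 = Sigma12')
M = np.linalg.inv(S22) @ S12.T @ np.linalg.inv(S11) @ S12
lams = np.sort(np.linalg.eigvals(M).real)[::-1]
print(lams)  # the largest eigenvalue dominates: evidence of one relation
```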
How do we obtain h?
3.5 Testing the Number of Cointegrating Vectors
Trace Test:
H_0: λ_{h+1} = 0 and H_1: λ_{h+1} > 0.
2(log l_1 − log l_0) = −T Σ_{i=h+1}^{g} log(1 − λ̂_i) −→ tr(Q), where
Q = (∫_0^1 W(r) dW(r)')' (∫_0^1 W(r)W(r)' dr)^{−1} (∫_0^1 W(r) dW(r)').
Trace Test for # of Cointegrating Relations

# of Random      (a) Regressors have no drift       (b) Some regressors have drift
Walks (g − h)     1%      2.5%    5%      10%         1%      2.5%    5%      10%
     1          11.576   9.658   8.083   6.691       6.936   5.332   3.962   2.816
     2          21.962  19.611  17.844  15.583      19.310  17.299  15.197  13.338
     3          37.291  34.062  31.256  28.436      35.397  32.313  29.509  26.791
     4          55.551  51.801  48.419  45.248      53.792  50.424  47.181  43.964
     5          77.911  73.031  69.977  65.956      76.955  72.140  68.905  65.063
J.D. Hamilton (1994), Time Series Analysis, p.767.
Largest Eigenvalue Test:
H_0: λ_{h+1} = 0 and H_1: λ_{h+1} > 0.
2(log l_1 − log l_0) = −T log(1 − λ̂_{h+1}) −→ maximum eigenvalue of Q.
Maximum Eigenvalue Test for # of Cointegrating Relations

# of Random      (a) Regressors have no drift       (b) Some regressors have drift
Walks (g − h)     1%      2.5%    5%      10%         1%      2.5%    5%      10%
     1          11.576   9.658   8.083   6.691       6.936   5.332   3.962   2.816
     2          18.782  16.403  14.595  12.783      17.936  15.810  14.036  12.099
     3          26.154  23.362  21.279  18.959      25.521  23.002  20.778  18.697
     4          32.616  29.599  27.341  24.917      31.943  29.335  27.169  24.712
     5          38.858  35.700  33.262  30.818      38.341  35.546  33.178  30.774
J.D. Hamilton (1994), Time Series Analysis, p.768.
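Both statistics are simple functions of the estimated eigenvalues. A minimal sketch (the eigenvalues below are hypothetical, made up for illustration; compare the results with the tabulated critical values):

```python
import numpy as np

def trace_stat(lams, h, T):
    """Trace statistic: -T * sum_{i=h+1}^{g} log(1 - lambda_i)."""
    return -T * float(np.sum(np.log(1.0 - np.asarray(lams)[h:])))

def max_eig_stat(lams, h, T):
    """Maximum eigenvalue statistic: -T * log(1 - lambda_{h+1})."""
    return -T * float(np.log(1.0 - lams[h]))

# Hypothetical eigenvalues (sorted in descending order), sample size T = 200
lams = [0.25, 0.08, 0.01]
T = 200
print(trace_stat(lams, 0, T), max_eig_stat(lams, 0, T))  # test H0: h = 0
```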
4 GMM (Generalized Method of Moments)
1. Method of Moments:
Regression Model: y_t = x_t β + ε_t
From the assumption, E(x_t' ε_t) = 0.
The sample mean is given by:
(1/T) Σ_{t=1}^T x_t' ε_t = (1/T) Σ_{t=1}^T x_t'(y_t − x_t β) = 0.
Therefore,
β_MM = ((1/T) Σ_{t=1}^T x_t' x_t)^{−1} ((1/T) Σ_{t=1}^T x_t' y_t),
which is equivalent to OLS.
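This equivalence is easy to verify numerically (a numpy sketch on simulated data; the coefficient values are made up):

```python
import numpy as np

rng = np.random.default_rng(5)
T = 200
x = np.column_stack([np.ones(T), rng.normal(size=T)])
beta_true = np.array([1.0, 2.0])
y = x @ beta_true + rng.normal(size=T)

# Method-of-moments estimator: solve (1/T) sum x_t'(y_t - x_t beta) = 0
beta_mm = np.linalg.solve(x.T @ x / T, x.T @ y / T)

# OLS via least squares gives the identical solution
beta_ols, *_ = np.linalg.lstsq(x, y, rcond=None)
print(np.allclose(beta_mm, beta_ols))  # True
```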
2. Generalized Method of Moments (GMM):
E(h(θ; w_t)) = 0
θ is a k × 1 parameter vector to be estimated.
w_t is an observed vector, w_t = (y_t, x_t).
h(θ; w_t) is an r × 1 vector function, where r ≥ k.
Define g(θ; W_T) as follows:
g(θ; W_T) = (1/T) Σ_{t=1}^T h(θ; w_t), where W_T = {w_T, w_{T−1}, · · ·, w_1}.
Compute:
min_θ g(θ; W_T)' S^{−1} g(θ; W_T)
The solution of θ, denoted by θ̂_T, corresponds to the GMM estimator, where S is defined as follows:
S = lim_{T→∞}
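The notes break off before S is defined. As an illustration only, the minimization above can be sketched for linear instrumental-variable moments h(θ; w_t) = z_t(y_t − x_t θ), with the identity matrix standing in for S (my assumption here, not a choice made in the source; the simulated data are also made up):

```python
import numpy as np

rng = np.random.default_rng(6)
T, r = 300, 2   # one parameter (k = 1), two moment conditions (r >= k)

# Instruments z_t; regressor x_t correlated with z_t; true theta = 2
z = rng.normal(size=(T, r))
x = z @ np.array([1.0, 0.5]) + rng.normal(size=T)
theta_true = 2.0
y = x * theta_true + rng.normal(size=T)

# g(theta; W_T) = (1/T) sum z_t (y_t - x_t theta); minimize g' S^{-1} g
S = np.eye(r)            # identity weighting matrix (assumption: S not given)
Zx = z.T @ x / T         # (1/T) sum z_t x_t
Zy = z.T @ y / T         # (1/T) sum z_t y_t
Sinv = np.linalg.inv(S)

# Closed-form minimizer in the linear case:
# theta = (Zx' S^{-1} Zx)^{-1} Zx' S^{-1} Zy
theta_gmm = (Zx @ Sinv @ Zy) / (Zx @ Sinv @ Zx)
print(theta_gmm)  # close to 2.0
```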