Academic year: 2021


4. Asymptotic Normality of MLE:

Let $\tilde{\theta}$ be the MLE of $\theta$.

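As $n$ goes to infinity, we have the following result (convergence in distribution):

$$\sqrt{n}\,(\tilde{\theta} - \theta) \longrightarrow N\left(0,\ \lim_{n\to\infty}\left(\frac{I(\theta)}{n}\right)^{-1}\right),$$

where it is assumed that $\lim_{n\to\infty} \left( I(\theta)/n \right)$ converges.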

That is, when $n$ is large, $\tilde{\theta}$ is approximately distributed as follows:

$$\tilde{\theta} \sim N\left(\theta,\ (I(\theta))^{-1}\right).$$
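A small simulation can make the approximation concrete. The sketch below is not from the notes: it assumes i.i.d. exponential data with rate $\lambda$, for which the MLE is $\tilde{\lambda} = 1/\bar{y}$ and $I(\lambda) = n/\lambda^2$, so the result above suggests $V(\tilde{\lambda}) \approx \lambda^2/n$. The function name `simulate_exponential_mle` and the settings ($\lambda = 2$, $n = 200$, 2000 replications) are illustrative choices.

```python
import random
import statistics


def simulate_exponential_mle(rate, n, reps, seed=0):
    """Return `reps` draws of the MLE 1/ybar computed from Exp(rate) samples of size n."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        sample = [rng.expovariate(rate) for _ in range(n)]
        estimates.append(1.0 / statistics.fmean(sample))
    return estimates


rate, n = 2.0, 200
est = simulate_exponential_mle(rate, n, reps=2000)
empirical_var = statistics.variance(est)
# Inverse Fisher information for the exponential rate: I(rate)^{-1} = rate**2 / n
asymptotic_var = rate ** 2 / n
print(f"mean of MLE:    {statistics.fmean(est):.4f} (true rate {rate})")
print(f"empirical var:  {empirical_var:.5f}")
print(f"I(rate)^(-1):   {asymptotic_var:.5f}")
```

With these settings the empirical variance should be close to $\lambda^2/n = 0.02$, though it will not match exactly in a finite number of replications.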

Suppose that $s(X) = \tilde{\theta}$. When $n$ is large, $V(s(X))$ is approximately equal to $(I(\theta))^{-1}$.

5. Optimization:

The MLE of $\theta$ is obtained from the following maximization problem:

$$\max_{\theta}\ \log L(\theta; x).$$

In many cases, the solution for $\theta$ cannot be derived in closed form.

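$\Longrightarrow$ Optimization procedure: expanding the first-order condition around $\theta^*$ gives

$$0 = \frac{\partial \log L(\tilde{\theta}; x)}{\partial \theta} \approx \frac{\partial \log L(\theta^*; x)}{\partial \theta} + \frac{\partial^2 \log L(\theta^*; x)}{\partial \theta\, \partial \theta'} (\tilde{\theta} - \theta^*).$$

Solving the above equation with respect to $\tilde{\theta}$, we obtain the following:

$$\tilde{\theta} = \theta^* - \left(\frac{\partial^2 \log L(\theta^*; x)}{\partial \theta\, \partial \theta'}\right)^{-1} \frac{\partial \log L(\theta^*; x)}{\partial \theta}.$$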

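Replace the variables as follows:

$$\tilde{\theta} \longrightarrow \theta^{(i+1)}, \qquad \theta^* \longrightarrow \theta^{(i)}.$$

Then, we have:

$$\theta^{(i+1)} = \theta^{(i)} - \left(\frac{\partial^2 \log L(\theta^{(i)}; x)}{\partial \theta\, \partial \theta'}\right)^{-1} \frac{\partial \log L(\theta^{(i)}; x)}{\partial \theta}.$$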

$\Longrightarrow$ Newton-Raphson method
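The iteration above can be sketched in Python for a one-parameter case. As an illustrative assumption (not taken from the notes), the model is a Cauchy location model with unit scale, $f(x; \theta) = 1/(\pi(1 + (x - \theta)^2))$, whose MLE has no closed form. The data and the helper names `cauchy_score`, `cauchy_hessian`, and `newton_raphson` are hypothetical. Newton-Raphson can diverge for this likelihood from a poor starting value, so the sketch starts from the sample median and raises an error if it fails to converge.

```python
import statistics


def cauchy_score(theta, xs):
    """First derivative of the Cauchy(theta, 1) log-likelihood."""
    return sum(2 * (x - theta) / (1 + (x - theta) ** 2) for x in xs)


def cauchy_hessian(theta, xs):
    """Second derivative of the Cauchy(theta, 1) log-likelihood."""
    return sum(2 * ((x - theta) ** 2 - 1) / (1 + (x - theta) ** 2) ** 2 for x in xs)


def newton_raphson(xs, theta0, tol=1e-10, max_iter=100):
    theta = theta0
    for _ in range(max_iter):
        # theta^(i+1) = theta^(i) - H(theta^(i))^{-1} * score(theta^(i))
        step = cauchy_score(theta, xs) / cauchy_hessian(theta, xs)
        theta -= step
        if abs(step) < tol:
            return theta
    raise RuntimeError("Newton-Raphson did not converge")


data = [-1.0, 0.2, 0.5, 0.9, 1.3, 2.0, 10.0]
theta_hat = newton_raphson(data, theta0=statistics.median(data))
print(f"MLE of location: {theta_hat:.6f}")
print(f"score at MLE:    {cauchy_score(theta_hat, data):.2e}")
```

Because the second derivative of this log-likelihood can be positive far from the maximum, checking that it is negative at the result (the SOC) is a cheap safeguard.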

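Replacing $\dfrac{\partial^2 \log L(\theta^{(i)}; x)}{\partial \theta\, \partial \theta'}$ by $E\left(\dfrac{\partial^2 \log L(\theta^{(i)}; x)}{\partial \theta\, \partial \theta'}\right)$, we obtain the following optimization algorithm:

$$\begin{aligned}
\theta^{(i+1)} &= \theta^{(i)} - \left(E\left(\frac{\partial^2 \log L(\theta^{(i)}; x)}{\partial \theta\, \partial \theta'}\right)\right)^{-1} \frac{\partial \log L(\theta^{(i)}; x)}{\partial \theta} \\
&= \theta^{(i)} + \left(I(\theta^{(i)})\right)^{-1} \frac{\partial \log L(\theta^{(i)}; x)}{\partial \theta},
\end{aligned}$$

since $I(\theta) = -E\left(\partial^2 \log L(\theta; x) / \partial \theta\, \partial \theta'\right)$.

$\Longrightarrow$ Method of Scoring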
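For comparison, here is the method of scoring for the same hypothetical Cauchy location model with unit scale. For such data the Fisher information is $I(\theta) = n/2$, which does not depend on $\theta$, so each step only needs the score. This is a sketch under that assumption; the data and function names are illustrative.

```python
import statistics


def cauchy_score(theta, xs):
    """First derivative of the Cauchy(theta, 1) log-likelihood."""
    return sum(2 * (x - theta) / (1 + (x - theta) ** 2) for x in xs)


def method_of_scoring(xs, theta0, tol=1e-10, max_iter=500):
    fisher_info = len(xs) / 2.0  # I(theta) = n/2 for the unit-scale Cauchy location model
    theta = theta0
    for _ in range(max_iter):
        # theta^(i+1) = theta^(i) + I(theta^(i))^{-1} * score(theta^(i))
        step = cauchy_score(theta, xs) / fisher_info
        theta += step
        if abs(step) < tol:
            return theta
    raise RuntimeError("method of scoring did not converge")


data = [-1.0, 0.2, 0.5, 0.9, 1.3, 2.0, 10.0]
theta_hat = method_of_scoring(data, theta0=statistics.median(data))
print(f"MLE of location: {theta_hat:.6f}")
```

Replacing the observed Hessian by its expectation avoids the risk of a positive second derivative, at the cost of linear rather than quadratic convergence near the maximum.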


9.1 MLE: The Case of Single Regression Model

The regression model:

$$y_i = \beta_1 + \beta_2 x_i + u_i,$$

1. $u_i \sim N(0, \sigma^2)$ is assumed.

2. The density function of $u_i$ is:

$$f(u_i) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2} u_i^2\right).$$

Because $u_1, u_2, \cdots, u_n$ are mutually independently distributed, the joint density function of $u_1, u_2, \cdots, u_n$ is written as:

$$f(u_1, u_2, \cdots, u_n) = f(u_1) f(u_2) \cdots f(u_n) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left(-\frac{1}{2\sigma^2} \sum_{i=1}^{n} u_i^2\right).$$

3. Using the transformation of variable ($u_i = y_i - \beta_1 - \beta_2 x_i$), the joint density function of $y_1, y_2, \cdots, y_n$ is given by:

$$f(y_1, y_2, \cdots, y_n) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left(-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \beta_1 - \beta_2 x_i)^2\right) \equiv L(\beta_1, \beta_2, \sigma^2 \mid y_1, y_2, \cdots, y_n).$$

$L(\beta_1, \beta_2, \sigma^2 \mid y_1, y_2, \cdots, y_n)$ is called the likelihood function.

$\log L(\beta_1, \beta_2, \sigma^2 \mid y_1, y_2, \cdots, y_n)$ is called the log-likelihood function:

$$\log L(\beta_1, \beta_2, \sigma^2 \mid y_1, y_2, \cdots, y_n) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \beta_1 - \beta_2 x_i)^2.$$

4. Transformation of Variable:

Suppose that the density function of a random variable $X$ is $f_x(x)$. Defining $X = g(Y)$, the density function of $Y$, $f_y(y)$, is given by:

$$f_y(y) = f_x(g(y)) \left| \frac{dg(y)}{dy} \right|.$$

In the case where $X$ and $g(Y)$ are $n \times 1$ vectors, $\left| dg(y)/dy \right|$ should be replaced by $\left| \partial g(y) / \partial y' \right|$, which is the absolute value of the determinant of the $n \times n$ matrix $\partial g(y) / \partial y'$.

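Example: When $X \sim U(0, 1)$, derive the density function of $Y = -\log(X)$.

The density of $X$ is $f_x(x) = 1$ for $0 < x < 1$, and $X = g(Y) = \exp(-Y)$ is obtained. Therefore, the density function of $Y$, $f_y(y)$, is given by:

$$f_y(y) = \left| \frac{dx}{dy} \right| f_x(g(y)) = \left| -\exp(-y) \right| = \exp(-y), \qquad y > 0,$$

i.e., $Y$ follows the exponential distribution with mean 1.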
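The example can be checked by simulation. The sketch below is illustrative (the seed and sample size are arbitrary): it draws $X \sim U(0,1)$, forms $Y = -\log(X)$, and compares the empirical value of $P(Y \le 1)$ with $1 - e^{-1}$, the value implied by $f_y(y) = \exp(-y)$.

```python
import math
import random

rng = random.Random(42)
n_draws = 200_000
# Y = -log(X) with X ~ U(0, 1); 1 - random() lies in (0, 1], so log() never sees 0
ys = [-math.log(1.0 - rng.random()) for _ in range(n_draws)]

# Compare the empirical P(Y <= 1) with the exponential CDF at 1, i.e. 1 - exp(-1)
empirical = sum(y <= 1.0 for y in ys) / n_draws
theoretical = 1.0 - math.exp(-1.0)
print(f"empirical P(Y <= 1):  {empirical:.4f}")
print(f"theoretical 1 - e^-1: {theoretical:.4f}")
print(f"sample mean of Y:     {sum(ys) / n_draws:.4f} (exact value 1)")
```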

5. Given the observed data $y_1, y_2, \cdots, y_n$, the likelihood function $L(\beta_1, \beta_2, \sigma^2 \mid y_1, y_2, \cdots, y_n)$, or equivalently the log-likelihood function $\log L(\beta_1, \beta_2, \sigma^2 \mid y_1, y_2, \cdots, y_n)$, is maximized with respect to $(\beta_1, \beta_2, \sigma^2)$.

Solve the following three simultaneous equations:

$$\begin{aligned}
\frac{\partial \log L(\beta_1, \beta_2, \sigma^2 \mid y_1, y_2, \cdots, y_n)}{\partial \beta_1} &= \frac{1}{\sigma^2}\sum_{i=1}^{n}(y_i - \beta_1 - \beta_2 x_i) = 0, \\
\frac{\partial \log L(\beta_1, \beta_2, \sigma^2 \mid y_1, y_2, \cdots, y_n)}{\partial \beta_2} &= \frac{1}{\sigma^2}\sum_{i=1}^{n}(y_i - \beta_1 - \beta_2 x_i)x_i = 0, \\
\frac{\partial \log L(\beta_1, \beta_2, \sigma^2 \mid y_1, y_2, \cdots, y_n)}{\partial \sigma^2} &= -\frac{n}{2}\frac{1}{\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(y_i - \beta_1 - \beta_2 x_i)^2 = 0.
\end{aligned}$$

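The solutions for $(\beta_1, \beta_2, \sigma^2)$ are called the maximum likelihood estimates, denoted by $(\tilde{\beta}_1, \tilde{\beta}_2, \tilde{\sigma}^2)$.

The maximum likelihood estimates are:

$$\tilde{\beta}_2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad \tilde{\beta}_1 = \bar{y} - \tilde{\beta}_2 \bar{x}, \qquad \tilde{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - \tilde{\beta}_1 - \tilde{\beta}_2 x_i)^2.$$

The MLE of $\sigma^2$ divides the residual sum of squares by $n$, not by $n - 2$ as the unbiased estimator $s^2$ does.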
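The closed-form estimates above can be computed directly. This is a minimal sketch in plain Python; the data and the name `single_regression_mle` are hypothetical. It also returns the $n - 2$ version of the variance estimate so the two divisors can be compared, and its residuals satisfy the first two first-order conditions.

```python
def single_regression_mle(x, y):
    """Closed-form MLEs for y_i = beta1 + beta2 * x_i + u_i with normal errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta2 = sxy / sxx
    beta1 = y_bar - beta2 * x_bar
    resid = [yi - beta1 - beta2 * xi for xi, yi in zip(x, y)]
    ssr = sum(e ** 2 for e in resid)
    sigma2_mle = ssr / n        # MLE: divided by n
    s2 = ssr / (n - 2)          # unbiased estimator: divided by n - 2
    return beta1, beta2, sigma2_mle, s2, resid


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
beta1, beta2, sigma2_mle, s2, resid = single_regression_mle(x, y)
print(f"beta1 = {beta1:.4f}, beta2 = {beta2:.4f}")
print(f"sigma2 (MLE, /n) = {sigma2_mle:.4f}, s^2 (/(n-2)) = {s2:.4f}")
# First-order conditions for beta1 and beta2: both sums are zero up to rounding
print(sum(resid), sum(e * xi for e, xi in zip(resid, x)))
```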


9.2 MLE: The Case of Multiple Regression Model I

1. Multivariate Normal Distribution: $X$ is $n \times 1$ and $X \sim N(\mu, \Sigma)$.

The density function of $X$ is:

$$f(x) = (2\pi)^{-n/2} |\Sigma|^{-1/2} \exp\left(-\frac{1}{2}(x - \mu)' \Sigma^{-1} (x - \mu)\right).$$

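2. Regression model: $y = X\beta + u$, $u \sim N(0, \sigma^2 I_n)$.

Transformation of variables from $u$ to $y$:

$$f_u(u) = (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{1}{2\sigma^2} u'u\right),$$

$$f_y(y) = f_u(y - X\beta) \left| \frac{\partial u}{\partial y'} \right| = (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)\right) \equiv L(\theta; y, X),$$

where $\theta = (\beta, \sigma^2)$, because $\partial u / \partial y' = I_n$.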

Therefore, the log-likelihood function is:

$$\log L(\theta; y, X) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta).$$

Note that $|\Sigma|^{1/2} = |\sigma^2 I_n|^{1/2} = (\sigma^2)^{n/2} = \sigma^n$.

3. $\max_{\theta}\ \log L(\theta; y, X)$

(FOC) $\dfrac{\partial \log L(\theta; y, X)}{\partial \theta} = 0$

(SOC) $\dfrac{\partial^2 \log L(\theta; y, X)}{\partial \theta\, \partial \theta'}$ is a negative definite matrix.

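We obtain the MLEs of $\beta$ and $\sigma^2$:

$$\tilde{\beta} = (X'X)^{-1}X'y, \qquad \tilde{\sigma}^2 = \frac{(y - X\tilde{\beta})'(y - X\tilde{\beta})}{n},$$

where $\tilde{\sigma}^2$ is divided by $n$, not $n - k$ ($k$ being the number of columns of $X$).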

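4. Fisher's information matrix is:

$$I(\theta) = -E\left(\frac{\partial^2 \log L(\theta; y, X)}{\partial \theta\, \partial \theta'}\right).$$

The inverse of the information matrix, $I(\theta)^{-1}$, provides a lower bound for the variance-covariance matrix of unbiased estimators of $\theta$ (the Cramér-Rao lower bound). Here,

$$I(\theta)^{-1} = \begin{pmatrix} \sigma^2 (X'X)^{-1} & 0 \\ 0 & \dfrac{2\sigma^4}{n} \end{pmatrix}.$$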

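For large $n$, we approximately obtain:

$$\begin{pmatrix} \tilde{\beta} \\ \tilde{\sigma}^2 \end{pmatrix} \sim N\left(\begin{pmatrix} \beta \\ \sigma^2 \end{pmatrix}, \begin{pmatrix} \sigma^2 (X'X)^{-1} & 0 \\ 0 & \dfrac{2\sigma^4}{n} \end{pmatrix}\right).$$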
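A NumPy sketch of these formulas on simulated data follows. The design matrix, true coefficients, error scale, and seed are arbitrary choices, and `ols_mle` is a hypothetical helper name. The asymptotic covariance blocks are evaluated at the estimates, i.e., $\sigma^2$ is replaced by $\tilde{\sigma}^2$.

```python
import numpy as np


def ols_mle(X, y):
    """MLEs of beta and sigma^2 in y = X beta + u, u ~ N(0, sigma^2 I_n),
    plus the inverse-information blocks evaluated at the estimates."""
    n = X.shape[0]
    beta = np.linalg.solve(X.T @ X, X.T @ y)      # (X'X)^{-1} X'y without forming the inverse
    resid = y - X @ beta
    sigma2 = resid @ resid / n                    # divided by n, not n - k
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)    # sigma^2 (X'X)^{-1}
    var_sigma2 = 2 * sigma2 ** 2 / n              # 2 sigma^4 / n
    return beta, sigma2, cov_beta, var_sigma2


rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=1.5, size=n)   # true sigma^2 = 2.25

beta_hat, sigma2_hat, cov_beta, var_sigma2 = ols_mle(X, y)
print("beta_hat:   ", beta_hat)
print("std. errors:", np.sqrt(np.diag(cov_beta)))
print("sigma2_hat: ", sigma2_hat, " approx. s.e.:", np.sqrt(var_sigma2))
```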


9.3 MLE: The Case of Multiple Regression Model II

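1. Regression model: $y = X\beta + u$, $u \sim N(0, \sigma^2 \Omega)$, where $\Omega$ is a known $n \times n$ matrix.

Transformation of variables from $u$ to $y$:

$$f_u(u) = (2\pi\sigma^2)^{-n/2} |\Omega|^{-1/2} \exp\left(-\frac{1}{2\sigma^2} u'\Omega^{-1}u\right),$$

$$f_y(y) = f_u(y - X\beta)\left|\frac{\partial u}{\partial y'}\right| = (2\pi\sigma^2)^{-n/2} |\Omega|^{-1/2} \exp\left(-\frac{1}{2\sigma^2}(y - X\beta)'\Omega^{-1}(y - X\beta)\right) \equiv L(\theta; y, X),$$

where $\theta = (\beta, \sigma^2)$, because $\partial u / \partial y' = I_n$. The log-likelihood function is: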

$$\log L(\theta; y, X) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2}\log|\Omega| - \frac{1}{2\sigma^2}(y - X\beta)'\Omega^{-1}(y - X\beta),$$

where $\theta = (\beta, \sigma^2)$.

2. $\max_{\theta}\ \log L(\theta; y, X)$

(FOC) $\dfrac{\partial \log L(\theta; y, X)}{\partial \theta} = 0$

(SOC) $\dfrac{\partial^2 \log L(\theta; y, X)}{\partial \theta\, \partial \theta'}$ is a negative definite matrix.

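Then, we obtain the MLEs of $\beta$ and $\sigma^2$:

$$\tilde{\beta} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y, \qquad \tilde{\sigma}^2 = \frac{(y - X\tilde{\beta})'\Omega^{-1}(y - X\tilde{\beta})}{n}.$$

3. Fisher's information matrix is defined as:

$$I(\theta) = -E\left(\frac{\partial^2 \log L(\theta; y, X)}{\partial \theta\, \partial \theta'}\right).$$

The inverse of the information matrix, $I(\theta)^{-1}$, provides a lower bound for the variance-covariance matrix of unbiased estimators of $\theta$, and is given by:

$$I(\theta)^{-1} = \begin{pmatrix} \sigma^2 (X'\Omega^{-1}X)^{-1} & 0 \\ 0 & \dfrac{2\sigma^4}{n} \end{pmatrix}.$$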
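One way to see the GLS form of $\tilde{\beta}$ is to factor $\Omega = LL'$ (Cholesky) and apply OLS to $L^{-1}y = L^{-1}X\beta + L^{-1}u$, whose error has covariance $\sigma^2 I_n$. The NumPy sketch below checks numerically that both routes give the same $\tilde{\beta}$ and $\tilde{\sigma}^2$. The AR(1)-type matrix $\Omega_{ij} = 0.6^{|i-j|}$, the data, and the function names are assumptions made for illustration.

```python
import numpy as np


def gls_mle(X, y, Omega):
    """MLEs in y = X beta + u, u ~ N(0, sigma^2 Omega), with Omega known."""
    n = X.shape[0]
    Oi_X = np.linalg.solve(Omega, X)                  # Omega^{-1} X
    Oi_y = np.linalg.solve(Omega, y)                  # Omega^{-1} y
    beta = np.linalg.solve(X.T @ Oi_X, X.T @ Oi_y)    # (X' Omega^{-1} X)^{-1} X' Omega^{-1} y
    resid = y - X @ beta
    sigma2 = resid @ np.linalg.solve(Omega, resid) / n
    return beta, sigma2


def transformed_ols(X, y, Omega):
    """Same estimates via OLS on L^{-1} y = L^{-1} X beta + L^{-1} u, where Omega = L L'."""
    L = np.linalg.cholesky(Omega)
    y_star = np.linalg.solve(L, y)
    X_star = np.linalg.solve(L, X)
    beta = np.linalg.solve(X_star.T @ X_star, X_star.T @ y_star)
    resid = y_star - X_star @ beta
    return beta, resid @ resid / X.shape[0]


rng = np.random.default_rng(1)
n = 50
idx = np.arange(n)
Omega = 0.6 ** np.abs(np.subtract.outer(idx, idx))   # hypothetical known Omega
X = np.column_stack([np.ones(n), rng.normal(size=n)])
u = np.linalg.cholesky(Omega) @ rng.normal(size=n)   # u ~ N(0, Omega), i.e. sigma^2 = 1
y = X @ np.array([0.5, 1.0]) + u

beta_gls, sigma2_gls = gls_mle(X, y, Omega)
beta_tr, sigma2_tr = transformed_ols(X, y, Omega)
print(beta_gls, sigma2_gls)
print(beta_tr, sigma2_tr)
```

The equality holds because $(L^{-1}e)'(L^{-1}e) = e'(LL')^{-1}e = e'\Omega^{-1}e$ for any residual vector $e$.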


9.4 MLE: AR(1) Model

The $p$th-order Autoregressive Model, i.e., AR($p$) Model:

$$y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + u_t$$

AR(1) Model: for $t = 2, 3, \cdots, n$,

$$y_t = \phi_1 y_{t-1} + u_t, \qquad u_t \sim N(0, \sigma^2),$$

where $|\phi_1| < 1$ is assumed for now.

To obtain the joint density function of $y_1, y_2, \cdots, y_n$, $f(y_n, y_{n-1}, \cdots, y_1)$ is decomposed as follows:

$$f(y_n, y_{n-1}, \cdots, y_1) = f(y_1) \prod_{t=2}^{n} f(y_t \mid y_{t-1}, \cdots, y_1).$$

From $y_t = \phi_1 y_{t-1} + u_t$, we can obtain:

$$E(y_t \mid y_{t-1}, \cdots, y_1) = \phi_1 y_{t-1}, \qquad V(y_t \mid y_{t-1}, \cdots, y_1) = \sigma^2.$$

Therefore, the conditional distribution $f(y_t \mid y_{t-1}, \cdots, y_1)$ is:

$$f(y_t \mid y_{t-1}, \cdots, y_1) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(y_t - \phi_1 y_{t-1})^2\right).$$

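To obtain the unconditional distribution $f(y_t)$, $y_t$ is rewritten as follows:

$$\begin{aligned}
y_t &= \phi_1 y_{t-1} + u_t \\
&= \phi_1^2 y_{t-2} + u_t + \phi_1 u_{t-1} \\
&\ \ \vdots \\
&= \phi_1^j y_{t-j} + u_t + \phi_1 u_{t-1} + \cdots + \phi_1^{j-1} u_{t-j+1} \\
&\ \ \vdots \\
&= u_t + \phi_1 u_{t-1} + \phi_1^2 u_{t-2} + \cdots,
\end{aligned}$$

where the last line is obtained as $j$ goes to infinity, because $\phi_1^j y_{t-j}$ vanishes when $|\phi_1| < 1$.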

The unconditional expectation and variance of $y_t$ are:

$$E(y_t) = 0, \qquad V(y_t) = \sigma^2(1 + \phi_1^2 + \phi_1^4 + \cdots) = \frac{\sigma^2}{1 - \phi_1^2}.$$

Therefore, the unconditional distribution of $y_t$ is given by:

$$f(y_t) = \frac{1}{\sqrt{2\pi\sigma^2/(1 - \phi_1^2)}} \exp\left(-\frac{1}{2\sigma^2/(1 - \phi_1^2)} y_t^2\right).$$
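The variance formula can be checked by simulating a long AR(1) path. In this illustrative sketch, $\phi_1 = 0.8$ and $\sigma = 1$ are arbitrary, the first 500 draws are discarded so the starting value $y_0 = 0$ has negligible effect, and `simulate_ar1` is a hypothetical helper.

```python
import random
import statistics


def simulate_ar1(phi, sigma, n, seed=0, burn_in=500):
    """Simulate y_t = phi * y_{t-1} + u_t, u_t ~ N(0, sigma^2), dropping a burn-in."""
    rng = random.Random(seed)
    y, path = 0.0, []
    for t in range(n + burn_in):
        y = phi * y + rng.gauss(0.0, sigma)
        if t >= burn_in:
            path.append(y)
    return path


phi, sigma = 0.8, 1.0
path = simulate_ar1(phi, sigma, n=200_000)
sample_var = statistics.variance(path)
exact_var = sigma ** 2 / (1 - phi ** 2)
# Partial sum of the geometric series 1 + phi^2 + phi^4 + ... (50 terms)
partial = sum(phi ** (2 * k) for k in range(50))
print("sample variance:        ", sample_var)
print("sigma^2 / (1 - phi^2):  ", exact_var)
print("partial sum (50 terms): ", partial * sigma ** 2)
```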

Finally, the joint distribution of $y_1, y_2, \cdots, y_n$ is given by:

$$\begin{aligned}
f(y_n, y_{n-1}, \cdots, y_1) &= f(y_1) \prod_{t=2}^{n} f(y_t \mid y_{t-1}, \cdots, y_1) \\
&= \frac{1}{\sqrt{2\pi\sigma^2/(1 - \phi_1^2)}} \exp\left(-\frac{1}{2\sigma^2/(1 - \phi_1^2)} y_1^2\right) \times \prod_{t=2}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(y_t - \phi_1 y_{t-1})^2\right).
\end{aligned}$$

The log-likelihood function is:

$$\log L(\phi_1, \sigma^2; y_n, y_{n-1}, \cdots, y_1) = -\frac{1}{2}\log\left(\frac{2\pi\sigma^2}{1 - \phi_1^2}\right) - \frac{1}{2\sigma^2/(1 - \phi_1^2)} y_1^2 - \frac{n-1}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{t=2}^{n}(y_t - \phi_1 y_{t-1})^2.$$

Maximize $\log L$ with respect to $\phi_1$ and $\sigma^2$.

Maximization Procedure:

• Newton-Raphson Method, or Method of Scoring

• Simple Grid Search (search for the maximum within the range $-1 < \phi_1 < 1$, changing the value of $\phi_1$ by 0.01)
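A sketch of the grid-search approach in plain Python, under stated assumptions. Rather than also searching over $\sigma^2$, it concentrates $\sigma^2$ out: setting $\partial \log L / \partial \sigma^2 = 0$ in the log-likelihood above gives $\tilde{\sigma}^2(\phi_1) = \frac{1}{n}\big[(1 - \phi_1^2) y_1^2 + \sum_{t=2}^{n}(y_t - \phi_1 y_{t-1})^2\big]$. The simulated data (true $\phi_1 = 0.6$, $\sigma^2 = 1$) and all function names are hypothetical.

```python
import math
import random


def ar1_loglik(phi, sigma2, y):
    """Exact AR(1) log-likelihood: stationary density of y_1 times the conditionals."""
    n = len(y)
    ll = -0.5 * math.log(2 * math.pi * sigma2 / (1 - phi ** 2))
    ll -= (1 - phi ** 2) * y[0] ** 2 / (2 * sigma2)
    ll -= 0.5 * (n - 1) * math.log(2 * math.pi * sigma2)
    ll -= sum((y[t] - phi * y[t - 1]) ** 2 for t in range(1, n)) / (2 * sigma2)
    return ll


def sigma2_given_phi(phi, y):
    """sigma^2 solving d log L / d sigma^2 = 0 for a fixed phi."""
    ssr = (1 - phi ** 2) * y[0] ** 2
    ssr += sum((y[t] - phi * y[t - 1]) ** 2 for t in range(1, len(y)))
    return ssr / len(y)


def grid_search_ar1(y):
    """Maximize log L over phi = -0.99, -0.98, ..., 0.99 (step 0.01)."""
    best = None
    for k in range(-99, 100):
        phi = k / 100
        s2 = sigma2_given_phi(phi, y)
        ll = ar1_loglik(phi, s2, y)
        if best is None or ll > best[2]:
            best = (phi, s2, ll)
    return best


# Hypothetical data: an AR(1) path with phi = 0.6 and sigma^2 = 1
rng = random.Random(3)
phi_true, n = 0.6, 2000
y = [rng.gauss(0.0, 1.0 / math.sqrt(1 - phi_true ** 2))]  # y_1 from the stationary distribution
for _ in range(n - 1):
    y.append(phi_true * y[-1] + rng.gauss(0.0, 1.0))

phi_hat, sigma2_hat, ll_max = grid_search_ar1(y)
print(f"phi_hat = {phi_hat:.2f}, sigma2_hat = {sigma2_hat:.4f}, log L = {ll_max:.2f}")
```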


9.5 MLE: Regression Model with AR(1) Error

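When the error term is autocorrelated, the regression model is written as:

$$y_t = x_t\beta + u_t, \qquad u_t = \rho u_{t-1} + \epsilon_t, \qquad \epsilon_t \sim \text{iid } N(0, \sigma^2),$$

where $x_t$ is a $1 \times k$ row vector.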

The joint distribution of $u_n, u_{n-1}, \cdots, u_1$ is:

$$\begin{aligned}
f_u(u_n, u_{n-1}, \cdots, u_1; \rho, \sigma^2) &= f_u(u_1; \rho, \sigma^2) \prod_{t=2}^{n} f_u(u_t \mid u_{t-1}, \cdots, u_1; \rho, \sigma^2) \\
&= \left(\frac{2\pi\sigma^2}{1 - \rho^2}\right)^{-1/2} \exp\left(-\frac{1}{2\sigma^2/(1 - \rho^2)} u_1^2\right) \times (2\pi\sigma^2)^{-(n-1)/2} \exp\left(-\frac{1}{2\sigma^2}\sum_{t=2}^{n}(u_t - \rho u_{t-1})^2\right).
\end{aligned}$$

By transformation of variables from $u_n, u_{n-1}, \cdots, u_1$ to $y_n, y_{n-1}, \cdots, y_1$, the joint distribution of $y_n, y_{n-1}, \cdots, y_1$ is:

$$f_y(y_n, y_{n-1}, \cdots, y_1; \rho, \sigma^2, \beta) = f_u(y_n - x_n\beta, y_{n-1} - x_{n-1}\beta, \cdots, y_1 - x_1\beta; \rho, \sigma^2) \left|\frac{\partial u}{\partial y'}\right|$$

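$$\begin{aligned}
&= \left(\frac{2\pi\sigma^2}{1 - \rho^2}\right)^{-1/2} \exp\left(-\frac{1}{2\sigma^2/(1 - \rho^2)}(y_1 - x_1\beta)^2\right) \\
&\quad \times (2\pi\sigma^2)^{-(n-1)/2} \exp\left(-\frac{1}{2\sigma^2}\sum_{t=2}^{n}\big((y_t - \rho y_{t-1}) - (x_t - \rho x_{t-1})\beta\big)^2\right) \\
&= (2\pi\sigma^2)^{-1/2}(1 - \rho^2)^{1/2} \exp\left(-\frac{1}{2\sigma^2}\left(\sqrt{1 - \rho^2}\, y_1 - \sqrt{1 - \rho^2}\, x_1\beta\right)^2\right) \\
&\quad \times (2\pi\sigma^2)^{-(n-1)/2} \exp\left(-\frac{1}{2\sigma^2}\sum_{t=2}^{n}\big((y_t - \rho y_{t-1}) - (x_t - \rho x_{t-1})\beta\big)^2\right) \\
&= (2\pi\sigma^2)^{-n/2}(1 - \rho^2)^{1/2} \exp\left(-\frac{1}{2\sigma^2}(y_1^* - x_1^*\beta)^2\right) \times \exp\left(-\frac{1}{2\sigma^2}\sum_{t=2}^{n}(y_t^* - x_t^*\beta)^2\right) \\
&= (2\pi)^{-n/2}(\sigma^2)^{-n/2}(1 - \rho^2)^{1/2} \exp\left(-\frac{1}{2\sigma^2}\sum_{t=1}^{n}(y_t^* - x_t^*\beta)^2\right) \\
&\equiv L(\rho, \sigma^2, \beta; y_n, y_{n-1}, \cdots, y_1),
\end{aligned}$$

where $|\partial u / \partial y'| = 1$ has been used, and $y_t^*$ and $x_t^*$ are given by:

$$y_t^* = \begin{cases} \sqrt{1 - \rho^2}\, y_t, & t = 1, \\ y_t - \rho y_{t-1}, & t = 2, 3, \cdots, n, \end{cases} \qquad x_t^* = \begin{cases} \sqrt{1 - \rho^2}\, x_t, & t = 1, \\ x_t - \rho x_{t-1}, & t = 2, 3, \cdots, n. \end{cases}$$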

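For maximization, the first derivative of $L(\rho, \sigma^2, \beta; y_n, y_{n-1}, \cdots, y_1)$ with respect to $\beta$ should be zero, which gives:

$$\tilde{\beta} = \left(\sum_{t=1}^{n} x_t^{*\prime} x_t^*\right)^{-1}\left(\sum_{t=1}^{n} x_t^{*\prime} y_t^*\right) = (X^{*\prime}X^*)^{-1}X^{*\prime}y^*.$$

$\Longrightarrow$ This is equivalent to OLS applied to the transformed regression model $y^* = X^*\beta + \epsilon^*$, where $\epsilon^* \sim N(0, \sigma^2 I_n)$. The first element $\epsilon_1^* = \sqrt{1 - \rho^2}\, u_1$ also has variance $\sigma^2$, because $V(u_1) = \sigma^2/(1 - \rho^2)$.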

For maximization, the first derivative of $L(\rho, \sigma^2, \beta; y_n, y_{n-1}, \cdots, y_1)$ with respect to $\sigma^2$ should be zero, which gives:

$$\tilde{\sigma}^2 = \frac{1}{n}\sum_{t=1}^{n}(y_t^* - x_t^*\beta)^2 = \frac{1}{n}(y^* - X^*\beta)'(y^* - X^*\beta),$$

where

$$y^* = \begin{pmatrix} y_1^* \\ y_2^* \\ \vdots \\ y_n^* \end{pmatrix} = \begin{pmatrix} \sqrt{1 - \rho^2}\, y_1 \\ y_2 - \rho y_1 \\ \vdots \\ y_n - \rho y_{n-1} \end{pmatrix}, \qquad X^* = \begin{pmatrix} x_1^* \\ x_2^* \\ \vdots \\ x_n^* \end{pmatrix} = \begin{pmatrix} \sqrt{1 - \rho^2}\, x_1 \\ x_2 - \rho x_1 \\ \vdots \\ x_n - \rho x_{n-1} \end{pmatrix}.$$

For maximization, the first derivative of $L(\rho, \sigma^2, \beta; y_n, y_{n-1}, \cdots, y_1)$ with respect to $\rho$ should be zero.

$\max_{\rho, \beta, \sigma^2} L(\rho, \sigma^2, \beta; y)$ is equivalent to $\max_{\rho} L(\rho, \tilde{\sigma}^2, \tilde{\beta}; y)$.

$\log L(\rho, \tilde{\sigma}^2, \tilde{\beta}; y)$ is called the concentrated log-likelihood function, which is a function of $\rho$ only, because both $\tilde{\sigma}^2$ and $\tilde{\beta}$ depend only on $\rho$ (given the data).

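The concentrated log-likelihood function is written as:

$$\begin{aligned}
\log L(\rho, \tilde{\sigma}^2, \tilde{\beta}; y) &= -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\tilde{\sigma}^2) + \frac{1}{2}\log(1 - \rho^2) - \frac{n}{2} \\
&= -\frac{n}{2}\log(2\pi) - \frac{n}{2} - \frac{n}{2}\log\left(\tilde{\sigma}^2(\rho)\right) + \frac{1}{2}\log(1 - \rho^2).
\end{aligned}$$

For maximization of $\log L$ with respect to $\rho$, use the Newton-Raphson method, the method of scoring, or a simple grid search.

Note that $\tilde{\sigma}^2 = \tilde{\sigma}^2(\rho) = \frac{1}{n}(y^* - X^*\tilde{\beta})'(y^* - X^*\tilde{\beta})$ for $\tilde{\beta} = (X^{*\prime}X^*)^{-1}X^{*\prime}y^*$, both of which depend on $\rho$ through $y^*$ and $X^*$.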
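Putting the pieces together, the concentrated log-likelihood can be maximized by a grid search over $\rho$. This NumPy sketch is illustrative: `prais_winsten`, `concentrated_loglik`, and `grid_search_rho` are hypothetical names, the grid step of 0.01 mirrors the AR(1) grid search described earlier, and the data are simulated with $\rho = 0.7$, $\beta = (1, 2)'$, and $\sigma^2 = 1$.

```python
import numpy as np


def prais_winsten(v, rho):
    """Build v*: sqrt(1 - rho^2) * v_1 for t = 1, and v_t - rho * v_{t-1} for t >= 2.
    Works for a length-n vector y or row-wise for an (n, k) matrix X."""
    v = np.asarray(v, dtype=float)
    out = np.empty_like(v)
    out[0] = np.sqrt(1 - rho ** 2) * v[0]
    out[1:] = v[1:] - rho * v[:-1]
    return out


def concentrated_loglik(rho, y, X):
    """Return log L(rho, sigma2_tilde(rho), beta_tilde(rho); y) and the implied estimates."""
    n = len(y)
    y_s, X_s = prais_winsten(y, rho), prais_winsten(X, rho)
    beta = np.linalg.solve(X_s.T @ X_s, X_s.T @ y_s)   # OLS on the transformed model
    resid = y_s - X_s @ beta
    sigma2 = resid @ resid / n
    ll = (-0.5 * n * np.log(2 * np.pi) - 0.5 * n
          - 0.5 * n * np.log(sigma2) + 0.5 * np.log(1 - rho ** 2))
    return ll, beta, sigma2


def grid_search_rho(y, X):
    """Maximize the concentrated log-likelihood over rho = -0.99, -0.98, ..., 0.99."""
    grid = np.arange(-99, 100) / 100.0
    results = [concentrated_loglik(r, y, X) for r in grid]
    best = int(np.argmax([res[0] for res in results]))
    ll, beta, sigma2 = results[best]
    return grid[best], beta, sigma2, ll


# Hypothetical simulated data: y_t = 1 + 2 x_t + u_t, u_t = 0.7 u_{t-1} + eps_t
rng = np.random.default_rng(7)
n, rho_true = 1000, 0.7
X = np.column_stack([np.ones(n), rng.normal(size=n)])
u = np.empty(n)
u[0] = rng.normal(scale=1.0 / np.sqrt(1 - rho_true ** 2))
for t in range(1, n):
    u[t] = rho_true * u[t - 1] + rng.normal()
y = X @ np.array([1.0, 2.0]) + u

rho_hat, beta_hat, sigma2_hat, ll_max = grid_search_rho(y, X)
print(f"rho_hat = {rho_hat:.2f}, beta_hat = {beta_hat}, sigma2_hat = {sigma2_hat:.4f}")
```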
