TA session #10
Jun Sakamoto
June 28, 2016
Contents
1 Newton’s Method
2 Example
3 Likelihood function of AR(1)
4 Empirical example of MLE
1 Newton’s Method
We often need to solve non-linear equations for which an explicit solution cannot always be obtained. In such cases we use an iterative optimization procedure.
1. Newton’s method
Newton’s method is a procedure for finding a maximum (or minimum) of a function.
Let X = (x_1, x_2, ..., x_n) and let f(X) be continuous and twice differentiable. The second-order Taylor expansion around \bar{X} = (\bar{x}_1, \bar{x}_2, ..., \bar{x}_n) is

f(\bar{x}_1 + \Delta x_1, \bar{x}_2 + \Delta x_2, ..., \bar{x}_n + \Delta x_n) \simeq f(\bar{X}) + \sum_i f'_{x_i}(\bar{X}) \Delta x_i + \frac{1}{2} \sum_i \sum_j f''_{x_i x_j}(\bar{X}) \Delta x_i \Delta x_j

where f(\bar{X}) is the value of f evaluated at (\bar{x}_1, \bar{x}_2, ..., \bar{x}_n). Differentiating this approximation with respect to \Delta x_i and setting the derivative equal to zero gives

0 = f'_{x_i}(\bar{X}) + \sum_j f''_{x_i x_j}(\bar{X}) \Delta x_j
The Hessian is

H := \begin{pmatrix} f''_{x_1 x_1}(X) & \cdots & f''_{x_1 x_n}(X) \\ \vdots & \ddots & \vdots \\ f''_{x_n x_1}(X) & \cdots & f''_{x_n x_n}(X) \end{pmatrix}
Write \bar{H} for the Hessian evaluated at \bar{X}. Then

\Delta X = -\bar{H}^{-1} f'_X, \qquad X_{i+1} = X_i - \bar{H}^{-1} f'_X

Set a tolerance \epsilon and repeat until \epsilon > \|\Delta X\|.
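As a sketch of the update rule above, here is a minimal one-dimensional Newton iteration. The example function f(x) = x e^{-x} (maximized at x = 1) is my own choice for illustration, not from the notes:

```python
import math

def newton_max(fprime, fsecond, x0, tol=1e-10, max_iter=100):
    """Newton's method in one dimension: repeat x <- x - f'(x)/f''(x)
    (the scalar version of Delta X = -H^{-1} f'_X) until |step| < tol."""
    x = x0
    for _ in range(max_iter):
        step = fprime(x) / fsecond(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# f(x) = x * exp(-x): f'(x) = (1 - x) e^{-x}, f''(x) = (x - 2) e^{-x}
fp = lambda x: (1 - x) * math.exp(-x)
fpp = lambda x: (x - 2) * math.exp(-x)
print(newton_max(fp, fpp, 0.5))  # converges to the maximizer x = 1
```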
2. Gauss-Newton method
Newton’s method requires the Hessian, but the Hessian often cannot be obtained because the second derivatives of a nonlinear function are very difficult to compute. In that case we use the Gauss-Newton method.
Suppose Q(x, \beta) is a non-linear function, and consider a \beta that satisfies

Q(x, \beta) \simeq 0 \qquad (A)

That \beta minimizes

J = \frac{1}{2} \sum_i \sum_l Q(x, \beta)^2

We cannot obtain an explicit solution because Q is a non-linear function, and the Hessian is also very difficult to obtain. Instead we use (A).
Differentiating J with respect to \beta,

J'_{\beta_i} = \sum_i \sum_l Q(x, \beta) Q'_{\beta_i}(x, \beta)

J''_{\beta_i \beta_j} = \sum_i \sum_l \left( Q'_{\beta_i}(x, \beta) Q'_{\beta_j}(x, \beta) + Q(x, \beta) Q''_{\beta_i \beta_j}(x, \beta) \right)

If \beta is close enough to the true value, relation (A) lets us drop the terms containing Q(x, \beta), so we can approximate

J''_{\beta_i \beta_j} \simeq \sum_i \sum_l Q'_{\beta_i}(x, \beta) Q'_{\beta_j}(x, \beta)

Thus the gradient and (approximate) Hessian of J are

J'_{\beta_i} = \sum_i \sum_l Q(x, \beta) Q'_{\beta_i}(x, \beta), \qquad H \simeq \sum_i \sum_l Q'_{\beta_i}(x, \beta) Q'_{\beta_j}(x, \beta)
Newton’s method with these expressions used in place of the exact gradient and Hessian is called the Gauss-Newton method.
Some problems of Newton’s method. Consider the function

f(x) = x^3 - 3x^2 + x + 3

and apply Newton’s method with the starting value x = 1. Then x does not converge: the iterates alternate between two values. (If you are interested, please try it.) In other cases the computed values diverge. Thus Newton’s method is not guaranteed to converge. Moreover, even if the iteration converges, the limit is not always the global extremum.
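The two-cycle described above is easy to reproduce. A minimal sketch applying the root-finding Newton iteration x ← x − f(x)/f′(x) to this f from x₀ = 1:

```python
def newton_root(f, fprime, x0, n_iter=6):
    """Record the Newton iterates x <- x - f(x)/f'(x)."""
    xs = [x0]
    for _ in range(n_iter):
        x = xs[-1]
        xs.append(x - f(x) / fprime(x))
    return xs

f = lambda x: x**3 - 3*x**2 + x + 3
fp = lambda x: 3*x**2 - 6*x + 1           # f'(x)
print(newton_root(f, fp, 1.0))            # [1.0, 2.0, 1.0, 2.0, ...]: a 2-cycle
```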
2 Example
Example 1. We apply Newton’s method to the MLE of the binomial distribution. A coin is tossed n = 100 times and heads comes up m = 52 times. What is the most suitable value of the parameter p? The first derivative of the log-likelihood l(p) = m \ln p + (n - m) \ln(1 - p) is

l'(p) = \frac{m}{p} - \frac{n - m}{1 - p}
The second derivative of the log-likelihood is

l''(p) = -\frac{m(1-p)^2 + (n-m)p^2}{p^2(1-p)^2}

so \bar{H}^{-1} f'_p is

-\frac{p(1-p)(m - np)}{m(1-p)^2 + (n-m)p^2}
Starting from p_0 = 0.1, the computed values are 0.1 \to 0.19 \to 0.33 \to 0.48 \to 0.52. Starting from p_0 = 0.8, they are 0.8 \to 0.66 \to 0.54 \to 0.52.
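These iteration paths can be reproduced with the closed-form Newton step derived above; a minimal sketch:

```python
def newton_binomial_mle(m, n, p0, n_steps=10):
    """Newton iteration for the binomial log-likelihood
    l(p) = m ln p + (n - m) ln(1 - p), using the closed-form step
    p <- p + p(1-p)(m - n p) / (m(1-p)^2 + (n-m)p^2)."""
    path = [p0]
    p = p0
    for _ in range(n_steps):
        p = p + p * (1 - p) * (m - n * p) / (m * (1 - p)**2 + (n - m) * p**2)
        path.append(p)
    return path

# Reproduces 0.1 -> 0.19 -> 0.33 -> 0.48 -> 0.52 (rounded to 2 digits)
print([round(p, 2) for p in newton_binomial_mle(52, 100, 0.1, 5)])
```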
Example 2. The Gauss-Newton method can be used for the non-linear regression model

y_t = f_t(\beta) + u_t

Then J is

J = n^{-1} \sum_t (y_t - f_t(\beta))^2

We define X(\beta)' as

X(\beta)' = \begin{pmatrix} f'_{1\beta_1}(\beta) & \cdots & f'_{n\beta_1}(\beta) \\ \vdots & \ddots & \vdots \\ f'_{1\beta_k}(\beta) & \cdots & f'_{n\beta_k}(\beta) \end{pmatrix}
\bar{H} and the gradient J'(\beta) are

J'(\beta) = -2n^{-1} X(\beta)'(y - f(\beta)), \qquad \bar{H} = 2n^{-1} X(\beta)' X(\beta)

so the Newton step \beta_i - \bar{H}^{-1} J'(\beta) gives

\beta_{i+1} = \beta_i + (2n^{-1} X(\beta)' X(\beta))^{-1} \, 2n^{-1} X(\beta)'(y - f(\beta)) = \beta_i + (X(\beta)' X(\beta))^{-1} X(\beta)'(y - f(\beta))

(note the gradient carries a minus sign, so the correction term is added).
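A minimal Gauss-Newton sketch for this regression setting. The saturating model f_t(β) = β₁(1 − e^{−β₂ t}) and the noiseless data are hypothetical choices of mine for illustration, not from the notes:

```python
import numpy as np

def gauss_newton(y, f, jac, beta0, n_iter=50, tol=1e-12):
    """Gauss-Newton: beta <- beta + (X'X)^{-1} X'(y - f(beta)),
    where X = jac(beta) is the n x k Jacobian of f."""
    beta = np.asarray(beta0, dtype=float)
    for _ in range(n_iter):
        r = y - f(beta)                        # residual vector
        X = jac(beta)
        step = np.linalg.solve(X.T @ X, X.T @ r)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

t = np.arange(1, 21, dtype=float)
f = lambda b: b[0] * (1 - np.exp(-b[1] * t))
jac = lambda b: np.column_stack([1 - np.exp(-b[1] * t),          # df/db1
                                 b[0] * t * np.exp(-b[1] * t)])  # df/db2
y = f(np.array([2.0, 0.5]))                 # noiseless "data", true beta = (2, 0.5)
print(gauss_newton(y, f, jac, [1.8, 0.6]))  # recovers the true parameters
```

With noiseless data the residuals vanish at the optimum, so the dropped Q·Q″ term is exactly zero and Gauss-Newton converges quadratically.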
3 Likelihood function of AR(1)
We derive the likelihood function of AR(1). The AR(1) model is

Y_t = c + \rho Y_{t-1} + u_t, \qquad u_t \sim N(0, \sigma^2), \qquad |\rho| < 1

The parameters of AR(1) are \theta = (c, \rho, \sigma^2). Since y_t is a stationary process,

E[Y_t] = c + \rho E[Y_{t-1}] + E[u_t]
\Leftrightarrow \mu = c + \rho \mu
\Leftrightarrow \mu = \frac{c}{1 - \rho}

And the variance is

\gamma_0 = E[(Y_t - \mu)^2] = E[(\rho (Y_{t-1} - \mu) + u_t)^2]
= \rho^2 E[(Y_{t-1} - \mu)^2] + E[u_t^2]
= \rho^2 \gamma_0 + \sigma^2
\Leftrightarrow \gamma_0 = \frac{\sigma^2}{1 - \rho^2}
Then Y_t follows a normal distribution because u_t follows a normal distribution. The density function of Y_1 is

f_{Y_1}(y_1; \theta) = \frac{1}{\sqrt{2\pi\sigma^2/(1-\rho^2)}} \exp\left( -\frac{(y_1 - c/(1-\rho))^2}{2\sigma^2/(1-\rho^2)} \right)
Next, we consider the density of Y_2 conditional on Y_1 = y_1. We can treat y_1 as a realized value, so

Y_2 = c + \rho y_1 + u_2

where only u_2 is a random variable. By normality,

Y_2 \sim N(c + \rho y_1, \sigma^2)

so the conditional density of Y_2 is

f_{Y_2|Y_1}(y_2; y_1, \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(y_2 - c - \rho y_1)^2}{2\sigma^2} \right)
Thus the joint density f_{Y_1, Y_2}(y_1, y_2; \theta) is

f_{Y_1, Y_2}(y_1, y_2; \theta) = f_{Y_2|Y_1}(y_2; y_1, \theta) f_{Y_1}(y_1; \theta)

(by the definition of conditional probability, P(A|B) = P(A \cap B)/P(B)).
Similarly, the conditional density of Y_3 and the corresponding joint density are

f_{Y_3|Y_2,Y_1}(y_3; y_2, y_1, \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(y_3 - c - \rho y_2)^2}{2\sigma^2} \right) = f_{Y_3|Y_2}(y_3; y_2, \theta)

f_{Y_3,Y_2,Y_1}(y_3, y_2, y_1; \theta) = f_{Y_3|Y_2}(y_3; y_2, \theta) f_{Y_2|Y_1}(y_2; y_1, \theta) f_{Y_1}(y_1; \theta)
We can obtain the conditional and joint densities for any t in the same way. The conditional density of Y_t is

f_{Y_t|Y_{t-1},...,Y_1}(y_t; y_{t-1}, ..., y_1, \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(y_t - c - \rho y_{t-1})^2}{2\sigma^2} \right) = f_{Y_t|Y_{t-1}}(y_t; y_{t-1}, \theta)

and the joint density is

f_{Y_T,Y_{T-1},...,Y_1}(y_T, y_{T-1}, ..., y_1; \theta) = f_{Y_T|Y_{T-1}}(y_T; y_{T-1}, \theta) f_{Y_{T-1}|Y_{T-2}}(y_{T-1}; y_{T-2}, \theta) \cdots f_{Y_1}(y_1; \theta) = f_{Y_1}(y_1; \theta) \prod_{t=2}^{T} f_{Y_t|Y_{t-1}}(y_t; y_{t-1}, \theta)
Thus the conditional density depends only on the previous value at t - 1. The likelihood function L(\theta; y) of AR(1) is therefore

L(\theta; y) = f_{Y_T,Y_{T-1},...,Y_1}(y_T, y_{T-1}, ..., y_1; \theta) = f_{Y_1}(y_1; \theta) \prod_{t=2}^{T} f_{Y_t|Y_{t-1}}(y_t; y_{t-1}, \theta)

and the log-likelihood function is

l(\theta; y) = \ln f_{Y_1}(y_1; \theta) + \sum_{t=2}^{T} \ln f_{Y_t|Y_{t-1}}(y_t; y_{t-1}, \theta)

Substituting the density functions into this expression,
l(\theta; y) = -\frac{1}{2}\ln 2\pi - \frac{1}{2}\ln\frac{\sigma^2}{1-\rho^2} - \frac{(y_1 - c/(1-\rho))^2}{2\sigma^2/(1-\rho^2)} - \frac{T-1}{2}\ln 2\pi - \frac{T-1}{2}\ln\sigma^2 - \sum_{t=2}^{T} \frac{(y_t - c - \rho y_{t-1})^2}{2\sigma^2}

This is the log-likelihood function of AR(1). We usually estimate the parameters by a numerical optimization procedure because an analytic solution is difficult to obtain.
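The log-likelihood can be evaluated directly from this formula. A minimal sketch (the function name and the tiny example series are mine; in practice θ would then be passed to a numerical optimizer):

```python
import math

def ar1_loglik(theta, y):
    """Exact Gaussian AR(1) log-likelihood, theta = (c, rho, sigma2):
    unconditional density for y_1 plus conditional densities for t >= 2."""
    c, rho, sigma2 = theta
    mu = c / (1 - rho)                 # stationary mean
    gamma0 = sigma2 / (1 - rho**2)     # stationary variance
    ll = -0.5 * math.log(2 * math.pi * gamma0) - (y[0] - mu)**2 / (2 * gamma0)
    for t in range(1, len(y)):
        e = y[t] - c - rho * y[t - 1]  # one-step prediction error
        ll += -0.5 * math.log(2 * math.pi * sigma2) - e**2 / (2 * sigma2)
    return ll

print(ar1_loglik((0.0, 0.5, 1.0), [0.1, 0.3, 0.2, 0.4]))
```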
4 Empirical example of MLE
“Testing efficient market hypothesis for the dollar-sterling gold standard exchange rate 1890-1906: MLE with double truncation”, E. Goldman, 2000, Economics Letters 69, pp. 253-259.
If market players use full information (this property is called “market efficiency”), then prices follow a martingale process: E_t[P_{t+1}|S_t] = P_t, where S_t is the information set available up to time t. To test the efficiency hypothesis under the gold standard system, let y_t be the exchange rate at time t, given by

y_t = \rho y_{t-1} + \epsilon_t

The efficiency hypothesis is the null hypothesis H_0: \rho = 1 against the alternative hypothesis H_1: \rho < 1. However, under the gold standard system the fluctuation of the price has upper and lower limits, called the “gold points”. If exchange rates are bounded by the gold points, then y_t satisfies

a_t \le y_t \le b_t

where a_t and b_t are the gold points, estimated in another article. The maximum likelihood function under this restriction is
L = \prod_{t=1}^{n} \frac{\frac{1}{\sigma}\,\phi\!\left(\frac{y_t - \rho y_{t-1}}{\sigma}\right)}{\Phi\!\left(\frac{b_t - \rho y_{t-1}}{\sigma}\right) - \Phi\!\left(\frac{a_t - \rho y_{t-1}}{\sigma}\right)}
where \phi and \Phi are respectively the probability density function and the distribution function of the standardized normal variable. Figure 1 shows the exchange rate and the gold points, and Tables 1 and 2 show the estimation results for the New York and London markets.
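A sketch of the truncated log-likelihood, assuming the gold points a_t and b_t are given and conditioning on the first observation (the paper’s exact treatment of the initial observation may differ):

```python
import math

def norm_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal distribution function Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def truncated_loglik(rho, sigma, y, a, b):
    """Each factor is (1/sigma) phi((y_t - rho y_{t-1}) / sigma) divided by
    Phi((b_t - rho y_{t-1}) / sigma) - Phi((a_t - rho y_{t-1}) / sigma)."""
    ll = 0.0
    for t in range(1, len(y)):
        m = rho * y[t - 1]
        num = norm_pdf((y[t] - m) / sigma) / sigma
        den = norm_cdf((b[t] - m) / sigma) - norm_cdf((a[t] - m) / sigma)
        ll += math.log(num) - math.log(den)
    return ll
```

As a sanity check, with very wide gold points the denominator is approximately 1 and the truncated likelihood collapses to the ordinary untruncated likelihood.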
If the bounds did not exist, the exchange rate would appear stationary; but within the gold points it appears non-stationary. The estimator of \rho that ignores the truncation is biased to the left, as we expected from Tsurumi (1998). This can be confirmed from Tables 1 and 2: the test that ignores truncation rejects the market efficiency hypothesis at the 5% level, so prices appear to follow a stationary process, but the test with truncation cannot reject the market efficiency hypothesis.
[Figures 1-4 omitted]