TA session #10
Jun Sakamoto
June 28, 2016
Contents
1 Newton’s Method
2 Example
3 Likelihood function of AR(1)
4 Empirical example of MLE
1 Newton’s Method
We often need to solve non-linear equations for which an explicit solution cannot always be obtained. In such cases we use an iterative optimization procedure.
1. Newton’s method
Newton’s method is a procedure for finding a maximum (or minimum) of a function.
Let X = (x_1, x_2, ..., x_n) and let f(X) be continuous and twice differentiable. The second-order Taylor expansion around \bar{X} = (\bar{x}_1, \bar{x}_2, ..., \bar{x}_n) is

f(\bar{x}_1 + \Delta x_1, \bar{x}_2 + \Delta x_2, ..., \bar{x}_n + \Delta x_n) \simeq f(\bar{X}) + \sum_i f'_{x_i}(\bar{X}) \Delta x_i + \frac{1}{2} \sum_i \sum_j f''_{x_i x_j}(\bar{X}) \Delta x_i \Delta x_j

where f(\bar{X}) is the value of f evaluated at (\bar{x}_1, \bar{x}_2, ..., \bar{x}_n). Differentiating this approximation with respect to \Delta x_i and setting the derivative equal to zero gives

0 = f'_{x_i}(\bar{X}) + \sum_j f''_{x_i x_j}(\bar{X}) \Delta x_j
The Hessian is

H := \begin{pmatrix} f''_{x_1 x_1}(X) & \cdots & f''_{x_1 x_n}(X) \\ \vdots & \ddots & \vdots \\ f''_{x_n x_1}(X) & \cdots & f''_{x_n x_n}(X) \end{pmatrix}
Write \bar{H} for the Hessian evaluated at \bar{X}. Then

\Delta X = -\bar{H}^{-1} f'_X, \qquad X_{i+1} = X_i - \bar{H}^{-1} f'_X

Set a tolerance \epsilon and repeat until \epsilon > \|\Delta X\|.
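As a sketch of the update rule above, here is a minimal one-dimensional Newton iteration. The example function f(x) = x e^{-x} (maximized at x = 1) is my own choice for illustration, not from the notes:

```python
import math

def newton_max(fprime, fsecond, x0, tol=1e-10, max_iter=100):
    """Newton's method in one dimension: repeat x <- x - f'(x)/f''(x)
    (the scalar version of Delta X = -H^{-1} f'_X) until |step| < tol."""
    x = x0
    for _ in range(max_iter):
        step = fprime(x) / fsecond(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# f(x) = x * exp(-x): f'(x) = (1 - x) e^{-x}, f''(x) = (x - 2) e^{-x}
fp = lambda x: (1 - x) * math.exp(-x)
fpp = lambda x: (x - 2) * math.exp(-x)
print(newton_max(fp, fpp, 0.5))  # converges to the maximizer x = 1
```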
2. Gauss-Newton method
Newton’s method requires the Hessian, but the Hessian often cannot be obtained because the second derivatives of a nonlinear function are very difficult to compute. In that case we use the Gauss-Newton method.
Suppose Q(x, \beta) is a non-linear function, and consider a \beta that satisfies

Q(x, \beta) \simeq 0 \qquad (A)

That \beta minimizes

J = \frac{1}{2} \sum_i \sum_l Q(x, \beta)^2

We cannot obtain an explicit solution because Q is a non-linear function, and the Hessian is also very difficult to obtain. Instead we use (A).
Differentiating J with respect to \beta,

J'_{\beta_i} = \sum_i \sum_l Q(x, \beta) Q'_{\beta_i}(x, \beta)

J''_{\beta_i \beta_j} = \sum_i \sum_l \left( Q'_{\beta_i}(x, \beta) Q'_{\beta_j}(x, \beta) + Q(x, \beta) Q''_{\beta_i \beta_j}(x, \beta) \right)

If \beta is close enough to the true value, relation (A) lets us drop the terms containing Q(x, \beta), so we can approximate

J''_{\beta_i \beta_j} \simeq \sum_i \sum_l Q'_{\beta_i}(x, \beta) Q'_{\beta_j}(x, \beta)

Thus the gradient and (approximate) Hessian of J are

J'_{\beta_i} = \sum_i \sum_l Q(x, \beta) Q'_{\beta_i}(x, \beta), \qquad H \simeq \sum_i \sum_l Q'_{\beta_i}(x, \beta) Q'_{\beta_j}(x, \beta)
Newton’s method with these expressions used in place of the exact gradient and Hessian is called the Gauss-Newton method.
Some problems of Newton’s method. Consider the function

f(x) = x^3 - 3x^2 + x + 3

and apply Newton’s method with the starting value x = 1. Then x does not converge: the iterates alternate between two values. (If you are interested, please try it.) In other cases the computed values diverge. Thus Newton’s method is not guaranteed to converge. Moreover, even if the iteration converges, the limit is not always the global extremum.
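The two-cycle described above is easy to reproduce. A minimal sketch applying the root-finding Newton iteration x ← x − f(x)/f′(x) to this f from x₀ = 1:

```python
def newton_root(f, fprime, x0, n_iter=6):
    """Record the Newton iterates x <- x - f(x)/f'(x)."""
    xs = [x0]
    for _ in range(n_iter):
        x = xs[-1]
        xs.append(x - f(x) / fprime(x))
    return xs

f = lambda x: x**3 - 3*x**2 + x + 3
fp = lambda x: 3*x**2 - 6*x + 1           # f'(x)
print(newton_root(f, fp, 1.0))            # [1.0, 2.0, 1.0, 2.0, ...]: a 2-cycle
```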
2 Example
Example 1. We apply Newton’s method to the MLE of the binomial distribution. A coin is tossed n = 100 times and heads comes up m = 52 times. What is the most suitable value of the parameter p? The first derivative of the log-likelihood l(p) = m \ln p + (n - m) \ln(1 - p) is

l'(p) = \frac{m}{p} - \frac{n - m}{1 - p}
The second derivative of the log-likelihood is

l''(p) = -\frac{m(1-p)^2 + (n-m)p^2}{p^2(1-p)^2}

so \bar{H}^{-1} f'_p is

-\frac{p(1-p)(m - np)}{m(1-p)^2 + (n-m)p^2}
Starting from p_0 = 0.1, the computed values are 0.1 \to 0.19 \to 0.33 \to 0.48 \to 0.52. Starting from p_0 = 0.8, they are 0.8 \to 0.66 \to 0.54 \to 0.52.
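These iteration paths can be reproduced with the closed-form Newton step derived above; a minimal sketch:

```python
def newton_binomial_mle(m, n, p0, n_steps=10):
    """Newton iteration for the binomial log-likelihood
    l(p) = m ln p + (n - m) ln(1 - p), using the closed-form step
    p <- p + p(1-p)(m - n p) / (m(1-p)^2 + (n-m)p^2)."""
    path = [p0]
    p = p0
    for _ in range(n_steps):
        p = p + p * (1 - p) * (m - n * p) / (m * (1 - p)**2 + (n - m) * p**2)
        path.append(p)
    return path

# Reproduces 0.1 -> 0.19 -> 0.33 -> 0.48 -> 0.52 (rounded to 2 digits)
print([round(p, 2) for p in newton_binomial_mle(52, 100, 0.1, 5)])
```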
Example 2. The Gauss-Newton method can be used for the non-linear regression model

y_t = f_t(\beta) + u_t

Then J is

J = n^{-1} \sum_t (y_t - f_t(\beta))^2

We define X(\beta)' as

X(\beta)' = \begin{pmatrix} f'_{1\beta_1}(\beta) & \cdots & f'_{n\beta_1}(\beta) \\ \vdots & \ddots & \vdots \\ f'_{1\beta_k}(\beta) & \cdots & f'_{n\beta_k}(\beta) \end{pmatrix}
\bar{H} and the gradient J'(\beta) are

J'(\beta) = -2n^{-1} X(\beta)'(y - f(\beta)), \qquad \bar{H} = 2n^{-1} X(\beta)' X(\beta)

so the Newton step \beta_i - \bar{H}^{-1} J'(\beta) gives

\beta_{i+1} = \beta_i + (2n^{-1} X(\beta)' X(\beta))^{-1} \, 2n^{-1} X(\beta)'(y - f(\beta)) = \beta_i + (X(\beta)' X(\beta))^{-1} X(\beta)'(y - f(\beta))

(note the gradient carries a minus sign, so the correction term is added).
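A minimal Gauss-Newton sketch for this regression setting. The saturating model f_t(β) = β₁(1 − e^{−β₂ t}) and the noiseless data are hypothetical choices of mine for illustration, not from the notes:

```python
import numpy as np

def gauss_newton(y, f, jac, beta0, n_iter=50, tol=1e-12):
    """Gauss-Newton: beta <- beta + (X'X)^{-1} X'(y - f(beta)),
    where X = jac(beta) is the n x k Jacobian of f."""
    beta = np.asarray(beta0, dtype=float)
    for _ in range(n_iter):
        r = y - f(beta)                        # residual vector
        X = jac(beta)
        step = np.linalg.solve(X.T @ X, X.T @ r)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

t = np.arange(1, 21, dtype=float)
f = lambda b: b[0] * (1 - np.exp(-b[1] * t))
jac = lambda b: np.column_stack([1 - np.exp(-b[1] * t),          # df/db1
                                 b[0] * t * np.exp(-b[1] * t)])  # df/db2
y = f(np.array([2.0, 0.5]))                 # noiseless "data", true beta = (2, 0.5)
print(gauss_newton(y, f, jac, [1.8, 0.6]))  # recovers the true parameters
```

With noiseless data the residuals vanish at the optimum, so the dropped Q·Q″ term is exactly zero and Gauss-Newton converges quadratically.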
3 Likelihood function of AR(1)
We derive the likelihood function of AR(1). The AR(1) model is

Y_t = c + \rho Y_{t-1} + u_t, \qquad u_t \sim N(0, \sigma^2), \qquad |\rho| < 1

The parameters of AR(1) are \theta = (c, \rho, \sigma^2). Since y_t is a stationary process,

E[Y_t] = c + \rho E[Y_{t-1}] + E[u_t]
\Leftrightarrow \mu = c + \rho \mu
\Leftrightarrow \mu = \frac{c}{1 - \rho}

And the variance is

\gamma_0 = E[(Y_t - \mu)^2] = E[(\rho (Y_{t-1} - \mu) + u_t)^2]
= \rho^2 E[(Y_{t-1} - \mu)^2] + E[u_t^2]
= \rho^2 \gamma_0 + \sigma^2
\Leftrightarrow \gamma_0 = \frac{\sigma^2}{1 - \rho^2}
Then Y_t follows a normal distribution because u_t follows a normal distribution. The density function of Y_1 is

f_{Y_1}(y_1; \theta) = \frac{1}{\sqrt{2\pi\sigma^2/(1-\rho^2)}} \exp\left( -\frac{(y_1 - c/(1-\rho))^2}{2\sigma^2/(1-\rho^2)} \right)
Next, we consider the density of Y_2 conditional on Y_1 = y_1. We can treat y_1 as a realized value, so

Y_2 = c + \rho y_1 + u_2

where only u_2 is a random variable. By normality,

Y_2 \sim N(c + \rho y_1, \sigma^2)

so the conditional density of Y_2 is

f_{Y_2|Y_1}(y_2; y_1, \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(y_2 - c - \rho y_1)^2}{2\sigma^2} \right)
Thus the joint density f_{Y_1, Y_2}(y_1, y_2; \theta) is

f_{Y_1, Y_2}(y_1, y_2; \theta) = f_{Y_2|Y_1}(y_2; y_1, \theta) f_{Y_1}(y_1; \theta)

(by the definition of conditional probability, P(A|B) = P(A \cap B)/P(B)).
Similarly, the conditional density of Y_3 and the corresponding joint density are

f_{Y_3|Y_2,Y_1}(y_3; y_2, y_1, \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(y_3 - c - \rho y_2)^2}{2\sigma^2} \right) = f_{Y_3|Y_2}(y_3; y_2, \theta)

f_{Y_3,Y_2,Y_1}(y_3, y_2, y_1; \theta) = f_{Y_3|Y_2}(y_3; y_2, \theta) f_{Y_2|Y_1}(y_2; y_1, \theta) f_{Y_1}(y_1; \theta)
We can obtain the conditional and joint densities for any t in the same way. The conditional density of Y_t is

f_{Y_t|Y_{t-1},...,Y_1}(y_t; y_{t-1}, ..., y_1, \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(y_t - c - \rho y_{t-1})^2}{2\sigma^2} \right) = f_{Y_t|Y_{t-1}}(y_t; y_{t-1}, \theta)

and the joint density is

f_{Y_T,Y_{T-1},...,Y_1}(y_T, y_{T-1}, ..., y_1; \theta) = f_{Y_T|Y_{T-1}}(y_T; y_{T-1}, \theta) f_{Y_{T-1}|Y_{T-2}}(y_{T-1}; y_{T-2}, \theta) \cdots f_{Y_1}(y_1; \theta) = f_{Y_1}(y_1; \theta) \prod_{t=2}^{T} f_{Y_t|Y_{t-1}}(y_t; y_{t-1}, \theta)
Thus the conditional density depends only on the previous value at t - 1. The likelihood function L(\theta; y) of AR(1) is therefore

L(\theta; y) = f_{Y_T,Y_{T-1},...,Y_1}(y_T, y_{T-1}, ..., y_1; \theta) = f_{Y_1}(y_1; \theta) \prod_{t=2}^{T} f_{Y_t|Y_{t-1}}(y_t; y_{t-1}, \theta)

and the log-likelihood function is

l(\theta; y) = \ln f_{Y_1}(y_1; \theta) + \sum_{t=2}^{T} \ln f_{Y_t|Y_{t-1}}(y_t; y_{t-1}, \theta)

Substituting the density functions into this expression,
l(\theta; y) = -\frac{1}{2}\ln 2\pi - \frac{1}{2}\ln\frac{\sigma^2}{1-\rho^2} - \frac{(y_1 - c/(1-\rho))^2}{2\sigma^2/(1-\rho^2)} - \frac{T-1}{2}\ln 2\pi - \frac{T-1}{2}\ln\sigma^2 - \sum_{t=2}^{T} \frac{(y_t - c - \rho y_{t-1})^2}{2\sigma^2}

This is the log-likelihood function of AR(1). We usually estimate the parameters by a numerical optimization procedure because an analytic solution is difficult to obtain.
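The log-likelihood can be evaluated directly from this formula. A minimal sketch (the function name and the tiny example series are mine; in practice θ would then be passed to a numerical optimizer):

```python
import math

def ar1_loglik(theta, y):
    """Exact Gaussian AR(1) log-likelihood, theta = (c, rho, sigma2):
    unconditional density for y_1 plus conditional densities for t >= 2."""
    c, rho, sigma2 = theta
    mu = c / (1 - rho)                 # stationary mean
    gamma0 = sigma2 / (1 - rho**2)     # stationary variance
    ll = -0.5 * math.log(2 * math.pi * gamma0) - (y[0] - mu)**2 / (2 * gamma0)
    for t in range(1, len(y)):
        e = y[t] - c - rho * y[t - 1]  # one-step prediction error
        ll += -0.5 * math.log(2 * math.pi * sigma2) - e**2 / (2 * sigma2)
    return ll

print(ar1_loglik((0.0, 0.5, 1.0), [0.1, 0.3, 0.2, 0.4]))
```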
4 Empirical example of MLE
“Testing efficient market hypothesis for the dollar-sterling gold standard exchange rate 1890-1906: MLE with double truncation”, E. Goldman, 2000, Economics Letters 69, pp. 253-259.
If market players use full information (this property is called “market efficiency”), then prices follow a martingale process: E_t[P_{t+1}|S_t] = P_t, where S_t is the information set available up to time t. To test the efficiency hypothesis under the gold standard system, let y_t be the exchange rate at time t, given by

y_t = \rho y_{t-1} + \epsilon_t

The efficiency hypothesis is the null hypothesis H_0: \rho = 1 against the alternative hypothesis H_1: \rho < 1. However, under the gold standard system the fluctuation of the price has upper and lower limits, called the “gold points”. If exchange rates are bounded by the gold points, then y_t satisfies

a_t \le y_t \le b_t

where a_t and b_t are the gold points, estimated in another article. The maximum likelihood function under this restriction is
L = \prod_{t=1}^{n} \frac{\frac{1}{\sigma}\,\phi\!\left(\frac{y_t - \rho y_{t-1}}{\sigma}\right)}{\Phi\!\left(\frac{b_t - \rho y_{t-1}}{\sigma}\right) - \Phi\!\left(\frac{a_t - \rho y_{t-1}}{\sigma}\right)}
where \phi and \Phi are respectively the probability density function and the distribution function of the standardized normal variable. Figure 1 shows the exchange rate and the gold points, and Tables 1 and 2 show the estimation results for the New York and London markets.
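A sketch of the truncated log-likelihood, assuming the gold points a_t and b_t are given and conditioning on the first observation (the paper’s exact treatment of the initial observation may differ):

```python
import math

def norm_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal distribution function Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def truncated_loglik(rho, sigma, y, a, b):
    """Each factor is (1/sigma) phi((y_t - rho y_{t-1}) / sigma) divided by
    Phi((b_t - rho y_{t-1}) / sigma) - Phi((a_t - rho y_{t-1}) / sigma)."""
    ll = 0.0
    for t in range(1, len(y)):
        m = rho * y[t - 1]
        num = norm_pdf((y[t] - m) / sigma) / sigma
        den = norm_cdf((b[t] - m) / sigma) - norm_cdf((a[t] - m) / sigma)
        ll += math.log(num) - math.log(den)
    return ll
```

As a sanity check, with very wide gold points the denominator is approximately 1 and the truncated likelihood collapses to the ordinary untruncated likelihood.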
If the bounds did not exist, the exchange rate would appear stationary; but within the gold points it appears non-stationary. The estimator of \rho that ignores the truncation is biased to the left, as we expected from Tsurumi (1998). This can be confirmed from Tables 1 and 2: the test that ignores truncation rejects the market efficiency hypothesis at the 5% level, so prices appear to follow a stationary process, but the test with truncation cannot reject the market efficiency hypothesis.
[Figures 1-4 omitted]