TA7 — Econometrics I 2016 TA session

TA session note#7

Shouto Yonekura

June 2, 2016

Contents

1 The coefficient of determination 1

2 Quick review of the statistical hypothesis testing 2

3 t-test 2

4 F-test 3

5 Some examples 5

1 The coefficient of determination

Let ŷ := Xβ̂. We call this the fitted value. In order to analyze goodness of fit, we define some measure of variability of the dependent variable. One such measure is the sum of squares, Σ_i y_i² = y'y. We have the following decomposition of y'y:

y'y = (ŷ + e)'(ŷ + e),  (e := y − ŷ)

= ŷ'ŷ + ŷ'e + e'ŷ + e'e

= ŷ'ŷ + e'e,

since the normal equations X'Xβ̂ = X'y imply ŷ'e = β̂'X'e = β̂'(X'y − X'Xβ̂) = 0.

Dividing both sides of the above equation by y'y, we can get the coefficient of determination R² as follows:

y'y/y'y = ŷ'ŷ/y'y + e'e/y'y,

R² := 1 − e'e/y'y.

Since both ŷ'ŷ and e'e are nonnegative, 0 ≤ R² ≤ 1. Since the term e'e/y'y is the part of the variability which cannot be explained by the model, R² can be interpreted as a measure of goodness of fit. However, R² has a tricky problem. To understand it, first we calculate ŷ'ŷ as follows:

ŷ'ŷ = (Xβ̂)'(Xβ̂)

= β̂'X'Xβ̂

= y'X(X'X)⁻¹X'X(X'X)⁻¹X'y

= y'X(X'X)⁻¹X'y.


Next we take the expectation of both sides:

E[ŷ'ŷ] = trace[X(X'X)⁻¹X' E[yy']]

= kσ² + β'X'Xβ

= kσ² + (Xβ)'(Xβ),

using E[yy'] = σ²Iₙ + Xββ'X' and trace[X(X'X)⁻¹X'] = k.

This equation means that ŷ'ŷ is a biased estimator of (Xβ)'(Xβ) and its bias is kσ². Obviously, as k becomes large, R² also tends to become large in expectation. Therefore, the bigger model always looks better in terms of R², and this is clearly ridiculous. Thus we often use the adjusted coefficient of determination:

R²adj := 1 − (1 − R²)(n − 1)/(n − k).
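These formulas can be illustrated numerically. Below is a minimal numpy sketch on simulated data (the design, coefficients, and sample size are made up for illustration; the uncentered R² matches the definition above, and adding an irrelevant regressor can only raise R²):

```python
import numpy as np

def ols_r2(y, X):
    """OLS fit for y = X b + e; returns (R^2, adjusted R^2)."""
    n, k = X.shape
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # b = (X'X)^{-1} X'y
    e = y - X @ beta_hat                            # residuals
    r2 = 1.0 - (e @ e) / (y @ y)                    # uncentered R^2, as in the note
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - k)   # adjusted R^2
    return r2, r2_adj

rng = np.random.default_rng(0)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

r2_small, _ = ols_r2(y, X)
# Add an irrelevant regressor: the SSR cannot rise, so R^2 weakly increases.
X_big = np.column_stack([X, rng.normal(size=n)])
r2_big, _ = ols_r2(y, X_big)
```

This is exactly the "bigger model always looks better" problem that R²adj is meant to penalize.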

2 Quick review of the statistical hypothesis testing

Let X := (X1, · · · , Xn) be a random vector whose distribution is PθX, θ ∈ Θ, where θ is the parameter and Θ is the parameter space. Let x := (x1, · · · , xn) be a realized value vector (the data) of X. We call the procedure which determines, based on the data, whether the distribution is PθX with θ ∈ Θ0 or with θ ∈ Θ1 = Θ − Θ0, where Θ0 ⊂ Θ, statistical hypothesis testing. We call Θ0 the null hypothesis and Θ1 the alternative hypothesis. Thus our problem can be described as follows:

Null hypothesis H0: θ ∈ Θ0 vs. Alternative hypothesis H1: θ ∈ Θ1.

Usually, statistical hypothesis testing is formulated as follows. First we divide the sample space into two parts, C and Cᶜ. Next, if x ∈ C then we reject H0, and otherwise we do not. We call C the rejection region and Cᶜ the acceptance region.

There are two possible mistakes when we do statistical hypothesis testing. One is rejecting H0 even though H0 is true; we call this type of mistake a type 1 error. The other is accepting H0 even though H1 is true; we call this type of mistake a type 2 error.

Generally, we cannot simultaneously minimize the probabilities of type 1 and type 2 errors. Traditionally, we first deal with the type 1 error and try to keep its probability below α. We call this α the level of significance. After we decide α, we then seek a test which minimizes the probability of a type 2 error. The Neyman–Pearson lemma provides such a test (see Young and Smith (2010)).

3 t-test

prop7.1

Suppose that assumptions A1-A6 are satisfied. Under the null hypothesis H0: βi = β̄i,

ti := (β̂i − β̄i) / √(s²(X'X)⁻¹ii) ∼ t(n − k),

where β̄i is the hypothesized value and s² := e'e/(n − k).


The proof was already provided in note #5.

The procedure of t-test

step1 Under the null hypothesis H0: βi = β̄i, we calculate the t-statistic ti provided in prop7.1.

step2 Set α (usually α = 0.05). Using a statistical table, we find the rejection region from tα/2(n − k) (here we assume H1: βi ≠ β̄i). Since the t-distribution is symmetric, we can see that:

Prob(−tα/2(n − k) < ti < tα/2(n − k)) = 1 − α.

That is, if α = 0.05 and H0 is true, the probability that this interval covers the t-statistic calculated in step1 is 95%.

step3 If |ti| > tα/2(n − k), then we reject H0 at the level of significance α.
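The three steps above can be sketched in code as follows (a minimal numpy example on simulated data; the design and coefficients are made up, and the critical value 1.98 ≈ t0.025(117) is hardcoded from a statistical table, mirroring step2):

```python
import numpy as np

# step1-step3 of the t-test on simulated data (all numbers illustrative).
rng = np.random.default_rng(1)
n, k = 120, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 0.5, 0.0])            # the last coefficient is truly 0
y = X @ beta_true + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y                     # OLSE
e = y - X @ beta_hat                             # residuals
s2 = (e @ e) / (n - k)                           # s^2 = e'e/(n - k)

i, beta_bar = 2, 0.0                             # step1: test H0: beta_2 = 0
t_i = (beta_hat[i] - beta_bar) / np.sqrt(s2 * XtX_inv[i, i])
reject = abs(t_i) > 1.98                         # step3: two-sided test at alpha = 0.05
```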

4 F-test

Consider the following joint linear null hypothesis:

H0: Rβ = r,

where R is a #r × k matrix with rank #r (#r is the number of restrictions) and r is a #r × 1 vector. Let J := Rβ̂ − r, where β̂ is the OLSE. Under the null hypothesis H0: Rβ = r, we can rewrite J as J = R(β̂ − β). In this case E[Rβ̂] = Rβ, and:

V[Rβ̂] = E[(Rβ̂ − Rβ)(Rβ̂ − Rβ)']

= R E[(β̂ − β)(β̂ − β)'] R'

= R V[β̂] R'

= σ²R(X'X)⁻¹R'.

Thus, we can get the following proposition:

prop7.2

Suppose that assumptions A1-A6 are satisfied. Under the null hypothesis H0: Rβ = r,

F := [(Rβ̂ − r)'[R(X'X)⁻¹R']⁻¹(Rβ̂ − r)/#r] / s² ∼ F(#r, n − k),

where s² := e'e/(n − k) and e := y − Xβ̂.

The proof was already provided in note #5 (prop5.12).


Next, we consider the following OLS problem with restrictions:

min_β (y − Xβ)'(y − Xβ)  s.t. Rβ = r.

We call this problem the restricted OLS. The restricted OLSE β̂r can be calculated by the method of Lagrange multipliers:

L = (y − Xβ)'(y − Xβ) − 2λ'(Rβ − r),

where λ is the Lagrange multiplier (a #r × 1 vector). Then β̂r is given by the first-order conditions:

∂L/∂β = −2X'y + 2X'Xβ − 2R'λ = 0,

∂L/∂λ = −2(Rβ − r) = 0.

From these, we can obtain:

β̂r = (X'X)⁻¹X'y + (X'X)⁻¹R'λ̂

= β̂ + (X'X)⁻¹R'λ̂,

r = Rβ̂r = Rβ̂ + R(X'X)⁻¹R'λ̂,

where β̂ is the OLSE. Therefore, we can finally get:

β̂r = β̂ + (X'X)⁻¹R'[R(X'X)⁻¹R']⁻¹(r − Rβ̂),

λ̂ = [R(X'X)⁻¹R']⁻¹(r − Rβ̂).
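These closed-form expressions can be checked numerically. Below is a sketch with simulated data and a made-up single restriction (not the note's example); by construction the restricted OLSE must satisfy Rβ̂r = r exactly:

```python
import numpy as np

# Restricted OLSE: beta_r = beta_hat + (X'X)^{-1} R' [R (X'X)^{-1} R']^{-1} (r - R beta_hat),
# checked against the constraint R beta_r = r.
rng = np.random.default_rng(2)
n, k = 80, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([0.3, 0.6, 0.4]) + rng.normal(size=n)

R = np.array([[0.0, 1.0, 1.0]])                 # one restriction: beta_2 + beta_3 = 1
r = np.array([1.0])

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y                    # unrestricted OLSE
M = R @ XtX_inv @ R.T                           # R (X'X)^{-1} R'
lam = np.linalg.solve(M, r - R @ beta_hat)      # Lagrange multiplier
beta_r = beta_hat + XtX_inv @ R.T @ lam         # restricted OLSE
constraint_gap = np.abs(R @ beta_r - r).max()   # should be ~0
```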

Let SSRu be the unrestricted sum of squared residuals and SSRr the restricted sum of squared residuals, that is:

SSRu = (y − Xβ̂)'(y − Xβ̂),

SSRr = (y − Xβ̂r)'(y − Xβ̂r).

Then we get the following proposition:

prop7.3

Suppose that assumptions A1-A6 are satisfied. Under the null hypothesis H0: Rβ = r,

[(SSRr − SSRu)/#r] / [SSRu/(n − k)] ∼ F(#r, n − k).

Proof

Since SSRu/(n − k) = s² is obvious, we only need to show that SSRr − SSRu = (Rβ̂ − r)'[R(X'X)⁻¹R']⁻¹(Rβ̂ − r). Let e := y − Xβ̂ and er := y − Xβ̂r. Then we can write:

er = y − Xβ̂r

= (y − Xβ̂) + X(β̂ − β̂r).


Thus, we can get:

SSRr = er'er

= [(y − Xβ̂) + X(β̂ − β̂r)]'[(y − Xβ̂) + X(β̂ − β̂r)]

= (y − Xβ̂)'(y − Xβ̂) + (y − Xβ̂)'X(β̂ − β̂r) + (β̂ − β̂r)'X'(y − Xβ̂) + (β̂ − β̂r)'X'X(β̂ − β̂r)

= (y − Xβ̂)'(y − Xβ̂) + (β̂ − β̂r)'X'X(β̂ − β̂r),  (X'(y − Xβ̂) = 0).

Since SSRu = (y − Xβ̂)'(y − Xβ̂), we can see that:

SSRr − SSRu = (β̂ − β̂r)'X'X(β̂ − β̂r).

Next we can calculate β̂ − β̂r as follows:

β̂ − β̂r = β̂ − (β̂ + (X'X)⁻¹R'[R(X'X)⁻¹R']⁻¹(r − Rβ̂))

= (X'X)⁻¹R'[R(X'X)⁻¹R']⁻¹(Rβ̂ − r).

From these results, we can finally obtain:

(β̂ − β̂r)'X'X(β̂ − β̂r) = ((X'X)⁻¹R'[R(X'X)⁻¹R']⁻¹(Rβ̂ − r))' X'X (X'X)⁻¹R'[R(X'X)⁻¹R']⁻¹(Rβ̂ − r)

= (Rβ̂ − r)'[R(X'X)⁻¹R']⁻¹ R(X'X)⁻¹R' [R(X'X)⁻¹R']⁻¹(Rβ̂ − r)

= (Rβ̂ − r)'[R(X'X)⁻¹R']⁻¹(Rβ̂ − r). Q.E.D.

The procedure of F-test

Step1 Under the null hypothesis H0: Rβ = r, calculate the F-statistic provided in prop7.2 or prop7.3.

Step2 Set α (usually α = 0.05). Using a statistical table, we find the rejection region from Fα(#r, n − k). This means that:

Prob(F ≥ Fα(#r, n − k)) = α,

where F is the F-statistic calculated in Step1. That is, if the null hypothesis H0 is true, the probability that F stays below Fα(#r, n − k) is 1 − α (95% when α = 0.05). Note that the F-distribution is not symmetric; instead, 1/Fα(#r, n − k) = F1−α(n − k, #r) holds.

Step3 If F > Fα(#r, n − k), then we reject H0 at the level of significance α.
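The identity proved in prop7.3 and the F-statistic of Step1 can be verified numerically. The sketch below uses simulated data and a made-up restriction (not the note's example):

```python
import numpy as np

# Check of prop7.3: SSR_r - SSR_u should equal
# (R b - r)'[R(X'X)^{-1}R']^{-1}(R b - r), and
# F = [(SSR_r - SSR_u)/#r] / [SSR_u/(n - k)].
rng = np.random.default_rng(3)
n, k = 60, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([0.2, 0.7, 0.3]) + rng.normal(size=n)
R = np.array([[0.0, 1.0, 1.0]])                  # one restriction: beta_2 + beta_3 = 1
r = np.array([1.0])
nr = R.shape[0]                                  # #r, the number of restrictions

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                            # unrestricted OLSE
b_r = b + XtX_inv @ R.T @ np.linalg.solve(R @ XtX_inv @ R.T, r - R @ b)

ssr_u = np.sum((y - X @ b) ** 2)                 # SSR_u
ssr_r = np.sum((y - X @ b_r) ** 2)               # SSR_r
quad = (R @ b - r) @ np.linalg.solve(R @ XtX_inv @ R.T, R @ b - r)
gap = abs((ssr_r - ssr_u) - quad)                # should be ~0 by prop7.3
F = ((ssr_r - ssr_u) / nr) / (ssr_u / (n - k))   # the F-statistic of Step1
```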

5 Some examples

Example1

Suppose that we want to estimate the following money demand function:

ln(Mt/Pt) = β0 + β1 ln GDPt + β2 RATEt + ut,  t = 1980Q1 ∼ 2007Q4,

where Mt is the monetary base, Pt is the GDP deflator, GDPt is real GDP, and RATEt is the monthly average of the call rate (O/N). In this case the sample size n is 112 and the number of parameters k is 3. The result is given below (the estimation output itself is omitted in this note).

Now we consider the following hypothesis:

H0: β1 = 0, H1: β1 ≠ 0.

In this case, the distribution of the t-statistic is t(109). Next we calculate t1 as follows:

t1 = Coefficient1 / Standard error1 = 1.61/0.05 = 32.2.

Since t0.05/2(109) is approximately 1.97 and |t1| > 1.97, we can reject H0. Similarly, we consider the following hypothesis:

H0: β2 = 0, H1: β2 ≠ 0.

We can calculate t2 as follows:

t2 = −0.012/0.003 = −4.

Thus we can also reject H0 (|t2| > 1.97).

Example2

In macroeconomics, we often assume the production function is homogeneous of degree 1. The Cobb-Douglas function:

Yt = At Kt^(1−a) Lt^a

satisfies this assumption and is widely used, where Yt is nominal GDP, Kt is the capital stock, Lt is the number of workers, and At is TFP.

Using the F-test, we can test whether the production function is homogeneous of degree 1 or not. First, we have to estimate the following linear regression model:

ln Yt = β1 + β2 ln Kt + β3 ln Lt + ut,  ut ~ iid N(0, σ²). (1)


Homogeneity of degree 1 implies:

β2 + β3 = 1.

Thus, our problem is:

H0: β2 + β3 = 1 vs. H1: not H0.

Under the null hypothesis H0, we can rewrite the model as follows:

ln Yt = β1 + β2 ln Kt + (1 − β2) ln Lt + ut

⟷ ln Yt − ln Lt = β1 + β2(ln Kt − ln Lt) + ut

⟷ ln yt = β1 + β2 ln kt + ut,  (yt := Yt/Lt, kt := Kt/Lt). (2)
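The rewriting in (2) can be illustrated on simulated data (the coefficients below are made up; this is not the Hayashi–Prescott dataset): when β2 + β3 = 1 holds in the data-generating process, regressing ln(Y/L) on ln(K/L) recovers β2.

```python
import numpy as np

# Restricted estimation via the per-worker transformation in (2).
rng = np.random.default_rng(4)
n = 200
lnK = rng.normal(5.0, 0.5, size=n)               # log capital (simulated)
lnL = rng.normal(4.0, 0.5, size=n)               # log labor (simulated)
beta1, beta2 = 0.5, 0.3                          # true beta3 = 1 - beta2 = 0.7
lnY = beta1 + beta2 * lnK + (1 - beta2) * lnL + rng.normal(0.0, 0.05, size=n)

lny = lnY - lnL                                  # ln y_t := ln(Y_t / L_t)
lnk = lnK - lnL                                  # ln k_t := ln(K_t / L_t)
Z = np.column_stack([np.ones(n), lnk])
b1_hat, b2_hat = np.linalg.solve(Z.T @ Z, Z.T @ lny)
```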

I use the extended annual data of Hayashi and Prescott (2002, RED), which covers 1980 to 2009 (named cobb.csv).

Code of R (the script itself is omitted in this note)

The result of the unrestricted case (R output omitted) is:

ln Ŷt = −1.7623 + 0.3998 ln Kt + 1.4754 ln Lt,
        (0.8183)  (0.03395)      (0.2264)

where (·) is the standard error of each coefficient. SSRu is given by SSRu = 0.09887566. Next, the result of the restricted case is:

ln ŷt = 1.383208 + 0.548995 ln kt.
        (0.036317)  (0.008443)

SSRr is given by SSRr = 0.1287382. From these results, we can calculate the F-statistic as follows:

F = [(SSRr − SSRu)/#r] / [SSRu/(n − k)]

= [(0.1287382 − 0.09887566)/1] / [0.09887566/49]

≃ 14.8.

Since F0.05(1, 49) ≃ 4.04 < 14.8, we reject H0 at the level of significance 0.05.
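As a quick arithmetic check of the numbers above (the SSR values are copied from the note; F0.05(1, 49) ≈ 4.04 comes from a statistical table):

```python
# F-statistic for the homogeneity test, with #r = 1 and n - k = 49.
ssr_r, ssr_u = 0.1287382, 0.09887566
F = ((ssr_r - ssr_u) / 1) / (ssr_u / 49)   # approximately 14.8
reject = F > 4.04                          # compare with F_0.05(1, 49) from a table
```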
