
Modern Generalized Gauss-Markov

In document: PDF ECONOMETRICS - Keio (pages 126-130)

Least Squares Regression

Theorem 4.7 Modern Generalized Gauss-Markov

In the linear regression model with i.i.d. sampling, if $\mathbb{E}[\tilde\beta \mid X] = \beta$ then

$$\operatorname{var}[\tilde\beta \mid X] \ge (X'D^{-1}X)^{-1}.$$

The proof of Theorem 4.7 is technically advanced so we leave it to Section 4.26.

The interpretation of Theorem 4.7 is similar to that of Theorem 4.6 under i.i.d. sampling. Theorem 4.7 shows that the GLS covariance matrix $(X'D^{-1}X)^{-1}$ is the best possible among all unbiased estimators.

4.12 Residuals

What are some properties of the residuals $\hat e_i = Y_i - X_i'\hat\beta$ and prediction errors $\tilde e_i = Y_i - X_i'\hat\beta_{(-i)}$ in the context of the linear regression model?

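Recall from (3.24) that we can write the residuals in vector notation as $\hat e = Me$ where $M = I_n - X(X'X)^{-1}X'$ is the orthogonal projection matrix. Using the properties of conditional expectation

$$\mathbb{E}[\hat e \mid X] = \mathbb{E}[Me \mid X] = M\,\mathbb{E}[e \mid X] = 0$$

and

$$\operatorname{var}[\hat e \mid X] = \operatorname{var}[Me \mid X] = M \operatorname{var}[e \mid X]\, M = MDM \qquad (4.20)$$

where $D$ is defined in (4.8).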

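We can simplify this expression under the assumption of conditional homoskedasticity $\mathbb{E}[e^2 \mid X] = \sigma^2$. In this case $D = I_n\sigma^2$ and, since $M$ is idempotent ($MM = M$), (4.20) simplifies to

$$\operatorname{var}[\hat e \mid X] = M\sigma^2. \qquad (4.21)$$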

In particular, for a single observation $i$ we can find the variance of $\hat e_i$ by taking the $i$th diagonal element of (4.21). Since the $i$th diagonal element of $M$ is $1 - h_{ii}$ as defined in (3.40) we obtain

$$\operatorname{var}[\hat e_i \mid X] = \mathbb{E}[\hat e_i^2 \mid X] = (1 - h_{ii})\sigma^2. \qquad (4.22)$$

As this variance is a function of $h_{ii}$ and hence $X_i$, the residuals $\hat e_i$ are heteroskedastic even if the errors $e_i$ are homoskedastic. Notice as well that (4.22) implies $\hat e_i^2$ is a biased estimator of $\sigma^2$.
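The identities (4.21) and (4.22) can be checked numerically. The following is a minimal sketch, assuming a small simulated design matrix and an arbitrary value of $\sigma^2$; the variable names and the data are illustrative and not from the text.

```python
import numpy as np

# Hypothetical design matrix: n = 8 observations, k = 3 regressors
# (an intercept plus two simulated covariates). Illustrative data only.
rng = np.random.default_rng(0)
n, k = 8, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])

# Projection matrix P = X (X'X)^{-1} X' and the matrix M = I_n - P.
P = X @ np.linalg.solve(X.T @ X, X.T)
M = np.eye(n) - P

# Leverage values h_ii are the diagonal elements of P.
h = np.diag(P)

# Assumed homoskedastic error variance, so that D = I_n * sigma^2.
sigma2 = 2.0
D = sigma2 * np.eye(n)

# (4.20) under homoskedasticity: M D M, which should equal M * sigma^2 (4.21).
var_resid = M @ D @ M

# (4.22): the i-th diagonal element should equal (1 - h_ii) * sigma^2.
resid_variances = np.diag(var_resid)

print(np.allclose(var_resid, sigma2 * M))
print(np.allclose(resid_variances, (1.0 - h) * sigma2))
```

Because the leverage values differ across observations, the entries of `resid_variances` differ as well, which is the heteroskedasticity of the residuals described above.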

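Similarly, recall from (3.45) that the prediction errors $\tilde e_i = (1 - h_{ii})^{-1}\hat e_i$ can be written in vector notation as $\tilde e = M^*\hat e$ where $M^*$ is a diagonal matrix with $i$th diagonal element $(1 - h_{ii})^{-1}$. Thus $\tilde e = M^*Me$.

We can calculate that

$$\mathbb{E}[\tilde e \mid X] = M^*M\,\mathbb{E}[e \mid X] = 0$$

and

$$\operatorname{var}[\tilde e \mid X] = M^*M \operatorname{var}[e \mid X]\, M M^* = M^*MDMM^*$$

which simplifies under homoskedasticity to

$$\operatorname{var}[\tilde e \mid X] = M^*MMM^*\sigma^2 = M^*MM^*\sigma^2.$$

The variance of the $i$th prediction error is then

$$\operatorname{var}[\tilde e_i \mid X] = \mathbb{E}[\tilde e_i^2 \mid X] = (1 - h_{ii})^{-1}(1 - h_{ii})(1 - h_{ii})^{-1}\sigma^2 = (1 - h_{ii})^{-1}\sigma^2.$$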

A residual with constant conditional variance can be obtained by rescaling. The standardized residuals are

$$\bar e_i = (1 - h_{ii})^{-1/2}\hat e_i, \qquad (4.23)$$

and in vector notation

$$\bar e = (\bar e_1, \ldots, \bar e_n)' = M^{*1/2}Me. \qquad (4.24)$$

From the above calculations, under homoskedasticity,

$$\operatorname{var}[\bar e \mid X] = M^{*1/2}MM^{*1/2}\sigma^2$$

and

$$\operatorname{var}[\bar e_i \mid X] = \mathbb{E}[\bar e_i^2 \mid X] = \sigma^2$$

and thus these standardized residuals have the same bias and variance as the original errors when the latter are homoskedastic.
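The three kinds of residuals discussed in this section can be computed side by side. This is a minimal sketch on simulated data; the coefficient values, sample size, and variable names are assumptions made for illustration.

```python
import numpy as np

# Simulated data (illustrative only): intercept plus one regressor.
rng = np.random.default_rng(1)
n, k = 10, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 0.5])   # hypothetical true coefficients
e = rng.normal(size=n)        # simulated homoskedastic errors, sigma^2 = 1
Y = X @ beta + e

# Least squares fit, M = I_n - X (X'X)^{-1} X', and leverage values h_ii.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
P = X @ np.linalg.solve(X.T @ X, X.T)
M = np.eye(n) - P
h = np.diag(P)

e_hat = Y - X @ beta_hat            # residuals, equal to M e
e_tilde = e_hat / (1.0 - h)         # prediction errors, (3.45)
e_bar = e_hat / np.sqrt(1.0 - h)    # standardized residuals, (4.23)

# Vector form (4.24): e_bar = M*^{1/2} M e, with M* diagonal.
M_star_half = np.diag((1.0 - h) ** -0.5)
e_bar_vec = M_star_half @ M @ e

# Diagonal of M*^{1/2} M M*^{1/2}: the variance of each standardized
# residual relative to sigma^2 under homoskedasticity.
var_scale = np.diag(M_star_half @ M @ M_star_half)

print(np.allclose(e_bar, e_bar_vec))
print(np.allclose(var_scale, 1.0))
```

The diagonal of $M^{*1/2}MM^{*1/2}$ is a vector of ones, so each standardized residual has conditional variance $\sigma^2$, while the off-diagonal elements are generally nonzero.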

4.13 Estimation of Error Variance

The error variance $\sigma^2 = \mathbb{E}[e^2]$ can be a parameter of interest even in a heteroskedastic regression or a projection model. $\sigma^2$ measures the variation in the “unexplained” part of the regression. Its method of moments estimator (MME) is the sample average of the squared residuals:

$$\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\hat e_i^2.$$

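In the linear regression model we can calculate the mean of $\hat\sigma^2$. From (3.28) and the properties of the trace operator observe that

$$\hat\sigma^2 = \frac{1}{n}e'Me = \frac{1}{n}\operatorname{tr}(e'Me) = \frac{1}{n}\operatorname{tr}(Mee').$$

Then

$$\begin{aligned}
\mathbb{E}[\hat\sigma^2 \mid X] &= \frac{1}{n}\operatorname{tr}\left(\mathbb{E}[Mee' \mid X]\right) \\
&= \frac{1}{n}\operatorname{tr}\left(M\,\mathbb{E}[ee' \mid X]\right) \\
&= \frac{1}{n}\operatorname{tr}(MD) \qquad (4.25) \\
&= \frac{1}{n}\sum_{i=1}^{n}(1 - h_{ii})\sigma_i^2.
\end{aligned}$$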

The final equality holds since the trace is the sum of the diagonal elements of $MD$, and since $D$ is diagonal the diagonal elements of $MD$ are the product of the diagonal elements of $M$ and $D$, which are $1 - h_{ii}$ and $\sigma_i^2$, respectively.

Adding the assumption of conditional homoskedasticity $\mathbb{E}[e^2 \mid X] = \sigma^2$ so that $D = I_n\sigma^2$, (4.25) simplifies to

$$\mathbb{E}[\hat\sigma^2 \mid X] = \frac{1}{n}\operatorname{tr}(M\sigma^2) = \sigma^2\left(\frac{n-k}{n}\right),$$

the final equality by (3.22). This calculation shows that $\hat\sigma^2$ is biased towards zero. The order of the bias depends on $k/n$, the ratio of the number of estimated coefficients to the sample size.

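Another way to see this is to use (4.22). Note that

$$\mathbb{E}[\hat\sigma^2 \mid X] = \frac{1}{n}\sum_{i=1}^{n}\mathbb{E}[\hat e_i^2 \mid X] = \frac{1}{n}\sum_{i=1}^{n}(1 - h_{ii})\sigma^2 = \left(\frac{n-k}{n}\right)\sigma^2,$$

the last equality using Theorem 3.6.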
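The conditional mean of $\hat\sigma^2$ can be evaluated exactly for a given design matrix, without simulation. The sketch below assumes a simulated $X$ and hypothetical variance values; it compares the trace form (4.25) with the sum over $(1 - h_{ii})\sigma_i^2$ and then checks the homoskedastic shrinkage factor $(n-k)/n$.

```python
import numpy as np

# Hypothetical design: n = 12 observations, k = 4 regressors. Illustrative only.
rng = np.random.default_rng(2)
n, k = 12, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])

P = X @ np.linalg.solve(X.T @ X, X.T)
M = np.eye(n) - P
h = np.diag(P)

# Heteroskedastic case: assumed conditional variances sigma_i^2 (made up here).
sigma2_i = 0.5 + X[:, 1] ** 2
D = np.diag(sigma2_i)
mean_het_trace = np.trace(M @ D) / n           # (1/n) tr(M D), as in (4.25)
mean_het_sum = np.mean((1.0 - h) * sigma2_i)   # (1/n) sum (1 - h_ii) sigma_i^2

# Homoskedastic case: D = I_n * sigma^2 with an assumed sigma^2.
sigma2 = 1.5
mean_hom = np.trace(M * sigma2) / n            # equals sigma^2 (n - k) / n

print(mean_het_trace, mean_het_sum)
print(mean_hom, sigma2 * (n - k) / n)
```

With $n = 12$ and $k = 4$ the homoskedastic mean is two thirds of $\sigma^2$, which shows how large the downward bias can be when $k/n$ is not small.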

Since the bias takes a scale form, a classic method to obtain an unbiased estimator is rescaling. Define

$$s^2 = \frac{1}{n-k}\sum_{i=1}^{n}\hat e_i^2. \qquad (4.26)$$

By the above calculation $\mathbb{E}[s^2 \mid X] = \sigma^2$ and $\mathbb{E}[s^2] = \sigma^2$. Hence the estimator $s^2$ is unbiased for $\sigma^2$. Consequently, $s^2$ is known as the “bias-corrected estimator” for $\sigma^2$ and in empirical practice $s^2$ is the most widely used estimator for $\sigma^2$.

Interestingly, this is not the only method to construct an unbiased estimator for $\sigma^2$. An estimator constructed with the standardized residuals $\bar e_i$ from (4.23) is

$$\bar\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\bar e_i^2 = \frac{1}{n}\sum_{i=1}^{n}(1 - h_{ii})^{-1}\hat e_i^2.$$

You can show (see Exercise 4.9) that

$$\mathbb{E}[\bar\sigma^2 \mid X] = \sigma^2 \qquad (4.27)$$

and thus $\bar\sigma^2$ is unbiased for $\sigma^2$ (in the homoskedastic linear regression model).

When $k/n$ is small the estimators $\hat\sigma^2$, $s^2$ and $\bar\sigma^2$ are likely to be similar to one another. However, if $k/n$ is large then $s^2$ and $\bar\sigma^2$ are generally preferred to $\hat\sigma^2$. Consequently it is best to use one of the bias-corrected variance estimators in applications.
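The three estimators can be computed from the same residual vector. The following is a minimal sketch; the function name `variance_estimators` and the simulated data are hypothetical and serve only to illustrate the formulas for $\hat\sigma^2$, $s^2$ in (4.26), and $\bar\sigma^2$.

```python
import numpy as np

def variance_estimators(Y, X):
    """Return (sigma2_hat, s2, sigma2_bar) for a least squares fit of Y on X."""
    n, k = X.shape
    XtX_inv_Xt = np.linalg.solve(X.T @ X, X.T)   # (X'X)^{-1} X', shape (k, n)
    e_hat = Y - X @ (XtX_inv_Xt @ Y)             # least squares residuals
    # Leverage h_ii = X_i' (X'X)^{-1} X_i, computed row by row.
    h = np.einsum("ij,ji->i", X, XtX_inv_Xt)
    sigma2_hat = np.sum(e_hat**2) / n            # method of moments estimator
    s2 = np.sum(e_hat**2) / (n - k)              # bias-corrected estimator (4.26)
    sigma2_bar = np.mean(e_hat**2 / (1.0 - h))   # from standardized residuals
    return sigma2_hat, s2, sigma2_bar

# Illustrative data with a fairly large k/n (k = 5, n = 15).
rng = np.random.default_rng(3)
n, k = 15, 5
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
Y = X @ np.ones(k) + rng.normal(size=n)

sigma2_hat, s2, sigma2_bar = variance_estimators(Y, X)
print(sigma2_hat, s2, sigma2_bar)
```

By construction $s^2 = \hat\sigma^2\, n/(n-k)$, and $\bar\sigma^2 \ge \hat\sigma^2$ because each squared residual is divided by $1 - h_{ii} \le 1$, so both corrections move the estimate upward.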

4.14 Mean-Square Forecast Error

One use of an estimated regression is to predict out-of-sample. Consider an out-of-sample realization $(Y_{n+1}, X_{n+1})$ where $X_{n+1}$ is observed but not $Y_{n+1}$. Given the coefficient estimator $\hat\beta$, the standard point estimator of $\mathbb{E}[Y_{n+1} \mid X_{n+1}] = X_{n+1}'\beta$ is $\tilde Y_{n+1} = X_{n+1}'\hat\beta$. The forecast error is the difference between the actual value $Y_{n+1}$ and the point forecast $\tilde Y_{n+1}$, that is, $\tilde e_{n+1} = Y_{n+1} - \tilde Y_{n+1}$. The mean-squared forecast error (MSFE) is its expected squared value

$$\mathrm{MSFE}_n = \mathbb{E}[\tilde e_{n+1}^2].$$

In the linear regression model $\tilde e_{n+1} = e_{n+1} - X_{n+1}'(\hat\beta - \beta)$ so

$$\mathrm{MSFE}_n = \mathbb{E}[e_{n+1}^2] - 2\,\mathbb{E}\left[e_{n+1}X_{n+1}'(\hat\beta - \beta)\right] + \mathbb{E}\left[X_{n+1}'(\hat\beta - \beta)(\hat\beta - \beta)'X_{n+1}\right]. \qquad (4.28)$$

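The first term in (4.28) is $\sigma^2$. The second term in (4.28) is zero since $e_{n+1}X_{n+1}'$ is independent of $\hat\beta - \beta$ and both are mean zero. Using the properties of the trace operator the third term in (4.28) is

$$\begin{aligned}
&\operatorname{tr}\left(\mathbb{E}[X_{n+1}X_{n+1}']\,\mathbb{E}\left[(\hat\beta - \beta)(\hat\beta - \beta)'\right]\right) \\
&\quad= \operatorname{tr}\left(\mathbb{E}[X_{n+1}X_{n+1}']\,\mathbb{E}\left[\mathbb{E}\left[(\hat\beta - \beta)(\hat\beta - \beta)' \mid X\right]\right]\right) \\
&\quad= \operatorname{tr}\left(\mathbb{E}[X_{n+1}X_{n+1}']\,\mathbb{E}[V_{\hat\beta}]\right) \\
&\quad= \mathbb{E}\left[\operatorname{tr}\left((X_{n+1}X_{n+1}')V_{\hat\beta}\right)\right] \\
&\quad= \mathbb{E}\left[X_{n+1}'V_{\hat\beta}X_{n+1}\right] \qquad (4.29)
\end{aligned}$$

where we use the fact that $X_{n+1}$ is independent of $\hat\beta$, the definition $V_{\hat\beta} = \mathbb{E}\left[(\hat\beta - \beta)(\hat\beta - \beta)' \mid X\right]$, and the fact that $X_{n+1}$ is independent of $V_{\hat\beta}$. Thus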

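$$\mathrm{MSFE}_n = \sigma^2 + \mathbb{E}\left[X_{n+1}'V_{\hat\beta}X_{n+1}\right].$$

Under conditional homoskedasticity this simplifies to

$$\mathrm{MSFE}_n = \sigma^2\left(1 + \mathbb{E}\left[X_{n+1}'(X'X)^{-1}X_{n+1}\right]\right).$$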
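As a worked special case (an illustration that is not taken from the text), consider the intercept-only model in which $X_i = 1$ for every observation. Then $X'X = n$ and $X_{n+1} = 1$, so the homoskedastic formula reduces to a simple expression in $n$:

```latex
\mathrm{MSFE}_n
  = \sigma^2\left(1 + \mathbb{E}\left[1 \cdot n^{-1} \cdot 1\right]\right)
  = \sigma^2\left(1 + \frac{1}{n}\right),
\qquad \text{for example } n = 10:\quad \mathrm{MSFE}_{10} = 1.1\,\sigma^2 .
```

Here the forecast is the sample mean of $Y$, and the extra $\sigma^2/n$ is the variance of that sample mean, which is added to the variance $\sigma^2$ of the new error.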

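A simple estimator for the MSFE is obtained by averaging the squared prediction errors (3.46)

$$\tilde\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\tilde e_i^2$$

where $\tilde e_i = Y_i - X_i'\hat\beta_{(-i)} = \hat e_i(1 - h_{ii})^{-1}$. Indeed, we can calculate that

$$\begin{aligned}
\mathbb{E}[\tilde\sigma^2] = \mathbb{E}[\tilde e_i^2] &= \mathbb{E}\left[\left(e_i - X_i'(\hat\beta_{(-i)} - \beta)\right)^2\right] \\
&= \sigma^2 + \mathbb{E}\left[X_i'(\hat\beta_{(-i)} - \beta)(\hat\beta_{(-i)} - \beta)'X_i\right].
\end{aligned}$$

By a similar calculation as in (4.29) we find

$$\mathbb{E}[\tilde\sigma^2] = \sigma^2 + \mathbb{E}\left[X_i'V_{\hat\beta_{(-i)}}X_i\right] = \mathrm{MSFE}_{n-1}.$$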

This is the MSFE based on a sample of size $n-1$ rather than size $n$. The difference arises because the in-sample prediction errors $\tilde e_i$ for $i \le n$ are calculated using an effective sample size of $n-1$, while the out-of-sample prediction error $\tilde e_{n+1}$ is calculated from a sample with the full $n$ observations. Unless $n$ is very small we should expect $\mathrm{MSFE}_{n-1}$ (the MSFE based on $n-1$ observations) to be close to $\mathrm{MSFE}_n$ (the MSFE based on $n$ observations). Thus $\tilde\sigma^2$ is a reasonable estimator for $\mathrm{MSFE}_n$.
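The estimator $\tilde\sigma^2$ can be computed either from the leverage identity $\tilde e_i = \hat e_i(1 - h_{ii})^{-1}$ or by refitting the regression $n$ times with one observation left out. The sketch below does both on simulated data; the function names `loo_msfe` and `loo_msfe_bruteforce` and the data are hypothetical.

```python
import numpy as np

def loo_msfe(Y, X):
    """Average squared prediction error using e_tilde_i = e_hat_i / (1 - h_ii)."""
    XtX_inv_Xt = np.linalg.solve(X.T @ X, X.T)   # (X'X)^{-1} X'
    e_hat = Y - X @ (XtX_inv_Xt @ Y)             # full-sample residuals
    h = np.einsum("ij,ji->i", X, XtX_inv_Xt)     # leverage values h_ii
    e_tilde = e_hat / (1.0 - h)
    return np.mean(e_tilde**2)

def loo_msfe_bruteforce(Y, X):
    """Same quantity, refitting the regression with observation i removed."""
    n = X.shape[0]
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        beta_minus_i = np.linalg.solve(X[keep].T @ X[keep], X[keep].T @ Y[keep])
        errors[i] = Y[i] - X[i] @ beta_minus_i
    return np.mean(errors**2)

# Illustrative simulated data: n = 20 observations, k = 3 regressors.
rng = np.random.default_rng(4)
n, k = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
Y = X @ np.array([1.0, -0.5, 2.0]) + rng.normal(size=n)

print(loo_msfe(Y, X), loo_msfe_bruteforce(Y, X))
```

The two functions agree up to floating-point error. The leverage version needs a single fit, which is why it is the usual choice when $n$ is large.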
