Modern Generalized Gauss-Markov - Least Squares Regression

Theorem 4.7 Modern Generalized Gauss-Markov

In the linear regression model with i.i.d. sampling, if $E[\tilde\beta \mid X] = \beta$ then

$$\operatorname{var}[\tilde\beta \mid X] \ge (X'D^{-1}X)^{-1}.$$

The proof of Theorem 4.7 is technically advanced so we leave it to Section 4.26.

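The interpretation of Theorem 4.7 is similar to that of Theorem 4.6 under i.i.d. sampling. Theorem 4.7 shows that $(X'D^{-1}X)^{-1}$ is a lower bound on the conditional covariance matrix of every unbiased estimator, so the GLS covariance matrix is the best possible among all unbiased estimators.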
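As a hedged numerical illustration of the bound, the sketch below takes the special case of a single regressor with no intercept ($k = 1$) and a known diagonal $D$; both are simplifying assumptions made here so that the matrices reduce to scalars. It compares the GLS variance $(X'D^{-1}X)^{-1}$ with the conditional variance of OLS under heteroskedasticity, $(X'X)^{-1}X'DX(X'X)^{-1}$. The function names and data are illustrative only.

```python
def gls_variance(x, d):
    """(X' D^{-1} X)^{-1} for a single regressor x and error variances d (the diagonal of D)."""
    return 1.0 / sum(xi * xi / di for xi, di in zip(x, d))


def ols_variance(x, d):
    """(X'X)^{-1} X' D X (X'X)^{-1}: conditional variance of OLS for the same design."""
    sxx = sum(xi * xi for xi in x)
    return sum(xi * xi * di for xi, di in zip(x, d)) / sxx ** 2


x = [1.0, 2.0, 3.0, 4.0]    # hypothetical regressor values
d = [0.5, 1.0, 2.0, 8.0]    # hypothetical heteroskedastic error variances

v_gls = gls_variance(x, d)
v_ols = ols_variance(x, d)
print("GLS variance:", v_gls)
print("OLS variance:", v_ols)
```

In this example OLS is unbiased but its variance exceeds the bound, while the two coincide when every element of `d` is the same.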

4.12 Residuals

What are some properties of the residuals $\hat e_i = Y_i - X_i'\hat\beta$ and prediction errors $\tilde e_i = Y_i - X_i'\hat\beta_{(-i)}$ in the context of the linear regression model?

Recall from (3.24) that we can write the residuals in vector notation as $\hat e = Me$ where $M = I_n - X(X'X)^{-1}X'$ is the orthogonal projection matrix. Using the properties of conditional expectation

$$E[\hat e \mid X] = E[Me \mid X] = M\,E[e \mid X] = 0$$

and

$$\operatorname{var}[\hat e \mid X] = \operatorname{var}[Me \mid X] = M \operatorname{var}[e \mid X]\, M = MDM \tag{4.20}$$

where $D$ is defined in (4.8).

We can simplify this expression under the assumption of conditional homoskedasticity $E[e^2 \mid X] = \sigma^2$. In this case (4.20) simplifies to

$$\operatorname{var}[\hat e \mid X] = M\sigma^2. \tag{4.21}$$

In particular, for a single observation $i$ we can find the variance of $\hat e_i$ by taking the $i$th diagonal element of (4.21). Since the $i$th diagonal element of $M$ is $1 - h_{ii}$ as defined in (3.40) we obtain

$$\operatorname{var}[\hat e_i \mid X] = E[\hat e_i^2 \mid X] = (1 - h_{ii})\sigma^2. \tag{4.22}$$

As this variance is a function of $h_{ii}$ and hence $X_i$, the residuals $\hat e_i$ are heteroskedastic even if the errors $e_i$ are homoskedastic. Notice as well that (4.22) implies $\hat e_i^2$ is a biased estimator of $\sigma^2$.
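The following is a minimal sketch of these calculations, assuming a design with an intercept and one regressor so that the hat matrix has a simple closed form and no linear-algebra library is needed. The helper names (`annihilator`, `matmul`) and the data are hypothetical. It checks that $M$ is idempotent, which is why $MDM$ collapses to $M\sigma^2$ under homoskedasticity, and lists the residual variances $(1 - h_{ii})\sigma^2$.

```python
def annihilator(x):
    """M = I_n - X (X'X)^{-1} X' for the design X = [1, x] (intercept plus one regressor)."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    # Closed form of the hat matrix for this design:
    # P_ij = 1/n + (x_i - xbar)(x_j - xbar) / Sxx
    return [
        [(1.0 if i == j else 0.0) - (1.0 / n + (x[i] - xbar) * (x[j] - xbar) / sxx)
         for j in range(n)]
        for i in range(n)
    ]


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


x = [0.0, 1.0, 2.0, 3.0, 10.0]   # hypothetical regressor values; the last is far from the rest
sigma2 = 4.0                      # hypothetical homoskedastic error variance
n = len(x)

M = annihilator(x)
MM = matmul(M, M)                 # equals M up to rounding, so sigma2 * M M = sigma2 * M

leverage = [1.0 - M[i][i] for i in range(n)]        # h_ii
resid_var = [sigma2 * M[i][i] for i in range(n)]    # (1 - h_ii) * sigma2, as in (4.22)

for h, v in zip(leverage, resid_var):
    print(f"h_ii = {h:.3f}   var[e_hat_i | X] = {v:.3f}")
```

The observation with the most extreme regressor value has the highest leverage and therefore the smallest residual variance, even though every error has the same variance.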

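Similarly, recall from (3.45) that the prediction errors $\tilde e_i = (1 - h_{ii})^{-1}\hat e_i$ can be written in vector notation as $\tilde e = M^*\hat e$ where $M^*$ is a diagonal matrix with $i$th diagonal element $(1 - h_{ii})^{-1}$. Thus $\tilde e = M^*Me$. We can calculate that

$$E[\tilde e \mid X] = M^*M\,E[e \mid X] = 0$$

and

$$\operatorname{var}[\tilde e \mid X] = M^*M \operatorname{var}[e \mid X]\, M M^* = M^*MDMM^*$$

which simplifies under homoskedasticity to

$$\operatorname{var}[\tilde e \mid X] = M^*MMM^*\sigma^2 = M^*MM^*\sigma^2.$$

The variance of the $i$th prediction error is then

$$\operatorname{var}[\tilde e_i \mid X] = E[\tilde e_i^2 \mid X] = (1 - h_{ii})^{-1}(1 - h_{ii})(1 - h_{ii})^{-1}\sigma^2 = (1 - h_{ii})^{-1}\sigma^2.$$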
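The leverage shortcut $\tilde e_i = (1 - h_{ii})^{-1}\hat e_i$ can be checked against a brute-force leave-one-out refit. The sketch below does this for a hypothetical data set, again assuming an intercept-plus-one-regressor design so that OLS and the leverage values have closed forms. The helper names are illustrative.

```python
def ols_simple(x, y):
    """OLS intercept and slope for the regression of y on [1, x]."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - slope * xbar, slope


def leverage(x):
    """h_ii = 1/n + (x_i - xbar)^2 / Sxx for the design [1, x]."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]


x = [0.0, 1.0, 2.0, 3.0, 10.0]   # hypothetical data
y = [1.0, 2.5, 2.0, 4.5, 9.0]

# Full-sample residuals, then the shortcut e_hat_i / (1 - h_ii).
b0, b1 = ols_simple(x, y)
resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
shortcut = [e / (1.0 - h) for e, h in zip(resid, leverage(x))]

# Brute force: drop observation i, refit, and predict the dropped point.
brute_force = []
for i in range(len(x)):
    a0, a1 = ols_simple(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
    brute_force.append(y[i] - a0 - a1 * x[i])

for s, b in zip(shortcut, brute_force):
    print(f"shortcut = {s: .6f}   refit = {b: .6f}")
```

The shortcut needs a single regression, while the brute-force loop needs one regression per observation.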

A residual with constant conditional variance can be obtained by rescaling. The standardized residuals are

$$\bar e_i = (1 - h_{ii})^{-1/2}\hat e_i, \tag{4.23}$$

and in vector notation

$$\bar e = (\bar e_1, \ldots, \bar e_n)' = M^{*1/2}Me. \tag{4.24}$$

From the above calculations, under homoskedasticity,

$$\operatorname{var}[\bar e \mid X] = M^{*1/2}MM^{*1/2}\sigma^2$$

and

$$\operatorname{var}[\bar e_i \mid X] = E[\bar e_i^2 \mid X] = \sigma^2$$

and thus these standardized residuals have the same bias and variance as the original errors when the latter are homoskedastic.
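The scalar variance can be read off the diagonal of the matrix expression. As a short sketch, using only the facts that $M^*$ is diagonal with entries $(1 - h_{ii})^{-1}$ and that the $i$th diagonal element of $M$ is $1 - h_{ii}$:

$$\left[M^{*1/2}MM^{*1/2}\right]_{ii}\sigma^2 = (1 - h_{ii})^{-1/2}(1 - h_{ii})(1 - h_{ii})^{-1/2}\sigma^2 = \sigma^2.$$

The off-diagonal elements, $(1 - h_{ii})^{-1/2}M_{ij}(1 - h_{jj})^{-1/2}\sigma^2$, are generally nonzero, so the standardized residuals share a common variance but remain correlated across observations.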

4.13 Estimation of Error Variance

The error variance $\sigma^2 = E[e^2]$ can be a parameter of interest even in a heteroskedastic regression or a projection model. $\sigma^2$ measures the variation in the “unexplained” part of the regression. Its method of moments estimator (MME) is the sample average of the squared residuals:

$$\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n \hat e_i^2.$$

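In the linear regression model we can calculate the mean of $\hat\sigma^2$. From (3.28) and the properties of the trace operator observe that

$$\hat\sigma^2 = \frac{1}{n}e'Me = \frac{1}{n}\operatorname{tr}(e'Me) = \frac{1}{n}\operatorname{tr}(Mee').$$

Then

$$\begin{aligned}
E[\hat\sigma^2 \mid X] &= \frac{1}{n}\operatorname{tr}\left(E[Mee' \mid X]\right) \\
&= \frac{1}{n}\operatorname{tr}\left(M\,E[ee' \mid X]\right) \\
&= \frac{1}{n}\operatorname{tr}(MD) \\
&= \frac{1}{n}\sum_{i=1}^n (1 - h_{ii})\sigma_i^2.
\end{aligned} \tag{4.25}$$

The final equality holds since the trace is the sum of the diagonal elements of $MD$, and since $D$ is diagonal the diagonal elements of $MD$ are the products of the diagonal elements of $M$ and $D$, which are $1 - h_{ii}$ and $\sigma_i^2$, respectively.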
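A small numerical sketch of this trace calculation follows. It assumes an intercept-plus-one-regressor design, so $M$ has a closed form, and a hypothetical set of conditional variances $\sigma_i^2$. All names and numbers are illustrative.

```python
x = [0.0, 1.0, 2.0, 3.0, 10.0]          # hypothetical regressor values
sigma2_i = [0.5, 1.0, 2.0, 4.0, 8.0]    # hypothetical conditional variances (diagonal of D)
n = len(x)

xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

# M = I_n - P with P_ij = 1/n + (x_i - xbar)(x_j - xbar) / Sxx for the design [1, x].
M = [
    [(1.0 if i == j else 0.0) - (1.0 / n + (x[i] - xbar) * (x[j] - xbar) / sxx)
     for j in range(n)]
    for i in range(n)
]

# Right-multiplying by the diagonal matrix D scales column j of M by sigma2_i[j].
MD = [[M[i][j] * sigma2_i[j] for j in range(n)] for i in range(n)]

trace_form = sum(MD[i][i] for i in range(n)) / n                    # (1/n) tr(M D)
sum_form = sum(M[i][i] * sigma2_i[i] for i in range(n)) / n         # (1/n) sum (1 - h_ii) sigma_i^2
average_variance = sum(sigma2_i) / n                                # (1/n) sum sigma_i^2

print(trace_form, sum_form, average_variance)
```

Because every $1 - h_{ii}$ is below one, the conditional mean of $\hat\sigma^2$ falls short of the average error variance in this example.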

Adding the assumption of conditional homoskedasticity $E[e^2 \mid X] = \sigma^2$ so that $D = I_n\sigma^2$, then (4.25) simplifies to

$$E[\hat\sigma^2 \mid X] = \frac{1}{n}\operatorname{tr}(M\sigma^2) = \sigma^2\left(\frac{n-k}{n}\right),$$

the final equality by (3.22). This calculation shows that $\hat\sigma^2$ is biased towards zero. The order of the bias depends on $k/n$, the ratio of the number of estimated coefficients to the sample size.
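The step attributed to (3.22) rests on the trace identity $\operatorname{tr}(M) = n - k$. As a brief sketch of why it holds, assuming $X$ is $n \times k$ with full column rank and using $\operatorname{tr}(AB) = \operatorname{tr}(BA)$:

$$\operatorname{tr}(M) = \operatorname{tr}(I_n) - \operatorname{tr}\left(X(X'X)^{-1}X'\right) = n - \operatorname{tr}\left((X'X)^{-1}X'X\right) = n - \operatorname{tr}(I_k) = n - k.$$

For instance, with $n = 10$ and $k = 2$ the conditional mean of $\hat\sigma^2$ is $0.8\,\sigma^2$.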

Another way to see this is to use (4.22). Note that

$$E[\hat\sigma^2 \mid X] = \frac{1}{n}\sum_{i=1}^n E[\hat e_i^2 \mid X] = \frac{1}{n}\sum_{i=1}^n (1 - h_{ii})\sigma^2 = \left(\frac{n-k}{n}\right)\sigma^2,$$

the last equality using Theorem 3.6.

Since the bias takes a scale form, a classic method to obtain an unbiased estimator is by rescaling. Define

$$s^2 = \frac{1}{n-k}\sum_{i=1}^n \hat e_i^2. \tag{4.26}$$

By the above calculation $E[s^2 \mid X] = \sigma^2$ and $E[s^2] = \sigma^2$. Hence the estimator $s^2$ is unbiased for $\sigma^2$. Consequently, $s^2$ is known as the “bias-corrected estimator” for $\sigma^2$ and in empirical practice $s^2$ is the most widely used estimator for $\sigma^2$.
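The bias of $\hat\sigma^2$ and the unbiasedness of $s^2$ can be illustrated with a small Monte Carlo experiment. The sketch below is only an illustration under stated assumptions: a fixed design with $n = 10$ and $k = 2$ (intercept and one regressor), normal homoskedastic errors with $\sigma^2 = 1$, hypothetical coefficients, and a fixed seed. None of these settings come from the text.

```python
import random


def residual_sum_of_squares(x, y):
    """Sum of squared OLS residuals from regressing y on [1, x]."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))


def simulate(reps=20000, sigma=1.0, seed=0):
    """Average sigma_hat^2 and s^2 over repeated samples with the design held fixed."""
    rng = random.Random(seed)
    x = [float(i) for i in range(10)]    # fixed design, n = 10
    n, k = len(x), 2
    total_hat = total_s2 = 0.0
    for _ in range(reps):
        # Hypothetical model: Y = 1 + 0.5 x + e with e ~ N(0, sigma^2).
        y = [1.0 + 0.5 * xi + rng.gauss(0.0, sigma) for xi in x]
        rss = residual_sum_of_squares(x, y)
        total_hat += rss / n             # method of moments estimator
        total_s2 += rss / (n - k)        # bias-corrected estimator (4.26)
    return total_hat / reps, total_s2 / reps


mean_sigma_hat2, mean_s2 = simulate()
print("average sigma_hat^2:", mean_sigma_hat2)
print("average s^2:        ", mean_s2)
```

With these settings the average of $\hat\sigma^2$ should sit near $(n-k)/n = 0.8$ and the average of $s^2$ near $1$, up to simulation error.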

Interestingly, this is not the only method to construct an unbiased estimator for $\sigma^2$. An estimator constructed with the standardized residuals $\bar e_i$ from (4.23) is

$$\bar\sigma^2 = \frac{1}{n}\sum_{i=1}^n \bar e_i^2 = \frac{1}{n}\sum_{i=1}^n (1 - h_{ii})^{-1}\hat e_i^2.$$

You can show (see Exercise 4.9) that

$$E[\bar\sigma^2 \mid X] = \sigma^2 \tag{4.27}$$

and thus $\bar\sigma^2$ is unbiased for $\sigma^2$ (in the homoskedastic linear regression model).

When $k/n$ is small the estimators $\hat\sigma^2$, $s^2$ and $\bar\sigma^2$ are likely to be similar to one another. However, if $k/n$ is large then $s^2$ and $\bar\sigma^2$ are generally preferred to $\hat\sigma^2$. Consequently it is best to use one of the bias-corrected variance estimators in applications.
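To make the comparison concrete, the sketch below computes all three estimators on one hypothetical data set. It assumes an intercept-plus-one-regressor design ($k = 2$) so the leverage values have a closed form. The function name `variance_estimators` is illustrative.

```python
def variance_estimators(x, y):
    """Return (sigma_hat2, s2, sigma_bar2) for the regression of y on [1, x]."""
    n, k = len(x), 2
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar

    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    h = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]    # leverage values h_ii

    ssr = sum(e * e for e in resid)
    sigma_hat2 = ssr / n                                   # method of moments estimator
    s2 = ssr / (n - k)                                     # bias-corrected estimator (4.26)
    sigma_bar2 = sum(e * e / (1.0 - hi) for e, hi in zip(resid, h)) / n   # standardized residuals
    return sigma_hat2, s2, sigma_bar2


x = [0.0, 1.0, 2.0, 3.0, 10.0]   # hypothetical data with k/n = 0.4
y = [1.0, 2.5, 2.0, 4.5, 9.0]

sigma_hat2, s2, sigma_bar2 = variance_estimators(x, y)
print(f"sigma_hat^2 = {sigma_hat2:.4f}   s^2 = {s2:.4f}   sigma_bar^2 = {sigma_bar2:.4f}")
```

Here $k/n$ is large, so the gap between $\hat\sigma^2$ and the two corrected estimators is visible; with many more observations per coefficient the three values would move closer together.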

4.14 Mean-Square Forecast Error

One use of an estimated regression is to predict out-of-sample. Consider an out-of-sample realization $(Y_{n+1}, X_{n+1})$ where $X_{n+1}$ is observed but not $Y_{n+1}$. Given the coefficient estimator $\hat\beta$ the standard point estimator of $E[Y_{n+1} \mid X_{n+1}] = X_{n+1}'\beta$ is $\tilde Y_{n+1} = X_{n+1}'\hat\beta$. The forecast error is the difference between the actual value $Y_{n+1}$ and the point forecast $\tilde Y_{n+1}$, that is, $\tilde e_{n+1} = Y_{n+1} - \tilde Y_{n+1}$. The mean-squared forecast error (MSFE) is its expected squared value

$$\mathrm{MSFE}_n = E[\tilde e_{n+1}^2].$$

In the linear regression model $\tilde e_{n+1} = e_{n+1} - X_{n+1}'(\hat\beta - \beta)$ so

$$\mathrm{MSFE}_n = E[e_{n+1}^2] - 2E\left[e_{n+1}X_{n+1}'(\hat\beta - \beta)\right] + E\left[X_{n+1}'(\hat\beta - \beta)(\hat\beta - \beta)'X_{n+1}\right]. \tag{4.28}$$

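The first term in (4.28) is $\sigma^2$. The second term in (4.28) is zero since $e_{n+1}X_{n+1}'$ is independent of $\hat\beta - \beta$ and both are mean zero. Using the properties of the trace operator the third term in (4.28) is

$$\begin{aligned}
\operatorname{tr}\left(E[X_{n+1}X_{n+1}']\,E\left[(\hat\beta - \beta)(\hat\beta - \beta)'\right]\right)
&= \operatorname{tr}\left(E[X_{n+1}X_{n+1}']\,E\left[E\left[(\hat\beta - \beta)(\hat\beta - \beta)' \mid X\right]\right]\right) \\
&= \operatorname{tr}\left(E[X_{n+1}X_{n+1}']\,E[V_{\hat\beta}]\right) \\
&= E\left[\operatorname{tr}\left((X_{n+1}X_{n+1}')V_{\hat\beta}\right)\right] \\
&= E\left[X_{n+1}'V_{\hat\beta}X_{n+1}\right]
\end{aligned} \tag{4.29}$$

where we use the fact that $X_{n+1}$ is independent of $\hat\beta$, the definition $V_{\hat\beta} = E[(\hat\beta - \beta)(\hat\beta - \beta)' \mid X]$, and the fact that $X_{n+1}$ is independent of $V_{\hat\beta}$. Thus

$$\mathrm{MSFE}_n = \sigma^2 + E\left[X_{n+1}'V_{\hat\beta}X_{n+1}\right].$$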

Under conditional homoskedasticity this simplifies to

$$\mathrm{MSFE}_n = \sigma^2\left(1 + E\left[X_{n+1}'(X'X)^{-1}X_{n+1}\right]\right).$$
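The term inside the expectation can be evaluated for a given design and forecast point. The sketch below computes $\sigma^2\left(1 + X_{n+1}'(X'X)^{-1}X_{n+1}\right)$ conditionally on the observed regressors, which is the integrand of the homoskedastic MSFE rather than the MSFE itself. It assumes an intercept-plus-one-regressor design, for which the quadratic form equals $1/n + (x_{n+1} - \bar x)^2 / S_{xx}$. The function name and data are hypothetical.

```python
def conditional_forecast_variance(x, x_new, sigma2):
    """sigma2 * (1 + x_new' (X'X)^{-1} x_new) for the design X = [1, x], given X and x_new."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    # Quadratic form for the regressor vector (1, x_new)' in closed form.
    quad = 1.0 / n + (x_new - xbar) ** 2 / sxx
    return sigma2 * (1.0 + quad)


x = [0.0, 1.0, 2.0, 3.0, 4.0]    # hypothetical in-sample regressor values
sigma2 = 1.0                      # hypothetical error variance

at_mean = conditional_forecast_variance(x, 2.0, sigma2)     # forecast at the sample mean
far_away = conditional_forecast_variance(x, 7.0, sigma2)    # forecast far from the data
print(at_mean, far_away)
```

The forecast variance is smallest at the sample mean of the regressor, where it exceeds $\sigma^2$ by the factor $1 + 1/n$, and grows as the forecast point moves away from the data.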

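A simple estimator for the MSFE is obtained by averaging the squared prediction errors (3.46)

$$\tilde\sigma^2 = \frac{1}{n}\sum_{i=1}^n \tilde e_i^2$$

where $\tilde e_i = Y_i - X_i'\hat\beta_{(-i)} = \hat e_i(1 - h_{ii})^{-1}$. Indeed, we can calculate that

$$\begin{aligned}
E[\tilde\sigma^2] &= E[\tilde e_i^2] \\
&= E\left[\left(e_i - X_i'(\hat\beta_{(-i)} - \beta)\right)^2\right] \\
&= \sigma^2 + E\left[X_i'(\hat\beta_{(-i)} - \beta)(\hat\beta_{(-i)} - \beta)'X_i\right].
\end{aligned}$$

By a similar calculation as in (4.29) we find

$$E[\tilde\sigma^2] = \sigma^2 + E\left[X_i'V_{\hat\beta_{(-i)}}X_i\right] = \mathrm{MSFE}_{n-1}.$$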

This is the MSFE based on a sample of size $n-1$ rather than size $n$. The difference arises because the in-sample prediction errors $\tilde e_i$ for $i \le n$ are calculated using an effective sample size of $n-1$, while the out-of-sample prediction error $\tilde e_{n+1}$ is calculated from a sample with the full $n$ observations. Unless $n$ is very small we should expect $\mathrm{MSFE}_{n-1}$ (the MSFE based on $n-1$ observations) to be close to $\mathrm{MSFE}_n$ (the MSFE based on $n$ observations). Thus $\tilde\sigma^2$ is a reasonable estimator for $\mathrm{MSFE}_n$.
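A minimal sketch of the estimator $\tilde\sigma^2$ follows. It uses the leverage formula $\tilde e_i = \hat e_i(1 - h_{ii})^{-1}$ for an intercept-plus-one-regressor design, which is an assumption made here to keep the example free of matrix inversion. The function name `msfe_estimate` and the data are hypothetical.

```python
def msfe_estimate(x, y):
    """Return (sigma_tilde2, sigma_hat2) for the regression of y on [1, x].

    sigma_tilde2 averages squared leave-one-out prediction errors;
    sigma_hat2 averages squared in-sample residuals, for comparison.
    """
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar

    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    h = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]         # leverage values h_ii
    pred_err = [e / (1.0 - hi) for e, hi in zip(resid, h)]     # prediction errors

    sigma_tilde2 = sum(e * e for e in pred_err) / n
    sigma_hat2 = sum(e * e for e in resid) / n
    return sigma_tilde2, sigma_hat2


x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical data
y = [0.8, 2.1, 2.9, 4.2, 4.8, 6.3]

sigma_tilde2, sigma_hat2 = msfe_estimate(x, y)
print(f"sigma_tilde^2 = {sigma_tilde2:.4f}   sigma_hat^2 = {sigma_hat2:.4f}")
```

Since each prediction error is at least as large in magnitude as the matching residual, $\tilde\sigma^2$ never falls below $\hat\sigma^2$, which reflects the extra estimation uncertainty in out-of-sample forecasts.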
