Least Squares Regression
Theorem 4.7 Modern Generalized Gauss-Markov
In the linear regression model with i.i.d. sampling, if $E[\tilde\beta \mid X] = \beta$ then
$$\operatorname{var}[\tilde\beta \mid X] \ge (X' D^{-1} X)^{-1}.$$
The proof of Theorem 4.7 is technically advanced so we leave it to Section 4.26.
The interpretation of Theorem 4.7 is similar to Theorem 4.6 under i.i.d. sampling. Theorem 4.7 shows that the GLS covariance matrix $(X' D^{-1} X)^{-1}$ is the best possible among all unbiased estimators.
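To see the bound at work, a short numerical sketch can compare the conditional covariance matrix of OLS, $(X'X)^{-1}X'DX(X'X)^{-1}$, with the GLS matrix $(X'D^{-1}X)^{-1}$. OLS is unbiased, so by Theorem 4.7 the difference should be positive semi-definite. The simulated design, the variance function `d`, and the helper name `ols_and_gls_cov` are illustrative assumptions, not part of the text.

```python
import numpy as np

def ols_and_gls_cov(X, d):
    """Return the conditional covariance of OLS and the GLS lower bound.

    X : (n, k) regressor matrix, assumed to have full column rank
    d : (n,) conditional error variances, i.e. the diagonal of D
    """
    XtX_inv = np.linalg.inv(X.T @ X)
    # OLS covariance: (X'X)^{-1} X' D X (X'X)^{-1}
    v_ols = XtX_inv @ (X.T * d) @ X @ XtX_inv
    # GLS covariance: (X' D^{-1} X)^{-1}
    v_gls = np.linalg.inv((X.T / d) @ X)
    return v_ols, v_gls

rng = np.random.default_rng(0)
n, k = 40, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])
d = np.exp(X[:, 1])  # hypothetical heteroskedastic variances

v_ols, v_gls = ols_and_gls_cov(X, d)
gap_eigs = np.linalg.eigvalsh(v_ols - v_gls)
print(gap_eigs)  # eigenvalues of the gap; non-negative up to rounding error
```

When `d` is constant the two matrices coincide, which is the homoskedastic case in which OLS attains the bound.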
4.12 Residuals
What are some properties of the residuals $\hat e_i = Y_i - X_i'\hat\beta$ and prediction errors $\tilde e_i = Y_i - X_i'\hat\beta_{(-i)}$ in the context of the linear regression model?
Recall from (3.24) that we can write the residuals in vector notation as $\hat e = M e$ where $M = I_n - X(X'X)^{-1}X'$ is the orthogonal projection matrix. Using the properties of conditional expectation
$$E[\hat e \mid X] = E[M e \mid X] = M\,E[e \mid X] = 0$$
and
$$\operatorname{var}[\hat e \mid X] = \operatorname{var}[M e \mid X] = M \operatorname{var}[e \mid X]\, M = M D M \tag{4.20}$$
where $D$ is defined in (4.8).
We can simplify this expression under the assumption of conditional homoskedasticity $E[e^2 \mid X] = \sigma^2$. In this case (4.20) simplifies to
$$\operatorname{var}[\hat e \mid X] = M\sigma^2. \tag{4.21}$$
In particular, for a single observation $i$ we can find the variance of $\hat e_i$ by taking the $i$th diagonal element of (4.21). Since the $i$th diagonal element of $M$ is $1 - h_{ii}$ as defined in (3.40) we obtain
$$\operatorname{var}[\hat e_i \mid X] = E[\hat e_i^2 \mid X] = (1 - h_{ii})\sigma^2. \tag{4.22}$$
As this variance is a function of $h_{ii}$ and hence $X_i$, the residuals $\hat e_i$ are heteroskedastic even if the errors $e_i$ are homoskedastic. Notice as well that (4.22) implies $\hat e_i^2$ is a biased estimator of $\sigma^2$.
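The matrix identities (4.20) to (4.22) can be checked directly on a small design. The sketch below builds $M$ for a simulated regressor matrix (the design, seed, and value of $\sigma^2$ are arbitrary assumptions) and confirms that $MDM$ collapses to $M\sigma^2$ with diagonal $(1-h_{ii})\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, sigma2 = 25, 4, 2.0
X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])

P = X @ np.linalg.solve(X.T @ X, X.T)  # projection matrix X(X'X)^{-1}X'
M = np.eye(n) - P                      # M = I_n - X(X'X)^{-1}X'
h = np.diag(P)                         # leverage values h_ii

D = sigma2 * np.eye(n)                 # homoskedastic error covariance
var_resid = M @ D @ M                  # equation (4.20)

print(np.allclose(var_resid, sigma2 * M))                 # equation (4.21)
print(np.allclose(np.diag(var_resid), (1 - h) * sigma2))  # equation (4.22)
print(h.min(), h.max())  # leverage varies with i, so residual variances do too
```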
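Similarly, recall from (3.45) that the prediction errors $\tilde e_i = (1 - h_{ii})^{-1}\hat e_i$ can be written in vector notation as $\tilde e = M^* \hat e$ where $M^*$ is a diagonal matrix with $i$th diagonal element $(1 - h_{ii})^{-1}$. Thus $\tilde e = M^* M e$. We can calculate that
$$E[\tilde e \mid X] = M^* M\,E[e \mid X] = 0$$
and
$$\operatorname{var}[\tilde e \mid X] = M^* M \operatorname{var}[e \mid X]\, M M^* = M^* M D M M^*$$
which simplifies under homoskedasticity to
$$\operatorname{var}[\tilde e \mid X] = M^* M M M^* \sigma^2 = M^* M M^* \sigma^2.$$
The variance of the $i$th prediction error is then
$$\operatorname{var}[\tilde e_i \mid X] = E[\tilde e_i^2 \mid X] = (1-h_{ii})^{-1}(1-h_{ii})(1-h_{ii})^{-1}\sigma^2 = (1-h_{ii})^{-1}\sigma^2.$$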
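The relationship $\tilde e_i = (1-h_{ii})^{-1}\hat e_i$ used above can be confirmed by brute force: refit the regression $n$ times, each time leaving out one observation, and compare with the leverage shortcut. This is a minimal sketch on simulated data; the coefficient values and variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 20, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])
Y = X @ np.array([1.0, 2.0, -1.0]) + rng.standard_normal(n)

# Full-sample fit, residuals, and leverage values
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
resid = Y - X @ beta_hat
h = np.diag(X @ np.linalg.solve(X.T @ X, X.T))

# Leave-one-out prediction errors computed by explicit refitting
pred_err_loop = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    b_minus_i, *_ = np.linalg.lstsq(X[keep], Y[keep], rcond=None)
    pred_err_loop[i] = Y[i] - X[i] @ b_minus_i

# Shortcut: rescale the full-sample residuals by 1/(1 - h_ii)
pred_err_formula = resid / (1 - h)
print(np.allclose(pred_err_loop, pred_err_formula))
```

The shortcut needs only one regression, which is why it is the form used in the variance calculation.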
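A residual with constant conditional variance can be obtained by rescaling. The standardized residuals are
$$\bar e_i = (1-h_{ii})^{-1/2}\hat e_i, \tag{4.23}$$
and in vector notation
$$\bar e = (\bar e_1, \ldots, \bar e_n)' = M^{*1/2} M e. \tag{4.24}$$
From the above calculations, under homoskedasticity,
$$\operatorname{var}[\bar e \mid X] = M^{*1/2} M M^{*1/2}\sigma^2$$
and
$$\operatorname{var}[\bar e_i \mid X] = E[\bar e_i^2 \mid X] = \sigma^2$$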
and thus these standardized residuals have the same conditional mean and variance as the original errors when the latter are homoskedastic.
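A quick way to compare the three kinds of residuals is to form their conditional covariance matrices under homoskedasticity and read off the diagonals. The sketch below does this for a simulated design (an assumption for illustration); it also shows that rescaling equalizes the variances but does not remove the correlation across observations.

```python
import numpy as np

rng = np.random.default_rng(6)
n, k, sigma2 = 15, 3, 1.5
X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
h = 1 - np.diag(M)  # leverage values, since diag(M) = 1 - h_ii

M_star = np.diag(1 / (1 - h))            # diagonal matrix M*
M_star_half = np.diag((1 - h) ** -0.5)   # its square root M*^{1/2}

cov_pred = M_star @ M @ M_star * sigma2            # var of prediction errors
cov_std = M_star_half @ M @ M_star_half * sigma2   # var of standardized residuals

print(np.allclose(np.diag(cov_pred), sigma2 / (1 - h)))  # (1 - h_ii)^{-1} sigma^2
print(np.allclose(np.diag(cov_std), sigma2))             # constant sigma^2
off_diag = cov_std - np.diag(np.diag(cov_std))
print(np.abs(off_diag).max())  # off-diagonal covariances are generally nonzero
```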
4.13 Estimation of Error Variance
The error variance $\sigma^2 = E[e^2]$ can be a parameter of interest even in a heteroskedastic regression or a projection model. $\sigma^2$ measures the variation in the "unexplained" part of the regression. Its method of moments estimator (MME) is the sample average of the squared residuals:
$$\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\hat e_i^2.$$
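In the linear regression model we can calculate the mean of $\hat\sigma^2$. From (3.28) and the properties of the trace operator observe that
$$\hat\sigma^2 = \frac{1}{n}e'Me = \frac{1}{n}\operatorname{tr}(e'Me) = \frac{1}{n}\operatorname{tr}(Mee').$$
Then
$$\begin{aligned}
E[\hat\sigma^2 \mid X] &= \frac{1}{n}\operatorname{tr}\left(E[Mee' \mid X]\right) \\
&= \frac{1}{n}\operatorname{tr}\left(M\,E[ee' \mid X]\right) \\
&= \frac{1}{n}\operatorname{tr}(MD) \qquad (4.25) \\
&= \frac{1}{n}\sum_{i=1}^{n}(1-h_{ii})\sigma_i^2.
\end{aligned}$$
The final equality holds since the trace is the sum of the diagonal elements of $MD$, and since $D$ is diagonal the diagonal elements of $MD$ are the product of the diagonal elements of $M$ and $D$, which are $1-h_{ii}$ and $\sigma_i^2$, respectively.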
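The last step of (4.25) is easy to verify numerically. The following sketch uses a made-up variance function `sigma2_i` (an assumption, chosen only to produce heteroskedasticity) and compares the trace form with the sum form.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 30, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
h = 1 - np.diag(M)  # leverage values

sigma2_i = 0.5 + X[:, 1] ** 2  # hypothetical conditional variances
D = np.diag(sigma2_i)

via_trace = np.trace(M @ D) / n        # (1/n) tr(MD)
via_sum = np.mean((1 - h) * sigma2_i)  # (1/n) sum of (1 - h_ii) sigma_i^2
print(via_trace, via_sum)              # the two expressions agree
print(np.mean(sigma2_i))               # average error variance, for comparison
```

Because every $1-h_{ii}$ lies between 0 and 1, the expected value of $\hat\sigma^2$ falls below the average of the $\sigma_i^2$.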
Adding the assumption of conditional homoskedasticity $E[e^2 \mid X] = \sigma^2$ so that $D = I_n\sigma^2$, then (4.25) simplifies to
$$E[\hat\sigma^2 \mid X] = \frac{1}{n}\operatorname{tr}(M\sigma^2) = \sigma^2\left(\frac{n-k}{n}\right),$$
the final equality by (3.22). This calculation shows that $\hat\sigma^2$ is biased towards zero. The order of the bias depends on $k/n$, the ratio of the number of estimated coefficients to the sample size.
Another way to see this is to use (4.22). Note that
$$E[\hat\sigma^2 \mid X] = \frac{1}{n}\sum_{i=1}^{n}E[\hat e_i^2 \mid X] = \frac{1}{n}\sum_{i=1}^{n}(1-h_{ii})\sigma^2 = \left(\frac{n-k}{n}\right)\sigma^2,$$
the last equality using Theorem 3.6.
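Both derivations rest on $\operatorname{tr}(M) = n - k$, so the scale factor $(n-k)/n$ does not depend on the particular regressors. A minimal sketch, assuming full-column-rank simulated designs and a hypothetical helper `bias_factor`, illustrates how the factor shrinks as $k/n$ grows.

```python
import numpy as np

def bias_factor(X):
    """Return tr(M)/n, the factor multiplying sigma^2 in E[sigma_hat^2 | X]."""
    n = X.shape[0]
    M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
    return np.trace(M) / n

rng = np.random.default_rng(7)
results = []
for n, k in [(1000, 5), (100, 5), (20, 5), (20, 10)]:
    X = rng.standard_normal((n, k))
    results.append((n, k, bias_factor(X), (n - k) / n))
    print(results[-1])  # computed factor next to (n - k)/n
```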
Since the bias takes a scale form, a classic method to obtain an unbiased estimator is by rescaling. Define
$$s^2 = \frac{1}{n-k}\sum_{i=1}^{n}\hat e_i^2. \tag{4.26}$$
By the above calculation $E[s^2 \mid X] = \sigma^2$ and $E[s^2] = \sigma^2$. Hence the estimator $s^2$ is unbiased for $\sigma^2$. Consequently, $s^2$ is known as the "bias-corrected estimator" for $\sigma^2$ and in empirical practice $s^2$ is the most widely used estimator for $\sigma^2$.
Interestingly, this is not the only method to construct an unbiased estimator for $\sigma^2$. An estimator constructed with the standardized residuals $\bar e_i$ from (4.23) is
$$\bar\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\bar e_i^2 = \frac{1}{n}\sum_{i=1}^{n}(1-h_{ii})^{-1}\hat e_i^2.$$
You can show (see Exercise 4.9) that
$$E[\bar\sigma^2 \mid X] = \sigma^2 \tag{4.27}$$
and thus $\bar\sigma^2$ is unbiased for $\sigma^2$ (in the homoskedastic linear regression model).
When $k/n$ is small the estimators $\hat\sigma^2$, $s^2$ and $\bar\sigma^2$ are likely to be similar to one another. However, if $k/n$ is large then $s^2$ and $\bar\sigma^2$ are generally preferred to $\hat\sigma^2$. Consequently it is best to use one of the bias-corrected variance estimators in applications.
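The three estimators are simple to compute side by side. The sketch below is one possible implementation, with a hypothetical function name `variance_estimators` and simulated data with true $\sigma^2 = 1$; it contrasts a design where $k/n$ is small with one where it is large.

```python
import numpy as np

def variance_estimators(X, Y):
    """Return (sigma_hat^2, s^2, sigma_bar^2) from an OLS fit of Y on X."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    resid = Y - X @ (XtX_inv @ X.T @ Y)
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)  # leverage values h_ii
    sigma2_hat = np.mean(resid ** 2)             # method of moments estimator
    s2 = np.sum(resid ** 2) / (n - k)            # bias-corrected, eq. (4.26)
    sigma2_bar = np.mean(resid ** 2 / (1 - h))   # standardized residuals
    return sigma2_hat, s2, sigma2_bar

rng = np.random.default_rng(8)
estimates = {}
for n, k in [(1000, 3), (20, 8)]:
    X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])
    Y = X @ np.ones(k) + rng.standard_normal(n)  # errors have variance 1
    estimates[(n, k)] = variance_estimators(X, Y)
    print((n, k), estimates[(n, k)])
```

In any single sample $\hat\sigma^2$ is the smallest of the three, since $s^2 = \hat\sigma^2\, n/(n-k)$ and each term of $\bar\sigma^2$ is inflated by $(1-h_{ii})^{-1} \ge 1$.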
4.14 Mean-Square Forecast Error
One use of an estimated regression is to predict out-of-sample. Consider an out-of-sample realization $(Y_{n+1}, X_{n+1})$ where $X_{n+1}$ is observed but not $Y_{n+1}$. Given the coefficient estimator $\hat\beta$ the standard point estimator of $E[Y_{n+1} \mid X_{n+1}] = X_{n+1}'\beta$ is $\tilde Y_{n+1} = X_{n+1}'\hat\beta$. The forecast error is the difference between the actual value $Y_{n+1}$ and the point forecast $\tilde Y_{n+1}$, that is, $\tilde e_{n+1} = Y_{n+1} - \tilde Y_{n+1}$. The mean-squared forecast error (MSFE) is its expected squared value
$$\mathrm{MSFE}_n = E[\tilde e_{n+1}^2].$$
In the linear regression model $\tilde e_{n+1} = e_{n+1} - X_{n+1}'(\hat\beta - \beta)$ so
$$\mathrm{MSFE}_n = E[e_{n+1}^2] - 2E\left[e_{n+1}X_{n+1}'(\hat\beta-\beta)\right] + E\left[X_{n+1}'(\hat\beta-\beta)(\hat\beta-\beta)'X_{n+1}\right]. \tag{4.28}$$
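The first term in (4.28) is $\sigma^2$. The second term in (4.28) is zero since $e_{n+1}X_{n+1}'$ is independent of $\hat\beta - \beta$ and both are mean zero. Using the properties of the trace operator the third term in (4.28) is
$$\begin{aligned}
&\operatorname{tr}\left(E[X_{n+1}X_{n+1}']\,E\left[(\hat\beta-\beta)(\hat\beta-\beta)'\right]\right) \\
&\quad= \operatorname{tr}\left(E[X_{n+1}X_{n+1}']\,E\left[E\left[(\hat\beta-\beta)(\hat\beta-\beta)' \mid X\right]\right]\right) \\
&\quad= \operatorname{tr}\left(E[X_{n+1}X_{n+1}']\,E[V_{\hat\beta}]\right) \\
&\quad= E\left[\operatorname{tr}\left((X_{n+1}X_{n+1}')V_{\hat\beta}\right)\right] \\
&\quad= E\left[X_{n+1}'V_{\hat\beta}X_{n+1}\right] \qquad (4.29)
\end{aligned}$$
where we use the fact that $X_{n+1}$ is independent of $\hat\beta$, the definition $V_{\hat\beta} = E\left[(\hat\beta-\beta)(\hat\beta-\beta)' \mid X\right]$, and the fact that $X_{n+1}$ is independent of $V_{\hat\beta}$. Thus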
$$\mathrm{MSFE}_n = \sigma^2 + E\left[X_{n+1}'V_{\hat\beta}X_{n+1}\right].$$
Under conditional homoskedasticity this simplifies to
$$\mathrm{MSFE}_n = \sigma^2\left(1 + E\left[X_{n+1}'(X'X)^{-1}X_{n+1}\right]\right).$$
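The homoskedastic formula can be illustrated by simulation. The sketch below repeatedly draws a sample of size $n$, fits OLS, forecasts one fresh observation, and compares the average squared forecast error with $\sigma^2(1 + E[X_{n+1}'(X'X)^{-1}X_{n+1}])$, where the expectation is replaced by its average over the same draws. The Gaussian design, coefficient values, and number of replications are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, sigma, reps = 30, 3, 1.0, 20000
beta = np.ones(k)

sq_forecast_err = np.empty(reps)
quad_form = np.empty(reps)
for r in range(reps):
    # In-sample data and OLS fit
    X = rng.standard_normal((n, k))
    Y = X @ beta + sigma * rng.standard_normal(n)
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ Y
    # One out-of-sample observation, independent of the estimation sample
    x_new = rng.standard_normal(k)
    y_new = x_new @ beta + sigma * rng.standard_normal()
    sq_forecast_err[r] = (y_new - x_new @ beta_hat) ** 2
    quad_form[r] = x_new @ XtX_inv @ x_new

msfe_simulated = sq_forecast_err.mean()
msfe_formula = sigma ** 2 * (1 + quad_form.mean())
print(msfe_simulated, msfe_formula)  # should agree up to simulation noise
```

Both numbers exceed $\sigma^2$, reflecting the extra forecast variance that comes from estimating $\beta$.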
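A simple estimator for the MSFE is obtained by averaging the squared prediction errors (3.46)
$$\tilde\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\tilde e_i^2$$
where $\tilde e_i = Y_i - X_i'\hat\beta_{(-i)} = \hat e_i(1-h_{ii})^{-1}$. Indeed, we can calculate that
$$\begin{aligned}
E[\tilde\sigma^2] = E[\tilde e_i^2] &= E\left[\left(e_i - X_i'(\hat\beta_{(-i)}-\beta)\right)^2\right] \\
&= \sigma^2 + E\left[X_i'(\hat\beta_{(-i)}-\beta)(\hat\beta_{(-i)}-\beta)'X_i\right].
\end{aligned}$$
By a similar calculation as in (4.29) we find
$$E[\tilde\sigma^2] = \sigma^2 + E\left[X_i'V_{\hat\beta_{(-i)}}X_i\right] = \mathrm{MSFE}_{n-1}.$$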
This is the MSFE based on a sample of size $n-1$ rather than size $n$. The difference arises because the in-sample prediction errors $\tilde e_i$ for $i \le n$ are calculated using an effective sample size of $n-1$, while the out-of-sample prediction error $\tilde e_{n+1}$ is calculated from a sample with the full $n$ observations. Unless $n$ is very small we should expect $\mathrm{MSFE}_{n-1}$ (the MSFE based on $n-1$ observations) to be close to $\mathrm{MSFE}_n$ (the MSFE based on $n$ observations). Thus $\tilde\sigma^2$ is a reasonable estimator for $\mathrm{MSFE}_n$.
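As a rough check on this claim, the estimator $\tilde\sigma^2$ can be averaged over many simulated samples and compared with $\mathrm{MSFE}_{n-1}$. The sketch assumes homoskedastic errors and regressors drawn i.i.d. from $N(0, I_k)$ with no intercept; under that specific assumption a standard inverse-Wishart result gives $E[(X'X)^{-1}] = I_k/(m-k-1)$ for a sample of size $m$, so that $\mathrm{MSFE}_{n-1} = \sigma^2(1 + k/(n-k-2))$. That closed form and the helper name `loo_msfe_estimate` are our additions, not part of the text.

```python
import numpy as np

def loo_msfe_estimate(X, Y):
    """Mean squared leave-one-out prediction error (sigma-tilde squared)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    resid = Y - X @ (XtX_inv @ X.T @ Y)
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)  # leverage values h_ii
    pred_err = resid / (1.0 - h)                 # leverage shortcut for e-tilde_i
    return np.mean(pred_err ** 2)

rng = np.random.default_rng(4)
n, k, sigma, reps = 30, 3, 1.0, 4000
beta = np.ones(k)

draws = np.empty(reps)
for r in range(reps):
    X = rng.standard_normal((n, k))
    Y = X @ beta + sigma * rng.standard_normal(n)
    draws[r] = loo_msfe_estimate(X, Y)

msfe_n_minus_1 = sigma ** 2 * (1 + k / (n - k - 2))  # Gaussian-design closed form
msfe_n = sigma ** 2 * (1 + k / (n - k - 1))
print(draws.mean(), msfe_n_minus_1, msfe_n)
```

With $n = 30$ and $k = 3$ the two MSFE values differ only in the third decimal place, consistent with the remark that $\mathrm{MSFE}_{n-1}$ is close to $\mathrm{MSFE}_n$ unless $n$ is very small.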