
The Algebra of Least Squares

Theorem 3.2 Important Matrix Expressions

$$\hat{\beta} = (X'X)^{-1}(X'Y), \qquad \hat{e} = Y - X\hat{\beta}, \qquad X'\hat{e} = 0.$$
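The three expressions in Theorem 3.2 can be checked numerically. The following is a minimal sketch on simulated data; the sample size, the coefficient values, and the variable names are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed so the sketch is reproducible
n, k = 50, 3                            # hypothetical sample size and regressor count
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

# beta_hat = (X'X)^{-1} X'Y, obtained by solving the normal equations
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
e_hat = Y - X @ beta_hat                # residual vector

# X' e_hat should be zero up to floating-point error
orthogonal = np.allclose(X.T @ e_hat, 0.0, atol=1e-8)
print(beta_hat, orthogonal)
```

Solving the normal equations with `np.linalg.solve` avoids forming the inverse explicitly, but gives the same estimator as the formula in the theorem.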

Early Use of Matrices

The earliest known treatment of the use of matrix methods to solve simultaneous systems is found in Chapter 8 of the Chinese text The Nine Chapters on the Mathematical Art, written by several generations of scholars from the 10th to 2nd century BCE.

3.11 Projection Matrix

Define the matrix

$$P = X(X'X)^{-1}X'.$$

Observe that

$$PX = X(X'X)^{-1}X'X = X.$$

This is a property of a projection matrix. More generally, for any matrix $Z$ which can be written as $Z = X\Gamma$ for some matrix $\Gamma$ (we say that $Z$ lies in the range space of $X$), then

$$PZ = PX\Gamma = X(X'X)^{-1}X'X\Gamma = X\Gamma = Z.$$

As an important example, if we partition the matrix $X$ into two matrices $X_1$ and $X_2$ so that $X = [X_1 \; X_2]$, then $PX_1 = X_1$. (See Exercise 3.7.)

The projection matrix $P$ has the algebraic property that it is idempotent: $PP = P$. See Theorem 3.3.2 below. For the general properties of projection matrices see Section A.11.

The matrix $P$ creates the fitted values in a least squares regression:

$$PY = X(X'X)^{-1}X'Y = X\hat{\beta} = \hat{Y}.$$

Because of this property $P$ is also known as the hat matrix.

A special example of a projection matrix occurs when $X = 1_n$ is an $n$-vector of ones. Then

$$P = 1_n(1_n'1_n)^{-1}1_n' = \frac{1}{n}1_n1_n'.$$

Note that in this case

$$PY = 1_n(1_n'1_n)^{-1}1_n'Y = 1_n\bar{Y}$$

creates an $n$-vector whose elements are the sample mean $\bar{Y}$.

The projection matrix $P$ appears frequently in algebraic manipulations in least squares regression. The matrix has the following important properties.

Theorem 3.3 The projection matrix $P = X(X'X)^{-1}X'$ for any $n \times k$ $X$ with $n \ge k$ has the following algebraic properties.

1. $P$ is symmetric ($P' = P$).

2. $P$ is idempotent ($PP = P$).

3. $\operatorname{tr}P = k$.

4. The eigenvalues of $P$ are 1 and 0. There are $k$ eigenvalues equalling 1 and $n - k$ equalling 0.

5. $\operatorname{rank}(P) = k$.

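We close this section by proving the claims in Theorem 3.3. Part 1 holds since

$$\begin{aligned}
P' &= \left(X(X'X)^{-1}X'\right)' \\
&= (X')'\left((X'X)^{-1}\right)'(X)' \\
&= X\left((X'X)'\right)^{-1}X' \\
&= X\left((X)'(X')'\right)^{-1}X' = P.
\end{aligned}$$

To establish part 2, the fact that $PX = X$ implies that

$$PP = PX(X'X)^{-1}X' = X(X'X)^{-1}X' = P$$

as claimed. For part 3,

$$\operatorname{tr}P = \operatorname{tr}\left(X(X'X)^{-1}X'\right) = \operatorname{tr}\left((X'X)^{-1}X'X\right) = \operatorname{tr}(I_k) = k.$$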

See Appendix A.5 for definition and properties of the trace operator.

For part 4, it is shown in Appendix A.11 that the eigenvalues $\lambda_i$ of an idempotent matrix are all 1 and 0. Since $\operatorname{tr}P$ equals the sum of the $n$ eigenvalues and $\operatorname{tr}P = k$ by part 3, it follows that there are $k$ eigenvalues equalling 1 and the remainder $n - k$ equalling 0.

For part 5, observe that $P$ is positive semi-definite since its eigenvalues are all non-negative. By Theorem A.4.5 its rank equals the number of positive eigenvalues, which is $k$ as claimed.
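The five properties of Theorem 3.3 can be verified on a small example. The sketch below uses a randomly drawn regressor matrix; the dimensions and the tolerance are arbitrary choices for illustration, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 8, 3                             # hypothetical dimensions with n >= k
X = rng.normal(size=(n, k))             # full column rank with probability one

P = X @ np.linalg.inv(X.T @ X) @ X.T    # projection ("hat") matrix

symmetric = np.allclose(P, P.T)                        # property 1
idempotent = np.allclose(P @ P, P)                     # property 2
trace_P = np.trace(P)                                  # property 3: equals k
eigvals = np.sort(np.linalg.eigvalsh((P + P.T) / 2))   # property 4 (symmetrized against round-off)
rank_P = np.linalg.matrix_rank(P, tol=1e-8)            # property 5: equals k

print(symmetric, idempotent, round(trace_P, 6), np.round(eigvals, 6), rank_P)
```

The explicit tolerance in `matrix_rank` guards against eigenvalues that are zero in exact arithmetic but tiny and nonzero in floating point.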

3.12 Annihilator Matrix

Define

$$M = I_n - P = I_n - X(X'X)^{-1}X'$$

where $I_n$ is the $n \times n$ identity matrix. Note that

$$MX = (I_n - P)X = X - PX = X - X = 0. \tag{3.21}$$

Thus $M$ and $X$ are orthogonal. We call $M$ the annihilator matrix due to the property that for any matrix $Z$ in the range space of $X$,

$$MZ = Z - PZ = 0.$$

For example, $MX_1 = 0$ for any subcomponent $X_1$ of $X$, and $MP = 0$ (see Exercise 3.7).

The annihilator matrix $M$ shares several properties with $P$, including that $M$ is symmetric ($M' = M$) and idempotent ($MM = M$). It is thus a projection matrix. Similarly to Theorem 3.3.3 we can calculate

$$\operatorname{tr}M = n - k. \tag{3.22}$$

(See Exercise 3.9.) One implication is that the rank of $M$ is $n - k$.

While $P$ creates fitted values, $M$ creates least squares residuals:

$$MY = Y - PY = Y - X\hat{\beta} = \hat{e}. \tag{3.23}$$

As discussed in the previous section, a special example of a projection matrix occurs when $X = 1_n$ is an $n$-vector of ones, so that $P = 1_n(1_n'1_n)^{-1}1_n'$. The associated annihilator matrix is

$$M = I_n - P = I_n - 1_n(1_n'1_n)^{-1}1_n'.$$

While $P$ creates a vector of sample means, $M$ creates demeaned values:

$$MY = Y - 1_n\bar{Y}.$$

For simplicity we will often write the right-hand side as $Y - \bar{Y}$. The $i$th element is $Y_i - \bar{Y}$, the demeaned value of $Y_i$.
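A three-observation example makes the intercept-only case concrete. The data vector below is made up for illustration; its sample mean is 5.

```python
import numpy as np

Y = np.array([2.0, 4.0, 9.0])           # hypothetical data with sample mean 5
n = Y.size
ones = np.ones((n, 1))

P = ones @ ones.T / n                   # 1_n (1_n' 1_n)^{-1} 1_n' = 1_n 1_n' / n
M = np.eye(n) - P                       # associated annihilator matrix

means = P @ Y                           # every element equals the sample mean
demeaned = M @ Y                        # Y_i minus the sample mean

print(means)     # [5. 5. 5.]
print(demeaned)  # [-3. -1.  4.]
```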

We can also use (3.23) to write an alternative expression for the residual vector. Substituting $Y = X\beta + e$ into $\hat{e} = MY$ and using $MX = 0$ we find

$$\hat{e} = MY = M(X\beta + e) = Me \tag{3.24}$$

which is free of dependence on the regression coefficient $\beta$.
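Equation (3.24) can be illustrated in a simulation, where the errors are known by construction. In the sketch below the "true" coefficients and the error draws are assumptions; the point is that changing the coefficient leaves the residuals unchanged.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([0.5, -1.0])            # hypothetical "true" coefficients
e = rng.normal(size=n)                  # errors, observable only in a simulation
Y = X @ beta + e

M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)   # annihilator matrix
e_hat = M @ Y                           # residuals, as in (3.23)

# Same errors, very different coefficients: residuals should be unchanged
Y_alt = X @ (beta + 10.0) + e
e_hat_alt = M @ Y_alt

print(np.allclose(e_hat, M @ e), np.allclose(e_hat, e_hat_alt))
```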

3.13 Estimation of Error Variance

The error variance $\sigma^2 = E[e^2]$ is a moment, so a natural estimator is a moment estimator. If $e_i$ were observed we would estimate $\sigma^2$ by

$$\tilde{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} e_i^2. \tag{3.25}$$

However, this is infeasible as $e_i$ is not observed. In this case it is common to take a two-step approach to estimation. The residuals $\hat{e}_i$ are calculated in the first step, and then we substitute $\hat{e}_i$ for $e_i$ in expression (3.25) to obtain the feasible estimator

$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} \hat{e}_i^2. \tag{3.26}$$

In matrix notation, we can write (3.25) and (3.26) as $\tilde{\sigma}^2 = n^{-1}e'e$ and

$$\hat{\sigma}^2 = n^{-1}\hat{e}'\hat{e}. \tag{3.27}$$

Recall the expressions $\hat{e} = MY = Me$ from (3.23) and (3.24). Applied to (3.27) we find

$$\hat{\sigma}^2 = n^{-1}\hat{e}'\hat{e} = n^{-1}e'MMe = n^{-1}e'Me \tag{3.28}$$

where the third equality holds since $MM = M$.

An interesting implication is that

$$\tilde{\sigma}^2 - \hat{\sigma}^2 = n^{-1}e'e - n^{-1}e'Me = n^{-1}e'Pe \ge 0.$$

The final inequality holds because $P$ is positive semi-definite and $e'Pe$ is a quadratic form. This shows that the feasible estimator $\hat{\sigma}^2$ is numerically no larger than the idealized estimator (3.25).
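The inequality between the two variance estimators can be seen in a simulation, where the infeasible estimator is computable because the errors are generated directly. All numerical values below are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
e = rng.normal(size=n)                  # errors, known only because we simulate them
Y = X @ np.array([1.0, 0.0, 2.0]) + e

P = X @ np.linalg.solve(X.T @ X, X.T)   # projection matrix
e_hat = Y - P @ Y                       # least squares residuals

sigma2_tilde = e @ e / n                # infeasible estimator (3.25)
sigma2_hat = e_hat @ e_hat / n          # feasible estimator (3.26)
gap = e @ P @ e / n                     # n^{-1} e'Pe, the difference between the two

print(sigma2_tilde, sigma2_hat, gap)
```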

3.14 Analysis of Variance

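Another way of writing (3.23) is

$$Y = PY + MY = \hat{Y} + \hat{e}. \tag{3.29}$$

This decomposition is orthogonal, that is

$$\hat{Y}'\hat{e} = (PY)'(MY) = Y'PMY = 0. \tag{3.30}$$

It follows that

$$Y'Y = \hat{Y}'\hat{Y} + 2\hat{Y}'\hat{e} + \hat{e}'\hat{e} = \hat{Y}'\hat{Y} + \hat{e}'\hat{e}$$

or

$$\sum_{i=1}^{n} Y_i^2 = \sum_{i=1}^{n} \hat{Y}_i^2 + \sum_{i=1}^{n} \hat{e}_i^2.$$

Subtracting $1_n\bar{Y}$ from both sides of (3.29) we obtain

$$Y - 1_n\bar{Y} = \hat{Y} - 1_n\bar{Y} + \hat{e}.$$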

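This decomposition is also orthogonal when $X$ contains a constant, as

$$\left(\hat{Y} - 1_n\bar{Y}\right)'\hat{e} = \hat{Y}'\hat{e} - \bar{Y}1_n'\hat{e} = 0$$

under (3.17). It follows that

$$\left(Y - 1_n\bar{Y}\right)'\left(Y - 1_n\bar{Y}\right) = \left(\hat{Y} - 1_n\bar{Y}\right)'\left(\hat{Y} - 1_n\bar{Y}\right) + \hat{e}'\hat{e}$$

or

$$\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2 = \sum_{i=1}^{n}\left(\hat{Y}_i - \bar{Y}\right)^2 + \sum_{i=1}^{n}\hat{e}_i^2.$$

This is commonly called the analysis-of-variance formula for least squares regression.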

A commonly reported statistic is the coefficient of determination or R-squared:

$$R^2 = \frac{\sum_{i=1}^{n}\left(\hat{Y}_i - \bar{Y}\right)^2}{\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2} = 1 - \frac{\sum_{i=1}^{n}\hat{e}_i^2}{\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2}.$$

It is often described as "the fraction of the sample variance of $Y$ which is explained by the least squares fit". $R^2$ is a crude measure of regression fit. We have better measures of fit, but these require a statistical (not just algebraic) analysis and we will return to these issues later. One deficiency with $R^2$ is that it cannot decrease, and generally increases, when regressors are added to a regression (see Exercise 3.16), so the "fit" can always be increased by increasing the number of regressors.
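The analysis-of-variance identity and the two equivalent forms of $R^2$ can be checked numerically. The sketch below uses simulated data with a constant in the regression; the added regressor at the end is pure noise, an assumption chosen to show that $R^2$ does not fall even when the new variable is irrelevant.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # includes a constant
Y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
Y_hat = X @ beta_hat
e_hat = Y - Y_hat
Y_bar = Y.mean()

tss = np.sum((Y - Y_bar) ** 2)          # total sum of squares
ess = np.sum((Y_hat - Y_bar) ** 2)      # explained sum of squares
ssr = np.sum(e_hat ** 2)                # sum of squared residuals

r2_explained = ess / tss                # first form of R^2
r2_residual = 1.0 - ssr / tss           # second form of R^2

# Add an irrelevant regressor and recompute R^2
X_big = np.column_stack([X, rng.normal(size=n)])
e_big = Y - X_big @ np.linalg.solve(X_big.T @ X_big, X_big.T @ Y)
r2_big = 1.0 - np.sum(e_big ** 2) / tss

print(r2_explained, r2_residual, r2_big)
```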

The coefficient of determination was introduced by Wright (1921).

3.15 Projections

One way to visualize least squares fitting is as a projection operation.

Write the regressor matrix as $X = [X_1 \; X_2 \; \cdots \; X_k]$ where $X_j$ is the $j$th column of $X$. The range space $\mathcal{R}(X)$ of $X$ is the space consisting of all linear combinations of the columns $X_1, X_2, \ldots, X_k$. $\mathcal{R}(X)$ is a $k$-dimensional surface contained in $\mathbb{R}^n$. If $k = 2$ then $\mathcal{R}(X)$ is a plane. The operator $P = X(X'X)^{-1}X'$ projects vectors onto $\mathcal{R}(X)$. The fitted values $\hat{Y} = PY$ are the projection of $Y$ onto $\mathcal{R}(X)$.

To visualize examine Figure 3.3. This displays the case $n = 3$ and $k = 2$. Displayed are three vectors $Y$, $X_1$, and $X_2$, which are each elements of $\mathbb{R}^3$. The plane created by $X_1$ and $X_2$ is the range space $\mathcal{R}(X)$.

Regression fitted values are linear combinations of $X_1$ and $X_2$ and so lie on this plane. The fitted value $\hat{Y}$ is the vector on this plane closest to $Y$. The residual $\hat{e} = Y - \hat{Y}$ is the difference between the two. The angle between the vectors $\hat{Y}$ and $\hat{e}$ is $90^\circ$, and therefore they are orthogonal as shown.

[Figure 3.3: Projection of $Y$ onto $X_1$ and $X_2$. The figure shows the vectors $X_1$, $X_2$, $Y$, $\hat{Y}$, and $\hat{e}$.]
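The geometry of the case $n = 3$, $k = 2$ can be reproduced with concrete numbers. The vectors below are hypothetical, not the ones drawn in the figure: $X_1$ and $X_2$ are chosen to span the horizontal plane so that the projection is easy to read off.

```python
import numpy as np

# Hypothetical X1 and X2 spanning the horizontal plane in R^3
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
Y = np.array([1.0, 2.0, 3.0])

P = X @ np.linalg.inv(X.T @ X) @ X.T
Y_hat = P @ Y                           # projection of Y onto the plane
e_hat = Y - Y_hat                       # residual, perpendicular to the plane

cos_angle = Y_hat @ e_hat / (np.linalg.norm(Y_hat) * np.linalg.norm(e_hat))
angle_deg = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))

print(Y_hat)      # [1. 2. 0.]
print(e_hat)      # [0. 0. 3.]
print(angle_deg)  # 90.0
```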

3.16 Regression Components

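Partition $X = [X_1 \; X_2]$ and $\beta = (\beta_1, \beta_2)$. The regression model can be written as

$$Y = X_1\beta_1 + X_2\beta_2 + e. \tag{3.31}$$

The OLS estimator of $\beta = (\beta_1', \beta_2')'$ is obtained by regression of $Y$ on $X = [X_1 \; X_2]$ and can be written as

$$Y = X\hat{\beta} + \hat{e} = X_1\hat{\beta}_1 + X_2\hat{\beta}_2 + \hat{e}. \tag{3.32}$$

We are interested in algebraic expressions for $\hat{\beta}_1$ and $\hat{\beta}_2$.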

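Let's first focus on $\hat{\beta}_1$. The least squares estimator by definition is found by the joint minimization

$$\left(\hat{\beta}_1, \hat{\beta}_2\right) = \operatorname*{argmin}_{\beta_1, \beta_2} \mathrm{SSE}(\beta_1, \beta_2) \tag{3.33}$$

where

$$\mathrm{SSE}(\beta_1, \beta_2) = (Y - X_1\beta_1 - X_2\beta_2)'(Y - X_1\beta_1 - X_2\beta_2).$$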

An equivalent expression for $\hat{\beta}_1$ can be obtained by concentration (nested minimization). The solution (3.33) can be written as

$$\hat{\beta}_1 = \operatorname*{argmin}_{\beta_1}\left(\min_{\beta_2} \mathrm{SSE}(\beta_1, \beta_2)\right). \tag{3.34}$$

The inner expression $\min_{\beta_2} \mathrm{SSE}(\beta_1, \beta_2)$ minimizes over $\beta_2$ while holding $\beta_1$ fixed. It is the lowest possible sum of squared errors given $\beta_1$. The outer minimization $\operatorname{argmin}_{\beta_1}$ finds the coefficient $\beta_1$ which minimizes the "lowest possible sum of squared errors given $\beta_1$". This means that $\hat{\beta}_1$ as defined in (3.33) and (3.34) are algebraically identical.

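Examine the inner minimization problem in (3.34). This is simply the least squares regression of $Y - X_1\beta_1$ on $X_2$. This has solution

$$\operatorname*{argmin}_{\beta_2} \mathrm{SSE}(\beta_1, \beta_2) = (X_2'X_2)^{-1}\left(X_2'(Y - X_1\beta_1)\right)$$

with residuals

$$\begin{aligned}
Y - X_1\beta_1 - X_2(X_2'X_2)^{-1}\left(X_2'(Y - X_1\beta_1)\right) &= (M_2Y - M_2X_1\beta_1) \\
&= M_2(Y - X_1\beta_1)
\end{aligned}$$

where

$$M_2 = I_n - X_2(X_2'X_2)^{-1}X_2' \tag{3.35}$$

is the annihilator matrix for $X_2$. This means that the inner minimization problem (3.34) has minimized value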

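$$\begin{aligned}
\min_{\beta_2} \mathrm{SSE}(\beta_1, \beta_2) &= (Y - X_1\beta_1)'M_2M_2(Y - X_1\beta_1) \\
&= (Y - X_1\beta_1)'M_2(Y - X_1\beta_1)
\end{aligned}$$

where the second equality holds since $M_2$ is idempotent. Substituting this into (3.34) we find

$$\hat{\beta}_1 = \operatorname*{argmin}_{\beta_1} (Y - X_1\beta_1)'M_2(Y - X_1\beta_1) = (X_1'M_2X_1)^{-1}(X_1'M_2Y).$$

By a similar argument we find

$$\hat{\beta}_2 = (X_2'M_1X_2)^{-1}(X_2'M_1Y)$$

where

$$M_1 = I_n - X_1(X_1'X_1)^{-1}X_1' \tag{3.36}$$

is the annihilator matrix for $X_1$.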
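The closed form for $\hat{\beta}_1$ can be read off the first-order condition of the concentrated problem. The following is a sketch of that step, assuming $X_1'M_2X_1$ is invertible (which holds when $X = [X_1 \; X_2]$ has full column rank) and using the symmetry of $M_2$:

$$\begin{aligned}
\frac{\partial}{\partial \beta_1}(Y - X_1\beta_1)'M_2(Y - X_1\beta_1) &= -2X_1'M_2(Y - X_1\beta_1) = 0 \\
\Longrightarrow \quad X_1'M_2X_1\hat{\beta}_1 &= X_1'M_2Y \\
\Longrightarrow \quad \hat{\beta}_1 &= (X_1'M_2X_1)^{-1}(X_1'M_2Y).
\end{aligned}$$

The derivation for $\hat{\beta}_2$ is symmetric, with the roles of the two blocks exchanged and $M_1$ in place of $M_2$.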

Theorem 3.4 The least squares estimator $\left(\hat{\beta}_1, \hat{\beta}_2\right)$ for (3.32) has the algebraic solution

$$\hat{\beta}_1 = (X_1'M_2X_1)^{-1}(X_1'M_2Y) \tag{3.37}$$

$$\hat{\beta}_2 = (X_2'M_1X_2)^{-1}(X_2'M_1Y) \tag{3.38}$$

where $M_1$ and $M_2$ are defined in (3.36) and (3.35), respectively.
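Theorem 3.4 can be checked against a joint regression on simulated data. In the sketch below the helper `annihilator` and all dimensions are hypothetical names and values introduced for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n, k1, k2 = 50, 2, 3                    # hypothetical dimensions
X1 = rng.normal(size=(n, k1))
X2 = rng.normal(size=(n, k2))
X = np.hstack([X1, X2])
Y = rng.normal(size=n)

def annihilator(Z):
    """Return M = I_n - Z (Z'Z)^{-1} Z' for a full-column-rank Z."""
    return np.eye(Z.shape[0]) - Z @ np.linalg.solve(Z.T @ Z, Z.T)

M1, M2 = annihilator(X1), annihilator(X2)

beta1_hat = np.linalg.solve(X1.T @ M2 @ X1, X1.T @ M2 @ Y)   # (3.37)
beta2_hat = np.linalg.solve(X2.T @ M1 @ X2, X2.T @ M1 @ Y)   # (3.38)

beta_full = np.linalg.solve(X.T @ X, X.T @ Y)                # joint OLS on [X1 X2]

print(np.allclose(beta_full[:k1], beta1_hat), np.allclose(beta_full[k1:], beta2_hat))
```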

3.17 Regression Components (Alternative Derivation)*

An alternative proof of Theorem 3.4 uses an algebraic argument based on the population calculations from Section 2.22. Since this is a classic derivation we present it here for completeness.

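Partition $\hat{Q}_{XX}$ as

$$\hat{Q}_{XX} = \begin{bmatrix} \hat{Q}_{11} & \hat{Q}_{12} \\ \hat{Q}_{21} & \hat{Q}_{22} \end{bmatrix} = \begin{bmatrix} \frac{1}{n}X_1'X_1 & \frac{1}{n}X_1'X_2 \\ \frac{1}{n}X_2'X_1 & \frac{1}{n}X_2'X_2 \end{bmatrix}$$

and similarly $\hat{Q}_{XY}$ as

$$\hat{Q}_{XY} = \begin{bmatrix} \hat{Q}_{1Y} \\ \hat{Q}_{2Y} \end{bmatrix} = \begin{bmatrix} \frac{1}{n}X_1'Y \\ \frac{1}{n}X_2'Y \end{bmatrix}.$$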

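By the partitioned matrix inversion formula (A.3)

$$\hat{Q}_{XX}^{-1} = \begin{bmatrix} \hat{Q}_{11} & \hat{Q}_{12} \\ \hat{Q}_{21} & \hat{Q}_{22} \end{bmatrix}^{-1} \stackrel{\mathrm{def}}{=} \begin{bmatrix} \hat{Q}^{11} & \hat{Q}^{12} \\ \hat{Q}^{21} & \hat{Q}^{22} \end{bmatrix} = \begin{bmatrix} \hat{Q}_{11\cdot 2}^{-1} & -\hat{Q}_{11\cdot 2}^{-1}\hat{Q}_{12}\hat{Q}_{22}^{-1} \\ -\hat{Q}_{22\cdot 1}^{-1}\hat{Q}_{21}\hat{Q}_{11}^{-1} & \hat{Q}_{22\cdot 1}^{-1} \end{bmatrix} \tag{3.39}$$

where $\hat{Q}_{11\cdot 2} = \hat{Q}_{11} - \hat{Q}_{12}\hat{Q}_{22}^{-1}\hat{Q}_{21}$ and $\hat{Q}_{22\cdot 1} = \hat{Q}_{22} - \hat{Q}_{21}\hat{Q}_{11}^{-1}\hat{Q}_{12}$. Thus

$$\hat{\beta} = \begin{pmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{pmatrix} = \begin{bmatrix} \hat{Q}_{11\cdot 2}^{-1} & -\hat{Q}_{11\cdot 2}^{-1}\hat{Q}_{12}\hat{Q}_{22}^{-1} \\ -\hat{Q}_{22\cdot 1}^{-1}\hat{Q}_{21}\hat{Q}_{11}^{-1} & \hat{Q}_{22\cdot 1}^{-1} \end{bmatrix} \begin{bmatrix} \hat{Q}_{1Y} \\ \hat{Q}_{2Y} \end{bmatrix} = \begin{pmatrix} \hat{Q}_{11\cdot 2}^{-1}\hat{Q}_{1Y\cdot 2} \\ \hat{Q}_{22\cdot 1}^{-1}\hat{Q}_{2Y\cdot 1} \end{pmatrix}.$$

Now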

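$$\begin{aligned}
\hat{Q}_{11\cdot 2} &= \hat{Q}_{11} - \hat{Q}_{12}\hat{Q}_{22}^{-1}\hat{Q}_{21} \\
&= \frac{1}{n}X_1'X_1 - \frac{1}{n}X_1'X_2\left(\frac{1}{n}X_2'X_2\right)^{-1}\frac{1}{n}X_2'X_1 \\
&= \frac{1}{n}X_1'M_2X_1
\end{aligned}$$

and

$$\begin{aligned}
\hat{Q}_{1Y\cdot 2} &= \hat{Q}_{1Y} - \hat{Q}_{12}\hat{Q}_{22}^{-1}\hat{Q}_{2Y} \\
&= \frac{1}{n}X_1'Y - \frac{1}{n}X_1'X_2\left(\frac{1}{n}X_2'X_2\right)^{-1}\frac{1}{n}X_2'Y \\
&= \frac{1}{n}X_1'M_2Y.
\end{aligned}$$

Equation (3.37) follows.

Similarly to the calculation for $\hat{Q}_{11\cdot 2}$ and $\hat{Q}_{1Y\cdot 2}$ you can show that $\hat{Q}_{2Y\cdot 1} = \frac{1}{n}X_2'M_1Y$ and $\hat{Q}_{22\cdot 1} = \frac{1}{n}X_2'M_1X_2$. This establishes (3.38). Together, this is Theorem 3.4.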
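The partitioned inverse used in this derivation can be confirmed numerically. The sketch below builds the four blocks of the inverse from the formula and compares them with a direct inverse; the dimensions and variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
n, k1, k2 = 40, 2, 2                    # hypothetical dimensions
X1 = rng.normal(size=(n, k1))
X2 = rng.normal(size=(n, k2))
X = np.hstack([X1, X2])

Q = X.T @ X / n                         # sample moment matrix Q_XX
Q11, Q12 = Q[:k1, :k1], Q[:k1, k1:]
Q21, Q22 = Q[k1:, :k1], Q[k1:, k1:]

inv = np.linalg.inv
Q11_2 = Q11 - Q12 @ inv(Q22) @ Q21      # Q_{11.2}
Q22_1 = Q22 - Q21 @ inv(Q11) @ Q12      # Q_{22.1}

# Assemble the inverse block by block, following the partitioned formula
Q_inv_blocks = np.block([
    [inv(Q11_2), -inv(Q11_2) @ Q12 @ inv(Q22)],
    [-inv(Q22_1) @ Q21 @ inv(Q11), inv(Q22_1)],
])

# Q_{11.2} should also equal X1' M2 X1 / n
M2 = np.eye(n) - X2 @ inv(X2.T @ X2) @ X2.T

print(np.allclose(Q_inv_blocks, inv(Q)), np.allclose(Q11_2, X1.T @ M2 @ X1 / n))
```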

3.18 Residual Regression

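As first recognized by Frisch and Waugh (1933) and extended by Lovell (1963), expressions (3.37) and (3.38) can be used to show that the least squares estimators $\hat{\beta}_1$ and $\hat{\beta}_2$ can be found by a two-step regression procedure.

Take (3.38). Since $M_1$ is idempotent, $M_1 = M_1M_1$ and thus

$$\begin{aligned}
\hat{\beta}_2 &= (X_2'M_1X_2)^{-1}(X_2'M_1Y) \\
&= (X_2'M_1M_1X_2)^{-1}(X_2'M_1M_1Y) \\
&= \left(\tilde{X}_2'\tilde{X}_2\right)^{-1}\left(\tilde{X}_2'\tilde{e}_1\right)
\end{aligned}$$

where $\tilde{X}_2 = M_1X_2$ and $\tilde{e}_1 = M_1Y$.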

Thus the coefficient estimator $\hat{\beta}_2$ is algebraically equal to the least squares coefficient from a regression of $\tilde{e}_1$ on $\tilde{X}_2$. Notice that these two are $Y$ and $X_2$, respectively, premultiplied by $M_1$. But we know that premultiplication by $M_1$ creates least squares residuals. Therefore $\tilde{e}_1$ is simply the least squares residual from a regression of $Y$ on $X_1$, and the columns of $\tilde{X}_2$ are the least squares residuals from the regressions of the columns of $X_2$ on $X_1$.
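The two-step procedure just described can be sketched as follows. The data-generating values and the helper `ols` are hypothetical, and the comparison is with the coefficients on $X_2$ from the full regression.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 80
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = rng.normal(size=(n, 2)) + 0.5 * X1[:, [1]]      # made correlated with X1
Y = X1 @ np.array([1.0, 2.0]) + X2 @ np.array([-1.0, 0.5]) + rng.normal(size=n)

def ols(A, b):
    """Least squares coefficients from regressing b (vector or matrix) on A."""
    return np.linalg.lstsq(A, b, rcond=None)[0]

# Step 1: residuals from regressing Y and each column of X2 on X1
e1_tilde = Y - X1 @ ols(X1, Y)
X2_tilde = X2 - X1 @ ols(X1, X2)

# Step 2: regress the residualized Y on the residualized X2
beta2_two_step = ols(X2_tilde, e1_tilde)

# Benchmark: coefficients on X2 from the full regression on [X1 X2]
beta_full = ols(np.hstack([X1, X2]), Y)

print(beta2_two_step, beta_full[2:])
```

The two printed vectors should agree up to floating-point error, which is the content of the residual regression result.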

We have proven the following theorem.
