Normal Regression - PDF ECONOMETRICS

[Figure 5.1: Standard Normal Density and Distribution. Panel (a): Normal Density. Panel (b): Normal Distribution.]

Theorem 5.1 If $Z\sim \mathrm{N}(0,1)$ then

1. All integer moments of $Z$ are finite.

2. All odd moments of $Z$ equal 0.

3. For any positive integer $m$,
$$E\left[Z^{2m}\right]=(2m-1)!!=(2m-1)\times(2m-3)\times\cdots\times 1.$$

4. For any $r>0$,
$$E|Z|^{r}=\frac{2^{r/2}}{\sqrt{\pi}}\,\Gamma\left(\frac{r+1}{2}\right)$$
where $\Gamma(t)=\int_0^\infty u^{t-1}e^{-u}\,du$ is the gamma function.
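As a quick numerical check of parts 3 and 4, the following sketch evaluates both moment formulas with Python's standard library. The function names are illustrative only; setting $r=2m$ in part 4 should reproduce the double factorial in part 3.

```python
import math

def double_factorial_odd(m: int) -> int:
    """Return (2m-1)!! = (2m-1)(2m-3)...1 for a positive integer m."""
    result = 1
    for j in range(1, 2 * m, 2):
        result *= j
    return result

def abs_moment(r: float) -> float:
    """Return E|Z|^r = 2^(r/2) / sqrt(pi) * Gamma((r+1)/2) for Z ~ N(0,1)."""
    return 2 ** (r / 2) / math.sqrt(math.pi) * math.gamma((r + 1) / 2)

# Part 4 evaluated at r = 2m should agree with part 3.
for m in (1, 2, 3, 4):
    print(m, double_factorial_odd(m), abs_moment(2 * m))
```

For example, $m=2$ gives $E[Z^4]=3$ from either formula.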

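If $Z\sim \mathrm{N}(0,1)$ and $X=\mu+\sigma Z$ for $\mu\in\mathbb{R}$ and $\sigma\ge 0$ then $X$ has the univariate normal distribution, written $X\sim \mathrm{N}(\mu,\sigma^2)$. When $\sigma>0$, by change-of-variables $X$ has the density
$$f(x)=\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),\qquad -\infty<x<\infty.$$
The mean and variance of $X$ are $\mu$ and $\sigma^2$, respectively.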

The normal distribution and its relatives (the chi-square, student t, F, non-central chi-square and F) are frequently used for inference to calculate critical values and p-values. This involves evaluating the normal cdf $\Phi(x)$ and its inverse. Since the cdf $\Phi(x)$ is not available in closed form, statistical textbooks have traditionally provided tables for this purpose. Such tables are not used currently as these calculations are embedded in modern statistical software. For convenience, we list the appropriate commands in MATLAB, R, and Stata to compute the cumulative distribution function of commonly used statistical distributions.

Numerical Cumulative Distribution Function

To calculate P(Z ≤ x) for given x

Distribution    MATLAB             R               Stata
N(0,1)          normcdf(x)         pnorm(x)        normal(x)
χ²_r            chi2cdf(x,r)       pchisq(x,r)     chi2(r,x)
t_r             tcdf(x,r)          pt(x,r)         1-ttail(r,x)
F_{r,k}         fcdf(x,r,k)        pf(x,r,k)       F(r,k,x)
χ²_r(d)         ncx2cdf(x,r,d)     pchisq(x,r,d)   nchi2(r,d,x)
F_{r,k}(d)      ncfcdf(x,r,k,d)    pf(x,r,k,d)     1-nFtail(r,k,d,x)

Here we list the appropriate commands to compute the inverse probabilities (quantiles) of the same distributions.

Numerical Quantile Function

To calculate x which solves p = P(Z ≤ x) for given p

Distribution    MATLAB             R               Stata
N(0,1)          norminv(p)         qnorm(p)        invnormal(p)
χ²_r            chi2inv(p,r)       qchisq(p,r)     invchi2(r,p)
t_r             tinv(p,r)          qt(p,r)         invttail(r,1-p)
F_{r,k}         finv(p,r,k)        qf(p,r,k)       invF(r,k,p)
χ²_r(d)         ncx2inv(p,r,d)     qchisq(p,r,d)   invnchi2(r,d,p)
F_{r,k}(d)      ncfinv(p,r,k,d)    qf(p,r,k,d)     invnFtail(r,k,d,1-p)
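The tables above cover MATLAB, R, and Stata only. As an illustration in another language (not part of the original tables), Python's standard library provides the N(0,1) row through statistics.NormalDist; the other distributions require a third-party package and are not shown in this sketch.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# Analogue of normcdf(x) / pnorm(x) / normal(x): P(Z <= 1.96).
p = std_normal.cdf(1.96)

# Analogue of norminv(p) / qnorm(p) / invnormal(p): the 0.975 quantile.
x = std_normal.inv_cdf(0.975)

print(round(p, 4), round(x, 4))
```

The two calls are inverses of one another, so applying inv_cdf to the output of cdf returns the original argument up to rounding error.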

5.3 Multivariate Normal Distribution

We say that the $k$-vector $Z$ has a multivariate standard normal distribution, written $Z\sim \mathrm{N}(0,I_k)$, if it has the joint density
$$f(x)=\frac{1}{(2\pi)^{k/2}}\exp\left(-\frac{x'x}{2}\right),\qquad x\in\mathbb{R}^k.$$
The mean and covariance matrix of $Z$ are $0$ and $I_k$, respectively. The multivariate joint density factors into the product of univariate normal densities, so the elements of $Z$ are mutually independent standard normals.

If $Z\sim \mathrm{N}(0,I_k)$ and $X=\mu+BZ$ then the $k$-vector $X$ has a multivariate normal distribution, written $X\sim \mathrm{N}(\mu,\Sigma)$ where $\Sigma=BB'\ge 0$. If $\Sigma>0$ then by change-of-variables $X$ has the joint density function
$$f(x)=\frac{1}{(2\pi)^{k/2}\det(\Sigma)^{1/2}}\exp\left(-\frac{(x-\mu)'\Sigma^{-1}(x-\mu)}{2}\right),\qquad x\in\mathbb{R}^k.$$
The mean and covariance matrix of $X$ are $\mu$ and $\Sigma$, respectively. By setting $k=1$ you can check that the multivariate normal simplifies to the univariate normal.
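The $k=1$ check can be written out explicitly. With $k=1$ write $\Sigma=\sigma^2>0$, so that $\det(\Sigma)^{1/2}=\sigma$ and $\Sigma^{-1}=1/\sigma^2$. Substituting into the joint density gives the univariate normal density:

```latex
f(x) = \frac{1}{(2\pi)^{1/2}\,(\sigma^2)^{1/2}}
       \exp\left(-\frac{(x-\mu)\,\sigma^{-2}\,(x-\mu)}{2}\right)
     = \frac{1}{\sqrt{2\pi\sigma^2}}
       \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```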

An important property of normal random vectors is that affine functions are multivariate normal.

Theorem 5.2 If $X\sim \mathrm{N}(\mu,\Sigma)$ and $Y=a+BX$, then $Y\sim \mathrm{N}(a+B\mu,\ B\Sigma B')$.

One simple implication of Theorem 5.2 is that if $X$ is multivariate normal then each component of $X$ is univariate normal.

Another useful property of the multivariate normal distribution is that uncorrelatedness is the same as independence. That is, if a vector is multivariate normal, subsets of variables are independent if and only if they are uncorrelated.

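Theorem 5.3 Properties of the Multivariate Normal Distribution

1. The mean and covariance matrix of $X\sim \mathrm{N}(\mu,\Sigma)$ are $E[X]=\mu$ and $\mathrm{var}[X]=\Sigma$.

2. If $(X,Y)$ are multivariate normal, $X$ and $Y$ are uncorrelated if and only if they are independent.

3. If $X\sim \mathrm{N}(\mu,\Sigma)$ and $Y=a+BX$, then $Y\sim \mathrm{N}(a+B\mu,\ B\Sigma B')$.

4. If $X\sim \mathrm{N}(0,I_k)$ then $X'X\sim\chi^2_k$, chi-square with $k$ degrees of freedom.

5. If $X\sim \mathrm{N}(0,\Sigma)$ with $\Sigma>0$ then $X'\Sigma^{-1}X\sim\chi^2_k$ where $k=\dim(X)$.

6. If $X\sim \mathrm{N}(\mu,\Sigma)$ with $\Sigma>0$, $r\times r$, then $X'\Sigma^{-1}X\sim\chi^2_r(\lambda)$ where $\lambda=\mu'\Sigma^{-1}\mu$.

7. If $Z\sim \mathrm{N}(0,1)$ and $Q\sim\chi^2_k$ are independent then $Z/\sqrt{Q/k}\sim t_k$, student t with $k$ degrees of freedom.

8. If $(Y,X)$ are multivariate normal
$$\begin{pmatrix} Y \\ X \end{pmatrix}\sim \mathrm{N}\left(\begin{pmatrix}\mu_Y \\ \mu_X\end{pmatrix},\ \begin{pmatrix}\Sigma_{YY} & \Sigma_{YX} \\ \Sigma_{XY} & \Sigma_{XX}\end{pmatrix}\right)$$
with $\Sigma_{YY}>0$ and $\Sigma_{XX}>0$ then the conditional distributions are
$$Y\mid X\sim \mathrm{N}\left(\mu_Y+\Sigma_{YX}\Sigma_{XX}^{-1}(X-\mu_X),\ \Sigma_{YY}-\Sigma_{YX}\Sigma_{XX}^{-1}\Sigma_{XY}\right)$$
$$X\mid Y\sim \mathrm{N}\left(\mu_X+\Sigma_{XY}\Sigma_{YY}^{-1}(Y-\mu_Y),\ \Sigma_{XX}-\Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{YX}\right).$$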
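To make the conditional-distribution formulas in part 8 concrete, here is a minimal sketch for the scalar case, where every block of the mean vector and covariance matrix is a single number. The function name and the numerical inputs are hypothetical, chosen only for illustration.

```python
def conditional_normal(mu_y, mu_x, s_yy, s_yx, s_xx, x):
    """Mean and variance of Y | X = x when (Y, X) are scalar and jointly normal.

    Implements mu_Y + S_YX S_XX^{-1} (x - mu_X) and S_YY - S_YX S_XX^{-1} S_XY,
    using S_XY = S_YX in the scalar case.
    """
    slope = s_yx / s_xx
    cond_mean = mu_y + slope * (x - mu_x)
    cond_var = s_yy - s_yx * s_yx / s_xx
    return cond_mean, cond_var

# Hypothetical parameter values.
mean, var = conditional_normal(mu_y=1.0, mu_x=2.0, s_yy=4.0, s_yx=1.5, s_xx=3.0, x=4.0)
print(mean, var)
```

The conditional mean is linear in the conditioning value and the conditional variance does not depend on it, which is the feature exploited in the next section.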

5.4 Joint Normality and Linear Regression

Suppose the variables $(Y,X)$ are jointly normally distributed. Consider the best linear predictor of $Y$ given $X$
$$Y=X'\beta+\alpha+e.$$
By the properties of the best linear predictor, $E[Xe]=0$ and $E[e]=0$, so $X$ and $e$ are uncorrelated. Since $(e,X)$ is an affine transformation of the normal vector $(Y,X)$ it follows that $(e,X)$ is jointly normal (Theorem 5.2). Since $(e,X)$ is jointly normal and uncorrelated they are independent (Theorem 5.3). Independence implies that
$$E[e\mid X]=E[e]=0$$
and
$$E\left[e^2\mid X\right]=E\left[e^2\right]=\sigma^2$$
which are properties of a homoskedastic linear CEF.

We have shown that when $(Y,X)$ are jointly normally distributed they satisfy a normal linear CEF
$$Y=X'\beta+\alpha+e$$
where
$$e\sim \mathrm{N}(0,\sigma^2)$$
is independent of $X$. This result can also be deduced from Theorem 5.3.8.

This is a classical motivation for the linear regression model.
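To see how the conditional-distribution result in Theorem 5.3 delivers the same conclusion, take scalar $Y$ and match the conditional mean of $Y$ given $X$ with the form $X'\beta+\alpha$. The coefficient expressions below follow from that matching:

```latex
E[Y \mid X] = \mu_Y + \Sigma_{YX}\Sigma_{XX}^{-1}(X-\mu_X) = X'\beta + \alpha,
\qquad
\beta = \Sigma_{XX}^{-1}\Sigma_{XY},
\qquad
\alpha = \mu_Y - \mu_X'\beta,

\mathrm{var}[Y \mid X] = \Sigma_{YY} - \Sigma_{YX}\Sigma_{XX}^{-1}\Sigma_{XY} = \sigma^2 .
```

The conditional variance does not depend on $X$, so the error $e=Y-X'\beta-\alpha$ is homoskedastic with variance $\sigma^2$.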

5.5 Normal Regression Model

The normal regression model is the linear regression model with an independent normal error

$$Y=X'\beta+e \tag{5.1}$$
$$e\sim \mathrm{N}(0,\sigma^2).$$

As we learned in Section 5.4 the normal regression model holds when $(Y,X)$ are jointly normally distributed. Normal regression, however, does not require joint normality. All that is required is that the conditional distribution of $Y$ given $X$ is normal (the marginal distribution of $X$ is unrestricted). In this sense the normal regression model is broader than joint normality. Notice that for notational convenience we have written (5.1) so that $X$ contains the intercept.

Normal regression is a parametric model where likelihood methods can be used for estimation, testing, and distribution theory. The likelihood is the name for the joint probability density of the data, evaluated at the observed sample, and viewed as a function of the parameters. The maximum likelihood estimator is the value which maximizes this likelihood function. Let us now derive the likelihood of the normal regression model.

First, observe that model (5.1) is equivalent to the statement that the conditional density of $Y$ given $X$ takes the form
$$f(y\mid x)=\frac{1}{(2\pi\sigma^2)^{1/2}}\exp\left(-\frac{1}{2\sigma^2}\left(y-x'\beta\right)^2\right).$$

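Under the assumption that the observations are mutually independent this implies that the conditional density of $(Y_1,\dots,Y_n)$ given $(X_1,\dots,X_n)$ is
$$\begin{aligned}
f(y_1,\dots,y_n\mid x_1,\dots,x_n) &= \prod_{i=1}^n f(y_i\mid x_i) \\
&= \prod_{i=1}^n \frac{1}{(2\pi\sigma^2)^{1/2}}\exp\left(-\frac{1}{2\sigma^2}\left(y_i-x_i'\beta\right)^2\right) \\
&= \frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^n\left(y_i-x_i'\beta\right)^2\right) \\
&\overset{\mathrm{def}}{=} L_n(\beta,\sigma^2).
\end{aligned}$$
This is called the likelihood function when evaluated at the sample data.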

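For convenience it is typical to work with the natural logarithm
$$\log L_n(\beta,\sigma^2)=-\frac{n}{2}\log(2\pi\sigma^2)-\frac{1}{2\sigma^2}\sum_{i=1}^n\left(Y_i-X_i'\beta\right)^2\overset{\mathrm{def}}{=}\ell_n(\beta,\sigma^2) \tag{5.2}$$
which is called the log-likelihood function.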

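The maximum likelihood estimator (MLE) $(\hat\beta_{\mathrm{mle}},\hat\sigma^2_{\mathrm{mle}})$ is the value which maximizes the log-likelihood. We can write the maximization problem as
$$(\hat\beta_{\mathrm{mle}},\hat\sigma^2_{\mathrm{mle}})=\underset{\beta\in\mathbb{R}^k,\ \sigma^2>0}{\operatorname{argmax}}\ \ell_n(\beta,\sigma^2). \tag{5.3}$$

In most applications of maximum likelihood the MLE must be found by numerical methods. However, in the case of the normal regression model we can find an explicit expression for $\hat\beta_{\mathrm{mle}}$ and $\hat\sigma^2_{\mathrm{mle}}$.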

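The maximizers $(\hat\beta_{\mathrm{mle}},\hat\sigma^2_{\mathrm{mle}})$ of (5.3) jointly solve the first-order conditions (FOC)
$$0=\left.\frac{\partial}{\partial\beta}\ell_n(\beta,\sigma^2)\right|_{\beta=\hat\beta_{\mathrm{mle}},\ \sigma^2=\hat\sigma^2_{\mathrm{mle}}}=\frac{1}{\hat\sigma^2_{\mathrm{mle}}}\sum_{i=1}^n X_i\left(Y_i-X_i'\hat\beta_{\mathrm{mle}}\right) \tag{5.4}$$
$$0=\left.\frac{\partial}{\partial\sigma^2}\ell_n(\beta,\sigma^2)\right|_{\beta=\hat\beta_{\mathrm{mle}},\ \sigma^2=\hat\sigma^2_{\mathrm{mle}}}=-\frac{n}{2\hat\sigma^2_{\mathrm{mle}}}+\frac{1}{2\hat\sigma^4_{\mathrm{mle}}}\sum_{i=1}^n\left(Y_i-X_i'\hat\beta_{\mathrm{mle}}\right)^2. \tag{5.5}$$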

The first FOC (5.4) is proportional to the first-order conditions for the least squares minimization problem of Section 3.6. It follows that the MLE satisfies
$$\hat\beta_{\mathrm{mle}}=\left(\sum_{i=1}^n X_iX_i'\right)^{-1}\left(\sum_{i=1}^n X_iY_i\right)=\hat\beta_{\mathrm{ols}}.$$
That is, the MLE for $\beta$ is algebraically identical to the OLS estimator.

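Solving the second FOC (5.5) for $\hat\sigma^2_{\mathrm{mle}}$ we find
$$\hat\sigma^2_{\mathrm{mle}}=\frac{1}{n}\sum_{i=1}^n\left(Y_i-X_i'\hat\beta_{\mathrm{mle}}\right)^2=\frac{1}{n}\sum_{i=1}^n\left(Y_i-X_i'\hat\beta_{\mathrm{ols}}\right)^2=\frac{1}{n}\sum_{i=1}^n\hat e_i^2=\hat\sigma^2_{\mathrm{ols}}.$$
Thus the MLE for $\sigma^2$ is identical to the OLS/moment estimator from (3.26).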

Since the OLS estimator and MLE under normality are equivalent, $\hat\beta$ is described by some authors as the maximum likelihood estimator, and by other authors as the least squares estimator. It is important to remember, however, that $\hat\beta$ is only the MLE when the error $e$ has a known normal distribution and not otherwise.

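Plugging the estimators into (5.2) we obtain the maximized log-likelihood
$$\ell_n\left(\hat\beta_{\mathrm{mle}},\hat\sigma^2_{\mathrm{mle}}\right)=-\frac{n}{2}\log\left(2\pi\hat\sigma^2_{\mathrm{mle}}\right)-\frac{n}{2}. \tag{5.6}$$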

The log-likelihood is typically reported as a measure of fit.
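The derivation above can be checked numerically. The sketch below uses a small made-up data set (an intercept plus one regressor, so $k=2$), computes the OLS coefficients in closed form, sets $\hat\sigma^2_{\mathrm{mle}}$ to the average squared residual, and compares the log-likelihood (5.2) evaluated at these values with the closed form (5.6). The data and all variable names are hypothetical.

```python
import math

# Hypothetical toy data: Y = b0 + b1 * x + e.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(y)

# OLS (equivalently MLE) coefficients from the normal equations.
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

def log_likelihood(beta0, beta1, sigma2):
    """Log-likelihood (5.2) of the normal regression model for the toy data."""
    ssr = sum((yi - beta0 - beta1 * xi) ** 2 for xi, yi in zip(x, y))
    return -n / 2 * math.log(2 * math.pi * sigma2) - ssr / (2 * sigma2)

# MLE of the error variance: average squared residual (divisor n, not n - k).
resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
sigma2_mle = sum(e ** 2 for e in resid) / n

max_ll = log_likelihood(b0, b1, sigma2_mle)
closed_form = -n / 2 * math.log(2 * math.pi * sigma2_mle) - n / 2  # equation (5.6)
print(max_ll, closed_form)
```

Moving any of the three arguments away from the estimates lowers the log-likelihood, which is a simple way to confirm that the closed-form solution is the maximizer.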

It may seem surprising that the MLE $\hat\beta_{\mathrm{mle}}$ is numerically equal to the OLS estimator despite emerging from quite different motivations. It is not completely accidental. The least squares estimator minimizes a particular sample loss function (the sum of squared error criterion) and most loss functions are equivalent to the likelihood of a specific parametric distribution, in this case the normal regression model. In this sense it is not surprising that the least squares estimator can be motivated as either the minimizer of a sample loss function or as the maximizer of a likelihood function.

Carl Friedrich Gauss

The mathematician Carl Friedrich Gauss (1777-1855) proposed the normal regression model, and derived the least squares estimator as the maximum likelihood estimator for this model. He claimed to have discovered the method in 1795 at the age of eighteen but did not publish the result until 1809. Interest in Gauss’s approach was reinforced by Laplace’s simultaneous discovery of the central limit theorem, which provided a justification for viewing random disturbances as approximately normal.

5.6 Distribution of OLS Coefficient Vector

In the normal linear regression model we can derive exact sampling distributions for the OLS/MLE estimator, residuals, and variance estimator. In this section we derive the distribution of the OLS coefficient estimator.

The normality assumption $e\mid X\sim \mathrm{N}(0,\sigma^2)$ combined with independence of the observations has the multivariate implication
$$e\mid X\sim \mathrm{N}\left(0,I_n\sigma^2\right).$$
That is, the error vector $e$ is independent of $X$ and is normally distributed.

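Recall that the OLS estimator satisfies
$$\hat\beta-\beta=\left(X'X\right)^{-1}X'e$$
which is a linear function of $e$. Since linear functions of normals are also normal (Theorem 5.2) this implies that conditional on $X$,
$$\begin{aligned}
\hat\beta-\beta\mid X &\sim \left(X'X\right)^{-1}X'\,\mathrm{N}\left(0,I_n\sigma^2\right) \\
&\sim \mathrm{N}\left(0,\ \sigma^2\left(X'X\right)^{-1}X'X\left(X'X\right)^{-1}\right) \\
&= \mathrm{N}\left(0,\ \sigma^2\left(X'X\right)^{-1}\right).
\end{aligned}$$
This shows that under the assumption of normal errors the OLS estimator has an exact normal distribution.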

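Theorem 5.4 In the normal regression model,
$$\hat\beta\mid X\sim \mathrm{N}\left(\beta,\ \sigma^2\left(X'X\right)^{-1}\right).$$

Theorems 5.2 and 5.4 imply that any affine function of the OLS estimator is also normally distributed, including individual components. Letting $\beta_j$ and $\hat\beta_j$ denote the $j$th elements of $\beta$ and $\hat\beta$, we have
$$\hat\beta_j\mid X\sim \mathrm{N}\left(\beta_j,\ \sigma^2\left[\left(X'X\right)^{-1}\right]_{jj}\right). \tag{5.7}$$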
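A small Monte Carlo experiment can illustrate (5.7). The sketch below holds a hypothetical regressor fixed, repeatedly draws normal errors, and compares the simulated mean and variance of the OLS slope with the values implied by (5.7). It uses the standard fact that, in a regression with an intercept and one regressor, the slope's diagonal element of $(X'X)^{-1}$ equals $1/\sum_i(x_i-\bar x)^2$. All parameter values, the seed, and the number of replications are arbitrary choices for illustration.

```python
import random

random.seed(0)

# Hypothetical fixed design and true parameters.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
n = len(x)
beta0, beta1, sigma = 1.0, 0.5, 2.0

x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)

# Variance of the slope implied by (5.7): sigma^2 * [(X'X)^{-1}]_jj = sigma^2 / sxx.
exact_var = sigma ** 2 / sxx

def ols_slope(y):
    """OLS slope from a regression of y on an intercept and x."""
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx

draws = []
for _ in range(20000):
    y = [beta0 + beta1 * xi + random.gauss(0.0, sigma) for xi in x]
    draws.append(ols_slope(y))

sim_mean = sum(draws) / len(draws)
sim_var = sum((d - sim_mean) ** 2 for d in draws) / (len(draws) - 1)
print(sim_mean, sim_var, exact_var)
```

The simulated mean should be close to the true slope and the simulated variance close to exact_var, up to Monte Carlo error.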

Theorem 5.4 is a statement about the conditional distribution. What about the unconditional distribution? In Section 4.7 we presented Kinal’s theorem about the existence of moments for the joint normal regression model. We re-state the result here.

Theorem 5.5 Kinal (1980) If $(Y,X)$ are jointly normal, then for any $r$, $E\left\|\hat\beta\right\|^r<\infty$ if and only if $r<n-k+1$.

5.7 Distribution of OLS Residual Vector

Consider the OLS residual vector. Recall from (3.24) that $\hat e=Me$ where $M=I_n-X\left(X'X\right)^{-1}X'$. This shows that $\hat e$ is linear in $e$. So conditional on $X$
$$\hat e=Me\mid X\sim \mathrm{N}\left(0,\sigma^2MM\right)=\mathrm{N}\left(0,\sigma^2M\right)$$
the final equality since $M$ is idempotent (see Section 3.12). This shows that the residual vector has an exact normal distribution.

Furthermore, it is useful to find the joint distribution of $\hat\beta$ and $\hat e$. This is most easily done by writing the two as a stacked linear function of the error $e$. Indeed,
$$\begin{pmatrix}\hat\beta-\beta \\ \hat e\end{pmatrix}=\begin{pmatrix}\left(X'X\right)^{-1}X'e \\ Me\end{pmatrix}=\begin{pmatrix}\left(X'X\right)^{-1}X' \\ M\end{pmatrix}e$$
which is a linear function of $e$. The vector thus has a joint normal distribution with covariance matrix
$$\begin{pmatrix}\sigma^2\left(X'X\right)^{-1} & 0 \\ 0 & \sigma^2M\end{pmatrix}.$$
The off-diagonal block is zero because $X'M=0$ from (3.21). Since this is zero it follows that $\hat\beta$ and $\hat e$ are statistically independent (Theorem 5.3.2).
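For completeness, the off-diagonal block can be computed directly from the stacked representation, using $E[ee'\mid X]=I_n\sigma^2$, the symmetry $M'=M$, and $X'M=0$:

```latex
\mathrm{cov}\left(\hat\beta-\beta,\ \hat e \mid X\right)
  = \left(X'X\right)^{-1}X'\,E\left[ee' \mid X\right]\,M'
  = \sigma^2\left(X'X\right)^{-1}X'M
  = 0 .
```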

Theorem 5.6 In the normal regression model, $\hat e\mid X\sim \mathrm{N}\left(0,\sigma^2M\right)$ and is independent of $\hat\beta$.

The fact that $\hat\beta$ and $\hat e$ are independent implies that $\hat\beta$ is independent of any function of the residual vector including individual residuals $\hat e_i$ and the variance estimators $s^2$ and $\hat\sigma^2$.

5.8 Distribution of Variance Estimator

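Next, consider the variance estimator $s^2$ from (4.26). Using (3.28) it satisfies $(n-k)s^2=\hat e'\hat e=e'Me$.

The spectral decomposition of $M$ (equation (A.4)) is $M=H\Lambda H'$ where $H'H=I_n$ and $\Lambda$ is diagonal with the eigenvalues of $M$ on the diagonal. Since $M$ is idempotent with rank $n-k$ (see Section 3.12) it has $n-k$ eigenvalues equalling 1 and $k$ eigenvalues equalling 0, so
$$\Lambda=\begin{bmatrix} I_{n-k} & 0 \\ 0 & 0_k\end{bmatrix}.$$
Let $u=H'e\sim \mathrm{N}\left(0,I_n\sigma^2\right)$ (see Exercise 5.2) and partition $u=\left(u_1',u_2'\right)'$ where $u_1\sim \mathrm{N}\left(0,I_{n-k}\sigma^2\right)$. Then
$$\begin{aligned}
(n-k)s^2 &= e'Me \\
&= e'H\begin{bmatrix} I_{n-k} & 0 \\ 0 & 0\end{bmatrix}H'e \\
&= u'\begin{bmatrix} I_{n-k} & 0 \\ 0 & 0\end{bmatrix}u \\
&= u_1'u_1 \\
&\sim \sigma^2\chi^2_{n-k}.
\end{aligned}$$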

We see that in the normal regression model the exact distribution of $s^2$ is a scaled chi-square.

Since $\hat e$ is independent of $\hat\beta$ it follows that $s^2$ is independent of $\hat\beta$ as well.

Theorem 5.7 In the normal regression model, $\dfrac{(n-k)s^2}{\sigma^2}\sim\chi^2_{n-k}$ and is independent of $\hat\beta$.
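Theorem 5.7 can be illustrated by simulation. A $\chi^2_{n-k}$ variable has mean $n-k$ and variance $2(n-k)$, so the simulated values of $(n-k)s^2/\sigma^2$ should match these two moments. The sketch below uses a hypothetical design with an intercept and one regressor ($k=2$, $n=8$); the seed, error standard deviation, and replication count are arbitrary.

```python
import random

random.seed(42)

# Hypothetical fixed design: intercept plus one regressor, so k = 2.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
n, k = len(x), 2
sigma = 1.5
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)

def scaled_variance_estimate():
    """Draw one sample and return (n-k) s^2 / sigma^2 = SSR / sigma^2."""
    # Residuals do not depend on the true coefficients, so set them to zero.
    y = [random.gauss(0.0, sigma) for _ in x]
    y_bar = sum(y) / n
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return ssr / sigma ** 2

draws = [scaled_variance_estimate() for _ in range(20000)]
mean_draw = sum(draws) / len(draws)
var_draw = sum((d - mean_draw) ** 2 for d in draws) / (len(draws) - 1)

# Compare with the chi-square moments n - k = 6 and 2(n - k) = 12.
print(mean_draw, var_draw)
```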

5.9 t-statistic

An alternative way of writing (5.7) is
$$\frac{\hat\beta_j-\beta_j}{\sqrt{\sigma^2\left[\left(X'X\right)^{-1}\right]_{jj}}}\sim \mathrm{N}(0,1).$$
This is sometimes called a standardized statistic as the distribution is the standard normal.

Now take the standardized statistic and replace the unknown variance $\sigma^2$ with its estimator $s^2$. We call this a t-ratio or t-statistic
$$T=\frac{\hat\beta_j-\beta_j}{\sqrt{s^2\left[\left(X'X\right)^{-1}\right]_{jj}}}=\frac{\hat\beta_j-\beta_j}{s(\hat\beta_j)}$$
where $s(\hat\beta_j)$ is the classical (homoskedastic) standard error for $\hat\beta_j$ from (4.37). We will sometimes write the t-statistic as $T(\beta_j)$ to explicitly indicate its dependence on the parameter value $\beta_j$, and sometimes will simplify notation and write the t-statistic as $T$ when the dependence is clear from the context.

With algebraic re-scaling we can write the t-statistic as the ratio of the standardized statistic and the square root of the scaled variance estimator. Since the distributions of these two components are normal and chi-square, respectively, and independent, we deduce that the t-statistic has the distribution
$$T=\frac{\hat\beta_j-\beta_j}{\sqrt{\sigma^2\left[\left(X'X\right)^{-1}\right]_{jj}}}\Bigg/\sqrt{\frac{(n-k)s^2}{\sigma^2}\bigg/(n-k)}\ \sim\ \frac{\mathrm{N}(0,1)}{\sqrt{\chi^2_{n-k}\big/(n-k)}}\ \sim\ t_{n-k}$$
a student t distribution with $n-k$ degrees of freedom.

This derivation shows that the t-ratio has a sampling distribution which depends only on the quantity $n-k$. The distribution does not depend on any other features of the data. In this context, we say that the distribution of the t-ratio is pivotal, meaning that it does not depend on unknowns.

The trick behind this result is scaling the centered coefficient by its standard error, and recognizing that each depends on the unknown $\sigma$ only through scale. Thus the ratio of the two does not depend on $\sigma$. This trick (scaling to eliminate dependence on unknowns) is known as studentization.

Theorem 5.8 In the normal regression model, $T\sim t_{n-k}$.

An important caveat about Theorem 5.8 is that it only applies to the t-statistic constructed with the homoskedastic (old-fashioned) standard error. It does not apply to a t-statistic constructed with any of the robust standard errors. In fact, the robust t-statistics can have finite sample distributions which deviate considerably from $t_{n-k}$ even when the regression errors are independent $\mathrm{N}(0,\sigma^2)$. Thus the distributional result in Theorem 5.8 and the use of the t distribution in finite samples is only exact when applied to classical t-statistics under the normality assumption.
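The studentization argument can be seen directly in a small computation. In the sketch below the standard normal shocks are held fixed and only the error scale $\sigma$ changes; the classical t-ratio evaluated at the true slope is then the same for every $\sigma$, because the centered coefficient and its standard error both scale with $\sigma$. The design, coefficients, and seed are hypothetical.

```python
import math
import random

random.seed(1)

# Hypothetical design (intercept plus one regressor) and true coefficients.
x = [0.5, 1.0, 1.5, 2.0, 3.0, 4.0]
n, k = len(x), 2
beta0, beta1 = 1.0, 2.0

# Fixed standard normal shocks; the error is e = sigma * z.
z = [random.gauss(0.0, 1.0) for _ in x]

def t_ratio_for_slope(sigma):
    """Classical t-ratio for the slope, centered at the true value beta1."""
    y = [beta0 + beta1 * xi + sigma * zi for xi, zi in zip(x, z)]
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s2 = ssr / (n - k)                 # bias-corrected variance estimator
    se_b1 = math.sqrt(s2 / sxx)        # homoskedastic standard error of the slope
    return (b1 - beta1) / se_b1

t_small = t_ratio_for_slope(1.0)
t_large = t_ratio_for_slope(10.0)
print(t_small, t_large)  # equal up to floating-point rounding
```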

5.10 Confidence Intervals for Regression Coefficients

The OLS estimator $\hat\beta$ is a point estimator for a coefficient $\beta$. A broader concept is a set or interval estimator which takes the form $\hat C=[\hat L,\hat U]$. The goal of an interval estimator $\hat C$ is to contain the true value, e.g. $\beta\in\hat C$, with high probability.

The interval estimator $\hat C$ is a function of the data and hence is random.

An interval estimator $\hat C$ is called a $1-\alpha$ confidence interval when $P[\beta\in\hat C]=1-\alpha$ for a selected value of $\alpha$. The value $1-\alpha$ is called the coverage probability. Typical choices for the coverage probability $1-\alpha$ are 0.95 or 0.90.

The probability calculation $P[\beta\in\hat C]$ is easily misinterpreted as treating $\beta$ as random and $\hat C$ as fixed. (The probability that $\beta$ is in $\hat C$.) This is not the appropriate interpretation. Instead, the correct interpretation is that the probability $P[\beta\in\hat C]$ treats the point $\beta$ as fixed and the set $\hat C$ as random. It is the probability that the random set $\hat C$ covers (or contains) the fixed true coefficient $\beta$.

There is not a unique method to construct confidence intervals. For example, one simple (yet silly) interval is
$$\hat C=\begin{cases}\mathbb{R} & \text{with probability } 1-\alpha \\ \{\hat\beta\} & \text{with probability } \alpha.\end{cases}$$
If $\hat\beta$ has a continuous distribution, then by construction $P[\beta\in\hat C]=1-\alpha$, so this confidence interval has perfect coverage. However, $\hat C$ is uninformative about $\beta$ and is therefore not useful.

Instead, a good choice for a confidence interval for the regression coefficient $\beta$ is obtained by adding and subtracting from the estimator $\hat\beta$ a fixed multiple of its standard error:
$$\hat C=\left[\hat\beta-c\times s(\hat\beta),\ \ \hat\beta+c\times s(\hat\beta)\right] \tag{5.8}$$
where $c>0$ is a pre-specified constant. This confidence interval is symmetric about the point estimator $\hat\beta$ and its length is proportional to the standard error $s(\hat\beta)$.

Equivalently, $\hat C$ is the set of parameter values for $\beta$ such that the t-statistic $T(\beta)$ is smaller (in absolute value) than $c$, that is
$$\hat C=\left\{\beta: |T(\beta)|\le c\right\}=\left\{\beta: -c\le\frac{\hat\beta-\beta}{s(\hat\beta)}\le c\right\}.$$

$$P[\beta\in\hat C]=P\left[|T(\beta)|\le c\right]=P\left[-c\le T(\beta)\le c\right]. \tag{5.9}$$

Since the t-statistic $T(\beta)$ has the $t_{n-k}$ distribution, (5.9) equals $F(c)-F(-c)$, where $F(u)$ is the student t distribution function with $n-k$ degrees of freedom. Since $F(-c)=1-F(c)$ (see Exercise 5.8) we can write (5.9) as
$$P[\beta\in\hat C]=2F(c)-1.$$
This is the coverage probability of the interval $\hat C$, and only depends on the constant $c$.

As we mentioned before, a confidence interval has the coverage probability $1-\alpha$. This requires selecting the constant $c$ so that $F(c)=1-\alpha/2$. This holds if $c$ equals the $1-\alpha/2$ quantile of the $t_{n-k}$ distribution. As there is no closed form expression for these quantiles we compute their values numerically, for example by tinv(1-alpha/2,n-k) in MATLAB. With this choice the confidence interval (5.8) has exact coverage probability $1-\alpha$. By default, Stata reports 95% confidence intervals $\hat C$ for each estimated regression coefficient using the same formula.

Theorem 5.9 In the normal regression model, (5.8) with $c=F^{-1}(1-\alpha/2)$ has coverage probability $P[\beta\in\hat C]=1-\alpha$.

When the degrees of freedom are large the distinction between the student t and the normal distribution is negligible. In particular, for $n-k\ge 61$ we have $c\le 2.00$ for a 95% interval. Using this value we obtain the most commonly used confidence interval in applied econometric practice:
$$\hat C=\left[\hat\beta-2s(\hat\beta),\ \ \hat\beta+2s(\hat\beta)\right]. \tag{5.10}$$

This is a useful rule-of-thumb. This 95% confidence intervalCbis simple to compute and can be easily calculated from coefficient estimates and standard errors.

Theorem 5.10 In the normal regression model, if $n-k\ge 61$ then (5.10) has coverage probability $P[\beta\in\hat C]\ge 0.95$.
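The intervals (5.8) and (5.10) can be sketched numerically. Since the Python standard library has no student t functions, the sketch below substitutes a simple numerical approach that is an assumption of this example, not a library routine: the CDF is computed by Simpson's rule applied to the t density, and the quantile by bisection. The helper names and the estimate, standard error, and sample dimensions are hypothetical.

```python
import math

def t_pdf(u, df):
    """Student t density with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + u * u / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """Student t CDF: 1/2 plus (or minus) the Simpson's-rule area on [0, |x|]."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_quantile(p, df, lo=-100.0, hi=100.0):
    """Inverse CDF by bisection; adequate when the quantile lies in [lo, hi]."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical estimate, standard error, and sample dimensions.
beta_hat, se, n, k = 0.80, 0.25, 30, 3
c = t_quantile(0.975, n - k)                        # critical value for (5.8)
exact_ci = (beta_hat - c * se, beta_hat + c * se)
rule_ci = (beta_hat - 2 * se, beta_hat + 2 * se)    # rule of thumb (5.10)

# Here n - k = 27 < 61, so c > 2 and the rule-of-thumb interval is slightly too short.
# At the boundary n - k = 61 its coverage 2F(2) - 1 reaches 0.95.
coverage_61 = 2 * t_cdf(2.0, 61) - 1
print(c, exact_ci, rule_ci, coverage_61)
```

In practice one would call a statistical package as in the tables of Section 5.2; the numerical integration here is only a self-contained stand-in.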

Confidence intervals are a simple yet effective tool to assess estimation uncertainty. When reading a set of empirical results look at the coefficient estimates and the standard errors. For a parameter of interest compute the confidence interval $\hat C$ and consider the meaning of the spread of the suggested values. If the range of values in the confidence interval is too wide to learn about $\beta$ then do not jump to a conclusion about $\beta$ based on the point estimate alone.

5.11 Confidence Intervals for Error Variance

We can also construct a confidence interval for the regression error variance $\sigma^2$ using the sampling distribution of $s^2$ from Theorem 5.7. This states that in the normal regression model
$$\frac{(n-k)s^2}{\sigma^2}\sim\chi^2_{n-k}. \tag{5.11}$$
Let $F(u)$ denote the $\chi^2_{n-k}$ distribution function and for some $\alpha$ set $c_1=F^{-1}(\alpha/2)$ and $c_2=F^{-1}(1-\alpha/2)$ (the $\alpha/2$ and $1-\alpha/2$ quantiles of the $\chi^2_{n-k}$ distribution). Equation (5.11) implies that
$$P\left[c_1\le\frac{(n-k)s^2}{\sigma^2}\le c_2\right]=F(c_2)-F(c_1)=1-\alpha.$$
Rewriting the inequalities we find
$$P\left[\frac{(n-k)s^2}{c_2}\le\sigma^2\le\frac{(n-k)s^2}{c_1}\right]=1-\alpha.$$
This shows that an exact $1-\alpha$ confidence interval for $\sigma^2$ is
$$\hat C=\left[\frac{(n-k)s^2}{c_2},\ \ \frac{(n-k)s^2}{c_1}\right]. \tag{5.12}$$

Theorem 5.11 In the normal regression model (5.12) has coverage probability $P[\sigma^2\in\hat C]=1-\alpha$.

The confidence interval (5.12) for $\sigma^2$ is asymmetric about the point estimate $s^2$ due to the latter’s asymmetric sampling distribution.
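A minimal sketch of interval (5.12) follows. The Python standard library has no chi-square functions, so this example assumes a hand-rolled CDF based on the power series for the regularized lower incomplete gamma function, with the quantile found by bisection. The helper names and the inputs ($s^2$, $n$, $k$) are hypothetical.

```python
import math

def chi2_cdf(x, df):
    """Chi-square CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, z = df / 2, x / 2
    term = 1.0 / a            # n = 0 term of sum_n z^n / (a (a+1) ... (a+n))
    total = term
    j = 1
    while term > 1e-15 * total:
        term *= z / (a + j)
        total += term
        j += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))

def chi2_quantile(p, df):
    """Inverse CDF by bisection on a bracket wide enough for moderate df and p."""
    lo, hi = 0.0, df + 10 * math.sqrt(2 * df) + 50
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical variance estimate and sample dimensions.
s2, n, k, alpha = 1.5, 12, 2, 0.05
df = n - k
c1 = chi2_quantile(alpha / 2, df)        # lower quantile
c2 = chi2_quantile(1 - alpha / 2, df)    # upper quantile
ci = (df * s2 / c2, df * s2 / c1)        # interval (5.12)
print(c1, c2, ci)
```

The resulting interval extends further above $s^2$ than below it, which is the asymmetry noted in the text.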

5.12 t Test

A typical goal in an econometric exercise is to assess whether or not a coefficient $\beta$ equals a specific value $\beta_0$. Often the specific value to be tested is $\beta_0=0$ but this is not essential. This is called hypothesis testing, a subject which will be explored in detail in Chapter 9. In this section and the following we give a short introduction specific to the normal regression model.

For simplicity write the coefficient to be tested as $\beta$. The null hypothesis is
$$H_0:\beta=\beta_0. \tag{5.13}$$
This states that the hypothesis is that the true value of $\beta$ equals the hypothesized value $\beta_0$. The alternative hypothesis is the complement of $H_0$, and is written as
$$H_1:\beta\ne\beta_0.$$
This states that the true value of $\beta$ does not equal the hypothesized value.

We are interested in testing $H_0$ against $H_1$. The method is to design a statistic which is informative about $H_1$. If the observed value of the statistic is consistent with random variation under the assumption that $H_0$ is true, then we deduce that there is no evidence against $H_0$ and consequently do not reject $H_0$. However, if the statistic takes a value which is unlikely to occur under the assumption that $H_0$ is true, then we deduce that there is evidence against $H_0$ and consequently we reject $H_0$ in favor of $H_1$. The main steps are to design a test statistic and to characterize its sampling distribution.

The standard statistic to test $H_0$ against $H_1$ is the absolute value of the t-statistic
$$|T|=\left|\frac{\hat\beta-\beta_0}{s(\hat\beta)}\right|. \tag{5.14}$$
If $H_0$ is true then we expect $|T|$ to be small, but if $H_1$ is true then we would expect $|T|$ to be large. Hence the standard rule is to reject $H_0$ in favor of $H_1$ for large values of the t-statistic $|T|$ and otherwise fail to reject $H_0$. Thus the hypothesis test takes the form

Reject $H_0$ if $|T|>c$.

The constant $c$ which appears in the statement of the test is called the critical value. Its value is selected to control the probability of false rejections. When the null hypothesis is true $T$ has an exact $t_{n-k}$ distribution in the normal regression model. Thus for a given value of $c$ the probability of false rejection is
$$\begin{aligned}
P\left[\text{Reject } H_0\mid H_0\right] &= P\left[|T|>c\mid H_0\right] \\
&= P\left[T>c\mid H_0\right]+P\left[T<-c\mid H_0\right] \\
&= 1-F(c)+F(-c) \\
&= 2(1-F(c))
\end{aligned}$$
where $F(u)$ is the $t_{n-k}$ distribution function. This is the probability of false rejection and is decreasing in the critical value $c$. We select the value $c$ so that this probability equals a pre-selected value called the significance level which is typically written as $\alpha$. It is conventional to set $\alpha=0.05$, though this is not a hard rule. We then select $c$ so that $F(c)=1-\alpha/2$, which means that $c$ is the $1-\alpha/2$ quantile (inverse CDF) of the $t_{n-k}$ distribution, the same as used for confidence intervals. With this choice the decision rule “Reject $H_0$ if $|T|>c$” has a significance level (false rejection probability) of $\alpha$.

Theorem 5.12 In the normal regression model if the null hypothesis (5.13) is true, then for $|T|$ defined in (5.14) $T\sim t_{n-k}$. If $c$ is set so that $P\left[|t_{n-k}|\ge c\right]=\alpha$ then the test “Reject $H_0$ in favor of $H_1$ if $|T|>c$” has significance level $\alpha$.

To report the result of a hypothesis test we need to pre-determine the significance level $\alpha$ in order to calculate the critical value $c$. This can be inconvenient and arbitrary. A simplification is to report what is known as the p-value of the test. In general, when a test takes the form “Reject $H_0$ if $S>c$” and $S$ has null distribution $G(u)$ then the p-value of the test is $p=1-G(S)$. A test with significance level $\alpha$ can be restated as “Reject $H_0$ if $p<\alpha$”. It is sufficient to report the p-value $p$ and we can interpret the value of $p$ as indexing the test’s strength of rejection of the null hypothesis. Thus a p-value of 0.07 might be interpreted as “nearly significant”, 0.05 as “borderline significant”, and 0.001 as “highly significant”. In the context of the normal regression model the p-value of a t-statistic $|T|$ is $p=2(1-F_{n-k}(|T|))$ where $F_{n-k}$ is the $t_{n-k}$ CDF. For example, in MATLAB the calculation is 2*(1-tcdf(abs(t),n-k)). In Stata, the default is that for any estimated regression, t-statistics for each estimated coefficient are reported along with their p-values calculated using this same formula. These t-statistics test the hypotheses that each coefficient is zero.

A p-value reports the strength of evidence against $H_0$ but is not itself a probability. A common misunderstanding is that the p-value is the “probability that the null hypothesis is true”. This is an incorrect interpretation. It is a statistic, is random, and is a measure of the evidence against $H_0$. Nothing more.
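The test and its p-value can be sketched as follows. As an assumption of this example, the $t_{n-k}$ CDF is approximated by Simpson's rule on the t density, because the Python standard library has no built-in student t CDF; the function names and the numerical inputs are hypothetical.

```python
import math

def t_cdf(x, df, steps=2000):
    """Student t CDF via Simpson's rule on [0, |x|] plus symmetry (a stdlib stand-in)."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(x)
    h = a / steps
    total = pdf(0.0) + pdf(a) + sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_test(beta_hat, se, beta_0, df, alpha=0.05):
    """Return |T| from (5.14), the p-value 2(1 - F(|T|)), and the reject decision."""
    abs_t = abs((beta_hat - beta_0) / se)
    p_value = 2 * (1 - t_cdf(abs_t, df))
    return abs_t, p_value, p_value < alpha

# Hypothetical estimate and standard error with n - k = 27, testing beta = 0.
abs_t, p_value, reject = t_test(beta_hat=0.80, se=0.25, beta_0=0.0, df=27)
print(abs_t, p_value, reject)
```

Rejecting when the p-value is below $\alpha$ is the same decision as rejecting when $|T|$ exceeds the $1-\alpha/2$ quantile of $t_{n-k}$, so the two ways of stating the test agree.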

5.13 Likelihood Ratio Test

In the previous section we described the t-test as the standard method to test a hypothesis on a single coefficient in a regression. In many contexts, however, we want to simultaneously assess a set of coefficients. In the normal regression model, this can be done by an F test which can be derived from the likelihood ratio test.

Partition the regressors as $X=(X_1',X_2')'$ and similarly partition the coefficient vector as $\beta=(\beta_1',\beta_2')'$. The regression model can be written as
$$Y=X_1'\beta_1+X_2'\beta_2+e. \tag{5.15}$$
Let $k=\dim(X)$, $k_1=\dim(X_1)$, and $q=\dim(X_2)$, so that $k=k_1+q$. Partition the variables so that the hypothesis is that the second set of coefficients are zero, or
$$H_0:\beta_2=0. \tag{5.16}$$
If $H_0$ is true then the regressors $X_2$ can be omitted from the regression. In this case we can write (5.15) as
$$Y=X_1'\beta_1+e. \tag{5.17}$$
We call (5.17) the null model. The alternative hypothesis is that at least one element of $\beta_2$ is non-zero and is written as $H_1:\beta_2\ne 0$.

When models are estimated by maximum likelihood a well-accepted testing procedure is to reject $H_0$ in favor of $H_1$ for large values of the Likelihood Ratio, the ratio of the maximized likelihood function under $H_1$ and $H_0$, respectively. We now construct this statistic in the normal regression model. Recall from (5.6) that the maximized log-likelihood equals
$$\ell_n\left(\hat\beta,\hat\sigma^2\right)=-\frac{n}{2}\log\left(2\pi\hat\sigma^2\right)-\frac{n}{2}.$$
