Section 10.1 in DeGroot and Schervish describes the method of least squares, also known as linear regression. R does these calculations much more easily than they can be done by hand.
Here are the calculations for the first model fit in Section 10.1 in DeGroot and Schervish, done using R.
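Since the original interactive form is not reproduced here, the following is a minimal sketch of what the fit might look like, assuming the data from Section 10.1 are already loaded into vectors named x and y.

out <- lm(y ~ x)   # fit the simple linear regression model
summary(out)       # print coefficients, standard errors, and tests
plot(x, y)         # scatterplot of the data
abline(out)        # add the fitted regression line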
The regression coefficients reported by the printout, β0 = −0.7861 and β1 = 0.6850, are the same as DeGroot and Schervish give below their equation (10.1.5), and the plot is the same as their Figure 10.4.
The unbiased estimate of the error variance σ² is 1.083², where 1.083 is the Residual standard error reported just below the coefficients table.
The R function lm fits the linear model. The expression y ~ x specifies the model. More about model formulas is given in the R documentation for model formulas, but you don't need to look at that. We'll cover all the formulas you need to know about in this web page. For now it's enough to know that the formula specifies the predictor x and the response y.
If these variables were instead named fred and sally, respectively, the model formula would have to be sally ~ fred and the whole model fitting statement would be
out <- lm(sally ~ fred)
Note that the model formula doesn't mention regression coefficients explicitly. There is one regression coefficient for each predictor in the formula (that is, there is a β1 for the predictor x), and there is also a regression coefficient for the constant predictor (that is, there is a β0 too) included by default.
The model fit is saved in a variable out because we generally want to do many things with it. In this example, we use it twice. In other examples, we will use it more.
Here are the calculations for Example 10.1.1 in DeGroot and Schervish, done using R.
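Again as a rough sketch (assuming the same x and y vectors are already loaded), the quadratic fit might look something like this.

out <- lm(y ~ x + I(x^2))   # fit the quadratic regression model
summary(out)                # print coefficients, standard errors, and tests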
The regression coefficients reported by the printout
are the same as DeGroot and Schervish give in their equation (10.1.10) and the plot is the same as their Figure 10.5 (except for not having the line from the other fit).
The R function lm fits the linear model. The expression y ~ x + I(x^2) specifies the model. More about model formulas is given in the R documentation for model formulas, but you don't need to look at that. We'll cover all the formulas you need to know about in this web page. For now it's enough to know that the formula specifies the predictors x and x^2 and the response y.
The plus sign (+) is magical. It separates different predictors. The I function wraps the second predictor. It is necessary to write I(x^2) instead of just x^2 because otherwise the hat (^) would not be interpreted correctly: it has a special meaning in model formulas. Generally, it is safest to wrap all complicated expressions for predictors in I(). For example, you would write I(sin(x)) to use sin(x) as a predictor. Only simple variable names do not have to be wrapped.
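As a minimal illustration (not from the original page, and assuming the same x and y), here is the difference between the two formulas.

lm(y ~ x + x^2)      # wrong: ^ is a formula operator here, so this fits only y ~ x
lm(y ~ x + I(x^2))   # right: I() protects x^2, so the squared term is a predictor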
Note that the model formula doesn't mention regression coefficients explicitly. There is one regression coefficient for each predictor in the formula (that is, there is a β1 for the predictor x and a β2 for the predictor I(x^2)), and there is also a regression coefficient for the constant predictor (that is, there is a β0 too) included by default.
We will just treat the curve function that draws the sample regression curve on the plot as magic, because it is too complicated to explain here. It is, of course, documented in the on-line help, but you don't need to look at that. For comparison, we add the regression line from the simple linear regression fit as a dotted line.
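A minimal sketch of what that plotting code might look like, assuming out holds the quadratic fit and out.linear (a hypothetical name, not from the original page) holds the simple linear fit:

plot(x, y)                                      # scatterplot of the data
curve(predict(out, newdata = data.frame(x = x)),
      add = TRUE)                               # sample regression curve
abline(out.linear, lty = "dotted")              # simple linear fit, for comparison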
Here are the calculations for Example 10.1.3 in DeGroot and Schervish, done using R.
The regression coefficients reported by the printout
are the same as DeGroot and Schervish give in their equation (10.1.15).
The R function lm fits the linear model. The expression y ~ x1 + x2 specifies the model. Note that we can't call both predictor variables by the same name. Here we call them x1 and x2.
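As a sketch (assuming the data for Example 10.1.3 are already loaded into vectors named x1, x2, and y), the fit might look like this.

out <- lm(y ~ x1 + x2)   # fit the multiple regression model
summary(out)             # print coefficients, standard errors, and tests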
DeGroot and Schervish use an example for hypothesis tests for which they don't give the data. We'll use the example for simple linear regression and the example for quadratic regression which were done above.
The form below redoes those examples, except for the plots, which we don't need.
The columns labeled t value and Pr(>|t|) of the Coefficients: table give, respectively, the test statistic for a test of the hypothesis that the regression coefficient on that line of the table is actually zero (the t value column), and the P-value for the two-tailed test of that hypothesis (the Pr(>|t|) column).
For example, if we want to do the test with regression function g(x) = β0 + β1 x and hypotheses H0: β1 = 0 versus H1: β1 ≠ 0, then the value of the test statistic is T = 3.801. Assuming the null hypothesis and assuming the standard regression assumptions (i.i.d. normal errors), this test statistic has a Student t distribution with n − 2 degrees of freedom. The corresponding P-value is P = 0.00523 (two-tailed). But you don't have to do a look-up in a table of the t distribution; R does it for you.
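As a check (this snippet is not part of the original page), the two-tailed P-value can be reproduced from the test statistic with the pt function. The quadratic regression fit for these data has 7 residual degrees of freedom with three coefficients (see the printout later on this page), so n = 10 and the simple linear regression has n − 2 = 8 residual degrees of freedom.

tstat <- 3.801
2 * pt(-abs(tstat), df = 8)   # two-tailed P-value, approximately 0.005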
The interpretation of P = 0.00523 for this test is, of course, that the null hypothesis is rejected and the true unknown population regression coefficient β1 is nonzero.
If you wanted a one-tailed rather than a two-tailed test, you would have to divide the given P-value by 2.
For another example, if we want to do the test with regression function g(x) = β0 + β1 x + β2 x² and hypotheses H0: β2 = 0 versus H1: β2 ≠ 0, then the value of the test statistic is T = 0.092 and the P-value is P = 0.929.
The interpretation of this P-value is, of course, that the null hypothesis is accepted and the true unknown population regression coefficient is zero. As usual, that doesn't prove the population regression coefficient is actually zero. It only says that the data at hand don't give any real evidence it is not zero.
The "by hand" here doesn't mean completely by hand. R does most of the work but leaves a bit for you.
Since the calculations are very much like the calculations for confidence intervals, we will put this out of sequence, in the section on confidence intervals in the subsection on general tests about regression coefficients.
The quadratic regression is unnecessary for these data. It appears that the linear model with the regression function g(x) = β0 + β1 x fits the data well.
The linear regression does appear to be necessary for these data. It appears that the model with the constant regression function g(x) = β0 does not fit the data well. So the linear model is necessary.
Remember the dogma: a statistical hypothesis test is valid only when you do only one test on a data set.
We can push that a little bit. It's not too bogus if we do only one test per regression.
But we should never do more than one test on
one regression. That means we should never look at more than one
of the P-values printed in the Pr(>|t|)
column of the Coefficients:
table in a regression output.
The two regressions in this section are a good example of this issue.
Regression coefficients in simple linear regression are intimately related to correlation coefficients. The simple relation β1 = ρ σY / σX relates the two. Thus the slope of the population regression line is zero if and only if the correlation of X and Y is zero. And thus the test with null hypothesis β1 = 0 done above is also a test with null hypothesis ρ = 0.
You only do one test, but you are entitled to woof about it two different ways.
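As an illustrative sketch (not part of the original page, and assuming x and y are the simple linear regression data), the equivalence can be seen in R: the P-value from cor.test is the same as the P-value for the slope in the regression printout.

out <- lm(y ~ x)
summary(out)      # the P-value for x tests H0: beta1 = 0
cor.test(x, y)    # this P-value tests H0: rho = 0 and is identical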
Also of interest in multiple regression is a test of whether there are any regression coefficients that are significantly nonzero, except for the coefficient β0 that goes with the constant predictor and is usually not of interest. That is, we want to do a test of the null hypothesis that all the regression coefficients except β0 are zero (H0: β1 = β2 = ... = βk = 0) against the alternative that at least one of them is nonzero.
This test is often not of particular research
interest. It serves
as a straw man to knock down
. The null hypothesis is generally
thought to be false and is easily rejected with a reasonable amount of data.
But it is important to do the test anyway. When the null hypothesis cannot be rejected, this means the data are completely worthless. The model that has the constant regression function fits as well as the regression model.
Thus if this test fails to reject H0 the right thing to do is throw the data in the trash. They're worthless. No further analysis need be done.
The idea that their precious data, obtained at great cost in time, money, or effort, might actually be worthless upsets some people so much that they don't even want to think about doing this test. But they should. Drawing conclusions from worthless data makes you a fool, whether or not you are aware of your foolishness. If someone else reanalyzes the data, they'll find out.
R does this test in every regression printout. For the quadratic regression example done above, part of the printout was
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.7451 0.7319 -1.018 0.343
x 0.6186 0.7500 0.825 0.437
I(x^2) 0.0126 0.1373 0.092 0.929
Residual standard error: 1.157 on 7 degrees of freedom
Multiple R-Squared: 0.644, Adjusted R-squared: 0.5423
F-statistic: 6.332 on 2 and 7 DF, p-value: 0.02692
The last line describes this omnibus
test.
The interpretation is that the null hypothesis is rejected (P = 0.02692). So the regression is not completely worthless. Either β1 or β2 appears to be nonzero (or perhaps both).
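As a check (not part of the original page), the P-value on the last line can be reproduced from the F statistic and its degrees of freedom with the pf function.

pf(6.332, df1 = 2, df2 = 7, lower.tail = FALSE)   # upper-tail probability, about 0.027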
Note that this is quite the opposite conclusion from what one would get
if one were foolish enough to do two tests on the same regression using
the P-values for the regression coefficients in the
Pr(>|t|)
column of the Coefficients:
table.
Yet another example of the reason for the "only one test" rule.
The theory of this test is given in the following section.
In this section we consider testing a big model versus a little model. The null hypothesis is the little model, and the alternative hypothesis is the big model, where the little model is a submodel of the big model (the little model is obtained by setting some of the regression coefficients of the big model to zero).
As an example, let us consider the multiple regression example done above. We use the model from that fit as the little model. And we consider a quadratic model as the big model. A quadratic model has three more terms: x1², x2², and x1 x2. The regression function for the big model is
g(x1, x2) = β0 + β1 x1 + β2 x2 + β3 x1² + β4 x2² + β5 x1 x2
We obtain the regression function for the little model by setting β3 = β4 = β5 = 0. So the little model is indeed a submodel of the big model (as the test requires).
To compare the fits of the two models we first do the fits,
which we save in the R objects out.little
and out.big
. Then we compare the models using
the anova
function. The printout of that function
is a so-called ANOVA (analysis of variance) table, which dates
back to the days of hand calculation and gives old timers a
warm fuzzy feeling. For our purposes we only need to get
the test statistic (F = 1.5442) and P-value
(P = 0.3337) out of this table.
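A sketch of what those commands might look like, assuming the quadratic terms described above (the object names out.little and out.big match the text):

out.little <- lm(y ~ x1 + x2)
out.big <- lm(y ~ x1 + x2 + I(x1^2) + I(x2^2) + I(x1 * x2))
anova(out.little, out.big)   # F test comparing the little and big models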
The interpretation of the test is that the null hypothesis is accepted. The data give no evidence of statistically significant departure from the little model. That doesn't prove the little model is actually correct, only that the data give no evidence it isn't.
Note that you can't get the same effect by looking at the three P-values for the three regression coefficients separately. That violates the "do only one test" rule.
If you want to make this model comparison, you need to do this test.
The omnibus test
of the preceding section is the special case
of the test of this section where the little model has only the constant
predictor.
This test is based on the following theorem: if the little model is correct, then the difference in residual sums of squares between the two models and the residual sum of squares for the big model, each divided by σ², are independent chi-squared random variables whose degrees of freedom are, respectively, the difference in residual degrees of freedom and the residual degrees of freedom of the big model.
Hence we can form an F statistic by dividing each sum of squares
by its degrees of freedom and forming the ratio. And that's what the
anova
function does.
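As a sketch of that arithmetic (not part of the original page), the F statistic and P-value from the anova table can be reproduced by hand from the two fitted model objects.

ss.little <- deviance(out.little)      # residual sum of squares, little model
ss.big <- deviance(out.big)            # residual sum of squares, big model
df.little <- df.residual(out.little)   # residual degrees of freedom, little model
df.big <- df.residual(out.big)         # residual degrees of freedom, big model
f <- ((ss.little - ss.big) / (df.little - df.big)) / (ss.big / df.big)
pf(f, df.little - df.big, df.big, lower.tail = FALSE)   # P-value for the comparison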
DeGroot and Schervish (pp. 626-627) use an example for confidence intervals that has no data, so again we'll use the example for quadratic regression that was done above.
R doesn't do regression confidence intervals for you. It does most of the work but leaves a bit for you. Let's again look at the regression printout for that example.
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.7451 0.7319 -1.018 0.343
x 0.6186 0.7500 0.825 0.437
I(x^2) 0.0126 0.1373 0.092 0.929
Residual standard error: 1.157 on 7 degrees of freedom
Multiple R-Squared: 0.644, Adjusted R-squared: 0.5423
F-statistic: 6.332 on 2 and 7 DF, p-value: 0.02692
The printout gives us three numbers needed to construct the confidence interval: the estimate for the coefficient of I(x^2) (0.0126), its standard error (0.1373), and the residual degrees of freedom (7). The same three numbers are also needed for general tests about regression coefficients described in the following section.
Thus one makes up a confidence interval in the usual way: estimate ± critical value × standard error, where the critical value here comes from the t distribution with the residual degrees of freedom stated. If we want a 95% confidence interval, the critical value for 7 degrees of freedom is given by
Rweb:> qt(0.975, 7)
[1] 2.364624
and then the confidence interval for the parameter β2 for these data is given by
Rweb:> 0.0126 + c(-1, 1) * qt(0.975, 7) * 0.1373
[1] -0.3120629 0.3372629
Of course, this whole calculation can be done by hand with a calculator and a table of the t distribution starting with the regression output. No further use of R is necessary after getting the regression output.
General tests (with the hypothesized value under the null hypothesis some number other than zero) are done by hand in the same way as the confidence intervals. We use the same three numbers noted above.
Suppose we want to test H0: β2 ≥ 0.25 versus H1: β2 < 0.25, a one-tailed test with hypothesized value 0.25. The test statistic for a t test is
T = (estimate − hypothesized value) / (standard error)
In this example that is
Rweb:> (0.0126 - 0.25) / 0.1373
[1] -1.729060
And we look up the P-value using the pt
function.
Rweb:> tstat <- (0.0126 - 0.25) / 0.1373
Rweb:> pt(tstat, 7)
[1] 0.06371217
This is nearly, but not quite, statistically significant at the 0.05 level (one-tailed).
We can also get a confidence interval for the value of the regression function g(x) at some predictor point x = (x1, ..., xk).
The R function predict
does this.
For a first example, take the quadratic regression example we used frequently in this web page. Suppose we want to estimate g(x) for x = 4.5.
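A sketch of what the predict call might look like, assuming out holds the quadratic fit from above:

predict(out, newdata = data.frame(x = 4.5), interval = "confidence")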
The interval: (0.97802, 3.60901)
If the predictor is a vector, then each component must be supplied
in the newdata
argument.
This time we use the multiple regression example we used frequently in this web page. Suppose we want to estimate g(x1, x2) for x1 = 2.0 and x2 = 64.
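Again as a sketch, assuming out holds the multiple regression fit:

predict(out, newdata = data.frame(x1 = 2.0, x2 = 64), interval = "confidence")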
We can also get a prediction interval for a new response value, independent of the data at hand, associated with the predictor point x = (x1, ..., xk).
The R function predict
also does this.
Just use interval = "prediction"
instead of interval = "confidence"
.
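A sketch of the call, again using the quadratic fit at x = 4.5:

predict(out, newdata = data.frame(x = 4.5), interval = "prediction")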
The interval: (-0.741536, 5.328566)