Statistics 3011 (Geyer and Jones, Spring 2006) Examples:
Inference for Regression

Doing Regression

Long ago we learned how to do regression.

The regression output contains most of what we need to make confidence intervals and do tests of significance for linear regression.

Confidence Intervals for the Regression Slope

We redo Example 21.4 in the textbook.

We use the R functions lm (on-line help) and summary.lm (on-line help).


Although this example is not about making the scatterplot and adding the regression line, we do that too and get a plot just like Figure 21.1 in the textbook.
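For concreteness, here is a minimal sketch of the commands behind this example. The dataset URL below is only a placeholder, and the response column name IQ is an assumption on our part; the predictor name Crying is confirmed by the printout below.

# a minimal sketch, not the exact Rweb submission
# the URL is a placeholder and the response name IQ is an assumption;
# the predictor name Crying matches the printout below
foo <- read.table("http://www.example.com/crying.txt", header = TRUE)
out <- lm(IQ ~ Crying, data = foo)
summary(out)
plot(foo$Crying, foo$IQ)   # scatterplot like Figure 21.1
abline(out)                # add the least-squares regression line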

The printout

Coefficients: 
            Estimate Std. Error t value Pr(>|t|)     
(Intercept)   91.268      8.934  10.216  3.5e-12 *** 
Crying         1.493      0.487   3.065  0.00411 **  
--- 
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1  
 
Residual standard error: 17.5 on 36 degrees of freedom 

contains everything we need to make a confidence interval for the slope.

In the calculation in the textbook, they use the t critical value for 30 degrees of freedom because 36 degrees of freedom is not in the book's table.

We use R to do the right thing.

Rweb:> 1.493 + c(-1,1) * qt(0.975, df = 36) * 0.487 
[1] 0.5053182 2.4806818

The critical value was

Rweb:> qt(0.975, df = 36) 
[1] 2.028094 

and the margin of error was

Rweb:> qt(0.975, df = 36) * 0.487 
[1] 0.9876818

In practice we would round, reporting either

1.493 ± 0.988

or

(0.505, 2.481)

Not much different, but in real life (as opposed to classwork) there is no excuse for using 30 degrees of freedom when 36 degrees of freedom is correct.
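By the way, R can compute this confidence interval in one step with the function confint, which uses the unrounded estimate and standard error, so its answer may differ from the hand calculation above in the last digit or so. A sketch, assuming out is the fitted object from the sketch above:

# confidence interval for the slope, straight from the fitted object
confint(out, parm = "Crying", level = 0.95)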

Testing the Hypothesis of No Linear Relationship

The section title agrees with the textbook. This is a test of whether the true population regression function (assuming all the assumptions in the box on p. 564 in the textbook hold) has zero slope.

The R regression printout contains the test statistic and P-value for this test.

We redo Example 21.5 in the textbook.

The necessary regression printout is given in the output above. We merely highlight a different bit.

Coefficients: 
            Estimate Std. Error t value Pr(>|t|)     
(Intercept)   91.268      8.934  10.216  3.5e-12 *** 
Crying         1.493      0.487   3.065  0.00411 **  
--- 
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1  

The P-value (P = 0.004) agrees with the one given in the textbook (because the book used technology on this one).

Interpretation: These data do appear to have a linear relationship. More precisely, a linear relationship is better supported than a constant relationship (a regression function with slope zero).

The test does not say anything whatsoever about the presence or absence of a nonlinear relationship.
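If we want the test statistic and P-value without reading them off the printout, they can be extracted from the summary. A sketch, again assuming out is the fitted object from the sketch above:

# the coefficients component of a summary.lm object is a matrix
# with one row per regression coefficient
sout <- summary(out)
sout$coefficients["Crying", c("t value", "Pr(>|t|)")]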

Testing Lack of Correlation

This section is really short. Under the assumptions in the box on p. 564 in the textbook, the correlation between the two variables is zero if and only if their population regression function has zero slope.

Therefore the heading of this section and the heading of the preceding section describe exactly the same test.
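As a check, the R function cor.test does the test under this second heading directly, and it produces exactly the same t statistic and P-value as the slope test above. A sketch, assuming the column names from our earlier sketch:

# Pearson correlation test: t = r * sqrt(n - 2) / sqrt(1 - r^2)
# on n - 2 degrees of freedom, identical to the slope t test
cor.test(foo$Crying, foo$IQ)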

Inference about Prediction

We redo Examples 21.6 and 21.9 in the textbook.

We use the R functions lm (on-line help) and predict.lm (on-line help).

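The printout below refers to a fitted object out. Here is a minimal sketch of the fit behind it; the dataset URL is a placeholder and the response name BAC is an assumption on our part, while the predictor name Beers is confirmed by the newdata argument below.

# a minimal sketch of the fit behind the printout below
foo <- read.table("http://www.example.com/beers.txt", header = TRUE)
out <- lm(BAC ~ Beers, data = foo)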

The intervals are

Rweb:> ##### confidence interval for mean response ##### 
Rweb:> predict(out, newdata = data.frame(Beers = 5), 
+     interval = "confidence") 
           fit        lwr        upr 
[1,] 0.0771182 0.06611536 0.08812105
Rweb:> ##### prediction interval for single observation ##### 
Rweb:> predict(out, newdata = data.frame(Beers = 5), 
+     interval = "prediction") 
           fit        lwr       upr 
[1,] 0.0771182 0.03191712 0.1223193

Again, they agree with the textbook because the textbook used technology.

Multiple Regression

We use the data from Example 21.4 in the textbook that we also used above.

This time we fit a quadratic regression function. We assume

Y_i = α + β X_i + γ X_i^2 + σ Z_i

where the Z_i are IID standard normal.

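A minimal sketch of the quadratic fit, again assuming the response column is named IQ. The I function protects the arithmetic expression Crying^2 inside the model formula, so R treats it as a new predictor rather than as formula notation:

out2 <- lm(IQ ~ Crying + I(Crying^2), data = foo)
summary(out2)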

We are most interested here in a test of significance rather than a confidence interval. We wish to test

H0: γ = 0
Ha: γ ≠ 0

The null hypothesis here is that the coefficient of the quadratic term is zero, which means the linear regression done above is correct (more precisely, that a quadratic function seems no better).

The entire test is done in the printout.

Coefficients: 
             Estimate Std. Error t value Pr(>|t|)     
(Intercept) 115.86938   25.12217   4.612 5.15e-05 *** 
Crying       -1.24247    2.65611  -0.468    0.643     
I(Crying^2)   0.06828    0.06518   1.048    0.302

The quadratic term is not statistically significant (P = 0.302, two-tailed t test).

Note that adding another term (here Crying^2) is easy when you use the computer. The hand calculations that would be involved in this example are so horrendous that no book written since the dawn of the computer age presents them.
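An equivalent way to do this test, assuming both fitted objects from the sketches above are around, is the F test comparing the two nested models. With a single added term the F statistic is the square of the t statistic, and the P-value is identical.

# F test for the quadratic model against the linear model;
# here F = t^2 and the P-value matches the t test (P = 0.302)
anova(out, out2)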