Up: Stat 5102

The data in the file

  http://www.stat.umn.edu/geyer/5102/highway.dat
are taken from the book Applied Linear Regression, by Weisberg. They are originally from a master's paper in civil engineering. The data give the automobile accident rate and 13 other variables for 39 sections of large highways in Minnesota in 1973.

RATE automobile accidents per million vehicle miles
LEN length of the highway segment in miles
ADT average daily traffic in thousands of vehicles (estimated)
TRKS truck volume as a percent of total volume
SLIM speed limit
LWID lane width in feet
SHLD outer shoulder width in feet
ITG number of freeway-type interchanges per mile in the segment
SIGS number of signalized interchanges per mile in the segment
ACPT number of access points per mile in the segment
LANE number of lanes (both directions)
FAI one indicates federal aid highway (otherwise zero)
PA one indicates principal arterial highway (otherwise zero)
MA one indicates major arterial highway (otherwise zero)

The question of interest is whether any of the other variables are associated with accident rate.

We want to use this data set as an example of all-subsets regression. The commands


y <- RATE
x <- as.matrix(X[ , -1])
will give you a response vector y and a design matrix x for these data (if you have read the data set into Rweb from the URL). Then you can follow Section 12.7 of the notes.
  1. Make a plot like Figure 12.9 in the notes. There is a problem with this plot: the $C_p$ value for the best model is actually negative, so it does not appear on the plot if we use a log scale. Try omitting the log="y" argument. To get a better view of the area where the action takes place, you can cut off the top of the plot by adding the argument ylim=c(-1,20).
  2. Get the printout for the best model (according to the $C_p$ criterion).
  3. What does the $C_p$ plot tell you about how many good models there are?
  4. Can we conclude that the best model according to the $C_p$ criterion tells us what variables influence the accident rate? If so, what variables do influence the accident rate?
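
To see why a $C_p$ value can be negative, recall the usual definition of Mallows' criterion, $C_p = \mathrm{SSResid}_p / \hat{\sigma}^2 + 2p - n$, where $p$ counts the regression coefficients in the submodel (intercept included) and $\hat{\sigma}^2$ is the error variance estimate from the full model. This notation is an assumption and may differ from the notes. Whenever $\mathrm{SSResid}_p / \hat{\sigma}^2 < n - 2p$ the value is negative, and a log-scale axis cannot show it. The following is a minimal base-R sketch of the computation over all subsets, not the method of Section 12.7; the helper name cp_all_subsets is hypothetical, and the commented usage lines have not been checked against the actual data file.

```r
# Sketch: Mallows' C_p for every subset of the columns of x, base R only.
# cp_all_subsets is a hypothetical helper, not a function from the notes.
cp_all_subsets <- function(x, y) {
  x <- as.matrix(x)
  n <- nrow(x)
  k <- ncol(x)
  nm <- colnames(x)
  if (is.null(nm)) nm <- paste0("V", seq_len(k))

  # Residual sum of squares for an intercept plus the chosen columns.
  rss <- function(cols) {
    design <- cbind(rep(1, n), x[, cols, drop = FALSE])
    sum(lm.fit(design, y)$residuals^2)
  }

  # Error variance estimated from the full model (assumes n > k + 1).
  sigma2 <- rss(seq_len(k)) / (n - k - 1)

  total <- 2^k
  p <- integer(total)
  cp <- numeric(total)
  vars <- character(total)
  i <- 0
  for (m in 0:k) {
    subsets <- if (m == 0) list(integer(0)) else combn(k, m, simplify = FALSE)
    for (s in subsets) {
      i <- i + 1
      p[i] <- m + 1  # number of coefficients, intercept included
      cp[i] <- rss(s) / sigma2 + 2 * p[i] - n
      vars[i] <- paste(nm[s], collapse = " ")
    }
  }
  data.frame(p = p, Cp = cp, vars = vars, stringsAsFactors = FALSE)
}

# Possible usage, with y and x defined by the commands given earlier:
# out <- cp_all_subsets(x, y)
# plot(out$p, out$Cp, ylim = c(-1, 20))   # linear scale, so negative values show
# out[which.min(out$Cp), ]                # subset with the smallest C_p
```

With 13 predictors this fits $2^{13} = 8192$ models, which is slow but feasible. For the full model the formula gives $C_p = p$ exactly, which is a convenient check on the computation.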


Charles Geyer
2001-04-23