Rules
No rules. This is practice.
Grades
No grades. This is practice.
Disclaimer
These practice problems are supplied without any guarantee that they will help you do the quiz problems. However, they were written after the quiz problems were written and with the intention that they would help.
These practice problems are also supplied without any guarantee that they are exactly or even nearly like the quiz problems. However, they are like at least some quiz problems in at least some respects.
Problem 1
The web page
contains two tables.
- Get the data in these tables using the R function readHTMLTable as described in the example in Section 4 of the course notes about data. (When I was reading this web page multiple times, getting my code right, there were a few failures when I got an empty list. Just try again if this happens to you.)
- Bizarrely, both tables seem to contain the same data and only one of them appears on the web page. Use R to prove that the two tables are identical except for their header rows.
- For the rest of this problem we just use the first table. Divide this table into two tables at the division in the headers where the black background becomes blue background (for anyone who cannot see these colors, the block with header Conference Games and the block with header Overall).
- Make each of the resulting two data frames have proper names taken from the row of the table read from the web that has items Pts., GP, and so forth. The first two columns, which do not have names in the header row, should be called Place and Team or some abbreviation thereof.
- Add a team names column to the second table (which started with the second column labeled GP).
- Print your tables so we can see how they look.
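The retry advice and the comparison of the two tables can be sketched roughly as follows. This is only an illustration under stated assumptions: read_with_retry, fake_reader, and strip are hypothetical names, the data are made up, and the header is assumed to occupy exactly the first row of each table, which may not match what readHTMLTable returns for the real page. In real use the reader would be something like function() readHTMLTable(u), with u the address of the web page.

```r
# Hypothetical helper: call `reader` until it returns a non-empty list,
# giving up after `max_tries` attempts.
read_with_retry <- function(reader, max_tries = 5) {
    for (i in seq_len(max_tries)) {
        result <- reader()
        if (length(result) > 0) return(result)
    }
    stop("still an empty list after ", max_tries, " tries")
}

# Stand-in for the real reader: returns an empty list twice, then succeeds.
# The data below are made up, not taken from the web page.
calls <- 0
fake_reader <- function() {
    calls <<- calls + 1
    if (calls < 3) return(list())
    body <- data.frame(V1 = c("1", "2"), V2 = c("Team A", "Team B"),
        V3 = c("10", "8"), stringsAsFactors = FALSE)
    head1 <- data.frame(V1 = "", V2 = "", V3 = "Pts.",
        stringsAsFactors = FALSE)
    head2 <- data.frame(V1 = "", V2 = "", V3 = "Points",
        stringsAsFactors = FALSE)
    list(rbind(head1, body), rbind(head2, body))
}

tabs <- read_with_retry(fake_reader)

# Drop the header row (ASSUMED to be row 1) and compare cell contents only.
# Converting to a matrix and removing dimnames ignores row and column names.
strip <- function(d) unname(as.matrix(d[-1, , drop = FALSE]))
same_body <- identical(strip(tabs[[1]]), strip(tabs[[2]]))

# The header rows themselves are compared separately.
same_header <- identical(unname(as.matrix(tabs[[1]][1, ])),
    unname(as.matrix(tabs[[2]][1, ])))

print(same_body)
print(same_header)
```

If the real tables carry more than one header row, the -1 in strip would have to become a longer index.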
No credit for just typing in stuff you can read yourself from the table. Imagine that your code is run automatically every week as the standings change (of course, they don't change now because the regular season is over, but they have been changing weekly since October).
You are allowed to type in one or two things. There seems to be no way to derive where to split the table in two except by looking at the web. If you view the HTML source, you can see that the colspan attributes of td elements govern which headers in row 1 cover which columns below, but the readHTMLTable function does not preserve that information, so it isn't in the object you get from it. I also lost one of the headers for the right part of the table because it went with the left part when I split it. So you can type in two things, but no more than that.
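Here is a minimal sketch of the split, rename, and add-team-column steps on a made-up table. Everything about the toy data is an assumption: the layout (block headers in row 1, column labels in row 2), the values, and the split position split_after, which for the real table has to be found by looking at the web page. The names tab, left, and right are hypothetical too.

```r
# Made-up stand-in for the first table as read from the web: row 1 holds
# the block headers, row 2 the column labels, remaining rows the data.
tab <- data.frame(
    V1 = c("", "", "1", "2"),
    V2 = c("", "", "Team A", "Team B"),
    V3 = c("Conference Games", "GP", "4", "4"),
    V4 = c("", "Pts.", "6", "2"),
    V5 = c("Overall", "GP", "10", "10"),
    V6 = c("", "W", "7", "3"),
    stringsAsFactors = FALSE
)

# Typed in by hand (an assumption that fits this toy table only):
# the last column belonging to the left block.
split_after <- 4

# Find the row that contains the item "Pts." and use it for names.
label_row <- unname(which(apply(tab, 1, function(r) "Pts." %in% r))[1])
labels <- unlist(tab[label_row, ], use.names = FALSE)
labels[1:2] <- c("Place", "Team")

# Keep only the rows below the label row, then split by column.
body <- tab[-seq_len(label_row), , drop = FALSE]
left <- body[, seq_len(split_after), drop = FALSE]
right <- body[, -seq_len(split_after), drop = FALSE]
names(left) <- labels[seq_len(split_after)]
names(right) <- labels[-seq_len(split_after)]

# Put the team names in front of the right-hand table.
right <- cbind(Team = left$Team, right, stringsAsFactors = FALSE)
rownames(left) <- NULL
rownames(right) <- NULL

print(left)
print(right)
```

Because nothing but split_after depends on the particular week's standings, rerunning the same code on a freshly read table should keep working as the numbers change, provided the page layout stays the same.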
Problem 2
This problem uses the data read in by
foo <- read.csv("http://www.stat.umn.edu/geyer/s17/3701/data/p4p2.csv")
which makes foo a data frame having variables x1, x2, and y, all of which are quantitative.
Treat y as the response to be predicted by the other two variables.
Following the example in Section 3.4.1 of the course notes about statistical models,
- Fit a robust regression that has each of the predictor variables as main effects (no interactions).
- Also fit the analogous linear regression.
- We don't yet know a good way to compare the models (bootstrap, eventually), but if we really need robust regression, least squares is no good. Make a plot of residuals versus fitted values for each fit. Do you see anything fishy about either one?
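The three steps above might look something like the following sketch. Two assumptions are worth flagging. First, R function rlm in package MASS is used here as one available robust fitter; the example in the course notes may well use a different function, in which case follow the notes. Second, the data are simulated stand-ins with the same variable names, so that the block runs without a network connection; for the actual problem, replace the simulation with the read.csv call given above.

```r
library(MASS)  # provides rlm; MASS ships with standard R distributions

# Simulated stand-in data (NOT the data for this problem).
set.seed(42)
n <- 100
foo <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
foo$y <- 1 + 2 * foo$x1 - foo$x2 + rnorm(n)
foo$y[1:5] <- foo$y[1:5] + 20   # a few gross outliers

# Robust fit and the analogous least squares fit, main effects only.
rout <- rlm(y ~ x1 + x2, data = foo)
lout <- lm(y ~ x1 + x2, data = foo)

# Residuals versus fitted values, one panel per fit.
old_par <- par(mfrow = c(1, 2))
plot(fitted(rout), residuals(rout), xlab = "fitted values",
    ylab = "residuals", main = "robust (rlm)")
abline(h = 0, lty = 2)
plot(fitted(lout), residuals(lout), xlab = "fitted values",
    ylab = "residuals", main = "least squares (lm)")
abline(h = 0, lty = 2)
par(old_par)
```

Putting the two plots side by side makes it easier to see whether a handful of points sit far from the rest in one fit but not the other.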
Problem 3
This problem uses the data read in by
foo <- read.csv("http://www.stat.umn.edu/geyer/s17/3701/data/p4p3.csv")
which makes foo a data frame having variables x1, x2, and y, all of which are quantitative.
Treat y as the response to be predicted by the other two variables.
Following the example in Section 3.4.2.3 of the course notes about statistical models,
- Fit a smooth additive regression that has each of the predictor variables as main effects (no interactions). You may need to look at the help for R function gam to fit smooth terms of two variables.
- Fit a smooth additive regression that does have interaction, that is, the regression function is a smooth function of both predictor variables. The examples in the help for gam also show that. Since the options for the s function are rather bewildering, just use the defaults.
- Which model is simpler in terms of the number of degrees of freedom reported for the terms?
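As a rough sketch of the two fits and the degrees-of-freedom comparison, assuming that the gam in question is the one in R package mgcv and using simulated stand-in data (not the data for this problem) so that the block runs offline:

```r
library(mgcv)  # provides gam and s; mgcv ships with standard R distributions

# Simulated stand-in data (NOT the data for this problem).
set.seed(42)
n <- 200
foo <- data.frame(x1 = runif(n), x2 = runif(n))
foo$y <- sin(2 * pi * foo$x1) + foo$x2^2 + rnorm(n, sd = 0.3)

# Additive model: one smooth term per predictor, no interaction.
gout_add <- gam(y ~ s(x1) + s(x2), data = foo)

# Model with interaction: a single smooth function of both predictors,
# with all options of s left at their defaults.
gout_int <- gam(y ~ s(x1, x2), data = foo)

# Effective degrees of freedom reported for each smooth term.
edf_add <- summary(gout_add)$s.table[, "edf", drop = FALSE]
edf_int <- summary(gout_int)$s.table[, "edf", drop = FALSE]
print(edf_add)
print(edf_int)
```

The same numbers appear in the edf column of the printed summary for each fit, so print(summary(gout_add)) works just as well for reading them off.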