Statistics 3701 (Geyer, Spring 2017) Practice Problems 4

Rules

No rules. This is practice.

Grades

No grades. This is practice.

Disclaimer

These practice problems are supplied without any guarantee that they will help you do the quiz problems. However, they were written after the quiz problems were written and with the intention that they would help.

These practice problems are also supplied without any guarantee that they are exactly or even nearly like the quiz problems. However, they are like at least some quiz problems in at least some respects.

Problem 1

The web page

http://www.bigten.org/sports/m-hockey/spec-rel/m-hockey-standings.html

contains two tables.

Get the data in these tables using the R function readHTMLTable as described in the example in Section 4 of the course notes about data. (When I was reading this web page multiple times, getting my code right, there were a few failures when I got an empty list. Just try again if this happens to you.)
Bizarrely, both tables seem to contain the same data and only one of them appears on the web page. Use R to prove that the two tables are identical except for their header rows.
For the rest of this problem we just use the first table. Divide this table into two tables at the division in the headers where the black background becomes blue background (for anyone who cannot see these colors, between the block with header Conference Games and the block with header Overall).
Make each of the resulting two data frames have proper names taken from the row of the table read from the web that has items Pts., GP, and so forth. The first two columns, which do not have names in the header row, should be called Place and Team or some abbreviation thereof.
Add a team names column to the second table (which started with the second column labeled GP).
Print your tables so we can see how they look.

No credit for just typing in stuff you can read yourself from the table. Imagine that your code is run automatically every week as the standings change (of course, they don't change now because the regular season is over, but they have been changing weekly since October).
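Since the code is supposed to run unattended, the occasional empty list mentioned above is worth guarding against. Here is a minimal sketch of one way to do that. The function name read_with_retry and its arguments are made up for illustration, and the flaky reader is a stand-in so the example runs without touching the web.

```r
# Hypothetical helper: call reader() until it returns a non-empty result,
# giving up after max_tries attempts.
read_with_retry <- function(reader, max_tries = 5, wait = 1) {
    for (i in seq_len(max_tries)) {
        # treat an error the same way as an empty result
        result <- tryCatch(reader(), error = function(e) NULL)
        if (length(result) > 0) return(result)
        Sys.sleep(wait)
    }
    stop("no tables after ", max_tries, " tries")
}

# Stand-in reader that returns an empty list twice, then succeeds
calls <- 0
flaky <- function() {
    calls <<- calls + 1
    if (calls < 3) list() else list(tab = data.frame(x = 1))
}

out <- read_with_retry(flaky, wait = 0)
length(out)

# With the real page one might write (not run here, needs the XML package):
# u <- "http://www.bigten.org/sports/m-hockey/spec-rel/m-hockey-standings.html"
# tabs <- read_with_retry(function() XML::readHTMLTable(u))
```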

You are allowed to type in one or two things. There seems to be no way to derive where to split the table in two except by looking at the web. If you view the HTML source, you can see that colspan attributes of td elements govern which headers in row 1 are supposed to cover which columns below, but the readHTMLTable function does not preserve that information, so it isn't in the object you get from it. I also lost one of the headers for the right part of the table because it went with the left part when I split it. So you can type in two things, but no more than that.
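To illustrate the kind of manipulation the steps above call for, here is a sketch on a small made-up table. Everything in it is an assumption for illustration: the values are invented, the layout (one header row, six columns) is not the layout of the real page, and the split position is chosen for the toy table only. It is not a solution for the actual data.

```r
# Toy stand-in for one table from readHTMLTable: all columns are character
# and the first row holds the column labels (hypothetical values throughout).
tab1 <- data.frame(
    V1 = c("", "1", "2", "3"),
    V2 = c("", "Team A", "Team B", "Team C"),
    V3 = c("Pts.", "30", "24", "18"),
    V4 = c("GP", "12", "12", "12"),
    V5 = c("GP", "20", "20", "20"),
    V6 = c("W", "14", "11", "9"),
    stringsAsFactors = FALSE
)

# A second copy that differs only in its header row
tab2 <- tab1
tab2[1, ] <- c("", "", "Points", "Games", "Games", "Wins")

# Compare everything except the header row
same_body <- identical(tab1[-1, ], tab2[-1, ])
same_body

# The two typed-in items: where to split, and the two missing names
split_at <- 4
hdr <- unlist(tab1[1, ], use.names = FALSE)
hdr[1:2] <- c("Place", "Team")

body <- tab1[-1, ]
left <- body[, 1:split_at]
names(left) <- hdr[1:split_at]
right <- body[, (split_at + 1):ncol(body)]
names(right) <- hdr[(split_at + 1):ncol(body)]

# Add the team names to the right-hand table
right <- data.frame(Team = left$Team, right,
                    check.names = FALSE, stringsAsFactors = FALSE)
rownames(left) <- NULL
rownames(right) <- NULL

print(left)
print(right)
```

Names are assigned after splitting because the two blocks can share a label (GP here), and duplicate names within one data frame are awkward to work with.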

Problem 2

This problem uses the data read in by


foo <- read.csv("http://www.stat.umn.edu/geyer/s17/3701/data/p4p2.csv")

which makes foo a data frame having variables x1, x2, and y, all of which are quantitative.

Treat y as the response to be predicted by the other two variables.

Following the example in Section 3.4.1 of the course notes about statistical models:

Fit a robust regression that has each of the predictor variables as main effects (no interactions).
Also fit the analogous linear regression.
We don't yet know a good way to compare the models (bootstrap, eventually), but if we really need robust regression, least squares is no good. Make a plot of residuals versus fitted values for each fit. Do you see anything fishy about either one?
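As a rough illustration of the workflow, here is a sketch on simulated data rather than on foo. It assumes the robust fit is done with lqs from the MASS package using method "lts"; the course notes may use a different function, in which case substitute that one. The simulated data frame sim, its coefficients, and the contamination scheme are all invented for the example.

```r
library(MASS)   # ships with R; provides lqs()

set.seed(42)
n <- 100
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- 1 + 2 * sim$x1 - sim$x2 + rnorm(n, sd = 0.5)
sim$y[1:10] <- sim$y[1:10] + 20   # contaminate 10% of the responses

# Robust fit (least trimmed squares) and ordinary least squares,
# both with main effects only
rout <- lqs(y ~ x1 + x2, data = sim, method = "lts")
lout <- lm(y ~ x1 + x2, data = sim)

# Residuals versus fitted values, side by side
par(mfrow = c(1, 2))
plot(fitted(rout), residuals(rout), main = "robust",
     xlab = "fitted", ylab = "residual")
abline(h = 0)
plot(fitted(lout), residuals(lout), main = "least squares",
     xlab = "fitted", ylab = "residual")
abline(h = 0)
```

In a simulation like this one, the contaminated points would be expected to stand well apart from the rest in the robust residual plot, while least squares is pulled toward them.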

Problem 3