Rules

See the Section about Rules for Quizzes and Homeworks on the General Info page.

The work you hand in to Moodle should be a plain text file of R commands and comments that can be run to reproduce what you did. We do not take your word for what the output is. We run it ourselves.

Note: Plain text specifically excludes Microsoft Word native format (extension .docx). If you have to use Word as your text editor, then use "save as" and choose Text (.txt) or something like that as the format. Then upload the saved plain text file.

Note: Plain text specifically excludes PDF (Adobe Portable Document Format) (extension .pdf). If you use Sweave, knitr, or Rmarkdown, upload the source (extension .Rnw or .Rmd) not PDF or any other kind of output.

If you have questions about the quiz, ask them in the Moodle forum for this quiz: https://ay16.moodle.umn.edu/mod/forum/view.php?id=1310928

You must be in the classroom, Armory 202, while taking the quiz.

Quizzes must be uploaded by the end of class (1:10). Moodle actually allows a few minutes after that. Here is the link for uploading the quiz: https://ay16.moodle.umn.edu/mod/assign/view.php?id=1310947

Homeworks must be uploaded before midnight on the day they are due. Here is the link for uploading the homework: https://ay16.moodle.umn.edu/mod/assign/view.php?id=1310954

Quiz 4

Problem 1

Scrape the data from the table(s) in the following web page

following the example in Section 4 of the course notes about data

And answer the following questions. Your answers must be computed entirely using R operating on that web page. Simply reading the answers yourself gets no credit. You have to tell R how to get the answers. Print your answers so the grader can see them.

Note that the R function readHTMLTable in the CRAN package XML that was used for that example reads in all items in the table as character strings. You will have to convert them to numbers if you want to use them as numbers. The R function as.numeric will convert character strings to numbers if they are numbers.
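As a minimal sketch of the conversion step, the following uses a small made-up data frame standing in for one table as readHTMLTable would return it (all columns character). The team names, column names, and values are hypothetical, and the helper is_numeric_like is an illustrative name, not part of any package.

```r
# Hypothetical stand-in for one scraped table: every column is character,
# which is how readHTMLTable delivers table cells.
tab <- data.frame(
    Team = c("Alpha", "Beta", "Gamma"),
    W = c("7", "5", "2"),
    L = c("1", "3", "6"),
    T = c("2", "2", "2"),
    stringsAsFactors = FALSE
)

# Illustrative helper: a column looks numeric if as.numeric() yields no NA.
is_numeric_like <- function(x) !anyNA(suppressWarnings(as.numeric(x)))

# Convert only the numeric-looking columns; Team stays character.
num_cols <- vapply(tab, is_numeric_like, logical(1))
tab[num_cols] <- lapply(tab[num_cols], as.numeric)

str(tab)
```

Checking for NA after conversion is one way to avoid turning a column of names into a column of missing values. A real table may need extra cleaning (for example, stripping footnote marks) before this works.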

  1. Read the data in this web page and convert the numeric columns in the tables to type "numeric".
  2. In the conference a win counts 3 points, a loss zero points, and a tie counts either 1 or 2 points depending on a shootout. Verify that the points are calculated correctly (SOW stands for shootout wins).
  3. Outside the conference, shootout wins don't count. A win counts 2 points and a tie one point. What would the points be if they were counted for overall records? Associate team names with the numbers you calculate so we can see which is which.
  4. Since the teams did not play the same number of non-conference games, adjust the numbers calculated in part 3 by dividing by games played.
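The arithmetic in parts 2 through 4 can be sketched on made-up standings. Everything here is an assumption for illustration: the data frames conf and overall, their column names (W, L, T, SOW, Pts), and the values are hypothetical, and the sketch reads the conference rule as "a tie is worth 1 point, plus 1 more if the team won the shootout".

```r
# Hypothetical conference standings, already numeric (not the real data).
conf <- data.frame(
    Team = c("Alpha", "Beta", "Gamma"),
    W = c(7, 5, 2), L = c(1, 3, 6), T = c(2, 2, 2),
    SOW = c(1, 2, 0),
    Pts = c(24, 19, 8)
)

# Assumed rule: win = 3, tie = 1, shootout win = 1 extra point.
calc <- with(conf, 3 * W + T + SOW)
print(all(calc == conf$Pts))

# Hypothetical overall records: win = 2, tie = 1, shootouts ignored.
overall <- data.frame(
    Team = c("Alpha", "Beta", "Gamma"),
    W = c(12, 9, 4), L = c(3, 6, 10), T = c(3, 2, 2)
)
pts <- with(overall, 2 * W + T)
names(pts) <- overall$Team    # attach team names so the output is readable
print(pts)

# Adjust for unequal schedules: points per game played.
games <- with(overall, W + L + T)
pts_per_game <- pts / games
print(pts_per_game)
```

Naming the vector with names() carries the team labels through the division, so the printed per-game values still say which team is which.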

Problem 2

This problem uses the data read in by


foo <- read.csv("http://www.stat.umn.edu/geyer/s17/3701/data/q4p2.csv", stringsAsFactors = FALSE)

which makes foo a data frame having variables speed (quantitative), state (categorical), color (categorical), and y (zero-or-one).

Treat y as the response to be predicted by the other three variables.

Following the example in Section 3.3 of the course notes about statistical models, fit a GLM that has each of the predictor variables as main effects (no interactions).

Perform tests of statistical hypotheses about whether each of these variables can be dropped from the model without making the model fit the data worse.

Interpret the P-values for these tests. What model do they say is the most parsimonious model that fits the data?
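One possible shape for this analysis, sketched on simulated data rather than the quiz data: the data frame sim and the way y is generated are made up, and the binomial family is an assumption based on y being zero-or-one. The sketch uses drop1 for the tests; the course notes may instead compare nested models with anova.

```r
# Simulated stand-in with the same variable names as the quiz data.
set.seed(42)
n <- 200
sim <- data.frame(
    speed = runif(n, 0, 10),
    state = sample(c("A", "B", "C"), n, replace = TRUE),
    color = sample(c("red", "green"), n, replace = TRUE),
    stringsAsFactors = FALSE
)
# In this simulation only speed actually affects the response.
sim$y <- rbinom(n, 1, plogis(-2 + 0.4 * sim$speed))

# Main-effects logistic regression (assumed family: binomial).
fit <- glm(y ~ speed + state + color, family = binomial, data = sim)

# Likelihood ratio test for dropping each term, one at a time.
tests <- drop1(fit, test = "Chisq")
print(tests)
```

Each row of the drop1 table compares the full model with the model lacking that one term, so a large P-value suggests the term can be dropped. Dropping several terms at once calls for a separate comparison of the reduced model against the full one.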

Problem 3

This problem uses the data read in by


foo <- read.csv("http://www.stat.umn.edu/geyer/s17/3701/data/q4p3.csv")

which makes foo a data frame having variables x and y both quantitative.

Treat y as the response to be predicted by x.

Following the example in Section 3.4.2.3 of the course notes about statistical models, fit a GAM that assumes the conditional mean of y given x is a smooth function (but makes no parametric assumptions about this smooth function).

On a scatter plot of the data, add lines that are the lower and upper end points of 95% confidence intervals for the mean of y given x for each value of x. As in the example in the course notes, do not adjust these intervals to obtain simultaneous coverage.

This is the first question that asks for a plot. For this question, upload not only your R code but also the plot, as a PDF file called q4p3.pdf.

Also give numeric 95% confidence intervals for the conditional mean of y given x for the x values 0, 20, 40, 60, 80, 100.
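A minimal sketch of the whole workflow on simulated data, assuming the mgcv package (a recommended package normally shipped with R) and its default Gaussian family. The data frame sim is made up, and the plot is written to a temporary directory here only so the sketch leaves no files behind.

```r
library(mgcv)

# Simulated stand-in for the quiz data: a smooth mean plus noise.
set.seed(42)
sim <- data.frame(x = seq(0, 100, length.out = 150))
sim$y <- sin(sim$x / 15) + rnorm(nrow(sim), sd = 0.3)

# Smooth regression of y on x (assumed: default Gaussian family).
fit <- gam(y ~ s(x), data = sim)

# Pointwise (not simultaneous) 95% intervals: fit +/- z * standard error.
crit <- qnorm(0.975)
pred <- predict(fit, se.fit = TRUE)

# Scatter plot with lower and upper interval end points, saved as PDF.
out_file <- file.path(tempdir(), "q4p3.pdf")
pdf(out_file)
plot(sim$x, sim$y, xlab = "x", ylab = "y")
lines(sim$x, pred$fit - crit * pred$se.fit, lty = 2)
lines(sim$x, pred$fit + crit * pred$se.fit, lty = 2)
dev.off()

# Numeric intervals at x = 0, 20, 40, 60, 80, 100.
newx <- data.frame(x = seq(0, 100, by = 20))
pnew <- predict(fit, newdata = newx, se.fit = TRUE)
ci <- cbind(
    x = newx$x,
    lower = as.numeric(pnew$fit - crit * pnew$se.fit),
    upper = as.numeric(pnew$fit + crit * pnew$se.fit)
)
print(ci)
```

The lines here are drawn in the order of sim$x, which is already sorted. With unsorted data, sort by x (or predict on a sorted grid) before calling lines.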