Rules

See the Section about Rules for Quizzes and Homeworks on the General Info page.

Your work handed into Moodle should be a plain text file with R commands and comments that can be run to produce what you did. We do not take your word for what the output is. We run it ourselves.

Note: Plain text specifically excludes Microsoft Word native format (extension .docx). If you have to use Word as your text editor, then use "save as" and choose the format to be Text (.txt) or something like that. Then upload the saved plain text file.

Note: Plain text specifically excludes PDF (Adobe Portable Document Format) (extension .pdf). If you use Sweave, knitr, or Rmarkdown, upload the source (extension .Rnw or .Rmd) not PDF or any other kind of output.

If you have questions about the quiz, ask them in the Moodle forum for this quiz. Here is the link for that: https://ay16.moodle.umn.edu/mod/forum/view.php?id=1279315.

You must be in the classroom, Armory 202, while taking the quiz.

Quizzes must be uploaded by the end of class (1:10). Moodle actually allows a few minutes after that. Here is the link for uploading the quiz: https://ay16.moodle.umn.edu/mod/assign/view.php?id=1279330.

Homeworks must be uploaded before midnight the day they are due. Here is the link for uploading the homework: https://ay16.moodle.umn.edu/mod/assign/view.php?id=1279355.

Quiz 3

Problem 1

Write an R function like the example in Section 8 of the course notes about computer arithmetic, except that it should be for the geometric distribution rather than the binomial distribution.

The geometric distribution has parameter θ and data x and log likelihood

l(θ) = x θ + log(1 − exp(θ))

where, as always, exp denotes the exponential function and log the natural logarithmic function (as in R). The data x is nonnegative-integer-valued (0, 1, 2, …). The parameter θ is negative-real-valued (− ∞ < θ < 0).

Your function should have signature

    logl(theta, x)

and return a list having components

Like the example in the course notes, the point is to be careful about computer arithmetic, avoiding overflow and catastrophic cancellation as much as possible.
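As a minimal sketch of the kind of care meant here, consider only the term log(1 − exp(θ)). When θ is very close to zero, exp(θ) rounds to 1 and the naive formula returns log(0). The helper name log1mexp and the cutoff at −log(2) are assumptions made for this illustration; they are not taken from the course notes, and this is not a complete logl function.

```r
# Hypothetical helper (name and cutoff are assumptions for illustration):
# computes log(1 - exp(theta)) for theta < 0 without catastrophic cancellation
log1mexp <- function(theta) {
    ifelse(theta > -log(2),
        log(-expm1(theta)),    # theta near zero: expm1 keeps the small difference
        log1p(-exp(theta)))    # theta very negative: exp(theta) is tiny
}

theta <- -1e-20
naive <- log(1 - exp(theta))   # exp(theta) rounds to 1, so this is log(0)
stable <- log1mexp(theta)      # close to log(1e-20)
print(c(naive = naive, stable = stable))
```

The same idea (choosing between expm1 and log1p depending on the size of θ) can be carried over to the derivative terms.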

For this problem, you do not have to check for invalid argument values.

Show your function working for parameter values


thetas <- (- 10^seq(3, - 3))

and for data values both

x <- 0

and

x <- 5

Problem 2

Test your function for the preceding problem like the example in Section 8 of the course notes about computer arithmetic.

To test the function value, compare it with the values of the function

logl.too <- function(theta) dgeom(x, 1 - exp(theta), log = TRUE)

and to test the derivatives use numerical derivatives (you can also apply the other method using derivatives calculated by the R function D if you like, but that does not count; we are only going to grade the comparison to numerical derivatives).

Note that numerical differentiation does not give perfectly accurate derivatives. So there may be small discrepancies between the two methods of calculation.
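To illustrate the size of such discrepancies, here is a rough sketch that compares a central-difference approximation with a derivative worked out by hand. The helper num.deriv, the step size h, and the test point θ = −1 are assumptions made for this illustration; the course notes may use a different numerical differentiation method.

```r
x <- 5
logl.too <- function(theta) dgeom(x, 1 - exp(theta), log = TRUE)

# Hypothetical helper: central-difference approximation to f'(theta)
num.deriv <- function(f, theta, h = 1e-6)
    (f(theta + h) - f(theta - h)) / (2 * h)

theta <- -1
numerical <- num.deriv(logl.too, theta)

# Derivative of x * theta + log(1 - exp(theta)), worked out by hand
analytic <- x - 1 / expm1(- theta)

# Agreement is only approximate, so compare with a tolerance
print(all.equal(numerical, analytic, tolerance = 1e-6))
```

Comparing with all.equal and a tolerance, rather than with ==, allows for the error inherent in the finite-difference approximation.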

Problem 3

(This problem is about finding errors in data, but that is too hard for a quiz, so we are just going to test some skills that are useful for that.)

We will use the data


foo <- read.csv("http://www.stat.umn.edu/geyer/s17/3701/data/q3p3.csv", stringsAsFactors = FALSE)

This reads in a data frame having variables w, x, y, and z. The categorical variables are w and z.

In this problem we consider the answer better if it is done without using a loop.

  1. How many distinct values does w have?
  2. If you wanted to fit a linear model that regresses y on w, x, and z with z being treated as categorical (like w), what would you have to do to make that work right?
  3. Find the largest value of y for each value of w.
  4. Find the second largest value of y for each value of w.
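
The loop-free style asked for above can be sketched on a small made-up data frame. The data frame toy and its variables g, k, and v are invented for this illustration and have nothing to do with the quiz data; in particular, storing the categorical variable k as numeric codes is an assumption.

```r
# Made-up toy data, not the quiz data
toy <- data.frame(g = c("a", "a", "a", "b", "b", "c", "c", "c"),
                  k = c(1, 2, 3, 1, 2, 3, 1, 2),
                  v = c(3, 9, 5, 2, 7, 4, 6, 1),
                  stringsAsFactors = FALSE)

# Number of distinct values of a variable
n.groups <- length(unique(toy$g))

# Largest value of v within each level of g, no explicit loop
group.max <- tapply(toy$v, toy$g, max)

# Second largest: sort each group in decreasing order, take element 2
group.second <- tapply(toy$v, toy$g,
    function(v) sort(v, decreasing = TRUE)[2])

# A numeric-coded variable is treated as quantitative unless wrapped in factor()
fit.numeric <- lm(v ~ k, data = toy)
fit.factor <- lm(v ~ factor(k), data = toy)
print(length(coef(fit.numeric)))
print(length(coef(fit.factor)))
```

If tied values should count only once when finding the second largest, apply unique to each group before sorting.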