Statistics 3701 (Geyer, Spring 2017) Quiz 5

Rules

See the Section about Rules for Quizzes and Homeworks on the General Info page.

Your work handed into Moodle should be a plain text file with R commands and comments that can be run to produce what you did. We do not take your word for what the output is. We run it ourselves.

Note: Plain text specifically excludes Microsoft Word native format (extension .docx). If you have to use Word as your text editor, then save as and choose the format to be Text (.txt) or something like that. Then upload the saved plain text file.

Note: Plain text specifically excludes PDF (Adobe Portable Document Format) (extension .pdf). If you use Sweave, knitr, or Rmarkdown, upload the source (extension .Rnw or .Rmd) not PDF or any other kind of output.

If you have questions about the quiz, ask them in the Moodle forum for this quiz. Here is the link for that https://ay16.moodle.umn.edu/mod/forum/view.php?id=1338881.

You must be in the classroom, Armory 202, while taking the quiz.

Quizzes must be uploaded by the end of class (1:10). Moodle actually allows a few minutes after that. Here is the link for uploading the quiz: https://ay16.moodle.umn.edu/mod/assign/view.php?id=1338883.

Homeworks must be uploaded before midnight the day they are due. Here is the link for uploading the homework: https://ay16.moodle.umn.edu/mod/assign/view.php?id=1338886.

Quiz 5

Problem 1

The following R command


load(url("http://www.stat.umn.edu/geyer/s17/3701/data/q5p1.rda"))

loads two R objects

logl: an R function that calculates the log likelihood and up to two derivatives for data from the zero-truncated Poisson distribution. This is copied exactly from the solution to Problem 4 on Homework 3.
x: an R numeric vector that is independent and identically distributed (IID) zero-truncated Poisson data.

Note that this logl function has multiple arguments

> args(logl)
function (theta, x)

neither of which vectorizes. Here we have a vector of IID data, but our function only works for sample size 1. Moreover, it returns a list (with components for the function value, gradient, and hessian) rather than a number, so it cannot be vectorized using the R function Vectorize. Nevertheless, a function that calculates the log likelihood for vector data is easy to make using the theory in the section about log likelihood for IID data in the course notes: the log likelihood for the whole data is the sum of the values of logl applied to each element of the data, and similarly for derivatives if you want to use them (which you probably don't, because there is no extra credit for doing so).
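The summing idea can be sketched as follows. This is only an illustration under stated assumptions, not the course solution: logl.standin is a hypothetical replacement for the loaded logl (written from the zero-truncated Poisson density with θ taken to be the log of the untruncated mean), the component names value and gradient are assumed, and x.toy is made-up data.

```r
# Hypothetical stand-in for the loaded logl (NOT the course's code):
# log likelihood and first derivative for ONE zero-truncated Poisson
# observation, with theta = log of the untruncated Poisson mean.
# The additive constant - log(factorial(x)) is dropped.
logl.standin <- function(theta, x) {
    stopifnot(length(theta) == 1, length(x) == 1)
    mu <- exp(theta)
    list(value = x * theta - log(expm1(mu)),
        gradient = x - mu / (- expm1(- mu)))
}

# Made-up data, for illustration only.
x.toy <- c(1, 3, 2, 1, 4, 2)

# Log likelihood for the whole sample: apply the one-observation function
# to each element of x, pull out the (assumed) value component, and sum.
logl.all <- function(theta, x)
    sum(sapply(x, function(xi) logl.standin(theta, xi)$value))

logl.all(log(mean(x.toy)), x.toy)
```

With the real objects one would presumably replace logl.standin by logl and x.toy by x, after checking the actual component names with names(logl(0, 1)).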

You do not have to use the R function logl loaded here to do this problem. For example, you could cut-and-paste the code into an editor and modify it somewhat to do what this problem requires. But my solution will use it. Also this function is well tested, so it is perhaps better not to modify it.

Find the MLE for these data. A fairly good starting point for these data would be the MLE for the untruncated data model, which is log(mean(x)).

This starting point is not a consistent estimator of θ, but it can be shown that the log likelihood for this model has a unique local maximizer, which is also the global maximizer, so any local maximum is the "right" local maximum. It doesn't actually matter where you start (in this particular problem).
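A minimal sketch of the optimization step, under assumptions: the data x.sim are simulated here (Poisson draws with the zeros discarded) rather than loaded from the quiz file, and mlogl is a hypothetical closed-form minus log likelihood that assumes θ is the log of the untruncated mean. It uses the R function nlm, which minimizes, hence the minus sign.

```r
# Simulated stand-in for the quiz data (assumption): Poisson(2) draws
# with the zeros thrown away are zero-truncated Poisson.
set.seed(42)
y <- rpois(1000, lambda = 2)
x.sim <- y[y > 0]

# Hypothetical minus log likelihood for the whole sample,
# dropping terms that do not involve theta.
mlogl <- function(theta, x) {
    stopifnot(is.numeric(theta), length(theta) == 1)
    - sum(x * theta - log(expm1(exp(theta))))
}

# Starting point suggested in the problem statement.
theta.start <- log(mean(x.sim))

nout <- nlm(mlogl, theta.start, x = x.sim)
nout$code          # see help("nlm") for the meaning of the codes
theta.hat <- nout$estimate
theta.hat
```

Because the truncated data have a larger mean than the untruncated model would, the starting point lies above the MLE, and nlm has to move down from it.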

Possibly helpful references

example of finding an MLE
google "sapply" in the notes (I used sapply in my solution; you don't have to).
help("sapply") in R (I used sapply in my solution; you don't have to).

Problem 2

This problem continues where the preceding problem left off. Make a 90% (note: not 95%) confidence interval for the true unknown θ using a Wald interval (point estimate plus or minus critical value times standard error), with the standard error calculated from observed Fisher information. This is described in Section 3.2.4.4.8.1 of the course notes on models, part II and exemplified in Section 5.4.2 of the course notes on models, part II.
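The Wald recipe might look like the sketch below. It is an illustration only: the data are simulated, mlogl is a hypothetical minus log likelihood (θ assumed to be the log of the untruncated mean), and the observed Fisher information is taken to be the hessian that nlm returns when hessian = TRUE, which is a finite-difference approximation.

```r
# Simulated stand-in for the quiz data (assumption).
set.seed(42)
y <- rpois(1000, lambda = 2)
x.sim <- y[y > 0]

# Hypothetical minus log likelihood for the whole sample.
mlogl <- function(theta, x) - sum(x * theta - log(expm1(exp(theta))))

# Minimize and ask nlm for the hessian at the solution.
nout <- nlm(mlogl, log(mean(x.sim)), x = x.sim, hessian = TRUE)
theta.hat <- nout$estimate

# Observed Fisher information: second derivative of MINUS the log
# likelihood at the MLE (a scalar here, since theta is one-dimensional).
obs.info <- as.numeric(nout$hessian)

conf.level <- 0.90
crit <- qnorm((1 + conf.level) / 2)   # two-sided critical value, about 1.645
std.err <- 1 / sqrt(obs.info)

wald <- theta.hat + c(-1, 1) * crit * std.err
wald
```

Note that the critical value for 90% coverage comes from the 0.95 quantile of the standard normal, not the 0.90 quantile, because the interval is two-sided.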

Problem 3

This problem also continues where problem 1 left off. Make a 90% (note: not 95%) confidence interval for the true unknown θ using a Wilks interval (a level set of the log likelihood). This is described in Section 3.2.4.4.8.2 of the course notes on models, part II and exemplified in Section 5.4.4 of the course notes on models, part II.
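One way to compute such a level set is to find where twice the drop in log likelihood from its maximum equals the chi-square critical value, using the R function uniroot on each side of the MLE. The sketch below is hedged in the usual ways: simulated data, a hypothetical mlogl (θ assumed to be the log of the untruncated mean), a hypothetical helper crit.fun, and search brackets of width 1 that happen to suit these simulated data and may need widening for other data.

```r
# Simulated stand-in for the quiz data (assumption).
set.seed(42)
y <- rpois(1000, lambda = 2)
x.sim <- y[y > 0]

# Hypothetical minus log likelihood for the whole sample.
mlogl <- function(theta, x) - sum(x * theta - log(expm1(exp(theta))))

nout <- nlm(mlogl, log(mean(x.sim)), x = x.sim)
theta.hat <- nout$estimate

conf.level <- 0.90
crit <- qchisq(conf.level, df = 1)

# Zero exactly on the boundary of the level set
#     { theta : 2 * [l(theta.hat) - l(theta)] <= crit },
# negative inside it, positive outside it.
crit.fun <- function(theta) 2 * (mlogl(theta, x.sim) - nout$minimum) - crit

# One root below the MLE and one above (bracket width 1 is an assumption).
lower <- uniroot(crit.fun, lower = theta.hat - 1, upper = theta.hat,
    tol = 1e-8)$root
upper <- uniroot(crit.fun, lower = theta.hat, upper = theta.hat + 1,
    tol = 1e-8)$root

wilks <- c(lower, upper)
wilks
```

Unlike the Wald interval, this interval need not be symmetric about the MLE.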