Statistics 3701 (Geyer, Spring 2017) Quiz 7

Rules

See the Section about Rules for Quizzes and Homeworks on the General Info page.

Your work handed into Moodle should be a plain text file with R commands and comments that can be run to produce what you did. We do not take your word for what the output is. We run it ourselves.

Note: Plain text specifically excludes Microsoft Word native format (extension .docx). If you have to use Word as your text editor, then use Save As and choose the format Text (.txt) or something like that. Then upload the saved plain text file.

Note: Plain text specifically excludes PDF (Adobe Portable Document Format) (extension .pdf). If you use Sweave, knitr, or Rmarkdown, upload the source (extension .Rnw or .Rmd) not PDF or any other kind of output.

If you have questions about the quiz, ask them in the Moodle forum for this quiz. Here is the link for that: https://ay16.moodle.umn.edu/mod/forum/view.php?id=1400091

You must be in the classroom, Armory 202, while taking the quiz.

Quizzes must be uploaded by the end of class (1:10). Moodle actually allows a few minutes after that. Here is the link for uploading the quiz: https://ay16.moodle.umn.edu/mod/assign/view.php?id=1400096

Homeworks must be uploaded before midnight the day they are due. Here is the link for uploading the homework: https://ay16.moodle.umn.edu/mod/assign/view.php?id=1400110

Quiz 7

Problem 1

This problem is a redo of Homework 6, Problem 4 (the solutions for which have been posted). The only difference is that we are going to parallelize the computations using the R function mclapply in the base package parallel, following Section 7.1 of the course notes about parallel computing.

You may have everything in your solution as in the solution for Homework 6, Problem 4 except for parallelization. You have to break up the work into multiple pieces, and do each piece using mclapply operating on one component of the list it is given.

On unix (Linux or Mac OS X) presumably you want the same number of pieces as the number of cores the R function detectCores in the R package parallel says your computer has.

On Microsoft Windows this method of parallelization does not work. But the documentation for mclapply says the function will work (but just not do any parallelization) if the optional argument mc.cores = 1 is supplied. This allows you to do this problem satisfactorily even if you have Windows. Also you know that you could actually do parallelization this way if you ever get a real computer (TM).

Don't forget to

set the random number seed for reproducibility, and
use the special RNG that does reproducible parallel streams of random numbers.

You may ignore the fuss about warnings in my solution to Homework 6, Problem 4.
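The pattern described above (split the work into pieces, one piece per core, with reproducible parallel random number streams) can be sketched as follows. This is only an illustration under stated assumptions, not the solution to the problem: the function doit, the values of nsim and n, and the estimator being simulated (a sample median of normal data) are all hypothetical stand-ins for whatever Homework 6, Problem 4 actually computes.

```r
library(parallel)

# The special RNG that gives reproducible parallel streams, then the seed
RNGkind("L'Ecuyer-CMRG")
set.seed(42)

# mclapply relies on forking, which Windows lacks, so use one core there
ncores <- if (.Platform$OS.type == "windows") 1 else detectCores()
if (is.na(ncores)) ncores <- 1

nsim <- 1000   # hypothetical total number of simulations
n <- 10        # hypothetical sample size

# Hypothetical piece of work: nsim.piece simulated sample medians
doit <- function(nsim.piece) {
    replicate(nsim.piece, median(rnorm(n)))
}

# Split nsim into ncores pieces whose sizes differ by at most one
pieces <- tabulate(rep_len(seq_len(ncores), nsim), nbins = ncores)

# Each component of the list is handled by one call to doit
result <- mclapply(as.list(pieces), doit, mc.cores = ncores)
theta.hat <- unlist(result)
length(theta.hat)
```

With mc.cores = 1 the same code runs without any parallelization, which is the Windows fallback mentioned above.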

Problem 2

This problem is a redo of the preceding problem. The only difference is that we are going to parallelize the computations using the R function parLapply in the base package parallel (rather than the R function mclapply from the same package, which was used in the preceding problem), following Section 7.2 of the course notes about parallel computing.

Unlike in the preceding problem, this method should work equally well on Windows and unix.

In addition to the hints for the preceding problem, don't forget that you may need to use the R function clusterExport in the R package parallel.
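A minimal sketch of the cluster-based approach, again under stated assumptions: doit, nsim, n, and the simulated estimator are hypothetical stand-ins, and the cluster is capped at two workers only to keep the sketch light (in practice one would presumably use as many workers as detectCores reports). The point of clusterExport here is that the worker processes start with empty workspaces, so the global variable n that doit uses has to be sent to them.

```r
library(parallel)

# Capped at 2 workers for this sketch only (an assumption, not a requirement)
ncores <- max(1, min(2, detectCores(), na.rm = TRUE))
cl <- makePSOCKcluster(ncores)

# Reproducible parallel streams of random numbers on the workers
clusterSetRNGStream(cl, 42)

nsim <- 1000   # hypothetical total number of simulations
n <- 10        # hypothetical sample size

# Hypothetical piece of work: nsim.piece simulated sample medians
doit <- function(nsim.piece) {
    replicate(nsim.piece, median(rnorm(n)))
}

# doit refers to the global variable n, which the workers do not have yet
clusterExport(cl, "n")

# Split nsim into ncores pieces whose sizes differ by at most one
pieces <- tabulate(rep_len(seq_len(ncores), nsim), nbins = ncores)

result <- parLapply(cl, as.list(pieces), doit)
stopCluster(cl)

theta.hat <- unlist(result)
length(theta.hat)
```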

Problem 3

This problem is about Bayesian inference via Markov chain Monte Carlo (MCMC), which was covered in the course notes on that subject.

The statistical model is Cauchy location-scale, the same model that was analyzed by likelihood methods in Section 9 of the course notes on statistical models, Part II.

For the prior we are going to use what the course notes on Bayesian inference call the "method of made-up data" (which in this problem is a special case of the well known and widely used method of conjugate priors, which we did not explain in the course notes, and will not explain here).

We take the prior to be the likelihood for made-up data { −1, 0, 1 }, so the unnormalized posterior is the same as the likelihood for the real data times the likelihood for these made-up data, or, what is the same thing, the likelihood for the concatenation of real and made-up data.
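The "same thing" claim is just the fact that log likelihoods of independent data add. A small check, using simulated data as a hypothetical stand-in for the real data (the names x.hypothetical, logl, and theta are illustrative, not from the course notes):

```r
set.seed(42)

# Hypothetical stand-in for the real data
x.hypothetical <- rcauchy(20, location = 3, scale = 2)
made.up <- c(-1, 0, 1)

# Cauchy location-scale log likelihood, theta = c(mu, sigma)
logl <- function(theta, data) {
    sum(dcauchy(data, location = theta[1], scale = theta[2], log = TRUE))
}

theta <- c(1.5, 2.5)   # an arbitrary parameter value with sigma > 0

# log likelihood (real) + log prior (made-up) ...
lhs <- logl(theta, x.hypothetical) + logl(theta, made.up)
# ... versus log likelihood of the concatenated data
rhs <- logl(theta, c(x.hypothetical, made.up))

all.equal(lhs, rhs)
```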

It is not obvious that this prior is proper, but I checked it by integrating it in Mathematica, and it is proper.

Simulate the posterior distribution of the parameters μ and σ using the R function metrop in the CRAN package mcmc.

Do a simulation in which there is no batching (argument blen = 1 to the R function metrop) so the result returned by metrop has component batch whose components are simulations of μ and σ rather than batch means.

From this make 95% equal-tailed Bayesian credible intervals for μ and σ, which is something not covered in the course notes, but is very simple. The credible interval for μ has endpoints that are the 0.025 and 0.975 quantiles of the simulations of μ and similarly for σ.
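The quantile calculation can be sketched as follows. The matrix batch here is a made-up stand-in for the batch component returned by metrop (one row per iteration, one column per parameter); its contents are simulated from arbitrary distributions purely so the code runs.

```r
set.seed(42)

# Hypothetical stand-in for out$batch: column 1 plays mu, column 2 plays sigma
batch <- cbind(rnorm(1e4, mean = 9), rexp(1e4, rate = 1 / 4))
colnames(batch) <- c("mu", "sigma")

# 0.025 and 0.975 quantiles of each column: rows are endpoints
ci <- apply(batch, 2, quantile, probs = c(0.025, 0.975))
ci
```

Each column of ci holds the lower and upper endpoints of the 95% equal-tailed interval for the corresponding parameter.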

Do not forget the restriction σ > 0. Also do not forget that the starting state for the Metropolis algorithm must also satisfy this constraint.
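One common way to impose such a constraint with metrop is to have the log unnormalized posterior return -Inf outside the parameter space, so that proposals with σ ≤ 0 are always rejected. The sketch below assumes that approach and uses simulated data in place of the real data; the names lupost, all.data, and initial, the number of iterations, and the proposal scale 0.5 are illustrative choices, not values from the course notes, and the scale would normally be tuned by looking at the acceptance rate.

```r
library(mcmc)

set.seed(42)

# Hypothetical stand-in for the real data, plus the made-up data
x.hypothetical <- rcauchy(30, location = 5, scale = 2)
all.data <- c(x.hypothetical, -1, 0, 1)

# Log unnormalized posterior, theta = c(mu, sigma)
lupost <- function(theta) {
    mu <- theta[1]
    sigma <- theta[2]
    if (sigma <= 0) return(-Inf)   # outside the parameter space
    sum(dcauchy(all.data, location = mu, scale = sigma, log = TRUE))
}

# Starting state: median and half the interquartile range, so sigma > 0
initial <- c(median(all.data), IQR(all.data) / 2)
stopifnot(initial[2] > 0, is.finite(lupost(initial)))

# blen = 1 means no batching: rows of out$batch are the states themselves
out <- metrop(lupost, initial = initial, nbatch = 1e4, blen = 1, scale = 0.5)
out$accept
dim(out$batch)
```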

For data, use the same data as was used in the course notes:


x <- read.csv(url("http://www.stat.umn.edu/geyer/5102/data/prob7-1.txt"))$x

It is OK if the code you submit makes plots. You do not also have to submit the plots (the code to make the plots is enough). Just explain what the point of the plots is.