The homework assignments have been reorganized into reading, doing, and thinking. Homework assignments that do not have 2024 in their dates are wrong: they have not been revised from last year. When they are revised, they will get a 2024 in their dates.

Thinking

Thinking Assignment 1

Due Wednesday, Feb 14, 2024. Updated for 2024.

Upload to Canvas a one-page (or less) summary of what your class project will be. It should result in an R package that does something that some conceivable users might want to do and is not just a copy of some existing package (it could improve some existing package in some important way; it can import or require other packages so long as they do not do all the work). You should be clear about what the package will do. If you find out during development that there are better or different things for the package to do, that is OK. But you need to be concrete, not vague, at the beginning.

If you don't know where you are going, any road will get you there.

We will assign dates for class presentations at that time.

(See also Doing Assignment 1 below.)

Reading

For each reading, read and be prepared to discuss the reading in class. Every student should have two or more questions or comments about the reading to contribute to the discussion.

Reading Assignment 0

Due Friday, Jan 19, 2024. Updated for 2024.

The notes about git. Be prepared with questions about git (not necessarily related to the notes) for that day.

Reading Assignment 1

Due Monday, Jan 22, 2024. Updated for 2024.

The 3701 handout Basics of R. For anyone who wants the R markdown source, it is one level up from there: http://www.stat.umn.edu/geyer/3701/notes/.

Reading Assignment 2

Due Friday, Jan 26, 2024. Updated for 2024.

The 8054 handout R as a Functional Programming Language. Also read the links in that document except where it says you do not have to.

Reading Assignment 3

Due Friday, Feb 2, 2024. Updated for 2024.

(Also involves some doing.) The subject is R packages.

Clone the github repo https://github.com/cjgeyer/foo. Read the files in R package foo. This is the directory package/foo in the repo. Ignore R package fooRegister for now.

Read the book Writing R Extensions found either at CRAN under manuals or on your own computer linked on the web page shown by R function help.start.

The code in the package may be hard to read (simple though it is). The book explains. I find it helpful to go back and forth.

The book is very large (204 pages in the PDF for the current version of R), very dense, and highly technical. So you are not going to understand much of it in a week. At least get an overview of what is in it. Skim Chapter 1, which says what an R package is, Chapter 2, which says how packages are documented, and Chapter 5, which describes the R, C, and Fortran APIs (what your extensions of R written in C or Fortran, or in C++ that uses external linkage, can use of base R).

A student complained that the above was almost uselessly nonspecific and also misleading since some of Chapter 6 is also necessary. So here is more specificity about sections of Chapters 1, 5, and 6 that are necessary for understanding R packages foo and fooRegister even though the latter is not part of this assignment.

Get at least some idea of what an R package looks like and what it is required to have in it (and some optional things too).

This does not cover CRAN package Rcpp (https://cran.r-project.org/package=Rcpp), which is a whole nother kettle of fish. If the class wants to look at that, we can, but not in this assignment.

(The doing part.) One cannot understand R without trying it out, as you often see me do in class. You have to look at things.

At the very least you want to do

     R CMD build foo
     R CMD check foo_*.tar.gz
and perhaps
     R CMD check foo_*.tar.gz --as-cran
although this last check is supposed to be done with the development version of R if you were really going to submit this to CRAN (which we are not because it is just a toy).

You could also substitute any other package for foo. But this one is about as simple as packages come.

Reading Assignment 4

Due Friday, Feb 9, 2024. Updated for 2024.

More reading in R packages. None of this material is essential if you are not doing what these packages demo. Hence some of you may not ever need this material. But some may. So we will go over this quickly, but still think of questions to ask.

Reading Assignment 5

Due Friday, Feb 16, 2024. Updated for 2024.

The 3701 handouts Computer Arithmetic and Zero-Truncated Poisson Distribution.

Reading Assignment 6

Due Friday, Feb 23, 2024. Updated for 2024.

Reading Assignment 7

Due Monday, Feb 26, 2024. Updated for 2024.

8054 notes on Web Scraping

Reading Assignment 8

Due Monday, Mar 13, 2024. Updated for 2024.

8054 notes on Isotonic Regression

These notes assume you have already gone over the slide deck on optimization. Hence the due date may slip depending on how fast we can cover these slides.

Reading Assignment 9

Due Friday, Mar 22, 2024. Updated for 2024.

Chapter 1, Introduction to Markov Chain Monte Carlo from the Handbook of Markov Chain Monte Carlo by your humble instructor. Sections 1.1 through 1.12.

And a handout written in response to a question from the class last year on Autocovariances in MCMC. This expands on Sections 1.10.2 and 1.10.3 in the preceding reading.

Reading Assignment 10

Due Monday, Mar 25, 2024. Updated for 2024.

Chapter 1, Introduction to Markov Chain Monte Carlo from the Handbook of Markov Chain Monte Carlo by your humble instructor. The rest of the chapter not in the preceding assignment: Sections 1.13 to the end of the chapter.

Also some announcements of permanent interest on the course home page.

Reading Assignment 11

Due Monday, Apr 1, 2024. Updated for 2024.

Stat 3701 Lecture Notes: Bayesian Inference via Markov Chain Monte Carlo (MCMC), Section 9. The rest of this handout is hopefully interesting reading but probably already known from other courses (Bayesian inference). Read it if you want to, and even ask questions about it if you want to, but the only required reading is Section 9, which contains all the computing.

Reading Assignment 12

Due Friday, Apr 5, 2024. Updated for 2024.

Parallel Computing.

Doing

Doing Assignment 1

Due last day of class (Monday, April 29, 2024) at one minute before midnight. Updated for 2024.

Consists of a git repo on github.umn.edu that contains your R package for your class project. The repo can be public or private, your choice. Your package must contain a vignette. You will also do a 20 minute talk with 5 more minutes for questions on the date assigned. We will do 2 talks a day, for as long as that takes to do the class. The talk should be aimed at the class (not at your research advisor).

Doing Assignment 2

Due Wednesday, Mar 13, 2024. Updated for 2024.

The class, working as a team, will write an R package that provides four functions (call them pxkcd, qxkcd, dxkcd, and rxkcd) that are the usual p, q, d, and r functions, like those R provides for the families of distributions it knows (pnorm etc.).

This is the distribution introduced by xkcd comic 2118. Suppose f is the probability density function (PDF) of a normal distribution, and we have a random vector (X, Y) uniformly distributed on the region bounded above by the graph of f and bounded below by the horizontal axis. Then the marginal distribution of X is this normal distribution, and the conditional distribution of Y given X is uniform on the interval from zero to f(X). The marginal distribution of Y is the xkcd distribution that we want R functions for.

Let h(y) be the distance from the mean of X to either of the points where f(x) = y. Then the distribution function (DF) of Y is

G(y) = 1 − F(μ + h(y)) + F(μ − h(y)) + 2 y h(y),    0 < y ≤ f(μ)

and Mathematica says the probability density function (PDF) of Y simplifies to

g(y) = 2 h(y),    0 ≤ y ≤ f(μ)

(note that this does not depend on μ so the only parameter of this distribution is σ).
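The two formulas above can be sketched in R as follows. This is only a sketch under stated assumptions, not the required solution: the function names hxkcd, dxkcd, and pxkcd, the argument name sd, and the restriction to scalar sd are all assumptions. The formula for h(y) comes from solving f(μ + h) = y for h when f is the normal PDF with standard deviation σ, which gives h(y) = σ sqrt(− 2 log(y σ sqrt(2 π))). Since the distribution does not depend on μ, the sketch takes μ = 0.

```r
# Sketch only: the names hxkcd, dxkcd, pxkcd and the argument sd are assumptions.

# half-width h(y) of the interval on which the normal PDF is at least y,
# from solving f(mu + h) = y for h; pmax guards against rounding error
# making the argument of sqrt slightly negative when y is (nearly) f(mu)
hxkcd <- function(y, sd = 1)
    sd * sqrt(pmax(0, - 2 * log(y * sd * sqrt(2 * pi))))

# PDF g(y) = 2 h(y) for 0 <= y <= f(mu), zero elsewhere (infinite at y = 0)
dxkcd <- function(y, sd = 1) {
    fmax <- dnorm(0, sd = sd)
    inside <- y >= 0 & y <= fmax
    result <- numeric(length(y))
    result[inside] <- 2 * hxkcd(y[inside], sd)
    result
}

# DF G(y) = 1 - F(mu + h) + F(mu - h) + 2 y h with mu = 0; by symmetry
# 1 - F(h) + F(-h) = 2 F(-h), which avoids subtracting nearly equal numbers
pxkcd <- function(y, sd = 1) {
    fmax <- dnorm(0, sd = sd)
    inside <- y > 0 & y < fmax
    result <- as.numeric(y >= fmax)   # zero at or below 0, one at or above f(mu)
    h <- hxkcd(y[inside], sd)
    result[inside] <- 2 * pnorm(- h, sd = sd) + 2 * y[inside] * h
    result
}
```

One way to check the sketch is to compare a numerical derivative of pxkcd with dxkcd at a few points in the interior of the range.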

Generating random variates from the distribution of Y is easy if done as described above (generate X normal, and then generate Y using the uniform conditional distribution of Y given X).
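The recipe above might look like this in R. It is a minimal sketch: the name rxkcd and its arguments n and sd are assumptions, and the mean of X is set to zero because the distribution of Y does not depend on it.

```r
# Sketch only: the name rxkcd and its arguments are assumptions.
rxkcd <- function(n, sd = 1) {
    # X is normal; its mean does not affect the distribution of Y, so use zero
    x <- rnorm(n, sd = sd)
    # Y given X is uniform on the interval from zero to f(X)
    runif(n, min = 0, max = dnorm(x, sd = sd))
}
```

Every simulated value lies between zero and f(μ), which for sd = 1 is dnorm(0).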

Computing quantiles of the distribution of Y is hard. I can see no way other than numerically inverting the DF.
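Numerical inversion of the DF could be sketched with R function uniroot, as below. The names pxkcd1 and qxkcd, the argument sd, and the choice of uniroot with this tolerance are assumptions made for illustration; other root finders would also work. The helper pxkcd1 is a scalar version of the DF given above with μ = 0.

```r
# Sketch only: the names pxkcd1 and qxkcd and their arguments are assumptions.

# DF of Y evaluated at a single point y, taking mu = 0
pxkcd1 <- function(y, sd = 1) {
    fmax <- dnorm(0, sd = sd)
    if (y <= 0) return(0)
    if (y >= fmax) return(1)
    h <- sd * sqrt(- 2 * log(y * sd * sqrt(2 * pi)))
    2 * pnorm(- h, sd = sd) + 2 * y * h
}

# quantile function: solve G(y) = p for y, one value of p at a time
qxkcd <- function(p, sd = 1) {
    stopifnot(is.numeric(p), all(p >= 0 & p <= 1))
    fmax <- dnorm(0, sd = sd)
    sapply(p, function(pp) {
        if (pp == 0) return(0)
        if (pp == 1) return(fmax)
        # G is continuous and increasing from 0 at y = 0 to 1 at y = f(mu),
        # so G(y) - pp changes sign on the interval and uniroot applies
        uniroot(function(y) pxkcd1(y, sd) - pp, lower = 0, upper = fmax,
            tol = .Machine$double.eps^0.75)$root
    })
}
```

A natural check is that applying the DF to the computed quantiles recovers the probabilities one started with.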

The class should use the pattern where everyone works in their own repo on github.umn.edu. We will divide up the work so that four students each write one of the four functions and the parts of the documentation and tests related to their function, and the fifth student is the BDFL of the project, responsible for making the whole thing work and for seeing that everything is done.

For this assignment you can do all code in R. No C unless you want to.

You should have tests that demonstrate your code works.
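As an illustration of what such tests might check, here is a sketch of consistency checks among the p, q, d, and r functions of one family. Because the xkcd functions do not exist yet, the normal distribution functions stand in for them here; the names pfun, qfun, dfun, and rfun are assumptions, and the particular checks and tolerances are only one possible choice.

```r
# Sketch only: the normal functions stand in for the hypothetical
# pxkcd, qxkcd, dxkcd, and rxkcd; swap those in to test the real package.
pfun <- pnorm
qfun <- qnorm
dfun <- dnorm
rfun <- rnorm

# the quantile function inverts the distribution function
p <- seq(0.05, 0.95, 0.05)
stopifnot(all.equal(pfun(qfun(p)), p))

# the integral of the PDF between two points equals the difference of the DF
lo <- qfun(0.25)
hi <- qfun(0.75)
stopifnot(all.equal(integrate(dfun, lo, hi)$value, pfun(hi) - pfun(lo),
    tolerance = 1e-6))

# a random sample is consistent with the DF (Kolmogorov-Smirnov test)
set.seed(42)
pval <- ks.test(rfun(1000), pfun)$p.value
stopifnot(pval > 0.001)
```

Code like this can go in a file in the tests directory of the package, where R CMD check will run it and report an error if any stopifnot fails.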

You should model the help page for your functions after the help page for the normal distribution mutatis mutandis (changing what needs to be changed). In particular,

Doing Assignment 3

Due Wednesday, Mar 27, 2023. Cancelled.

This homework is the same as the previous homework with the following changes.

Doing Assignment 4

Due Monday, Apr 29, 2024. Updated for 2024.

This uses MCMC. It is quite complicated, so I wrote it up in Rmarkdown.

There is no R package to be done for this assignment. Turn in a single Rmarkdown, knitr, or Sweave (whichever you prefer) file. You should still use github.umn.edu to manage your code and hand in your code.