The homework assignments
have been reorganized into reading, doing,
and thinking.
Homework assignments that do not have 2024 in their dates are wrong:
they have not been revised from last year. When they are revised, they will
get a 2024 in their dates.
Thinking
Thinking Assignment 1
Due Wednesday, Feb 14, 2024. Updated for 2024.
Upload to Canvas a one-page (or less) summary of what your class project will be. It should result in an R package that does something that some conceivable users might want to do and is not just a copy of some existing package (it could improve some existing package in some important way; it can import or require other packages so long as they do not do all the work). You should be clear about what the package will do. If you find out during development that there are better or different things for the package to do, that is OK. But you need to be concrete, not vague, at the beginning.
If you don't know where you are going, any road will get you there.
We will assign dates for class presentations at that time.
(See also Doing Assignment 1 below.)
Reading
For each reading, read and be prepared to discuss the reading in class. Every student should have two or more questions or comments about the reading to contribute to the discussion.
Reading Assignment 0
Due Friday, Jan 19, 2024. Updated for 2024.
The notes about git. Be prepared with questions about git (not necessarily related to the notes) for that day.
Reading Assignment 1
Due Monday, Jan 22, 2024. Updated for 2024.
The 3701 handout Basics of R. For anyone who wants the R markdown source, it is one level up from there, at http://www.stat.umn.edu/geyer/3701/notes/.
Reading Assignment 2
Due Friday, Jan 26, 2024. Updated for 2024.
The 8054 handout R as a Functional Programming Language. Also read the links in that document except where it says you do not have to.
Reading Assignment 3
Due Friday, Feb 2, 2024. Updated for 2024.
(Also involves some doing.) The subject is R packages.
Clone the github repo https://github.com/cjgeyer/foo.
Read the files in R package foo. This is the file package/foo in the repo.
Ignore R package fooRegister for now.
Read the book Writing R Extensions, found either at CRAN under manuals or on your own computer linked on the web page shown by R function help.start.
The code in the package may be hard to read (simple though it is). The book explains. I find it helpful to go back and forth.
The book is very large (204 pages in the PDF for the current version of R), very dense, and highly technical. So you are not going to understand much of it in a week. At least get an overview of what is in it. Skim Chapter 1, which says what an R package is, Chapter 2, which says how they are documented, and Chapter 5, which describes the R, C, and Fortran APIs (what your extensions of R written in C or Fortran, or in C++ that uses external linkage, can use of base R).
A student complained that the above was almost uselessly nonspecific and also misleading since some of Chapter 6 is also necessary. So here is more specificity about sections of Chapters 1, 5, and 6 that are necessary for understanding R packages foo and fooRegister, even though the latter is not part of this assignment.
- Section 1.5 Package namespaces, especially
- Section 5.2 Interface functions .C and .Fortran
- Section 5.4 Registering native routines (only needed to understand package fooRegister so not for this assignment)
- Section 5.9 Handling R objects in C (only needed to understand C functions called from R via the .Call function)
- Section 5.10 Interface functions .Call and .External (only needed to understand C functions called from R via the .Call function) especially
- Section 6.3 Random number generation (only needed to understand C functions called from R that use R's random number generators)
- Section 6.6 Calling C from FORTRAN and vice versa (only needed to understand Fortran functions called from R that use R's random number generators)
- Section 6.16 Controlling visibility (only needed to understand package fooRegister so not for this assignment)
Get at least some idea of what an R package looks like and what it is required to have in it (and some optional things too).
This does not cover CRAN package Rcpp, which is a whole nother kettle of fish. If the class wants to look at that https://cran.r-project.org/package=Rcpp, we can, but not this assignment.
(The doing part.) One cannot understand R without trying it out, as you often see me do in class. You have to look at things.
At the very least you want to do

    R CMD build foo
    R CMD check foo_*.tar.gz

and perhaps

    R CMD check foo_*.tar.gz --as-cran

although this last check is supposed to be done with the development version of R if you were really going to submit this to CRAN (which we are not because it is just a toy).
You could also substitute any other package for foo. But this one is about as simple as packages come.
Reading Assignment 4
Due Friday, Feb 9, 2024. Updated for 2024.
More reading in R packages. None of this material is essential if you are not doing what these packages demo. Hence some of you may not ever need this material. But some may. So we will go over this quickly, but still think of questions to ask.
- R package fooRegister in github repo foo (which you should have already cloned). There is little to see here. Other than obvious changes (the name of the package changes wherever it is used), there are the following non-obvious ones.
  - Package fooRegister in its directory src has files init.c and foo.h and Makevars that register native routines.
  - In the NAMESPACE file, the useDynLib statement is different, so that R symbols through which native routines can be called are created.
  - In all *.R files in the R directory all .C, .Fortran, and .Call invocations are different because of the preceding item.
- R package bar in repository bar, which is also accompanied by a gist about regression packages. These explain how to write R packages that do some sort of regression.
- R package baz in repository mat. This explains how to call blas and lapack functions, which are the best available codes for linear algebra, in C called from R.
- We will also go over slides about numerical linear algebra.
- R package qux in repository qux. This explains how to call R functions from inside C called from R, like R functions that do optimization, integration, simulation, and bootstrap do.
Reading Assignment 5
Due Friday, Feb 16, 2024. Updated for 2024.
The 3701 handouts Computer Arithmetic and Zero-Truncated Poisson Distribution.
Reading Assignment 6
Due Friday, Feb 23, 2024. Updated for 2024.
- 8054 notes on SQL databases (you can ignore the stuff about the SQL programming language).
- 3701 notes on using R package dplyr to talk to an SQL database (Section 7.5 of this page only).
- Website for R package dplyr. Look this over.
- 3701 notes on JSON APIs (Section 5 of this page only).
Reading Assignment 7
Due Monday, Feb 26, 2024. Updated for 2024.
Reading Assignment 8
Due Monday, Mar 13, 2024. Updated for 2024.
8054 notes on Isotonic Regression
These notes assume you have already gone over the slide deck on optimization. Hence the due date may slip depending on how fast we can cover these slides.
Reading Assignment 9
Due Friday, Mar 22, 2024. Updated for 2024.
Chapter 1, Introduction to Markov Chain Monte Carlo from the Handbook of Markov Chain Monte Carlo by your humble instructor. Sections 1.1 through 1.12.
And a handout written in response to a question from the class last year on Autocovariances in MCMC. This expands on Sections 1.10.2 and 1.10.3 in the preceding reading.
Reading Assignment 10
Due Monday, Mar 25, 2024. Updated for 2024.
Chapter 1, Introduction to Markov Chain Monte Carlo from the Handbook of Markov Chain Monte Carlo by your humble instructor. The rest of the chapter not in the preceding assignment: Sections 1.13 to the end of the chapter.
Also some announcements of permanent interest
on the course home page.
Reading Assignment 11
Due Monday, Apr 1, 2024. Updated for 2024.
Stat 3701 Lecture Notes: Bayesian Inference via Markov Chain Monte Carlo (MCMC), Section 9. The rest of this handout is hopefully interesting reading but probably already known from other courses (Bayesian inference). Read it if you want to, and even ask questions about it if you want to, but the only required reading is Section 9, which contains all the computing.
Reading Assignment 12
Due Friday, Apr 5. Updated for 2024.
Doing
Doing Assignment 1
Due last day of class (Monday, April 29, 2024) at one minute before midnight. Updated for 2024.
Consists of a git repo on github.umn.edu that contains your R package for your class project.
The repo can be public or private, your choice. Your package must contain a vignette.
You will also do a 20 minute talk with 5 more minutes for questions
on the date assigned. We will do 2 talks a day, for as long as that
takes to do the class. The talk should be aimed at the class (not at
your research advisor).
Doing Assignment 2
Due Wednesday, Mar 13, 2024. Updated for 2024.
The class, working as a team, will write an R package that provides four functions (call them pxkcd, qxkcd, dxkcd, and rxkcd) that are the usual p, q, d, and r functions for a family of distributions known to R (like pnorm etc.).
This is the distribution introduced by xkcd comic 2118. Suppose f is the probability density function (PDF) of a normal distribution, and we have a random vector (X, Y) uniformly distributed on the region bounded above by the graph of f and bounded below by the horizontal axis. Then the marginal distribution of X is this normal distribution, and the conditional distribution of Y given X is uniform on the interval from zero to f(X). The marginal distribution of Y is the xkcd distribution that we want R functions for.
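Let h(y) be the distance from the mean of X to either of the points where f(x) = y. Then the distribution function (DF) of Y is

$$G(y) = 2 y h(y) + 2 \Phi\left(- \frac{h(y)}{\sigma}\right), \qquad 0 < y < \frac{1}{\sigma \sqrt{2 \pi}},$$

where Φ is the standard normal DF, and Mathematica says the probability density function (PDF) of Y simplifies to

$$g(y) = 2 h(y) = 2 \sigma \sqrt{- 2 \log\left(\sigma \sqrt{2 \pi} \, y\right)}, \qquad 0 < y < \frac{1}{\sigma \sqrt{2 \pi}}$$

(note that this does not depend on μ so the only parameter of this distribution is σ).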
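As a rough illustration only (not a model solution), the DF and PDF can be sketched in R under the assumption that h(y) = σ √(−2 log(σ √(2π) y)), which comes from solving f(x) = y for the distance of x from the mean. The region under f has width 2 h(y) at height y, which gives the PDF. The DF adds the probability that X is farther than h(y) from its mean to the area 2 y h(y) of the rectangle below height y. The names h_sketch, dxkcd_sketch, and pxkcd_sketch are hypothetical, and the log, log.p, and swap.end.points arguments are left out.

```r
# Hypothetical helper: half-width of the region under the normal PDF at height y.
# pmax guards against a tiny negative argument caused by rounding near the top.
h_sketch <- function(y, sd = 1)
    sd * sqrt(pmax(0, -2 * log(sd * sqrt(2 * pi) * y)))

# PDF of Y: the width of the region at height y (zero outside the range)
dxkcd_sketch <- function(y, sd = 1) {
    upper <- dnorm(0, sd = sd)         # upper end of the range of Y
    out <- numeric(length(y))
    inside <- y > 0 & y < upper
    out[inside] <- 2 * h_sketch(y[inside], sd)
    out
}

# DF of Y: rectangle below height y plus the two normal tails beyond h(y)
pxkcd_sketch <- function(y, sd = 1) {
    upper <- dnorm(0, sd = sd)
    out <- as.numeric(y >= upper)      # 0 below the range, 1 above it
    inside <- y > 0 & y < upper
    h <- h_sketch(y[inside], sd)
    out[inside] <- 2 * y[inside] * h + 2 * pnorm(-h / sd)
    out
}

# Consistency check on an interior interval: these two numbers should agree
integrate(dxkcd_sketch, 0.05, 0.35)$value
pxkcd_sketch(0.35) - pxkcd_sketch(0.05)
```

The check integrates the PDF numerically and compares the result with a difference of DF values, which is the kind of thing a package test could do.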
Generating random variates from the distribution of Y is easy if done as described above (generate X normal, and then generate Y using the uniform conditional distribution of Y given X).
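The two-step recipe just described might look like the following minimal sketch. The name rxkcd_sketch is hypothetical, the mean of X is fixed at zero because it does not affect Y, and the swap.end.points argument asked for below is not handled.

```r
# Sketch: simulate Y by first simulating X, then Y given X
rxkcd_sketch <- function(n, sd = 1) {
    x <- rnorm(n, mean = 0, sd = sd)   # the mean does not matter, so use 0
    # conditional on X = x, Y is uniform on (0, f(x))
    runif(n, min = 0, max = dnorm(x, mean = 0, sd = sd))
}

set.seed(42)
y <- rxkcd_sketch(1e4, sd = 2)
range(y)            # should lie inside (0, dnorm(0, sd = 2))
```

Because runif is vectorized over its max argument, no explicit loop is needed.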
Computing quantiles of the distribution of Y is hard. I can see no way other than numerically inverting the DF.
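One possible way to invert the DF numerically is root finding with R function uniroot, sketched below. This is an assumption about approach, not the required method. The name qxkcd_sketch is hypothetical, the DF is written out inline from the width 2 h(y) of the region at height y so the block is self-contained, and the log.p and swap.end.points arguments are omitted.

```r
# Sketch: quantile function of Y by solving DF(y) = p on the range of Y
qxkcd_sketch <- function(p, sd = 1) {
    upper <- dnorm(0, sd = sd)         # upper end of the range of Y
    # DF of Y for a scalar y
    df <- function(y) {
        if (y <= 0) return(0)
        if (y >= upper) return(1)
        h <- sd * sqrt(max(0, -2 * log(sd * sqrt(2 * pi) * y)))
        2 * y * h + 2 * pnorm(-h / sd)
    }
    vapply(p, function(pp) {
        stopifnot(pp > 0, pp < 1)      # only probabilities strictly inside (0, 1)
        uniroot(function(y) df(y) - pp, lower = 0, upper = upper,
                tol = 1e-12)$root
    }, numeric(1))
}

qxkcd_sketch(c(0.1, 0.5, 0.9))
```

The root is bracketed because the DF is 0 at the lower end of the range and 1 at the upper end. Accuracy very near either end would need more care than this sketch takes.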
The class should use the pattern where everyone works in their own repo on github.umn.edu. We will divide up the work so that four students each write one of the four functions and the parts of the documentation and tests related to their function, and the fifth student is the BDFL of the project, responsible for making the whole thing work and for seeing that everything is done.
For this assignment you can do all code in R. No C unless you want to.
You should have tests that demonstrate your code works.
You should model the help page for your functions after the help page for the normal distribution mutatis mutandis (changing what needs to be changed). In particular,
- you do not need a parameter mean (changing the mean of X does not change the distribution of Y) and may want to change the name of the parameter sd to reflect that it is the standard deviation of X not Y,
- you do not need the concepts on the normal distribution page (they refer only to the normal distribution),
- you do need the arguments log and log.p,
- the argument lower.tail is almost useless for this distribution because the upper bound of the range is finite and nonzero and the distribution is not symmetric (it cannot provide high accuracy for computations near the upper bound), so instead of argument lower.tail, provide the argument swap.end.points = FALSE which when TRUE indicates that all computations are about the random variable W = 1 / (σ √(2 π)) − Y rather than Y, that is, all computations are about deviation from the upper end of the range of Y, in particular when swap.end.points = TRUE
  - function dxkcd should calculate the PDF of W (or the log thereof when log = TRUE),
  - function pxkcd should calculate the DF of W (or the log thereof when log.p = TRUE),
  - function qxkcd should calculate the quantile function of W, and
  - function rxkcd should simulate random realizations of W (and this function has no log or log.p argument).
  - Functions pxkcd and qxkcd should be inverse functions of their first arguments when the rest of the arguments are the same and the probabilities involved are strictly between zero and one.
- you should have some equations in your details section, and
- you should reference the xkcd cartoon in your references section.
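To illustrate why working with W can be more accurate near the upper end of the range, here is a hedged sketch of the PDF of W only. It assumes the PDF of Y at y is 2 σ √(−2 log(σ √(2π) y)), the width of the region under f at height y. Substituting y = 1 / (σ √(2π)) − w turns the argument of the logarithm into 1 − σ √(2π) w, which R function log1p evaluates accurately for tiny w. The name dxkcd_swapped_sketch is hypothetical and this is one possible design, not the required one.

```r
# Sketch: PDF of W = 1 / (sd * sqrt(2 * pi)) - Y, the deviation of Y
# from the upper end of its range
dxkcd_swapped_sketch <- function(w, sd = 1, log = FALSE) {
    const <- sd * sqrt(2 * pi)         # 1 / const is the upper end of the range of Y
    out <- numeric(length(w))          # zero outside (0, 1 / const)
    inside <- w > 0 & w < 1 / const
    # sd * sqrt(2 * pi) * y = 1 - const * w, so log1p keeps accuracy for small w
    out[inside] <- 2 * sd * sqrt(-2 * log1p(-const * w[inside]))
    if (log) log(out) else out
}

dxkcd_swapped_sketch(1e-20)            # still positive, not rounded to zero
```

Computing the same quantity by first forming the upper bound minus 1e-20 would lose the tiny deviation to rounding before the logarithm is ever taken.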
Doing Assignment 3
Due Wednesday, Mar 27, 2023. Cancelled.
This homework is the same as the previous homework with the following changes.
- Put the core of the computation in C or C++ (your humble instructor recommends C). You may use either .C or .Call to call your C code from R. You need only put the inner loop in C. In your C code
  - Generate random numbers (for function rxkcd).
  - Do case splitting for calculations (react to arguments log, log.p, and swap.end.points).
  - Have a loop running over components of the result (generate the whole result vector in one call to C).
- You probably want to work in the same git repo as for the previous doing assignment. That's fine. Just make it clear from the commit messages which commit is assignment 2 and which commit is assignment 3.
Doing Assignment 4
Due Monday, Apr 29, 2024. Updated for 2024.
This uses MCMC. It is quite complicated, so I wrote it up in Rmarkdown.
There is no R package to be done for this assignment. Turn in a single Rmarkdown, knitr, or Sweave (whichever you prefer) file. You should still use github.umn.edu to manage your code and hand in your code.