Statistics 8701 (Geyer, Spring 2003) Homework No. 3

Policy

This assignment will be handed in as a regular paper assignment, although you can hand in code by giving a URL where it can be found on the web or a location in your home directory on the School of Statistics Linux boxes.

For the programming problems, you should hand in enough so that it is clear exactly what you did.

The rule for grading this stuff is that only perfect is passing. If I find anything wrong, you have to debug the problem and hand in a fix (for that problem or part of a problem). And the process will be repeated until everything is correct. Thus everyone will eventually do every assignment perfectly (if they keep working at it).

You are not in competition with each other for grades. There's no reason why everyone shouldn't get an A. If you don't get an A, that's because you either failed to complete some assignments or took some huge number of iterations to complete some.

This assignment is due (initially) Wednesday, February 19, 2003.

Problem 1

(Continues Problems 2 and 3 of Homework 2)

Reimplement your

rar(rho, sigma, n, nburn, x)

function so that the looping is in C. There should be no loops in your R function.

Another requirement is that your C function should not use order nburn space when nburn is big. If length(rho) is k, you should need space n + 2 * k or something like that. This will require some clever reuse of space if nburn is big.
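For concreteness, here is a minimal sketch of what the R side might look like. The C entry point name, its argument order, and the defaults for nburn and x are all hypothetical; match them to your own code. All the looping happens in C; the R function just checks its arguments and makes one .C call.

rar <- function(rho, sigma, n, nburn = 0, x = rep(0, length(rho))) {
    k <- length(rho)
    stopifnot(n >= 1, nburn >= 0, length(x) == k)
    out <- .C("rar",
        rho = as.double(rho),
        k = as.integer(k),
        sigma = as.double(sigma),
        n = as.integer(n),
        nburn = as.integer(nburn),
        x = as.double(x),
        result = double(n))   # only n + O(k) storage crosses the interface
    out$result
}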

Problem 2

Modify your rar package to use the C function written for Problem 1. (You can keep both solutions in your package if you like.)

Add a tests directory to your package and some test code that checks that the function works. Exercise all the options (nburn and x) in different tests.
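Here is a minimal sketch of what such a test file might look like (tests/rar.R, say). The specific argument values are arbitrary, and the reproducibility check assumes your C code uses R's random number generator via GetRNGstate and PutRNGstate.

library(rar)
set.seed(42)

# defaults for nburn and x
out1 <- rar(rho = c(0.5, 0.25), sigma = 1, n = 100)
stopifnot(length(out1) == 100, all(is.finite(out1)))

# explicit nburn
out2 <- rar(rho = c(0.5, 0.25), sigma = 1, n = 100, nburn = 1000)
stopifnot(length(out2) == 100, all(is.finite(out2)))

# explicit starting values x
out3 <- rar(rho = c(0.5, 0.25), sigma = 1, n = 100, x = c(1.2, -0.7))
stopifnot(length(out3) == 100, all(is.finite(out3)))

# same seed, same arguments, same answer
set.seed(42)
out4 <- rar(rho = c(0.5, 0.25), sigma = 1, n = 100)
stopifnot(identical(out1, out4))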

One student already asked me how to create new memory in C. I don't know whether you actually want to do this in this case (I don't plan to in my solution). An alternative is just to pass in extra arguments (workspace vectors) in the .C call; a sketch of that alternative appears after the R_alloc example below. But if you want to try it, the way to do it is with R_alloc, described in the Writing R Extensions book.

For example, suppose you want a vector of type double and length n named fred. The C code

double *fred = (double *) R_alloc(n, sizeof(double));

creates it. In order to do this, you need the include

#include <R.h>

at the beginning of your program. After the R_alloc statement has been executed, fred[i] can be assigned to and read from for any integer i between zero and n - 1 (the usual C zero-origin indexing). References to fred[i] for other values of i cause a crash or data corruption (the usual behavior of C).

For those of you who know something about C (have used malloc), there is no need to free the memory returned by R_alloc. It is automagically freed when your function returns. If you want to do complicated memory management, you need to use other stuff in the Writing R Extensions book.
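For comparison, here is a minimal sketch of the workspace-vector alternative mentioned above: the R wrapper allocates the scratch space and passes it in the .C call, so the C code needs no R_alloc at all. It is the same hypothetical wrapper as in Problem 1 with one extra argument.

rar <- function(rho, sigma, n, nburn = 0, x = rep(0, length(rho))) {
    k <- length(rho)
    out <- .C("rar",
        rho = as.double(rho),
        k = as.integer(k),
        sigma = as.double(sigma),
        n = as.integer(n),
        nburn = as.integer(nburn),
        x = as.double(x),
        work = double(2 * k),   # scratch space allocated on the R side
        result = double(n))
    out$result
}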

Problem 3

Suppose we have binomial data, sample size 10, and there are 3 successes. Suppose I am a Bayesian using the flat (improper) prior on canonical parameter θ = logit(p) = log(p / (1 − p)). That is, the posterior density is the likelihood renormalized to be a probability density.

If I were to use a flat prior on p, the posterior would be a beta distribution (a fact found in master's level theory books). When I use a flat (improper) prior on θ, the posterior is not a brand-name distribution. So we use Monte Carlo.

Write a simulator for this distribution using rejection sampling (what Gentle calls an acceptance/rejection method). Be sure to do the appropriate theory as well as the computing; that is, also prove that, in the notation of Gentle's Algorithm 2.4,

pX(y) ≤ c gY(y),           for all y

for the c and gY that you choose.

Do not hard code the numbers 10 and 3 into your function. Always solve the general problem! Use variables n and x instead, and invoke your function with argument values 10 and 3. But this leads to the problem that for data x = 0 or x = n this improper prior leads to an improper posterior (!). That is, to utter rubbish. So you will need a check and an error message for this. (Our motto: garbage in, error messages out.)
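For concreteness, with p = exp(θ) / (1 + exp(θ)) the unnormalized posterior density is h(θ) = exp(x θ) / (1 + exp(θ))^n. Here is a minimal sketch of the acceptance/rejection loop with the envelope left abstract: rg samples from gY, dg evaluates gY, and c must satisfy h(θ) ≤ c gY(θ) for all θ. Choosing gY and c and proving that inequality is the theory part of the problem, and all the names here are hypothetical.

rlogitpost <- function(nsim, n, x, rg, dg, c) {
    if (x <= 0 || x >= n)
        stop("x = 0 or x = n gives an improper posterior")
    # log unnormalized posterior density of theta = logit(p)
    logh <- function(theta) x * theta - n * log1p(exp(theta))
    result <- double(nsim)
    for (i in 1:nsim) {
        repeat {
            y <- rg(1)   # proposal from the envelope density gY
            u <- runif(1)
            # accept y with probability h(y) / (c * gY(y))
            if (log(u) <= logh(y) - log(c) - log(dg(y))) break
        }
        result[i] <- y
    }
    result
}

Invoke it, for example, as rlogitpost(10000, 10, 3, rg, dg, c) once you have chosen the envelope.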


Problem 4

Suppose I observe a vector x that is assumed to be i. i. d. normal with unknown mean μ and unknown variance σ2, and I wish to be Bayesian. In this problem I use a proper prior.

It is easier (meaning we get brand-name distributions) if I use the parameter λ = 1 / σ2, which is called the precision (inverse variance). That is, the data are assumed i. i. d. Normal(μ, 1 / λ).

The conjugate prior (the normal-gamma family or t-gamma family, depending on how you factor it) is described in Bayes books. It has a complicated prior dependence between μ and λ.

I don't want to use the conjugate prior. I think μ and λ are independent a priori with marginal distributions Normal(μ0, 1 / λ0) and Gamma(a, b), respectively.

Write a Gibbs sampler to do this problem. You will find that the posterior conditional distribution of μ given λ is normal (with some parameters that depend on λ that you have to figure out) and the posterior conditional distribution of λ given μ is gamma (with some parameters that depend on μ that you have to figure out).

Thus simulating μ from its posterior conditional, and then λ from its posterior conditional (with parameters depending on the just-changed value of μ), preserves the posterior distribution.

Use the following code to make up data and hyperparameters:

set.seed(42)
n <- 11
x <- rnorm(n, 15.5, 5)
mu0 <- 0.0
lambda0 <- 0.01
a <- 2.5
b <- 200

But, as in Problem 3, don't hard code these numbers. Write a function that takes general values of x, mu0, lambda0, a, b.

Your function should also take a general value of the argument nspac, described next.

If Xi, i = 1, 2, ... is a Markov chain, then so is the subsampled chain Xmi, i = 1, 2, ..., for any positive integer m. We call m the spacing of the Markov chain samples. It is the nspac argument above. The case nspac = 1 gives no subsampling.
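Here is a minimal sketch of what such a function might look like. The conditional distributions shown in the comments are my derivation assuming Gamma(a, b) means shape a and rate b, and the starting values are arbitrary; re-derive the conditionals yourself rather than trusting the sketch.

gibbs <- function(nsim, x, mu0, lambda0, a, b, nspac = 1) {
    n <- length(x)
    xbar <- mean(x)
    mu <- xbar            # arbitrary starting values
    lambda <- 1 / var(x)
    result <- matrix(NA, nsim, 2)
    colnames(result) <- c("mu", "lambda")
    for (i in 1:nsim) {
        for (j in 1:nspac) {   # keep only every nspac-th iteration
            # mu given lambda: normal, precision lambda0 + n * lambda
            prec <- lambda0 + n * lambda
            mu <- rnorm(1, mean = (lambda0 * mu0 + n * lambda * xbar) / prec,
                sd = sqrt(1 / prec))
            # lambda given mu: gamma, shape a + n / 2, rate b + sum((x - mu)^2) / 2
            lambda <- rgamma(1, shape = a + n / 2,
                rate = b + sum((x - mu)^2) / 2)
        }
        result[i, ] <- c(mu, lambda)
    }
    result
}

Invoked, for example, as gibbs(1000, x, mu0, lambda0, a, b, nspac = 10) after running the data setup above.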