University of Minnesota, Twin Cities     School of Statistics     Stat 8701     Rweb


Statistics 8701 (Geyer) Spring 2002 Homework No. 2

Policy

This assignment will be handed in as a regular paper assignment, although you can hand in code by giving a URL in the assignment or a location in the filesystem of our linux boxes (either the department linux workstation network for students in our department or grunard.stat.umn.edu for the three students with accounts on that machine).

For the programming problems, you should hand in enough so that it is clear exactly what you did.

The rule for grading this stuff is that only perfect is passing. If I find anything wrong, you have to debug the problem and hand in a fix (for that problem or part of a problem). The process will be repeated until everything is correct. Thus everyone will eventually do every assignment perfectly (if they keep working at it).

You are not in competition with each other for grades. There's no reason why everyone shouldn't get an A. If you don't get an A, that's because you either failed to complete some assignments or took some huge number of iterations to complete some. (I've never tried this form of grading before, so I can't be more specific. After a few assignments I can say more.)

This assignment is due (initially) Friday, February 8, 2002.

Problem 1

Problem 2.10 in Robert and Casella, except ignore the part about the Kiss algorithm. Just implement completely in R using the default random number generator and an implementation of the generalized inverse CDF of the distribution of interest. For example, if gicdf is an implementation of the generalized inverse CDF that works properly on vector arguments, then

    x <- gicdf(runif(n))
produces a vector of length n of random variates having the distribution of interest. Try to implement each of the examples this way.

The extreme value distribution (also called Fisher-Tippett distribution) has the CDF

F(x) = exp[- exp((a - x) / b)]
for arbitrary parameters a and b with b > 0.

The Pareto distribution has the CDF

F(x) = 1 - (b / x)^a
for x > b > 0 and a > 0.

The arcsine distribution has the CDF

F(x) = (2 / π) sin^(-1)(x^(1/2))
for 0 < x < 1.

Problem 2

Write a C function myrunif and a corresponding R function that does the same thing as the R function runif. That is, your function should have the prototype

void myrunif(double *x, int *n)
and each call should set x to be a vector of n[0] uniform random numbers, so that, after you have dynamically loaded myrunif.so, the R commands
n <- 7
invisible(runif(1))
save.seed <- .Random.seed
runif(n)
print(.Random.seed)
.Random.seed <- save.seed
.C("myrunif", x = double(n), n = as.integer(n))$x
print(.Random.seed)
should produce exactly the same results.

Section 5.3 in the Writing R Extensions book tells how to do this.

Problem 3

Problems 2.12 and 2.13 in Robert and Casella. For the small experiment mentioned in 2.13(b), there is really no point unless both algorithms are coded in C. So do that. Write both algorithms in C, just like you did for Problem 2.

Use the R system.time function to time your routines. Use a large enough sample size that the running time is dominated by the random variate generation (so the start-up and clean-up overhead is negligible).

Problem 4

Problem 2.32(b) in Robert and Casella. This one can be done efficiently in R.

The R function chol computes the Choleski factorization of a symmetric, positive definite, square matrix.

The R operator %*% does matrix multiplication.

In case the problem statement is still not clear, the problem is how to define a matrix A so that

z <- rnorm(n)
x <- A %*% z
produces a multivariate normal random vector x with a specified covariance matrix M. That is, how do you produce A given M?

Problem 5

Problem 2.38 in Robert and Casella. No computing here, just theory.

In case the problem statement is not clear,

f(x) = (2 / π)^(1/2) exp(- x^2 / 2)
and
g(x) = λ exp(- λ x)
both for x > 0.