Statistics 5931 (Geyer, Fall 2003) Homework Assignments

Homework Assignment 1, Due Thur, Oct 9, 2003

Problem 1.1

Make a QQ plot of a supposedly chi-square random variable against the theoretical chi-square distribution. Try to make it look as much like that produced by qqnorm as possible.

R doesn't have a function to do this. Look at the code for qqnorm to see what it does. The R statements

qqnorm
methods(qqnorm)
qqnorm.default

first see the definition of qqnorm, when this turns out to be uninformative, just UseMethod("qqnorm"), we check to see what methods qqnorm has, and then look at the definition of the only one it has (why use this methods stuff when there is only one method is a mystery that only the author of the function could elucidate).

As someone brought up in class, there doesn't seem to be much point to following your QQ plot with qqline(foo). What you want to plot is the line of identity (the graph of the identity function f(x) = x). So add that line to your plot.

For test data you can use truely chi-square data produced by rchisq.

Problem 1.2

Write an R function that takes an arbitrary continuous distribution having a q function (like qnorm or qchisq) as an argument as well as some data y and produces a plot like that you made in first problem.

Be sure to include tests for bogus arguments. Again the R source for qqnorm is a good example.

Problem 1.3

Do a simulation to see whether the deviance statistic for a particular logistic regression is well approximated by a chi-square distribution. For predictor variables let us use one quantitative and one categorical variable created as follows

n <- 30
set.seed(111)
x <- rnorm(n)
w <- sample(c("red", "green", "blue"), n, replace = TRUE)
w <- as.factor(w)

and one Bernoulli response created as follows (with the random number generator in the state it was in just after creating x and w)

betax <- 1
betaw <- c(0.75, -1.5, 0.33333)
names(betaw) <- c("red", "green", "blue")
invlogit <- function(x) 1 / (1 + exp(x))
eta <- betax * x + betaw[as.character(w)]
p <- as.numeric(invlogit(eta))
y <- rbinom(n, 1, p)

  1. Fit two standard logistic regression models (standard here meaning to use the logit link function) one, the big model, having both x and w as predictors, and the other, the little model, having just x. Print out the model summaries.
  2. Do a likelihood ratio test for model comparison of these two models. Is there a statistically significant difference? Or in other words, which model does statistics tell us should be used? (Of course, here we know the true parameter values betax and betaw, but pretend we don't.)
  3. Do a parametric bootstrap simulation of the null distribution of the test statistic. This involves
  4. Compare the parametric bootstrap distribution of the deviance (which you stored in the vector dev) with its large-sample chi-square approximation using the QQ plot ideas you developed in preceding problems.
  5. Produce a bootstrap P-value. This is just mean(dev >= dev.obs) if dev.obs is the value of the test statistic corresponding to the observed data y. Compare this with the P-value derived from the large-sample approximation.
  6. Compute the Monte Carlo error for your bootstrap P-value. Go back and redo with a large enough sample size so your bootstrap P-value has two significant figures.