\documentclass[11pt]{article} \usepackage{indentfirst} \usepackage{natbib} \usepackage[utf8]{inputenc} \RequirePackage{amsmath} \DeclareMathOperator{\var}{var} \begin{document} \title{Stat 8931 Flu Homework Solution, Part 1} \author{Charles J. Geyer} \maketitle <>= options(keep.source = TRUE, width = 65) @ \section{Setup} First we load the library. <>= library(mcmc) @ Then we load the data, copied from Table~1 in \citet{flu}. <>= data <- read.table("flu.txt", header = TRUE) nobs <- data$nobs data$nobs <- NULL ypat <- as.matrix(data) ypat <- ypat[nobs > 0, ] nobs <- nobs[nobs > 0] @ Then we specify the design matrices <>= x <- diag(4) z <- rbind( c(1, 1, 1, 0, 0, 0), c(1, 1, 0, 1, 0, 0), c(1, 1, 0, 0, 1, 0), c(1, -1, 0, 0, 0, 1) ) idx <- c(1, 2, 3, 3, 3, 3) @ \subsection{Putative MLE} \citet{flu} give the following MLE for this model. <>= beta.putative <- c(-4.0, -4.4, -4.7, -4.5) sigmasq.putative <- 4.05^2 rho1.putative <- 0.43 rho2.putative <- (-0.25) delta3.putative <- sqrt(sigmasq.putative * (1 - rho1.putative)) delta2.putative <- sqrt(sigmasq.putative * (rho1.putative - rho2.putative) / 2) delta1.putative <- sqrt(sigmasq.putative * (rho1.putative + rho2.putative) / 2) delta.putative <- c(delta1.putative, delta2.putative, delta3.putative) beta.putative delta.putative @ \section{The Assigned Homework} Check their MLE using the \verb@metrop@ function in the MCMC package to calculate the score. More precisely if $f_\theta(b, y)$ is the complete data joint density (you have to work this out for yourself, although it is described implicitly in the first section), do the following. \begin{enumerate} \item Simulate using the \verb@metrop@ function the conditional distribution of $b$ given $y$ under the parameter value $\theta$ (which denotes a vector of seven parameters, four betas and three deltas). \item If $b_1$, $b_2$, $\ldots$ are the resulting Markov chain, calculate \begin{equation} \label{eq:fred} \frac{1}{n} \sum_{i = 1}^n \nabla \log f_\theta(b_i, y) \end{equation} where $\nabla$ denotes differentiation w.~r.~t.\ $\theta$ (and hence is a seven-dimensional vector of partial derivatives). \item Calculate MCSE (using the method of batch means or any other valid method) for each component of \eqref{eq:fred}. \item You should get zero (to within MCSE) for \eqref{eq:fred}, assuming \citet{flu} are correct, which all the calculations I have done seem to support. As statisticians we know that you can never ``accept'' a null hypothesis, you can only ``fail to reject'' it (in the words of one intro textbook I have used). So we can't be sure from the MCMC that this is the true MLE, but we can check it to arbitrary accuracy. Get your MCSE for the components of \eqref{eq:fred} to less than 0.001. \end{enumerate} \section{MCMC} \subsection{Log Unnormalized Density of Equilibrium Distribution} Unnormalized log density of $b$ given $y$, same as complete data log likelihood. <>= h <- function(b) { stopifnot(length(b) == ncol(z)) stopifnot(length(y) == ncol(ypat)) eta <- as.numeric(x %*% beta + z %*% (delta[idx] * b)) p <- 1 / (1 + exp(- eta)) q <- 1 / (1 + exp(eta)) sum(y * log(p) + (1 - y) * log(q)) + sum(dnorm(b, log = TRUE)) } @ \subsection{Initial Runs} <>= bstart <- rep(0, ncol(z)) beta <- beta.putative delta <- delta.putative y <- ypat[1, ] set.seed(42) @ <>= mout <- metrop(h, bstart, nbatch = 1e3) mout$accept @ Seems we lucked out with the acceptance rate. Figure~\ref{fig:one} is produced by the following code <>= par(mfrow = c(3,2)) for (i in 1:6) acf(mout$batch[ , i], main = paste("Series b", i, sep = ""), lag.max = 250) @ and appears on p.~\pageref{fig:one}. \begin{figure} \begin{center} <>= <> @ \end{center} \caption{Autocorrelation plots of the coordinates of the conditional distribution of random effects given observed data. Data pattern all zeros.} \label{fig:one} \end{figure} It looks like batches of length 50 will do the job (for this $y$ pattern). \subsection{Output Function} We need to consider the functional we wish to integrate, which is the derivative of the complete data log likelihood. If $\mu$ is the mean value parameter, then the derivative with respect to $\beta$ is $(y - \mu)^T X$ and the derivative with respect to $\delta$ is $$ (y - \mu)^T Z \frac{\partial \Delta}{\partial \delta} b $$ where $\Delta$ is a diagonal matrix whose diagonal elements are deltas (the first component is $\delta_1$, the second $\delta_3$, and the other four $\delta_3$). <>= ofun <- function(b) { stopifnot(length(b) == ncol(z)) stopifnot(length(y) == ncol(ypat)) eta <- as.numeric(x %*% beta + z %*% (delta[idx] * b)) p <- 1 / (1 + exp(- eta)) foo <- y - p bar <- as.numeric(t(foo) %*% x) baz <- double(length(delta)) for (j in 1:length(baz)) { part <- as.numeric(idx == j) baz[j] <- sum(foo * as.numeric(z %*% (part * b))) } c(bar, baz) } @ The following checks that it does indeed calculate the derivative. <>= b <- rnorm(ncol(z)) epsilon <- 1e-8 logl <- h(b) grad <- ofun(b) frad <- 0 * grad for (i in 1:length(frad)) { if (i <= length(beta)) { beta.save <- beta beta[i] <- beta[i] + epsilon frad[i] <- (h(b) - logl) / epsilon beta <- beta.save } else { j <- i - length(beta) delta.save <- delta delta[j] <- delta[j] + epsilon frad[i] <- (h(b) - logl) / epsilon delta <- delta.save } } grad frad abs(grad - frad) @ We look at autocorrelations for this \verb@outfun@. <>= mout <- metrop(mout, outfun = ofun) mout$accept @ Figure~\ref{fig:two} is produced by the following code <>= par(mfrow = c(4,2), mar = c(3, 3, 0, 0) + 0.1) for (i in 1:7) acf(mout$batch[ , i], main = "", xlab = "", ylab = "", lag.max = 250) @ and appears on p.~\pageref{fig:two}. \begin{figure} \begin{center} <>= <> @ \end{center} \caption{Autocorrelation plots of the coordinates of the complete data score when missing data has the conditional distribution of random effects given observed data. Data pattern all zeros.} \label{fig:two} \end{figure} \subsection{Runs For All Data Patterns} \subsubsection{Try I} Still looks like batches of length 100 should be safe. Let's do all data patterns with 100 batches of length 100, and $10^3$ burn in. <>= nbatch <- 100 blen <- 100 scale <- 1 @ <>= batches <- list() accepts <- list() for (j in 1:nrow(ypat)) { y <- ypat[j, ] mout <- metrop(h, bstart, nbatch = 1e3, scale = scale) mout <- metrop(mout, outfun = ofun, nbatch = nbatch, blen = blen) batches[[j]] <- mout$batch accepts[[j]] <- mout$accept } unlist(accepts) @ We should probably adjust the scale to get somewhat higher acceptance rates. But for now, lets just go ahead with the calculation of the Monte Carlo estimate of the score and its MCSE. <>= score <- double(length(beta) + length(delta)) for (j in 1:length(batches)) score <- score + nobs[j] * apply(batches[[j]], 2, mean) score.mcse <- double(length(beta) + length(delta)) for (j in 1:length(batches)) score.mcse <- score.mcse + nobs[j]^2 * apply(batches[[j]], 2, var) score.mcse <- sqrt(score.mcse / nbatch) @ <>= score score.mcse @ Some of the components of the score are several times the size of their (estimated) MCSE. Perhaps the batch length is too short? Now we have seven components to look at for each of 14 chains. Perhaps we should use time series methods to automate the process. We use the initial positive sequence method of \citet{practical}. Here is a function to calculate them <>= initpos <- function(x) { foo <- acf(x, type = "covariance", plot = FALSE, lag.max = 49) even <- foo$lag %% 2 == 0 bigGamma <- foo$acf[even] + foo$acf[! even] bigGammaPos <- cumprod(as.numeric(bigGamma > 0)) 2 * sum(bigGamma * bigGammaPos) - foo$acf[1] } @ What if we use that to calculate MCSE? <>= score.mcse.too <- double(length(beta) + length(delta)) for (j in 1:length(batches)) score.mcse.too <- score.mcse.too + nobs[j]^2 * apply(batches[[j]], 2, initpos) score.mcse.too <- sqrt(score.mcse.too / nbatch) @ <>= score.mcse score.mcse.too @ \subsubsection{Try II} So let's redo with smaller scale. <>= nbatch <- 100 blen <- 100 scale <- 0.75 <> <> <> score score.mcse score.mcse.too @ \subsubsection{Try III} So let's redo with much longer run, 100 times as long. <>= nbatch <- 1000 blen <- 1000 scale <- 0.75 <> <> <> score score.mcse score.mcse.too @ \section{Elapsed Time} <>= foo <- proc.time()[1] fooh <- floor(foo / 60^2) foo <- foo - fooh * 60^2 foom <- floor(foo / 60) foo <- foo - foom * 60 cat("total elapsed time:", fooh, "hours,", foom, "minutes, and", foo, "seconds\n") foo <- try(system("hostname -f", intern = TRUE)) if (! inherits(foo, "try-error")) cat("machine:", foo, "\n") @ \begin{thebibliography}{} \bibitem[Coull and Agresti(2000)]{flu} Coull, B.~A. and Agresti, A. (2000). \newblock Random effects modeling of multiple binomial responses using the multivariate binomial logit-normal distribution. \newblock \emph{Biometrics}, \textbf{56}, 73--80. \bibitem[{Geyer(1992)}]{practical} Geyer, C.~J. (1992). \newblock Practical {M}arkov chain {M}onte {C}arlo (with discussion). \newblock \emph{Statistical Science}, \textbf{7}, 473--511. \end{thebibliography} \end{document}