\documentclass[11pt]{article}

\usepackage{indentfirst}
\usepackage{natbib}
\usepackage{url}
\usepackage[utf8]{inputenc}
\usepackage{amsmath}

\DeclareMathOperator{\var}{var}
\DeclareMathOperator{\logit}{logit}

\begin{document}

\title{Stat 8931 Flu Homework}
\author{Charles J. Geyer}
\maketitle

\section{Introduction}

\subsection{Logit-Normal GLMM}

In a Logit-Normal generalized linear mixed model (GLMM), the observed
data is a vector $y$ whose components are conditionally independent
Bernoulli random variables given the missing data vector $b$, which is
unconditionally jointly mean-zero multivariate normal.  The model
specification is completed by the specification of the \emph{linear
predictor}
\begin{equation} \label{eq:linear-predictor}
   \eta = X \beta + Z b
\end{equation}
and the link function.  In \eqref{eq:linear-predictor} $X$ and $Z$ are
known matrices (the \emph{model matrices} for fixed and random effects,
respectively), $\beta$ is a vector of unknown parameters (\emph{fixed
effects}), $b$ is the vector of missing data (\emph{random effects}),
and the conditional expectation of $y$ given $\eta$ is
$\logit^{-1}(\eta)$.

The unknown parameters to be estimated are $\beta$ and any unknown
parameters determining the variance matrix of $b$.  Usually this
variance matrix has simple structure and involves only a few unknown
parameters.

Following \citet{miss}, we simplify the structure of the model still
further.  The class of models they use is more easily described in R
than in mathematical notation.  The linear predictor has the form
\begin{equation} \label{eq:linear-predictor-in-r}
   \verb@eta = X %*% beta + Z %*% (sigma[i] * b)@
\end{equation}
where \verb@X@ and \verb@Z@ are the matrices $X$ and $Z$ in
\eqref{eq:linear-predictor} and \verb@X %*% beta@ is the matrix
multiplication $X \beta$.  Thus the only way in which
\eqref{eq:linear-predictor} differs from
\eqref{eq:linear-predictor-in-r}, other than notationally, is that $b$
in \eqref{eq:linear-predictor} is replaced by \verb@(sigma[i] * b)@ in
\eqref{eq:linear-predictor-in-r}, which, for readers not familiar with
R, has the following interpretation: \verb@sigma@ is a vector of
unknown parameters, and \verb@i@ is a vector of the same length as
\verb@b@ having values that are possible indices for \verb@sigma@, so
\verb@sigma[i]@ is the vector $(\sigma_{i_1}, \ldots, \sigma_{i_m})$
in ordinary mathematical notation.  The \verb@*@ in
\eqref{eq:linear-predictor-in-r} denotes coordinatewise
multiplication, so if $z_{j k}$ are the components of the matrix $Z$,
the second term on the right-hand side of
\eqref{eq:linear-predictor-in-r} has $j$-th component
$$
   \sum_{k = 1}^m z_{j k} \sigma_{i_k} b_k
$$
written in conventional mathematical notation.

We also change assumptions: in \eqref{eq:linear-predictor} $b$ is
general multivariate normal, but in \eqref{eq:linear-predictor-in-r}
\verb@b@ is standard multivariate normal (mean vector zero, variance
matrix the identity).  Thus the only unknown parameters in our model
are the vectors \verb@beta@ and \verb@sigma@.

We also allow for independent and identically distributed (IID) data,
in which case the data $y$ is a matrix with IID columns, each column
of $y$ modeled as described above.

\paragraph{Summary}

First consider a single individual.  The data on this individual
consist of observed data $y$ and missing data (random effects) $b$,
both random vectors.  Marginally, the random effects vector $b$ has
independent standard normal components.  Conditional on $b$, the
components of $y$ are independent Bernoulli random variables with
success probabilities
$$
   E(y_i \mid b) = \logit^{-1}(\eta_i) = \frac{1}{1 + \exp(- \eta_i)}
$$
where $\eta_i$ are the components of the vector given by
\eqref{eq:linear-predictor-in-r}.  When we have multiple individuals,
they are considered independent and identically distributed.
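To make this concrete, here is a minimal R sketch (not part of the
original model description) of the log of the complete data joint
density $f_\theta(b, y)$ for one individual, dropping additive
constants that involve neither the parameters nor $b$.  It assumes
design objects \verb@x@, \verb@z@, and \verb@idx@ like those
constructed below, with \verb@sigma[idx]@ playing the role of
\verb@sigma[i]@ above.
<<>>=
logf <- function(b, y, beta, sigma, x, z, idx) {
    ## linear predictor: eta = X beta + Z (sigma[idx] * b)
    eta <- as.numeric(x %*% beta + z %*% (sigma[idx] * b))
    ## Bernoulli log density of y given b, plus the standard normal
    ## log density of b, up to an additive constant
    sum(y * eta - log1p(exp(eta))) - sum(b^2) / 2
}
@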
\subsection{The Influenza Data}

\citet{flu} propose fitting a Logit-Normal GLMM to data on influenza
(flu).

\subsubsection{Data}

First we load the data, copied from Table~1 in \citet{flu}.
<<>>=
data <- read.table("flu.txt", header = TRUE)
names(data)
nobs <- data$nobs
data$nobs <- NULL
ypat <- as.matrix(data)
ypat <- ypat[nobs > 0, ]
nobs <- nobs[nobs > 0]
foo <- cbind(ypat, nobs)
dimnames(foo)[[1]] <- rep("", nrow(foo))
@
\pagebreak[3]
<<>>=
foo
@
There are
<<>>=
sum(nobs)
@
independent and identically distributed (IID) individuals, each with
four Bernoulli observations ($Y_1$, $\ldots$, $Y_4$).  The meaning is
that the observations are IID random vectors $y = (y_1, y_2, y_3, y_4)$
and the $i$-th pattern of $y$ values (the $i$-th row of \verb@ypat@)
is observed \verb@nobs[i]@ times.

\subsubsection{Model}

We fit the model with the random effects having variance matrix
\begin{equation} \label{foo}
   \Sigma
   =
   \sigma^2
   \begin{bmatrix}
   1      & \rho_1 & \rho_1 & \rho_2 \\
   \rho_1 & 1      & \rho_1 & \rho_2 \\
   \rho_1 & \rho_1 & 1      & \rho_2 \\
   \rho_2 & \rho_2 & \rho_2 & 1
   \end{bmatrix}
\end{equation}
which is given by equation (7) in \citet{flu}.

In order to use our restricted model, we must express a random vector
having variance matrix \eqref{foo} in the form $Z \Delta b$, where $Z$
is a fixed known matrix that does not contain parameters, $\Delta$ is
a diagonal positive semi-definite matrix, the diagonal being
\verb@sigma[i]@ in the notation of the preceding section, and $b$ is a
standard normal random vector (having IID standard normal components).
We try
\begin{equation} \label{fooz}
   Z
   =
   \begin{pmatrix}
   1 &  1 & 1 & 0 & 0 & 0 \\
   1 &  1 & 0 & 1 & 0 & 0 \\
   1 &  1 & 0 & 0 & 1 & 0 \\
   1 & -1 & 0 & 0 & 0 & 1
   \end{pmatrix}
\end{equation}
Then
\begin{equation} \label{food}
\begin{split}
   \var(Z \Delta b)
   & =
   Z \Delta^2 Z^T
   \\
   & =
   \begin{pmatrix}
   \delta_1^2 + \delta_2^2 + \delta_3^2 & \delta_1^2 + \delta_2^2 &
       \delta_1^2 + \delta_2^2 & \delta_1^2 - \delta_2^2 \\
   \delta_1^2 + \delta_2^2 & \delta_1^2 + \delta_2^2 + \delta_4^2 &
       \delta_1^2 + \delta_2^2 & \delta_1^2 - \delta_2^2 \\
   \delta_1^2 + \delta_2^2 & \delta_1^2 + \delta_2^2 &
       \delta_1^2 + \delta_2^2 + \delta_5^2 & \delta_1^2 - \delta_2^2 \\
   \delta_1^2 - \delta_2^2 & \delta_1^2 - \delta_2^2 &
       \delta_1^2 - \delta_2^2 & \delta_1^2 + \delta_2^2 + \delta_6^2
   \end{pmatrix}
\end{split}
\end{equation}
and we see that \eqref{foo} and \eqref{food} do indeed have the same
form if we impose the constraints
\begin{equation} \label{deltas}
   \delta_3 = \delta_4 = \delta_5 = \delta_6
\end{equation}
with one caveat.  The representation \eqref{foo} is positive
semi-definite if
\begin{subequations}
\begin{gather}
   - \frac{1}{2} \le \rho_1 \le 1 \label{fulla} \\
   \rho_2^2 \le \frac{1 + 2 \rho_1}{3} \label{fullb}
\end{gather}
\end{subequations}
whereas the representation \eqref{food} is automatically positive
semi-definite, but as the $\delta_i$ range over all non-negative
numbers the correlations
\begin{align*}
   \rho_1 & = \frac{\delta_1^2 + \delta_2^2}{\delta_1^2 + \delta_2^2 + \delta_3^2} \\
   \rho_2 & = \frac{\delta_1^2 - \delta_2^2}{\delta_1^2 + \delta_2^2 + \delta_3^2}
\end{align*}
fill out only the set
\begin{subequations}
\begin{gather}
   0 \le \rho_1 \le 1 \label{onea} \\
   - \rho_1 \le \rho_2 \le \rho_1 \label{oneb}
\end{gather}
\end{subequations}
Hence our $Z \Delta b$ model is a restriction of the full model
specified by \eqref{foo}.  Nevertheless, if the MLE is in the interior
of the parameter space for our $Z \Delta b$ model, then it is the same
as the MLE for the full model.

To finish the model specification we must specify the design matrices
<<>>=
x <- diag(4)
z <- rbind(
c(1,  1, 1, 0, 0, 0),
c(1,  1, 0, 1, 0, 0),
c(1,  1, 0, 0, 1, 0),
c(1, -1, 0, 0, 0, 1)
)
idx <- c(1, 2, 3, 3, 3, 3)
@
the last vector being used to impose the constraint \eqref{deltas}.
Now \verb@sigma[idx]@ is the diagonal of $\Delta$.  (We switched from
\verb@i@ to \verb@idx@ because it is too hard to remember not to use
the variable name \verb@i@ for anything else.)
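As a quick numerical sanity check (not in the original document), we
can verify that \verb@z@ and \verb@idx@ reproduce the pattern
\eqref{foo} with the constraint \eqref{deltas} imposed.  The values in
\verb@delta.check@ below are arbitrary hypothetical ones.
<<>>=
delta.check <- c(1.5, 1.0, 2.0)
## variance matrix of Z Delta b, where Delta = diag(delta.check[idx])
Sigma.check <- z %*% diag(delta.check[idx]^2) %*% t(z)
Sigma.check
## implied sigma^2, rho1, and rho2
c(Sigma.check[1, 1],
  Sigma.check[1, 2] / Sigma.check[1, 1],
  Sigma.check[1, 4] / Sigma.check[1, 1])
@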
\subsubsection{Putative MLE} \label{sec:putative}

\citet{flu} give the following MLE for this model.
<<>>=
beta.putative <- c(-4.0, -4.4, -4.7, -4.5)
sigmasq.putative <- 4.05^2
rho1.putative <- 0.43
rho2.putative <- (-0.25)
delta3.putative <- sqrt(sigmasq.putative * (1 - rho1.putative))
delta2.putative <- sqrt(sigmasq.putative * (rho1.putative - rho2.putative) / 2)
delta1.putative <- sqrt(sigmasq.putative * (rho1.putative + rho2.putative) / 2)
delta.putative <- c(delta1.putative, delta2.putative, delta3.putative)
beta.putative
delta.putative
@

\section{The Assigned Homework}

Check the MLE found by \citet{flu} (given in Section~\ref{sec:putative}
above) using the \verb@metrop@ function in the \verb@mcmc@ package to
calculate the score.  More precisely, if $f_\theta(b, y)$ is the
complete data joint density (you have to work this out for yourself,
although it is described implicitly in the first section), do the
following.
\begin{enumerate}
\item Simulate using the \verb@metrop@ function the conditional
distribution of $b$ given $y$ under the parameter value $\theta$
(which denotes a vector of seven parameters, four betas and three
deltas).  A skeletal example follows this list.
\item If $b_1$, $b_2$, $\ldots$ are the resulting Markov chain,
calculate
\begin{equation} \label{eq:fred}
   \frac{1}{N} \sum_{i = 1}^N \nabla \log f_\theta(b_i, y)
\end{equation}
where $\nabla$ denotes differentiation w.~r.~t.\ $\theta$ (and hence
is a seven-dimensional vector of partial derivatives).
\item Calculate MCSE (using the method of batch means or any other
valid method) for each component of \eqref{eq:fred}.
\item You should get zero (to within MCSE) for \eqref{eq:fred},
assuming \citet{flu} are correct, which all the calculations I have
done seem to support.  As statisticians we know that you can never
``accept'' a null hypothesis, you can only ``fail to reject'' it (in
the words of one intro textbook I have used).  So we can't be sure
from the MCMC that this is the true MLE, but we can check it to
arbitrary accuracy.  Get your MCSE for the components of
\eqref{eq:fred} to less than 0.25 (\textbf{note:} this is increased
from the paper handed out in class).
\end{enumerate}
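The following skeletal sketch (not a complete solution) illustrates
step 1 for a single observed data pattern, using the \verb@logf@
function defined in the introduction and the putative parameter
values, with \verb@delta.putative@ playing the role of \verb@sigma@.
As a function of $b$ alone, $f_\theta(b, y)$ is proportional to the
conditional density of $b$ given $y$, which is exactly the kind of
log unnormalized density that \verb@metrop@ expects.  The tuning
constant \verb@scale@ below is an arbitrary starting guess.
<<>>=
library(mcmc)
## log unnormalized conditional density of b given y for the first
## observed data pattern, at the putative MLE
ludfun <- function(b)
    logf(b, ypat[1, ], beta.putative, delta.putative, x, z, idx)
out <- metrop(ludfun, initial = rep(0, ncol(z)), nbatch = 100,
    blen = 100, scale = 0.5)
out$accept
@
The acceptance rate \verb@out$accept@ guides adjustment of
\verb@scale@; the hints below describe how to handle all the observed
data patterns, either stacked into one chain or as separate chains.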
\subsection*{Hints}

For each individual, the observed data is a Bernoulli vector of length
four (a row of \verb@ypat@) and the missing data is a continuous
random vector of length four (marginally normal, but not conditionally
normal given the observed data).

Although there are \verb@sum(nobs)@ = \Sexpr{sum(nobs)} individuals,
all individuals with the same observed data pattern (belonging to the
same row of \verb@ypat@) make the same contribution to the log
likelihood.  Therefore we need only simulate as many missing data
patterns as there are observed data patterns, which is
\verb@nrow(ypat)@ = \Sexpr{nrow(ypat)}.

In order to make this clear, write $n_j$ for \verb@nobs[j]@, write
$y_j$ for the data pattern \verb@ypat[j, ]@ (note that $y_j$ is a
four-vector), and write $b_{i j}$ for the missing data for an
individual with data pattern $y_j$ simulated in the Markov chain.
Here $i$ indicates an iteration of the Markov chain, $j$ a data
pattern, and $b_{i j}$ is a six-vector (so that $Z b_{i j}$ makes
sense).  Then we can rewrite \eqref{eq:fred} as
\begin{equation} \label{eq:sally}
   \frac{1}{N} \sum_{i = 1}^N \sum_{j = 1}^J n_j
   \nabla \log f_\theta(b_{i j}, y_j)
\end{equation}
where now $f_\theta$ denotes the complete data density for one
individual with observed data $y_j$ and missing data $b_{i j}$.

Thus we see that we can have the dimension of the state of the Markov
chain be \verb@nrow(ypat)@ $\times$ \verb@ncol(z)@ =
\Sexpr{nrow(ypat) * ncol(z)}.

An alternative method would be to run \verb@nrow(ypat)@ =
\Sexpr{nrow(ypat)} independent Markov chains, each with state of
dimension \verb@ncol(z)@ = \Sexpr{ncol(z)}.  We still denote the
realizations $b_{i j}$, but now for each fixed $j$ the sequence
$b_{1 j}$, $b_{2 j}$, $\ldots$ is a Markov chain (all by itself).
Then we can calculate the contribution
\begin{equation} \label{eq:sally-contrib}
   \frac{1}{N} \sum_{i = 1}^N \nabla \log f_\theta(b_{i j}, y_j)
\end{equation}
to the score \eqref{eq:sally} from this Markov chain and calculate
standard errors for that too.  Of course, \eqref{eq:sally-contrib} is
not supposed to be zero.  Only when the terms of
\eqref{eq:sally-contrib} are multiplied by $n_j$ and added to give
\eqref{eq:sally} do we get zero (theoretically).

\begin{thebibliography}{}

\bibitem[Coull and Agresti(2000)]{flu}
Coull, B.~A. and Agresti, A. (2000).
\newblock Random effects modeling of multiple binomial responses using
    the multivariate binomial logit-normal distribution.
\newblock \emph{Biometrics}, \textbf{56}, 73--80.

\bibitem[Sung and Geyer(submitted)]{miss}
Sung, Y.~J. and Geyer, C.~J. (submitted).
\newblock Monte Carlo likelihood inference for missing data models.
\newblock \url{http://www.stat.umn.edu/geyer/bernor/ms.pdf}.

\end{thebibliography}

\end{document}