\documentclass{article}
\usepackage{amsmath}
\usepackage{natbib}
\usepackage{indentfirst}
\DeclareMathOperator{\logit}{logit}
% \VignetteIndexEntry{Logit-Normal GLMM Examples}
\begin{document}
\title{Logit-Normal GLMM Examples}
\author{Yun Ju Sung \and Charles J. Geyer}
\maketitle
\section{Examples}
\subsection{Logit-Normal GLMM}
In a Logit-Normal generalized linear mixed model (GLMM), the observed data
is a vector $y$ whose components are conditionally independent Bernoulli
random variables given the missing data vector $b$, which is unconditionally
jointly mean-zero multivariate normal. The model specification is completed
by the specification of the \emph{linear predictor}
\begin{equation} \label{eq:linear-predictor}
\eta = X \beta + Z b
\end{equation}
and the link function. In \eqref{eq:linear-predictor} $X$ and $Z$ are
known matrices (the ``design'' or ``model'' matrices for fixed and random
effects, respectively), $\beta$ is a vector of unknown parameters
(``fixed effects''), $b$ is the vector of missing data (``random effects''),
and the conditional expectation of $y$ given $\eta$ is $\logit^{-1}(\eta)$.
The unknown parameters to be estimated are $\beta$ and any unknown parameters
determining the variance matrix of $b$. Usually this variance
matrix has simple structure and involves only a few unknown parameters.
We have written an R package \texttt{bernor} that implements
the methods of this paper for a class of Logit-Normal GLMMs. The
class of models our package handles is more easily
described in R than in mathematical
notation. The linear predictor has the form
\begin{equation} \label{eq:linear-predictor-in-r}
\verb@eta = X %*% beta + Z %*% (sigma[i] * b)@
\end{equation}
where \verb@X@ and \verb@Z@ are the matrices $X$ and $Z$
in \eqref{eq:linear-predictor} and \verb@X %*% beta@ is the matrix
multiplication $X \beta$. Thus the only way
in which \eqref{eq:linear-predictor} differs
from \eqref{eq:linear-predictor-in-r}, other than notationally, is that
$b$ in \eqref{eq:linear-predictor}
is replaced by \verb@(sigma[i] * b)@ in \eqref{eq:linear-predictor-in-r}.
For readers not familiar with R, this expression has the following
interpretation: \verb@sigma@ is a vector of unknown parameters, and
\verb@i@ is a vector of the same length as \verb@b@ whose values are
possible indices for \verb@sigma@, so \verb@sigma[i]@ is the vector
$(\sigma_{i_1}, \ldots, \sigma_{i_m})$ in ordinary mathematical notation.
The operator \verb@*@ in \eqref{eq:linear-predictor-in-r} denotes
coordinatewise multiplication, so if $z_{j k}$ are the components of the
matrix $Z$, the second term on the right-hand side
of \eqref{eq:linear-predictor-in-r} has $j$-th component
$$
\sum_{k = 1}^m z_{j k} \sigma_{i_k} b_k
$$
in conventional mathematical notation.
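For readers new to R, the following chunk (with purely illustrative dimensions and values, not taken from our package or data) verifies that the vectorized expression \verb@Z %*% (sigma[i] * b)@ agrees with the componentwise sum displayed above.
<<>>=
local({
    ## purely illustrative: 3 observations, 4 random effects,
    ## 2 variance components
    Z <- matrix(rnorm(12), nrow = 3, ncol = 4)
    sigma <- c(0.5, 2.0)   # unknown scale parameters
    i <- c(1, 1, 2, 2)     # maps each random effect to a sigma
    b <- rnorm(4)          # standard normal random effects
    ## vectorized form used in the package
    v1 <- as.vector(Z %*% (sigma[i] * b))
    ## componentwise sum: sum over k of z[j, k] * sigma[i[k]] * b[k]
    v2 <- sapply(1:3, function(j) sum(Z[j, ] * sigma[i] * b))
    all.equal(v1, v2)
})
@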
We also change assumptions: in \eqref{eq:linear-predictor}
$b$ is general multivariate normal, but in \eqref{eq:linear-predictor-in-r}
\verb@b@ is standard multivariate normal
(mean vector zero, variance matrix the identity).
Thus the only unknown parameters in our model are the vectors
\verb@beta@ and \verb@sigma@, and our package deals only with the
simple situation in which the random effects are (unconditionally)
independent.
We also allow for independent and identically distributed (IID) data,
in which case the data $y$ is a matrix with IID columns, each column
of $y$ modeled as described above.
\subsection{McCulloch's Toy Data}
We start with a simple toy model taken from \citet{mcculloch} and
also used by \citet{booth} in which the log likelihood can be calculated
exactly by numerical integration.
These data follow the model
$$
\logit\bigl( E(y_{i j} \mid b) \bigr) = \beta x_i + \sigma b_j,
$$
where $x_i = i / d$ and $d$ is the number of rows of $y$.
A simulated data set of this form was given by \citet[Table~2]{booth}.
This is the data set \verb@booth@ in our \verb@bernor@ package
(note that our $y$ is the transpose of their $y$ to agree with our
convention that columns of $y$ are independent).
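Since each column of $y$ involves only one scalar random effect, the exact log likelihood for this toy model is a sum of one-dimensional integrals that R's \verb@integrate@ function can evaluate. The following sketch (with simulated data and illustrative parameter values, not the \verb@booth@ data) shows the idea.
<<>>=
local({
    ## simulate a small data set from the toy model
    d <- 10                 # rows of y
    ncol.y <- 15            # independent columns of y
    x <- (1:d) / d
    beta <- 5               # illustrative parameter values
    sigma <- 1
    b <- rnorm(ncol.y)
    p <- plogis(outer(beta * x, sigma * b, "+"))
    y <- matrix(rbinom(d * ncol.y, 1, p), nrow = d)
    ## exact log likelihood: one numerical integral per column of y
    logl <- function(beta, sigma) {
        sum(apply(y, 2, function(yj) {
            integrand <- function(b) sapply(b, function(bb) {
                p <- plogis(beta * x + sigma * bb)
                prod(p^yj * (1 - p)^(1 - yj)) * dnorm(bb)
            })
            log(integrate(integrand, -Inf, Inf)$value)
        }))
    }
    logl(beta, sigma)
})
@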
\subsubsection{Monte Carlo Maximum Likelihood}
Our package provides no optimization capabilities, only evaluation
of the log likelihood, its derivatives, and related quantities.
Use either \verb@nlm@ or \verb@optim@ for optimization.
Here we demonstrate \verb@optim@.
First we attach the data.
<<>>=
library(bernor)
data(booth)
attach(booth)
@
Then we create functions that calculate the objective function
(the Monte Carlo log likelihood approximation) and its gradient.
(The gradient is optional; \verb@optim@ can use numerical differentiation
instead, but supplying the gradient makes for more efficient optimization.)
<<>>=
moo <- model("gaussian", length(i), 1.0)
nparm <- length(theta0)
nfix <- length(mu0)

objfun <- function(theta) {
    if (!is.numeric(theta)) stop("objfun: theta not numeric")
    if (length(theta) != nparm) stop("objfun: theta wrong length")
    mu <- theta[seq(1, nfix)]
    sigma <- theta[- seq(1, nfix)]
    .Random.seed <<- .save.Random.seed
    bnlogl(y, mu, sigma, nmiss, x, z, i, moo)$value
}

objgrd <- function(theta) {
    if (!is.numeric(theta)) stop("objgrd: theta not numeric")
    if (length(theta) != nparm) stop("objgrd: theta wrong length")
    mu <- theta[seq(1, nfix)]
    sigma <- theta[- seq(1, nfix)]
    .Random.seed <<- .save.Random.seed
    bnlogl(y, mu, sigma, nmiss, x, z, i, moo, deriv = 1)$gradient
}
@
Our functions always use the same random seed (the seed is restored
to \verb@.save.Random.seed@ just before every function evaluation). This
follows the principle of ``common random numbers'' and ensures that the
function we are evaluating remains the same throughout the optimization.
(We can later try different random seeds if we choose.)
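The device of common random numbers is easy to demonstrate in isolation: restoring the seed before every evaluation makes a Monte Carlo approximation a deterministic function of its argument. (The function \verb@mcfun@ below and its target $E e^{\theta Z}$, $Z$ standard normal, are illustrative, not part of our package.)
<<>>=
local({
    set.seed(42)
    .save.Random.seed <- .Random.seed
    ## Monte Carlo approximation of E exp(theta Z), Z standard normal,
    ## always evaluated with the same random numbers
    mcfun <- function(theta) {
        .Random.seed <<- .save.Random.seed
        mean(exp(theta * rnorm(1000)))
    }
    mcfun(0.5) == mcfun(0.5)
})
@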
Then we are ready to try it out.
<<>>=
set.seed(42)
.save.Random.seed <- .Random.seed

nmiss <- 1e2
totalelapsed <- 0
theta.start <- theta0
lower <- c(-Inf, 0)
control <- list(fnscale = -10)

tout <- system.time(
    out <- optim(theta.start, objfun, objgrd, method = "L-BFGS-B",
        lower = lower, control = control)
)
cat("elapsed time", tout["elapsed"], "seconds\n")
totalelapsed <- totalelapsed + tout["elapsed"]
print(out)
@
\pagebreak[1]
The result does not agree closely with the exact maximum likelihood
estimate (MLE), which is
<<>>=
print(theta.hat.exact)
@
from the \verb@booth@ dataset (which we attached above); this exact MLE
agrees with the value
$(6.132, 1.766)$ reported by \citet[p.~278]{booth} when one takes
into consideration that their second parameter is $\sigma^2$ and
ours is $\sigma$.
It is hard to know what lessons one is supposed to draw from a toy problem.
In real life we would not, in general, have an exact MLE for comparison.
We would have for guidance Monte Carlo standard errors
(Section~\ref{sec:mcmle} below), but rather than calculate them for such
a small Monte Carlo sample size \verb@nmiss@, let us increase \verb@nmiss@
and redo the optimization.
<<>>=
nmiss <- 1e4 ##### for real
##### nmiss <- 1e3 ##### DEBUG
theta.start <- out$par
lower <- c(-Inf, 0)
control <- list(fnscale = out$value)

tout <- system.time(
    out <- optim(theta.start, objfun, objgrd, method = "L-BFGS-B",
        lower = lower, control = control)
)
cat("elapsed time", tout["elapsed"], "seconds\n")
totalelapsed <- totalelapsed + tout["elapsed"]
print(out)
@
And we are now much closer:
<<>>=
theta.hat <- out$par
theta.hat - theta.hat.exact
@
\subsubsection{Monte Carlo Standard Errors} \label{sec:mcmle}
Standard errors for our method involve the matrices $J$, $V$, and $W$
that are estimated as follows.
<<>>=
.Random.seed <<- .save.Random.seed
tout <- system.time(
    out <- bnlogl(y, theta.hat[1], theta.hat[2], nmiss, x, z, i, moo,
        deriv = 3)
)
cat("elapsed time", tout["elapsed"], "seconds\n")
totalelapsed <- totalelapsed + tout["elapsed"]
print(out)

tout <- system.time(
    wout <- bnbigw(y, theta.hat[1], theta.hat[2], nmiss, x, z, i, moo)
)
cat("elapsed time", tout["elapsed"], "seconds\n")
totalelapsed <- totalelapsed + tout["elapsed"]
print(wout)

nobs <- ncol(y)
bigJ <- (- out$hessian / nobs)
eigen(bigJ, symmetric = TRUE, only.values = TRUE)$values
bigV <- out$bigv
bigW <- wout
bigS <- solve(bigJ) %*% (bigV / nobs + bigW / nmiss) %*% solve(bigJ)
print(bigS)
@
% print(bigV / nobs)
% print(bigW / nmiss)
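In symbols, the last assignment above computes the estimated asymptotic
variance matrix in the familiar sandwich form
$$
\hat{J}^{-1} \left( \frac{\hat{V}}{n} + \frac{\hat{W}}{m} \right)
\hat{J}^{-1},
$$
where $n$ is the number of independent columns of $y$ (\verb@nobs@ in
the code) and $m$ is the Monte Carlo sample size (\verb@nmiss@).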
If we write a function to draw ellipses,
<<>>=
doellipse <- function(m, v, Rsq = qchisq(0.95, 2), npoint = 250,
    plot = TRUE, add = FALSE, ...)
{
    if (! is.numeric(m)) stop("m not numeric")
    if (! is.numeric(v)) stop("v not numeric")
    if (! is.matrix(v)) stop("v not matrix")
    if (length(m) != 2) stop("m not 2-vector")
    if (any(dim(v) != 2)) stop("v not 2x2-matrix")
    phi <- seq(0, 2 * pi, length = npoint)
    foo <- rbind(cos(phi), sin(phi))
    rsq <- Rsq / diag(t(foo) %*% solve(v) %*% foo)
    bar1 <- sqrt(rsq) * foo[1, ] + m[1]
    bar2 <- sqrt(rsq) * foo[2, ] + m[2]
    if (plot) {
        if (! add)
            plot(bar1, bar2, type = "l", ...)
        else
            lines(bar1, bar2, ...)
    }
    return(invisible(list(x = bar1, y = bar2)))
}
@
we can use it to produce confidence regions.
Figure~\ref{fig:fig1} shows a nominal 95\% confidence ellipse.
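The parameterization used in \verb@doellipse@ comes from the following
observation: for a unit vector $u(\phi) = (\cos\phi, \sin\phi)^T$, the
point $m + r u(\phi)$ lies on the ellipse
$\{\, p : (p - m)^T v^{-1} (p - m) = R^2 \,\}$ if and only if
$$
r^2 = \frac{R^2}{u(\phi)^T v^{-1} u(\phi)},
$$
which is the quantity \verb@rsq@ computed in the function, one value
per angle $\phi$.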
<