\documentclass{article}
\usepackage{natbib}
\usepackage{graphics}
\usepackage{amsmath}
\usepackage{indentfirst}
\usepackage[utf8]{inputenc}
\DeclareMathOperator{\var}{var}
\DeclareMathOperator{\cov}{cov}
% \VignetteIndexEntry{MCMC Example}
\begin{document}
<<>>=
options(keep.source = TRUE, width = 60)
foo <- packageDescription("mcmc")
@
\title{MCMC Package Example (Version \Sexpr{foo$Version})}
\author{Charles J. Geyer}
\maketitle
\section{The Problem}
This is an example of using the \verb@mcmc@ package in R. The problem comes
from a take-home question on a (take-home) PhD qualifying exam
(School of Statistics, University of Minnesota).
Simulated data for the problem are in the dataset \verb@logit@.
There are five variables in the data set, the response \verb@y@
and four predictors, \verb@x1@, \verb@x2@, \verb@x3@, and \verb@x4@.
A frequentist analysis for the problem is done by the following R statements
<<>>=
library(mcmc)
data(logit)
out <- glm(y ~ x1 + x2 + x3 + x4, data = logit,
    family = binomial())
summary(out)
@
But this problem isn't about that frequentist analysis; we want a Bayesian
analysis. For our Bayesian analysis we assume the same data model as the
frequentist, and we assume the prior distribution of the five parameters
(the regression coefficients) makes them independent and identically
normally distributed with mean 0 and standard deviation 2.
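In symbols, writing $p_i = 1 / (1 + e^{- x_i^T \beta})$ for the success
probability of case $i$, the log unnormalized posterior density
(log likelihood plus log prior) is
\begin{equation*}
   \log h(\beta)
   = \sum_{i : y_i = 1} \log p_i + \sum_{i : y_i = 0} \log (1 - p_i)
   - \sum_{j = 1}^{5} \frac{\beta_j^2}{8} + \text{constant},
\end{equation*}
the last sum being the log of the product of the five $\text{Normal}(0, 2^2)$
prior densities, dropping an additive constant, which the Metropolis
algorithm never needs.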
The log unnormalized posterior (log likelihood plus log prior) density
for this model is calculated by
the following R function (given the preceding data definitions)
<<>>=
x <- logit
x$y <- NULL
x <- as.matrix(x)
x <- cbind(1, x)
dimnames(x) <- NULL
y <- logit$y
lupost <- function(beta, x, y) {
    eta <- x %*% beta
    p <- 1 / (1 + exp(- eta)) # note: works even when eta is Inf or -Inf
    logl <- sum(log(p[y == 1])) + sum(log(1 - p[y == 0]))
    return(logl + sum(dnorm(beta, 0, 2, log = TRUE)))
}
@
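One caveat (a side note, not part of the original problem): although the
computation of \verb@p@ itself is safe, \verb@log(p)@ underflows to
\verb@-Inf@ when \verb@eta@ is a large negative number, because \verb@p@
rounds to zero in double precision. A base-R sketch of an algebraically
equivalent form using \verb@log1p@ that avoids this:

```r
## Sketch (base R only): log(p) = - log(1 + exp(- eta)) underflows for
## very negative eta because p itself rounds to zero in double precision.
eta <- c(-800, 0, 800)
p <- 1 / (1 + exp(- eta))
log(p)   # first element is -Inf: p[1] underflowed to 0

## Equivalent form that never forms p explicitly:
##   log(p) = eta - log(1 + exp(eta))    for eta < 0
##   log(p) =     - log(1 + exp(- eta))  for eta >= 0
logp <- ifelse(eta < 0, eta - log1p(exp(eta)), - log1p(exp(- eta)))
logp     # all finite: -800, -log(2), approximately 0
```

The same trick applies to $\log(1 - p)$ with the sign of \verb@eta@ flipped.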
\section{Beginning MCMC}
With those definitions in place, the following code runs the Metropolis
algorithm to simulate the posterior.
<<>>=
set.seed(42) # to get reproducible results
beta.init <- as.numeric(coefficients(out))
out <- metrop(lupost, beta.init, 1e3, x = x, y = y)
names(out)
out$accept
@
The arguments to the \verb@metrop@ function used here (there are more we
don't use) are
\begin{itemize}
\item an R function (here \verb@lupost@) that evaluates the log unnormalized
    density of the desired stationary distribution (here a posterior
    distribution) of the Markov chain. Note (although this example
    does not exhibit the phenomenon) that the unnormalized density may
    be zero, in which case the log unnormalized density is \verb@-Inf@.
\item an initial state (here \verb@beta.init@) of the Markov chain.
\item a number of batches (here \verb@1e3@) for the Markov chain.
This combines with batch length and spacing (both 1 by default)
to determine the number of iterations done.
\item additional arguments (here \verb@x@ and \verb@y@) supplied to
provided functions (here \verb@lupost@).
\item there is no ``burn-in'' argument, although burn-in is easily
accomplished, if desired (more on this below).
\end{itemize}
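To fix ideas, here is a minimal sketch in plain R of the normal-random-walk
Metropolis update that \verb@metrop@ implements (the package does this in C;
the function and argument names below are made up for illustration):

```r
## Minimal random-walk Metropolis sketch (base R only, not the mcmc
## package's C implementation).  lu is any log unnormalized density.
metrop.sketch <- function(lu, init, niter, scale = 1) {
    d <- length(init)
    path <- matrix(NA_real_, niter, d)
    state <- init
    lu.state <- lu(state)
    accepted <- 0
    for (i in seq_len(niter)) {
        prop <- state + scale * rnorm(d)   # normal random walk proposal
        lu.prop <- lu(prop)
        ## accept with probability min(1, exp(lu.prop - lu.state));
        ## lu.prop == -Inf (zero density) is always rejected
        if (log(runif(1)) < lu.prop - lu.state) {
            state <- prop
            lu.state <- lu.prop
            accepted <- accepted + 1
        }
        path[i, ] <- state
    }
    list(batch = path, accept = accepted / niter)
}

## Example: standard normal target.  Bigger steps, lower acceptance.
set.seed(42)
out.small <- metrop.sketch(function(x) - x^2 / 2, 0, 1e4, scale = 0.1)
out.big   <- metrop.sketch(function(x) - x^2 / 2, 0, 1e4, scale = 10)
out.small$accept   # near 1
out.big$accept     # much smaller
```

The toy run on a standard normal target also previews the tuning issue:
larger proposal steps give a lower acceptance rate.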
The output is in the component \verb@out$batch@ returned by the \verb@metrop@
function. We'll look at it presently, but first we need to adjust the
proposal to get a higher acceptance rate (\verb@out$accept@). It is generally
accepted \citep*{grg} that an acceptance rate of about 20\% is right, although
this recommendation is based on the asymptotic analysis of a toy problem
(simulating a multivariate normal distribution) for which one would never
use MCMC and which is very unrepresentative of difficult MCMC applications.
\citet{geyer-temp} came to a similar conclusion,
that a 20\% acceptance rate is about right, in a very different situation.
But they also warned that a 20\% acceptance rate could be very wrong
and produced
an example where a 20\% acceptance rate was impossible and attempting to
reduce the acceptance rate below 70\% would keep the sampler from ever
visiting part of the state space. So the 20\% magic number must be
considered like other rules of thumb we teach in intro courses
(like $n > 30$ means the normal approximation is valid).
We know these rules of thumb can fail.
There are examples in the literature where
they do fail. We keep repeating them because we want something simple to
tell beginners, and they are all right for some problems.
Be that as it may, we try for 20\%.
<<>>=
out <- metrop(out, scale = 0.1, x = x, y = y)
out$accept
out <- metrop(out, scale = 0.3, x = x, y = y)
out$accept
out <- metrop(out, scale = 0.5, x = x, y = y)
out$accept
out <- metrop(out, scale = 0.4, x = x, y = y)
out$accept
@
Here the first argument to each instance of the \verb@metrop@ function is
the output of a previous invocation. The Markov chain continues where
the previous run stopped, doing just what it would have done if it had
kept going, the initial state and random seed being the final state and
random seed of the previous invocation. Everything stays the same
except for the arguments that are supplied (here \verb@scale@).
\begin{itemize}
\item The argument \verb@scale@ controls the size of the Metropolis
``normal random walk'' proposal. The default is \verb@scale = 1@.
    Big steps give lower acceptance rates; small steps give higher ones.
    We want something close to 20\%. It is also possible to make \verb@scale@
a vector or a matrix. See \verb@help(metrop)@.
\end{itemize}
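For concreteness, \verb@help(metrop)@ describes the proposal as the current
state plus \verb@scale * z@ (scalar or vector \verb@scale@) or
\verb@scale %*% z@ (matrix \verb@scale@), where \verb@z@ is a standard
normal random vector. A base-R sketch (the step sizes below are made-up
numbers):

```r
## The three forms of scale and the proposals they generate
## (mirrors the description in help(metrop); step sizes are made up).
d <- 5
state <- rep(0, d)
z <- rnorm(d)                            # standard normal innovation

s.scalar <- 0.4
s.vector <- c(0.4, 0.4, 0.2, 0.2, 0.1)   # per-coordinate step sizes
s.matrix <- diag(s.vector)               # or, e.g., chol() of a covariance

prop1 <- state + s.scalar * z            # same step size every coordinate
prop2 <- state + s.vector * z            # coordinate-specific step sizes
prop3 <- state + s.matrix %*% z          # correlated steps in general
all.equal(prop2, as.numeric(prop3))      # TRUE: diagonal matrix == vector
```

A matrix \verb@scale@ is useful when the posterior is strongly correlated,
so that proposals can follow the shape of the target.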
Because each run starts where the last one stopped (when the first argument
to \verb@metrop@ is the output of the previous invocation), each run serves
as ``burn-in'' for its successor (assuming that any part of that run was
worth anything at all).
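The continuation trick works because \verb@metrop@ saves the final state
and the final random seed and restores them on restart. The principle can
be sketched with base R alone: one long run of the random number generator
is identical to two short runs when the second resumes from the first's
saved seed.

```r
## Continuation principle in base R: resuming from a saved seed
## reproduces one long run exactly (metrop does the analogous thing
## with the chain's final state and final random seed).
set.seed(42)
long <- rnorm(10)

set.seed(42)
first <- rnorm(5)
seed.saved <- .Random.seed    # save the generator state ...
.Random.seed <- seed.saved    # ... and restore it before resuming
second <- rnorm(5)

identical(long, c(first, second))  # TRUE
```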
\section{Diagnostics}
O.~K. That does it for the acceptance rate. So let's do a longer run
and look at the results.
<