---
title: "Stat 8054 Lecture Notes: Geometric Ergodicity of Metropolis-Hastings"
author: "Charles J. Geyer"
date: "`r format(Sys.time(), '%B %d, %Y')`"
output:
  bookdown::html_document2:
    number_sections: true
    md_extensions: -tex_math_single_backslash
    mathjax: https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS-MML_HTMLorMML
    css: "../bar.css"
  bookdown::pdf_document2:
    latex_engine: xelatex
    number_sections: true
    md_extensions: -tex_math_single_backslash
linkcolor: blue
urlcolor: blue
bibliography: mcmc.bib
csl: journal-of-the-royal-statistical-society.csl
link-citations: true
---

# License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0
International License (http://creativecommons.org/licenses/by-sa/4.0/).

# R

 * The version of R used to make this document is `r getRversion()`.

 * The version of the `rmarkdown` package used to make this document is
   `r packageVersion("rmarkdown")`.

# Geometric Ergodicity and the Markov Chain CLT

This is not a theory course, but some theory came up in discussion of MCMC.

A Markov chain is *geometrically ergodic* [@meyn-tweedie, Chapter 15] if
\begin{equation}
   \lVert P^n(x, \,\cdot\,) - \pi(\,\cdot\,) \rVert
   \le
   M(x) \rho^n
   (\#eq:geom)
\end{equation}
for some function $M$ and some constant $\rho < 1$, where the left-hand side
is the total variation distance between the distribution of the Markov chain
at time $n$ when started at $x$ and the equilibrium distribution.  Thus this
distance decreases exponentially fast (also called geometrically fast because
of the geometric series).

Geometric ergodicity is important because it is part of the sufficient
conditions for the Markov chain central limit theorem (CLT) to hold.
@chan-geyer, Theorem 2, showed that if the functional of the Markov chain
under discussion has $2 + \varepsilon$ moments for some $\varepsilon > 0$
and the Markov chain is geometrically ergodic, then the CLT holds.
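As a toy illustration of this CLT (not part of the argument above, and with
every particular choice, the AR(1) chain, the coefficient, the batch length,
made up purely for illustration), consider a Gaussian AR(1) chain, which is
reversible and geometrically ergodic, so its sample means obey the Markov
chain CLT.  The method of batch means estimates the asymptotic variance,
which for this chain is $(1 + \rho) / (1 - \rho)$ times the stationary
variance.

```r
# toy illustration: Gaussian AR(1) chain with stationary distribution N(0, 1)
set.seed(42)
rho <- 0.5
n <- 1e5
x <- double(n)
x[1] <- rnorm(1)                      # start at stationarity
for (t in 2:n)
    x[t] <- rho * x[t - 1] + sqrt(1 - rho^2) * rnorm(1)
# theoretical asymptotic variance in the CLT: (1 + rho) / (1 - rho) = 3
b <- floor(sqrt(n))                   # batch length
nbatch <- floor(n / b)
batch.means <- colMeans(matrix(x[1:(b * nbatch)], nrow = b))
sigma2.hat <- b * var(batch.means)    # batch means estimate
mean(x)
sigma2.hat
```

Here `sigma2.hat` should be near 3, which is what the square root of $n$
times the Monte Carlo error of `mean(x)` looks like in repeated sampling.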
@roberts-rosenthal, Theorem 2.1, showed that if the functional of the Markov
chain under discussion has second moments and the Markov chain is *reversible*
and geometrically ergodic, then the CLT holds.

Proving geometric ergodicity is hard.  That is why @mengersen-tweedie,
@roberts-tweedie, and @jarner-hansen exist.  But even the theorems in those
papers are hard to apply to non-toy problems.

# Lessons from Mengersen and Tweedie

## Independence Metropolis-Hastings is Just Like Rejection Sampling Except for Minor Details

@tierney coined the name *independence* Metropolis-Hastings for a
Metropolis-Hastings sampler for which the proposal distribution does not
depend on the current state.  Thus it is just a probability density function
(PDF) $q$.

@mengersen-tweedie, Theorem 2.1, says such a Markov chain is geometrically
ergodic only if the importance weights $\pi(x) / q(x)$ are bounded, where
$\pi$ is the target distribution, in which case the sampler is actually
uniformly ergodic (which means that $M(x)$ in \@ref(eq:geom) does not
actually depend on $x$).

This is exactly the same condition needed for ordinary Monte Carlo (OMC)
rejection sampling to work, where $q$ is the proposal distribution for that.
The only differences are that

 * in OMC rejection sampling one needs to know the bound (it is used in
   the OMC rejection step), but

 * in Metropolis-Hastings rejection one does not need to know the bound
   (it is not used in the Metropolis-Hastings rejection step),

and, of course,

 * OMC rejection sampling produces independent and identically distributed
   samples from the target distribution, but

 * the Metropolis-Hastings algorithm produces a Markov chain having the
   target distribution as its equilibrium distribution.

## OMC Importance Sampling is Maybe Better

For the CLT to hold for OMC importance sampling, one only needs that the
quantities being averaged
$$
   \frac{g(X) \pi(X)}{q(X)}
$$
have finite variance, and this is different from the importance weights
being bounded.
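The importance-sampling estimator just described can be sketched in R.  The
target, proposal, and $g$ below are hypothetical choices made purely for
illustration (target $\pi$ standard normal, proposal $q$ a $t$ distribution
with 3 degrees of freedom, $g(x) = x^2$), chosen so the weights happen to be
bounded and the true value of the integral is known.

```r
# OMC importance sampling: average g(X) pi(X) / q(X) with X simulated from q
# illustrative choices: pi = N(0, 1), q = t with 3 df, g(x) = x^2,
# so the estimand is E(X^2) = 1 under the target
set.seed(42)
nsim <- 1e5
x <- rt(nsim, df = 3)
w <- dnorm(x) / dt(x, df = 3)    # importance weights (bounded here)
est <- mean(x^2 * w)
est
```

The estimate `est` should be close to 1, and because the quantities being
averaged have finite variance, the ordinary (IID) CLT gives its Monte Carlo
error.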
Although, if one wants this to hold for every $g$ that has second moments
with respect to $\pi$, then this is equivalent to the importance weights
being bounded.

## Random Walk Metropolis-Hastings

@tierney coined the name *random walk* Metropolis (RWM) for a Metropolis
sampler for which the proposal distribution has the form
$$
   q(x, y) = q(y - x)
$$
where $y$ is the proposal and $x$ is the current state.

@mengersen-tweedie provide some counterintuitive results for RWM.  Their
Theorem 3.3 says that, provided the proposal distribution has an absolute
first moment, the sampler cannot be geometrically ergodic unless the target
distribution has a moment generating function.

@jarner-tweedie considerably improve this result.  They drop the condition
on the proposal distribution and extend to the multivariate case.  Their
Theorem 2.2 says an RWM sampler is geometrically ergodic only if the target
distribution has a moment generating function.  If it does not, then no
proposal distribution will work.

Then @mengersen-tweedie provide a sufficient condition for geometric
ergodicity.  For the multivariate case @roberts-tweedie and @jarner-hansen
prove some sufficient conditions.  All of these sufficient conditions are
far from necessary.  A student asked in class about distributions falling
in between these necessary conditions and sufficient conditions.

# A Lesson from Roberts and Tweedie

Theorem 5.1 of @roberts-tweedie says that no Metropolis-Hastings algorithm
that has
\begin{equation}
   \mathop{\rm ess\,sup} P(x, \{x\}) = 1
   (\#eq:reject)
\end{equation}
can be geometrically ergodic.  Here $P(x, \{x\})$ is the probability that
the (continuous) proposal is (Metropolis-Hastings) rejected when the current
position is $x$.  If that rejection probability is not bounded away from
one, then we cannot have geometric ergodicity.

# A Toy Problem

What that student asked about is a target distribution for which the PDF
does not go to zero at infinity.  The PDF is spiky.
Such a distribution satisfies none of the sufficient conditions of
@mengersen-tweedie or @roberts-tweedie or @jarner-hansen.  So we look at
one such toy problem.

The toy problem we will analyze has a continuous uniform distribution on a
certain set but is really a smeared-out geometric distribution.  The PDF is
$$
   f(x)
   =
   \begin{cases}
   1, & \text{$n \in \mathbb{N}$ and $n < x < n + p (1 - p)^n$} \\
   0, & \text{otherwise}
   \end{cases}
$$
(here $\mathbb{N}$ is the set of natural numbers (nonnegative integers)).
This PDF clearly integrates to one, and the lim sup of $f(x)$ as
$x \to \infty$ is one.

We propose to sample it via RWM with any continuous unimodal proposal
distribution.  Examples are a mean-zero normal distribution or a uniform
distribution on an interval symmetric about zero.

Clearly, for $x$ very far out in the tails of the distribution, the spikes
of the target PDF are so narrow that the proposal almost never lands in one,
so the per-$x$ rejection probability, the probability in \@ref(eq:reject),
goes to one as $x$ goes to infinity.  Thus the sampler is not geometrically
ergodic.

It is hard to know what lessons to draw from a toy problem.  It is clear
that there is no way to modify the example so that it is geometrically
ergodic unless

 * we modify the proposal distribution so it "knows" where the spikes are or

 * we modify the target PDF so it isn't so spiky.

But this example also shows there is something weird about asymptotics.
The only problem is way out in the tails.  The farther out in the tails the
sampler gets, the longer it takes to get back to the center of the
distribution.  But we only have a problem far out in the tails.  So as long
as we are interested in getting a good idea about the part of the
distribution that is not very far out in the tails, we are OK.  For example,
if we conditioned on the event that $X < 10^{100000}$, then we would have a
geometrically ergodic sampler and the Markov chain CLT would hold.  But we
would never see samples outside this event in any run we would actually do
anyway!  So Markov chain asymptotics can be somewhat misleading.
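The claim about the rejection probability can be checked numerically.  The
following R sketch uses arbitrary illustrative choices ($p = 1/2$, standard
normal proposals) and estimates by simulation the RWM acceptance probability
$1 - P(x, \{x\})$ for a current state in spike $n$.  Since $f$ is either 0
or 1 and the current state has $f(x) = 1$, the Metropolis acceptance
probability at $x$ is just the probability that the proposal lands in the
support.

```r
# target: f(x) = 1 on the union of intervals (n, n + p (1 - p)^n), n >= 0
p <- 0.5
in.support <- function(x) {
    n <- floor(x)
    n >= 0 & x > n & x < n + p * (1 - p)^n
}
# midpoint of spike n, a valid current state
spike.point <- function(n) n + p * (1 - p)^n / 2
# Monte Carlo estimate of the acceptance probability at x
# with standard normal proposals
accept.prob <- function(x, nsim = 1e5) {
    set.seed(42)
    mean(in.support(x + rnorm(nsim)))
}
accept.prob(spike.point(1))     # appreciable
accept.prob(spike.point(30))    # nearly zero: rejection probability near one
```

The acceptance probability collapses as $n$ grows because the total width of
the spikes near spike $n$ is of order $(1 - p)^n$, which is why the
probability in \@ref(eq:reject) approaches one.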
# Bibliography {-}