\documentclass[11pt]{article}

\usepackage{amsmath}
\usepackage{url}

\RequirePackage{ifthen}
\newboolean{solutions}
\setboolean{solutions}{true}

\DeclareMathOperator{\var}{var}
\DeclareMathOperator{\cov}{cov}
\DeclareMathOperator{\logit}{logit}

\begin{document}

\begin{raggedright}
Stat 8931, Fall 2005 \\
Homework 2 \\
Due Oct 5, 2005
\end{raggedright}

\bigskip
\paragraph{Q1} \input{hw2b}

\bigskip
\paragraph{S1}

The first task is to derive the unnormalized posterior (almost given in
the notes) and the ``full conditionals'' of each variable given the other.
The log unnormalized joint posterior density is
\begin{multline*}
- \frac{\lambda}{2} \sum_{i = 1}^n (x_i - \hat{\mu}_n)^2
- \frac{n \lambda}{2} (\hat{\mu}_n - \mu)^2
+ \frac{n}{2} \log(\lambda)
- \frac{l}{2} (\mu - m)^2
+ (a - 1) \log(\lambda) - b \lambda
\\
= - \frac{n \lambda}{2} \hat{\sigma}^2_n
- \frac{n \lambda}{2} (\hat{\mu}_n - \mu)^2
+ \frac{n}{2} \log(\lambda)
- \frac{l}{2} (\mu - m)^2
+ (a - 1) \log(\lambda) - b \lambda
\end{multline*}
The posterior precision of $\mu$ given $\lambda$ is the coefficient of
$- \mu^2 / 2$ in the above, which is $l + n \lambda$.  The posterior mean
of $\mu$ given $\lambda$ is the coefficient of $\mu$ divided by the
posterior precision, which is $(l m + n \lambda \hat{\mu}_n) / (l + n \lambda)$.
Thus we have discovered one ``full conditional''
$$
   \mu \mid \lambda
   \sim
   \text{Normal}\left(\frac{l m + n \lambda \hat{\mu}_n}{l + n \lambda},
   \frac{1}{l + n \lambda} \right)
$$
The shape parameter of $\lambda$ given $\mu$ is one plus the coefficient
of $\log(\lambda)$ in the above, which is $n / 2 + a$.  The inverse scale
parameter of $\lambda$ given $\mu$ is the coefficient of $- \lambda$ in
the above, which is $b + n \hat{\sigma}^2_n / 2 + n (\hat{\mu}_n - \mu)^2 / 2$.
Thus we have discovered the other ``full conditional''
$$
   \lambda \mid \mu
   \sim
   \text{Gamma}\left(a + \frac{n}{2},
   b + \frac{n \hat{\sigma}^2_n + n (\hat{\mu}_n - \mu)^2}{2} \right)
$$

First we assign the data.
<<data>>=
n <- 20
mu.hat <- 14.731
sig2.hat <- 4.814
m <- 10
l <- 1 / (2^2)
a <- 3
b <- 1
@
Then we devise the Gibbs updates, one function for each full conditional.
<<updates>>=
mu.up <- function(lambda) {
    muprec <- l + n * lambda
    mumu <- (l * m + n * lambda * mu.hat) / muprec
    rnorm(1, mean = mumu, sd = sqrt(1 / muprec))
}
lambda.up <- function(mu) {
    lambdarate <- b + n * (sig2.hat + (mu.hat - mu)^2) / 2
    rgamma(1, shape = a + n / 2, rate = lambdarate)
}
@
Now we are ready to run a simple Gibbs sampler.  We start at the MLE.
<<niter>>=
niter <- 1000
@
<<doit>>=
mu <- mu.hat
lambda <- 1 / sig2.hat
mupath <- double(niter)
lambdapath <- double(niter)
for (i in 1:niter) {
    mu <- mu.up(lambda)
    lambda <- lambda.up(mu)
    mupath[i] <- mu
    lambdapath[i] <- lambda
}
@

Figure~\ref{fig:one} is produced by the following code
<<figone,eval=FALSE>>=
acf(mupath)
@
and appears on p.~\pageref{fig:one}.
\begin{figure}
\begin{center}
<<figoneplot,fig=TRUE,echo=FALSE>>=
<<figone>>
@
\end{center}
\caption{Autocorrelation Plot for $\mu$ (Monte Carlo sample size \Sexpr{niter}).}
\label{fig:one}
\end{figure}

Figure~\ref{fig:two} is produced by the following code
<<figtwo,eval=FALSE>>=
acf(lambdapath)
@
and appears on p.~\pageref{fig:two}.
\begin{figure}
\begin{center}
<<figtwoplot,fig=TRUE,echo=FALSE>>=
<<figtwo>>
@
\end{center}
\caption{Autocorrelation Plot for $\lambda$ (Monte Carlo sample size \Sexpr{niter}).}
\label{fig:two}
\end{figure}

Almost unbelievably, there is no detectable autocorrelation.  This MCMC
is almost IIDMC (nearly i.i.d.\ sampling).  Thus there is no need for
batch means in calculating MCSE.
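The next chunk turns the paths into point estimates and MCSE.  The MCSE
for $\sigma$ uses the delta method; as a minimal sketch of that standard
step (the function name $g$ is notation we introduce here, not part of
the assignment): with $\sigma = g(\lambda) = \lambda^{- 1 / 2}$ and
$g'(\lambda) = - \tfrac{1}{2} \lambda^{- 3 / 2}$,
$$
   \text{MCSE}(\hat{\sigma})
   \approx
   \lvert g'(\hat{\lambda}) \rvert \cdot \text{MCSE}(\hat{\lambda})
   =
   \frac{1}{2} \hat{\lambda}^{- 3 / 2} \cdot \text{MCSE}(\hat{\lambda}),
$$
which is what the line defining \texttt{sigma.mcse} below computes.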
<<estimates>>=
mu.pe <- mean(mupath)
mu.mcse <- sd(mupath) / sqrt(length(mupath))
lambda.pe <- mean(lambdapath)
lambda.mcse <- sd(lambdapath) / sqrt(length(lambdapath))
sigma.pe <- 1 / sqrt(lambda.pe)
sigma.mcse <- (1 / 2) * lambda.pe^(- 3 / 2) * lambda.mcse
@
<<show>>=
mu.pe
mu.mcse
sigma.pe
sigma.mcse
@
It looks like, to satisfy the required accuracy for $\mu$, we need at
least $14^2 = \Sexpr{14^2}$ times the Monte Carlo sample size.  So we
redo the simulation.
<<redo>>=
niter <- 200 * niter
<<doit>>
<<estimates>>
@
<<showagain>>=
<<show>>
@

Time for a sanity check.  The posterior mean for $\mu$, which is
\Sexpr{round(mean(mupath), 4)}, is between the prior mean
\Sexpr{round(m, 4)} and the observed data mean \Sexpr{round(mu.hat, 4)}.

How about $\lambda$ and $\sigma$?  The posterior mean for $\lambda$,
which is \Sexpr{round(lambda.pe, 4)}, is between the prior mean
\Sexpr{round(a / b, 4)} and the observed data point estimate (not a mean)
$1 / \hat{\sigma}^2_n = \Sexpr{round(1 / sig2.hat, 4)}$.

The prior mean for $\sigma$ is an exercise in master's level theory
(the integral converges because $a - 1 / 2 > 0$ here, since $a = 3$).
\begin{align*}
E(\sigma)
& = E(\lambda^{- 1 / 2})
\\
& = \int_0^\infty \lambda^{- 1 / 2} f(\lambda) \, d \lambda
\\
& = \int_0^\infty \lambda^{- 1 / 2} \cdot \frac{b^a}{\Gamma(a)}
\lambda^{a - 1} e^{- b \lambda} \, d \lambda
\\
& = \frac{b^a}{\Gamma(a)} \cdot \frac{\Gamma(a - 1 / 2)}{b^{a - 1 / 2}}
\\
& = \sqrt{b} \cdot \frac{\Gamma(a - 1 / 2)}{\Gamma(a)}
\end{align*}
<<sigmaprior>>=
sigma.prior.mean <- sqrt(b) * gamma(a - 1 / 2) / gamma(a)
sigma.prior.mean
@
So now we see that the posterior mean for $\sigma$, which is
\Sexpr{round(sigma.pe, 4)}, is between the prior mean
\Sexpr{round(sigma.prior.mean, 4)} and the observed data point estimate
(not a mean) $\hat{\sigma}_n = \Sexpr{round(sqrt(sig2.hat), 4)}$.

In all cases the posterior mean is a lot closer to the data point
estimate than to the prior mean, as is to be expected with the fairly
diffuse priors we used.

On a different issue, note that the lack of autocorrelation in this
Gibbs sampler does not arise from lack of correlation between $\mu$
and $\lambda$.
<<correlation>>=
cor(mupath, lambdapath)
@
Apparently, the correlation just isn't enough to give rise to much
autocorrelation in the Gibbs sampler.

Figure~\ref{fig:three} is produced by the same code as
Figure~\ref{fig:one} but with our new larger sample as the ``path''
and appears on p.~\pageref{fig:three}.
\begin{figure}
\begin{center}
<<figthreeplot,fig=TRUE,echo=FALSE>>=
<<figone>>
@
\end{center}
\caption{Autocorrelation Plot for $\mu$ (Monte Carlo sample size \Sexpr{niter}).}
\label{fig:three}
\end{figure}
In it we now see statistically significant lag-one autocorrelation,
though it is of no practical significance.  And this means we should
have used the method of batch means with batch length at least two,
and that our MCSE are a little bit low.  We know from the formula
expressing the asymptotic variance as a sum of autocovariance terms
that something like
$1 + 2 \rho_1 = \Sexpr{round(1 + 2 * acf(mupath, plot = FALSE)$acf[2], 4)}$
(where $\rho_1$ is the lag-one autocorrelation) is the correct variance
inflation factor, hence the square root of this, which is
\Sexpr{round(sqrt(1 + 2 * acf(mupath, plot = FALSE)$acf[2]), 4)},
is the correct inflation factor for MCSE.  But we won't worry about that
\Sexpr{round(100 * (sqrt(1 + 2 * acf(mupath, plot = FALSE)$acf[2]) - 1))}\%
difference.
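For completeness, the adjustment can also be applied directly.  The
following chunk is a minimal sketch (the chunk name and the variable
names \texttt{rho1} and \texttt{mu.mcse.adj} are ours, introduced for
illustration) of inflating the MCSE for $\mu$ by that factor.
<<inflate>>=
## lag-one autocorrelation of the mu path (plot = FALSE suppresses the plot)
rho1 <- acf(mupath, plot = FALSE)$acf[2]
## MCSE inflated by the square root of the variance inflation factor
mu.mcse.adj <- mu.mcse * sqrt(1 + 2 * rho1)
mu.mcse.adj
@

\end{document}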