Chapter 11 Example 2.6
Let’s examine a variety of possible Bayesian analyses of the differences
in the RunStitch data. The variety of possible prior distributions produces
the variety of possible analyses; this is both the strength and, to many,
the weakness of the Bayesian approach.
11.1 Prior distributions
We begin with a prior for \(\sigma_0\). We use sigma0.mean=.75
and sigma0.shape=3; this
puts 99% of the prior probability between .08 and 2.32. Given that the
sample standard deviation of the differences is .64, this is an
adequately broad prior. We will also show an example using sigma0.shape=20;
this puts 99% of the prior probability between .39 and 1.25.
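As a quick check on those quoted ranges, here is a sketch of my own; it assumes the gamma prior with a given mean and shape uses the usual shape/rate parameterization, so that the rate is shape divided by mean.
> # 99% prior ranges implied by the gamma priors above
> # (assumes rate = shape/mean, the standard parameterization)
> qgamma(c(.005, .995), shape = 3, rate = 3/.75)   # about .08 and 2.32
> qgamma(c(.005, .995), shape = 20, rate = 20/.75) # about .39 and 1.25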
A model of the form differences ~ 0 has a mean of precisely 0; there
are no parameters beyond \(\sigma_0\), and no further priors are needed.
Similarly, a model of the form differences - mu ~ 0 assumes that the
shifted data have a mean of precisely 0.
Generally, we will use a model of the form differences ~ 1; this fits
a mean value to the differences. The prior distribution for the mean
is normal with mean 0 and standard deviation \(\sigma\), and the
second-level prior for \(\sigma\) is a gamma distribution. We specify
the expected value of this gamma via sigma.mean and its shape
via sigma.shape.
We can specify \(\sigma\) (almost) exactly by using
a very large value of sigma.shape.int. (Note that bglmm() lets
you set prior parameters for the intercept separately from
those for other effects.) For example, with a mean of 1
and a shape of 20,000, 99% of the probability will be between .98 and 1.02,
as the check below confirms. More typically, we use a smaller value
of sigma.shape and let the
data help us decide the appropriate value for \(\sigma\).
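The near-point-mass claim checks out the same way (under the same parameterization assumption as before):
> qgamma(c(.005, .995), shape = 20000, rate = 20000/1) # about .98 and 1.02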
Mr. Skeptical thought that the mean difference should be zero, but he might have various degrees of certainty about that claim. That is, he might express his claim by saying that his prior for the mean difference is normal with mean 0, with a smaller standard deviation if he is more certain and a larger standard deviation if he is less certain. Similarly, Mr. Enthusiastic has a prior distribution for the mean of the differences with mean .5, and different standard deviations depending on his degree of certainty.
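To make those competing priors concrete, here is a small plotting sketch of my own (an illustration only, not part of the chapter’s analysis):
> # Mr. Skeptical's priors center at 0, Mr. Enthusiastic's at .5;
> # certainty is expressed through the prior standard deviation
> x <- seq(-1.5, 2, length.out = 400)
> plot(x, dnorm(x, mean = 0, sd = .1), type = "l", ylab = "prior density")
> lines(x, dnorm(x, mean = 0, sd = .5), lty = 2)
> lines(x, dnorm(x, mean = .5, sd = .1), col = "red")
> lines(x, dnorm(x, mean = .5, sd = .5), col = "red", lty = 2)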
11.2 Fitting the models
Load the RunStitch data from the package and create
differences between Standard and Ergonomic.
> library(cfcdae)
> data(RunStitch)
> differences <- RunStitch[,"Standard"] - RunStitch[,"Ergonomic"]
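It is worth confirming the summary statistics this chapter quotes (a quick check I have added):
> mean(differences)  # about .175, the sample mean referred to later
> sd(differences)    # about .64, the sample sd quoted above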
We first fit ten models, five each for prior mean 0 and prior mean .5,
with the fits differing in the prior level of certainty about that mean. We
use quiet=TRUE to suppress the voluminous progress information that
would otherwise be printed. The order below begins at prior mean
precisely 0, then gradually increases the prior uncertainty, then
switches to prior mean .5 (actually, prior mean 0 for data shifted by .5)
but still with considerable uncertainty,
then gradually decreases the prior uncertainty until the prior is
precisely .5.
> library(bcfcdae)
> library(rstan)
> set.seed(20220425)
> # mean precisely 0
> fit.0.0 <- bglmm(differences ~ 0, sigma0.mean=.75, sigma0.shape=3,
+ quiet=TRUE,seed=10004)
Need to compile the model.
> # prior mean 0 with sd .1 (ish)
> fit.0.1 <- bglmm(differences ~ 1, sigma0.mean=.75, sigma0.shape=3,
+ sigma.mean.int=.1, sigma.shape.int=20000,
+ quiet=TRUE,seed=10005)
> # prior mean 0 with sd .3 (ish)
> fit.0.3 <- bglmm(differences ~ 1, sigma0.mean=.75, sigma0.shape=3,
+ sigma.mean.int=.3, sigma.shape.int=20000,
+ quiet=TRUE,seed=10006)
> # prior mean 0 with sd .5 (ish)
> fit.0.5 <- bglmm(differences ~ 1, sigma0.mean=.75, sigma0.shape=3,
+ sigma.mean.int=.5, sigma.shape.int=20000,
+ quiet=TRUE,seed=10007)
> # prior mean 0 with sd .8 (ish)
> fit.0.8 <- bglmm(differences ~ 1, sigma0.mean=.75, sigma0.shape=3,
+ sigma.mean.int=.8, sigma.shape.int=20000,
+ quiet=TRUE,seed=10008)
> # prior mean .5 with sd .8 (ish)
> fit.5.8 <- bglmm(differences-.5 ~ 1, sigma0.mean=.75, sigma0.shape=3,
+ sigma.mean.int=.8, sigma.shape.int=20000,
+ quiet=TRUE,seed=10009)
> # prior mean .5 with sd .5 (ish)
> fit.5.5 <- bglmm(differences-.5 ~ 1, sigma0.mean=.75, sigma0.shape=3,
+ sigma.mean.int=.5, sigma.shape.int=20000,
+ quiet=TRUE,seed=10010)
> # prior mean .5 with sd .3 (ish)
> fit.5.3 <- bglmm(differences-.5 ~ 1, sigma0.mean=.75, sigma0.shape=3,
+ sigma.mean.int=.3, sigma.shape.int=20000,
+ quiet=TRUE,seed=10011)
> # prior mean .5 with sd .1 (ish)
> fit.5.1 <- bglmm(differences-.5 ~ 1, sigma0.mean=.75, sigma0.shape=3,
+ sigma.mean.int=.1, sigma.shape.int=20000,
+ quiet=TRUE,seed=10012)
> # prior mean precisely .5
> fit.5.0 <- bglmm(differences-.5 ~ 0, sigma0.mean=.75, sigma0.shape=3,
+ quiet=TRUE,seed=10013)
Note that we had to compile twice. That’s because fitting a mean-zero model and fitting a model with mean predictors use two different underlying programs, each of which needs its own compilation. Once a program has been compiled, however, we do not need to recompile it in this run.
11.2.1 MCMC Assessment
One risk with Bayesian analysis via MCMC is that the Markov chains might not behave properly. Begin by looking at n_eff and Rhat:
> summary(fit.0.5)[,c("n_eff","Rhat")]
n_eff Rhat
(Intercept) 3110 1
sigma0 2530 1
sigma.Intercept 4030 1
An Rhat of 1 indicates no problems
detected. Most of the other models are similar, but fit.5.0 (certainty
on a mean of .5) shows a low effective sample size.
> summary(fit.5.0)[,c("n_eff","Rhat")]
n_eff Rhat
1130 1
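(The missing row label is just R dropping dimensions: this summary matrix has a single row, for sigma0, so the selection prints as a named vector.) Rather than printing each summary separately, a short loop can scan all ten fits at once. This is my own sketch; it assumes summary() returns a matrix with n_eff and Rhat columns for every bglmm() fit, as in the calls above.
> fits <- list(fit.0.0=fit.0.0, fit.0.1=fit.0.1, fit.0.3=fit.0.3,
+              fit.0.5=fit.0.5, fit.0.8=fit.0.8, fit.5.8=fit.5.8,
+              fit.5.5=fit.5.5, fit.5.3=fit.5.3, fit.5.1=fit.5.1,
+              fit.5.0=fit.5.0)
> # worst-case diagnostics per model: want n_eff large and Rhat near 1
> t(sapply(fits, function(f) {
+   s <- summary(f)
+   c(min.n_eff = min(s[, "n_eff"]), max.Rhat = max(s[, "Rhat"]))
+ }))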
> plot(fit.0.5,"trace")

Figure 11.1: Trace plots for the N(0,.5) prior

Those are the trace plots for the N(0,.5) prior bglmm() model we
fit, but I’ll spare you all of the others except for fit.5.0.
> plot(fit.5.0,"trace")

Figure 11.2: Trace plots for the N(.5,0) prior
The traces all overlap, which is good, but they do not mix around the sample space as quickly as the traces for the other models. This is the visual analogue of a lower effective sample size.
Finally, we should look at the autocorrelation of the Markov chain iterates. Ideally, they should decay to 0 quickly and stay at 0.
> plot(fit.0.5,"autocor")

Figure 11.3: Autocorrelation plots for N(0,.5) prior
> plot(fit.5.0,"autocor")

Figure 11.4: Autocorrelation plots for N(.5,0) prior
11.2.2 Model Results
We’re eager to see the results, and now that we have
assessed the chains as working well, we can finally look at them.
The simplest way to see the results of a
fit is to use the summary() function.
For each parameter in the model, it produces
- The posterior mean for the parameter.
- The posterior standard error of the posterior mean.
- The posterior standard deviation of the parameter.
- The 2.5, 25, 50, 75, and 97.5 percentiles of the posterior distribution (these can be reset via optional arguments).
- The effective sample size of the chain (more is better, and closer to 4,000 is good for default chain sizes).
- The Rhat value.
Note: the posterior standard error is essentially a numerical issue. If you run the chains longer and longer, this standard error will decrease to zero. However, the posterior standard deviation, which measures the true statistical variability in the posterior, does not decrease as you increase the size of the chains.
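For intuition, the posterior standard error behaves roughly like the posterior standard deviation divided by the square root of the effective sample size. For example, the summary of fit.0.5.s near the end of this chapter reports sd = .123 and n_eff = 3390:
> .123 / sqrt(3390)  # about .0021, matching that summary's se_mean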
Here we look at a few of the columns for the different models, and we combine it all into a table with nice row labels. Note that we need to add .5 back into the results for the .5 prior mean, because we were actually fitting shifted data and we need to shift the results back.
> all.out <- rbind(
+ "mean 0, sd 0"=c(0,0,0),
+ "mean 0, sd .1"=summary(fit.0.1,pars="(Intercept)")[,c("mean","2.5%","97.5%")],
+ "mean 0, sd .3"=summary(fit.0.3,pars="(Intercept)")[,c("mean","2.5%","97.5%")],
+ "mean 0, sd .5"=summary(fit.0.5,pars="(Intercept)")[,c("mean","2.5%","97.5%")],
+ "mean 0, sd .8"=summary(fit.0.8,pars="(Intercept)")[,c("mean","2.5%","97.5%")],
+ "mean .5, sd .8"=summary(fit.5.8,pars="(Intercept)")[,c("mean","2.5%","97.5%")]+.5,
+ "mean .5, sd .5"=summary(fit.5.5,pars="(Intercept)")[,c("mean","2.5%","97.5%")]+.5,
+ "mean .5, sd .3"=summary(fit.5.3,pars="(Intercept)")[,c("mean","2.5%","97.5%")]+.5,
+ "mean .5, sd .1"=summary(fit.5.1,pars="(Intercept)")[,c("mean","2.5%","97.5%")]+.5,
+ "mean .5, sd 0"=c(.5,.5,.5)
+ )
> knitr::kable(all.out)
|                |   mean |    2.5% |  97.5% |
|:---------------|-------:|--------:|-------:|
| mean 0, sd 0   | 0.0000 |  0.0000 | 0.0000 |
| mean 0, sd .1  | 0.0712 | -0.0829 | 0.2220 |
| mean 0, sd .3  | 0.1520 | -0.0763 | 0.3680 |
| mean 0, sd .5  | 0.1670 | -0.0696 | 0.4080 |
| mean 0, sd .8  | 0.1720 | -0.0674 | 0.4050 |
| mean .5, sd .8 | 0.1810 | -0.0540 | 0.4121 |
| mean .5, sd .5 | 0.1940 | -0.0440 | 0.4331 |
| mean .5, sd .3 | 0.2210 |  0.0030 | 0.4510 |
| mean .5, sd .1 | 0.3690 |  0.2150 | 0.5261 |
| mean .5, sd 0  | 0.5000 |  0.5000 | 0.5000 |
See how the posterior intervals for the intercept gradually creep higher and higher as we use priors that put less and less probability near 0. Looking at the interval estimates above, we see that the precise priors (at 0 or .5) give us precise posterior estimates at the same places; this was expected. The models with .1 prior standard deviations for the intercept pull the values toward the prior mean (upper end pulled down for the 0 prior mean, lower end pulled up for the .5 prior mean). Once you get to prior standard deviations for the mean of .5 or .8, it really doesn’t matter whether we use a prior mean of 0 or .5; the posterior intervals are (roughly) the same. Notice, however, that the sample mean of .175 is not even included in the posterior intervals for mean .5 and standard deviation .1 or 0. This is the effect of a narrow prior pulling the posterior toward it.
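To see roughly where that pull comes from, here is a sketch of the textbook conjugate-normal calculation, treating \(\sigma_0\) as known and fixed at the sample standard deviation. bglmm() actually estimates \(\sigma_0\), so this only approximates its answers, but the precision-weighted average reproduces the pattern in the table.
> # illustration only: posterior mean for a normal mean with known
> # variance is a precision-weighted average of prior mean and sample mean
> post.mean <- function(mu0, tau, xbar, s, n) {
+   w <- (n / s^2) / (n / s^2 + 1 / tau^2)  # weight on the sample mean
+   w * xbar + (1 - w) * mu0
+ }
> xbar <- mean(differences); s <- sd(differences); n <- length(differences)
> post.mean(0, .1, xbar, s, n)   # close to the posterior mean for fit.0.1
> post.mean(.5, .1, xbar, s, n)  # close to the posterior mean for fit.5.1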
11.2.3 Model Selection
We have looked at a lot of models. Ordinarily we wouldn’t be comparing models with lots of different prior parameters, because we would know our prior distributions and use them. We are doing it here to illustrate the sensitivity of the results to the prior parameters.
That said, we can still assess which models fit the data
better. One way to do that is via the LOOIC, which we
can get via the function loo::loo().
Smaller values of LOOIC are better.
Bayes factors are a second way to assess models, but
be aware that Bayes factors can be sensitive to priors.
Let’s compare all these models to the one with a N(0,.5)
prior for the mean.
> loo::loo(fit.0.3)
[1] 61.6155
> loo::loo(fit.0.5)
Warning: Some Pareto k diagnostic values are slightly high. See help('pareto-k-diagnostic') for details.
[1] 61.99377
> bayes_factor(fit.0.5,fit.0.3,silent=TRUE)
Estimated Bayes factor in favor of bridge1 over bridge2: 0.68824
Now we do the same for all ten models, collecting the LOOIC of each model and the Bayes factor of the N(0,.5) model versus each model, and we print the resulting table with kable().
> all.compare <- rbind(
+ "mean 0, sd 0"=c("LOO"=loo::loo(fit.0.0),"BF"=bayes_factor(fit.0.5,fit.0.0,silent=TRUE)$bf),
+ "mean 0, sd .1"=c(loo::loo(fit.0.1),bayes_factor(fit.0.5,fit.0.1,silent=TRUE)$bf),
+ "mean 0, sd .3"=c(loo::loo(fit.0.3),bayes_factor(fit.0.5,fit.0.3,silent=TRUE)$bf),
+ "mean 0, sd .5"=c(loo::loo(fit.0.5),bayes_factor(fit.0.5,fit.0.5,silent=TRUE)$bf),
+ "mean 0, sd .8"=c(loo::loo(fit.0.8),bayes_factor(fit.0.5,fit.0.8,silent=TRUE)$bf),
+ "mean .5, sd .8"=c(loo::loo(fit.5.8),bayes_factor(fit.0.5,fit.5.8,silent=TRUE)$bf),
+ "mean .5, sd .5"=c(loo::loo(fit.5.5),bayes_factor(fit.0.5,fit.5.5,silent=TRUE)$bf),
+ "mean .5, sd .3"=c(loo::loo(fit.5.3),bayes_factor(fit.0.5,fit.5.3,silent=TRUE)$bf),
+ "mean .5, sd .1"=c(loo::loo(fit.5.1),bayes_factor(fit.0.5,fit.5.1,silent=TRUE)$bf),
+ "mean .5, sd 0"=c(loo::loo(fit.5.0),bayes_factor(fit.0.5,fit.5.0,silent=TRUE)$bf)
+ )
Warning: effective sample size cannot be calculated, has been replaced by number
of samples.
Warning: Some Pareto k diagnostic values are slightly high. See help('pareto-k-diagnostic') for details.
Warning: effective sample size cannot be calculated, has been replaced by number
of samples.
> knitr::kable(all.compare)
|                |      LOO |        BF |
|:---------------|---------:|----------:|
| mean 0, sd 0   | 61.95804 | 0.6559599 |
| mean 0, sd .1  | 61.53054 | 0.5441041 |
| mean 0, sd .3  | 61.61550 | 0.6878520 |
| mean 0, sd .5  | 61.99377 | 0.9985064 |
| mean 0, sd .8  | 61.75867 | 1.5200559 |
| mean .5, sd .8 | 61.87140 | 1.6084977 |
| mean .5, sd .5 | 61.89258 | 1.1485162 |
| mean .5, sd .3 | 61.90773 | 0.9836125 |
| mean .5, sd .1 | 63.20871 | 2.4199692 |
| mean .5, sd 0  | 66.59450 | 7.0062285 |
The models with mean of .5 and sd of .1 or 0 are noticeably worse (less preferred) than the other models, which are basically indistinguishable. This is not good news for Mr. Enthusiastic’s assertion.
To give some warning about Bayes factors, consider the prior mean 0 and prior standard deviation .5 for the mean, but now use a tighter prior for \(\sigma_0\) (same prior expected value but a larger shape). Specifically, use sigma0.shape equal to 20.
> fit.0.5.s <- bglmm(differences ~ 1, sigma0.mean=.75, sigma0.shape=20,
+ sigma.mean.int=.5, sigma.shape.int=20000,
+ quiet=TRUE,seed=10014)
> summary(fit.0.5.s,pars=c("(Intercept)"))
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
(Intercept) 0.159 0.0021 0.123 -0.0818 0.0748 0.159 0.241 0.396 3390 1
> loo::loo(fit.0.5.s)
[1] 61.38973
> bayes_factor(fit.0.5,fit.0.5.s,silent=TRUE)
Estimated Bayes factor in favor of bridge1 over bridge2: 0.47990
This model has essentially the same inference for the mean of the differences and essentially the same LOO value, but its Bayes factor is off by a factor of about 2. Bayes factors are much more influenced by priors than are LOO values or the actual posterior distribution.