Chapter 8 Example 2.4

Predictive model comparison via AIC attempts to find the model that would produce the highest likelihood when applied to a new set of data. The log likelihood of a model evaluated on the current data is overly optimistic about how well the model would do on an independent replication of the data; it is biased too high by \(p\), the number of parameters fit in the current model. Thus we might estimate the log likelihood for future data by the log likelihood for the current model on the current data, minus \(p\).

AIC works on the scale of 2 times the log likelihood, which aligns it with likelihood ratio tests. It also works with the negative log likelihood, so we minimize AIC instead of maximizing the likelihood. Letting \(L\) be the log likelihood for a model, we have \[ AIC = -2 L + 2 p \] Models with small AIC will tend to be better predictors of future data. Adding parameters always makes the maximized likelihood higher, but each additional parameter must raise \(L\) by more than 1 to overcome its addition of 2 to the penalty.
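
To make the bookkeeping concrete, we can reproduce AIC by hand from the log likelihood that R reports. This is a minimal sketch; the fitted model `fit` below is just a placeholder for any lm object. Note that R's parameter count includes the residual variance.
> fit <- lm(rnorm(20) ~ 1)        # placeholder: any fitted lm object will do
> ll <- as.numeric(logLik(fit))   # maximized log likelihood L
> p <- attr(logLik(fit), "df")    # parameter count, including the variance
> -2 * ll + 2 * p                 # agrees with AIC(fit)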

There are a number of other criteria that could be used instead of AIC. For example, AICc uses a correction to the “add \(2p\)” rule that works better for small sample sizes: \[ AICc = AIC + \frac{2p(p+1)}{N-p-1} \] BIC uses the penalty \(\log(N)\,p\) (where \(N\) is the sample size) instead of \(2p\); this makes it less likely to include a parameter unless it is really needed.
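
Base R supplies AIC() and BIC() but not AICc, so a small helper is handy. The sketch below defines a hypothetical aicc() function using the correction above; it assumes the model was fit by maximum likelihood and has logLik() and nobs() methods.
> aicc <- function(fit) {
+   ll <- logLik(fit)
+   p <- attr(ll, "df")
+   N <- nobs(fit)
+   -2 * as.numeric(ll) + 2 * p + 2 * p * (p + 1) / (N - p - 1)
+ }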

If the true model is among the models you are comparing, BIC will eventually find it as the sample size grows. However, BIC may not work well when all the models are merely approximate. Conversely, AIC can pick out a reasonable model even when all of the models are approximate, but it will not necessarily home in on the true model when the true model is among the candidates, even in large samples.
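
A quick simulation can illustrate the consistency claim. This is only a sketch with made-up settings: we generate data whose true mean really is 0 and count how often each criterion prefers the (true) mean-zero model. The exact proportions will vary with the seed, but BIC's hit rate should approach 1 as \(N\) grows while AIC's does not.
> set.seed(1)
> prefers0 <- function(N, crit) {
+   y <- rnorm(N)                      # true mean really is 0
+   crit(lm(y ~ 0)) < crit(lm(y ~ 1))  # TRUE if the mean-0 model wins
+ }
> mean(replicate(1000, prefers0(20, BIC)))    # BIC at small N
> mean(replicate(1000, prefers0(2000, BIC)))  # closer to 1 at large N
> mean(replicate(1000, prefers0(2000, AIC)))  # stays below 1 even at large N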

We will demonstrate using the data from the RunStitch example in cfcdae, which we have used before. Each worker run-stitches collars using two different setups: the conventional setup and an ergonomic setup. The two runs are made in random order for each worker, and the interest is in any difference in average speed between the two setups.

Load the RunStitch data from the package and create the differences between Standard and Ergonomic.

> library(cfcdae)
> data(RunStitch)
> differences <- RunStitch[,"Standard"] - RunStitch[,"Ergonomic"]
We want to compare three models. Mr. Skeptical thinks that the mean should be 0, and Mr. Enthusiastic thinks that the mean should be .5; we also want the model that estimates the mean from the data.
> null0model <- lm(differences ~ 0)        # mean fixed at 0
> nullp5model <- lm(differences - .5 ~ 0)  # mean fixed at .5
> fullmodel <- lm(differences ~ 1)         # mean estimated from the data
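Before comparing the criteria, it is worth checking the parameter counts R will use. The two null models estimate only the residual variance, while the full model also estimates the mean (printed values omitted):
> attr(logLik(null0model), "df")   # 1: just the residual variance
> attr(logLik(nullp5model), "df")  # 1: just the residual variance
> attr(logLik(fullmodel), "df")    # 2: the mean plus the residual variance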
AIC just slightly prefers the full model over the mean 0 model:
> AIC(null0model)
[1] 61.97586
> AIC(fullmodel)
[1] 61.76274
> AIC(nullp5model)
[1] 66.75583
BIC, on the other hand, prefers the mean 0 model:
> BIC(null0model)
[1] 63.37706
> BIC(nullp5model)
[1] 68.15703
> BIC(fullmodel)
[1] 64.56513
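As a cross-check, BIC is just AIC with the penalty weight k changed from 2 to \(\log(N)\), and base R's AIC() accepts that k argument directly, so the following should reproduce the BIC values above:
> AIC(null0model, k = log(nobs(null0model)))  # same as BIC(null0model)
> AIC(fullmodel, k = log(nobs(fullmodel)))    # same as BIC(fullmodel)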

Here \(N = 30\), so the BIC penalty is \(\log(30) \approx 3.4\) per parameter rather than 2; that heavier charge for estimating the mean is what flips the preference from the full model to the mean 0 model. The model assuming a mean of .5 is never really in the running. Sorry, Mr. Enthusiastic.