## When BIC is Best

The data set for this example contains data for a linear model with one response variable `y` and 25 predictor variables `x1` through `x25`. In real life, we don't know which of the predictor variables have anything to do with the response. If we think of all possible regression models in which the means are linear functions of some subset of the predictors, there are $2^{25}$ = 33,554,432 different subsets, so about 33 million possible models.
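The count of candidate models can be checked directly. This is a small sketch; the variable names `n_predictors` and `n_models` are illustrative and not from the original.

```r
# Each of the 25 predictors is either in or out of a model,
# so the number of subsets is 2^25.
n_predictors <- 25
n_models <- 2^n_predictors
format(n_models, big.mark = ",", scientific = FALSE)  # "33,554,432"

# Same count, obtained by adding up the number of subsets of each size.
sum(choose(n_predictors, 0:n_predictors))
```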

The following R statements fit the largest model, which includes all the predictors.
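A minimal sketch of such a fit is shown here. Since the data file itself is not part of this text, the sketch simulates stand-in data using the simulation truth described in this section; the data frame name `dat`, the sample size `n = 100`, the unit error variance, and the seed are all assumptions made for illustration.

```r
# Stand-in data (assumed): 25 standard normal predictors, unit-variance errors.
set.seed(1)                       # arbitrary seed, for reproducibility only
n <- 100                          # assumed sample size
beta <- c(rep(1, 5), rep(0, 20))  # 1 for x1..x5, 0 for x6..x25
x <- matrix(rnorm(n * 25), n, 25, dimnames = list(NULL, paste0("x", 1:25)))
dat <- data.frame(y = drop(x %*% beta) + rnorm(n), x)

# Largest model: y regressed on every other column of the data frame.
fit_full <- lm(y ~ ., data = dat)
summary(fit_full)
length(coef(fit_full))            # intercept plus 25 slopes
```

The formula `y ~ .` saves typing all 25 predictor names; it means "every column other than `y`".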

This is simulated data, and the simulation truth regression coefficients are equal to 1 for `x1` through `x5` and equal to zero for `x6` through `x25`.

The following R statements fit the simulation truth model.
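One way to write this fit is sketched below, again on simulated stand-in data. The names `dat`, `fit_true`, and `fit_full`, the sample size, and the seed are assumptions for illustration.

```r
# Stand-in data (assumed), generated from the stated simulation truth.
set.seed(1)
n <- 100
beta <- c(rep(1, 5), rep(0, 20))
x <- matrix(rnorm(n * 25), n, 25, dimnames = list(NULL, paste0("x", 1:25)))
dat <- data.frame(y = drop(x %*% beta) + rnorm(n), x)

# Simulation truth model: only the five predictors with nonzero coefficients.
fit_true <- lm(y ~ x1 + x2 + x3 + x4 + x5, data = dat)
summary(fit_true)

# Optional comparison with the largest model via an F test for nested models.
fit_full <- lm(y ~ ., data = dat)
anova(fit_true, fit_full)
```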

We want a procedure that does not use the true (pretend unknown) parameter values and still gives reasonable results. We demonstrate model selection with BIC and AIC.

First BIC

The R function `regsubsets` in the `leaps` package rapidly finds the best fitting model with `p` parameters for each `p` in the range specified (here 2 to 16). The best subset according to BIC has `p` = 7.
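A sketch of a BIC search with `regsubsets` follows. It runs on simulated stand-in data (the name `dat`, `n = 100`, and the seed are assumptions), so the selected size need not match the value reported above. Setting `nvmax = 15` is one way to cover `p` from 2 to 16, on the assumption that `p` counts the intercept along with the slopes.

```r
library(leaps)

# Stand-in data (assumed), generated from the stated simulation truth.
set.seed(1)
n <- 100
beta <- c(rep(1, 5), rep(0, 20))
x <- matrix(rnorm(n * 25), n, 25, dimnames = list(NULL, paste0("x", 1:25)))
dat <- data.frame(y = drop(x %*% beta) + rnorm(n), x)

# Best subset of each size from 1 to 15 predictors.
out <- regsubsets(y ~ ., data = dat, nvmax = 15)
s <- summary(out)

# Number of parameters in each best subset, counting the intercept.
p <- rowSums(s$which)

# Size and members of the subset with the smallest BIC.
best <- which.min(s$bic)
p_bic <- unname(p[best])
p_bic
names(which(s$which[best, ]))
```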

Second AIC

The best subset according to AIC has `p` = 9.
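The AIC search can be sketched from the residual sums of squares that `regsubsets` reports. For a linear model with normal errors, AIC equals `n * log(RSS / n) + 2 * p` up to a constant that is the same for every model, and BIC replaces the factor 2 with `log(n)`. The data here are simulated stand-ins (`dat`, `n = 100`, and the seed are assumptions), so the chosen sizes need not equal those reported in the text.

```r
library(leaps)

# Stand-in data (assumed), generated from the stated simulation truth.
set.seed(1)
n <- 100
beta <- c(rep(1, 5), rep(0, 20))
x <- matrix(rnorm(n * 25), n, 25, dimnames = list(NULL, paste0("x", 1:25)))
dat <- data.frame(y = drop(x %*% beta) + rnorm(n), x)

s <- summary(regsubsets(y ~ ., data = dat, nvmax = 15))
p <- rowSums(s$which)              # parameters, counting the intercept

# Criteria up to an additive constant common to all models.
fit_term <- n * log(s$rss / n)
aic <- fit_term + 2 * p
bic <- fit_term + log(n) * p

p_aic <- unname(p[which.min(aic)])
p_bic <- unname(p[which.min(bic)])
c(AIC = p_aic, BIC = p_bic)
```

Because `log(n)` exceeds 2 once `n` is at least 8, the BIC choice computed this way is never larger than the AIC choice.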

## When AIC is Best

The data set for this example contains data for a linear model with one response variable `y` and 25 predictor variables `x1` through `x25`. In real life, we don't know which of the predictor variables have anything to do with the response. If we think of all possible regression models in which the means are linear functions of some subset of the predictors, there are $2^{25}$ = 33,554,432 different subsets, so about 33 million possible models.

The following R statements fit the largest model, which includes all the predictors.

This is simulated data, and the simulation truth regression coefficient for $x_i$ is $1 / (1 + i)$, for $i = 1, \ldots, 25$. All simulation truth regression coefficients are nonzero. Hence the largest model fitted above is the simulation truth model.
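This setting can be sketched with simulated stand-in data as follows. Only the coefficient formula comes from the text; the name `dat`, the sample size `n = 100`, the unit error variance, and the seed are assumptions.

```r
# Simulation truth: the coefficient for x_i is 1 / (1 + i), so none is zero.
beta <- 1 / (1 + 1:25)
range(beta)                       # from 1/26 up to 1/2

# Stand-in data (assumed design and error distribution).
set.seed(2)
n <- 100
x <- matrix(rnorm(n * 25), n, 25, dimnames = list(NULL, paste0("x", 1:25)))
dat <- data.frame(y = drop(x %*% beta) + rnorm(n), x)

# Here the largest model is also the simulation truth model.
fit_full <- lm(y ~ ., data = dat)
summary(fit_full)
```

The coefficients shrink steadily toward zero without reaching it, so there is no sharp line between predictors that matter and predictors that do not.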

We want a procedure that does not use the true (pretend unknown) parameter values and still gives reasonable results. We demonstrate model selection with BIC and AIC.

First BIC

The best subset according to BIC has `p` = 6.

Second AIC

The best subset according to AIC has `p` = 10.
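Both searches can be sketched together for this setting. As before, the data are simulated stand-ins (`dat`, `n = 100`, and the seed are assumptions), and both criteria are computed up to a constant common to all models, so the sizes printed need not match the values reported in the text.

```r
library(leaps)

# Stand-in data (assumed) with coefficients 1 / (1 + i), all nonzero.
set.seed(2)
n <- 100
beta <- 1 / (1 + 1:25)
x <- matrix(rnorm(n * 25), n, 25, dimnames = list(NULL, paste0("x", 1:25)))
dat <- data.frame(y = drop(x %*% beta) + rnorm(n), x)

# Best subset of each size, then both criteria from the residual sums of squares.
s <- summary(regsubsets(y ~ ., data = dat, nvmax = 15))
p <- rowSums(s$which)              # parameters, counting the intercept
fit_term <- n * log(s$rss / n)

p_bic <- unname(p[which.min(fit_term + log(n) * p)])
p_aic <- unname(p[which.min(fit_term + 2 * p)])
c(BIC = p_bic, AIC = p_aic)
```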

## Summary

When the actual true model is one of the models under consideration and has a small number of nonzero parameters, then BIC is best. It provides consistent model selection as the sample size goes to infinity, and AIC does not.

When the actual true model is not one of the models under consideration or has a large number of nonzero parameters, then AIC is best. The consistent estimation property of BIC is meaningless in this context. Moreover, BIC tends to pick models that are too small.
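The standard definitions of the two criteria show why BIC leans toward smaller models. The notation here is introduced for illustration and does not appear in the text above: $L(\hat\theta)$ is the maximized likelihood, $p$ the number of parameters, and $n$ the sample size.

$$
\mathrm{AIC} = -2 \log L(\hat\theta) + 2p,
\qquad
\mathrm{BIC} = -2 \log L(\hat\theta) + p \log n,
$$

$$
\mathrm{BIC} - \mathrm{AIC} = p \, (\log n - 2).
$$

The difference is positive whenever $n \ge 8$ (since $e^2 \approx 7.39$) and grows with $p$, so BIC charges more than AIC for each additional parameter and its selected model is never larger than the one AIC selects.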