When BIC is Best

The data set is for a linear model with one response variable y and 25 predictor variables x1 through x25. In real life, we don't know which of the predictor variables have anything to do with the response. If we consider all possible regression models in which the mean is a linear function of some subset of the predictors, there are 2^25 = 33,554,432 different subsets, so about 33 million possible models.
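The count of candidate models follows because each of the 25 predictors is either in a model or out of it. A minimal check in R (the variable names are illustrative, not from the original page):

```r
# Each of the 25 predictors is either included or excluded,
# so the number of predictor subsets is 2^25.
n_predictors <- 25
n_models <- 2^n_predictors

# Equivalent count: add up the number of subsets of each size k = 0, ..., 25.
n_models_by_size <- sum(choose(n_predictors, 0:n_predictors))

print(format(n_models, big.mark = ","))
print(n_models_by_size == n_models)
```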

The following R statements fit the largest model, which includes all the predictors.

This is simulated data and the simulation truth regression coefficients are equal to 1 for x1 through x5 and equal to zero for x6 through x25.

The following R statements fit the simulation truth model.
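As a rough sketch of both fits, the code below simulates a stand-in data set with the structure described above and then fits the largest model and the simulation truth model with lm. The sample size (n = 100), the standard normal predictors and errors, the seed, and the object names dat, fit_big and fit_true are all assumptions made for illustration; they are not taken from the original data.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

n <- 100      # assumed sample size (not given in the text)
k <- 25       # number of predictors, as in the text

# Predictors x1, ..., x25: independent standard normals (an assumption).
x <- matrix(rnorm(n * k), nrow = n, ncol = k)
colnames(x) <- paste0("x", 1:k)

# Simulation truth: coefficient 1 for x1..x5 and 0 for x6..x25.
beta <- c(rep(1, 5), rep(0, k - 5))
y <- as.numeric(x %*% beta + rnorm(n))  # standard normal errors (assumed)

dat <- data.frame(y = y, x)

# Largest model: all 25 predictors.
fit_big <- lm(y ~ ., data = dat)

# Simulation truth model: only x1 through x5.
fit_true <- lm(y ~ x1 + x2 + x3 + x4 + x5, data = dat)

# The models are nested, so an F test can compare them.
print(anova(fit_true, fit_big))
```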

We want a procedure that does not use the true (pretend unknown) parameter values and still gives reasonable results. We demonstrate model selection with BIC and AIC.

First BIC

The R function regsubsets in the leaps package rapidly finds the best-fitting model with p parameters for each p in the specified range (here 2 to 16). The best subset according to BIC has p = 7.
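A sketch of such a search is shown below. It assumes the leaps package is installed, that p counts the intercept (so p from 2 to 16 corresponds to 1 to 15 predictors), and it uses simulated stand-in data (n = 100, standard normal predictors and errors are assumptions), so the selected p need not match the value reported here.

```r
library(leaps)  # provides regsubsets(); must be installed separately

set.seed(42)    # arbitrary seed
n <- 100        # assumed sample size
k <- 25
x <- matrix(rnorm(n * k), nrow = n, ncol = k)
colnames(x) <- paste0("x", 1:k)
beta <- c(rep(1, 5), rep(0, k - 5))      # truth: x1..x5 matter
y <- as.numeric(x %*% beta + rnorm(n))
dat <- data.frame(y = y, x)

# Best subset of each size from 1 to 15 predictors, i.e. p = 2 to 16
# parameters if the intercept is counted (an assumption about p).
out <- regsubsets(y ~ ., data = dat, nvmax = 15, method = "exhaustive")
s <- summary(out)

# s$bic has one BIC value per model size; smaller is better.
best <- which.min(s$bic)
p_best <- best + 1                        # add 1 for the intercept
chosen <- names(which(s$which[best, ]))   # terms in the selected model

print(p_best)
print(chosen)
```

The summary object also carries the residual sum of squares for each size in s$rss, which is enough to compute other criteria by hand.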

Second AIC

The best subset according to AIC has p = 9.
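AIC tends to choose a larger model than BIC because the two criteria differ only in the penalty per parameter. For Gaussian linear models, and up to an additive constant shared by all models fit to the same data, AIC can be written as n log(RSS / n) + 2 p and BIC as n log(RSS / n) + log(n) p, and log(n) exceeds 2 once n is at least 8. The sketch below computes both by hand for a nested sequence of models and compares them with R's AIC and BIC functions; the simulated data, the use of only 10 predictors, and all object names are assumptions for illustration.

```r
set.seed(42)   # arbitrary seed
n <- 100       # assumed sample size
k <- 10        # fewer predictors than in the text, to keep this short
x <- matrix(rnorm(n * k), nrow = n, ncol = k)
colnames(x) <- paste0("x", 1:k)
beta <- c(rep(1, 5), rep(0, k - 5))
y <- as.numeric(x %*% beta + rnorm(n))
dat <- data.frame(y = y, x)

# Nested sequence: model j uses predictors x1, ..., xj.
fits <- lapply(1:k, function(j) {
  lm(reformulate(paste0("x", 1:j), response = "y"), data = dat)
})

rss <- sapply(fits, deviance)                    # residual sums of squares
p <- sapply(fits, function(f) length(coef(f)))   # parameters incl. intercept

# Criteria up to an additive constant common to all models.
aic <- n * log(rss / n) + 2 * p
bic <- n * log(rss / n) + log(n) * p

# R's built-in versions differ from these only by such a constant.
aic_r <- sapply(fits, AIC)
bic_r <- sapply(fits, BIC)

print(c(aic_choice = which.min(aic), bic_choice = which.min(bic)))
```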

When AIC is Best

The data set is for a linear model with one response variable y and 25 predictor variables x1 through x25. In real life, we don't know which of the predictor variables have anything to do with the response. If we consider all possible regression models in which the mean is a linear function of some subset of the predictors, there are 2^25 = 33,554,432 different subsets, so about 33 million possible models.

The following R statements fit the largest model, which includes all the predictors.

This is simulated data and the simulation truth regression coefficient for xi is 1 ⁄ (1 + i). All simulation truth regression coefficients are nonzero. Hence the fit above is the simulation truth model.
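To make this setup concrete, the sketch below simulates stand-in data with coefficients 1 / (1 + i) and fits the largest model. The sample size (n = 100), the standard normal predictors and errors, and the object names are assumptions, not the original data. Every coefficient is nonzero, but the later ones are small (1/26 for x25), which suggests why they can be hard to distinguish from zero at a moderate sample size.

```r
set.seed(42)   # arbitrary seed
n <- 100       # assumed sample size (not given in the text)
k <- 25

x <- matrix(rnorm(n * k), nrow = n, ncol = k)
colnames(x) <- paste0("x", 1:k)

# Simulation truth: coefficient for xi is 1 / (1 + i), so 1/2, 1/3, ..., 1/26.
beta <- 1 / (1 + 1:k)
y <- as.numeric(x %*% beta + rnorm(n))   # standard normal errors (assumed)
dat <- data.frame(y = y, x)

# Largest model, which here is also the simulation truth model.
fit_big <- lm(y ~ ., data = dat)

# Compare the true coefficients with the estimates (intercept dropped).
print(round(cbind(truth = beta, estimate = coef(fit_big)[-1]), 3))
```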

We want a procedure that does not use the true (pretend unknown) parameter values and still gives reasonable results. We demonstrate model selection with BIC and AIC.

First BIC

The best subset according to BIC has p = 6.

Second AIC

The best subset according to AIC has p = 10.

Summary

When the actual true model is one of the models under consideration and has a small number of nonzero parameters, then BIC is best. It provides consistent model selection as the sample size goes to infinity and AIC does not.
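The consistency claim can be explored with a small simulation. The sketch below is a simplification under stated assumptions: it searches only a nested sequence of 8 candidate models rather than all subsets, the true model uses the first 3 predictors with coefficient 1, and the sample sizes, replication count and function name select_size are hypothetical choices. It reports how often each criterion picks exactly the true model at each sample size.

```r
set.seed(42)                 # arbitrary seed
k <- 8                       # candidate predictors (assumed)
k_true <- 3                  # predictors in the true model (assumed)
beta <- c(rep(1, k_true), rep(0, k - k_true))
sizes <- c(50, 200, 1000)    # sample sizes to compare (assumed)
n_rep <- 100                 # replications per sample size (assumed)

# Simulate one data set of size n and return the number of predictors
# chosen by AIC and by BIC among the nested models x1, x1:x2, ..., x1:xk.
select_size <- function(n) {
  x <- matrix(rnorm(n * k), nrow = n, ncol = k)
  y <- as.numeric(x %*% beta + rnorm(n))
  rss <- sapply(1:k, function(j) {
    design <- cbind(1, x[, 1:j, drop = FALSE])   # intercept plus j predictors
    sum(lm.fit(design, y)$residuals^2)
  })
  p <- 2:(k + 1)                                  # parameters incl. intercept
  c(AIC = which.min(n * log(rss / n) + 2 * p),
    BIC = which.min(n * log(rss / n) + log(n) * p))
}

# One 2 x n_rep matrix of selected sizes per sample size.
results <- lapply(sizes, function(n) replicate(n_rep, select_size(n)))

# Proportion of replications in which each criterion chose the true size.
hit_rate <- sapply(results, function(m) rowMeans(m == k_true))
colnames(hit_rate) <- paste0("n=", sizes)
print(hit_rate)
```

Because BIC's penalty per parameter is larger than AIC's whenever log(n) > 2, the size BIC selects in each replication is never larger than the size AIC selects.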

When the actual true model is not one of the models under consideration or has a large number of nonzero parameters, then AIC is best. The consistent model selection property of BIC is meaningless in this context. Moreover, BIC tends to pick models that are too small.