R version 3.2.2 (2015-08-14) -- "Fire Safety"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> 
> options(width = 132)
> 
> # So, new R package (shipped to CRAN and accepted in 1 hour and 45 minutes!)
> 
> library(glmbb)
> data(crabs)
> 
> gout.AIC <- glmbb(satell ~ color * spine * width * weight, data = crabs)
Warning messages:
1: glm.fit: fitted rates numerically 0 occurred
2: glm.fit: fitted rates numerically 0 occurred
3: glm.fit: fitted rates numerically 0 occurred
4: glm.fit: fitted rates numerically 0 occurred
5: glm.fit: fitted rates numerically 0 occurred
> sout.AIC <- summary(gout.AIC)
> sout.AIC

Results of search for hierarchical models with lowest AIC.
Search was for all models with AIC no larger than min(AIC) + 10
These are shown below.

 criterion    weight  formula
     894.4  0.145338  satell ~ color*spine*width + spine*width*weight
     895.5  0.082022  satell ~ color*spine*width + color*width*weight + spine*width*weight
     895.8  0.071091  satell ~ color*spine + color*width + width*weight
     896.4  0.053926  satell ~ color*spine + spine*weight
     896.8  0.043010  satell ~ color*spine + color*width*weight
     896.9  0.041625  satell ~ color*spine*width + color*spine*weight + color*width*weight
     897.0  0.039209  satell ~ color*spine + width*weight
     897.0  0.038240  satell ~ color*spine + spine*width + width*weight
     897.2  0.034691  satell ~ color*spine + spine*weight + width*weight
     897.3  0.033555  satell ~ color*spine + color*width + spine*weight + width*weight
     897.4  0.031368  satell ~ width + color*spine + spine*weight
     897.5  0.031181  satell ~ color*spine + color*width + spine*weight
     897.8  0.026234  satell ~ color*spine + color*weight + width*weight
     898.2  0.021556  satell ~ color*spine + spine*weight + color*width*weight
     898.4  0.018996  satell ~ color*spine + color*width + spine*width + width*weight
     898.7  0.016963  satell ~ weight + color*spine + spine*width
     899.0  0.014257  satell ~ color*spine + color*weight + spine*weight
     899.1  0.013977  satell ~ color*weight + color*spine*width + spine*width*weight
     899.3  0.012488  satell ~ color*spine + spine*width + spine*weight + width*weight
     899.3  0.012072  satell ~ color*spine*width + color*spine*weight + color*width*weight + spine*width*weight
     899.7  0.010147  satell ~ width*weight + color*spine*width
     899.7  0.010021  satell ~ color*spine + spine*width + color*width*weight
     899.7  0.009917  satell ~ color*spine + color*weight + spine*weight + width*weight
     899.8  0.009825  satell ~ color*spine*weight + color*width*weight
     899.8  0.009483  satell ~ color*spine + spine*width + color*weight + width*weight
     899.8  0.009415  satell ~ color*spine*weight
     900.0  0.008761  satell ~ color*spine + spine*width + spine*weight
     900.1  0.008462  satell ~ weight + color*spine + color*width + spine*width
     900.1  0.008103  satell ~ color*spine*width + color*spine*weight
     900.2  0.007995  satell ~ color*spine + color*width + spine*width + spine*weight
     900.2  0.007833  satell ~ width + color*spine + color*weight + spine*weight
     900.2  0.007808  satell ~ color*spine + color*width + spine*width + spine*weight + width*weight
     900.4  0.007201  satell ~ weight + color*spine + color*width
     900.5  0.006944  satell ~ color*spine + color*weight
     900.5  0.006755  satell ~ width*weight + color*spine*width + color*spine*weight
     900.6  0.006359  satell ~ color*spine + spine*width + spine*weight + color*width*weight
     900.6  0.006344  satell ~ spine*weight + width*weight + color*spine*width
     900.7  0.006199  satell ~ color*spine*width + color*width*weight
     900.8  0.005963  satell ~ color*spine + color*width + color*weight + width*weight
     901.1  0.004955  satell ~ color*spine + spine*width + color*weight
     901.2  0.004832  satell ~ width + color*spine*weight
     901.2  0.004710  satell ~ spine*weight + color*spine*width
     901.3  0.004445  satell ~ width*weight + color*spine*weight
     901.3  0.004441  satell ~ color*spine*width*weight
     901.4  0.004337  satell ~ spine*weight + color*spine*width + color*width*weight
     901.5  0.004125  satell ~ color*spine + spine*width + color*weight + spine*weight + width*weight
     902.2  0.002922  satell ~ width + color*spine + color*weight
     902.3  0.002753  satell ~ weight + color*spine*width
     902.4  0.002626  satell ~ color*spine + spine*width + color*weight + spine*weight
     902.7  0.002312  satell ~ color*spine + color*width + color*weight + spine*weight
     902.7  0.002272  satell ~ color*spine + color*width + color*weight + spine*weight + width*weight
     902.8  0.002179  satell ~ spine*width + color*spine*weight + color*width*weight
     902.9  0.002001  satell ~ color*spine + spine*width*weight
     903.0  0.001977  satell ~ color*spine*width + color*spine*weight + spine*width*weight
     903.0  0.001967  satell ~ color*width + color*spine*weight
     903.5  0.001533  satell ~ spine*width + width*weight + color*spine*weight
     903.5  0.001485  satell ~ color*spine + color*width + spine*width + color*weight + width*weight
     903.5  0.001481  satell ~ color*width*weight
     903.6  0.001412  satell ~ color*width + width*weight + color*spine*weight
     903.8  0.001328  satell ~ color*spine + color*width + spine*width*weight
     903.9  0.001231  satell ~ weight + color*spine
     904.0  0.001186  satell ~ spine*width + color*spine*weight
     904.0  0.001174  satell ~ color*spine + color*width + color*weight
     904.4  0.000982  satell ~ spine + color*width*weight
> 
> # Weight is the so-called Akaike weight
> #
> #     w <- exp(- criterion / 2)
> #     weight <- w / sum(w)
> #
> # So these are estimated Kullback-Leibler information normalized to sum to one.
> #
> # To keep the list from being too long, only models whose AIC is within 10 of
> # the minimum are listed, and only those figure in the normalization.
> #
> # The idea is that these are sort of like a Bayesian posterior (more on this
> # below).  From the weights we see that AIC is spread all over the place and
> # does not really favor any one model.  Also, all of these fits have been
> # checked against
> # https://github.com/cjgeyer/glmbb/blob/master/package/glmbb/tests/foo.Rout.save
> # so the R function step really did fail to find the lowest-AIC model
> # in s3-try2.Rout.
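
# A minimal sketch (not run in the session above) of how the Akaike weights in
# the summary could be recomputed from the criterion column.  It assumes the
# object sout.AIC from above and that sout.AIC$results has the columns
# criterion and weight, which are used later in this transcript.

aic <- sout.AIC$results$criterion
w.check <- exp(- (aic - min(aic)) / 2)    # subtract min(aic) for numerical stability
w.check <- w.check / sum(w.check)         # normalize over the listed models only
all.equal(w.check, sout.AIC$results$weight, tolerance = 1e-4)  # should be TRUE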
> 
> # now for BIC
> 
> gout.BIC <- glmbb(satell ~ color * spine * width * weight,
+     data = crabs, criterion = "BIC")
Warning message:
glm.fit: fitted rates numerically 0 occurred
> sout.BIC <- summary(gout.BIC)
> sout.BIC

Results of search for hierarchical models with lowest BIC.
Search was for all models with BIC no larger than min(BIC) + 10
These are shown below.

 criterion    weight  formula
     921.2  0.902698  satell ~ width*weight
     926.5  0.064747  satell ~ weight
     929.5  0.014021  satell ~ color + width*weight
     930.1  0.010556  satell ~ spine + width*weight
     930.7  0.007978  satell ~ width + weight
> 
> # There is an enormous difference:  BIC chooses a much smaller model.
> # Now the weight comes from the approximate Bayes factor: it is the approximate
> # posterior probability of the model if we use a flat prior on models.  But why
> # would a Bayesian use a flat prior (some objectivist notion, perhaps)?  Anyway,
> # if we had a flat prior on models, more than 90% of the posterior probability
> # would be on the model with the smallest BIC.  And if we didn't have a
> # flat prior, we could adjust these weights to be the approximate posterior
> # probabilities for our prior:
> #
> #     weight <- weight * prior
> #     weight <- weight / sum(weight)
> #
> # would do the job (a concrete sketch appears after the morals at the end of
> # this transcript).
> 
> # The problem here is that AIC and BIC are large-n results (they use the
> # chi-square approximation to the likelihood ratio test statistic) and
> # n is not really large here.
> #
> # For situations like this one should really use AICc, which is AIC
> # "corrected for small sample size".  See Burnham and Anderson (2002)
> # for more on this.
> 
> gout.AICc <- glmbb(satell ~ color * spine * width * weight,
+     data = crabs, criterion = "AICc")
Warning messages:
1: glm.fit: fitted rates numerically 0 occurred
2: glm.fit: fitted rates numerically 0 occurred
3: glm.fit: fitted rates numerically 0 occurred
4: glm.fit: fitted rates numerically 0 occurred
5: glm.fit: fitted rates numerically 0 occurred
> sout.AICc <- summary(gout.AICc)
> sout.AICc

Results of search for hierarchical models with lowest AICc.
Search was for all models with AICc no larger than min(AICc) + 10
These are shown below.

 criterion    weight  formula
     899.4  0.166634  satell ~ color*spine + spine*weight
     900.0  0.121157  satell ~ color*spine + width*weight
     900.2  0.109952  satell ~ color*spine + color*width + width*weight
     900.9  0.078183  satell ~ width + color*spine + spine*weight
     901.0  0.075682  satell ~ color*spine + spine*width + width*weight
     901.2  0.068659  satell ~ color*spine + spine*weight + width*weight
     902.2  0.042278  satell ~ weight + color*spine + spine*width
     902.2  0.040575  satell ~ color*spine + color*weight + width*weight
     902.4  0.037077  satell ~ color*spine + color*width + spine*weight
     902.8  0.030171  satell ~ color*spine + color*width + spine*weight + width*weight
     903.5  0.022051  satell ~ color*spine + color*weight + spine*weight
     903.9  0.017307  satell ~ color*spine + color*weight
     904.0  0.017080  satell ~ color*spine + color*width + spine*width + width*weight
     904.2  0.014849  satell ~ color*spine + spine*width + spine*weight + width*weight
     904.3  0.014252  satell ~ weight + color*spine + color*width
     904.4  0.013550  satell ~ color*spine + spine*width + spine*weight
     904.8  0.011270  satell ~ color*spine*width + spine*width*weight
     904.9  0.010636  satell ~ color*spine + color*width*weight
     905.0  0.010062  satell ~ weight + color*spine + color*width + spine*width
     905.2  0.009314  satell ~ width + color*spine + color*weight + spine*weight
     905.3  0.008917  satell ~ color*spine + color*weight + spine*weight + width*weight
     905.4  0.008527  satell ~ color*spine + spine*width + color*weight + width*weight
     906.0  0.006294  satell ~ color*spine*weight
     906.1  0.005892  satell ~ color*spine + spine*width + color*weight
     906.1  0.005782  satell ~ width + color*spine + color*weight
     906.2  0.005583  satell ~ weight + color*spine
     906.3  0.005345  satell ~ color*spine + color*width + spine*width + spine*weight
     906.9  0.003986  satell ~ color*spine + color*width + color*weight + width*weight
     907.0  0.003814  satell ~ color*spine + color*width + spine*width + spine*weight + width*weight
     907.0  0.003692  satell ~ color*width*weight
     907.1  0.003558  satell ~ width*weight + color*spine*width
     907.5  0.002911  satell ~ color*width + width*weight
     907.8  0.002508  satell ~ color*spine + spine*weight + color*width*weight
     907.9  0.002360  satell ~ width + color*spine*weight
     908.1  0.002125  satell ~ color + width*weight
     908.2  0.002015  satell ~ color*spine + spine*width + color*weight + spine*weight + width*weight
     908.4  0.001835  satell ~ width + weight + color*spine
     908.5  0.001755  satell ~ color*spine + spine*width + color*weight + spine*weight
     908.7  0.001637  satell ~ color*weight + width*weight
     908.8  0.001559  satell ~ width*weight + color*spine*weight
     908.8  0.001519  satell ~ spine + color*width*weight
     908.8  0.001505  satell ~ width*weight
     909.1  0.001345  satell ~ weight + color*spine*width
     909.1  0.001337  satell ~ color*spine + spine*width*weight
     909.3  0.001166  satell ~ color*spine + spine*width + color*width*weight
     909.3  0.001165  satell ~ spine*weight + color*spine*width
     909.4  0.001129  satell ~ color*spine + color*width + color*weight + spine*weight
> 
> # So what are these models?  What are the other criteria for them?  Where
> # does the best model on each list land on the other lists?
> #
> # To answer these questions we need some other functions that I should put in
> # the package, but the rules say no more than one update to a package every
> # 2 months, so we will just have to leave them here.
> 
> get.best.model <- function(object, k = 1) {
+     # get k-th best model
+     stopifnot(inherits(object, "glmbb"))
+     stopifnot(length(k) == 1)
+     stopifnot(is.numeric(k))
+     stopifnot(is.finite(k))
+     stopifnot(k == round(k))
+     stopifnot(k > 0)
+     fits <- ls(envir = object$envir, pattern = "^sha1")
+     criteria <- Map(function(x) get(x, envir = object$envir)$criterion, fits)
+     criteria <- unlist(criteria)
+     ours <- sort(criteria)[k]
+     if (ours > min(criteria) + object$cutoff)
+         warning(paste(k, "-th ", sep = ""), "best is above cutoff for search.\n",
+             "Result not accurate")
+     i <- which(criteria == ours)
+     get(fits[i], envir = object$envir)
+ }
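
# A usage sketch (not run in the session above), assuming only the definition
# of get.best.model just given.  The argument k pulls out the k-th best model
# found by a search, and the result is an ordinary fitted "glm" object, so
# AIC(), BIC(), and the AICc() function defined below all apply to it.

second.best.AIC <- get.best.model(gout.AIC, k = 2)
formula(second.best.AIC)
AIC(second.best.AIC)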
> 
> AICc <- function(object) {
+     stopifnot(inherits(object, "glm"))
+     # https://en.wikipedia.org/wiki/Akaike_information_criterion#AICc
+     # for definition of AICc
+     p <- sum(! is.na(object$coefficients))
+     n <- length(object$fitted.values)
+     AIC(object) + 2 * p * (p + 1) / (n - p - 1)
+ }
> 
> best.AIC <- get.best.model(gout.AIC)
> AIC(best.AIC)
[1] 894.3717
> BIC(best.AIC)
[1] 979.5106
> AICc(best.AIC)
[1] 904.7993
> best.BIC <- get.best.model(gout.BIC)
> AIC(best.BIC)
[1] 908.5877
> BIC(best.BIC)
[1] 921.2009
> AICc(best.BIC)
[1] 908.8258
> best.AICc <- get.best.model(gout.AICc)
> AIC(best.AICc)
[1] 896.3546
> BIC(best.AICc)
[1] 943.654
> AICc(best.AICc)
[1] 899.4119
> 
> range(sout.AIC$results$criterion)
[1] 894.3717 904.3661
> range(sout.BIC$results$criterion)
[1] 921.2009 930.6581
> range(sout.AICc$results$criterion)
[1] 899.4119 909.4007
> 
> # we see that the best BIC model is off the AIC list entirely
> 
> AIC(best.BIC) > max(sout.AIC$results$criterion)
[1] TRUE
> 
> # we see that the best AICc model is somewhere on the AIC list
> 
> i <- which(sout.AIC$results$criterion == AIC(best.AICc))
> sout.AICc$results$formula[1] == sout.AIC$results$formula[i]
[1] TRUE
> i
[1] 4
> 
> # we see that the best AIC model and the best AICc model are
> # both off the BIC list entirely
> 
> BIC(best.AIC) > max(sout.BIC$results$criterion)
[1] TRUE
> BIC(best.AICc) > max(sout.BIC$results$criterion)
[1] TRUE
> 
> # we check whether the best AIC model is off the AICc list entirely;
> # it is not (the comparison below is FALSE), so it does appear on the AICc list
> 
> AICc(best.AIC) > max(sout.AICc$results$criterion)
[1] FALSE
> 
> # we see that the best BIC model is somewhere on the AICc list
> 
> i <- which(sout.AICc$results$criterion == AICc(best.BIC))
> sout.BIC$results$formula[1] == sout.AICc$results$formula[i]
[1] TRUE
> i
[1] 42
> 
> # THE MORALS OF THE STORY
> #
> # (1) Stepwise is a TTD (thing to do).  It doesn't have any other properties.
> #     It isn't guaranteed to find anything.
> #
> # (2) AIC isn't always optimal (even if you believe its story about the
> #     true model not being any model under consideration -- in this case
> #     either not a hierarchical model or the sampling scheme not being
> #     Poisson), because it is only valid as n goes to infinity, and here
> #     the sample size is not particularly large compared to the number of
> #     parameters in the models AIC is "selecting".  For the model
> #     deemed best by AIC we have
> 
> n <- nrow(crabs)
> p <- sum(! is.na(best.AIC$coefficients))
> n
[1] 173
> p
[1] 27
> n / p
[1] 6.407407
> 
> #     AICc is recommended in cases like this (even if you believe the
> #     AIC story, because AICc has the same story -- it is just AIC
> #     corrected for n not being infinity).
> #
> # (3) BIC can choose MUCH smaller models than AIC or AICc.  The AIC and AICc
> #     best models aren't even on the Bayesian's list.  Their approximate
> #     posterior probabilities are, respectively,
> 
> delta <- BIC(best.AIC) - BIC(best.BIC)
> delta
[1] 58.3097
> exp(- delta / 2) * sout.BIC$results$weight[1]
[1] 1.966764e-13
> 
> delta <- BIC(best.AICc) - BIC(best.BIC)
> delta
[1] 22.45312
> exp(- delta / 2) * sout.BIC$results$weight[1]
[1] 1.202014e-05
> 
> #     Of course, as always with asymptotic approximation, 1.966764e-13 doesn't
> #     mean close to this number in the sense of relative error, but rather
> #     in the sense of absolute error.  So it just means really small.  Perhaps
> #     way less than 0.0001 if n is "large".  But here n is not that large,
> #     so perhaps not even that!
> #
> # (4) It matters which information criterion you pick (or something else,
> #     like cross-validation or the bootstrap, if you like them).  Don't just
> #     ignorantly use AIC.  It's OK if you use AIC after judicious
> #     consideration!  But the R function step doesn't give us a choice!
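
# A minimal sketch (not run in the session above) of the prior adjustment
# described in the comments following the BIC search: multiply the BIC-based
# weights by a prior on the models and renormalize.  The prior used here is
# purely hypothetical, one probability for each of the five models on the BIC
# list above, and sout.BIC is the summary object from that search.

prior <- c(0.5, 0.2, 0.1, 0.1, 0.1)      # hypothetical prior on the 5 listed models
posterior <- sout.BIC$results$weight * prior
posterior <- posterior / sum(posterior)  # approximate posterior probabilities
data.frame(formula = as.character(sout.BIC$results$formula),
    posterior = round(posterior, 4))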
> 
> 
> proc.time()
   user  system elapsed 
  2.815   0.027   3.120 