Fall Seminar Series, September 20, 2007
University of Minnesota
School of Statistics
College of Liberal Arts

Fully Non-parametric Bayesian Ensemble Modelling

Rob McCulloch
Graduate School of Business
University of Chicago

Thursday, September 20, 2007
3:30 PM, 115 Ford Hall
Minneapolis, East Bank Campus
Social at 3:00 PM, 300 Ford Hall


Abstract

Suppose we would like to learn the relationship between y and a high-dimensional vector x based on a limited number of observations. In "BART: Bayesian Additive Regression Trees" (2006), Chipman, George and McCulloch develop a fully Bayesian approach for discovering and drawing inference about an unknown function f, assuming only that y = f(x) + ε with iid normal errors ε. In the spirit of "ensemble models", BART approximates f by a sum of many simple regression tree models, each of which is kept small by a strong regularization prior. In terms of out-of-sample prediction, BART's performance compares favorably with competing methods. Posterior evaluation by a well-mixing MCMC algorithm allows for the natural Bayesian quantification of uncertainty about f. Further, the modular nature of BART facilitates its embedding within larger hierarchical models (for example, see Zhang, Shih and Mueller 2006). In this work, we further extend the flexibility of the BART approach by relaxing the simple iid normal error specification and replacing it with a Dirichlet process model for the errors. Various specification and prior choices are explored. The costs as well as the benefits of this more flexible approach are illustrated.