SCHOOL OF STATISTICS

BUEHLER-MARTIN DISTINGUISHED LECTURER SERIES

Established by Mr. and Mrs. Thomas Martin

In memory of Robert J. Buehler, Professor of Statistics 1963-1988


In-Season Prediction of Batting Averages:

A Field-test of Basic Empirical Bayes and Bayes Methodologies


Lawrence D. Brown

Statistics Department, Wharton School

University of Pennsylvania


Tuesday, April 29, 2008

3:30 – 4:30 pm Physics 210

Refreshments 3:00 Ford 300


Batting average is one of the principal performance measures for an individual baseball player. It has a simple numerical structure: the number of successful attempts, “Hits”, as a proportion of the total number of qualifying attempts, “At-Bats”. This structure, with Hits as a number of successes within a qualifying number of attempts, makes it natural to statistically model each player’s batting average as the outcome of a binomial variable, with a given number of attempts, N, and a true (but unknown) success probability, p, that represents the player’s latent ability. This is a common data structure in many statistical applications, and so the methodological study here has implications for a wide range of such applications.

We will look at batting records for every Major League player over the course of a single season (2005). The primary focus is on using only the batting record from an earlier part of the season (e.g., the first 3 months) to predict the batter’s latent ability, the success probability p, and consequently to predict their batting-average performance for the remainder of the season. Since we are using a season that has already concluded, we can validate our predictive performance by comparing the predicted values to the actual values for the remainder of the season.
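The validation idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the (hits, at-bats) pairs are made-up numbers, not 2005 data; the helper name squared_error and the use of total squared error as the criterion are choices made for this sketch, not necessarily those used in the talk.

```python
def squared_error(predicted, actual):
    """Total squared prediction error over all players."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual))

# Made-up (hits, at-bats) records, for illustration only.
first_part = [(30, 100), (20, 80), (45, 150)]   # earlier part of the season
remainder = [(28, 100), (25, 90), (40, 140)]    # rest of the season

# Actual batting averages observed over the remainder of the season.
actual = [h / n for h, n in remainder]

# Predictor 1: each player's own early-season average.
naive = [h / n for h, n in first_part]

# Predictor 2: the pooled early-season average, used for every player.
pooled = sum(h for h, _ in first_part) / sum(n for _, n in first_part)
overall = [pooled] * len(first_part)

print("naive  :", squared_error(naive, actual))
print("overall:", squared_error(overall, actual))
```

A smaller total for one predictor means it tracked the held-out part of the season more closely on this toy data; nothing about real players should be read into the numbers.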

The methodological purpose of this study is to gain experience with a variety of predictive methods applicable to a much wider range of situations. Several of the methods to be investigated derive from empirical Bayes and hierarchical Bayes interpretations. Although the general ideas behind these techniques have been understood for many decades*, some of these methods have only been refined relatively recently in a manner that promises to more accurately fit data such as that at hand.
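As a rough illustration of the empirical Bayes idea, the Python sketch below shrinks a set of noisy estimates toward their common mean. It assumes observations x[i] that are approximately Normal with unknown means and known variances v[i], and a Normal prior on the means whose parameters are fitted by a simple method of moments. The function name eb_shrink is hypothetical, and this is just one basic estimator of this type, not necessarily one of those examined in the talk.

```python
def eb_shrink(x, v):
    """Method-of-moments empirical Bayes estimates.

    Assumed model: x[i] ~ N(theta[i], v[i]) with known v[i] > 0,
    and theta[i] ~ N(mu, tau2) with mu and tau2 estimated from the data.
    """
    n = len(x)
    mu = sum(x) / n                                  # estimated prior mean
    s2 = sum((xi - mu) ** 2 for xi in x) / (n - 1)   # sample variance of the x[i]
    # The sample variance estimates tau2 + average(v); subtract and truncate at 0.
    tau2 = max(0.0, s2 - sum(v) / n)
    # Posterior mean: move each x[i] toward mu, more strongly when v[i] is large.
    return [mu + tau2 / (tau2 + vi) * (xi - mu) for xi, vi in zip(x, v)]


estimates = eb_shrink([0.4, 0.6, 0.8], [0.01, 0.01, 0.01])
print(estimates)
```

When the observed spread is no larger than the known sampling variances explain, the estimated prior variance is zero and every estimate collapses to the common mean.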

One feature of all of the statistical methodologies here is the preliminary use of a particular form of variance stabilizing transformation in order to transform the binomial data problem into a somewhat more familiar structure involving (approximately) Normal random variables with known variances. This transformation technique is also useful in validating the binomial model assumption that is the conceptual basis for all our analyses. If time permits we will also describe how it can be used to test for the presence of “streaky hitters” whose latent ability appears to significantly change over time.
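The abstract does not name the transformation. A standard variance stabilizing choice for binomial data is the arcsine square-root transform, sketched below in Python as an assumption. With H hits in N at-bats, the transformed value is approximately Normal with mean arcsin(sqrt(p)) and variance 1/(4N), which does not depend on p. The small offset c = 1/4 and the function names are choices made for this sketch.

```python
import math

def arcsine_transform(hits, at_bats, c=0.25):
    """Return (x, variance) for the arcsine square-root transform.

    x is approximately N(arcsin(sqrt(p)), 1 / (4 * at_bats)).
    The offset c is an assumed small-sample adjustment.
    """
    x = math.asin(math.sqrt((hits + c) / (at_bats + 2 * c)))
    return x, 1.0 / (4.0 * at_bats)

def inverse_transform(x):
    """Map a value on the transformed scale back to a proportion."""
    return math.sin(x) ** 2


x, var = arcsine_transform(30, 100)
print(x, var, inverse_transform(x))
```

Estimation can then be carried out on the transformed scale, where the variances are known, and the results mapped back to the batting-average scale with inverse_transform.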

* A particularly relevant background reference is Efron, B. and Morris, C. (1977), “Stein’s paradox in statistics”, Scientific American 236, 119-127, and the earlier, more technical version (1975), “Data analysis using Stein’s estimator and its generalizations”, Jour. Amer. Stat. Assoc. 70, 311-319.