SCHOOL OF STATISTICS
BUEHLER-MARTIN DISTINGUISHED LECTURER SERIES
Established by Mr. and Mrs. Thomas Martin
In memory of Robert J. Buehler, Professor of Statistics 1963-1988
In-Season Prediction of Batting Averages:
A Field-test of Basic Empirical Bayes and Bayes Methodologies
Lawrence D. Brown
Statistics Department, Wharton School
University of Pennsylvania
Tuesday, April 29, 2008
3:30 – 4:30 pm
Physics 210
Refreshments at 3:00, Ford 300
Batting average is one of the principal performance measures for an individual baseball player. It has a simple numerical structure: the number of successful attempts, “Hits”, as a proportion of the total number of qualifying attempts, “At-Bats”. This situation, with Hits as a number of successes within a qualifying number of attempts, makes it natural to statistically model each player’s batting average as a binomial variable outcome, with a given number of At-Bats and a true (but unknown) success probability that represents the player’s latent ability. This is a common data structure in many statistical applications, and so the methodological study here has implications for a wide range of such applications.
We will look at batting records for every Major League player over the course of a single season (2005). The primary focus is on using only the batting record from an earlier part of the season (e.g., the first 3 months) to predict the batter’s latent ability, and consequently to predict their batting-average performance for the remainder of the season. Since we are using a season that has already concluded, we can validate our predictive performance by comparing the predicted values to the actual values for the remainder of the season.
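The validation idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records are made-up toy numbers (not 2005 data), the function names are hypothetical, and total squared error is assumed as the comparison criterion, which the abstract does not specify. Two baseline predictors are shown: each player's own early-season average, and a single pooled average for everyone.

```python
from typing import List, Tuple

Record = Tuple[int, int]  # (hits, at_bats) for one player in one part of the season


def average(rec: Record) -> float:
    """Batting average: hits divided by at-bats."""
    hits, at_bats = rec
    return hits / at_bats


def naive_prediction(first_part: List[Record]) -> List[float]:
    """Predict each player's later average by their own early-season average."""
    return [average(r) for r in first_part]


def pooled_prediction(first_part: List[Record]) -> List[float]:
    """Predict every player's later average by the pooled early-season average."""
    total_hits = sum(h for h, _ in first_part)
    total_at_bats = sum(n for _, n in first_part)
    return [total_hits / total_at_bats] * len(first_part)


def squared_error(predicted: List[float], later_part: List[Record]) -> float:
    """Total squared difference between predictions and realized later averages."""
    return sum((p - average(r)) ** 2 for p, r in zip(predicted, later_part))


# Hypothetical toy records, for illustration only.
first = [(30, 100), (20, 80), (45, 150)]
second = [(28, 100), (25, 90), (40, 140)]

print("naive :", squared_error(naive_prediction(first), second))
print("pooled:", squared_error(pooled_prediction(first), second))
```

A shrinkage method would typically produce predictions between these two extremes, and the same `squared_error` comparison would apply to it.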
The methodological purpose of this study is to gain experience with a variety of predictive methods applicable to a much wider range of situations. Several of the methods to be investigated derive from empirical Bayes and hierarchical Bayes interpretations. Although the general ideas behind these techniques have been understood for many decades*, some of these methods have only been refined relatively recently in a manner that promises to more accurately fit data such as that at hand.
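To make the empirical Bayes idea concrete, here is a minimal sketch of one simple variant: a method-of-moments estimator that shrinks each observation toward the grand mean. It assumes observations that are approximately Normal with known variances, and it is not necessarily one of the methods compared in the talk. The function name `eb_shrink`, the unweighted prior mean, and the input values are all assumptions made for illustration.

```python
from typing import List, Sequence


def eb_shrink(x: Sequence[float], var: Sequence[float]) -> List[float]:
    """Shrink each x[i] toward the grand mean.

    Assumed model: x[i] ~ Normal(theta[i], var[i]) with var[i] known,
    and theta[i] ~ Normal(mu, tau2) with mu and tau2 estimated from the data.
    """
    n = len(x)
    mu = sum(x) / n  # unweighted grand mean as the prior mean (a simplification)
    # Method of moments: the sample variance of x estimates tau2 plus the
    # average sampling variance, so subtract the latter and truncate at zero.
    spread = sum((xi - mu) ** 2 for xi in x) / (n - 1)
    tau2 = max(0.0, spread - sum(var) / n)
    # Posterior-mean form: noisier observations (larger var[i]) shrink more.
    return [mu + tau2 / (tau2 + vi) * (xi - mu) for xi, vi in zip(x, var)]


# Hypothetical inputs, for illustration only.
estimates = eb_shrink([0.50, 0.60, 0.55], [0.01, 0.0025, 0.005])
print(estimates)
```

Each estimate lies between the raw observation and the grand mean; when the observed spread is no larger than the sampling noise alone would explain, every estimate collapses to the grand mean.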
One feature of all of the statistical methodologies here is the preliminary use of a particular form of variance-stabilizing transformation in order to transform the binomial data problem into a somewhat more familiar structure involving (approximately) Normal random variables with known variances. This transformation technique is also useful in validating the binomial model assumption that is the conceptual basis for all our analyses. If time permits, we will also describe how it can be used to test for the presence of “streaky hitters” whose latent ability appears to change significantly over time.
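As a rough illustration of such a transformation, the sketch below uses the arcsine-square-root form with small adjustment constants (1/4 and 1/2). The abstract does not say which form is used, so both the form and the constants are assumptions, as are the function names. Under a binomial model the transformed value is approximately Normal with variance close to 1/(4N), where N is the number of At-Bats, regardless of the underlying success probability.

```python
import math


def vst(hits: int, at_bats: int) -> float:
    """Arcsine-square-root transform of a binomial count (assumed form)."""
    return math.asin(math.sqrt((hits + 0.25) / (at_bats + 0.5)))


def vst_variance(at_bats: int) -> float:
    """Approximate variance of the transformed value: 1 / (4 * at_bats)."""
    return 1.0 / (4.0 * at_bats)


def inverse_vst(x: float) -> float:
    """Map a transformed value back to the proportion scale."""
    return math.sin(x) ** 2


# Hypothetical record: 30 hits in 100 at-bats.
x = vst(30, 100)
print(x, vst_variance(100), inverse_vst(x))
```

Because the approximate variance depends only on the known number of At-Bats, methods designed for Normal observations with known variances can be applied on the transformed scale and the results mapped back with `inverse_vst`.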
* A particularly relevant background reference is Efron, B. and Morris, C. (1977), “Stein’s paradox in statistics,” Scientific American 236, 119-127, and the earlier, more technical version (1975), “Data analysis using Stein’s estimator and its generalizations,” Jour. Amer. Stat. Assoc. 70, 311-319.