Detrending career statistics in professional baseball: Accounting for the steroids era and beyond
There is a long-standing debate over how to objectively compare the career achievements of professional athletes from different historical eras. Developing an objective approach will be of particular importance over the next decade as Major League Baseball (MLB) players from the “steroids era” become eligible for Hall of Fame induction. Here we address this issue, as well as the general problem of comparing statistics from distinct eras, by detrending the seasonal statistics of professional baseball players. We detrend player statistics by normalizing achievements to seasonal averages, which accounts for changes in relative player ability resulting from both exogenous and endogenous factors, such as talent dilution from expansion, equipment and training improvements, and performance-enhancing drugs (PEDs). In this paper we compare the probability density function (pdf) of detrended career statistics to the pdf of raw career statistics for five statistical categories – hits (H), home runs (HR), runs batted in (RBI), wins (W) and strikeouts (K) – over the 90-year period 1920-2009. We find that the functional form of these pdfs is stationary under detrending. This stationarity implies that the statistical regularity observed in the right-skewed distributions for longevity and success in professional baseball arises from both the wide range of intrinsic talent among athletes and the underlying nature of competition. Using this simple detrending technique, we examine the top 50 all-time careers for H, HR, RBI, W and K. We fit the pdfs for career success with the Gamma distribution in order to calculate objective benchmarks, based on extreme statistics, that can be used to identify extraordinary careers.
💡 Research Summary
The paper tackles the long-standing problem of objectively comparing baseball careers that span different historical periods, a question that has become especially pressing as players from the “steroids era” become eligible for Hall of Fame induction. The authors propose a simple yet powerful detrending method: each player’s seasonal statistics are normalized to the league‑wide average for that season. Normalizing to the seasonal average accounts for era‑level shifts caused by exogenous factors, such as league expansion, changes in equipment, and training advances, and by endogenous factors, such as performance‑enhancing drug (PED) usage. The result is a set of “relative” performance numbers that reflect an individual’s achievement measured against the offensive or pitching environment of their own era.
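The normalization step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `detrend_careers`, the `(player, season, value)` record layout, and the choice of rescaling by the mean of all seasonal averages (so that detrended totals stay in familiar units) are assumptions made here for the example.

```python
from collections import defaultdict


def detrend_careers(records):
    """Sum each player's seasonal values after normalizing by the season average.

    records: iterable of (player, season, value) tuples (hypothetical layout).
    Returns a dict mapping player -> detrended career total.
    """
    records = list(records)

    # League-wide average of the statistic in each season.
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _, season, value in records:
        totals[season] += value
        counts[season] += 1
    season_mean = {s: totals[s] / counts[s] for s in totals}

    # Assumed baseline: the mean of the seasonal averages. It only rescales
    # the results so they are comparable in size to raw totals.
    baseline = sum(season_mean.values()) / len(season_mean)

    careers = defaultdict(float)
    for player, season, value in records:
        if season_mean[season] > 0:  # skip seasons with no recorded output
            careers[player] += baseline * value / season_mean[season]
    return dict(careers)


# Toy data: the 2000 season has twice the scoring level of the 1930 season.
records = [
    ("A", 1930, 20), ("B", 1930, 10),
    ("C", 2000, 40), ("D", 2000, 20),
]
detrended = detrend_careers(records)
print(detrended)
```

In the toy data, players A and C each produce one third more than their season's average, so they receive the same detrended total even though C's raw total is twice A's.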
Using data from 1920 to 2009, the study examines five key career metrics—hits (H), home runs (HR), runs batted in (RBI), wins (W), and strikeouts (K). For each metric the authors compare the probability density function (pdf) of raw cumulative totals with the pdf of detrended totals. Both distributions are right‑skewed and retain essentially the same functional shape after detrending, indicating that the underlying statistical regularity is driven by intrinsic talent variation and competition dynamics rather than by era‑specific shifts in baseline performance.
To model the observed distributions, the authors fit a Gamma distribution to each detrended metric. The Gamma’s shape (α) and scale (β) parameters differ across metrics: HR and RBI have low α values, producing heavy tails, while W and K have higher α, yielding thinner tails. This captures the intuition that power‑type statistics generate more extreme outliers than pitching‑type statistics. Leveraging extreme‑value theory, the fitted Gamma models are used to define objective benchmarks corresponding to the top 0.1% of performances, effectively a statistical “Hall of Fame” threshold for each category.
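As a rough illustration of how a Gamma fit can be turned into a benchmark, the sketch below estimates α and β and then solves for the value exceeded by only 0.1% of careers. Several things here are assumptions rather than the paper's procedure: the method-of-moments estimator stands in for whatever fitting method the authors used, the sample is synthetic, and the helper names (`fit_gamma_moments`, `gamma_cdf`, `gamma_quantile`) are hypothetical. The CDF is computed from the standard power series for the regularized incomplete gamma function so the example needs only the standard library.

```python
import math
import random


def fit_gamma_moments(data):
    """Method-of-moments estimates of the Gamma shape (alpha) and scale (beta)."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / (n - 1)
    return mean * mean / var, var / mean


def gamma_cdf(x, alpha, beta):
    """Gamma CDF via the power series of the regularized incomplete gamma function."""
    if x <= 0:
        return 0.0
    z = x / beta
    term = 1.0 / alpha
    total = term
    for n in range(1, 100000):
        term *= z / (alpha + n)
        total += term
        if term < 1e-16 * total:
            break
    return total * math.exp(-z + alpha * math.log(z) - math.lgamma(alpha))


def gamma_quantile(q, alpha, beta):
    """Invert the CDF by bisection: smallest x with CDF(x) >= q."""
    lo, hi = 0.0, alpha * beta
    while gamma_cdf(hi, alpha, beta) < q:  # expand until the target is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, alpha, beta) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Synthetic stand-in for detrended career totals (not real baseball data).
random.seed(0)
sample = [random.gammavariate(0.5, 100.0) for _ in range(5000)]

alpha_hat, beta_hat = fit_gamma_moments(sample)
benchmark = gamma_quantile(0.999, alpha_hat, beta_hat)  # top 0.1% threshold
print(f"alpha={alpha_hat:.3f} beta={beta_hat:.1f} benchmark={benchmark:.1f}")
```

A career total above `benchmark` would fall in the top 0.1% under the fitted model. With real data, a maximum-likelihood fit from a statistics library would be a natural replacement for the moment estimator.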
Applying the detrending and benchmark calculations, the authors reconstruct the top‑50 career lists for each metric. The rankings change only modestly for hits and wins, but for home runs and RBI the detrended list elevates pre‑1990 players, reflecting the inflation of power numbers during the late‑20th‑century PED era. This demonstrates that the method can isolate genuine talent from era‑driven statistical inflation.
Strengths of the work include (1) the transparency and ease of the detrending step, which requires no complex modeling; (2) the consistent application across multiple performance categories, enabling cross‑metric comparisons; and (3) the provision of a mathematically grounded, reproducible benchmark for “extraordinary” careers, directly relevant to Hall of Fame debates. Limitations are also acknowledged: the seasonal average may itself be biased by league‑wide composition (e.g., positional imbalances), PED effects are subsumed into the average rather than isolated, and the exclusive use of the Gamma distribution precludes assessment of alternative heavy‑tailed models such as Pareto or log‑normal.
In conclusion, the study shows that after removing era‑specific baseline shifts, career success in professional baseball follows a stationary statistical law that can be captured by a Gamma distribution. This insight offers a practical framework for objective career evaluation and suggests avenues for future research—such as incorporating park factors, position‑specific adjustments, Bayesian uncertainty quantification, and extending the detrending approach to other sports.