Leicester's Tale: Another Perspective on the EPL 2015/16 Through Expected Goals (xG) Modelling
Probabilistic modeling is an effective tool for evaluating team performance and predicting outcomes in sports. However, an important question that has not been fully explored is whether these models can reliably reflect actual performance while assigning meaningful probabilities to rare results that differ greatly from expectations. In this study, we create an inference-based probabilistic framework built on expected goals (xG). This framework converts shot-level event data into season-level simulations of points, rankings, and outcome probabilities. Using the English Premier League 2015/16 season as a case study, we demonstrate that the framework captures the overall structure of the league table. It correctly identifies the top-four contenders and relegation candidates while explaining a significant portion of the variance in final points and ranks. In a full-season evaluation, the model assigns a low probability to extreme outcomes, particularly Leicester City’s historic title win, which stands out as a statistical anomaly. We then examine the ex ante inferential and early-diagnostic role of xG by using only mid-season information. With first-half data, we simulate the rest of the season and show that teams with stronger mid-season xG profiles tend to earn more points in the second half, even after accounting for their current league position. In this mid-season assessment, Leicester City ranks among the top teams by xG and is given a small but noteworthy chance of winning the league. This suggests that their ultimate success was unlikely but not entirely detached from their actual performance. Our analysis indicates that expected goals models work best as probabilistic baselines for analysis and early-warning diagnostics, rather than as certain predictors of rare season outcomes.
💡 Research Summary
The paper presents a probabilistic framework that translates shot‑level event data into season‑long simulations of points, rankings, and outcome probabilities using expected goals (xG) as the core metric. The authors begin by fitting a standard xG model to every shot recorded in the 2015/16 English Premier League (EPL) season, incorporating variables such as shot location, type (open play, set piece, etc.), defensive pressure, and distance to the goalkeeper. Each shot receives a probability of resulting in a goal; summing these probabilities for a team in a given match yields that team’s expected goals (λ).
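As a minimal sketch of the aggregation step described above, the snippet below sums per-shot goal probabilities into a per-team, per-match expected-goals value. The record layout `(match_id, team, p_goal)`, the function name `team_match_xg`, and the shot values are hypothetical and chosen only for illustration; the paper's own data schema is not given in this summary.

```python
from collections import defaultdict

def team_match_xg(shots):
    """Sum per-shot goal probabilities into expected goals per (match, team)."""
    totals = defaultdict(float)
    for match_id, team, p_goal in shots:
        if not 0.0 <= p_goal <= 1.0:
            raise ValueError(f"shot probability out of range: {p_goal}")
        totals[(match_id, team)] += p_goal
    return dict(totals)

# Hypothetical shot records (not real data): (match_id, team, xG of the shot)
shots = [
    ("m1", "A", 0.10), ("m1", "A", 0.35), ("m1", "A", 0.05),
    ("m1", "B", 0.76), ("m1", "B", 0.04),
]
lam = team_match_xg(shots)  # maps (match_id, team) to that team's match xG
```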
To generate match outcomes, the framework assumes that the number of goals scored by each side follows an independent Poisson distribution with means equal to the respective λ values. By sampling from these Poisson distributions, the model produces a pair of goal counts (X_A, X_B) for the two teams, determines win/draw/loss, and allocates the conventional 3‑1‑0 points. This stochastic match engine is embedded in a Bayesian inference structure: prior distributions for team‑specific attack and defence efficiency parameters are set to be weakly informative, reflecting league‑wide averages and historical variance. As actual shot data are observed, the priors are updated to posterior distributions, allowing the model to adapt dynamically throughout the season and to capture the inherent uncertainty early on.
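Under the independent-Poisson assumption, the win, draw, and loss probabilities that the sampling approximates can also be computed directly by summing over a grid of scorelines. The sketch below does this; the function names, the truncation at 20 goals, and the example λ values are assumptions made for illustration, and it leaves out the Bayesian updating of attack and defence parameters.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k goals when goals ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def outcome_probs(lam_a, lam_b, max_goals=20):
    """Return (P(A wins), P(draw), P(B wins)) for independent Poisson scores.

    The scoreline grid is truncated at max_goals per side, which loses a
    negligible amount of probability mass for typical football means.
    """
    win = draw = loss = 0.0
    for i in range(max_goals + 1):
        for j in range(max_goals + 1):
            p = poisson_pmf(i, lam_a) * poisson_pmf(j, lam_b)
            if i > j:
                win += p
            elif i == j:
                draw += p
            else:
                loss += p
    return win, draw, loss

def expected_points(lam_a, lam_b):
    """Expected points for team A under the 3-1-0 scheme."""
    win, draw, _ = outcome_probs(lam_a, lam_b)
    return 3.0 * win + 1.0 * draw

# Hypothetical match: team A created 1.8 xG, team B created 1.1 xG
p_win, p_draw, p_loss = outcome_probs(1.8, 1.1)
```

Computing the probabilities exactly is a useful cross-check on a simulation: the sampled frequencies should converge to these values as the number of draws grows.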
The authors run 10,000 Monte Carlo simulations of the full schedule (38 matches per team), each time drawing new Poisson outcomes for every fixture. The resulting ensemble provides a probability distribution for each team’s final point total, league rank, and specific events such as finishing in the top four or being relegated. Validation against the real 2015/16 table shows strong explanatory power: the coefficient of determination (R²) between simulated and actual values is about 0.68 for points and about 0.62 for ranks. The model correctly identifies the four eventual Champions League qualifiers and the three relegated clubs with a 93% success rate.
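The season-level simulation can be sketched as a loop over fixtures inside a loop over simulated seasons. The example below uses a three-team toy league with made-up λ values rather than the real EPL fixture list, and it breaks ties on points at random, which is an assumption (the actual league uses goal difference). The Poisson sampler is Knuth's multiplication method, used here only to keep the sketch free of third-party dependencies.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw from Poisson(lam) with Knuth's method (fine for small means)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def simulate_title_probs(fixtures, teams, n_sims=10_000, seed=0):
    """Estimate each team's title probability by Monte Carlo.

    fixtures: list of (home, away, lam_home, lam_away) tuples.
    """
    rng = random.Random(seed)
    titles = {t: 0 for t in teams}
    for _ in range(n_sims):
        points = {t: 0 for t in teams}
        for home, away, lam_h, lam_a in fixtures:
            gh, ga = sample_poisson(lam_h, rng), sample_poisson(lam_a, rng)
            if gh > ga:
                points[home] += 3
            elif gh < ga:
                points[away] += 3
            else:
                points[home] += 1
                points[away] += 1
        best = max(points.values())
        # Assumption: ties on points are broken at random, not by goal difference.
        champion = rng.choice([t for t in teams if points[t] == best])
        titles[champion] += 1
    return {t: titles[t] / n_sims for t in teams}

# Toy double round-robin with hypothetical per-match xG values (not real data)
teams = ["A", "B", "C"]
xg = {"A": 2.0, "B": 1.0, "C": 1.0}
fixtures = [(h, a, xg[h], xg[a]) for h in teams for a in teams if h != a]
probs = simulate_title_probs(fixtures, teams, n_sims=2_000)
```

The same per-simulation points table could be used to estimate top-four or relegation probabilities by counting how often a team lands in the relevant rank range.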
When the entire season’s data are used, Leicester City’s historic title win emerges as a statistical outlier. The simulation assigns Leicester a less‑than‑1 % chance of finishing first, confirming that their triumph was highly unlikely under the model’s assumptions. This finding underscores the limitation of any purely statistical approach in capturing rare, “black‑swan” events.
To explore the diagnostic value of xG earlier in the campaign, the authors repeat the simulation using only the first half of the season (19 matches). They re‑estimate attack and defence efficiencies from the half‑season data and then forecast the remaining fixtures. The analysis reveals a robust relationship: teams with higher first‑half xG tend to accrue more points in the second half, even after controlling for their current league position. In this mid‑season scenario, Leicester ranks among the top five teams by xG and its probability of winning the league rises to roughly 2.3 %, still modest but markedly higher than the full‑season estimate.
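One way to read "even after controlling for league position" is as a regression of second-half points on first-half xG with league position as a second predictor. The sketch below fits that two-predictor least-squares model in closed form. The summary does not state the exact regression specification, so the choice of predictors is an assumption, and the input numbers are noise-free synthetic values constructed only to check the algebra, not results from the paper.

```python
from statistics import fmean

def ols_two_predictors(x1, x2, y):
    """Least-squares fit of y = a + b1*x1 + b2*x2; returns (a, b1, b2)."""
    m1, m2, my = fmean(x1), fmean(x2), fmean(y)
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("predictors are collinear")
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2

# Synthetic, noise-free inputs (not real data): six hypothetical teams
xg_diff_h1 = [10.0, 6.0, 3.0, -1.0, -4.0, -8.0]   # first-half xG difference
position_h1 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]      # mid-season league position
points_h2 = [26.0 + 0.6 * x - 0.5 * p for x, p in zip(xg_diff_h1, position_h1)]

intercept, b_xg, b_pos = ols_two_predictors(xg_diff_h1, position_h1, points_h2)
```

A positive `b_xg` with position held fixed is what the text describes: xG carries information about future points beyond what the table already shows.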
The paper discusses several limitations. First, the xG model itself is imperfect; it captures only the static aspects of a shot and ignores goalkeeper skill, tactical nuances, and in‑game adjustments. Second, modeling goals as independent Poisson variables neglects potential over‑dispersion and temporal dependence (e.g., momentum effects). Third, external factors such as injuries, fixture congestion, and psychological dynamics are omitted, which can be decisive in extreme outcomes. Despite these caveats, the authors argue that an xG‑based probabilistic baseline is valuable for two main reasons: (1) it reliably reproduces the overall league structure and identifies likely champions and relegated teams, and (2) it serves as an early‑warning system, flagging teams whose underlying performance metrics diverge from their current points tally.
In conclusion, the study demonstrates that expected‑goals modeling, when embedded in a Bayesian simulation framework, can provide a nuanced probabilistic portrait of a football season. While it cannot predict rare events like Leicester’s title with high confidence, it can highlight when a team’s underlying performance suggests a departure from the status quo. The authors recommend extending the framework by incorporating richer tactical variables, player‑level fatigue measures, and more flexible count‑data distributions (e.g., negative binomial or beta‑binomial mixtures) to improve accuracy and to better capture the stochastic nature of football outcomes.
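To illustrate the suggested move to a more flexible count distribution, the sketch below draws negative binomial goal counts through a Gamma-Poisson mixture and measures the resulting over-dispersion. The mean of 1.4 goals, the dispersion parameter of 3.0, and the function names are hypothetical; this is not the authors' proposed extension, only one standard way to obtain counts whose variance exceeds their mean.

```python
import math
import random
from statistics import fmean, pvariance

def sample_poisson(lam, rng):
    """Draw from Poisson(lam) with Knuth's method (fine for small means)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def sample_overdispersed_goals(mean, dispersion, rng):
    """Negative binomial draw via a Gamma-Poisson mixture.

    The scoring rate is Gamma(shape=dispersion, scale=mean/dispersion), so the
    count has the given mean and variance mean + mean**2 / dispersion.
    """
    rate = rng.gammavariate(dispersion, mean / dispersion)
    return sample_poisson(rate, rng)

rng = random.Random(0)
# Hypothetical parameters: 1.4 goals per match on average, dispersion 3.0
goals = [sample_overdispersed_goals(1.4, 3.0, rng) for _ in range(50_000)]
dispersion_index = pvariance(goals) / fmean(goals)  # equals 1 for a pure Poisson
```

As the dispersion parameter grows, the Gamma rate concentrates on the mean and the draws approach the plain Poisson case, so the Poisson model is recovered as a limit.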