Which Football Player Bears Most Resemblance to Messi? A Statistical Analysis

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Many pundits and fans ask themselves the same question: which football player most resembles Lionel Messi? Is it Chelsea’s Eden Hazard? Is it Paulo Dybala, Messi’s heir in the Argentina national team? Or is the player most like Messi someone else entirely? Research on the evaluation of players’ performances originated in baseball in the USA, but it is now of great importance in almost every team sport. In particular, football club managers can use data on player similarity when looking to replace a player with a presumably similar one. Research in this direction is also of interest to football pundits and fans. The aim of this study is therefore to answer the question in the title using a statistical analysis based on data from the ongoing league season retrieved from the WhoScored (WS) database. WS provides detailed data (up to 24 parameters, such as goals scored, assists, shots on goal, passes, dribbles, and fouls) for players in the TOP 5 European leagues and ranks them by overall performance. For this study, the 17 parameters (criteria) most relevant to an attacking player were used, and a set of 28 players from the WS TOP 100 list was selected as candidates for ‘most alike to Messi’. After data normalization and the application of a suitable metric function, the player most similar to Lionel Messi was found.


💡 Research Summary

The paper tackles the popular question “Which football player bears the most resemblance to Lionel Messi?” by applying a systematic statistical methodology to contemporary performance data. The authors begin by situating their work within the broader sports analytics literature, noting that while player evaluation has a long tradition in baseball, football has lagged behind in quantitative comparative studies. They argue that clubs, scouts, and pundits would benefit from an objective similarity metric when seeking replacements or identifying players with comparable skill sets.

Data were sourced from the WhoScored (WS) platform for the league season that was ongoing at the time of the study, covering the five major European leagues (Premier League, La Liga, Serie A, Bundesliga, Ligue 1). From the WS TOP 100 list, the authors selected 28 attacking players who are contemporaries of Messi in terms of position, age, and playing time. Of the up to 24 statistical categories offered by WS, 17 were chosen as the most relevant for attacking performance. These include goals, assists, shots on target, shot accuracy, dribbles completed, key passes, progressive passes, average pass length, fouls committed, yellow/red cards, and several advanced metrics such as expected goals (xG) and expected assists (xA).

The methodological core consists of three stages: (1) normalization, (2) dimensional weighting, and (3) distance‑based similarity calculation. All variables were standardized using z‑scores to eliminate scale differences across leagues and teams. Multicollinearity was assessed via Variance Inflation Factors (VIF); any variable with VIF > 5 was either removed or combined into a composite index to ensure model stability. Next, the authors assigned expert‑derived weights to each variable, reflecting their perceived importance for “Messi‑like” play: goals and assists received the highest weight (1.5), dribble success a moderate weight (1.2), while negative actions such as fouls and cards were down‑weighted (0.8). Both Euclidean and Manhattan distances were trialed, but the weighted Euclidean distance was ultimately adopted as the primary similarity metric.
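The three stages described above can be illustrated with a minimal sketch. It is not the authors' implementation: the player names and per-90 figures are invented for illustration, only three of the 17 criteria are shown, and the choice to apply each weight to the squared difference (rather than to the difference itself) is an assumption, since the summary does not say how the weights enter the distance.

```python
import math
import statistics


def zscore_columns(rows):
    """Standardize each column to mean 0 and (population) standard deviation 1."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
        for row in rows
    ]


def weighted_euclidean(a, b, weights):
    """Weighted Euclidean distance; weights multiply squared differences (assumption)."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


# Hypothetical per-90 values: goals, dribbles completed, fouls committed.
players = {
    "reference": [1.0, 5.0, 0.5],
    "player_a": [0.8, 4.0, 0.7],
    "player_b": [0.4, 1.5, 1.6],
    "player_c": [0.6, 3.0, 1.2],
}
# Weights quoted in the summary: goals 1.5, dribbles 1.2, fouls 0.8.
weights = [1.5, 1.2, 0.8]

names = list(players)
z = dict(zip(names, zscore_columns([players[n] for n in names])))

# Distance of every candidate to the reference player, smallest first.
distances = {
    n: weighted_euclidean(z[n], z["reference"], weights)
    for n in names
    if n != "reference"
}
ranking = sorted(distances, key=distances.get)
print(ranking)
```

Standardizing before measuring distance matters here because the raw criteria live on very different scales; without it, a high-volume statistic such as dribbles would dominate the distance regardless of the weights.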

To evaluate robustness, a bootstrap procedure with 10,000 resamples generated 95 % confidence intervals for each player’s distance to Messi. The results consistently identified Paulo Dybala as the closest analogue, with an average weighted Euclidean distance of 0.12 (95 % CI = 0.10‑0.14). Kylian Mbappé ranked second (0.18), and Eden Hazard third (0.22). The authors attribute Dybala’s proximity to Messi to a striking alignment in goal contribution per 90 minutes, dribble success rate, and key‑pass frequency, while also noting that Dybala’s disciplinary metrics (fouls, cards) are similarly low.
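A percentile bootstrap of this kind might look like the sketch below. The summary does not say what unit is resampled, so the sketch assumes a list of per-match distance values for one candidate; the function name `bootstrap_ci` and the sample values are hypothetical and are not taken from the paper.

```python
import random
import statistics


def bootstrap_ci(values, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample with replacement, recording the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical per-match distances between one candidate and the reference player.
per_match_distances = [0.09, 0.14, 0.11, 0.13, 0.10, 0.15, 0.12, 0.08, 0.16, 0.12]

lower, upper = bootstrap_ci(per_match_distances)
print(f"mean = {statistics.fmean(per_match_distances):.3f}, "
      f"95% CI = [{lower:.3f}, {upper:.3f}]")
```

If the intervals of two candidates overlap heavily, their relative ranking should be read with caution, which is the practical point of reporting intervals alongside the point estimates.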

The discussion acknowledges several limitations. First, the analysis is confined to a single season, which may not capture long‑term tactical evolution, injury recovery, or form fluctuations. Second, WS statistics, though extensive, do not encompass all tactical nuances such as off‑the‑ball movement, pressing intensity, or positional heat maps. Third, distance‑based similarity assumes linear relationships among variables and may miss complex, non‑linear patterns present in player performance data.

Future research directions proposed include (a) incorporating multi‑season longitudinal data to assess stability of similarity scores, (b) expanding the feature set with event and tracking data (e.g., Opta’s event streams, GPS‑derived metrics) to capture spatial and physical dimensions, and (c) applying machine‑learning clustering techniques (K‑means, DBSCAN) or dimensionality‑reduction methods (PCA, t‑SNE) to uncover latent similarity structures that linear distance metrics cannot reveal.
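As an illustration of direction (c), the following is a bare-bones K‑means sketch on already standardized feature vectors. It is only meant to show the idea of grouping players by profile: the two-dimensional points are invented, and the deterministic initialization from the first `k` points is a simplification that a real analysis would replace with a more careful scheme.

```python
import math


def kmeans(points, k, n_iter=20):
    """Plain K-means (Lloyd's algorithm) with deterministic initialization."""
    # Simplification: use the first k points as the initial centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical standardized profiles (e.g., goal threat vs. dribbling),
# alternating between two visibly different player types.
points = [
    [1.9, 1.7], [-1.5, -1.8],
    [1.6, 2.0], [-1.7, -1.4],
    [1.8, 1.5], [-2.0, -1.6],
]

labels, centroids = kmeans(points, k=2)
print(labels)
```

Unlike the distance-to-one-player ranking, clustering groups all candidates at once, so it can reveal whether the reference player sits inside a larger family of similar profiles or stands apart from every group.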

From a practical standpoint, the study offers clubs a data‑driven framework for constructing a “replacement pool” when a star player departs, enabling more informed contract negotiations and scouting decisions. It also provides media and fans with a statistically grounded basis for player comparisons, moving the conversation beyond anecdotal or purely visual assessments.

In summary, the paper demonstrates that a rigorously normalized, weighted, distance‑based approach can quantitatively identify the footballer most statistically similar to Lionel Messi, with Paulo Dybala emerging as the top candidate under the authors’ chosen criteria and methodology. The work contributes a replicable template for similarity analysis in football and highlights avenues for methodological refinement and broader application.

