Modeling Matches as Language: A Generative Transformer Approach for Counterfactual Player Valuation in Football



Miru Hong¹, Minho Lee², Geonhee Jo¹, Hyeokje Jo¹, Pascal Bauer²,⁴, and Sang-Ki Ko¹

¹ Department of Artificial Intelligence, University of Seoul, Seoul, Republic of Korea {mirunoyume,geonhee,brandon56,sangkiko}@uos.ac.kr
² Institute for Sports and Preventive Medicine, Saarland University, Saarbrücken, Germany minho.lee@uni-saarland.de
³ Chair for Sports Analytics, Saarland University; Deutscher Fussball-Bund (DFB), Germany pascal.bauer@uni-saarland.de

Abstract. Evaluating football player transfers is challenging because player actions depend strongly on tactical systems, teammates, and match context. Despite this complexity, recruitment decisions often rely on static statistics and subjective expert judgment, which do not fully account for these contextual factors. This limitation stems largely from the absence of counterfactual simulation mechanisms capable of predicting outcomes in hypothetical scenarios. To address these challenges, we propose ScoutGPT, a generative model that treats football match events as sequential tokens within a language modeling framework. Utilizing a NanoGPT-based Transformer architecture trained on next-token prediction, ScoutGPT learns the dynamics of match event sequences to simulate event sequences under hypothetical lineups, demonstrating superior predictive performance compared to existing baseline models. Leveraging this capability, the model employs Monte Carlo sampling to enable counterfactual simulation, allowing for the assessment of unobserved scenarios. Experiments on K League data show that simulated player transfers lead to measurable changes in offensive progression and goal probabilities, indicating that ScoutGPT captures player-specific impact beyond traditional static metrics.
Keywords: Sports Event Sequence Modeling · Counterfactual Transfer Simulation · Player Valuation · Autoregressive Transformer

1 Introduction

Evaluating individual contribution is challenging in complex multi-agent environments, where behavior depends not only on an agent's own ability but also on interactions with surrounding agents and context. Football provides a particularly demanding instance of this problem: player actions are shaped by tactical roles, teammates, opponents, and match state. As a result, player transfer evaluation cannot be reduced to a like-for-like replacement problem, since moving a player to a new team alters the tactical configuration and reshapes interaction patterns on the pitch. Transfer evaluation therefore requires estimating how a player will behave under this distribution shift, rather than extrapolating directly from past performance alone.

Existing methods only partially address this problem. Traditional valuation frameworks such as Expected Threat (xT) [21] and Valuing Actions by Estimating Probabilities (VAEP) [7,8] quantify the value of observed events, but they do not model how action sequences would evolve under a new tactical context. Projection systems in other sports typically operate at the level of aggregate season outcomes and therefore do not capture the micro-interactions that shape football actions on the pitch. Recent generative approaches in sports analytics often focus on continuous trajectories, which represent spatial movement but not the tactical semantics of discrete football events [5,24,6]. Prior work has also studied event-based sequence modeling for next-event prediction in football [20,26,15], but these approaches are generally designed to predict observed continuations rather than generate event sequences under hypothetical transfer scenarios.
Another line of work estimates On-Ball Value (OBV) by predicting future tokens in an event sequence, enabling counterfactual continuation of play [12]. However, these approaches generate only short fragments of a sequence, limiting value estimation to that small segment of play. In contrast, evaluating transfer scenarios requires generating full event sequences under a new context, enabling value computation over the entire simulated possession.

To address this problem, we introduce ScoutGPT, an autoregressive generative framework for football event streams related to Large Event Models (LEMs) [16]. ScoutGPT treats a match as a structured sequence in which each event is decomposed into discrete attributes through tokenization and predicted sequentially via next-token prediction, conditioned on player identity and match context. Alongside next-action prediction, the model estimates scoring and conceding probabilities at each step, aligning generated sequences with match value (VAEP) and supporting event-level simulation of hypothetical player transfers under new tactical environments [2,9,15].

To summarize, our main contributions are as follows:

– Structured Event Modeling for Context-Aware Simulation: We introduce a fine-grained tokenization scheme that decomposes football events into semantic components (e.g., actor, location, and action type). This structure enables ScoutGPT to capture dependencies across event attributes and model football event sequences at a finer granularity.

– Value-Aware Generative Modeling: We propose a multi-task learning objective that combines next-token prediction with explicit scoring and conceding probability estimation. This design encourages the model to reflect both event likelihood and match value, and improves predictive performance over non-value-aware variants.
– Counterfactual Simulation for Player Recruitment: We show that ScoutGPT can simulate how a player's on-ball contribution profile shifts in a new tactical environment, supporting data-driven analysis of transfer fit.

[Figure 1 appears here.] Fig. 1. Overview of the ScoutGPT framework. Our nanoGPT-based Transformer model autoregressively predicts event tokens, enabling counterfactual 'what-if' simulations. For instance, replacing Kevin De Bruyne with Scott McTominay could alter actions (e.g., pass/shot) or modify the same action with a different location, outcome, or VAEP.

2 Related Work

Our work sits at the intersection of three lines of research: data-driven player valuation, generative modeling of sports event streams, and counterfactual simulation for player transfers.

Data-Driven Player Valuation. Action-value frameworks have become the standard for data-driven player valuation.
VAEP quantifies player contribution by aggregating short-horizon changes in scoring and conceding probabilities across all on-ball actions [7], while EPV decomposes instantaneous possession value into interpretable subcomponents [11]. PlayeRank extends this further by constructing multi-dimensional, role-aware player ratings from large-scale event logs [18]. Collectively, these methods provide strong discriminative estimators for observed behavior. However, they evaluate actions that have already occurred and are not designed to generate counterfactual event sequences under hypothetical team configurations, a requirement that arises when assessing transfer fit.

Generative Modeling of Sports Data. Seq2Event [20] and Large Event Models (LEMs) [16] frame football events as structured sequential prediction problems, decomposing each event into multiple attributes and supporting match continuation rollouts from a given game state. NMSTPP [25] and related neural point process models [10,29] extend event-sequence modeling to continuous-time streams with explicit timing and mark distributions. Despite strong short-horizon predictive accuracy, these approaches optimize primarily for sequence likelihood and do not incorporate goal-oriented supervision. Moreover, entity-conditioning for player substitution is either absent or indirect, making it difficult to hold the surrounding context fixed while replacing a specific player, a requirement for counterfactual transfer simulation.

Sequence Modeling for Event Streams. Transformer-based architectures [23,4] have been applied to sports event streams by treating matches as sequences of discrete tokens to be predicted autoregressively [1,3,17,15]. These models capture complex long-range dependencies across event sequences more effectively than recurrent alternatives.
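To make the VAEP-style decomposition above concrete, the short sketch below (our own hedged illustration, not code from any cited framework) values a single on-ball action as the gain in scoring probability plus the drop in conceding probability over a short horizon:

```python
# Hedged sketch of a VAEP-style action value: the change in P(score) plus
# the change in P(not concede) across one on-ball action. The probabilities
# here are illustrative inputs, not outputs of a trained model.
def vaep_value(p_score_before, p_score_after, p_concede_before, p_concede_after):
    """Value of one action under a VAEP-style decomposition."""
    offensive = p_score_after - p_score_before       # did the action raise P(score)?
    defensive = p_concede_before - p_concede_after   # did it lower P(concede)?
    return offensive + defensive

# Toy example: a forward pass raises P(score) from 0.02 to 0.05 while
# leaving P(concede) unchanged at 0.01, giving a value of about 0.03.
print(vaep_value(0.02, 0.05, 0.01, 0.01))
```

A player's rating is then the aggregate of these per-action values over all of their on-ball actions, which is why such estimators describe observed behavior only.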
Standard next-token objectives, however, prioritize frequent actions and do not account for the tactical value of decisions or their impact on match outcomes. In addition, unconstrained generation can produce logically inconsistent event transitions over longer horizons. ScoutGPT addresses both limitations by pairing the autoregressive objective with explicit value supervision and VERSA-based constraint masking [13].

Counterfactual Simulation in Sports. Macro-level transfer forecasting, including baseball projection systems (ZiPS, PECOTA) and soccer ability-curve regression [2], predicts aggregate season statistics from historical data and age curves, but operates at a coarse granularity that cannot capture event-level tactical dynamics. Graph-based methods represent players as nodes in a relational network to recommend positionally similar replacements [27], but do not model how a player's behavior would change in a new team context. At the micro level, hierarchical Bayesian xG estimation [14] and causal player evaluation frameworks [22] isolate the counterfactual impact of individual actions, yet they cannot generate the sequential tactical events needed to assess a full transfer scenario. TacEleven [28] leverages language models to explore attacking tactics but focuses on fragmented tactical paths and does not account for the systemic behavioral distribution shift that arises when a player moves to a new team. EventGPT [12] applies generative language modeling to football event sequences, but its generation is limited to short fragments of play, requiring the remaining value to be approximated via residual OBV instead of being computed from fully simulated sequences.

3 Methodology

This section describes ScoutGPT, including VERSA-based data verification [13], structured tokenization, and a value-aware multi-task objective.
3.1 Data Representation and Verification

Reliable generative modeling requires training data that satisfies football's logical and physical constraints. Raw event streams often contain inconsistencies such as missing events or temporal ordering errors, so we preprocess all data with VERSA. VERSA uses a formal state-transition model to enforce validity rules and automatically correct anomalies (e.g., inserting missing Pass Received events or reordering physically impossible sequences). This preprocessing yields logically consistent training sequences and prevents the model from internalizing annotation errors as valid tactical behaviors.

3.2 Problem Formulation

We represent a football match as a collection of discrete episodes, M = {E_1, E_2, ..., E_K}. Each episode E_k is a coherent phase of play (e.g., a possession chain starting from a recovery or set-piece), consisting of a global context C_k and an event sequence E_k = {e_1, e_2, ..., e_T}. In the raw data, each event e_t^raw ∈ E_k is recorded as a 12-dimensional tuple, including explicit labels for goal occurrences:

e_t^raw = (h_t, pos_t, p_t, a_t, x_t^start, y_t^start, x_t^end, y_t^end, Δt_t, o_t, gs_t, gc_t),

where gs_t, gc_t ∈ {0, 1} indicate whether a goal was scored or conceded at step t. Crucially, to prevent label leakage during the autoregressive generation process, we remove gs_t and gc_t from the input sequence. Thus, the model observes a 10-dimensional input tuple:

e_t = (h_t, pos_t, p_t, a_t, x_t^start, y_t^start, x_t^end, y_t^end, Δt_t, o_t).

Our objective is to model the joint probability P(e_t, gs_t, gc_t | C_k, e_{<t}). For long episodes (T > 100), we apply a sliding window with a fixed stride (e.g., 50 events), creating overlapping chunks while keeping the context window fixed.
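The tuple decomposition and sliding-window chunking can be sketched as follows. This is a minimal illustration under our own assumptions; the field names and the window/stride values are taken from the text as examples, and the token format is invented for readability:

```python
# Hedged sketch: flattening a 10-field event tuple into per-attribute tokens,
# and chunking long episodes with a fixed-stride sliding window. The
# "name=value" token format is an illustrative assumption.
def tokenize_event(event):
    """Flatten one event tuple into discrete tokens, one per attribute."""
    fields = ("team", "pos", "player", "action",
              "x_start", "y_start", "x_end", "y_end", "dt", "outcome")
    return [f"{name}={event[name]}" for name in fields]

def sliding_windows(tokens, window=100, stride=50):
    """Overlapping chunks for episodes longer than the context window."""
    if len(tokens) <= window:
        return [tokens]
    starts = list(range(0, len(tokens) - window, stride))
    starts.append(len(tokens) - window)  # always cover the episode tail
    return [tokens[s:s + window] for s in starts]

event = {"team": "H", "pos": "CM", "player": 17, "action": "pass",
         "x_start": 34, "y_start": 50, "x_end": 60, "y_end": 48,
         "dt": 2, "outcome": "success"}
print(tokenize_event(event)[:3])  # ['team=H', 'pos=CM', 'player=17']
print(len(sliding_windows(list(range(130)))))  # 2 overlapping chunks
```

Appending the tail window ensures the final events of a long episode are never dropped, at the cost of extra overlap between the last two chunks.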
Explicit position tokens are masked from the input to encourage the model to learn player representations from broader event context rather than direct position labels; we verify this effect in the player-embedding analysis.

3.4 ScoutGPT Architecture

We utilize the nanoGPT architecture⁴, an efficient implementation of the GPT-2 decoder-only Transformer [19].

Backbone. Given the input sequence S, we map tokens to dense vectors using a learned token embedding matrix W_wte and add learned absolute positional embeddings W_wpe. The model employs a stack of Pre-LayerNorm Transformer blocks. Let x^(l) denote the input to the l-th Transformer block. The block computes:

x̃^(l) = x^(l) + MSA(LN(x^(l)))
x^(l+1) = x̃^(l) + MLP(LN(x̃^(l))),

where MSA is Causal Multi-Head Self-Attention and MLP is a feed-forward network with GELU activation.

Auxiliary Heads for Value Estimation. To model action value, we attach two auxiliary classification heads (Head_GS and Head_GC) to the model's final hidden state h^(L):

logit_t^GS = Head_GS(h_{t,outcome}^(L))  and  logit_t^GC = Head_GC(h_{t,outcome}^(L)).

Unlike the language modeling head, which predicts every next token, these auxiliary heads are activated only at Outcome-token indices (o_t), i.e., the last token of each event block. This lets the model estimate immediate scoring and conceding probabilities after each action outcome.

⁴ https://github.com/karpathy/nanoGPT
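The Pre-LayerNorm residual wiring above can be sketched numerically. The following is a toy illustration, not the trained model: the attention uses a single head with identity projections, and the widths and random weights are our own assumptions; only the residual/normalization structure matches the equations:

```python
# Hedged sketch of a Pre-LN Transformer block:
#   x_tilde = x + MSA(LN(x));  x_next = x_tilde + MLP(LN(x_tilde))
# Single-head attention with identity q/k/v projections stands in for MSA.
import numpy as np

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def causal_self_attention(x):
    """Toy single-head causal attention (the real MSA has learned projections)."""
    T, d = x.shape
    scores = x @ x.T / np.sqrt(d)
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)        # no attending to future events
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x

def gelu(x):
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x ** 3)))

def pre_ln_block(x, w1, w2):
    x = x + causal_self_attention(layer_norm(x))    # attention sub-layer
    x = x + gelu(layer_norm(x) @ w1) @ w2           # GELU feed-forward sub-layer
    return x

rng = np.random.default_rng(0)
T, d = 6, 8
x = rng.normal(size=(T, d))
w1 = rng.normal(size=(d, 4 * d)) * 0.1              # 4x expansion, as in GPT-2
w2 = rng.normal(size=(4 * d, d)) * 0.1
out = pre_ln_block(x, w1, w2)
print(out.shape)  # (6, 8)
```

Because every sub-layer is either per-position or causally masked, position t of the output depends only on positions up to t, which is what makes autoregressive event generation possible.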
We apply a masking strategy to ignore padding tokens and, where applicable, mask specific fields (e.g., player IDs) to prevent overfitting or to focus learning on tactical dynamics. In particular, player-ID prediction is excluded because, during inference, player identity is injected based on positional assignment rather than generated through unconstrained autoregressive decoding. This keeps the training objective consistent with the generation procedure:

L_gen = − Σ_i log P(s_{i+1} | s_{≤i}).

Goal-Oriented Auxiliary Loss (L_aux). We compute auxiliary cross-entropy (CE) losses for Goal Scored (GS) and Goal Conceded (GC) predictions at outcome-token positions. For each outcome position, the model predicts whether the current action leads to a goal scored or conceded event:

L_aux = Σ_{t ∈ T_out} [ CE(ŷ_t^GS, y_t^GS) + CE(ŷ_t^GC, y_t^GC) ],

where T_out is the set of indices corresponding to outcome tokens, and y_t^GS, y_t^GC are the ground-truth labels retrieved from the raw data.

Total Loss. The final objective is a weighted sum: L_total = L_gen + L_aux.

3.6 Inference with Structural Constraints

Generating realistic football sequences requires strict game-rule and logical consistency. Standard sampling can produce syntactically valid but physically invalid sequences, so we use State-Dependent Logit Masking with a spatial heuristic for agent resolution.

Hierarchical Decoding and State-Dependent Masking. The model generates tokens in the fixed hierarchical order defined in Section 3.3. At each step t, we apply a validity mask M_t to the output logits based on the partial state s_{<t}.

The context block C_k is structured as:

C_k = ( t^(H) p_1^(H) u_1^(H) ··· p_11^(H) u_11^(H) | t^(A) p_1^(A) u_1^(A) ··· p_11^(A) u_11^(A) | φ(period, minute, score, cards) )    (1)

Table 8. Structure of the context block used in the input sequence.
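State-dependent logit masking can be sketched as follows. This is a hedged toy example: the vocabulary, logits, and the validity rule are invented for illustration and are not VERSA's actual rule set; only the mechanism, forcing invalid tokens to probability zero before sampling, reflects the text:

```python
# Hedged sketch of state-dependent logit masking: invalid tokens get -inf
# logits, so the renormalized softmax assigns them exactly zero probability.
import math

def masked_distribution(logits, valid):
    """Softmax over logits with invalid entries forced to probability zero."""
    masked = [l if ok else -math.inf for l, ok in zip(logits, valid)]
    m = max(masked)
    exps = [math.exp(l - m) for l in masked]
    z = sum(exps)
    return [e / z for e in exps]

# Invented rule for the example: after the home team loses possession,
# the next on-ball action cannot belong to the home team.
vocab = ["home_pass", "home_shot", "away_pass", "away_clearance"]
logits = [2.0, 1.0, 0.5, 0.1]           # raw model preferences
valid = [False, False, True, True]       # derived from the partial state s_{<t}
probs = masked_distribution(logits, valid)
print([round(p, 3) for p in probs])
```

Note that the two home-team tokens carried the highest raw logits; without the mask the sampler would frequently emit a physically impossible continuation, which is exactly the failure mode this constraint prevents.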
The block encodes team identity, on-pitch lineup, and compact match-state information.

Symbol          Description
t^(H), t^(A)    Home and away team tokens
p_i             Position token of the i-th player
u_i             Player token of the i-th player
φ(·)            Match-state summary function (period, minute, home goals, away goals, yellow/red cards)

A.2 Multi-Position Players

Table 9. Positional distribution of representative multi-role players. Minutes indicate total playing time in each position.

Player            Played positions (minutes)
Jinsub Park       CB (4,383), CDM (2,081), CM (1,849)
Masatoshi Ishida  CM (2,282), CAM (1,693), RW (287), CF (237), RF (141), LM (90), LW (77), LF (67)
Sangho Na         LW (3,670), RW (1,508), LM (1,073), RM (692), CF (451), LF (405), RF (90)
Seungwon Jeong    RWB (2,355), CM (2,021), RW (450), RM (270), RB (199), CAM (90)
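Assembling the context block of Table 8 can be sketched as below. The token spellings and the match-state encoding are our own illustrative assumptions; the structure (team token, eleven position/player pairs per side, then a compact match-state summary) follows the table:

```python
# Hedged sketch of building the context block C_k from Table 8:
# [team_H, (pos, player) x 11, team_A, (pos, player) x 11, match state].
# Token string formats are invented for readability.
def build_context(home, away, lineup_h, lineup_a, period, minute, score, cards):
    """lineup_h / lineup_a: lists of eleven (position, player_id) pairs."""
    assert len(lineup_h) == 11 and len(lineup_a) == 11
    block = [f"team={home}"]
    for pos, pid in lineup_h:
        block += [f"pos={pos}", f"player={pid}"]
    block.append(f"team={away}")
    for pos, pid in lineup_a:
        block += [f"pos={pos}", f"player={pid}"]
    # phi(period, minute, score, cards): compact match-state summary token
    block.append(f"state=P{period}:M{minute}:{score[0]}-{score[1]}:C{cards}")
    return block

ctx = build_context("H", "A",
                    [("GK", 1)] + [("CB", i) for i in range(2, 12)],
                    [("GK", 21)] + [("CM", i) for i in range(22, 32)],
                    period=1, minute=30, score=(1, 0), cards=0)
print(len(ctx))  # 2 team tokens + 44 lineup tokens + 1 state token = 47
```

Counterfactual lineups are then expressed by swapping a single (position, player) pair in this block while holding every other token fixed, which is what lets the simulation isolate one player's effect.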
