Modeling Matches as Language: A Generative Transformer Approach for Counterfactual Player Valuation in Football



Miru Hong¹, Minho Lee², Geonhee Jo¹, Hyeokje Jo¹, Pascal Bauer²,⁴, and Sang-Ki Ko¹

¹ Department of Artificial Intelligence, University of Seoul, Seoul, Republic of Korea {mirunoyume,geonhee,brandon56,sangkiko}@uos.ac.kr
² Institute for Sports and Preventive Medicine, Saarland University, Saarbrücken, Germany minho.lee@uni-saarland.de
³ Chair for Sports Analytics, Saarland University; Deutscher Fussball-Bund (DFB), Germany pascal.bauer@uni-saarland.de

Abstract. Evaluating football player transfers is challenging because player actions depend strongly on tactical systems, teammates, and match context. Despite this complexity, recruitment decisions often rely on static statistics and subjective expert judgment, which do not fully account for these contextual factors. This limitation stems largely from the absence of counterfactual simulation mechanisms capable of predicting outcomes in hypothetical scenarios. To address these challenges, we propose ScoutGPT, a generative model that treats football match events as sequential tokens within a language modeling framework. Utilizing a NanoGPT-based Transformer architecture trained on next-token prediction, ScoutGPT learns the dynamics of match event sequences to simulate event sequences under hypothetical lineups, demonstrating superior predictive performance compared to existing baseline models. Leveraging this capability, the model employs Monte Carlo sampling to enable counterfactual simulation, allowing for the assessment of unobserved scenarios. Experiments on K League data show that simulated player transfers lead to measurable changes in offensive progression and goal probabilities, indicating that ScoutGPT captures player-specific impact beyond traditional static metrics.
Keywords: Sports Event Sequence Modeling · Counterfactual Transfer Simulation · Player Valuation · Autoregressive Transformer

1 Introduction

Evaluating individual contribution is challenging in complex multi-agent environments, where behavior depends not only on an agent's own ability but also on interactions with surrounding agents and context. Football provides a particularly demanding instance of this problem: player actions are shaped by tactical roles, teammates, opponents, and match state. As a result, player transfer evaluation cannot be reduced to a like-for-like replacement problem, since moving a player to a new team alters the tactical configuration and reshapes interaction patterns on the pitch. Transfer evaluation therefore requires estimating how a player will behave under this distribution shift, rather than extrapolating directly from past performance alone.

Existing methods only partially address this problem. Traditional valuation frameworks such as Expected Threat (xT) [21] and Valuing Actions by Estimating Probabilities (VAEP) [7,8] quantify the value of observed events, but they do not model how action sequences would evolve under a new tactical context. Projection systems in other sports typically operate at the level of aggregate season outcomes and therefore do not capture the micro-interactions that shape football actions on the pitch. Recent generative approaches in sports analytics often focus on continuous trajectories, which represent spatial movement but not the tactical semantics of discrete football events [5,24,6]. Prior work has also studied event-based sequence modeling for next-event prediction in football [20,26,15], but these approaches are generally designed to predict observed continuations rather than generate event sequences under hypothetical transfer scenarios.
Another line of work estimates On-Ball Value (OBV) by predicting future tokens in an event sequence, enabling counterfactual continuation of play [12]. However, these approaches generate only short fragments of a sequence, limiting value estimation to that small segment of play. In contrast, evaluating transfer scenarios requires generating full event sequences under a new context, enabling value computation over the entire simulated possession.

To address this problem, we introduce ScoutGPT, an autoregressive generative framework for football event streams related to Large Event Models (LEMs) [16]. ScoutGPT treats a match as a structured sequence in which each event is decomposed into discrete attributes through tokenization and predicted sequentially via next-token prediction, conditioned on player identity and match context. Alongside next-action prediction, the model estimates scoring and conceding probabilities at each step, aligning generated sequences with match value (VAEP) and supporting event-level simulation of hypothetical player transfers under new tactical environments [2,9,15].

To summarize, our main contributions are as follows:

– Structured Event Modeling for Context-Aware Simulation: We introduce a fine-grained tokenization scheme that decomposes football events into semantic components (e.g., actor, location, and action type). This structure enables ScoutGPT to capture dependencies across event attributes and model football event sequences at a finer granularity.

– Value-Aware Generative Modeling: We propose a multi-task learning objective that combines next-token prediction with explicit scoring and conceding probability estimation. This design encourages the model to reflect both event likelihood and match value, and improves predictive performance over non-value-aware variants.
– Counterfactual Simulation for Player Recruitment: We show that ScoutGPT can simulate how a player's on-ball contribution profile shifts in a new tactical environment, supporting data-driven analysis of transfer fit.

[Figure 1 appears here.] Fig. 1. Overview of the ScoutGPT framework. Our nanoGPT-based Transformer model autoregressively predicts event tokens, enabling counterfactual 'what-if' simulations. For instance, replacing Kevin De Bruyne with Scott McTominay could alter actions (e.g., pass/shot) or modify the same action with a different location, outcome, or VAEP.

2 Related Work

Our work sits at the intersection of three lines of research: data-driven player valuation, generative modeling of sports event streams, and counterfactual simulation for player transfers.

Data-Driven Player Valuation. Action-value frameworks have become the standard for data-driven player valuation.
VAEP quantifies player contribution by aggregating short-horizon changes in scoring and conceding probabilities across all on-ball actions [7], while EPV decomposes instantaneous possession value into interpretable subcomponents [11]. PlayeRank extends this further by constructing multi-dimensional, role-aware player ratings from large-scale event logs [18]. Collectively, these methods provide strong discriminative estimators for observed behavior. However, they evaluate actions that have already occurred and are not designed to generate counterfactual event sequences under hypothetical team configurations, a requirement that arises when assessing transfer fit.

Generative Modeling of Sports Data. Seq2Event [20] and Large Event Models (LEMs) [16] frame football events as structured sequential prediction problems, decomposing each event into multiple attributes and supporting match continuation rollouts from a given game state. NMSTPP [25] and related neural point process models [10,29] extend event-sequence modeling to continuous-time streams with explicit timing and mark distributions. Despite strong short-horizon predictive accuracy, these approaches optimize primarily for sequence likelihood and do not incorporate goal-oriented supervision. Moreover, entity-conditioning for player substitution is either absent or indirect, making it difficult to hold the surrounding context fixed while replacing a specific player, a requirement for counterfactual transfer simulation.

Sequence Modeling for Event Streams. Transformer-based architectures [23,4] have been applied to sports event streams by treating matches as sequences of discrete tokens to be predicted autoregressively [1,3,17,15]. These models capture complex long-range dependencies across event sequences more effectively than recurrent alternatives.
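To make the VAEP-style decomposition above concrete, the short sketch below (our own hedged illustration, not code from any cited framework) values a single on-ball action as the gain in scoring probability plus the drop in conceding probability over a short horizon:

```python
# Hedged sketch of a VAEP-style action value: the change in P(score) plus
# the change in P(not concede) across one on-ball action. The probabilities
# here are illustrative inputs, not outputs of a trained model.
def vaep_value(p_score_before, p_score_after, p_concede_before, p_concede_after):
    """Value of one action under a VAEP-style decomposition."""
    offensive = p_score_after - p_score_before       # did the action raise P(score)?
    defensive = p_concede_before - p_concede_after   # did it lower P(concede)?
    return offensive + defensive

# Toy example: a forward pass raises P(score) from 0.02 to 0.05 while
# leaving P(concede) unchanged at 0.01, giving a value of about 0.03.
print(vaep_value(0.02, 0.05, 0.01, 0.01))
```

A player's rating is then the aggregate of these per-action values over all of their on-ball actions, which is why such estimators describe observed behavior only.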
Standard next-token objectives, however, prioritize frequent actions and do not account for the tactical value of decisions or their impact on match outcomes. In addition, unconstrained generation can produce logically inconsistent event transitions over longer horizons. ScoutGPT addresses both limitations by pairing the autoregressive objective with explicit value supervision and VERSA-based constraint masking [13].

Counterfactual Simulation in Sports. Macro-level transfer forecasting, including baseball projection systems (ZiPS, PECOTA) and soccer ability-curve regression [2], predicts aggregate season statistics from historical data and age curves, but operates at a coarse granularity that cannot capture event-level tactical dynamics. Graph-based methods represent players as nodes in a relational network to recommend positionally similar replacements [27], but do not model how a player's behavior would change in a new team context. At the micro level, hierarchical Bayesian xG estimation [14] and causal player evaluation frameworks [22] isolate the counterfactual impact of individual actions, yet they cannot generate the sequential tactical events needed to assess a full transfer scenario. TacEleven [28] leverages language models to explore attacking tactics but focuses on fragmented tactical paths and does not account for the systemic behavioral distribution shift that arises when a player moves to a new team. EventGPT [12] applies generative language modeling to football event sequences, but its generation is limited to short fragments of play, requiring the remaining value to be approximated via residual OBV instead of being computed from fully simulated sequences.

3 Methodology

This section describes ScoutGPT, including VERSA-based data verification [13], structured tokenization, and a value-aware multi-task objective.
3.1 Data Representation and Verification

Reliable generative modeling requires training data that satisfies football's logical and physical constraints. Raw event streams often contain inconsistencies such as missing events or temporal ordering errors, so we preprocess all data with VERSA. VERSA uses a formal state-transition model to enforce validity rules and automatically correct anomalies (e.g., inserting missing Pass Received events or reordering physically impossible sequences). This preprocessing yields logically consistent training sequences and prevents the model from internalizing annotation errors as valid tactical behaviors.

3.2 Problem Formulation

We represent a football match as a collection of discrete episodes, M = {E_1, E_2, ..., E_K}. Each episode E_k is a coherent phase of play (e.g., a possession chain starting from a recovery or set-piece), consisting of a global context C_k and an event sequence E_k = {e_1, e_2, ..., e_T}. In the raw data, each event e_t^raw ∈ E_k is recorded as a 12-dimensional tuple, including explicit labels for goal occurrences:

e_t^raw = (h_t, pos_t, p_t, a_t, x_t^start, y_t^start, x_t^end, y_t^end, Δt_t, o_t, gs_t, gc_t),

where gs_t, gc_t ∈ {0, 1} indicate whether a goal was scored or conceded at step t. Crucially, to prevent label leakage during the autoregressive generation process, we remove gs_t and gc_t from the input sequence. Thus, the model observes a 10-dimensional input tuple:

e_t = (h_t, pos_t, p_t, a_t, x_t^start, y_t^start, x_t^end, y_t^end, Δt_t, o_t).

Our objective is to model the joint probability P(e_t, gs_t, gc_t | C_k, e_{<t}). For long episodes (T > 100), we apply a sliding window with a fixed stride (e.g., 50 events), creating overlapping chunks while keeping the context window fixed.
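The tuple decomposition and sliding-window chunking can be sketched as follows. This is a minimal illustration under our own assumptions; the field names and the window/stride values are taken from the text as examples, and the token format is invented for readability:

```python
# Hedged sketch: flattening a 10-field event tuple into per-attribute tokens,
# and chunking long episodes with a fixed-stride sliding window. The
# "name=value" token format is an illustrative assumption.
def tokenize_event(event):
    """Flatten one event tuple into discrete tokens, one per attribute."""
    fields = ("team", "pos", "player", "action",
              "x_start", "y_start", "x_end", "y_end", "dt", "outcome")
    return [f"{name}={event[name]}" for name in fields]

def sliding_windows(tokens, window=100, stride=50):
    """Overlapping chunks for episodes longer than the context window."""
    if len(tokens) <= window:
        return [tokens]
    starts = list(range(0, len(tokens) - window, stride))
    starts.append(len(tokens) - window)  # always cover the episode tail
    return [tokens[s:s + window] for s in starts]

event = {"team": "H", "pos": "CM", "player": 17, "action": "pass",
         "x_start": 34, "y_start": 50, "x_end": 60, "y_end": 48,
         "dt": 2, "outcome": "success"}
print(tokenize_event(event)[:3])  # ['team=H', 'pos=CM', 'player=17']
print(len(sliding_windows(list(range(130)))))  # 2 overlapping chunks
```

Appending the tail window ensures the final events of a long episode are never dropped, at the cost of extra overlap between the last two chunks.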
Explicit position tokens are masked from the input to encourage the model to learn player representations from broader event context rather than direct position labels; we verify this effect in the player-embedding analysis.

3.4 ScoutGPT Architecture

We utilize the nanoGPT architecture⁴, an efficient implementation of the GPT-2 decoder-only Transformer [19].

Backbone. Given the input sequence S, we map tokens to dense vectors using a learned token embedding matrix W_wte and add learned absolute positional embeddings W_wpe. The model employs a stack of Pre-LayerNorm Transformer blocks. Let x^(l) denote the input to the l-th Transformer block. The block computes:

x̃^(l) = x^(l) + MSA(LN(x^(l)))
x^(l+1) = x̃^(l) + MLP(LN(x̃^(l))),

where MSA is Causal Multi-Head Self-Attention and MLP is a feed-forward network with GELU activation.

Auxiliary Heads for Value Estimation. To model action value, we attach two auxiliary classification heads (Head_GS and Head_GC) to the model's final hidden state h^(L):

logit_t^GS = Head_GS(h_{t,outcome}^(L))  and  logit_t^GC = Head_GC(h_{t,outcome}^(L)).

Unlike the language modeling head, which predicts every next token, these auxiliary heads are activated only at Outcome-token indices (o_t), i.e., the last token of each event block. This lets the model estimate immediate scoring and conceding probabilities after each action outcome.

⁴ https://github.com/karpathy/nanoGPT
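The Pre-LayerNorm residual wiring above can be sketched numerically. The following is a toy illustration, not the trained model: the attention uses a single head with identity projections, and the widths and random weights are our own assumptions; only the residual/normalization structure matches the equations:

```python
# Hedged sketch of a Pre-LN Transformer block:
#   x_tilde = x + MSA(LN(x));  x_next = x_tilde + MLP(LN(x_tilde))
# Single-head attention with identity q/k/v projections stands in for MSA.
import numpy as np

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def causal_self_attention(x):
    """Toy single-head causal attention (the real MSA has learned projections)."""
    T, d = x.shape
    scores = x @ x.T / np.sqrt(d)
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)        # no attending to future events
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x

def gelu(x):
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x ** 3)))

def pre_ln_block(x, w1, w2):
    x = x + causal_self_attention(layer_norm(x))    # attention sub-layer
    x = x + gelu(layer_norm(x) @ w1) @ w2           # GELU feed-forward sub-layer
    return x

rng = np.random.default_rng(0)
T, d = 6, 8
x = rng.normal(size=(T, d))
w1 = rng.normal(size=(d, 4 * d)) * 0.1              # 4x expansion, as in GPT-2
w2 = rng.normal(size=(4 * d, d)) * 0.1
out = pre_ln_block(x, w1, w2)
print(out.shape)  # (6, 8)
```

Because every sub-layer is either per-position or causally masked, position t of the output depends only on positions up to t, which is what makes autoregressive event generation possible.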
We apply a masking strategy to ignore padding tokens and, where applicable, mask specific fields (e.g., player IDs) to prevent overfitting or to focus learning on tactical dynamics. In particular, player-ID prediction is excluded because, during inference, player identity is injected based on positional assignment rather than generated through unconstrained autoregressive decoding. This keeps the training objective consistent with the generation procedure:

L_gen = − Σ_i log P(s_{i+1} | s_{≤i}).

Goal-Oriented Auxiliary Loss (L_aux). We compute auxiliary cross-entropy (CE) losses for Goal Scored (GS) and Goal Conceded (GC) predictions at outcome-token positions. For each outcome position, the model predicts whether the current action leads to a goal scored or conceded event:

L_aux = Σ_{t ∈ T_out} [ CE(ŷ_t^GS, y_t^GS) + CE(ŷ_t^GC, y_t^GC) ],

where T_out is the set of indices corresponding to outcome tokens, and y_t^GS, y_t^GC are the ground-truth labels retrieved from the raw data.

Total Loss. The final objective is a weighted sum: L_total = L_gen + L_aux.

3.6 Inference with Structural Constraints

Generating realistic football sequences requires strict game-rule and logical consistency. Standard sampling can produce syntactically valid but physically invalid sequences, so we use State-Dependent Logit Masking with a spatial heuristic for agent resolution.

Hierarchical Decoding and State-Dependent Masking. The model generates tokens in the fixed hierarchical order defined in Section 3.3. At each step t, we apply a validity mask M_t to the output logits based on the partial state s_{<t}.

The context block C_k is structured as:

C_k = ( t^(H) p_1^(H) u_1^(H) ··· p_11^(H) u_11^(H) | t^(A) p_1^(A) u_1^(A) ··· p_11^(A) u_11^(A) | φ(period, minute, score, cards) )    (1)

Table 8. Structure of the context block used in the input sequence.
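State-dependent logit masking can be sketched as follows. This is a hedged toy example: the vocabulary, logits, and the validity rule are invented for illustration and are not VERSA's actual rule set; only the mechanism, forcing invalid tokens to probability zero before sampling, reflects the text:

```python
# Hedged sketch of state-dependent logit masking: invalid tokens get -inf
# logits, so the renormalized softmax assigns them exactly zero probability.
import math

def masked_distribution(logits, valid):
    """Softmax over logits with invalid entries forced to probability zero."""
    masked = [l if ok else -math.inf for l, ok in zip(logits, valid)]
    m = max(masked)
    exps = [math.exp(l - m) for l in masked]
    z = sum(exps)
    return [e / z for e in exps]

# Invented rule for the example: after the home team loses possession,
# the next on-ball action cannot belong to the home team.
vocab = ["home_pass", "home_shot", "away_pass", "away_clearance"]
logits = [2.0, 1.0, 0.5, 0.1]           # raw model preferences
valid = [False, False, True, True]       # derived from the partial state s_{<t}
probs = masked_distribution(logits, valid)
print([round(p, 3) for p in probs])
```

Note that the two home-team tokens carried the highest raw logits; without the mask the sampler would frequently emit a physically impossible continuation, which is exactly the failure mode this constraint prevents.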
The block encodes team identity, on-pitch lineup, and compact match-state information.

Symbol          Description
t^(H), t^(A)    Home and away team tokens
p_i             Position token of the i-th player
u_i             Player token of the i-th player
φ(·)            Match-state summary function (period, minute, home goals, away goals, yellow/red cards)

A.2 Multi-Position Players

Table 9. Positional distribution of representative multi-role players. Minutes indicate total playing time in each position.

Player            Played positions (minutes)
Jinsub Park       CB (4,383), CDM (2,081), CM (1,849)
Masatoshi Ishida  CM (2,282), CAM (1,693), RW (287), CF (237), RF (141), LM (90), LW (77), LF (67)
Sangho Na         LW (3,670), RW (1,508), LM (1,073), RM (692), CF (451), LF (405), RF (90)
Seungwon Jeong    RWB (2,355), CM (2,021), RW (450), RM (270), RB (199), CAM (90)
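Assembling the context block of Table 8 can be sketched as below. The token spellings and the match-state encoding are our own illustrative assumptions; the structure (team token, eleven position/player pairs per side, then a compact match-state summary) follows the table:

```python
# Hedged sketch of building the context block C_k from Table 8:
# [team_H, (pos, player) x 11, team_A, (pos, player) x 11, match state].
# Token string formats are invented for readability.
def build_context(home, away, lineup_h, lineup_a, period, minute, score, cards):
    """lineup_h / lineup_a: lists of eleven (position, player_id) pairs."""
    assert len(lineup_h) == 11 and len(lineup_a) == 11
    block = [f"team={home}"]
    for pos, pid in lineup_h:
        block += [f"pos={pos}", f"player={pid}"]
    block.append(f"team={away}")
    for pos, pid in lineup_a:
        block += [f"pos={pos}", f"player={pid}"]
    # phi(period, minute, score, cards): compact match-state summary token
    block.append(f"state=P{period}:M{minute}:{score[0]}-{score[1]}:C{cards}")
    return block

ctx = build_context("H", "A",
                    [("GK", 1)] + [("CB", i) for i in range(2, 12)],
                    [("GK", 21)] + [("CM", i) for i in range(22, 32)],
                    period=1, minute=30, score=(1, 0), cards=0)
print(len(ctx))  # 2 team tokens + 44 lineup tokens + 1 state token = 47
```

Counterfactual lineups are then expressed by swapping a single (position, player) pair in this block while holding every other token fixed, which is what lets the simulation isolate one player's effect.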
