Predicting sports scoring dynamics with restoration and anti-persistence

Leto Peel 1,∗ and Aaron Clauset 1,2,3,†
1 Department of Computer Science, University of Colorado, Boulder, CO 80309
2 BioFrontiers Institute, University of Colorado, Boulder, CO 80303
3 Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501
∗ leto.peel@colorado.edu
† aaron.clauset@colorado.edu

Professional team sports provide an excellent domain for studying the dynamics of social competitions. These games are constructed with simple, well-defined rules and payoffs that admit a high-dimensional set of possible actions and nontrivial scoring dynamics. The resulting gameplay and efforts to predict its evolution are the object of great interest to both sports professionals and enthusiasts. In this paper, we consider two online prediction problems for team sports: given a partially observed game, Who will score next? and ultimately Who will win? We present novel interpretable generative models of within-game scoring that allow for dependence on lead size (restoration) and on the last team to score (anti-persistence). We then apply these models to comprehensive within-game scoring data for four sports leagues over a ten-year period. By assessing these models' relative goodness-of-fit we shed new light on the underlying mechanisms driving the observed scoring dynamics of each sport. Furthermore, in both predictive tasks, our models consistently outperform baseline models, and our models make quantitative assessments of the latent team skill over time.

I. INTRODUCTION

Competition in social systems is a natural and pervasive mechanism for improving performance and distributing limited resources. The quantitative study of such competitions can improve our ability to predict the outcomes associated with specific strategies and the strategic choices that competitors may make. However, most real competitions take place in complex and evolving environments [15, 26], which makes them difficult to study. Professional team sports, with their well-defined and consistently enforced rules, provide a controlled setting for the study of competition dynamics [14, 27, 29] and have previously been used as model systems for studying business decision making and human behavioral biases [28, 34].

The recent trend toward recording comprehensive and detailed data on the events within particular games provides us with new opportunities to study, model, and predict the dynamics of these games. The results of these studies promise to shed new light on a wide variety of existing competitive social systems, and enhance work on designing new ones, both offline and online.

Here, we examine the time series of scoring events in all league games across four different team sports over a period of ten years. We construct and test probabilistic models for two online predictive tasks: given a partially observed game, Who will score next? and ultimately Who will win? We then use these models to investigate the predictiveness of the dynamical phenomena of restoration and anti-persistence, which are defined below.

The events within a particular game can be effectively modeled as the interaction of skill and chance. Inferring skill from a series of competitions has a long history of study, both for individuals [11, 17] and for teams [20, 35]. However, this past work has typically only considered the final outcome of games, in terms of either a win or loss, or the final point difference. Here, we focus on modeling the specific pattern of scoring events within an individual game.

The role of chance also has a long history of study, typically focusing on the question of whether one success increases the likelihood of subsequent success.
This idea can be formalized at different levels, e.g., success by individual players within a game [2, 16, 39], or a team's success across multiple games [1, 32, 37]. Here, for the first time, we focus on a different level: success by a whole team within a game.

A simple starting point for such models is the basic idea of many skill ranking systems [6, 11], which model game outcomes as random variables dependent on the competing teams' skills. We extend this idea to consider the point-scoring events within a game to be a sequence of independent contests. Past work supports this approach, as some studies have found a lack of dependence between an individual's scoring and their ability to score subsequent points [16, 39], or between a team winning and their chance to win future games [32, 37]. On the other hand, there is also evidence of non-independence, e.g., the probability of scoring itself can vary with the clock time within a game or with the size of the lead [14, 26, 27]. To investigate the degree to which non-independence governs scoring probabilities, we construct a sequence of more complex models that allow specific aspects of a game's current state to influence scoring rates, e.g., the team that scored last and the lead size.

In many sports, including American football and basketball, a simple source of non-independence is a forced change in ball possession after each scoring event, putting the scoring team at a disadvantage. This can result in a phenomenon called anti-persistence, in which a score by one team is more likely to be followed by a score by their opponent [14].

Another potential source of non-independence is the size of the lead itself. Past work has shown that the observed probability of scoring next can vary with lead size [14, 27].
A negative dependence may be the result of strategy, e.g., a team using its best players when it falls behind and substituting them out when it is ahead. Such strategies have a restorative effect on the lead size, serving to pull the size of the lead back toward zero. Conversely, anti-restoration or momentum occurs when the leading team has a higher chance of scoring again, perhaps by improving their control over the playing field or by learning from gameplay how to better exploit the weaknesses of the opposing team.

In this paper, we develop probabilistic generative models around these ideas to explore and predict the evolution of point scoring over the course of a game. We use these models to deduce the impact of chance, strategy, and the rules of the game itself, and to test two simple hypotheses:

1. The probability of scoring does not depend on the current state of the game (team skill alone matters).
2. The probability of scoring does depend on the current game state (as well as team skill).

Our probabilistic models encode specific instances of these assumptions and we assess their accuracy under two online predictive tasks. We present novel predictive models that can not only predict the outcome of a game, but also provide better predictions than baseline models about the sequence of scoring events.

II. RELATED WORK

Our work addresses two novel prediction problems, Who will score next? and Who will win?, using only the sequence of scoring events that have already occurred during the game. In the following we outline work related to each of these questions in turn.

Essential to answering the question Who will score next? is understanding the underlying mechanisms of scoring dynamics.
The study of competitive team sports has a rich history spanning a broad selection of features including the timing of scoring events [7, 12, 14, 21, 27, 36, 39], long-range correlations in scoring [30], the role of timeouts [31], the identification of safe leads [8], and the impact of spatial positioning and playing field design [5, 26, 40]. The most relevant of these studies focus on the analysis of individual player "momentum" or "hot hands" [2, 16, 39] and on team winning streaks [1, 16, 32, 37, 39]. Here, we bring together these two ideas by considering the notion of momentum, or its reverse "restoration", at the team level. Although some analysis has previously been undertaken in this direction [14], we go further to provide the first predictive models that answer the question: Who will score next?

The foundations of our approach lie in the field of skill modeling and team ranking [6, 11], which originated in the mid-20th century. Work in this area includes the ranking of individuals [9, 11, 17], teams [18, 32, 35], or both [20, 22]. These models have been applied to a wide range of competitive events, including baseball [32], chess [9, 11, 17], American football [18, 35], association football (soccer) [23], and tennis [17]. More recently, they have been adapted to matchmaking problems in online games [20] and to calibrating reviewer scores in computer science conferences [13].

Our work is the first to use skill ranking models to predict Who will win? by predicting the sequence of scoring events within a game. Skill ranking models have previously been applied to predicting game outcomes but only based on the final outcome of the game, either in terms of the win/loss result or the final point difference. These past approaches thus cannot update their prediction as the game unfolds, while our models can. We train on a history of scoring event sequences so that we may predict Who will win? in an online fashion.
Some commercial online sports betting systems exist that make similar online predictions, but these systems are proprietary and closed, which precludes a scientific evaluation or comparison with our models. They are not considered hereafter.

III. SPORTS DATASETS

We use scoring event data (provided by STATS LLC, copyright 2015) from four team sports: college-level American football (CFB, 10 seasons; 2000-2009), professional American football (NFL, 10 seasons; 2000-2009), hockey (NHL, 9 seasons; 2000-2003, 2005-2009; the entire 2004 NHL season was canceled due to an extensive lockout over a dispute about player salary caps [33]) and basketball (NBA, 9 seasons; 2002-2010). Each dataset consists of the set of scoring events for each game played in the season. It includes the time the event was scored, the team and player that scored, and its point value. Table I gives a summary of these data including the number of teams, games, and individual scoring events. In our analysis and modeling, we discard the timestamps of the events and instead consider only the order in which events appear within a game.

TABLE I. Summary of our sports data for multiple seasons across four competitive team sports.

sport               abbrv.  seasons        teams  games    games           events     events          mean events per game
                                                  (total)  (preprocessed)  (total)    (preprocessed)  (preprocessed)
------------------  ------  -------------  -----  -------  --------------  ---------  --------------  --------------------
Football (college)  CFB     10, 2000-2009  461    14,588   13,689          190,337    117,752         8.60
Football (pro)      NFL     10, 2000-2009  32     2,645    2,561           32,800     20,115          7.85
Basketball (pro)    NBA     9, 2002-2010   30     11,744   11,744          1,301,408  1,096,179       93.34
Hockey (pro)        NHL     9, 2000-2009   30     11,813   10,259          65,085     59,227          5.77

A. Preprocessing

We extract from the raw event data two sequences to represent each game: a point sequence $\phi$, where $\phi_i$ is the point value of scoring event $i$ in the game, and a team sequence $\psi$, where $\psi_i \in \{r, b\}$ is the identity of the team that won those points. If there are $N_t$ events in game $t$, then the corresponding $\phi$ and $\psi$ each contain $N_t$ elements, and the lead size at event $i$ is

$$L_i = \sum_{j=1}^{i} \left[ \phi_j \, \delta(\psi_j, r) - \phi_j \, \delta(\psi_j, b) \right], \tag{1}$$

for team labels $r$ and $b$ (arbitrarily chosen), where $\delta(\cdot, \cdot)$ is the Kronecker delta function and by convention we compute $L$ from team $r$'s perspective.

We begin by removing some games and scoring events. We remove any events that occur during regulation overtime (0.88% of all events), because these events follow different scoring processes than events in regular game time [27]. Additionally, any games in which only one team scored are removed (6.24% of all games), as the raw data do not indicate the identity of the non-scoring team.

Under certain game conditions, multiple scoring events, potentially by different teams, can occur at the same game clock time. For example, in American football, the clock is stopped after a touchdown is scored but the scoring team gets a chance to score a conversion. If the conversion is unsuccessful, occasionally the opposing team gains control and scores points before the clock is restarted. Similarly, in basketball, the clock is stopped during free throws after a foul, after which the ball is inbounded (thrown in). If the ball is inbounded close to the other basket, it is possible to score before a second has elapsed on the clock. In these cases, the ordering of these events is ambiguous. Removing these events would alter the running lead size, which is one of the game states of interest.
Instead, we merge simultaneous events into a single scoring play that removes the ordering ambiguity while preserving the correct score. If one team scores two simultaneous events $i$ and $i+1$, we merge their values, setting $\phi_i = \phi_i + \phi_{i+1}$, and removing event $i+1$ from both sequences. If two teams score simultaneously, we merge their values with that of the immediately preceding event in a way that preserves the running lead. Specifically, we set $\phi_{i-1} = \phi_{i-1} \pm |\phi_i - \phi_{i+1}|$, where the sign is consistent with the previous assignment of $r$ and $b$ labels to teams, and then remove events $i$ and $i+1$ from both sequences.

B. Scoring and lead size

We use these point and team sequences to make an initial investigation of our hypotheses. If the scoring dynamics are truly independent of the game's state, these dynamics will be indistinguishable from an independent Bernoulli process, in which each Bernoulli trial represents a scoring event. We evaluate this model by calculating the empirical probability that a team will score the next event as a function of the current lead size $L$.

FIG. 1. Probability that a team scores next as a function of its lead size, for the observed (yellow) and simulated (black) patterns, each with a linear least squares fit line. The simulated scoring sequence assumes that the probability of scoring is independent of the game's state. (Panels: CFB, NBA, NFL, NHL; horizontal axis: lead size; vertical axis: p(score | lead).)

Recall that we compute $L$ from the perspective of team $r$; thus, if $r$ is leading, then $L$ is positive, while if $r$ is trailing, then $L$ is negative (and vice versa for $b$). This function is thus rotationally symmetric about a lead of $L = 0$, where neither team leads, and has the mathematical form $P(\psi_i = r \mid L_{i-1}) = 1 - P(\psi_i = b \mid -L_{i-1})$.
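As a rough illustration of the quantities in this subsection, the following Python sketch computes the running lead of Eq. (1) and tallies an empirical scoring function from a set of games. The function names `lead_sizes` and `scoring_function` are hypothetical, and recording every event from both teams' perspectives is an assumption made here so that the stated symmetry holds by construction; it is not necessarily how the original analysis was implemented.

```python
from collections import defaultdict

def lead_sizes(points, teams):
    """Running lead L_i from team r's perspective, as in Eq. (1)."""
    lead, leads = 0, []
    for phi, psi in zip(points, teams):
        lead += phi if psi == "r" else -phi
        leads.append(lead)
    return leads

def scoring_function(games):
    """Empirical probability of scoring next given the lead before the event.

    games: iterable of (points, teams) pairs, with teams labelled "r" or "b".
    Returns {lead: (probability, number of observations)}.
    Assumption: each event is recorded twice, once per team's perspective.
    """
    scored = defaultdict(int)
    seen = defaultdict(int)
    for points, teams in games:
        # Lead just before each event; the game starts tied at 0.
        prior_leads = [0] + lead_sizes(points, teams)[:-1]
        for lead, psi in zip(prior_leads, teams):
            # r's view (lead, did r score?) and b's mirrored view.
            for view_lead, view_scored in ((lead, psi == "r"), (-lead, psi == "b")):
                seen[view_lead] += 1
                scored[view_lead] += int(view_scored)
    return {lead: (scored[lead] / seen[lead], seen[lead]) for lead in seen}
```

The observation counts returned alongside each probability could serve as the weights for a weighted least-squares fit of the scoring function.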
We compare the empirical scoring function to one calculated from synthetic team sequences generated according to an independent Bernoulli process, in which we flip a biased coin to determine which team wins each scoring event. The coin's bias is determined by the proportion of scoring events each team wins in that particular game, $\frac{1}{N} \sum_{i=1}^{N} \delta(\psi_i, r)$ (or for $b$). In this simulation, events are thus independent of the game state (hypothesis 1). We also compute a least-squares regression line for the empirical and for the synthetic data, in which each point is given weight proportional to the number of times the corresponding lead size was observed.

All of the resulting gradients relating scoring probability to lead size are nonzero (Fig. 1), and each Bernoulli process produces a positive gradient. This pattern simply reflects the empirical distribution of biases used to simulate the ensemble of games, with a more positive slope reflecting broader variance in these biases. The variance in the estimated scoring probability increases with lead size simply because progressively fewer games produce leads of that magnitude.

Comparing the observed and simulated scoring functions (Fig. 1), we observe a clear contradiction. The gradient and, in particular for NBA, the range of lead sizes generated by the Bernoulli process disagree strongly with those properties observed in the empirical data. These results suggest that the probability of scoring does indeed depend, somehow, on the game state (hypothesis 2). In subsequent sections, we investigate this dependence using sophisticated probabilistic models to determine how the probability of scoring depends on game state.

IV. WHERE STANDARD TESTS FAIL

To determine whether scoring events are independent, we now apply a suite of statistical randomization tests, which compare observed sequences to random sequences with similar properties.
Specifically, we employ the

• serial test (non-uniformity),
• Wald-Wolfowitz runs test (anti-restoration), and
• autocorrelation test (persistence/anti-persistence),

where for each the null hypothesis is that the team sequence $\psi$ is simply a random sequence.

The serial test [25] examines bigram frequencies in a sequence and compares them to their expected frequencies under a uniformly random sequence. For a team sequence with $N$ elements, the observed frequencies of the bigrams $\{rr, rb, br, bb\}$ are compared to their expectations of $N/4$. This test can identify the existence of a bias within each game, i.e., if one team is systematically more likely to score than another.

The Wald-Wolfowitz runs test [38] examines the observed number of runs in a sequence, i.e., substrings of $\psi$ for which each element is the same (either $r$ or $b$), which allows us to identify either positive momentum or anti-restorative effects in within-game scoring. We reject the null hypothesis that $\psi$ is random if the observed number of runs is significantly below its expected value. Previously, this test has been used to detect winning streaks in sequences of games [37].

The autocorrelation test measures the correlation of a sequence with itself, shifted by one element, which allows us to identify periodic dynamics that occur as a result of anti-persistence. Here, we reject the null hypothesis that $\psi$ contains no dependence between values if the autocorrelation is significantly higher or lower than zero.
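To make two of these statistics concrete, the sketch below computes a lower-tail Wald-Wolfowitz runs test and the lag-one autocorrelation of a team sequence. It is only an illustration under stated assumptions: the p-value uses the textbook normal approximation to the distribution of the number of runs, which may differ from the procedure used for the results reported here, and the names `runs_test` and `lag1_autocorrelation` are hypothetical.

```python
import math

def runs_test(teams):
    """Lower-tail Wald-Wolfowitz runs test (normal approximation; an assumption).

    Returns (observed runs, expected runs, one-sided p-value). A small p-value
    means fewer runs than expected for a random arrangement of the labels.
    """
    n = len(teams)
    n_r = sum(1 for t in teams if t == "r")
    n_b = n - n_r
    if n < 3 or n_r == 0 or n_b == 0:
        raise ValueError("need at least 3 events and both teams scoring")
    runs = 1 + sum(a != b for a, b in zip(teams, teams[1:]))
    mean = 2.0 * n_r * n_b / n + 1.0
    var = (mean - 1.0) * (mean - 2.0) / (n - 1.0)
    z = (runs - mean) / math.sqrt(var)
    p_low = 0.5 * math.erfc(-z / math.sqrt(2.0))  # standard normal CDF at z
    return runs, mean, p_low

def lag1_autocorrelation(teams):
    """Correlation of the sequence with itself shifted by one (r=+1, b=-1)."""
    x = [1.0 if t == "r" else -1.0 for t in teams]
    mean = sum(x) / len(x)
    denom = sum((v - mean) ** 2 for v in x)
    if denom == 0.0:
        raise ValueError("sequence has no variation")
    num = sum((a - mean) * (b - mean) for a, b in zip(x, x[1:]))
    return num / denom
```

For example, `runs_test("rrrrbbbb")` sees only two runs where five are expected, while a strictly alternating sequence yields a strongly negative lag-one autocorrelation, the signature of anti-persistence.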
We apply each of these three tests to each of our four data sets, and compare the results against a false positive rate of $\alpha = 0.05$ (Fig. 2). We also consider each season separately so as to reveal non-stationarities.

FIG. 2. (top) Probability distributions for the number of scoring events in a game, and (bottom) the randomization test results for each sport, by season, versus a false positive rate of $\alpha = 0.05$ (dashed line). The team sequences of each game are tested independently and we plot the proportion of games that reject the null hypothesis that the sequences are random. Because CFB, NFL and NHL typically have a small number of events per game (upper panel), the null hypothesis is difficult to reject. (Top panel axes: number of scoring events, probability; bottom panels: CFB, NBA, NFL, NHL, with axes season and proportion of games, and series Serial Test, Wald-Wolfowitz, Auto-correlation.)

Basketball, unlike the other sports, produces a large proportion of rejections for the serial and autocorrelation tests, which reflects the known anti-persistence pattern in basketball scoring [14].

On the other hand, for all sports except basketball, each of these tests rejects the null hypothesis at close to or below the chosen false positive rate, a finding consistent with each of these sequences being random. However, this interpretation is problematic. The serial test makes the very strict assumption that each sequence is drawn from a uniform random distribution, i.e., each is generated by flipping a fair coin several times.
A face-value interpretation thus implies that all teams have an equal chance of winning each game (a highly unlikely situation), and it predicts that the scoring function from Section III B should be independent of lead size, which contradicts the observed pattern (Fig. 1).

In fact, however, there is no contradiction: the $\psi$ sequences are simply too short (Fig. 2) for these tests to reliably distinguish random from non-random sequences when we assume they are generated independently, i.e., the tests have low statistical power. The one exception is basketball, whose sequences typically contain 90 or so events, while those for American football or hockey typically contain fewer than 10.

In the following sections, we show how to circumvent the low statistical power of these tests by exploiting the fact that team sequences are not, in fact, independent of each other. Instead, each season's sequences are generated by repeatedly selecting pairs from a finite and fixed population of teams. This process induces substantial correlations across games that we can capture by modeling the latent skills of each team within a given season.

V. SKILL-BASED SCORING DYNAMICS

Toward this end, we develop a series of models of increasing complexity based on specific underlying mechanisms for sports scoring dynamics, including independence, restoration, and anti-persistence. Each of these models represents team skill as a latent variable. We assume that team skill is fixed over the course of any particular season [18], which reflects the relatively stable team rosters and coaching staffs, and low injury rates in these sports. Furthermore, modeling each season separately allows us to run multiple tests for each sport (one for each season) and allows our models to capture real changes in team skill across seasons [18].
Each of our models generates a team sequence $\psi$ by extending the popular Bradley-Terry (BT) model [6] to generate individual scoring events within a game. Traditionally, the BT model is used to estimate unobserved (latent) team skills from the observed outcomes of many games among pairs of teams. The probability that team $r$ wins in a match against team $b$ is given by the skill of $r$ relative to $b$:

$$P(r \text{ wins against } b) = d_{rb} = \frac{\pi_r}{\pi_r + \pi_b}, \tag{2}$$

where $\pi_r, \pi_b \in [0, 1]$ are the latent skills of teams $r$ and $b$.

A. Independent model

When scoring events within a game are independent, their generation is equivalent to a simple Bernoulli process with a game-specific bias. This is equivalent to an "independent model" that applies the game-level BT model of Eq. (2) to each of the individual scoring events within a game, yielding

$$P(\psi_i = r) = d_{rb}. \tag{3}$$

This represents our first model, which can capture variability in a team sequence caused by differences in team skill parameters, but not other sources of variability.

B. Restorative models

Real scoring functions (Fig. 1) produce a range of gradients. However, the independent model can only produce positive slopes. To capture a wider variety of scoring function shapes, and in particular a negative slope or "restorative" pattern, we extend the independent model by allowing each team's skill to explicitly covary with its lead. Such a relationship could arise for psychological reasons, e.g., a winning team "loses steam" or a losing team gains motivation [4], or for strategic reasons, e.g., substituting out or in the more skilled players while in the lead in order to conserve their energy, avoid injury, or create momentum [27].
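Before turning to the details of the restorative extension, it may help to see the independent model that it builds on written out. The following is a minimal sketch of Eqs. (2) and (3), assuming the two skills are already known; the names `bt_prob` and `simulate_independent` are hypothetical, and estimating the skills from data is not shown.

```python
import random

def bt_prob(pi_r, pi_b):
    """Eq. (2): d_rb, the Bradley-Terry probability that r beats b."""
    return pi_r / (pi_r + pi_b)

def simulate_independent(pi_r, pi_b, n_events, seed=None):
    """Eq. (3): every scoring event is an independent Bernoulli trial
    won by team r with probability d_rb, regardless of game state."""
    rng = random.Random(seed)
    d_rb = bt_prob(pi_r, pi_b)
    return ["r" if rng.random() < d_rb else "b" for _ in range(n_events)]
```

With skills 0.6 and 0.2, for instance, team r wins each scoring event with probability 0.75 no matter how large the lead or who scored last.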
Our restorative model augments the independent model with an explicit per-team "restorative force" parameter $\gamma_r$, which modifies team $r$'s strength in response to the current lead size from its perspective $\ell_r$ and captures the fact that different teams may have different behaviors in response to how far ahead or behind they are. When $\gamma_r < 0$, team $r$ exhibits a restorative pattern, with skill being proportional to $-\ell_r$. When $\gamma_r > 0$, team $r$ exhibits an anti-restorative or momentum pattern, with skill being proportional to $\ell_r$. The probability that team $r$ scores against $b$ is given by

$$P(\psi_i = r) = d_{rb} + \ell_{ir} \, c_{rb}, \tag{4}$$

where $\ell_{ir}$ is $r$'s lead size just before event $i$ and

$$c_{rb} = \gamma_r + \gamma_b = c_{br}. \tag{5}$$

A game as a whole exhibits a restorative pattern whenever $c_{rb} < 0$. This occurs either when both teams exhibit a restorative pattern themselves ($\gamma_r < 0$ and $\gamma_b < 0$) or when one team's restorative force is stronger than the other team's anti-restorative force ($\gamma_r < 0$, $\gamma_b > 0$, and $|\gamma_r| > |\gamma_b|$).

The additional term in Eq. (4) relative to the independent model means this model's scoring function is no longer bounded on the $[0, 1]$ interval. We correct this behavior by using a sigmoid function of the form $\sigma(x) = (1 + e^{-x})^{-1}$ to provide a smooth and continuous approximation of the misspecified linear function. To make this approximation, we change variables so that a logistic curve most closely approximates the linear equation, which occurs when we match the gradients at the point of symmetry at $P(\psi_i = r) = 1/2$. Setting the derivative $\sigma'$ with respect to the lead size equal to $c_{rb}$, we find

$$\sigma'(m_{rb} \ell_{ir} + v_{rb}) = \frac{m_{rb} \, e^{m_{rb} \ell_{ir} + v_{rb}}}{\left( 1 + e^{m_{rb} \ell_{ir} + v_{rb}} \right)^2} = c_{rb}. \tag{6}$$

We then solve for when the logistic function equals $1/2$, yielding

$$\sigma(m_{rb} \ell_{ir} + v_{rb}) = \frac{1}{1 + e^{-(m_{rb} \ell_{ir} + v_{rb})}} = 1/2. \tag{7}$$

At the point of symmetry the exponent is zero, so the derivative in Eq. (6) reduces to $m_{rb}/4$, and the lead size at which Eq. (4) equals $1/2$ fixes the offset. Finally, in solving Eqs. (6) and (7) we obtain the following transformation of variables:

$$v_{rb} = -4 \, (1/2 - d_{rb}), \tag{8}$$

$$m_{rb} = 4 \, c_{rb}, \tag{9}$$

where $m_{rb}$ and $v_{rb}$ are the variables used in the logistic function such that $c_{rb}$ and $d_{rb}$ retain their linear interpretation and are thus comparable to the skill variables in the independent scoring model. Figure 3 shows examples of two linear functions and their corresponding logistic approximations.

FIG. 3. Two examples of linear functions matched to logistic functions using the change of variables in Eqs. (8) and (9): $y = -0.02x + 0.6$ with $y = \sigma(-0.08x + 0.4)$, and $y = -0.01x + 0.8$ with $y = \sigma(-0.04x + 1.2)$. (Horizontal axis: lead size; vertical axis: p(score | lead).)

C. Anti-persistence models

In many sports, we observe an anti-persistent pattern in the team sequences, in which the probability that $r$ scores next depends on which team scored last, i.e., $P(\psi_{i+1} = r \mid \psi_i)$. For example, for NBA team sequences, the rate of $rr$ and $bb$ bigrams is only 0.35, indicating strong anti-persistence. (The rates for CFB, NFL, and NHL are 0.45, 0.44, and 0.49, respectively.) Such an anti-persistence pattern can occur when teams have different degrees of skill at defensive and offensive play, e.g., when both teams have offenses that are relatively stronger than the opposing team's defense.

To capture these effects, we extend the independent model so that each team has an offensive skill parameter $\pi^{\mathrm{off}}$ and a defensive parameter $\pi^{\mathrm{def}}$. For sports like American football and basketball, ball possession (offensive play) typically alternates after a scoring event. We model this game rule by applying a team's defensive skill immediately after it scores and its offensive skill after the other team scores.
Under this independent anti-persistent model, the probability of scoring event $i$ is

$$P(\psi_i = r \mid \psi_{i-1}) = \begin{cases} \pi^{\mathrm{def}}_r / \left( \pi^{\mathrm{def}}_r + \pi^{\mathrm{off}}_b \right) & \text{if } \psi_{i-1} = r \\ \pi^{\mathrm{off}}_r / \left( \pi^{\mathrm{off}}_r + \pi^{\mathrm{def}}_b \right) & \text{if } \psi_{i-1} = b. \end{cases} \tag{10}$$

Finally, we obtain a fourth model by combining the restorative model with the anti-persistent model.

VI. MODELING SCORING DYNAMICS

We fit the (i) independent, (ii) restorative, (iii) independent anti-persistent, and (iv) restorative anti-persistent models to the team sequences within a given season of each sport, using Markov chain Monte Carlo to estimate each model's parameters. For each, we assess model goodness-of-fit by calculating the held-out likelihood for each model under a 10-fold cross validation. Furthermore, we follow this procedure for each season of each sport separately, the results of which are given in Tables II-V. By treating seasons independently, we obtain multiple model assessments within each sport while controlling for within-season variability. For each season, we highlight the two highest scores in blue and the highest score in bold.

In basketball (NBA), we find that the restorative anti-persistent model consistently provides the best fit across all seasons (Table II), with the second best model being the independent anti-persistent model. These results indicate a strong role for both restoration and anti-persistence in driving basketball scoring dynamics. Previous analysis of basketball scoring using random walk theory came to similar conclusions [14].

American football (NFL and CFB) shows a different result, with both types of independent model being heavily favored over both types of restorative model (Tables III and IV). The poor fit here of the restorative models indicates that the competitive processes that produce a restorative force in basketball are largely absent in American football.
This diﬀerence may b e related to the muc h greater scoring rate in basketball relativ e to American fo otball (Fig. 2): an increased scoring rate low ers the marginal v alue of each scoring even t relative to the game outcome (who wins), and low v alue in teractions in other systems are asso ciated with restorative forces [10, 19]. F urthermore, the anti-persistent mo del for NFL is fa- v ored in 8 of 10 seasons ov er the indep endent mo del, while in CFB, it is fa vored in only 2 of 10 seasons. That is, anti-persistence appears to play a stronger role in NFL games than in CFB games. In fact, CFB is the only sp ort to strongly fa vor the indep endent mo del, a result that agrees with the our previous sim ulation results (Fig. 1), whic h show ed that the trivial indep enden t mo del pro- duced the smallest disagreemen t for CFB betw een real and simulated scoring function gradien ts among the four sp orts. The results for ho c key (NHL) are less clear cut (T a- ble V). In 8 out of 9 seasons, the indep endent anti- p ersisten t mo del is either the b est or second best mo del, and the indep endent mo del is best or second b est in 7 out of 9. On the other hand, the simple restorative mo del wins for 2 seasons, and is second b est for one. (The restorative an ti-persistent mo del is a p o or ﬁt for all ho ck ey seasons.) W e note, ho w ever, that the log- lik eliho o ds among these three models are all very close, indicating that each p erforms ab out as well as the oth- ers for these data. Given that NHL is also the one sp ort among the four that is not anti-persistent by design (p os- session is determined by a “faceoﬀ ” after eac h goal) and that its scoring function has a negativ e gradient, we ten- tativ ely conclude that the restorativ e mo del is b etter. Across seasons, the b est ov erall models appear to b e CFB: indep enden t; NFL: indep enden t an ti-p ersisten t; NBA: restorativ e anti-persistent; and NHL: restorativ e. 7 T ABLE I I. 
Log-likelihoo ds on held-out data for NBA games. 2002 2003 2004 2005 2006 2007 2008 2009 2010 Indep enden t -80849 -78814 -84698 -84744 -84795 -86070 -85727 -86314 -85114 Restorativ e -80573 -78506 -84361 -84404 -84469 -85777 -85444 -86005 -84704 Indep enden t anti-persistent -75655 -73823 -79151 -78841 -79088 -80174 -79841 -80513 -79386 Restorativ e an ti-p ersisten t -75627 -73777 -79097 -78796 -79040 -80141 -79812 -80465 -79297 T ABLE I II. Log-likelihoo ds on held-out data for NFL games. 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 Indep enden t -1286 -1307 -1408 -1372 -1403 -1373 -1369 -1433 -1484 -1395 Restorativ e -1324 -1347 -1450 -1402 -1451 -1422 -1424 -1466 -1530 -1432 Indep enden t anti-persistent -1278 -1290 -1401 -1361 -1392 -1378 -1372 -1425 -1473 -1387 Restorativ e an ti-p ersisten t -1322 -1337 -1450 -1496 -1448 -1427 -1434 -1470 -1520 -1426 T ABLE IV. Log-likelihoo ds on held-out data for CFB games. 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 Indep enden t -7487 -7575 -8098 -8105 -7675 -7708 -7265 -8673 -8435 -8097 Restorativ e -8114 -8182 -8689 -8656 -8268 -8176 -7884 -9334 -9065 -8777 Indep enden t anti-persistent -7486 -7643 -8142 -8201 -7741 -7759 -7328 -8678 -8458 -8078 Restorativ e an ti-p ersisten t -8011 -8113 -8625 -8586 -8198 -8110 -7781 -9214 -8880 -8630 T ABLE V. Log-likelihoo ds on held-out data for NHL games. 2000 2001 2002 2003 2005 2006 2007 2008 2009 Indep enden t -4432 -4238 -4300 -4078 -5026 -4712 -4504 -4755 -4655 Restorativ e -4432 -4238 -4313 -4056 -5031 -4695 -4511 -4761 -4663 Indep enden t anti-persistent -4420 -4237 -4287 -4068 -5020 -4706 -4497 -4761 -4668 Restorativ e an ti-p ersisten t -4449 -4254 -4318 -4090 -5045 -4721 -4521 -4787 -4687 −50 0 50 0.0 0.2 0.4 0.6 0.8 1.0 p(score | lead) CFB −100 −50 0 50 100 NBA −60 −40 −20 0 20 40 60 lead size 0.0 0.2 0.4 0.6 0.8 1.0 p(score | lead) NFL −10 −5 0 5 10 lead size NHL FIG. 4. 
Probability that a team scores next as a function of its lead size, for the observed (yellow) and simulated (black) patterns, each with a linear least squares fit line. Each simulation uses the best overall skill model for that sport to generate synthetic point and team sequences.

We check these models by performing a semi-parametric bootstrap, generating synthetic φ and ψ sequences of the same number and lengths as observed empirically in each season, and comparing the simulated and empirical scoring functions. That is, we repeat the assessment of Figure 1, but now using models that can capture dependence across sequences. The results show that our skill-based models are a dramatic improvement over simulating each game independently (Fig. 4), agreeing closely with the empirical scoring patterns in both the gradient and range of lead sizes.

VII. PREDICTING OUTCOMES

We now apply our models to two online prediction tasks in each of the sports: Who will score next? and Who will win? For both tasks, we let our models observe the point and team sequences of the first T games in a particular season. We then use these models to predict for each unobserved game in that season (i) the team sequence values ψ_i for 1 ≤ i ≤ N, and (ii) the identity of the winning team, when each model is allowed to observe the game states (ψ_j, φ_j) for 1 ≤ j < i. In the second task, all models predict point values φ_i as the mean value ⟨φ⟩ averaged over all events in the season.

We compare our predictions to those of three baseline models.
The first baseline is a naïve leading model, which assumes that the team currently in the lead is the stronger team and thus more likely to both score next and win the game. Specifically, it predicts that the team holding the lead at event i will win the next event, i.e., it predicts ψ_{i+1} = r if L > 0 and ψ_{i+1} = b if L < 0, and will also win the game. If L = 0, the model flips a fair coin for r and b.

[Figure 5: four panels (CFB, NBA, NFL, NHL); x-axis: proportion of season observed; y-axis: AUC; curves: independent, restorative, independent anti-persistent, restorative anti-persistent, Bradley-Terry, first order Markov, leading.]

FIG. 5. Probability of accurately predicting which team will score next (AUC), when models observe different fractions of a season. Based on 95% confidence intervals, our best model performs significantly better than the baseline models for CFB and NBA, and after observing at least half of the season for NFL and NHL.

The second baseline is the standard Bradley-Terry model in which we infer latent team skills π from the win-loss records among teams in the observed games. This model is simpler than our independent model, which infers team skills using team sequences {ψ} of the observed games.

The third baseline is a simple first order Markov model. It predicts that the next team to score will either be the same or different than the team that scored last according to the empirical bigram frequencies {rr, bb, rb, br} observed in the first T games of the season. Formally, it predicts that a team will score again given it scored last time as

$$P(\psi_{i+1} = \psi_i) = \left( \sum_{t=1}^{T} \sum_{i=1}^{N_t - 1} \delta(\psi_{i+1}, \psi_i) \right) \Bigg/ \left( \sum_{t=1}^{T} N_t - 1 \right). \tag{11}$$

For both prediction tasks, we assess prediction accuracy via the AUC statistic, which gives the probability that a randomly selected positive instance is ranked above a randomly selected negative instance. The AUC is a statistically principled measure for binary classification tasks like ours where the cost of an error is the same in either direction (since team labels, r or b, are arbitrary).

A. Who will score next?

In the first task, we aim to predict which team will score event i, for each 1 ≤ i ≤ N, given the sequence of preceding game states (φ_j, ψ_j) for 1 ≤ j < i. For this online prediction task, we learn each model's parameters from the first T games in a season, make predictions across all unobserved games within that season, and calculate the AUC for all predictions across all seasons to obtain a single score. Each model observes at least 10% of a season, which ensures that every team has played at least a few times.

The results show that the overall best models identified in the previous section also tend to be the best predictors of who will score next (Fig. 5), although some alternative models also perform well. For instance, the best model for NFL games early in the season is the first order Markov model; however, the best NFL model beats this baseline after about 30% of a season is observed. Similarly, the first order Markov model performs almost as well as the best skill model in predicting who will score next in NBA games, by capturing the known anti-persistence pattern in that sport. One of the worst models across all four sports is the leading baseline, which often performs only slightly better than chance.

B. Who will win?

Predicting who will win a game requires extrapolating the point and team sequences to determine the game's final outcome. We simplify this task slightly by assuming that the number of scoring events N in each game is known.
We then allow the models to learn their parameters from the first 30% of each season (other choices lead to qualitatively similar results as those reported here). For each game in the remainder of a season, the models predict the identity of the winning team when they are allowed to observe a progressively greater fraction of game states (φ_i, ψ_i) for 0.1 ≤ i/N ≤ 0.9, as if each model were watching the game unfold in real time.

The results show that the overall best model for each sport both consistently outperforms the baselines and also correctly predicts the winner with at least 80% accuracy at a game's halftime (Fig. 6). The relatively poorer performance of the "leading" baseline model illustrates that this prediction task is non-trivial: who is leading at a given moment is not as predictive of who wins as knowing something about team skills and scoring dynamics. On the other hand, the Bradley-Terry baseline performs comparably well very early in the game, but is quickly beaten because it cannot learn from the real-time evolution of a game.

FIG. 6. AUC scores for predicting which team will win given the current state of the game.

For this task, most of our skill-based models make very similar predictions and the first order Markov model also performs well. Although the distributions of final lead sizes may be different, the means are very close, and the individual predictions across models correlate strongly. The greatest difference occurs at the start of the game. In particular, the first order Markov model performs much worse than the skill-based models at the beginning because it has no information about the heterogeneity of team scoring abilities. As the game progresses the predictions tend to converge. This occurs because these models all make predictions based on random walks on a binary sequence {r, b}, the difference being in how they model the transition probabilities.
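
To make the random-walk view concrete, the following minimal Python sketch estimates the first order Markov repeat probability in the spirit of Eq. (11) and then extrapolates a partially observed game to its final event, assuming as in the text that N is known and that every future event is worth the season's mean point value. It is an illustration only, not the authors' implementation: the function names `repeat_probability` and `win_probability`, the Monte Carlo approach, the reading of the denominator as the total number of consecutive event pairs, and the choice to count a tied final score as half a win are all assumptions made here.

```python
import random


def repeat_probability(games):
    """Fraction of consecutive event pairs scored by the same team (cf. Eq. 11).

    games: list of team sequences over {'r', 'b'}, e.g. ["rrbrb", "bbr"].
    """
    same = sum(a == b for g in games for a, b in zip(g, g[1:]))
    # Assumption: the denominator counts every consecutive pair of events.
    pairs = sum(max(len(g) - 1, 0) for g in games)
    if pairs == 0:
        raise ValueError("need at least one game with two or more events")
    return same / pairs


def win_probability(observed, points, n_events, p_repeat, mean_points,
                    n_sims=10_000, seed=0):
    """Monte Carlo estimate of P(team 'r' wins) for a partially observed game.

    observed:    team sequence seen so far (non-empty string over {'r', 'b'})
    points:      point values of the observed events
    n_events:    total number of scoring events N (assumed known)
    p_repeat:    probability that the last scorer scores again
    mean_points: point value assigned to every unobserved event
    """
    if not observed:
        raise ValueError("at least one event must be observed")
    rng = random.Random(seed)
    # Current lead of team 'r' (positive means 'r' is ahead).
    lead0 = sum(p if t == "r" else -p for t, p in zip(observed, points))
    wins = 0.0
    for _ in range(n_sims):
        lead, last = lead0, observed[-1]
        for _ in range(n_events - len(observed)):
            # Switch the scoring team with probability 1 - p_repeat.
            if rng.random() >= p_repeat:
                last = "b" if last == "r" else "r"
            lead += mean_points if last == "r" else -mean_points
        if lead > 0:
            wins += 1.0
        elif lead == 0:
            wins += 0.5  # assumption: a tie counts as half a win
    return wins / n_sims


# Toy usage with made-up sequences (not data from the paper).
p = repeat_probability(["rrbrb", "bbr"])  # 2 repeats out of 6 pairs
print(p, win_probability("rrb", [2, 3, 2], n_events=8, p_repeat=p, mean_points=2.3))
```

A skill-based model would replace the single `p_repeat` with transition probabilities that depend on the two teams' inferred skills, which is the only place the models compared here differ in this extrapolation step.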
Later in the game we extrapolate less and so the differences between models become less pronounced.

VIII. TEAM SKILL EVOLVES OVER TIME

A useful feature of our probabilistic models is the interpretability of their parameters, which are meaningful measures of team skill here. By learning these parameters independently for each season in each sport, we can investigate how team skills have evolved over time.

Using the best overall model for each sport, we learn its parameters using all data in each particular season and calculate the Spearman rank correlation across team skills for each pair of seasons (Fig. 7). We find that the relative ordering of teams by their inferred skills exhibits strong serial correlation over time, which appears as a strong diagonal component in the pairwise correlation matrices. The low or inverse correlation in the far off-diagonal elements, as well as the block-like patterns observed in CFB and NFL, implies an underlying non-stationarity in team skills for each of the leagues over the roughly 10-year span of data.

[Figure 7: four season-by-season correlation matrices (CFB '00–'09, NBA '02–'10, NFL '00–'09, NHL '00–'03 and '05–'09); color scale 0.32 to 0.96 for CFB and −0.30 to 0.90 for the other three.]

FIG. 7. Correlation of inferred skills over years for each sport. We see that the highest correlations occur along the block diagonal, indicating that adjacent years are more similar. Note that the scale is different for CFB due to a much higher correlation across all years.

The manner in which team rosters change over time is a likely source of such long-term dynamics in relative team skill. At short time scales, team rosters are fairly stable, with only a few players changing from season to season. However, over longer time scales, these changes accumulate, and rosters separated in time by more than a few years are likely to be very different, with concomitant differences in team skill.

The exception to this pattern is CFB, which shows a larger long-term correlation, i.e., a slower rate of change in relative team skills, than in professional sports. We speculate that this difference is caused by the difference in player mobility between college and professional-level sports: professional teams operate in a national player market, and players can move relatively freely among teams, while colleges operate as rough regional monopolies over the sources of their players.

The inferred season-by-season skill orderings themselves are also of interest, as they reveal the particular trajectories of individual teams over time. We show visualizations of these trajectories for NBA and NFL in Figures 8 and 9. We omit CFB because there are too many teams (461) to meaningfully visualize and NHL for space reasons. For each plot we highlight the two teams that won the league championship (NFL Super Bowl and NBA Finals) more than once during the period covered by the dataset. It is notable that these teams are not necessarily the most skilled teams under our model. This is unsurprising, as tournaments by bracket are the highest variance method of identifying the most skilled team [3]. Interestingly, in both NFL and NBA games, the highlighted teams tend to have strong offensive skills, while their defensive skills are more variable. This pattern suggests that offensive skills are more important for winning games, which seems reasonable given that a strong defense alone cannot win a game.
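
The season-to-season comparison behind Fig. 7 can be sketched in a few lines of Python. The sketch below computes the Spearman rank correlation (the Pearson correlation of average ranks) between the skill values of every pair of seasons. It is a hedged illustration rather than the authors' code: the function names, the nested-dictionary layout of `skills`, the restriction to teams present in both seasons, and the toy skill values are all assumptions introduced here.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def season_correlations(skills):
    """Map each pair of seasons to the rank correlation of team skills.

    skills: {season: {team: skill}}; only teams present in both seasons
    are compared (an assumption of this sketch).
    """
    seasons = sorted(skills)
    result = {}
    for s1 in seasons:
        for s2 in seasons:
            teams = sorted(set(skills[s1]) & set(skills[s2]))
            result[(s1, s2)] = spearman([skills[s1][t] for t in teams],
                                        [skills[s2][t] for t in teams])
    return result


# Toy, made-up skill values for three hypothetical teams.
toy = {2008: {"A": 0.9, "B": 0.5, "C": 0.1},
       2009: {"A": 0.8, "B": 0.6, "C": 0.2},
       2010: {"A": 0.1, "B": 0.5, "C": 0.9}}
rho = season_correlations(toy)
print(rho[(2008, 2009)], rho[(2008, 2010)])
```

In this toy input the ordering is unchanged from 2008 to 2009 and fully reversed by 2010, so the two printed correlations take the extreme values of the scale; real skill estimates would fall in between, with nearby seasons more strongly correlated.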
[Figure 8: NBA team skill rankings by season, 2002–2010; two panels (defensive, offensive); vertical axis: increasing skill; teams grouped into skill bands 0–0.2, 0.2–0.4, 0.4–0.6, 0.6–0.8, 0.8–1.]

FIG. 8. NBA defensive (top) and offensive (bottom) skill rankings. Teams that won the NBA Finals more than once in the data are highlighted, i.e., Lakers (orange) 2002, 2009 and 2010, Spurs (black) 2003, 2005 and 2007.

[Figure 9: NFL team skill rankings by season, 2000–2009; two panels (defensive, offensive); vertical axis: increasing skill; teams grouped into skill bands 0–0.25, 0.25–0.5, 0.5–0.75, 0.75–1.]

FIG. 9. NFL defensive (top) and offensive (bottom) skill rankings. Teams that won the NFL Super Bowl more than once in the data are highlighted, i.e., Patriots (black) 2002, 2004 and 2005, Steelers (orange) 2006 and 2009.

Looking at individual teams, we can see how their skills change with respect to the total ordering. For instance, the Cleveland Cavaliers drafted LeBron James in 2003 and went from being ranked the third worst (offensive) team to a mid-range one. When James left for Miami Heat at the start of the 2010–11 season, we see the Cavaliers' offensive skill drop to the bottom ranked team, while Miami Heat's offensive and defensive skills increased to be ranked third and first respectively. We also see that the Los Angeles Lakers' skills (both offensive and defensive) drop for the 2004–05 season.
After facing a difficult 2003–04 season [24], they disbanded the team, lost their coach and faced a number of injuries, resulting in a poorer performance in 2004–05.

Finally, the range of values occupied by offensive and defensive skills is different between NFL and NBA teams: in the latter, these two skills occupy non-overlapping ranges (large and small respectively), while in the former, they fall in similar ranges. That is, NBA teams are less likely to score when playing defensively than when they have possession of the ball, which serves to create a stronger anti-persistence scoring pattern (0.36) than for NFL (0.44), where skills are more evenly matched.

IX. CONCLUSION

In this work we considered the online prediction tasks of Who will score next? and Who will win? based on the sequence of scoring events in the game so far. Our probabilistic models based on latent team skills perform well at both predictive tasks and can predict with a high degree of certainty (> 80%) who will win a game in each of the four leagues we studied, after only half of the game has elapsed. Furthermore, by using gameplay, i.e., the particular sequence of events within each game, to model team skill rather than just game outcomes, we can infer different types of latent team skills, e.g., offensive vs. defensive skills.

Our statistical models provide a quantitative and principled means of capturing and testing hypotheses about the variability induced by chance, the biases produced by real differences in team skill, and the structural impact of game-specific rules. In applying these models to comprehensive data from four different sports, we found that each of the leagues is best fit by a different model. This indicates that skill, luck, strategy, and the rules of the game serve different roles in the scoring dynamics across sports.
The exception was professional hockey (NHL), where the very low scoring rate resulted in no clear predictive winner among our models.

Our models and results open up many new directions for future work. For instance, we could incorporate other data such as player or ball positioning [5, 40], or the timing of events [27] to improve our models and allow us to apply them to low scoring sports such as soccer. These models could also be used to make other predictions, e.g., the number of scoring events in a game, the final score, and when a lead is safe [8], and to produce more rigorous team rankings (Figs. 8 and 9).

In addition to data on gameplay, data on individual player attributes and performance in competitive settings are also often available, e.g., height, strength, speed, accuracy when scoring, defensive skill, passing skill, etc. However, there are no good models that connect these characteristics to team skills and to gameplay as a means of predicting game outcomes. The models we formulated here solve part of this problem by connecting team skill to gameplay. An interesting direction for future work would be to predict outcomes from player statistics via team skills as an intermediary. Such a model would allow teams to make more data-driven choices about how they build team rosters and train their players. This extension of our work would also open up new avenues in designing realistic simulations of competitive play, e.g., for better AI in video games.

X. ACKNOWLEDGEMENTS

We thank Ruben Coen Cagli, Ramsey Faragher, Theofanis Karaletsos, Marina Kogan, David Edward Lloyd-Jones and Sam Way for helpful conversations, and acknowledge support from Grant #FA9550-12-1-0432 from the U.S. Air Force Office of Scientific Research (AFOSR) and the Defense Advanced Research Projects Agency (DARPA).

[1] J. Arkes and J. Martinez. Finally, evidence for a momentum effect in the NBA. J. of Quantitative Analysis in Sports, 7(3):article 13, 2011.
[2] M. Bar-Eli, S. Avugos, and M. Raab. Twenty years of "hot hand" research: Review and critique. Psychology of Sport and Exercise, 7(6):525–553, 2006.
[3] E. Ben-Naim and N. W. Hengartner. How to choose a champion. Phys. Rev. E, 76:026106, 2007.
[4] J. Berger and D. Pope. Can losing lead to winning? Management Sci., 57(5):817–827, 2011.
[5] J. Bourbousson, C. Sève, and T. McGarry. Space-time coordination dynamics in basketball: Part 2. The interaction between the two teams. J. of Sports Sci., 28(3):349–358, 2012.
[6] R. A. Bradley and M. E. Terry. Rank analysis of incomplete block designs: I. the method of paired comparisons. Biometrika, 39(3-4):324–345, 1952.
[7] S. E. Buttrey, A. R. Washburn, and W. L. Price. Estimating NHL scoring rates. J. of Quantitative Analysis in Sports, 7(3):article 24, 2011.
[8] A. Clauset, M. Kogan, and S. Redner. Safe leads and lead changes in competitive team sports. Preprint, arXiv:1503.03509, 2015.
[9] P. Dangauthier, R. Herbrich, T. Minka, and T. Graepel. TrueSkill through time: Revisiting the history of chess. In Neural Information Processing Systems 20, pages 337–344, 2007.
[10] Y. Durham, J. Hirshleifer, and V. L. Smith. Do the rich get richer and the poor poorer? Experimental tests of a model of power. American Economic Rev., 88(4):970–983, 1998.
[11] A. E. Elo. The rating of chessplayers, past and present, volume 3. Batsford, 1978.
[12] P. Everson and P. S. Goldsmith-Pinkham. Composite Poisson models for goal scoring. J. of Quantitative Analysis in Sports, 4(2):article 13, 2008.
[13] P. A. Flach, S. Spiegler, B. Golénia, S. Price, J. Guiver, R. Herbrich, T. Graepel, and M. J. Zaki. Novel tools to streamline the conference review process: experiences from SIGKDD'09. ACM SIGKDD Explorations Newsletter, 11(2):63–67, 2010.
[14] A. Gabel and S. Redner. Random walk picture of basketball scoring. J. of Quantitative Analysis in Sports, 8(1):manuscript 1416, 2012.
[15] T. Galla and J. D. Farmer. Complex dynamics in learning complicated games. Proc. Natl. Acad. Sci., 110(4):1232–1236, 2013.
[16] T. Gilovich, R. Vallone, and A. Tversky. The hot hand in basketball: On the misperception of random sequences. Cognitive Psychology, 17(3):295–314, 1985.
[17] M. E. Glickman. Parameter estimation in large dynamic paired comparison experiments. J. of the Royal Statistical Society: Series C (Applied Statistics), 48(3):377–394, 1999.
[18] M. E. Glickman and H. S. Stern. A state-space model for National Football League scores. J. of the American Statistical Association, 93(441):25–35, 1998.
[19] K. Hartley and T. Sandler. Handbook of defense economics, volume 2. Elsevier, 2007.
[20] R. Herbrich, T. Minka, and T. Graepel. TrueSkill(TM): A Bayesian skill rating system. In Neural Information Processing Systems 20, pages 569–576, 2007.
[21] A. Heuer, C. Müller, and O. Rubner. Soccer: Is scoring goals a predictable Poissonian process? Eur. Phys. Lett., 89(3):38007, 2010.
[22] T.-K. Huang, R. C. Weng, and C.-J. Lin. Generalized Bradley-Terry models and multi-class probability estimates. J. Machine Learning Research, 7:85–115, 2006.
[23] L. M. Hvattum and H. Arntzen. Using ELO ratings for match result prediction in association football. Int. J. of Forecasting, 26(3):460–470, 2010.
[24] P. Jackson and M. Arkush. The last season: a team in search of its soul. Penguin Press, 2004.
[25] D. E. Knuth. The art of computer programming 2: seminumerical algorithms. Addison Wesley, 1998.
[26] S. Merritt and A. Clauset. Environmental structure and competitive scoring advantages in team competitions. Sci. Reports, 3:3067, 2013.
[27] S. Merritt and A. Clauset. Scoring dynamics across professional team sports: tempo, balance and predictability. Eur. Phys. J. Data Sci., 3(1):article 4, 2014.
[28] M. Rabin and D. Vayanos. The gambler's and hot-hand fallacies: Theory and applications. The Rev. of Economic Studies, 77(2):730–778, 2010.
[29] D. Reed and M. Hughes. An exploration of team sport as a dynamical system. Int. J. of Performance Analysis in Sport, 6(2):114–125, 2006.
[30] H. V. Ribeiro, S. Mukherjee, and X. H. T. Zeng. Anomalous diffusion and long-range correlations in the score evolution of the game of cricket. Phys. Rev. E, 86(2):022102, 2012.
[31] S. Saavedra, S. Mukherjee, and J. P. Bagrow. Is coaching experience associated with effective use of timeouts in basketball? Sci. Reports, 2:676, 2012.
[32] C. Sire and S. Redner. Understanding baseball team standings and streaks. Eur. Phys. J. B, 67(3):473–481, 2009.
[33] P. D. Staudohar. The hockey lockout of 2004-05. Monthly Lab. Rev., 128:23–29, 2005.
[34] T. Stöckl, J. Huber, M. Kirchler, and F. Lindner. Hot hand belief and gambler's fallacy in teams: Evidence from investment experiments. Technical report, University of Innsbruck, 2013.
[35] D. Tarlow, T. Graepel, and T. Minka. Knowing what we don't know in NCAA football ratings: Understanding and using structured uncertainty. MIT Sloan Sports Analytics Conf., 2014.
[36] A. Thomas. Inter-arrival times of goals in ice hockey. J. of Quantitative Analysis in Sports, 3(3):article 5, 2007.
[37] R. C. Vergin. Winning streaks in sports and the misperception of momentum. J. of Sport Behavior, 23(2):181–197, 2000.
[38] A. Wald and J. Wolfowitz. On a test whether two samples are from the same population. The Annals of Mathematical Statistics, 11(2):147–162, 1940.
[39] G. Yaari and S. Eisenmann. The hot (invisible?) hand: can time sequence patterns of success/failure in sports be modeled as repeated random independent trials? PLOS ONE, 6(10):e24532, 2011.
[40] Y. Yue, P. Lucey, P. Carr, A. Bialkowski, and I. Matthews. Learning fine-grained spatial models for dynamic sports play prediction. In Int. Conf. on Data Mining, pages 670–679, 2014.
