Time-to-Injury Forecasting in Elite Female Football: A DeepHit Survival Approach
Injury occurrence in football poses significant challenges for athletes and teams, carrying personal, competitive, and financial consequences. While machine learning has been applied to injury prediction before, existing approaches often rely on static pre-season data and binary outcomes, limiting their real-world utility. This study investigates the feasibility of using a DeepHit neural network to forecast time-to-injury from longitudinal athlete monitoring data, while providing interpretable predictions. The analysis utilised the publicly available SoccerMon dataset, containing two seasons of training, match, and wellness records from elite female footballers. Data was pre-processed through cleaning, feature engineering, and the application of three imputation strategies. Baseline models (Random Forest, XGBoost, Logistic Regression) were optimised via grid search for benchmarking, while the DeepHit model, implemented with a multilayer perceptron backbone, was evaluated using chronological and leave-one-player-out (LOPO) validation. DeepHit achieved a concordance index of 0.762, outperforming baseline models and delivering individualised, time-varying risk estimates. Shapley Additive Explanations (SHAP) identified clinically relevant predictors consistent with established risk factors, enhancing interpretability. Overall, this study provides a novel proof of concept: survival modelling with DeepHit shows strong potential to advance injury forecasting in football, offering accurate, explainable, and actionable insights for injury prevention across competitive levels.
💡 Research Summary
The paper presents a proof‑of‑concept study that applies the DeepHit deep learning survival model to forecast time‑to‑injury in elite female football players using longitudinal monitoring data. The authors leveraged the publicly available SoccerMon dataset, which contains two full seasons of training, match, and wellness records from 37 Norwegian first‑division women’s footballers. After extensive data cleaning—removing physiologically implausible GPS speeds, erroneous session durations, and outlier values—the authors engineered a set of 39 features that combine objective load metrics (e.g., daily/weekly training load, ACWR, speed zones, distance covered) with subjective wellness scores (fatigue, mood, soreness, sleep, readiness). They also created derived variables such as cumulative prior‑injury count and the proportion of missing questionnaire responses, hypothesising that incomplete reporting may signal disengagement and heightened injury risk.
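As an illustration of one of the load features named above, the acute-to-chronic workload ratio can be sketched as the mean load over a short recent window divided by the mean load over a longer window. The 7-day and 28-day windows, the function name `acwr`, and the example loads are assumptions for illustration; the summary does not state how the authors computed ACWR.

```python
from statistics import mean


def acwr(daily_loads, acute_days=7, chronic_days=28):
    """Acute-to-chronic workload ratio for the most recent day.

    Window lengths are assumptions (7 and 28 days are common choices).
    Returns None when the history is too short or the chronic load is zero.
    """
    if len(daily_loads) < chronic_days:
        return None
    acute = mean(daily_loads[-acute_days:])
    chronic = mean(daily_loads[-chronic_days:])
    if chronic == 0:
        return None
    return acute / chronic


# Hypothetical loads: three steady weeks followed by a heavier final week.
loads = [300.0] * 21 + [450.0] * 7
ratio = acwr(loads)
print(round(ratio, 3))
```

A ratio above 1 means the most recent week was heavier than the player's longer-term average, which is the kind of load spike the feature is meant to expose.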
Missing data were addressed with three imputation strategies: (1) player‑specific median imputation, (2) a bespoke teammate‑relative formula that preserves each player’s standing within the squad, and (3) linear interpolation. The authors compared the resulting distributions and correlations with injury occurrence, ultimately retaining all three strategies for downstream model comparison.
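Two of the three strategies are standard enough to sketch in a few lines. The example below is a minimal illustration, not the authors' code: the function names and the fatigue series are hypothetical, and the bespoke teammate-relative formula is omitted because the summary does not specify it.

```python
from statistics import median


def impute_player_median(values):
    """Replace None entries with the median of the player's observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing to learn from; leave the gaps as they are
    fill = median(observed)
    return [fill if v is None else v for v in values]


def impute_linear(values):
    """Fill interior gaps by linear interpolation between observed neighbours.

    Leading and trailing gaps stay None (no extrapolation is attempted).
    """
    out = list(values)
    known = [i for i, v in enumerate(out) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (out[right] - out[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = out[left] + step * (i - left)
    return out


# Hypothetical daily fatigue scores for one player; None marks a missed questionnaire.
fatigue = [3, None, None, 6, 4, None]
median_filled = impute_player_median(fatigue)
linear_filled = impute_linear(fatigue)
print(median_filled)
print(linear_filled)
```

The two strategies disagree on the same gaps: the median ignores time order, whereas interpolation follows the local trend but cannot fill a gap at the end of the series.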
For benchmarking, three conventional machine‑learning classifiers (Random Forest, XGBoost, and Logistic Regression) were tuned via grid search using a custom weighted metric that favoured F1‑score and recall, a choice intended to mitigate the severe class imbalance of 43 injuries among 4,449 player‑day observations. Oversampling of the injured class was applied, and models were evaluated across multiple look‑back windows (3, 5, 7, 10, and 14 days), each paired with a forecasting horizon of the same length, using rolling averages of the features.
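To see why a metric weighted toward F1 and recall suits this imbalance, consider the sketch below. It blends the two scores for the injury class; the equal weights, the function name `weighted_score`, and the toy labels are assumptions, since the summary does not report the actual weighting.

```python
def weighted_score(y_true, y_pred, w_f1=0.5, w_recall=0.5):
    """Blend F1 and recall for the positive (injury) class.

    The weights are hypothetical; the source does not report the actual values.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return w_f1 * f1 + w_recall * recall


# Toy labels: 2 injuries among 10 player-days.
y_true = [0] * 8 + [1, 1]
always_healthy = [0] * 10                 # 80% accurate, but finds no injuries
cautious = [0] * 6 + [1, 1] + [1, 0]      # 70% accurate, finds one of two injuries

score_healthy = weighted_score(y_true, always_healthy)
score_cautious = weighted_score(y_true, cautious)
print(score_healthy, score_cautious)
```

Plain accuracy would prefer the model that never predicts an injury; the blended score ranks the cautious model higher because it rewards finding the rare positive class.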
The DeepHit model was built with a multilayer perceptron (MLP) backbone rather than a recurrent architecture, due to limited sequence length. Input windows spanned 21 days, and the model predicted injury risk over a 7‑day horizon. Each observation was labelled with a decreasing “time‑to‑event” value (7 → 1 days before injury) and a binary event indicator (injury = 1, censored = 0). All features were standardised before training. Model performance was primarily assessed with the concordance index (C‑index), which measures the ability to correctly rank individuals by relative risk—a metric directly relevant to coaching decisions.
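The labelling scheme described above can be sketched as follows. Days that fall within the horizon before an injury receive a countdown and an event indicator of 1. How the authors labelled the remaining days is not stated, so this sketch assumes they are censored at the horizon; the function name `label_days` and the day indices are hypothetical.

```python
def label_days(n_days, injury_days, horizon=7):
    """Return a (time_to_event, event) pair for each day index 0..n_days-1.

    Days with an injury within the next `horizon` days get the number of days
    remaining (horizon down to 1) and event = 1. Every other day, including
    the injury day itself, is treated as censored at the horizon
    (an assumption): time = horizon, event = 0.
    """
    labels = []
    for day in range(n_days):
        gaps = [inj - day for inj in injury_days if 1 <= inj - day <= horizon]
        if gaps:
            labels.append((min(gaps), 1))  # nearest upcoming injury wins
        else:
            labels.append((horizon, 0))
    return labels


# Hypothetical player with a single injury recorded on day 10.
labels = label_days(n_days=12, injury_days=[10])
for day, (time_to_event, event) in enumerate(labels):
    print(day, time_to_event, event)
```

In a real pipeline the injury day and any subsequent rehabilitation days would probably be excluded rather than censored; that choice is left open here.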
Two validation schemes were employed. First, a chronological split mimicked real‑time deployment, training on earlier weeks and testing on later weeks. Second, a leave‑one‑player‑out (LOPO) cross‑validation iteratively held out each athlete, training on the remaining 36 and testing on the excluded player. The DeepHit model achieved a C‑index of 0.762 on the chronological split, outperforming the baseline classifiers (which ranged roughly between 0.68 and 0.71). In LOPO validation, most players retained C‑indices above 0.70, indicating good generalisability and limited over‑fitting to individual injury histories.
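The evaluation logic can be illustrated with a simplified Harrell-style concordance index and a leave-one-player-out fold generator. This is a sketch under stated assumptions: pairs with tied event times are skipped, tied risk scores count as half-concordant, and all data values and function names are hypothetical rather than taken from the study.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Fraction of comparable pairs in which the earlier event has the higher risk."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied times are skipped in this simplified version
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # the earlier time is censored, so the true order is unknown
        comparable += 1
        if risks[first] > risks[second]:
            concordant += 1.0
        elif risks[first] == risks[second]:
            concordant += 0.5
    return concordant / comparable if comparable else float("nan")


def lopo_folds(player_ids):
    """Yield (held_out_player, train_indices, test_indices) for each player."""
    for player in sorted(set(player_ids)):
        train = [i for i, p in enumerate(player_ids) if p != player]
        test = [i for i, p in enumerate(player_ids) if p == player]
        yield player, train, test


# Hypothetical observations: time-to-event, event flag, and predicted risk.
times = [2, 5, 7, 7]
events = [1, 1, 0, 0]
risks = [0.9, 0.4, 0.6, 0.1]
c_index = concordance_index(times, events, risks)

folds = list(lopo_folds(["a", "a", "b", "c", "b"]))
print(c_index, folds[0])
```

In a LOPO run, the model would be refitted on each fold's training indices and `concordance_index` computed on the held-out player's rows, which yields one score per player. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.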
Interpretability was addressed using SHAP (Shapley Additive Explanations). For the highest‑performing players and for days identified as high‑risk, SHAP values highlighted several clinically plausible risk factors: (1) acute‑to‑chronic workload ratio (ACWR), (2) cumulative prior‑injury count, (3) subjective fatigue and muscle soreness scores, (4) training monotony and strain, and (5) the proportion of missing wellness questionnaires. These findings align with established sports‑medicine literature, reinforcing the model’s credibility. Notably, the missing‑questionnaire metric emerged as a novel indicator that may reflect reduced engagement or inadequate recovery, warranting further investigation.
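For intuition about what a SHAP attribution represents, the sketch below computes exact Shapley values for a toy risk score by enumerating every feature subset. It is not the SHAP library, which relies on approximations and a background dataset; here a single baseline vector stands in for "feature absent", and the linear model, its weights, and the feature values are all hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one input by enumerating feature subsets.

    Features outside a subset are replaced by their baseline value.
    Cost grows as 2**n_features, so this is only practical for tiny examples.
    """
    n = len(x)

    def value(subset):
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical linear risk score over [ACWR, prior injuries, fatigue].
weights = [0.8, 0.3, 0.1]


def toy_risk(features):
    return sum(w * f for w, f in zip(weights, features))


player_day = [1.6, 2.0, 4.0]   # the day being explained
squad_mean = [1.0, 1.0, 3.0]   # baseline standing in for "feature absent"
phi = shapley_values(toy_risk, player_day, squad_mean)
print([round(p, 3) for p in phi])
```

The attributions sum to the difference between the explained prediction and the baseline prediction. For a linear model each one reduces to the weight times the feature's deviation from the baseline, which makes the toy result easy to verify by hand.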
The authors acknowledge several limitations. The dataset is modest in size (37 athletes, two seasons), restricting external validation and potentially inflating performance estimates. Injury labels were derived from team injury reports rather than independent medical diagnoses, introducing possible misclassification. The use of an MLP backbone, while pragmatic, may not fully capture complex temporal dependencies; future work could explore recurrent or transformer‑based DeepHit variants when larger longitudinal datasets become available. Additionally, the three imputation strategies yielded slightly different performance, suggesting that handling of missing data remains a critical factor.
In conclusion, the study demonstrates that DeepHit survival modelling can provide accurate, time‑sensitive injury risk forecasts for elite female footballers, surpassing traditional binary classifiers and delivering interpretable insights via SHAP. By delivering daily risk estimates rather than season‑long binary predictions, the approach offers actionable information for coaches and medical staff to tailor training loads, recovery protocols, and player‑specific interventions in near real‑time. This work thus bridges a gap between advanced machine‑learning techniques and practical injury‑prevention strategies in women’s sport, and it sets the stage for larger‑scale, multi‑team investigations that could further refine and validate the methodology.