The super learner for time-to-event outcomes: A tutorial


Estimating risks or survival probabilities conditional on individual characteristics based on censored time-to-event data is a commonly faced task. This may be for the purpose of developing a prediction model or may be part of a wider estimation procedure, such as in causal inference. A challenge is that it is impossible to know at the outset which of a set of candidate models will provide the best risk estimates. The super learner is a powerful approach for finding the best model or combination of models (‘ensemble’) among a pre-specified set of candidate models or ‘learners’, which can include both ‘statistical’ models (e.g. parametric, semi-parametric models) and ‘machine learning’ models. Super learners for time-to-event outcomes have been developed, but the literature is technical and the full details of how these methods work and can be implemented in practice have not previously been presented in an accessible format. In this paper we provide a practical tutorial on super learner methods for time-to-event outcomes. An overview of the general steps involved in the super learner is given, followed by details of three specific implementations for time-to-event outcomes. These include the originally proposed super learner, which involves using a discrete time scale, and two more recently proposed versions of the super learner for continuous-time data. We compare the properties of the methods and provide information on how they can be implemented in R. The methods are illustrated using an open access data set and R code is provided.


💡 Research Summary

This paper provides a practical tutorial on applying the Super Learner (SL) methodology to time‑to‑event (survival) outcomes, addressing the common problem of not knowing in advance which predictive model will perform best on censored data. After introducing the notation (observed time $\tilde T = \min(T, C)$, event indicator $\Delta$, covariates $X$, and the target conditional survival probability $S(\tau|X)$), the authors outline the general SL workflow: K‑fold cross‑validation, fitting a library of $p$ candidate learners, obtaining cross‑validated predictions, computing a loss function that accounts for censoring, and either selecting the single best learner (non‑ensemble SL) or estimating non‑negative weights that combine all learners (ensemble SL). The oracle property guarantees that, asymptotically, the SL performs at least as well as the best candidate learner in the library.
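The cross‑validation workflow above can be sketched in generic Python (the paper itself works in R). The toy learners and the plain squared‑error loss here are illustrative stand‑ins: a real survival SL must replace the loss with a censoring‑aware one, as discussed below.

```python
import numpy as np

def make_mean_learner():
    """Toy learner: predicts the training-set mean for everyone."""
    def fit(X, y): return y.mean()
    def predict(model, X): return np.full(len(X), model)
    return fit, predict

def make_linear_learner():
    """Toy learner: ordinary least squares with an intercept."""
    def fit(X, y):
        Xb = np.column_stack([np.ones(len(X)), X])
        beta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
        return beta
    def predict(beta, X):
        return np.column_stack([np.ones(len(X)), X]) @ beta
    return fit, predict

def discrete_super_learner(X, y, learners, k=5, seed=0):
    """Non-ensemble ('discrete') SL: pick the learner with the lowest
    K-fold cross-validated risk, then refit it on the full data."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    cv_pred = np.zeros((len(y), len(learners)))
    for val in folds:
        train = np.setdiff1d(np.arange(len(y)), val)
        for j, (fit, predict) in enumerate(learners):
            model = fit(X[train], y[train])
            cv_pred[val, j] = predict(model, X[val])
    cv_risk = ((cv_pred - y[:, None]) ** 2).mean(axis=0)  # squared-error loss
    best = int(np.argmin(cv_risk))
    fit, _ = learners[best]
    return best, cv_risk, fit(X, y)
```

The ensemble SL differs only in the last step: instead of an argmin over learners, it regresses the outcome on the matrix of cross‑validated predictions under a non‑negativity constraint to obtain combination weights.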

Three concrete implementations are described. The first, originally proposed by Polley and van der Laan (2011), discretises follow‑up time into unit intervals, transforms the problem into a series of binary outcomes, and fits binary‑outcome learners (logistic regression, random forests, GAMs, neural nets) to estimate the discrete‑time hazard $Q(t|X)$. Survival probabilities are then obtained by multiplying across intervals: $S(t|X) = \prod_{s \le t} (1 - Q(s|X))$. This approach allows any binary classifier to be used, but can suffer from inflexibility in modelling the baseline hazard when many intervals are needed.
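The two mechanical steps of the discrete‑time approach — expanding each subject into one binary row per interval at risk (the "person‑period" format that binary classifiers are fitted to), and converting fitted hazards back into survival curves via the product above — can be sketched as follows. This is a generic Python illustration, not the paper's R code:

```python
import numpy as np

def person_period(times, events):
    """Expand (discrete event time, event indicator) pairs into one row per
    interval at risk. y = 1 only in the interval where the event occurs;
    a censored subject contributes rows up to the last interval observed,
    all with y = 0."""
    ids, intervals, ys = [], [], []
    for i, (t, d) in enumerate(zip(times, events)):
        for s in range(1, t + 1):
            ids.append(i)
            intervals.append(s)
            ys.append(1 if (d == 1 and s == t) else 0)
    return np.array(ids), np.array(intervals), np.array(ys)

def survival_from_hazard(hazard):
    """Map fitted discrete-time hazards Q(t|X) to survival probabilities
    S(t|X) = prod_{s <= t} (1 - Q(s|X)). `hazard` is (n_subjects, n_intervals)."""
    return np.cumprod(1.0 - hazard, axis=1)
```

Any binary‑outcome learner can then be fitted to `ys` using the interval index and covariates as predictors, which is what makes the discrete‑time SL library so flexible.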

The second and third implementations are recent continuous‑time versions. Westling et al. (2023) and Munch & Gerds (2024, 2025) retain the original continuous time scale, model the censoring distribution $G(t|X)$ separately, and incorporate inverse‑probability‑of‑censoring weights (IPCW) into the loss function (e.g. weighted squared error or Brier score). These methods can directly include Cox proportional hazards models, parametric survival models (Weibull, log‑normal), random survival forests, and gradient boosting machines. The Munch & Gerds version adds a log‑likelihood‑based loss and uses constrained non‑negative least squares to obtain ensemble weights that sum to one, automatically dropping learners with zero weight.
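An IPCW Brier score and the constrained non‑negative least squares step for the ensemble weights can be sketched as below. This is a simplified Python illustration under common conventions (weight $\Delta_i / G(T_i{-}|X_i)$ for subjects with an event before the horizon $\tau$, $1/G(\tau|X_i)$ for subjects still at risk at $\tau$, and zero for those censored before $\tau$); the exact loss functions used by the implementations in the paper may differ in detail:

```python
import numpy as np
from scipy.optimize import nnls

def ipcw_brier(surv_pred, times, events, tau, G_tau, G_times):
    """IPCW Brier score at horizon tau.
    surv_pred: predicted S(tau|X) per subject.
    G_tau:     estimated censoring survival G(tau|X).
    G_times:   estimated G(T_i-|X_i) at each subject's observed time."""
    y = (times > tau).astype(float)  # 1 if still event-free at tau
    w = np.where(times <= tau,
                 (events == 1) / np.maximum(G_times, 1e-12),  # event before tau
                 1.0 / np.maximum(G_tau, 1e-12))              # at risk at tau
    # subjects censored before tau get weight 0 via (events == 1) above
    return np.mean(w * (y - surv_pred) ** 2)

def ensemble_weights(cv_preds, y, w):
    """Constrained non-negative least squares: regress the (weighted) outcome
    on the matrix of cross-validated learner predictions, then normalise the
    coefficients to sum to one. Learners with coefficient 0 drop out."""
    sw = np.sqrt(w)
    coef, _ = nnls(cv_preds * sw[:, None], y * sw)
    return coef / coef.sum() if coef.sum() > 0 else coef
```

Because the weights are constrained to be non‑negative and rescaled to sum to one, the final predictor is a convex combination of the learners, which is what lets uninformative learners be dropped automatically.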

Implementation details in R are emphasized. The authors show how to split data by individuals (so all rows for a given subject stay in the same fold), fit each candidate learner on the training folds, predict on the validation folds, compute IPCW using a separate censoring model, and feed the weighted loss into the SuperLearner package. They provide code snippets that integrate the survival, riskRegression, glmnet, randomForestSRC, and xgboost packages.
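The by‑subject fold split matters because the person‑period expansion puts several rows per person into the data; assigning folds row‑wise would leak information between training and validation sets. A generic Python sketch of the idea (the paper's own code is in R):

```python
import numpy as np

def grouped_folds(subject_ids, k=10, seed=1):
    """Assign each unique subject to one of k folds, so that every row
    belonging to the same subject lands in the same fold. Subjects are
    shuffled, then dealt round-robin to keep fold sizes balanced."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.unique(subject_ids))
    fold_of = {s: i % k for i, s in enumerate(shuffled)}
    return np.array([fold_of[s] for s in subject_ids])
```

The returned per‑row fold labels can then drive the cross‑validation loop exactly as in a standard row‑wise split.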

The tutorial is illustrated with the publicly available Rotterdam breast‑cancer cohort (included in the survival package). Five learners (standard Cox, Lasso‑Cox, Weibull, random survival forest, and gradient boosting) are placed in the SL library. Both the discrete‑time SL and the two continuous‑time SLs are fitted, and performance is evaluated with Brier scores and concordance indices at a 5‑year horizon. Results show that the continuous‑time SLs consistently outperform the discrete‑time version, especially in the presence of heavy censoring, confirming the advantage of using IPCW‑adjusted loss functions and retaining the original time scale.

In the discussion, the authors highlight the flexibility of SL to combine parametric, semi‑parametric, and machine‑learning models, its data‑driven model‑selection property, and practical considerations such as choice of loss function, handling of censoring, and ensuring non‑negative weight constraints. They suggest future extensions to high‑dimensional settings, incorporation of deep‑learning survival models, and adaptation to multi‑state or competing‑risk frameworks.

Overall, the paper demystifies the technical details of survival‑specific Super Learner methods, provides reproducible R code, and demonstrates that continuous‑time SLs offer superior predictive accuracy while preserving the theoretical guarantees of the Super Learner framework.

