Ensemble Machine Learning and Statistical Procedures for Dynamic Predictions of Time-to-Event Outcomes
Nina van Gerwen 1,2, Sten Willemsen 1,2, Bettina E. Hansen 1,2, Christophe Corpechot 3, Marco Carbone 4, Cynthia Levy 5, Maria-Carlota Londoño 6, Atsushi Tanaka 7, Palak Trivedi 8, Alejandra Villamil 9, Gideon Hirschfield 10, Dimitris Rizopoulos 1,2

1 Department of Biostatistics, Erasmus University Medical Center, the Netherlands
2 Department of Epidemiology, Erasmus University Medical Center, the Netherlands
3 Reference Center for Inflammatory Biliary Diseases and Autoimmune Hepatitis, European Reference Network on Hepatological Diseases (ERN Rare-Liver), Saint-Antoine Hospital, Assistance Publique - Hôpitaux de Paris; Inserm UMR S938, Saint-Antoine Research Center, Sorbonne University, Paris, France
4 Department of Medicine and Surgery, University of Milan Bicocca, Italy
5 Miller School of Medicine, University of Miami, United States of America
6 Viral, Toxic and Metabolic Hepatopathies, Hospital Clínic de Barcelona, Spain
7 Department of Medicine, Teikyo University School of Medicine, Japan
8 Biomedical Research Unit & Centre for Liver Research, University of Birmingham, United Kingdom
9 Division of Gastroenterology & Hepatology, Hospital Italiano de Buenos Aires, Argentina
10 Centre for Liver Disease, Toronto General Hospital, Canada

February 24, 2026

Correspondence: Erasmus MC, attn. Nina van Gerwen, Wytemaweg 12, 3015 CN, Rotterdam, the Netherlands

Acknowledgements: We would like to thank the Global PBC Study Group for granting us availability of their data for the current article.

Statements and declarations

Ethical considerations: Ethical approval for the analysis and publication of the retrospectively obtained and anonymised data of the Global PBC Study Group was not required.

Consent to participate: Not applicable.

Consent for publication: Not applicable.
Declaration of conflicting interest: The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding statement: This work is funded by Stress in Action. The research project 'Stress in Action' (www.stress-in-action.nl) is financially supported by the Dutch Research Council and the Dutch Ministry of Education, Culture and Science (NWO gravitation grant number 024.005.010).

Data Availability Statement: The data that support the findings of this study are available from the Global PBC Study Group. Restrictions apply to the availability of these data, which were used under license for this study. Data are available at https://www.globalpbc.com with the permission of the Global PBC Study Group.

Abstract

Dynamic predictions for longitudinal and time-to-event outcomes have become a versatile tool in precision medicine. Our work is motivated by the application of dynamic predictions in the decision-making process for primary biliary cholangitis patients. For these patients, serial biomarker measurements (e.g., bilirubin and alkaline phosphatase levels) are routinely collected to inform treating physicians of the risk of liver failure and guide clinical decision-making. Two popular statistical approaches to derive dynamic predictions are joint modelling and landmarking. However, recently, machine learning techniques have also been proposed. Each approach has its merits, and no single method exists that outperforms all others. Consequently, obtaining the best possible survival estimates is challenging. Therefore, we extend the Super Learner framework to combine dynamic predictions from different models and procedures. Super Learner is an ensemble learning technique that allows users to combine different prediction algorithms to improve predictive accuracy and flexibility.
It uses cross-validation and different objective functions of performance (e.g., squared loss) that suit specific applications to build the optimally weighted combination of predictions from a library of candidate algorithms. In our work, we pay special attention to appropriate objective functions for Super Learner to obtain the optimal weighted combination of dynamic predictions. In our primary biliary cholangitis application, Super Learner presented unique benefits due to its ability to flexibly combine outputs from a diverse set of models with varying assumptions for equal or better predictive performance than any model fit separately.

Keywords: dynamic predictions, ensemble learning, joint models, landmarking, predictive accuracy

1 Introduction

Dynamic predictions for longitudinal and time-to-event outcomes, where predictions are updated as more information is collected over time, have become a versatile tool in precision medicine for numerous diseases. 1,2,3,4 Our work is motivated by primary biliary cholangitis (PBC), a slowly progressive, chronic autoimmune liver disease. There is currently no cure available for the disease, and a large proportion of patients have an inadequate response to the current clinical guidance treatment, ursodeoxycholic acid (UDCA) therapy. 5 Thus, after diagnosis, PBC patients are routinely followed up for the remainder of their lives to track disease progression and identify patients at risk of liver failure (either an acute deterioration in liver function or the need for liver transplantation). For these patients, several liver-related biomarkers are routinely measured. Physicians then use patients' biomarker values to guide clinical decision-making, such as who should receive a liver transplantation or be more closely followed up.
To help make these critical decisions for PBC patients, a survival model that gives highly accurate dynamic time-to-event predictions is paramount. Currently, the two most commonly used procedures for dynamic time-to-event prediction are joint modelling 6 and landmarking, 7 though more methods exist (e.g., Dynamic-DeepHit, 8 a neural network based approach). However, ensuring that a single model or procedure provides the most accurate dynamic predictions can be challenging for several reasons. Foremost, with joint modelling or landmarking, researchers need a considerable amount of a-priori information on different aspects of model specification, such as variable inclusion, model terms and the functional forms that link the two outcomes. For example, to accurately predict disease progression in PBC patients, a researcher should know: (a) which risk factors, biomarkers and interactions to include in the model; (b) the trajectory over time of the different biomarkers (e.g., whether the biomarkers show linear or nonlinear trends over time within patients); and (c) how the various biomarkers are related to disease progression (e.g., whether the current biomarker concentration in the patients' blood or the rate at which the biomarker concentration increases predicts disease progression). Commonly, researchers use standard model selection techniques (e.g., information criteria or significance tests) to answer these questions by choosing the model that displays the best fit. However, in the context we consider, joint models and landmarking cannot be directly compared because they are estimated using different likelihood functions. Furthermore, the likelihood of joint models is often dominated by the longitudinal outcome, and information criteria may not focus on optimising the predictive accuracy of the time-to-event outcome.
Researchers can also use metrics of predictive accuracy (e.g., the Brier score (BS) or the area under the receiver operating characteristic curve (AUC)) to determine the best model. However, these metrics have been shown not to be sensitive enough to detect differences between models. 9,10 Finally, it is worth noting that because we work with multiple outcomes, the set of possible statistical models increases exponentially, further complicating model specification. We propose to use ensemble learning techniques in the setting of dynamic predictions to overcome the issues described above. In ensemble learning, the predictive performance of multiple models is combined to provide better and more flexible predictions than any single model. We extend the ensemble learning technique Super Learner 11 (SL) in the context of dynamic prediction. Previously, Rizopoulos et al. 12 have shown how SL can combine multiple joint models to accommodate uncertainty in the association between the longitudinal and time-to-event outcome. However, they focused on combining exclusively joint models to accommodate different longitudinal trajectories and functional forms of the association between the two outcomes, and they paid less attention to different frameworks. Furthermore, joint models are known to be a computationally intensive method, and using solely joint models in an ensemble for SL explosively increases computational complexity. Lastly, joint models need to be fully pre-specified in covariate selection and model terms. Tanner et al. 13 took another approach to SL for dynamic predictions. By discretising time, they showed how SL can combine binary classification algorithms to estimate dynamic predictions. Even though this results in a very flexible algorithm with data-driven covariate selection, discretising time brings the disadvantages of information loss and subjectivity in the size of the discrete time intervals.
Both papers also focused on optimising predictive accuracy by minimising the BS or the expected predictive cross-entropy. Although SL has been extended to optimise the time-invariant AUC for binary classification problems, 14 SL for dynamic predictions has not been extended to optimising the time-varying AUC 15 (tv-AUC). Therefore, our work builds upon these methods by combining the predictive performance of different dynamic prediction procedures in continuous time for the optimisation of previously used and new metrics. We also further investigate the performance of SL in varying settings of longitudinal and time-to-event outcomes. The remainder of the paper is organised as follows: Section 2 formally details the methods for SL with dynamic predictions. In Section 3, we employ SL for dynamic predictions in the Global PBC dataset. Section 4 introduces a simulation study that examines the performance of SL for dynamic predictions with varying forms of censoring. Lastly, Section 5 presents the findings and practical implications.

2 Super Learner for dynamic predictions

Suppose we have a sample $\mathcal{D}_n = \{T_i, \delta_i, \boldsymbol{y}_i;\ i = 1, \dots, n\}$ of size $n$ from our target population. Let $T^*_i$ denote the true event time for subject $i$. We denote $T_i = \min(T^*_i, C_i)$ as the observed event time, where $C_i$ denotes the censoring time. The event indicator $\delta_i = \mathbb{1}(T_i \le C_i)$, in which $\mathbb{1}(\cdot)$ is an indicator function that equals 1 when the condition within holds and 0 otherwise, tells us whether individual $i$ was censored or experienced the event of interest. Finally, $\boldsymbol{y}_i = (y_{i1}, \dots, y_{iM})$ is a vector of $m = 1, \dots, M$ longitudinal outcomes of response- and subject-specific size $k_{m,i}$, where $y_{m,il}$ is the value of the $m$th longitudinal response at time point $t_{m,il}$, $l = 1, \dots, k_{m,i}$. These time points need not be at regular intervals and can differ per subject and longitudinal outcome.
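As a tiny illustration of this notation, the observed data $(T_i, \delta_i)$ can be assembled from hypothetical true event and censoring times (all values below are invented):

```python
# Observed-data notation: T_i = min(T*_i, C_i) and delta_i = 1(T_i <= C_i).
# All values are made up for illustration.

true_event = [5.2, 3.1, 8.4]   # T*_i, the true event times
censoring  = [6.0, 2.5, 9.9]   # C_i, the censoring times

T     = [min(ts, c) for ts, c in zip(true_event, censoring)]   # observed times
delta = [int(ts <= c) for ts, c in zip(true_event, censoring)] # event indicators
```

Here the second subject is censored ($\delta_2 = 0$) because censoring occurred before the true event time.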
We assume right-censoring occurred where subjects are lost to follow-up. The objective is to make accurate survival predictions for a new subject $j$ from the target population. This subject should then also have sets of longitudinal measurements up to $t$: $\boldsymbol{y}_j(t)$, which concurrently implies survival of subject $j$ up to $t$. Therefore, we focus on survival predictions conditional on survival at $t$. Mathematically, we are interested in the survival probability at time $u > t$:

$$\pi_j(u \mid t) = \Pr\{T^*_j \ge u \mid T^*_j > t, \boldsymbol{y}_j(t), \mathcal{D}_n\}. \quad (1)$$

This probability can then be updated whenever a new measurement for subject $j$ becomes available at $t^+ > t$ to obtain $\pi_j(u \mid t^+)$. Different methods to estimate (1) are discussed in Section 2.2. In SL, we assume that we have a set $\mathcal{K} = \{\mathcal{M}_1, \dots, \mathcal{M}_K\}$ of $K$ base learners for $\mathcal{D}_n$ that can estimate (1). Importantly, these base learners come from different modelling frameworks (e.g., landmarking, joint modelling, etc.) and vary in specification within a framework (e.g., different baseline covariates, different model terms, etc.). The process of SL works as follows. First, we split $\mathcal{D}_n$ into $v = 1, \dots, V$ folds and obtain cross-validated estimates of $\pi_i(u \mid t)$ by fitting each base learner in $\mathcal{K}$ on $V - 1$ folds and estimating predictions on the $v$th fold excluded from model fitting. This is repeated for each fold. We end up with $K$ predictions for all $i \in \mathcal{R}(t)$, where $\mathcal{R}(t)$ denotes the set of individuals still at risk at time $t$ (i.e., $T_i > t$):

$$\hat{\pi}^v_i(u \mid t, \mathcal{M}_k) = \Pr(T^*_i \ge u \mid T^*_i > t, \mathcal{M}_k, \mathcal{D}^{-v}_n),$$

where $\mathcal{M}_k$ is base learner $k \in \mathcal{K}$ and $\mathcal{D}^{-v}_n$ is $\mathcal{D}_n$ with individuals from the $v$th fold removed. Then we assign weights $\omega_k(t)$ to the estimated cross-validated predictions. SL optimises predictive accuracy by estimating $\omega_k(t)$ such that an a-priori chosen loss function $\lambda(\cdot)$ is minimised.
The result is an ensemble SL (eSL):

$$\tilde{\pi}^v_i(u \mid t) = \sum_{k=1}^{K} \hat{\omega}_k(t)\, \hat{\pi}^v_i(u \mid t, \mathcal{M}_k), \quad \text{for } v = 1, \dots, V, \quad (2)$$

in which

$$\hat{\omega}_k(t) = \operatorname*{argmin}_{\omega_k(t)} \sum_{i \in \mathcal{R}(t)} \lambda\Big\{\sum_{k=1}^{K} \omega_k(t)\, \hat{\pi}^v_i(u \mid t, \mathcal{M}_k),\ T_i\Big\}, \quad \text{subject to } \omega_k(t) \ge 0 \ \text{and} \ \sum_{k=1}^{K} \omega_k(t) = 1.$$

Choosing an appropriate $\lambda(\cdot)$ that fits the research objective is important, as this will define the estimation of $\omega_k(t)$. We focus on three possible functions that estimate performance for dynamic predictions: the BS, the integrated BS (IBS) and the tv-AUC. In Section 2.1, we describe these functions in more detail. After estimation of $\omega_k(t)$, SL can estimate (1) by fitting the base learners on the complete dataset to estimate $\pi_j(u \mid t, \mathcal{M}_k)$, and then weighing these dynamic survival probabilities by the estimated $\hat{\omega}_k(t)$. A critical issue in SL is specifying $\mathcal{K}$. According to the philosophy of SL, users should add all possible model specifications to $\mathcal{K}$, because increasing the library size can only improve the accuracy of SL due to its oracle property. The oracle property states that as $n$ and $K$ approach infinity, the eSL performs asymptotically as well as the oracle model (OM), which is the model that would perform best in an independent test dataset. The OM is typically unknown in observational studies such as the Global PBC Study, showing the property's desirability. Finally, in the setting of dynamic predictions, two more interlinked hyperparameters affect the performance of SL: the prediction window $\{t, u\}$ and the number of folds $V$. Stable estimation of the predictive accuracy metrics requires a sufficient number of events within the prediction window in all folds to differentiate between event-havers and non-event-havers. If the number of events in a fold is too low, it becomes difficult for SL to determine which base learner performs best due to the unstable estimates.
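The constrained minimisation that defines $\hat{\omega}_k(t)$ can be sketched in a minimal way: below, a brute-force grid search over the probability simplex combines three hypothetical base learners under a plain squared loss. This is only an illustration of the optimisation in (2); IPCW weighting is omitted and all data and names are invented.

```python
# Sketch of the simplex-constrained weight estimation in (2): grid search
# over weights (omega_k >= 0, sum omega_k = 1) for K = 3 toy base learners,
# minimising a squared (Brier-type) loss against the survival indicator.
from itertools import product

def ensemble_weights(preds, outcome, step=0.02):
    """preds: list of K lists of cross-validated survival probabilities.
    outcome: 1 if the subject survived past u, else 0.
    Returns the simplex weights minimising the squared loss."""
    K, n = len(preds), len(preds[0])
    best, best_w = float("inf"), None
    grid = [round(step * g, 10) for g in range(int(1 / step) + 1)]
    for w in product(grid, repeat=K - 1):
        last = 1.0 - sum(w)            # remaining mass for the last learner
        if last < -1e-9:
            continue                   # outside the simplex
        weights = list(w) + [last]
        loss = sum(
            (sum(weights[k] * preds[k][i] for k in range(K)) - outcome[i]) ** 2
            for i in range(n)
        )
        if loss < best:
            best, best_w = loss, weights
    return best_w

# Toy cross-validated predictions: learner 0 is well calibrated,
# learner 1 is uninformative, learner 2 is anti-correlated.
surv = [1, 0, 1, 1, 0, 0, 1, 0]
preds = [
    [0.9, 0.1, 0.8, 0.85, 0.2, 0.15, 0.9, 0.1],
    [0.5] * 8,
    [0.1, 0.9, 0.2, 0.15, 0.8, 0.85, 0.1, 0.9],
]
w = ensemble_weights(preds, surv)   # puts essentially all weight on learner 0
```

In practice a constrained optimiser (rather than a grid) would be used, and the loss would be one of the IPCW-weighted metrics of Section 2.1.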
If the number of events is low, dividing the data into fewer folds, where each fold has approximately the same number of events, likely leads to more stable results than using a larger number of folds. To help achieve stable results, we create folds stratified on the event outcome. 16 After specifying $\lambda(\cdot)$, $\mathcal{K}$, $\{t, u\}$ and $V$, there are two types of SL to choose from. First is the eSL, where we optimally weigh the $K$ models by minimising $\lambda(\cdot)$ as shown in (2). Second is the discrete SL (dSL), in which we choose the $\mathcal{M}_k$ with the lowest cross-validated average $\lambda(\cdot)$. In other words, the dSL is the base learner that performed best in the validation set. An added benefit of working with SL for dynamic predictions is that the weights become time-dependent and can be optimised for any prediction window $\{t, u\}$. For example, in a hypothetical scenario where model A gives the most accurate predictions for $0 \le t < 10$ and model B gives the most accurate predictions for $t \ge 10$, SL will properly reflect these changes when optimising predictive accuracy in the different prediction windows.

2.1 Loss functions for Dynamic Predictions

To estimate $\omega_k(t)$ for the eSL, we use the BS, IBS and tv-AUC. Other possible options as loss functions are the expected predictive cross-entropy and the mean absolute error. We focus on the BS, IBS and tv-AUC because they are regularly used to evaluate the accuracy of dynamic predictions. A challenge in the estimation of these three predictive accuracy measures is censoring. In particular, we focus on subjects at risk at time $t$ and calculate the accuracy measures at $u > t$. We account for censored observations in the interval $(t, u]$ using inverse probability of censoring weighting. These weights are estimated by

$$W_i(u, t) = \frac{\mathbb{1}(t < T_i \le u)\, \delta_i}{G(T_i \mid t)} + \frac{\mathbb{1}(T_i > u)}{G(u \mid t)}, \quad (3)$$

where $G(\cdot)$ is the distribution of the censoring times.
We estimate $G(\cdot)$ using the Kaplan-Meier estimator, assuming independent censoring. In (3), we see that patients who experience an event within the prediction window are weighed by the inverse probability of being censored at $T_i$. Patients who experience an event after the prediction window are weighed by the inverse probability of being censored at $u$, and patients censored in the interval are given zero weight. The BS measures predictive accuracy for time-to-event outcomes by balancing model discrimination and calibration. 17 For dynamic predictions, the BS is estimated by

$$\text{BS}(u \mid t) = \frac{1}{n_{\mathcal{R}(t)}} \sum_{i \in \mathcal{R}(t)} \hat{W}_i(u, t) \big\{\mathbb{1}(T_i > u) - \tilde{\pi}^v_i(u \mid t)\big\}^2,$$

where $n_{\mathcal{R}(t)}$ is the number of patients in $\mathcal{R}(t)$ and $\tilde{\pi}^v_i(u \mid t)$ is the estimate of (2). The BS estimates the error only at the end of the prediction window. If the prediction error over the whole prediction interval has more medical relevance, we can estimate the IBS, approximated by Simpson's rule: 18

$$\text{IBS}(u \mid t) = \frac{1}{u - t} \int_t^u \text{BS}(s \mid t)\, ds \approx \frac{2}{3} \text{BS}\Big(\frac{t + u}{2} \,\Big|\, t\Big) + \frac{1}{6} \text{BS}(u \mid t).$$

Lastly, the tv-AUC measures only model discrimination. The metric is interpreted as the proportion of individuals in the prediction window correctly ranked in terms of survival probability had no one been censored, i.e., the proportion of event-havers who have a lower predicted survival probability than non-event-havers. The tv-AUC for prediction window $\{t, u\}$ is estimated through the formula

$$\text{tv-AUC}(u, t) = \frac{\sum_{i \in \mathcal{R}(t)} \sum_{j \in \mathcal{R}(t)} \mathbb{1}\big\{\hat{\tilde{\pi}}^v_i(u \mid t) < \hat{\tilde{\pi}}^v_j(u \mid t)\big\}\, D_i(u, t)\, \{1 - D_j(u, t)\}\, \hat{W}_i(u, t)\, \hat{W}_j(u, t)}{\sum_{i \in \mathcal{R}(t)} \sum_{j \in \mathcal{R}(t)} D_i(u, t)\, \{1 - D_j(u, t)\}\, \hat{W}_i(u, t)\, \hat{W}_j(u, t)},$$

where $D_i(u, t) = \mathbb{1}(t < T_i \le u)\, \delta_i$. In contrast with the BS and IBS, we maximise the tv-AUC for the estimation of $\omega_k(t)$.
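These ingredients can be sketched compactly: the censoring distribution $G$ is estimated by Kaplan-Meier on the censoring times (event indicator flipped) and conditioned on survival to $t$, after which the weights from (3) and an IPCW-weighted dynamic Brier score follow directly. The data and function names below are invented, and tie-handling and left-continuity conventions are deliberately ignored.

```python
# Minimal sketch of IPCW weights (3) and the dynamic Brier score.
# Toy data only; a real implementation would handle ties and G(T-).

def G_cens(s, times, delta):
    """Kaplan-Meier estimate of the censoring survival function at s."""
    surv = 1.0
    for s_k in sorted({T for T, d in zip(times, delta) if d == 0 and T <= s}):
        at_risk = sum(1 for T in times if T >= s_k)
        n_cens = sum(1 for T, d in zip(times, delta) if T == s_k and d == 0)
        surv *= 1.0 - n_cens / at_risk
    return surv

def ipcw(Ti, di, t, u, times, delta):
    """Weight W_i(u, t), with G(. | t) = G(.) / G(t)."""
    if t < Ti <= u and di == 1:
        return G_cens(t, times, delta) / G_cens(Ti, times, delta)
    if Ti > u:
        return G_cens(t, times, delta) / G_cens(u, times, delta)
    return 0.0  # censored inside the prediction window

def brier(t, u, times, delta, surv_probs):
    """IPCW-weighted BS(u | t) over the risk set R(t)."""
    risk = [i for i, T in enumerate(times) if T > t]
    return sum(
        ipcw(times[i], delta[i], t, u, times, delta)
        * ((1 if times[i] > u else 0) - surv_probs[i]) ** 2
        for i in risk
    ) / len(risk)

times = [1.0, 2.0, 5.0, 6.0, 7.0, 8.0]
delta = [0, 1, 1, 0, 1, 1]
pi = [0.9, 0.8, 0.0, 0.3, 1.0, 1.0]   # predicted Pr(T* >= 6 | T* > 3)
bs = brier(t=3.0, u=6.0, times=times, delta=delta, surv_probs=pi)
```

With these perfectly calibrated toy predictions the score is zero; miscalibrated predictions raise it accordingly.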
2.2 Base learners for dynamic predictions

We include base learners from three procedures in the model library of SL. The first approach we use is landmarking, a computationally simple method to obtain dynamic predictions. The concept behind landmarking is to subset $\mathcal{R}(t)$ from $\mathcal{D}_n$. Then, a time-to-event model is fit to $\mathcal{R}(t)$, where time zero denotes the landmark time $t$. In landmarking, a function $f(\cdot)$ of the longitudinal outcome $\boldsymbol{y}_i$ is added to the time-to-event model as a baseline covariate. For example, if we landmark with a Cox proportional hazards model 19 (denoted as Cox model or Cox regression from here on), the hazard function is:

$$h_i(t) = h_0(t) \exp\Big\{\gamma^\top \boldsymbol{w}_i + \sum_{m=1}^{M} \alpha_m f(y_{im})\Big\},$$

where the baseline hazard $h_0(\cdot)$ is estimated non-parametrically, $\gamma$ denotes the coefficients of the baseline covariates $\boldsymbol{w}_i$, and $\alpha_m$ is the coefficient of the function $f(\cdot)$ of longitudinal outcome $y_{im}$. Multiple versions of landmarking exist, with differing amounts of complexity and accuracy. The most basic version is one-stage landmarking, where $f(\cdot)$ denotes the last available value of $y_{im}$, which is carried forward (LVCF), i.e.:

$$f(y_{im}) = y_{m,ip}, \quad p = \max\{l : t_{m,il} \le t\}.$$

One-stage landmarking has multiple issues, however. For example, with LVCF, we assume there is no measurement error. Furthermore, we assume that the risk of an event depends only on the last observed value, and we ignore the time difference between the last measurement and the landmark time and its impact on the longitudinal outcome. One-stage landmarking has been extended to two-stage landmarking to compensate for these issues. In two-stage landmarking, we take a two-step approach. First, a linear mixed model is fit to model the longitudinal outcome.
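The data-preparation step of one-stage landmarking, subsetting $\mathcal{R}(t)$ and carrying the last value forward, can be sketched as follows. The records and field names are invented, and the subsequent Cox fit on the resulting dataset is omitted.

```python
# Sketch of the one-stage landmarking data step: keep only subjects still
# at risk at landmark time t, reset time zero to t, and use the last
# biomarker value observed at or before t (LVCF) as a baseline covariate.

def landmark_dataset(subjects, t):
    """subjects: list of dicts with 'T' (observed time), 'delta', and
    'obs', a list of (measurement_time, value) pairs."""
    out = []
    for s in subjects:
        if s["T"] <= t:
            continue                 # not in the risk set R(t)
        past = [(tm, y) for tm, y in s["obs"] if tm <= t]
        if not past:
            continue                 # no biomarker history before t
        lvcf = max(past)[1]          # value at the latest time <= t
        out.append({"T": s["T"] - t, "delta": s["delta"], "x": lvcf})
    return out

subjects = [
    {"T": 2.0, "delta": 1, "obs": [(0, 1.1), (1, 1.4)]},          # event before t
    {"T": 8.0, "delta": 0, "obs": [(0, 0.9), (2.5, 1.0), (6, 1.8)]},
    {"T": 5.0, "delta": 1, "obs": [(0, 2.1), (2, 2.6)]},
]
lm = landmark_dataset(subjects, t=3.0)   # first subject is excluded
```

Note that the measurement at time 6 for the second subject is ignored: only information available at the landmark time may enter the model.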
The general form of a linear mixed model for continuous $y_{im}$ is

$$E\{y_{im}(t) \mid \boldsymbol{b}_{im}\} = \eta_{im}(t) = \boldsymbol{x}^\top_{im}(t)\, \beta_m + \boldsymbol{z}^\top_{im}(t)\, \boldsymbol{b}_{im}, \quad \boldsymbol{b}_{im} \sim N(\boldsymbol{0}, D_m), \quad (4)$$

where $\eta_{im}(t)$ is the expected latent trajectory of the $m$th longitudinal outcome for individual $i$ over time, and $\boldsymbol{x}_{im}(t)$ and $\boldsymbol{z}_{im}(t)$ denote time-varying design vectors for the fixed effects $\beta_m$ and random effects $\boldsymbol{b}_{im}$, respectively. In the second step, the linear mixed model is used to get an estimate of $\eta_{im}$ at $t$ to include in the two-stage landmark model: $f(y_{im}) = \hat{\eta}_{im}(t)$. For all Cox regression landmark models, we estimate (1) using the Breslow estimator: 20

$$\hat{\pi}_j(u \mid t) = \exp\Big[-\hat{H}_0(u) \exp\Big\{\gamma^\top \boldsymbol{w}_j + \sum_{m=1}^{M} \alpha_m f(y_{jm})\Big\}\Big],$$

in which

$$\hat{H}_0(u) = \sum_{i \in \mathcal{R}(t)} \frac{\mathbb{1}(T_i \le u)\, \delta_i}{\sum_{j \in \mathcal{R}(t)} \exp\big\{\gamma^\top \boldsymbol{w}_j + \sum_{m=1}^{M} \alpha_m f(y_{jm})\big\}}.$$

Note that more extensions have been made to landmark models to further improve their predictive accuracy. 21,22,23 Furthermore, any time-to-event landmark model can be added as a base learner to the SL ensemble (e.g., a parametric accelerated failure time landmark model). However, we adhere to Cox regression one-stage and two-stage landmarking in the current article. The second approach for base learners we employ is a generalisation of a standard Cox landmark regression: a tree-based gradient-boosted Cox regression with landmarking. Gradient boosting is another form of ensemble learning, and by including these models as base learners in our SL ensemble we perform nested ensemble learning. For clarity, we denote the base learners used within gradient-boosted models as sub-base learners. A tree-based gradient-boosted Cox regression overcomes the issue of requiring a-priori information on covariate inclusion (baseline and longitudinal), model terms and interactions.
The standard Cox landmark model also assumes a linear relationship between the logarithm of the hazard and the covariates, whereas a gradient-boosted version can estimate this relationship non-parametrically. Compared to SL, gradient boosting arrives at a final model in a different manner. In SL, the ensemble is achieved through a one-time optimisation problem, shown in (2). In tree-based gradient boosting of a Cox model with landmarking, we arrive at an optimal model $\mathcal{M}_k \in \mathcal{K}$ by minimising the negative partial log-likelihood solely for $i \in \mathcal{R}(t)$ through an iterative process with the landmark time as time zero. First, we initialise our tree-based prediction function relying on $\boldsymbol{X} = (X_1, \dots, X_{n_{\mathcal{R}(t)}})$ with $\hat{\psi}_0(\boldsymbol{X})$. Here, $X_i$ is short-hand notation for the baseline and longitudinal variables of individual $i$ at the landmark time, and $X_{n_{\mathcal{R}(t)}}$ denotes that we only consider individuals included in $\mathcal{R}(t)$. Note that the longitudinal outcomes in $\boldsymbol{X}$ can be derived through either one-stage or two-stage landmarking. Subsequently, in each iteration $b = 1, \dots, b_{\text{stop}}$, the negative gradient vector $\boldsymbol{u}_b = (u_{1,b}, \dots, u_{n_{\mathcal{R}(t)},b})$ is computed. For individual $i$ in iteration $b$, the value of the negative gradient vector is derived by

$$u_{i,b} = \delta_i - \sum_{l \in \mathcal{R}(t)} \mathbb{1}(T_l \le T_i)\, \delta_l\, \frac{\exp\big\{\hat{\psi}_{b-1}(X_i)\big\}}{\sum_{k \in \mathcal{R}(t)} \mathbb{1}(T_k \ge T_l) \exp\big\{\hat{\psi}_{b-1}(X_k)\big\}},$$

where $\hat{\psi}_{b-1}(\cdot)$ is the estimated tree-based prediction function of the boosting algorithm at iteration $b - 1$. Afterwards, the sub-base learners predict $\boldsymbol{u}_b$. The prediction $\hat{\boldsymbol{u}}_b$ from the sub-base learner that best predicts $\boldsymbol{u}_b$ according to the least squares criterion is then used to update the prediction function: $\hat{\psi}_b(\boldsymbol{X}) = \hat{\psi}_{b-1}(\boldsymbol{X}) + \nu \hat{\boldsymbol{u}}_b$, resulting in data-driven covariate selection. The update is partly decided by $\nu$, a step-size parameter determining the proportion of the sub-base learner added to the ensemble in each iteration.
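The negative gradient above is a martingale-type residual of the Cox partial log-likelihood and can be computed directly. The sketch below uses invented values and omits the tree-based sub-base learners and the landmark subsetting; it only illustrates the residual computation for a current prediction function $\hat{\psi}$.

```python
# Sketch of the Cox negative gradient (martingale-type residuals) for
# current predictions psi, using the Breslow cumulative hazard. Toy data.
import math

def cox_negative_gradient(T, delta, psi):
    n = len(T)
    exp_psi = [math.exp(p) for p in psi]
    u = []
    for i in range(n):
        # Breslow cumulative hazard at T[i], scaled by exp(psi[i])
        chaz = sum(
            delta[l] / sum(exp_psi[k] for k in range(n) if T[k] >= T[l])
            for l in range(n) if T[l] <= T[i]
        )
        u.append(delta[i] - exp_psi[i] * chaz)
    return u

T = [4.0, 6.0, 7.5, 9.0, 10.0]      # observed times after the landmark
delta = [1, 0, 1, 1, 0]             # event indicators
psi = [0.2, -0.1, 0.4, 0.0, -0.3]   # current boosting predictions
u = cox_negative_gradient(T, delta, psi)
```

A useful sanity check is that these residuals sum to zero, which follows from the Breslow construction; event subjects tend to have positive residuals and late censored subjects negative ones.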
For a more detailed description of the gradient boosting algorithm, see Hofner et al. 24 There are three important hyperparameters when designing a gradient boosting model: the sub-base learners, $\nu$ and $b_{\text{stop}}$. We include only regression trees of baseline and longitudinal variables and two-way interactions as sub-base learners to model the event outcome non-parametrically. We chose a small value (0.025) for $\nu$ to decrease the chances of overshooting the global minimum of the negative partial log-likelihood. Finally, we used 5-fold cross-validation nested in $\mathcal{D}^{-v}_n$ to determine the optimal number of iterations, $b_{\text{opt}}$, to decrease the chances that the model either stops too early or overfits. After we fit a landmark tree-based gradient-boosted Cox model, we estimate $\pi_j(u \mid t)$ with the Breslow estimator:

$$\hat{\pi}_j(u \mid t) = \exp\Big[-\hat{H}_0(u) \exp\big\{\hat{\psi}_{b_{\text{opt}}}(X_j)\big\}\Big], \quad \text{in which} \quad \hat{H}_0(u) = \sum_{i \in \mathcal{R}(t)} \frac{\mathbb{1}(T_i \le u)\, \delta_i}{\sum_{j \in \mathcal{R}(t)} \exp\big\{\hat{\psi}_{b_{\text{opt}}}(X_j)\big\}}.$$

Compared to a standard Cox landmark regression, the equation within the exponent is now a black-box function of the included covariates and two-way interactions. Boosted landmark models were fit using the mboost package. 25 The final framework we use to estimate (1) is joint modelling. In this framework, sub-models are defined for the two outcomes and linked through shared parameters to specify the joint distribution. For the longitudinal outcomes $\boldsymbol{y}_i$, a general form for the sub-model is defined in (4). Although the form is identical, the estimation of the longitudinal model differs in joint modelling compared to landmarking, because a joint model estimates the bivariate association of the longitudinal and time-to-event outcomes.
Specifically, for the time-to-event process, the joint modelling framework assumes that the instantaneous risk of having an event depends on baseline covariates, functions of the $m \in M$ latent trajectories $\eta_{im}(t)$ of the longitudinal outcomes and the vectors of random effects $\boldsymbol{b}_i = (\boldsymbol{b}_{i1}, \dots, \boldsymbol{b}_{iM})$:

$$h_i\{t \mid \mathcal{H}_i(t), \boldsymbol{w}_i\} = \lim_{\Delta t \to 0} \frac{\Pr\{t \le T^*_i < t + \Delta t \mid T^*_i \ge t, \mathcal{H}_{i1}(t), \dots, \mathcal{H}_{iM}(t), \boldsymbol{w}_i\}}{\Delta t} = h_0(t) \exp\Big[\gamma^\top \boldsymbol{w}_i + \sum_{m=1}^{M} \sum_{q=1}^{Q} f_{mq}\big\{\eta_{im}(t), \boldsymbol{b}_{im}, \alpha_{mq}\big\}\Big], \quad (5)$$

where $\mathcal{H}_{im}(t) = \{\eta_{im}(s),\ 0 \le s < t\}$ represents the history of the $m$th longitudinal process up to $t$, $h_0(\cdot)$ is the baseline hazard function, and $\boldsymbol{w}_i$ are baseline covariates with regression coefficients $\gamma$. The function $f_{mq}(\cdot)$ and its parameterisation $\alpha_{mq}$ constitute the functional form of the $m$th longitudinal outcome and identify its relation to the time-to-event outcome. Common examples of the functional form are the longitudinal outcome's current value, lagged value, slope or integral, or a combination of multiple. For a more detailed overview of functional forms, see Rizopoulos et al. 26 As evident in (4) and (5), the two sub-models are linked through the distributions of the random effects $\boldsymbol{b}_i$. The drawbacks of this method are that joint models assume conditional independence given $\boldsymbol{b}_i$, and estimation of the joint distribution requires integration over the random effects. Joint models are, therefore, a computationally intensive method, especially as the dimensionality of the longitudinal outcomes and random effects increases. Under the Bayesian joint modelling framework, $\pi_j(u \mid t)$ can be estimated with Markov chain Monte Carlo methods from the posterior predictive distribution:

$$\pi_j(u \mid t) = \int \Pr\{T^*_j \ge u \mid T^*_j > t, \boldsymbol{y}_j(t), \theta\}\, p(\theta \mid \mathcal{D}_n)\, d\theta,$$

where $\theta$ represents all model parameters.
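The Monte Carlo step behind this integral can be illustrated with a deliberately simplified stand-in: a constant-hazard model replaces the joint model, and the "posterior draws" are simulated rather than obtained by MCMC. The point is only that $\pi_j(u \mid t)$ is approximated by averaging the conditional survival probability over draws of $\theta$.

```python
# Sketch of posterior predictive estimation of pi_j(u | t): average
# Pr(T* >= u | T* > t, theta) over posterior draws of theta. A constant
# hazard h stands in for the joint model, so the conditional survival is
# exp(-h * (u - t)). All values are simulated for illustration.
import math, random

def dynamic_survival(u, t, hazard_draws):
    probs = [math.exp(-h * (u - t)) for h in hazard_draws]
    return sum(probs) / len(probs)

random.seed(1)
draws = [random.gammavariate(2.0, 0.05) for _ in range(500)]  # toy "posterior"
pi_9_given_6 = dynamic_survival(9.0, 6.0, draws)
```

As expected of a survival probability, the estimate lies in (0, 1) and decreases as the horizon $u$ moves further from the landmark time $t$.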
For a more detailed explanation of dynamic prediction estimation under the joint modelling framework, see Rizopoulos et al. 27 Joint models were fit using the JMbayes2 package. 28

3 Global PBC Data

We return to our motivating example of PBC, where we apply SL to the Global PBC Data to obtain dynamic survival prediction ensembles. The Global PBC Data results from an international effort by 15 liver centres as part of the Global PBC Study Group. We work with a subset of the complete database that contains follow-up data on 1884 PBC patients from 1988 to 2013. All patients received UDCA therapy with varying doses at the start of follow-up. Our time-to-event outcome of interest is a composite endpoint of liver decompensation, liver transplantation and death. During follow-up, 326 patients (17.3%) experienced the composite endpoint. For these patients, ten liver-related biomarkers were sporadically collected during follow-up. We focus on the three biomarkers that were measured most often, namely total bilirubin (TB), alkaline phosphatase (ALP) and aspartate aminotransferase (AST), as previous research has shown these to be strongly associated with event outcomes in PBC patients. 29,30 Altogether, there were 23,102 longitudinal measurement times. Patients had a rounded average of 10 measurements for TB (SD = 11.10), 11 measurements for ALP (SD = 11.17) and 11 measurements for AST (SD = 11.10). Our goal is to utilise the information in these three biomarkers and the baseline covariates sex (90.1% female), age at inclusion (M = 52.78, SD = 11.91), disease duration before inclusion (M = 0.70, SD = 1.60), whether lesions were discovered in a liver biopsy at inclusion (85.0% yes) and the year of study inclusion (M = 2000, IQR = 1995-2005) to create a highly accurate and flexible dynamic prediction model for the composite endpoint in PBC patients through SL.
In our library, we consider nine base learners. Models $\mathcal{M}_1$ and $\mathcal{M}_2$ are one-stage landmark models based on a standard and a boosted Cox regression, respectively. Model $\mathcal{M}_3$ is a standard Cox two-stage landmark model, where the linear mixed models have linear specifications for time with random intercepts and slopes. Model $\mathcal{M}_4$ is the boosted counterpart of $\mathcal{M}_3$. Because the three biomarkers have been shown to follow nonlinear trajectories, 29 we included another standard Cox two-stage landmark model with a nonlinear time specification in the longitudinal model through natural cubic splines as model $\mathcal{M}_5$. We also included a boosted version of $\mathcal{M}_5$ as model $\mathcal{M}_6$. Finally, we included three joint models based on clinical knowledge from hepatologists: a joint model with linear subject-specific time trends assuming a current value association between all biomarkers and the hazard ($\mathcal{M}_7$), a joint model with nonlinear subject-specific time trends and a current value association between all biomarkers and the hazard ($\mathcal{M}_8$), and a joint model with nonlinear subject-specific time trends, where the assumed associations between the biomarkers and the hazard were: current value and slope for TB, current value and area for ALP, and current value and area for AST ($\mathcal{M}_9$). For an overview of the included models in the model library, we refer readers to the Appendix. PBC is a slowly progressive chronic disease. Therefore, we estimated cross-validated predictions in two prediction windows with high clinical relevance. The prediction windows we chose are year 6 to 9 and year 9 to 12 since study inclusion. At year 6, 1257 patients were at risk of experiencing the event. During years 6 to 9, 77 patients experienced the composite event, while 373 patients were censored. At year 9, 807 patients remained in the risk set. In the following three years, another 43 patients experienced the event while 287 patients were censored.
We used the following procedure to train the SL. First, we created an independent test set containing 20% of the total data. Then, we estimated the SL for the two prediction windows on the remaining data (i.e., the training data) using a cross-validation procedure. Specifically, we split the training data into seven and four folds for the prediction windows {6, 9} and {9, 12}, respectively, to have enough events in the time intervals in all folds of the validation set. Then, we used the three performance metrics described in Section 2.1 to estimate model weights for the eSL and to determine the dSL. After estimation of the eSL and dSL, we trained the base learners on the complete training data. In total, the two prediction windows contain 61 and 31 events. However, landmark-based base learners are trained on all events past the chosen landmark time. This means that all landmark models were trained on 137 and 76 events for the two prediction windows, respectively, allowing us to learn more complex relations from the data. Finally, we estimated the performance metrics of the separately fitted models, the eSL and the dSL in the independent test set and compared the results. For a more complete overview of the data analysis, we included a pseudo-algorithm in the Appendix.

We present the results of the analysis in Table 1 and Figure 1. Table 1 shows the time-dependent property of SL weights for dynamic predictions, indicating that changing the model weights leads to better performance in different prediction windows. For example, we find that for year 9 to 12, the weight for M8 under the IBS increases compared to year 6 to 9, indicating that M8 most likely performed better in the later prediction window. Finally, we noted that when optimising the tv-AUC, we often ran into convergence issues.
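The weight-estimation step for the eSL amounts to a constrained optimisation over the probability simplex. The snippet below is a minimal sketch, not the authors' implementation: it assumes cross-validated survival predictions are already available as a matrix, uses an unweighted Brier-type loss (the IPCW weighting of Section 2.1 is omitted for brevity), and all names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def brier_loss(weights, cv_preds, events):
    """Brier-type loss of the weighted ensemble. cv_preds[k, i] is the
    cross-validated survival probability for subject i from learner k;
    events[i] = 1 if subject i had the event inside the window."""
    ensemble = cv_preds.T @ weights              # pooled prediction per subject
    return np.mean(((1.0 - events) - ensemble) ** 2)

def estimate_esl_weights(cv_preds, events):
    """Minimise the loss over the simplex, as in the meta-learner step:
    weights are non-negative and sum to one."""
    K = cv_preds.shape[0]
    res = minimize(
        brier_loss, x0=np.full(K, 1.0 / K), args=(cv_preds, events),
        method="SLSQP", bounds=[(0.0, 1.0)] * K,
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
    )
    return res.x

# toy data: learner 0 is perfectly calibrated, learner 1 is uninformative
events = np.array([1.0, 0.0, 1.0, 0.0])
cv_preds = np.vstack([1.0 - events, np.full(4, 0.5)])
weights = estimate_esl_weights(cv_preds, events)
```

With a smooth loss such as the Brier score, this optimisation is typically well behaved; substituting the tv-AUC as the loss makes the objective nearly piecewise constant, which connects to the convergence issues just mentioned.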
We argue these convergence issues occurred due to the nature of the tv-AUC as a proportion, where the gradient of the loss function quickly becomes flat. This causes optimisation algorithms to get stuck at local maxima when maximising the tv-AUC. We combatted this issue by running the optimisation algorithm multiple times, each time using different starting values. Afterwards, we chose the optimised weights that resulted in the highest tv-AUC.

         Window {6, 9}              Window {9, 12}
Model  BS     IBS    tv-AUC       BS     IBS    tv-AUC
1      0.00   0.00   0.00         0.89   0.00   0.13
2      0.00   0.00   0.07         0.00   0.00   0.00
3      0.00   0.00   0.02         0.00   0.00   0.00
4      0.47   0.23   0.15         0.00   0.00   0.30
5      0.00   0.00   0.00         0.00   0.00   0.32
6      0.00   0.00   0.00         0.00   0.00   0.05
7      0.35   0.77   0.27         0.11   0.00   0.06
8      0.00   0.00   0.46         0.00   1.00   0.14
9      0.18   0.00   0.03         0.00   0.00   0.00

Table 1: Super Learner weight estimates at prediction windows {6, 9} (left) and {9, 12} (right) under the three different loss functions described in Section 2.1 for predicting the composite endpoint in the Global PBC Data. The boldfaced numbers denote the discrete SL for a given prediction window and loss function. Models M1 to M9 are denoted by 1 to 9.

Inspecting the predictive performance of the SL and the base learners in the independent test set, Figure 1 shows that the eSL tends to perform as well as or better than any model fit separately on the different predictive accuracy metrics, displaying the benefit of SL. This is because the best model depends on both the metric used to assess performance and the specified time window. With SL, however, we always optimise predictive accuracy, achieving high predictive accuracy without putting too much emphasis on any one model being correctly specified.
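Because the tv-AUC is a proportion of concordant pairs, it is piecewise constant in the weights, and gradient-based searches stall on the flat regions. The multi-start strategy described above can be sketched as follows; this is our own illustration with hypothetical data, and a derivative-free Nelder-Mead search stands in for whichever optimiser was actually used.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

def tv_auc(weights, risks, events):
    """Concordance of the ensemble risk score: the proportion of
    (event, event-free) pairs where the event subject scores higher."""
    score = risks.T @ weights
    cases, controls = score[events == 1], score[events == 0]
    return (cases[:, None] > controls[None, :]).mean()

def multistart_weights(risks, events, n_starts=20):
    """Restart a derivative-free search from random points on the simplex
    and keep the weights that achieve the highest tv-AUC."""
    best_w, best_auc = None, -np.inf
    for _ in range(n_starts):
        w0 = rng.dirichlet(np.ones(risks.shape[0]))   # random starting values
        res = minimize(lambda w: -tv_auc(w, risks, events), w0,
                       method="Nelder-Mead")
        w = np.clip(res.x, 0.0, None)
        w = w / w.sum() if w.sum() > 0 else w0        # project back to the simplex
        cand = tv_auc(w, risks, events)
        if cand > best_auc:
            best_w, best_auc = w, cand
    return best_w, best_auc

# toy data: learner 0 ranks the two events above the two non-events
events = np.array([1, 1, 0, 0])
risks = np.array([[0.9, 0.8, 0.1, 0.2],
                  [0.2, 0.9, 0.8, 0.1]])
best_w, best_auc = multistart_weights(risks, events)
```

The projection step is needed because Nelder-Mead is unconstrained; the tv-AUC itself is invariant to positive rescaling of the weights, so renormalising does not change the objective.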
The results display only point estimates of performance because confidence intervals for SL for dynamic predictions can only be obtained by bootstrapping the whole SL procedure, which is practically challenging due to the large size of the data and the computational complexity of the more complex base learners. For an impression of the variance of the performance of SL for dynamic predictions, we refer readers to the simulation results in Section 4.2.

4 Simulation Study

In Section 2, we mentioned that we estimated the different predictive accuracy metrics using Kaplan-Meier to estimate the IPCW. However, using Kaplan-Meier for IPCW ignores a frequent issue in the setting of both longitudinal and time-to-event outcomes: informative censoring. Although there is no strong indication of informative censoring in the Global PBC Data, where most patients were censored administratively, it often occurs that a treating physician removes a patient from a study for their safety if their condition deteriorates. In these cases, the decision to censor a subject depends on their biomarker values. Misspecification of the censoring distribution by using Kaplan-Meier will then bias error estimation. Kvamme and Borgan31 discuss this issue in more depth and have shown that ignoring informative administrative censoring may result in an over-optimistic BS. As a result, we may give more weight to a model's predictions, thinking it performs better than it actually does. Therefore, we performed a simulation study to examine the impact of informative censoring on the performance of SL and its estimates of the three predictive accuracy metrics.

Figure 1: Metric estimates of the ensemble SL (eSL) and discrete SL (dSL) for predicting the composite endpoint in the Global PBC Data. Models M1 to M9 are denoted by 1 to 9.
The blue dashed line serves as a reference point for the performance of the eSL. The text boxes in each graph denote the performance of a Kaplan-Meier (KM) model that contained no covariates.

4.1 Simulation Design

We designed a simulation study to examine the impact of informative censoring on the performance of SL in the setting of longitudinal and time-to-event data. We tried to emulate a setting where there is no single true model in the following way. For the longitudinal process, we generated data from one process: a mixed model with a linear time trend and parameters based on the Global PBC dataset, with measurement times drawn from a uniform distribution. For the survival process, we generated data from two different models. The first model imitates a one-stage landmark process. At each measurement time, we generate event times based on the latest recorded longitudinal value and baseline variables using the method of Bender et al.32 If the event time is shorter than the next measurement time, we assume the individual experienced the event and terminate the collection of additional longitudinal measurements. If the event time exceeds the next measurement time, we simulate a new event time based on the subsequent longitudinal outcome. In the second setting, we generate data from a joint model process using inverse transform sampling. Specifically, for each person, we estimate the hazard function in continuous time based on the longitudinal process and baseline covariates. Then, by integrating over time, we simulate event times for each individual by deriving the point in time at which the cumulative hazard equals their draw from a standard uniform distribution.

After generating longitudinal and event data from the two processes (n = 625), we censor under three different scenarios. In Scenario 1, we use the data as is, i.e., no censoring.
In Scenario 2, we generate censoring times from a uniform distribution and censor those whose censoring time is lower than their event time, to model random censoring. In Scenario 3, we assume an informative censoring process. At each measurement time point, we estimate the probability of being censored from a logistic model based on the current longitudinal value and baseline covariates. To ensure a fair comparison, we controlled the number of individuals censored within the prediction window to be similar between Scenarios 2 and 3. Finally, we randomly sampled 125 individuals from the total dataset to form the independent test set, leaving 500 observations for fitting the SL. We ran the simulation 100 times. For each scenario, we used 5-fold cross-validation to train the SL with three base learners: the true one-stage landmark model and the true joint model described above, and a two-stage landmark model. For all base learners, we employed the same longitudinal model. After estimating the weights for the eSL, we compared the performance of the eSL with that of the dSL and the OM, i.e., the model that performed best in the independent test set. In all scenarios, we estimated the IPCW using Kaplan-Meier for the censoring distribution.

4.2 Results

Figure 2 presents the estimates of the various metrics in the 100 simulated training and test datasets of the three scenarios. Table 2 shows the average SL weights and standard deviations for the different loss functions under the three scenarios. For the tv-AUC and BS, we find the expected result that the estimated performance under Scenario 2 is close to the true performance in Scenario 1 in both the training and the test set. Comparing Scenario 1 and Scenario 3, we find that the BS increases and the tv-AUC decreases in Scenario 3.
That is to say, inappropriately accounting for informative censoring leads to a decrease in predictive accuracy according to the BS and tv-AUC. These results differ from those of Kvamme and Borgan.31 A possible explanation for the different findings is a dissimilarity in censoring mechanisms. For example, had we imposed stronger informative censoring in Scenario 3, more individuals would have been censored before the prediction window. Consequently, there would be fewer events in the prediction window, making the BS in Scenario 3 lower than in Scenario 2.

Looking at the distribution of the weights for the BS and tv-AUC, the two true models receive the most weight. However, the joint model tends to receive the larger weight. Joint models might dominate the weights for two reasons. First, they are more efficient than the landmarking method, as joint models can use all the longitudinal data up to the prediction window rather than just the last observed value. Second, there is a loss of information in the landmark model, as it cannot model the time gap between the last measurement time and the start of the prediction window.

We find different results for the IBS. First, compared to the BS and tv-AUC, the weights for the IBS seem to shift more toward the true landmark model than the joint model, indicating that the landmark model has lower estimated error

Figure 2: Metric estimates of the ensemble SL (eSL), discrete SL (dSL) and oracle model (OM) under no censoring (Scenario 1), random censoring (Scenario 2) and informative censoring (Scenario 3) over 100 simulated datasets. The box plots under Train Error denote the metric values in the dataset the model was trained on; the box plots under Test Error show the metric values in the holdout dataset.
         Scenario 1                              Scenario 2                              Scenario 3
Model  BS           IBS          tv-AUC        BS           IBS          tv-AUC        BS           IBS          tv-AUC
1      0.30 (0.32)  0.47 (0.28)  0.34 (0.20)   0.23 (0.29)  0.35 (0.31)  0.37 (0.23)   0.31 (0.33)  0.56 (0.30)  0.36 (0.24)
2      0.00 (0.00)  0.04 (0.05)  0.01 (0.04)   0.01 (0.02)  0.04 (0.06)  0.01 (0.05)   0.00 (0.01)  0.02 (0.04)  0.01 (0.05)
3      0.79 (0.32)  0.50 (0.28)  0.64 (0.20)   0.76 (0.29)  0.61 (0.32)  0.62 (0.23)   0.69 (0.33)  0.42 (0.29)  0.63 (0.23)

Table 2: Average model weight (standard deviation in brackets) under no censoring (Scenario 1), random censoring (Scenario 2) and informative censoring (Scenario 3). Model 1 is the true one-stage landmark model, model 2 is a two-stage landmark model and model 3 is the true joint model.

at the halfway point of the prediction window. However, a more important difference is that the IBS under Scenario 2 does not seem to differ from the error in Scenario 3. We hypothesise that this happens because we estimate the censoring distribution at two time points when approximating the IBS through Simpson's rule. The differences between the two censoring mechanisms might only begin to show more prominently at later time points, which carry little weight in Simpson's rule. From this perspective, the IBS gives results that are biased with respect to the impact of censoring on the model's performance, ignoring informative censoring.

Investigating the performance of SL in the different scenarios, we see that the performance of the dSL and eSL remains close to that of the OM for the different predictive accuracy metrics. These results indicate that the oracle property of SL also holds under non-random censoring mechanisms for dynamic predictions.

Finally, we checked convergence for the eSL due to the issues we encountered for the tv-AUC in the Global PBC Data. Convergence was a greater issue for the tv-AUC, where the SL did not properly converge 12.0% of the time over all scenarios.
In comparison, the proportions of non-convergence for the BS and IBS were 0.00 and 0.02, respectively. These proportions further underscore the importance of caution when optimising the tv-AUC.

5 Discussion

In the current article, we further extended the Super Learner framework in the longitudinal and time-to-event data setting. Our extensions combine the predictive performance of different dynamic prediction frameworks that work in continuous time. We also included a new predictive accuracy metric for the framework, the tv-AUC. Applying our method to optimise predictive accuracy for patients suffering from PBC using the Global PBC Data, we found that the eSL performed as well as or better than any single model on the different predictive accuracy metrics, with more flexibility in model specification. In our simulation study, we showed that under informative censoring, the performance of the eSL came close to the performance of the OM in an independent test set. However, informative censoring negatively impacts the performance of models, which the IBS may not correctly take into account.

The SL framework presented offers unique and valuable advantages in the setting of longitudinal and time-to-event outcomes due to the large variability in procedures and model specifications. For future use cases of SL for dynamic predictions, we recommend that researchers create a model library tailored to their objective. If computational efficiency is a primary concern, the library should consist mostly of landmark models and simple boosting algorithms, as these are fastest to fit. However, if the objective is optimising predictive accuracy, it is worth creating a model library that is as broad and diverse as possible while keeping computational efficiency in mind. In the creation of the model library, it is also worth considering the base learners both separately and as a whole.
That is to say, it may be worth adding a relatively simple base learner that by itself does not predict well, yet predicts something different from the other base learners. Furthermore, previous research in similar settings can help guide the creation of the model library. For example, variable importance measures from previous studies can help guide covariate inclusion for the base learners. Nonetheless, it is important to note that the flexibility and accuracy of SL are not without costs, as SL reduces model interpretability. We caution against over-interpreting model weights and model coefficients from individual base learners.

Future extensions of the framework include allowing even more diverse methods in the model library, such as discretised-time methods, and appropriately adapting the loss functions to the context of competing risks. We also found that informative censoring negatively impacted the predictive accuracy of SL. Therefore, another avenue of future research is to investigate how we can mitigate the impact of informative censoring on the predictive performance of SL.

Finally, the framework we presented is fully predictive. A future direction we are interested in is counterfactual prediction to answer causal questions such as: what would the dynamic survival probability of a patient with an inadequate response to UDCA therapy have been, had they received add-on therapy with obeticholic acid? Van der Laan and Polley11 developed SL as a semiparametric method to obtain an initial estimate of the probability distribution. This initial estimate can then be updated with extra assumptions to obtain a targeted causal estimate through Targeted Learning.33 Extending Targeted Learning to the setting of longitudinal and time-to-event data would be a highly relevant addition for drawing possible causal conclusions from observational longitudinal and time-to-event data.

References

1.
Fontein DB et al. Dynamic prediction in breast cancer: proving feasibility in clinical practice using the TEAM trial. Ann Oncol 2015; 26:1254-62
2. Lafourcade A et al. Factors associated with breast cancer recurrences or mortality and dynamic prediction of death using history of cancer recurrences: the French E3N cohort. BMC Cancer 2018; 18:1-9
3. Mauguen A et al. Dynamic prediction of risk of death using history of cancer recurrences in joint frailty models. Stat Med 2013; 32:5366-80
4. Prague M, Commenges D and Thiébaut R. Dynamical models of biomarkers and clinical progression for personalized medicine: the HIV context. Adv Drug Deliv Rev 2013; 65:954-65
5. Suraweera D et al. Treatment of primary biliary cholangitis ursodeoxycholic acid non-responders: a systematic review. Liver Int 2017; 37:1877-86
6. Tsiatis AA and Davidian M. Joint modeling of longitudinal and time-to-event data: an overview. Stat Sinica 2004:809-34
7. Van Houwelingen HC. Dynamic prediction by landmarking in event history analysis. Scand J Stat 2007; 34:70-85
8. Lee C, Yoon J and Van Der Schaar M. Dynamic-DeepHit: a deep learning approach for dynamic survival analysis with competing risks based on longitudinal data. IEEE Trans Biomed Eng 2019; 67:122-33
9. Wilks DS. Sampling distributions of the Brier score and Brier skill score under serial dependence. Quart J Roy Meteorol Soc 2010; 136:2109-18
10. Benedetti R. Scoring rules for forecast verification. Mon Weather Rev 2010; 138:203-11
11. Van der Laan MJ, Polley EC and Hubbard AE. Super learner. Stat Appl Genet Mol Biol 2007; 6
12. Rizopoulos D and Taylor JM. Optimizing dynamic predictions from joint models using super learning. Stat Med 2024; 43:1315-28
13. Tanner KT et al. Dynamic survival prediction combining landmarking with a machine learning ensemble: methodology and empirical comparison. J R Stat Soc 2021; 184:3-30
14. LeDell E, van der Laan MJ and Petersen M.
AUC-maximizing ensembles through metalearning. Int J Biostat 2016; 12:203-18
15. Blanche P et al. Quantifying and comparing dynamic predictive accuracy of joint models for longitudinal marker and time-to-event in presence of censoring and competing risks. Biometrics 2015; 71:102-13
16. Kohavi R. A study of cross-validation and bootstrap for accuracy estimation and model selection. 1995
17. Schoop R, Graf E and Schumacher M. Quantifying the predictive performance of prognostic models for censored survival data with time-dependent covariates. Biometrics 2008; 64:603-10
18. Atkinson K. An introduction to numerical analysis. John Wiley & Sons, 1991
19. Cox DR. Regression models and life-tables. J R Stat Soc 1972; 34:187-202
20. Breslow NE. Analysis of survival data under the proportional hazards model. Int Stat Rev 1975:45-57
21. Putter H and van Houwelingen HC. Landmarking 2.0: bridging the gap between joint models and landmarking. Stat Med 2022; 41:1901-17
22. Nicolaie M et al. Dynamic prediction by landmarking in competing risks. Stat Med 2013; 32:2031-47
23. Van Houwelingen H and Putter H. Dynamic prediction in clinical survival analysis. CRC Press, 2011
24. Natekin A and Knoll A. Gradient boosting machines, a tutorial. Front Neurorobot 2013; 7:21
25. Hofner B et al. Model-based boosting in R: a hands-on tutorial using the R package mboost. Comput Stat 2014; 29:3-35
26. Rizopoulos D. Joint models for longitudinal and time-to-event data: with applications in R. CRC Press, 2012
27. Rizopoulos D. Dynamic predictions and prospective accuracy in joint models for longitudinal and time-to-event data. Biometrics 2011; 67:819-29
28. Rizopoulos D, Papageorgiou G and Miranda Afonso P. JMbayes2: extended joint models for longitudinal and time-to-event data. R package version 0.5-1, https://github.com/drizopoulos/JMbayes2. 2024. Available from: https://drizopoulos.github.io/JMbayes2/
29.
Lammers WJ et al. Levels of alkaline phosphatase and bilirubin are surrogate end points of outcomes of patients with primary biliary cirrhosis: an international follow-up study. Gastroenterology 2014; 147:1338-49
30. Nyblom H et al. The AST/ALT ratio as an indicator of cirrhosis in patients with PBC. Liver Int 2006; 26:840-5
31. Kvamme H and Borgan Ø. The Brier score under administrative censoring: problems and a solution. J Mach Learn Res 2023; 24:1-26
32. Bender R, Augustin T and Blettner M. Generating survival times to simulate Cox proportional hazards models. Stat Med 2005; 24:1713-23
33. Van der Laan MJ, Rose S, et al. Targeted learning: causal inference for observational and experimental data. Vol. 4. Springer, 2011

Appendix

Global PBC Data

We present two tables and two pseudo-algorithms that detail the analysis of the Global PBC Data using Super Learner for dynamic predictions of time-to-event outcomes. Table 3 shows the packages, with their version numbers, that we used for the different types of base learners included in the ensemble, along with their hyperparameters. Table 4 presents a summary of all base learners included in the ensemble. Finally, Algorithms 1 and 2 detail the fitting and estimation process of the base learners and the Super Learner for dynamic predictions.

Base learner             Package    Version  Hyperparameters
Landmark: Standard Cox   survival   3.6-4    -
Landmark: Boosted Cox    mboost     2.9-10   b_stop, ν, b_opt
Joint Model              JMbayes2   0.5-0    priors, MCMC samples, chains, burn-in

Table 3: Details of the implementation of the different types of base learners and their hyperparameters. For all boosted models, b_stop was set to 500, ν was set to 0.025 and the early stopping criterion b_opt was determined using 5-fold cross-validation in the training set (see Algorithm 2).
The tree-based learners in the boosting algorithm also contain many hyperparameters, such as the minimum split size and the maximum depth of the trees. For all tree-related hyperparameters, we used the mboost package defaults. For the joint models, we used the JMbayes2 package defaults for all priors. For the fitting parameters, we employed two MCMC chains with a total of 5000 iterations, of which 1000 were burn-in. For more complete information on the fitting process for the SL, we refer readers to Algorithm 1 and the Code and Data Supplement.

                                                             Functional forms
Base learner  Type          Algorithm     Longitudinal model  TB      ALP     AST
M1            One-stage LM  Standard Cox  -                   -       -       -
M2            One-stage LM  Boosted Cox   -                   -       -       -
M3            Two-stage LM  Standard Cox  linear              -       -       -
M4            Two-stage LM  Boosted Cox   linear              -       -       -
M5            Two-stage LM  Standard Cox  splines (df = 3)    -       -       -
M6            Two-stage LM  Boosted Cox   splines (df = 3)    -       -       -
M7            JM            -             linear              CV      CV      CV
M8            JM            -             splines (df = 3)    CV      CV      CV
M9            JM            -             splines (df = 3)    CV & S  CV & A  CV & A

Table 4: Summary table for the nine base learners included in the Super Learner ensemble. The column Type details the type of base learner, Algorithm defines how the base learner is estimated, Longitudinal model details the time specification for the base learner, and Functional forms shows the association between each longitudinal outcome and the hazard function. A dash implies non-applicability. Abbreviations: LM = landmark; JM = joint model; df = degrees of freedom; CV = current value; S = slope; A = area.
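For reference, the Simpson's-rule approximation of the IBS discussed in Section 4.2 evaluates the Brier score, and hence the censoring distribution, at only a few points in the window. The sketch below is our own illustration; `bs_at` is a placeholder for any function returning the Brier score at a time point.

```python
def ibs_simpson(bs_at, t, u):
    """Simpson's-rule approximation of the integrated Brier score over (t, u]:
    only the endpoints and the midpoint contribute, with weights 1, 4 and 1,
    so behaviour late in the window carries relatively little weight."""
    mid = 0.5 * (t + u)
    return (u - t) / 6.0 * (bs_at(t) + 4.0 * bs_at(mid) + bs_at(u))

def ibs_normalised(bs_at, t, u):
    """Divide by the window length to obtain a time-averaged Brier score."""
    return ibs_simpson(bs_at, t, u) / (u - t)
```

The rule is exact for polynomials up to degree three, which is why three evaluations usually suffice for a smoothly varying Brier score curve.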
Algorithm 1 Pseudo-Algorithm of Super Learner for Dynamic Predictions Application

 1: Initialise
 2: Specify λ(·), K, {t, u}, V
 3: Split D_n into D_n^train and D_n^test
 4:
 5: Base learners
 6: Split D_n^train into V folds, stratifying on event indicator δ_i
 7: for v in 1 : V do
 8:   Train M_k ∈ K on D_n^{train,−v}:
 9:   for M_k ∈ K do
10:     if M_k = landmark model then
11:       Run Algorithm 2 (see below)
12:     else if M_k = joint model then
13:       Define mixed model functions η_im(t) for m = 1, …, M
14:       Define hazard function h_i{t | H_i(t), w_i} = h_0(t) exp[γ⊤ w_i + Σ_{m=1}^{M} Σ_{q=1}^{Q} f_mq{η_im(t), b_im, α_mq}]
15:       Estimate η_im(t) and h_i{t | H_i(t), w_i} simultaneously
16:     end if
17:     Predict π_i^v(u | t, M_k)
18:   end for
19: end for
20: Return π̂_i^v(u | t, M_k) for all i in R(t) and all M_k ∈ K
21:
22: Meta learner
23: Estimate ω̂_k(t) = argmin_{ω_k(t)} Σ_{i∈R(t)} λ(Σ_{k=1}^{K} ω_k(t) π̂_i^v(u | t, M_k), T_i),
24:   subject to ω_k(t) ≥ 0 ∧ Σ_{k=1}^{K} ω_k(t) = 1
25:
26: Evaluate
27: Train all M_k ∈ K on full D_n^train
28: Evaluate π̃_j(u | t) = Σ_{k=1}^{K} ω̂_k(t) π̂_j(u | t, M_k) on R(t)^test = {j ∈ D_n^test | T_j > t}

Algorithm 2 Fitting Process for Landmark Base Learners

 1: Filter R(t)^{−v} = {i ∈ D_n^{train,−v} | T_i > t}
 2:
 3: if M_k = one-stage landmark model then
 4:   For all i ∈ R(t)^{−v}, f(y_im) = y_{m,ip}, where p = max l · 1(t_{m,il} ≤ t), for m = 1, …, M
 5: else if M_k = two-stage landmark model then
 6:   Filter y_i(t) for all i ∈ D_n^{train,−v}
 7:   Define and estimate mixed model functions η_im(t) for m = 1, …, M
 8:   For all i ∈ R(t)^{−v}, f(y_im) = η̂_im(t) for m = 1, …
, M
 9: end if
10:
11: if M_k = standard Cox model then
12:   Fit Cox's proportional hazards model on R(t)^{−v}, where h_i(t) = h_0(t) exp[γ⊤ w_i + Σ_{m=1}^{M} α_m f(y_im)]
13: else if M_k = boosted Cox model then
14:   Boosting
15:   Create tree-based learners for all covariates and two-way interactions in X for i ∈ R(t)^{−v}
16:   Initialise prediction function ψ_0(X)
17:   for b in 1 : b_stop do
18:     Compute negative gradient vector u_b
19:     Predict u_b with tree-based learners
20:     Select best predicted û_b according to the least squares criterion
21:     Update the estimated prediction function: ψ̂_b(X) = ψ̂_{b−1}(X) + ν û_b
22:   end for
23:
24:   Early stopping
25:   Split R(t)^{−v} into p = 1, …, 5 folds
26:   for p in 1 : 5 do
27:     Filter individuals in fold p from R(t)^{−v} to create R(t)^{−v,−p}
28:     Repeat Boosting, where R(t)^{−v} = R(t)^{−v,−p}
29:     Estimate the negative partial log-likelihood of the model on the individuals in fold p
30:   end for
31:   Determine b_opt = b where the average negative partial log-likelihood over folds is lowest
32:   Return ψ̂_{b_opt}(X) on R(t)^{−v}
33: end if
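The Boosting steps of Algorithm 2 can be illustrated with a self-contained toy version. This is a sketch, not the mboost implementation: squared-error residuals stand in for the negative gradient of the Cox partial log-likelihood, and univariate linear base learners stand in for the trees, but the select-the-best-learner-then-take-a-small-step structure is the same.

```python
import numpy as np

def componentwise_boost(X, y, nu=0.025, b_stop=500):
    """Componentwise L2 boosting in the spirit of Algorithm 2's inner loop:
    repeatedly fit the negative gradient (here, plain residuals) with the
    single best base learner and take a small step of size nu."""
    pred = np.zeros(len(y))
    coefs = np.zeros(X.shape[1])
    for _ in range(b_stop):
        u = y - pred                          # negative gradient vector u_b
        best_j, best_c, best_sse = None, 0.0, np.inf
        for j in range(X.shape[1]):           # fit each base learner to u_b
            xj = X[:, j]
            c = xj @ u / (xj @ xj)
            sse = np.sum((u - c * xj) ** 2)   # least squares criterion
            if sse < best_sse:
                best_j, best_c, best_sse = j, c, sse
        coefs[best_j] += nu * best_c          # update the prediction function
        pred += nu * best_c * X[:, best_j]
    return coefs

# toy data: the signal lives entirely in the first covariate
X = np.column_stack([np.linspace(0.1, 1.0, 10), np.tile([1.0, 0.0], 5)])
y = 2.0 * X[:, 0]
coefs = componentwise_boost(X, y)
```

With the learning rate ν = 0.025 and b_stop = 500 from Table 3, the recovered coefficient approaches the truth geometrically, at rate (1 − ν) per selected step, which is why the early-stopping iteration b_opt matters in practice.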