Age-Specific Logistic Regression with Complex Event Time Data
Authors: Haoxuan, Zhou, X. Joan Hu
Haoxuan (Charlie) Zhou, X. Joan Hu, Yi Xiong, and Yan Yuan
March 26, 2026

Abstract

In an attempt to advance the current practice for assessing and predicting the primary ovarian insufficiency (POI) risk in female childhood cancer survivors, we propose two estimating-function-based approaches for age-specific logistic regression. Both approaches adapt the inverse probability of censoring weighting (IPCW) strategy and yield consistent, asymptotically normal estimators. The first approach modifies the IPCW weights used by Im et al. (2023) to account for double censoring. The second approach extends the outcome-weighted IPCW approach to use the information from subjects censored before the analysis time. We consider variance estimation for the estimators and use simulation to explore the two approaches when the conditional right-censoring time distribution required in the IPCW weights is unknown and is approximated using survival random forests, stratified empirical distribution functions, or the estimator under the Cox proportional hazards model. The numerical studies indicate that the second approach is more efficient when right-censoring is relatively heavy, whereas the first approach is preferable when right-censoring is light. We also observe that, in our simulation settings, the performance of both approaches relies heavily on the estimation of the censoring distribution. The POI data from a childhood cancer survivor study are used throughout the paper for motivation and illustration. Our data analysis provides new insight into the POI risk among cancer survivors.
Keywords: Doubly Censored Event Times, Estimating Function, Estimation of Censoring Time Distribution, Inverse Probability of Censoring Weights (IPCW)

1 Introduction

Assessing the risk of developing a disease by a specific age given a patient's risk profile has become increasingly important in clinical decision-making (Chemaitilly et al., 2017; Mostoufi-Moab et al., 2016; Webber et al., 2016; Touraine et al., 2024). Age-specific and patient-specific risk prediction is crucial, particularly for the vulnerable population of childhood cancer survivors, because cancer treatment can later put them at risk of adverse health outcomes. For example, ovarian tissue is vulnerable to gonadotoxic cancer treatments such as pelvic radiotherapy and alkylating agents, which can accelerate the age-related decline of ovarian follicles and subsequently result in premature ovarian insufficiency (POI) (Chemaitilly et al., 2017; Johnston and Wallace, 2009; Im et al., 2023). Developing a risk assessment tool for this population is challenging because it requires data from a long-term follow-up study.

This paper aims to model POI among female childhood cancer survivors. We use data from the Childhood Cancer Survivor Study (CCSS) (Robison et al., 2002, 2009), an ongoing multi-institutional North American cohort study of 5-year survivors of childhood cancer treated between 1970 and 1999. The large sample size and long follow-up of the study provide a unique opportunity to empirically assess disease risk among cancer survivors. Despite these strengths, observational studies like the CCSS often pose challenges in statistical analysis because the outcomes are censored.
Survival prediction models that accommodate censoring have been studied extensively, ranging from traditional approaches with semi-parametric assumptions (e.g., the Cox proportional hazards (PH) model (Cox, 1972)) to machine learning algorithms (e.g., random survival forests (Ishwaran et al., 2008)) that model the relationship between time-to-event outcomes and covariates. These approaches either treat the time-to-event outcome as a continuous variable or model the discrete-time hazard of the event. Although the latter gains flexibility in modeling complex covariate relationships and outperforms the traditional Cox PH model in the presence of high-dimensional covariates, it requires transforming and splitting the continuous follow-up time into pre-specified time intervals (Suresh et al., 2022). The specification of the time intervals can introduce post-selection bias and thereby substantially influence the predictive performance of the model.

In this paper, we quantify the risk of POI as the cumulative incidence probability, that is, the probability of experiencing POI by a specific age. We model it directly with a logistic regression model, which avoids specifying discrete time intervals for the follow-up time. The key issue in using a logistic regression model to predict the cumulative incidence probability is accommodating censored outcomes. An increasingly common technique is inverse-probability-of-censoring weighting (IPCW). The IPCW approach was originally proposed to correct for censoring, particularly dependent censoring (Robins and Rotnitzky, 1992), by creating a pseudo-population that places more weight on subjects who are not censored. There are two main ways of applying IPCW to logistic regression models.
One way is to weight the estimating equation by the IPCW (Zheng et al., 2006; Uno et al., 2007; Yuan et al., 2018; Vock et al., 2016; Im et al., 2023), and the alternative is to construct a weighted response variable (Scheike et al., 2008; Blanche et al., 2023). Detailed discussions of these two IPCW-based methods can be found in Blanche et al. (2023). It is well acknowledged that the validity of IPCW-based methods relies on correct specification of the censoring-time distribution, so it is essential to examine the efficiency loss under different ways of estimating that distribution.

One challenge arising from the CCSS data is that the target population is cancer survivors. If a prediction model defines the time origin as the age at cancer diagnosis, its output cannot be interpreted as the risk at a given age. Therefore, we follow Im et al. (2023) and use age as the time scale. Since POI develops after the cancer diagnosis, the prediction model needs to adjust the risk sets so that they include only those who have already been diagnosed with cancer. This requires inference from doubly censored event times, that is, event times that can be measured exactly only within a certain range (Cai and Cheng, 2004). In this paper, we follow Betensky and Mandel (2015) to incorporate an additional risk set indicator in the estimating functions of an age-specific logistic regression model, and we extend the two existing IPCW methods to doubly censored outcomes. We provide inference procedures for both IPCW-based methods based on robust variance estimators. Furthermore, we present simulation studies and theoretical justifications that explore the performance of the proposed approaches under various estimated censoring distributions.

The paper is organized as follows. Section 2 describes the model and the proposed estimation procedures.
In Section 3, the proposed approaches are applied to the CCSS data that motivated this research. Section 4 reports the simulation studies conducted to evaluate our findings. We conclude in Section 5 with final remarks.

2 Age-Specific Logistic Regression Analysis

2.1 Notation and Modeling

Consider a study on the risk of a particular event. Let $T$ be the age at the event, say, experiencing the aforementioned POI. Suppose the event of interest can only take place after an initial event, say, the cancer diagnosis, which occurs at age $V$. We assume that the event age $T$ follows the age-specific logistic regression model at a prespecified age $t_0$,

$$\log\frac{P(T\le t_0\mid Z,T>V,V<t_0)}{1-P(T\le t_0\mid Z,T>V,V<t_0)}=\alpha(t_0)+\beta(t_0)^\top Z, \qquad (1)$$

where $\alpha(t_0)$ is the intercept and $\beta(t_0)$ is a $p\times 1$ vector of regression coefficients at $t_0$. This model differs from the original model considered by Im et al. (2023),

$$\log\frac{P(T<t_0\mid Z,T>V)}{1-P(T<t_0\mid Z,T>V)}=\alpha(t_0)+\beta(t_0)^\top Z, \qquad (2)$$

unless $T$ is continuous and $V<t_0$. Motivated by the CCSS study, we aim to estimate the model parameters in (1) using the data described below.

Suppose the study subjects are independent, with ages at the initial event and at the event of interest $V_i$ and $T_i$, respectively, for $i=1,\dots,n$. The event is subject to double censoring, with left censoring at $V_i$ and right censoring at $C_i$, where all $V_i$ and $C_i$ are available. For subject $i$, denote the observed age at study exit by $U_i=\min\{T_i,C_i\}$, the event indicator by $\delta_i=I(T_i\le C_i)$, and the $p\times 1$ covariate vector by $Z_i$. We assume that the event time and the right-censoring time are independent conditional on the covariates. The available data are $O=\{O_i,1\le i\le n\}$ with $O_i=(U_i,\delta_i,V_i,Z_i)$ for $i=1,\dots,n$.
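As a minimal sketch of the quantities just defined (not code from the paper; the function and field names are hypothetical), model (1) maps a covariate vector to the age-specific risk through the inverse logit, and each subject's observable record is obtained from the latent ages $(T_i, C_i)$ as $U_i=\min\{T_i,C_i\}$ and $\delta_i=I(T_i\le C_i)$. The numerical values in the last two lines are illustrative only.

```python
import math
from dataclasses import dataclass
from typing import List


def expit(x: float) -> float:
    """Numerically stable inverse logit, exp(x) / (1 + exp(x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def age_specific_risk(alpha: float, beta: List[float], z: List[float]) -> float:
    """P(T <= t0 | Z = z, T > V, V < t0) under model (1), given alpha(t0) and beta(t0)."""
    eta = alpha + sum(b * zj for b, zj in zip(beta, z))
    return expit(eta)


@dataclass
class Observed:
    """Observable record O_i = (U_i, delta_i, V_i, Z_i)."""
    U: float        # age at study exit, min(T_i, C_i)
    delta: int      # event indicator I(T_i <= C_i)
    V: float        # age at the initial event (cancer diagnosis)
    Z: List[float]  # covariate vector


def observe(T: float, C: float, V: float, Z: List[float]) -> Observed:
    """Reduce the latent ages (T_i, C_i) to what the study actually records."""
    return Observed(U=min(T, C), delta=int(T <= C), V=V, Z=Z)


# Illustrative values only
risk = age_specific_risk(-2.67, [-4.83, -1.0], [0.0, 0.0])
record = observe(T=30.0, C=25.0, V=8.0, Z=[0.38, 1.0])   # censored at age 25
```

Here `risk` is the inverse logit of $-2.67$, roughly 0.065, and `record` has $U=25$ and $\delta=0$ because censoring precedes the event.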
The model (1) is not conventional in survival analysis. However, it retains the natural parameter interpretation of logistic regression. By repeating the logistic regression analysis for different $t_0$, one can estimate how the covariate effects on the log-odds of the event risk change over time, which provides a natural dynamic risk prediction.

2.2 Proposed Estimation Procedures

When all the event ages $T_i$ are available at $t_0>V_i$, one may estimate the model parameters $\alpha(t_0)$ and $\beta(t_0)$ using standard iterative algorithms; see the book by McCullagh (2019). When $T_i$ is subject to censoring, the IPCW strategy has been widely employed (Robins and Rotnitzky, 2006). We assume that the support of the conditional survival function of the censoring time, $G(c\mid Z)=P(C\ge c\mid Z)$, contains $[0,t_0]$. Following the IPCW approach, we propose two estimating-function-based procedures.

2.2.1 Approach A: IPCW with generalized linear model

The approach of Im et al. (2023) can be written as

$$U(\alpha,\beta;t_0\mid G)=\sum_{i=1}^n\begin{bmatrix}1\\ Z_i\end{bmatrix}W_i^*(t_0;G)\left[I(T_i<t_0)-\frac{\exp\{\alpha(t_0)+\beta(t_0)^\top Z_i\}}{\exp\{\alpha(t_0)+\beta(t_0)^\top Z_i\}+1}\right], \qquad (3)$$

where the IPCW is

$$W_i^*(t_0;G)=\frac{I(U_i<t_0)\,\delta_i}{G(U_i\mid Z_i)}+\frac{I(U_i\ge t_0)}{G(t_0\mid Z_i)},\qquad i=1,\dots,n.$$

Adapting the estimating function in (3) to account for left censoring, approach A is given by

$$U_A(\alpha,\beta;t_0\mid G)=\sum_{i=1}^n I(t_0\ge V_i)\begin{bmatrix}1\\ Z_i\end{bmatrix}W_i(t_0;G)\left[I(T_i\le t_0)-\frac{\exp\{\alpha(t_0)+\beta(t_0)^\top Z_i\}}{\exp\{\alpha(t_0)+\beta(t_0)^\top Z_i\}+1}\right], \qquad (4)$$

where the IPCW is

$$W_i(t_0;G)=\frac{I(U_i\le t_0)\,\delta_i}{G(U_i\mid Z_i)}+\frac{I(U_i>t_0)}{G(t_0\mid Z_i)}. \qquad (5)$$

When the selected analysis time $t_0$ is after all the initial events of the study subjects, the available data reduce to right-censored event times, and the proposed estimating function becomes

$$\sum_{i=1}^n\begin{bmatrix}1\\ Z_i\end{bmatrix}W_i(t_0;G)\left[I(T_i\le t_0)-\frac{\exp\{\alpha(t_0)+\beta(t_0)^\top Z_i\}}{\exp\{\alpha(t_0)+\beta(t_0)^\top Z_i\}+1}\right],$$

which has been considered in the literature (Zheng et al., 2006; Blanche et al., 2023). The component $I(t_0\ge V_i)$ in (4) is necessary when $t_0$ is before age 21 in the CCSS study, to account for the left censoring in the data. Under some regularity conditions, it is straightforward to establish the consistency and asymptotic normality of the estimator obtained from the estimating function in (4) at a fixed $t_0>0$, provided that the event time and the censoring time are independent conditional on $Z$.

2.2.2 Approach B: Outcome-weighted IPCW

Because the IPCW in (5) is zero for any subject with $U_i\le t_0$ and $\delta_i=0$, approach A excludes all the subjects censored before $t_0$. Appropriately using the available information from those excluded subjects is likely to yield more efficient estimation, especially at values of $t_0$ where censoring is heavy. This consideration leads to approach B:

$$U_B(\alpha,\beta;t_0\mid G)=\sum_{i=1}^n I(t_0\ge V_i)\begin{bmatrix}1\\ Z_i\end{bmatrix}\left[I(T_i\le t_0)\,W_i(t_0;G)-\frac{\exp\{\alpha(t_0)+\beta(t_0)^\top Z_i\}}{\exp\{\alpha(t_0)+\beta(t_0)^\top Z_i\}+1}\right]. \qquad (6)$$

The estimating function (6) extends the outcome-weighted IPCW approach of Scheike et al. (2008) to account for left censoring. One can show that $U_B(\alpha,\beta;t_0\mid G)$ is unbiased if the event time and the censoring time are independent conditional on $Z$. It is also straightforward to establish, under some regularity conditions, the consistency and asymptotic normality of the estimator derived from $U_B(\alpha,\beta;t_0\mid G)$ in (6) at a fixed $t_0>0$.
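The two estimating functions differ only in where the weight enters: approach A multiplies the whole residual by $W_i(t_0;G)$, while approach B weights only the outcome, so subjects censored before $t_0$ (for whom $W_i=0$) still contribute $-\mu_i$. The following is a minimal pure-Python sketch, assuming $G$ is supplied as a function `G(c, Z)` returning $P(C\ge c\mid Z)>0$ on $[0,t_0]$ (known, or pre-estimated as in Section 2.3). The function names are hypothetical, and the root is found by a plain Newton-Raphson iteration, which is not necessarily the solver used by the authors.

```python
import math


def expit(x):
    """Numerically stable inverse logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def ipcw_weight(U, delta, Z, t0, G):
    """W_i(t0; G) in (5); G(c, Z) plays the role of P(C >= c | Z)."""
    if U <= t0:
        return delta / G(U, Z) if delta else 0.0   # censored before t0 -> weight 0
    return 1.0 / G(t0, Z)


def solve(A, b):
    """Solve the small linear system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit(data, t0, G, approach="A", max_iter=50, tol=1e-10):
    """Newton-Raphson root of U_A (approach "A", eq. 4) or U_B (approach "B", eq. 6).

    data: list of (U, delta, V, Z) tuples, Z a list of covariates.
    Returns [alpha(t0), beta_1(t0), ..., beta_p(t0)].
    """
    if approach not in ("A", "B"):
        raise ValueError("approach must be 'A' or 'B'")
    d = len(data[0][3]) + 1
    theta = [0.0] * d
    for _ in range(max_iter):
        score = [0.0] * d
        info = [[0.0] * d for _ in range(d)]      # minus the Jacobian of U
        for U, delta, V, Z in data:
            if t0 < V:                             # I(t0 >= V_i) = 0: not yet at risk
                continue
            x = [1.0] + list(Z)
            w = ipcw_weight(U, delta, Z, t0, G)
            y = 1.0 if (U <= t0 and delta == 1) else 0.0  # equals I(T_i <= t0) when w > 0
            mu = expit(sum(t * xj for t, xj in zip(theta, x)))
            if approach == "A":                    # W_i {I(T_i <= t0) - mu_i}
                resid, h = w * (y - mu), w * mu * (1.0 - mu)
            else:                                  # I(T_i <= t0) W_i - mu_i
                resid, h = y * w - mu, mu * (1.0 - mu)
            for j in range(d):
                score[j] += x[j] * resid
                for k in range(d):
                    info[j][k] += x[j] * x[k] * h
        step = solve(info, score)
        theta = [t + s for t, s in zip(theta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return theta
```

With `G = lambda c, Z: 1.0` (no censoring), every weight equals one, so both approaches reduce to ordinary logistic regression among subjects with $V_i\le t_0$ and return identical estimates. Note that the Newton step for approach B does not involve the weights, because only the outcome term of (6) is weighted.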
2.2.3 Comparison of Approaches A and B in asymptotic efficiency

We now compare the asymptotic efficiency of the two proposed approaches at a fixed time $t_0$. Let $\theta=(\alpha(t_0),\beta(t_0)^\top)^\top$, and let $\hat\theta_A$ and $\hat\theta_B$ denote the estimators of approaches A and B, which solve $U_A(\theta;t_0\mid G)=0$ and $U_B(\theta;t_0\mid G)=0$, respectively. Writing $U_A(\theta;t_0\mid G)=\sum_{i=1}^n A_i$ and $U_B(\theta;t_0\mid G)=\sum_{i=1}^n B_i$ with i.i.d. mean-zero terms $A_i$ and $B_i$, under the conventional regularity conditions we can show that $\sqrt n(\hat\theta_A-\theta)\xrightarrow{d}N(0,AV_A(\theta;G))$ and $\sqrt n(\hat\theta_B-\theta)\xrightarrow{d}N(0,AV_B(\theta;G))$ as $n\to\infty$, provided that the number of study subjects with an observed event,

$$n^*=\sum_{i=1}^n I(V_i<t_0)\,\delta_i,$$

tends to infinity as $n\to\infty$. The difference between the asymptotic variances of the two estimators is then

$$\begin{aligned}AV_A(\theta;G)-AV_B(\theta;G)&=\Gamma_A^{-1}(\theta;G)\,\Sigma_A(\theta;G)\,\{\Gamma_A^{-1}(\theta;G)\}^\top-\Gamma_B^{-1}(\theta;G)\,\Sigma_B(\theta;G)\,\{\Gamma_B^{-1}(\theta;G)\}^\top\\&=\Gamma_A^{-1}(\theta;G)\,\{\Sigma_A(\theta;G)-\Sigma_B(\theta;G)\}\,\{\Gamma_A^{-1}(\theta;G)\}^\top,\end{aligned}\qquad (7)$$

where $\Gamma_A(\theta;G)$ and $\Gamma_B(\theta;G)$ are the limits of $-\frac1n\,\partial U_A(\theta\mid G)/\partial\theta^\top$ and $-\frac1n\,\partial U_B(\theta\mid G)/\partial\theta^\top$, and $\Sigma_A(\theta;G)$ and $\Sigma_B(\theta;G)$ are the asymptotic variances of $n^{-1/2}U_A(\theta\mid G)$ and $n^{-1/2}U_B(\theta\mid G)$. The last equality in (7) holds because $\Gamma_A(\theta;G)=\Gamma_B(\theta;G)$, which follows from $E\{W_i(t_0;G)\mid T_i,Z_i\}=1$. The detailed derivation is provided in Supplementary Material ??. One may estimate $\Gamma_A(\theta;G)$ and $\Gamma_B(\theta;G)$ by $-\frac1n\,\partial U_A(\theta\mid G)/\partial\theta^\top$ and $-\frac1n\,\partial U_B(\theta\mid G)/\partial\theta^\top$, and $\Sigma_A(\theta;G)$ and $\Sigma_B(\theta;G)$ by $\frac1n\sum_{i=1}^n A_iA_i^\top$ and $\frac1n\sum_{i=1}^n B_iB_i^\top$, respectively. The estimated difference of the two asymptotic variances then provides an efficiency comparison of the two approaches: positive diagonal entries indicate that approach B is more efficient for the corresponding parameters.
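The plug-in comparison described above can be sketched as follows. This is a minimal illustration that assumes $G$ is known, as in Section 2.2.3, and that `theta` is the point at which $\hat\Gamma$ and $\hat\Sigma$ are evaluated (in practice, the estimate). It returns $\hat\Gamma^{-1}\hat\Sigma\hat\Gamma^{-\top}$ with $\hat\Gamma=-n^{-1}\partial U/\partial\theta^\top$ and $\hat\Sigma=n^{-1}\sum_i A_iA_i^\top$ (or $B_iB_i^\top$). All helper names are hypothetical.

```python
import math


def expit(x):
    """Numerically stable inverse logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def ipcw_weight(U, delta, Z, t0, G):
    """W_i(t0; G) in (5), with G(c, Z) standing for P(C >= c | Z)."""
    if U <= t0:
        return delta / G(U, Z) if delta else 0.0
    return 1.0 / G(t0, Z)


def inverse(M):
    """Gauss-Jordan inverse of a small nonsingular matrix."""
    n = len(M)
    A = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        p = A[c][c]
        A[c] = [v / p for v in A[c]]
        for r in range(n):
            if r != c:
                f = A[r][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [row[n:] for row in A]


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def plug_in_avar(theta, data, t0, G, approach):
    """Gamma^{-1} Sigma Gamma^{-T}, Gamma = -(1/n) dU/dtheta^T, Sigma = (1/n) sum_i A_i A_i^T."""
    if approach not in ("A", "B"):
        raise ValueError("approach must be 'A' or 'B'")
    n, d = len(data), len(theta)
    gamma = [[0.0] * d for _ in range(d)]
    sigma = [[0.0] * d for _ in range(d)]
    for U, delta, V, Z in data:
        if t0 < V:
            continue                 # the i-th term is zero, but it still counts in n
        x = [1.0] + list(Z)
        w = ipcw_weight(U, delta, Z, t0, G)
        y = 1.0 if (U <= t0 and delta == 1) else 0.0
        mu = expit(sum(t * xj for t, xj in zip(theta, x)))
        if approach == "A":
            r, h = w * (y - mu), w * mu * (1.0 - mu)
        else:
            r, h = y * w - mu, mu * (1.0 - mu)
        for j in range(d):
            for k in range(d):
                gamma[j][k] += x[j] * x[k] * h / n
                sigma[j][k] += x[j] * x[k] * r * r / n   # (A_i A_i^T)_{jk}
    g_inv = inverse(gamma)
    g_inv_t = [list(col) for col in zip(*g_inv)]
    return matmul(matmul(g_inv, sigma), g_inv_t)
```

Dividing the returned matrices by $n$ gives approximate variances of $\hat\theta_A$ and $\hat\theta_B$, and positive diagonal entries of the elementwise difference point to approach B being more efficient for those parameters. Here $\hat\Gamma_A$ and $\hat\Gamma_B$ are computed separately, even though their limits coincide.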
2.3 Practical Operation of Approaches A and B

The conditional distribution of the censoring time is usually unknown in practice. To implement the proposed approaches, we need to estimate $G(\cdot\mid Z)$ in (5) from the available data. Let $\tilde\theta_A$ be the parameter estimator derived from the estimating function $U_A(\alpha,\beta;t_0\mid\hat G)$. If $\hat G(\cdot\mid Z)$ is a uniformly consistent estimator, one may establish the consistency and asymptotic normality of $\tilde\theta_A$. Express the estimating function as $U_A(\alpha,\beta;t_0\mid\hat G)=\sum_{i=1}^n\tilde A_i$, where the $\tilde A_i$'s are correlated through $\hat G(\cdot\mid Z)$ and $E(\tilde A_i\mid Z_i,V_i)=0$. The sandwich estimator for the variance of $\tilde\theta_A$ is

$$\widehat{\mathrm{Var}}(\tilde\theta_A)=\left[\frac{\partial U_A(\theta;t_0\mid\hat G)}{\partial\theta^\top}\right]^{-1}\hat V_{U_A}(\theta;t_0\mid\hat G)\left\{\left[\frac{\partial U_A(\theta;t_0\mid\hat G)}{\partial\theta^\top}\right]^{-1}\right\}^\top, \qquad (8)$$

where $\hat V_{U_A}(\theta;t_0\mid\hat G)=\sum_{i=1}^n(\tilde A_i-\bar{\tilde A})(\tilde A_i-\bar{\tilde A})^\top$ with $\bar{\tilde A}=\frac1n\sum_{i=1}^n\tilde A_i$.

Similarly, the asymptotic properties of the estimator $\tilde\theta_B$ obtained from the estimating function $U_B(\theta;t_0\mid\hat G)$ can be derived. However, there appears to be no convenient way to implement the sandwich estimator for the variance of $\tilde\theta_B$ in the form

$$\left[\frac{\partial U_B(\theta;t_0\mid\hat G)}{\partial\theta^\top}\right]^{-1}\hat V_{U_B}(\theta;t_0\mid\hat G)\left\{\left[\frac{\partial U_B(\theta;t_0\mid\hat G)}{\partial\theta^\top}\right]^{-1}\right\}^\top,$$

where the middle term $\hat V_{U_B}(\theta;t_0\mid\hat G)$ is derived from $\mathrm{Var}\{U_B(\theta;t_0\mid\hat G)\}$. Write $U_B(\alpha,\beta;t_0\mid\hat G)=\sum_{i=1}^n\tilde B_i$, where the $\tilde B_i$'s are correlated through $\hat G(\cdot\mid Z)$ but, in general, $E(\tilde B_i\mid Z_i,V_i)\neq 0$. Since

$$\mathrm{Var}\{U_B(\theta;t_0\mid\hat G)\}=E\left[\mathrm{Var}\left\{\sum_{i=1}^n\tilde B_i\,\middle|\,Z_i,V_i\right\}\right]+\mathrm{Var}\left[E\left\{\sum_{i=1}^n\tilde B_i\,\middle|\,Z_i,V_i\right\}\right],$$

the first term on the right-hand side can be estimated by $\hat V^{app}_{U_B}(\theta;t_0\mid\hat G)=\sum_{i=1}^n(\tilde B_i-\bar{\tilde B})(\tilde B_i-\bar{\tilde B})^\top$ with $\bar{\tilde B}=\frac1n\sum_{i=1}^n\tilde B_i$. The second term results from using $\hat G(\cdot\mid Z)$ in the IPCW and is generally positive. It is hard to estimate, especially when $\hat G(\cdot\mid Z)$ is not a parametric estimator.
In fact, since

$$E(\tilde B_i\mid Z_i,V_i)=I(t_0\ge V_i)\begin{bmatrix}1\\ Z_i\end{bmatrix}E\left\{I(T_i\le t_0)\left[\frac{G(T_i\mid Z_i)}{\hat G(T_i\mid Z_i)}-1\right]\,\middle|\,U_i,Z_i,V_i\right\}, \qquad (9)$$

its variance estimation depends heavily on the estimator $\hat G(\cdot\mid Z)$. One may employ a resampling procedure to estimate the variance in practice. Alternatively, one may ignore the second term and use the form of the variance estimator for $\tilde\theta_A$ to estimate the variance of $\tilde\theta_B$ as

$$\widehat{\mathrm{Var}}(\tilde\theta_B)=\left[\frac{\partial U_B(\theta;t_0\mid\hat G)}{\partial\theta^\top}\right]^{-1}\hat V^{app}_{U_B}(\theta;t_0\mid\hat G)\left\{\left[\frac{\partial U_B(\theta;t_0\mid\hat G)}{\partial\theta^\top}\right]^{-1}\right\}^\top. \qquad (10)$$

This variance estimator underestimates the variance of $\tilde\theta_B$. When $\hat G(\cdot\mid Z)$ is sufficiently close to $G(\cdot\mid Z)$, the resulting bias may be acceptable. The sandwich estimators $\widehat{\mathrm{Var}}(\tilde\theta_A)$ and $\widehat{\mathrm{Var}}(\tilde\theta_B)$ in (8) and (10) are used to obtain the standard error estimates in the real data analysis reported in Section 3. We found that these estimated standard errors work well in practice, since they were close to the bootstrap standard errors. The bootstrap standard errors also work well when the number of bootstrap replicates is sufficiently large (Figure ?? in Supplementary Material). A detailed discussion is presented in Section 3.

We apply the survival random forest (SRF), a machine learning technique, to estimate $G(\cdot\mid Z)$ for implementing approaches A and B. Section 3 showcases how to apply approaches A and B to real data. Extensive simulations in Section 4 demonstrate the effectiveness of estimating $G(\cdot\mid Z)$ using the SRF with appropriately chosen hyperparameters, and also examine three alternative estimators of the censoring distribution: (i) a stratified empirical cumulative distribution function (ECDF), where the continuous covariate $Z_1$ is partitioned into the intervals $[0,0.25)$, $[0.25,0.5)$, $[0.5,0.75)$, and $[0.75,1]$; (ii) a Cox PH model for the censoring time $C$ (referred to as the standard Cox model in this paper); and (iii) a Cox PH model for the gap time $C^*=C-(V+5)$ (referred to as the gap-time Cox model in this paper). The latter formulation accounts for the CCSS cohort structure, in which all subjects survived at least five years after cancer diagnosis. The SRF method results in satisfactory performance for both approaches A and B in the simulation studies.

3 Analysis of Premature Ovarian Insufficiency (POI) Data

This section reports the analysis of the POI data from the Childhood Cancer Survivor Study (CCSS) using the proposed approaches. The real-world data analysis guided the design of the simulation studies reported in Section 4.

3.1 Descriptive Statistics

The CCSS is a multi-institutional study that enrolled childhood cancer survivors from 31 centers across North America. Eligibility criteria for the cohort included a cancer diagnosis before the age of 21 between 1970 and 1999, and survival of at least five years post-diagnosis (Robison et al., 2002, 2009). For this analysis, the study sample was further restricted to female survivors who were at least 18 years old at their last follow-up and had available self-reported menstrual information (Im et al., 2023). The event of interest is POI, which is defined as the cessation of ovarian function before the age of 40. This is a particularly salient late effect for female cancer survivors, with a cumulative incidence that far exceeds that of the general population (Mishra et al., 2017; Webber et al., 2016). Following Im et al. (2023), we select five exposures and two interaction terms as the potential covariates.
The five exposures are the rescaled age at cancer diagnosis (age divided by 21, so that its range is between 0 and 1; $Z_1$), race (Caucasian as the baseline, African American ($Z_2$), or Other ($Z_3$)), and the indicators for receiving bone marrow transplant (BMT; $Z_4$), alkylating agents ($Z_5$), and radiotherapy to the abdomen, pelvis, or whole body ($Z_6$). The two interactions are between the rescaled age at cancer diagnosis and the BMT indicator ($Z_1*Z_4$) and between the rescaled age at cancer diagnosis and the radiotherapy indicator ($Z_1*Z_6$). In the CCSS study, the event POI is defined as occurring after the age of cancer diagnosis and before age 40. The right-censoring age is the minimum of the age at surgical premature menopause (SPM), the age at a second malignant neoplasm, and the age at last follow-up. The censoring ages of all the study subjects are available. Assuming the data are missing at random, we analyzed the 6,961 subjects with complete data. Table 1 presents descriptive statistics of the covariates and the POI status. Among these subjects, 11.47% experienced POI before censoring. The majority of the subjects were Caucasian. More than half received alkylating agents, while 22.63% received radiotherapy and only 4.28% received BMT.

3.2 Inferential Analysis

The study by Im et al. (2023) focused on estimating the probability of POI between ages 21 and 40 conditional on covariates; we refer to it as the "approach by Im et al." below. If their response variable at the analysis time $t_0$, $I(T_i<t_0)$, is replaced by our response variable, $I(T_i\le t_0)$, the approach by Im et al. is the same as the proposed approach A without accounting for the left censoring due to the cancer diagnosis, that is, an IPCW-based generalized linear model (GLM) approach.
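The covariate coding described in Section 3.1 can be written out explicitly. The helper below is a hypothetical illustration (its name, argument names, and race labels are assumptions, not CCSS variable names); it returns $(Z_1,\dots,Z_6,Z_1Z_4,Z_1Z_6)$ with Caucasian as the baseline race category.

```python
from typing import List

RACE_LEVELS = ("Caucasian", "African American", "Other")   # Caucasian is the baseline


def covariates(age_dx: float, race: str, bmt: bool, alkylating: bool,
               radiation: bool) -> List[float]:
    """Return [Z1, Z2, Z3, Z4, Z5, Z6, Z1*Z4, Z1*Z6] for one subject (hypothetical helper)."""
    if race not in RACE_LEVELS:
        raise ValueError(f"unknown race category: {race!r}")
    if not 0.0 <= age_dx <= 21.0:
        raise ValueError("age at diagnosis must lie in [0, 21] for this cohort")
    z1 = age_dx / 21.0                                  # rescaled age at diagnosis in [0, 1]
    z2 = 1.0 if race == "African American" else 0.0    # race dummy, baseline Caucasian
    z3 = 1.0 if race == "Other" else 0.0
    z4, z5, z6 = float(bmt), float(alkylating), float(radiation)
    return [z1, z2, z3, z4, z5, z6, z1 * z4, z1 * z6]
```

For example, a subject diagnosed at age 10.5 with race "Other", BMT, and radiotherapy but no alkylating agents gets $Z_1=0.5$ and both interaction terms equal to 0.5.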
We conduct an analysis at ages from 17 to 40 by applying the three approaches: the approach by Im et al. and approaches A and B. The $G(\cdot\mid Z)$ in the IPCW is estimated by the SRF, with both the number of trees and the minimal node size to split at (referred to as the node size in this paper) set to 100. All analyses are conducted in R. The R packages ranger and survival were employed to estimate $\hat G(\cdot\mid Z)$.

Table 1: Descriptive statistics of the POI data used in the analysis (total n = 6961)

| Variable | Category | Number of subjects |
|---|---|---|
| POI development | Yes | 798 |
| | No | 6163 |
| Age at cancer diagnosis ($21\times Z_1$) | Median (Q1, Q3) | 7.43 (3.27, 13.86) |
| Race (vs Caucasian) | African American ($Z_2=1$) | 389 |
| | Other ($Z_3=1$) | 469 |
| | Caucasian ($Z_2=Z_3=0$) | 6103 |
| Received BMT | Yes ($Z_4=1$) | 298 |
| | No ($Z_4=0$) | 6663 |
| Received alkylating agents | Yes ($Z_5=1$) | 3516 |
| | No ($Z_5=0$) | 3445 |
| Received radiotherapy | Yes ($Z_6=1$) | 1575 |
| | No ($Z_6=0$) | 5386 |

The estimated age-specific intercept and regression coefficients between ages 21 and 40 are presented in Figure 1a. The estimates from the approach by Im et al. differ from those of approaches A and B before age 21. The difference is most apparent in the estimates of the coefficient of $Z_1$, the rescaled age at cancer diagnosis. This is consistent with the estimates from the approach by Im et al. being biased before $t_0=21$, since that approach does not adjust the at-risk set. When $t_0\ge 21$, the approach by Im et al. is equivalent to approach A and thus yields the same estimates. The estimates by approaches A and B are similar except for the coefficients of $Z_4$ (BMT) and the interaction $Z_1*Z_4$, whose estimates by approach A fluctuate after age 30. This is likely because only 4.28% of the study subjects received BMT. The standard errors (SEs) estimated using the sandwich variance estimators (8) and (10) are presented in Figure 1b.
In general, the SEs associated with approach B are smaller than those associated with approach A, particularly after age 30, and the gap widens as $t_0$ increases. The estimated SEs for the coefficients of $Z_4$ and the interaction $Z_1*Z_4$ are relatively large compared with the other estimated SEs, which again reflects the limited information in the data on the effect of $Z_4$.

Figure 1: POI data analysis outcomes by the three approaches, with $\hat G(\cdot\mid Z)$ estimated by SRF (number of trees and node size both set to 100). (a) Estimated coefficients; (b) sandwich standard error estimates, calculated using (8) and (10), respectively. $Z_1$: rescaled age at cancer diagnosis; $Z_2$: race, African American; $Z_3$: race, Other; $Z_4$: receipt of BMT; $Z_5$: receipt of alkylating agents; $Z_6$: receipt of radiation to the abdomen/pelvis/total body.

We constructed approximate 95% confidence intervals (CIs) for all the model parameters of approaches A and B (Figure ?? in Supplementary Material). We observe the following covariate effects on POI risk. The younger the age at cancer diagnosis ($Z_1$), the lower the risk, but the effect of age at cancer diagnosis diminishes as $t_0$ increases. Non-Caucasian subjects consistently show a higher risk than Caucasian subjects; however, these effects were not statistically significant at most ages. The estimated effect of BMT depends on the approach used. Under approach B, subjects exposed to BMT ($Z_4$) have a significantly higher risk than those not exposed. The age at cancer diagnosis modifies the effect of BMT: the BMT exposure effect increases as the age at cancer diagnosis increases. The BMT exposure effect on POI risk increases with attained age ($t_0$) up to 25 and then remains at the same level. The alkylating agents ($Z_5$) significantly increase POI risk.
Its effect shows an increasing trend over $t_0$. Radiation exposure ($Z_6$) also increases POI risk, with an increasing trend over $t_0$. The age at cancer diagnosis does not modify the effect of radiation exposure ($Z_1*Z_6$).

We estimated the conditional distribution of the censoring time $G(\cdot\mid Z)$ using SRF, the stratified ECDF, the standard Cox model, and the gap-time Cox model. These estimates are presented in Figures ?? and ?? (Supplementary Material). We observe that the method used to estimate the censoring distribution has a substantial impact on the coefficient estimates. On the other hand, its effect on standard error estimation is negligible across all approaches (Figures ?? and ??, Supplementary Material).

In addition, locally estimated scatterplot smoothing (LOESS) was applied to each collection of coefficients estimated by the three approaches. The LOESS curves with span 0.5 are presented in Figure 2, which shows the trends of the covariate effects over $t_0$. The trends from approaches A and B are similar, except for those associated with $Z_4$ (BMT receipt) and its interaction with $Z_1$ (rescaled age at cancer diagnosis). A more comprehensive comparison, with span values 0.3, 0.5, and 0.8, is presented in Figure ??, and the same conclusion can be drawn.

Figure 2: LOESS curves (span 0.5) of the coefficients estimated by the three approaches, with $\hat G(\cdot\mid Z)$ obtained using SRF (number of trees and node size both set to 100). $Z_1$: rescaled age at cancer diagnosis; $Z_2$: race, African American; $Z_3$: race, Other; $Z_4$: receipt of BMT; $Z_5$: receipt of alkylating agents; $Z_6$: receipt of radiation to the abdomen/pelvis/total body.
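Of the censoring-distribution estimators compared above, the stratified ECDF is the simplest to write down, because every subject's censoring age is observed in these data. Within each stratum of $Z_1$ (the intervals listed in Section 2.3), $\hat G(c\mid Z)$ is the proportion of subjects whose censoring age is at least $c$. A minimal sketch, assuming $Z_1$ is the first entry of the covariate vector; the names are hypothetical.

```python
import bisect
from typing import Callable, List, Sequence

Z1_EDGES = (0.25, 0.5, 0.75)   # strata [0, .25), [.25, .5), [.5, .75), [.75, 1]


def stratum(z1: float) -> int:
    """Index of the Z1 interval that contains z1."""
    return bisect.bisect_right(Z1_EDGES, z1)


def stratified_ecdf_G(C: Sequence[float],
                      Z1: Sequence[float]) -> Callable[[float, Sequence[float]], float]:
    """Build G_hat(c, Z) = share of subjects in Z's stratum with censoring age >= c.

    Assumes every censoring age C_j is observed and that Z1 is Z[0].
    """
    groups: List[List[float]] = [[] for _ in range(len(Z1_EDGES) + 1)]
    for c, z1 in zip(C, Z1):
        groups[stratum(z1)].append(c)
    for g in groups:
        g.sort()

    def G_hat(c: float, Z: Sequence[float]) -> float:
        g = groups[stratum(Z[0])]
        if not g:
            raise ValueError("no subjects in this Z1 stratum")
        # len(g) minus the number of C_j < c equals the number of C_j >= c
        return (len(g) - bisect.bisect_left(g, c)) / len(g)

    return G_hat
```

The returned `G_hat(c, Z)` can be plugged in wherever a function $G(c, Z)$ is required. Because $\hat G(c\mid Z)$ drops to zero beyond the largest censoring age in a stratum, the IPCW in (5) is undefined there, which is one practical reason the support condition on $G$ in Section 2.2 matters.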
4 Simulation

We conducted three simulation studies to examine the consistency, efficiency, and robustness of the proposed estimators under four methods for estimating the censoring distribution: the standard Cox model, the gap-time Cox model, the stratified ECDF, and SRF. We also report the results obtained with the true conditional survival function of the censoring time, $G(\cdot\mid Z)$, as a benchmark. In all simulation studies, each dataset consisted of $n=7000$ subjects and included two covariates: a continuous variable $Z_1$ and a binary variable $Z_2$. All results are based on 1000 replications. The simulations were conducted in R.

4.1 Data Generation

The main difference among the three simulation studies lies in how the event time $T\mid Z,T>V$ is generated. The data generation procedure for subject $i$ in Simulations 1 and 3 is given in Algorithm 1. In Simulation 2, an intermediate step, Step 1b, is introduced after Step 1: subjects are stratified into two groups based on $V_i$, and the event indicator $I(T_i\le t_0)$ given $Z_i$ and $T_i>V_i$ is then simulated from different logistic distributions depending on $V_i$. In all simulation studies, the time unit $s$ is set to 1/12 (one month). The continuous covariate $Z_1$ was sampled from a Beta($a_1$, $a_2$) distribution, with the age at diagnosis defined as $V=21\times Z_1$. We specified $(a_1,a_2)=(0.94,1.06)$ for Simulations 1 and 3, and $a_1=a_2=2$ for Simulation 2. The binary covariate $Z_2$ followed a Bernoulli(0.40) distribution. At each given $t_0$, the event time $T\mid Z,T>V,V<t_0$ was generated from its assumed distribution only for subjects with $I(T_i\le t_0)=1$. To mimic the POI data, in which subjects survived at least 5 years post-diagnosis, the censoring time was defined as $C=C^*+V+5$.
The component $C^*$ was generated from a Weibull distribution with shape parameter $\psi_{3i}$ and scale parameter $\psi_{4i}$. These parameters varied by scenario: $\psi_{3i}=3.34-0.10\times Z_{2i}$ and $\psi_{4i}=21.00-2.00\times Z_{2i}$ in Simulation 1.1; $\psi_{3i}=6.00-1.00\times Z_{2i}$ and $\psi_{4i}=31.00-2.00\times Z_{2i}$ in Simulation 1.2; $\psi_{3i}=3.34-0.10\times Z_{2i}$ and $\psi_{4i}=22.00-2.00\times Z_{2i}$ in Simulation 2; and $\psi_{3i}=3.34-2.00\times Z_{2i}$ and $\psi_{4i}=20.00$ in Simulation 3. Descriptive statistics for the simulated studies are summarized in Table 2.

Algorithm 1: Simulation process for repetition $k$, where $1\le k\le K$

For subject $i=1,\dots,n$ do:
1. Generate $Z_{1i}\sim\mathrm{Beta}(a_1,a_2)$ and $Z_{2i}\sim\mathrm{Bern}(p)$.
2. Calculate $V_i=21\times Z_{1i}$.
3. Generate the censoring time:
   • Sample $C^*_i\sim\mathrm{Weibull}(\psi_{3i},\psi_{4i})$.
   • Set $C_i=C^*_i+V_i+5$.
For subject $i=1,\dots,n$, let $Y_i(t_0):=I(T_i\le t_0)$, and for each $t_0$ do:
4. Determine the event status at $t_0$:
   • Calculate $\pi_i(t_0)=P(T_i\le t_0\mid Z_i,T_i>V_i,V_i<t_0)$.
   • Sample $Y_i(t_0)\sim\mathrm{Bern}(\pi_i(t_0))$.
5. If $Y_i(t_0)=1$ then:
   • Set $q\leftarrow 1$.
   • While $Y_i(t_0-(q-1)s)=1$ do:
     - Calculate $\pi_i(t_0-qs)=P(T_i\le t_0-qs\mid Z_i,T_i>V_i,V_i<t_0-qs)$.
     - Check threshold: if $\pi_i(t_0-qs)<0.005$, set $Y_i(t_0-qs)=0$ and break.
     - Calculate $\rho_{cond}=\pi_i(t_0-qs)/\pi_i(t_0-(q-1)s)$.
     - Sample $Y_i(t_0-qs)\sim\mathrm{Bern}(\rho_{cond})$.
     - If $Y_i(t_0-qs)=1$, set $q\leftarrow q+1$; else break.
   • Set $T_i=t_0-(q-1)s$.
6. Let $\delta_i=I(T_i\le C_i)$.
7. Store the generated data $O_i(t_0)^{(k)}=\{Y_i(t_0),\delta_i,V_i,C_i,Z_i\}\cup\{T_i:\text{if }Y_i(t_0)=1\}$.
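Algorithm 1 can be translated into a short Python sketch for Simulation 1.1. Assumptions: $\pi_i(t)$ comes from model (11) with the Simulation 1.1 values given in Section 4.2.1; Python's `random.weibullvariate` takes the scale first and the shape second, so $C^*_i$ is drawn as `weibullvariate(psi4, psi3)`; and for subjects with $Y_i(t_0)=0$, whose $T_i$ is not generated, we store $T_i=+\infty$ as a placeholder, which gives $\delta_i=0$ and leaves $I(U_i\le t_0)\delta_i$ and $I(U_i>t_0)$ correct at that $t_0$. Function names are hypothetical.

```python
import math
import random


def expit(x):
    """Numerically stable inverse logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def pi_sim11(t, z1, z2):
    """pi_i(t) from model (11): gamma0 = -7.5, gamma1 = 0.23, beta1 = -4.83, beta2 = -1.00."""
    return expit(-7.5 + 0.23 * t - 4.83 * z1 - 1.00 * z2)


def draw_baseline(rng, a=(0.94, 1.06), p=0.40):
    """Steps 1-3: Z_i, V_i and C_i under the Simulation 1.1 censoring scenario."""
    z1 = rng.betavariate(a[0], a[1])
    z2 = 1.0 if rng.random() < p else 0.0
    v = 21.0 * z1
    shape, scale = 3.34 - 0.10 * z2, 21.00 - 2.00 * z2   # psi_3i, psi_4i
    c = rng.weibullvariate(scale, shape) + v + 5.0        # C_i = C*_i + V_i + 5
    return z1, z2, v, c


def draw_event(t0, z1, z2, rng, pi=pi_sim11, s=1.0 / 12, floor=0.005):
    """Steps 4-5: Y_i(t0) and, when Y_i(t0) = 1, T_i by stepping back in units of s."""
    if rng.random() >= pi(t0, z1, z2):
        return 0, math.inf          # Y_i(t0) = 0; T_i not generated (placeholder +inf)
    q = 1
    while True:
        cur = pi(t0 - q * s, z1, z2)
        if cur < floor:             # threshold check: Y_i(t0 - q s) = 0
            break
        rho = cur / pi(t0 - (q - 1) * s, z1, z2)
        if rng.random() < rho:      # Y_i(t0 - q s) = 1: keep stepping back
            q += 1
        else:
            break
    return 1, t0 - (q - 1) * s


def simulate(n, t0s, seed=1):
    """One replicate of Algorithm 1: one record per (subject, t0), including Steps 6-7."""
    rng = random.Random(seed)
    records = []
    for i in range(n):
        z1, z2, v, c = draw_baseline(rng)
        for t0 in t0s:
            y, t = draw_event(t0, z1, z2, rng)
            records.append({"i": i, "t0": t0, "Y": y, "T": t, "C": c, "V": v,
                            "delta": int(t <= c), "Z": [z1, z2]})
    return records
```

The backward walk draws $Y_i(t_0-qs)=1$ with probability $\pi_i(t_0-qs)/\pi_i(t_0-(q-1)s)$, so that, apart from the 0.005 truncation, $P\{Y_i(t_0-qs)=1\}=\pi_i(t_0-qs)$ marginally. This requires $\pi_i(t)$ to be nondecreasing in $t$, which holds here because $\gamma_1>0$.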
4.2 Simulation Outcomes

In Simulations 1 and 2, we assessed the consistency and efficiency of the proposed approaches using four metrics: the sample mean of the estimates (SMEAN), the sample standard deviation of the estimates (SSD), the sample mean of the estimated standard errors (SMESE), and the root sample mean squared error of the estimates (RSMSE). In Simulations 2 and 3, we assessed the robustness of the proposed approaches by comparing the estimated conditional survival probabilities with their true values.

Table 2: Descriptive statistics of the censoring rate and the age at cancer diagnosis in the simulation studies.

Censoring rate
Age   Cleaned data   Sim 1.1   Sim 1.2   Sim 2    Sim 3
13    0.10%          -         -         0.11%    -
14    0.14%          -         -         0.20%    -
15    0.22%          -         -         0.33%    0.94%
20    -              -         -         2.29%    4.87%
21    5.57%          8.21%     0.35%     -        -
25    -              -         -         7.39%    14.38%
30    31.38%         35.86%    3.41%     14.96%   29.94%
35    50.51%         55.00%    6.83%     22.12%   48.58%
40    66.46%         71.12%    10.58%    26.55%   65.82%

Age at cancer diagnosis (V)
Statistic   Cleaned data   Sim 1 and 3   Sim 2
Min         0.00           0.00          0.13
Q1          3.27           4.56          6.86
Median      7.43           9.62          10.50
Q3          13.86          15.01         14.15
Max         21.00          20.99         20.87
Mean        8.59           9.87          10.50
SD          5.95           6.05          4.69

4.2.1 Simulation 1: Examining consistency and efficiency

We consider two different censoring rates.

Simulation 1.1: Heavy censoring

We generated data with a censoring rate comparable to that of the real data described above. Specifically, the data were generated from the logistic regression model
$$\log \frac{P(T \le t \mid Z, T > V)}{1 - P(T \le t \mid Z, T > V)} = \alpha(t) + \beta_1(t) Z_1 + \beta_2(t) Z_2, \tag{11}$$
where the intercept is $\alpha(t) = \gamma_0 + \gamma_1 t$, with $\gamma_0 = -7.5$ and $\gamma_1 = 0.23$. The regression coefficients $\beta(t) = \{\beta_1(t), \beta_2(t)\}^T$ are held constant over time, with $\beta_1(t) = \beta_1 = -4.83$ and $\beta_2(t) = \beta_2 = -1.00$. All estimates from the three approaches were evaluated at four distinct time points, $t_0 = 21$, 30, 35, and 40, using the simulated data.
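The four metrics can be computed from the $K = 1000$ replicate estimates as in the sketch below, which also evaluates the true intercepts $\alpha(t_0) = \gamma_0 + \gamma_1 t_0$ at the four analysis ages; these agree with the values $-2.670$, $-0.600$, $0.550$, and $1.700$ listed in Table 3. Where the text is silent, the definitions are assumptions: SSD uses the divisor $K - 1$, and RSMSE is taken as the square root of the mean squared deviation of the estimates from the true value. The function name summarize is hypothetical.

```python
import numpy as np


def summarize(estimates, std_errors, truth):
    """SMEAN, SSD, SMESE, and RSMSE over K simulation replications.

    estimates, std_errors: arrays of length K for one coefficient at one t0.
    truth: the true coefficient value.
    """
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    return {
        "SMEAN": est.mean(),
        "SSD": est.std(ddof=1),                          # K - 1 divisor (assumption)
        "SMESE": se.mean(),
        "RSMSE": np.sqrt(np.mean((est - truth) ** 2)),   # root mean squared error
    }


# True intercepts in Simulation 1.1: alpha(t0) = -7.5 + 0.23 * t0
gamma0, gamma1 = -7.5, 0.23
true_alpha = {t0: gamma0 + gamma1 * t0 for t0 in (21, 30, 35, 40)}
print({t0: round(a, 3) for t0, a in true_alpha.items()})
# {21: -2.67, 30: -0.6, 35: 0.55, 40: 1.7}
```

Note that RSMSE combines bias and spread: its square equals the squared bias $(\mathrm{SMEAN} - \text{truth})^2$ plus $\mathrm{SSD}^2 (K-1)/K$, which is why it is used below to compare methods on both accounts at once.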
The three hyperparameters of SRF (the number of candidate variables drawn at each split, the node size, and the number of trees) are set to 2, 200, and 100, respectively; we chose this combination because it yielded the lowest RSMSE. Except for the stratified ECDF estimates of $\alpha(t_0)$ and $\beta_1(t_0)$, the coefficients estimated by Approach B have smaller RSMSEs than those estimated by Approach A at ages 35 and 40. From Table 3 and Figure ?? (Supplementary Material), we found that consistency held when the censoring distribution was estimated by methods other than the standard Cox model. The standard Cox model yielded biased estimates of both $\beta_1(t)$ and $\beta_2(t)$ under both approaches as the censoring rate increased at ages 30, 35, and 40, likely because the proportional hazards assumption is violated for the censoring process.

Table 3: Results of Simulation 1.1. We report the sample mean of the estimates (SMEAN), the sample standard deviation of the estimates (SSD), the sample mean of the estimated standard errors (SMESE), and the root sample mean squared error of the estimates (RSMSE) of each approach. In SRF, the number of trees is set to 100 and the node size to 200. Within each of the three column groups, the five columns are SRF, Cox C (standard Cox model for C), Cox C* (gap-time Cox model for C*), ECDF (stratified ECDF), and True (true G(·|Z)).

Age 21 (true values: α(21) = -2.670, β1(21) = -4.830, β2(21) = -1.000)
                          α(21)                     |               β1(21)               |               β2(21)
Approach used by Im et al. / Approach A
  SMEAN  -2.666 -2.615 -2.670 -2.637 -2.668 | -4.916 -5.037 -4.912 -5.013 -4.913 | -1.027 -1.021 -1.026 -1.023 -1.030
  SSD     0.189  0.190  0.189  0.190  0.189 |  0.691  0.702  0.692  0.708  0.692 |  0.309  0.307  0.309  0.307  0.309
  SMESE   0.187  0.187  0.187  0.188  0.187 |  0.653  0.663  0.653  0.670  0.653 |  0.311  0.310  0.311  0.310  0.311
  RSMSE   0.189  0.197  0.189  0.193  0.189 |  0.696  0.732  0.696  0.731  0.697 |  0.310  0.307  0.310  0.308  0.310
Approach B
  SMEAN  -2.670 -2.730 -2.664 -2.702 -2.668 | -4.908 -4.713 -4.916 -4.792 -4.913 | -1.033 -1.047 -1.038 -1.033 -1.030
  SSD     0.189  0.186  0.190  0.185  0.190 |  0.691  0.668  0.692  0.664  0.691 |  0.309  0.307  0.309  0.307  0.309
  SMESE   0.187  0.183  0.187  0.183  0.186 |  0.653  0.632  0.653  0.628  0.653 |  0.311  0.309  0.311  0.310  0.311
  RSMSE   0.189  0.195  0.190  0.188  0.189 |  0.695  0.678  0.697  0.665  0.695 |  0.311  0.310  0.311  0.309  0.310

Age 30 (true values: α(30) = -0.600, β1(30) = -4.830, β2(30) = -1.000)
                          α(30)                     |               β1(30)               |               β2(30)
Approach used by Im et al. / Approach A
  SMEAN  -0.578 -0.586 -0.593 -0.524 -0.595 | -4.890 -4.819 -4.859 -5.065 -4.857 | -0.993 -1.026 -1.013 -0.997 -1.004
  SSD     0.123  0.121  0.125  0.117  0.127 |  0.327  0.317  0.331  0.325  0.333 |  0.193  0.188  0.195  0.180  0.195
  SMESE   0.131  0.126  0.129  0.124  0.129 |  0.346  0.325  0.342  0.337  0.342 |  0.190  0.183  0.190  0.179  0.189
  RSMSE   0.125  0.122  0.126  0.139  0.127 |  0.332  0.317  0.333  0.401  0.334 |  0.193  0.190  0.195  0.180  0.195
Approach B
  SMEAN  -0.616 -0.805 -0.594 -0.742 -0.600 | -4.813 -4.366 -4.848 -4.420 -4.845 | -1.023 -1.016 -1.017 -1.020 -1.006
  SSD     0.121  0.112  0.122  0.108  0.125 |  0.322  0.296  0.326  0.271  0.331 |  0.193  0.184  0.195  0.180  0.196
  SMESE   0.130  0.119  0.129  0.115  0.129 |  0.341  0.308  0.340  0.283  0.340 |  0.188  0.178  0.188  0.178  0.188
  RSMSE   0.122  0.234  0.122  0.178  0.125 |  0.322  0.551  0.326  0.491  0.331 |  0.194  0.185  0.195  0.181  0.196

Age 35 (true values: α(35) = 0.550, β1(35) = -4.830, β2(35) = -1.000)
                          α(35)                     |               β1(35)               |               β2(35)
Approach used by Im et al. / Approach A
  SMEAN   0.704  0.229  0.575  0.644  0.567 | -5.175 -4.148 -4.886 -5.089 -4.886 | -0.925 -1.136 -1.040 -1.003 -1.006
  SSD     0.188  0.225  0.197  0.160  0.202 |  0.421  0.500  0.450  0.356  0.455 |  0.219  0.333  0.264  0.212  0.258
  SMESE   0.198  0.220  0.197  0.171  0.197 |  0.421  0.461  0.429  0.357  0.426 |  0.227  0.306  0.251  0.217  0.245
  RSMSE   0.243  0.392  0.198  0.186  0.202 |  0.544  0.846  0.453  0.440  0.459 |  0.231  0.360  0.267  0.212  0.258
Approach B
  SMEAN   0.448  0.436  0.550  0.295  0.548 | -4.640 -4.679 -4.845 -4.183 -4.841 | -1.063 -0.911 -1.000 -1.017 -1.011
  SSD     0.159  0.209  0.181  0.132  0.193 |  0.351  0.520  0.417  0.270  0.432 |  0.212  0.296  0.246  0.205  0.247
  SMESE   0.184  0.217  0.193  0.150  0.193 |  0.379  0.504  0.419  0.280  0.416 |  0.219  0.277  0.238  0.205  0.234
  RSMSE   0.189  0.238  0.180  0.287  0.193 |  0.399  0.541  0.417  0.701  0.431 |  0.221  0.309  0.246  0.206  0.247

Age 40 (true values: α(40) = 1.700, β1(40) = -4.830, β2(40) = -1.000)
                          α(40)                     |               β1(40)               |               β2(40)
Approach used by Im et al. / Approach A
  SMEAN   2.422  0.542  1.861  1.858  1.845 | -6.103 -3.044 -5.147 -5.186 -5.139 | -0.818 -1.312 -1.053 -0.979 -0.993
  SSD     0.314  0.986  0.505  0.387  0.508 |  0.530  1.695  0.955  0.712  0.943 |  0.244  1.298  0.519  0.439  0.497
  SMESE   0.328  0.610  0.429  0.409  0.430 |  0.540  0.955  0.742  0.693  0.740 |  0.263  0.671  0.411  0.416  0.399
  RSMSE   0.787  1.520  0.530  0.418  0.528 |  1.378  2.461  1.006  0.796  0.991 |  0.304  1.334  0.521  0.439  0.497
Approach B
  SMEAN   1.140  2.433  1.675  1.289  1.678 | -3.859 -5.729 -4.762 -3.928 -4.772 | -1.173 -1.053 -1.011 -1.021 -1.074
  SSD     0.222  0.839  0.451  0.257  0.447 |  0.364  1.201  0.805  0.430  0.817 |  0.226  0.857  0.449  0.339  0.428
  SMESE   0.270  1.294  0.504  0.321  0.488 |  0.433  1.834  0.879  0.515  0.868 |  0.256  1.069  0.440  0.358  0.415
  RSMSE   0.602  1.114  0.452  0.485  0.448 |  1.037  1.499  0.807  0.999  0.818 |  0.284  0.858  0.449  0.339  0.434

Figure 3: Differences between the estimated variances of the estimators by Approach A and Approach B when $G(\cdot \mid Z)$ is known in Simulation 1.1. The red line in each plot represents a difference of 0.

The efficiency of the estimated coefficients depends on the censoring rate at $t_0$ as well as on the method used to estimate the censoring distribution (Table 3 and Figure ??). The SE estimation is valid when the censoring distribution is estimated by methods other than the standard Cox model, as the SMESE and SSD are in close agreement. Although the SMESE and SSD are also similar when the standard Cox model is used, the bias in the estimates of $\beta_1(t)$ and $\beta_2(t)$ makes a discussion of SE validity for that model unnecessary.

Figure 3 presents the differences between the sample variances of the estimators by Approaches A and B, scaled by $n$ as in (S7) of the Supplementary Material, when $G(\cdot \mid Z)$ is known. The red horizontal line at zero serves as a reference for equal efficiency; positive values indicate that Approach B is more efficient than Approach A, and such values appear as $t_0$ rises, corresponding to a higher censoring rate.

Next, we compared the RSMSE, which accounts for both bias and variance, across the four methods used to estimate the censoring distribution (Table 3 and Figure ??), with the true $G(\cdot \mid Z)$ serving as a benchmark. With Approach A, the well-tuned SRF and the gap-time Cox model consistently perform well, closely matching the true $G(\cdot \mid Z)$ at the earlier ages (21, 30, and 35). The standard Cox model also performs adequately at early ages; however, it performs poorly at age 40 for all coefficients, where its RSMSE spikes dramatically.
This shows that the standard Cox model is highly unstable at later time points compared with SRF and the gap-time Cox model, which we attribute to the failure of the proportional hazards assumption. With Approach B, SRF and the gap-time Cox model perform well, almost perfectly mirroring the true $G(\cdot \mid Z)$ for $\alpha(t)$ and $\beta_1(t)$ across all ages. Across both approaches, the stratified ECDF performs notably well for $\beta_2(t)$, achieving the lowest or near-lowest RSMSE at most ages; however, its performance for the intercept and for $\beta_1(t)$ depends on the approach. Overall, these results suggest that the gap-time Cox model and a well-tuned SRF provide reliable estimation across most settings, while the standard Cox model should be used with caution at later time points, where the proportional hazards assumption is most likely to be violated.

In conclusion, both approaches provide consistent estimators when the censoring distribution is estimated well. When the censoring rate is high, Approach B is more efficient than Approach A when the censoring distribution is estimated using a stratified ECDF, a well-tuned SRF, a standard Cox model, or a gap-time Cox model. The results underscore the importance of modeling the censoring distribution well to achieve valid and efficient inference with doubly censored data.

Simulation 1.2: Light censoring

We simulated the outcome data from the same model as in Simulation 1.1 but changed the values of $\psi_{3i}$ and $\psi_{4i}$ to yield a much lower censoring rate. The simulation results are presented in Supplementary ??. When the censoring distribution is estimated well, both Approaches A and B provide consistent estimates, and Approach A is more efficient. Furthermore, the RSMSE of $\beta_2(t)$ is high when the censoring distribution is estimated using either the standard or the gap-time Cox model.
This is because the shape parameter $\psi_{3i}$ of the $C^*$ distribution depends on the covariate $Z_2$, which violates the proportional hazards assumption even for the gap-time Cox model. Simulations 1.1 and 1.2 indicate that the relative performance of the two proposed approaches depends on the censoring distribution, which is consistent with previous findings for right-censored data (Blanche et al., 2023).

4.2.2 Simulation 2: Examining the Proposed Approaches under a Mixture of Logistic Distributions

This simulation study served two purposes: (i) to verify the necessity of accounting for left censoring, and (ii) to investigate the robustness of the proposed approaches against misspecification of the outcome model. The event time was generated from a mixture of two logistic regression models depending on the age at cancer diagnosis:
$$\log \frac{P(T \le t \mid Z, T > V)}{1 - P(T \le t \mid Z, T > V)} = \begin{cases} \alpha_1(t) + \beta_1(t) Z_1 + \beta_{21}(t) Z_2, & \text{when } 0 \le V < 16, \\ \alpha_2(t) + \beta_1(t) Z_1 + \beta_{22}(t) Z_2, & \text{when } 16 \le V \le 21, \end{cases} \tag{12}$$
where $\alpha_1(t) = \gamma_{01} + \gamma_1 t$ with $\gamma_{01} = -6.3$ and $\gamma_1 = 0.30$, $\beta_1(t) = \beta_1$, and $\beta_{21}(t) = \beta_{21}$. We set $\beta_1 = -0.36$ and $\beta_{21} = 1.00$. In addition, $\alpha_2(t) = \gamma_{02} + \gamma_1 t$ with $\gamma_{02} = -6.90$, and $\beta_{22}(t) = \beta_{22} = 1.60$. In this simulation, an intermediate step, Step 1b, was added after Step 1 of Algorithm 1. To assess the necessity of adjusting the risk set, we compared the approach by Im et al. with Approaches A and B at $t_0 = 13$, 14, and 15. When SRF is employed, both the number of trees and the node size are set to 100 in Simulation 2.1, and the node size is changed to 500 in Simulation 2.2.

Simulation 2.1: Risk set adjustment

Table ?? presents the results of Approaches A and B and of the approach by Im et al. Both Approaches A and B yield unbiased estimates.
The RSMSEs of both approaches are consistently low and comparable across all methods used to estimate the censoring distribution. This is because the censoring rates at these ages are very low, so that $\hat G(\cdot \mid Z) / \hat G(\cdot)$ is close to one for all subjects. In contrast, the approach by Im et al. exhibits substantial bias in the estimation of $\beta_1(t_0)$, as reflected by larger deviations of the SMEANs from the true values and higher RSMSEs. The large bias in $\beta_1(t_0)$ occurs because the method includes subjects in the analysis before they are actually at risk. Since a subject's age at cancer diagnosis equals $21 \times Z_1$, this error specifically adds too many subjects with high $Z_1$ values to the data. Because these wrongly included subjects cannot yet experience the event, the model interprets the association between high $Z_1$ values and the absence of events as evidence that $Z_1$ is more protective than it truly is, producing a substantial negative bias in $\hat\beta_1(t_0)$. On the other hand, $Z_2$ is independent of $Z_1$; consequently, the estimate of $\beta_2$ remains unbiased. The intercept $\alpha_1$ shows moderate bias, as the wrongly included subjects contribute additional zero outcomes to the estimating equation, but the effect is less pronounced than for $\beta_1$.

Simulation 2.2: Robustness against model misspecification

When SRF, the gap-time Cox model, or the stratified ECDF is used to estimate the censoring distribution (Figures ??, ??, and ?? in the Supplementary Material), both Approaches A and B yield survival probability estimates that closely align with the true survival probability across all time points, indicating that both approaches are consistent and efficient in this setting. When the standard Cox model is used to estimate the censoring distribution (Figure ??), Approach B shows an increasing bias as $t_0$ increases, while Approach A remains comparatively robust, with estimates closer to the true survival probability.
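A minimal Python sketch of the two estimating equations at a fixed $t_0$, solved by Newton–Raphson, may help make the risk-set argument concrete. The forms of $U_A$ and $U_B$ are read off Section S1 of the Supplementary Material: Approach A weights the whole residual by $W_i(t_0; G)$, whereas Approach B weights only the outcome, so subjects censored before $t_0$ still contribute $-\mu_i$. The censoring survival probabilities $G(U_i \mid Z_i)$ and $G(t_0 \mid Z_i)$ are taken as given inputs (in practice they would come from SRF, a Cox model, or a stratified ECDF). The flag adjust_risk_set=False, which drops $I(t_0 \ge V_i)$, is a hypothetical device to mimic including subjects before they are at risk; it is not a reimplementation of the approach by Im et al. The function names are hypothetical, and the usage example uses synthetic data without censoring and made-up coefficient values loosely borrowed from Simulation 2.

```python
import numpy as np


def ipcw_weights(u, delta, g_u, g_t0, t0):
    """W_i(t0; G) = I(U_i <= t0) delta_i / G(U_i | Z_i) + I(U_i > t0) / G(t0 | Z_i)."""
    return np.where(u <= t0, delta / g_u, 1.0 / g_t0)


def fit_age_specific(z, u, delta, v, g_u, g_t0, t0, approach="A",
                     adjust_risk_set=True, max_iter=50, tol=1e-10):
    """Newton-Raphson solution of U_A(theta) = 0 or U_B(theta) = 0.

    theta = (alpha(t0), beta(t0)); z is an (n, p) covariate matrix.
    """
    x = np.column_stack([np.ones(len(u)), z])          # rows (1, Z_i^T)
    w = ipcw_weights(u, delta, g_u, g_t0, t0)
    # Observed I(T_i <= t0); it is unknown only when U_i <= t0 and delta_i = 0,
    # and those subjects have W_i = 0, so W_i * y_i = W_i * I(T_i <= t0) always.
    y = ((u <= t0) & (delta == 1)).astype(float)
    # Risk-set indicator I(t0 >= V_i); switching it off is a hypothetical
    # illustration of including subjects before they are at risk.
    r = (v <= t0).astype(float) if adjust_risk_set else np.ones(len(u))
    theta = np.zeros(x.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(x @ theta)))
        if approach == "A":      # sum_i r_i x_i W_i {I(T_i <= t0) - mu_i}
            score = x.T @ (r * w * (y - mu))
            d = r * w * mu * (1.0 - mu)
        else:                    # sum_i r_i x_i {W_i I(T_i <= t0) - mu_i}
            score = x.T @ (r * (w * y - mu))
            d = r * mu * (1.0 - mu)
        info = x.T @ (x * d[:, None])                   # minus the Jacobian
        step = np.linalg.solve(info, score)
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    return theta


# Usage on synthetic data without censoring (G = 1, so every W_i = 1).
rng = np.random.default_rng(7)
n, t0 = 20000, 14.0
z = np.column_stack([rng.uniform(size=n), rng.binomial(1, 0.4, size=n)])
v = 21.0 * z[:, 0]                                      # age at diagnosis
eta = -1.0 - 0.36 * z[:, 0] + 1.00 * z[:, 1]            # made-up true values
event = (v <= t0) & (rng.random(n) < 1.0 / (1.0 + np.exp(-eta)))
u = np.where(event, t0, t0 + 1.0)                       # only I(T <= t0) matters
ones = np.ones(n)
theta_a = fit_age_specific(z, u, ones, v, ones, ones, t0, "A")
theta_b = fit_age_specific(z, u, ones, v, ones, ones, t0, "B")
theta_naive = fit_age_specific(z, u, ones, v, ones, ones, t0, "A",
                               adjust_risk_set=False)
print(theta_a, theta_b, theta_naive)
```

With no censoring the two estimating equations coincide, so theta_a and theta_b agree. The naive fit, which keeps subjects diagnosed after $t_0$ (all with $I(T_i \le t_0) = 0$ and, by construction, high $Z_1$), should give a markedly more negative coefficient for $Z_1$, in line with the explanation above.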
Overall, the results in Table ?? demonstrate that proper risk set adjustment is essential for doubly censored data; without it, as in the approach by Im et al., parameter estimates can be substantially biased. Both Approaches A and B, which incorporate risk set adjustment, yield unbiased and efficient estimates under low censoring rates. The choice of method for estimating the censoring distribution has minimal impact on the estimated regression coefficients when the censoring rate is low. However, Figures ??–?? show the necessity of correctly specifying the censoring model in IPCW-based methods when the outcome model is misspecified.

4.2.3 Simulation 3: Robustness of the Proposed Approaches when Age at POI Follows a Weibull Distribution

Lastly, we generated event times from a Weibull distribution under a Cox PH model, with hazard function $\lambda \nu t^{\nu - 1} \exp\{\beta(t)^T Z\}$ (Bender et al., 2005; Austin, 2012). We set $\lambda = 4.50 \times 10^{-9}$, $\nu = 5.00$, $\beta_1(t) = \beta_1 = 2.00$, and $\beta_2(t) = \beta_2 = -0.30$. The estimated survival probability from either Approach A or B was close to the true survival curve across all $t_0$ when SRF, the gap-time Cox model, or the stratified ECDF was used to estimate the censoring distribution (Figures ??, ??, and ?? in the Supplementary Material). In SRF, the number of trees is set to 100 and the node size to 500. This result indicates that these three methods provide a reliable $\hat G(\cdot \mid Z)$ in this context, yielding consistent estimators for both approaches. In contrast, when the standard Cox model was used to estimate the censoring distribution (Figure ??), Approach A showed bias at age 40, where the censoring rate was higher.

Furthermore, the initial data-generating algorithm generates data only at each selected $t_0$, and the true event time is generated only for subjects satisfying $I(T_i \le t_0) = 1$.
To validate this approach, we considered an alternative algorithm that generates $T_i$ for all subjects, as detailed in Algorithm ??. We applied this alternative algorithm to replicate the scenarios in Simulation 3 and computed the estimated survival probabilities for all methods under both approaches. As shown in Figure ??, the differences between the two algorithms are negligible.

In conclusion, Simulations 2.2 and 3 show that estimating the censoring distribution well becomes more important when event times follow a non-logistic distribution; poor estimation of the censoring distribution can easily lead to biased results. Finally, these results confirm the validity of our proposed data-generating process.

5 Final Remarks

This paper proposed two IPCW-based estimating function approaches for age-specific logistic regression to handle doubly censored event times. Left censoring is accommodated by modifying the risk sets at analysis time $t_0$, while right censoring is addressed using IPCW. Our primary contributions are the adaptation and evaluation of the two proposed approaches and the assessment, through simulation studies, of how different methods for estimating the censoring distribution affect coefficient estimation.

There are four major findings. Firstly, proper risk set adjustment is essential for valid inference; approaches that fail to adjust the risk set yield biased coefficient estimates. Secondly, when the censoring rate is high, Approach B is more efficient than Approach A when the censoring distribution is well estimated (e.g., by a well-tuned SRF); when the censoring rate is low, Approach A is more efficient when the censoring distribution is well estimated. The superior performance of Approach B under heavy censoring is due to its use of more information, namely that of the subjects censored before the analysis time. This pattern suggests that the relative efficiency is dynamic and depends on the censoring rate at the analysis age $t_0$.
Thirdly, nonparametric methods such as SRF are preferred for estimating the censoring distribution because they do not rely on the proportional hazards assumption. The stratified ECDF is an alternative, but it becomes impractical as the number of continuous covariates increases, and it requires a sufficient sample size within each stratum to ensure estimation accuracy. Conversely, although SRF avoids these issues, it requires careful hyperparameter tuning; suboptimal parameter choices can lead to biased coefficient estimates and unreliable standard errors. Lastly, both the coefficient estimates and their standard error estimates are affected by $\hat G(\cdot \mid Z)$. The sandwich standard error estimate is reasonable when $G(\cdot \mid Z)$ is estimated well.

This study has several limitations that present avenues for future investigation. Firstly, our approaches do not consider competing risks. The age at SPM is treated as part of the censoring age in the current study; however, subjects who experience an SPM first cannot have POI afterwards. Secondly, the coefficient functions $\alpha(t_0)$ and $\beta(t_0)$ were estimated in two steps, by smoothing the pointwise estimated coefficients at a set of chosen $t_0$. To address these two issues, future research could extend these approaches to a competing risks framework and develop an integrated, one-step procedure for estimating the coefficient functions, for example by incorporating penalized splines or kernel-based generalized estimating equations. Thirdly, extending the proposed approaches to a continuous time scale would be valuable. Fourthly, machine learning models such as the survival support vector machine or the super learner for survival prediction could be used to estimate the censoring distribution. Furthermore, more investigation of the use of SRF is needed.
Previous research has identified the importance of the loss function used for tuning hyperparameters in SRF (Berkowitz et al., 2025); we did not investigate any loss function other than RSMSE. Lastly, we did not evaluate the performance of the five considered methods for estimating the censoring distribution in high-dimensional settings; however, the advantages of the SRF approach are expected to be more pronounced in scenarios involving a larger number of covariates. In summary, this study provides a framework for modeling doubly censored event times and offers guidance on selecting the appropriate approach and method for estimating the censoring distribution.

Acknowledgments

The authors are grateful to Yunshan (Daisy) Dai for proofreading the manuscript.

Financial disclosure

This work was supported in part by Simon Fraser University through Graduate Fellowships and a PhD Research Scholarship awarded to Haoxuan (Charlie) Zhou. Research utilizing the Childhood Cancer Survivor Study data under Professor Yan Yuan is supported by the National Cancer Institute (U24CA55727), a resource that facilitates research on long-term childhood and adolescent cancer survivors.

Conflict of interest

The authors declare no potential conflicts of interest.

References

Austin, P. C. (2012). Generating survival times to simulate Cox proportional hazards models with time-varying covariates. Statistics in Medicine, 31(29):3946–3958.
Bender, R., Augustin, T., and Blettner, M. (2005). Generating survival times to simulate Cox proportional hazards models. Statistics in Medicine, 24(11):1713–1723.
Berkowitz, M., Altman, R. M., and Loughin, T. M. (2025). Targeted tuning of random forests for quantile estimation and prediction intervals. arXiv preprint arXiv:2507.01430.
Betensky, R. A. and Mandel, M. (2015). Recognizing the problem of delayed entry in time-to-event studies: better late than never for clinical neuroscientists. Annals of Neurology, 78(6):839–844.
Blanche, P. F., Holt, A., and Scheike, T. (2023). On logistic regression with right censored data, with or without competing risks, and its use for estimating treatment effects. Lifetime Data Analysis, 29(2):441–482.
Cai, T. and Cheng, S. (2004). Semiparametric regression analysis for doubly censored data. Biometrika, 91(2):277–290.
Chemaitilly, W., Li, Z., Krasin, M. J., Brooke, R. J., Wilson, C. L., Green, D. M., Klosky, J. L., Barnes, N., Clark, K. L., Farr, J. B., et al. (2017). Premature ovarian insufficiency in childhood cancer survivors: a report from the St. Jude Lifetime Cohort. The Journal of Clinical Endocrinology & Metabolism, 102(7):2242–2250.
Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society: Series B (Methodological), 34(2):187–202.
Im, C., Lu, Z., Mostoufi-Moab, S., Delaney, A., Yu, L., Baedke, J. L., Han, Y., Sapkota, Y., Yasui, Y., Chow, E. J., et al. (2023). Development and validation of age-specific risk prediction models for primary ovarian insufficiency in long-term survivors of childhood cancer: a report from the Childhood Cancer Survivor Study and St Jude Lifetime Cohort. The Lancet Oncology, 24(12):1434–1442.
Ishwaran, H., Kogalur, U. B., Blackstone, E. H., and Lauer, M. S. (2008). Random survival forests. The Annals of Applied Statistics, 2(3):841–860.
Johnston, R. J. and Wallace, W. H. B. (2009). Normal ovarian function and assessment of ovarian reserve in the survivor of childhood cancer. Pediatric Blood & Cancer, 53(2):296–302.
McCullagh, P. (2019). Generalized Linear Models. Taylor and Francis, an imprint of Routledge, second edition.
Mishra, G. D., Pandeya, N., Dobson, A. J., Chung, H.-F., Anderson, D., Kuh, D., Sandin, S., Giles, G. G., Bruinsma, F., Hayashi, K., et al. (2017). Early menarche, nulliparity and the risk for premature and early natural menopause. Human Reproduction, 32(3):679–686.
Mostoufi-Moab, S., Seidel, K., Leisenring, W. M., Armstrong, G. T., Oeffinger, K. C., Stovall, M., Meacham, L. R., Green, D. M., Weathers, R., Ginsberg, J. P., et al. (2016). Endocrine abnormalities in aging survivors of childhood cancer: a report from the Childhood Cancer Survivor Study. Journal of Clinical Oncology, 34(27):3240–3247.
Robins, J. and Rotnitzky, A. (2006). Inverse probability weighting in survival analysis. In Survival and Event History Analysis, pages 266–271. Wiley, Chichester, UK.
Robins, J. M. and Rotnitzky, A. (1992). Recovery of information and adjustment for dependent censoring using surrogate markers. In AIDS Epidemiology, pages 297–331. Springer.
Robison, L. L., Armstrong, G. T., Boice, J. D., Chow, E. J., Davies, S. M., Donaldson, S. S., Green, D. M., Hammond, S., Meadows, A. T., Mertens, A. C., et al. (2009). The Childhood Cancer Survivor Study: a National Cancer Institute–supported resource for outcome and intervention research. Journal of Clinical Oncology, 27(14):2308–2318.
Robison, L. L., Mertens, A. C., Boice, J. D., Breslow, N. E., Donaldson, S. S., Green, D. M., Li, F. P., Meadows, A. T., Mulvihill, J. J., Neglia, J. P., et al. (2002). Study design and cohort characteristics of the Childhood Cancer Survivor Study: a multi-institutional collaborative project. Medical and Pediatric Oncology, 38(4):229–239.
Scheike, T. H., Zhang, M.-J., and Gerds, T. A. (2008). Predicting cumulative incidence probability by direct binomial regression. Biometrika, 95(1):205–220.
Suresh, K., Severn, C., and Ghosh, D. (2022). Survival prediction models: an introduction to discrete-time modeling. BMC Medical Research Methodology, 22(1):207.
Touraine, P., Chabbert-Buffet, N., Plu-Bureau, G., Duranteau, L., Sinclair, A. H., and Tucker, E. J. (2024). Premature ovarian insufficiency. Nature Reviews Disease Primers, 10(1):63.
Uno, H., Cai, T., Tian, L., and Wei, L.-J. (2007). Evaluating prediction rules for t-year survivors with censored regression models. Journal of the American Statistical Association, 102(478):527–537.
Vock, D. M., Wolfson, J., Bandyopadhyay, S., Adomavicius, G., Johnson, P. E., Vazquez-Benitez, G., and O'Connor, P. J. (2016). Adapting machine learning techniques to censored time-to-event health record data: A general-purpose approach using inverse probability of censoring weighting. Journal of Biomedical Informatics, 61:119–131.
Webber, L., Davies, M., Anderson, R., Bartlett, J., Braat, D., Cartwright, B., Cifkova, R., de Muinck Keizer-Schrama, S., Hogervorst, E., Janse, F., Liao, L., Vlaisavljevic, V., Zillikens, C., and Vermeulen, N. (2016). ESHRE guideline: management of women with premature ovarian insufficiency. Human Reproduction (Oxford), 31(5):926–937.
Yuan, Y., Zhou, Q. M., Li, B., Cai, H., Chow, E. J., and Armstrong, G. T. (2018). A threshold-free summary index of prediction accuracy for censored time to event data. Statistics in Medicine, 37(10):1671–1681.
Zheng, Y., Cai, T., and Feng, Z. (2006). Application of the time-dependent ROC curves for prognostic accuracy with multiple biomarkers. Biometrics, 62(1):279–287.

Supplementary Material: Age-Specific Logistic Regression with Complex Event Times
Haoxuan (Charlie) Zhou, X. Joan Hu, Yi Xiong, Yan Yuan

Section S1 presents the derivation of the difference between the asymptotic variances of the two estimators when the conditional survival function of the censoring time, $G(\cdot \mid Z)$, is known. Additional real data analysis results are presented in Section S2.
Additional results of Simulations 1, 2, and 3 are reported in Sections S3, S4, and S5, respectively.

S1 Derivation of the Difference in Asymptotic Variance between the Two Approaches

Given that $G(\cdot \mid Z)$ is known, we first derive the asymptotic variances of the estimators obtained using Approaches A and B, and then give the difference between these two asymptotic variances in Equation (S6). For brevity, write $\mu_i = \exp\{\alpha(t_0) + \beta(t_0)^T Z_i\} / [1 + \exp\{\alpha(t_0) + \beta(t_0)^T Z_i\}]$, so that $\exp\{\alpha(t_0) + \beta(t_0)^T Z_i\} / [1 + \exp\{\alpha(t_0) + \beta(t_0)^T Z_i\}]^2 = \mu_i(1 - \mu_i)$. The difference between the two estimating functions is
$$\Delta(\theta; t_0 \mid G) = U_A(\alpha, \beta; t_0 \mid G) - U_B(\alpha, \beta; t_0 \mid G) = \sum_{i=1}^{n} I(t_0 \ge V_i) \begin{bmatrix} 1 \\ Z_i \end{bmatrix} \{1 - W_i(t_0; G)\} \mu_i. \tag{S1}$$
We first derive $\Sigma_A(\theta; G)$, $\Sigma_B(\theta; G)$, and $\Sigma_\Delta(\theta; G)$:
$$\begin{aligned}
\Sigma_A(\theta; G) &= \operatorname{Var}\left[ I(t_0 \ge V_i) \begin{bmatrix} 1 \\ Z_i \end{bmatrix} W_i(t_0; G) \{I(T_i \le t_0) - \mu_i\} \right] = E\left[ I(t_0 \ge V_i) \begin{bmatrix} 1 \\ Z_i \end{bmatrix} \begin{bmatrix} 1 \\ Z_i \end{bmatrix}^T E\left( W_i(t_0; G)^2 \{I(T_i \le t_0) - \mu_i\}^2 \,\middle|\, Z_i, V_i \right) \right], \\
\Sigma_B(\theta; G) &= \operatorname{Var}\left[ I(t_0 \ge V_i) \begin{bmatrix} 1 \\ Z_i \end{bmatrix} \{I(T_i \le t_0) W_i(t_0; G) - \mu_i\} \right] = E\left[ I(t_0 \ge V_i) \begin{bmatrix} 1 \\ Z_i \end{bmatrix} \begin{bmatrix} 1 \\ Z_i \end{bmatrix}^T E\left( \{I(T_i \le t_0) W_i(t_0; G) - \mu_i\}^2 \,\middle|\, Z_i, V_i \right) \right], \\
\Sigma_\Delta(\theta; G) &= \operatorname{Var}\left[ I(t_0 \ge V_i) \begin{bmatrix} 1 \\ Z_i \end{bmatrix} \{1 - W_i(t_0; G)\} \mu_i \right] = E\left[ I(t_0 \ge V_i) \begin{bmatrix} 1 \\ Z_i \end{bmatrix} \begin{bmatrix} 1 \\ Z_i \end{bmatrix}^T \mu_i^2 \, E\left( \{1 - W_i(t_0; G)\}^2 \,\middle|\, Z_i, V_i \right) \right].
\end{aligned} \tag{S2}$$
Next, $\Gamma_A(\theta; G)$ and $\Gamma_B(\theta; G)$ are obtained as
$$\frac{1}{n} \frac{\partial U_A(\theta; t_0 \mid G)}{\partial \theta^T} = -\frac{1}{n} \sum_{i=1}^{n} I(t_0 \ge V_i) \begin{bmatrix} 1 \\ Z_i \end{bmatrix} \begin{bmatrix} 1 \\ Z_i \end{bmatrix}^T W_i(t_0; G) \mu_i (1 - \mu_i) \xrightarrow{\text{a.s.}} -E\left[ I(t_0 \ge V_i) \begin{bmatrix} 1 \\ Z_i \end{bmatrix} \begin{bmatrix} 1 \\ Z_i \end{bmatrix}^T W_i(t_0; G) \mu_i (1 - \mu_i) \right] \triangleq -\Gamma_A(\theta; G), \tag{S3}$$
$$\frac{1}{n} \frac{\partial U_B(\theta; t_0 \mid G)}{\partial \theta^T} = -\frac{1}{n} \sum_{i=1}^{n} I(t_0 \ge V_i) \begin{bmatrix} 1 \\ Z_i \end{bmatrix} \begin{bmatrix} 1 \\ Z_i \end{bmatrix}^T \mu_i (1 - \mu_i) \xrightarrow{\text{a.s.}} -E\left[ I(t_0 \ge V_i) \begin{bmatrix} 1 \\ Z_i \end{bmatrix} \begin{bmatrix} 1 \\ Z_i \end{bmatrix}^T \mu_i (1 - \mu_i) \right] \triangleq -\Gamma_B(\theta; G). \tag{S4}$$
Since $E\{W_i(t_0; G) \mid T_i, Z_i\} = 1$, we have $\Gamma_A(\theta; G) = \Gamma_B(\theta; G)$. We then show that
$$\frac{1}{n} \frac{\partial \Delta(\theta; t_0 \mid G)}{\partial \theta^T} = \frac{1}{n} \sum_{i=1}^{n} I(t_0 \ge V_i) \begin{bmatrix} 1 \\ Z_i \end{bmatrix} \begin{bmatrix} 1 \\ Z_i \end{bmatrix}^T \{1 - W_i(t_0; G)\} \mu_i (1 - \mu_i) \xrightarrow{\text{a.s.}} E\left[ I(t_0 \ge V_i) \begin{bmatrix} 1 \\ Z_i \end{bmatrix} \begin{bmatrix} 1 \\ Z_i \end{bmatrix}^T \{1 - W_i(t_0; G)\} \mu_i (1 - \mu_i) \right] = 0. \tag{S5}$$
As a result,
$$\begin{aligned}
AV_A(\theta; G) - AV_B(\theta; G) &= \Gamma_A^{-1}(\theta; G)\, \Sigma_A(\theta; G) \{\Gamma_A^{-1}(\theta; G)\}^T - \Gamma_B^{-1}(\theta; G)\, \Sigma_B(\theta; G) \{\Gamma_B^{-1}(\theta; G)\}^T \\
&= \Gamma_A^{-1}(\theta; G) \left( \Sigma_\Delta(\theta; G) + 2E\left[ I(t_0 \ge V_i) \begin{bmatrix} 1 \\ Z_i \end{bmatrix} \begin{bmatrix} 1 \\ Z_i \end{bmatrix}^T \mu_i \left\{ \mu_i - E\left( W_i(t_0; G)^2 I(T_i \le t_0) \,\middle|\, Z_i, V_i \right) \right\} \right] \right) \{\Gamma_A^{-1}(\theta; G)\}^T.
\end{aligned} \tag{S6}$$
Although $AV_A(\theta; G) - AV_B(\theta; G)$ can be approximated by numerical integration, we instead illustrate the difference between the two approaches empirically. Specifically, we compare the sample variances of their estimates, scaled by $n$ as given in (S7), using data generated under Simulations 1.1 and 1.2; the results are shown in Figures 3 and S14, respectively.
$$
n\left\{ \frac{1}{K-1} \sum_{k=1}^{K} \left( \tilde\theta_A^{(k)} - \bar{\tilde\theta}_A \right)^2 - \frac{1}{K-1} \sum_{k=1}^{K} \left( \tilde\theta_B^{(k)} - \bar{\tilde\theta}_B \right)^2 \right\} \tag{S7}
$$

Recall that there are 1000 repetitions in Simulations 1.1 and 1.2 (i.e., $1 \le k \le K$, where $K = 1000$), and there are 7000 subjects (i.e., $n = 7000$) in each repetition. Here $\tilde\theta_A^{(k)}$ and $\tilde\theta_B^{(k)}$ are the coefficients estimated at $t_0$ using approaches A and B, respectively, and $\bar{\tilde\theta}_A = K^{-1} \sum_{k=1}^{K} \tilde\theta_A^{(k)}$ and $\bar{\tilde\theta}_B = K^{-1} \sum_{k=1}^{K} \tilde\theta_B^{(k)}$.

In simulation repetition $k$, the weights are $W_i^{(k)}(t_0; G) = I(U_i \le t_0)\,\delta_i / G(U_i \mid Z_i^{(k)}) + I(U_i > t_0) / G(t_0 \mid Z_i^{(k)})$, where $G(\cdot \mid Z_i^{(k)})$ is the true conditional survival function of the censoring time, used to form the IPCW weights, and $Z_i^{(k)}$ are the covariates.

S2 Additional Results of the POI Analysis

Figure S1 compares the sandwich and bootstrap standard errors (SEs) under approaches A and B for 500, 1,000, 5,000, and 10,000 bootstrap resamples. Under approach B, the bootstrap SEs are close to, and slightly larger than, the corresponding sandwich SEs, except for $Z_2$ when $t_0 \ge 35$; this discrepancy is expected to diminish as the number of bootstrap resamples increases (e.g., to 50,000). Under approach A, the bootstrap SEs are similar to, and slightly larger than, the sandwich SEs for $t_0 \le 30$, but at later ages the bootstrap SEs tend to be slightly smaller for most coefficients. This divergence is due to the high and increasing censoring rate beyond age 30 (e.g., 32.5% at age 30), which causes some bootstrap resamples to yield unstable estimates under approach A, particularly when the data are imbalanced (e.g., $Z_2$).

Figure S1: Comparison of the sandwich and bootstrap standard errors of the estimated coefficients under both approaches, with 500, 1,000, 5,000, and 10,000 bootstrap resamples. (a) Approach A; (b) Approach B. $G(\cdot \mid Z)$ is estimated using SRF with the number of trees and the node size both set to 100.
$Z_1$: Rescaled age at cancer diagnosis, $Z_2$: Race-African American, $Z_3$: Race-Other, $Z_4$: Receipt of BMT, $Z_5$: Receipt of alkylating agents, $Z_6$: Receipt of radiation to the abdomen/pelvis/total body.

The estimated coefficients, along with their 95% CIs, under approaches A and B are shown in Figure S2.

Figure S2: The estimated coefficients with 95% CIs, where $\hat G(\cdot \mid Z)$ is obtained using SRF. Following Im et al., the number of trees and the node size in SRF are set to 100. The black horizontal line marks 0. $Z_1$: Rescaled age at cancer diagnosis, $Z_2$: Race-African American, $Z_3$: Race-Other, $Z_4$: Receipt of BMT, $Z_5$: Receipt of alkylating agents, $Z_6$: Receipt of radiation to the abdomen/pelvis/total body.

In addition, a two-step procedure is employed to estimate the coefficient functions: the coefficients are first estimated at each $t_0$, and a LOESS curve is then fitted to these pointwise estimates. We chose span values of 0.3, 0.5, and 0.8. The results are shown in Figure S3.

Figure S3: The estimated coefficients were obtained using three approaches. The LOESS method was applied using three different span values: 0.3 (dash-dot line), 0.5 (solid line), and 0.8 (dotted line). $\hat G(\cdot \mid Z)$ is obtained using SRF. Following Im et al., the number of trees and the node size in SRF are set to 100. $Z_1$: Rescaled age at cancer diagnosis, $Z_2$: Race-African American, $Z_3$: Race-Other, $Z_4$: Receipt of BMT, $Z_5$: Receipt of alkylating agents, $Z_6$: Receipt of radiation to the abdomen/pelvis/total body.

The estimated coefficients obtained with five different methods for estimating $G(\cdot \mid Z)$ are shown in Figure S4a for Approach A and in Figure S4b for Approach B. We conducted the same comparison for the corresponding sandwich standard errors, which are shown in Figures S5a and S5b.
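The two-step LOESS smoothing of the pointwise coefficient estimates can be sketched as follows. This is a minimal local-linear LOESS with tricube weights and no robustness iterations, written only for illustration. The supplement does not state which implementation, polynomial degree, or neighborhood rule was used (R's `loess()`, for example, fits local quadratics by default), so using the `k = ceil(span * n)` nearest ages and a local-linear fit are assumptions, and `loess_linear` is a hypothetical helper name.

```python
import math


def tricube(u):
    """Tricube kernel (1 - |u|^3)^3 for |u| < 1 and 0 otherwise."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def loess_linear(x, y, span=0.5):
    """Local-linear LOESS smooth of y on x, evaluated at each x value.

    Each local fit uses the k = ceil(span * n) nearest points with tricube
    weights and no robustness iterations (a simplified, hypothetical variant).
    """
    n = len(x)
    k = min(n, max(2, math.ceil(span * n)))
    fitted = []
    for x0 in x:
        h = sorted(abs(xi - x0) for xi in x)[k - 1] or 1e-12  # k-th nearest distance
        w = [tricube((xi - x0) / h) for xi in x]
        d = [xi - x0 for xi in x]
        # Weighted least squares for y = a + b * (x - x0); the fitted value at x0 is a.
        sw = sum(w)
        sd = sum(wi * di for wi, di in zip(w, d))
        sdd = sum(wi * di * di for wi, di in zip(w, d))
        sy = sum(wi * yi for wi, yi in zip(w, y))
        sdy = sum(wi * di * yi for wi, di, yi in zip(w, d, y))
        det = sw * sdd - sd * sd
        fitted.append((sdd * sy - sd * sdy) / det if det > 1e-12 else sy / sw)
    return fitted
```

Calling `loess_linear(ages, beta_hat, span=s)` for `s` in 0.3, 0.5, and 0.8, where `ages` is the grid of analysis ages and `beta_hat` the corresponding pointwise estimates, gives one smoothed curve per span. Because the fit is local-linear, a coefficient that is exactly linear in $t_0$ is reproduced without distortion; smaller spans follow the pointwise estimates more closely.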
Figure S4: The estimated coefficients under approaches A and B with different methods for obtaining $\hat G(\cdot \mid Z)$. (a) Estimated coefficients using Approach A; (b) Estimated coefficients using Approach B. The number of trees and the node size in SRF are set to 100. $Z_1$: Rescaled age at cancer diagnosis, $Z_2$: Race-African American, $Z_3$: Race-Other, $Z_4$: Receipt of BMT, $Z_5$: Receipt of alkylating agents, $Z_6$: Receipt of radiation to the abdomen/pelvis/total body.

Figure S5: The standard errors of the estimated coefficients under approaches A and B with different methods for obtaining $\hat G(\cdot \mid Z)$. (a) Standard error estimates using Approach A; (b) Standard error estimates using Approach B. The number of trees and the node size in SRF are set to 100. $Z_1$: Rescaled age at cancer diagnosis, $Z_2$: Race-African American, $Z_3$: Race-Other, $Z_4$: Receipt of BMT, $Z_5$: Receipt of alkylating agents, $Z_6$: Receipt of radiation to the abdomen/pelvis/total body.

S3 Additional Results of Simulation 1

We summarize the efficiency of the different methods and approaches in Simulation 1.1 as follows. With a well-tuned SRF, Approach B is more efficient under high censoring rates but slightly less efficient at the earliest age (age 21), where the censoring rate is low. When $G(\cdot \mid Z)$ is estimated with the stratified ECDF or the Cox PH model on $C^*$, Approach B generally yields a smaller SSD. A similar pattern holds for the Cox PH model on $C$ at ages 21 and 30. However, the Cox PH model on $C$ appears to lose efficiency at ages 35 and 40 relative to the other methods, particularly under Approach B (Figure S8).

We summarize the efficiency of the different methods and approaches in Simulation 1.2 as follows. With a well-tuned SRF, Approach A is more efficient than Approach B.
A similar trend is observed when $G(\cdot \mid Z)$ is estimated with the stratified ECDF or the Cox PH model on $C^*$, where Approach A consistently yields a smaller SSD than Approach B. At early ages (21 and 30), the two approaches perform nearly identically, primarily because the censoring rate is negligible (0.34%). However, a notable divergence occurs at age 40, where the censoring rate is still relatively low (10.7%), as illustrated in Figure S12.

S3.1 Simulation 1.1: Performance of Different Methods and Approaches

We compare the root sample mean square error (RSMSE) of the estimates when a survival random forest (SRF) with different sets of input parameters is used to obtain $\hat G(\cdot \mid Z)$. Since there are only two covariates, both are used at each split. We varied the number of trees from 100 to 1000 for a subset of the simulated datasets; because more trees gave a similar RSMSE but took much longer to run, we use 100 trees. The minimal node size to split at (i.e., the node size) is selected from 15, 50, 100, 200, 500, and 3500; we refer to these settings as situations 1 to 6 and denote them S1 to S6, respectively. A lower RSMSE is better. We observe that small node sizes perform well when the covariate is binary, whereas large node sizes perform well when the covariate is continuous. We select a node size of 200 (i.e., S4) when using SRF in both approaches.

To determine which method performs best under each approach, we compare the RSMSE of the estimated coefficients when $\hat G(\cdot \mid Z)$ is obtained using SRF with the selected hyperparameter values, the stratified ECDF, the Cox PH model on $C$, the Cox PH model on $C^*$, and the true CDF. For the stratified ECDF, the covariate $Z_1$ was partitioned into four levels: [0, 0.25), [0.25, 0.5), [0.5, 0.75), and [0.75, 1]. Results for both estimation frameworks are illustrated in Figure S6.

Figure S6: Comparison of the root sample mean square error (RSMSE) of the coefficient estimates in Simulation 1.1. Results are shown for the two approaches across five versions of $\hat G(\cdot \mid Z)$, obtained using: (i) SRF with 100 trees and a node size of 200; (ii) the stratified ECDF; (iii) the Cox PH model on $C$; (iv) the Cox PH model on $C^*$; and (v) the true CDF.

Figure S7 compares the estimated coefficients obtained under the two approaches, with the censoring distribution estimated by different methods, against their true values.

Figure S7: Comparison of the sample mean of the estimates (SMEAN) of the coefficients in Simulation 1.1. Results are shown for the two approaches across five versions of $\hat G(\cdot \mid Z)$, obtained using: (i) SRF with 100 trees and a node size of 200; (ii) the stratified ECDF; (iii) the Cox PH model on $C$; (iv) the Cox PH model on $C^*$; and (v) the true CDF.

Figure S8 compares the sample standard deviations of the estimated coefficients under both approaches when the censoring distribution is estimated using different methods.

Figure S8: Comparison of the sample standard deviation of the estimates (SSD) of the coefficients in Simulation 1.1. Results are shown for the two approaches across five versions of $\hat G(\cdot \mid Z)$, obtained using: (i) SRF with 100 trees and a node size of 200; (ii) the stratified ECDF; (iii) the Cox PH model on $C$; (iv) the Cox PH model on $C^*$; and (v) the true CDF.

Figure S9 compares the SMESE with the SSD to assess the standard error estimation.

Figure S9: Evaluation of the standard error (SE) estimator in Simulation 1.1. The plot displays the sample mean of the estimated SEs (SMESE; horizontal lines) and the corresponding 2.5%–97.5% empirical quantiles (shaded regions) relative to the sample standard deviation of the coefficient estimates (SSD; diamonds).
Results are shown for the two approaches across five versions of $\hat G(\cdot \mid Z)$, obtained using: (i) SRF with 100 trees and a node size of 200; (ii) the stratified ECDF; (iii) the Cox PH model on $C$; (iv) the Cox PH model on $C^*$; and (v) the true CDF.

S3.2 Simulation 1.2: Performance of Different Methods and Approaches

Following the same procedure as in Section S3.1, we set the number of trees to 100 and the node size to 50 when using SRF, as this set of hyperparameters performs well in terms of RSMSE at all ages under either approach. To determine which method performs best under each approach, we compare the RSMSE of the estimated coefficients when $\hat G(\cdot \mid Z)$ is obtained using SRF with the selected hyperparameter values, the stratified ECDF, the Cox PH model on $C$, the Cox PH model on $C^*$, and the true CDF. For the stratified ECDF, the covariate $Z_1$ was partitioned into four levels: [0, 0.25), [0.25, 0.5), [0.5, 0.75), and [0.75, 1]. Results for both estimation frameworks are illustrated in Figure S10.

Table S1 summarizes the estimates of $\alpha(t)$ and $\beta(t)$ (which equals $\beta$) in Simulation 1.2, using both approaches and all five methods to obtain $\hat G(\cdot \mid Z)$. We set the number of trees to 100 and the node size to 50, as mentioned above.

Table S1: Results of Simulation 1.2. We report the sample mean of the estimates (SMEAN), the sample standard deviation of the estimates (SSD), the sample mean of the estimated standard errors (SMESE), and the root sample mean square error of the estimates (RSMSE) for each approach. In SRF, the number of trees is set to 100 and the node size to 50.

True values: α(21) = −1.440, α(30) = 0.900, α(35) = 2.200, α(40) = 3.500; β₁(t) = −5.460 and β₂(t) = 1.500 at all four ages. Column labels give the parameter and the method used to obtain Ĝ(·|Z): SRF, Cox PH model on C (Cox C), Cox PH model on C* (Cox C*), stratified ECDF (ECDF), and the true CDF (True). "Im et al. / A" denotes the approach used by Im et al., which coincides with Approach A here.

| Age | Approach | Statistic | α SRF | α Cox C | α Cox C* | α ECDF | α True | β₁ SRF | β₁ Cox C | β₁ Cox C* | β₁ ECDF | β₁ True | β₂ SRF | β₂ Cox C | β₂ Cox C* | β₂ ECDF | β₂ True |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 21 | Im et al. / A | SMEAN | -1.439 | -1.436 | -1.441 | -1.436 | -1.439 | -5.472 | -5.492 | -5.477 | -5.482 | -5.471 | 1.500 | 1.506 | 1.505 | 1.500 | 1.499 |
| 21 | Im et al. / A | SSD | 0.092 | 0.092 | 0.092 | 0.092 | 0.092 | 0.262 | 0.263 | 0.262 | 0.263 | 0.262 | 0.097 | 0.097 | 0.097 | 0.097 | 0.097 |
| 21 | Im et al. / A | SMESE | 0.092 | 0.092 | 0.092 | 0.092 | 0.092 | 0.257 | 0.258 | 0.257 | 0.257 | 0.257 | 0.099 | 0.099 | 0.099 | 0.099 | 0.099 |
| 21 | Im et al. / A | RSMSE | 0.092 | 0.092 | 0.092 | 0.092 | 0.092 | 0.262 | 0.265 | 0.263 | 0.263 | 0.262 | 0.097 | 0.097 | 0.097 | 0.097 | 0.097 |
| 21 | B | SMEAN | -1.439 | -1.442 | -1.438 | -1.442 | -1.439 | -5.471 | -5.453 | -5.465 | -5.461 | -5.471 | 1.499 | 1.494 | 1.494 | 1.499 | 1.499 |
| 21 | B | SSD | 0.092 | 0.092 | 0.092 | 0.092 | 0.092 | 0.262 | 0.261 | 0.262 | 0.261 | 0.262 | 0.097 | 0.097 | 0.097 | 0.097 | 0.097 |
| 21 | B | SMESE | 0.092 | 0.092 | 0.092 | 0.092 | 0.092 | 0.257 | 0.256 | 0.256 | 0.256 | 0.257 | 0.099 | 0.099 | 0.099 | 0.099 | 0.099 |
| 21 | B | RSMSE | 0.092 | 0.092 | 0.092 | 0.092 | 0.092 | 0.262 | 0.261 | 0.262 | 0.261 | 0.262 | 0.097 | 0.097 | 0.097 | 0.097 | 0.097 |
| 30 | Im et al. / A | SMEAN | 0.899 | 0.915 | 0.890 | 0.906 | 0.897 | -5.462 | -5.524 | -5.466 | -5.480 | -5.456 | 1.506 | 1.526 | 1.525 | 1.501 | 1.501 |
| 30 | Im et al. / A | SSD | 0.060 | 0.060 | 0.060 | 0.059 | 0.060 | 0.148 | 0.149 | 0.149 | 0.148 | 0.149 | 0.072 | 0.072 | 0.072 | 0.072 | 0.073 |
| 30 | Im et al. / A | SMESE | 0.062 | 0.062 | 0.062 | 0.062 | 0.062 | 0.150 | 0.149 | 0.149 | 0.149 | 0.150 | 0.070 | 0.069 | 0.070 | 0.070 | 0.070 |
| 30 | Im et al. / A | RSMSE | 0.060 | 0.062 | 0.061 | 0.060 | 0.060 | 0.148 | 0.162 | 0.149 | 0.149 | 0.149 | 0.073 | 0.076 | 0.076 | 0.072 | 0.072 |
| 30 | B | SMEAN | 0.897 | 0.863 | 0.909 | 0.881 | 0.897 | -5.446 | -5.315 | -5.426 | -5.409 | -5.456 | 1.492 | 1.445 | 1.450 | 1.500 | 1.501 |
| 30 | B | SSD | 0.061 | 0.060 | 0.062 | 0.060 | 0.062 | 0.148 | 0.143 | 0.148 | 0.146 | 0.152 | 0.071 | 0.071 | 0.072 | 0.071 | 0.073 |
| 30 | B | SMESE | 0.063 | 0.062 | 0.064 | 0.063 | 0.063 | 0.152 | 0.146 | 0.151 | 0.151 | 0.153 | 0.071 | 0.069 | 0.070 | 0.071 | 0.071 |
| 30 | B | RSMSE | 0.061 | 0.070 | 0.063 | 0.063 | 0.062 | 0.148 | 0.203 | 0.152 | 0.154 | 0.152 | 0.072 | 0.090 | 0.088 | 0.071 | 0.073 |
| 35 | Im et al. / A | SMEAN | 2.207 | 2.231 | 2.194 | 2.206 | 2.203 | -5.478 | -5.548 | -5.475 | -5.485 | -5.467 | 1.509 | 1.531 | 1.531 | 1.503 | 1.502 |
| 35 | Im et al. / A | SSD | 0.083 | 0.082 | 0.084 | 0.081 | 0.084 | 0.156 | 0.154 | 0.158 | 0.154 | 0.158 | 0.074 | 0.073 | 0.073 | 0.074 | 0.074 |
| 35 | Im et al. / A | SMESE | 0.082 | 0.080 | 0.081 | 0.080 | 0.081 | 0.154 | 0.151 | 0.153 | 0.151 | 0.154 | 0.073 | 0.073 | 0.073 | 0.074 | 0.074 |
| 35 | Im et al. / A | RSMSE | 0.083 | 0.087 | 0.084 | 0.081 | 0.084 | 0.157 | 0.178 | 0.158 | 0.156 | 0.158 | 0.074 | 0.079 | 0.080 | 0.074 | 0.074 |
| 35 | B | SMEAN | 2.194 | 2.115 | 2.236 | 2.181 | 2.203 | -5.440 | -5.199 | -5.441 | -5.406 | -5.467 | 1.481 | 1.391 | 1.395 | 1.505 | 1.502 |
| 35 | B | SSD | 0.085 | 0.080 | 0.089 | 0.083 | 0.093 | 0.163 | 0.151 | 0.167 | 0.160 | 0.179 | 0.077 | 0.076 | 0.078 | 0.078 | 0.083 |
| 35 | B | SMESE | 0.090 | 0.085 | 0.091 | 0.089 | 0.090 | 0.171 | 0.157 | 0.170 | 0.168 | 0.173 | 0.079 | 0.075 | 0.077 | 0.080 | 0.080 |
| 35 | B | RSMSE | 0.085 | 0.117 | 0.096 | 0.085 | 0.093 | 0.164 | 0.302 | 0.168 | 0.169 | 0.179 | 0.079 | 0.132 | 0.131 | 0.078 | 0.083 |
| 40 | Im et al. / A | SMEAN | 3.524 | 3.542 | 3.502 | 3.490 | 3.504 | -5.499 | -5.550 | -5.483 | -5.466 | -5.468 | 1.512 | 1.529 | 1.534 | 1.500 | 1.502 |
| 40 | Im et al. / A | SSD | 0.144 | 0.140 | 0.144 | 0.136 | 0.146 | 0.221 | 0.216 | 0.222 | 0.208 | 0.223 | 0.088 | 0.088 | 0.090 | 0.089 | 0.089 |
| 40 | Im et al. / A | SMESE | 0.149 | 0.145 | 0.148 | 0.142 | 0.149 | 0.223 | 0.219 | 0.223 | 0.212 | 0.223 | 0.089 | 0.089 | 0.090 | 0.090 | 0.090 |
| 40 | Im et al. / A | RSMSE | 0.146 | 0.147 | 0.144 | 0.136 | 0.146 | 0.224 | 0.234 | 0.223 | 0.208 | 0.223 | 0.089 | 0.093 | 0.096 | 0.089 | 0.089 |
| 40 | B | SMEAN | 3.466 | 3.322 | 3.605 | 3.514 | 3.506 | -5.397 | -5.027 | -5.503 | -5.442 | -5.471 | 1.456 | 1.323 | 1.302 | 1.523 | 1.502 |
| 40 | B | SSD | 0.168 | 0.153 | 0.187 | 0.168 | 0.195 | 0.257 | 0.229 | 0.276 | 0.256 | 0.296 | 0.097 | 0.094 | 0.097 | 0.101 | 0.109 |
| 40 | B | SMESE | 0.193 | 0.172 | 0.203 | 0.195 | 0.197 | 0.291 | 0.255 | 0.300 | 0.293 | 0.298 | 0.107 | 0.099 | 0.102 | 0.113 | 0.110 |
| 40 | B | RSMSE | 0.172 | 0.235 | 0.214 | 0.168 | 0.195 | 0.264 | 0.489 | 0.280 | 0.257 | 0.296 | 0.107 | 0.201 | 0.220 | 0.104 | 0.109 |

Figure S10: Comparison of the root sample mean square error (RSMSE) of the coefficient estimates in
Simulation 1.2. Results are shown for the two approaches across five versions of $\hat G(\cdot \mid Z)$, obtained using: (i) SRF with 100 trees and a minimum node size of 50; (ii) the stratified ECDF; (iii) the Cox PH model on $C$; (iv) the Cox PH model on $C^*$; and (v) the true CDF.

Figure S11 compares the estimated coefficients obtained under the two approaches, with the censoring distribution estimated by different methods, against their true values.

Figure S11: Comparison of the sample mean of the estimates (SMEAN) of the coefficients in Simulation 1.2. Results are shown for the two approaches across five versions of $\hat G(\cdot \mid Z)$, obtained using: (i) SRF with 100 trees and a node size of 50; (ii) the stratified ECDF; (iii) the Cox PH model on $C$; (iv) the Cox PH model on $C^*$; and (v) the true CDF.

Figure S12 compares the sample standard deviations of the estimated coefficients under both approaches when the censoring distribution is estimated using different methods.

Figure S12: Comparison of the sample standard deviation of the estimates (SSD) of the coefficients in Simulation 1.2. Results are shown for the two approaches across five versions of $\hat G(\cdot \mid Z)$, obtained using: (i) SRF with 100 trees and a node size of 50; (ii) the stratified ECDF; (iii) the Cox PH model on $C$; (iv) the Cox PH model on $C^*$; and (v) the true CDF.

Figure S13 compares the SMESE with the SSD to assess the standard error estimation.

Figure S13: Evaluation of the standard error (SE) estimator in Simulation 1.2. The plot displays the sample mean of the estimated SEs (SMESE; horizontal lines) and the corresponding 2.5%–97.5% empirical quantiles (shaded regions) relative to the sample standard deviation of the coefficient estimates (SSD; diamonds).
Results are shown for the two approaches across five versions of $\hat G(\cdot \mid Z)$, obtained using: (i) SRF with 100 trees and a node size of 50; (ii) the stratified ECDF; (iii) the Cox PH model on $C$; (iv) the Cox PH model on $C^*$; and (v) the true CDF.

Figure S14: Comparison of the differences in estimated variances when $G(\cdot \mid Z)$ is known in Simulation 1.2. The red line in each plot marks a difference of 0.

S4 Numerical Results of Simulation 2

Simulation 2.1 is conducted to show the necessity of a risk set adjustment when the analysis is done before age 21 (Table S2). Simulation 2.2 is conducted to assess the robustness of the approaches. The results in Figures S15 to S19 show that both approaches perform well when the censoring model is correctly specified.

S4.1 Simulation 2.1: Risk Set Adjustment

Table S2 summarizes the estimates of $\alpha_1(t)$ and $\beta(t)$ (which equals $\beta$) in Simulation 2.1, using the approach of Im et al. and approaches A and B with all five methods to obtain $\hat G(\cdot \mid Z)$. We set both the number of trees and the node size to 100.

Table S2: Results of Simulation 2.1. We report the sample mean of the estimates (SMEAN), the sample standard deviation of the estimates (SSD), the sample mean of the estimated standard errors (SMESE), and the root sample mean square error of the estimates (RSMSE) for each approach. In SRF, both the number of trees and the node size are set to 100.

True values: α₁(13) = −2.400, α₁(14) = −2.100, α₁(15) = −1.800; β₁(t) = −6.300 and β₂(t) = 1.000 at all three ages. Column labels give the parameter and the method used to obtain Ĝ(·|Z): SRF, Cox PH model on C (Cox C), Cox PH model on C* (Cox C*), stratified ECDF (ECDF), and the true CDF (True).

| Age | Approach | Statistic | α₁ SRF | α₁ Cox C | α₁ Cox C* | α₁ ECDF | α₁ True | β₁ SRF | β₁ Cox C | β₁ Cox C* | β₁ ECDF | β₁ True | β₂ SRF | β₂ Cox C | β₂ Cox C* | β₂ ECDF | β₂ True |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 13 | Im et al. | SMEAN | -2.298 | -2.290 | -2.298 | -2.294 | -2.298 | -6.955 | -6.978 | -6.955 | -6.968 | -6.954 | 1.021 | 1.022 | 1.021 | 1.021 | 1.020 |
| 13 | Im et al. | SSD | 0.227 | 0.227 | 0.227 | 0.227 | 0.227 | 0.617 | 0.619 | 0.617 | 0.619 | 0.618 | 0.214 | 0.214 | 0.214 | 0.214 | 0.214 |
| 13 | Im et al. | SMESE | 0.221 | 0.221 | 0.221 | 0.221 | 0.221 | 0.600 | 0.602 | 0.600 | 0.602 | 0.600 | 0.214 | 0.214 | 0.214 | 0.214 | 0.214 |
| 13 | Im et al. | RSMSE | 0.249 | 0.252 | 0.249 | 0.251 | 0.249 | 0.900 | 0.918 | 0.900 | 0.910 | 0.900 | 0.215 | 0.215 | 0.215 | 0.215 | 0.215 |
| 13 | A | SMEAN | -2.425 | -2.417 | -2.425 | -2.421 | -2.425 | -6.305 | -6.330 | -6.304 | -6.318 | -6.303 | 1.018 | 1.019 | 1.018 | 1.018 | 1.017 |
| 13 | A | SSD | 0.248 | 0.248 | 0.248 | 0.248 | 0.248 | 0.750 | 0.752 | 0.750 | 0.751 | 0.750 | 0.213 | 0.213 | 0.213 | 0.213 | 0.213 |
| 13 | A | SMESE | 0.242 | 0.242 | 0.242 | 0.242 | 0.242 | 0.728 | 0.730 | 0.728 | 0.730 | 0.728 | 0.213 | 0.213 | 0.213 | 0.213 | 0.213 |
| 13 | A | RSMSE | 0.249 | 0.249 | 0.249 | 0.249 | 0.249 | 0.749 | 0.752 | 0.749 | 0.751 | 0.749 | 0.214 | 0.214 | 0.214 | 0.214 | 0.214 |
| 13 | B | SMEAN | -2.425 | -2.430 | -2.425 | -2.428 | -2.425 | -6.304 | -6.289 | -6.303 | -6.295 | -6.303 | 1.017 | 1.017 | 1.017 | 1.017 | 1.017 |
| 13 | B | SSD | 0.248 | 0.248 | 0.248 | 0.248 | 0.248 | 0.750 | 0.748 | 0.749 | 0.748 | 0.749 | 0.213 | 0.213 | 0.213 | 0.213 | 0.213 |
| 13 | B | SMESE | 0.242 | 0.242 | 0.242 | 0.242 | 0.242 | 0.728 | 0.726 | 0.728 | 0.727 | 0.728 | 0.213 | 0.213 | 0.213 | 0.213 | 0.213 |
| 13 | B | RSMSE | 0.249 | 0.249 | 0.249 | 0.249 | 0.249 | 0.749 | 0.747 | 0.749 | 0.747 | 0.749 | 0.214 | 0.214 | 0.214 | 0.214 | 0.214 |
| 14 | Im et al. | SMEAN | -2.015 | -2.004 | -2.016 | -2.010 | -2.016 | -6.769 | -6.800 | -6.768 | -6.784 | -6.767 | 1.008 | 1.009 | 1.008 | 1.008 | 1.007 |
| 14 | Im et al. | SSD | 0.198 | 0.198 | 0.198 | 0.198 | 0.198 | 0.534 | 0.537 | 0.534 | 0.536 | 0.534 | 0.183 | 0.183 | 0.183 | 0.183 | 0.183 |
| 14 | Im et al. | SMESE | 0.194 | 0.194 | 0.194 | 0.194 | 0.194 | 0.527 | 0.529 | 0.527 | 0.528 | 0.526 | 0.185 | 0.185 | 0.185 | 0.185 | 0.185 |
| 14 | Im et al. | RSMSE | 0.215 | 0.220 | 0.215 | 0.217 | 0.215 | 0.710 | 0.734 | 0.710 | 0.722 | 0.709 | 0.183 | 0.183 | 0.183 | 0.183 | 0.183 |
| 14 | A | SMEAN | -2.106 | -2.095 | -2.107 | -2.101 | -2.107 | -6.321 | -6.355 | -6.319 | -6.337 | -6.318 | 1.005 | 1.006 | 1.005 | 1.005 | 1.005 |
| 14 | A | SSD | 0.212 | 0.212 | 0.212 | 0.212 | 0.212 | 0.615 | 0.618 | 0.616 | 0.617 | 0.615 | 0.183 | 0.183 | 0.183 | 0.183 | 0.183 |
| 14 | A | SMESE | 0.208 | 0.208 | 0.208 | 0.208 | 0.208 | 0.608 | 0.610 | 0.608 | 0.610 | 0.608 | 0.184 | 0.184 | 0.184 | 0.184 | 0.184 |
| 14 | A | RSMSE | 0.212 | 0.212 | 0.212 | 0.212 | 0.212 | 0.615 | 0.620 | 0.616 | 0.618 | 0.615 | 0.183 | 0.183 | 0.183 | 0.183 | 0.183 |
| 14 | B | SMEAN | -2.106 | -2.114 | -2.106 | -2.110 | -2.106 | -6.319 | -6.295 | -6.318 | -6.306 | -6.319 | 1.004 | 1.003 | 1.004 | 1.004 | 1.005 |
| 14 | B | SSD | 0.212 | 0.211 | 0.212 | 0.211 | 0.212 | 0.615 | 0.612 | 0.614 | 0.613 | 0.614 | 0.183 | 0.183 | 0.183 | 0.183 | 0.183 |
| 14 | B | SMESE | 0.208 | 0.207 | 0.208 | 0.207 | 0.208 | 0.608 | 0.606 | 0.608 | 0.606 | 0.608 | 0.184 | 0.184 | 0.184 | 0.184 | 0.184 |
| 14 | B | RSMSE | 0.212 | 0.212 | 0.212 | 0.211 | 0.212 | 0.615 | 0.612 | 0.614 | 0.613 | 0.614 | 0.183 | 0.183 | 0.183 | 0.183 | 0.183 |
| 15 | Im et al. | SMEAN | -1.746 | -1.731 | -1.747 | -1.740 | -1.747 | -6.618 | -6.660 | -6.615 | -6.636 | -6.614 | 1.010 | 1.012 | 1.010 | 1.010 | 1.009 |
| 15 | Im et al. | SSD | 0.172 | 0.172 | 0.172 | 0.172 | 0.172 | 0.459 | 0.462 | 0.460 | 0.461 | 0.460 | 0.165 | 0.165 | 0.166 | 0.165 | 0.166 |
| 15 | Im et al. | SMESE | 0.171 | 0.172 | 0.171 | 0.172 | 0.171 | 0.465 | 0.467 | 0.465 | 0.466 | 0.465 | 0.162 | 0.162 | 0.162 | 0.161 | 0.162 |
| 15 | Im et al. | RSMSE | 0.180 | 0.185 | 0.180 | 0.182 | 0.180 | 0.558 | 0.585 | 0.557 | 0.570 | 0.556 | 0.166 | 0.166 | 0.166 | 0.166 | 0.166 |
| 15 | A | SMEAN | -1.809 | -1.793 | -1.810 | -1.803 | -1.810 | -6.319 | -6.363 | -6.316 | -6.337 | -6.315 | 1.007 | 1.009 | 1.008 | 1.007 | 1.007 |
| 15 | A | SSD | 0.181 | 0.181 | 0.181 | 0.181 | 0.181 | 0.509 | 0.512 | 0.509 | 0.511 | 0.509 | 0.165 | 0.165 | 0.165 | 0.165 | 0.165 |
| 15 | A | SMESE | 0.180 | 0.180 | 0.180 | 0.180 | 0.180 | 0.515 | 0.517 | 0.515 | 0.517 | 0.515 | 0.161 | 0.161 | 0.161 | 0.161 | 0.161 |
| 15 | A | RSMSE | 0.181 | 0.181 | 0.181 | 0.181 | 0.181 | 0.509 | 0.515 | 0.509 | 0.512 | 0.509 | 0.165 | 0.165 | 0.165 | 0.165 | 0.165 |
| 15 | B | SMEAN | -1.809 | -1.821 | -1.809 | -1.815 | -1.809 | -6.316 | -6.281 | -6.315 | -6.298 | -6.316 | 1.006 | 1.005 | 1.006 | 1.006 | 1.007 |
| 15 | B | SSD | 0.181 | 0.180 | 0.180 | 0.180 | 0.180 | 0.509 | 0.505 | 0.508 | 0.506 | 0.508 | 0.165 | 0.165 | 0.165 | 0.165 | 0.165 |
| 15 | B | SMESE | 0.180 | 0.180 | 0.180 | 0.180 | 0.180 | 0.515 | 0.512 | 0.515 | 0.513 | 0.515 | 0.161 | 0.161 | 0.161 | 0.161 | 0.161 |
| 15 | B | RSMSE | 0.181 | 0.181 | 0.180 | 0.180 | 0.180 | 0.509 | 0.505 | 0.508 | 0.506 | 0.508 | 0.165 | 0.165 | 0.165 | 0.165 | 0.165 |

S4.2 Simulation 2.2: Performance of Different Methods and Approaches

In Figures S15 to S19, we report the estimated survival probability obtained with each method under both approaches and compare it with the true survival probability based on the model in Simulation 2. We set the number of trees to 100 and the node size to 500 when using SRF.

Figure S15: Comparison of the average estimated survival probability under all approaches with the average true survival probability in Simulation 2.2. SRF is used to estimate the survival function of the censoring time, with the number of trees set to 100 and the node size set to 500.
The standard error of the estimated survival probability is calculated by finding the standard error of the linear predictor (log-odds) from the covariance matrix of the estimated coefficients and then transforming that error to the probability scale.

Figure S16: Comparison of the average estimated survival probability under all approaches with the average true survival probability in Simulation 2.2. The Cox PH model is used to estimate the survival function of the censoring time. The standard error of the estimated survival probability is calculated by finding the standard error of the linear predictor (log-odds) from the covariance matrix of the estimated coefficients and then transforming that error to the probability scale.

Figure S17: Comparison of the average estimated survival probability under all approaches with the average true survival probability in Simulation 2.2. The Cox PH model is used to estimate the survival function of $C^*$, where $C^* = C - V - 5$. The standard error of the estimated survival probability is calculated by finding the standard error of the linear predictor (log-odds) from the covariance matrix of the estimated coefficients and then transforming that error to the probability scale.

Figure S18: Comparison of the average estimated survival probability under all approaches with the average true survival probability in Simulation 2.2. The stratified ECDF is used to estimate the survival function of the censoring time. The standard error of the estimated survival probability is calculated by finding the standard error of the linear predictor (log-odds) from the covariance matrix of the estimated coefficients and then transforming that error to the probability scale.

Figure S19: Comparison of the average estimated survival probability under all approaches with the average true survival probability in Simulation 2.2. The true CDF of the censoring time is used.
The standard error of the estimated survival probability is calculated by finding the standard error of the linear predictor (log-odds) from the covariance matrix of the estimated coefficients and then transforming that error to the probability scale.

S5 Numerical Results of Simulation 3

In Figures S20 to S24, we report the estimated survival probability obtained with each method under both approaches. We compare it with the true survival probability based on the model in Simulation 3 and with the survival probability estimated using the Cox PH model with an unspecified baseline hazard function. We set the number of trees to 100 and the node size to 500 when using SRF.

Figure S20: Comparison of the average estimated survival probability under all approaches with the average true survival probability in Simulation 3. SRF is used to estimate the survival function of the censoring time, with the number of trees set to 100 and the node size set to 500. The standard error of the estimated survival probability is calculated by finding the standard error of the linear predictor (log-odds) from the covariance matrix of the estimated coefficients and then transforming that error to the probability scale.

Figure S21: Comparison of the average estimated survival probability under all approaches with the average true survival probability in Simulation 3. The Cox PH model is used to estimate the survival function of the censoring time. The standard error of the estimated survival probability is calculated by finding the standard error of the linear predictor (log-odds) from the covariance matrix of the estimated coefficients and then transforming that error to the probability scale.

Figure S22: Comparison of the average estimated survival probability under all approaches with the average true survival probability in Simulation 3. The Cox PH model is used to estimate the survival function of $C^*$, where $C^* = C - V - 5$.
The standard error of the estimated survival probability is calculated by finding the standard error of the linear predictor (log-odds) from the covariance matrix of the estimated coefficients and then transforming that error to the probability scale.

Figure S23: Comparison of the average estimated survival probability under all approaches with the average true survival probability in Simulation 3. The stratified ECDF is used to estimate the survival function of the censoring time. The standard error of the estimated survival probability is calculated by finding the standard error of the linear predictor (log-odds) from the covariance matrix of the estimated coefficients and then transforming that error to the probability scale.

Figure S24: Comparison of the average estimated survival probability under all approaches with the average true survival probability in Simulation 3. The true CDF of the censoring time is used. The standard error of the estimated survival probability is calculated by finding the standard error of the linear predictor (log-odds) from the covariance matrix of the estimated coefficients and then transforming that error to the probability scale.

S6 Validation of the Data Generation Process

In the simulation studies, the data generation algorithm generates data only at each selected $t_0$, and the true event time is generated only for subjects with $I(T_i \le t_0) = 1$. To verify that this algorithm is valid, we consider an alternative algorithm that generates $T_i$ for all subjects, shown below.

Algorithm S1: Alternative data simulation process for the subjects in repetition $k$, where $1 \le k \le K$

Initialize: Set the parameters of the true model(s) of $T \mid Z, T > V$ and the distribution parameters. For notational convenience, we omit the subscript $k$ in the following steps.

For subject $i = 1, 2, \ldots, n$ do:
1. Generate $Z_{1i}$ from Beta($a_1$, $a_2$).
2. Generate $Z_{2i}$ from Bernoulli($p$).
3. Generate $V_i$ from a mixture of truncated normal distributions.
4. Generate $T_i \mid Z_i, T_i > V_i$:
   • Sample $B_i \sim$ Uniform(0, 1).
   • If $B_i < 0.005$, set $B_i = 0.005$.
   • Obtain $T_i$ using inverse transform sampling.
5. Generate the censoring time $C_i$:
   • Sample $C_i^* \sim$ Weibull($\psi_{3i}$, $\psi_{4i}$).
   • Set $C_i = C_i^* + V_i + 5$.
6. Compute $U_i = \min\{T_i, C_i\}$ and $\delta_i = I(T_i \le C_i)$.
7. Store the observed data: $O_i = \{U_i, \delta_i, V_i, C_i, Z_i\}$.

We subsequently use this alternative data-generating algorithm to replicate the datasets from Simulation 3, computing the estimated survival probabilities for all methods under both approaches. As illustrated in Figure S25, the resulting differences in these estimates are small.

Figure S25: The difference between the average estimated survival probabilities obtained under the two data generation processes in Simulation 3, for both approaches.
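A minimal Python sketch of Algorithm S1 is given below for readers who want to reproduce this kind of check. The true models are not specified in this supplement, so the Weibull proportional-hazards form for $T \mid Z$, the two-component mixture for $V$, all parameter values, the mapping of $\psi_{3i}$ to the Weibull scale and $\psi_{4i}$ to the shape, and the dependence of $\psi_{3i}$ on $Z_{1i}$ are hypothetical placeholders. Only the step structure, including the floor of 0.005 on $B_i$ and $C_i = C_i^* + V_i + 5$, follows Algorithm S1. Step 4 solves $S(T_i \mid Z_i) / S(V_i \mid Z_i) = B_i$ for $T_i$, which draws from the event time distribution conditional on $T_i > V_i$.

```python
import math
import random
from statistics import NormalDist

# Hypothetical parameter values; the supplement does not list the true ones.
PARAMS = {
    "a1": 2.0, "a2": 5.0,                 # Beta(a1, a2) for Z1
    "p": 0.2,                             # Bernoulli(p) for Z2
    "mix": 0.6,                           # probability of the first V component
    "v1": (8.0, 3.0, 0.0, 21.0),          # (mean, sd, lower, upper) of component 1
    "v2": (16.0, 2.0, 5.0, 21.0),         # (mean, sd, lower, upper) of component 2
    "scale": 45.0, "shape": 4.0,          # Weibull baseline for T
    "g1": -1.0, "g2": 0.5,                # log hazard ratios for Z1, Z2
    "psi3": 15.0, "psi4": 1.5,            # Weibull scale / shape for C*
}


def truncated_normal(rng, mean, sd, lower, upper):
    """Inverse-CDF draw from N(mean, sd^2) truncated to [lower, upper]."""
    nd = NormalDist(mean, sd)
    lo, hi = nd.cdf(lower), nd.cdf(upper)
    return nd.inv_cdf(lo + (hi - lo) * rng.random())


def simulate_subject(rng, prm):
    # Steps 1-2: covariates
    z1 = rng.betavariate(prm["a1"], prm["a2"])
    z2 = int(rng.random() < prm["p"])
    # Step 3: entry age V from a two-component truncated-normal mixture
    comp = prm["v1"] if rng.random() < prm["mix"] else prm["v2"]
    v = truncated_normal(rng, *comp)
    # Step 4: T | Z, T > V by inverse transform, solving S(T | Z) / S(V | Z) = B
    # with S(t | Z) = exp{-(t / scale)^shape * exp(g1 * z1 + g2 * z2)}
    b = max(rng.random(), 0.005)          # floor B at 0.005 as in Algorithm S1
    risk = math.exp(prm["g1"] * z1 + prm["g2"] * z2)
    base_cum_haz_v = (v / prm["scale"]) ** prm["shape"]
    t = prm["scale"] * (base_cum_haz_v - math.log(b) / risk) ** (1.0 / prm["shape"])
    # Step 5: C = C* + V + 5; weibullvariate takes (scale, shape)
    c_star = rng.weibullvariate(prm["psi3"] * (1.0 + z1), prm["psi4"])
    c = c_star + v + 5.0
    # Steps 6-7: observed data O_i = {U, delta, V, C, Z}
    return {"U": min(t, c), "delta": int(t <= c), "V": v, "C": c, "Z": (z1, z2)}


def simulate_dataset(n, prm=PARAMS, seed=2026):
    """One repetition of Algorithm S1 with n subjects."""
    rng = random.Random(seed)
    return [simulate_subject(rng, prm) for _ in range(n)]
```

Because every subject receives a full event time, one generated dataset can be analyzed at any $t_0$, which is what makes this generator a useful cross-check on the per-$t_0$ generator.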