Distribution-Free Selection of Low-Risk Oncology Patients for Survival Beyond a Time Horizon

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

We study the problem of selecting a subset of patients who are unlikely to experience an adverse event within a fixed time horizon by calibrating a screening rule based on a black-box survival model. We consider two complementary, distribution-free frameworks for this task. The first extends classical calibration ideas – estimating the event rate among selected patients using a hold-out dataset – by integrating them with the Learn-Then-Test (LTT) framework, yielding high-probability guarantees for data-adaptively tuned screening rules. The second takes a different perspective by reformulating screening as a hypothesis testing problem on future patient outcomes, enabling false discovery rate (FDR) control via the Benjamini-Hochberg procedure applied to selective conformal p-values, and providing guarantees in expectation. We clarify the theoretical relationship between these approaches, explain how both can be adapted to right-censored time-to-event data via inverse probability of censoring weighting, and compare them empirically using simulations and oncology data from the Flatiron Health Research Database. Our results reveal a trade-off between efficiency and strength of guarantees: FDR-based screening is typically more powerful, while LTT-based calibration is more conservative but offers stronger guarantees. We also provide practical guidance on implementation and tuning.


💡 Research Summary

This paper addresses the clinically important problem of identifying a subset of oncology patients who are highly unlikely to experience an adverse event within a prespecified time horizon (e.g., three months). The authors treat the predictive survival model as a black box and focus on calibrating the downstream screening rule that selects patients whose estimated survival probability exceeds a data‑driven threshold. Two complementary, distribution‑free frameworks are developed, each providing statistical guarantees while handling right‑censored data via inverse probability of censoring weighting (IPCW).
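To make the IPCW idea above concrete, here is a minimal sketch of how censoring weights can be built from a Kaplan-Meier estimate of the censoring survival function Ĝ. This is an illustrative construction under standard IPCW assumptions (random censoring, Ĝ bounded away from zero), not the paper's exact estimator; the function names and the step-function evaluation of Ĝ(t−) are my own.

```python
import numpy as np

def km_censoring_survival(times, events):
    """Kaplan-Meier estimate of the censoring survival G(t) = P(C > t).

    Censoring is treated as the 'event': a censored observation
    (events == 0) contributes a drop in G. Returns the jump times
    (knots) and the value of G just after each jump.
    """
    order = np.argsort(times)
    t, d = times[order], 1 - events[order]          # d == 1 means censored
    knots = np.unique(t[d == 1])
    surv, g = [], 1.0
    for u in knots:
        at_risk = np.sum(t >= u)
        n_cens = np.sum((t == u) & (d == 1))
        g *= 1.0 - n_cens / at_risk
        surv.append(g)
    return knots, np.array(surv)

def ipcw_weights(times, events, horizon, knots, surv):
    """IPCW weights for the binary pseudo-outcome 1{T <= horizon}.

    Patients with an observed event before the horizon, or who are
    still at risk past it, get weight 1 / G(min(time, horizon)-);
    patients censored before the horizon are unusable and get weight 0.
    """
    w = np.zeros_like(times, dtype=float)
    for i, (ti, di) in enumerate(zip(times, events)):
        if di == 1 and ti <= horizon:   # event observed before horizon
            tw = ti
        elif ti > horizon:              # followed past the horizon
            tw = horizon
        else:                           # censored before horizon: unusable
            continue
        g = 1.0                         # left-limit evaluation G(tw-)
        for k, s in zip(knots, surv):
            if k < tw:
                g = s
        w[i] = 1.0 / g
    return w
```

For example, a patient censored at t = 2 before a horizon of 3.5 receives weight 0, while patients followed past the censoring time are up-weighted by 1/Ĝ to compensate.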

The first framework extends classical calibration ideas by embedding them in the Learn‑Then‑Test (LTT) paradigm. For a monotone family of screening rules Aλ(x)=I{ẑ(x)≥λ}, where ẑ(x) is the estimated survival probability at the horizon, the method first constructs pointwise IPCW risk upper bounds and then applies fixed‑sequence testing to certify that the selected set's event rate does not exceed a target α with high probability (1–δ). This yields high‑probability risk control comparable to uniform confidence‑band approaches but with substantially less conservatism, achieving selection yields close to those of naïve pointwise calibration. According to the authors, this is the first application of the LTT framework to right‑censored time‑to‑event data.
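The fixed-sequence logic above can be sketched as follows: walk the thresholds λ from strictest to most lenient, certify each one's risk bound at level δ, and stop at the first failure. The Hoeffding-style upper bound with a known weight cap w_max is an illustrative choice, not necessarily the pointwise bound used in the paper, and the function name is hypothetical.

```python
import numpy as np

def ltt_threshold(z_hat, pseudo_events, weights, lambdas, alpha, delta, w_max):
    """Fixed-sequence Learn-Then-Test calibration of a screening threshold.

    z_hat         : estimated survival probabilities at the horizon
    pseudo_events : 1{T <= horizon} for usable patients
    weights       : IPCW weights (0 for patients censored before the horizon)
    lambdas       : thresholds ordered from strictest (largest) to most lenient
    alpha, delta  : target event rate and error probability
    w_max         : assumed bound on the IPCW weights (for Hoeffding's bound)

    Tests the ordered hypotheses "risk(lambda) > alpha" in sequence and
    returns the last certified threshold, or None if even the strictest fails.
    """
    best = None
    for lam in lambdas:
        sel = z_hat >= lam
        n = sel.sum()
        if n == 0:
            continue
        # IPCW estimate of the event rate among selected patients
        risk_hat = np.sum(weights[sel] * pseudo_events[sel]) / n
        # Hoeffding upper confidence bound for weighted terms in [0, w_max]
        ucb = risk_hat + w_max * np.sqrt(np.log(1.0 / delta) / (2.0 * n))
        if ucb <= alpha:
            best = lam      # certified; try the next, more lenient threshold
        else:
            break           # fixed-sequence testing stops at the first failure
    return best
```

Because each hypothesis is only tested after all stricter ones have been certified, no multiplicity correction beyond the single level δ is needed, which is what keeps the procedure less conservative than a uniform confidence band.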

The second framework reformulates screening as a multiple‑testing problem. Selective conformal p‑values are constructed for each patient using IPCW to account for censoring. Applying the Benjamini–Hochberg (BH) procedure to these p‑values controls the false discovery rate (FDR) in expectation, which translates into control of the average event fraction among the selected patients. This method does not require a pre‑specified confidence level and introduces a single tuning parameter that can dramatically increase selection yield while preserving theoretical guarantees. The guarantee holds in expectation rather than with high probability, so individual realizations may exceed the target α, but the average risk aligns with the desired level.
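The BH step-up procedure at the heart of this framework is standard and easy to state in code. The sketch below applies BH to an arbitrary vector of p-values; the construction of the selective, IPCW-adjusted conformal p-values themselves is specific to the paper and is not reproduced here.

```python
import numpy as np

def benjamini_hochberg(p_values, q):
    """Benjamini-Hochberg step-up procedure at FDR level q.

    Rejecting H0_j ("patient j experiences the event before the horizon")
    selects patient j as low-risk; BH then bounds the expected fraction of
    selected patients who nevertheless have an early event.
    Returns a boolean mask of rejected (selected) hypotheses.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Step-up thresholds q * k / m for the k-th smallest p-value
    thresh = q * np.arange(1, m + 1) / m
    below = p[order] <= thresh
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])   # largest k passing its threshold
        reject[order[: k + 1]] = True
    return reject
```

For instance, with p-values (0.01, 0.02, 0.03, 0.5) and q = 0.1, the first three hypotheses are rejected and three patients are selected.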

A key theoretical contribution (Theorem 2) links the two notions of error control, showing that the expected event rate and the FDR coincide except when selected events are extremely rare. Consequently, in settings where the adverse event is uncommon (as often in early‑phase oncology de‑escalation studies), the FDR‑based approach tends to be more powerful, whereas the LTT‑based method offers stricter per‑sample safety guarantees.

Empirical evaluation includes extensive simulations across a range of censoring proportions and event prevalences, as well as a real‑world case study using the Flatiron Health Research Database (breast and lung cancer cohorts). Results confirm that LTT provides the most stringent risk guarantees but yields fewer patients, while the FDR method admits substantially more patients with average risk close to the target. Both methods outperform traditional pointwise and uniform IPCW confidence‑interval calibration in terms of the trade‑off between safety and efficiency. Sensitivity analyses demonstrate that accurate estimation of the censoring distribution Ĝ is crucial for all procedures.

The paper concludes with practical implementation guidance: (1) split data into model‑training, calibration, and application sets; (2) fit separate survival and censoring models; (3) for LTT choose δ and the fixed‑sequence depth; (4) for FDR select the BH level q and the additional tuning parameter τ; (5) use the provided open‑source R/Python libraries that expose functions such as fit_survival, fit_censor, calibrate_LTT, and calibrate_FDR. By offering both high‑probability and expectation‑based calibration strategies, the work equips clinicians and data scientists with flexible tools to safely expand low‑risk patient enrollment in clinical trials or to allocate limited therapeutic resources in routine oncology practice.
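Step (1) of the recipe above is the part most easily gotten wrong: the calibration set must be disjoint from the data used to fit the survival and censoring models, or the distribution-free guarantees no longer hold. A minimal sketch of such a three-way split (the 50/25/25 fractions and the function name are illustrative, not the paper's recommendation):

```python
import numpy as np

def three_way_split(n, frac_train=0.5, frac_calib=0.25, seed=0):
    """Disjoint model-training / calibration / application index sets.

    The calibration set must be untouched by model fitting so that the
    LTT or FDR guarantees apply; the fractions are illustrative defaults.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_tr = int(frac_train * n)
    n_cal = int(frac_calib * n)
    return idx[:n_tr], idx[n_tr:n_tr + n_cal], idx[n_tr + n_cal:]
```

The survival and censoring models (steps 2) are then fit on the training indices only, the screening rule is calibrated on the calibration indices, and the certified rule is applied to the application set.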

