Nonparametric inference for competing risks current status data with continuous, discrete or grouped observation times

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

New methods and theory have recently been developed to nonparametrically estimate cumulative incidence functions for competing risks survival data subject to current status censoring. In particular, the limiting distribution of the nonparametric maximum likelihood estimator and a simplified “naive estimator” have been established under certain smoothness conditions. In this paper, we establish the large-sample behavior of these estimators in two additional models, namely when the observation time distribution has discrete support and when the observation times are grouped. These asymptotic results are applied to the construction of confidence intervals in the three different models. The methods are illustrated on two data sets regarding the cumulative incidence of (i) different types of menopause from a cross-sectional sample of women in the United States and (ii) subtype-specific HIV infection from a sero-prevalence study in injecting drug users in Thailand.


💡 Research Summary

This paper extends the non‑parametric inference framework for competing‑risks survival data subject to current‑status censoring to two additional observation‑time settings that are common in practice but have received little theoretical attention. The authors consider (i) a discrete observation‑time model, where the inspection times take values in a finite set of points (e.g., scheduled clinic visits), and (ii) a grouped‑time model, where the exact inspection times are continuous but are recorded only within pre‑specified intervals (e.g., age bands). For each setting they study two estimators: the non‑parametric maximum likelihood estimator (NPMLE) and a simplified “naive” estimator that treats each cause separately as univariate current‑status data, estimating each cumulative incidence function by its own isotonic regression and ignoring the constraint that the cumulative incidence functions sum to at most one.
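A minimal sketch of the naive estimator, assuming the data arrive as inspection times `t` and a status indicator `delta` (0 if no event has occurred by the inspection time, k if an event from cause k has occurred). The names `pava` and `naive_cif` are hypothetical. For each cause, the sketch computes the proportion of cause‑k events at each distinct inspection time and applies a weighted pool‑adjacent‑violators algorithm so the estimate is nondecreasing. It deliberately does not enforce that the estimated functions sum to at most one; that constraint is what separates the naive estimator from the NPMLE.

```python
import numpy as np


def pava(y, w):
    """Weighted nondecreasing isotonic regression (pool-adjacent-violators)."""
    blocks = []  # each block: [weighted mean, total weight, number of points]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, c1 + c2])
    return np.concatenate([np.full(c, m) for m, _, c in blocks])


def naive_cif(t, delta, K):
    """Naive estimator: a separate isotonic regression for each cause.

    t     : inspection times (length n)
    delta : 0 = no event by time t, k in 1..K = event from cause k
    Returns the sorted distinct times and a (K, m) array of estimated CIFs.
    """
    t = np.asarray(t, dtype=float)
    delta = np.asarray(delta)
    times, inv, counts = np.unique(t, return_inverse=True, return_counts=True)
    est = np.empty((K, times.size))
    for k in range(1, K + 1):
        # Number of cause-k events at each distinct inspection time.
        hits = np.bincount(inv, weights=(delta == k).astype(float),
                           minlength=times.size)
        # Isotonic fit of the raw proportions, weighted by the group sizes.
        est[k - 1] = pava(hits / counts, counts)
    return times, est
```

In the toy call `naive_cif([1, 1, 2, 2, 3, 3], [1, 0, 0, 0, 1, 2], 2)`, the raw cause‑1 proportions (0.5, 0, 0.5) violate monotonicity, and pooling the first two points gives (0.25, 0.25, 0.5).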

In the discrete‑time case the likelihood factorizes over the finite set of support points, and at each support point the estimators are essentially the empirical proportions, among subjects inspected there, of those who have experienced an event from each cause. The authors prove that, unlike the continuous‑time case, where the NPMLE and the naive estimator converge at the cube‑root‑n rate to non‑standard limiting distributions (a Chernoff‑type limit for the naive estimator), the discrete‑time NPMLE converges at the usual √n rate and is asymptotically normal. The faster rate arises because each support point carries positive probability mass, so the number of subjects inspected at each point grows in proportion to n. The naive estimator has the same √n asymptotics and, because it is computationally simple, is an attractive alternative in large samples.
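To make the √n behaviour concrete, here is a sketch of the pointwise limit under an added assumption: each true cumulative incidence function $F_{0k}$ is strictly increasing across the support points $t_1 < \dots < t_m$, so the monotonicity constraints are asymptotically inactive and the estimator $\hat F_{nk}(t_j)$ reduces to a binomial proportion based on roughly $n\,g(t_j)$ subjects, where $g(t_j) = P(T = t_j)$. The notation is ours and may differ from the paper's.

```latex
\sqrt{n}\,\bigl\{\hat F_{nk}(t_j) - F_{0k}(t_j)\bigr\}
  \rightsquigarrow
  N\!\left(0,\; \frac{F_{0k}(t_j)\,\{1 - F_{0k}(t_j)\}}{g(t_j)}\right),
  \qquad g(t_j) = P(T = t_j) > 0 .
```

The variance is that of a single binomial proportion divided by the share of subjects inspected at $t_j$, which is why support points that receive few inspections yield wider intervals.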

For the grouped‑time model the authors assume that within each interval the inspection times are uniformly distributed. They derive a modified likelihood that replaces each exact inspection time by the interval’s midpoint and show that the NPMLE still exists and is unique. The asymptotic behavior now depends on the width of the intervals: as the interval length shrinks the model approaches the continuous‑time setting (cube‑root‑n rate, Chernoff limit), while for relatively wide intervals the estimator behaves more like the discrete case (√n rate, normal limit). Consequently the grouped model occupies a spectrum between the two extremes, and the authors provide explicit formulas for the asymptotic variance that incorporate the interval lengths.
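The grouping step described above, in which each inspection time is known only through the interval containing it and is then represented by that interval's midpoint, can be sketched as follows. The function name `group_to_midpoints` and the use of half‑open intervals are assumptions for illustration, and the 5‑year boundaries in the usage example are hypothetical.

```python
import numpy as np


def group_to_midpoints(t, edges):
    """Replace each inspection time by the midpoint of its grouping interval.

    edges : increasing boundaries [e0, e1, ..., eK]; interval j is [e_j, e_{j+1}).
    """
    t = np.asarray(t, dtype=float)
    edges = np.asarray(edges, dtype=float)
    if np.any(t < edges[0]) or np.any(t >= edges[-1]):
        raise ValueError("inspection time outside the grouping range")
    # Index of the interval containing each time.
    idx = np.searchsorted(edges, t, side="right") - 1
    mids = (edges[:-1] + edges[1:]) / 2
    return mids[idx]
```

For example, `group_to_midpoints([40.2, 44.9, 45.0, 52.3], [40, 45, 50, 55])` maps the times to 42.5, 42.5, 47.5 and 52.5. Wider intervals leave fewer distinct inspection times, pushing the problem toward the discrete model, while narrower intervals push it back toward the continuous one.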

The paper also develops practical confidence‑interval procedures tailored to each setting. In the discrete case standard errors are estimated directly from the empirical proportions and normal‑based intervals are constructed. In the grouped case the authors propose a hybrid approach that combines a bootstrap of the grouped data with a likelihood‑ratio‑type correction, yielding intervals that respect the non‑standard Chernoff component when intervals are narrow. They further discuss an “inverse‑Chernoff” technique that inverts the known Chernoff distribution to obtain accurate one‑sided bounds.
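For the discrete case, the normal‑based pointwise interval can be sketched as follows. At each support point $t_j$, $F_k(t_j)$ is estimated by the empirical proportion $p_j$ of cause‑k events among the $n_j$ subjects inspected there, with binomial standard error $\sqrt{p_j(1 - p_j)/n_j}$. The function name and the clipping to [0, 1] are our choices. The sketch assumes each true cumulative incidence function is strictly increasing across support points, so that in large samples the NPMLE and the naive estimator coincide with these proportions. It does not implement the grouped‑case or Chernoff‑based procedures.

```python
import numpy as np
from statistics import NormalDist


def wald_ci_discrete(t, delta, cause, level=0.95):
    """Normal-based pointwise intervals for F_cause at each support point.

    Uses the empirical proportion p at each support point and the standard
    error sqrt(p * (1 - p) / n_j); interval endpoints are clipped to [0, 1].
    """
    t = np.asarray(t, dtype=float)
    delta = np.asarray(delta)
    times, inv, n_j = np.unique(t, return_inverse=True, return_counts=True)
    # Number of events from the given cause at each support point.
    hits = np.bincount(inv, weights=(delta == cause).astype(float),
                       minlength=times.size)
    p = hits / n_j
    z = NormalDist().inv_cdf(0.5 + level / 2)  # e.g. about 1.96 for 95%
    se = np.sqrt(p * (1 - p) / n_j)
    lower = np.clip(p - z * se, 0.0, 1.0)
    upper = np.clip(p + z * se, 0.0, 1.0)
    return times, p, lower, upper
```

With only four subjects per support point, as in `wald_ci_discrete([1]*4 + [2]*4, [1, 0, 0, 0, 1, 1, 0, 2], cause=1)`, the lower limit at the first point is clipped to 0, a reminder that the normal approximation relies on large $n_j$.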

To illustrate the methodology, two real data applications are presented. The first concerns a cross‑sectional survey of U.S. women in which the type of menopause (natural or operative) is recorded together with the woman’s age at interview. The authors analyze the data under all three observation‑time models, showing that the discrete‑time analysis (treating the recorded ages at interview as a discrete set of inspection times) produces substantially tighter confidence intervals for the cumulative incidence functions than the continuous‑time analysis, while the grouped‑time analysis (using 5‑year age bands) yields intermediate precision. The second application examines subtype‑specific HIV infection among injecting drug users in Thailand. Here the inspection times are the dates of serological testing, which are either exact (discrete) or recorded in 2‑year calendar intervals (grouped). Again the discrete‑time approach yields the most precise estimates of the cause‑specific cumulative incidence, and the naive estimator performs almost identically to the NPMLE in large samples.

Overall, the paper makes three major contributions. First, it rigorously characterizes how the structure of the observation‑time distribution influences the asymptotic rate and limiting distribution of the NPMLE and naive estimator in competing‑risks current‑status data. Second, it provides concrete, implementable inference tools—including variance estimators and confidence‑interval formulas—for each of the three models (continuous, discrete, grouped). Third, it demonstrates through substantive epidemiological examples that choosing an analysis model that matches the data‑collection design can lead to materially more efficient inference, especially when the goal is to compare subtle differences in cause‑specific cumulative incidence. The results are directly applicable to a wide range of cross‑sectional and sero‑prevalence studies where exact event times are unavailable, and they offer a clear roadmap for statisticians to incorporate the timing structure of their data into non‑parametric competing‑risks analysis.

