Goodness-of-Fit Tests for Censored and Truncated Data: Maximum Mean Discrepancy Over Regular Functionals


We develop a systematic, omnibus approach to goodness-of-fit testing for parametric distributional models when the variable of interest is only partially observed due to censoring and/or truncation. In many such designs, tests based on the nonparametric maximum likelihood estimator are hindered by nonexistence, computational instability, or convergence rates too slow to support reliable calibration under composite nulls. We avoid these difficulties by constructing a regular (pathwise differentiable) Neyman-orthogonal score process indexed by test functions, and aggregating it over a reproducing kernel Hilbert space ball. This yields a maximum-mean-discrepancy-type supremum statistic with a convenient quadratic-form representation. Critical values are obtained via a multiplier bootstrap that keeps nuisance estimates fixed. We establish asymptotic validity under the null and local alternatives and provide concrete constructions for left-truncated right-censored data, current status data, and random double truncation; in particular, to the best of our knowledge, we give the first omnibus goodness-of-fit test for a parametric family under random double truncation in the composite-hypothesis case. Simulations and an empirical illustration demonstrate size control and power in practically relevant incomplete-data designs.


💡 Research Summary

This paper introduces a unified, omnibus goodness‑of‑fit (GOF) testing framework for parametric distributional models when the variable of interest is only partially observed because of censoring and/or truncation. Traditional GOF procedures for such incomplete‑data designs rely on non‑parametric maximum‑likelihood estimators (NPMLEs) of the target distribution and compare them to the fitted parametric model using Kolmogorov–Smirnov, Cramér–von Mises, or related distances. However, in many censoring/truncation settings the NPMLE may not exist, may be non‑unique, or may converge at rates too slow to permit reliable bootstrap calibration, especially under composite null hypotheses where nuisance parameters must be re‑estimated in each resample.

To overcome these difficulties, the authors combine two modern statistical ideas: (i) a regular (pathwise‑differentiable) Neyman‑orthogonal score process that characterizes the null hypothesis through a rich family of moment restrictions, and (ii) an aggregation of these orthogonal scores over a reproducing kernel Hilbert space (RKHS) unit ball, yielding a maximum‑mean‑discrepancy (MMD)‑type statistic.
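Aggregating the orthogonal scores over an RKHS unit ball gives the supremum statistic a closed quadratic form: the squared RKHS norm of the empirical mean embedding of the score residuals, i.e. n⁻² Σᵢⱼ ĝ(Zᵢ) ĝ(Zⱼ) k(Zᵢ, Zⱼ). The following sketch illustrates that quadratic form numerically; the function name, the Gaussian kernel choice, and the `scores` input (estimated score residuals ĝ(Zᵢ)) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def mmd_quadratic_form(scores, Z, bandwidth=1.0):
    """Quadratic-form MMD-type statistic (illustrative sketch).

    scores : (n,) estimated orthogonal score residuals g_hat(Z_i)
    Z      : (n,) or (n, d) observed data points fed to the kernel
    """
    scores = np.asarray(scores, dtype=float)
    n = len(scores)
    Z = np.atleast_2d(np.asarray(Z, dtype=float)).reshape(n, -1)
    # Gaussian kernel Gram matrix K[i, j] = k(Z_i, Z_j)
    sq_dists = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq_dists / (2.0 * bandwidth ** 2))
    # sup over the RKHS unit ball = squared RKHS norm of the
    # empirical embedding (1/n) sum_i g_hat(Z_i) k(Z_i, .)
    return scores @ K @ scores / n ** 2
```

Under the null the score residuals are (approximately) mean zero, so the statistic concentrates near zero; systematic departures from the model inflate it.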

The construction proceeds as follows. Let Z denote the observed data, which is a measurable transformation of the target variable X and auxiliary censoring/truncation variables W, observed only when a known indicator B(X,W)=1. For any test function φ, a basic moment is defined as
g_φ(Z; θ, G) = ψ_φ(Z) − E_θ[…]
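Critical values come from a multiplier bootstrap in which the fitted nuisance estimates are held fixed: each resample reweights the estimated score residuals by i.i.d. mean-zero, unit-variance multipliers and recomputes the quadratic form, avoiding any nuisance re-estimation inside the bootstrap loop. A minimal sketch of that recipe follows, assuming a precomputed kernel Gram matrix `K` and score residuals `scores`; Gaussian multipliers and the function name are illustrative choices, not the paper's prescription.

```python
import numpy as np

def multiplier_bootstrap_pvalue(scores, K, n_boot=999, rng=None):
    """Multiplier-bootstrap p-value with nuisances held fixed (sketch).

    scores : (n,) estimated score residuals g_hat(Z_i), kept fixed
    K      : (n, n) kernel Gram matrix K[i, j] = k(Z_i, Z_j)
    """
    rng = np.random.default_rng(rng)
    scores = np.asarray(scores, dtype=float)
    n = len(scores)
    T_obs = scores @ K @ scores / n ** 2      # observed statistic
    T_boot = np.empty(n_boot)
    for b in range(n_boot):
        xi = rng.standard_normal(n)           # multipliers: mean 0, var 1
        s = xi * scores                       # perturbed residuals only;
        T_boot[b] = s @ K @ s / n ** 2        # nuisance estimates untouched
    # finite-sample-corrected bootstrap p-value
    return (1 + np.sum(T_boot >= T_obs)) / (1 + n_boot)
```

Because only the multipliers are redrawn, each bootstrap replicate costs one quadratic form, which keeps calibration cheap even under composite nulls.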

