Semi-knockoffs: a model-agnostic conditional independence testing method with finite-sample guarantees
Conditional independence testing (CIT) is essential for reliable scientific discovery: it prevents spurious findings and enables controlled feature selection. Recent CIT methods have used machine learning (ML) models as surrogates for the underlying distribution, but model-agnostic approaches require a train-test split, which reduces statistical power. We introduce Semi-knockoffs, a CIT method that accommodates any pre-trained model, avoids this split, and provides valid p-values and false discovery rate (FDR) control in high-dimensional settings. Unlike methods that rely on the model-$X$ assumption (a fully known input distribution), Semi-knockoffs only require conditional expectations for continuous variables, which makes the procedure less restrictive and more practical for machine-learning integration. To ensure validity when these expectations are estimated, we present two new theoretical results of independent interest: (i) a stability result for regularized models trained with a null feature, and (ii) a double-robustness property.
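The abstract's FDR claim rests on having one valid p-value per feature; the selection step that follows is standard. As a hedged illustration of that final step only (the per-feature p-values themselves would come from the Semi-knockoff statistic, which is not reproduced here), the sketch below applies the Benjamini-Hochberg step-up procedure; the function name and example p-values are hypothetical.

```python
# Illustrative sketch (not from the paper): once a CIT procedure returns one
# valid p-value per feature, FDR control at level q can be obtained with the
# standard Benjamini-Hochberg step-up procedure.
import numpy as np

def benjamini_hochberg(pvalues, q=0.1):
    """Return the indices of features selected by BH at target FDR level q."""
    p = np.asarray(pvalues)
    m = p.size
    order = np.argsort(p)                      # rank p-values from smallest to largest
    thresholds = q * (np.arange(1, m + 1) / m) # BH threshold for each rank
    below = p[order] <= thresholds
    if not below.any():
        return np.array([], dtype=int)         # nothing passes: select no feature
    k = np.max(np.nonzero(below)[0])           # largest rank meeting its threshold
    return order[: k + 1]                      # reject all hypotheses up to rank k

# Hypothetical p-values that a per-feature CIT (e.g. Semi-knockoffs) might output.
pvals = [0.001, 0.004, 0.2, 0.8, 0.03]
print(benjamini_hochberg(pvals, q=0.1))        # selects features 0, 1 and 4
```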
💡 Research Summary
This paper introduces “Semi‑knockoffs,” a conditional independence testing (CIT) framework that works directly with any pre‑trained machine‑learning model without requiring a train‑test split, while delivering finite‑sample valid p‑values and false discovery rate (FDR) control. Traditional model‑X methods such as the Conditional Randomization Test (CRT) or knockoffs assume full knowledge of the covariate distribution or require the construction of exact knockoff variables, both of which are difficult in high‑dimensional, real‑world settings. Model‑agnostic approaches that avoid these assumptions typically split the data, which reduces statistical power, especially when sample sizes are modest.
Semi‑knockoffs circumvent these limitations by relying only on estimates of two conditional expectations for each feature $j$; the first is the conditional mean $\nu_j = \mathbb{E}[X_j \mid X_{-j}]$ of the tested feature given the remaining covariates.
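The summary is cut off before spelling out how these estimates enter the test, but the general recipe of residual-based conditional independence testing from estimated conditional means can be sketched as follows. This is an assumed illustration in the spirit of the generalized covariance measure, not the authors' Semi-knockoff statistic; the regression models, function name, and synthetic data are all placeholders.

```python
# Illustrative residual-based CIT sketch (GCM-style), assuming only that the two
# conditional means E[X_j | X_-j] and E[Y | X_-j] can be estimated with any
# off-the-shelf regression model. A stand-in for intuition, not the paper's test;
# in particular, reusing the same data for fitting and testing requires the kind
# of stability/double-robustness guarantees the paper establishes.
import numpy as np
from scipy.stats import norm
from sklearn.ensemble import GradientBoostingRegressor

def residual_cit_pvalue(X, y, j):
    """Two-sided p-value for H0: X_j independent of y given the other columns."""
    X_minus_j = np.delete(X, j, axis=1)

    # nu_hat(x_-j) ~ E[X_j | X_-j]: conditional mean of the tested feature.
    nu_hat = GradientBoostingRegressor().fit(X_minus_j, X[:, j]).predict(X_minus_j)
    # mu_hat(x_-j) ~ E[Y | X_-j]: conditional mean of the response.
    mu_hat = GradientBoostingRegressor().fit(X_minus_j, y).predict(X_minus_j)

    # Products of the two residuals are mean-zero under conditional independence.
    r = (X[:, j] - nu_hat) * (y - mu_hat)
    n = len(r)
    t_stat = np.sqrt(n) * r.mean() / r.std(ddof=1)  # approximately N(0, 1) under H0
    return 2 * norm.sf(abs(t_stat))

# Hypothetical usage on synthetic data: feature 0 matters, feature 3 does not.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=500)
print(residual_cit_pvalue(X, y, j=0), residual_cit_pvalue(X, y, j=3))
```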
Comments & Academic Discussion
Loading comments...
Leave a Comment