Interpretable Generalized Additive Models for Datasets with Missing Values

Notice: This research summary and analysis were automatically generated using AI. For authoritative details, please refer to the original arXiv paper.

Many important datasets contain samples that are missing one or more feature values. Maintaining the interpretability of machine learning models in the presence of such missing data is challenging. Singly or multiply imputing missing values complicates the model's mapping from features to labels. On the other hand, reasoning on indicator variables that represent missingness introduces a potentially large number of additional terms, sacrificing sparsity. We solve these problems with M-GAM, a sparse generalized additive modeling approach that incorporates missingness indicators and their interaction terms while maintaining sparsity through ℓ₀ regularization. We show that M-GAM provides similar or superior accuracy to prior methods while significantly improving sparsity relative to either imputation or naive inclusion of indicator variables.


💡 Research Summary

The paper introduces M‑GAM, a sparse generalized additive model designed to handle missing values without sacrificing interpretability. Traditional approaches either impute missing entries—thereby obscuring the original feature semantics and often creating multivariate interactions—or add missingness indicator variables, which can explode the feature space and reduce sparsity. M‑GAM resolves this tension by augmenting each feature's univariate shape function with two types of terms: (1) a missingness indicator term that adds a constant offset when the feature is missing, and (2) a "missingness adjustment" interaction term that modifies the shape function applied to one observed feature whenever a different feature is missing. This preserves the core GAM property—predictions as a sum of univariate functions—while allowing the model to adapt its behavior based on missingness patterns.
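The prediction structure described above can be sketched in a few lines of code. This is a simplified illustration, not the paper's implementation: the function names, the dictionary-of-adjustments representation, and the use of `np.nan` to encode missingness are all assumptions made for clarity.

```python
import numpy as np

def m_gam_predict(x, shapes, indicator_coefs, interaction_coefs, intercept=0.0):
    """Illustrative M-GAM score: a sum of univariate terms.

    x: 1-D array of feature values, with np.nan marking missing entries.
    shapes: list of per-feature shape functions f_j applied to observed values.
    indicator_coefs: per-feature constant offsets used when feature j is missing.
    interaction_coefs: dict mapping (j, k) -> function g_jk; when feature k is
        missing, g_jk adjusts the contribution of the observed feature j.
    """
    score = intercept
    for j, f in enumerate(shapes):
        if np.isnan(x[j]):
            score += indicator_coefs[j]          # missingness indicator offset
        else:
            score += f(x[j])                     # ordinary GAM shape term
    for (j, k), g in interaction_coefs.items():  # missingness adjustments
        if np.isnan(x[k]) and not np.isnan(x[j]):
            score += g(x[j])
    return score
```

Note that every term remains a function of at most one observed feature value, so the additive, plot-each-term-separately interpretability of a GAM is preserved even when the model reacts to missingness.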
To keep the model parsimonious, the authors employ ℓ₀ regularization, directly penalizing the number of non‑zero indicator and interaction coefficients. An efficient coordinate‑descent algorithm with approximate ℓ₀ handling is used for training. Theoretical contributions include: Proposition 3.1, which shows that even with a perfect imputer, using missingness as a predictive feature can outperform an impute‑then‑predict Bayes‑optimal classifier when missingness is informative; Corollary 3.2, which demonstrates that perfect imputation can prevent any Bayes‑optimal model from being realized, highlighting a fundamental limitation of impute‑then‑predict pipelines; and Theorem 3.4, proving that for any affine imputation combined with a GAM predictor, there exists an M‑GAM that recovers the same expected classification score, establishing that M‑GAM subsumes existing imputation‑based approaches.
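Putting the additive structure and the sparsity penalty together, the training problem can be written schematically as follows. The notation here is illustrative rather than the paper's exact formulation: $m_{ij}$ is an indicator that feature $j$ is missing in sample $i$, $\beta_j$ are indicator offsets, $g_{jk}$ are missingness-adjustment terms, and $\mathcal{L}$ is the classification loss.

```latex
\min_{\beta,\, \{f_j\},\, \{g_{jk}\}} \;
\sum_{i=1}^{n} \mathcal{L}\!\Big(
  y_i,\;
  \beta_0
  + \sum_{j:\, m_{ij}=0} f_j(x_{ij})
  + \sum_{j:\, m_{ij}=1} \beta_j
  + \sum_{j \neq k} m_{ik}\,(1 - m_{ij})\, g_{jk}(x_{ij})
\Big)
\;+\; \lambda \,\big\| (\beta, g) \big\|_0
```

The ℓ₀ penalty counts, rather than shrinks, the non-zero indicator and adjustment coefficients, which is why the coordinate-descent solver needs approximate handling of the resulting combinatorial objective.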
Empirically, the authors evaluate M‑GAM on several benchmark datasets where missing‑at‑random (MAR) patterns are synthetically introduced, as well as on real‑world datasets with naturally occurring missingness (e.g., medical records). Baselines include single‑imputation GAM, multiple‑imputation (MICE) followed by XGBoost, a linear model with only missingness indicators, and XGBoost variants that allow splits on missingness. Across experiments, M‑GAM matches or exceeds predictive accuracy while achieving dramatically higher sparsity—often using only 30 % of the parameters of the baselines—and reducing training time by a factor of two to three. The ℓ₀ regularizer automatically selects the most informative missingness indicators and interactions, avoiding over‑parameterization even when the theoretical number of possible terms is O(d²).
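The synthetic MAR setup mentioned above can be mimicked with a small helper that deletes values in one column with a probability driven by a different, fully observed column. This mirrors the experimental design only schematically; the logistic form, rate calibration, and column arguments are assumptions, not the paper's exact protocol.

```python
import numpy as np

def introduce_mar(X, target_col, driver_col, rate=0.3, rng=None):
    """Delete entries of X[:, target_col] with probability depending only on
    the observed X[:, driver_col] -- a missing-at-random (MAR) mechanism.

    rate sets the approximate overall fraction of deleted entries.
    Returns a float copy of X with np.nan marking the removed values.
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float).copy()
    driver = X[:, driver_col]
    # Logistic missingness probability driven by the standardized driver feature.
    z = (driver - driver.mean()) / (driver.std() + 1e-12)
    p = 1.0 / (1.0 + np.exp(-z))
    p *= rate / p.mean()                     # calibrate the average missing rate
    mask = rng.random(len(X)) < np.clip(p, 0.0, 1.0)
    X[mask, target_col] = np.nan
    return X
```

Because the deletion probability depends only on an observed feature, the mechanism is MAR by construction, which is exactly the regime where missingness indicators can carry predictive signal that imputation discards.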
In conclusion, M‑GAM offers a principled, interpretable alternative for predictive modeling with incomplete data. By treating missingness as a first‑class feature and controlling model complexity through ℓ₀ regularization, it delivers the transparency required in high‑stakes domains such as healthcare and criminal justice without compromising performance. The work bridges the gap between purely imputation‑based pipelines and naïve indicator‑only models, providing a scalable framework that can be readily adopted in practice.

