GRASP: group-Shapley feature selection for patients
Feature selection remains a major challenge in medical prediction, where existing approaches such as LASSO often lack robustness and interpretability. We introduce GRASP, a novel framework that couples Shapley-value-driven attribution with group $L_{21}$ regularization to extract compact and non-redundant feature sets. GRASP first distills group-level importance scores from a pretrained tree model via SHAP, then enforces structured sparsity through group $L_{21}$-regularized logistic regression, yielding stable and interpretable selections. Extensive comparisons with LASSO, SHAP, and deep-learning-based methods show that GRASP consistently delivers comparable or superior predictive accuracy while identifying fewer, less redundant, and more stable features.
💡 Research Summary
The paper introduces GRASP (Group‑Shapley Feature Selection for Patients), a novel feature‑selection framework that integrates Shapley‑value attribution with structured sparsity via a group L₂₁ regularizer. The authors first train an XGBoost classifier on the training fold of a dataset and compute SHAP values on a held‑out validation fold. For each feature j, the mean SHAP value ϕ_j across validation samples is calculated. Features are pre‑grouped (e.g., clinical categories, laboratory panels) into disjoint sets G = {g₁,…,g_G}. The importance s_g of group g is defined as the average of its member features’ mean SHAP values ϕ_j. To translate importance into a penalty weight, the authors apply an exponential transformation, ω̃_g = exp(−s_g/τ₀) + ε, and then normalize so that Σ_g ω_g = 1. Consequently, groups with larger SHAP importance receive smaller ω_g, so they are penalized less by the regularization term.
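The weighting step described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `group_penalty_weights`, the default values of `tau0` and `eps`, and the tiny hand-written SHAP matrix are all hypothetical. In practice the matrix would come from a SHAP explainer applied to the validation fold. The sketch follows the summary literally and averages signed SHAP values; whether the paper uses signed or absolute values is not stated here.

```python
import numpy as np


def group_penalty_weights(shap_values, groups, tau0=1.0, eps=1e-8):
    """Turn per-sample SHAP values into normalized group penalty weights.

    shap_values : array of shape (n_samples, n_features)
    groups      : list of lists of feature indices (disjoint groups)
    tau0, eps   : temperature and small offset (hypothetical defaults)
    """
    shap_values = np.asarray(shap_values, dtype=float)
    # phi_j: mean SHAP value of feature j over the validation samples
    phi = shap_values.mean(axis=0)
    # s_g: average of phi_j over the features belonging to group g
    s = np.array([phi[idx].mean() for idx in groups])
    # Unnormalized weight: exp(-s_g / tau0) + eps
    w_tilde = np.exp(-s / tau0) + eps
    # Normalize so that the weights sum to 1
    return s, w_tilde / w_tilde.sum()


# Hypothetical SHAP matrix: 2 validation samples, 5 features.
shap_matrix = np.array([
    [0.4, 0.2, 0.0, 0.1, 0.05],
    [0.6, 0.4, 0.0, 0.1, 0.15],
])
# Hypothetical grouping: three disjoint groups of feature indices.
feature_groups = [[0, 1], [2], [3, 4]]

s, weights = group_penalty_weights(shap_matrix, feature_groups)
for g, (s_g, w_g) in enumerate(zip(s, weights)):
    print(f"group {g}: importance={s_g:.3f}, penalty weight={w_g:.3f}")
```

In this toy input the first group has the largest importance and therefore ends up with the smallest penalty weight, which is the behavior the summary describes. A smaller `tau0` would sharpen the contrast between groups, while a larger one would push the weights toward uniform.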
The loss function combines the binary cross‑entropy of logistic regression, L(β) = −(1/n) Σ_i