Beyond Attribution: Unified Concept-Level Explanations
There is an increasing need to integrate model-agnostic explanation techniques with concept-based approaches: the former can explain models across different architectures, while the latter makes explanations more faithful and understandable to end-users. However, existing concept-based model-agnostic explanation methods are limited in scope, mainly focusing on attribution-based explanations while neglecting diverse forms such as sufficient conditions and counterfactuals, which narrows their utility. To bridge this gap, we propose UnCLE, a general framework that elevates existing local model-agnostic techniques to provide concept-based explanations. Our key insight is that existing local model-agnostic methods can be uniformly extended to provide unified concept-based explanations by using large pre-trained models to perform concept-level perturbations. We have instantiated UnCLE to provide concept-based explanations in three forms: attributions, sufficient conditions, and counterfactuals, and applied it to popular text, image, and multimodal models. Our evaluation results demonstrate that UnCLE provides explanations more faithful than state-of-the-art concept-based explanation methods, and provides richer explanation forms that satisfy various user needs.
💡 Research Summary
The paper introduces UnCLE (Unified Concept‑Level Explanations), a general and lightweight framework that lifts existing local model‑agnostic explanation methods to the concept level without altering their core algorithms. The authors observe that popular techniques such as LIME, Anchors, LORE, and Kernel SHAP share a three‑step pipeline: predicate generation, perturbation, and learning. By replacing low‑level feature predicates with high‑level concept predicates extracted by pretrained vision or language models, and by performing perturbations directly on concepts rather than on raw features, UnCLE can generate three types of explanations—attributions, sufficient conditions, and counterfactuals—using the same learning machinery as the original methods.
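The shared predicate-generation / perturbation / learning pipeline the authors identify can be sketched as a minimal interface. This is an illustrative sketch only: the class name, fields, and signatures below are assumptions for exposition, not UnCLE's actual API. The point is that concept-level explanations fall out of swapping in concept-aware `generate_predicates` and `perturb` callables while the `learn` step stays unchanged.

```python
from dataclasses import dataclass
from typing import Callable

import numpy as np


@dataclass
class LocalExplainer:
    """Hypothetical sketch of the three-step pipeline shared by
    LIME, Anchors, LORE, and Kernel SHAP (names are illustrative)."""

    generate_predicates: Callable  # step 1: input -> predicates (features or concepts)
    perturb: Callable              # step 2: (input, predicates) -> (binary masks, perturbed inputs)
    learn: Callable                # step 3: (masks, model outputs) -> explanation

    def explain(self, x, model):
        predicates = self.generate_predicates(x)
        masks, samples = self.perturb(x, predicates)
        outputs = np.array([model(s) for s in samples])
        return self.learn(masks, outputs)
```

Under this framing, UnCLE's contribution is to replace the first two callables: step 1 extracts high-level concept predicates with a pretrained model, and step 2 realizes each binary concept mask as a concrete input via a generative model, leaving step 3 untouched.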
The framework works as follows. First, a concept‑extracting model identifies semantically meaningful concepts (objects, scenes, topics, sentiments, etc.) from the input. Each concept is turned into a binary predicate p_c that indicates its presence. Second, a “concept‑level perturbation model” creates binary vectors over these predicates and maps them back to concrete inputs using large pretrained generative models (e.g., GPT‑4, CLIP). The prompts explicitly enforce the inclusion or exclusion of each concept, yielding realistic perturbed samples. Third, the original learning algorithm (linear regression for LIME/SHAP, KL‑LUCB for Anchors, decision trees for LORE) is applied to the concept‑predicate representations and the model’s outputs, producing concept‑based explanations.
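The attribution instantiation of this pipeline can be illustrated with a LIME-style sketch over concept predicates. Everything below is a self-contained toy: the concept list, the stand-in black-box scoring function, and the exponential proximity kernel are assumptions for illustration; in UnCLE each binary mask would be mapped to a real perturbed input by a generative model before querying the actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

concepts = ["dog", "grass", "ball"]  # concepts a pretrained extractor might return
n_concepts = len(concepts)


def black_box_score(mask):
    """Toy stand-in for the model's score on a concept-perturbed input:
    the prediction here depends mostly on whether 'dog' is present."""
    return 0.8 * mask[0] + 0.15 * mask[1] + 0.05 * mask[2]


# Step 2: sample binary concept vectors (1 = concept kept, 0 = removed).
masks = rng.integers(0, 2, size=(200, n_concepts))
scores = np.array([black_box_score(m) for m in masks])

# Step 3: LIME-style learning -- weighted least squares on the masks,
# weighting samples closer to the original (all-ones) input more heavily.
full = np.ones(n_concepts)
weights = np.exp(-np.sum(masks != full, axis=1) / n_concepts)
X = np.hstack([masks, np.ones((len(masks), 1))])  # add intercept column
W = np.diag(weights)
coef = np.linalg.lstsq(W @ X, W @ scores, rcond=None)[0]

attributions = dict(zip(concepts, coef[:n_concepts]))
```

Because the toy score is exactly linear in the concept mask, the recovered attributions match the generating coefficients; with a real black box, the linear fit is only a local approximation, and Anchors or LORE would consume the same `(masks, scores)` pairs with KL-LUCB or a decision tree instead.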
Empirical evaluation covers text classifiers (BERT, Llama‑2), image classifiers (YOLOv8, ResNet), and a multimodal CLIP‑ViT model. UnCLE‑augmented LIME, Anchors, LORE, and Kernel SHAP achieve an average fidelity improvement of 56.8 % over their vanilla counterparts. Moreover, UnCLE uniquely provides rule‑based sufficient conditions and actionable counterfactuals, which standard feature‑level methods lack. A human study with 30 participants shows that explanations generated by UnCLE reduce task completion time by 34 % and increase accuracy by 22 % in downstream decision‑making tasks, confirming practical utility.
The authors acknowledge limitations: dependence on the quality of concept extraction, increased computational cost due to large generative models, and the need for careful prompt design for complex concept combinations. They suggest future work on domain‑specific concept vocabularies, lightweight perturbation models, and automated selection of the most suitable explanation type.
In summary, UnCLE demonstrates that concept‑level explanations can be obtained by a simple augmentation of existing model‑agnostic methods, delivering richer, more faithful, and user‑friendly insights across text, image, and multimodal domains. This bridges the gap between the flexibility of model‑agnostic techniques and the interpretability of concept‑based explanations, opening new avenues for trustworthy AI deployment.