Transferring Visual Explainability of Self-Explaining Models to Prediction-Only Models without Additional Training

Notice: This research summary and analysis were automatically generated using AI technology. For complete accuracy, please refer to the original arXiv source.

In image classification scenarios where both prediction and explanation must be produced efficiently, self-explaining models that perform both tasks in a single inference are effective. However, for users who already have prediction-only models, training a new self-explaining model from scratch imposes significant costs in both labeling and computation. This study proposes a method, based on a task-arithmetic framework, that transfers the visual explanation capability of self-explaining models learned in a source domain to prediction-only models in a target domain. The self-explaining model is an architecture that extends Vision Transformer-based prediction-only models, enabling the proposed method to endow many trained prediction-only models with explanation capability without additional training. Experiments on various image classification datasets demonstrate that, except for transfers between weakly related domains, visual explanation capability transfers successfully from source to target domains, and explanation quality in the target domain improves without substantially sacrificing classification accuracy.


💡 Research Summary

The paper addresses a practical problem: many organizations already possess high‑performing image classifiers that output predictions only, yet they would like to obtain visual explanations (e.g., heatmaps) without incurring the heavy computational cost of post‑hoc methods such as SHAP or Grad‑CAM. Training a dedicated self‑explaining model from scratch would require additional labeling of explanations and extensive fine‑tuning, which is often prohibitive.

To solve this, the authors propose a method that transfers the visual explainability learned by a self‑explaining model in a source domain to a prediction‑only model in a target domain, using the task‑arithmetic framework. The key steps are:

  1. Self‑explaining model design – They extend a Vision‑Transformer‑based vision‑language model (e.g., CLIP) by adding a domain‑specific linear head whose weights are fixed text embeddings of the class names. The backbone (θ) is the only trainable component. The model simultaneously produces class logits (via the text‑embedding head) and a visual explanation in a single forward pass.
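The transfer step above can be sketched as plain task arithmetic on the backbone parameters: subtract the source-domain prediction-only weights from the source-domain self-explaining weights to obtain an "explainability" task vector, then add it (scaled by a coefficient) to the target-domain backbone. The function and argument names below are hypothetical, and treating each checkpoint as a flat parameter dictionary is an assumption for illustration, not the paper's exact implementation.

```python
import torch

def transfer_explainability(theta_src_pred, theta_src_expl, theta_tgt_pred, alpha=1.0):
    """Sketch of task-arithmetic transfer of explanation capability.

    theta_src_pred: source-domain prediction-only backbone (dict of tensors)
    theta_src_expl: source-domain self-explaining backbone (same keys)
    theta_tgt_pred: target-domain prediction-only backbone (same keys)
    alpha: scaling coefficient for the task vector (hypothetical name)
    """
    return {
        name: theta_tgt_pred[name]
        # explainability task vector = self-explaining minus prediction-only
        + alpha * (theta_src_expl[name] - theta_src_pred[name])
        for name in theta_tgt_pred
    }
```

Because the edit is purely parameter-wise addition, it requires no gradient computation or training data in the target domain, which is what makes the transfer "training-free".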
