Hybrid Attribution Priors for Explainable and Robust Model Training

📝 Original Info

  • Title: Hybrid Attribution Priors for Explainable and Robust Model Training
  • ArXiv ID: 2512.14719
  • Date: 2025-12-09
  • Authors: Zhuoran Zhang, Feng Zhang, Shangyuan Li, Yang Shi, Yuanxing Zhang, Wei Chen, Tengjiao Wang, Kam-Fai Wong

📝 Abstract

Small language models (SLMs) are widely used in tasks that require low latency and lightweight deployment, especially classification. With the growing emphasis on interpretability and robustness, explanation-guided learning offers an effective framework by incorporating attribution-based supervision during training. However, how to derive general and reliable attribution priors remains an open challenge. Upon analyzing representative attribution methods in classification tasks, we find that while these methods reliably highlight class-relevant tokens, they tend to focus on common keywords shared by semantically similar classes. Since these classes are already prone to confusion under standard training, the attributions fail to provide sufficient discriminative cues, limiting their ability to enhance model differentiation. To address this challenge, we introduce Class-Aware Attribution Prior (CAP), a novel attribution prior extraction framework designed to guide language models in capturing fine-grained class distinctions, thus producing more salient and discriminative attribution priors. Building on this, we propose CAP Hybrid, which integrates priors from CAP and existing attribu...
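The abstract describes explanation-guided learning as adding attribution-based supervision to the training objective. As a rough illustration only (the paper's actual CAP/CAP Hybrid formulation is not shown in this excerpt), below is a minimal PyTorch sketch of that general idea: input-gradient attributions are normalized and pulled toward a given prior via an auxiliary loss. All names here (`ToyClassifier`, `explanation_guided_loss`, the uniform placeholder prior, the weight `lam`) are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyClassifier(nn.Module):
    """Mean-pools token embeddings and classifies; stands in for an SLM head."""
    def __init__(self, dim=32, n_classes=4):
        super().__init__()
        self.head = nn.Linear(dim, n_classes)

    def forward(self, emb):                  # emb: (batch, seq, dim)
        return self.head(emb.mean(dim=1))    # (batch, n_classes)

def input_grad_attribution(model, emb, labels):
    """Per-token saliency: L2 norm of d(logit_y)/d(embedding).
    create_graph=True keeps the graph so the prior-matching term
    can itself be backpropagated into the model weights."""
    emb = emb.detach().requires_grad_(True)
    logits = model(emb)
    target = logits.gather(1, labels[:, None]).sum()
    (grads,) = torch.autograd.grad(target, emb, create_graph=True)
    return grads.norm(dim=-1)                # (batch, seq)

def explanation_guided_loss(model, emb, labels, prior, lam=0.1):
    """Cross-entropy plus an attribution-alignment regularizer."""
    ce = F.cross_entropy(model(emb), labels)
    attr = input_grad_attribution(model, emb, labels)
    attr = attr / (attr.sum(-1, keepdim=True) + 1e-8)  # normalize to a distribution
    return ce + lam * F.mse_loss(attr, prior)

# Toy usage: random embeddings, labels, and a uniform placeholder prior.
model = ToyClassifier()
emb = torch.randn(8, 16, 32)
labels = torch.randint(0, 4, (8,))
prior = torch.full((8, 16), 1 / 16)
loss = explanation_guided_loss(model, emb, labels, prior)
loss.backward()
```

In a setup like this, the `create_graph=True` flag is what lets the attribution-matching term update the model weights; CAP's contribution, per the abstract, would be supplying a more discriminative, class-aware `prior` than the uniform placeholder used above.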

📄 Full Content

(Full text omitted due to length. Please see the original site for the complete article.)
