AIR: Post-training Data Selection for Reasoning via Attention Head Influence

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

LLMs achieve remarkable multi-step reasoning capabilities, yet effectively transferring these skills via post-training distillation remains challenging. Existing data selection methods, ranging from manual curation to heuristics based on length, entropy, or overall loss, fail to capture the causal importance of individual reasoning steps, limiting distillation efficiency. To address this, we propose Attention Influence for Reasoning (AIR), a principled, unsupervised and training-free framework that leverages mechanistic insights of the retrieval head to select high-value post-training data. AIR first identifies reasoning-critical attention heads of an off-the-shelf model, then constructs a weakened reference model with disabled head influence, and finally quantifies the resulting loss divergence as the Attention Influence Score. This score enables fine-grained assessment at both the step and sample levels, supporting step-level weighted fine-tuning and global sample selection. Experiments across multiple reasoning benchmarks show that AIR consistently improves reasoning accuracy, surpassing heuristic baselines and effectively isolating the most critical steps and samples. Our work establishes a mechanism-driven, data-efficient approach for reasoning distillation in LLMs.


💡 Research Summary

The paper introduces Attention Influence for Reasoning (AIR), a novel, unsupervised, training‑free framework for selecting post‑training data that best transfers multi‑step reasoning abilities from a large language model (LLM) to a smaller student model. The authors argue that existing data‑selection strategies—manual curation, length‑based heuristics, entropy, or overall loss—are coarse proxies that fail to capture the causal importance of individual reasoning steps. To overcome this, AIR leverages recent mechanistic interpretability findings that identify a subset of attention heads, termed “retrieval heads,” which are responsible for token‑level copying (the “copy‑paste” operation) that underlies factual retrieval and chain‑of‑thought (CoT) reasoning.

The method proceeds in three stages:

  1. Retrieval‑Head Identification – For each attention head h, the authors compute a Retrieval Score (R_h) based on two conditions: (C1) the generated token appears in the source context, and (C2) the head assigns its maximal attention weight to the position of that token in the context. The proportion of such successful copy events among all generated tokens defines (R_h). Heads in the top δ fraction (e.g., 5%) are deemed "reasoning‑critical retrieval heads."

  2. Construction of a Weakened Reference Model – The identified heads are masked in a copy of the base model, producing a reference model (\theta_{ref}). Masking is implemented by forcing the attention distribution of each disabled head to a uniform vector (1/L), where L is the sequence length, thereby nullifying its specialized retrieval capability while leaving all other parameters unchanged.

  3. Attention Influence Scoring – The core metric is the loss divergence between the base model (\theta_{base}) and the weakened reference model (\theta_{ref}) for each token (x_t):

\[ \mathrm{AIS}(x_t) = \mathcal{L}(x_t \mid \theta_{ref}) - \mathcal{L}(x_t \mid \theta_{base}) \]

A large positive score indicates that predicting the token depends heavily on the masked retrieval heads. Per the abstract, these token scores are then aggregated to the step and sample levels, supporting step‑level weighted fine‑tuning and global sample selection.
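The retrieval‑score computation in stage 1 can be sketched roughly as follows; the array shapes, the toy inputs, and the helper names (`retrieval_score`, `top_retrieval_heads`) are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def retrieval_score(attn, context_ids, generated_ids):
    """Fraction of generated tokens this head 'copies' from the context.

    attn          -- (num_generated, context_len) attention weights of one
                     head, one row per generated token (assumed shape).
    context_ids   -- token ids of the source context.
    generated_ids -- token ids produced by the model.

    A copy event requires (C1) the generated token appears in the context
    and (C2) the head's maximal attention sits on that token's position.
    """
    hits = 0
    for row, tok in zip(attn, generated_ids):
        pos = int(np.argmax(row))  # position receiving maximal attention
        if tok in context_ids and context_ids[pos] == tok:
            hits += 1
    return hits / max(len(generated_ids), 1)

def top_retrieval_heads(scores, delta=0.05):
    """Keep heads whose score lies in the top-delta fraction (e.g. 5%)."""
    thresh = np.quantile(list(scores.values()), 1.0 - delta)
    return sorted(h for h, s in scores.items() if s >= thresh)
```

In a real model the attention maps would come from a forward pass over the distillation corpus, one score per (layer, head) pair.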

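Stage 2's head masking can be sketched on explicit post‑softmax attention arrays; operating on raw probability tensors rather than hooking into a real transformer is a simplifying assumption of this sketch:

```python
import numpy as np

def mask_heads_uniform(attn_probs, heads_to_mask):
    """Disable retrieval heads by flattening their attention to uniform.

    attn_probs    -- (num_heads, L, L) post-softmax attention weights.
    heads_to_mask -- indices of reasoning-critical heads to disable.

    Each masked head's distribution is replaced by the uniform vector 1/L,
    nullifying its copy behavior while leaving other heads untouched.
    """
    out = attn_probs.copy()
    L = attn_probs.shape[-1]
    for h in heads_to_mask:
        out[h] = np.full((L, L), 1.0 / L)
    return out
```

Because only attention distributions are overwritten, all model parameters stay identical to the base model, as the summary describes.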

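The scoring in stage 3 can be sketched directly from per‑token losses of the two models; the step‑span representation and mean aggregation below are assumptions for illustration:

```python
import numpy as np

def attention_influence(loss_ref, loss_base):
    """Per-token Attention Influence Score: reference-model loss minus
    base-model loss. Large positive values mark tokens whose prediction
    depends on the masked retrieval heads."""
    return np.asarray(loss_ref) - np.asarray(loss_base)

def step_scores(token_scores, step_spans):
    """Aggregate token scores into step-level scores by averaging over
    each (start, end) token span of a reasoning step."""
    return [float(np.mean(token_scores[s:e])) for s, e in step_spans]
```

Step‑level scores would then weight the fine‑tuning loss, and a sample‑level aggregate (e.g., over all steps) would drive global sample selection.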