Scalable Multi-Stage Influence Function for Large Language Models via Eigenvalue-Corrected Kronecker-Factored Parameterization

Notice: This research summary and analysis were generated automatically using AI technology. For full accuracy, please refer to the original arXiv source.

Pre-trained large language models (LLMs) are commonly fine-tuned to adapt to downstream tasks. Since the majority of knowledge is acquired during pre-training, attributing the predictions of fine-tuned LLMs to their pre-training data may provide valuable insights. Influence functions have been proposed as a means to explain model predictions based on training data. However, existing approaches fail to compute "multi-stage" influence and lack scalability to billion-scale LLMs. In this paper, we propose the multi-stage influence function to attribute the downstream predictions of fine-tuned LLMs to pre-training data under the full-parameter fine-tuning paradigm. To enhance the efficiency and practicality of our multi-stage influence function, we leverage Eigenvalue-corrected Kronecker-Factored (EK-FAC) parameterization for efficient approximation. Empirical results validate the superior scalability of EK-FAC approximation and the effectiveness of our multi-stage influence function. Additionally, case studies on a real-world LLM, dolly-v2-3b, demonstrate its interpretive power, with exemplars illustrating insights provided by multi-stage influence estimates. Our code is public at https://github.com/colored-dye/multi_stage_influence_function.


💡 Research Summary

The paper tackles the problem of attributing the predictions of fine‑tuned large language models (LLMs) back to the massive pre‑training corpora from which they originally learned. While influence functions (IFs) have been proposed as a principled way to quantify the contribution of individual training examples, existing methods are limited to a single training stage and do not scale to models with billions of parameters. Moreover, they cannot handle cases where the fine‑tuned model’s output space differs from that of the pre‑trained model (e.g., a language model fine‑tuned for binary classification).

To address these gaps, the authors introduce a multi‑stage influence function that operates under the “pre‑train → fine‑tune” paradigm. They formalize fine‑tuning as an optimization problem that includes a proximity term penalizing deviation from the pre‑trained parameters. Under the assumption that fine‑tuned parameters stay close to the pre‑trained ones, they derive a closed‑form expression (Equation 12) for the influence of a pre‑training example z on a downstream test instance xₜ. The expression consists of two pre‑conditioned gradient terms: one involving the inverse of the pre‑training generalized Gauss‑Newton (GGN) matrix applied to the pre‑training loss gradient, and another involving the inverse of the fine‑tuning GGN applied to the test‑time measurement gradient.
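Schematically, the two ingredients described above can be written as follows. This is a hedged reconstruction from the prose, not a verbatim copy of the paper's Equation 12; the symbols (λ for the proximity penalty, G for the GGN matrices, m for the test-time measurement) are notational assumptions.

```latex
% Fine-tuning as optimization with a proximity term (schematic):
\hat{\theta}_{\mathrm{ft}}
  = \arg\min_{\theta}\;
    \mathcal{L}_{\mathrm{ft}}(\theta)
    + \frac{\lambda}{2}\,\bigl\lVert \theta - \hat{\theta}_{\mathrm{pt}} \bigr\rVert^{2}

% Multi-stage influence of pre-training example z on test instance x_t,
% as an inner product of two pre-conditioned gradients (schematic):
\mathcal{I}(z, x_t) \;\propto\;
  \Bigl( G_{\mathrm{ft}}^{-1}\,
         \nabla_{\theta}\, m\bigl(x_t, \hat{\theta}_{\mathrm{ft}}\bigr) \Bigr)^{\!\top}
  \Bigl( G_{\mathrm{pt}}^{-1}\,
         \nabla_{\theta}\, \mathcal{L}\bigl(z, \hat{\theta}_{\mathrm{pt}}\bigr) \Bigr)
```

Here G_pt and G_ft denote the generalized Gauss-Newton matrices of the pre-training and fine-tuning objectives, matching the two pre-conditioned gradient terms described in the text.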

Computing exact inverses of the GGN (or Hessian) is infeasible for LLMs because it would require O(p³) time and O(N p²) memory, where p is the number of parameters and N the number of training examples. The paper therefore adopts Eigenvalue‑Corrected Kronecker‑Factored Approximation (EK‑FAC), an extension of the Kronecker‑Factored Approximation (K‑FAC) that corrects eigenvalue‑wise variance and yields a more accurate block‑diagonal approximation of the GGN. EK‑FAC is applied separately to the MLP and multi‑head attention (MHA) blocks of the transformer, while embedding and unembedding layers are omitted due to dimensional mismatches. The resulting EK‑FAC factors are pre‑computed and stored, enabling fast retrieval during influence calculations.
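To make the EK-FAC step concrete, here is a minimal NumPy sketch for a single linear layer. All shapes, variable names, and the damping value are illustrative assumptions, not taken from the paper's code: the layer's GGN block is approximated as a Kronecker product of an activation covariance A and an output-gradient covariance S, the Kronecker eigenvalues are then refitted from per-example projections (the "eigenvalue correction"), and the stored factors give a cheap inverse-GGN-vector product.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_in, d_out = 512, 8, 6          # illustrative sizes

# Per-example layer inputs and pre-activation gradients (stand-ins for real data)
acts = rng.normal(size=(n, d_in))
grads = rng.normal(size=(n, d_out))

# K-FAC factors: layer GGN block approximated as S ⊗ A
A = acts.T @ acts / n               # activation covariance, (d_in, d_in)
S = grads.T @ grads / n             # output-gradient covariance, (d_out, d_out)

# Eigendecompose both (symmetric) factors
lam_A, Q_A = np.linalg.eigh(A)
lam_S, Q_S = np.linalg.eigh(S)

# EK-FAC correction: instead of the Kronecker eigenvalues lam_S[i] * lam_A[j],
# refit each eigen-direction's variance from the projected per-example data
P_a = acts @ Q_A                    # (n, d_in)
P_g = grads @ Q_S                   # (n, d_out)
Lam = (P_g ** 2).T @ (P_a ** 2) / n # corrected eigenvalues, (d_out, d_in)

# Damped inverse-GGN-vector product for a gradient V of this layer's weights:
# rotate into the eigenbasis, scale, rotate back. O(d^3) per layer, not O(p^3).
damping = 1e-2
V = rng.normal(size=(d_out, d_in))
V_rot = Q_S.T @ V @ Q_A
iGv = Q_S @ (V_rot / (Lam + damping)) @ Q_A.T
```

The pre-computable pieces (Q_A, Q_S, Lam) are exactly what the paper describes storing per MLP/MHA block, so each influence query only pays for the two rotations and an elementwise division.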

A second scalability bottleneck is the need to evaluate influence for potentially billions of pre‑training examples. The authors propose a semantic similarity‑based candidate selection step: given a query, they first filter the pre‑training corpus using embedding similarity (e.g., cosine similarity of sentence embeddings) to obtain a manageable subset (a few thousand examples). Influence is then computed only for this subset, dramatically reducing runtime while preserving the most relevant contributors.
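The filtering step above amounts to a standard top-k cosine-similarity retrieval. A minimal sketch, assuming pre-computed sentence embeddings (the function name and shapes here are illustrative, not from the paper's code):

```python
import numpy as np

def top_k_candidates(query_emb, corpus_embs, k):
    """Return indices of the k corpus embeddings most cosine-similar to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    C = corpus_embs / np.linalg.norm(corpus_embs, axis=1, keepdims=True)
    sims = C @ q                        # cosine similarity per corpus document
    return np.argsort(-sims)[:k], sims  # descending order

# Toy demo: 10,000 random 64-d "document embeddings"
rng = np.random.default_rng(1)
corpus = rng.normal(size=(10_000, 64))
query = corpus[42] + 0.01 * rng.normal(size=64)  # near-duplicate of document 42

idx, sims = top_k_candidates(query, corpus, k=100)
```

Influence scores would then be computed only for `corpus[idx]`, shrinking the candidate set from the full corpus to a few thousand examples before the expensive gradient computations.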

Empirical evaluation proceeds along three dimensions. First, the authors benchmark EK‑FAC against iterative methods such as LiSSA and Conjugate Gradient on synthetic and real LLM tasks, showing that EK‑FAC converges an order of magnitude faster and yields higher correlation with ground‑truth influence (computed via costly leave‑one‑out retraining on small models). Second, they compare the proposed multi‑stage IF against the single‑stage IF of Grosse et al. (2023) on a custom fact‑tracing benchmark. The multi‑stage version achieves substantially higher precision‑recall in identifying the true pre‑training sources of downstream predictions. Third, a case study on the publicly available instruction‑tuned model dolly‑v2‑3b (3B parameters) demonstrates practical interpretability: for a set of user queries, the method surfaces specific pre‑training documents that most heavily influenced the model’s generated answers, providing concrete evidence of data provenance. An additional analysis reveals that MLP parameters contribute a larger share of the total influence than MHA parameters, suggesting a practical trade‑off where only MLP blocks need to be approximated for very large models.

In summary, the paper delivers a scalable framework for multi‑stage influence estimation in modern transformer‑based LLMs. By integrating EK‑FAC for efficient inverse‑GGN computation and a similarity‑driven candidate filtering step, the authors make it feasible to trace predictions back to billions of pre‑training tokens. This capability opens avenues for improving model transparency, auditing training data for bias or toxicity, and informing data‑centric model development practices.

