Efficient Estimation of Kernel Surrogate Models for Task Attribution
Modern AI agents such as large language models are trained on diverse tasks – translation, code generation, mathematical reasoning, and text prediction – simultaneously. A key question is how to quantify the influence of each individual training task on performance on a target task, a problem we refer to as task attribution. The direct approach, leave-one-out retraining, measures the effect of removing each task but is computationally infeasible at scale. An alternative approach that builds surrogate models to predict a target task’s performance for any subset of training tasks has emerged in recent literature. Prior work focuses on linear surrogate models, which capture first-order relationships but miss nonlinear interactions such as synergy, antagonism, or XOR-type effects. In this paper, we first consider a unified task-weighting framework for analyzing task attribution methods, and show a new connection between linear surrogate models and influence functions through a second-order analysis. Then, we introduce kernel surrogate models, which more effectively represent second-order task interactions. To efficiently learn the kernel surrogate, we develop a gradient-based estimation procedure that leverages a first-order approximation of pretrained models; empirically, this yields accurate estimates with less than 2% relative error without repeated retraining. Experiments across multiple domains – including math reasoning in transformers, in-context learning, and multi-objective reinforcement learning – demonstrate the effectiveness of kernel surrogate models. They achieve a 25% higher correlation with the leave-one-out ground truth than linear surrogates and influence-function baselines. When used for downstream task selection, kernel surrogate models yield a 40% improvement in demonstration selection for in-context learning and multi-objective reinforcement learning benchmarks.
💡 Research Summary
The paper tackles the problem of task attribution—quantifying how each individual training task influences the performance of a target task in multi‑task learning settings such as large language models (LLMs) or multitask neural networks. The naïve solution, leave‑one‑out (LOO) retraining, requires a full training run for every task removal and is therefore infeasible for realistic numbers of tasks. Influence‑function methods provide a first‑order analytical approximation but still need repeated Hessian‑vector products, which become prohibitive at the scale of modern models.
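To make the cost of the naïve baseline concrete, here is a minimal sketch of LOO task attribution. The function name `train_and_eval` and its mask-based interface are illustrative assumptions, not the paper's API; the point is that the expensive call runs once per task:

```python
import numpy as np

def leave_one_out_attribution(train_and_eval, num_tasks):
    """Naive LOO task attribution: one full retraining per removed task.

    `train_and_eval(mask)` is a hypothetical user-supplied function that
    trains a model on the task subset given by a boolean mask and returns
    the target-task test loss; each call is a complete training run.
    """
    full_mask = np.ones(num_tasks, dtype=bool)
    base_loss = train_and_eval(full_mask)          # train on all tasks once
    scores = np.empty(num_tasks)
    for k in range(num_tasks):
        mask = full_mask.copy()
        mask[k] = False                            # drop task k
        # Loss increase when task k is removed; positive => task k helped.
        scores[k] = train_and_eval(mask) - base_loss
    return scores
```

With K tasks this makes K + 1 full training runs, which is exactly why surrogate-based approximations are attractive.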
Recent work has introduced surrogate models that learn a mapping from a binary task‑selection vector s ∈ {0,1}^K (indicating which tasks are used) to the test loss F(s) of the model trained on that subset. Most prior surrogates are linear (α + βᵀs) and can only capture additive, first‑order effects. The authors first formalize a unified “task‑weighting” framework and show, via a second‑order Taylor expansion and a delta‑method analysis, that linear surrogates essentially recover the same quantity as influence functions—namely the gradient ∇_s F(s*) evaluated at the uniform weight vector s* =
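The subset-to-loss surrogate idea can be sketched in a few lines: sample random task-selection vectors s, record the resulting losses, and fit the surrogate by least squares. Everything below is an illustrative assumption (the synthetic F, the sample count, and the explicit pairwise-product features used as a minimal stand-in for the kernel surrogate's second-order terms), not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 6  # number of training tasks (toy setting)

# Stand-in for the true subset->loss map F(s): linear terms plus one
# pairwise interaction between tasks 0 and 1, which a linear surrogate
# cannot represent.
beta_true = rng.normal(size=K)
def F(s):
    return 2.0 + beta_true @ s - 1.5 * s[0] * s[1]

# Sample random subsets s in {0,1}^K and record (subset, loss) pairs.
S = rng.integers(0, 2, size=(300, K)).astype(float)
y = np.array([F(s) for s in S])

# Linear surrogate: F(s) ~ alpha + beta^T s, fit by least squares.
X_lin = np.hstack([np.ones((len(S), 1)), S])
coef_lin, *_ = np.linalg.lstsq(X_lin, y, rcond=None)

# Second-order surrogate: add products s_i * s_j as extra features, the
# simplest explicit version of a quadratic-kernel feature map.
pairs = [(i, j) for i in range(K) for j in range(i + 1, K)]
X_quad = np.hstack(
    [X_lin, np.stack([S[:, i] * S[:, j] for i, j in pairs], axis=1)]
)
coef_quad, *_ = np.linalg.lstsq(X_quad, y, rcond=None)

# The quadratic fit recovers F exactly; the linear fit cannot absorb
# the interaction term and leaves a systematic residual.
lin_err = np.max(np.abs(X_lin @ coef_lin - y))
quad_err = np.max(np.abs(X_quad @ coef_quad - y))
```

In this toy example the quadratic surrogate drives the residual to numerical zero while the linear one cannot, mirroring the paper's motivation for moving beyond first-order surrogates.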