Efficient Model Editing via Difference Vector-based Anisotropic Scaling
📝 Abstract
Current methods for editing pre-trained models face significant challenges, primarily high computational costs and limited scalability. Task arithmetic has recently emerged as a promising solution: using simple arithmetic operations (addition and negation) on task vectors, which are the differences between fine-tuned and pre-trained model weights, it efficiently modifies model behavior. However, the full potential of task arithmetic remains underexplored, primarily due to limited mechanisms for overcoming optimization stagnation. To address this challenge, we introduce the notion of a difference vector, a generalized form of the task vector derived from the historical movements of the weights during optimization. Using difference vectors as directed perturbations, we propose the Difference Vector-based Anisotropic Scaling Iterative algorithm (DV-BASI) to enable a continuous optimization process for task arithmetic methods without relying on any additional modules or components. Notably, by leveraging the escapability and directional advantages of difference vectors, the multi-task model merged by DV-BASI may even outperform individually fine-tuned models in average performance across tasks. Based on this observation, we extend the application of difference vectors to a feasible fine-tuning method for single-task models. On the practical side, DV-BASI allows expressive search directions with few learnable parameters and forms a scalable framework. We also integrate DV-BASI with task arithmetic methods and advanced optimization techniques to achieve state-of-the-art performance on both supervised and unsupervised evaluation protocols.
📄 Content
Pre-trained models are essential in contemporary machine learning systems due to their efficiency and transferability. Editing models after pre-training is widely recognized as an effective way to enhance model performance on specific downstream tasks (Wortsman et al. 2022; Zhuang et al. 2021; Matena and Raffel 2022), mitigate undesired behaviors (Santurkar et al. 2021; Ribeiro and Lundberg 2022; Murty et al. 2022), align models with human preferences (Askell et al. 2021; Ouyang et al. 2022; Kasirzadeh and Gabriel 2022), or incorporate new information (Cao, Aziz, and Titov 2021; Mitchell et al. 2022a,b). However, traditional editing approaches, which rely on expensive joint fine-tuning across multiple tasks (Vu et al. 2022) and human feedback (Matthews 1975), face limitations in scalability and accessibility. Moreover, optimizing models for downstream tasks often comes at the expense of diminished pre-training performance or zero-shot accuracy (Garipov et al. 2018; Loshchilov and Hutter 2019; Stallkamp et al. 2011a).
Recently, innovative research on task arithmetic has introduced cost-effective and scalable model editing techniques (Ilharco et al. 2023; Yadav et al. 2023; Yang et al. 2024; Ortiz-Jiménez, Favero, and Frossard 2023; Yoshida et al.; Zhang et al. 2024). By leveraging the concept of a task vector, defined as the element-wise difference between the weights of the fine-tuned and pre-trained models, task arithmetic can modify various models through simple arithmetic operations on these vectors (Ilharco et al. 2023). Specifically, negating a task vector can eliminate undesirable behaviors on specific tasks (task negation), while adding task vectors from different tasks can create a multi-task model that performs well on multiple tasks simultaneously (task addition). Recent advances on linearized task vectors deepen the theoretical understanding of task arithmetic by addressing the interference among task vectors. Through techniques based on model linearization via the neural tangent kernel approximation (Ortiz-Jiménez, Favero, and Frossard 2023) and τ-Jacobian product regularization (τJp Reg) (Yoshida et al.) during the model pre-training stage, linearized task vectors can be produced with less weight disentanglement error.
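The core operations above can be sketched in a few lines. This is a minimal illustration (toy weights, not the paper's experiments): a task vector is the element-wise weight difference, and task addition/negation is a signed, scaled sum folded back into the pre-trained weights.

```python
import numpy as np

def task_vector(finetuned, pretrained):
    """Task vector: element-wise difference between fine-tuned and
    pre-trained weights (one array per parameter tensor)."""
    return {k: finetuned[k] - pretrained[k] for k in pretrained}

def apply_task_vectors(pretrained, vectors, coeffs):
    """Task addition: merge task vectors into the pre-trained weights
    with scalar combination coefficients. A negative coefficient
    realizes task negation."""
    return {
        k: w + sum(c * v[k] for c, v in zip(coeffs, vectors))
        for k, w in pretrained.items()
    }

# Toy example with a single two-parameter "layer" (hypothetical values).
pre = {"w": np.array([1.0, 2.0])}
ft_a = {"w": np.array([1.5, 2.5])}  # fine-tuned on task A
ft_b = {"w": np.array([0.5, 2.0])}  # fine-tuned on task B

tv_a = task_vector(ft_a, pre)       # {"w": [0.5, 0.5]}
tv_b = task_vector(ft_b, pre)       # {"w": [-0.5, 0.0]}
merged = apply_task_vectors(pre, [tv_a, tv_b], coeffs=[0.3, 0.3])
```

In practice the same dictionary arithmetic is applied over a full model `state_dict` rather than a single toy tensor.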
Although recent studies have advanced our understanding of task arithmetic, current approaches for designing task vector combination strategies have not yet realized the full potential of task arithmetic. Ideally, a merged multi-task model edited through task arithmetic should achieve performance comparable to that of individually fine-tuned single-task models. However, due to the limited expressive power of combination coefficients learned via coarse grid search (Ilharco et al. 2023; Yadav et al. 2023), this goal remains elusive in practice. Although current finer-grained Parameter-Efficient Fine-Tuning (PEFT) task vector combination methods based on block-wise optimization (Zhang et al. 2024; Yang et al. 2024) address this issue, they are still fundamentally constrained by their single-step nature. Specifically, optimization often stops prematurely when model parameters become trapped in local optima where gradients vanish, impeding further exploration. Therefore, analogous to traditional parameter optimization methods, a multi-step optimization approach that can efficiently escape local optima and continue optimizing toward a better global solution is crucial.
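For concreteness, the "coarse grid search" baseline amounts to sweeping a single scaling coefficient over a fixed grid and keeping the value with the best held-out score. The sketch below is illustrative (the score function and grid values are hypothetical, not from the paper):

```python
import numpy as np

def grid_search_lambda(theta0, merged_tv, val_score, grid=None):
    """Pick the lambda in `grid` maximizing a held-out validation score.
    One scalar coefficient for the whole merged task vector, which is
    exactly why this baseline has limited expressive power."""
    grid = grid if grid is not None else np.arange(0.0, 1.01, 0.1)
    best_lam, best_score = None, -np.inf
    for lam in grid:
        score = val_score(theta0 + lam * merged_tv)
        if score > best_score:
            best_lam, best_score = lam, score
    return best_lam

# Toy validation score that peaks at lambda = 0.3 (hypothetical).
theta0 = np.zeros(2)
tv = np.ones(2)
score = lambda th: -np.sum((th - 0.3) ** 2)
best = grid_search_lambda(theta0, tv, score)
```

Block-wise PEFT methods replace the single scalar with per-layer or per-block coefficients, but a one-shot fit of those coefficients can still stall at a local optimum.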
To address this challenge, we propose a difference vector-based anisotropic scaling iterative algorithm (DV-BASI) to achieve continuous exploration in the parameter space, as illustrated in Figure 1. We extend the concept of the task vector to a more general difference vector, defined as the element-wise difference between the weights of a model in any arbitrary state during training and those of the pre-trained model. Like a task vector, which acts as a knowledge carrier, the difference vector, as the cumulative result of previous optimization, encodes the historical movement of the model weights during training. Through theoretical and empirical analysis, we demonstrate that difference vectors enable continuous model optimization with the following merits: (i) Escapability and Directional Advantage: When model weights are trapped in a local optimum, the updated difference vector at that point acts as a directed perturbation, effectively helping the model weights escape the current critical point and continue searching for a potentially better solution. (ii) Component-Free Continuity: Continuous exploration in the parameter space relies solely on the updates of the difference vector, without depending on additional components such as adapters (Houlsby et al. 2019), prompts (Jia et al. 2022), or LoRA (Hu et al. 2022).
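The iterative idea can be sketched as follows. This is a hedged toy rendition, not the paper's exact algorithm: each round recomputes the difference vector from the current weights, fits per-coordinate (anisotropic) scaling coefficients with a gradient step, and folds the rescaled vector back into the weights, so no adapters or extra modules are introduced.

```python
import numpy as np

def dv_iterative_scaling(theta0, task_vecs, loss_grad, rounds=5, lr=0.1):
    """Toy DV-BASI-style loop: repeatedly rescale the difference vector
    d_t = theta_t - theta_0 with anisotropic coefficients alpha."""
    theta = theta0 + sum(task_vecs)          # initial merged model
    for _ in range(rounds):
        d = theta - theta0                   # difference vector (history)
        # Anisotropic scaling: one coefficient per coordinate (could be
        # per-layer or per-block in practice), refined by one gradient step.
        alpha = np.ones_like(d)
        g = loss_grad(theta0 + alpha * d)    # grad of loss w.r.t. weights
        alpha -= lr * g * d                  # chain rule: dL/dalpha = g * d
        theta = theta0 + alpha * d           # fold scaled vector back in
    return theta

# Toy quadratic loss L(theta) = 0.5 * ||theta - target||^2 (hypothetical).
target = np.array([1.0, -1.0])
grad = lambda th: th - target
theta0 = np.zeros(2)
tvs = [np.array([0.6, 0.0]), np.array([0.0, -0.4])]
theta = dv_iterative_scaling(theta0, tvs, grad)
```

Because only the small coefficient vector `alpha` is learned per round while the direction comes for free from the accumulated difference vector, the search stays parameter-efficient yet can keep moving after any single round stagnates.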
We demonstrate that DV-BASI is a scalable multi-step task arithmetic framework. Adhering to the standard evaluation protocols of task