Reading time: 16 minutes

๐Ÿ“ Original Info

  • Title:
  • ArXiv ID: 2512.17771
  • Date:
  • Authors: Unknown

๐Ÿ“ Abstract

While the enormous parameter scale endows Large Models (LMs) with unparalleled performance, it also limits their adaptability across specific tasks. Parameter-Efficient Fine-Tuning (PEFT) has emerged as a critical approach for effectively adapting LMs to a diverse range of downstream tasks. However, existing PEFT methods face two primary challenges: (1) High resource cost. Although PEFT methods significantly reduce resource demands compared to full fine-tuning, they still require substantial time and memory, making them impractical in resource-constrained environments. (2) Parameter dependency. PEFT methods heavily rely on updating a subset of parameters associated with LMs to incorporate task-specific knowledge. Yet, due to increasing competition in the LM landscape, many companies have adopted closed-source policies for their leading models, offering access only via Application Programming Interfaces (APIs). Cloud-based fine-tuning, meanwhile, is often cost-prohibitive and difficult to sustain, as the fine-tuning process of LMs is extremely slow. Even though small models perform far worse than LMs in general, they can achieve superior results on particular distributions while requiring only minimal resources. Motivated by this insight, we propose Easy Adaptation (EA), which designs Specific Small Models (SSMs) to complement the underfitted data distributions of LMs. Extensive experiments show that EA matches the performance of PEFT on diverse tasks without accessing LM parameters, and requires only minimal resources.

📄 Full Content

In recent years, Large Models (LMs) have achieved remarkable success [Zheng et al., 2025]. However, as their parameter scale continues to grow, the computational resources required for fine-tuning have become increasingly burdensome. For instance, fine-tuning LLaMA-65B [Touvron et al., 2023] requires ten GPUs with 80 GB of memory each, making the process prohibitively costly for most individuals and thereby motivating a growing reliance on cloud-based LM services [Borzunov et al., 2023; Chen et al., 2025b].

To reduce the computational cost of fine-tuning and enable LMs to adapt to various specific tasks, Parameter-Efficient Fine-Tuning (PEFT) methods have been proposed, which train only a small subset of parameters while keeping the majority of parameters unchanged [Han et al., 2024; Lei et al., 2023]. For example, the most popular PEFT method, Low-Rank Adaptation (LoRA) [Hu et al., 2022], reduces the hardware requirements of fine-tuning GPT-3 by a factor of three compared to full fine-tuning.

Although PEFT methods have demonstrated impressive performance, it remains challenging to adapt LMs to specific tasks, for two main reasons: (1) High resource cost. As shown in Figure 1, when fine-tuning with LoRA, the memory and time costs for fine-tuning an LM remain significantly higher than those for training a small model. For instance, fine-tuning Qwen2.5-14B with LoRA requires about 40 GB of memory, and the training time under the same setting is approximately nine times longer than that of a small model. High resource costs hinder the effectiveness of PEFT in resource-constrained environments, as the vast majority of individual devices still remain at the same level as before the era of LMs. (2) Parameter dependency. As competition among LMs intensifies, many companies have chosen to close-source their leading LMs, providing only limited access via APIs to safeguard proprietary technologies. Fine-tuning these closed-source LMs in the cloud has become prohibitively expensive. For instance, training the o4-mini model via APIs incurs a cost of $100 per hour (https://openai.com/api/pricing/). Given that fine-tuning LMs is often a time-consuming process, this can result in significant expenses. Furthermore, existing PEFT methods heavily depend on parameter updates, rendering them unsuitable for fine-tuning closed-source models.

Prior studies have demonstrated that models with larger parameter scales can accommodate a broader range of data distributions [Kaplan et al., 2020;Bahri et al., 2024]. Consequently, LMs with extensive prior knowledge generally outperform Specific Small Models (SSMs) that are trained on specific datasets. Although SSMs lag behind LMs in terms of overall performance, existing works indicate that SSMs can achieve comparable or even superior results to LMs on particular data distributions [Chen et al., 2024;Chen et al., 2025a]. Therefore, we propose to complement the underfitted distributions of LMs by training SSMs, thereby mitigating the aforementioned limitations of PEFT methods.

As illustrated in Figure 2 (a), we propose Easy Adaptation (EA) based on the collaboration between cloud-based LMs and tailored SSMs, enabling the vast majority of individuals to efficiently inject task-specific knowledge into LMs even in resource-constrained environments. Specifically, when adapting an LM to a specific task, EA first trains SSMs to provide coarse-grained compensation for the underfitted data distribution of the LM and employs a Router to match each input to the most appropriate model. Subsequently, EA selects underfitted training data with the existing SSMs and the LM, thereby targeting and compensating for the current capability deficiencies to further enhance overall performance. Extensive experiments on various tasks demonstrate that EA can achieve performance comparable to or surpassing PEFT methods like LoRA, while significantly reducing training cost. In particular, in image classification tasks, EA enhances the performance of LLaVA-V1.6-7B by 2.47%, surpassing LoRA by 1.07%, while requiring only 4.01% and 4.35% of LoRA's time and memory costs, respectively.

The main contributions of this paper can be summarized as follows:

• We discuss the issues of high resource cost and parameter dependency inherent in PEFT methods.

Existing PEFT methods can be broadly categorized into three types: additive fine-tuning, selective fine-tuning, and reparameterized fine-tuning [Han et al., 2024]. Additive fine-tuning incorporates additional trainable components, such as adapters or prefix tokens, into the LM to capture task-specific knowledge without modifying the base model [Pfeiffer et al., 2020]. Selective fine-tuning focuses on fine-tuning a small subset of the LM's parameters, such as specific layers or parameter blocks, to reduce computational cost while preserving adaptability [Houlsby et al., 2019; Sung et al., 2021]. Reparameterized fine-tuning modifies the parameterization of the model, for example by using low-rank decomposition (e.g., LoRA), to enable efficient learning with fewer trainable parameters [Dettmers et al., 2023].

The proposed Easy Adaptation (EA) is a general framework that trains Specific Small Models (SSMs) on different tasks to complement the underfitted distributions of a Large Model (LM), enabling the LM to adapt to specific tasks. In this section, we will further elaborate on the implementation process of the proposed EA.

An LM, denoted as $\mathcal{M}_L$, with its vast parameter scale, is capable of fitting a wide range of data distributions $\mathcal{D}_L$. In contrast, an SSM, denoted as $\mathcal{M}_S$, due to its more limited parameter scale, can effectively fit only a narrower set of data distributions $\mathcal{D}_S$. Generally, the distribution $\mathcal{D}_L$ exhibits a high degree of overlap with the data distributions of most tasks, which explains why LMs perform well in zero-shot and few-shot settings. Building on this, fine-tuning is employed to incorporate task-specific knowledge into the LM, compensating for distributional gaps and thereby enhancing performance on specific tasks. When training an SSM for a specific task, the fitted distribution $\mathcal{D}_S$, although significantly narrower than $\mathcal{D}_L$, typically includes content absent from $\mathcal{D}_L$ but essential for the task. The formal expression of the above process is:

$$\mathrm{Scale}(\mathcal{D}_S) \ll \mathrm{Scale}(\mathcal{D}_L), \qquad \mathcal{D}_S \setminus \mathcal{D}_L \neq \emptyset, \tag{1}$$

where the function $\mathrm{Scale}(\cdot)$ measures the scale of a distribution.

For an input $x$ of a specific task, when $x \in \mathcal{D}_L$ and $x \notin \mathcal{D}_S$, the input is not suitable for processing by the SSM, and the LM has a higher probability of producing the correct result for $x$:

$$P(\mathcal{M}_L(x) = y) > P(\mathcal{M}_S(x) = y), \tag{2}$$

where $y$ is the ground truth. When $x \in \mathcal{D}_S$ and $x \notin \mathcal{D}_L$, the input $x$ originates from the distribution of a specific task learned by the SSM, which remains unknown to the LM. Thus:

$$P(\mathcal{M}_S(x) = y) > P(\mathcal{M}_L(x) = y). \tag{3}$$

In the remaining cases, there is no significant difference between processing the input $x$ with the SSM or the LM.

In order to decouple from the LM's parameters and enable adaptation to specific tasks, we introduce the Specific Layer within the EA framework. As illustrated in Figure 2 (a), the Specific Layer consists of multiple SSMs trained on specific task datasets. Due to differences in model architecture and other factors, each SSM may fit a distinct distribution $\mathcal{D}_S$. By selecting the appropriate SSM for each input through a Router, the overall framework can cover a broader range of data distributions that the LM has not previously fitted. Initially, we independently train $N$ SSMs, where each model $\mathcal{M}_S^i$ is fitted to its unique distribution $\mathcal{D}_S^i$. To assess whether an SSM is proficient at handling a given input $x$, we use the confidence output by the softmax layer $\mathrm{Softmax}(\cdot)$ as the evaluation criterion [Niculescu-Mizil and Caruana, 2005]:

$$C_S^i = \max\big(\mathrm{Softmax}(\mathcal{M}_S^i(x))\big) \geq \tau, \tag{4}$$

where $\tau$ denotes the threshold that determines whether $\mathcal{M}_S^i$ is proficient at handling the input $x$. According to Eq. 4, selecting different SSMs to process $x$ can significantly reduce the dependence on the LM, thereby reducing the cost associated with LMs. However, in the Specific Layer, SSMs exhibit weak correlations with distributions that are poorly fitted by the LM, making it challenging to address the LM's capability deficiencies on specific tasks. Therefore, we further introduce the Augmented Layer.
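To make the criterion concrete, here is a minimal Python sketch of the confidence check in Eq. 4; the function names and the toy logit interface are our own illustration, not code from the paper.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D logit vector."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def ssm_confidence(logits: np.ndarray) -> float:
    """Confidence C_S^i: the maximum softmax probability of an SSM's output."""
    return float(softmax(logits).max())

def ssm_is_proficient(logits: np.ndarray, tau: float = 0.9) -> bool:
    """Eq. 4: the i-th SSM is deemed proficient on x iff C_S^i >= tau."""
    return ssm_confidence(logits) >= tau

# Toy usage: a 3-class SSM output with one dominant logit.
print(ssm_is_proficient(np.array([4.0, 0.5, -1.0]), tau=0.9))  # True
```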

The SSMs within the Specific Layer only provide coarse-grained compensation and do not specifically target the underfitted distributions of the LM. To better compensate for the deficiencies of LMs on specific tasks, we propose the Augmented Layer as an extension of the Specific Layer. After identifying the underfitted training set $X_{\mathrm{train}}^U$ of the Specific Layer and the LM by Eq. 5, we can train the Augmented Specific Small Models (ASSMs) to achieve targeted compensation:

$$\theta^{*} = \arg\min_{\theta} \sum_{(x,\, y) \in X_{\mathrm{train}}^U} \mathcal{L}\big(\mathcal{M}_S(x; \theta),\, y\big), \tag{6}$$

where $\theta$ is the parameter set of the SSM $\mathcal{M}_S$ and $\mathcal{L}(\cdot)$ is the task loss. Eq. 6 represents the fine-tuning process of $\mathcal{M}_S$ with the data $X_{\mathrm{train}}^U$. With the ASSM $\mathcal{M}_{AS}$, we can further bridge the gaps in the fitted distributions of both the SSMs and the LM, thereby enhancing the performance of the EA framework on specific tasks.
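As a rough illustration of how the Augmented Layer's training data might be assembled, the sketch below keeps samples that neither the SSMs nor the LM handle; this filtering rule is our reading of the selection step (Eq. 5), and `ssm_confidences`, `lm_predict`, and the data layout are hypothetical placeholders.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[str, int]  # (input x, ground-truth label y); types are illustrative

def collect_underfitted(
    train_set: Sequence[Sample],
    ssm_confidences: Callable[[str], List[float]],  # C_S^i of each SSM for x
    lm_predict: Callable[[str], int],               # cloud LM queried via API
    tau: float = 0.9,
) -> List[Sample]:
    """Gather X_train^U: samples no SSM is confident on and the LM gets wrong."""
    underfitted: List[Sample] = []
    for x, y in train_set:
        no_confident_ssm = max(ssm_confidences(x)) < tau  # all SSMs below threshold
        lm_wrong = lm_predict(x) != y                     # LM also fails on x
        if no_confident_ssm and lm_wrong:
            underfitted.append((x, y))
    return underfitted

# The ASSM is then obtained by standard fine-tuning on the returned set (Eq. 6),
# i.e., minimizing the task loss over collect_underfitted(...) with any trainer.
```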

The Specific Layer and Augmented Layer provide the foundational capabilities for the LM to adapt to specific tasks. However, during inference, when a new input is received, a routing mechanism is still needed to determine which model should process the input. Following prior work [Niculescu-Mizil and Caruana, 2005], the confidence $C_S^i$ indicates whether $\mathcal{M}_S^i$ is suited to an input, so we evaluate each SSM by the proportion of validation samples it handles both confidently and correctly:

$$P_i = \frac{1}{|X_{\mathrm{val}}|} \sum_{(x,\, y) \in X_{\mathrm{val}}} \mathbb{I}\big(C_S^i \geq \tau \,\wedge\, \mathcal{M}_S^i(x) = y\big), \tag{7}$$

where $\mathbb{I}(\cdot)$ is the indicator function. With performance evaluated by Eq. 7, we can rank the SSMs in descending order:

$$P_1 \geq P_2 \geq \cdots \geq P_N, \tag{8}$$

where $P_1$ corresponds to the best SSM and $P_N$ to the worst one, and $\mathrm{Index}(\cdot)$ is the index function that returns the model id. During inference, an input $x$ is first processed by $\mathcal{M}_S^{\mathrm{Index}(P_1)}$. If the confidence $C_S^{\mathrm{Index}(P_1)} \geq \tau_1$, the processing of the input $x$ terminates. Conversely, if $C_S^{\mathrm{Index}(P_1)} < \tau_1$, the input $x$ is passed to the next SSM, and the above procedure is repeated. When the input $x$ does not belong to the distribution fitted by any SSM, it is ultimately handled by the LM. The mechanism of Router 2 in Figure 2 (a) is identical to that of Router 1, with the sole distinction that the Augmented Layer is activated only when the LM cannot handle the input. The whole process of EA is summarized in Appendix A.
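The routing cascade described above can be summarized in a short sketch, assuming SSMs are ranked by the validation performance of Eqs. 7-8; the `RankedSSM` container and the `lm_is_confident` check for Router 2 are our assumptions, since the paper does not specify a code-level interface.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class RankedSSM:
    predict: Callable[[str], int]       # SSM prediction for input x
    confidence: Callable[[str], float]  # max softmax probability C_S^i for x
    val_accuracy: float                 # P_i estimated on held-out data (Eq. 7)

def ea_route(
    x: str,
    ssms: List[RankedSSM],
    lm_predict: Callable[[str], int],
    lm_is_confident: Callable[[str], bool],          # hypothetical Router-2 check
    assm_predict: Optional[Callable[[str], int]] = None,
    tau1: float = 0.9,
) -> int:
    """Router 1: query SSMs in descending P_i (Eq. 8); the first SSM whose
    confidence reaches tau1 answers. Otherwise the input falls through to the
    LM, and (Router 2) to the ASSM only when the LM cannot handle it."""
    for ssm in sorted(ssms, key=lambda m: m.val_accuracy, reverse=True):
        if ssm.confidence(x) >= tau1:
            return ssm.predict(x)
    if assm_predict is None or lm_is_confident(x):
        return lm_predict(x)
    return assm_predict(x)
```

Because high-accuracy SSMs are tried first, most inputs never reach the LM, which is what drives the invocation-rate savings reported later.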

In our experiments, we aim to (1) evaluate whether EA can enable the LM to adapt to specific tasks while incurring minimal local resource costs compared to PEFT methods, (2) evaluate the effectiveness of the Specific Layer within EA in achieving coarse-grained compensation for deficient capabilities of the LM, and (3) evaluate the effectiveness of the Augmented Layer within EA in achieving targeted compensation for capability deficiencies of the LM.

Currently, the most popular PEFT methods, such as LoRA [Hu et al., 2022], QLoRA [Dettmers et al., 2023], and Freeze, have significantly reduced the resources required for fine-tuning LMs. However, for the majority of individuals using relatively low-cost devices, the resource requirements of PEFT remain prohibitively high. In particular, many users significantly reduce the batch size to lower memory usage, which in turn leads to substantial increases in time cost. To alleviate this issue, we propose EA, which injects task-specific knowledge by training SSMs; as a result, the resource cost is comparable to that of model training before the era of LMs. In Table 1, we compare the effectiveness of EA, LoRA, QLoRA, and Freeze in adapting LMs to specific classification tasks across different datasets.

As shown in Table 1, EA achieves results very close to the best-performing PEFT methods in most cases, while significantly reducing resource cost. For example, in the NLI task, QLoRA improves the performance of GLM-4-9B from 86.73% to 90.68%. EA achieves a comparable result of 90.46%, while consuming only about one-quarter of QLoRA's memory and one-third of its training time. In addition, in the IC task, EA boosts the performance of LLaVA-V1.6-7B from 93.57% to 96.04%, outperforming other PEFT methods, while its time and memory costs are only one-thirteenth and one-twenty-eighth, respectively, of those of the best-performing PEFT method. Furthermore, we present the results of EA on the generation task in Table 2. As can be observed, compared with the classification tasks reported in Table 1, the generation task incurs relatively higher time and memory costs due to the larger local model (T5). Nevertheless, the cost remains only about one-third to one-quarter of that required by LoRA.

The aforementioned results indicate that EA can achieve performance comparable to existing PEFT methods while conserving computational resources far more efficiently. By invoking the original LMs via APIs, all EA-related results in Table 1 can be rapidly obtained with an RTX 2050 (4 GB).

The Specific Layer in the EA framework offers coarse-grained capability compensation to assist the LM in adapting to specific tasks. Although the SSMs in the Specific Layer cannot provide targeted compensation for underfitted distributions, they route a portion of the inputs away from the LM via the Router. This substantially reduces the LM's invocation frequency, thereby conserving computational resources.

In Table 3, we present the model invocation proportions within the Specific Layer, as well as their performance. It can be observed that, compared to the original LM, the Specific Layer achieves comparable results through model collaboration, while significantly reducing the invocation rate of the LM. For instance, for Llama-3-8B on the SA task, the Specific Layer achieves a result of 80.85%, slightly higher than the LM's 80.39%, while the invocation rate of the LM is only 24.37%. This indicates that the time required to complete task-specific inference can be significantly reduced, as the majority of inputs are processed by the smaller, faster SSMs. In addition, the results in Table 3 show significant variation in the invocation proportions of SSMs within the Specific Layer. Taking the previously mentioned SA task as an example, in the Router module that routes inputs based on performance ranking, RoBERTa is invoked first and handles 71.78% of the inputs, while XLNet follows, processing only 3.83%. We further evaluate targeted compensation in Table 4, where EA(Full) refers to a variant whose Augmented Layer is fine-tuned on all data where the LM makes errors.

As shown in Table 4, the performance of targeted compensation (i.e., EA) is comparable to that of fine-tuning with all misclassified data (i.e., EA(Full)). Meanwhile, since EA collects underfitted data that is filtered by both the SSMs and the LM, the resulting dataset is smaller in size but more targeted in content. As a result, EA with targeted compensation offers a clear advantage in training efficiency. For instance, for NLI and SA, the training time of EA is about one-half to two-thirds of that of EA(Full), while for IC, the training time of EA is about 2.31% of that of EA(Full). Additional experiments with different LLMs are provided in Appendix C.

As competition among LMs intensifies, an increasing number of companies are adopting closed-source policies and providing services to users via APIs. Because PEFT methods heavily rely on parameter updates, they cannot adapt closed-source LMs to specific tasks. On the other hand, uploading datasets for cloud-based fine-tuning incurs significant costs. We demonstrate that the EA framework can enable closed-source LMs to adapt to specific tasks; the related results are presented in Table 5. Evidently, the EA framework yields notable improvements for various closed-source LMs on both the NLI and IC tasks. In particular, for the IC task, EA improves the performance of Doubao-1.5-pro-32k by 2.03%. All results in Table 5 clearly demonstrate that the EA framework can effectively adapt LMs to specific tasks without requiring access to model parameters, thereby offering broader applicability across diverse scenarios.

In Table 6, we present a fine-grained analysis of the impact of different methods on the LM by partitioning CIFAR-10 based on the number of samples in each category. For Qwen2-VL-7B, QLoRA improves the overall accuracy of the LM from 92.83% to 96.45%, with its fine-tuning process primarily enhancing the performance on head and middle data, resulting in accuracy gains of 3.79% and 3.85%, respectively. In contrast, EA mainly addresses the LM's underfitting on head data, while its performance on tail data even slightly declines. This leads to EA's relatively weaker performance for Qwen2-VL-7B. As for LLaVA-V1.6-7B, it demonstrates strong performance on head and middle data, which contributes to its overall better results compared to Qwen2-VL-7B. However, due to its excellent fit on head data and the large proportion of head samples, the model becomes overfitted to head data during fine-tuning. Such overfitting negatively affects the performance of PEFT methods on middle and tail data. In contrast, EA mitigates this issue by training targeted small models to fit specific task distributions, making it less susceptible to overfitting. Overall, when the original LM is more prone to overfitting the specific task distribution, the EA framework tends to exhibit greater performance advantages.

For more experiments on hyperparameter analysis and ablation studies, please refer to Appendices D and E.

Parameter-Efficient Fine-Tuning (PEFT) methods are crucial for the widespread adoption of Large Models (LMs). Although PEFT methods, such as LoRA, have significantly reduced the difficulty of fine-tuning LMs, they still face challenges in extreme scenarios, including high resource cost and parameter dependency. To address these issues, we propose training Specific Small Models (SSMs) to complement the underfitted distributions of LMs on specific tasks, which enables fast, lightweight, and parameter-free adaptation of LMs.

Extensive experiments validate the effectiveness of the proposed framework, particularly in resource-constrained environments.

Experiments are conducted on XNLI [Conneau et al., 2018] for the Natural Language Inference (NLI) task, Yelp Reviews [Asghar, 2016] for Sentiment Analysis (SA), CIFAR-10 [Krizhevsky et al., 2009] for Image Classification (IC), and CNN/DailyMail [Nallapati et al., 2016] for Summarization. Unless otherwise specified, the small models used by EA are RoBERTa [Liu et al., 2019], MobileNet V2 [Sandler et al., 2018], and T5 [Raffel et al., 2020]. Moreover, in the following experiments, the reported memory cost reflects only the memory required for training the local SSMs, while the time cost includes both the cloud-based LM inference time and the local SSM training time. For more details about the datasets and experimental settings, please refer to Appendix B.



Code is included in the supplemental material and will be released upon paper acceptance.

