Federated Co-tuning Framework for Large and Small Language Models
By adapting Large Language Models (LLMs) to domain-specific tasks or enriching them with domain-specific knowledge, we can fully harness their capabilities. Nonetheless, a gap persists in achieving simultaneous mutual enhancement between the server’s LLM and the downstream clients’ Small Language Models (SLMs). To address this, we propose FedCoLLM, a novel and parameter-efficient federated framework designed for co-tuning LLMs and SLMs. This approach aims to adaptively transfer the server-side LLM’s knowledge to clients’ SLMs while simultaneously enriching the LLM with domain insights from the clients. To accomplish this, FedCoLLM utilizes lightweight adapters in conjunction with SLMs, facilitating knowledge exchange between server and clients in a manner that respects data privacy while minimizing computational and communication overhead. Our evaluation of FedCoLLM, using various public LLMs and SLMs across a range of NLP text-generation tasks, reveals that the performance of clients’ SLMs improves notably with the assistance of the LLMs. Simultaneously, the LLMs enhanced via FedCoLLM achieve performance comparable to that obtained through direct fine-tuning on clients’ data. Our code has been contributed to the FATE open-source project and is publicly accessible at https://github.com/FederatedAI/FATE-LLM/tree/main/python/fate_llm/algo/fedcollm.
💡 Research Summary
The paper introduces FedCoLLM, a novel federated co‑tuning framework that simultaneously improves a server‑hosted large language model (LLM) and multiple client‑side small language models (SLMs). The motivation stems from three practical challenges: (1) domain‑specific data are often private and cannot be shared with LLM owners; (2) many enterprises lack the computational resources to fine‑tune massive LLMs; (3) existing work treats LLM and SLM adaptation as separate processes, missing the opportunity for mutual knowledge transfer.
FedCoLLM addresses these issues by combining parameter‑efficient fine‑tuning (PEFT) with knowledge distillation in a federated learning (FL) setting. Both the server’s LLM and each client’s SLM are equipped with lightweight low‑rank adapters (LoRA). The original model weights remain frozen, so only the adapter parameters—typically a few hundred thousand—are communicated.
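The low-rank adapter mechanism described above can be sketched in a few lines. This is a minimal numpy illustration, not the paper's implementation: the layer dimensions, rank, and scaling factor `alpha` are illustrative assumptions. It shows why only a small fraction of parameters needs to be trained and communicated when the base weight stays frozen.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for illustration; W is the frozen base weight,
# and only the low-rank factors A and B are trained and communicated.
d_in, d_out, rank = 768, 768, 8
W = rng.standard_normal((d_out, d_in))        # frozen base weight
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable LoRA factor
B = np.zeros((d_out, rank))                   # trainable LoRA factor (zero-init)
alpha = 16.0                                  # LoRA scaling hyperparameter

def lora_forward(x):
    """y = W x + (alpha / rank) * B A x: base output plus low-rank update."""
    return W @ x + (alpha / rank) * (B @ (A @ x))

x = rng.standard_normal(d_in)
y = lora_forward(x)

adapter_params = A.size + B.size  # parameters actually exchanged
full_params = W.size              # parameters that stay local and frozen
print(adapter_params, full_params)  # 12288 589824 — roughly 2% of the layer
```

At rank 8, the adapter for this single layer holds about 2% of the base layer's parameters, which is the source of the communication savings discussed below.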
During each communication round, the server broadcasts the current global SLM adapter θ to all clients. Each client replaces its local adapter with θ and fine‑tunes it on its private dataset D_k using the standard supervised FL loss. After local training, the clients’ updated adapters are combined via SecureAggregation, producing a new global θ without revealing any individual client’s update. Concurrently, the server attaches its own LoRA adapter ω to the LLM and performs mutual knowledge distillation with the globally updated SLM (frozen base weights g plus adapter θ) on an auxiliary public dataset D_a. The distillation loss combines cross‑entropy on D_a with a KL‑divergence term weighted by λ, enabling the LLM to absorb domain‑specific patterns from the SLM while the SLM benefits from the LLM’s broad linguistic knowledge.
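The two server-side steps of the round can be sketched as plain functions. This is a hedged numpy sketch under stated assumptions: `fedavg` stands in for the aggregation that SecureAggregation would compute without seeing individual updates, and `mutual_distill_loss` implements a generic cross-entropy-plus-KL objective of the form described above; the function names and the uniform client weighting are my own illustrative choices, not the paper's API.

```python
import numpy as np

def fedavg(adapters, weights=None):
    """Average client adapter updates (dicts of arrays). In deployment,
    SecureAggregation would compute this sum without revealing any
    individual client's update."""
    if weights is None:
        weights = [1.0 / len(adapters)] * len(adapters)
    return {k: sum(w * a[k] for w, a in zip(weights, adapters))
            for k in adapters[0]}

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mutual_distill_loss(student_logits, teacher_logits, labels, lam=0.5):
    """Cross-entropy on the public dataset plus a KL term weighted by lam,
    mirroring the combined distillation objective described above."""
    p_s = softmax(student_logits)
    p_t = softmax(teacher_logits)
    n = len(labels)
    ce = -np.mean(np.log(p_s[np.arange(n), labels] + 1e-12))
    kl = np.mean(np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)),
                        axis=-1))
    return ce + lam * kl

# Toy round: two clients report updated adapters; the server averages them.
client_a = {"theta": np.array([1.0, 2.0])}
client_b = {"theta": np.array([3.0, 4.0])}
global_theta = fedavg([client_a, client_b])

# Distillation on a toy public batch: identical logits give zero KL,
# so only the cross-entropy term remains.
logits = np.array([[2.0, 0.0], [0.0, 2.0]])
labels = np.array([0, 1])
loss = mutual_distill_loss(logits, logits, labels, lam=0.5)
```

Run in both directions (LLM as teacher of the SLM and vice versa), this loss is what lets knowledge flow both ways without either side's raw data leaving its owner.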
The authors provide a detailed algorithmic description, computational and communication complexity analysis, and a privacy‑preserving argument. Because only adapters are exchanged and raw data never leave the client, communication overhead is reduced by an order of magnitude compared with naïve federated fine‑tuning of full models. The use of PEFT also keeps client‑side memory requirements modest, allowing deployment on resource‑constrained devices. Privacy is further protected by the standard FL security primitives (SecureAggregation) and by the fact that the distillation step uses a non‑sensitive public dataset.
Experiments involve four clients and one server, evaluating GPT‑2, OPT, and LLaMA‑2 as LLMs and LLaMA‑2‑1.3B as the base SLM. The tasks span text generation, summarization, and question answering. Results show that SLMs fine‑tuned with FedCoLLM achieve 4–7 % absolute improvements on BLEU, ROUGE, and Exact Match metrics relative to isolated fine‑tuning. The server‑side LLM, after co‑tuning, reaches performance comparable to direct fine‑tuning on the aggregated client data, confirming the effectiveness of the mutual distillation. Communication volume drops to less than 30 % of the baseline, and overall training time is reduced by roughly 20 %.
The paper’s contributions are threefold: (1) a bidirectional knowledge‑transfer mechanism between LLMs and SLMs within a federated setting; (2) a practical, privacy‑preserving architecture that leverages LoRA adapters and secure aggregation; (3) extensive empirical validation across multiple public LLMs and downstream NLP tasks. Limitations include dependence on the quality of the auxiliary distillation dataset and a current focus on text‑only tasks; future work is suggested on multimodal data, automated adapter design, and scaling to larger client populations. Overall, FedCoLLM offers a compelling solution for organizations that wish to benefit from powerful LLMs without compromising data privacy or incurring prohibitive computational costs.