A Closer Look at Personalized Fine-Tuning in Heterogeneous Federated Learning
📝 Abstract
Federated Learning (FL) enables decentralized, privacy-preserving model training but struggles to balance global generalization and local personalization due to non-identical data distributions across clients. Personalized Fine-Tuning (PFT), a popular post-hoc solution, fine-tunes the final global model locally but often overfits to skewed client distributions or fails under domain shifts. We propose adapting Linear Probing followed by full Fine-Tuning (LP-FT), a principled centralized strategy for alleviating feature distortion (Kumar et al., 2022), to the FL setting. Through systematic evaluation across seven datasets and six PFT variants, we demonstrate LP-FT’s superiority in balancing personalization and generalization. Our analysis uncovers federated feature distortion, a phenomenon where local fine-tuning destabilizes globally learned features, and theoretically characterizes how LP-FT mitigates this via phased parameter updates. We further establish conditions (e.g., partial feature overlap, covariate-concept shift) under which LP-FT outperforms standard fine-tuning, offering actionable guidelines for deploying robust personalization in FL.
📄 Content
Federated Learning (FL) (McMahan et al., 2017) enables collaborative learning from decentralized data while preserving privacy, typically by training a shared global model, an approach referred to as General FL (GFL). However, variations in client data distributions often limit GFL’s effectiveness. Personalized FL (PFL) (Kairouz et al., 2021) addresses this by customizing models to individual clients. Personalized Fine-Tuning (PFT) (Wu et al., 2022), a simple and practical strategy in the PFL family, is a widely adopted post-hoc, plug-and-play complement to diverse GFL methods. As shown in Fig. 1(a), PFT personalizes the final global model from GFL by fine-tuning it locally. This simplicity makes PFT easy to implement and to adapt across FL scenarios.
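The PFT workflow described above can be sketched end to end: a global model is first trained with a GFL method such as FedAvg, and each client then fine-tunes a copy of it locally with no further coordination. The minimal NumPy sketch below uses a toy logistic-regression model as a stand-in; it is illustrative only (not the paper's implementation), and all names are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_step(w, X, y, lr=0.1):
    """One full-batch gradient step on the logistic-regression loss."""
    p = sigmoid(X @ w)
    return w - lr * X.T @ (p - y) / len(y)

# Toy non-IID clients: each draws features from a shifted Gaussian.
clients = []
for shift in (-1.0, 0.0, 1.0):
    X = rng.normal(shift, 1.0, size=(200, 5))
    y = (X[:, 0] + 0.5 * X[:, 1] > shift).astype(float)
    clients.append((X, y))

# --- GFL phase: FedAvg-style training of one shared global model ---
w_global = np.zeros(5)
for _ in range(50):                      # communication rounds
    local_ws = [grad_step(w_global, X, y) for X, y in clients]
    w_global = np.mean(local_ws, axis=0) # server averages client models

# --- PFT phase: one-shot post-hoc local fine-tuning, no coordination ---
personalized = []
for X, y in clients:
    w = w_global.copy()
    for _ in range(20):                  # local epochs on local data only
        w = grad_step(w, X, y)
    personalized.append(w)
```

The key property is that the second loop touches only local data and runs after federation ends, which is what makes PFT post-hoc and plug-and-play.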
Unlike process-integrated PFL methods (e.g., server-client coordination schemes that modify the entire federated training process (Deng et al., 2020; Collins et al., 2021) or local training strategies that require iterative server feedback (Karimireddy et al., 2020; Tamirisa et al., 2024)), PFT eliminates the need for costly global-training-dependent adaptations. Instead, it fine-tunes the final GFL model once, post-training, ensuring simplicity, broad compatibility, and deployment robustness without redesigning the GFL framework (see Tab. 1). These characteristics establish PFT as a critical fallback strategy when process-integrated PFL approaches prove infeasible, particularly when global training protocols are unmodifiable due to infrastructure lock-in, legacy FL systems, or strict coordination constraints (e.g., healthcare systems bound by long-term service agreements).
However, PFT often causes models to overfit to local data, thereby compromising the generalization of FL. This is particularly concerning in critical real-world applications, such as FL across multiple hospitals for disease diagnosis, where a local model must not only perform well on the hospital's own patient data but also generalize effectively to the diverse patient populations it may encounter in the future (Xu et al., 2021). Balancing strong individual client performance (personalization) with strong global performance (generalization across all clients) is therefore crucial (Wu et al., 2022; Huang et al., 2024).
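Under this dual objective, each personalized model should be scored twice: on its own client's data (personalization) and on the pooled data of all clients (generalization). A minimal NumPy sketch of such an evaluation, using a toy linear classifier and hypothetical helper names:

```python
import numpy as np

rng = np.random.default_rng(4)

def accuracy(w, X, y):
    """Accuracy of a linear classifier sign(X @ w) against 0/1 labels y."""
    return (((X @ w) > 0).astype(float) == y).mean()

def evaluate(w_local, client_data, all_data):
    """Score a personalized model twice: on its own client's data
    (personalization) and on pooled data from every client (generalization)."""
    X_c, y_c = client_data
    X_all, y_all = all_data
    return accuracy(w_local, X_c, y_c), accuracy(w_local, X_all, y_all)

# Two toy clients with shifted input distributions.
clients = []
for s in (-1.0, 1.0):
    X = rng.normal(s, 1.0, size=(100, 3))
    y = (X[:, 0] > s).astype(float)
    clients.append((X, y))

X_all = np.vstack([X for X, _ in clients])
y_all = np.concatenate([y for _, y in clients])

w = np.array([1.0, 0.0, 0.0])            # a stand-in personalized model
local_acc, global_acc = evaluate(w, clients[0], (X_all, y_all))
```

A model that overfits to its own client raises the first score while depressing the second, which is precisely the trade-off at issue.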
In this work, we conduct a comprehensive evaluation of various strategies for PFT in heterogeneous FL environments under different distribution shifts, categorized as covariate shift (Peng et al., 2019a;Hendrycks & Dietterich, 2019) and concept shift (Izmailov et al., 2022).
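To make the two shift types concrete: covariate shift moves the input distribution P(X) while the labelling rule P(Y|X) stays fixed, whereas concept shift changes the labelling rule itself. A hypothetical NumPy generator (illustrative only, not the paper's benchmark construction) can induce each independently:

```python
import numpy as np

rng = np.random.default_rng(1)

def make_client(n=500, cov_shift=0.0, concept_flip=False):
    """Toy binary task. `cov_shift` translates P(X) while the labelling
    rule is unchanged; `concept_flip` changes the rule P(Y|X) itself."""
    X = rng.normal(loc=cov_shift, scale=1.0, size=(n, 2))  # covariate shift moves P(X)
    y = (X[:, 0] > 0).astype(int)                          # fixed labelling rule
    if concept_flip:
        y = 1 - y                                          # concept shift flips the rule
    return X, y

X_a, y_a = make_client()                    # reference client
X_b, y_b = make_client(cov_shift=2.0)       # covariate shift only
X_c, y_c = make_client(concept_flip=True)   # concept shift only
```

Client b sees different inputs but the same input-to-label mapping; client c sees the same inputs but a contradictory mapping, which is the harder case for a shared global model.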
Despite meticulously tuning the hyper-parameters of several FT methods adapted to FL (full-parameter FT, sparse FT (Lee et al., 2018), and proximal FT (Li et al., 2020b)), we observe persistent local overfitting as the number of local fine-tuning epochs increases: localized performance gains come at a significant cost to global generalization. LP-FT (Kumar et al., 2022), a two-phase fine-tuning strategy that first updates only the linear classifier (Linear Probing, LP) before optimizing all parameters (Full Fine-Tuning, FT), has demonstrated state-of-the-art performance in centralized learning by mitigating overfitting and enhancing domain adaptation. However, its potential to address FL challenges, such as client data heterogeneity and instability during decentralized personalization, remains unexplored. In FL, local fine-tuning risks overfitting to client distributions and diverging from globally useful representations. LP-FT's structured separation of first aligning the head and then fine-tuning the full model offers a principled framework for stabilizing personalization in non-IID settings while preserving global knowledge.
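A minimal sketch of the two LP-FT phases on a toy two-layer model (NumPy only; an illustration under assumed names, not the paper's code) makes the phased parameter updates explicit: phase 1 freezes the feature extractor and trains only the head, and phase 2 unfreezes everything, starting from a head already aligned with the global features:

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, w2):
    h = np.maximum(X @ W1, 0.0)          # ReLU features
    return h, sigmoid(h @ w2)

# "Global" model arriving at a client: feature extractor W1, linear head w2.
W1 = rng.normal(size=(5, 8))
w2 = rng.normal(size=8)

# Local client data for a toy binary task.
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(float)

# Phase 1 -- Linear Probing (LP): only the head moves; W1 stays frozen,
# so early noisy gradients cannot distort the globally learned features.
for _ in range(100):
    h, p = forward(X, W1, w2)
    w2 -= 0.2 * h.T @ (p - y) / len(y)

# Phase 2 -- Full Fine-Tuning (FT): all parameters now update, starting
# from a head that is already aligned with the frozen features.
lr = 0.05
for _ in range(100):
    h, p = forward(X, W1, w2)
    err = (p - y) / len(y)
    dh = np.outer(err, w2) * (h > 0)     # backprop through the ReLU
    W1 -= lr * X.T @ dh
    w2 -= lr * h.T @ err

h, p = forward(X, W1, w2)
acc = ((p > 0.5) == y).mean()            # training accuracy after LP-FT
```

The intuition from Kumar et al. (2022) is that a randomly (or poorly) aligned head produces large, misdirected gradients into the feature extractor; probing first removes that source of distortion before the extractor is allowed to move.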
Yet, no work has rigorously evaluated LP-FT's efficacy in FL, a critical oversight given the growing demand for lightweight, flexible, and robust personalization strategies. Empirically, we conduct a comprehensive evaluation across seven datasets and diverse distribution shifts, benchmarking our adapted LP-FT against other advanced fine-tuning methods in our PFT framework. Our findings reveal two key insights: (1) existing PFT methods suffer from personalized overfitting, where local fine-tuning distorts feature representations and degrades global performance (Fig. 2); (2) LP-FT mitigates this issue, preserving generalization while enhancing local adaptation under extreme data heterogeneity. Furthermore, extensive ablation studies (Fig. 4) confirm that LP-FT reduces federated feature distortion, establishing it as a strong and scalable baseline for PFT in FL.
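Federated feature distortion can be quantified with a simple proxy: the average shift of penultimate-layer features on probe inputs before versus after local fine-tuning. The metric below is an illustrative sketch with assumed names (the paper's precise definition may differ):

```python
import numpy as np

rng = np.random.default_rng(3)

def feature_distortion(W_before, W_after, X):
    """Mean L2 shift of penultimate (ReLU) features on probe inputs X:
    a simple proxy for how far local fine-tuning moved the global features."""
    h_before = np.maximum(X @ W_before, 0.0)
    h_after = np.maximum(X @ W_after, 0.0)
    return np.linalg.norm(h_after - h_before, axis=1).mean()

W_global = rng.normal(size=(5, 8))       # global feature extractor
X_probe = rng.normal(size=(100, 5))      # held-out probe inputs

# An unchanged extractor has zero distortion; a perturbed one does not.
W_finetuned = W_global + 0.1 * rng.normal(size=W_global.shape)
d = feature_distortion(W_global, W_finetuned, X_probe)
```

Comparing this quantity across fine-tuning strategies is one way to check the claim that LP-FT moves the feature extractor less than standard full fine-tuning does.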
Theoretically, we revisit feature distortion, a key challenge previously defined in centralized LP-FT as feature shifts under out-of-domain fine-tuning, in FL's unique setting of partially overlapping local and global distributions. Unlike centralized analyses (Kumar et al., 2022), which assume a single ground-truth function, FL involves multiple client-specific ground-truth functions, necessitating a new
This content is AI-processed based on ArXiv data.