A Knowledge Distillation-Based Personalization Technique for Hierarchical Clustered Federated Learning
📝 Abstract
Clustered Federated Learning (CFL) has emerged as a powerful approach for addressing data heterogeneity and ensuring privacy in large distributed IoT environments. By clustering clients and training cluster-specific models, CFL enables personalized models tailored to groups of heterogeneous clients. However, conventional CFL approaches suffer from fragmented learning, training independent global models for each cluster and failing to take advantage of collective cluster insights. This paper advocates a shift to hierarchical CFL, using bi-level aggregation to train cluster-specific models at the edge and a unified global model in the cloud. This shift improves training efficiency but may introduce communication challenges. To this end, we propose CFLHKD, a novel personalization scheme that integrates hierarchical cluster knowledge into CFL. Built upon multi-teacher knowledge distillation, CFLHKD enables inter-cluster knowledge sharing while preserving cluster-specific personalization. CFLHKD adopts bi-level aggregation to bridge the gap between local and global learning. Extensive evaluations on standard benchmark datasets demonstrate that CFLHKD outperforms representative baselines in both cluster-specific and global model accuracy, achieving performance improvements of 3.32-7.57%.

CCS Concepts: • Computer systems organization → Distributed architectures; • Security and privacy → Privacy protections; • Computing methodologies → Learning paradigms.
📄 Content
Clustered Federated Learning (CFL) provides a powerful framework for collaborative learning in large-scale distributed Internet of Things (IoT) environments. CFL groups clients with similar data distributions together to train cluster-specific global models through device-cloud cooperation while keeping client local data private [14,15,21,25,34]. This approach is particularly well-suited for IoT applications where geographically distributed clients, though heterogeneous, exhibit inherent clusterability [25,33]. For instance, vehicles operating in similar geographic regions generate spatiotemporally correlated data [35]. By exploiting such natural clusterability of heterogeneous clients, CFL trains personalized models tailored to specific clusters, making it effective for applications such as autonomous vehicles, smart cities, and healthcare systems [19,22,38].
Despite its relative effectiveness in handling data heterogeneity, CFL suffers from several critical limitations that hinder its scalability and adaptability in large-scale IoT environments. Conventional CFL approaches independently train global models for each cluster, with no mechanism for inter-cluster knowledge sharing. As illustrated in Figure 1, each cluster model initially learns from a subset of clients drawn from a broad, representative dataset. Over time (right side, time = 10), however, due to concept drift and the absence of cross-cluster knowledge sharing, the cluster models become increasingly specialized, leading to fragmented learning and limiting the generalizability of the overall system. Moreover, existing CFL methods assess similarity between clients using only model weights. While computationally inexpensive, this approach overlooks the dynamic nature of IoT systems, where client data distributions can shift over time. Consequently, this static similarity metric fails to address concept drift, further exacerbating the challenge of maintaining effective clusters. For example, as clients move or their data distributions evolve, fixed clusters lose relevance, resulting in degraded model performance and reduced adaptability to changing environments.
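The static, weight-only similarity metric criticized above can be made concrete. The sketch below is illustrative (the function name and toy weights are our own, not from the paper): two clients are compared by the cosine similarity of their flattened model parameters, a one-time snapshot that stays fixed even as the underlying data distributions drift.

```python
import numpy as np

def weight_similarity(w_a, w_b):
    """Cosine similarity between two clients' flattened model weights.

    This is the static, weight-only comparison conventional CFL relies on:
    it captures a snapshot of the parameters but says nothing about how
    the clients' data distributions evolve afterwards.
    """
    a = np.concatenate([p.ravel() for p in w_a])
    b = np.concatenate([p.ravel() for p in w_b])
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy example: two clients whose current weights are nearly identical
# look similar now, regardless of where their data is heading.
client_a = [np.array([[1.0, 2.0], [3.0, 4.0]])]
client_b = [np.array([[1.1, 2.1], [3.1, 3.9]])]
print(weight_similarity(client_a, client_b))  # close to 1.0
```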
A promising solution to these limitations is a hierarchical, or bi-level, aggregation framework, which has been used in FL to train a single global model [6,27,38]. In this framework, cluster-specific models are trained at the edge through device-edge coordination, while a unified global model is trained at the cloud through edge-cloud synergy. This bi-level aggregation ensures both local personalization and global generalization, addressing the lack of a global perspective in traditional CFL. Although bi-level aggregation obviates the problem of fragmented learning, naive aggregation may severely impact the effectiveness of the global model. Traditional aggregation strategies such as FedAvg [31] struggle under heterogeneity, where clients are geographically and contextually diverse, and may dilute cluster-specific adaptations by averaging model parameters across clusters. Furthermore, overwriting cluster models with the updated global model in every aggregation cycle risks erasing their unique context-specific characteristics, ultimately diminishing the benefits of localized training. More specifically, hierarchical CFL faces the following challenges:
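The dilution effect is easy to see in code. The sketch below (toy weights, hypothetical helper name) implements the standard sample-weighted parameter average of FedAvg [31]; averaging two cluster models that have adapted in opposite directions cancels both adaptations out.

```python
import numpy as np

def fedavg(models, sizes):
    """FedAvg [31]: parameter average weighted by each client's sample count.

    Each model is represented as a list of numpy arrays (one per layer).
    """
    total = sum(sizes)
    return [sum(n * m[i] for m, n in zip(models, sizes)) / total
            for i in range(len(models[0]))]

# Two cluster models adapted in opposite directions: naive cross-cluster
# averaging dilutes both cluster-specific adaptations to zero.
cluster_a = [np.array([1.0, 1.0])]
cluster_b = [np.array([-1.0, -1.0])]
merged = fedavg([cluster_a, cluster_b], [100, 100])
print(merged[0])  # → [0. 0.]
```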
(1) How to ensure effective CFL with bi-level aggregation? CFL traditionally manages client clustering and model aggregation at the cloud, converging effectively in static settings. Bi-level aggregation, however, requires that the cluster-specific and global models align across levels without compromising convergence or performance.

(2) How to enable knowledge sharing between clusters while preserving cluster-specific characteristics? Inter-cluster knowledge sharing requires assimilating knowledge from heterogeneous clusters while ensuring that knowledge distillation does not degrade the personalized performance of individual clusters.
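The bi-level structure underlying challenge (1) can be sketched in a few lines. This is an illustrative skeleton under our own assumptions (names and data layout are hypothetical, not CFLHKD's implementation): sample-weighted averaging runs within each cluster at the edge, and the resulting cluster models are aggregated once more at the cloud.

```python
import numpy as np

def weighted_avg(models, sizes):
    """Sample-weighted average of models (each a list of numpy arrays)."""
    total = sum(sizes)
    return [sum(n * m[i] for m, n in zip(models, sizes)) / total
            for i in range(len(models[0]))]

def bi_level_round(clusters):
    """One round of bi-level aggregation (illustrative sketch).

    `clusters` maps cluster id -> list of (client_model, n_samples).
    Edge level:  average within each cluster -> cluster-specific models.
    Cloud level: average the cluster models  -> unified global model.
    """
    cluster_models, cluster_sizes = {}, {}
    for cid, members in clusters.items():
        models, sizes = zip(*members)
        cluster_models[cid] = weighted_avg(models, sizes)
        cluster_sizes[cid] = sum(sizes)
    global_model = weighted_avg(list(cluster_models.values()),
                                list(cluster_sizes.values()))
    return cluster_models, global_model

# Toy round: two clients in cluster "A", one client in cluster "B".
clusters = {
    "A": [([np.array([1.0])], 50), ([np.array([3.0])], 50)],
    "B": [([np.array([-2.0])], 100)],
}
cluster_models, global_model = bi_level_round(clusters)
print(cluster_models["A"][0], global_model[0])  # → [2.] [0.]
```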
In this paper, we present CFLHKD (Clustered Federated Learning with Hierarchical Knowledge Distillation), a novel methodology for bi-level aggregation in CFL. We introduce novel metrics based on information theory to quantify data heterogeneity across clients, enabling an informed and dynamic response to non-IID data. Based on these metrics, we propose a clustering algorithm that adapts to client mobility and concept drift by grouping clients with similar data distributions, training cluster-specific models and a unified global model while enabling inter-cluster knowledge sharing. We employ FedAvg exclusively for intra-cluster aggregation, as it is better suited to homogeneous settings. To address inter-cluster heterogeneity, we introduce a federated transfer learning (FTL) approach based on multi-teacher knowledge distillation (MTKD), facilitating inter-cluster knowledge sharing while preserving the uniqueness of cluster-specific models. Our contributions. Our major contributions are as follows.
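The MTKD component can be illustrated with a minimal sketch. This is a generic multi-teacher distillation loss, not necessarily CFLHKD's exact formulation; the temperature and uniform teacher averaging are our own assumptions. Each cluster model acts as a student that matches the averaged softened predictions of the other clusters' models, absorbing their knowledge without copying their parameters.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-softened softmax over a logit vector."""
    e = np.exp((z - z.max()) / T)
    return e / e.sum()

def mtkd_loss(student_logits, teacher_logits_list, T=2.0):
    """Multi-teacher distillation loss (illustrative, not CFLHKD's exact form).

    The softened predictions of several cluster-model teachers are averaged,
    and the student is penalized by the KL divergence between that ensemble
    target and its own softened output (scaled by T^2, as in standard KD).
    """
    p_teachers = np.mean([softmax(t, T) for t in teacher_logits_list], axis=0)
    p_student = softmax(student_logits, T)
    return float(np.sum(p_teachers * np.log(p_teachers / p_student))) * T**2

# A student already agreeing with its teachers incurs near-zero loss,
# so distillation leaves well-aligned cluster models largely untouched.
teachers = [np.array([2.0, 0.5, -1.0]), np.array([1.5, 1.0, -0.5])]
agreeing = np.array([1.75, 0.75, -0.75])
print(mtkd_loss(agreeing, teachers))  # small, non-negative
```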