FeDMRA: Federated Incremental Learning with Dynamic Memory Replay Allocation
In federated healthcare systems, Federated Class-Incremental Learning (FCIL) has emerged as a key paradigm, enabling continuous adaptive model learning among distributed clients while safeguarding data privacy. However, in practical applications, data across agent nodes within the distributed framework often exhibits non-independent and identically distributed (non-IID) characteristics, rendering traditional continual learning methods inapplicable. To address these challenges, this paper covers more comprehensive incremental task scenarios and proposes a dynamic memory allocation strategy for exemplar storage based on the data replay mechanism. This strategy fully taps the inherent potential of data heterogeneity while accounting for the performance fairness of all participating clients, thereby establishing a balanced and adaptive solution to mitigate catastrophic forgetting. Unlike the fixed allocation of client exemplar memory, the proposed scheme emphasizes the rational allocation of limited storage resources among clients to improve model performance. Furthermore, extensive experiments are conducted on three medical image datasets, and the results demonstrate significant performance improvements compared to existing baseline models.
Authors: Tiantian Wang, Xiang Xiang, Simon S. Du
Affiliations: 1 Huazhong University of Science and Technology, Wuhan, China; 2 University of Washington, Seattle, USA. Correspondence to: Xiang Xiang <xex@hust.edu.cn>.

1. Introduction

With the gradual increase in the number of hematologic disease cases caused by living environments and lifestyle factors, accurate white blood cell (WBC) classification for blood disorder diagnosis has become a prominent research focus in recent years (Zeng et al., 2023; Bhatia et al., 2023). As deep learning technologies advance, researchers have
Pre-print.

proposed using computer vision techniques and data analysis to assist with WBC image classification tasks for hematological diagnosis (Islam et al., 2024; Pritee & Garg, 2025; Asghar et al., 2024). However, significant challenges persist during rare and complex case sampling, including collection difficulties, limited sample sizes, and noncentralized distribution patterns caused by objective factors such as population demographics, disease varieties, and medical equipment disparities. To address these issues, Federated Learning (FL) has been introduced as a distributed framework, allowing the training of collaborative global models among participants using localized datasets while ensuring the preservation of data privacy (Chen et al., 2020; Wu et al., 2023; Li et al., 2020a; Yang et al., 2019; Kaissis et al., 2020; Guan et al., 2024; Ng et al., 2021).

Conventional federated learning approaches typically assume static data distributions. However, in real-world healthcare scenarios, medical data from different clinical sites is usually generated in a streaming or phased fashion, with both data distributions and categories evolving over time (Wu et al., 2024; Wang et al., 2024a; Feng et al., 2022), making these conventional approaches inadequate. To address this challenge, incremental learning has emerged as an effective solution. Nevertheless, a critical and unavoidable challenge in incremental learning is that models typically exhibit rapid and severe forgetting of previously acquired knowledge when learning new tasks or categories (Kirkpatrick et al., 2017; Chen & Liu, 2022). Recent years have witnessed numerous proposed solutions to address catastrophic forgetting in centralized learning paradigms (Rebuffi et al., 2017; Kirkpatrick et al., 2016; Wang et al., 2022a; Li & Hoiem, 2017).
Among these, memory replay has been recognized as a straightforward and effective approach that has been extended to federated settings (Cha et al., 2020; Pennisi et al., 2024; Li et al., 2024; Qi et al., 2023). In centralized environments, the incremental learning configuration resembles a single-client continual learning task, where replay-based methods typically employ predetermined and fixed exemplar memory allocations. In contrast, the federated setting involves multiple clients exhibiting non-independent and identically distributed (non-IID) data characteristics, with each client contributing unequally to the global model due to varying environmental conditions and resource constraints (Ye et al., 2023a;b; Li et al., 2020b; Huang et al., 2022). This fundamental discrepancy raises significant doubts about the direct applicability of conventional replay-based approaches in distributed frameworks.

Figure 1. The legend reflects our motivations and contributions. The coordinate system characterizes the sample distribution of each client. Compared to common replay-based methods that allocate a fixed memory size m_fix per client and distribute it uniformly across all classes, FeDMRA leverages the heterogeneity of client data distributions and their performance contributions to dynamically allocate matched memory sizes m_c and m^y_c, thereby optimizing the influence of stored exemplars.

Furthermore, most existing studies tend to develop solutions for isolated incremental learning scenarios. For example, studies (Mittal et al., 2021; Masana et al., 2022; Belouadah & Popescu, 2019; Pian et al., 2023; Guo et al., 2024; Dong et al., 2022) address class-incremental learning, while (Xie et al., 2022; Wang et al., 2022b; 2024b) focus on domain-incremental learning.
However, real-world applications rarely conform to such idealized conditions, as the emergence of novel categories and distribution shifts typically occur concurrently in practice.

To address these challenges, we propose FeDMRA, federated incremental learning with dynamic memory replay allocation: a novel exemplar memory allocation framework based on efficient data replay. Specifically, before the arrival of new tasks, the server dynamically allocates memory portions to each client based on two key factors: (1) the client's private data distribution, and (2) its contribution to global fusion. This approach enables adaptive adjustment of the number of stored old samples per client during incremental training. A more intuitive explanation is given in Fig. 1.

Our principal contributions can be summarized as follows:

• We analyze challenges in practical applications and construct a more comprehensive framework for Federated Continual Learning (FCL), encompassing Federated Class-Incremental Learning (FCIL), Federated Domain-Incremental Learning (FDIL), and Federated Class-Domain Incremental Learning (FCDIL).

• We analyze the potential issues in existing continual learning schemes based on fixed exemplar-set memory and design a new federated continual learning approach. Furthermore, we focus on the memory allocation strategies for exemplar sets on each client to address the unfairness caused by data heterogeneity.

• We optimize client training under traditional federated learning techniques by integrating regularization and knowledge distillation methods to alleviate the problem of catastrophic forgetting.

• We demonstrate the effectiveness of our method on datasets that are clinically significant and challenging.

2. Related Work

Federated learning (FL), as a mainstream distributed learning framework, has emerged as a research hotspot in both academia and industry due to its core advantages of eliminating the need for raw data upload while balancing privacy protection and efficient data utilization (McMahan et al., 2017; De Boer et al., 2005; Ren et al., 2024; Hsu et al., 2020; Jiang et al., 2023). However, the streaming, dynamic growth of client-side samples over time places higher demands on the generalization ability and sustainable learning performance of the model, thus giving rise to Federated Continual Learning (FCL). Catastrophic forgetting (Chen & Liu, 2022) constitutes the core challenge in FCL, and the academic community has conducted extensive research and proposed various solutions to address this issue, including regularization-based approaches (Chen et al., 2022; Mittal et al., 2021; Liu et al., 2022), replay-based methods, knowledge distillation techniques (Kang et al., 2022; Zhao et al., 2023; Cheraghian et al., 2021; Zhu et al., 2021), and data generation strategies (Babakniya et al., 2023; Liang et al., 2024). However, to the best of our knowledge, most existing studies typically focus on isolated task scenarios, such as CIL or DIL. Research on practical and complex CDIL tasks remains limited. This gap is particularly salient in federated healthcare applications, where edge data typically exhibits category expansion and distributional shifts across institutions. We explicitly tackle this underexplored yet critical setting, proposing a unified framework to handle the above challenges in FCL. FCL has been studied for categories, distribution domains, and tasks.
Extending and deriving centralized algorithms as a solution for FCL is straightforward and effective, but it does not conform to actual federated settings. As described in (Kaissis et al., 2020; Li et al., 2020b; Dong et al., 2022; Zhao et al., 2023), class imbalance between old and new classes is a key challenge faced by exemplar replay methods. FCIL (Dong et al., 2022) designs a proxy server to select the best old global model to balance the bias. Re-Fed (Li et al., 2024) proposes to train a private information model to select more effective old samples to resist forgetting. MFCL (Babakniya et al., 2023) and DDDR (Liang et al., 2024) generate samples that conform to the distribution of old data through generative models and add them to the training of new tasks. FedSpace (Shenaj et al., 2023) proposes a method of asynchronous class-incremental learning based on pre-training and prototype enhancement. Generally speaking, all of them set equal memory sizes for the exemplar sets of clients and categories. However, this is not applicable to the current situation of unbalanced data and diverse resources at each edge side under federated settings. Balancing the data distribution of clients and their performance contributions, and dynamically allocating the memory of the exemplar set for them, are the issues this paper focuses on.

3. Preliminaries & Problem Definition

This section lays out the foundational framework and core concepts essential for understanding the problem definition and proposed solutions that follow. Our investigation centers on the phenomenon of catastrophic forgetting within FCL models. The primary objective is to preserve the model's classification accuracy in the face of challenges, such as distributional shifts, introduced by new sequential data.

3.1. Problem Definition - FL

FL is a distributed machine learning paradigm that aims to collaboratively train a shared global model among multiple clients while preserving data privacy. Consider a federated learning system with C clients, indexed by c ∈ {1, 2, ..., C}. Each client c possesses a local private dataset D_c consisting of n_c = |D_c| data samples. A key constraint is that the data from any client D_c never leaves its local device and is not shared with other clients or a central server. These datasets distributed across clients are typically heterogeneous, i.e., non-independent and identically distributed (non-IID). The local objective for each client c is to find a set of model parameters w that minimizes the loss function F_c(w) on its local dataset D_c. This local loss function is typically defined as the empirical risk:

F_c(w) = (1 / n_c) Σ_{i ∈ D_c} ℓ(x_i, y_i; w)    (1)

where (x_i, y_i) is a data sample from D_c, and ℓ(·) is the loss function for a single sample (e.g., cross-entropy loss). The global objective of FL is to minimize the weighted average of the local losses over all clients, denoted as the global loss function F(w). The weight for each client is typically the fraction of its data size relative to the total data size, N = Σ_{c=1}^{C} n_c. Therefore, the central problem in Federated Learning can be formally defined as the following optimization problem:

min_w F(w) ≜ Σ_{c=1}^{C} (n_c / N) F_c(w)    (2)

This optimization must be performed under several key constraints, including data privacy, statistical heterogeneity (non-IID data), and limited communication bandwidth.

3.2. Problem Definition - FCL

As an extension of FL, FCL enables multiple clients to collaboratively train a global model under the condition of dynamically and sequentially arriving data.
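The weighted objective in Eq. 2 is what FedAvg-style aggregation optimizes in practice: the server averages client parameters with coefficients n_c / N. A minimal NumPy sketch, with the function name and flat parameter vectors being illustrative rather than taken from any released implementation:

```python
import numpy as np

def fedavg_aggregate(client_weights, client_sizes):
    """Weighted average of client parameter vectors using FedAvg's
    data-size coefficients n_c / N from Eq. 2."""
    sizes = np.asarray(client_sizes, dtype=float)
    coeffs = sizes / sizes.sum()                    # n_c / N
    stacked = np.stack(client_weights)              # shape (C, d)
    return (coeffs[:, None] * stacked).sum(axis=0)  # sum_c (n_c/N) * w_c

# Two clients holding 30 and 10 samples: the larger client dominates.
w_global = fedavg_aggregate([np.array([1.0, 0.0]), np.array([0.0, 1.0])],
                            [30, 10])
```

In a real system the same weighted sum is applied per parameter tensor of the model rather than to one flat vector.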
In FCL, clients progressively update their models following a task sequence stream {D^1, D^2, ..., D^T}, where each element D^t represents a task-specific dataset. Formally, we define D^t = {x^t_i, y^t_i}_{i=1}^{N^t} as N^t paired samples comprising input data x^t_i ∈ X^t and corresponding label y^t_i ∈ Y^t. X = ∪_{t=1}^{T} X^t and Y = ∪_{t=1}^{T} Y^t represent the sample domain space and the class space of all sequential tasks, respectively. Therefore, the objective of FCL is for all participating parties to continually learn a shared model across a sequence of T tasks; Eq. 2 can be reformulated as the optimization problem:

w_{g,t} = arg min_w Σ_{c=1}^{C} (1 / N̄^t) Σ_{i ∈ D̄^t_c} ℓ(x̄^t_{c,i}, ȳ^t_{c,i}; w)    (3)

where D̄^t_c = D^t_c ∪ D^t_{c,cache}; D^t_c represents the data sequence of client c during task t; D^t_{c,cache} is the example set caching old data up to task t, with size m_{c,t}; and N̄^t_c = m_{c,t} + N^t_c.

Figure 2. An overview of the proposed FeDMRA. Server: 1. Receive the model parameters w_c and the class distributions uploaded by the clients. 2. Calculate the storage size of the example set for each category on the client side through client-level and class-level calculations. 3. Aggregate the client models using FedAvg to update the global model. Client: 1. Filter samples to fill the allocated size m^y_c of the example set and replay. 2. Use the global model w_{g,t} as a teacher model to perform knowledge transfer to the local model. 3. Local incremental training updates the model and uploads it.

In existing FCL research, the replay mechanism is considered the simplest yet most effective approach, which provides equivalent storage capacity for exemplar sets across all clients, i.e., m_fix = m^y_{c,t}.
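Eq. 3 trains on the union D̄^t_c of the new task's data and the cached exemplars; as a data structure this is simply a concatenation. A tiny sketch, where the (sample, label) tuples and names are hypothetical:

```python
def replay_training_set(new_task_data, exemplar_cache):
    """Form the replay training set of Eq. 3: the new task's samples
    D^t_c plus every cached old-class exemplar in D^t_{c,cache}."""
    cached = [s for exemplars in exemplar_cache.values() for s in exemplars]
    return list(new_task_data) + cached

# Hypothetical (sample, label) tuples: two new-class samples plus one
# cached exemplar for each of two old classes.
new_task = [("img_a", 2), ("img_b", 3)]
cache = {0: [("img_old1", 0)], 1: [("img_old2", 1)]}
combined = replay_training_set(new_task, cache)
```

The size of `combined` is exactly N̄^t_c = N^t_c + m_{c,t}, matching the normalizer in Eq. 3.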
In contrast, we propose a global exemplar memory pool that equals the union of all clients' local exemplar sets, with the expression:

M = Σ_{c ∈ C} Σ_{y ∈ Y} m^y_{c,t},   m^y_{c,t} ∈ [m_min, m_max]    (4)

where m_max denotes the maximum storage capacity supported by edge devices, and m_min denotes the lower bound on the size of the exemplar set, preventing the failure to store old data due to extreme distributions.

4. FeDMRA Framework

This paper proposes a novel framework called FeDMRA, which effectively mitigates catastrophic forgetting caused by data heterogeneity by dynamically allocating specific memory budgets to different clients' sample sets. A clearer illustration is provided in Fig. 2.

4.1. Server: Adaptive Example-set Memory Allocation

To mitigate the impact of data distribution on global performance, the server evaluates the contribution of client data to global aggregation updates through a dual-channel importance measurement mechanism and dynamically allocates local exemplar memory for each client. This allocation process is divided into two phases: Client-level Allocation and Class-level Allocation. The detailed workflow is summarized in Algorithm 1.

Client-level Dynamic Allocation. In the FCL framework, after each task is completed, the client sends its local updates (the model parameters w_{c,t}) to the central server. The server then performs an aggregation update to generate the next iteration of the global model. However, relying solely on parameter updates offers a limited perspective: while they reveal the trajectory of local optimization, they fail to capture the underlying data characteristics that drive such updates, potentially leading to a partial or incomplete understanding.
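Note that the box constraint of Eq. 4 applies regardless of how shares are computed: each slot must land in [m_min, m_max] while the total stays M. One way to enforce this is to clip and then spread any surplus over the unclamped slots; the redistribution scheme below is our illustration, not a procedure specified by the paper:

```python
def clamp_allocations(raw, m_min, m_max, M):
    """Scale raw scores to the budget M, then clip each share into
    [m_min, m_max] (Eq. 4). Surplus or deficit created by clipping is
    spread over the entries still strictly inside the bounds."""
    total = sum(raw)
    alloc = [M * r / total for r in raw]
    for _ in range(len(alloc)):  # a few passes suffice for small inputs
        clipped = [min(max(a, m_min), m_max) for a in alloc]
        surplus = M - sum(clipped)
        free = [i for i in range(len(alloc)) if m_min < clipped[i] < m_max]
        if abs(surplus) < 1e-9 or not free:
            alloc = clipped
            break
        alloc = [clipped[i] + (surplus / len(free) if i in free else 0.0)
                 for i in range(len(clipped))]
    return alloc

# Proportional shares [10, 20, 70] of M=100 violate the bounds [15, 50];
# clipping and redistributing yields a feasible allocation.
alloc = clamp_allocations([1, 2, 7], m_min=15, m_max=50, M=100)
```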
To address this limitation, we propose that clients concurrently transmit statistical metadata summarizing their local data distribution for the current task, such as the set of class labels and the corresponding number of samples per class. This approach provides the server with valuable data-centric insights without compromising the privacy of any raw local data. Subsequently, during the model aggregation phase, the server calculates a contribution index, denoted b_{c,t}, for each local model. This index is determined by quantifying the deviation of the local model's update relative to the previous global model w_{g,t-1}:

b_{c,t} = Diff(w_{c,t}, w_{g,t-1}) / Σ_{c ∈ C} Diff(w_{c,t}, w_{g,t-1})    (5)

where Diff(·) denotes the deviation between the local and global models, which can be quantified through vector subtraction (w_{c,t} − w_{g,t-1}).

Algorithm 1 FeDMRA: Dynamic Allocation of Example Set Memory
Input: number of clients C, communication rounds R, global memory cache size M, initialized model parameters w_{g,0}
Output: the size of the example set for each class on the client side: {m^y_{c,t}, y ∈ Y^t, c ∈ C}
1: for t = 1, 2, ..., T do
2:   Receive the local model parameters w_{c,t} and the class distributions uploaded from each client;
3:   b_{c,t}, d_{c,t} ← the model-space and sample-space contribution indices of each client, obtained through Eq. 5 and Eq. 6;
4:   {m_{1,t}, m_{2,t}, ..., m_{C,t}} ← divide M according to Eq. 7;
5:   for c = 1, 2, ..., C do
6:     m^y_{c,t} ← calculate the memory of the example set for each class by Eq. 8;
7:   end for
8:   Perform global fusion and update to obtain w_{g,t};
9: end for

Furthermore, to compute the sample importance metric d_{c,t}, we conduct an in-depth analysis of local data distributions that simultaneously considers local-level sample influence and global-level significance across clients. This framework actively transforms heterogeneous distributions into driving forces for model enhancement:

d_{c,t} = ( Σ_{y ∈ Y^t} N^t_{c,y} / N^t_y ) / ( Σ_{c ∈ C} Σ_{y ∈ Y^t} N^t_{c,y} / N^t_y )    (6)

where N^t_{c,y} / N^t_y represents the proportion of class y held by client c relative to the global distribution of class y. In summary, the server performs weighted aggregation of the dual-channel importance indices to determine final client significance, subsequently allocating the global example memory proportionally:

m_{c,t} = ( (d_{c,t} + b_{c,t}) / Σ_{c ∈ C} (d_{c,t} + b_{c,t}) ) × M    (7)

Class-level Dynamic Allocation. Given the inherent heterogeneity of federated clients, after computing client-level memory allocations at the server, we further investigate the local and global distributions of each class to determine category-specific memory assignments. Consider a client that holds many samples of a common disease B but few of a rare disease A: the naive approach of allocating equal storage space to both classes (i.e., m_{c,A} = m_{c,B}) is insufficient for adequately preserving samples of disease A, which ultimately degrades the model's identification performance for this critical category. Therefore, by incorporating both intra-client and inter-client class distributions, the class-specific replay storage space is:

m^y_{c,t} = Norm( (1 − a) N^t_{c,y} / N^t_c + a N^t_{c,y} / N^t_y ) × m_{c,t}    (8)

where m^y_{c,t} denotes the allocated storage memory for class y at client c; a is the hyperparameter that balances the proportions of the local and global distributions; N^t_{c,y} is the number of samples of class y for client c; and Norm(·) is the normalization operation ensuring that the example counts across all classes sum to m_{c,t}.
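The server-side pipeline of Eqs. 5-8 can be sketched end to end in NumPy. This is a simplified reading under stated assumptions: Diff(·) is taken as the L2 norm of the parameter difference, parameters are flat vectors, and the Eq. 4 bounds are applied by plain clipping without redistributing the clipped surplus:

```python
import numpy as np

def allocate_exemplar_memory(client_params, w_global_prev, counts, M,
                             a=0.4, m_min=0, m_max=None):
    """Sketch of the dual-channel allocation (Eqs. 5-8).
    client_params[c]: flat parameter vector of client c;
    counts[c][y]: number of class-y samples on client c;
    M: global exemplar budget; a: local/global balance of Eq. 8."""
    counts = np.asarray(counts, dtype=float)                  # shape (C, Y)
    # Eq. 5: model-space index b_{c,t}, using ||w_c - w_g|| as Diff(.)
    diffs = np.array([np.linalg.norm(w - w_global_prev) for w in client_params])
    b = diffs / diffs.sum()
    # Eq. 6: sample-space index d_{c,t} from per-class global proportions
    ratios = counts / counts.sum(axis=0)                      # N_{c,y} / N_y
    d = ratios.sum(axis=1) / ratios.sum()
    # Eq. 7: client-level memory shares
    m_c = (d + b) / (d + b).sum() * M
    # Eq. 8: class-level split mixing local and global class proportions
    local = counts / counts.sum(axis=1, keepdims=True)        # N_{c,y} / N_c
    score = (1 - a) * local + a * ratios
    m_cy = score / score.sum(axis=1, keepdims=True) * m_c[:, None]
    if m_max is not None:                                     # Eq. 4 bounds
        m_cy = np.clip(m_cy, m_min, m_max)
    return m_c, m_cy

# Symmetric toy case: two clients with mirrored class skews, equal-sized
# updates, and a budget of M = 100 exemplars.
m_c, m_cy = allocate_exemplar_memory(
    [np.array([1.0, 0.0]), np.array([0.0, 1.0])], np.zeros(2),
    [[8, 2], [2, 8]], M=100)
```

By symmetry each client receives half the budget, and within each client the majority class receives the larger class-level share.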
Global Aggregation. During the server aggregation phase, the global model parameters w_{g,t} are obtained through weighted aggregation following the classic FedAvg protocol (McMahan et al., 2017).

4.2. Client: Continual Learning

With the arrival of new tasks, each client trains and optimizes its model on the new data sequence. To mitigate forgetting, we construct a sample set containing representative old samples and integrate it into the training process for new tasks. Simultaneously, considering the heterogeneity of data across clients, we design an augmented objective function that minimizes the loss while constraining the optimization direction, thereby reducing conflicting updates among clients.

Example Set Selection. We follow the exemplar selection strategy proposed in (Li et al., 2024), which addresses the shortcoming of conventional methods that struggle to identify samples contributing beneficially to global updates. Inspired by momentum optimization techniques, the strategy maintains a client-specific personal information model on each client. This model integrates the historical global model w_{g,t-1} and undergoes iterative optimization using data from the first t − 1 tasks. When the t-th new task arrives, the personal information model is updated by:

v^t_{c,e} = v^t_{c,e-1} − η Σ_i ∇L_F( v^t_{c,e-1}; x̄^i_{c,t-1}, ȳ^i_{c,t-1} ) + q(λ)( v^t_{c,e-1} − w_{g,t-1} )    (9)

where q(λ) = (1 − λ) / (2λ) with λ ∈ (0, 1), η is the update step size, and e indexes the iterations. During the iterative optimization of this objective, samples with larger gradient norms are stored in the exemplar set.

In contrast to (Li et al., 2024), where the size of D^t_{c,cache} remains constant for all clients under each task, our work dynamically determines the size m_{c,t} of D^t_{c,cache} through server-side computation. When constructing the example set D^t_{c,cache} up to task t via Eq.
9, we select samples per class by sorting them in descending order of gradient norms. Based on Eq. 8, the exemplar cache capacity for each category is computed, and the client selects m^y_{c,t} exemplar samples for each class:

D^t_{c,cache} = ∪_{y ∈ Y^t} { x_i : y_i = y and Ḡ_{x_i} is among the m^y_{c,t} largest average gradient norms of class y }    (10)

where Ḡ_{x_i} = ( Σ_e G_{x_i} ) / e is the average gradient norm of sample x_i over the e iterations, and G_{x_i} = ‖ ∇L_F( v^t_{c,e-1}; x̄^i_{c,t-1}, ȳ^i_{c,t-1} ) ‖_2 is the norm of the sample gradient. The algorithm flow is shown in Algorithm 2 in the appendix.

Table 1: Classification accuracy A_last and A_avg under heterogeneity α = 1.0 (%).

| Method | FCIL-M A_avg | FCIL-M A_last | FCIL-A A_avg | FCIL-A A_last | FCIL-H A_avg | FCIL-H A_last | FDIL A_avg | FDIL A_last | FCDIL A_avg | FCDIL A_last |
|---|---|---|---|---|---|---|---|---|---|---|
| UP | 86.61 | 85.50 | 88.62 | 88.78 | 95.06 | 95.87 | 84.10 | 89.18 | 87.30 | 88.45 |
| iCaRL+FL | 72.05 | 58.88 | 65.36 | 57.30 | 61.89 | 46.51 | 64.57 | 70.96 | 55.41 | 55.56 |
| UACL+FL | 52.06 | 25.20 | 54.62 | 49.32 | 66.59 | 48.44 | 51.21 | 36.53 | 51.93 | 52.52 |
| MFCL | 69.75 | 20.87 | 46.00 | 31.91 | 79.27 | 51.09 | 74.02 | 64.82 | 51.12 | 29.16 |
| DDDR | 68.14 | 64.34 | 51.38 | 30.42 | 78.23 | 64.00 | 77.56 | 69.05 | 60.34 | 36.50 |
| Re-Fed | 82.75 | 70.53 | 74.18 | 70.84 | 76.98 | 73.44 | 81.66 | 87.48 | 80.74 | 67.72 |
| FeDMRA | 83.06 | 80.63 | 83.26 | 86.52 | 91.77 | 87.34 | 82.71 | 87.64 | 85.26 | 83.35 |

Table 2: Classification accuracy A_last and A_avg under heterogeneity α = 0.5 (%).

| Method | FCIL-M A_avg | FCIL-M A_last | FCIL-A A_avg | FCIL-A A_last | FCIL-H A_avg | FCIL-H A_last | FDIL A_avg | FDIL A_last | FCDIL A_avg | FCDIL A_last |
|---|---|---|---|---|---|---|---|---|---|---|
| UP | 86.61 | 85.50 | 88.62 | 88.78 | 95.06 | 95.87 | 84.10 | 89.18 | 83.61 | 86.95 |
| iCaRL+FL | 72.05 | 58.88 | 65.36 | 57.30 | 61.89 | 46.51 | 64.57 | 70.96 | 80.53 | 77.49 |
| UACL+FL | 52.06 | 25.20 | 54.62 | 49.32 | 66.59 | 48.44 | 51.21 | 46.05 | 51.60 | 53.63 |
| MFCL | 56.40 | 15.34 | 39.25 | 8.37 | 77.91 | 51.20 | 75.50 | 68.48 | 38.92 | 18.86 |
| DDDR | 40.75 | 17.80 | 48.74 | 28.15 | 73.21 | 59.02 | 76.01 | 72.30 | 41.17 | 14.72 |
| Re-Fed | 82.75 | 70.53 | 74.18 | 70.84 | 76.98 | 73.44 | 73.85 | 79.27 | 78.41 | 63.45 |
| FeDMRA | 77.29 | 70.25 | 80.00 | 74.66 | 88.48 | 82.95 | 75.83 | 79.02 | 83.57 | 85.79 |

Local Updating. For client-side training, we propose an augmented optimization objective. First, as in most image classification tasks, we employ the cross-entropy loss L_CE to minimize the divergence between the model's predicted distribution and the true distribution, thereby learning discriminative class-specific features.

Furthermore, as analyzed in (Puli et al., 2024; De Boer et al., 2005), the L_CE loss prompts the model to quickly learn easily classifiable features by imposing heavier penalties on misclassifications. However, this may lead to the neglect of stable feature learning and is also one of the primary causes of catastrophic forgetting in incremental learning scenarios. Inspired by this, we introduce the L_MG loss, which minimizes the L2 norm of the model's output space to constrain the magnitude of model outputs. This promotes more stable training and enhances generalization capability via:

L_MG = log( 1 + ‖ F_{w_{c,t}}(x̄_{c,t}) ‖²_2 )    (11)

To prevent local models from overfitting to local data during training and thereby causing global objective drift, this paper introduces a regularization loss term, denoted L_KL, which enforces consistency by measuring the Kullback-Leibler (KL) divergence (van Erven & Harremos, 2014) between the output distributions of the local and global models. As formulated in Eq.
12, minimizing this term achieves a dual objective: (1) it constrains local updates to prevent significant deviation from the global knowledge, thereby alleviating catastrophic forgetting; and (2) it preserves model plasticity, ensuring the model can effectively adapt to the novel information presented by the new task:

L_KL = KL( S(F(x̄_{c,t}; w_{g,t-1})) ‖ S(F(x̄_{c,t}; w_{c,t})) )    (12)

where S(·) is the softmax function. In summary, the client's final objective function is:

L_FINAL = L_CE + L_KL + δ L_MG    (13)

where δ acts as a balancing factor that controls the strength of the regularization term.

Table 3: Average accuracy with different configurations under various heterogeneities α (%).

α = 0.5:
| Task | a=0.8, λ=0.8 | a=0.8, λ=0.4 | a=0.4, λ=0.8 | a=0.4, λ=0.4 |
|---|---|---|---|---|
| FCIL-M | 76.19 | 77.08 | 76.91 | 77.29 |
| FCIL-A | 78.45 | 79.36 | 80.00 | 78.36 |
| FCIL-H | 87.83 | 88.48 | 88.26 | 88.07 |
| FDIL | 72.44 | 73.13 | 75.83 | 72.41 |
| FCDIL | 82.64 | 80.26 | 83.57 | 82.18 |

α = 1.0:
| Task | a=0.8, λ=0.8 | a=0.8, λ=0.4 | a=0.4, λ=0.8 | a=0.4, λ=0.4 |
|---|---|---|---|---|
| FCIL-M | 82.12 | 82.58 | 82.44 | 83.06 |
| FCIL-A | 82.75 | 82.27 | 82.91 | 83.26 |
| FCIL-H | 91.51 | 91.16 | 91.77 | 91.72 |
| FDIL | 80.70 | 80.72 | 82.71 | 81.55 |
| FCDIL | 83.95 | 84.42 | 84.49 | 85.26 |

α = 5.0:
| Task | a=0.8, λ=0.8 | a=0.8, λ=0.4 | a=0.4, λ=0.8 | a=0.4, λ=0.4 |
|---|---|---|---|---|
| FCIL-M | 80.97 | 80.87 | 82.66 | 82.90 |
| FCIL-A | 84.23 | 83.80 | 83.77 | 83.89 |
| FCIL-H | 91.71 | 93.19 | 91.03 | 91.75 |
| FDIL | 80.75 | 81.05 | 81.97 | 81.36 |
| FCDIL | 84.37 | 83.65 | 85.02 | 83.46 |

5. Experiments and Results Analysis

5.1. Experiment Setup

Datasets and FCL Settings. We selected three white blood cell datasets in the medical field for the experiments: Matek-19 (Matek et al., 2019), Acevedo-20 (Acevedo et al., 2020), and (Bodzas et al., 2023). We adopt a setup similar to the literature (Sadafi et al., 2023), partitioning the three datasets as follows. FCIL: the acronyms FCIL-M, FCIL-A, and FCIL-H denote CIL on Matek-19, Acevedo-20, and (Bodzas et al.
, 2023), respectively; FDIL: each incremental stage consists of a dataset reflecting a shift in the distribution domain; FCDIL: in more complex scenarios, the distribution of streaming data exhibits a combination of characteristics where both the label space and the domain space grow progressively. The dataset and task specifications are provided in Appendix B.

Baselines. To ensure fair comparison with other significant works, we follow the protocol proposed in (Rebuffi et al., 2017) for constructing the FCL task. Using the representative federated learning model FedAvg, we evaluate two centralized methods combined with FL, namely iCaRL (Rebuffi et al., 2017)+FL and UACL (Sadafi et al., 2023)+FL; three models specifically designed for federated incremental learning, namely MFCL (Babakniya et al., 2023), DDDR (Liang et al., 2024), and Re-Fed (Li et al., 2024); and the upper performance bound UP under ideal conditions. We report the final accuracy A_last upon completion of the last streaming task and the average accuracy A_avg across all tasks.

Implementation Details. All experiments are implemented in the PyTorch framework. All models are trained on an NVIDIA RTX 4090 GPU with 24 GB of memory and use ResNet18 as the backbone. FeDMRA is trained with the Adam optimizer and a learning rate of 0.003. Inputs are resized to 224 × 224 pixels, and the batch size is set to 64. In addition, the settings of the global example set M and the maximum client-side example set m_max differ across tasks. For FCIL on the three datasets: M = 1200 and m_max = 400. For FDIL and FCDIL: M = 3000 and m_max = 800.

5.2. Comparison Analysis

Test Accuracy. We evaluate our approach under the FCIL, FDIL, and FCDIL scenarios. Client data heterogeneity is simulated using a Dirichlet distribution Dir(α). Tab. 1 compares the performance of the baseline algorithms at heterogeneity parameter α = 1.0.
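Simulating heterogeneity with Dir(α) typically means sampling, for each class, a vector of client proportions from a Dirichlet distribution and splitting that class's samples accordingly; smaller α gives more skewed splits. A sketch of this common protocol (not the paper's exact partitioning code):

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients with per-class Dir(alpha)
    proportions, the standard non-IID simulation protocol."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_idx = [[] for _ in range(num_clients)]
    for y in np.unique(labels):
        idx = rng.permutation(np.where(labels == y)[0])
        props = rng.dirichlet([alpha] * num_clients)      # client shares of class y
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for c, part in enumerate(np.split(idx, cuts)):
            client_idx[c].extend(part.tolist())
    return client_idx

# 100 samples over two classes, split across 5 clients at alpha = 0.5.
parts = dirichlet_partition([0] * 50 + [1] * 50, num_clients=5, alpha=0.5)
```

Every sample lands on exactly one client, and lowering α concentrates each class on fewer clients.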
The iCaRL algorithm, which reuses original samples, outperforms generative methods such as MFCL and DDDR that rely on pseudo-sample generation. This stems from generative models losing fidelity and introducing synthetic noise during sample reconstruction. An exception occurs on the high-resolution HLwbc dataset, where the richer feature space enables generative methods to better preserve discriminative features. FeDMRA demonstrates significant advantages on the FDIL and FCDIL tasks thanks to its dynamic memory allocation mechanism, whereas generative approaches struggle to capture dynamic patterns in changing data distributions. Furthermore, the Re-Fed method performs poorly on most tasks, which we attribute to its combination of a heterogeneity-aware selection paradigm with direct storage of representative samples. We also conducted experiments with a more heterogeneous setting, α = 0.5, in Tab. 2. Although the performance trends align with Tab. 1, all methods exhibit significant degradation. This indicates that the long-tail distribution induced by extreme data heterogeneity inevitably hurts model performance.

Hyper-parameters. In addition, we conduct experiments under different degrees of heterogeneity and further explore the settings of the parameters a and λ. Tab. 3 presents the average performance of FeDMRA under varying configurations of these key parameters. a is the weight assigned to the global proportion when computing the data-space importance of a given class. We find that a = 0.4 is optimal in most settings, suggesting relative robustness to data heterogeneity. λ serves as the weight of the regularization term in the update of the local information model: the higher the value of λ, the heavier the penalty on iterative updates that deviate from the global model. The results in the table confirm that when heterogeneity is strong, placing greater emphasis on the global distribution is beneficial; when heterogeneity is moderate, prioritizing the local data distribution yields better performance; and as the data approaches the IID setting, the optimal strategy balances attention between the global and local distributions.

Table 4: Ablation study of FeDMRA (%).

                  FCIL-H                    FDIL                      FCDIL
dy         ✓     ✓     ×     ×      ✓     ✓     ×     ×      ✓     ✓     ×     ×
L_MG       ✓     ×     ✓     ×      ✓     ×     ✓     ×      ✓     ×     ✓     ×
L_KL       ✓     ×     ×     ✓      ✓     ×     ×     ✓      ✓     ×     ×     ✓
A_avg ↑    91.77 91.06 90.98 90.51  82.71 81.60 78.61 81.82  85.26 82.74 78.98 80.98
A_last ↑   87.34 87.58 84.30 85.90  87.64 87.70 86.51 88.20  83.35 82.43 61.41 71.52
* dy - Dynamic Memory Replay strategy

Figure 3: Our method under different configurations (test accuracy, %): (a) the weight δ of L_MG; (b) m_max for FCIL; (c) m_max for FDIL and FCDIL.

Important parameter settings. We show the impact of other important parameters in Fig. 3. Specifically, (a) shows A_avg under different weights δ of the L_MG loss. In most tasks, the optimal classification accuracy is achieved at δ = 0.1. (b) and (c) illustrate the impact of the maximum client-side exemplar memory on performance under different tasks. Since the sizes of the clients' training sets vary significantly across tasks, we configure different global exemplar memory budgets and client-level maximum exemplar storage limits. For FCIL, M = 1200 and m_max ranges from 250 to 500 in increments of 50. For FDIL and FCDIL, M = 3000 and m_max ranges from 600 to 1200 in increments of 100. We observe that for FCIL, the method becomes stable at m_max = 400.
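As a rough illustration of how a blended local/global importance score could drive per-client budget allocation under the caps M and m_max, consider the sketch below. The scoring rule, names, and proportional split are our simplifying assumptions; the paper's actual mechanism additionally incorporates model-parameter update shifts.

```python
import numpy as np

def allocate_memory(class_counts, M, m_max, a=0.4):
    """Blend each client's share of the data with the global class
    distribution to score (client, class) pairs, then split the global
    exemplar budget M proportionally, capped at m_max per client.
    class_counts: (num_clients, num_classes) integer sample counts."""
    total = class_counts.sum()
    global_prop = class_counts.sum(axis=0) / total    # (num_classes,)
    local_share = class_counts / total                # (clients, classes)
    importance = a * global_prop[None, :] + (1 - a) * local_share
    client_score = importance.sum(axis=1)             # one score per client
    alloc = np.floor(M * client_score / client_score.sum()).astype(int)
    return np.minimum(alloc, m_max)                   # respect per-client cap

counts = np.array([[300, 50, 10],    # client 0: skewed, data-rich
                   [20, 200, 30],    # client 1
                   [10, 20, 60]])    # client 2: data-poor
alloc = allocate_memory(counts, M=1200, m_max=800)
```

Blending with the global proportion (weight a) pulls the allocation toward equal shares, which tempers the advantage of data-rich clients and reflects the fairness consideration discussed above.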
For FDIL and FCDIL, the algorithm achieves an ideal result at m_max = 800. This experiment supports the results in Tab. 1.

Ablation Study. To verify the effectiveness of each component, we conducted ablation experiments on three datasets for different tasks in Tab. 4. The results show that under most incremental tasks, using only dy already reaches suboptimal performance. This demonstrates that leveraging client data-distribution characteristics, overcoming fixed-memory constraints, and rationalizing exemplar storage allocation can effectively enhance network performance. Meanwhile, the L_MG component performs better on A_avg and L_KL performs better on A_last, which reflects the complementary advantages of combining the two for improving average performance and combating forgetting.

6. Conclusion

This paper extends to broader continual learning scenarios and proposes a novel federated incremental learning framework called FeDMRA. The framework introduces a dual-space dynamic evaluation mechanism on the central server, which combines the class-distribution characteristics of the data sequences with the update shifts of the model-parameter sequences to adaptively allocate memory space for the example set, thus effectively overcoming the limitation of fixed memory budgets per client in heterogeneous scenarios. At the same time, it designs optimization objectives for updating the edge clients. This study is not constrained by the data storage format and provides a new perspective for replay-based methods. Extensive experiments validate the effectiveness of this approach across various scenarios.

Impact Statement

This article introduces a research project aimed at promoting machine learning in federated healthcare systems.
Our work may have various social impacts, but none of them are considered significant enough to be specifically highlighted here.

References

Acevedo, A., Merino, A., Alférez, S., Molina, Á., Boldú, L., and Rodellar, J. A dataset of microscopic peripheral blood cell images for development of automatic recognition systems. Data in Brief, 30:105474, 2020.

Asghar, R., Kumar, S., Shaukat, A., and Hynds, P. Classification of white blood cells (leucocytes) from blood smear imagery using machine and deep learning models: A global scoping review. PLoS ONE, 19(6), 2024.

Babakniya, S., Fabian, Z., He, C., Soltanolkotabi, M., and Avestimehr, S. A data-free approach to mitigate catastrophic forgetting in federated class incremental learning for vision tasks. Advances in Neural Information Processing Systems, 36:66408–66425, 2023.

Belouadah, E. and Popescu, A. IL2M: Class incremental learning with dual memory. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 583–592, 2019.

Bhatia, K., Dhalla, S., Mittal, A., Gupta, S., Gupta, A., and Jindal, A. Integrating explainability into deep learning-based models for white blood cells classification. Computers and Electrical Engineering, 110:108913, 2023.

Bodzas, A., Kodytek, P., and Zidek, J. A high-resolution large-scale dataset of pathological and normal white blood cells. Scientific Data, 10(1):466, 2023.

Cha, H., Park, J., Kim, H., Bennis, M., and Kim, S.-L. Proxy experience replay: Federated distillation for distributed reinforcement learning. IEEE Intelligent Systems, 35(4):94–101, 2020.

Chen, H., Wang, Y., and Hu, Q. Multi-granularity regularized re-balancing for class incremental learning. IEEE Transactions on Knowledge and Data Engineering, 35(7):7263–7277, 2022.

Chen, Y., Sun, X., and Jin, Y. Communication-efficient federated deep learning with layerwise asynchronous model update and temporally weighted aggregation.
IEEE Transactions on Neural Networks and Learning Systems, (10), 2020.

Chen, Z. and Liu, B. Continual learning and catastrophic forgetting. In Lifelong Machine Learning, pp. 55–75. Springer, 2022.

Cheraghian, A., Rahman, S., Fang, P., Roy, S. K., Petersson, L., and Harandi, M. Semantic-aware knowledge distillation for few-shot class-incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2534–2543, 2021.

De Boer, P.-T., Kroese, D. P., Mannor, S., and Rubinstein, R. Y. A tutorial on the cross-entropy method. Annals of Operations Research, 134(1):19–67, 2005.

Dong, J., Wang, L., Fang, Z., Sun, G., Xu, S., Wang, X., and Zhu, Q. Federated class-incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10164–10173, 2022.

Feng, J., Phillips, R. V., Malenica, I., Bishara, A., Hubbard, A. E., Celi, L. A., and Pirracchio, R. Clinical artificial intelligence quality improvement: towards continual monitoring and updating of AI algorithms in healthcare. NPJ Digital Medicine, 5(1):66, 2022.

Guan, H., Yap, P.-T., Bozoki, A., and Liu, M. Federated learning for medical image analysis: A survey. Pattern Recognition, pp. 110424, 2024.

Guo, H., Zhu, F., Liu, W., Zhang, X.-Y., and Liu, C.-L. PiLoRA: Prototype guided incremental LoRA for federated class-incremental learning. In European Conference on Computer Vision, 2024.

Hsu, T.-M. H., Qi, H., and Brown, M. Federated visual classification with real-world data distribution. In European Conference on Computer Vision, pp. 76–92. Springer, 2020.

Huang, W., Ye, M., and Du, B. Learn from others and be yourself in heterogeneous federated learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10143–10153, 2022.

Islam, O., Assaduzzaman, M., and Hasan, M. Z.
An explainable AI-based blood cell classification using optimized convolutional neural network. Journal of Pathology Informatics, 15:100389, 2024.

Jiang, M., Roth, H. R., Li, W., Yang, D., Zhao, C., Nath, V., Xu, D., Dou, Q., and Xu, Z. Fair federated medical image segmentation via client contribution estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16302–16311, 2023.

Kaissis, G. A., Makowski, M. R., Rückert, D., and Braren, R. F. Secure, privacy-preserving and federated machine learning in medical imaging. Nature Machine Intelligence, 2(6):305–311, 2020.

Kang, M., Park, J., and Han, B. Class-incremental learning by knowledge distillation with adaptive feature consolidation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16071–16080, 2022.

Kirkpatrick, J., Pascanu, R., Rabinowitz, N., Veness, J., Desjardins, G., Rusu, A. A., Milan, K., Quan, J., Ramalho, T., Grabska-Barwinska, A., et al. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(13):3521–3526, 2017.

Li, L., Fan, Y., Tse, M., and Lin, K.-Y. A review of applications in federated learning. Computers & Industrial Engineering, 149:106854, 2020a.

Li, T., Sahu, A. K., Talwalkar, A., and Smith, V. Federated learning: Challenges, methods, and future directions. IEEE Signal Processing Magazine, 37(3):50–60, 2020b.

Li, Y., Li, Q., Wang, H., Li, R., Zhong, L. W., and Zhang, G.
Towards efficient replay in federated incremental learning. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 12820–12829, 2024.

Li, Z. and Hoiem, D. Learning without forgetting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(12):2935–2947, 2017.

Liang, J., Zhong, J., Gu, H., Lu, Z., Tang, X., Dai, G., Huang, S., Fan, L., and Yang, Q. Diffusion-driven data replay: A novel approach to combat forgetting in federated class continual learning. In European Conference on Computer Vision, pp. 303–319. Springer, 2024.

Liu, H., Gu, L., Chi, Z., Wang, Y., Yu, Y., Chen, J., and Tang, J. Few-shot class-incremental learning via entropy-regularized data-free replay. In European Conference on Computer Vision, pp. 146–162. Springer, 2022.

Masana, M., Liu, X., Twardowski, B., Menta, M., Bagdanov, A. D., and Van De Weijer, J. Class-incremental learning: survey and performance evaluation on image classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(5):5513–5533, 2022.

Matek, C., Schwarz, S., Marr, C., and Spiekermann, K. A single-cell morphological dataset of leukocytes from AML patients and non-malignant controls (AML-Cytomorphology LMU). The Cancer Imaging Archive (TCIA), 2019.

McMahan, B., Moore, E., Ramage, D., Hampson, S., and y Arcas, B. A. Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics, pp. 1273–1282. PMLR, 2017.

Mittal, S., Galesso, S., and Brox, T. Essentials for class incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3513–3522, 2021.

Ng, D., Lan, X., Yao, M. M.-S., Chan, W. P., and Feng, M. Federated learning: a collaborative effort to achieve better medical imaging models for individual sites that have small labelled datasets. Quantitative Imaging in Medicine and Surgery, 11(2):852, 2021.
Pennisi, M., Salanitri, F. P., Bellitto, G., Casella, B., Aldinucci, M., Palazzo, S., and Spampinato, C. FedER: Federated learning through experience replay and privacy-preserving data synthesis. Computer Vision and Image Understanding, 238:103882, 2024.

Pian, W., Mo, S., Guo, Y., and Tian, Y. Audio-visual class-incremental learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 7799–7811, 2023.

Pritee, K. and Garg, R. D. Optimized convolutional neural network model for multilevel classification in leukemia diagnosis using Tversky loss. Artificial Intelligence in Health, 0(0):4710, 2025. ISSN 3041-0894.

Puli, A., Zhang, L., Wald, Y., and Ranganath, R. Don't blame dataset shift! Shortcut learning due to gradients and cross entropy. In Proceedings of the 37th International Conference on Neural Information Processing Systems, NIPS '23. Curran Associates Inc., 2024.

Qi, D., Zhao, H., and Li, S. Better generative replay for continual federated learning. In The Eleventh International Conference on Learning Representations, 2023.

Rebuffi, S.-A., Kolesnikov, A., Sperl, G., and Lampert, C. H. iCaRL: Incremental classifier and representation learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2001–2010, 2017.

Ren, S., Hu, Y., Chen, S., and Wang, G. Federated distillation for medical image classification: Towards trustworthy computer-aided diagnosis. arXiv preprint arXiv:2407.02261, 2024.

Sadafi, A., Salehi, R., Gruber, A., Boushehri, S. S., Giehr, P., Navab, N., and Marr, C. A continual learning approach for cross-domain white blood cell classification. In MICCAI Workshop on Domain Adaptation and Representation Transfer, pp. 136–146. Springer, 2023.

Shenaj, D., Toldo, M., Rigon, A., and Zanuttigh, P. Asynchronous federated continual learning.
In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5055–5063, 2023.

van Erven, T. and Harremoës, P. Rényi divergence and Kullback-Leibler divergence. IEEE Transactions on Information Theory, 60(7):3797–3820, 2014. doi: 10.1109/TIT.2014.2320500.

Wang, F.-Y., Zhou, D.-W., Ye, H.-J., and Zhan, D.-C. FOSTER: Feature boosting and compression for class-incremental learning. In European Conference on Computer Vision, pp. 398–414. Springer, 2022a.

Wang, L., Zhang, X., Su, H., and Zhu, J. A comprehensive survey of continual learning: Theory, method and application. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(8):5362–5383, 2024a.

Wang, Q., He, Y., Dong, S., Gao, X., Wang, S., and Gong, Y. Non-exemplar domain incremental learning via cross-domain concept integration. In European Conference on Computer Vision, pp. 144–162. Springer, 2024b.

Wang, Y., Huang, Z., and Hong, X. S-Prompts learning with pre-trained transformers: An Occam's razor for domain incremental learning. Advances in Neural Information Processing Systems, 35:5682–5695, 2022b.

Wu, N., Yu, L., Jiang, X., Cheng, K.-T., and Yan, Z. FedNoRo: towards noise-robust federated learning by addressing class imbalance and label noise heterogeneity. IJCAI '23, 2023. ISBN 978-1-956792-03-4.

Wu, X., Xu, Z., and Tong, R. K.-y. Continual learning in medical image analysis: A survey. Computers in Biology and Medicine, 182:109206, 2024. ISSN 0010-4825.

Xie, J., Yan, S., and He, X. General incremental learning with domain-aware categorical representations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14351–14360, 2022.

Yang, Q., Liu, Y., Cheng, Y., Kang, Y., Chen, T., and Yu, H. Federated Learning. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan & Claypool Publishers, 2019. ISBN 978-3-031-00457-5.
Ye, M., Fang, X., Du, B., Yuen, P., and Tao, D. Heterogeneous federated learning: State-of-the-art and research challenges. ACM Computing Surveys, 56:1–44, 2023a.

Ye, M., Fang, X., Du, B., Yuen, P. C., and Tao, D. Heterogeneous federated learning: State-of-the-art and research challenges. ACM Computing Surveys, 56(3):1–44, 2023b.

Zeng, F., Du, Z., Li, G., Li, C., Li, Y., He, X., An, Y., and Wang, H. Rapid detection of white blood cells using hyperspectral microscopic imaging system combined with multi-data Faster RCNN. Sensors and Actuators B: Chemical, 389:133865, 2023.

Zhao, L., Lu, J., Xu, Y., Cheng, Z., Guo, D., Niu, Y., and Fang, X. Few-shot class-incremental learning via class-aware bilateral distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11838–11847, 2023.

Zhu, Z., Hong, J., and Zhou, J. Data-free knowledge distillation for heterogeneous federated learning. In International Conference on Machine Learning, pp. 12878–12889. PMLR, 2021.

A. FeDMRA

Algorithm 2 presents the process from Section 4.2 in which each client fills its example set according to the allocated per-category share and updates its local model.
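The gradient-norm ranking at the heart of this client-side selection step can be sketched with a small proxy model. The logistic-regression stand-in below and all names are our illustrative assumptions, not the paper's actual information model: it trains for a few iterations, accumulates each sample's per-sample loss-gradient norm, and keeps the top-ranked samples.

```python
import numpy as np

def select_exemplars(X, y, budget, epochs=5, lr=0.1, seed=0):
    """Rank samples by their average loss-gradient norm over a few
    training iterations of a small proxy model, then keep the top-k.
    Here the proxy is plain multinomial logistic regression."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    k = len(np.unique(y))
    W = rng.normal(scale=0.01, size=(d, k))
    onehot = np.eye(k)[y]
    grad_norm = np.zeros(n)
    for _ in range(epochs):
        logits = X @ W
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        err = p - onehot                      # per-sample softmax error
        # Per-sample gradient of cross-entropy w.r.t. W is outer(x, err);
        # its Frobenius norm factorises as ||x|| * ||err||.
        grad_norm += np.linalg.norm(X, axis=1) * np.linalg.norm(err, axis=1)
        W -= lr * X.T @ err / n               # full-batch gradient step
    grad_norm /= epochs
    return np.argsort(-grad_norm)[:budget]    # hardest samples first

X = np.random.default_rng(1).normal(size=(200, 8))
y = np.random.default_rng(2).integers(0, 3, size=200)
keep = select_exemplars(X, y, budget=40)
```

Sorting in descending order of the averaged gradient norm mirrors the "sample old samples by sorting them in descending order of the gradient" step, with the proxy model playing the role of the locally updated information model.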
Algorithm 2 FeDMRA: Sample Selection for the Example Set

Input: distributed datasets for t tasks {D^1, D^2, ..., D^t}; number of clients C; global model parameters w_{g,t-1}; partitioned storage size m^y_{c,t-1}; initial information-model parameters v_c; number of iterations of the client information model e.
Output: local model w_{c,t} and class distribution.

1: for t = 1, 2, ..., T do
2:   Receive the global model parameters w_{g,t-1} and the allocated storage size m^y_{c,t-1};
3:   for c = 1, 2, ..., C do
4:     v^t_{c,e} ← take D̄^{t-1}_c as the input of Eq. (9);
5:     Ḡ_{x_i} ← the average gradient norm of each sample over e iterations;
6:     D^t_{c,cache} ← sample old samples by sorting them in descending order of gradient norm by Eq. (10);
7:     w_{c,t} ← update the local model with D̄^t_c by Eq. (13).
8:   end for
9: end for

B. Datasets and Settings

Datasets. We use three white blood cell datasets. Among them, Matek-19 contains single white-blood-cell samples from 100 patients with acute myeloid leukemia and 100 non-leukemia patients. Acevedo-20 was provided by the core laboratory of a hospital in Barcelona and has more than 14,000 images. (Bodzas et al., 2023) is a high-resolution dataset containing 16,027 annotated normal and pathological white-blood-cell samples from 78 patients; we refer to it as HLwbc hereinafter.

Settings. We construct multiple continual learning scenarios based on the three datasets, including FCIL, FDIL, and FCDIL. Tab. 5 shows the class settings for each task in each scenario.

Table 5: Task settings of FCIL, FDIL, and FCDIL.

Task     Class Number per Task
FCIL-M   4, 3, 3, 3
FCIL-A   4, 3, 3
FCIL-H   4, 2, 2
FDIL     13, 10, 8
FCDIL    4, 3, 3, 3, 4, 3, 3, 4, 4

Baselines. We compare with the following methods. 1) FL+iCaRL (Rebuffi et al.
, 2017): an adaptation of iCaRL to the federated learning framework, which computes class prototypes to select and replay exemplars for knowledge preservation while using the standard FedAvg strategy for model aggregation. 2) FL+UACL (Sadafi et al., 2023): a continual learning algorithm for medical image analysis that addresses catastrophic forgetting through a novel exemplar-selection strategy for replay; it is likewise adapted to the federated setting via standard FedAvg aggregation. 3) MFCL (Babakniya et al., 2023): a generative model is trained on the server and then shared with the clients; it is used to sample synthetic examples of past data, which are incorporated into the training process for new tasks. 4) DDDR (Liang et al., 2024): a latent diffusion model is used to acquire embeddings of old sample classes; these embeddings are then used to generate pseudo-samples for replay-augmented training. 5) Re-Fed (Li et al., 2024): an efficient exemplar-selection method designed to address catastrophic forgetting under heterogeneous data distributions. 6) UP: the optimal performance upper bound, a scenario in which all historical data from every task is fully accessible; this setting serves only as a theoretical reference, as it is both difficult to achieve and impractical in real-world applications.

C. Example Memory

Tab. 6 compares the memory-allocation strategies for data replay across the different approaches. Specifically, UACL, iCaRL, and Re-Fed directly store raw data at the client level, allowing each client to replay 240 historical samples per task, with each class containing 240/|Y^t_c| samples. In contrast, MFCL and DDDR maintain their original replay mechanisms: MFCL incorporates pseudo-samples in each training batch, while DDDR generates pseudo-samples per class.
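For reference, the prototype-based exemplar selection attributed to the iCaRL+FL baseline above can be sketched as classic herding. This minimal version (names our own, not the paper's code) greedily keeps the samples whose running feature mean stays closest to the class-mean prototype:

```python
import numpy as np

def herding_select(features, budget):
    """iCaRL-style herding: greedily pick exemplars whose running mean
    stays closest to the class-mean feature (the class prototype)."""
    mu = features.mean(axis=0)
    chosen, running_sum = [], np.zeros_like(mu)
    remaining = list(range(len(features)))
    for k in range(1, budget + 1):
        # Pick the sample that brings the exemplar mean nearest to mu.
        gaps = [np.linalg.norm(mu - (running_sum + features[i]) / k)
                for i in remaining]
        best = remaining.pop(int(np.argmin(gaps)))
        chosen.append(best)
        running_sum += features[best]
    return chosen

feats = np.random.default_rng(0).normal(size=(100, 16))
ex = herding_select(feats, budget=10)
```

Run per class with budget 240/|Y^t_c|, this yields the fixed per-class allocation that the dynamic scheme of FeDMRA replaces.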
Our proposed method dynamically adjusts the memory allocation as tasks progress. To ensure a fair comparison under equivalent global memory constraints, UACL, iCaRL, and Re-Fed strictly adhere to the fixed memory budget M, whereas MFCL and DDDR produce total replay quantities that substantially exceed M.

Table 6: Cache size of the example set for each method, taking FCIL-M as an example.

Method                  Memory Size of Each Task
iCaRL+FL (Per-Client)   240×5 | 240×6 | 240×6 | 240×7
UACL+FL (Per-Client)    240×5 | 240×6 | 240×6 | 240×7
MFCL (Per-Batch)        128×BS | 128×BS | 128×BS | 128×BS
DDDR (Per-Class)        240×Y_t | 240×Y_t | 240×Y_t | 240×Y_t
ReFed (Per-Client)      240×5 | 240×6 | 240×6 | 240×7
FeDMRA (Per-Client)     240×5 | 322, 322, 190, 209, 152, 240 | 311, 400, 47, 309, 132, 190 | 346, 61, 367, 122, 370, 134, 240
* BS - BatchSize

D. Computational and Communication Complexity

We analyze the method's computational and communication complexity along the time and space dimensions. Assume the number of model parameters updated by a client is P and the computational cost per sample is d.

Communication complexity. The model parameters and category-distribution information uploaded by all clients in each round give

O(R · (|C| · P + 2|Y|))    (14)

Computational complexity. When a new task sequence arrives, each client updates its local information model on the data of the previous task and then trains the model on the new data. The computational complexity on the clients is therefore

O(R · e · (Σ_c |D̄^t_c| · d + Σ_c |D̄^{t-1}_c| · d)) ≈ O(2R · e · Σ_c |D_c| · d)    (15)

The computational complexity of the server dynamically allocating example-set space and fusing model parameters is

O(R · (|C| + 1) · P) + O(|Y|)    (16)

Merging Eq. (15) and Eq. (16), the total computational complexity of FeDMRA is

O(R · ((2|C| + 1) · P + 2|Y| + 2e · Σ_c (|D_c| · d)) + |Y|)    (17)

E.
Discussion

We address the issue of exacerbated forgetting on non-IID data under complex tasks. While conventional data-replay schemes raise privacy concerns, we would like to clarify that FeDMRA is not limited to a particular form of data replay but focuses on allocating storage space for data. In addition, our experiments focus on blood-cell disease classification and construct a rich set of task scenarios. However, relying on a single data domain may limit the generality of the model for data from other sources; in practice, it is difficult to collect medical datasets from multiple domains.