FederatedFactory: Generative One-Shot Learning for Extremely Non-IID Distributed Scenarios



Andrea Moleri 1,2,*, Christian Internò 3,*, Ali Raza 1, Markus Olhofer 1, David Klindt 4, Fabio Stella 2,†, and Barbara Hammer 3,†

1 Honda Research Institute Europe, Germany
2 University of Milan-Bicocca, Italy
3 Bielefeld University, Germany
4 Cold Spring Harbor Laboratory, U.S.
* Equal contribution. † Co-advised.

Abstract. Federated Learning (FL) enables distributed optimization without compromising data sovereignty. Yet, where local label distributions are mutually exclusive, standard weight aggregation fails due to conflicting optimization trajectories. Often, FL methods rely on pretrained foundation models, introducing unrealistic assumptions. We introduce FederatedFactory, a zero-dependency framework that inverts the unit of federation from discriminative parameters to generative priors. By exchanging generative modules in a single communication round, our architecture supports ex nihilo synthesis of universally class-balanced datasets, eliminating gradient conflict and external prior bias entirely. Evaluations across diverse medical imagery benchmarks, including MedMNIST and ISIC2019, demonstrate that our approach recovers centralized upper-bound performance. Under pathological heterogeneity, it lifts baseline accuracy from a collapsed 11.36% to 90.57% on CIFAR-10 and restores ISIC2019 AUROC to 90.57%. Additionally, this framework facilitates exact modular unlearning through the deterministic deletion of specific generative modules.

Keywords: Federated Learning · Generative Synthesis · Non-IID Data

1 Introduction

FL provides a decentralized framework to optimize statistical models across K distinct clients while strictly preserving local data sovereignty [36].
However, the theoretical convergence guarantees of traditional FL rely on independently and identically distributed (IID) data, an assumption violated by extreme statistical heterogeneity in real-world applications like multi-institutional medical imaging [40, 41]. The structural fragility of standard parameter aggregation is exposed most aggressively under pathological label skew [21, 25], where each client holds samples from only a few, or even just one, class of data. Specifically, we define and analyze the extreme single-class silo regime, where each client k ∈ {1, ..., K} possesses a local dataset D_k that contains exclusively one class, yielding mutually disjoint label supports Y_k. Optimizing shared discriminative parameters w fails unconditionally in this setting. Lacking the counterfactual data necessary to form an inter-class decision boundary, the local empirical risks L_k(w) produce optimization trajectories that interfere with one another.

To avoid the communication bottlenecks and conflicting gradient trajectories of iterative FL, One-Shot Federated Learning (OSFL) attempts to aggregate knowledge in a single round of communication [14, 32]. Recent generative OSFL methods synthesize datasets using pretrained Foundation Models (FMs) as universal priors [2, 49]. While effective for general domains, this dependency is unreliable for specialized applications such as medical diagnosis [23, 35, 50]. By projecting a rare target distribution onto an external representation space, these methods effectively discard the off-manifold feature components x_⊥ that constitute the true diagnostic signal [16].

We introduce FederatedFactory, a framework that aims to recover the centralized upper-bound performance under the extreme single-class silo assumption via a novel, strictly zero-dependency Generative OSFL architecture.
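The gradient interference described above can be made concrete with a toy example (illustrative only, not the paper's experimental setup): two single-class silos computing cross-entropy gradients on shared softmax parameters produce nearly opposite update directions, so averaging them cancels rather than forms a decision boundary.

```python
# Toy illustration: two single-class silos yield opposing gradients on
# shared softmax weights, so their average carries almost no signal.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(2, 2))  # shared weights: 2 classes x 2 features

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def local_grad(W, X, y):
    """Mean cross-entropy gradient for one client's single-class data."""
    G = np.zeros_like(W)
    for x in X:
        p = softmax(W @ x)
        p[y] -= 1.0           # dL/dlogits = p - one_hot(y)
        G += np.outer(p, x)   # dL/dW
    return G / len(X)

# Client A holds only class 0, client B only class 1, on similar features.
X = rng.normal(size=(32, 2)) + 1.0
gA = local_grad(W, X, y=0)
gB = local_grad(W, X, y=1)

cos = (gA * gB).sum() / (np.linalg.norm(gA) * np.linalg.norm(gB))
print(f"cosine(grad_A, grad_B) = {cos:.3f}")  # strongly negative
```

The strongly negative cosine similarity between the two local gradients is the conflict that parameter averaging cannot resolve.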
Correspondence: a.moleri@campus.unimib.it; christian.interno@uni-bielefeld.de
Code: https://github.com/andreamoleri/FederatedFactory

We invert the unit of federation from discriminative parameter matrices W to localized generative prior parameters θ_k. Each client independently trains and transmits a generative Factory G_{θ_k} exactly once, supporting both a centralized architecture (in consortiums where a central aggregator can be trusted) and a fully decentralized Peer-to-Peer (P2P) network mesh (in consortiums in which a central aggregator cannot be trusted). Specifically, in this work we use the computationally efficient EDM2 diffusion model [28, 29]. The server concatenates these decoupled generative structures into a universal prior, synthesizing class-balanced datasets ex nihilo from a standard latent space Z. By relying exclusively on models trained directly on the true localized data distributions, FederatedFactory explicitly avoids projection errors and eliminates the external prior bias inherent to FM-reliant methods.

Hypothesis: We hypothesize that shifting the unit of federation from discriminative parameters to generative priors enables OSFL to achieve performance comparable to centralized baselines in pathologically non-IID scenarios, strictly without raw data exchange or reliance on external FMs.

We summarize our contributions as follows:

– Robustness to Extreme Heterogeneity: FederatedFactory recovers centralized performance under pathological single-class silos where standard methods collapse (e.g., CIFAR-10 [31] Accuracy 11.36% → 90.57%, and ISIC2019 [43] AUROC 47.31% → 90.57%).
– Zero-Dependency Federation: By decoupling data synthesis from external pre-trained FMs, our protocol relies exclusively on localized priors. As formalized in Theorem 1, this bounds the global risk strictly by the local generative error (ε̄).
– One-Shot Communication Efficiency: FederatedFactory relies on a single communication round (C_rounds = 1), avoiding expensive iterations.
– Modular Machine Unlearning: The framework guarantees exact modular unlearning [10]. Removing a client requires only the structural deletion of their corresponding parameter coordinates (Γ_{:,k} ← ∅).

2 Related Work

Optimization Under Distribution Shift. Federated methods (e.g., FedDyn [1], FedProx [33], SCAFFOLD [27]) handle statistical heterogeneity by bounding local updates. However, they fail when local label sets are entirely disjoint (Y_i ∩ Y_j = ∅). Despite strong collaboration incentives in this extreme regime (e.g., isolated hospitals needing generalized models), disjoint labels cause actively divergent gradients [51]. Without overlapping classes to anchor a shared feature space, proximal constraints cannot align these conflicting trajectories to form coherent inter-class decision boundaries [33].

One-Shot FL and External Priors. OSFL bypasses iterative gradient conflicts via a single upstream transmission. Recent diffusion frameworks (e.g., FedLMG [46], FedSDE [38], FedDEO [47]) synthesize datasets centrally but rely heavily on pretrained FMs (e.g., CLIP [39], Stable Diffusion [4]). Relying on an external manifold M_FM inherently projects away rare features x_⊥; in medical imaging, this risks texture bias and semantic hallucination [11]. We eliminate this FM dependency by transmitting locally trained generative priors θ_k, achieving ex nihilo one-shot synthesis exclusively from true local distributions. Other approaches collaboratively train generative models to construct synthetic datasets. Notably, the Diffusion Federated Dataset [15] models generation as cooperative sampling from diffusion energy-based models. While analytically robust, its reliance on iterative communication rounds increases communication overhead.
3 Background and Mathematical Preliminaries

Federated Learning under Single-Class Silo Regime. In standard FL model aggregation, a decentralized network of K clients aims to aggregate local empirical risks over shared discriminative parameters w ∈ R^m in order to effectively approximate the global optimum.

Fig. 1: The Spectrum of Heterogeneity. (a) Ideal IID data (uniform overlap, α → ∞). (b) Dirichlet-distributed skew (imbalanced overlap, α = 0.5). (c) Single-Class Silo (pathologically disjoint supports, α → 0), representing the extreme theoretical limit.

Takeaway I: Under the single-class silo regime, standard FL collapses. This extreme label skew induces gradient conflict across clients, rendering standard parameter aggregation incapable of forming a coherent global decision boundary.

The de facto standard FL approach [36] formally attempts to minimize the global objective function

\min_{w} \mathcal{L}(w) := \frac{1}{K} \sum_{k=1}^{K} \mathcal{L}_k(w),

where \mathcal{L}_k(w) = \mathbb{E}_{(x,y)\sim p_k}[\ell(w; x, y)] is the local objective representing the expected loss over the true data distribution p_k at client k, and ℓ denotes the per-sample loss function. While aggregators such as FedAvg [36] succeed when the local datasets D_k are IID, this paradigm collapses under severe non-IID settings. We parameterize label skewness via a Dirichlet distribution Dir_C(α). As α → 0 in a cross-silo setting with K clients and C global classes, we reach the pathological Single-Class Silo regime (Figure 1c). Here, each client dataset D_k contains exactly one unique class, yielding strictly disjoint label sets (Y_i ∩ Y_j = ∅ for i ≠ j). Consequently, local optimization trajectories diverge, causing severe gradient conflict [48, 51].
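The Dirichlet label-skew parameterization can be sketched with a hypothetical partitioning helper (not the paper's code): for each class, a Dir(α) draw allocates that class's samples across clients, and as α → 0 each class concentrates on a single client, recovering the silo regime of Figure 1c.

```python
# Minimal sketch of Dirichlet label partitioning: alpha -> infinity
# approaches IID; alpha -> 0 approaches the single-class silo regime.
import numpy as np

def dirichlet_partition(labels, num_clients, alpha, rng=None):
    """Split sample indices across clients with label skew Dir(alpha)."""
    rng = rng or np.random.default_rng(0)
    clients = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # Proportion of class c assigned to each client.
        props = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for k, part in enumerate(np.split(idx, cuts)):
            clients[k].extend(part.tolist())
    return clients

labels = np.repeat(np.arange(3), 100)            # C = 3 classes, 100 samples each
silos = dirichlet_partition(labels, 3, alpha=1e-4)
print([sorted(set(int(labels[i]) for i in s)) for s in silos])
```

With α = 1e-4 each class's Dirichlet draw is nearly one-hot, so virtually all of a class's samples land on one client; with large α the same helper produces near-uniform IID splits.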
FederatedFactory Formal Problem Statement. Our objectives challenge the standard setting defined in Sec. 3. We operate strictly within the cross-silo regime, which stands in contrast to the more common cross-device setting. While cross-device learning involves massive populations of transient, resource-constrained mobile or IoT units, the cross-silo regime is characterized by a small, fixed number of stakeholders (such as hospitals or financial institutions) possessing localized, high-capacity computational resources and persistent connectivity [26]. We aim to design an FL optimization protocol A that recovers the optimal centralized decision boundary while operating under severe decentralized restrictions. Let D_union = ∪_{k=1}^{K} D_k denote the inaccessible theoretical centralized dataset, and let w* = argmin_w E_{(x,y)∼D_union}[ℓ(w, x, y)] represent the optimal parameters obtained via centralized training. We model the federated protocol A as a set of local client functions f_k generating upstream messages M_k = f_k(D_k), and a server aggregation function g such that the final synthesized model has parameters w_A = g(M_1, ..., M_K). We formalize this objective as a constrained optimization problem seeking to minimize the excess global risk:

\min_{f_1,\dots,f_K,\,g} \Big[ \mathbb{E}_{(x,y)\sim D_{\mathrm{union}}}[\ell(w_A, x, y)] - \mathbb{E}_{(x,y)\sim D_{\mathrm{union}}}[\ell(w^{*}, x, y)] \Big]
s.t.  C1 (Pathological Skew):  \mathrm{supp}(p_i(y)) \cap \mathrm{supp}(p_j(y)) = \emptyset, \; \forall i \neq j
      C2 (Zero-Dependency):    w_A obtained without external prior \theta_{\mathrm{FM}}
      C3 (Strict Sovereignty): x \notin M_k, \; \forall x \in D_k
      C4 (One-Shot Comm.):     C_{\mathrm{rounds}} = 1                                    (1)

Here, ℓ is the per-sample task-specific loss function. C1 mandates convergence in the extreme limit of the Dirichlet distribution (α → 0), where p_i(y) denotes the marginal label distribution at client i.
In this pathological regime, discriminative gradient trajectories are actively opposed due to completely disjoint label supports. C2 prohibits reliance on external foundation models (M_FM) or public proxy datasets to synthesize missing counterfactuals, isolating the framework from external prior bias. C3 guarantees no raw samples are transmitted in the uplink communication messages M_k. Finally, C4 restricts the system to exactly one asynchronous upstream communication per client (where C_rounds is the total number of communications), permanently eliminating iterative communication overhead and bidirectional synchronization requirements.

3.1 Theoretical Guarantee of Zero-Dependency Convergence

Recent generative OSFL frameworks ensure global convergence using server-side Foundation Models (FMs) [38, 46, 47], assuming the FM's distribution p_FM(x) sufficiently overlaps with local client data p_k(x). This overlap is bounded by λ, which quantifies the maximum OOD penalty between the pre-trained manifold and private data. While effective for natural images, this assumption fails in specialized modalities (e.g., clinical) where severe domain shifts cause λ → ∞.

To motivate FederatedFactory, we establish a convergence guarantee that completely bypasses this FM overlap assumption. Let p_union(x, y) be the inaccessible true global joint distribution. Under the Single-Class Silo constraint (C1) with disjoint label supports, p_union reduces to a strict mixture of local marginals p_k(x) weighted by empirical proportions π_k = |D_k| / |D_union|:

p_{\mathrm{union}}(x, y) = \sum_{k=1}^{K} \pi_k \, p_k(x) \, \mathbb{I}(y = y_k)    (2)

where I(·) is the indicator function isolating the localized class.
By transmitting only the localized generative prior θ_k (C3, C4), the server constructs a fully synthetic global distribution p̂_syn(x, y) ex nihilo, complying with the zero-dependency constraint (C2):

\hat{p}_{\mathrm{syn}}(x, y) = \sum_{k=1}^{K} \pi_k \, p_{\theta_k}(x) \, \mathbb{I}(y = y_k)    (3)

To prove convergence without external priors, we establish two standard assumptions:

Assumption 1 (Local Generative Convergence). The local diffusion training objective (ELBO) directly minimizes the Kullback-Leibler (KL) divergence. We assume this local optimization error is bounded by ε_k for all clients k ∈ {1, ..., K}:

\mathrm{KL}(p_k(x) \,\|\, p_{\theta_k}(x)) \le \epsilon_k    (4)

Assumption 2 (Bounded Risk Function). The task-specific per-sample loss function ℓ(w, x, y) is bounded by a constant M > 0, such that sup_{w,x,y} ℓ(w, x, y) ≤ M.

Lemma 1 (Global Manifold Recovery). Under constraint C1 and Assumption 1, the KL divergence between the true global distribution and the zero-dependency synthetic distribution is strictly bounded by the weighted sum of the local diffusion errors: KL(p_union ∥ p̂_syn) ≤ Σ_{k=1}^{K} π_k ε_k.

Proof. Based on the fundamental definition of the joint KL divergence, we expand the integral:

\mathrm{KL}(p_{\mathrm{union}} \,\|\, \hat{p}_{\mathrm{syn}}) = \sum_{y} \int p_{\mathrm{union}}(x, y) \log \frac{p_{\mathrm{union}}(x, y)}{\hat{p}_{\mathrm{syn}}(x, y)} \, dx    (5)

Because the label supports are completely disjoint (C1), for any given class y_k, the cross-terms of the mixture evaluate exactly to zero. The summation over y thus collapses perfectly to the individual client indices k. Substituting the marginal definitions:

= \sum_{k=1}^{K} \int \pi_k \, p_k(x) \log \frac{\pi_k \, p_k(x)}{\pi_k \, p_{\theta_k}(x)} \, dx = \sum_{k=1}^{K} \pi_k \, \mathrm{KL}(p_k(x) \,\|\, p_{\theta_k}(x))    (6)

Substituting the local convergence bound from Assumption 1 yields the final inequality.
■

We define L_true(w) = E_{(x,y)∼p_union}[ℓ(w, x, y)] as the true centralized risk, and L_syn(w) = E_{(x,y)∼p̂_syn}[ℓ(w, x, y)] as the surrogate risk evaluated on the synthesized dataset. Let

\bar{\epsilon} = \sqrt{\tfrac{1}{2} \sum_{k=1}^{K} \pi_k \epsilon_k}

denote the aggregate generative error mapped to the Total Variation (TV) space.

Theorem 1 (Zero-Dependency Aggregation). Under Assumptions 1 and 2, the excess global risk of the classifier w_A = argmin_w L_syn(w) trained exclusively on the generated synthetic distribution, compared to the optimal centralized classifier w* = argmin_w L_true(w), is strictly bounded by:

\underbrace{\mathcal{L}_{\mathrm{true}}(w_A)}_{\text{Federated Classifier}} - \underbrace{\mathcal{L}_{\mathrm{true}}(w^{*})}_{\text{Centralized Classifier}} \le \underbrace{2 M \bar{\epsilon}}_{\text{Max Penalty}}    (7)

Proof. By the integral definition of the Total Variation (TV) distance and Pinsker's inequality, the expected risk deviation for any arbitrary classifier w is strictly bounded by:

|\mathcal{L}_{\mathrm{true}}(w) - \mathcal{L}_{\mathrm{syn}}(w)| \le M \cdot \mathrm{TV}(p_{\mathrm{union}} \,\|\, \hat{p}_{\mathrm{syn}}) \le M \sqrt{\tfrac{1}{2} \mathrm{KL}(p_{\mathrm{union}} \,\|\, \hat{p}_{\mathrm{syn}})} \le M \bar{\epsilon}    (8)

Because the federated classifier w_A is optimized to minimize the synthetic risk, it holds that L_syn(w_A) ≤ L_syn(w*). Applying the bound M ε̄ symmetrically to transition between true and synthetic risks:

\mathcal{L}_{\mathrm{true}}(w_A) \le \mathcal{L}_{\mathrm{syn}}(w_A) + M\bar{\epsilon} \le \mathcal{L}_{\mathrm{syn}}(w^{*}) + M\bar{\epsilon} \le \mathcal{L}_{\mathrm{true}}(w^{*}) + 2M\bar{\epsilon}    (9)

which concludes the proof. ■

Remark. Theorem 1 formally demonstrates that global convergence is achievable under constraints C1–C4. By exchanging generative priors, the global distribution approximation error is entirely bounded by ε_k. Unlike FM-dependent methods where the bound is dominated by a rigid, often infinite projection error (λ) caused by out-of-distribution local data, FederatedFactory's error approaches zero simply by training the local models to standard convergence.
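The inequality chain behind Theorem 1 can be checked numerically on toy discrete distributions (a sanity check, not the paper's code): with a bounded loss, the risk gap is at most M·TV, and Pinsker bounds TV by the square root of half the KL divergence.

```python
# Numeric sanity check of the chain used in Theorem 1 on a finite support:
# |E_p[l] - E_q[l]| <= M * TV(p, q) <= M * sqrt(KL(p||q) / 2).
import numpy as np

rng = np.random.default_rng(1)
p = rng.dirichlet(np.ones(10))      # stands in for p_union on a 10-point grid
q = rng.dirichlet(np.ones(10))      # stands in for the synthetic p_syn
loss = rng.uniform(0, 1, size=10)   # bounded per-sample loss, so M = 1
M = 1.0

risk_gap = abs(loss @ p - loss @ q)       # |L_true(w) - L_syn(w)| for fixed w
tv = 0.5 * np.abs(p - q).sum()            # Total Variation distance
kl = float(np.sum(p * np.log(p / q)))     # KL(p || q)

assert risk_gap <= M * tv + 1e-12         # bounded-loss inequality
assert tv <= np.sqrt(0.5 * kl) + 1e-12    # Pinsker's inequality
print(f"gap={risk_gap:.4f}  M*TV={M*tv:.4f}  M*sqrt(KL/2)={np.sqrt(0.5*kl):.4f}")
```

Both inequalities hold for any pair of distributions on the grid, which is exactly why the excess risk in Eq. (7) is controlled once the local KL errors ε_k are small.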
4 Methodology

To satisfy the optimization objective outlined in Sec. 3, FederatedFactory abandons discriminative parameter aggregation entirely. Under the pathological single-class silo regime (C1), gradient trajectories are in conflict [48]. We resolve this by shifting the unit of communication from discriminative gradients to localized generative prior parameters, establishing a zero-dependency (C2), no-data-sharing (C3), one-shot (C4) framework for distributed synthesis.

4.1 FederatedFactory

Instead of transmitting parameter updates, which actively conflict across disjoint label spaces [48], FederatedFactory relies on the transmission of generative model parameters. We define the Factory as a localized, self-contained generative module. Each client k ∈ {1, ..., K} independently optimizes a Factory on its private dataset D_k. While our framework is architecture-agnostic, we specifically instantiate the Factory using the score-based diffusion model EDM2 [29].

In this context, the "generative blueprint" consists of the denoising function G_{θ_k}. Unlike traditional Autoencoders [8], diffusion models do not utilize a deterministic encoder to compress data into a latent bottleneck. Instead, the reverse denoising process acts as the fundamental mapping from a standard normal latent space Z ∼ N(0, I) to the learned local manifold M̂_k [20].

While we empirically focus on diffusion for its high-fidelity clinical synthesis, FederatedFactory natively supports architectures with explicit encoder-decoder splits (e.g., VAEs [30]) or adversarial mappings (GANs [13]), where the shared parameters θ_k would correspond to the physical decoder or generator. Reducing the communication payload to independent Factories enables flexible global synthesis.
Specifically, we instantiate this framework through two distinct operational modes: (A) a centralized architecture designed for consortiums equipped with a trusted aggregator, and (B) a fully decentralized peer-to-peer (P2P) mesh optimized for environments in which a centralized entity cannot be trusted.

Takeaway II: By transmitting generative Factories, FederatedFactory enables blueprint-based ex nihilo generation of class-balanced datasets without relying on overlapping data supports or external FMs.

Protocol A: Centralized Synthesis. Under this configuration (Figure 2), the framework designates the central aggregator as the point of data generation, restricting network overhead to exactly one upstream transmission per client (C_rounds = 1). The server aggregates the generative mappings into a unified library Θ = {G_{θ_1}, ..., G_{θ_K}}. Exploiting the universality of the standard normal latent space Z, the server samples noise vectors z ∼ N(0, I) and projects them through the respective Factory. This materializes a fully synthetic, class-balanced global dataset D̂_syn ex nihilo, with an arbitrarily large number of samples whose diversity is bounded by the entropy of the localized generative priors:

\hat{D}_{\mathrm{syn}} = \bigcup_{k=1}^{K} \{ (\hat{x}, y_k) \mid \hat{x} = G_{\theta_k}(z), \; z \sim \mathcal{N}(0, I) \}    (10)

We then optimize a global classifier w (ResNet-50 [17]) exclusively on D̂_syn. We adopt this model to isolate the improvements of our generative framework without introducing architectural bottlenecks.
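The server-side sampling step of Eq. (10) can be sketched as follows. This is a minimal sketch in which the trained Factories G_{θ_k} are replaced by placeholder samplers; in the paper each would be a trained EDM2 denoiser mapping z ∼ N(0, I) to images.

```python
# Sketch of Protocol A's generative sampling step (Eq. 10), with each
# client's Factory stubbed by a toy sampler rather than a trained EDM2 model.
import numpy as np

rng = np.random.default_rng(0)

def make_stub_factory(class_mean):
    """Placeholder for G_{theta_k}: shifts latent noise toward a class mean."""
    def factory(z):
        return z + class_mean
    return factory

# Server-side library Theta = {G_1, ..., G_K}, one Factory per single-class silo.
factories = {k: make_stub_factory(np.full(4, float(k))) for k in range(3)}

def synthesize(factories, per_class, dim, rng):
    """Materialize a class-balanced synthetic dataset D_syn ex nihilo."""
    xs, ys = [], []
    for label, G in factories.items():
        z = rng.normal(size=(per_class, dim))   # z ~ N(0, I)
        xs.append(G(z))
        ys.append(np.full(per_class, label))
    return np.concatenate(xs), np.concatenate(ys)

X_syn, y_syn = synthesize(factories, per_class=100, dim=4, rng=rng)
print(X_syn.shape, np.bincount(y_syn))  # class-balanced by construction
```

Because the server controls how many latents it draws per Factory, the resulting dataset is class-balanced regardless of how skewed the original silos were.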
We reformulate the standard FL parameter aggregation into an optimization over a global surrogate objective:

\min_{w} \frac{1}{K} \sum_{k=1}^{K} \mathbb{E}_{z \sim \mathcal{N}(0, I)} \left[ \ell(w; G_{\theta_k}(z), y_k) \right]    (11)

Because this synthesis relies exclusively on localized generative priors optimized directly on the true data manifolds, FederatedFactory ensures that the generated distribution includes the rare diagnostic support (as required in Sec. 3.1).

Fig. 2: Centralized FederatedFactory Protocol. Clients locally train Factories (e.g., EDM2 [29]) on their real data and upload the parameters θ_k; the server samples z → x̂ from each Factory and trains a global classifier (e.g., ResNet-50 [17]). Aggregated Factories produce a fully synthetic dataset D̂_syn, enabling global classifier training without raw data access.

Protocol B: Decentralized Synthesis. Under this configuration (Figure 3), each client k broadcasts its generative prior G_{θ_k} to all participating peers. Upon receiving the complement Factories for all disjoint classes j ≠ k, every client k locally synthesizes the missing distributions, constructing a hybrid dataset D_k^mix in which local real data D_k is augmented with synthetic samples x̂:

D_k^{\mathrm{mix}} = D_k \cup \bigcup_{j \ne k} \{ (\hat{x}, y_j) \mid \hat{x} = G_{\theta_j}(z), \; z \sim \mathcal{N}(0, I) \}    (12)

Client k then optimizes a local discriminative expert classifier f_{w_k} exclusively on D_k^mix. At inference time, the distributed mesh aggregates global decisions using a Product of Experts (PoE) formulation [19].
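A minimal sketch of this PoE aggregation, computed in log-space with a minimum probability floor (the floor value here is an assumed hyperparameter, not one stated in the paper):

```python
# Sketch of PoE inference: per-expert probabilities are floored, multiplied
# in log-space, and renormalized, so no single expert can zero the consensus.
import numpy as np

def poe_aggregate(expert_probs, floor=1e-4):
    """expert_probs: (K, C) array of per-expert class probabilities."""
    p = np.clip(expert_probs, floor, 1.0)   # strict minimum probability floor
    log_joint = np.log(p).sum(axis=0)       # product of experts in log-space
    log_joint -= log_joint.max()            # numerical stability shift
    joint = np.exp(log_joint)
    return joint / joint.sum()              # renormalize (partition function Z)

# Three experts: the class-0 expert is confident on its own class, the others
# (trained partly on synthetic data) lean the same way; the product sharpens
# this consensus far beyond any single expert.
experts = np.array([
    [0.90, 0.05, 0.05],
    [0.60, 0.30, 0.10],
    [0.70, 0.20, 0.10],
])
print(np.round(poe_aggregate(experts), 3))
```

The product acts as an intersection of constraints: any class that one expert deems unlikely is suppressed in the joint, while the floor prevents an outright zero veto.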
Unlike standard mixture-based ensembling, which acts as a logical disjunction and dilutes predictive certainty, PoE functions as a strict intersection of constraints. For a target sample x, the global inference probability is the renormalized product:

p_{\mathrm{PoE}}(y \mid x) = \frac{1}{Z} \prod_{k=1}^{K} p_k(y \mid x)

where Z is the partition function. This enforces a strict consensus veto. To prevent a single overconfident but incorrect expert from indiscriminately zeroing out the consensus, the aggregation operates in log-space with a strict minimum probability floor. Thus, if a local expert assigns a near-zero probability to a spurious feature, the aggregate probability strictly collapses, preserving high-confidence decision boundaries across the distributed network.

Fig. 3: Decentralized FederatedFactory Protocol. This architecture depicts the P2P topology where local data flows (x ∼ p_k(x)) are augmented with synthetic samples from broadcasted Factories to train local experts, aggregated via PoE.

4.2 Multi-Class Generalization

FederatedFactory natively extends to multi-class configurations. When clients possess data spanning multiple classes, each client k ∈ {1, ..., K} independently trains a class-specific Factory G_{θ_k}(z, c) for every class c ∈ Y_k present in its local dataset.
To preserve the true global prior probability of the classes during centralized aggregation, the server requires the local sample counts n_{c,k} = |D_{c,k}|. During global data synthesis, the server draws m_{c,k} latent vectors for each specific Factory, strictly ensuring that the synthesized contribution is proportional to the local empirical density (m_{c,k} ∝ n_{c,k}). To bypass the destructive non-convexity of parametric averaging, which frequently shifts mature weights into high-loss regions, the server instead performs data-space weighted aggregation. For a target sample size N_target of class c, and clients S_c ⊆ {1, ..., K} possessing this class, the server assigns a generation quota Q_{c,k} to each local Factory G_{θ_{c,k}}:

Q_{c,k} = N_{\mathrm{target}} \cdot \frac{n_{c,k}}{\sum_{j \in S_c} n_{c,j}}

This allocation ensures the synthetic global dataset D̂_syn mirrors its empirical sources. Large institutions establish the manifold's backbone, while smaller clinics inject stochastic diversity. This preserves rare morphological subtypes and scales the global distribution seamlessly, bypassing the interference of weight aggregation across imbalanced models.

Fig. 4: Modular Unlearning Modes in the Generative Matrix Γ: (1) Vertical (Client Removal, Γ_{:,k} ← ∅), (2) Horizontal (Concept Erasure, Γ_{c,:} ← ∅), (3) Targeted (Specific Intersection, Γ_{c,k} ← ∅). By structuring the global model as a disjoint union of class-client generators G_{θ_{c,k}}, FederatedFactory enables exact erasure without retraining the entire ensemble.
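The quota formula reduces to a few lines; a worked example with hypothetical client names and counts (a large hospital and a small clinic, both holding the same class):

```python
# Sketch of the data-space weighted quota Q_{c,k} = N_target * n_{c,k} / sum_j n_{c,j}.
def generation_quotas(counts, n_target):
    """counts: {client_id: n_{c,k}} for one class c -> per-Factory sample quotas."""
    total = sum(counts.values())
    return {k: round(n_target * n / total) for k, n in counts.items()}

# A large hospital (9000 samples of class c) anchors the manifold; a small
# clinic (1000 samples) contributes its proportional share of diversity.
quotas = generation_quotas({"hospital": 9000, "clinic": 1000}, n_target=5000)
print(quotas)  # {'hospital': 4500, 'clinic': 500}
```

The synthesized class-c pool thus mirrors the empirical proportions of its sources instead of averaging their generator weights.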
The multi-class expansion organizes the aggregated models into a "Visual Memory" [10], defining a Generative Matrix Γ ∈ F^{C×K}, with C global classes, K clients, and mapping space F. Each entry Γ_{c,k} holds the localized Factory G_{θ_{c,k}} (or ∅ if locally absent). Under a strict single-class silo regime (α → 0), Γ collapses into a severely sparse diagonal, with each client contributing exactly one valid prior. Under general multi-class heterogeneity, it populates organically based on local label supports.

4.3 Modular Machine Unlearning and Exact Erasure

Decoupling the generative process into Γ structurally allows for modular unlearning [6]. Standard FL densely entangles representations across global weights, making localized data deletion intractable and typically requiring complete retraining to satisfy right-to-be-forgotten mandates [12]. In FederatedFactory, the global representation is a discrete union of parameter-independent modules. In the single-class silo regime, unlearning trivially requires excising the target client's generative prior: Θ_new = Θ \ {G_{θ_k}}. The server then discards the associated synthetic samples and retrains the centralized classifier. In decentralized P2P setups, this is mirrored locally: peers delete the revoked prior G_{θ_k}, flush related synthetic data, and retrain experts.

Extending this logic through Γ (Figure 4) enables three granular modes of exact data erasure. Vertical Unlearning (Client Removal): The server nullifies a column (Γ_{:,k} ← ∅) to comprehensively forget client k. Horizontal Unlearning (Concept Erasure): The server executes a row-wise deletion (Γ_{c,:} ← ∅) to remove an obsolete or restricted class c consortium-wide. Targeted Unlearning (Specific Intersection): The server zeroes an exact coordinate (Γ_{c,k} ← ∅) to remove a specific class subset from a specific client.
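The three erasure modes can be sketched over a toy Γ, with Factories represented by stand-in strings and ∅ by None (a structural sketch only; real entries would be generator parameter sets):

```python
# Sketch of the Generative Matrix Gamma (C x K) and its three exact
# unlearning modes: vertical (client), horizontal (class), targeted (cell).
C, K = 3, 3
Gamma = [[f"G_theta({c},{k})" for k in range(K)] for c in range(C)]

def unlearn_client(Gamma, k):       # vertical: Gamma[:, k] <- empty
    for row in Gamma:
        row[k] = None

def unlearn_concept(Gamma, c):      # horizontal: Gamma[c, :] <- empty
    Gamma[c] = [None] * len(Gamma[c])

def unlearn_targeted(Gamma, c, k):  # targeted: Gamma[c, k] <- empty
    Gamma[c][k] = None

unlearn_client(Gamma, 1)            # forget client 1 entirely
unlearn_concept(Gamma, 2)           # erase class 2 consortium-wide
unlearn_targeted(Gamma, 0, 0)       # drop class 0 as held by client 0

# After any deletion the server would flush the invalidated synthetic buffer
# and retrain the classifier on the surviving Factories only.
survivors = [g for row in Gamma for g in row if g is not None]
print(survivors)  # ['G_theta(0,2)', 'G_theta(1,0)', 'G_theta(1,2)']
```

Because each cell is an independent module, deletion is a structural operation with no gradient-level disentanglement required.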
Following any deletion, the server clears the invalidated synthetic buffer D̂_syn and retrains the centralized classifier. Because the target manifold's underlying generative prior is eradicated, we guarantee exact data removal (D̂_syn ∩ M̂_{c,k} = ∅). This achieves true exact unlearning without approximations [6, 12, 44].

Table 1: Robustness to Extreme Statistical Heterogeneity. Mean ± SD of Accuracy and AUROC (%) under moderate (Dirichlet, α = 0.1) and pathological (Silos, α → 0) label skew.

CIFAR
Method                   Dirichlet Acc    Dirichlet AUC    Silos Acc        Silos AUC
FedAvg [36]              89.76 ± 1.12     99.22 ± 0.19     11.36 ± 1.28     50.91 ± 1.79
FedDyn [1]               61.82 ± 11.47    94.87 ± 2.65     10.12 ± 0.24     51.48 ± 0.83
FedProx [33]             89.67 ± 1.15     99.21 ± 0.19     18.08 ± 1.18     67.06 ± 1.02
Scaffold [27]            79.26 ± 5.10     97.73 ± 0.86     12.99 ± 0.75     54.83 ± 1.08
FedFact (Cent.)          –                –                84.30 ± 0.66     98.24 ± 0.13
FedFact (P2P)            –                –                90.57 ± 0.09     99.14 ± 0.02
Centralized Upper Bound  94.69 ± 0.33     99.75 ± 0.03     94.69 ± 0.33     99.75 ± 0.03

BloodMNIST
Method                   Dirichlet Acc    Dirichlet AUC    Silos Acc        Silos AUC
FedAvg [36]              83.46 ± 5.18     97.57 ± 1.11     21.88 ± 5.39     55.23 ± 7.17
FedDyn [1]               64.93 ± 8.63     92.02 ± 2.21     19.47 ± 0.00     51.44 ± 2.87
FedProx [33]             84.11 ± 5.38     97.61 ± 1.00     20.18 ± 1.57     53.69 ± 3.32
Scaffold [27]            82.31 ± 2.97     97.51 ± 0.75     22.60 ± 1.96     67.84 ± 2.98
FedFact (Cent.)          –                –                91.17 ± 0.26     99.04 ± 0.06
FedFact (P2P)            –                –                86.38 ± 0.31     98.36 ± 0.07
Centralized Upper Bound  91.23 ± 0.87     99.18 ± 0.13     91.23 ± 0.87     99.18 ± 0.13

PathMNIST
Method                   Dirichlet Acc    Dirichlet AUC    Silos Acc        Silos AUC
FedAvg [36]              73.79 ± 16.86    90.11 ± 16.05    18.05 ± 0.80     50.12 ± 1.76
FedDyn [1]               60.99 ± 8.66     92.71 ± 4.49     17.76 ± 0.80     47.57 ± 3.22
FedProx [33]             77.09 ± 11.89    96.01 ± 4.44     18.93 ± 0.57     52.97 ± 2.67
Scaffold [27]            71.88 ± 9.70     95.61 ± 1.88     31.15 ± 2.10     64.87 ± 2.79
FedFact (Cent.)          –                –                67.94 ± 3.84     93.60 ± 1.07
FedFact (P2P)            –                –                67.03 ± 2.28     91.48 ± 0.64
Centralized Upper Bound  84.82 ± 0.81     96.96 ± 0.40     84.82 ± 0.81     96.96 ± 0.40

RetinaMNIST
Method                   Dirichlet Acc    Dirichlet AUC    Silos Acc        Silos AUC
FedAvg [36]              45.20 ± 3.80     55.02 ± 7.96     43.50 ± 0.00     48.70 ± 1.90
FedDyn [1]               45.25 ± 3.91     54.02 ± 8.87     43.50 ± 0.00     48.70 ± 1.92
FedProx [33]             45.45 ± 4.36     54.52 ± 8.70     43.50 ± 0.00     48.74 ± 1.91
Scaffold [27]            43.50 ± 0.00     52.52 ± 6.15     43.50 ± 0.00     48.70 ± 1.90
FedFact (Cent.)          –                –                46.75 ± 1.29     70.29 ± 1.06
FedFact (P2P)            –                –                49.30 ± 1.25     71.79 ± 0.64
Centralized Upper Bound  47.20 ± 0.78     69.69 ± 1.00     47.20 ± 0.78     69.69 ± 1.00

ISIC2019
Method                   Dirichlet Acc    Dirichlet AUC    Silos Acc        Silos AUC
FedAvg [36]              60.35 ± 6.94     76.73 ± 12.33    48.22 ± 0.00     47.31 ± 1.78
FedDyn [1]               53.63 ± 5.48     68.48 ± 7.81     48.22 ± 0.00     43.28 ± 0.99
FedProx [33]             62.83 ± 2.41     83.21 ± 1.60     48.22 ± 0.00     45.49 ± 0.78
Scaffold [27]            50.03 ± 1.75     64.37 ± 6.19     48.22 ± 0.00     44.33 ± 1.02
FedFact (Cent.)          –                –                62.08 ± 0.59     84.98 ± 0.57
FedFact (P2P)            –                –                69.94 ± 0.46     90.57 ± 0.08
Centralized Upper Bound  70.38 ± 0.69     90.37 ± 0.74     70.38 ± 0.69     90.37 ± 0.74

Fig. 5: Results in Pathological Heterogeneity. While standard baselines (FedAvg, FedDyn, FedProx, Scaffold) collapse as we move from moderate skew (α = 0.1) to extreme silos (α → 0), FederatedFactory matches the Upper Bound in both Centralized and Decentralized configurations.

5 Experiments and Results

Experimental Setup.
To validate FederatedFactory, we evaluate across three dataset regimes: (1) CIFAR-10 [31] as a standard high-variance baseline for foundational stability; (2) three MedMNIST [45] subsets (BloodMNIST, RetinaMNIST, PathMNIST) to test diverse morphological heterogeneity; and (3) ISIC2019 [43] as a high-resolution stress test for rare dermatoscopic classes. Across all configurations, we employ an adaptive ResNet-50 [17] backbone trained with an identical SGD optimizer (batch size 128, weight decay 1 × 10^-4, initial learning rate 0.1 with cosine annealing [34]) and domain-specific augmentations under equivalent computational budgets (300 centralized/synthetic epochs vs. 200 rounds of 5 local epochs for the federated baselines). Following preliminary grid-search tuning, we report the mean and standard deviation of Test Accuracy and macro-averaged One-vs-Rest (OvR) AUROC across five independent random seeds for all configurations. The centralized upper bound is explicitly trained on the theoretically inaccessible global dataset D_union.

Results. Table 1 and Figure 5 quantify the transition from moderate heterogeneity (Dirichlet α = 0.1) to the Single-Class Silo extreme. Under moderate skew, FedProx [33] and SCAFFOLD [27] maintain convergence. In strictly disjoint silos, however, parameter aggregation degenerates into a random walk, causing unconditional baseline failure: FedAvg on CIFAR plunges from 89.76% to 11.36% accuracy, while on imbalanced sets like ISIC2019, global models collapse to majority-class prediction (e.g., FedProx drops from 62.83% to 48.22%). Conversely, FederatedFactory avoids this failure, as demonstrated on RetinaMNIST (49.30% vs. 47.20% centralized accuracy) and ISIC2019 (90.57% vs. 90.37% centralized AUROC).

Table 2: Resource Trade-off Analysis. Total computational overhead (FLOPs) vs. communication volume (MB) across the five benchmarks.

Method            CIFAR                  BloodMNIST             PathMNIST              RetinaMNIST            ISIC2019
                  FLOPs       MB         FLOPs       MB         FLOPs       MB         FLOPs       MB         FLOPs       MB
FedAvg [36]       1.95×10^17  358,899.6  3.57×10^16  287,069.6  2.69×10^17  322,981.5  3.22×10^15  179,371.6  3.11×10^17  287,163.4
FedDyn [1]        1.95×10^17  358,899.6  3.57×10^16  287,069.6  2.69×10^17  322,981.5  3.22×10^15  179,371.6  3.11×10^17  287,163.4
FedProx [33]      1.95×10^17  358,899.6  3.57×10^16  287,069.6  2.69×10^17  322,981.5  3.22×10^15  179,371.6  3.11×10^17  287,163.4
Scaffold [27]     1.95×10^17  717,799.1  3.57×10^16  574,139.3  2.69×10^17  645,962.9  3.22×10^15  358,743.2  3.11×10^17  574,326.8
FedFact (Server)  7.30×10^18  1,934.1    4.47×10^18  1,547.3    5.03×10^18  1,740.7    2.79×10^18  967.0      2.52×10^20  1,435.3
FedFact (Local)   8.35×10^18  19,340.6   4.97×10^18  12,378.0   5.67×10^18  15,665.9   2.97×10^18  4,835.2    2.54×10^20  11,482.0

Fig. 6: Real vs. Synthetic Data in Representation Space. Top: DINOv2 (Block 8) feature space [37]; FederatedFactory samples (▲) populate the true target manifolds of the real data (•), preserving downstream performance. Bottom: visual comparisons across datasets, pairing real and generated samples of the same class and demonstrating accurate morphological preservation without memorization.

Takeaway III: FederatedFactory resolves the Single-Class Silo collapse and matches the centralized, data-pooled upper bound despite operating under stricter constraints.

Communication-Computation Analysis. Table 2 highlights FederatedFactory's shift from a bandwidth-bound to a compute-bound regime. Traditional FL baselines minimize edge compute but incur communication penalties from iterative convergence.
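The scale of this gap can be sketched with back-of-the-envelope payload arithmetic. The parameter counts, client count, round count, and byte width below are illustrative assumptions for the sketch, not the exact accounting behind Table 2:

```python
# Illustrative comparison of iterative FL communication versus a one-shot
# exchange of generative priors. All sizes below are assumptions.

def iterative_fl_mb(params, rounds, clients, bytes_per_param=4):
    # Each round: every client uploads and downloads the full model.
    return params * bytes_per_param * 2 * rounds * clients / 1e6

def one_shot_mb(generator_params, clients, bytes_per_param=4):
    # Each client transmits its generative prior exactly once.
    return generator_params * bytes_per_param * clients / 1e6

fedavg = iterative_fl_mb(params=23_500_000, rounds=200, clients=10)  # ResNet-50-scale
osfl = one_shot_mb(generator_params=5_000_000, clients=10)           # compact generator
assert fedavg / osfl > 100  # orders-of-magnitude reduction, as in Table 2
```

With these assumed sizes the iterative protocol moves roughly 376,000 MB while the one-shot exchange moves about 200 MB, reproducing the qualitative gap (though not the exact figures) reported in Table 2.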
Bidirectional parameter transmissions across 200 rounds create severe bottlenecks, demanding hundreds of gigabytes (e.g., 358,899.6 MB on CIFAR-10), a cost SCAFFOLD strictly doubles through its control variates. Conversely, FederatedFactory employs a one-shot FL (OSFL) protocol, collapsing network exposure and achieving a 99.4% communication reduction (1,934.1 MB on CIFAR-10). This zero-dependency design incurs a "generative tax" in local FLOPs, scaling computational load from ∼10^17 in baselines to ∼10^18 (up to ∼10^20 for ISIC2019). Ultimately, trading computational overhead for reduced communication is justified in cross-silo clinical settings, where model trustworthiness takes precedence over computational expense.

Qualitative Analysis and Manifold Alignment. To verify that Factories correctly approximate the localized manifolds without memorization [5], we project the generated distribution D̂_syn and the true target distribution D_union into a joint feature space, presented alongside qualitative samples in Figure 6. Visual comparisons (Fig. 6, Bottom) demonstrate that FederatedFactory successfully synthesizes complex morphologies (e.g., ISIC2019 textures, BloodMNIST cells) ex nihilo. To validate this structural fidelity, we project intermediate DINOv2 representations (Block 8) [37] via t-SNE (Fig. 6, Top), a feature space proven effective for isolating generative artifacts [24]. The synthetic priors natively span the true empirical distribution without discrete coordinate overlaps. While not a formal privacy guarantee, this empirically confirms continuous diversity and functional manifold mapping over trivial memorization.

6 Conclusion

As federated learning (FL) scales [26], resolving extreme statistical heterogeneity is critical [21, 51].
To bypass the collapse of standard parameter aggregation under single-class silos [36], FederatedFactory transfers localized generative priors rather than discriminative gradients. By using independent generative modules to synthesize class-balanced global datasets ex nihilo [20, 28], our zero-dependency framework eliminates gradient conflict and external prior bias [16, 23], consistently recovering centralized upper-bound performance where standard methods fail. For instance, under strictly disjoint silos, it lifts CIFAR-10 accuracy from a collapsed 11.36% (FedAvg) to 90.57% (matching the centralized bound) and improves ISIC2019 AUROC from 47.31% to 90.57% [43]. Remarkably, it achieves this while slashing communication overhead by 99.4% (from 358,899.6 MB to just 1,934.1 MB on CIFAR-10) [36].

While traditional FL averages local trajectories, our results show that transferring the underlying data-manifold approximation is fundamentally more robust for disjoint label supports. This paradigm shift encourages transmitting localized generative models over fragile discriminative boundaries. Furthermore, structuring the global model as a discrete union of these generative modules inherently facilitates exact modular unlearning [6, 44]: excising a client's specific generative parameters guarantees exact data erasure, complying with right-to-be-forgotten mandates without retraining the global ensemble [12].

Limitations. Trading communication for computation in cross-silo settings introduces a substantial "generative tax": hardware profiling shows that local computational loads scale by roughly an order of magnitude (e.g., from 1.95 × 10^17 to 8.35 × 10^18 FLOPs on CIFAR-10). Furthermore, our theoretical analysis assumes that local diffusion models converge without resorting to memorization [5].
While feature-space projections empirically confirm continuous diversity, the framework lacks formal privacy guarantees (e.g., Differential Privacy) [9] against advanced data-extraction [7] or membership-inference attacks on the transmitted priors [22].

Broader Impacts. Standard FL fragility endangers multi-institutional collaborations, especially in clinical imaging where data sovereignty is fundamental [40, 41]. FederatedFactory provides a robust, zero-dependency alternative. Moreover, organizing the global model as a discrete Generative Matrix inherently supports exact modular unlearning [6, 44], ensuring compliance with stringent data-privacy regulations such as the Right to be Forgotten and strictly protecting localized data rights [12].

References

1. Acar, D.A.E., Zhao, Y., Matas, R., Mattina, M., Whatmough, P., Saligrama, V.: Federated learning based on dynamic regularization. In: International Conference on Learning Representations (ICLR) (2021), https://openreview.net/forum?id=B7v4QMR6Z9w
2. Beitollahi, M., Bie, A., Hemati, S., Brunswic, L.M., Li, X., Chen, X., Zhang, G.: Parametric feature transfer: One-shot federated learning with foundation models. arXiv preprint arXiv:2402.01862 (2024)
3. Bińkowski, M., Sutherland, D.J., Arbel, M., Gretton, A.: Demystifying MMD GANs. In: International Conference on Learning Representations (ICLR) (2018)
4. Blattmann, A., Rombach, R., Oktay, K., Ommer, B.: Semi-parametric neural image synthesis (2022). https://doi.org/10.48550/arXiv.2204.11824
5. Bonnaire, T., Urfin, R., Biroli, G., Mézard, M.: Why diffusion models don't memorize: The role of implicit dynamical regularization in training (2025)
6. Bourtoule, L., Chandrasekaran, V., Choquette-Choo, C.A., Jia, H., Travers, A., Zhang, B., Lie, D., Papernot, N.: Machine unlearning. In: IEEE Symposium on Security and Privacy (SP). pp. 141–159 (2021)
7.
Carlini, N., Hayes, J., Nasr, M., Jagielski, M., Sehwag, V., Tramèr, F., Balle, B., Ippolito, D., Wallace, E.: Extracting training data from diffusion models. In: Proceedings of the 32nd USENIX Conference on Security Symposium. SEC '23, USENIX Association, USA (2023)
8. Davidson, T.R., Falorsi, L., Cao, N.D., Kipf, T., Tomczak, J.M.: Hyperspherical variational autoencoders. In: Uncertainty in Artificial Intelligence (UAI) (2018)
9. Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci. 9(3–4), 211–407 (Aug 2014). https://doi.org/10.1561/0400000042
10. Geirhos, R., Jaini, P., Stone, A., Medapati, S., Yi, X., Toderici, G., Ogale, A., Shlens, J.: Towards flexible perception with visual memory. In: Forty-second International Conference on Machine Learning (2025), https://openreview.net/forum?id=dMYL47aQwb
11. Geirhos, R., Rubisch, P., Michaelis, C., Bethge, M., Wichmann, F.A., Brendel, W.: ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. In: International Conference on Learning Representations (2019), https://openreview.net/forum?id=Bygh9j09KX
12. Golatkar, A., Achille, A., Soatto, S.: Eternal sunshine of the spotless net: Selective forgetting in deep networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 9304–9312 (2020)
13. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Advances in Neural Information Processing Systems (NeurIPS). vol. 27 (2014)
14. Guha, N., Talwalkar, A., Smith, V.: One-shot federated learning. arXiv preprint (2019), presented at the NeurIPS 2019 Workshop on Federated Learning
15. Hahn, S.J., Lee, J.: Diffusion federated dataset.
In: OpenReview (2024), https://openreview.net/forum?id=1GCWcrZTX8
16. He, H., Xiang, S., Zhang, Y., Zhu, Y., Zhang, J., Lu, Y., Deng, H., Alsentzer, E., Chen, Q., Yu, K.H., et al.: AI-generated data contamination erodes pathological variability and diagnostic reliability. medRxiv (2026)
17. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 770–778 (2016)
18. Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In: Advances in Neural Information Processing Systems (NeurIPS). vol. 30 (2017)
19. Hinton, G.E.: Training products of experts by minimizing contrastive divergence. Neural Computation 14(8), 1771–1800 (2002), https://www.cs.toronto.edu/~hinton/absps/training-products-of-experts-by-minimizing-contrastive-divergence.pdf
20. Ho, J., Jain, A., Abbeel, P.: Denoising diffusion probabilistic models. In: Advances in Neural Information Processing Systems (NeurIPS). vol. 33, pp. 6840–6851 (2020)
21. Hsu, T.M.H., Qi, H., Brown, M.: Measuring the effects of non-identical data distribution for federated visual classification. arXiv preprint arXiv:1909.06335 (2019)
22. Hu, H., Pang, J.: Membership inference of diffusion models (2023), https://arxiv.org/abs/2301.09956
23. Huix, J.P., Ganeshan, A.R., Haslum, J.F., Söderberg, M., Matsoukas, C., Smith, K.: Are natural domain foundation models useful for medical image classification? In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. pp. 7977–7987 (2024)
24. Internò, C., Geirhos, R., Olhofer, M., Liu, S., Hammer, B., Klindt, D.: AI-generated video detection via perceptual straightening.
In: The Thirty-ninth Annual Conference on Neural Information Processing Systems (2025), https://openreview.net/forum?id=LsmUgStXby
25. Jamali-Rad, H., Abdizadeh, M., Singh, A.: Federated learning with taskonomy for non-IID data. arXiv (2021). https://doi.org/10.48550/arxiv.2103.15947
26. Kairouz, P., McMahan, H.B.: Advances and open problems in federated learning. Foundations and Trends in Machine Learning 14(1–2), 1–210 (2021). https://doi.org/10.1561/2200000083
27. Karimireddy, S.P., Kale, S., Mohri, M., Reddi, S., Stich, S., Suresh, A.T.: SCAFFOLD: Stochastic controlled averaging for federated learning. In: International Conference on Machine Learning (ICML). pp. 5132–5143. PMLR (2020), https://proceedings.mlr.press/v119/karimireddy20a.html
28. Karras, T., Aittala, M., Aila, T., Laine, S.: Elucidating the design space of diffusion-based generative models. In: Advances in Neural Information Processing Systems (NeurIPS). vol. 35, pp. 26565–26577 (2022)
29. Karras, T., Aittala, M., Lehtinen, J., Hellsten, J., Aila, T., Laine, S.: Analyzing and improving the training dynamics of diffusion models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 24168–24178 (2024), https://openaccess.thecvf.com/content/CVPR2024/html/Karras_Analyzing_and_Improving_the_Training_Dynamics_of_Diffusion_Models_CVPR_2024_paper.html
30. Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. In: International Conference on Learning Representations (ICLR) (2013), arXiv preprint
31. Krizhevsky, A.: Learning multiple layers of features from tiny images. Tech. rep., University of Toronto (2009), https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf
32. Li, Q., He, B., Song, D.: Practical one-shot federated learning for cross-silo setting.
In: Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI). pp. 1484–1490 (2021)
33. Li, T., Sahu, A.K., Zaheer, M., Sanjabi, M., Talwalkar, A., Smith, V.: Federated optimization in heterogeneous networks. In: Proceedings of Machine Learning and Systems (MLSys). vol. 2, pp. 429–450 (2020), https://proceedings.mlsys.org/paper/2020/hash/1f5fe83998a09396ebe6477d9475ba0c-Abstract.html
34. Loshchilov, I., Hutter, F.: SGDR: Stochastic gradient descent with warm restarts. In: International Conference on Learning Representations (ICLR) (2016)
35. Ma, J., He, Y., Li, F., Han, L., You, C., Wang, B.: Segment anything in medical images. Nature Communications 15(1), 654 (2024)
36. McMahan, B., Moore, E., Ramage, D., Hampson, S., y Arcas, B.A.: Communication-efficient learning of deep networks from decentralized data. In: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS). PMLR (2017)
37. Oquab, M., Darcet, T., Moutakanni, T., Vo, H., Szafraniec, M., Khalidov, V., Fernandez, P., Haziza, D., Massa, F., El-Nouby, A., et al.: DINOv2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193 (2023)
38. Qiu, L., Annunziata, D., Giampaolo, F., Piccialli, F.: FedSDE: Self-distillation with diffusion enhanced for one-shot federated learning. In: 2025 IEEE International Conference on Big Data (BigData) (2025), https://github.com/Lynn0925/FedSDE
39. Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., Sutskever, I.: Learning transferable visual models from natural language supervision. In: International Conference on Machine Learning (2021), https://api.semanticscholar.org/CorpusID:231591445
40.
Rieke, N., Hancox, J., Li, W., Milletari, F., Roth, H., Albarqouni, S., Bakas, S., Galtier, M.N., Landman, B., Maier-Hein, K., Ourselin, S., Sheller, M., Summers, R.M., Trask, A., Xu, D., Baust, M., Cardoso, M.J.: The future of digital health with federated learning. npj Digital Medicine 3, 119 (2020). https://doi.org/10.1038/s41746-020-00323-1
41. Sheller, M.J., Edwards, B., Reina, G.A., Martin, J., Pati, S., Kotrotsou, A., Milchenko, M., Xu, W., Marcus, D., Colen, R.R., Bakas, S.: Federated learning in medicine: facilitating multi-institutional collaborations without sharing patient data. Scientific Reports 10 (2020). https://doi.org/10.1038/s41598-020-69250-1
42. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 2818–2826 (2016)
43. du Terrail, J.O., Ayed, S.S., Cyffers, E., Grimberg, F., He, C., Loeb, R., Mangold, P., Marchand, T., Marfoq, O., Mushtaq, E., Muzellec, B., Philippenko, C., Silva, S., Teleńczuk, M., Alatur, S., Berry, A., Dieudonné, A., Michele, M., Gouin, A., Yu, D., Bellet, A., Bach, F., Quellec, G., Lorenzi, M., Dieuleveut, A., Jaggi, M., Karimireddy, S.P., Hartley, M.A., Andreux, M.: FLamby: Datasets and benchmarks for cross-silo federated learning in realistic healthcare settings. In: Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS 2022) Datasets and Benchmarks Track (2022)
44. Yan, H., Li, X., Guo, Z., Li, H., Li, F., Lin, X.: ARCANE: An efficient architecture for exact machine unlearning. In: Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence (IJCAI). pp. 4006–4013 (2022)
45.
Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: MedMNIST v2 – a large-scale lightweight benchmark for 2D and 3D biomedical image classification. Scientific Data 10(1), 41 (2023)
46. Yang, M., Su, S., Li, B., Xue, X.: One-shot heterogeneous federated learning with local model-guided diffusion models. In: International Conference on Machine Learning (ICML) (2023), https://arxiv.org/abs/2311.08870
47. Yang, M., Su, S., Li, B., Xue, X.: FedDEO: Description-enhanced one-shot federated learning with diffusion models. In: Proceedings of the 32nd ACM International Conference on Multimedia. pp. 6666–6675. MM '24, Association for Computing Machinery, New York, NY, USA (2024). https://doi.org/10.1145/3664647.3681490
48. Yu, T., Kumar, S., Gupta, A., Levine, S., Hausman, K., Finn, C.: Gradient surgery for multi-task learning. In: Advances in Neural Information Processing Systems (NeurIPS). vol. 33 (2020)
49. Zaland, O., Jin, S., Pokorny, F.T., Bhuyan, M.: One-shot federated learning with classifier-free diffusion models. In: 2025 IEEE International Conference on Multimedia and Expo (ICME). pp. 1–6. IEEE (2025)
50. Zhang, S., Metaxas, D.N.: On the challenges and perspectives of foundation models for medical image analysis. Medical Image Analysis 91, 102996 (2024)
51. Zhao, Y., Li, M., Lai, L., Suda, N., Civin, D., Chandra, V.: Federated learning with non-IID data.
arXiv preprint arXiv:1806.00582 (2018)

Appendix

Contents
A FederatedFactory Pseudocode
B Qualitative Examples of Diffusion-Generated Images
C Empirical Cumulative Distribution Function and t-SNE Analysis
D Additional FederatedFactory Performance Analysis
E Additional FederatedFactory Cost Analysis
F Image Generation Metrics (FID, KID)
G Computational Environments
H Datasets and Sources

A FederatedFactory Pseudocode

For reproducibility, we formalize the complete FederatedFactory operational protocol in Algorithm 1. The formulation unites both the Centralized and Decentralized architectures into a single execution graph, controlled by the structural indicator T.

B Qualitative Examples of Diffusion-Generated Images

To visually validate the fidelity and diversity of FederatedFactory synthesis, we perform a nearest-neighbor analysis across all evaluated domains. For each class, we randomly sample latent vectors z ∼ N(0, I) to generate synthetic images via the localized EDM2 [29] Factory. We then compute the pixel-wise Euclidean (L2) distance across the local training manifold to find each generated sample's closest real counterpart. Figures 12, 13, 14, 15, and 16 present these comparisons: the top rows display the synthetic samples, while the bottom rows show their nearest real neighbors with the embedded L2 distances (d). Across both low-resolution standard benchmarks (CIFAR-10) and high-resolution medical modalities (ISIC2019), the generative priors successfully capture the underlying semantic morphologies and textures. Importantly, structural differences in poses, backgrounds, and boundaries between the synthetic images and their real matches confirm that the model synthesizes diverse data rather than memorizing the training set.
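The per-sample check described above can be sketched as follows, assuming images flattened to float vectors; the random vectors are placeholders standing in for real images and for samples drawn from a trained EDM2 Factory:

```python
import math
import random

def l2(a, b):
    # Pixel-wise Euclidean distance between two flattened images.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_real(synthetic, real_set):
    """Return (closest real sample, its L2 distance) for one synthetic image."""
    return min(((r, l2(synthetic, r)) for r in real_set), key=lambda t: t[1])

random.seed(0)
real = [[random.random() for _ in range(8)] for _ in range(50)]  # placeholder manifold
fake = [random.random() for _ in range(8)]                        # placeholder G_theta(z)
neighbor, d = nearest_real(fake, real)
assert d > 0  # a strictly positive distance signals no verbatim memorization
```

A synthetic sample identical to a training image would yield d = 0; the consistently positive distances embedded in Figures 12–16 are the evidence of non-memorization this sketch mirrors.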
C Empirical Cumulative Distribution Function and t-SNE Analysis

We evaluate FederatedFactory's manifold mapping by projecting generated (D̂_syn) and local training samples (D_union) into 2D via t-SNE. As Fig. 7 shows, synthetic data consistently populates the same macroscopic regions as real data across all datasets. Crucially, despite this global semantic alignment, there is no point-to-point overlap. We further formalize the evaluation of inter-class fidelity and intra-class diversity through Empirical Cumulative Distribution Function (ECDF) curves and nearest-neighbor distance histograms (Fig. 8).
– Fidelity (Distance to Real): quantified by the minimum Euclidean (L2) distance in feature space from each synthetic sample to its nearest real neighbor (x̂ → x).
– Diversity (Distance to Generated): measured by the L2 distance strictly among the generated samples themselves (x̂_i → x̂_j), useful for identifying mode collapse.

Algorithm 1 FederatedFactory: Unified Global Optimization Protocol

Require: Network of K clients, private local datasets D_k, corresponding local label supports Y_k.
Require: Architecture T ∈ {Centralized, Decentralized}.
Require: Generative epochs E_gen, discriminative epochs E_disc, generation quotas Q_k.
Require: Untrained generative architectures {G_θk} for k = 1, ..., K.

Phase I: Local Generative Prior Optimization (Asynchronous)
1:  for each client k ∈ {1, ..., K} in parallel do
2:    for e = 1 to E_gen do
3:      Sample local empirical batch x ∼ D_k
4:      Compute diffusion objective L_ELBO(θ_k; x)          ▷ Minimizes KL(p_k ∥ p_θk) bounded by ε_k
5:      Update local generative prior: θ_k ← θ_k − η ∇_θk L_ELBO
6:    end for
7:  end for

Phase II: Conditional Architectural Execution & Discriminative Training
8:  if T == Centralized then                                ▷ Trusted aggregator available
9:    Clients transmit optimized parameters θ_k to the server   ▷ Satisfies C_rounds = 1
10:   Server initializes empty global synthetic dataset: D̂_syn ← ∅
11:   for each received model G_θk ∈ Θ do
12:     for i = 1 to Q_k do
13:       Sample latent noise vector z ∼ N(0, I)
14:       Synthesize counterfactual mapping x̂ ← G_θk(z)
15:       D̂_syn ← D̂_syn ∪ {(x̂, y_k)}                      ▷ Ex nihilo synthesis without an external FM
16:     end for
17:   end for
18:   Server initializes centralized classifier w
19:   Optimize w on D̂_syn for E_disc epochs using standard empirical risk minimization
20: else if T == Decentralized then                         ▷ Trustless decentralized mesh
21:   for each client k ∈ {1, ..., K} in parallel do
22:     Broadcast θ_k to all valid peers j ∈ {1, ..., K} \ {k}
23:     Receive complement priors Θ_\k = {θ_j} for j ≠ k
24:     Initialize local hybrid dataset: D_mix,k ← D_k
25:     for each received complement model G_θj ∈ Θ_\k do
26:       Generate Q_j samples from G_θj(z) and append the mappings to D_mix,k
27:     end for
28:     Initialize local expert classifier w_k
29:     Optimize w_k exclusively on D_mix,k for E_disc epochs
30:   end for
31: end if

Phase III: Distributed Global Inference
32: procedure Inference(target sample x_target)
33:   if T == Centralized then
34:     return p(y | x_target; w)
35:   else if T == Decentralized then
36:     Gather local unnormalized probabilities: p_k = f_wk(x_target), ∀k ∈ {1, ..., K}
37:     Compute joint consensus: p_joint(y) = ∏_{k=1}^{K} p_k(y | x_target)
38:     Compute partition function: Z = Σ_{y′} p_joint(y′)
39:     return renormalized Product of Experts: p_PoE(y | x_target) = (1/Z) p_joint(y)
40:   end if
41: end procedure

Fig. 7: Global Manifold Alignment in 2D t-SNE Subspace. Feature-space projections comparing true localized training (D_union, ◦) and synthesized global (D̂_syn, △) distributions across all five benchmarks.

Fig. 8: Quantitative Manifold Alignment via ECDF and Density Histograms. Evaluation of inter-class Fidelity (top rows: L2 distance from synthetic samples to their nearest real neighbors) and intra-class Diversity (bottom rows: L2 distance strictly among synthetic samples) across five benchmarks.
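A minimal sketch of the Fidelity and Diversity statistics summarized in Fig. 8, with random placeholder vectors standing in for DINOv2 features:

```python
import math
import random

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def fidelity(synthetic, real):
    # Minimum distance from each synthetic sample to any real sample.
    return [min(l2(s, r) for r in real) for s in synthetic]

def diversity(synthetic):
    # Minimum distance from each synthetic sample to any *other* synthetic sample.
    return [min(l2(s, t) for j, t in enumerate(synthetic) if j != i)
            for i, s in enumerate(synthetic)]

def ecdf(values):
    # Empirical CDF: sorted values paired with their cumulative probability.
    xs = sorted(values)
    return [(x, (i + 1) / len(xs)) for i, x in enumerate(xs)]

random.seed(0)
real = [[random.gauss(0, 1) for _ in range(8)] for _ in range(20)]
fake = [[random.gauss(0, 1) for _ in range(8)] for _ in range(20)]
fid_curve = ecdf(fidelity(fake, real))
div_curve = ecdf(diversity(fake))
assert fid_curve[-1][1] == 1.0  # the ECDF reaches 1 at the largest distance
```

A steep `fid_curve` (mass concentrated at small distances) indicates tight binding to the real manifold, while a `div_curve` without mass at zero indicates no collapse onto repeated samples.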
Steep Fidelity ECDF curves confirm that synthetic distributions bind tightly to the real data manifold with minimal out-of-distribution (OOD) deviation. Simultaneously, the Diversity ECDFs and density histograms show that the generative factories maintain broad continuous support without mode collapse. These curves also reflect intrinsic domain variances: tightly grouped distributions in RetinaMNIST [45] mirror the low structural variance of retinal crops, while broader ECDFs in ISIC2019 [43] and CIFAR [31] capture their high visual heterogeneity. Ultimately, these metrics provide strong empirical evidence that FederatedFactory captures the true underlying data support despite extreme single-class label skew.

D Additional FederatedFactory Performance Analysis

To illustrate the catastrophic failure of standard parameter aggregation [36] and our framework's subsequent recovery, Fig. 9 compares performance magnitudes across all five benchmarks. Under the pathological Single-Class Silo regime, severe gradient interference causes baselines (FedAvg [36], FedDyn [1], FedProx [33], SCAFFOLD [27]) to collapse into near-random or majority-class predictions [48].

Fig. 9: Absolute Performance Recovery under Single-Class Silos. Grouped bar chart detailing accuracy and AUROC across the evaluated datasets. The gray shaded region represents the collapsed performance ceiling of standard iterative federated learning methods. FederatedFactory (in both Centralized and Decentralized modes) escapes this collapsed regime, yielding improvements that match the Centralized Upper Bound.
The bar plot visualizes the Δ improvement (arrows) from the best iterative baseline to FederatedFactory. Ultimately, our zero-dependency approach completely bridges this gap, restoring predictive metrics to the theoretical data-pooled upper bound.

E Additional FederatedFactory Cost Analysis

FederatedFactory transitions from a bandwidth-bound optimization regime to a compute-bound one in order to achieve higher model trustworthiness. We visualize this "Compute-for-Bandwidth Swap" via the log-log scatter plot in Fig. 10. Standard FL [1, 27, 33, 36] minimizes local compute (∼10^16–10^17 FLOPs) but requires massive iterative communication (>10^5 MB). Conversely, FederatedFactory employs a One-Shot FL (OSFL) protocol, trading a "generative tax" in local compute (∼10^18–10^20 FLOPs) for a >99.4% reduction in network payload (∼10^3 MB). This compute-for-bandwidth trade-off is ideal for data-siloed clinical consortiums possessing abundant local compute.

Fig. 10: The Compute-for-Bandwidth Swap. Log-log scatter comparing computational overhead versus network exposure. Iterative baselines (FedAvg ◦, FedDyn □, FedProx △, Scaffold ⋄) cluster bottom-right (low compute, massive bandwidth), whereas FederatedFactory (Centralized +, Decentralized ⋆) shifts to the top-left, accepting higher local FLOPs to achieve higher trustworthiness.
F ederatedF actory 19 F Image Generation Metrics T o quan tify synthesized data quality , fidelity , and diversit y , w e ev aluate generated ( ˆ D syn ) against target distributions ( D union ) using F réc het Inception Distance (FID) [18] and Kernel Inception Distance (KID) [3]. These pro ject images into the Inception-V3 activ ation space [42] to measure statistical div ergence betw een real and syn thetic manifolds, impro ving up on naive pixel-space distances. As T ab. 3 and Fig. 11 illustrates, p er-class FID and KID distributions confirm Fed- era tedF actor y captures coheren t semantic structures. On natural domains ( CIF AR-10 [31]), p er-class FID a verages ∼ 28.8, with KID reliably < 0 . 02 ( × 100 ). Similarly , high-resolution der- matoscopic lesions ( ISIC2019 [43]) main tain strong manifold alignmen t with median FIDs ∼ 45 and tightly group ed KIDs. Benchmarks with complex, sparse cellular morphologies ( P a thMNIST , BloodMNIST [45]) exhibit sligh tly higher v ariances. T able 3: P er-Class Generative Alignmen t Metrics. Quan titative ev aluation of the synthesized images via FID and KID (scaled by 10 2 ). Lo wer v alues indicate b etter manifold alignment with the real target distribution. V alues are rounded to t wo decimal places for readabilit y . F or CIF AR, classes 0–9 corresp ond to airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck, respectively . 
Dataset      Metric   Class 0     1      2       3      4      5      6      7      8      9
CIFAR        FID (↓)    29.07  22.50  27.28   46.67  22.10  35.40  41.02  15.01  19.32  25.75
             KID (↓)     1.71   1.38   1.59    3.48   1.50   2.16   3.09   0.58   1.00   1.85
BloodMNIST   FID (↓)    84.01  61.54  45.08   50.37  60.58  66.42  50.78  27.19      –      –
             KID (↓)    10.76   8.83   4.63    6.33   7.54   7.69   6.76   2.97      –      –
RetinaMNIST  FID (↓)    38.49  58.40  38.77   44.09  58.88      –      –      –      –      –
             KID (↓)     3.52   3.03   2.85    2.90   3.91      –      –      –      –      –
PathMNIST    FID (↓)    38.20  33.85  77.25  103.85  57.24  64.21  64.68  79.69  70.86      –
             KID (↓)     3.83   2.92   8.95   14.33   6.75   7.01   7.63  10.43   8.94      –
ISIC2019     FID (↓)    38.80  41.18  30.01   40.91  45.54  76.67  82.39  55.55      –      –
             KID (↓)     1.59   2.18   1.46    1.23   1.90   1.85   1.76   1.92      –      –

[Figure omitted: raincloud plots of the per-class FID (left) and KID ×100 (right) distributions across CIFAR, BloodMNIST, RetinaMNIST, PathMNIST, and ISIC2019.]

Fig. 11: Quantitative Manifold Alignment via FID and KID. Per-class distributions of FID (left) and KID (right). The raincloud plots visualize the density, raw data points, and median scores.

These quantitative metrics represent a strict lower bound on the framework's synthesis capability. To maintain computational tractability on distributed nodes, we deployed a heavily downscaled EDM2 [29] architecture (128 embedding dimensions, 32 sampling steps) and faced severe long-tail class imbalances (e.g., ISIC2019). Despite these sub-optimal conditions, FederatedFactory synthesizes counterfactuals matching centralized upper-bound classification accuracy (Sec. 5). Because global discriminative performance is mathematically bounded by local generative error (Theorem 1), achieving robust downstream classification with non-ideal generators guarantees significant upside.
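For reference, the FID reported above has a closed form once images are embedded as (assumed Gaussian) Inception feature clouds: FID = ||μ₁ − μ₂||² + Tr(Σ₁ + Σ₂ − 2(Σ₁Σ₂)^{1/2}). The sketch below computes this from precomputed feature matrices; the feature extraction step (Inception-V3 activations) is assumed to happen elsewhere, and the symmetric-square-root formulation is an implementation choice, not the paper's code.

```python
# Minimal FID sketch over precomputed feature matrices (rows = samples).
# Uses the equivalent symmetric form Tr((S1^{1/2} S2 S1^{1/2})^{1/2})
# so the matrix square roots can be taken via eigendecomposition.
import numpy as np

def _sqrtm_psd(a):
    """Matrix square root of a symmetric positive semi-definite matrix."""
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)          # clamp tiny negative eigenvalues
    return (v * np.sqrt(w)) @ v.T

def frechet_distance(feats_real, feats_syn):
    mu1, mu2 = feats_real.mean(axis=0), feats_syn.mean(axis=0)
    s1 = np.cov(feats_real, rowvar=False)
    s2 = np.cov(feats_syn, rowvar=False)
    s1_half = _sqrtm_psd(s1)
    covmean = _sqrtm_psd(s1_half @ s2 @ s1_half)
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(s1 + s2 - 2.0 * covmean))

# Sanity checks: identical clouds give ~0; a pure mean shift of 1.0 in
# each of 16 dimensions contributes exactly ||diff||^2 = 16.
rng = np.random.default_rng(0)
x = rng.normal(size=(256, 16))
print(frechet_distance(x, x))          # ~0 up to numerical error
print(frechet_distance(x, x + 1.0))    # ~16
```

KID replaces the Gaussian assumption with a polynomial-kernel MMD estimate over the same features, which is why it is reported on a different (×100) scale.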
As local compute scales, enabling larger foundation diffusion models and extended training, ex nihilo synthesis fidelity will proportionally improve, yielding an even stronger global decision boundary.

G Computational Environments

The core system is equipped with 2× NVIDIA H100 and 4× NVIDIA L40 GPUs.

– NVIDIA H100 (80 GB VRAM): The two H100 GPUs were exclusively dedicated to training the localized generative priors, specifically the EDM2 diffusion models [28, 29].
– NVIDIA L40 (48 GB VRAM): The four L40 GPUs were parallelized to handle server-side image synthesis and downstream evaluations. Specifically, they were utilized to synthesize the globally class-balanced datasets ex nihilo, and for training the global discriminative classifiers, i.e., the adaptive ResNet-50 [17] backbone.

H Datasets and Sources

We gratefully acknowledge the creators, curators, and institutions behind the public datasets used to evaluate FederatedFactory. The empirical validation of our method relies entirely on the public availability of these benchmarks (CIFAR-10 [31], MedMNIST [45], ISIC2019 [43]).

[Figures omitted: per-class grids pairing synthetic samples (top rows) with their nearest real training samples (bottom rows), each pair annotated with its L2 distance d.]

Fig. 12: Nearest Neighbor Analysis on CIFAR-10. Synthetic samples (top rows) paired with their closest real counterparts (bottom rows). Class 9 has been omitted to accommodate layout constraints.

Fig. 13: Nearest Neighbor Analysis on BloodMNIST. The localized Factories successfully capture the distinct morphological features, shapes, and textures of different blood cell types.

Fig. 14: Nearest Neighbor Analysis on PathMNIST. Synthetic histological patches alongside their closest real training samples.

Fig. 15: Nearest Neighbor Analysis on RetinaMNIST. Despite the intrinsically low structural variance of retinal fundus crops, the generative models maintain sufficient diversity and do not trivially memorize the limited training support.

Fig. 16: Nearest Neighbor Analysis on ISIC2019. High-resolution dermatoscopic lesions are synthesized with high fidelity.
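The nearest-neighbor memorization check behind Figs. 12–16 can be sketched as follows. Each synthetic sample is matched to its closest real training sample by L2 distance: small but clearly non-zero distances indicate novel samples near the data manifold, while near-zero distances would flag copying. The representation in which distances are computed (pixels vs. features) is an assumption here, not specified by this sketch.

```python
# Hedged sketch of a nearest-neighbor memorization check: for each synthetic
# vector, find the closest real vector and report the L2 distance d.
import numpy as np

def nearest_real_distances(synthetic, real):
    """L2 distance from each synthetic row to its closest real row."""
    # Pairwise squared distances via ||a-b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq = ((synthetic**2).sum(1)[:, None]
          + (real**2).sum(1)[None, :]
          - 2.0 * synthetic @ real.T)
    sq = np.clip(sq, 0.0, None)        # guard tiny negatives from rounding
    idx = sq.argmin(axis=1)            # index of the closest real sample
    d = np.sqrt(sq[np.arange(len(synthetic)), idx])
    return d, idx

rng = np.random.default_rng(1)
real = rng.normal(size=(100, 32))
syn = real[:5] + 0.01 * rng.normal(size=(5, 32))  # near-duplicates of real
d, idx = nearest_real_distances(syn, real)
print(d.round(3), idx)   # small distances; matches land on indices 0..4
```

In this toy setup the deliberately perturbed copies are correctly matched to their sources with tiny distances, which is exactly the signature a memorizing generator would produce; the healthy regime in Figs. 12–16 shows moderate, well-spread distances instead.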
