Federated Learning with Multi-Partner OneFlorida+ Consortium Data for Predicting Major Postoperative Complications
Background: This study aims to develop and validate federated learning models for predicting major postoperative complications and mortality using a large multicenter dataset from the OneFlorida Data Trust. We hypothesize that federated learning mode…
Authors: Yuanfang Ren, Varun Sai Vemuri, Zhenhong Hu
Yuanfang Ren, PhDᵃ,ᵇ; Varun Sai Vemuri, BSᵃ; Zhenhong Hu, PhDᵃ,ᵇ; Benjamin Shickel, PhDᵃ,ᵇ; Ziyuan Guan, MSᵃ,ᵇ; Tyler J. Loftus, MD, PhDᵃ,ᶜ; Parisa Rashidi, PhDᵃ,ᵈ; Tezcan Ozrazgat-Baslanti, PhDᵃ,ᵇ*; Azra Bihorac, MD, MSᵃ,ᵇ*
*These senior authors contributed equally.
ᵃ Intelligent Clinical Care Center, University of Florida, Gainesville, FL.
ᵇ Department of Medicine, Division of Nephrology, Hypertension, and Renal Transplantation, University of Florida, Gainesville, FL.
ᶜ Department of Surgery, University of Florida, Gainesville, FL.
ᵈ Department of Biomedical Engineering, University of Florida, Gainesville, FL.
Correspondence to: Azra Bihorac, MD, MS, Intelligent Clinical Care Center, Division of Nephrology, Hypertension, and Renal Transplantation, Department of Medicine, University of Florida, PO Box 100224, Gainesville, FL 32610-0224. Telephone: (352) 294-8580; Fax: (352) 392-5465; Email: abihorac@ufl.edu
Reprints will not be available from the authors.
Conflicts of Interest and Source of Funding: AB, PR, TOB, BS, and YR were supported by R01 GM110240 from the National Institute of General Medical Sciences (NIH/NIGMS). This work was also supported in part by the National Center for Advancing Translational Sciences of the National Institutes of Health under the University of Florida Clinical and Translational Science Awards UL1 TR000064 and UL1 TR001427. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The funders had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
The authors declare that they have no conflicts of interest. AB and TOB had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Keywords: federated learning, major surgery, postoperative complications, electronic health records, data privacy

Abstract

Background: This study aims to develop and validate federated learning models for predicting major postoperative complications and mortality using a large multicenter dataset from the OneFlorida Data Trust. We hypothesize that federated learning models will offer robust generalizability while preserving data privacy and security.

Methods: This retrospective, longitudinal, multicenter cohort study included 358,644 adult patients admitted to five healthcare institutions who underwent 494,163 inpatient major surgical procedures from 2012 to 2023. We developed federated learning models, with internal and external validation, to predict the postoperative risk of intensive care unit (ICU) admission, mechanical ventilation (MV) therapy, acute kidney injury (AKI), and in-hospital mortality. These models were compared with local models trained on data from a single center and central models trained on a pooled dataset from all centers. Performance was primarily evaluated using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC).

Results: Our federated learning models demonstrated strong predictive performance, with AUROC and AUPRC values consistently comparable or superior to those of the central learning models across all outcomes and sites. Our federated learning models also demonstrated strong generalizability, with comparable or superior performance in terms of both AUROC and AUPRC compared to the best local learning model at each site.
Conclusions: By leveraging multicenter data, we developed robust, generalizable, and privacy-preserving predictive models for major postoperative complications and mortality. These findings support the feasibility of federated learning in clinical decision support systems.

Introduction

In the United States, a staggering 40 to 50 million major surgeries are performed annually.¹ It is estimated that postoperative complications occur in up to 20% of cases overall,² though rates can reach 75% for high-risk procedures,³ contributing to increased morbidity, prolonged hospital length of stay, elevated healthcare costs, and even mortality.⁴⁻⁶ While the 30-day postoperative mortality rate typically ranges from 1% to 4% for major surgeries, the 1-year mortality rate for high-risk populations can exceed 13%.¹,⁷ Accurate prediction of postoperative complication risk during the preoperative period is crucial for improving patient outcomes and optimizing resource allocation.

Assessing surgical risk requires the timely and precise aggregation of extensive clinical data. The digitization of clinical data in electronic health records (EHRs), together with its integration with artificial intelligence, has expanded the use of machine learning tools for risk monitoring and diagnosis. Despite promising developments in predicting these risks,⁸⁻¹⁰ current models are often limited to single hospitals, which restricts their generalizability and reduces their effectiveness in diverse clinical settings. Multicenter data offer a solution to this limitation by enabling the development of more robust and generalizable predictive models. The diversity of patient populations and clinical practices captured in multicenter datasets can substantially enhance the performance and applicability of predictive models.
However, the use of such data raises significant privacy concerns, as patient information must be securely managed and protected. Federated learning addresses these concerns by allowing multiple institutions to collaboratively train machine learning models without sharing raw patient data. Instead, model updates are aggregated centrally, preserving data privacy while leveraging the rich and diverse data available across institutions. Although federated learning has been successfully applied to predict outcomes such as acute kidney injury (AKI) stage, adverse drug reactions, hospitalizations, and COVID-19 mortality,¹¹⁻¹⁵ its application to predicting postoperative surgical outcomes has not been thoroughly explored. For example, Ren et al.¹⁶ developed federated learning models for eight major postoperative complications and in-hospital mortality using data from two University of Florida Health (UFH) hospitals. However, the small number of hospitals, all from the same health system, restricts the generalizability of these findings. To address this gap, our study aims to develop and validate federated learning models for predicting major postoperative complications and mortality using a large multicenter dataset from the OneFlorida Data Trust, a clinical data research network comprising 14 health systems. We hypothesized that federated learning models trained on distributed data would achieve better generalizability while simultaneously preserving data privacy and security.

Methods

Study Design and Participants

We obtained longitudinal EHR data for 1,455,294 hospitalized patients admitted to healthcare institutions within the OneFlorida+ network between January 1, 2012, and April 30, 2023, comprising 81,421,419 admissions (including historical admissions).
We excluded outpatient admissions, patients under 18 years of age at the time of admission, patients who did not undergo major surgery, and those with end-stage renal disease (Figure 1). Our final cohort included 494,163 encounters from 358,644 patients from 5 partners (Partners 1, 2, 3, 4, and 6). The dataset includes demographics, vital signs, laboratory values, medications, and diagnosis and procedure codes for all admissions. The study was approved by the University of Florida Institutional Review Board and Privacy Office (IRB# IRB202300641) as an exempt study with a waiver of informed consent.

Figure 1. Cohort selection and exclusion criteria

We developed predictive models using three distinct learning paradigms: local learning, central learning, and federated learning. Local learning develops models using data from a single center, which can lead to models with limited generalizability. Central learning pools data from multiple centers to develop models, which allows for more generalizable models but raises potential privacy concerns. Federated learning addresses these challenges by training the model locally at each center and aggregating the locally computed models on a central server without exchanging patient data. Considering the diversity of patient populations and the volume of available data, we selected three sites (Partners 3, 4, and 6) for model development and internal validation. The remaining two sites (Partners 1 and 2) were reserved for external validation. Following the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) guidelines for a type 2b analysis, we chronologically split each dataset at the three development sites by admission date into three cohorts: training (60% of observations), validation (10% of observations), and test (30% of observations) (Figure 1).
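A minimal sketch of the chronological 60/10/30 split follows; it is an illustration, not the study's code. The record fields `patient_id` and `admit_date` are hypothetical, and assigning each patient to the cohort of their earliest admission is an assumed way to keep all of a patient's encounters in one cohort, as the study requires.

```python
from datetime import date


def chronological_split(encounters, percents=(60, 10, 30)):
    """Split encounters into train/validation/test cohorts by admission date.

    `encounters` is a list of dicts with hypothetical keys 'patient_id' and
    'admit_date'. Cut points are computed on the date-ordered encounters; each
    patient is then assigned to the cohort of their earliest encounter (an
    assumption), so cohort sizes only approximate the requested percentages.
    """
    assert sum(percents) == 100
    ordered = sorted(encounters, key=lambda e: e["admit_date"])
    n = len(ordered)
    cut_train = n * percents[0] // 100
    cut_val = n * (percents[0] + percents[1]) // 100
    names = ("train", "validation", "test")

    patient_cohort = {}
    for i, enc in enumerate(ordered):
        cohort = names[0] if i < cut_train else names[1] if i < cut_val else names[2]
        # setdefault keeps the cohort of the patient's first (earliest) encounter
        patient_cohort.setdefault(enc["patient_id"], cohort)

    splits = {name: [] for name in names}
    for enc in ordered:
        splits[patient_cohort[enc["patient_id"]]].append(enc)
    return splits


# Usage: nine patients admitted on consecutive days, with patient p0 readmitted later.
encs = [{"patient_id": f"p{i}", "admit_date": date(2012, 1, i + 1)} for i in range(9)]
encs.append({"patient_id": "p0", "admit_date": date(2012, 2, 1)})
splits = chronological_split(encs)
print([len(splits[k]) for k in ("train", "validation", "test")])  # [7, 1, 2]
```

Keeping the later readmission of p0 in the training cohort trades strict temporal ordering for patient-level separation; assigning patients by their last admission would be an equally defensible choice.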
This approach mitigates potential adverse effects of dataset drift due to changes in clinical practice or patient populations over time. We ensured that all encounters from the same patient stayed within the same cohort and the same site. We developed the model using the training cohort, tuned hyperparameters and selected models using the validation cohort, and internally validated the model using the test cohort.

Identification of Major Surgery and Outcomes

We identified major surgery using Current Procedural Terminology (CPT) codes and associated relative value units (RVUs), following the methodology of a previous study.¹⁷ A CPT code represents major surgery if the linked RVU includes an intraoperative portion and has an indicator of major surgery (see Ren et al.¹⁷ for more details). When a patient underwent multiple surgeries during one admission, only the surgery with the highest intraoperative working units was included in the analysis. Because detailed surgical information, including precise surgery start and end times, was unavailable, we used the dates associated with the CPT codes to mark the start and end dates of the surgery.

The primary outcomes were in-hospital mortality and three postoperative complications: postoperative intensive care unit (ICU) admission, postoperative mechanical ventilation (MV), and AKI. ICU admissions and MV were identified using diagnosis and procedure codes. We used our previously developed and validated computable phenotype algorithms¹⁸ to determine the presence of AKI. In-hospital mortality was determined using the date of death. These four outcomes were tracked from the beginning of surgery (including the surgery date) until discharge. For more details, we refer readers to Ren et al.¹⁷

Predictors and Data Preprocessing

Following our previous study,¹⁷ we used 99 routinely collected preoperative features, along with historical admission information.
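The major-surgery rule and the one-surgery-per-admission rule above can be sketched as follows. The RVU lookup table, its field names (`intraop_work_units`, `major_flag`), and the placeholder codes `X1` to `X4` are hypothetical stand-ins; the actual RVU source and criteria are those described in Ren et al.¹⁷

```python
def is_major_surgery(rvu):
    """Major surgery: the RVU has an intraoperative portion and a major-surgery indicator.

    `rvu` is a hypothetical dict with 'intraop_work_units' (float) and 'major_flag' (bool).
    """
    return rvu["intraop_work_units"] > 0 and rvu["major_flag"]


def select_index_surgeries(procedures, rvu_table):
    """Return {admission_id: procedure}, keeping only the major surgery with the
    highest intraoperative working units within each admission."""
    best = {}
    for proc in procedures:
        rvu = rvu_table.get(proc["cpt"])
        if rvu is None or not is_major_surgery(rvu):
            continue  # unknown code or not a major surgery
        adm = proc["admission_id"]
        current = best.get(adm)
        # Strict '>' keeps the first procedure seen when working units tie.
        if current is None or rvu["intraop_work_units"] > rvu_table[current["cpt"]]["intraop_work_units"]:
            best[adm] = proc
    return best


# Usage with placeholder CPT codes and made-up RVU values.
rvu_table = {
    "X1": {"intraop_work_units": 10.0, "major_flag": True},
    "X2": {"intraop_work_units": 25.0, "major_flag": True},
    "X3": {"intraop_work_units": 40.0, "major_flag": False},  # not flagged as major
    "X4": {"intraop_work_units": 0.0, "major_flag": True},    # no intraoperative portion
}
procedures = [
    {"admission_id": 1, "cpt": "X1"},
    {"admission_id": 1, "cpt": "X2"},
    {"admission_id": 1, "cpt": "X3"},
    {"admission_id": 2, "cpt": "X4"},
    {"admission_id": 3, "cpt": "X1"},
]
index = select_index_surgeries(procedures, rvu_table)
print({adm: p["cpt"] for adm, p in index.items()})  # {1: 'X2', 3: 'X1'}
```

Admission 2 drops out entirely because its only procedure lacks an intraoperative portion. The tie-breaking rule used in the study is not stated; this sketch simply keeps the first procedure encountered.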
These features included demographics and socioeconomic factors (e.g., insurance type), admission characteristics (e.g., admission source), operative information (e.g., procedure code, provider specialty), comorbidities, historical medications, and preoperative laboratory measurements. For the complete list of features, see Ren et al.¹⁷

We preprocessed the feature variables by removing outliers, imputing missing values, and standardizing the data.⁹,¹⁶,¹⁹ We removed unreasonable values as determined by medical experts, as well as values in the top and bottom 1% of the distribution. We addressed missing values by creating a "missing" category for categorical variables and imputing missing continuous variables with the median value from the training cohort. We standardized the continuous variables using min-max normalization. For federated learning, all partners used the same standardization scaler, but the other preprocessing steps were carried out independently at each site.

Model Architecture

Following the model architecture described in previous studies,⁹,¹⁶ we developed a deep learning model to predict the risk of postoperative complications. Features were divided into continuous, binary, and high-cardinality types and processed through neural networks suited to each type. Continuous and binary features were processed using fully connected layers, while high-cardinality features were handled with embedding layers followed by fully connected layers. The latent representations from the three networks were merged through a fully connected layer. This combined output was then passed to four separate branches, one for each outcome, to estimate the risk probabilities. We refer readers to Shickel et al.⁹ for architecture details. We trained the model using local learning, central learning, and federated learning.
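The continuous-feature preprocessing described above can be sketched for a single variable as below. Two details are assumptions not specified in the text: removed values are treated as missing and then median-imputed, and the shared federated min-max scaler is built from each site's training-cohort minimum and maximum, so only these summaries (not patient rows) would leave a site. The `valid_range` argument stands in for the expert-defined plausible limits, and all numbers are made up.

```python
import statistics


def fit_site_stats(train_values, valid_range):
    """Training-cohort statistics for one continuous feature at one site."""
    lo, hi = valid_range  # expert-defined plausible range (hypothetical values)
    vals = sorted(v for v in train_values if v is not None and lo <= v <= hi)
    cuts = statistics.quantiles(vals, n=100)  # 99 cut points: 1st..99th percentile
    p1, p99 = cuts[0], cuts[-1]
    kept = [v for v in vals if p1 <= v <= p99]  # drop the top and bottom 1%
    return {"p1": p1, "p99": p99, "median": statistics.median(kept),
            "min": min(kept), "max": max(kept)}


def shared_minmax(site_stats):
    """Shared scaler: global min and max over per-site summaries (assumed scheme)."""
    return min(s["min"] for s in site_stats), max(s["max"] for s in site_stats)


def transform(values, stats, valid_range, scaler):
    """Treat implausible or outlying values as missing, impute the site median, min-max scale."""
    lo, hi = valid_range
    gmin, gmax = scaler
    out = []
    for v in values:
        if v is None or not (lo <= v <= hi) or not (stats["p1"] <= v <= stats["p99"]):
            v = stats["median"]  # training-cohort median of this site
        out.append((v - gmin) / (gmax - gmin))
    return out


# Usage: two hypothetical sites with different value ranges.
plausible = (0, 1000)
site_a = fit_site_stats(list(range(1, 101)), plausible)
site_b = fit_site_stats(list(range(101, 201)), plausible)
scaler = shared_minmax([site_a, site_b])  # (2, 199)
print(transform([2, None, 5000], site_a, plausible, scaler))
```

Because every site scales with the same `scaler`, a given raw value maps to the same input for the shared model everywhere, while outlier limits and imputation medians remain site-specific.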
For federated learning, we employed three different algorithms: FedAvg,²⁰ FedProx,²¹ and SCAFFOLD.²² FedAvg averages model weights from multiple clients, while FedProx extends FedAvg by adding a proximal term that penalizes the deviation of a client's local weights from the global model. SCAFFOLD specifically addresses the problem of non-independent and identically distributed (non-IID) data in federated learning environments by using control variates to correct client drift during local training. In addition to these federated learning algorithms tailored to neural network models, we also developed an XGBoost (eXtreme Gradient Boosting) model, implemented using the Federated XGBoost algorithm.²³

Statistical Analysis

We conducted a sensitivity analysis to explore combining federated learning with local learning by fine-tuning the federated learning model at each site using a personalized feature: the surgeon's identity. We froze the weights of the layers before the last output layer, added an embedding layer for the surgeon's identity feature, and concatenated the feature representation from the federated learning model with the personalized features to generate new predictions. We evaluated model performance primarily using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). For the sensitivity analysis, we also assessed sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). We used 1000 bootstrap samples to obtain 95% confidence intervals (CIs) for all performance metrics. For comparison of clinical characteristics and outcomes, we used the χ² test for categorical variables and the Mann-Whitney U test for continuous variables.
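The bootstrap confidence intervals can be sketched in pure Python as below, with AUROC computed from average ranks (equivalent to the Mann-Whitney U statistic, with ties counted as one half). This is an illustrative percentile bootstrap under assumptions not stated in the text: the whole test set is resampled with replacement, single-class resamples are redrawn, and percentiles use the nearest-rank rule. Whether the study resampled by encounter or by patient is not specified here.

```python
import math
import random


def auroc(y_true, y_score):
    """AUROC via average ranks: P(random positive outscores random negative), ties count 1/2."""
    pairs = sorted(zip(y_score, y_true))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average 1-based rank of the tied block
        i = j + 1
    n_pos = sum(y for _, y in pairs)
    n_neg = n - n_pos
    rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_ci(y_true, y_score, metric=auroc, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate plus a percentile bootstrap (1 - alpha) confidence interval."""
    n = len(y_true)
    if not 0 < sum(y_true) < n:
        raise ValueError("need both positive and negative labels")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        if 0 < sum(yt) < n:  # the metric is undefined for single-class resamples
            stats.append(metric(yt, [y_score[i] for i in idx]))
    stats.sort()
    lo = stats[max(0, math.ceil(n_boot * alpha / 2) - 1)]  # nearest-rank percentiles
    hi = stats[math.ceil(n_boot * (1 - alpha / 2)) - 1]
    return metric(y_true, y_score), (lo, hi)


print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
y = [0, 0, 1, 1, 0, 1, 0, 1, 1, 0]
s = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.6, 0.9, 0.3]
print(bootstrap_ci(y, s))
```

The same `bootstrap_ci` wrapper accepts any metric with the `(y_true, y_score)` signature, which is how one resampling loop can yield CIs for every performance metric.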
Statistical significance was defined as P < 0.05 for 2-sided tests, and P values were adjusted for multiple comparisons using the Bonferroni correction to control the family-wise error rate. Analysis was conducted using Python version 3.10, NVFlare version 2.7, and R version 4.5.2.

Results

Patient Clinical Characteristics and Outcomes

Across all training cohorts for model development, significant differences were observed in both demographic characteristics and outcome distributions (Table 1). In the training cohort of 73,458 adult patients who underwent 94,908 major surgeries at Partner 3, the mean (standard deviation, SD) age was 56 (20) years; 37,455 (51%) were female; 53,158 (72%) were White and 15,330 (21%) African American; and 3,540 (5%) were Hispanic. The prevalence of outcomes was 15% for postoperative ICU admission, 6% for postoperative MV, 10% for AKI, and 2% for in-hospital mortality. The Partner 4 training cohort included 85,623 patients who underwent 141,290 major surgical procedures. Compared to Partner 3, this cohort had a higher mean (SD) age of 61 (19) years, a higher percentage of female patients (53%), a higher percentage of White patients (77%), a significantly higher percentage of Hispanic patients (44%), and a significantly lower prevalence of outcomes: 2% for ICU admission, 1% for postoperative MV, 1% for AKI, and 0.1% for in-hospital mortality. The Partner 6 training cohort included 48,542 patients who underwent 60,434 major surgical procedures. Compared to the other two partners, this cohort had an intermediate mean (SD) age of 57 (18) years, a higher percentage of African American patients (24% vs 21% in Partner 3 and 14% in Partner 4), a significantly lower percentage of Hispanic patients (1%), a prevalence of most outcomes between those of Partners 3 and 4 (6% for ICU admission, 2% for MV, and 1% for in-hospital mortality), and the highest prevalence of AKI (15%).
The external validation cohort included 13,148 patients who underwent 13,738 major surgical procedures. Compared to the model development cohorts, this cohort had younger patients (mean [SD] age, 50 [20] years), a higher percentage of female patients (67%), percentages of White and African American patients similar to those of Partners 3 and 6, and a lower prevalence of most outcomes: 0.3% for ICU admission, 0.3% for MV, 2% for AKI, and 0.1% for in-hospital mortality. For each partner, the internal validation (test) cohort also differed significantly from the training cohort in both demographic characteristics and outcome distributions.

Table 1. Patient baseline characteristics

| Variable | Partner 3 Train | Partner 3 Test | Partner 4 Train | Partner 4 Test | Partner 6 Train | Partner 6 Test | External Test | Pᵇ |
|---|---|---|---|---|---|---|---|---|
| Number of patients, n | 73,458 | 36,442 | 85,623 | 42,818 | 48,542 | 24,001 | 13,148 | |
| Number of surgeries, n | 94,908 | 41,364 | 141,290 | 66,499 | 60,434 | 26,772 | 13,738 | |
| Age in years, mean (SD)ᶜ | 56 (20) | 57 (18)ᵃ | 61 (19) | 59 (17)ᵃ | 57 (18) | 57 (17)ᵃ | 50 (20) | <0.001 |
| Sex, n (%)ᶜ | | | | | | | | |
| Male | 36,003 (49) | 18,953 (52)ᵃ | 40,373 (47) | 20,802 (49)ᵃ | 21,994 (45) | 11,709 (49)ᵃ | 4,296 (33) | <0.001 |
| Female | 37,455 (51) | 17,489 (48)ᵃ | 45,250 (53) | 22,016 (51)ᵃ | 26,548 (55) | 12,292 (51)ᵃ | 8,852 (67) | <0.001 |
| Race, n (%)ᶜ,ᵈ,ᵉ | | | | | | | | |
| White | 53,158 (72) | 26,512 (73) | 66,015 (77) | 34,531 (81)ᵃ | 34,604 (71) | 16,704 (70)ᵃ | 9,471 (72) | <0.001 |
| African American | 15,330 (21) | 7,166 (20)ᵃ | 11,876 (14) | 5,531 (13)ᵃ | 11,637 (24) | 5,722 (24) | 2,944 (22) | <0.001 |
| Other | 4,183 (6) | 2,369 (7)ᵃ | 1,536 (2) | 843 (2)ᵃ | 2,099 (4) | 1,465 (6)ᵃ | 254 (2) | <0.001 |
| Missing | 787 (1) | 395 (1) | 6,196 (7) | 1,913 (4)ᵃ | 202 (0.4) | 110 (0.5) | 479 (4) | <0.001 |
| Ethnicity, n (%)ᶜ,ᵈ | | | | | | | | |
| Non-Hispanic | 69,113 (94) | 33,568 (92)ᵃ | 45,386 (53) | 20,949 (49)ᵃ | 45,045 (93) | 21,919 (91)ᵃ | 11,171 (85) | <0.001 |
| Hispanic | 3,540 (5) | 2,159 (6)ᵃ | 37,843 (44) | 19,779 (46)ᵃ | 684 (1) | 485 (2)ᵃ | 948 (7) | <0.001 |
| Missing | 805 (1) | 715 (2)ᵃ | 2,394 (3) | 2,090 (5)ᵃ | 2,813 (6) | 1,597 (7)ᵃ | 1,029 (8) | <0.001 |
| Insurance, n (%)ᶜ | | | | | | | | |
| Medicare | 28,822 (39) | 15,677 (43)ᵃ | 23,967 (28) | 8,707 (20)ᵃ | 15,178 (31) | 5,378 (22)ᵃ | 1,534 (12) | <0.001 |
| Private | 21,994 (30) | 11,191 (31)ᵃ | 49,953 (58) | 27,780 (65)ᵃ | 18,797 (39) | 9,074 (38)ᵃ | 10,167 (77) | <0.001 |
| Medicaid | 15,231 (21) | 6,520 (18)ᵃ | 1,724 (2) | 552 (1)ᵃ | 1,761 (4) | 2,147 (9)ᵃ | 897 (7) | <0.001 |
| Uninsured | 7,411 (10) | 3,054 (8)ᵃ | 9,979 (12) | 5,779 (14)ᵃ | 12,806 (26) | 7,402 (31)ᵃ | 550 (4) | <0.001 |
| Complications, n (%)ᶠ | | | | | | | | |
| ICU admission | 13,991 (15) | 10,469 (25)ᵃ | 2,325 (2) | 2,608 (4)ᵃ | 3,849 (6) | 1,975 (7)ᵃ | 41 (0.3) | <0.001 |
| MV | 6,104 (6) | 4,809 (12)ᵃ | 1,577 (1) | 1,584 (2)ᵃ | 1,124 (2) | 1,212 (5)ᵃ | 40 (0.3) | <0.001 |
| AKI | 9,618 (10) | 5,802 (14)ᵃ | 1,364 (1) | 1,207 (2)ᵃ | 8,959 (15) | 4,810 (18)ᵃ | 287 (2) | <0.001 |
| In-hospital mortality | 1,976 (2) | 866 (2) | 130 (0.1) | 62 (0.1) | 706 (1) | 390 (1)ᵃ | 11 (0.1) | <0.001 |

ᵃ Within each partner, P < 0.05 for the comparison between its training and test cohorts.
ᵇ P values are for the comparison among the three training cohorts from Partners 3, 4, and 6. P values were adjusted using the Bonferroni correction.
ᶜ Data were reported based on values calculated at the latest hospital admission.
ᵈ Race and ethnicity were self-reported.
ᵉ Other races include American Indian or Alaska Native, Asian, Native Hawaiian or Pacific Islander, and multiracial.
ᶠ Data were reported based on postoperative complication status for each surgical procedure. When a patient had multiple surgeries during one admission, only the surgery with the highest intraoperative working units was used in the analysis.

Comparison of Central Learning and Federated Learning Models

We evaluated the performance of central learning and federated learning models both internally and externally using AUROC and AUPRC (Table 2 and Table 3).
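The client-drift correction that distinguishes SCAFFOLD²² from FedAvg²⁰ can be illustrated with a toy sketch; this is not the NVFlare implementation used in the study. Each hypothetical client minimizes a scalar quadratic loss 0.5·a·(w - b)² with its own curvature `a` and minimizer `b`, a stand-in for heterogeneous (non-IID) sites. The sketch assumes that all clients participate in every round, client weights are equal, the global step size is 1, and control variates follow the "option II" update from the SCAFFOLD paper.

```python
def client_grad(a, b, w):
    """Gradient of the hypothetical client loss f(w) = 0.5 * a * (w - b) ** 2."""
    return a * (w - b)


def fedavg(clients, rounds=50, local_steps=10, lr=0.05, x=0.0):
    """FedAvg with full participation and equal client weights."""
    for _ in range(rounds):
        local_models = []
        for a, b in clients:
            y = x
            for _ in range(local_steps):
                y -= lr * client_grad(a, b, y)
            local_models.append(y)
        x = sum(local_models) / len(local_models)  # server averages local models
    return x


def scaffold(clients, rounds=50, local_steps=10, lr=0.05, x=0.0):
    """SCAFFOLD with full participation, global step size 1, option-II control variates."""
    c = 0.0                         # server control variate
    c_local = [0.0] * len(clients)  # one control variate per client
    for _ in range(rounds):
        dx, dc = [], []
        for i, (a, b) in enumerate(clients):
            y = x
            for _ in range(local_steps):
                # Drift-corrected step: local gradient minus c_i plus c.
                y -= lr * (client_grad(a, b, y) - c_local[i] + c)
            new_ci = c_local[i] - c + (x - y) / (local_steps * lr)
            dx.append(y - x)
            dc.append(new_ci - c_local[i])
            c_local[i] = new_ci
        x += sum(dx) / len(clients)
        c += sum(dc) / len(clients)
    return x


clients = [(1.0, 0.0), (10.0, 1.0)]  # (curvature a, minimizer b) per hypothetical site
optimum = sum(a * b for a, b in clients) / sum(a for a, _ in clients)  # 10/11
print(fedavg(clients), scaffold(clients), optimum)
```

With these settings FedAvg settles near 0.71, away from the minimizer of the averaged loss (10/11 ≈ 0.909), while SCAFFOLD converges to it. Setting `local_steps=1` removes FedAvg's gap, since drift arises only when clients take several local steps on heterogeneous objectives.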
Among the three federated learning algorithms tailored to neural network models (FedAvg, FedProx, and SCAFFOLD), SCAFFOLD consistently demonstrated comparable or superior performance in terms of AUROC and AUPRC across all outcomes and sites. Compared with the tree-based Federated XGBoost model, SCAFFOLD achieved comparable or significantly better AUROC across almost all outcomes and sites, except for AKI in the external dataset (0.73 [95% CI, 0.71-0.77] vs 0.82 [95% CI, 0.79-0.84]). Similarly, SCAFFOLD achieved comparable or significantly higher AUPRC across most outcomes and sites, except for AKI at Partner 6 and in the external dataset (Partner 6: 0.51 [95% CI, 0.49-0.52] vs 0.55 [95% CI, 0.54-0.57]; external: 0.09 [95% CI, 0.06-0.12] vs 0.15 [95% CI, 0.12-0.19]). Thus, among all federated learning models, SCAFFOLD achieved the best overall performance. For simplicity, we report only SCAFFOLD performance in the following sections. When compared with the central learning model, SCAFFOLD consistently achieved comparable or superior performance in terms of AUROC and AUPRC across all outcomes and sites, demonstrating its capability to provide strong predictive performance on non-IID data distributions while preserving patient privacy. The performance of the SCAFFOLD model varied across sites. For Partner 3, AUROC ranged from 0.82 (95% CI, 0.81-0.82) for AKI to 0.90 (95% CI, 0.90-0.91) for MV; AUPRC ranged from 0.18 (95% CI, 0.16-0.21) for in-hospital mortality to 0.69 (95% CI, 0.68-0.70) for ICU admission. For Partner 4, AUROC ranged from 0.94 (95% CI, 0.94-0.95) for ICU admission to 0.96 (95% CI, 0.96-0.96) for MV; AUPRC ranged from 0.03 (95% CI, 0.02-0.04) for in-hospital mortality to 0.43 (95% CI, 0.41-0.45) for ICU admission.
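The low AUPRC values for rare outcomes, such as in-hospital mortality at Partner 4 (AUPRC 0.03, with an outcome prevalence of 0.1%), partly reflect how AUPRC depends on prevalence: a non-informative risk score has an expected AUPRC close to the prevalence, whereas its expected AUROC is 0.5. The sketch below computes AUPRC as step-wise average precision, which is one common estimator; the estimator used in the study is not stated here, and tied scores are not specially handled. The 2% prevalence is hypothetical.

```python
import random


def average_precision(y_true, y_score):
    """Step-wise AUPRC: mean precision at the rank of each positive (assumes >= 1 positive)."""
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    n_pos = sum(y_true)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank  # precision at this cutoff
    return total / n_pos


# A non-informative (random) score at a hypothetical 2% prevalence.
rng = random.Random(42)
labels = [1] * 400 + [0] * 19600
scores = [rng.random() for _ in labels]
print(round(average_precision(labels, scores), 3))  # close to the 0.02 prevalence
```

A useful AUPRC should therefore be read against the outcome prevalence at each site rather than against a fixed scale, which is why the same model can show a high AUROC and a low AUPRC on a rare outcome.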
For Partner 6, AUROC ranged from 0.79 (95% CI, 0.77-0.80) for ICU admission to 0.88 (95% CI, 0.86-0.89) for in-hospital mortality; AUPRC ranged from 0.12 (95% CI, 0.10-0.14) for in-hospital mortality to 0.51 (95% CI, 0.49-0.52) for AKI. For the external dataset, AUROC ranged from 0.73 (95% CI, 0.71-0.77) for AKI to 0.97 (95% CI, 0.96-0.99) for in-hospital mortality; AUPRC ranged from 0.01 (95% CI, 0.01-0.02) for ICU admission to 0.16 (95% CI, 0.08-0.29) for MV. Across all outcomes and sites, SCAFFOLD achieved generally high AUROC values, ranging from 0.73 to 0.97, with the highest values observed for MV and in-hospital mortality. In contrast, AUPRC, whose chance-level value equals the outcome prevalence and which is therefore sensitive to class imbalance, varied more widely across sites and outcomes. This variability was particularly noticeable for extremely imbalanced outcomes, such as in-hospital mortality at Partner 4 and ICU admission in the external dataset.

Table 2. Comparison of AUROC with 95% Confidence Interval for Central Learning and Federated Learning Models

| Outcome | CL | FedAvg | FedProx | SCAFFOLD | Federated XGBoost |
|---|---|---|---|---|---|
| Performance on Partner 3 | | | | | |
| ICU admission | 0.87 (0.87-0.88) | 0.83 (0.82-0.83) | 0.81 (0.81-0.81) | 0.85 (0.85-0.86) | 0.78 (0.78-0.78) |
| MV | 0.89 (0.88-0.89) | 0.87 (0.87-0.88) | 0.87 (0.86-0.87) | 0.90 (0.90-0.91) | 0.83 (0.83-0.84) |
| AKI | 0.78 (0.77-0.78) | 0.78 (0.77-0.79) | 0.77 (0.77-0.78) | 0.82 (0.81-0.82) | 0.79 (0.79-0.80) |
| In-hospital mortality | 0.88 (0.87-0.89) | 0.88 (0.87-0.89) | 0.87 (0.86-0.88) | 0.89 (0.88-0.90) | 0.87 (0.85-0.88) |
| Performance on Partner 4 | | | | | |
| ICU admission | 0.94 (0.94-0.95) | 0.94 (0.94-0.94) | 0.94 (0.94-0.94) | 0.94 (0.94-0.95) | 0.92 (0.91-0.92) |
| MV | 0.96 (0.96-0.96) | 0.96 (0.95-0.96) | 0.96 (0.95-0.96) | 0.96 (0.96-0.96) | 0.94 (0.93-0.94) |
| AKI | 0.89 (0.88-0.89) | 0.93 (0.93-0.94) | 0.94 (0.93-0.94) | 0.95 (0.94-0.95) | 0.90 (0.89-0.91) |
| In-hospital mortality | 0.89 (0.86-0.92) | 0.96 (0.95-0.97) | 0.96 (0.95-0.97) | 0.96 (0.94-0.97) | 0.92 (0.88-0.95) |
| Performance on Partner 6 | | | | | |
| ICU admission | 0.76 (0.75-0.77) | 0.77 (0.76-0.78) | 0.75 (0.74-0.76) | 0.79 (0.77-0.80) | 0.75 (0.73-0.76) |
| MV | 0.78 (0.76-0.79) | 0.78 (0.77-0.79) | 0.76 (0.75-0.78) | 0.81 (0.80-0.82) | 0.75 (0.74-0.77) |
| AKI | 0.76 (0.75-0.77) | 0.78 (0.77-0.79) | 0.77 (0.76-0.78) | 0.80 (0.79-0.81) | 0.80 (0.79-0.81) |
| In-hospital mortality | 0.83 (0.81-0.85) | 0.87 (0.86-0.89) | 0.86 (0.85-0.88) | 0.88 (0.86-0.89) | 0.88 (0.86-0.89) |
| Performance on external validation dataset | | | | | |
| ICU admission | 0.75 (0.65-0.83) | 0.77 (0.69-0.84) | 0.75 (0.66-0.84) | 0.77 (0.69-0.84) | 0.55 (0.45-0.64) |
| MV | 0.97 (0.95-0.98) | 0.94 (0.92-0.96) | 0.83 (0.77-0.88) | 0.96 (0.95-0.98) | 0.86 (0.78-0.93) |
| AKI | 0.76 (0.73-0.79) | 0.76 (0.73-0.79) | 0.76 (0.73-0.78) | 0.73 (0.71-0.77) | 0.82 (0.79-0.84) |
| In-hospital mortality | 0.98 (0.95-0.99) | 0.96 (0.91-0.99) | 0.91 (0.85-0.97) | 0.97 (0.96-0.99) | 0.65 (0.39-0.88) |

Abbreviations: CL, central learning; ICU, intensive care unit; MV, mechanical ventilation; AKI, acute kidney injury. Highest performance for each row is bold.

Table 3. Comparison of AUPRC with 95% Confidence Interval for Central Learning and Federated Learning Models

| Outcome | CL | FedAvg | FedProx | SCAFFOLD | Federated XGBoost |
|---|---|---|---|---|---|
| Performance on Partner 3 | | | | | |
| ICU admission | 0.71 (0.70-0.72) | 0.64 (0.63-0.65) | 0.62 (0.62-0.63) | 0.69 (0.68-0.70) | 0.59 (0.58-0.60) |
| MV | 0.53 (0.52-0.54) | 0.50 (0.48-0.51) | 0.49 (0.47-0.50) | 0.56 (0.55-0.58) | 0.47 (0.45-0.48) |
| AKI | 0.39 (0.37-0.40) | 0.39 (0.38-0.40) | 0.38 (0.37-0.40) | 0.46 (0.45-0.48) | 0.47 (0.46-0.49) |
| In-hospital mortality | 0.18 (0.16-0.21) | 0.15 (0.13-0.17) | 0.14 (0.13-0.16) | 0.18 (0.16-0.21) | 0.20 (0.17-0.22) |
| Performance on Partner 4 | | | | | |
| ICU admission | 0.43 (0.41-0.45) | 0.40 (0.38-0.42) | 0.42 (0.40-0.44) | 0.43 (0.41-0.45) | 0.39 (0.37-0.41) |
| MV | 0.38 (0.35-0.40) | 0.35 (0.33-0.38) | 0.36 (0.34-0.38) | 0.38 (0.36-0.40) | 0.36 (0.33-0.39) |
| AKI | 0.15 (0.13-0.17) | 0.28 (0.25-0.31) | 0.29 (0.26-0.32) | 0.35 (0.32-0.38) | 0.29 (0.26-0.32) |
| In-hospital mortality | 0.01 (0.01-0.03) | 0.02 (0.01-0.04) | 0.03 (0.02-0.07) | 0.03 (0.02-0.04) | 0.02 (0.01-0.05) |
| Performance on Partner 6 | | | | | |
| ICU admission | 0.23 (0.22-0.25) | 0.23 (0.21-0.24) | 0.17 (0.16-0.19) | 0.25 (0.23-0.27) | 0.25 (0.23-0.27) |
| MV | 0.15 (0.13-0.17) | 0.15 (0.14-0.17) | 0.12 (0.11-0.13) | 0.19 (0.17-0.21) | 0.15 (0.13-0.16) |
| AKI | 0.41 (0.40-0.43) | 0.47 (0.46-0.48) | 0.46 (0.44-0.47) | 0.51 (0.49-0.52) | 0.55 (0.54-0.57) |
| In-hospital mortality | 0.08 (0.07-0.11) | 0.11 (0.09-0.13) | 0.09 (0.08-0.12) | 0.12 (0.10-0.14) | 0.14 (0.11-0.17) |
| Performance on external validation dataset | | | | | |
| ICU admission | 0.01 (0.01-0.02) | 0.01 (0.01-0.02) | 0.01 (0.01-0.02) | 0.01 (0.01-0.02) | 0.00 (0.00-0.01) |
| MV | 0.13 (0.06-0.23) | 0.09 (0.04-0.18) | 0.08 (0.02-0.18) | 0.16 (0.08-0.29) | 0.08 (0.04-0.18) |
| AKI | 0.08 (0.07-0.11) | 0.06 (0.05-0.08) | 0.06 (0.05-0.07) | 0.09 (0.06-0.12) | 0.15 (0.12-0.19) |
| In-hospital mortality | 0.14 (0.02-0.45) | 0.04 (0.01-0.15) | 0.01 (0.00-0.02) | 0.12 (0.01-0.34) | 0.22 (0.00-0.49) |

Abbreviations: CL, central learning; ICU, intensive care unit; MV, mechanical ventilation; AKI, acute kidney injury. Highest performance for each row is bold.

Comparison of Local Learning and Federated Learning Models

We internally and externally evaluated the performance of three local learning models (the Partner 3, Partner 4, and Partner 6 models, each developed using only its own local data) and the federated learning model (SCAFFOLD) using AUROC and AUPRC (Table 4 and Table 5). Local learning models tended to perform well on their own local data but poorly at other sites. This is likely because the test data at each site have a distribution more similar to the training data of the locally developed model than to the training data from other sites. Consequently, models developed using single-center data often have limited generalizability. In contrast, compared to the best local learning model at each site, the SCAFFOLD model demonstrated strong generalizability, with comparable or superior performance in terms of both AUROC and AUPRC. For the external evaluation, the Partner 3 model, developed using data with relatively balanced outcomes, achieved the best performance among all local learning models, and the SCAFFOLD model achieved comparable or superior performance.

Table 4. Comparison of AUROC with 95% Confidence Interval for Local Learning and Federated Learning Models

| Outcome | Model | Partner 3 Test Dataset | Partner 4 Test Dataset | Partner 6 Test Dataset | External Validation Dataset |
|---|---|---|---|---|---|
| ICU admission | Partner 3 model | 0.87 (0.86-0.87) | 0.86 (0.85-0.86) | 0.74 (0.73-0.75) | 0.79 (0.71-0.85) |
| | Partner 4 model | 0.64 (0.63-0.64) | 0.95 (0.94-0.95) | 0.62 (0.61-0.64) | 0.71 (0.61-0.79) |
| | Partner 6 model | 0.77 (0.76-0.77) | 0.66 (0.65-0.67) | 0.80 (0.78-0.81) | 0.74 (0.66-0.81) |
| | SCAFFOLD | 0.85 (0.85-0.86) | 0.94 (0.94-0.95) | 0.79 (0.77-0.80) | 0.77 (0.69-0.84) |
| MV | Partner 3 model | 0.89 (0.89-0.90) | 0.89 (0.88-0.90) | 0.76 (0.74-0.77) | 0.95 (0.93-0.96) |
| | Partner 4 model | 0.68 (0.67-0.69) | 0.96 (0.96-0.96) | 0.65 (0.64-0.67) | 0.75 (0.65-0.85) |
| | Partner 6 model | 0.79 (0.79-0.80) | 0.69 (0.68-0.70) | 0.80 (0.78-0.81) | 0.90 (0.87-0.93) |
| | SCAFFOLD | 0.90 (0.90-0.91) | 0.96 (0.96-0.96) | 0.81 (0.80-0.82) | 0.96 (0.95-0.98) |
| AKI | Partner 3 model | 0.81 (0.80-0.81) | 0.75 (0.73-0.76) | 0.73 (0.73-0.74) | 0.78 (0.76-0.81) |
| | Partner 4 model | 0.62 (0.61-0.63) | 0.95 (0.95-0.96) | 0.62 (0.61-0.63) | 0.66 (0.62-0.69) |
| | Partner 6 model | 0.74 (0.74-0.75) | 0.69 (0.68-0.71) | 0.79 (0.79-0.80) | 0.73 (0.70-0.77) |
| | SCAFFOLD | 0.82 (0.81-0.82) | 0.95 (0.94-0.95) | 0.80 (0.79-0.81) | 0.73 (0.71-0.77) |
| In-hospital mortality | Partner 3 model | 0.91 (0.90-0.92) | 0.64 (0.56-0.72) | 0.80 (0.78-0.82) | 0.97 (0.95-0.99) |
| | Partner 4 model | 0.71 (0.69-0.73) | 0.95 (0.94-0.97) | 0.66 (0.63-0.68) | 0.68 (0.48-0.85) |
| | Partner 6 model | 0.81 (0.80-0.83) | 0.73 (0.66-0.80) | 0.87 (0.86-0.89) | 0.91 (0.80-0.97) |
| | SCAFFOLD | 0.89 (0.88-0.90) | 0.96 (0.94-0.97) | 0.88 (0.86-0.89) | 0.97 (0.96-0.99) |

Abbreviations: CL, central learning; ICU, intensive care unit; MV, mechanical ventilation; AKI, acute kidney injury. Highest performance for each outcome and each column is bold.

Table 5. Comparison of AUPRC with 95% Confidence Interval for Local Learning and Federated Learning Models

| Outcome | Model | Partner 3 Test Dataset | Partner 4 Test Dataset | Partner 6 Test Dataset | External Validation Dataset |
|---|---|---|---|---|---|
| ICU admission | Partner 3 model | 0.70 (0.69-0.71) | 0.26 (0.24-0.28) | 0.22 (0.20-0.23) | 0.01 (0.01-0.03) |
| | Partner 4 model | 0.42 (0.41-0.43) | 0.46 (0.44-0.48) | 0.10 (0.09-0.10) | 0.01 (0.00-0.01) |
| | Partner 6 model | 0.55 (0.54-0.56) | 0.10 (0.09-0.11) | 0.28 (0.26-0.30) | 0.01 (0.00-0.01) |
| | SCAFFOLD | 0.69 (0.68-0.70) | 0.43 (0.41-0.45) | 0.25 (0.23-0.27) | 0.01 (0.01-0.02) |
| MV | Partner 3 model | 0.55 (0.53-0.56) | 0.20 (0.18-0.22) | 0.14 (0.13-0.16) | 0.07 (0.03-0.12) |
| | Partner 4 model | 0.25 (0.24-0.26) | 0.39 (0.37-0.42) | 0.07 (0.06-0.08) | 0.10 (0.03-0.23) |
| | Partner 6 model | 0.36 (0.35-0.38) | 0.07 (0.06-0.08) | 0.17 (0.15-0.19) | 0.02 (0.01-0.03) |
| | SCAFFOLD | 0.56 (0.55-0.58) | 0.38 (0.36-0.40) | 0.19 (0.17-0.21) | 0.16 (0.08-0.29) |
| AKI | Partner 3 model | 0.45 (0.43-0.46) | 0.10 (0.09-0.12) | 0.38 (0.36-0.39) | 0.07 (0.06-0.09) |
| | Partner 4 model | 0.22 (0.22-0.23) | 0.32 (0.29-0.35) | 0.28 (0.27-0.29) | 0.05 (0.04-0.06) |
| | Partner 6 model | 0.33 (0.32-0.35) | 0.09 (0.08-0.11) | 0.48 (0.47-0.50) | 0.06 (0.05-0.08) |
| | SCAFFOLD | 0.46 (0.45-0.48) | 0.35 (0.32-0.38) | 0.51 (0.49-0.52) | 0.09 (0.06-0.12) |
| In-hospital mortality | Partner 3 model | 0.22 (0.20-0.25) | 0.01 (0.00-0.03) | 0.08 (0.06-0.10) | 0.15 (0.01-0.38) |
| | Partner 4 model | 0.05 (0.05-0.06) | 0.03 (0.02-0.09) | 0.03 (0.02-0.03) | 0.00 (0.00-0.01) |
| | Partner 6 model | 0.09 (0.08-0.10) | 0.01 (0.00-0.01) | 0.10 (0.09-0.12) | 0.01 (0.00-0.02) |
| | SCAFFOLD | 0.18 (0.16-0.21) | 0.03 (0.02-0.04) | 0.12 (0.10-0.14) | 0.12 (0.01-0.34) |

Abbreviations: CL, central learning; ICU, intensive care unit; MV, mechanical ventilation; AKI, acute kidney injury. Highest performance for each outcome and each column is bold.

Sensitivity Analysis

At each partner, we fine-tuned the federated learning model using its local feature, the surgeon's identity, and evaluated its performance (Table 6). Across almost all sites and outcomes, the fine-tuned federated learning model demonstrated consistently comparable or slightly better AUROC and AUPRC. Specifically, for Partner 3, the fine-tuned model showed improvements over the federated learning model for ICU admission (AUROC 0.87 [95% CI, 0.86-0.87] vs. 0.85 [95% CI, 0.85-0.86]; AUPRC 0.71 [95% CI, 0.70-0.72] vs. 0.69 [95% CI, 0.68-0.70]), AKI (AUPRC 0.47 [95% CI, 0.46-0.49] vs. 0.46 [95% CI, 0.45-0.48]), and in-hospital mortality (AUROC 0.90 [95% CI, 0.89-0.90] vs. 0.89 [95% CI, 0.88-0.90]; AUPRC 0.20 [95% CI, 0.18-0.23] vs. 0.18 [95% CI, 0.16-0.21]). The improvements were also reflected in slightly higher specificity and PPV values. For Partner 4, the fine-tuned model showed higher AUPRC for ICU admission (0.46 [95% CI, 0.44-0.48] vs. 0.43 [95% CI, 0.41-0.45]) and MV (0.41 [95% CI, 0.39-0.43] vs. 0.38 [95% CI, 0.36-0.40]), while other metrics remained similar. For Partner 6, the fine-tuned model provided improvements for ICU admission (AUROC 0.80 [95% CI, 0.79-0.81] vs. 0.79 [95% CI, 0.77-0.80]; AUPRC 0.28 [95% CI, 0.26-0.30] vs. 0.25 [95% CI, 0.23-0.27]), MV (AUPRC 0.20 [95% CI, 0.18-0.22] vs. 0.19 [95% CI, 0.
17 - 0. 21 ]), AKI (AUROC 0.81 [95% CI, 0. 80 -0 .81] vs. 0. 80 [95% CI, 0.79-0 .81]; AUPRC 0. 53 [ 95% CI, 0. 51 - 0.54 ] vs . 0. 51 [95% CI, 0.49- 0.52 ]) , and in-hospital mortali ty (AUPRC 0. 13 [95% CI, 0.11- 0. 16 ] vs. 0.12 [95% CI, 0.10- 0. 14 ]). The model also showed sli ght improvements in speci ficity, sensitivity and PP V metrics for most outcomes. Table 6. Sens itivity Analysis Fine-tun ing the Federated Lea rning SCAFFOLD Model by Adding Persona lized Feature (the Surgeon ’s Identity): Model Perfor mance Measurements with 95% Con fidence Interval Outcome Model AUROC AUPRC Sensitivity Specificity PPV NPV Performance on Partner 3 ICU admission FL 0.85 (0.85- 0.86) 0.69 (0.68- 0.70 ) 0.78 (0.75-0.80) 0.77 (0.74-0.79) 0.53 (0.51- 0.55) 0.91 (0.90- 0.92) FT 0.87 (0.86- 0.87) 0.71 (0.70- 0.72) 0.78 (0.76-0.81) 0.81 (0.77-0.82) 0.57 (0.54- 0.59) 0.91 (0.91- 0.92) MV FL 0.90 (0.90- 0.91) 0.56 (0.55- 0.58) 0.86 (0.83-0.87) 0.81 (0.79-0.84) 0.37 (0.35- 0.40) 0.98 (0.97- 0.98) FT 0.90 (0.90- 0.91) 0.56 (0.55- 0.58) 0.86 (0.84-0.88) 0.81 (0.79-0.82) 0.37 (0.35- 0.39) 0.98 (0.98- 0.98) AKI FL 0.82 (0.81- 0.82) 0.46 (0.45- 0.48) 0.78 (0. 
70 -0.81) 0.70 (0.68-0.78) 0.30 (0.29- 0.34) 0.95 (0.94- 0.96) FT 0.82 (0.81- 0.82) 0.47 (0.46- 0.49) 0.74 (0.72-0.77) 0.74 (0.71-0.76) 0.32 (0.30- 0.33) 0.95 (0.94- 0.95) In -hospital mortality FL 0.89 (0.88- 0.90) 0.18 (0.16- 0.21) 0.84 (0.78-0.88) 0.79 (0.74-0.84) 0.08 (0.07- 0.10 ) 1.00 (0.99- 1.00) FT 0.90 (0.89- 0.90) 0.20 (0.18- 0.23) 0.81 (0.79-0.88) 0.82 (0.75-0.83) 0.09 (0.07- 0.10 ) 1.00 (0.99- 1.00) Performance on Partner 4 Outcome Model AUROC AUPRC Sensitivity Specificity PPV NPV ICU admission FL 0.94 (0.94- 0.95) 0.43 (0.41- 0.45) 0.89 (0.84-0.91) 0.86 (0.84-0.90) 0.20 (0.18- 0.26) 0.99 (0.99- 1.00) FT 0.95 (0.94- 0.95) 0.46 (0.44- 0.48) 0.89 (0.86-0.93) 0.86 (0.82-0.88) 0.20 (0.18- 0.23) 0.99 (0.99- 1.00) MV FL 0.96 (0.96- 0.96) 0.38 (0.36- 0.40) 0.91 (0.89-0.92) 0.92 (0.92-0.94) 0.22 (0.21- 0.26) 1.00 (1.00- 1.00) FT 0.96 (0.96- 0.97) 0.41 (0.39- 0.43) 0.90 (0.89-0.93) 0.93 (0.91-0.93) 0.23 (0.19- 0.25) 1.00 (1.00- 1.00) AKI FL 0.95 (0.94- 0.95) 0.35 (0.32- 0.38) 0.91 (0.87-0.94) 0.85 (0.83-0.89) 0.10 (0.09- 0.13) 1.00 (1.00- 1.00) FT 0.95 (0.94- 0.95) 0.33 (0.30- 0.36) 0.90 (0.87-0.94) 0.86 (0.82-0.88) 0.11 (0.09- 0.13) 1.00 (1.00- 1.00) In -hospital mortality FL 0.96 (0.94- 0.97) 0.03 (0.02- 0.04) 0.94 (0.85-1.00) 0.85 (0.76-0.95) 0.01 (0.00- 0.02) 1.00 (1.00- 1.00) FT 0.96 (0.95- 0.97) 0.03 (0.02- 0.05) 0.97 (0.89-1.00) 0.84 (0.80-0.94) 0.01 (0.00- 0.01) 1.00 (1.00- 1.00) Performance on Partner 6 ICU admission FL 0.79 (0.77- 0. 8 0) 0.25 (0.23- 0.27) 0.69 (0.65-0.81) 0.74 (0.62-0.77) 0.17 (0.14- 0.19) 0.97 (0.96- 0.98) FT 0.80 (0.79- 0.81) 0.28 (0.26- 0.30) 0.70 (0.65-0.77) 0.75 (0.68-0.79) 0.18 (0.16- 0.20) 0.97 (0.97- 0.97) MV FL 0.81 (0. 80 - 0.82) 0.19 (0.17- 0.21) 0.75 (0. 70 -0.81) 0.73 (0.68-0.79) 0.12 (0.10- 0.13) 0.98 (0.98- 0.99) FT 0.81 (0.80- 0.82) 0.20 (0.18- 0.22) 0.72 (0. 70 -0.82) 0.77 (0.67-0.79) 0.13 (0.10- 0.14) 0.98 (0.98- 0.99) AKI FL 0.80 (0.79- 0.81) 0.51 (0.49- 0.52) 0.73 (0.69-0.75) 0.71 (0. 
70 -0.75) 0.36 (0.35- 0.38) 0.92 (0.92- 0.93) FT 0.81 (0. 80 - 0.81) 0.53 (0.51- 0.54) 0.73 (0.67-0.77) 0.73 (0.69-0.78) 0.37 (0.35- 0.41) 0.92 (0.92- 0.93) In -hospital mortality FL 0.88 0.12 0.82 (0.77-0.92) 0.78 (0.69-0.84) 0.05 1.00 Outcome Model AUROC AUPRC Sensitivity Specificity PPV NPV (0.86- 0.89) (0. 10 - 0.14) (0.04- 0.07) (1.00- 1.00) FT 0.88 (0.87- 0.90) 0.13 (0.11- 0.16) 0.84 (0.76-0.92) 0.76 (0.68-0.83) 0.05 (0.04- 0.07) 1.00 (1.00- 1.00) Abbreviations. FL ; federated learning, FT ; fine- tun ing, ICU; i ntensive care u nit, MV; mecha nical ventilation, AKI; acute ki dney injury. Highest performa nce for each outc ome is bol d. Discussion In this study, we devel oped and internal ly and externall y validated federat ed learning models to predi ct the risk of major postoperative compl ications and mortali ty using a large multicenter OneFl orida Data Trust dataset. The fede rated learning SCAFF OLD mo del demonstrated the bes t performance fo r our non-IID data and prove d to be more robus t and generalizable than local models. The federa ted learning model also showed compa rable and slightly superior performance to the central learning model while maintaining data privacy . Our study highli ghts several key advantages of federated learning over tradi tional central learning and local l earning approaches . First, the development of federated learning models allows multipl e centers to partici pate, leveraging large and diverse patient populations . This mitigates the poten tial biases introduced by smaller, sing le-center datasets and ensures that the model can generali ze well across diffe rent populati ons and healthcare set tings, which is cruci al for real-world cli nical applications. This capabil ity is demonstrated in our study and sup ported by many other studi es 11,16,24- 26 . For example, Dayan e t al. 
24 proposed a fede rated learning model to predict the future oxygen requirement s for COVID- 19 patients using EHR data fro m 20 institutions. The model provided a 38% increase i n generalizabil ity, as me asured by AUROC scores, compared to local model s. Similarly, Vaid et al . 11 developed feder ated learning models to predict mortali ty in COVID-19 patients within 7 days using data from 5 hospitals, which outperform ed all local models. Second , the federated learning app roach al igns with current ethical and legal s tandards for data pri vacy, addressi ng a major barrier i n the sharing and utilization of medical data. Furthermo re, the use of the federated learning SCAFFOLD model specifically addresses challenge s associated with non-IID data distributions, whi ch are common in multicenter studi es. Our findings highl ight the feasibili ty of implementi ng federated learning in a practical healthcare setting . The model’s robus t predictive per formance, while pres erving data privacy, demonstrates its potential for integra tion into clini cal decision support systems. By uti lizing only preoperative data , our model enabl es early assessment of surgical risk, he lping clinici ans identify high-risk pa tients and tailor periop erative care to reduce the likeli hood of complications. For example, patien ts at high risk of pos toperative AKI or ca rdiovascular ev ents could benefit from preopera tive interventions such as optimizin g fluid status, cont rolling blood pressure, and managing ane mia. 27 Additionally, for patien ts with diabetes, con trolling weight and adjusting lifestyle can sign ificantly enhance their overall surgical outcomes. 28 Early identification o f high- risk patients also facilitates improved pe rioperative planni ng and resource al location. 
It allows for the optimization of operating room and ICU bed schedules, prioritization of resources for the most vulnerable patients, and preparation of necessary equipment and medications. By implementing these measures, healthcare systems can improve patient outcomes and reduce the overall burden on healthcare infrastructure.

The sensitivity analysis highlights the potential of combining federated and local learning approaches. While using routinely collected features for model development can increase model generalizability, adding site-specific features may provide further valuable insights. For example, features such as a surgeon's previous performance relative to their case mix and patients' social determinants of health (e.g., poverty and inequality) have been increasingly recognized for their impact on health. 29,30 Incorporating these features at individual sites allows the model to provide more personalized predictions and to account for local variations that might not be captured by the federated learning model alone. Additionally, if the performance of the federated model drops because of data drift at some centers, fine-tuning the federated learning model at those centers with updated data allows quick adaptation.

The study has several limitations. First, the dataset lacks certain granular details of surgeries (such as exact start and end times) and the types of anesthesia administered, which could introduce bias into our results. Additionally, the absence of data on factors such as patient locations (stations) and respiratory device usage limits the scope of our outcomes. Second, while our study simulated the federated learning environment, it did not fully replicate the practical challenges of real-world implementation. For example, variations in EHR data cleaning, labeling, and standardization practices across centers were not addressed.
Additionally, technical challenges such as network latency, data synchronization, and differences in computational power among participating nodes were not evaluated. These factors could substantially influence the feasibility and performance of federated learning in an actual deployment. Future work should emphasize the creation of a multicenter surgical dataset with clear provenance, employing common data models and including comprehensive surgery-specific elements from diverse, multi-institutional sources.

Conclusions

We developed robust, generalizable, and privacy-preserving predictive models for major postoperative complications and mortality using a large multicenter dataset. Further implementation studies are needed to validate the federated learning platform across different healthcare systems and to assess the clinical impact of these models on patient care.

References

1. Dobson GP. Trauma of major surgery: A global problem that is not going away. Int J Surg. Sep 2020;81:47-54. doi:10.1016/j.ijsu.2020.07.017
2. Nye HE, Shen EP, Baig F. Postoperative Complications. Medical Clinics. 2024;108(6):1201-1214.
3. Gratama DN, Weinberg L, Raykateeraroj N, et al. Reduced Long-Term Survival After Postoperative Complications in Major Gastrointestinal Surgery. Ther Clin Risk Manag. 2025;21:1459-1472. doi:10.2147/TCRM.S543913
4. Weinberg L, Ratnasekara V, Tran AT, et al. The Association of Postoperative Complications and Hospital Costs Following Distal Pancreatectomy. Front Surg. 2022;9:890518. doi:10.3389/fsurg.2022.890518
5. Healy MA, Mullard AJ, Campbell DA, Jr., Dimick JB. Hospital and Payer Costs Associated With Surgical Complications. JAMA Surg. Sep 1 2016;151(9):823-30. doi:10.1001/jamasurg.2016.0773
6. Tevis SE, Kennedy GD. Postoperative complications and implications on patient-centered outcomes. J Surg Res. May 1 2013;181(1):106-13. doi:10.1016/j.jss.2013.01.032
7. Gill TM, Vander Wyk B, Leo-Summers L, Murphy TE, Becher RD. Population-Based Estimates of 1-Year Mortality After Major Surgery Among Community-Living Older US Adults. JAMA Surg. Dec 1 2022;157(12):e225155. doi:10.1001/jamasurg.2022.5155
8. Bihorac A, Ozrazgat-Baslanti T, Ebadi A, et al. MySurgeryRisk: Development and Validation of a Machine-learning Risk Algorithm for Major Complications and Death After Surgery. Ann Surg. Apr 2019;269(4):652-662. doi:10.1097/SLA.0000000000002706
9. Shickel B, Loftus TJ, Ruppert M, et al. Dynamic predictions of postoperative complications from explainable, uncertainty-aware, and multi-task deep neural networks. Sci Rep. Jan 21 2023;13(1):1224. doi:10.1038/s41598-023-27418-5
10. Datta S, Loftus TJ, Ruppert MM, et al. Added Value of Intraoperative Data for Predicting Postoperative Complications: The MySurgeryRisk PostOp Extension. J Surg Res. Oct 2020;254:350-363. doi:10.1016/j.jss.2020.05.007
11. Vaid A, Jaladanki SK, Xu J, et al. Federated Learning of Electronic Health Records to Improve Mortality Prediction in Hospitalized Patients With COVID-19: Machine Learning Approach. JMIR Med Inform. Jan 27 2021;9(1):e24207. doi:10.2196/24207
12. Choudhury O, Park Y, Salonidis T, Gkoulalas-Divanis A, Sylla I, Das AK. Predicting Adverse Drug Reactions on Distributed Health Data using Federated Learning. AMIA Annu Symp Proc. 2019;2019:313-322.
13. Brisimi TS, Chen R, Mela T, Olshevsky A, Paschalidis IC, Shi W. Federated learning of predictive models from federated Electronic Health Records. Int J Med Inform. Apr 2018;112:59-67. doi:10.1016/j.ijmedinf.2018.01.007
14. Ng D, Lan X, Yao MM-S, Chan WP, Feng M. Federated learning: a collaborative effort to achieve better medical imaging models for individual sites that have small labelled datasets. Quantitative Imaging in Medicine and Surgery. 2021;11(2):852.
15. Huang CT, Wang TJ, Kuo LK, et al. Federated machine learning for predicting acute kidney injury in critically ill patients: a multicenter study in Taiwan. Health Inf Sci Syst. Dec 2023;11(1):48. doi:10.1007/s13755-023-00248-5
16. Ren Y, Park Y, Shickel B, et al. Federated Learning for Predicting Major Postoperative Complications. Ann Surg Open. Jun 2025;6(2):e573. doi:10.1097/AS9.0000000000000573
17. Ren Y, Adiyeke E, Guan Z, et al. Validation of the MySurgeryRisk Algorithm for Predicting Complications and Death after Major Surgery: A Retrospective Multicenter Study Using OneFlorida Data Trust. arXiv preprint arXiv:2506.21814. 2025.
18. Ozrazgat-Baslanti T, Ren Y, Adiyeke E, et al. Development and validation of a race-agnostic computable phenotype for kidney health in adult hospitalized patients. PLoS One. 2024;19(4):e0299332. doi:10.1371/journal.pone.0299332
19. Ren Y, Loftus TJ, Datta S, et al. Performance of a Machine Learning Algorithm Using Electronic Health Record Data to Predict Postoperative Complications and Report on a Mobile Platform. JAMA Netw Open. May 2 2022;5(5):e2211973. doi:10.1001/jamanetworkopen.2022.11973
20. McMahan B, Moore E, Ramage D, Hampson S, y Arcas BA. Communication-efficient learning of deep networks from decentralized data. PMLR; 2017:1273-1282.
21. Li T, Sahu AK, Zaheer M, Sanjabi M, Talwalkar A, Smith V. Federated optimization in heterogeneous networks. Proceedings of Machine Learning and Systems. 2020;2:429-450.
22. Karimireddy SP, Kale S, Mohri M, Reddi S, Stich S, Suresh AT. Scaffold: Stochastic controlled averaging for federated learning. PMLR; 2020:5132-5143.
23. Xu Z, Hsieh Y-T, Zhang Z, et al. Secure Federated XGBoost with CUDA-accelerated Homomorphic Encryption via NVIDIA FLARE. arXiv preprint arXiv:2504.03909. 2025.
24. Dayan I, Roth HR, Zhong A, et al. Federated learning for predicting clinical outcomes in patients with COVID-19. Nat Med. Oct 2021;27(10):1735-1743. doi:10.1038/s41591-021-01506-3
25. Feki I, Ammar S, Kessentini Y, Muhammad K. Federated learning for COVID-19 screening from Chest X-ray images. Appl Soft Comput. Jul 2021;106:107330. doi:10.1016/j.asoc.2021.107330
26. Feng B, Shi J, Huang L, et al. Robustly federated learning model for identifying high-risk patients with postoperative gastric cancer recurrence. Nat Commun. Jan 25 2024;15(1):742. doi:10.1038/s41467-024-44946-4
27. John A, Norbert L, Peter A, et al. Kidney disease: Improving global outcomes (KDIGO) acute kidney injury work group. KDIGO clinical practice guideline for acute kidney injury. Kidney Int Suppl. 2012;2:1-138.
28. Polderman JAW, Hermanides J, Hulst AH. Update on the perioperative management of diabetes mellitus. BJA Educ. Aug 2024;24(8):261-269. doi:10.1016/j.bjae.2024.04.007
29. Gabert R, Thomson B, Gakidou E, Roth G. Identifying High-Risk Neighborhoods Using Electronic Medical Records: A Population-Based Approach for Targeting Diabetes Prevention and Treatment Interventions. PLoS One. 2016;11(7):e0159227. doi:10.1371/journal.pone.0159227
30. Goodwin AJ, Nadig NR, McElligott JT, Simpson KN, Ford DW. Where You Live Matters: The Impact of Place of Residence on Severe Sepsis Incidence and Mortality. Chest. Oct 2016;150(4):829-836. doi:10.1016/j.chest.2016.07.004