What Drives Length of Stay After Elective Spine Surgery? Insights from a Decade of Predictive Modeling
Objective: Predicting length of stay after elective spine surgery is essential for optimizing patient outcomes and hospital resource use. This systematic review synthesizes computational methods used to predict length of stay in this patient population, highlighting model performance and key predictors. Methods: Following PRISMA guidelines, we systematically searched PubMed, Google Scholar, and the ACM Digital Library for studies published between December 1, 2015, and December 1, 2024. Eligible studies applied statistical or machine learning models to predict length of stay for elective spine surgery patients. Three reviewers independently screened studies and extracted data. Results: Of 1,263 screened studies, 29 met the inclusion criteria. Length of stay was predicted as a continuous, binary, or percentile-based outcome. Models included logistic regression, random forest, boosting algorithms, and neural networks. Machine learning models consistently outperformed traditional statistical models, with AUCs ranging from 0.94 to 0.99. K-Nearest Neighbors and Naive Bayes achieved top performance in some studies. Common predictors included age, comorbidities (notably hypertension and diabetes), BMI, type and duration of surgery, and number of spinal levels. However, external validation and reporting practices varied widely across studies. Discussion: There is growing interest in artificial intelligence and machine learning for length of stay prediction, but a lack of standardization and external validation limits clinical utility. Future studies should prioritize standardized outcome definitions and the transparent reporting needed to advance real-world deployment. Conclusion: Machine learning models show strong discrimination for length of stay prediction after elective spine surgery, highlighting their potential to improve discharge planning and hospital resource management.
💡 Research Summary
This systematic review examined the literature from December 1, 2015, to December 1, 2024, to identify and evaluate predictive models of length of stay (LOS) after elective spine surgery. Following PRISMA guidelines, the authors searched PubMed, Google Scholar, and the ACM Digital Library, initially retrieving 1,263 records. After duplicate removal, title/abstract screening, and full‑text eligibility assessment, 29 studies met the inclusion criteria (original empirical work predicting LOS in elective spine surgery using data‑driven techniques, English language, peer‑reviewed).
The included studies were predominantly retrospective (≈90 %) with a minority of prospective designs. Sample sizes varied dramatically—from fewer than 100 patients to over 450,000—yielding an average cohort of about 29,700 patients. Data sources were heterogeneous: electronic health records (≈45 %), national registries such as ACS‑NSQIP, the Nationwide Readmissions Database, and CMS claims (≈20 %), as well as state‑level databases (e.g., SPARCS) and single‑center registries. The patient population skewed older (mean age ≈60 years), and the surgical procedures were mainly lumbar and cervical fusions, with multi‑level and minimally invasive techniques also represented.
Modeling approaches fell into two broad categories: traditional statistical methods (logistic regression, linear regression, Cox models) and machine learning (ML) techniques (random forest, gradient boosting variants such as XGBoost and LightGBM, support vector machines, multilayer perceptrons, K‑Nearest Neighbors, Naïve Bayes, and, in a few cases, deep neural networks). Several studies compared both types directly. Performance was most commonly reported using the area under the receiver‑operating characteristic curve (AUC), but accuracy, precision, recall, and F1‑score were also provided. Across the board, ML models outperformed statistical models, achieving AUCs ranging from 0.94 to 0.99, whereas statistical models typically reported AUCs between 0.70 and 0.85. Notably, K‑Nearest Neighbors and Naïve Bayes achieved the highest AUCs in a few individual datasets, while ensemble methods (random forest, XGBoost) consistently ranked among the top performers.
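The head-to-head comparisons described above can be illustrated with a short sketch. This is not any study's actual pipeline: the cohort is synthetic (`make_classification` standing in for patient features such as age, BMI, and comorbidities), and the two models are representative examples of the "statistical" and "ML" categories the review contrasts.

```python
# Illustrative comparison of a statistical baseline vs. an ML model on a
# synthetic binary outcome (e.g., prolonged LOS), scored by AUC as in the review.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a surgical cohort; the 20% positive class mimics
# a dichotomized "prolonged stay" label.
X, y = make_classification(n_samples=2000, n_features=10, n_informative=6,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

for name, model in [("logistic regression", LogisticRegression(max_iter=1000)),
                    ("random forest", RandomForestClassifier(random_state=0))]:
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```

On real cohorts the gap between the two model families was what varied across studies; the sketch only shows the shared evaluation recipe (train/test split, probability output, AUC).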
Key predictors that repeatedly emerged as important across studies included patient age, comorbidities (especially hypertension and diabetes), body mass index (BMI), type of surgery (fusion vs. decompression), operative time, and the number of spinal levels fused. Some investigations incorporated laboratory values (e.g., serum albumin, CRP) and found modest performance gains (≈1–2 % absolute AUC increase). A subset of papers employed natural language processing (NLP) to extract features from operative notes or discharge summaries; these hybrid models demonstrated 2–3 percentage‑point improvements over models using only structured data.
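How such predictors are ranked can be sketched with permutation importance, one common way the included studies reported variable importance. The feature names below are placeholders for the predictors listed above, and the toy data is constructed (an assumption, not a review finding) so that operative time and levels fused dominate.

```python
# Hypothetical ranking of the recurring LOS predictors via permutation
# importance; data and the true signal are synthetic placeholders.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
features = ["age", "hypertension", "diabetes", "bmi",
            "fusion_surgery", "operative_time", "levels_fused"]
X = rng.normal(size=(1000, len(features)))
# By construction, prolonged LOS depends mostly on operative time (col 5)
# and levels fused (col 6) in this toy dataset.
y = (X[:, 5] + X[:, 6] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

model = GradientBoostingClassifier(random_state=0).fit(X, y)
imp = permutation_importance(model, X, y, n_repeats=5, random_state=0)
ranked = sorted(zip(features, imp.importances_mean), key=lambda t: -t[1])
for name, score in ranked:
    print(f"{name:15s} {score:.3f}")
```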
Risk‑of‑bias assessment using the Cochrane tool revealed concerns primarily around selection bias and reporting bias. Methodological heterogeneity was evident in how studies handled missing data, class imbalance (e.g., oversampling, SMOTE), and hyper‑parameter tuning, limiting reproducibility. External validation was scarce—only about 10 % of the studies validated their models on an independent cohort, with the majority relying on internal cross‑validation.
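The class-imbalance handling mentioned above (oversampling, SMOTE) can be illustrated with a minimal interpolation sketch. The studies themselves would typically have used a library implementation such as imbalanced-learn; this NumPy version only shows the core SMOTE idea of synthesizing minority samples between nearest neighbors, and `smote_like` is a hypothetical helper name.

```python
# Minimal SMOTE-style oversampler: new minority samples are interpolations
# between a minority point and one of its k nearest minority neighbors.
import numpy as np

def smote_like(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic minority-class samples (simplified SMOTE)."""
    rng = np.random.default_rng(seed)
    # Pairwise distances within the minority class; exclude self-matches.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    neighbors = np.argsort(d, axis=1)[:, :k]
    out = np.empty((n_new, X_min.shape[1]))
    for j in range(n_new):
        i = rng.integers(len(X_min))
        nb = X_min[rng.choice(neighbors[i])]
        out[j] = X_min[i] + rng.random() * (nb - X_min[i])  # gap in [0, 1)
    return out

minority = np.random.default_rng(1).normal(size=(30, 4))
synthetic = smote_like(minority, n_new=70)
print(synthetic.shape)  # (70, 4)
```

Because each synthetic point lies on a segment between two real minority samples, the oversampled data stays within the minority class's observed range, which is the property that distinguishes SMOTE from naive duplication.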
The discussion highlighted the promise of ML for LOS prediction but stressed several barriers to clinical translation: lack of standardized LOS definitions, limited external validation, insufficient model interpretability (e.g., SHAP or LIME analyses were rarely reported), and variable reporting of calibration. The authors argue that for real‑world deployment, models must be integrated into electronic health record workflows, provide actionable risk scores to clinicians, and be accompanied by cost‑effectiveness analyses.
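The calibration reporting the review found lacking is straightforward to produce; a minimal sketch using scikit-learn's reliability binning is shown below. The predicted risks here are simulated to be well calibrated by construction, purely to illustrate the output format.

```python
# Reliability-curve sketch: compare predicted risk to observed event rate
# per probability bin, on simulated (well-calibrated by design) predictions.
import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(0)
y_prob = rng.random(5000)                          # model's predicted risks
y_true = (rng.random(5000) < y_prob).astype(int)   # outcomes drawn at those risks
prob_true, prob_pred = calibration_curve(y_true, y_prob, n_bins=10)
for p_hat, p_obs in zip(prob_pred, prob_true):
    print(f"predicted {p_hat:.2f} -> observed {p_obs:.2f}")
```

A model with high AUC can still be badly miscalibrated, which is why the authors single out calibration (alongside SHAP/LIME-style interpretability) as a reporting gap for clinical deployment.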
In conclusion, the review confirms that machine‑learning models can predict postoperative LOS after elective spine surgery with high discrimination, outperforming traditional statistical approaches. However, to move from research to bedside, future work should prioritize multi‑institutional external validation, transparent reporting of model development (including feature importance and calibration), and the creation of user‑friendly decision support tools that align with hospital operational needs.