Predicting first-episode homelessness among US Veterans using longitudinal EHR data: time-varying models and social risk factors


Homelessness among US veterans remains a critical public health challenge, yet risk prediction offers a pathway for proactive intervention. In this retrospective prognostic study, we analyzed electronic health record (EHR) data from 4,276,403 Veterans Affairs patients during a 2016 observation period to predict first-episode homelessness occurring 3-12 months later in 2017 (prevalence: 0.32-1.19%). We constructed static and time-varying EHR representations, using clinician-informed logic to model the persistence of clinical conditions and social risks over time. We then compared the performance of classical machine learning, transformer-based masked language models, and fine-tuned large language models (LLMs). We demonstrate that incorporating social and behavioral factors into longitudinal models improved precision-recall area under the curve (PR-AUC) by 15-30%. In the top 1% risk tier, models yielded positive predictive values of 3.93-4.72% at 3 months, 7.39-8.30% at 6 months, 9.84-11.41% at 9 months, and 11.65-13.80% at 12 months, depending on model architecture. Large language models underperformed encoder-based models on discrimination but showed smaller performance disparities across racial groups. These results demonstrate that longitudinal, socially informed EHR modeling concentrates homelessness risk into actionable strata, enabling targeted and data-informed prevention strategies for at-risk veterans.


💡 Research Summary

This study tackles the pressing public‑health problem of homelessness among United States veterans by developing and evaluating predictive models that forecast a veteran’s first episode of homelessness within 3, 6, 9, or 12 months after a one‑year observation period. Using the Veterans Affairs (VA) electronic health record (EHR) system, the authors assembled a cohort of 4,276,403 patients who had any VA encounter in 2016. During the subsequent 12‑month outcome window, 13,728 (0.32%) experienced homelessness within three months, rising to 51,002 (1.19%) within twelve months.

A central methodological contribution is the construction of time‑varying patient representations. Raw visit‑level data were aggregated into half‑year intervals, and a clinician‑informed “condition‑persistence” framework was applied to each clinical, social, and behavioral variable. This framework distinguishes chronic, ever‑history, recurrent, and episodic patterns, allowing a diagnosis or risk factor to remain “active” for a clinically plausible duration after its last recorded occurrence. The resulting longitudinal feature matrix preserves temporal dynamics that static snapshots discard. The authors also built comparable static representations for baseline comparison.
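As a rough sketch of this idea (not the authors' actual implementation), condition persistence can be modeled by letting each recorded event keep a variable "active" for a pattern-dependent window when building half-year interval features. The pattern names follow the summary above, but the specific persistence windows here are illustrative assumptions:

```python
from datetime import date, timedelta

# Hypothetical persistence windows per pattern (illustrative, not the paper's
# clinician-specified values): chronic/ever-history conditions stay active
# once first recorded; recurrent and episodic ones lapse after a fixed
# number of days since the last recorded occurrence.
PERSISTENCE_DAYS = {"chronic": None, "ever": None, "recurrent": 365, "episodic": 180}

def active_intervals(event_dates, pattern, interval_starts, interval_len_days=182):
    """Return a 0/1 flag per half-year interval indicating whether the
    condition is considered clinically active during that interval."""
    window = PERSISTENCE_DAYS[pattern]
    flags = []
    for start in interval_starts:
        end = start + timedelta(days=interval_len_days)
        if window is None:
            # chronic / ever-history: active in every interval at or after
            # the first recorded occurrence
            active = any(d < end for d in event_dates)
        else:
            # recurrent / episodic: active only if an occurrence falls
            # within `window` days before the interval (or inside it)
            active = any(d < end and (start - d).days <= window for d in event_dates)
        flags.append(int(active))
    return flags
```

For example, a single "episodic" record from February 2016 would flag the first half of 2016 as active but, with a 180-day window, lapse by early 2017, whereas a "chronic" record would remain active in both intervals.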

Three families of predictive algorithms were trained on both static and time‑varying inputs: (1) classical machine‑learning models (Elastic Net logistic regression, Random Forest, XGBoost); (2) masked language models (ModernBERT‑T and BioClinical‑ModernBERT‑T) fine‑tuned on structured prompts derived from the longitudinal data; and (3) large language models (LLMs) – Llama‑3.1‑8B and OpenBioLLM‑8B – also fine‑tuned using natural‑language prompts that encode the same information. Model performance was assessed primarily with precision‑recall area under the curve (PR‑AUC), given the extreme class imbalance, and secondarily with ROC‑AUC. Additional metrics included sensitivity, specificity, positive predictive value (PPV), and observed‑to‑expected (O/E) ratios within the top 1 % and 5 % risk tiers.
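The choice of PR‑AUC as the primary metric matters because, unlike ROC‑AUC, its baseline equals the outcome prevalence (here 0.32–1.19 %), so it discriminates sharply between models on rare outcomes. A minimal NumPy-only sketch of the standard average‑precision formulation (which the authors may have computed differently, e.g. via scikit‑learn):

```python
import numpy as np

def pr_auc(y_true, scores):
    """Average precision: the mean of precision evaluated at the rank of
    each true positive. A random ranker scores roughly the prevalence."""
    order = np.argsort(-np.asarray(scores))     # rank patients by descending risk
    y = np.asarray(y_true)[order]
    cum_tp = np.cumsum(y)                       # true positives up to each rank
    precision = cum_tp / np.arange(1, len(y) + 1)
    return precision[y == 1].mean()             # average over positive ranks
```

With 1 % prevalence, a PR‑AUC of 6.72 % (the best 12‑month result) is nearly seven times the chance baseline, even though the absolute number looks small.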

Across all horizons, time‑varying representations consistently outperformed static ones, improving PR‑AUC by roughly 15–30 %. The best‑performing models were ModernBERT‑T (time‑varying) for the 3‑month horizon (PR‑AUC = 2.39 %, 95 % CI 1.80–3.34 %) and the 9‑month horizon (PR‑AUC = 5.27 %, 95 % CI 4.68–6.01 %), and XGBoost (time‑varying) for the 6‑month (PR‑AUC = 4.13 %) and 12‑month horizons (PR‑AUC = 6.72 %). In contrast, LLMs achieved ROC‑AUC comparable to the encoder‑based models but lagged in PR‑AUC, reflecting the difficulty of capturing rare events with generative architectures. Notably, LLMs exhibited the smallest performance gaps across racial sub‑groups, suggesting a potential fairness advantage.

Risk concentration analysis revealed that screening only the top 1 % of predicted risk captured 9.8‑14.7 % of all future homelessness cases, with specificity remaining above 99 %. Expanding to the top 5 % increased capture to 26‑33 % while maintaining specificity around 95 %. Positive predictive values rose with longer horizons and narrower tiers: PPV in the top 1 % tier ranged from 3.93 % at three months to 13.80 % at twelve months; the highest PPV observed (18.71 %) occurred when selecting the top 0.5 % at twelve months, corresponding to an O/E ratio of 14.72. These figures indicate that, within the VA data, roughly one in five veterans flagged in the highest‑risk stratum actually became homeless within a year, providing a highly actionable target for preventive outreach.
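The tier metrics above all derive from one operation: rank patients by predicted risk, take the top fraction, and compare the observed case rate inside the tier to the overall prevalence. A small illustrative sketch (variable and function names are my own, not the paper's):

```python
import numpy as np

def tier_metrics(y_true, scores, frac=0.01):
    """PPV, case capture (sensitivity of the screen), and observed-to-
    expected (O/E) ratio for the top `frac` of predicted risk."""
    y = np.asarray(y_true, dtype=float)
    s = np.asarray(scores, dtype=float)
    k = max(1, int(round(frac * len(y))))
    top = np.argsort(-s)[:k]           # indices of the highest-risk tier
    ppv = y[top].mean()                # observed case rate in the tier
    capture = y[top].sum() / y.sum()   # share of all cases the tier catches
    oe_ratio = ppv / y.mean()          # enrichment over overall prevalence
    return ppv, capture, oe_ratio
```

Under this definition, a PPV of 18.71 % in a population with ~1.27 % twelve‑month prevalence yields exactly the O/E ratio of about 14.7 reported for the top 0.5 % tier, which is how "observed‑to‑expected" should be read here.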

The authors acknowledge several limitations. First, homelessness identification relies on VA administrative codes, which may miss veterans who become homeless outside the VA system, limiting external validity. Second, social‑behavioral risk factors are often under‑recorded, potentially attenuating model performance. Third, the observational design precludes causal inference; prospective trials would be needed to confirm that risk‑based interventions reduce homelessness incidence. Finally, model interpretability, especially for LLMs, remains a challenge for clinical deployment.

In conclusion, this work demonstrates that integrating clinically informed, time‑varying EHR features with social and behavioral risk factors substantially improves the prediction of first‑episode homelessness among veterans. The longitudinal, socially enriched models concentrate risk into a small, high‑yield segment of the population, enabling health systems to allocate limited preventive resources efficiently. Moreover, the comparative analysis of traditional machine‑learning, transformer‑based masked language models, and large language models offers valuable insights into the trade‑offs between discrimination, fairness, and scalability for real‑world health‑policy applications.

