Population physiology: leveraging population scale (EHR) data to understand human endocrine dynamics

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Studying physiology over a broad population for long periods of time is difficult primarily because collecting human physiologic data is intrusive, dangerous, and expensive. Electronic health record (EHR) data promise to support the development and testing of mechanistic physiologic models on diverse populations, but limitations in the data have thus far thwarted such use. For instance, using uncontrolled population-scale EHR data to verify the outcome of time-dependent behavior of mechanistic, constructive models can be difficult because: (i) aggregation of the population can obscure or generate a signal, (ii) there is often no control population, and (iii) diversity in how the population is measured can make the data difficult to fit into conventional analysis techniques. This paper shows that it is possible to use EHR data to test a physiological model for a population and over long time scales. Specifically, a methodology is developed and demonstrated for testing a mechanistic, time-dependent, physiological model of serum glucose dynamics with uncontrolled, population-scale, physiological patient data extracted from an EHR repository. It is shown that there is no observable daily variation in the normalized mean glucose for any EHR subpopulation. In contrast, a derived value, daily variation in nonlinear correlation quantified by the time-delayed mutual information (TDMI), did reveal the intuitively expected diurnal variation in glucose levels amongst an uncontrolled ("wild") population of humans. Moreover, in a population of intravenously fed patients, there was no observable TDMI-based diurnal signal. These TDMI-based signals, via a glucose-insulin model, were then connected with human feeding patterns. In particular, a constructive physiological model was shown to correctly predict the difference between the general uncontrolled population and a subpopulation whose feeding was controlled.
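The negative result on mean glucose amounts to grouping measurements by hour of day and checking whether the normalized mean changes across the day. A minimal sketch, assuming "normalized" means dividing each value by that patient's own mean (the paper's exact normalization is not stated here) and assuming a hypothetical input layout of `(hour_of_day, glucose)` tuples keyed by patient:

```python
from collections import defaultdict
from statistics import mean


def hourly_normalized_mean(records):
    """records: dict patient_id -> list of (hour_of_day, glucose) tuples.

    Assumed normalization: each value divided by that patient's mean, so
    patients with a high baseline glucose do not dominate the average.
    """
    by_hour = defaultdict(list)
    for measurements in records.values():
        if not measurements:
            continue
        patient_mean = mean(g for _, g in measurements)
        for hour, g in measurements:
            by_hour[int(hour) % 24].append(g / patient_mean)
    # Mean normalized glucose for every hour that has at least one value
    return {h: mean(vals) for h, vals in sorted(by_hour.items())}
```

A flat profile (all hourly values close to 1) is what the abstract reports for every EHR subpopulation.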

💡 Research Summary

This study addresses the challenge of investigating human physiological dynamics at the population level over long time horizons, a task traditionally hampered by the invasiveness, cost, and logistical difficulty of collecting continuous physiological data from large cohorts. The authors propose that electronic health records (EHRs), which contain decades‑long, heterogeneous clinical measurements for hundreds of thousands of patients, can serve as a “natural experiment” for testing mechanistic physiological models, provided that appropriate analytical tools are employed.

The paper focuses on glucose‑insulin dynamics as a test case because the underlying physiology is well understood, and simple mechanistic models exist. The authors first assemble two large datasets from Columbia University Medical Center: (1) a general cohort comprising all in‑patients and out‑patients over a 20‑year period (≈800,000 individuals) with sporadic glucose measurements, and (2) a sub‑cohort of intensive‑care patients receiving continuous intravenous nutrition (NICU), representing a population with tightly controlled glucose input.

A standard six‑equation glucose‑insulin model (based on Sturis et al.) is adopted. The model tracks plasma insulin (Ip), remote insulin (Ii), and glucose (G), together with a three‑stage delay chain (h1‑h3) that captures the lag between changes in plasma insulin and their suppressive effect on hepatic glucose production. All parameters are fixed to values reported in the literature; the only varied component is the exogenous glucose delivery rate (IG), which the authors encode using four distinct feeding regimens: (a) continuous constant infusion (simulating NICU patients), (b) regular meals at 08:00, 12:00, and 18:00, (c) the same meals with a random ±1 h jitter each day, and (d) completely random meal times each day.
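The six ODEs can be sketched as follows. This is a minimal illustration, not the authors' code: the equation forms follow the commonly cited Sturis/Tolić formulation, the parameter values in `P` are illustrative assumptions to be checked against the original sources, and the fixed‑step RK4 integrator is a simple stand‑in for whatever solver the paper used. The exogenous input IG is passed in as a hypothetical callable `glucose_in(t)` (t in minutes, result in mg/min), so any feeding regimen can be plugged in.

```python
import math

# Illustrative parameters (units: litres, minutes, mg, mU), modeled on values
# commonly quoted for the Sturis et al. model; an assumption, verify before reuse.
P = dict(Vp=3.0, Vi=11.0, Vg=10.0, E=0.2, tp=6.0, ti=100.0, td=36.0,
         Rm=210.0, a1=300.0, C1=2000.0, Ub=72.0, C2=144.0, C3=1000.0,
         U0=40.0, Um=940.0, beta=1.77, C4=80.0,
         Rg=180.0, alpha=7.5, C5=26.0)


def rhs(t, y, glucose_in, p=P):
    """Derivatives of (Ip, Ii, G, h1, h2, h3); glucose_in(t) is IG in mg/min."""
    Ip, Ii, G, h1, h2, h3 = y
    # f1: pancreatic insulin secretion, sigmoidal in glucose concentration
    f1 = p["Rm"] / (1 + math.exp((p["C1"] - G / p["Vg"]) / p["a1"]))
    # f2: insulin-independent glucose uptake
    f2 = p["Ub"] * (1 - math.exp(-G / (p["C2"] * p["Vg"])))
    # f3 * f4: insulin-dependent uptake, driven by remote insulin Ii
    f3 = G / (p["C3"] * p["Vg"])
    kappa = (1 / p["Vi"] + 1 / (p["E"] * p["ti"])) / p["C4"]
    f4 = p["U0"] + (p["Um"] - p["U0"]) / (1 + (kappa * Ii) ** (-p["beta"]))
    # f5: hepatic glucose production, suppressed by delayed insulin h3
    f5 = p["Rg"] / (1 + math.exp(p["alpha"] * (h3 / (p["C5"] * p["Vp"]) - 1)))
    exchange = p["E"] * (Ip / p["Vp"] - Ii / p["Vi"])  # plasma <-> remote insulin
    k = 3 / p["td"]  # three-stage delay chain with total delay td
    return [f1 - exchange - Ip / p["tp"],
            exchange - Ii / p["ti"],
            f5 + glucose_in(t) - f2 - f3 * f4,
            k * (Ip - h1), k * (h1 - h2), k * (h2 - h3)]


def simulate(glucose_in, y0, t_end, dt=0.5):
    """Classic fixed-step RK4 from t = 0; returns [(t_min, G_mg), ...]."""
    t, y = 0.0, list(y0)
    out = [(t, y[2])]
    while t < t_end - 1e-9:
        k1 = rhs(t, y, glucose_in)
        k2 = rhs(t + dt / 2, [a + dt / 2 * b for a, b in zip(y, k1)], glucose_in)
        k3 = rhs(t + dt / 2, [a + dt / 2 * b for a, b in zip(y, k2)], glucose_in)
        k4 = rhs(t + dt, [a + dt * b for a, b in zip(y, k3)], glucose_in)
        y = [a + dt / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
             for a, b1, b2, b3, b4 in zip(y, k1, k2, k3, k4)]
        t += dt
        out.append((t, y[2]))
    return out
```

For example, `simulate(lambda t: 216.0, (40.0, 100.0, 10000.0, 40.0, 40.0, 40.0), t_end=1440.0)` runs one day of constant infusion from an illustrative initial state (both the rate and the state are arbitrary choices). An adaptive solver such as SciPy's `solve_ivp` would be a natural replacement for the hand‑written RK4.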

Traditional statistical summaries (mean glucose, variance) fail to reveal any diurnal (24‑hour) pattern in either cohort. To overcome this limitation, the authors employ time‑delayed mutual information (TDMI), an information‑theoretic metric that quantifies nonlinear dependence between pairs of measurements separated by a specified lag Δt. Because TDMI needs only pairs of measurements separated by approximately Δt, it can be estimated from irregularly sampled records, and it captures nonlinear dependence that linear autocorrelation can miss. For each patient’s glucose time series, TDMI is computed across a range of lags, and the ensemble average is examined for peaks at 24 h.
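A minimal sketch of a TDMI estimate for sparse, irregularly sampled records follows. It assumes a ±`tol_h` matching window around the lag, an equal‑width histogram (plug‑in) mutual‑information estimator, and a minimum number of pairs per patient; all three are hypothetical choices, and the paper's own estimator may differ. Following the description above, per‑patient TDMI values are averaged over the ensemble.

```python
import math
from collections import Counter


def lagged_pairs(series, lag_h, tol_h):
    """Pairs (x(t), x(t')) with |t' - t - lag_h| <= tol_h from one patient's
    time-sorted, irregularly sampled series of (time_h, glucose) tuples."""
    pairs = []
    for i, (t1, v1) in enumerate(series):
        for t2, v2 in series[i + 1:]:
            if abs(t2 - t1 - lag_h) <= tol_h:
                pairs.append((v1, v2))
    return pairs


def mutual_information(pairs, n_bins=8):
    """Plug-in histogram estimate of I(X; Y) in bits, equal-width bins."""
    values = [v for pair in pairs for v in pair]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against all-equal values

    def bin_of(v):
        return min(int((v - lo) / width), n_bins - 1)

    joint = Counter((bin_of(x), bin_of(y)) for x, y in pairs)
    px = Counter(bin_of(x) for x, _ in pairs)
    py = Counter(bin_of(y) for _, y in pairs)
    n = len(pairs)
    # Sum over occupied cells of p(i,j) * log2(p(i,j) / (p(i) * p(j)))
    return sum(c / n * math.log2(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())


def ensemble_tdmi(patients, lag_h, tol_h=1.0, n_bins=8, min_pairs=10):
    """Average per-patient TDMI at one lag; patients with too few pairs are skipped."""
    scores = []
    for series in patients:
        pairs = lagged_pairs(series, lag_h, tol_h)
        if len(pairs) >= min_pairs:
            scores.append(mutual_information(pairs, n_bins))
    return sum(scores) / len(scores) if scores else float("nan")
```

Scanning `lag_h` over, say, 1 to 48 hours and looking for a bump near 24 h mirrors the analysis described above. Histogram plug‑in estimates are biased upward for small samples, which is one reason to require `min_pairs`.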

Results show that the general population exhibits a clear TDMI peak at a 24‑hour lag, indicating a persistent diurnal structure in the nonlinear correlation of glucose values, despite the absence of a corresponding signal in the mean. In contrast, the NICU cohort’s TDMI curve is essentially flat, consistent with continuous nutrition that lacks meal‑driven daily inputs. Simulations of the mechanistic model with the four feeding inputs reproduce these empirical findings: regular meals generate a 24‑hour TDMI peak, while continuous infusion eliminates it; jittered or random meals attenuate the peak, increasingly so as meal‑timing variability grows. This indicates that the observed TDMI diurnal signal is driven primarily by the regularity of external glucose inputs (i.e., meals) rather than by intrinsic variability in insulin sensitivity or other physiological parameters.
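The four feeding inputs can be sketched as functions of time. This is a hypothetical illustration: the rectangular meal pulse, its height and duration, the uniform jitter distribution in regimen (c), and the choice of three random meals per day in regimen (d) are assumptions, not details taken from the paper.

```python
import random

MEAL_HOURS = (8.0, 12.0, 18.0)  # regimen (b): clock times of the three meals


def meal_times(regimen, n_days, jitter_h=1.0, seed=0):
    """Sorted meal onset times, in hours since t = 0, for regimens 'b', 'c', 'd'."""
    rng = random.Random(seed)
    times = []
    for day in range(n_days):
        base = 24.0 * day
        if regimen == "b":    # identical meal times every day
            times += [base + m for m in MEAL_HOURS]
        elif regimen == "c":  # each meal shifted by uniform noise in [-jitter_h, +jitter_h]
            times += [base + m + rng.uniform(-jitter_h, jitter_h) for m in MEAL_HOURS]
        elif regimen == "d":  # three meals at uniformly random times of day
            times += [base + rng.uniform(0.0, 24.0) for _ in MEAL_HOURS]
        else:
            raise ValueError("regimen must be 'b', 'c' or 'd'")
    return sorted(times)


def glucose_input(meals_h, rate=1.0, duration_h=1.0, baseline=0.0):
    """Return IG(t_h): a constant baseline plus a rectangular pulse of height
    `rate` lasting `duration_h` after each meal (overlapping pulses add).
    Regimen (a), continuous infusion at rate r, is glucose_input([], baseline=r)."""
    def ig(t_h):
        active = sum(1 for m in meals_h if m <= t_h < m + duration_h)
        return baseline + rate * active
    return ig
```

Because these times are in hours, a model integrated in minutes would evaluate `ig(t_min / 60)`. Increasing `jitter_h` gives a simple knob for the degree of timing variability that the simulations relate to the strength of the 24‑hour TDMI peak.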

The study highlights several important implications. First, EHR data, when combined with nonlinear information metrics, can uncover population‑level physiological rhythms that are invisible to conventional analyses. Second, a relatively simple mechanistic model suffices to explain the observed TDMI patterns, suggesting that the dominant driver of daily glucose dynamics is the timing of nutrient intake. Third, the methodology offers a way to “circumvent” inter‑patient variability by aggregating large numbers of irregularly sampled records, provided that the analysis respects the underlying stochastic structure.

Limitations include the opportunistic nature of glucose measurements (taken for clinical reasons, not systematic sampling), the fixed‑parameter approach that ignores individual differences in insulin clearance, medication effects, or comorbidities, and the fact that TDMI, while sensitive to dependence, does not establish causality. Future work could integrate additional EHR variables (e.g., medications, physical activity, body mass index) and employ Bayesian or machine‑learning techniques to infer patient‑specific model parameters, thereby enabling personalized predictions of metabolic trajectories. Extending the approach to other endocrine axes (cortisol, thyroid hormones) could provide a broader view of circadian regulation in real‑world clinical populations.

In summary, the paper demonstrates that population‑scale EHR data can be harnessed to test and validate mechanistic physiological models. By leveraging TDMI to capture hidden diurnal correlations in glucose measurements, the authors show that feeding patterns dominate the observed daily rhythm, and that a simple glucose‑insulin model can accurately predict differences between uncontrolled and controlled feeding groups. This work opens a pathway toward large‑scale, data‑driven physiology that can inform both precision medicine and public‑health strategies.
