Behavioral-clinical phenotyping with type 2 diabetes self-monitoring data
Objective: To evaluate unsupervised clustering methods for identifying individual-level behavioral-clinical phenotypes that relate personal biomarkers and behavioral traits in type 2 diabetes (T2DM) self-monitoring data. Materials and Methods: We used hierarchical clustering (HC) to identify groups of meals with similar nutrition and glycemic impact for 6 individuals with T2DM who collected self-monitoring data. We evaluated clusters on: 1) correspondence to gold standards generated by certified diabetes educators (CDEs) for 3 participants; 2) face validity, rated by CDEs; and 3) impact on CDEs’ ability to identify patterns for another 3 participants. Results: The gold standard (GS) included 9 patterns across 3 participants. All 9 were re-discovered using HC: 4 GS patterns were consistent with patterns identified by HC (over 50% of meals in a cluster followed the pattern); another 5 were included as sub-groups in broader clusters. 50% (9/18) of clusters were rated over 3 on a 5-point Likert scale for validity, significance, and being actionable. After reviewing clusters, CDEs identified patterns that were more consistent with the data (70% reduction in contradictions between patterns and participants’ records). Discussion: Hierarchical clustering of blood glucose and macronutrient consumption appears suitable for discovering behavioral-clinical phenotypes in T2DM. Most clusters corresponded to the gold standard and were rated positively by CDEs for face validity. Cluster visualizations helped CDEs identify more robust patterns in nutrition and glycemic impact, creating new possibilities for visual analytic solutions. Conclusion: Machine learning methods can use diabetes self-monitoring data to create personalized behavioral-clinical phenotypes, which may prove useful for delivering personalized medicine.
💡 Research Summary
This paper investigates whether unsupervised machine‑learning techniques can discover meaningful, individual‑level behavioral‑clinical phenotypes from type‑2 diabetes (T2DM) self‑monitoring data. Six adult participants with T2DM recorded every meal using a smartphone app that captured a photo, a textual description, and blood‑glucose (BG) measurements taken before and 1–3 hours after eating. After data collection, registered dietitians estimated the macronutrient composition (carbohydrate, protein, fat, fiber, and total calories) of each meal using USDA nutrient databases.
Two certified diabetes educators (CDEs) collaborated with the research team to define four candidate feature sets for phenotype construction: (1) percent calories from each macronutrient (no BG), (2) grams of each macronutrient (no BG), (3) BG change combined with percent‑calorie macronutrient ratios, and (4) BG change combined with macronutrient grams. Hierarchical clustering (HC) was applied separately to each participant’s meals using min‑max scaling, Euclidean distance, and mean linkage. The optimal number of clusters was chosen by maximizing the Calinski‑Harabasz (CH) index; clusters containing fewer than five meals were discarded.
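The clustering steps described above (min-max scaling, Euclidean distance, mean linkage, cluster count chosen by the CH index, small clusters discarded) can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function names, the candidate range of 2 to 6 clusters, and the twelve synthetic meals (BG change in mg/dL, percent calories from carbohydrate, percent calories from fat) are all assumptions made for the example. A real analysis would more likely use an established library.

```python
from itertools import combinations
from math import dist


def min_max_scale(rows):
    """Rescale every feature column to the [0, 1] range."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [(max(col) - min(col)) or 1.0 for col in columns]  # constant column: avoid /0
    return [[(v - lo) / span for v, lo, span in zip(row, lows, spans)] for row in rows]


def mean_pair_distance(points, a, b):
    """Mean-linkage distance: average Euclidean distance over all cross-cluster pairs."""
    return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))


def average_linkage(points, k):
    """Naive agglomerative clustering: merge the two closest clusters until k remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: mean_pair_distance(points, *pair))
        clusters = [c for c in clusters if c is not a and c is not b] + [a + b]
    return clusters


def centroid(points, indices):
    """Coordinate-wise mean of the selected points."""
    return [sum(col) / len(indices) for col in zip(*(points[i] for i in indices))]


def calinski_harabasz(points, clusters):
    """Between-cluster over within-cluster dispersion, each divided by its
    degrees of freedom; higher values indicate better-separated clusters."""
    n, k = len(points), len(clusters)
    overall = centroid(points, range(n))
    between = within = 0.0
    for indices in clusters:
        center = centroid(points, indices)
        between += len(indices) * dist(center, overall) ** 2
        within += sum(dist(points[i], center) ** 2 for i in indices)
    if within == 0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


def phenotype_clusters(meals, k_range=range(2, 7), min_size=5):
    """Pick the cluster count with the highest CH index (k_range is an assumed
    search range), then drop clusters with fewer than min_size meals."""
    points = min_max_scale(meals)
    candidates = [average_linkage(points, k) for k in k_range if k < len(points)]
    best = max(candidates, key=lambda clusters: calinski_harabasz(points, clusters))
    return sorted(sorted(c) for c in best if len(c) >= min_size)


# Synthetic meals for one hypothetical participant: [BG change, % carb, % fat].
meals = [
    [64, 60, 22], [52, 60, 22], [58, 65, 22], [58, 55, 22], [58, 60, 26], [58, 60, 18],
    [16, 20, 54], [4, 20, 54], [10, 25, 54], [10, 15, 54], [10, 20, 58], [10, 20, 50],
]
clusters = phenotype_clusters(meals)
print(clusters)
```

The first six synthetic meals are high in carbohydrate with a large BG rise and the last six are high in fat with a small one, so the sketch separates them into two clusters of six meal indices each. Raising `min_size` above six would discard both clusters.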
To evaluate the automatically generated phenotypes, the authors created “gold‑standard” patterns based on expert visual inspection. For the first three participants (P1‑P3), CDEs independently identified patterns from raw data, then reached consensus, producing nine qualitative patterns. These patterns were later refined after the CDEs examined the HC results and parallel‑coordinate visualizations, yielding a second set of gold standards used to assess the impact of phenotyping on expert understanding. For participants P4‑P6, a single round of gold‑standard creation was performed with access to both raw data and visualizations. Each qualitative pattern was translated into a compound inequality (e.g., “Lunch, >45 % calories from fat, BG rise >50 mg/dL”) to enable quantitative comparison with the data.
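As an illustration of how a qualitative pattern becomes a testable compound inequality, the example pattern quoted above can be written as a predicate over meal records. This is a sketch only: the `Meal` fields and function names are hypothetical, and the three meals are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Meal:
    meal_type: str    # e.g. "breakfast", "lunch", "dinner"
    pct_fat: float    # percent of calories from fat
    bg_change: float  # post-meal minus pre-meal blood glucose, mg/dL


def nutrition_part(meal):
    """Behavioral half of the pattern: lunch with >45% of calories from fat."""
    return meal.meal_type == "lunch" and meal.pct_fat > 45


def bg_part(meal):
    """Clinical half of the pattern: BG rise of more than 50 mg/dL."""
    return meal.bg_change > 50


def follows_pattern(meal):
    """The full compound inequality holds only when both halves hold."""
    return nutrition_part(meal) and bg_part(meal)


# Invented records, for illustration only.
example_meals = [
    Meal("lunch", 52.0, 64.0),   # satisfies both halves
    Meal("lunch", 48.0, 20.0),   # nutrition matches, BG does not
    Meal("dinner", 60.0, 80.0),  # not a lunch
]
matches = [follows_pattern(m) for m in example_meals]
print(matches)
```

Keeping the nutritional and BG halves as separate functions makes it possible to count meals that match the nutritional description but contradict the expected glycemic response.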
Gold‑standard quality was assessed by three metrics: (a) true‑positive rate (fraction of meals satisfying the inequality), (b) false‑positive rate (meals meeting the nutritional part but violating the BG part), and (c) over‑fitting (inequalities that applied to only one or two meals). Feature‑set selection was performed by comparing the partition of meals induced by each HC run to the partition implied by the gold‑standard inequalities, using the Adjusted Rand Index (ARI). The feature set that combined BG change with percent‑calorie macronutrient ratios consistently yielded the highest ARI and was therefore used for downstream analysis.
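The ARI comparison used for feature-set selection can be sketched as follows. The index counts pairs of meals that two partitions group the same way and corrects that count for chance agreement. The label vectors below are toy values invented for the example, not study data, and the choice to encode the gold standard as one label per meal is an assumption.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two partitions of the same items."""
    n = len(labels_a)
    pairs_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = pairs_a * pairs_b / comb(n, 2)  # agreement expected by chance
    max_index = (pairs_a + pairs_b) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (pairs_both - expected) / (max_index - expected)


# Toy partition implied by gold-standard inequalities (one label per meal).
gold = [0, 0, 0, 1, 1, 1, 2, 2]

# Toy HC partitions for two of the candidate feature sets.
hc_partitions = {
    "pct_calories_no_bg": [0, 0, 1, 1, 1, 0, 2, 2],
    "bg_plus_pct_calories": [1, 1, 1, 0, 0, 0, 2, 2],
}

scores = {name: adjusted_rand_index(gold, labels)
          for name, labels in hc_partitions.items()}
best_feature_set = max(scores, key=scores.get)
print(scores, best_feature_set)
```

Because ARI compares groupings rather than label names, the second toy partition scores 1.0 even though its cluster labels are permuted relative to the gold standard.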
When comparing HC‑derived clusters to the gold standards for P1‑P3, all nine expert patterns were rediscovered in some form. Four were consistent with a pattern identified by HC, meaning that more than 50 % of the meals in a cluster followed the expert pattern. The remaining five appeared as sub‑groups within broader HC clusters, indicating partial rediscovery. Of the 18 clusters rated by the CDEs, nine (50 %) scored above 3 on a 5‑point Likert scale for validity, significance, and actionability.
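The matching rule stated in the abstract (a gold-standard pattern is consistent with a cluster when over 50% of the meals in that cluster follow the pattern) can be sketched as below. The second rule, which treats a pattern as a sub-group when some but not most meals in a cluster follow it, is an assumed reading of "included as sub-groups in broader clusters". All names and data are hypothetical.

```python
from collections import Counter


def pattern_share_by_cluster(cluster_labels, follows_pattern):
    """For each cluster, the fraction of its meals that follow the pattern."""
    totals = Counter(cluster_labels)
    hits = Counter(label for label, ok in zip(cluster_labels, follows_pattern) if ok)
    return {label: hits[label] / totals[label] for label in totals}


def classify_rediscovery(shares, threshold=0.5):
    """'consistent' if some cluster is mostly pattern meals; 'sub-group' if the
    pattern only appears as a minority inside clusters (assumed reading)."""
    if any(share > threshold for share in shares.values()):
        return "consistent"
    if any(share > 0 for share in shares.values()):
        return "sub-group"
    return "not found"


# Invented example: 6 meals in cluster A, 8 meals in cluster B.
cluster_labels = ["A"] * 6 + ["B"] * 8
pattern_1 = [True, True, True, True, False, False] + [False] * 8   # 4 of 6 in A
pattern_2 = [False] * 6 + [True, True, True] + [False] * 5         # 3 of 8 in B

result_1 = classify_rediscovery(pattern_share_by_cluster(cluster_labels, pattern_1))
result_2 = classify_rediscovery(pattern_share_by_cluster(cluster_labels, pattern_2))
print(result_1, result_2)
```

In this invented example the first pattern covers two thirds of cluster A and counts as consistent, while the second covers only three of the eight meals in cluster B and counts as a sub-group.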
Beyond quantitative matching, visualizations of the HC results (parallel‑coordinate plots of cluster means) were shown to the CDEs. After reviewing them, the educators identified patterns that were more consistent with the underlying data: contradictions between expert‑derived patterns and participants' records fell by 70 %. This suggests that unsupervised phenotyping can not only reproduce expert knowledge but also help refine it, giving clearer insight into the relationship between meal composition and post‑prandial glycemic response.
The study highlights several important implications. First, integrating behavioral data (nutrient intake) with clinical biomarkers (BG change) yields hybrid phenotypes that capture the complex, individualized metabolic response to food, a key requirement for precision diabetes management. Second, translating qualitative expert knowledge into formal logical expressions creates a bridge between human expertise and algorithmic output, allowing systematic validation even when gold standards are inherently subjective. Third, hierarchical clustering is flexible: it does not require the number of clusters to be specified in advance and does not assume spherical clusters, which suits heterogeneous, noisy self‑monitoring data. Fourth, the study underscores the value of visual analytics: concise visual summaries of clusters can substantially improve clinicians' ability to interpret large, noisy self‑monitoring datasets.
Limitations include the small sample size (six participants), reliance on self‑reported meal data (potential mis‑estimation of portion sizes), and a limited set of candidate features that omitted potentially informative variables such as meal timing, glycemic index of foods, or contextual factors (stress, physical activity). Future work should expand the cohort, incorporate richer contextual data, explore alternative clustering or representation learning methods, and ultimately test whether phenotype‑guided interventions improve glycemic outcomes in prospective clinical trials.
In conclusion, the authors demonstrate that hierarchical clustering of combined nutritional and glycemic data can automatically generate meaningful behavioral‑clinical phenotypes for individuals with T2DM. These phenotypes align well with expert‑derived gold standards, receive favorable face‑validity ratings, and, when visualized, help clinicians refine their understanding of patient‑specific diet‑glucose relationships. The methodology offers a promising pathway toward data‑driven, personalized diabetes self‑management and broader precision‑medicine applications.