From Wearables to Warnings: Predicting Pain Spikes in Patients with Opioid Use Disorder

Chronic pain (CP) and opioid use disorder (OUD) are common and interrelated chronic medical conditions. Currently, there is a paucity of evidence-based integrated treatments for CP and OUD among individuals receiving medication for opioid use disorder (MOUD). Wearable devices have the potential to monitor complex patient information and inform treatment development for persons with OUD and CP, including pain variability (e.g., exacerbations of pain or pain spikes) and clinical correlates (e.g., perceived stress). However, the application of large language models (LLMs) to wearable data for understanding pain spikes remains unexplored. Consequently, the aim of this pilot study was to examine the clinical correlates of pain spikes using a range of AI approaches. We found that machine learning models achieved relatively high accuracy (>0.7) in predicting pain spikes, while LLMs were limited in providing insights on pain spikes. Real-time monitoring through wearable devices, combined with advanced AI models, could facilitate early detection of pain spikes and support personalized interventions that may help mitigate the risk of opioid relapse, improve adherence to MOUD, and enhance the integration of CP and OUD care. Given overall limited LLM performance, these findings highlight the need to develop LLMs that can provide actionable insights in the OUD/CP context.

💡 Research Summary

This pilot study investigates how wearable sensor data and artificial‑intelligence (AI) techniques can be combined to predict “pain spikes” – abrupt exacerbations of chronic pain – in patients with opioid use disorder (OUD) who are receiving medication for OUD (MOUD). The authors note that chronic pain (CP) and OUD frequently co‑occur, create a vicious cycle, and lack integrated, evidence‑based treatment strategies. Real‑time monitoring of physiological signals (heart rate, skin conductance, accelerometry, temperature, sleep stages) and self‑reported measures (pain intensity, perceived stress, opioid use) offers a potential pathway to detect impending pain spikes and intervene before relapse or non‑adherence occurs.

Study design and data collection
Forty‑five adult participants (21‑58 years, balanced gender) were enrolled for a three‑month observation period. Each wore a multi‑modal sensor that recorded data at five‑minute intervals. Participants completed daily electronic diaries rating pain on a 0‑10 scale, stress level, and opioid consumption. A “pain spike” was defined as a self‑reported pain rating of ≥ 7 that increased by at least two points compared with the previous day. The 30‑minute window preceding each spike was labeled as a positive instance; all other windows were labeled negative. Missing sensor values were linearly interpolated, and all variables were z‑score normalized.
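The spike definition and preprocessing steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the use of `None` for missing readings, and the choice of population standard deviation are all hypothetical.

```python
from statistics import mean, pstdev


def label_pain_spikes(daily_pain, threshold=7, min_increase=2):
    """Return one boolean per day: True where the rating is >= threshold
    and at least min_increase points above the previous day's rating."""
    if not daily_pain:
        return []
    flags = [False]  # the first day has no previous day to compare against
    for prev, curr in zip(daily_pain, daily_pain[1:]):
        flags.append(curr >= threshold and curr - prev >= min_increase)
    return flags


def interpolate_linear(values):
    """Fill interior None gaps by linear interpolation between known neighbours.
    Leading and trailing gaps stay None because one neighbour is missing."""
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


def zscore(values):
    """Standardize to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant signal: no variation to scale
    return [(v - mu) / sigma for v in values]


# Hypothetical diary ratings and heart-rate readings, for illustration only.
spikes = label_pain_spikes([4, 7, 8, 5, 7])
heart_rate = zscore(interpolate_linear([70, None, None, 76]))
```

Note that a rating of 8 following a 7 is not flagged here, because the day-over-day increase is only one point even though the rating exceeds the threshold.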

Model development
The authors trained several predictive models: logistic regression, random forest, gradient‑boosting machine (GBM), and two deep‑learning time‑series architectures (LSTM and Temporal Convolutional Network). Data were split 70 % training, 15 % validation, 15 % test, with hyper‑parameter tuning performed via cross‑validation. Performance metrics included accuracy, ROC‑AUC, F1‑score, sensitivity, and specificity.
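To make the evaluation setup concrete, the sketch below shows a 70/15/15 split and the listed metrics computed from scratch. It is an illustration only: the summary does not say whether the split was made by window, by day, or by participant, so the random shuffle here is an assumption, and all function names are hypothetical.

```python
import random


def split_70_15_15(items, seed=0):
    """Shuffle and split into train/validation/test (70/15/15).
    Assumption: samples are exchangeable; a by-participant or chronological
    split may be more appropriate for wearable time series."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 70 // 100
    n_val = len(shuffled) * 15 // 100
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def _ratio(num, den):
    return num / den if den else 0.0


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = _ratio(tp, tp + fn)   # recall on spike windows
    specificity = _ratio(tn, tn + fp)   # recall on non-spike windows
    precision = _ratio(tp, tp + fp)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a random positive outscores a random
    negative (ties count one half). Quadratic time, fine for small sets."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Because spike windows are likely far rarer than non-spike windows, accuracy alone can look high for a model that rarely predicts a spike, which is one reason to report sensitivity, specificity, and ROC‑AUC alongside it.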

Results – traditional machine learning
Tree‑based models outperformed the others. Random forest and GBM achieved ROC‑AUC values of 0.82 and test‑set accuracies around 0.74, while the LSTM reached an AUC of 0.78 and accuracy of 0.71. SHAP (Shapley Additive Explanations) analysis identified the most influential features as sudden increases in heart‑rate variability, accelerometer‑derived activity bursts, elevated stress scores, and spikes in skin conductance. These variables align with known sympathetic nervous system activation that precedes pain exacerbation.
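For readers unfamiliar with the idea behind SHAP, the sketch below computes exact Shapley values by brute force for a tiny model. It is not the SHAP library and not the tree-specific algorithm a random forest or GBM analysis would normally use; it only illustrates the underlying attribution rule. The toy `risk_score` function, its coefficients, and the all-zero baseline are invented for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley attribution of predict(x) - predict(baseline).
    Features outside a coalition are replaced by their baseline value.
    Cost grows exponentially with the number of features."""
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of coalitions of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def risk_score(features):
    """Hypothetical toy model over z-scored heart rate and stress."""
    hr, stress = features
    return 0.5 * hr + 0.3 * stress + 0.2 * hr * stress


contributions = shapley_values(risk_score, x=[1.5, 2.0], baseline=[0.0, 0.0])
```

The contributions always sum to the difference between the prediction for `x` and the prediction for the baseline, which is what lets a per-feature ranking like the one reported above be read as a decomposition of each individual prediction.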

Results – large language model (LLM)
For comparison, the authors converted the same sensor streams into textual summaries (e.g., “HR rose from 78 to 95 bpm; steps increased from 200 to 500”) and fed them to a GPT‑4‑style LLM via carefully crafted prompts. The LLM’s predictive accuracy hovered around 0.58, and its explanations were generic, referencing standard pain‑management guidelines rather than patient‑specific physiological patterns. The authors attribute this poor performance to (1) sub‑optimal prompt engineering, (2) lack of domain‑specific fine‑tuning, and (3) the LLM’s limited ability to ingest raw time‑series data without a multimodal architecture.
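The sensor-to-text conversion can be sketched as follows. This is a hypothetical reconstruction based only on the example sentence quoted above; the tuple layout, the first-versus-last comparison, and the function names are assumptions, and the actual prompts used in the study are not reproduced here.

```python
def describe_change(label, readings, unit, up_verb, down_verb):
    """Describe how a signal changed between the first and last reading."""
    start, end = readings[0], readings[-1]
    suffix = f" {unit}" if unit else ""
    if end > start:
        verb = up_verb
    elif end < start:
        verb = down_verb
    else:
        return f"{label} stayed at {start}{suffix}"
    return f"{label} {verb} from {start} to {end}{suffix}"


def summarize_window(signals):
    """Join per-signal descriptions into one line of text for an LLM prompt.
    Each signal is (label, readings, unit, up_verb, down_verb)."""
    return "; ".join(describe_change(*signal) for signal in signals)


# Hypothetical readings for one window, chosen to match the quoted example.
window = [
    ("HR", [78, 84, 95], "bpm", "rose", "fell"),
    ("steps", [200, 350, 500], "", "increased", "decreased"),
]
summary = summarize_window(window)
```

A summary like this keeps only the endpoints of each signal, so the shape of the series within the window is lost before the LLM ever sees it. That is consistent with the third explanation the authors give for the LLM's weak performance.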

Clinical implications
The study suggests that wearable‑derived physiological markers, when processed with interpretable tree‑based algorithms, can flag impending pain spikes with moderate reliability (test‑set accuracy around 0.74, ROC‑AUC 0.82), which may be enough to support real‑time clinical decision‑making. Early alerts could trigger non‑pharmacologic interventions (mindfulness, breathing exercises), prompt medication adjustments, or schedule a brief tele‑visit, with the aim of reducing the likelihood of opioid craving, relapse, and MOUD non‑adherence.

Limitations and future directions
Key limitations include the modest sample size, short observation window, and reliance on self‑reported pain spikes, which may introduce subjective bias. The authors also acknowledge that the LLM’s underperformance highlights a broader gap: current large language models are not yet equipped to directly process multimodal clinical time‑series data. Future work should (a) expand to multi‑site, longer‑duration cohorts, (b) incorporate objective relapse markers (e.g., urine toxicology), (c) develop multimodal transformer architectures that fuse sensor streams with textual notes, and (d) fine‑tune LLMs on domain‑specific corpora to improve interpretability and actionable insight generation.

Conclusion
Wearable sensors combined with conventional machine‑learning models provide a viable, interpretable approach for predicting pain spikes in patients battling both chronic pain and opioid use disorder. While large language models hold promise for generating narrative clinical insights, they currently fall short in this specific multimodal prediction task. Advancing LLM capabilities through domain‑specific fine‑tuning and multimodal integration will be essential to realize fully automated, personalized warning systems that can mitigate relapse risk, improve MOUD adherence, and ultimately foster integrated care for this high‑risk population.

📜 Original Paper Content