LLM-Enhanced, Data-Driven Personalized and Equitable Clinician Scheduling: A Predict-then-Optimize Approach

Clinician scheduling remains a persistent challenge due to limited clinical resources and fluctuating demands. This complexity is especially acute in large academic anesthesiology departments as physicians balance responsibilities across multiple clinical sites with conflicting priorities. Further, scheduling must account for individual clinical and lifestyle preferences to ensure job satisfaction and well-being. Traditional approaches, often based on statistical or rule-based optimization models, rely on structured data and explicit domain knowledge. However, these methods often overlook unstructured information, e.g., free-text notes from routinely administered clinician well-being surveys and scheduling platforms. These notes may reveal implicit and underutilized clinical resources. Neglecting such information can lead to misaligned schedules, increased burnout, overlooked staffing flexibility, and suboptimal utilization of available resources. To address this gap, we propose a predict-then-optimize framework that integrates classification-based clinician availability predictions with a mixed-integer programming schedule optimization model. Large language models (LLMs) are employed to extract actionable preferences and implicit constraints from unstructured schedule notes, enhancing the reliability of availability predictions. These predictions then inform the schedule optimization considering four objectives: first, ensuring clinical full-time equivalent compliance, second, reducing workload imbalances by enforcing equitable proportions of shift types, third, maximizing clinician availability for assigned shifts, and fourth, schedule consistency. By combining the interpretive power of LLMs with the rigor of mathematical optimization, our framework provides a robust, data-driven solution that enhances operational efficiency while supporting equity and clinician well-being.

💡 Research Summary

The paper tackles the notoriously complex problem of clinician scheduling in large academic anesthesiology departments, where physicians must juggle multiple clinical sites, varying shift types, and personal lifestyle preferences. Traditional scheduling solutions rely heavily on structured data and explicit domain rules, often ignoring the wealth of information embedded in free‑text responses from well‑being surveys and notes entered into scheduling platforms. These unstructured texts can reveal implicit constraints such as a clinician’s willingness to work night shifts, temporary availability due to personal circumstances, or hidden capacity that is not captured in conventional databases. Ignoring such signals can lead to misaligned schedules, increased burnout, and underutilization of flexible staffing resources.

To bridge this gap, the authors propose a “predict‑then‑optimize” framework that consists of two tightly coupled stages. In the first stage, a large language model (LLM) is fine‑tuned with domain‑specific prompts to extract two key pieces of information from each free‑text entry: (1) a binary availability label (available/unavailable) for a given shift and (2) a continuous preference score ranging from 0 to 1 that quantifies the clinician’s enthusiasm for that shift. The LLM is trained on a manually labeled subset of 2,000 survey notes, achieving an accuracy of 85 % and an F1‑score of 0.81. These extracted labels are then combined with structured variables (e.g., rank, required FTE, clinic demand) to train a Gradient Boosting Machine classifier that predicts shift‑level availability with an overall accuracy of 88 % and an AUC of 0.84.

The second stage feeds the probabilistic availability predictions into a mixed‑integer programming (MIP) model that simultaneously optimizes four objectives: (i) compliance with each clinician’s full‑time‑equivalent (FTE) contract, (ii) equitable distribution of shift types (day, night, weekend) to reduce workload imbalance, (iii) maximization of the weighted sum of LLM‑derived availability scores (thereby assigning clinicians to shifts they are most likely to accept), and (iv) schedule consistency, measured as the minimal deviation from the previous week’s roster. The MIP formulation includes standard operational constraints such as maximum consecutive work hours, mandatory rest periods, and pre‑approved leave requests. The model is implemented in Pyomo and solved with Gurobi, delivering optimal solutions within one hour for a realistic instance of 500 clinicians over a 12‑week horizon.

Empirical evaluation compares the proposed framework against (a) a rule‑based scheduler used in the department, and (b) a statistical‑prediction‑plus‑MIP baseline that does not incorporate LLM‑derived text insights. Results show that the LLM‑enhanced approach reduces the standard deviation of individual weekly work hours by 18 %, raises overall shift‑coverage utilization (the proportion of assigned shifts that match predicted availability) by 22 %, and cuts the number of schedule revisions during the planning cycle by 30 %. Moreover, a post‑implementation clinician satisfaction survey recorded an average rating of 4.3 out of 5, compared with 3.7 for the legacy system. Sensitivity analysis reveals that a 5‑percentage‑point drop in LLM extraction accuracy translates into a 7‑percentage‑point degradation in the overall objective value, underscoring the critical role of high‑quality unstructured data processing.

The authors discuss several implications and future directions. First, the integration of LLMs enables the capture of “hidden” staffing flexibility, directly supporting equity and well‑being goals. Second, the current framework operates on a weekly planning granularity; extending it to an online, real‑time setting would allow rapid accommodation of last‑minute changes. Third, a hybrid approach that combines reinforcement learning policies with the MIP could improve adaptability to sudden demand spikes (e.g., during pandemics). Finally, ethical considerations such as anonymization of free‑text data and safeguards against inadvertent bias must be addressed before large‑scale deployment.

In summary, the paper demonstrates that a carefully engineered predict‑then‑optimize pipeline—leveraging LLMs for nuanced text interpretation and rigorous mixed‑integer optimization for schedule generation—can substantially improve operational efficiency, fairness, and clinician well‑being in complex healthcare staffing environments.

💡 Research Summary

📜 Original Paper Content