Modern clinical decision support systems can concurrently serve multiple, independent medical imaging institutions, but their predictive performance may degrade across sites due to variations in patient populations, imaging hardware, and acquisition protocols. Continuous surveillance of predictive model outputs offers a safe and reliable approach for identifying such distributional shifts without ground truth labels. However, most existing methods rely on centralized monitoring of aggregated predictions, overlooking site-specific drift dynamics. We propose an agent-based framework for detecting drift and assessing its severity in multisite clinical AI systems. To evaluate its effectiveness, we simulate a multi-center environment for output-based drift detection, assigning each site a drift monitoring agent that performs batch-wise comparisons of model outputs against a reference distribution. We analyse several multi-center monitoring schemes that differ in how the reference is obtained (site-specific, global, production-only, and adaptive), alongside a centralized baseline. Results on real-world breast cancer imaging data, using a pathological complete response prediction model, show that all multi-center schemes outperform centralized monitoring, with F1-score improvements of up to 10.3% in drift detection. In the absence of site-specific references, the adaptive scheme performs best, with F1-scores of 74.3% for drift detection and 83.7% for drift severity classification. These findings suggest that adaptive, site-aware, agent-based drift monitoring can enhance the reliability of multisite clinical decision support systems.
Translating machine learning predictive models into fully operational clinical decision support systems (CDSS) remains a technically demanding and resource-intensive endeavor [1]. Even after rigorous validation on independent and external cohorts, these models may still experience a subtle yet impactful drop in performance, referred to as model drift, when deployed in real-world hospitals and medical imaging centers.
A major contributor to this phenomenon is the inherently dynamic and heterogeneous nature of clinical environments, where frequent and multifactorial changes in imaging hardware, acquisition protocols, and patient demographics are commonly observed. These changes may impact the performance of the predictive models, particularly when such variations were not well represented during the models’ development [2].
A common strategy to monitor model performance degradation is to track output drifts, i.e., changes in the distribution of model outputs over time. Such changes may indicate an underlying dataset shift [3], that is, a difference in the joint distribution P(X, Y) between training and deployment, but they can also result from other factors, such as model updates, data preprocessing changes, or system errors.
Recent advances in data de-identification, secure communication protocols, and privacy-preserving cloud infrastructures have enabled multisite CDSS, i.e., platforms that serve decision support to multiple, independent healthcare institutions [4]. However, drift detection systems typically rely on centralized monitoring schemes that aggregate data globally [5]. This centralization may obscure localized drifts that compromise the model’s reliability in specific institutions. Software agents have recently attracted renewed attention in AI [6], owing to their capacity for distributed decision-making and modular task execution. Only a few agent-based systems have been applied to drift monitoring. One example is [7], which employs an ensemble of agents to detect drift in healthcare signals. Nevertheless, continuous, unsupervised, multi-site, agent-based drift monitoring in medical imaging remains largely unexplored.
In this study, we propose an agent-based drift monitoring framework suitable for multisite CDSS, with a dedicated drift monitoring agent assigned to each clinical site. This approach aims to enable adaptable and localized drift detection while also supporting global aggregation of drift signals across all agents. We validated this framework through an in-silico simulation using real-world breast cancer imaging data [13] for output drift detection in predicting pathological complete response (pCR) to neoadjuvant chemotherapy (NAC). Our hypothesis is that the proposed agent-based monitoring framework for multisite CDSS enables accurate, fine-grained, site-specific drift detection and facilitates a unified, time-aware assessment of drift severity across institutions.
This section presents the agent-based drift monitoring framework for multisite CDSS, outlining the conceptual design of the monitoring agents and describing an in-silico simulation for output drift detection.
A drift monitoring (DRM) agent is a dedicated monitoring process responsible for overseeing the integrity of a predictive model used by an individual clinical center. Conceptually, it is a lightweight, continuously operating software component that ingests model input and/or output data (in real time or in batches) and applies a drift detection method to assess the presence of change over time. We propose a modular design for DRM agents, in which the drift detection logic is decoupled from the core monitoring process. This separation of concerns enables flexibility in configuring the monitoring system. Inspired by the MAPE-K (Monitor, Analyze, Plan, Execute over a shared Knowledge base) control loop of self-managing systems [8], we propose four distinct architectural steps (i.e., initialization, perception, reasoning, and action) for a DRM agent (Figure 1).

Initialization. The initialization (or registration) of a DRM agent involves specifying the predictive model, the clinical center, and the selected drift detection method with its configuration parameters. Additionally, each agent is assigned a monitoring regime (i.e., online or batch), together with a window size and an evaluation frequency, enabling personalization to the data throughput characteristics of the clinical site.
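For illustration, these registration parameters can be captured in a small configuration object. The following Python sketch is an assumption for concreteness only; the type name, field names, and defaults are not part of the framework described in the paper.

```python
# Hypothetical sketch of DRM-agent registration; names and defaults are assumptions.
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class DRMAgentConfig:
    model_id: str            # predictive model under surveillance (e.g., the pCR model)
    site_id: str             # clinical center served by this agent
    # drift detection method: maps (current window, reference) to a drift score (p-value)
    detector: Callable[[Sequence[float], Sequence[float]], float]
    regime: str = "batch"    # monitoring regime: "batch" or "online"
    window_size: int = 100   # number of observations per monitoring window
    eval_frequency: int = 1  # evaluate the detector every k completed windows
    alpha: float = 0.05      # significance threshold used to binarize the drift score
```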
Perception. Depending on the monitoring regime, the agent accumulates incoming observations until the predefined monitoring window is reached (batch mode) or processes each observation sequentially upon arrival (online mode).
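In batch mode, the perception step amounts to buffering incoming model outputs until a full monitoring window is available. Below is a minimal sketch, assuming scalar model outputs (e.g., predicted pCR probabilities); the generator-based buffering and the function name are illustrative choices, not the paper's implementation.

```python
# Minimal sketch of batch-mode perception; the buffering strategy is an assumption.
from typing import Iterable, Iterator, List


def batch_windows(outputs: Iterable[float], window_size: int) -> Iterator[List[float]]:
    """Accumulate incoming model outputs and yield one full monitoring window at a time."""
    buffer: List[float] = []
    for y in outputs:
        buffer.append(y)
        if len(buffer) == window_size:
            yield list(buffer)
            buffer.clear()
```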
Reasoning. This function defines the core action of the agent, that is, detecting drift within the current monitoring window. This task is determined by the drift method with which the agent was initialized. The expected output of this step comprises a drift score (usually a p-value), together with a binary drift value (obtained via thresholding) that signals whether drift is present in the current window.
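For concreteness, the reasoning step can be sketched with a two-sample Kolmogorov-Smirnov test comparing the current window of outputs against a reference distribution. The paper does not prescribe this particular test; it merely stands in here for whatever drift method the agent was configured with at initialization.

```python
# Illustrative reasoning step: a KS test as a stand-in for the configured drift method.
from typing import Sequence, Tuple

from scipy.stats import ks_2samp


def reason(window: Sequence[float], reference: Sequence[float],
           alpha: float = 0.05) -> Tuple[float, bool]:
    """Return a drift score (p-value) and a binary drift flag for the current window."""
    _, p_value = ks_2samp(window, reference)  # compare output distributions
    return p_value, p_value < alpha           # flag drift when the p-value falls below alpha
```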