Title: UniCoMTE: A Universal Counterfactual Framework for Explaining Time-Series Classifiers on ECG Data
ArXiv ID: 2512.17100
Date: 2025-12-18
Authors: Justin Li (Boston University), Efe Sencan (Boston University), Jasper Zheng Duan (Sandia National Laboratories), Vitus J. Leung (Sandia National Laboratories), Stephen Tsaur (Boston University; Boston Medical Center), Ayse K. Coskun (Boston University)
📝 Abstract
Machine learning models, particularly deep neural networks, have demonstrated strong performance in classifying complex time series data. However, their black-box nature limits trust and adoption, especially in high-stakes domains such as healthcare. To address this challenge, we introduce UniCoMTE, a model-agnostic framework for generating counterfactual explanations for multivariate time series classifiers. The framework identifies temporal features that most heavily influence a model's prediction by modifying the input sample and assessing its impact on the model's prediction. UniCoMTE is compatible with a wide range of model architectures and operates directly on raw time series inputs. In this study, we evaluate UniCoMTE's explanations on a time series ECG classifier. We quantify explanation quality by comparing the comprehensibility of our explanations with that of established techniques (LIME and SHAP) and by assessing their generalizability to similar samples. Furthermore, clinical utility is assessed through a questionnaire completed by medical experts who review counterfactual explanations presented alongside original ECG samples. Results show that our approach produces concise, stable, and human-aligned explanations that outperform existing methods in both clarity and applicability. By linking model predictions to meaningful signal patterns, the framework advances the interpretability of deep learning models for real-world time series applications.
📄 Full Content
UniCoMTE: A Universal Counterfactual
Framework for Explaining Time-Series Classifiers
on ECG Data
Justin Li1, Efe Sencan1, Jasper Zheng Duan2, Vitus J. Leung2,
Stephen Tsaur1,3, Ayse K. Coskun1
1*Boston University, Boston, MA, USA.
2Sandia National Laboratories, Albuquerque, NM, USA.
3Boston Medical Center, Boston, MA, USA.
Contributing authors: justinli@bu.edu; esencan@bu.edu;
jzduan@sandia.gov; vjleung@sandia.gov; Stephen.Tsaur@bmc.org;
acoskun@bu.edu
arXiv:2512.17100v2 [cs.LG] 22 Dec 2025
Keywords: Explainable artificial intelligence (XAI), Counterfactual explanations,
ECG classification, Machine Learning
1 Introduction
Cardiovascular diseases (CVDs) remain the leading cause of death globally, accounting
for an estimated 17.9 million deaths each year [1]. Early detection and diagnosis are
critical for reducing morbidity and mortality, as timely interventions can significantly
improve outcomes [2]. Electrocardiograms (ECGs) serve as a primary non-invasive
diagnostic tool to assess cardiac function by recording the heart’s electrical activity
over time. Given the complexity and sheer volume of ECG recordings, researchers have
increasingly turned to deep learning methods as a means to automate ECG-based
diagnosis.
Recent studies have demonstrated that deep learning models in particular can
achieve high performance for ECG classification tasks and show potential for clinical
application in research settings. For example, a deep neural network trained on 12-lead
ECG samples can outperform cardiology residents in detecting multiple arrhythmias,
with F1-scores above 80% and specificity over 99%, across six ECG abnormalities [3].
Similarly, a convolutional neural network (CNN) [4] trained on 12-lead ECG
data can perform on par with cardiologists and exhibit greater accuracy than a
leading commercial ECG analysis system. Other models have achieved high
performance across a range of similar classification tasks, including the classification of
myocardial infarction and atrial fibrillation [5–7]. Beyond performance comparisons
with clinical standards, several studies investigate the impact of architectural choices.
For instance, modeling ECG signals directly as one-dimensional time series appears
more effective than transforming them into image representations. One study finds
that a gated recurrent unit–based recurrent neural network [8] achieves around 80%
sensitivity and 81% specificity, outperforming both two-dimensional CNN approaches
and multi-modal fusion of one- and two-dimensional inputs. In terms of efficiency, a lightweight
11-layer hybrid convolutional neural network–long short-term memory (CNN–LSTM)
model achieves near-perfect arrhythmia classification (approximately 98% accuracy)
across eight rhythm classes [9], while remaining compact enough for deployment to
wearable monitors for continuous, real-time detection. Traditional feature-based machine
learning (ML) methods also show promise: one approach combines advanced ECG signal
processing, such as peak detection, with an ML classifier to achieve state-of-the-art heartbeat
classification performance on a large dataset of over 10,000 patients [10]. Notably, this
method maintains high accuracy across different patient cohorts, achieving around
80–90% accuracy even when evaluated on external hospital data, in contrast to sharp
performance drops observed in less generalizable models.
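For readers less familiar with the metrics quoted above, the following minimal sketch shows how sensitivity, specificity, and F1-score are conventionally derived from confusion-matrix counts for a single class. The function name `binary_metrics` and the example counts are illustrative assumptions; they are not taken from any of the cited studies.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute standard binary classification metrics from raw counts.

    tp/fp/tn/fn are true positives, false positives, true negatives,
    and false negatives for one class (e.g. one ECG abnormality).
    """
    # Sensitivity (recall): fraction of actual positives that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of actual negatives correctly ruled out.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Precision: fraction of predicted positives that were correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F1: harmonic mean of precision and sensitivity.
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts, chosen only to illustrate the calculation.
metrics = binary_metrics(tp=80, fp=5, tn=895, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because abnormal rhythms are typically rare, specificity can be very high even when sensitivity is moderate, which is one reason these studies report F1-score or sensitivity alongside specificity.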
Although these models have achieved high performance across a range of disease
classification tasks in research setting