ECG-IMN: Interpretable Mesomorphic Neural Networks for 12-Lead Electrocardiogram Interpretation
Deep learning has achieved expert-level performance in automated electrocardiogram (ECG) diagnosis, yet the “black-box” nature of these models hinders their clinical deployment. Trust in medical AI requires not just high accuracy but also transparency regarding the specific physiological features driving predictions. Existing explainability methods for ECGs typically rely on post-hoc approximations (e.g., Grad-CAM and SHAP), which can be unstable, computationally expensive, and unfaithful to the model’s actual decision-making process. In this work, we propose ECG-IMN, an Interpretable Mesomorphic Neural Network tailored for high-resolution 12-lead ECG classification. Unlike standard classifiers, ECG-IMN functions as a hypernetwork: a deep convolutional backbone generates the parameters of a strictly linear model specific to each input sample. This architecture enforces intrinsic interpretability, because the decision logic is mathematically transparent and the generated weights (W) provide exact, high-resolution feature attribution maps. We introduce a transition decoder that maps latent features to sample-wise weights, enabling precise localization of pathological evidence (e.g., ST-elevation, T-wave inversion) in both the time and lead dimensions. We evaluate our approach on binary classification tasks from the PTB-XL dataset, demonstrating that ECG-IMN achieves competitive predictive performance (AUROC comparable to black-box baselines) while providing faithful, instance-specific explanations. By explicitly decoupling parameter generation from prediction execution, our framework bridges the gap between deep learning capability and clinical trustworthiness, offering a principled path toward “white-box” cardiac diagnostics.
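To make the hypernetwork idea concrete, here is a minimal pure-Python sketch. It assumes an input X stored as a [time][lead] list of lists. `toy_generator` is a hypothetical placeholder for the convolutional backbone and transition decoder (a real generator would be a trained network), and `theta` and `bias` are illustrative parameters, not values from the paper. What the sketch does show is the mesomorphic split: a nonlinear function of the input produces the weights, and the prediction itself is linear in X once those weights are fixed.

```python
import math
from typing import List

Matrix = List[List[float]]  # shape [T][L]: time samples x leads


def toy_generator(x: Matrix, theta: float = 0.5, bias: float = 0.1) -> Matrix:
    """Hypothetical stand-in for the convolutional backbone + transition decoder.

    Emits one weight per (time, lead) position, so W has the same shape as X.
    Each weight depends on a 3-sample moving average of its own lead, which
    makes W input-dependent: the defining property of a hypernetwork.
    """
    n_t, n_l = len(x), len(x[0])
    w = [[0.0] * n_l for _ in range(n_t)]
    for t in range(n_t):
        lo, hi = max(0, t - 1), min(n_t, t + 2)
        for lead in range(n_l):
            context = sum(x[s][lead] for s in range(lo, hi)) / (hi - lo)
            w[t][lead] = math.tanh(theta * context + bias)
    return w


def linear_head(w: Matrix, x: Matrix, b: float = 0.0) -> float:
    """Strictly linear prediction with the generated weights: y = sum(W * X) + b."""
    return sum(wi * xi for w_row, x_row in zip(w, x) for wi, xi in zip(w_row, x_row)) + b


# Two toy 3-sample, 2-lead "ECGs" (real inputs span thousands of samples x 12 leads).
x1 = [[0.0, 1.0], [0.5, -0.5], [1.0, 0.0]]
x2 = [[1.0, 0.0], [0.0, 0.0], [-1.0, 2.0]]
w1, w2 = toy_generator(x1), toy_generator(x2)
print("sample-specific weights:", w1 != w2)
print("logit for x1:", linear_head(w1, x1, b=0.1))
```

Because the generator runs first and the head is linear, doubling X while holding W fixed exactly doubles the bias-free logit; all nonlinearity lives in how W is produced.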
💡 Research Summary
The paper introduces ECG‑IMN, an Interpretable Mesomorphic Neural Network designed for high‑resolution 12‑lead electrocardiogram (ECG) classification. Unlike conventional deep‑learning ECG classifiers that rely on post‑hoc explanation tools such as Grad‑CAM or SHAP, ECG‑IMN is built as a hypernetwork: a convolutional backbone encodes the raw ECG signal into a latent representation, and a transition decoder upsamples this representation to generate a sample‑specific weight tensor W with the same time‑lead dimensions as the input. Prediction is then a strictly linear operation, y = W·X + b, where X is the input ECG and W·X denotes the sum over all time‑lead positions of the element‑wise product W⊙X. Because the logit is exactly this sum plus the bias, W⊙X serves as an exact, high‑resolution attribution map, with no approximation required. Two formulations are offered: a multi‑class version that produces a distinct W_k for each diagnostic class, and a binary version that yields a single evidence map. Training combines cross‑entropy (multi‑class) or binary cross‑entropy (binary) with an L1 regularization term on W that promotes sparsity, encouraging only clinically meaningful waveform features (e.g., ST‑elevation, T‑wave inversion) to receive large weights.
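The exactness claim follows directly from linearity: the logit is the sum of every entry of W⊙X plus b, so the per-position contributions add up to the prediction with no residual. Below is a minimal sketch of the binary formulation, assuming W has already been generated and both tensors are [time][lead] lists. `lam`, the `eps` guard, and averaging |W| over positions are illustrative assumptions rather than the paper's settings. The multi-class variant would repeat the same computation with one W_k and b_k per class and feed the logits to a softmax cross-entropy.

```python
import math
from typing import List

Matrix = List[List[float]]  # shape [T][L]: time samples x leads


def impact_map(w: Matrix, x: Matrix) -> Matrix:
    """Element-wise product W ⊙ X: the contribution of each (time, lead) sample."""
    return [[wi * xi for wi, xi in zip(w_row, x_row)] for w_row, x_row in zip(w, x)]


def logit(w: Matrix, x: Matrix, b: float) -> float:
    """y = W·X + b, i.e. the sum of the impact map plus the bias."""
    return sum(sum(row) for row in impact_map(w, x)) + b


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def imn_binary_loss(w: Matrix, x: Matrix, b: float, label: int, lam: float = 1e-3) -> float:
    """Binary cross-entropy plus an L1 sparsity penalty on the generated W.

    `lam` and averaging |W| over positions are illustrative assumptions.
    """
    p = sigmoid(logit(w, x, b))
    eps = 1e-12  # guards against log(0)
    bce = -(label * math.log(p + eps) + (1 - label) * math.log(1.0 - p + eps))
    n = sum(len(row) for row in w)
    l1 = sum(abs(v) for row in w for v in row) / n
    return bce + lam * l1


# Toy generated weights and input (3 time samples x 2 leads).
w = [[0.2, -0.1], [0.0, 0.4], [0.3, 0.0]]
x = [[1.0, 2.0], [0.5, -1.0], [2.0, 0.3]]
contrib = impact_map(w, x)
print(contrib)
print(logit(w, x, b=-0.2), imn_binary_loss(w, x, b=-0.2, label=1))
```

Positive entries of `contrib` push the binary prediction toward the disease class and negative entries toward Normal, which is what the scalar impact maps visualize.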
Experiments on the PTB‑XL dataset use four binary tasks, each contrasting Normal ECGs with one diagnostic class (Myocardial Infarction, ST/T Change, Conduction Disturbance, Hypertrophy), at both 500 Hz and 100 Hz sampling rates. Across these settings, ECG‑IMN attains AUROC scores within 0.02 of a strong black‑box CNN baseline, demonstrating competitive predictive performance. An ablation without the transition decoder (“IMN Direct”) collapses to near‑random accuracy, confirming the decoder’s essential role in mapping latent features to high‑resolution weight maps.
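AUROC, the metric behind the "within 0.02" comparison, equals the probability that a randomly chosen positive record receives a higher score than a randomly chosen negative one. A small pairwise sketch follows, assuming each binary task encodes the disease class as 1 and Normal as 0; the scores are toy values, not results from the paper.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(score of a random positive > score of a random negative).

    Ties count as one half. This pairwise form is O(P * N), fine for a sketch;
    rank-based implementations scale better to full test sets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy scores; label 1 = disease class, 0 = Normal (an assumed encoding).
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Since AUROC depends only on the ranking of scores, it can be computed from either the logit y or the sigmoid probability with the same result.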
For interpretability, the authors propose two visualization strategies: (1) scalar impact maps for binary models, where positive values support the disease class and negative values support the normal class; (2) class‑specific impact maps for multi‑class models, allowing independent assessment of evidence for each diagnosis. A sliding‑window aggregation filter (parameterized by window length L_win and stride S) smooths the raw attribution maps, suppressing high‑frequency noise while highlighting sustained morphological patterns. An interactive web application hosted on HuggingFace Spaces lets clinicians explore instance‑specific attributions, adjust aggregation parameters, and perform counterfactual ablations by masking selected leads or time segments, observing the resulting change in predicted probability.
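Both the aggregation filter and the counterfactual ablation reduce to simple array manipulations. The sketch below assumes a [time][lead] impact map, that the filter takes a mean over each window (the summary does not say whether it averages or sums), and that masking zero-fills the selected cells. `aggregate`, `mask`, and `prob` are hypothetical helper names, not the web app's API. The sketch also holds W fixed when masking, so the change in logit is exactly the summed impact of the masked cells; re-running the generator on the masked input would also change W.

```python
import math
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[float]]  # shape [T][L]: time samples x leads


def aggregate(impact: Matrix, win: int, stride: int) -> Matrix:
    """Sliding-window mean of an impact map along time, separately per lead.

    `win` plays the role of L_win and `stride` of S. Returns
    floor((T - win) / stride) + 1 rows; using the mean is an assumption.
    """
    n_t, n_l = len(impact), len(impact[0])
    if not 1 <= win <= n_t or stride < 1:
        raise ValueError("need 1 <= win <= T and stride >= 1")
    return [
        [sum(impact[t][lead] for t in range(start, start + win)) / win for lead in range(n_l)]
        for start in range(0, n_t - win + 1, stride)
    ]


def mask(x: Matrix, leads: Sequence[int] = (), span: Optional[Tuple[int, int]] = None) -> Matrix:
    """Copy of X with the chosen leads and/or time samples [t0, t1) set to zero."""
    out = [row[:] for row in x]
    for t, row in enumerate(out):
        in_span = span is not None and span[0] <= t < span[1]
        for lead in range(len(row)):
            if in_span or lead in leads:
                row[lead] = 0.0
    return out


def prob(w: Matrix, x: Matrix, b: float) -> float:
    """Disease probability sigmoid(sum(W * X) + b) with W held fixed."""
    z = sum(wi * xi for w_row, x_row in zip(w, x) for wi, xi in zip(w_row, x_row)) + b
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Toy example: 4 time samples x 2 leads, all-ones input so impact equals W.
w = [[0.5, 0.1], [0.4, -0.2], [0.3, 0.0], [0.1, 0.2]]
x = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
impact = [[wi * xi for wi, xi in zip(wr, xr)] for wr, xr in zip(w, x)]
print(aggregate(impact, win=2, stride=2))
print(prob(w, x, 0.0) - prob(w, mask(x, leads=[0]), 0.0))  # drop after masking lead 0
```

A larger window suppresses more sample-to-sample noise but blurs the boundaries of short events, which is why the window length and stride are worth exposing as user controls.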
Overall, ECG‑IMN bridges the gap between deep‑learning accuracy and clinical trustworthiness by providing intrinsic, faithful explanations for every prediction, offering a practical pathway toward transparent AI‑assisted cardiac diagnostics.