ECGTwin: Personalized ECG Generation Using Controllable Diffusion Model
Personalized electrocardiogram (ECG) generation aims to simulate a patient’s ECG digital twin under specified conditions. It has the potential to transform traditional healthcare into a more accurate, individualized paradigm while preserving the key benefits of conventional population-level ECG synthesis. However, this promising task presents two fundamental challenges: extracting individual features without ground truth, and injecting various types of conditions without confusing the generative model. In this paper, we present ECGTwin, a two-stage framework designed to address these challenges. In the first stage, an Individual Base Extractor trained via contrastive learning robustly captures personal features from a reference ECG. In the second stage, the extracted individual features, along with a target cardiac condition, are integrated into the diffusion-based generation process through our novel AdaX Condition Injector, which injects these signals via two dedicated pathways. Both qualitative and quantitative experiments demonstrate that our model not only generates high-fidelity, diverse ECG signals with fine-grained controllability, but also preserves individual-specific features. Furthermore, ECGTwin shows the potential to enhance ECG auto-diagnosis in downstream applications, confirming the possibility of precise personalized healthcare solutions.
💡 Research Summary
ECGTwin tackles the emerging problem of personalized electrocardiogram (ECG) generation, which aims to synthesize a patient’s “digital twin” ECG under a specified target cardiac condition. The authors propose a two‑stage framework. In the first stage, an Individual Base Extractor (IBE) learns a compact patient‑specific vector (the base vector) from a reference ECG and its associated cardiac condition using self‑supervised contrastive learning. Positive pairs are constructed from two ECG recordings of the same patient, while all other pairings in a batch serve as negatives. A CLIP‑style loss maximizes the cosine similarity between base vectors of the same patient and minimizes it across different patients. Architecturally, the ECG is first encoded into a latent representation by a VAE, and this latent is fed to a Transformer encoder that receives the reference cardiac condition via cross‑attention; working in the VAE latent space keeps the extracted base vector aligned with the diffusion model’s latent space.
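To make the contrastive objective concrete, here is a minimal sketch of a symmetric, CLIP‑style loss over base vectors, written in dependency-free Python. It assumes that row i of each batch holds base vectors extracted from two different recordings of patient i, and it uses a fixed temperature of 0.07 (the value CLIP initializes its learnable temperature to). The function names and the fixed temperature are illustrative assumptions, not ECGTwin’s published code.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    # Guard against an all-zero vector to avoid division by zero.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine_similarity_matrix(a: Sequence[Vector], b: Sequence[Vector]) -> List[List[float]]:
    # sims[i][j] = cosine similarity between a[i] and b[j].
    a_n = [l2_normalize(v) for v in a]
    b_n = [l2_normalize(v) for v in b]
    return [[sum(x * y for x, y in zip(u, w)) for w in b_n] for u in a_n]


def diagonal_cross_entropy(logits: List[List[float]]) -> float:
    # Mean cross-entropy where the correct "class" for row i is column i,
    # i.e. the other recording of the same patient.
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_style_loss(view_a: Sequence[Vector], view_b: Sequence[Vector],
                    temperature: float = 0.07) -> float:
    # Off-diagonal entries act as negatives (different patients in the batch).
    sims = cosine_similarity_matrix(view_a, view_b)
    logits = [[s / temperature for s in row] for row in sims]
    logits_t = [list(col) for col in zip(*logits)]  # the b -> a direction
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits_t))
```

With correctly paired rows the loss is close to zero, whereas swapping which patient each row belongs to is heavily penalized; this is the signal that pushes same-patient base vectors together and different-patient vectors apart.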
The second stage employs a latent diffusion model (DDPM) to generate ECGs conditioned on the extracted base vector and a target cardiac condition (c_tar). Conditioning is handled by the novel AdaX Condition Injector, which routes the heterogeneous inputs through two dedicated pathways. The Cardiac Condition Pathway encodes each clinical report as a token using a pretrained text encoder (Nomic embed‑text‑v1.5), concatenates normalized age, sex, and heart‑rate features to each token, and, after linear projection and positional encoding, feeds the resulting sequence into a cross‑attention module. This design lets the model attend selectively to the most relevant report(s) while incorporating demographic cues. The Base & Time Pathway adds the sinusoidal timestep embedding to a linearly projected base vector, passes the sum through an MLP to produce adaptive normalization parameters (α, β, γ), and applies them to the latent feature via adaptive LayerNorm and scaling. Consequently, global information (patient identity and diffusion timestep) modulates the latent feature as a whole, while condition-specific details from the reports are injected token by token via cross‑attention.
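The Base & Time Pathway can be sketched in plain Python as below. The summary does not specify how α, β and γ are combined, so this sketch assumes a DiT‑style convention: the LayerNorm output is scaled by (1 + γ), shifted by β, and then gated by α. The class name BaseTimeModulation, the layer sizes, the SiLU activation and the random weights are hypothetical stand-ins for trained parameters.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sinusoidal_embedding(t: int, dim: int) -> Vector:
    # Standard sinusoidal timestep embedding: sin terms first, then cos terms.
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]


def linear(w: Matrix, x: Vector) -> Vector:
    # Bias-free matrix-vector product, row by row.
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def silu(x: Vector) -> Vector:
    return [v / (1.0 + math.exp(-v)) for v in x]


def layer_norm(x: Vector, eps: float = 1e-6) -> Vector:
    # LayerNorm without learned affine terms; alpha/beta/gamma take that role.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def random_matrix(rows: int, cols: int, rng: random.Random, scale: float = 0.1) -> Matrix:
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class BaseTimeModulation:
    """Hypothetical Base & Time pathway: (t, base) -> (alpha, beta, gamma) -> modulated feature."""

    def __init__(self, base_dim: int, hidden_dim: int, seed: int = 0):
        if hidden_dim % 2:
            raise ValueError("hidden_dim must be even for the sinusoidal embedding")
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.w_base = random_matrix(hidden_dim, base_dim, rng)        # base-vector projection
        self.w_mlp1 = random_matrix(hidden_dim, hidden_dim, rng)      # MLP hidden layer
        self.w_mlp2 = random_matrix(3 * hidden_dim, hidden_dim, rng)  # MLP output, split in 3

    def params(self, t: int, base: Vector) -> Tuple[Vector, Vector, Vector]:
        # Sum of the timestep embedding and the projected base vector, then the MLP.
        cond = [a + b for a, b in zip(sinusoidal_embedding(t, self.hidden_dim),
                                      linear(self.w_base, base))]
        out = linear(self.w_mlp2, silu(linear(self.w_mlp1, cond)))
        d = self.hidden_dim
        return out[:d], out[d:2 * d], out[2 * d:]

    def __call__(self, x: Vector, t: int, base: Vector) -> Vector:
        alpha, beta, gamma = self.params(t, base)
        # Assumed convention: scale and shift the normalized feature, then gate with alpha.
        return [a * (n * (1.0 + g) + b)
                for a, n, g, b in zip(alpha, layer_norm(x), gamma, beta)]
```

Because the timestep embedding and base vector are summed before the MLP, one set of modulation parameters carries both patient identity and timestep, which is why this pathway acts on the whole feature rather than on individual report tokens.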
AdaX also supports prompt‑to‑prompt editing, which allows specific report tokens to be edited after generation without retraining the diffusion model.
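Both the Cardiac Condition Pathway and prompt-to-prompt editing revolve around cross-attention from latent queries to report tokens. The sketch below assumes identity query/key/value projections (real layers learn them), uses hypothetical normalization constants for age and heart rate, and implements prompt-to-prompt editing in the word-swap style of Hertz et al.: the attention map computed with the source report tokens is reused while values are read from the edited tokens. In the original prompt-to-prompt method the map is injected only during part of the denoising schedule; how ECGTwin schedules this is not described here.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def build_condition_tokens(report_embeddings: Sequence[Vector], age: float,
                           is_male: bool, heart_rate: float) -> Matrix:
    # One token per report, with demographics appended to every token.
    # The divisors 100 and 200 are hypothetical normalization constants.
    demo = [age / 100.0, 1.0 if is_male else 0.0, heart_rate / 200.0]
    return [list(e) + demo for e in report_embeddings]


def softmax(xs: Vector) -> Vector:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_map(queries: Matrix, keys: Matrix) -> Matrix:
    # Scaled dot-product weights: one row per query, one column per token.
    scale = 1.0 / math.sqrt(len(keys[0]))
    return [softmax([scale * sum(q * k for q, k in zip(qv, kv)) for kv in keys])
            for qv in queries]


def apply_map(weights: Matrix, values: Matrix) -> Matrix:
    # Each output row is the attention-weighted average of the value tokens.
    dim = len(values[0])
    return [[sum(w * v[j] for w, v in zip(row, values)) for j in range(dim)]
            for row in weights]


def cross_attention(queries: Matrix, tokens: Matrix) -> Tuple[Matrix, Matrix]:
    # Simplification: identity Q/K/V projections, so tokens serve as keys and values.
    weights = attention_map(queries, tokens)
    return apply_map(weights, tokens), weights


def prompt_to_prompt_swap(queries: Matrix, source_tokens: Matrix,
                          edited_tokens: Matrix) -> Matrix:
    # Word-swap edit: keep the source attention map, read values from edited tokens.
    _, source_weights = cross_attention(queries, source_tokens)
    return apply_map(source_weights, edited_tokens)
```

Reusing the source map keeps "where" each latent position looks fixed while changing "what" it reads, which is what allows a single report token to be altered without disturbing the rest of the generated waveform.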
The authors train ECGTwin on a curated ECG pair dataset derived from MIMIC‑IV‑ECG, roughly twenty times larger than prior personalized ECG datasets. Quantitative metrics (FID, DTW, patient‑consistency scores) and qualitative visual inspection demonstrate that ECGTwin produces ECGs with higher fidelity, greater diversity, and stronger preservation of patient‑specific waveforms (P‑wave, QRS complex, T‑wave) compared to a VQ‑VAE baseline. Moreover, when the synthetic ECGs are used to augment a patient‑specific auto‑diagnosis model, diagnostic accuracy improves by an average of 7 percentage points, confirming the practical utility of the generated digital twins.
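DTW, one of the reported metrics, compares two waveforms after optimally warping their time axes, so a synthetic beat that is slightly shifted relative to the real one is not penalized for the shift alone. A minimal single-lead sketch follows; the absolute-difference local cost and the absence of a warping window are assumptions, since the summary does not give ECGTwin’s exact DTW configuration.

```python
import math
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic dynamic time warping with |x - y| as the local cost."""
    if not a or not b:
        raise ValueError("both sequences must be non-empty")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the best of: insertion, deletion, or match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

For example, [0, 1, 2] and [0, 0, 1, 2] have a DTW distance of zero because the repeated sample is absorbed by the warping path, whereas a plain point-wise distance would require equal lengths.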
In summary, ECGTwin introduces (1) a contrastive‑learning‑based Individual Base Extractor that isolates patient‑specific traits without ground‑truth supervision, and (2) an AdaX Condition Injector that cleanly separates and adaptively integrates multiple conditioning modalities within a diffusion framework. This combination yields a controllable, high‑quality personalized ECG generator with potential benefits for downstream clinical tasks, data augmentation for rare cardiac conditions, and real‑time patient monitoring.