CTTVAE: Latent Space Structuring for Conditional Tabular Data Generation on Imbalanced Datasets
Generating synthetic tabular data under severe class imbalance is essential for domains where rare but high-impact events drive decision-making. However, most generative models either overlook minority groups or fail to produce samples that are useful for downstream learning. We introduce CTTVAE, a Conditional Transformer-based Tabular Variational Autoencoder equipped with two complementary mechanisms: (i) a class-aware triplet margin loss that restructures the latent space for sharper intra-class compactness and inter-class separation, and (ii) a training-by-sampling strategy that adaptively increases exposure to underrepresented groups. Together, these components form CTTVAE+TBS, a framework that consistently yields more representative and utility-aligned samples without destabilizing training. Across six real-world benchmarks, CTTVAE+TBS achieves the strongest downstream utility on minority classes, often surpassing models trained on the original imbalanced data, while maintaining competitive fidelity and bridging the privacy gap between interpolation-based sampling methods and deep generative methods. Ablation studies further confirm that both latent structuring and targeted sampling contribute to these gains. By explicitly prioritizing downstream performance on rare categories, CTTVAE+TBS provides a robust and interpretable solution for conditional tabular data generation, with direct applicability to industries such as healthcare, fraud detection, and predictive maintenance, where even small gains on minority cases can be critical.
💡 Research Summary
The paper introduces CTTVAE, a Conditional Transformer‑based Tabular Variational Autoencoder designed specifically for severely imbalanced tabular datasets. The authors identify two fundamental shortcomings of existing generative approaches: (i) they either ignore minority classes or rely on naive conditioning that does not enforce class‑specific structure in the latent space, and (ii) they provide no mechanism to ensure that under‑represented groups are sufficiently seen during training. To address these gaps, CTTVAE combines (a) a class‑aware triplet margin loss that explicitly pulls together latent embeddings of the same class while pushing apart those of different classes, and (b) a Training‑by‑Sampling (TBS) strategy that constructs each training batch by repeatedly selecting a specific value of a discrete column, guaranteeing regular exposure to every categorical value, including rare ones.
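The batch-construction idea in (b) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function name is ours, and uniform sampling over the discrete column's values is an assumption (the exact sampling distribution used by TBS may differ), but it captures the key property that every categorical value, however rare, gets regular exposure in each batch.

```python
import numpy as np

def tbs_batch_indices(column_values, batch_size, rng=None):
    """Training-by-Sampling sketch: repeatedly pick a value of the discrete
    column (uniformly here, as an illustrative assumption), then draw a row
    that carries that value. Rare values are sampled as often as common ones."""
    if rng is None:
        rng = np.random.default_rng()
    values, inverse = np.unique(column_values, return_inverse=True)
    # Precompute, for each distinct value, the row indices that carry it.
    rows_by_value = [np.flatnonzero(inverse == k) for k in range(len(values))]
    idx = []
    for _ in range(batch_size):
        k = rng.integers(len(values))        # sample a value, not a row
        idx.append(rng.choice(rows_by_value[k]))
    return np.array(idx)
```

With a 98/2 imbalanced column, a batch built this way contains the minority value in roughly half its rows rather than 2% of them, which is exactly the "guaranteed exposure" effect described above.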
The base model, TTVAE, already uses a transformer encoder to capture heterogeneous feature interactions and employs an MMD regularizer to align the aggregated posterior with a standard normal prior. CTTVAE augments this with the triplet loss L_triplet = max(‖z_a − z_p‖² − ‖z_a − z_n‖² + m, 0), where semi-hard negative mining selects negatives that satisfy ‖z_a − z_p‖² < ‖z_a − z_n‖² < ‖z_a − z_p‖² + m. The overall objective becomes:
L = L_recon + λ_MMD · MMD(q(z), p(z)) + λ_tri · L_triplet,
where L_recon is the reconstruction term, the MMD term aligns the aggregated posterior q(z) with the standard normal prior p(z) = N(0, I), and λ_MMD and λ_tri weight the two regularizers.
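The semi-hard mining rule can be sketched as follows. This is a minimal NumPy sketch under our own assumptions (the function name and the O(n²) loop over anchor-positive pairs are illustrative choices, not the authors' code): for each anchor-positive pair it keeps only negatives that are farther than the positive but still within the margin, then takes the hardest such negative.

```python
import numpy as np

def semi_hard_triplet_loss(z, labels, margin=1.0):
    """z: (n, d) latent embeddings; labels: (n,) class ids.
    Returns the mean triplet margin loss over anchor-positive pairs
    that have at least one semi-hard negative."""
    d2 = ((z[:, None, :] - z[None, :, :]) ** 2).sum(-1)  # squared distances
    losses = []
    n = len(labels)
    for a in range(n):
        for p in range(n):
            if p == a or labels[p] != labels[a]:
                continue
            ap = d2[a, p]
            # semi-hard negatives: ‖z_a − z_p‖² < ‖z_a − z_n‖² < ‖z_a − z_p‖² + m
            mask = (labels != labels[a]) & (d2[a] > ap) & (d2[a] < ap + margin)
            if mask.any():
                an = d2[a][mask].min()  # hardest among the semi-hard negatives
                losses.append(max(ap - an + margin, 0.0))
    return float(np.mean(losses)) if losses else 0.0
```

Semi-hard mining avoids both trivial negatives (already outside the margin, zero gradient) and the hardest negatives (which can collapse training), which is why it pairs well with the latent-structuring goal described above.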