Many generative tasks in chemistry and science involve distributions invariant to group symmetries (e.g., permutation and rotation). A common strategy enforces invariance and equivariance through architectural constraints such as equivariant denoisers and invariant priors. In this paper, we challenge this tradition through the alternative canonicalization perspective: first map each sample to an orbit representative with a canonical pose or order, train an unconstrained (non-equivariant) diffusion or flow model on the canonical slice, and finally recover the invariant distribution by sampling a random symmetry transform at generation time. Building on a formal quotient-space perspective, our work provides a comprehensive theory of canonical diffusion by proving: (i) the correctness, universality, and superior expressivity of canonical generative models over invariant targets; and (ii) that canonicalization accelerates training by removing the diffusion score complexity induced by group mixtures and by reducing the conditional variance in flow matching. We then show that aligned priors and optimal transport act complementarily with canonicalization and further improve training efficiency. We instantiate the framework for molecular graph generation under $S_n \times SE(3)$ symmetries. By leveraging geometric spectra-based canonicalization and mild positional encodings, canonical diffusion significantly outperforms equivariant baselines in 3D molecule generation tasks, at similar or even lower computational cost. Moreover, with a novel architecture, Canon, CanonFlow achieves state-of-the-art performance on the challenging GEOM-DRUG dataset, and the advantage remains large in few-step generation.
Generative modeling is fundamentally grounded in the geometric structure of data. In domains such as computer vision and natural language processing (NLP), data exhibits specific symmetries: for instance, objects in images possess translation invariance, while semantic meaning in text relies on sequential order. Recent diffusion (Song & Ermon, 2019; Ho et al., 2020; Song et al., 2021) and flow-based (Liu et al., 2022; Lipman et al., 2022; Albergo et al., 2023) generative models have achieved great success across images (Dhariwal & Nichol, 2021; Rombach et al., 2022), videos (Ho et al., 2022; Singer et al., 2022), text (Li et al., 2022; Gong et al., 2022), and biomolecules (Watson et al., 2023; Corso et al., 2023). Crucially, however, modalities like images are not fully invariant to all transformations: an upside-down landscape or a reversed sentence typically loses its semantic validity. Consequently, state-of-the-art diffusion models in these fields explicitly break symmetries: they utilize fixed grid topologies or inject positional encodings (PEs) to anchor the generation process to a canonical orientation.
In contrast, molecular generation operates on a fundamentally different geometric space defined by the direct product of permutation and Euclidean symmetries, $S_N \times SE(3)$. Unlike natural images, a rotated molecule or a re-indexed molecular graph represents the exact same physical entity. Molecular generative models typically enforce these constraints by building equivariant architectures and/or invariant priors (Hoogeboom et al., 2022; Xu et al., 2022; Tian et al., 2024), so that the learned vector field (score/velocity) respects the group action and the generated distribution is symmetry-consistent. While principled, this approach often incurs substantial architectural and computational overhead (e.g., equivariant layers, tensor algebra), and it can obscure an additional challenge: symmetry creates a latent “gauge” ambiguity, so intermediate noisy states may correspond to multiple equivalent group-transformed configurations. This mixture-like structure produces “trajectory crossing” and conflicting gradients, as the model struggles to determine which of several equally valid orientations to generate from a symmetric noise vector, making the learned dynamics less straight.
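For concreteness, these constraints can be written as follows (a standard formulation in our own notation, not tied to any single cited architecture). A molecule $x = (A, X)$ with adjacency $A \in \{0,1\}^{N \times N}$ and coordinates $X \in \mathbb{R}^{N \times 3}$ has an invariant target density, and the conventional recipe pairs an invariant prior with an equivariant denoiser/velocity field $f$:
\[
p\big(P A P^\top,\; P X R^\top + \mathbf{1} t^\top\big) = p(A, X),
\qquad
f(g \cdot x_t, t) = g \cdot f(x_t, t)
\quad \forall\, g = (P, R, t) \in S_N \times SE(3),
\]
so that the marginal of the generated samples provably inherits the invariance of the prior.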
Motivated by this observation, we argue that the efficiency of image generation stems precisely from its lack of total invariance, and we propose to transfer this advantage to molecular modeling via canonicalization. In this work, we challenge the necessity and effectiveness of equivariant generative models and introduce a novel canonical diffusion framework, with a theoretical analysis of how this symmetry-breaking procedure accelerates diffusion training while preserving validity for invariant or equivariant targets. By mapping the quotient space of molecules to a unique canonical section, explicitly breaking symmetry during training, we align the noise and data distributions. This effectively resolves the trajectory-mismatch problem, transforming the complex equivariant generation task into a simplified transport problem along a canonical manifold.
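Concretely, write $c$ for the canonicalization map sending each sample to its orbit representative and $p_c = c_{\#} p$ for the induced distribution on the canonical slice. The canonicalize-then-randomize recipe from the abstract then rests on the following identity (stated informally in our notation, for a compact symmetry group $G$ with Haar measure $\mu$; translations are handled separately by centering, since $SE(3)$ itself is not compact):
\[
x_c \sim p_c, \quad g \sim \mu \quad \Longrightarrow \quad g \cdot x_c \sim p,
\]
whenever $p$ is $G$-invariant and $c$ selects a single measurable representative per orbit (generic orbits with trivial stabilizers assumed for simplicity).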
Our starting point is the flow-matching identity that the irreducible regression error equals an expected conditional variance of endpoint displacements given an intermediate state. In Section 3, under a group-aligned lifting construction, we show that this conditional variance decomposes into two nonnegative components: a within-slice term that reflects the genuine transport difficulty on a canonical slice, and a symmetry-ambiguity term that arises purely from marginalizing over the latent group element. Canonicalization eliminates the symmetry-ambiguity term by fixing a gauge (working with a canonical representative per orbit), whereas optimal-transport-like couplings target the within-slice term by making the coupling more Monge-like and the trajectories closer to straight-line transport.
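In symbols (our notation; the precise statement and assumptions appear in Section 3): for linear-interpolation flow matching with $x_t = (1-t)\,x_0 + t\,x_1$, the best attainable loss is
\[
\min_{v}\; \mathbb{E}\,\big\|v(x_t, t) - (x_1 - x_0)\big\|^2 \;=\; \mathbb{E}_{t,\, x_t}\big[\operatorname{Var}(x_1 - x_0 \mid x_t)\big],
\]
and, by the law of total variance over the latent group element $g$ of the lifting,
\[
\operatorname{Var}(x_1 - x_0 \mid x_t)
\;=\;
\underbrace{\mathbb{E}_{g}\big[\operatorname{Var}(x_1 - x_0 \mid x_t, g)\big]}_{\text{within-slice}}
\;+\;
\underbrace{\operatorname{Var}_{g}\big(\mathbb{E}[\,x_1 - x_0 \mid x_t, g\,]\big)}_{\text{symmetry ambiguity}}.
\]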
Based on this analysis, in Section 4 we propose a practical symmetry-aware pipeline for molecular generation that combines a canonicalizer with flexible non-equivariant backbones. We canonicalize molecules by jointly fixing the permutation and rotation gauges (e.g., a Fiedler-vector rank for $S_N$ and a rank-anchored frame for $SO(3)$; see the sketch below), then train diffusion/flow models directly in canonical space with a gap-free prior and canonical conditions. This yields a simple yet powerful alternative to fully equivariant generative modeling: it reduces symmetry-induced ambiguity, improves few-step accuracy thanks to the straighter learned transport, and unlocks the expressive and computational advantages of generic Transformers/GNNs for large-scale molecular generation. Furthermore, we develop a novel architecture termed Canon (Figure 6), which explicitly incorporates and refines canonical information in an additional atom hidden state, enabling canonicality-aware denoising.
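As an illustration, the following is a minimal NumPy sketch of such a canonicalizer under simplifying assumptions stated in the comments (connected graph, no Fiedler-value ties, distinct covariance eigenvalues, non-degenerate anchor projections); the paper's exact tie-breaking and anchoring rules may differ.

import numpy as np

def canonicalize(adj: np.ndarray, pos: np.ndarray):
    """Map a molecule (adjacency, 3D coordinates) to a canonical
    permutation order and rotation frame. Illustrative sketch only."""
    # --- Permutation gauge (S_N): rank atoms by the Fiedler vector ---
    lap = np.diag(adj.sum(axis=1)) - adj          # graph Laplacian
    _, eigvecs = np.linalg.eigh(lap)
    fiedler = eigvecs[:, 1]                       # 2nd-smallest eigenvector (graph assumed connected)
    fiedler = fiedler * np.sign(fiedler[np.argmax(np.abs(fiedler))])  # fix sign ambiguity
    perm = np.argsort(fiedler)                    # canonical atom order (assumes no ties)
    adj_c = adj[np.ix_(perm, perm)]
    pos_c = pos[perm]
    # --- Translation gauge: center coordinates at the origin ---
    pos_c = pos_c - pos_c.mean(axis=0, keepdims=True)
    # --- Rotation gauge (SO(3)): rank-anchored frame from the covariance ---
    cov = pos_c.T @ pos_c                         # 3x3 coordinate covariance
    _, frame = np.linalg.eigh(cov)                # principal axes (assumes distinct eigenvalues)
    for k in range(2):                            # anchor reflections via the first-ranked atom
        if pos_c[0] @ frame[:, k] < 0:
            frame[:, k] = -frame[:, k]
    frame[:, 2] = np.cross(frame[:, 0], frame[:, 1])  # enforce a right-handed (proper) rotation
    pos_c = pos_c @ frame                         # express coordinates in the canonical frame
    return adj_c, pos_c, perm, frame

At generation time, applying an independently sampled random permutation and rotation to the generated canonical molecule restores the invariant distribution, per the identity stated earlier.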
Experimentally, canonicalization is compatible with any (equivariant or non-equivariant) backbone.