DEGMC: Denoising Diffusion Models Based on Riemannian Equivariant Group Morphological Convolutions
In this work, we address two major issues in recent Denoising Diffusion Probabilistic Models (DDPMs): {\bf 1)} geometric key feature extraction and {\bf 2)} network equivariance. Since the DDPM prediction network relies on the U-net architecture, which is theoretically only translation equivariant, we introduce a geometric approach equipped with equivariance to the more general Euclidean group, which includes translations, rotations, and reflections. We introduce the notion of group morphological convolutions on Riemannian manifolds, derived from the viscosity solutions of first-order Hamilton-Jacobi-type partial differential equations (PDEs) that act as morphological multiscale dilations and erosions. We further add a convection term to the model and solve it using the method of characteristics. This helps us better capture nonlinearities, represent thin geometric structures, and incorporate symmetries into the learning process. Experimental results on the MNIST, RotoMNIST, and CIFAR-10 datasets show noticeable improvements over the baseline DDPM model.
💡 Research Summary
This paper tackles two fundamental shortcomings of recent Denoising Diffusion Probabilistic Models (DDPMs): the extraction of geometric key features and the lack of equivariance beyond simple translations. Standard DDPMs employ a U‑Net architecture for noise prediction, which is theoretically only translation‑equivariant. To overcome this limitation, the authors introduce a geometric framework that leverages the full Euclidean group E(n), encompassing translations, rotations, and reflections, together with the intrinsic geometry of Riemannian manifolds.
The core technical contribution is the definition of group morphological convolutions on a Riemannian manifold. These convolutions arise as viscosity solutions of first‑order Hamilton–Jacobi partial differential equations (PDEs) that implement multiscale dilations and erosions. By formulating the Hamiltonian as either the norm of the Riemannian gradient (for erosion) or its negative (for dilation), the authors obtain closed‑form expressions involving an infimum or supremum over the group G, a distance term d_g, and a scale‑dependent structuring function b_k^t. The parameter k > 1 allows non‑quadratic structuring functions, improving the handling of thin structures and highly anisotropic features. Crucially, these operators satisfy the equivariance condition Φ ∘ L_h = L_h ∘ Φ for all h ∈ G, where L_h is the left regular representation acting on functions defined on the manifold.
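These operators can be illustrated in the simplest setting: the flat 1‑D translation group, where the group distance d_g reduces to |x − y|. The power‑law structuring function below is an assumption chosen for readability, not the paper's exact b_k^t (which lives on a Riemannian manifold), but the sup/inf structure of the dilation and erosion is the same:

```python
import numpy as np

def b_kt(y, t, k=2.0):
    # Illustrative scale-dependent structuring function; the paper's
    # b_k^t replaces |y| with the Riemannian/group distance d_g.
    return np.abs(y) ** k / (k * t ** (k - 1))

def dilation(f, xs, t, k=2.0):
    # (delta_t f)(x) = sup_y [ f(y) - b_k^t(x - y) ]
    return np.array([np.max(f - b_kt(x - xs, t, k)) for x in xs])

def erosion(f, xs, t, k=2.0):
    # (eps_t f)(x) = inf_y [ f(y) + b_k^t(x - y) ]
    return np.array([np.min(f + b_kt(x - xs, t, k)) for x in xs])

xs = np.linspace(-1.0, 1.0, 201)
f = np.exp(-xs ** 2 / 0.02)      # narrow bump, i.e. a "thin" structure
d = dilation(f, xs, t=0.1)
e = erosion(f, xs, t=0.1)
```

Since b_k^t vanishes at the origin, dilation never decreases the signal and erosion never increases it, mirroring the multiscale max-plus/min-plus behavior of the viscosity solutions.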
In addition to the morphological operators, the model incorporates a convection term that transports feature maps along vector fields invariant under the group action. The convection PDE ∂u/∂t + α · ∇u = 0 is solved via the method of characteristics, yielding solutions expressed through group elements h_x and exponential curves γ_c(t) in G. This term provides a learned resampling mechanism that aligns features with the underlying geometric flow of the data while preserving equivariance.
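A minimal method-of-characteristics sketch for the 1‑D constant-coefficient transport case can make this concrete. Here the characteristics are straight lines, which is a simplifying assumption; in the paper they are group exponential curves γ_c(t):

```python
import numpy as np

def convect(u0, xs, alpha, t):
    # Characteristics of du/dt + alpha * du/dx = 0 are the lines
    # x(t) = x0 + alpha * t, so the solution simply transports the
    # initial profile: u(x, t) = u0(x - alpha * t).
    return np.interp(xs - alpha * t, xs, u0)

xs = np.linspace(0.0, 1.0, 201)
u0 = np.exp(-(xs - 0.3) ** 2 / 0.005)   # bump centered at x = 0.3
u1 = convect(u0, xs, alpha=0.2, t=1.0)  # bump transported to x = 0.5
```

The operator is a pure resampling of the input along the flow, which is why it composes cleanly with the equivariant morphological layers.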
These two ingredients are assembled into a novel network called GMC‑U‑Net, which replaces the conventional U‑Net in the reverse diffusion process. Each layer consists of a ResNetCDEBlock (Convection‑Dilation‑Erosion block) that first applies the convection operator, then a morphological erosion, and finally a dilation. The overall architecture is implemented using operator splitting, allowing each sub‑step to be treated as an independent PDE solve, which improves numerical stability and simplifies back‑propagation.
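Under the same flat 1‑D assumptions as above, one splitting step of such a block could be sketched as follows; this is a hypothetical stand-in, since the actual ResNetCDEBlock operates on multichannel feature maps with learned parameters:

```python
import numpy as np

def b_kt(y, t, k=2.0):
    # Illustrative structuring function (the paper uses the group distance d_g).
    return np.abs(y) ** k / (k * t ** (k - 1))

def dilation(f, xs, t):
    return np.array([np.max(f - b_kt(x - xs, t)) for x in xs])

def erosion(f, xs, t):
    return np.array([np.min(f + b_kt(x - xs, t)) for x in xs])

def convect(f, xs, alpha, t):
    return np.interp(xs - alpha * t, xs, f)

def cde_block(f, xs, alpha=0.1, t=0.05):
    # Operator splitting: each sub-PDE is solved independently in sequence.
    g = convect(f, xs, alpha, t)   # 1) convection
    g = erosion(g, xs, t)          # 2) erosion
    g = dilation(g, xs, t)         # 3) dilation
    return f + g                   # residual connection (assumed form)

xs = np.linspace(0.0, 1.0, 101)
f = np.sin(2 * np.pi * xs)
out = cde_block(f, xs)
```

Splitting the block into three independent solves is what lets each sub-step use its own exact (Hopf-Lax or characteristics-based) solution formula rather than a joint discretization.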
Experimental evaluation is performed on three benchmark datasets: MNIST, RotoMNIST, and CIFAR‑10. On MNIST, the model demonstrates comparable reconstruction quality while requiring fewer training epochs to converge. On RotoMNIST, which explicitly tests rotation equivariance, DEGMC reduces the Fréchet Inception Distance (FID) by roughly 15 % relative to a baseline DDPM and produces visually faithful rotated digits. On CIFAR‑10, the approach yields sharper edges and more accurate thin textures, leading to lower FID and higher Inception Scores than the standard diffusion baseline. The authors also report faster loss reduction during training, indicating improved sample efficiency.
Overall, the paper shows that embedding group‑equivariant morphological operators and a group‑invariant convection term into the diffusion denoising network yields both theoretical and empirical benefits: explicit equivariance to the full Euclidean group, better preservation of fine geometric structures, and accelerated convergence. Limitations include the current focus on relatively low‑resolution images and the additional computational overhead of solving PDE‑based layers, suggesting future work on scaling to high‑resolution data and optimizing the implementation.