DiffeoMorph: Learning to Morph 3D Shapes Using Differentiable Agent-Based Simulations
Biological systems can form complex three-dimensional structures through the collective behavior of identical agents – cells that follow the same internal rules and communicate without central control. How such distributed control gives rise to precise global patterns remains a central question not only in developmental biology but also in distributed robotics, programmable matter, and multi-agent learning. Here, we introduce DiffeoMorph, an end-to-end differentiable framework for learning a morphogenesis protocol that guides a population of agents to morph into a target 3D shape. Each agent updates its position and internal state via an attention-based SE(3)-equivariant graph neural network, conditioned on its own internal state and on signals received from other agents. To train this system, we introduce a new shape-matching loss based on the 3D Zernike polynomials, which compares the predicted and target shapes as continuous spatial distributions, not as discrete point clouds, and is invariant to agent ordering, number of agents, and rigid-body transformations. To enforce full SO(3) invariance – invariance under rotations while remaining sensitive to reflections – we include an alignment step that optimally rotates the predicted Zernike spectrum to match the target before computing the loss. This results in a bilevel problem, with the inner loop optimizing a unit quaternion for the best alignment and the outer loop updating the agent model. We compute gradients through the alignment step using implicit differentiation. We perform systematic benchmarking to establish the advantages of our shape-matching loss over other standard distance metrics for shape comparison tasks. We then demonstrate that DiffeoMorph can form a range of shapes – from simple ellipsoids to complex morphologies – using only minimal spatial cues.
💡 Research Summary
DiffeoMorph presents a fully differentiable framework for learning distributed morphogenesis protocols that enable a population of identical agents to self‑organize into arbitrary three‑dimensional target shapes. The core of the system consists of two tightly coupled components: (1) an SE(3)‑equivariant graph neural network (GNN) that governs the agents’ dynamics, and (2) a novel shape‑matching loss based on 3D Zernike polynomial expansions.
In the morphogenesis stage, each agent i carries a gene expression vector g_i, a polarity vector p_i, and a spatial position x_i. Pairwise interactions are encoded through edge features that include the squared Euclidean distance d^2_{ij}, the angle between polarities θ_{ij}, and the gene vectors of the two agents. These features are processed by an attention mechanism to produce weighted messages, which are then transformed into forces that update positions, gene expressions, and polarities via a neural differential equation (NDE) solver. Because the messages depend only on relative distances and angles, the pairwise features are SE(3)-invariant and the resulting dynamics are equivariant under global SE(3) transformations, yet they retain sufficient geometric information to drive complex collective behavior.
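The invariance argument above can be sketched in a few lines of NumPy. This is a deliberately simplified stand-in, not the paper's model: the function names, the linear score/message maps `W_score` and `W_msg`, and the plain softmax attention are all hypothetical placeholders for the actual attention-based SE(3)-equivariant GNN and NDE solver. The point it demonstrates is that features built only from d^2_{ij}, θ_{ij}, and gene vectors are unchanged by a global rotation and translation of the system.

```python
import numpy as np

def edge_features(x, p, g):
    """SE(3)-invariant pairwise features for N agents.

    x: (N, 3) positions, p: (N, 3) unit polarity vectors,
    g: (N, G) gene-expression vectors.  Only relative quantities
    (squared distances, polarity angles) enter, so the features are
    unchanged by any global rotation and translation.
    """
    diff = x[:, None, :] - x[None, :, :]            # (N, N, 3) relative positions
    d2 = (diff ** 2).sum(-1)                        # squared distances d^2_ij
    cos_theta = np.clip(p @ p.T, -1.0, 1.0)         # cos of polarity angle
    theta = np.arccos(cos_theta)                    # theta_ij
    N, G = g.shape
    gi = np.broadcast_to(g[:, None, :], (N, N, G))  # sender genes
    gj = np.broadcast_to(g[None, :, :], (N, N, G))  # receiver genes
    return np.concatenate([d2[..., None], theta[..., None], gi, gj], axis=-1)

def attention_messages(feats, W_score, W_msg):
    """Toy attention aggregation: softmax over neighbors of a learned score."""
    scores = (feats @ W_score).squeeze(-1)          # (N, N) attention logits
    np.fill_diagonal(scores, -np.inf)               # no self-messages
    a = np.exp(scores - scores.max(1, keepdims=True))
    a /= a.sum(1, keepdims=True)                    # row-wise softmax
    msgs = feats @ W_msg                            # (N, N, H) per-edge messages
    return (a[..., None] * msgs).sum(1)             # (N, H) aggregated per agent
```

Because the aggregated messages are themselves built from invariant features, any position update of the form x_i ← x_i + f(messages)·(relative directions) is then equivariant: rotating the whole system rotates the updates identically.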
The second component addresses the fundamental problem of comparing a simulated point cloud to a target shape without assuming a one‑to‑one correspondence, equal point counts, or pre‑aligned orientations. Both the generated and target point clouds are first normalized to lie inside the unit ball and then projected onto the orthonormal basis of 3D Zernike polynomials Z_{nℓm}. The resulting coefficients C = {c_{nℓm}} constitute a spectral signature of the shape. Rotating a shape corresponds to applying the Wigner‑D matrices D_ℓ(q), parameterized by a unit quaternion q, which mix coefficients across the azimuthal index m within each (n, ℓ) block.
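The normalization-and-projection pipeline can be sketched as follows. To keep the example self-contained, plain monomial moments stand in for the actual 3D Zernike basis (which combines radial polynomials with spherical harmonics); the helper names are hypothetical. The sketch illustrates the structural point: because each coefficient is a (weighted) average of a basis function over the points, the signature is independent of point ordering, and clouds of different sizes sampled from the same shape yield comparable signatures.

```python
import numpy as np

def normalize_to_unit_ball(x):
    """Center the cloud and scale it so all points lie inside the unit ball."""
    x = x - x.mean(0)
    r = np.linalg.norm(x, axis=1).max()
    return x / (r + 1e-12)

def moment_signature(x, weights=None, max_order=4):
    """Low-order geometric moments as a simplified stand-in spectral signature.

    The paper projects onto the orthonormal 3D Zernike polynomials Z_{nlm};
    here plain monomials x^a y^b z^c (a + b + c <= max_order) play that role.
    Each entry is a weighted average over points, so the signature does not
    depend on point order or (for samples of the same shape) point count.
    """
    x = normalize_to_unit_ball(x)
    if weights is None:
        weights = np.full(len(x), 1.0 / len(x))   # uniform per-point weights
    sig = []
    for a in range(max_order + 1):
        for b in range(max_order + 1 - a):
            for c in range(max_order + 1 - a - b):
                sig.append(np.sum(weights * x[:, 0]**a * x[:, 1]**b * x[:, 2]**c))
    return np.array(sig)
```

In the real basis the same structure holds, with the monomials replaced by Z_{nℓm} evaluated at the normalized points, and the truncation order playing the role of `max_order`.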
The loss is defined as a bilevel optimization problem. The inner loop searches for the quaternion q* that maximizes the spectral overlap M(C^{evol}, C^{target}; q) = ⟨D(q)·C^{evol}, C^{target}⟩. Because no closed‑form solution exists, q* is obtained by gradient descent on the unit‑quaternion manifold. Once the optimal alignment is found, the outer loss L_{sm} = ‖C^{target} − D(q*)·C^{evol}‖_2^2 is computed. Crucially, gradients of L_{sm} with respect to the GNN parameters w require ∂q*/∂w. Rather than unrolling the inner optimization, the authors employ implicit differentiation to solve the associated adjoint equation, yielding an efficient and memory‑light gradient computation.
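The inner loop can be illustrated on the simplest nontrivial case: for the ℓ=1 band, the Wigner-D matrix reduces (up to basis convention) to the ordinary 3×3 rotation matrix R(q), so aligning ℓ=1 coefficient vectors is just rotational point alignment. The sketch below is hypothetical in its details: it uses finite-difference gradients for clarity, where the paper uses analytic gradients and then differentiates through q* via implicit differentiation rather than unrolling these steps.

```python
import numpy as np

def quat_to_rot(q):
    """Rotation matrix from a unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def align_quaternion(A, B, steps=3000, lr=0.005, eps=1e-6):
    """Inner loop: projected gradient descent on the unit-quaternion manifold.

    Finds q minimizing ||A R(q)^T - B||^2 for (K, 3) coefficient arrays A, B
    (for the l=1 band, rotating the spectrum is exactly this matrix action).
    Gradients are finite differences here purely for readability.
    """
    q = np.array([1.0, 0.0, 0.0, 0.0])              # start at the identity
    loss = lambda q: np.sum((A @ quat_to_rot(q).T - B) ** 2)
    for _ in range(steps):
        grad = np.array([
            (loss(q + eps * np.eye(4)[i]) - loss(q - eps * np.eye(4)[i])) / (2 * eps)
            for i in range(4)
        ])
        grad = grad - (grad @ q) * q                # project to the tangent space
        q = q - lr * grad
        q = q / np.linalg.norm(q)                   # retract onto the unit sphere
    return q, loss(q)
```

Note the two manifold operations: the tangent-space projection and the renormalization after each step. The outer gradient ∂q*/∂w would then be recovered from the stationarity condition of this inner loss, which is what the adjoint equation in the paper encodes.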
This Zernike‑based loss simultaneously satisfies four desiderata that most existing metrics lack: (i) permutation invariance – the loss does not depend on point ordering, (ii) robustness to varying point counts – dense and sparse clouds are compared on equal footing, (iii) full SO(3) rotation invariance – the inner alignment removes any global orientation mismatch, and (iv) sensitivity to reflections – unlike power‑spectrum or bispectrum features that discard azimuthal sign information, the full Zernike spectrum preserves chirality, allowing the system to distinguish left‑right asymmetric structures.
Benchmark experiments compare the proposed loss against Chamfer distance, Earth Mover’s Distance, pairwise‑distance histograms, and Gromov‑Wasserstein metrics. Across a suite of shapes ranging from simple ellipsoids to intricate, chiral objects, the Zernike loss yields faster convergence, lower final error, and superior scalability because its computational cost depends only on the fixed spectral truncation order rather than the number of points.
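For reference, the Chamfer baseline mentioned above is easy to state, and makes the scalability contrast concrete: its cost is O(N·M) in the two point counts, whereas comparing two truncated Zernike spectra costs O(K) in the fixed number of retained coefficients. A minimal NumPy version (one common convention among several for Chamfer distance):

```python
import numpy as np

def chamfer_distance(A, B):
    """Symmetric Chamfer distance between point clouds A (N, 3) and B (M, 3).

    Every pairwise squared distance is materialized, so both time and
    memory scale as O(N*M) -- unlike a fixed-size spectral comparison,
    whose cost is set by the truncation order, not the point counts.
    """
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)   # (N, M) pairwise d^2
    return d2.min(1).mean() + d2.min(0).mean()            # nearest-neighbor both ways
```

Chamfer is permutation-invariant and tolerant of unequal point counts, but it is not rotation-invariant and, being a sum of nearest-neighbor terms, is insensitive to the global arrangement that the spectral loss captures.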
The authors also isolate the loss from the dynamics by directly optimizing point clouds (and optionally per‑point weights) to match target shapes. The bilevel formulation successfully reshapes both dense (N≈3000) and sparse (N≈800) clouds onto the targets regardless of their initial global orientation, confirming true rotation invariance.
Integrating the loss with the SE(3)‑equivariant GNN, DiffeoMorph is trained end‑to‑end on morphogenesis tasks. Starting from an uninformative spherical distribution with only minimal spatial cues encoded in the gene vectors, the system learns to produce a wide variety of target morphologies. Notably, the framework can generate chiral structures (e.g., a crescent with two distinct weight regions) by preserving reflection‑sensitive information in the Zernike spectrum.
The paper further demonstrates that the shape‑matching loss can be plugged into reinforcement‑learning pipelines. When used as a reward signal, the same loss dramatically reduces sample complexity compared with traditional RL approaches that rely on sparse or handcrafted rewards.
In summary, DiffeoMorph introduces a powerful combination of (1) a rotation‑equivariant, attention‑based graph neural dynamics model, (2) a differentiable, rotation‑invariant, reflection‑sensitive Zernike spectral loss, and (3) an implicit‑differentiation‑based bilevel optimization scheme. This enables learning of distributed control policies that reliably drive large populations of agents to self‑assemble into complex 3D shapes using only minimal global information. The methodology bridges developmental biology, swarm robotics, programmable matter, and multi‑agent reinforcement learning, and opens avenues for applying differentiable shape metrics to other physics‑based simulation domains.