Object-Centric World Modeling with Soft Geometric Inductive Biases
📝 Abstract
Equivariance is a powerful prior for learning physical dynamics, yet exact group equivariance can degrade performance if the symmetries are broken. We propose object-centric world models built with geometric algebra neural networks, providing a soft geometric inductive bias. We evaluate our models in simulated environments of 2D rigid-body dynamics with static obstacles, training autoregressively for next-step prediction. For long-horizon rollouts we show that the soft inductive bias of our models yields better physical fidelity than non-equivariant baseline models. The approach complements recent soft-equivariance ideas and aligns with the view that simple, well-chosen priors can yield robust generalization. These results suggest that geometric algebra offers an effective middle ground between hand-crafted physics and unstructured deep nets, delivering sample-efficient dynamics models for multi-object scenes.
📄 Content
Learning world models that generalize far beyond their training distribution remains a central goal in machine learning, especially in applications for computer vision and robotics. Architectural priors, such as weight sharing or equivariance to geometric transformations, can dramatically improve data efficiency and generalization [3]. At the same time, hard constraints can become liabilities when the underlying symmetries are only approximate, as is common in real environments with boundaries, contact events, anisotropic friction, damping, and actuation. This tension motivates soft inductive biases: parameterizations that nudge learning toward symmetry-respecting solutions while preserving headroom to fit structured violations of those symmetries [8,31].

Figure: RMSE over 10 rollout frames against a ground-truth environment of 10 rigid-body polygons colliding in a box with gravity, separated into free motion (top) and object-wall collisions (bottom). Our Clifford models with soft geometric inductive bias ({S, S-Ad}-CliffordTransformer) outperform the equivariant models as well as baseline transformers for the sparse object-wall collisions, while staying on par with the best equivariant model (E-CliffordTransformer) during free motion.
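The rollout-RMSE metric used throughout can be sketched as follows. This is an illustrative implementation, not the authors' evaluation code: `model` stands in for any trained next-step predictor, and states are assumed to be flat NumPy arrays.

```python
import numpy as np

def rollout_rmse(model, state0, gt_traj):
    """Autoregressively roll out `model` from state0 and score against ground truth.

    model:   callable next-step predictor, state -> state (placeholder for any
             trained dynamics model).
    gt_traj: array of shape (T, ...) holding the ground-truth states.
    Returns the per-frame RMSE over the rollout horizon.
    """
    state, errs = state0, []
    for gt in gt_traj:
        state = model(state)  # feed the model's own prediction back in
        errs.append(np.sqrt(np.mean((state - gt) ** 2)))
    return np.asarray(errs)
```

Because predictions are fed back in rather than reset to ground truth, errors compound over the horizon, which is exactly what this metric is meant to expose.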
World models of objects moving in space are a natural substrate for such priors. By factoring scenes into entities and interactions, they promise compositional generalization and long-horizon stability [21,32]. Yet, learning dynamics (not just static properties) in multi-object scenes is brittle: contacts and constraints create sharp, state-dependent symmetry breaking, and naive, non-geometric architectures either overfit or require large data budgets. Conversely, strictly equivariant models can underfit when the data deviate from the assumed symmetries.
Pre-print.

We argue that geometric algebra (Clifford algebra) [26] offers an effective middle ground for injecting geometric structure softly. By representing both states and operators as multivectors, projective geometric algebra (PGA) equips the model with a bias toward geometric transformations. The tensor nature of multivectors enables efficient implementation of standard modules such as linear maps, attention, and nonlinearities [2,27]. Instead of enforcing exact E(2)-equivariance end-to-end, our models use this soft geometric inductive bias to enable effective learning in environments with broken equivariance [27,31]. The result is an object-centric dynamics model with soft geometric bias: a next-step predictor trained autoregressively on object states and capable of generating long-horizon, physically plausible rollouts.
Practically, we embed each entity's position, velocity, and orientation as multivectors in PGA(2) ≅ Cl(2, 0, 1) and train Clifford neural networks with soft geometric bias to predict sequences of these embedded states. We train our architectures as dynamics models (i.e., on next-step prediction) on procedurally generated 2D rigid-body scenarios built with the JAX-based JAX2D engine [22]. We show that the soft geometric bias improves both sample efficiency and long-horizon physical fidelity relative to (i) non-geometric MLP/Transformer baselines and (ii) a strictly equivariant variant of the Clifford transformer.
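As a concrete illustration of the PGA(2) ≅ Cl(2, 0, 1) embedding, the sketch below encodes a Euclidean point as a bivector and rotates it with a rotor via the sandwich product. The component ordering, sign conventions, and function names are our own assumptions for illustration, not the paper's implementation.

```python
import math

# Basis blades of Cl(2, 0, 1) as bitmasks: bit0 -> e0 (null basis vector),
# bit1 -> e1, bit2 -> e2.  Multivector component order:
# [1, e0, e1, e2, e01, e02, e12, e012].
BLADES = [0b000, 0b001, 0b010, 0b100, 0b011, 0b101, 0b110, 0b111]
INDEX = {b: i for i, b in enumerate(BLADES)}
SQUARES = {0b001: 0.0, 0b010: 1.0, 0b100: 1.0}  # e0^2 = 0, e1^2 = e2^2 = 1

def blade_product(a, b):
    """Geometric product of two basis blades -> (sign, result_bitmask)."""
    swaps, t = 0, a >> 1  # count swaps needed to reach canonical order
    while t:
        swaps += bin(t & b).count("1")
        t >>= 1
    sign = -1.0 if swaps & 1 else 1.0
    common = a & b  # repeated basis vectors contribute their metric square
    for bit, sq in SQUARES.items():
        if common & bit:
            sign *= sq
    return sign, a ^ b

def gp(u, v):
    """Geometric product of two 8-component multivectors."""
    out = [0.0] * 8
    for i, a in enumerate(BLADES):
        for j, b in enumerate(BLADES):
            s, r = blade_product(a, b)
            out[INDEX[r]] += s * u[i] * v[j]
    return out

def embed_point(x, y):
    """Euclidean point (x, y) as the PGA(2) bivector x*e20 + y*e01 + e12."""
    mv = [0.0] * 8
    mv[INDEX[0b011]] = y     # e01
    mv[INDEX[0b101]] = -x    # e02 = -e20
    mv[INDEX[0b110]] = 1.0   # e12 carries the homogeneous weight
    return mv

def rotor(theta):
    """Rotor for a counterclockwise rotation by theta about the origin."""
    mv = [0.0] * 8
    mv[0] = math.cos(theta / 2)
    mv[INDEX[0b110]] = -math.sin(theta / 2)  # sign fixed by our convention
    return mv

def reverse(mv):
    """Reversion: flips the sign of the grade-2 and grade-3 components."""
    out = list(mv)
    for b in (0b011, 0b101, 0b110, 0b111):
        out[INDEX[b]] = -out[INDEX[b]]
    return out

def apply_rotor(R, P):
    """Sandwich product R P R~ transforming the point P."""
    return gp(gp(R, P), reverse(R))
```

Velocities and orientations slot into the same 8-component representation (e.g., directions as ideal points, orientations as rotors), which is what lets a Clifford network treat states and transformations uniformly.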
This perspective complements recent arguments that much of deep learning’s generalization can be understood through the lens of explicit priors and parameterization choices [31], and it operationalizes “soft equivariance” ideas developed in residual pathway priors and related regularization schemes [8,16]. It also connects to the emerging use of geometric algebra transformers across domains (e.g., E(3) and Lorentz symmetry) which show that Clifford representations can unify objects and transformations within scalable attention architectures [2,29].
Our main contributions are:
• Object-centric dynamics with soft geometric bias. We introduce a Clifford-based dynamics model that biases features and interactions toward E(2)-consistent behavior while allowing controlled symmetry violations, yielding robustness in scenes with contacts, boundaries, and heterogeneous materials.
• Autoregressive world modeling. We train a geometrically informed transition model with a next-token prediction objective and evaluate long-horizon rollouts in multi-object scenes. The model is trained with a block-causal teacher-forcing objective that has not previously been used in object-centric dynamics modeling.
• Empirical gains on procedurally generated physics. On JAX2D rigid-body environments, our models achieve lower rollout errors and higher sample efficiency than non-equivariant baselines, and outperform or match a strictly equivariant Clifford counterpart when symmetries are only approximate.
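The block-causal teacher-forcing objective mentioned in the contributions relies on an attention mask in which the N object tokens of a frame attend to one another and to all earlier frames, but never to the future. A minimal sketch, assuming tokens are flattened in time-major order (our own layout assumption):

```python
import numpy as np

def block_causal_mask(T, N):
    """Boolean attention mask for T timesteps of N object tokens each.

    Token i may attend to token j iff j's timestep <= i's timestep:
    objects interact freely within a frame, see the full past, and are
    blocked from the future.  Shape: (T*N, T*N), True = may attend.
    """
    step = np.arange(T * N) // N           # timestep index of each token
    return step[:, None] >= step[None, :]  # block-lower-triangular pattern
```

Compared with a strictly token-causal mask, the block structure is what allows all objects in a frame to be predicted jointly while still permitting teacher forcing across frames.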
Soft inductive biases. Existing work has explored the tension between restricting the solution space of deep neural networks with inductive biases like equivariance, and ‘softly’ parameterizing these biases to accommodate c