On the Spatiotemporal Dynamics of Generalization in Neural Networks
Why do neural networks fail to generalize addition from 16-digit to 32-digit numbers, while a child who learns the rule can apply it to arbitrarily long sequences? We argue that this failure is not an engineering problem but a violation of physical postulates. Drawing inspiration from physics, we identify three constraints that any generalizing system must satisfy: (1) Locality – information propagates at finite speed; (2) Symmetry – the laws of computation are invariant across space and time; (3) Stability – the system converges to discrete attractors that resist noise accumulation. From these postulates, we derive – rather than design – the Spatiotemporal Evolution with Attractor Dynamics (SEAD) architecture: a neural cellular automaton where local convolutional rules are iterated until convergence. Experiments on three tasks validate our theory: (1) Parity – demonstrating perfect length generalization via light-cone propagation; (2) Addition – achieving scale-invariant inference from L=16 to L=1 million with 100% accuracy, exhibiting input-adaptive computation; (3) Rule 110 – learning a Turing-complete cellular automaton without trajectory divergence. Our results suggest that the gap between statistical learning and logical reasoning can be bridged – not by scaling parameters, but by respecting the physics of computation.
💡 Research Summary
The paper tackles the well‑known “length generalization failure” of modern neural networks, especially transformers, which can learn to add 16‑digit numbers but catastrophically collapse when asked to add longer numbers. The authors argue that this failure is not a matter of insufficient data or model capacity but a violation of fundamental physical constraints that any truly generalizing system must obey. Drawing inspiration from physics and neuroscience, they formulate three postulates that together define a viable computational substrate for causal, length‑independent reasoning.
- Locality (Relativistic Causality) – Information cannot propagate instantaneously across arbitrary distances; it must travel through adjacent points at a finite speed. In neural terms this forbids global attention mechanisms that effectively implement “action at a distance.”
- Spacetime Symmetry (Translation Invariance) – The computational law governing updates must be invariant under shifts in space and time. This mirrors the inductive principle that the same rule applies everywhere, ensuring that a model trained on finite‑length examples can extrapolate to unseen positions.
- Thermodynamic Dissipation and Stability – Over long sequences, microscopic noise inevitably accumulates unless the dynamics are dissipative. The system must possess low‑dimensional attractors so that any perturbed state is driven back to a discrete, stable manifold, analogous to how digital signals resist degradation.
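The locality postulate can be made concrete with a few lines of code. The sketch below (hypothetical; not the paper's implementation) uses an arbitrary width-3 local rule to show that any such rule propagates a perturbation at most one cell per step, producing the discrete "light cone" the postulates require:

```python
import numpy as np

def local_step(state):
    # Hypothetical width-3 local rule: each cell sees only itself and
    # its immediate neighbors (zero-padded boundary). Any rule of this
    # shape, learned or hand-written, satisfies the locality postulate.
    padded = np.pad(state, 1)
    return np.maximum.reduce([padded[:-2], padded[1:-1], padded[2:]])

# Flip a single cell and watch its influence spread.
state = np.zeros(9, dtype=int)
state[4] = 1
for t in range(3):
    state = local_step(state)

# After t steps the perturbation has reached at most t cells in either
# direction: a "light cone" of radius t around the flipped cell.
print(state)  # [0 1 1 1 1 1 1 1 0]
```

A global attention layer, by contrast, lets every position read every other position in a single step, which is exactly the "action at a distance" the first postulate rules out.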
From these postulates the authors derive a concrete architecture: Spatiotemporal Evolution with Attractor Dynamics (SEAD). SEAD is a neural cellular automaton (NCA) that uses a single, weight‑shared local update rule (Φ) applied repeatedly to a lattice representing the input sequence. The update rule is essentially a convolutional kernel (or recurrent rule) that respects locality and translation invariance. Crucially, the dynamics are designed to be contractive: saturating nonlinearities or explicit quantization push the system toward binary attractor states, thereby implementing the third postulate.
The computation proceeds as follows: the input digits are embedded as an initial lattice state; Φ is iterated until the lattice converges to a fixed point (or a small set of attractor states). The number of iterations required grows with the input length, forming a natural “light‑cone” of influence that mirrors causal propagation in physical systems. Because the rule is shared across all positions, the same computation works for any length, and because the dynamics are dissipative, the result is robust to numerical noise even for millions of steps.
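The iterate-until-convergence loop can be sketched in a few lines. The rule below is a stand-in, not the paper's learned Φ: a 3-cell majority vote, which is local, weight-shared, and contractive, so it illustrates both the fixed-point iteration and the attractor dynamics that repair noise:

```python
import numpy as np

def phi(state):
    # Stand-in for the learned local rule Phi: a 3-cell majority vote.
    # It is weight-shared (same rule at every cell), local (radius-1
    # neighborhood), and contractive (isolated bit-flips are pushed
    # back toward clean binary configurations).
    padded = np.pad(state, 1, mode="edge")
    return (padded[:-2] + padded[1:-1] + padded[2:] >= 2).astype(int)

def run_to_fixed_point(state, max_steps=10_000):
    # Iterate the shared rule until the lattice stops changing.
    for step in range(max_steps):
        nxt = phi(state)
        if np.array_equal(nxt, state):
            return state, step
        state = nxt
    raise RuntimeError("did not converge")

# A run of ones corrupted by isolated bit-flips: the attractor
# dynamics remove the noise and the iteration halts on its own.
noisy = np.array([1, 1, 0, 1, 1, 1, 0, 1, 1, 1])
clean, steps = run_to_fixed_point(noisy)
print(clean, steps)
```

Because the same `phi` is applied at every position, the loop works unchanged for a lattice of length 16 or length one million; only the number of iterations to convergence changes.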
The authors validate SEAD on three benchmark tasks that are canonical for algorithmic reasoning:
- Parity – A binary classification task where the output depends on the parity of the input bits. SEAD achieves perfect accuracy for arbitrary input lengths, demonstrating that the light‑cone propagation correctly aggregates information without loss.
- Addition – The model is trained on 16‑digit integer addition and then tested on sequences up to one million digits. SEAD attains 100% accuracy across this massive extrapolation, whereas transformer‑based baselines typically fail beyond 2–3× the training length. The number of iterations scales linearly with the number of digits, evidencing input‑adaptive computation.
- Rule 110 – A one‑dimensional cellular automaton known to be Turing‑complete. SEAD learns the rule from short trajectories and reproduces long‑range dynamics without trajectory divergence, confirming that the learned local rule respects the global dynamics of a complex system.
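For context, the ground-truth dynamics in the third task are fully specified by an 8-entry lookup table. A minimal reference implementation of Rule 110 (the target the model must learn, not the model itself) looks like this:

```python
import numpy as np

# Rule 110: the binary expansion of the rule number (110 = 0b01101110)
# is the output table over the eight (left, center, right) neighborhoods.
RULE = 110
TABLE = [(RULE >> n) & 1 for n in range(8)]  # index = 4*l + 2*c + r

def rule110_step(row):
    padded = np.pad(row, 1)  # zero boundary, for this sketch
    idx = 4 * padded[:-2] + 2 * padded[1:-1] + padded[2:]
    return np.array([TABLE[i] for i in idx])

# The classic single-cell seed at the right edge.
row = np.zeros(8, dtype=int)
row[-1] = 1
for _ in range(4):
    row = rule110_step(row)
```

Reproducing these trajectories over long horizons is demanding: Rule 110 is chaotic in the sense that a single wrong cell anywhere corrupts everything inside its future light cone, which is why the paper treats non-divergence as evidence that the learned local rule matches the true one exactly.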
Across all experiments, SEAD’s performance is deterministic, noise‑resilient, and independent of the specific length of the test inputs. The authors argue that these results illustrate a shift from “statistical generalization” (minimizing expected risk under a fixed data distribution) to “causal generalization” (recovering the underlying structural mechanism that is invariant to environmental interventions such as length changes).
In the discussion, the paper situates its contribution within broader debates on AI reasoning. It critiques recent engineering‑heavy fixes—positional encodings, attention biases, chain‑of‑thought prompting—as partial patches that do not address the underlying violation of physical constraints. By grounding model design in locality, symmetry, and stability, the work offers a principled pathway toward neural systems that can reason algorithmically and extrapolate indefinitely, much like human cognition. The authors suggest that future work could explore richer attractor structures, hierarchical spacetime manifolds, and applications to symbolic mathematics, program synthesis, and scientific discovery, where exact logical reasoning over arbitrarily large structures is essential.