Mastering NIM and Impartial Games with Weak Neural Networks: An AlphaZero-inspired Multi-Frame Approach
We study impartial games under fixed-latency, fixed-scale quantised inference (FSQI). In this fixed-scale, bounded-range regime, we prove that inference is simulable by constant-depth polynomial-size Boolean circuits (AC0). This yields a worst-case representational barrier: single-frame agents in the FSQI/AC0 regime cannot strongly master NIM, because optimal play depends on the global nim-sum (parity). Under our stylised deterministic rollout interface, a single rollout policy head from the structured family analysed here reveals only one fixed linear functional of the invariant, so increasing rollout budget alone does not recover the missing bits. We derive two structural bypasses: (1) a multi-policy-head rollout architecture that recovers the full invariant via distinct rollout channels, and (2) a multi-frame architecture that tracks local nimber differences and supports restoration. Experiments across multiple settings are consistent with these predictions: single-head baselines stay near chance, while two-frame models reach near-perfect restoration accuracy and multi-head FSM-controlled shootouts achieve perfect win/loss position classification. Overall, the empirical results support the view that explicit structural priors (history/differences or multiple rollout channels) are important in the FSQI/AC0 regime.
💡 Research Summary
The paper investigates the fundamental representational limits of neural networks operating under a Fixed‑Scale Quantised Inference (FSQI) regime, where weights and thresholds are confined to a bounded, input‑size‑independent grid and inference must be performed with constant depth (i.e., fixed latency). By showing that any such network—whether a feed‑forward NN, a finite‑window RNN, or a finite‑window transformer—can be simulated by a constant‑depth, polynomial‑size AC⁰ circuit (Theorem 2.4), the authors place FSQI models squarely within a well‑studied circuit class known to be incapable of computing global parity (XOR) functions.
Using this connection, they prove a strong impossibility result for the classic impartial game NIM. Optimal play in NIM depends on the nim‑sum, the bitwise XOR of all heap sizes. Since parity is not in AC⁰, a single‑frame, single‑head policy/value network cannot compute the nim‑sum in the worst case (Theorem 5.2). Consequently, no such network can achieve “strong mastery” of NIM or any impartial game whose solution requires aggregating local Grundy values via parity.
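The nim-sum invariant underlying this impossibility result is easy to state concretely. The following minimal sketch (function names are illustrative, not from the paper) computes the nim-sum and the classic optimal move that returns it to zero — exactly the global XOR aggregation the paper shows is out of reach for AC⁰-simulable single-frame networks:

```python
from functools import reduce
from operator import xor

def nim_sum(heaps):
    """Bitwise XOR of all heap sizes: the global invariant of NIM."""
    return reduce(xor, heaps, 0)

def optimal_move(heaps):
    """Return (heap_index, new_size) that restores nim-sum to 0,
    or None if the position is already losing (nim-sum == 0)."""
    s = nim_sum(heaps)
    if s == 0:
        return None  # every legal move hands the opponent a winning position
    for i, h in enumerate(heaps):
        if h ^ s < h:        # reducing heap i to h ^ s cancels the nim-sum
            return i, h ^ s
    return None  # unreachable when s != 0
```

Note that `nim_sum` aggregates parity across all heaps and all bit positions; this is the worst-case computation Theorem 5.2 says a single-frame FSQI network cannot perform.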
The paper then analyses the information that a weak evaluator can expose through deterministic rollouts. A single rollout head, regardless of budget, reveals at most one fixed linear functional of the nim‑sum (Propositions 5.10 and 5.14). Thus, increasing rollout depth alone cannot overcome the parity barrier.
Two structural bypasses are proposed:
- Multi‑frame bypass – By feeding two consecutive game states (Pₜ₋₁, Pₜ), the network can compute the local difference Δ(Pₜ₋₁, Pₜ), which encodes only O(1) bits of information and is AC⁰‑computable (Lemma 5.4). Using Δ, a simple restoration rule keeps the nim‑sum at zero, yielding a universal verifier policy that achieves strong (restoration) mastery (Propositions 5.5 and 5.13).
- Multi‑head rollout bypass – Introducing B independent rollout heads, each trained to recover a distinct bit of the nim‑sum, allows the agent to reconstruct the full invariant via depth amplification (Proposition 5.9). With all B bits recovered, the agent can classify win/loss positions perfectly.
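The multi-frame bypass can be made concrete. The sketch below assumes the standard restoration rule for NIM (the paper's exact rule is not reproduced here): if the previous frame had nim-sum zero, the opponent's move changed exactly one heap, and the XOR delta of that heap equals the new global nim-sum — so the agent can restore the invariant from purely local, O(1)-bit information, without ever recomputing the global XOR:

```python
def restore_move(prev, curr):
    """Given two consecutive frames (prev, curr) with nim-sum(prev) == 0,
    return a move (heap_index, new_size) that restores nim-sum to 0.
    A legal NIM move changes exactly one heap, so the local XOR delta d
    of that heap equals the new global nim-sum."""
    changed = [(i, a ^ b) for i, (a, b) in enumerate(zip(prev, curr)) if a != b]
    assert len(changed) == 1, "a legal NIM move reduces exactly one heap"
    _, d = changed[0]            # d = Delta(prev, curr) = current nim-sum
    for k, h in enumerate(curr):
        if h ^ d < h:            # reducing heap k to h ^ d cancels d
            return k, h ^ d
    return None  # unreachable when d != 0
```

For example, from `prev = [3, 4, 7]` (nim-sum 0), an opponent move to `curr = [1, 4, 7]` gives delta `d = 3 ^ 1 = 2`, and the rule restores the zero-sum position by reducing the third heap. Only the difference between the two frames enters the computation, which is why the two-frame architecture sidesteps the parity barrier.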
Empirical validation is performed on a 20‑heap, 4‑bit NIM instance with one million supervised restoration examples. A one‑frame, single‑head model remains near chance, while a two‑frame model attains >99 % restoration accuracy. In a complementary experiment, a multi‑head FSM‑controlled shoot‑out achieves 100 % win/loss classification, whereas the single‑head baseline is limited by its one‑bit information channel.
The authors discuss why AC⁰ is a realistic abstraction for low‑bit deployment (e.g., INT8) where thresholds do not scale with fan‑in, and they note that normalization layers (LayerNorm, Softmax) are treated as calibration steps rather than sources of parity‑computing power. They also acknowledge that their results are worst‑case representational statements, not guarantees about learnability; the multi‑head construction proves existence but does not claim that self‑play will discover the required heads automatically.
In summary, the paper establishes that under fixed‑scale quantised inference, single‑frame neural agents cannot compute the global parity needed for optimal NIM play, but modest architectural augmentations—either a short history window or multiple rollout channels—transform the problem into an AC⁰‑tractable one. This work highlights the importance of explicit structural priors when designing neural planners for impartial games in hardware‑constrained settings.