On the Complexity of Neural Computation in Superposition
Superposition, the ability of neural networks to represent more features than neurons, is increasingly seen as key to the efficiency of large models. This paper investigates the theoretical foundations of computing in superposition, establishing complexity bounds for explicit, provably correct algorithms. We present the first lower bounds for a neural network computing in superposition, showing that for a broad class of problems, including permutations and pairwise logical operations, computing $m'$ features in superposition requires at least $\Omega(\sqrt{m' \log m'})$ neurons and $\Omega(m' \log m')$ parameters. This implies an explicit limit on how much one can sparsify or distill a model while preserving its expressibility, and complements empirical scaling laws by implying the first subexponential bound on capacity: a network with $n$ neurons can compute at most $O(n^2 / \log n)$ features. Conversely, we provide a nearly tight constructive upper bound: logical operations like pairwise AND can be computed using $O(\sqrt{m'} \log m')$ neurons and $O(m' \log^2 m')$ parameters. There is thus an exponential gap between the complexity of computing in superposition (the subject of this work) and that of merely representing features, which can require as few as $O(\log m')$ neurons by the Johnson–Lindenstrauss Lemma. Our work analytically establishes that the number of parameters is a good estimator of the number of features a neural network computes.
💡 Research Summary
The paper tackles a fundamental question in modern deep learning: how many logical features can a neural network compute when both inputs and outputs are represented in superposition, that is, when the number of features exceeds the number of neurons? The authors focus on a clean, mathematically tractable setting: computing many Boolean functions in parallel, specifically the 2‑AND problem (a collection of m′ pairwise ANDs over m input bits) and the Neural Permutation problem (computing an arbitrary permutation of m bits). They assume a feature‑sparsity condition on the admissible inputs, typically v = 2 (each input vector contains at most two active bits), which mirrors the empirical observation that only a few features are active at any time in large models.
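To make the setting concrete, here is a minimal sketch of a 2‑AND instance under the v = 2 sparsity condition. It is illustrative only; the function names and the random choice of pairs are not taken from the paper.

```python
import itertools
import random

def make_2and_instance(m, m_prime, seed=0):
    """Pick m' distinct unordered pairs (i, j); the target function maps
    x in {0,1}^m to the m' output bits x[i] AND x[j]."""
    rng = random.Random(seed)
    all_pairs = list(itertools.combinations(range(m), 2))
    return rng.sample(all_pairs, m_prime)

def target_2and(x, pairs):
    """Ground-truth outputs of the 2-AND family on input x."""
    return [x[i] & x[j] for (i, j) in pairs]

def sparse_inputs(m, v=2):
    """Admissible inputs under the sparsity condition: at most v active bits."""
    for k in range(v + 1):
        for support in itertools.combinations(range(m), k):
            x = [0] * m
            for i in support:
                x[i] = 1
            yield x

pairs = make_2and_instance(m=8, m_prime=5)
# With v = 2, an output AND fires only on the single admissible input that
# activates exactly its pair, so exactly m' = 5 inputs yield a nonzero output.
fired = sum(any(target_2and(x, pairs)) for x in sparse_inputs(8))
```

Note how sparsity shrinks the admissible input set from 2^m vectors to O(m²), which is what makes the counting arguments below tractable.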
Lower‑bound contributions
The authors develop an information‑theoretic framework that abstracts away architectural details such as activation functions or connectivity patterns. By counting the number of distinct Boolean function families that must be realizable, they show that any network that can compute the target family (even with a small error probability) must have a parameter description length of at least Ω(m′ log m′) bits. Since each parameter can be stored with O(1) bits in a realistic setting, and a constant‑depth network with n neurons has at most O(n²) parameters, this translates into a requirement of at least Ω(√(m′ log m′)) neurons. The proof uses Kolmogorov complexity to argue that a highly expressive network necessarily encodes a large amount of information in its weights. The bound holds for exact computation as well as for the more realistic case where a bounded fraction of inputs may be mis‑computed. An “unordered” variant (where only the multiset of outputs matters) yields a slightly weaker but still non‑trivial bound. Crucially, these lower bounds are assumption‑free with respect to activation functions and depth; they rely only on the sparsity of inputs and the combinatorial size of the function class.
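In outline, the chain from counting to the neuron bound runs as follows. This is a hedged sketch with constants suppressed and ranges of m′ simplified; the precise statement is the paper's.

```latex
% A 2-AND family is a choice of m' unordered pairs out of \binom{m}{2}:
\log_2 \binom{\binom{m}{2}}{m'} = \Omega(m' \log m')
\quad \text{(for } m' \text{ polynomial in } m \text{, bounded away from } \tbinom{m}{2})

% With O(1) bits per parameter, the parameter count P must carry all those bits:
P = \Omega(m' \log m')

% A constant-depth network with n neurons has P = O(n^2) weights, hence
n = \Omega\!\left(\sqrt{m' \log m'}\right).
```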
From the lower bound the authors derive a new capacity limit: a network with n neurons can compute at most O(n² / log n) distinct features in superposition. This is a sub‑exponential bound that sharply contrasts with passive representation results (e.g., Johnson‑Lindenstrauss embeddings) that can store up to 2^{O(n)} features without any computation. The result therefore establishes an exponential gap between the ability to store many features and the ability to compute them when superposition is required.
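A back‑of‑the‑envelope numeric check of the counting argument is easy to run. This is not the paper's proof; the concrete sizes, the 0.8 slack factor, and the constant‑depth parameter convention are assumptions for illustration.

```python
import math

def family_description_bits(m, m_prime):
    """log2 of the number of distinct 2-AND families: one family is a
    choice of m' unordered pairs out of the C(m, 2) possible pairs."""
    return math.comb(math.comb(m, 2), m_prime).bit_length()

m, m_prime = 1000, 1000
bits = family_description_bits(m, m_prime)   # information the weights must encode
param_lb = m_prime * math.log2(m_prime)      # the Omega(m' log m') benchmark
neuron_lb = math.sqrt(param_lb)              # via P = O(n^2) at constant depth

# Inverting n^2 ~ m' log m' gives the capacity ceiling m' = O(n^2 / log n):
n = 512
capacity = n * n / math.log2(n)
```

For these sizes the exact count comes out slightly above m′ log₂ m′ ≈ 9966 bits, matching the Ω(m′ log m′) order of growth.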
Upper‑bound constructions
To complement the impossibility results, the authors present an explicit construction that nearly matches the lower bound. The key technical tool is the notion of feature influence: for each input variable they count how many output ANDs depend on it. Based on this influence they partition the set of ANDs into three regimes (low, medium, high) and handle each with a different wiring strategy:
- Low‑influence regime (≤ m′/4) – the majority of ANDs in realistic settings. Inputs are routed into a collection of “computational channels” that are themselves superposed. Within each channel, a simple linear‑threshold unit computes the AND. Because each channel handles only a limited number of outputs, the total number of neurons grows as O(√m′ log m′).
- Medium‑influence regime – handled by a hybrid of channel routing and direct weight sharing, keeping the neuron count low while ensuring each output receives the correct combination of inputs.
- High‑influence regime – a small set of outputs that depend on many inputs; these are computed using a more dense sub‑network but the overall contribution to the neuron budget remains sub‑dominant.
The resulting network uses O(1) layers, O(√m′ log m′) neurons, and O(m′ log² m′) parameters, with each parameter requiring only O(1) bits on average. The construction is exact (no errors) and can be stacked to compute sequences of 2‑AND operations, making it suitable for deeper architectures. The authors also show how the design extends to higher‑arity ANDs (k‑way AND), to larger sparsity levels v (with an extra factor O(v² log m)), and to multi‑layer settings.
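The influence‑based partition itself is straightforward to sketch. The low threshold below echoes the m′/4 cutoff mentioned above; the second threshold and all names are illustrative assumptions, not the paper's construction.

```python
from collections import Counter

def partition_by_influence(pairs, low_frac=0.25, high_frac=0.5):
    """Split 2-AND outputs into regimes by the maximum influence of their
    two inputs, where influence = number of ANDs an input feeds into."""
    m_prime = len(pairs)
    influence = Counter()
    for i, j in pairs:
        influence[i] += 1
        influence[j] += 1
    low_cut, high_cut = low_frac * m_prime, high_frac * m_prime  # assumed cutoffs
    regimes = {"low": [], "medium": [], "high": []}
    for i, j in pairs:
        top = max(influence[i], influence[j])
        key = "low" if top <= low_cut else ("medium" if top <= high_cut else "high")
        regimes[key].append((i, j))
    return regimes

# Toy family: ten ANDs share the "hub" input 0; six more use disjoint inputs.
pairs = [(0, j) for j in range(1, 11)] + [(2 * k, 2 * k + 1) for k in range(6, 12)]
regimes = partition_by_influence(pairs)
```

On this toy family the ten hub ANDs land in the high regime and the six disjoint ones in the low regime, mirroring why the low‑influence channels can cover most outputs cheaply.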
Implications and context
The work bridges a gap between empirical observations of superposition in large language models and rigorous theory. By proving that the number of computable features grows only quadratically (up to a log factor) with the number of neurons, the paper explains why massive models need to increase width dramatically to support richer logical reasoning. It also provides a concrete limitation for model compression: any technique that reduces the parameter count below the Ω(m′ log m′) threshold must lose the ability to compute the full set of superposed features.
The authors compare their results with prior work. Vaintrob et al. (2023) introduced the k‑AND problem but restricted superposition to the neurons, keeping inputs monosemantic; their algorithm achieves n = Θ(m′^{2/3}), which is far from optimal in the fully superposed setting. Subsequent work (Vaintrob et al., 2024) improves the bound but still falls short of the O(√m′ log m′) achieved here. Moreover, earlier lower‑bound techniques for neural memorization rely on VC‑dimension arguments that require specific activation functions; the present paper’s Kolmogorov‑complexity‑based bounds avoid such assumptions, making them broadly applicable.
Finally, the paper notes that the low‑influence channel routing mechanism appears spontaneously in small networks trained by gradient descent (Adler et al., 2024), suggesting that the theoretical construction may capture a real computational primitive used by trained models. This connection opens a promising avenue for mechanistic interpretability: identifying such “computational channels” in the activation patterns of large models could reveal how they manage massive superposition.
Conclusion
Overall, the paper delivers a near‑tight characterization of the resources needed to compute Boolean functions in superposition. The lower bounds establish a fundamental, sub‑exponential capacity limit, while the constructive upper bounds demonstrate that this limit is essentially attainable. These results have direct relevance for understanding the scaling laws of large neural networks, for guiding compression strategies, and for informing future work on interpreting the internal logic of superposed representations.