Attention Is Not Retention: The Orthogonality Constraint in Infinite-Context Architectures


Biological memory solves a problem that eludes current AI: storing specific episodic facts without corrupting general semantic knowledge. Complementary Learning Systems theory explains this through two subsystems: a fast hippocampal system using sparse, pattern-separated representations for episodes, and a slow neocortical system using distributed representations for statistical regularities. Current AI systems lack this separation, attempting to serve both functions through neural weights alone. We identify the Orthogonality Constraint: reliable memory requires orthogonal keys, but semantic embeddings cannot be orthogonal because training clusters similar concepts together. The result is Semantic Interference (connecting to what cognitive psychologists have long observed in human memory), where neural systems writing facts into shared continuous parameters collapse to near-random accuracy within tens of semantically related facts. Through semantic density (ρ), the mean pairwise cosine similarity, we show collapse occurs at N = 5 facts (ρ > 0.6) or N ≈ 20–75 (moderate ρ). We validate across modalities: 16,309 Wikipedia facts, scientific measurements (ρ = 0.96, 0.02% accuracy at N = 10,000), and image embeddings (ρ = 0.82, 0.05% at N = 2,000). This failure is geometric: no increase in model capacity can overcome interference when keys share semantic overlap. We propose Knowledge Objects (KOs): structured facts with hash-based identity, controlled vocabularies, and explicit version chains. On Wikipedia facts, KO retrieval achieves 45.7% where Modern Hopfield Networks collapse to near-zero; hash-based retrieval maintains 100%. Production systems (Claude Memory, ChatGPT Memory) store unstructured text, causing schema drift (40–70% consistency) and version ambiguity. Knowledge Objects provide the discrete hippocampal component that enables reliable bicameral memory.


💡 Research Summary

The paper “Attention Is Not Retention: The Orthogonality Constraint in Infinite‑Context Architectures” identifies a fundamental geometric limitation that prevents current large language models (LLMs) from reliably storing episodic facts during inference. Drawing on the Complementary Learning Systems (CLS) theory from neuroscience, the authors argue that biological memory separates fast, sparse, pattern‑separated hippocampal storage from slow, distributed cortical learning. Modern AI systems collapse these two functions into a single continuous parameter space, attempting to write new facts directly into model weights or fast‑weight associative memories.

The authors formalize the “Orthogonality Constraint”: for interference‑free retrieval, stored key vectors must be (approximately) orthogonal. However, embeddings learned from language data are deliberately clustered to capture semantic similarity, leading to a high “semantic density” (ρ), defined as the mean pairwise cosine similarity among keys. When ρ is large, inner‑product‑based retrieval suffers from semantic interference: the similarity between keys causes their contributions to overlap, rapidly degrading recall accuracy.
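
Semantic density as defined here is straightforward to compute. The sketch below is a toy illustration (not the paper's code): it measures the mean pairwise cosine similarity ρ for two synthetic key sets, one drawn as random high-dimensional Gaussians (which are near-orthogonal) and one built around a shared "topic" direction to mimic how trained embeddings cluster similar concepts. The 0.3 perturbation scale is an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def semantic_density(keys):
    """Mean pairwise cosine similarity (rho) among key vectors,
    excluding each key's similarity with itself."""
    K = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    sims = K @ K.T
    n = len(K)
    return (sims.sum() - n) / (n * (n - 1))

d, n = 256, 50
# Near-orthogonal keys: independent Gaussian directions in high dimension.
ortho_keys = rng.standard_normal((n, d))
# Clustered keys: a shared topic direction plus a small perturbation,
# mimicking embeddings of semantically related facts.
topic = rng.standard_normal(d)
clustered_keys = topic + 0.3 * rng.standard_normal((n, d))

print(f"rho (random keys):    {semantic_density(ortho_keys):.3f}")
print(f"rho (clustered keys): {semantic_density(clustered_keys):.3f}")
```

For random Gaussian keys, pairwise cosines concentrate around 0 (standard deviation roughly 1/√d), while the clustered set's shared direction pushes ρ close to 1, which is the regime where the paper reports collapse.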

Empirical analysis across three modalities demonstrates the severity of this effect. In a set of 16,309 Wikipedia facts (average ρ≈0.45), accuracy drops to near‑random after storing only 30‑50 related facts. In a scientific measurement dataset with ρ=0.96, storing 10,000 facts yields a mere 0.02 % correct retrieval. In an image‑embedding benchmark (ρ=0.82), 2,000 facts lead to 0.05 % accuracy. The collapse occurs regardless of model size, attention window length, or training data volume, confirming that the problem is geometric rather than architectural.
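
The geometric nature of the collapse can be reproduced in a few lines. The following toy simulation (an assumption-laden sketch, not the paper's benchmark protocol) stores key-value pairs in a single superposed outer-product matrix, the fast-weight form underlying Hopfield-style associative memory, and scores retrieval by nearest stored value. Dimensions and cluster scale are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

def recall_accuracy(keys, d_val=64):
    """Store (key, value) pairs in one superposed fast-weight matrix
    M = sum_i v_i k_i^T, then read out M @ k_i and count how often the
    nearest stored value is the correct one."""
    n, d = keys.shape
    K = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    V = rng.standard_normal((n, d_val))     # random values to retrieve
    M = V.T @ K                             # superposed storage
    retrieved = (M @ K.T).T                 # row i: readout for key i
    match = np.argmax(retrieved @ V.T, axis=1)  # nearest value wins
    return np.mean(match == np.arange(n))

d, n = 256, 40
ortho = rng.standard_normal((n, d))         # rho near 0
topic = rng.standard_normal(d)
dense = topic + 0.3 * rng.standard_normal((n, d))  # rho near 1

print(f"accuracy, near-orthogonal keys: {recall_accuracy(ortho):.2f}")
print(f"accuracy, dense keys:           {recall_accuracy(dense):.2f}")
```

With near-orthogonal keys the cross-terms Σⱼ≠ᵢ (kⱼ·kᵢ)vⱼ stay small and recall is essentially perfect; with dense keys those cross-terms swamp the stored value, and no change to model size or value dimension fixes it, matching the paper's claim that the failure is geometric.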

To overcome the constraint, the paper proposes Knowledge Objects (KOs). A KO is a discrete, typed memory unit that (1) possesses a hash‑based unique identifier, guaranteeing orthogonal addressing; (2) follows a controlled vocabulary and schema, eliminating the “schema drift” observed when the same fact is stored under varied textual predicates; and (3) includes explicit version chains, allowing deterministic overwrites and clean retrieval of the current value. KOs are stored externally in a discrete key‑value store, while the neural network continues to serve as the slow cortical component for statistical reasoning.
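
The three KO properties can be sketched as a small key-value store. The code below is a minimal illustration under assumptions of ours, not the paper's implementation: the predicate vocabulary, field names, and SHA-256 keying scheme are all hypothetical stand-ins for the controlled schema the paper describes.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

# Hypothetical controlled vocabulary; the paper's actual schema may differ.
PREDICATES = {"capital_of", "boiling_point_K", "director_of"}

def ko_key(subject: str, predicate: str) -> str:
    """Hash-based identity: addressing is exact and independent of
    semantic similarity, so keys never interfere."""
    return hashlib.sha256(f"{subject}|{predicate}".encode()).hexdigest()

@dataclass(frozen=True)
class KnowledgeObject:
    subject: str
    predicate: str
    value: str
    version: int = 1
    prev_version: Optional[int] = None  # explicit version chain

class KOStore:
    """Discrete external store playing the fast 'hippocampal' role."""
    def __init__(self):
        self._facts: dict[str, KnowledgeObject] = {}

    def write(self, subject: str, predicate: str, value: str) -> KnowledgeObject:
        if predicate not in PREDICATES:
            raise ValueError(f"predicate not in controlled vocabulary: {predicate}")
        key = ko_key(subject, predicate)
        prev = self._facts.get(key)
        ko = KnowledgeObject(
            subject, predicate, value,
            version=prev.version + 1 if prev else 1,
            prev_version=prev.version if prev else None,
        )
        self._facts[key] = ko  # deterministic overwrite of the old version
        return ko

    def read(self, subject: str, predicate: str) -> Optional[KnowledgeObject]:
        return self._facts.get(ko_key(subject, predicate))

store = KOStore()
store.write("France", "capital_of", "Paris")
store.write("France", "capital_of", "Paris, Île-de-France")  # correction
print(store.read("France", "capital_of").version)  # 2
```

Because the hash key is derived from (subject, predicate) rather than from an embedding, a correction deterministically supersedes the old value while the version chain preserves provenance; the controlled vocabulary prevents the same fact being scattered across variant predicates ("capital", "capital city", "capital_of"), which is the schema drift the paper attributes to unstructured-text stores.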

Experimental results show that KO‑based retrieval achieves 45.7 % accuracy on the Wikipedia benchmark, while a pure modern Hopfield‑style associative memory (the mathematical foundation of transformer attention) collapses to near‑zero. Direct hash lookup maintains 100 % accuracy across all scales. The authors also audit commercial memory extensions such as Claude Memory and ChatGPT Memory, finding that their unstructured‑text storage leads to 40‑70 % predicate consistency and highly variable correction rates (0‑100 %), illustrating the practical impact of semantic interference and version ambiguity.

The paper concludes that any system attempting online storage of discrete facts in shared continuous parameters will inevitably suffer from the Orthogonality Constraint. A bicameral architecture, pairing discrete hippocampal-like storage (KOs) with continuous cortical-like learning, offers a principled solution. This design is positioned as essential for future episodic-memory-enabled agents, Retrieval-Augmented Generation pipelines, and neurosymbolic AI, and it demonstrates that scaling attention or data alone cannot substitute for discrete, orthogonal addressing.
