More than MACs: Exploring the Role of Neuromorphic Engineering in the Age of LLMs
The introduction of large language models has significantly expanded global demand for computing; meeting this growing demand requires novel approaches that introduce new capabilities while addressing extant needs. Although inspiration from biological systems served as the foundation on which modern artificial intelligence (AI) was developed, many modern advances have been made without clear parallels to biological computing. As a result, the ability of techniques inspired by “natural intelligence” (NI) to inflect modern AI systems may be questioned. However, by analyzing remaining disparities between AI and NI, we argue that further biological inspiration can contribute towards expanding the capabilities of artificial systems, enabling them to succeed in real-world environments and adapt to niche applications. To elucidate which NI mechanisms can contribute toward this goal, we review and compare elements of biological and artificial computing systems, emphasizing areas of NI that have not yet been effectively captured by AI. We then suggest areas of opportunity for NI-inspired mechanisms that can inflect AI hardware and software.
💡 Research Summary
The paper “More than MACs: Exploring the Role of Neuromorphic Engineering in the Age of LLMs” argues that the unprecedented computational and energy demands of today’s large language models (LLMs) cannot be solved solely by continuing the industry‑wide push to optimize multiply‑accumulate (MAC) operations. Instead, the authors advocate a broader, biologically‑inspired redesign of AI hardware and software.
First, the authors compare human brain efficiency with that of modern AI accelerators on two quantitative axes: token‑level efficiency (energy per word or token) and synapse‑level efficiency (operations per watt). Using publicly available benchmarks (ML.Energy) they show that, when batch processing is employed, current GPUs (e.g., NVIDIA H100) actually consume less energy per token than the human brain, which uses roughly 6 J per word. However, this advantage is contingent on massive parallelism and aggressive batching; the brain achieves its efficiency without batching and with constant low latency.
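The batching effect can be made concrete with a back‑of‑the‑envelope sketch. The GPU power draw and per‑stream throughput figures below are illustrative assumptions, not measurements from the paper or from ML.Energy; only the ~6 J‑per‑word brain estimate comes from the summary above.

```python
# Back-of-the-envelope energy-per-token comparison (illustrative numbers).
BRAIN_J_PER_WORD = 6.0  # rough brain estimate cited in the paper


def gpu_energy_per_token(power_w: float, tokens_per_s: float, batch: int) -> float:
    """Energy per token when `batch` requests are decoded in parallel.

    Assumes throughput scales linearly with batch size until the
    accelerator saturates -- a simplification of real serving behavior.
    """
    effective_throughput = tokens_per_s * batch
    return power_w / effective_throughput  # joules per token


# Hypothetical H100-class figures: 700 W, 50 tokens/s per stream.
for batch in (1, 8, 64):
    e = gpu_energy_per_token(700.0, 50.0, batch)
    side = "below" if e < BRAIN_J_PER_WORD else "above"
    print(f"batch={batch:3d}: {e:6.3f} J/token ({side} the ~6 J/word brain estimate)")
```

The sketch reproduces the paper's point: at batch size 1 the GPU spends far more energy per token than the brain's ~6 J per word, and only aggressive batching pushes it below that line.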
On the synapse‑level, the brain is estimated to deliver 3.5–35 tera‑ops per watt (TOPS/W). A survey of 170 AI accelerators reveals that most fall below the lower bound of this range, although a subset using low‑precision (4‑8‑bit) arithmetic approaches or exceeds it. The authors note that these numbers hide critical architectural constraints, especially memory.
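The range comparison can be expressed as a simple classification; the accelerator figures below are hypothetical examples in the style of the survey, not actual entries from it.

```python
# Classify an accelerator against the brain's estimated synapse-level
# efficiency range of 3.5-35 TOPS/W (range taken from the paper).
BRAIN_TOPS_PER_W = (3.5, 35.0)


def vs_brain(tops: float, watts: float) -> str:
    """Place an accelerator's ops/W relative to the brain's estimated range."""
    eff = tops / watts
    lo, hi = BRAIN_TOPS_PER_W
    if eff < lo:
        return f"{eff:.2f} TOPS/W: below the brain's estimated range"
    if eff > hi:
        return f"{eff:.2f} TOPS/W: above the brain's estimated range"
    return f"{eff:.2f} TOPS/W: within the brain's estimated range"


# Hypothetical parts (not entries from the 170-accelerator survey):
print(vs_brain(tops=400.0, watts=300.0))   # dense higher-precision part
print(vs_brain(tops=2000.0, watts=250.0))  # low-precision (4-8 bit) part
```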
Memory is identified as the dominant bottleneck for LLM execution. Modern GPUs rely on a hierarchy of SRAM caches, DRAM, and high‑bandwidth memory (HBM). SRAM cells, while fast, occupy large silicon area and have not scaled down significantly with newer process nodes. Consequently, even systems with 80 GB of HBM must stream billions of parameters and intermediate activations across the memory hierarchy, incurring substantial energy and latency costs. The paper discusses alternative architectures such as systolic arrays, which keep parameters in local memory and reduce data movement, yet still depend on external DRAM for large models.
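To see why streaming parameters dominates the cost, a rough model helps. The per‑byte access energies below are ballpark assumptions chosen for illustration, not figures from the paper; the point is the relative gap between memory tiers.

```python
# Rough cost of streaming model weights once per decoded token.
# Per-byte access energies are ballpark assumptions, not the paper's data.
PJ_PER_BYTE = {"sram": 1.0, "hbm": 15.0, "dram": 50.0}


def weight_stream_energy_j(n_params: float, bytes_per_param: float, tier: str) -> float:
    """Energy to read every weight once from the given memory tier, in joules."""
    return n_params * bytes_per_param * PJ_PER_BYTE[tier] * 1e-12


# A hypothetical 70B-parameter model at 2 bytes per weight:
for tier in ("sram", "hbm", "dram"):
    e = weight_stream_energy_j(70e9, 2, tier)
    print(f"{tier}: {e:.2f} J per full weight sweep")
```

Even with optimistic per-byte costs, a full parameter sweep from off-chip memory costs joules per token, which is why architectures that keep weights local (e.g., systolic arrays) reduce data movement so effectively.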
The authors then analyze the trade‑off between batch size and inter‑token latency. Larger batches amortize the cost of loading static program data, improving energy per token, but increase the time between successive outputs, harming user‑perceived responsiveness. Human cognition, by contrast, processes information continuously with low latency and without the need for batching. This observation motivates the search for non‑batch, in‑situ inference and learning mechanisms.
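This trade‑off fits a minimal model: each decode step pays a fixed cost to load static program data plus a per‑request compute cost, so cost per token falls with batch size while every stream's inter‑token latency rises. All timings below are assumed for illustration only.

```python
# Minimal model of the batch-size trade-off: larger batches amortize the
# fixed cost of loading static data, but every stream waits for the whole
# step. Timings are illustrative assumptions, not measurements.

def step_metrics(batch: int, load_ms: float = 20.0, compute_ms: float = 0.5):
    """Return (inter-token latency in ms, amortized step cost per token in ms)."""
    step_ms = load_ms + batch * compute_ms  # one decode step for the batch
    latency_ms = step_ms                    # each stream emits 1 token/step
    cost_per_token = step_ms / batch        # fixed load cost is amortized
    return latency_ms, cost_per_token


for b in (1, 16, 256):
    lat, cost = step_metrics(b)
    print(f"batch={b:4d}: inter-token latency {lat:6.1f} ms, "
          f"amortized cost {cost:6.2f} ms/token")
```

The model shows latency and per-token cost moving in opposite directions as the batch grows, which is exactly the tension the brain sidesteps by never batching at all.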
From these observations, three research directions for neuromorphic engineering are proposed:
- High‑density, low‑power memory integration – employing emerging non‑volatile memories (RRAM, PCM), spintronic devices, or 3‑D stacked memory‑compute fabrics to bring storage physically closer to compute units, thereby minimizing costly data movement.
- Analog‑digital hybrid computation – implementing synaptic‑level multiplication as physical current flow rather than digital arithmetic, reducing conversion overhead and approaching the brain’s energy per synaptic operation.
- In‑situ learning and continual adaptation – designing hardware that can update weights locally and on‑the‑fly, eliminating the need for large batch training cycles and enabling real‑time adaptation to changing environments.
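As a toy illustration of the third direction, a purely local, sample‑by‑sample update rule shows learning without batching or a global backward pass. The Oja‑style Hebbian rule below is a standard stand‑in chosen for illustration, not a mechanism proposed in the paper.

```python
import math
import random

# Toy sketch of in-situ learning: each weight is updated from activity
# that is locally available at the synapse (pre-synaptic input x_j and
# post-synaptic output y_i), one streaming sample at a time -- no batching,
# no global backward pass. Oja-style rule, illustrative only.
random.seed(0)
N_IN, N_OUT = 8, 4
W = [[random.gauss(0, 0.1) for _ in range(N_IN)] for _ in range(N_OUT)]


def local_update(W, x, lr=0.01):
    """One Oja-style local update; the decay term keeps weights bounded."""
    y = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]
    for i, row in enumerate(W):
        for j in range(N_IN):
            row[j] += lr * (y[i] * x[j] - y[i] * y[i] * row[j])
    return W


for _ in range(100):  # streaming samples arrive one at a time
    x = [random.gauss(0, 1) for _ in range(N_IN)]
    W = local_update(W, x)
```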
The paper concludes that while the gap in raw energy‑per‑operation between AI and the brain is narrowing, fundamental differences in memory capacity, reliance on batching, and lack of continual learning remain. Neuromorphic approaches that address these dimensions could fundamentally reshape the trajectory of AI, moving beyond the “more MACs” mantra toward systems that are more brain‑like in efficiency, adaptability, and real‑world applicability.