abstractPIM: A Technology Backward-Compatible Compilation Flow for Processing-In-Memory


The von Neumann architecture, in which the memory and the computation units are separated, demands massive data traffic between the memory and the CPU. To reduce data movement, new technologies and computer architectures have been explored. The use of memristors, which are devices with both memory and computation capabilities, has been considered for different processing-in-memory (PIM) solutions, including using memristive stateful logic for a programmable digital PIM system. Nevertheless, all previous work has focused on a specific stateful logic family, and on optimizing the execution for a certain target machine. These solutions require a new compiler and full recompilation when the target machine changes, and provide no backward compatibility with other target machines. In this chapter, we present abstractPIM, a new compilation concept and flow which enables executing any function within the memory, using different stateful logic families and different instruction set architectures (ISAs). By separating the code generation into two independent components, intermediate representation of the code using a target-independent ISA and then microcode generation for a specific target machine, we provide a flexible flow with backward compatibility and lay the foundations for a PIM compiler. Using abstractPIM, we explore various logic technologies and ISAs and how they impact each other, and discuss the associated challenges, such as increased execution time.


💡 Research Summary

The paper introduces abstractPIM, a novel compilation framework designed to enable general‑purpose processing‑in‑memory (PIM) across a variety of memristive stateful‑logic technologies while preserving backward compatibility with existing target machines. The authors observe that prior PIM solutions are tightly coupled to a single logic family (e.g., MAGIC, IMPLY, CRS) and require a complete recompilation whenever the underlying hardware changes. To break this dependency, abstractPIM separates code generation into two independent stages.

In the first stage, a target‑independent intermediate representation (IR) is produced using a defined instruction set architecture (ISA) that abstracts away any technology‑specific details. This stage can be performed by the programmer or a high‑level compiler and is completely agnostic to the memristor technology, the cross‑bar layout, or the physical implementation of logic gates.
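The first stage can be made concrete with the paper's half-adder running example. Below is a hypothetical encoding of such a target-independent IR in Python; the mnemonics, tuple format, and cell names are illustrative assumptions, not the paper's actual syntax:

```python
# Hypothetical target-independent IR for a half adder (sum, carry).
# Each entry is (opcode, input_cells, output_cell); opcodes are abstract
# ISA commands with no reference to the underlying logic family.
HALF_ADDER_IR = [
    ("XOR", ("a", "b"), "sum"),    # sum   = a XOR b
    ("AND", ("a", "b"), "carry"),  # carry = a AND b
]

def run_ir(ir, inputs):
    """Reference interpreter for the abstract ISA (checks the IR only;
    says nothing about how a device realizes each opcode)."""
    ops = {"XOR": lambda x, y: x ^ y, "AND": lambda x, y: x & y}
    cells = dict(inputs)
    for opcode, ins, out in ir:
        cells[out] = ops[opcode](*(cells[i] for i in ins))
    return cells

result = run_ir(HALF_ADDER_IR, {"a": 1, "b": 1})
print(result["sum"], result["carry"])  # 0 1
```

Note that nothing in this IR mentions NOR, NAND, voltages, or crossbar geometry; that is exactly the property that makes it retargetable.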

The second stage translates each ISA instruction into a sequence of micro‑operations (micro‑code) that are directly supported by the chosen PIM device. This translation is performed once per instruction by the PIM vendor and embedded in the memory controller. Because the micro‑code maps abstract ISA commands onto the actual voltage‑driven operations of the target logic family, the same IR can be re‑targeted to any future memristive technology simply by swapping the micro‑code library.
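As a toy sketch of one such micro-code template, the abstract XOR instruction can expand into five NOR micro-operations on a MAGIC-NOR target. The five-NOR decomposition is a standard logic identity; the `(dest, src0, src1)` encoding and cell names are assumptions for illustration, not the vendor format described in the paper:

```python
# Micro-code template for the abstract XOR instruction on a MAGIC-NOR
# target: each tuple means dest = NOR(src0, src1). "in0"/"in1" are the
# instruction's inputs, "out" its output, "t0".."t3" scratch cells.
XOR_ON_NOR = [
    ("t0", "in0", "in1"),   # t0 = NOR(a, b)
    ("t1", "in0", "t0"),    # t1 = (NOT a) AND b
    ("t2", "in1", "t0"),    # t2 = a AND (NOT b)
    ("t3", "t1", "t2"),     # t3 = XNOR(a, b)
    ("out", "t3", "t3"),    # out = NOT t3 = a XOR b
]

def run_template(template, a, b):
    """Simulate the micro-op sequence one NOR per step."""
    cells = {"in0": a, "in1": b}
    for dest, s0, s1 in template:
        cells[dest] = 1 - (cells[s0] | cells[s1])
    return cells["out"]

for a in (0, 1):
    for b in (0, 1):
        print(a, b, run_template(XOR_ON_NOR, a, b))
```

The key point is that this table is fixed once per (instruction, technology) pair; the IR never changes when the table does.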

A third component, runtime execution, streams the ISA instructions from the CPU to the controller, which expands them into the pre‑generated micro‑operations and drives the cross‑bar array. This mirrors the way modern CPUs decode x86 macro‑instructions into micro‑ops, but the “micro‑ops” here are physical voltage pulses that implement stateful logic.

The authors illustrate the flow with a half‑adder example, showing four configurations: (a) a NOR‑only ISA on a MAGIC‑NOR device, (b) a full 2‑input/1‑output ISA on the same device, (c) the same ISA on a MAGIC‑NAND device, and (d) the same ISA on a device that supports all 2‑input MAGIC functions. The examples demonstrate that the same intermediate code can be reused across different ISAs and technologies, achieving a 56% reduction in control load (CPU‑to‑controller traffic) compared to state‑of‑the‑art mapping tools such as SIMPLER.
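A minimal executable sketch of this retargeting idea is shown below: the same abstract IR executes on either a NOR-based or a NAND-based device simply by swapping the micro-code library. The gate decompositions are textbook NOR/NAND identities; the data structures, names, and encoding are illustrative assumptions, not the paper's actual tooling:

```python
# Same IR, two micro-code libraries. Templates are (dest, src0, src1)
# tuples meaning dest = PRIMITIVE(src0, src1); "t*" are scratch cells
# private to each instruction instance.
PRIMITIVE = {
    "NOR":  lambda x, y: 1 - (x | y),
    "NAND": lambda x, y: 1 - (x & y),
}

LIBS = {
    "NOR": {
        "XOR": [("t0", "in0", "in1"), ("t1", "in0", "t0"),
                ("t2", "in1", "t0"), ("t3", "t1", "t2"), ("out", "t3", "t3")],
        "AND": [("t0", "in0", "in0"), ("t1", "in1", "in1"), ("out", "t0", "t1")],
    },
    "NAND": {
        "XOR": [("t0", "in0", "in1"), ("t1", "in0", "t0"),
                ("t2", "in1", "t0"), ("out", "t1", "t2")],
        "AND": [("t0", "in0", "in1"), ("out", "t0", "t0")],
    },
}

def execute(ir, inputs, tech):
    """Expand each IR instruction with the tech's templates and simulate."""
    gate = PRIMITIVE[tech]
    cells = dict(inputs)
    for i, (opcode, ins, out) in enumerate(ir):
        bind = {"in0": ins[0], "in1": ins[-1], "out": out}
        for dest, s0, s1 in LIBS[tech][opcode]:
            name = lambda x: bind.get(x, f"{i}.{x}")  # namespace scratch cells
            cells[name(dest)] = gate(cells[name(s0)], cells[name(s1)])
    return cells

HALF_ADDER_IR = [("XOR", ("a", "b"), "sum"), ("AND", ("a", "b"), "carry")]
for tech in ("NOR", "NAND"):
    r = execute(HALF_ADDER_IR, {"a": 1, "b": 1}, tech)
    print(tech, r["sum"], r["carry"])  # both print: 0 1
```

The IR is untouched between the two runs; only the library changes, which is the backward-compatibility property the paper argues for.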

A key insight is the trade‑off between code size and execution latency. A minimal ISA (e.g., NOR‑only) yields the smallest number of clock cycles because each instruction maps directly to a single primitive operation, but it inflates the instruction count and thus the control traffic. Conversely, richer ISAs (including AND, XOR, etc.) compress the instruction stream, reducing communication overhead, yet each high‑level instruction must be decomposed into multiple primitive operations, increasing the total number of clock cycles. The paper quantifies this effect, reporting a 10‑30 % increase in execution cycles when using richer ISAs, while achieving up to a 56 % reduction in transferred instruction words.
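The trade-off can be seen in miniature on the half adder, assuming a MAGIC-NOR device where every NOR micro-op costs one cycle. A globally optimized NOR-only program needs 6 NORs, while a 2-instruction XOR/AND program expands to 8 NORs because per-instruction templates cannot share intermediates. These counts are for this toy example only, not the paper's benchmark figures:

```python
# Toy accounting for the code-size vs. latency trade-off, assuming one
# cycle per NOR micro-op. The 6-NOR netlist is a hand-optimized NOR-only
# program for the half adder; counts are illustrative, not the paper's.
def nor(x, y):
    return 1 - (x | y)

def half_adder_nor_only(a, b):
    """NOR-only ISA: 6 instructions, 6 cycles, intermediates shared."""
    n1 = nor(a, b)
    n2 = nor(a, n1)
    n3 = nor(b, n1)
    n4 = nor(n2, n3)
    s = nor(n4, n4)   # sum = a XOR b
    c = nor(n1, s)    # carry = a AND b (reuses n1 and s)
    return s, c

# Richer ISA: 2 instructions (XOR, AND), but the controller expands each
# from its own template (5 NORs for XOR, 3 for AND) with no sharing
# across instructions: 8 cycles total.
nor_only = {"instructions": 6, "cycles": 6}
rich_isa = {"instructions": 2, "cycles": 5 + 3}

print(nor_only, rich_isa)  # fewer words sent with the rich ISA, more cycles spent
```

So the rich ISA sends a third as many instruction words but spends about a third more cycles, which is qualitatively the same direction as the paper's reported 10-30% cycle increase versus up-to-56% traffic reduction.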

The work also positions abstractPIM relative to existing mapping tools (SIMPLE, SAID, SIMPLER). Those tools tightly couple logic synthesis, placement, and scheduling to a specific technology, making them unsuitable for heterogeneous or future PIM devices. abstractPIM’s two‑layer abstraction decouples synthesis from technology, enabling rapid retargeting and fostering the development of a universal PIM ISA and associated compiler infrastructure.

In summary, abstractPIM provides a technology‑agnostic compilation flow that separates high‑level program representation from low‑level device specifics, supports multiple stateful‑logic families, reduces CPU‑controller communication, and offers a clear framework for exploring ISA‑technology co‑design. This approach lays the groundwork for scalable, backward‑compatible PIM systems and paves the way for future research on general‑purpose memristive computing architectures.

