On the operating unit size of load/store architectures

We introduce a strict version of the concept of a load/store instruction set architecture in the setting of Maurer machines. We take the view that transformations on the states of a Maurer machine are achieved by applying threads as considered in thread algebra to the Maurer machine. We study how the transformations on the states of the main memory of a strict load/store instruction set architecture that can be achieved by applying threads depend on the operating unit size, the cardinality of the instruction set, and the maximal number of states of the threads.

💡 Research Summary

The paper investigates how the size of the operating unit, the cardinality of the instruction set, and the maximal number of thread states jointly determine the set of memory‑state transformations that can be realized on a strict load/store instruction set architecture (ISA). The authors formalize a “strict” load/store ISA within the framework of Maurer machines, an abstract computational model that separates main memory from an operating unit and represents system states as elements of a finite set. Each instruction is decomposed into a memory access phase (load or store) followed by an internal operation performed by the operating unit.

To model program execution, the authors adopt thread algebra, in which a thread is a finite‑state control flow that applies instructions to the Maurer machine one at a time. A thread of depth k can be seen as a finite automaton with at most k states; each transition corresponds to the execution of one instruction. The central question is: given an operating‑unit size u (measured in bits), an instruction‑set cardinality c, and a maximal thread depth k, which functions f : M → M on the main‑memory state space M are implementable?
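The automaton view of a thread can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy machine (a 4‑bit memory and a 2‑bit register), the instruction names `load`, `inc`, and `store`, and the dictionary encoding of a thread. Each instruction returns a new state together with a Boolean reply, and the thread uses the reply to choose its next state.

```python
from typing import Callable, Dict, Tuple, Union

State = Tuple[int, int]  # (main memory, operating-unit register); toy encoding
Instruction = Callable[[State], Tuple[State, bool]]
Node = Union[int, str]   # a thread state, or "S" for termination


def load(s: State) -> Tuple[State, bool]:
    mem, _ = s
    return (mem, mem & 0b11), True          # copy the low 2 bits into the register


def inc(s: State) -> Tuple[State, bool]:
    mem, reg = s
    new = (reg + 1) & 0b11
    return (mem, new), new != 0             # reply False when the register wraps to 0


def store(s: State) -> Tuple[State, bool]:
    mem, reg = s
    return ((mem & ~0b11) | reg, reg), True  # write the register to the low 2 bits


INSTRUCTIONS: Dict[str, Instruction] = {"load": load, "inc": inc, "store": store}

# Hypothetical 3-state thread: node -> (instruction, next on True, next on False).
THREAD: Dict[int, Tuple[str, Node, Node]] = {
    0: ("load", 1, 1),
    1: ("inc", 2, "S"),    # skip the store if the increment wrapped around
    2: ("store", "S", "S"),
}


def apply_thread(thread, instructions, state: State,
                 start: Node = 0, max_steps: int = 1000) -> State:
    """Run the thread from `start` until it reaches "S" or the step bound."""
    node = start
    for _ in range(max_steps):
        if node == "S":
            return state
        name, on_true, on_false = thread[node]
        state, reply = instructions[name](state)
        node = on_true if reply else on_false
    raise RuntimeError("thread did not terminate within max_steps")
```

Applying `THREAD` to `(0b0100, 0)` increments the low two bits of memory, while for `(0b1011, 0)` the increment wraps and the thread terminates without storing. The step bound is only there because a finite‑state thread may loop forever.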

The authors define the “reachable transformation set” as the collection of all memory functions that can be induced by some thread built from the available instruction set. They then derive combinatorial bounds on this set. When u is at least log₂|M| (i.e., the operating unit can hold the entire main‑memory state), any function can be realized provided the instruction set is sufficiently expressive and the thread depth is unbounded. In the more realistic regime where u < log₂|M|, the paper proves a necessary condition for completeness: c·k must be at least |M| / 2ᵘ. Intuitively, a smaller operating unit can be compensated by a richer instruction set or by longer threads that decompose complex transformations into a sequence of simpler steps. Conversely, if c·k falls below this threshold, some memory functions form a non‑empty “impossible region” that no thread can achieve, regardless of how the instructions are scheduled.
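As a worked illustration of the threshold c·k ≥ |M| / 2ᵘ stated above, the following sketch evaluates it for concrete parameters. The function names are hypothetical, and the sketch assumes that a memory of n bits has |M| = 2ⁿ states; it checks only the necessary condition as quoted in this summary and does not establish completeness.

```python
def meets_necessary_condition(mem_bits: int, u: int, c: int, k: int) -> bool:
    """True if c*k >= |M| / 2**u, with |M| = 2**mem_bits (assumed encoding)."""
    # Multiply through by 2**u to stay in exact integer arithmetic.
    return c * k * 2**u >= 2**mem_bits


def min_instruction_count(mem_bits: int, u: int, k: int) -> int:
    """Smallest c satisfying the necessary condition for the given u and k."""
    numerator = 2**mem_bits
    denominator = k * 2**u
    return -(-numerator // denominator)  # ceiling division on integers


# Example: 8-bit memory and a 2-bit operating unit give |M| / 2**u = 256 / 4 = 64.
print(min_instruction_count(8, 2, 5))  # smallest c with 5 * c >= 64
print(min_instruction_count(8, 2, 3))  # smallest c with 3 * c >= 64
```

Under these assumptions, raising k from 3 to 5 lowers the minimum instruction count permitted by the bound from 22 to 13, which shows how thread depth and instruction-set cardinality trade off against each other.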

The theoretical results are complemented by exhaustive simulations on small Maurer machines (e.g., 8‑bit memory with a 2‑bit operating unit). By enumerating all possible instruction sets and thread programs up to depth five, the authors confirm the tightness of the derived bounds. Notably, increasing thread depth from three to five while keeping the instruction set fixed reduces the required operating‑unit size by roughly half without losing completeness.
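The flavor of such an exhaustive enumeration can be sketched on a much smaller machine. The code below is not the authors' simulation: it assumes a hypothetical 2‑bit memory, a 1‑bit register that starts at 0, and five made-up instructions, and it enumerates only straight-line instruction sequences, which are a restricted subset of finite-state threads. It collects the distinct memory functions those sequences induce.

```python
from itertools import product

MEM_BITS = 2  # toy memory: 2 bits, so |M| = 4 (assumption for illustration)


def load(i):
    # Copy memory bit i into the 1-bit register.
    return lambda mem, reg: (mem, (mem >> i) & 1)


def store(i):
    # Overwrite memory bit i with the register.
    return lambda mem, reg: ((mem & ~(1 << i)) | (reg << i), reg)


def flip(mem, reg):
    # Negate the register; memory is untouched.
    return (mem, reg ^ 1)


INSTRUCTIONS = [load(0), load(1), store(0), store(1), flip]


def memory_function(program):
    """Tabulate the memory-to-memory function induced by a straight-line program."""
    table = []
    for start in range(2**MEM_BITS):
        mem, reg = start, 0  # the register is assumed to start at 0
        for instr in program:
            mem, reg = instr(mem, reg)
        table.append(mem)
    return tuple(table)


def reachable(max_len):
    """All memory functions induced by programs of length 0..max_len."""
    found = set()
    for length in range(max_len + 1):
        for program in product(INSTRUCTIONS, repeat=length):
            found.add(memory_function(program))
    return found


for k in range(5):
    print(k, len(reachable(k)))
```

The printed counts can never exceed 4⁴ = 256, the total number of functions on a four-state memory, and they can only grow as the length bound increases. Branching on instruction replies, as full threads allow, is left out to keep the sketch short.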

From a design perspective, the findings suggest concrete trade‑offs for architects of embedded or low‑power processors. If silicon area or energy budget limits the operating‑unit width, designers can either enlarge the instruction decoder (increasing c) or rely on compiler techniques that generate deeper control‑flow threads (increasing k). The paper also points out security implications: deliberately restricting c or k can make certain memory‑state transformations infeasible, thereby limiting the attack surface for malicious code that attempts to manipulate memory in unauthorized ways.

In summary, the work provides a rigorous, quantitative framework linking three fundamental ISA design parameters (operating‑unit size, instruction‑set cardinality, and thread depth) to the expressive power of load/store architectures. It offers both theoretical guarantees (a sufficient condition for transformation completeness when the operating unit is large, and a necessary condition when it is small) and practical guidance for hardware designers seeking to balance performance, area, power, and security constraints.