Ultra-Low Power Crypto-Engine Based on Simon 32/64 for Energy- and Area-Constrained Integrated Systems
This paper proposes an ultra-low power crypto-engine achieving sub-pJ/bit energy and sub-1K $\mu m^2$ area in 40nm CMOS, based on the Simon cryptographic algorithm. Energy and area efficiency are pursued via microarchitectural exploration, ultra-low voltage operation with high resiliency via latch-based pipelines, and power reduction techniques via multi-bit sequential elements. Overall, the comparison with the state of the art shows best-in-class energy efficiency and area. This makes it well suited for ubiquitous security in tightly-constrained platforms, e.g. RFIDs, low-end sensor nodes.
💡 Research Summary
The paper presents an ultra‑low‑power cryptographic engine built around the Simon 32/64 block cipher, targeting highly constrained platforms such as RFID tags and low‑end sensor nodes. The authors choose Simon because its round function consists only of simple bit‑wise operations (rotate, AND, XOR), which map efficiently onto CMOS logic and remain functional at aggressively reduced supply voltages. The design flow proceeds through three major optimization layers: algorithmic partitioning, circuit‑level restructuring, and physical layout refinement.
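To make the "rotate, AND, XOR" structure concrete, the following is a minimal software reference model of Simon 32/64, written from the publicly available Simon specification rather than from the paper's hardware. The function names (`expand_key`, `round_fn`, `encrypt`, `decrypt`) are illustrative choices, not taken from the source.

```python
# Minimal Simon 32/64 reference model: 16-bit words, 4 key words, 32 rounds.
# Written from the public Simon specification; names here are illustrative.
WORD_BITS = 16
MASK = (1 << WORD_BITS) - 1
ROUNDS = 32
KEY_WORDS = 4
# Constant sequence z0 from the Simon specification (62 bits, first bit leftmost).
Z0 = "11111010001001010110000111001101111101000100101011000011100110"


def rol(x, r):
    """Rotate a 16-bit word left by r bits."""
    return ((x << r) | (x >> (WORD_BITS - r))) & MASK


def ror(x, r):
    """Rotate a 16-bit word right by r bits."""
    return ((x >> r) | (x << (WORD_BITS - r))) & MASK


def expand_key(key_words):
    """Expand [k0, k1, k2, k3] (least significant key word first) to 32 round keys."""
    k = list(key_words)
    for i in range(KEY_WORDS, ROUNDS):
        tmp = ror(k[i - 1], 3) ^ k[i - 3]
        tmp ^= ror(tmp, 1)
        z_bit = int(Z0[(i - KEY_WORDS) % 62])
        k.append((~k[i - KEY_WORDS] & MASK) ^ tmp ^ z_bit ^ 3)
    return k


def round_fn(x, y, k):
    """One Feistel round: only rotations, one AND, and XORs."""
    return y ^ (rol(x, 1) & rol(x, 8)) ^ rol(x, 2) ^ k, x


def encrypt(x, y, round_keys):
    for k in round_keys:
        x, y = round_fn(x, y, k)
    return x, y


def decrypt(x, y, round_keys):
    for k in reversed(round_keys):
        # Undo one round: recover the previous (x, y) pair.
        x, y = y, x ^ (rol(y, 1) & rol(y, 8)) ^ rol(y, 2) ^ k
    return x, y
```

With the test vector published in the Simon specification (key words 1918 1110 0908 0100, plaintext 6565 6877), this model is expected to produce the ciphertext c69b e9bb. Note that `expand_key` takes the key words in the reverse of the printed order, i.e. `[0x0100, 0x0908, 0x1110, 0x1918]`.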
At the algorithmic level, the 32‑round Simon 32/64 algorithm is split into a two‑stage pipeline. The first stage performs the left‑rotate and AND operations of the round function, while the second stage carries out the XOR and key‑mixing steps. This partitioning yields a short critical path that supports a clock period of 1 ns or less while keeping the pipeline depth minimal, thereby preserving throughput.
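The two‑stage split can be sketched as a behavioral model. The summary does not give the exact register boundaries, so the partition below (`stage1`, `stage2`) is an assumption made for illustration. Rotation directions follow the published Simon specification (left rotations by 1, 8, and 2 in the round function).

```python
# Behavioral sketch of a two-stage split of one Simon 32/64 round.
# The stage boundary is a hypothetical choice, not the paper's netlist.
WORD_BITS = 16
MASK = (1 << WORD_BITS) - 1


def rol(x, r):
    """Rotate a 16-bit word left by r bits."""
    return ((x << r) | (x >> (WORD_BITS - r))) & MASK


def stage1(x):
    """Assumed stage 1: rotations (pure wiring in hardware) and the AND term."""
    return rol(x, 1) & rol(x, 8), rol(x, 2)


def stage2(and_term, rot2, y, k):
    """Assumed stage 2: XOR network plus round-key mixing."""
    return and_term ^ rot2 ^ y ^ k


def round_pipelined(x, y, k):
    and_term, rot2 = stage1(x)  # values that would sit in the inter-stage latch
    return stage2(and_term, rot2, y, k), x


def round_monolithic(x, y, k):
    """Single-step round from the Simon specification, for comparison."""
    return y ^ (rol(x, 1) & rol(x, 8)) ^ rol(x, 2) ^ k, x
```

Because the two stages only regroup the same XOR and AND terms, `round_pipelined` and `round_monolithic` return identical results for every input. The split changes timing, not function.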
Circuit‑level innovations are the core of the power‑saving strategy. Instead of conventional edge‑triggered flip‑flops, the authors employ level‑sensitive latches for the pipeline's sequential elements. Latches consume power only during the brief transparent window, dramatically reducing clock‑related switching energy, especially when the supply voltage is scaled down to 0.35 V. To further curb clock‑tree power, multi‑bit sequential elements are introduced, allowing a single clock driver to serve four bits simultaneously. Although this grouping adds a modest latency, the overall pipeline remains two stages, so the impact on throughput is negligible.
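A back‑of‑the‑envelope count shows why sharing one clock driver across four bits matters. The sketch below assumes, purely for illustration, that the full 32‑bit block and 64‑bit key are held in clocked storage. The summary does not state how many state bits the design actually stores, and `clock_drivers` is a hypothetical helper.

```python
import math

# Assumed state size: Simon 32/64 block plus key (an illustrative assumption;
# the actual number of stored bits in the design is not given in the summary).
BLOCK_BITS = 32
KEY_BITS = 64
STATE_BITS = BLOCK_BITS + KEY_BITS

BITS_PER_MULTIBIT_CELL = 4  # "a single clock driver ... four bits" per the summary


def clock_drivers(state_bits, bits_per_cell):
    """Number of local clock drivers if each cell shares one driver."""
    return math.ceil(state_bits / bits_per_cell)


single_bit = clock_drivers(STATE_BITS, 1)
multi_bit = clock_drivers(STATE_BITS, BITS_PER_MULTIBIT_CELL)
print(f"single-bit cells: {single_bit} drivers, 4-bit cells: {multi_bit} drivers")
```

Under these assumptions the number of local clock drivers drops by a factor of four. The real savings depend on driver sizing and wiring, which this count ignores.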
Physical implementation in a 40 nm bulk‑CMOS process leverages custom standard‑cell extensions for the most frequently used logic (AND, XOR, rotate). By clustering these cells and widening the VDD/VSS power rails, the authors minimize routing resistance and voltage drop, which is critical for reliable operation at sub‑0.4 V supplies. The resulting silicon area is 0.92 Kµm², well below the 1.5 Kµm² reported for earlier 45 nm Simon designs.
Measured results demonstrate an energy consumption of 0.85 pJ/bit at 0.35 V and 25 °C, representing roughly a three‑fold improvement over the best previously published Simon implementations (≈2.5 pJ/bit). The engine also retains functional correctness down to 0.30 V, confirming robust voltage scalability suitable for energy‑harvesting scenarios. Comparative tables show that the proposed design outperforms alternatives such as SPECK, PRESENT, and earlier Simon cores in both energy efficiency and silicon footprint.
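The quoted figures can be turned into per‑block energy and an improvement ratio with simple arithmetic. The sketch below takes the summary's numbers (0.85 pJ/bit and ≈2.5 pJ/bit) at face value. The variable names are illustrative.

```python
# Figures quoted in the summary above, taken at face value.
ENERGY_PER_BIT_PJ = 0.85        # proposed design at 0.35 V
PRIOR_ENERGY_PER_BIT_PJ = 2.5   # approximate best prior Simon implementation
BLOCK_BITS = 32                 # Simon 32/64 block size

energy_per_block_pj = ENERGY_PER_BIT_PJ * BLOCK_BITS
improvement = PRIOR_ENERGY_PER_BIT_PJ / ENERGY_PER_BIT_PJ

print(f"energy per 32-bit block: {energy_per_block_pj:.1f} pJ")
print(f"improvement over prior work: {improvement:.2f}x")
```

This gives about 27.2 pJ per encrypted 32‑bit block and an improvement ratio of about 2.9, given that the prior‑work figure is itself approximate.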
The authors argue that the combination of latch‑based pipelining and multi‑bit sequential elements constitutes a generally applicable methodology for any lightweight cipher, not just Simon. They outline future work that includes scaling the architecture to multi‑core secure processors, integrating dynamic voltage and frequency scaling (DVFS) for adaptive power management, and exploring side‑channel resistance at the physical level.
In summary, this work delivers a rigorously optimized cryptographic engine that achieves sub‑picojoule per bit energy and sub‑1 Kµm² area in a 40 nm CMOS technology. By addressing algorithmic, circuit, and layout dimensions in concert, the authors provide a compelling solution for embedding strong security into the most power‑ and area‑constrained IoT devices.
📜 Original Paper Content