Design and Implementation of a Secure RISC-V Microprocessor
Secret keys can be extracted from the power consumption or electromagnetic emanations of unprotected devices. Traditional countermeasures have a limited scope of protection and impose several restrictions on how sensitive data must be manipulated. We demonstrate a bit-serial RISC-V microprocessor implementation with no plain-text data. All values are protected using Boolean masking. Software can run with little to no countermeasures, reducing code size and performance overheads. Unlike previous literature, our methodology is fully automated and can be applied to designs of arbitrary size or complexity. We also provide details on other key components, such as the clock randomizer, memory protection, and random number generator (RNG). The microprocessor was implemented in 65-nm CMOS technology. Its implementation was evaluated using NIST tests and side-channel attacks. Random numbers generated with our RNG pass all NIST tests. The side-channel analysis on the baseline implementation extracted the Advanced Encryption Standard (AES) key using only 375 traces, while our secure microprocessor was able to withstand attacks using 20M traces.
Research Summary
The paper presents a comprehensive methodology for building a side‑channel‑resistant RISC‑V microprocessor that requires virtually no changes to software or RTL code. The authors start from a publicly available bit‑serial RISC‑V core, chosen for its minimal area, and automatically transform it into three variants: (1) a baseline processor with no counter‑measures (NCM‑uP), (2) a processor protected with Boolean masking (BM‑uP), and (3) a processor that combines Boolean masking with differential domino logic (DDL) (BM‑DDL‑uP). The transformation is performed by scripts that replace selected static CMOS cells with masked or dynamic cells, preserving the original synthesis, placement, routing, and static timing flow.
Boolean Masking is applied at the logic level. Each secret value is split into two shares whose XOR equals the secret, so that each share on its own is statistically independent of the original value. Linear operations such as XOR are computed share by share, while non‑linear operations (AND, OR) use specially derived Boolean expressions that never recombine the shares. Because the circuit operates on shares rather than on the secret itself, an attacker must capture and correctly combine leakage from both shares to recover the secret, which makes power‑analysis attacks considerably harder.
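As an illustration of two‑share Boolean masking, the following Python sketch models single‑bit masked gates. It is not the paper's gate set: the AND gate uses the well‑known ISW construction with one fresh random bit as an assumed stand‑in for the authors' derived expressions, and all function names are hypothetical.

```python
import secrets
from typing import Tuple

Share = Tuple[int, int]  # (s0, s1) with secret = s0 XOR s1


def mask(x: int) -> Share:
    """Split a 1-bit secret into two Boolean shares using a fresh random bit."""
    r = secrets.randbits(1)
    return (x ^ r, r)


def unmask(s: Share) -> int:
    """Recombine the shares (only done here to check results)."""
    return s[0] ^ s[1]


def masked_xor(a: Share, b: Share) -> Share:
    """XOR is linear, so it is computed share by share with no new randomness."""
    return (a[0] ^ b[0], a[1] ^ b[1])


def masked_not(a: Share) -> Share:
    """Inverting one share inverts the masked value."""
    return (a[0] ^ 1, a[1])


def masked_and(a: Share, b: Share) -> Share:
    """ISW-style two-share AND (assumed stand-in, not the paper's expression).

    The fresh bit r re-masks the cross terms; the bracketing keeps r in every
    intermediate sum so no intermediate equals an unmasked product.
    """
    r = secrets.randbits(1)
    c0 = (a[0] & b[0]) ^ r
    c1 = ((r ^ (a[0] & b[1])) ^ (a[1] & b[0])) ^ (a[1] & b[1])
    return (c0, c1)


def masked_or(a: Share, b: Share) -> Share:
    """OR via De Morgan: a OR b = NOT(NOT a AND NOT b)."""
    return masked_not(masked_and(masked_not(a), masked_not(b)))
```

For example, `unmask(masked_and(mask(1), mask(1)))` returns 1 while neither share of the result, taken alone, reveals that value. A software model like this only checks functional correctness; it says nothing about glitches or other physical leakage in a real circuit.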
Differential Domino Logic (DDL) is used to implement the masked gates. DDL is a precharge/evaluation dynamic logic style where both outputs are forced to the same value during precharge, and only one toggles during evaluation. This reduces data‑dependent power variations and suppresses glitches that could otherwise leak information from the masked shares.
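The data independence of a precharged differential gate can be seen in a toy transition count. The sketch below is a behavioral model only, assuming both rails precharge to 0 (the summary does not state the polarity); the function names are hypothetical and no electrical effects are modeled.

```python
from typing import Sequence


def static_cmos_toggles(values: Sequence[int]) -> int:
    """Output transitions of a single-rail static gate.

    The count equals the number of changes between successive output values,
    so it depends on the data being processed.
    """
    return sum(a != b for a, b in zip(values, values[1:]))


def dual_rail_toggles(values: Sequence[int]) -> int:
    """Transitions on the (q, q_bar) rails of a precharged differential gate.

    Assumption: both rails are precharged to 0, then exactly one rail rises
    during evaluation, and it falls again at the next precharge.
    """
    toggles = 0
    q, q_bar = 0, 0  # precharged state
    for v in values:
        # Evaluation phase: one rail rises depending on the logic value.
        new_q, new_q_bar = (1, 0) if v else (0, 1)
        toggles += (q != new_q) + (q_bar != new_q_bar)
        q, q_bar = new_q, new_q_bar
        # Precharge phase: both rails return to 0.
        toggles += (q != 0) + (q_bar != 0)
        q, q_bar = 0, 0
    return toggles
```

In this model `dual_rail_toggles` always returns twice the number of cycles, whatever the data, whereas `static_cmos_toggles` varies with the sequence. That constant switching activity is the property the paragraph above relies on.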
Clock Randomization is introduced to thwart trace alignment, a prerequisite for effective correlation power analysis (CPA). An 8‑bit linear‑feedback shift register (LFSR) generates a pseudo‑random pattern that decides whether to skip a clock edge. The skip ratio can be set to approximately 25 %, 50 %, or 75 %, implemented as the AND, XOR, or OR of the two LSBs of the LFSR, respectively. The LFSR is periodically perturbed by bits from the on‑chip RNG, breaking the deterministic 255‑cycle repetition and making the clock edge pattern effectively non‑repeating. This random delay insertion decorrelates the power traces, forcing an attacker to collect orders of magnitude more measurements.
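The skip ratios follow directly from the gate applied to the two LSBs, which a short simulation makes concrete. This is a sketch under stated assumptions: the feedback polynomial x^8 + x^4 + x^3 + x^2 + 1 is a standard primitive polynomial chosen here for illustration (the summary does not give the taps), the periodic RNG perturbation is not modeled, and all names are hypothetical.

```python
from typing import Callable, Dict, List


def lfsr8_step(state: int) -> int:
    """Advance an 8-bit Fibonacci LFSR by one cycle.

    Assumed recurrence: new_bit = b0 ^ b2 ^ b3 ^ b4, shifted in at the MSB.
    """
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 4)) & 1
    return (state >> 1) | (bit << 7)


# Gate applied to the two LSBs selects the approximate skip ratio.
SKIP_GATES: Dict[str, Callable[[int, int], int]] = {
    "25%": lambda b0, b1: b0 & b1,
    "50%": lambda b0, b1: b0 ^ b1,
    "75%": lambda b0, b1: b0 | b1,
}


def skip_pattern(mode: str, seed: int = 0x01, cycles: int = 255) -> List[int]:
    """Return a list of 0/1 flags, where 1 means 'skip this clock edge'."""
    state = seed
    flags = []
    for _ in range(cycles):
        flags.append(SKIP_GATES[mode](state & 1, (state >> 1) & 1))
        state = lfsr8_step(state)
    return flags


def lfsr_period(seed: int = 0x01) -> int:
    """Count the cycles until the LFSR returns to its seed state."""
    state = lfsr8_step(seed)
    n = 1
    while state != seed:
        state = lfsr8_step(state)
        n += 1
    return n
```

Over one full 255‑cycle period the three gates skip 64, 128, and 192 edges, which is why the ratios are only approximately 25 %, 50 %, and 75 %. Without external perturbation the pattern repeats exactly, which is the weakness the RNG injection described above is meant to remove.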
Random Number Generator (RNG) supplies the high‑throughput randomness required by both masking (fresh masks each cycle) and clock randomization. The entropy source is a 22‑inverter ring oscillator that exploits thermal noise‑induced phase jitter. Raw jitter bits are sampled by a clock derived from the system clock, filtered through a metastability filter, and then post‑processed by a hybrid of a 43‑bit XNOR LFSR and a 37‑cell linear hybrid cellular automaton shift register (CASR). The post‑processor removes bias and expands entropy, achieving a throughput of over 1 Gbps of random bits. The RNG passes all NIST SP 800‑22 statistical tests (frequency, block, runs, entropy, etc.), confirming its suitability for cryptographic masking.
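To give a sense of what the statistical tests check, here is a minimal sketch of the frequency (monobit) test from NIST SP 800‑22, the simplest test in the suite. It is not the authors' evaluation setup; the function name is hypothetical, and the 0.01 threshold is the significance level conventionally used with the suite.

```python
import math
from typing import Sequence


def monobit_p_value(bits: Sequence[int]) -> float:
    """Frequency (monobit) test: are ones and zeros roughly balanced?

    Each bit is mapped to +1 or -1 and summed; for a random sequence the
    normalized sum is approximately standard normal.
    """
    n = len(bits)
    s_n = sum(1 if b else -1 for b in bits)
    s_obs = abs(s_n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


def passes_monobit(bits: Sequence[int], alpha: float = 0.01) -> bool:
    """A sequence passes when its p-value is at least the significance level."""
    return monobit_p_value(bits) >= alpha
```

A heavily biased stream such as one hundred consecutive ones fails this test, while a balanced stream passes it. Passing the monobit test alone says little, since a strictly alternating sequence also passes; the full suite applies many complementary tests for that reason.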
Implementation Results: All three variants were fabricated in a 65 nm CMOS process. Area overheads are modest: BM‑uP occupies ~1.8× the baseline area, while BM‑DDL‑uP occupies ~2.4×. Power consumption increases by roughly 12 % (BM‑uP) and 18 % (BM‑DDL‑uP). Performance remains practical; at a 65 MHz clock the processor encrypts a 128‑bit AES block in ~20 ms, despite the bit‑serial datapath and added security logic.
Security Evaluation: The authors collected more than 40 million power traces and applied dynamic time warping (DTW) for trace alignment before CPA. The baseline processor leaked enough information to recover the AES key with only 375 traces. The BM‑uP required about 3 million traces to reach a statistically significant correlation, while the BM‑DDL‑uP showed no key recovery with up to 20 million traces. Relative to the baseline, the combination of Boolean masking, DDL, and clock randomization thus raises the data complexity of a successful side‑channel attack by more than four orders of magnitude (from 375 traces to beyond 20 million).
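The principle behind CPA can be sketched on simulated data. The toy model below is not the authors' attack setup: it assumes a single perfectly aligned sample per trace, a Hamming‑weight leakage model with Gaussian noise, and a 4‑bit substitution box (the PRESENT cipher's S‑box) in place of AES to keep the example short. All function names and parameters are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple

# 4-bit S-box of the PRESENT cipher, used here as a small stand-in for AES.
SBOX4 = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
         0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def hamming_weight(x: int) -> int:
    return bin(x).count("1")


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 for constant inputs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def simulate_traces(key: int, n: int, noise_sigma: float,
                    rng: random.Random) -> Tuple[List[int], List[float]]:
    """Simulate unprotected leakage: HW of the S-box output plus noise."""
    plaintexts = [rng.randrange(16) for _ in range(n)]
    leakage = [hamming_weight(SBOX4[p ^ key]) + rng.gauss(0.0, noise_sigma)
               for p in plaintexts]
    return plaintexts, leakage


def cpa_recover_key(plaintexts: List[int],
                    leakage: List[float]) -> Tuple[int, Dict[int, float]]:
    """Correlate the leakage with the model for every key guess.

    The guess with the highest (signed) correlation is returned; practical
    attacks often rank by absolute correlation instead.
    """
    scores = {}
    for guess in range(16):
        model = [float(hamming_weight(SBOX4[p ^ guess])) for p in plaintexts]
        scores[guess] = pearson(model, leakage)
    best = max(scores, key=lambda g: scores[g])
    return best, scores
```

A typical call is `cpa_recover_key(*simulate_traces(0xB, 2000, 1.0, random.Random(1234)))`. The attack works here because the leakage depends directly on an unmasked intermediate and every trace is aligned. Masking removes the first condition and clock randomization the second, which is consistent with the much larger trace counts reported for the protected variants.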
Contributions and Impact:
- An automated, CAD‑integrated flow that injects Boolean masking and dynamic logic without manual RTL changes.
- Demonstration that a small, bit‑serial RISC‑V core can be hardened against power analysis while retaining acceptable area, power, and performance.
- A high‑throughput, NIST‑validated RNG that supplies fresh masks each cycle.
- A practical clock‑randomization scheme that can be tuned per software routine, providing a flexible trade‑off between security and timing predictability.
Overall, the work shows that comprehensive side‑channel protection can be achieved at the microarchitectural level, enabling secure embedded and IoT devices without burdening software developers with complex cryptographic libraries or extensive code modifications.