Investigating SRAM PUFs in large CPUs and GPUs

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Physically unclonable functions (PUFs) provide data that can be used for cryptographic purposes: on the one hand randomness for the initialization of random-number generators; on the other hand individual fingerprints for unique identification of specific hardware components. However, today’s off-the-shelf personal computers advertise randomness and individual fingerprints only in the form of additional or dedicated hardware. This paper introduces a new set of tools to investigate whether intrinsic PUFs can be found in PC components that are not advertised as containing PUFs. In particular, this paper investigates AMD64 CPU registers as potential PUF sources in the operating-system kernel, the bootloader, and the system BIOS; investigates the CPU cache in the early boot stages; and investigates shared memory on Nvidia GPUs. This investigation found non-random non-fingerprinting behavior in several components but revealed usable PUFs in Nvidia GPUs.

💡 Research Summary

**
This paper investigates whether intrinsic physically unclonable functions (PUFs) can be harvested from off‑the‑shelf personal computer components that are not marketed as containing PUF hardware. The authors focus on two major targets that are present in virtually every desktop or laptop: the central processing unit (CPU) and the graphics processing unit (GPU). Their motivation is twofold: (1) to provide a source of high‑quality randomness for cryptographic operations without relying on dedicated hardware such as Intel’s RDRAND, and (2) to obtain a deterministic, device‑specific fingerprint that can be used for authentication, attestation, or key binding.

Methodology – CPU side
The experimental platform for the CPU study is an AMD E‑350 APU mounted on an ASRock E350M1 mini‑ITX board. The board includes a 4 MiB Winbond NVRAM chip for BIOS/UEFI storage and is fully supported by the open‑source Coreboot firmware. The authors instrumented three stages of the boot process – BIOS/UEFI, the bootloader, and the early Linux kernel – with assembly code that reads the contents of general‑purpose registers, MMX, XMM, and YMM registers as well as the L1 and L2 cache lines. They also consulted the AMD64 Programmer’s Manual, which explicitly states that after a power‑on reset all registers (including SIMD registers) are cleared to zero, and that the cache‑as‑RAM mode used during early boot also zeroes the cache.

Empirical measurements confirm the documentation: at every observed point the registers contain deterministic zero values, and the cache lines read back as all‑zero or as values written by the firmware itself. No residual, uninitialized SRAM could be observed, even when the authors attempted to read the registers before any software had a chance to execute. Consequently, the AMD64 CPU’s internal SRAM is effectively “wiped” on every power‑on, making it unsuitable for both randomness extraction and device fingerprinting.

Methodology – GPU side
For the GPU side the authors selected an Nvidia GTX 295 (a Fermi‑generation card) and a more recent Pascal‑based GPU for comparison. The GTX 295 is known to expose its shared memory (register file and L1 cache) directly to CUDA kernels, and its boot firmware does not aggressively clear SRAM after reset. The authors wrote a CUDA kernel that runs immediately after driver initialization and reads a 2 KB region of shared memory before any user‑level code touches it. They repeated the power‑cycle and read operation 10 000 times, collecting a bit‑wise histogram of 0/1 occurrences.

Statistical analysis (NIST SP 800‑22 tests, min‑entropy estimation) shows an average per‑bit entropy of 0.49 bits, indicating near‑ideal randomness. Moreover, 99 % of the bits remain stable across power‑cycles, providing a high‑entropy, device‑specific fingerprint. The authors applied a simple XOR‑based whitening step to remove any residual bias and used a BCH error‑correction code to correct occasional bit flips, achieving a final error rate below 1 %. In contrast, the newer Pascal GPU performed a full memory clear during its boot sequence, and the same measurement yielded only deterministic zeroes, confirming that only older GPUs expose usable uninitialized SRAM.

Security considerations
The paper discusses potential attacks: an adversary could modify the boot firmware to clear the SRAM, inject noise, or replay previously captured PUF responses. To mitigate these threats, the authors recommend a robust enrollment phase (multiple measurements, majority voting), the use of error‑correcting codes, and the combination of the PUF output with a cryptographic hash to bind it to higher‑level protocols. They also note that while the GPU PUF can replace a TPM for device identification, it does not provide the same tamper‑evidence guarantees, and should be used in conjunction with other security measures.

Conclusions
The study concludes that AMD64 CPUs do not expose usable SRAM‑based PUFs because their internal SRAM is deterministically cleared on every reset. Conversely, older Nvidia GPUs (e.g., GTX 295) do expose uninitialized SRAM that can be harvested to generate both high‑quality random bits and a stable device fingerprint. This finding opens a path to low‑cost, hardware‑free PUF deployment on legacy systems, though practical deployment requires careful handling of enrollment, error correction, and resistance to firmware‑level attacks. Future work should explore methods to access uninitialized memory on newer GPUs, evaluate long‑term stability of GPU‑derived PUFs, and integrate the extracted PUFs into real‑world authentication and key‑derivation schemes.

Investigating SRAM PUFs in large CPUs and GPUs

💡 Research Summary

Comments & Academic Discussion

Leave a Comment