Accelerating Post-Quantum Cryptography via LLM-Driven Hardware-Software Co-Design
Post-quantum cryptography (PQC) is crucial for securing data against emerging quantum threats. However, its algorithms are computationally complex and difficult to implement efficiently on hardware. In this paper, we explore the potential of Large Language Models (LLMs) to accelerate the hardware-software co-design process for PQC, with a focus on the FALCON digital signature scheme. We present a novel framework that leverages LLMs to analyze PQC algorithms, identify performance-critical components, and generate candidate hardware descriptions for FPGA implementation. We provide the first quantitative comparison between LLM-driven synthesis and conventional HLS-based approaches for low-level compute-intensive kernels in FALCON, showing that human-in-the-loop LLM-generated accelerators can achieve up to 2.6x speedup in kernel execution time with shorter critical paths, while highlighting trade-offs in resource utilization and power consumption. Our results suggest that LLMs can reduce design effort and development time by automating FPGA accelerator design iterations for PQC algorithms, offering a promising new direction for rapid and adaptive PQC accelerator design on FPGAs.
💡 Research Summary
The paper addresses the pressing need to implement post‑quantum cryptography (PQC) efficiently on reconfigurable hardware, focusing on the FALCON digital signature scheme, which offers compact signatures and fast verification but suffers from heavy polynomial arithmetic, Number‑Theoretic Transforms (NTT), FFTs, and discrete Gaussian sampling. Traditional FPGA design routes—hand‑crafted RTL and high‑level synthesis (HLS)—each have drawbacks: RTL yields near‑optimal performance but demands extensive expertise and long development cycles, while HLS accelerates development but struggles with constant‑time execution, modular arithmetic mapping, memory bottlenecks, and side‑channel resistance required by cryptographic standards.
To bridge this gap, the authors propose a novel framework that integrates large language models (LLMs) into both hardware-software partitioning and hardware generation. The workflow begins with a reference C/C++ implementation of a PQC algorithm. In the partitioning stage, either profiling data (e.g., from GNU gprof) or the raw source code is fed to an LLM, which automatically identifies computational hotspots. The LLM recognizes O(N²) multiplication routines such as "zint_add_scaled_mul" and "zint_add_mul" as prime candidates for acceleration, even without explicit profiling, by leveraging its pre-trained knowledge of algorithmic patterns.
Once candidate kernels are selected, a second LLM-driven step generates synthesizable HDL (Verilog or VHDL), associated testbenches, TCL constraint scripts, and even timing-aware directives. These artifacts are then passed to a commercial FPGA toolchain (e.g., Xilinx Vivado) for synthesis, placement, and routing. The authors evaluate the approach on two FALCON parameter sets (FALCON-512 and FALCON-1024). Compared with conventional HLS-generated accelerators, the LLM-produced designs achieve up to 2.6× reduction in kernel execution time and a ~20 % shorter critical path, enabling higher clock frequencies. Resource usage (LUTs/FFs) rises by 15-25 % and power consumption by roughly 10 %, reflecting less-optimal memory interfacing and pipeline granularity in the automatically generated code.
Crucially, the overall design cycle shrinks from weeks of manual RTL work to a matter of days, dramatically lowering the expertise barrier for PQC hardware development. The study also highlights practical considerations such as prompt engineering, token‑limit handling, and the trade‑off between abstract (high‑level) versus full‑code prompts for accurate hotspot detection.
In summary, the work demonstrates that LLMs can serve as effective assistants in the co‑design of PQC accelerators, automatically extracting performance‑critical kernels, generating hardware descriptions, and integrating them into a standard FPGA flow. While resource and power overheads remain, the speed‑up and reduction in human effort suggest a promising direction for rapid, adaptable, and scalable PQC hardware deployment. Future work is suggested to extend the methodology to other NIST‑selected schemes (e.g., Kyber, Dilithium) and to incorporate automated side‑channel resistance verification within the LLM‑driven pipeline.