HEAX: An Architecture for Computing on Encrypted Data

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original paper viewer below or the original arXiv source.

With the rapid growth of cloud computing, concerns surrounding data privacy, security, and confidentiality have also increased significantly. Not only are cloud providers susceptible to internal and external hacks, but in some scenarios data owners cannot outsource computation at all due to privacy laws such as GDPR, HIPAA, or CCPA. Fully Homomorphic Encryption (FHE) is a groundbreaking invention in cryptography that, unlike traditional cryptosystems, enables computation on encrypted data without ever decrypting it. However, the most critical obstacle to deploying FHE at large scale is its enormous computational overhead. In this paper, we present HEAX, a novel hardware architecture for FHE that achieves unprecedented performance improvements. HEAX leverages multiple levels of parallelism, ranging from the ciphertext level down to fine-grained modular arithmetic. Our first contribution is a new, highly parallelizable architecture for the number-theoretic transform (NTT), which is of independent interest because the NTT is used throughout lattice-based cryptography. Building on the NTT engine, we design a novel architecture for computation on homomorphically encrypted data. We also introduce several techniques that enable an end-to-end, fully pipelined design and reduce on-chip memory consumption. Our implementation on reconfigurable hardware demonstrates a 164-268x performance improvement across a wide range of FHE parameters.


💡 Research Summary

The paper addresses one of the most pressing challenges in the deployment of Fully Homomorphic Encryption (FHE) at scale: the prohibitive computational overhead that makes real‑time, privacy‑preserving cloud services impractical. To overcome this barrier, the authors introduce HEAX, a novel hardware architecture specifically engineered for FHE workloads. HEAX differentiates itself through three intertwined layers of parallelism, a highly optimized Number‑Theoretic Transform (NTT) engine, and a suite of memory‑efficiency techniques that together enable an end‑to‑end fully pipelined design.

Core Contributions

  1. Highly Parallel NTT Engine – The authors design a 2‑dimensional array of processing elements (PEs), each equipped with a fast carry‑save modular multiplier. By scheduling data both row‑wise and column‑wise, the engine can perform forward NTT, pointwise multiplication, and inverse NTT simultaneously in a deeply pipelined fashion. This architecture reduces the per‑transform cycle count by an order of magnitude compared with prior FPGA implementations that rely on sequential or modestly parallel NTT cores.
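To make the data flow concrete, here is a minimal software sketch of the forward-NTT → pointwise-multiply → inverse-NTT pipeline that the engine implements in hardware. The parameters (n = 8, q = 17, ω = 9) are toy values chosen for readability, not the paper's; a real FHE deployment uses much larger NTT-friendly primes, and the hardware replaces the naive O(n²) transform below with an O(n log n) butterfly network of PEs.

```python
def ntt(a, omega, q):
    """Naive O(n^2) number-theoretic transform over Z_q.
    Hardware uses a butterfly network of processing elements instead."""
    n = len(a)
    return [sum(a[j] * pow(omega, i * j, q) for j in range(n)) % q
            for i in range(n)]

def intt(a, omega, q):
    """Inverse NTT: transform with omega^-1, then scale by n^-1 mod q."""
    n = len(a)
    n_inv = pow(n, q - 2, q)                  # q is prime: Fermat inversion
    b = ntt(a, pow(omega, q - 2, q), q)
    return [(x * n_inv) % q for x in b]

def cyclic_mul(a, b, omega, q):
    """Polynomial multiplication mod (x^n - 1, q) via the NTT pipeline:
    forward NTT, pointwise multiplication, inverse NTT."""
    fa, fb = ntt(a, omega, q), ntt(b, omega, q)
    return intt([(x * y) % q for x, y in zip(fa, fb)], omega, q)

# Toy example: n = 8, q = 17, omega = 9 (a primitive 8th root of unity mod 17).
a = [1, 2, 3, 4, 0, 0, 0, 0]
b = [5, 6, 7, 8, 0, 0, 0, 0]
print(cyclic_mul(a, b, 9, 17))   # -> [5, 16, 0, 9, 10, 1, 15, 0]
```

Because the operands are zero-padded to length 8, the cyclic product above equals the ordinary product of the two degree-3 polynomials reduced mod 17, which is exactly the trick FHE libraries use to turn NTTs into fast polynomial multiplication.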

  2. Full‑Pipeline FHE Data Path – Building on the NTT engine, HEAX introduces ciphertext‑level parallelism: multiple encrypted operands are streamed through the pipeline concurrently, allowing independent homomorphic additions, multiplications, and relinearizations to overlap. Within each modular arithmetic unit, bit‑level pipelining enables the three fundamental operations (add, sub, mul) to be performed in overlapping stages, further cutting latency.
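The modular multiplier at the heart of each arithmetic unit can be illustrated with Barrett reduction, a standard division-free technique that maps naturally onto pipeline stages. This is a hedged sketch of the general idea, not the paper's carry-save design: the constant m = ⌊2^k / q⌋ is precomputed once per modulus, so the data path needs only multiplies, shifts, and one conditional subtraction.

```python
def barrett_setup(q):
    """Precompute the Barrett constant for modulus q (done once)."""
    k = 2 * q.bit_length()
    return k, (1 << k) // q

def mod_mul(a, b, q, k, m):
    """Modular multiplication a*b mod q without a hardware divider.
    Each step (multiply, shift, multiply-subtract, correct) can map
    to its own pipeline stage."""
    x = a * b                  # x < q^2 < 2^k
    t = (x * m) >> k           # t ~= floor(x / q), off by at most 1
    r = x - t * q              # congruent to x mod q, and r < 2q
    return r - q if r >= q else r

q = 7681                       # a small NTT-friendly prime (toy choice)
k, m = barrett_setup(q)
print(mod_mul(1234, 5678, q, k, m) == (1234 * 5678) % q)   # -> True
```

With k = 2·bitlen(q), the estimate t undershoots ⌊x/q⌋ by at most one, which is why a single conditional subtraction suffices; that bounded correction is what makes the operation friendly to fixed-latency pipelining.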

  3. On‑Chip Memory Optimisation – FHE algorithms require large polynomial vectors and key‑schedule tables, which traditionally force frequent off‑chip DRAM accesses. HEAX mitigates this by employing a circular buffer scheme that streams blocks of ciphertext and key data into a small, high‑bandwidth block‑RAM pool. Data are reused on‑the‑fly, and a dynamic workload‑balancing controller redistributes resources when the mix of homomorphic operations changes, keeping the memory bandwidth utilisation near its peak.
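The streaming scheme can be sketched in software as classic double buffering: while the compute engine works on the current block, the next block is prefetched into the alternate buffer. The names and block size below are illustrative, not from the paper, and in hardware the prefetch and compute steps overlap in the same clock cycles rather than running sequentially as in this Python model.

```python
def blocks(coeffs, block_size):
    """Split a long polynomial into on-chip-sized blocks (illustrative)."""
    for i in range(0, len(coeffs), block_size):
        yield coeffs[i:i + block_size]

def double_buffered(coeffs, block_size, compute):
    """Process a stream of blocks with two alternating buffers.
    In hardware, the prefetch of block k+1 overlaps the compute of
    block k; this sequential model only shows the schedule."""
    out = []
    src = blocks(coeffs, block_size)
    buf = [next(src, None), None]            # buf[0] holds the first block
    k = 0
    while buf[k % 2] is not None:
        buf[(k + 1) % 2] = next(src, None)   # "prefetch" the next block
        out.extend(compute(buf[k % 2]))      # "compute" the current block
        k += 1
    return out

# Example: double the coefficients of a 10-element polynomial, 4 at a time.
print(double_buffered(list(range(10)), 4, lambda blk: [2 * c for c in blk]))
# -> [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

The point of the two-buffer rotation is that on-chip storage stays at two blocks regardless of polynomial length, which is exactly the property that lets a small block-RAM pool serve very large ciphertexts.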

Implementation and Evaluation
The architecture was prototyped on reconfigurable hardware (FPGA). Benchmarks target the CKKS scheme across a range of parameter sets (e.g., 128-bit security with 8192-point NTTs). Compared with optimized software implementations, HEAX achieves a 164× to 268× speed-up across this range, and the deeply pipelined data path sustains high utilisation even on the most demanding homomorphic operations, such as relinearization.

Scalability and Future Directions
The authors discuss a path toward an ASIC implementation, noting that the modular PE array and the carry‑save multipliers lend themselves to aggressive clock‑gating and voltage scaling. Area and power estimates suggest that a custom silicon version could improve the performance‑per‑watt metric by a further order of magnitude, making HEAX a viable candidate for large‑scale data‑center deployments or for edge devices that must comply with privacy regulations such as GDPR, HIPAA, or CCPA.

Impact
By delivering a hardware platform that dramatically narrows the performance gap between plaintext and homomorphically encrypted computation, HEAX paves the way for practical, privacy‑preserving services in domains where data confidentiality is non‑negotiable—healthcare analytics, financial modeling, and secure multi‑party computation, to name a few. The paper’s contributions are twofold: a reusable, high‑throughput NTT engine of interest to the broader lattice‑cryptography community, and a complete system‑level architecture that demonstrates how FHE can move from theoretical possibility to engineering reality.

