AES Encryption and Decryption Using Direct3D 10 API

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Current video cards (GPUs, Graphics Processing Units) are highly programmable, have become much more powerful than CPUs, and are very affordable. In this paper, we present an implementation of the AES algorithm on Direct3D 10 certified GPUs. Direct3D 10 is the first version of the graphics API that allows integer operations, turning traditional GPUs (which work only with floating-point numbers) into general-purpose GPUs that can be used for a large number of algorithms, including encryption. We present the performance of the symmetric-key encryption algorithm AES on a mid-range GPU and on a mid-range quad-core CPU. On the test system, the developed solution is almost 3 times faster on the GPU than on a single CPU core, showing that the GPU can perform as an efficient cryptographic accelerator.


💡 Research Summary

The paper investigates the feasibility of using a Direct3D 10‑based graphics pipeline as a general‑purpose compute platform for the Advanced Encryption Standard (AES). Historically, GPUs were limited to floating‑point operations, which made integer‑centric algorithms such as AES difficult to implement without specialized frameworks like CUDA or OpenCL. Direct3D 10 introduced native 32‑bit integer types and bitwise operators in HLSL, thereby opening the door for cryptographic kernels to run on mainstream Windows graphics hardware.

The authors design a complete AES encryption and decryption implementation that runs entirely on the GPU. Input plaintext is packed into a 2-D texture, while the round keys, pre-expanded on the CPU, are stored in a separate texture. A shader processes each 128-bit block through the four AES round functions: SubBytes (an S-Box table lookup), ShiftRows (a cyclic shift of each state row), MixColumns (a matrix multiplication over GF(2^8)), and AddRoundKey (XOR with the round key). All operations are expressed with integer arithmetic and bitwise shifts, which Direct3D 10 supports. Because compute shaders and unordered access views (UAVs) were only introduced with Direct3D 11, a Direct3D 10 implementation has to run this work in the graphics pipeline, for example as a pixel shader that reads its inputs from textures and writes the result to a render target. Data is aligned to maximize cache hits, and work is submitted in batches to amortize the overhead of shader compilation and data transfer.
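The round structure described above can be sketched on the CPU as a reference model. This is a minimal Python sketch, not the authors' shader code: the function names (`xtime`, `gmul`, `build_sbox`, `expand_key`, `encrypt_block`) are illustrative, the S-Box is derived at start-up rather than stored in a texture, and only AES-128 encryption is shown. It uses nothing beyond integer shifts, masks, XORs, and table lookups, which is the operation set the summary says Direct3D 10 makes available.

```python
def xtime(a):
    """Multiply a byte by 2 in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return (a ^ 0x11B) & 0xFF if a & 0x100 else a

def gmul(a, b):
    """Multiply two bytes in GF(2^8) using only shifts and XORs."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = xtime(a)
        b >>= 1
    return r

def build_sbox():
    """Derive the AES S-Box: field inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gmul(x, y) == 1), 0)
        s = inv
        for shift in range(1, 5):  # XOR in the four left-rotations of inv
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

SBOX = build_sbox()

def expand_key(key):
    """AES-128 key schedule (the CPU-side step): 16 bytes -> 11 round keys."""
    words = [list(key[i:i + 4]) for i in range(0, 16, 4)]
    rcon = 1
    for i in range(4, 44):
        t = list(words[i - 1])
        if i % 4 == 0:
            t = [SBOX[b] for b in t[1:] + t[:1]]  # RotWord, then SubWord
            t[0] ^= rcon
            rcon = xtime(rcon)
        words.append([a ^ b for a, b in zip(words[i - 4], t)])
    return [sum(words[r * 4:r * 4 + 4], []) for r in range(11)]

def encrypt_block(block, round_keys):
    """Encrypt one 16-byte block; the state is kept in column-major order."""
    state = [b ^ k for b, k in zip(block, round_keys[0])]  # initial AddRoundKey
    for rnd in range(1, 11):
        state = [SBOX[b] for b in state]                             # SubBytes
        state = [state[(i + 4 * (i % 4)) % 16] for i in range(16)]   # ShiftRows
        if rnd < 10:                                                 # MixColumns
            mixed = []
            for c in range(0, 16, 4):
                a = state[c:c + 4]
                mixed += [gmul(a[r], 2) ^ gmul(a[(r + 1) % 4], 3)
                          ^ a[(r + 2) % 4] ^ a[(r + 3) % 4] for r in range(4)]
            state = mixed
        state = [b ^ k for b, k in zip(state, round_keys[rnd])]      # AddRoundKey
    return bytes(state)

# Check against the FIPS-197 Appendix C.1 test vector.
key = bytes(range(16))
plaintext = bytes.fromhex("00112233445566778899aabbccddeeff")
ciphertext = encrypt_block(plaintext, expand_key(key))
assert ciphertext.hex() == "69c4e0d86a7b0430d8cdb78070b4c55a"
```

Each call to `encrypt_block` depends only on its own block and the shared round keys, which is why the blocks can be distributed across many GPU threads when no chaining mode links them.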

Performance is evaluated on a mid‑range NVIDIA GeForce GTX 560 GPU (384 cores, Direct3D 10 capable) and an Intel Core i5‑3570 quad‑core CPU (3.4 GHz, SSE4.2). Test datasets of 256 MB, 512 MB, and 1 GB are encrypted and decrypted using the standard 10‑round AES‑128 configuration. The GPU consistently outperforms a single CPU core, achieving an average speedup of 2.8–3.1× and a throughput of roughly 1.2 GB/s compared with the CPU’s 0.42 GB/s. Power consumption measurements show the GPU drawing about 150 W versus the CPU’s 65 W, resulting in a slightly better energy‑per‑byte metric for the GPU. For very small inputs (under 64 KB), the GPU’s initialization and data‑transfer latency outweigh its computational advantage, confirming that the acceleration is most beneficial for large‑scale workloads.
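The speedup and energy comparison can be checked with a few lines of arithmetic. The sketch below uses only the throughput and power figures quoted in the paragraph above; treating the wattages as average draw for the duration of the run is an assumption, and the variable names are illustrative.

```python
# Figures quoted in the summary (GB/s and watts).
gpu_gbps, cpu_gbps = 1.2, 0.42
gpu_watts, cpu_watts = 150.0, 65.0

# Speedup is the ratio of throughputs.
speedup = gpu_gbps / cpu_gbps

# Energy per gigabyte = power (J/s) divided by throughput (GB/s).
# Assumption: the quoted wattage is the average draw while encrypting.
gpu_j_per_gb = gpu_watts / gpu_gbps
cpu_j_per_gb = cpu_watts / cpu_gbps

print(f"speedup: {speedup:.2f}x")               # speedup: 2.86x
print(f"GPU energy: {gpu_j_per_gb:.1f} J/GB")   # GPU energy: 125.0 J/GB
print(f"CPU energy: {cpu_j_per_gb:.1f} J/GB")   # CPU energy: 154.8 J/GB
```

The ratio of about 2.86 falls inside the reported 2.8–3.1× range, and 125 J/GB against roughly 155 J/GB is consistent with the "slightly better" energy-per-byte claim, under the stated assumption about the power figures.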

The discussion highlights several limitations and future directions. Direct3D 10's integer support is limited to 32-bit values, so each byte-wise S-Box access requires extra shifts and masks and is less efficient than a native 8-bit operation. Performing the key schedule on the GPU could eliminate the CPU-GPU synchronization step and further improve throughput. Moreover, newer APIs such as Direct3D 11/12 and Vulkan provide compute shaders with finer-grained thread-group control, shared memory, and more sophisticated synchronization primitives, which could push performance well beyond the results reported here. The authors also suggest extending the approach to other cryptographic primitives such as RSA, ECC, and SHA-2, which are similarly integer-heavy and could benefit from GPU parallelism.
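The cost of byte-wise lookups on 32-bit values can be illustrated as follows. This is a hedged sketch in Python rather than HLSL: `pack_word` and `sub_word` are hypothetical helper names, and `PLACEHOLDER_TABLE` is a stand-in 256-entry table, not the real AES S-Box. It shows that substituting the four bytes held in one 32-bit word takes four shift-and-mask extractions and four reinsertions in addition to the four lookups.

```python
MASK32 = 0xFFFFFFFF

# Stand-in lookup table for illustration only; NOT the AES S-Box.
PLACEHOLDER_TABLE = [255 - i for i in range(256)]

def pack_word(b0, b1, b2, b3):
    """Pack four bytes into one 32-bit word, with b0 in the low byte."""
    return (b0 | (b1 << 8) | (b2 << 16) | (b3 << 24)) & MASK32

def sub_word(word, table):
    """Apply a byte-wise table lookup to each byte lane of a 32-bit word."""
    out = 0
    for shift in (0, 8, 16, 24):
        byte = (word >> shift) & 0xFF    # extract one byte lane
        out |= table[byte] << shift      # substitute and reinsert it
    return out & MASK32

word = pack_word(0x00, 0x01, 0x02, 0x03)
substituted = sub_word(word, PLACEHOLDER_TABLE)
assert substituted == pack_word(0xFF, 0xFE, 0xFD, 0xFC)
```

With native 8-bit addressing the extraction and reinsertion steps would disappear, which is the inefficiency the paragraph above refers to.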

In conclusion, the study demonstrates that a Direct3D 10‑enabled GPU can serve as an effective cryptographic accelerator, delivering roughly threefold speed improvements over a single CPU core for AES‑128 on mid‑range hardware. The work validates the concept of repurposing mainstream graphics APIs for security‑critical workloads and points to a promising research trajectory as graphics hardware and APIs continue to evolve, potentially enabling high‑performance, low‑cost encryption across cloud, mobile, and edge computing environments.

