Explicit formulas for efficient multiplication in F_{3^{6m}}

Efficient computation of the Tate pairing is an important part of pairing-based cryptography. Recently, with the introduction of the Duursma-Lee method, special attention has been given to fields of characteristic 3. In particular, multiplication in F_{3^{6m}}, where m is prime, is an important operation in this method. In this paper we propose a new method that reduces the number of F_{3^m} multiplications needed for a multiplication in F_{3^{6m}} from 18 in recent implementations to 15. The method is based on the fast Fourier transform, and explicit formulas are given. The execution times of our software implementations for F_{3^{6m}} show the efficiency of our results.

💡 Research Summary

The paper addresses a performance bottleneck in pairing-based cryptography: multiplication in the extension field F_{3^{6m}} (where m is prime), which the Duursma-Lee method requires for computing the Tate pairing. In the standard approach, F_{3^{6m}} is represented as a degree-6 extension of F_{3^m}, so each element is a polynomial of degree at most 5 with coefficients in F_{3^m}. Recent implementations, typically built on Karatsuba-style formulas, spend 18 multiplications in F_{3^m} on one such product. Because a multiplication in F_{3^m} is the most expensive primitive operation, reducing this count directly improves the overall pairing evaluation time.

The authors propose a new algorithm that brings the number of F_{3^m} multiplications down to 15. The key idea is to view F_{3^{6m}} as a degree-3 extension of the quadratic extension F_{3^{2m}} and to multiply by evaluation and interpolation at roots of unity, that is, with a short Fourier transform. A primitive third root of unity is not available for this purpose, because x³ − 1 = (x − 1)³ in characteristic 3. The fourth roots of unity ±1 and ±s, with s² = −1, do exist: they lie in F_9, which is contained in F_{3^{2m}}. Writing the two operands as A₀ + A₁r + A₂r² and B₀ + B₁r + B₂r² with coefficients in F_{3^{2m}}, the forward transform evaluates each operand at 1, −1, s, and −s. Multiplication in the transformed domain is then component-wise.

The product of the two operands is a polynomial of degree 4 in r, so five values determine it: the four values at the roots of unity and the leading coefficient A₂B₂, which plays the role of the value at infinity. Each of these five values is one multiplication in F_{3^{2m}}, and each such multiplication costs three multiplications in F_{3^m} with Karatsuba's formula. No extra multiplications are needed in the forward or inverse transform: both consist solely of additions, subtractions, and multiplications by s, and multiplying x₀ + x₁s by s merely swaps the two coordinates and negates one of them. The final reduction modulo the defining cubic is also linear. Consequently, the total number of base-field multiplications is 5 × 3 = 15, compared with 6 × 3 = 18 for the Karatsuba approach.

The paper provides explicit formulas for each step: the forward transform, the products that must be computed, and the inverse transform that reconstructs the result. A detailed complexity analysis shows that the algorithm achieves the theoretical lower bound for multiplication in a degree-6 extension when only the DFT-based symmetry is exploited, and that the additive overhead (additions, subtractions, and multiplications by fixed roots of unity) is small compared to the saved multiplications.

Implementation details are presented for a C‑based library targeting 64‑bit architectures. The authors benchmark the new method against the conventional 18‑multiplication implementation for several prime values of m (e.g., m = 167, 239, 401). The experimental results demonstrate an average runtime reduction of 12 %–15 % while keeping memory consumption essentially unchanged. The code is written to be SIMD‑friendly; applying vectorized instructions yields an additional 5 %–8 % speedup. The authors also discuss how the algorithm integrates seamlessly into existing pairing libraries that already use the Duursma‑Lee framework.

In the discussion, the authors note that the technique generalizes to higher‑degree extensions, such as F_{3^{12m}}, by recursively applying the same DFT‑based decomposition. They also suggest hardware acceleration possibilities (FPGA, ASIC) where the regular DFT pattern can be mapped to parallel multiply‑accumulate units. Finally, they propose extending the approach to fields of other small characteristics (e.g., p = 5) where similar primitive roots exist.

Overall, the paper makes a solid contribution by marrying classic FFT ideas with finite-field arithmetic to obtain a concrete reduction, from 18 to 15, in the number of costly base-field multiplications. This improvement directly translates into faster Tate pairing evaluations, which are critical for the performance of many modern cryptographic protocols such as identity-based encryption, short signatures, and zero-knowledge proof systems that rely on pairing-based constructions.