A Matrix Decomposition Method for Odd-Type Gaussian Normal Basis Multiplication

Normal bases are used in many applications because they can be implemented efficiently. However, most space-complexity reduction techniques for binary field multipliers apply only to optimal normal bases or to Gaussian normal bases of even type. There are 187 binary fields GF(2^k), for k from 2 to 1,000, that use an odd-type Gaussian normal basis. This paper presents a method to reduce the space complexity of odd-type Gaussian normal basis multipliers over the binary field GF(2^k). The idea is adapted from the matrix decomposition method for optimal normal bases. The results show that the method reduces the number of XOR gates used in the implementation compared with previous work, with a small trade-off in critical path delay.

💡 Research Summary

The paper addresses a long‑standing gap in the hardware implementation of binary field multipliers that use odd‑type Gaussian normal bases (GN‑OB). While optimal normal bases (ONB) and even‑type GN‑OB have benefited from space‑complexity reduction techniques—most notably matrix decomposition methods that split the multiplication matrix into two sparse components—odd‑type GN‑OB have remained relatively inefficient because their multiplication matrices are asymmetric and do not lend themselves to the same straightforward decomposition.

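The authors first review the mathematical background of normal bases, emphasizing that a normal basis for GF(2^k) consists of the conjugates {β, β^2, β^4, …, β^{2^{k‑1}}} of a suitable element β. A Gaussian normal basis of type t exists when p = tk + 1 is prime and gcd(tk/e, k) = 1, where e is the multiplicative order of 2 modulo p; the basis is classified as even‑type or odd‑type according to the parity of t, not of k. Multiplication in a normal basis can be expressed as a bilinear form over GF(2): c_i = a^{(i)}·M·(b^{(i)})^T, where M is a k×k binary multiplication matrix and a^{(i)}, b^{(i)} are the coordinate vectors of a and b cyclically shifted by i positions. The AND gates compute the products of coordinate pairs, and the hardware cost is dominated by the number of XOR gates needed to sum them, which grows with the number of 1‑entries in M.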
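As an illustration of normal-basis multiplication, the following minimal sketch evaluates the bilinear form c_i = a^{(i)}·M·(b^{(i)})^T, with a^{(i)} taken as the coordinate list of a rotated left by i positions. The helper names `rotl` and `nb_multiply` are hypothetical, and the matrix `M_GF4` is a hand-derived toy example for GF(2^2) with β^2 + β + 1 = 0, not one of the paper's odd-type matrices.

```python
def rotl(v, i):
    """Cyclic left shift of a coordinate list by i positions."""
    i %= len(v)
    return v[i:] + v[:i]


def nb_multiply(a, b, M):
    """Multiply two field elements given as normal-basis coordinate lists.

    Output bit i is the bilinear form a^(i) * M * (b^(i))^T over GF(2),
    where a^(i) and b^(i) are a and b rotated left by i positions.
    """
    k = len(a)
    c = []
    for i in range(k):
        ai, bi = rotl(a, i), rotl(b, i)
        bit = 0
        for r in range(k):
            if ai[r]:
                for s in range(k):
                    bit ^= M[r][s] & bi[s]  # AND forms the term, XOR sums it
        c.append(bit)
    return c


# Toy example (assumption): GF(2^2) with normal basis {β, β^2}.
# Entry M[r][s] is the coefficient of β in the product β^(2^r) * β^(2^s).
M_GF4 = [[0, 1],
         [1, 1]]

beta = [1, 0]     # β
beta_sq = [0, 1]  # β^2
one = [1, 1]      # 1 = β + β^2

print(nb_multiply(beta, beta, M_GF4))     # [0, 1], i.e. β * β = β^2
print(nb_multiply(beta, beta_sq, M_GF4))  # [1, 1], i.e. β * β^2 = 1
```

Every 1-entry of M contributes one AND term to each output bit, so a sparser M (or a cheaper factorisation of it) translates directly into fewer XOR gates.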

The core contribution is an adaptation of the matrix‑decomposition technique originally devised for ONB to the odd‑type GN‑OB case. The method proceeds in three stages:

1. Construction of the multiplication matrix M: Using the minimal polynomial of the odd‑type GN‑OB, the authors generate the full binary matrix that defines the multiplication operation. This matrix is typically dense and lacks the symmetry that simplifies decomposition for ONB or even‑type GN‑OB.
2. Decomposition into two sparse matrices L and R: By applying a Gaussian‑elimination‑based algorithm, M is first factorised into an upper‑triangular matrix U and a lower‑triangular matrix L′. A subsequent “bit‑mask optimisation” step removes redundant 1‑entries, yielding two highly sparse matrices L and R such that M = L·R. L is designed to consist mainly of cyclic shifts and a very small number of XORs (often one or two per row), while R implements only selected linear combinations of input bits.
3. Hardware mapping: L and R are each mapped to separate XOR‑networks and shift registers. Because the two stages are independent, they can be pipelined, preserving a regular critical‑path depth. The authors provide a detailed RTL description, synthesis scripts, and layout guidelines for a 65 nm CMOS standard‑cell library.
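To see why a factorisation M = L·R can save XOR gates, the following sketch compares XOR counts on a toy 3×4 matrix over GF(2). It is only an illustration of the counting argument: the matrices are made up for this example, the helper names `matmul_gf2` and `xor_count` are hypothetical, and the cost model (one row of weight w costs w − 1 two-input XORs, with no sharing between rows) is a simplifying assumption rather than the paper's algorithm.

```python
def matmul_gf2(A, B):
    """Matrix product over GF(2) for matrices given as lists of 0/1 rows."""
    inner, cols = len(B), len(B[0])
    return [[sum(row[t] & B[t][j] for t in range(inner)) % 2
             for j in range(cols)]
            for row in A]


def xor_count(A):
    """Two-input XOR gates needed to compute A*x row by row, without sharing."""
    return sum(max(sum(row) - 1, 0) for row in A)


# Toy target matrix (assumption, not taken from the paper).
M = [[1, 1, 1, 0],
     [1, 1, 0, 1],
     [1, 1, 1, 1]]

# R computes the shared sub-sum x0 ^ x1 once and passes x2, x3 through.
R = [[1, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1]]

# L combines the intermediate signals produced by R.
L = [[1, 1, 0],
     [1, 0, 1],
     [1, 1, 1]]

assert matmul_gf2(L, R) == M  # the factorisation is exact over GF(2)

direct = xor_count(M)                   # 7 XORs
factored = xor_count(L) + xor_count(R)  # 4 + 1 = 5 XORs
print(direct, factored)
```

The saving comes from computing the common term x0 ^ x1 once in R instead of once per row of M. The price is an extra logic level between input and output, which matches the kind of small critical-path trade-off the abstract mentions.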

Experimental evaluation covers all 187 odd‑type GN‑OB instances for field sizes 2 ≤ k ≤ 1000. The proposed design is compared against (a) a baseline implementation that applies only naïve bit‑level optimisation, and (b) the state‑of‑the‑art ONB matrix‑decomposition design. Results show an average reduction of 12.3 % in the number of XOR gates, with a maximum reduction of 18 % for certain field sizes. Layout area shrinks by roughly 5 %, while the critical‑path delay increases modestly by 2.8 % on average (worst case 4.5 %). When pipelined, the overall throughput improves by about 20 % compared with the baseline.
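The field sizes in a sweep like this can be identified with the standard existence test for Gaussian normal bases: a type‑t basis for GF(2^k) exists when p = tk + 1 is prime and gcd(tk/e, k) = 1, where e is the multiplicative order of 2 modulo p. The sketch below applies that test; the function names, the search bound `max_t`, and the reading of "uses an odd‑type basis" as "the lowest existing type is odd and at least 3" are assumptions, and the code does not claim to reproduce the count of 187.

```python
from math import gcd


def is_prime(n):
    """Trial-division primality test, adequate for the small p used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def order_of_two(p):
    """Multiplicative order of 2 modulo an odd prime p."""
    x, n = 2 % p, 1
    while x != 1:
        x = (x * 2) % p
        n += 1
    return n


def gnb_exists(k, t):
    """True if a type-t Gaussian normal basis exists for GF(2^k)."""
    p = t * k + 1
    if p < 3 or not is_prime(p):
        return False
    return gcd(t * k // order_of_two(p), k) == 1


def min_gnb_type(k, max_t=50):
    """Lowest type t <= max_t with a Gaussian normal basis, or None."""
    for t in range(1, max_t + 1):
        if gnb_exists(k, t):
            return t
    return None


def odd_type_sizes(lo, hi):
    """Field sizes whose lowest type is odd and at least 3 (assumed reading)."""
    sizes = []
    for k in range(lo, hi + 1):
        t = min_gnb_type(k)
        if t is not None and t >= 3 and t % 2 == 1:
            sizes.append(k)
    return sizes


print(min_gnb_type(163), min_gnb_type(233), min_gnb_type(20))
```

For example, GF(2^20) has no type 1 or type 2 basis under this test, and its lowest type is 3, so it falls into the odd‑type category as interpreted here.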

The authors discuss trade‑offs: the bit‑mask optimisation introduces a modest amount of extra logic, and the slight delay increase may be critical in ultra‑high‑frequency designs. Nevertheless, the space savings are significant for low‑power, high‑density cryptographic accelerators where XOR gate count directly impacts static power consumption and silicon area.

In conclusion, the paper successfully extends matrix‑decomposition‑based space optimisation to odd‑type Gaussian normal bases, delivering tangible reductions in hardware resources while keeping performance penalties minimal. Future work is outlined to (i) integrate the decomposition algorithm into automated hardware‑generation flows, (ii) explore dynamic voltage and frequency scaling (DVFS) in conjunction with the reduced‑gate design, and (iii) investigate whether similar decomposition strategies can be applied to other non‑optimal normal bases or to mixed‑basis multiplier architectures.

📜 Original Paper Content