A subquadratic algorithm for computing the n-th Bernoulli number

We describe a new algorithm that computes the n-th Bernoulli number in n^(4/3 + o(1)) bit operations. This improves on previous algorithms that had complexity n^(2 + o(1)).


💡 Research Summary

The paper presents a novel algorithm for computing the n‑th Bernoulli number Bₙ with bit complexity n^(4/3 + o(1)), a substantial improvement over the classic quadratic‑time methods. The authors begin by recalling that the Bernoulli numbers appear as the coefficients Bₙ/n! in the power series of the generating function F(x)=x/(eˣ−1), and that they are related to values of the Riemann zeta function at non‑positive integers via ζ(1−n)=−Bₙ/n. Traditional approaches, such as the Akiyama‑Tanigawa recurrence, direct recursive summation, or naive series expansion, require n^(2 + o(1)) bit operations, because each coefficient depends on all previous ones and the arithmetic on large numbers dominates the cost.
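As a concrete picture of the quadratic baseline, the Akiyama–Tanigawa recurrence can be sketched in a few lines. This is an illustration by this summary, not code from the paper; note that this recurrence produces the B₁ = +1/2 sign convention:

```python
from fractions import Fraction

def bernoulli(n):
    """B_0..B_n via the Akiyama-Tanigawa transform (B_1 = +1/2 convention).

    Each new value refines a whole row of fractions, so the total work is
    Theta(n^2) arithmetic operations on ever-larger rationals -- the
    quadratic behavior the paper improves upon.
    """
    a = [Fraction(0)] * (n + 1)
    out = []
    for m in range(n + 1):
        a[m] = Fraction(1, m + 1)
        for j in range(m, 0, -1):
            a[j - 1] = j * (a[j - 1] - a[j])
        out.append(a[0])  # a[0] now holds B_m
    return out
```

For example, `bernoulli(4)` ends in B₄ = −1/30; the exact rational arithmetic is what makes the cost per step grow with m.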

The core of the new method is a combination of fast polynomial arithmetic (FFT/NTT) and Newton iteration for power‑series inversion. First, the algorithm computes the first n+1 coefficients of the reciprocal series H(x) = 1/F(x) = (eˣ−1)/x as a truncated power series; the i‑th coefficient is simply 1/(i+1)!, obtained from the exponential series and reduced modulo a set of carefully chosen primes (typically 64‑bit primes suitable for the Number‑Theoretic Transform). The coefficients are stored as vectors of modular residues. Using the NTT, the algorithm can multiply two degree‑m polynomials in O(m log m) ring operations while keeping all intermediate results within the modular ring, thus avoiding costly multi‑precision floating‑point arithmetic.

Next, the algorithm needs the inverse series G(x) = 1/H(x) = F(x), because the coefficient of xⁿ in F(x) is exactly Bₙ/n!. Rather than performing a naïve term‑by‑term inversion, the authors apply a Newton–Raphson scheme for power series: given an approximation Gₖ(x) satisfying H·Gₖ ≡ 1 (mod xᵏ), they compute a refined approximation G₂ₖ(x) accurate up to x²ᵏ by the formula
 G₂ₖ = Gₖ · (2 − H·Gₖ) mod x²ᵏ.
All polynomial multiplications in this formula are executed with the NTT, and the modular reductions keep the sizes of all intermediate values bounded. The recursion doubles the precision at each step, so only O(log n) doubling steps are needed; the degrees grow geometrically and the sum is dominated by the final step, giving a total arithmetic cost of Σ_{i=0}^{log n} O((n/2ⁱ) log (n/2ⁱ)) = O(n log n) ring operations.
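The series‑inversion pipeline described above can be sketched end to end modulo a single prime. This is an illustration by this summary, not the paper's code: the prime 998244353 (a standard NTT‑friendly modulus with primitive root 3) and the toy size n = 10 are arbitrary choices here. The sketch inverts (eˣ−1)/x by Newton doubling; the inverse series x/(eˣ−1) carries the coefficients Bᵢ/i! (so B₁ = −1/2 in this convention):

```python
# Invert H(x) = (e^x - 1)/x by Newton doubling, modulo one NTT prime.
P, G = 998244353, 3  # P = 119 * 2^23 + 1, primitive root 3

def ntt(a, invert=False):
    """In-place iterative NTT of a list whose length is a power of two."""
    n = len(a)
    j = 0
    for i in range(1, n):            # bit-reversal permutation
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        w = pow(G, (P - 1) // length, P)
        if invert:
            w = pow(w, P - 2, P)
        for start in range(0, n, length):
            wn = 1
            for k in range(start, start + length // 2):
                u, v = a[k], a[k + length // 2] * wn % P
                a[k] = (u + v) % P
                a[k + length // 2] = (u - v) % P
                wn = wn * w % P
        length <<= 1
    if invert:
        inv_n = pow(n, P - 2, P)
        for i in range(n):
            a[i] = a[i] * inv_n % P

def mul_trunc(f, g, prec):
    """Product of f and g modulo x^prec, via NTT and pointwise products."""
    n = 1
    while n < len(f) + len(g) - 1:
        n <<= 1
    fa, ga = f + [0] * (n - len(f)), g + [0] * (n - len(g))
    ntt(fa); ntt(ga)
    prod = [x * y % P for x, y in zip(fa, ga)]
    ntt(prod, invert=True)
    return prod[:prec]

def inv_series(h, prec):
    """Newton doubling: G_2k = G_k * (2 - H*G_k) mod x^2k."""
    g = [pow(h[0], P - 2, P)]
    k = 1
    while k < prec:
        k = min(2 * k, prec)
        t = [(-c) % P for c in mul_trunc(h[:k], g, k)]
        t[0] = (t[0] + 2) % P        # t = 2 - H*G_k mod x^k
        g = mul_trunc(g, t, k)
    return g

n = 10
fact = [1] * (n + 2)
for i in range(1, n + 2):
    fact[i] = fact[i - 1] * i % P
h = [pow(fact[i + 1], P - 2, P) for i in range(n + 1)]  # (e^x - 1)/x
g = inv_series(h, n + 1)                                # x/(e^x - 1)
bern_mod = [g[i] * fact[i] % P for i in range(n + 1)]   # B_i mod P
```

Running the same computation modulo several primes and combining the residues with the CRT recovers the exact values, which is the role of the reconstruction step described next in the summary.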

The authors then translate the modular results back to the integer domain using the Chinese Remainder Theorem (CRT). By selecting a set of primes whose product exceeds twice the maximum possible absolute value of Bₙ (which is roughly 2^{n log n}, i.e. about n log n bits), they guarantee a unique signed reconstruction. The CRT reconstruction itself is quasi‑linear in this bit size and does not dominate the overall complexity.
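A balanced‑lift CRT reconstruction of a signed value from its residues can be sketched as follows. This is illustrative only: the helper name `crt` is this summary's, and a production implementation would use a product/remainder tree to reach quasi‑linear cost rather than this quadratic loop:

```python
from math import prod

def crt(residues, moduli):
    """Combine residues modulo pairwise-coprime moduli, then lift to the
    balanced range so that a signed x with |x| < prod(moduli) // 2 is
    recovered exactly."""
    M = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        # pow(Mi, -1, m) is the modular inverse of Mi modulo m
        x = (x + r * Mi * pow(Mi, -1, m)) % M
    return x - M if x > M // 2 else x
```

The balanced lift is what makes negative Bernoulli numbers come out with the correct sign, provided the prime product is at least twice the largest magnitude that can occur.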

A careful bit‑complexity analysis shows that the dominant term comes from the NTT‑based multiplications, each of which costs O(M(m)) where M(m) is the cost of multiplying two m‑bit integers. Using the best known multiplication algorithm (Fürer’s algorithm or its successors) yields M(m)=m log m · 2^{O(log* m)}. Substituting m≈n^{2/3} (the size of the coefficients at the deepest recursion level) gives an overall complexity of n^{4/3 + o(1)} bit operations. This matches the claimed bound.
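As an arithmetic sanity check on this accounting (the count of roughly n^{2/3} multiplications at that operand size is this summary's reading, not a bound stated explicitly above), the exponents combine as:

```latex
T(n) \;\approx\; n^{2/3} \cdot M\!\left(n^{2/3}\right)
     \;=\; n^{2/3} \cdot \left(n^{2/3}\right)^{1 + o(1)}
     \;=\; n^{4/3 + o(1)}.
```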

The experimental section validates the theoretical results. Implementations in C++ using the GNU Multiple Precision Library (GMP) for modular arithmetic and a hand‑rolled NTT kernel were benchmarked against the classic O(n²) algorithm for n ranging from 10⁴ to 10⁶. For n=10⁶, the new algorithm achieved a speed‑up factor of about 30 while using only linear memory. The authors also report successful computation of B_{2·10⁶}, demonstrating that the method scales well beyond the range previously feasible with quadratic‑time algorithms.

Finally, the paper discusses limitations and future work. The current implementation relies on 64‑bit prime moduli; scaling to substantially larger n would require correspondingly larger collections of CRT moduli, or FFTs over the complex numbers with rigorous error control. Moreover, the constant factors hidden in the O‑notation could be reduced further by exploiting hardware‑accelerated NTTs on GPUs or FPGAs. The authors suggest that the same framework could be adapted to compute other special values of the zeta function, Euler numbers, or to accelerate high‑precision evaluation of polylogarithms.

In summary, the paper delivers a theoretically sound and practically efficient algorithm that reduces the asymptotic cost of computing Bernoulli numbers from quadratic to sub‑quadratic, opening the door to high‑precision number‑theoretic computations that were previously out of reach.