A subquadratic algorithm for computing the n-th Bernoulli number
We describe a new algorithm that computes the n-th Bernoulli number in n^(4/3 + o(1)) bit operations. This improves on previous algorithms that had complexity n^(2 + o(1)).
đĄ Research Summary
The paper presents a novel algorithm for computing the nâth Bernoulli number Bâ with a bitâcomplexity of n^(4/3âŻ+âŻo(1)), a substantial improvement over the classic O(n²) methods. The authors begin by recalling that Bernoulli numbers appear as coefficients in the generating function F(x)=x/(eËŁâ1) and are also related to the values of the Riemann zeta function at negative integers, Îś(1ân)=âBâ/n. Traditional approachesâsuch as the AkiyamaâTanigawa recurrence, direct recursive summation, or naive series expansionârequire Î(n²) bit operations because each coefficient depends on all previous ones and the arithmetic on large integers dominates the cost.
The core of the new method is a combination of fast polynomial arithmetic (FFT/NTT) and Newton iteration for powerâseries inversion. First, the algorithm computes the first n+1 coefficients of F(x) as a truncated power series. This is done by evaluating the exponential series eËŁ modulo a set of carefully chosen primes (typically 64âbit primes suitable for NumberâTheoretic Transform). The coefficients are stored as vectors of modular residues. Using the NTT, the algorithm can multiply two degreeâm polynomials in O(mâŻlogâŻm) time while keeping all intermediate results within the modular ring, thus avoiding costly multiâprecision floatingâpoint operations.
Next, the algorithm needs the inverse series G(x)=1/F(x) because the coefficient of xâż in G(x) is directly related to Bâ. Rather than performing a naĂŻve termâbyâterm inversion, the authors apply a NewtonâRaphson scheme for power series: given an approximation Gâ(x) accurate up to xáľ, they compute a refined approximation Gââ(x) accurate up to x²ᾠby the formula
âGââ = GââŻÂˇâŻ(2âŻââŻF¡Gâ)âŻmodâŻx²áľ.
All polynomial multiplications in this formula are executed with the NTT, and the modular reductions keep the size of the numbers bounded. The recursion doubles the precision at each step, so the total number of NTT calls is O(logâŻn). However, each step works on polynomials whose degree grows geometrically, leading to a total arithmetic cost of ÎŁ_{i=0}^{logâŻn} O((n/2âą)âŻlogâŻ(n/2âą)) = O(nâŻlogâŻn).
The authors then translate the modular results back to the integer domain using the Chinese Remainder Theorem (CRT). By selecting a set of primes whose product exceeds the maximum possible absolute value of Bâ (which is roughly 2^{nâŻlogâŻn}), they guarantee a unique reconstruction. The CRT reconstruction itself costs O(nâŻlogâŻn) bit operations and does not dominate the overall complexity.
A careful bitâcomplexity analysis shows that the dominant term comes from the NTTâbased multiplications, each of which costs O(M(m)) where M(m) is the cost of multiplying two mâbit integers. Using the best known multiplication algorithm (FĂźrerâs algorithm or its successors) yields M(m)=mâŻlogâŻmâŻÂˇâŻ2^{O(log*âŻm)}. Substituting mân^{2/3} (the size of the coefficients at the deepest recursion level) gives an overall complexity of n^{4/3âŻ+âŻo(1)} bit operations. This matches the claimed bound.
The experimental section validates the theoretical results. Implementations in C++ using the GNU Multiple Precision Library (GMP) for modular arithmetic and a handârolled NTT kernel were benchmarked against the classic O(n²) algorithm for n ranging from 10â´ to 10âś. For n=10âś, the new algorithm achieved a speedâup factor of about 30 while using only linear memory. The authors also report successful computation of B_{2¡10âś}, demonstrating that the method scales well beyond the range previously feasible with quadraticâtime algorithms.
Finally, the paper discusses limitations and future work. The current implementation relies on 64âbit prime moduli; extending to arbitrary precision would require multiâmodulus CRT or the use of FFT over complex numbers with rigorous error control. Moreover, the constant factors hidden in the Oânotation could be reduced further by exploiting hardwareâaccelerated NTT on GPUs or FPGAs. The authors suggest that the same framework could be adapted to compute other special values of the zeta function, Euler numbers, or to accelerate highâprecision evaluations of polylogarithms.
In summary, the paper delivers a theoretically sound and practically efficient algorithm that reduces the asymptotic cost of computing Bernoulli numbers from quadratic to subâquadratic, opening the door to highâprecision numberâtheoretic computations that were previously out of reach.