Efficient Boys function evaluation using minimax approximation
We present an algorithm for efficient evaluation of Boys functions $F_0,\dots,F_{k_\mathrm{max}}$ tailored to modern computing architectures, in particular graphical processing units (GPUs), where maximum throughput is high and data movement is costly. The method combines rational minimax approximations with upward and downward recurrence relations. The non-negative real axis is partitioned into three regions, $[0,\infty\rangle = A\cup B\cup C$, where regions $A$ and $B$ are treated using rational minimax approximations and region $C$ by an asymptotic approximation. This formulation avoids lookup tables and irregular memory access, making it well suited hardware with high maximum throughput and low latency. The rational minimax coefficients are generated using the rational Remez algorithm. For a target maximum absolute error of $\varepsilon_\mathrm{tol} = 5\cdot10^{-14}$, the corresponding approximation regions and coefficients for Boys functions $F_0,\dots,F_{32}$ are provided in the appendix.
💡 Research Summary
The paper addresses the computational bottleneck posed by the evaluation of Boys functions Fₖ(x)=∫₀¹ t^{2k} e^{-x t²} dt, which appear ubiquitously in Gaussian‑type orbital (GTO) based quantum‑chemical two‑electron integral calculations. Traditional strategies rely on pre‑tabulated values with interpolation or low‑degree polynomial/Pade approximations. While these reduce arithmetic, they incur irregular memory accesses that severely limit performance on modern high‑throughput architectures such as GPUs, where data movement is far more expensive than floating‑point operations.
To overcome this limitation, the authors propose a fully arithmetic‑driven algorithm based on rational minimax approximations combined with upward and downward recurrence relations. A rational minimax approximation is the best rational function of prescribed numerator and denominator degrees that minimizes the maximum weighted error over a given interval. The authors implement a robust version of the Remez algorithm in quadruple precision to compute these approximations, avoiding the convergence and precision problems encountered with existing libraries (Boost, AlgRemez, Baryrat, Chebfun). Their implementation (open‑source at gitlab.com/rasmusvi/minimax) uses divided differences, orthonormal polynomial bases, and Sturm’s theorem to guarantee pole‑free solutions.
The non‑negative real axis is partitioned into three regions:
- Region C (
Comments & Academic Discussion
Loading comments...
Leave a Comment