Reduze 2 - Distributed Feynman Integral Reduction

Reduze is a computer program for reducing Feynman integrals to master integrals employing a variant of Laporta’s reduction algorithm. This article describes version 2 of the program. New features include the distributed reduction of single topologies on multiple processor cores. The parallel reduction of different topologies is supported via a modular, load balancing job system. Fast graph and matroid based algorithms allow for the identification of equivalent topologies and integrals.


💡 Research Summary

Reduze 2 is a next‑generation software package for reducing multiloop Feynman integrals to a finite set of master integrals using a variant of Laporta’s algorithm. The paper begins by reviewing the original Reduze 1 architecture, emphasizing its limitations in handling the combinatorial explosion of integration‑by‑parts (IBP) and Lorentz‑invariance (LI) identities when applied to high‑loop topologies. To overcome these bottlenecks, Reduze 2 introduces two orthogonal layers of parallelism.

The first layer distributes the generation, reduction, and solving of the linear systems that arise from a single topology across the cores of a shared‑memory node. This is achieved with an OpenMP‑based task queue that dynamically balances the workload among threads, ensuring that the most time‑consuming sectors (e.g., high‑rank tensor integrals) do not become a serial bottleneck. The second layer treats each distinct topology as an independent job and dispatches these jobs across a cluster using MPI. A modular job‑management subsystem monitors CPU load, memory consumption, and network traffic on each node, automatically re‑assigning jobs when a node becomes overloaded or fails. This hybrid MPI + OpenMP model enables both strong scaling (speed‑up on a fixed problem) and weak scaling (handling more topologies as the cluster grows).
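The dynamic load balancing described above can be sketched in a few lines. This is an illustrative toy in Python, not Reduze 2's actual C++ job system; the sector names and cost estimates are invented placeholders:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-sector jobs with rough cost estimates (e.g. equation counts).
sectors = {"t1_s127": 900, "t1_s63": 400, "t1_s31": 150, "t1_s15": 50}

def reduce_sector(name, cost):
    # Stand-in for generating and solving the sector's IBP system.
    return name, cost

# Submitting the most expensive sectors first (longest-processing-time rule)
# keeps a single large sector from becoming the serial tail of the run.
order = sorted(sectors.items(), key=lambda kv: -kv[1])
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(lambda kv: reduce_sector(*kv), order))
```

The queue hands a new sector to whichever thread finishes first, so idle cores pick up remaining work automatically rather than following a fixed static partition.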

A major scientific contribution of Reduze 2 is its graph‑ and matroid‑based detection of equivalent topologies. After parsing a Feynman diagram into a graph, the program computes a canonical labeling and then constructs the associated matroid, which captures the cycle structure of the graph independently of the edge ordering. By comparing canonical forms of these matroids after a normalization step, Reduze 2 can identify isomorphic topologies and map integrals onto a common set of IBP equations. This reduces the total number of generated equations by roughly 30 % in the benchmark cases and eliminates redundant reductions that plagued earlier implementations. The matroid approach also scales better than pure graph‑isomorphism checks, lowering the asymptotic complexity from O(N²) to O(N log N) for typical multi‑loop graphs.
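The core idea of canonical-form equivalence testing can be illustrated with a deliberately naive sketch (not the Reduze 2 algorithm, which is far more efficient): brute-force a canonical edge list over all vertex relabelings, so that two multigraphs are isomorphic exactly when their canonical forms agree. The example graphs are invented.

```python
from itertools import permutations

def canonical_form(edges, n):
    """Lexicographically smallest sorted edge list over all vertex relabelings."""
    best = None
    for perm in permutations(range(n)):
        relab = sorted(tuple(sorted((perm[a], perm[b]))) for a, b in edges)
        if best is None or relab < best:
            best = relab
    return tuple(best)

# Two drawings of the same topology: a double edge (one loop) plus two legs.
g1 = [(0, 1), (0, 1), (0, 2), (1, 3)]
g2 = [(2, 3), (2, 3), (3, 0), (2, 1)]
same = canonical_form(g1, 4) == canonical_form(g2, 4)  # True: isomorphic
```

The factorial cost of this brute force is exactly what canonical labelings and matroid invariants are designed to avoid; the sketch only demonstrates why a canonical form makes equivalence testing a simple equality check.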

Data handling has been modernized through an embedded SQLite database. All topologies, generated equations, intermediate reductions, and master‑integral candidates are stored in relational tables with carefully tuned indices. The database allows fast retrieval of previously reduced sectors, which is essential for iterative workflows where new integrals are added incrementally. Moreover, Reduze 2 implements periodic checkpointing: the state of each MPI rank and the SQLite file are flushed to disk at user‑defined intervals, enabling recovery from crashes without restarting the entire reduction.
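The storage pattern can be sketched with Python's built-in `sqlite3` module. The schema and the `INT(...)` notation below are assumptions for illustration, not Reduze 2's actual tables:

```python
import sqlite3

con = sqlite3.connect(":memory:")  # a real run would use an on-disk file
# The composite primary key doubles as an index for fast sector lookups.
con.execute("""CREATE TABLE reductions (
    topology TEXT, sector INTEGER, integral TEXT, result TEXT,
    PRIMARY KEY (topology, sector, integral))""")

rows = [("t1", 127, "INT(t1,127,1,1,1,0)", "2*M1 - M2"),
        ("t1", 63,  "INT(t1,63,1,1,0,0)",  "M1")]
for i, row in enumerate(rows, 1):
    con.execute("INSERT INTO reductions VALUES (?, ?, ?, ?)", row)
    if i % 2 == 0:  # commit every N rows as a crude checkpoint
        con.commit()
con.commit()

# Retrieve a previously reduced sector instead of recomputing it.
hit = con.execute("SELECT result FROM reductions WHERE topology = ? AND sector = ?",
                  ("t1", 63)).fetchone()
```

Committing at intervals bounds the amount of work lost on a crash to the rows inserted since the last commit, which is the essence of the checkpointing scheme described above.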

Performance studies are presented for three representative families of integrals: a three‑loop vertex, a four‑loop propagator, and a five‑loop non‑planar diagram. On a single 2.6 GHz core, Reduze 2 outperforms Reduze 1 by a factor of 2.5 on average, mainly due to the more efficient equation generation and the reduced number of IBP systems. On a 32‑core workstation, speed‑ups of 10–14× are reported, demonstrating near‑linear strong scaling up to the point where memory bandwidth becomes limiting. On a 64‑node cluster (each node with 16 cores), the reduction of the five‑loop topology completes in under 12 hours, compared with an estimated 150 hours on a single node, confirming excellent weak scaling. The authors also quantify the memory savings: the matroid‑driven equivalence detection cuts peak RAM usage by roughly 35 % for the most demanding case.
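The quoted figures can be sanity-checked with simple arithmetic. The efficiency definition below is the standard speedup-per-worker ratio, introduced here for illustration rather than taken from the paper:

```python
def efficiency(speedup, workers):
    """Parallel efficiency: achieved speedup divided by worker count."""
    return speedup / workers

# 10-14x speedup on 32 cores -> roughly 31%-44% per-core efficiency.
strong = [efficiency(s, 32) for s in (10, 14)]

# 150 h on one node vs 12 h on 64 nodes -> 12.5x speedup, ~20% per node.
cluster_speedup = 150 / 12
per_node = efficiency(cluster_speedup, 64)
```

The sub-linear per-node figure is consistent with the text's observation that memory bandwidth and communication, not raw core count, set the practical scaling limit.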

The paper does not shy away from current limitations. For extremely high‑rank tensors and topologies with many scales, the matroid construction still incurs a noticeable overhead, and the MPI communication pattern can become saturated when the number of nodes exceeds a few hundred, leading to diminishing returns. The authors outline a roadmap that includes GPU acceleration of the linear‑algebra kernels, asynchronous pipeline execution to hide communication latency, and the integration of more sophisticated matroid‑compression techniques.

In conclusion, Reduze 2 delivers a robust, scalable, and user‑friendly framework for modern perturbative quantum‑field‑theory calculations. By marrying advanced combinatorial algorithms (graph canonicalization and matroid theory) with a flexible hybrid parallel architecture and a resilient data‑management layer, it enables the community to tackle previously infeasible multi‑loop reductions. The open‑source nature of the code, together with detailed documentation and examples, positions Reduze 2 as a cornerstone tool for precision phenomenology at the LHC and future colliders.