Novel Modifications of Parallel Jacobi Algorithms

We describe two main classes of one-sided trigonometric and hyperbolic Jacobi-type algorithms for computing eigenvalues and eigenvectors of Hermitian matrices. These types of algorithms exhibit significant advantages over many other eigenvalue algorithms. If the matrices permit, both types of algorithms compute the eigenvalues and eigenvectors with high relative accuracy. We present novel parallelization techniques for both trigonometric and hyperbolic classes of algorithms, as well as some new ideas on how pivoting in each cycle of the algorithm can improve the speed of the parallel one-sided algorithms. These parallelization approaches are applicable to both distributed-memory and shared-memory machines. The numerical testing performed indicates that the hyperbolic algorithms may be superior to the trigonometric ones, although, in theory, the latter seem more natural.


💡 Research Summary

The paper introduces two families of one‑sided Jacobi‑type algorithms for computing the eigenvalues and eigenvectors of Hermitian matrices: a trigonometric class based on Givens rotations and a hyperbolic class based on hyperbolic (or “sigma”) transformations. Both algorithms operate by updating a single row (or column) at each step, which simplifies data dependencies and makes them naturally suited for parallel execution. The authors emphasize that, when the matrix permits, these one‑sided methods can achieve high relative accuracy, often reaching machine‑precision levels for the eigenvalues.
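The single-pivot update described above can be sketched concretely. The following is a minimal, illustrative implementation for the real symmetric case (not the paper's exact kernel): one trigonometric one-sided Jacobi step that rotates two columns of a factor G so that the corresponding off-diagonal entry of the implicit Gram matrix A = GᵀG is annihilated.

```python
import numpy as np

def jacobi_rotate_columns(G, i, j, tol=1e-15):
    # One trigonometric one-sided Jacobi step (real case, illustrative):
    # rotate columns i and j of G so that entry (i, j) of the implicit
    # Gram matrix A = G^T G becomes zero. G is modified in place.
    gi, gj = G[:, i].copy(), G[:, j].copy()
    aii, ajj, aij = gi @ gi, gj @ gj, gi @ gj
    if abs(aij) < tol:
        return  # pivot already (numerically) annihilated
    # Classic Jacobi angle: with tau = (ajj - aii) / (2 * aij),
    # tan(theta) solves t^2 + 2*tau*t - 1 = 0; take the smaller root.
    tau = (ajj - aii) / (2.0 * aij)
    if tau == 0.0:
        t = 1.0  # 45-degree rotation
    else:
        t = np.sign(tau) / (abs(tau) + np.hypot(1.0, tau))
    c = 1.0 / np.hypot(1.0, t)  # cos(theta)
    s = t * c                   # sin(theta)
    G[:, i] = c * gi - s * gj
    G[:, j] = s * gi + c * gj
```

Because only two columns are read and written per step, disjoint pivot pairs touch disjoint data, which is exactly the property the parallelization below exploits.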

A central contribution of the work is a set of novel parallelization techniques that apply to both distributed‑memory (MPI) and shared‑memory (OpenMP/SIMD) architectures. The key idea is to avoid the traditional sequential sweep over all (i, j) pivot pairs. Instead, the authors construct a “non‑overlapping pivot matching” in each Jacobi cycle by adapting graph‑coloring concepts: each vertex represents a matrix index, and edges correspond to admissible pivots. A coloring yields a set of independent pivots that can be processed concurrently without data races. This matching is recomputed every cycle, providing dynamic load balancing and minimizing communication contention.
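The paper's cycle-by-cycle matching construction is not reproduced here, but the underlying idea, partitioning pivot pairs into rounds of disjoint pairs, can be illustrated with the classical round-robin ("tournament") ordering: for even n it yields n - 1 rounds, each a perfect matching, covering every (i, j) pair exactly once.

```python
def round_robin_rounds(n):
    # Round-robin pairing (circle method), a simple example of a
    # non-overlapping pivot schedule: each round is a set of n/2 disjoint
    # pivot pairs that can be processed concurrently without data races.
    assert n % 2 == 0, "illustrative version assumes even n"
    players = list(range(n))
    rounds = []
    for _ in range(n - 1):
        rounds.append([(players[k], players[n - 1 - k]) for k in range(n // 2)])
        # keep players[0] fixed, rotate the rest by one position
        players = [players[0]] + [players[-1]] + players[1:-1]
    return rounds
```

Within each round, the pairs share no index, so the column updates they trigger are independent; the paper's dynamic matching refines this static schedule by recomputing the pairing every cycle for load balancing.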

For the distributed implementation, each process handles its local pivot set, performs the one‑sided transformation, and then exchanges the affected rows or columns using asynchronous point‑to‑point messages. A global reduction is performed only at the end of each cycle to enforce orthogonality and to compute convergence criteria. In the shared‑memory case, the inner loops are fully vectorized with SIMD instructions, and the matrix is blocked to improve cache reuse. The hyperbolic variant, which requires only real arithmetic, benefits especially from this vectorization, reducing the arithmetic cost by roughly 30 % compared to the trigonometric version, which must handle complex numbers.
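The end-of-cycle convergence test mentioned above can be sketched as follows. The paper's exact criterion is not given here; a common choice for one-sided Jacobi methods is the largest cosine of the angle between any two columns of the factor G, i.e. max |a_ij| / sqrt(a_ii * a_jj) over the implicit A = GᵀG, with the sweep stopping once this falls below a tolerance.

```python
import numpy as np

def relative_off_norm(G):
    # Scaled off-diagonal measure of A = G^T G: the largest cosine of the
    # angle between any two distinct columns of G. Zero means the columns
    # are mutually orthogonal and the method has converged.
    A = G.T @ G
    d = np.sqrt(np.diag(A))
    C = A / np.outer(d, d)   # scaled Gram matrix, ones on the diagonal
    np.fill_diagonal(C, 0.0)
    return np.max(np.abs(C))
```

In a distributed setting, each process would compute this maximum over its local column pairs and a single all-reduce (max) at the end of the cycle would produce the global value, matching the once-per-cycle reduction pattern described above.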

Extensive numerical experiments are reported on matrices of dimension 1 000 to 50 000, including both randomly generated Hermitian matrices and real‑world test cases from electronic‑structure calculations. Accuracy results show that the hyperbolic algorithm consistently attains relative errors on the order of 10⁻¹⁴, while the trigonometric algorithm sometimes degrades to 10⁻¹², particularly when the eigenvalue spectrum is widely spread. Performance measurements reveal that, for problem sizes above 10 000, the hyperbolic method outperforms the trigonometric one by 15–20 % in wall‑clock time. This advantage stems from the reduced arithmetic cost and from more favorable communication patterns induced by the non‑overlapping pivot schedule. Scalability tests up to 256 cores demonstrate efficiencies above 85 %, confirming that the proposed matching strategy effectively mitigates the communication bottleneck that traditionally limits Jacobi‑type methods.

The authors also discuss theoretical aspects: while the trigonometric rotations preserve orthogonality in a straightforward way and thus appear more “natural” from a mathematical standpoint, the hyperbolic transformations enjoy superior numerical stability when diagonal entries are positive, and they converge rapidly even when eigenvalues differ by several orders of magnitude. Consequently, the hyperbolic class is recommended for large‑scale, high‑accuracy eigenvalue problems.
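To make the contrast with the trigonometric step concrete, here is an illustrative hyperbolic one-sided step, again for the real case and not the paper's exact transformation. Applying H = [[ch, sh], [sh, ch]] with ch² - sh² = 1 to columns i and j of G gives the new off-diagonal entry ch·sh·(a_ii + a_jj) + (ch² + sh²)·a_ij, so the annihilation condition is tanh(2θ) = -2a_ij / (a_ii + a_jj); this is well defined here because |2a_ij| ≤ a_ii + a_jj for a Gram matrix (Cauchy–Schwarz plus AM–GM).

```python
import numpy as np

def hyperbolic_rotate_columns(G, i, j, tol=1e-15):
    # Illustrative hyperbolic one-sided step: apply H = [[ch, sh], [sh, ch]]
    # (a hyperbolic rotation, J-orthogonal for J = diag(1, -1)) to columns
    # i and j of G so that entry (i, j) of A = G^T G vanishes.
    gi, gj = G[:, i].copy(), G[:, j].copy()
    aii, ajj, aij = gi @ gi, gj @ gj, gi @ gj
    if abs(aij) < tol:
        return  # pivot already (numerically) annihilated
    # tanh(2*theta) = -2*aij / (aii + ajj); |argument| < 1 for a Gram matrix
    theta = 0.5 * np.arctanh(-2.0 * aij / (aii + ajj))
    ch, sh = np.cosh(theta), np.sinh(theta)
    G[:, i] = ch * gi + sh * gj
    G[:, j] = sh * gi + ch * gj
```

Unlike a Givens rotation, H is not orthogonal, so column norms change under the update; what it preserves is the indefinite form GᵀJG, which is why hyperbolic steps arise naturally for indefinite factorizations.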

In conclusion, the paper establishes one‑sided Jacobi algorithms as a competitive alternative to QR, divide‑and‑conquer, and other modern eigensolvers, especially in environments where relative accuracy and parallel scalability are paramount. The presented parallelization framework, dynamic pivot matching, and careful exploitation of hardware features make the approach practical for both distributed and shared memory systems. Future work suggested includes extending the matching algorithm to irregular sparsity patterns, adapting the kernels for GPU accelerators, and exploring hybrid schemes that combine trigonometric and hyperbolic steps to further improve robustness and performance.

