A Parallel Iterative Method for Computing Molecular Absorption Spectra

We describe a fast parallel iterative method for computing molecular absorption spectra within TDDFT linear response and using the LCAO method. We use a local basis of “dominant products” to parametrize the space of orbital products that occur in the LCAO approach. In this basis, the dynamical polarizability is computed iteratively within an appropriate Krylov subspace. The iterative procedure uses a matrix-free GMRES method to determine the (interacting) density response. The resulting code is about one order of magnitude faster than our previous full-matrix method. This acceleration makes the speed of our TDDFT code comparable to that of codes based on Casida’s equation. The implementation of our method uses hybrid MPI and OpenMP parallelization in which load balancing and memory access are optimized. To validate our approach and to establish benchmarks, we compute spectra of large molecules on various types of parallel machines. The methods developed here are fairly general and we believe they will find useful applications in molecular physics/chemistry, even for problems that are beyond TDDFT, such as organic semiconductors, particularly in photovoltaics.


💡 Research Summary

The paper presents a high‑performance parallel iterative algorithm for computing molecular absorption spectra within the framework of time‑dependent density‑functional theory (TDDFT) linear response combined with a linear combination of atomic orbitals (LCAO) representation. The central difficulty in conventional TDDFT‑LCAO approaches lies in the rapid growth of the space of orbital products, which leads to prohibitive memory consumption and computational cost when the full electron‑hole interaction matrix is assembled. To overcome this bottleneck, the authors introduce a localized basis of “dominant products” that captures the most significant contributions of the orbital‑product space while discarding redundant components. This compression reduces the dimensionality of the problem by one to two orders of magnitude without sacrificing the accuracy of the response function.
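The compression idea can be illustrated with a minimal sketch: form the (highly redundant) set of pairwise orbital products and keep only the directions whose singular values are significant. This is a toy analogue, not the authors' actual dominant-products construction; the grid, orbital choice (monomials standing in for atomic orbitals), and tolerance are all illustrative assumptions.

```python
import numpy as np

# Toy stand-in for the orbital-product space: each column samples one
# product f_a(r) * f_b(r) on a 1D grid.  Monomials x^k play the role of
# atomic orbitals; their products are massively linearly dependent.
x = np.linspace(0.1, 1.0, 400)
n_orb = 8
orbitals = np.column_stack([x**k for k in range(n_orb)])

# All symmetric pairs (a, b) with a <= b: 36 products, but the
# underlying space x^(a+b) only spans 15 distinct degrees.
products = np.column_stack([
    orbitals[:, a] * orbitals[:, b]
    for a in range(n_orb) for b in range(a, n_orb)
])

# Keep only the "dominant" directions: left singular vectors whose
# singular value exceeds a tolerance relative to the largest one.
u, s, _ = np.linalg.svd(products, full_matrices=False)
keep = s > 1e-10 * s[0]
dominant = u[:, keep]

print(f"compressed {products.shape[1]} products "
      f"-> {dominant.shape[1]} dominant basis functions")
```

The same principle (discarding numerically redundant product directions) is what yields the one-to-two-orders-of-magnitude reduction described above, although the real construction exploits atom-pair locality rather than a global SVD.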

Within the dominant‑product subspace the dynamical polarizability is evaluated iteratively using a Krylov‑subspace technique. Specifically, the authors employ a matrix‑free generalized minimal residual (GMRES) solver to obtain the interacting density‑response vector. Because GMRES requires only matrix‑vector products, the full interaction matrix never needs to be stored; instead, the action of the matrix on a vector is computed on the fly using the underlying LCAO integrals and the dominant‑product representation. This matrix‑free approach dramatically lowers memory requirements and enables the treatment of much larger systems than previously possible.
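The matrix-free pattern can be sketched with SciPy's `LinearOperator` and `gmres`: the solver only ever calls a function that applies the response operator to a vector, so the dense matrix is never formed. The dense `chi0` and `kernel` arrays below are random placeholders for the non-interacting response and electron-hole kernel; in the actual code their action would be assembled on the fly from LCAO integrals in the dominant-product basis.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

rng = np.random.default_rng(1)
n = 200  # size of the (toy) dominant-product space

# Placeholders for the non-interacting response chi0 and the
# electron-hole interaction kernel K (scaled to keep 1 - chi0*K
# well-conditioned for this demonstration).
chi0 = rng.standard_normal((n, n)) / n
kernel = rng.standard_normal((n, n)) / n

def apply_dyson(x):
    """Action of (1 - chi0 K) on a vector: two matvecs, no stored product."""
    return x - chi0 @ (kernel @ x)

# GMRES sees only the matvec, never the matrix entries.
A = LinearOperator((n, n), matvec=apply_dyson)

v_ext = rng.standard_normal(n)   # external perturbation
b = chi0 @ v_ext                 # non-interacting density response
dn, info = gmres(A, b)           # interacting density response
```

Because each iteration costs only a few matrix-vector applications, memory scales with the number of Krylov vectors rather than with the square of the product-basis dimension, which is the key enabler for large systems.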

The implementation adopts a hybrid parallelization strategy. Message Passing Interface (MPI) is used to distribute dominant‑product blocks and Krylov vectors across compute nodes, while OpenMP threads exploit shared‑memory parallelism within each node for the dense linear‑algebra kernels (matrix‑vector products, inner products, orthogonalization, etc.). Load balancing is achieved by profiling the computational load associated with each dominant‑product block and dynamically assigning work to MPI ranks, ensuring that no node becomes a bottleneck. Memory access patterns are carefully organized to maintain contiguous data layouts, thereby improving cache utilization and reducing latency. Non‑blocking MPI communications and collective reductions are combined to hide communication overhead and to keep the computational pipeline saturated.
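The profiling-based load balancing can be sketched as a greedy longest-processing-time assignment: blocks are handed out heaviest first, each going to the currently least-loaded rank. The block names, costs, and rank count below are hypothetical; the paper does not specify the exact scheduling heuristic, so this is one standard way to realize the idea.

```python
import heapq

def assign_blocks(costs, n_ranks):
    """Greedy LPT scheduling: give each block, heaviest first,
    to whichever rank currently carries the least load."""
    heap = [(0.0, rank) for rank in range(n_ranks)]  # (load, rank)
    heapq.heapify(heap)
    owner = {}
    for block, cost in sorted(costs.items(), key=lambda kv: -kv[1]):
        load, rank = heapq.heappop(heap)
        owner[block] = rank
        heapq.heappush(heap, (load + cost, rank))
    return owner

# Hypothetical profiled costs (e.g. seconds) per dominant-product block.
costs = {f"block{i}": c
         for i, c in enumerate([9.0, 7.5, 7.0, 4.0, 3.5, 2.0, 1.5, 1.0])}
owner = assign_blocks(costs, n_ranks=3)
loads = [sum(c for b, c in costs.items() if owner[b] == r) for r in range(3)]
```

With the sample costs above the three ranks end up within a few percent of the ideal load of 35.5 / 3, which is the property the dynamic assignment is meant to guarantee at scale.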

Performance benchmarks are carried out on a variety of parallel platforms, ranging from multi‑core workstations to large‑scale clusters and supercomputers. Test cases include large organic molecules with several thousand atoms and complex conjugated systems relevant to photovoltaic applications. The new method consistently outperforms the authors’ previous full‑matrix implementation by roughly an order of magnitude in wall‑clock time, while using a fraction of the memory. Compared with established Casida‑equation codes, the iterative approach delivers comparable accuracy for excitation energies and oscillator strengths at similar or lower computational cost, because it avoids the explicit construction and diagonalization of the full Casida matrix.

Beyond the immediate TDDFT linear‑response context, the authors argue that the dominant‑product/Krylov‑GMRES framework is sufficiently general to be extended to more demanding problems, such as non‑linear response, electron‑hole pair generation and recombination dynamics, and the study of organic semiconductors and photovoltaic materials where excitonic effects are crucial. The ability to treat very large systems efficiently opens the door to realistic simulations of functional molecular devices, charge‑transfer complexes, and nanostructured materials that were previously out of reach for first‑principles spectroscopy.

In summary, the paper delivers a robust, scalable, and memory‑efficient algorithm that combines a physically motivated basis compression (dominant products) with a matrix‑free Krylov iterative solver (GMRES) and a hybrid MPI/OpenMP parallelization. The resulting code achieves a ten‑fold speedup over traditional full‑matrix TDDFT implementations, matches the performance of Casida‑based solvers, and provides a versatile platform for future high‑throughput and large‑scale excited‑state calculations in chemistry, materials science, and nanotechnology.