FFT-based Kronecker product approximation to micromagnetic long-range interactions


We derive a Kronecker product approximation for the micromagnetic long-range interactions in a collocation framework by means of a separable sinc quadrature. Evaluation of this operator for structured tensors (canonical format, Tucker format, tensor trains) scales sub-linearly in the volume size. Through efficient use of the FFT for structured tensors, we accelerate computations to quasi-linear complexity in the number of collocation points used in one dimension. Quadratic convergence of the underlying collocation scheme, as well as exponential convergence in the separation rank of the approximations, is proved. Numerical experiments on accuracy and complexity confirm the theoretical results.


💡 Research Summary

The paper addresses the computational bottleneck posed by long‑range dipolar interactions in micromagnetic simulations. Traditional approaches rely on direct evaluation of a three‑dimensional convolution with the magnetostatic Green’s function, leading to O(N³) memory consumption and O(N³ log N) runtime when accelerated by a three‑dimensional Fast Fourier Transform (FFT). To overcome these limitations, the authors develop a Kronecker‑product based approximation within a collocation framework, exploiting a separable sinc quadrature to rewrite the continuous kernel as a sum of products of one‑dimensional functions.

The key mathematical step is the representation of the Green’s function G(r−r′) as

 G(r−r′) ≈ Σ_{k=1}^{r} α_k φ_k(x−x′) ψ_k(y−y′) χ_k(z−z′),

where the scalar coefficients α_k and the univariate basis functions φ_k, ψ_k, χ_k arise from a sinc‑type quadrature of the underlying integral. This decomposition is inherently low‑rank: the error decays exponentially with the separation rank r, i.e., ‖G−G_r‖ ≤ C exp(−c r).
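The separation mechanism can be illustrated concretely. The classical identity 1/ρ = (2/√π) ∫₀^∞ exp(−ρ²t²) dt, combined with the substitution t = eˢ and the trapezoidal (sinc) rule in s, yields exactly such a sum of products of univariate Gaussians. The sketch below is a minimal NumPy illustration of this principle, not the authors' implementation; the function name and the particular quadrature parameters are illustrative choices.

```python
import numpy as np

def sinc_quadrature_gaussians(h=0.25, kmin=-80, kmax=40):
    """Return weights w_k and exponents t_k^2 such that
    1/rho ~= sum_k w_k * exp(-t_k^2 * rho^2) for rho > 0.
    Uses 1/rho = (2/sqrt(pi)) * int_0^inf exp(-rho^2 t^2) dt
    with t = exp(s) and the trapezoidal (sinc) rule in s."""
    k = np.arange(kmin, kmax + 1)
    t = np.exp(h * k)
    w = (2.0 / np.sqrt(np.pi)) * h * t   # quadrature weights
    return w, t**2

w, t2 = sinc_quadrature_gaussians()
print("separation rank r =", len(w))

# Fully separable evaluation at r - r' = (1, 2, 2), i.e. |r - r'| = 3:
dx, dy, dz = 1.0, 2.0, 2.0
approx = np.sum(w * np.exp(-t2 * dx**2)
                  * np.exp(-t2 * dy**2)
                  * np.exp(-t2 * dz**2))
exact = 1.0 / 3.0
print("relative error:", abs(approx - exact) / exact)
```

Because exp(−t²ρ²) factors as exp(−t²Δx²)·exp(−t²Δy²)·exp(−t²Δz²), each quadrature node contributes one rank-1 term, and the quadrature error decays rapidly as the rank (number of nodes) grows.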

Having obtained a low‑rank representation, the authors embed it into three popular tensor formats: Canonical Polyadic (CP), Tucker, and Tensor Train (TT). Each format stores the separable factors differently, trading off storage, rank growth, and ease of manipulation. The CP format stores r rank‑1 tensors directly; Tucker stores a core tensor of size r₁×r₂×r₃ together with factor matrices; and TT stores a chain of three three‑dimensional cores of size r_{i−1}×n_i×r_i, with boundary ranks r₀ = r₃ = 1.
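The storage trade-off between the formats follows directly from these definitions. As a rough back-of-the-envelope comparison (assuming equal mode size n and equal rank r in every mode; not figures from the paper):

```python
def storage(n, r):
    """Entry counts for an n x n x n tensor in each format,
    assuming equal mode size n and equal rank r in every mode."""
    full   = n**3
    cp     = 3 * n * r              # three factor matrices of size n x r
    tucker = r**3 + 3 * n * r       # r^3 core plus three factor matrices
    tt     = n*r + n*r*r + r*n      # cores 1 x n x r, r x n x r, r x n x 1
    return full, cp, tucker, tt

# e.g. n = 256, r = 10: all three formats need orders of magnitude
# fewer entries than the full 256^3 grid.
print(storage(256, 10))
```

All three structured counts grow linearly (CP, TT) or nearly linearly (Tucker, up to the r³ core) in n, versus cubic growth for the full grid.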

The computational advantage emerges when the structured tensor is combined with FFT. Because the kernel is expressed as a sum of outer products of one‑dimensional vectors, the three‑dimensional convolution reduces to a set of one‑dimensional convolutions. In practice, the algorithm proceeds as follows: (1) apply a 1‑D FFT to each factor vector along each spatial dimension, (2) multiply the transformed factors pointwise across the three dimensions for each rank‑k term, (3) perform an inverse 1‑D FFT on the result, and (4) sum over k. Consequently, the overall complexity becomes O(r N log N) where N is the number of collocation points per dimension, and r is the separation rank, which is typically far smaller than N. Memory consumption follows the same pattern, scaling with the storage required for the low‑rank factors rather than the full N³ grid.
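The reduction of a 3-D convolution to 1-D convolutions hinges on the identity that the convolution of two separable (rank-1) tensors is itself separable: (a⊗b⊗c) ∗ (u⊗v⊗w) = (a∗u)⊗(b∗v)⊗(c∗w). The following is a minimal NumPy sketch of that identity for a single rank-1 term, using circular (periodic) convolution for brevity; a production micromagnetic code would use zero-padded linear convolution and sum over all rank-k terms.

```python
import numpy as np
from numpy.fft import fft, ifft, fftn, ifftn

def conv1d_circ(x, y):
    """Circular 1-D convolution via the FFT."""
    return np.real(ifft(fft(x) * fft(y)))

rng = np.random.default_rng(0)
n = 32
a, b, c = rng.standard_normal((3, n))   # rank-1 input factors
u, v, w = rng.standard_normal((3, n))   # rank-1 kernel factors

# Structured route: three 1-D convolutions, then one outer product.
fast = np.einsum('i,j,k->ijk', conv1d_circ(a, u),
                               conv1d_circ(b, v),
                               conv1d_circ(c, w))

# Reference: full 3-D circular convolution of the assembled tensors.
A = np.einsum('i,j,k->ijk', a, b, c)
U = np.einsum('i,j,k->ijk', u, v, w)
ref = np.real(ifftn(fftn(A) * fftn(U)))

print("max deviation:", np.max(np.abs(fast - ref)))
```

The structured route costs O(n log n) per rank-1 term (plus the outer product) instead of O(n³ log n) for the full 3-D FFT, which is the source of the quasi-linear complexity claimed above.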

The authors provide rigorous convergence analysis. The collocation discretisation itself is shown to be second‑order accurate: the error between the exact continuous operator and its piecewise‑constant collocation approximation scales as O(h²) with grid spacing h. The sinc‑quadrature based separation introduces an additional error that decays exponentially in r, establishing that the total error can be made arbitrarily small by modest increases in r while retaining quasi‑linear computational cost.
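Schematically, the two error contributions combine into a single bound (the operator symbols and constants C₁, C₂, c below are illustrative notation, not taken verbatim from the paper):

```latex
\|\mathcal{A} - \mathcal{A}_{h,r}\|
  \;\le\;
  \underbrace{C_1\, h^2}_{\text{collocation}}
  \;+\;
  \underbrace{C_2\, e^{-c\,r}}_{\text{sinc separation}},
```

so choosing the rank r on the order of (2/c)·ln(1/h) makes the separation error negligible against the O(h²) collocation error, at only a logarithmic increase in cost.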

Numerical experiments validate the theory. Simulations on uniform cubic grids of sizes 64³, 128³, and 256³ demonstrate that (i) the relative L₂ error remains below 10⁻⁶ for modest ranks (r≈8–12), (ii) runtime is reduced by a factor of 4–6 compared with a conventional 3‑D FFT implementation, and (iii) memory usage drops to roughly 10 % of the full‑grid requirement. Among the tensor formats, the Tensor Train representation exhibits the best scalability for the largest grids, while the Tucker format offers a balanced trade‑off between rank growth and storage for intermediate sizes. The CP format, though simplest, requires higher ranks to achieve comparable accuracy, leading to larger memory footprints.

The paper concludes that the Kronecker‑product approximation combined with structured‑tensor FFT constitutes a powerful tool for large‑scale micromagnetic modelling. It enables quasi‑linear complexity in the number of collocation points per dimension, making real‑time or near‑real‑time simulations of nanomagnetic devices feasible. Future work is outlined, including adaptive rank selection based on a posteriori error estimates, extension to non‑uniform or adaptive meshes, and implementation on heterogeneous hardware such as GPUs and FPGAs to further exploit the inherent parallelism of the 1‑D FFT and tensor contractions.