Machine-Learned Hamiltonians for Quantum Transport Simulation of Valence Change Memories
The construction of the Hamiltonian matrix H is an essential, yet computationally expensive step in ab-initio device simulations based on density-functional theory (DFT). In homogeneous structures, the fact that a unit cell repeats itself along at least one direction can be leveraged to minimize the number of atoms considered and the calculation time. However, such an approach does not lend itself to amorphous or defective materials for which no periodicity exists. In these cases, (much) larger domains containing thousands of atoms might be needed to accurately describe the physics at play, pushing DFT tools to their limit. Here we address this issue by learning and directly predicting the Hamiltonian matrix of large structures through equivariant graph neural networks and so-called augmented partitioning training. We demonstrate the strength of our approach by modeling valence change memory (VCM) cells, achieving a Mean Absolute Error (MAE) of 3.39 to 3.58 meV, as compared to DFT, when predicting the Hamiltonian matrix entries of systems made of ~5,000 atoms. We then replace the DFT-computed Hamiltonian of these VCMs with the predicted one to compute their energy-resolved transmission function with a quantum transport tool. A qualitatively good agreement between both sets of curves is obtained. Our work provides a path forward to overcome the memory and computational limits of DFT, thus enabling the study of large-scale devices beyond current ab-initio capabilities.
💡 Research Summary
The paper tackles a fundamental bottleneck in first‑principles device simulation: the construction of the Hamiltonian matrix H, which in density‑functional theory (DFT) scales as O(N³) and becomes prohibitive for amorphous or defect‑rich structures containing thousands of atoms. The authors propose a machine‑learning (ML) pipeline that directly predicts H for large‑scale valence‑change memory (VCM) cells, thereby bypassing the expensive DFT step.
The core of the approach is an equivariant graph neural network (EGNN) that respects the rotational covariance of H when expressed in a localized spherical‑harmonic basis. The network consists of a single message‑passing layer with multi‑head attention, node embeddings representing on‑site Hamiltonian blocks, and edge embeddings for off‑site interactions. By embedding the SO(3) symmetry directly into the convolutional kernels, the model learns the correct transformation behavior with far fewer training examples than a conventional neural network.
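Concretely, SO(3) equivariance means that rotating the atomic structure must transform each orbital block of H by the corresponding Wigner D matrices; for real p orbitals (l = 1) in a Cartesian convention, the D matrix is simply the 3×3 rotation matrix itself. The sketch below illustrates this constraint with a toy pair model for a p–p block (the functional form `a*I + b*r̂r̂ᵀ` is purely illustrative, not the authors' network), and checks numerically that it transforms the right way:

```python
import numpy as np
from scipy.spatial.transform import Rotation

def pp_block(r, a=0.5, b=1.2):
    """Toy p-p (l=1) Hamiltonian block for a bond vector r.
    H(r) = a*I + b * r_hat r_hat^T depends only on the bond direction,
    which makes it equivariant by construction (illustrative form only)."""
    r_hat = r / np.linalg.norm(r)
    return a * np.eye(3) + b * np.outer(r_hat, r_hat)

rng = np.random.default_rng(0)
r = rng.normal(size=3)                                   # bond vector between two atoms
R = Rotation.from_euler("zyx", [0.3, 0.5, 0.1]).as_matrix()  # some rotation

H = pp_block(r)
H_rot = pp_block(R @ r)  # block predicted for the rotated geometry

# Equivariance: rotating the structure conjugates the block with the
# l=1 Wigner D matrix, which for real p orbitals is R itself.
assert np.allclose(H_rot, R @ H @ R.T)
```

An EGNN enforces this transformation law for all (l, l') block pairs through its equivariant kernels, so the network never has to learn rotated copies of the same local environment from data.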
Training data are generated from two VCM structures (TiN‑HfO₂‑Ti/TiN stacks) in which oxygen vacancies are inserted randomly, yielding 20 thin (1 Å) slices that together contain roughly 40,000 atoms. Each slice is processed with CP2K DFT to obtain the exact Hamiltonian, which serves as the ground‑truth label. Crucially, the test set consists of four unseen configurations produced by kinetic Monte Carlo (KMC) simulations: two with fully formed conductive filaments and two with broken filaments. This deliberate distribution gap forces the model to generalize beyond the specific vacancy patterns seen during training.
To handle the memory constraints of a single GPU, the authors employ “augmented partitioning”: the full 5,268‑atom device is divided longitudinally into slices while preserving inter‑slice atomic connectivity. The EGNN is trained on these partitions and later applied to the whole device in a single forward pass.
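One way to realize such a partitioning is to slice atoms by their longitudinal coordinate and augment each slice with "halo" atoms from neighboring slices that lie within the interaction cutoff, so that inter‑slice Hamiltonian couplings remain visible during training. The sketch below follows that idea; the slice width, cutoff `rc`, and the exact augmentation rule are assumptions for illustration and may differ from the authors' scheme:

```python
import numpy as np

def partition_with_halo(z, slice_width, rc):
    """Split atoms into longitudinal slices of width slice_width and augment
    each slice with 'halo' atoms within a cutoff rc of its boundaries, so
    inter-slice couplings are preserved (illustrative sketch)."""
    z = np.asarray(z)
    idx = np.floor((z - z.min()) / slice_width).astype(int)  # slice index per atom
    parts = []
    for s in range(idx.max() + 1):
        lo = z.min() + s * slice_width
        hi = lo + slice_width
        core = np.where(idx == s)[0]                          # atoms owned by this slice
        halo = np.where((idx != s) & (z >= lo - rc) & (z < hi + rc))[0]
        if core.size:
            parts.append((core, halo))
    return parts

# Example: 100 atoms spread over 50 Angstrom, 1 Angstrom slices, 3 Angstrom cutoff
rng = np.random.default_rng(1)
z = rng.uniform(0.0, 50.0, size=100)
parts = partition_with_halo(z, slice_width=1.0, rc=3.0)
assert sum(core.size for core, _ in parts) == z.size  # every atom owned exactly once
```

At inference time no partitioning is needed: the trained network processes the full 5,268‑atom graph in one forward pass.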
Performance metrics show that the predicted on‑site (node) and off‑site (edge) Hamiltonian entries have mean absolute errors (MAE) of 1.5–1.8 meV and 0.12 meV respectively, leading to an overall MAE of 3.39–3.58 meV for the full matrix. These errors are comparable to the state‑of‑the‑art DeepH‑2 model (2.2 meV) but achieved on structures more than thirty times larger (≈5,000 atoms versus ≤150 atoms).
The ML‑generated Hamiltonians are fed into the Omen quantum‑transport simulator, which uses the non‑equilibrium Green’s function (NEGF) formalism to compute the energy‑resolved transmission function T(E) and the resulting current I under a 1 V bias. While the detailed shape of T(E) (peak positions and amplitudes) deviates from the DFT reference, the integrated current‑voltage characteristics match closely, especially for the fully formed filament cases. For broken‑filament configurations, where the transmission is very low, the current is more sensitive to Hamiltonian errors, yet the ML model still correctly identifies the low‑conductance state.
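The step from T(E) to current follows the standard Landauer formula of the NEGF framework, I = (2e/h) ∫ T(E) [f_L(E) − f_R(E)] dE, with f_L and f_R the Fermi functions of the two contacts. The sketch below evaluates this integral for a synthetic, single‑resonance T(E); the Lorentzian shape, the symmetric splitting of the 1 V bias, and spin degeneracy are assumptions for illustration, not values from the paper:

```python
import numpy as np

def landauer_current(E, T, V, mu=0.0, kT=0.025):
    """Current from an energy-resolved transmission T(E) via the Landauer
    formula I = (2e/h) * integral T(E) [f_L - f_R] dE (spin-degenerate).
    E, mu, V, kT in eV/V; bias assumed split symmetrically across contacts."""
    fermi = lambda E, mu: 1.0 / (1.0 + np.exp((E - mu) / kT))
    fL, fR = fermi(E, mu + V / 2.0), fermi(E, mu - V / 2.0)
    g = T * (fL - fR)
    integral_eV = np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(E))  # trapezoidal rule
    e, h = 1.602176634e-19, 6.62607015e-34                     # C, J*s
    return (2.0 * e / h) * integral_eV * e                     # eV -> J; result in A

# Synthetic transmission with one Lorentzian resonance (illustrative only)
E = np.linspace(-1.0, 1.0, 2001)            # energy grid in eV
T = 0.8 / (1.0 + ((E - 0.1) / 0.05) ** 2)   # peak inside the bias window
I = landauer_current(E, T, V=1.0)           # current at 1 V bias, in amperes
```

Because the current is an energy integral over T(E), moderate errors in individual peak positions or heights can partially average out, which is consistent with the close match in integrated current despite visible deviations in the spectra.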
Inference time is dramatically reduced: a full Hamiltonian prediction takes ~2 seconds on a GPU, compared with ~3.94 node‑hours for the corresponding DFT calculation. Consequently, the approach enables rapid generation of hundreds of intermediate Hamiltonians along a switching trajectory, amortizing the modest training cost (< 40 node‑hours).
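A quick back‑of‑the‑envelope calculation shows how fast the training investment pays off; the ~2 s GPU inference time is treated here as roughly interchangeable with node‑hours, which is an approximation:

```python
# Break-even point: number of Hamiltonians after which training pays off
dft_cost = 3.94            # node-hours per DFT Hamiltonian (from the paper)
training_cost = 40.0       # node-hours, upper bound quoted in the paper
inference_cost = 2.0/3600  # ~2 s per ML prediction, in hours (GPU, approximate)

n_break_even = training_cost / (dft_cost - inference_cost)
# roughly 10-11 structures; hundreds of snapshots along a switching
# trajectory therefore amortize the training cost many times over
```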
The authors acknowledge limitations: the transmission spectra lack some fine features, and the model’s accuracy declines for low‑conductance states. Future work will explore larger and more diverse training datasets, more expressive equivariant architectures (e.g., Equiformer‑v2), and loss functions that directly target transport observables. They also envision extending the framework to other emerging memory technologies such as phase‑change memories, where amorphous‑to‑crystalline transitions could be modeled without resorting to costly DFT calculations.
In summary, this study demonstrates that an EGNN combined with augmented partitioning can learn to predict DFT‑level Hamiltonians for thousands‑atom, non‑periodic devices with few‑meV accuracy, and that the resulting ML‑Hamiltonians are sufficiently accurate to reproduce key transport characteristics. This opens a pathway toward atomistic, first‑principles simulation of realistic device geometries that were previously out of reach for conventional electronic‑structure methods.