MACE-POLAR-1: A Polarisable Electrostatic Foundation Model for Molecular Chemistry

Accurate modelling of electrostatic interactions and charge transfer is fundamental to computational chemistry, yet most machine learning interatomic potentials (MLIPs) rely on local atomic descriptors that cannot capture long-range electrostatic eff…

Authors: Ilyes Batatia, William J. Baldwin, Domantas Kuryla

MACE-POLAR-1: A Polarisable Electrostatic Foundation Model for Molecular Chemistry
MA CE-POLAR-1: A P olarisable Electrostatic F oundation Mo del for Molecular Chemistry Ily es Batatia, 1 William J. Baldwin, 1 Doman tas Kuryla, 2, 1 Joseph Hart, 1, 3 Elliott Kasoar, 4, 1 Alin M. Elena, 4 Harry Moore, 1, 5 Mik oła j J. Ga wk o wski, 6 Benjamin X. Shi, 7 V enk at Kapil, 6 P anagiotis K ourtis, 8 Ioan-Bogdan Magdău, 8 and Gab or Csan yi 1, 9 1 Engine ering L ab or atory, University of Cambridge, T rumpington St, Cambridge, UK 2 Y usuf Hamie d Dep artment of Chemistry, University of Cambridge, L ensfield R o ad, Cambridge, UK 3 Cavendish L ab or atory, University of Cambridge, J. J. Thomson A venue, Cambridge, UK 4 Scientific Computing Dep artment, Scienc e and T e chnolo gy F acilities Council, Dar esbury L ab or atory, Ke ckwick L ane, Dar esbury W A4 4AD, UK 5 Ångstr öm AI, San F r ancisc o, California, USA 6 Dep artment of Physics and Astr onomy, University Col le ge L ondon, 7-19 Gor don St, L ondon WC1H 0AH, Unite d Kingdom 7 Initiative for Computational Catalysis, Flatir on Institute, 160 5th Avenue, New Y ork, NY 10010 8 Scho ol of Natur al and Envir onmental Scienc es, Newc astle University, Newc astle up on T yne, UK 9 Max Planck Institute for Polymer R ese ar ch, A ckermannwe g 10, Mainz, Germany (Dated: F ebruary 24, 2026) A ccurate mo delling of electrostatic interactions and c harge transfer is fundamental to compu- tational c hemistry , yet most mac hine learning interatomic p otentials (MLIPs) rely on lo cal atomic descriptors that cannot capture long-range electrostatic effects. W e present a new electrostatic foun- dation mo del for molecular chemistry that extends the MA CE architecture with explicit treatment of long-range in teractions and electrostatic induction. Our approac h combines lo cal many-bo dy geometric features with a non-self-consisten t field formalism that up dates learnable charge and spin densities through polarisable iterations to model induction, follow ed b y global c harge equilibration via learnable F ukui functions to con trol total charge and total spin. This design enables an ac- curate and physical description of systems with v arying c harge and spin states while maintaining computational efficiency and ease of training. T rained on the OMol25 dataset of 100 million h ybrid DFT calculations, our models ac hieve chemical accuracy across diverse b enc hmarks, with accuracy comp etitiv e with hybrid DFT on thermo chemistry , reaction barriers, conformational energies, and transition metal complexes. Notably , w e demonstrate that the inclusion of long-range electrostatics leads to a large impro v ement in the description of non-co v alen t in teractions and supramolecular complexes ov er non-electrostatic mo dels, including sub-kcal/mol prediction of molecular crystal for- mation energy in the X23-DMC dataset and a fourfold improv ement ov er short-ranged mo dels on protein-ligand interactions. Our mo del also demonstrates an impro v ed description of ions and redox reactions of transition metals in solution. The mo del’s ability to handle v ariable charge and spin states, respond to external fields, provide interpretable spin-resolved charge densities, and main- tain accuracy from small molecules to protein-ligand complexes p ositions it as a v ersatile to ol for computational molecular chemistry and drug discov ery . I. In tro duction Electrostatic in teractions go v ern the structure, dynam- ics, and function of molecular systems throughout chem- istry and biology . F rom the intricate folding of proteins stabilised by salt bridges, to the sp ecific recognition of substrates by enzymes through complementary c harge distributions, electrostatics dictate molecular behaviour at man y scales. In drug discov ery , the electrostatic com- plemen tarity b et w een a ligand and its protein target is a primary determinant of binding affinity and sp eci- ficit y . In m aterials science, charge transfer and p olari- sation phenomena at interfaces control the p erformance of catalysts, batteries, and electronic devices. Despite this fundamen tal imp ortance, the accurate and efficient mo delling of electrostatic interactions remains one of the cen tral c hallenges in computational chemistry . Mac hine learning in teratomic p oten tials (MLIPs) hav e emerged as a transformative to ol for computational chemistry , reac hing the accuracy of quan tum mechanics (QM) at a fraction of the computational cost. The leading MLIP arc hitectures [1–5], most notably message-passing neu- ral netw orks, hav e achiev ed success by learning complex and accurate relationships b etw een the lo cal atomic en- vironmen t of an atom and its contribution to total en- ergy . Although these lo cal descriptors excel at captur- ing short-range quantum effects suc h as cov alent b ond- ing, Pauli repulsion, and short-range electrostatics, they are, b y construction, unable to model the long-range na- ture of electrostatics or disp ersion. In man y systems, long-range electrostatic interactions are screened, and lo- cal mo dels can accurately repro duce many observ ables of complex chemical systems. Ho wev er, the absence of long-range in teractions b ecomes a critical failure p oin t for c harged molecules, ionic materials, and large biomolecu- lar complexes, where long-range electrostatics is a domi- nan t ph ysical in teraction. Moreo v er, this limitation pre- v ents these models from resp onding to external electric 2 fields, correctly mo delling charge transfer b et ween dis- tan t fragments, or describing p olarisation in extended systems. T o address this, several strategies hav e been developed to incorp orate electrostatics into MLIPs; for comprehen- siv e reviews see Refs. [6–9]. The simplest approaches pre- dict partial charges or atomic m ultipoles directly from lo- cal geometry . Early examples include the 3 rd Generation Neural Netw ork (3GNN) [10, 11], the p olarisable m ulti- p olar electrostatic p otential [12] and PhysNet [13]. The LODE [14] computes long-range features using equiv- arian t descriptors as sources and differen t algebraic de- ca ys to mo del electrostatics and dispersion effects. Al- though these mo dels include long-range electrostatic in- teractions through a Coulomb term, they cann ot handle systems with v arying total charge. Subsequent architec- tures in tro duced mechanisms to enforce a specific total c harge, either through global charge em b eddings, as in Deep P oten tial Long Range (DPLR) [15, 16], the La- ten t Ew ald Summation (LES) metho d [17], or the SO3LR mo del [18]; through multi-h ypothesis total-charge equi- libration in FENNIX [19, 20]; or through learned F ukui functions as in AIMNet-NSE [21]. How ev er, predicting c harges purely from lo cal geometry has fundamental limi- tations: such mo dels cannot capture induced p olarisation b et w een w ell-separated subsystems or long-range charge transfer [7, 22]. More sophisticated approac hes employ charge equili- bration (QEq) schemes [23–25], as first demonstrated in the CENT architecture [26–28] and s ubsequen tly in 4G- HDNN [29], kQEq [30], and BAMBOO [31]. These mo d- els define an energy functional quadratic in c harges, with electronegativities and hardnesses predicted from local geometry , and obtain charges by minimising this func- tional, similar to the self-consistency lo op of densit y func- tional theory (DFT). Although these mo dels can capture induced p olarisation through their self-consistency lo op, classical QEq suffers from fundamental deficiencies, in- cluding incorrect fractional charge separation up on dis- so ciation [32–34] and unphysical cubic scaling of po- larisabilit y with system size [35, 36], leading to metal- lik e ov er-screening in large insulating systems [37] that mak es them inadequate for biomolecules. Alternativ e self-consisten t mac hine learning approaches based on electronic structure theory [38], suc h as SCFNN [39] using iteratively up dated W annier function cen tres, eMLP [40] treating W annier centres as pseudo-atoms, and BpopNN [41] inspired b y orbital-free DFT, can cap- ture phenomena lik e induced p olarisation and long-range c harge transfer, but are significantly more cumbersome to train and deploy b ecause of their self-consistent lo op and often require constrained DFT data for training. A central goal of atomistic mo delling is to obtain in- teratomic p otentials that can b e deploy ed out of the b o x across broad c hemical space, while retaining near– ab ini- tio accuracy for energies and forces. F oundation force fields [42–44] pursue this b y pretraining on large, chemi- cally div erse datasets and transferring across phases and c hemistries. The release of large-scale datasets and foun- dation mo dels in b oth materials and molecular c hemistry has transformed computational chemistry by demo cratis- ing the use of MLIPs for a wide range of c hemists. In molecular chemistry , the dev elopment of transfer- able p otentials has follow ed a distinct tra jectory shap ed b y the challenges of v ariable charge, spin, and long- range interactions. Moreo ver, the accurate descrip- tion of molecular c hemistry often requires high lev- els of electronic structure theory (hybrid DFT or even w av efunction metho ds), whic h significan tly increases the cost of generating large and diverse datasets. Early transferable MLIPs such as ANI-1 and ANI-2x estab- lished broad co v erage across neutral closed-shell small organic molecules.[45–47] MACE-OFF [48], trained on the SPICE dataset [49], demonstrated that pre-trained short-range MLIPs can reach ab initio accuracy across a wide range of c hemical systems, from molecular liq- uids and crystals to drug-like molecules and biop oly- mers. AIMNet [50] and AIMNet2 [51, 52] introduced neural spin equilibration to handle c harged and op en- shell species, enabling accurate treatmen t of radicals and small-molecule reactivity . SO3LR [53] combined learned short-range in teractions with explicit long-range electro- statics and disp ersion computed from lo cal c harges, im- pro ving generalisation to condensed-phase environmen ts. Domain-fo cused foundation mo dels targeting biomolec- ular simulations, such as F eNNix-Bio [19, 20], hav e ac hieved accuracies comp etitiv e with classical force fields for protein and nucleic acid dynamics. The release of the OMol25 dataset [54] has represen ted a breakthrough for large-scale pretraining in molecular chemistry , ow- ing to its unprecedented size and chemical cov erage, computed throughout at the ω B97M-V range-separated h ybrid lev el of theory . Short-range MLIPs trained on OMol25—including UMA [44], MACE-OMol [54], Orb- Mol [4], and MA CE-MH-1 [55]—hav e demonstrated un- preceden ted accuracies on molecular b enchmarks, es- tablishing ML IPs as a credible replacement for semi- empirical methods and, increasingly , for DFT itself in molecular simulations. In this w ork, we in troduce a new family of electro- static foundation models, MA CE-POLAR-1 , that pro- vide a physics-based treatment of long-range electro- static in teractions through induction effects while retain- ing computational efficiency . Our mo del builds on the pro ven accuracy of the MACE architecture for short- range interactions and incorporates long-range electro- statics through a no v el non-local field update. Sequen- tial p olarisable up dates to a non-lo cal c harge density re- fine atomic m ultip oles in resp onse to the electrostatic p oten tial; after eac h update, the total c harge and spin of the system are equilibrated using learnable, en vironmen t- dep enden t F ukui functions, a mechanism inspired by con- ceptual density functional theory and the AIMNet-NSE mo del [21]. This design captures the essen tial physics of p olarisation and charge transfer while av oiding the cost and p otential instability of self-consistent field iter- 3 ations. W e demonstrate the capabilities of this approach b y training MACE-POLAR-1 mo dels on 100 million di- v erse molecular structures from the OMol25 dataset and conducting an extensive b enc hmark campaign spanning thermo c hemistry , reaction barriers, conformational ener- gies, transition metal complexes, protein–ligand interac- tions, supramolecular complexes, molecular crystals, re- do x chemistry in solution, and molecular liquids. Our results show that the explicit inclusion of physics-based electrostatics dramatically improv es the description of c harged systems and non-co v alen t complexes, achieving state-of-the-art accuracy on challenging b enc hmarks suc h as molecular crystal lattice energies and protein–ligand binding. The mo del’s abilit y to handle v ariable charge and spin states, respond to external fields, provide in ter- pretable spin-resolved charge densities, and maintain ac- curacy from small molecules to protein–ligand complexes p ositions it as a versatile to ol for computational molecu- lar chemistry , drug disco very and bio-sim ulations. I I. Theory and Metho ds A. Theoretical F oundation The accurate description of molecular systems requires capturing b oth short-range and long-range in teractions. While message-passing neural net w orks excel at learn- ing lo cal quantum mechanical in teractions, they cannot, b y construction, capture in teractions b eyond their cutoff radius. Our approach addresses this fundamen tal limita- tion through explicit treatmen t of long-range interactions while maintaining the flexibility and generalisability of neural netw ork p oten tials for short-range interactions. The total energy of an atomistic system is decomp osed in to short-range and long-range contributions: E total = E local + E non-local + E electrostatic (1) where E local captures most of the contributions to the energy , including cov alent b onding, Pauli repulsion, and short-range electrostatic effects, and is predicted b y a lo cal MLIP mo del; E electrostatic describ es smeared long- range Coulom bic in teractions; and E non-local accoun ts for a learned correction that captures residual non-lo cal ef- fects beyond pure electrostatics, including dispersion. In the next section, w e explain ho w we restrict the flexibility of E non-local b y parametrising it as a function of local ge- ometry and a non-lo cal c harge density . The lo cality of the E local term dep ends on the c hoice of hyper-parameters for the lo cal MLIP , and it typically describ es interactions b e- t ween 10 and 20 Å. F or most MLIP mo dels, this depends on the num ber of la yers used for message passing and the receptiv e field at each lay er. See T able I for the receptive field of the models tested in this pap er. B. Mo del Architecture Our mo del extends the MACE [1] architecture with ex- plicit long-range interactions through a field-dep endent induction mec hanism. The k ey addition is a non-self- consisten t field formalism that up dates atomic multipoles based on the lo cal electric field while main taining com- putational efficiency . F or a detailed description of the MA CE arc hitecture, see [1, 56]. F or a full exp osition of the design space of electrostatics extensions to the MA CE architecture, see our companion pap er on the de- sign space of self-consistent electrostatics extensions to MLIPs [8], which also presen ts a self-consisten t v ersion of the field c harge equilibration used here. W e first predict lo cal features using ( T ) MA CE la y- ers, forming no de features at eac h lay er h ( t ) i,klm , where i is the atom index, k is the channel index of the MACE features, and l m are the usual spherical indices. These no de features encode rich man y-b ody information ab out the geometry and chemistry of the lo cal environmen t of atom i . W e also read out a lo cal contribution to the energy E local . In the rest of the pap er, we denote fea- ture vectors in b old, with the ( k lm ) indices implicit, for example ( h ( t ) i ) klm := h ( t ) i,klm . T o mo del b oth the electro- static and non-lo cal contributions to the energy , w e use a physics-based model of long-range interactions. Belo w, w e outline ho w we introduce an explicit spin-charge den- sit y in the mo del as a proxy feature for long-range in ter- actions, and use it to compute electrostatic and non-lo cal con tributions. The main motiv ation to learn a ph ysics- based mo del of long-range interactions is to ensure that it can b e fitted to small and medium-sized systems acces- sible from ab initio data and extrap olated appropriately to larger systems. 1. Co arse-gr aining the ele ctr on density In our long-range extension, w e in tro duce a learnable spin-resolv ed charge density as the central non-lo cal fea- ture of the mo del that will contribute to the total en- ergy . W e refer to this quan tit y as the spin-c harge den- sit y . Before defining it formally , w e explain its physical con tent. Throughout this pap er, we use atomic units ( e = ℏ = m e = 4 π ϵ 0 = 1 ) unless stated otherwise. In electronic structure, the electron density of a molec- ular system, n ( r ) , is a p ositiv e function o v er r ∈ R 3 that in tegrates to the total n umber of electrons N el and giv es the exp ected num ber of electrons p er unit v olume at each p oin t in space. The charge density is the difference b e- t ween the p ositiv e nuclear c harges and the electron den- sit y: ρ ( r ) = X i Z i δ ( r − R i ) − n ( r ) , (2) where Z i is the nuclear charge of atom i at p osition R i . The charge density integrates to P i Z i − N el = Q , the 4 FIG. 1. Overview of the MA CE-POLAR-1 architecture and b enc hmarked applications. (A) Mo del architecture. A tomic p ositions and sp ecies are passed to a MACE mo del, which predicts lo cal no de features, a lo cal energy contribution, and an initial set of spin-charge multipoles. The lo cal features ric hly encode semi-lo cal geometry and chemistry . Then, the spin-c harge multipoles are iteratively refined through a long-range op eration. First, a physically inspired global conv olution maps the spin-charge multipoles into atom-centred electrostatic features v i,nlm . Then, a lo cal op eration follo wed by a global normalisation predicts a new set of spin-c harge multipoles. The pro cess is iterated twice, b efore the final set of multipoles is used to compute a Coulomb energy and an additional learned non-lo cal energy contribution. (B) Construction of long-range electrostatic features. Information is propagated betw een distan t atoms b y constructing a smooth charge density ρ ( r ) from the spin-c harge m ultip oles and conv olving with the Coulomb kernel to give an electric p otential v ( r ) . Then, one can pro ject the p oten tial on to atom-cen tred functions to giv e equiv arian t electrostatic features. (C) Ph ysical extrapolation capabilities: c harge lo calisation when fragmenting a cluster, resp onse of the model c harge densit y to an electric field, and correct prediction of the o xidation state of transition metal ions in water. (D) Application domains b enc hmarked: protein-ligand binding, molecular crystals, transition-metal redox p oten tials, and supramolecular complexes. total c harge. Because the n uclear c harge Z i around eac h atom is almost en tirely screened b y the approximately Z i electrons that surround it, the charge density is close to zero near each n ucleus in a neutral molecule: it repre- sen ts only the residual net charge arising from b onding, c harge transfer, and p olarisation. In this sense, ρ ( r ) is analogous to a deformation density in crystallograph y . In our model, rather than learning the electron densit y n ( r ) directly , w e learn a smo oth coarse-graining of the c harge densit y ρ ( r ) . This has a fundamen tal consequence for the energy decomp osition. The dominan t ele ctrostatic con tributions to the total energy—n uclear–n uclear repul- sion and n uclear–electron attraction—are short-ranged b ecause they are lo cally screened, and are absorb ed en- tirely into E local , whic h the MACE mo del learns from data. The explicit E electrostatic term accounts only for the muc h smaller long-range Coulom b in teraction of this smo oth residual density . The E non-local captures resid- ual long-range interactions b ey ond pure electrostatics, including dispersion, as a function of this smo oth c harge densit y and lo cal geometry . 2. Multip oles of the Char ge and Spin Density W e no w define the spin-charge density formally . F or a system of N atoms at positions ( R 1 , . . . , R N ) with chem- ical sp ecies ( Z 1 , . . . , Z N ) , total charge Q , and total spin S , the spin-charge densit y ρ ↑↓ ( r ) := { ρ ↑ ( r ) , ρ ↓ ( r ) } ∈ R 2 (3) giv es the spin-up ( ↑ ) and spin-down ( ↓ ) comp onen ts of the residual density at eac h point r ∈ R 3 as defined in Equation 2. Each spin channel of the c harge density is 5 therefore defined as, ρ ↑ ( r ) := X i Z i 2 δ ( r − R i ) − n ↑ ( r ) (4) ρ ↓ ( r ) := X i Z i 2 δ ( r − R i ) − n ↓ ( r ) (5) with n ↑ ( r ) the electron density of spin-up electrons, whic h integrates to the num ber of spin-up electrons N ↑ el , and n ↓ ( r ) the electron density of spin-do wn electrons, whic h integrates to the num b er of spin-down electrons N ↓ el . On eac h atom, half the n uclear c harge, Z i / 2 , is used to screen each spin c hannel of the electron density . Throughout the pap er, we leav e the dep endence on the atomic configuration, total charge, and total spin implicit and we write ρ ↑↓ ( r , R 1 , Z 1 , . . . , R N , Z N , Q, S ) := ρ ↑↓ ( r ) . This density ma y be non-local as its v alue at a p oin t r can depend on the p ositions and sp ecies of atoms far from r . Because the nuclear p oint c harges and core elec- tron density hav e b een coarse-grained aw a y through the screening described ab ov e, the residual c harge density ρ ( r ) is a smo oth, slowly v arying function that lacks the rapid oscillations and cusps of the full electronic den- sit y . Only this low-frequency component of the charge densit y gives rise to long-range electrostatic interactions; the high-frequency structure asso ciated with core elec- trons and nuclear cusps produces interactions that deca y rapidly with distance and is therefore captured by the lo cal MA CE mo del within E local . A low-order m ultip ole expansion with broad Gaussian smearing is consequently sufficien t to represen t ρ ( r ) for the purp ose of computing long-range electrostatics. W e expand the spin-charge density using atomic m ul- tip oles and Gaussian type orbitals (GTOs): ρ ↑ ( r ) = X i,lm p ↑ i,lm ϕ n =1 ,lm ( r − r i ) (6) ρ ↓ ( r ) = X i,lm p ↓ i,lm ϕ n =1 ,lm ( r − r i ) (7) ϕ nlm ( r − r i ) = C nl | r − r i | l exp  − | r − r i | 2 2 σ 2 n  Y lm  \ r − r i  (8) where p ↑ i,lm and p ↓ i,lm are the atomic multipole co efficien ts of spin-up and spin-down channels, resp ectiv ely , C nl is a normalisation constant, σ n is the Gaussian smearing width and Y lm are spherical harmonics. As mentioned ab o v e, we made implicit the dep endency of the multipole co efficien ts on the atomic p ositions, atomic species , and total charge and spin. The c harge density ρ and the spin densit y s can b e computed from the spin-charge density as shown in Eq. 9, by taking the sum or the difference of the tw o c hannels of the spin-charge density , ρ ( r ) = ρ ↑ ( r ) + ρ ↓ ( r ) , s ( r ) = ρ ↑ ( r ) − ρ ↓ ( r ) . (9) Throughout, we use the superscript ↑↓ to represent the spin c hannels on eac h quantit y . F or example, the t wo channels of spin-c harge densit y will b e written as ρ ↑↓ = { ρ ↑ , ρ ↓ } to simplify notation. The p ↑↓ i, 00 fea- tures correspond to up and down atomic charges, the { p ↑↓ i, 1 m } m ∈ [ − 1 , 0 , 1] to (up and down) atomic dip oles, and for l > 1 one obtains the higher-order multipoles, resolv- ing the spin-charge density at higher and higher resolu- tions. The spin and charge densities need to ob ey total normalisation constraints, Z s ( r ) d r = S, Z ρ ( r ) d r = Q, (10) where S corresp onds to the total n um b er of unpaired electrons and Q is the total c harge. W e adopt the conv en- tion S = N ↓ el − N ↑ el ≥ 0 , which in the absence of spin-orbit coupling is equiv alent to the opp osite conv en tion b y the spin-rev ersal symmetry of the non-relativistic Hamilto- nian and has no effect on any observ able. Restricting the non-lo cal contribution to the total energy to be a func- tion of a non-lo cal spin-charge density and lo cal geometry descriptors preven ts the mo del from learning ov erly flex- ible non-lo cal energy terms that w ould not extrapolate w ell. The use of tw o scalar fields for the spin-resolved densit y corresp onds to a coarse-graining of collinear un- restricted spin-DFT (spin-p olarised K ohn–Sham DFT). In particular, the mo del can correctly predict a non-zero spin density even for closed-shell systems, which mak es the mo del smo other and enables a b etter description of reactivit y . The inclusion of spin-orbit coupling w ould re- quire introducing a magnetisation vector field, whic h we lea ve for future work. 3. L o c al guess to the spin-char ge density Using the MACE no de features h ( t ) i,klm in eac h lay er t , w e first predict multipoles on each atom: ˜ p (0) , ↑↓ i,lm = T X t X k W ( t ) lk h ( t ) i,klm (11) where T is the num ber of lay ers of the lo cal MA CE mo del, and W is a learnable w eight matrix. These m ultip oles corresp ond to a lo cal baseline, capturing the geometrical and chemical dep endence of the spin-charge density that can b e well describ ed within the receptive field of the lo cal mo del. Then, w e equilibrate the monopoles ˜ p (0) , ↑↓ i, 00 to integrate to the correct total c harge and total spin using the F ukui 6 mec hanism, f (0) , ↑↓ i = MLP ( h ( T ) i,k 00 ) (12) p (0) , ↑ i, 00 = ˜ p (0) , ↑ i, 00 + f (0) , ↑ i P j f (0) , ↑ j   Q + S 2 − X j ˜ p (0) , ↑ j, 00   (13) p (0) , ↓ i, 00 = ˜ p (0) , ↓ i, 00 + f (0) , ↓ i P j f (0) , ↓ j   Q − S 2 − X j ˜ p (0) , ↓ j, 00   (14) where f (0) i are lo cally predicted F ukui features, Q is the target total charge and S the total spin. The v alues of Q + S 2 and Q − S 2 are the target spin-resolved normalisa- tion constraints for the ↑ and ↓ residual-c harge channels, resp ectiv ely . This equilibration is equiv alent to AIM- Net’s “neural charge equilibration” [21]. W e use the term "F ukui" here to emphasise the connection b et w een the f i features and the usual F ukui functions in conceptual DFT. In the supplemen tary information section V G, we deriv e an explicit connection of the F ukui features to con- ceptual DFT and show that the F ukui features are equal to the partial deriv ativ e of the monopole coefficients with resp ect to the c hemical p otential f ↑↓ i := ∂ p ↑↓ i ∂ µ . The equa- tions 13 and 14 can therefore b e understo o d as a first- order T aylor expansion of the monop oles, ∆ p ↑↓ i, 00 ≈ ∂ p ↑↓ i, 00 ∂ Q ↑↓ ∆ Q ↑↓ = ∂ p ↑↓ i ∂ µ P j ∂ p ↑↓ j ∂ µ ∆ Q ↑↓ (15) where the quantities ∆ Q ↑ = Q + S 2 − P j ˜ p (0) , ↑ j, 00 and ∆ Q ↓ = Q − S 2 − P j ˜ p (0) , ↓ j, 00 are the spin-resolv ed c hannel deficits. As total charge and spin normalisation only affect the monop oles, the rest of the multipoles (e.g. dip oles) are equal to the unequilibrated multipoles: { p (0) i,lm := ˜ p (0) i,lm } l> 0 . 4. L ong-r ange Polarisable Field Up dates and F ukui Equilibr ation In order to capture p olarisation effects and long-range c harge transfer, the mo del p erforms a series of long- ranged up dates to the spin-charge densit y , which are in- spired by a self-consisten t field lo op [8]. Eac h global up- date consists of three steps, whic h we call (1) long-ranged electrostatic feature construction, (2) lo cal multipole up- date, and (3) F ukui equilibration. W e no w describe each of these steps. In the following, u = 1 , ..., U denotes the up date itera- tion n umber (similar to lay er n umber t in the local part of the mo del), ∥ denotes the concatenation of v ectors, the differen t W represent learnable weigh t matrices and MLP stands for m ultilay er perceptron. 1. Long-Ranged Electrostatic F eature Con- struction F or each la y er u , w e first compute long-ranged electro- static features using the spin-charge density ρ ( u ) , ↑↓ ( r ′ ) . This is done by computing the spin-resolved electrostatic p oten tial v ( u ) , ↑↓ ( r ) generated b y the spin-charge densit y con volv ed with a 1 / | r | Coulomb kernel: v ( u ) , ↑↓ ( r ) = Z ρ ( u ) , ↑↓ ( r ′ ) | r − r ′ | d r ′ + 1 2 v app ( r ) (16) where the in tegral is o ver the whole space R 3 (or a three- dimensional flat torus for p erio dic systems). The poten- tial v app corresp onds to the p otential generated b y an externally applied electric field and is zero in the absence of an external field. Then, atom-centred electrostatic fea- tures are computed by pro jecting the electrostatic p oten- tial onto atom-cen tred Gaussian basis functions: v ( u ) , ↑↓ i,nlm = 1 N nl Z ϕ nlm ( r − r i ) v ( u ) , ↑↓ ( r ) d r (17) where ϕ nlm ( r − r i ) are the same atom-cen tred Gaussian basis functions as in Equation 5. One can c ho ose to use more basis functions ϕ nlm for the p otential than what w as used for the spin-charge density in Equations (6)- (8), to get a richer description of the potential. The smearing width and maximum angular momen tum can also b e chosen differently to those used in the spin-c harge densit y expansion. In a practical implementation, we do not compute ρ or v on a grid in order to compute the integrals in Equations 16 and 17, but use analytical Gaussian-orbital in tegrals or recipro cal-space transforms. Details of the pro jected-p oten tial ev aluation in op en and p eriodic b oundary conditions are given in SI Sections V E and V F. The electrostatic features are inherently non- lo cal due to the in tegral o v er the whole space. These features are closely related to the LODE features [14]. The main differences are that we constrain the sources of these electrostatic features to be the spin-charge densit y m ultip oles, and we consider only the Coulom b potential. 2. Multip oles Up date . F ollo wing this, the (non- lo cal) electrostatic features are com bined with local no de features to predict an updated set of spin-charge mul- tip oles p ( u +1) , ↑↓ i,lm . This begins b y join tly embedding the electrostatic features with the MACE lo cal features using an element-agnostic biased linear embedding: V ( u ) i,klm = X n, ↑↓ W ( u ) knl ↑↓ v ( u ) , ↑↓ i,nlm + X ↑↓ W kl ↑↓ p ( u ) , ↑↓ i,lm (18) + X ˜ k W ( u ) k ˜ k h ( T ) i, ˜ klm + b ( u ) k,lm δ lm, 00 (19) The ob ject V ( u ) i,klm join tly describ es the non-lo cal electric field, spin-charge density and the lo cal geometry . W e use these features to update the spin-charge density through 7 a series of nonlinear transformations: d ( u ) i,lk = X k ′ W ( u ) , dot lkk ′ l X m = − l h ( T ) i,klm V ( u ) i,k ′ lm (20) a ( u ) i,lk := MLP ( u )  [ d ( u ) i ∥ e ( z i )]  lk (21) g ( u ) i,klm = X k ′ W ( u ) , tp lkk ′ a ( u ) i,lk ′ h ( T ) i,k ′ lm (22) ∆ p ( u ) , ↑↓ i,lm = ReadoutMLP ( g ( u ) i ) lm (23) f ( u ) , ↑↓ i = ReadoutMLP ( g ( u ) i ) (24) ˜ p ( u ) , ↑↓ i,lm = p ( u ) , ↑↓ i,lm + ∆ p ( u ) , ↑↓ i,lm (25) with ( g ( u ) i ) klm := g ( u ) i,klm . The tilde on ˜ p ( u ) , ↑↓ i,lm indicates that these are “un-equilibrated” and the monop oles do not y et sum to the correct total charge and spin. The up date MLP in Equation 21 consists of 3 lay ers with 64 c hannels and SiLU activ ation. Both readouts in Equa- tions 24 and 23 use a tw o-la yer gated nonlinearity with hidden irreps 64x0e + 32x1o , employing SiLU for the scalar gate and sigmoid for the equiv arian t gate. Imp or- tan tly , the ab ov e op erations incorp orate non-lo cal infor- mation through the electrostatic fe atures, but the com- putation itself is lo cal since the new m ultip oles ˜ p ( u ) , ↑↓ i,lm on atom i are updated based only on the geometry and the electrostatic features around atom i . 3. F ukui Equilibration : After eac h prediction of the spin-charge density (including the initial lo cal pre- diction), we renormalise the total c harge and total spin using learnable F ukui functions. p ( u +1) , ↑ i, 00 = ˜ p ( u ) , ↑ i, 00 + f ( u ) , ↑ i P j f ( u ) , ↑ j   Q + S 2 − X j ˜ p ( u ) , ↑ j, 00   (26) p ( u +1) , ↓ i, 00 = ˜ p ( u ) , ↓ i, 00 + f ( u ) , ↓ i P j f ( u ) , ↓ j   Q − S 2 − X j ˜ p ( u ) , ↓ j, 00   (27) where f ( u ) , ↑↓ i are the predicted spin-resolved F ukui func- tions in Equation 24. This equilibration is equiv alen t to Equations 13 and 14, except that the F ukui functions are not restricted to purely lo cal features but can incorp orate non-lo cal information through the field up date. This op- eration is a global equilibration due to the normalisation in the denominators of Equations (26) and (27). The new spin-c harge multipoles p ( u +1) i, 00 no w re-enter at step 1. As with the previous equilibration, the rest of the multipoles (e.g. dip oles) are unc hanged, i.e. { p ( u +1) i,lm := ˜ p ( u ) i,lm } l> 0 . 5. Non-lo c al energy E non-local is a field and charge-dependent correction term that is added to the total energy to account for additional non-local energetic con tributions b ey ond the Coulom b interaction. W e compute it using an MLP on the sum of the dot pro duct of local geometry features, spin-c harge and electrostatic features, p emb i,klm = X ↑↓ W emb k p ( u ) , ↑↓ i,lm v emb i,klm = X n, ↑↓ W emb nk ↑↓ v ↑↓ i,nlm (28) d p i,lk = X k ′ W p, dot lkk ′ l X m = − l h i,klm p emb i,k ′ lm (29) d v i,lk = X k ′ W v , dot lkk ′ l X m = − l h i,klm v emb i,k ′ lm (30) E non-local = X i MLP([ d p i ∥ d v i ]) (31) where p emb i and v emb i are linearly scaled features of the c harges and electrostatic features and ∥ is the concate- nation of tw o vectors. W e use a 3-lay er MLP with SiLU activ ations and 128 hidden units in Equation 31. 6. Ele ctr ostatic Energy fr om Sme ar e d Multip oles The electrostatic energy E electrostatic is computed from the final multipole c harge density co efficients using a Gaussian smearing sc heme that ensures smo oth blending with the local energy at small distances. In the rest of this section, we only consider the final c harge density af- ter U iterations of the long-range up date, ρ ( r ) = ρ ( U ) ( r ) . The Gaussian basis functions defined in Eq. 8 enable an exact treatmen t of electrostatics through their analyt- ically tractable F ourier transforms. As discussed, the c harge density of the mo del is expanded in terms of atomic multipoles and Gaussian basis functions: ρ ( r ) = X i,lm p i,lm ϕ nlm ( r − r i ) (32) where p i,lm = p ↑ i,lm + p ↓ i,lm and ϕ nlm is defined in (8). The range of n is just one, since we only use one radial func- tion for the density expansion. The electrostatic energy is defined as E electrostatic = E Hartree + E app (33) = 1 2 Z Z ρ ( r ) ρ ( r ′ ) | r − r ′ | d r d r ′ + Z ρ ( r ) v app ( r ) d r (34) where v app is an applied p otential such as that from an applied field. T o ev aluate this energy in practice, we use separate implemen tations for real-space or perio dic computations. a. Non-Periodic: F or isolated molecular systems (op en b oundary conditions), we employ a direct real- space summation. Substituting the expression for ρ in 8 terms of p i,lm , the Hartree term b ecomes: E Hartree = 1 2 X ilm,j l ′ m ′ p i,lm p j,l ′ m ′ T ilm,j l ′ m ′ (35) T ilm,j l ′ m ′ := Z Z ϕ lm ( r − r i ) ϕ l ′ m ′ ( r ′ − r j ) | r − r ′ | d r d r ′ (36) F or example, the coefficient for the monop ole part of this sum is simply: T i 00 ,j 00 = erf ( r ij / √ 2 σ ) r ij (37) whic h is the familiar Gaussian Coulomb damping. F or the other coefficients, w e compute the in teraction coef- ficien t T ilm,j l ′ m ′ b y appro ximating one l = 1 Gaussian basis function in (36) as the difference of t wo l = 0 functions, whic h are sligh tly displaced relative to each other. This is equiv alen t to constructing a dip ole from t wo opp osite-sign charges. While it is p ossible to ev aluate l = 1 in tegrals analytically , we find this implementation to b e very efficient and transparen t. In the supplemen- tary information, it is shown that for Gaussian l = 1 m ultip oles, this appro ximation becomes exact for all sep- arations r ij —including when t wo Gaussians ov erlap—as the relativ e offset b etw een the t w o l = 0 functions ap- proac hes zero. In our implementation, we use an offset of 0.02 Å for represen ting Gaussian dip oles as the sum of tw o monop oles, allo wing us to use (37) to compute the c harge-dip ole and dip ole-dip ole in teractions. F ull de- tails of the algorithm are presented in the supplementary information. Note that the sum in the expression abov e includes i = j , meaning that we are including a self-energy term E self = 1 2 P i,lm p i,lm T ilm,ilm p i,lm . One cannot use the form ula (37) to compute this b ecause r ij = 0 , but these co efficien ts can be computed directly from (36) using the prop erties of Gaussian integrals. W e found that includ- ing this term is beneficial since it generally leads to a smo other predicted total electrostatic energy when atoms are close together, as has b een discussed previously [40]. The applied field term can b e ev aluated similarly: E app = X ilm p i,lm B i,lm (38) B i,lm = Z ϕ lm ( r − r i ) v app ( r ) d r (39) All the co efficients B i,lm can b e computed analytically as long as the applied p otential is a homogeneous field. b. Periodic: F or p eriodic systems, we do not con- sider applied fields, and compute the Hartree energy in recipro cal space. Provided the integral of ρ ( r ) o ver the sup ercell is zero, we can express the Hartree energy as: E electrostatic = E Hartree = Ω 2(2 π ) 6 X k ∈ Λ ⋆ k  = 0 4 π k 2 | ˜ ρ ( k ) | 2 (40) where Ω is the volume of the sup ercell, ˜ ρ ( k ) is the F ourier series of the charge density , Λ ⋆ is the recipro cal lattice, and the term k = 0 is omitted. This is tractable b e- cause the F ourier series of the charge density , ˜ ρ ( k ) , can b e computed analytically using the prop erties of Gaus- sian functions: ˜ ρ ( k ) = (2 π ) 3 Ω X i,lm p i,lm ˜ ϕ lm ( k ) e − i k · r i (41) ˜ ϕ lm ( k ) = C l ( − i ) l Y lm ( ˆ k ) I l,σ ( k ) (42) where Y lm are the spherical harmonics, and I l,σ ( k ) is the radial F ourier transform of the Gaussian basis, ev al- uated via cubic spline in terp olation for computational efficiency . F urther details are presented in the supple- men tary information. If the total charge of a p eriodic system is not zero, one can still assign an energy using equation (40), but in or- der to do meaningful calculations of, for instance, charged defect formation energies, correction terms should be ap- plied similarly to DFT [57]. The gradient of E electrostatic with resp ect to atomic p ositions and multipole co efficien ts pro vides the electro- static forces and field-resp onse terms that enter the train- ing loss and molecular dynamics propagation. 7. Summary of the mo del ar chite ctur e The mo del is summarised visually in Figure 1A, ex- plaining each step of the MACE-POLAR-1 model arc hi- tecture: 1. Start by inputting the atomic p ositions and sp ecies in to the local MA CE mo del to pro duce node fea- tures that em b ed the semi-local geometry and c hemistry around eac h atom. 2. Read out a lo cal energy contribution from the no de features, capturing the most lo cal and semi-lo cal in teractions (e.g., b onding, Pauli repulsion, and short-range electrostatics). 3. Predict an initial lo cal spin-charge density , ex- panded in an atom-centred spherical Gaussian ba- sis. 4. F rom the spin-charge density , compute the electro- static potential per spin channel and pro ject it in to atom-cen tred electrostatic features. 5. Use the electrostatic features together with the lo- cal geometry and curren t spin charge densit y to predict an additive up date to the densit y; also pre- dict tw o p er-atom scalars, the F ukui features. 6. Normalise the F ukui features so they sum to one o ver all atoms. 7. Use the normalised F ukui features to equilibrate the charges via the F ukui equilibration step. 9 8. Rep eat steps (5–7) several times (tw o iterations in the current models). 9. Use the final density to compute both the Coulomb energy and a non-local energy contribution. 10. Sum the lo cal, Coulomb, and non-lo cal energy terms to obtain the total energy and differentiate w.r.t p ositions to get the forces. C. T raining Strategy and Mo del Hyp er-parameters MA CE-POLAR-1-OMOL mo dels were trained on the energies and forces of 100 million OMol25 structures us- ing distributed training across 64 NVIDIA H200 GPUs. W e trained multiple model v arian ts and b enchmark the medium and large v ariants (MACE-POLAR-1-M , MA CE-POLAR-1-L ), which were obtained by v arying the num b er of local MA CE interaction lay ers. The full h yp er-parameter settings and loss weigh ts are listed in the SI (T able VI). F or all mo dels, w e use the non-linear in teraction blo cks introduced in [55]. W e use a weigh ted sum of L1 loss on the energy and L2 loss on the forces: L = w E | E − E ref | + w F ∥ F − F ref ∥ 2 (43) W e do not include any training on the partial charges that were av ailable in the dataset (Löwdin and NBO c harges) as w e found that it deteriorates the performance of the mo dels. At the time of training, the total dip oles w ere not a v ailable for the full dataset and will b e consid- ered when the data b ecome av ailable. T ABLE I. Receptiv e fields and explicit long-range ph ysics in b enc hmark ed models. Mo del Receptiv e Field Long-Range Coulom b MA CE-OMOL 18 Å ✗ MA CE-POLAR-1-M 12 Å ✓ MA CE-POLAR-1-L 18 Å ✓ UMA-S-1-OMOL 30 Å ✗ UMA-M-1-OMOL 66 Å ✗ ORBMOL 30 Å ✗ SO3LR 13.5 Å ✓ AIMNet2-NSE 15 Å ✓ Notes: The effective information range gro ws with the n umber of message-passing la yers, but without explicit long-range terms the learned in teractions remain limited b y the graph topology . I II. Results and Discussion A. Benc hmarking and Mo dels T o assess robustness and accuracy , w e assembled a comprehensiv e b enchmark suite spanning high-qualit y reference data in thermo c hemistry , reaction barriers, non-co v alen t interactions, conformers, protein-ligand in- teractions, molecular crystals, and transition metal com- plexes. Bey ond static single-p oin t b enc hmarks, we ev aluate molecular dynamics and geometry-optimisation tasks, including organic liquid densities, liquid-w ater ra- dial distribution functions, solv ated-ion spin dynamics, redo x potentials in w ater, and solv ated ion pairs. W e also b enc hmark a series of baseline mo dels: • MACE-OMOL [54]: A lo cal MACE mo del with the same lo cal arc hitecture and hyper-parameters as MACE-POLAR-1-L , trained on the full 100M OMOL dataset, but without the electrostatics part and including a global em b edding for the total c harge and total spin that feed into the initial no de features. This model represen ts a close ablation that enables us to isolate the improv emen ts from the electrostatic comp onent. • UMA-S/M-1P1 [44]: eSEN [3] mo del trained on 100M inorganic crystals of OMA T, 230M sur- faces/small molecules of OC20/OC22, 100M molec- ular configurations of OMOL, 25M molecular- crystal configurations of OMC and 29M metal or- ganic framew orks of ODA C. The dataset t yp es, the total charge, and the total spin are em b edded in the mo del as a global input. W e b enc hmark the OMOL v ariant throughout this pap er, with b oth the S-1.1 and the larger M-1.1 v arian ts referred to as UMA-S-1P1 and UMA-M-1P1 . • OrbMol : Orb-v3 [4] mo del trained on the 100M OMOL dataset, with a global embedding of total c harge and total spin. • g-xTB [58]: A semi-empirical extended tigh t- binding method parameterised for 103 elements us- ing reference energies and forces at the ω B97M- V/aTZ level. The paper do es not rep ort a sin- gle total training-set size; based on the stated 8,000–25,000 training data p oints p er element (up to ∼ 40,000 for key elemen ts), w e estimate approx- imately 1.1–2.7 million configurations ov erall. B. Thermo c hemistry and Reactions Thermo c hemistry and reaction barriers are core tests of b onding, electron redistribution, and transition-state energetics. W e ev aluate the mo dels on the curated GSCDB138 database [59], an up date of GMTKN55 [60] and MGCDB84 [61] with improv ed CCSD(T) references 10 Ionization Potentials Electron Af finities Proton Af finities Mindless Molecules (MB08/16) Bond Dissociation Energies Radical Stabilization 0 5 10 15 20 25 WTMAD-2 (kcal/mol) a Thermochemistry Barrier Heights Proton T ransfer General Reaction Energies 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 WTMAD-2 (kcal/mol) b Reaction Barriers wB97M-V g-xTB UMA-S-1P1 UMA-M-1P1 MACE-OMOL ORBMOL MACE-POLAR-1-M MACE-POLAR-1-L FIG. 2. Thermochemistry and reaction barrier b enchmarks on GSCDB138 subsets. Bar heights show the weigh ted total mean absolute deviation (WTMAD-2) in kcal/mol for eac h mo del, where low er v alues indicate better accuracy . WTMAD-2 rescales errors by the characteristic energy scale of eac h subset to enable fair comparison across datasets of differen t magnitudes (definition in SI V A). F or these summary bars, extreme outliers are filtered by excluding points with | ∆ E | > 100 kcal/mol (details in SI V B). (a) Thermo chemistry subsets group ed by prop erty class: ionisation p oten tials, electron affinities, proton affinities, and b ond dissociation energies. (b) Reaction-barrier subsets: barrier heights, proton-transfer reactions, and general reaction energies. Models compared: ω B97M-V (reference hybrid DFT), g-xTB (semi-empirical), UMA-S-1P1 /UMA-M-1P1 , MA CE-OMOL , ORBMOL (lo cal MLIPs), and the electrostatic MACE-POLAR-1-M /MACE-POLAR-1-L . and better filtering for spin con tamination. The tested systems are typically small ( ≲ 20 atoms), where long- range electrostatic effects are exp ected to b e small. Ref- erence data are CCSD(T) extrap olated to CBS or F12- qualit y . W e report p erformance using the w eighted mean absolute deviation 2 (WTMAD-2), whic h rescales eac h subset by its energetic scale to yield a statistically rep- resen tative aggregate (as in the GMTKN conv en tion; see SI V A); benchmark lists and citations are pro vided in SI V D. F ull p er-set results app ear in Fig. 17 (SI). A small n umber of configurations are excluded p er mo del due to ev aluation failures or outlier filtering; criteria, counts, and full identifier lists are provided in SI V B. 1. Thermo chemistry Figure 2a shows WTMAD-2 results for the mo dels across thermo chemistry subsets grouped in to ma jor cate- gories. Ionisation p oten tials and electron affinities prob e the description of v arying electron num bers, includ- ing self-interaction and delo calisation errors and op en- shell, spin-p olarised energetics. The electrostatic mo dels roughly halve the error of other MLIPs on ionisation p o- ten tials b y representing total c harge and spin through a ph ysical spin-charge density with learnable equilibra- tion, rather than relying solely on flexible global embed- dings that generalise p oorly when electrons are added or remo ved. W e also see consisten t gains on electron affinities compared to the lo cal baseline MACE-OMOL . W e observe large improv emen ts on proton affinities, with the MA CE-POLAR-1 reac hing accuracies close to the reference DFT, ω B97M-V. The MACE-POLAR-1 mo dels bridge the gap to the semi-empirical g-xTB on these prop erties. The radical stabilisation sets show consisten t gains, reflecting that impro v ed extrap olation yields a b etter description of radicals. Mindless molecules (MB08/16) are synthetic, out-of-distribution structures made to test robustness far b ey ond the training set by constructing diverse and c hemically plausible molecules through random atomic placement and subsequent ge- ometry optimisation; surprisingly , all MLIPs remain rel- ativ ely close to ω B97M-V, underscoring robust extrap o- lation. 2. R e action Barriers Figure 2b aggregates datasets probing reaction barrier heigh ts of small molecules, proton transfer, and general reaction energies. Barrier heights test the energetics of transition states, whic h directly link to reaction kinet- 11 T ABLE I I. WTMAD-2 (kcal/mol) for thermo chemistry and reaction-barrier subsets with only H, C, N, O, F, P , S, and Cl elemen ts to supp ort SO3LR and AIMNet2-NSE. Datasets for each subset are listed in SI Sec. V D. Subset N sets wB97M-V g-xTB UMA-S-1P1 UMA-M-1P1 MACE-OMOL ORBMOL MACE-POLAR-1-M MACE-POLAR-1-L SO3LR AIMNet2-NSE FENNIX-BIO-2 TC: Bond Energies + HA T 7 1.57 5.30 3.16 2.18 4.68 5.63 2.75 3.80 26.46 9.27 14.18 TC: Reaction Energies 6 0.94 3.61 1.06 1.14 1.77 1.31 1.51 1.47 18.66 3.37 6.76 TC: Thermochemistry 9 1.22 5.31 2.22 2.47 3.27 2.94 2.39 2.34 27.22 4.87 11.15 BH: Barrier + Proton 10 1.27 3.56 1.79 1.56 2.06 2.62 2.69 1.91 14.27 6.75 18.32 ics, while reaction energies capture thermo dynamic gaps b et w een reactants and pro ducts; b oth require accurate description of local energetics. Proton-transfer barriers are highly relev an t in bio chemistry and energy materi- als (e.g., fuel cells and proton-conducting membranes) and in volv e coupled proton motion and h ydrogen-b ond rearrangemen t. A cross all barrier categories, the MLIPs p erform w ell relative to h ybrid DFT, reaching accuracies on the order of a few k cal/mol; differences are mo dest b ecause these small-system barriers are dominated by lo- cal structural c hemistry rather than long-range c harge redistribution. 3. Comp arison to other pr e-tr aine d mo dels on a subset of elements In order to compare the OMol-trained mo dels with other molecular foundation MLIPs like SO3LR [53], AIMNet2-NSE [52], and F ennix-2-bio [20] that do not co ver the full set of elements in GSCDB138, we curated a subset of the thermo c hemistry and reactivity b enc hmarks including those that con tain only H, C, N, O, F, P , S, and Cl elements. These three mo dels w ere trained on smaller databases than the OMol25 dataset: SO3LR was trained on 4M configurations at the PBE0+MDB lev el of the- ory , FENNIX-BIO-2 was trained on 2.2M at the ω B97M- D3(BJ) lev el, and AIMNet2-NSE was trained on 33M configurations at the ω B97M-D3(BJ) level. T able I I sum- marises the WTMAD-2 v alues for these thermochemistry and reaction-barrier subsets. W e observe that the OMol mo dels significantly outp erform the other three mo dels, reflecting the b enefit of the scale of the OMol dataset. AIMNet2-NSE, which is trained on roughly one-third of the OMol dataset, is the mo del that p erforms b est after the OMol mo dels, whic h sho ws the crucial imp ortance of a large amount of high-quality data. Moreov er, b oth the SO3LR and FENNIX-BIO-2 models are mainly opti- mised for sp eed in order to b e used in bio-simulations and therefore use less expressive but faster arc hitectures than the other mo dels, whic h is reflected in their diminished accuracy on broad c hemistries. C. Non-Co v alen t Interactions A ccurate treatment of non-co v alen t in teractions (NCIs) is essen tial throughout chemistry and biology , from drug binding to protein folding. W e ev aluated the tested MLIPs on comprehensive benchmark sets of non-co v alen t interactions in small neutral systems, ionic dimers, and large supramolecular systems. Impro ving NCIs is a primary goal of adding long-range electrostatic in teractions. Figure 3a aggregates results on CCSD(T)/CBS bench- marks from GSCDB138, testing small non-cov alent sys- tems and probing hydrogen b onding, halogen b onding, disp ersion, charge transfer, and op en-shell and charge- neutral p otential-energy curves. In all these benchmarks, the maximum separation distances b etw een clusters fall within the 6 Å cutoff of the lo cal mo dels; hence, the mo dels can, in principle, repro duce the DFT reference. All MLIPs are broadly competitive, but the electrostatic MA CE-POLAR-1 models consistently rank among the b est, improving o v er the local MA CE-OMOL baseline and often ov er UMA-S-1P1 /UMA-M-1P1 and ORBMOL . This reflects better long-range electrostatics. P anels (c) and (d) isolate electrostatics-dominated tests: ionic hydrogen b onds (IHB100x10) and alk ali- halide disso ciation curv es. The MACE-POLAR-1 v ari- an ts reduce errors b y a factor of three for ionic h ydrogen b onds relative to lo cal baselines and capture the expected long-range 1/r Coulom bic b eha viour in LiCl/NaCl/KBr disso ciation, whereas local models flatten at long range due to their hard cutoff distance. T o test non-cov alent in teractions in supramolecular complexes, we use the S30L dataset [62] containing 30 host-guest complexes with up to 200 atoms and c harge states ranging from -1 to +4. V arious types of non- co v alen t interactions are presen t in the dataset, includ- ing hydrogen-halogen b onding, π − π stacking, and non- p olar dispersion. Figure 3f sho ws a 40% improv emen t in MAE from 7.31 k cal/mol for MACE-OMOL to 3.52–4.78 k cal/mol for MA CE-POLAR-1-L and MACE-POLAR- 1-M , respectively , with the largest gains observ ed for c harged host-guest systems and π -stac ked c omplexes. The ω B97M-V functional p erforms worse than g-xTB on this test. This is likely due to the po or description of three-b o dy disp ersion effects that are imp ortant in π in teractions and not present in the VV-10 dispersion cor- rection. D. Molecular Crystals W e further b enc hmark the mo dels on molecular-crystal systems. Molecular crystals are central in pharma- ceuticals, semiconductors, and agro c hemicals. Crystal- structure prediction (CSP) remains a long-standing chal- lenge in computational chemistry . One imp ortant sub- 12 S22 S66 XB20 X40 W A TER27 HB49 NC1 1 O24x4 0 1 2 3 4 MAE (kcal/mol) a Small NCI complexes QUID PLF547 0.0 0.2 0.4 0.6 0.8 1.0 MAE (kcal/mol) b CCSD Pocket-Ligand Dimers NCIA-IHB100x10 0.0 0.5 1.0 1.5 2.0 2.5 3.0 MAE (kcal/mol) c IHB100x10: Ionic Dimers 10 20 Separation (Å) 0 25 50 75 100 125 150 Interaction energy (kcal/mol) d LiCl PEC (gas) 10 20 Separation (Å) NaCl PEC (gas) 10 20 Separation (Å) KBr PEC (gas) PLA15 0 5 10 15 20 25 30 MAE (kcal/mol) e PLA15: Protein-Ligand Pockets S30L 0.0 2.5 5.0 7.5 10.0 12.5 15.0 MAE (kcal/mol) f S30L: Supramolecular Complexes X23 0.0 0.5 1.0 1.5 2.0 MAE (kcal/mol) g X23: Molecular Crystals Noncovalent Interactions wB97M-V g-xTB UMA-S-1P1 UMA-M-1P1 MACE-OMOL ORBMOL MACE-POLAR-1-M MACE-POLAR-1-L FIG. 3. Comprehensiv e ev aluation of non-co v alen t interaction accuracy . (a) Bar heigh ts show mean absolute errors in k cal/mol for small-molecule non-cov alent interaction datasets with CCSD(T)/CBS references: S22 and S66 (h ydrogen bonding and disp ersion), XB20 (halog en bonding), X40 (mixed in teractions), W A TER27 (w ater clusters), HB49 (div erse hydrogen b onds), NC11 (charge-transfer complexes), and O24x4 (p otential energy curves). (b) Mean absolute errors for protein-ligand fragmen t benchmarks: QUID (quantum-c hemistry dimers from p o c ket-ligand motifs) and PLF547 (protein-ligand fragments with MP2-F12 + DLPNO-CCSD(T) references). (c) Mean absolute errors on the IHB100x10 ionic hydrogen b ond dataset from NCI A tlas, where electrostatic polarisation is critical. (d) Poten tial energy curves for gas-phase alk ali halide disso ciation (LiCl, NaCl, KBr), plotting energy v ersus in teratomic distance to test long-range 1 /r Coulombic b ehaviour. (e) Mean absolute errors for PLA15 complete protein-ligand active sites (259–584 atoms). (f ) Mean absolute errors for S30L supramolecular host-guest complexes (up to 200 atoms, c harge states − 1 to +4 ). (g) Mean absolute errors for X23-DMC molecular crystal lattice energies with diffusion Monte Carlo references. In all bar c harts, lo wer v alues indicate better accuracy . 13 Cor onene A cridine Phthalimide Saccharin 5-Fluor ouracil Carbamazepine 10,11-Dihydr ocarbamazepine Cyheptamide Cytenamide Ox carbazepine F enamic acid Mefenamic acid T olfenamic acid Flufenamic acid Niflumic acid Chalcone Ibupr ofen Napr o x en Desloratadine Sulfamerazine 0 1 2 3 4 5 6 7 8 9 Absolute MAE (kcal/mol) Small rigid molecules Carbamazepine family F enamate family Small drug molecules UMA -S-1P1 UMA -M-1P1 MA CE- OMOL ORBMOL MA CE-POL AR -1-M MA CE-POL AR -1-L FIG. 4. Absolute lattice energy errors for CPOSS209 molecular crystals. Bar heights sho w mean absolute errors in k cal/mol for predicted lattice formation energies, group ed by molecular family . The dataset comprises 209 exp erimen tal and predicted p olymorphs from 20 small drug molecules. Reference calculations are p erformed at the ω B97M-D3(BJ) lev el with 1-b ody CCSD(T) corrections. Low er v alues indicate b etter accuracy . task in CSP is the stability ranking of molecular crystals. Lattice energies provide a first-order proxy for crystal sta- bilit y , making them an essential prop erty to predict accu- rately for CSP . The OMOL dataset con tains no perio dic structures and therefore represents a significant extrap- olation test for the OMOL-only trained mo dels (MA CE- OMOL , MACE-POLAR-1 mo dels, and ORBMOL ). T he extrap olation from clusters to bulk is a stringen t test of the physicalit y of the learned long-range interactions for the MACE-POLAR-1 mo dels, as the electrostatic in- teraction is formally infinite in the crystal. Both UMA mo dels hav e b een trained on a large num ber of p erio dic structures, including 20 million molecular crystals at the PBE level of theory in the OMC dataset [63], and it is therefore less of an extrap olation, even though we are using the OMOL task and not the OMC task. W e b enc hmark the mo dels on the X23-DMC [64] b enc hmark con taining 23 different neutral molecular crystals computed with Diffusion Monte Carlo, a high- lev el theory that has shown very go od accuracy on molec- ular crystals. Figure 3g rep orts MAEs (k cal/mol) on the lattice formation energies of the 23 molecular crystals of the dataset for all mo dels. The MACE-POLAR-1 v ari- an ts are the only mo dels b elo w 1 kcal/mol, with MACE- POLAR-1-L deliv ering the b est accuracy with 0.46 kcal/- mol, representing a three-fold improv emen t ov er the lo- cal baseline MACE-OMOL . This highlights the b enefit of explicit electrostatics for extrap olating from gas-phase molecules to bulk molecular crystals. F urthermore, we b enc hmark the mo dels on a more c hallenging CPOSS209 dataset of organic molecular crys- tals [65], containing 209 exp erimen tal and predicted p oly- morphs of 20 small drug molecules and precursors, each with 6–17 p olymorphs. The CPOSS209 crystal and gas phases, optimised at the PBE-TS level of theory , are used to determine molecular lattice energies, and w e use them to p erform additional reference ω B97M-D3(BJ) calcula- tions with 1-bo dy CCSD(T) corrections. Figure 4 sho ws the absolute MAE achiev ed by the mo dels for all molec- ular crystals in the CPOSS209 dataset. W e note the excellen t p erformance of the MACE-POLAR-1-M and MA CE-POLAR-1-L mo dels, which attain 1.06 kcal/mol and 1.22 kcal/mol, resp ectiv ely , significan tly improving on the lo cal mo dels, including MACE-OMOL (2.73 kcal/- mol). How ev er, all models p erform with similar accuracy when we consider an error metric based on the MAE of relativ e lattice energies, as shown in Figure 19. This er- ror metric is more relev an t to the CSP stability ranking task, and our findings may indicate substantial error can- cellation when computing relative lattice energies, as sub- tracting the low est polymorph as a baseline ma y also re- mo ve the influence of subtle long-range interactions that are similar betw een p olymorphs. The absolute lattice en- ergy is therefore a goo d pro xy to demonstrate that the MA CE-POLAR-1 mo dels correctly predict the relative energetics of p olymorphs for the right reasons and not error cancellation. 14 E. Protein F ragmen ts The accurate description of protein-ligand interactions is cen tral to drug discov ery , where intermolecular inter- actions b etw een ligands and their protein targets deter- mine binding affinit y and sp ecificity . These interactions are typically w eak compared to in tramolecular interac- tions and therefore challenging to capture from data for MLIPs. W e ev aluate our models using three complemen- tary benchmarks derived from protein-ligand complexes: QUID [66], PLF547, and PLA15 [67]. W e first consider the Quantum Interacting Dimer (QUID) b enc hmark in tro duced b y Tk atc henk o and co- w orkers [66], which targets ligand–p o c ket motifs with high-lev el reference data. QUID contains 170 dimers (42 equilibrium and 128 non-equilibrium) with up to 64 atoms spanning H, C, N, O, F, P , S, and Cl, built from nine large po c ket-lik e monomers from the Aquamarine dataset and tw o small ligand fragments (b enzene and imidazole). The reference interaction energies are com- puted using LNO-CCSD(T). W e observe that the MACE- POLAR-1 mo dels perform best, significantly impro ving o ver the other mo dels. All the dimers are within the lo- cal cutoff of the mo dels (6 Å) and therefore within the range of interactions that lo cal models can capture. The impro vemen t therefore reflects b etter capture of in ter- molecular in teractions. These in teractions are usually w eak, and the inclusion of physics-based electrostatic in- teractions enables b etter learning of s uc h weak signals b y creating a strong ph ysical prior. The PLF547 dataset contains 547 complexes of lig- ands with protein fragments (amino acid side chains and bac kb one segments), providing detailed insigh t into the individual contributions to binding. These fragments w ere generated by cutting bonds betw een C α and C β for side c hains and b et ween C and C α for backbone seg- men ts, with appropriate h ydrogen capping. The b ench- mark interaction energies are based on MP2-F12/cc- pVDZ-F12 calculations with DLPNO-CCSD(T) correc- tions, providing near-CCSD(T)/CBS qualit y references. On the PLF547 dataset (Fig. 3b), MACE-POLAR-1- L and MA CE-POLAR-1-M achiev e MAEs of 0.37 and 0.47 kcal/mol, resp ectiv ely , representing a substan tial im- pro vemen t o ver MACE-OMOL (1.08 k cal/mol), UMA-S- 1P1 (0.90 k cal/mol), and UMA-M-1P1 (0.82 kcal/mol). As in the QUID b enchmark, these dimers are within the lo cal cutoffs, and therefore this test prob es the ability of the mo dels to capture subtle intermolecular interactions within their cutoff. The PLA15 dataset extends this analysis to com- plete active site mo dels of proteins, capturing many- b ody effects including mutual p olarisation b etw een pro- tein residues. These 15 protein-ligand complexes contain 259–584 atoms, represen ting realistic drug binding sites. The b enchmark energies combine pairwise MP2-F12 + DLPNO-CCSD(T) con tributions of all dimers with DFT- D3 calculations to account for many-bo dy p olarisation effects. On PLA15 (Fig. 3e), MA CE-POLAR-1-L and MA CE-POLAR-1-M achiev e MAEs of 3.35–3.68 kcal/- mol, resp ectiv ely , dramatically outp erforming MACE- OMOL (29.9 kcal/mol). This order-of-magnitude im- pro vemen t highligh ts the critical imp ortance of long- range electrostatics in large biomolecular systems. The protein environmen t creates complex electrostatic fields that significantly mo dulate ligand binding, effects that cannot b e captured b y local descriptors alone. UMA- S-1P1 and UMA-M-1P1 show in termediate p erformance with MAEs of 15.13 k cal/mol and 13.67 k cal/mol, re- sp ectiv ely . The large interaction range of UMA-M-1P1 (66 Å) enables it to capture the full active site; how ev er, its p erformance is w orse than the shorter-ranged UMA- S-1P1 . This result highligh ts that the flexibility of mes- sage passing ma y not b e suitable for accurately learning long-range in teractions, and demonstrates that the more constrained physics-based approac h p erforms b etter. The excellent p erformance on b oth fragment-lev el and complete active site b enc hmarks demonstrates that MA CE-POLAR-1 mo dels accurately capture the hierar- c hy of interactions in protein-ligand binding: from in- dividual h ydrogen bonds to the collective electrostatic en vironment of the binding p o c ket. This capabilit y is es- sen tial for computational drug design, where an accurate ranking of binding affinities directly impacts lead optimi- sation success rates. F. T ransition Metals T ransition metal systems presen t unique challenges due to v ariable oxidation states, op en-shell electronic structures, and complex co ordination en vironments. W e b enc hmark ed the models on a series of tests of transition- metal chemistry tak en from the GSCDB b enchmark. All references are CCSD(T). Figure 5a (log scale) shows results for ionisation and b onding energies of transition metals, with a wide spread of errors across coinage complexes (CUA GA U83), Pd di- atomics (D APD), and TM diatomics (TMD10). CUA- GA U83 prob es coinage-metal complexes (Cu, Ag and Au); MACE-OMOL fails dramatically (MAE > 10 4 k cal/mol), indicating a hole in the potential, while MA CE-POLAR-1-M and MA CE-POLAR-1-L greatly re- duce the error to the 10 kcal/mol range, comparable to ORBMOL and b elo w UMA-S-1P1 and UMA-M-1P1 . On the DAPD subset that focuses on palladium diatomics, MA CE-POLAR-1-M and MACE-POLAR-1- L are the b est p erforming MLIPs, close to ω B97M-V and w ell ahead of the lo cal baselines. TMD10 co vers transition-metal diatomics with weigh ted MAE; MACE- POLAR-1-M /MA CE-POLAR-1-L again lead the MLIPs and track the DFT reference most closely . Figure 5b con tains a series of benchmarks on di- v erse transition metal complexes from the GSCDB [59]. MME52 [68] b enc hmarks metallo enzyme mo del reaction energies and barriers, MOBH28 targets organometal- lic barrier heights, ROST61 [69] prob es open-shell re- 15 CUAGAU83 Coinage systems DAPD Pd diatomics TMD10 TM diatomics 1 0 1 1 0 0 1 0 1 1 0 2 MAE (kcal/mol) MME52 Enzyme models MOBH28 Organometal barriers ROST61 Open-shell rxns MOR13 Closed-shell rxns TMB1 1 TM barriers 3dTMV 3d TM VIEs 0.0 2.5 5.0 7.5 10.0 12.5 15.0 17.5 20.0 MAE (kcal/mol) 37CONF8 ACONFL DipConfS Maltose222 MPCONF196 OpenFF-T ors UPU46 0.00 0.25 0.50 0.75 1.00 1.25 1.50 1.75 2.00 MAE (kcal/mol) Molecular conformers a b c T ransition Metals Chemistry wB97M-V g-xTB UMA-S-1P1 UMA-M-1P1 MACE-OMOL ORBMOL MACE-POLAR-1-M MACE-POLAR-1-L FIG. 5. Accuracy on transition metal complexes and molecular conformers. Bar heigh ts sho w mean absolute errors in k cal/mol for eac h mo del; low er v alues indicate b etter accuracy . (a) T ransition metal datasets on logarithmic scale due to large error ranges: CUAGA U83 (coinage metal Cu, Ag, Au complexes), DAPD (palladium diatomics), MOBH28 (organometallic barrier heigh ts), and TMD10 (transition metal diatomics). (b) T ransition metal datasets on linear scale: 3dTMV (v ertical ion- isation energies, ph-AFQMC references), MME52 (metallo enzyme models, DLPNO-CCSD(T) references), R OST61 (open-shell reactions), MOR13 (closed-shell reactions), and TMB11 (barrier heigh ts). (c) Conformational energy benchmarks: 37CONF8 (small organics), ACONFL (n-alk ane conformers), DipConfS (amino acids and dipeptides), Maltose222 (carb oh ydrates), MP- CONF196 (medicinal fragments), Op enFF-T ors (torsional profiles), and UPU46 (RNA backbone fragmen ts). All reference v alues are CCSD(T) or equiv alen t. The GSCDB138 transition metal sets use up dated references with spin-con taminated struc- tures remov ed. actions, MOR13 ev aluates closed-shell reactions and TMB11 measures transition-metal barrier heights. All 16 MLIPs cluster within a narro w band, with accuracy comparable to the underlying DFT; only ORBMOL sho ws significan tly higher errors. MACE-POLAR-1-M /MA CE-POLAR-1-L improv e on MACE-OMOL but re- main ab ov e UMA-S-1P1 . The 3dTMV b enc hmark [70] prob es vertical ionisation energies (VIEs) at the ph- AF QMC reference level; MLIPs cluster around 10–12 k cal/mol, with MACE-POLAR-1-L slightly improving o ver MACE-POLAR-1-M and MACE-OMOL and g-xTB remaining higher, while ω B97M-V is lo west. This accu- racy is notable given the m ultireference c haracter in this test set, where DFT itself is not highly accurate. G. Conformers Conformational energies are critical for applications ranging from drug design to molecular crystals. W e ev al- uated MACE-POLAR-1 mo dels and other MLIPs across div erse conformer b enchmarks spanning small organics to RNA fragmen ts (Fig. 5c). The MLIPs achiev e remark- able accuracy with MAEs b elow 0.5 k cal/mol for most systems, approac hing thermal-fluctuation limits at room temp erature. W e use the following conformer b ench- marks. 37CONF8 [71] targets diverse small organic con- formers. A CONFL [72, 73] fo cuses on longer n-alk ane conformers and their torsional landscap es. DipConfS [74] co vers amino acids and dip eptides with multiple back- b one and side-chain rotamers. Maltose222 [75] bench- marks carb oh ydrate conformers cen tred on maltose. MP- CONF196 [76, 77] collects conformers of medicinal c hem- istry fragmen ts. Op enFF-T ors [78] ev aluates torsional profiles for drug-like fragments across diverse c hemistries. UPU46 [79] prob es RNA backbone fragment conformers. Analysis of error distributions (Fig. 5c) reveals that the MLIP models ac hieve v ery similar accuracy , with lit- tle v ariation depending on the test set. Overall, they largely outp erform semi-empirical approaches, reaching w ell within c hemical accuracy for most subsets and show- ing an ov erall accuracy close to the underlying ω B97M h ybrid DFT. These results highlight the excellen t cov- erage of the OMOL dataset for conformers. The in- clusion of electrostatic interactions do es not materially impact the accuracy on conformers as these primarily prob e cov alent intramolecular interactions or short-range non-co v alen t in teractions that are well describ ed by lo cal mo dels. H. W ater Prop erties W ater represen ts a stringen t test for molecular mo d- els due to its complex hydrogen-bonding netw ork and anomalous thermo dynamic prop erties. The accurate de- scription of w ater is essential for biological and chemical applications, where aqueous solv ation dominates reaction thermo dynamics and protein dynamics. The density pro- file as a function of temp erature arises from a delicate balance b etw een h ydrogen b ond directionalit y and molec- ular pac king that challenges both classical force fields and ab initio metho ds. W e ev aluate the temperature- dep enden t densit y of liquid w ater, a critical test of the mo dels’ abilit y to capture man y-b o dy effects and long- range electrostatic interactions in condensed phases. W e p erformed isothermal-isobaric (NPT) molecular dynamics simulations of 333 water molecules at tem- p eratures ranging from 270 to 330 K and 1 atm pres- sure. The initial structure w as tak en from the GitHub rep ository associated with R ef. [80] and equilibrated with NVT Langevin dynamics implemented in the Atomic Sim ulation Environmen t (ASE) [81] for 50 ps. Con- stan t pressure molecular dynamics w as p erformed us- ing the Martyna-T obias-Klein barostat implemented in ASE, with characteristic timescales of the thermostat and barostat of 50 and 500 fs, resp ectiv ely . Each tem- p erature p oint was equilibrated for 500 ps follow ed b y 500 ps pro duction runs for density calculations. Fig- ure 6 shows the temp erature-dep enden t density profiles for MACE-OMOL , UMA-S-1P1 , MACE-POLAR-1-M , and MACE-POLAR-1-L compared to exp erimen tal data. W e did not test ORBMOL b ecause at the time of writ- ing it did not supp ort stress computation. All mo d- els capture the qualitative decrease in density with in- creasing temp erature, and ov erall agree on the density at ro om temp erature around 1.08–1.10 g/cm 3 . The de- viation from the exp erimental v alue of 1.00 g/cm 3 is lik ely due to the functional of the training data rather than the mo del architectures themselves. This interpre- tation is supp orted b y sev eral observ ations: (1) all mo d- els ov erestimate the densit y despite having substan tially differen t arc hitectures; (2) VV10-based functionals are rep orted to o v erstructure liquid water (e.g., SCAN has a density of 1.05 and SCAN+rVV10 of 1.16) [82]; (3) VV10 lac ks explicit three-b o dy dispersion, while three- b ody in teractions are known to b e imp ortan t in liquid- w ater sim ulations and can bias equilibrium densities [83]; and (4) the mo dels p erform w ell on water-cluster inter- action benchmarks (e.g., W A TER27 within GSCDB138), suggesting that short-range water energetics are well cap- tured [59, 84]. F urther inv estigations are needed to clarify the relative con tributions of these effects. W e use the same NPT pro duction runs at 300 K to compute radial distribution functions (RDF s) for the oxygen-o xygen, hydrogen-h ydrogen, and oxygen- h ydrogen pairs and compare them against the exp erimen- tal RDF av ailable at the same temp erature [85], as sho wn in Figure 7. While the mo dels agree closely with eac h other, the simulated RDF s deviate from exp eriment in p eak heights and second-shell structure, indicating that short-range packing and in termediate-range ordering re- main imp erfect. These deviations are also likely due to the ω B97M-V functional, as all mo dels agree closely , and the functional lacks three-b o dy disp ersion effects known to b e imp ortant in water structure. While the mo d- els ov erestimate the water density , simulations at con- stan t volume at the correct density are also informativ e. 17 270 280 290 300 310 320 330 340 T emperature, K 0.98 1.00 1.02 1.04 1.06 1.08 1.10 1.12 D e n s i t y , g / c m 3 UMA-S-1P1 MACE-OMOL MACE-POLAR-1-M MACE-POLAR-1-L Experiment FIG. 6. Liquid w ater density as a function of temp era- ture. Data points show the equilibrium density (g/cm 3 ) from NPT molecular dynamics simulations of 333 H 2 O molecules at 1 atm pressure. Each p oin t represents the mean density from 500 ps of production dynamics follo wing 500 ps of equi- libration. Coloured symbols correspond to differen t models as indicated in the legend. Experimental data (black circles) sho w the c haracteristic densit y maxim um near 277 K arising from comp etition b etw een thermal expansion and the tetra- hedral hydrogen-bond netw ork. W e therefore performed NVT simulations at a density of 0.997 g cm − 3 and a temp erature of 300 K, using the fi- nal 400 ps of 500 ps runs to compute RDF s. As shown in Figure 7, the water structure shifts noticeably at the exp erimen tal densit y , more closely repro ducing the ex- p erimen tal structure, but residual deviations from exper- imen t remain. I. Organic liquid densities Equilibrium densities of liquids are determined b y a subtle balance b et w een attractive and repulsive inter- molecular forces [86]. W e v alidated four models—MACE- OMOL , MACE-POLAR-1-M , MACE-POLAR-1-L , and UMA-S-1P1 —on the densities of 62 organic liq- uids cov ering b oth aliphatic and aromatic compounds, and common functional groups suc h as alcohols, ethers, esters, k etones, nitriles, halogens, and others. W e did not ev aluate UMA-M-1P1 as it was too slow at the time of writing. F or all liquids, excluding 1-o ctanol, we used initial structures, exp erimen tal temp eratures and densi- ties provided in Ref. [80] and equilibrated each mo del by running NVT molecular dynamics at the exp erimental temp eratures for 50 ps. F or 1-o ctanol, the system was initialised at low densit y using Pac kmol [87]. T o obtain a starting frame for the NPT simulation, each model was equilibrated at 298 K while shrinking the cubic cell pa- rameter b y 0.05 Å in 500 fs interv als, un til the system reac hed 98% of the exp erimen tal densit y . Isothermal- 0 2 4 6 0.6 0.4 0.2 0.0 0.2 0.4 0.6 g O O ( r ) e r r o r NPT 0 2 4 6 0.6 0.4 0.2 0.0 0.2 0.4 0.6 NVT 0 2 4 6 0.4 0.2 0.0 0.2 0.4 g H H ( r ) e r r o r 0 2 4 6 0.4 0.2 0.0 0.2 0.4 0 2 4 6 r , Å 0.4 0.2 0.0 0.2 0.4 g O H ( r ) e r r o r 0 2 4 6 r , Å 0.4 0.2 0.0 0.2 0.4 0 2 4 6 0 1 2 Experiment 0 2 4 6 0 1 2 Experiment 0 2 4 6 0 1 2 Experiment MACE-OMOL MACE-POLAR-1-M MACE-POLAR-1-L UMA-S-1P1 ORBMOL FIG. 7. Differences betw een predicted and exp erimen- tal radial distribution functions for O-O, H-H, and O- H pairs in liquid water at 300 K. The radial distribution function g ( r ) measures the probabilit y of finding an atom pair at distance r relative to an ideal gas. Left panels show dif- ferences b et ween predicted and experimental g ( r ) from a 500 ps NPT sim ulation; right-panel dashed lines show differences in g ( r ) from NVT simulations at the exp erimental densit y of 0.997 g cm − 3 . The inset indicates the reference exp erimental neutron diffraction data. Coloured lines correspond to differ- en t models as ind icated in the legend. isobaric ensemble molecular dynamics runs were p er- formed for 1 ns using the Martyna-T obias-Klein barostat [88] implemented in ASE, with the thermostat and baro- stat damping parameters of 50 and 500 ps, resp ectively . F or each system and mo del, only the final 500 ps of the MD tra jectory were used to compute the equilibrium den- sit y . Figure 8 sho ws that all tested mo dels predict liquid densities close to exp erimen t but systematically ov eresti- mate them, similar to the water densit y in the previous section. Mean absolute errors ov er the entire test set, giv en in T able I I I, sho w that MACE-POLAR-1-L p er- forms the b est, with a density MAE of 0.077 g cm − 3 , 18 while MACE-OMOL is the least accurate on liquid den- sities, with an MAE of 0.126 g cm − 3 . The calculated densities shown in Figure 8 are also tabulated in T able VI I. 0.5 1.0 1.5 2.0 2.5 E x p e r i m e n t a l d e n s i t y , g c m 3 0.1 0.0 0.1 0.2 D e n s i t y e r r o r , g c m 3 MA CE-POL AR -1-L MA CE-POL AR -1-M MA CE- OMOL UMA -S-1P1 FIG. 8. Absolute errors on exp erimental densities for 62 organic liquids. Each point represents the absolute er- ror on the equilibrium density (g/cm 3 ) of one molecular liquid computed from NPT molecular dynamics at 1 atm with ap- pro ximately 1000 atoms for 500 ps of equilibration follow ed b y 500 ps of simulation. The dataset spans alk anes (n-hexane to n-do decane), alcohols (methanol to o ctanol), ethers (di- eth yl ether, THF), aromatics (b enzene, toluene, xylenes), k e- tones (acetone, MEK), and p olar solven ts (acetonitrile, DMF, DMSO). The diagonal black dashed line indicates p erfect agreemen t; p oin ts ab ov e the line indicate ov erestimated den- sit y . T ABLE II I. Mean absolute errors (MAE) in predicted organic liquid densities. Mo del Densit y MAE, g cm − 3 MA CE-OMOL 0.126 MA CE-POLAR-1-M 0.091 MA CE-POLAR-1-L 0.077 UMA-S-1P1 0.102 J. Solv ated Ions and Changes in Oxidation State 1. Artificial disso ciation of solvate d Cl 2 /Cl − p airs Figure 9 demonstrates the critical role of prop er charge lo calisation in describing anionic species in aqueous so- lution. Starting from a Cl 2 molecule solv ated in a wa- ter cluster alongside tw o distan t Cl − ions in separate w ater clusters, the system should maintain the co v a- len t Cl 2 b ond, as the excess − 2 charge resides on the separated chloride ions, as shown in the top panel of Fig. 9. This configuration presents a fundamental chal- lenge for machine learning potentials that treat total sys- tem charge through global charge embedding: because the tw o solv ated clusters are separated by distances ex- ceeding the message-passing netw ork cutoff ( > 6 Å), the lo cal atomic en vironmen t around the Cl 2 molecule con- tains no information ab out the presence of the distan t Cl − ions. Consequently , mo dels without explicit long- range electrostatics (MA CE-OMOL , UMA-S-1P1 and ORBMOL ) interpret the lo cal Cl environmen t descrip- tors as if eac h atom carries a − 1 c harge in an unsta- ble anionic geometry , rather than recognising the neutral co v alen tly bonded Cl 2 configuration. This misidentifi- cation drives erroneous destabilisation of the Cl 2 b ond, leading to unphysical disso ciation with the Cl–Cl b ond length increasing from ∼ 2.0 Å to > 4.0 Å during geometry relaxation. Importantly , when the isolated Cl 2 cluster is simulated at net zero charge—where the global em- b edding correctly iden tifies the neutral state—all models preserv e the intact Cl 2 b ond, confirming that the fail- ure arises sp ecifically from the inability to communicate total c harge information b eyond the lo cal cutoff radius. In contrast, MA CE-POLAR-1 mo dels with explicit long- range electrostatics correctly compute the electrostatic p oten tial at eac h atom from all charges in the system, prop erly lo calising the − 2 charge on the distant chloride ions while maintaining the Cl 2 molecule at its equilibrium b ond length. The h ybrid DFT ω B97M-V has v ery low self- in teraction error, and therefore we exp ect near-exact lo- calisation of t wo additional electrons on the cluster con- taining the t w o Cl − ions and zero electrons on the cluster with Cl 2 . W e verify this b y computing the same system with few er water molecules and observe exact lo calisation for ω B97M-V. W e also test lo calisation in the MACE- POLAR-1 mo dels by summing the c harges obtained b y the mo dels on each cluster. W e observ e substantial lo cal- isation, with 1 . 952 | e | on the tw o Cl − ions and 0 . 048 | e | on the Cl 2 cluster. 2. F e 3+ / F e 2+ r e dox pair in solution Iron in a c hloride–water solution is a prototypical re- do x system in solution. This system tests whether the mo dels correctly lo calise charge into w ell-defined oxida- tion states [22]. W e study a system inv olving either a high-spin F e 2+ (d 6 , S = 2 , multiplicit y 5) or F e 3+ (d 5 , S = 5 / 2 , multiplicit y 6) centres, solv ated in water with Cl − ions as counter-ions. The b o x size is 12 Å, so all the lo cal models can see the full b o x in their receptiv e fields. W e sim ulate the tw o systems at zero total charge, indicated as (0) in Figure 10. The difference in oxida- 19 2e 0e - FIG. 9. Stability of solv ated Cl 2 during geometry relaxation. A neutral Cl 2 molecule is solv ated in water alongside t wo distant Cl − ions (total system c harge − 2 ). Because the chloride ions lie b eyond the message-passing cutoff ( > 6 Å), lo cal mo dels cannot distinguish neutral Cl 2 from hypothetical Cl 2 − 2 based on lo cal environmen t alone. Left: Cl-Cl b ond length (Å) versus relaxation step. Non-electrostatic models (MA CE-OMOL , UMA-S-1P1 , ORBMOL ; red/pink/gra y traces) drive unph ysical bond disso ciation, while electrostatic MA CE-POLAR-1 mo dels (teal traces) preserve the co v alen t b ond near its gas-phase equilibrium v alue ( ∼ 2.0 Å). Shaded regions indicate bonded (green, < 2.5 Å) and disso ciated (pink, > 3.5 Å) regimes. Righ t: Molecular snapshots showing the initial solv ated Cl 2 configuration (top) and final relaxed structures for eac h model. tion states b et w een the F e 2+ and F e 3+ affects the whole solv ation shell, and therefore results in differences in the radial distribution functions. The right panels of Fig- ure 10 plot the F e–O radial distribution functions g ( r ) for the different tested models along with shaded bands cor- resp onding to the exp erimen tal v alues of the first p eak, corresp onding to the first solv ation shell. The v alues are tak en from [22]. F rom the experimental v alues, we ex- p ect the first solv ation shell to systematically contract from 2.0–2.2 Å in F eCl 2 to 1.9–2.0 Å in F eCl 3 , reflecting the stronger electrostatic attraction and reduced ionic ra- dius of the iron ion in the highest oxidation state. This solv ation structure results from a complex interpla y of m ultiple factors: the d-orbital o ccupancy and spin state determine the metal’s ionic radius and ligand field sta- bilisation energy; the oxidation state controls the elec- trostatic attraction b etw een the metal centre and water o xygen atoms; and the presence of chloride coun ter-ions mo dulates the electrostatic environmen t through charge screening and h ydrogen-b onding netw orks with the sol- v ent. MACE-POLAR-1 mo dels capture this complex ph ysics through explicit treatment of long-range electro- statics and pro duce distinct and well-resolv ed first-shell p eaks for each o xidation state. T o further confirm c harge lo calisation on the iron centre, we ev aluated F eCl 2 at a total c harge of +1, which should produce an o xida- tion state of +3. W e observe that F eCl 3 (0) and F eCl 2 (+1) show nearly sup erimp osable g ( r ) profiles consis- ten t with their equiv alen t F e 3+ o xidation states. In con- trast, UMA-S-1P1 shows structural differentiation be- t ween F eCl 3 (0) and F eCl 2 (0); ho wev er, it incorrectly predicts F eCl 2 (+1) as the same as F eCl 2 (0), showing that it delo calises the extra charge on all the solven t and Cl atoms instead of localising it in the F e cen tre. MA CE-OMOL shows mixed results, with clear separa- tion of F eCl 3 (0) and F eCl 2 (0) p eaks, but an interme- diate v alue for F eCl 2 (+1) showing mo derate spurious c harge delo calisation. The F e atomic charges as a function of time (left panels) provide further evidence of the prop er descrip- tion of o xidation states b y the MA CE-POLAR-1 models. Indeed, MACE-POLAR-1 mo dels predict well-defined, 20 0.0 2.5 5.0 7.5 10.0 12.5 15.0 17.5 20.0 T ime (ps) 1.0 1.1 1.2 1.3 1.4 1.5 1.6 F e A tomic Char ge (e) MACE-POLAR-1-M F eCl2 (0) F eCl3 (0) F eCl2 (+1) 0.0 2.5 5.0 7.5 10.0 12.5 15.0 17.5 20.0 T ime (ps) 1.0 1.1 1.2 1.3 1.4 1.5 1.6 F e A tomic Char ge (e) MACE-POLAR-1-L F eCl2 (0) F eCl3 (0) F eCl2 (+1) 2 3 4 5 6 r (Å) 0 5 10 15 20 g(r) - F e- O MACE-OMOL F eCl2 (0) F eCl3 (0) F eCl2 (+1) F e³ r ef (1.9-2.0 Å) F e² r ef (2.0-2.2 Å) 2 3 4 5 6 r (Å) 0 5 10 15 20 g(r) - F e- O UMA-S-1P1 F eCl2 (0) F eCl3 (0) F eCl2 (+1) F e³ r ef (1.9-2.0 Å) F e² r ef (2.0-2.2 Å) 2 3 4 5 6 r (Å) 0 5 10 15 20 g(r) - F e- O MACE-POLAR-1-M F eCl2 (0) F eCl3 (0) F eCl2 (+1) F e³ r ef (1.9-2.0 Å) F e² r ef (2.0-2.2 Å) 2 3 4 5 6 r (Å) 0 5 10 15 20 g(r) - F e- O MACE-POLAR-1-L F eCl2 (0) F eCl3 (0) F eCl2 (+1) F e³ r ef (1.9-2.0 Å) F e² r ef (2.0-2.2 Å) FIG. 10. Solv ation structure and c harge dynamics of aqueous iron c hloride systems. Left column: F e atomic partial c harge (in c harge atomic units, | e | ) as a function of simulation time o ver 20 ps of NVT molecular dynamics at 300 K. Results are sho wn for MACE-POLAR-1-M (top) and MACE-POLAR-1-L (b ottom), comparing three systems: F eCl 2 with net c harge 0 (blue), F eCl 3 with net c harge 0 (red), and F eCl 2 with net c harge +1 (teal). Dotted horizontal lines indicate time-av eraged F e charges. The similar charge v alues for F eCl 3 (0) and F eCl 2 (+1) indicate correct lo calisation of charge to F e 3+ in b oth cases. Right column: F e-O radial distribution functions g ( r ) showing the probability of finding w ater o xygen atoms at distance r from the F e centre. All three systems are ov erlaid in each panel for MACE-OMOL , UMA-S-1P1 , MACE-POLAR-1-M , and MA CE-POLAR-1-L . V ertical dotted lines mark first and second co ordination shell peaks. The first-shell con traction from F e 2+ to F e 3+ reflects the smaller ionic radius at higher o xidation state. nearly constant F e partial charges across the tra jectory , rev ealing the presence of well-defined oxidation states. These partial charges are not integers due to the charge partitioning b et w een the metal centre and its co ordina- tion environmen t: cov alent donation from water oxygen lone pairs into empt y F e d-orbitals, p olarisation of the metal’s d-electron density to w ard the electronegative lig- ands, and delo calisation of the oxidation-state charge across the entire [F e(H 2 O) 6 Cl n ] q complex result in the formal +2 or +3 c harge b eing distributed o v er the F e atom, co ordinating w ater molecules, and c hloride ligands. Consequen tly , the F e partial c harge reflects only the frac- tion of the total oxidation-state charge lo calised on the metal nucleus, with the remainder residing on the solv a- tion shell. The F eCl 2 (+1) case pro vides further evidence of the correctness of charge transfer physics: with t wo Cl − ligands and a net +1 system charge, the iron cen- tre should adopt an effective +3 o xidation state identical to F eCl 3 (0), yielding comparable F e partial charges and solv ation structures. MACE-POLAR-1 mo dels correctly repro duce this electronic equiv alence, with F eCl 3 (0) and F eCl 2 (+1) exhibiting F e c harges differing by only ∼ 0.1 e for MACE-POLAR-1-L and iden tical charges for MA CE- POLAR-1-M . 3. Micr o-solvate d ionisation p otentials of tr ansition metals in water W e ev aluate v ertical ionisation energies on the same h ydrated cluster geometries and reference data rep orted b y Bhattacharjee et al. [89]. The b enchmark considers first-ro w transition metal ions in the 2+/3+ oxidation states (Ti, V, Cr, Mn, F e, Ni, Cu) using t wo explicit hy- dration mo dels: M-W6 ( [M(H 2 O) 6 ] 2+ / 3+ ) and M-W18 ( [M(H 2 O) 6 · (H 2 O) 12 ] 2+ / 3+ ). The reference v alues are computed at the DLPNO-CCSD(T) level of theory , and the geometry is relaxed at the BP86/def2-TZVP lev el of theory . F or each cluster, w e compute the vertical ionisa- tion energy as the difference in electronic energy betw een 21 T i 2 / 3 V 2 / 3 C r 2 / 3 M n 2 / 3 F e 2 / 3 N i 2 / 3 C u 2 / 3 Metal Ion 14 16 18 20 22 24 Ionization Ener gy (eV) (a) M- W6 Clusters DLPNO -CCSD(T) MA CE-POL AR -M (MAE=0.66 eV) MA CE-POL AR -L (MAE=0.65 eV) MA CE- OMOL (MAE=0.85 eV) UMA -S-1P1 (MAE=0.92 eV) UMA -M-1P1 (MAE=0.98 eV) T i 2 / 3 V 2 / 3 C r 2 / 3 M n 2 / 3 F e 2 / 3 N i 2 / 3 C u 2 / 3 Metal Ion 10 11 12 13 14 15 Ionization Ener gy (eV) (b) M- W18 Clusters DLPNO -CCSD(T) MA CE-POL AR -M (MAE=0.81 eV) MA CE-POL AR -L (MAE=0.71 eV) MA CE- OMOL (MAE=1.31 eV) UMA -S-1P1 (MAE=0.80 eV) UMA -M-1P1 (MAE=0.83 eV) FIG. 11. V ertical ionisation energies of hydrated first-row transition metal ions. Line plots show the computed ionisation energy in e V for removing one electron from M 2+ to form M 3+ at fixed geometry , for metals Ti, V, Cr, Mn, F e, Ni, and Cu. (a) M-W6 clusters: metal ion co ordinated b y six water molecules in o ctahedral geometry . (b) M-W18 clusters: same ions with an explicit second solv ation shell of 12 additional water molecules. Black circles show DLPNO-CCSD(T) reference v alues from Bhattacharjee et al. [89]. Coloured lines show ML p otential predictions: MACE-POLAR-1-M (cyan), MA CE-POLAR-1-L (teal), MACE-OMOL (red), UMA-S-1P1 (ligh t gray), and UMA-M-1P1 (dark gray). Mean absolute errors relativ e to the reference are given in the legend. The 4–5 e V reduction from M-W6 to M-W18 reflects electrostatic screening b y the second solv ation shell. the tw o o xidation states at fixed nuclei: IE ( R i ) = E M 3+ ( R i ) − E M 2+ ( R i ) (44) where R i denotes the fixed cluster geometry . When ev al- uating the different oxidation states, we ensure the use of high-spin m ultiplicities. All spin multiplicities are tab- ulated in T able IV. The electrostatic MACE-POLAR-1 mo dels pro vide the closest agreemen t to the DLPNO ref- erences, with MAEs of 0.65–0.66 e V on M-W6 and 0.71– 0.81 e V on M-W18. The lo cal baseline MACE-OMOL deviates more strongly , esp ecially for the larger M-W18 clusters (MAE 1.31 e V), while UMA-S-1P1 /UMA-M- 1P1 are intermediate. Across the metal series, all mo d- els repro duce the strong reduction in ionisation energies when mo ving from M-W6 to M-W18, consistent with en- hanced electrostatic screening by the second solv ation shell. 4. R e dox potentials of transition metals in water Redo x p oten tials of solv ated transition metal ions test the mo del’s abilit y to describ e charge transfer, c harge lo- calisation and oxidation-state-dependent solv ation. The accurate prediction of redox thermo dynamics requires capturing the complex interactions b etw een metal d- orbital energetics, solven t reorganisation, and long-range electrostatic in teractions, whic h makes it a challenging fron tier test for MLIPs. W e ev aluated the mo dels on aqueous M 2+ /M 3+ redo x couples (M = Ti, V, Mn, F e, Co) following the Marcus theory computational frame- w ork established in [90, 91]. F or each metal ion, we constructed systems con taining a single M 2+ or M 3+ ion with 64 w ater molecules in a cubic p erio dic cell (12.4 Å). The initial structures were generated with Pac kmol [87] and then relaxed with eac h mo del separately . Both oxidation states were simulated in their high-spin configurations: M 2+ with spin m ulti- plicities of 3 (Ti), 4 (V), 5 (F e), 6 (Mn), and 4 (Co); M 3+ with multiplicities of 2 (Ti), 3 (V), 6 (F e), 5 (Mn), and 1 (Co). F ollo wing previous proto cols [91], w e p erformed canon- ical (NVT) Langevin molecular dynamics simulations at 300 K for eac h oxidation state indep enden tly . After 50 ps equilibration, we collected 150 ps pro duction tra jectories, sampling configurations ev ery 20 fs. F or eac h sampled configuration R from the M 2+ ensem ble, we computed the vertical energy gap: ∆ E ( R ) = E M 3+ ( R ) − E M 2+ ( R ) (45) via single-p oin t energy calculations at b oth oxidation states. The same pro cedure w as applied to configura- tions from the M 3+ ensem ble. The ensem ble av erages ⟨ ∆ E ⟩ M 2+ and ⟨ ∆ E ⟩ M 3+ directly yield the reorganisation free energy and redox free energy through linear resp onse relations: ∆ F = 1 2 ( ⟨ ∆ E ⟩ M 2+ + ⟨ ∆ E ⟩ M 3+ ) (46) Crucially , ∆ F computed via Eq. 46 approximates the free energy difference of the differen t o xidation states. 22 Self-distributed ( E , V vs SHE) On the MACE-POLAR-1-L tra jectories ( E , V vs SHE) Ion Spin multiplicities E exp MACE-POLAR-1-M MACE-POLAR-1-L MACE-OMOL UMA-S-1P1 MA CE-POLAR-1-M MACE-POLAR-1-L MACE-OMOL UMA-S-1P1 Co 2+ /Co 3+ (4 , 1) 1.92 0.309 0.663 Crashed. Crashed. 0.383 0.664 10.412 14.394 F e 2+ /F e 3+ (5 , 6) 0.77 1.561 1.380 Crashed. Crashed. 1.557 1.380 3.313 2.223 Mn 2+ /Mn 3+ (6 , 5) 1.50 1.734 1.280 Crashed. Crashed. 1.590 1.280 2.892 -1.664 Ti 2+ /Ti 3+ (3 , 2) -0.37 -0.367 -0.273 10.498 Crashed. -0.350 -0.273 -3.248 -11.807 V 2+ /V 3+ (4 , 3) -0.255 0.329 0.514 Crashed. Crashed. 0.385 0.514 -9.804 0.419 MAE (V) 0.679 0.588 - - 0.673 0.676 6.534 6.902 T ABLE IV. Redox p oten tials E in v olts for M 2+ /M 3+ aqueous couples computed from Marcus-theory free energy . Mo del v alues hav e b een shifted b y a constant estimated from a linear regression fit to account for differences in reference p otential. Exp erimen tal potentials E exp (V vs SHE) are taken from the same dataset as in the main text. Cell colours indicate the absolute deviation | E − E exp | , from green (small error) to red (large error). T o compare with the exp erimen tal standard reduction p oten tials E ◦ exp measured vs. the standard hydrogen elec- tro de (SHE), the arbitrary mo del-dep enden t shift in the free energy must b e taken into account. Since our sim- ulations do not include explicit calculations for proton insertion [90] or v acuum alignment to establish the abso- lute reference, we apply a constant shift p er-model C model determined by least-squares fitting for each mo del to ex- p erimen tal data: E ◦ pred = ∆ F F + C model (47) where F is the F araday constant. This approac h enables the ev aluation of relative redo x trends and mo del p erfor- mance while ac kno wledging the absence of an ab initio absolute reference. T able IV presen ts the predicted redo x p otentials after applying the optimal shift for eac h model. W e report results from tw o proto cols: using p er-mo del MD tra jec- tories (left columns, called "Self-distributed") and using shared MACE-POLAR-1-L tra jectories ev aluated with all models (right columns). The shared-tra jectory pro- to col enables ev aluation of the accuracy of the unstable mo dels MA CE-OMOL and UMA-S-1P1 . MA CE-POLAR-1-M and MA CE-POLAR-1-L achiev e mean absolute errors (MAEs) of 0.60–0.64 V across the fiv e redo x couples, represen ting a dramatic improv emen t o ver lo cal mo dels. Critically , MA CE-OMOL and UMA- S-1P1 fail catastrophically on their o wn tra jectories, with most simulations crashing during the equilibration. When forced to ev aluate on stable MACE-POLAR-1-L tra jectories, these lo cal mo dels pro duce errors exceeding 10 V, demonstrating that they fundamentally misrepre- sen t the energetics of solv ated transition metal ions. The relativ ely mo dest accuracy (MA E ∼ 0.6 V) despite correct qualitative physics reflects the challenge of this b enc hmark: transition metal redox couples inv olv e m ulti- reference character, strong electron correlation, and sub- tle spin-state energetics that push the limits of the cur- ren t foundation mo dels. Notably , it is difficult to ev alu- ate the accuracy of the hybrid DFT reference itself. The MA CE-POLAR-1 mo dels’ ability to approac h this accu- racy while maintaining computational efficiency demon- strates that explicit long-range electrostatics successfully transfers the qualitative physics of charge lo calisation and solv ation response to this c hallenging domain, far outside the training distribution. These results confirm that MACE-POLAR-1 mo dels capture qualitativ ely the essen tial physics of aqueous redox chemistry—c harge lo- calisation, p olarisation, and solven t reorganisation—ev en though quan titativ e accuracy will require further fine- tuning. K. Effect of external fields The MACE-POLAR-1 mo dels are trained only on en- ergies and forces, but can naturally extrap olate to ex- ternal fields by using Equation 17 and adding the po- ten tial generated by the external field to the p oten tial from the charge densit y . Computing the response to ex- ternal fields is a challenging extrapolation test of the learned electrostatic response. Figure 12 presen ts the results for external-field resp onse prop erties: molecu- lar dip oles (Dip146 dataset) and p olarisabilities (HR46 dataset). The dip oles and p olarisabilities are computed b y finite differences of the total energy with resp ect to a uniform applied field: dip ole moments are obtained from a first-order finite difference of the total energy ( µ i = − [ E (+∆ e ) − E ( − ∆ e )] / (2∆ e )) with ∆ e = 10 − 3 V/Å, and p olarisabilities are obtained from the second deriv a- tiv e ( α ii = − [ E (+∆ e ) − 2 E (0) + E ( − ∆ e )] / ∆ e 2 ) with ∆ e = 5 × 10 − 4 V/Å. The geometries are re-centred to the molecular centroid in order to av oid translation problems. Critically , MACE-POLAR-1 mo dels were trained exclu- siv ely on ground-state energies and forces without any explicit sup ervision on resp onse prop erties, making this a pure test of whether the machine learning p oten tials hav e learned ph ysically correct induction physics from the un- derlying quan tum mechanical training data. External field resp onse properties represent a hierarc hical series of increasingly challenging extrap olation tasks: molecular dip oles prob e the first-order linear resp onse of the elec- tron density to an applied electric field, while p olarisabil- ities characterise the second-order nonlinear response, re- quiring accurate prediction of field-induced redistribution of charge density . F or the Dip146 b enchmark, wB97M- V (the reference metho d used in training data gener- ation) ac hiev es MAE = 0.058 D, demonstrating in ter- nal consistency , while MA CE-POLAR-1-M and MACE- 23 5 0 5 CCSD(T) dipoles (D) 8 6 4 2 0 2 4 6 Model dipoles (D) Dip146 wB97M-V (MAE=0.058) MACE-POLAR-1-M (MAE=0.197) MACE-POLAR-1-L (MAE=0.174) 0 5 10 15 CCSD(T) polarizabilities (ų) 0 5 10 15 Model polarizabilities (ų) HR46 wB97M-V (MAE=0.187) MACE-POLAR-1-M (MAE=1.76) MACE-POLAR-1-L (MAE=1.85) FIG. 12. External electric field resp onse properties. Scatter plots compare mo del predictions (y-axis) against ω B97M-V reference v alues (x-axis) for molecular resp onse prop erties. Left: Dip146 dataset of molecular dip ole mag- nitudes (Deb ye). Righ t: HR46 dataset of static isotropic po- larisabilities (Å 3 ). The diagonal line indicates p erfect agree- men t; p oin ts closer to the diagonal represent more accu- rate predictions. Notably , the MACE-POLAR-1 mo dels w ere trained only on ground-state energies and forces, yet predict these resp onse properties through the learned electrostatic ph ysics without explicit sup ervision. POLAR-1-L attain MAE = 0.197 D and 0.174 D, re- sp ectiv ely—represen ting ∼ 3 × degradation relativ e to the reference but still maintaining quantitativ e accuracy for most molecules. The p olarisabilit y b enc hmark (HR46) presen ts a substantially more demanding test of second- order resp onse, where wB97M-V achiev es MAE = 0.187 Å 3 , while MACE-POLAR-1-M and MACE-POLAR-1-L sho w MAE = 1.76 Å 3 and 1.85 Å 3 —an order of magni- tude degradation. Notably , the MA CE-POLAR-1 mo dels 0 2 4 6 8 10 12 14 r2SCAN-3c E (kcal/mol) 0 2 4 6 8 10 12 14 Model E (kcal/mol) Lanthanide complexes conformers MACE-POLAR-1-M (MAE=0.83) MACE-POLAR-1-L (MAE=0.83) UMA-S-1P1 (MAE=0.83) UMA-M-1P1 (MAE=1.85) ORBMOL (MAE=1.48) FIG. 13. Lanthanide isomerisation energies. Relative energies (kcal/mol) for seven Ln complexes (La, Ce, Nd, Sm, Eu, Lu; 18 isomers total) from the GFN-FF study [92] com- pared against r2SCAN-3c references (T able S4). Poin ts show mo del ∆ E v ersus reference; dashed line is p erfect agree- men t. Models: MA CE-POLAR-1-M /MACE-POLAR-1-L (teal), UMA-S-1P1 /UMA-M-1P1 (gra y), ORBMOL (pur- ple). MACE-OMOL is excluded due to unstable topologies on this set. exhibit systematic errors for low-polarisability molecules (cluster of predictions near zero in the HR46 plot), sug- gesting that while the mo dels capture the dominan t elec- trostatic induction effects through explicit long-range in- teractions, the man y-b ody p olarisation tensor requires higher-order correlation effects not fully enco ded in the short-range message-passing arc hitecture. Nevertheless, the abilit y of mo dels trained solely on energies and forces to predict dipoles with ∼ 0.2 D accuracy and p olarisabil- ities with ∼ 2 Å 3 accuracy demonstrates that physically meaningful electronic response emerges naturally from learning accurate p otential energy surfaces. L. Lan thanides W e ev aluate the lanthanide b enchmark of Rose et al. [92] on seven published complexes containing isomers of lan thanum, cerium, neo dymium, samarium, europium, and lutetium. W e use the pro vided OR CA geometries, c harges, and spins with r2SCAN-3c reference energies. Lan thanide isomerisation energies prob e heavy-elemen t b onding and relativ e stabilit y in f-blo ck complexes; small splittings and op en-shell f-electron configurations make the ordering sensitive to the electronic description. All 24 MLIPs matc h the reference ordering to within ∼ 1 kcal/- mol on av erage except ORBMOL , which sho ws several outliers; the tightest splittings (La, Eu) drive the visi- ble offsets. Local MACE-OMOL failed to conv erge for sev eral top ologies and is excluded. M. Charge T ransfer T ests T o v alidate charge transfer and localisation during dis- so ciation, w e analysed the separation of c harged water clusters into tw o fragments (Fig. 14). W e track the evo- lution of fragmen t c harges as a function of interfragmen t distance for systems with total c harges of − 1 , 0 , and +1 (in | e | ). This test prob es how the mo del partitions and lo calises charge b etw een separating fragments—a criti- cal capabilit y for describing bond breaking in charged systems. F or the neutral cluster (Fig. 14b), both MACE- POLAR-1-M and MACE-POLAR-1-L correctly maintain near-zero c harges on b oth fragmen ts throughout sepa- ration. F or charged clusters (Fig. 14a,c), the mo dels demonstrate physically correct charge lo calisation: as fragmen ts separate b ey ond 4 Å, the excess charge lo- calises quantitativ ely on one fragment, smo othly conv erg- ing to the DFT-predicted limits of − 1 / 0 and +1 / 0 (in | e | ) for negative and p ositive systems, resp ectiv ely . Notably , b oth fragments carry comparable partial charges at short distances (1–2 Å), reflecting p olarisation in the contact regime, b efore the c harge lo calises as electrostatic cou- pling weak ens. This smo oth evolution v alidates that ex- plicit long-range electrostatics enables ph ysically correct c harge redistribution during dissociation without spuri- ous charge delocalisation betw een distant fragmen ts. IV. Conclusions and Outlo ok MFP mo dels represen t a significan t adv ance in molec- ular mo delling, demonstrating that explicit treatment of electrostatics within machine learning p otentials yields substantial improv emen ts ov er state-of-the-art lo cal MLIPs. F ollowing extensiv e testing, w e con- firmed the excellent quan titative accuracy of the mo dels across div erse chemical systems, including small organic molecules, large protein-ligand complexes, redo x ions in solution, transition metal complexes, and molecular crys- tals. The broa d quantitativ e accuracy of the MACE- POLAR-1 mo dels on molecular chemistry positions them as p ow erful to ols for computational chemistry . The key arc hitectural inno v ation of the p olarisable long-range up- date pro vides a blueprin t for next-generation foundation MLIPs. By capturing b oth short-range quantum effects and long-range electrostatics including p olarisation ef- fects and charge transfer, the mo del bridges the gap b e- t ween the accurate description of short-range quantum effects and the physical description of long-range electro- static interactions. Sev eral directions for future dev elopmen t emerge from this work. The incorp oration of machine-learned dis- p ersion corrections could further enhance accuracy for v an der W aals complexes. The liquid densities, includ- ing water density , follo w the correct qualitativ e trends but remain systematically off b y ab out 5–10% com- pared to exp eriment. This likely reflects limitations of the reference DFT functional used in OMol25, al- though mo del-arc hitecture and sim ulation-proto col ef- fects ma y also contribute. These results suggest that mo ving b ey ond hybrid DFT references, tow ards coupled- cluster-qualit y data, could improv e quantitativ e liquid- prop ert y accuracy . F urthermore, the developmen t of sp e- cialised v arian ts for specific domains (proteins, materials, catalysis) could push accuracy even further while allow- ing higher computational efficiency by , e.g., distillation strategies. More broadly , this work demonstrates that ph ysics-informed mac hine learning—combining the flex- ibilit y of neural netw orks with rigorous ph ysical princi- ples—offers a path tow ard the simulation of molecular systems with ab-initio quan tum accuracy at scale. Data A v ailability The OMol25 dataset is publicly a v ailable at https: //huggingface.co/facebook/OMol25 . All b enchmark datasets are av ailable from their resp ectiv e publications. Pro cessed data and analysis scripts to repro duce the b enc hmarks are pro vided via the ML Performance Guide (ML-PEG): https://ml- peg.stfc.ac.uk . Co de A v ailabilit y The MACE co de is a v ailable at https://github. com/ACEsuit/mace . The training scripts and mo d- els are av ailable at https://github.com/ACEsuit/ mace- foundations/mace- polar- 1 . ML-PEG co de is a v ailable at https://github.com/ddmms/ml- peg . A ckno wledgemen ts W e would lik e to thank NVIDIA Lepton Com- pute for providing compute to run experiments. The authors ackno wledge the use of resources pro vided b y the Isam bard-AI National AI Researc h Resource (AIRR) [93]. Isam bard-AI is op erated by the Universit y of Bristol and is funded by the UK Gov ernmen t’s Depart- men t for Science, Innov ation and T echnology (DSIT) via UK Research and Innov ation; and the Science and T ech- nology F acilities Council [ST/AIRR/I-A-I/1023]. W e w ould like to thank UK Sov ereign AI for providing com- pute on Isambard-AI. E.K. and A.M.E. were supp orted b y the Ada Lov elace Cen tre at the Science and T echnology F acilities Coun- cil (https://adalo v elacecen tre.ac.uk/), the Physical Sci- ences Data Infrastructure (https://psdi.ac.uk; jointly 25 2 4 6 8 Distance between fragments (Å) 1.0 0.8 0.6 0.4 0.2 0.0 F ragment Char ge (e) (a) Negative Cluster ( 1) 2 4 6 8 10 Distance between fragments (Å) 0.15 0.10 0.05 0.00 0.05 0.10 0.15 (b) Neutral Cluster (0) 2 4 6 8 10 Distance between fragments (Å) 0.0 0.2 0.4 0.6 0.8 1.0 (c) Positive Cluster (+1) wB97M- V (F rag 1) wB97M- V (F rag 2) MA CE-POL AR -1-M (F rag 1) MA CE-POL AR -1-M (F rag 2) MA CE-POL AR -1-L (F rag 1) MA CE-POL AR -1-L (F rag 2) DF T limits FIG. 14. Charge lo calisation during water cluster disso ciation. Line plots show the partial charge (in charge atomic units, | e | ) on eac h of t wo water cluster fragments as a function of interfragmen t separation (Å), for systems with total charge (a) − 1 , (b) 0 , and (c) +1 (all in | e | ). Solid and dashed lines show c harges on fragmen ts 1 and 2, respectively , as predicted b y MA CE-POLAR-1-M (blue) and MA CE-POLAR-1-L (orange). Black solid lines show ω B97M-V Hirshfeld reference charges; dotted horizontal lines indicate the exp ected in teger c harges at infinite separation. A t short distances (1–2 Å), p olarisation in the contact regime distributes c harge across b oth fragments. As fragments separate b ey ond 4 Å, excess charge localises on to one fragment, conv erging to the ph ysical − 1 / 0 or +1 / 0 limits. STF C and the Universit y of Southampton) under grants EP/X032663/1 and EP/X032701/1, and EPSRC un- der grants EP/W026775/1 and EP/V028537/1. W e are grateful for computational supp ort from the UK na- tional high-p erformance computing service, AR CHER2, for which access w as obtained via the UKCP consortium and funded by EPSRC gran t reference EP/P022065/1 and EP/X035891/1. I.B. was supp orted by the Harding Distinguished Postgraduate Sc holarship. J. H. was sup- p orted by The LennardJones Cen tre Ruth Lynden-Bell Sc holarship in Scien tific Computing. Comp eting Interests GC is a partner in Symmetric Group LLP that licenses force fields commercially . GC and JHM hav e equity in- terests in Ångström AI. 26 [1] I. Batatia, D. P . Ko v acs, G. Simm, C. Ortner, and G. Csányi, Mace: Higher order equiv ariant message passing neural netw orks for fast and accurate force fields, Adv ances in Neural Information Processing Sys- tems 35 , 11423 (2022). [2] S. Batzner, A. Musaelian, L. Sun, M. Geiger, J. P . Mailoa, M. Korn bluth, N. Molinari, T. E. Smidt, and B. K ozinsky , E (3)-equiv arian t graph neural netw orks for data-efficien t and accurate in teratomic potentials, Nature communications 13 , 2453 (2022). [3] X. F u, B. M. W o o d, L. Barroso-Luque, D. S. Levine, M. Gao, M. Dzam ba, and C. L. Zitnick, Learn- ing smo oth and expressive interatomic p otentials for ph ysical property prediction (2025), [ph ysics.comp-ph]. [4] B. Rho des, S. V andenhaute, V. Šimkus, J. Gin, J. Go d- win, T. Duignan, and M. Neumann, Orb-v3: atom- istic simulation at scale (2025), arXiv:2504.06231 [cond- mat.m trl-sci]. [5] A. Mazitov, F. Bigi, M. Kellner, P . Pegolo, D. Tisi, G. F raux, S. Pozdn y ako v, P . Loche, and M. Ceriotti, P et-mad as a light w eight universal interatomic p oten- tial for adv anced materials modeling, Nature Commu- nications 16 , 10.1038/s41467-025-65662-7 (2025). [6] D. M. Anstine and O. Isa y ev, Machine learning inter- atomic p otentials and long-range ph ysics, The Jour- nal of Physical Chemistry A 127 , 2417 (2023), pMID: 36802360. [7] J. Behler, F our generations of high-dimensional neu- ral netw ork p oten tials, Chemical Reviews 121 , 10037 (2021), pMID: 33779150. [8] W. J. Baldwin, I. Batatia, M. V ondrák, J. T. Margraf, and G. Csányi, Design space of self-consistent electro- static machine learning force fields (2026), manuscript in preparation. [9] F. Grasselli, K. Rossi, S. de Gironcoli, and A. Grisafi, Long-range electrostatics in atomistic mac hine learning: a physical p erspective (2026), arXiv:2602.11071 [cond- mat.m trl-sci]. [10] T. Morawietz, V. Sharma, and J. Behler, A neu- ral net work p oten tial-energy surface for the water dimer based on environmen t-dependent atomic ener- gies and c harges, The Journal of Chemical Physics 136 , 10.1063/1.3682557 (2012). [11] N. Artrith, T. Morawietz, and J. Behler, High- dimensional neural-netw ork potentials for multicompo- nen t systems: Applications to zinc oxide, Phys. Rev. B 83 , 153101 (2011). [12] M. J. Mills and P . L. Popelier, Intramolecular p olaris- able m ultip olar electrostatics from the mac hine learning metho d kriging, Computational and Theoretical Chem- istry 975 , 42–51 (2011). [13] O. T. Unk e and M. Meuwly , Ph ysnet: A neural net- w ork for predicting energies, forces, dipole moments, and partial charges, Journal of Chemical Theory and Computation 15 , 3678 (2019), pMID: 31042390. [14] A. Grisafi and M. Ceriotti, Incorp orating long-range ph ysics in atomic-scale machine learning, The Journal of Chemical Physics 151 , 10.1063/1.5128375 (2019). [15] L. Zhang, H. W ang, M. C. Muniz, A. Z. Panagiotopou- los, R. Car, and W. E, A deep p oten tial mo del with long- range electrostatic in teractions, The Journal of Chemi- cal Physics 156 , 124107 (2022). [16] D. Zhang, X. Liu, X. Zhang, C. Zhang, C. Cai, H. Bi, Y. Du, X. Qin, A. Peng, J. Huang, B. Li, Y. Shan, J. Zeng, Y. Zhang, S. Liu, Y. Li, J. Chang, X. W ang, S. Zhou, J. Liu, X. Luo, Z. W ang, W. Jiang, J. W u, Y. Y ang, J. Y ang, M. Y ang, F.-Q. Gong, L. Zhang, M. Shi, F.-Z. Dai, D. M. Y ork, S. Liu, T. Zhu, Z. Zhong, J. Lv, J. Cheng, W. Jia, M. Chen, G. Ke, W. E, L. Zhang, and H. W ang, Dpa-2: a large atomic mo del as a multi-task learner, np j Computational Materials 10 , 10.1038/s41524-024-01493-2 (2024). [17] B. Cheng, Latent ew ald summation for mac hine learning of long-range interactions, np j Computational Materials 11 , 10.1038/s41524-025-01577-7 (2025). [18] A. Kab ylda, J. T. F rank, S. Suárez-Dou, A. Khabibrakhmanov, L. Medrano Sandonas, O. T. Unk e, S. Chmiela, K.-R. Müller, and A. Tk atchenk o, Molecular simulations with a pretrained neural netw ork and universal pairwise force fields, Journal of the American Chemical So ciet y 147 , 33723–33734 (2025). [19] T. Plé, L. Lagardère, and J.-P . Piquemal, F orce-field- enhanced neural netw ork in teractions: from lo cal equiv- arian t em b edding to atom-in-molecule prop erties and long-range effects, Chemical Science 14 , 12554–12569 (2023). [20] T. Plé, O. Adjoua, A. Benali, E. Posenitskiy , C. Villot, L. Lagardère, and J.-P . Piquemal, A foundation mo del for accurate atomistic sim ulations in drug design, Chem- Rxiv (2025). [21] R. Zubatyuk, J. S. Smith, B. T. Nebgen, S. T retiak, and O. Isay ev, T eaching a neural netw ork to attac h and de- tac h electrons from molecules, Nature Communications 12 , 10.1038/s41467-021-24904-0 (2021). [22] E. K o cer, R. El Haouari, C. Dellago, and J. Behler, Ma- c hine learning p oten tials for redo x chemistry in solution (2024), arXiv:2410.03299 [physics.c hem-ph]. [23] W. J. Mortier, K. V an Genech ten, and J. Gasteiger, Electronegativit y equalization: application and parametrization, Journal of the American Chemical So ciet y 107 , 829 (1985). [24] W. J. Mortier, S. K. Ghosh, and S. Shank ar, Electronegativit y-equalization metho d for the calcula- tion of atomic c harges in molecules, Journal of the American Chemical So ciet y 108 , 4315 (1986). [25] A. K. Rappe and W. A. Goddard, Charge equilibra- tion for molecular dynamics sim ulations, The Journal of Physical Chemistry 95 , 3358 (1991). [26] S. A. Ghasemi, A. Hofstetter, S. Saha, and S. Go edec ker, Interatomic p otentials for ionic systems with densit y functional accuracy based on charge den- sities obtained by a neural netw ork, Ph ys. Rev. B 92 , 45131 (2015). [27] S. F ara ji, S. A. Ghasemi, S. Rostami, R. Rasoulkhani, B. Sc haefer, S. Go edeck er, and M. Amsler, High accu- racy and transferability of a neural netw ork potential through c harge equilibration for calcium fluoride, Phys. Rev. B 95 , 104105 (2017). [28] S. F ara ji, S. A. Ghasemi, B. P arsaeifard, and S. Goedeck er, Surface reconstructions and premelting of the (100) caf2 surface, Phys. Chem. Chem. Ph ys. 21 , 27 16270 (2019). [29] T. W. Ko, J. A. Finkler, S. Go edeck er, and J. Behler, A fourth-generation high-dimensional neural netw ork p oten tial with accurate electrostatics including non- lo cal c harge transfer, Nature Communications 12 , 398 (2021). [30] C. G. Staack e, S. W engert, C. Kunkel, G. Csányi, K. Reuter, and J. T. Margraf, Kernel charge equili- bration: efficient and accurate prediction of molecu- lar dipole moments with a machine-learning enhanced electron density model, Mac hine Learning: Science and T echnology 3 , 15032 (2022). [31] S. Gong, Y. Zhang, Z. Mu, Z. Pu, H. W ang, X. Han, Z. Y u, M. Chen, T. Zheng, Z. W ang, L. Chen, Z. Y ang, X. W u, S. Shi, W. Gao, W. Y an, and L. Xiang, A pre- dictiv e machine learning force-field framework for liquid electrolyte developmen t, Nature Machine Intelligence 7 , 543–552 (2025). [32] F. Jensen, Unifying Charge-Flow Polarization Mo dels, Journal of Chemical Theory and Computation 19 , 4047 (2023). [33] J. P . Perdew, R. G. P arr, M. Levy , and J. L. Balduz, Densit y-F unctional Theory for F ractional Particle Num- b er: Deriv ativ e Discontin uities of the Energy , Physical Review Letters 49 , 1691 (1982). [34] M. V ondrák, K. Reuter, and J. T. Margraf, Pushing Charge Equilibration Based Mac hine Learning Poten- tials to their Limits (2025). [35] G. Lee W arren, J. E. Davis, and S. P atel, Origin and con trol of sup erlinear p olarizabilit y scaling in chemical p oten tial equalization metho ds, The Journal of Chemi- cal Physics 128 , 10.1063/1.2872603 (2008). [36] R. Chelli, P . Pro cacci, R. Righini, and S. Califano, Electrical resp onse in chemical p oten tial equalization sc hemes, The Journal of Chemical Ph ysics 111 , 8569 (1999). [37] T. V erstraelen, E. Pau wels, F. De Proft, V. V an Sp ey- bro ec k, P . Geerlings, and M. W aro quier, Assessmen t of atomic c harge mo dels for gas-phase computations on p olypeptides, Journal of Chemical Theory and Compu- tation 8 , 661 (2012), pMID: 26596614. [38] J. Thomas, W. Baldwin, G. Csanyi, and C. Ortner, Self- consisten t coulomb interactions for machine learning in- teratomic p otentials, Nonlinearity 38 , 095024 (2025). [39] A. Gao and R. C. Remsing, Self-consistent determina- tion of long-range electrostatics in neural netw ork p o- ten tials, Nature Communications 13 , 1572 (2022). [40] M. Co ols-Ceupp ens, J. Dambre, and T. V erstrae- len, Mo deling electronic resp onse prop erties with an explicit-electron machine learning potential, Journal of Chemical Theory and Computation 18 , 1672 (2022), pMID: 35171606. [41] X. Xie, K. A. P ersson, and D. W. Small, Incorp orat- ing electronic information in to machine learning p oten- tial energy surfaces via approac hing the ground-state electronic energy as a function of atom-based electronic p opulations, Journal of Chemical Theory and Compu- tation 16 , 4256 (2020), pMID: 32502350. [42] I. Batatia, P . Benner, Y. Chiang, A. M. Elena, D. P . K ov ács, J. Rieb esell, X. R. Advincula, M. Asta, M. A v a ylon, W. J. Baldwin, F. Berger, N. Bern- stein, A. Bhowmik, S. M. Blau, V. Cărare, J. P . Darb y , S. De, F. D. Pia, V. L. Deringer, R. Elijošius, Z. El-Machac hi, F. F alcioni, E. F ako, A. C. F errari, A. Genreith-Sc hriever, J. George, R. E. A. Go o dall, C. P . Grey , P . Grigorev, S. Han, W. Handley , H. H. Heenen, K. Hermansson, C. Holm, J. Jaafar, S. Hof- mann, K. S. Jakob, H. Jung, V. Kapil, A. D. Kaplan, N. Karimitari, J. R. Kermo de, N. Kroupa, J. Kull- gren, M. C. Kuner, D. Kuryla, G. Liepuoniute, J. T. Margraf, I.-B. Magdău, A. Mic haelides, J. H. Mo ore, A. A. Naik, S. P . Niblett, S. W. Norwoo d, N. O’Neill, C. Ortner, K. A. Persson, K. Reuter, A. S. Rosen, L. L. Schaaf, C. Schran, B. X. Shi, E. Sivonxa y , T. K. Stenczel, V. Sv ahn, C. Sutton, T. D. Swinburne, J. Tilly , C. v an der Oord, E. V arga-Um brich, T. V egge, M. V on- drák, Y. W ang, W. C. Witt, F. Zills, and G. Csányi, A foundation model for atomistic materials chemistry (2024), arXiv:2401.00096 [physics.c hem-ph]. [43] B. Deng, P . Zhong, K. Jun, J. Rieb esell, K. Han, C. J. Bartel, and G. Ceder, Chgnet as a pretrained universal neural netw ork p oten tial for charge -informed atomistic mo delling, Nature Machine Intelligence 5 , 1031–1041 (2023). [44] B. M. W o od, M. Dzam ba, X. F u, M. Gao, M. Shuaibi, L. Barroso-Luque, K. Ab delmaqsoud, V. Gharakhany an, J. R. Kitchin, D. S. Levine, K. Mic hel, A. Sriram, T. Cohen, A. Das, A. Rizvi, S. J. Saho o, Z. W. Ulissi, and C. L. Zitnick, Uma: A family of universal mo dels for atoms (2025), [cs.LG]. [45] J. S. Smith, O. Isay ev, and A. E. Roitb erg, Ani-1: an extensible neural net work p otential with dft accuracy at force field computational cost, Chemical Science 8 , 3192 (2017). [46] C. Devereux, J. S. Smith, K. K. Huddleston, K. Barros, R. Zubatyuk, O. Isa y ev, and A. E. Roitb erg, Extend- ing the applicability of the ani deep learning molecular p oten tial to sulfur and halogens, Journal of Chemical Theory and Computation 16 , 4192 (2020). [47] J. S. Smith, R. Zubatyuk, B. Nebgen, N. Lubb ers, K. Barros, A. E. Roitberg, O. Isay ev, and S. T retiak, The ani-1ccx and ani-1x data sets, coupled-cluster and densit y functional theory prop erties for molecules, Sci- en tific Data 7 , 134 (2020). [48] D. P . K o vács, J. H. Mo ore, N. J. B rowning, I. Bata- tia, J. T. Horton, Y. Pu, V. Kapil, W. C. Witt, I.-B. Magdáu, D. J. Cole, and G. Csányi, Mace-off: Short- range transferable machine learning force fields for or- ganic molecules, Journal of the American Chemical So- ciet y 147 , 17598 (2025). [49] P . Eastman, P . K. Behara, D. L. Dotson, R. Galvelis, J. E. Herr, J. T. Horton, Y. Mao, J. D. Chodera, B. P . Pritc hard, Y. W ang, G. D. F abritiis, and T. E. Mark- land, Spice, a dataset of drug-like molecules and p ep- tides for training mac hine learning potentials (2022), arXiv:2209.10702 [physics.c hem-ph]. [50] R. Zubatyuk, J. S. Smith, J. Leszczynski, and O. Isay ev, A ccurate and transferable m ultitask prediction of chem- ical prop erties with an atoms-in-molecules neural net- w ork, Science Adv ances 5 , eaa v6490 (2019). [51] D. M. Anstine, R. Zubat yuk, and O. Isay ev, Aimnet2: a neural netw ork potential to meet your neutral, charged, organic, and elemen tal-organic needs, Chemical Science 16 , 10228 (2025). [52] B. Kalita, R. Zubat yuk, D. M. Anstine, M. Bergeler, V. Settels, C. Stork, S. Spicher, and O. Isa yev, Aimnet2- nse: A transferable reactive neural net work potential for 28 op en-shell chemistry , Angewandte Chemie International Edition 65 , 10.1002/anie.202516763 (2025). [53] A. Kab ylda, J. T. F rank, S. Suárez-Dou, A. Khabibrakhmanov, L. Medrano Sandonas, O. T. Unk e, S. Chmiela, K.-R. Müller, and A. Tk atchenk o, Molecular simulations with a pretrained neural netw ork and universal pairwise force fields, Journal of the American Chemical So ciet y 147 , 33723 (2025). [54] D. S. Levine, M. Shuaibi, E. W. C. Sp otte-Smith, M. G. T aylor, M. R. Hasyim, K. Michel, I. Batatia, G. Csányi, M. Dzam ba, P . Eastman, N. C. F rey , X. F u, V. Gharakhany an, A. S. Krishnapriyan, J. A. Rack ers, S. Ra ja, A. Rizvi, A. S. Rosen, Z. Ulissi, S. V argas, C. L. Zitnick, S. M. Blau, and B. M. W o o d, The op en molecules 2025 (omol25) dataset, ev aluations, and mo d- els (2025), arXiv:2505.08762 [ph ysics.c hem-ph]. [55] I. Batatia, C. Lin, J. Hart, E. Kasoar, A. M. Elena, S. W. Norwoo d, T. W olf, and G. Csányi, Cross learning b et w een electronic structure theories for unifying molec- ular, surface, and inorganic crystal foundation force fields (2025), arXiv:2510.25380 [ph ysics.c hem-ph]. [56] D. P . Ko v ács, I. Batatia, E. S. Arany , and G. Csányi, Ev aluation of the mace force field architecture: F rom medicinal chemistry to materials science, The Journal of Chemical Physics 159 , 10.1063/5.0155322 (2023). [57] G. Mako v and M. C. Pa yne, P erio dic b oundary condi- tions in ab initio calculations, Phys. Rev. B 51 , 4014 (1995). [58] T. F roitzheim, M. Müller, A. Hansen, and S. Grimme, g-xtb: A general-purpose extended tight-binding elec- tronic structure method for the elements h to lr (z=1–103) (2025). [59] J. Liang and M. Head-Gordon, Gold-standard c hemi- cal database 137 (gscdb137): A diverse set of accurate energy differences for assessing and developing density functionals, Journal of Chemical Theory and Computa- tion 21 , 12601–12621 (2025). [60] L. Go erigk, A. Hansen, C. Bauer, S. Ehrlich, A. Na jibi, and S. Grimme, A look at the densit y functional theory zo o with the adv anced gm tkn55 database for general main group thermo chemistry , kinetics and noncov alen t in teractions, Ph ysical Chemistry Chemical Physics 19 , 32184–32215 (2017). [61] N. Mardirossian and M. Head-Gordon, Thirt y years of densit y functional theory in computational chemistry: an ov erview and extensive assessmen t of 200 density functionals, Molecular Physics 115 , 2315–2372 (2017). [62] R. Sure and S. Grimme, Comprehensive b enchmark of asso ciation (free) energies of realistic host–guest com- plexes, Journal of Chemical Theory and Computation 11 , 3785–3801 (2015). [63] V. Gharakhan yan, L. Barroso-Luque, Y. Y ang, M. Sh uaibi, K. Michel, D. S. Levine, M. Dzamba, X. F u, M. Gao, X. Liu, H. Ni, K. Noori, B. M. W oo d, M. Uyt- tendaele, A. Boromand, C. L. Zitnick, N. Marom, Z. W. Ulissi, and A. Sriram, Op en molecular crystals 2025 (omc25) dataset and models (2025), [ph ysics.chem-ph]. [64] F. Della Pia, A. Zen, D. Alfè, and A. Michaelides, How accurate are sim ulations and exp erimen ts for the lattice energies of molecular crystals?, Physical Review Letters 133 , 10.1103/physrevlett.133.046401 (2024). [65] L. S. Price, M. Paloni, M. Salv alaglio, and S. L. Price, One size fits all? developmen t of the cp oss209 data set of experimental and hypothetical poly- morphs for testing computational mo deling meth- o ds, Crystal Gro wth & Design 25 , 3186 (2025), h ttps://doi.org/10.1021/acs.cgd.5c00255. [66] M. Pulev a, L. Medrano Sandonas, B. D. Lörincz, J. Charry , D. M. Rogers, P . R. Nagy , and A. Tk atchenk o, Extending quantum-mec hanical b ench- mark accuracy to biological ligand-p o c ket in teractions, Nature Comm unications 16 , 10.1038/s41467-025-63587- 9 (2025). [67] K. Kriz and J. Rezac, Benchmarking of semiempirical quan tum-mechanical metho ds on systems relev an t to computer-aided drug design, Journal of Chemical In- formation and Mo deling 60 , 1453 (2020). [68] D. A. W app ett and L. Go erigk, Benchmarking densit y functional theory metho ds for metallo enzyme reactions: The introduction of the mme55 set, Journal of Chemical Theory and Computation 19 , 8365–8383 (2023). [69] L. R. Maurer, M. Bursch, S. Grimme, and A. Hansen, Assessing density functional theory for chemically rel- ev an t op en-shell transition metal reactions, Journal of Chemical Theory and Computation 17 , 6134 (2021). [70] H. Neugebauer, H. T. V uong, J. L. W eb er, R. A. F ries- ner, J. Shee, and A. Hansen, T ow ard b enc hmark-qualit y ab initio predictions for 3d transition metal electro cat- alysts: A comparison of ccsd(t) and ph-afqmc, Journal of Chemical Theory and Computation 19 , 6208–6225 (2023). [71] D. I. Sharapa, A. Genaev, L. Cav allo, and Y. Min enk o v, A robust and cost-efficient sc heme for accurate confor- mational energies of organic molecules, ChemPh ysChem 20 , 92 (2019). [72] S. Ehlert, S. Grimme, and A. Hansen, Conformational energy b enc hmark for longer n-alk ane c hains, The Jour- nal of Physical Chemistry A 126 , 3521 (2022). [73] H.-J. W erner and A. Hansen, Accurate calculation of isomerization and conformational energies of larger molecules using explicitly correlated lo cal coupled clus- ter metho ds in molpro and orca, Journal of Chemical Theory and Computation 19 , 7007 (2023). [74] C. Plett, S. Grimme, and A. Hansen, T ow ard reliable conformational energies of amino acids and dipeptides - the dip confs b enc hmark and dip conl datasets, Journal of Chemical Theory and Computation 20 , 8329 (2024). [75] M. Marianski, A. Supady , T. Ingram, M. Schneider, and C. Baldauf, Assessing the accuracy of across-the-scale metho ds for predicting carb oh ydrate conformational en- ergies for the examples of glucose and α -maltose, Jour- nal of c hemical theory and computation 12 , 6157 (2016). [76] J. Řezáč, D. Bím, O. Gutten, and L. Rulíšek, T ow ard ac- curate conformational energies of smaller p eptides and medium-sized macro cycles: Mp conf196 benchmark en- ergy data set, Journal of Chemical Theory and Compu- tation 14 , 1254 (2018). [77] C. Plett, S. Grimme, and A. Hansen, Conformational energies of biomolecules in solution: Extending the mp- conf196 b enchmark with explicit water molecules, Jour- nal of Computational Chemistry 45 , 419 (2023). [78] P . K. Behara, H. Jang, J. T. Horton, T. Gokey , D. L. Dotson, S. Bo othro yd, C. I. Ba yly , D. J. Cole, L.-P . W ang, and D. L. Mobley , Benc hmarking quan tum me- c hanical levels of theory for v alence parametrization in force fields, The Journal of Physical Chemistry B 128 , 7888 (2024). 29 [79] H. Kruse, A. Mladek, K. Gkionis, A. Hansen, S. Grimme, and J. Sponer, Quan tum chemical b ench- mark study on 46 rna backbone families using a dinu- cleotide unit, Journal of Chemical Theory and Compu- tation 11 , 4972 (2015). [80] J. L. W eber, R. D. Guha, G. Agarwal, Y. W ei, A. A. Fik e, X. Xie, J. Stevenson, K. Leswing, M. D. Halls, R. Ab el, and L. D. Jacobson, Efficient long-range ma- c hine learning force fields for liquid and materials prop- erties (2025), arXiv:2505.06462 [ph ysics.c hem-ph]. [81] A. Hjorth Larsen, J. Jørgen Mortensen, J. Blomqvist, I. E. Castelli, R. Christensen, M. Dułak, J. F riis, M. N. Grov es, B. Hammer, C. Hargus, E. D. Hermes, P . C. Jennings, P . Bjerre Jensen, J. Kermo de, J. R. Kitc hin, E. Leonhard Kolsb jerg, J. Kubal, K. Kaasb- jerg, S. Lysgaard, J. Bergmann Maronsson, T. Max- son, T. Olsen, L. Pastewk a, A. Peterson, C. Rost- gaard, J. Schiøtz, O. Sch ütt, M. Strange, K. S. Thyge- sen, T. V egge, L. Vilhelmsen, M. W alter, Z. Zeng, and K. W. Jacobsen, The atomic sim ulation environ- men t—a python library for working with atoms, Journal of Physics: Condensed Matter 29 , 273002 (2017). [82] J. Wiktor, F. Ambrosio, and A. Pasquarello, Note: As- sessmen t of the scan+rvv10 functional for the structure of liquid water, The Journal of Chemical Physics 147 , 216101 (2017). [83] S. R. Pruitt, H. Nak ata, T. Nagata, E. R. May es, Y. Alexeev, G. D. Fletcher, D. G. F edorov, K. Ki- taura, and M. S. Gordon, Imp ortance of three-b o dy in teractions in molecular dynamics simulations of w a- ter demonstrated with the fragmen t molecular orbital metho d, Journal of Chemical Theory and Computation 12 , 1423 (2016). [84] D. Manna, M. K. Kesharwani, N. Sylv etsky , and J. M. L. Martin, Conv entional and explicitly correlated ab ini- tio b enchmark study on water clusters: Revision of the b egdb and water27 data sets, Journal of Chemical The- ory and Computation 13 , 3136 (2017). [85] A. K. Soper, The Radial Distribution F unctions of W a- ter as Deriv ed from Radiation T otal Scattering Exp eri- men ts: Is There Anything W e Can Say for Sure?, ISRN Ph ysical Chemistry 2013 , 1 (2013). [86] I.-B. Magdău, D. J. Arismendi-Arrieta, H. E. Smith, C. P . Grey , K. Hermansson, and G. Csán yi, Machine learning force fields for molecular liquids: Ethylene carb onate/eth yl meth yl carbonate binary solven t, np j Computational Materials 9 , 146 (2023). [87] L. Martínez, R. Andrade, E. G. Birgin, and J. M. Martínez, Pac kmol: A pack age for building initial configurations for molecular dynamics simulations, Journal of Computational Chemistry 30 , 2157 (2009), h ttps://onlinelibrary .wiley .com/doi/p df/10.1002/jcc.21224. [88] G. J. Mart yna, D. J. T obias, and M. L. Klein, Constant pressure molecular dynamics algorithms, The Journal of Chemical Physics 101 , 4177 (1994). [89] S. Bhattacharjee, M. Isegaw a, M. Garcia-Ratés, F. Neese, and D. A. Pan tazis, Ionization energies and redo x p otentials of hydrated transition metal ions: Ev al- uation of domain-based lo cal pair natural orbital cou- pled cluster approac hes, Journal of Chemical Theory and Computation 18 , 1619 (2022). [90] J. Blumberger and M. Sprik, Ab initio molecular dy- namics sim ulation of the aqueous ru2+/ru3+ redo x re- action: The marcus p ersp ectiv e, The Journal of Physi- cal Chemistry B 109 , 6793–6804 (2005). [91] S. Mandal, R. Kar, B. Meyer, and N. N. Nair, Hybrid functional and plane w av es based ab initio molecular dynamics study of the aqueous fe2+/fe3+ redox re- action**, ChemPhysChem 24 , 10.1002/cphc.202200617 (2022). [92] T. Rose, M. Bursch, J.-M. Mewes, and S. Grimme, F ast and robust modeling of lan thanide and actinide complexes, biomolecules, and molecular crystals with the extended gfn-ff mo del, Inorganic Chemistry 63 , 19364–19374 (2024). [93] S. McIntosh-Smith, S. R. Alam, and C. W o ods, Isam bard-ai: a leadership class sup ercomputer opti- mised sp ecifically for artificial in telligence (2024). [94] S. P arthiban and J. M. L. Martin, Assessment of W1 and W2 theories for the computation of electron affini- ties, ionization p oten tials, heats of formation, and pro- ton affinities, J. Chem. Phys. 114 , 6014 (2001). [95] L. Cheng et al., V ertical ionization potentials of small molecules: A b enc hmark study , J. Chem. Ph ys. 127 , 214105 (2007). [96] S. Luo and D. G. T ruhlar, How evenly can appro ximate densit y functionals treat the different m ultiplicities and ionization states of 4d transition metal atoms?, J. Chem. Theory Comput. 8 , 4112 (2012). [97] B. Ermiş, E. Ekinci, and U. Bozk a ya, State-of-the- art computations of vertical electron affinities with the extended K o opmans’ theorem in tegrated with the CCSD(T) metho d, J. Chem. Theory Comput. 17 , 7648 (2021). [98] M. Korth and S. Grimme, “mindless” dft b enchmarking, J. Chem. Theory Comput. 5 , 993 (2009). [99] B. Chan et al., Assessment of DFT metho ds for transi- tion metals with the TMC151 compilation of data sets and comparison with accuracies for main-group chem- istry , J. Chem. Theory Comput. 13 , 3385 (2017). [100] B. Chan and J. M. Simmie, Barriometry—an enhanced database of accurate barrier heights for gas-phase reac- tions, Phys. Chem. Chem. Phys. 20 , 10732 (2018). [101] Y. Zhao, H. T. Ng, R. P ev erati, and D. G. T ruhlar, Benc hmark database for ylidic b ond disso ciation ener- gies and its use for assessments of electronic structure metho ds, J. Chem. Theory Comput. 8 , 2824 (2012). [102] A. Karton et al., Hea vy atom transfer reactions: HA T707 benchmark, J. Phys. Chem. A (2017), from W4-11 dataset. [103] Y. Zhao et al., Radical stabilization energies: RSE43 b enc hmark, J. Chem. Theory Comput. 8 (2012). [104] B. Chan et al., Bond separation reactions: BSR36 b enc hmark, J. Chem. Theory Comput. (2016). [105] Y. Zhao and D. G. T ruhlar, Design of densit y functionals b y combining the metho d of constraint satisfaction with parametrization for thermochemistry , thermo chemical kinetics, and noncov alen t in teractions, J. Chem. Theory Comput. 4 , 1849 (2008). [106] V. K. Prasad, Z. Pei, S. Edelmann, A. Otero-de-la Roza, and G. A. DiLabio, BH9, a new comprehensive b enc h- mark data set for barrier heigh ts and reaction energies: Assessmen t of density functional approximations and basis set incompleteness p oten tials, J. Chem. Theory Comput. 18 , 151 (2022). [107] L.-J. Y u, F. Sarrami, R. J. O’Reilly , and A. Karton, Re- action barrier heights for cyclorev ersion of hetero cyclic rings: An Ac hilles’ heel for DFT and standard ab initio 30 pro cedures, Chem. Ph ys. 458 , 1 (2015). [108] A. Karton, A. T arnop olsky , J.-F. Lamère, G. C. Schatz, and J. M. L. Martin, Highly accurate first-principles b enc hmark data sets for the parametrization and v al- idation of density functional and other approximate metho ds. Deriv ation of a robust, generally applicable, double-h ybrid functional for thermo c hemistry and ther- mo c hemical kinetics, J. Phys. Chem. A 112 , 12868 (2008). [109] A. Karton, R. J. O’Reilly , B. Chan, and L. Radom, Determination of barrier heights for proton exchange in small water, ammonia, and h ydrogen fluoride clusters with G4(MP2)-type, MPn, and SCS-MPn procedures— a cav eat, J. Chem. Theory Comput. 8 , 3128 (2012). [110] A. Karton, R. J. O’Reilly , and L. Radom, Assessment of theoretical pro cedures for calculating barrier heigh ts for a diverse set of water-catalyzed proton-transfer re- actions, J. Phys. Chem. A 116 , 4211 (2012). [111] J. F riedrich, Efficien t calculation of accurate reaction energies-assessmen t of different mo dels in electronic structure theory , J. Chem. Theory Comput. 11 , 3596 (2015). [112] L. A. Curtiss, K. Raghav ac hari, G. W. T rucks, and J. A. P ople, Gaussian-2 theory for molecular energies of first- and second-row comp ounds, J. Chem. Phys. 94 , 7221 (1991). [113] L.-J. Y u, F. Sarrami, R. J. O’Reilly , and A. Karton, Can dft and ab initio metho ds describ e all asp ects of the potential energy surface of cyclorev ersion reactions?, Mol. Phys. 114 , 21 (2016). [114] L. Goerigk and S. Grimme, A general database for main group thermochemistry , kinetics, and nonco v alen t in teractions- assessment of common and reparameter- ized (meta-) gga density functionals, J. Chem. Theory Comput. 6 , 107 (2010). [115] R. G. Parr and R. G. Pearson, Absolute hardness: com- panion parameter to absolute electronegativity , J. Am. Chem. So c. 105 , 7512 (1983). [116] P . Geerlings, F. De Proft, and W. Langenaeker, Concep- tual densit y functional theory , Chem. Rev. 103 , 1793 (2003). [117] W. J. Mortier, S. K. Ghosh, and S. Shank ar, Electronegativit y-equalization metho d for the calcula- tion of atomic charges in molecules, J. Am. Chem. So c. 108 , 4315 (1986). [118] A. K. Rappé and W. A. Goddard I I I, Charge equilibra- tion for molecular dynamics simulations, J. Ph ys. Chem. 95 , 3358 (1991). [119] A. Karton, Highly accurate ccsdt(q)/cbs reaction bar- rier heigh ts for a div erse set of transition structures: Basis set con vergence and cost-effective approaches for estimating p ost-ccsd(t) con tributions, J. Phys. Chem. A 123 , 6720 (2019). [120] Y. Zhao, B. J. Lynch, and D. G. T ruhlar, Multi- co efficien t extrap olated density functional theory for thermo c hemistry and thermo chemical kinetics, Phys. Chem. Chem. Phys. 7 , 43 (2005). [121] Y. Zhao, N. González-García, and D. G. T ruhlar, Benc hmark database of barrier heigh ts for heavy atom transfer, nucleophilic substitution, association, and uni- molecular reactions and its use to test theoretical meth- o ds, J. Ph ys. Chem. A 109 , 2012 (2005). [122] J. Zheng, Y. Zhao, and D. G. T ruhlar, Representativ e b enc hmark suites for barrier heigh ts of div erse reaction t yp es and assessmen t of electronic structure methods for thermochemical kinetics, J. Chem. Theory Comput. 3 , 569 (2007). [123] L. Goerigk and R. Sharma, The inv24 test set: how well do quan tum-c hemical metho ds describe in v ersion and racemization barriers?, Can. J. Chem. 94 , 1133 (2016). [124] D. Hait and M. Head-Gordon, How accurate is density functional theory at predicting dipole moments? an as- sessmen t using a new database of 200 b enchmark v alues, J. Chem. Theory Comput. 14 , 1969 (2018). [125] A. L. Hick ey and C. N. Rowley , Benchmarking quantum c hemical metho ds for the calculation of molecular dip ole momen ts and polarizabilities, J. Phys. Chem. A 118 , 3678 (2014). [126] A. Karton, D. Gruzman, and J. M. L. Martin, Bench- mark thermo chemistry of the cnh2n+2 alk ane isomers (n = 2-8) and p erformance of dft and comp osite ab ini- tio metho ds for disp ersion-driv en isomeric equilibria, J. Ph ys. Chem. A 113 , 8434 (2009). [127] D. Manna and J. M. L. Martin, What are the ground state structures of c20 and c24? an explicitly correlated ab initio approach, J. Phys. Chem. A 120 , 153 (2016). [128] R. Sure, A. Hansen, P . Sch werdtfeger, and S. Grimme, Comprehensiv e theoretical study of all 1812 c60 isomers, Ph ys. Chem. Chem. Ph ys. 19 , 14296 (2017). [129] L.-J. Y u and A. Karton, Assessment of theoretical pro- cedures for a div erse set of isomerization reactions in- v olving double-b ond migration in conjugated dienes, Chem. Phys. 441 , 166 (2014). [130] L.-J. Y u, F. Sarrami, A. Karton, and R. J. O’Reilly , An assessmen t of theoretical pro cedures for π -conjugation stabilisation energies in enones, Mol. Phys. 113 , 1284 (2015). [131] S. Grimme, M. Steinmetz, and M. Korth, How to com- pute isomerization energies of organic molecules with quan tum chemical methods, J. Org. Chem. 72 , 2118 (2007). [132] R. Huenerbein, B. Sc hirmer, J. Mo ellmann, and S. Grimme, Effects of london disp ersion on the isomer- ization reactions of large organic molecules: a density functional benchmark study , Ph ys. Chem. Chem. Ph ys. 12 , 6940 (2010). [133] A. Karton, S. Daon, and J. M. Martin, W4-11: A high-confidence benchmark dataset for computational thermo c hemistry derived from first-principles w4 data, Chem. Phys. Lett. 510 , 165 (2011). [134] A. Karton and J. M. Martin, Explicitly correlated b enc hmark calculations on c8h8 isomer energy separa- tions: how accurate are dft, double-hybrid, and compos- ite ab initio procedures?, Mol. Phys. 110 , 2477 (2012). [135] D. Gruzman, A. Karton, and J. M. L. Martin, Perfor- mance of ab initio and density functional metho ds for conformational equilibria of cnh2n+2 alk ane isomers (n = 4-8), J. Ph ys. Chem. A 1 13 , 11974 (2009). [136] M. K. Kesharwani, A. Karton, and J. M. L. Martin, Benc hmark ab initio conformational energies for the proteinogenic amino acids through explicitly correlated metho ds. assessmen t of density functional metho ds, J. Chem. Theory Comput. 12 , 444 (2016). [137] S. Kozuc h, S. M. Bac hrac h, and J. M. Martin, Con- formational equilibria in butane-1, 4-diol: a b enchmark of a protot ypical system with strong intramolecular h- b onds, J. Ph ys. Chem. A 118 , 293 (2014). 31 [138] U. R. F ogueri, S. K ozuch, A. Karton, and J. M. Martin, The melatonin conformer space: Benchmark and assess- men t of wa v e function and dft metho ds for a paradig- matic biological and pharmacological molecule, J. Phys. Chem. A 117 , 2269 (2013). [139] D. Řeha, H. V aldes, J. V ondrášek, P . Hobza, A. Abu- Riziq, B. Crews, and M. S. De V ries, Structure and ir sp ectrum of phenylalan yl–glycyl–glycine trip etide in the gas-phase: Ir/uv exp eriments, ab initio quantum c hem- ical calculations, and molecular dynamic sim ulations, Chem. Eur. J. 11 , 6803 (2005). [140] L. Go erigk, A. Karton, J. M. Martin, and L. Radom, A c- curate quantum c hemical energies for tetrap eptide con- formations: why mp2 data with an insufficient basis set should b e handled with caution, Phys. Chem. Chem. Ph ys. 15 , 7028 (2013). [141] J. M. L. Martin, What can we learn ab out disp er- sion from the conformer surface of n-pentane?, J. Ph ys. Chem. A 117 , 3118 (2013). [142] G. I. Csonk a, A. D. F rench, G. P . Johnson, and C. A. Stortz, Ev aluation of density functionals and basis sets for carb oh ydrates, J. Chem. Theory Comput. 5 , 679 (2009). [143] H. Kruse, A. Mladek, K. Gkionis, A. Hansen, S. Grimme, and J. Sponer, Quan tum chemical b ench- mark study on 46 rna backbone families using a din- ucleotide unit, J. Chem. Theory Comput. 11 , 4972 (2015). [144] J. Řezáč, Y. Huang, P . Hobza, and G. J. O. Beran, Benc hmark calculations of three-b ody intermolecular in teractions and the p erformance of low-cost electronic structure methods, J. Chem. Theory Comput. 11 , 3065 (2015). [145] S. A. Ochieng and K. Patk o wski, Accurate three-b o dy nonco v alen t interactions: the insights from energy de- comp osition, Ph ys. Chem. Chem. Ph ys. 25 , 28621 (2023). [146] J. Witte, M. Goldey , J. B. Neaton, and M. Head- Gordon, Beyond energies: Geometries of nonbonded molecular complexes as metrics for assessing electronic structure approaches, J. Chem. Theory Comput. 11 , 1481 (2015). [147] J. Řezáč and P . Hobza, Describing nonco v alen t in terac- tions b eyond the common approximations: How accu- rate is the “gold standard,” ccsd(t) at the complete basis set limit?, J. Chem. Theory Comput. 9 , 2151 (2013). [148] S. Grimme, J. Anton y , S. Ehrlic h, and H. Krieg, A con- sisten t and accurate ab initio parametrization of densit y functional dispersion correction (DFT-D) for the 94 el- emen ts H-Pu, J. Chem. Phys. 132 , 154104 (2010). [149] K. U. Lao, R. Schäffer, G. Jansen, and J. M. Herb ert, A ccurate description of intermolecular interactions in- v olving ions using symmetry-adapted perturbation the- ory , J. Chem. Theory Comput. 11 , 2473 (2015). [150] A. Bauzá, I. Alkorta, A. F rontera, and J. Elguero, On the reliability of pure and hybrid dft metho ds for the ev aluation of halogen, c halcogen, and pnicogen b onds in volving anionic and neutral electron donors, J. Chem. Theory Comput. 9 , 5201 (2013). [151] D. L. Crittenden, A systematic ccsd(t) study of long- range and noncov alen t interactions b etw een b enzene and a series of first- and second-ro w h ydrides and rare gas atoms, J. Ph ys. Chem. A 113 , 1663 (2009). [152] S. N. Steinmann, C. Piemontesi, A. Delachat, and C. Cormin bo euf, Wh y are the interaction energies of c harge-transfer complexes challenging for dft?, J. Chem. Theory Comput. 8 , 1629 (2012). [153] B. J. Mintz and J. M. P arks, Benchmark interaction energies for biologically relev ant nonco v alen t complexes con taining div alent sulfur, J. Phys. Chem. A 116 , 1086 (2012). [154] K. U. Lao and J. M. Herb ert, An improv ed treat- men t of empirical disp ersion and a many-bo dy en- ergy decomp osition scheme for the explicit p olariza- tion plus symmetry-adapted p erturbation theory (xs- apt) metho d, J. Chem. Ph ys. 139 , 034107 (2013). [155] S. Y o o, E. Apra, X. C. Zeng, and S. S. Xantheas, High- lev el ab initio electronic structure calculations of water clusters (h2o) 16 and (h2o) 17: A new global minimum for (h2o) 16, J. Phys. Chem. Lett. 1 , 3122 (2010). [156] J. K. Kazimirski and V. Buch, Search for low energy structures of w ater clusters (h2o) n, n= 20- 22, 48, 123, and 293, J. Ph ys. Chem. A 107 , 9762 (2003). [157] A. D. Bo ese, Basis set limit coupled-cluster studies of h ydrogen-b onded systems, Mol. Phys. 113 , 1618 (2015). [158] J. Řezáč, Non-cov alent interactions atlas b enc hmark data sets: Hydrogen b onding, J. Chem. Theory Com- put. 16 , 2355 (2020). [159] V. Oliveira and E. Krak a, Systematic coupled cluster study of noncov alent interactions inv olving halogens, c halcogens, and pnicogens, J. Ph ys. Chem. A 121 , 9544 (2017). [160] J. Lang, G. Garberoglio, M. Przybytek, M. Jeziorsk a, and B. Jeziorski, Three-b ody p otential and third virial co efficien ts for helium including relativistic and nuclear- motion effects, Phys. Chem. Chem. Phys. 25 , 23395 (2023). [161] J. C. F av er, M. L. Benson, X. He, B. P . Rob erts, B. W ang, M. S. Marshall, M. R. Kennedy , C. D. Sherrill, and K. M. Merz Jr, F ormal estimation of errors in com- puted absolute in teraction energies of protein- ligand complexes, J. Chem. Theory Comput. 7 , 790 (2011). [162] K. L. Cop eland and G. S. T sc h ump er, Hydro carbon/wa- ter in teractions: Encouraging energetics and structures from dft but disconcerting discrepancies for hessian in- dices, J. Chem. Theory Comput. 8 , 1646 (2012). [163] T. T ak atani and C. D. Sherrill, P erformance of spin- comp onen t-scaled møller–plesset theory (scs-mp2) for p oten tial energy curv es of nonco v alen t interactions, Ph ys. Chem. Chem. Ph ys. 9 , 6106 (2007). [164] E. G. Hohenstein and C. D. Sherrill, Effects of het- eroatoms on aromatic π - π interactions: benzene- pyri- dine and p yridine dimer, J. Ph ys. Chem. A 113 , 878 (2009). [165] D. G. A. Smith, P . Janko wski, M. Slawik, H. A. Witek, and K. P atko wski, Basis set con v ergence of the p ost- ccsd(t) contribution to noncov alent interaction energies, J. Chem. Theory Comput. 10 , 3140 (2014). [166] K. Mada jczyk, P . S. Żucho wski, F. Brz¸ ek, Ł. Ra jchel, D. K¸ edziera, M. Modrzejewski, and M. Hapk a, Dataset of noncov alent intermolecular in teraction energy curv es for 24 small high-spin open-shell dimers, J. Chem. Phys. 154 , 134106 (2021). [167] D. Setia wan, E. Krak a, and D. Cremer, Strength of the pnicogen b ond in complexes in volving group v a elemen ts n, p, and as, J. Ph ys. Chem. A 119 , 1642 (2015). 32 [168] M. Przybytek, W. Cencek, B. Jeziorski, and K. Sza- lewicz, Pair p otential with submillikelvin uncertain ties and nonadiabatic treatment of the halo state of the he- lium dimer, Phys. Rev. Lett. 119 , 123401 (2017). [169] P . Jurečk a, J. Šponer, J. Čern` y, and P . Hobza, Bench- mark database of accurate (mp2 and ccsd (t) complete basis set limit) in teraction energies of small mo del com- plexes, dna base pairs, and amino acid pairs, Phys. Chem. Chem. Phys. 8 , 1985 (2006). [170] J. Řezáč, K. E. Riley , and P . Hobza, S66: A well- balanced database of b enc hmark interaction energies relev an t to biomolecular structures, J. Chem. Theory Comput. 7 , 2427 (2011). [171] B. T emelso, K. A. Arc her, and G. C. Sh ields, Benc hmark structures and binding energies of small water clusters with anharmonicit y corrections, J. Phys. Chem. A 115 , 12034 (2011). [172] N. Mardirossian, D. S. Lambrec ht, L. McCaslin, S. S. Xan theas, and M. Head-Gordon, The p erformance of densit y functionals for sulfate-water clusters, J. Chem. Theory Comput. 9 , 1368 (2013). [173] P . R. T en tsc her and J. S. Arey , Binding in radical- solv ent binary complexes: Benchmark energies and per- formance of approximate metho ds, J. Chem. Theory Comput. 9 , 1568 (2013). [174] V. S. Bryan tsev, M. S. Diallo, A. C. T. v an Duin, and W. A. Go ddard, Ev aluation of b3lyp, x3lyp, and m06- class density functionals for predicting the binding en- ergies of neutral, protonated, and deprotonated w ater clusters, J. Chem. Theory Comput. 5 , 1016 (2009). [175] J. Řezáč, K. E. Riley , and P . Hobza, Benchmark cal- culations of noncov alent interactions of halogenated molecules, J. Chem. Theory Comput. 8 , 4285 (2012). [176] S. Kozuc h and J. M. L. Martin, Halogen b onds: Benc h- marks and theoretical analysis, J. Chem. Theory Com- put. 9 , 1918 (2013). [177] S. P . McCarth y and A. J. Thakk ar, A ccurate all-electron correlation energies for the closed-shell atoms from ar to rn and their relationship to the corresponding mp2 correlation energies, J. Chem. Ph ys. 134 , 044102 (2011). [178] S. J. Chakrav orty , S. R. Gw altney , E. R. Da vidson, F. A. P arpia, and C. F. p Fisc her, Ground-state correlation energies for atomic ions with 3 to 18 electrons, Ph ys. Rev. A 47 , 3649 (1993). [179] E. R. Johnson, P . Mori-Sánchez, A. J. Cohen, and W. Y ang, Delo calization errors in densit y functionals and implications for main-group thermochemistry , J. Chem. Phys. 129 , 204112 (2008). [180] H. Y u and D. G. T ruhlar, Comp onents of the b ond energy in p olar diatomic molecules, radicals, and ions formed b y group-1 and group-2 metal atoms, J. Chem. Theory Comput. 11 , 2968 (2015). [181] S. N. Steinmann, G. Csonk a, and C. Corminbo euf, Uni- fied inter-and intramolecular disp ersion correction for- m ula for generalized gradient approximation density functional theory , J. Chem. Theory Comput. 5 , 2950 (2009). [182] J. F riedrich and J. Haänchen, Incremen tal ccsd (t)(f12*)| mp2: A black b o x method to obtain highly accurate reaction energies, J. Chem. Theory Comput. 9 , 5381 (2013). [183] B. Chan, Compilation of ionic clusters with the rock salt structure: Accurate b enchmark thermochemical data, assessmen t of quantum c hemistry metho ds, and the con- v ergence b ehavior of lattice energies, J. Phys. Chem. A 127 , 5652 (2023). [184] L. Go erigk and S. Grimme, Efficien t and accurate double-h ybrid-meta-gga densit y functionals–ev aluation with the extended gmtkn30 database for general main group thermochemistry , kinetics, and noncov alen t inter- actions, J. Chem. Theory Comput. 7 , 291 (2011). [185] B. Chan, Accurate thermo chemistry for main-group el- emen ts up to xenon with the wn-p34 series of comp osite metho ds, J. Chem. Theory Comput. 17 , 5704 (2021). [186] A. Karton, P . R. Sc hreiner, and J. M. L. Martin, Heats of formation of platonic h ydro carbon cages by means of high-lev el thermo c hemical procedures, J. Comput. Chem. 37 , 49 (2016). [187] S. Grimme, T ow ards first principles calculation of elec- tron impact mass sp ectra of molecules., Angew. Chem. In t. Ed. 52 (2013). [188] F. Neese, T. Sch w ab e, S. Kossmann, B. Schirmer, and S. Grimme, Assessment of orbital-optimized, spin- comp onen t scaled second-order many-bo dy p erturba- tion theory for thermo c hemistry and kinetics, J. Chem. Theory Comput. 5 , 3060 (2009). [189] N. B. Balabano v and K. A. Peterson, Basis set limit electronic excitation energies, ionization p oten tials, and electron affinities for the 3d transition metal atoms: Coupled cluster and multireference metho ds, J. Chem. Ph ys. 125 , 074110 (2006). [190] B. Chan, The cuagau set of coupled-cluster reference data for small copp er, silver, and gold comp ounds and assessmen t of dft methods, J. Phys. Chem. A 123 , 5781 (2019). [191] B. Chan, Dap d set of p d-con taining diatomic molecules: A ccurate molecular prop erties and the great lengths to obtain them, J. Chem. Theory Comput. 19 , 9260 (2023). [192] M. A. Iron and T. Janes, Ev aluating transition metal barrier heigh ts with the latest density functional theory exc hange–correlation functionals: The mobh35 bench- mark database, J. Ph ys. Chem. A 123 , 3761 (2019). [193] L. R. Maurer, M. Bursch, S. Grimme, and A. Hansen, Assessing density functional theory for chemically rel- ev an t open-shell transition metal reactions, J. Chem. Theory Comput. 17 , 6134 (2021). [194] B. Chan, P . M. W. Gill, and M. Kimura, Assessmen t of dft methods for transition metals with the tmc151 compilation of data sets and comparison with accuracies for main-group chemistry , J. Chem. Theory Comput. 15 , 3610 (2019). [195] J. Řezáč, Non-cov alent interactions atlas b enc hmark data sets: Hydrogen b onding, Journal of Chemical The- ory and Computation 16 , 2355–2368 (2020). 33 V. Supplemen tary Information A. WTMAD-2 aggregation The weigh ted total mean absolute deviation (WTMAD-2) follows the GMTKN55 conv en tion [60]. F or a collection of b enc hmark subsets k with N k en tries and absolute errors ∆ E k,i , the aggregate score is WTMAD - 2 = P k w k 1 N k P N k i =1 | ∆ E k,i | P k w k , (48) where w k = 56 . 84 kcal / mol ⟨| E k |⟩ rescales each subset by the in verse of its mean absolute reference magnitude ⟨| E k |⟩ . This w eighting balances datasets of very different energetic scales (e.g., small reaction energies vs. large atomisation ener- gies) and allows a single num ber to compare broad b enc hmark suites. W e rep ort WTMAD-2 in k cal/mol. B. Outlier filtering and excluded structures A small num ber of b enchmark structures are excluded per mo del. F or aggregation, we rep ort t w o ex- clusion categories: (i) single-atom configurations ( num_atoms ≤ 1 in the mo del CSV s), and (ii) outliers with | ∆ E | > 100 kcal/mol (summary bar plots only). F ull error distributions are rep orted without this outlier cap in the per-dataset SI figures. T able V reports the per-mo del breakdown for the thermo chem- istry and reaction-barrier subsets used in Fig. 2. Lists of excluded identifiers are provided in the uploaded CSV files (column exclusion_type indicates the reason): excluded_structures_MACE-POLAR-1-M.csv , excluded_structures_MACE-POLAR-1-L.csv , excluded_structures_MACE-OMOL.csv , excluded_structures_UMA-S-1P1.csv , excluded_structures_UMA-M-1P1.csv , excluded_structures_ORBMOL.csv , excluded_structures_g-xTB.csv , and the combined file excluded_structures_all_models.csv in the accompa- n ying data files. T ABLE V. Breakdo wn of excluded structures for the thermochemistry and reaction-barrier summary plots (Fig. 2). Columns rep ort single-atom exclusions ( num_atoms ≤ 1 ) and outliers with | ∆ E | > 100 kcal/mol (summary bars only). The final column giv es the fraction of excluded structures relative to the total num ber of structures ev aluated for eac h model on these subsets. Mo del Single-atom Outlier > 100 T otal % excluded MA CE-POLAR-1-M 86 9 95 3.08 MA CE-POLAR-1-L 86 11 97 3.14 MA CE-OMOL 86 96 182 6.26 UMA-S-1P1 86 66 152 4.92 UMA-M-1P1 86 59 145 4.69 ORBMOL 86 121 207 6.70 g-xTB 86 1 87 2.99 C. Hyp er-parameters a. Hyp erp ar ameter definitions. W e pro vide here a brief summary of the meaning of each h yp er-parameter in the table. More details on the h yp er-parameter choices and their implementation can be found in the original MACE pap er [1] and the MACE-MH-1 pap er [55]. • max_ell sets the highest angular momentum used in the spherical-harmonic expansion of lo cal en vironments, con trolling angular resolution. • correlation sets the maximum correlation order retained in the MACE basis. • max_L sets the maximum total angular momentum of messages passed b etw een in teraction lay ers. • num_c hannels_edge is the size of the edge features in the lo cal MACE part; see [55]. • num_c hannels_node is the size of the no de features in the lo cal MACE part; see [55]. 34 T ABLE VI. Hyp er-parameter settings for the tw o MACE-POLAR-1 v arian ts used in this work. Mo dels h yp er-parameter MACE-POLAR-1-M MA CE-POLAR-1-L max_ell 3 3 correlation 3 3 max_L 1 1 n um_channels_edge 128 128 n um_channels_node 512 512 n um_interactions 2 3 n um_radial_basis 8 8 r_max (Å) 6 6 in teractions_class non-linear [55] non-linear [55] irreps 16x0e 16x0e smearing_sigma_c harge (Å) 1.5 1.5 n um_field_features 2 2 smearing_sigma_field (Å) 1.5, 2.0 1.5, 2.0 field_features_l_max 1 1 spin_c harge_density_l_max 1 1 n um_up date 2 2 batc h size 256 256 energy co efficient 1 1 force co efficient 10 10 • num_in teractions is the num b er of MACE me ssage-passing la yers. • num_radial_basis is the num ber of radial basis functions used to expand in teratomic distances. • r_max is the neighbour cutoff radius that defines the lo cal graph used by MACE. • interactions_class specifies whether interaction blocks are linear or n on-linear; see [55]. • hidden_irreps giv es the hidden lay er dimension used in the readout lay ers. • smearing_sigma_charge is the Gaussian width for charge-densit y smearing (Eq. 8). • smearing_sigma_field is the Gaussian width used for field-feature smearing (Eq. 8). • num_field_features is the num ber of learned field-feature c hannels p er up date step. • field_features_l_max sets the maximum angular momentum of the electrostatic features used in non-lo cal up dates. • spin_charge_densit y_l_max sets the maximum m ultip ole order predicted for the spin/charge densit y . • num_update is the num ber of non-lo cal field-up date iterations applied during inference. • batch size is the effective global batc h size used for training across all devices. • energy co efficient is the weigh t applied to the energy term in the training loss. • force co efficient is the weigh t applied to the force term in the training loss. D. Thermo c hemistry and barrier b enc hmark datasets Thermo c hemistry subsets Ionisation p otentials (G21IP adiabatic first-ro w organics [94]; IP23 [95] and IP30 [96] v ertical IPs of small organic/main-group systems); electron affinities (G21EA [94], EA50 [97]); proton affinities (P A26, small organic bases [94]); mindless molecules (MB08-165 [98], MB16-43 [60]); bond disso ciation and HA T 35 sets (BDE99MR and BDE99nonMR [99], HNBrBDE18 [100], YBDE18 [101], HA T707MR/HA T707nonMR [102], RSE43 [103], BSR36 [104]). All b enc hmark names and citations are collected in this SI section for reference. Reaction barrier subsets Barrier heights (BH28 [59], BH46 [105], BH876 [106], BHDIV7 [60], ORBH35 [100], CRBH14 [107]); proton transfer (DBH22 [108], PX9 [109], W CPT26 [110]); general reaction energies (FH51 [111], G2R C24 [112], BH76RC [60], CR20 [113], D ARC [60], NBPRC [114], R C21 [114]). All reference v alues are CCSD(T)/CBS or explicitly correlated F12 where av ailable. a. WTMAD table subsets. T able I I rep orts WTMAD-2 on disjoint subsets to av oid ov erlap b et ween rows. The thermo c hemistry rows are: TC: Bond Energies + HA T (BDE99MR, BDE99nonMR, BSR36, HA T707MR, HA T707nonMR, RSE43, YBDE18); TC: Reaction Energies (BH76RC, CR20, DAR C, FH51, G2RC24, R C21); and TC: Thermo c hemistry (AlkIso d14, DC13, EA50, G21EA, HEA VYSB11, MB08-165, SN13, T AE_W4-17MR, W CPT6). The barrier ro w is BH: Barrier + Proton (BH28, BH46, BH876, BHDIV7, BHROT27, CRBH14, ORBH35, DBH22, PX9, WCPT26). HA T denotes hydrogen-atom transfer. E. Real Space Electrostatic Energy Computation 1. R e al Spac e Ener gy As describ ed in the main text, the electrostatic energy in op en b oundary conditions reduces to the following sum: E Hartree = 1 2 X ilm,j l ′ m ′ p i,lm p j,l ′ m ′ T ilm,j l ′ m ′ (49) T ilm,j l ′ m ′ := Z Z ϕ lm ( r − r i ) ϕ l ′ m ′ ( r ′ − r j ) | r − r ′ | d r d r ′ (50) In the same open-b oundary setting, the electrostatic p otential generated b y the multipoles is v ( r ) = X j l ′ m ′ p j,l ′ m ′ Z ϕ l ′ m ′ ( r ′ − r j ) | r − r ′ | d r ′ (51) and the atom-centred pro jected p otential feature entering Equation 17 is v i,nlm = 1 N nl X j l ′ m ′ p j,l ′ m ′ Z Z ϕ nlm ( r − r i ) ϕ l ′ m ′ ( r ′ − r j ) | r − r ′ | d r d r ′ . (52) F or l = l ′ = 0 , with source width σ charge and receiver width σ field ,n , this b ecomes v i,n 00 = X j p j, 00 erf ( r ij / (2 σ tot ,n )) r ij , σ tot ,n = s σ 2 charge + σ 2 field ,n 2 . (53) The electrostatic energy is recov ered from the same con traction when source and receiver basis widths are iden tical. One can easily ev aluate the monop ole co efficien ts for distinct i, j as: T i 00 ,j 00 = erf ( r ij / √ 2 σ ) r ij (54) where σ is the half-width of the Gaussian function ϕ n 00 . F or the dip ole interactions, one can approximate an l = 1 Gaussian t yp e orbital as a linear combination of l = 0 orbitals whic h are slightly displaced from one another. This is v alid not only for distan t i and j , but also when atoms i and j are close together so that the Gaussian functions on eac h atom o verlap. T o show this, consider elements of ϕ 1 m for m = − 1 , 0 , 1 with a fixed half-width σ . These are Gaussian t yp e orbitals (GTOs) with the follo wing form: ϕ 1 m ( r ) = C 1 r 1 exp  − | r − r i | 2 2 σ 2  Y 1 m  \ r − r i  (55) where C 1 is the normalisation constant. Using the Condon-Shortley phase con v ention for the spherical harmonics, these three GTOs ( m = − 1 , 0 , 1 ) are aligned with the y, z and x axes, in that order. Let ϕ 1 α for α = x, y , z b e a 36 p erm utation of ϕ 1 m whic h sw aps from the Condon-Shortley phase conv en tion ( y , z , x ) bac k to the Cartesian ordering x, y , z , so that ϕ 1 x is aligned with the x axis, and so on. W e wan t to approximate ϕ 1 x ( r ) using ϕ 00 ( r ) and ϕ 00 ( r + a ˆ x ) , where ˆ x is a unit vector in the x direction, and a is a small constant. This can b e done as follows. Using the expression (55), one can sho w that ϕ 1 x ( r ) = √ 3 σ 2 ∂ ∂ x ϕ 00 ( r ) × C 1 C 0 (56) where C 0 is the normalisation constant for the l = 0 GTO and C 1 is the normalisation constant for the l = 1 GTO. Then, since ∂ ∂ x f ( x ) = lim a → 0 f ( x + a ) − f ( x ) a , (57) w e can write ϕ 1 x ( r ) = C 1 C 0 √ 3 σ 2 lim a → 0  ϕ 00 ( r + a ˆ x ) − ϕ 00 ( r ) a  (58) In other w ords, with a few normalisation terms, one can app ro ximate l = 1 orbitals by using tw o l = 0 orbitals displaced by a distance a , and the approximation is exact for all r as a go es to zero. In practice, using a = 0 . 02 Å is sufficien t to ac hieve sub-me V total energy conv ergence. In order to compute the interaction energy of a charge p i, 00 on atom i and a dip ole p j, 1 m on atom j , we therefore pro ceed as follows. F or the x -comp onent of the dip ole, approximate p j, 1 x ϕ 1 x ( r − r j ) = q ϕ 00 ( r − r j + a ˆ x ) − q ϕ 00 ( r − r j ) (59) q = p j, 1 x C 1 √ 3 σ 2 aC 0 (60) Rep eat for the y and z comp onen ts, and then simply compute the interaction energy of these 6 Gaussian charges with other charges and dip oles using (54). The same displaced-charge construction is used for pro jected p oten tial features. F or feature ev aluation we use a = 0 . 1 Å, which improv es n umerical stability of the finite-difference pro jection while preserving the accuracy of the long-range feature map. 2. Self-Inter action Ener gy W e found it beneficial to add the energy of the self-in teraction of the Gaussian charge distribution in teracting with itself to the electrostatic energy . The self-energy for a single Gaussian of m ultip ole order l is: E ( l ) self ,i = 1 2(2 l + 1) Z ∞ 0 r 2 l +2 ϕ 2 l ( r ) V l ( r ) dr (61) where ϕ l ( r ) = C l r l e − r 2 / (2 σ 2 ) and V l ( r ) is the p otential generated by ϕ l . In the implementation, w e define: I 1 ( r , l , σ ) = 2 l +1 / 2 σ 2 l +3 r − ( l +1) P  2 l + 3 2 , r 2 2 σ 2  Γ  2 l + 3 2  (62) I 2 ( r , l , σ ) = σ 2 r l e − r 2 / (2 σ 2 ) (63) where P ( a, x ) is the regularised lo wer incomplete gamma function. The p oten tial contribution is then ev aluated as V l ( r ) = I 1 ( r , l , σ ) + I 2 ( r , l , σ ) and integrated numerically once during model initialisation for eac h ( l, σ ) pair, then stored as constant tensors. The total self-interaction correction is: E self = 1 2 X i X lm X l ′ m ′ p i,lm Γ mm ′ ll ′ ( σ ) p i,l ′ m ′ = 1 2 X i p T i Γ ( σ ) p i (64) where the ov erlap matrix Γ is diagonal in ( l , m ) indices due to angular momentum conserv ation. 37 F. P erio dic Electrostatic Energy Computation 1. R e cipr o c al Sp ac e Grid Construction F or a perio dic system with lattice vectors forming the matrix L = [ a 1 , a 2 , a 3 ] , the recipro cal lattice vectors are defined through L ∗ = 2 π ( L T ) − 1 . The k-space grid is constructed as: k n 1 ,n 2 ,n 3 = n 1 b 1 + n 2 b 2 + n 3 b 3 (65) where b i are the recipro cal lattice vectors (columns of L ∗ ) and ( n 1 , n 2 , n 3 ) ∈ Z 3 are integer indices chosen suc h that | k n 1 ,n 2 ,n 3 | ≤ k cutoff . While one could use all these vectors, we make use of the fact that the density and p otential are scalar fields and hence hav e ˜ f ( k ) = ˜ f ( − k ) ∗ . W e therefore store only a v alues on a half-grid and adjust the formulae b elo w to make use of the conjugate symmetry . The k-space cutoff is determined adaptively from the Gaussian basis parameters using: k cutoff = κ · 2 . 25 σ min ( l max + 1) 0 . 3 (66) where κ is a user-sp ecified factor (t ypically 1.5), σ min = min( σ charge , σ field , 1 , . . . ) is the minimum smearing width across all basis functions, and l max is the maxim um angular momentum. This heuristic ensures that the F ourier transform of the most localised basis function is adequately resolved. 2. F ourier T r ansform of Gaussian Multip oles The F ourier transform of a Gaussian spherical harmonic basis function ϕ nlm ( r − r i ) inv olv es separating angular and radial components. Using the plane wa v e expansion e − i k · r = 4 π P lm ( − i ) l j l ( k r ) Y ∗ lm ( ˆ k ) Y lm ( ˆ r ) where j l is the spherical Bessel function, and in tegrating the Gaussian radial function r l e − r 2 / (2 σ 2 ) against j l ( k r ) , w e obtain: ˜ ϕ nlm ( k ) = Z ϕ nlm ( r ) e − i k · r d r = C l · ( − i ) l Y lm ( ˆ k ) · I l,σ ( k ) (67) where the radial in tegral I l,σ ( k ) ev aluates to: I l,σ ( k ) = 4 π r π 2 σ 2 l +3 k l e − k 2 σ 2 / 2 (68) The fourier series of the full charge densit y is then computed as: ˜ ρ ( k ) = (2 π ) 3 Ω N atoms X i =1 l max X l =0 l X m = − l p i,lm ˜ ϕ lm ( k ) e − k · r i (69) This is ev aluated by first computing the real and imaginary parts separately: Re [ ˜ ρ ( k )] (70) = (2 π ) 3 Ω X i,lm p i,lm [ Re [ ˜ ϕ lm ( k )] cos( k · r i ) + Im [ ˜ ϕ lm ( k )] sin( k · r i )] (71) Im [ ˜ ρ ( k )] (72) = (2 π ) 3 Ω X i,lm p i,lm [ Im [ ˜ ϕ lm ( k )] cos( k · r i ) − Re [ ˜ ϕ lm ( k )] sin( k · r i )] (73) The phases ( − i ) l from the spherical harmonics expansion lead to real parts for even l and imaginary parts for o dd l : Re [ ˜ ϕ lm ( k )] = ( C l ( − 1) l/ 2 Y lm ( ˆ k ) I l,σ ( k ) if l ev en 0 if l o dd (74) Im [ ˜ ϕ lm ( k )] = ( 0 if l even − C l ( − 1) ( l − 1) / 2 Y lm ( ˆ k ) I l,σ ( k ) if l odd (75) 38 3. Coulomb Op er ator in R e cipro c al Sp ac e The Coulomb potential corresp onding to density ˜ ρ ( k ) is obtained by applying the bare Coulomb k ernel: ˜ V ( k ) = 4 π k 2 ˜ ρ ( k ) (76) The electrostatic energy is the inner pro duct of densit y and potential in recipro cal space: E k-space elec = Ω 2 Z R 3 d 3 k (2 π ) 3 ˜ ρ ∗ ( k ) ˜ V ( k ) (77) Discretising ov er the k-p oin t grid and noting that | ˜ ρ ( k ) | 2 = Re [ ˜ ρ ] 2 + Im [ ˜ ρ ] 2 , we obtain: E k-space elec = Ω (2 π ) 6 X k  = 0 k ∈ half-grid 4 π k 2  Re [ ˜ ρ ( k )] 2 + Im [ ˜ ρ ( k )] 2  (78) The factor of (2 π ) 6 arises from the double F ourier transform in the conv olution. The k = 0 term is excluded b ecause it corresponds to the interaction of the uniform background charge densit y and v anishes for neutral systems. F or systems ev aluated in a sufficiently large cell, the k-space sum conv erges exp onen tially with k cutoff . 4. Pr oje cte d Potential F e atur es in R e cipr o c al Sp ac e The pro jected p oten tial features are ev aluated with the same recipro cal-space representation. Giv en ˜ V ( k ) , the pro jection on receiver basis function ( n, l , m ) centred at atom i is v i,nlm = 1 (2 π ) 3     Re h ˜ V ∗ ( 0 ) ˜ ϕ nlm ( 0 ) i + 2 X k  = 0 k ∈ half-grid Re h ˜ V ∗ ( k ) ˜ ϕ nlm ( k ) e − i k · r i i     , (79) where “half-grid” denotes the same op en-half recipro cal set used throughout the recipro cal-space ev aluation; the k = 0 term is treated separately to av oid double counting. When self-interaction features are not included, the on-site ov erlap contribution is subtracted from the pro jected co efficien ts. F or molecular and slab systems ev aluated with the recipro cal-space solver, w e then add the same finite- size correction fields (Mako v–P ayne–Dabo for molecules, slab dip ole correction for 2D-p erio dic cells), pro jected onto the receiver basis, b efore com bining them with the recipro cal-space contribution. 5. Finite-Size Corr e ctions: Makov-Payne-Dab o F ormalism When computing energies of isolated molecules or clusters within p erio dic b oundary conditions, artificial in teractions with p erio dic images must b e corrected. F ollowing Mak ov and Pa yne [Ph ys. Rev. B 51, 4014 (1995)] and Dab o et al. [Ph ys. Rev. B 77, 115139 (2008)], w e apply m ultip ole-based corrections. The leading monop ole correction accounts for the spurious self-interaction of the net charge Q with its p erio dic images: ∆ E mono = α M 2 L Q 2 (80) where L = Ω 1 / 3 and α M = 2 . 837297 is the Madelung constant for a simple cubic lattice, approximating the long-range electrostatic p oten tial exp erienced by a p oin t charge in a p erio dic cubic array . The dip ole correction arises from the quadrup ole interaction energy of the system with the field generated by its p eriodic dip ole images: ∆ E dip = 2 π 3Ω | p | 2 (81) 39 where p = P i q i r i + P i p local i is the total dip ole momen t, including b oth charge-position contributions and intrinsic atomic dip oles (for l max ≥ 1 ). This correction dep ends on the choice of origin; we define r i relativ e to the centre of mass of the atoms. The isotropic quadrup ole correction accounts for the finite spatial extent of the charge distribution: ∆ E quad = − 2 π Q 3Ω T r [ Q ] (82) where the trace of the quadrup ole tensor is T r [ Q ] = P i q i r 2 i + 2 P i r i · p local i . These corrections are applied selectiv ely based on the p erio dicit y tensor pbc : for fully non-p eriodic systems ( pbc = [F alse, F alse, F alse]), all three corrections are applied; for p eriodic systems ( pbc = [T rue, T rue, T rue]), none are applied; for slab geometries ( pbc = [T rue, T rue, F alse]), a dipole correction along the non-p erio dic direction is applied using the slab-sp ecific formalism of Bengtsson [Phys. Rev. B 59, 12301 (1999)]. 40 G. F ukui functions and conceptual DFT This section provides the theoretical basis for the F ukui charge equilibration scheme (Equations 13 and 14 in the main text) b y connecting it to the framework of conceptual density functional theory [115, 116]. In conceptual DFT, the F ukui function f ( r ) describ es the local response of the electron density to a change in the total num ber of electrons N at fixed external p otential v ( r ) : f ( r ) =  ∂ ρ ( r ) ∂ N  v ( r ) (83) The F ukui function iden tifies reactive sites in molecules: regions where f ( r ) is large are preferential sites for nucleophilic or electrophilic attack, dep ending on whether electrons are b eing added or remov ed. In our formalism, the c harge and spin densities are expanded in atom-centred multipoles (Equations 6–8), sub ject to global normalisation constrain ts: Z s ( r ) d r = S, Z ρ ( r ) d r = Q = ⇒ S = X i  p ↑ i, 00 − p ↓ i, 00  , Q = X i  p ↑ i, 00 + p ↓ i, 00  (84) where Q is the total c harge, S is the total spin, and p ↑↓ i, 00 are the spin-up and spin-do wn monop ole co efficien ts (atomic c harges) for atom i . F ollowing the conceptual DFT framew ork, we define an atom-condensed F ukui function f i,Q that describ es how the partial charge on atom i resp onds to a c hange in the total charge Q . In order to in tegrate the spin multiplicit y , w e split the total charge into tw o quantities, the spin-up total charge Q ↑ = Q + S 2 and the spin- do wn total charge Q ↓ = Q − S 2 , and combine them in the arra y Q ↑↓ = ( Q ↑ , Q ↓ ) following the conv ention in the main text. Integrating the F ukui function o v er some atom-cen tred partitioning Ω i and applying the chain rule through the c hemical p otential µ : f ↑ i,Q ↑ = Z Ω i ∂ ρ ( r ) ↑ ∂ Q ↑ d r = ∂ p ↑ i, 00 ∂ Q ↑ = ∂ p ↑ i, 00 ∂ µ ∂ Q ↑ ∂ µ (85) f ↓ i,Q ↓ = Z Ω i ∂ ρ ( r ) ↓ ∂ Q ↓ d r = ∂ p ↓ i, 00 ∂ Q ↓ = ∂ p ↓ i, 00 ∂ µ ∂ Q ↓ ∂ µ (86) The second equalit y in b oth 85 and 86 uses the fact that the monop ole co efficien t p ↑↓ i, 00 represen ts the integrated spin-c harge on atom i for each spin channel, while the third equality applies the chain rule. W e define the unnormalised F ukui co efficien t f ( u ) , ↑↓ i as the sensitivity of the atomic monop ole to the chemical p oten tial: f ( u ) , ↑ i = ∂ p ( u ) , ↑ i, 00 ∂ µ , f ( u ) , ↓ i = ∂ p ( u ) , ↓ i, 00 ∂ µ , f ( u ) , ↑↓ i = ( f ( u ) , ↑ i , f ( u ) , ↓ i ) (87) In classical electronegativity equalisation models, this quantit y is related to the atomic softness [117, 118]. In our mo del, f ( u ) , ↑↓ i is predicted by a neural netw ork and can dep end on b oth lo cal geometry and non-lo cal electrostatic en vironment for u > 0 . The total charge constraint requires: Q ↑↓ = X i p ↑↓ i, 00 = ⇒ ∂ Q ↑↓ ∂ µ = X i ∂ p ↑↓ i, 00 ∂ µ = X i f ( u ) , ↑↓ i . (88) This sho ws that the global softness (deriv ativ e of total c harge with resp ect to c hemical p oten tial) is the sum of atomic softnesses. T o enforce the charge constrain t, we compute how muc h the curren t predicted charges deviate from the target spin-resolved total charges Q ↑↓ and redistribute this error according to the F ukui coefficients. Using a first-order T a ylor expansion: ∆ p ( u ) , ↑↓ i, 00 ≈ ∂ p ( u ) , ↑↓ i, 00 ∂ Q ↑↓ ∆ Q ↑↓ = f ( u ) , ↑↓ i P j f ( u ) , ↑↓ j ∆ Q ↑↓ (89) where ∆ Q ↑↓ = Q ↑↓ target − P j p ( u ) , ↑↓ j, 00 is the spin-charge deficit or surplus. This yields the equilibration formula used in Equations 13 and 14, with the normalised spin-resolved F ukui functions f ( u ) , ↑ i / P j f ( u ) , ↑ j and f ( u ) , ↓ i / P j f ( u ) , ↓ j acting as learnable weigh ts that distribute charge corrections across atoms according to their c hemical softness. 41 -1 0 1 3B-69 -3 0 5 3BHET -4 0 4 A24 -0.36 0 0.82 ADIM6 -10 0 4 AHB21 -5 0 7 Bauza30 - 1 0 3 0 1 0 3 BzDC215 -5 0 2 CARBHB8 -3 0 5 CHB6 -2 0 2 CT20 -1 0 0.74 DS14 -14 0 16 FmH2O10 -60 0 60 HB49 -1 0 3 HB262 -6 0 5 HCP32 - 1 0 3 0 1 0 3 He3 -17 0 8 HEA VY28 -2 0 1 HSG -1 0 1 HW30 -3 0 6 HW6Cl5 -1 1 0 8 HW6F -45 0 29 IHB100 -46 0 30 IHB100x2 -13 0 9 IL16 -2 0 2 NBC10 -77 0 25 NC1 1 - 1 0 3 0 1 0 3 O24 - 1 0 5 0 1 0 5 O24x4 -5 0 1 PNICO23 - 1 0 7 0 1 0 7 RG10N - 1 0 3 0 1 0 3 RG18 -2 0 1 S22 -2 0 2 S66 -5 0 12 Shields38 -3 0 6 SW49Bind22 -87 0 1 0 3 T A13 -31 0 12 W A TER27 -1 0 2 X40 -9 0 9 X40x5 -0.74 0 3 XB8 -6 0 4 XB20 GSCDB138: Noncovalent Interaction wB97M-V g-xTB UMA-M-1P1 UMA-S-1P1 MACE-OMOL ORBMOL MACE-POLAR-1-M MACE-POLAR-1-L FIG. 15. Distribution of signed interaction-energy errors (k cal mol − 1 ) across non-co v alen t benchmarks. Bo xplots compare ω B97M-V, r2SCAN, UMA-M-1P1 , UMA-S-1P1 , MACE-POLAR-1-L , and MACE-POLAR-1-M . Panels use symmetric log scaling only when outliers require it; y-axis tic ks are placed at the low er b ound, 0 (if within range), and the upper b ound. 42 -10 0 8 BH28 -20 0 19 BH46 -40 0 45 BH876 -7 0 22 BHDIV7 -9 0 14 BHPERI1 1 -4 0 2 BHROT27 -10 0 2 CRBH14 -15 0 19 DBH22 -26 0 34 INV23 -24 0 33 ORBH35 -1 1 0 13 PX9 -9 0 10 WCPT26 GSCDB138: Reaction Barriers wB97M-V g-xTB UMA-M-1P1 UMA-S-1P1 MACE-OMOL ORBMOL MACE-POLAR-1-M MACE-POLAR-1-L FIG. 16. Distribution of signed errors (kcal mol − 1 ) for the GSCDB138 reaction-barrier b enchmarks. Each panel sho ws a dataset; b o xplots compare six metho ds ( ω B97M-V, r2SCAN, UMA-M-1P1 , UMA-S-1P1 , MA CE-POLAR-1-L , MACE-POLAR-1-M ). Boxes denote the in terquartile range with median; whisk ers extend to 1.5 × IQR; outliers are small, colour-matched dots. The y-axis uses a symmetric logarithmic scale when extreme outliers are present; ma jor ticks are placed at the low er b ound, 0 (if within range), and the upp er b ound. 43 -13 0 8 AL2X6 - 1 0 3 0 76 ALK8 - 1 0 2 0 1 0 2 AlkAtom19 -46 0 1 0 2 ALKBDE10 -5 0 0.46 AlkIsod14 -16 0 30 BDE99MR -45 0 30 BDE99nonMR -16 0 21 BH76RC -6 0 0.6 BSR36 -14 0 5 CR20 -4 0 4 DARC - 1 0 4 0 1 0 4 DC13 - 1 0 3 0 1 0 2 DIPCS9 - 1 0 3 0 1 0 3 EA50 -8 0 26 FH51 -48 0 20 G21EA - 1 0 3 0 81 G21IP -31 0 24 G2RC24 -51 0 58 HA T707MR -52 0 71 HA T707nonMR -33 0 19 HEA VYSB1 1 - 1 0 2 0 1 0 2 HNBrBDE18 - 1 0 3 0 100 IP23 - 1 0 3 0 1 0 2 IP30 -65 0 76 MB08-165 - 1 0 2 0 1 0 2 MB16-43 - 1 0 2 0 1 0 3 MX34 -6 0 13 NBPRC - 1 0 5 0 1 0 5 P34 - 1 0 2 0 1 0 2 P A26 -21 0 47 PlatonicHD6 -24 0 37 PlatonicID6 -50 0 71 PlatonicIG6 - 1 0 3 0 1 0 3 PlatonicT AE6 - 1 0 2 0 1 0 2 RC21 -74 0 74 RSE43 - 1 0 3 0 1 0 3 SIE4x4 -9 0 8 SN13 -4 0 2 WCPT6 -24 0 12 YBDE18 GSCDB138: Thermochemistry wB97M-V g-xTB UMA-M-1P1 UMA-S-1P1 MACE-OMOL ORBMOL MACE-POLAR-1-M MACE-POLAR-1-L FIG. 17. Distribution of signed thermo c hemical energy errors (kcal mol − 1 ) for GSCDB138. T otal atomisation energy (T AE) datasets (e.g., PlatonicT AE6 and W4-17 T AE subsets) are excluded by design. 44 - 1 0 7 0 1 0 7 CUAGAU83 - 1 0 2 0 1 0 2 DAPD -37 0 42 MME52 -24 0 21 MOBH28 -44 0 29 ROST61 -61 0 77 TMD10 -23 0 44 MOR13 -13 0 18 TMB1 1 GSCDB138: T ransition Metals wB97M-V g-xTB UMA-M-1P1 UMA-S-1P1 MACE-OMOL ORBMOL MACE-POLAR-1-M MACE-POLAR-1-L FIG. 18. Distribution of signed errors (kcal mol − 1 ) for transition-metal datasets in GSCDB138, comparing ω B97M-V, r2SCAN, UMA-M-1P1 , UMA-S-1P1 , MACE-POLAR-1-L , and MACE-POLAR-1-M . 45 T ABLE VII. Calculated densities of organic liquids. Experimental temp eratures and densities for each liquid are also given in the table. NPT simulations were p erformed on systems of around 1000 atoms, with 500 ps equilibration and 500 ps production runs. Liquid Exp. T, K Exp. density , g cm − 3 MACE-OMOL MACE-POLAR-1-M MA CE-POLAR-1-L UMA-S-1P1 1,2-dichloroethane 303.0 1.245 1.354 1.370 1.333 1.369 1,2-dimethoxy-ethane 303.0 0.864 0.983 0.963 0.971 0.956 o-xylene 278.0 0.880 1.031 0.965 0.978 0.984 1,3-dioxolane 298.0 1.060 1.166 1.169 1.150 1.176 1,4-dioxane 298.0 1.034 1.168 1.134 1.147 1.160 1-butanol 298.0 0.809 0.916 0.901 0.891 0.894 1-octanol 298.0 0.826 0.931 0.917 0.904 0.919 1-propanol 303.0 0.800 0.908 0.890 0.874 0.881 2-butanone 303.0 0.800 0.936 0.895 0.849 0.895 t-butyl-alcohol 298.0 0.789 0.894 0.865 0.864 0.876 Fluoroethylene carb onate 298.0 1.485 1.616 1.612 1.468 1.600 N,N-dimethylformamide 303.0 0.945 1.044 1.028 1.026 1.031 N-methyl-2-p yrrolidone 303.0 1.023 1.130 1.115 1.108 1.111 acetic acid 303.0 1.045 1.198 1.187 1.056 1.170 acetone 303.0 0.784 0.930 0.872 0.837 0.879 acetonitrile 298.0 0.786 0.623 0.863 0.782 0.873 acetophenone 298.0 1.028 1.166 1.113 1.075 1.117 benzamide 408.0 1.079 1.192 1.143 1.106 1.167 benzene 298.0 0.876 0.994 0.954 0.968 0.961 benzenethiol 298.0 1.077 1.204 1.132 1.149 1.161 bromobenzene 298.0 1.495 1.675 1.592 1.579 1.621 carbon tetrachloride 298.0 1.594 1.672 1.611 1.669 1.746 chlorobenzene 298.0 1.106 1.242 1.197 1.187 1.221 chloroform 303.0 1.479 1.565 1.536 1.534 1.595 cyclohexane 303.0 0.774 0.916 0.883 0.889 0.888 dibromomethane 298.0 2.497 2.668 2.693 2.562 2.668 dichloromethane 298.0 1.327 1.399 1.438 1.418 1.440 diethyl carbonate 303.0 0.969 1.113 1.082 1.060 1.079 diethyl ether 298.0 0.714 0.841 0.801 0.815 0.809 diethylene glycol 293.0 1.120 1.230 1.213 1.198 1.206 diglyme 298.0 0.943 1.062 1.046 1.052 1.049 dimethyl carbonate 303.0 1.064 1.211 1.194 1.180 1.183 dimethyl sulfide 298.0 0.848 0.956 0.916 0.927 0.929 dimethyl sulfoxide 303.0 1.101 1.205 1.172 1.217 1.193 ethanol 298.0 0.789 0.898 0.880 0.861 0.873 ethyl acetate 298.0 0.900 1.071 1.036 1.002 1.038 ethylene carbonate 317.0 1.321 1.434 1.427 1.361 1.429 ethylene glycol 298.0 1.113 1.212 1.201 1.174 1.191 ethylmeth yl carb onate 298.0 1.012 1.156 1.128 1.110 1.119 fluorobenzene 298.0 1.022 1.185 1.091 1.087 1.129 formaldehyde 258.0 0.815 0.958 0.997 1.058 0.983 glycerol 298.0 1.261 1.346 1.328 1.317 1.325 hexamethylphosphoramide 298.0 1.030 1.152 1.114 1.150 1.116 hexane 298.0 0.661 0.773 0.742 0.741 0.749 methanethiol 298.0 0.867 0.997 0.925 0.907 0.961 methanol 298.0 0.791 0.896 0.884 0.874 0.870 anisole 298.0 0.994 1.120 1.069 1.079 1.080 methyl acetate 298.0 0.934 1.111 1.080 1.043 1.073 methyl benzoate 303.0 1.084 1.230 1.175 1.139 1.184 methyl t-butyl ether 303.0 0.735 0.865 0.822 0.843 0.839 morpholine 298.0 1.000 1.117 1.091 1.105 1.096 nitromethane 298.0 1.137 1.326 1.188 1.148 1.218 nitrobenzene 298.0 1.204 1.310 1.273 1.255 1.286 propylene carbonate 298.0 1.205 1.324 1.295 1.240 1.312 pyridine 298.0 0.982 1.090 1.073 1.066 1.068 quinoline 293.0 1.098 1.216 1.182 1.171 1.185 tetrahydrofuran 303.0 0.883 1.000 0.973 0.967 0.993 thioanisole 298.0 1.058 1.180 1.123 1.134 1.144 thiophene 298.0 1.065 1.158 1.111 1.085 1.175 toluene 298.0 0.867 0.999 0.926 0.944 0.947 triethylamine 298.0 0.728 0.853 0.817 0.836 0.830 vinylene carbonate 303.0 1.350 1.522 1.412 1.386 1.495 46 3.5 4.0 4.5 Small rigid molecules Carbamazepine family F enamate family Small drug molecules UMA -S-1P1 UMA -M-1P1 MA CE- OMOL ORBMOL MA CE-POL AR -1-M MA CE-POL AR -1-L Cor onene A cridine Phthalimide Saccharin 5-Fluor ouracil Carbamazepine 10,11-Dihydr ocarbamazepine Cyheptamide Cytenamide Ox carbazepine F enamic acid Mefenamic acid T olfenamic acid Flufenamic acid Niflumic acid Chalcone Ibupr ofen Napr o x en Desloratadine Sulfamerazine 0.0 0.2 0.4 0.6 0.8 1.0 R elative MAE (kcal/mol) FIG. 19. Relative lattice energy errors for CPOSS209 molecular crystals. Bar heights show mean absolute errors in kcal/mol for predicted relative lattice energies (p olymorph energy differences), group ed by molecular family . The dataset comprises 209 experimental and predicted polymorphs from 20 small drug molecules. Reference calculations combine ω B97M- D3 crystal-phase energies with 1-b ody CCSD(T) corrections. Low er v alues indicate b etter accuracy . 47 T ABLE VII I: Benchmark datasets used in this w ork. Category Dataset Size (struc- tures) Description Reference level Citation Barrier heights (GSCDB) BH28 (GSCDB) 28 Barrier heigh ts subset CCSD(T)/CBS [59, 119] BH46 (GSCDB) 46 Barrier heigh ts CCSD(T)/CBS [59, 105, 120, 121] BH876 (GSCDB) 876 Barrier heigh ts CCSD(T)/CBS [59, 106] BHDIV7 (GSCDB) 7 Barrier heights CCSD(T)/CBS [59, 60] BHPERI11 (GSCDB) 11 P ericyclic barriers CCSD(T)/CBS [59, 114] BHR OT27 (GSCDB) 27 Rotational barriers CCSD(T)/CBS [59, 60] CRBH14 (GSCDB) 14 Cyclorev ersion barriers CCSD(T)/CBS [59, 107] DBH22 (GSCDB) 22 BH76 subset CCSD(T)/CBS [59, 108, 122] INV23 (GSCDB) 23 In v ersion/racemisation barriers CCSD(T)/CBS [59, 123] ORBH35 (GSCDB) 35 Oxygen reaction barriers CCSD(T)/CBS [59, 100] PX9 (GSCDB) 9 Proton-exchange barriers CCSD(T)/CBS [59, 109] W CPT26 (GSCDB) 26 W ater-catalyzed proton transfer CCSD(T)/CBS [59, 110] Electric field props (GSCDB) Dip146 (GSCDB) 190 Dipole momen ts CCSD(T)/CBS [59, 124] HR46 (GSCDB) 128 P olarizabilities CCSD(T)/CBS [59, 125] Isomerisation (GSCDB) AlkIsomer11 (GSCDB) 11 Alk ane isomerisation CCSD(T)/CBS [59, 126] C20C246 (GSCDB) 6 C20/C24 isomers CCSD(T)/CBS [59, 127] C60ISO7 (GSCDB) 7 C60 isomers CCSD(T)/CBS [59, 128] DIE60 (GSCDB) 60 Conjugated diene migration CCSD(T)/CBS [59, 129] EIE22 (GSCDB) 22 Enecarb onyl isomers CCSD(T)/CBS [59, 130] ISO34 (GSCDB) 34 Small/medium isomers CCSD(T)/CBS [59, 131] ISOL23 (GSCDB) 23 Large organic isomers CCSD(T)/CBS [59, 132] ISOMERIZA TION20 (GSCDB) 20 W4-11 isomerisation CCSD(T)/CBS [59, 133] P Arel (GSCDB) 20 Protonated isomers CCSD(T)/CBS [59, 60] St yrene42 (GSCDB) 42 C 8 H 8 isomers CCSD(T)/CBS [59, 134] T AUT15 (GSCDB) 15 T automer isomerisation CCSD(T)/CBS [59, 60] In tramolecular NCI (GSCDB) A CONF (GSCDB) 15 Alk ane conformers CCSD(T)/CBS [59, 135] Amino20x4 (GSCDB) 80 Amino-acid conformers CCSD(T)/CBS [59, 136] BUT14DIOL (GSCDB) 64 Butane-1,4-diol confs CCSD(T)/CBS [59, 137] ICONF (GSCDB) 17 Inorganic conformers CCSD(T)/CBS [59, 60] IDISP (GSCDB) 6 Intramolecular disp ersion CCSD(T)/CBS [59, 114] MCONF (GSCDB) 51 Melatonin conformers CCSD(T)/CBS [59, 138] PCONF21 (GSCDB) 18 Peptide conformers CCSD(T)/CBS [59, 139, 140] P entane13 (GSCDB) 13 n-Pen tane torsions CCSD(T)/CBS [59, 141] SCONF (GSCDB) 17 Sugar conformers CCSD(T)/CBS [59, 142] UPU23 (GSCDB) 23 RNA-backbone conformers CCSD(T)/CBS [59, 143] Nonco v alen t (GSCDB) 3B-69 (GSCDB) 69 Three-b o dy NCIs CCSD(T)/CBS [59, 144] 3BHET (GSCDB) 20 Three-bo dy hetero NCIs CCSD(T)/CBS [59, 145] A19Rel6 (GSCDB) 114 A24 PEC relatives CCSD(T)/CBS [59, 146] A24 (GSCDB) 24 Small NCI dimers CCSD(T)/CBS [59, 147] ADIM6 (GSCDB) 6 Alk ane dimers CCSD(T)/CBS [59, 148] AHB21 (GSCDB) 21 Anion-neutral dimers CCSD(T)/CBS [59, 149] Bauza30 (GSCDB) 30 Halogen/chalcogen/pnicogen dimers CCSD(T)/CBS [59, 150] BzDC215 (GSCDB) 215 Benzene dimer PECs CCSD(T)/CBS [59, 151] CARBHB8 (GSCDB) 8 Carbene–X h ydrogen b onds CCSD(T)/CBS [59, 60] CHB6 (GSCDB) 6 Cation-neutral dimers CCSD(T)/CBS [59, 149] CT20 (GSCDB) 20 Charge transfer CCSD(T)/CBS [59, 152] DS14 (GSCDB) 14 Div alen t sulfur NCIs CCSD(T)/CBS [59, 153] F mH2O10 (GSCDB) 10 F − (H 2 O) 10 isomers CCSD(T)/CBS [59, 154] Con tinued on next page 48 Category Dataset Size (struc- tures) Description Reference level Citation H2O16Rel4 (GSCDB) 4 (H 2 O) 16 conformers CCSD(T)/CBS [59, 155] H2O20Rel9 (GSCDB) 9 (H 2 O) 20 conformers CCSD(T)/CBS [59, 156] HB49 (GSCDB) 49 Hydrogen bonds CCSD(T)/CBS [59, 157] HB262 (GSCDB) 262 Hydrogen bonds CCSD(T)/CBS [59, 158] HCP32 (GSCDB) 32 Halogen/chalcogen/pnicogen CCSD(T)/CBS [59, 159] He3 (GSCDB) 49 He trimer 3-b ody NCIs CCSD(T)/CBS [59, 160] HEA VY28 (GSCDB) 28 Heavy-elemen t NCIs CCSD(T)/CBS [59, 148] HSG (GSCDB) 21 Protein-ligand fragments CCSD(T)/CBS [59, 161] HW30 (GSCDB) 30 Hydrocarb on–water dimers CCSD(T)/CBS [59, 162] HW6Cl5 (GSCDB) 5 Cl − –w ater clusters CCSD(T)/CBS [59, 154] HW6F (GSCDB) 6 F − –w ater clusters CCSD(T)/CBS [59, 154] IHB100 (GSCDB) 100 Ionic hydrogen b onds CCSD(T)/CBS [59, 158] IHB100x2 (GSCDB) 200 Ionic H-bond PECs CCSD(T)/CBS [59, 158] IL16 (GSCDB) 16 Ionic liquids CCSD(T)/CBS [59, 149] NBC10 (GSCDB) 184 NCI PECs (b enzene etc.) CCSD(T)/CBS [59, 163, 164] NC11 (GSCDB) 11 Small NCIs CCSD(T)/CBS [59, 165] O24 (GSCDB) 24 Op en-shell dimers CCSD(T)/CBS [59, 166] O24x4 (GSCDB) 96 Op en-shell PECs CCSD(T)/CBS [59, 166] PNICO23 (GSCDB) 23 Pnicogen NCIs CCSD(T)/CBS [59, 167] R G10N (GSCDB) 275 Rare-gas dimer PECs CCSD(T)/CBS [59, 168] R G18 (GSCDB) 18 Rare-gas complexes CCSD(T)/CBS [59, 60] S22 (GSCDB) 22 NCI dimers CCSD(T)/CBS [59, 169] S66 (GSCDB) 66 NCI dimers CCSD(T)/CBS [59, 170] S66Rel7 (GSCDB) 462 PECs for S66 CCSD(T)/CBS [59, 170] Shields38 (GSCDB) 38 W ater clusters (H 2 O) n CCSD(T)/CBS [59, 171] SW49Bind22 (GSCDB) 22 SO 2 − 4 (H 2 O) n binding CCSD(T)/CBS [59, 172] SW49Rel28 (GSCDB) 28 SO 2 − 4 (H 2 O) n conformers CCSD(T)/CBS [59, 172] T A13 (GSCDB) 13 Radical dimers CCSD(T)/CBS [59, 173] W A TER27 (GSCDB) 27 W ater clusters/ions CCSD(T)/CBS [59, 84, 174] X40 (GSCDB) 40 Halogenated complexes CCSD(T)/CBS [59, 175] X40x5 (GSCDB) 200 Halogen PECs CCSD(T)/CBS [59, 175] XB20 (GSCDB) 20 Halogen-b onded dimers CCSD(T)/CBS [59, 176] Thermo c hemistry (GSCDB) AE11 (GSCDB) 11 Atomic energies (Ar–Rn) CCSD(T)/CBS [59, 177] AE18 (GSCDB) 18 Atomic energies (H–Ar) CCSD(T)/CBS [59, 178] AL2X6 (GSCDB) 6 AlX 3 dimerisation CCSD(T)/CBS [59, 179] ALK8 (GSCDB) 8 Alk aline reactions CCSD(T)/CBS [59, 60] AlkA tom19 (GSCDB) 19 Alk ane atomisation CCSD(T)/CBS [59, 126] ALKBDE10 (GSCDB) 10 Group 1/2 diatomic BDEs CCSD(T)/CBS [59, 180] AlkIso d14 (GSCDB) 14 Alk ane isodesmic CCSD(T)/CBS [59, 126] BDE99MR (GSCDB) 16 W4-11 MR BDEs CCSD(T)/CBS [59, 99] BDE99nonMR (GSCDB) 83 W4-11 SR BDEs CCSD(T)/CBS [59, 99] BH76R C (GSCDB) 30 BH76 reaction energies CCSD(T)/CBS [59, 120, 121] BSR36 (GSCDB) 36 Hydro carb on b ond separations CCSD(T)/CBS [59, 104, 181] CR20 (GSCDB) 20 Cycloreversion energies CCSD(T)/CBS [59, 113] D ARC (GSCDB) 14 Diels–Alder energies CCSD(T)/CBS [59, 179] DC13 (GSCDB) 13 Difficult cases CCSD(T)/CBS [59, 60] DIPCS9 (GSCDB) 9 Double ionisation potentials CCSD(T)/CBS [59, 60] EA50 (GSCDB) 50 Electron affinities CCSD(T)/CBS [59, 97] FH51 (GSCDB) 51 Reaction energies CCSD(T)/CBS [59, 111, 182] G21EA (GSCDB) 25 Electron affinities CCSD(T)/CBS [59, 94] G21IP (GSCDB) 36 Ionisation potentials CCSD(T)/CBS [59, 94] Con tinued on next page 49 Category Dataset Size (struc- tures) Description Reference level Citation G2R C24 (GSCDB) 24 G2/97 reaction energies CCSD(T)/CBS [59, 112] HA T707MR (GSCDB) 202 Hea vy-atom transfer (MR) CCSD(T)/CBS [59, 102] HA T707nonMR (GSCDB) 505 Hea vy-atom transfer (SR) CCSD(T)/CBS [59, 102] HEA VYSB11 (GSCDB) 11 Heavy-elemen t disso ciation CCSD(T)/CBS [59, 60] HNBrBDE18 (GSCDB) 18 N–Br BDEs CCSD(T)/CBS [59, 100] IP23 (GSCDB) 23 V ertical ionisation p otentials CCSD(T)/CBS [59, 95] IP30 (GSCDB) 30 V ertical ionisation p otentials CCSD(T)/CBS [59, 96] MB08-165 (GSCDB) 165 Mindless molecules CCSD(T)/CBS [59, 98] MB16-43 (GSCDB) 43 Mindless molecules CCSD(T)/CBS [59, 60] MX34 (GSCDB) 34 Ionic cluster atomisation CCSD(T)/CBS [59, 183] NBPR C (GSCDB) 12 Oligomerisation/fragmentation CCSD(T)/CBS [59, 114, 184] P34AE (GSCDB) 44 P-blo ck atomisation CCSD(T)/CBS [59, 185] P34EA (GSCDB) 9 P-blo ck electron affinities CCSD(T)/CBS [59, 185] P34IP (GSCDB) 15 P-blo ck ionisation potentials CCSD(T)/CBS [59, 185] P A26 (GSCDB) 26 Proton affinities CCSD(T)/CBS [59, 60] PlatonicRE18 (GSCDB) 18 Platonic reaction energies CCSD(T)/CBS [59, 186] PlatonicT AE6 (GSCDB) 6 Platonic atomisation energies CCSD(T)/CBS [59, 186] R C21 (GSCDB) 21 Radical cation reactions CCSD(T)/CBS [59, 187] RSE43 (GSCDB) 43 Radical stabilisation CCSD(T)/CBS [59, 103, 188] T ransition met- als (GSCDB) 3dTMV rep orted in Ref. [70] 3d-metal v ertical ionisation ener- gies ph-AF QMC [59, 70] 3d4dIPSS (GSCDB) 32 TM ionisation p oten tials CCSD(T) [59, 189] CUA GAU83 (GSCDB) 83 Coinage complexes CCSD(T) [59, 190] D APD (GSCDB) 12 Pd diatomics CCSD(T) [59, 191] MME52 (GSCDB) 52 Metalloenzymes DLPNO- CCSD(T) [59, 68, 170] MOBH28 (GSCDB) 28 Organometallic barriers CCSD(T) [59, 192] R OST61 (GSCDB) 61 Op en-shell reactions CCSD(T) [59, 193] TMD10 (GSCDB) 10 TM diatomics CCSD(T) [59, 194] MOR13 (GSCDB) 13 Closed-shell TM reactions CCSD(T) [59, 194] TMB11 (GSCDB) 11 TM barriers CCSD(T) [59, 194] Conformers (ex- ternal) 37CONF8 296 Small organic conformers CCSD(T)/CBS [71] A CONFL rep orted in Ref. [72] n-alk ane conformers CCSD(T)/CBS [72, 73] DipConfS reported in Ref. [74] Amino-acid and dip eptide con- formers CCSD(T)/CBS [74] Maltose222 222 Carbohydrate conformers CCSD(T)/CBS [75] MPCONF196 196 Medicinal-chemistry conformers CCSD(T)/CBS [76, 77] Op enFF-T ors rep orted in Ref. [78] Drug-lik e torsional profiles CCSD(T)/CBS [78] UPU46 46 RNA bac kb one conformers CCSD(T)/CBS [79] Non-co v alen t in- teractions S30L 30 Host–guest supramolecular DLPNO- CCSD(T) [62] IHB100x10 1000 Ionic hydrogen-bond disso cia- tion curves (NCI A tlas) CCSD(T)/CBS [195] PLA15 15 Protein active sites MP2-F12 + DLPNO- CCSD(T) [67] Con tinued on next page 50 Category Dataset Size (struc- tures) Description Reference level Citation PLF547 547 Protein–fragment in teractions MP2-F12/cc- pVDZ-F12 + DLPNO- CCSD(T) [67] QUID 170 Protein-po ck et dimers LNO-CCSD(T) [66] Alk ali-halide PECs (LiCl/NaCl/KBr) 3 PECs Disso ciation curves DFT ( ω B97M- V) This work Molecular Crys- tals X23-DMC 23 Molecular crystal lattice energies DMC [64] CPOSS209 209 Molecular crystal lattice energies ω B97M-V+ 1- b ody CCSD(T) correction [65], This w ork Liquids & W ater W ater densit y/RDF 333 Liquid w ater (NPT MD) Exp erimen t tar- gets [85] Organic liquids densi- ties 62 Exp erimen tal densities (NPT MD) Exp erimen t tar- gets [80] Solv ation / Re- do x F e/Cl aqueous redox clusters tra j. (ps) Solv ated F e/Cl chlorides DFT [22] Solv ated Cl 2 /Cl − dis- so ciation tra j. (ps) Charge-localisation test in w ater clusters DFT ( ω B97M- V) This work W ater-cluster disso ci- ation (charge trans- fer) scan of separa- tions F ragment-c harge lo calisation during disso ciation DFT ( ω B97M- V) This work Hydrated TM ionisa- tion (M-W6/M-W18) 14 TM ionisation in w ater DLPNO- CCSD(T) [89] Redo x Poten tials in solution tra j. (ps) Solv ated F e/Co chlorides DFT [22] Lan thanides Lanthanide isomers 18 Lanthanide complex isomers r2SCAN-3c [92]

Original Paper

Loading high-quality paper...

Comments & Academic Discussion

Loading comments...

Leave a Comment