A Comparative Investigation of Thermodynamic Structure-Informed Neural Networks

Guojie Li and Liu Hong*

School of Mathematics, Sun Yat-sen University, Guangzhou, 510275, China.
*Corresponding author. E-mail: hongliu@sysu.edu.cn

Abstract

Physics-informed neural networks (PINNs) offer a unified framework for solving both forward and inverse problems of differential equations, yet their performance and physical consistency strongly depend on how governing laws are incorporated. In this work, we present a systematic comparison of different thermodynamic structure-informed neural networks built by incorporating various thermodynamic formulations, including Newtonian, Lagrangian, and Hamiltonian mechanics for conservative systems, as well as the Onsager variational principle and extended irreversible thermodynamics for dissipative systems. Through comprehensive numerical experiments on representative ordinary and partial differential equations, we quantitatively evaluate the impact of these formulations on accuracy, physical consistency, noise robustness, and interpretability. The results show that Newtonian-residual-based PINNs can reconstruct system states but fail to reliably recover key physical and thermodynamic quantities, whereas structure-preserving formulations significantly enhance parameter identification, thermodynamic consistency, and robustness. These findings provide practical guidance for the principled design of thermodynamically consistent models, and lay the groundwork for integrating more general nonequilibrium thermodynamic structures into physics-informed machine learning.

Keywords: Physics-Informed Neural Networks, Inverse Problem, Thermodynamics Formalism.

1 Introduction

Learning dynamical systems whose solutions are consistent with data is an important task in scientific machine learning and has received widespread attention [1-3].
One notable category of works is physics-informed machine learning (PIML). The seminal contributions on this subject date back to the early studies of Lagaris [4] and Owhadi [5], and are now referred to as physics-informed neural networks (PINNs) [6], which utilize the high expressibility of neural networks to approximate the solution of a differential equation. The PINNs model incorporates the governing equation into the network loss function, thereby transforming the problem of inferring solutions and identifying the unknown parameters of PDEs into an optimization problem over the loss function, which means it can solve the forward and inverse problems simultaneously. These characteristics have made it a focus of attention in the field of scientific machine learning [7, 8].

Meanwhile, numerous issues of PINNs have been identified, such as their inability to converge to the right solution [9-12], the high time cost of training [13, 14], and the violation of physical principles [15]. Especially toward the last issue, numerous efforts have been made by either inserting explicit physical laws, such as mass/momentum/energy conservation and entropy dissipation, into the loss function, or reformulating the differential equations into a more physically meaningful structure, so as to make the learning procedure more stable and the learned results more accurate [16, 17].

A systematic way to incorporate such physical structures is provided by the thermodynamic formalism, which offers a unified description of both conservative and dissipative dynamical systems. Conservative systems can be equivalently formulated through Newtonian, Lagrangian, and Hamiltonian mechanics [18-20], emphasizing force balance, variational principles, and energy-preserving symplectic structures, respectively.
However, many realistic systems are inherently dissipative and require a thermodynamically consistent treatment, which is captured by formalisms such as the Onsager variational principle [21], Classical and Extended Irreversible Thermodynamics (CIT/EIT), the GENERIC framework [22], and the Conservation-Dissipation Formalism (CDF) [23]. These approaches describe irreversible dynamics through entropy production, dissipation potentials, or coupled reversible-irreversible structures, thereby imposing intrinsic constraints on admissible system evolution. Embedding these thermodynamic structures into PINNs provides a principled mechanism to constrain the solution space and improve the consistency and reliability of simultaneously solving forward and inverse problems.

However, these previous works rely solely on conservation laws or dissipative structures to solve the forward and inverse problems of differential equations. To the best of our knowledge, there is no quantitative assessment of the impact of combining the PINNs model with different thermodynamic formalisms on the performance of solving forward and inverse problems simultaneously. This constitutes the main motivation for the current study.

The remainder of this paper is organized as follows. Section 2 reviews the fundamentals of PINNs and the detailed formulation of the various proposed thermodynamics-informed neural networks, including Lagrangian- and Hamiltonian-mechanics informed neural networks for conservative systems, and Onsager-variational-principle and extended-irreversible-thermodynamics informed neural networks for dissipative systems.
Section 3 presents numerical experiments comparing the performance of different thermodynamic PINN variants, including the ideal mass-spring oscillator, simple pendulum, and double pendulum as representative conservative systems, as well as the damped pendulum, diffusion equation, and Fisher-Kolmogorov equation as dissipative examples. Furthermore, we investigate the generalization capabilities of the different thermodynamics-formalism-informed PINN models from the perspective of the loss landscape, since the primary distinction among these models lies in their loss functions. Finally, Section 4 summarizes the main findings and discusses potential directions for future research.

2 Methods

2.1 Problem setup

To evaluate the numerical advantages of embedding different forms of the dynamic equations of the same physical system (including both non-dissipative and dissipative systems) into PINN models, we constructed multiple models incorporating these thermodynamic formalisms into neural networks. Our comparative analyses aim to include both forward and inverse problems, i.e., the ability of the model to learn the solutions of the equations and to infer the unknown parameters in the equations. The unique framework of the PINNs model enables the simultaneous solution of both forward and inverse problems associated with a dynamic equation, given partial data and knowledge of the equations. This capability stands out as a key advantage of the PINNs model.

Consider a system of parameterized nonlinear differential equations of the general form

$$\mathbf{u}_t + \mathcal{F}[\mathbf{u}, \lambda] = 0, \quad \forall t \in [0, T], \ \forall x \in \Omega, \tag{1}$$
$$\mathcal{I}[\mathbf{u}](0, x) = 0, \quad \forall x \in \Omega,$$
$$\mathcal{B}[\mathbf{u}](t, x) = 0, \quad \forall t \in [0, T], \ \forall x \in \partial\Omega,$$

where $\mathbf{u}(t, x)$ is the solution of the differential equation, $\mathcal{F}[\cdot, \lambda]$ is a nonlinear operator parameterized by $\lambda$, and $\mathcal{I}[\mathbf{u}]$ and $\mathcal{B}[\mathbf{u}]$ stand for general initial and boundary conditions compatible with the differential equations.
$\Omega$ is a subset of $\mathbb{R}^d$, $\partial\Omega$ represents the boundary of the domain $\Omega$, and $\lambda$ may be unknown. Many works [24, 25] seek the most suitable $\lambda$ for a given data set and call this type of problem an inverse problem. On the other hand, many works [26, 27] aim to design models that obtain a data-driven solution conforming to equation (1) when some data are available and $\lambda$ is known; this type of problem is referred to as the forward problem. In this paper, we do not carefully distinguish between forward and inverse problems. We hope that, when $\lambda$ is unknown, the designed model can find the most suitable parameters from a given small dataset and can then solve equation (1) in a fast and accurate way.

For the forward problems, to evaluate the accuracy of the learned solution $\hat{\mathbf{u}}(t, x)$, the $L_2$ relative error $\|\hat{\mathbf{u}} - \mathbf{u}\|_2 / \|\mathbf{u}\|_2$ is used, while for the inverse problems, the relative error $|\hat{\lambda}_i - \lambda_i| / \lambda_i$ is computed to evaluate the accuracy of the predicted coefficients $\hat{\lambda}$. Due to randomness stemming from factors such as random sampling, network initialization, and optimization, each experiment is conducted ten times, and we subsequently compute the geometric mean of the errors for each case.

Fig. 1: Schematic diagram of the Lagrangian-mechanics, Hamiltonian-mechanics, Onsager-variational-principle, and extended-irreversible-thermodynamics informed neural networks.

2.2 Physics-informed Neural Networks

Physics-informed Neural Networks (PINNs) [6] eliminate the need for discretization of the solution domain and exploit the efficient optimization and prediction capabilities of neural networks. Following the original work, one proceeds by representing the unknown solution $\mathbf{u}(t, x)$ by a deep neural network $\mathbf{u}_\theta(t, x)$, where $\theta$ denotes all trainable parameters of the network, including the weights and biases.
Then, PINNs approximate the map from points in the spatio-temporal domain to the solution of the differential equations by minimizing the following composite loss function

$$\mathrm{Loss}(\theta) = \lambda_{data}\,\mathrm{Loss}_{data}(\theta) + \lambda_{res}\,\mathrm{Loss}_{res}(\theta) + \lambda_{ic}\,\mathrm{Loss}_{ic}(\theta) + \lambda_{bc}\,\mathrm{Loss}_{bc}(\theta). \tag{2}$$

The first term of the above loss function characterizes the difference between the neural network predictions and the labeled data $\mathbf{u}_{true}$ at the given points $(t^i_{data}, x^i_{data})$:

$$\mathrm{Loss}_{data}(\theta) = \frac{1}{N_{data}} \sum_{i=1}^{N_{data}} \left| \mathbf{u}_\theta(t^i_{data}, x^i_{data}) - \mathbf{u}_{true}(t^i_{data}, x^i_{data}) \right|^2, \tag{3}$$

where $t^i_{data} \in [0, T]$ and $x^i_{data} \in \Omega$. The second term is the mean squared error due to the residual of the differential equations:

$$\mathrm{Loss}_{res}(\theta) = \frac{1}{N_{res}} \sum_{i=1}^{N_{res}} \left| \frac{\partial \mathbf{u}_\theta}{\partial t}(t^i_{res}, x^i_{res}) + \mathcal{F}[\mathbf{u}_\theta](t^i_{res}, x^i_{res}) \right|^2, \tag{4}$$

where the summand denotes the equation error (with respect to equation (1)) predicted by the neural network at the given residual points $(t^i_{res}, x^i_{res})$, with $t^i_{res} \in [0, T]$ and $x^i_{res} \in \Omega$. The third term is the mean squared error of the initial condition:

$$\mathrm{Loss}_{ic}(\theta) = \frac{1}{N_{ic}} \sum_{i=1}^{N_{ic}} \left| \mathcal{I}[\mathbf{u}_\theta](0, x^i_{ic}) \right|^2, \quad x^i_{ic} \in \Omega, \tag{5}$$

where $\mathcal{I}[\mathbf{u}_\theta](0, x^i_{ic})$ represents the initial condition predicted by the neural network at the points $(0, x^i_{ic})$. The last term gives the mean squared error of the boundary condition:

$$\mathrm{Loss}_{bc}(\theta) = \frac{1}{N_{bc}} \sum_{i=1}^{N_{bc}} \left| \mathcal{B}[\mathbf{u}_\theta](t^i_{bc}, x^i_{bc}) \right|^2, \quad t^i_{bc} \in [0, T], \ x^i_{bc} \in \partial\Omega, \tag{6}$$

where $\mathcal{B}[\mathbf{u}_\theta](t^i_{bc}, x^i_{bc})$ represents the boundary condition of the PDEs predicted by the neural network at the boundary points $(t^i_{bc}, x^i_{bc})$. The hyperparameters $\lambda_{data}$, $\lambda_{res}$, $\lambda_{ic}$, and $\lambda_{bc}$ assign a different weight to each loss term so as to balance their interplay during model training.
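As a concrete illustration of how the composite loss (2) is assembled, the following self-contained Python sketch evaluates the data, residual, and initial-condition terms for the scalar ODE $u' + u = 0$, $u(0) = 1$, whose exact solution is $u(t) = e^{-t}$. This is purely illustrative: the neural network and automatic differentiation of an actual PINN are replaced by a plain callable and a central finite difference.

```python
import math

# Toy illustration of the composite PINN loss (2) for the scalar ODE
# u'(t) + u(t) = 0, u(0) = 1, whose exact solution is u(t) = exp(-t).
# A real PINN would use a neural network u_theta and automatic
# differentiation; here the candidate "solution" is a plain callable and
# the time derivative is a central finite difference.

def composite_loss(u, t_data, u_true, t_res, h=1e-5,
                   w_data=1.0, w_res=1.0, w_ic=1.0):
    # Data loss (3): mismatch with the labeled samples.
    loss_data = sum((u(t) - v) ** 2 for t, v in zip(t_data, u_true)) / len(t_data)
    # Residual loss (4): mean squared ODE residual u' + u at collocation points.
    loss_res = sum(((u(t + h) - u(t - h)) / (2 * h) + u(t)) ** 2
                   for t in t_res) / len(t_res)
    # Initial-condition loss (5): u(0) should equal 1.
    loss_ic = (u(0.0) - 1.0) ** 2
    return w_data * loss_data + w_res * loss_res + w_ic * loss_ic

t_res = [0.1 * i for i in range(11)]          # collocation points in [0, 1]
t_data = [0.0, 0.5, 1.0]                      # labeled points
u_true = [math.exp(-t) for t in t_data]

exact = lambda t: math.exp(-t)                # satisfies the ODE and u(0) = 1
wrong = lambda t: math.cos(t)                 # satisfies u(0) = 1 only

print(composite_loss(exact, t_data, u_true, t_res))   # near zero
print(composite_loss(wrong, t_data, u_true, t_res))   # far from zero
```

The exact solution drives all three terms to (numerically) zero, while a candidate that only matches the initial condition is penalized by the data and residual terms.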
For practical implementation, a finite number of labeled data points are prescribed on the initial and boundary conditions, and the corresponding data loss is evaluated jointly with labeled points sampled within the solution domain. This treatment is adopted consistently in all subsequent models based on the various thermodynamic formulations.

2.3 Lagrangian-Mechanics informed Neural Networks

We first design neural networks embedded with the Lagrangian theory of classical mechanics, called Lagrangian-Mechanics informed Neural Networks. Assume that the Lagrangian function of the system is known to be $L = T - V$, a function of the generalized coordinates $\mathbf{q}$ and velocities $\dot{\mathbf{q}}$, abbreviated as $L = L(\mathbf{q}, \dot{\mathbf{q}}, t)$. It is well known that $L$ satisfies the Euler-Lagrange equation

$$\frac{d}{dt} \frac{\partial L}{\partial \dot{\mathbf{q}}} - \frac{\partial L}{\partial \mathbf{q}} = 0. \tag{7}$$

Consider embedding the Lagrangian function and the Euler-Lagrange equation into neural network models. First, a neural network is used to approximate the map from time $t$ to the physical system coordinates $\mathbf{q}$. Then, automatic differentiation and the system's Lagrangian function $L(\mathbf{q}, \dot{\mathbf{q}}, t)$ are used to compute the left-hand side of the Euler-Lagrange equation, which is embedded into the neural network model as the residual loss.

The loss function of the model comprises the data loss, the equation residual loss, and the Lagrangian function loss, i.e., $\mathrm{Loss} = \lambda_{data}\,\mathrm{Loss}_{data} + \lambda_{res}\,\mathrm{Loss}_{res} + \lambda_{lag}\,\mathrm{Loss}_{lag}$, where $\lambda_{data}$, $\lambda_{res}$, $\lambda_{lag}$ are three adjustable weights. For the data loss, let $\hat{\mathbf{q}}(t^i_{data})$ and $\hat{\dot{\mathbf{q}}}(t^i_{data})$ be the predictions of the neural network, and $\mathbf{q}(t^i_{data})$ and $\dot{\mathbf{q}}(t^i_{data})$ be the real data at time $t^i_{data}$; then

$$\mathrm{Loss}_{data} = \frac{1}{N_{data}} \sum_{i=1}^{N_{data}} \left[ |\hat{\mathbf{q}}(t^i_{data}) - \mathbf{q}(t^i_{data})|^2 + |\hat{\dot{\mathbf{q}}}(t^i_{data}) - \dot{\mathbf{q}}(t^i_{data})|^2 \right]. \tag{8}$$
$\mathrm{Loss}_{res}$ stands for the residual loss of the Euler-Lagrange equation. Writing $f = \frac{d}{dt}\frac{\partial L}{\partial \dot{\mathbf{q}}} - \frac{\partial L}{\partial \mathbf{q}}$, we have

$$\mathrm{Loss}_{res} = \frac{1}{N_{res}} \sum_{i=1}^{N_{res}} |f(t^i_{res})|^2. \tag{9}$$

$\mathrm{Loss}_{lag}$ represents the constraint on the Lagrangian function, which is calculated as

$$\mathrm{Loss}_{lag} = \frac{1}{N_{lag}} \sum_{i=1}^{N_{lag}} \left( L(\hat{\mathbf{q}}(t^i_{lag}), \hat{\dot{\mathbf{q}}}(t^i_{lag})) - L^i \right)^2, \tag{10}$$

where $L(\hat{\mathbf{q}}(t^i_{lag}), \hat{\dot{\mathbf{q}}}(t^i_{lag}))$ is the Lagrangian of the system predicted by the model at time $t^i_{lag}$, and $L^i$ is the real value of the Lagrangian at the same time point.

2.4 Hamiltonian-Mechanics informed Neural Networks

In addition to the Lagrangian theory of classical mechanics, the Hamiltonian theory can also be used to design corresponding PINN models. Suppose that the Hamiltonian function of the system, $H = T + V$, is known, a function of the generalized coordinates $\mathbf{q}$ and generalized momenta $\mathbf{p}$ of the system, abbreviated as $H = H(\mathbf{q}, \mathbf{p}, t)$. The Hamiltonian function satisfies Hamilton's canonical equations

$$\dot{\mathbf{q}} = \frac{\partial H}{\partial \mathbf{p}}, \qquad \dot{\mathbf{p}} = -\frac{\partial H}{\partial \mathbf{q}}. \tag{11}$$

To make full use of the available physical information and solve the forward and inverse problems of the physical system at the same time, we use neural networks to approximate the map between $t$ and the generalized coordinates $\mathbf{q}$ and momenta $\mathbf{p}$ of the system. The Hamiltonian of the system is then computed through $H(\mathbf{q}, \mathbf{p}, t)$, and each term in Hamilton's canonical equations is obtained with the help of automatic differentiation.

The loss function of this model is $\mathrm{Loss} = \lambda_{data}\,\mathrm{Loss}_{data} + \lambda_{res}\,\mathrm{Loss}_{res}$. Let $\hat{\mathbf{q}}(t^i_{data})$ and $\hat{\mathbf{p}}(t^i_{data})$ be the predictions of the neural network, and $\mathbf{q}(t^i_{data})$ and $\mathbf{p}(t^i_{data})$ be the real data at time $t^i_{data}$.
Then

$$\mathrm{Loss}_{data} = \frac{1}{N_{data}} \sum_{i=1}^{N_{data}} \left[ |\hat{\mathbf{q}}(t^i_{data}) - \mathbf{q}(t^i_{data})|^2 + |\hat{\mathbf{p}}(t^i_{data}) - \mathbf{p}(t^i_{data})|^2 \right]. \tag{12}$$

For the residual loss of Hamilton's canonical equations, writing $f_1 = \frac{d\mathbf{q}}{dt} - \frac{\partial H}{\partial \mathbf{p}}$ and $f_2 = \frac{d\mathbf{p}}{dt} + \frac{\partial H}{\partial \mathbf{q}}$, we have

$$\mathrm{Loss}_{res} = \frac{1}{N_{res}} \sum_{i=1}^{N_{res}} \left[ |f_1(t^i_{res})|^2 + |f_2(t^i_{res})|^2 \right]. \tag{13}$$

Furthermore, if the physical system to be solved is conservative, we add a loss term

$$\mathrm{Loss}_{cons} = \frac{1}{N_{cons}} \sum_{i=1}^{N_{cons}} \left| \frac{dH}{dt}(\hat{\mathbf{q}}(t^i_{cons}), \hat{\mathbf{p}}(t^i_{cons})) \right|^2, \tag{14}$$

where $\frac{dH}{dt}(\hat{\mathbf{q}}(t^i_{cons}), \hat{\mathbf{p}}(t^i_{cons}))$ stands for the total derivative of the predicted Hamiltonian with respect to time. Since the system is ideal and complete and the active forces are conservative, the energy of the physical system is conserved; the time derivative of the system's Hamiltonian should therefore be zero, and $\mathrm{Loss}_{cons}$ adds this constraint to the loss function.

2.5 OVP informed Neural Networks

Dissipative physical systems, such as diffusion processes, chemical reaction networks, and non-equilibrium thermodynamics, are fundamentally governed by the Onsager variational principle (OVP) [28, 29], an extension of Rayleigh's principle of least energy dissipation in Stokesian hydrodynamics. Given generalized coordinates $\mathbf{x} = (x_1, \cdots, x_m)$, where $m$ is the number of degrees of freedom of the system, we introduce the Rayleighian of the system:

$$R = \Phi + \dot{U} = \frac{1}{2} \sum_{i,j} \xi_{ij} \dot{x}_i \dot{x}_j + \sum_i \frac{\partial U}{\partial x_i} \dot{x}_i, \tag{15}$$

where $\Phi = \frac{1}{2}\sum_{i,j}\xi_{ij}\dot{x}_i\dot{x}_j$ is called the dissipation function, and the $\xi_{ij}$ are the friction coefficients, satisfying the Onsager reciprocal relation ($\xi_{ij} = \xi_{ji}$) and positive definiteness. $U(\mathbf{x})$ denotes the potential energy of the system.

The Onsager variational principle states that the true time-evolution path of the system minimizes the Rayleighian, i.e.,

$$\frac{\delta R}{\delta \dot{x}_i} = \sum_j \xi_{ij} \dot{x}_j + \frac{\partial U}{\partial x_i} = 0. \tag{16}$$
The above equation is equivalent to the force balance equation

$$\sum_j \xi_{ij} \dot{x}_j = -\frac{\partial U}{\partial x_i}. \tag{17}$$

Let $(\xi^{-1})_{ij}$ be the inverse of the matrix $\xi_{ij}$; then equation (17) gives a time-evolution equation for $x_i$:

$$\frac{dx_i}{dt} = -\sum_j (\xi^{-1})_{ij} \frac{\partial U}{\partial x_j}. \tag{18}$$

It can be shown that this variational structure guarantees physically consistent behaviors, such as monotonic energy dissipation, stability, and correct steady states.

Here we develop a general framework called the Onsager-Variational-Principle informed Neural Networks (OVP-PINNs), which integrates the Onsager variational structure into physics-informed neural networks for learning dissipative dynamical systems. Suppose that we know the Rayleighian $R(\mathbf{u}, \mathbf{v})$ of the system, a function of the system state $\mathbf{u}$ and the generalized velocity $\mathbf{v}$. By embedding the Rayleighian minimization condition $\frac{\partial R}{\partial \mathbf{v}} = 0$ as a structural constraint in the loss function, the neural network is forced to learn dynamics that respect the correct thermodynamic geometry. The system state $\mathbf{u}$ and the generalized velocity $\mathbf{v}$ are learned by the neural networks.

The total loss combines two terms, $\mathrm{Loss} = \lambda_{data}\,\mathrm{Loss}_{data} + \lambda_{res}\,\mathrm{Loss}_{res}$. Let $\hat{\mathbf{u}}(x^i_{data}, t^i_{data})$ and $\hat{\mathbf{v}}(x^i_{data}, t^i_{data})$ be the system state and generalized velocity predicted by the neural network, and $\mathbf{u}(x^i_{data}, t^i_{data})$ and $\mathbf{v}(x^i_{data}, t^i_{data})$ the labeled data at coordinate $(x^i_{data}, t^i_{data})$. Then

$$\mathrm{Loss}_{data} = \frac{1}{N_{data}} \sum_{i=1}^{N_{data}} \left[ |(\hat{\mathbf{u}} - \mathbf{u})(x^i_{data}, t^i_{data})|^2 + |(\hat{\mathbf{v}} - \mathbf{v})(x^i_{data}, t^i_{data})|^2 \right]. \tag{19}$$

Meanwhile, the Onsager variational principle gives

$$\mathrm{Loss}_{res} = \frac{1}{N_{res}} \sum_{i=1}^{N_{res}} \left| \frac{\delta R}{\delta \hat{\mathbf{v}}}(x^i_{res}, t^i_{res}) \right|^2. \tag{20}$$

As a consequence, combining the Onsager variational principle with neural networks yields a physics-grounded, structure-preserving, and data-efficient learning framework for a wide range of dissipative systems.
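The monotonic energy dissipation guaranteed by this variational structure can be checked numerically. The sketch below is an illustration only, not the paper's implementation: it integrates the gradient-flow equation (18) for a hypothetical two-variable system with quadratic potential $U(\mathbf{x}) = (x_1^2 + x_2^2)/2$ and an arbitrarily chosen symmetric, positive-definite friction matrix, and verifies that $U$ decays monotonically along the trajectory.

```python
# Illustrative check of the gradient-flow structure (18) implied by the
# Onsager variational principle, for a hypothetical two-variable system
# with quadratic potential U(x) = (x1^2 + x2^2)/2 and a symmetric,
# positive-definite friction matrix xi (values chosen for illustration).
# Along the evolution the force balance (17) holds by construction, and
# the potential U must decay monotonically.

XI = [[2.0, 1.0], [1.0, 2.0]]           # Onsager-symmetric, positive definite
XI_INV = [[2/3, -1/3], [-1/3, 2/3]]     # analytic inverse (det XI = 3)

def grad_U(x):                          # dU/dx_i for U = (x1^2 + x2^2)/2
    return [x[0], x[1]]

def U(x):
    return 0.5 * (x[0] ** 2 + x[1] ** 2)

def step(x, dt):                        # forward-Euler step of eq. (18)
    g = grad_U(x)
    return [x[i] - dt * sum(XI_INV[i][j] * g[j] for j in range(2))
            for i in range(2)]

x = [1.0, -0.5]
energies = [U(x)]
for _ in range(200):
    x = step(x, 0.01)
    energies.append(U(x))

print(energies[0], energies[-1])        # the potential decays toward zero
```

Because $\xi$ is symmetric positive definite, each eigen-component of the state relaxes at its own rate, and $U$ decreases strictly at every step, which is exactly the behavior the residual loss (20) rewards.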
2.6 EIT informed Neural Networks

Extended Irreversible Thermodynamics (EIT) [30], building on Classical Irreversible Thermodynamics (CIT) [31], treats non-equilibrium fluxes such as the heat flux and diffusion flux as independent thermodynamic state variables, and can thus effectively describe transient and nonlocal effects in nonequilibrium systems. Within the EIT framework, the entropy function

$$S = S(e, n, \mathbf{q}, \Pi, \cdots) \tag{21}$$

is written as a function of the internal energy $e$, molecular density $n$, heat flux $\mathbf{q}$, strain $\Pi$, etc., and satisfies the local entropy balance equation

$$\frac{\partial S}{\partial t} + \nabla \cdot \mathbf{J}_s = \sigma_s \geq 0, \tag{22}$$

where $\mathbf{J}_s$ is the entropy flux and $\sigma_s$ is the entropy production rate. In CIT, the entropy depends only on conserved variables such as the energy density $e$ and particle number density $n$. Incorporating nonequilibrium fluxes into the entropy function allows EIT to capture memory, relaxation, and finite-speed propagation effects that are absent in CIT.

Following the derivation procedure of EIT, it can be shown that each flux satisfies a relaxation-type equation. For example, the heat flux obeys the Cattaneo-Vernotte equation

$$\tau_q \frac{d\mathbf{q}}{dt} + \mathbf{q} = -\kappa \nabla T, \tag{23}$$

which guarantees finite propagation speed and causality, in contrast to the classical heat-conduction equation based on Fourier's law.

To investigate its effectiveness when combined with deep learning, we incorporate EIT into physics-informed neural networks to learn dissipative dynamical systems directly from data and physical laws. Let $\mathbf{u}$ represent the system state together with its extended nonequilibrium variables, and let $\hat{\mathbf{u}}$ be its neural network approximation. The total loss function is defined as

$$\mathrm{Loss} = \mathrm{Loss}_{data} + \mathrm{Loss}_{res}, \tag{24}$$

where

$$\mathrm{Loss}_{data} = \frac{1}{N_{data}} \sum_{i=1}^{N_{data}} |\hat{\mathbf{u}}(x^i_{data}, t^i_{data}) - \mathbf{u}(x^i_{data}, t^i_{data})|^2 \tag{25}$$

is the data loss term.
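The relaxation behavior encoded in the Cattaneo-Vernotte equation (23) can be seen in a minimal toy computation. The sketch below is illustrative only: $\tau$, $\kappa$, and a frozen (spatially uniform) temperature gradient are assumed arbitrarily, reducing (23) to a scalar relaxation ODE for one flux component.

```python
import math

# Toy illustration of the Cattaneo-Vernotte equation (23) for a single
# heat-flux component under a frozen temperature gradient dT/dx (the
# values of tau, kappa, and dT/dx below are arbitrary, for illustration):
#     tau * dq/dt + q = -kappa * dT/dx.
# The flux relaxes exponentially, with time constant tau, toward the
# Fourier value q_inf = -kappa * dT/dx, instead of jumping to it
# instantaneously as the classical Fourier law would imply.

tau, kappa, dTdx = 0.5, 1.2, 2.0
q_inf = -kappa * dTdx                     # steady Fourier flux

def integrate(q0, t_end, dt=1e-4):        # explicit Euler on tau*q' = -(q - q_inf)
    q = q0
    for _ in range(int(round(t_end / dt))):
        q += dt * (-(q - q_inf) / tau)
    return q

q_num = integrate(0.0, 1.0)
q_ana = q_inf + (0.0 - q_inf) * math.exp(-1.0 / tau)   # exact relaxation law
print(q_num, q_ana)
```

The numerical flux follows the exponential relaxation toward the Fourier value, which is the memory effect that EIT adds on top of CIT.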
Further assuming that the form of the local entropy function $S(e, n, \mathbf{q}, \Pi, \cdots)$ is known, we can introduce the residual loss due to the entropy balance equation:

$$\mathrm{Loss}_{ent} = \frac{1}{N_{ent}} \sum_{j=1}^{N_{ent}} \left| \frac{\partial S}{\partial t}(x^j_{ent}, t^j_{ent}) + \nabla \cdot \mathbf{J}_s(x^j_{ent}, t^j_{ent}) - \sigma_s(x^j_{ent}, t^j_{ent}) \right|^2. \tag{26}$$

By enforcing the entropy balance law as a structural constraint, the resulting EIT-informed neural network is guided to learn dynamics that remain consistent with the fundamental principles of nonequilibrium thermodynamics.

To sum up, we have introduced several variants of PINNs by incorporating different thermodynamic formulations: Hamiltonian- and Lagrangian-mechanics informed neural networks for modeling conservative dynamics, and Onsager-variational-principle and extended-irreversible-thermodynamics informed neural networks for dissipative dynamics. A key distinction between them lies in the residual loss (see Table 1). Models derived from different thermodynamic formulations for the same problem may exhibit dramatically different performance during machine learning, which is the key issue we explore in the current study.

Table 1: Comparison among different thermodynamics-informed neural networks.
| Framework | Theoretical Foundation | Pre-Knowledge | Neural Network | Residue Loss | Description |
|---|---|---|---|---|---|
| NM-PINN | Newtonian Mechanics | $\mathbf{u}(x,t)$ | $\mathbf{u}_\theta(x,t)$ | $\big(\frac{\partial \mathbf{u}_\theta}{\partial t} + \mathcal{F}[\mathbf{u}_\theta]\big)^2$ | Conservative or Dissipative Dynamics |
| LM-PINN | Lagrangian Mechanics | $L(\mathbf{q},\dot{\mathbf{q}},t)$ | $\mathbf{q}_\theta(t)$ | $\big(\frac{d}{dt}\frac{\partial L}{\partial \dot{\mathbf{q}}} - \frac{\partial L}{\partial \mathbf{q}}\big)^2$ | Conservative Dynamics |
| HM-PINN | Hamiltonian Mechanics | $H(\mathbf{q},\mathbf{p},t)$ | $\mathbf{q}_\theta(t)$, $\mathbf{p}_\theta(t)$ | $\big(\frac{d\mathbf{q}}{dt} - \frac{\partial H}{\partial \mathbf{p}}\big)^2 + \big(\frac{d\mathbf{p}}{dt} + \frac{\partial H}{\partial \mathbf{q}}\big)^2$ | Conservative Dynamics |
| OVP-PINN | Onsager's Variational Principle | $R[\mathbf{u},\mathbf{v}]$ | $\mathbf{u}_\theta(x,t)$, $\mathbf{v}_\theta(x,t)$ | $\big(\frac{\delta R}{\delta \mathbf{v}_\theta}\big)^2$ | Dissipative Dynamics |
| EIT-PINN | Extended Irreversible Thermodynamics | $S[\mathbf{u},\mathbf{v}]$ | $\mathbf{u}_\theta(x,t)$, $\mathbf{v}_\theta(x,t)$ | $\big(\frac{\partial S}{\partial t} + \nabla\cdot\mathbf{J}_s - \sigma_s\big)^2$ | Dissipative Dynamics |

3 Results

3.1 Conservative Systems

3.1.1 Ideal Mass Spring

Our first task is to model and solve the dynamics of an ideal mass-spring system [32]. Suppose an object with mass $m$ is attached to the end of an ideal spring. The acceleration driven by the elastic force then follows from Newton's second law of motion:

$$m\ddot{q} = -kq, \tag{27}$$

where $k$ is the elastic coefficient of the spring and $q$ is the displacement of the spring from its equilibrium position. For simplicity, we set $k = 1.0$ and $m = 1.0$ in the experiments.

Meanwhile, this system can also be reformulated within the framework of Lagrangian and Hamiltonian mechanics. It is easy to show that the Lagrangian of this system reads

$$L = T - V = \frac{1}{2} m \dot{q}^2 - \frac{1}{2} k q^2, \tag{28}$$

and the Hamiltonian is

$$H = T + V = \frac{1}{2} m \dot{q}^2 + \frac{1}{2} k q^2 = \frac{p^2}{2m} + \frac{1}{2} k q^2, \tag{29}$$

where $p = m\dot{q}$ is the momentum of the system. The motion of the ideal mass spring is then governed by the Euler-Lagrange equation or Hamilton's canonical equations.

As a typical conservative system, it is natural to assess whether the solutions learned by NM-PINNs, LM-PINNs, and HM-PINNs preserve key physical quantities, such as the Lagrangian and Hamiltonian, in the forward problem.
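The conservation property that the extra loss term (14) of HM-PINNs enforces can be verified directly on this system. The short sketch below (an illustration, independent of any neural network) integrates Hamilton's canonical equations for the mass-spring system with $k = m = 1$ using a symplectic leapfrog step and checks that the Hamiltonian (29) stays essentially constant along the trajectory.

```python
# Direct numerical check of the conservation property that the loss term
# (14) of HM-PINNs enforces: for the ideal mass-spring system with
# k = m = 1, Hamilton's equations integrated with a symplectic leapfrog
# step keep the Hamiltonian (29) essentially constant.

def hamiltonian(q, p, k=1.0, m=1.0):       # H = p^2/(2m) + k q^2/2, eq. (29)
    return 0.5 * p * p / m + 0.5 * k * q * q

def leapfrog(q, p, dt, k=1.0, m=1.0):
    p -= 0.5 * dt * k * q                  # half kick:  dp/dt = -dH/dq = -k q
    q += dt * p / m                        # full drift: dq/dt =  dH/dp = p/m
    p -= 0.5 * dt * k * q                  # half kick
    return q, p

q, p = 1.0, 0.0                            # released from rest at q = 1
H0 = hamiltonian(q, p)
for _ in range(10_000):                    # 100 time units with dt = 0.01
    q, p = leapfrog(q, p, 0.01)
drift = abs(hamiltonian(q, p) - H0)
print(drift)                               # bounded energy error
```

A non-structure-preserving scheme (or an unconstrained network) would instead show a systematic drift of $H$, which is exactly what the $\mathrm{Loss}_{cons}$ penalty suppresses during training.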
The first column of Figure 2(a-c) compares the Hamiltonian learned by the three models with the true value. Consistent with reference [17], NM-PINNs fail to recover the conserved Hamiltonian. LM-PINNs achieve partial accuracy (MSE: $2.02 \times 10^{-2}$), while HM-PINNs, thanks to the additional loss $\mathrm{Loss}_{cons}$, can learn the Hamiltonian with high precision (MSE: $1.14 \times 10^{-3}$). Notably, all three models learn the system state accurately, as shown in the third column of Figure 2(a-c). We further compare the learned Lagrangian. NM-PINNs capture the periodic trend of the Lagrangian but exhibit substantial deviation from the exact value. In contrast, both LM-PINNs and HM-PINNs successfully recover the Lagrangian with high accuracy and maintain the correct periodic structure.

Different formulations of the governing equations used as residual losses also influence the performance on the inverse problem. Suppose both parameters $k$ and $m$ are unknown. Figure 3(a) presents the distribution of $k/m$ inferred by each model under varied noise levels across ten training runs with random initialization, where the black dashed line represents the true $k/m$. All three models achieve accurate identification in the noise-free setting. Under noisy conditions, however, LM-PINNs demonstrate the strongest robustness, maintaining a relative absolute error of approximately $4.72 \times 10^{-1}$ even in the presence of 10% noise. NM-PINNs rank second, whereas HM-PINNs exhibit the poorest performance.

Fig. 2: Results for the forward problem of the ideal mass spring. The first column presents the predicted and true Hamiltonian, the second column shows the Lagrangian, the third column the trajectory of the system, and the last column the phase diagram of q and p.
Panels (a-c) correspond to the results obtained using the NM-PINNs, LM-PINNs, and HM-PINNs, respectively.

These differences arise from the structure of the residual equations. The Newtonian formulation involves second-order derivatives that strongly amplify noise. The Hamiltonian formulation depends on both position and momentum, and the increased number of noise-affected variables undermines model stability.

Fig. 3: Results for the inverse problem of the ideal mass spring and ideal pendulum. (a) For the ideal mass-spring system, the learned distribution of k/m obtained from the three models after ten random initializations under different noise levels. The black dashed line represents the ground-truth value of k/m. (b) Corresponding results for the ideal pendulum system.

3.1.2 Ideal Pendulum

An ideal pendulum is a theoretical model consisting of a point mass $m$ suspended from a fixed point by a massless, inextensible string of length $l$, which swings under the influence of gravity without air resistance or friction. The nonlinear ordinary differential equation governing the motion of the pendulum reads

$$\ddot{q} + \frac{g}{l} \sin q = 0, \tag{30}$$

where $q$ denotes the angular displacement and $g$ is the gravitational acceleration. For the ideal pendulum system, Lagrangian and Hamiltonian mechanics can also be used. For simplicity, we directly give the Lagrangian function of the system as

$$L = T - V = \frac{1}{2} m l^2 \dot{q}^2 - m g l (1 - \cos q), \tag{31}$$
In the ideal mass-spring system, NM-PINNs p erform substantially worse than LM-PINNs and HM-PINNs in learn- ing the phase p ortrait of p and q (see the fourth column of Figure 2 (a-c)), which ma y b e the main reason why NM-PINNs cannot accurately learn Lagrangian and Hamiltonian. Consistent with this b eha vior, the ideal p endulum, which shares a sim- ilar dynamic structure, exhibits the same trend, as shown in the fourth column of Figure 4 (a-c). The Newton residual only constrains the second-order dynamics, with- out enforcing any structure on the first-order v ariables. In contrast, the Lagrangian and Hamiltonian formulations impose first-order structure-preserving constraints that directly regulate momentum and energy . The ability of HM-PINNs to simultaneously learn accurate Lagrangian and Hamiltonian representations comes with a trade-off. In 13 Fig. 4 : Results for the forward problem of ideal p endulum. The first column presen ts the predicted and true Hamiltonian, the s econd column sho ws the Lagrangian, the third column is the tra jectory of the system, and the last column illustrates the phase diagram of q and p . Panels ( a-c ) corresp ond to the results obtained using the NM-PINNs, LM-PINNs, and HM-PINNs, resp ectively . the inv erse problem (Figure 3 (b)), the mean v alue of g /l inferred by HM-PINNs ov er ten random initializations under 10% noise is comparable to that obtained by NM- PINNs, yielding relative absolute errors of 2 . 38 × 10 − 1 and 1 . 71 × 10 − 1 , resp ectively . Ho wev er, HM-PINNs exhibit a marked v ariance, indicating reduced stability despite similar av erage accuracy . 3.1.3 Double p endulum A double p endulum is a classic example of a nonlinear dynamical system [ 33 ] consisting of tw o rigid ro ds connected in series, with the first p endulum attached to a fixed p oin t and the second one attac hed to the end of the first. 
Each rod is assumed to be massless, and point masses $m_1$ and $m_2$ are located at their respective ends, with lengths $l_1$ and $l_2$. The configuration of the system can be described by two angular variables $\theta_1$ and $\theta_2$, representing the angular displacements from the vertical. For the sake of simplicity, we directly give the dynamic equations of the double pendulum system derived from Newtonian mechanics:

$$\dot{\theta}_1 = \omega_1, \qquad \dot{\theta}_2 = \omega_2,$$
$$\dot{\omega}_1 = \frac{-g(2m_1 + m_2)\sin\theta_1 - m_2 g \sin(\theta_1 - 2\theta_2) - 2\sin(\theta_1 - \theta_2)\, m_2 \left(\omega_2^2 l_2 + \omega_1^2 l_1 \cos(\theta_1 - \theta_2)\right)}{l_1 \left(2m_1 + m_2 - m_2 \cos(2\theta_1 - 2\theta_2)\right)},$$
$$\dot{\omega}_2 = \frac{2\sin(\theta_1 - \theta_2)\left(\omega_1^2 l_1 (m_1 + m_2) + g(m_1 + m_2)\cos\theta_1 + \omega_2^2 l_2 m_2 \cos(\theta_1 - \theta_2)\right)}{l_2 \left(2m_1 + m_2 - m_2 \cos(2\theta_1 - 2\theta_2)\right)}. \tag{33}$$

The Lagrangian function of the double pendulum system is

$$L = \frac{m_1 + m_2}{2} l_1^2 \dot{\theta}_1^2 + \frac{m_2}{2} l_2^2 \dot{\theta}_2^2 + m_2 l_1 l_2 \dot{\theta}_1 \dot{\theta}_2 \cos(\theta_1 - \theta_2) + (m_1 + m_2) g l_1 \cos\theta_1 + m_2 g l_2 \cos\theta_2. \tag{34}$$

Since the momenta are $p_1 = \frac{\partial L}{\partial \dot{\theta}_1}$ and $p_2 = \frac{\partial L}{\partial \dot{\theta}_2}$, one obtains the Hamiltonian function

$$H = \frac{l_2^2 m_2 p_1^2 + l_1^2 (m_1 + m_2) p_2^2 - 2 m_2 l_1 l_2 p_1 p_2 \cos(\theta_1 - \theta_2)}{2 l_1^2 l_2^2 m_2 \left[m_1 + m_2 \sin^2(\theta_1 - \theta_2)\right]} - m_2 g l_2 \cos\theta_2 - (m_1 + m_2) g l_1 \cos\theta_1. \tag{35}$$

For the uniformity of the experiments, we set $l_1 = m_1 = m_2 = 1.0$ and $l_2 = 2.0$.

Although the double pendulum system is substantially more complex than the ideal pendulum, the models exhibit broadly consistent behaviors in learning the Lagrangian and Hamiltonian. All three approaches recover accurate trajectories. However, NM-PINNs fail to capture the correct Lagrangian and Hamiltonian (Figure 5(a)). LM-PINNs successfully learn the Lagrangian, but their predicted Hamiltonian shows substantial deviation from the ground truth (Figure 5(b)).
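As a sanity check that the Newtonian equations (33) and the Lagrangian (34) describe the same dynamics, the illustrative sketch below integrates (33) with a standard fourth-order Runge-Kutta step under the paper's parameters ($l_1 = m_1 = m_2 = 1$, $l_2 = 2$, $g = 9.81$; the initial angles are chosen arbitrarily) and monitors the total energy $T + V$ built from (34), which should be conserved along the trajectory.

```python
import math

# Sanity check that the Newtonian equations (33) and the Lagrangian (34)
# describe the same conservative dynamics: integrating (33) with RK4
# should conserve the total energy T + V derived from (34). Parameters
# follow the paper (l1 = m1 = m2 = 1, l2 = 2, g = 9.81); the initial
# angles are chosen arbitrarily for illustration.

m1, m2, l1, l2, g = 1.0, 1.0, 1.0, 2.0, 9.81

def rhs(s):                                 # right-hand side of eq. (33)
    t1, t2, w1, w2 = s
    d = t1 - t2
    den = 2 * m1 + m2 - m2 * math.cos(2 * t1 - 2 * t2)
    a1 = (-g * (2 * m1 + m2) * math.sin(t1) - m2 * g * math.sin(t1 - 2 * t2)
          - 2 * math.sin(d) * m2 * (w2 * w2 * l2 + w1 * w1 * l1 * math.cos(d))) \
         / (l1 * den)
    a2 = (2 * math.sin(d) * (w1 * w1 * l1 * (m1 + m2)
          + g * (m1 + m2) * math.cos(t1)
          + w2 * w2 * l2 * m2 * math.cos(d))) / (l2 * den)
    return [w1, w2, a1, a2]

def rk4(s, dt):                             # classical fourth-order Runge-Kutta
    k1 = rhs(s)
    k2 = rhs([x + 0.5 * dt * k for x, k in zip(s, k1)])
    k3 = rhs([x + 0.5 * dt * k for x, k in zip(s, k2)])
    k4 = rhs([x + dt * k for x, k in zip(s, k3)])
    return [x + dt * (a + 2 * b + 2 * c + e) / 6
            for x, a, b, c, e in zip(s, k1, k2, k3, k4)]

def energy(s):                              # T + V from the Lagrangian (34)
    t1, t2, w1, w2 = s
    T = (0.5 * (m1 + m2) * l1 ** 2 * w1 ** 2 + 0.5 * m2 * l2 ** 2 * w2 ** 2
         + m2 * l1 * l2 * w1 * w2 * math.cos(t1 - t2))
    V = -(m1 + m2) * g * l1 * math.cos(t1) - m2 * g * l2 * math.cos(t2)
    return T + V

s = [1.0, 0.5, 0.0, 0.0]                    # released from rest
E0 = energy(s)
for _ in range(2000):                       # integrate to t = 2
    s = rk4(s, 1e-3)
print(abs(energy(s) - E0))                  # energy drift stays tiny
```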
These findings further underscore that incorporating structured physical priors (especially Hamiltonian structures, see Figure 5 (c)) is crucial for enhancing the interpretability and reliability of PINN-based models.

Fig. 5: Results for the forward problem of the double pendulum. The first column presents the predicted and true Hamiltonian, the second column shows the Lagrangian, and the third column is the trajectory of the system. Panels (a-c) correspond to the results obtained using the NM-PINNs, LM-PINNs, and HM-PINNs, respectively.

3.2 Dissipative Systems

3.2.1 Damped pendulum

The damped pendulum provides a more realistic representation of pendulum dynamics, as it accounts for the energy dissipation that causes the oscillation amplitude to decay over time. Its equation of motion reads

$$\ddot\theta + \lambda\dot\theta + \beta^2\theta = 0, \qquad \theta(t=0) = \theta_0, \tag{36}$$

where $\theta$ is the angular displacement, $\lambda$ is the damping coefficient, and $\beta$ is the natural frequency in the absence of damping. The term $\lambda\dot\theta$ models the mechanism through which mechanical energy is gradually dissipated. The above equation will be used for constructing NM-PINNs. In this example, we set $\lambda = 0.2$, $\beta = \sqrt{9.81} = 3.1321$.

Alternatively, given the structural equation $\dot\theta = \omega$, we can introduce the dissipation function $\Phi = \frac{1}{2}\lambda\omega^2$ and the potential function $U = \frac{1}{2}(\beta^2\theta^2 + \omega^2)$, which leads to the Rayleighian

$$R = \Phi + \dot U = \frac{1}{2}\lambda\omega^2 + \beta^2\theta\dot\theta + \omega\dot\omega. \tag{37}$$

Applying the Onsager variational principle, one gets

$$\frac{\delta R}{\delta\omega} = \lambda\omega + \beta^2\theta + \dot\omega = 0. \tag{38}$$

Together with the structural equation $\dot\theta = \omega$, this constitutes the full model used for OVP-PINNs.

To formulate the dynamics from a thermodynamic perspective, consider the entropy function $S = S(\theta,\omega) = -\frac{1}{2}\beta^2\theta^2 - \frac{1}{2}\omega^2$, whose time derivative gives

$$\frac{dS}{dt} = \frac{\partial S}{\partial\theta}\frac{d\theta}{dt} + \frac{\partial S}{\partial\omega}\frac{d\omega}{dt} = -\beta^2\theta\frac{d\theta}{dt} - \omega\frac{d\omega}{dt}. \tag{39}$$

By imposing the constraint $\dot\theta = \omega$ and the constitutive equation $\dot\omega + \beta^2\theta = -\lambda\omega$, the entropy balance equation leads to a non-negative entropy production rate,

$$\frac{dS}{dt} = \sigma_s = \lambda\omega^2 \ge 0, \tag{40}$$

in accordance with the second law of thermodynamics as long as $\lambda \ge 0$. The structural equation $\dot\theta = \omega$, the constitutive equation, and the entropy balance equation are all included in the residual loss of EIT-PINNs.

Table 2: Unknown-parameter predictions for the damped pendulum system under different noise intensities. Ten random-initialization simulations were performed for each model, and all results are averaged over the ten runs. "No outliers" excludes sign-flipped $\beta$ values caused by the $\beta^2$-based residual loss.

Noise  Model      λ (mean ± std)             β (mean ± std, no outliers)
0%     NM-PINNs   0.2000 ± 4.8021×10⁻⁵       3.1321 ± 2.2490×10⁻⁵
       OVP-PINNs  0.2000 ± 3.9388×10⁻⁵       3.1321 ± 6.8259×10⁻⁶
       EIT-PINNs  0.2000 ± 9.3730×10⁻⁵       3.1273 ± 6.2342×10⁻³
1%     NM-PINNs   0.2002 ± 1.9835×10⁻⁴       3.1322 ± 1.3322×10⁻⁵
       OVP-PINNs  0.2001 ± 3.5251×10⁻⁵       3.1324 ± 9.3518×10⁻⁶
       EIT-PINNs  0.2003 ± 1.3328×10⁻⁴       3.1253 ± 4.7730×10⁻³
5%     NM-PINNs   0.1994 ± 1.0121×10⁻⁴       3.1325 ± 1.3920×10⁻⁵
       OVP-PINNs  0.1996 ± 3.3281×10⁻⁵       3.1324 ± 1.2373×10⁻⁵
       EIT-PINNs  0.1977 ± 1.7854×10⁻⁴       3.1198 ± 5.0098×10⁻³
10%    NM-PINNs   0.1989 ± 8.8869×10⁻⁵       3.1282 ± 7.2125×10⁻⁶
       OVP-PINNs  0.2004 ± 4.1065×10⁻⁵       3.1319 ± 4.5269×10⁻⁵
       EIT-PINNs  0.1996 ± 6.1039×10⁻⁴       3.1738 ± 4.2032×10⁻³

As a prototypical dissipative system, we investigate the influence of using the Newtonian, Onsager-variational-principle, and EIT-form equations as residual losses on both forward and inverse problems. NM-PINNs, OVP-PINNs, and EIT-PINNs can all successfully reconstruct the system state trajectory (see the third and fourth columns of Figure 6 (a-c)).
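The entropy identity behind Eqs. (39)-(40) can be checked analytically. The sketch below (our illustration, not the authors' code, assuming $\theta_0 = 1$ with zero initial velocity) evaluates the exact underdamped solution and confirms that the chain-rule entropy rate of Eq. (39) equals the production rate $\lambda\omega^2$ of Eq. (40) pointwise:

```python
import numpy as np

# Sketch: verify dS/dt = -beta^2*theta*theta_dot - omega*omega_dot equals the
# entropy production sigma_s = lam*omega^2 along the exact solution of
# theta'' + lam*theta' + beta^2*theta = 0 (theta(0) = 1, theta'(0) = 0 assumed).
lam, beta = 0.2, np.sqrt(9.81)
wd = np.sqrt(beta**2 - lam**2/4)          # damped angular frequency
t = np.linspace(0.0, 10.0, 2001)

# exact underdamped solution and its analytic derivatives
env = np.exp(-lam*t/2)
theta = env*(np.cos(wd*t) + lam/(2*wd)*np.sin(wd*t))
omega = -env*(beta**2/wd)*np.sin(wd*t)    # theta'(t)
omega_dot = -lam*omega - beta**2*theta    # from the equation of motion (36)

dS_dt = -beta**2*theta*omega - omega*omega_dot   # Eq. (39) with theta_dot = omega
sigma_s = lam*omega**2                           # Eq. (40)
print("max |dS/dt - sigma_s| =", np.max(np.abs(dS_dt - sigma_s)))
print("min sigma_s =", sigma_s.min())            # non-negative, as required
```

The agreement is an algebraic identity once $\dot\theta = \omega$ and the constitutive equation are imposed, which is exactly what the EIT residual enforces.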
However, pronounced differences emerge in their ability to learn the Rayleighian and the entropy (first and second columns of Figure 6 (a-c)). Specifically, the Rayleighian learned by the NM-PINNs model shows a noticeable bias at early times (MSE: $1.59\times10^{-1}$), while OVP-PINNs and EIT-PINNs maintain high accuracy, with MSEs of $6.08\times10^{-2}$ and $3.89\times10^{-2}$, respectively. Moreover, NM-PINNs exhibit the largest error in recovering the entropy function (MSE: $5.75\times10^{-2}$), followed by OVP-PINNs (MSE: $1.33\times10^{-2}$). In contrast, EIT-PINNs achieve high-precision recovery of the entropy prediction with an MSE of $1.47\times10^{-4}$. One notable advantage of OVP-PINNs emerges in inverse problems, particularly in the identification of $\beta$. Under 10% noise, OVP-PINNs exhibit superior stability compared to both NM-PINNs and EIT-PINNs, substantially reducing the incidence of incorrect estimations (see Table 2).

Fig. 6: Results for the forward problem of the damped pendulum. The first column presents the predicted and true Rayleighian, the second column shows the entropy function $S$, the third column is the trajectory of the system, and the last column illustrates the phase diagram of $q$ and $p$. Panels (a-c) correspond to the results obtained using the NM-PINNs, OVP-PINNs, and EIT-PINNs, respectively.

3.2.2 Diffusion equation

The 2d diffusion equation describes the spatiotemporal evolution of a scalar field, such as temperature, concentration, or probability density, under the effect of diffusive transport. It is written as

$$\frac{\partial u}{\partial t} = D_x\frac{\partial^2 u}{\partial x^2} + D_y\frac{\partial^2 u}{\partial y^2}, \qquad \forall (x,y)\in\Omega,\ \forall t\in[0,T], \tag{41}$$

where $u(x,y,t)$ denotes the field of interest, and $D_x, D_y > 0$ are the diffusion coefficients along the two spatial directions.
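Before turning to the variational and thermodynamic formulations, it is convenient to have an analytic sanity check. The following sketch (our illustration with an assumed single Gaussian, not the paper's two-kernel initial condition) verifies pointwise that an anisotropic Gaussian solves Eq. (41), and that the associated entropy production rate appearing in the EIT derivation below is non-negative:

```python
import numpy as np

# Sketch (assumed single-Gaussian example): verify pointwise that
#   u = exp(-x^2/(4*Dx*tau) - y^2/(4*Dy*tau)) / (4*pi*tau*sqrt(Dx*Dy)),
# with tau = t + t0, solves u_t = Dx*u_xx + Dy*u_yy from Eq. (41), and that the
# entropy production rate u*(Dx*(ln u)_x^2 + Dy*(ln u)_y^2) is non-negative.
Dx, Dy, tau = 0.2, 0.5, 1.5
x, y = np.meshgrid(np.linspace(-2, 2, 81), np.linspace(-2, 2, 81))

u = np.exp(-x**2/(4*Dx*tau) - y**2/(4*Dy*tau)) / (4*np.pi*tau*np.sqrt(Dx*Dy))

# analytic derivatives of the Gaussian (no finite differences needed)
u_t  = u*(-1/tau + x**2/(4*Dx*tau**2) + y**2/(4*Dy*tau**2))
u_xx = u*(x**2/(4*Dx**2*tau**2) - 1/(2*Dx*tau))
u_yy = u*(y**2/(4*Dy**2*tau**2) - 1/(2*Dy*tau))
residual = u_t - Dx*u_xx - Dy*u_yy

lnu_x, lnu_y = -x/(2*Dx*tau), -y/(2*Dy*tau)   # gradients of ln u
sigma_s = u*(Dx*lnu_x**2 + Dy*lnu_y**2)       # local entropy production rate
print("max |PDE residual| =", np.max(np.abs(residual)))
print("min sigma_s =", sigma_s.min())
```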
In this case, we consider the Cauchy problem of the 2d diffusion equation and choose the initial condition to be a superposition of two anisotropic Gaussian kernels, with the Gaussian centers located at $(x_1,y_1) = (0.5, 0.5)$ and $(x_2,y_2) = (-0.5,-0.5)$. The diffusion coefficients used to generate the reference solution are $D_x = 0.2$ and $D_y = 0.5$.

To utilize the Onsager variational principle, we start with the continuity equation

$$\frac{\partial u}{\partial t} + \frac{\partial}{\partial x}(uv_1) + \frac{\partial}{\partial y}(uv_2) = 0. \tag{42}$$

Considering the dissipation function $\Phi = \frac{1}{2}\int_\Omega u\,(D_x^{-1}v_1^2 + D_y^{-1}v_2^2)\,dxdy$ and the potential energy function $U = \int_\Omega (u\ln u - u)\,dxdy$, we obtain the Rayleighian as

$$R = \Phi + \dot U = \int_\Omega \Big[\frac{1}{2}u\,(D_x^{-1}v_1^2 + D_y^{-1}v_2^2) + \frac{\partial u}{\partial t}\ln u\Big]\,dxdy. \tag{43}$$

The variation of the Rayleighian with respect to $v_1, v_2$ then yields two constitutive relations for the velocities,

$$\frac{\delta R}{\delta v_1} = D_x^{-1}uv_1 + \frac{\partial u}{\partial x} = 0, \qquad \frac{\delta R}{\delta v_2} = D_y^{-1}uv_2 + \frac{\partial u}{\partial y} = 0, \tag{44}$$

by using integration by parts and infinite boundary conditions. The continuity equation together with the two constitutive relations will be used for constructing the residual loss of OVP-PINNs.

The diffusion equation also fits into the EIT framework. The starting point is still the continuity equation $\frac{\partial u}{\partial t} + \frac{\partial}{\partial x}(uv_1) + \frac{\partial}{\partial y}(uv_2) = 0$. Define the local entropy function as $S[u] = -(u\ln u - u)$; then the local entropy balance equation reads

$$\frac{\partial S}{\partial t} = \frac{\partial u}{\partial t}\frac{dS}{du} = \Big[\frac{\partial}{\partial x}(uv_1) + \frac{\partial}{\partial y}(uv_2)\Big]\ln u = \frac{\partial}{\partial x}(v_1 u\ln u) + \frac{\partial}{\partial y}(v_2 u\ln u) - u\Big[v_1\frac{\partial}{\partial x}(\ln u) + v_2\frac{\partial}{\partial y}(\ln u)\Big]. \tag{45}$$

In the above equation, the first two terms are recognized as the divergence of the local entropy flux, while the last term denotes the local entropy production rate. To keep the non-negativity of the latter, it is natural to set

$$v_1 = -D_x\frac{\partial}{\partial x}(\ln u), \qquad v_2 = -D_y\frac{\partial}{\partial y}(\ln u). \tag{46}$$

It is clearly seen that, in the current case, EIT gives the same constitutive relations as those of the OVP. The local entropy production rate now becomes $\sigma_s = u\,(D_x^{-1}v_1^2 + D_y^{-1}v_2^2)$.

As a typical dissipative system, we aim to further compare the performance of NM-PINNs, OVP-PINNs, and EIT-PINNs on both forward and inverse problems. Figure 7 (a) presents the distribution of $D_x$ inferred by the three models under different noise intensities, each evaluated over ten random initializations. The results indicate that OVP-PINNs exhibit markedly stronger noise robustness than NM-PINNs and EIT-PINNs, as reflected by a significantly smaller variance. For the forward problem, all three models are capable of learning relatively accurate solutions, as shown in the first and second columns of Figure 7 (c-e), with corresponding RMSEs of $3.04\times10^{-2}$, $7.25\times10^{-3}$, and $2.88\times10^{-3}$, respectively.

Fig. 7: Results for the forward problem of the diffusion equation. (a) The learned distribution of $D_x$ from the three models after ten random initializations under different noise levels. (b) The reference entropy flux at $t = 0.5$. (c-e) Results obtained using NM-PINNs, OVP-PINNs, and EIT-PINNs, respectively. Each panel shows the point-wise absolute error at $t = 0.5$ and $t = 1.0$ (first two columns), the predicted normalized Rayleighian (third column), and the difference field between true and predicted entropy fluxes at $t = 0.5$ (fourth column).

The primary distinctions among the models arise in their predictions of the Rayleighian and the entropy flux. The third column of Figure 7 (c-e) compares the normalized Rayleighian inferred by each model with the ground-truth value. NM-PINNs display substantial deviations at early times, including instances with incorrect signs.
In contrast, although OVP-PINNs and EIT-PINNs also exhibit transient discrepancies, their predictions rapidly converge to the correct Rayleighian and are significantly more accurate than those of NM-PINNs. The difference fields between the true and predicted entropy fluxes of the three models are shown in the fourth column of Figure 7 (c-e), while the analytically derived entropy flux is presented in Figure 7 (b). Among the three models, EIT-PINNs yield the smallest magnitude of entropy-flux discrepancy and, more importantly, eliminate large-scale coherent error structures. Their residual differences exhibit only small-scale, spatially uncorrelated patterns, indicating that the underlying entropy-flux geometry is accurately captured. In contrast, NM-PINNs and OVP-PINNs suffer from systematic directional biases and localized structural mismatches.

3.2.3 Fisher-Kolmogorov equation

The Fisher-Kolmogorov (F-K) equation is a typical kind of reaction-diffusion equation, with the 1D general form

$$\frac{\partial u}{\partial t} = D\frac{\partial^2 u}{\partial x^2} + \alpha u(1 - u/K), \qquad \forall x\in\Omega,\ \forall t\in[0,T], \tag{47}$$

where $u(x,t)$ represents the state variable of the spatiotemporal field (such as population density, chemical concentration, etc.), $D$ is the diffusion coefficient, $\alpha$ is the growth rate, and $K$ is the carrying capacity of the steady state (which we assume to be 1). This equation was first proposed independently by Fisher [34] and Kolmogorov-Petrovskii-Piskunov [35] to explain the spread and evolution of genes in a population, and it is currently widely used in ecology and brain science [36, 37].

Onsager's variational principle provides an insightful way to understand the F-K equation from a physical aspect. Start with the continuity equation

$$\frac{\partial u}{\partial t} + \frac{\partial}{\partial x}(uv) = F(u), \tag{48}$$

where $F(u) = F_+(u) - F_-(u)$, with $F_+(u) = \alpha u$ and $F_-(u) = \alpha u^2$, denotes the source term due to chemical reactions.
Considering the dissipation function $\Phi = \frac{1}{2}\int_\Omega D^{-1}uv^2\,dx$ and the potential energy function $U = \int_\Omega (u\ln u - u)\,dx$, we obtain the Rayleighian as

$$R = \Phi + \dot U = \int_\Omega \Big[\frac{1}{2}D^{-1}uv^2 + \frac{\partial u}{\partial t}\ln u\Big]dx = \int_\Omega \Big[\frac{1}{2}D^{-1}uv^2 + F(u)\ln u - \frac{\partial}{\partial x}(uv)\ln u\Big]dx = \int_\Omega \Big[\frac{1}{2}D^{-1}uv^2 + F(u)\ln u + v\frac{\partial u}{\partial x}\Big]dx. \tag{49}$$

The last equality is obtained by using integration by parts and infinite boundary conditions. Applying the Onsager variational principle yields the optimality condition

$$\frac{\delta R}{\delta v} = D^{-1}uv + \frac{\partial u}{\partial x} = 0. \tag{50}$$

Inserting it into the continuity equation, we recover the classical Fisher-KPP equation.

The F-K equation can be understood from a thermodynamic point of view too. With respect to the local entropy function $S(u) = -(u\ln u - u)$, the local entropy balance equation reads

$$\frac{\partial S}{\partial t} = \frac{\partial u}{\partial t}\frac{dS}{du} = \frac{\partial(uv)}{\partial x}\ln u - [F_+(u) - F_-(u)]\ln u = \frac{\partial}{\partial x}(vu\ln u) - uv\frac{\partial}{\partial x}(\ln u) + [F_+(u) - F_-(u)]\ln\frac{F_+(u)}{F_-(u)}. \tag{51}$$

By setting $v = -D\frac{\partial}{\partial x}(\ln u)$, it becomes clear that $\frac{\partial}{\partial x}(vu\ln u)$ denotes the local entropy flux, while the local entropy production rate contains two different contributions: $Du\,\big[\frac{\partial}{\partial x}(\ln u)\big]^2$ caused by particle diffusion and $[F_+(u) - F_-(u)]\ln[F_+(u)/F_-(u)]$ due to chemical reactions.

In this example, EIT-PINNs are trained using the local entropy balance equation derived from the entropy function $S[u] = -(u\ln u - u)$ as the residual loss, and their performance is compared with that of NM-PINNs and OVP-PINNs. The first column of Figure 8 (c-e) reports the absolute error between the solutions learned by the three models and the reference solution shown in Figure 8 (a). EIT-PINNs achieve the smallest maximum absolute error, and the RMSE values of the three models are $2.98\times10^{-3}$, $4.77\times10^{-3}$, and $1.39\times10^{-3}$, respectively.
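The sign structure of the two production terms can be verified directly. The following sketch (our illustration; the front-like profile and the values of $D$ and $\alpha$ are assumptions) confirms that both the diffusive and the reactive contributions to the entropy production in Eq. (51) are non-negative:

```python
import numpy as np

# Sketch: for the F-K source terms F+ = alpha*u and F- = alpha*u^2, both entropy
# production contributions in Eq. (51) are non-negative. The diffusive part
# D*u*(d ln u/dx)^2 is a square, and the reaction part
# (F+ - F-)*ln(F+/F-) = alpha*u*(1-u)*ln(1/u) pairs factors of the same sign.
D, alpha = 0.1, 1.0                 # assumed values for illustration
x = np.linspace(-10, 10, 2001)
u = 1.0/(1.0 + np.exp(x))           # a smooth front-like profile with 0 < u < 1

dlnu_dx = np.gradient(np.log(u), x)
diffusive = D*u*dlnu_dx**2
Fp, Fm = alpha*u, alpha*u**2
reaction = (Fp - Fm)*np.log(Fp/Fm)  # = -alpha*u*(1-u)*ln(u)

print("min diffusive part =", diffusive.min())
print("min reaction part  =", reaction.min())
```

Note that the reaction term remains non-negative for $u > 1$ as well, since both $F_+ - F_-$ and $\ln(F_+/F_-)$ then flip sign together.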
Regarding the learning of the Rayleighian (second column of Figure 8 (c-e)), NM-PINNs exhibit poor behavior consistent with the previous examples, characterized by pronounced errors and oscillations at early times. In this setting, possibly due to the adoption of a more general entropy function, EIT-PINNs do not recover the Rayleighian as accurately as OVP-PINNs. Nevertheless, owing to the incorporation of the local entropy balance equation, EIT-PINNs significantly outperform both NM-PINNs and OVP-PINNs in learning the entropy production (third column of Figure 8 (c-e)); their RMSE with respect to the reference entropy production in Figure 8 (b) is $2.84\times10^{-2}$.

3.3 Analysis by Loss Landscape

Training neural networks amounts to minimizing a high-dimensional non-convex loss, which is NP-hard in theory [38], yet often tractable in practice. Empirical evidence shows that, despite the presence of many local minima, standard gradient-based methods frequently converge to solutions with similar performance, even when the data and labels are randomized before training [39]. However, such favorable behavior is not universal and strongly depends on network architecture, optimizer choice, and loss design. A long-standing belief holds that small-batch SGD tends to find "flat" minima with better generalization, whereas large batches converge to "sharp" minima [40-42], although this view has been challenged by subsequent studies [43, 44], and alternative training strategies have performed well even with large batch sizes [45, 46].

To address ambiguities in comparing loss landscapes, Li et al. [47] introduced a filter-wise normalization technique that enables meaningful visualization and empirically supports the connection between flatter minima and better generalization.
Specifically, for a loss function $L(\theta) = \frac{1}{m}\sum_{i=1}^m l(x_i, y_i; \theta)$, the local landscape around a reference point $\theta^*$ is visualized via

$$f(\alpha, \beta) = L(\theta^* + \alpha\delta + \beta\mu), \tag{52}$$

where $\delta$ and $\mu$ are normalized random directions. Following [47], these directions are constructed by rescaling a Gaussian random vector $d$ in a filter-wise manner,

$$d_{ij} \leftarrow \frac{d_{ij}}{\|d_{ij}\|}\,\|\theta_{ij}\|, \tag{53}$$

thereby mitigating parameter-scaling effects and allowing fair comparisons across different layers and architectures.

Fig. 8: Results for the forward problem of the Fisher-Kolmogorov equation. (a) Reference solution and (b) reference local entropy production rate for the F-K equation. (c-e) Results obtained using NM-PINNs, OVP-PINNs, and EIT-PINNs, respectively. Each panel shows the point-wise absolute error of the predicted solution $u$ (first column), the predicted normalized Rayleighian (second column), and the predicted local entropy production rate (third column).

Based on the method above, we visualized the loss landscapes of the different thermodynamic formalism-informed neural networks in the examples of the mass-spring system (Figure 9 (a)) and the diffusion equation (Figure 9 (b)). To quantify the flatness of the loss landscape, we calculated the average Frobenius norm

$$\mathbb{E}\big[\|H\|_F^2\big], \qquad \|H\|_F^2 = \sum_{i,j} H_{ij}^2, \qquad H = \begin{pmatrix} L_{xx} & L_{xy} \\ L_{yx} & L_{yy} \end{pmatrix}, \tag{54}$$

where $H$ is the Hessian matrix calculated from the loss-landscape surface.

Fig. 9: Loss landscapes of the ideal mass-spring system and the diffusion equation. (a) Loss landscape of the ideal mass-spring system, shown from left to right for NM-PINNs, LM-PINNs, and HM-PINNs. (b) Loss landscape of the diffusion equation, shown from left to right for NM-PINNs, OVP-PINNs, and EIT-PINNs.

In Figure 9 (a), the average Frobenius norms calculated for NM-PINNs, LM-PINNs, and HM-PINNs are 214.64, 43.47, and 34.07, respectively.
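For concreteness, the filter-wise normalization and the 2d slice of Eq. (52) can be sketched as follows (our toy illustration with an assumed quadratic loss and a two-layer weight list, not the authors' implementation):

```python
import numpy as np

# Sketch of Eqs. (52)-(53), following Li et al. [47], for a toy network stored
# as a list of weight matrices. Each random direction is rescaled row-wise
# ("filter-wise") to match the norm of the corresponding rows of theta*.
rng = np.random.default_rng(0)

def filter_normalized_direction(theta):
    """One random direction d with ||d_ij|| = ||theta_ij|| per filter (row)."""
    direction = []
    for W in theta:
        d = rng.standard_normal(W.shape)
        for i in range(W.shape[0]):                  # one "filter" per row
            d[i] *= np.linalg.norm(W[i]) / (np.linalg.norm(d[i]) + 1e-12)
        direction.append(d)
    return direction

def landscape_slice(loss, theta, delta, mu, alpha, beta):
    """f(alpha, beta) = L(theta* + alpha*delta + beta*mu), Eq. (52)."""
    shifted = [W + alpha*d + beta*m for W, d, m in zip(theta, delta, mu)]
    return loss(shifted)

# toy example: quadratic "loss" around random trained weights (an assumption)
theta_star = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
loss = lambda th: sum(np.sum(W**2) for W in th)
delta = filter_normalized_direction(theta_star)
mu = filter_normalized_direction(theta_star)
f00 = landscape_slice(loss, theta_star, delta, mu, 0.0, 0.0)
print("f(0,0) equals the loss at theta*:", np.isclose(f00, loss(theta_star)))
```

Evaluating `landscape_slice` on a grid of $(\alpha, \beta)$ yields the surfaces of Figure 9; the Hessian in Eq. (54) can then be estimated from that grid by finite differences.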
In Figure 9 (b), the mean Frobenius norms corresponding to the loss landscapes of NM-PINNs, OVP-PINNs, and EIT-PINNs are 15.25, 1.28, and 0.71, respectively. This analysis clarifies why HM-PINNs and EIT-PINNs are able to learn additional physical quantities beyond the system trajectory. However, the aforementioned loss-landscape analysis does not account for the superior performance of LM-PINNs and OVP-PINNs in inverse problems. In fact, accurate parameter recovery is governed by the global structure of the reduced loss with respect to the physical parameters, rather than the local flatness of the full loss landscape. This issue is left to our future studies.

4 Conclusion

In this work, we present a systematic comparison and analysis of thermodynamic structure-informed neural networks, which are built on the general framework of PINNs by incorporating different thermodynamic formulations: Newtonian, Lagrangian, and Hamiltonian mechanics for conservative systems, and the Onsager variational principle and extended irreversible thermodynamics for dissipative systems. Through comprehensive numerical experiments on a range of classical dynamical systems and partial differential equations (covering conservative and dissipative systems, ordinary and partial differential equations, and both forward and inverse problems), we quantitatively assess how different physical formulations used as residual constraints affect model accuracy, physical consistency, noise robustness, and interpretability.

For conservative systems, including the ideal mass-spring oscillator, the simple pendulum, and the double pendulum, the numerical experiments demonstrate that PINNs relying solely on Newtonian residuals can accurately reconstruct system trajectories but are markedly insufficient in learning key physical quantities such as the Lagrangian and Hamiltonian.
Moreover, they exhibit instability in recovering the correct phase-space structure. In contrast, LM-PINNs offer clear advantages in parameter identification and noise robustness. HM-PINNs, by explicitly enforcing energy conservation, more faithfully preserve the Hamiltonian structure and energy invariants of the system, but their inverse-problem stability decreases under high-noise conditions. These findings clearly indicate that explicitly embedding structure-preserving physical formulations into PINNs is essential for enhancing physical consistency.

For dissipative systems, such as the damped pendulum, the diffusion equation, and the Fisher-Kolmogorov equation, NM-PINNs remain effective in reconstructing system states, but exhibit systematic biases when learning thermodynamic quantities such as the Rayleighian, entropy function, entropy flux, and entropy production. OVP-PINNs, by incorporating the Onsager variational principle, achieve enhanced stability and robustness in Rayleighian learning and parameter identification, particularly under noisy conditions. Furthermore, EIT-PINNs, through the explicit enforcement of entropy-balance and entropy-production constraints, are capable of recovering the entropy function and entropy flux in complex dissipative systems. While achieving comparable Rayleighian learning accuracy, EIT-PINNs significantly outperform OVP-PINNs in terms of thermodynamic consistency and interpretability.

In summary, the results presented here provide clear empirical evidence to guide the systematic selection and design of appropriate thermodynamic formulations within the PINNs framework, and offer a foundation for integrating more general nonequilibrium thermodynamic structures, such as the GENERIC framework, with deep learning models, including neural operators.

Code & Data availability.
The source code and data for this project are available at https://github.com/jay-mini/Thermodynamics-Formalism-informed-PINNs.git.

Acknowledgements. This work was supported by the National Key R&D Program of China (Grant No. 2024YFA1011900) and the Guangdong Provincial Key Laboratory of Mathematical and Neural Dynamical Systems (2024B1212010004). The authors thank Wuyue Yang for her helpful discussions.

Author Contributions. Guojie Li: Investigation, Conceptualization, Methodology, Data curation, Formal analysis, Visualization, Writing - original draft. Liu Hong: Supervision, Funding Acquisition, Conceptualization, Project Administration, Writing - Review & Editing. All authors reviewed the manuscript.

Competing Interests. The authors declare that they have no conflict of interest.

References

[1] Wan, Z.Y., Vlachas, P., Koumoutsakos, P., Sapsis, T.: Data-assisted reduced-order modeling of extreme events in complex dynamical systems. PLoS ONE 13(5), e0197704 (2018)

[2] Ghnatios, C., Alfaro, I., González, D., Chinesta, F., Cueto, E.: Data-driven GENERIC modeling of poroviscoelastic materials. Entropy 21(12), 1165 (2019)

[3] Karapiperis, K., Stainier, L., Ortiz, M., Andrade, J.E.: Data-driven multiscale modeling in mechanics. Journal of the Mechanics and Physics of Solids 147, 104239 (2021)

[4] Lagaris, I.E., Likas, A., Fotiadis, D.I.: Artificial neural networks for solving ordinary and partial differential equations. IEEE Transactions on Neural Networks 9(5), 987-1000 (1998)

[5] Owhadi, H.: Bayesian numerical homogenization. Multiscale Modeling & Simulation 13(3), 812-828 (2015)

[6] Raissi, M., Perdikaris, P., Karniadakis, G.E.: Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations.
Journal of Computational Physics 378, 686-707 (2019)

[7] Chen, Y., Lu, L., Karniadakis, G.E., Dal Negro, L.: Physics-informed neural networks for inverse problems in nano-optics and metamaterials. Optics Express 28(8), 11618-11633 (2020)

[8] Rasht-Behesht, M., Huber, C., Shukla, K., Karniadakis, G.E.: Physics-informed neural networks (PINNs) for wave propagation and full waveform inversions. Journal of Geophysical Research: Solid Earth 127(5), 2021-023120 (2022)

[9] Krishnapriyan, A., Gholami, A., Zhe, S., Kirby, R., Mahoney, M.W.: Characterizing possible failure modes in physics-informed neural networks. Advances in Neural Information Processing Systems 34, 26548-26560 (2021)

[10] Wang, S., Teng, Y., Perdikaris, P.: Understanding and mitigating gradient flow pathologies in physics-informed neural networks. SIAM Journal on Scientific Computing 43(5), 3055-3081 (2021)

[11] Wang, S., Yu, X., Perdikaris, P.: When and why PINNs fail to train: A neural tangent kernel perspective. Journal of Computational Physics 449, 110768 (2022)

[12] De Ryck, T., Bonnet, F., Mishra, S., Bézenac, E.: An operator preconditioning perspective on training in physics-informed machine learning. arXiv preprint arXiv:2310.05801 (2023)

[13] Bihlo, A.: Improving physics-informed neural networks with meta-learned optimization. Journal of Machine Learning Research 25(14), 1-26 (2024)

[14] Li, G., Ran, S., Yang, W., Hong, L.: Improving generalization ability of deep-learning-based ODE solvers using continuous dependence. npj Artificial Intelligence 1(1), 22 (2025)

[15] Hernández, Q., Badías, A., González, D., Chinesta, F., Cueto, E.: Structure-preserving neural networks. Journal of Computational Physics 426, 109950 (2021)

[16] Jagtap, A.D., Kharazmi, E., Karniadakis, G.E.: Conservative physics-informed neural networks on discrete domains for conservation laws: Applications to forward and inverse problems.
Computer Methods in Applied Mechanics and Engineering 365, 113028 (2020)

[17] Cardoso-Bihlo, E., Bihlo, A.: Exactly conservative physics-informed neural networks and deep operator networks for dynamical systems. Neural Networks 181, 106826 (2025)

[18] Greydanus, S., Dzamba, M., Yosinski, J.: Hamiltonian neural networks. Advances in Neural Information Processing Systems 32 (2019)

[19] Cranmer, M., Greydanus, S., Hoyer, S., Battaglia, P., Spergel, D., Ho, S.: Lagrangian neural networks. arXiv preprint arXiv:2003.04630 (2020)

[20] Chu, H., Miyatake, Y., Cui, W., Wei, S., Furihata, D.: Structure-preserving physics-informed neural networks with energy or Lyapunov structure. arXiv preprint arXiv:2401.04986 (2024)

[21] Yu, H., Tian, X., E, W., Li, Q.: OnsagerNet: Learning stable and interpretable dynamics using a generalized Onsager principle. Physical Review Fluids 6(11), 114402 (2021)

[22] Zhang, Z., Shin, Y., Em Karniadakis, G.: GFINNs: GENERIC formalism informed neural networks for deterministic and stochastic dynamical systems. Philosophical Transactions of the Royal Society A 380(2229), 20210207 (2022)

[23] Peng, L., Hong, L.: Recent advances in conservation-dissipation formalism for irreversible processes. Entropy 23(11), 1447 (2021)

[24] Tartakovsky, A.M., Marrero, C.O., Perdikaris, P., Tartakovsky, G.D., Barajas-Solano, D.: Learning parameters and constitutive relationships with physics informed deep neural networks. arXiv preprint arXiv:1808.03398 (2018)

[25] Tartakovsky, A.M., Marrero, C.O., Perdikaris, P., Tartakovsky, G.D., Barajas-Solano, D.: Physics-informed deep neural networks for learning parameters and constitutive relationships in subsurface flow problems. Water Resources Research 56(5), 2019-026731 (2020)

[26] Yu, J., Lu, L., Meng, X., Karniadakis, G.E.: Gradient-enhanced physics-informed neural networks for forward and inverse PDE problems.
Computer Methods in Applied Mechanics and Engineering 393, 114823 (2022)

[27] Guo, L., Wu, H., Yu, X., Zhou, T.: Monte Carlo fPINNs: Deep learning method for forward and inverse problems involving high dimensional fractional partial differential equations. Computer Methods in Applied Mechanics and Engineering 400, 115523 (2022)

[28] Onsager, L.: Reciprocal relations in irreversible processes. I. Physical Review 37(4), 405 (1931)

[29] Onsager, L.: Reciprocal relations in irreversible processes. II. Physical Review 38(12), 2265 (1931)

[30] Jou, D., Casas-Vázquez, J., Lebon, G.: Extended irreversible thermodynamics. Reports on Progress in Physics 51(8), 1105 (1988)

[31] Prigogine, I., Van Rysselberghe, P.: Introduction to thermodynamics of irreversible processes. Journal of The Electrochemical Society 110(4), 97 (1963)

[32] Goldstein, H., Poole, C.P., Safko, J.L.: Classical Mechanics, 3rd edn. Pearson Education India, Noida, India (2011)

[33] Tabor, M.: Chaos and Integrability in Nonlinear Dynamics: An Introduction (1989)

[34] Fisher, R.A.: The wave of advance of advantageous genes. Annals of Eugenics 7(4), 355-369 (1937)

[35] Kolmogorov, A., Petrovskii, I., Piskunov, N.: A study of the diffusion equation with increase in the amount of substance, and its application to a biological problem. Bulletin of the Moscow State University, Series A 1, 1-26 (1937); reprinted in Selected Works of A.N. Kolmogorov, vol. 1

[36] Schäfer, A., Peirlinck, M., Linka, K., Kuhl, E., Alzheimer's Disease Neuroimaging Initiative (ADNI): Bayesian physics-based modeling of tau propagation in Alzheimer's disease. Frontiers in Physiology 12, 702975 (2021)

[37] Zhang, Z., Zou, Z., Kuhl, E., Karniadakis, G.E.: Discovering a reaction-diffusion model for Alzheimer's disease by combining PINNs with symbolic regression. Computer Methods in Applied Mechanics and Engineering 419, 116647 (2024)

[38] Blum, A., Rivest, R.: Training a 3-node neural network is NP-complete.
Advances in Neural Information Processing Systems 1 (1988)

[39] Zhang, C., Bengio, S., Hardt, M., Recht, B., Vinyals, O.: Understanding deep learning (still) requires rethinking generalization. Communications of the ACM 64(3), 107-115 (2021)

[40] Hochreiter, S., Schmidhuber, J.: Flat minima. Neural Computation 9(1), 1-42 (1997)

[41] Keskar, N.S., Mudigere, D., Nocedal, J., Smelyanskiy, M., Tang, P.T.P.: On large-batch training for deep learning: Generalization gap and sharp minima. arXiv preprint arXiv:1609.04836 (2016)

[42] Chaudhari, P., Choromanska, A., Soatto, S., LeCun, Y., Baldassi, C., Borgs, C., Chayes, J., Sagun, L., Zecchina, R.: Entropy-SGD: Biasing gradient descent into wide valleys. Journal of Statistical Mechanics: Theory and Experiment 2019(12), 124018 (2019)

[43] Dinh, L., Pascanu, R., Bengio, S., Bengio, Y.: Sharp minima can generalize for deep nets. In: International Conference on Machine Learning, pp. 1019-1028 (2017). PMLR

[44] Kawaguchi, K., Kaelbling, L.P., Bengio, Y.: Generalization in deep learning. arXiv preprint arXiv:1710.05468 (2017)

[45] Hoffer, E., Hubara, I., Soudry, D.: Train longer, generalize better: closing the generalization gap in large batch training of neural networks. Advances in Neural Information Processing Systems 30 (2017)

[46] De, S., Yadav, A., Jacobs, D., Goldstein, T.: Automated inference with adaptive batches. In: Artificial Intelligence and Statistics, pp. 1504-1513 (2017). PMLR

[47] Li, H., Xu, Z., Taylor, G., Studer, C., Goldstein, T.: Visualizing the loss landscape of neural nets. Advances in Neural Information Processing Systems 31 (2018)