Kirchhoff-Inspired Neural Networks for Evolving High-Order Perception


Authors: Tongfei Chen, Jingying Yang, Linlin Yang

Tongfei Chen 1, Jingying Yang 1, Linlin Yang 2, Jinhu Lü 6, David Doermann 4, Chunyu Xie 5, Long He 1, Tian Wang 1, Juan Zhang 1, Guodong Guo 3, Baochang Zhang 1,*

1 School of Artificial Intelligence (Institute of Artificial Intelligence), Beihang University, Beijing, China. 2 Communication University of China, Beijing, China. 3 Eastern Institute of Technology, Ningbo, China. 4 Department of Computer Science and Engineering, University at Buffalo, Buffalo, New York, USA. 5 360 AI Research, Qihoo 360, Beijing, China. 6 School of Automation Science and Electrical Engineering, Beihang University, Beijing, China.

Abstract

Deep learning architectures are fundamentally inspired by neuroscience, particularly the structure of the brain's sensory pathways, and have achieved remarkable success in learning informative data representations. Although these architectures mimic the communication mechanisms of biological neurons, their strategies for information encoding and transmission are fundamentally distinct. Biological systems depend on dynamic fluctuations in membrane potential; by contrast, conventional deep networks optimize weights and biases by adjusting the strengths of inter-neural connections, lacking a systematic mechanism to jointly characterize the interplay among signal intensity, coupling structure, and state evolution. To tackle this limitation, we propose the Kirchhoff-Inspired Neural Network (KINN), a state-variable-based network architecture constructed on Kirchhoff's current law. KINN derives numerically stable state updates from fundamental ordinary differential equations, enabling the explicit decoupling and encoding of higher-order evolutionary components within a single layer while preserving physical consistency, interpretability, and end-to-end trainability.
Extensive experiments on partial differential equation (PDE) solving and ImageNet image classification validate that KINN outperforms state-of-the-art methods.

Keywords: Kirchhoff-Inspired Neural Network, Deep Learning, Higher-order neural network, Partial differential equations

Deep learning architectures have been inspired by neuroscience, particularly by hierarchical sensory processing [1, 2]. In biological systems, sensory pathways rely on continuous membrane-potential dynamics and long-range transmission via action potentials without decrement along axons [3, 4]. Importantly, temporal development is embedded in the evolving states of neural populations rather than being treated as an external index. At the population level, we therefore view neural information through three complementary aspects: intensity (firing strength), connection (interaction structure among units), and evolution (intrinsic temporal development) [5–7].

Existing deep networks, parameterized by learned weights and architectural connectivity, operate as largely instantaneous mappings over discrete inputs, and can effectively encode signal intensity and inter-unit coupling through feedforward transformations [8, 9]. Yet, as recognized in neuroscience, information in sensory pathways is encoded and transmitted through continuous membrane-potential dynamics across connected neurons, rather than as largely instantaneous mappings over discrete inputs [10, 11]. Consequently, in many contemporary feedforward and attention-based architectures, the evolutionary aspect of a signal (i.e., how its latent representation changes across positions or steps) is commonly introduced through architectural devices such as positional encodings, attention masks, or gating mechanisms, rather than formulated as an intrinsic dynamical state variable [9, 12].
As a result, a single update is typically not explicitly designed to jointly capture intensity, interaction structure, and higher-order evolution, which can be particularly important for data governed by continuous physical dynamics such as PDEs.

In this paper, we introduce the Kirchhoff-Inspired Neural Network (KINN), which leverages Kirchhoff circuit dynamics to define a neural architecture for modeling higher-order state evolution. KINN derives closed-form, numerically stable discrete updates from the underlying ordinary differential equations [13, 14], enabling the model to explicitly encode higher-order evolutionary structure induced by cascaded state evolution within a unified representation layer. The resulting framework remains interpretable and fully end-to-end trainable.

Inspired by the view that neural computation is carried not only by instantaneous inputs but also by the continuous evolution of membrane potentials, we model representation learning as the evolution of an intrinsic latent potential rather than as a sequence organized by external positional heuristics. An RC node provides a minimal physical template for this purpose: capacitance accumulates past input, while conductance regulates state relaxation and the update timescale [15]. Based on this mechanism, we formulate a Kirchhoff architecture in which the hidden potential serves as an internal carrier of evolution and external inputs enter as driving currents. Under zero-order-hold discretization, the resulting dynamics yield a closed-form recurrent update [13, 14]: one term retains the previous state, while the other injects the current input into the evolving state. In this way, evolution is represented as an intrinsic state-evolution variable.

This formulation naturally extends to higher-order state evolution through cascaded RC nodes.
A single RC node realizes a first-order state update, whereas cascading multiple cells yields a higher-order system with additional latent states. From a systems perspective, each added stage enriches the family of admissible temporal responses, allowing the model to represent more expressive patterns of state evolution. As a result, sensitivity to complex temporal variation is not introduced by externally appended positional mechanisms, but emerges directly from the model's internal dynamics. This provides a compact and interpretable route to jointly represent input injection, interaction structure, and higher-order temporal development within a unified framework.

To validate the effectiveness of modeling temporal evolution as an intrinsic state-evolution variable, we instantiate KINN with the Kirchhoff Neural Cell (KNC), which represents an RC node, and the Cascaded Kirchhoff Block (CKB), which realizes layer-wise cascaded RC nodes for higher-order state evolution. The resulting architecture models the continuous evolution of latent representations under Kirchhoff circuit dynamics [15], enabling multi-order feature integration that captures zeroth-, first-, and higher-order evolutionary components within a single representational layer. We systematically evaluate KINN on neural operator learning, spatiotemporal dynamics prediction, and visual recognition tasks, including Darcy Flow, Shallow Water, Navier–Stokes, and ImageNet-1K [16]. Across these benchmarks, KINN delivers consistent improvements, achieving errors of 1.775 × 10⁻², 2.587 × 10⁻³, and 9.875 × 10⁻³ on Darcy Flow, Shallow Water, and Navier–Stokes, respectively, as well as Top-1 accuracies of 83.3% (Tiny) and 83.9% (Small) on ImageNet-1K.
Taken together, these results indicate that modeling evolution as an intrinsic state variable and elevating representational order through cascade composition provide a unified, stable, and interpretable route to higher-order evolution modeling.

Results

Cascaded Kirchhoff Neural Cells realize higher-order state evolution

This subsection examines the central premise of KINN: that temporal evolution is represented as an intrinsic latent-state variable and that higher-order temporal structure emerges through cascade composition. We show that a single Kirchhoff Neural Cell realizes a first-order state-evolution process governed by Kirchhoff-inspired RC dynamics and admits a stable discrete implementation under zero-order hold. We further show that cascading such cells systematically increases the effective order of the underlying dynamical system, thereby providing an explicit and interpretable mechanism for modeling higher-order temporal structure.

We first examine how temporal evolution can be represented as an intrinsic state variable, rather than supplied through externally injected positional information [17, 18]. To this end, we instantiated the proposed Kirchhoff Neural Cell (KNC) and examined its behavior from three complementary perspectives: continuous-time circuit dynamics, discrete-time neural realization, and cascade-induced system composition.

Fig. 1 KINN models temporal evolution as an intrinsic state variable and elevates representational order through cascade composition. a, Biological motivation at the neural-population level. A received neural signal can be decomposed into three complementary aspects: connection, indicating which upstream units contribute to the signal; intensity, indicating its instantaneous or short-time magnitude; and evolution, indicating its continuous temporal development. b, In many contemporary architectures, order information is commonly introduced through positional encoding, in which positional cues are externally injected into discrete feature cells rather than emerging from an internal evolving state. c, Recurrent models introduce a latent state, but a single hidden-state transition typically realizes only shallow first-order state evolution. d, KINN addresses this limitation through cascaded Kirchhoff circuit processing, where multi-stage internal state transitions progressively enrich temporal dynamics and enable higher-order evolution. e, A single Kirchhoff Neural Cell (KNC) realizes a first-order state-evolution process under Kirchhoff-inspired RC dynamics, in which latent voltage, input injection, leakage dissipation, and coupling modulation jointly determine the state update and readout. f, The full architecture instantiates KINN from key modules, combining KNC and the Cascaded Kirchhoff Block (CKB) to integrate zeroth-, first-, and higher-order evolutionary components within a unified representational hierarchy. g, We evaluate KINN across three task families spanning neural operator learning (Darcy Flow), spatiotemporal dynamics prediction (Shallow Water and Navier–Stokes), and visual recognition (ImageNet-1K). h, KINN yields consistent gains over strong baselines across these domains (bar charts, KNC vs. baseline: −82.8% error on Darcy Flow [RMSE ×10⁻²], −21.7% on Poisson [RMSE %], −27.0% on Navier–Stokes [RMSE ×10⁻²], −72.7% on Shallow Water [RMSE ×10⁻³], and +1.2% Top-1 accuracy on ImageNet-1K), supporting the view that intrinsic state evolution and cascade-induced order elevation provide a unified, stable, and interpretable route to higher-order evolution modeling.

At the single-cell level, KNC models the hidden representation as a latent voltage state governed by Kirchhoff-inspired RC dynamics [15, 19]:

    C dv(t)/dt = −(G_leak + G_p) v(t) + B_p u(t),    (1)

where C denotes the effective capacitance, G_leak and G_p denote the leakage and coupling conductances, and B_p u(t) is the input-driven injection term. Defining

    α ≜ (G_leak + G_p)/C,    β ≜ B_p/C,

the dynamics are equivalently written as a first-order state-evolution process [20, 21],

    v̇(t) = −α v(t) + β u(t),

showing that the latent evolution is jointly controlled by state relaxation and input injection.

To deploy this mechanism in neural computation, we discretize the continuous dynamics under zero-order hold, which yields a closed-form recurrent update [13]:

    v_{t+1} = e^{−αΔt} v_t + ∫₀^{Δt} e^{−α(Δt−τ)} β u_t dτ
            = e^{−αΔt} v_t + β (1 − e^{−αΔt})/α · u_t.    (2)

This update provides a stable, interpretable state-evolution rule in discrete form: the exponential factor controls retention of past states, whereas the second term injects the current stimulus into the evolving latent representation. Each KNC then emits an output via an explicit readout of the updated state and the current input, and this output, rather than the internal state itself, is propagated to the next stage.

We next asked whether composing such state-update/readout units increases the effective order of the end-to-end input-output dynamics.
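The closed-form update in Eq. (2) is easy to check numerically. The sketch below is a minimal scalar implementation with illustrative constants (the values of alpha, beta, and dt are hypothetical, not the trained model's parameters); it applies the ZOH recurrence and confirms that, under a constant input, the state relaxes to the analytic steady state v* = (β/α)·u of v̇ = −αv + βu:

```python
import numpy as np

def knc_step(v, u, alpha, beta, dt):
    """One zero-order-hold (ZOH) update of the RC-node state, Eq. (2):
    v_{t+1} = e^{-a*dt} * v_t + beta * (1 - e^{-a*dt}) / a * u_t."""
    decay = np.exp(-alpha * dt)
    return decay * v + beta * (1.0 - decay) / alpha * u

# Illustrative constants (chosen for the demo only).
alpha, beta, dt = 2.0, 1.5, 0.1
v, u = 0.0, 1.0
for _ in range(500):          # drive the cell with a constant input
    v = knc_step(v, u, alpha, beta, dt)
print(round(v, 6))            # -> 0.75, the steady state (beta/alpha) * u
```

Because the retention factor e^{−αΔt} lies in (0, 1) for any α, Δt > 0, the recurrence contracts past state at every step regardless of step size, which is the stability property the long-rollout experiments below rely on.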
Eliminating the internal states from the cascade yields an end-to-end operator form for the final-stage output:

    ∏_{ℓ=1}^{n} (C_ℓ d/dt + a_ℓ) y_n(t) = ∏_{ℓ=1}^{n} (d_{o,ℓ} C_ℓ d/dt + d_{o,ℓ} a_ℓ + c_{o,ℓ} b_ℓ) u(t),    (3)

where u(t) is the external input, y_n(t) is the output of the final stage, and c_{o,ℓ}, d_{o,ℓ} are fixed readout coefficients of the ℓ-th KNC. Equation (3) makes the higher-order nature of the cascade explicit: the left-hand side is the product of n first-order differential operators acting on the final output, showing that recursive composition progressively elevates the effective order of end-to-end evolution; in the generic non-degenerate case, the resulting mapping is n-th order [20, 21]. Thus, within the cascade itself, higher-order evolutionary sensitivity is not manually appended as an external positional heuristic, but emerges from the internal composition of state-evolution units. Consistent with this view, increasing cascade depth led to progressively stronger evolution-modeling capacity and improved downstream performance (see the CKB order ablation below). The full derivation is provided in Methods.

The cascaded Kirchhoff block implements multi-order evolution in neural networks

We next asked how the Kirchhoff-inspired state-evolution principle can be realized in a practical neural module.
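A quick way to see the order elevation in Eq. (3): each stage contributes a first-order factor (C_ℓ s + a_ℓ) to the left-hand operator, and multiplying polynomials in s is a convolution of their coefficient arrays. The sketch below (illustrative C_ℓ, a_ℓ values, not fitted parameters) builds the characteristic polynomial of a three-stage cascade and confirms that its degree equals the number of stages:

```python
import numpy as np

# Illustrative (C_l, a_l) pairs for three cascaded KNC stages, cf. Eq. (3).
stages = [(1.0, 2.0), (0.5, 1.0), (2.0, 3.0)]

# Characteristic polynomial in s, coefficients in descending powers.
# Multiplying the first-order factors (C_l * s + a_l) is equivalent to
# convolving their coefficient arrays [C_l, a_l].
char_poly = np.array([1.0])
for C_l, a_l in stages:
    char_poly = np.convolve(char_poly, np.array([C_l, a_l]))

print(len(char_poly) - 1)  # -> 3: degree equals the number of cascaded stages
```

In the generic non-degenerate case no left-hand factor cancels against the right-hand side, so the end-to-end mapping is genuinely n-th order, matching the claim above.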
Figure 2 provides a direct visual correspondence from the theoretical formulation to the implemented architecture: the single-cell schematic illustrates a Kirchhoff Neural Cell (KNC) as the basic state-evolution unit, the cascaded-cell schematic shows how serial composition gives rise to higher-order evolution, and the CKB diagram shows how this principle is instantiated as a trainable neural block.

Fig. 2 Architectural instantiation of KINN and its integration into neural operators and encoder–decoder backbones. a, Structure of a single Kirchhoff Neural Cell (KNC). The hidden representation is modeled as a latent voltage state v(t) governed by Kirchhoff-inspired RC dynamics, where input injection, leakage dissipation, and state retention jointly determine the cell update and readout. b, Cascaded Kirchhoff Block (CKB). By stacking multiple KNCs in sequence, the output of each stage is propagated to the next stage, progressively increasing the effective order of the resulting dynamics and enabling higher-order state evolution. c, N-order Cascaded Kirchhoff Operator (CKO). KNCs are combined with lightweight projection, normalization, and nonlinear transformation modules to form a trainable high-order evolution operator for deep architectures. d, FNO with KINN. The proposed Kirchhoff modules are incorporated into Fourier neural operators by inserting CKB-enhanced evolution pathways alongside spectral and convolutional transformations, enabling multi-order feature interaction in operator learning. e, U-Net with KINN. CKB is further embedded into the encoder–decoder hierarchy, where Kirchhoff-inspired evolution modules are placed along the downsampling, bottleneck, and upsampling paths to enrich latent dynamics across scales. Together, these instantiations show that KINN is not restricted to a single backbone, but provides a unified and modular mechanism for introducing intrinsic and higher-order state evolution into diverse architectures.

At the single-cell level, the continuous-time KNC is implemented as a selective discrete state-update/readout unit. In this implementation, the relaxation, input-injection, and readout roles derived in Eqs. (9)–(11) are preserved, while part of the effective coefficients are made adaptive to the current feature. In this way, the implemented KNC remains faithful to the theoretical relaxation–injection–readout interpretation while gaining the flexibility required for neural representation learning.

At the block level, the figure further shows how higher-order evolution is realized by cascading multiple KNCs. Each KNC contributes one state-evolution step, and their serial composition forms the neural counterpart of the higher-order cascade derived in Eqs. (15)–(16). To preserve information from different evolutionary depths, the outputs of all stages are aggregated into a unified multi-order representation,

    ȳ = Σ_{k=1}^{N} y^{(k)},    (4)

where y^{(k)} denotes the output of the k-th KNC. This aggregation retains shallow, intermediate, and deep evolutionary responses simultaneously, rather than exposing only the deepest cascaded state.

In parallel, the block also preserves the current input through a direct input-conditioned modulation path, which serves as an explicit zero-order feature pathway.
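To make the block-level computation concrete, the toy sketch below cascades scalar discrete cells in which each stage's readout, not its internal state, feeds the next stage, and all stage outputs are summed as in Eq. (4). All coefficients here (alpha, beta, c_out, d_out) are hypothetical fixed constants chosen for illustration; the actual CKB makes part of them adaptive to the current feature:

```python
import numpy as np

def run_cascade(u_seq, n_stages=3, alpha=1.0, beta=1.0, dt=0.1,
                c_out=1.0, d_out=0.5):
    """Toy multi-order aggregation: each stage performs a ZOH state update,
    emits a readout y = c_out * v + d_out * x, passes y to the next stage,
    and the stage outputs are summed into y_bar, cf. Eq. (4)."""
    decay = np.exp(-alpha * dt)
    gain = beta * (1.0 - decay) / alpha
    states = np.zeros(n_stages)            # one latent state per stage
    y_bar = np.zeros_like(u_seq)
    for t, u in enumerate(u_seq):
        x = u                              # input to the first stage
        total = 0.0
        for k in range(n_stages):
            states[k] = decay * states[k] + gain * x   # ZOH state update
            y = c_out * states[k] + d_out * x          # explicit readout
            total += y                                 # aggregate every depth
            x = y                                      # readout drives next stage
        y_bar[t] = total
    return y_bar

y_bar = run_cascade(np.ones(50))           # constant (step) input
print(y_bar.shape)                         # -> (50,)
```

Summing over stages keeps the shallow, intermediate, and deep responses available simultaneously, which is the retention property described above; the d_out feedthrough is what carries the instantaneous component of each stage.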
The final block output is obtained by combining this zero-order path with the aggregated higher-order response through gated residual fusion:

    Y = X + ȳ ⊙ g,    (5)

where g is generated directly from the current input. The resulting CKB therefore jointly realizes two complementary aspects of representation: zero-order feature preservation from the current input, and higher-order feature aggregation from cascaded state evolution. Consistent with this design, replacing conventional single-step transition blocks with CKB led to stronger temporal modeling capacity and improved downstream performance. Full implementation details are provided in the Methods section.

Empirical validation across physical field solving benchmarks

Darcy Flow Prediction with KINN

Methods. To evaluate the model's ability to learn complex mappings from high-dimensional parameter fields to solution fields, we first focus on Darcy flow, a classic steady-state elliptic PDE. Unlike temporal forecasting, this task requires the model to infer the global pressure distribution [22] u(x) from a spatially heterogeneous permeability field a(x), governed by

    −∇ · (a(x) ∇u(x)) = f(x).

Mathematically, the solution at any spatial point is coupled with the global distribution of the permeability field, necessitating a model with high-order spatial reasoning and extensive receptive fields.

Our proposed architecture, termed CKB, is built upon U-Net, which has demonstrated superior performance in learning solution operators of PDEs by maintaining continuous-discrete equivalence (CDE) [23]. The CKB model retains the hierarchical Operator U-Net structure but introduces two key structural enhancements to optimize computational efficiency and physical modeling accuracy.
• Depthwise Separable Convolutions (DSConv): To optimize parameter utilization and computational efficiency without sacrificing model capacity, we replace all standard 3 × 3 convolutions in the baseline CNO (including those in the encoder, decoder, and residual blocks) with depthwise separable convolutions [24]. This modification significantly reduces the FLOPs and parameter count while maintaining the local receptive fields necessary for capturing multi-scale physical features.

• Second-Order Kirchhoff Neural Components (KNC): We integrate the proposed KNC modules into the deepest two layers of both the encoder and the decoder, as well as the bottleneck layer. These layers correspond to the lowest spatial resolutions and the largest receptive fields, where the network captures macro-scale global dynamics. By embedding KNC at these strategic locations, the model explicitly incorporates second-order physical priors and conservation constraints directly into the latent feature space, enhancing the approximation of complex PDE evolution laws.

For the Darcy flow dataset, we adopt a U-Net-based neural operator architecture, shown in Fig. 2(e), and integrate the proposed CKB modules to enhance spatial modeling. The network follows an encoder–decoder structure. The input field is first lifted to a latent representation and then processed through a multi-stage encoder that progressively reduces spatial resolution while increasing channel capacity. CKB modules are introduced in the deepest encoder and decoder stages, enabling structured spatial scanning and directional state evolution on high-level features. After multi-scale feature fusion through U-Net skip connections, the reconstructed latent representation is projected back to the physical field to produce the predicted solution.
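The parameter saving from the depthwise-separable replacement above is straightforward to quantify: a standard k × k convolution costs c_in · c_out · k² weights, while the depthwise-plus-pointwise factorization costs c_in · k² + c_in · c_out. A minimal sketch with illustrative channel widths (biases omitted; these are not the paper's exact layer sizes):

```python
def conv_params(c_in, c_out, k=3):
    """Weights in a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k

def dsconv_params(c_in, c_out, k=3):
    """Depthwise separable: one k x k filter per input channel,
    followed by a 1 x 1 pointwise convolution (bias omitted)."""
    return c_in * k * k + c_in * c_out

c_in = c_out = 256                      # illustrative channel width
std = conv_params(c_in, c_out)          # 589824
ds = dsconv_params(c_in, c_out)         # 67840
print(std, ds, round(std / ds, 1))      # -> 589824 67840 8.7
```

For large channel counts the ratio approaches k² (here 9×), which is why the substitution pays off most in the wide, low-resolution stages where the KNC modules are placed.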
Our proposed CKB architecture naturally aligns with this requirement through two distinct mechanisms. First, the cascaded high-order dynamics derived in our theory (Eqs. 16–18) allow the network to approximate the inverse of the Laplacian-like operator more effectively than standard first-order models. Second, the quad-directional selective scan ensures omnidirectional information propagation across the 2D grid, mimicking the physical dissipation of pressure in a porous medium.

Results. Quantitative results demonstrate that our Kirchhoff-inspired approach significantly outperforms established baselines. As shown in Table 1, our model achieves a normalized relative L2 error (nRMSE) of 1.775 × 10⁻², a 4.5× reduction in error compared to U-Net (7.993 × 10⁻²) and a 6.4× reduction compared to the Fourier Neural Operator (FNO, 1.142 × 10⁻¹). Furthermore, the maximum error is reduced to 6.593 × 10⁻², nearly a 3× improvement over the baseline methods, indicating superior local fidelity. Visual analysis confirms that while baselines often exhibit structural blurring or artifacts in high-gradient regions, our method maintains sharp boundaries and recovers the principal dynamical modes with high precision, validating the efficacy of physics-consistent state evolution in steady-state operator learning.

Table 1 Quantitative evaluation on steady-state Darcy Flow. Predictive performance comparison on the 2D Darcy Flow dataset. nRMSE denotes the normalized relative L2 error. The best results are highlighted in bold, demonstrating significant improvements across all metrics.

Method        RMSE           nRMSE          Max Error
U-Net         1.545 × 10⁻²   7.993 × 10⁻²   1.930 × 10⁻¹
FNO           2.129 × 10⁻²   1.142 × 10⁻¹   1.986 × 10⁻¹
Ours (CKB)    3.657 × 10⁻³   1.775 × 10⁻²   6.593 × 10⁻²

Shallow-Water Forecasting with KINN

Methods. To evaluate the model's capability in capturing non-stationary spatiotemporal evolution, we apply our method to the Shallow Water equations (SWE) dataset. Unlike steady-state problems, SWE emphasizes long-horizon temporal evolution coupled with spatial transport, typically exhibiting wave propagation and nonlinear advection [25]. Traditional discrete deep learning models often struggle with such tasks, as truncation errors from discrete temporal updates rapidly accumulate, leading to structural divergence and numerical instability over long rollouts [26].

Our architecture addresses this bottleneck through the inherent physical consistency of the Kirchhoff Neural Cell (KNC). By adopting an exact exponential integration scheme [13] with zero-order hold (ZOH), the discrete state updates in our network are rigorously A-stable. This mathematical property ensures that the continuous-time physical dissipation and propagation are faithfully preserved in the discrete computational graph, effectively suppressing the spurious amplification of high-frequency errors during temporal iteration.

We evaluate the proposed CKB module on the shallow-water equations (SWE) using a modified Fourier Neural Operator architecture (FKNO), illustrated in Fig. 2(d). The model follows a four-layer FNO backbone. The input consists of the previous ten temporal states concatenated with the spatial coordinates (x, y) and is first lifted to a latent representation. Each Fourier block combines a spectral convolution capturing global interactions with a local operator modeling spatial correlations. In the first block, the local operator is replaced by the proposed CKB module, which performs multi-directional spatial scanning and state-evolution transformations before feature aggregation. The resulting representation is finally projected back to physical space to predict the next SWE state.

Results.
Quantitative comparisons against data-driven (U-Net, FNO [27]) and physics-informed (PINN [28]) baselines validate this theoretical advantage. As detailed in Table 2, while a conventional convolution-based mapping (U-Net) struggles to maintain coherent wave structures (yielding an nRMSE of 9.120 × 10⁻²), our Kirchhoff-inspired model achieves state-of-the-art predictive accuracy with an nRMSE of 2.587 × 10⁻³, notably outperforming both the spectral interpolation of FNO (3.301 × 10⁻³) and the soft-constrained PINN (1.336 × 10⁻²). Moreover, the maximum error is restricted to 4.958 × 10⁻². The bounded error accumulation confirms that the physics-compliant exponential updates act as a stabilizing inductive bias, enabling the network to learn robust dynamical transitions rather than merely overfitting to short-term spatial correlations.

Table 2 Quantitative evaluation on 2D Shallow Water equations. Predictive performance comparison for temporal forecasting. nRMSE denotes the normalized relative L2 error. Our method exhibits superior stability and accuracy compared to both data-driven and physics-informed baselines.

Method        RMSE           nRMSE          Max Error
U-Net         9.500 × 10⁻²   9.120 × 10⁻²   6.503 × 10⁻¹
PINN          1.389 × 10⁻²   1.336 × 10⁻²   1.682 × 10⁻¹
FNO           3.436 × 10⁻³   3.301 × 10⁻³   5.464 × 10⁻²
Ours (CKB)    2.692 × 10⁻³   2.587 × 10⁻³   4.958 × 10⁻²

CKB Order Ablation on Poisson Problems

Methods. To explicitly validate our theoretical claim that cascading Kirchhoff Neural Cells induces higher-order dynamical constraints, we perform an ablation study on the cascade depth (n-pass) using the Poisson equation. The Poisson equation, governed by the Laplace operator (Δu = f), intrinsically requires the modeling of second-order spatial derivatives. According to our derivations (Eqs. 16–18), a single KNC behaves as a first-order operator, whereas cascading them analytically produces higher-order polynomial transfer functions that are better suited to higher-order PDEs.

For the Poisson dataset, we investigate the influence of the CKB order within the UKNO architecture. The backbone follows the U-Net-based neural operator described above, where CKB modules are integrated into the deepest encoder–decoder stages. The standard configuration employs a CKB module composed of KNCs. To evaluate the impact of model capacity and hierarchical dynamics, we vary the number of KNCs within the CKB module, considering configurations with one, two, three, and four KNCs. For comparison, we also include the original U-Net architecture without CKB modules as a baseline. All other architectural settings, training procedures, and hyperparameters remain unchanged across the experiments.

Results. We evaluate variants of our model from 1-pass (1m) to 4-pass (4m) architectures. As summarized in Table 3, increasing the number of cascade stages provides a systematic enhancement in predictive accuracy. The 1-pass model, constrained by its first-order nature, yields a baseline OUT relative L1 error of 0.307%. Stepping to a 2-pass cascade, which mathematically aligns with the second-order Laplacian operator, drops the nRMSE from 3.854 × 10⁻³ to 3.692 × 10⁻³. Further extending the cascade to 4 passes achieves a global minimum relative L1 error of 0.268%, with the OUT nRMSE decreasing significantly to 3.424 × 10⁻³.

Although we observe a slight optimization fluctuation in the mean error at 3 passes, the maximum error steadily decreases at higher orders, indicating that deeper cascades are particularly effective at resolving sharp, high-gradient local structures.
These empirical results firmly corroborate our mathematical formulation: the performance gain of the Cascaded Kirchhoff Block (CKB) is not merely a consequence of increased parameter count, but a direct result of aligning the neural network's intrinsic differential order with the underlying physical governing equations.

Table 3 Ablation on cascade stages (n-pass) for the Poisson equation. We compare the relative L1 error, nRMSE, and maximum error on the OUT field. Parameter counts (Params) and inference times indicate that the multi-pass mechanism maintains high efficiency while scaling the operator's mathematical order.

Variant        Params (M)   Rel. L1 Error (%)   OUT nRMSE      OUT Max Error
1-pass (1m)    2.60         0.307               3.854 × 10⁻³   1.074 × 10⁻²
2-pass (2m)    2.83         0.292               3.692 × 10⁻³   1.033 × 10⁻²
3-pass (3m)    3.07         0.390               4.791 × 10⁻³   8.764 × 10⁻³
4-pass (4m)    3.31         0.268               3.424 × 10⁻³   9.403 × 10⁻³

Rollout Horizon Evaluation on Navier–Stokes Dynamics

Methods. To further evaluate the temporal stability of the proposed model, we conduct long-horizon forecasting experiments on the 2D incompressible Navier–Stokes equations in the vorticity formulation with viscosity ν = 10⁻³. This benchmark is widely used to assess the long-term predictive robustness of neural operator models. The Navier–Stokes dynamics exhibit strongly nonlinear convection–diffusion interactions and multi-scale vortex structures, making long-term forecasting particularly challenging: small prediction errors may accumulate and amplify over recursive time steps, eventually leading to structural divergence. To systematically analyze this stabilization effect, we evaluate the models over a rigorous 40-step rollout under varying training horizons: 3-step, 5-step, and 10-step temporal unrolling.

Results.
As detailed in Table 4, our method consistently outperforms the standard Fourier Neural Operator (FNO) across all training strategies under matched parameter budgets (about 470K parameters). For instance, under the 10 → 1 training strategy, our model achieves a rollout relative L2 error of 1.054 × 10⁻², establishing a clear advantage over FNO (1.349 × 10⁻²).

The superiority of the CKB architecture becomes even more pronounced in the temporal error trajectories. As observed in the rollout curves, while all models experience an initial error descent followed by gradual accumulation, baseline models exhibit an accelerated error-growth phase at later stages (t > 20). In contrast, our model maintains a significantly flatter error trajectory throughout the entire 40-step horizon. Pixel-level error analysis further reveals that baselines suffer from severe error spikes during highly nonlinear events, such as vortex merging. Our approach effectively suppresses these fluctuations, preserving sharper structural boundaries in high-shear regions without suffering from high-frequency spectral distortion. These findings corroborate that the cascaded Kirchhoff dynamics provide a robust, physics-aligned inductive bias for long-term spatiotemporal forecasting.

Table 4  Long-term rollout performance on the 2D Navier–Stokes equations. We report the average relative L2 error over a 40-step rollout. Models are evaluated across varying training sequence lengths (from 3-step to 10-step). Our CKB-based model consistently exhibits lower error accumulation and superior stability across all settings under comparable parameter budgets.

Method        3-step (3 → 1)   5-step (5 → 1)   10-step (10 → 1)
FNO           2.226 × 10⁻²     1.814 × 10⁻²     1.349 × 10⁻²
Ours (CKB)    1.533 × 10⁻²     1.449 × 10⁻²     9.875 × 10⁻³

Directional Scan Ablation on Navier–Stokes Dynamics

Methods.
For the Navier–Stokes dataset, we examine the impact of directional spatial scanning within the CKB module of the FKNO architecture. The backbone follows the same FNO-based neural operator described in the SWE experiments, where the first local operator is replaced by the proposed CKB module. The CKB module performs structured spatial traversal followed by state-evolution transformations and directional feature aggregation. To analyze the role of directional scanning, we vary the number of traversal directions used in the spatial scanning stage. Specifically, three configurations are evaluated: a single-direction scan using a raster horizontal traversal, a two-direction scheme combining horizontal and vertical scans, and a four-direction configuration employing both forward and reverse horizontal and vertical traversals. In addition, the original FNO model without the CKB module is included as a baseline. All other architectural components, training settings, and hyperparameters remain unchanged across the experiments.

Results. As summarized in Table 5, the empirical results on the 64 × 64 resolution NS benchmark reveal a clear correlation between directional diversity and predictive accuracy. The baseline FNO yields a relative L2 error of 1.349 × 10⁻². Introducing even a single-direction Kirchhoff scan (FKNO-1) reduces the error to 1.043 × 10⁻², demonstrating the superior inductive bias of the Kirchhoff-inspired exponential integration over standard spectral convolutions. More importantly, increasing the scanning complexity further refines the results. The FKNO-4 model achieves the best performance with a test L2 error of 9.875 × 10⁻³, outperforming the FKNO-2 (1.026 × 10⁻²) and FKNO-1 variants.
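The four traversal orders described above can be sketched as flattenings of a 2D feature grid; the function name and the choice of row-major/column-major orderings below are our illustration, not code from the paper:

```python
import numpy as np

def directional_scans(x):
    """Flatten a 2D grid into the four 1D traversal orders of the
    four-direction configuration: forward/reverse horizontal raster
    scans and forward/reverse vertical (column-major) scans."""
    h_fwd = x.reshape(-1)          # horizontal raster, row by row
    h_rev = h_fwd[::-1]            # reversed horizontal traversal
    v_fwd = x.T.reshape(-1)        # vertical traversal, column by column
    v_rev = v_fwd[::-1]            # reversed vertical traversal
    return [h_fwd, h_rev, v_fwd, v_rev]

grid = np.arange(9).reshape(3, 3)  # [[0,1,2],[3,4,5],[6,7,8]]
scans = directional_scans(grid)
```

Each sequence is then a candidate input for a 1D state-evolution pass, so the four-direction configuration sees every spatial neighbor of a cell adjacently in at least one scan.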
This progression confirms that the highly nonlinear advection and vortex stretching inherent in Navier–Stokes flows require the model to capture spatial dependencies from multiple orientations. While the computational time increases slightly with more directions (from 54.1 s for FNO to 83.3 s for FKNO-4), the significant gain in precision justifies the multi-directional design. This ablation empirically validates that the four-way scanning mechanism provides a more isotropic and robust representation of the underlying fluid velocity fields.

Table 5  Ablation of scanning directions on the Navier–Stokes equations. We evaluate the impact of the number of Kirchhoff scanning directions (FKNO-n) on the 64 × 64 resolution dataset (10 → 1 time-step prediction). L2 denotes the relative full-field error.

Model Variant            Directions   Parameters   Training Time (s)   Test L2 Error
Vanilla FNO (Baseline)   –            465,717      54.12               1.349 × 10⁻²
FKNO-1                   1            476,698      71.48               1.043 × 10⁻²
FKNO-2                   2            481,018      79.37               1.026 × 10⁻²
FKNO-4 (Ours)            4            489,658      83.26               9.875 × 10⁻³

Generalization to large-scale natural image recognition

To demonstrate that the physical inductive biases introduced by the Kirchhoff Neural Cell (KNC) and Cascaded Kirchhoff Block (CKB) are not limited to solving explicit physical partial differential equations (PDEs), we evaluate our architecture on the ImageNet-1K image classification benchmark. In the context of computer vision, mapping raw pixels to high-level semantic labels can be viewed as the continuous evolution of a 2D spatial feature field [29]. Consequently, the high-order receptive fields and stable gradient propagation derived from our physical constraints theoretically provide a superior backbone for visual representation learning. We integrate the CKB as a drop-in replacement into a standard hierarchical vision backbone.
Table 6 compares our models (Ours-T and Ours-S) against leading Convolutional Neural Networks (ConvNeXt [30]), Vision Transformers (Swin [31]), and recent State-Space Models (VMamba [32], MambaVision [33]). At the Tiny scale (about 32M parameters, 4.9G FLOPs), our model achieves a Top-1 accuracy of 83.3%, outperforming strong SSM baselines like VMamba-T (82.6%) and MambaVision-T (82.3%). Scaling up to the Small size (about 50M parameters, 8.2G FLOPs), Ours-S reaches 83.9% Top-1 accuracy, establishing a new state-of-the-art Pareto frontier for accuracy and computational complexity.

Table 6  ImageNet-1K classification performance. Comparison of our Kirchhoff-inspired backbone with state-of-the-art CNNs, Vision Transformers, and visual State-Space Models (SSMs). Our models achieve superior accuracy under strictly matched parameter and FLOP constraints.

Model               #Params (M)   FLOPs (G)   Top-1 Acc (%)
ConvNeXt-T          28.6          4.5         82.0
Swin-T              28.3          4.4         81.3
EfficientVMamba-S   11.0          1.3         78.7
VMamba-T            30.0          4.9         82.6
MambaVision-T       31.8          4.4         82.3
Ours-T              32.3          4.9         83.3
ConvNeXt-S          50.2          8.7         83.1
VMamba-S            50.0          8.7         83.6
MambaVision-S       50.1          7.5         83.2
Ours-S              50.3          8.2         83.9

To trace the origins of these empirical gains back to our mathematical formulation, we conduct rigorous ablations on the module topology and the injection of physical constraints (Table 7). Starting from a baseline configuration (Tiny: 82.6%, Base: 83.3%), we evaluate two expansion strategies: a Parallel routing and a Series (Cascaded) routing. Crucially, the Series cascade yields a substantially higher performance jump (Tiny: +0.6%, Base: +0.9%) compared to the Parallel approach (Tiny: +0.3%, Base: +0.3%). This precisely validates the core tenet of our derivation (Eq.
16–18): a sequential cascade of first-order Kirchhoff cells analytically constructs a higher-order differential operator, which greatly increases the spatiotemporal expressive capacity of the network, whereas a parallel combination merely acts as a wide first-order ensemble. Furthermore, incorporating the strict physical constraints (e.g., KCL-compliant decay and exponential integration) provides an additional accuracy boost, confirming the utility of physical conservation laws in regularizing deep visual features.

Finally, we observe that the physical constraints act as a powerful stabilizing prior during optimization. When evaluating short-schedule training efficiency (Top-1 accuracy at 30 epochs), our models systematically outpace the VMamba baselines. Specifically, Ours-S achieves 72.5% at epoch 30, a full 1.0% ahead of VMamba-S (71.5%). This accelerated convergence indicates that the intrinsic A-stability and dissipative properties of the Kirchhoff dynamics facilitate smoother and faster gradient propagation across the loss landscape.

Table 7  Ablation studies on ImageNet-1K. (Top) Impact of routing topologies (Parallel vs. Series cascade) and physical constraints on final accuracy. (Bottom) Training efficiency measured by Top-1 accuracy at 30 epochs, demonstrating faster convergence.

Topological & Physical Constraint Ablation
Scale   Baseline   Parallel   Series (Cascade)   + Physical Constraints
Tiny    82.6       82.9       83.2               83.3
Base    83.3       83.6       84.2               84.3

Training Efficiency (Top-1 Acc at 30 epochs)
Model    -T (Tiny)   -S (Small)   -B (Base)
VMamba   70.2        71.5         69.4
Ours     71.0        72.5         71.0

Discussion

Recent advances in neural operators, state-space models, and modern vision architectures have shown that strong performance can emerge from increasingly expressive mechanisms for long-range interaction and hierarchical feature transformation.
Yet, in many such formulations, evolution is introduced primarily as a computational device rather than as an explicitly grounded state process. The present study departs from this trend by treating representation dynamics as the evolution of an endogenous latent potential governed by conservation and dissipation. In this sense, the proposed framework is less an incremental extension of existing sequence or operator models than an alternative modelling viewpoint, one in which stimulus, coupling, and temporal development are embedded within a unified physical state equation.

From this perspective, the contribution of our work lies not only in introducing a new architectural block, but also in demonstrating that higher-order representation dynamics can be constructed through structured, cascaded evolution. Starting from Kirchhoff-consistent RC dynamics, the Kirchhoff Neural Cell provides an interpretable first-order primitive. At the same time, the cascaded design extends this primitive into a higher-order mechanism without abandoning its dissipative character. This yields a model family that is both analytically motivated and practically effective. The empirical results suggest that such a formulation is sufficiently general to support diverse learning settings, including steady-state operator approximation, long-horizon spatiotemporal prediction, and large-scale visual recognition. Rather than relying solely on empirical architectural heuristics, the model benefits from an inductive bias that is physically constrained yet flexible in implementation.

At a broader level, these findings indicate that physically motivated state evolution may offer a useful route towards neural architectures that balance expressivity, stability, and interpretability. At the same time, the present results should be interpreted with appropriate caution.
Neural and circuit principles inspire the framework, but it is not intended as a biologically faithful account of neuronal computation. In addition, the current study explores only a limited set of cascade configurations and benchmark families. It therefore remains to be seen how far the same design principle extends to more heterogeneous modalities, irregular dynamical regimes, or larger foundation-model settings. Future work may clarify whether adaptive order selection, richer circuit motifs, or tighter links between continuous-time analysis and discrete learning dynamics can further strengthen this line of inquiry.

Methods

Mathematical formulation of the Kirchhoff Neural Cell. We view representation learning as the evolution of an intrinsic latent potential $v(t)$, analogous to a membrane-potential-like state. For a single Kirchhoff cell driven by an external input $u(t)$, Kirchhoff's current law gives

$$i_C(t) + i_{\mathrm{leak}}(t) + i_{\mathrm{couple}}(t) = i_{\mathrm{in}}(t), \qquad (6)$$

where the branch currents are defined as

$$i_C(t) = C\,\frac{dv(t)}{dt}, \quad i_{\mathrm{leak}}(t) = G_{\mathrm{leak}}\,v(t), \quad i_{\mathrm{couple}}(t) = G_p\,v(t), \quad i_{\mathrm{in}}(t) = B_p\,u(t). \qquad (7)$$

Substituting Eq. (7) into Eq. (6) yields

$$C\,\frac{dv(t)}{dt} = -\bigl(G_{\mathrm{leak}} + G_p\bigr)\,v(t) + B_p\,u(t). \qquad (8)$$

Defining

$$\alpha \triangleq \frac{G_{\mathrm{leak}} + G_p}{C}, \qquad \beta \triangleq \frac{B_p}{C},$$

the dynamics can be written in standard state form as

$$\dot{v}(t) = -\alpha\,v(t) + \beta\,u(t), \qquad (9)$$

which makes explicit that the latent evolution is jointly governed by state relaxation and input injection. To deploy this continuous-time mechanism in neural computation, we discretize Eq. (9) under zero-order hold (ZOH), assuming the input is piecewise constant within each interval $[t, t + \Delta t_t]$. This yields the exact update [13]

$$v_{t+1} = e^{-\alpha\,\Delta t_t}\,v_t + \int_0^{\Delta t_t} e^{-\alpha(\Delta t_t - \tau)}\,\beta\,u_t\,d\tau = e^{-\alpha\,\Delta t_t}\,v_t + \beta\,\frac{1 - e^{-\alpha\,\Delta t_t}}{\alpha}\,u_t.$$
(10)

This update provides a stable, interpretable discrete-state evolution rule: the exponential factor controls retention of past states, while the second term injects the current stimulus into the evolving latent state.

A KNC does not expose its internal state directly. Instead, its output is read out from both the updated state and the current input:

$$y_t = c_o\,v_{t+1} + d_o\,u_t, \qquad (11)$$

where $y_t$ is the output voltage of the cell, and $c_o$ and $d_o$ are fixed readout coefficients. This separates internal evolution from external emission: the internal state stores and transforms latent dynamics, whereas the output is the quantity propagated to the next cell. For continuous-time analysis, we use the corresponding readout form

$$y(t) = c_o\,v(t) + d_o\,u(t). \qquad (12)$$

Higher-order evolution via the Cascaded Kirchhoff Block. Let $y_0(t) \equiv u(t)$ denote the external input. For the $\ell$-th cell in a cascade, we define the state-update and output-readout equations as

$$C_\ell\,\frac{dv_\ell(t)}{dt} = -a_\ell\,v_\ell(t) + b_\ell\,y_{\ell-1}(t), \qquad (13)$$

$$y_\ell(t) = c_{o,\ell}\,v_\ell(t) + d_{o,\ell}\,y_{\ell-1}(t), \qquad \ell = 1, \ldots, n. \qquad (14)$$

Here, $v_\ell(t)$ is the internal state of the $\ell$-th KNC, $y_\ell(t)$ is its output, and the next stage receives $y_\ell(t)$ rather than $v_\ell(t)$. This formalizes the intended KNC–CKB hierarchy: each cell first updates its latent voltage state and then emits an output voltage that drives the next cell. Eliminating the internal state from Eqs. (13) and (14) gives

$$v_\ell(t) = \frac{y_\ell(t) - d_{o,\ell}\,y_{\ell-1}(t)}{c_{o,\ell}}.$$

Substituting this into Eq. (13) yields the output-level operator recursion

$$\Bigl(C_\ell\,\frac{d}{dt} + a_\ell\Bigr)\,y_\ell(t) = \Bigl(d_{o,\ell}\,C_\ell\,\frac{d}{dt} + d_{o,\ell}\,a_\ell + c_{o,\ell}\,b_\ell\Bigr)\,y_{\ell-1}(t). \qquad (15)$$

Equation (15) shows that each stage induces a first-order operator relation between successive outputs. Applying Eq.
(15) recursively across $n$ stages yields the end-to-end operator form

$$\prod_{\ell=1}^{n} \Bigl(C_\ell\,\frac{d}{dt} + a_\ell\Bigr)\,y_n(t) = \prod_{\ell=1}^{n} \Bigl(d_{o,\ell}\,C_\ell\,\frac{d}{dt} + d_{o,\ell}\,a_\ell + c_{o,\ell}\,b_\ell\Bigr)\,u(t). \qquad (16)$$

Eq. (16) makes the higher-order nature of the cascade explicit: the left-hand side is the product of $n$ first-order differential operators acting on the final-stage output $y_n(t)$. Therefore, cascading first-order KNCs yields a progressively higher-order end-to-end evolution process; in the generic non-degenerate case, the resulting mapping is $n$-th order [20, 21]. This provides a principled mechanism for enriching temporal expressiveness through recursive composition, rather than through externally appended positional heuristics.

The formulation above clarifies three levels of structure in KINN. First, a single KNC implements a first-order state-evolution unit grounded in RC dynamics. Second, ZOH discretization turns these continuous dynamics into a closed-form recurrent update suitable for neural computation. Third, the CKB composes state-update/readout pairs across stages, so that higher-order temporal sensitivity emerges from the model's internal recursive structure.

Neural architecture of the Kirchhoff Neural Cell and Cascaded Kirchhoff Block. The theoretical formulation above defines a Kirchhoff Neural Cell (KNC) as a state-evolution unit with three essential components: a relaxation term, an input-injection term, and a readout term. In continuous time, these roles are played by the coefficients $\alpha$, $\beta$, $c_o$, and $d_o$ in Eqs. (9)–(12). In the neural implementation, we preserve the same state-update/readout structure, but instantiate part of these coefficients in a selective discrete form so that the effective dynamics can adapt to the current feature.
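Before turning to the implementation, the operator product of Eq. (16) can be checked numerically: in the Laplace variable $s$, each cell contributes one first-order factor, and multiplying the factors out shows that the cascaded map has denominator degree equal to the number of cells. The helper below is our illustration (coefficient tuples and function name are not from the paper):

```python
import numpy as np

def cascade_transfer(cells):
    """Multiply out the first-order factors of Eq. (16). Each cell
    (C, a, b, c_o, d_o) contributes numerator [d_o*C, d_o*a + c_o*b]
    and denominator [C, a], as polynomial coefficients in s
    (highest degree first)."""
    num, den = np.array([1.0]), np.array([1.0])
    for C, a, b, c_o, d_o in cells:
        num = np.polymul(num, [d_o * C, d_o * a + c_o * b])
        den = np.polymul(den, [C, a])
    return num, den

# Three hypothetical identical cells: the end-to-end operator is third
# order, so both polynomials have four coefficients (degree 3).
cells = [(1.0, 0.5, 1.0, 1.0, 0.2)] * 3
num, den = cascade_transfer(cells)
```

Here the denominator expands to $(s + 0.5)^3$, confirming that the generic non-degenerate cascade of $n$ first-order cells is $n$-th order.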
Let the input feature be

$$X \in \mathbb{R}^{B \times L \times d}, \qquad (17)$$

where $B$ is the batch size, $L$ is the sequence length (or the flattened spatial length for 2D features), and $d$ is the channel dimension. The block output is

$$Y \in \mathbb{R}^{B \times L \times d}, \qquad (18)$$

with the same shape as the input, so that the block can be seamlessly inserted into deep hierarchical backbones. Given the normalized input $\tilde{X} = \mathrm{Norm}(X)$, the block first splits into an evolution branch and a gate branch:

$$u = \phi\bigl(\mathrm{DSConv}(W_u \tilde{X})\bigr), \qquad g = \phi(W_g \tilde{X}), \qquad (19)$$

where $W_u$ and $W_g$ are learnable linear projections, $\mathrm{DSConv}(\cdot)$ denotes depthwise separable convolution, and $\phi(\cdot)$ denotes the SiLU activation. The evolution branch provides the driving signal for state evolution, while the gate branch provides an input-dependent modulation path. The DSConv layer introduces lightweight local interaction before the dynamical update.

For the discrete neural realization of a single KNC, we use $v_t$ to denote the latent state at step $t$, in direct correspondence with the continuous-time state $v(t)$. The implemented state-update/readout rule is written as

$$v_{t+1} = \bar{\alpha}_t \odot v_t + \bar{\beta}_t \odot u_t, \qquad y_t = c_t \odot v_{t+1} + d\,u_t, \qquad (20)$$

where $y_t$ is the emitted output of the cell, $\bar{\alpha}_t$ is the effective retention coefficient, $\bar{\beta}_t$ is the effective input-injection coefficient, $c_t$ is the readout coefficient, and $d$ is a learned skip/readout parameter. This is the discrete implementation counterpart of Eqs. (9) and (11): $\bar{\alpha}_t$ corresponds to relaxation/retention, $\bar{\beta}_t$ corresponds to input injection, and $(c_t, d)$ correspond to state/input readout.
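As a minimal scalar sketch (variable names are ours), the discrete rule of Eq. (20), instantiated with the exact ZOH coefficients of Eq. (10), relaxes to the RC steady state $\beta u/\alpha$ under a constant input:

```python
import math

def knc_discrete_step(v_t, u_t, alpha_bar, beta_bar, c, d):
    """Discrete KNC rule of Eq. (20): state update (retention + input
    injection), then a readout mixing the new state and current input."""
    v_next = alpha_bar * v_t + beta_bar * u_t
    y_t = c * v_next + d * u_t
    return v_next, y_t

# Exact ZOH coefficients from Eq. (10):
# alpha_bar = exp(-alpha*dt), beta_bar = beta*(1 - exp(-alpha*dt))/alpha.
alpha, beta, dt = 2.0, 1.0, 0.1
alpha_bar = math.exp(-alpha * dt)
beta_bar = beta * (1.0 - alpha_bar) / alpha

# Iterating under a constant input converges to v* = beta*u/alpha = 1.5.
v, u = 0.0, 3.0
for _ in range(400):
    v, y = knc_discrete_step(v, u, alpha_bar, beta_bar, c=1.0, d=0.0)
```

Because the ZOH update is exact for piecewise-constant inputs, the iterate matches the analytic ODE solution at each step, not just in the limit.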
To enhance expressive capacity, part of these effective coefficients are made input-dependent through learnable projections:

$$\Delta_t = \mathrm{softplus}(W_\Delta u_t + b_\Delta), \qquad b_t = W_b u_t, \qquad c_t = W_c u_t, \qquad (21)$$

and the discretized coefficients are formed as

$$\bar{\alpha}_t = \exp(-\Delta_t \odot \lambda), \qquad \bar{\beta}_t = \frac{1 - \exp(-\Delta_t \odot \lambda)}{\lambda} \odot b_t, \qquad (22)$$

where $\lambda$ is a learned decay parameter. In this parameterization, $\lambda$ plays the role of the continuous-time decay rate, $\Delta_t$ acts as an input-dependent effective discretization timescale, and the combination $\exp(-\Delta_t \odot \lambda)$ is the selective implementation counterpart of the theoretical retention factor $e^{-\alpha \Delta t_t}$ in Eq. (10). Thus, the network implementation preserves the Kirchhoff-inspired relaxation–injection–readout interpretation while allowing the dynamics to adapt to the current feature.

The evolution branch is then processed by a cascaded Kirchhoff operator (CKO) composed of serial KNCs. Denoting the $k$-th KNC by $K_{\theta_k}(\cdot)$, the cascade is written as

$$y^{(1)} = K_{\theta_1}(u), \qquad y^{(k)} = K_{\theta_k}\bigl(y^{(k-1)}\bigr), \qquad k = 2, \ldots, N, \qquad (23)$$

where $N$ is the cascade depth. This realizes the discrete counterpart of the higher-order cascade derived in Eqs. (15)–(16): each KNC contributes a first-order state-evolution step, and serial composition progressively enriches the effective order of the end-to-end dynamics.

To preserve shallow, intermediate, and deep evolutionary responses, we aggregate the outputs of all KNC stages into the final cascaded representation:

$$\bar{y} = \sum_{k=1}^{N} y^{(k)}, \qquad (24)$$

where $y^{(k)}$ denotes the output of the $k$-th KNC in the cascade. Equivalently, this can be viewed as progressively adding all preceding stage responses into the final-stage pathway. Such dense aggregation preserves low-order and intermediate-order evolutionary cues while integrating the deepest response produced by the cascaded operator.
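The selective discretization of Eqs. (21)–(22) can be sketched as follows; for clarity this sketch assumes $\lambda > 0$ and omits the $b_t$ gain on the injection coefficient, and all names are ours:

```python
import numpy as np

def softplus(x):
    # Plain softplus; adequate for the moderate values used here.
    return np.log1p(np.exp(x))

def selective_coefficients(u_t, W_delta, b_delta, lam):
    """Input-dependent discretization (Eqs. 21-22): softplus keeps the
    effective timescale Delta_t positive, so the retention factor
    alpha_bar = exp(-Delta_t * lam) stays in (0, 1) for lam > 0."""
    delta_t = softplus(u_t @ W_delta + b_delta)        # Eq. (21), timescale branch
    alpha_bar = np.exp(-delta_t * lam)                 # Eq. (22), retention
    beta_bar = (1.0 - np.exp(-delta_t * lam)) / lam    # Eq. (22), injection (b_t gain omitted)
    return alpha_bar, beta_bar

rng = np.random.default_rng(0)
d = 8
lam = np.full(d, 0.5)
alpha_bar, beta_bar = selective_coefficients(
    rng.standard_normal(d), rng.standard_normal((d, d)), 0.1, lam)
```

By construction the coefficients satisfy $\bar{\alpha}_t + \lambda \odot \bar{\beta}_t = 1$ elementwise, which is the discrete trace of the conservation structure: whatever fraction of the past state decays is exactly the fraction available for input injection.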
The fused feature is then modulated by the gate branch through element-wise multiplication:

$$\hat{y} = \bar{y} \odot g, \qquad (25)$$

and the final block output is obtained by residual addition:

$$Y = X + \hat{y}. \qquad (26)$$

Therefore, the proposed CKB implements the theoretical KNC–CKB hierarchy in a practical neural form: a single KNC realizes Kirchhoff-inspired selective state evolution, while the full block combines local preprocessing, cascaded multi-order evolution, dense cross-stage aggregation, input-aware gating, and residual fusion within a unified architecture.

References

[1] Fukushima, K. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics 36, 193–202 (1980).
[2] Yamins, D. L. K. & DiCarlo, J. J. Using goal-driven deep learning models to understand sensory cortex. Nature Neuroscience 19, 356–365 (2016).
[3] Purves, D. et al. Long-distance signaling by means of action potentials, in Neuroscience, 2nd edn (Sinauer Associates, Sunderland, MA, 2001).
[4] Marzvanyan, A. & Alhawaj, A. F. Physiology, sensory receptors. StatPearls (2019).
[5] Averbeck, B. B., Latham, P. E. & Pouget, A. Neural correlations, population coding and computation. Nature Reviews Neuroscience 7, 358–366 (2006).
[6] Bassett, D. S. & Sporns, O. Network neuroscience. Nature Neuroscience 20, 353–364 (2017).
[7] Vyas, S., Golub, M. D., Sussillo, D. & Shenoy, K. V. Computation through neural population dynamics. Annual Review of Neuroscience 43, 249–275 (2020).
[8] Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, Vol. 25 (2012).
[9] Vaswani, A. et al. Attention is all you need. Advances in Neural Information Processing Systems, Vol. 30 (2017).
[10] Debanne, D., Bialowas, A. & Rama, S.
What are the mechanisms for analogue and digital signalling in the brain? Nature Reviews Neuroscience 14, 63–69 (2013).
[11] Dayan, P. & Abbott, L. F. Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems (MIT Press, Cambridge, MA, 2001).
[12] Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Computation 9, 1735–1780 (1997).
[13] Higham, N. J. Functions of Matrices: Theory and Computation (SIAM, Philadelphia, PA, 2008).
[14] Gu, A., Goel, K. & Ré, C. Efficiently modeling long sequences with structured state spaces. International Conference on Learning Representations (2022).
[15] Chua, L. O., Desoer, C. A. & Kuh, E. S. Linear and Nonlinear Circuits (McGraw-Hill, New York, 1987).
[16] Deng, J. et al. ImageNet: A large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition, 248–255 (IEEE, 2009).
[17] Morita, T. Positional encoding helps recurrent neural networks handle a large vocabulary. arXiv preprint arXiv:2402.00236 (2024).
[18] Zhao, L. et al. Length extrapolation of transformers: A survey from the perspective of positional encoding. Findings of the Association for Computational Linguistics: EMNLP 2024, 9959–9977 (2024).
[19] Nilsson, J. W. & Riedel, S. A. Electric Circuits, 12th edn (Pearson, Hoboken, NJ, 2022).
[20] Ogata, K. Modern Control Engineering (Prentice Hall, Upper Saddle River, NJ, 2010).
[21] Kailath, T. Linear Systems (Prentice-Hall, Englewood Cliffs, NJ, 1980).
[22] Stuart, A. M. Inverse problems: A Bayesian perspective. Acta Numerica 19, 451–559 (2010).
[23] Guibas, J. et al. Adaptive Fourier neural operators: Efficient token mixers for transformers. arXiv preprint arXiv:2111.13587 (2021).
[24] Chollet, F. Xception: Deep learning with depthwise separable convolutions.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017).
[25] Lusch, B., Kutz, J. N. & Brunton, S. L. Deep learning for universal linear embeddings of nonlinear dynamics. Nature Communications 9, 4950 (2018).
[26] Butcher, J. C. The Numerical Analysis of Ordinary Differential Equations: Runge–Kutta and General Linear Methods (Wiley, Chichester, 1987).
[27] Li, Z. et al. Fourier neural operator for parametric partial differential equations. International Conference on Learning Representations (2021).
[28] Raissi, M., Perdikaris, P. & Karniadakis, G. E. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics 378, 686–707 (2019).
[29] Sermanet, P. et al. OverFeat: Integrated recognition, localization and detection using convolutional networks. arXiv preprint arXiv:1312.6229 (2013).
[30] Liu, Z. et al. A ConvNet for the 2020s. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2022).
[31] Liu, Z. et al. Swin Transformer: Hierarchical vision transformer using shifted windows. Proceedings of the IEEE/CVF International Conference on Computer Vision (2021).
[32] Liu, Y. et al. VMamba: Visual state space model. Advances in Neural Information Processing Systems (2024).
[33] Hatamizadeh, A. & Kautz, J. MambaVision: A hybrid Mamba-Transformer vision backbone. Proceedings of the Computer Vision and Pattern Recognition Conference, 25261–25270 (2025).
