Scalable Back-Propagation-Free Training of Optical Physics-Informed Neural Networks
Physics intelligence and digital twins often require rapid and repeated performance evaluation of various engineering systems (e.g., robots, autonomous vehicles, semiconductor chips) to enable (almost) real-time actions or decision making. This has motivated the development of accelerated partial differential equation (PDE) solvers, especially for resource-constrained scenarios where the solvers must be deployed at the edge. Physics-informed neural networks (PINNs) have shown promise in solving high-dimensional PDEs, but their training time on state-of-the-art digital hardware (e.g., GPUs) is still orders of magnitude longer than the latency required for real-time decision making. Photonic computing offers a potential solution to this huge latency gap because of its ultra-high operation speed. However, the lack of photonic memory and the large device sizes prevent training real-size PINNs on photonic chips. This paper proposes a completely back-propagation-free (BP-free) and highly scalable framework for training real-size PINNs on silicon photonic platforms. Our approach involves three key innovations: (1) a sparse-grid Stein derivative estimator to avoid BP in the loss evaluation of a PINN, (2) a dimension-reduced zeroth-order optimization via tensor-train decomposition to achieve better scalability and convergence in BP-free training, and (3) a scalable on-chip photonic PINN training accelerator built with photonic tensor cores. We validate our numerical methods on both low- and high-dimensional PDE benchmarks. Through pre-silicon simulation based on real device parameters, we further demonstrate the significant performance benefits (e.g., real-time training, huge chip-area reduction) of our photonic accelerator.
💡 Research Summary
The paper addresses the pressing need for ultra‑low‑latency partial differential equation (PDE) solvers required by physics‑based intelligence and digital twins. While physics‑informed neural networks (PINNs) have shown promise for high‑dimensional PDEs, their training on conventional GPUs still takes hours, far exceeding real‑time constraints on edge devices. Photonic computing offers orders‑of‑magnitude higher clock speeds, but existing photonic AI accelerators suffer from two fundamental limitations: (i) large footprint of Mach‑Zehnder interferometer (MZI) MAC units, which makes scaling to the >10⁵ parameters of a realistic PINN infeasible, and (ii) the inability to store intermediate activations, rendering back‑propagation (BP) impractical on‑chip.
To overcome these challenges, the authors propose a completely back‑propagation‑free (BP‑free) training framework that operates entirely on a silicon photonic platform. The framework consists of three tightly coupled innovations.
- Sparse‑grid Stein derivative estimator – The PINN loss contains high‑order spatial derivatives (∇u, Δu). The authors first rewrite the network output as a Gaussian‑smoothed expectation, enabling the use of Stein’s identity to express derivatives as expectations over perturbed inputs. Rather than evaluating these expectations via naïve Monte‑Carlo (requiring >10³ forward passes), they adopt sparse‑grid quadrature (Smolyak construction) to dramatically reduce the number of required samples while preserving integration accuracy. This yields a BP‑free, memory‑less method to compute all derivative terms needed for the PDE residual loss.
- Dimension‑reduced zeroth‑order optimization (ZO‑SGD) via tensor‑train (TT) decomposition – Standard ZO gradient estimators suffer from variance that scales with the parameter dimension d, making end‑to‑end training of large networks impractical. The authors compress the full weight tensor into a low‑rank TT format, thereby reducing the effective optimization dimension to the size of the core tensors. Gradient estimates are then performed in this reduced space using random perturbations, and a variance‑reduction scheme (control variates and adaptive sampling) further stabilizes the updates. The resulting ZO‑SGD converges much faster than naïve ZO methods and does not require any storage of intermediate activations.
- Scalable photonic accelerator design – Building on existing photonic matrix‑multiplication engines, the authors design a “photonic tensor core” that implements the TT‑compressed MAC operations using arrays of MZIs. Two architectural options are presented: (a) a full‑model‑on‑chip layout where all TT cores are instantiated simultaneously, and (b) a time‑multiplexed design that reuses a single core to process all TT factors sequentially, dramatically reducing the number of required MZIs. A lightweight digital controller orchestrates the forward evaluations for the Stein estimator, the perturbation generation for ZO‑SGD, and the weight updates. The hardware analysis shows a 42.7× reduction in MZI count compared with a naïve dense implementation, while maintaining high optical throughput.
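To make the first innovation concrete, here is a minimal NumPy sketch of BP‑free derivative estimation via Stein’s identity. It uses plain Monte‑Carlo draws with antithetic (±) evaluations for variance reduction, whereas the paper replaces the random draws with Smolyak sparse‑grid quadrature nodes and weights; the function name and parameters are illustrative, not the authors’ code.

```python
import numpy as np

def stein_derivatives(f, x, sigma=0.1, n_samples=4000, seed=0):
    """BP-free estimates of the gradient and Laplacian of the
    Gaussian-smoothed surrogate f_sigma(x) = E_z[f(x + sigma*z)],
    via Stein's identity in its antithetic (central) form:
      grad  f_sigma(x) = E[ z (f(x+s z) - f(x-s z)) ] / (2 s)
      Delta f_sigma(x) = E[ (||z||^2 - d)(f(x+s z) + f(x-s z) - 2 f(x)) ] / (2 s^2)
    Only forward evaluations of f are needed -- no back-propagation,
    no stored activations."""
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    z = rng.standard_normal((n_samples, d))            # perturbation directions
    fp = np.array([f(x + sigma * zi) for zi in z])     # forward passes, +perturbation
    fm = np.array([f(x - sigma * zi) for zi in z])     # forward passes, -perturbation
    f0 = f(x)
    grad = (z * (fp - fm)[:, None]).mean(axis=0) / (2 * sigma)
    lap = (((z ** 2).sum(axis=1) - d) * (fp + fm - 2 * f0)).mean() / (2 * sigma ** 2)
    return grad, lap
```

For a quadratic test function f(x) = ‖x‖², the smoothed gradient is exactly 2x and the smoothed Laplacian is exactly 2d, so the estimator can be sanity-checked against closed-form values.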
Experimental validation spans low‑dimensional wave equations up to a 10‑dimensional Black‑Scholes option‑pricing PDE. The BP‑free training achieves a final loss comparable to, or only slightly higher than, conventional BP‑based PINN training, with a 2–3× speedup over the GPU baseline. In pre‑silicon simulations using realistic photonic device parameters, the accelerator solves the Black‑Scholes problem in 1.64 seconds, demonstrating real‑time capability. Energy‑per‑operation and chip‑area estimates indicate orders‑of‑magnitude improvements over electronic accelerators for the same task.
Contributions and impact – The work delivers (i) a mathematically rigorous, memory‑free method for evaluating high‑order PDE residuals on photonic hardware, (ii) a scalable ZO optimization scheme that leverages low‑rank tensor structure to tame the curse of dimensionality, and (iii) a concrete photonic hardware blueprint that can train realistic PINNs with hundreds of neurons per layer. By eliminating the need for back‑propagation and dramatically reducing photonic component count, the paper paves the way for on‑chip, real‑time physics‑aware learning systems suitable for edge deployment in autonomous vehicles, robotics, and semiconductor thermal management. Future work will involve silicon‑fabricated prototypes and extending the approach to adaptive mesh‑free PDE formulations, further bridging the gap between physical intelligence and ultra‑low‑latency hardware.
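The dimension‑reduced ZO idea can likewise be sketched in a few lines: only the entries of the small TT cores are trained, using a two‑point zeroth‑order gradient estimate, so the optimization dimension is the total core size rather than the full weight count and no activations are ever stored. This is a toy sketch under simplifying assumptions (a generic scalar loss over the cores; the paper’s control variates and adaptive sampling are omitted); `zo_sgd_tt` is an illustrative name, not the authors’ implementation.

```python
import numpy as np

def zo_sgd_tt(loss, cores, lr=0.02, mu=1e-3, steps=500, seed=0):
    """Two-point zeroth-order SGD over the (small) TT-core parameters.
    `loss` takes the list of cores and returns a scalar; only forward
    evaluations are used, so no back-propagation is required."""
    rng = np.random.default_rng(seed)
    shapes = [c.shape for c in cores]
    theta = np.concatenate([c.ravel() for c in cores])  # reduced dimension

    def unpack(v):
        """Split the flat parameter vector back into TT cores."""
        out, i = [], 0
        for s in shapes:
            n = int(np.prod(s))
            out.append(v[i:i + n].reshape(s))
            i += n
        return out

    for _ in range(steps):
        eps = rng.standard_normal(theta.size)           # random perturbation
        lp = loss(unpack(theta + mu * eps))             # forward pass, +mu*eps
        lm = loss(unpack(theta - mu * eps))             # forward pass, -mu*eps
        theta = theta - lr * (lp - lm) / (2 * mu) * eps # ZO gradient step
    return unpack(theta)
```

On a simple convex loss over two hypothetical cores (e.g., driving one core toward all‑ones and the other toward minus‑twos), the estimator behaves like randomized directional descent and converges steadily despite never computing an exact gradient.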