The Enhanced Physics-Informed Kolmogorov-Arnold Networks: Applications of Newton's Laws in Financial Deep Reinforcement Learning (RL) Algorithms
Deep Reinforcement Learning (DRL), a subset of machine learning focused on sequential decision-making, has emerged as a powerful approach for tackling financial trading problems. In finance, DRL is commonly used either to generate discrete trade signals or to determine continuous portfolio allocations. In this work, we propose a novel reinforcement learning framework for portfolio optimization that incorporates Physics-Informed Kolmogorov-Arnold Networks (PIKANs) into several DRL algorithms. The approach replaces conventional multilayer perceptrons with Kolmogorov-Arnold Networks (KANs) in both the actor and critic components, using learnable univariate B-spline functions to achieve parameter-efficient and more interpretable function approximation. During actor updates, we introduce a physics-informed regularization loss that promotes second-order temporal consistency between observed return dynamics and action-induced portfolio adjustments. The proposed framework is evaluated across three equity markets (China, Vietnam, and the United States), covering both emerging and developed economies. Across all three markets, PIKAN-based agents consistently deliver higher cumulative and annualized returns, superior Sharpe and Calmar ratios, and more favorable drawdown characteristics than both standard DRL baselines and classical online portfolio-selection methods, along with more stable training. The approach is particularly valuable in highly dynamic and noisy financial markets, where conventional DRL often suffers from instability and poor generalization.
💡 Research Summary
The paper introduces a novel reinforcement‑learning framework for portfolio optimization that integrates Physics‑Informed Kolmogorov‑Arnold Networks (PIKANs) into several state‑of‑the‑art DRL algorithms. Traditional DRL approaches for finance typically rely on multilayer perceptrons (MLPs) as function approximators, which are parameter‑heavy and difficult to interpret. By replacing MLPs with Kolmogorov‑Arnold Networks (KANs), the authors exploit a decomposition of multivariate functions into sums of learnable univariate B‑spline functions. This yields a dramatically more parameter‑efficient architecture (often >70 % fewer parameters) while providing direct interpretability of each spline component.
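The KAN idea above can be illustrated with a minimal sketch: each edge of a layer carries its own learnable univariate function, parameterized here by coefficients on a degree-1 B-spline (hat-function) basis over a fixed grid. This is a simplified stand-in for the architecture in the paper (which uses higher-order B-splines and residual terms); the class name, grid setup, and initialization are illustrative assumptions.

```python
import numpy as np

class KANLayer:
    """One Kolmogorov-Arnold layer: output y_j = sum_i phi_ij(x_i), where each
    edge function phi_ij(t) = sum_k coef[i,j,k] * B_k(t) is a learnable
    combination of degree-1 B-spline (hat) basis functions on a fixed grid.
    Simplified sketch; the paper's layers may include residual/base terms."""

    def __init__(self, in_dim, out_dim, n_knots=8, x_min=-1.0, x_max=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.grid = np.linspace(x_min, x_max, n_knots)   # knot positions
        # One coefficient vector per edge: (in_dim, out_dim, n_knots)
        self.coef = rng.normal(scale=0.1, size=(in_dim, out_dim, n_knots))

    def _basis(self, x):
        """Hat-function basis values. x: (batch, in_dim) -> (batch, in_dim, n_knots).
        Hat k peaks (value 1) at grid[k] and falls to 0 at adjacent knots."""
        h = self.grid[1] - self.grid[0]                  # knot spacing
        d = np.abs(x[..., None] - self.grid)             # distance to each knot
        return np.clip(1.0 - d / h, 0.0, None)

    def forward(self, x):
        """Contract the basis with per-edge coefficients, then sum over inputs."""
        B = self._basis(x)                               # (batch, in, knots)
        return np.einsum('bik,ijk->bj', B, self.coef)
```

Because every output is an explicit sum of univariate splines, each learned `phi_ij` can be plotted directly, which is the source of the interpretability claim, and the parameter count scales with `in_dim * out_dim * n_knots` rather than with wide hidden layers.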
Beyond architectural changes, the authors embed a physics‑informed regularization term into the loss. Inspired by Newton’s second law, they treat asset returns as “velocity” and the portfolio weight adjustments as “acceleration.” The regularizer penalizes the discrepancy between observed first‑ and second‑order temporal differences of returns (i.e., empirical velocity and acceleration) and the corresponding changes implied by the agent’s actions. This second‑order temporal consistency loss is combined with the standard policy‑gradient or actor‑critic objective using an adaptive weighting schedule: a high physics‑loss weight early in training stabilizes the policy by suppressing abrupt rebalancing, while the weight is gradually reduced to allow the agent to focus on reward maximization.
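One plausible reading of this regularizer, in code: treat per-asset returns as "velocity," their first differences as "acceleration," and penalize weight adjustments (the agent's "force") that disagree with the empirical acceleration, with a weight that decays over training. The exact functional form and schedule in the paper may differ; both functions below are hedged sketches with assumed default values.

```python
import numpy as np

def physics_loss(weights, returns):
    """Newton-inspired second-order consistency penalty (one plausible form).
    weights: (T, m) portfolio weights chosen at each step
    returns: (T, m) per-asset returns observed at each step ("velocity")."""
    accel = np.diff(returns, axis=0)   # empirical acceleration: r_t - r_{t-1}
    dw = np.diff(weights, axis=0)      # action-induced adjustment: w_t - w_{t-1}
    return float(np.mean((dw - accel) ** 2))

def physics_weight(step, total_steps, lam0=1.0, lam_min=0.05):
    """Adaptive schedule: high weight early in training (suppresses abrupt
    rebalancing), decayed linearly so the agent can later focus on reward.
    lam0 and lam_min are assumed values, not taken from the paper."""
    frac = min(step / total_steps, 1.0)
    return lam0 + (lam_min - lam0) * frac
```

The total actor loss would then be something like `policy_loss + physics_weight(step, T) * physics_loss(w, r)`, added on top of the unchanged actor-critic objective.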
The framework is instantiated on four popular actor‑critic algorithms—A2C, DDPG, PPO, and TD3—by replacing their neural networks with PIKANs and adding the physics loss. The state representation consists of a 5‑day look‑back window of OHLCV data and twelve technical indicators (ADX, ATR, Bollinger Bands, MACD, Momentum, OBV, RSI, Realized Volatility, Williams %R, etc.). The action is a continuous weight vector over m risky assets, constrained to be non‑negative and sum to one. Rewards are logarithmic returns adjusted for transaction costs.
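The action constraint and reward described above can be sketched as follows. A softmax projection is a common way to map an unconstrained actor output onto the long-only simplex, and proportional transaction costs are one standard adjustment; the paper may use a different projection or cost model, and `cost_rate` is an assumed value.

```python
import numpy as np

def to_portfolio_weights(raw_action):
    """Map an unconstrained actor output to a long-only weight vector
    (non-negative, sums to one) via softmax -- a common choice, assumed here."""
    z = raw_action - np.max(raw_action)        # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def step_reward(w_prev, w_new, asset_returns, cost_rate=0.001):
    """Log portfolio return net of proportional transaction costs.
    asset_returns: per-asset simple returns over the step."""
    gross = 1.0 + np.dot(w_new, asset_returns)  # gross simple portfolio return
    turnover = np.abs(w_new - w_prev).sum()     # fraction of portfolio rebalanced
    net = gross * (1.0 - cost_rate * turnover)
    return float(np.log(net))
```

With flat markets and no rebalancing the reward is exactly zero, and any turnover is penalized in proportion to the assumed cost rate.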
Empirical evaluation is performed on three equity markets: the United States (developed), China (emerging), and Vietnam (emerging). Across all markets, PIKAN‑enhanced agents consistently outperform baseline DRL agents that use standard MLPs as well as classical online portfolio selection methods (e.g., OLMAR, Anticor). Performance metrics include cumulative and annualized returns, Sharpe ratio, Calmar ratio, and maximum drawdown. For example, in the US market the PIKAN‑PPO agent achieved an average annual return of ~15 % with a Sharpe ratio of 2.1 and a Calmar ratio of 0.5, surpassing the MLP‑PPO baseline (Sharpe ≈1.4) and traditional methods (Sharpe ≈0.9). Similar gains are observed in China and Vietnam. Training curves reveal that the physics‑informed loss dramatically reduces loss oscillations, accelerates convergence, and mitigates over‑fitting, leading to more stable policies.
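The evaluation metrics cited above have standard definitions, sketched below (annualization with 252 trading days and a zero risk-free rate are common conventions, assumed here rather than taken from the paper):

```python
import numpy as np

def max_drawdown(equity):
    """Largest peak-to-trough decline of an equity curve, as a fraction."""
    peaks = np.maximum.accumulate(equity)      # running high-water mark
    return float(np.max(1.0 - equity / peaks))

def sharpe_ratio(daily_returns, periods=252):
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    r = np.asarray(daily_returns)
    return float(np.mean(r) / np.std(r) * np.sqrt(periods))

def calmar_ratio(daily_returns, periods=252):
    """Annualized return divided by maximum drawdown."""
    equity = np.cumprod(1.0 + np.asarray(daily_returns))
    ann = equity[-1] ** (periods / len(daily_returns)) - 1.0
    return float(ann / max_drawdown(equity))
```

Sharpe normalizes excess return by volatility, while Calmar normalizes by the worst drawdown, so the two reward different notions of stability; reporting both, as the paper does, guards against strategies that look good on one axis only.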
Ablation studies isolate the contributions of (i) the KAN architecture and (ii) the physics regularizer. Using KAN alone improves parameter efficiency but does not fully resolve instability; using the physics loss alone yields smoother policies but at the cost of lower returns due to excessive conservatism. The combination of both yields the best trade‑off, confirming that the two innovations are complementary.
The paper’s main contributions are: (1) introducing KANs as a compact, interpretable alternative to MLPs in DRL for finance; (2) formulating a Newton‑law‑inspired physics regularizer that enforces second‑order temporal consistency between market dynamics and portfolio actions; (3) providing a generalizable PIKAN‑DRL framework applicable to multiple actor‑critic algorithms; and (4) delivering extensive empirical evidence of superior performance and stability across diverse market conditions. The authors suggest future work on extending the physics constraints to multiple physical laws (e.g., conservation principles), integrating stochastic differential equation models of asset dynamics, and exploring uncertainty quantification within the PIKAN paradigm.