Why Rectified Power Unit Networks Fail and How to Improve It: An Effective Field Theory Perspective
The Rectified Power Unit (RePU) activation function, a differentiable generalization of the Rectified Linear Unit (ReLU), has shown promise in constructing neural networks due to its smoothness properties. However, deep RePU networks often suffer from critical issues such as vanishing or exploding values during training, rendering them unstable regardless of how the initialization hyperparameters are chosen. Leveraging the perspective of effective field theory, we identify the root causes of these failures and propose the Modified Rectified Power Unit (MRePU) activation function. MRePU addresses RePU's limitations while preserving its advantages, such as differentiability and universal approximation properties. Theoretical analysis demonstrates that MRePU satisfies the criticality conditions necessary for stable training, placing it in a distinct universality class. Extensive experiments validate the effectiveness of MRePU, showing significant improvements in training stability and performance across various tasks, including polynomial regression, physics-informed neural networks (PINNs), and real-world vision tasks. Our findings highlight the potential of MRePU as a robust alternative for building deep neural networks.
💡 Research Summary
The paper investigates why deep neural networks that employ the Rectified Power Unit (RePU) activation function often become unstable, exhibiting vanishing or exploding activations even under standard initialization schemes. By interpreting neural networks through the lens of effective field theory (EFT), the authors treat the distribution of pre‑activations as a statistical field. In the infinite‑width limit the network behaves like a free field, fully characterized by its mean and two‑point correlator. Introducing a non‑linearity such as RePU generates higher‑order connected correlators (three‑point, four‑point, etc.), which correspond to interactions in the field‑theoretic picture.
For RePU, defined as σ(z) = max(0, z)^p with integer p ≥ 2, the recursion for the variance of pre‑activations becomes q_{l+1} = C_W · E_{z∼N(0,q_l)}[σ(z)²] = C_W · ½(2p−1)!! · q_l^p (with zero bias variance). Because the map q ↦ c·q^p with p ≥ 2 has only a repelling non‑trivial fixed point, the variance either collapses to zero or blows up with depth, and no choice of the weight‑variance hyperparameter C_W alone can hold the network at criticality. This is the field‑theoretic root cause of RePU's instability, and it is what MRePU is designed to repair so that the criticality conditions can be satisfied.
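The instability of this recursion is easy to see by iterating it directly. The following sketch (our code, under the zero‑bias assumption above) implements the variance map for p = 2, where q_{l+1} = C_W · (3/2) · q_l²; the fixed point q* = 2/(3·C_W) is repelling, so starting slightly above or below it sends the variance to infinity or zero within a few layers.

```python
import numpy as np

def repu_variance_map(q, p=2, C_W=1.0):
    """One layer of the RePU variance recursion:
    q -> C_W * (1/2) * (2p-1)!! * q**p  (zero bias variance)."""
    double_fact = int(np.prod(np.arange(2 * p - 1, 0, -2)))  # (2p-1)!!
    return C_W * 0.5 * double_fact * q ** p

def iterate(q0, depth=14, **kw):
    """Propagate the pre-activation variance through `depth` layers."""
    q = q0
    for _ in range(depth):
        q = repu_variance_map(q, **kw)
    return q

# Fixed point of q -> 1.5*q^2 (p = 2, C_W = 1) is q* = 2/3, but it repels:
q_star = 2.0 / 3.0
print(iterate(q_star * 1.01))   # variance explodes with depth
print(iterate(q_star * 0.99))   # variance collapses toward zero
```

Because the derivative of the map at the fixed point is p > 1 for any C_W, no tuning of the initialization can stabilize the recursion; this is the precise sense in which RePU networks fail at criticality.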