Fast Training of Sinusoidal Neural Fields via Scaling Initialization

Notice: This research summary and analysis were automatically generated using AI. For full accuracy, please refer to the original arXiv source.

Neural fields are an emerging paradigm that represents data as continuous functions parameterized by neural networks. Despite many advantages, neural fields often have a high training cost, which prevents broader adoption. In this paper, we focus on a popular family of neural fields, called sinusoidal neural fields (SNFs), and study how they should be initialized to maximize the training speed. We find that the standard initialization scheme for SNFs – designed based on the signal propagation principle – is suboptimal. In particular, we show that by simply multiplying each weight (except for the last layer) by a constant, we can accelerate SNF training by 10$\times$. This method, coined $\textit{weight scaling}$, consistently provides a significant speedup over various data domains, allowing SNFs to train faster than more recently proposed architectures. To understand why weight scaling works well, we conduct extensive theoretical and empirical analyses, which reveal that weight scaling not only resolves the spectral bias quite effectively but also enjoys a well-conditioned optimization trajectory. The code is available $\href{https://github.com/effl-lab/Fast-Neural-Fields}{here}$.


💡 Research Summary

This paper investigates how the initialization of sinusoidal neural fields (SNFs)—multilayer perceptrons that use sine activations—affects training efficiency. While the standard SNF initialization, introduced by Sitzmann et al. (2020), is derived from a signal‑propagation principle that keeps the distribution of activations roughly arcsine(−1, 1) across layers, the authors demonstrate that this scheme is far from optimal for speed. They propose a remarkably simple modification: multiply every weight matrix (except the final layer) by a constant factor α ≥ 1, a procedure they call “weight scaling.”
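The weight-scaling procedure can be sketched on top of the standard SIREN initialization. This is a minimal illustration, not the paper's reference implementation; the helper names (`siren_init`, `weight_scaled_init`) and the example value α = 2 are assumptions made for the sketch:

```python
import numpy as np

def siren_init(fan_in, fan_out, omega_0=30.0, first_layer=False, rng=None):
    """Standard SIREN initialization (Sitzmann et al., 2020)."""
    rng = np.random.default_rng() if rng is None else rng
    # First layer: U(-1/fan_in, 1/fan_in); hidden layers: U(-sqrt(6/fan_in)/omega_0, ...)
    bound = 1.0 / fan_in if first_layer else np.sqrt(6.0 / fan_in) / omega_0
    return rng.uniform(-bound, bound, size=(fan_out, fan_in))

def weight_scaled_init(layer_shapes, alpha=2.0, rng=None):
    """Weight scaling: multiply every weight matrix EXCEPT the last by alpha."""
    rng = np.random.default_rng(0) if rng is None else rng
    weights = []
    for i, (fan_in, fan_out) in enumerate(layer_shapes):
        W = siren_init(fan_in, fan_out, first_layer=(i == 0), rng=rng)
        if i < len(layer_shapes) - 1:  # the final (output) layer is left unscaled
            W = alpha * W
        weights.append(W)
    return weights

# A small SNF mapping 2-D coordinates to 1 output channel:
shapes = [(2, 64), (64, 64), (64, 1)]
weights = weight_scaled_init(shapes, alpha=2.0)
```

The only change relative to the standard scheme is the single multiplication by α before training starts; the architecture and optimizer are untouched.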

Empirically, weight scaling yields up to a ten‑fold reduction in the number of training steps required to reach a target PSNR on a variety of tasks, including high‑resolution image regression, 3D scene reconstruction, and physics‑informed simulations. The speed‑up is consistent across several recent SNF‑based architectures (e.g., SIREN, MFN, Gaussian‑Net), and the method even outperforms newer designs that were specifically engineered for fast training. Importantly, when α is chosen within a moderate range (typically 1.5–2.5 depending on data resolution and model size), the accelerated training does not sacrifice test‑time performance; the interpolation quality remains comparable to that of the baseline.

The authors provide a thorough theoretical analysis to explain these observations. First, they prove that scaling the weights by α preserves the arcsine activation distribution for all hidden layers, so the forward signal remains well‑behaved even for deep networks. Second, they show that weight scaling simultaneously increases the effective frequency of each sinusoidal basis and amplifies higher‑order harmonics, which makes the network capable of fitting high‑frequency components early in training. Third, an eigenspectrum study reveals that scaled networks have a more favorable conditioning: singular values of the Jacobian are larger and more evenly spread, leading to smoother optimization trajectories. This contrasts with the “lazy training” regime observed in ReLU networks, where large weight scaling pushes the model into a kernel‑like regime that converges quickly but generalizes poorly. In sinusoidal networks, the non‑linearity remains strong enough that scaling accelerates convergence without entering a degenerate lazy regime.
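The frequency argument can be illustrated with a toy numpy experiment (not from the paper's code). Scaling the weight feeding a sine unit by α multiplies that unit's frequency by α, and in a composed unit sin(α·sin(·)) a larger α shifts spectral energy into higher odd harmonics, consistent with the Jacobi–Anger expansion; the specific values w = 8 and α = 4 are arbitrary choices for the demo:

```python
import numpy as np

n = 4096
x = np.linspace(0.0, 1.0, n, endpoint=False)
w = 8  # base frequency, in cycles over [0, 1)

def spectrum(y):
    """Magnitude spectrum; bin k corresponds to k cycles over [0, 1)."""
    return np.abs(np.fft.rfft(y))

# (1) Scaling the weight by alpha multiplies the unit's frequency by alpha.
alpha = 4.0
f_base = int(np.argmax(spectrum(np.sin(2 * np.pi * w * x))[1:]) + 1)            # bin 8
f_scaled = int(np.argmax(spectrum(np.sin(2 * np.pi * alpha * w * x))[1:]) + 1)  # bin 32

# (2) In a composed unit sin(a * sin(2*pi*w*x)), larger a amplifies
# higher-order odd harmonics of w (here: bin 24 = third harmonic).
h_small = spectrum(np.sin(0.5 * np.sin(2 * np.pi * w * x)))  # fundamental dominates
h_large = spectrum(np.sin(4.0 * np.sin(2 * np.pi * w * x)))  # third harmonic dominates
```

Both effects push representable frequencies upward at initialization, which is the mechanism the paper credits for mitigating spectral bias early in training.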

To select α without exhaustive per‑dataset tuning, the authors formulate an optimization problem that maximizes the relative speed gain subject to a small allowable increase in test loss. Empirical results suggest that the optimal α correlates with the physical scale of the problem (e.g., image resolution, number of parameters) and can be approximated by a simple logarithmic rule.
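The selection criterion above can be sketched as a small constrained search over candidate α values. Everything below is illustrative: `select_alpha` is a hypothetical helper, and the step counts and losses are made-up measurements, not results from the paper:

```python
def select_alpha(candidates, steps_to_target, test_loss, baseline_loss, budget=0.05):
    """Pick the fastest alpha whose test loss stays within a relative
    `budget` of the baseline (alpha = 1) loss."""
    feasible = [a for a in candidates
                if test_loss[a] <= baseline_loss * (1.0 + budget)]
    return min(feasible, key=lambda a: steps_to_target[a])

# Made-up measurements for illustration: larger alpha reaches the target
# PSNR in fewer steps, but a too-large alpha degrades test loss.
candidates = [1.0, 1.5, 2.0, 2.5, 3.0]
steps = {1.0: 1000, 1.5: 420, 2.0: 180, 2.5: 120, 3.0: 90}
loss = {1.0: 0.010, 1.5: 0.010, 2.0: 0.0102, 2.5: 0.0104, 3.0: 0.013}
best = select_alpha(candidates, steps, loss, baseline_loss=loss[1.0])
```

In this toy data, α = 3.0 is fastest but exceeds the loss budget, so the search settles on α = 2.5, mirroring the paper's observation that moderate scaling buys speed without sacrificing test-time quality.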

Overall, the contribution is twofold: (1) a practically trivial yet highly effective initialization technique that dramatically reduces training time for SNFs, and (2) a set of analytical insights that clarify why sinusoidal activations respond uniquely to weight scaling, highlighting the importance of initialization beyond traditional signal‑preservation criteria. The work invites further exploration of scaling‑based initializations for other activation families and suggests that revisiting initialization design can be as impactful as architectural innovations in the neural field community.

