A Neural Representation That Learns Spatially Varying Spectra

Reading time: 6 minutes

📝 Abstract

Implicit Neural Representations (INRs) have emerged as a powerful paradigm for representing signals such as images, audio, and 3D scenes. However, existing INR frameworks, including MLPs with Fourier features, SIREN, and multiresolution hash grids, implicitly assume a global and stationary spectral basis. This assumption is fundamentally misaligned with real-world signals whose frequency characteristics vary significantly across space, exhibiting local high-frequency textures, smooth regions, and frequency drift phenomena. We propose Neural Spectral Transport Representation (NSTR), the first INR framework that explicitly models a spatially varying local frequency field. NSTR introduces a learnable frequency transport equation, a PDE that governs how local spectral compositions evolve across space. Given a learnable local spectrum field S(x) and a frequency transport network Fθ enforcing ∇S(x) ≈ Fθ(x, S(x)), NSTR reconstructs signals by spatially modulating a compact set of global sinusoidal bases. This formulation enables strong local adaptivity and offers a new level of interpretability via visualizing frequency flows. Experiments on 2D image regression, audio reconstruction, and implicit 3D geometry show that NSTR achieves significantly better accuracy–parameter trade-offs than SIREN, Fourier-feature MLPs, and Instant-NGP. NSTR requires fewer global frequencies, converges faster, and naturally explains signal structure through spectral transport fields. We believe NSTR opens a new direction in INR research by introducing explicit modeling of the space-varying spectrum.
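The reconstruction idea in the abstract, spatially modulating a small set of global sinusoidal bases by a local spectrum field S(x), can be sketched in a few lines. This is a minimal 1D illustration, not the paper's implementation: the hand-crafted `local_spectrum` stands in for the learnable field S(x), and the specific modulation form f(x) = Σₖ Sₖ(x)·sin(ωₖx + φₖ) is an assumption consistent with the abstract's description.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: K global sinusoidal bases over a 1D coordinate.
K = 8
omega = rng.uniform(1.0, 30.0, size=K)        # global frequencies (fixed, shared)
phase = rng.uniform(0.0, 2 * np.pi, size=K)   # global phases

def local_spectrum(x):
    """Stand-in for the learnable spectrum field S(x): per-position basis
    weights. Here a smooth hand-crafted modulation; in NSTR it is learned."""
    # Shape (N, K): each position gets its own mixture over the K bases.
    return np.stack([np.exp(-((x - k / K) ** 2) / 0.05) for k in range(K)],
                    axis=-1)

def nstr_reconstruct(x):
    """f(x) = sum_k S_k(x) * sin(omega_k * x + phase_k)."""
    S = local_spectrum(x)                        # (N, K) local weights
    basis = np.sin(np.outer(x, omega) + phase)   # (N, K) global sinusoids
    return (S * basis).sum(axis=-1)              # (N,) reconstructed signal

x = np.linspace(0.0, 1.0, 256)
f = nstr_reconstruct(x)
```

Because the K sinusoids are shared globally while S(x) varies per position, smooth regions can down-weight high frequencies and textured regions can amplify them, which is the local adaptivity the abstract claims.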


📄 Content

NSTR: NEURAL SPECTRAL TRANSPORT REPRESENTATION FOR SPACE-VARYING FREQUENCY FIELDS

Plein Versace, Essential.ai, Italy (plein@essential.ai.com)

1 Introduction

Implicit Neural Representations (INRs) encode signals as continuous functions parameterized by neural networks [1–18], offering memory-efficient and differentiable alternatives to discrete grids.
They have become foundational in neural rendering, geometry processing, audio synthesis, scientific simulation, and compression. Most existing INR formulations assume that a neural network, typically an MLP augmented with sinusoidal activations, Fourier features, or multiresolution hash encodings, directly maps a coordinate x to a signal value f(x). This "coordinate-to-value" paradigm has driven remarkable progress, yet it implicitly relies on a strong but rarely challenged assumption: the spectral basis used to represent the signal is global, stationary, and fixed throughout space.

In practice, however, natural signals exhibit rich and spatially varying spectral structure. Consider typical real-world data:

- Textures and images contain localized edges, periodic micro-textures, smoothly varying shading, and sharp discontinuities; each region has drastically different frequency content.
- 3D shapes and SDFs include nearly flat surfaces (low frequency), corners and creases (high frequency), and topology-dependent frequency modulation.
- Neural radiance fields (NeRFs) exhibit viewpoint-dependent frequency variation due to specular highlights, varying density gradients, and complex light–material interactions.
- Audio and other 1D signals exhibit local pitch drift, vibrato, transients, and harmonics that are not globally stationary.

arXiv:2511.18384v1 [cs.SD] 23 Nov 2025

These observations expose a fundamental limitation of existing INRs: even when equipped with sophisticated architectures, the model ultimately relies on a global coordinate system whose induced representation basis cannot adapt to the local spectral structure of the signal. For example, sinusoidal networks (SIREN) impose a frequency ω that is uniform across space; Fourier feature embeddings encode a fixed set of frequencies regardless of local complexity; and hash-grid encodings capture localized content but do not explicitly model how frequencies evolve spatially.
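The "fixed set of frequencies" criticism of Fourier feature embeddings is easy to make concrete. The sketch below is the standard random Fourier feature encoding from the INR literature, not code from this paper; note that the frequency matrix `B` is sampled once and then reused identically at every coordinate, which is exactly the global, stationary assumption NSTR sets out to remove.

```python
import numpy as np

def fourier_features(x, B):
    """Standard Fourier feature embedding:
    gamma(x) = [sin(2*pi*B x), cos(2*pi*B x)].
    B is fixed after initialization, so every position x is encoded with
    the same global frequency set regardless of local signal complexity."""
    proj = 2 * np.pi * x @ B.T                     # (N, M) projections
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)

rng = np.random.default_rng(0)
B = rng.normal(scale=10.0, size=(16, 2))   # 16 fixed frequencies for 2D coords
x = rng.uniform(size=(4, 2))               # 4 query coordinates
feats = fourier_features(x, B)             # (4, 32) embedding fed to an MLP
```

Choosing the scale of `B` trades off over-smoothing against aliasing globally; there is no mechanism to spend high frequencies only where the signal needs them.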
Consequently, networks are forced to compensate by increasing depth, width, or embedding resolution, leading to:

  1. unnecessary over-parameterization in smooth regions,
  2. underfitting or aliasing in high-frequency areas,
  3. slower optimization due to spectral mismatch,
  4. poor scalability when modeling signals with heterogeneous frequency distributions.

These challenges lead to a central research question: can an INR explicitly model the local spectrum of a signal and its spatial evolution, instead of relying on a fixed global basis?

To answer this, we introduce a new family of implicit representations, Neural Spectral Transport Representations (NSTR). The core insight is to reinterpret a signal not merely as a mapping x ↦ s(x), but as a spatially evolving spectral field. Specifically, we assume that each position x is associated with a local spectrum S(x), and that the spectrum evolves smoothly according to a neural partial differential equation (PDE):

∇S(x) = Fθ(x, S(x)).

Here Fθ acts as a learned spectral flow field, transporting local frequency bases across
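The transport equation ∇S(x) = Fθ(x, S(x)) suggests a natural training penalty: sample the spectrum field on a grid, differentiate it numerically, and penalize the mismatch with the network's predicted flow. The sketch below is a hypothetical 1D version of such a residual (the paper's actual loss may differ); `transport_residual` and the toy linear field are illustrations, not the authors' code.

```python
import numpy as np

def transport_residual(S, F_pred, dx):
    """Finite-difference residual of grad S(x) ~= F_theta(x, S(x)).

    S:      (N, K) local spectrum sampled on a 1D grid with spacing dx
    F_pred: (N, K) predicted spectral flow at the same grid points
    Returns the mean squared residual, usable as a PDE penalty term.
    """
    grad_S = np.gradient(S, dx, axis=0)   # numerical gradient of the field
    return np.mean((grad_S - F_pred) ** 2)

# Toy check: a linearly varying spectrum has a constant gradient, so a
# constant flow equal to that slope satisfies the transport equation.
N, K, dx = 64, 4, 0.1
slope = np.arange(1, K + 1, dtype=float)    # per-channel slope of S
x = np.arange(N)[:, None] * dx              # (N, 1) grid coordinates
S = x * slope                               # (N, K), grad S = slope everywhere
F_exact = np.broadcast_to(slope, (N, K))    # flow matching the true gradient
res = transport_residual(S, F_exact, dx)
```

In training, `F_pred` would come from the network Fθ evaluated at the grid points, and this residual would be added to the reconstruction loss to keep the learned spectrum field consistent with its learned flow.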

This content is AI-processed based on ArXiv data.
