SONIC: Spectral Oriented Neural Invariant Convolutions
Convolutional Neural Networks (CNNs) rely on fixed-size kernels scanning local patches, which limits their ability to capture global context or long-range dependencies without very deep architectures. Vision Transformers (ViTs), in turn, provide global connectivity but lack spatial inductive bias, depend on explicit positional encodings, and remain tied to the initial patch size. Bridging these limitations requires a representation that is both structured and global. We introduce SONIC (Spectral Oriented Neural Invariant Convolutions), a continuous spectral parameterisation that models convolutional operators using a small set of shared, orientation-selective components. These components define smooth responses across the full frequency domain, yielding global receptive fields and filters that adapt naturally across resolutions. Across synthetic benchmarks, large-scale image classification, and 3D medical datasets, SONIC shows improved robustness to geometric transformations, noise, and resolution shifts, and matches or exceeds convolutional, attention-based, and prior spectral architectures with an order of magnitude fewer parameters. These results demonstrate that continuous, orientation-aware spectral parameterisations provide a principled and scalable alternative to conventional spatial and spectral operators.
💡 Research Summary
The paper introduces SONIC (Spectral Oriented Neural Invariant Convolutions), a novel way to parameterize convolutional operators directly in the continuous Fourier domain. Traditional CNNs rely on small, fixed‑size kernels that only capture local patterns; to obtain global context they must stack many layers, which is inefficient. Vision Transformers (ViTs) achieve global receptive fields through self‑attention, but their quadratic cost in the number of patches, dependence on explicit positional encodings, and sensitivity to the chosen patch size limit scalability and robustness.
SONIC addresses these issues by representing the frequency response of a convolution as a superposition of a small set of orientation-aware spectral modes. Each mode is defined analytically by a few interpretable parameters: a unit direction vector \(v_m\) (the “needle” in frequency space), a scale \(s_m\) controlling selectivity along that direction, a complex damping term \(a_m\) (real part for decay, imaginary part for oscillation), and a transverse penalty \(\tau_m\) that suppresses energy orthogonal to the needle. The mode’s transfer function follows a resolvent form reminiscent of linear time-invariant (LTI) systems.
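The exact transfer function is not reproduced in this summary, so the sketch below only illustrates how the four parameters described here could interact. It assumes a hypothetical resolvent-style form, H(ω) = 1 / (i·s·(v·ω) − a + τ·|ω⊥|²), which is a guess consistent with the description and not the paper's formula. The names `mode_response` and `apply_modes`, the (v_y, v_x) ordering, and the choice to measure frequency in radians per unit image length are likewise assumptions made for illustration.

```python
import numpy as np

def mode_response(shape, v, s, a, tau):
    """Hypothetical transfer function of one oriented spectral mode.

    Assumed form (NOT taken from the paper):
        H(w) = 1 / (1j * s * (v . w) - a + tau * |w_perp|^2)
    With Re(a) < 0 and tau >= 0 the real part of the denominator stays positive.
    """
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)                      # enforce a unit direction (v_y, v_x)
    h, w = shape
    # Frequencies in radians per unit image length (image assumed to span a unit square).
    wy = 2 * np.pi * np.fft.fftfreq(h, d=1.0 / h)
    wx = 2 * np.pi * np.fft.fftfreq(w, d=1.0 / w)
    WY, WX = np.meshgrid(wy, wx, indexing="ij")
    along = v[0] * WY + v[1] * WX                  # frequency component along the needle
    perp_sq = np.maximum(WY**2 + WX**2 - along**2, 0.0)  # squared orthogonal component
    return 1.0 / (1j * s * along - a + tau * perp_sq)

def apply_modes(x, modes):
    """Filter a 2D array with a sum of modes via pointwise multiplication in Fourier space."""
    H = sum(mode_response(x.shape, **m) for m in modes)
    return np.fft.ifft2(np.fft.fft2(x) * H)        # complex-valued in general

# Usage: two modes with different needle orientations applied to a random image.
modes = [
    dict(v=(0.0, 1.0), s=0.1, a=-1.0 + 2.0j, tau=0.5),
    dict(v=(1.0, 1.0), s=0.2, a=-0.5 + 0.0j, tau=1.0),
]
image = np.random.default_rng(0).standard_normal((32, 32))
filtered = apply_modes(image, modes)
```

Because the response is evaluated on whatever frequency grid the input defines, the same handful of parameters yields a filter covering the full spectrum at any resolution, which is the property the summary attributes to the continuous parameterisation.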