EUGens: Efficient, Unified, and General Dense Layers
Efficient neural networks are essential for scaling machine learning models to real-time applications and resource-constrained environments. Fully-connected feedforward layers (FFLs) introduce computation and parameter-count bottlenecks within neural network architectures. To address this challenge, we propose a new class of dense layers that generalize standard fully-connected feedforward layers: Efficient, Unified and General dense layers (EUGens). EUGens leverage random features to approximate standard FFLs and go beyond them by incorporating a direct dependence on the input norms in their computations. The proposed layers unify existing efficient FFL extensions and improve efficiency by reducing inference complexity from quadratic to linear time. They also lead to the first unbiased algorithms approximating FFLs with arbitrary polynomial activation functions. Furthermore, EUGens reduce the parameter count and computational overhead while preserving the expressive power and adaptability of FFLs. We also present a layer-wise knowledge transfer technique that bypasses backpropagation, enabling efficient adaptation of EUGens to pre-trained models. Empirically, we observe that integrating EUGens into Transformers and MLPs yields substantial improvements in inference speed (up to 27%) and memory efficiency (up to 30%) across a range of tasks, including image classification, language model pre-training, and 3D scene reconstruction. Overall, our results highlight the potential of EUGens for the scalable deployment of large-scale neural networks in real-world scenarios.
💡 Research Summary
The paper introduces a novel class of dense layers called EUGens (Efficient, Unified, General dense layers) that aim to replace the conventional fully‑connected feed‑forward layers (FFLs) which dominate the parameter and compute budget of modern neural architectures such as Transformers, Vision Transformers, and implicit neural representations (e.g., NeRF). The central idea is to disentangle the weight matrix W and the input vector x, map each through separate random‑feature transformations, and then compute a simple inner product between the transformed representations.
Formally, for a k‑th order EUGen the output is
EUGenₖ(w, x) = Ψ(concat_{i=0,…,k} ∑_{j=1}^{i} G_{ij} w)⊤ Φ(concat_{i=0,…,k} ∑_{j=1}^{i} G_{ij} x),
where G_{ij} ∈ ℝ^{m×(d+1)} are random matrices, Ψ and Φ are element‑wise non‑linearities (often the identity), and the concatenation builds a low‑dimensional representation whose dimension grows with the order k. By choosing m ≪ min(d, l) the inference cost drops from O(d l + d²) to O(m d k² + m l), i.e., linear in the input dimension.
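The complexity argument above can be made concrete with a minimal first-order sketch (k = 1, identity non-linearities, no norm augmentation) in NumPy. The weight-side features are precomputed once offline, after which each inference costs O(md + ml) instead of the dense O(dl); the shared Gaussian projection `G` and the scaling by 1/m are assumptions of this sketch, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

d, l, m = 64, 64, 4096                          # input dim, output dim, feature dim
W = rng.standard_normal((l, d)) / np.sqrt(d)    # dense FFL weight matrix
G = rng.standard_normal((m, d))                 # shared random projection (assumed i.i.d. Gaussian)

# Precompute the weight-side features once, offline: O(l m d), amortized away.
P = W @ G.T                                     # shape (l, m)

def eugen_linear(x):
    """First-order EUGen-style sketch: unbiased estimate of W @ x."""
    z = G @ x                                   # input-side features, O(m d)
    return P @ z / m                            # inner product of features, O(m l)

x = rng.standard_normal(d)
approx = eugen_linear(x)
exact = W @ x
```

Since E[G⊤G] = m·I for i.i.d. standard Gaussian entries, each output coordinate of `eugen_linear` is an unbiased estimate of the corresponding coordinate of W x, with variance shrinking as m grows.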
Theoretical contributions are threefold. First, Theorem 3.1 proves that for any polynomial activation f(x) = ∑_{i=0}^{k} a_i x^i, appropriately scaled random matrices G_{ij} yield an unbiased estimator: E[EUGenₖ(w, x)] = f(w⊤x).
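The unbiasedness claim can be sanity-checked numerically for a single quadratic term. Elementwise products of features from two independent Gaussian projections give an unbiased estimate of (w⊤x)², since the two factors are independent and each has mean w⊤x. This is a standard random-feature construction for polynomial terms; the scaling in the paper's general G_{ij} may differ.

```python
import numpy as np

rng = np.random.default_rng(1)

d, m = 8, 200_000
w = rng.standard_normal(d); w /= np.linalg.norm(w)
x = rng.standard_normal(d); x /= np.linalg.norm(x)

# Two independent Gaussian projections (an assumption of this sketch).
G1 = rng.standard_normal((m, d))
G2 = rng.standard_normal((m, d))

# Average of elementwise feature products: each term has expectation
# E[(g1^T w)(g1^T x)] * E[(g2^T w)(g2^T x)] = (w^T x)^2 by independence.
est = np.mean((G1 @ w) * (G1 @ x) * (G2 @ w) * (G2 @ x))
exact = (w @ x) ** 2
```

With m large, the Monte Carlo average concentrates tightly around the exact quadratic term, illustrating why the estimator's bias is zero even though any single feature is noisy.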