On the Expressive Power of Permutation-Equivariant Weight-Space Networks

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Weight-space learning studies neural architectures that operate directly on the parameters of other neural networks. Motivated by the growing availability of pretrained models, recent work has demonstrated the effectiveness of weight-space networks across a wide range of tasks. State-of-the-art weight-space networks rely on permutation-equivariant designs to improve generalization. However, this may negatively affect expressive power, warranting theoretical investigation. Importantly, unlike other structured domains, weight-space learning targets maps operating on both weight and function spaces, making expressivity analysis particularly subtle. While a few prior works provide partial expressivity results, a comprehensive characterization is still missing. In this work, we address this gap by developing a systematic theory for the expressivity of weight-space networks. We first prove that all prominent permutation-equivariant networks are equivalent in expressive power. We then establish universality in both weight- and function-space settings under mild, natural assumptions on the input weights, and characterize the edge-case regimes where universality no longer holds. Together, these results provide a strong and unified foundation for the expressivity of weight-space networks.


💡 Research Summary

This paper develops a comprehensive theory of the expressive power of permutation‑equivariant weight‑space networks, which operate directly on the parameters of other neural networks, especially multilayer perceptrons (MLPs). The authors first formalize the weight space V_A of a fixed MLP architecture A and the permutation group G_A that captures hidden‑neuron symmetries: permuting neurons within any hidden layer leaves the realized function unchanged. A weight‑space network is said to be permutation‑invariant (output does not change under G_A) or permutation‑equivariant (output transforms according to the same permutation).
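The hidden-neuron symmetry underlying G_A can be checked directly. The sketch below (not taken from the paper; shapes and names are illustrative) permutes the neurons of a one-hidden-layer ReLU MLP — rows of the incoming weights and biases, columns of the outgoing weights — and confirms the realized function is unchanged:

```python
# Illustrative sketch: permuting hidden neurons leaves the realized
# function of an MLP unchanged (the symmetry captured by G_A).
import numpy as np

rng = np.random.default_rng(0)
relu = lambda z: np.maximum(z, 0.0)

# A one-hidden-layer MLP: x -> W2 @ relu(W1 @ x + b1) + b2
d_in, d_hidden, d_out = 3, 5, 2
W1, b1 = rng.normal(size=(d_hidden, d_in)), rng.normal(size=d_hidden)
W2, b2 = rng.normal(size=(d_out, d_hidden)), rng.normal(size=d_out)

def realize(W1, b1, W2, b2, x):
    """Forward pass of the MLP defined by (W1, b1, W2, b2)."""
    return W2 @ relu(W1 @ x + b1) + b2

# Apply a random permutation to the hidden layer: rows of (W1, b1),
# columns of W2.
perm = rng.permutation(d_hidden)
W1p, b1p, W2p = W1[perm], b1[perm], W2[:, perm]

x = rng.normal(size=d_in)
assert np.allclose(realize(W1, b1, W2, b2, x),
                   realize(W1p, b1p, W2p, b2, x))
```

A permutation-invariant weight-space network must produce the same output on both parameterizations; an equivariant one must permute its output accordingly.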

The literature contains several architectures designed to respect these symmetries, including Deep Weight Space (DWS) networks, Graph Meta‑Networks (GMNs), Neural Functional Networks (NFNs), Neural Graph GNNs (NG‑GNNs), and Neural Functional Transformers (NFTs). The authors introduce two families of maps that a network class π can approximate on a compact set K ⊂ V_A: N_π^inv(K) (invariant maps V_A → ℝⁿ) and N_π^equi(K) (equivariant operators V_A → V_A).

Expressive equivalence
Theorem 5.2 shows that, except for NFTs, all listed architectures generate exactly the same N_π^inv(K) and N_π^equi(K). The proof constructs mutual approximations between any two architectures by explicitly simulating the base layers of one with the other, establishing that they are functionally interchangeable. NFTs, because of their attention‑based design, are not equivalent in full generality; however, Proposition 5.3 proves that under a general‑position (GP) assumption—i.e., all bias terms in each hidden layer are pairwise distinct—NFTs achieve the same expressive power as the other models. Since the set of weight configurations violating GP (the exclusion set E_A) has Lebesgue measure zero, the GP condition holds almost surely in practice.
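The GP condition is easy to test on concrete weights. The following sketch (function name and tolerance are illustrative, not from the paper) checks that biases within each hidden layer are pairwise distinct; since weights drawn from any continuous distribution collide with probability zero, random initializations satisfy GP almost surely:

```python
# Illustrative check of the general-position (GP) condition: all bias
# terms within each hidden layer must be pairwise distinct.
import numpy as np

def in_general_position(hidden_biases, tol=0.0):
    """Return True iff every hidden layer's biases are pairwise distinct
    (up to tol). hidden_biases is a list of 1-D bias vectors."""
    for b in hidden_biases:
        diffs = np.abs(b[:, None] - b[None, :])
        np.fill_diagonal(diffs, np.inf)  # ignore self-comparisons
        if diffs.min() <= tol:
            return False
    return True

rng = np.random.default_rng(1)
b_random = [rng.normal(size=8), rng.normal(size=4)]  # GP almost surely
b_collide = [np.array([0.5, 0.5, 1.0])]              # bias collision: in E_A

print(in_general_position(b_random))   # True
print(in_general_position(b_collide))  # False
```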

Four approximation settings
The paper then identifies four natural target families for weight‑space learning, each with its own notion of approximation:

  1. Function‑space functionals (F → ℝⁿ) – outputs depend only on the function realized by the input weights (e.g., model accuracy prediction).
  2. Permutation‑invariant functionals (V → ℝⁿ) – outputs may depend on the specific parameterization but must be invariant to hidden‑neuron permutations (e.g., L₂‑norm of weights, curvature prediction).
  3. Function‑space operators (F → F) – maps one function to another (e.g., image or 3D scene editing, domain adaptation).
  4. Permutation‑equivariant operators (V → V) – weight‑to‑weight transformations that respect G_A (e.g., pruning mask prediction, meta‑optimization gradient prediction).

For each setting the authors prove universality results, i.e., that permutation‑equivariant weight‑space networks can approximate any continuous target map to arbitrary precision, under appropriate conditions.
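The gap between settings 1 and 2 can be made concrete: two weight vectors can realize the same function yet differ as parameterizations, so a quantity like the L₂ norm is a valid target in setting 2 but not in setting 1. The sketch below (my own illustration, using ReLU positive homogeneity, a symmetry beyond permutations) exhibits such a pair:

```python
# Illustration: two parameterizations realizing the same function but
# with different L2 norms. The L2 norm is permutation-invariant
# (setting 2) yet not a function-space functional (setting 1).
import numpy as np

relu = lambda z: np.maximum(z, 0.0)
rng = np.random.default_rng(2)

W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2 = rng.normal(size=(2, 4))

f = lambda W1, b1, W2, x: W2 @ relu(W1 @ x + b1)

# ReLU positive homogeneity: relu(2z) = 2*relu(z), so doubling the
# first layer and halving the second preserves the realized function.
W1s, b1s, W2s = 2 * W1, 2 * b1, 0.5 * W2

x = rng.normal(size=3)
assert np.allclose(f(W1, b1, W2, x), f(W1s, b1s, W2s, x))  # same function

l2 = lambda *ps: sum(np.sum(p**2) for p in ps)
print(l2(W1, b1, W2), l2(W1s, b1s, W2s))  # different parameterizations
```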

Key findings:

  • Function‑space functionals: Universality holds without any extra assumptions. Because the realization map R: V → C(X,ℝ^{d_L}) is continuous, any continuous functional of the realized function can be approximated by a permutation‑equivariant network.

  • Permutation‑invariant functionals & permutation‑equivariant operators: Universality fails on the whole weight space due to degenerate configurations where two neurons share the same bias (these lie in the exclusion set E_A). However, when inputs are restricted to the GP region V \ E_A, the networks become universal. The proof constructs a continuous canonization map that uniquely orders neurons based on distinct biases, allowing the network to treat each neuron as identifiable and thus simulate any invariant/equivariant map.

  • Function‑space operators: If the input weights are drawn from a fixed architecture (fixed depth and width), universality does not hold because the set of realizable functions is limited. By allowing the architecture to grow arbitrarily (in both depth and width), the authors show that weight‑space networks can approximate any continuous operator on C(X,ℝ^{d_L}). This extends prior infinite‑width universal approximation results to the operator setting.

The analysis relies heavily on the notion of an exclusion set (where biases collide) and the general‑position assumption, which together enable a continuous canonical form for weight vectors. The authors also leverage feed‑forward simulation results (e.g., DWS can simulate an MLP forward pass) and extend them to show that the same simulation power suffices for the more demanding operator approximations.
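The role of canonization on the GP region can be sketched in a few lines. This is a simplified stand-in for the paper's construction (for a single hidden layer, ordering neurons by their distinct biases): any two G_A-equivalent weight vectors map to the same canonical form, from which an arbitrary invariant or equivariant target can then be computed.

```python
# Simplified canonization sketch: on the GP region, sort each hidden
# layer's neurons by their (pairwise-distinct) biases to obtain a
# canonical representative of the G_A-orbit.
import numpy as np

def canonize(W1, b1, W2):
    """Sort hidden neurons by bias; well-defined whenever the biases
    are pairwise distinct (i.e., outside the exclusion set E_A)."""
    order = np.argsort(b1)
    return W1[order], b1[order], W2[:, order]

rng = np.random.default_rng(3)
W1 = rng.normal(size=(5, 3))
b1 = rng.normal(size=5)
W2 = rng.normal(size=(2, 5))

# A permuted copy of the same network canonizes to identical parameters.
perm = rng.permutation(5)
c1 = canonize(W1, b1, W2)
c2 = canonize(W1[perm], b1[perm], W2[:, perm])
assert all(np.allclose(a, b) for a, b in zip(c1, c2))
```

On E_A, where two biases coincide, no such continuous ordering exists, which is exactly why universality fails there.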

Practical implications
Because all major permutation‑equivariant architectures are shown to be expressively equivalent, practitioners can choose among them based on computational efficiency, ease of implementation, or hardware considerations rather than expressivity concerns. NFTs, despite their distinct attention mechanism, are theoretically on par with GNN‑style models under realistic GP conditions, validating their use in practice. Moreover, the identified edge cases (bias collisions, fixed‑size architectures) highlight when additional regularization or architectural scaling is needed to retain universal approximation capabilities.

Conclusion
The paper delivers the first unified expressivity theory for permutation‑equivariant weight‑space networks. It establishes (i) expressive equivalence across existing architectures, (ii) universal approximation guarantees for four natural learning settings under mild, practically satisfied assumptions, and (iii) precise characterizations of the regimes where universality breaks down. This work provides a solid theoretical foundation for future research and applications that manipulate neural network weights directly.

