Identifiability of Deep Polynomial Neural Networks
Polynomial Neural Networks (PNNs) possess a rich algebraic and geometric structure. However, their identifiability – a key property for ensuring interpretability – remains poorly understood. In this work, we present a comprehensive analysis of the identifiability of deep PNNs, including architectures with and without bias terms. Our results reveal an intricate interplay between activation degrees and layer widths in achieving identifiability. As special cases, we show that architectures with non-increasing layer widths are generically identifiable under mild conditions, while encoder-decoder networks are identifiable when the decoder widths do not grow too rapidly compared to the activation degrees. Our proofs are constructive and center on a connection between deep PNNs and low-rank tensor decompositions, and Kruskal-type uniqueness theorems. We also settle an open conjecture on the dimension of PNN neurovarieties, and provide new bounds on the activation degrees required for the neurovariety to reach its expected dimension.
💡 Research Summary
This paper presents a comprehensive theoretical study of the identifiability of deep Polynomial Neural Networks (PNNs), i.e., feed‑forward networks whose activation functions are monomials of possibly varying degree. Identifiability—whether the model parameters can be uniquely recovered (up to trivial symmetries such as neuron permutations and scaling) from the input‑output map—is crucial for interpretability, disentangled representation learning, and reliable model manipulation.
Key Contributions
- Localization Theorem – The authors prove that for a homogeneous PNN (hPNN) with L > 2 layers, if every consecutive two‑layer subnetwork is identifiable on some subspace of its inputs, then the whole deep network is identifiable. The proof proceeds by induction, reducing the problem to the identifiability of 2‑layer blocks, which are shown to be equivalent to partially symmetric canonical polyadic tensor decompositions (CPDs). By invoking Kruskal‑type uniqueness conditions for CPDs, the authors guarantee that each 2‑layer block has a unique decomposition, which in turn forces the Jacobian of the full network to attain its maximal rank. Consequently, the associated neurovariety is non‑defective (its dimension equals the effective number of parameters).
- Architectural Corollaries –
- Pyramidal (non‑increasing width) networks: When the layer widths satisfy d_0 ≥ d_1 ≥ … ≥ d_L, generic choices of weights satisfy the Kruskal condition for every 2‑layer block, yielding global identifiability. This settles a conjecture that any quadratic‑or‑higher degree PNN with non‑increasing widths is identifiable.
- Encoder‑decoder (bottleneck) networks: The decoder’s width must not grow faster than the product of the preceding width and the activation degree. Formally, for each decoder layer ℓ, d_{ℓ} < d_{ℓ‑1}·r_{ℓ‑1} ensures Kruskal’s condition and thus identifiability.
- Improved Activation‑Degree Bounds – Prior work required activation degrees that scale quadratically with layer widths to guarantee identifiability. The present analysis shows that a linear scaling suffices: it is enough that each degree r_ℓ be proportional to the preceding width d_{ℓ‑1}. This dramatically reduces the required polynomial degree for practical architectures.
- Bias Handling via Homogenization – Networks with bias terms are transformed into bias‑free hPNNs by augmenting the input with a constant coordinate. This “homogenization” preserves the equivalence class of parameters and allows the bias‑free identifiability results to be applied unchanged.
- Neurovariety Dimension Conjecture Resolved – The authors prove that when a PNN is finitely identifiable, the Zariski closure of its image (the neurovariety) attains the expected dimension equal to the number of effective parameters. Hence the neurovariety is non‑defective, linking algebraic geometry directly to identifiability.
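The homogenization trick from the contributions above is easy to verify numerically. The following sketch (illustrative code, not the authors' implementation) builds a small 2‑layer PNN with biases and the equivalent bias‑free hPNN obtained by appending a constant input coordinate:

```python
import numpy as np

rng = np.random.default_rng(0)

def pnn_layer(x, W, b, r):
    # one PNN layer: entrywise degree-r monomial activation after an affine map
    return (W @ x + b) ** r

# a small 2-layer PNN with biases (widths 3 -> 4 -> 2, activation degrees 2 and 3)
W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal(4)
W2, b2 = rng.standard_normal((2, 4)), rng.standard_normal(2)

def pnn(x):
    return pnn_layer(pnn_layer(x, W1, b1, 2), W2, b2, 3)

# homogenization: append a constant coordinate and fold each bias into the weights;
# the extra row [0 ... 0 1] keeps the constant channel alive through the activation
W1h = np.block([[W1, b1[:, None]], [np.zeros((1, 3)), np.ones((1, 1))]])
W2h = np.hstack([W2, b2[:, None]])   # the final layer may drop the constant channel

def hpnn(x):
    xh = np.append(x, 1.0)           # augmented input (x, 1)
    z = (W1h @ xh) ** 2              # homogeneous layer, degree 2
    return (W2h @ z) ** 3            # homogeneous layer, degree 3

x = rng.standard_normal(3)
assert np.allclose(pnn(x), hpnn(x))  # identical input-output map
```

The constant channel stays equal to 1 through each monomial activation (since 1^r = 1), so biases of deeper layers can be folded in the same way, layer by layer.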
Methodological Highlights
- The connection between PNNs and low‑rank tensor decompositions is exploited throughout. A 2‑layer hPNN corresponds to a partially symmetric CPD; Kruskal’s theorem provides a deterministic uniqueness condition based on the Kruskal ranks of factor matrices.
- The authors carefully track the effect of neuron permutations and diagonal scalings (Lemma 4) to define the equivalence class of parameters. This formalism underpins the definitions of “unique representation” and “finite‑to‑one representation.”
- By analyzing the Jacobian of the hPNN map, they translate identifiability into a rank‑maximization problem, which is then solved via the tensor‑based arguments.
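Kruskal's condition is deterministic and directly checkable: compute the Kruskal rank (k‑rank) of each factor matrix and verify k_A + k_B + k_C ≥ 2R + 2. A brute‑force sketch (function names are illustrative; the exhaustive subset search is fine for the small factor matrices arising in 2‑layer blocks):

```python
import numpy as np
from itertools import combinations

def kruskal_rank(A, tol=1e-9):
    """Largest k such that every set of k columns of A is linearly independent."""
    n = A.shape[1]
    for k in range(n, 0, -1):
        if all(np.linalg.matrix_rank(A[:, list(c)], tol=tol) == k
               for c in combinations(range(n), k)):
            return k
    return 0

# factor matrices of a rank-R third-order CPD  T = sum_i a_i (x) b_i (x) c_i;
# generic Gaussian factors attain the maximal k-rank with probability one
rng = np.random.default_rng(1)
R = 4
A, B, C = (rng.standard_normal((5, R)) for _ in range(3))

kA, kB, kC = kruskal_rank(A), kruskal_rank(B), kruskal_rank(C)
# Kruskal's sufficient condition for uniqueness of the rank-R CPD
assert kA + kB + kC >= 2 * R + 2
```

For a 2‑layer hPNN block, two of the three factors coincide (the rows of the first weight matrix, repeated), which is exactly the partially symmetric structure the paper exploits.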
Practical Implications
- Designing deep PNNs with non‑increasing layer widths automatically yields globally identifiable models, simplifying architecture selection.
- The linear activation‑degree bound allows practitioners to use low‑degree polynomials (e.g., quadratic or cubic) even in wide networks, without sacrificing identifiability.
- Bias terms no longer pose a theoretical obstacle; they can be incorporated via the homogenization trick without affecting identifiability guarantees.
- Since identifiability is equivalent to the neurovariety attaining full dimension, any compression or pruning technique that reduces the effective parameter count must preserve this dimension to avoid introducing non‑identifiable degeneracies.
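The link between identifiability and neurovariety dimension can be probed numerically: at a generic parameter point, the rank of the Jacobian of the parameter‑to‑function map equals the neurovariety's dimension. A sketch for a 2‑layer bias‑free hPNN with pyramidal widths (illustrative code under the assumption that the expected dimension subtracts one scaling symmetry per hidden neuron):

```python
import numpy as np

rng = np.random.default_rng(2)
d0, d1, d2, r = 4, 3, 2, 2        # pyramidal widths d0 >= d1 >= d2, degree-2 activation

def jacobian(theta, X):
    # exact Jacobian of parameters -> outputs of x |-> W2 (W1 x)^r at samples X
    W1 = theta[:d1 * d0].reshape(d1, d0)
    W2 = theta[d1 * d0:].reshape(d2, d1)
    Z = X @ W1.T                   # hidden pre-activations, one row per sample
    rows = []
    for x, z in zip(X, Z):
        # d f_k / d W1[j, i] = r * W2[k, j] * z_j**(r-1) * x_i
        dW1 = r * W2[:, :, None] * (z ** (r - 1))[None, :, None] * x[None, None, :]
        # d f_k / d W2[l, j] = (l == k) * z_j**r
        dW2 = np.einsum('kl,j->klj', np.eye(d2), z ** r)
        rows.append(np.hstack([dW1.reshape(d2, -1), dW2.reshape(d2, -1)]))
    return np.vstack(rows)

theta = rng.standard_normal(d1 * d0 + d2 * d1)
X = rng.standard_normal((50, d0))  # enough generic sample points to expose the rank

# expected neurovariety dimension: parameter count minus one scaling per hidden neuron
expected = d1 * d0 + d2 * d1 - d1
assert np.linalg.matrix_rank(jacobian(theta, X)) == expected
```

A rank deficit below `expected` at generic parameters would signal a defective neurovariety, i.e., a loss of identifiability of the kind the pruning caveat above warns about.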
Conclusion
The paper delivers the first unified theory of identifiability for deep polynomial neural networks. By bridging neural‑network theory with tensor decomposition uniqueness results, it establishes a set of clear, verifiable conditions—on layer widths, activation degrees, and bias handling—that guarantee both finite and global identifiability. The work resolves several standing conjectures, notably the dimension of neurovarieties and the activation‑degree thresholds, and provides concrete design guidelines for building interpretable, mathematically well‑behaved polynomial deep models.