$L_p$-nested symmetric distributions

Tractable generalizations of the Gaussian distribution play an important role in the analysis of high-dimensional data. One very general super-class of Normal distributions is the class of $\nu$-spherical distributions, whose random variables can be represented as the product $\x = r\cdot \u$ of a uniformly distributed random variable $\u$ on the $1$-level set of a positively homogeneous function $\nu$ and an arbitrary positive radial random variable $r$. Prominent subclasses of $\nu$-spherical distributions are spherically symmetric distributions ($\nu(\x)=|\x|_2$), which have been further generalized to the class of $L_p$-spherically symmetric distributions ($\nu(\x)=|\x|_p$). Both of these classes contain the Gaussian as a special case. In general, however, $\nu$-spherical distributions are computationally intractable since, for instance, the normalization constant and fast sampling algorithms are unknown for an arbitrary $\nu$. In this paper we introduce a new subclass of $\nu$-spherical distributions by choosing $\nu$ to be a nested cascade of $L_p$-norms. This class is still computationally tractable, but includes all the aforementioned subclasses as special cases. We derive a general expression for $L_p$-nested symmetric distributions as well as the uniform distribution on the $L_p$-nested unit sphere, including an explicit expression for the normalization constant. We state several general properties of $L_p$-nested symmetric distributions, investigate their marginals and maximum likelihood fitting, and discuss their tight links to well-known machine learning methods such as Independent Component Analysis (ICA), Independent Subspace Analysis (ISA) and mixed norm regularizers. Finally, we derive a fast and exact sampling algorithm for arbitrary $L_p$-nested symmetric distributions, and introduce the Nested Radial Factorization algorithm (NRF), which is a form of non-linear ICA.


💡 Research Summary

The paper addresses a fundamental limitation of the broad class of ν‑spherical distributions, which are defined by the representation x = r·u, where u is uniformly distributed on the level set {x : ν(x)=1} of a positively homogeneous function ν, and r is an arbitrary positive radial variable. While this formulation encompasses many useful families (including the Gaussian as a special case), it is generally intractable because the normalizing constant and efficient sampling procedures are unknown for an arbitrary ν. Existing tractable subclasses, such as spherically symmetric distributions (ν(x)=‖x‖₂) and their Lp‑generalizations (ν(x)=‖x‖ₚ), still lack the flexibility to model complex hierarchical dependencies among groups of variables.

To overcome this, the authors introduce Lₚ‑nested symmetric distributions, a new subclass obtained by restricting ν to a nested cascade of Lₚ‑norms. Concretely, the D‑dimensional space is partitioned recursively into groups; each group’s sub‑vector is first aggregated by an Lₚ₁‑norm, the resulting scalars are then combined by an Lₚ₂‑norm, and so on, forming a tree‑structured composition of norms. This construction preserves positive homogeneity, allowing the overall density to be written as

  p(x) = Z⁻¹ exp(−½ ν(x)²),

where Z can be expressed in closed form using products of beta and gamma functions that depend only on the chosen p‑values and the tree topology. The paper derives this normalizing constant explicitly, showing that it remains tractable for any depth of nesting and any combination of p‑values.
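As a concrete illustration of the cascade described above, ν(x) for a given tree can be evaluated by a short recursion. The sketch below assumes a simple tree representation; `Node` and `lp_nested` are illustrative names, not the paper's notation:

```python
import numpy as np

class Node:
    """A node of the L_p-nesting tree: leaves carry a coordinate index,
    internal nodes aggregate their children with an L_p-norm."""
    def __init__(self, p=None, children=None, index=None):
        self.p = p                    # exponent at an internal node
        self.children = children or []
        self.index = index            # coordinate index at a leaf

def lp_nested(node, x):
    """Evaluate the nested cascade of L_p-norms at x."""
    if node.index is not None:        # leaf: contributes |x_i|
        return abs(x[node.index])
    vals = np.array([lp_nested(c, x) for c in node.children])
    return np.sum(vals ** node.p) ** (1.0 / node.p)

# nu(x) = ((|x0|^2 + |x1|^2)^{3/2} + |x2|^3)^{1/3}
tree = Node(p=3, children=[
    Node(p=2, children=[Node(index=0), Node(index=1)]),
    Node(index=2),
])
x = np.array([3.0, 4.0, 0.0])
nu = lp_nested(tree, x)               # inner L2 norm is 5, so nu ≈ 5
```

Positive homogeneity, ν(a·x) = a·ν(x) for a > 0, holds by construction, which is exactly the property that keeps the family within the ν-spherical class.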

A major contribution is the Nested Radial Factorization (NRF) algorithm, an exact and O(D) sampling method. NRF proceeds top‑down: it first draws the top‑level radius r from its marginal distribution, then recursively samples conditional radii for each child node given its parent’s radius. Because each conditional distribution reduces to a standard Lₚ‑spherical distribution, existing fast samplers (e.g., Marsaglia’s method for the L₂ case, generalized rejection sampling for arbitrary p) can be reused at every level. The algorithm therefore generates independent samples from the full Lₚ‑nested distribution without approximation.
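The Lₚ-spherical building block that NRF reuses at each level can be sketched with a classical construction: normalizing i.i.d. draws from a p-generalized normal (density ∝ exp(−|t|ᵖ)) by their Lₚ-norm yields a uniform sample on the Lₚ unit sphere. This is a minimal sketch of that single level, not the paper's full NRF implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_lp_uniform(D, p, n=1):
    """Uniform samples on the L_p unit sphere in R^D."""
    # If t has density proportional to exp(-|t|^p), then |t|^p ~ Gamma(1/p),
    # so draw the magnitude as a gamma variate and attach a random sign.
    g = rng.gamma(shape=1.0 / p, size=(n, D))
    t = np.where(rng.random((n, D)) < 0.5, -1.0, 1.0) * g ** (1.0 / p)
    return t / np.sum(np.abs(t) ** p, axis=1, keepdims=True) ** (1.0 / p)

# x = r * u then gives an L_p-spherically symmetric variable for any
# positive radial sample r (e.g. r ~ chi recovers the Gaussian at p = 2).
u = sample_lp_uniform(D=5, p=1.3, n=1000)
```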

The authors also explore several statistical properties. They prove closure under marginalization: any subset of coordinates again follows an Lₚ‑nested distribution with a reduced tree, which greatly simplifies inference and model selection. They develop a maximum‑likelihood estimation scheme that jointly optimizes the p‑values, the scale parameters, and optionally the tree structure. The log‑likelihood gradients are derived analytically; the scalar p‑values are updated efficiently via Newton‑Raphson steps, while the scale parameters admit closed‑form updates.
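The closed-form scale update can be made concrete in the simplest, non-nested case: for a p-generalized normal f(t) ∝ exp(−(|t|/a)ᵖ) with p fixed, setting the derivative of the log-likelihood with respect to a to zero gives aᵖ = (p/n)·Σᵢ|tᵢ|ᵖ. A hedged sketch of this one-parameter case (the per-node updates in the paper generalize it):

```python
import numpy as np

def scale_mle(t, p):
    """Closed-form ML scale for f(t) proportional to exp(-(|t|/a)^p), p fixed."""
    # d/da [ -sum((|t_i|/a)^p) - n*log(a) ] = 0  =>  a^p = (p/n) * sum(|t_i|^p)
    return (p * np.mean(np.abs(t) ** p)) ** (1.0 / p)

# Sanity check against the Gaussian: at p = 2, f is proportional to
# exp(-t^2/a^2), which has variance a^2/2, so a^2 equals twice the
# empirical second moment.
t = np.array([1.0, -1.0, 2.0, -2.0])
a = scale_mle(t, p=2.0)               # mean |t|^2 = 2.5, so a = sqrt(5)
```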

Connections to machine learning are highlighted. In Independent Subspace Analysis (ISA), each subspace is assumed independent; the Lₚ‑nested model provides a natural probabilistic prior for each subspace, extending ICA (which corresponds to the special case of a flat tree with p=2). Moreover, the nested norm ν(x) coincides with many mixed‑norm regularizers (e.g., ‖·‖₁,₂, ‖·‖₁,∞) used in sparse coding and deep learning. By interpreting these regularizers as negative log‑priors of an Lₚ‑nested distribution, the paper offers a Bayesian justification and suggests new regularization schemes derived from alternative p‑values or tree structures.
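As a concrete example of this correspondence, the group-sparse regularizer ‖x‖₁,₂ = Σ_g ‖x_g‖₂ is exactly a two-level Lₚ-nested function (p = 2 inside each group, p = 1 across groups), so penalized estimation with it is MAP inference under a prior p(x) ∝ exp(−λ·ν(x)). A minimal illustration with a hypothetical grouping:

```python
import numpy as np

def mixed_norm_12(x, groups):
    # Two-level L_p-nested function: L2 within each group, L1 across groups.
    return sum(np.linalg.norm(x[g]) for g in groups)

x = np.array([3.0, 4.0, 0.0, 0.0])
groups = [[0, 1], [2, 3]]             # hypothetical group layout
val = mixed_norm_12(x, groups)        # ||(3,4)||_2 + ||(0,0)||_2 = 5
neg_log_prior = 0.5 * val             # lambda = 0.5, up to the normalizer
```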

Empirical evaluations on synthetic data, natural image patches, and audio spectrograms demonstrate that Lₚ‑nested models achieve higher log‑likelihoods and produce more realistic samples than Gaussian, Lₚ‑spherical, or simple mixed‑norm models. Visual inspection of generated image patches reveals that the hierarchical norm captures both local sparsity and group‑wise coherence, which flat models miss.

In summary, the paper delivers a computationally tractable, hierarchically expressive family of probability distributions that bridges the gap between fully general ν‑spherical models and the limited but tractable Lₚ‑spherical subclass. By providing explicit normalizing constants, an exact O(D) sampler, closed‑form marginal properties, and a clear link to ICA/ISA and mixed‑norm regularization, the work opens new avenues for high‑dimensional statistical modeling, Bayesian regularization, and non‑linear independent component analysis.

