Pooling probability distributions and partial information decomposition.
Notwithstanding various attempts to construct a partial information decomposition (PID) for multiple variables by defining synergistic, redundant, and unique information, there is no consensus on how one ought to precisely define any of these quantities. One aim here is to illustrate how that ambiguity (or, more positively, freedom of choice) may arise. Using the basic idea that information equals the average reduction in uncertainty when going from an initial to a final probability distribution, synergistic information will likewise be defined as a difference between two entropies. One term is uncontroversial and characterizes “the whole,” i.e., the information that the source variables carry jointly about a target variable T. The other term is meant to characterize the information carried by the “sum of its parts.” Here we interpret that concept as requiring a suitable probability distribution aggregated (“pooled”) from multiple marginal distributions (the parts). Ambiguity arises in the definition of the optimum way to pool two (or more) probability distributions. Independent of the exact definition of optimum pooling, the concept of pooling leads to a lattice that differs from the often-used redundancy-based lattice. One can then associate not just a number (an average entropy) with each node of the lattice, but also a (pooled) probability distribution. As an example, one simple and reasonable approach to pooling is presented, which naturally gives rise to the overlap between different probability distributions as a crucial quantity characterizing both synergistic and unique information.
💡 Research Summary
The paper tackles the long-standing ambiguity in defining the components of a Partial Information Decomposition (PID), namely synergistic, redundant, and unique information, by returning to the most elementary definition of information as the average reduction in uncertainty when moving from an initial to a final probability distribution. The author proposes to define synergistic information as the difference between two entropies: one that quantifies “the whole” information that a set of source variables jointly carries about a target variable T, and one that quantifies the information carried by “the sum of its parts.” The latter is obtained by first aggregating (or “pooling”) the marginal distributions of the individual sources into a single probability distribution and then computing its entropy.
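In symbols, for two sources S1 and S2, this construction reads roughly as follows; here "pool" is a placeholder for whichever aggregation rule is chosen, and the precise conditioning is one reading of the summary rather than a formula quoted from the paper:

```latex
\begin{align}
  I_{\text{whole}} &= H(T) - H(T \mid S_1, S_2),\\
  I_{\text{parts}} &= H(T) - H\big(\operatorname{pool}[\,p(t \mid s_1),\, p(t \mid s_2)\,]\big),\\
  I_{\text{syn}}   &= I_{\text{whole}} - I_{\text{parts}}
                    = H\big(\operatorname{pool}[\,\cdot\,]\big) - H(T \mid S_1, S_2).
\end{align}
```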
The central technical problem becomes: how should multiple probability distributions be pooled in an optimal way? The paper argues that any choice of pooling rule constitutes a degree of freedom in constructing a PID, and that different choices lead to different lattice structures. Rather than the traditional redundancy‑based lattice, where each node is associated with a scalar redundancy value, the proposed “pooling‑based” lattice assigns a full probability distribution to each node. Consequently, moving up or down the lattice changes not only the average entropy but also the shape of the underlying distribution, providing a richer description of information flow.
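As a minimal sketch of what such a lattice could look like in code, assuming, purely for illustration, that nodes are indexed by non-empty subsets of sources (neither the node structure nor the names below come from the paper):

```python
from itertools import combinations

def pooling_lattice(sources, marginals, pool):
    """Attach a pooled target distribution, not a scalar, to each node.

    sources:   iterable of source labels
    marginals: dict mapping each source label to its distribution over the target
    pool:      any pooling rule taking a list of distributions
    """
    nodes = {}
    for r in range(1, len(sources) + 1):
        for subset in combinations(sources, r):
            nodes[subset] = pool([marginals[s] for s in subset])
    return nodes
```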
To make the discussion concrete, the author introduces a simple, principled pooling method. For each outcome (s, t) the method takes the minimum of the probabilities assigned by the individual source‑target marginals, thereby defining an “overlap” region where all sources agree. The remaining probability mass is then redistributed proportionally to preserve each marginal’s total mass. This construction maximizes the common mass (the overlap) while respecting the original marginals, and it yields a pooled distribution that reflects the shared structure of the sources without erasing their individual contributions.
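A short, runnable sketch of such a rule, with each distribution represented as an array over a shared outcome space: the pointwise minimum follows the description above, while the redistribution step (here, averaging each source's leftover mass back in) is one plausible reading, not necessarily the paper's exact rule:

```python
import numpy as np

def min_overlap_pool(dists):
    """Pool distributions by keeping their shared mass and averaging the rest."""
    dists = np.asarray(dists, dtype=float)
    overlap = dists.min(axis=0)       # mass on which all sources agree
    leftover = dists - overlap        # each source's unshared mass
    pooled = overlap + leftover.mean(axis=0)
    return pooled / pooled.sum()      # guard against rounding drift
```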
The overlap quantity emerges as a pivotal metric. When the overlap is large, the pooled distribution is close to each marginal, the entropy of the pooled distribution approaches the joint entropy, and the difference between the two, i.e., the synergy, becomes small. In this regime the sources mainly provide redundant or unique information, and the synergistic term vanishes. Conversely, when the overlap is small, the pooled distribution is far from the individual marginals, the pooled entropy is high, and the difference between the entropy of the whole and the pooled entropy becomes large, signifying strong synergistic interaction. At the same time, the unique information of each source can be quantified by the divergence between its marginal and the pooled distribution; a small overlap inflates this divergence for each source, indicating that each contributes information that the others do not share.
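These quantities are easy to compute once the pooled distribution is in hand; in the sketch below the "divergence" is taken to be the Kullback-Leibler divergence, which is an assumption, since the summary does not name a specific divergence:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def total_overlap(dists):
    """Total shared mass: the sum over outcomes of the pointwise minimum."""
    return float(np.asarray(dists, dtype=float).min(axis=0).sum())

def kl_divergence(p, q):
    """D(p || q) in bits, gauging how far a marginal p sits from the pool q."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float((p[mask] * np.log2(p[mask] / q[mask])).sum())
```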
By embedding this pooling operation into a lattice, the paper demonstrates that the PID can be derived from a family of possible pooling rules rather than a single, universally accepted redundancy measure. The lattice nodes are no longer mere scalar scores but carry the pooled distributions themselves, allowing analysts to inspect how the distributional shape evolves across the lattice. This perspective reveals that PID is not a fixed decomposition but a flexible framework whose specific form can be tailored to the scientific question at hand.
The author illustrates the approach with a toy example involving two discrete sources X and Y and a target Z. By computing the overlap, constructing the pooled distribution, and evaluating the entropies, the paper shows that when X and Y have highly overlapping conditional distributions given Z, the synergy term is near zero and the unique information terms dominate. When the conditionals are largely disjoint, synergy becomes substantial while unique information diminishes.
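A small numerical experiment in the spirit of this toy example shows the same pattern (the numbers are illustrative and not taken from the paper; the snippet reuses min_overlap_pool, total_overlap, entropy, and kl_divergence from the sketches above):

```python
p_x       = [0.40, 0.40, 0.10, 0.10]  # X's distribution over a 4-outcome target
p_y_close = [0.35, 0.45, 0.10, 0.10]  # nearly identical to p_x: large overlap
p_y_far   = [0.05, 0.05, 0.45, 0.45]  # mostly disjoint from p_x: small overlap

for label, p_y in [("overlapping", p_y_close), ("disjoint", p_y_far)]:
    pooled = min_overlap_pool([p_x, p_y])
    print(f"{label:11s}  overlap={total_overlap([p_x, p_y]):.2f}  "
          f"H(pooled)={entropy(pooled):.2f} bits  "
          f"D(p_x||pooled)={kl_divergence(p_x, pooled):.2f} bits")
```

With these numbers, the overlapping case yields an overlap of 0.95, a pooled distribution close to p_x, and a near-zero divergence, while the disjoint case yields an overlap of 0.30, a nearly uniform pooled distribution, and a markedly larger divergence, matching the qualitative behaviour described above.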
In conclusion, the paper contributes a novel conceptual tool, probability-distribution pooling, to the PID literature. It clarifies how the choice of pooling rule introduces a controlled degree of freedom, how this choice reshapes the underlying lattice, and how the overlap between distributions simultaneously governs synergistic and unique information. This framework promises greater flexibility for fields such as neuroscience, complex systems, and machine learning, where multivariate information interactions are central and where a one-size-fits-all redundancy measure may be insufficient.