On the Hardness of Entropy Minimization and Related Problems

We investigate certain optimization problems for Shannon information measures, namely, minimization of joint and conditional entropies $H(X,Y)$, $H(X|Y)$, $H(Y|X)$, and maximization of mutual information $I(X;Y)$, over convex regions. When restricted to the so-called transportation polytopes (sets of distributions with fixed marginals), very simple proofs of NP-hardness are obtained for these problems because in that case they are all equivalent, and their connection to the well-known \textsc{Subset sum} and \textsc{Partition} problems is revealed. The computational intractability of the more general problems over arbitrary polytopes is then a simple consequence. Further, a simple class of polytopes is shown over which the above problems are not equivalent and their complexity differs sharply, namely, minimization of $H(X,Y)$ and $H(Y|X)$ is trivial, while minimization of $H(X|Y)$ and maximization of $I(X;Y)$ are strongly NP-hard problems. Finally, two new (pseudo)metrics on the space of discrete probability distributions are introduced, based on the so-called variation of information quantity, and NP-hardness of their computation is shown.


💡 Research Summary

The paper investigates the computational complexity of several fundamental optimization problems involving Shannon information measures: minimization of the joint entropy $H(X,Y)$, minimization of the conditional entropies $H(X|Y)$ and $H(Y|X)$, and maximization of the mutual information $I(X;Y)$. The authors focus first on a highly structured class of feasible regions known as transportation polytopes: convex sets of joint probability tables whose row and column sums (the marginal distributions) are fixed in advance. Within this setting the four optimization problems are equivalent, because the objectives are linearly related: $I(X;Y) = H(X) + H(Y) - H(X,Y)$, $H(X|Y) = H(X,Y) - H(Y)$, and $H(Y|X) = H(X,Y) - H(X)$, where the marginal entropies $H(X)$ and $H(Y)$ are constants on the polytope. Minimizing the joint entropy is therefore the same problem as maximizing the mutual information or minimizing either conditional entropy.
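These linear identities are easy to verify numerically. A minimal sketch in Python (the joint table below is illustrative, not taken from the paper):

```python
import math

def entropy(ps):
    """Shannon entropy in bits of a collection of probabilities (zeros skipped)."""
    return -sum(p * math.log2(p) for p in ps if p > 0)

# A hypothetical joint probability table; rows index X, columns index Y.
P = [[0.10, 0.20],
     [0.25, 0.05],
     [0.15, 0.25]]

row = [sum(r) for r in P]                       # marginal of X
col = [sum(r[j] for r in P) for j in range(2)]  # marginal of Y

H_xy = entropy(p for r in P for p in r)         # joint entropy H(X,Y)
H_x, H_y = entropy(row), entropy(col)           # constants on the transportation polytope

I_xy        = H_x + H_y - H_xy                  # mutual information I(X;Y)
H_x_given_y = H_xy - H_y                        # conditional entropy H(X|Y)
H_y_given_x = H_xy - H_x                        # conditional entropy H(Y|X)
```

Any other joint table with the same row and column sums changes all four quantities in lockstep, which is exactly why the four problems coincide on a transportation polytope.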

To establish hardness, the authors construct a polynomial-time reduction from the classic NP-hard Subset-Sum problem (and its special case, Partition) to any of the four problems on a transportation polytope. Given a Subset-Sum instance $\{a_1,\dots,a_n\}$ with target sum $B$, they build an $n \times 2$ table, fixing the row sums to $a_i$ and the column sums to $B$ and $\sum_i a_i - B$ (normalized so that the entries form a probability distribution). Deciding whether a joint distribution with these marginals can achieve joint entropy below a certain threshold, the entropy of the row marginal, is then equivalent to deciding whether a subset of the numbers sums exactly to $B$: the bound is attained precisely when each row's mass can be placed in a single column, i.e., when the rows routed to the first column sum to $B$. Consequently, each of the four entropy-related optimization problems is NP-hard.
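The reduction can be sketched as a brute-force check for small instances. The helper name and normalization below are ours, following the construction described above, and the exhaustive search is of course exponential, which is the point of the reduction:

```python
import math
from itertools import combinations

def entropy_bound_attained(a, B):
    """Sketch of the reduction: over the n-by-2 transportation polytope with
    row sums a_i/S and column sums B/S and (S - B)/S, the minimum joint
    entropy equals its lower bound H(X) exactly when a deterministic coupling
    (each row's mass in a single column) meets the column marginals -- i.e.
    when some subset of `a` sums to B.  Brute force, only for small n."""
    S = sum(a)
    H_x = -sum((ai / S) * math.log2(ai / S) for ai in a)
    # H(X,Y) >= H(X) for every coupling, with equality iff Y is a
    # deterministic function of X; such a coupling fits the column sums
    # iff the rows sent to the first column sum to exactly B.
    attained = any(sum(sub) == B
                   for k in range(len(a) + 1)
                   for sub in combinations(a, k))
    return attained, H_x
```

For example, `entropy_bound_attained([3, 1, 4, 2], 5)` reports that the bound is attained (since 3 + 2 = 5), while `entropy_bound_attained([3, 5, 7], 4)` reports it is not.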

The paper then extends the result to arbitrary polytopes of probability distributions. Since a transportation polytope is a special case of a general polytope, the NP-hardness carries over immediately, showing that the problems remain intractable when the marginals are no longer required to be fixed.

A particularly insightful contribution is the identification of a simple family of polytopes on which the four problems no longer share the same complexity. The authors consider polytopes that enforce exactly one positive entry per row of the joint distribution (each row is a scaled unit vector) while leaving the column sums unrestricted. In this regime, minimizing the joint entropy $H(X,Y)$ and the conditional entropy $H(Y|X)$ is trivial: $Y$ is a deterministic function of $X$ at every feasible point, so $H(Y|X) = 0$ and $H(X,Y) = H(X)$, and neither can be reduced further. By contrast, minimizing $H(X|Y)$ and maximizing $I(X;Y)$ remain strongly NP-hard, because the freedom in assigning rows to columns still encodes a Subset-Sum-type combinatorial choice. This dichotomy demonstrates that subtle changes in the feasible region can dramatically alter computational difficulty.
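The dichotomy can be illustrated on a toy three-row polytope (the distribution of $X$ below is arbitrary): every feasible point has $H(Y|X) = 0$, while $H(X|Y)$ genuinely depends on the combinatorial choice of columns.

```python
import math

def entropy(ps):
    return -sum(p * math.log2(p) for p in ps if p > 0)

# Distribution of X is fixed; the only freedom is the column each row's
# mass is sent to (values here are illustrative).
px = [0.5, 0.3, 0.2]
H_x = entropy(px)

results = {}
for cols in [(0, 0, 1), (0, 1, 0), (1, 1, 1)]:     # column chosen per row
    col_mass = [0.0, 0.0]                          # induced marginal of Y
    for i, c in enumerate(cols):
        col_mass[c] += px[i]
    H_xy = entropy(px)                  # nonzero joint entries are exactly px
    results[cols] = (H_xy - H_x,        # H(Y|X): always 0 -- minimizing it is trivial
                     H_x - entropy(col_mass))  # H(X|Y): varies with the column choice
```

Every assignment yields the same (minimal) $H(X,Y)$ and zero $H(Y|X)$, but the three assignments give three different values of $H(X|Y)$, which is the freedom the hardness proof exploits.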

Finally, the authors introduce two new (pseudo)metrics on the space of discrete probability distributions, based on the variation of information, the quantity $V(X;Y) = H(X) + H(Y) - 2I(X;Y) = H(X|Y) + H(Y|X)$. While the variation of information is a well-known information-theoretic distance between jointly distributed variables, the proposed metrics compare two marginal distributions by optimizing $V$ over the joint distributions (couplings) that realize them, which again reduces to the entropy-minimization problems on transportation polytopes studied earlier. The authors prove that computing these metrics exactly is NP-hard, implying that even seemingly simple distance calculations can be computationally prohibitive.

Overall, the paper provides a clear and unified hardness framework for entropy‑based optimization, connects these problems to classic combinatorial challenges, and highlights how the geometry of the feasible set governs algorithmic tractability. The results have immediate implications for fields such as statistical inference, machine learning, and network information theory, where joint and conditional entropies are routinely optimized under marginal constraints.

