Maximum Likelihood for Matrices with Rank Constraints

Maximum likelihood estimation is a fundamental optimization problem in statistics. We study this problem on manifolds of matrices with bounded rank. These represent mixtures of distributions of two independent discrete random variables. We determine the maximum likelihood degree for a range of determinantal varieties, and we apply numerical algebraic geometry to compute all critical points of their likelihood functions. This led to the discovery of maximum likelihood duality between matrices of complementary ranks, a result proved subsequently by Draisma and Rodriguez.

💡 Research Summary

The paper investigates the maximum likelihood estimation (MLE) problem for statistical models whose parameter space consists of matrices with a prescribed rank bound. Such matrices arise naturally when one models the joint distribution of two discrete random variables: the joint probability table is an n × m matrix whose entries are non‑negative and sum to one. The table has rank 1 exactly when the two variables are independent, and a mixture of at most r independent (product) distributions has rank at most r.
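The link between mixtures and rank can be checked numerically. The following is a minimal sketch, not code from the paper; the function name `mixture_of_products` and its argument layout are assumptions made for illustration.

```python
import numpy as np

def mixture_of_products(weights, row_dists, col_dists):
    """Joint table of a mixture of r product distributions (hypothetical helper).

    weights   : length-r mixing weights, summing to one
    row_dists : r x n array, each row a distribution of the first variable
    col_dists : r x m array, each row a distribution of the second variable
    """
    w = np.asarray(weights, dtype=float)
    a = np.asarray(row_dists, dtype=float)
    b = np.asarray(col_dists, dtype=float)
    # P[i, j] = sum_k w[k] * a[k, i] * b[k, j], a sum of r rank-1 tables
    return np.einsum("k,ki,kj->ij", w, a, b)

rng = np.random.default_rng(0)
r, n, m = 2, 4, 5
P = mixture_of_products(
    rng.dirichlet(np.ones(r)),
    rng.dirichlet(np.ones(n), size=r),
    rng.dirichlet(np.ones(m), size=r),
)
print(P.sum(), np.linalg.matrix_rank(P, tol=1e-10))
```

The table is a sum of r outer products, so its rank cannot exceed r, and its entries are non‑negative and sum to one by construction.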

The authors first place this problem in the language of algebraic geometry. The set of all n × m matrices of rank ≤ r is a determinantal variety, a closed algebraic subvariety of projective space defined by the vanishing of all (r + 1) × (r + 1) minors. On this variety the likelihood function, expressed in terms of the observed counts, becomes a rational function whose critical points are solutions of a system of polynomial equations obtained from the Lagrange multiplier conditions. The number of complex solutions for generic data is called the maximum likelihood degree (ML degree) of the variety; it measures the algebraic complexity of the MLE problem.
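As a sketch of the setup described above, the optimization problem and its critical-point condition can be written as follows. The symbols are notational assumptions: u_{ij} denotes the observed counts, P = (p_{ij}) the probability table, V_r the variety of matrices of rank at most r, and N_P V_r its normal space at a smooth point P.

```latex
\[
\ell_u(P) \;=\; \sum_{i=1}^{n}\sum_{j=1}^{m} u_{ij}\,\log p_{ij},
\qquad
\text{maximize subject to}\quad \operatorname{rank}(P)\le r,\quad \sum_{i,j} p_{ij}=1 .
\]
\[
\left(\frac{u_{ij}}{p_{ij}}\right)_{i,j} \;\in\; N_P V_r \;+\; \mathbb{C}\cdot\mathbf{1}
\]
```

Here \mathbf{1} is the all-ones matrix, the gradient of the constraint that the entries sum to one. Clearing denominators turns this membership condition into polynomial equations in the p_{ij} and the multipliers, and the ML degree counts their complex solutions with all p_{ij} nonzero for generic u.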

A major contribution of the paper is the computation of the ML degree for a range of rank‑constrained varieties. For rank‑1 matrices, which form the independence model, the ML degree is 1: the unique critical point is the outer product of the observed row and column frequencies. For full‑rank matrices the degree is also 1, reflecting the fact that the unconstrained MLE is simply the normalized table of counts. For intermediate ranks 1 < r < min(n,m) the likelihood equations have many more solutions; for example, 3 × 3 matrices of rank 2 have ML degree 10. The paper determines these degrees for a range of small matrix formats.
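For the rank‑1 (independence) model the maximizer has a well-known closed form, which makes a useful sanity check. The sketch below is illustrative only; the helper names `independence_mle` and `log_likelihood` and the sample counts are assumptions, not taken from the paper.

```python
import numpy as np

def independence_mle(counts):
    """Closed-form MLE over rank-1 tables: outer product of the marginals."""
    u = np.asarray(counts, dtype=float)
    total = u.sum()
    return np.outer(u.sum(axis=1), u.sum(axis=0)) / total**2

def log_likelihood(counts, p):
    """Multinomial log-likelihood, up to an additive constant."""
    u = np.asarray(counts, dtype=float)
    return float(np.sum(u * np.log(p)))

u = np.array([[10, 3], [2, 7], [1, 4]])  # made-up 3 x 2 table of counts
p_hat = independence_mle(u)

# Compare against a random rank-1 probability table.
rng = np.random.default_rng(1)
q = np.outer(rng.dirichlet(np.ones(3)), rng.dirichlet(np.ones(2)))
print(log_likelihood(u, p_hat) >= log_likelihood(u, q))
```

No other rank‑1 probability table attains a higher likelihood for the same counts, consistent with the model having a single critical point.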

To validate the theoretical counts, the authors employ numerical algebraic geometry, specifically homotopy continuation methods, to compute all complex critical points for concrete instances. Using software such as Bertini and PHCpack, they track solution paths from a start system with known solutions to the target likelihood equations for random data. The numerical experiments confirm that the number of isolated solutions equals the predicted ML degree, even for examples where the degree reaches several thousand (e.g., a 5 × 5 matrix of rank 2). This demonstrates that modern numerical tools can handle the otherwise intractable polynomial systems arising from rank‑constrained MLE.
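The idea of tracking solutions from a start system to a target system can be illustrated on a toy univariate polynomial. This is a minimal sketch and not how Bertini or PHCpack are implemented: the function `track_paths`, the fixed constant `gamma`, and the step counts are assumptions chosen for illustration, and the real likelihood equations are multivariate.

```python
import numpy as np

def track_paths(target, steps=400, newton_iters=4, gamma=0.6 + 0.8j):
    """Track the roots of x^d - 1 to the roots of `target` (toy example).

    target : polynomial coefficients, highest degree first
    Uses the homotopy H(x, t) = (1 - t) * gamma * (x^d - 1) + t * target(x),
    with t stepped from 0 to 1 and Newton corrections at every step.
    """
    target = np.asarray(target, dtype=complex)
    d = len(target) - 1
    start = np.zeros(d + 1, dtype=complex)
    start[0], start[-1] = 1.0, -1.0                 # coefficients of x^d - 1
    roots = np.exp(2j * np.pi * np.arange(d) / d)   # known start solutions
    for t in np.linspace(0.0, 1.0, steps + 1)[1:]:
        h = (1 - t) * gamma * start + t * target    # coefficients of H(., t)
        dh = np.polyder(h)
        # No predictor step: the previous roots serve as the Newton guess.
        for _ in range(newton_iters):
            roots = roots - np.polyval(h, roots) / np.polyval(dh, roots)
    return roots

# Target: x^2 - 5x + 6 = (x - 2)(x - 3)
roots = track_paths([1, -5, 6])
print(np.sort(roots.real))
```

The complex constant `gamma` keeps the paths away from parameter values where two roots collide. Production solvers add predictor steps, adaptive step sizes, and endgames that this sketch omits.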

During these experiments the authors discover a striking duality: for n × m matrices, the rank‑r model and the model of complementary rank min(n,m) − r + 1 have the same ML degree, and their critical points for the same data are in bijection. For instance, ranks 2 and 4 are dual for 5 × 5 matrices, and rank 1 is dual to full rank, consistent with both having ML degree 1. This “maximum likelihood duality” was later proved rigorously by Draisma and Rodriguez, confirming that the phenomenon is not an artifact of particular examples but a structural property of determinantal varieties.
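The pairing of ranks under this duality is simple enough to state in code. The sketch below assumes the convention that the dual of rank r for an n × m matrix is min(n, m) − r + 1; the function name `dual_rank` is hypothetical.

```python
def dual_rank(n, m, r):
    """Rank paired with r under ML duality, assuming dual = min(n, m) - r + 1."""
    k = min(n, m)
    if not 1 <= r <= k:
        raise ValueError("rank must satisfy 1 <= r <= min(n, m)")
    return k - r + 1

# Pairings for 5 x 5 matrices; rank 3 is its own dual.
pairs = {r: dual_rank(5, 5, r) for r in range(1, 6)}
print(pairs)
```

Applying the map twice returns the original rank, so the ranks between 1 and min(n, m) split into dual pairs, plus one self-dual rank when min(n, m) is odd.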

The paper concludes with several implications. First, the ML degree provides a quantitative measure of model complexity that can be used in model selection and in assessing the difficulty of numerical optimization. Second, the homotopy continuation approach offers a practical way to obtain all stationary points, which is valuable for global optimization strategies and for understanding the landscape of the likelihood function. Third, the duality suggests that statistical inference for low‑rank models can be translated into inference for high‑rank complementary models, potentially simplifying certain computations. Finally, the authors outline future directions: extending the analysis to mixtures of more than two variables, to continuous distributions, and to other algebraic constraints (e.g., symmetric or positive‑definite matrices). They also propose exploring GPU‑accelerated homotopy methods and integrating the algebraic insights into Bayesian posterior approximations.

Overall, the work bridges algebraic geometry, statistics, and numerical computation, delivering both theoretical formulas for the ML degree of rank‑constrained models and concrete algorithms for solving the associated likelihood equations. It opens new avenues for the rigorous study of constrained statistical models and demonstrates the power of algebraic techniques in modern data analysis.