Data-Discriminants of Likelihood Equations
Maximum likelihood estimation (MLE) is a fundamental computational problem in statistics: given data, the problem is to maximize the likelihood function over a statistical model. An algebraic approach to this problem is to solve a highly structured parameterized polynomial system called the likelihood equations. For general choices of data, the number of complex solutions to the likelihood equations is finite and is called the ML-degree of the model. The only solutions to the likelihood equations that are statistically meaningful are the real/positive solutions. However, the number of real/positive solutions is not characterized by the ML-degree. We use discriminants to classify data according to the number of real/positive solutions of the likelihood equations. We call these discriminants data-discriminants (DD). We develop a probabilistic algorithm for computing DDs. Experimental results show that, for the benchmarks we have tried, the probabilistic algorithm is more efficient than the standard elimination algorithm. Based on the computational results, we discuss the real root classification problem for the 3 by 3 symmetric matrix model.
💡 Research Summary
The paper addresses the fundamental computational problem of maximum likelihood estimation (MLE) from an algebraic perspective. By representing a statistical model as an algebraic variety defined by polynomial equations, the authors formulate the likelihood equations using Lagrange multipliers, yielding a structured system of polynomials $F_0,\dots,F_{n+s+1}$ in the data variables $u_0,\dots,u_n$, the probability parameters $p_0,\dots,p_n$, and the multipliers $\lambda_1,\dots,\lambda_{s+1}$. For generic data the system has a finite number of complex solutions, whose cardinality is the model's ML-degree. However, only real, positive solutions are statistically meaningful, and their number is not determined by the ML-degree.
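The Lagrange formulation can be seen on a small example. The following sketch (our own toy illustration, not taken from the paper) sets up the likelihood equations for the Hardy-Weinberg curve, a model whose ML-degree is known to be 1, and solves them for one choice of data using `sympy`; the variable names and data values are assumptions for the demo.

```python
# Toy illustration of likelihood equations via Lagrange multipliers,
# using the Hardy-Weinberg curve p1^2 - 4*p0*p2 = 0 (ML-degree 1).
# The model and the data below are illustrative choices, not the paper's.
import sympy as sp

p0, p1, p2, l1, l2 = sp.symbols('p0 p1 p2 l1 l2')
p = [p0, p1, p2]
u = [2, 3, 5]                      # sample data u0, u1, u2 (generic)

g1 = p0 + p1 + p2 - 1              # probabilities sum to one
g2 = p1**2 - 4*p0*p2               # model constraint (Hardy-Weinberg)

# Critical-point conditions for sum(u_i * log(p_i)) on {g1 = g2 = 0},
# cleared of denominators: u_i = l1*p_i*dg1/dp_i + l2*p_i*dg2/dp_i.
F = [g1, g2] + [u[i] - l1*p[i]*sp.diff(g1, p[i]) - l2*p[i]*sp.diff(g2, p[i])
                for i in range(3)]

sols = sp.solve(F, [p0, p1, p2, l1, l2], dict=True)
print(len(sols))                           # one complex solution: ML-degree 1
print([sols[0][v] for v in (p0, p1, p2)])  # the (here positive, real) MLE
```

Because the ML-degree is 1, the single complex solution is automatically the real, positive MLE; for models of higher ML-degree, it is exactly the count of real/positive solutions among the complex ones that the data-discriminant controls.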
The authors observe that the number of real/positive solutions can change only when the data cross a certain algebraic hypersurface. They identify three sources of such “special” data: (1) non-properness of the projection $\pi: L_X \to \mathbb{C}^{n+1}$ (denoted $L_X^\infty$), (2) points where the Jacobian determinant of the likelihood system vanishes (denoted $L_X^J$), and (3) points where one of the probability coordinates becomes zero (denoted $L_X^p$). Each of these sets is defined by a homogeneous polynomial: $D_X^\infty$, $D_X^J$, and $D_X^p$, respectively. The product of these three polynomials defines the data-discriminant; away from its zero set, the number of real/positive solutions of the likelihood equations is locally constant.
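The classification role a discriminant plays can be seen in a one-variable analogue (again our own toy example, not the paper's system): for the family $x^3 + ax + b$, the discriminant $-4a^3 - 27b^2$ carves the $(a,b)$-plane into regions on which the number of real roots is constant, just as a data-discriminant carves data space into regions with a constant number of real/positive solutions.

```python
# One-variable analogue of a data-discriminant (illustrative only): the
# discriminant of x^3 + a*x + b separates the (a, b)-parameter plane
# into open regions with a constant number of real roots.
import sympy as sp

x, a, b = sp.symbols('x a b')
f = x**3 + a*x + b
D = sp.discriminant(f, x)
print(D)                                   # -4*a**3 - 27*b**2

def real_root_count(av, bv):
    """Count the distinct real roots of f at the sample point (av, bv)."""
    return len(sp.real_roots(f.subs({a: av, b: bv})))

print(real_root_count(-3, 1))              # D > 0 at (-3, 1): three real roots
print(real_root_count(1, 1))               # D < 0 at (1, 1): one real root
```

Sampling one point per connected region of the discriminant's complement, as above, is the same strategy that real root classification applies to the data-discriminant of the likelihood equations.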