Active Set and EM Algorithms for Log-Concave Densities Based on Complete and Censored Data


We develop an active set algorithm for the maximum likelihood estimation of a log-concave density based on complete data. Building on this fast algorithm, we introduce an EM algorithm to treat arbitrarily censored or binned data.


💡 Research Summary

The paper addresses the problem of non‑parametric density estimation under the log‑concavity constraint, a setting that guarantees a unique maximum‑likelihood estimator (MLE) and desirable statistical properties such as consistency and shape regularity. While existing approaches (e.g., isotonic regression, convex‑optimization solvers) can compute the MLE for complete data, they typically require solving large‑scale linear or quadratic programs whose computational cost grows at least cubically with the sample size. Moreover, extending these methods to censored or binned observations usually involves costly numerical integration or Monte‑Carlo approximations, leading to a trade‑off between accuracy and speed.

Active‑set algorithm for complete data
The authors first write the log‑concave density as f(x)=exp(φ(x)), where φ is a concave function. By sorting the observations and exploiting the fact that the optimal φ is piecewise‑linear with knots at a subset of the data points, the estimation problem can be expressed in terms of a finite set of slopes and intercepts. The key insight is that only a small subset of “active intervals” actually influences the KKT conditions at optimality. The algorithm proceeds iteratively:

  1. Initialise φ with a simple linear interpolation of the sorted data.
  2. Evaluate the KKT residuals on each interval; intervals where the residual is non‑zero are marked as violating.
  3. Update the active set by adding violating intervals or merging/splitting existing ones.
  4. Solve a reduced linear (or quadratic) program defined solely on the current active set to obtain a new φ.
  5. Repeat steps 2‑4 until the active set stabilises and the change in log‑likelihood falls below a tolerance.

Because each iteration manipulates only the active intervals—typically O(log n) in number—the per‑iteration cost is linear in n, and the total number of iterations is modest. The authors prove that the algorithm converges to the global MLE and that, once the active set stops changing, convergence is achieved in a single additional iteration.
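As a rough illustration (a hypothetical sketch, not the authors' implementation), the criterion that each reduced solve targets, the normalised log‑likelihood (1/n)·Σ φ(x_i) − ∫ exp(φ), can be evaluated in closed form when φ is piecewise‑linear, since exp(φ) integrates exactly over each linear segment. This closed‑form structure is what keeps the per‑iteration cost linear in n:

```python
import math

def segment_integral(a, b, phi_a, phi_b):
    """Exact integral of exp(phi) over [a, b] when phi is linear
    with phi(a) = phi_a and phi(b) = phi_b."""
    s = (phi_b - phi_a) / (b - a)  # slope of phi on this segment
    if abs(s) < 1e-12:             # flat segment: plain rectangle
        return (b - a) * math.exp(phi_a)
    return (math.exp(phi_b) - math.exp(phi_a)) / s

def objective(knots, phi_vals, data):
    """Normalised log-likelihood criterion
        (1/n) * sum_i phi(x_i) - integral exp(phi)
    for a piecewise-linear phi given by its values at the knots.
    The active-set algorithm maximises this over concave phi whose
    kinks lie only at the currently active knots."""
    segments = list(zip(zip(knots, knots[1:]), zip(phi_vals, phi_vals[1:])))

    def phi(x):
        for (a, b), (pa, pb) in segments:
            if a <= x <= b:  # linear interpolation on the segment
                return pa + (pb - pa) * (x - a) / (b - a)
        raise ValueError("observation outside the knot range")

    loglik = sum(phi(x) for x in data) / len(data)
    integral = sum(segment_integral(a, b, pa, pb)
                   for (a, b), (pa, pb) in segments)
    return loglik - integral
```

For example, the uniform density on [0, 1] corresponds to φ ≡ 0, for which the criterion equals 0 − 1 = −1 regardless of the data; any other concave φ with the same support scores at most as well for uniform data.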

EM extension for censored/binned data
For data that are not observed exactly but only known to fall within intervals ((L_i,U_i]) (right‑censoring, left‑censoring, or general binning), the authors embed the active‑set MLE within an Expectation‑Maximisation framework. In the E‑step, given the current estimate φ^{(t)}, they compute the conditional expectations of the sufficient statistics of the complete data, i.e., (\mathbb{E}_{φ^{(t)}}[\,g(X_i)\mid X_i\in(L_i,U_i]\,]) for the relevant statistics g. In the M‑step, these expected statistics replace the unobserved complete‑data quantities, and the resulting surrogate log‑likelihood is maximised with the active‑set algorithm; alternating the two steps monotonically increases the censored‑data likelihood.
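As a toy, hedged illustration of the E‑step (using an exponential density, a simple log‑concave f(x)=exp(φ(x)) with φ(x)=log λ − λx, in place of the paper's general piecewise‑linear φ^{(t)}), the conditional expectation that replaces an interval‑censored observation is available in closed form, so no Monte‑Carlo approximation is needed:

```python
import math

def estep_conditional_mean(lam, L, U):
    """E[X | L < X <= U] under an Exponential(lam) density, i.e. the
    value substituted for an unobserved X_i known only to lie in
    (L, U].  Derived by integrating x * lam * exp(-lam * x) over
    (L, U] and dividing by the interval's probability mass."""
    # P(L < X <= U) = exp(-lam*L) - exp(-lam*U)
    mass = math.exp(-lam * L) - math.exp(-lam * U)
    # Closed form: 1/lam + (L e^{-lam L} - U e^{-lam U}) / mass
    return 1.0 / lam + (L * math.exp(-lam * L)
                        - U * math.exp(-lam * U)) / mass
```

In the paper's setting the same kind of expectation is taken segment by segment under the current piecewise‑linear φ^{(t)}, where each segment integral is again explicit; as a sanity check, the exponential's memorylessness implies estep_conditional_mean(λ, L+c, U+c) = estep_conditional_mean(λ, L, U) + c.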

