The AI&M Procedure for Learning from Incomplete Data
We investigate methods for parameter learning from incomplete data that is not missing at random. Likelihood-based methods then require the optimization of a profile likelihood that takes all possible missingness mechanisms into account. Optimizing this profile likelihood poses two main difficulties: multiple (local) maxima, and its very high-dimensional parameter space. In this paper a new method is presented for optimizing the profile likelihood that addresses the second difficulty: in the proposed AI&M (adjusting imputation and maximization) procedure the optimization is performed by operations in the space of data completions, rather than directly in the parameter space of the profile likelihood. We apply the AI&M method to learning parameters for Bayesian networks. The method is compared against conservative inference, which takes into account each possible data completion, and against EM. The results indicate that likelihood-based inference is still feasible in the case of unknown missingness mechanisms, and that conservative inference is unnecessarily weak. On the other hand, our results also provide evidence that the EM algorithm is still quite effective when the data is not missing at random.
💡 Research Summary
The paper tackles the notoriously difficult problem of learning model parameters from data that are not missing at random (MNAR). In such settings the likelihood of the observed data must be maximized after profiling out the unknown missingness mechanism, which leads to a profile likelihood that is both highly multimodal and defined over a very high‑dimensional parameter space. Traditional approaches either resort to conservative inference—averaging over every possible data completion—or apply the Expectation‑Maximization (EM) algorithm under the (often violated) Missing‑At‑Random (MAR) assumption. Both strategies have serious drawbacks: conservative inference is overly cautious and computationally prohibitive, while EM can become trapped in local optima and its theoretical justification breaks down when the MAR assumption does not hold.
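To make the MNAR setting concrete, here is a small sketch (not from the paper; the variable names and probabilities are illustrative) of a missingness mechanism where the chance that a value is hidden depends on the value itself. In that case the naive estimate computed from observed cases only, which is what a MAR assumption implicitly licenses, is biased:

```python
import random

random.seed(0)

def mnar_mask(values, p_missing_if_one=0.7, p_missing_if_zero=0.1):
    """Hide each binary value with a probability that depends on the
    value itself -- the defining property of an MNAR mechanism."""
    observed = []
    for v in values:
        p = p_missing_if_one if v == 1 else p_missing_if_zero
        observed.append(None if random.random() < p else v)
    return observed

data = [random.randint(0, 1) for _ in range(1000)]
obs = mnar_mask(data)

# Estimating P(X=1) from the observed cases alone is biased downward,
# because ones go missing far more often than zeros:
seen = [v for v in obs if v is not None]
print(sum(data) / len(data))   # true frequency, near 0.5
print(sum(seen) / len(seen))   # noticeably smaller
```

Under this mechanism the expected observed frequency of ones is roughly 0.25 rather than 0.5, which is exactly the kind of distortion that profiling out the missingness mechanism is meant to guard against.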
To address the dimensionality issue, the authors propose the AI&M (Adjusting Imputation and Maximization) procedure. Rather than searching directly in the parameter space, AI&M operates in the space of data completions. The algorithm iterates two steps. In the Maximization step, given a current imputed complete dataset, a standard maximum‑likelihood estimate of the model parameters is computed (exactly as if the data were fully observed). In the Adjusting step, these newly estimated parameters are used to re‑weight or modify the set of possible completions, effectively moving the imputation toward regions that are more compatible with the current parameter estimate. This alternating scheme can be viewed as a reformulation of EM where the E‑step is replaced by a deterministic or stochastic search over completions, guided by the latest M‑step results. Because each completion is a concrete dataset, the maximization sub‑problem remains low‑dimensional and can exploit existing efficient solvers.
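The alternation between the two steps can be sketched as follows. This is a deliberately simplified toy, not the paper's actual adjusting step: the model is a single Bernoulli parameter, the M-step is the closed-form complete-data MLE, and the adjusting step is caricatured as re-imputing each missing entry with the currently most probable value. The real procedure searches the completion space more carefully, but the control flow is the same.

```python
def maximize(completed):
    """M-step: closed-form maximum-likelihood estimate of a
    Bernoulli parameter theta = P(X=1) from a completed dataset."""
    return sum(completed) / len(completed)

def adjust(observed, theta):
    """Adjusting step (toy version): fill each missing entry (None)
    with the most probable value under the current theta."""
    fill = 1 if theta >= 0.5 else 0
    return [fill if v is None else v for v in observed]

def ai_and_m(observed, theta0=0.5, iters=20):
    """Alternate adjusting and maximization until the loop settles."""
    theta = theta0
    for _ in range(iters):
        completed = adjust(observed, theta)   # move in completion space
        theta = maximize(completed)           # standard complete-data MLE
    return theta

obs = [1, 1, 0, None, None, 1, 0, None, 1, 0]
print(ai_and_m(obs))   # -> 0.7: the three missing entries are imputed as 1
```

Note that each iteration solves only a complete-data estimation problem, which is the point made above: the high-dimensional profile likelihood is never optimized directly.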
The authors instantiate AI&M for Bayesian networks with known structure, focusing on learning the conditional probability tables (CPTs). They generate synthetic data under a deliberately MNAR mechanism, then compare three methods: (1) conservative inference, which enumerates or samples all completions and averages the resulting likelihoods; (2) standard EM assuming MAR; and (3) AI&M. Performance is measured by the Kullback‑Leibler divergence between the true CPTs and the learned ones, as well as by convergence speed and stability across random initializations.
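The evaluation metric mentioned above, Kullback-Leibler divergence between true and learned distributions, can be computed directly for discrete CPT columns. A minimal sketch (the example distributions are made up, not taken from the paper's experiments):

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists of
    probabilities; q must be strictly positive wherever p is."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

true_cpt    = [0.8, 0.2]   # hypothetical true conditional distribution
learned_cpt = [0.7, 0.3]   # hypothetical learned distribution

print(kl_divergence(true_cpt, true_cpt))      # 0.0 -- identical distributions
print(kl_divergence(true_cpt, learned_cpt))   # small positive value
```

Summing this quantity over the rows of every CPT gives a single score per learned network, which is how the three methods can be ranked on the same synthetic ground truth.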
Results show that conservative inference, while theoretically safe, yields parameter estimates that are substantially biased toward the center of the feasible region, reflecting its tendency to “smooth out” the effect of the missingness mechanism. EM, despite violating its MAR premise, often converges to good solutions when initialized reasonably, but it exhibits sensitivity to starting points and occasional divergence on more pathological missingness patterns. AI&M consistently outperforms conservative inference and matches or exceeds EM in terms of KL divergence, while requiring far fewer evaluations of the full completion space. Moreover, AI&M demonstrates a more robust convergence behavior: the Adjusting step steers the imputation away from poor local optima, reducing the chance of getting stuck.
The paper’s contributions are threefold. First, it introduces a novel optimization framework that shifts the search from an intractable high‑dimensional parameter manifold to a more manageable completion space. Second, it provides empirical evidence that likelihood‑based learning remains feasible for MNAR data when equipped with the AI&M procedure, challenging the prevailing belief that only conservative or fully Bayesian treatments are viable. Third, it clarifies the practical role of EM in MNAR contexts, showing that EM can still be competitive if complemented with good initialization or hybridized with AI&M‑style adjustments.
Limitations are acknowledged. The number of possible completions grows exponentially with the amount of missing data, so exhaustive enumeration is impossible for large datasets. The authors suggest stochastic sampling, importance weighting, or heuristic search (e.g., beam search, Monte‑Carlo tree search) as scalable approximations, but these extensions are left for future work. Additionally, the current study assumes a fixed network structure; extending AI&M to simultaneous structure learning would be a natural but non‑trivial next step.
In conclusion, AI&M offers a promising avenue for maximum‑likelihood estimation under MNAR conditions, especially for structured probabilistic models such as Bayesian networks. By exploiting the relatively low‑dimensional nature of data completions, it mitigates the curse of dimensionality inherent in profile likelihood optimization, delivers accurate parameter estimates, and retains computational tractability. The work opens the door to broader applications, including other graphical models, latent‑variable deep networks, and hybrid systems where missingness mechanisms are partially known or can be learned jointly with model parameters.