Neural Parameter Estimation with Incomplete Data


Advances in artificial intelligence (AI) and deep learning have led to neural networks being used to generate lightning-speed answers to complex science questions, paintings in the style of Monet, or stories like those of Twain. Leveraging their computational speed and flexibility, neural networks are also being used to facilitate fast, likelihood-free statistical inference. However, it is not straightforward to use neural networks with data that for various reasons are incomplete, which precludes their use in many applications. A recently proposed approach to remedy this issue uses an appropriately padded data vector and a vector that encodes the missingness pattern as input to a neural network. While computationally efficient, this “masking” approach is not robust to the missingness mechanism and can result in statistically inefficient inferences. Here, we propose an alternative approach that is based on the Monte Carlo expectation-maximization (EM) algorithm. Our EM approach is likelihood-free, substantially faster than the conventional EM algorithm as it does not require numerical optimization at each iteration, and more statistically efficient than the masking approach. This research addresses the prototypical problem of how AI methodology can be improved by introducing Bayesian statistical thinking. We compare the two approaches to missingness using simulated incomplete data from a variety of spatial models. The utility of the methodology is shown on Arctic sea-ice data, analyzed using a novel hidden Potts model with an intractable likelihood.


💡 Research Summary

The paper addresses a critical gap in the application of neural Bayes estimators (NBEs) to settings where data are incomplete. While recent “masking” approaches—replacing missing entries with a constant (e.g., zero) and feeding a binary mask alongside the data into a neural network—offer computational speed, they suffer from two major drawbacks. First, they require an explicit model for the missingness mechanism; misspecification leads to biased and inefficient inference, especially under non‑MCAR (MAR or MNAR) conditions. Second, the mask adds an extra input dimension, making the learning task harder and degrading generalisation when missingness patterns are complex.
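To make the masking construction concrete, here is a minimal sketch in Python (NumPy) of how an incomplete data vector is padded with a constant and paired with a binary mask before being fed to the network. The function name `mask_inputs` and the use of `NaN` to flag missing entries are illustrative choices, not from the paper.

```python
import numpy as np

def mask_inputs(z, missing, pad_value=0.0):
    """Build the padded data vector U and binary mask W used by the
    masking approach: missing entries are replaced by a constant and
    the missingness pattern is passed as an extra input channel."""
    w = (~missing).astype(float)         # 1 = observed, 0 = missing
    u = np.where(missing, pad_value, z)  # pad missing entries with the constant
    return u, w

# Toy example: a length-4 data vector with two missing entries.
z = np.array([1.5, np.nan, -0.3, np.nan])
missing = np.isnan(z)
u, w = mask_inputs(z, missing)
# u = [1.5, 0.0, -0.3, 0.0], w = [1.0, 0.0, 1.0, 0.0]
```

Note that the pair (U, W) doubles the input dimension relative to a complete data vector, which is precisely the source of the harder learning task discussed above.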

To overcome these issues, the authors propose a likelihood‑free Monte Carlo Expectation–Maximization (MCEM) framework tailored to NBEs. The key idea is to treat missing values as latent variables, sample them conditionally on the current parameter estimate, and then apply a pre‑trained NBE (which has been trained only on complete data) to obtain a MAP estimate. The E‑step is approximated by Monte Carlo draws of the missing data; the M‑step requires no numerical optimisation because the NBE directly maps a complete data vector to the MAP estimate. Consequently, the algorithm avoids explicit likelihood evaluation and is robust to the missingness mechanism.
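The iteration described above can be sketched as follows. This is a toy Python illustration, not the paper's implementation: the data are i.i.d. N(θ, 1) so the conditional simulator for missing values is trivial, the "NBE" is stood in by the sample mean (the MAP under a flat prior), and averaging the estimator's output over the conditional draws is a simplification of the paper's M-step. The names `nbe`, `sample_missing`, and `mcem_nbe` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def nbe(z):
    # Stand-in for a pre-trained neural Bayes estimator: for the toy
    # model Z_i ~ N(theta, 1) with a flat prior, the MAP estimate is
    # the sample mean. A real NBE would be a trained neural network.
    return z.mean()

def sample_missing(theta, n_missing):
    # Conditional simulator for the missing components given theta.
    # The components here are independent, so we draw from N(theta, 1).
    return rng.normal(theta, 1.0, size=n_missing)

def mcem_nbe(z_obs, n_missing, theta0=0.0, n_draws=200, n_iter=30):
    theta = theta0
    for _ in range(n_iter):
        # E-step: Monte Carlo draws of the missing data given theta.
        estimates = []
        for _ in range(n_draws):
            z_mis = sample_missing(theta, n_missing)
            z_full = np.concatenate([z_obs, z_mis])
            # M-step analogue: no numerical optimisation -- the NBE maps
            # each completed data vector directly to a parameter estimate.
            estimates.append(nbe(z_full))
        theta = float(np.mean(estimates))
    return theta

z_obs = rng.normal(2.0, 1.0, size=50)  # observed data, true theta = 2
theta_hat = mcem_nbe(z_obs, n_missing=10)
```

The key point the sketch captures is that no likelihood is ever evaluated: each iteration only requires simulating the missing components and one forward pass of the estimator per completed data set.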

The methodology section first reviews NBEs, formalising Bayes risk minimisation and describing how a neural network f(·;γ) is trained to approximate the Bayes estimator by minimising empirical loss over simulated (θ, Z) pairs. The masking approach is then detailed in Algorithm 1, showing how simulated complete data are padded, masked, and fed to a network g(U,W;γ). The MCEM approach replaces the padding step with conditional simulation of the missing components, keeping the network architecture unchanged and preserving statistical efficiency.
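The training recipe reviewed above — simulate (θ, Z) pairs and minimise an empirical Bayes risk over a family of estimators f(·;γ) — can be illustrated with a deliberately tiny example. Here the "network" is a single shrinkage coefficient γ applied to the sample mean, for which the empirical squared-error risk has a closed-form least-squares minimiser; with prior θ ~ N(0, 1) and Z_i | θ ~ N(θ, 1), the learned γ should approach the Bayes shrinkage factor n/(n+1). This is a pedagogical stand-in for the deep architectures used in practice.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10       # sample size of each simulated data set
K = 20000    # number of simulated (theta, Z) training pairs

# Simulate training pairs: theta from the prior, then Z | theta from the model.
theta = rng.normal(0.0, 1.0, size=K)               # prior N(0, 1)
Z = rng.normal(theta[:, None], 1.0, size=(K, n))   # Z_i | theta ~ N(theta, 1)

# One-parameter "network" f(Z; gamma) = gamma * mean(Z). Minimising the
# empirical squared-error Bayes risk over gamma is ordinary least squares,
# whose solution should recover the Bayes shrinkage factor n / (n + 1).
zbar = Z.mean(axis=1)
gamma = float(zbar @ theta) / float(zbar @ zbar)
```

Because the estimator is fit to simulations from the joint distribution of (θ, Z), it approximates the Bayes estimator without any likelihood evaluation — the same principle that underlies the NBEs used in the paper, where f(·;γ) is a neural network trained on complete data only.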

Extensive simulation studies compare the two methods across a suite of spatial models (Gaussian fields, Markov random fields, hidden Potts models) and three missingness regimes (MCAR, MAR, MNAR). Results consistently show that MCEM yields lower bias, smaller mean‑squared error, and better confidence‑interval coverage than masking, particularly under MNAR where masking can be severely biased. Although MCEM incurs extra cost for sampling missing values each iteration, the number of EM iterations required for convergence is modest, making overall runtime comparable to or even faster than masking in many scenarios.

The authors demonstrate real‑world utility by analysing Arctic sea‑ice thickness data using a novel hidden Potts model with an intractable likelihood. By applying the MCEM‑based NBE, they obtain stable parameter estimates and realistic uncertainty quantification despite substantial missingness due to cloud cover. This application underscores the method’s ability to handle complex, high‑dimensional spatial models where traditional likelihood‑based EM would be prohibitive.

Both algorithms are implemented in the open‑source “NeuralEstimators” package (available for Julia and R), providing a user‑friendly interface and reproducible code. The paper concludes that while masking remains attractive for its simplicity and speed, the MCEM approach offers superior statistical guarantees with only modest additional computation. Future work is suggested on extending the framework to non‑spatial data, integrating automatic learning of the missingness model, and exploring hybrid schemes that combine the strengths of both approaches.

