Majorization-Minimization Networks for Inverse Problems: An Application to EEG Imaging
Inverse problems are often ill-posed and require optimization schemes with strong stability and convergence guarantees. While learning-based approaches such as deep unrolling and meta-learning achieve strong empirical performance, they typically lack explicit control over descent and curvature, limiting robustness. We propose a learned Majorization-Minimization (MM) framework for inverse problems within a bilevel optimization setting. Instead of learning a full optimizer, we learn a structured curvature majorant that governs each MM step while preserving classical MM descent guarantees. The majorant is parameterized by a lightweight recurrent neural network and explicitly constrained to satisfy valid MM conditions. For cosine-similarity losses, we derive explicit curvature bounds yielding diagonal majorants. When analytic bounds are unavailable, we rely on efficient Hessian-vector product-based spectral estimation to automatically upper-bound local curvature without forming the Hessian explicitly. Experiments on EEG source imaging demonstrate improved accuracy, stability, and cross-dataset generalization over deep-unrolled and meta-learning baselines.
💡 Research Summary
The paper introduces a learned Majorization‑Minimization (MM) framework for solving ill‑posed inverse problems, with a focus on EEG source imaging (ESI). Classical MM algorithms guarantee monotone descent by constructing a quadratic surrogate (majorant) that upper‑bounds the objective at the current iterate; minimizing this surrogate yields a provably decreasing update. However, traditional MM requires an analytically derived curvature bound for each loss term, which becomes infeasible when the regularizer is a deep neural network or when the loss is non‑standard, such as cosine similarity.
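Concretely, the quadratic surrogate described above takes the standard MM form (notation assumed to match the summary below: $\xi$ is the objective and $P_{x_k}$ the curvature majorant at iterate $x_k$):

$$g(x \mid x_k) = \xi(x_k) + \nabla\xi(x_k)^\top (x - x_k) + \tfrac{1}{2}(x - x_k)^\top P_{x_k}\,(x - x_k),$$

with $g(x \mid x_k) \ge \xi(x)$ for all $x$ and equality at $x = x_k$. Minimizing the surrogate gives the update $x_{k+1} = x_k - P_{x_k}^{-1}\nabla\xi(x_k)$, which is guaranteed not to increase $\xi$.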
To bridge this gap, the authors propose to learn only the curvature majorant while preserving the MM structure. They parameterize the majorant matrix $P_x$ as a diagonal matrix whose entries are generated by a lightweight recurrent neural network (RNN) that takes the current state $x$ and its gradient $\nabla\xi(x)$ as inputs. The update therefore takes the form $\hat{x} = x - p \odot \nabla\xi(x)$, where $p$ is the vector of inverse diagonal curvatures output by the RNN. Restricting to a diagonal majorant makes inversion trivial, keeping the per-iteration cost low.
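The update above can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation: the RNN is replaced by a fixed, hand-chosen $p$ (set from the spectral norm so the majorant condition holds on a toy quadratic), and the objective is a stand-in least-squares problem.

```python
import numpy as np

def mm_step(x, grad, p):
    """One learned-MM update: x_hat = x - p ⊙ ∇ξ(x).

    p holds the inverse diagonal curvatures; in the paper they are
    produced by a small RNN from (x, ∇ξ(x)) — here p is passed in
    directly as a stand-in.
    """
    return x - p * grad

# Toy quadratic objective ξ(x) = 0.5 * ||A x - b||^2 as a stand-in problem.
A = np.array([[3.0, 0.0], [0.0, 1.0]])
b = np.array([6.0, 2.0])
grad = lambda x: A.T @ (A @ x - b)

# A valid diagonal majorant here: entries of p at most 1 / λ_max(AᵀA),
# so each step is a monotone MM descent step.
p = np.full(2, 1.0 / np.linalg.norm(A.T @ A, 2))
x = np.zeros(2)
for _ in range(200):
    x = mm_step(x, grad(x), p)
# x converges toward the least-squares solution [2, 2].
```

With a learned, state-dependent $p$ the loop is identical; only the source of $p$ changes.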
A major theoretical contribution is the derivation of explicit curvature bounds for cosine-similarity loss, which is widely used in ESI because it is scale-invariant and emphasizes directional alignment. The authors prove that on any bounded convex subset of the feasible domain (where $\|x\| \ge \nu > 0$), the gradient of the cosine similarity is Lipschitz continuous with a constant that can be expressed in terms of the forward operator norm $\|L\|$, the norm of the current iterate, and the Jacobian/Hessian norms of the learned representation $\Phi(x;\theta)$. This yields a closed-form interval $\nu\mathbf{1} \preceq p \preceq \frac{1}{\mu_1 + \lambda\mu_2}\mathbf{1}$ guaranteeing the majorant condition.
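For reference, the cosine data term and its gradient can be written out directly. The sketch below is an assumption-laden illustration of a generic loss $\xi(x) = 1 - \cos(Lx, y)$ with a random forward operator, not the paper's ESI setup; a central-difference check confirms the analytic gradient.

```python
import numpy as np

def cosine_loss(x, L, y):
    """ξ(x) = 1 - cos(Lx, y): scale-invariant directional mismatch."""
    z = L @ x
    return 1.0 - (z @ y) / (np.linalg.norm(z) * np.linalg.norm(y))

def cosine_grad(x, L, y):
    """Analytic gradient of the cosine term, valid when ||Lx|| > 0."""
    z = L @ x
    nz, ny = np.linalg.norm(z), np.linalg.norm(y)
    c = (z @ y) / (nz * ny)
    return -L.T @ (y / (nz * ny) - c * z / nz**2)

# Finite-difference check of the gradient on random data.
rng = np.random.default_rng(0)
L_op = rng.standard_normal((4, 3))
y = rng.standard_normal(4)
x = rng.standard_normal(3)
g = cosine_grad(x, L_op, y)
eps = 1e-6
g_fd = np.array([
    (cosine_loss(x + eps * e, L_op, y) - cosine_loss(x - eps * e, L_op, y)) / (2 * eps)
    for e in np.eye(3)
])
# g and g_fd agree to finite-difference accuracy.
```

The Lipschitz bound in the paper controls how fast this gradient can change on the region $\|x\| \ge \nu$, which is exactly what a valid step size $p$ must respect.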
When analytic bounds are unavailable (e.g., for highly nonlinear learned regularizers), the paper introduces an automatic curvature estimation scheme based on Hessian-vector products and power iteration. By estimating the dominant eigenvalue $\lambda_{\max}$ of the Hessian, the algorithm sets $P_x = \lambda_{\max} I$, which is provably an upper bound on the local curvature, without ever forming the full Hessian.
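The HVP-based spectral estimate can be sketched as follows. This is a generic illustration under stated assumptions, not the paper's code: the Hessian-vector product is approximated by central differences of the gradient, and the method is verified on a quadratic whose Hessian (and hence $\lambda_{\max}$) is known.

```python
import numpy as np

def hvp(grad_fn, x, v, eps=1e-5):
    """Hessian-vector product H(x) v via central differences of the
    gradient — the Hessian is never formed explicitly."""
    return (grad_fn(x + eps * v) - grad_fn(x - eps * v)) / (2 * eps)

def estimate_lambda_max(grad_fn, x, dim, iters=50, seed=0):
    """Power iteration on v ↦ H(x) v to estimate the dominant eigenvalue."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(dim)
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(iters):
        hv = hvp(grad_fn, x, v)
        lam = v @ hv                 # Rayleigh quotient
        v = hv / np.linalg.norm(hv)
    return lam

# Quadratic test problem with Hessian diag(5, 2, 1), so λ_max = 5.
H = np.diag([5.0, 2.0, 1.0])
grad_fn = lambda x: H @ x
lam = estimate_lambda_max(grad_fn, np.zeros(3), 3)
# lam ≈ 5; setting P_x = lam * I then upper-bounds the local curvature.
```

Each iteration costs two gradient evaluations, so the estimate stays cheap even in high dimension.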
Training is cast as a bilevel optimization problem. The inner level runs the learned MM iterations to produce a reconstruction $\hat{x}(\theta)$; the outer level updates the RNN parameters $\theta$ by back-propagating a task-specific loss (e.g., reconstruction error) through the unrolled MM steps. Because each inner iteration satisfies the MM descent property, the overall bilevel problem meets the standard assumptions required for convergence to stationary points in non-convex bilevel optimization, as established in prior works.
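The bilevel structure can be condensed into a toy sketch. Everything here is a simplifying assumption for illustration: a single scalar parameter $\theta$ stands in for the RNN weights (mapped through a sigmoid so the step size always satisfies the majorant bound), and the outer gradient is taken by finite differences rather than backpropagation through the unrolled steps.

```python
import numpy as np

A = np.diag([4.0, 1.0]); b = np.array([4.0, 3.0])
x_true = np.linalg.solve(A, b)              # ground truth for the outer loss

def inner_mm(theta, steps=20):
    """Inner level: unrolled MM iterations with learned step p = σ(θ)/λ_max,
    so 0 < p < 1/λ_max and every step is a valid MM descent step."""
    p = 1.0 / (1.0 + np.exp(-theta)) / np.linalg.norm(A.T @ A, 2)
    x = np.zeros(2)
    for _ in range(steps):
        x = x - p * (A.T @ (A @ x - b))     # x ← x − p ⊙ ∇ξ(x)
    return x

def outer_loss(theta):
    """Outer level: reconstruction error of the unrolled inner solver."""
    return np.sum((inner_mm(theta) - x_true) ** 2)

# Outer update by finite differences (the paper backpropagates through
# the unrolled steps; approximated here to keep the sketch self-contained).
theta, lr, eps = 0.0, 5.0, 1e-5
for _ in range(100):
    g = (outer_loss(theta + eps) - outer_loss(theta - eps)) / (2 * eps)
    theta -= lr * g
# outer_loss(theta) decreases as θ learns a larger, still-valid step size.
```

The key point the sketch preserves: the outer loop only tunes the curvature parameter, while every inner iterate remains a monotone MM step.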
Empirical evaluation on publicly available EEG datasets demonstrates that the learned MM solver outperforms state-of-the-art deep-unrolled methods (such as LISTA and unrolled ADMM) and meta-learning baselines (e.g., Meta-Curvature, ModGrad). The proposed method achieves 10–15% lower reconstruction error, converges faster, and shows superior cross-dataset generalization, indicating robustness to distribution shifts. Despite using only diagonal majorants, its convergence speed matches or exceeds that of full-matrix quasi-Newton approaches.
In summary, the paper delivers a novel optimization paradigm that combines the interpretability and theoretical guarantees of MM with the adaptability of learned curvature estimation. This approach is particularly valuable for high‑dimensional, noisy inverse problems where handcrafted curvature bounds are impractical, and it opens avenues for extending learned MM to other loss functions, non‑diagonal majorants, and real‑time neuroimaging applications.