Characterizations of inexact proximal operators
Proximal operators are now ubiquitous in non-smooth optimization. Since their introduction in the seminal work of Moreau, many papers have shown their effectiveness on a wide variety of problems, culminating in their use to construct convergent deep learning methods. The characterization of these operators for non-convex penalties was completed recently in [Gribonval et al., A characterization of proximity operators, 2020]. In this paper, we propose to follow this line of work by characterizing inexact proximal operators, thus providing an answer to what constitutes a good approximation of these operators. We propose several definitions of approximations and discuss their regularity, approximation power, and fixed points. Equipped with these characterizations, we investigate the convergence of proximal algorithms in the presence of errors that may be non-summable and/or non-vanishing. In particular, we look at the proximal point algorithm, and at the forward-backward, Peaceman-Rachford, and Douglas-Rachford algorithms, when minimizing the sum of a weakly convex function (whose proximal operator is approximated) and a strongly convex function.
💡 Research Summary
This paper addresses a fundamental gap in modern non‑smooth optimization: the behavior of proximal algorithms when the proximal operator is computed only approximately, possibly with errors that are neither summable nor vanishing. Building on the recent complete characterization of exact proximal operators for non‑convex penalties (Gribonval et al., 2020), the authors introduce a systematic framework for “inexact proximal operators” and study their properties in depth.
The authors first enumerate six distinct ways of approximating the proximal mapping of a function ϕ, denoted (a)–(f):
- (a) direct additive error on the output;
- (b) perturbation of the argument;
- (c) ε‑subdifferential of the proximal sub‑problem;
- (d) gradient of an ε‑perturbed Moreau envelope ψε;
- (e) ε‑subgradient of ψ;
- (f) a Hamilton‑Jacobi‑based formulation using the Moreau envelope uε.
Types (a)–(c) are well known in convex optimization; (d) appears in the plug‑and‑play literature; (e) is newly introduced; (f) has historical roots but has recently been revisited.
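As a concrete illustration (not taken from the paper), the ℓ1 penalty ϕ = λ‖·‖₁ has a closed-form prox, soft-thresholding; the sketch below builds type (a) and type (b) approximations around it. The function names and the random error model are illustrative assumptions.

```python
import numpy as np

def prox_l1(x, lam=1.0):
    """Exact prox of phi = lam*||.||_1: soft-thresholding (closed form)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def _error(shape, eps, rng):
    """A random direction rescaled so the error has norm exactly eps."""
    e = rng.normal(size=shape)
    return e * (eps / max(np.linalg.norm(e), 1e-12))

def prox_l1_type_a(x, lam=1.0, eps=1e-2, rng=None):
    """Type (a): additive error on the output, ||g(x) - prox(x)|| <= eps."""
    rng = np.random.default_rng(rng)
    return prox_l1(x, lam) + _error(np.shape(x), eps, rng)

def prox_l1_type_b(x, lam=1.0, eps=1e-2, rng=None):
    """Type (b): perturbation of the argument, g(x) = prox(x + e), ||e|| <= eps."""
    rng = np.random.default_rng(rng)
    return prox_l1(x + _error(np.shape(x), eps, rng), lam)
```

Since soft-thresholding is 1-Lipschitz, type (b) also satisfies ‖g(x) − proxϕ(x)‖ ≤ ε here, so both approximations admit σ(ε) = ε in the sense of the quality criterion below.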
To evaluate any ε‑approximation g of proxϕ, three criteria are proposed:
- Quality (Definition 1) – there exists a function σ(ε) such that ‖g(x)−proxϕ(x)‖ ≤ σ(ε) for all x.
- Admissibility (Definition 2) – g possesses at least one fixed point in a neighbourhood of a local minimizer of ϕ.
- Regularity (Definition 3) – g is (L_g, γ)‑Lipschitz, i.e., ‖g(x)−g(y)‖ ≤ L_g‖x−y‖ + γ.
Table 1 summarizes, for each approximation type, whether these properties hold and under which additional assumptions (e.g., convexity of ϕ, boundedness of ε, or problem‑specific structure). Notably, approximations of types (c), (e), and (f) automatically satisfy admissibility and regularity without extra problem information, while (a) and (b) may fail to have fixed points unless the error is sufficiently small.
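These criteria can be checked numerically on a toy example. Below, ϕ = (λ/2)‖·‖² (whose exact prox is x/(1+λ)) is approximated with a deterministic error of norm below ε; the check functions, the error model, and all constants are illustrative assumptions, not the paper's.

```python
import numpy as np

def prox_sq(x, lam=1.0):
    """Exact prox of phi(x) = (lam/2)*||x||^2: simple closed form x/(1+lam)."""
    return x / (1.0 + lam)

def g_inexact(x, lam=1.0, eps=0.05):
    """An eps-approximation: exact prox plus a deterministic error of norm < eps."""
    return prox_sq(x, lam) + eps * x / (1.0 + np.linalg.norm(x))

def check_quality(g, prox, xs, sigma_eps):
    """Definition 1: ||g(x) - prox(x)|| <= sigma(eps) on all test points."""
    return all(np.linalg.norm(g(x) - prox(x)) <= sigma_eps for x in xs)

def check_regularity(g, xs, L_g, gamma):
    """Definition 3: ||g(x) - g(y)|| <= L_g*||x - y|| + gamma on sampled pairs."""
    return all(np.linalg.norm(g(x) - g(y)) <= L_g * np.linalg.norm(x - y) + gamma
               for x in xs for y in xs)

rng = np.random.default_rng(0)
xs = [rng.normal(size=5) for _ in range(20)]
lam, eps = 1.0, 0.05
# The error has norm < eps, so sigma(eps) = eps works; the error map is
# bounded by eps, so regularity holds with the exact prox's constant and gamma = 2*eps.
assert check_quality(lambda v: g_inexact(v, lam, eps),
                     lambda v: prox_sq(v, lam), xs, sigma_eps=eps)
assert check_regularity(lambda v: g_inexact(v, lam, eps), xs,
                        L_g=1.0 / (1.0 + lam), gamma=2 * eps)
```

Sampling-based checks like these can refute, but never prove, the uniform bounds in the definitions; here the bounds are verifiable by hand for the chosen error map.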
The second major contribution is a convergence analysis of four classic proximal algorithms when the exact proximal operator of ϕ is replaced by an inexact mapping g that satisfies the above criteria:
- Proximal Point Algorithm (PPA) – convergence to a minimizer is guaranteed if the sequence of precisions ε_k is bounded and σ(ε_k) → 0, even when Σ‖e_k‖ = ∞.
- Forward‑Backward Splitting (FB) – assuming f is L‑smooth and μ‑strongly convex and ϕ is ρ‑weakly convex (ρ ≤ 1), any (L_g, γ)‑Lipschitz approximation g with L_g close to the Lipschitz constant of proxϕ and with admissible fixed points yields convergence of the iterates to a solution, provided the step size respects the usual FB bound and ε_k → 0.
- Peaceman‑Rachford (PR) and Douglas‑Rachford (DR) – the authors show that, under the same weak‑convex/strong‑convex decomposition, the composite operator (Id − g ∘ prox_f) remains non‑expansive enough to ensure convergence of the generated sequence to a fixed point of the exact algorithm. The analysis relaxes the traditional requirement that the error be summable; boundedness together with vanishing ε_k suffices.
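A minimal sketch of the PPA regime above, assuming f(x) = ½‖x − b‖² (so the exact prox is in closed form) and an error schedule ‖e_k‖ = c/(k+1), which vanishes but is not summable; all names and constants are illustrative:

```python
import numpy as np

def prox_tf(x, b, t=1.0):
    """Exact prox of t*f at x, for f(x) = 0.5*||x - b||^2."""
    return (x + t * b) / (1.0 + t)

def inexact_ppa(b, x0, n_iter=2000, t=1.0, c=1.0, seed=0):
    """PPA with a type (a) error of norm exactly c/(k+1): non-summable, vanishing."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for k in range(n_iter):
        e = rng.normal(size=x.shape)
        e *= (c / (k + 1)) / max(np.linalg.norm(e), 1e-12)  # ||e_k|| = c/(k+1)
        x = prox_tf(x, b, t) + e  # inexact proximal step
    return x

b = np.array([2.0, -1.0])
x_star = inexact_ppa(b, x0=np.zeros(2))
# x_star lands close to the minimizer b even though sum_k ||e_k|| diverges
```

Here each exact step contracts the distance to b by 1/(1+t), so the residual error after k steps is dominated by the most recent (vanishing) errors, which is the mechanism behind the σ(ε_k) → 0 condition.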
A particularly important insight is that the strong convexity of f compensates for the lack of summability in the error sequence, allowing the algorithms to tolerate persistent, bounded inaccuracies in the proximal step. This is highly relevant for modern “plug‑and‑play” or “learned‑prox” schemes where a neural network denoiser replaces proxϕ and the true functional ϕ is unknown, making it impossible to verify classical error‑summability conditions.
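A toy version of such a plug‑and‑play forward‑backward scheme, with a deliberately mis-tuned soft-thresholding standing in for a learned denoiser; the matrix A, data y, and every parameter value are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.eye(4) + 0.1 * rng.normal(size=(4, 4))  # well-conditioned => f strongly convex
x_true = np.array([1.0, 0.0, -2.0, 0.0])
y = A @ x_true

def denoiser(z, thresh):
    """Surrogate 'denoiser': soft-thresholding, used with a threshold that is
    10% off, i.e. a persistent, bounded inexactness in the proximal step."""
    return np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)

L = np.linalg.norm(A.T @ A, 2)  # Lipschitz constant of grad f (spectral norm)
tau = 1.0 / L                   # step size within the usual FB bound
reg = 0.1                       # nominal l1 weight
x = np.zeros(4)
for _ in range(500):
    grad = A.T @ (A @ x - y)                       # gradient (forward) step
    x = denoiser(x - tau * grad, 1.1 * tau * reg)  # inexact backward step
# The iterates settle near a minimizer of 0.5*||Ax - y||^2 + reg*||x||_1,
# despite the proximal step never being exact.
```

Because f is strongly convex, the gradient step is a contraction, which absorbs the constant threshold mismatch; this is the compensation effect described above.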
The paper also provides concrete examples and numerical experiments. For total variation regularization, the authors compare an ε‑subgradient approximation (type c) with a deep CNN denoiser trained to mimic the proximal step (type d). Results on standard imaging benchmarks demonstrate that the theoretical bounds on σ(ε) and L_g translate into comparable empirical convergence rates and final reconstruction quality. Additional experiments on sparse regression illustrate how type f approximations, derived from solving a Hamilton‑Jacobi equation, can be used when the exact proximal map is unavailable.
In the appendices, detailed proofs of all main theorems are given, along with auxiliary lemmas on ε‑subdifferentials, cocoercivity, and the relationship between weak convexity and the ρ‑regularized Moreau envelope. The authors also discuss limitations: admissibility cannot be guaranteed for arbitrary approximations without problem‑specific information, and the analysis currently assumes deterministic error models (random errors are left for future work).
Overall, this work delivers a comprehensive taxonomy of inexact proximal operators, establishes rigorous quality and regularity criteria, and leverages them to prove robust convergence of several cornerstone proximal algorithms under realistic, non‑summable error regimes. The results bridge a critical theoretical gap for practical large‑scale optimization and learning pipelines where exact proximal steps are infeasible, thereby offering both a deeper understanding and actionable guidelines for algorithm designers.