The adaptive projected subgradient method constrained by families of quasi-nonexpansive mappings and its application to online learning

Many online, i.e., time-adaptive, inverse problems in signal processing and machine learning fall under the wide umbrella of the asymptotic minimization of a sequence of non-negative, convex, and continuous functions. To incorporate a-priori knowledge into the design, the asymptotic minimization task is usually constrained to a fixed closed convex set dictated by the available a-priori information. To exploit the available information more flexibly, the present manuscript extends the Adaptive Projected Subgradient Method (APSM) by introducing an algorithmic scheme that incorporates a-priori knowledge via a sequence of strongly attracting quasi-nonexpansive mappings in a real Hilbert space. The benefits offered to online learning tasks by the proposed method thus unfold in two ways: 1) the rich class of quasi-nonexpansive mappings provides a plethora of ways to cast a-priori knowledge, and 2) by introducing a sequence of such mappings, the proposed scheme is able to capture the time-varying nature of a-priori information. The convergence properties of the algorithm are studied, several special cases of the method with wide applicability are derived, and the potential of the proposed scheme is demonstrated on an online sparse system/signal recovery task of growing practical importance.


💡 Research Summary

The paper addresses a fundamental class of online inverse problems that can be expressed as the asymptotic minimization of a sequence of non-negative, convex, continuous functions $(f_n)_{n\ge0}$. In many practical signal-processing and machine-learning scenarios, additional a-priori information is incorporated by constraining the iterates to a fixed closed convex set $C$. While this approach works well when the prior knowledge is static, it becomes inadequate when the knowledge evolves over time or possesses a more complex structure.

To overcome this limitation, the authors extend the Adaptive Projected Subgradient Method (APSM) by replacing the static projection onto $C$ with a sequence of strongly attracting quasi-nonexpansive mappings $(T_n)_{n\ge0}$ defined on a real Hilbert space $\mathcal{H}$. A mapping $T$ is quasi-nonexpansive if the inequality $\|Tx - p\| \le \|x - p\|$ holds for every $x \in \mathcal{H}$ and every fixed point $p \in \operatorname{Fix}(T)$; the additional “strongly attracting” property introduces a positive constant $\alpha_n$ for each $T_n$ such that $\|T_n x - p\|^2 \le \|x - p\|^2 - \alpha_n \|x - T_n x\|^2$. This condition guarantees that each iteration not only avoids increasing the distance to the fixed-point set but pulls the iterate strictly closer whenever the mapping actually moves it, which is essential for handling time-varying constraints.
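A classical concrete instance of such a mapping (a minimal sketch, not code from the paper) is the relaxed projection $T = I + \lambda(P_C - I)$ onto a closed convex set $C$ with relaxation $\lambda \in (0, 2)$: it satisfies $\operatorname{Fix}(T) = C$ and is strongly attracting with $\alpha = (2 - \lambda)/\lambda$. The Euclidean ball below is an arbitrary illustrative choice of $C$.

```python
import numpy as np

def project_ball(x, center, radius):
    """Orthogonal projection onto the closed Euclidean ball B(center, radius)."""
    d = x - center
    norm = np.linalg.norm(d)
    return x if norm <= radius else center + radius * d / norm

def relaxed_projection(x, center, radius, lam=1.5):
    """Relaxed projection T = I + lam * (P_C - I) with 0 < lam < 2.

    A classical strongly attracting quasi-nonexpansive mapping with
    attraction constant alpha = (2 - lam) / lam and Fix(T) = C.
    """
    return x + lam * (project_ball(x, center, radius) - x)

# Empirical check of ||Tx - p||^2 <= ||x - p||^2 - alpha * ||x - Tx||^2.
rng = np.random.default_rng(0)
center, radius, lam = np.zeros(3), 1.0, 1.5
alpha = (2 - lam) / lam
p = np.array([0.5, 0.0, 0.0])            # p lies inside the ball, so T(p) = p
for _ in range(5):
    x = 3.0 * rng.normal(size=3)
    Tx = relaxed_projection(x, center, radius, lam)
    lhs = np.linalg.norm(Tx - p) ** 2
    rhs = np.linalg.norm(x - p) ** 2 - alpha * np.linalg.norm(x - Tx) ** 2
    assert lhs <= rhs + 1e-9
```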

The algorithm proceeds as follows at iteration $n$ (a minimal code sketch follows after the step-size discussion):

  1. Compute a subgradient $g_n \in \partial f_n(x_n)$ of the current loss function at the current iterate $x_n$.
  2. Take an adaptive subgradient step with step size $\mu_n > 0$: $y_n = x_n - \mu_n g_n$.
  3. Apply the current quasi-nonexpansive mapping: $x_{n+1} = T_n(y_n)$.

The step sizes are required to satisfy the classic diminishing-but-non-summable conditions $\sum_{n=0}^{\infty}\mu_n = \infty$ and $\sum_{n=0}^{\infty}\mu_n^2 < \infty$ (for instance, $\mu_n = 1/(n+1)$). Under these conditions, together with the strong attraction of each $T_n$, the authors prove that the generated sequence $(x_n)_{n\ge0}$ converges to a point in the intersection of all fixed-point sets, $\bigcap_{n\ge0}\operatorname{Fix}(T_n)$. Moreover, if the set of asymptotic minimizers $\bigcap_{n\ge0}\arg\min f_n$ intersects this fixed-point intersection, the algorithm converges to a common element, i.e., a solution that simultaneously satisfies the evolving prior constraints and asymptotically minimizes the losses.
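For concreteness, here is a minimal Python sketch of the iteration for $\mathcal{H} = \mathbb{R}^d$; the `subgradient` oracle, the `mappings` factory, and the $\mu_n = \mu_0/(n+1)$ schedule are illustrative placeholders rather than the paper's specific choices.

```python
import numpy as np

def apsm_qne(x0, subgradient, mappings, num_iters, mu0=1.0):
    """Sketch of the extended APSM iteration x_{n+1} = T_n(x_n - mu_n * g_n).

    x0          -- initial point in R^d
    subgradient -- subgradient(n, x): some g in the subdifferential of f_n at x
    mappings    -- mappings(n): the strongly attracting quasi-nonexpansive
                   mapping T_n acting on R^d
    mu0         -- scale of mu_n = mu0 / (n + 1), which satisfies
                   sum mu_n = infinity and sum mu_n^2 < infinity
    """
    x = np.asarray(x0, dtype=float)
    for n in range(num_iters):
        g = subgradient(n, x)        # step 1: subgradient of f_n at x_n
        mu = mu0 / (n + 1)           # diminishing, non-summable step size
        y = x - mu * g               # step 2: subgradient step
        x = mappings(n)(y)           # step 3: enforce the prior via T_n
    return x
```

Choosing `mappings(n)` as one fixed projection $P_C$ for all $n$ reduces this to a projected subgradient scheme with a static constraint.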

The convergence proof is built on three pillars: (i) Fejér monotonicity of the iterates with respect to the fixed-point intersection, (ii) Opial's lemma, which translates Fejér monotonicity into weak convergence in Hilbert spaces, and (iii) a careful handling of the time-varying nature of the mappings, showing that the accumulated “attraction” terms $\sum_n \alpha_n\|x_n - T_n x_n\|^2$ remain finite, which forces $\|x_n - T_n x_n\| \to 0$.
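The finiteness of the attraction sum comes from a telescoping argument. In simplified form, taking a common point $p \in \bigcap_n \operatorname{Fix}(T_n)$ and suppressing the subgradient step for illustration (so that $x_{n+1} = T_n x_n$), the attraction inequality telescopes:

```latex
\|x_{n+1}-p\|^2 \le \|x_n-p\|^2 - \alpha_n \|x_n - T_n x_n\|^2
\quad\Longrightarrow\quad
\sum_{n=0}^{N} \alpha_n \|x_n - T_n x_n\|^2
\le \|x_0-p\|^2 - \|x_{N+1}-p\|^2 \le \|x_0-p\|^2 .
```

Letting $N \to \infty$ shows that the series converges, and whenever $\inf_n \alpha_n > 0$ this forces $\|x_n - T_n x_n\| \to 0$.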

Several important special cases are derived to illustrate the flexibility of the framework; a code sketch of the proximal case follows the list:

  • Static orthogonal projection $T_n = P_C$ recovers the classical APSM.
  • Proximal operators $T_n = \operatorname{prox}_{\gamma_n h_n}$ allow the inclusion of additional regularizers $h_n$ that may also evolve over time, linking the method to proximal-gradient schemes.
  • Weighted-average or sliding-window operators model situations where recent data are given higher importance, enabling the algorithm to adapt to non-stationary environments.
  • Support-set projections for sparse recovery, where each $T_n$ enforces sparsity on a support set estimated from past iterates.
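As a worked instance of the proximal-operator case (illustrative, not the paper's code), the proximal operator of $h(x) = \gamma\|x\|_1$ is component-wise soft thresholding; proximal operators are firmly nonexpansive, hence $1$-attracting, so it qualifies as a mapping $T_n$ and plugs directly into the `mappings` argument of the sketch above.

```python
import numpy as np

def soft_threshold(y, gamma):
    """prox of gamma * ||.||_1: component-wise soft thresholding.

    Firmly nonexpansive, hence a 1-attracting quasi-nonexpansive mapping.
    """
    return np.sign(y) * np.maximum(np.abs(y) - gamma, 0.0)

# A time-varying regularizer h_n via a decaying weight gamma_n (hypothetical).
mappings = lambda n: (lambda y: soft_threshold(y, 0.1 / (n + 1)))
```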

The authors showcase the practical impact of their theory on an online sparse system identification problem. The measurement model is $b_n = A_n x^\star + \nu_n$, with a time-varying measurement matrix $A_n$ and additive noise $\nu_n$. The loss is the quadratic data-fit term $f_n(x) = \frac{1}{2}\|A_n x - b_n\|^2$. Sparsity is promoted by an $\ell_1$ regularizer and, crucially, by a mapping $T_n$ that projects onto the subspace spanned by a support set $\mathcal{S}_n$ estimated online (e.g., by thresholding the current iterate). Because $\mathcal{S}_n$ can change as the underlying sparse signal evolves, the mapping sequence captures this time-varying prior. Numerical experiments compare the proposed method with the traditional APSM that uses a fixed sparsity constraint; the results demonstrate faster convergence, lower steady-state mean-square error, and robustness to abrupt changes in the support set, confirming that the dynamic constraint mechanism yields tangible performance gains.
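A self-contained toy version of this pipeline (hypothetical dimensions, noise level, and step rule, not the paper's experiment) is sketched below. For numerical stability it uses a normalized (NLMS-style) step on the quadratic loss instead of a raw subgradient step, a common choice in the adaptive-filtering setting APSM originates from, and a mapping that estimates the support by magnitude thresholding and projects onto the subspace it spans.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, noise = 50, 5, 1e-2                    # ambient dim, sparsity, noise std
x_true = np.zeros(d)
x_true[rng.choice(d, size=k, replace=False)] = rng.normal(size=k)

def support_projection(y, k):
    """T_n: estimate the support S_n as the k largest-magnitude coordinates
    of y, then project y onto the subspace spanned by S_n."""
    z = np.zeros_like(y)
    support = np.argsort(np.abs(y))[-k:]
    z[support] = y[support]
    return z

x = np.zeros(d)
for n in range(3000):
    a = rng.normal(size=d)                   # one row of the time-varying A_n
    b = float(a @ x_true) + noise * rng.normal()
    r = float(a @ x) - b                     # residual of f_n(x) = 0.5 * r**2
    x = x - 0.5 * r * a / float(a @ a)       # normalized (NLMS-style) step
    x = support_projection(x, k)             # enforce the evolving sparsity prior

print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```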

The paper concludes with a discussion of limitations and future directions. The current convergence analysis relies on the strong attraction property; extending the theory to general quasi-nonexpansive mappings without this extra condition remains open. Adaptive selection of the step sizes $\mu_n$ and the attraction parameters $\alpha_n$ is another promising avenue, as is the treatment of non-convex loss functions or stochastic subgradients. Finally, the authors suggest exploring distributed implementations in which each node runs a local instance of the algorithm and exchanges information to enforce a global, possibly time-varying, constraint.

In summary, the manuscript introduces a powerful generalization of APSM that replaces a static convex constraint with a sequence of strongly attracting quasi‑nonexpansive operators. This innovation enables online learning algorithms to incorporate evolving a‑priori knowledge in a mathematically rigorous way, while preserving convergence guarantees. The theoretical contributions, the breadth of special cases, and the compelling sparse‑recovery experiment together make a strong case for the relevance of this method in modern adaptive signal processing and machine‑learning applications.

