Objective-Function Free Multi-Objective Optimization: Rate of Convergence and Performance of an Adagrad-like algorithm
Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

We propose an Adagrad-like algorithm for multi-objective unconstrained optimization that relies only on the computation of a common descent direction. Unlike classical local algorithms for multi-objective optimization, our approach does not rely on the dominance property to accept new iterates, which allows for a flexible, function-free optimization framework. New points are obtained using an adaptive stepsize that requires neither knowledge of Lipschitz constants nor line search procedures. The rate of convergence is analyzed and is shown to be $\mathcal{O}(1/\sqrt{k+1})$ with respect to the norm of the common descent direction. The method is extensively validated on a broad class of unconstrained multi-objective problems and simple multi-task learning instances, and compared against a first-order line search algorithm. Additionally, we present a preliminary study of the behavior under noisy multi-objective settings, highlighting the robustness of the method.


💡 Research Summary

The paper introduces a novel algorithm for unconstrained multi‑objective optimization that belongs to the class of objective‑function‑free optimization (OFFO) methods. Unlike traditional multi‑objective algorithms that rely on dominance relations and explicit evaluation of all objective functions to accept new iterates, the proposed method computes only a common descent direction and updates the iterate using an adaptive step size that does not require Lipschitz constants or any line‑search procedure.

Algorithmic framework
At iteration k the algorithm solves a small convex subproblem Ω(x_k) to obtain non-negative weights λ_k^j (Σ_j λ_k^j = 1) that minimize the Euclidean norm of the weighted sum of the gradients of the m objective functions. The resulting common descent direction is
 g_k = Σ_{j=1}^m λ_k^j ∇f_j(x_k).
A scalar weight w_k is updated by accumulating the squared norm of g_k:
 w_k = √(w_{k−1}² + ‖g_k‖²), w_{−1} = √ζ (ζ ∈ (0, 1)).
The new iterate is then
 x_{k+1} = x_k − g_k / w_k.

The key point is that the step size is determined solely from the history of the common descent directions, exactly as in the classical Adagrad‑Norm algorithm, but now applied to a multi‑objective setting. No function values are ever computed, and no line search is performed.
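The update above can be sketched in a few lines of NumPy. The simplex-constrained subproblem is solved here with a simple Frank–Wolfe loop, which is an illustrative choice rather than the paper's exact solver, and the names `min_norm_direction` and `mo_adagrad` are ours:

```python
import numpy as np

def min_norm_direction(grads, fw_iters=200):
    """Solve min_{λ ≥ 0, Σλ = 1} ||Σ_j λ_j ∇f_j||² with Frank–Wolfe.
    grads: (m, n) array whose rows are the objective gradients."""
    m = grads.shape[0]
    lam = np.full(m, 1.0 / m)
    G = grads @ grads.T                    # m × m Gram matrix
    for k in range(fw_iters):
        j = int(np.argmin(G @ lam))        # best vertex of the simplex
        step = 2.0 / (k + 2.0)
        lam *= 1.0 - step
        lam[j] += step
    return lam, lam @ grads                # weights λ_k and direction g_k

def mo_adagrad(grad_fns, x0, zeta=0.5, iters=1000):
    """Adagrad-norm style OFFO update: no function values, no line search."""
    x = np.asarray(x0, dtype=float)
    w_sq = zeta                            # w_{-1}² = ζ ∈ (0, 1)
    for _ in range(iters):
        grads = np.stack([g(x) for g in grad_fns])
        _, g = min_norm_direction(grads)
        w_sq += g @ g                      # w_k² = w_{k-1}² + ||g_k||²
        x -= g / np.sqrt(w_sq)             # x_{k+1} = x_k − g_k / w_k
    return x
```

On a toy bi-objective problem such as f_1(x) = ‖x − a‖², f_2(x) = ‖x − b‖² (whose Pareto set is the segment [a, b]), the iterates drive the common direction g_k toward zero without ever evaluating f_1 or f_2.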

Theoretical contributions
The authors introduce two auxiliary scalar functions:

  1. Φ(x) = max_j f_j(x), which is used only for the convergence analysis (it is never evaluated in the algorithm).
  2. ω(x) = min_{λ≥0, Σλ=1} ‖Σ λ_j ∇f_j(x)‖², a measure of Pareto criticality; ω(x)=0 iff x is a Pareto critical point.
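For m = 2 the minimization defining ω(x) has a well-known closed form (as in MGDA-style methods): the unconstrained minimizer of the quadratic in λ is projected onto [0, 1]. A minimal sketch, assuming the two gradients are given as NumPy vectors (the function name is ours):

```python
import numpy as np

def omega_two_objectives(g1, g2):
    """Closed-form min_{λ ∈ [0,1]} ||λ g1 + (1−λ) g2||² for m = 2.
    Returns (λ*, ω); ω = 0 iff the point is Pareto critical."""
    diff = g1 - g2
    denom = diff @ diff
    if denom == 0.0:                      # identical gradients
        lam = 0.5
    else:
        lam = float(np.clip(-(g2 @ diff) / denom, 0.0, 1.0))
    d = lam * g1 + (1.0 - lam) * g2
    return lam, float(d @ d)
```

For anti-parallel gradients (a Pareto critical point) this returns ω = 0; for orthogonal unit gradients it returns λ* = 1/2 and ω = 1/2.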

Through a series of lemmas they establish that ‖g_k‖² = ω(x_k) and that the quantity max_j ∇f_j(x_k)ᵀ(−g_k) is directly proportional to ω(x_k). Assuming each gradient ∇f_j is L‑Lipschitz continuous, they prove a global convergence rate of
 ‖g_k‖ = O(1/√(k+1)).
Thus the norm of the common descent direction, which serves as a natural optimality certificate, diminishes at the same rate as standard stochastic gradient methods, despite the absence of any function‑value information.
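Adagrad-norm analyses under L-smoothness typically yield a bound of the following shape (a hedged reconstruction; the exact constant and whether the bound holds for the last or the best iterate are specified in the paper itself):

```latex
\min_{0 \le j \le k} \|g_j\| \;\le\; \frac{C\bigl(L, \zeta, \Phi(x_0)\bigr)}{\sqrt{k+1}},
```

where C collects the smoothness constant, the initialization ζ, and the initial gap measured through Φ.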

Numerical experiments
The method, named MO‑Adagrad, is tested on two families of problems:

Synthetic multi‑objective benchmarks (ZDT, DTLZ, and other custom functions) covering a variety of non‑convexities, scaling differences, and numbers of objectives (up to m=10).

Multi‑task learning on the MNIST dataset, where a single neural network is trained simultaneously for digit classification and a regression task (predicting the sum of pixel intensities).

In all cases MO‑Adagrad outperforms a first‑order line‑search based multi‑objective algorithm in terms of the decrease of ω, the quality of the approximated Pareto front (measured by hypervolume and average distance), and computational efficiency (fewer gradient evaluations per unit of progress).

A robustness study adds Gaussian noise to the objective evaluations. Even when the noise level is high, MO‑Adagrad maintains stable convergence, whereas the line‑search method suffers from erratic step sizes and sometimes diverges.

Discussion and limitations
The main computational overhead lies in solving Ω(x_k) at each iteration. This subproblem is a small convex quadratic program with m variables, which is inexpensive for moderate m but may become costly when both m and the dimension n are very large. The authors suggest future work on approximate solutions, stochastic estimation of λ, or using a limited memory version of the subproblem.

Impact
By eliminating the need to evaluate any objective function, the algorithm is especially attractive for settings where each evaluation is extremely expensive or noisy (e.g., deep learning with massive datasets, simulation‑based optimization, real‑time control). The adaptive step‑size mechanism guarantees a theoretically sound convergence rate while keeping the implementation simple. The work opens a new direction for function‑free multi‑objective optimization and provides a solid foundation for extensions to constrained, stochastic, or distributed environments.

