Monotonic Transformation Invariant Multi-task Learning
Multi-task learning (MTL) algorithms typically rely on schemes that combine different task losses or their gradients through weighted averaging. These methods aim to find Pareto stationary points by using heuristics that require access to task loss values, gradients, or both. In doing so, a central challenge arises because task losses can be arbitrarily scaled relative to one another, causing certain tasks to dominate training and degrade overall performance. A recent advance in cooperative bargaining theory, the Direction-based Bargaining Solution (DiBS), yields Pareto stationary solutions immune to task domination because of its invariance to monotonic nonaffine task loss transformations. However, the convergence behavior of DiBS in nonconvex MTL settings is currently not understood. To this end, we prove that under standard assumptions, a subsequence of DiBS iterates converges to a Pareto stationary point when task losses are nonconvex, and propose DiBS-MTL, an adaptation of DiBS to the MTL setting which is more computationally efficient than prior bargaining-inspired MTL approaches. Finally, we empirically show that DiBS-MTL is competitive with leading MTL methods on standard benchmarks, and it drastically outperforms state-of-the-art baselines in multiple examples with poorly-scaled task losses, highlighting the importance of invariance to nonaffine monotonic transformations of the loss landscape. Code available at https://github.com/suryakmurthy/dibs-mtl
💡 Research Summary
The paper tackles a fundamental robustness issue in multi‑task learning (MTL): the arbitrary scaling of task losses. Most existing MTL methods combine task losses or their gradients using fixed or dynamically adjusted weights. Because these approaches rely on the raw magnitude of the losses, a monotonic but non‑affine transformation (e.g., taking the log, applying a sigmoid, or any other monotonic reshaping) can dramatically change the relative importance of tasks. In practice this leads to “task domination” where one task overwhelms the others, especially in settings such as multi‑task reinforcement learning where reward shaping can produce loss scales that differ by orders of magnitude.
The authors bring in a concept from cooperative bargaining theory called the Direction‑based Bargaining Solution (DiBS). DiBS determines an update direction by normalizing each task’s gradient and weighting it by the squared distance between the current parameter vector and a “preferred” point for that task (the local minimizer of the task loss). Because the direction uses only normalized gradients and distances to local minima, any monotonic, possibly non‑affine transformation of the loss functions leaves the update direction unchanged. This invariance property is precisely what is needed for MTL methods that must operate under unknown or changing loss scalings.
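The invariance argument above rests on a simple chain-rule fact: a monotonically increasing transformation \(g\) rescales a task gradient by the positive scalar \(g'(L) > 0\), which gradient normalization discards. A minimal numpy sketch of this fact (the quadratic loss and the \(\log(1+L)\) transform are illustrative choices, not taken from the paper):

```python
import numpy as np

def normalized_grad(grad):
    # Normalizing discards the positive scalar factor that any
    # monotonically increasing transformation multiplies onto the
    # raw gradient via the chain rule.
    return grad / np.linalg.norm(grad)

theta = np.array([1.0, -2.0, 0.5])
a = np.array([3.0, 0.0, 1.0])

# Raw task loss L(theta) = ||theta - a||^2 and its gradient.
L = np.sum((theta - a) ** 2)
grad_L = 2 * (theta - a)

# Monotonic non-affine transform g(L) = log(1 + L); its gradient is
# g'(L) * grad_L with g'(L) = 1 / (1 + L) > 0.
grad_gL = grad_L / (1 + L)

# The normalized directions coincide, so a DiBS-style update that
# uses only normalized gradients is unaffected by the transform.
assert np.allclose(normalized_grad(grad_L), normalized_grad(grad_gL))
```

The same cancellation holds for any differentiable monotonic transform, which is why no per-task loss rescaling or weighting heuristic is needed.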
A major theoretical contribution is the extension of DiBS’s convergence guarantees from the previously studied strongly convex setting to the realistic non‑convex regime typical of deep neural networks. Under standard assumptions (differentiable, bounded losses; interior Pareto stationary points; Robbins‑Monro step‑size conditions) and a mild boundedness assumption on the iterates (later relaxed), the authors prove that a subsequence of DiBS iterates converges to a Pareto stationary point. Importantly, the proof does not require linear independence of task gradients at non‑stationary points—a condition that earlier Nash‑Bargaining‑based MTL analyses needed. This makes the result applicable to a far broader class of practical problems.
Building on this theory, the authors propose DiBS‑MTL, an algorithm that adapts DiBS to the MTL setting. At each epoch \(j\), the algorithm computes normalized gradients \(\bar g_{i,j}\) for all \(N\) tasks, defines a preferred parameter for each task as \(\theta^{*}_{i,j} = \theta_j - \epsilon \bar g_{i,j}\) (i.e., a step of size \(\epsilon\) in the negative gradient direction), and then performs \(T\) inner bargaining updates. Each inner update moves the current parameter vector along a weighted sum of the normalized gradients, where the weight for task \(i\) is the squared distance between the current inner iterate and \(\theta^{*}_{i,j}\). After \(T\) inner steps, the accumulated change \(\Delta\theta_j\) is applied with a learning rate \(\eta\). This procedure requires only a single gradient evaluation per task per epoch and avoids solving any inner optimization problem, making it computationally cheaper than prior bargaining‑inspired MTL methods that rely on solving Nash‑Bargaining sub‑problems.
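The per-epoch procedure described above can be sketched in numpy as follows. This is a hedged reconstruction from the summary, not the authors' released implementation; the function name, hyperparameter names (`eps`, `T`, `inner_lr`, `eta`), and the toy gradients are illustrative assumptions:

```python
import numpy as np

def dibs_mtl_step(theta, task_grads, eps=0.1, T=5, inner_lr=0.01, eta=1.0):
    """One hypothetical DiBS-MTL epoch, sketched from the paper summary."""
    # Normalize each task gradient so arbitrary loss scaling cannot
    # let one task dominate the bargaining direction.
    g_bar = [g / np.linalg.norm(g) for g in task_grads]
    # Preferred point per task: a small step of size eps down that
    # task's normalized gradient.
    preferred = [theta - eps * g for g in g_bar]

    z = theta.copy()
    for _ in range(T):  # inner bargaining updates
        # Weight task i by the squared distance of the inner iterate
        # to that task's preferred point.
        weights = [np.sum((z - p) ** 2) for p in preferred]
        direction = sum(w * g for w, g in zip(weights, g_bar))
        # Descend along the weighted combination of normalized gradients.
        z = z - inner_lr * direction

    # Apply the accumulated change with the outer learning rate eta.
    return theta + eta * (z - theta)

# Toy usage with two hypothetical task gradients.
theta = np.array([0.5, -1.0])
grads = [np.array([1.0, 0.0]), np.array([0.0, -2.0])]
new_theta = dibs_mtl_step(theta, grads)
```

Note that only the `task_grads` list requires backpropagation, once per task per epoch; the inner loop is pure vector arithmetic, which is the source of the claimed computational advantage over Nash-Bargaining sub-problem solvers.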
Empirically, the authors evaluate DiBS‑MTL on several standard MTL benchmarks: NYU‑Depth V2 (depth + surface normal), Cityscapes (semantic segmentation + depth), a quantum chemistry dataset (multiple molecular property predictions), and a multi‑task reinforcement learning suite. They also construct synthetic experiments where task losses are deliberately rescaled using highly non‑linear monotonic functions. In all cases, DiBS‑MTL matches or exceeds the performance of state‑of‑the‑art MTL baselines (e.g., GradNorm, PCGrad, MGDA, IMTL‑G). The advantage becomes dramatic when loss scales are mismatched: while baseline methods suffer performance drops of 10–30 % or even collapse, DiBS‑MTL remains stable, confirming the practical importance of invariance to monotonic loss transformations.
In summary, the paper makes three key contributions: (1) a novel convergence proof for DiBS under non‑convex task losses, (2) the DiBS‑MTL algorithm that brings the theoretical benefits of DiBS to practical multi‑task training with low computational overhead, and (3) extensive experiments demonstrating that invariance to monotonic non‑affine loss transformations leads to robust and often superior performance across diverse domains. This work therefore establishes a new paradigm for designing MTL methods that are fundamentally insensitive to how individual task losses are scaled or transformed.