Feedback Control for Multi-Objective Graph Self-Supervision
Can multi-task self-supervised learning on graphs be coordinated without the usual tug-of-war between objectives? Graph self-supervised learning (SSL) offers a growing toolbox of pretext objectives (mutual information, reconstruction, contrastive learning), yet combining them reliably remains challenging because of objective interference and training instability. Most multi-pretext pipelines use per-update mixing, forcing every parameter update to be a compromise and leading to three failure modes: Disagreement (conflict-induced negative transfer), Drift (nonstationary objective utility), and Drought (hidden starvation of underserved objectives). We argue that coordination is fundamentally a temporal allocation problem: deciding when each objective receives optimization budget, not merely how to weight them. We introduce ControlG, a control-theoretic framework that recasts multi-objective graph SSL as feedback-controlled temporal allocation: it estimates per-objective difficulty and pairwise antagonism, plans target budgets via a Pareto-aware log-hypervolume planner, and schedules blocks with a Proportional-Integral-Derivative (PID) controller. Across nine datasets, ControlG consistently outperforms state-of-the-art baselines while producing an auditable schedule that reveals which objectives drove learning.
💡 Research Summary
The paper tackles a fundamental challenge in graph self‑supervised learning (SSL): how to combine multiple pretext tasks—mutual information maximization, masked reconstruction, contrastive learning, link prediction—without suffering from the well‑known “tug‑of‑war” among objectives. Existing multi‑pretext pipelines typically blend all task losses at every optimizer step (per‑step mixing) or merge gradients into a single direction. The authors identify three recurring failure modes of this approach: (1) Disagreement – conflicting gradients force a compromise that is sub‑optimal for all tasks; (2) Drift – the usefulness of a pretext task changes over the course of training, so static or slowly adapting weights lag behind; (3) Drought – adaptive weighting can drive the importance of some tasks to near zero, making it impossible to know whether those tasks ever contributed.
To overcome these issues, the authors recast multi‑objective graph SSL as a temporal allocation problem: instead of mixing tasks at every step, allocate whole blocks of optimization steps to a single task, and decide when each task receives its share of the compute budget. This leads to the ControlG framework, which consists of three tightly coupled loops operating at different timescales:
- SENSE (full‑graph state estimation) – every u blocks, the system scans the entire graph to compute two signals for each task k:
  - Spectral Demand – the Rayleigh quotient of the task‑specific gradient field on the graph Laplacian, measuring how “high‑frequency” (hard to smooth) the learning signal is. A higher value indicates a more difficult task.
  - Interference – the Pareto‑relevant weight λ*_k obtained from the Multiple Gradient Descent Algorithm (MGDA). A large λ*_k means the task is currently constraining the Pareto front and is therefore likely to conflict with others.

  These signals are combined (with tunable coefficients) into a bounded difficulty score D_k, and the current loss is also normalized (˜L_k).
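The two SENSE signals can be sketched concretely as follows. The Rayleigh quotient follows the description above on a toy path graph; the sigmoid squashing and the `alpha`/`beta` coefficients in `difficulty` are illustrative assumptions, not the paper's exact combination rule.

```python
import numpy as np

def spectral_demand(L, g):
    """Rayleigh quotient g^T L g / (g^T g) of a per-node gradient field g
    on the graph Laplacian L: constant fields score 0, while fields that
    flip sign across edges ("high-frequency" signals) score high."""
    g = np.asarray(g, dtype=float).reshape(-1)
    return float(g @ (L @ g)) / (float(g @ g) + 1e-12)

def difficulty(spectral, interference, alpha=0.5, beta=0.5):
    """Bounded difficulty score D_k in (0, 1). The sigmoid squashing and
    the alpha/beta weights are illustrative choices, not the paper's."""
    return 1.0 / (1.0 + np.exp(-(alpha * spectral + beta * interference)))

# Toy 3-node path graph: Laplacian L = D - A.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

smooth = spectral_demand(L, [1.0, 1.0, 1.0])   # constant field: easy to smooth
rough = spectral_demand(L, [1.0, -1.0, 1.0])   # oscillating field: hard task
```

On this toy graph the constant field yields a demand of 0 while the alternating field yields a large value, matching the intuition that high-frequency learning signals indicate harder tasks.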
- PLAN (Pareto‑aware allocation) – using the normalized losses, a log‑hypervolume (log‑HV) sensitivity w_HV_k = 1/(r_k – ˜L_k) is computed, where r_k is a reference point worse than any observed loss. Log‑HV is a Pareto‑compliant scalarization: tasks whose loss is still close to the reference point (i.e., lagging tasks) receive larger sensitivity. The planner then produces a target allocation a_k ∝ w_HV_k / D_k, which is normalized to a probability vector f(t) ∈ Δ^K representing the desired proportion of total blocks for each task at epoch t.
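The planner step reduces to a few lines. This is a minimal sketch under the formulas above; the reference point `r` and the example losses are illustrative values.

```python
import numpy as np

def plan_allocation(norm_losses, D, r=1.5):
    """Pareto-aware planner sketch. w_HV_k = 1/(r - L_k) is the magnitude of
    the log-hypervolume gradient with respect to each normalized loss; it is
    largest for lagging tasks whose loss is still near the reference point r.
    The target a_k = w_HV_k / D_k discounts hard (high-difficulty) tasks and
    is renormalized to a probability vector on the simplex."""
    Lk = np.asarray(norm_losses, dtype=float)
    Dk = np.asarray(D, dtype=float)
    assert np.all(Lk < r), "reference point r must be worse than every loss"
    w = 1.0 / (r - Lk)
    a = w / Dk
    return a / a.sum()

# Three tasks: one near convergence, one lagging, one hard (D_k = 2).
f = plan_allocation([0.2, 0.9, 0.5], D=[1.0, 1.0, 2.0])
```

With these illustrative numbers the lagging task (loss 0.9) receives the largest share of the budget and the hard task the smallest, which is exactly the anti-Drought behavior the planner is meant to produce.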
- CONTROL (deficit‑tracking PID controller) – the system tracks how many blocks each task has actually received (N_k) versus the planned count N_ref_k = f_k(t)·(total blocks). The deficit e_k(m) = N_ref_k(m) – N_k(m) is fed into a discrete‑time PID controller:
  ν_k(m) = K_P·e_k(m) + K_I·∑_{τ≤m} e_k(τ) + K_D·(e_k(m) – e_k(m−1)).
The logits ν are turned into a sampling distribution via softmax (with ε‑greedy exploration) to select the next task for the upcoming block. This mechanism mirrors classic deficit round‑robin scheduling, guaranteeing long‑run fairness while allowing rapid correction of temporary imbalances.
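The CONTROL loop can be sketched end to end as below. The PID gains, the ε value, and the random seed are illustrative assumptions, not the paper's tuned hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)

class DeficitPID:
    """Deficit-tracking PID sketch: compare planned block counts f_k * m with
    the blocks actually granted N_k, run a discrete-time PID on the deficit,
    and sample the next task from a softmax over the PID logits with
    epsilon-greedy exploration. Gains and epsilon are illustrative values."""
    def __init__(self, K, kp=1.0, ki=0.1, kd=0.5, eps=0.05):
        self.kp, self.ki, self.kd, self.eps = kp, ki, kd, eps
        self.N = np.zeros(K)       # blocks actually received per task
        self.int_e = np.zeros(K)   # running sum of deficits (I term)
        self.prev_e = np.zeros(K)  # previous deficit (D term)

    def step(self, f, total_blocks):
        e = f * total_blocks - self.N                  # deficit e_k
        self.int_e += e
        nu = self.kp * e + self.ki * self.int_e + self.kd * (e - self.prev_e)
        self.prev_e = e
        p = np.exp(nu - nu.max())
        p /= p.sum()                                   # softmax over PID logits
        p = (1 - self.eps) * p + self.eps / len(p)     # epsilon-greedy mixing
        k = int(rng.choice(len(p), p=p))
        self.N[k] += 1                                 # grant the next block
        return k

ctrl = DeficitPID(K=3)
f = np.array([0.5, 0.3, 0.2])
for m in range(1, 301):
    ctrl.step(f, total_blocks=m)
```

After a few hundred blocks the empirical shares N_k / m track the planned vector f closely, illustrating the long-run fairness guarantee of deficit-style scheduling while short-term imbalances are corrected by the P and D terms.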
During training, each selected block runs B_block mini‑batches of the chosen task using a standard optimizer (e.g., Adam). The key design choice is that the sequence of tasks {k_{t,m}} becomes a first‑class decision variable, rather than a hidden by‑product of loss weighting.
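The block-level outer loop described above is tiny once the task sequence is a first-class decision variable. All names here are illustrative stand-ins: `pick_next_task` plays the role of the CONTROL sampler and `run_block` the role of B_block Adam mini-batches on the chosen task.

```python
# Minimal sketch of temporal allocation: whole blocks are dedicated to one
# task at a time, and the resulting task sequence is recorded explicitly.
def train(pick_next_task, run_block, epochs=5, blocks_per_epoch=8):
    schedule = []                      # the task sequence {k_{t,m}}
    for t in range(epochs):
        for m in range(blocks_per_epoch):
            k = pick_next_task(t, m)   # decide which task gets this block
            run_block(k)               # B_block mini-batches of task k only
            schedule.append(k)
    return schedule                    # auditable trace of who got the budget

# Round-robin controller stub over 3 tasks, with a no-op block runner:
sched = train(lambda t, m: (t * 8 + m) % 3, lambda k: None)
```

The returned schedule is exactly the auditable trace the summary refers to: one can inspect after the fact which objectives received budget and when.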
The authors evaluate ControlG on nine standard graph benchmarks (Cora, Citeseer, Pubmed, ogbn‑arxiv, ogbn‑products, etc.) with four pretext tasks. Baselines include ParetoGNN (MGDA‑based multi‑gradient descent), WAS (instance‑level task selection), AutoSSL (static task search), and GraphTCM (task‑correlation modeling). Results show that ControlG consistently outperforms all baselines, achieving 2–5 percentage‑point gains in downstream node‑classification accuracy, especially on datasets where task interference is strong. Moreover, ControlG produces an interpretable schedule: early epochs explore all tasks, mid‑training shifts budget toward mutual‑information when interference spikes, and later epochs burst on reconstruction once it becomes the lagging objective. Such transparency is absent in conventional weighted‑sum methods.
Key contributions are:
- Formulating multi‑objective graph SSL as a closed‑loop scheduling problem with explicit temporal allocation.
- Introducing full‑graph spectral demand and MGDA‑derived interference as principled difficulty signals.
- Designing a Pareto‑aware log‑hypervolume planner that adjusts allocation based on both progress and task hardness.
- Employing a deficit‑tracking PID controller to execute the plan while preventing starvation (the “drought” problem).
- Demonstrating robust performance gains and providing an auditable training trace across diverse benchmarks.
In summary, ControlG shows that temporal separation of tasks, guided by real‑time difficulty estimates and Pareto‑sensitive planning, can sidestep per‑update gradient conflicts, adapt to non‑stationary task utility, and guarantee fair resource distribution. This advances the stability, efficiency, and interpretability of multi‑task graph self‑supervision, opening avenues for more sophisticated curricula and for extending the framework to other domains, such as vision or language, where self‑supervised pretext tasks also compete.