Feedback delays are inevitable in real-world multi-agent learning. They are known to severely degrade performance, and the convergence rate under delayed feedback is still unclear, even for bilinear games. This paper derives the rate of linear convergence of Weighted Optimistic Gradient Descent-Ascent (WOGDA), which predicts future rewards with extra optimism, in unconstrained bilinear games. To analyze the algorithm, we interpret it as an approximation of the Extra Proximal Point (EPP) method, which updates based on farther future rewards than the classical Proximal Point (PP) method. Our theorems show that standard optimism (predicting the next-step reward) achieves linear convergence to the equilibrium at a rate of $\exp(-\Theta(t/m^{5}))$ after $t$ iterations under delay $m$. Moreover, employing extra optimism (predicting farther future rewards) tolerates a larger step size and significantly accelerates the rate to $\exp(-\Theta(t/(m^{2}\log m)))$. Our experiments also show accelerated convergence driven by the extra optimism and are qualitatively consistent with our theorems. In summary, this paper validates that extra optimism is a promising countermeasure against performance degradation caused by feedback delays.
Online learning aims for efficient sequential decision-making. Typically, it assumes an ideal situation in which current strategies can be determined from all past feedback. In real-world online learning scenarios, however, delays in feedback are generally inevitable. For instance, in online advertising, there is often a significant time lag between displaying an ad and observing a conversion (Chapelle, 2014; Yoshikawa & Imai, 2018; Yasui et al., 2020). Similarly, in distributed learning, communication latency and asynchronous updates inherently introduce delays in gradient aggregation (Agarwal & Duchi, 2011; McMahan & Streeter, 2014; Zheng et al., 2017). Indeed, a considerable number of papers on online learning are motivated by such feedback delays and report that delays amplify regret for full feedback (Weinberger & Ordentlich, 2002; Zinkevich et al., 2009; Quanrud & Khashabi, 2015; Joulani et al., 2016; Shamir & Szlak, 2017) and bandit feedback (Neu et al., 2010; Joulani et al., 2013; Desautels et al., 2014; Cesa-Bianchi et al., 2016; Vernade et al., 2017; Pike-Burke et al., 2018; Cesa-Bianchi et al., 2018; Li et al., 2019). Such feedback delays have also been of interest in multi-agent learning, or learning in games (Zhou et al., 2017; Hsieh et al., 2022), and are known to severely degrade performance (Fujimoto et al., 2025a). This is because good performance in multi-agent learning relies on each agent predicting its future reward, and feedback delays make this prediction more challenging. Indeed, Optimistic Follow the Regularized Leader (OFTRL), a predictive algorithm that enjoys $O(1)$-regret under instantaneous feedback, suffers from $\Omega(\sqrt{T})$-regret under delays for the time horizon $T$. Even with a delay-correction mechanism called Weighted OFTRL (WOFTRL), the regret scales as $O(m^{2})$, growing too large with the delay $m$.
Despite these prior studies, fundamental challenges remain for delayed feedback in multi-agent learning, especially regarding convergence analysis in bilinear games, defined as
$$\min_{x \in \mathcal{X}} \max_{y \in \mathcal{Y}} \; x^{\top} A y, \qquad \text{(bilinear game)}$$
where $A \in \mathbb{R}^{d_X \times d_Y}$ is the payoff matrix and $\mathcal{X} \subseteq \mathbb{R}^{d_X}$ and $\mathcal{Y} \subseteq \mathbb{R}^{d_Y}$ are the strategy spaces.
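For concreteness, the following minimal sketch (with illustrative names and parameters not taken from the paper) sets up such a game and shows why plain gradient descent-ascent is insufficient, motivating the optimistic and proximal corrections discussed below.

```python
import numpy as np

# Toy setup of an unconstrained bilinear game min_x max_y x^T A y.
# Names and values are illustrative. For a nonsingular A, the unique
# equilibrium is (x, y) = (0, 0), where both players' gradients vanish.

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))   # payoff matrix of a random 5x5 game

def grad_x(x, y):
    return A @ y                  # gradient of x^T A y with respect to x

def grad_y(x, y):
    return A.T @ x                # gradient of x^T A y with respect to y

# Plain simultaneous gradient descent-ascent spirals away from the
# equilibrium on bilinear games, which is why predictive (optimistic)
# or proximal corrections are needed in the first place.
eta = 0.05
x, y = rng.standard_normal(5), rng.standard_normal(5)
for _ in range(1000):
    x, y = x - eta * grad_x(x, y), y + eta * grad_y(x, y)
print("||x|| + ||y|| after plain GDA:", np.linalg.norm(x) + np.linalg.norm(y))
```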
The prior study (Fujimoto et al., 2025a) proved that WOFTRL converges to the equilibrium, a property called last-iterate convergence (LIC), when $\mathcal{X}$ and $\mathcal{Y}$ are constrained to probability simplices. However, LIC in the unconstrained setting $\mathcal{X} = \mathbb{R}^{d_X}$ and $\mathcal{Y} = \mathbb{R}^{d_Y}$ under delayed feedback is not guaranteed. Furthermore, the convergence rate is still unestablished. Finding this rate is vital for understanding how quickly agents can stabilize their strategies in applications, making it an attractive topic in the context of learning in games. Lastly, although experiments in the prior research suggest that predicting farther into the future than necessary to correct the delays (called "extra prediction") results in faster convergence, the validity of this extra prediction has yet to be established.
In this paper, we address these open problems in unconstrained bilinear games. Our contributions are as follows.
• We establish the rate of linear convergence even with feedback delays. We analyze our algorithm WOGDA by approximating it with the Extra Proximal Point (EPP) method, an extension of the classical Proximal Point (PP) method that updates based on farther future rewards. We prove that EPP converges linearly and that the difference between WOGDA and EPP remains sufficiently small when the step size is set appropriately.
• We demonstrate that extra prediction accelerates convergence. We find that extra prediction permits larger step sizes, under which the underlying EPP converges faster. Consequently, WOGDA with extra prediction achieves a much faster convergence rate in terms of the delay $m$.
• Our theoretical results are also reproduced in experiments. Both the linear convergence and the acceleration by extra prediction are observed in experiments on a representative game (Matching Pennies) and on randomly generated games ($5 \times 5$ random payoff matrices); a toy numerical sketch of these dynamics follows this list.
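As a rough illustration of these claims, the sketch below runs a delayed optimistic gradient update on a random bilinear game. The specific update rule and the parameter names (`m`, `k`, `eta`) are assumptions made for illustration; the paper's exact WOGDA weighting may differ.

```python
import numpy as np

# A hedged sketch of a weighted optimistic gradient update under feedback
# delay m on min_x max_y x^T A y. The rule
#     x <- x - eta * (g + k * (g - g_prev)),
# applied to the most recent *observed* (m-rounds-old) gradient g, is an
# assumption standing in for WOGDA: k = m + 1 mimics standard
# delay-corrected optimism, and k > m + 1 plays the role of extra optimism.

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
A /= np.linalg.norm(A, 2)            # normalize so step-size choices are tame
m, k, eta, T = 2, 3, 0.05, 20000     # delay, optimism weight, step, horizon

x, y = rng.standard_normal(5), rng.standard_normal(5)
gx_hist = [np.zeros(5)] * (m + 2)    # pre-filled buffers of past gradients
gy_hist = [np.zeros(5)] * (m + 2)

for t in range(T):
    gx_hist.append(A @ y)            # gradient generated now, observed later
    gy_hist.append(A.T @ x)
    g_x, g_x_prev = gx_hist[-1 - m], gx_hist[-2 - m]   # delayed feedback
    g_y, g_y_prev = gy_hist[-1 - m], gy_hist[-2 - m]
    x = x - eta * (g_x + k * (g_x - g_x_prev))
    y = y + eta * (g_y + k * (g_y - g_y_prev))

print("distance to the equilibrium (0, 0):",
      np.linalg.norm(x) + np.linalg.norm(y))
```

Sweeping `k` above `m + 1` (with `eta` retuned accordingly) is a quick way to probe the acceleration described in the second bullet.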
Unconstrained bilinear games: This study targets the class of unconstrained bilinear games. This class is closely related to min-max optimization, where convergence is a central issue. Moreover, unconstrained bilinear games are one of the minimal configurations (zero-sum utility and Euclidean strategy space) that retain the difficulties specific to multi-agent learning, and thus results there have the potential to extend to various more advanced configurations, such as convex-concave utilities and constrained strategy spaces. Indeed, the celebrated study showing LIC in unconstrained bilinear games (Daskalakis et al., 2018) was thereafter applied to the constrained setting (Mertikopoulos et al., 2019). Linear convergence was first demonstrated in unconstrained bilinear games (Mokhtari et al., 2020) and later shown to hold in constrained saddle-point problems (Wei et al., 2021). Similarly, LIC in time-varying games was first proven in unconstrained bilinear games (Feng et al., 2023) and later discussed for the constrained setting (Feng et al., 2024; Fujimoto et al., 2025b). In summary, unconstrained bilinear games serve as a touchstone for analyzing novel phenomena.