Flow Matching with Injected Noise for Offline-to-Online Reinforcement Learning

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

Generative models have recently demonstrated remarkable success across diverse domains, motivating their adoption as expressive policies in reinforcement learning (RL). While they have shown strong performance in offline RL, particularly where the target distribution is well defined, their extension to online fine-tuning has largely been treated as a direct continuation of offline pre-training, leaving key challenges unaddressed. In this paper, we propose Flow Matching with Injected Noise for Offline-to-Online RL (FINO), a novel method that leverages flow matching-based policies to enhance sample efficiency for offline-to-online RL. FINO facilitates effective exploration by injecting noise into policy training, thereby encouraging a broader range of actions beyond those observed in the offline dataset. In addition to exploration-enhanced flow policy training, we incorporate an entropy-guided sampling mechanism to balance exploration and exploitation, allowing the policy to adapt its behavior throughout online fine-tuning. Experiments across diverse, challenging tasks demonstrate that FINO consistently achieves superior performance under limited online budgets.


💡 Research Summary

This paper introduces FINO (Flow Matching with Injected Noise for Offline‑to‑Online Reinforcement Learning), a method that adapts flow‑matching based policies to the two‑stage offline‑to‑online RL setting. The authors identify a key limitation of existing flow‑matching policies: during offline pre‑training they are forced to match the data distribution point‑wise, which yields a policy that is overly confined to the offline dataset and consequently exhibits poor exploration when fine‑tuned online.

FINO addresses this limitation in two complementary ways. First, during offline pre‑training the authors inject time‑dependent Gaussian noise into the flow‑matching objective. Concretely, the standard conditional path \(p^{\text{FM}}_t(x\mid x_1)=\mathcal N\bigl(x;\, t x_1,\, (1-(1-\sigma_{\min})t)^2 I\bigr)\) is replaced by a noise‑augmented path \(p^{\text{FINO}}_t(x\mid x_1)=\mathcal N\bigl(x;\, t x_1,\, (1-(1-\eta)t)^2 I\bigr)\), where \(\eta\in\)
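The noise‑augmented path has the same Gaussian form as the standard optimal‑transport conditional path, with \(\eta\) in place of \(\sigma_{\min}\), so it admits the same closed‑form conditional velocity target. Below is a minimal NumPy sketch of sampling from that path and computing its regression target; this assumes the standard optimal‑transport conditional flow‑matching form, and the function names are illustrative, not the paper's implementation.

```python
import numpy as np

def fino_path_sample(x1, t, eta, rng):
    """Draw x_t ~ N(t * x1, sigma_t^2 I) with sigma_t = 1 - (1 - eta) * t.

    With eta larger than sigma_min, the path retains more Gaussian noise
    near t = 1, which is what widens the action distribution for exploration.
    """
    sigma_t = 1.0 - (1.0 - eta) * t
    x_t = t * np.asarray(x1) + sigma_t * rng.standard_normal(np.shape(x1))
    return x_t, sigma_t

def fino_target_velocity(x, x1, t, eta):
    """Conditional velocity target u_t(x | x1) for the Gaussian path above.

    For mean mu_t = t * x1 and std sigma_t = 1 - (1 - eta) * t, the
    optimal-transport conditional flow-matching target is
        u_t(x | x1) = (x1 - (1 - eta) * x) / (1 - (1 - eta) * t).
    A velocity network v_theta(x_t, t) would be trained to minimize
    || v_theta(x_t, t) - u_t(x_t | x1) ||^2 over the offline dataset.
    """
    return (np.asarray(x1) - (1.0 - eta) * np.asarray(x)) / (1.0 - (1.0 - eta) * t)
```

At \(t=1\) the path's standard deviation is \(\eta\) rather than \(\sigma_{\min}\), so a larger \(\eta\) leaves more residual noise around dataset actions, broadening the actions the fine‑tuned policy can reach online.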

