Meta-reinforcement learning with minimum attention

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Minimum attention, first proposed by Brockett, applies the least-action principle to the changes of the control with respect to state and time. The resulting regularization is highly relevant to emulating biological control, such as motor learning. We apply minimum attention in reinforcement learning (RL) as part of the reward and investigate its connection to meta-learning and stabilization. Specifically, we explore model-based meta-learning with minimum attention in high-dimensional nonlinear dynamics, alternating between ensemble-based model learning and gradient-based meta-policy learning. Empirically, minimum attention outperforms state-of-the-art model-free and model-based RL algorithms, achieving fast adaptation in a few shots and reducing variance under perturbations of the model and environment. Furthermore, minimum attention improves energy efficiency.


💡 Research Summary

The paper introduces “minimum attention,” a regularization derived from the least‑action principle, into reinforcement learning (RL) and meta‑learning. Minimum attention penalizes rapid changes of the control signal by adding the term ‖∂u/∂x‖² + ‖∂u/∂t‖² to the reward, encouraging smoother, lower‑energy policies. The authors embed this regularizer into a model‑based meta‑policy optimization framework (MB‑MPO) that uses an ensemble of learned dynamics models and a gradient‑based meta‑learning loop reminiscent of MAML.
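The penalty described above can be sketched numerically. The snippet below estimates ‖∂u/∂x‖² + ‖∂u/∂t‖² for a toy policy by finite differences; the policy form, its parameters, and the step size are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def policy(x, t, W, b, c):
    # Hypothetical affine-in-state, time-dependent policy u(x, t) = tanh(W x + b + c t)
    return np.tanh(W @ x + b + c * t)

def min_attention_penalty(x, t, W, b, c, eps=1e-5):
    """Finite-difference estimate of ||du/dx||_F^2 + ||du/dt||^2 at (x, t)."""
    u0 = policy(x, t, W, b, c)
    # Jacobian with respect to the state, one column per state coordinate
    J_x = np.stack([
        (policy(x + eps * e, t, W, b, c) - u0) / eps
        for e in np.eye(len(x))
    ], axis=1)
    # Sensitivity of the control with respect to time
    du_dt = (policy(x, t + eps, W, b, c) - u0) / eps
    return np.sum(J_x ** 2) + np.sum(du_dt ** 2)
```

In the paper's setting this scalar is subtracted from (or added as a cost to) the per-step reward, so that policy gradients trade task reward against control smoothness.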

The system dynamics are modeled as a continuous‑time stochastic differential equation dx = f(x,u)dt + σ(x,t)dWₜ. A neural‑network ensemble predicts f̂_θᴹ, trained by minimizing prediction error on real‑world transition data. The control law is linearized as u(x,t)=K(t)x+v(t)+ε, where ε follows an Ornstein‑Uhlenbeck process. The meta‑parameters θ =
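The rollout implied by this model can be sketched with an Euler-Maruyama discretization of the SDE and Ornstein-Uhlenbeck exploration noise on the control. For simplicity the sketch uses a constant gain K and offset v (the paper's K(t) and v(t) are time-varying); the dynamics f, noise scales, and all parameter values are hypothetical.

```python
import numpy as np

def ou_step(eps, dt, rng, theta=1.0, sigma=0.3):
    # One Euler step of the Ornstein-Uhlenbeck exploration noise d(eps) = -theta*eps dt + sigma dW
    return eps - theta * eps * dt + sigma * np.sqrt(dt) * rng.normal(size=eps.shape)

def rollout(x0, K, v, f, steps=100, dt=0.01, noise_scale=0.05, seed=0):
    """Euler-Maruyama rollout of dx = f(x, u) dt + sigma dW with u = K x + v + eps."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    eps = np.zeros(K.shape[0])          # OU exploration noise on the control
    traj = [x.copy()]
    for _ in range(steps):
        eps = ou_step(eps, dt, rng)
        u = K @ x + v + eps             # linearized control law
        dW = np.sqrt(dt) * rng.normal(size=x.shape)
        x = x + f(x, u) * dt + noise_scale * dW
        traj.append(x.copy())
    return np.array(traj)
```

In the full framework, trajectories like these would be generated under each member f̂ of the learned dynamics ensemble, and the meta-policy updated against the resulting distribution of returns.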

