Policy Invariance under Reward Transformations for General-Sum Stochastic Games
We extend the potential-based shaping method from Markov decision processes to multi-player general-sum stochastic games. We prove that the Nash equilibria of a stochastic game remain unchanged after potential-based shaping is applied to the environment. This policy-invariance property provides a possible way of speeding up convergence when learning to play a stochastic game.
Research Summary
The paper extends the well-known potential-based reward shaping (PBRS) technique from single-agent Markov decision processes (MDPs) to multi-agent general-sum stochastic games. In a stochastic game, each of the n players selects an action a_i from its own action set A_i; the joint action a = (a_1, …, a_n) together with the current state s determines a transition probability P(s′|s, a) and a vector of immediate rewards (R_1(s, a), …, R_n(s, a)). The authors introduce a scalar potential function Φ: S → ℝ that depends only on the state. The shaped reward for player i is defined as

R_i′(s, a) = R_i(s, a) + γΦ(s′) − Φ(s),
where γ ∈ [0, 1) is the discount factor of the game and s′ is the successor state. Because the shaping term γΦ(s′) − Φ(s) depends only on the states and is added identically to every player's reward, it shifts each player's expected return by the same state-dependent amount, leaving best responses, and hence the Nash equilibria, unchanged.
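To make the mechanics concrete, here is a minimal sketch of how this shaping term could be applied during learning. The names (shaped_rewards, phi) are illustrative rather than taken from the paper, and the sketch assumes a tabular state space indexed by integers.

```python
# Minimal sketch of potential-based reward shaping (PBRS) in an
# n-player stochastic game. Names here are illustrative, not from the paper.

from typing import Callable, Sequence

def shaped_rewards(
    rewards: Sequence[float],      # immediate rewards (R_1(s,a), ..., R_n(s,a))
    phi: Callable[[int], float],   # state potential Phi: S -> R
    s: int,                        # current state s
    s_next: int,                   # successor state s'
    gamma: float,                  # discount factor of the game
) -> list[float]:
    """Return R_i'(s,a) = R_i(s,a) + gamma*Phi(s') - Phi(s) for each player.

    The shaping term F(s, s') = gamma*Phi(s') - Phi(s) depends only on the
    states, so it is identical for all players; this is the property the
    paper uses to show that the game's Nash equilibria are preserved.
    """
    f = gamma * phi(s_next) - phi(s)
    return [r + f for r in rewards]

# Example: two players, with a potential that favors being in state 1.
phi = lambda state: 1.0 if state == 1 else 0.0
print(shaped_rewards([0.0, 0.5], phi, s=0, s_next=1, gamma=0.9))
# -> [0.9, 1.4]: both players receive the same shaping bonus of 0.9
```

Note that the shaping function itself never touches the transition dynamics or the other players' action sets; it only adds the same state-based correction to every player's reward signal at each step.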