Policy Invariance under Reward Transformations for General-Sum Stochastic Games
We extend the potential-based shaping method from Markov decision processes to multi-player general-sum stochastic games. We prove that the Nash equilibria of a stochastic game remain unchanged after potential-based shaping is applied to the environment. This policy-invariance property provides a possible way of speeding up convergence when learning to play a stochastic game.
Research Summary
The paper extends the well-known potential-based reward shaping (PBRS) technique from single-agent Markov decision processes (MDPs) to multi-agent general-sum stochastic games. In a stochastic game, each of the n players selects an action a_i from its own action set A_i; the joint action a = (a_1, …, a_n) together with the current state s determines a transition probability P(s′|s, a) and a vector of immediate rewards (R_1(s, a), …, R_n(s, a)). The authors introduce a scalar potential function Φ: S → ℝ that depends only on the state. The shaped reward for player i is defined as

R_i′(s, a) = R_i(s, a) + γΦ(s′) − Φ(s),
where γ ∈ [0, 1) is the discount factor of the game and s′ is the successor state. Because the shaping term γΦ(s′) − Φ(s) depends only on the states and is added identically to every player's reward, it shifts each player's expected return by the same state-dependent amount, leaving best responses, and hence the Nash equilibria, unchanged.
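To make the mechanics concrete, here is a minimal sketch of how this shaping term could be applied during learning. The names (shaped_rewards, phi) are illustrative rather than taken from the paper, and the sketch assumes a tabular state space indexed by integers.

```python
# Minimal sketch of potential-based reward shaping (PBRS) in an
# n-player stochastic game. Names here are illustrative, not from the paper.

from typing import Callable, Sequence

def shaped_rewards(
    rewards: Sequence[float],      # immediate rewards (R_1(s,a), ..., R_n(s,a))
    phi: Callable[[int], float],   # state potential Phi: S -> R
    s: int,                        # current state s
    s_next: int,                   # successor state s'
    gamma: float,                  # discount factor of the game
) -> list[float]:
    """Return R_i'(s,a) = R_i(s,a) + gamma*Phi(s') - Phi(s) for each player.

    The shaping term F(s, s') = gamma*Phi(s') - Phi(s) depends only on the
    states, so it is identical for all players; this is the property the
    paper uses to show that the game's Nash equilibria are preserved.
    """
    f = gamma * phi(s_next) - phi(s)
    return [r + f for r in rewards]

# Example: two players, with a potential that favors being in state 1.
phi = lambda state: 1.0 if state == 1 else 0.0
print(shaped_rewards([0.0, 0.5], phi, s=0, s_next=1, gamma=0.9))
# -> [0.9, 1.4]: both players receive the same shaping bonus of 0.9
```

Note that the shaping function itself never touches the transition dynamics or the other players' action sets; it only adds the same state-based correction to every player's reward signal at each step.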