
Dynamic Policy Programming
In this paper, we propose a novel policy iteration method, called dynamic policy programming (DPP), to estimate the optimal policy in infinite-horizon Markov decision processes. We prove finite-iteration and asymptotic ℓ∞-norm performance-loss bounds for DPP in the presence of approximat








