Continuous Inverse Optimal Control with Locally Optimal Examples

Inverse optimal control, also known as inverse reinforcement learning, is the problem of recovering an unknown reward function in a Markov decision process from expert demonstrations of the optimal policy. We introduce a probabilistic inverse optimal control algorithm that scales gracefully with task dimensionality, and is suitable for large, continuous domains where even computing a full policy is impractical. By using a local approximation of the reward function, our method can also drop the assumption that the demonstrations are globally optimal, requiring only local optimality. This allows it to learn from examples that are unsuitable for prior methods.


💡 Research Summary

The paper tackles the problem of inverse optimal control (IOC), also known as inverse reinforcement learning (IRL), in continuous, high‑dimensional Markov decision processes (MDPs) where computing a full optimal policy is infeasible. Traditional IOC/IRL approaches rely on two restrictive assumptions: (1) the need to evaluate or approximate a globally optimal policy, which becomes computationally prohibitive as the state‑action space grows, and (2) the requirement that expert demonstrations be globally optimal. In real‑world robotics or human‑behavior datasets, demonstrations are often only locally optimal, noisy, or even sub‑optimal, limiting the applicability of existing methods.

To overcome these limitations, the authors propose a probabilistic IOC algorithm that uses a local reward approximation. For each demonstrated state‑action pair $(s_i, a_i)$, the unknown reward function $r(s, a; \theta)$ is approximated by a second‑order Taylor expansion (or any smooth quadratic surrogate) around $(s_i, a_i)$. The key assumption is that the demonstrated action is a local optimum: any infinitesimal deviation from $a_i$ would decrease the reward. Under this assumption, the probability of observing $a_i$ given $s_i$ and parameters $\theta$ can be expressed using a Laplace approximation:

$$
\log P(a_i \mid s_i, \theta) \approx \frac{1}{2}\, g_i^{\top} H_i^{-1} g_i + \frac{1}{2} \log \lvert -H_i \rvert - \frac{d}{2} \log 2\pi,
$$

where $g_i = \nabla_a r(s_i, a; \theta)\,\big|_{a = a_i}$ is the reward gradient, $H_i = \nabla_a^2 r(s_i, a; \theta)\,\big|_{a = a_i}$ is the reward Hessian (negative definite at a strict local optimum), and $d$ is the dimensionality of the action.
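To make this concrete, below is a minimal numpy sketch that evaluates the Laplace‑approximated log‑likelihood for a single demonstration, given the reward gradient and Hessian with respect to the action. The function name `laplace_log_likelihood` and the toy quadratic reward are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def laplace_log_likelihood(g, H):
    """Laplace approximation of log P(a_i | s_i, theta).

    g: gradient of r(s_i, a; theta) w.r.t. a, evaluated at a_i, shape (d,).
    H: Hessian of r(s_i, a; theta) w.r.t. a at a_i, shape (d, d); must be
       negative definite, i.e. a_i is a strict local maximum of the reward.
    """
    d = g.shape[0]
    # 1/2 g^T H^{-1} g: penalizes demonstrations where the gradient is nonzero
    grad_term = 0.5 * g @ np.linalg.solve(H, g)
    # 1/2 log|-H|: favors rewards that curve sharply around the demonstration
    sign, logdet = np.linalg.slogdet(-H)
    if sign <= 0:
        raise ValueError("H must be negative definite at the demonstration")
    return grad_term + 0.5 * logdet - 0.5 * d * np.log(2.0 * np.pi)

# Toy check with a hypothetical quadratic reward
# r(a) = -1/2 (a - mu)^T P (a - mu), so g = -P (a_i - mu) and H = -P.
P = np.diag([2.0, 0.5])
mu = np.array([1.0, -1.0])
a_demo = np.array([0.9, -1.1])   # nearly (but not exactly) locally optimal
g = -P @ (a_demo - mu)
print(laplace_log_likelihood(g, H=-P))
```

Because this objective depends only on local derivatives of the reward at the demonstrated actions, it never requires solving for a global policy, which is what lets the method scale to high‑dimensional continuous domains.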