Discretized Approximations for POMDP with Average Cost


In this paper, we propose a new lower approximation scheme for POMDPs under the discounted and average cost criteria. The approximating functions are determined by their values at a finite number of belief points and can be computed efficiently using value iteration algorithms for finite-state MDPs. While several lower approximation schemes have been proposed earlier for discounted problems, ours appears to be the first of its kind for average cost problems. We focus primarily on the average cost case, and we show that the corresponding approximation can be computed efficiently using multi-chain algorithms for finite-state MDPs. We give a preliminary analysis showing that, regardless of whether the optimal average cost J exists in the POMDP, the approximation obtained is a lower bound on the liminf optimal average cost function; it can also be used to calculate an upper bound on the limsup optimal average cost function, as well as bounds on the cost of executing the stationary policy associated with the approximation. We show convergence of the cost approximation when the optimal average cost is constant and the optimal differential cost is continuous.


💡 Research Summary

The paper tackles the long‑standing challenge of solving partially observable Markov decision processes (POMDPs) under an average‑cost (undiscounted) criterion. While a rich literature exists for discounted POMDPs—where lower‑bound approximation schemes based on a finite set of belief points are well understood—the average‑cost case has remained largely intractable because the lack of a discount factor eliminates the natural contraction property that underpins most convergence proofs. The authors introduce a novel discretization‑based lower‑approximation framework that works for both discounted and average‑cost settings, but they devote the majority of the analysis to the average‑cost case, which, to the best of their knowledge, is the first such scheme.

Core Idea.
Select a finite collection B of belief points that “covers” the continuous belief simplex. For each b ∈ B, define an unknown scalar \( \tilde J(b) \) that will serve as the approximated average cost at that belief. By enforcing the Bellman optimality inequalities only at the points in B, the authors obtain a finite‑dimensional linear (or piecewise‑linear) program that is mathematically equivalent to a finite‑state Markov decision process (MDP) with the same action set as the original POMDP. Crucially, this surrogate MDP is an average‑cost MDP that may have multiple recurrent classes (a multi‑chain), so any standard multi‑chain algorithm—Howard’s policy iteration, relative value iteration, or linear‑programming approaches—can be applied directly.
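To make the belief‑grid idea concrete, here is a minimal sketch for the (simpler) discounted case. The two‑state POMDP below (transitions `T`, observations `Z`, costs `c`, discount `gamma`) is an invented toy model, not taken from the paper. With two hidden states a belief is summarized by p = P(state 0); since the optimal discounted cost function is concave in the belief, linearly interpolating between values stored at grid points yields a lower approximation, computable by ordinary value iteration.

```python
import numpy as np

# Invented toy model (not from the paper):
gamma = 0.95                                   # discount factor (assumed)
T = np.array([[[0.9, 0.1], [0.2, 0.8]],        # T[a, s, s']: transition probs
              [[0.5, 0.5], [0.4, 0.6]]])
Z = np.array([[[0.8, 0.2], [0.3, 0.7]],        # Z[a, s', o]: observation probs
              [[0.6, 0.4], [0.1, 0.9]]])
c = np.array([[1.0, 0.0],                      # c[a, s]: per-stage cost
              [0.2, 0.6]])

grid = np.linspace(0.0, 1.0, 21)               # finite set B of belief points
V = np.zeros_like(grid)                        # approximate values at B

def belief_update(p, a, o):
    """Bayes update; returns posterior P(state 0) and P(o | p, a)."""
    b = np.array([p, 1.0 - p])
    pred = b @ T[a]                            # predicted next-state distribution
    joint = pred * Z[a][:, o]                  # unnormalized posterior
    prob_o = joint.sum()
    return joint[0] / prob_o, prob_o           # Z > 0 here, so prob_o > 0

for _ in range(1000):                          # value iteration on the grid
    V_new = np.empty_like(V)
    for i, p in enumerate(grid):
        b = np.array([p, 1.0 - p])
        q_values = []
        for a in range(2):
            future = 0.0
            for o in range(2):
                p_next, w = belief_update(p, a, o)
                # linear interpolation of a concave function => lower bound
                future += w * np.interp(p_next, grid, V)
            q_values.append(b @ c[a] + gamma * future)
        V_new[i] = min(q_values)               # Bellman minimization
    if np.max(np.abs(V_new - V)) < 1e-10:      # contraction guarantees this
        V = V_new
        break
    V = V_new
```

Because the backup uses only the grid values, the whole computation is exactly value iteration for a finite-state MDP, which is the structural point the paper exploits.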

Theoretical Guarantees.

  1. Lower‑bound property. The solution \( \tilde J \) of the surrogate MDP satisfies
    \[
      \tilde J(b) \;\le\; J^{-}(b) \qquad \text{for all beliefs } b,
    \]
    where \( J^{-} \) denotes the liminf optimal average cost function of the POMDP. This holds regardless of whether the optimal average cost exists.
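The surrogate average‑cost MDP from the lower‑bound property can be solved numerically. The sketch below (again an invented two‑state model, not the paper's construction) runs undiscounted value iteration on a belief grid and tracks the classical sandwich bounds min(Th − h) ≤ g* ≤ max(Th − h) on the surrogate's optimal gain g*, with a normalization step to keep the iterates bounded. This assumes the surrogate chain is unichain and aperiodic; the multi‑chain algorithms discussed in the paper handle the general case.

```python
import numpy as np

# Invented toy model (not from the paper):
T = np.array([[[0.9, 0.1], [0.2, 0.8]],        # T[a, s, s']: transition probs
              [[0.5, 0.5], [0.4, 0.6]]])
Z = np.array([[[0.8, 0.2], [0.3, 0.7]],        # Z[a, s', o]: observation probs
              [[0.6, 0.4], [0.1, 0.9]]])
c = np.array([[1.0, 0.0],                      # c[a, s]: per-stage cost
              [0.2, 0.6]])

grid = np.linspace(0.0, 1.0, 21)               # finite set B of belief points

def belief_update(p, a, o):
    """Bayes update; returns posterior P(state 0) and P(o | p, a)."""
    b = np.array([p, 1.0 - p])
    pred = b @ T[a]
    joint = pred * Z[a][:, o]
    prob_o = joint.sum()
    return joint[0] / prob_o, prob_o

def bellman(h):
    """One undiscounted Bellman backup over the belief grid."""
    Th = np.empty_like(h)
    for i, p in enumerate(grid):
        b = np.array([p, 1.0 - p])
        q_values = []
        for a in range(2):
            future = 0.0
            for o in range(2):
                p_next, w = belief_update(p, a, o)
                future += w * np.interp(p_next, grid, h)
            q_values.append(b @ c[a] + future)
        Th[i] = min(q_values)
    return Th

h = np.zeros_like(grid)
for _ in range(5000):                          # relative value iteration
    Th = bellman(h)
    diff = Th - h
    gain_lo, gain_hi = diff.min(), diff.max()  # sandwich bounds on g*
    h = Th - Th[0]                             # normalize to prevent drift
    if gain_hi - gain_lo < 1e-10:
        break
gain = 0.5 * (gain_lo + gain_hi)               # approximate average cost
```

The converged `gain` is a lower bound on the POMDP's liminf optimal average cost in the sense of item 1, up to the accuracy with which the surrogate MDP is solved.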
