Duality and Policy Evaluation in Distributionally Robust Bayesian Diffusion Control
We study diffusion control problems under parameter uncertainty. Controllers based on plug-in estimation can be brittle due to potential distribution shifts. Bayesian control with a prior on the parameters offers a formulation with beliefs about such shifts. However, as with any Bayesian model, the prior may be misspecified. To mitigate misspecification and reduce over-pessimism compared to classical robust control approaches (e.g., Hansen and Sargent, 2008), we propose a distributionally robust Bayesian control (DRBC) formulation in which an adversary perturbs the prior within a divergence neighborhood of a baseline prior. We develop a strong duality result that reduces the distributionally robust prior evaluation to a low-dimensional optimization and yields a practical simulation-based policy evaluation and learning procedure with structured policy parameterizations. We validate the efficiency of the algorithm on a synthetic linear-quadratic control example and real-data portfolio selection.
💡 Research Summary
The paper tackles continuous-time diffusion control problems in which a latent parameter B influences the dynamics but cannot be observed directly. Plug-in methods, which first estimate B and then apply a control law computed from that estimate, are fragile under distribution shift; pure Bayesian control assumes a correctly specified prior µ and degrades when that assumption fails. To address both issues, the authors propose a Distributionally Robust Bayesian Control (DRBC) framework. In DRBC, an adversary may perturb the prior within a Kullback–Leibler (KL) divergence ball of radius δ around the baseline prior µ, while the conditional diffusion dynamics (given B) stay unchanged. Because the adversary acts only once, at time zero, DRBC is less conservative than classical distributionally robust control (DRC), which replenishes the adversary's power at every instant. The price is that the dynamic programming principle no longer applies, so new computational tools are required.
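In symbols, the formulation described above can be sketched as a max-min problem. The notation here is an assumption introduced for illustration and may differ from the paper's: J(π; b) denotes the expected return of policy π conditional on B = b, Π the set of admissible policies, and ν the adversary's perturbed prior.

```latex
\[
  \sup_{\pi \in \Pi} \;
  \inf_{\nu \,:\, D_{\mathrm{KL}}(\nu \,\|\, \mu) \le \delta} \;
  \mathbb{E}_{B \sim \nu}\bigl[ J(\pi; B) \bigr],
  \qquad
  J(\pi; b) := \mathbb{E}^{\pi}\bigl[ \text{return} \,\big|\, B = b \bigr].
\]
```

The adversary chooses ν once, before the dynamics start, and the conditional law of the state given B = b is the same under µ and ν.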
The core theoretical contribution is a strong duality result for the inner robust-prior problem. For any fixed admissible policy π, the worst-case expected return over the KL ball can be expressed as a one-dimensional supremum over a Lagrange multiplier λ ≥ 0.
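The paper's exact dual expression is not reproduced in this summary. As a hedged illustration, the standard KL dual has the same one-dimensional form: the worst-case value equals the supremum over λ > 0 of −λ log E_µ[exp(−J(π; B)/λ)] − λδ, where J(π; b) is the expected return conditional on B = b (notation assumed here, not taken from the paper). The sketch below evaluates that dual by nested simulation on a toy model. The model, the function names, the search range for λ, and all parameter values are hypothetical choices, not the authors' procedure.

```python
import math
import random


def kl_dual_worst_case(values, delta, log_lam_bounds=(-10.0, 10.0), iters=60):
    """Estimate inf over priors nu with KL(nu || mu) <= delta of E_nu[J]
    from samples values[i] = J(B_i), B_i ~ mu, using the standard KL dual
    sup_{lam > 0} { -lam * log E_mu[exp(-J / lam)] - lam * delta }."""
    n = len(values)
    if delta <= 0.0:
        return sum(values) / n  # no ambiguity: plain Bayesian value

    def dual(log_lam):
        lam = math.exp(log_lam)  # search over log(lam) so that lam > 0
        scaled = [-v / lam for v in values]
        shift = max(scaled)  # log-sum-exp shift for numerical stability
        log_mean_exp = shift + math.log(
            sum(math.exp(s - shift) for s in scaled) / n
        )
        return -lam * log_mean_exp - lam * delta

    # Golden-section search for the maximizer; the dual is concave in lam,
    # hence unimodal in log(lam). The bounds are an assumed search range.
    lo, hi = log_lam_bounds
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        left = hi - ratio * (hi - lo)
        right = lo + ratio * (hi - lo)
        if dual(left) < dual(right):
            lo = left
        else:
            hi = right
    return dual(0.5 * (lo + hi))


def estimate_conditional_return(b, rng, u=1.0, horizon=1.0, sigma=0.2, n_paths=10):
    """Inner Monte Carlo estimate of J(b) = E[X_T | B = b] for the toy model
    dX_t = b * u dt + sigma dW_t, X_0 = 0, with a constant control u.
    X_T is sampled exactly, so no time discretization is needed."""
    total = 0.0
    for _ in range(n_paths):
        total += b * u * horizon + sigma * math.sqrt(horizon) * rng.gauss(0.0, 1.0)
    return total / n_paths


rng = random.Random(0)
prior_mean, prior_std, delta = 0.5, 1.0, 0.1

# Outer loop: draw B from the baseline prior; inner loop: simulate given B.
b_samples = [rng.gauss(prior_mean, prior_std) for _ in range(5000)]
j_hat = [estimate_conditional_return(b, rng) for b in b_samples]

nominal = sum(j_hat) / len(j_hat)
robust = kl_dual_worst_case(j_hat, delta)

# In this toy model J(B) = B is Gaussian, so the worst case is known exactly.
closed_form = prior_mean - prior_std * math.sqrt(2.0 * delta)
print(f"nominal={nominal:.3f} robust={robust:.3f} closed_form={closed_form:.3f}")
```

The exponential is applied to the conditional mean J(b), not to individual path returns, because the adversary reweights only the prior. Inner simulation noise in `j_hat` therefore acts like extra prior spread and biases the estimate slightly downward; increasing `n_paths` reduces that effect.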