An Efficient Algorithm for Thresholding Monte Carlo Tree Search
We introduce the Thresholding Monte Carlo Tree Search problem, in which, given a tree $\mathcal{T}$ and a threshold $θ$, a player must answer whether the root node value of $\mathcal{T}$ is at least $θ$. In the given tree, each internal node is labeled “MAX” or “MIN”, and the value of a MAX-labeled (MIN-labeled) internal node is the maximum (minimum) of its child values. The value of a leaf node is the mean reward of an unknown distribution, from which the player can sample rewards. For this problem, we develop a $δ$-correct sequential sampling algorithm based on the Track-and-Stop strategy that has asymptotically optimal sample complexity. We show that a ratio-based modification of the D-Tracking arm-pulling strategy substantially improves empirical sample complexity and reduces the per-round computational cost from linear to logarithmic in the number of arms.
💡 Research Summary
The paper introduces a new decision problem called Thresholding Monte Carlo Tree Search (Thresholding MCTS). In this setting a rooted tree 𝒯 is given, each internal node is labeled either “MAX” or “MIN”, and each leaf ℓ corresponds to an arm whose reward distribution belongs to a one‑parameter exponential family with unknown mean μ_ℓ. The value of a leaf is its mean; the value of an internal node is the max (for “MAX”) or min (for “MIN”) of its children’s values. The goal is to decide, with probability at least 1 − δ, whether the root value V_{s_0}(μ) is at least a prescribed threshold θ (output “win”) or not (output “lose”). This is a pure‑exploration problem: the tree structure is fixed, and the algorithm may sample any leaf repeatedly.
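The value recursion and the win/lose decision described above can be sketched in a few lines of Python. This is a minimal illustration that assumes the leaf means are known, which the player in the actual problem does not have; in practice the same recursion would be applied to estimated means. The names `Node`, `value`, and `answer` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical tree node: label is "MAX", "MIN", or "LEAF"."""
    label: str = "LEAF"
    mean: Optional[float] = None            # only used by leaves
    children: List["Node"] = field(default_factory=list)


def value(node: Node) -> float:
    """Leaf value is its mean; internal value is the max/min over children."""
    if node.label == "LEAF":
        return node.mean
    child_values = [value(child) for child in node.children]
    return max(child_values) if node.label == "MAX" else min(child_values)


def answer(root: Node, theta: float) -> str:
    """Return "win" if the root value is at least theta, otherwise "lose"."""
    return "win" if value(root) >= theta else "lose"


# Illustrative depth-2 tree with made-up leaf means (assumption, not paper data).
root = Node("MAX", children=[
    Node("MIN", children=[Node(mean=0.7), Node(mean=0.4)]),
    Node("MIN", children=[Node(mean=0.6), Node(mean=0.55)]),
])

print(value(root))        # root value under the assumed means
print(answer(root, 0.5))
print(answer(root, 0.6))
```

With these assumed means the two MIN nodes evaluate to 0.4 and 0.55, so the MAX root evaluates to 0.55: the answer is “win” for θ = 0.5 and “lose” for θ = 0.6.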
Theoretical lower bound.
Using the framework of Garivier and Kaufmann (2016) and Degenne and Koolen (2019), the authors derive an asymptotic lower bound on the expected stopping time of any δ‑correct algorithm.
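For orientation, lower bounds derived in the Garivier and Kaufmann (2016) framework generally take the form below. This is a sketch of the generic single-answer bound, not the paper's exact statement. The notation is assumed here: $\tau_\delta$ is the stopping time, $T^*(\mu)$ the characteristic time, $w$ a vector of sampling proportions over the leaves, $\mathrm{Alt}(\mu)$ the set of instances whose correct answer differs from that of $\mu$, and $d$ the KL divergence between two leaf distributions with the given means.

```latex
\[
  \liminf_{\delta \to 0}
  \frac{\mathbb{E}_{\mu}[\tau_\delta]}{\log(1/\delta)} \;\ge\; T^*(\mu),
  \qquad
  T^*(\mu)^{-1}
  \;=\;
  \sup_{w \in \Delta}\;
  \inf_{\lambda \in \mathrm{Alt}(\mu)}\;
  \sum_{\ell} w_\ell \, d(\mu_\ell, \lambda_\ell),
\]
% \Delta denotes the probability simplex over the leaves (assumed notation).
```

In the thresholding setting, $\mathrm{Alt}(\mu)$ would consist of the mean vectors $\lambda$ for which the root value falls on the other side of θ.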