Accelerating Monte-Carlo Tree Search with Optimized Posterior Policies

Reading time: 2 minutes
...

📝 Original Info

  • Title: Accelerating Monte-Carlo Tree Search with Optimized Posterior Policies
  • ArXiv ID: 2601.01301
  • Date: 2026-01-03
  • Authors: Keith Frankston, Benjamin Howard

๐Ÿ“ Abstract

We introduce a recursive AlphaZero-style Monte-Carlo tree search algorithm, "RMCTS". The advantage of RMCTS over AlphaZero's MCTS-UCB [3] is speed. In RMCTS, the search tree is explored in a breadth-first manner, so that network inferences naturally occur in large batches. This significantly reduces the GPU latency cost. We find that RMCTS is often more than 40 times faster than MCTS-UCB when searching a single root state, and about 3 times faster when searching a large batch of root states. The recursion in RMCTS is based on computing optimized posterior policies at each game state in the search tree, starting from the leaves and working back up to the root. Here we use the posterior policy explored in "Monte-Carlo tree search as regularized policy optimization" [1]. Their posterior policy is the unique policy which maximizes the expected reward, given estimated action rewards, minus a penalty for diverging from the prior policy. The tree explored by RMCTS is not defined in an adaptive manner, as it is in MCTS-UCB. Instead, the RMCTS tree is defined by following prior network policies at each node. This is a disadvantage, but the speedup advantage is more significant, and in practice we find that RMCTS-trained networks match the quality of MCTS-UCB...
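The abstract's "optimized posterior policy" refers to the regularized-policy-optimization result of [1]: maximizing expected estimated reward minus a KL-divergence penalty from the prior yields a closed-form policy π(a) ∝ λ·prior(a) / (α − q(a)), where the normalizing constant α is found numerically. A minimal sketch of that computation (function name, bisection bracket, and parameter choices are ours, not from the paper):

```python
import numpy as np

def optimized_posterior(prior, q, lam=1.0, iters=80):
    """Sketch of the regularized posterior policy of [1]:
    argmax over the simplex of  q . pi - lam * KL(prior || pi),
    which has closed form  pi(a) = lam * prior(a) / (alpha - q(a)).
    alpha is pinned down by requiring pi to sum to 1; the total mass
    is strictly decreasing in alpha, so bisection suffices."""
    prior = np.asarray(prior, dtype=float)
    q = np.asarray(q, dtype=float)
    # Bracket alpha: mass >= 1 at lo, mass <= 1 at hi.
    lo = np.max(q + lam * prior)
    hi = np.max(q) + lam
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        mass = np.sum(lam * prior / (mid - q))
        if mass > 1.0:
            lo = mid
        else:
            hi = mid
    alpha = 0.5 * (lo + hi)
    pi = lam * prior / (alpha - q)
    return pi / pi.sum()  # renormalize away residual bisection error
```

A large `lam` keeps the posterior close to the prior, while a small `lam` concentrates mass on the highest-reward actions; in RMCTS these posteriors would be computed at the leaves and propagated back to the root.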

📄 Full Content

...(The full text is omitted here due to its length. Please see the original site for the complete article.)
