AUPO -- Abstracted Until Proven Otherwise: A Reward Distribution Based Abstraction Algorithm


📝 Original Info

  • Title: AUPO – Abstracted Until Proven Otherwise: A Reward Distribution Based Abstraction Algorithm
  • ArXiv ID: 2510.23214
  • Date: 2025-10-27
  • Authors: Not available (no author information was provided with the paper)

📝 Abstract

We introduce AUPO, a novel drop-in modification to the decision policy of Monte Carlo Tree Search (MCTS). Comparisons on a range of IPPC benchmark problems show that AUPO clearly outperforms plain MCTS. AUPO is an automatic action abstraction algorithm that relies solely on reward distribution statistics acquired during the tree search. Unlike other automatic abstraction algorithms, AUPO therefore requires neither access to transition probabilities nor a directed acyclic search graph to build its abstraction, which allows it to detect symmetric actions that state-of-the-art frameworks like ASAP struggle with when the resulting symmetric states are far apart in state space. Furthermore, since AUPO affects only the decision policy, it is not mutually exclusive with other abstraction techniques that affect only the tree search.
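The paper's pseudocode is not reproduced here, but the core idea the abstract describes can be illustrated: treat actions as one abstract action until their empirical reward distributions prove distinguishable. The sketch below is a hypothetical illustration of that principle only; the function names (`welch_t`, `group_actions`), the choice of Welch's t-statistic, and the thresholds are assumptions, not AUPO's actual statistical test or parameters.

```python
import math

def welch_t(xs, ys):
    """Two-sample Welch t-statistic; a large |t| suggests the two
    reward samples come from distributions with different means."""
    nx, ny = len(xs), len(ys)
    mx, my = sum(xs) / nx, sum(ys) / ny
    vx = sum((x - mx) ** 2 for x in xs) / max(nx - 1, 1)
    vy = sum((y - my) ** 2 for y in ys) / max(ny - 1, 1)
    denom = math.sqrt(vx / nx + vy / ny) or 1e-12
    return (mx - my) / denom

def group_actions(reward_samples, threshold=2.0, min_samples=5):
    """Greedily merge actions whose reward samples cannot be told
    apart (|t| below threshold): 'abstracted until proven otherwise'.
    Under-sampled actions stay merged, since no evidence separates them.
    reward_samples maps action id -> list of sampled returns."""
    groups = []  # each group is a list of action ids sharing one abstraction
    for action, xs in reward_samples.items():
        placed = False
        for group in groups:
            rep = reward_samples[group[0]]  # representative member's samples
            if (len(xs) < min_samples or len(rep) < min_samples
                    or abs(welch_t(xs, rep)) < threshold):
                group.append(action)
                placed = True
                break
        if not placed:
            groups.append([action])
    return groups

# Illustrative usage: actions 'a' and 'b' have near-identical reward
# statistics and get abstracted together; 'c' is clearly different.
rewards = {
    "a": [1.0, 1.1, 0.9, 1.0, 1.05, 0.95],
    "b": [1.02, 0.98, 1.0, 1.1, 0.9, 1.0],
    "c": [5.0, 5.1, 4.9, 5.0, 5.05, 4.95],
}
print(group_actions(rewards))  # → [['a', 'b'], ['c']]
```

In an MCTS decision policy, such groups could share visit and value statistics so that symmetric actions are explored once rather than independently; since this touches only action selection, it composes with tree-level abstraction techniques, as the abstract notes.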


Reference

This content is AI-processed based on open access ArXiv data.
