Minimax and Bayes Optimal Best-Arm Identification


This study investigates minimax and Bayes optimal strategies for fixed-budget best-arm identification. We consider an adaptive procedure consisting of a sampling phase followed by a recommendation phase, and we design an adaptive experiment within this framework to efficiently identify the best arm, defined as the one with the highest expected outcome. In our proposed strategy, the sampling phase consists of two stages. The first stage is a pilot phase, in which we allocate samples uniformly across arms to eliminate clearly suboptimal arms and to estimate outcome variances. Before entering the second stage, we solve a Gaussian minimax game, which yields a sampling ratio and a decision rule. In the second stage, samples are allocated according to this sampling ratio. After the sampling phase, the procedure enters the recommendation phase, where we select an arm using the decision rule. We prove that this single strategy is simultaneously asymptotically minimax and Bayes optimal for the simple regret, and we establish upper bounds that coincide exactly with our lower bounds, including the constant terms.


💡 Research Summary

This paper addresses the fixed‑budget best‑arm identification (BAI) problem, where a decision maker must allocate a pre‑specified number of samples T across K stochastic arms and, after sampling, recommend the arm with the highest expected reward. The performance metric is simple regret, defined as the expected difference between the mean of the true best arm and that of the recommended arm.
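The simple regret metric is easy to state concretely. A minimal sketch, using a hypothetical four-arm Gaussian instance (the means below are illustrative, not from the paper):

```python
import numpy as np

# Hypothetical bandit instance: true means of K = 4 arms (used only for evaluation).
means = np.array([0.5, 0.45, 0.3, 0.2])

def simple_regret(recommended_arm: int, means: np.ndarray) -> float:
    """Simple regret: gap between the best arm's mean and the recommended arm's mean."""
    return float(means.max() - means[recommended_arm])

# Recommending the true best arm gives zero regret; any other arm pays its gap.
assert simple_regret(0, means) == 0.0
```

Note that simple regret scores only the final recommendation; unlike cumulative regret, rewards collected during the sampling phase do not count against the algorithm.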

The authors propose a two‑stage adaptive design called TS‑SPAS (Two‑Stage Saddle‑Point Allocation with Screening). The procedure consists of a sampling phase followed by a recommendation phase. In the sampling phase, the total budget T is split into two stages.

Stage 1 (Pilot/Screening). A modest fraction of the budget is spent sampling each arm uniformly. This yields crude estimates of each arm’s mean and variance and allows the algorithm to discard arms that are clearly sub‑optimal with high confidence. The screening step reduces the effective number of arms that must be considered in the second stage, thereby improving sample efficiency.
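The screening step can be sketched as follows. This is a minimal illustration, not the paper's exact rule: the Gaussian-style confidence-bound threshold, the `pilot_screen` helper, and the three-arm instance are all assumptions for exposition.

```python
import numpy as np

rng = np.random.default_rng(0)

def pilot_screen(rewards_per_arm, delta=0.05):
    """Illustrative screening rule: keep arms whose upper confidence bound
    reaches the largest lower confidence bound (Gaussian-style intervals)."""
    means = np.array([np.mean(r) for r in rewards_per_arm])
    stds = np.array([np.std(r, ddof=1) for r in rewards_per_arm])
    counts = np.array([len(r) for r in rewards_per_arm])
    width = np.sqrt(2.0 * np.log(1.0 / delta)) * stds / np.sqrt(counts)
    kept = np.flatnonzero(means + width >= (means - width).max())
    return kept, means, stds

# Three hypothetical Gaussian arms; the third is clearly suboptimal and
# should be eliminated, while its variance estimate is still retained.
pilot = [rng.normal(m, 0.1, size=100) for m in (1.0, 0.95, -5.0)]
kept, est_means, est_stds = pilot_screen(pilot)
```

The pilot data serves double duty: the same samples that drive the elimination decision also produce the variance estimates needed to set up the Stage 2 allocation problem.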

Stage 2 (Saddle‑Point Allocation). After screening, the remaining arms are allocated samples according to a solution of a Gaussian minimax game. Formally, the algorithm solves a saddle‑point problem of the form

\[
\min_{w \in \Delta_K,\ \widehat{a}} \; \max_{\mu} \; \mathbb{E}_{\mu}\!\big[\mu_{a^\star(\mu)} - \mu_{\widehat{a}}\big],
\]

where \(w\) ranges over sampling ratios in the simplex \(\Delta_K\), \(\widehat{a}\) over recommendation rules, and \(\mu\) over Gaussian instances with the variances estimated in Stage 1; \(a^\star(\mu)\) denotes the best arm under \(\mu\). The saddle point yields both the Stage 2 sampling ratio and the decision rule applied in the recommendation phase.
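To give a feel for the kind of sampling ratio such a game produces: with two Gaussian arms, the worst‑case‑optimal allocation is the classical Neyman allocation, which samples each arm in proportion to its standard deviation. A minimal sketch, assuming this Neyman‑type ratio purely as an illustrative stand‑in (for K > 2 the game generally must be solved numerically, and the paper's exact solution may differ):

```python
import numpy as np

def neyman_allocation(stds):
    """Neyman-type sampling ratio: allocate in proportion to standard deviations.
    Exact minimax rule for two Gaussian arms; shown here only as an illustration
    of how variance estimates from Stage 1 shape the Stage 2 allocation."""
    stds = np.asarray(stds, dtype=float)
    return stds / stds.sum()

# The noisiest arm receives the largest share of the remaining budget.
w = neyman_allocation([2.0, 1.0, 1.0])
```

The qualitative takeaway matches the paper's design: high-variance arms are harder to rank, so the minimax allocation deliberately oversamples them relative to uniform sampling.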

