An Algorithm for Fixed Budget Best Arm Identification with Combinatorial Exploration

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

We consider the best arm identification (BAI) problem in the $K$-armed bandit framework with a modification: the agent is allowed to play a subset of arms at each time slot instead of one arm. Consequently, the agent observes the sample average of the rewards of the arms that constitute the probed subset. Several trade-offs arise here. For example, sampling a larger number of arms together gives a wider view of the environment, while sampling fewer arms yields more information about individual reward distributions. Furthermore, although grouping a large number of suboptimal arms together reduces the variance of the group's reward, it may raise the group's mean close to that of the group containing the optimal arm. To solve this problem, we propose an algorithm that constructs $\log_2 K$ groups and performs a likelihood ratio test to detect the presence of the best arm in each of these groups. A Hamming decoding procedure then determines the unique best arm. We derive an upper bound on the error probability of the proposed algorithm in terms of a new hardness parameter $H_4$. Finally, we demonstrate cases in which it outperforms state-of-the-art algorithms for the single-play case.


💡 Research Summary

The paper introduces a novel fixed‑budget best‑arm identification (BAI) framework that departs from the traditional single‑play setting of multi‑armed bandits. Instead of pulling a single arm at each round, the learner may select any subset of arms and observes only the sample average of the rewards of the selected arms. This “combinatorial exploration” creates a trade‑off: larger subsets provide a broader view of the environment but dilute information about individual arms, while smaller subsets yield more precise estimates of each arm’s mean.
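The observation model described above can be illustrated with a minimal simulation sketch. The Gaussian reward model, the common standard deviation `sigma`, the example means, and the helper name `play_subset` are all assumptions made for illustration; they are not taken from the paper.

```python
import random
import statistics

def play_subset(means, subset, rng, sigma=1.0):
    """Play every arm in `subset` once and return the average of their rewards.

    Assumption: rewards are independent Gaussians with a common std `sigma`.
    """
    rewards = [rng.gauss(means[i], sigma) for i in subset]
    return sum(rewards) / len(rewards)

rng = random.Random(0)
means = [0.9, 0.5, 0.4, 0.3]  # hypothetical arm means; arm 0 is the best

# Probing one arm: noisy, but informative about that arm alone.
single = [play_subset(means, [0], rng) for _ in range(2000)]
# Probing all four arms: lower variance, but only the group mean is visible.
group = [play_subset(means, [0, 1, 2, 3], rng) for _ in range(2000)]

print("single-arm variance:", statistics.pvariance(single))
print("group variance:     ", statistics.pvariance(group))
print("group sample mean:  ", statistics.fmean(group))
```

Under these assumptions the variance of a group observation shrinks roughly in proportion to the subset size, while its mean reflects only the average of the arm means in the subset, which is the trade-off the summary describes.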

To exploit this setting, the authors propose the Rapid Exploration (RE) algorithm. The key idea is to encode the identity of each arm using a binary Hamming code. Specifically, for K arms they construct log₂K groups G₁,…,G_{log₂K} where arm i (0‑based) belongs to group G_k if the k‑th bit of i’s binary representation is 1. When K is a power of two, each group therefore contains K/2 arms, and arm i appears in as many groups as its index has 1 bits. This construction mirrors the parity‑check matrix of a Hamming code: the pattern of groups that contain the optimal arm spells out its index in binary, so log₂K group‑level tests suffice to identify it.
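The bit-based grouping can be sketched in a few lines. This is an illustrative sketch only: the function names `build_groups` and `decode` are hypothetical, K is assumed to be a power of two, and groups are indexed from 0 with the least significant bit first, which may differ from the paper's indexing.

```python
def build_groups(K):
    """Return log2(K) groups; group k holds the arms whose k-th bit is 1."""
    n_bits = K.bit_length() - 1
    if K < 2 or (1 << n_bits) != K:
        raise ValueError("this sketch assumes K is a power of two, K >= 2")
    return [[i for i in range(K) if (i >> k) & 1] for k in range(n_bits)]

def decode(bits):
    """Rebuild an arm index from its per-group membership bits (LSB first)."""
    return sum(b << k for k, b in enumerate(bits))

groups = build_groups(8)
for k, g in enumerate(groups):
    print(f"group {k}: {g}")

# The membership pattern of an arm across the groups is its binary index.
membership = [int(5 in g) for g in groups]
print("membership of arm 5:", membership, "-> decoded:", decode(membership))
```

Note that arm 0 belongs to no group under this construction, so it is identified by every group-level test rejecting the presence of the best arm.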

For each group, the algorithm performs a likelihood‑ratio test (LRT) between two hypotheses: H₀, the optimal arm is not in the group, and H₁, it is. The test uses a worst‑case (uniform) prior and the observed group reward, whose expectation is the average of the individual arm means in that group. Once the budget of T pulls is spent, the algorithm computes the LRT statistic for every group. The test outcomes (1 where H₁ is accepted, 0 otherwise) form a binary vector y; applying the Hamming decoding (i.e., solving H·a* = y mod 2) yields the unique index a* of the best arm.
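The test-then-decode pipeline can be sketched on a toy instance. This is not the paper's test: it is a textbook Gaussian likelihood-ratio test between two simple hypotheses, under the strong assumptions that all suboptimal arms share one known mean `mu_rest`, the best arm has known mean `mu_best`, rewards are Gaussian with known `sigma`, and K is a power of two. All names here are hypothetical.

```python
import random

def gaussian_llr(samples, mu0, mu1, sd):
    """Log-likelihood ratio log p1/p0 for i.i.d. N(mu, sd^2) samples."""
    return sum(((x - mu0) ** 2 - (x - mu1) ** 2) / (2 * sd ** 2) for x in samples)

def identify_best_arm(K, best, mu_best, mu_rest, pulls_per_group, rng, sigma=1.0):
    """Toy pipeline: one LRT per bit-group, then binary decoding.

    `best` is used only by the simulator to generate rewards.
    """
    n_bits = K.bit_length() - 1       # log2(K) for K a power of two
    m = K // 2                        # size of every group
    mu0 = mu_rest                     # group mean if the best arm is absent
    mu1 = mu_rest + (mu_best - mu_rest) / m  # group mean if it is present
    sd = sigma / m ** 0.5             # std of an average of m rewards
    bits = []
    for k in range(n_bits):
        group = [i for i in range(K) if (i >> k) & 1]
        samples = []
        for _ in range(pulls_per_group):
            rewards = [rng.gauss(mu_best if i == best else mu_rest, sigma)
                       for i in group]
            samples.append(sum(rewards) / len(rewards))
        # Equal priors: accept H1 when the log-likelihood ratio is positive.
        bits.append(1 if gaussian_llr(samples, mu0, mu1, sd) > 0 else 0)
    return sum(b << k for k, b in enumerate(bits))

rng = random.Random(1)
estimate = identify_best_arm(K=8, best=5, mu_best=1.0, mu_rest=0.0,
                             pulls_per_group=400, rng=rng)
print("estimated best arm:", estimate)
```

With equal variances and equal priors, this test reduces to comparing each group's empirical mean with the midpoint of `mu0` and `mu1`. The gap between the two hypotheses shrinks as the group size grows, which is one reason the difficulty of the grouped problem differs from the single-play case.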

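The theoretical contribution centers on a new hardness parameter $H_4$, in terms of which the authors derive an upper bound on the error probability of the proposed algorithm.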

