The Sampling Complexity of Condorcet Winner Identification in Dueling Bandits


Authors: El Mehdi Saad, Victor Thuot, Nicolas Verzelen

El Mehdi Saad∗ (UM6P College of Computing, Rabat, Morocco) elmehdi.saad@um6p.ma
Victor Thuot∗ (INRAE, MISTEA, Institut Agro, Univ. Montpellier, Montpellier, France) victor.thuot@inrae.fr
Nicolas Verzelen (INRAE, MISTEA, Institut Agro, Univ. Montpellier, Montpellier, France) nicolas.verzelen@inrae.fr
∗ Equal contribution.

Abstract. We study best-arm identification in stochastic dueling bandits under the sole assumption that a Condorcet winner exists, i.e., an arm that wins each noisy pairwise comparison with probability at least $1/2$. We introduce a new identification procedure that exploits the full gap matrix $\Delta_{i,j} = q_{i,j} - \frac{1}{2}$ (where $q_{i,j}$ is the probability that arm $i$ beats arm $j$), rather than only the gaps between the Condorcet winner and the other arms. We derive high-probability, instance-dependent sample-complexity guarantees that (up to logarithmic factors) improve the best known ones by leveraging informative comparisons beyond those involving the winner. We complement these results with new lower bounds which, to our knowledge, are the first for Condorcet-winner identification in stochastic dueling bandits. Our lower-bound analysis isolates the intrinsic cost of locating informative entries in the gap matrix and estimating them to the required confidence, establishing the optimality of our non-asymptotic bounds. Overall, our results reveal new regimes and trade-offs in the sample complexity that are not captured by asymptotic analyses based only on the expected budget.

Keywords. Best arm identification, Dueling bandits, Query complexity, Condorcet winner.

1 Motivation and High-Level Overview

In many modern machine learning applications, obtaining trustworthy absolute feedback can be difficult, expensive, or systematically biased. By contrast, relative judgments are often easier to elicit and can be highly informative.
This is especially apparent in information retrieval and recommendation systems, where users more naturally compare two alternatives such as rankings, models, or interfaces than provide calibrated relevance scores [Joachims et al., 2007, Hofmann et al., 2016]. The dueling bandits framework formalizes this paradigm by allowing a learner to adaptively query pairs of arms and observe only a noisy binary outcome indicating which arm is preferred. At each round, the learner selects a pair of arms $(i,j) \in [K] \times [K]$ and observes the outcome of their duel: the feedback is 1 if arm $i$ is preferred to arm $j$, and 0 otherwise. This observation is modeled as a Bernoulli random variable with unknown parameter $q_{i,j} \in [0,1]$. The collection of pairwise preference probabilities is represented by the matrix $Q = (q_{i,j})_{i,j \in [K]}$. Since self-comparisons are uninformative and preferences are anti-symmetric, namely $q_{i,j} = 1 - q_{j,i}$ for all $i,j \in [K]$, the unknown matrix $Q$ satisfies a skew-symmetry condition. Equivalently, we define the gap matrix $\Delta = (\Delta_{i,j})_{i,j \in [K]}$ by $\Delta_{i,j} := q_{i,j} - 1/2$, which satisfies $\Delta_{i,j} = -\Delta_{j,i}$ and $\Delta_{i,i} = 0$.

Stochastic dueling bandits have been studied extensively under a variety of structural assumptions on $Q$ and with multiple notions of optimality [Bengs et al., 2021, Komiyama et al., 2015, Falahatgar et al., 2017, 2018, Ren et al., 2020, Jamieson et al., 2015, Zoghi et al., 2015a, Haddenhorst et al., 2021b]. Unlike in the classical multi-armed bandit setting, defining an "optimal arm" is not immediate, which has led to several competing winner definitions; see the survey of Bengs et al. [2021]. In this work, we focus on instances where a distinguished arm $i^* \in [K]$ defeats, in expectation, every other arm, i.e., $q_{i^*,j} > 1/2$ for all $j \in [K] \setminus \{i^*\}$.
Such an arm is called a Condorcet winner (CW) and is unique; we also refer to it as the optimal arm. Most existing work on dueling bandits assumes the existence of a CW [Zoghi et al., 2014, 2015b, Li et al., 2020, Komiyama et al., 2015, Chen and Frazier, 2017, Saha and Gaillard, 2022, Saha and Gupta, 2022], or even imposes the stronger requirement that the arms admit a total order [Yue et al., 2012, Yue and Joachims, 2009, Chen and Frazier, 2017]. Alternative notions of optimality are discussed in Section A.

Objective: Condorcet winner identification. Given $\delta \in (0,1)$, the learner must output the CW $i^*$ with probability at least $1-\delta$ by adaptively and sequentially choosing pairs $(i,j)$ to be compared, and by choosing a stopping time. We evaluate an algorithm by its (random) number of duels $N_\delta$, called the budget, and we seek instance-dependent guarantees in terms of the centered gaps $\Delta_{i,j} = q_{i,j} - \frac{1}{2}$, which encode both the preference direction (e.g., $\Delta_{i,j} > 0$ means $i$ beats $j$) and the statistical difficulty.

State-of-the-art. Although CW identification has attracted considerable attention [Komiyama et al., 2015, Ailon et al., 2014, Chen and Frazier, 2017, Saha and Gaillard, 2022, Peköz et al., 2022], the optimal budget for this task remains poorly understood. Karnin [2016] introduced a verification-based approach. For a fixed gap matrix $\Delta$, the expected budget of this procedure asymptotically satisfies
$$\lim_{\delta \to 0} \frac{\mathbb{E}[N_\delta]}{\log(1/\delta)} \;\leq\; c \sum_{i \neq i^*} \min_{j:\, \Delta_{i,j} < 0} \frac{1}{\Delta_{i,j}^2}, \qquad (1)$$
where $c$ is a positive numerical constant. The bound (1) can be interpreted as the sum, over all non-CW arms $i$, of $\log(1/\delta)/[\min_j \Delta_{i,j}]^2$, which is the minimal budget required to check whether the row $\Delta_{i,\cdot}$ is non-negative when an oracle provides the learner with the best opponent of $i$. In that respect, (1) seems the best we can hope for.
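As a toy illustration of the model and of the oracle quantity in (1), the sketch below simulates noisy duels from an invented preference matrix, locates the Condorcet winner, and evaluates the $\delta$-free factor of (1); in the actual bandit problem $\Delta$ is of course unknown, and the helper names are ours, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented skew-symmetric gap matrix for K = 3 arms; arm 0 is the CW.
Delta = np.array([
    [0.0,   0.05,  0.2],
    [-0.05, 0.0,   0.1],
    [-0.2, -0.1,   0.0],
])
Q = Delta + 0.5              # preference matrix: q[i, j] = P(i beats j)
assert np.allclose(Delta, -Delta.T)

def duel(i, j):
    """One noisy comparison: returns 1 if arm i beats arm j."""
    return int(rng.random() < Q[i, j])

def condorcet_winner(Delta):
    """Index of the arm whose off-diagonal row is positive, or None."""
    K = Delta.shape[0]
    for i in range(K):
        if all(Delta[i, j] > 0 for j in range(K) if j != i):
            return i
    return None

def oracle_complexity(Delta):
    """Sum over non-CW arms of the inverse squared strongest-opponent
    gap, i.e. the delta-free factor in the asymptotic bound (1)."""
    i_star = condorcet_winner(Delta)
    K = Delta.shape[0]
    return sum(1.0 / min(Delta[i, j] for j in range(K)
                         if Delta[i, j] < 0) ** 2
               for i in range(K) if i != i_star)

wins = sum(duel(0, 2) for _ in range(2000))  # concentrates near Q[0, 2] = 0.7

print(condorcet_winner(Delta))   # arm 0
print(oracle_complexity(Delta))  # ~ 1/0.05^2 + 1/0.2^2 = 425
```

Here arm 1's strongest opponent is the CW itself (gap $-0.05$), while arm 2 is beaten most decisively by the CW (gap $-0.2$), so the two rows contribute $400$ and $25$ respectively.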
However, this bound (1) possibly hides important characteristics of the budget: (i) the bound (1) is purely asymptotic and hides sizable additive $O(K^2)$ terms whose optimality is questionable; (ii) furthermore, in pure-exploration bandit problems, the expected budget $\mathbb{E}[N_\delta]$ is possibly much smaller than the $(1-\delta)$-quantile of $N_\delta$ and thereby provides a too optimistic view of the sample complexity of the problem; see e.g. Mannor and Tsitsiklis [2004]. The analysis of high-probability bounds on the budget is also of paramount importance when one wants to move to fixed-budget problems, as we do here.

Recently, Maiti et al. [2024] developed quite a different algorithm, which exploits the fact that the CW row is the unique one with only positive gaps. They obtained a high-probability guarantee on the budget of the order of $H_{\mathrm{cw}}(\delta)$, where
$$H_{\mathrm{cw}}(\delta) := \log(1/\delta) \sum_{i \neq i^*} \frac{1}{\Delta_{i^*,i}^2}. \qquad (2)$$
In the specific scenario where the CW is the strongest opponent of every suboptimal arm, that is,
$$i^* = \operatorname*{argmin}_{j:\, \Delta_{i,j} < 0} \Delta_{i,j}, \quad \forall i \neq i^*, \qquad \text{(CW-SO)}$$
the bounds (1) and (2) match. However, the bound (2) can be overly conservative when some arm $i$ is nearly tied with $i^*$, as it largely ignores potentially informative comparisons among suboptimal arms.

Very few lower bounds have been developed for CW identification. The closest to our setting are due to Haddenhorst et al. [2021a], who study the 'testification' problem of (i) testing whether a CW exists and (ii) identifying it when it does. They derive a lower bound on the expected budget of order (2). Since our objective is identification alone, under the standing assumption that a CW exists, their results unfortunately do not imply lower bounds for CW identification. More precisely, the construction in Haddenhorst et al. [2021a] fundamentally leverages the testing component, and thus cannot be adapted verbatim to an identification-only proof.
Altogether, these results suggest that (2) (or equivalently (1)) is the optimal sampling complexity under the restrictive condition (CW-SO), although we are not aware of a matching lower bound. However, beyond this scenario, the instance-dependent query complexity of CW identification is far from being understood. This naturally raises the following open problem.

Open Problem. What is the sampling complexity of CW identification and how does it depend on the gap matrix $\Delta$?

This question falls within structured pure exploration, where the feedback is noisy but constrained by an underlying latent object (here, a skew-symmetric matrix possessing a positive row), so the goal is to exploit structure rather than estimate all entries. Related challenges arise in noisy payoff matrix games [Maiti, 2025], e.g., in pure Nash equilibrium identification.

Contributions. As a starting point, we confirm that, in the asymptotic regime $\delta \to 0$ and for the expected budget, the scaling in (1) is essentially optimal, by developing a matching instance-dependent lower bound; see Theorem 3.1. This also confirms that, under Condition (CW-SO), the bound (2) is optimal. However, beyond this specific scenario, the sampling complexity of CW identification is much more subtle when we aim for non-asymptotic and high-probability guarantees on the budget. Our main contributions are threefold: (i) we introduce new elimination-based algorithms for both the fixed-budget and fixed-confidence settings and provide non-asymptotic guarantees; (ii) we establish matching lower bounds; (iii) overall, this allows us to highlight the trade-offs and the multiple strategies that underlie CW identification. For the sake of simplicity, we mainly discuss our results in the $\delta$-PAC setting, although analogous results are proved for fixed-budget problems.
At a high level, our elimination-based procedure (FC-CWI), described in Algorithms 1 and 2, iteratively scores the current candidates for the CW using subroutines that (i) search in the gap matrix $\Delta$ for informative comparisons and exploit its skew-symmetric structure, and (ii) estimate the signs of the discovered entries with sufficient accuracy. Candidates are then ranked by these scores, and a constant fraction of arms is eliminated at each round. Our analysis reveals a delicate dependence on the full gap matrix $\Delta$. Indeed, providing evidence that $i^*$ is the CW amounts either to showing that all the CW gaps $\{\Delta_{i^*,i}\}_{i \neq i^*}$ are positive, or to showing that no arm $i \neq i^*$ is the CW. The evidence of sub-optimality for a given arm $i \neq i^*$ is governed both by the number of negative entries in its row, $K_{i;<0} := |\{j : \Delta_{i,j} < 0\}|$, and by the magnitudes of these gaps, denoted by the ordered values $\Delta_{i,(1)} \leq \cdots \leq \Delta_{i,(K_{i;<0})} < 0$. For each $i \neq i^*$, fix an integer $s_i \leq K_{i;<0}$ and write $\mathbf{s} = (s_1, \ldots, s_K)$. The following results will involve a trade-off in $\mathbf{s}$. Our analysis decomposes the complexity into $H_{\mathrm{cw}}(\delta)$ (see (2)), which corresponds to the cost of separating $i^*$ from every competitor relying only on duels with $i^*$, as well as two new components:

• Exploration/Selection cost. This term quantifies the effort required to select, in each suboptimal row, a negative entry whose absolute value is at least $|\Delta_{i,(s_i)}|$:
$$H_{\mathrm{explore}}(\mathbf{s}, \delta) := \max_{i \neq i^*} \frac{K \log(1/\delta)}{s_i \Delta_{i,(s_i)}^2} + \sum_{i \neq i^*} \frac{K}{s_i \Delta_{i,(s_i)}^2}. \qquad (3)$$
Note that the second term on the right-hand side is independent of $\delta$ and accounts for the fact that the cost of looking for an entry of magnitude at least $|\Delta_{i,(s_i)}|$ out of $K$ depends on both the number $s_i$ of such entries and the magnitude $|\Delta_{i,(s_i)}|$. The $\log(1/\delta)$-dependency only arises for a single arm $i \neq i^*$.

• Certification cost.
This term corresponds to the number of samples required to estimate the signs of the selected gaps (at the exploration step) at confidence level $1-\delta$:
$$H_{\mathrm{certify}}(\mathbf{s}, \delta) := \sum_{i \neq i^*} \frac{\log(1/\delta)}{\Delta_{i,(s_i)}^2}.$$

Our main upper bound shows that, with probability at least $1-\delta$, the budget $N_\delta$ of FC-CWI satisfies
$$N_\delta \;\lesssim\; H_{\mathrm{cw}}(\delta) \;\wedge \min_{\substack{(s_i)_{i \neq i^*} \\ \forall i,\; s_i \leq K_{i;<0} \wedge K/8}} \big\{ H_{\mathrm{certify}}(\mathbf{s}, \delta) + H_{\mathrm{explore}}(\mathbf{s}, \delta) \big\}, \qquad (4)$$
where the notation $\lesssim$ hides logarithmic factors in $K$ and $(\Delta_{i,(1)})_{i \neq i^*}$, and a $\log\log(1/\delta)$ factor. Under the scenario (CW-SO), our procedure still achieves a budget smaller than $H_{\mathrm{cw}}(\delta)$, as in Maiti et al. [2024], but it also achieves better guarantees for other gap matrices $\Delta$, where the budget is driven by the right-hand side in (4). In the above infimum in (4), the smaller the $s_i$'s are, the smaller $H_{\mathrm{certify}}(\mathbf{s}, \delta)$ is, but the exploration cost $H_{\mathrm{explore}}(\mathbf{s}, \delta)$ for localizing a good candidate can increase for small $s_i$'s. In the following, we denote by $\mathbf{s}^*_\Delta$ the vector $(s^*_i)_{i \neq i^*}$ achieving the best trade-off in Equation (4). We interpret $\mathbf{s}^*_\Delta$ as an effective sparsity of $\Delta$, although it also depends on $\delta$. Importantly, our algorithm does not take $\mathbf{s}^*_\Delta$ as input and therefore automatically achieves the best balance captured by (4).

To characterize the optimality of (4), we establish lower bounds on the $\delta$-quantile of the budget of any algorithm. Although we state distribution-dependent-like results in Section 3, we discuss here their Corollary 3.3, which has a local minimax flavor. The budget condition (4) only depends on the gap matrix $\Delta$ through three vectors: (i) the CW row $\Delta_{i^*,\cdot}$; (ii) the effective sparsity $\mathbf{s}^*_\Delta$; and (iii) the gaps at the sparsity level $\mathbf{s}^*$: $(\Delta_{i,(s^*_i)})_{i \neq i^*}$.
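The certification/exploration trade-off behind (4) can be made concrete by brute force on a known gap matrix. The sketch below is ours, not the paper's code, and simplifies the optimization by using a single common sparsity level $s_i = s$ for all suboptimal rows (the paper allows a different $s_i$ per row); the instance is invented.

```python
import numpy as np

def tradeoff_terms(Delta, i_star, delta):
    """Brute-force the inner minimum of (4) over a common sparsity level s:
    for each s, compute H_certify(s, delta) + H_explore(s, delta) as in (3)
    and keep the best value."""
    K = Delta.shape[0]
    log1d = np.log(1.0 / delta)
    # Magnitudes of the negative entries of each suboptimal row,
    # sorted so that neg[i][0] = |Delta_{i,(1)}| is the largest.
    neg = {i: np.sort(-Delta[i][Delta[i] < 0])[::-1]
           for i in range(K) if i != i_star}
    s_max = min(len(v) for v in neg.values())
    best = np.inf
    for s in range(1, s_max + 1):
        g2 = np.array([neg[i][s - 1] ** 2 for i in neg])  # Delta_{i,(s)}^2
        h_certify = np.sum(log1d / g2)                    # certification cost
        h_explore = np.max(K * log1d / (s * g2)) + np.sum(K / (s * g2))  # (3)
        best = min(best, h_certify + h_explore)
    return best

# Invented 4-arm skew-symmetric instance; arm 0 is the CW.
Delta = np.array([
    [0.0,   0.05,  0.1,   0.2],
    [-0.05, 0.0,   0.15, -0.1],
    [-0.1, -0.15,  0.0,   0.3],
    [-0.2,  0.1,  -0.3,   0.0],
])
print(tradeoff_terms(Delta, i_star=0, delta=0.01))
```

On this instance, taking $s = 1$ (largest magnitudes, cheapest certification) beats $s = 2$, illustrating that a smaller $s$ lowers $H_{\mathrm{certify}}$ while inflating the exploration terms only moderately here.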
Given any gap matrix $\Delta$, we define the collection $\mathcal{D}(\Delta)$ of gap matrices $\tilde{\Delta}$ that leave the Condorcet winner $i^*_\Delta$, the effective sparsity $\mathbf{s}^*_\Delta$, and the gaps $(\Delta_{i,(s^*_i)})_{i \neq i^*}$ unchanged:
$$\mathcal{D}(\Delta) := \big\{ \tilde{\Delta} \text{ s.t. } i^*_{\tilde{\Delta}} = i^*_\Delta,\; \mathbf{s}^*_{\tilde{\Delta}} = \mathbf{s}^*_\Delta,\; (\tilde{\Delta}_{i,(s^*_i)})_{i \neq i^*} = (\Delta_{i,(s^*_i)})_{i \neq i^*} \big\}. \qquad (5)$$
Corollary 3.3 then states the following minimax lower bound on the $(1-\delta)$-quantile of the budget:
$$\inf_{\mathcal{A}} \sup_{\tilde{\Delta} \in \mathcal{D}(\Delta)} \inf\big\{ \chi > 0 \text{ s.t. } \mathbb{P}_{\tilde{\Delta},\mathcal{A}}(N_\delta \leq \chi) \leq \delta \big\} \;\gtrsim\; H_{\mathrm{certify}}(\mathbf{s}^*_\Delta, \delta) + H_{\mathrm{explore}}(\mathbf{s}^*_\Delta, \delta), \qquad (6)$$
where the infimum is taken over all $\delta$-correct algorithms $\mathcal{A}$. Importantly, this showcases that the exploration/certification trade-off unveiled in (4) is unavoidable and intrinsic to the sample complexity of CW identification.

Emblematic regimes. As the sample complexity is quite intricate in the general case, we discuss some specific regimes to emphasize key phenomena.

• Fixed-probability regime. In (4), when $H_{\mathrm{cw}}(\delta)$ is not the minimum, the sample complexity $H_{\mathrm{certify}}(\mathbf{s}^*_\Delta, \delta) + H_{\mathrm{explore}}(\mathbf{s}^*_\Delta, \delta)$ in both upper and lower bounds exhibits two additive terms, one of them being $\delta$-independent. When $\delta$ is considered a fixed quantity (fixed probability), the corresponding term $\sum_{i \neq i^*} \frac{K}{s^*_i \Delta^2_{i,(s^*_i)}} \asymp \sum_{i \neq i^*} \frac{K}{\|\Delta^-_i\|_2^2}$ becomes the dominant term, where $\Delta^-_{i,j} := \min(\Delta_{i,j}, 0)$. In particular, this term can scale like $K^2$ when all the $s^*_i$'s are small.

• Small-probability regime. Similarly to Karnin [2016], consider the asymptotic regime where $\log(1/\delta)$ goes to infinity, while $K$ and $\Delta$ are fixed. Then, our lower and upper bounds on the $(1-\delta)$-quantile of the budget are of the form
$$\log\Big(\frac{1}{\delta}\Big) \inf_{\mathbf{s}} \left[ \sum_{i \neq i^*} \frac{1}{\Delta^2_{i,(s_i)}} + \max_{i \neq i^*} \frac{K}{s_i \Delta^2_{i,(s_i)}} \right],$$
whereas the bound in Karnin [2016] on the expected budget only involves the smaller quantity $\sum_{i \neq i^*} \frac{\log(1/\delta)}{\Delta^2_{i,(1)}}$.
This emphasizes that there is a significant gap between guarantees in expectation and in quantile of the budget, especially when there is heterogeneity in the $\Delta_{i,(1)}$'s. This phenomenon is also central to the analysis of fixed-budget algorithms in Sections 2 and B. Between these two extreme regimes, both (4) and (6) illustrate a trade-off between exploration and certification. Intuitively, when $\log(1/\delta)$ increases, the effective sparsity $\mathbf{s}^*_\Delta$ tends to decrease, so the algorithm explores other arms more thoroughly to identify stronger opponents.

Technical innovations. Our Algorithms 1 and 2 are based on a new iterative scoring strategy that builds on the selection, for each 'active' arm $i$, of a strong opponent, as well as the estimation of some quantile of the row $\Delta_{i,\cdot}$. For that purpose, we need to introduce a new active quantile estimation algorithm achieving optimal $\epsilon$-error simultaneously for all $\epsilon$; see Appendix C. Apart from Theorem 3.1, which builds upon fairly standard arguments, our main lower bounds use novel approaches and techniques, as our aim is to lower bound the $(1-\delta)$-quantile of the budget. For that purpose, we reduce the problem to an active multiple testing problem of the existence of negative entries within a vector of size $K-1$.

Organization. In Section 2, we present our algorithms and prove the instance-dependent upper bounds in both the fixed-budget and fixed-confidence settings. Section 3 establishes instance-dependent lower bounds for the fixed-confidence setting. We conclude in Section 4 with a discussion of implications, limitations, and directions for future work. Section A discusses related work and further positions our contributions within the literature. The fixed-budget lower bounds, along with all the proofs, are postponed to the appendix.
2 Upper Bounds: Algorithm and Guarantees

This section presents our main identification procedure and the upper bounds announced in Section 1. Our starting point is FB-CWI (Fixed-Budget CW Identification, Algorithm 1), a fixed-budget routine that serves as the main building block. We then obtain a fixed-confidence ($\delta$-correct) algorithm by equipping FB-CWI with verification steps and running it under a standard doubling schedule over the budget. We first describe FB-CWI and state its guarantees in Theorem 2.1, and then explain the fixed-confidence extension; the resulting procedure is given in Algorithm 2 and analyzed in Theorem 2.2.

FB-CWI is an elimination procedure initialized with $A_1 = [K]$: at each round $k$, it assigns a score $S_k(\alpha)$ to every active arm $\alpha \in A_k$, ranks the arms accordingly, and discards the bottom $1/8$ fraction (so $|A_{k+1}| = \lfloor 7|A_k|/8 \rfloor$). Therefore, the number of rounds is $O(\log K)$. The core of FB-CWI is the score computation, whose purpose is to keep the CW ranked above the elimination threshold. At round $k$, we split a budget of order $T/\log(K)$ across the active set $A_k$. For each $\alpha \in A_k$, we devote one quarter of its share to searching for a strong opponent by running Sequential Halving (SH) [Karnin et al., 2013] on the instance of duels $\{(\beta, \alpha) : \beta \in [K] \setminus \{\alpha\}\}$, yielding an opponent $\alpha^{(s)}$ that is likely to beat $\alpha$, and another quarter to estimating the gap $\Delta_{\alpha,\alpha^{(s)}}$ via an empirical mean; this estimate defines the strong-opponent component of the score. We call $\alpha^{(s)}$ 'strong' because it is selected from all $K$ arms (not only from $A_k$): this tends to penalize sub-optimal arms more sharply, at the price of higher uncertainty due to the larger search space.
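The elimination schedule described above is easy to trace numerically. The sketch below is bookkeeping only (not the full algorithm): it records the active-set sizes under the rule $|A_{k+1}| = \lfloor 7|A_k|/8 \rfloor$ and the resulting per-arm round budgets $B_k \approx T/(|A_k| \log_{8/7} K)$; the values of `K` and `T` are arbitrary.

```python
import math

def elimination_schedule(K, T):
    """Trace |A_k| and the per-arm round budget B_k ~ T/(|A_k| log_{8/7} K)
    under the elimination rule |A_{k+1}| = floor(7 |A_k| / 8)."""
    log_k = math.log(K, 8 / 7)   # the number of rounds is O(log K)
    sizes, budgets = [], []
    a = K
    while a > 1:
        sizes.append(a)
        budgets.append(int(T / (a * log_k)))
        a = math.floor(7 * a / 8)
    return sizes, budgets

sizes, budgets = elimination_schedule(K=64, T=100_000)
print(len(sizes))   # number of elimination rounds
print(sizes[:5])    # [64, 56, 49, 42, 36]
```

With $K = 64$ the schedule runs for 20 rounds, and the per-arm budget $B_k$ grows as the active set shrinks, which is exactly what lets surviving arms be scored more accurately in later rounds.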
Relying only on the strong-opponent term can be brittle: if the CW is nearly tied with some arm, the selected opponent may yield a gap estimate close to zero and provide little separation from the elimination threshold. We therefore add a 'weak-opponent' term that yields adaptivity to larger gaps with the CW. More specifically, writing $\Delta^{(k)} := \Delta_{A_k \times A_k}$, skew-symmetry implies that at least half of the entries of $\Delta^{(k)}$ are non-positive, and a simple pigeonhole-type argument implies that at least $|A_k|/4$ rows contain at least $|A_k|/4$ non-positive entries (Lemma H.6 in the appendix). Accordingly, for each $\alpha \in A_k$ we estimate a point whose value lies between the $1/8$- and $1/4$-quantiles of the row $(\Delta_{\alpha,\beta})_{\beta \in A_k}$ (via Range-Quantile). This lower-tail statistic is typically negative for many sub-optimal arms, pushing them into the bottom-$1/8$ region, while for the CW it remains positive and leverages the fact that most of its gaps can still be large.

Subroutines: Sequential Halving and Range-Quantile. Our score construction relies on two subroutines. For the strong-opponent search we use Sequential Halving (SH) [Karnin et al., 2013], chosen for its adaptive guarantees on simple regret, which translate in our context into gap-dependent guarantees; see Zhao et al. [2023] and Section C of the appendix. For the weak-opponent choice, we introduce Range-Quantile (Algorithm 3), a general fixed-budget procedure revealing a point in a prescribed quantile range: given $N$ arms with means $(\mu_i)_{i \in [N]}$ ordered as $\mu_{(1)} \leq \cdots \leq \mu_{(N)}$ and indices $d < u$, it returns an estimate $\hat{t}$ that falls between the $d$-th and $u$-th means (up to an additive error $\varepsilon$) with error probability decaying as $\exp\big(-\tilde{\Theta}\big(\frac{(u-d)^2}{N^2} T \varepsilon^2\big)\big)$; see Theorem C.1.
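Sequential Halving itself is simple to state. Below is a generic sketch of SH for best-arm identification with bandit feedback, adapted from the form in Karnin et al. [2013]; it is not the paper's implementation. Here `pull` is any stochastic reward oracle (in FB-CWI the 'arms' would be duels $(\beta, \alpha)$ and the reward the duel outcome), and the Bernoulli means in the usage example are invented.

```python
import math
import random

def sequential_halving(pull, n_arms, budget):
    """Return the surviving arm after ~log2(n_arms) halving rounds.
    pull(a) returns a stochastic reward for arm a; the total number of
    pulls is at most `budget`, split evenly across rounds."""
    active = list(range(n_arms))
    rounds = max(1, math.ceil(math.log2(n_arms)))
    for _ in range(rounds):
        if len(active) == 1:
            break
        per_arm = max(1, budget // (rounds * len(active)))
        means = {a: sum(pull(a) for _ in range(per_arm)) / per_arm
                 for a in active}
        # Keep the top half of the active arms by empirical mean.
        active.sort(key=lambda a: means[a], reverse=True)
        active = active[: max(1, len(active) // 2)]
    return active[0]

# Toy usage: Bernoulli arms with invented means; arm 3 is best.
random.seed(0)
means = [0.2, 0.4, 0.3, 0.8]
best = sequential_halving(lambda a: float(random.random() < means[a]),
                          n_arms=4, budget=4000)
print(best)  # arm 3
```

The key property exploited in the paper is that SH's success probability adapts to the gaps: an arm whose row contains many large negative entries is cheap to expose, which is exactly the $K/(s_i \Delta^2_{i,(s_i)})$ scaling discussed below.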
Importantly, Range-Quantile does not require $\varepsilon$ as input and is therefore simultaneously valid for any $\varepsilon$; in FB-CWI we instantiate it with $N = |A_k|$, $\varepsilon = \frac{1}{2}\Delta_{i^*,(\lceil |A_k|/8 \rceil)}$, $d = \lfloor |A_k|/8 \rfloor$, and $u = \lceil |A_k|/4 \rceil$ to obtain a value between the $1/8$- and $1/4$-quantiles of $(\Delta_{\alpha,\beta})_{\beta \in A_k}$. Note that Maiti et al. [2024] give a fixed-confidence routine that, given $(\delta, \varepsilon)$, outputs a value in $[\mu_{(N/2)} - \varepsilon, \mu_{(N/4+1)} + \varepsilon]$ with probability at least $1-\delta$. Here, $\varepsilon$ is instance-dependent and unknown, which motivates our adaptive Range-Quantile subroutine that does not take $\varepsilon$ as input.

Guarantees: intuition. A failure can only occur if the CW is pushed into the bottom-$1/8$ region in some round, so the analysis boils down to controlling the separation between the CW score and the elimination cutoff across the $O(\log K)$ rounds. The weak-opponent term already provides a baseline margin: at round $k$, the CW benefits from a positive lower-tail gap of size on the order of $\Delta_{i^*,(\lceil |A_k|/8 \rceil)}$, estimated with $B_k = \Theta\big(T/(|A_k| \log(K))\big)$ samples. Concentration bounds (combined with the Range-Quantile guarantee) then give an error of the form $\exp\big(-\tilde{\Theta}\big(B_k \Delta^2_{i^*,(\lceil |A_k|/8 \rceil)}\big)\big)$, and the worst round is controlled via
$$\max_k \frac{|A_k|/8}{\Delta^2_{i^*,(\lceil |A_k|/8 \rceil)}} \;\leq\; \max_{i \in [K-1]} \frac{i}{\Delta^2_{i^*,(i)}} \;\leq\; \sum_{i \neq i^*} \frac{1}{\Delta^2_{i,i^*}} \;=:\; H_{\mathrm{cw}},$$
yielding a coarse rate $\exp\big(-\tilde{\Theta}(T/H_{\mathrm{cw}})\big)$ (up to logarithmic factors). The strong-opponent term sharpens this bound by actively finding and certifying negative entries for sub-optimal arms. Fix $s_i \leq K_{i;<0}$. For each $i \neq i^*$, SH requires a budget scaling with $K/(s_i \Delta^2_{i,(s_i)})$ to find an entry smaller than $\Delta_{i,(s_i)}$ for arm $i$, while verifying the sign of $\Delta_{i,(s_i)} < 0$ requires a budget that scales with $1/\Delta^2_{i,(s_i)}$.
Since each round removes a constant fraction of arms, we only need a constant fraction of these searches to succeed. Provided that $T$ exceeds the aggregate exploration overhead $H^{(0)}_{\mathrm{explore}}(\mathbf{s}) := \sum_{i \neq i^*} K/(s_i \Delta^2_{i,(s_i)})$, this happens with high probability. The remaining exponent in the probability is then governed by $H_{\mathrm{certify}}(\mathbf{s}) := \sum_{i \neq i^*} 1/\Delta^2_{i,(s_i)}$ and the hardest-arm exploration term $H^{(1)}_{\mathrm{explore}}(\mathbf{s}) := \max_{i \neq i^*} K/(s_i \Delta^2_{i,(s_i)})$, leading to Theorem 2.1.

Theorem 2.1. Let $\mathbf{s} = (s_1, \ldots, s_K)$ be such that $s_i \leq K_{i;<0}$ for each $i \in [K]$. The output of Algorithm 1 with input $T$, denoted $\psi_T$, satisfies
$$\mathbb{P}(\psi_T \neq i^*) \;\leq\; 27\,K \log(K)\log(T) \cdot \exp\left( -c_1 \cdot \frac{T}{\log(T)\log(K)\, H_{\mathrm{cw}}} \right),$$
where $c_1$ is a numerical constant. Moreover, for any $\mathbf{s}$, we also have
$$\mathbb{P}(\psi_T \neq i^*) \;\leq\; 47\,K \log(K)\log(T) \cdot \exp\left( -\frac{c_2}{\log^3(K)\log(T)} \cdot \frac{T - c_3 \cdot H^{(0)}_{\mathrm{explore}}(\mathbf{s}) \log^5\big(H^{(0)}_{\mathrm{explore}}(\mathbf{s})\big)}{H_{\mathrm{certify}}(\mathbf{s}) + H^{(1)}_{\mathrm{explore}}(\mathbf{s})} \right),$$
where $c_2$ and $c_3$ are numerical constants.

Algorithm 1 FB-CWI + Certification
Input: fixed budget $T$; certification parameters $(\delta, T, c)$.
$k \leftarrow 1$, $A_1 \leftarrow [K]$, $n \leftarrow \log_2\big(\frac{T}{2K \log_{8/7}(K)}\big)$, $\phi_1, \phi_2 \leftarrow \mathrm{True}$.
while $|A_k| > 1$ do
  Let $B_k \leftarrow \big\lfloor \frac{T}{|A_k| \log_{8/7}(K)} \big\rfloor$.
  for $\alpha \in A_k$ do
    /* Finding a strong opponent */
    • Run Algorithm SH with a budget $\lceil B_k/4 \rceil$, where the candidate arms are $\{(\beta, \alpha) \text{ for } \beta \in [K] \setminus \{\alpha\}\}$. Let $(\alpha^{(s)}, \alpha)$ denote the output.
    • Query $\lceil B_k/4 \rceil$ samples of $(\alpha; \alpha^{(s)})$ and compute the quantity $Z^{(s)}_k(\alpha)$ (the empirical mean of the gaps).
    /* Computing a weak score */
    • Run Range-Quantile on duels between $\alpha$ and arms in $A_k \setminus \{\alpha\}$, with a budget $\lceil B_k/2 \rceil$ and quantiles $(\lceil |A_k|/8 \rceil, \lceil |A_k|/4 \rceil)$; let $Z^{(w)}_k(\alpha)$ denote the output.
  end for
  Compute the scores $S_k(\alpha) = \min\{Z^{(s)}_k(\alpha), 0\} + Z^{(w)}_k(\alpha)$ for each $\alpha \in A_k$.
  Rank the arms in $A_k$ following the scores $S_k(\cdot)$ and put in $A_{k+1}$ the top $|A_k| - \lceil |A_k|/8 \rceil$ arms.
  /* Check fixed confidence */
  Let $\bar{\alpha}$ denote the arm ranked $|A_k| - \lceil |A_k|/8 \rceil + 1$ according to the scores $S_k(\cdot)$.
  $\phi_1 \leftarrow \phi_1 \cdot \mathbb{1}\Big\{ S_k(\bar{\alpha}) < -\sqrt{\tfrac{2c \log(T)}{\lceil B_k/4 \rceil} \log\big( \tfrac{8K^2 \log_{8/7}(K) \log(T)\, n(n+1)}{\delta} \big)} \Big\}$
  $k \leftarrow k + 1$.
end while
/* Use $T$ queries to test the sign of the gaps of $I$ (the unique arm in $A_k$) at confidence $\delta$ */
Run Test-CW with inputs $(\delta, T)$ to check the sign of the gaps of arm $I$; let $\phi_2$ denote its output.
Return $\phi_1 \vee \phi_2$ and $I$.

Sketch. We control $\mathbb{P}(i^\star \notin A_{k+1} \mid i^\star \in A_k)$ uniformly over $k$ and then apply a union bound over all rounds.

First bound. Fix a round $k$ and condition on $i^* \in A_k$. Let $\Delta^{(k)} = \Delta_{A_k \times A_k}$ and define
$$E_k := \{\alpha \in A_k : \Delta^{(k)}_{\alpha, (\lceil |A_k|/4 \rceil)} \leq 0\}, \qquad \Delta_k := \Delta^{(k)}_{i^*, (\lceil |A_k|/8 \rceil)}.$$
By Lemma H.6, we have $|E_k| \geq \lceil |A_k|/4 \rceil$. Hence, if $i^*$ is eliminated, then some $\alpha \in E_k$ must satisfy $S_k(\alpha) \geq S_k(i^*)$, and therefore
$$\mathbb{P}(i^* \notin A_{k+1} \mid i^* \in A_k) \;\leq\; \mathbb{P}\big(\exists \alpha \in E_k : S_k(\alpha) \geq S_k(i^*)\big) \;\leq\; \mathbb{P}\big(S_k(i^*) \leq \tfrac{1}{2}\Delta_k\big) + \mathbb{P}\big(\exists \alpha \in E_k : S_k(\alpha) \geq \tfrac{1}{2}\Delta_k\big) \;\leq\; \mathbb{P}\big(S_k(i^*) \leq \tfrac{1}{2}\Delta_k\big) + \sum_{\alpha \in E_k} \mathbb{P}\big(Z^{(w)}_k(\alpha) \geq \tfrac{1}{2}\Delta_k\big),$$
where we work conditionally on $A_k$ and where, in the last step, we used $S_k(\alpha) \leq Z^{(w)}_k(\alpha)$ for $\alpha \in E_k$. Both terms are controlled by concentration inequalities together with the Range-Quantile guarantee with budget $B_k = \Theta(T/(|A_k|\log(K)))$ (see Lemma D.2 for details), yielding
$$\mathbb{P}(i^* \notin A_{k+1} \mid i^* \in A_k) \;\leq\; cK\log(T) \cdot \exp\big(-c\,\Delta_k^2 B_k\big).$$
Finally, since $\lceil |A_k|/8 \rceil / \Delta_k^2 \leq \sum_{i \neq i^*} 1/\Delta^2_{i^*,i} = H_{\mathrm{cw}}$, we obtain $\Delta_k^2 B_k \gtrsim T/(\log(K)\log(T) H_{\mathrm{cw}})$, and the first rate follows after union bounding over $k \leq k_{\max}$.

Second bound. Fix $\mathbf{s}$.
At round $k$, rank $\{\Delta_{\alpha,(s_\alpha)}\}_{\alpha \in E_k}$ in increasing order; denote by $\Delta_{E_k:1} \leq \cdots \leq \Delta_{E_k:|E_k|}$ the ranked quantities, and set $\bar{\Delta}_k := \Delta_{E_k:\lceil (7/8)|E_k| \rceil} \leq 0$ and $F_k := \{\alpha \in E_k : \Delta_{\alpha,(s_\alpha)} \leq \bar{\Delta}_k\}$. Lemma D.4 ensures that if $i^*$ is eliminated, then at least $\lceil |F_k|/3 \rceil$ elements of $F_k$ survive, so that their score is at least equal to $S_k(i^*)$. Therefore,
$$\mathbb{P}(i^* \notin A_{k+1} \mid i^* \in A_k) \;\leq\; \mathbb{P}\big(S_k(i^*) \leq \tfrac{1}{2}\bar{\Delta}_k\big) + \mathbb{P}\Big( \big|\{\alpha \in F_k : S_k(\alpha) \geq \tfrac{1}{2}\bar{\Delta}_k\}\big| \geq \lceil |F_k|/3 \rceil \Big). \qquad (7)$$
The first term is bounded by $\exp(-\Omega(\bar{\Delta}_k^2 B_k))$. For the second, for each $\alpha \in F_k$ the strong-opponent SH step finds a negative witness of magnitude $\Delta_{\alpha,(s_\alpha)}$ with high probability, yielding a uniform bound $p_k$ on $\mathbb{P}(S_k(\alpha) \geq \bar{\Delta}_k/2)$ of order $\exp(-\Omega(B_k \Gamma_\alpha/(K \log^3 K)))$ (up to polylog factors), where $\Gamma_\alpha := s_\alpha \Delta^2_{\alpha,(s_\alpha)}$. When $T \gtrsim H^{(0)}_{\mathrm{explore}}(\mathbf{s})$, we show that $p_k$ is small, so the count on the right-hand side of (7) admits an exponential tail of order $\exp\big(-\Omega\big(T/(\log^4(K)\log(T)\, H^{(1)}_{\mathrm{explore}}(\mathbf{s}))\big)\big)$. Finally, we have $\bar{\Delta}_k^2 B_k \gtrsim T/(\log(K)\, H_{\mathrm{certify}}(\mathbf{s}))$, giving
$$\mathbb{P}(i^\star \notin A_{k+1} \mid i^\star \in A_k) \;\leq\; \exp\left( -\tilde{\Omega}\left( \frac{T - H^{(0)}_{\mathrm{explore}}(\mathbf{s})}{H^{(1)}_{\mathrm{explore}}(\mathbf{s}) + H_{\mathrm{certify}}(\mathbf{s})} \right) \right),$$
and a union bound over $k \leq k_{\max}$ completes the proof.

Algorithm 2 FC-CWI
Input: $c$, confidence input $\delta$.
$\varphi \leftarrow \mathrm{False}$, $T \leftarrow 8K\log(K)$.
while $\neg\varphi$ do
  Run the FB-CWI procedure (Algorithm 1) with inputs $\delta, T, c$. Let $\varphi, I$ be its output.
  $T \leftarrow 2T$.
end while
Return $I$.

From fixed budget to fixed confidence (two verifications, stop on the first). By Theorem 2.1, FB-CWI (Algorithm 1) achieves error at most $\delta$ once the budget $T$ is (up to polylogs) of order
$$H_{\mathrm{cw}} \log(1/\delta) \;\wedge \min_{\substack{(s_i)_{i \neq i^*} \\ \forall i,\; s_i \leq K_{i;<0}}} \Big\{ \big(H_{\mathrm{certify}}(\mathbf{s}) + H^{(1)}_{\mathrm{explore}}(\mathbf{s})\big)\log(1/\delta) + H^{(0)}_{\mathrm{explore}}(\mathbf{s}) \Big\}. \qquad (8)$$
Since these quantities are unknown, we run FB-CWI under a doubling schedule on $T$ and stop as soon as a budgeted verification succeeds. Each stage uses at most $T$ queries (in Algorithm 1, $\phi_1$ is computed from the same samples as the scores, and $\phi_2$ runs Test-CW with at most $T/2$ extra queries). A direct verification is to certify that the returned arm $I$ is Condorcet by testing $\Delta_{I,j} > 0$ for all $j \neq I$ at level $1-\delta$, which costs $\sum_{j \neq I} \log(1/\delta)/\Delta^2_{I,j}$ and matches the $H_{\mathrm{cw}}\log(1/\delta)$ regime. When the second term in (8) is dominant, our identification relies instead on certifying sub-optimality: many sub-optimal arms have scores $S_k(\cdot)$ that are typically negative, while the CW's score remains positive. This motivates the second verification $\phi_1$, which checks that the elimination frontier is on the negative side. We stop at the first success of $\phi_1$ or $\phi_2$, which guarantees $\delta$-correctness. The complete proof of Theorem 2.2 is presented in Section E of the appendix.

Theorem 2.2. There exists a constant $c_0$ such that the following holds for any $\delta \in (0, 1/6)$. Under the assumption of the existence of a CW, the output of Algorithm 2 with input $(\delta, c)$, where $c \geq c_0$, denoted $\psi_\delta$, satisfies $\mathbb{P}(\psi_\delta \neq i^*) \leq \delta$. Moreover, the total number of queries $N_\delta$ made by Algorithm 2 satisfies, with probability at least $1-\delta$,
$$N_\delta \;\leq\; \tilde{c} \cdot \left[ H_{\mathrm{cw}}\log(1/\delta) \;\wedge \min_{\substack{(s_i)_{i \neq i^*} \\ \forall i,\; s_i \leq K_{i;<0}}} \Big\{ \big(H_{\mathrm{certify}}(\mathbf{s}) + H^{(1)}_{\mathrm{explore}}(\mathbf{s})\big)\log(1/\delta) + H^{(0)}_{\mathrm{explore}}(\mathbf{s}) \Big\} \right],$$
where $\tilde{c}$ is proportional to $c$ and hides logarithmic factors in $K$ and $(\Delta_{i,(1)})_{i \neq i^*}$, and a $\log\log(1/\delta)$ factor.

Remarks (on the constant $c$ in the stopping rule). The stopping condition in Algorithm 1 involves a numerical constant $c_0$ inherited from the high-probability analysis of Corollary C.2. The absolute value of $c_0$ can be made explicit by tracking constants in the proof.
Since we did not optimize numerical factors, we keep $c_0$ symbolic for readability.

3 Instance-Dependent Lower Bounds

In this section, we provide complementary lower bounds for the budget. First, Theorem 3.1 states a fully instance-dependent lower bound on the expected budget of any algorithm. Second, Theorem 3.2 establishes, to the best of our knowledge, the first high-probability lower bound on the budget for CW identification in dueling bandits.

Denote by $\mathcal{D}_{\mathrm{cw}}$ the class of dueling bandit environments that admit a CW¹:
$$\mathcal{D}_{\mathrm{cw}} := \Big\{ \Delta \in [-\tfrac{1}{4}, \tfrac{1}{4}]^{K \times K} : \Delta = -\Delta^T \text{ and } \exists i^* \in [K] \text{ such that } \forall j \neq i^*,\; \Delta_{i^*,j} > 0 \Big\}. \qquad (9)$$
We say that an algorithm $\mathcal{A}$ is $\delta$-correct for CW identification if, for any $\Delta \in \mathcal{D}_{\mathrm{cw}}$, it identifies $i^*$ with error probability at most $\delta$, that is, $\mathbb{P}_{\Delta,\mathcal{A}}(\hat{i} \neq i^*(\Delta)) \leq \delta$, where $\mathbb{P}_{\Delta,\mathcal{A}}$ denotes the probability² induced by the interaction between $\mathcal{A}$ and the environment with gap matrix $\Delta$.

Theorem 3.1. Let $K \geq 2$ and $\delta \in (0, 1/6)$, and consider any gap matrix $\Delta \in \mathcal{D}_{\mathrm{cw}}$. For any algorithm $\mathcal{A}$ that is $\delta$-correct on $\mathcal{D}_{\mathrm{cw}}$, the budget $N_\delta$ satisfies
$$\mathbb{E}_{\Delta,\mathcal{A}}[N_\delta] \;\geq\; \frac{1}{4} \sum_{i \neq i^*} \frac{\log(1/(4\delta))}{\Delta^2_{i,(1)}}, \quad \text{and} \quad \mathbb{P}_{\Delta,\mathcal{A}}\left( N_\delta \geq \frac{1}{3} \sum_{i \neq i^*} \frac{\log(1/(6\delta))}{\Delta^2_{i,(1)}} \right) \geq \delta. \qquad (10)$$

Proof sketch. The bound (10) admits an intuitive oracle interpretation. To certify that $i^*$ is the CW, the algorithm must provide evidence that all other arms $i \neq i^*$ are not the CW. Imagine that an oracle reveals, for each $i \neq i^*$, the "hardest" opponent $j^*(i)$, that is, the one with the largest negative gap $\Delta_{i,j^*(i)} = \Delta_{i,(1)}$. Focusing solely on duels $(i, j^*(i))$, one would still require at least $|\Delta_{i,j^*(i)}|^{-2}\log(1/\delta)$ duels to reliably conclude that $\Delta_{i,(1)} < 0$. This is formalized using standard information-theoretic arguments. The extension to the $(1-\delta)$-quantile of the budget uses similar ideas.

Remarks.
The bound (10) corresponds to the minimal certification cost, as $\sum_{i \neq i^*} \log\big(\frac{1}{4\delta}\big)/\Delta_{i,(1)}^2$ is equal to $\min_s H_{\mathrm{certify}}(s, \delta/4)$. In particular, it reveals the asymptotic optimality of the expected budget (1) obtained by Karnin [2016] for $\delta \to 0$ and $\Delta$ fixed. Consider the specific scenario (CW-SO) where the CW is the strongest opponent of every sub-optimal arm. Then, the bound (10) reduces to $H_{\mathrm{cw}}(\delta/4)$, establishing the optimality of Algorithm 2, together with that of Maiti et al. [2024], in this specific regime.

However, the problem is different when the CW is not the strongest opponent. Establishing this requires new proof techniques to lower bound budget tails. Recall that, by definition, $s^*_\Delta = (s^*_i)_{i \neq i^*}$ achieves the minimum in the sample complexity (4), and let $\Delta^{(s^*_\Delta)} := (\Delta_{i,(s_i)})_{i \neq i^*}$ denote the corresponding gaps. The pair $(s^*_\Delta, \Delta^{(s^*_\Delta)})$ fully characterizes the minimum term in the bound (4). We consider, as defined in (5), the class $\mathcal{D}(\Delta)$ of instances containing $\Delta$, parametrized by $(s^*_\Delta, \Delta^{(s^*_\Delta)})$.

Theorem 3.2. Let $\mathcal{A}$ be a $\delta$-correct algorithm for CW identification, with $\delta \le 1/12$, and let $\Delta \in \mathcal{D}_{\mathrm{cw}}$. Assume that $\Delta$ has no ties, that is, $\forall i \neq j$, $\Delta_{i,j} \neq 0$. For this matrix $\Delta$, one can construct a matrix $\tilde\Delta$ by permuting the entries of $\Delta$ in such a way that $\tilde\Delta \in \mathcal{D}(\Delta)$, and such that

$$\mathbb{P}_{\tilde\Delta,\mathcal{A}}\bigg( N_\delta \ge \frac13 \max_{i \neq i^*} \frac{K_{i;<0}}{\|\Delta_i^-\|^2} \log\Big(\frac{1}{6\delta}\Big) \;\vee\; \frac{1}{37\log(2K)} \sum_{i \neq i^*} \frac{K_{i;<0}}{\|\Delta_i^-\|^2} \bigg) \ge \delta. \qquad (11)$$

Moreover, for all $i \neq i^*$, the rows satisfy $(\tilde\Delta_{i,(j)})_{j \le K_{i;<0}} = (\Delta_{i,(j)})_{j \le K_{i;<0}}$, i.e., $\tilde\Delta_{i,\cdot}$ and $\Delta_{i,\cdot}$ share the same $K_{i;<0}$ negative entries, up to permutation.

Remarks. Observe that $\tilde\Delta$ has exactly the same sign structure as $\Delta$ (i.e., $\mathrm{sign}(\Delta) = \mathrm{sign}(\tilde\Delta)$), and that the permutation preserves, in each row, the multiset of negative magnitudes.
Intuitively, $\tilde\Delta$ has the same dueling structure as $\Delta$, except that the algorithm can no longer exploit any ordering structure between the arms (such as SST). A more general version, Theorem F.2, which also covers ties and provides an explicit construction, is given in Appendix F.3.

¹The restriction to gaps bounded away from $-1/2$ and $1/2$ is standard in lower bounds with Bernoulli rewards.
²We denote by $\mathbb{E}_{\Delta,\mathcal{A}}$ the corresponding expectation.

Proof sketch of (11). The key idea is to reduce CW identification to multiple active signal detection problems [Castro, 2014]: for each $i \neq i^*$, we need to certify that the row $\Delta_{i,\cdot}$ has at least one negative entry, with probability of error at most $\delta$. Along the way, we have to improve state-of-the-art lower bounds for such detection problems. Consider any $\delta$-correct algorithm $\mathcal{A}$. We introduce a collection $\Delta^{(\pi)}$ of gap matrices that differ from $\Delta$ in that we permute, on each row $i \neq i^*$, the positions of the negative entries by some collection $\pi = (\pi_i)_{i \neq i^*}$ of permutations, while preserving skew-symmetry. Then, we fix a specific arm $i \neq i^*$ and construct the gap matrices $\Delta^{(\pi,i)}$ by setting to 0 the negative entries of the $i$-th row of $\Delta^{(\pi)}$, again preserving skew-symmetry. As $\Delta^{(\pi,i)}$ contains two rows with non-negative entries, one easily deduces that the budget of $\mathcal{A}$ under $\Delta^{(\pi,i)}$ is arbitrarily large with probability $1-\delta$. In contrast, under $\Delta$, $\mathcal{A}$ finishes before $\chi$, the $(1-\delta)$-quantile of $N_\delta$ under $\mathbb{P}_{\Delta,\mathcal{A}}$. Hence, we reduce $\mathcal{A}$ to an active testing problem with budget $\chi$, for an unknown gap matrix $\tilde\Delta$, between the hypotheses

$$H_0^{(i)}: \tilde\Delta = \Delta^{(\pi,i)} \text{ for some permutation } \pi \qquad\text{vs.}\qquad H_1^{(i)}: \tilde\Delta = \Delta^{(\pi)} \text{ for some permutation } \pi.$$

Since the permutation $\pi$ is unknown to the learner, this allows us to improve over (10) by accounting for the fact that the algorithm must explore all possible positions of the negative entries in row $i$.
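To make the two costs concrete, the following toy computation (the gap values and helper name are ours, purely for illustration) compares, for a single sub-optimal row, the oracle certification cost $\log(1/\delta)/\Delta_{i,(1)}^2$ of Theorem 3.1 with the exploration-type cost $K_{i;<0}/\|\Delta_i^-\|^2$ appearing in Theorem 3.2:

```python
import math

def row_costs(row, delta=0.05):
    """For one row of suboptimal-arm gaps (entries Delta_{i,j}, j != i),
    compare the oracle certification cost log(1/delta)/Delta_{i,(1)}^2 of
    Theorem 3.1 with the exploration cost K_{i;<0}/||Delta_i^-||^2 from
    Theorem 3.2."""
    negatives = [x for x in row if x < 0]
    k_neg = len(negatives)                   # K_{i;<0}
    norm_sq = sum(x * x for x in negatives)  # ||Delta_i^-||^2
    certify = math.log(1 / delta) / min(negatives) ** 2  # min = most negative
    explore = k_neg / norm_sq
    return certify, explore

# Row with one large and many tiny negative gaps: certification is cheap once
# the large gap is known, but locating it among ten candidates is what the
# exploration term charges for.
row = [-0.2] + [-0.01] * 9
certify, explore = row_costs(row)
```

On this (made-up) row the exploration cost exceeds the certification cost, illustrating the regime where (11) improves over (10).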
Then, by a convexity argument, we deduce $\chi \gtrsim \frac{K_{i;<0}}{\|\Delta_i^-\|^2} \log(1/\delta)$. Optimizing over $i \neq i^*$ yields the first part of the bound, $\chi \gtrsim \max_{i \neq i^*} \frac{K_{i;<0}}{\|\Delta_i^-\|^2} \log(1/\delta)$. The second term, $\sum_{i \neq i^*} \frac{K_{i;<0}}{\|\Delta_i^-\|^2}$, is interpreted as the total cost of testing all hypotheses $H_0^{(i)}$ against $H_1^{(i)}$ with a constant error probability. We develop new arguments for this multiple-hypotheses problem. Unlike the involved technique of Simchowitz et al. [2017], we reduce w.l.o.g. to the case where all tests have similar complexity $\|\Delta_i^-\|^2/K_{i;<0}$, and we write $\beta^2$ for this common value. Then, we build upon the symmetry of our problem to reduce to the case where each row receives the same sampling effort $\chi/K$. Relying again on lower bounds for active signal detection, we get $\chi/K \gtrsim \beta^{-2}$, which leads to the desired result. We believe that these arguments can generalize to other multiple active testing problems. The full proof is given in Appendix F.3.

Remarks. While the bound (10) from Theorem 3.1 captures the cost of certification, the lower bound (11) captures the intrinsic need for exploration. Indeed, from the classical bound of Lemma H.1, we have $\min_s H_{\mathrm{explore}}(s, \delta) = \max_{i \neq i^*} K \log(1/\delta)/\|\Delta_i^-\|^2 + \sum_{i \neq i^*} K/\|\Delta_i^-\|^2$, up to a factor $\log(2K)$. Importantly, the two lower bounds in Theorems 3.1 and 3.2 thus imply the following corollary.

Corollary 3.3. Let $\mathcal{A}$ be a $\delta$-correct algorithm for CW identification, with $\delta \le 1/12$, and let $\Delta \in \mathcal{D}_{\mathrm{cw}}$. Then, one can construct $\tilde\Delta \in \mathcal{D}(\Delta)$ such that

$$\mathbb{P}_{\tilde\Delta,\mathcal{A}}\big( N_\delta \gtrsim H_{\mathrm{certify}}(s^*, \delta) + H_{\mathrm{explore}}(s^*, \delta) \big) \ge \delta, \qquad (12)$$

where $\gtrsim$ hides a logarithmic term in $K$ and a numerical constant.

Proof sketch.
For a fixed pair $(s^*, \Delta^{(s^*_\Delta)})$, we construct $\tilde\Delta \in \mathcal{D}(\Delta)$ in which each row $i \neq i^*$ has exactly $s^*_i$ negative entries equal to $\Delta_{i,(s^*_i)}$, while the remaining negative entries all have a small magnitude $\epsilon_i > 0$. For this instance, Theorem 3.1 yields $H_{\mathrm{certify}}(s^*, \delta) = \sum_{i \neq i^*} \log(1/\delta)/\Delta_{i,(s^*_i)}^2$. Moreover, one can choose the parameters such that Theorem 3.2 reduces to $H_{\mathrm{explore}}(s^*, \delta)$. See Appendix F.5.

Fixed-budget lower bounds. In Appendix B, we derive fixed-budget lower bounds that match the exponential error decay of Theorem 2.1. Unlike the instance-dependent fixed-confidence bounds above, these are of a minimax nature.

4 Discussion

In this manuscript, we consider $\delta$-PAC Condorcet-winner identification in stochastic dueling bandits under the sole assumption that a CW exists. We derive instance-dependent, high-probability sample-complexity guarantees that exploit the full gap matrix, and complement them with new lower bounds highlighting different regimes depending on the underlying instance. We also improve over the state of the art on both the upper-bound and the lower-bound side when $\Delta$ satisfies stronger assumptions such as weak stochastic transitivity (WST). This is further discussed in Section A. While we characterize the budget of FC-CWI as the infimum of $H_{\mathrm{cw}}(\delta)$, which corresponds to a 'direct search' for the CW, and of $\inf_s H_{\mathrm{certify}}(s, \delta) + H_{\mathrm{explore}}(s, \delta)$, which corresponds to an 'elimination' strategy, our distribution-dependent lower bounds are not always matching. On the upper-bound side, our guarantees and proof techniques can be sub-optimal in hybrid regimes where, for example, the CW is the strongest opponent for a large fraction of arms while being nearly tied with a small subset of arms.
In such instances, one would expect an optimal procedure to behave heterogeneously: quickly eliminate "easy" arms by leveraging the (large) CW gaps, while dedicating targeted effort to the near-ties by probing the broader matrix to find decisive witnesses against those ambiguous contenders. Our current analysis does not fully capture this kind of mixed behavior, and improving it likely requires a sharper allocation mechanism that explicitly adapts to a partial satisfaction of (CW-SO) across rows. On the lower-bound side, although our results characterize the exploration/certification trade-off in a local-minimax sense, a sharper, fully instance-dependent lower bound on the $(1-\delta)$-quantile of the budget remains an appealing direction. This appears technically challenging because it must control rare but costly adaptive search events. A more detailed discussion of lower bounds reflecting this aspect is presented in Appendix F.6. More broadly, CWI is a structured pure-exploration problem over a latent matrix, and closely related issues arise in noisy payoff matrix games and Nash equilibrium identification [Zhao et al., 2023, Maiti et al., 2024, 2025, Ito et al., 2025]. We believe that the exploration-certification decomposition, together with the accompanying tail-oriented lower-bound viewpoint for the budget, can be useful in these settings, and may help clarify analogous expectation-versus-high-probability separations in equilibrium learning, whose optimal instance-dependent query complexity remains largely open; see Maiti [2025].

Acknowledgements

This work has been partially supported by ANR-21-CE23-0035 (ASCAI, ANR).

References

Nir Ailon, Zohar Karnin, and Thorsten Joachims. Reducing dueling bandits to cardinal bandits. In International Conference on Machine Learning, pages 856–864. PMLR, 2014.

Jean-Yves Audibert, Sébastien Bubeck, and Rémi Munos. Best arm identification in multi-armed bandits.
In COLT, pages 41–53, 2010.

Viktor Bengs, Róbert Busa-Fekete, Adil El Mesaoudi-Paul, and Eyke Hüllermeier. Preference-based online learning with dueling bandits: A survey. The Journal of Machine Learning Research, 22(1):278–385, 2021.

Ralph Allan Bradley and Milton E. Terry. Rank analysis of incomplete block designs: I. The method of paired comparisons. Biometrika, 39(3/4):324–345, 1952.

V. Buldygin and K. Moskvichova. The sub-Gaussian norm of a binary random variable. Theory of Probability and Mathematical Statistics, 86:33–49, 2013.

R. M. Castro. Adaptive sensing performance lower bounds for sparse signal detection and support estimation. Bernoulli, 20(4):2217–2246, 2014.

Bangrui Chen and Peter I. Frazier. Dueling bandits with weak regret. In International Conference on Machine Learning, pages 731–739. PMLR, 2017.

Wei Chen, Yihan Du, Longbo Huang, and Haoyu Zhao. Combinatorial pure exploration for dueling bandit. In International Conference on Machine Learning, pages 1531–1541. PMLR, 2020.

Moein Falahatgar, Alon Orlitsky, Venkatadheeraj Pichapati, and Ananda Theertha Suresh. Maximum selection and ranking under noisy comparisons. In International Conference on Machine Learning, pages 1088–1096. PMLR, 2017.

Moein Falahatgar, Ayush Jain, Alon Orlitsky, Venkatadheeraj Pichapati, and Vaishakh Ravindrakumar. The limits of maxing, ranking, and preference learning. In International Conference on Machine Learning, pages 1427–1436. PMLR, 2018.

Sébastien Gerchinovitz, Pierre Ménard, and Gilles Stoltz. Fano's inequality for random variables. Statistical Science, 35(2):178–201, 2020.

Maximilian Graf, Victor Thuot, and Nicolas Verzelen. Clustering items through bandit feedback: Finding the right feature out of many. In Forty-second International Conference on Machine Learning, ICML 2025, Vancouver, BC, Canada, July 13–19, 2025. PMLR, 2025.

Björn Haddenhorst, Viktor Bengs, Jasmin Brandt, and Eyke Hüllermeier.
Testification of Condorcet winners in dueling bandits. In Uncertainty in Artificial Intelligence, pages 1195–1205. PMLR, 2021a.

Björn Haddenhorst, Viktor Bengs, and Eyke Hüllermeier. Identification of the generalized Condorcet winner in multi-dueling bandits. Advances in Neural Information Processing Systems, 34:25904–25916, 2021b.

Katja Hofmann, Lihong Li, and Filip Radlinski. Online evaluation for information retrieval. Found. Trends Inf. Retr., 10:1–117, 2016. URL https://api.semanticscholar.org/CorpusID:34529647.

Shinji Ito, Haipeng Luo, Taira Tsuchiya, and Yue Wu. Instance-dependent regret bounds for learning two-player zero-sum games with bandit feedback. arXiv preprint, 2025.

Kevin Jamieson, Sumeet Katariya, Atul Deshpande, and Robert Nowak. Sparse dueling bandits. In Artificial Intelligence and Statistics, pages 416–424. PMLR, 2015.

Thorsten Joachims, Laura Granka, Bing Pan, Helene Hembrooke, Filip Radlinski, and Geri Gay. Evaluating the accuracy of implicit feedback from clicks and query reformulations in web search. ACM Transactions on Information Systems (TOIS), 25(2):7–es, 2007.

Zohar Karnin, Tomer Koren, and Oren Somekh. Almost optimal exploration in multi-armed bandits. In International Conference on Machine Learning, pages 1238–1246. PMLR, 2013.

Zohar S. Karnin. Verification based solution for structured MAB problems. Advances in Neural Information Processing Systems, 29, 2016.

Junpei Komiyama, Junya Honda, Hisashi Kashima, and Hiroshi Nakagawa. Regret lower bound and optimal algorithm in dueling bandit problem. In Conference on Learning Theory, pages 1141–1154. PMLR, 2015.

Junpei Komiyama, Junya Honda, and Hiroshi Nakagawa. Copeland dueling bandit problem: Regret lower bound, optimal algorithm, and computationally efficient algorithm. In International Conference on Machine Learning, pages 1235–1244. PMLR, 2016.

Tor Lattimore and Csaba Szepesvári. Bandit Algorithms.
Cambridge University Press, 2020.

Chang Li, Ilya Markov, Maarten De Rijke, and Masrour Zoghi. MergeDTS: A method for effective large-scale online ranker evaluation. ACM Transactions on Information Systems (TOIS), 38(4):1–28, 2020.

R. Duncan Luce et al. Individual Choice Behavior, volume 4. Wiley, New York, 1959.

Arnab Maiti. Open problem: Optimal instance-dependent sample complexity for finding Nash equilibrium in two-player zero-sum matrix games. In The Thirty-Eighth Annual Conference on Learning Theory, pages 6230–6234. PMLR, 2025.

Arnab Maiti, Ross Boczar, Kevin Jamieson, and Lillian Ratliff. Near-optimal pure exploration in matrix games: A generalization of stochastic bandits & dueling bandits. In International Conference on Artificial Intelligence and Statistics, pages 2602–2610. PMLR, 2024.

Arnab Maiti, Ross Boczar, Kevin Jamieson, and Lillian Ratliff. Query-efficient algorithm to find all Nash equilibria in a two-player zero-sum matrix game. ACM Transactions on Economics and Computation, 13(3):1–18, 2025.

Shie Mannor and John N. Tsitsiklis. The sample complexity of exploration in the multi-armed bandit problem. Journal of Machine Learning Research, 5(Jun):623–648, 2004.

Erol Peköz, Sheldon M. Ross, and Zhengyu Zhang. Dueling bandit problems. Probability in the Engineering and Informational Sciences, 36(2):264–275, 2022.

Robin L. Plackett. The analysis of permutations. Journal of the Royal Statistical Society Series C: Applied Statistics, 24(2):193–202, 1975.

Wenbo Ren, Jia Liu, and Ness Shroff. The sample complexity of best-k items selection from pairwise comparisons. In International Conference on Machine Learning, pages 8051–8072. PMLR, 2020.

El Mehdi Saad, Nicolas Verzelen, and Alexandra Carpentier. Active ranking of experts based on their performances in many tasks. In International Conference on Machine Learning, pages 29490–29513. PMLR, 2023.

Aadirupa Saha and Pierre Gaillard.
Versatile dueling bandits: Best-of-both-world analyses for learning from relative preferences. In International Conference on Machine Learning, pages 19011–19026. PMLR, 2022.

Aadirupa Saha and Shubham Gupta. Optimal and efficient dynamic regret algorithms for non-stationary dueling bandits. In International Conference on Machine Learning, pages 19027–19049. PMLR, 2022.

Max Simchowitz, Kevin Jamieson, and Benjamin Recht. The simulator: Understanding adaptive sampling in the moderate-confidence regime. In Conference on Learning Theory, pages 1794–1834. PMLR, 2017.

Louis L. Thurstone. A law of comparative judgment. In Scaling, pages 81–92. Routledge, 2017.

Yisong Yue and Thorsten Joachims. Interactively optimizing information retrieval systems as a dueling bandits problem. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 1201–1208, 2009.

Yisong Yue, Josef Broder, Robert Kleinberg, and Thorsten Joachims. The k-armed dueling bandits problem. Journal of Computer and System Sciences, 78(5):1538–1556, 2012.

Yao Zhao, Connor Stephens, Csaba Szepesvári, and Kwang-Sung Jun. Revisiting simple regret: Fast rates for returning a good arm. In International Conference on Machine Learning, pages 42110–42158. PMLR, 2023.

Masrour Zoghi, Shimon A. Whiteson, Maarten De Rijke, and Remi Munos. Relative confidence sampling for efficient on-line ranker evaluation. In Proceedings of the 7th ACM International Conference on Web Search and Data Mining, pages 73–82, 2014.

Masrour Zoghi, Zohar S. Karnin, Shimon Whiteson, and Maarten De Rijke. Copeland dueling bandits. Advances in Neural Information Processing Systems, 28, 2015a.

Masrour Zoghi, Shimon Whiteson, and Maarten de Rijke. MergeRUCB: A method for large-scale online ranker evaluation. In Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, pages 17–26, 2015b.
A Comparison to related frameworks and hypotheses in dueling bandits

Total order implies a Condorcet winner. Weak stochastic transitivity (WST) is a standard sufficient condition for a latent total order: for any $i, j, k \in [K]$, $\Delta_{i,j} \ge 0$ and $\Delta_{j,k} \ge 0$ imply $\Delta_{i,k} \ge 0$. Thus (up to tie-breaking) preferences are transitive and admit a total order, whose maximal element is a Condorcet winner (CW). CW/optimal-arm identification in such total-order regimes is studied in, e.g., Falahatgar et al. [2017, 2018], which provide worst-case guarantees under WST (notably an $O(K \epsilon^{-2} \log(K/\delta))$ bound for $(\epsilon, \delta)$-maxing, i.e., returning an arm $I$ such that $\mathbb{P}(\Delta_{I,i^*} \le -\epsilon) \le \delta$). Since WST is strictly stronger than merely assuming the existence of a CW, our CW-based bounds directly apply under WST. In particular, our bound for FC-CWI in Theorem 2.2 improves over the state of the art under WST without relying on that assumption. Furthermore, our lower bounds in Theorems 3.1 and 3.2 provide novel distribution-dependent and minimax lower bounds under the WST assumption. Indeed, regarding Theorem 3.2, if the matrix $\Delta$ satisfies WST, then the class $\mathcal{D}(\Delta)$ defined in (5) and considered in that theorem only contains gap matrices satisfying WST. A stronger assumption is strong stochastic transitivity (SST), which adds magnitude constraints (for ordered $i \succ j \succ k$, $\Delta_{i,k} \ge \max\{\Delta_{i,j}, \Delta_{j,k}\}$); in particular, SST implies a CW and ensures that each sub-optimal arm attains its largest loss against the CW. Instance-dependent sample complexity under SST (often with mild regularity such as STI) is characterized in Falahatgar et al. [2017], Ren et al. [2020], and our upper bounds recover these guarantees (of the order $H_{\mathrm{cw}}(\delta)$) up to logarithmic factors without even relying on the SST assumption.
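The two transitivity notions are easy to check programmatically. The sketch below (the helper names are ours) tests WST and SST on a gap matrix and exhibits an instance that is WST but not SST, illustrating that SST is strictly stronger:

```python
def is_wst(gaps):
    """Weak stochastic transitivity: Delta[i][j] >= 0 and Delta[j][k] >= 0
    imply Delta[i][k] >= 0 for every triple (i, j, k)."""
    K = len(gaps)
    return all(gaps[i][k] >= 0
               for i in range(K) for j in range(K) for k in range(K)
               if gaps[i][j] >= 0 and gaps[j][k] >= 0)

def is_sst(gaps):
    """Strong stochastic transitivity: WST plus the magnitude constraint
    Delta[i][k] >= max(Delta[i][j], Delta[j][k]) along ordered chains."""
    K = len(gaps)
    return is_wst(gaps) and all(
        gaps[i][k] >= max(gaps[i][j], gaps[j][k])
        for i in range(K) for j in range(K) for k in range(K)
        if gaps[i][j] >= 0 and gaps[j][k] >= 0)

# A 3-arm instance (made-up numbers) that is WST but violates SST, since
# Delta[0][2] = 0.1 < Delta[0][1] = 0.3:
gaps = [[0.0, 0.3, 0.1], [-0.3, 0.0, 0.2], [-0.1, -0.2, 0.0]]
```

On such an instance the CW (arm 0) exists and our bounds apply, even though SST-based guarantees do not.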
Finally, parametric random-utility models such as Bradley–Terry–Luce, Plackett–Luce, and Thurstone [Bradley and Terry, 1952, Luce et al., 1959, Plackett, 1975, Thurstone, 2017] are more restrictive than SST and therefore fall within our scope; in particular, our bounds recover existing guarantees for BTL (see, e.g., Ren et al. [2020]). See the survey of Bengs et al. [2021] for a broader overview.

Works on other types of winners. Beyond the CW objective [Komiyama et al., 2015, Ailon et al., 2014, Chen and Frazier, 2017, Saha and Gaillard, 2022, Peköz et al., 2022], alternative notions include the Borda winner [Chen et al., 2020, Jamieson et al., 2015] (maximizing the average pairwise advantage) and the Copeland winner [Zoghi et al., 2015a, Komiyama et al., 2016] (maximizing the number of beaten opponents). Since the Borda and Condorcet winners can differ, Borda-specific guarantees do not transfer to our setting. Copeland winners always exist (possibly non-unique) and generalize the CW; fixed-confidence identification is studied in Zoghi et al. [2015a], but when specialized to CW instances the resulting complexity is $O\big( \max_{i \in [K]} \sum_{j \neq i} \log(1/\delta)/\Delta_{i,j}^2 \big)$, which is looser than our bounds.

B Minimax Fixed-Budget Lower Bounds

In this section, we establish lower bounds for fixed-budget CW identification. In some sense, these are the counterparts of the results of Section 3, except that we only derive minimax bounds, similar to Corollary 3.3. We specify a class of distributions parametrized by the row of the Condorcet winner. Let $\bar\Delta = (\bar\Delta_1, \bar\Delta_2, \ldots, \bar\Delta_K)$, with $\bar\Delta_1 = 0$ and $\bar\Delta_i \in (0, 1/4)$ for $i \neq 1$ (we write $\bar\Delta$ for the vector to distinguish it from the gap matrix $\Delta$). Define $\mathcal{D}^{(1)}(\bar\Delta)$ as the set of dueling feedback distributions whose gap matrix $\Delta \in \mathcal{D}_{\mathrm{cw}}$ is such that the row of the Condorcet winner $i^*$, namely $\Delta_{i^*,\cdot}$, is equal to $\bar\Delta$ up to a permutation $\sigma$. Formally,

$$\mathcal{D}^{(1)}(\bar\Delta) := \Big\{ \Delta \in \mathcal{D}_{\mathrm{cw}} : \exists\, \sigma \in \mathcal{S}_K \text{ s.t. } i^*_\Delta = \sigma(1) \text{ and } \Delta_{i^*,\cdot} = \sigma(\bar\Delta) \Big\}, \qquad (13)$$

where $\mathcal{S}_K$ is the set of permutations on $[K]$ and, for any $x \in \mathbb{R}^K$, $\sigma(x) = (x_{\sigma(i)})_{i \in [K]}$. The following result is the counterpart of Theorem 3.1.

Theorem B.1. Let $K \ge 2$ and $T \in \mathbb{N}^*$, and consider any vector $\bar\Delta$ with $\bar\Delta_1 = 0$ and $\bar\Delta_i \in (0, 1/4)$ for $i \neq 1$. For any algorithm $\mathcal{A}$ with a fixed budget $T$, one has

$$\max_{\Delta \in \mathcal{D}^{(1)}(\bar\Delta)} \mathbb{P}_{\Delta,\mathcal{A}}\big( \hat i_T \neq i^* \big) \ge \frac14 \exp\Big( -\frac{22\,T}{H_{\mathrm{cw}}} \Big), \qquad\text{where } H_{\mathrm{cw}} = \sum_{i=2}^{K} \frac{1}{\bar\Delta_i^2}.$$

Remarks. This theorem reveals that one can exhibit a matrix $\Delta$ for which the exponential error decay scaling as $\exp(-T/H_{\mathrm{cw}})$ in Theorem 2.1 is tight. The proof can be found in Appendix G.1.

We now turn to the counterpart of Theorem 3.2. Consider the following class of environments, for which the quantities $(s^*_\Delta, \Delta^{(s^*)})$ defined in Section 3 are equal to a given couple $(\bar\Delta, s)$ up to a permutation of the arms. By convention, we extend $s^*_\Delta = (s^*_i)_{i \neq i^*}$ into a $K$-dimensional vector by setting the $i^*$-th entry to 0, that is, $s^*_{i^*} := 0$. We proceed similarly for $\Delta^{(s^*)}$:

$$\mathcal{D}^{(2)}(\bar\Delta, s) := \Big\{ \Delta \in \mathcal{D}_{\mathrm{cw}} : \exists\, \sigma \in \mathcal{S}_K \text{ s.t. } s^*_\Delta = \sigma(s) \text{ and } \Delta^{(s^*)} = \sigma(\bar\Delta) \Big\}. \qquad (14)$$

Theorem B.2. Let $T \in \mathbb{N}^*$, let $K$ be a multiple of 8, let $\bar\Delta = (\bar\Delta_1, \ldots, \bar\Delta_K)$ with $\bar\Delta_1 = 0$ and $\bar\Delta_i \in (0, 1/4)$ for $i \neq 1$, and let $s = (s_1, \ldots, s_K)$ with $s_1 = 0$ and $1 \le s_i \le K/4$ for $i = 2, \ldots, K$. Then, any fixed-budget algorithm $\mathcal{A}$ satisfies

$$\max_{\Delta \in \mathcal{D}^{(2)}(\bar\Delta, s)} \mathbb{P}_{\Delta,\mathcal{A}}\big( \hat i_T \neq i^*(\Delta) \big) \ge \frac14 \exp\bigg( -5T \Big/ \max_{i=2,\ldots,K/2} \frac{K}{s_i \bar\Delta_i^2} \bigg). \qquad (15)$$

If, additionally, $\bar\Delta_i = \Delta > 0$ and $s_i = s \in [K/4]$ for all $i \in \{2, \ldots, K/2\}$, then

$$\max_{\Delta \in \mathcal{D}^{(2)}(\bar\Delta, s)} \mathbb{P}_{\Delta,\mathcal{A}}\big( \hat i_T \neq i^*(\Delta) \big) \ge \frac14 \exp\Big( -\frac{10\,T\,\Delta^2}{K} \Big), \qquad (16)$$

$$\max_{\Delta \in \mathcal{D}^{(2)}(\bar\Delta, s)} \mathbb{P}_{\Delta,\mathcal{A}}\big( \hat i_T \neq i^*(\Delta) \big) \ge \frac12 - \sqrt{\frac{37\, s\, \Delta^2\, T}{K^2}}. \qquad (17)$$

Remarks.
Equation (17) shows that, for matrices where half of the rows equal (up to permutation) a vector with $s$ entries at $-\Delta$ and the rest near zero, any algorithm must pay of order $K^2/(s\Delta^2)$ queries to reach a constant success probability. This is aligned with the probability-independent cost $H^{(0)}_{\mathrm{explore}}(s)$ suffered by Algorithm 1; see Theorem 2.1. Equation (15) establishes that the probability of error is no smaller than $\exp(-T/H^{(1)}_{\mathrm{explore}}(s))$, while (16) implies that the quantity $\exp(-T/H_{\mathrm{certify}}(s))$ is required, at least for some specific highly structured matrices.

Comparison with fixed-confidence lower bounds. The proofs for the fixed-confidence and fixed-budget settings are remarkably similar. This reflects the strong connection between fixed-budget algorithms and fixed-confidence algorithms with high-probability budget bounds. However, fixed-budget minimax lower bounds cannot be directly deduced from the instance-dependent fixed-confidence lower bounds derived earlier. We conjecture that obtaining instance-dependent lower bounds in the fixed-budget setting is considerably more challenging, if not outright impossible. The fundamental reason lies in the nature of the algorithms themselves. Any $\delta$-correct fixed-confidence algorithm must incorporate an internal stopping criterion that checks that the identified arm $\hat i$ is indeed the CW, under the assumption that such a unique CW exists in $\mathcal{D}_{\mathrm{cw}}$. Consider now what happens when applying this algorithm to a modified environment with two "weak" Condorcet winner candidates $i^*_1$ and $i^*_2$, where both rows satisfy $\Delta_{i^*_1,\cdot} \ge 0$ and $\Delta_{i^*_2,\cdot} \ge 0$. The algorithm cannot distinguish between these candidates with probability $1-\delta$, forcing an infinite expected stopping time, as the verification step never confidently resolves the ambiguity.
Fixed-budget algorithms fundamentally lack such stopping rules, rendering this infinite-budget argument inapplicable. Instead, our lower bounds rely on symmetry arguments: for matrices where multiple sub-optimal arms have identical "difficulty profiles" (i.e., identical multisets of negative gaps), any reasonable algorithm must select uniformly at random among the ambiguous best arms. Deriving such lower bounds requires constructing highly symmetric matrices that exploit this randomization, a significantly more restrictive condition than the instance-dependent constructions used in fixed confidence. In Sections F and G, we propose two complete constructions, one for each setting. The proofs of the fixed-budget results are similar to the fixed-confidence ones, except that they are applied to specific, highly symmetric matrices for the reasons described above.

C Intermediate results: Adaptive quantile estimation

In this section, we present an algorithm for quantile estimation with a fixed budget of samples, together with a high-probability guarantee on the estimation error. This result is of independent interest and will be used as a key building block in the proofs of the main results.

C.1 Quantile Bracketing

Consider a classical $K$-armed bandit setting, where we are given $K$ arms with means $(\mu_i)_{i \in [K]}$. We assume that the samples from the arms are bounded³ by 1, and, without loss of generality, that the means are distinct. Denote by $\mu_{(1)} < \mu_{(2)} < \cdots < \mu_{(K)}$ the ordered means of the $K$ arms. For two integers $d \le u$ in $[K]$, our objective is to find a point in the interval $[\mu_{(d)}, \mu_{(u)}]$. For this task, the learner is given a fixed budget of $T$ queries, after which it outputs a quantity $q_T$. In this framework, we are targeting an 'adaptive' guarantee in the following sense.
Given the budget $T$ as input, we want the output to satisfy

$$\forall \epsilon > 0, \qquad \mathbb{P}\Big( q_T \notin \big[ \mu_{(d)} - \epsilon,\; \mu_{(u)} + \epsilon \big] \Big) \le \exp\Big( -c \cdot \frac{r\, \epsilon^2\, T}{\log(T)} \Big),$$

where $c$ is a positive universal constant and $r$ is a positive quantity depending only on $d$, $u$ and $K$. The allocation strategy we develop requires a sufficiently large budget. Specifically, we assume that $T \ge \frac{128K}{u-d} \log^2\big( \frac{128K}{u-d} \big)$. When $T$ falls below this threshold, the resulting guarantee becomes vacuous (the stated upper bound on the error probability exceeds 1). In this regime, we therefore resort to an arbitrary heuristic. Algorithm 3 implements this by explicitly branching between the small-budget case $T < \frac{128K}{u-d} \log^2\big( \frac{128K}{u-d} \big)$ and the regime where the budget is large enough for the analysis to be meaningful.

Solving this problem requires balancing the tasks of locating the arms whose means fall within the desired rank range, and estimating these means accurately. The algorithm runs a multi-step scheme indexed by $\ell$. At each level $\ell$, it draws a random multiset $\mathcal{A}_\ell$ of arms, large enough to contain, with good probability, representatives of the $[d, u]$ quantile range. It then allocates $\Theta(\epsilon_\ell^{-2})$ samples per selected arm, so that the empirical means are accurate up to $\epsilon_\ell$. Next, we form three empirical quantiles $\hat t^{(1)}_\ell, \hat t^{(2)}_\ell, \hat t^{(3)}_\ell$, corresponding to ranks slightly below, near the middle of, and slightly above $[d, u]$, yielding a noisy bracket around the target interval. Finally, a Lepski-type stability rule selects the earliest level $\bar\ell$ whose middle estimate $\hat t^{(2)}_{\bar\ell}$ remains consistent with the brackets produced at all finer levels $\ell'$ (up to the tolerance $2\epsilon_{\ell'}$), and returns $\hat t^{(2)}_{\bar\ell}$.

³The result can be extended to sub-Gaussian variables.
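The scheme just described can be sketched in code. The following Python implementation is ours and untuned: `pull` is a hypothetical sampling oracle returning one reward in $[0,1]$ for an arm, and the numerical constants follow the pseudocode of Algorithm 3 (floor for the multiset size, ceiling for the per-arm allocation); it is an illustrative sketch, not the paper's reference implementation.

```python
import math
import random

def range_quantile(pull, K, d, u, T, rng=None):
    """Sketch of Range-Quantile (Algorithm 3). `pull(a)` samples arm a once."""
    rng = rng or random.Random(0)
    span = max(u - d, 1)
    log_term = math.log(16 * K / span)
    # Small-budget heuristic: uniform allocation, then average the empirical
    # means between ranks d and u.
    if u == d or T <= (128 * K / span) * math.log(128 * K / span) ** 2:
        n = max(T // K, 1)
        means = sorted(sum(pull(a) for _ in range(n)) / n for a in range(K))
        return sum(means[d - 1:u]) / (u - d + 1)
    # Large-budget regime: multi-level sampling + Lepski-type selection.
    L = int(math.floor(math.log2(T / math.log2(T))))
    l_min = min(int(math.ceil(math.log2(16 * K / (u - d)))), L - 1)
    brackets = {}
    for l in range(l_min, L):
        eps = 2 * 2 ** (-(L - l) / 2)
        n_arms = max(int(eps ** 2 * T / (log_term * math.log2(T))), 1)  # floor
        n_pulls = int(math.ceil(log_term / (2 * eps ** 2)))             # ceiling
        arms = [rng.randrange(K) for _ in range(n_arms)]  # with replacement
        mu_hat = sorted(sum(pull(a) for _ in range(n_pulls)) / n_pulls
                        for a in arms)
        def rank(r, m=mu_hat, n=n_arms):
            # empirical quantile at relative rank r, clamped to [1, n]
            return m[min(max(int(math.ceil(r * n)), 1), n) - 1]
        brackets[l] = (eps, rank((3 * d + u) / (4 * K)),
                       rank((d + u) / (2 * K)), rank((d + 3 * u) / (4 * K)))
    # Earliest level whose middle estimate fits all finer levels' brackets.
    for l in sorted(brackets):
        t2 = brackets[l][2]
        if all(brackets[lp][1] - 2 * brackets[lp][0] <= t2
               <= brackets[lp][3] + 2 * brackets[lp][0]
               for lp in sorted(brackets) if lp >= l):
            return t2
    return brackets[max(brackets)][2]

# Deterministic toy usage (small-budget branch): arms with constant rewards.
vals = [0.1, 0.2, 0.3, 0.4]
q = range_quantile(lambda a: vals[a], K=4, d=1, u=2, T=100)  # lands in [0.1, 0.2]
```

The toy call exercises only the small-budget branch, which is fully deterministic here; the large-budget branch is stochastic and would be assessed by simulation.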
Theorem C.1 states that, with budget $T$, the output lies in $[\mu_{(d)} - \epsilon, \mu_{(u)} + \epsilon]$ with high probability for every $\epsilon \in (0, 1)$, and the failure probability decays essentially as $\exp(-\Theta(\epsilon^2 T / \log T))$ (up to the multiplicative factor $40\log_2(T)$ and the extra $\log\big(\frac{16K}{u-d}\big)$ term). The quantity

$$r = \min\Big( \frac{d}{K},\; 1 - \frac{u}{K} \Big) \cdot \Big( \frac{u-d}{K} \Big)^2$$

captures the difficulty of the target rank range, as it decreases when the interval is narrower ($u-d$ small) or when it is close to the extremes (small $d$ or large $u$). When $d$ and $u$ are constant fractions of $K$, we have $r = \Theta(1)$ and the bound simplifies to $\log(T)\exp(-c\,\epsilon^2 T/\log T)$.

Algorithm 3: Range-Quantile($K, d, u, T$)
Input: number of arms $K$, budget $T$, integers $d \le u$ in $[K]$.
Set $L = \lfloor \log_2(T/\log_2(T)) \rfloor$ and $\ell_{\min} = \big\lceil \log_2\big(\frac{16K}{u-d}\big) \big\rceil$.
/* Consider separately the case where we have a small budget */
if $T \le \frac{128K}{u-d} \log^2\big( \frac{128K}{u-d} \big)$ or $u = d$ then
  Allocate the budget uniformly over the arms and return the average of the empirical means between ranks $d$ and $u$.
end if
/* Otherwise, if we have a large budget: */
for $\ell = \ell_{\min}, \ldots, L-1$ do
  Let $\epsilon_\ell = 2 \cdot 2^{-(L-\ell)/2}$.
  Sample a multiset $\mathcal{A}_\ell$ of $\Big\lfloor \frac{\epsilon_\ell^2\, T}{\log(\frac{16K}{u-d}) \log_2(T)} \Big\rfloor$ arms from $[K]$ with replacement (duplicates are treated as different arms).
  Allocate $\Big\lceil \frac{\log(\frac{16K}{u-d})}{2\epsilon_\ell^2} \Big\rceil$ samples to each arm $a \in \mathcal{A}_\ell$, and compute its empirical mean $\hat\mu_a$.
  Rank the empirical means in $\mathcal{A}_\ell$ in increasing order: $\hat\mu_{(1)} \le \cdots \le \hat\mu_{(|\mathcal{A}_\ell|)}$.
  Let
  $$\hat t^{(1)}_\ell = \hat\mu_{(\lceil \frac{3d+u}{4K} |\mathcal{A}_\ell| \rceil)}, \qquad \hat t^{(2)}_\ell = \hat\mu_{(\lceil \frac{d+u}{2K} |\mathcal{A}_\ell| \rceil)}, \qquad \hat t^{(3)}_\ell = \hat\mu_{(\lceil \frac{d+3u}{4K} |\mathcal{A}_\ell| \rceil)}. \qquad (18)$$
end for
Let $\bar\ell = \min\big\{ \ell \in \{\ell_{\min}, \ldots, L-1\} : \forall \ell' \in \{\ell, \ldots, L-1\},\; \hat t^{(2)}_\ell \in \big[ \hat t^{(1)}_{\ell'} - 2\epsilon_{\ell'},\; \hat t^{(3)}_{\ell'} + 2\epsilon_{\ell'} \big] \big\}$.
Return $\hat t^{(2)}_{\bar\ell}$.

Theorem C.1. Consider Algorithm 3 with inputs $(K, d, u, T)$, such that $u > d$.
Then, the output satisfies, for any $\epsilon \in (0, 1)$:

$$\mathbb{P}\Big( \hat t^{(2)}_{\bar\ell} \notin [\mu_{(d)} - \epsilon,\; \mu_{(u)} + \epsilon] \Big) \le 40\log_2(T) \exp\Bigg( -c \cdot r \cdot \frac{\epsilon^2\, T}{\log\big(\frac{16K}{u-d}\big) \log_2(T)} \Bigg), \qquad (19)$$

where $r = \min\big( \frac{d}{K}, 1 - \frac{u}{K} \big)\big( \frac{u-d}{K} \big)^2$ is a positive quantity depending only on $d$, $u$ and $K$, and $c$ is an absolute numerical constant.

Here, we did not try to optimize the constants. Next, we state a corollary that will be used in the proofs of the main theorems.

Corollary C.2. Consider Algorithm 3 with inputs $(K, \lceil K/8 \rceil, \lceil K/4 \rceil, T)$, where $T \ge 4$. Then, the output satisfies, for any $\epsilon \in (0, 1)$:

$$\mathbb{P}\Big( \hat t^{(2)}_{\bar\ell} \notin [\mu_{(\lceil K/8 \rceil)} - \epsilon,\; \mu_{(\lceil K/4 \rceil)} + \epsilon] \Big) \le \log(T) \exp\Big( -c \cdot \frac{\epsilon^2 T}{\log(T)} \Big),$$

where $c$ is an absolute numerical constant smaller than 1.

Proof of Corollary C.2. Suppose $K \ge 5$; then $\lceil K/4 \rceil > \lceil K/8 \rceil$. Let $\epsilon > 0$ and $I = [\mu_{(\lceil K/8 \rceil)} - \epsilon,\; \mu_{(\lceil K/4 \rceil)} + \epsilon]$. We have

$$\min\Big( \frac{\lceil K/8 \rceil}{K},\; 1 - \frac{\lceil K/4 \rceil}{K} \Big) \ge \min\Big( \frac18,\; 1 - \frac{K/4 + 1}{K} \Big) \ge \frac18. \qquad (20)$$

Moreover, using $K \ge 5$ and writing $K = 8q + r$ with $q \in \mathbb{N}$ and $r \in \{0, \ldots, 7\}$, we show that

$$\Big( \frac{\lceil K/4 \rceil - \lceil K/8 \rceil}{K} \Big)^2 \ge \frac{1}{144}. \qquad (21)$$

Applying Theorem C.1 with $u = \lceil K/4 \rceil$, $d = \lceil K/8 \rceil$ and using the bounds (20) and (21), we obtain

$$\mathbb{P}\big( \hat t^{(2)}_{\bar\ell} \notin I \big) \le \min\bigg( 1,\; 40\log_2(T) \exp\Big( -c\,\frac{1}{8} \cdot \frac{1}{144} \cdot \frac{\epsilon^2 T}{3\log_2(T)} \Big) \bigg) \le \log(T) \exp\Big( -c'\,\frac{\epsilon^2 T}{\log(T)} \Big),$$

where $c'$ is a numerical constant. The last line follows by absorbing all numerical constants into $c' > 0$, using $T \ge 4$.

Suppose now that $K \in \{2, 3, 4\}$; then $\lceil K/4 \rceil = \lceil K/8 \rceil = 1$. In this case, Algorithm 3 allocates at least $T/4$ samples to each arm and outputs the minimal empirical mean.
Let $a \in [K]$ denote the index of the arm with the smallest true mean, and let $\hat t$ denote the output. We then have
$$\mathbb{P}\big(\hat t \notin [\mu_{(\lceil K/8 \rceil)}-\epsilon,\ \mu_{(\lceil K/4 \rceil)}+\epsilon]\big) = \mathbb{P}\big(\hat t < \mu_{(1)}-\epsilon\big) + \mathbb{P}\big(\hat t > \mu_{(1)}+\epsilon\big) \le \mathbb{P}(\hat\mu_a > \mu_a + \epsilon) + \sum_{i=1}^K \mathbb{P}(\hat\mu_i < \mu_i - \epsilon),$$
where $\hat\mu_i$ denotes the empirical mean of arm $i$. We conclude, using Hoeffding's inequality and the fact that each arm receives at least $T/4$ samples, that
$$\mathbb{P}\big(\hat t \notin [\mu_{(\lceil K/8 \rceil)}-\epsilon,\ \mu_{(\lceil K/4 \rceil)}+\epsilon]\big) \le 5\exp\left(-\frac{\epsilon^2 T}{2}\right),$$
which corresponds to the result.

C.2 Proof of Theorem C.1

Since the right-hand side of (19) does not depend on the values of $\mu_{(1)},\ldots,\mu_{(K)}$, it suffices to treat the strictly ordered case $\mu_{(1)} < \mu_{(2)} < \cdots < \mu_{(K)}$; the case where some values coincide follows by a continuity argument.

Proof. If $T \le \frac{128K}{u-d}\log_2(\frac{128K}{u-d})$, the bound is vacuous. Assume therefore that $T \ge \frac{128K}{u-d}\log_2(\frac{128K}{u-d})$, so that $\log_2(\log_2(T)) \ge 1$ and $\ell_{\min} \le L-1$. We introduce the following additional notation: for each $\ell \in \{\ell_{\min},\ldots,L-1\}$, where $\ell_{\min} = \lceil \log_2(\frac{16K}{u-d}) \rceil$, let
$$N_\ell := |A_\ell| = \left\lfloor \frac{\epsilon_\ell^2 T}{\log(\frac{16K}{u-d})\log_2(T)} \right\rfloor \qquad\text{and}\qquad T_\ell := \left\lceil \frac{\log(\frac{16K}{u-d})}{2\epsilon_\ell^2} \right\rceil, \tag{22}$$
and let $r_0 := \frac{d}{K}$, $r_1 := \frac{3d+u}{4K}$, $r_2 := \frac{d+u}{2K}$, $r_3 := \frac{d+3u}{4K}$ and $r_4 := \frac{u}{K}$.

The proof follows the steps below.
- We start with a sanity check, verifying that the total number of queries made by Algorithm 3 is at most $T$, and that $\bar\ell$ exists.
- Next, we show an intermediary result about the quantities $\hat t^{(i)}_\ell$ for $i \in \{1,2,3\}$, in the form of an upper bound on $\mathbb{P}\big(\hat t^{(i)}_\ell \notin \big[\mu_{(\lceil \frac{r_{i-1}+r_i}{2}K \rceil)} - \epsilon_\ell,\ \mu_{(\lceil \frac{r_i+r_{i+1}}{2}K \rceil)} + \epsilon_\ell\big]\big)$.
- Finally, we build on this intermediary result to prove that the definition of $\bar\ell$ yields the stated guarantees.
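Before detailing these steps, the large-budget branch of Algorithm 3 can be sketched in a few lines of code. This is a minimal illustrative Python sketch under stated assumptions, not the authors' implementation: the sampling oracle `pull(a)`, which returns one $[0,1]$-valued sample of arm $a$, is an assumed interface, and the sketch presumes the large-budget regime so that every level samples at least one arm.

```python
import math
import random

def range_quantile(K, d, u, T, pull):
    """Illustrative sketch of the large-budget branch of Algorithm 3
    (Range-Quantile). `pull(a)` returns one sample in [0, 1] of arm a."""
    L = math.floor(math.log2(T / math.log2(T)))
    l_min = math.ceil(math.log2(16 * K / (u - d)))
    log_term = math.log(16 * K / (u - d))
    eps, t_hat = {}, {}
    for l in range(l_min, L):
        eps[l] = 2 * 2 ** (-(L - l) / 2)
        N = int(eps[l] ** 2 * T / (log_term * math.log2(T)))   # |A_l| (floored)
        T_l = math.ceil(log_term / (2 * eps[l] ** 2))          # samples per arm
        arms = [random.randrange(K) for _ in range(N)]         # with replacement
        means = sorted(sum(pull(a) for _ in range(T_l)) / T_l for a in arms)
        idx = lambda frac: min(N - 1, max(0, math.ceil(frac * N) - 1))
        t_hat[l] = (means[idx((3 * d + u) / (4 * K))],         # \hat t^{(1)}_l
                    means[idx((d + u) / (2 * K))],             # \hat t^{(2)}_l
                    means[idx((d + 3 * u) / (4 * K))])         # \hat t^{(3)}_l
    # \bar\ell: smallest level whose middle estimate is consistent with all
    # finer levels; return \hat t^{(2)} at that level (level L-1 always passes).
    for l in range(l_min, L):
        if all(t_hat[lp][0] - 2 * eps[lp] <= t_hat[l][1] <= t_hat[lp][2] + 2 * eps[lp]
               for lp in range(l, L)):
            return t_hat[l][1]
```

For instance, with $K = 20$ Bernoulli arms of means $(a+1)/21$ and inputs $d = 3$, $u = 8$, the returned value concentrates between the 3rd and 8th smallest means, as Theorem C.1 predicts.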
Sanity checks: Recall the expressions $L = \lfloor \log_2(T/\log_2(T)) \rfloor$, $\epsilon_\ell = 2\cdot 2^{-(L-\ell)/2}$ and $|A_\ell| = \lfloor \frac{\epsilon_\ell^2 T}{\log(\frac{16K}{u-d})\log_2(T)} \rfloor$. Algorithm 3 comprises $L - \ell_{\min}$ iterations, and each iteration $\ell \in \{\ell_{\min},\ldots,L-1\}$ makes $|A_\ell| T_\ell$ queries. Thus, the total number of queries is
$$\sum_{\ell=\ell_{\min}}^{L-1}|A_\ell|\left\lceil \frac{\log(\frac{16K}{u-d})}{2\epsilon_\ell^2} \right\rceil = \sum_{\ell=\ell_{\min}}^{L-1}\left\lfloor \frac{\epsilon_\ell^2 T}{\log(\frac{16K}{u-d})\log_2(T)} \right\rfloor\left\lceil \frac{\log(\frac{16K}{u-d})}{2\epsilon_\ell^2} \right\rceil \le \sum_{\ell=\ell_{\min}}^{L-1}\frac{\epsilon_\ell^2 T}{\log(\frac{16K}{u-d})\log_2(T)}\left(\frac{\log(\frac{16K}{u-d})}{2\epsilon_\ell^2}+1\right)$$
$$= \sum_{\ell=\ell_{\min}}^{L-1}\left(\frac{T}{2\log_2(T)}+\frac{\epsilon_\ell^2 T}{\log(\frac{16K}{u-d})\log_2(T)}\right) < \frac{T}{2}+\frac{T}{\log(\frac{16K}{u-d})\log_2(T)}\sum_{\ell=\ell_{\min}}^{L-1}\epsilon_\ell^2 \le \frac{T}{2}+\frac{4T}{\log(\frac{16K}{u-d})\log_2(T)} \le T,$$
where we used in the last line the threshold condition on $T$, which gives $\log(\frac{16K}{u-d})\log_2(T) \ge 8$. As for the definition of the quantity $\bar\ell$, note that the set over which the minimum is taken is not empty, since it always contains $L-1$.

Step 2: Per-level quantile control. In this step we prove that for every level $\ell \in \{\ell_{\min},\ldots,L-1\}$ and every $i \in \{1,2,3\}$,
$$\mathbb{P}\big(\hat t^{(i)}_\ell \notin C_{\ell,i}\big) \le p_\ell, \tag{23}$$
where we define, for each $\ell$ and $i \in \{1,2,3\}$,
$$C_{\ell,i} := \left[\mu_{(\lceil \frac{r_{i-1}+r_i}{2}K \rceil)} - \epsilon_\ell,\ \mu_{(\lceil \frac{r_i+r_{i+1}}{2}K \rceil)} + \epsilon_\ell\right], \qquad \kappa_{d,u} := \min\left\{\frac{d}{K},\ 1-\frac{u}{K}\right\}\left(\frac{u-d}{60K}\right)^2, \qquad p_\ell := 4\exp(-\kappa_{d,u} N_\ell).$$
Throughout this step, fix $\ell \in \{\ell_{\min},\ldots,L-1\}$ and $i \in \{1,2,3\}$. Define the two (random-sample) ranks
$$r_- := \left\lceil \frac{r_{i-1}+2r_i}{3} N_\ell \right\rceil, \qquad r_+ := \left\lceil \frac{2r_i+r_{i+1}}{3} N_\ell \right\rceil,$$
and define the two (population) bracket points
$$m_- := \mu_{(\lceil \frac{r_{i-1}+r_i}{2}K \rceil)}, \qquad m_+ := \mu_{(\lceil \frac{r_i+r_{i+1}}{2}K \rceil)}.$$
Let $\gamma_1,\ldots,\gamma_{N_\ell}$ be the (random) true means of the sampled multiset $A_\ell$, and let $\gamma_{(1)} \le \cdots \le \gamma_{(N_\ell)}$ be their order statistics (ties broken arbitrarily).
Next, introduce the event $E^{(i)}$ given by
$$E^{(i)} := \{\gamma_{(r_-)} < m_-\} \cup \{\gamma_{(r_+)} > m_+\}.$$
Then, by a union bound,
$$\mathbb{P}\big(\hat t^{(i)}_\ell \notin C_{\ell,i}\big) \le \underbrace{\mathbb{P}\big(E^{(i)}\big)}_{\text{Term 1}} + \underbrace{\mathbb{P}\big(\hat t^{(i)}_\ell \notin C_{\ell,i} \text{ and } \neg E^{(i)}\big)}_{\text{Term 2}}. \tag{24}$$
We first control the probability of $E^{(i)}$ (Term 1). Define the counts
$$M_1 := \big|\{j \in [N_\ell] : \gamma_j < m_-\}\big|, \qquad M_2 := \big|\{j \in [N_\ell] : \gamma_j > m_+\}\big|.$$
Since $A_\ell$ is obtained by sampling arms i.i.d. with replacement from $[K]$, these are binomials:
$$M_1 \sim \mathrm{Bin}\left(N_\ell,\ \frac{\lceil \frac{r_{i-1}+r_i}{2}K \rceil - 1}{K}\right), \qquad M_2 \sim \mathrm{Bin}\left(N_\ell,\ 1 - \frac{\lceil \frac{r_i+r_{i+1}}{2}K \rceil}{K}\right).$$
Moreover, by definition of the order statistics we have
$$\{\gamma_{(r_-)} < m_-\} \subseteq \{M_1 \ge r_-\}, \qquad \{\gamma_{(r_+)} > m_+\} \subseteq \{M_2 \ge N_\ell - r_+ + 1\}.$$
Therefore,
$$\mathbb{P}\big(E^{(i)}\big) \le \mathbb{P}(M_1 \ge r_-) + \mathbb{P}(M_2 \ge N_\ell - r_+ + 1).$$
Let us bound the two terms in the upper bound above using binomial tail bounds. Using $\lceil x \rceil - 1 \le x$ and $\lceil x \rceil \ge x$, we have the parameter bounds
$$\frac{\lceil \frac{r_{i-1}+r_i}{2}K \rceil - 1}{K} \le \frac{r_{i-1}+r_i}{2}, \qquad 1 - \frac{\lceil \frac{r_i+r_{i+1}}{2}K \rceil}{K} \le 1 - \frac{r_i+r_{i+1}}{2}.$$
Hence, $M_1$ and $M_2$ are stochastically dominated by $\mathrm{Bin}(N_\ell, \frac{r_{i-1}+r_i}{2})$ and $\mathrm{Bin}(N_\ell, 1-\frac{r_i+r_{i+1}}{2})$, respectively. Also, by construction we have
$$\frac{r_{i-1}+2r_i}{3} - \frac{r_{i-1}+r_i}{2} = \frac{r_i - r_{i-1}}{6}, \qquad \left(1 - \frac{2r_i+r_{i+1}}{3}\right) - \left(1 - \frac{r_i+r_{i+1}}{2}\right) = \frac{r_{i+1} - r_i}{6}.$$
Applying Hoeffding's inequality to these dominating binomials yields
$$\mathbb{P}(M_1 \ge r_-) \le \exp\left(-2N_\ell\left(\frac{r_i-r_{i-1}}{6}\right)^2\right), \qquad \mathbb{P}(M_2 \ge N_\ell - r_+ + 1) \le \exp\left(-2N_\ell\left(\frac{r_{i+1}-r_i}{6}\right)^2\right).$$
Since $r_j - r_{j-1} \ge \frac{u-d}{4K}$ for each $j \in \{1,2,3,4\}$, we obtain
$$\text{Term 1} = \mathbb{P}\big(E^{(i)}\big) \le 2\exp\left(-\frac{N_\ell(u-d)^2}{288K^2}\right). \tag{25}$$
Next, let us upper bound Term 2 in (24). On $\neg E^{(i)}$ we have $\gamma_{(r_-)} \ge m_-$ and $\gamma_{(r_+)} \le m_+$, hence
$$[m_- - \epsilon_\ell,\ m_+ + \epsilon_\ell] \supseteq [\gamma_{(r_-)} - \epsilon_\ell,\ \gamma_{(r_+)} + \epsilon_\ell].$$
Therefore,
$$\text{Term 2} = \mathbb{P}\big(\hat t^{(i)}_\ell \notin [m_- - \epsilon_\ell,\ m_+ + \epsilon_\ell] \text{ and } \neg E^{(i)}\big) \le \mathbb{P}\big(\hat t^{(i)}_\ell \notin [\gamma_{(r_-)} - \epsilon_\ell,\ \gamma_{(r_+)} + \epsilon_\ell]\big).$$
By Lemma C.4, this implies
$$\text{Term 2} \le 2\exp(-\kappa_{d,u} N_\ell). \tag{26}$$
Combining (24), (25) and (26), and using $\kappa_{d,u} \le (u-d)^2/(288K^2)$, we obtain
$$\mathbb{P}\big(\hat t^{(i)}_\ell \notin C_{\ell,i}\big) \le 4\exp(-\kappa_{d,u} N_\ell) = p_\ell,$$
which is exactly (23).

Step 3: Conclusion. If $\epsilon < 3\epsilon_{\ell_{\min}}$, the upper bound of the theorem is greater than $1$ and the bound is vacuous. Assume therefore that $\epsilon \ge 3\epsilon_{\ell_{\min}}$, and let $\ell_\star$ be the largest level such that $3\epsilon_{\ell_\star} \le \epsilon$. Since $\ell_\star + 1$ violates this condition, $\epsilon < 3\epsilon_{\ell_\star+1} = 3\sqrt{2}\,\epsilon_{\ell_\star}$, and therefore
$$\epsilon_{\ell_\star} \ge \frac{\epsilon}{3\sqrt{2}}. \tag{27}$$
Next, we prove that for any $\ell \in \{\ell_{\min},\ldots,L-1\}$,
$$\mathbb{P}\big(\hat t^{(2)}_{\bar\ell} \notin [\mu_{(d)} - 3\epsilon_\ell,\ \mu_{(u)} + 3\epsilon_\ell]\big) \le 2(4L+1)p_\ell.$$
Fix $\ell \in \{\ell_{\min},\ldots,L-1\}$ and recall that, by definition of $\bar\ell$, one has $\hat t^{(2)}_{\bar\ell} \le \hat t^{(3)}_{\ell'} + 2\epsilon_{\ell'}$ for every $\ell' \ge \bar\ell$. Then, it holds that
$$\mathbb{P}\big(\hat t^{(2)}_{\bar\ell} > \mu_{(u)} + 3\epsilon_\ell\big) \le \mathbb{P}(\bar\ell > \ell) + \mathbb{P}\big(\hat t^{(3)}_\ell > \mu_{(u)} + \epsilon_\ell\big) \le 4Lp_\ell + p_\ell,$$
where we use Lemma C.3, which ensures that $\mathbb{P}(\bar\ell > \ell) \le 4Lp_\ell$ for every $\ell \in \{\ell_{\min},\ldots,L-1\}$, together with Bound (23) from Step 2 with $i = 3$. The second bound,
$$\mathbb{P}\big(\hat t^{(2)}_{\bar\ell} < \mu_{(d)} - 3\epsilon_\ell\big) \le (4L+1)p_\ell,$$
is proven using the same arguments (in particular Bound (23) with $i = 1$). Applying this bound to $\ell_\star$ and using $3\epsilon_{\ell_\star} \le \epsilon$, we have
$$\mathbb{P}\big(\hat t^{(2)}_{\bar\ell} \notin [\mu_{(d)}-\epsilon,\ \mu_{(u)}+\epsilon]\big) \le \mathbb{P}\big(\hat t^{(2)}_{\bar\ell} \notin [\mu_{(d)} - 3\epsilon_{\ell_\star},\ \mu_{(u)} + 3\epsilon_{\ell_\star}]\big) \le 2(4L+1)p_{\ell_\star}.$$
Next, in order to upper bound $p_{\ell_\star}$, we use the following lower bound on $N_{\ell_\star}$:
$$N_{\ell_\star} = \left\lfloor \frac{\epsilon_{\ell_\star}^2 T}{\log(\frac{16K}{u-d})\log_2(T)} \right\rfloor \ge \frac{1}{2}\cdot\frac{\epsilon_{\ell_\star}^2 T}{\log(\frac{16K}{u-d})\log_2(T)} \ge \frac{\epsilon^2 T}{36\log(\frac{16K}{u-d})\log_2(T)},$$
where we use the fact that $N_{\ell_\star} \ge 2$ (from the assumption on the budget $T$) and $\epsilon_{\ell_\star} \ge \epsilon/(3\sqrt{2})$ (see (27)). Therefore, using the definition of $p_\ell$,
$$p_{\ell_\star} = 4\exp(-\kappa_{d,u}\cdot N_{\ell_\star}) \le 4\exp\left(-\min\left\{\frac{d}{K},\ 1-\frac{u}{K}\right\}\left(\frac{u-d}{60K}\right)^2\cdot\frac{\epsilon^2 T}{36\log(\frac{16K}{u-d})\log_2(T)}\right) = 4\exp\left(-\frac{c\,r\,\epsilon^2 T}{\log(\frac{16K}{u-d})\log_2(T)}\right),$$
where $c > 0$ is an absolute constant and $r = \min\{\frac{d}{K},\ 1-\frac{u}{K}\}(\frac{u-d}{K})^2$. Finally, given that $L \le \log_2(T)$ and $2(4L+1)\cdot 4 \le 40\log_2(T)$ for $T \ge 2$,
$$\mathbb{P}\big(\hat t^{(2)}_{\bar\ell} \notin [\mu_{(d)}-\epsilon,\ \mu_{(u)}+\epsilon]\big) \le 40\log_2(T)\exp\left(-\frac{c\,r\,\epsilon^2 T}{\log(\frac{16K}{u-d})\log_2(T)}\right),$$
which is the desired bound.

Below are two technical lemmas, deferred here to avoid cluttering the proof above.

Lemma C.3. For every $\ell \in \{\ell_{\min},\ldots,L-1\}$, we have $\mathbb{P}(\bar\ell > \ell) \le 4Lp_\ell$.

Proof. Suppose that $\bar\ell > \ell$. By the definition of $\bar\ell$, there necessarily exists $\ell' \ge \ell$ such that $\hat t^{(2)}_\ell \notin I_{\ell'}$, with $I_{\ell'} = [\hat t^{(1)}_{\ell'} - 2\epsilon_{\ell'},\ \hat t^{(3)}_{\ell'} + 2\epsilon_{\ell'}]$. Therefore,
$$\mathbb{P}(\bar\ell > \ell) \le \sum_{\ell' \ge \ell}\mathbb{P}\big(\hat t^{(2)}_\ell < \hat t^{(1)}_{\ell'} - 2\epsilon_{\ell'}\big) + \sum_{\ell' \ge \ell}\mathbb{P}\big(\hat t^{(2)}_\ell > \hat t^{(3)}_{\ell'} + 2\epsilon_{\ell'}\big).$$
Let $m_- := \mu_{(\lceil \frac{r_1+r_2}{2}K \rceil)}$ and $m_+ := \mu_{(\lceil \frac{r_2+r_3}{2}K \rceil)}$. Fix $\ell' \ge \ell$. The event $\hat t^{(2)}_\ell < \hat t^{(1)}_{\ell'} - 2\epsilon_{\ell'}$ implies $\hat t^{(1)}_{\ell'} > m_- + \epsilon_{\ell'}$ or $\hat t^{(2)}_\ell < m_- - \epsilon_{\ell'}$. Since $\epsilon_{\ell'} \ge \epsilon_\ell$ for $\ell' \ge \ell$, Bound (23) yields
$$\mathbb{P}\big(\hat t^{(1)}_{\ell'} > m_- + \epsilon_{\ell'}\big) \le p_{\ell'} \qquad\text{and}\qquad \mathbb{P}\big(\hat t^{(2)}_\ell < m_- - \epsilon_{\ell'}\big) \le \mathbb{P}\big(\hat t^{(2)}_\ell < m_- - \epsilon_\ell\big) \le p_\ell.$$
Therefore, $\mathbb{P}(\hat t^{(2)}_\ell < \hat t^{(1)}_{\ell'} - 2\epsilon_{\ell'}) \le p_\ell + p_{\ell'}$. Similarly, $\mathbb{P}(\hat t^{(2)}_\ell > \hat t^{(3)}_{\ell'} + 2\epsilon_{\ell'}) \le p_\ell + p_{\ell'}$.
Hence, for every $\ell' \ge \ell$, $\mathbb{P}(\hat t^{(2)}_\ell \notin I_{\ell'}) \le 2p_\ell + 2p_{\ell'}$. Summing over $\ell' \ge \ell$,
$$\mathbb{P}(\bar\ell > \ell) \le \sum_{\ell' \ge \ell}(2p_\ell + 2p_{\ell'}) \le 2Lp_\ell + 2\sum_{\ell' \ge \ell}p_{\ell'}.$$
Since $N_{\ell'}$ increases with $\ell'$ (so $p_{\ell'}$ is decreasing), we have $\sum_{\ell' \ge \ell}p_{\ell'} \le Lp_\ell$, which gives $\mathbb{P}(\bar\ell > \ell) \le 4Lp_\ell$.

Lemma C.4. Let $\ell \in \{\ell_{\min},\ldots,L-1\}$ and consider the notation introduced in the proof of Theorem C.1. For each $i \in \{1,2,3\}$, define the two indices
$$r_- := \left\lceil \frac{r_{i-1}+2r_i}{3}N_\ell \right\rceil \qquad\text{and}\qquad r_+ := \left\lceil \frac{2r_i+r_{i+1}}{3}N_\ell \right\rceil.$$
Then
$$\mathbb{P}\big(\hat t^{(i)}_\ell \notin [\gamma_{(r_-)} - \epsilon_\ell,\ \gamma_{(r_+)} + \epsilon_\ell]\ \big|\ A_\ell\big) \le 2\exp(-\kappa_{d,u}\cdot N_\ell).$$

Proof. Fix $\ell \in \{\ell_{\min},\ldots,L-1\}$ and $i \in \{1,2,3\}$, and define $q = \lceil r_i N_\ell \rceil$. Given that $T_\ell = \lceil \log(\frac{16K}{u-d})/(2\epsilon_\ell^2) \rceil$, let
$$\delta_\ell := \exp\big(-2T_\ell\epsilon_\ell^2\big) \le \exp\left(-\log\left(\frac{16K}{u-d}\right)\right) = \frac{u-d}{16K}. \tag{28}$$
Moreover, for every arm $j$, given that the samples are $1$-range bounded and $\hat\mu_j$ is computed with $T_\ell$ samples, Hoeffding's inequality gives
$$\mathbb{P}(\hat\mu_j \le \gamma_j - \epsilon_\ell \mid A_\ell) \le \delta_\ell, \qquad \mathbb{P}(\hat\mu_j \ge \gamma_j + \epsilon_\ell \mid A_\ell) \le \delta_\ell. \tag{29}$$
We first prove the lower tail of the claimed bound. Recall that $\hat t^{(i)}_\ell = \hat\mu_{(q)}$ and $r_- = \lceil \frac{r_{i-1}+2r_i}{3}N_\ell \rceil$, and define the event $E_- := \{\hat\mu_{(q)} < \gamma_{(r_-)} - \epsilon_\ell\}$. If $E_-$ occurs, then at least $q$ empirical means are smaller than $\gamma_{(r_-)} - \epsilon_\ell$ (with $q \ge r_-$). Since the $(\gamma_{(k)})_k$ are in non-decreasing order, we conclude that among the set $G_- := \{j : \gamma_j \ge \gamma_{(r_-)}\}$, at least $q - (r_- - 1)$ elements must satisfy $\hat\mu_j < \gamma_{(r_-)} - \epsilon_\ell$. For each element $j \in G_-$ we have $\gamma_j \ge \gamma_{(r_-)}$, hence $\{\hat\mu_j < \gamma_{(r_-)} - \epsilon_\ell\} \subseteq \{\hat\mu_j < \gamma_j - \epsilon_\ell\}$; using (28) and (29), conditionally on $A_\ell$, each such downward deviation has probability at most $(u-d)/(16K)$. Therefore, we conclude that
$$\mathbb{P}(E_- \mid A_\ell) \le \mathbb{P}\big(\mathrm{Bin}(|G_-|,\ (u-d)/(16K)) \ge q - (r_- - 1)\big).$$
Using the definitions of $q$, $r_-$ and $r_i$, we have
$$q - (r_- - 1) = \lceil r_i N_\ell \rceil - \left(\left\lceil \frac{r_{i-1}+2r_i}{3}N_\ell \right\rceil - 1\right) \ge r_i N_\ell - \frac{r_{i-1}+2r_i}{3}N_\ell = \frac{r_i - r_{i-1}}{3}N_\ell = \frac{u-d}{12K}N_\ell \ge \frac{u-d}{12K}|G_-|.$$
Applying Hoeffding's inequality for binomials yields
$$\mathbb{P}(E_- \mid A_\ell) \le \exp\left(-2|G_-|\left(\frac{u-d}{12K} - \frac{u-d}{16K}\right)^2\right) \le \exp\left(-2|G_-|\left(\frac{u-d}{48K}\right)^2\right).$$
Finally, for $i \in \{1,2,3\}$ we have $\frac{r_{i-1}+2r_i}{3} \le \frac{u}{K}$, hence $|G_-| = N_\ell - r_- + 1 \ge (1 - u/K)N_\ell$. Therefore,
$$\mathbb{P}(E_- \mid A_\ell) \le \exp\left(-2\left(1-\frac{u}{K}\right)N_\ell\left(\frac{u-d}{48K}\right)^2\right) \le \exp(-\kappa_{d,u}\cdot N_\ell). \tag{30}$$
Let us now show the upper tail of the claimed bound, following similar steps as in the lower-tail proof. Consider $r_+ = \lceil \frac{2r_i+r_{i+1}}{3}N_\ell \rceil$ and define $E_+ := \{\hat\mu_{(q)} > \gamma_{(r_+)} + \epsilon_\ell\}$. If $E_+$ occurs, then at least $N_\ell - q + 1$ empirical means exceed $\gamma_{(r_+)} + \epsilon_\ell$. At most $N_\ell - r_+$ arms can have a true mean larger than $\gamma_{(r_+)}$, so at least $(N_\ell - q + 1) - (N_\ell - r_+) = r_+ - q + 1$ arms from the set $G_+ := \{j : \gamma_j \le \gamma_{(r_+)}\}$ must satisfy $\hat\mu_j > \gamma_{(r_+)} + \epsilon_\ell$. For each $j \in G_+$ we have $\gamma_j \le \gamma_{(r_+)}$, hence $\{\hat\mu_j > \gamma_{(r_+)} + \epsilon_\ell\} \subseteq \{\hat\mu_j \ge \gamma_j + \epsilon_\ell\}$; using (28) and (29), conditionally on $A_\ell$, each such upward deviation has probability at most $(u-d)/(16K)$. Thus, conditionally on $A_\ell$, we have
$$\mathbb{P}(E_+ \mid A_\ell) \le \mathbb{P}\big(\mathrm{Bin}(|G_+|,\ (u-d)/(16K)) \ge r_+ - q + 1\big).$$
Also, we have
$$r_+ - q + 1 \ge \frac{2r_i+r_{i+1}}{3}N_\ell - r_i N_\ell = \frac{r_{i+1} - r_i}{3}N_\ell = \frac{u-d}{12K}N_\ell.$$
Therefore, using the binomial Hoeffding bound, we obtain
$$\mathbb{P}(E_+ \mid A_\ell) \le \exp\left(-2|G_+|\left(\frac{u-d}{48K}\right)^2\right) \le \exp\left(-2r_+\left(\frac{u-d}{48K}\right)^2\right).$$
Moreover, for $i \in \{1,2,3\}$ we have $\frac{2r_i+r_{i+1}}{3} \ge \frac{d}{K}$, so $r_+ \ge \frac{d}{K}N_\ell$. Hence,
$$\mathbb{P}(E_+ \mid A_\ell) \le \exp\left(-2\frac{d}{K}N_\ell\left(\frac{u-d}{48K}\right)^2\right) \le \exp(-\kappa_{d,u}\cdot N_\ell). \tag{31}$$
The conclusion follows by combining (30) and (31), which leads to the bound
$$\mathbb{P}\big(\hat\mu_{(q)} \notin [\gamma_{(r_-)} - \epsilon_\ell,\ \gamma_{(r_+)} + \epsilon_\ell]\big) \le 2\exp(-\kappa_{d,u} N_\ell).$$

C.3 A result on the Sequential Halving Algorithm by Zhao et al. [2023]

Consider a $K$-armed bandit problem with Bernoulli rewards with unknown means $\mu_1,\ldots,\mu_K$. As in the previous subsection, we write $\mu_{(1)} \le \cdots \le \mu_{(K)}$ for the ordered values. Sequential Halving is a classical elimination scheme for pure-exploration problems [Karnin et al., 2013]. It proceeds in at most $\lceil \log_2 K \rceil$ phases. Starting from the full set of $K$ candidate arms, each phase spends approximately $\lfloor T/\log_2 K \rfloor$ samples by distributing them uniformly across the surviving arms, then ranks the arms by their empirical means and discards the top half. Since our goal is to identify arms with the smallest mean, we retain the bottom-ranked half after each phase. This procedure is known to be adaptive for simple-regret minimization, in the sense formalized by Theorem C.5 below.

Theorem C.5 (from Zhao et al. [2023]). Consider Algorithm SH with inputs $T$ and $K$. The output $I_T$ satisfies, for any $\epsilon > 0$ and $m \in [K]$:
$$\mathbb{P}\big(\mu_{I_T} \ge \mu_{(m)} + \epsilon\big) \le \exp\left(-\frac{c\,m\,\epsilon^2 T}{K\log^3(K)}\right),$$
where $c$ is a positive absolute constant.

D Guarantees on the FB_CWI procedure (Algorithm 1)

To structure the proofs, in this section we present guarantees on the output of Algorithm 1 when fed with input $(\delta, T)$. More precisely, the output being $(\phi_1 \vee \phi_2, I)$, we provide upper bounds on the probability of a misidentification error for the candidate arm $I$. This corresponds to the typical kind of guarantee encountered in best-arm identification in the fixed-budget framework. In turn, we apply these results in Section E to prove the guarantees presented in Theorem 2.2.

D.1 First Upper Bound

Theorem D.1.
The output of FB_CWI (Algorithm 1) with input $T$ satisfies:
$$\mathbb{P}(\psi_T \neq i^*) \le 27K\log(K)\log(T)\cdot\exp\left(-\frac{c\,T}{\log(T)\log(K)\,H_{\mathrm{cw}}}\right),$$
where $c$ is a numerical constant, and we recall that $H_{\mathrm{cw}}$ is defined by
$$H_{\mathrm{cw}} = \sum_{i\neq i^*}\frac{1}{\Delta_{i^*,i}^2}\ \text{ if } \Delta_{i^*,i} > 0 \text{ for all } i \in [K]\setminus\{i^*\}, \qquad\text{and}\qquad H_{\mathrm{cw}} = +\infty \text{ otherwise}.$$

Notation: Let $\Delta^{(k)} \in [-1/2,1/2]^{|A_k|\times|A_k|}$ denote the sub-matrix of $\Delta$ restricted to the rows and columns in $A_k$. For $\alpha \in A_k$, let $(\Delta^{(k)}_{\alpha,(i)})_{i\in\{1,\ldots,|A_k|-1\}}$ denote the ordered gaps between $\alpha$ and the arms in $A_k\setminus\{\alpha\}$, so that $\Delta^{(k)}_{\alpha,(1)} \le \cdots \le \Delta^{(k)}_{\alpha,(|A_k|-1)}$. Since the gap sub-matrix for arms in $A_k$ is skew-symmetric (i.e., $\Delta^{(k)}_{i,j} = -\Delta^{(k)}_{j,i}$ for all $i,j$), the number of arms such that $\Delta^{(k)}_{\alpha,(\lceil|A_k|/4\rceil)} \le 0$ is at least $\lceil|A_k|/4\rceil$ (see Lemma H.6). Let $E_k \subset A_k$ denote this set of arms:
$$E_k := \big\{\alpha \in A_k : \Delta^{(k)}_{\alpha,(\lceil|A_k|/4\rceil)} \le 0\big\}.$$
Finally, we remind the reader that for any $j \in [K]$, the quantities $\Delta_{j,(1)} \le \cdots \le \Delta_{j,(K-1)}$ correspond to the ordered gaps between $j$ and all arms in $[K]\setminus\{j\}$.

Proof of Theorem D.1. Suppose that $T \ge 8K\log_{8/7}(K)$; otherwise the bound is vacuous. Assume that $\Delta_{i^*,i} > 0$ for all $i \neq i^*$; otherwise, if $\Delta_{i^*,j} = 0$ for some $j \neq i^*$, then $H_{\mathrm{cw}} = +\infty$ and the stated bound is trivial. We start by bounding the probability of the event $\psi_T \neq i^*$ by the probabilities that $i^*$ gets eliminated at some step $k$. Since $i^* \in A_1 = [K]$, the event $\{\psi_T \neq i^*\}$ implies that there exists a round $k \in \{1,\ldots,k_{\max}-1\}$ such that $i^* \in A_k$ but $i^* \notin A_{k+1}$. Hence,
$$\mathbb{P}(\psi_T \neq i^*) = \mathbb{P}\left(\bigcup_{k=1}^{k_{\max}-1}\{i^* \in A_k,\ i^* \notin A_{k+1}\}\right) \le \sum_{k=1}^{k_{\max}-1}\mathbb{P}(i^* \in A_k,\ i^* \notin A_{k+1}) \le k_{\max}\cdot\max_{k\in\{1,\ldots,k_{\max}-1\}}\mathbb{P}(i^* \notin A_{k+1} \mid i^* \in A_k). \tag{32}$$
Recall that $k_{\max} \le \lceil\log_{8/7}(K)\rceil$. Next, we upper bound $\mathbb{P}(i^* \notin A_{k+1} \mid i^* \in A_k)$. Fix $k \in \{1,$
$\ldots, k_{\max}-1\}$ and condition on $\{i^* \in A_k\}$. If $i^* \notin A_{k+1}$, then by the definition of the next set (keeping only the top fraction), the number of arms in $A_k$ with a score smaller than $S_k(i^*)$ is at most $\lceil|A_k|/8\rceil$. Equivalently, at least $|A_k| - \lceil|A_k|/8\rceil$ arms in $A_k$ have a score at least $S_k(i^*)$. By Lemma H.7 applied to the skew-symmetric matrix $\Delta^{(k)}$, we conclude that if $|A_k| \ge 3$, then the intersection between $E_k$ and $A_{k+1}$ (which has size $|A_k| - \lceil|A_k|/8\rceil$) is non-empty, and therefore $\exists\,\alpha \in E_k : S_k(\alpha) \ge S_k(i^*)$. Otherwise, if $|A_k| = 2$ and $i^* \in A_k$, we necessarily have $E_k = A_k\setminus\{i^*\}$. We conclude that
$$\mathbb{P}(i^* \notin A_{k+1} \mid i^* \in A_k) \le \mathbb{P}\big(\exists\,\alpha \in E_k : S_k(\alpha) \ge S_k(i^*)\big) \le \underbrace{\mathbb{P}\Big(S_k(i^*) \le \tfrac{1}{2}\Delta^{(k)}_{i^*,(\lceil|A_k|/8\rceil)}\Big)}_{\text{Term 1}} + \underbrace{\mathbb{P}\Big(\exists\,\alpha \in E_k : S_k(\alpha) \ge \tfrac{1}{2}\Delta^{(k)}_{i^*,(\lceil|A_k|/8\rceil)}\Big)}_{\text{Term 2}}.$$
Denote $\Delta_k := \Delta^{(k)}_{i^*,(\lceil|A_k|/8\rceil)}$. By Lemma D.2, Terms 1 and 2 satisfy
$$\mathbb{P}(i^* \notin A_{k+1} \mid i^* \in A_k) \le (K + 2\log(T))\exp\left(-\frac{c\,\Delta_k^2 B_k}{\log(\lceil B_k/2\rceil)}\right) + K\log(T)\exp\left(-\frac{c'\Delta_k^2 B_k}{\log(\lceil B_k/2\rceil)}\right) \le 3K\log(T)\exp\left(-\frac{c_1\Delta_k^2 B_k}{\log(\lceil B_k/2\rceil)}\right), \tag{33}$$
where $c_1 = \min\{c, c'\}$. Next, we develop a bound on $\Delta^{(k)}_{i^*,(\lceil|A_k|/8\rceil)}$ using $H_{\mathrm{cw}} = \sum_{i\neq i^*}\Delta_{i^*,i}^{-2}$. Recall $B_k = \big\lfloor\frac{T}{|A_k|\log_{8/7}(K)}\big\rfloor$. We have
$$\Delta_k^2 B_k = \Delta_k^2\left\lfloor\frac{T}{|A_k|\log_{8/7}(K)}\right\rfloor \ge \frac{\Delta_k^2\,T}{2|A_k|\log_{8/7}(K)} \ge \frac{\Delta_k^2}{\lceil|A_k|/8\rceil}\cdot\frac{T}{16\log_{8/7}(K)} \ge \frac{T}{16\log_{8/7}(K)\sum_{i\neq i^*}\Delta_{i^*,i}^{-2}} = \frac{T}{16\log_{8/7}(K)\,H_{\mathrm{cw}}},$$
where we used the fact that $T \ge 8K\log_{8/7}(K)$ in the second inequality and Lemma H.2 in the last (with $\Delta_k = \Delta^{(k)}_{i^*,(\lceil|A_k|/8\rceil)}$). Plugging this into (33) yields
$$\mathbb{P}(i^* \notin A_{k+1} \mid i^* \in A_k) \le 3K\log(T)\cdot\exp\left(-\frac{c_1\,T}{16\log_{8/7}(K)\log(\lceil B_k/2\rceil)\,H_{\mathrm{cw}}}\right)$$
$$\le 3K\log(T)\exp\left(-\frac{c_1'\,T}{\log(K)\log(T)\,H_{\mathrm{cw}}}\right), \tag{34}$$
for numerical constants $c_1, c_1' > 0$ (using $\log_{8/7}(K) = \Theta(\log K)$ and $\log(\lceil B_k/2\rceil) \le \log(T)$). Finally, combining the bounds (32) and (34) and using $k_{\max} \le \lceil\log_{8/7}(K)\rceil$, we get
$$\mathbb{P}(\psi_T \neq i^*) \le \lceil\log_{8/7}(K)\rceil\cdot 3K\log(T)\exp\left(-\frac{c_1'\,T}{\log(K)\log(T)\,H_{\mathrm{cw}}}\right),$$
which yields the claimed form
$$\mathbb{P}(\psi_T \neq i^*) \le 27K\log(K)\log(T)\exp\left(-\frac{c\,T}{\log(T)\log(K)\,H_{\mathrm{cw}}}\right),$$
for a numerical constant $c > 0$. It remains to prove the following technical lemma.

Lemma D.2. Consider step $k$ in Algorithm 1 and assume that $i^* \in A_k$. Let $\Delta_k := \Delta^{(k)}_{i^*,(\lceil|A_k|/8\rceil)}$. Then
$$\mathbb{P}\Big(S_k(i^*) \le \tfrac{1}{2}\Delta_k\Big) \le (K + 2\log(T))\exp\left(-\frac{c\,\Delta_k^2 B_k}{\log(\lceil B_k/2\rceil)}\right),$$
$$\mathbb{P}\Big(\exists\,\alpha \in E_k : S_k(\alpha) \ge \tfrac{1}{2}\Delta_k\Big) \le K\log(T)\exp\left(-\frac{c'\Delta_k^2 B_k}{\log(\lceil B_k/2\rceil)}\right),$$
for numerical constants $c, c' > 0$.

Proof. Assume $T \ge 8K\log_{8/7}(K)$; this guarantees $B_k = \big\lfloor\frac{T}{|A_k|\log_{8/7}(K)}\big\rfloor \ge 8$. Let $c_1$ denote the constant from Corollary C.2.

Proof of the first bound: Let $i^*_s$ be the strong opponent chosen for $i^*$ at step $k$ of Algorithm 1. Recall $S_k(i^*) = \min\{Z^{(s)}_k(i^*),\,0\} + Z^{(w)}_k(i^*)$. Therefore, we have
$$\mathbb{P}\Big(S_k(i^*) \le \tfrac{1}{2}\Delta_k\Big) \le \mathbb{P}\Big(Z^{(s)}_k(i^*) + Z^{(w)}_k(i^*) \le \tfrac{1}{2}\Delta_k\Big) + \mathbb{P}\Big(Z^{(w)}_k(i^*) \le \tfrac{1}{2}\Delta_k\Big). \tag{35}$$
Recall that the event $\{Z^{(s)}_k(i^*) + Z^{(w)}_k(i^*) \le \tfrac{1}{2}\Delta_k\}$ implies that $Z^{(s)}_k(i^*) \le -\tfrac{1}{4}\Delta_k$ or $Z^{(w)}_k(i^*) \le \tfrac{3}{4}\Delta_k$. Combining with Inequality (35), we obtain
$$\mathbb{P}\Big(S_k(i^*) \le \tfrac{1}{2}\Delta_k\Big) \le \mathbb{P}\Big(Z^{(s)}_k(i^*) \le -\tfrac{1}{4}\Delta_k\Big) + 2\,\mathbb{P}\Big(Z^{(w)}_k(i^*) \le \tfrac{3}{4}\Delta_k\Big). \tag{36}$$
We use Hoeffding's inequality (Lemma H.10) to bound the first term in the upper bound above.
For any fixed $i \in [K]\setminus\{i^*\}$ and $\epsilon > 0$, we have
$$\mathbb{P}\big(\hat\Delta_{i^*,i} - \Delta_{i^*,i} \le -\epsilon\big) \le \exp\left(-\frac{\epsilon^2 B_k}{2}\right),$$
where $\hat\Delta_{i^*,i}$ is the empirical mean of the duels between $(i^*, i)$, computed using $\lceil B_k/4\rceil$ samples. Therefore, applying the bound above with $\epsilon = \Delta_k/4$ and a union bound over the arms, we have
$$\mathbb{P}\Big(Z^{(s)}_k(i^*) \le -\tfrac{1}{4}\Delta_k\Big) \le \mathbb{P}\Big(Z^{(s)}_k(i^*) - \Delta_{i^*,i^*_s} \le -\tfrac{1}{4}\Delta_k\Big) \le (K-1)\exp\left(-\frac{\Delta_k^2 B_k}{32}\right), \tag{37}$$
where we used the fact that $\Delta_{i^*,j} \ge 0$ for all $j \in [K]$. Now, using Corollary C.2, which gives a guarantee on the output $Z^{(w)}_k(i^*)$, we have
$$\mathbb{P}\Big(Z^{(w)}_k(i^*) \le \tfrac{3}{4}\Delta_k\Big) = \mathbb{P}\Big(Z^{(w)}_k(i^*) \le \Delta^{(k)}_{i^*,(\lceil|A_k|/8\rceil)} - \tfrac{1}{4}\Delta_k\Big) \le \log\left(\frac{B_k}{2}\right)\exp\left(-\frac{c_1\Delta_k^2 B_k}{32\log(\lceil B_k/2\rceil)}\right). \tag{38}$$
Combining (37), (38) and (36), we conclude that
$$\mathbb{P}\Big(S_k(i^*) \le \tfrac{1}{2}\Delta_k\Big) \le (K - 1 + 2\log(T))\exp\left(-\frac{c\,\Delta_k^2 B_k}{\log(\lceil B_k/2\rceil)}\right),$$
where $c$ is a numerical constant.

Proof of the second bound: Fix $\alpha \in E_k$. By definition of $E_k$, we have $\Delta^{(k)}_{\alpha,(\lceil|A_k|/4\rceil)} \le 0$. Moreover, $\min\{Z^{(s)}_k(\alpha),\,0\} \le 0$, hence
$$\mathbb{P}\Big(S_k(\alpha) \ge \tfrac{1}{2}\Delta_k\Big) = \mathbb{P}\Big(\min\{Z^{(s)}_k(\alpha),\,0\} + Z^{(w)}_k(\alpha) \ge \tfrac{1}{2}\Delta_k\Big) \le \mathbb{P}\Big(Z^{(w)}_k(\alpha) \ge \tfrac{1}{2}\Delta_k\Big) \le \mathbb{P}\Big(Z^{(w)}_k(\alpha) \ge \Delta^{(k)}_{\alpha,(\lceil|A_k|/4\rceil)} + \tfrac{1}{2}\Delta_k\Big),$$
where the last step uses $\Delta^{(k)}_{\alpha,(\lceil|A_k|/4\rceil)} \le 0$. Applying Corollary C.2 to $Z^{(w)}_k(\alpha)$ then yields
$$\mathbb{P}\Big(S_k(\alpha) \ge \tfrac{1}{2}\Delta_k\Big) \le \log\left(\frac{B_k}{2}\right)\exp\left(-\frac{c_1\Delta_k^2 B_k}{8\log(\lceil B_k/2\rceil)}\right). \tag{39}$$
Finally, a union bound over $\alpha \in E_k$ gives
$$\mathbb{P}\Big(\exists\,\alpha \in E_k : S_k(\alpha) \ge \tfrac{1}{2}\Delta_k\Big) \le |E_k|\log\left(\frac{B_k}{2}\right)\exp\left(-\frac{c'\Delta_k^2 B_k}{\log(\lceil B_k/2\rceil)}\right).$$
We may bound $|E_k|$ by $K$; hence the above yields the stated form
$$\mathbb{P}\Big(\exists\,\alpha \in E_k : S_k(\alpha) \ge \tfrac{1}{2}\Delta_k\Big) \le K\log(T)\exp\left(-\frac{c'\Delta_k^2 B_k}{\log(\lceil B_k/2\rceil)}\right),$$
for a numerical constant $c' > 0$. This proves the second inequality and concludes the lemma.
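The definition of $E_k$ used throughout this section rests on the combinatorial fact (Lemma H.6) that, in any skew-symmetric $n \times n$ matrix, at least $\lceil n/4 \rceil$ rows $\alpha$ have their $\lceil n/4 \rceil$-th smallest off-diagonal entry $\le 0$. The following is a quick randomized check of this claim, illustrative only and not part of the paper's argument:

```python
import math
import random

def count_rows_with_small_quantile(delta):
    """Count rows alpha of a skew-symmetric matrix `delta` whose
    ceil(n/4)-th smallest off-diagonal entry is <= 0."""
    n = len(delta)
    m = math.ceil(n / 4)
    count = 0
    for a in range(n):
        gaps = sorted(delta[a][j] for j in range(n) if j != a)
        if gaps[m - 1] <= 0:
            count += 1
    return count

random.seed(1)
for _ in range(200):
    n = random.randint(2, 12)
    # random skew-symmetric matrix: delta[i][j] = -delta[j][i]
    delta = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            x = random.uniform(-0.5, 0.5)
            delta[i][j], delta[j][i] = x, -x
    assert count_rows_with_small_quantile(delta) >= math.ceil(n / 4)
```

In a strict linear order (arm of rank $r$ loses to exactly $r-1$ arms), the count is $n - \lceil n/4 \rceil$, which already meets the bound; the random check above probes matrices without any transitive structure.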
D.2 Second Upper Bound

For each $i \neq i^*$, let $\Delta_{i,(k)}$ denote the ordered gaps $(\Delta_{i,j})_{j\neq i}$, so that $\Delta_{i,(1)} \le \cdots \le \Delta_{i,(K-1)}$. Denote by $K_{i;<0}$ the number of indices $j$ such that $\Delta_{i,j} < 0$. For each $i \in [K]$, let $s_i \le K_{i;<0}$, and set $s = (s_1,\ldots,s_K)$. Here, we adopt the convention $K_{i^*;<0} = 0$. We recall the expressions of the quantities $H_{\mathrm{certify}}(s)$, $H^{(0)}_{\mathrm{explore}}(s)$ and $H^{(1)}_{\mathrm{explore}}(s)$:
$$H_{\mathrm{certify}}(s) = \sum_{i\neq i^*}\frac{1}{\Delta_{i,(s_i)}^2}, \qquad H^{(1)}_{\mathrm{explore}}(s) = \max_{i\neq i^*}\frac{K}{s_i\Delta_{i,(s_i)}^2} \qquad\text{and}\qquad H^{(0)}_{\mathrm{explore}}(s) = \sum_{i\neq i^*}\frac{K}{s_i\Delta_{i,(s_i)}^2}.$$

Theorem D.3. For any $s$ such that $1 \le s_i \le K_{i;<0}$, it holds that
$$\mathbb{P}(\psi_T \neq i^*) \le 47K\log(K)\log(T)\exp\left(-\frac{c_1}{\log^3(K)\log(T)}\cdot\frac{T - c_2\log^5\big(H^{(0)}_{\mathrm{explore}}(s)\big)\cdot H^{(0)}_{\mathrm{explore}}(s)}{H^{(1)}_{\mathrm{explore}}(s) + H_{\mathrm{certify}}(s)}\right),$$
where $c_1$ and $c_2$ are numerical constants.

We restate and extend the notation introduced in the last section.

Notation: Let $\Delta^{(k)} \in [-1/2,1/2]^{|A_k|\times|A_k|}$ denote the sub-matrix of $\Delta$ restricted to the rows and columns in $A_k$. For $\alpha \in A_k$, let $(\Delta^{(k)}_{\alpha,(i)})_{i\in\{1,\ldots,|A_k|-1\}}$ denote the ordered gaps between $\alpha$ and the arms in $A_k$, so that $\Delta^{(k)}_{\alpha,(1)} \le \cdots \le \Delta^{(k)}_{\alpha,(|A_k|-1)}$. Recall that since the gap sub-matrix for arms in $A_k$ is skew-symmetric (i.e., $\Delta^{(k)}_{i,j} = -\Delta^{(k)}_{j,i}$ for all $i,j$), the number of arms such that $\Delta^{(k)}_{\alpha,(\lceil|A_k|/4\rceil)} \le 0$ is at least $\lceil|A_k|/4\rceil$ (see Lemma H.6). Let $E_k \subset A_k$ denote this set of arms:
$$E_k := \big\{\alpha \in A_k : \Delta^{(k)}_{\alpha,(\lceil|A_k|/4\rceil)} \le 0\big\}.$$
We rank the quantities $(\Delta_{\alpha,(s_\alpha)})_{\alpha\in E_k}$ and denote the ranked sequence, with ties broken arbitrarily, by $(\Delta_{E_k:i})_{i\in[|E_k|]}$:
$$\Delta_{E_k:1} \le \cdots \le \Delta_{E_k:|E_k|}.$$
Define the quantity $\bar\Delta_k$ by
$$\bar\Delta_k := \Delta_{E_k:\lceil\frac{7}{8}|E_k|\rceil} \le 0. \tag{40}$$
Observe that when $\bar\Delta_k = 0$, we necessarily have $\Delta_{i,(s_i)} = 0$ for some $i \in [K]\setminus\{i^*\}$; this implies in particular that $H_{\mathrm{certify}} = \infty$ and the bound becomes loose. Therefore, in the remainder of this proof, we assume that $\bar\Delta_k < 0$. Let $F_k$ denote the subset of arms $\alpha \in E_k$ such that $\Delta_{\alpha,(s_\alpha)} \le \bar\Delta_k$:
$$F_k := \big\{\alpha \in E_k : \Delta_{\alpha,(s_\alpha)} \le \bar\Delta_k\big\}. \tag{41}$$
Finally, we denote $\Gamma_i := s_i\Delta_{i,(s_i)}^2$ for each $i \neq i^*$, and let $(\Gamma_{(i)})_{i\neq i^*}$ correspond to the ranked quantities $\Gamma_{(2)} \le \cdots \le \Gamma_{(K)}$, with ties broken arbitrarily.

Proof of Theorem D.3. Fix $s$ such that $1 \le s_i \le K_{i;<0}$ for all $i \neq i^*$ (and $K_{i^*;<0} = 0$ by convention). Note that by the assumed uniqueness of the Condorcet winner, we have $K_{i;<0} \ge 1$ for any $i \neq i^*$. Let $c > 0$ be a numerical constant (chosen smaller than the constants appearing in Corollary C.2 and Theorem C.5). We assume that $T \ge 8K\log_{8/7}(K)$; otherwise the bound of the theorem is vacuous. As in the proof of Theorem D.1, we start by bounding the probability of the event $\psi_T \neq i^*$ by the probabilities that $i^*$ gets eliminated at some step $k$:
$$\mathbb{P}(\psi_T \neq i^*) \le \sum_{k=1}^{k_{\max}-1}\mathbb{P}(i^* \notin A_{k+1},\ i^* \in A_k) \le k_{\max}\cdot\max_{k\le k_{\max}-1}\mathbb{P}(i^* \notin A_{k+1} \mid i^* \in A_k). \tag{42}$$
Recall $k_{\max} \le \lceil\log_{8/7}(K)\rceil$. Fix $k \in \{1,\ldots,k_{\max}-1\}$; we first consider the case $|A_k| \ge 3$, leaving the simple case $|A_k| = 2$ to the end of this proof. We build our argument on the observation that, given $i^* \in A_k$, the event $i^* \notin A_{k+1}$ implies in particular that the number of arms $\alpha \in A_k$ with a score $S_k(\alpha)$ larger than $S_k(i^*)$ is at least $|A_k| - \lceil|A_k|/8\rceil$.
Therefore, the event $i^* \notin A_{k+1}$ implies that the number of arms in $F_k$ with a score $S_k(\cdot)$ larger than $S_k(i^*)$ is at least $|A_{k+1}\cap F_k| \ge \lceil|F_k|/3\rceil$, as stated in the following lemma.

Lemma D.4. Let $k \in \{1,\ldots,k_{\max}-1\}$ and recall the definition of $F_k$ given in (41). If $|A_k| \ge 3$, then $|A_{k+1}\cap F_k| \ge \lceil\frac{1}{3}|F_k|\rceil$.

This lemma implies that if $i^*$ is eliminated at step $k$ (i.e., $i^* \notin A_{k+1}$), then many "bad" arms in $F_k$ beat $i^*$. More precisely, since $A_{k+1}$ consists of the top-scoring arms at step $k$, every $\alpha \in A_{k+1}$ satisfies $S_k(\alpha) \ge S_k(i^*)$ whenever $i^* \notin A_{k+1}$. Therefore,
$$\{i^* \notin A_{k+1}\} \subseteq \left\{\big|\{\alpha \in F_k : S_k(\alpha) \ge S_k(i^*)\}\big| \ge \left\lceil\tfrac{1}{3}|F_k|\right\rceil\right\}.$$
Introduce the threshold $\tfrac{1}{2}\bar\Delta_k$ defined via (40) and split:
$$\mathbb{P}(i^* \notin A_{k+1} \mid i^* \in A_k) \le \mathbb{P}\left(\big|\{\alpha \in F_k : S_k(\alpha) \ge S_k(i^*)\}\big| \ge \left\lceil\tfrac{1}{3}|F_k|\right\rceil\right) \le \underbrace{\mathbb{P}\Big(S_k(i^*) \le \tfrac{1}{2}\bar\Delta_k\Big)}_{\text{Term 1}} + \underbrace{\mathbb{P}\left(\big|\{\alpha \in F_k : S_k(\alpha) \ge \tfrac{1}{2}\bar\Delta_k\}\big| \ge \left\lceil\tfrac{1}{3}|F_k|\right\rceil\right)}_{\text{Term 2}}.$$
The following lemma is a key step in the proof; its proof is postponed to the next subsection.

Lemma D.5. Under the assumptions of Theorem D.3, consider step $k$ in Algorithm 1. Then, we have
$$\mathbb{P}\Big(S_k(i^*) \le \tfrac{1}{2}\bar\Delta_k\ \Big|\ i^* \in A_k\Big) \le (K + \log(T))\exp\left(-\frac{c\,\bar\Delta_k^2 B_k}{\log(B_k)}\right),$$
$$\mathbb{P}\left(\big|\{\alpha \in F_k : S_k(\alpha) \ge \tfrac{1}{2}\bar\Delta_k\}\big| \ge \left\lceil\tfrac{1}{3}|F_k|\right\rceil\right) \le \exp\left(-\frac{c}{\log^3(K)\log(T)}\cdot\frac{T - c'\log^5\big(H^{(0)}_{\mathrm{explore}}(s)\big)\cdot H^{(0)}_{\mathrm{explore}}(s)}{H^{(1)}_{\mathrm{explore}}(s)}\right),$$
where $c$ and $c'$ are positive numerical constants.

A direct application of the lemma above gives (for numerical constants $c, c' > 0$)
$$\mathbb{P}(i^* \notin A_{k+1} \mid i^* \in A_k) \le (K + \log(T))\exp\left(-\frac{c\,\bar\Delta_k^2 B_k}{\log(B_k)}\right) + \exp\left(-\frac{c}{\log^3(K)\log(T)}\cdot\frac{T - c'\log^5\big(H^{(0)}_{\mathrm{explore}}(s)\big)\cdot H^{(0)}_{\mathrm{explore}}(s)}{H^{(1)}_{\mathrm{explore}}(s)}\right). \tag{43}$$
Next, we convert the dependence of the bound on $\bar\Delta_k$ into $H_{\mathrm{certify}}(s)$. Recall $|E_k| \ge \lceil|A_k|/4\rceil$.
Since $\bar\Delta_k = \Delta_{E_k:\lceil\frac{7}{8}|E_k|\rceil}$ and the sequence $(\Delta_{E_k:i})_i$ is non-decreasing and non-positive, the squared sequence $(\Delta^2_{E_k:i})_i$ is non-increasing. Applying Lemma H.2 to $(\Delta^2_{E_k:i})_{i\in[|E_k|]}$ yields
$$|A_k|\cdot\frac{1}{\bar\Delta_k^2} \le 4|E_k|\cdot\frac{1}{\bar\Delta_k^2} \le 32\left\lceil\frac{|E_k|}{8}\right\rceil\cdot\frac{1}{\Delta^2_{E_k:\lceil\frac{7}{8}|E_k|\rceil}} \le 32\sum_{\alpha\in E_k}\frac{1}{\Delta^2_{\alpha,(s_\alpha)}} \le 32\,H_{\mathrm{certify}}(s). \tag{44}$$
Using $B_k = \big\lfloor\frac{T}{|A_k|\log_{8/7}(K)}\big\rfloor \ge \frac{T}{2|A_k|\log_{8/7}(K)}$ (and $\log(B_k) \le \log(T)$), we obtain
$$(K+\log(T))\exp\left(-\frac{c\,\bar\Delta_k^2 B_k}{\log(B_k)}\right) \le 2(K+\log(T))\exp\left(-\frac{c'\,T}{H_{\mathrm{certify}}(s)\log(T)\log(K)}\right), \tag{45}$$
for a numerical constant $c' > 0$ (absorbing $\log_{8/7}(K) = \Theta(\log K)$ into the constants). Next, combining (43) and (45), and using $\exp(-a)+\exp(-b) \le 2\exp(-\min\{a,b\})$, we get that if $|A_k| \ge 3$,
$$\mathbb{P}(i^* \notin A_{k+1} \mid i^* \in A_k) \le 3(K+\log(T))\exp\left(-\frac{c}{\log^3(K)\log(T)}\cdot\frac{T - c_2\log^5\big(H^{(0)}_{\mathrm{explore}}(s)\big)\cdot H^{(0)}_{\mathrm{explore}}(s)}{H^{(1)}_{\mathrm{explore}}(s) + H_{\mathrm{certify}}(s)}\right),$$
for a numerical constant $c_2 > 0$ (renaming constants). To conclude, we need to consider the edge case $|A_k| = 2$ (last iteration). In this case we have $E_k = F_k =: \{\alpha\}$. Therefore,
$$\mathbb{P}(i^* \notin A_{k+1} \mid i^* \in A_k) \le \mathbb{P}\big(S_k(\alpha) \ge S_k(i^*)\big) \le \mathbb{P}\Big(S_k(i^*) \le \tfrac{1}{2}\bar\Delta_k\Big) + \mathbb{P}\Big(S_k(\alpha) \ge \tfrac{1}{2}\bar\Delta_k\Big).$$
The first term in the upper bound can be bounded using Lemma D.5, and the second term using Lemma D.6. The resulting bound is smaller than the one obtained when $|A_k| \ge 3$. Finally, using (42) and $k_{\max} \le \lceil\log_{8/7}(K)\rceil$, and absorbing $\lceil\log_{8/7}(K)\rceil$ and the additive logarithms into the prefactor, we obtain
$$\mathbb{P}(\psi_T \neq i^*) \le 47K\log(K)\log(T)\exp\left(-\frac{c_1}{\log^3(K)\log(T)}\cdot\frac{T - c_2\log^5\big(H^{(0)}_{\mathrm{explore}}(s)\big)\cdot H^{(0)}_{\mathrm{explore}}(s)}{H^{(1)}_{\mathrm{explore}}(s) + H_{\mathrm{certify}}(s)}\right),$$
which is the claim of Theorem D.3.

D.3 Proofs of Technical Lemmas

D.3.1 Proof of Lemma D.4

Proof.
Recall $\bar\Delta_k = \Delta_{E_k:\lceil\frac{7}{8}|E_k|\rceil}$ and $F_k = \{\alpha \in E_k : \Delta_{\alpha,(s_\alpha)} \le \bar\Delta_k\}$, hence
$$|F_k| \ge \left\lceil\frac{7}{8}|E_k|\right\rceil. \tag{46}$$
Algorithm 1 keeps $|A_{k+1}| = |A_k| - \lceil|A_k|/8\rceil$ arms, so since $A_{k+1} \subseteq A_k$ and $F_k \subseteq A_k$,
$$|A_{k+1}\cap F_k| \ge |A_{k+1}| + |F_k| - |A_k| = |F_k| - \left\lceil\frac{|A_k|}{8}\right\rceil. \tag{47}$$
Case 1: $|A_k| \ge 5$. By Lemma H.6, $|E_k| \ge \lceil|A_k|/4\rceil \ge 2$, hence $\lceil|A_k|/8\rceil \le \lceil|E_k|/2\rceil$. Moreover, for every integer $m \ge 2$,
$$2\left\lceil\frac{7m}{8}\right\rceil \ge 3\left\lceil\frac{m}{2}\right\rceil, \tag{48}$$
(which follows by writing $m = 8q + r$ and checking $r \in \{0,\ldots,7\}$; the only delicate residue $r = 1$ is harmless since then $q \ge 1$). Applying (48) with $m = |E_k|$ and using (46) gives $\lfloor\frac{2}{3}|F_k|\rfloor \ge \lceil|E_k|/2\rceil \ge \lceil|A_k|/8\rceil$. Plugging into (47) yields
$$|A_{k+1}\cap F_k| \ge |F_k| - \left\lfloor\frac{2}{3}|F_k|\right\rfloor = \left\lceil\frac{1}{3}|F_k|\right\rceil.$$
Case 2: $|A_k| \in \{3,4\}$. Here $\lceil|A_k|/8\rceil = 1$. By skew-symmetry of $\Delta^{(k)}$, at most one row can have all its off-diagonal entries $> 0$, hence at least $|A_k| - 1$ rows have $\Delta^{(k)}_{\alpha,(1)} \le 0$, so $|E_k| \ge |A_k| - 1 \in \{2,3\}$. Then $\lceil\frac{7}{8}|E_k|\rceil = |E_k|$, and since $F_k \subseteq E_k$, (46) implies $|F_k| = |E_k| \ge |A_k| - 1$. Using (47), $|A_{k+1}\cap F_k| \ge |F_k| - 1$, and for $|F_k| \in \{2,3\}$ this satisfies $|F_k| - 1 \ge \lceil|F_k|/3\rceil$. Combining both cases proves that for every $|A_k| \ge 3$,
$$|A_{k+1}\cap F_k| \ge \left\lceil\frac{1}{3}|F_k|\right\rceil.$$

D.3.2 Proof of Lemma D.5

Proof. Fix a round $k \in \{1,\ldots,k_{\max}-1\}$ and recall $\bar\Delta_k \le 0$ by construction.

Proof of the first bound:
$$\mathbb{P}\Big(S_k(i^*) \le \tfrac{1}{2}\bar\Delta_k\Big) \le (K+\log(T))\exp\left(-\frac{c\,\bar\Delta_k^2 B_k}{\log(B_k)}\right).$$
If $\bar\Delta_k = 0$, the bound is immediate. Assume $\bar\Delta_k < 0$. Recall $S_k(i^*) = \min\{Z^{(s)}_k(i^*),\,0\} + Z^{(w)}_k(i^*)$.
Then P  S k ( i ∗ ) ≤ 1 2 ¯ ∆ k  ≤ P  Z ( s ) k ( i ∗ ) + Z ( w ) k ( i ∗ ) ≤ 1 2 ¯ ∆ k  + P  Z ( w ) k ( i ∗ ) ≤ 1 2 ¯ ∆ k  ≤ P  Z ( s ) k ( i ∗ ) ≤ 1 4 ¯ ∆ k  + 2 P  Z ( w ) k ( i ∗ ) ≤ 1 4 ¯ ∆ k  , where the last step uses ¯ ∆ k < 0 , hence { Z ( w ) k ( i ∗ ) ≤ 1 2 ¯ ∆ k } ⊆ { Z ( w ) k ( i ∗ ) ≤ 1 4 ¯ ∆ k } . The first term in the last upp er-b ound is b ounded using Ho effding and a union b ound ov er the opp onen t choice, P  Z ( s ) k ( i ∗ ) ≤ 1 4 ¯ ∆ k  ≤ ( K − 1) exp − ¯ ∆ 2 k 32 B k ! . The second term is b ounded b y Corollary C.2, P  Z ( w ) k ( i ∗ ) ≤ 1 4 ¯ ∆ k  ≤ P  Z ( w ) k ( i ∗ ) ≤ ∆ ( k ) i ∗ , ( ⌈| A k | / 8 ⌉ ) + 1 4 ¯ ∆ k  ≤ log ( T ) exp − c ¯ ∆ 2 k log( ⌈ B k / 2 ⌉ ) B k ! . Finally , absorbing log( ⌈ B k / 2 ⌉ ) in to log ( B k ) and constan ts yields P  S k ( i ∗ ) ≤ 1 2 ¯ ∆ k  ≤ ( K + log ( T )) exp − c ¯ ∆ 2 k log( B k ) B k ! . Pr o of of the se c ond b ound P  |{ α ∈ F k : S k ( α ) ≥ 1 2 ¯ ∆ k }| ≥ ⌈| F k | / 3 ⌉  ≤ exp − c log 3 ( K ) log ( T ) · T − c ′ log 5 ( H (0) explore ( s )) · H (0) explore ( s ) H (1) explore ( s ) ! . W e start from the indicator-sum form P  |{ α ∈ F k : S k ( α ) ≥ 1 2 ¯ ∆ k }| ≥  | F k | 3  = P X α ∈ F k 1  S k ( α ) ≥ 1 2 ¯ ∆ k  ≥  | F k | 3  ! . (49) Next, we keep only the hardest 3 / 4 of F k . More formally , we rank Γ α = s α ∆ 2 α, ( s α ) o ver α ∈ F k as Γ F k :1 ≤ · · · ≤ Γ F k : | F k | , and let F (3 / 4) k b e the subset containing the top ⌈ 3 | F k | / 4 ⌉ arms with largest Γ α . Then | F k \ F (3 / 4) k | = ⌊| F k | / 4 ⌋ , so ( X α ∈ F k 1 ( S k ( α ) ≥ 1 2 ¯ ∆ k ) ≥  | F k | 3  ) ⊆      X α ∈ F (3 / 4) k 1 ( S k ( α ) ≥ 1 2 ¯ ∆ k ) ≥  | F k | 12       . 35 Let us dev elop a uniform p er-arm b ound on F (3 / 4) k . Lemma D.6 b elow giv es suc h a b ound Lemma D.6. L et α ∈ F (3 / 4) k . 
W e have P  S k ( α ) ≥ 1 2 ¯ ∆ k  ≤ (log ( T ) + K ) exp   − c ” T log 3 ( K ) log( T ) P i ∈ E k K s i ∆ 2 i, ( s i )   , (50) wher e c ” is a p ositive numeric al c onstant. Mor e over, if T ≥ c ′ H (0) explore ( s ) log 5  H (0) explore ( s )  , wher e c ′ : = 10 3 ∨ 960 c ” log 2 ( 960 c ” ) , we have P  S k ( α ) ≥ 1 2 ¯ ∆ k  ≤ 1 18 . In the remainder of this proof we assume that the condition T ≥ c ′ H (0) explore ( s ) log 5  H (0) explore ( s )  is satisfied, otherwise the upper b ound stated by the theorem is greater than 1 and is th us v acuous. Denote b y p k the b ound giv en by the lemma abov e p k : = 1 18 ∧ (log( T ) + K ) exp   − c ” T log 3 ( K ) log( T ) P i ∈ E k K s i ∆ 2 i, ( s i )   . W e use Lemma H.9, whic h is purely tec hnical and deferred to Section H, to obtain the following upp er b ound p k ≤ exp   − ¯ c 1 · T log 3 ( K ) log( T ) P i ∈ E k K s i ∆ 2 i, ( s i )   , (51) where ¯ c 1 is a numerical constan t dep ending only on c ” . Therefore, b y indep endence across arms in the construction of S k ( · ) since the algorithm uses indep enden t fresh samples p er arm, X α ∈ F (3 / 4) k 1  S k ( α ) ≥ 1 2 ¯ ∆ k  is sto c hastically dominated by M k ∼ Bin  3 4 | F k |  , p k  . Consequen tly , using the fact that p k ≤ 1 18 implies l | F k | 12 m − p k  3 4 | F k |  ≥ | F k | 24 w e hav e P    X α ∈ F (3 / 4) k 1  S k ( α ) ≥ 1 2 ¯ ∆ k  ≥  | F k | 12     ≤ P  M k ≥  | F k | 12  ≤ P  M k − E [ M k ] ≥ | F k | 24  . (52) Next, w e use Lemma H.12 whic h pro vides a deviation b ound for binomial v ariables in regimes where the parameters can b e small. Recall that M k is a binomial distribution with parameters ( p k , ⌈ 3 | F k | / 4 ⌉ ) . W e hav e P  M k − E [ M k ] ≥ | F k | 24  ≤ exp  − | F k | 864 ϕ ( p k )  , (53) where ϕ is the function defined in Lemma H.1. 
Since we have proved that $p_k\le\frac{1}{18}$, the expression of $\varphi(p_k)$ is given by
\[
\varphi(p_k)=\frac{\frac12-p_k}{\log(1-p_k)-\log(p_k)}.
\]
Since the function $\varphi$ is increasing on $(0,1/2)$ and, by Lemma H.5, for any $y>0$,
\[
0<\frac{\frac12-\exp(-y)}{\log(1-\exp(-y))-\log(\exp(-y))}\le\frac{1}{2y},
\]
we conclude using the bound (51) that
\[
\frac{1}{\varphi(p_k)}\;\ge\;\frac{2\bar{c}_1}{320}\cdot\frac{T}{\log(T)\log^3(K)\sum_{i\in E_k}\frac{K}{s_i\Delta^2_{i,(s_i)}}}.
\]
Using the bound above with (53) and $|F_k|\ge\frac{7}{8}|E_k|$, we have
\[
\mathbb{P}\Big(M_k-\mathbb{E}[M_k]\ge\frac{|F_k|}{24}\Big)\;\le\;\exp\Big(-\frac{|F_k|}{864}\cdot\frac{\bar{c}_1}{320}\cdot\frac{T}{\log(T)\log^3(K)\sum_{i\in E_k}\frac{K}{s_i\Delta^2_{i,(s_i)}}}\Big)\;\le\;\exp\Big(-\frac{\bar{c}_2\,T\,|E_k|}{\log^3(K)\log(T)\sum_{i\in E_k}\frac{K}{s_i\Delta^2_{i,(s_i)}}}\Big),
\]
where $\bar{c}_2$ is a numerical constant. We then use the fact that
\[
\frac{|E_k|}{\sum_{i\in E_k}\frac{K}{s_i\Delta^2_{i,(s_i)}}}\;\ge\;\frac{1}{\max_{i\ne i^*}\frac{K}{s_i\Delta^2_{i,(s_i)}}}\;=\;\frac{1}{H^{(1)}_{\mathrm{explore}}(s)}.
\]
Plugging these two relations into (52) and then (49) yields, for a numerical constant $\bar{c}_3$,
\[
\mathbb{P}\big(|\{\alpha\in F_k: S_k(\alpha)\ge\tfrac12\bar{\Delta}_k\}|\ge\lceil|F_k|/3\rceil\big)\;\le\;\exp\Big(-\frac{\bar{c}_3\,T}{\log^3(K)\log(T)\,H^{(1)}_{\mathrm{explore}}(s)}\Big),
\]
as soon as $T\ge c'H^{(0)}_{\mathrm{explore}}(s)\log^5(H^{(0)}_{\mathrm{explore}}(s))$. Reintroducing the shift (to cover smaller $T$) gives the stated bound in Lemma D.5.

Proof of Lemma D.6. Fix $\alpha\in F^{(3/4)}_k$ and let $\alpha^{(s)}$ denote the strong opponent chosen for $\alpha$. We have
\[
\mathbb{P}\big(S_k(\alpha)\ge\tfrac12\bar{\Delta}_k\big)=\mathbb{P}\big(\min\{Z^{(s)}_k(\alpha),0\}+Z^{(w)}_k(\alpha)\ge\tfrac12\bar{\Delta}_k\big)\le\mathbb{P}\big(Z^{(s)}_k(\alpha)+Z^{(w)}_k(\alpha)\ge\tfrac12\bar{\Delta}_k\big)
\]
\[
\le\;\mathbb{P}\big(Z^{(s)}_k(\alpha)-\Delta_{\alpha,\alpha^{(s)}}\ge-\tfrac14\bar{\Delta}_k\big)+\mathbb{P}\big(\Delta_{\alpha,\alpha^{(s)}}\ge\tfrac78\bar{\Delta}_k\big)+\mathbb{P}\big(Z^{(w)}_k(\alpha)\ge-\tfrac18\bar{\Delta}_k\big). \tag{54}
\]
Using Hoeffding's concentration inequality with a union bound over the possible choices of $\alpha^{(s)}$, we have
\[
\mathbb{P}\big(Z^{(s)}_k(\alpha)-\Delta_{\alpha,\alpha^{(s)}}\ge-\tfrac14\bar{\Delta}_k\big)\;\le\;(K-1)\exp\Big(-\frac{\bar{\Delta}_k^2\,B_k}{32}\Big). \tag{55}
\]
Since $\alpha\in F_k\subset E_k$, we have $\Delta^{(k)}_{\alpha,(\lceil|A_k|/4\rceil)}\le 0$.
Therefore, by Corollary C.2, we get
\[
\mathbb{P}\big(Z^{(w)}_k(\alpha)\ge-\tfrac18\bar{\Delta}_k\big)\;\le\;\mathbb{P}\big(Z^{(w)}_k(\alpha)\ge\Delta^{(k)}_{\alpha,(\lceil|A_k|/4\rceil)}-\tfrac18\bar{\Delta}_k\big)\;\le\;\log(T)\exp\Big(-\frac{c\,\bar{\Delta}_k^2}{64\log(\lceil B_k/2\rceil)}\Big\lceil\frac{B_k}{2}\Big\rceil\Big)\;\le\;\log(T)\exp\Big(-\frac{c\,\bar{\Delta}_k^2\,B_k}{128\log(B_k)}\Big). \tag{56}
\]
Since $\alpha\in F_k$, by the definition of $F_k$ given in (41), we have $\Delta_{\alpha,(s_\alpha)}\le\bar{\Delta}_k\le 0$. Therefore
\[
\mathbb{P}\big(\Delta_{\alpha,\alpha^{(s)}}\ge\tfrac78\bar{\Delta}_k\big)\;\le\;\mathbb{P}\big(\Delta_{\alpha,\alpha^{(s)}}\ge\tfrac78\Delta_{\alpha,(s_\alpha)}\big).
\]
Then, using Theorem C.5, we have
\[
\mathbb{P}\big(\Delta_{\alpha,\alpha^{(s)}}\ge\tfrac78\bar{\Delta}_k\big)\;\le\;\mathbb{P}\big(\Delta_{\alpha,\alpha^{(s)}}\ge\Delta_{\alpha,(s_\alpha)}-\tfrac18\Delta_{\alpha,(s_\alpha)}\big)\;\le\;\exp\Big(-\frac{c\,s_\alpha\Delta^2_{\alpha,(s_\alpha)}}{64\,K\log^3(K)}\Big\lceil\frac{B_k}{4}\Big\rceil\Big)\;\le\;\exp\Big(-\frac{c}{256}\cdot\frac{s_\alpha\Delta^2_{\alpha,(s_\alpha)}}{K\log^3(K)}\,B_k\Big). \tag{57}
\]
We conclude by plugging the bounds (55), (56) and (57) into (54):
\[
\mathbb{P}\big(S_k(\alpha)\ge\tfrac12\bar{\Delta}_k\big)\;\le\;(K+\log(T))\exp\Big(-c'\min\Big\{\frac{\Gamma_\alpha}{K\log^3(K)},\frac{\bar{\Delta}_k^2}{\log(B_k)}\Big\}B_k\Big),
\]
where $c':=\frac{c}{1024}$. It now remains to prove that
\[
\min\Big\{\frac{\Gamma_{F_k:\lceil|F_k|/4\rceil}}{K\log^3(K)},\frac{\bar{\Delta}_k^2}{\log(B_k)}\Big\}B_k\;\ge\;\frac{T}{736\log^3(K)\log(T)\sum_{i\in E_k}\frac{K}{\Gamma_i}}.
\]
Recall that Lemma H.2 gives
\[
\Big\lceil\frac{|F_k|}{4}\Big\rceil\frac{1}{\Gamma_{F_k:\lceil\frac14|F_k|\rceil}}\;\le\;\sum_{i\in F_k}\frac{1}{\Gamma_i}\;\le\;\sum_{i\in E_k}\frac{1}{\Gamma_i}.
\]
Therefore
\[
\Gamma_{F_k:\lceil\frac14|F_k|\rceil}\;\ge\;\frac{|F_k|}{4\sum_{i\in E_k}\frac{1}{\Gamma_i}}.
\]
Hence, using the bound above and the definition of $B_k$, we obtain
\[
\frac{\Gamma_{F_k:\lceil\frac14|F_k|\rceil}}{K\log^3(K)}B_k\;\ge\;\frac{|F_k|}{4\sum_{i\in E_k}\frac{1}{\Gamma_i}}\cdot\frac{1}{K\log^3(K)}\cdot\frac{T}{2|A_k|\log_{8/7}(K)}\;=\;\frac{T}{8\log^3(K)\log_{8/7}(K)}\cdot\frac{1}{\sum_{i\in E_k}\frac{K}{\Gamma_i}}\cdot\frac{|F_k|}{|A_k|}\;\ge\;\frac{T}{138\log^4(K)\sum_{i\in E_k}\frac{K}{\Gamma_i}}\cdot\frac{|F_k|}{|A_k|}.
\]
Recall that $|F_k|\ge\lceil\frac78|E_k|\rceil\ge\lceil\frac{3}{16}|A_k|\rceil$. Therefore the bound above gives
\[
\frac{\Gamma_{F_k:\lceil\frac14|F_k|\rceil}}{K\log^3(K)}B_k\;\ge\;\frac{T}{736\log^4(K)\sum_{i\in E_k}\frac{K}{\Gamma_i}}. \tag{58}
\]
Moreover, we have
\[
|A_k|\cdot\frac{1}{\bar{\Delta}_k^2}\;\le\;4|E_k|\cdot\frac{1}{\bar{\Delta}_k^2}\;\le\;32\Big\lceil\frac{|E_k|}{8}\Big\rceil\frac{1}{\Delta^2_{E_k:\lceil\frac78|E_k|\rceil}}\;\le\;32\sum_{\alpha\in E_k}\frac{1}{\Delta^2_{\alpha,(s_\alpha)}}, \tag{59}
\]
where we used Lemma H.2 again in the second inequality.
Therefore, we have
\[
\frac{\bar{\Delta}_k^2}{\log(B_k)}B_k\;\ge\;\frac{\bar{\Delta}_k^2}{\log(B_k)}\cdot\frac{T}{2|A_k|\log_{8/7}(K)}\;\ge\;\frac{1}{\sum_{\alpha\in E_k}\frac{1}{\Delta^2_{\alpha,(s_\alpha)}}}\cdot\frac{T}{64\log(B_k)\log_{8/7}(K)}. \tag{60}
\]
Therefore, combining (58) and (60), we get
\[
\mathbb{P}\big(S_k(\alpha)\ge\tfrac12\bar{\Delta}_k\big)\;\le\;\ell\exp\Big(-c'\min\Big\{\frac{\Gamma_{F_k:\lceil\frac14|F_k|\rceil}}{K\log^3(K)},\frac{\bar{\Delta}_k^2}{\log(B_k)}\Big\}B_k\Big)\;\le\;\ell\exp\Big(-\frac{c'}{736}\min\Big\{\frac{1}{\log^3(K)\sum_{i\in E_k}\frac{K}{\Gamma_i}},\frac{1}{\log(T)\sum_{\alpha\in E_k}\frac{1}{\Delta^2_{\alpha,(s_\alpha)}}}\Big\}\frac{T}{\log(K)}\Big)\;\le\;\ell\exp\Big(-\frac{c'}{736}\cdot\frac{T}{\log^3(K)\log(T)\sum_{i\in E_k}\frac{K}{\Gamma_i}}\Big), \tag{61}
\]
where we used $\log(T)\ge\log(K)$ in the last line, and the prefactor is $\ell=(\log(T)+K)$.

For the remainder of this proof, we denote $H:=H^{(0)}_{\mathrm{explore}}(s)$. Let us prove the last claim. Let $c'':=10^3\vee\frac{960}{c'}\log^2\big(\frac{960}{c'}\big)$, which implies that $c'\ge\frac{960\log^2(c'')}{c''}$. The function
\[
T\mapsto(\log(T)+K)\exp\Big(-\frac{c'}{736}\cdot\frac{T}{H\log^3(K)\log(T)}\Big)
\]
is non-increasing on the interval $[c''H\log^5(H),+\infty)$. Therefore, using (61),
\[
\mathbb{P}\big(S_k(\alpha)\ge\tfrac12\bar{\Delta}_k\big)\;\le\;\big(\log(c''H\log^5(H))+K\big)\exp\Big(-\frac{c'}{736}\cdot\frac{c''H\log^5(H)}{H\log^3(K)\log(c''H\log^5(H))}\Big)\;\le\;\big(\log(c''H\log^5(H))+H\big)\exp\Big(-\frac{c'c''}{736}\cdot\frac{\log^2(H)}{\log(c''H\log^5(H))}\Big)\;\le\;\big(\log(c''H\log^5(H))+H\big)\exp\Big(-\frac{4\log^2(c'')\log^2(H)}{\log(c''H\log^5(H))}\Big), \tag{62}
\]
where in the second inequality we used the facts that $H\ge K\min_{i,j}\Delta^{-2}_{i,j}\ge 4K$ (since $|\Delta_{i,j}|\le\frac12$) and that $c'\ge\frac{960\log^2(c'')}{c''}$ by definition of $c''$. Next, we show that
\[
\frac{2\log^2(c'')\log^2(H)}{\log(c''H\log^5(H))}\;\ge\;\log(c''H);
\]
this bound is derived by studying the variations of a function and using $c''\ge 10^3$ by definition and $H\ge 4K\ge 8$; the proof is deferred to Lemma H.8 in Section H.
Combining the bound above with (62), we obtain
\[
\mathbb{P}\big(S_k(\alpha)\ge\tfrac12\bar{\Delta}_k\big)\;\le\;\big(\log(c''H\log^5(H))+H\big)\exp(-2\log(c''H))\;\le\;\frac{\log(c''H\log^5(H))+H}{(c''H)^2}\;\le\;\frac{1}{18},
\]
where we used $100\log(c''H\log^5(H))\le c''H^2$ and $36H\le(c''H)^2$, given that $H\ge 8$ and $c''\ge 10^3$.

E Proof of Theorem 2.2

This routine, presented in Algorithm 4, serves as one of the two certification sub-procedures in the fixed-confidence algorithm. Given a confidence level $\delta$, a query budget $T$, and a candidate Condorcet winner $I$, it sequentially tests whether one can certify, using at most $T$ comparisons, that all pairwise gaps $(\Delta_{I,i})_{i\ne I}$ are positive with probability at least $1-\delta$. The budget is allocated uniformly across these gaps, and the procedure terminates as soon as either (i) a negative gap is detected, (ii) all gaps are certified positive, or (iii) the budget $T$ is exhausted.

Algorithm 4: Test-CW
Input: $I\in[K]$, $\delta$, $T$.
Initialize: $C=[K]\setminus\{I\}$, empirical means $\tilde{\Delta}_{I,j}=0$ for $j\in C$, count variable $t\leftarrow 1$. Let $n\leftarrow\log_2\big(\frac{T}{4K\log_{8/7}(K)}\big)$. $N_{I,j}\leftarrow 0$ for all $j\in[K]$. // query counts
while $C\ne\emptyset$ and $t\le T$ do
    Sample the duel $(I,j)$ for $j\in\arg\min_{j\in C}N_{I,j}$ and update the corresponding empirical mean.
    $N_{I,j}\leftarrow N_{I,j}+1$, $t\leftarrow t+1$.
    /* Check the sign of the gaps using concentration */
    for $j\in C$ do
        if $\tilde{\Delta}_{I,j}\ge\sqrt{\log\big(\frac{K N_{I,j}^2\,n(n+1)}{\delta}\big)/N_{I,j}}$ then $C\leftarrow C\setminus\{j\}$
        else if $\tilde{\Delta}_{I,j}\le-\sqrt{\log\big(\frac{K N_{I,j}^2\,n(n+1)}{\delta}\big)/N_{I,j}}$ then break
        end if
    end for
end while
if $C=\emptyset$ then return True else return False end if

E.1 Proof of $\delta$-correctness

Let $c_0$ denote the absolute numerical constant corresponding to the one appearing in the upper bound of Corollary C.2. Theorem 2.2 states that Algorithm 2, run with input $\delta\in(0,1)$ and $c\ge 2/c_0$, outputs an arm different from the CW with probability at most $\delta$. Let $\psi_\delta$ denote the output of Algorithm 2 when the input is $\delta$.
We will prove that $\mathbb{P}(\psi_\delta\ne i^*)\le\delta$. To prove this claim, we introduce the following notation. In the $n$-th iteration (i.e., the $n$-th call to Algorithm 1), denote by $\bar{\alpha}^{(n)},\varphi^{(n)}_1,\varphi^{(n)}_2,I^{(n)}$ and $T^{(n)}$ the corresponding values of $\bar{\alpha},\varphi_1,\varphi_2,I$ and $T$, and let $\phi^{(n)}:=\varphi^{(n)}_1\vee\varphi^{(n)}_2$. For convenience define, for all $n\ge 1$,
\[
\delta_n:=\frac{\delta}{8K^2\log_{8/7}(K)\log(T^{(n)})\,n(n+1)}.
\]
Let $S^{(n)}_k(\cdot)$ denote the score used at round $k$ within the $n$-th call to Algorithm 1, and let $Z^{(w,n)}_k(\cdot)$ denote its weak component. Recall that each call to Algorithm 1 has at most $k_{\max}\le\lceil\log_{8/7}(K)\rceil$ rounds. Finally, in the $n$-th call to Test-CW (Algorithm 4) with inputs $(I^{(n)},\delta,T^{(n)})$, let $\tilde{\Delta}_{I^{(n)},j}$ denote the final empirical estimate of $\Delta_{I^{(n)},j}$ for each $j\ne I^{(n)}$.

If $\psi_\delta\ne i^*$, then for some $n\ge 1$ the algorithm must have certified an incorrect candidate, namely $\{I^{(n)}\ne i^*\}$ and $\{\phi^{(n)}=\mathrm{True}\}$. Hence, by a union bound,
\[
\mathbb{P}(\psi_\delta\ne i^*)\;\le\;\mathbb{P}\big(\exists n\ge 1: I^{(n)}\ne i^*,\ \phi^{(n)}=\mathrm{True}\big)\;\le\;\sum_{n=1}^{\infty}\mathbb{P}\big(I^{(n)}\ne i^*,\ \varphi^{(n)}_1=\mathrm{True}\big)+\sum_{n=1}^{\infty}\mathbb{P}\big(I^{(n)}\ne i^*,\ \varphi^{(n)}_2=\mathrm{True}\big). \tag{63}
\]
We first bound the contribution of $\varphi_1$. On the event $\{I^{(n)}\ne i^*,\varphi^{(n)}_1=\mathrm{True}\}$, during the $n$-th run of Algorithm 1 the true Condorcet winner $i^*$ must have been eliminated at some round $k<k_{\max}$ (otherwise the procedure would return $I^{(n)}=i^*$). By the definition of the selection $\bar{\alpha}^{(n)}$ and the condition $\varphi^{(n)}_1=\mathrm{True}$, this implies that for some $k<k_{\max}$,
\[
S^{(n)}_k(i^*)<-\sqrt{\frac{2c\log(T^{(n)})\log(1/\delta_n)}{\lceil B^{(n)}_k/4\rceil}}.
\]
Using $S^{(n)}_k(i^*)=\min\{\hat{\Delta}^{(k,n)}_{i^*,u},0\}+Z^{(w,n)}_k(i^*)$ (for the opponent $u$ queried at that round) and the fact that $\min\{x,0\}+y<-\eta$ implies $(x<-\eta/2)$ or $(y<-\eta/2)$, we get
\[
\mathbb{P}\big(I^{(n)}\ne i^*,\ \varphi^{(n)}_1=\mathrm{True}\big)\;\le\;\sum_{k<k_{\max}}\Big[\mathbb{P}\Big(\hat{\Delta}^{(k,n)}_{i^*,u}<-\sqrt{\tfrac{c\log(T^{(n)})\log(1/\delta_n)}{2\lceil B^{(n)}_k/4\rceil}}\Big)+\mathbb{P}\Big(Z^{(w,n)}_k(i^*)<-\sqrt{\tfrac{c\log(T^{(n)})\log(1/\delta_n)}{2\lceil B^{(n)}_k/4\rceil}}\Big)\Big]. \tag{64}
\]
Since $\Delta_{i^*,u}>0$ for all $u\ne i^*$, we can center and apply Hoeffding's inequality: for $N=\lceil B^{(n)}_k/4\rceil$,
\[
\mathbb{P}\Big(\hat{\Delta}^{(k,n)}_{i^*,u}<-\sqrt{\tfrac{c\log(T^{(n)})\log(1/\delta_n)}{2N}}\Big)\;\le\;\mathbb{P}\Big(\hat{\Delta}^{(k,n)}_{i^*,u}-\Delta_{i^*,u}<-\sqrt{\tfrac{\log(1/\delta_n)}{2N}}\Big)\;\le\;\exp\Big(-2N\cdot\frac{\log(1/\delta_n)}{2N}\Big)\;\le\;\delta_n. \tag{65}
\]
For the second term in (64), Corollary C.2 (with constant $c_0$) gives
\[
\mathbb{P}\Big(Z^{(w,n)}_k(i^*)<-\sqrt{\tfrac{c\log(T^{(n)})\log(1/\delta_n)}{2\lceil B^{(n)}_k/4\rceil}}\Big)\;\le\;\mathbb{P}\Big(Z^{(w,n)}_k(i^*)<\Delta^{(k)}_{i^*,(\lceil|A_k|/8\rceil)}-\sqrt{\tfrac{c\log(T^{(n)})\log(1/\delta_n)}{2\lceil B^{(n)}_k/4\rceil}}\Big)\;\le\;\log\Big(\Big\lceil\frac{B^{(n)}_k}{2}\Big\rceil\Big)\exp\Big(-\frac{c_0\,c\log(T^{(n)})\log(1/\delta_n)}{2\log(\lceil B^{(n)}_k/2\rceil)}\Big)\;\le\;\log\Big(\Big\lceil\frac{B^{(n)}_k}{2}\Big\rceil\Big)\delta_n, \tag{66}
\]
where we used $c\ge 2/c_0$ and $B^{(n)}_k\le T^{(n)}$. Plugging (65) and (66) into (64), summing over $k<k_{\max}$, and using $\log(\lceil B^{(n)}_k/2\rceil)\le\log(T^{(n)})$ and $k_{\max}\le\lceil\log_{8/7}(K)\rceil$, we obtain
\[
\sum_{n=1}^{\infty}\mathbb{P}\big(I^{(n)}\ne i^*,\ \varphi^{(n)}_1=\mathrm{True}\big)\;\le\;\sum_{n=1}^{\infty}\big(k_{\max}(K-1)\delta_n+k_{\max}\log(T^{(n)})\delta_n\big)\;\le\;\sum_{n=1}^{\infty}\frac{\delta}{2n(n+1)}\;\le\;\frac{\delta}{2}. \tag{67}
\]
We now bound the contribution of $\varphi_2$ in (63). On the event $\{I^{(n)}\ne i^*,\varphi^{(n)}_2=\mathrm{True}\}$, the $n$-th call to Test-CW returns True although $I^{(n)}$ is not the Condorcet winner. In particular, for some $N\ge 1$ the test must have accepted the comparison against $i^*$, meaning that
\[
\tilde{\Delta}_{I^{(n)},i^*}>\sqrt{\frac{\log\big(\frac{KN^2 n(n+1)}{\delta}\big)}{N}}.
\]
Hence, by a union bound over $n\ge 1$, $N\ge 1$ and all $i\ne i^*$, and since $\Delta_{i,i^*}\le 0$ when $i\ne i^*$,
\[
\sum_{n=1}^{\infty}\mathbb{P}\big(I^{(n)}\ne i^*,\ \varphi^{(n)}_2=\mathrm{True}\big)\;\le\;\sum_{n=1}^{\infty}\sum_{N=1}^{\infty}\sum_{i\ne i^*}\mathbb{P}\Big(\tilde{\Delta}_{i,i^*}>\sqrt{\tfrac{\log(\frac{KN^2n(n+1)}{\delta})}{N}}\Big)\;\le\;\sum_{n=1}^{\infty}\sum_{N=1}^{\infty}\sum_{i\ne i^*}\mathbb{P}\Big(\tilde{\Delta}_{i,i^*}-\Delta_{i,i^*}>\sqrt{\tfrac{\log(\frac{KN^2n(n+1)}{\delta})}{N}}\Big)\;\le\;\sum_{n=1}^{\infty}\sum_{N=1}^{\infty}\sum_{i\ne i^*}\frac{\delta}{4KN^2n(n+1)}\;\le\;\frac{\delta}{2}, \tag{68}
\]
where the last inequality follows from Hoeffding's inequality and $\sum_{N\ge 1}1/N^2\le 2$. Finally, combining (63) with (67) and (68) yields $\mathbb{P}(\psi_\delta\ne i^*)\le\frac{\delta}{2}+\frac{\delta}{2}=\delta$, which concludes the proof.

E.2 Proof of Theorem 2.2 (sample complexity statement)

We build on the guarantees established for Algorithm 1 (fixed-budget elimination with certification) to prove the second statement of Theorem 2.2. We use the following notation: for each $i\ne i^*$, $\Delta_{i,(1)}\le\cdots\le\Delta_{i,(K-1)}$ denotes the ordered list of gaps $(\Delta_{i,j})_{j\ne i}$, and $K_{i;<0}:=|\{j:\Delta_{i,j}<0\}|$. Fix any vector $s=(s_1,\dots,s_K)$ such that $s_i\le K_{i;<0}$ for all $i\ne i^*$ (and $K_{i^*;<0}=0$ by convention), and recall
\[
H_{\mathrm{certify}}(s)=\sum_{i\ne i^*}\frac{1}{\Delta^2_{i,(s_i)}},\qquad H^{(1)}_{\mathrm{explore}}(s)=\max_{i\ne i^*}\frac{K}{s_i\Delta^2_{i,(s_i)}},\qquad H^{(0)}_{\mathrm{explore}}(s)=\sum_{i\ne i^*}\frac{K}{s_i\Delta^2_{i,(s_i)}}.
\]
Let $c_1$ be the numerical constant in Theorem D.1 and let $c_2,c_3$ be the numerical constants in Lemma D.5. Let $c_0$ be the numerical constant in Corollary C.2, and assume $c\ge 2/c_0$ as in the statement of Theorem 2.2. For concision, define
\[
G_{1,\delta}:=\frac{32}{c_1}H_{\mathrm{cw}}\log(K)\log\Big(\frac{32KH_{\mathrm{cw}}}{c_1\delta}\Big)\log\big(c_1^{-1}H_{\mathrm{cw}}\log(K/\delta)\big),
\]
\[
G_{2,\delta}:=\frac{512c}{c_2}H_{\mathrm{certify}}(s)\log^3(K)\log\Big(\frac{32cKH^{(0)}_{\mathrm{explore}}(s)}{c_2\delta}\Big),
\]
\[
G_{3,\delta}:=\frac{32c}{c_2}H^{(1)}_{\mathrm{explore}}(s)\log^3(K)\log\Big(\frac{2Kk_{\max}}{\delta}\Big)\log\Big(\frac{32c}{c_2}H^{(1)}_{\mathrm{explore}}(s)\log^3(K)\log\Big(\frac{2Kk_{\max}}{\delta}\Big)\Big),
\]
\[
G_0:=\frac{2c_3}{c_2}H^{(0)}_{\mathrm{explore}}(s)\log^5\Big(\frac{2c_3}{c_2}H^{(0)}_{\mathrm{explore}}(s)\Big).
\]
Algorithm 2 doubles the budget parameter $T$ at each unsuccessful iteration; therefore, if one can show that whenever $T\in[M_\delta,2M_\delta]$, where
\[
M_\delta:=\min\big\{G_{1,\delta},\;G_{2,\delta}+G_{3,\delta}+G_0\big\}, \tag{69}
\]
the call to Algorithm 1 with inputs $(\delta,T,c)$ returns $\varphi_1\vee\varphi_2=\mathrm{True}$ with probability at least $1-6\delta$, then it follows that the total number of queries $N_\delta$ used by Algorithm 2 is at most a universal constant multiple of $M_\delta$ with probability at least $1-6\delta$ (since the sum of a doubling schedule up to the first successful budget is at most twice that budget). We now verify this success probability for any $T$ satisfying (69), distinguishing two regimes depending on which term attains the minimum.

Regime 1: $G_{1,\delta}\le G_{2,\delta}+G_{3,\delta}+G_0$. We then focus on the regime $T\in[G_{1,\delta},2G_{1,\delta}]$, where we certify correctness through the fixed-budget guarantee of Theorem D.1. Indeed, for $T\in[G_{1,\delta},2G_{1,\delta}]$,
\[
\mathbb{P}(I\ne i^*)\;\le\;27K\log(K)\log(T)\exp\Big(-\frac{c_1 T}{\log(T)\log(K)H_{\mathrm{cw}}}\Big)\;\le\;27K\log(K)\log(2G_{1,\delta})\exp\Big(-\frac{c_1 G_{1,\delta}}{\log(2G_{1,\delta})\log(K)H_{\mathrm{cw}}}\Big). \tag{70}
\]
We now use the explicit definition of $G_{1,\delta}$ and the crude upper bound
\[
\log(2G_{1,\delta})\;\le\;16\log\big(c_1^{-1}H_{\mathrm{cw}}\log(K/\delta)\big), \tag{71}
\]
which results from the expression of $G_{1,\delta}$, $K,H_{\mathrm{cw}}\ge 2$ and $\delta\in(0,1/6)$. Using the bound (71) and plugging it back into (70), we obtain
\[
\mathbb{P}(I\ne i^*)\;\le\;432\,K\log(K)\log\big(c_1^{-1}H_{\mathrm{cw}}\log(K/\delta)\big)\exp\big(-2\log(32c_1^{-1}KH_{\mathrm{cw}}/\delta)\big)\;\le\;432\,K\log(K)\log\big(c_1^{-1}H_{\mathrm{cw}}\log(K/\delta)\big)\cdot\frac{\delta^2}{(32c_1^{-1}KH_{\mathrm{cw}})^2}\;\le\;\delta,
\]
where in the last step we used $K,H_{\mathrm{cw}}\ge 2$, $\delta\in(0,1/6)$ and $c_1\in(0,1)$. It remains to argue that, conditional on $I=i^*$, the auxiliary certification Test-CW (Algorithm 4) returns True with probability at least $1-2\delta$, hence overall $\mathbb{P}(\varphi_1\vee\varphi_2=\mathrm{True})\ge 1-3\delta$ in this regime.
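The doubling argument invoked above is the standard geometric-sum fact: if budgets $T_0,2T_0,4T_0,\dots$ are tried and the schedule first succeeds at budget $T_{\mathrm{fin}}$, the total spend $T_0+2T_0+\cdots+T_{\mathrm{fin}}$ is strictly below $2T_{\mathrm{fin}}$. A minimal numerical illustration (names are ours):

```python
def doubling_spend(T0: int, success_at: int):
    """Total queries of a doubling schedule T0, 2*T0, 4*T0, ... stopping at
    the first budget >= success_at; returns (total_spend, final_budget)."""
    total, T = 0, T0
    while T < success_at:
        total += T
        T *= 2
    return total + T, T

# total spend is always < 2x the final (successful) budget
ok_doubling = all(
    doubling_spend(T0, s)[0] < 2 * doubling_spend(T0, s)[1]
    for T0 in (1, 3, 10) for s in range(1, 2000)
)
```

Indeed $T_0(1+2+\cdots+2^m)=T_0(2^{m+1}-1)<2\cdot T_0 2^m$, which is exactly the constant absorbed into the "universal constant multiple of $M_\delta$" above.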
Run Test-CW with inputs $(i^*,\delta,T)$ and let $T_i$ be the number of comparisons allocated to the pair $(i^*,i)$. By construction, $\sum_{i\ne i^*}T_i\le T$. If Test-CW returns False, then either it exhausted the budget without eliminating all opponents, or it triggered a negative-deviation stopping rule. Formally, define
\[
E_1:=\Big\{\sum_{i\ne i^*}T_i=T\Big\},\qquad E_2:=\Big\{\exists j\ne i^*,\ \exists N\ge 1:\ \tilde{\Delta}_{i^*,j}(N)\le-\sqrt{\tfrac{\log(\frac{KN^2n(n+1)}{\delta})}{N}}\Big\},
\]
so that $\{\varphi_2=\mathrm{False}\}\subseteq E_1\cup E_2$. Since $\Delta_{i^*,j}\ge 0$ for all $j\ne i^*$, Hoeffding's inequality and a union bound give
\[
\mathbb{P}(E_2)\;\le\;\sum_{j\ne i^*}\sum_{N\ge 1}\mathbb{P}\Big(\tilde{\Delta}_{i^*,j}(N)-\Delta_{i^*,j}\le-\sqrt{\tfrac{\log(\frac{KN^2n(n+1)}{\delta})}{N}}\Big)\;\le\;\sum_{j\ne i^*}\sum_{N\ge 1}\frac{\delta}{Kn(n+1)N^2}\;\le\;\delta, \tag{72}
\]
where in the last step we used the fact that $n(n+1)\ge 2\ge\pi^2/6$. Next, for each $i\ne i^*$ define
\[
\bar{T}_i:=\frac{16}{\Delta^2_{i^*,i}}\log\Big(\frac{32Kn(n+1)}{\delta\,\Delta^2_{i^*,i}}\Big).
\]
Lemma E.1 below ensures that $\sum_{i\ne i^*}\bar{T}_i<G_{1,\delta}\le T$, hence
\[
\mathbb{P}(E_1)=\mathbb{P}\Big(\sum_{i\ne i^*}T_i=T\Big)\;\le\;\mathbb{P}\Big(\sum_{i\ne i^*}T_i\ge G_{1,\delta}\Big)\;\le\;\mathbb{P}\big(\exists i\ne i^*: T_i>\bar{T}_i\big)\;\le\;\sum_{i\ne i^*}\mathbb{P}(T_i>\bar{T}_i). \tag{73}
\]
If $T_i>\bar{T}_i$, then at time $N=\bar{T}_i$ the arm $i$ was not eliminated, meaning
\[
\tilde{\Delta}_{i^*,i}(\bar{T}_i)<\sqrt{\frac{\log\big(\frac{K\bar{T}_i^2 n(n+1)}{\delta}\big)}{\bar{T}_i}}.
\]
By Lemma E.1, the right-hand side is at most $\Delta_{i^*,i}-\sqrt{\log\big(\frac{K\bar{T}_i^2n(n+1)}{\delta}\big)/(2\bar{T}_i)}$, hence
\[
\mathbb{P}(T_i>\bar{T}_i)\;\le\;\mathbb{P}\Big(\tilde{\Delta}_{i^*,i}(\bar{T}_i)-\Delta_{i^*,i}<-\sqrt{\tfrac{\log(\frac{K\bar{T}_i^2n(n+1)}{\delta})}{2\bar{T}_i}}\Big)\;\le\;\sum_{N\ge 1}\frac{\delta}{Kn(n+1)N^2}\;\le\;\frac{\delta}{K}, \tag{74}
\]
where the last step uses Hoeffding and a union bound over $N\ge 1$. Combining (73) and (74) yields $\mathbb{P}(E_1)\le\delta$. Together with (72), we obtain $\mathbb{P}(\varphi_2=\mathrm{False})\le 2\delta$, hence $\mathbb{P}(\varphi_2=\mathrm{True})\ge 1-2\delta$ when $I=i^*$. This completes Regime 1.

Regime 2: $G_{1,\delta}>G_{2,\delta}+G_{3,\delta}+G_0$. Then $T\in[G_{2,\delta}+G_{3,\delta}+G_0,\,2(G_{2,\delta}+G_{3,\delta}+G_0)]$. We show that, for such $T$, the certification variable $\varphi_1$ in Algorithm 1 remains True with probability at least $1-2\delta$.
Let $\bar{\alpha}_k$ be the arm ranked $|A_k|-\lceil|A_k|/8\rceil+1$ at round $k$ according to the scores $S_k(\cdot)$. Define
\[
L_{k,\delta}:=\sqrt{\frac{2c\log(T)}{\lceil B_k/4\rceil}\log\Big(\frac{1}{\delta_{n,K}}\Big)},\qquad \delta_{n,K}:=\frac{\delta}{8K^2\log_{8/7}(K)\log(T)\,n(n+1)}.
\]
By the update rule for $\varphi_1$, the event $\{\varphi_1=\mathrm{False}\}$ implies that for some $k\le k_{\max}$, $S_k(\bar{\alpha}_k)\ge-L_{k,\delta}$. The definition of $\bar{\alpha}_k$ entails that at most $\lceil|A_k|/8\rceil$ arms have score not larger than $-L_{k,\delta}$, i.e.
\[
\{S_k(\bar{\alpha}_k)\ge-L_{k,\delta}\}\;\subseteq\;\Big\{\sum_{\alpha\in A_k}\mathbf{1}\big(S_k(\alpha)<-L_{k,\delta}\big)\le\Big\lceil\frac{|A_k|}{8}\Big\rceil\Big\}. \tag{75}
\]
We now relate $-L_{k,\delta}$ to the threshold $\frac12\bar{\Delta}_k$ used in Lemma D.5. Recall the definitions (as in the proof of Theorem D.3): let
\[
E_k:=\big\{\alpha\in A_k:\Delta^{(k)}_{\alpha,(\lceil|A_k|/4\rceil)}\le 0\big\},\qquad |E_k|\ge\lceil|A_k|/4\rceil,
\]
and define the $7/8$-quantile $\bar{\Delta}_k:=\Delta_{E_k:\lceil(7/8)|E_k|\rceil}\le 0$ and the subset $F_k:=\{\alpha\in E_k:\Delta_{\alpha,(s_\alpha)}\le\bar{\Delta}_k\}$. Recall that by definition we have $\bar{\Delta}_k\le 0$. Moreover, $\bar{\Delta}_k\to 0$ implies that $H_{\mathrm{certify}}(s)\to\infty$, and the resulting bound on the choice of $s$ is vacuous in that case; we therefore suppose that $\bar{\Delta}_k<0$. Lemma E.2 ensures that for all $k\le k_{\max}$ and all $T$ in the present regime,
\[
-L_{k,\delta}\;\ge\;\frac12\bar{\Delta}_k. \tag{76}
\]
We assume that $|A_k|\ge 3$; the case $|A_k|=2$ is treated at the end. Using (76) inside (75) gives
\[
\mathbb{P}(S_k(\bar{\alpha}_k)\ge-L_{k,\delta})\;\le\;\mathbb{P}\Big(\sum_{\alpha\in A_k}\mathbf{1}\big(S_k(\alpha)<\tfrac12\bar{\Delta}_k\big)\le\Big\lceil\frac{|A_k|}{8}\Big\rceil\Big)\;=\;\mathbb{P}\Big(\sum_{\alpha\in A_k}\mathbf{1}\big(S_k(\alpha)\ge\tfrac12\bar{\Delta}_k\big)\ge|A_k|-\Big\lceil\frac{|A_k|}{8}\Big\rceil\Big). \tag{77}
\]
Since $|A_k|-\lceil|A_k|/8\rceil=|A_{k+1}|$ and Lemma D.4 gives $|A_{k+1}\cap F_k|\ge\lceil|F_k|/3\rceil$, the right-hand side of (77) is upper bounded by
\[
\mathbb{P}\Big(\sum_{\alpha\in F_k}\mathbf{1}\big(S_k(\alpha)\ge\tfrac12\bar{\Delta}_k\big)\ge\Big\lceil\frac{|F_k|}{3}\Big\rceil\Big).
\]
Lemma D.5 then yields, for a numerical constant $c_2>0$,
\[
\mathbb{P}(S_k(\bar{\alpha}_k)\ge-L_{k,\delta})\;\le\;\exp\Big(-c_2\,\frac{T-c_3H^{(0)}_{\mathrm{explore}}(s)\log^5(H^{(0)}_{\mathrm{explore}}(s))}{\log^3(K)\log(T)}\cdot\frac{1}{H^{(1)}_{\mathrm{explore}}(s)}\Big). \tag{78}
\]
Next, we use $T\ge G_{3,\delta}+G_0$ with Lemma E.3, which turns the last inequality into a bound on the exponent of (78), leading to
\[
\mathbb{P}(S_k(\bar{\alpha}_k)\ge-L_{k,\delta})\;\le\;\frac{\delta}{2Kk_{\max}}, \tag{79}
\]
which is the desired per-round bound. Suppose now that $|A_k|=2$; then $E_k=F_k=:\{\alpha\}$. Therefore
\[
\mathbb{P}(S_k(\bar{\alpha}_k)\ge-L_{k,\delta})\;\le\;\mathbb{P}\big(S_k(\alpha)\ge\tfrac12\bar{\Delta}_k\big)\;\le\;\exp\Big(-c_2\,\frac{T-c_3H^{(0)}_{\mathrm{explore}}(s)\log^5(H^{(0)}_{\mathrm{explore}}(s))}{\log^3(K)\log(T)}\cdot\frac{1}{H^{(1)}_{\mathrm{explore}}(s)}\Big)\;\le\;\frac{\delta}{2Kk_{\max}},
\]
where in the second inequality we used Lemma D.6 (which provides a smaller upper bound than the one given above). Finally, since $\{\varphi_1=\mathrm{False}\}\subseteq\bigcup_{k\le k_{\max}}\{S_k(\bar{\alpha}_k)\ge-L_{k,\delta}\}$, a union bound and (79) yield
\[
\mathbb{P}(\varphi_1=\mathrm{False})\;\le\;\sum_{k=1}^{k_{\max}}\frac{\delta}{2Kk_{\max}}\;\le\;\delta,
\]
which completes Regime 2.

Conclusion. In either regime, for any $T$ satisfying (69), the call to Algorithm 1 returns $\varphi_1\vee\varphi_2=\mathrm{True}$ with probability at least $1-6\delta$. Since Algorithm 2 doubles $T$ until this event occurs, its total number of queries $N_\delta$ is at most a universal constant multiple of $M_\delta=\min\{G_{1,\delta},G_{2,\delta}+G_{3,\delta}+G_0\}$ with probability at least $1-6\delta$. Absorbing numerical constants into $\bar{c}_1,\bar{c}_2$ yields the stated bounds of Theorem 2.2.

The lemmas below are technical.

Lemma E.1. Consider the notation introduced in the proof of Theorem 2.2. Then we have
\[
\sum_{i\ne i^*}\bar{T}_i<G_{1,\delta}.
\]
Moreover, for all $i\ne i^*$,
\[
2\sqrt{\frac{\log\big(\frac{K\bar{T}_i^2n(n+1)}{\delta}\big)}{\bar{T}_i}}<\Delta_{i^*,i}.
\]
Proof. We have
\[
\sum_{i\ne i^*}\bar{T}_i=\sum_{i\ne i^*}\frac{16}{\Delta^2_{i^*,i}}\log\Big(\frac{32Kn(n+1)}{\delta\,\Delta^2_{i^*,i}}\Big)\;\le\;\sum_{i\ne i^*}\frac{16}{\Delta^2_{i^*,i}}\log\Big(\frac{32Kn(n+1)H_{\mathrm{cw}}}{\delta}\Big)\;\le\;16H_{\mathrm{cw}}\big(\log(32KH_{\mathrm{cw}}/\delta)+\log(n(n+1))\big). \tag{80}
\]
Therefore we only need to prove that $16H_{\mathrm{cw}}\big(\log(32KH_{\mathrm{cw}}/\delta)+\log(n(n+1))\big)\le G_{1,\delta}$, which is equivalent to
\[
\log\Big(\frac{32KH_{\mathrm{cw}}}{\delta}\Big)+\log(n(n+1))\;\le\;\frac{2}{c_1}\log(K)\log(KH_{\mathrm{cw}}/\delta)\log\big(c_1^{-1}H_{\mathrm{cw}}\log(K/\delta)\big).
\]
Observe that to prove the bound above we just need an upper bound on $\log(n(n+1))$; more precisely, given that $\log(K)\log\big(c_1^{-1}H_{\mathrm{cw}}\log(K/\delta)\big)\ge 2$ and $c_1<\frac12$, it suffices to show that
\[
\log(n(n+1))\;\le\;\frac{1}{c_1}\log(K)\log(KH_{\mathrm{cw}}/\delta). \tag{81}
\]
From the definition $n=\log_2\big(\frac{T}{4K\log_{8/7}(K)}\big)$ and $T\le 2G_{1,\delta}$, we have
\[
n\;\le\;\log_2\Big(\frac{2G_{1,\delta}}{2K\log_{8/7}(K)}\Big)\;\le\;\log_2\Big(\frac{32}{\log(8/7)c_1}H_{\mathrm{cw}}K\log\Big(\frac{32KH_{\mathrm{cw}}}{c_1\delta}\Big)\log\big(c_1^{-1}H_{\mathrm{cw}}\log(K/\delta)\big)\Big)\;\le\;\log_2\Big(\frac{6}{c_1^2}H^2_{\mathrm{cw}}K\log^2\Big(\frac{32KH_{\mathrm{cw}}}{c_1\delta}\Big)\Big)\;\le\;2\log_2\Big(\frac{6}{c_1}H_{\mathrm{cw}}\log\Big(\frac{32KH_{\mathrm{cw}}}{c_1\delta}\Big)\Big).
\]
This gives
\[
\log(n(n+1))\;\le\;2\log(n+1)\;\le\;2\log\Big(2\log_2\Big(\frac{12}{c_1}H_{\mathrm{cw}}K\log\Big(\frac{32KH_{\mathrm{cw}}}{c_1\delta}\Big)\Big)\Big),
\]
which gives (81) and leads to the first claim of the lemma. The second claim of the lemma is equivalent to
\[
\sqrt{\frac{\log\big(\frac{K\bar{T}_i^2n(n+1)}{\delta}\big)}{\bar{T}_i}}<\frac{\Delta_{i^*,i}}{2},
\]
which in turn is implied by $\log\big(\frac{K\bar{T}_i^2n(n+1)}{\delta}\big)<4\log\big(\frac{32Kn(n+1)}{\delta\,\Delta^2_{i^*,i}}\big)$, which is verified given the definition of $\bar{T}_i$.

Lemma E.2. Consider the notation introduced in the proof of Theorem 2.2. If $\bar{\Delta}_k<0$ and $T\in[G_{2,\delta}+G_{3,\delta}+G_0,\,2(G_{2,\delta}+G_{3,\delta}+G_0)]$, then $-L_{k,\delta}\ge\frac12\bar{\Delta}_k$.

Proof. Let $\delta_{n,K}=\frac{\delta}{8K^2\log_{8/7}(K)\log(T)n(n+1)}$. Assume $\bar{\Delta}_k<0$. Since $L_{k,\delta}\ge 0$, the inequality $-L_{k,\delta}\ge\frac12\bar{\Delta}_k$ is equivalent to $L_{k,\delta}\le-\frac12\bar{\Delta}_k$, i.e.
\[
L^2_{k,\delta}\;\le\;\frac{\bar{\Delta}_k^2}{4}. \tag{82}
\]
Recalling
\[
L^2_{k,\delta}=\frac{2c\log(T)}{\lceil B_k/4\rceil}\log\Big(\frac{1}{\delta_{n,K}}\Big),\qquad B_k=\Big\lfloor\frac{T}{|A_k|\log_{8/7}(K)}\Big\rfloor,
\]
we have that (82) is implied by
\[
\log(T)\log\Big(\frac{1}{\delta_{n,K}}\Big)\;\le\;\frac{\bar{\Delta}_k^2}{8c}\cdot\frac{T}{|A_k|\log_{8/7}(K)}. \tag{83}
\]
Next, using inequality (59), $\frac{|A_k|}{\bar{\Delta}_k^2}\le 32\sum_{\alpha\in E_k}\frac{1}{\Delta^2_{\alpha,(s_\alpha)}}$, and the fact that $i^*\notin E_k$ (since all $\Delta_{i^*,j}>0$, so $\Delta^{(k)}_{i^*,(\lceil|A_k|/4\rceil)}>0$), we have $\sum_{\alpha\in E_k}\frac{1}{\Delta^2_{\alpha,(s_\alpha)}}\le H_{\mathrm{certify}}(s)$. Hence
\[
\frac{\bar{\Delta}_k^2}{|A_k|}\;\ge\;\frac{1}{32H_{\mathrm{certify}}(s)}. \tag{84}
\]
Moreover, using the expression of $\delta_{n,K}$ with $n\le\log_2\big(\frac{T}{2K\log_{8/7}(K)}\big)$, we have
\[
\log\Big(\frac{1}{\delta_{n,K}}\Big)\;\le\;\log\Big(\frac{8K^2\log_{8/7}(K)}{\delta}\Big)+\log\log(T)+\log(n(n+1))\;\le\;\log\Big(\frac{8K^2\log_{8/7}(K)}{\delta}\Big)+\log\log(T)+2\log\log_2\Big(\frac{2T}{K\log_{8/7}(K)}\Big). \tag{85}
\]
Combining (84) and (85) with (83), we conclude that we only need $T$ to satisfy
\[
\log(T)\Big(\log\Big(\frac{8K^2\log_{8/7}(K)}{\delta}\Big)+\log\log(T)+2\log\log_2\Big(\frac{2T}{K\log_{8/7}(K)}\Big)\Big)\;\le\;\frac{T}{512\,c\log_{8/7}(K)H_{\mathrm{certify}}(s)}. \tag{86}
\]
Given that $T\ge G_{2,\delta}+G_0$, using the expressions of $G_{2,\delta}$ and $G_0$ together with the technical Lemma E.4, we conclude that (86) is satisfied, which concludes the proof.

Lemma E.3. Suppose that $T\ge G_0+G_{3,\delta}$. Then we have
\[
\exp\Big(-c_2\,\frac{T-c_3H^{(0)}_{\mathrm{explore}}(s)\log^5(H^{(0)}_{\mathrm{explore}}(s))}{\log^3(K)\log(T)}\cdot\frac{1}{H^{(1)}_{\mathrm{explore}}(s)}\Big)\;\le\;\frac{\delta}{2Kk_{\max}}.
\]
Proof. The desired inequality is equivalent to
\[
\frac{T-c_3H^{(0)}_{\mathrm{explore}}(s)\log^5(H^{(0)}_{\mathrm{explore}}(s))}{\log(T)}\;\ge\;\frac{1}{c_2}H^{(1)}_{\mathrm{explore}}(s)\log^3(K)\log\Big(\frac{2Kk_{\max}}{\delta}\Big). \tag{87}
\]
Define
\[
A:=\frac{32c}{c_2}H^{(1)}_{\mathrm{explore}}(s)\log^3(K)\log\Big(\frac{2Kk_{\max}}{\delta}\Big).
\]
By assumption, $T\ge\frac{2c_3}{c_2}H^{(0)}_{\mathrm{explore}}(s)\log^5(H^{(0)}_{\mathrm{explore}}(s))+A\log(A)$. Since $c_2\le 1$ (as is the case for the numerical constant $c_2$ coming from the preceding bounds), the first term implies $T\ge 2c_3H^{(0)}_{\mathrm{explore}}(s)\log^5(H^{(0)}_{\mathrm{explore}}(s))$, hence
\[
T-c_3H^{(0)}_{\mathrm{explore}}(s)\log^5(H^{(0)}_{\mathrm{explore}}(s))\;\ge\;\frac{T}{2}. \tag{88}
\]
Next, the function $f(x):=x/\log x$ is increasing for $x>e$. Moreover, we have $\log(2Kk_{\max}/\delta)>1$ and $H^{(1)}_{\mathrm{explore}}(s)\ge 4$ (since $|\Delta_{i,(s_i)}|\le\frac12$ and $s_i\le K$), and with $c\ge 1$ and $c_2\le 1$ this yields $A\ge 32\cdot 4\cdot(\log 2)^3\cdot\log 4>e$. Therefore $A\log A>e$ and in particular $\log(A\log A)>0$. Since $T\ge A\log A$ and $f$ is increasing on $(e,\infty)$, we obtain
\[
\frac{T}{\log T}=f(T)\;\ge\;f(A\log A)=\frac{A\log A}{\log(A\log A)}. \tag{89}
\]
Finally, because $A>e$, we have $\log(A\log A)=\log A+\log\log A\le\log A+\log A=2\log A$, and therefore
\[
\frac{A\log A}{\log(A\log A)}\;\ge\;\frac{A\log A}{2\log A}=\frac{A}{2}. \tag{90}
\]
Combining (88), (89), and (90) gives
\[
\frac{T-c_3H^{(0)}_{\mathrm{explore}}(s)\log^5(H^{(0)}_{\mathrm{explore}}(s))}{\log T}\;\ge\;\frac12\cdot\frac{T}{\log T}\;\ge\;\frac12\cdot\frac{A}{2}=\frac{A}{4}.
\]
By the definition of $A$,
\[
\frac{A}{4}=\frac{8c}{c_2}H^{(1)}_{\mathrm{explore}}(s)\log^3(K)\log(2Kk_{\max}/\delta)\;\ge\;\frac{1}{c_2}H^{(1)}_{\mathrm{explore}}(s)\log^3(K)\log(2Kk_{\max}/\delta),
\]
where we used $c\ge 1$. This proves (87), and hence
\[
\exp\Big(-c_2\,\frac{T-c_3H^{(0)}_{\mathrm{explore}}(s)\log^5(H^{(0)}_{\mathrm{explore}}(s))}{\log^3(K)\log(T)\,H^{(1)}_{\mathrm{explore}}(s)}\Big)\;\le\;e^{-\log(2Kk_{\max}/\delta)}=\frac{\delta}{2Kk_{\max}}.
\]

Lemma E.4. Let $K\ge 2$, $\delta\in(0,1)$. If $T\ge G_0+G_{2,\delta}$, then
\[
\log(T)\Big(\log\Big(\frac{8K^2\log_{8/7}(K)}{\delta}\Big)+\log\log(T)+2\log\log_2\Big(\frac{2T}{K\log_{8/7}(K)}\Big)\Big)\;\le\;\frac{T}{512\,c\log_{8/7}(K)H_{\mathrm{certify}}(s)}.
\]
Proof. Let $L:=\log_{8/7}(K)$, $H:=H_{\mathrm{certify}}(s)$, and set
\[
M:=512\,cLH,\qquad B:=\log\Big(\frac{8K^2L}{\delta}\Big)+3\log(8M)+10.
\]
By the definitions of $G_0$ and $G_{2,\delta}$, and given $c_2<1/8$, the assumption $T\ge G_0+G_{2,\delta}$ implies in particular that
\[
T\;\ge\;T_0:=2048\,cLHB\log(2048\,cLHB)\qquad\text{and}\qquad T\ge e^2.
\]
Moreover, for $T\ge 2$,
\[
\log\log_2\Big(\frac{2T}{KL}\Big)\;\le\;\log\log(T)+2,
\]
so the left-hand side is at most
\[
u(T):=\log(T)\Big(\log\Big(\frac{8K^2L}{\delta}\Big)+3\log\log T+4\Big).
\]
The function $u(T)/T$ is decreasing on $[e^2,\infty)$. Hence for all $T\ge T_0$, $u(T)\le\frac{T}{T_0}u(T_0)$. Now $T_0=2048\,cLHB\log(2048\,cLHB)$ (given that $G_{2,\delta}\ge 2048\,cLHB\log(2048\,cLHB)$) gives
\[
\log T_0=\log(2048\,cLHB)+\log\log(2048\,cLHB)\;\le\;2\log(2048\,cLHB),
\]
and $\log\log T_0\le\log\log(2048\,cLHB)+1$, so
\[
u(T_0)\;\le\;2\log(2048\,cLHB)\Big(\log\Big(\frac{8K^2L}{\delta}\Big)+3\log\log(2048\,cLHB)+7\Big).
\]
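The elementary step (90) in the proof of Lemma E.3 — $f(A\log A)=\frac{A\log A}{\log(A\log A)}\ge\frac{A}{2}$ for every $A>e$ — is easy to confirm numerically over a wide range of values. A quick sketch (not part of the formal proof):

```python
import math

def step_90_holds(A: float) -> bool:
    # check A*log(A)/log(A*log(A)) >= A/2 for A > e
    x = A * math.log(A)
    return x / math.log(x) >= A / 2

# sample A from just above e up to ~10^3
ok_90 = all(step_90_holds(math.e + 0.01 + 0.5 * i) for i in range(2000))
```

The check passes everywhere, with the slack shrinking as $A\downarrow e$, matching the inequality $\log(A\log A)\le 2\log A$ used in the text.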
Moreover,
\[
\log\log(2048\,cLHB)\;\le\;\log(2048\,cLHB)=\log(2048\,cLH)+\log B\;\le\;\log(2048\,cLH)+B,
\]
and $B=\log\big(\frac{8K^2L}{\delta}\big)+3\log(2048\,cLH)+10$, hence
\[
\log\Big(\frac{8K^2L}{\delta}\Big)+3\log\log(2048\,cLHB)+7\;\le\;\log\Big(\frac{8K^2L}{\delta}\Big)+3\log(2048\,cLH)+3B+7=4B-3\;\le\;4B.
\]
Therefore
\[
u(T_0)\;\le\;2\log(2048\,cLHB)\cdot 4B=8B\log(2048\,cLHB)=\frac{T_0}{512\,cLH}.
\]
Combining the previous displays yields $u(T)\le T/M$ for all $T\ge T_0$, i.e.
\[
\log(T)\Big(\log\Big(\frac{8K^2L}{\delta}\Big)+\log\log(T)+2\log\log_2\Big(\frac{2T}{KL}\Big)\Big)\;\le\;\frac{T}{512\,cLH},
\]
which is exactly the desired inequality.

F Proofs of Section 3

In this section, we provide all proofs for the instance-dependent fixed-confidence lower bounds. We begin with a short roadmap in Subsection F.1 describing the classical change-of-measure arguments underlying all our constructions (along the way, we fix some notation). In Subsection F.2, we prove Theorem 3.1, separating the expected-budget bound (Subsection F.2.1) from the high-probability quantile bound (Subsection F.2.2). In Subsection F.3, we explain the construction leading to Theorem 3.2 and state a more precise formulation in Theorem F.2, proved in Subsection F.4. Corollary 3.3 follows in Subsection F.5. Finally, Subsection F.6 discusses lower bounds preserving the CW row structure.

F.1 Roadmap on change-of-measure lower bounds

All our proofs follow a common three-step structure.

First step: reference and alternative instances. We fix a reference instance $\Delta\in\mathcal{D}_{\mathrm{cw}}$, that is, a gap matrix admitting a (unique) Condorcet winner $i^*(\Delta)$. In the fixed-confidence regime, our lower bounds are instance-dependent, so all constructions are built directly from the given $K\times K$ matrix $\Delta=(\Delta_{i,j})_{i,j\in[K]}$.
For some results (e.g., Theorem 3.2), we also consider a local class of instances obtained from $\Delta$ by permuting the negative entries of each row, while preserving prescribed structural features (CW, sign structure, multiset of negative entries, effective sparsity, and so on). This leads to families $\{\Delta(\pi)\}_\pi$ indexed by permutations $\pi$, but the reference object remains $\Delta$. For each suboptimal arm $k\ne i^*$, we construct an alternative instance $\Delta^{(k)}$ in such a way that $i^*$ is no longer the (strong) Condorcet winner, while $k$ becomes the CW, or at least a weak CW in the sense that the $k$-th row of $\Delta^{(k)}$ contains only non-negative entries. A typical construction consists in modifying only the $k$-th row of $\Delta$, for example by setting all its negative entries to a small constant $\epsilon\ge 0$, and updating the $k$-th column so as to preserve skew-symmetry. In more refined arguments, we first permute the negative entries within each row.

Second step: total variation. We then exploit the properties of a given algorithm $\mathcal{A}$ in order to exhibit, for each $k\ne i^*$, a separating event $B_k$ on which the two laws assign very different probabilities: $\mathbb{P}_{\Delta,\mathcal{A}}(B_k)$ is large (typically $\ge 1-\delta$), while $\mathbb{P}_{\Delta^{(k)},\mathcal{A}}(B_k)$ is small (typically $\le\delta$). Here $\mathbb{P}_{\Delta,\mathcal{A}}$ denotes the law of all observations (and internal randomness) when algorithm $\mathcal{A}$ interacts with environment $\Delta$. A natural choice, when $\mathcal{A}$ is $\delta$-correct for CW identification, is the event $\{\hat{i}=k\}$, where $\hat{i}$ is the recommendation output by $\mathcal{A}$. For quantile (high-probability) lower bounds, we additionally introduce the $(1-\delta)$-quantile $\chi$ of the budget $N_\delta$ under $\mathbb{P}_{\Delta,\mathcal{A}}$, and we consider events such as $\{N_\delta\le\chi\}$ or their intersections with identification events.
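The row-$k$ perturbation described above can be sketched directly. This is an illustrative transcription under the stated construction (setting the non-positive entries of row $k$ to $\epsilon$ and mirroring into column $k$ to keep the matrix skew-symmetric); the function name and toy matrix are ours:

```python
def alternative_instance(Delta, k, eps):
    """Sketch of the row-k perturbation: non-positive entries of row k become
    eps, and column k is mirrored so the matrix stays skew-symmetric."""
    K = len(Delta)
    D = [row[:] for row in Delta]
    for j in range(K):
        if j != k and D[k][j] <= 0:
            D[k][j] = eps
            D[j][k] = -eps
    return D

# toy skew-symmetric gap matrix with Condorcet winner 0
Delta = [[0.0, 0.1, 0.2], [-0.1, 0.0, -0.3], [-0.2, 0.3, 0.0]]
Dk = alternative_instance(Delta, 1, 0.01)
```

After the perturbation, every off-diagonal entry of row $1$ is positive, so arm $1$ is the CW of the alternative instance, while all duels not involving arm $1$ are left untouched.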
The precise choice of $B_k$ varies from theorem to theorem, but the goal is always to produce a set on which the two laws $\mathbb{P}_{\Delta,\mathcal{A}}$ and $\mathbb{P}_{\Delta^{(k)},\mathcal{A}}$ have very different probabilities, thereby enforcing a large total-variation distance:
\[
\mathrm{TV}\big(\mathbb{P}_{\Delta,\mathcal{A}}, \mathbb{P}_{\Delta^{(k)},\mathcal{A}}\big) \;\ge\; \big|\mathbb{P}_{\Delta,\mathcal{A}}(B_k) - \mathbb{P}_{\Delta^{(k)},\mathcal{A}}(B_k)\big| .
\]
The total-variation distance is then controlled from above through a standard data-processing inequality: we use either Pinsker's inequality, the Bretagnolle–Huber inequality, or a Fano-type inequality (see Lemma H.4) to relate TV to the Kullback–Leibler divergence. For example,
\[
\mathrm{TV}(\mathbb{P},\mathbb{Q}) \le \sqrt{\tfrac{1}{2}\mathrm{KL}(\mathbb{P},\mathbb{Q})}, \qquad \text{or} \qquad 1 - \mathrm{TV}(\mathbb{P},\mathbb{Q}) \ge \tfrac{1}{2}\exp\big(-\mathrm{KL}(\mathbb{P},\mathbb{Q})\big).
\]
The choice depends on the error-probability regime: Bretagnolle–Huber is convenient in the very small-$\delta$ regime, while Pinsker is often sharper in moderate-error regimes.

Third step: decomposition and control of the KL divergence. The last step is to decompose the Kullback–Leibler divergence between the laws induced by $\mathcal{A}$ under the two environments. Let $N_{i,j}$ denote the total number of observed duels of the ordered pair $(i,j)$ between time $1$ and the stopping time $N_\delta$, and let $N_{\{i,j\}} = N_{i,j} + N_{j,i}$ be the total number of observations of the unordered pair $\{i,j\}$. A standard KL decomposition for adaptive bandit algorithms (see, e.g., Lattimore and Szepesvári, 2020, Lemma 15.1) yields
\[
\mathrm{KL}\big(\mathbb{P}_{\Delta,\mathcal{A}}, \mathbb{P}_{\Delta^{(k)},\mathcal{A}}\big) \;=\; \sum_{1 \le i < j \le K} \mathbb{E}_{\Delta,\mathcal{A}}\big[N_{\{i,j\}}\big]\, \mathrm{KL}\big(P_{i,j}, P^{(k)}_{i,j}\big),
\]
where $P_{i,j}$ (resp. $P^{(k)}_{i,j}$) denotes the Bernoulli feedback distribution of the duel $\{i,j\}$ under $\Delta$ (resp. $\Delta^{(k)}$).

F.2.1 Bound in expectation in Theorem 3.1

Sketch of proof. (i) Reference and alternative instances: we fix $\Delta \in \mathcal{D}_{\mathrm{cw}}$ with CW $i^* = 1$ and, for each suboptimal arm $k \ne i^*$, construct $\Delta^{(k)}$ by lifting all negative entries in row $k$ to a small constant $\epsilon > 0$ (and adjusting the $k$-th column to preserve antisymmetry), so that $k$ becomes the CW in $\Delta^{(k)}$. (ii) Separating event and total variation. Since the algorithm is $\delta$-correct for CW identification, it must distinguish $\Delta$ from each $\Delta^{(k)}$ with error at most $\delta$ when deciding between $i^*$ and $k$ as CW.
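As a quick numerical illustration of the two TV–KL relations above, the following standalone Python snippet (an illustration only, not part of the paper's method) checks Pinsker and Bretagnolle–Huber on Bernoulli pairs, where the TV distance is simply $|p-q|$:

```python
import math

def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), for 0 < p, q < 1."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def tv_bernoulli(p, q):
    """Total variation distance between Bernoulli(p) and Bernoulli(q)."""
    return abs(p - q)

for p, q in [(0.5, 0.51), (0.5, 0.6), (0.5, 0.9)]:
    kl = kl_bernoulli(p, q)
    tv = tv_bernoulli(p, q)
    pinsker = math.sqrt(0.5 * kl)   # Pinsker: TV <= sqrt(KL / 2)
    bh = 1 - 0.5 * math.exp(-kl)    # Bretagnolle-Huber: TV <= 1 - exp(-KL) / 2
    assert tv <= pinsker and tv <= bh
    print(f"p={p}, q={q}: TV={tv:.3f}, Pinsker={pinsker:.3f}, BH={bh:.3f}")
```

For small gaps Pinsker is nearly tight ($\mathrm{TV} \approx \sqrt{\mathrm{KL}/2}$), while the Bretagnolle–Huber bound never exceeds $1$, which is what makes it useful in the very small-$\delta$ (large-KL) regime.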
Using the test event $B_k = \{\hat{i} = k\}$, we obtain that the total-variation distance between $\mathbb{P}_{\Delta,\mathcal{A}}$ and $\mathbb{P}_{\Delta^{(k)},\mathcal{A}}$ is at least $1 - 2\delta$, which, via the Bretagnolle–Huber inequality, yields a lower bound on $\mathrm{KL}(\mathbb{P}_{\Delta,\mathcal{A}}, \mathbb{P}_{\Delta^{(k)},\mathcal{A}})$. (iii) KL decomposition. We then decompose this KL along unordered pairs. The two instances $\Delta$ and $\Delta^{(k)}$ differ only on duels involving arm $k$, and for each such pair the Bernoulli parameters differ by at most a constant of order $|\Delta_{k,(1)}|$. A Bernoulli KL upper bound, combined with the decomposition, forces $\mathbb{E}_{\Delta,\mathcal{A}}[N_k] \gtrsim \Delta_{k,(1)}^{-2}\log(1/\delta)$, where $N_k$ counts duels involving $k$. Summing this constraint over all $k \ne 1$ yields the desired lower bound (91).

Proof. Step 1: reference and perturbed instances. The argument is fully instance-dependent: we fix an arbitrary gap matrix $\Delta \in \mathcal{D}_{\mathrm{cw}}$ with Condorcet winner $i^* = 1$, and work throughout with this specific instance. We denote by $\mathbb{P}$ the probability induced by the interaction between $\Delta$ and $\mathcal{A}$. Let $\epsilon > 0$ be an arbitrarily small constant, and let $k \ne 1$ be an arm that is not the CW under $\Delta$. A simple way to modify $\Delta$ so that $k$ becomes the CW is to set all non-positive entries in the $k$-th row of $\Delta$ equal to $\epsilon$.

Construct the gap matrix $\Delta^{(k)}$ as follows. For all $i, j \notin \{1, k\}$, set $\Delta^{(k)}_{i,j} = \Delta_{i,j}$. Set $\Delta^{(k)}_{k,1} = \epsilon$ and $\Delta^{(k)}_{1,k} = -\epsilon$. Finally, for each $j \notin \{1, k\}$, define
\[
\Delta^{(k)}_{k,j} = \begin{cases} \Delta_{k,j}, & \text{if } \Delta_{k,j} > 0, \\ \epsilon, & \text{if } \Delta_{k,j} \le 0, \end{cases} \qquad \Delta^{(k)}_{j,k} = -\Delta^{(k)}_{k,j}. \tag{92}
\]
For $\epsilon$ small enough, the modified matrix $\Delta^{(k)}$ can be represented as
\[
\Delta^{(k)} = \begin{pmatrix}
0 & \Delta_{1,2} & \cdots & \Delta_{1,k-1} & -\epsilon & \Delta_{1,k+1} & \cdots & \Delta_{1,K} \\
\Delta_{2,1} & 0 & \cdots & \Delta_{2,k-1} & -(\epsilon \vee \Delta_{k,2}) & \Delta_{2,k+1} & \cdots & \Delta_{2,K} \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
\Delta_{k-1,1} & \Delta_{k-1,2} & \cdots & 0 & -(\epsilon \vee \Delta_{k,k-1}) & \Delta_{k-1,k+1} & \cdots & \Delta_{k-1,K} \\
\epsilon & \epsilon \vee \Delta_{k,2} & \cdots & \epsilon \vee \Delta_{k,k-1} & 0 & \epsilon \vee \Delta_{k,k+1} & \cdots & \epsilon \vee \Delta_{k,K} \\
\Delta_{k+1,1} & \Delta_{k+1,2} & \cdots & \Delta_{k+1,k-1} & -(\epsilon \vee \Delta_{k,k+1}) & 0 & \cdots & \Delta_{k+1,K} \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
\Delta_{K,1} & \Delta_{K,2} & \cdots & \Delta_{K,k-1} & -(\epsilon \vee \Delta_{k,K}) & \Delta_{K,k+1} & \cdots & 0
\end{pmatrix},
\]
where the entries of row and column $k$ are the only ones modified with respect to the reference $\Delta$. In fact, only the non-positive entries of row $k$ (and their mirror images in column $k$) differ between $\Delta$ and $\Delta^{(k)}$. Since $\epsilon > 0$, the $k$-th row $\Delta^{(k)}_{k,\cdot}$ is positive (aside from $\Delta^{(k)}_{k,k}$). Hence, the CW of $\Delta^{(k)}$ is $k$. As is standard in this line of work, our construction is motivated by the fact that the instance $\Delta^{(k)}$ is hard to distinguish from the reference gap matrix $\Delta$.

For $k \ge 2$, denote by $\mathbb{P}^{(k)}$ the distribution of the data when the underlying gap matrix is $\Delta^{(k)}$. Formally, when the algorithm queries a pair $(i,j)$ with $i < j$, it receives a sample $X_{i,j} \sim \mathcal{B}\big(\Delta^{(k)}_{i,j} + \tfrac{1}{2}\big)$, where $\mathcal{B}(p)$ denotes the Bernoulli distribution with parameter $p$.

Step 2: information-theoretic arguments. Let $\hat{i}$ denote the output of algorithm $\mathcal{A}$, which is assumed to be $\delta$-correct over $\mathcal{D}_{\mathrm{cw}}$. When the gap matrix is $\Delta^{(k)}$, the CW is $k$, so $\delta$-correctness implies
\[
\forall k \ne i^*, \qquad \mathbb{P}^{(k)}(\hat{i} = k) \ge 1 - \delta \quad \text{and} \quad \mathbb{P}(\hat{i} = k) \le \delta .
\]
By the definition of the total variation distance,
\[
\mathrm{TV}\big(\mathbb{P}, \mathbb{P}^{(k)}\big) \;\ge\; \big|\mathbb{P}(\hat{i} = k) - \mathbb{P}^{(k)}(\hat{i} = k)\big| \;\ge\; 1 - 2\delta . \tag{93}
\]
From (93), the Bretagnolle–Huber inequality (Theorem 14.2 in Lattimore and Szepesvári, 2020) yields
\[
1 - 2\delta \;\le\; \mathrm{TV}\big(\mathbb{P}, \mathbb{P}^{(k)}\big) \;\le\; 1 - \tfrac{1}{2}\exp\big\{-\mathrm{KL}\big(\mathbb{P}, \mathbb{P}^{(k)}\big)\big\} . \tag{94}
\]
Step 3: computing the KL divergence and concluding on the budget.
For any unordered pair $\{i,j\}$ with $i \ne j$, denote by $N_{\{i,j\}}$ the number of duels involving either $(i,j)$ or $(j,i)$, that is, $N_{\{i,j\}} := N_{i,j} + N_{j,i}$, where $N_{i,j} := \big|\{t \in [N_\delta] : (I_t, J_t) = (i,j)\}\big|$ and $N_\delta$ is the stopping time of the algorithm. Using the divergence decomposition lemma (Lemma 15.1 in Lattimore and Szepesvári, 2020) and the fact that the two instances $\Delta$ and $\Delta^{(k)}$ differ only on pairs involving arm $k$, we obtain
\[
\mathrm{KL}\big(\mathbb{P}, \mathbb{P}^{(k)}\big) \;=\; \sum_{1 \le i < j \le K} \mathbb{E}\big[N_{\{i,j\}}\big]\, \mathrm{KL}\big(P_{i,j}, P^{(k)}_{i,j}\big) \;=\; \sum_{i \ne k} \mathbb{E}\big[N_{\{k,i\}}\big]\, \mathrm{KL}\big(P_{k,i}, P^{(k)}_{k,i}\big). \tag{95}
\]
Fix $i \ne k$. If $\Delta_{k,i} > 0$, then by construction $P_{k,i} = P^{(k)}_{k,i}$. Otherwise, if $\Delta_{k,i} < 0$, the corresponding Bernoulli feedback distributions satisfy $P_{k,i} = \mathcal{B}\big(\tfrac{1}{2} + \Delta_{k,i}\big)$ and $P^{(k)}_{k,i} = \mathcal{B}\big(\tfrac{1}{2} + \epsilon\big)$, so that
\[
\mathrm{KL}\big(P_{k,i}, P^{(k)}_{k,i}\big) = \mathrm{kl}\Big(\Delta_{k,i} + \tfrac{1}{2},\; \epsilon + \tfrac{1}{2}\Big) \;\le\; \frac{\big(\Delta_{k,i} - \epsilon\big)^2}{\big(\tfrac{1}{2} - \epsilon\big)\big(\tfrac{1}{2} + \epsilon\big)} \tag{96}
\]
\[
\;\le\; \frac{(\epsilon - \Delta_{k,(1)})^2}{\big(\tfrac{1}{2} - \epsilon\big)\big(\tfrac{1}{2} + \epsilon\big)} . \tag{97}
\]
Here, (96) follows from the standard upper bound $\mathrm{kl}(p,q) \le (p-q)^2/[q(1-q)]$ for $p, q \in (0,1)$, while (97) uses that $\Delta^2_{k,i} \le \Delta^2_{k,(1)}$, by definition of $\Delta_{k,(1)}$ as the smallest negative entry in row $k$. Similarly, if $\Delta_{k,j} = 0$, then $P_{k,j} = \mathcal{B}\big(\tfrac{1}{2}\big)$ and $P^{(k)}_{k,j} = \mathcal{B}\big(\tfrac{1}{2} + \epsilon\big)$, so that the same computation gives
\[
\mathrm{KL}\big(P_{k,j}, P^{(k)}_{k,j}\big) \;\le\; \frac{\epsilon^2}{\big(\tfrac{1}{2} - \epsilon\big)\big(\tfrac{1}{2} + \epsilon\big)},
\]
which vanishes as $\epsilon \to 0$. Combining these bounds with (95), and taking the limit $\epsilon \to 0$, we obtain
\[
\mathrm{KL}\big(\mathbb{P}, \mathbb{P}^{(k)}\big) \;\le\; 4 \Bigg(\sum_{\substack{i=1 \\ i \ne k}}^{K} \mathbb{1}\{\Delta_{k,i} < 0\}\, \mathbb{E}\big[N_{\{k,i\}}\big]\Bigg) \Delta^2_{k,(1)} .
\]
Using the Bretagnolle–Huber inequality (94) from Step 2, we then get
\[
\sum_{\substack{i=1 \\ i \ne k}}^{K} \mathbb{1}\{\Delta_{k,i} < 0\}\, \mathbb{E}\big[N_{\{k,i\}}\big] \;\ge\; \frac{1}{4\,\Delta^2_{k,(1)}} \log\frac{1}{4\delta} .
\]
Summing over $k \ge 2$, we conclude that the total number of queries $N_\delta$ satisfies
\[
\mathbb{E}[N_\delta] \;\ge\; \sum_{k=2}^{K} \Bigg(\sum_{\substack{i=1 \\ i \ne k}}^{K} \mathbb{1}\{\Delta_{k,i} < 0\}\, \mathbb{E}\big[N_{\{k,i\}}\big]\Bigg) \;\ge\; \frac{1}{4} \sum_{k=2}^{K} \frac{1}{\Delta^2_{k,(1)}} \log\frac{1}{4\delta},
\]
which is exactly the claimed lower bound (91).
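To make the final bound concrete, the following standalone Python sketch evaluates the right-hand side of (91), namely $\tfrac{1}{4}\sum_{k\ge 2}\Delta_{k,(1)}^{-2}\log\tfrac{1}{4\delta}$, on a toy instance; the $3$-arm gap matrix and the value of $\delta$ are made up for illustration only:

```python
import math

def expected_budget_lower_bound(gap_matrix, delta):
    """Evaluate (1/4) * sum_{k != CW} Delta_{k,(1)}^{-2} * log(1/(4*delta)),
    where Delta_{k,(1)} is the smallest (most negative) entry of row k."""
    total = 0.0
    for row in gap_matrix:
        negatives = [g for g in row if g < 0]
        if not negatives:
            continue  # row of the Condorcet winner: no negative entry
        total += 1.0 / min(negatives) ** 2
    return 0.25 * total * math.log(1.0 / (4.0 * delta))

# Hypothetical 3-arm instance: arm 0 is the Condorcet winner.
delta_matrix = [
    [0.0,   0.1,   0.2],
    [-0.1,  0.0,   0.05],
    [-0.2, -0.05,  0.0],
]
print(expected_budget_lower_bound(delta_matrix, delta=0.05))
```

Rows whose most negative entry is close to $0$ dominate the sum, reflecting that near-ties are the expensive comparisons for any $\delta$-correct procedure.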
F.2.2 Bound in quantile in Theorem 3.1

In this paragraph, we prove the quantile bound from Theorem 3.1, namely
\[
\mathbb{P}_{\Delta,\mathcal{A}}\Bigg(N_\delta \;\ge\; \frac{1}{3} \sum_{i \ne i^*} \frac{\log\big(\tfrac{1}{6\delta}\big)}{\Delta^2_{i,(1)}}\Bigg) \;\ge\; \delta . \tag{98}
\]
Sketch of proof. Even though expectation lower bounds are naturally weaker than quantile bounds, deducing a quantile lower bound from its expectation counterpart is nontrivial. Still, the arguments largely mirror the expectation proof (paragraph F.2.1), and we follow the same three-step roadmap. (i) Reference and alternative instances: we fix $\Delta \in \mathcal{D}_{\mathrm{cw}}$ with CW $i^*$ and, for each $k \ne i^*$, construct $\Delta^{(k)}$ by zeroing the negative entries in row/column $k$. In particular, $\Delta^{(k)}$ has two nonnegative rows (those of $1$ and $k$) and therefore does not belong to $\mathcal{D}_{\mathrm{cw}}$. (ii) Separating event and total variation: we now use the stopping rule instead of the recommendation and consider the event $B = \{N_\delta > Q\}$, where $Q$ is the $(1-\delta)$-quantile of $N_\delta$ under $\Delta$. This event has small probability under $\mathbb{P}$, but (after a continuity argument using perturbations of $\Delta^{(k)}$) it has large probability under $\mathbb{P}^{(k)}$, since $\mathcal{A}$ cannot quickly decide between $\hat{i} = 1$ and $\hat{i} = k$. (iii) KL decomposition: as before, we decompose the KL along pairs and use that $\Delta$ and $\Delta^{(k)}$ differ only on duels involving $k$. The only extra ingredient is that, since $B$ depends only on the first $Q$ observations, we introduce a truncated algorithm $\tilde{\mathcal{A}}$ that stops at time $Q$, apply the KL decomposition to $\tilde{\mathcal{A}}$, and then reinterpret the resulting inequality as a lower bound on $Q$, i.e., on the $(1-\delta)$-quantile of $N_\delta$.

Proof. Let $\mathcal{A}$ be any $\delta$-correct algorithm on $\mathcal{D}_{\mathrm{cw}}$.

Step 1: reference and alternative instances. As for the expectation bound, we fix any $\Delta$ as the reference instance, with $i^* = 1$, and denote by $\mathbb{P}_{\mathcal{A}}$ the probability induced by the interaction between $\mathcal{A}$ and $\Delta$. Then, for each suboptimal arm $k \in \{2, \dots, K\}$, construct $\Delta^{(k)}$ by setting to zero all entries $(k,j)$ with $\Delta_{k,j} < 0$; that is, $\Delta^{(k)}$ is defined by Equation (92) with $\epsilon = 0$. The matrices $\Delta$ and $\Delta^{(k)}$ differ only in row/column $k$. By construction, rows $1$ and $k$ of $\Delta^{(k)}$ contain only nonnegative entries, so $\Delta^{(k)}$ contains two weak CWs, and $1$ and $k$ are tied. In particular, $\Delta^{(k)} \notin \mathcal{D}_{\mathrm{cw}}$. Denote by $\mathbb{P}^{(k)}_{\mathcal{A}}$ the distribution induced by $\mathcal{A}$ interacting with $\Delta^{(k)}$.

Step 2: bound in total variation and reduction to fixed budget. Consider the recommendation rule $\hat{i}$ and the budget $N_\delta$ of $\mathcal{A}$. Define $Q$ as the $(1-\delta)$-quantile of the budget under $\mathbb{P}_{\mathcal{A}}$:
\[
Q = \inf\{x > 0 \;\text{s.t.}\; \mathbb{P}(N_\delta \ge x) \le \delta\} . \tag{99}
\]
Consider the event $B := \{N_\delta \le Q\}$ on which the budget is at most $Q$. We define a truncated version $\tilde{\mathcal{A}}$ of $\mathcal{A}$ with budget at most $Q$ as follows: run $\mathcal{A}$ for $t = 1, \dots, Q$; if $\mathcal{A}$ stops before time $Q$, return $\tilde{i} = \hat{i}$; else stop at time $Q$ and return $\tilde{i} = 0$. Let $\tilde{N}_\delta$ and $\tilde{i}$ be $\tilde{\mathcal{A}}$'s budget and recommendation, and let $\mathbb{P}_{\tilde{\mathcal{A}}}$ (resp. $\mathbb{P}^{(k)}_{\tilde{\mathcal{A}}}$) be its law under $\Delta$ (resp. $\Delta^{(k)}$). By construction, $B = \{\tilde{i} \ne 0\}$, so this event is measurable with respect to the observations of algorithm $\tilde{\mathcal{A}}$. Moreover, it has the same probability under $\mathcal{A}$ and $\tilde{\mathcal{A}}$. We now lower bound the total variation distance between $\mathbb{P}_{\tilde{\mathcal{A}}}$ and $\mathbb{P}^{(k)}_{\tilde{\mathcal{A}}}$.

By the definition of $Q$ in (99), we have
\[
\mathbb{P}_{\tilde{\mathcal{A}}}(\tilde{i} = 0) = \mathbb{P}_{\mathcal{A}}(N_\delta > Q) \le \delta . \tag{100}
\]
Under $\Delta^{(k)}$, the instance does not belong to $\mathcal{D}_{\mathrm{cw}}$, so we cannot directly invoke $\delta$-correctness. We therefore approximate $\Delta^{(k)}$ by nearby instances in $\mathcal{D}_{\mathrm{cw}}$. More precisely, define $\Delta^{(k,\epsilon)}$ as in (92), i.e., by lifting all zero entries in row $k$ of $\Delta^{(k)}$ to $\epsilon > 0$ (and adjusting the $k$-th column to preserve antisymmetry). For $0 < \epsilon \le 1/4$, the matrix $\Delta^{(k,\epsilon)}$ lies in $\mathcal{D}_{\mathrm{cw}}$ and admits $k$ as its CW.
Similarly, define $\Delta^{(k,-\epsilon)}$ by subtracting $\epsilon$ from all zero entries in the $k$-th row of $\Delta^{(k)}$ (and adjusting the $k$-th column to preserve antisymmetry), so that $\Delta^{(k,-\epsilon)} \in \mathcal{D}_{\mathrm{cw}}$ and admits $1$ as its CW. Let $\mathbb{P}^{(k,\epsilon)}_{\mathcal{A}}$ and $\mathbb{P}^{(k,-\epsilon)}_{\mathcal{A}}$ denote the laws of $\mathcal{A}$ under $\Delta^{(k,\epsilon)}$ and $\Delta^{(k,-\epsilon)}$, respectively. Since $\mathcal{A}$ is $\delta$-correct on $\mathcal{D}_{\mathrm{cw}}$, we have
\[
\mathbb{P}^{(k,\epsilon)}_{\mathcal{A}}(\hat{i} \ne k) \le \delta, \qquad \mathbb{P}^{(k,-\epsilon)}_{\mathcal{A}}(\hat{i} \ne 1) \le \delta . \tag{101}
\]
Moreover, $\mathbb{P}^{(k,\epsilon)}_{\mathcal{A}}$ converges in total variation to $\mathbb{P}^{(k)}_{\mathcal{A}}$ as $\epsilon \to 0$. Using these facts and letting $\epsilon \to 0$, we obtain
\[
\mathbb{P}^{(k)}_{\mathcal{A}}(N_\delta \le Q) = \mathbb{P}^{(k)}_{\mathcal{A}}(\hat{i} = 1, N_\delta \le Q) + \mathbb{P}^{(k)}_{\mathcal{A}}(\hat{i} \ne 1, N_\delta \le Q) = \lim_{\epsilon \to 0} \Big[\mathbb{P}^{(k,\epsilon)}_{\mathcal{A}}(\hat{i} = 1, N_\delta \le Q) + \mathbb{P}^{(k,-\epsilon)}_{\mathcal{A}}(\hat{i} \ne 1, N_\delta \le Q)\Big] \le 2\delta, \tag{102}
\]
where the last inequality follows from (101). Since $\tilde{\mathcal{A}}$ and $\mathcal{A}$ coincide up to time $Q$, we also have $\mathbb{P}^{(k)}_{\tilde{\mathcal{A}}}(\tilde{i} \ne 0) = \mathbb{P}^{(k)}_{\mathcal{A}}(N_\delta \le Q)$, so (102) implies $\mathbb{P}^{(k)}_{\tilde{\mathcal{A}}}(\tilde{i} \ne 0) \le 2\delta$. Combining (100) with this inequality, we obtain
\[
\mathrm{TV}\big(\mathbb{P}_{\tilde{\mathcal{A}}}, \mathbb{P}^{(k)}_{\tilde{\mathcal{A}}}\big) \;\ge\; \mathbb{P}^{(k)}_{\tilde{\mathcal{A}}}(\tilde{i} = 0) - \mathbb{P}_{\tilde{\mathcal{A}}}(\tilde{i} = 0) \;\ge\; (1 - 2\delta) - \delta = 1 - 3\delta . \tag{103}
\]
Remarks. Intuitively, one may think of $\tilde{\mathcal{A}}$ as a fixed-budget algorithm with budget $Q$. This can be viewed as a reduction: from any $\delta$-correct algorithm $\mathcal{A}$ that enjoys a high-probability control on its budget (namely, $\mathbb{P}(N_\delta \le Q) \ge 1 - \delta$), we construct a fixed-budget algorithm $\tilde{\mathcal{A}}$ with budget $Q$ that inherits the same distinguishing power between the reference instance and its perturbations.

Step 3: computing the KL divergence. By the Bretagnolle–Huber inequality (see, e.g., Lattimore and Szepesvári, 2020), we have
\[
1 - 3\delta \;\le\; \mathrm{TV}\big(\mathbb{P}_{\tilde{\mathcal{A}}}, \mathbb{P}^{(k)}_{\tilde{\mathcal{A}}}\big) \;\le\; 1 - \tfrac{1}{2}\exp\big\{-\mathrm{KL}\big(\mathbb{P}_{\tilde{\mathcal{A}}}, \mathbb{P}^{(k)}_{\tilde{\mathcal{A}}}\big)\big\} .
\]
In particular,
\[
\mathrm{KL}\big(\mathbb{P}_{\tilde{\mathcal{A}}}, \mathbb{P}^{(k)}_{\tilde{\mathcal{A}}}\big) \;\ge\; \log\Big(\frac{1}{6\delta}\Big) . \tag{104}
\]
We now decompose $\mathrm{KL}\big(\mathbb{P}_{\tilde{\mathcal{A}}}, \mathbb{P}^{(k)}_{\tilde{\mathcal{A}}}\big)$.
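The truncation used to build $\tilde{\mathcal{A}}$ is purely mechanical and can be sketched as a generic wrapper. The generator-style interface below is hypothetical (the paper does not specify an implementation); it only illustrates the reduction from a stopping-time algorithm to a fixed-budget one that returns the dummy recommendation $0$ on truncation:

```python
def truncate(algorithm, budget):
    """Wrap a sequential identification algorithm into a fixed-budget one.

    `algorithm` is assumed to be a generator-style procedure that yields once
    per queried duel and returns its recommendation when it stops (a made-up
    interface, for illustration only). Returns the wrapped recommendation,
    or 0 if the budget is exhausted first.
    """
    steps = 0
    gen = algorithm()
    try:
        while True:
            next(gen)          # one more duel queried by the base algorithm
            steps += 1
            if steps >= budget:
                return 0       # truncated: output the dummy recommendation 0
    except StopIteration as stop:
        return stop.value      # base algorithm stopped before the budget

# Toy base algorithm stopping after 5 duels and recommending arm 1.
def toy_algorithm():
    for _ in range(5):
        yield
    return 1

print(truncate(toy_algorithm, budget=10))  # -> 1 (stops before the budget)
print(truncate(toy_algorithm, budget=3))   # -> 0 (truncated)
```

On the event $\{N_\delta \le Q\}$ the wrapper reproduces the base recommendation exactly, which is why the separating event has the same probability under $\mathcal{A}$ and $\tilde{\mathcal{A}}$.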
F or i  = j in [ K ] , recall that N i,j denotes the n umber of duels ( i, j ) , while N { i,j } = N i,j + N j,i is the n umber of duels with unordered pair { i, j } . By the standard KL decomp osition for adaptiv e pro cedures [Lattimore and Szep esvári, 2020], KL  P ˜ A , P ( k ) ˜ A  = X i  = j E ˜ A  N i,j  KL  P i,j , P ( k ) i,j  = X j :∆ k,j < 0 E ˜ A  N { k,j }  KL  P k,j , P ( k ) k,j  , (105) 56 since ∆ and ∆ ( k ) differ only on duels { k , j } with ∆ k,j < 0 . F or j with ∆ k,j < 0 , P ( k ) k,j = B (1 / 2) , P k,j = B (1 / 2 + ∆ k,j ) with ∆ k,j ∈ [ − 1 / 4 , 0] . Th us, KL  P k,j , P ( k ) k,j  = 1 2 log  1 2 1 2 + ∆ k,j  + 1 2 log  1 2 1 2 − ∆ k,j  = − 1 2 log  1 − 4∆ 2 k,j  ⩽ 8 log  4 3  ∆ 2 k,j , (106) where (106) follo ws from sup x ∈ [0 , 1 / 4] − log(1 − x ) x ⩽ 4 log  4 3  , applied with x = 4∆ 2 k,j ∈ [0 , 1 / 4] . Plugging these b ounds in to (105), we obtain KL  P ˜ A , P ( k ) ˜ A  ⩽ 8 log  4 3  X j :∆ k,j < 0 E ˜ A [ N { k,j } ] ∆ 2 k,j ⩽ 8 log  4 3    X j :∆ k,j < 0 E ˜ A [ N { k,j } ]   ∆ 2 k, (1) , (107) since ∆ 2 k,j ⩽ ∆ 2 k, (1) whenev er ∆ k,j < 0 . Combining (104)–(107), we obtain 1 8 log(4 / 3) 1 ∆ 2 k, (1) log  1 6 δ  ⩽ X j :∆ k,j < 0 E ˜ A [ N { k,j } ] . (108) Since ˜ A uses at most χ duels, w e hav e almost surely , under P ˜ A , X i  = i ∗ X j :∆ k,j < 0 N { k,j } ⩽ ˜ N δ ⩽ χ , in particular, the same b ound also holds in exp ectation E ˜ A . Summing in (108) ov er k  = i ∗ yields 1 8 log(4 / 3) X k  = i ∗ 1 ∆ 2 k, (1) log  1 6 δ  ⩽ X k  = i ∗ X j :∆ k,j < 0 E ˜ A [ N { k,j } ] ⩽ χ . Using the numerical bound 8 log (4 / 3) < 3 and the definition of Q as the (1 − δ ) -quan tile of N δ then giv es the high-probability lo wer b ound (98). F.3 Pro of of Theorem 3.2 In this subsection, w e prov e the high-probability low er b ound from Theorem 3.2. 
We first introduce additional notation and state a more precise version of the result; in particular, we consider here the case where $\Delta$ might contain ties.

Notation and precise formulation. Consider a matrix $\Delta$ with entries in $\big[-\tfrac{1}{4}, \tfrac{1}{4}\big]$ that admits a unique CW $i^*(\Delta)$, that is, $\Delta \in \mathcal{D}_{\mathrm{cw}}$. Without loss of generality, we assume that $i^*(\Delta) = 1$, and we keep this matrix fixed throughout this section. We now define a family of environments obtained from $\Delta$ by permuting its entries in a specific way. This will lead to a more precise formulation of Theorem 3.2.

Fix an antisymmetric matrix $\Sigma = (\sigma_{i,j})_{1 \le i,j \le K}$ defined, for any $1 \le i \ne j \le K$, by
\[
\sigma_{i,j} = \begin{cases} \mathrm{sign}(\Delta_{i,j}), & \text{if } \Delta_{i,j} \ne 0, \\ f_{\mathrm{tb}}(i,j), & \text{if } \Delta_{i,j} = 0, \end{cases} \tag{109}
\]
where $f_{\mathrm{tb}} : [K]^2 \to \{-1, 0, 1\}$ is an antisymmetric function used as a tie-breaking convention. In other words, $\Sigma$ records the sign pattern of the gap matrix $\Delta$, with a fixed rule when $\Delta_{i,j} = 0$. Note that $\Sigma$ is antisymmetric, since $\Delta$ is antisymmetric and $f_{\mathrm{tb}}$ is antisymmetric by assumption. For now, we do not specify how we fix this convention, as the precise choice will arise naturally at the very end of our proofs.

For a suboptimal arm $i \in \{2, \dots, K\}$, denote by $\Sigma^-_i$ the set of arms that beat $i$ (with ties broken according to $f_{\mathrm{tb}}$):
\[
\Sigma^-_i := \big\{j \in [K] \setminus \{i\} : \sigma_{i,j} = -1\big\} .
\]
For any $i \in \{2, \dots, K\}$, define $\Pi_i(\Sigma)$ as the set of permutations of $\Sigma^-_i$:
\[
\Pi_i(\Sigma) := \big\{\pi_i : \Sigma^-_i \to \Sigma^-_i \;\big|\; \pi_i \text{ is a bijection}\big\}, \qquad \Pi(\Sigma) := \Pi_2(\Sigma) \times \cdots \times \Pi_K(\Sigma) . \tag{110}
\]
Fix a permutation $\pi := (\pi_i)_{i=2}^{K} \in \Pi(\Sigma)$. For each row $i \ne 1$, we permute the entries of $\Delta$ indexed by $\Sigma^-_i$ according to $\pi_i$. To preserve antisymmetry, we apply the same permutation $\pi_i$ to the corresponding entries in column $i$.
We thus define the matrix $\Delta^{(\pi)}$ as follows: for all $(i,j) \in [K]^2$,
\[
\Delta^{(\pi)}_{i,j} := \begin{cases} \Delta_{i,\pi_i(j)}, & \text{if } \sigma_{i,j} = -1, \\ -\Delta_{j,\pi_j(i)}, & \text{if } \sigma_{i,j} = 1, \\ \Delta_{i,j}, & \text{otherwise}, \end{cases} \tag{111}
\]
where the last case corresponds to $\sigma_{i,j} = 0$, which only happens for ties. The permutations in $\Pi(\Sigma)$ destroy any exploitable ordering structure between arms while preserving, in each row, the multiset of non-positive entries, and hence the row-wise hardness parameters $(K_{i;<0}, \|\Delta^-_i\|_2^2)$.

We now list properties that are preserved by the permutation $\pi$. In the following lemma, a row $j$ is said to be a weak CW for a given gap matrix if all entries in its row are nonnegative.

Lemma F.1. For any $\Sigma$ satisfying (109) and any $\pi \in \Pi(\Sigma)$, we have:
1. $\Delta^{(\pi)} = -(\Delta^{(\pi)})^T$ (antisymmetry);
2. $\forall j \ne 1$, $\Delta^{(\pi)}_{1,j} \ge 0$ (row 1 is a weak CW);
3. $\forall i \ne 1$, $K_{i;<0}(\Delta) = K_{i;<0}(\Delta^{(\pi)})$ and $K_{i;\le 0}(\Delta) = K_{i;\le 0}(\Delta^{(\pi)})$ (same sign structure);
4. $\forall i \ne 1$, $\forall s \le K_{i;\le 0}$, $\Delta^{(\pi)}_{i,(s)} = \Delta_{i,(s)}$ (same ordered nonpositive entries).

In particular, if $\Delta$ has no ties (that is, $\forall i \ne j$, $\Delta_{i,j} \ne 0$), then $\Delta^{(\pi)} \in \mathcal{D}_{\mathrm{cw}}$. Moreover, for any $\mathbf{s}$, $H_{\mathrm{explore}}(\mathbf{s}, \delta)$ and $H_{\mathrm{certify}}(\mathbf{s}, \delta)$ remain unchanged under any permutation $\pi \in \Pi(\Sigma)$.

Proof of Lemma F.1. 1. Antisymmetry. Let $(i,j) \in [K]^2$ with $i \ne j$. Assume that $\sigma_{i,j} = -1$, which is equivalent to $j \in \Sigma^-_i$. By the first line in Equation (111), one has $\Delta^{(\pi)}_{i,j} = \Delta_{i,\pi_i(j)}$. Then, by the second line applied to $(j,i)$ (for which $\sigma_{j,i} = 1$), one has $\Delta^{(\pi)}_{j,i} = -\Delta_{i,\pi_i(j)} = -\Delta^{(\pi)}_{i,j}$. The case $\sigma_{i,j} = 1$ is treated similarly. For $\sigma_{i,j} = 0$, one also has $\sigma_{j,i} = 0$ and $\Delta^{(\pi)}_{i,j} = \Delta_{i,j} = -\Delta_{j,i} = -\Delta^{(\pi)}_{j,i}$. This proves the antisymmetry of $\Delta^{(\pi)}$.

2. CW row. By definition, $i^* \in \Sigma^-_i$ for any $i \ne i^*$ (and $i^* = 1$ by assumption).
In particular, $\Delta^{(\pi)}_{1,i} = -\Delta_{i,\pi_i(1)} = \Delta_{\pi_i(1),i} \ge 0$, where the non-negativity comes from the fact that $\pi_i(1) \in \Sigma^-_i$, that is, $\pi_i(1)$ also beats $i$ (or is tied with $i$ in the case where $\Delta$ contains ties).

3. Sign structure. For every non-CW arm $i \ne 1$, the set of indices $\Sigma^-_i$ contains only entries $j$ such that $\Delta_{i,j} \le 0$, and (111) only permutes the entries in that set. Hence the multiset of nonpositive entries in row $i$ is preserved, and so are the counts $K_{i;<0}$ and $K_{i;\le 0}$.

4. Order statistics. Since the multiset of nonpositive entries in row $i$ is preserved up to permutation, the ordered sequence $(\Delta_{i,(s)})_{s \le K_{i;\le 0}}$ is unchanged, which gives $\Delta^{(\pi)}_{i,(s)} = \Delta_{i,(s)}$ for all $s \le K_{i;\le 0}$. In particular, for any vector $\mathbf{s}$ with $s_i \le K_{i;<0}$ (a condition independent of $\pi$), the gaps $\Delta^{(\pi)}_{i,(s_i)}$ coincide with $\Delta_{i,(s_i)}$, so that $H_{\mathrm{explore}}(\mathbf{s}, \delta)$ and $H_{\mathrm{certify}}(\mathbf{s}, \delta)$ remain unchanged under any $\pi \in \Pi(\Sigma)$.

Now we are ready to state Theorem F.2, which directly implies Theorem 3.2 and provides a more constructive formulation.

Theorem F.2. Let $\mathcal{A}$ be a $\delta$-correct algorithm over the class $\mathcal{D}_{\mathrm{cw}}$, with $\delta \le 1/12$. Let $\Delta \in \mathcal{D}_{\mathrm{cw}}$ be such that $i^*(\Delta) = 1$. Define $\chi$ as the smallest positive number such that, for any sign convention $\Sigma$ (satisfying (109)) and any $\pi \in \Pi(\Sigma)$, one has $\mathbb{P}_{\Delta^{(\pi)},\mathcal{A}}(N_\delta > \chi) \le \delta$. Equivalently,
\[
\chi := \inf\Bigg\{x > 0 \;:\; \sup_{\Sigma} \sup_{\pi \in \Pi(\Sigma)} \mathbb{P}_{\Delta^{(\pi)},\mathcal{A}}(N_\delta > x) \le \delta\Bigg\} . \tag{112}
\]
In other words, $\chi$ is a uniform $(1-\delta)$-quantile of the budget, taken in the worst case over all admissible sign conventions and permutations. Then, $\chi$ satisfies
\[
\chi \;\ge\; \frac{1}{8\log(4/3)} \max_{i=2,\dots,K} \frac{K_{i;\le 0}}{\|\Delta^-_i\|_2^2} \log\Big(\frac{1}{6\delta}\Big), \tag{113}
\]
\[
\chi \;\ge\; \frac{1}{128\log(4/3)} \cdot \frac{1}{\log(2K)} \sum_{i=2}^{K} \frac{K_{i;\le 0}}{\|\Delta^-_i\|_2^2} . \tag{114}
\]
We postpone the proof of Theorem F.2 to the following Subsection F.4, which is split between the proof of Equation (113) and that of Equation (114).
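The construction (111) and the invariances of Lemma F.1 are easy to check numerically. The snippet below is an illustrative sketch: the $4 \times 4$ gap matrix is made up, arm $0$ plays the role of the CW $i^* = 1$, and we assume no ties, so that $\Sigma^-_i$ is simply the set of negative entries of row $i$:

```python
import random

def permute_instance(delta, perms):
    """Build Delta^(pi) from (111), assuming no ties: entries with
    delta[i][j] < 0 (sigma = -1) are permuted within row i by perms[i],
    entries with delta[i][j] > 0 (sigma = +1) mirror the loser's row."""
    K = len(delta)
    out = [[0.0] * K for _ in range(K)]
    for i in range(K):
        for j in range(K):
            if delta[i][j] < 0:              # sigma_{i,j} = -1
                out[i][j] = delta[i][perms[i][j]]
            elif delta[i][j] > 0:            # sigma_{i,j} = +1
                out[i][j] = -delta[j][perms[j][i]]
            else:                            # diagonal (no ties assumed)
                out[i][j] = delta[i][j]
    return out

# Made-up antisymmetric gap matrix; arm 0 is the Condorcet winner.
delta = [
    [0.0,   0.1,   0.2,   0.05],
    [-0.1,  0.0,   0.15, -0.2],
    [-0.2, -0.15,  0.0,   0.1],
    [-0.05, 0.2,  -0.1,   0.0],
]
rng = random.Random(0)
perms = []
for i in range(4):
    neg = [j for j in range(4) if delta[i][j] < 0]
    shuffled = neg[:]
    rng.shuffle(shuffled)
    perms.append(dict(zip(neg, shuffled)))   # a bijection of Sigma_i^-

out = permute_instance(delta, perms)
for i in range(4):
    for j in range(4):
        assert abs(out[i][j] + out[j][i]) < 1e-12              # antisymmetry
    assert sorted(g for g in out[i] if g < 0) == \
           sorted(g for g in delta[i] if g < 0)                # same multiset
print("Lemma F.1 invariances hold on this instance")
```

The two assertions mirror points 1 and 3–4 of Lemma F.1: antisymmetry is preserved, and each row keeps the same multiset of negative entries (hence the same order statistics $\Delta_{i,(s)}$).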
We first explain how this result directly implies Theorem 3.2.

Proof of Theorem 3.2. Let $\mathcal{A}$ be a $\delta$-correct algorithm over the class $\mathcal{D}_{\mathrm{cw}}$, with $\delta \le 1/12$. Fix any matrix $\Delta \in \mathcal{D}_{\mathrm{cw}}$ with CW $i^* = 1$, and consider $\chi$ as defined in Equation (112). Combining the numerical bounds $8\log(4/3) < 3$ and $128\log(4/3) < 37$ with the definition of $\chi$ and the lower bounds (113) and (114), we obtain that there exist a sign convention $\Sigma$ and a permutation $\pi \in \Pi(\Sigma)$ such that
\[
\mathbb{P}_{\Delta^{(\pi)},\mathcal{A}}\Bigg(N_\delta \;\ge\; \frac{1}{3} \max_{i \ne i^*} \frac{K_{i;\le 0}}{\|\Delta^-_i\|_2^2} \log\Big(\frac{1}{6\delta}\Big) \;\vee\; \frac{1}{37\log(2K)} \sum_{i \ne i^*} \frac{K_{i;\le 0}}{\|\Delta^-_i\|_2^2}\Bigg) \;\ge\; \delta .
\]
By Lemma F.1, if $\Delta$ contains no ties, then the matrix $\Delta^{(\pi)}$ admits a Condorcet winner and verifies all properties required of the matrix $\tilde{\Delta}$ from Theorem 3.2. This proves Theorem 3.2 in the no-ties case.

Remarks. Observe that, by the second point of Lemma F.1, it is possible that $\Delta^{(\pi)}$ admits no CW when $\Delta$ admits some ties. Yet it still admits a weak Condorcet winner, in the sense that $\Delta^{(\pi)}_{1,\cdot}$ contains only non-negative entries, while all the other rows contain at least one negative entry. Then, one can construct a matrix $\Delta^{(\pi,\epsilon)}$ as close as needed to $\Delta^{(\pi)}$ and such that $\Delta^{(\pi,\epsilon)}$ admits $1$ as CW. We can for instance add a small constant $\epsilon > 0$ to the first row of $\Delta^{(\pi)}$ (and $-\epsilon$ to the first column). For $\epsilon$ small enough, the resulting matrix $\Delta^{(\pi,\epsilon)}$ shares most properties of $\Delta$.

F.4 Proof of Theorem F.2

F.4.1 Proof of Equation (113) in Theorem F.2

Let $\delta \le 1/12$, and let $\mathcal{A}$ be any $\delta$-correct algorithm over the entire class $\mathcal{D}_{\mathrm{cw}}$. Fix $\Delta \in \mathcal{D}_{\mathrm{cw}}$ with CW $i^*(\Delta) = 1$. Recall the definition
\[
\chi := \inf\Bigg\{x > 0 \;:\; \sup_{\Sigma} \sup_{\pi \in \Pi(\Sigma)} \mathbb{P}_{\Delta^{(\pi)},\mathcal{A}}(N_\delta > x) \le \delta\Bigg\} .
\]
We prove Equation (113), namely
\[
\chi \;\ge\; \frac{1}{8\log(4/3)} \max_{i=2,\dots,K} \frac{K_{i;\le 0}}{\|\Delta^-_i\|_2^2} \log\Big(\frac{1}{6\delta}\Big) .
\]
Sketch of proof. We follow the three-step roadmap of Section F.1.
The arguments are very similar to the proof of the quantile bound of Theorem 3.1 in paragraph F.2.2, except that we work with a local class of permuted instances and average over all permutations of $\Delta$. (i) Reference and alternative instances. We fix $\Delta \in \mathcal{D}_{\mathrm{cw}}$ with $i^*(\Delta) = 1$ and consider the local class of references $(\Delta^{(\pi)})_{\pi \in \Pi}$ obtained by permuting, within each non-CW row, the positions of its negative entries according to $\pi$. For each suboptimal arm $k$, we construct the alternative instance $\Delta^{(\pi,k)}$ by lifting to $0$ all negative entries in row $k$, so that rows $1$ and $k$ become nonnegative and the instance has two weak CWs (hence lies outside $\mathcal{D}_{\mathrm{cw}}$). (ii) Separating event and total variation. As in Subsection F.2.2, we truncate $\mathcal{A}$ at the worst-case $(1-\delta)$-quantile $\chi$ to obtain a fixed-budget algorithm $\tilde{\mathcal{A}}$, and we use the event $\{N_\delta > \chi\}$ as a separating event. This yields a total-variation lower bound $\mathrm{TV}(\mathbb{P}^{(\pi)}_{\tilde{\mathcal{A}}}, \mathbb{P}^{(\pi,k)}_{\tilde{\mathcal{A}}}) \ge 1 - 3\delta$, hence a KL lower bound of order $\log(1/(6\delta))$ for each $k$. (iii) KL decomposition and extraction of $K_{k;\le 0} / \|\Delta^-_k\|_2^2$. We decompose the KL along pairs and use that $\Delta^{(\pi)}$ and $\Delta^{(\pi,k)}$ differ only on duels involving $k$. Averaging over permutations $\pi_k \in \Pi_k(\Sigma)$ symmetrizes the contribution of all negative entries in row $k$ and yields a lower bound $\chi \gtrsim K_{k;\le 0}\, \|\Delta^-_k\|_2^{-2} \log(1/\delta)$. Finally, choosing the row $k$ that maximizes $K_{k;\le 0} / \|\Delta^-_k\|_2^2$ gives (113). (iv) Tie-breaking convention. In the last step, we choose a tie-breaking convention to conclude.

Proof. Step 1: reference and alternative instances. For now, fix a tie-breaking convention $f_{\mathrm{tb}}$ and a sign matrix $\Sigma$ as in (109); the specific choice of $\Sigma$ will be made in the final step of the proof. Fix a permutation $\pi \in \Pi(\Sigma)$ (see Equation (110)), and consider the corresponding matrix $\Delta^{(\pi)}$ defined in Equation (111).
We denote by $\mathbb{P}^{(\pi)}_{\mathcal{A}}$ the distribution of the observations induced by the interaction between algorithm $\mathcal{A}$ and the environment with gap matrix $\Delta^{(\pi)}$.

Fix a suboptimal arm $k \in \{2, \dots, K\}$; the precise choice of $k$ will be made explicit in the last step of the proof. We construct the gap matrix $\Delta^{(\pi,k)}$ by setting to zero all entries $(k,j)$ with $j \in \Sigma^-_k := \{j \in [K] : \sigma_{k,j} = -1\}$. Recall that, by the definition of $\Sigma$ (see (109)), if $j \in \Sigma^-_k$ then $\Delta^{(\pi)}_{k,j} \le 0$; that is, $\Sigma^-_k$ contains all arms that strictly beat $k$, together with some arms that are tied with $k$. We define
\[
\Delta^{(\pi,k)}_{i,j} := \begin{cases} \Delta^{(\pi)}_{i,j}, & \text{if } i \ne k \text{ and } j \ne k, \\ \Delta^{(\pi)}_{i,j}, & \text{if } (i = k \text{ and } \sigma_{k,j} = 1) \text{ or } (j = k \text{ and } \sigma_{k,i} = 1), \\ 0, & \text{if } (i = k \text{ and } \sigma_{k,j} = -1) \text{ or } (j = k \text{ and } \sigma_{k,i} = -1). \end{cases} \tag{115}
\]
The matrices $\Delta^{(\pi)}$ and $\Delta^{(\pi,k)}$ differ only in row/column $k$. Moreover, by construction, rows $1$ and $k$ of $\Delta^{(\pi,k)}$ contain only nonnegative entries, so that $\Delta^{(\pi,k)} \notin \mathcal{D}_{\mathrm{cw}}$. We denote by $\mathbb{P}^{(\pi,k)}_{\mathcal{A}}$ the distribution of the observations induced by the interaction between algorithm $\mathcal{A}$ and the environment with gap matrix $\Delta^{(\pi,k)}$.

Step 2: information-theoretic arguments. This step is identical to Step 2 in the proof of the quantile bound (98) (Section F.2.2). Consider the event $B := \{N_\delta \le \chi\}$, and define the truncated algorithm $\tilde{\mathcal{A}}$ as in that proof: run $\mathcal{A}$ up to time $\chi$, returning $\tilde{i} = \hat{i}$ if $\mathcal{A}$ stops before $\chi$, and $\tilde{i} = 0$ otherwise. By the definition of $\chi$ in (112), which is a uniform upper bound on the $(1-\delta)$-quantile of $N_\delta$ under any $\mathbb{P}^{(\pi)}_{\mathcal{A}}$, we have
\[
\mathbb{P}^{(\pi)}_{\tilde{\mathcal{A}}}(\tilde{i} = 0) = \mathbb{P}^{(\pi)}_{\mathcal{A}}(N_\delta > \chi) \le \delta .
\]
Since $\Delta^{(\pi,k)}$ can be approximated by CW instances admitting $1$ or $k$ as CW (as in (101)–(102)), we obtain $\mathbb{P}^{(\pi,k)}_{\tilde{\mathcal{A}}}(\tilde{i} \ne 0) \le 2\delta$. Considering the event $\{\tilde{i} = 0\}$, this yields
\[
\mathrm{TV}\big(\mathbb{P}^{(\pi)}_{\tilde{\mathcal{A}}}, \mathbb{P}^{(\pi,k)}_{\tilde{\mathcal{A}}}\big) \;\ge\; (1 - 2\delta) - \delta = 1 - 3\delta . \tag{116}
\]
Remarks.
The result from Step 2 can be interpreted as a reduction scheme. Consider the signal detection problem of testing $H_0 : \mu = 0$ versus $H_1(\pi_k) : \mu = \big(\Delta_{k,\pi_k(\ell)}\big)_{\ell \in \Sigma^-_k}$ in a bandit setting. Equation (116) shows that, for any permutation $\pi_k$, $\tilde{\mathcal{A}}$ is $2\delta$-correct for this signal detection problem, with a budget bounded by $\chi$ (independently of the permutation). This reduction is the main novelty of our proof technique. The remaining arguments in Step 3 build upon the literature on active signal detection [Castro, 2014, Saad et al., 2023, Graf et al., 2025].

Step 3: computing the KL divergence. By the Bretagnolle–Huber inequality (see, e.g., Lattimore and Szepesvári, 2020), the conclusion (116) of Step 2 implies
\[
\mathrm{KL}\big(\mathbb{P}^{(\pi,k)}_{\tilde{\mathcal{A}}}, \mathbb{P}^{(\pi)}_{\tilde{\mathcal{A}}}\big) \;\ge\; \log\Big(\frac{1}{6\delta}\Big) . \tag{117}
\]
We now compute $\mathrm{KL}\big(\mathbb{P}^{(\pi,k)}_{\tilde{\mathcal{A}}}, \mathbb{P}^{(\pi)}_{\tilde{\mathcal{A}}}\big)$. Observe that we take the law under the alternative instance $\Delta^{(\pi,k)}$ on the left side of the divergence: this ensures that the expectation $\mathbb{E}^{(\pi,k)}_{\tilde{\mathcal{A}}}[N_{\{k,j\}}]$ appearing in the KL decomposition (below) does not depend on $\pi_k$, which will allow us to average over permutations $\pi_k \in \Pi_k(\Sigma)$.

By the standard decomposition of the Kullback–Leibler divergence for adaptive procedures (see, e.g., Lattimore and Szepesvári, 2020, Lemma 15.1),
\[
\mathrm{KL}\big(\mathbb{P}^{(\pi,k)}_{\tilde{\mathcal{A}}}, \mathbb{P}^{(\pi)}_{\tilde{\mathcal{A}}}\big) = \sum_{i \ne j} \mathbb{E}^{(\pi,k)}_{\tilde{\mathcal{A}}}\big[N_{i,j}\big]\, \mathrm{KL}\big(P^{(\pi,k)}_{i,j}, P^{(\pi)}_{i,j}\big) = \sum_{j \in \Sigma^-_k} \mathbb{E}^{(\pi,k)}_{\tilde{\mathcal{A}}}\big[N_{\{k,j\}}\big]\, \mathrm{KL}\big(P^{(\pi,k)}_{k,j}, P^{(\pi)}_{k,j}\big), \tag{118}
\]
since the two instances differ only on pairs $(k,j)$ or $(j,k)$ with $j \in \Sigma^-_k$, by construction of $\Delta^{(\pi,k)}$. Now fix $j \in \Sigma^-_k$. By the definitions of $\Delta^{(\pi)}$ and $\Delta^{(\pi,k)}$ (Equations (111) and (115)), we have
\[
P^{(\pi,k)}_{k,j} = \mathcal{B}\Big(\frac{1}{2}\Big), \qquad P^{(\pi)}_{k,j} = \mathcal{B}\Big(\frac{1}{2} + \Delta_{k,\pi_k(j)}\Big), \qquad \text{where } \Delta_{k,\pi_k(j)} \in [-1/4, 0] .
\]
Hence, using the bound on kl from (106), KL  P ( π ,k ) k,j , P ( π ) k,j  ⩽ 8 log  4 3  ∆ 2 k,π k ( j ) , Plugging these b ounds in to (118), we obtain KL  P ( π ,k ) ˜ A , P ( π ) ˜ A  ⩽ 8 log  4 3  X j ∈ Σ − k E ( π ,k ) ˜ A [ N { k,j } ] ∆ 2 k,π k ( j ) . A k ey prop erty of our construction is that ∆ ( π ,k ) do es not actually dep end on π k : all en tries p erm uted by π k in row k of ∆ ( π ) are set to 0 under ∆ ( π ,k ) . Consequently , E ( π ,k ) ˜ A do es not dep end on π k . Giv en π ′ k ∈ Π k ( Σ ) , we denote π ′ = ( π 2 , . . . , π k − 1 , π ′ k , π k +1 , . . . ) where we only c hange π k . A v eraging the previous inequality o ver π ′ k ∈ Π k ( Σ ) while k eeping ( π l ) l  = k fixed, w e get 1 | Π k ( Σ ) | X π ′ k ∈ Π k ( Σ ) KL  P ( π ,k ) ˜ A , P ( π ′ ) ˜ A  ⩽ 8 log  4 3  1 | Π k ( Σ ) | X π ′ k ∈ Π k ( Σ ) X j ∈ Σ − k E ( π ,k ) ˜ A [ N { k,j } ] ∆ 2 k,π ′ k ( j ) = 8 log  4 3  1 | Σ − k |   X j ∈ Σ − k E ( π ,k ) ˜ A [ N { k,j } ]     X j ∈ Σ − k ∆ 2 k,j   , (119) where w e used Lemma H.3 to symmetrize ov er all p ermutations π ′ k of Σ − k . By definition of Σ − k , it holds that Σ − k ⊂ { j ∈ [ K ] \ { k } : ∆ k,j ⩽ 0 } , so that X j ∈ Σ − k ∆ 2 k,j = ∥ ∆ − k ∥ 2 . Moreo ver, b y construction, the mo dified algorithm ˜ A has a budget upp er b ounded b y χ , and X j ∈ Σ − k E ( π ,k ) ˜ A [ N { k,j } ] ⩽ χ . F rom there, w e get 1 | Π k ( Σ ) | X π ′ k ∈ Π k ( Σ ) KL  P ( π ,k ) ˜ A , P ( π ′ ) ˜ A  ⩽ 8 log  4 3  ∥ ∆ − k ∥ 2 | Σ − k | χ . (120) 62 Finally , com bining (120) with the Bretagnolle–Hub er b ound (117), we obtain χ ⩾ 1 8 log(4 / 3) | Σ − k | ∥ ∆ − k ∥ 2 log  1 6 δ  . (121) Step 4: choice of con v ention Σ and conclusion. It remains to c ho ose an appropriate arm k , and a con ven tion f tb in the definition of the sign matrix Σ (see Equation (109)). Consider k ∗ ∈ K arg max k =2 K k ; ⩽ 0 ∥ ∆ − k ∥ 2 , as a sub optimal arm for whic h detecting a negative en try in its row is the most costly . 
Fix a tie-breaking convention $f_{\mathrm{tb}} : [K]^2 \to \{-1, 0, 1\}$ such that $f_{\mathrm{tb}}(k^*, i) = -1$ for any $i \ne k^*$. For this choice, we have
\[
\frac{|\Sigma^-_{k^*}|}{\|\Delta^-_{k^*}\|_2^2} = \frac{K_{k^*;\le 0}}{\|\Delta^-_{k^*}\|_2^2} = \max_{i \ne i^*} \frac{K_{i;\le 0}}{\|\Delta^-_i\|_2^2} .
\]
Plugging this into (121) yields
\[
\chi \;\ge\; \frac{1}{8\log(4/3)} \max_{i \ne i^*} \frac{K_{i;\le 0}}{\|\Delta^-_i\|_2^2} \log\Big(\frac{1}{6\delta}\Big),
\]
which establishes the first inequality (113) on $\chi$.

F.4.2 Proof of Equation (114) in Theorem F.2

Let $\delta \le 1/12$. Let $\mathcal{A}$ be any $\delta$-correct algorithm over the entire class $\mathcal{D}_{\mathrm{cw}}$, and fix $\Delta \in \mathcal{D}_{\mathrm{cw}}$ with CW $i^*(\Delta) = 1$. In this subsection, we prove Equation (114) from Theorem F.2, using the same instance construction as in the proof of bound (113). Recall this bound:
\[
\chi \;\ge\; \frac{1}{128\log(4/3)} \cdot \frac{1}{\log(2K)} \sum_{i=2}^{K} \frac{K_{i;\le 0}}{\|\Delta^-_i\|_2^2} .
\]
Sketch of proof. We follow the three-step roadmap of Section F.1, reusing the instance construction from the proof of (113) but with a more refined separating event. The key differences are (ii) a refined event $B_k$ that also controls the number of duels involving arm $k$, and (iii) the use of Pinsker's inequality and Jensen's inequality to average over multiple arms simultaneously. (i) Reference and alternative instances. As before, we consider the local class $(\Delta^{(\pi)})_{\pi \in \Pi(\Sigma)}$ and, for each $k \in \{2, \dots, K\}$, the alternative $\Delta^{(\pi,k)}$. We define the row-wise hardness $\beta^2_k = \|\Delta^-_k\|_2^2 / |\Sigma^-_k|$, and assume the $\beta^2_k$ are ordered increasingly. Let $I$ be the index that maximizes $(k-1)/\beta^2_k$, corresponding to the worst-case average hardness over the first $k$ rows. (ii) Separating event and total variation. Instead of $\{N_\delta \le \chi\}$, we use the event $B_k = \{N_\delta \le \chi,\; N_{k,\cdot} \le 4\chi/(I-1)\}$, where $N_{k,\cdot}$ counts duels between $k$ and opponents in $\Sigma^-_k$. Define a truncation $\tilde{\mathcal{A}}_k$ that outputs $\mathbb{1}_{B_k}$, using a budget of at most $4\chi/(I-1)$ on arm $k$. Under $\Delta^{(\pi,k)}$, $\mathbb{P}(\mathbb{1}_{B_k} = 1) \le 2\delta$.
Under $\Delta^{(\pi)}$, we use a pigeonhole argument to show that the average probability of $B_k$ is larger than $3/4 - \delta$. This yields an averaged total variation larger than $1/2$, which is the key novelty of this proof technique.

(iii) KL decomposition and extraction of the sum. By Pinsker's inequality, the averaged TV lower bound implies an averaged KL lower bound. Each KL term is upper bounded as previously, and we now average over $k = 2,\dots,I$ and use the truncation constraint $N_{k,\cdot} \le 4\chi/(I-1)$ to get $\chi \gtrsim (I-1)/\beta_I^2$. By the definition of $I$, this gives $\chi \gtrsim \sum_{i=2}^K K_{i;\le 0}/\|\Delta_{-i}\|^2$, up to logarithmic factors.

Proof. Step 1: construction of instances. Fix a tie-breaking convention $f_{\mathrm{tb}}$ and a sign matrix $\Sigma$ (see (109)); again, they will be chosen in the last step of the proof. Fix a permutation $\pi \in \Pi(\Sigma)$ and use $\Delta^{(\pi)}$ as the reference matrix (see (111)). Define, for each $k \in \{2,\dots,K\}$, the row-wise hardness
$$\beta_k^2(\Sigma) := \frac{\|\Delta_{-k}\|^2}{|\Sigma_{-k}|}. \quad (122)$$
Without loss of generality, assume that the arms are ordered so that $\beta_2^2 \le \beta_3^2 \le \dots \le \beta_K^2$. Define
$$I := \arg\max_{k=2,\dots,K} \frac{k-1}{\beta_k^2}, \quad (123)$$
the index that captures the worst-case average hardness over the first $k$ rows. As alternative instances, we consider the family $\{\Delta^{(\pi,k)}\}_{k=2,\dots,I}$.

Step 2: information-theoretic arguments. In this step, we construct an event under which algorithm $\mathcal{A}$ should behave differently depending on whether it interacts with $\Delta^{(\pi)}$ or $\Delta^{(\pi,k)}$. To capture the sum lower bound (114), we need a more refined event than in the proof of (113). For $k \in \{2,\dots,I\}$, define
$$B_k := \{N_\delta \le \chi\} \cap \Big\{N_{k,\cdot} \le \frac{4\chi}{I-1}\Big\}, \quad (124)$$
where $N_{k,\cdot} := \sum_{i\in\Sigma_{-k}} N_{\{k,i\}}$ denotes the total number of duels involving arm $k$ against opponents in $\Sigma_{-k}$.

Remarks. The event $B_k$ is designed as follows.
The bound (113) shows that the quantity $\beta_k^{-2}$ characterizes the budget needed to find a negative entry in row $k$ of $\Delta^{(\pi)}$, uniformly over all permutations $\pi$. Identifying the Condorcet winner $i^* = 1$ amounts to solving simultaneously $K-1$ such signal detection problems, one per suboptimal row. By the definition of $I$ in (123) and Lemma H.1, it is natural to think of the simplified regime where $(\beta_2^{-2},\dots,\beta_I^{-2})$ are of the same order, so that arms $2,\dots,I$ are equally hard to eliminate. In that case, any reasonable algorithm should allocate its samples roughly uniformly across rows $2,\dots,I$, and the event $B_k$ describes this expected behavior for a $\delta$-correct algorithm $\mathcal{A}$.

We now evaluate the event $B_k$ with a truncated procedure $\tilde{\mathcal{A}}_k$ that uses a total budget of at most $\chi$, and at most $4\chi/(I-1)$ comparisons involving arm $k$ against an opponent in $\Sigma_{-k}$. Define the following procedure $\tilde{\mathcal{A}}_k$. For $t = 1,\dots,\chi$, run algorithm $\mathcal{A}$. If $\mathcal{A}$ stops before time $\chi$, compute $N_{k,\cdot}$ and output $\psi_k := \mathbf{1}\{N_{k,\cdot} \le 4\chi/(I-1)\}$. Otherwise, stop at time $t = \chi$ and set $\psi_k = 0$. By construction, the binary decision $\psi_k$ computed by $\tilde{\mathcal{A}}_k$ satisfies $\psi_k = \mathbf{1}_{B_k}$.

We write $P^{(\pi)}_{\tilde{\mathcal{A}}_k}$ (resp. $P^{(\pi,k)}_{\tilde{\mathcal{A}}_k}$) for the distribution induced by the interaction between algorithm $\tilde{\mathcal{A}}_k$ and the environment with gap matrix $\Delta^{(\pi)}$ (resp. $\Delta^{(\pi,k)}$). We now lower bound the total variation distance between $P^{(\pi)}_{\tilde{\mathcal{A}}_k}$ and $P^{(\pi,k)}_{\tilde{\mathcal{A}}_k}$ using the event $B_k$. First, $B_k \subset \{N_\delta \le \chi\}$. Since $\mathcal{A}$ is $\delta$-correct over $\mathcal{D}_{\mathrm{cw}}$ and $\Delta^{(\pi,k)}$ can be approximated by instances in $\mathcal{D}_{\mathrm{cw}}$ as in the proof of (102), we obtain, for any $k \neq 1$,
$$P^{(\pi,k)}_{\tilde{\mathcal{A}}_k}(B_k) \le P^{(\pi,k)}_{\tilde{\mathcal{A}}_k}(N_\delta \le \chi) \le 2\delta. \quad (125)$$
Next, consider $P^{(\pi)}_{\tilde{\mathcal{A}}_k}$ and the complement $B_k^c$. We have
$$B_k^c = \{N_\delta > \chi\} \cup \big\{N_\delta \le \chi,\ N_{k,\cdot} > 4\chi/(I-1)\big\}.$$
Since $\mathcal{A}$ is $\delta$-correct and $\Delta^{(\pi)} \in \mathcal{D}_{\mathrm{cw}}$, the definition of $\chi$ implies
$$P^{(\pi)}_{\tilde{\mathcal{A}}_k}(N_\delta > \chi) = P^{(\pi)}_{\mathcal{A}}(N_\delta > \chi) \le \delta. \quad (126)$$
We now average the second term of $B_k^c$ over $k \in \{2,\dots,I\}$:
$$\frac{1}{I-1}\sum_{k=2}^I P^{(\pi)}_{\tilde{\mathcal{A}}_k}\big(N_\delta \le \chi,\ N_{k,\cdot} > 4\chi/(I-1)\big) = \mathbb{E}^{(\pi)}_{\mathcal{A}}\Big[\frac{1}{I-1}\sum_{k=2}^I \mathbf{1}\{N_\delta \le \chi,\ N_{k,\cdot} > 4\chi/(I-1)\}\Big].$$
Since $\sum_{k=2}^K N_{k,\cdot} \le N_\delta$, on the event $\{N_\delta \le \chi\}$ at most $(I-1)/4$ indices $k \in \{2,\dots,I\}$ can satisfy $N_{k,\cdot} > 4\chi/(I-1)$. Hence,
$$\frac{1}{I-1}\sum_{k=2}^I \mathbf{1}\{N_\delta \le \chi,\ N_{k,\cdot} > 4\chi/(I-1)\} \le \frac{1}{4},$$
which yields
$$\frac{1}{I-1}\sum_{k=2}^I P^{(\pi)}_{\tilde{\mathcal{A}}_k}\big(N_\delta \le \chi,\ N_{k,\cdot} > 4\chi/(I-1)\big) \le \frac{1}{4}. \quad (127)$$
Combining (125), (126), and (127), we obtain
$$\frac{1}{I-1}\sum_{k=2}^I \mathrm{TV}\big(P^{(\pi,k)}_{\tilde{\mathcal{A}}_k}, P^{(\pi)}_{\tilde{\mathcal{A}}_k}\big) \ge \frac{1}{I-1}\sum_{k=2}^I \big(P^{(\pi,k)}_{\tilde{\mathcal{A}}_k}(B_k^c) - P^{(\pi)}_{\tilde{\mathcal{A}}_k}(B_k^c)\big) \ge 1 - 2\delta - \frac{1}{I-1}\sum_{k=2}^I P^{(\pi)}_{\tilde{\mathcal{A}}_k}(B_k^c) \ge (1-2\delta) - \Big(\delta + \frac14\Big) \ge \frac12,$$
where the last inequality uses the assumption $\delta \le 1/12$. Finally, we apply a data-processing inequality. In this regime, which does not depend on $\delta$, we use Pinsker's inequality, which implies that
$$\frac12 \le \frac{1}{I-1}\sum_{k=2}^I \mathrm{TV}\big(P^{(\pi,k)}_{\tilde{\mathcal{A}}_k}, P^{(\pi)}_{\tilde{\mathcal{A}}_k}\big) \le \frac{1}{I-1}\sum_{k=2}^I \sqrt{\tfrac12\,\mathrm{KL}\big(P^{(\pi,k)}_{\tilde{\mathcal{A}}_k}, P^{(\pi)}_{\tilde{\mathcal{A}}_k}\big)}. \quad (128)$$
Remarks. We can again interpret this result as a reduction argument. Averaging (125), (126), and (127) over $\pi \sim \mathrm{Unif}(\Pi(\Sigma))$, we obtain that there exists some $k \in \{2,\dots,I\}$ (independent of $\pi$) such that $\tilde{\mathcal{A}}_k$ is $1/2$-correct, with budget at most $4\chi/(I-1)$, for the active signal detection problem
$$H_0 : \mu = 0 \quad \text{vs} \quad H_1 : \mu = \big(\Delta_{k,\pi_k(\ell)}\big)_{\ell\in\Sigma_{-k}},\ \pi_k \sim \mathrm{Unif}(\Pi_k(\Sigma)).$$
Step 3: computing the KL divergence. We now conclude by computing the KL divergence above. Fix $k \in \{2,\dots,I\}$. Given $\pi'_k \in \Pi_k(\Sigma)$, we write $\pi' = (\pi_2,\dots,\pi_{k-1},\pi'_k,\pi_{k+1},\dots)$.
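The counting step behind (127) is a plain pigeonhole argument: if the per-row duel counts sum to at most $\chi$, no more than a quarter of the $I-1$ rows can exceed four times the average $\chi/(I-1)$. A small simulation over arbitrary (hypothetical) budget splits illustrates this:

```python
import random

def over_budget_fraction(counts, chi, I):
    """Fraction of rows whose duel count exceeds 4*chi/(I - 1)."""
    return sum(c > 4 * chi / (I - 1) for c in counts) / (I - 1)

random.seed(0)
I, chi = 21, 10_000
for _ in range(100):
    # Random split of a total budget chi across the I - 1 rows.
    cuts = sorted(random.randint(0, chi) for _ in range(I - 2))
    counts = [b - a for a, b in zip([0] + cuts, cuts + [chi])]
    assert sum(counts) == chi
    # Pigeonhole: more than (I-1)/4 rows above 4*chi/(I-1) would force
    # the total above chi, a contradiction.
    assert over_budget_fraction(counts, chi, I) <= 1 / 4
```

The same counting argument reappears in the fixed-budget proofs with threshold $16T/K$.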
Using the same computation as in Equation (120), we obtain
$$\frac{1}{|\Pi_k(\Sigma)|}\sum_{\pi'_k\in\Pi_k(\Sigma)} \mathrm{KL}\big(P^{(\pi,k)}_{\tilde{\mathcal{A}}_k}, P^{(\pi')}_{\tilde{\mathcal{A}}_k}\big) \le 8\log\big(\tfrac43\big)\Big(\sum_{j\in\Sigma_{-k}}\mathbb{E}^{(\pi,k)}_{\tilde{\mathcal{A}}_k}\big[N_{\{k,j\}}\big]\Big)\beta_k^2 \le 8\log\big(\tfrac43\big)\cdot\frac{4\chi}{I-1}\cdot\beta_I^2,$$
where the last inequality uses the facts that $\sum_{j\in\Sigma_{-k}}\mathbb{E}^{(\pi,k)}_{\tilde{\mathcal{A}}_k}[N_{\{k,j\}}] \le 4\chi/(I-1)$ by construction of $\tilde{\mathcal{A}}_k$, and that, by our ordering assumption, $\beta_k^2 \le \beta_I^2$ for all $k \in \{2,\dots,I\}$. Averaging additionally over all permutations $\pi = (\pi_2,\dots,\pi_K) \in \Pi(\Sigma)$, we obtain
$$\frac{1}{|\Pi(\Sigma)|}\sum_{\pi\in\Pi(\Sigma)}\frac{1}{I-1}\sum_{k=2}^I \mathrm{KL}\big(P^{(\pi,k)}_{\tilde{\mathcal{A}}_k}, P^{(\pi)}_{\tilde{\mathcal{A}}_k}\big) \le 8\log\big(\tfrac43\big)\cdot\frac{4\chi}{I-1}\cdot\beta_I^2. \quad (129)$$
Finally, combining Pinsker's inequality (128) with Jensen's inequality, we get
$$\frac12 \le \frac{1}{|\Pi(\Sigma)|}\sum_{\pi\in\Pi(\Sigma)}\frac{1}{I-1}\sum_{k=2}^I \sqrt{\tfrac12\,\mathrm{KL}\big(P^{(\pi,k)}_{\tilde{\mathcal{A}}_k}, P^{(\pi)}_{\tilde{\mathcal{A}}_k}\big)} \le \sqrt{\frac12\cdot\frac{1}{|\Pi(\Sigma)|}\sum_{\pi\in\Pi(\Sigma)}\frac{1}{I-1}\sum_{k=2}^I \mathrm{KL}\big(P^{(\pi,k)}_{\tilde{\mathcal{A}}_k}, P^{(\pi)}_{\tilde{\mathcal{A}}_k}\big)} \le \sqrt{\frac12\cdot 8\log\big(\tfrac43\big)\cdot\frac{4\chi}{I-1}\cdot\beta_I^2},$$
where the last inequality follows from (129). Rearranging yields
$$\chi \ge \frac{1}{64\log(4/3)}\,\frac{I-1}{\beta_I^2} = \frac{1}{64\log(4/3)}\,\max_{i=2,\dots,K}\frac{i-1}{\beta_i^2}, \quad (130)$$
where the second equality follows from the definition of $I$ (see (123)). From Lemma H.1, we have
$$\max_{i=2,\dots,K}\frac{i-1}{\beta_i^2} \ge \frac{1}{\log(2K)}\sum_{i=2}^K \frac{1}{\beta_i^2} = \frac{1}{\log(2K)}\sum_{i=2}^K \frac{|\Sigma_{-i}|}{\|\Delta_{-i}\|^2}.$$
Step 4: choice of convention $\Sigma$ and conclusion. We claim that there exists a tie-breaking convention $f_{\mathrm{tb}}$ that satisfies
$$\sum_{i=2}^K \frac{|\Sigma_{-i}|}{\|\Delta_{-i}\|^2} \ge \frac12 \sum_{i=2}^K \frac{K_{i;\le 0}}{\|\Delta_{-i}\|^2}. \quad (131)$$
Then, the conclusion (114) directly follows from (130) together with (131). We finally finish with a technical construction of a tie-breaking convention that satisfies (131). Consider any suboptimal arm $i \neq 1$. For any $j \neq i$, it holds that
$$j \in \Sigma_{-i} \iff \big(\Delta_{i,j} < 0\big) \text{ or } \big(\Delta_{i,j} = 0 \text{ and } f_{\mathrm{tb}}(i,j) = -1\big),$$
so that
$$|\Sigma_{-i}| = \sum_{j\neq i} \Big(\mathbf{1}\{\Delta_{i,j} < 0\} + \mathbf{1}\{\Delta_{i,j} = 0\}\,\mathbf{1}\{f_{\mathrm{tb}}(i,j) = -1\}\Big).$$
Summing over $i = 2,$
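The chain of inequalities combining Pinsker and Jensen can be checked numerically on simple Bernoulli pairs (illustrative values; the distributions $P^{(\pi,k)}$ and $P^{(\pi)}$ in the proof are trajectory distributions, for which the same inequalities hold):

```python
import math

def kl(p, q):
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def tv(p, q):  # total variation between Bernoulli(p) and Bernoulli(q)
    return abs(p - q)

pairs = [(0.5, 0.52), (0.5, 0.6), (0.45, 0.55), (0.5, 0.7)]

avg_tv = sum(tv(p, q) for p, q in pairs) / len(pairs)
# Pinsker term by term: TV <= sqrt(KL / 2) ...
avg_root_kl = sum(math.sqrt(kl(p, q) / 2) for p, q in pairs) / len(pairs)
# ... then Jensen (sqrt is concave): average of roots <= root of average.
root_avg_kl = math.sqrt(sum(kl(p, q) / 2 for p, q in pairs) / len(pairs))

assert avg_tv <= avg_root_kl <= root_avg_kl
```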
$\dots, K$ gives
$$\sum_{i=2}^K \frac{|\Sigma_{-i}|}{\|\Delta_{-i}\|^2} = \sum_{i=2}^K\sum_{j\neq i}\frac{\mathbf{1}\{\Delta_{i,j}<0\}}{\|\Delta_{-i}\|^2} + \sum_{i=2}^K\frac{\mathbf{1}\{\Delta_{i,1}=0\}\,\mathbf{1}\{f_{\mathrm{tb}}(i,1)=-1\}}{\|\Delta_{-i}\|^2} + \sum_{2\le i<j\le K}\mathbf{1}\{\Delta_{i,j}=0\}\Big(\frac{\mathbf{1}\{f_{\mathrm{tb}}(i,j)=-1\}}{\|\Delta_{-i}\|^2} + \frac{\mathbf{1}\{f_{\mathrm{tb}}(j,i)=-1\}}{\|\Delta_{-j}\|^2}\Big),$$
where the tie-breaking convention is defined by
$$f_{\mathrm{tb}}(i,j) = \begin{cases}\mathbf{1}\{\|\Delta_{-i}\| > \|\Delta_{-j}\|\} - \mathbf{1}\{\|\Delta_{-i}\| < \|\Delta_{-j}\|\}, & \text{if } 2\le i<j \text{ and } \|\Delta_{-i}\| \neq \|\Delta_{-j}\|,\\ 1, & \text{if } 2\le i<j \text{ and } \|\Delta_{-i}\| = \|\Delta_{-j}\|,\end{cases}$$
together with $f_{\mathrm{tb}}(i,1) = -1$ for every $i\neq 1$. The convention $f_{\mathrm{tb}}$ for the row of the CW implies that
$$\sum_{i=2}^K \frac{1}{\|\Delta_{-i}\|^2}\mathbf{1}\{\Delta_{i,1}=0\}\,\mathbf{1}\{f_{\mathrm{tb}}(i,1)=-1\} = \sum_{i=2}^K \frac{1}{\|\Delta_{-i}\|^2}\mathbf{1}\{\Delta_{i,1}=0\}. \quad (133)$$
Moreover, the expression of $f_{\mathrm{tb}}(i,j)$ for $2\le i<j\le K$ is chosen so that each tied pair $(i,j)$ contributes the larger of $1/\|\Delta_{-i}\|^2$ and $1/\|\Delta_{-j}\|^2$, which is at least half of their sum; summing these contributions yields (131).

F.5 Proof of Corollary 3.3

Let $\epsilon > 0$ be small. For simplicity, assume $K$ is a multiple of $8$, and set $d = K/2$. We construct the $K\times K$ antisymmetric matrix $M^\epsilon = M^\epsilon(s,\Delta)$:
$$M^\epsilon = \begin{pmatrix} A & -D \\ D^\top & \Lambda\end{pmatrix}, \quad (135)$$
where $A$, $D$, and $\Lambda$ are $d\times d$ matrices defined below. The matrix $A$ is the $d\times d$ antisymmetric matrix with first row $A_{1,\cdot} = (0,\epsilon,\dots,\epsilon) \in \mathbb{R}^d$, and for $i=2,\dots,d$,
$$A_{i,\cdot} = (-\epsilon,\dots,-\epsilon,\underbrace{0}_{i\text{-th}},\epsilon,\dots,\epsilon) \in \mathbb{R}^d.$$
The matrix $D$ is the $d\times d$ matrix with nonnegative entries where $D_{1,\cdot} = (\epsilon,\dots,\epsilon) \in \mathbb{R}^d$, and for $i=2,\dots,d$,
$$D_{i,\cdot} = (\underbrace{\Delta_i,\dots,\Delta_i}_{s_i\text{ times}},\epsilon,\dots,\epsilon) \in \mathbb{R}^d,$$
which is possible since $s_i \le d$. To construct $\Lambda$, recall that $d$ is a multiple of $4$ and $s_i \in \{1,\dots,d/4\}$. Define $\Lambda$ as the block matrix
$$\Lambda = \begin{pmatrix} J_\epsilon & -\Lambda^{(0)} & \epsilon\mathbf{1} & \Lambda^{(3)} \\ \Lambda^{(0)} & J_\epsilon & -\Lambda^{(1)} & \epsilon\mathbf{1} \\ -\epsilon\mathbf{1} & \Lambda^{(1)} & J_\epsilon & -\Lambda^{(2)} \\ -\Lambda^{(3)} & -\epsilon\mathbf{1} & \Lambda^{(2)} & J_\epsilon \end{pmatrix},$$
where $\mathbf{1}$ is the $(d/4)\times(d/4)$ all-ones matrix, $J_\epsilon$ is the $(d/4)\times(d/4)$ antisymmetric matrix with $\epsilon$ above the diagonal, and for $l \in \{0,\dots,3\}$, $\Lambda^{(l)}$ is the $(d/4)\times(d/4)$ matrix whose $i$-th row is
$$\Lambda^{(l)}_{i,\cdot} = (\underbrace{\Delta_j,\dots,\Delta_j}_{s_j\text{ times}},\epsilon,\dots,\epsilon) \in \mathbb{R}^{d/4}, \qquad j = d + \frac{d}{4}l + i.$$
The matrix $M^\epsilon = M(s,\Delta,\epsilon)$ is clearly antisymmetric. Its first row is $(0,\epsilon,\dots$
$\dots,\epsilon)$, so $i^* = 1$ and $M^\epsilon \in \mathcal{D}_{\mathrm{cw}}$. For each arm $i = 2,\dots,K$, row $i$ of $M^\epsilon$ contains exactly $s_i$ entries equal to $-\Delta_i$, with all other negative entries equal to $-\epsilon$. For sufficiently small $\epsilon > 0$, the optimal sparsity $s^*_{M^\epsilon}$ achieving the minimum in (4) equals $s$, with associated gaps $(M^\epsilon_{i,(s_i)})_{i\neq i^*} = (-\Delta_i)_{i\neq i^*}$. Moreover, $M^\epsilon$ has no ties since $\Delta_i \neq 0$ and $\epsilon > 0$.

Consider Corollary 3.3. Let $\delta \le 1/12$. For $\Delta \in \mathcal{D}_{\mathrm{cw}}$, the construction yields $\epsilon > 0$ small such that $M^\epsilon(s^*_\Delta, \Delta_{(s^*)}) \in \mathcal{D}(\Delta)$ with no ties. Theorem 3.2 applied to $M^\epsilon$ gives $\tilde\Delta \in \mathcal{D}(M^\epsilon) = \mathcal{D}(\Delta)$ satisfying
$$P_{\tilde\Delta,\mathcal{A}}\Bigg(N_\delta \ge \frac13 \max_{i\neq i^*}\frac{K^\epsilon_{i;<0}}{\|(M^\epsilon_i)_-\|^2}\log\Big(\frac{1}{6\delta}\Big) \vee \frac{1}{37\log(2K)}\sum_{i\neq i^*}\frac{K^\epsilon_{i;<0}}{\|(M^\epsilon_i)_-\|^2}\Bigg) \ge \delta,$$
where $K^\epsilon_{i;<0}$ counts the negative entries of row $i$. By construction, $K^\epsilon_{i;<0} \ge K/8$ for $i = 2,\dots,K$. For small $\epsilon$,
$$s^*_i\Delta^2_{i,(s^*_i)} \le \|(M^\epsilon_i)_-\|^2 \le s^*_i\Delta^2_{i,(s^*_i)} + (K - s^*_i)\epsilon^2 \le 2\,s^*_i\Delta^2_{i,(s^*_i)},$$
yielding
$$P_{\tilde\Delta,\mathcal{A}}\Bigg(N_\delta \ge \frac{1}{48}\max_{i\neq i^*}\frac{K}{s^*_i\Delta^2_{i,(s^*_i)}}\log\Big(\frac{1}{6\delta}\Big) \vee \frac{1}{592\log(2K)}\sum_{i\neq i^*}\frac{K}{s^*_i\Delta^2_{i,(s^*_i)}}\Bigg) \ge \delta.$$
This scales as $H_{\mathrm{explore}}(s^*)$ up to $\log K$ factors, proving the first part of Corollary 3.3. The $H_{\mathrm{certify}}(s^*,\delta)$ term follows from the quantile bound in Theorem 3.1: by construction of $M^\epsilon$, we have $\tilde\Delta_{i,(1)} = \tilde\Delta_{i,(s^*_i)}$ for all $i \neq i^*$, so
$$H_{\mathrm{certify}}(s^*,\delta) = \sum_{i\neq i^*}\frac{1}{\Delta_{i,(s^*_i)}} = \sum_{i\neq i^*}\frac{1}{\tilde\Delta_{i,(1)}},$$
and the lower bound applies directly.

F.6 Lower Bounds Preserving CW Row Structure

Consider $\Delta \in \mathcal{D}_{\mathrm{cw}}$. For simplicity, assume that $\Delta$ has no ties.⁴ Let $\Sigma$ be its sign matrix as in Equation (109), and let $\pi \in \Pi(\Sigma)$ (see (110)) with associated matrix $\Delta^{(\pi)}$ defined in Equation (111). By Lemma F.1, $\Delta^{(\pi)}$ has the same gap structure as $\Delta$: it preserves all signs (hence all pairwise preferences) and gap magnitudes up to reordering.
However, $\Delta^{(\pi)}$ may alter the Condorcet winner row, so in general $H_{\mathrm{cw}}(\Delta^{(\pi)}) \neq H_{\mathrm{cw}}(\Delta)$, and the construction can even drastically increase it: $H_{\mathrm{cw}}(\Delta^{(\pi)}) \gg H_{\mathrm{cw}}(\Delta)$. In this section, we explain how the lower bound techniques from the proof of Theorem 3.2 can be adapted to also preserve the CW row. To this end, define $\tilde\Pi(\Sigma) \subset \Pi(\Sigma)$ as the subset of permutations preserving the CW row. For each $i \in \{2,\dots,K\}$, let $\tilde\Pi_i(\Sigma)$ be the permutations of $\Sigma_{-i}$ that fix $i^* = 1$:
$$\tilde\Pi_i(\Sigma) := \big\{\pi_i : \Sigma_{-i}\to\Sigma_{-i} \,\big|\, \pi_i \text{ is a bijection and } \pi_i(1) = 1\big\}, \quad (136)$$
and set $\tilde\Pi(\Sigma) := \tilde\Pi_2(\Sigma)\times\dots\times\tilde\Pi_K(\Sigma)$. For $\pi \in \tilde\Pi(\Sigma)$, construct $\Delta^{(\pi)}$ via Equation (111). In addition to properties 1, 3, and 4 of Lemma F.1, we have:

Lemma F.3. For any $\pi \in \tilde\Pi(\Sigma)$, $\Delta^{(\pi)}$ satisfies:
2'. $\Delta^{(\pi)}_{1,\cdot} = \Delta_{1,\cdot}$ (CW row preservation).

We can then derive the following theorem, analogous to Theorem 3.2:

Theorem F.4. Let $\mathcal{A}$ be a $\delta$-correct algorithm over $\mathcal{D}_{\mathrm{cw}}$, with $\delta \le 1/12$. Let $\Delta \in \mathcal{D}_{\mathrm{cw}}$ have no ties. Define
$$\tilde\chi := \inf\Big\{x > 0 : \sup_{\pi\in\tilde\Pi(\Sigma)} P_{\Delta^{(\pi)},\mathcal{A}}(N_\delta > x) \le \delta\Big\}. \quad (137)$$
Then,
$$\tilde\chi \ge \frac{1}{16\log(4/3)}\,\max_{i\neq i^*}\Bigg(\frac{1}{\Delta^2_{i^*,i}} \wedge \frac{K_{i;<0}}{\|\Delta_{-i}\|^2}\Bigg)\log\Big(\frac{1}{6\delta}\Big), \quad (138)$$
$$\tilde\chi \ge \frac{1}{128\log(4/3)}\,\frac{1}{\log(2K)}\sum_{i\neq i^*}\Bigg(\frac{1}{\Delta^2_{i^*,i}} \wedge \frac{K_{i;<0}}{\|\Delta_{-i}\|^2}\Bigg). \quad (139)$$
Similarly to Section 3, we define a subclass of $\mathcal{D}(\Delta)$ (see (5)) that additionally preserves the CW row $\Delta_{i^*,\cdot}$:
$$\mathcal{D}_0(\Delta) = \big\{\tilde\Delta \in \mathcal{D}(\Delta) \text{ s.t. } (\tilde\Delta_{i^*,i})_{i\neq i^*} = (\Delta_{i^*,i})_{i\neq i^*}\big\}. \quad (140)$$
Corollary F.5. Let $\mathcal{A}$ be a $\delta$-correct algorithm over $\mathcal{D}_{\mathrm{cw}}$, with $\delta \le 1/12$. Let $\Delta \in \mathcal{D}_{\mathrm{cw}}$.
Then there exists $\tilde\Delta \in \mathcal{D}_0(\Delta)$ such that, with $P_{\tilde\Delta,\mathcal{A}}$-probability at least $\delta$, the budget $N_\delta$ satisfies
$$N_\delta \gtrsim \sum_{i\neq i^*}\frac{\log(1/\delta)}{\Delta^2_{i^*,i}\vee\Delta^2_{i,(s^*_i)}} + \max_{i\neq i^*}\frac{\log(1/\delta)}{\Delta^2_{i^*,i}\vee \|\Delta_{i,(s^*_i)}\|^2/K_{i;<0}} + \sum_{i\neq i^*}\frac{1}{\Delta^2_{i^*,i}\vee \|\Delta_{i,(s^*_i)}\|^2/K_{i;<0}}, \quad (141)$$
where $\gtrsim$ hides logarithmic factors in $K$ and numerical constants.

Proof. The proof follows by taking $M^\epsilon$ as in the proof of Corollary 3.3 (see Appendix F.5), except with the first row fixed as $\Delta_{i^*,\cdot}$. This constructs $M^\epsilon \in \mathcal{D}_0(\Delta)$ where each row $i$ has at least $K_{i;<0}$ negative entries. The corollary then follows from Theorem F.4 and the quantile bound in Theorem 3.1.

⁴ If $\Delta$ contains ties, we fix the convention $f_{\mathrm{tb}} \equiv 0$, i.e., we never permute zero entries, hence keeping them uninformative.

Remarks. This reveals the fundamental trade-off between eliminating suboptimal arms against the CW and finding better competitors among them. We identify three regimes. When the CW is the strongest opponent (CW-SO), $H_{\mathrm{cw}}$ was already proved from Theorem 3.1 to be high-probability optimal, achieved by Algorithm 2 and Maiti et al. [2024]. Actually, Karnin [2016] proves that it is even optimal for the expectation of the budget, at least in the asymptotic regime $\delta \to 0$. In the CW-uniformly-poor-opponent regime,
$$\forall i\neq i^*, \quad \Delta^2_{i^*,i} \le \frac{\|\Delta_{-i}\|^2}{K_{i;<0}}, \quad \text{(CW-PO)}$$
we have $H_{\mathrm{cw}} \ge H_{\mathrm{certify}}(s^*,\delta) + H_{\mathrm{explore}}(s^*,\delta)$. Our bound (4) improves Maiti [2025], and Corollary F.5 proves minimax optimality of $H_{\mathrm{certify}}(s^*,\delta) + H_{\mathrm{explore}}(s^*,\delta)$ over $\mathcal{D}_0(\Delta)$. In the CW-intermediate-opponent regime,
$$\forall i\neq i^*, \quad \frac{\|\Delta_{-i}\|^2}{K_{i;<0}} \le \Delta^2_{i^*,i} \le \Delta^2_{i,(s^*_i)}, \quad \text{(CW-IO)}$$
a transition occurs between the constant-$\delta$ regime (where the lower bound matches $H_{\mathrm{cw}}$) and the $\delta\to 0$ regime (where it can be much smaller).
A finer combinatorial analysis is needed to pinpoint the exact trade-off.

Proof of Theorem F.4. Assume without loss of generality that $i^* = 1$. We start with the proof of Equation (138), which follows the proof of Equation (113). From careful inspection, Steps 1, 2, and 3 apply verbatim, replacing $\chi$ by $\tilde\chi$ and $\Pi$ by $\tilde\Pi$. Fix $k \neq i^*$. With the same notation and construction, one constructs $\tilde{\mathcal{A}}$ with budget upper bounded by $\tilde\chi$ such that
$$\log\Big(\frac{1}{6\delta}\Big) \le 8\log\big(\tfrac43\big)\frac{1}{|\tilde\Pi_k(\Sigma)|}\sum_{\pi_k\in\tilde\Pi_k(\Sigma)}\sum_{j\in\Sigma_{-k}}\mathbb{E}^{(\pi,k)}_{\tilde{\mathcal{A}}}\big[N_{\{k,j\}}\big]\Delta^2_{k,\pi_k(j)}, \quad (142)$$
with the only difference lying in the subsequent computation. For $\pi \in \tilde\Pi_k$, we have $\pi_k(1) = 1$ and $\pi_k|_{\Sigma_{-k}\setminus\{1\}}$ is a bijection, so $\tilde\Pi_k(\Sigma)$ can be identified with the symmetric group $\mathcal{S}_{k_i}$, where $k_i := |\Sigma_{-k}\setminus\{1\}| = K_{k;<0}-1$. Separating the role of $1$ and the rest of $\Sigma_{-k}$ in (142),
$$\frac{1}{|\tilde\Pi_k(\Sigma)|}\sum_{\pi_k\in\tilde\Pi_k(\Sigma)}\sum_{j\in\Sigma_{-k}}\mathbb{E}^{(\pi,k)}_{\tilde{\mathcal{A}}}\big[N_{\{k,j\}}\big]\Delta^2_{k,\pi_k(j)} = \mathbb{E}^{(\pi,k)}_{\tilde{\mathcal{A}}}\big[N_{\{k,1\}}\big]\Delta^2_{k,1} + \frac{1}{|\mathcal{S}_{k_i}|}\sum_{\pi_k\in\tilde\Pi_k(\Sigma)}\sum_{j\in\Sigma_{-k}\setminus\{1\}}\mathbb{E}^{(\pi,k)}_{\tilde{\mathcal{A}}}\big[N_{\{k,j\}}\big]\Delta^2_{k,\pi_k(j)},$$
where $\mathbb{E}^{(\pi,k)}_{\tilde{\mathcal{A}}}$ is independent of $\pi_k$. By Lemma H.3,
$$\frac{1}{8\log(4/3)}\log\Big(\frac{1}{6\delta}\Big) \le \mathbb{E}^{(\pi,k)}_{\tilde{\mathcal{A}}}\big[N_{\{k,1\}}\big]\Delta^2_{k,1} + \frac{1}{K_{k;<0}-1}\Big(\sum_{j\in\Sigma_{-k}\setminus\{1\}}\mathbb{E}^{(\pi,k)}_{\tilde{\mathcal{A}}}\big[N_{\{k,j\}}\big]\Big)\Big(\sum_{j\in\Sigma_{-k}\setminus\{1\}}\Delta^2_{k,j}\Big) \le \Delta^2_{k,1}\,\tilde\chi + \frac{\|\Delta_{-k}\|^2-\Delta^2_{k,1}}{K_{k;<0}-1}\,\tilde\chi,$$
using $\sum_{j\in\Sigma_{-k}\setminus\{1\}}\mathbb{E}^{(\pi,k)}_{\tilde{\mathcal{A}}}[N_{\{k,j\}}] \le \tilde\chi$ and $\sum_{j\in\Sigma_{-k}\setminus\{1\}}\Delta^2_{k,j} = \|\Delta_{-k}\|^2 - \Delta^2_{k,1}$. Finally,
$$\Delta^2_{k,1} + \frac{\|\Delta_{-k}\|^2-\Delta^2_{k,1}}{K_{k;<0}-1} \le 2\Big(\Delta^2_{k,1} \vee \frac{\|\Delta_{-k}\|^2}{K_{k;<0}}\Big).$$
Taking the maximum over $k \neq i^*$ yields
$$\tilde\chi \ge \frac{1}{16\log(4/3)}\max_{k\neq i^*}\frac{1}{\Delta^2_{k,1}\vee \|\Delta_{-k}\|^2/K_{k;<0}}\log\Big(\frac{1}{6\delta}\Big),$$
which is Equation (138). The proof of (139) follows the proof of (114) step by step; we highlight the differences below.

Step 1: Define $\tilde\beta_k^2 = \Delta^2_{k,1} \vee \|\Delta_{-k}\|^2/K_{k;<0}$ and $\tilde I := \arg\max_{k=2,\dots,K}(k-1)/\tilde\beta_k^2$.
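The last elementary inequality, $a + (S-a)/(n-1) \le 2(a \vee S/n)$ for $0 \le a \le S$ and $n \ge 2$ (here $a = \Delta^2_{k,1}$, $S = \|\Delta_{-k}\|^2$, $n = K_{k;<0}$), can be checked on a grid:

```python
def lhs(a, S, n):
    return a + (S - a) / (n - 1)

def rhs(a, S, n):
    return 2 * max(a, S / n)

S = 1.0
for n in range(2, 50):
    for i in range(101):
        a = S * i / 100  # a ranges over [0, S]
        # small tolerance for the equality case a = S/n
        assert lhs(a, S, n) <= rhs(a, S, n) + 1e-12
```

Equality holds at $a = S/n$; for $a$ on either side, one of the two terms in the maximum dominates the left-hand side.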
Step 2: The same event (using $\tilde I$) bounds the probabilities, so (128) holds.

Step 3: The KL upper bound computation adapts as above, yielding
$$\frac{1}{|\tilde\Pi|}\sum_{\pi\in\tilde\Pi}\frac{1}{\tilde I-1}\sum_{k=2}^{\tilde I}\mathrm{KL}\big(P^{(\pi,k)}_{\tilde{\mathcal{A}}_k}, P^{(\pi)}_{\tilde{\mathcal{A}}_k}\big) \le 8\log\big(\tfrac43\big)\cdot\frac{4\tilde\chi}{\tilde I-1}\cdot\tilde\beta^2_{\tilde I}, \quad (143)$$
which concludes the proof after rearranging.

Step 4 is unnecessary since $\Sigma$ avoids ties by convention.

G Proofs of Section B

In this section, we provide all lower bound proofs for the fixed-budget setting. As discussed at the end of Appendix B, these proofs closely parallel those in Appendix F, but the fixed-budget nature requires new arguments. For completeness, we revisit all results with minimax-style formulations and provide a nearly self-contained presentation. In Subsection G.1, we prove Theorem B.1. Theorem B.2 follows in Subsection G.2.

Roadmap for Fixed-Budget Lower Bounds

The proofs in this section follow the same three-step change-of-measure pattern introduced for the fixed-confidence case in Subsection F.1. However, the fixed-budget setting requires three important adaptations, which we now describe in detail. In the fixed-budget setting, we aim to lower bound the worst-case error
$$\inf_{\mathcal{A}}\sup_{M\in\mathcal{D}} P_{\mathcal{A},M}(\hat i_T \neq i^*),$$
where the infimum is over all algorithms $\mathcal{A}$ with fixed budget $T$, and $\mathcal{D}$ is some class of instances.

Step 1: Reference and alternative instances. In the fixed-confidence setting, the reference instance can be any arbitrary gap matrix $\Delta \in \mathcal{D}_{\mathrm{cw}}$. Here, we instead construct a highly symmetric reference matrix $M \in \mathcal{D}$ that serves as a "hard instance" for the minimax bound. For each suboptimal arm $k \neq i^*$, we construct an alternative instance $M^{(k)}$ by setting all negative entries in row $k$ (and the corresponding column entries, to preserve antisymmetry) to zero. This ensures $M^{(k)} \notin \mathcal{D}_{\mathrm{cw}}$.

Step 2: Separating event and total variation bound.
Unlike fixed-confidence algorithms, which use a stopping time $N_\delta$ to construct events on which the two distributions disagree, fixed-budget algorithms have no stopping rule. Instead, we exploit the symmetry of the reference matrix $M$, together with the recommendation rule.

Step 3: KL decomposition. This step is conceptually identical to the fixed-confidence proofs. Since $M$ and $M^{(k)}$ differ only in row and column $k$, the KL divergence decomposes as
$$\mathrm{KL}\big(P_M, P_{M^{(k)}}\big) = \sum_{j\neq k}\mathbb{E}_M\big[N_{\{k,j\}}\big]\,\mathrm{kl}\Big(\tfrac12+M_{k,j},\ \tfrac12+M^{(k)}_{k,j}\Big).$$
The reference matrix $M^{(1)}$ is defined entrywise by
$$M^{(1)}_{i,j} = \begin{cases}\Delta_j, & \text{if } i<j,\\ -\Delta_i, & \text{if } i>j,\\ 0, & \text{if } i=j.\end{cases}$$
Thus $M^{(1)}$ has the form
$$M^{(1)} = \begin{pmatrix}
0 & \Delta_2 & \Delta_3 & \Delta_4 & \cdots & \Delta_{K-1} & \Delta_K\\
-\Delta_2 & 0 & \Delta_3 & \Delta_4 & \cdots & \Delta_{K-1} & \Delta_K\\
-\Delta_3 & -\Delta_3 & 0 & \Delta_4 & \cdots & \Delta_{K-1} & \Delta_K\\
-\Delta_4 & -\Delta_4 & -\Delta_4 & 0 & \cdots & \Delta_{K-1} & \Delta_K\\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\
-\Delta_{K-1} & -\Delta_{K-1} & -\Delta_{K-1} & -\Delta_{K-1} & \cdots & 0 & \Delta_K\\
-\Delta_K & -\Delta_K & -\Delta_K & -\Delta_K & \cdots & -\Delta_K & 0
\end{pmatrix}.$$
By construction, we have $M^{(1)} \in \mathcal{D}^{(1)}(\Delta)$, and its Condorcet winner is $i^*(M^{(1)}) = 1$. For each $k \ge 2$, define the matrix $M^{(k)}$ as follows:
• For $i,j \neq k$, set $M^{(k)}_{i,j} = M^{(1)}_{i,j}$.
• For $j < k$, set $M^{(k)}_{k,j} = \Delta_{j+1}$ and $M^{(k)}_{j,k} = -\Delta_{j+1}$.
• For $j \ge k$, set $M^{(k)}_{k,j} = M^{(1)}_{k,j}$ and $M^{(k)}_{j,k} = M^{(1)}_{j,k}$.
The matrix $M^{(k)}$ can be written as
$$M^{(k)} = \begin{pmatrix}
0 & \Delta_2 & \cdots & \Delta_{k-1} & -\Delta_2 & \Delta_{k+1} & \cdots & \Delta_K\\
-\Delta_2 & 0 & \cdots & \Delta_{k-1} & -\Delta_3 & \Delta_{k+1} & \cdots & \Delta_K\\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & & \vdots\\
-\Delta_{k-1} & -\Delta_{k-1} & \cdots & 0 & -\Delta_k & \Delta_{k+1} & \cdots & \Delta_K\\
\Delta_2 & \Delta_3 & \cdots & \Delta_k & 0 & \Delta_{k+1} & \cdots & \Delta_K\\
-\Delta_{k+1} & -\Delta_{k+1} & \cdots & -\Delta_{k+1} & -\Delta_{k+1} & 0 & \cdots & \Delta_K\\
\vdots & \vdots & & \vdots & \vdots & \vdots & \ddots & \vdots\\
-\Delta_K & -\Delta_K & \cdots & -\Delta_K & -\Delta_K & -\Delta_K & \cdots & 0
\end{pmatrix},$$
where the modified entries (row and column $k$) indicate the differences with respect to $M^{(1)}$. It is straightforward to check that, for each $k$, $M^{(k)} \in \mathcal{D}_{\mathrm{cw}}$.
These matrices have three key properties: (i) $M^{(k)}$ does not have the same Condorcet winner as $M^{(1)}$; indeed, $i^*(M^{(k)}) = k$; (ii) we have $M^{(k)} \in \mathcal{D}^{(1)}(\Delta)$; indeed, the $k$-th row $M^{(k)}_{k,\cdot}$ is equal to $\Delta$ up to a permutation; and (iii) the environment with gap matrix $M^{(k)}$ is difficult to distinguish from the one defined by $M^{(1)}$ in terms of KL divergence. For $k \ge 2$, denote by $P^{(k)}$ the distribution of the data when the underlying gap matrix is $M^{(k)}$.

Step 2: TV bound. Let $\mathcal{A}$ be an algorithm with fixed budget $T$ over $\mathcal{D}^{(1)}(\Delta)$, and let $\hat i$ denote its output. For any $k \ge 1$, when the true gap matrix is $M^{(k)}$, the Condorcet winner is $k$, and $M^{(k)} \in \mathcal{D}^{(1)}(\Delta)$. Then, the definition of $\epsilon_T$ (see (144)) implies
$$\forall k\in[K], \quad P^{(k)}(\hat i \neq k) \le \epsilon_T.$$
In particular, we have
$$1 - 2\epsilon_T \le P^{(1)}(\hat i \neq k) - P^{(k)}(\hat i \neq k) \le \mathrm{TV}\big(P^{(1)}, P^{(k)}\big).$$
Then, with the Bretagnolle–Huber inequality, we have
$$1 - 2\epsilon_T \le \mathrm{TV}\big(P^{(1)}, P^{(k)}\big) \le 1 - \frac12\exp\big(-\mathrm{KL}(P^{(1)}, P^{(k)})\big).$$
Step 3: computing the KL divergence and concluding. For $i < j$ in $[K]$, let $N_{\{i,j\}}$ denote the total number of observed duels between $i$ and $j$ under algorithm $\mathcal{A}$. Using the divergence decomposition lemma (Lemma 15.1 in Lattimore and Szepesvári, 2020), we have
$$\mathrm{KL}\big(P^{(1)}, P^{(k)}\big) = \sum_{1\le i<j\le K}\mathbb{E}^{(1)}\big[N_{\{i,j\}}\big]\,\mathrm{kl}\Big(\tfrac12+M^{(1)}_{i,j},\ \tfrac12+M^{(k)}_{i,j}\Big).$$

Proof of Lemma G.1. Let $\epsilon > 0$ be small. Consider the modified matrix $\Delta^\epsilon$ obtained from $\Delta$ by lifting to $\epsilon > 0$ all off-diagonal null entries of the CW row $i^*$ (and setting the $(j,i^*)$ entries to $-\epsilon$), so that, for $\epsilon$ small enough, $\Delta^\epsilon$ admits $i^*$ as a (strong) Condorcet winner, and $\Delta^\epsilon \in \mathcal{D}_{\mathrm{cw}}$. Moreover, for $\epsilon$ small enough, it holds that $s^*_{\Delta^\epsilon} = s^*_\Delta$ and $\Delta^\epsilon_{(s^*)} = \Delta_{(s^*)}$. Then,
$$P_{\mathcal{A},\Delta^\epsilon}\big(\hat i_T \neq i^*(\Delta)\big) \le \max_{\Delta'\in\mathcal{D}^{(2)}(\Delta,s)} P_{\mathcal{A},\Delta'}\big(\hat i_T \neq i^*(\Delta')\big).$$
Since $\mathcal{A}$ has a fixed budget $T$, one can take the limit $\epsilon \to 0$ in the inequality above.
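The Bretagnolle–Huber inequality used in Step 2, $\mathrm{TV}(P,Q) \le 1 - \frac12\exp(-\mathrm{KL}(P,Q))$, can be sanity-checked on Bernoulli distributions (illustrative parameter values, not the trajectory distributions of the proof):

```python
import math

def kl(p, q):
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def tv(p, q):  # total variation between Bernoulli(p) and Bernoulli(q)
    return abs(p - q)

# Bretagnolle-Huber: TV <= 1 - 0.5 * exp(-KL), even for very distant pairs.
for p, q in [(0.5, 0.6), (0.3, 0.35), (0.1, 0.9), (0.5, 0.99)]:
    assert tv(p, q) <= 1 - 0.5 * math.exp(-kl(p, q))
```

Rearranged as in the proof, $1 - 2\epsilon_T \le 1 - \frac12 e^{-\mathrm{KL}}$ gives $\mathrm{KL}(P^{(1)}, P^{(k)}) \ge \log\big(1/(4\epsilon_T)\big)$: a small error forces a large divergence.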
Taking a maximum over $\Delta' \in \mathcal{D}^{(3)}(\Delta,s)$, one therefore obtains
$$\max_{\Delta'\in\mathcal{D}^{(3)}(\Delta,s)} P_{\mathcal{A},\Delta'}\big(\hat i_T \neq i^*(\Delta')\big) \le \max_{\Delta'\in\mathcal{D}^{(2)}(\Delta,s)} P_{\mathcal{A},\Delta'}\big(\hat i_T \neq i^*(\Delta')\big).$$
We have $\mathcal{D}^{(2)}(\Delta,s) \subset \mathcal{D}^{(3)}(\Delta,s)$, so the other side of the inequality is clear.

Now, we are ready to prove Theorem B.2. Assume that $K$ is a multiple of $8$, and denote $d = K/2$. Let $(\Delta,s)$ be such that $\Delta = (\Delta_i)_{i\in[K]}$ with $\Delta_1 = 0$ and $(\Delta_2,\dots,\Delta_K) \in (0,1/4)^{K-1}$. Let $s = (s_1,s_2,\dots,s_K)$ with $s_1 = 0$ and $1 \le s_i \le K/4$ for $i = 2,\dots,K$. Consider the class $\mathcal{D}^{(3)}(\Delta,s)$ as defined in Equation (148). Fix an algorithm $\mathcal{A}$ with a fixed budget $T$, and define the maximum error of $\mathcal{A}$ across $\mathcal{D}^{(3)}(\Delta,s)$ as
$$\epsilon_T := \max_{\Delta'\in\mathcal{D}^{(3)}(\Delta,s)} P_{\mathcal{A},\Delta'}\big(\hat i_T \neq i^*(\Delta')\big). \quad (149)$$
The proof of Theorem B.2 is divided into three lemmas, corresponding to the three terms in the lower bound.

Lemma G.2. We have
$$\epsilon_T \ge \frac14\exp\Bigg(-16\log\big(\tfrac43\big)\,\frac{T}{d}\,\max_{i=2,\dots,K} s_i\Delta_i^2\Bigg). \quad (150)$$
Lemma G.3. If $(\Delta,s)$ is constant over the indices $i \in \{2,\dots,d\}$, that is, if there exist $(\mu,s)$ such that, for any $i \in \{2,\dots,d\}$, $\Delta_i = \mu$ and $s_i = s$, then
$$\epsilon_T \ge \frac12 - \sqrt{128\log\big(\tfrac43\big)\,\frac{s\mu^2\,T}{K^2}}. \quad (151)$$
Lemma G.4. Under the same assumption as Lemma G.3,
$$\epsilon_T \ge \frac14\exp\Bigg(-32\log\big(\tfrac43\big)\,\frac{T\mu^2}{K}\Bigg). \quad (152)$$
Proof of Theorem B.2. Recall that $\epsilon_T$ (see (149)) is the maximum error over $\mathcal{D}^{(3)}(\Delta,s)$. From Lemma G.1, it is also equal to the maximum error over $\mathcal{D}^{(2)}(\Delta,s)$. Together, Lemmas G.2, G.3, and G.4 directly imply Theorem B.2.

Proof of Lemma G.2. Recalling the definition of $\epsilon_T$ from (149), we want to prove the following bound, equivalent to (150):
$$T \ge \frac{1}{16\log(4/3)}\,\frac{d}{\max_{i=2,\dots,K} s_i\Delta_i^2}\,\log\Big(\frac{1}{4\epsilon_T}\Big).$$
Step 1: reference matrix $M^\pi$. For the reference, we consider the same matrix as in Subsection F.5, taking $\epsilon = 0$. For completeness, we recall this construction here.
We assumed for simplicity that $K$ is a multiple of $8$, and denote $d = K/2$. Consider the $K\times K$ antisymmetric matrix $M$ defined by
$$M = \begin{pmatrix} 0 & -D \\ D^\top & \Lambda \end{pmatrix}, \quad (153)$$
where $D$ and $\Lambda$ are two $d\times d$ matrices specified below. The matrix $D$ is the $d\times d$ matrix with nonnegative entries such that the first row is $D_{1,\cdot} = (0,\dots,0) \in \mathbb{R}^d$, and for any $i = 2,\dots,d$,
$$D_{i,\cdot} = (\underbrace{\Delta_i,\dots,\Delta_i}_{s_i\text{ times}},0,\dots,0) \in \mathbb{R}^d,$$
which is possible since $s_i \le d$ for $i = 2,\dots,d$. To construct $\Lambda$, recall that $d$ is assumed to be a multiple of $4$ and that we assumed $s_i \in \{1,\dots,d/4\}$ for all $i \in \{d+1,\dots,K\}$. Define $\Lambda$ as the following block matrix:
$$\Lambda = \begin{pmatrix} 0 & -\Lambda^{(0)} & 0 & \Lambda^{(3)} \\ \Lambda^{(0)} & 0 & -\Lambda^{(1)} & 0 \\ 0 & \Lambda^{(1)} & 0 & -\Lambda^{(2)} \\ -\Lambda^{(3)} & 0 & \Lambda^{(2)} & 0 \end{pmatrix},$$
where, for $l \in \{0,\dots,3\}$, the sub-matrix $\Lambda^{(l)}$ is the $(d/4)\times(d/4)$ matrix such that, for $i \in \{1,\dots,d/4\}$, the $i$-th row of $\Lambda^{(l)}$ is
$$\Lambda^{(l)}_{i,\cdot} = (\underbrace{\Delta_j,\dots,\Delta_j}_{s_j\text{ times}},0,\dots,0) \in \mathbb{R}^{d/4}, \qquad \text{with } j = d + \frac{d}{4}l + i. \quad (154)$$
Overall, $M$ is clearly antisymmetric by construction. Moreover, for each arm $i = 2,\dots,K$, the $i$-th row of $M$ contains exactly $s_i$ negative entries of magnitude $\Delta_i$. The first row is equal to $0$, so that $M \in \mathcal{D}_{\mathrm{wcw}}$ (see (146)) and the (weak) Condorcet winner is $i^* = 1$. Finally, since in each row the negative entries are constant, we have $s^*_M = (s_1,\dots,s_K)$ and $M \in \mathcal{D}^{(3)}(\Delta,s)$.

We use the same permutation construction as in the proof of Theorem 3.2, which we recall here. Let $\Pi$ be the set of permutation vectors, where $\pi = (\pi_1,\dots,\pi_d) \in \Pi$ if $\pi_i$ is a permutation of $\{1,\dots,d\}$ for any $i \in [d]$. From any $\pi \in \Pi$, define $M^\pi$ as the matrix obtained by permuting the $d$ first columns of $D$ according to $\pi$ in the following way:
$$M^\pi = \begin{pmatrix} 0 & -D^\pi \\ (D^\pi)^\top & \Lambda \end{pmatrix}, \quad (155)$$
where, for any $(i,j) \in [d]^2$, $D^\pi_{i,j} = D_{i,\pi_i(j)}$.
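To make the block construction concrete, the following sketch instantiates $M$ from (153) for the smallest admissible size $K = 8$ (so $d = 4$ and every block of $\Lambda$ is a $1\times 1$ scalar; the gap values are arbitrary illustrative choices, and the sparsities are forced to $s_i = 1$ since $d/4 = 1$), then checks antisymmetry and the per-row count of negative entries:

```python
K = 8
d = K // 2
# delta[i] is the gap of arm i (1-indexed; delta[0] unused, delta[1] = 0 for the CW).
delta = [None, 0.0, 0.10, 0.12, 0.14, 0.16, 0.18, 0.20, 0.22]

# D: first row is zero; row i (i = 2..d) starts with s_i = 1 copy of delta[i].
D = [[0.0] * d for _ in range(d)]
for i in range(2, d + 1):
    D[i - 1][0] = delta[i]

# Lambda with d/4 = 1: each block Lambda^(l) is the scalar delta[d + l + 1].
L0, L1, L2, L3 = (delta[d + l + 1] for l in range(4))
Lam = [[0.0, -L0, 0.0, L3],
       [L0, 0.0, -L1, 0.0],
       [0.0, L1, 0.0, -L2],
       [-L3, 0.0, L2, 0.0]]

# Assemble M = [[0, -D], [D^T, Lambda]] as in (153).
M = [[0.0] * K for _ in range(K)]
for i in range(d):
    for j in range(d):
        M[i][d + j] = -D[i][j]
        M[d + j][i] = D[i][j]
        M[d + i][d + j] = Lam[i][j]

# Antisymmetry, zero first row (weak CW i* = 1), and exactly s_i = 1
# negative entry of magnitude delta[i] in each suboptimal row.
assert all(M[i][j] == -M[j][i] for i in range(K) for j in range(K))
assert all(x == 0.0 for x in M[0])
for i in range(2, K + 1):
    assert sorted(x for x in M[i - 1] if x < 0) == [-delta[i]]
```

This is only the smallest instance; it is meant to make the indexing $j = d + (d/4)l + i$ in (154) concrete.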
By construction, for any $\pi \in \Pi$, we still have $M^\pi \in \mathcal{D}^{(3)}(\Delta,s)$.

Alternative instance $M^{(\pi,k)}$. Fix a suboptimal arm $k \in \{2,\dots,d\}$. Construct the gap matrix $M^{(\pi,k)}$ by setting to zero all entries in the $k$-th row and the $k$-th column of $M^\pi$. By construction, rows $1$ and $k$ of $M^{(\pi,k)}$ only contain zero entries, so that $M^{(\pi,k)}$ does not admit a unique Condorcet winner. We denote by $P^{(\pi,k)}_{\mathcal{A}}$ the distribution of the observations induced by the interaction between algorithm $\mathcal{A}$ and the environment with gap matrix $M^{(\pi,k)}$.

Step 2: information-theoretic arguments. Consider the recommendation rule $\hat i$ and the budget $T$ of algorithm $\mathcal{A}$. By definition of $\epsilon_T$, $\mathcal{A}$ can be considered as $\epsilon_T$-correct over $\mathcal{D}^{(3)}(\Delta,s)$. Observe that we always have $\{\hat i \neq 1\}$ or $\{\hat i \neq k\}$. Therefore,
$$\frac{1}{|\Pi|}\sum_{\pi\in\Pi} P^{(\pi,k)}_{\mathcal{A}}(\hat i \neq 1) \ge \frac12 \quad \text{or} \quad \frac{1}{|\Pi|}\sum_{\pi\in\Pi} P^{(\pi,k)}_{\mathcal{A}}(\hat i \neq k) \ge \frac12.$$
Without loss of generality, we assume that
$$\frac{1}{|\Pi|}\sum_{\pi\in\Pi} P^{(\pi,k)}_{\mathcal{A}}(\hat i \neq 1) \ge \frac12. \quad (156)$$
In the other case, we would consider as reference matrix the matrix $\tilde M$ obtained by exchanging rows $1$ and $k$ of $M$, so that $i^*(\tilde M) = k$; observe that we still have $\tilde M \in \mathcal{D}^{(3)}(\Delta,s)$, and the rest of the proof is the same up to minor modifications.

Consider the event $B := \{\hat i \neq 1\}$. For any $\pi$, we have $M^\pi \in \mathcal{D}^{(3)}(\Delta,s)$ and $i^*(M^\pi) = 1$. We can then use the fact that $\mathcal{A}$ is $\epsilon_T$-correct over this class, by definition of $\epsilon_T$ (see (149)), to get
$$P^\pi_{\mathcal{A}}(B) = P^\pi_{\mathcal{A}}(\hat i \neq 1) \le \epsilon_T. \quad (157)$$
Now we use Lemma H.4, a Fano-type inequality presented as Proposition 4 in Gerchinovitz et al. [2020], to obtain
$$P^{(\pi,k)}_{\mathcal{A}}(B) \le \frac{\mathrm{KL}\big(P^{(\pi,k)}_{\mathcal{A}}, P^\pi_{\mathcal{A}}\big) + \log(2)}{-\log\big(P^\pi_{\mathcal{A}}(B)\big)}.$$
Averaging over $\pi \in \Pi$ and using (156) and (157), we get
$$\frac12 \le \frac{1}{|\Pi|}\sum_{\pi\in\Pi} P^{(\pi,k)}_{\mathcal{A}}(B) \le \frac{\frac{1}{|\Pi|}\sum_{\pi\in\Pi}\mathrm{KL}\big(P^{(\pi,k)}_{\mathcal{A}}, P^\pi_{\mathcal{A}}\big) + \log(2)}{\log(1/\epsilon_T)}.$$
Observe that, for any $\pi \in \Pi$, $M^\pi$ and $M^{(\pi,k)}$ differ only in row (and column) $k$, and that $M^{(\pi,k)}$ does not depend on the permutation $\pi_k$. Denote by $\pi^{(-k)}$ the vector of permutations obtained from $\pi = (\pi_1,\dots,\pi_d) \in \Pi$ by removing the $k$-th component, $\pi^{(-k)} = (\pi_1,\dots,\pi_{k-1},\pi_{k+1},\dots,\pi_d)$, and by $\Pi^{(-k)}$ the family $\{\pi^{(-k)}\}_{\pi\in\Pi}$. For a fixed $\pi^{(-k)} \in \Pi^{(-k)}$, we have $M^{(\pi,k)} = M^{(\pi^{(-k)},k)}$. Then, we write the inequality above as
$$\frac12 \le \frac{\frac{1}{|\Pi^{(-k)}|}\sum_{\pi^{(-k)}\in\Pi^{(-k)}}\frac{1}{|\mathcal{S}_d|}\sum_{\pi'_k\in\mathcal{S}_d}\mathrm{KL}\big(P^{(\pi^{(-k)},k)}_{\mathcal{A}}, P^{\pi'}_{\mathcal{A}}\big) + \log(2)}{\log(1/\epsilon_T)}, \quad (158)$$
where, inside the sum, $\pi'$ denotes the permutation vector obtained from $\pi^{(-k)}$ and $\pi'_k$ by $\pi' = (\pi_1,\dots,\pi_{k-1},\pi'_k,\pi_{k+1},\dots,\pi_d)$.

Step 3: computing the KL divergence. We now bound, for a fixed $\pi^{(-k)} \in \Pi^{(-k)}$, the average $\frac{1}{|\mathcal{S}_d|}\sum_{\pi'_k\in\mathcal{S}_d}\mathrm{KL}(P^{(\pi^{(-k)},k)}_{\mathcal{A}}, P^{\pi'}_{\mathcal{A}})$. This computation has already been carried out in the proof of Theorem 3.2 (see Equation (120)), and one has
$$\frac{1}{|\mathcal{S}_d|}\sum_{\pi'_k\in\mathcal{S}_d}\mathrm{KL}\big(P^{(\pi^{(-k)},k)}_{\mathcal{A}}, P^{\pi'}_{\mathcal{A}}\big) \le 8\log\big(\tfrac43\big)\,\frac{\|M_{k,\cdot}\|^2}{d}\,T, \quad (159)$$
where, by construction of $M$, we have $\|M_{k,\cdot}\|^2 = s_k\Delta_k^2$. Averaging (159) over $\Pi^{(-k)}$ and combining with Equation (158), we get
$$T \ge \frac{1}{16\log(4/3)}\,\frac{d}{s_k\Delta_k^2}\,\log\Big(\frac{1}{4\epsilon_T}\Big),$$
which holds for any $k \in \{2,\dots,K\}$. This is exactly the desired bound (150).

Proof of Lemma G.3. Assume additionally that there exist $\mu > 0$ and $s \in [d]$ such that, for every $i \in \{2,\dots,d\}$, $\Delta_i = \mu$ and $s_i = s$. We want to prove Bound (151), that is,
$$\epsilon_T \ge \frac12 - \sqrt{128\log\big(\tfrac43\big)\,\frac{s\mu^2\,T}{K^2}}.$$
Step 1: reference and alternative instances. Consider $\bar M$ the matrix defined by
$$\bar M = \begin{pmatrix} 0 & -\bar D \\ \bar D^\top & \Lambda \end{pmatrix}, \quad (160)$$
where $\Lambda$ is as in (154), and $\bar D$ is the $d\times d$ matrix such that, for any $i = 1,\dots,d$,
$$\bar D_{i,\cdot} = (\underbrace{\mu,\dots,\mu}_{s\text{ times}},0,\dots,0) \in \mathbb{R}^d.$$
Observe that the first $d$ rows of $\bar M$ are equal, and that $\bar M$ does not admit a Condorcet winner; in particular, $\bar M \in \mathcal{D}^{(3)}(\Delta,s)$. Again, for any $\pi \in \Pi$, define $\bar M^\pi$ as the matrix obtained by permuting the first $d$ columns of $\bar D$ according to $\pi_1,\dots,\pi_d$ as in (155). For this part of the proof, we denote by $P^\pi_{\mathcal{A}}$ the distribution of the observations induced by the interaction between algorithm $\mathcal{A}$ and the environment with gap matrix $\bar M^\pi$.

Construction of perturbed instances $\bar M^{(\pi,k)}$. Consider any arm $k \in [d]$. We construct $\bar M^{(\pi,k)}$ as the matrix obtained from $\bar M^\pi$ by setting to zero all entries in the $k$-th row and the $k$-th column. By construction, $\bar M^{(\pi,k)} \in \mathcal{D}^{(3)}(\Delta,s)$ and $i^*(\bar M^{(\pi,k)}) = k$. We denote by $P^{(\pi,k)}_{\mathcal{A}}$ the distribution of the observations induced by the interaction between algorithm $\mathcal{A}$ and the environment with gap matrix $\bar M^{(\pi,k)}$.

Step 2: bound on the total variation distance. Consider the recommendation rule $\hat i$ and the budget $T$ of algorithm $\mathcal{A}$, which is $\epsilon_T$-correct over $\mathcal{D}^{(3)}(\Delta,s)$ by definition of the maximum error $\epsilon_T$. Intuitively, under $\bar M$ there is no Condorcet winner among the first $d$ arms, so algorithm $\mathcal{A}$ cannot systematically decide in favour of a specific subset of them, and it must make a large error on at least half of these arms. Indeed, the events $\{\hat i \in \{1,\dots,d/2\}\}$ and $\{\hat i \in \{d/2+1,\dots,d\}\}$ are disjoint, so that
$$\frac{1}{|\Pi|}\sum_{\pi\in\Pi} P^\pi_{\mathcal{A}}\big(\hat i \in \{1,\dots,d/2\}\big) \le \frac12 \quad \text{or} \quad \frac{1}{|\Pi|}\sum_{\pi\in\Pi} P^\pi_{\mathcal{A}}\big(\hat i \in \{d/2+1,\dots,d\}\big) \le \frac12.$$
Without loss of generality,⁵ we assume that
$$\frac{1}{|\Pi|}\sum_{\pi\in\Pi} P^\pi_{\mathcal{A}}\big(\hat i \in \{1,\dots,d/2\}\big) \le \frac12. \quad (161)$$
For any $k \in [d/2]$, consider the event
$$B_k := \big\{\hat i = k\big\} \cup \Big\{N_{\{k,\cdot\}} > \frac{16T}{K}\Big\},$$
where $N_{\{k,\cdot\}}$ denotes the number of duels involving arm $k$ and an adversary in $\{d+1,\dots,K\}$ between time $t = 1$ and time $T$, that is,
$$N_{\{k,\cdot\}} = \Big|\big\{t\in[T] : \exists j\in\{d+1,\dots,K\} \text{ with } \{I_t,J_t\} = \{k,j\}\big\}\Big|.$$
Observe first that, for any fixed $\pi$, the events $\{\hat{i} = k\}$, $k \in [d/2]$, are disjoint, so that
$$\frac{1}{d/2} \sum_{k=1}^{d/2} P^{\pi}_{A}(\hat{i} = k) = \frac{1}{d/2}\, P^{\pi}_{A}\big(\hat{i} \in [1; d/2]\big).$$
Averaging over $\pi \in \Pi$ and using (161), we obtain
$$\frac{1}{d/2} \sum_{k=1}^{d/2} \frac{1}{|\Pi|} \sum_{\pi \in \Pi} P^{\pi}_{A}(\hat{i} = k) \leqslant \frac{1}{d}. \qquad (162)$$
Now, by definition, the family $\big(N_{\{k,\cdot\}}\big)_{k \in [d/2]}$ counts duels with pairwise disjoint sets of arms, for time steps between $1$ and $T$, so that
$$\sum_{k=1}^{d/2} N_{\{k,\cdot\}} \leqslant T.$$
From this upper bound, a simple counting argument implies that at most a fraction $1/4$ of the arms in $[d/2]$ can satisfy $N_{\{k,\cdot\}} > \frac{16T}{K} = \frac{4T}{d/2}$. Hence,
$$\frac{1}{d/2} \sum_{k=1}^{d/2} \mathbf{1}\Big\{N_{\{k,\cdot\}} > \frac{16T}{K}\Big\} \leqslant \frac{1}{4}.$$
Taking expectation with respect to the probability $\frac{1}{|\Pi|}\sum_{\pi \in \Pi} P^{\pi}_{A}$, we obtain
$$\frac{1}{d/2} \sum_{k=1}^{d/2} \frac{1}{|\Pi|} \sum_{\pi \in \Pi} P^{\pi}_{A}\Big(N_{\{k,\cdot\}} > \frac{16T}{K}\Big) \leqslant \frac{1}{4}. \qquad (163)$$
Combining (162) and (163), and using $d \geqslant 4$, we get
$$\frac{1}{d/2} \sum_{k=1}^{d/2} \frac{1}{|\Pi|} \sum_{\pi \in \Pi} P^{\pi}_{A}(B_k) \leqslant \frac{1}{4} + \frac{1}{d} \leqslant \frac{1}{2}. \qquad (164)$$
Now, consider $B_k$ under $P^{(\pi,k)}_{A}$. Observe that $B_k^c \subset \{\hat{i} \neq k\}$. The environment $\bar{M}^{(\pi,k)}$ admits $k$ as Condorcet winner and belongs to $\mathcal{D}^{(3)}(\Delta, s)$. Using that $A$ is $\epsilon_T$-correct over this class, by definition of $\epsilon_T$, we obtain, for any $\pi \in \Pi$,
$$P^{(\pi,k)}_{A}(B_k^c) \leqslant P^{(\pi,k)}_{A}(\hat{i} \neq k) \leqslant \epsilon_T. \qquad (165)$$
The event $B_k$ has the additional property that it is measurable by an algorithm which runs $A$ but uses at most $16T/K$ duels involving arm $k$. Define the following procedure $\tilde{A}_k$. For $t = 1, \ldots, T$, run algorithm $A$. At each time $t$, compute $N_{\{k,\cdot\}}(t)$, the number of duels involving $k$ before time $t$; if $N_{\{k,\cdot\}}(t) > 16T/K$, stop sampling and return $\psi_k = 1$. If the algorithm has not stopped by time $T$ (that is, if $N_{\{k,\cdot\}} = N_{\{k,\cdot\}}(T) \leqslant 16T/K$), compute $\hat{i}_T$ and output $\psi_k := \mathbf{1}\{\hat{i}_T = k\}$. By construction, the decision $\psi_k$ produced by $\tilde{A}_k$ satisfies $\psi_k = \mathbf{1}_{B_k}$.
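The truncated procedure $\tilde{A}_k$ can be sketched programmatically. The snippet below is an illustrative sketch only: the names `run_truncated` and `RandomDueler`, and the interfaces `choose_pair`/`update`/`recommend`, are hypothetical stand-ins for an arbitrary dueling-bandit algorithm $A$ and are not part of the paper. It runs the base algorithm, counts duels involving arm $k$, stops and outputs $1$ as soon as this count exceeds $16T/K$, and otherwise outputs the indicator that the final recommendation equals $k$, so the returned decision equals $\mathbf{1}_{B_k}$.

```python
import random

class RandomDueler:
    """Toy stand-in for an arbitrary dueling-bandit algorithm A (illustration only)."""
    def __init__(self, n_arms, rng):
        self.n_arms, self.rng = n_arms, rng
        self.wins = [0] * n_arms
    def choose_pair(self):
        return self.rng.sample(range(self.n_arms), 2)   # the duel (I_t, J_t)
    def update(self, i, j, i_beats_j):
        self.wins[i if i_beats_j else j] += 1
    def recommend(self):
        return max(range(self.n_arms), key=lambda a: self.wins[a])

def run_truncated(base_alg, duel, k, T, K):
    """Sketch of procedure A~_k: run `base_alg` for T rounds, but stop and output 1
    as soon as more than 16*T/K duels involve arm k; otherwise output the indicator
    that the final recommendation is k.  The output thus equals 1_{B_k}."""
    budget_k = 16 * T / K
    n_k = 0                      # N_{k,.}(t): number of duels involving arm k so far
    for _ in range(T):
        i, j = base_alg.choose_pair()
        if k in (i, j):
            n_k += 1
            if n_k > budget_k:   # early stop: the event B_k holds
                return 1
        base_alg.update(i, j, duel(i, j))   # duel(i, j) = 1 if i beats j
    return int(base_alg.recommend() == k)   # psi_k = 1{ i_hat_T = k }
```

By construction, the procedure never plays more than $16T/K$ duels involving arm $k$, which is exactly the measurability property used in the proof.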
Moreover, for any environment $\nu$, we have $P_{\tilde{A}_k, \nu}(B_k) = P_{A, \nu}(B_k)$. From these observations, we deduce
$$\mathrm{TV}\Big(\frac{1}{d/2}\sum_{k=1}^{d/2}\frac{1}{|\Pi|}\sum_{\pi \in \Pi} P^{(\pi,k)}_{\tilde{A}_k},\; \frac{1}{d/2}\sum_{k=1}^{d/2}\frac{1}{|\Pi|}\sum_{\pi \in \Pi} P^{\pi}_{\tilde{A}_k}\Big) \geqslant \frac{1}{d/2}\sum_{k=1}^{d/2}\frac{1}{|\Pi|}\sum_{\pi \in \Pi} P^{(\pi,k)}_{\tilde{A}_k}(B_k) - \frac{1}{d/2}\sum_{k=1}^{d/2}\frac{1}{|\Pi|}\sum_{\pi \in \Pi} P^{\pi}_{\tilde{A}_k}(B_k)$$
$$= \frac{1}{d/2}\sum_{k=1}^{d/2}\frac{1}{|\Pi|}\sum_{\pi \in \Pi} P^{(\pi,k)}_{A}(B_k) - \frac{1}{d/2}\sum_{k=1}^{d/2}\frac{1}{|\Pi|}\sum_{\pi \in \Pi} P^{\pi}_{A}(B_k).$$
Using (164) and (165), we obtain
$$\mathrm{TV}\Big(\frac{1}{d/2}\sum_{k=1}^{d/2}\frac{1}{|\Pi|}\sum_{\pi \in \Pi} P^{(\pi,k)}_{\tilde{A}_k},\; \frac{1}{d/2}\sum_{k=1}^{d/2}\frac{1}{|\Pi|}\sum_{\pi \in \Pi} P^{\pi}_{\tilde{A}_k}\Big) \geqslant 1 - \epsilon_T - \frac{1}{2} = \frac{1}{2} - \epsilon_T. \qquad (166)$$
Finally, using the convexity of the total variation distance together with Pinsker's inequality and (166), we get
$$\frac{1}{2} - \epsilon_T \leqslant \mathrm{TV}\Big(\frac{1}{d/2}\sum_{k=1}^{d/2}\frac{1}{|\Pi|}\sum_{\pi \in \Pi} P^{(\pi,k)}_{\tilde{A}_k},\; \frac{1}{d/2}\sum_{k=1}^{d/2}\frac{1}{|\Pi|}\sum_{\pi \in \Pi} P^{\pi}_{\tilde{A}_k}\Big) \leqslant \sqrt{\frac{1}{2}\cdot\frac{1}{d/2}\sum_{k=1}^{d/2}\frac{1}{|\Pi|}\sum_{\pi \in \Pi} \mathrm{KL}\big(P^{(\pi,k)}_{\tilde{A}_k},\, P^{\pi}_{\tilde{A}_k}\big)}. \qquad (167)$$

Step 3: computing the KL divergence. An important property of procedure $\tilde{A}_k$ is that the budget spent on duels with arm $k$ is upper bounded by $16T/K$; precisely, for any environment $\nu$,
$$\mathbb{E}_{\tilde{A}_k, \nu}\Big[\sum_{i=d+1}^{K} N_{\{k,i\}}\Big] \leqslant \frac{16T}{K}. \qquad (168)$$
We now upper bound
$$\frac{1}{|\Pi|}\sum_{\pi \in \Pi} \mathrm{KL}\big(P^{(\pi,k)}_{\tilde{A}_k},\, P^{\pi}_{\tilde{A}_k}\big).$$
As in previous proofs, we fix $\pi_1, \ldots, \pi_{k-1}, \pi_{k+1}, \ldots, \pi_d$ and average over the $\pi_k$'s. Hence, for any $k \in [d/2]$, we get
$$\frac{1}{|S_d|}\sum_{\pi_k \in S_d} \mathrm{KL}\big(P^{(\pi,k)}_{\tilde{A}_k},\, P^{\pi}_{\tilde{A}_k}\big) \leqslant 8\log\Big(\frac{4}{3}\Big)\,\frac{\|\bar{M}_{k,\cdot}\|^2}{d}\; \mathbb{E}^{(\pi,k)}_{\tilde{A}_k}\Big[\sum_{i=d+1}^{K} N_{\{k,i\}}\Big] \leqslant 8\log\Big(\frac{4}{3}\Big)\,\frac{s\mu^2}{d}\,\frac{16T}{K},$$
where we use $\|\bar{M}_{k,\cdot}\|^2 = s\mu^2$ and Equation (168). Gathering Equation (167) with the bound above, and rearranging (using $d = K/2$), we obtain
$$\epsilon_T \geqslant \frac{1}{2} - \sqrt{\frac{128\log(4/3)\, s\mu^2\, T}{K^2}},$$
which is exactly the desired bound (151).

Proof of Lemma G.4. Consider again the constant case where there exist $\mu > 0$ and $s \in [d]$ such that, for every $i \in \{1, \ldots, d\}$, $\Delta_i = \mu$ and $s_i = s$. We want to prove the following bound, equivalent to (152):
$$T \geqslant \frac{1}{32\log(4/3)}\,\frac{K}{\mu^2}\,\log\Big(\frac{1}{4\epsilon_T}\Big).$$

Step 1. Again, assume that $\Delta$ and $s$ are constant. Take the matrix $\bar{M}$ defined in (160). Fix for now $k \in \{1, \ldots, d/2\}$. Consider $\bar{M}^{(k)}$, the matrix obtained from $\bar{M}$ by setting to zero the $k$-th row and the $k$-th column of $\bar{M}$. Recall that $\bar{M}$ does not admit a Condorcet winner, while $\bar{M}^{(k)} \in \mathcal{D}^{(3)}(\Delta, s)$ with $i^*(\bar{M}^{(k)}) = k$. We denote by $P_{A}$ (resp. $P^{(k)}_{A}$) the distribution of the observations induced by the interaction between algorithm $A$ and the environment with gap matrix $\bar{M}$ (resp. $\bar{M}^{(k)}$).

Step 2: bound on the total variation distance. Consider the event $B := \{\hat{i} \in [d/2]\}$. As in the proof of (151), we can assume without loss of generality (otherwise, choose $B = \{\hat{i} \in [d/2+1; d]\}$ and take $k$ in $[d/2+1; d]$ everywhere) that $P_{A}(\hat{i} \in [d/2]) \leqslant \frac{1}{2}$, so that
$$P_{A}(B) = P_{A}\big(\hat{i} \in [d/2]\big) \leqslant \frac{1}{2}. \qquad (169)$$
Now, consider $B$ under $P^{(k)}_{A}$. Observe that $B^c \subset \{\hat{i} \neq k\}$. By definition of the maximum error $\epsilon_T$, $A$ is $\epsilon_T$-correct over $\mathcal{D}^{(3)}$, so that
$$P^{(k)}_{A}(B^c) \leqslant P^{(k)}_{A}(\hat{i} \neq k) \leqslant \epsilon_T. \qquad (170)$$
Now, by the Fano-type inequality from Lemma H.4, it holds that
$$P_{A}(B^c) \leqslant \frac{\mathrm{KL}\big(P_{A}, P^{(k)}_{A}\big) + \log(2)}{-\log\big(P^{(k)}_{A}(B^c)\big)},$$
and using (169) and (170), we obtain
$$\frac{1}{2} \leqslant P_{A}(B^c) \leqslant \frac{\mathrm{KL}\big(P_{A}, P^{(k)}_{A}\big) + \log(2)}{-\log\big(P^{(k)}_{A}(B^c)\big)} \leqslant \frac{\mathrm{KL}\big(P_{A}, P^{(k)}_{A}\big) + \log(2)}{-\log(\epsilon_T)}. \qquad (171)$$

Step 3: computing the KL divergence. We now upper bound $\mathrm{KL}\big(P_{A}, P^{(k)}_{A}\big)$. From the decomposition of the KL divergence and the definition of $\bar{M}^{(k)}$, we have
$$\mathrm{KL}\big(P_{A}, P^{(k)}_{A}\big) = \sum_{i=d+1}^{K} \mathbb{E}_{A}[N_{\{k,i\}}]\; \mathrm{kl}\Big(\frac{1}{2} + \bar{M}_{k,i},\, \frac{1}{2}\Big) \leqslant \sum_{i=d+1}^{K} \mathbb{E}_{A}[N_{\{k,i\}}]\; 8\log(4/3)\,\mu^2,$$
where we use the fact that the row $\bar{M}_{k,\cdot}$ only takes values in $\{0, \mu\}$.
Combining (171) with the bound above and rearranging, we obtain
$$\frac{1}{16\log(4/3)}\,\frac{1}{\mu^2}\,\log\Big(\frac{1}{4\epsilon_T}\Big) \leqslant \frac{1}{8\log(4/3)}\,\frac{1}{\mu^2}\,\mathrm{KL}\big(P_{A}, P^{(k)}_{A}\big) \leqslant \sum_{i=d+1}^{K} \mathbb{E}_{A}[N_{\{k,i\}}].$$
Summing over $k \in [d/2]$, we obtain
$$\frac{K}{32\log(4/3)}\,\frac{1}{\mu^2}\,\log\Big(\frac{1}{4\epsilon_T}\Big) \leqslant \sum_{k=1}^{d/2}\sum_{i=d+1}^{K} \mathbb{E}_{A}[N_{\{k,i\}}] \leqslant T,$$
which is the desired bound (152).

H Technical Results

H.1 Deterministic bounds

Lemma H.1 (Section 6.1 of Audibert et al. [2010]). Let $x_1, \ldots, x_K$ denote a decreasing sequence of positive numbers. We have:
$$\max_{k \in \{1,\ldots,K\}} k\, x_k^2 \leq \sum_{i=1}^{K} x_i^2 \leq \log(4K)\, \max_{k \in \{1,\ldots,K\}} k\, x_k^2.$$

Lemma H.2. Let $x_1, \ldots, x_n$ denote a sequence of positive numbers such that $x_1 \leq \cdots \leq x_n$. Then we have, for any $p \in (0,1)$,
$$\frac{pn}{x_{\lceil pn \rceil}} \leq \sum_{i=1}^{n} \frac{1}{x_i}.$$

Proof. We have $1/x_n \leq \cdots \leq 1/x_1$. Therefore
$$\frac{pn}{x_{\lceil pn \rceil}} \leq \frac{\lceil pn \rceil}{x_{\lceil pn \rceil}} \leq \sum_{i=1}^{\lceil pn \rceil} \frac{1}{x_i} \leq \sum_{i=1}^{n} \frac{1}{x_i}.$$

Lemma H.3. Let $S_d$ denote the set of permutations of $\{1, \ldots, d\}$. Let $a_1, \ldots, a_d$ and $b_1, \ldots, b_d$ be two sequences of numbers. We have
$$\frac{1}{d!} \sum_{\sigma \in S_d} \sum_{i=1}^{d} a_i b_{\sigma(i)} = \frac{1}{d}\Big(\sum_{i=1}^{d} a_i\Big)\Big(\sum_{i=1}^{d} b_i\Big).$$

Proof. The result is just a consequence of summation manipulation. We have
$$\sum_{\sigma} \sum_{i=1}^{d} a_i b_{\sigma(i)} = \sum_{i=1}^{d} a_i \sum_{\sigma} b_{\sigma(i)} = \sum_{i=1}^{d} a_i\, \frac{d!}{d} \sum_{j=1}^{d} b_j = (d-1)!\,\Big(\sum_{i=1}^{d} a_i\Big)\Big(\sum_{i=1}^{d} b_i\Big).$$

Below we present a useful Fano-type inequality, stated as Proposition 4 in Gerchinovitz et al. [2020].

Lemma H.4. Let $P$ and $Q$ be two probability distributions, and let $A$ be an event such that $Q(A) \in (0,1)$. We have
$$P(A) \leq \frac{\mathrm{KL}(P, Q) + \log(2)}{-\log(Q(A))}.$$
More generally, for all probability pairs $P_i, Q_i$ and all events $A_i$, where $i \in \{1, \ldots, N\}$, with $0 < \frac{1}{N}\sum_{i=1}^{N} Q_i(A_i) < 1$, we have
$$\frac{1}{N}\sum_{i=1}^{N} P_i(A_i) \leq \frac{\frac{1}{N}\sum_{i=1}^{N} \mathrm{KL}(P_i, Q_i) + \log(2)}{-\log\big(\frac{1}{N}\sum_{i=1}^{N} Q_i(A_i)\big)}.$$

Lemma H.5. For any $x > 0$, we have
$$0 < \frac{\frac{1}{2} - e^{-x}}{\log(1 - e^{-x}) - \log(e^{-x})} < \frac{1}{2x}.$$

Proof. Let $y = e^{-x} \in (0,1)$ and define
$$h(y) := \log\Big(\frac{1-y}{y}\Big) = \log(1-y) - \log y, \qquad R(y) := \frac{\frac{1}{2} - y}{h(y)} \quad \big(y \neq \tfrac{1}{2}\big).$$
Let us start with a proof of the positivity of the middle expression. The function $h$ is strictly decreasing on $(0,1)$ and satisfies $h(\tfrac{1}{2}) = 0$, hence $\mathrm{sign}(h(y)) = \mathrm{sign}(\tfrac{1}{2} - y)$. Therefore $R(y) > 0$ for all $y \neq \tfrac{1}{2}$. At $y = \tfrac{1}{2}$, both numerator and denominator vanish; by l'Hôpital's rule,
$$\lim_{y \to 1/2} R(y) = \frac{-1}{h'(1/2)} = \frac{-1}{-\big(\frac{1}{1-y} + \frac{1}{y}\big)\big|_{y=1/2}} = \frac{1}{4}.$$
Thus the middle expression is well-defined by continuity and is strictly positive for all $x > 0$.

Now let us prove the stated upper bound. Since $x = \log(1/y) > 0$, we need to show $R(y) < \frac{1}{2\log(1/y)}$. If $y < \tfrac{1}{2}$ (so $h(y) > 0$), we need to prove that
$$2\log(1/y)\Big(\frac{1}{2} - y\Big) < h(y),$$
which is equivalent to $2y\log(1/y) + \log(1-y) > 0$. Define $g(y) := 2y\log(1/y) + \log(1-y)$; we therefore need to show that $g(y) > 0$ for all $y \in (0, 1/2)$. If $y > \tfrac{1}{2}$ (so $h(y) < 0$), the same manipulation yields the equivalent condition $g(y) < 0$. Hence it suffices to prove $g(y) > 0$ on $(0, \tfrac{1}{2})$ and $g(y) < 0$ on $(\tfrac{1}{2}, 1)$. A direct computation shows
$$g''(y) = -\frac{2}{y} - \frac{1}{(1-y)^2} < 0 \qquad (y \in (0,1)),$$
so $g$ is strictly concave. Moreover, $\lim_{y \downarrow 0} g(y) = 0$ and $g(1/2) = 2 \cdot \tfrac{1}{2}\log 2 + \log(1/2) = 0$. By strict concavity, this implies $g(y) > 0$ for all $y \in (0, 1/2)$. Finally,
$$g'(y) = 2\log(1/y) - 2 - \frac{1}{1-y}, \qquad g'(1/2) = 2\log 2 - 4 < 0.$$
Since $g$ is concave, $g'$ is nonincreasing, so $g'(y) \leq g'(1/2) < 0$ for all $y \geq 1/2$.
Thus $g$ is strictly decreasing on $[1/2, 1)$, and hence $g(y) < 0$ for all $y \in (1/2, 1)$. Therefore $R(y) < 1/(2\log(1/y)) = 1/(2x)$ for all $x > 0$, concluding the proof.

Lemma H.6. Let $d$ be an integer greater than $1$. Let $M \in \mathbb{R}^{d \times d}$ be skew-symmetric (i.e., $M_{i,j} = -M_{j,i}$ for all $i, j \in [d]$). Then the number of rows of $M$ with at least $\lceil (d+1)/4 \rceil$ non-positive entries is at least $\lceil (d+1)/4 \rceil$.

Proof. For $i \in [d]$ define
$$s_i := \big|\{j \in [d] : M_{i,j} \leq 0\}\big|.$$
Since the matrix $M$ is skew-symmetric (i.e., $M_{i,j} = -M_{j,i}$ for any $i, j$), for every unordered pair $\{i, j\}$ with $i \neq j$, at least one of the two quantities $M_{i,j}$ or $M_{j,i}$ is non-positive. Therefore, the number of off-diagonal non-positive entries is at least $\binom{d}{2}$. Taking the (zero, hence non-positive) diagonal entries into account, we conclude that
$$\sum_{i=1}^{d} s_i \geq \binom{d}{2} + d = \frac{d(d+1)}{2}.$$
The conclusion follows by a simple contradiction argument: if fewer than $\lceil (d+1)/4 \rceil$ rows had at least $\lceil (d+1)/4 \rceil$ non-positive entries, we would have $\sum_{i=1}^{d} s_i \leq 2d\,\big(\lceil (d+1)/4 \rceil - 1\big) < d(d+1)/2$, a contradiction.

Lemma H.7. Let $n \geq 3$ and let $M$ be an $n \times n$ skew-symmetric matrix. Let $E$ denote the set of rows whose number of non-positive entries is at least $\lceil n/4 \rceil + 1$. Then the intersection of $E$ with any subset of $\{1, \ldots, n\}$ of cardinality $n - \lceil n/8 \rceil$ is non-empty.

Proof. Assume $n \geq 3$ and set $a := \lceil n/8 \rceil$, $b := \lceil n/4 \rceil$. Suppose by contradiction that there exists $S \subseteq [n]$ with $|S| = n - a$ and $S \cap E = \emptyset$. Then each row $i \in S$ has at most $b$ non-positive entries, hence at least $n - b$ positive entries. Let $P := |\{(i,j) : M_{i,j} > 0\}|$ be the total number of positive entries. Summing over rows in $S$ gives
$$P \geq |S|(n - b) = (n - a)(n - b).$$
On the other hand, skew-symmetry implies that for each unordered pair $\{i, j\}$ with $i \neq j$, at most one of $M_{i,j}$, $M_{j,i}$ is positive, hence
$$P \leq \binom{n}{2} = \frac{n(n-1)}{2}.$$
Thus $(n - a)(n - b) \leq \frac{n(n-1)}{2}$.
But for $n \geq 5$, using $\lceil n/8 \rceil \leq (n+7)/8$ and $\lceil n/4 \rceil \leq (n+3)/4$,
$$n - a \geq \frac{7(n-1)}{8}, \quad n - b \geq \frac{3(n-1)}{4} \quad \Longrightarrow \quad (n-a)(n-b) \geq \frac{21}{32}(n-1)^2 > \frac{n(n-1)}{2},$$
a contradiction; and the remaining cases $n = 3, 4$ are checked directly: $(n-a)(n-b) = 4 > 3$ and $9 > 6$, respectively. Hence no such $S$ exists, i.e., every $S$ with $|S| = n - \lceil n/8 \rceil$ intersects $E$.

Lemma H.8. Let $x > 10^3$ and $y > 8$. Then
$$\frac{2\log^2(x)\log^2(y)}{\log\big(xy(\log y)^5\big)} \geq \log(xy).$$

Proof. Since $x > 10^3$ and $y > 8$, we have $\log x > \log(10^3) > 6$ and $\log y > \log 8 > 2$. Hence
$$\log x \log y - (\log x + \log y) = (\log x - 1)(\log y - 1) - 1 > 0,$$
so
$$\log x \log y \geq \log(xy). \qquad (172)$$
Moreover, $\log\big(xy(\log y)^5\big) = \log(xy) + 5\log\log y$. Since $y > 8$ implies $\log y > 2 > 1$, we have $\log\log y \leq \log y$, and thus
$$\log\big(xy(\log y)^5\big) \leq \log(xy) + 5\log y = \log x + 6\log y.$$
Therefore,
$$2\log x \log y - (\log x + 6\log y) = (2\log y - 1)\log x - 6\log y \geq 6(2\log y - 1) - 6\log y = 6(\log y - 1) > 0,$$
so
$$2\log x \log y \geq \log\big(xy(\log y)^5\big). \qquad (173)$$
Multiplying (172) and (173) yields
$$2\log^2(x)\log^2(y) \geq \log(xy)\,\log\big(xy(\log y)^5\big).$$
Since $\log\big(xy(\log y)^5\big) > 0$, dividing by it gives the claim.

Lemma H.9. Let $K \geq 2$, $H \geq 4$, and $T \geq 8K\log^{8/7}(K)$. Define
$$p_k := \frac{1}{18} \wedge (\log T + K)\exp\Big(-\frac{cT}{\log^3(K)\log(T)\,H}\Big).$$
Assume $T \geq c_0 H \log^5(H)$ for some numerical constant $c_0$ large enough (depending only on $c$). Then there exists a numerical constant $c' > 0$ (e.g. $c' = c/2$) such that
$$p_k \leq \exp\Big(-\frac{c'T}{\log^3(K)\log(T)\,H}\Big).$$

Proof. Let us introduce the following notation:
$$x := \frac{T}{\log^3(K)\log(T)\,H}, \qquad A := \log T + K.$$
Then $p_k \leq A e^{-cx}$. Since $T \geq 8K\log^{8/7}(K)$ and $K \geq 2$, we have $A = \log T + K \leq T$, so $\log A \leq \log T$ and $p_k \leq \exp(-cx + \log T)$. Thus it suffices to prove $\log T \leq \frac{c}{2}x$, i.e.,
$$\frac{T}{(\log T)^2} \geq \frac{2H\log^3(K)}{c}. \qquad (174)$$
Since $T \geq 8K\log^{8/7}(K) > K$, we have $\log^3(K) \leq (\log T)^3$.
Therefore (174) follows from
$$\frac{T}{(\log T)^2} \geq \frac{2H}{c}(\log T)^3,$$
which is equivalent to
$$\frac{T}{(\log T)^5} \geq \frac{2H}{c}.$$
Let $g(t) := t/(\log t)^5$, which is increasing for $t \geq e^5$. Choose $c_0$ large enough so that $T \geq c_0 H \log^5(H) \geq e^5$ for all $H \geq 4$. Then, with $T_0 := c_0 H \log^5(H)$, we have $g(T) \geq g(T_0)$ and
$$g(T_0) = \frac{c_0 H \log^5(H)}{\big(\log(c_0 H \log^5(H))\big)^5}.$$
For $H \geq 4$, we have $\log\log H \leq \log H$, so
$$\log\big(c_0 H \log^5(H)\big) = \log c_0 + \log H + 5\log\log H \leq \log c_0 + 6\log H \leq \Big(6 + \frac{\log c_0}{\log 4}\Big)\log H =: C_0 \log H.$$
Hence $g(T_0) \geq \frac{c_0}{C_0^5}\,H$. Taking $c_0$ large enough so that $\frac{c_0}{C_0^5} \geq \frac{2}{c}$ yields $g(T) \geq g(T_0) \geq \frac{2H}{c}$, proving (174). Therefore $p_k \leq \exp(-(c/2)x)$, i.e., the claim holds with $c' = c/2$.

H.2 Concentration inequalities

Below is Hoeffding's concentration inequality.

Lemma H.10. Let $X_1, \ldots, X_n$ be independent random variables such that $a_i \leq X_i \leq b_i$ almost surely. Let $S_n = \sum_{i=1}^{n} X_i$. Then we have, for all $t > 0$:
$$P\big(S_n - \mathbb{E}[S_n] \leq -t\big) \leq \exp\Big(-\frac{2t^2}{\sum_{i=1}^{n}(b_i - a_i)^2}\Big).$$

Below we restate two results on the concentration of sums of independent binary random variables from Buldygin and Moskvichova [2013]. First, let us introduce some notation. For a sub-Gaussian random variable $\xi$, its sub-Gaussian standard is defined by
$$\tau(\xi) := \inf\Big\{a \geq 0 : \mathbb{E}[\exp(\lambda\xi)] \leq \exp\Big(\frac{a^2\lambda^2}{2}\Big) \text{ for all } \lambda \in \mathbb{R}\Big\}.$$

Lemma H.11 (Theorem 2.1 in Buldygin and Moskvichova [2013]). Let $X$ denote a Bernoulli random variable with parameter $p \in [0,1]$. Then we have $\tau^2(X - p) = \phi(p)$, where $\phi(\cdot)$ is the function defined by
$$\phi(p) = \begin{cases} 0, & p \in \{0, 1\}; \\[2pt] \frac{1}{4}, & p = \frac{1}{2}; \\[2pt] \dfrac{\frac{1}{2} - p}{\log(1-p) - \log(p)}, & p \in (0,1)\setminus\{\frac{1}{2}\}. \end{cases}$$

The lemma below gives a concentration bound for binomial random variables.

Lemma H.12. Let $X_j$, $j \in \{1, \ldots, n\}$, denote a sequence of independent Bernoulli random variables with parameter $p \in [0,1]$.
Define $S_n = \sum_{j=1}^{n}(X_j - p)$. Then, for all $x > 0$,
$$P(S_n \geq x) \leq \exp\Big(-\frac{x^2}{2n\phi(p)}\Big),$$
where $\phi$ is defined in Lemma H.11. We also have, for all $x > 0$,
$$P(S_n \leq -x) \leq \exp\Big(-\frac{x^2}{2n\phi(p)}\Big).$$

Proof. This is a direct consequence of Chernoff's bound combined with Lemma H.11.
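As a quick numerical sanity check (ours, not part of the paper), the function $\phi$ of Lemma H.11 and the tail bound of Lemma H.12 can be verified directly: $\phi$ is symmetric about $p = 1/2$, never exceeds its value $1/4$ there, and the exact binomial upper tail stays below the sub-Gaussian bound $\exp(-x^2/(2n\phi(p)))$. The helper names `phi` and `binom_upper_tail` are ours.

```python
import math
from math import comb

def phi(p):
    """Sub-Gaussian standard tau^2(X - p) of a Bernoulli(p) (Lemma H.11)."""
    if p in (0.0, 1.0):
        return 0.0
    if p == 0.5:
        return 0.25
    return (0.5 - p) / (math.log(1 - p) - math.log(p))

def binom_upper_tail(n, p, x):
    """Exact P(S_n >= x) with S_n = sum_j (X_j - p), X_j i.i.d. Bernoulli(p).
    Since the number of successes is an integer, this is P(#successes >= ceil(n*p + x))."""
    k_min = math.ceil(n * p + x)
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

# phi is maximal at p = 1/2, where it equals 1/4
assert all(phi(i / 100) <= 0.25 + 1e-12 for i in range(1, 100))

# Lemma H.12: P(S_n >= x) <= exp(-x^2 / (2 n phi(p)))
n, p = 200, 0.3
for x in [1.0, 5.0, 10.0, 20.0]:
    assert binom_upper_tail(n, p, x) <= math.exp(-x**2 / (2 * n * phi(p)))
```

The comparison also illustrates why Lemma H.12 improves on the plain Hoeffding bound for $p \neq 1/2$: there $\phi(p) < 1/4$, so the exponent is strictly larger.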
