Global Sequential Testing for Multi-Stream Auditing

Beepul Bharti¹˒²∗ (bbharti1@jhu.edu), Ambar Pal³† (ambarpal@amazon.com), Jeremias Sulam¹˒² (jsulam1@jhu.edu)
¹ Mathematical Institute for Data Science (MINDS), Johns Hopkins University
² Department of Biomedical Engineering, Johns Hopkins University
³ Amazon Responsible AI

Abstract
Across many risk-sensitive areas, it is critical to continuously audit the performance of machine learning systems and detect any unusual behavior quickly. This can be modeled as a sequential hypothesis testing problem with $k$ incoming streams of data and a global null hypothesis that asserts that the system is working as expected across all $k$ streams. The standard global test employs a Bonferroni correction and has an expected stopping time bound of $O(\ln(k/\alpha))$ when $k$ is large and the significance level of the test, $\alpha$, is small. In this work, we construct new sequential tests by using ideas of merging test martingales with different trade-offs in expected stopping times under different (sparse or dense) alternative hypotheses. We further derive a new, balanced test that achieves an improved expected stopping time bound that matches Bonferroni's in the sparse setting but that naturally results in $O(\frac{1}{k}\ln(1/\alpha))$ under a dense alternative. We empirically demonstrate the effectiveness of our proposed tests on synthetic and real-world data.

Contents
1 Introduction
2 Problem Formulation
3 A Primer in Sequential Testing
4 Standard Global Sequential Testing
5 Powerful Global Tests via Merging
6 Experiments
7 Conclusion
A Proofs
B Useful Lemmas and Inequalities
C Algorithms
D Additional Experimental Details & Results

∗ Corresponding author.
† This work is not related to AP's position at Amazon.
1 Introduction
Powerful machine learning (ML) systems are increasingly deployed in high-stakes decision-making contexts such as healthcare [32] and criminal justice [29], where they influence critical decisions about individuals. Consequently, the development of auditing procedures that can continually evaluate an ML system's performance has become an essential and active area of study [25, 5, 1, 19, 36]. Notably, the importance of such auditing procedures has been explicitly emphasized by regulatory and governance bodies including the US Office of Science and Technology Policy [42], the European Union [6], and the United Nations [35].

Statistical hypothesis testing provides a rigorous framework to perform online auditing of an ML system. Suppose, as a thought experiment, that we are tasked with auditing a newly deployed state-of-the-art medical foundation model at a hospital. Suppose also that this model can perform zero-shot diagnoses across various imaging modalities, like X-rays, computed tomography (CT), magnetic resonance imaging (MRI), and more. We receive a continuous stream of data representing the model's predictions. To audit the model's performance on a specific modality, such as X-rays, we can define the following null hypothesis:

$H_0$: The model is working as intended on X-rays.

Moreover, we can continuously test this null by leveraging sequential hypothesis tests. Unlike traditional hypothesis tests [20], which require collecting a fixed batch of data up to a pre-specified time to obtain a valid p-value, sequential tests allow data collection to stop at arbitrary, data-dependent times once sufficient evidence against the null hypothesis has accumulated.
Evaluating a system's performance on a single stream is important, but testing it across multiple streams (e.g., demographic groups or geographic locations) is crucial for ensuring fairness, safety, and reliability [34, 22, 11]. For example, we would like to ensure that our foundation model performs effectively across all imaging modalities to ensure safe patient outcomes. Formally, considering $k$ streams of data, each corresponding to a different modality, we would like to raise a flag whenever the model performs sub-optimally on any stream. In other words, we require efficient sequential tests for the global null hypothesis:

$H_0^{\text{global}}$: The model is working as intended across all image modalities.

Testing this null is equivalent to testing the intersection of $k$ hypotheses $H_{1,0}, \ldots, H_{k,0}$, where each $H_{i,0}$ asserts that the foundation model is working as intended for a specific modality, or data stream. Importantly, various alternatives may arise: a single stream might be problematic (a sparse alternative), or many streams might present issues (a dense alternative). For both scenarios, and everything in between, we need efficient and powerful sequential tests.

We develop and study sequential tests for this global null hypothesis. For a significance level $\alpha \in (0, 1)$, the most natural, off-the-shelf test for this problem employs a Bonferroni correction over $k$ sequential tests, one for each stream, resulting in an expected stopping (or rejection) time that is dominated by $\ln(k/\alpha)$ in both sparse and dense settings. While simple, this guarantee is unnecessarily large in cases where many streams are not behaving as intended. Observing that the Bonferroni test merges the evidence from each stream in one of many possible ways, we study different merging strategies, highlighting their strengths and weaknesses.
Moreover, we present a new test that balances the strengths of different merging strategies and achieves an expected stopping time guarantee of $O(\ln(k/\alpha))$ in the sparse setting and $O(\frac{1}{k}\ln(1/\alpha))$ in the dense setting, enabling powerful and efficient global testing under a spectrum of alternatives.

1.1 Contributions
To summarize, our main contributions are:
1. We study two sequential tests for the global null hypothesis that leverage well-known multivariate strategies and tools from high-dimensional online learning. We provide novel upper bounds on their expected stopping times, demonstrating that these natural approaches are no better than Bonferroni's.
2. We develop new sequential tests for the global null that merge evidence across streams via averages and products. We provide upper bounds on their expected stopping times, showcasing when they are (and are not) powerful, and develop a balanced approach that inherits the advantages of each.
3. We perform experiments on synthetic and real-world datasets to showcase our theoretical findings, demonstrating the utility of our sequential tests.

2 Problem Formulation
Building on previous works [10, 5], we formalize the auditing problem as a statistical hypothesis test. Specifically, let $P_i$ be a distribution supported on $\mathcal{Z} \subset [-1, 1]$, where the random variable $Z \sim P_i$ captures the system's behavior, with larger $|Z|$ indicating greater deviations from the expected behavior. As an example, $Z$ might indicate the error of a predictive model. Given data points $Z_{i,1}, Z_{i,2}, \ldots \overset{\text{iid}}{\sim} P_i$, auditing a system's performance on $P_i$ can be framed as a hypothesis test with the following null and alternative hypotheses:

$$H_{i,0}: \mathbb{E}_{P_i}[Z_{i,1}] = 0 \quad \text{and} \quad H_{i,1}: \mathbb{E}_{P_i}[Z_{i,1}] \neq 0. \tag{1}$$

The null $H_{i,0}$ asserts that the system is performing as intended on distribution $i$, while the alternative $H_{i,1}$ indicates a deviation from expected behavior. This is simply a bounded mean test [40], which asks whether the mean of a bounded random variable distributed according to $P_i$ equals zero, and which has proven very useful in the context of auditing [10, 5].

In this work, we move beyond single audits and seek to evaluate whether a system is operating as intended across $k$ distributions, $P_1, \ldots, P_k$. Formally, letting $[k] := \{1, \ldots, k\}$ and observing data from all distributions, $Z_{i,1}, Z_{i,2}, \ldots \overset{\text{iid}}{\sim} P_i$ for each $i \in [k]$, we aim to test the global null and alternative hypotheses:

$$H_0^{\text{global}}: \forall i \in [k],\ \mathbb{E}_{P_i}[Z_{i,1}] = 0 \quad \text{and} \quad H_1^{\text{global}}: \exists i \in [k] \text{ such that } \mathbb{E}_{P_i}[Z_{i,1}] \neq 0. \tag{2}$$

The global null holds if and only if every null $H_{i,0}$ is true. Thus, it provides a precise auditing process: rejecting $H_0^{\text{global}}$ implies that there is at least one distribution $P_i$ on which the system is performing incorrectly.

Unlike traditional hypothesis testing [20], we consider the setting where data is collected over time and our goal is to continuously test $H_0^{\text{global}}$. Specifically, we assume $k$ parallel data streams, $(Z_{i,t})_{t \ge 1} \equiv (Z_{i,1}, Z_{i,2}, \ldots)$, where $Z_{i,1}, Z_{i,2}, \ldots \overset{\text{iid}}{\sim} P_i$ for all $i \in [k]$. The task is to sequentially test $H_0^{\text{global}}$, continually evaluating whether the system is operating normally across all streams, and raising an alarm if there is an issue in any stream. Most importantly, our goal is to develop sequential testing procedures that allow us to reject $H_0^{\text{global}}$ as soon as sufficient evidence against it emerges.

Additional Notation. For $k$ streams $(Z_{1,t})_{t \ge 1}, \ldots, (Z_{k,t})_{t \ge 1}$, denote $\Delta_i = \mathbb{E}[Z_{i,1}]$¹, $\Delta_m = \max_{i \in [k]} |\Delta_i|$, and $\Delta_s = \sum_{i=1}^k \Delta_i^2$.
By definition, under the null $H_0^{\text{global}}$, these quantities equal zero. Under the alternative $H_1^{\text{global}}$, there exists $i \in [k]$ such that $\Delta_i \neq 0$. As a result, under $H_1^{\text{global}}$, $\Delta_m \neq 0$ and $\Delta_s \neq 0$.

2.1 Related Work
Safe anytime-valid testing, multi-stream sequential testing, and auditing all draw on several distinct lines of research. We discuss works in these research areas and those most related to ours.

Safe anytime-valid testing. The works of Shafer [31], Vovk and Wang [38], Shekhar and Ramdas [33], Waudby-Smith and Ramdas [40], Waudby-Smith, Sandoval, and Jordan [41], and Ramdas and Wang [26] detail the testing-by-betting framework and provide numerous results on martingales and e-values, giving us the machinery to construct our testing procedures. Several other works study testing multiple hypotheses via sequential hypothesis tests [44, 15, 45], though they study procedures that control the false discovery rate.

Auditing ML systems via hypothesis testing. The recent work of Cen and Alur [2] operationalizes modern algorithmic auditing using statistical hypothesis testing. In doing so, they highlight two key auditing challenges. First, they detail how much access to an ML system (in addition to black-box access) is needed to obtain the evidence required for a meaningful audit. Second, they provide a way to formally link complex statistical auditing methods and law. However, they focus on classical non-sequential settings where one has a fixed sample. On the other hand, various works focus on developing sequential hypothesis tests to continuously audit deployed systems. Chugg et al. [5] construct sequential tests to continually audit ML predictors for parity-fairness, and Dai et al. [10] construct tests for real-time auditing to verify whether demographic groups are harmed within a reporting-based framework.
More recent works have developed sequential tests to evaluate large language models (LLMs). Chen and Wang [3] propose tests to detect LLM-generated texts, and Richter et al. [28] construct tests to detect behavioral shifts in LLMs. These works also briefly discuss multi-stream extensions and global testing, though they all perform a Bonferroni correction.

Multi-stream sequential testing. A few works explore sequential testing in a multi-stream setting. Kaufmann and Koolen [16] use a mixture-based approach to construct wealth processes for single streams and combine them using a product strategy. However, they assume parametric models for their data streams, modeling them as one-dimensional exponential families, which differs from our general nonparametric setting. The work of Cho, Gan, and Kallus [4] is closely related to ours, albeit employing a different notion of multi-stream data: at each time, they observe a single outcome $Z_t$ along with another random variable $A_t$, taking a value in $[k]$, that determines the membership of $Z_t$. That is, every observed point belongs to one of $k$ possible groups. In contrast, in our setting one observes $k$ outcomes at every time point, each from a different stream. Additionally, [4] does not provide expected stopping times for their tests, as we do here. Note that these previous works, as well as ours, differ from those addressing multi-stream changepoint detection, where the primary objective is to detect, as quickly as possible, when the distribution generating a data stream changes abruptly [8, 43].

¹ The $Z_{i,1}, \ldots, Z_{i,t}$ are iid $\sim P_i$, so $\mathbb{E}[Z_{i,1}] = \cdots = \mathbb{E}[Z_{i,t}] = \Delta_i$. We use $Z_{i,1}$ for clarity and notational convenience.

3 A Primer in Sequential Testing
We leverage tools from safe anytime-valid inference to construct sequential tests. A sequential test $\phi \equiv (\phi_t)_{t \ge 1}$ for a null hypothesis $H_0$ makes a decision $\phi_t \in \{0, 1\}$ at each (discrete) time $t$ based on the data observed up to time $t$. A decision $\phi_t = 1$ corresponds to rejecting $H_0$, while $\phi_t = 0$ indicates failing to reject $H_0$.

A sequential test $\phi$ is commonly required to satisfy two key criteria. First, it must be level-$\alpha$, that is, the probability that $\phi$ incorrectly rejects $H_0$ must be at most $\alpha$ simultaneously across all time steps.

Definition 3.1 (Level-$\alpha$ Sequential Test). A sequential test $\phi$ for $H_0$ is level-$\alpha$ if $\sup_{P \in H_0} P(\exists t \ge 1 : \phi_t = 1) \le \alpha$.

The second criterion is that $\phi$ be powerful. In sequential testing, one typically requires $\phi$ to be asymptotically powerful [33, 5], i.e., it satisfies $\inf_{P \in H_1} P(\exists t \ge 1 : \phi_t = 1) = 1$. Another, complementary notion requires that, for tests $\phi$ built from stochastic processes, these processes grow quickly under the alternative [41]. Here, as others do [33, 41], we regard a test as powerful if its (expected) stopping time under the alternative is small, making use of the following definition.

Definition 3.2 (Stopping Time). The stopping time of a level-$\alpha$ sequential test $\phi$ is the positive integer-valued random variable $\tau = \min\{t : \phi_t = 1\}$.

The stopping time is the first (potentially random) time at which $\phi$ rejects the null. We will design level-$\alpha$ sequential tests for $H_0^{\text{global}}$ that stop quickly under $H_1^{\text{global}}$, as characterized by upper bounds on their expected stopping time. To do so, we recall some definitions and properties of nonnegative supermartingales.

Martingales. A stochastic process $M \equiv (M_t)_{t \ge 1}$ is a $P$-martingale with respect to a stream $(Z_t)_{t \ge 1}$ if $\mathbb{E}_P[M_{t+1} \mid Z_1, \ldots, Z_t] = M_t$ for all $t \ge 1$. It is a $P$-supermartingale if $\mathbb{E}_P[M_{t+1} \mid Z_1, \ldots, Z_t] \le M_t$ for all $t \ge 1$. The following result on $P$-supermartingales is central to our work.

Theorem 3.1 (Ville's Inequality [37]).
Let $M$ be a nonnegative $P$-supermartingale with an initial value $M_0 \ge 0$. Then, for all $\alpha > 0$, $P(\exists t \ge 1 : M_t \ge 1/\alpha) \le \alpha\, \mathbb{E}_P[M_0]$.

$P$-supermartingales and Ville's inequality are key tools for constructing sequential tests, and we will now leverage them to test a stream-specific null $H_{i,0}$. This will provide the foundations to develop sequential tests for the global null. All of the following discussion regarding martingales can be easily generalized to stochastic processes adapted to the underlying canonical filtration; we omit this generalization for readability.

3.1 Testing a Stream-Specific Null $H_{i,0}$
To construct a sequential test for a single $H_{i,0}$, we rely on the testing-by-betting framework [31]. This involves starting with wealth $W_{i,0} = 1$ and iteratively betting on the values of $Z_{i,t}$ from the stream $(Z_{i,t})_{t \ge 1}$ with the goal of maximizing our wealth. A positive bet indicates our belief that $Z_{i,t} > 0$ (respectively, a negative bet indicates $Z_{i,t} < 0$), and its magnitude represents how much we are willing to bet. We structure our bets so that our expected wealth increases if $\mathbb{E}_{P_i}[Z] \neq 0$ (i.e., if the null is false) and stays constant otherwise. Thus, our wealth serves as a measure of evidence against the null: the more wealth, the stronger our belief that the null is false. Formally, at every time $t \ge 1$ we design a payoff function $S_{i,t} : \mathbb{R} \to \mathbb{R}_{\ge 0}$ that satisfies

$$\mathbb{E}_P[S_{i,t}(Z_{i,t}) \mid Z_{i,1}, \ldots, Z_{i,t-1}] \le 1 \tag{3}$$

for any $P \in H_{i,0}$. Then, we receive $Z_{i,t}$ and update our wealth according to $W_{i,t} = W_{i,t-1} \cdot S_{i,t}(Z_{i,t}) = \prod_{j=1}^t S_{i,j}(Z_{i,j})$. The wealth process $W_i \equiv (W_{i,t})_{t \ge 1}$ is therefore a nonnegative $P$-supermartingale for any $P \in H_{i,0}$, so Ville's inequality (with $W_{i,0} = 1$) ensures that rejecting when $W_{i,t} \ge 1/\alpha$ constitutes a level-$\alpha$ sequential test [33].
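To make the betting construction concrete, here is a minimal NumPy sketch that simulates the wealth $W_{i,t} = \prod_{j \le t}(1 + \lambda Z_{i,j})$ with a constant bet $\lambda = 0.2$ and estimates how often it ever reaches $1/\alpha$. The constant bet, the uniform streams (mean `shift`, variance 0.2), and the helper name `crossing_rate` are illustrative assumptions, not the paper's procedure, which chooses its bets adaptively. Under the null, Ville's inequality says the crossing rate cannot exceed $\alpha$ (up to Monte Carlo error).

```python
import numpy as np

def crossing_rate(shift=0.0, n_runs=2000, horizon=500, lam=0.2, alpha=0.05, seed=0):
    """Fraction of simulated streams whose betting wealth ever reaches 1/alpha.

    Each stream has Z_t ~ Uniform(-sqrt(3/5), sqrt(3/5)) + shift, so its mean is
    `shift` and its variance is 0.2 (keep |shift| small so Z_t stays in [-1, 1]).
    The bet `lam` is constant, hence predictable, and |lam| <= 1/2 keeps every
    payoff 1 + lam * Z_t strictly positive.
    """
    rng = np.random.default_rng(seed)
    half_width = np.sqrt(3 / 5)
    Z = rng.uniform(-half_width, half_width, size=(n_runs, horizon)) + shift
    wealth = np.cumprod(1.0 + lam * Z, axis=1)   # W_t = prod_{j<=t} (1 + lam * Z_j)
    return float(np.mean((wealth >= 1 / alpha).any(axis=1)))

# Null (mean 0): the rate stays below alpha, as Ville's inequality guarantees.
null_rate = crossing_rate(shift=0.0)
# Alternative (mean 0.1): the same bet crosses 1/alpha in most runs.
alt_rate = crossing_rate(shift=0.1)
```

A constant bet is only reasonable when the sign and size of the mean are known in advance; the adaptive bets described next remove that requirement.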
Thus, to design a powerful level-$\alpha$ sequential test for $H_{i,0}$, it suffices to construct a wealth process that (1) is a $P$-supermartingale for any $P \in H_{i,0}$, and (2) grows rapidly for any $P \in H_{i,1}$. Following previous works [31, 27, 33, 5, 40, 30, 24], at each time $t$ we use the payoff $S_{i,t}(Z_{i,t}) = 1 + \lambda_{i,t} \cdot Z_{i,t}$. The sequence $(\lambda_{i,t})_{t \ge 1}$ is $[-1/2, 1/2]$-valued and selected using Online Newton Step (ONS) (Algorithm 1)² [9, 12]. ONS guarantees that the wealth grows exponentially under the alternative (see Lemma B.4). With these choices, denote the resulting wealth process as $W_i^{\text{ons}} \equiv (W_{i,t}^{\text{ons}})_{t \ge 1}$ and the test as $\phi_i^{\text{ons}} \equiv (\phi_{i,t}^{\text{ons}})_{t \ge 1}$, with

$$\phi^{\text{ons}}_{i,t} = \mathbb{1}\left[W^{\text{ons}}_{i,t} \ge 1/\alpha\right]. \tag{4}$$

Chugg et al. [5] show that $\phi_i^{\text{ons}}$ is level-$\alpha$ and that, under $H_{i,1}$, its stopping time $\tau_i = \min\{t : W^{\text{ons}}_{i,t} \ge 1/\alpha\}$ obeys

$$\mathbb{E}[\tau_i] = O\left(\frac{1}{\Delta_i^2} \ln \frac{1}{\alpha \Delta_i^2}\right).$$

First, this shows that $\phi_i^{\text{ons}}$ adapts to the hardness of the alternative: a larger $|\Delta_i|$ implies less time required to reject $H_{i,0}$. Second, the smaller the significance level $\alpha$, the longer the stopping time (logarithmically). We provide a proof of this result for completeness in Section A.1.

4 Standard Global Sequential Testing
The machinery developed thus far forms the foundation of the common testing strategies one may employ to test $H_0^{\text{global}}$.

4.1 Bonferroni Correction
The standard sequential test for $H_0^{\text{global}}$ uses the stream-specific tests $\phi_i^{\text{ons}}$ [5, 10, 3, 28]. Given parallel wealth processes $W_i^{\text{ons}}$, one for each $H_{i,0}$, the natural test for $H_0^{\text{global}}$ is defined as $\phi^{\text{bonf}} \equiv (\phi^{\text{bonf}}_t)_{t \ge 1}$, with

$$\phi^{\text{bonf}}_t = \mathbb{1}\left[\max_{i \in [k]} W^{\text{ons}}_{i,t} \ge k/\alpha\right]. \tag{5}$$

This test tracks the single highest wealth at each time $t$ and rejects $H_0^{\text{global}}$ using a Bonferroni-style correction that raises the rejection threshold to $k/\alpha$.
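The sketch below shows one way to implement the stream-specific ONS bets and the Bonferroni rule in Eq. (5). Algorithm 1 is not reproduced in this section, so the update is an assumption: it follows a commonly used ONS-for-betting variant (gradient of $\ln(1 + \lambda z)$, running curvature sum, step constant $2/(2 - \ln 3)$, clipping to $[-1/2, 1/2]$). The names `ons_wealth`, `bonferroni_stopping_time`, and `ONS_STEP` are hypothetical.

```python
import numpy as np

# Step-size constant of a common ONS-for-betting variant (assumed, not taken from Algorithm 1).
ONS_STEP = 2 / (2 - np.log(3))

def ons_wealth(z):
    """Wealth W_t = prod_{j<=t} (1 + lam_j * z_j) with lam_j set by an ONS-style update.

    lam_j depends only on z_1, ..., z_{j-1}, so the bets are predictable.
    """
    lam, a, log_w = 0.0, 1.0, 0.0
    out = np.empty(len(z))
    for t, zt in enumerate(z):
        log_w += np.log1p(lam * zt)          # apply this round's payoff 1 + lam * z_t
        out[t] = np.exp(log_w)
        grad = zt / (1.0 + lam * zt)         # derivative of log(1 + lam * z_t) in lam
        a += grad * grad                     # running curvature estimate
        lam = min(max(lam + ONS_STEP * grad / a, -0.5), 0.5)  # Newton-style step, clipped
    return out

def bonferroni_stopping_time(Z, alpha):
    """First t (1-indexed) with max_i W_{i,t} >= k / alpha, or None if it never happens.

    Z has shape (k, T): row i is the stream (Z_{i,1}, ..., Z_{i,T}).
    """
    k = Z.shape[0]
    W = np.vstack([ons_wealth(z) for z in Z])    # shape (k, T)
    hits = np.flatnonzero(W.max(axis=0) >= k / alpha)
    return int(hits[0]) + 1 if hits.size else None

# Example: 20 streams, only the first has a nonzero mean (a sparse alternative).
rng = np.random.default_rng(0)
half_width = np.sqrt(3 / 5)
Z = rng.uniform(-half_width, half_width, size=(20, 2000))
Z[0] += 0.2
tau = bonferroni_stopping_time(Z, alpha=0.05)
```

Only the single best stream matters for the decision here, which is exactly the behavior the following discussion examines.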
The following result shows that $\phi^{\text{bonf}}$ is a level-$\alpha$ sequential test and provides an upper bound on its expected stopping time under the alternative.

Proposition 4.1 (Chugg et al. [5]). The test $\phi^{\text{bonf}}$ is a level-$\alpha$ sequential test for $H_0^{\text{global}}$. Moreover, under $H_1^{\text{global}}$, its expected stopping time obeys

$$\mathbb{E}[\tau] = O\left(\frac{1}{\Delta_m^2} \ln \frac{k}{\alpha \Delta_m^2}\right). \tag{6}$$

A proof is included in Section A. We pause to discuss this bound's dependence on $k$ and $\Delta_m$.

Dependence on $k$. Holding the maximum mean $\Delta_m$ fixed, the upper bound increases with $k$ as $\ln k$. While the dependence is only logarithmic, the bound always increases, regardless of whether the new streams provide additional evidence against the null.

² The ONS algorithm defined in Algorithm 1 can be used to learn betting fractions in $\mathbb{R}^k$. Here, the sequence $(\lambda_{i,t})_{t \ge 1}$ represents univariate betting fractions, and the simple ONS update rule for learning them is outlined in Section C.

Dependence on $\Delta_m$. $H_0^{\text{global}}$ can fail to hold in many ways, including when a majority, or even all, of the streams have nonzero means. Regardless, keeping $k$ fixed, the upper bound depends on evidence from all streams only through the maximum mean. Both of these dependencies reveal a key limitation of $\phi^{\text{bonf}}$: when a majority of the streams have nonzero means, $\phi^{\text{bonf}}$ appears unable to fully use the evidence present across all streams, evidence that could be used to reject faster.

4.2 Global Sequential Tests via Multivariate Strategies
The previous approach makes a decision after incorporating evidence from the $k$ individual wealth processes, each of which arises from a betting strategy that ignores all the other streams.
A more natural way to incorporate information across the $k$ streams is to consider all $k$ outcomes simultaneously and determine optimal multivariate strategies for maximizing wealth growth, drawing on tools from multivariate betting, portfolio theory, and adaptive, high-dimensional online learning [12, 7, 33].

The first multivariate approach relies on ideas from portfolio optimization [7]. Letting $\vec{Z}_t = (Z_{1,t}, \ldots, Z_{k,t}) \in [-1, 1]^k$ be the $k$ outcomes, at each $t$ we make multivariate bets $\vec{\lambda}_t = (\lambda_{1,t}, \ldots, \lambda_{k,t})$. That is, we bet fractions of our total wealth across all $k$ outcomes $Z_{1,t}, \ldots, Z_{k,t}$, which enables us to achieve higher wealth [7]. This leads to the multivariate payoff

$$S_t = 1 + \langle \vec{\lambda}_t, \vec{Z}_t \rangle \tag{7}$$

and the resulting wealth process $W^{\text{mv-ons}} \equiv (W^{\text{mv-ons}}_t)_{t \ge 1}$, where $W^{\text{mv-ons}}_t = \prod_{j=1}^t S_j$ and the multivariate bets $\vec{\lambda}_t$ are computed via ONS (Algorithm 1). In the following result, we show that $W^{\text{mv-ons}}$ is a $P$-martingale for any $P \in H_0^{\text{global}}$ and that $\phi^{\text{mv-ons}} \equiv (\phi^{\text{mv-ons}}_t)_{t \ge 1}$, with $\phi^{\text{mv-ons}}_t = \mathbb{1}[W^{\text{mv-ons}}_t \ge 1/\alpha]$, is a level-$\alpha$ sequential test. We also provide a bound on the expected stopping time under the alternative.

Theorem 4.1. The stochastic process $W^{\text{mv-ons}}$ is a $P$-martingale for any $P \in H_0^{\text{global}}$. Thus, $\phi^{\text{mv-ons}}$ is a level-$\alpha$ sequential test for $H_0^{\text{global}}$. Moreover, under $H_1^{\text{global}}$, its expected stopping time obeys

$$\mathbb{E}[\tau] = O\left(\frac{k}{\Delta_m^2} \ln \frac{k}{\alpha \Delta_m^2}\right). \tag{8}$$

A proof is provided in Section A.3. This bound is worse than that of $\phi^{\text{bonf}}$ by an additional factor of $k$. Conceptually, placing multivariate bets $\vec{\lambda}_t$ enables higher wealth growth, but the rate at which wealth grows is slowed by a factor of $k$ because ONS must simultaneously determine $k$ betting fractions $\lambda_{1,t}, \ldots, \lambda_{k,t}$. For a more thorough discussion, see Section A.3.
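The martingale part of Theorem 4.1 rests on one fact: for any bet vector fixed before $\vec{Z}_t$ is revealed, $\mathbb{E}[1 + \langle \vec{\lambda}, \vec{Z}_t \rangle] = 1 + \langle \vec{\lambda}, (\Delta_1, \ldots, \Delta_k) \rangle$, which equals 1 when every $\Delta_i = 0$. The sketch below checks this numerically with a random fixed bet scaled to $\|\vec{\lambda}\|_1 = 1/2$, which keeps every payoff at least $1/2$ on $[-1, 1]^k$. That constraint, the uniform streams, and the name `mv_payoffs` are our assumptions; the feasible set used by Algorithm 1 is not shown here, and the sketch does not run ONS.

```python
import numpy as np

def mv_payoffs(k=50, n=20_000, shift=0.0, seed=1):
    """Draw n outcome vectors and return the payoffs S = 1 + <lam, Z> of Eq. (7).

    lam is fixed in advance (so it is predictable) and scaled to ||lam||_1 = 1/2.
    Each coordinate Z_i has variance 0.2 and mean shift * sign(lam_i), so the
    alternative moves every stream in the direction the bet favors.
    """
    rng = np.random.default_rng(seed)
    lam = rng.uniform(-1.0, 1.0, size=k)
    lam *= 0.5 / np.abs(lam).sum()
    half_width = np.sqrt(3 / 5)
    Z = rng.uniform(-half_width, half_width, size=(n, k)) + shift * np.sign(lam)
    return 1.0 + Z @ lam

null_payoffs = mv_payoffs(shift=0.0)  # population mean exactly 1: no growth on average
alt_payoffs = mv_payoffs(shift=0.1)   # population mean 1 + 0.1 * ||lam||_1 = 1.05
```

This check says nothing about how fast wealth grows; the extra factor of $k$ in Eq. (8) comes from ONS having to learn all $k$ coordinates of $\vec{\lambda}_t$ at once.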
To get around this limitation, one can place univariate bets $\lambda_t$ on linear transformations of the outcomes $\vec{Z}_t$ that effectively highlight which streams provide the most evidence against the null [33]. In other words, one can use the following payoff function

$$S_t = 1 + \lambda_t \cdot \langle u_t, \vec{Z}_t \rangle \tag{9}$$

where $|\lambda_t| \le 1/2$ and $\|u_t\|_1 \le 1$ for all $t \ge 1$. The sequence $(\lambda_t)_{t \ge 1}$ is determined by ONS, while the sequence $(u_t)_{t \ge 1}$ is determined by the follow the regularized leader (FTRL) algorithm recalled in Algorithm 2, run on the stream $((\vec{Z}_t, -\vec{Z}_t))_{t \ge 1}$. The resulting process $(W^{\text{ftrl}}_t)_{t \ge 1}$ is a $P$-martingale under the global null; thus $\phi^{\text{ftrl}} \equiv (\phi^{\text{ftrl}}_t)_{t \ge 1}$, with $\phi^{\text{ftrl}}_t = \mathbb{1}[W^{\text{ftrl}}_t \ge 1/\alpha]$, is a level-$\alpha$ sequential test. Moreover, it has an expected stopping time guarantee identical to that of $\phi^{\text{bonf}}$.

Theorem 4.2 (Stopping time of $\phi^{\text{ftrl}}$). The stochastic process $W^{\text{ftrl}}$ is a $P$-martingale for any $P \in H_0^{\text{global}}$. Thus, the test $\phi^{\text{ftrl}}$ is a level-$\alpha$ sequential test for $H_0^{\text{global}}$. Moreover, under $H_1^{\text{global}}$, its expected stopping time obeys

$$\mathbb{E}[\tau] = O\left(\frac{1}{\Delta_m^2} \ln \frac{k}{\alpha \Delta_m^2}\right). \tag{10}$$

A proof is provided in Section A.4. The bound is identical to Bonferroni's, having the same dependence on $k$ and $\Delta_m$, and thus inherits the limitations discussed earlier. In conclusion, the two natural multivariate approaches outlined in this subsection do no better than $\phi^{\text{bonf}}$. In the following sections, we introduce sequential tests that do better by achieving faster wealth growth.

5 Powerful Global Tests via Merging
We will now take a different approach and leverage recent advances in merging martingales [39, 26] to construct sequential tests for $H_0^{\text{global}}$. Recall that each wealth process, $W_i^{\text{ons}}$, quantifies evidence against its null hypothesis, $H_{i,0}$, with larger wealth indicating stronger evidence.
These processes also provide evidence against the global null, and thus can be combined into a single process that tracks evidence against $H_0^{\text{global}}$. With appropriate merging strategies, the merged process is a $P$-martingale under $H_0^{\text{global}}$, and so can be used to develop level-$\alpha$ sequential tests. The Bonferroni test in Eq. (5) provides one example of merging evidence, through the maximal stream-specific wealth, but other strategies have remained unexplored. We will now study two merging strategies that use averages and products, highlighting their complementary strengths, before developing a third, combined strategy that inherits both their strengths.

5.1 Product Wealth Process
Given wealth processes $\{W_i^{\text{ons}}\}_{i=1}^k$, we define the product wealth process as $W^{\text{prod}} \equiv (W^{\text{prod}}_t)_{t \ge 1}$, where $W^{\text{prod}}_t$ is the product of the $k$ individual wealth processes at each time. In other words,

$$W^{\text{prod}}_t = \prod_{i=1}^k W^{\text{ons}}_{i,t}. \tag{11}$$

We now show that $W^{\text{prod}}$ is a $P$-martingale under $H_0^{\text{global}}$ and thus the test $\phi^{\text{prod}} \equiv (\phi^{\text{prod}}_t)_{t \ge 1}$, where $\phi^{\text{prod}}_t = \mathbb{1}[W^{\text{prod}}_t \ge 1/\alpha]$, is level-$\alpha$. We also provide an upper bound on the expected stopping time under the alternative.

Theorem 5.1 (Stopping time of $\phi^{\text{prod}}$). The stochastic process $W^{\text{prod}}$ is a $P$-martingale for any $P \in H_0^{\text{global}}$. Thus, $\phi^{\text{prod}}$ is a level-$\alpha$ sequential test for $H_0^{\text{global}}$. Moreover, under $H_1^{\text{global}}$, its expected stopping time obeys $\mathbb{E}[\tau] = O(T_{\text{prod}})$, where

$$T_{\text{prod}} = \frac{1}{\Delta_s} \ln \frac{1}{\alpha} + \frac{k}{\Delta_s} \ln \frac{k}{\Delta_s} + \frac{k}{\Delta_s^2} \ln \frac{k}{\Delta_s^2}. \tag{12}$$

A proof is provided in Section A.5 with all constant factors made explicit. A few remarks are in order. For simplicity, consider $\gamma \in [k]$ and suppose that under the alternative the first $\gamma$ means are nonzero, i.e., $\Delta_i = \epsilon \neq 0$ when $i \le \gamma$, and $\Delta_i = 0$ otherwise. In this setting, $\Delta_s = \gamma \epsilon^2$, and Eq. (12) evaluates to

$$\mathbb{E}[\tau] = O\left(\frac{1}{\gamma \epsilon^2} \ln \frac{1}{\alpha} + \frac{k}{\gamma \epsilon^2} \ln \frac{k}{\gamma \epsilon^2} + \frac{k}{\gamma^2 \epsilon^4} \ln \frac{k}{\gamma^2 \epsilon^4}\right).$$

Dense alternative $\gamma = k$.
Here, in the large-$k$ and small-$\alpha$ regime, the dominating term is $\frac{1}{k} \ln(1/\alpha)$. This is significantly better than the $\ln k$ dependence in the Bonferroni bound in Proposition 4.1. As each wealth process grows, their product $W^{\text{prod}}$ provides overwhelming evidence against $H_0^{\text{global}}$, enabling rapid rejection.

Sparse alternative $\gamma = 1$. Here, when $k$ is large and $\alpha$ small, the dominating terms reduce to $k \ln k + \ln(1/\alpha)$. This is significantly worse than the dependence of the Bonferroni bound. In this case, the growth of $W^{\text{prod}}$ due to the single non-null stream is heavily inhibited by the remaining streams fluctuating around 1.

5.2 Average Wealth Process
Given the processes $\{W_i^{\text{ons}}\}_{i=1}^k$, we define the average wealth process as $W^{\text{ave}} \equiv (W^{\text{ave}}_t)_{t \ge 1}$, where

$$W^{\text{ave}}_t = \frac{1}{k} \sum_{i=1}^k W^{\text{ons}}_{i,t}. \tag{13}$$

At each time $t$, $W^{\text{ave}}_t$ is the mean of the $k$ individual wealth processes, which corresponds to allocating a $1/k$ fraction of the initial wealth to each stream. We now show that $W^{\text{ave}}$ is a $P$-martingale for any $P \in H_0^{\text{global}}$ and thus the test $\phi^{\text{ave}} \equiv (\phi^{\text{ave}}_t)_{t \ge 1}$, with $\phi^{\text{ave}}_t = \mathbb{1}[W^{\text{ave}}_t \ge 1/\alpha]$, is level-$\alpha$. Moreover, we provide an upper bound on the expected stopping time under the alternative.

Theorem 5.2 (Stopping time of $\phi^{\text{ave}}$). The stochastic process $W^{\text{ave}}$ is a $P$-martingale for any $P \in H_0^{\text{global}}$. Thus, the test $\phi^{\text{ave}}$ is a level-$\alpha$ sequential test for $H_0^{\text{global}}$. Moreover, under $H_1^{\text{global}}$, its expected stopping time obeys

$$\mathbb{E}[\tau] = O(\min\{T, T_{\text{bonf}}\}) \tag{14}$$

where

$$T = \frac{k}{\Delta_s} \ln \frac{1}{\alpha} + \frac{k}{\Delta_s} \ln \frac{k}{\Delta_s} + \frac{k}{\Delta_s^2} \ln \frac{k}{\Delta_s^2} \quad \text{and} \quad T_{\text{bonf}} = \frac{1}{\Delta_m^2} \ln \frac{k}{\alpha \Delta_m^2}. \tag{15}$$

A proof is provided in Section A.6. Again, we pause to discuss how the bound behaves as the number of streams with nonzero means changes. As earlier, consider $\gamma \in [k]$, and suppose under the alternative that $\Delta_i = \epsilon$ when $i \le \gamma$, and $\Delta_i = 0$ otherwise.
Then $T$ evaluates to

$$T = O\left(\frac{k}{\gamma \epsilon^2} \ln \frac{1}{\alpha} + \frac{k}{\gamma \epsilon^2} \ln \frac{k}{\gamma \epsilon^2} + \frac{k}{\gamma^2 \epsilon^4} \ln \frac{k}{\gamma^2 \epsilon^4}\right),$$

while $T_{\text{bonf}} = \frac{1}{\epsilon^2} \ln \frac{k}{\alpha \epsilon^2}$ for every $\gamma$.

Dense alternative $\gamma = k$. Here, when $k$ is large and $\alpha$ is small, $T_{\text{bonf}}$ and $T$ are dominated by $\ln(k/\alpha)$ and $\ln(1/\alpha)$, respectively. Hence, the upper bound is dominated by $\ln(1/\alpha)$. In this case, $\phi^{\text{ave}}$ does provide a benefit over $\phi^{\text{bonf}}$, just not to the same extent as $\phi^{\text{prod}}$.

Sparse alternative $\gamma = 1$. Here, in the large-$k$ and small-$\alpha$ regime, $T$ is dominated by $k \ln(1/\alpha)$, whereas $T_{\text{bonf}}$ is dominated by $\ln(k/\alpha)$. Hence, the upper bound matches that of $\phi^{\text{bonf}}$. This happens because only $W_1^{\text{ons}}$ grows over time, while the other wealth processes $W_2^{\text{ons}}, \ldots, W_k^{\text{ons}}$ fluctuate around small values. In turn, the average wealth, $W^{\text{ave}}_t \approx \frac{1}{k} W^{\text{ons}}_{1,t}$, remains close to the scaled maximum wealth $(\frac{1}{k} \max_{i \in [k]} W^{\text{ons}}_{i,t})_{t \ge 1}$, which is the process tracked by $\phi^{\text{bonf}}$ (comparing it with $1/\alpha$ is equivalent to Eq. (5)). Thus, $\phi^{\text{ave}}$ behaves almost identically to $\phi^{\text{bonf}}$. This discussion indicates that $\phi^{\text{ave}}$ is effective when evidence is concentrated in only a few streams, and performs well, though not as well as $\phi^{\text{prod}}$, when many streams contain evidence against the null.

Table 1: Comparison of the expected stopping time upper bounds, showing the dominating terms and their dependence on $k$ and $\alpha$ for large $k$ and small $\alpha$.

Test | Sparse alternative ($\gamma = 1$) | Dense alternative ($\gamma = k$)
Bonferroni $\phi^{\text{bonf}}$ (Proposition 4.1) | $\ln(1/\alpha) + \ln k$ | $\ln(1/\alpha) + \ln k$
MV-ONS $\phi^{\text{mv-ons}}$ (Theorem 4.1) | $k \ln(1/\alpha) + k \ln k$ | $k \ln(1/\alpha) + k \ln k$
FTRL $\phi^{\text{ftrl}}$ (Theorem 4.2) | $\ln(1/\alpha) + \ln k$ | $\ln(1/\alpha) + \ln k$
Product $\phi^{\text{prod}}$ (Theorem 5.1) | $\ln(1/\alpha) + k \ln k$ | $\frac{1}{k} \ln(1/\alpha)$
Average $\phi^{\text{ave}}$ (Theorem 5.2) | $\ln(1/\alpha) + \ln k$ | $\ln(1/\alpha)$
Balanced $\phi^{\text{balance}}$ (Theorem 5.3) | $\ln(1/\alpha) + \ln k$ | $\frac{1}{k} \ln(1/\alpha)$

Table 1 compares the tests in the large-$k$, small-$\alpha$ regime, showing the complementary strengths of $\phi^{\text{ave}}$ and $\phi^{\text{prod}}$.
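The regimes in Table 1 can be reproduced by plugging a mean vector into the bound expressions of Eqs. (6), (12), and (15) with every hidden constant set to 1, which also shows how $\Delta_m$ and $\Delta_s$ are computed from the stream means. Because the constants are dropped, the numbers only indicate how the bounds scale, not actual stopping times; the helper name `bound_terms` and the choice $k = 250$, $\epsilon = 0.1$, $\alpha = 0.01$ are illustrative.

```python
import numpy as np

def bound_terms(deltas, alpha):
    """Constant-free versions of the stopping-time bounds for a vector of stream means."""
    d = np.asarray(deltas, dtype=float)
    k = d.size
    dm = np.max(np.abs(d))          # Delta_m
    ds = np.sum(d ** 2)             # Delta_s
    t_bonf = np.log(k / (alpha * dm**2)) / dm**2                     # Eq. (6)
    shared = k / ds * np.log(k / ds) + k / ds**2 * np.log(k / ds**2)
    t_prod = np.log(1 / alpha) / ds + shared                         # Eq. (12)
    t_ave = k / ds * np.log(1 / alpha) + shared                      # T in Eq. (15)
    return {
        "bonf": t_bonf,
        "prod": t_prod,
        "ave": min(t_ave, t_bonf),          # Theorem 5.2
        "balance": min(t_prod, t_bonf),     # Theorem 5.3
    }

k, eps, alpha = 250, 0.1, 0.01
sparse = bound_terms([eps] + [0.0] * (k - 1), alpha)   # gamma = 1
dense = bound_terms([eps] * k, alpha)                  # gamma = k
```

With these values the dense case orders the bounds as product < average < Bonferroni, while in the sparse case the product bound is orders of magnitude larger and the balanced bound falls back to Bonferroni's.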
We will now construct a wealth process that balances the strengths of $W^{\text{ave}}$ and $W^{\text{prod}}$, producing a test that performs well for both sparse and dense alternatives.

5.3 A Balanced Sequential Test
Given the two wealth processes $W^{\text{prod}}$ and $W^{\text{ave}}$, the balanced process $W^{\text{balance}} \equiv (W^{\text{balance}}_t)_{t \ge 1}$ is defined as

$$W^{\text{balance}}_t = \frac{1}{2} W^{\text{ave}}_t + \frac{1}{2} W^{\text{prod}}_t \tag{16}$$

with associated test $\phi^{\text{balance}} \equiv (\phi^{\text{balance}}_t)_{t \ge 1}$, where $\phi^{\text{balance}}_t = \mathbb{1}[W^{\text{balance}}_t \ge 1/\alpha]$. Since $W^{\text{ave}}$ and $W^{\text{prod}}$ are both $P$-martingales for any $P \in H_0^{\text{global}}$, their average is as well. The next result shows that $\phi^{\text{balance}}$ is a level-$\alpha$ sequential test for $H_0^{\text{global}}$, and provides a bound on its expected stopping time under the alternative.

Theorem 5.3 (Stopping time of $\phi^{\text{balance}}$). The test $\phi^{\text{balance}}$ is a level-$\alpha$ sequential test for $H_0^{\text{global}}$. Moreover, under $H_1^{\text{global}}$, its expected stopping time obeys $\mathbb{E}[\tau] \le \min\{T_{\text{prod}}, T_{\text{bonf}}\}$.

A proof is provided in Section A.7. Crucially, when the alternative is dense, i.e., evidence is dispersed across a majority of streams, $\phi^{\text{balance}}$ behaves like $\phi^{\text{prod}}$. On the other hand, when the alternative is sparse, i.e., evidence is contained in just a few streams, $\phi^{\text{balance}}$ behaves like $\phi^{\text{ave}}$ (and $\phi^{\text{bonf}}$). In this way, the guarantee for $\phi^{\text{balance}}$ matches the better of the product and Bonferroni guarantees, regardless of the sparsity of the alternative. Table 1 presents the stopping time guarantees for the two sparsity regimes across all the tests discussed, showing that $\phi^{\text{balance}}$ obtains the best of both worlds.

Before moving on, we note that this is not the only way of merging processes with improved expected stopping times. For instance, one could construct a test that rejects whenever either the product test or the Bonferroni test, each run at level $\alpha/2$, rejects, which is valid by a union bound argument. In other words, it rejects whenever $\max\{W^{\text{prod}}_t, W^{\text{bonf}}_t\} \ge 2/\alpha$, where $W^{\text{bonf}}_t = \frac{1}{k} \max_{i \in [k]} W^{\text{ons}}_{i,t}$.
However, this test has the same expected stopping time bound as $\phi^{\mathrm{balance}}$. Moreover, its effective wealth process, $\frac{1}{2}\max\{W^{\mathrm{prod}}_t, W^{\mathrm{bonf}}_t\}$, is never higher than $W^{\mathrm{balance}}_t$, since $W^{\mathrm{ave}}_t \ge W^{\mathrm{bonf}}_t$.

6 Experiments

We now validate our theory on synthetic and real-world data, focusing on how different proportions of non-null streams affect the stopping times of the different tests.

[Figure 2, panels (a) $k_1/k = 0.05$, (b) $k_1/k = 0.30$, (c) $k_1/k = 0.75$.
Top row: distributions of stopping times for $\phi^{\mathrm{bonf}}$, $\phi^{\mathrm{ftrl}}$, $\phi^{\mathrm{prod}}$, $\phi^{\mathrm{ave}}$, and $\phi^{\mathrm{balance}}$.
Bottom row: growth of the wealth processes $\frac{1}{k}\max_{i\in[k]} W^{\mathrm{ons}}_i$, $W^{\text{mv-ons}}$, $W^{\mathrm{ftrl}}$, $W^{\mathrm{prod}}$, $W^{\mathrm{ave}}$, and $W^{\mathrm{balance}}$ over time, on a log-scale wealth axis with threshold $1/\alpha = 100$.]

Figure 2: Top: Distribution of stopping times, over 1,000 simulations, for various sequential tests across settings with varying proportions of streams with nonzero means. A test rejects when its corresponding wealth process exceeds $1/\alpha$ for $\alpha = 0.01$. The dashed vertical line is the empirical mean of the stopping times. Bottom: Trajectories of various wealth processes across settings with different proportions of nonzero means. Each line represents the median trajectory of a wealth process over 1,000 simulations, with shaded areas indicating the 25% and 75% quantiles. The y-axis is presented on a logarithmic scale. Wealth processes are clipped to $10^{-3}$ for visualization purposes.
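The merged processes of Section 5 and the synthetic streams described in Section 6.1 below can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation:
- The per-stream bets use a common Online Newton Step (ONS) update with step constant $2/(2-\ln 3)$, clipped to $[-1/2, 1/2]$. It may differ from Algorithm 1 in Appendix C.
- Non-null streams get mean $+0.1$; the text fixes only $|\mathbb{E}_{P_i}[Z]| = 0.1$.
- All function names are hypothetical.

Everything is kept in log space, since $W^{\mathrm{prod}}_t$ overflows quickly for $k = 250$.

```python
import math

import numpy as np


def uniform_bounds(mean, var):
    """Endpoints (a, b) of a Uniform(a, b) law with the given mean and variance."""
    half = np.sqrt(3.0 * var)  # Var[Uniform(a, b)] = (b - a)^2 / 12
    return mean - half, mean + half


def synthetic_streams(k=250, frac=0.05, T=350, mean=0.1, var=0.2, seed=0):
    """(k, T) observations: the first int(frac * k) streams are non-null.

    Non-null streams have mean +`mean` (sign assumed); null streams are
    Uniform(-sqrt(3/5), sqrt(3/5)), i.e. mean 0 and variance 0.2.
    """
    rng = np.random.default_rng(seed)
    means = np.zeros(k)
    means[: int(frac * k)] = mean
    a, b = uniform_bounds(means, var)
    return rng.uniform(a[:, None], b[:, None], size=(k, T)), means


def log_ons_wealth(z):
    """log W_t = sum_{j<=t} log(1 + lam_j z_j) for one stream with z_j in [-1, 1].

    lam_j depends only on z_1, ..., z_{j-1} (assumed ONS update, clipped to [-1/2, 1/2]).
    """
    lam, acc, logw = 0.0, 1.0, 0.0
    step = 2.0 / (2.0 - math.log(3.0))
    out = np.empty(len(z))
    for t, zt in enumerate(z):
        logw += math.log1p(lam * zt)  # |lam * zt| <= 1/2, so wealth stays positive
        out[t] = logw
        g = -zt / (1.0 + lam * zt)    # derivative of -log(1 + lam * zt) in lam
        acc += g * g
        lam = min(max(lam - step * g / acc, -0.5), 0.5)
    return out


def merged_log_wealth(Z):
    """Log-wealth of W^bonf (= max_i W_i / k), W^ave, W^prod and W^balance."""
    k = Z.shape[0]
    L = np.vstack([log_ons_wealth(z) for z in Z])  # (k, T) per-stream log-wealth
    log_bonf = L.max(axis=0) - np.log(k)
    log_ave = np.logaddexp.reduce(L, axis=0) - np.log(k)
    log_prod = L.sum(axis=0)                       # log of a product = sum of logs
    log_balance = np.logaddexp(log_ave, log_prod) - np.log(2.0)
    return log_bonf, log_ave, log_prod, log_balance


def stopping_time(log_wealth, log_threshold):
    """First time t (1-indexed) with log-wealth >= log_threshold, else None."""
    hits = np.flatnonzero(log_wealth >= log_threshold)
    return int(hits[0]) + 1 if hits.size else None


if __name__ == "__main__":
    log_thr = np.log(1 / 0.01)  # alpha = 0.01
    for frac in (0.05, 0.30, 0.75):
        Z, _ = synthetic_streams(frac=frac, seed=1)
        lb, la, lp, lbal = merged_log_wealth(Z)
        print(frac, {
            "bonf": stopping_time(lb, log_thr),
            "ave": stopping_time(la, log_thr),
            "prod": stopping_time(lp, log_thr),
            "balance": stopping_time(lbal, log_thr),
            # union-bound alternative: reject when max{W^prod, W^bonf} >= 2 / alpha
            "union": stopping_time(np.maximum(lp, lb), log_thr + np.log(2.0)),
        })
```

Because $W^{\mathrm{balance}}_t \ge \frac{1}{2}\max\{W^{\mathrm{prod}}_t, W^{\mathrm{bonf}}_t\}$ holds pathwise, the balanced test in this sketch never stops later than the union-bound alternative on the same data.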
6.1 Synthetic

We begin with a synthetic example with $k = 250$ streams, of which $k_1 < k$ are non-null.
- For a fraction $k_1/k \in \{0.05, 0.30, 0.75\}$ of the streams, $Z_{i,t} \sim \mathrm{Uniform}(a, b)$, with $a$ and $b$ chosen such that $|\mathbb{E}_{P_i}[Z]| = 0.1$ and $\mathrm{Var}_{P_i}[Z] = 0.2$.
- For the remaining $k - k_1$ streams, $Z_{i,t} \sim \mathrm{Uniform}(-\sqrt{3/5}, \sqrt{3/5})$, yielding $\mathbb{E}_{P_i}[Z] = 0$ and $\mathrm{Var}_{P_i}[Z] = 0.2$.

We use $\phi^{\mathrm{bonf}}$, $\phi^{\mathrm{ftrl}}$, $\phi^{\mathrm{ave}}$, $\phi^{\mathrm{prod}}$, and $\phi^{\mathrm{balance}}$ to test the global null. Results for $\phi^{\text{mv-ons}}$ are omitted here because its poor performance obscures the presentation. They are presented in Section D, along with additional experiments for other values of $k$ and proportions $k_1/k$. This experiment is repeated 1,000 times.

Sparse alternative ($k_1/k = 0.01$). The stopping time distributions in Fig. 2a illustrate that when 1% of streams have nonzero means, $\phi^{\mathrm{bonf}}$, $\phi^{\mathrm{ave}}$, and $\phi^{\mathrm{ftrl}}$ have the smallest stopping times. In contrast, $\phi^{\mathrm{prod}}$ fails to reject even after 1,000 time steps (only the first 350 time steps are shown). Notably, the stopping time distribution of $\phi^{\mathrm{balance}}$, shown in the top panel, is nearly identical to those of $\phi^{\mathrm{bonf}}$, $\phi^{\mathrm{ave}}$, and $\phi^{\mathrm{ftrl}}$. The wealth processes tracked by these tests are nearly identical and grow at a similarly fast rate. In contrast, $W^{\mathrm{prod}}$ does not grow, so $\phi^{\mathrm{prod}}$ does not reject.

Moderate alternative ($k_1/k = 0.30$). When the proportion of streams with nonzero means increases to 30%, the behavior of the tests begins to change. The stopping time distributions in Fig. 2b show that all of the tests behave nearly identically. This finding is supported by the wealth trajectories in Fig. 2b, where all the wealth processes cross $1/\alpha$ around $t \approx 140$–$200$. The interesting finding is how the wealth processes grow. The wealth processes tracked by $\phi^{\mathrm{bonf}}$, $\phi^{\mathrm{ave}}$, $\phi^{\mathrm{ftrl}}$, and $\phi^{\mathrm{balance}}$ remain nearly identical and grow at the same rate.
In contrast, $W^{\mathrm{prod}}$ initially decreases, approaching zero, before rapidly increasing around $t \approx 160$.

Dense alternative ($k_1/k = 0.75$). When 75% of streams have nonzero means, the behavior of the tests and wealth processes changes significantly. The stopping time distributions in Fig. 2c show that $\phi^{\mathrm{prod}}$ has the smallest stopping times, with $\phi^{\mathrm{balance}}$ nearly matching it. On the other hand, $\phi^{\mathrm{bonf}}$, $\phi^{\mathrm{ftrl}}$, and $\phi^{\mathrm{ave}}$ have similar stopping times, all larger than those of $\phi^{\mathrm{prod}}$ and $\phi^{\mathrm{balance}}$. The wealth processes $W^{\mathrm{prod}}$ and $W^{\mathrm{balance}}$ are nearly identical: they briefly decrease towards zero before rapidly increasing and reaching $1/\alpha$ within $t \le 50$ steps. The wealth processes tracked by $\phi^{\mathrm{bonf}}$, $\phi^{\mathrm{ave}}$, and $\phi^{\mathrm{ftrl}}$ are all similar. They grow more slowly and gradually than $W^{\mathrm{prod}}$ and $W^{\mathrm{balance}}$, so these tests reject later.

6.2 Zero-shot medical image classification

We evaluate ConceptCLIP [21], the first medical vision–language pretraining model designed to perform accurate prediction tasks on medical images across diverse modalities. We assess whether the zero-shot ConceptCLIP model $f(X)$ is multiaccurate [17, 13] across $k = 10$ groups, i.e., whether the predictions of $f$ are unbiased on every group. The groups are defined by intersections of imaging modalities and anatomical regions. Examples include lung computed tomography (CT), chest X-ray, breast mammogram, and retinal optical coherence tomography (OCT). Additional details regarding the datasets are in Section D. Specifically, we consider $k$ streams of medical image–label pairs $(X_{i,t}, Y_{i,t})_{t\ge1}$, where $(X_{i,t}, Y_{i,t}) \sim P_i$ for each group $i \in [k]$, and test the global null hypothesis
$$H^{\mathrm{global}}_0:\ \forall i \in [k],\ \mathbb{E}_{P_i}[Z] = 0, \tag{17}$$
where $Z = f(X) - Y$. Empirical estimates of $\mathbb{E}_{P_i}[Z]$ for all groups are displayed in Table 3, provided in Section D. In this experiment, we are interested in which tests reject the null the fastest.
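To make concrete how the $k$ streams tested in (17) are formed, the sketch below routes hypothetical time-ordered (group, prediction, label) records into per-group streams of $Z = f(X) - Y$. Three things are assumptions for illustration, not details from the experiment:
- the record layout;
- the name `group_streams`;
- predictions $f(X) \in [0, 1]$ and labels $Y \in \{0, 1\}$.

Under these assumptions, every $Z$ lies in $[-1, 1]$, as the betting tests require.

```python
from collections import defaultdict


def group_streams(records, k):
    """Split time-ordered (group, prediction, label) records into k streams of Z = f(X) - Y.

    Hypothetical helper: assumes predictions in [0, 1], labels in {0, 1},
    and group ids in {0, ..., k - 1}.
    """
    streams = defaultdict(list)
    for group, pred, label in records:
        if not 0 <= group < k:
            raise ValueError(f"unknown group id {group}")
        z = pred - label
        if not -1.0 <= z <= 1.0:
            raise ValueError("Z = f(X) - Y must lie in [-1, 1] for the betting tests")
        streams[group].append(z)
    return [streams[i] for i in range(k)]


if __name__ == "__main__":
    records = [(0, 0.9, 1), (1, 0.2, 0), (0, 0.4, 1), (2, 1.0, 1)]
    print(group_streams(records, k=3))
```

Each returned list is one stream $(Z_{i,t})_{t \ge 1}$ that can be fed to any of the per-stream wealth processes.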
Zero-shot multiaccuracy. From Table 3, we see that ConceptCLIP makes biased predictions on a majority of groups, as $|\mathbb{E}_{P_i}[Z]| \ge 0.05$ for eight of ten groups. The stopping times and wealth processes of the various tests, displayed in Fig. 3a, support this finding. The tests $\phi^{\mathrm{prod}}$ and $\phi^{\mathrm{balance}}$ reject the quickest, on average, and their wealth processes grow rapidly. In contrast, $\phi^{\text{mv-ons}}$, $\phi^{\mathrm{ftrl}}$, and $\phi^{\mathrm{ave}}$ reject at later times. This is reflected in their wealth processes, which grow more slowly than $W^{\mathrm{prod}}$ and $W^{\mathrm{balance}}$.

Multiaccuracy adjustment. For illustrative purposes, we post-process the predictions of ConceptCLIP to be multiaccurate [13] on all images except those of colon endoscopy. This simulates a sparse alternative where only one stream has a nonzero mean. That is, $\mathbb{E}_{P_i}[Z] \approx 0$ for all groups except colon endoscopy images, for which $\mathbb{E}_{P_i}[Z] \approx -0.09$. Fig. 3b displays the stopping times and wealth processes of the various tests. As expected, $\phi^{\mathrm{bonf}}$ and $\phi^{\mathrm{ave}}$ reject the quickest, as their wealth processes grow the fastest. Importantly, $\phi^{\mathrm{balance}}$ nearly matches the performance of these tests. The remaining tests also reject, just at later times, as their wealth processes grow more gradually.

[Figure 3a: Zero-shot ConceptCLIP multiaccuracy testing results.
Left: distributions of stopping times for $\phi^{\mathrm{bonf}}$, $\phi^{\text{mv-ons}}$, $\phi^{\mathrm{ftrl}}$, $\phi^{\mathrm{prod}}$, $\phi^{\mathrm{ave}}$, and $\phi^{\mathrm{balance}}$.
Right: growth of the wealth processes $\frac{1}{k}\max_{i\in[k]} W^{\mathrm{ons}}_i$, $W^{\text{mv-ons}}$, $W^{\mathrm{ftrl}}$, $W^{\mathrm{prod}}$, $W^{\mathrm{ave}}$, and $W^{\mathrm{balance}}$, on a log-scale wealth axis with threshold $1/\alpha = 100$.]
[Figure 3b: Post-processed ConceptCLIP multiaccuracy testing results.
Left: distributions of stopping times for $\phi^{\mathrm{bonf}}$, $\phi^{\text{mv-ons}}$, $\phi^{\mathrm{ftrl}}$, $\phi^{\mathrm{prod}}$, $\phi^{\mathrm{ave}}$, and $\phi^{\mathrm{balance}}$.
Right: growth of the same wealth processes, on a log-scale wealth axis with threshold $1/\alpha = 100$.]

Figure 3: Left plot of each figure: Distribution of stopping times, over 1,000 runs, for various sequential tests. A test rejects when its corresponding wealth process exceeds $1/\alpha$ for $\alpha = 0.01$. The dashed vertical line is the empirical mean of the stopping times. Right plot of each figure: Various wealth process trajectories. Each line represents the median trajectory of a wealth process over 1,000 runs, with shaded areas indicating the 25% and 75% quantiles.

7 Conclusion

We studied a testing problem with several parallel streams of data reflecting the performance of an ML system, where the goal is to raise an alarm whenever any of the streams shows unusual behavior. We modeled this as a sequential hypothesis testing problem, gathering evidence against the global null hypothesis that all data streams have mean zero. Using tools for mean testing on a single stream, we demonstrated that natural extensions of existing algorithms (such as Bonferroni corrections and multivariate ONS) fail to produce efficient tests. This is mainly because they cannot adapt to the number of streams with nonzero means. We therefore studied additional tests that rely on different strategies for merging evidence from multiple streams. Building on our theoretical findings, we designed a balanced test with improved expected rejection times. It matches the fastest of the other tests in both the sparse and dense regimes while controlling type-I error.
Importantly, our balanced test draws on the benefits of both the product and average merging processes without having to know in advance the kind of alternative (sparse or dense). Finally, we empirically validated our results on both synthetic and real data, showing that our theoretical conclusions lead to faster and improved practical methods for flagging erroneous behavior in multi-stream models.

Impact Statement

This paper presents work whose goal is to advance the field of machine learning auditing and multiple hypothesis testing. The full societal impact of our work is complex and difficult to predict. Yet, we hope that our efforts towards developing efficient tools to monitor machine learning predictors contribute to their responsible use in society.

Acknowledgements

This work was supported by NIH award R01CA287422.

References

[1] Jack Bandy. “Problematic machine behavior: A systematic literature review of algorithm audits”. In: Proceedings of the ACM on Human-Computer Interaction 5.CSCW1 (2021), pp. 1–34.
[2] Sarah H Cen and Rohan Alur. “From transparency to accountability and back: A discussion of access and evidence in ai auditing”. In: Proceedings of the 4th ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization. 2024, pp. 1–14.
[3] Can Chen and Jun-Kun Wang. “Online Detection of LLM-Generated Texts via Sequential Hypothesis Testing by Betting”. In: Forty-second International Conference on Machine Learning. 2025. url: https://openreview.net/forum?id=khFk7sdv9o.
[4] Brian Cho, Kyra Gan, and Nathan Kallus. “Peeking with PEAK: sequential, nonparametric composite hypothesis tests for means of multiple data streams”. In: Proceedings of the 41st International Conference on Machine Learning. 2024, pp. 8487–8509.
[5] Ben Chugg et al. “Auditing fairness by betting”. In: Advances in Neural Information Processing Systems 36 (2023), pp. 6070–6091.
[6] European Commission. Proposal for a Regulation Laying Down Harmonised Rules on Artificial Intelligence (Artificial Intelligence Act). European Union, 2021.
[7] Thomas M Cover. “Universal portfolios”. In: Mathematical finance 1.1 (1991), pp. 1–29.
[8] Miklós Csörgő and Lajos Horváth. “20 nonparametric methods for changepoint problems”. In: Handbook of statistics 7 (1988), pp. 403–425.
[9] Ashok Cutkosky and Francesco Orabona. “Black-box reductions for parameter-free online learning in banach spaces”. In: Conference On Learning Theory. PMLR. 2018, pp. 1493–1529.
[10] Jessica Dai et al. “From Individual Experience to Collective Evidence: A Reporting-Based Framework for Identifying Systemic Harms”. In: Forty-second International Conference on Machine Learning. 2025.
[11] Jeffrey Dastin. “Amazon scraps secret AI recruiting tool that showed bias against women”. In: Ethics of data and analytics. Auerbach Publications, 2022, pp. 296–299.
[12] Elad Hazan, Amit Agarwal, and Satyen Kale. “Logarithmic regret algorithms for online convex optimization”. In: Machine Learning 69.2 (2007), pp. 169–192.
[13] Ursula Hébert-Johnson et al. “Multicalibration: Calibration for the (computationally-identifiable) masses”. In: International Conference on Machine Learning. PMLR. 2018, pp. 1939–1948.
[14] Wassily Hoeffding. “Probability inequalities for sums of bounded random variables”. In: Journal of the American statistical association 58.301 (1963), pp. 13–30.
[15] Adel Javanmard and Andrea Montanari. “Online rules for control of false discovery rate and false discovery exceedance”. In: The Annals of statistics 46.2 (2018), pp. 526–554.
[16] Emilie Kaufmann and Wouter M Koolen. “Mixture martingales revisited with applications to sequential tests and confidence intervals”. In: Journal of Machine Learning Research 22.246 (2021), pp. 1–44.
[17] Michael P Kim, Amirata Ghorbani, and James Zou.
“Multiaccuracy: Black-box post-processing for fairness in classification”. In: Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society. 2019, pp. 247–254.
[18] Colin McDiarmid et al. “On the method of bounded differences”. In: Surveys in combinatorics 141.1 (1989), pp. 148–188.
[19] Danaë Metaxa et al. “Auditing algorithms: Understanding algorithmic systems from the outside in”. In: Foundations and Trends® in Human–Computer Interaction 14.4 (2021), pp. 272–344.
[20] Jerzy Neyman and Egon Sharpe Pearson. “IX. On the problem of the most efficient tests of statistical hypotheses”. In: Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character 231.694-706 (1933), pp. 289–337.
[21] Yuxiang Nie et al. “Conceptclip: Towards trustworthy medical ai via concept-enhanced contrastive langauge-image pre-training”. In: arXiv e-prints (2025), arXiv–2501.
[22] Ziad Obermeyer and Sendhil Mullainathan. “Dissecting racial bias in an algorithm that guides health decisions for 70 million people”. In: Proceedings of the conference on fairness, accountability, and transparency. 2019, pp. 89–89.
[23] Francesco Orabona. A Modern Introduction to Online Learning. 2025. arXiv: 1912.13213 [cs.LG]. url: https://arxiv.org/abs/1912.13213.
[24] Francesco Orabona and Kwang-Sung Jun. “Tight concentrations and confidence sequences from the regret of universal portfolio”. In: IEEE Transactions on Information Theory 70.1 (2023), pp. 436–455.
[25] Inioluwa Deborah Raji and Joy Buolamwini. “Actionable auditing: Investigating the impact of publicly naming biased performance results of commercial ai products”. In: Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society. 2019, pp. 429–435.
[26] Aaditya Ramdas and Ruodu Wang. “Hypothesis Testing with E-values”.
In: Foundations and Trends® in Statistics 1.1–2 (July 2025), pp. 1–390. issn: 2978-4220. doi: 10.1561/3600000002. url: http://dx.doi.org/10.1561/3600000002.
[27] Aaditya Ramdas et al. “Game-theoretic statistics and safe anytime-valid inference”. In: Statistical Science 38.4 (2023), pp. 576–601.
[28] Leo Richter et al. “An Auditing Test to Detect Behavioral Shift in Language Models”. In: The Thirteenth International Conference on Learning Representations. 2025. url: https://openreview.net/forum?id=h0jdAboh0o.
[29] Cynthia Rudin, Caroline Wang, and Beau Coker. “The age of secrecy and unfairness in recidivism prediction”. In: Harvard Data Science Review 2.1 (2020), p. 1.
[30] J Jon Ryu and Alankrita Bhatt. “On confidence sequences for bounded random processes via universal gambling strategies”. In: IEEE Transactions on Information Theory (2024).
[31] Glenn Shafer. “Testing by betting: A strategy for statistical and scientific communication”. In: Journal of the Royal Statistical Society Series A: Statistics in Society 184.2 (2021), pp. 407–431.
[32] K Shailaja, Banoth Seetharamulu, and MA Jabbar. “Machine learning in healthcare: A review”. In: 2018 Second international conference on electronics, communication and aerospace technology (ICECA). IEEE. 2018, pp. 910–914.
[33] Shubhanshu Shekhar and Aaditya Ramdas. “Nonparametric two-sample testing by betting”. In: IEEE Transactions on Information Theory 70.2 (2023), pp. 1178–1203.
[34] Padma Susarla, Dexter Purnell, and Ken Scott. “Zillow’s artificial intelligence failure and its impact on perceived trust in information systems”. In: Journal of Information Technology Teaching Cases (2024), p. 20438869241279865.
[35] UNESCO. Recommendation on the Ethics of Artificial Intelligence. 2021.
[36] Briana Vecchione, Karen Levy, and Solon Barocas.
“Algorithmic auditing and social justice: Lessons from the history of audit studies”. In: Proceedings of the 1st ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization. 2021, pp. 1–9.
[37] Jean Ville. Etude critique de la notion de collectif. Vol. 3. Gauthier-Villars Paris, 1939.
[38] Vladimir Vovk and Ruodu Wang. “E-values: Calibration, combination and applications”. In: The Annals of Statistics 49.3 (2021), pp. 1736–1754.
[39] Ruodu Wang. “The only admissible way of merging arbitrary e-values”. In: Biometrika 112.2 (2025), asaf020.
[40] Ian Waudby-Smith and Aaditya Ramdas. “Estimating means of bounded random variables by betting”. In: Journal of the Royal Statistical Society Series B: Statistical Methodology 86.1 (2024), pp. 1–27.
[41] Ian Waudby-Smith, Ricardo Sandoval, and Michael I. Jordan. Universal Log-Optimality for General Classes of e-processes and Sequential Hypothesis Tests. 2025. arXiv: 2504.02818 [math.ST]. url: https://arxiv.org/abs/2504.02818.
[42] WHO. The Blueprint for an AI Bill of Rights. 2022. url: https://www.whitehouse.gov/ostp/ai-bill-of-rights.
[43] Liyan Xie et al. “Sequential (quickest) change detection: Classical results and new directions”. In: IEEE Journal on Selected Areas in Information Theory 2.2 (2021), pp. 494–514.
[44] Ziyu Xu and Aaditya Ramdas. “Online multiple testing with e-values”. In: International Conference on Artificial Intelligence and Statistics. PMLR. 2024, pp. 3997–4005.
[45] Tijana Zrnic, Aaditya Ramdas, and Michael I Jordan. “Asynchronous online testing of multiple hypotheses”. In: Journal of Machine Learning Research 22.33 (2021), pp. 1–39.

A Proofs

For the following proofs, we recall our setting established in Section 2, introduce some additional notation, and define some additional properties of stochastic processes.

Setting, Notation, & Definitions.
We have $k$ parallel data streams $(Z_{1,t})_{t\ge1}, \dots, (Z_{k,t})_{t\ge1}$. For each $i \in [k]$, the $i$th stream consists of i.i.d. points, and we denote its mean by $\Delta_i = \mathbb{E}[Z_{i,1}]$. Moreover, we define $\Delta_m = \max_{i\in[k]} |\Delta_i|$ and $\Delta_s = \sum_{i=1}^k \Delta_i^2$. Also, for all $t \ge 1$, we denote by $\vec{Z}_t = (Z_{1,t}, \dots, Z_{k,t}) \in \mathbb{R}^k$ the $k$-dimensional vector of all the outcomes $Z_{1,t}, \dots, Z_{k,t}$ at time $t$. To analyze properties of these vectors we use norms. Let $\|\cdot\|_1$ be the 1-norm on $\mathbb{R}^k$; its dual norm is $\|\cdot\|_\infty$. In our setting, $\|\vec{Z}_t\|_\infty \le 1$ for all $t \ge 1$. Moreover, by definition, $\|\mathbb{E}[\vec{Z}_t]\|_\infty = \Delta_m$. Under the null $H^{\mathrm{global}}_0$, $\Delta_i$, $\Delta_m$, and $\Delta_s$ all equal zero. Under the alternative $H^{\mathrm{global}}_1$, there exists $i \in [k]$ such that $\Delta_i \ne 0$; as a result, $\Delta_m \ne 0$ and $\Delta_s \ne 0$. A stochastic process $M \equiv (M_t)_{t\ge1}$ is adapted to the stream $(\vec{Z}_t)_{t\ge1}$ if $\mathbb{E}_P[M_t \mid \vec{Z}_1, \dots, \vec{Z}_t] = M_t$ for all $t \ge 1$, and predictable if $\mathbb{E}_P[M_t \mid \vec{Z}_1, \dots, \vec{Z}_{t-1}] = M_t$ for all $t \ge 1$.

A.1 Proof of level-$\alpha$ control and stopping time of $\phi^{\mathrm{ons}}_i$

Proof. We first show that $\phi^{\mathrm{ons}}_i \equiv (\phi^{\mathrm{ons}}_{i,t})_{t\ge1}$ is a level-$\alpha$ sequential test for $H_{i,0}$. Then we derive an upper bound on its expected stopping time. The proof technique comes from [5].

Level-$\alpha$ test. Let $\alpha \in (0,1)$ and consider any $i \in [k]$. Recall $\phi^{\mathrm{ons}}_{i,t} = \mathbb{1}[W^{\mathrm{ons}}_{i,t} \ge 1/\alpha]$, where
$$W^{\mathrm{ons}}_{i,t} = \prod_{j=1}^{t} (1 + \lambda_{i,j} Z_{i,j}) \tag{18}$$
and the sequence $(\lambda_{i,t})_{t\ge1}$ is determined by ONS (Algorithm 1). First, by construction the wealth process $W^{\mathrm{ons}}_i$ has initial value $W^{\mathrm{ons}}_{i,0} = 1$. Second, $(\lambda_{i,t})_{t\ge1}$ is a $[-1/2, 1/2]$-valued predictable sequence; thus $W^{\mathrm{ons}}_i$ is adapted and $W^{\mathrm{ons}}_{i,t} \ge 0$ for all $t \ge 1$. Furthermore, for any $P \in H_{i,0}$,
$$\begin{aligned}
\mathbb{E}_P[W^{\mathrm{ons}}_{i,t} \mid Z_{i,1}, \dots, Z_{i,t-1}] &= \mathbb{E}_P\Big[\prod_{j=1}^{t} (1 + \lambda_{i,j} Z_{i,j}) \,\Big|\, Z_{i,1}, \dots, Z_{i,t-1}\Big] \\
&= \prod_{j=1}^{t-1} (1 + \lambda_{i,j} Z_{i,j}) \cdot \mathbb{E}_P[1 + \lambda_{i,t} Z_{i,t} \mid Z_{i,1}, \dots, Z_{i,t-1}] \\
&= W^{\mathrm{ons}}_{i,t-1} \cdot \big(1 + \lambda_{i,t}\, \mathbb{E}_P[Z_{i,t} \mid Z_{i,1}, \dots, Z_{i,t-1}]\big) \\
&= W^{\mathrm{ons}}_{i,t-1} \cdot (1 + 0) = W^{\mathrm{ons}}_{i,t-1}.
\end{aligned}$$
Therefore $(W^{\mathrm{ons}}_{i,t})_{t\ge1}$ is a non-negative $P$-martingale for any $P \in H_{i,0}$ with initial value 1, and so, by Ville's inequality [37],
$$\sup_{P\in H_{i,0}} P(\exists t \ge 1 : \phi^{\mathrm{ons}}_{i,t} = 1) = \sup_{P\in H_{i,0}} P(\exists t \ge 1 : W^{\mathrm{ons}}_{i,t} \ge 1/\alpha) \le \alpha. \tag{19}$$

Expected stopping time. Let $\alpha \in (0,1)$. By definition, $\tau_i := \min\{t : \phi^{\mathrm{ons}}_{i,t} = 1\} = \min\{t : W^{\mathrm{ons}}_{i,t} \ge 1/\alpha\}$. Since $\tau_i$ is a non-negative integer-valued random variable, for any $P \in H_{i,1}$,
$$\mathbb{E}_P[\tau_i] = \sum_{t=1}^\infty P(\tau_i > t) \le \sum_{t=1}^\infty P(W^{\mathrm{ons}}_{i,t} < 1/\alpha) \tag{20}$$
$$= \sum_{t=1}^\infty P\big(\ln W^{\mathrm{ons}}_{i,t} < \ln(1/\alpha)\big) = \sum_{t=1}^\infty P(E_{i,t}), \tag{21}$$
where $E_{i,t} = \{\ln W^{\mathrm{ons}}_{i,t} < \ln(1/\alpha)\}$. Now, defining
$$A_{i,t} = \sum_{j=1}^t Z_{i,j}, \qquad V_{i,t} = \sum_{j=1}^t Z_{i,j}^2, \tag{22}$$
by Lemma B.4 we have
$$\ln W^{\mathrm{ons}}_{i,t} \ge \frac{A_{i,t}^2}{4(V_{i,t} + |A_{i,t}|)} - 2\ln(4t), \quad \forall t \ge 1. \tag{23}$$
Furthermore, for all $t \ge 1$, we know $|A_{i,t}| \le \sum_{j\le t} |Z_{i,j}| \le t$ and $V_{i,t} \le t$. Thus,
$$\begin{aligned}
E_{i,t} &\subseteq \Big\{\frac{A_{i,t}^2}{4(V_{i,t} + |A_{i,t}|)} - 2\ln(4t) < \ln(1/\alpha)\Big\} &\qquad (24)\\
&= \big\{A_{i,t}^2 < 4(V_{i,t} + |A_{i,t}|)(\ln(1/\alpha) + 2\ln(4t))\big\} &\qquad (25)\\
&\subseteq \big\{A_{i,t}^2 < 8t\ln(1/\alpha) + 16t\ln(4t)\big\} &\qquad (26)\\
&\subseteq \big\{A_{i,t}^2 < 16t\ln(4t/\alpha)\big\} &\qquad (27)\\
&\subseteq \big\{|A_{i,t}| < 4\sqrt{t\ln(4t/\alpha)}\big\}. &\qquad (28)
\end{aligned}$$
Since $A_{i,t} = \sum_{j\le t} Z_{i,j}$ with $Z_{i,j} \in [-1,1]$, by Lemma B.2, with probability at least $1 - 1/t^2$,
$$|A_{i,t}| \ge |\mathbb{E}[A_{i,t}]| - \sqrt{4t\ln(2t)} = t|\Delta_i| - \sqrt{4t\ln(2t)}. \tag{29}$$
By Lemma B.6,
$$\tfrac{1}{2} t|\Delta_i| \ge \sqrt{4t\ln(2t)} \quad \text{for all } t \ge \frac{32}{|\Delta_i|^2}\ln\Big(\frac{32}{|\Delta_i|^2}\Big), \tag{30}$$
$$\tfrac{1}{2} t|\Delta_i| \ge 4\sqrt{t\ln(4t/\alpha)} \quad \text{for all } t \ge \frac{128}{|\Delta_i|^2}\ln\Big(\frac{256}{\alpha|\Delta_i|^2}\Big). \tag{31}$$
As a result, for
$$t \ge T := \frac{128}{|\Delta_i|^2}\ln\Big(\frac{256}{\alpha|\Delta_i|^2}\Big) + \frac{32}{|\Delta_i|^2}\ln\Big(\frac{32}{|\Delta_i|^2}\Big) = O\Big(\frac{1}{\Delta_i^2}\ln\frac{1}{\alpha\Delta_i^2}\Big) \tag{32}$$
we have
$$t|\Delta_i| \ge 4\sqrt{t\ln(4t/\alpha)} + \sqrt{4t\ln(2t)}. \tag{33}$$
Therefore, by the law of total probability, for $t \ge T$,
$$P(E_{i,t}) \le P\big(|A_{i,t}| < 4\sqrt{t\ln(4t/\alpha)} + \sqrt{4t\ln(2t)}\big) \le 1/t^2, \tag{34}$$
and so we can conclude
$$\mathbb{E}[\tau_i] \le \sum_{t=1}^\infty P(E_{i,t}) = \sum_{t=1}^{T} P(E_{i,t}) + \sum_{t=T}^\infty P(E_{i,t}) \le T + \sum_{t=T}^\infty \frac{1}{t^2} \le T + \frac{\pi^2}{6}. \tag{35}$$

A.2 Proof of Proposition 4.1

Proposition 4.1 (Chugg et al. [5]). The test $\phi^{\mathrm{bonf}}$ is a level-$\alpha$ sequential test for $H^{\mathrm{global}}_0$. Moreover, under $H^{\mathrm{global}}_1$, its expected stopping time obeys
$$\mathbb{E}[\tau] = O\Big(\frac{1}{\Delta_m^2}\ln\frac{k}{\alpha\Delta_m^2}\Big). \tag{6}$$

Proof. We first show that $\phi^{\mathrm{bonf}} \equiv (\phi^{\mathrm{bonf}}_t)_{t\ge1}$ is a level-$\alpha$ sequential test for $H^{\mathrm{global}}_0$. Then we derive the upper bound on the expected stopping time.

Level-$\alpha$ test. Let $\alpha \in (0,1)$. Recall $\phi^{\mathrm{bonf}}_t = \mathbb{1}[\max_{i\in[k]} W^{\mathrm{ons}}_{i,t} \ge k/\alpha]$, where for all $i \in [k]$
$$W^{\mathrm{ons}}_{i,t} = \prod_{j=1}^t (1 + \lambda_{i,j} Z_{i,j}), \tag{36}$$
and the wealth processes $W^{\mathrm{ons}}_i$, $i \in [k]$, are non-negative $P$-martingales for their respective nulls $H_{i,0}$ with initial value 1. By Ville's inequality [37] and a union bound,
$$\begin{aligned}
\sup_{P\in H^{\mathrm{global}}_0} P(\exists t \ge 1 : \phi^{\mathrm{bonf}}_t = 1) &= \sup_{P\in H^{\mathrm{global}}_0} P\big(\exists t \ge 1 : \max_{i\in[k]} W^{\mathrm{ons}}_{i,t} \ge k/\alpha\big) &\qquad (37)\\
&\le \sup_{P\in H^{\mathrm{global}}_0} P\Big(\bigcup_{i=1}^k \{\exists t \ge 1 : W^{\mathrm{ons}}_{i,t} \ge k/\alpha\}\Big) &\qquad (38)\\
&\le \sup_{P\in H^{\mathrm{global}}_0} \sum_{i=1}^k P(\exists t \ge 1 : W^{\mathrm{ons}}_{i,t} \ge k/\alpha) \le k \cdot \frac{\alpha}{k} = \alpha. &\qquad (39)
\end{aligned}$$

Expected stopping time. Let $\alpha \in (0,1)$. By definition, $\tau := \min\{t : \phi^{\mathrm{bonf}}_t = 1\} = \min\{t : \max_{i\in[k]} W^{\mathrm{ons}}_{i,t} \ge k/\alpha\}$. Since $\tau$ is a non-negative integer-valued random variable, for any $P \in H^{\mathrm{global}}_1$ and any $i \in [k]$,
$$\mathbb{E}_P[\tau] = \sum_{t=1}^\infty P(\tau > t) \le \sum_{t=1}^\infty P\big(\max_{i'\in[k]} W^{\mathrm{ons}}_{i',t} < k/\alpha\big) \tag{40}$$
$$\le \sum_{t=1}^\infty P\big(W^{\mathrm{ons}}_{i,t} < k/\alpha\big). \tag{41}$$
Applying the same argument used in the expected stopping time proof of $\phi^{\mathrm{ons}}_i$ in Section A.1, with $\alpha$ replaced by $\alpha/k$, we get $\mathbb{E}[\tau] \le T + \pi^2/6$, where
$$T = \frac{128}{|\Delta_i|^2}\ln\Big(\frac{256}{(\alpha/k)\,|\Delta_i|^2}\Big) + \frac{32}{|\Delta_i|^2}\ln\Big(\frac{32}{|\Delta_i|^2}\Big). \tag{42}$$
Since this holds for any $i \in [k]$,
$$\mathbb{E}[\tau] \le \min_{i\in[k]}\Big\{\frac{128}{|\Delta_i|^2}\ln\Big(\frac{256k}{\alpha|\Delta_i|^2}\Big) + \frac{32}{|\Delta_i|^2}\ln\Big(\frac{32}{|\Delta_i|^2}\Big)\Big\} + \frac{\pi^2}{6}. \tag{43}$$
Furthermore, evaluating the minimum at an index $i$ with $|\Delta_i| = \Delta_m$,
$$\min_{i\in[k]}\Big\{\frac{128}{|\Delta_i|^2}\ln\Big(\frac{256k}{\alpha|\Delta_i|^2}\Big) + \frac{32}{|\Delta_i|^2}\ln\Big(\frac{32}{|\Delta_i|^2}\Big)\Big\} \le \frac{128}{\Delta_m^2}\ln\Big(\frac{256k}{\alpha\Delta_m^2}\Big) + \frac{32}{\Delta_m^2}\ln\Big(\frac{32}{\Delta_m^2}\Big), \tag{44}$$
concluding the proof.

A.3 Proof of Theorem 4.1

Theorem 4.1. The stochastic process $W^{\text{mv-ons}}$ is a $P$-martingale for any $P \in H^{\mathrm{global}}_0$. Thus, $\phi^{\text{mv-ons}}$ is a level-$\alpha$ sequential test for $H^{\mathrm{global}}_0$. Moreover, under $H^{\mathrm{global}}_1$, its expected stopping time obeys
$$\mathbb{E}[\tau] = O\Big(\frac{k}{\Delta_m^2}\ln\Big(\frac{k}{\alpha\Delta_m^2}\Big)\Big). \tag{8}$$

Before providing the proof, we make the following remark.

Remark. This upper bound has an additional factor of $k$ compared to the stopping time of $\phi^{\mathrm{bonf}}$. Intuitively, placing multivariate bets $\vec\lambda_t$ would enable our wealth to grow higher. However, ONS must simultaneously determine $k$ betting fractions $\lambda_{1,t}, \dots, \lambda_{k,t}$, which slows the rate of wealth growth by a factor of $k$. More precisely, the additional $k$ arises from the regret of ONS. This is the difference between the highest achievable log-wealth, using the best single fixed multivariate bet $\vec\lambda$ in hindsight, and $\ln W^{\text{mv-ons}}_t$, and it is bounded by $O(k\ln t)$. Formally, by Theorem B.2,
$$\max_{\|\vec\lambda\|_1 \le 1/2}\sum_{j=1}^t \ln\big(1 + \langle\vec\lambda, \vec Z_j\rangle\big) - \sum_{j=1}^t \ln\big(1 + \langle\vec\lambda_j, \vec Z_j\rangle\big) = O(k\ln t). \tag{45}$$
This bound is optimal in $t$ but grows unfavorably with $k$ [12]. Consequently, the upper bound has an additional factor of $k$.

Proof. We first show that $\phi^{\text{mv-ons}} \equiv (\phi^{\text{mv-ons}}_t)_{t\ge1}$ is a level-$\alpha$ sequential test for $H^{\mathrm{global}}_0$. Then we derive the upper bound on its expected stopping time under $H^{\mathrm{global}}_1$.

Level-$\alpha$ test. Let $\alpha \in (0,1)$. Recall $\phi^{\text{mv-ons}}_t = \mathbb{1}[W^{\text{mv-ons}}_t \ge 1/\alpha]$, where
$$W^{\text{mv-ons}}_t = \prod_{j=1}^t \big(1 + \langle\vec\lambda_j, \vec Z_j\rangle\big) \tag{46}$$
and the sequence $(\vec\lambda_t)_{t\ge1}$ is determined by ONS.
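First, by construction $W^{\text{mv-ons}}$ has initial value $W^{\text{mv-ons}}_0 = 1$. Second, $(\vec\lambda_t)_{t\ge1}$ is a predictable sequence satisfying $\|\vec\lambda_t\|_1 \le 1/2$ for all $t \ge 1$, so $|\langle\vec\lambda_t, \vec Z_t\rangle| \le \|\vec\lambda_t\|_1\|\vec Z_t\|_\infty \le 1/2$. Thus, $W^{\text{mv-ons}}$ is adapted and $W^{\text{mv-ons}}_t \ge 0$ for all $t \ge 1$. Moreover, for any $P \in H^{\mathrm{global}}_0$,
$$\begin{aligned}
\mathbb{E}_P[W^{\text{mv-ons}}_t \mid \mathcal{F}_{t-1}] &= \mathbb{E}_P\Big[\prod_{j=1}^t \big(1 + \langle\vec\lambda_j, \vec Z_j\rangle\big) \,\Big|\, \vec Z_1, \dots, \vec Z_{t-1}\Big] \\
&= \prod_{j=1}^{t-1} \big(1 + \langle\vec\lambda_j, \vec Z_j\rangle\big) \cdot \mathbb{E}_P\big[1 + \langle\vec\lambda_t, \vec Z_t\rangle \mid \vec Z_1, \dots, \vec Z_{t-1}\big] \\
&= W^{\text{mv-ons}}_{t-1} \cdot \big(1 + \langle\vec\lambda_t, \mathbb{E}_P[\vec Z_t \mid \vec Z_1, \dots, \vec Z_{t-1}]\rangle\big) \\
&= W^{\text{mv-ons}}_{t-1} \cdot (1 + 0) = W^{\text{mv-ons}}_{t-1}.
\end{aligned}$$
Thus, $(W^{\text{mv-ons}}_t)_{t\ge1}$ is a non-negative $P$-martingale for any $P \in H^{\mathrm{global}}_0$ with initial value 1, and so, by Ville's inequality [37],
$$\sup_{P\in H^{\mathrm{global}}_0} P(\exists t \ge 0 : \phi^{\text{mv-ons}}_t = 1) = \sup_{P\in H^{\mathrm{global}}_0} P(\exists t \ge 0 : W^{\text{mv-ons}}_t \ge 1/\alpha) \le \alpha. \tag{47}$$

Expected stopping time. Let $\alpha \in (0,1)$. By definition, $\tau := \min\{t : \phi^{\text{mv-ons}}_t = 1\} = \min\{t : W^{\text{mv-ons}}_t \ge 1/\alpha\}$. Since $\tau$ is a non-negative integer-valued random variable, for any $P \in H^{\mathrm{global}}_1$,
$$\mathbb{E}_P[\tau] = \sum_{t=1}^\infty P(\tau > t) \le \sum_{t=1}^\infty P(W^{\text{mv-ons}}_t < 1/\alpha) \tag{48}$$
$$= \sum_{t=1}^\infty P\big(\ln W^{\text{mv-ons}}_t < \ln(1/\alpha)\big) = \sum_{t=1}^\infty P(E_t), \tag{49}$$
where $E_t = \{\ln W^{\text{mv-ons}}_t < \ln(1/\alpha)\}$. By Lemma B.4, for any $u \in \mathbb{R}^k$ satisfying $\|u\|_1 = 1$, we have
$$\ln W^{\text{mv-ons}}_t \ge \frac{1}{4}\,\frac{\big(\sum_{j=1}^t \langle u, \vec Z_j\rangle\big)^2}{\sum_{j=1}^t \langle u, \vec Z_j\rangle^2 + \big|\sum_{j=1}^t \langle u, \vec Z_j\rangle\big|} - 2k\ln(4t), \quad \forall t \ge 1. \tag{50}$$
Furthermore, for all $t \ge 1$, $\big|\sum_{j=1}^t \langle u, \vec Z_j\rangle\big| \le \sum_{j\le t} |\langle u, \vec Z_j\rangle| \le t$ and $\sum_{j=1}^t \langle u, \vec Z_j\rangle^2 \le t$. Thus,
$$\begin{aligned}
E_t &\subseteq \Big\{\frac{1}{4}\,\frac{\big(\sum_{j=1}^t \langle u, \vec Z_j\rangle\big)^2}{\sum_{j=1}^t \langle u, \vec Z_j\rangle^2 + \big|\sum_{j=1}^t \langle u, \vec Z_j\rangle\big|} - 2k\ln(4t) < \ln(1/\alpha)\Big\} &\qquad (51)\\
&\subseteq \Big\{\Big(\sum_{j=1}^t \langle u, \vec Z_j\rangle\Big)^2 < 8t\ln(1/\alpha) + 16tk\ln(4t)\Big\} &\qquad (52)\\
&\subseteq \Big\{\Big|\sum_{j=1}^t \langle u, \vec Z_j\rangle\Big| < \sqrt{8t\ln(1/\alpha)} + \sqrt{16tk\ln(4t)}\Big\}, &\qquad (53)
\end{aligned}$$
where in the last line we use $\sqrt{a + b} \le \sqrt{a} + \sqrt{b}$ for $a, b \ge 0$.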
Note that for all $t \ge 1$, $|\langle u, \vec Z_t\rangle| \le \|u\|_1\|\vec Z_t\|_\infty \le 1$. Thus, $\sum_{j=1}^t \langle u, \vec Z_j\rangle$ is a sum of inner products $\langle u, \vec Z_j\rangle \in [-1, 1]$, and so, by Lemma B.2, with probability at least $1 - 1/t^2$,
$$\Big|\sum_{j=1}^t \langle u, \vec Z_j\rangle\Big| \ge t \cdot \big|\langle u, \mathbb{E}[\vec Z_1]\rangle\big| - \sqrt{4t\ln(2t)}. \tag{54}$$
Since $u$ is any vector in $\mathbb{R}^k$ satisfying $\|u\|_1 = 1$, the inequality above holds for the $u$ achieving the maximal inner product. This is $u = \operatorname{argmax}_{\|u\|_1 = 1} \langle u, \mathbb{E}[\vec Z_1]\rangle$, for which $\langle u, \mathbb{E}[\vec Z_1]\rangle = \|\mathbb{E}[\vec Z_1]\|_\infty = \Delta_m$ (see footnote 3). Thus, moving forward, the right-hand side equals $t \cdot \Delta_m - \sqrt{4t\ln(2t)}$. By Lemma B.6,
$$\tfrac{1}{3} t\Delta_m \ge \sqrt{16tk\ln(4t)} \quad \text{for all } t \ge \frac{288k}{\Delta_m^2}\ln\Big(\frac{576k}{\Delta_m^2}\Big), \tag{55}$$
$$\tfrac{1}{3} t\Delta_m \ge \sqrt{4t\ln(2t)} \quad \text{for all } t \ge \frac{72}{\Delta_m^2}\ln\Big(\frac{72}{\Delta_m^2}\Big), \tag{56}$$
and $\frac{1}{3} t\Delta_m \ge \sqrt{8t\ln(1/\alpha)}$ for $t \ge \frac{72}{\Delta_m^2}\ln(1/\alpha)$. As a result, for
$$t \ge T := \frac{288k}{\Delta_m^2}\ln\Big(\frac{576k}{\Delta_m^2}\Big) + \frac{72}{\Delta_m^2}\ln\Big(\frac{72}{\Delta_m^2}\Big) + \frac{72}{\Delta_m^2}\ln\Big(\frac{1}{\alpha}\Big) \tag{57}$$
we have
$$t\Delta_m \ge \sqrt{8t\ln(1/\alpha)} + \sqrt{16tk\ln(4t)} + \sqrt{4t\ln(2t)}. \tag{58}$$
Therefore, by the law of total probability, for $t \ge T$,
$$P(E_t) \le P\Big(\Big|\sum_{j=1}^t \langle u, \vec Z_j\rangle\Big| < \sqrt{8t\ln(1/\alpha)} + \sqrt{16tk\ln(4t)}\Big) \le 1/t^2, \tag{59}$$
and so we can conclude
$$\mathbb{E}[\tau] \le \sum_{t=1}^\infty P(E_t) = \sum_{t=1}^{T} P(E_t) + \sum_{t=T}^\infty P(E_t) \le T + \sum_{t=T}^\infty \frac{1}{t^2} \le T + \frac{\pi^2}{6}. \tag{60}$$

Footnote 3: By construction, $\vec Z_1, \vec Z_2, \dots$ are i.i.d., so $\mathbb{E}[\vec Z_1] = \dots = \mathbb{E}[\vec Z_t]$. For simplicity and notational convenience, we use $\mathbb{E}[\vec Z_1]$ to denote the mean of the $\vec Z_t$.

A.4 Proof of Theorem 4.2

Theorem 4.2 (Stopping time of $\phi^{\mathrm{ftrl}}$). The stochastic process $W^{\mathrm{ftrl}}$ is a $P$-martingale for any $P \in H^{\mathrm{global}}_0$. Thus, the test $\phi^{\mathrm{ftrl}}$ is a level-$\alpha$ sequential test for $H^{\mathrm{global}}_0$. Moreover, under $H^{\mathrm{global}}_1$, its expected stopping time obeys
$$\mathbb{E}[\tau] = O\Big(\frac{1}{\Delta_m^2}\ln\frac{k}{\alpha\Delta_m^2}\Big). \tag{10}$$

Proof. Level-$\alpha$ test. Let $\alpha \in (0,1)$.
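Recall $\phi^{\mathrm{ftrl}}_t = \mathbb{1}[W^{\mathrm{ftrl}}_t \ge 1/\alpha]$, where
$$W^{\mathrm{ftrl}}_t = \prod_{j=1}^t \big(1 + \lambda_j \langle\vec u_j, \vec Z_j\rangle\big) \tag{61}$$
and $(\lambda_t)_{t\ge1}$ and $(\vec u_t)_{t\ge1}$ are predictable sequences chosen by ONS and by follow-the-regularized-leader (FTRL), respectively. By definition, the process $(W^{\mathrm{ftrl}}_t)_{t\ge1}$ is adapted with initial value $W^{\mathrm{ftrl}}_0 = 1$. Moreover, for any $P \in H^{\mathrm{global}}_0$,
$$\begin{aligned}
\mathbb{E}_P[W^{\mathrm{ftrl}}_t \mid \mathcal{F}_{t-1}] &= \mathbb{E}_P\Big[\prod_{j=1}^t \big(1 + \lambda_j \langle\vec u_j, \vec Z_j\rangle\big) \,\Big|\, \vec Z_1, \dots, \vec Z_{t-1}\Big] \\
&= \prod_{j=1}^{t-1} \big(1 + \lambda_j \langle\vec u_j, \vec Z_j\rangle\big) \cdot \mathbb{E}_P\big[1 + \lambda_t \langle\vec u_t, \vec Z_t\rangle \mid \vec Z_1, \dots, \vec Z_{t-1}\big] \\
&= W^{\mathrm{ftrl}}_{t-1} \cdot \big(1 + \lambda_t \langle\vec u_t, \mathbb{E}_P[\vec Z_t \mid \vec Z_1, \dots, \vec Z_{t-1}]\rangle\big) \\
&= W^{\mathrm{ftrl}}_{t-1} \cdot (1 + 0) = W^{\mathrm{ftrl}}_{t-1},
\end{aligned}$$
which shows that $(W^{\mathrm{ftrl}}_t)_{t\ge1}$ is a $P$-martingale. Due to ONS and FTRL, we have $|\lambda_t| \le 1/2$ and $\|\vec u_t\|_1 \le 1$ for all $t \ge 1$, so $|\lambda_t\langle\vec u_t, \vec Z_t\rangle| \le 1/2$ and hence $W^{\mathrm{ftrl}}_t \ge 0$ for all $t \ge 1$. Therefore, by Ville's inequality [37],
$$\sup_{P\in H^{\mathrm{global}}_0} P(\exists t \ge 0 : \phi^{\mathrm{ftrl}}_t = 1) = \sup_{P\in H^{\mathrm{global}}_0} P(\exists t \ge 0 : W^{\mathrm{ftrl}}_t \ge 1/\alpha) \le \alpha. \tag{62}$$

Expected stopping time. Let $\alpha \in (0,1)$. By definition, $\tau := \min\{t : \phi^{\mathrm{ftrl}}_t = 1\} = \min\{t : W^{\mathrm{ftrl}}_t \ge 1/\alpha\}$. Since $\tau$ is a non-negative integer-valued random variable, for any $P \in H^{\mathrm{global}}_1$,
$$\mathbb{E}_P[\tau] = \sum_{t=1}^\infty P(\tau > t) \le \sum_{t=1}^\infty P(W^{\mathrm{ftrl}}_t < 1/\alpha) \tag{63}$$
$$= \sum_{t=1}^\infty P\big(\ln W^{\mathrm{ftrl}}_t < \ln(1/\alpha)\big) = \sum_{t=1}^\infty P(E_t), \tag{64}$$
where $E_t = \{\ln W^{\mathrm{ftrl}}_t < \ln(1/\alpha)\}$. Now, define
$$A_t = \sum_{j=1}^t \langle\vec u_j, \vec Z_j\rangle, \qquad V_t = \sum_{j=1}^t \langle\vec u_j, \vec Z_j\rangle^2. \tag{65}$$
Since $\|\vec u_j\|_1 \le 1$ for all $j = 1, \dots, t$, we know $|\langle\vec u_j, \vec Z_j\rangle| \le \|\vec u_j\|_1\|\vec Z_j\|_\infty \le 1$. Thus, by Theorems B.1 and B.2,
$$\ln W^{\mathrm{ftrl}}_t \ge \frac{A_t^2}{4(V_t + |A_t|)} - 2\ln(4t), \quad \forall t \ge 1. \tag{66}$$
Furthermore, for all $t \ge 1$, we have $|A_t| \le \sum_{j\le t} |\langle\vec u_j, \vec Z_j\rangle| \le t$ and $V_t \le t$.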
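Thus,

$$E_t \subseteq \Big\{\frac{A_t^2}{4(V_t + |A_t|)} - 2\ln(4t) < \ln(1/\alpha)\Big\} \tag{67}$$

$$= \Big\{A_t^2 < 4(V_t + |A_t|)\big(\ln(1/\alpha) + 2\ln(4t)\big)\Big\} \tag{68}$$

$$\subseteq \Big\{A_t^2 < 8t\ln(1/\alpha) + 16t\ln(4t)\Big\} \tag{69}$$

$$\subseteq \Big\{|A_t| < 4\sqrt{t\ln(4t/\alpha)}\Big\}, \tag{70}$$

where (69) uses $V_t + |A_t| \le 2t$, and (70) uses $8t\ln(1/\alpha) + 16t\ln(4t) \le 16t\big(\ln(1/\alpha) + \ln(4t)\big) = 16t\ln(4t/\alpha)$. Define

$$\tilde u = \operatorname*{argmax}_{\|u\|_1 \le 1} \sum_{j=1}^t \langle u, \mathbf{Z}_j\rangle, \qquad u^* = \operatorname*{argmax}_{\|u\|_1 \le 1} \langle u, \mathbb{E}[\mathbf{Z}_1]\rangle \tag{71}$$

(see footnote 3 for the notation $\mathbb{E}[\mathbf{Z}_1]$). Since the $(u_t)_{t\ge1}$ were determined by FTRL (Algorithm 2), by Lemma B.5,

$$A_t \ge \sum_{j=1}^t \langle \tilde u, \mathbf{Z}_j\rangle - \sqrt{4t\ln(2k)} \ge \sum_{j=1}^t \langle u^*, \mathbf{Z}_j\rangle - \sqrt{4t\ln(2k)}, \tag{72}$$

and therefore

$$E_t \subseteq \Big\{|A_t| < 4\sqrt{t\ln(4t/\alpha)}\Big\} \tag{73}$$

$$\subseteq \Big\{A_t < 4\sqrt{t\ln(4t/\alpha)}\Big\} \tag{74}$$

$$\subseteq \Big\{\sum_{j=1}^t \langle u^*, \mathbf{Z}_j\rangle < 4\sqrt{t\ln(4t/\alpha)} + \sqrt{4t\ln(2k)}\Big\} \tag{75}$$

$$\subseteq \Big\{\sum_{j=1}^t \langle u^*, \mathbf{Z}_j\rangle < \sqrt{2\big(16t\ln(4t/\alpha) + 4t\ln(2k)\big)}\Big\} \tag{76}$$

$$\subseteq \Big\{\sum_{j=1}^t \langle u^*, \mathbf{Z}_j\rangle < \sqrt{32t\ln(8tk/\alpha)}\Big\}, \tag{77}$$

where (76) uses $\sqrt{a} + \sqrt{b} \le \sqrt{2(a + b)}$ for $a, b \ge 0$.

Since $\sum_{j=1}^t \langle u^*, \mathbf{Z}_j\rangle$ is a sum of independent random variables $\langle u^*, \mathbf{Z}_j\rangle \in [-1, 1]$ with common mean $\langle u^*, \mathbb{E}[\mathbf{Z}_1]\rangle$, the two-sided Hoeffding bound (152) in the proof of Lemma B.2 gives, with probability at least $1 - 1/t^2$,

$$\sum_{j=1}^t \langle u^*, \mathbf{Z}_j\rangle \ge t \cdot \langle u^*, \mathbb{E}[\mathbf{Z}_1]\rangle - \sqrt{4t\ln(2t)}. \tag{78}$$

Note that $\langle u^*, \mathbb{E}[\mathbf{Z}_1]\rangle = \|\mathbb{E}[\mathbf{Z}_1]\|_\infty = \Delta_m$, so the right-hand side equals $t\Delta_m - \sqrt{4t\ln(2t)}$. By Lemma B.6,

$$\tfrac{1}{2} t\Delta_m \ge \sqrt{32t\ln(8tk/\alpha)} \quad \text{for all } t \ge \frac{256}{\Delta_m^2}\ln\Big(\frac{1024k}{\alpha \cdot \Delta_m^2}\Big), \tag{79}$$

$$\tfrac{1}{2} t\Delta_m \ge \sqrt{4t\ln(2t)} \quad \text{for all } t \ge \frac{32}{\Delta_m^2}\ln\Big(\frac{32}{\Delta_m^2}\Big). \tag{80}$$

As a result, for

$$t \ge T := \frac{256}{\Delta_m^2}\ln\Big(\frac{1024k}{\alpha \cdot \Delta_m^2}\Big) + \frac{32}{\Delta_m^2}\ln\Big(\frac{32}{\Delta_m^2}\Big) \tag{81}$$

we have

$$t\Delta_m \ge \sqrt{32t\ln(8tk/\alpha)} + \sqrt{4t\ln(2t)}. \tag{82}$$

Therefore, combining (77), (78), and (82), for $t \ge T$,

$$P(E_t) \le P\Big(\sum_{j=1}^t \langle u^*, \mathbf{Z}_j\rangle < \sqrt{32t\ln(8tk/\alpha)}\Big) \le P\Big(\sum_{j=1}^t \langle u^*, \mathbf{Z}_j\rangle < t\Delta_m - \sqrt{4t\ln(2t)}\Big) \le 1/t^2, \tag{83}$$

and so we can conclude

$$\mathbb{E}[\tau] \le \sum_{t=1}^\infty P(E_t) \le \sum_{t=1}^{T} P(E_t) + \sum_{t=T}^\infty P(E_t) \le T + \sum_{t=T}^\infty \frac{1}{t^2} \le T + \frac{\pi^2}{6}. \tag{84}$$

A.5 Proof of Theorem 5.1

Theorem 5.1 (Stopping time of $\phi^{\text{prod}}$).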
The stochastic process $W^{\text{prod}}$ is a $P$-martingale for any $P \in H_0^{\text{global}}$. Thus, $\phi^{\text{prod}}$ is a level-$\alpha$ sequential test for $H_0^{\text{global}}$. Moreover, under $H_1^{\text{global}}$, its expected stopping time obeys $\mathbb{E}[\tau] = O(T^{\text{prod}})$, where

$$T^{\text{prod}} = \frac{1}{\Delta_s}\ln\frac{1}{\alpha} + \frac{k}{\Delta_s}\ln\frac{k}{\Delta_s} + \frac{k}{\Delta_s^2}\ln\frac{k}{\Delta_s^2}. \tag{12}$$

Proof. We first show that $\phi^{\text{prod}} = (\phi_t^{\text{prod}})_{t\ge1}$ is a level-$\alpha$ sequential test for $H_0^{\text{global}}$. Then we derive the upper bound on its expected stopping time.

Level-$\alpha$ test. Let $\alpha \in (0, 1)$. Recall $\phi_t^{\text{prod}} = \mathbb{1}[W_t^{\text{prod}} \ge 1/\alpha]$, where

$$W_t^{\text{prod}} = \prod_{i=1}^k W_{i,t}^{\text{ons}}. \tag{85}$$

For all $i \in [k]$, the wealth processes $W_i^{\text{ons}}$ are non-negative $P$-martingales for their respective nulls $H_{i,0}$ with initial value 1. By definition, they are also non-negative $P$-martingales for $P \in H_0^{\text{global}}$. Furthermore, $W_0^{\text{prod}} = 1$, $W_t^{\text{prod}} \ge 0$ for all $t \ge 1$, and for any $P \in H_0^{\text{global}}$,

$$\begin{aligned}
\mathbb{E}_P\big[W_t^{\text{prod}} \mid Z_{1,1}, \ldots, Z_{1,t-1}, \ldots, Z_{k,1}, \ldots, Z_{k,t-1}\big] &= \mathbb{E}_P\Big[\prod_{i=1}^k W_{i,t}^{\text{ons}} \,\Big|\, Z_{1,1}, \ldots, Z_{1,t-1}, \ldots, Z_{k,1}, \ldots, Z_{k,t-1}\Big] \\
&= \prod_{i=1}^k \mathbb{E}_P\big[W_{i,t}^{\text{ons}} \mid Z_{i,1}, \ldots, Z_{i,t-1}\big] = \prod_{i=1}^k W_{i,t-1}^{\text{ons}},
\end{aligned}$$

where the second equality uses the independence of the $k$ streams. Therefore, $W^{\text{prod}}$ is a non-negative $P$-martingale under $H_0^{\text{global}}$ with initial value 1, and so, by Ville's inequality [37],

$$\sup_{P \in H_0^{\text{global}}} P\big(\exists t \ge 0 : \phi_t^{\text{prod}} = 1\big) = \sup_{P \in H_0^{\text{global}}} P\big(\exists t \ge 0 : W_t^{\text{prod}} \ge 1/\alpha\big) \le \alpha. \tag{86}$$

Expected stopping time. Let $\alpha \in (0, 1)$. By definition, $\tau := \min\{t : \phi_t^{\text{prod}} = 1\} = \min\{t : W_t^{\text{prod}} \ge 1/\alpha\}$. Since $\tau$ is a non-negative integer-valued random variable, for any $P \in H_1^{\text{global}}$,

$$\mathbb{E}[\tau] = \sum_{t=1}^\infty P(\tau > t) \le \sum_{t=1}^\infty P\big(W_t^{\text{prod}} < 1/\alpha\big) = \sum_{t=1}^\infty P\Big(\prod_{i=1}^k W_{i,t}^{\text{ons}} < 1/\alpha\Big) \tag{87}$$

$$= \sum_{t=1}^\infty P\Big(\ln\Big(\prod_{i=1}^k W_{i,t}^{\text{ons}}\Big) < \ln(1/\alpha)\Big) \tag{88}$$

$$= \sum_{t=1}^\infty P\Big(\sum_{i=1}^k \ln W_{i,t}^{\text{ons}} < \ln(1/\alpha)\Big) \tag{89}$$

$$= \sum_{t=1}^\infty P(E_t), \tag{90}$$

where $E_t = \big\{\sum_{i=1}^k \ln W_{i,t}^{\text{ons}} < \ln(1/\alpha)\big\}$.
Now, defining

$$A_{i,t} = \sum_{j=1}^t Z_{i,j}, \qquad V_{i,t} = \sum_{j=1}^t Z_{i,j}^2, \tag{91}$$

Lemma B.4 gives, for every $i \in [k]$, the following guarantee for $W_{i,t}^{\text{ons}}$:

$$\ln W_{i,t}^{\text{ons}} \ge \frac{A_{i,t}^2}{4(V_{i,t} + |A_{i,t}|)} - 2\ln(4t), \quad \forall t \ge 1. \tag{92}$$

Therefore

$$\sum_{i=1}^k \ln W_{i,t}^{\text{ons}} \ge \sum_{i=1}^k \frac{A_{i,t}^2}{4(V_{i,t} + |A_{i,t}|)} - 2k\ln(4t), \quad \forall t \ge 1. \tag{93}$$

Furthermore, for all $i \in [k]$ and $t \ge 1$, $|A_{i,t}| \le \sum_{j \le t}|Z_{i,j}| \le t$ and $V_{i,t} \le t$. Thus,

$$E_t \subseteq \Big\{\sum_{i=1}^k \frac{A_{i,t}^2}{4(V_{i,t} + |A_{i,t}|)} - 2k\ln(4t) < \ln(1/\alpha)\Big\} \tag{94}$$

$$\subseteq \Big\{\sum_{i=1}^k A_{i,t}^2 < 8t\ln(1/\alpha) + 16tk\ln(4t)\Big\}. \tag{95}$$

Now consider the function

$$\psi(Z_{1,1}, \ldots, Z_{1,t}, \ldots, Z_{k,1}, \ldots, Z_{k,t}) = \sum_{i=1}^k A_{i,t}^2. \tag{96}$$

Consider the vector $(Z_{1,1}, \ldots, Z_{k,t})$ and another vector $(\bar Z_{1,1}, \ldots, \bar Z_{k,t})$ that differs from the first in exactly one coordinate. Since every coordinate enters $\psi$ in the same way, we can assume without loss of generality that this coordinate is $Z_{1,1}$. Then

$$\big|\psi(Z_{1,1}, \ldots, Z_{k,t}) - \psi(\bar Z_{1,1}, \ldots, \bar Z_{k,t})\big| = \Big|Z_{1,1}^2 - \bar Z_{1,1}^2 + 2\Big(\sum_{t'=2}^t Z_{1,t'}\Big)(Z_{1,1} - \bar Z_{1,1})\Big| \tag{97}$$

$$\le 2 + 4(t - 1) \le 4t. \tag{98}$$

So, by McDiarmid's inequality [18] (Lemma B.3 with $n = kt$ and $c_i = 4t$), we have

$$P\big(\psi - \mathbb{E}[\psi] \le -\beta\big) \le \exp\Big(-\frac{2\beta^2}{16kt^3}\Big). \tag{99}$$

Setting the right-hand side equal to $1/t^2$ and solving for $\beta$ yields $\beta = t^{1.5}\sqrt{16k\ln t}$. Thus, with probability at least $1 - 1/t^2$,

$$\sum_{i=1}^k A_{i,t}^2 \ge \mathbb{E}[\psi] - t^{1.5}\sqrt{16k\ln t} = \sum_{i=1}^k \mathbb{E}\big[A_{i,t}^2\big] - t^{1.5}\sqrt{16k\ln t} \tag{100}$$

$$\ge \sum_{i=1}^k \mathbb{E}[A_{i,t}]^2 - t^{1.5}\sqrt{16k\ln t} \tag{101}$$

$$= t^2\sum_{i=1}^k \Delta_i^2 - t^{1.5}\sqrt{16k\ln t} \tag{102}$$

$$= t^2\Delta_s - t^{1.5}\sqrt{16k\ln t}, \tag{103}$$

where (101) is Jensen's inequality, (102) uses $\mathbb{E}[A_{i,t}] = t\Delta_i$, and the last equality holds because, by definition, $\Delta_s = \sum_{i=1}^k \Delta_i^2$. Now, by Lemma B.6,

$$\tfrac{1}{3}t^2\Delta_s \ge 16tk\ln(4t) \quad \text{for all } t \ge \frac{96k}{\Delta_s}\ln\Big(\frac{192k}{\Delta_s}\Big), \tag{104}$$

$$\tfrac{1}{3}t^2\Delta_s \ge t^{1.5}\sqrt{16k\ln t} \quad \text{for all } t \ge \frac{288k}{\Delta_s^2}\ln\Big(\frac{144k}{\Delta_s^2}\Big), \tag{105}$$

and furthermore $\tfrac{1}{3}t^2\Delta_s \ge 8t\ln(1/\alpha)$ for all $t \ge \frac{24}{\Delta_s}\ln(1/\alpha)$. As a result, for

$$t \ge T_1 := \frac{96k}{\Delta_s}\ln\Big(\frac{192k}{\Delta_s}\Big) + \frac{288k}{\Delta_s^2}\ln\Big(\frac{144k}{\Delta_s^2}\Big) + \frac{24}{\Delta_s}\ln\Big(\frac{1}{\alpha}\Big) \tag{106}$$

we have

$$t^2\Delta_s \ge t^{1.5}\sqrt{16k\ln t} + 8t\ln(1/\alpha) + 16tk\ln(4t). \tag{107}$$

Therefore, combining (95), (103), and (107), for $t \ge T_1$,

$$P(E_t) \le P\Big(\sum_{i=1}^k A_{i,t}^2 < 8t\ln(1/\alpha) + 16tk\ln(4t)\Big) \le 1/t^2, \tag{108}$$

and so we can conclude

$$\mathbb{E}[\tau] \le \sum_{t=1}^\infty P(E_t) \le \sum_{t=1}^{T_1} P(E_t) + \sum_{t=T_1}^\infty P(E_t) \le T_1 + \sum_{t=T_1}^\infty \frac{1}{t^2} \le T_1 + \frac{\pi^2}{6}. \tag{109}$$

A.6 Proof of Theorem 5.2

Theorem 5.2 (Stopping time of $\phi^{\text{ave}}$). The stochastic process $W^{\text{ave}}$ is a $P$-martingale for any $P \in H_0^{\text{global}}$. Thus, the test $\phi^{\text{ave}}$ is a level-$\alpha$ sequential test for $H_0^{\text{global}}$. Moreover, under $H_1^{\text{global}}$, its expected stopping time obeys

$$\mathbb{E}[\tau] = O\big(\min\{T, T^{\text{bonf}}\}\big), \tag{14}$$

where

$$T = \frac{k}{\Delta_s}\ln\frac{1}{\alpha} + \frac{k}{\Delta_s}\ln\frac{k}{\Delta_s} + \frac{k}{\Delta_s^2}\ln\frac{k}{\Delta_s^2} \quad \text{and} \quad T^{\text{bonf}} = \frac{1}{\Delta_m^2}\ln\frac{k}{\alpha \cdot \Delta_m^2}. \tag{15}$$

Proof. We first show that $\phi^{\text{ave}} = (\phi_t^{\text{ave}})_{t\ge1}$ is a level-$\alpha$ sequential test for $H_0^{\text{global}}$. Then we derive the upper bound on its expected stopping time.

Level-$\alpha$ test. Let $\alpha \in (0, 1)$. Recall $\phi_t^{\text{ave}} = \mathbb{1}[W_t^{\text{ave}} \ge 1/\alpha]$, where

$$W_t^{\text{ave}} = \frac{1}{k}\sum_{i=1}^k W_{i,t}^{\text{ons}}. \tag{110}$$

For all $i \in [k]$, the wealth processes $W_i^{\text{ons}}$ are $P$-martingales for $P$ in their respective nulls $H_{i,0}$ with initial value 1. By definition, they are also $P$-martingales for $P \in H_0^{\text{global}}$. Furthermore, $W_0^{\text{ave}} = 1$, $W_t^{\text{ave}} \ge 0$ for all $t \ge 1$, and for any $P \in H_0^{\text{global}}$,

$$\begin{aligned}
\mathbb{E}_P\big[W_t^{\text{ave}} \mid Z_{1,1}, \ldots, Z_{1,t-1}, \ldots, Z_{k,1}, \ldots, Z_{k,t-1}\big] &= \mathbb{E}_P\Big[\frac{1}{k}\sum_{i=1}^k W_{i,t}^{\text{ons}} \,\Big|\, Z_{1,1}, \ldots, Z_{1,t-1}, \ldots, Z_{k,1}, \ldots, Z_{k,t-1}\Big] \\
&= \frac{1}{k}\sum_{i=1}^k \mathbb{E}_P\big[W_{i,t}^{\text{ons}} \mid Z_{i,1}, \ldots, Z_{i,t-1}\big] = \frac{1}{k}\sum_{i=1}^k W_{i,t-1}^{\text{ons}}.
\end{aligned}$$
Thus, $W^{\text{ave}}$ is a $P$-martingale under $H_0^{\text{global}}$ with initial value 1, and so, by Ville's inequality [37],

$$\sup_{P \in H_0^{\text{global}}} P\big(\exists t \ge 0 : \phi_t^{\text{ave}} = 1\big) = \sup_{P \in H_0^{\text{global}}} P\big(\exists t \ge 0 : W_t^{\text{ave}} \ge 1/\alpha\big) \le \alpha. \tag{111}$$

Expected stopping time. Let $\alpha \in (0, 1)$. By definition, $\tau := \min\{t : \phi_t^{\text{ave}} = 1\} = \min\{t : W_t^{\text{ave}} \ge 1/\alpha\}$. Since $\tau$ is a non-negative integer-valued random variable, for any $P \in H_1^{\text{global}}$,

$$\mathbb{E}[\tau] = \sum_{t=1}^\infty P(\tau > t) \le \sum_{t=1}^\infty P\big(W_t^{\text{ave}} < 1/\alpha\big) = \sum_{t=1}^\infty P\Big(\frac{1}{k}\sum_{i=1}^k W_{i,t}^{\text{ons}} < 1/\alpha\Big) \tag{112}$$

$$= \sum_{t=1}^\infty P\Big(\ln\Big(\frac{1}{k}\sum_{i=1}^k W_{i,t}^{\text{ons}}\Big) < \ln(1/\alpha)\Big) \tag{113}$$

$$\le \sum_{t=1}^\infty P\Big(\frac{1}{k}\sum_{i=1}^k \ln W_{i,t}^{\text{ons}} < \ln(1/\alpha)\Big) \tag{114}$$

$$= \sum_{t=1}^\infty P\Big(\sum_{i=1}^k \ln W_{i,t}^{\text{ons}} < k\ln(1/\alpha)\Big) \tag{115}$$

$$= \sum_{t=1}^\infty P(E_t), \tag{116}$$

where $E_t = \big\{\sum_{i=1}^k \ln W_{i,t}^{\text{ons}} < k\ln(1/\alpha)\big\}$, and (114) holds because $\frac{1}{k}\sum_{i=1}^k \ln W_{i,t}^{\text{ons}} \le \ln\big(\frac{1}{k}\sum_{i=1}^k W_{i,t}^{\text{ons}}\big)$ by Jensen's inequality (concavity of $\ln$). Following the stopping-time proof for $\phi^{\text{prod}}$ in Section A.5, we have

$$\sum_{i=1}^k \ln W_{i,t}^{\text{ons}} \ge \sum_{i=1}^k \frac{A_{i,t}^2}{4(V_{i,t} + |A_{i,t}|)} - 2k\ln(4t), \quad \forall t \ge 1, \tag{117}$$

where $A_{i,t} = \sum_{j=1}^t Z_{i,j}$ and $V_{i,t} = \sum_{j=1}^t Z_{i,j}^2$. Since $|A_{i,t}| \le \sum_{j \le t}|Z_{i,j}| \le t$ and $V_{i,t} \le t$ for all $i \in [k]$ and all $t \ge 1$, we have

$$E_t \subseteq \Big\{\sum_{i=1}^k \frac{A_{i,t}^2}{4(V_{i,t} + |A_{i,t}|)} - 2k\ln(4t) < k\ln(1/\alpha)\Big\} \tag{118}$$

$$\subseteq \Big\{\sum_{i=1}^k A_{i,t}^2 < 8t\big(k\ln(1/\alpha) + 2k\ln(4t)\big)\Big\} \tag{119}$$

$$= \Big\{\sum_{i=1}^k A_{i,t}^2 < 8tk\ln(1/\alpha) + 16tk\ln(4t)\Big\}. \tag{120}$$

From the stopping-time proof for $\phi^{\text{prod}}$ in Section A.5, we know that, with probability at least $1 - 1/t^2$,

$$\sum_{i=1}^k A_{i,t}^2 \ge t^2\Delta_s - t^{1.5}\sqrt{16k\ln t}. \tag{121}$$

By Lemma B.6,

$$\tfrac{1}{3}t^2\Delta_s \ge 16tk\ln(4t) \quad \text{for all } t \ge \frac{96k}{\Delta_s}\ln\Big(\frac{192k}{\Delta_s}\Big), \tag{122}$$

$$\tfrac{1}{3}t^2\Delta_s \ge t^{1.5}\sqrt{16k\ln t} \quad \text{for all } t \ge \frac{288k}{\Delta_s^2}\ln\Big(\frac{144k}{\Delta_s^2}\Big), \tag{123}$$

and furthermore $\tfrac{1}{3}t^2\Delta_s \ge 8tk\ln(1/\alpha)$ for all $t \ge \frac{24k}{\Delta_s}\ln(1/\alpha)$.
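As a result, for

$$t \ge T_1 := \frac{96k}{\Delta_s}\ln\Big(\frac{192k}{\Delta_s}\Big) + \frac{288k}{\Delta_s^2}\ln\Big(\frac{144k}{\Delta_s^2}\Big) + \frac{24k}{\Delta_s}\ln(1/\alpha) \tag{124}$$

we have

$$t^2\sum_{i=1}^k \Delta_i^2 \ge t^{1.5}\sqrt{16k\ln t} + 8tk\ln(1/\alpha) + 16tk\ln(4t). \tag{125}$$

Therefore, combining (120), (121), and (125), for $t \ge T_1$,

$$P(E_t) \le P\Big(\sum_{i=1}^k A_{i,t}^2 < 8tk\ln(1/\alpha) + 16tk\ln(4t)\Big) \le 1/t^2, \tag{126}$$

and so we can conclude

$$\mathbb{E}[\tau] \le \sum_{t=1}^\infty P(E_t) \le \sum_{t=1}^{T_1} P(E_t) + \sum_{t=T_1}^\infty P(E_t) \le T_1 + \sum_{t=T_1}^\infty \frac{1}{t^2} \le T_1 + \frac{\pi^2}{6}. \tag{127}$$

Alternatively, let $i^* \in [k]$ be a stream whose mean has the largest magnitude, i.e., $|\mathbb{E}[Z_{i^*,1}]| = \Delta_m$. Then

$$\mathbb{E}[\tau] = \sum_{t=1}^\infty P(\tau > t) \le \sum_{t=1}^\infty P\big(W_t^{\text{ave}} < 1/\alpha\big) = \sum_{t=1}^\infty P\Big(\frac{1}{k}\sum_{i=1}^k W_{i,t}^{\text{ons}} < 1/\alpha\Big) \tag{128}$$

$$\le \sum_{t=1}^\infty P\Big(\frac{1}{k}W_{i^*,t}^{\text{ons}} < 1/\alpha\Big) \tag{129}$$

$$= \sum_{t=1}^\infty P\big(W_{i^*,t}^{\text{ons}} < k/\alpha\big) \tag{130}$$

$$\le T_2 + \frac{\pi^2}{6}, \tag{131}$$

where (129) uses $W_{i,t}^{\text{ons}} \ge 0$ for all $i$, and

$$T_2 := \frac{128}{\Delta_m^2}\ln\Big(\frac{256k}{\alpha \cdot \Delta_m^2}\Big) + \frac{32}{\Delta_m^2}\ln\Big(\frac{32}{\Delta_m^2}\Big) \tag{132}$$

by the proof of Proposition 4.1 in Section A.2. Thus, for any $P \in H_1^{\text{global}}$, the expected stopping time of $\phi^{\text{ave}}$ obeys

$$\mathbb{E}[\tau] \le \min\{T_1, T_2\} + \frac{\pi^2}{6}. \tag{133}$$

A.7 Proof of Theorem 5.3

Theorem 5.3 (Stopping time of $\phi^{\text{balance}}$). The test $\phi^{\text{balance}}$ is a level-$\alpha$ sequential test for $H_0^{\text{global}}$. Moreover, under $H_1^{\text{global}}$, its expected stopping time obeys $\mathbb{E}[\tau] = O\big(\min\{T^{\text{prod}}, T^{\text{bonf}}\}\big)$.

Proof. We first show that $\phi^{\text{balance}} = (\phi_t^{\text{balance}})_{t\ge1}$ is a level-$\alpha$ sequential test for $H_0^{\text{global}}$. Then we derive the upper bound on its expected stopping time.

Level-$\alpha$ test. Let $\alpha \in (0, 1)$. Recall $\phi_t^{\text{balance}} = \mathbb{1}[W_t^{\text{balance}} \ge 1/\alpha]$, where

$$W_t^{\text{balance}} = \tfrac{1}{2}W_t^{\text{ave}} + \tfrac{1}{2}W_t^{\text{prod}}. \tag{134}$$

The wealth processes $W^{\text{ave}}$ and $W^{\text{prod}}$ are both non-negative $P$-martingales with initial value 1 for any $P \in H_0^{\text{global}}$. Thus, $W_0^{\text{balance}} = 1$, $W_t^{\text{balance}} \ge 0$ for all $t \ge 1$, and for any $P \in H_0^{\text{global}}$,

$$\mathbb{E}_P\big[W_t^{\text{balance}} \mid Z_{1,1}, \ldots, Z_{1,t-1}, \ldots, Z_{k,1}, \ldots, Z_{k,t-1}\big] \tag{135}$$

$$= \mathbb{E}_P\big[\tfrac{1}{2}W_t^{\text{ave}} + \tfrac{1}{2}W_t^{\text{prod}} \mid Z_{1,1}, \ldots, Z_{1,t-1}, \ldots, Z_{k,1}, \ldots, Z_{k,t-1}\big] \tag{136}$$

$$= \tfrac{1}{2}\mathbb{E}_P\big[W_t^{\text{ave}} \mid Z_{1,1}, \ldots, Z_{k,t-1}\big] + \tfrac{1}{2}\mathbb{E}_P\big[W_t^{\text{prod}} \mid Z_{1,1}, \ldots, Z_{k,t-1}\big] \tag{137}$$

$$= \tfrac{1}{2}W_{t-1}^{\text{ave}} + \tfrac{1}{2}W_{t-1}^{\text{prod}} = W_{t-1}^{\text{balance}}. \tag{138}$$

As a result, $W^{\text{balance}}$ is a non-negative $P$-martingale with initial value 1 for any $P \in H_0^{\text{global}}$, and so, by Ville's inequality [37],

$$\sup_{P \in H_0^{\text{global}}} P\big(\exists t \ge 0 : \phi_t^{\text{balance}} = 1\big) = \sup_{P \in H_0^{\text{global}}} P\big(\exists t \ge 0 : W_t^{\text{balance}} \ge 1/\alpha\big) \le \alpha. \tag{139}$$

Expected stopping time. Let $\alpha \in (0, 1)$. By definition, $\tau := \min\{t : \phi_t^{\text{balance}} = 1\} = \min\{t : W_t^{\text{balance}} \ge 1/\alpha\}$. Since $\tau$ is a non-negative integer-valued random variable, for any $P \in H_1^{\text{global}}$,

$$\mathbb{E}[\tau] = \sum_{t=1}^\infty P(\tau > t) \le \sum_{t=1}^\infty P\big(W_t^{\text{balance}} < 1/\alpha\big) = \sum_{t=1}^\infty P\big(\tfrac{1}{2}W_t^{\text{ave}} + \tfrac{1}{2}W_t^{\text{prod}} < 1/\alpha\big) \tag{140}$$

$$\le \sum_{t=1}^\infty P\big(\tfrac{1}{2}W_t^{\text{prod}} < 1/\alpha\big) \tag{141}$$

$$= \sum_{t=1}^\infty P\big(W_t^{\text{prod}} < 2/\alpha\big), \tag{142}$$

where (141) uses $W_t^{\text{ave}} \ge 0$. By the proof of Theorem 5.1 in Section A.5 (with $\alpha$ replaced by $\alpha/2$), we know

$$\sum_{t=1}^\infty P\big(W_t^{\text{prod}} < 2/\alpha\big) \le T_1 + \frac{\pi^2}{6}, \tag{143}$$

where

$$T_1 := \frac{96k}{\Delta_s}\ln\Big(\frac{192k}{\Delta_s}\Big) + \frac{288k}{\Delta_s^2}\ln\Big(\frac{144k}{\Delta_s^2}\Big) + \frac{24}{\Delta_s}\ln\Big(\frac{2}{\alpha}\Big). \tag{144}$$

We can also bound the expected stopping time in the following manner, now using $W_t^{\text{prod}} \ge 0$:

$$\mathbb{E}[\tau] \le \sum_{t=1}^\infty P\big(\tfrac{1}{2}W_t^{\text{ave}} < 1/\alpha\big) = \sum_{t=1}^\infty P\big(W_t^{\text{ave}} < 2/\alpha\big). \tag{145}$$

By the expected stopping-time proof for $\phi^{\text{ave}}$ in Section A.6 (again with $\alpha$ replaced by $\alpha/2$), we know

$$\sum_{t=1}^\infty P\big(W_t^{\text{ave}} < 2/\alpha\big) \le T_2 + \frac{\pi^2}{6}, \tag{146}$$

where

$$T_2 := \frac{128}{\Delta_m^2}\ln\Big(\frac{256 \cdot (2k)}{\alpha \cdot \Delta_m^2}\Big) + \frac{32}{\Delta_m^2}\ln\Big(\frac{32}{\Delta_m^2}\Big). \tag{147}$$

Thus, for any $P \in H_1^{\text{global}}$, the expected stopping time of $\phi^{\text{balance}}$ obeys

$$\mathbb{E}[\tau] \le \min\{T_1, T_2\} + \frac{\pi^2}{6}. \tag{148}$$

B Useful Lemmas and Inequalities

In this section, we present results that are used throughout the proofs in Section A. For cited results, proofs can be found in the referenced works; otherwise, proofs are provided here.

Lemma B.1 (Hoeffding's Inequality [14]). Let $X_1, \ldots, X_t$ be independent random variables such that $X_j \in [a_j, b_j]$ almost surely.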
Then, for all $\beta > 0$,

$$P\Big(\Big|\sum_{j=1}^t X_j - \mathbb{E}\Big[\sum_{j=1}^t X_j\Big]\Big| \ge \beta\Big) \le 2\exp\Big(-\frac{2\beta^2}{\sum_{j=1}^t (b_j - a_j)^2}\Big). \tag{149}$$

Lemma B.2 (Concentration of bounded random variables [5]). Let $X_1, \ldots, X_t$ be independent random variables such that $X_j \in [-1, 1]$ almost surely and $\mathbb{E}[X_j] = \mu$. Then, with probability at least $1 - 1/t^2$,

$$\Big|\sum_{j=1}^t X_j\Big| \ge t \cdot |\mu| - \sqrt{4t\ln(2t)}. \tag{150}$$

Proof. By Hoeffding's inequality [14] (Lemma B.1 with $b_j - a_j = 2$),

$$P\Big(\Big|\sum_{j=1}^t X_j - \mathbb{E}\Big[\sum_{j=1}^t X_j\Big]\Big| \ge \beta\Big) \le 2\exp\big(-\beta^2/(2t)\big). \tag{151}$$

Setting the right-hand side equal to $1/t^2$ and solving for $\beta$ yields $\beta = \sqrt{2t\ln(2t^2)}$. Thus, with probability at least $1 - 1/t^2$,

$$\Big|\,\Big|\sum_{j=1}^t X_j\Big| - \Big|\mathbb{E}\Big[\sum_{j=1}^t X_j\Big]\Big|\,\Big| \le \Big|\sum_{j=1}^t X_j - \mathbb{E}\Big[\sum_{j=1}^t X_j\Big]\Big| \le \sqrt{2t\ln(2t^2)} \le \sqrt{4t\ln(2t)}, \tag{152}$$

which implies that, with probability at least $1 - 1/t^2$,

$$\Big|\sum_{j=1}^t X_j\Big| \ge \Big|\mathbb{E}\Big[\sum_{j=1}^t X_j\Big]\Big| - \sqrt{4t\ln(2t)} = t \cdot |\mu| - \sqrt{4t\ln(2t)}. \tag{153}$$

Lemma B.3 (McDiarmid's Inequality [18]). Let $X_1, \ldots, X_n$ be independent random variables taking values in some set $\mathcal{X}$. Let $f : \mathcal{X}^n \to \mathbb{R}$ be a function that satisfies the following bounded differences property: for every $i \in [n]$ and for all possible values $x_1, \ldots, x_n, x_i' \in \mathcal{X}$,

$$\big|f(x_1, \ldots, x_n) - f(x_1, \ldots, x_{i-1}, x_i', x_{i+1}, \ldots, x_n)\big| \le c_i,$$

where the $c_i$ are constants. Then, for any $\epsilon > 0$, the following inequalities hold:

$$P\big(f(X_1, \ldots, X_n) - \mathbb{E}[f(X_1, \ldots, X_n)] \ge \epsilon\big) \le \exp\Big(-\frac{2\epsilon^2}{\sum_{i=1}^n c_i^2}\Big), \tag{154}$$

$$P\big(f(X_1, \ldots, X_n) - \mathbb{E}[f(X_1, \ldots, X_n)] \le -\epsilon\big) \le \exp\Big(-\frac{2\epsilon^2}{\sum_{i=1}^n c_i^2}\Big). \tag{155}$$

Theorem B.1 (Theorem 20 [9]). Let $\|\cdot\|$ be any norm on $\mathbb{R}^d$ and $\|\cdot\|_*$ be its dual norm. Define $\mathcal{K} = \{v \in \mathbb{R}^d : \|v\| \le 1/2\}$.
Fix any vector $u \in \mathbb{R}^d$ satisfying $\|u\| = 1$ and let $(g_t)_{t\ge1} \subset \mathbb{R}^d$ be any sequence of vectors satisfying $\|g_t\|_* \le 1$ for all $t \ge 1$. Then,

$$\max_{\lambda \in \mathcal{K}} \sum_{j=1}^t \ln(1 + \langle \lambda, g_j\rangle) \ge \frac{1}{4}\,\frac{\big\langle \sum_{j=1}^t g_j, u\big\rangle^2}{\sum_{j=1}^t \langle g_j, u\rangle^2 + \big|\big\langle \sum_{j=1}^t g_j, u\big\rangle\big|}. \tag{156}$$

Theorem B.2 (Lemma 17 [9]). Let $\|\cdot\|$ be a norm on $\mathbb{R}^d$ and $\|\cdot\|_*$ be its dual norm. Define $\mathcal{K} = \{v \in \mathbb{R}^d : \|v\| \le 1/2\}$ and let $(g_t)_{t\ge1}$ be any sequence of vectors satisfying $\|g_t\|_* \le 1$ for all $t \ge 1$. Then, for $\beta = \frac{2 - \ln 3}{2}$, the sequence $(\lambda_t)_{t\ge1} \subset \mathcal{K}$ generated by ONS (Algorithm 1) with input stream $(g_t)_{t\ge1}$ satisfies

$$\sum_{j=1}^t -\ln(1 + \langle \lambda_j, g_j\rangle) - \min_{\lambda \in \mathcal{K}} \sum_{j=1}^t -\ln(1 + \langle \lambda, g_j\rangle) \le d\Big(\frac{\beta}{8} + \frac{2}{\beta}\ln\Big(4\sum_{j=1}^t \|g_j\|_*^2 + 1\Big)\Big). \tag{157}$$

By substituting $\beta = \frac{2 - \ln 3}{2}$ and using the bounds $\beta \le 8/17$ and $2/\beta \le 4.5$, one retrieves the following bound, as stated in Lemma 17 of [9]:

$$\sum_{j=1}^t -\ln(1 + \langle \lambda_j, g_j\rangle) - \min_{\lambda \in \mathcal{K}} \sum_{j=1}^t -\ln(1 + \langle \lambda, g_j\rangle) \le d\Big(\frac{1}{17} + 4.5\ln\Big(4\sum_{j=1}^t \|g_j\|_*^2 + 1\Big)\Big). \tag{158}$$

NOTE: To prove the result above, Cutkosky and Orabona [9] rely on another result from their work, Theorem 11 (page 18). The proof of Theorem 11 contains a minor typographical error: in the second equation on page 19⁴, the final term in the inequality has a factor of $2/\beta$, which should be $1/(2\beta)$. For $\beta = \frac{2 - \ln 3}{2}$, the quantity $2/\beta \approx 4.44$, which leads to the 4.5 factor in the bound; the correct factor is $1/(2\beta) \approx 1.11$ (or any value greater than it). The correct upper bound, with $\beta = \frac{2 - \ln 3}{2}$, is

$$\sum_{j=1}^t -\ln(1 + \langle \lambda_j, g_j\rangle) - \min_{\lambda \in \mathcal{K}} \sum_{j=1}^t -\ln(1 + \langle \lambda, g_j\rangle) \le d\Big(\frac{2 - \ln 3}{16} + \frac{1}{2 - \ln 3}\ln\Big(4\sum_{j=1}^t \|g_j\|_*^2 + 1\Big)\Big), \tag{159}$$

which, since $\|g_j\|_* \le 1$, is less than $2d\ln(4t)$ for all $t \ge 1$.

⁴ We refer to the version published in the 31st Annual Conference on Learning Theory.

Lemma B.4 (Log wealth lower bound).
Consider $k$ parallel data streams $(Z_{1,t})_{t\ge1}, \ldots, (Z_{k,t})_{t\ge1}$. For any data stream $(Z_{i,t})_{t\ge1}$ and its wealth process $W_{i,t}^{\text{ons}} = \prod_{j=1}^t (1 + \lambda_{i,j}Z_{i,j})$, where the $(\lambda_{i,t})_{t\ge1}$ are determined by running ONS (Algorithm 1) on the stream $(Z_{i,t})_{t\ge1}$, Theorems B.1 and B.2 give

$$\ln W_{i,t}^{\text{ons}} \ge \frac{1}{4}\,\frac{\big(\sum_{j=1}^t Z_{i,j}\big)^2}{\sum_{j=1}^t Z_{i,j}^2 + \big|\sum_{j=1}^t Z_{i,j}\big|} - 2\ln(4t). \tag{160}$$

Let $\|\cdot\|$ be a norm on $\mathbb{R}^k$ with dual norm $\|\cdot\|_*$, and define $\mathcal{K} = \{v \in \mathbb{R}^k : \|v\| \le 1/2\}$. Consider the stream of multivariate outcomes $(\mathbf{Z}_t)_{t\ge1}$, where $\mathbf{Z}_t = (Z_{1,t}, \ldots, Z_{k,t})$, and its corresponding wealth process $W_t^{\text{mv-ons}} = \prod_{j=1}^t (1 + \langle \boldsymbol{\lambda}_j, \mathbf{Z}_j\rangle)$, where the $(\boldsymbol{\lambda}_t)_{t\ge1}$ are determined by running ONS (Algorithm 1) on the stream $(\mathbf{Z}_t)_{t\ge1}$. Then, by Theorems B.1 and B.2, for any vector $u \in \mathbb{R}^k$ satisfying $\|u\| = 1$, we have

$$\ln W_t^{\text{mv-ons}} \ge \frac{1}{4}\,\frac{\big(\sum_{j=1}^t \langle u, \mathbf{Z}_j\rangle\big)^2}{\sum_{j=1}^t \langle u, \mathbf{Z}_j\rangle^2 + \big|\sum_{j=1}^t \langle u, \mathbf{Z}_j\rangle\big|} - 2k\ln(4t). \tag{161}$$

Proof. We prove the result for the stream of multivariate outcomes $(\mathbf{Z}_t)_{t\ge1}$. By Theorem B.2 (in the corrected form (159)), the sequence $(\boldsymbol{\lambda}_t)_{t\ge1} \subset \mathcal{K}$ generated by ONS (Algorithm 1) with input stream $(\mathbf{Z}_t)_{t\ge1}$ satisfies

$$\sum_{j=1}^t -\ln(1 + \langle \boldsymbol{\lambda}_j, \mathbf{Z}_j\rangle) - \min_{\boldsymbol{\lambda} \in \mathcal{K}} \sum_{j=1}^t -\ln(1 + \langle \boldsymbol{\lambda}, \mathbf{Z}_j\rangle) \le 2k\ln(4t). \tag{162}$$

Since $\min_{\boldsymbol{\lambda} \in \mathcal{K}} \sum_{j=1}^t -\ln(1 + \langle \boldsymbol{\lambda}, \mathbf{Z}_j\rangle) = -\max_{\boldsymbol{\lambda} \in \mathcal{K}} \sum_{j=1}^t \ln(1 + \langle \boldsymbol{\lambda}, \mathbf{Z}_j\rangle)$, this implies

$$\sum_{j=1}^t -\ln(1 + \langle \boldsymbol{\lambda}_j, \mathbf{Z}_j\rangle) - \Big(-\max_{\boldsymbol{\lambda} \in \mathcal{K}} \sum_{j=1}^t \ln(1 + \langle \boldsymbol{\lambda}, \mathbf{Z}_j\rangle)\Big) \tag{163}$$

$$= \max_{\boldsymbol{\lambda} \in \mathcal{K}} \sum_{j=1}^t \ln(1 + \langle \boldsymbol{\lambda}, \mathbf{Z}_j\rangle) - \sum_{j=1}^t \ln(1 + \langle \boldsymbol{\lambda}_j, \mathbf{Z}_j\rangle) \le 2k\ln(4t), \tag{164}$$

which further implies

$$\sum_{j=1}^t \ln(1 + \langle \boldsymbol{\lambda}_j, \mathbf{Z}_j\rangle) \ge \max_{\boldsymbol{\lambda} \in \mathcal{K}} \sum_{j=1}^t \ln(1 + \langle \boldsymbol{\lambda}, \mathbf{Z}_j\rangle) - 2k\ln(4t). \tag{165}$$

By Theorem B.1, for any vector $u \in \mathbb{R}^k$ satisfying $\|u\| = 1$, we have

$$\sum_{j=1}^t \ln(1 + \langle \boldsymbol{\lambda}_j, \mathbf{Z}_j\rangle) \ge \frac{1}{4}\,\frac{\big(\sum_{j=1}^t \langle u, \mathbf{Z}_j\rangle\big)^2}{\sum_{j=1}^t \langle u, \mathbf{Z}_j\rangle^2 + \big|\sum_{j=1}^t \langle u, \mathbf{Z}_j\rangle\big|} - 2k\ln(4t). \tag{166}$$

Note that $\sum_{j=1}^t \ln(1 + \langle \boldsymbol{\lambda}_j, \mathbf{Z}_j\rangle)$ is precisely $\ln W_t^{\text{mv-ons}}$; thus, we have proven the result.

For a stream of univariate outcomes $(Z_{i,t})_{t\ge1}$ and its corresponding wealth process $W_{i,t}^{\text{ons}}$, we can apply an identical argument in dimension 1 to get, for any $u \in \mathbb{R}$ with $|u| = 1$,

$$\ln W_{i,t}^{\text{ons}} \ge \frac{1}{4}\,\frac{\big(\sum_{j=1}^t u Z_{i,j}\big)^2}{\sum_{j=1}^t (u Z_{i,j})^2 + \big|\sum_{j=1}^t u Z_{i,j}\big|} - 2\ln(4t). \tag{167}$$

Here $Z_{i,t} \in [-1, 1] \subset \mathbb{R}$, so the dimension is 1 and the factor $2k$ becomes 2. Furthermore, there are only two vectors $u \in \mathbb{R}$ satisfying $\|u\| = |u| = 1$, namely $u = 1$ and $u = -1$. Plugging either of these into the inequality above proves the univariate version of the result, (160).

Theorem B.3 ([23]). Consider a stream $(g_t)_{t\ge1} \subseteq [-1, 1]^d$ and denote $\mathcal{S}_d := \{v \in \mathbb{R}^d_{\ge 0} \mid \sum_{i=1}^d v_i = 1\}$. Define $v^* = \operatorname*{argmin}_{v \in \mathcal{S}_d} \sum_{j=1}^t -\langle v, g_j\rangle$. Then, the sequence of vectors $(v_t)_{t\ge1}$ generated by running FTRL (Algorithm 2) on the input stream $(g_t)_{t\ge1} \subseteq [-1, 1]^d$ satisfies

$$\sum_{j=1}^t -\langle v_j, g_j\rangle - \min_{v \in \mathcal{S}_d} \sum_{j=1}^t -\langle v, g_j\rangle \le \Big(\frac{\ln d}{\sqrt{\ln d}} + \sqrt{\ln d}\Big)\sqrt{t} = 2\sqrt{\ln(d)\,t}. \tag{168}$$

Note: This theorem is an application of Corollary 7.7 in Orabona [23].

Lemma B.5. Given $k$ parallel data streams $(Z_{1,t})_{t\ge1}, \ldots, (Z_{k,t})_{t\ge1}$, where $Z_{i,1}, Z_{i,2}, \ldots \overset{\text{iid}}{\sim} P_i$, define $\mathbf{Z}_t = (Z_{1,t}, \ldots, Z_{k,t})$. Let $(\tilde Z_t)_{t\ge1}$ be the stream of multivariate outcomes where $\tilde Z_t = (\mathbf{Z}_t, -\mathbf{Z}_t) \in [-1, 1]^{2k}$. Let $(v_t)_{t\ge1}$ be the sequence generated by running FTRL (Algorithm 2) on the input stream $(\tilde Z_t)_{t\ge1} \subset [-1, 1]^{2k}$, and let $v_t^+$ and $v_t^-$ be the vectors of the first $k$ and second $k$ entries of $v_t$, respectively, i.e.
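$v_t = (v_t^+, v_t^-) \in \mathbb{R}^{2k}$. Furthermore, let $(u_t)_{t\ge1}$ be the sequence with $u_t = v_t^+ - v_t^-$. Then,

$$\sum_{j=1}^t \langle u_j, \mathbf{Z}_j\rangle \ge \max_{\|u\|_1 \le 1} \sum_{j=1}^t \langle u, \mathbf{Z}_j\rangle - 2\sqrt{\ln(2k)\,t}. \tag{169}$$

Proof. By Theorem B.3, we have the following guarantee for the sequence $(v_t)_{t\ge1}$:

$$\sum_{j=1}^t -\langle v_j, \tilde Z_j\rangle - \min_{v \in \mathcal{S}_{2k}} \sum_{j=1}^t -\langle v, \tilde Z_j\rangle \le 2\sqrt{\ln(2k)\,t}. \tag{170}$$

Now observe, recalling $\mathcal{S}_{2k} = \{v \in \mathbb{R}^{2k}_{\ge 0} \mid \sum_{i=1}^{2k} v_i = 1\}$, that

$$\sum_{j=1}^t -\langle v_j, \tilde Z_j\rangle - \min_{v \in \mathcal{S}_{2k}} \sum_{j=1}^t -\langle v, \tilde Z_j\rangle \tag{171}$$

$$= \max_{v \in \mathcal{S}_{2k}} \sum_{j=1}^t \langle v, \tilde Z_j\rangle - \sum_{j=1}^t \langle v_j, \tilde Z_j\rangle \tag{172}$$

$$= \max_{(v^+, v^-) \in \mathcal{S}_{2k}} \sum_{j=1}^t \big\langle (v^+, v^-), (\mathbf{Z}_j, -\mathbf{Z}_j)\big\rangle - \sum_{j=1}^t \big\langle (v_j^+, v_j^-), (\mathbf{Z}_j, -\mathbf{Z}_j)\big\rangle \tag{173}$$

$$= \max_{(v^+, v^-) \in \mathcal{S}_{2k}} \sum_{j=1}^t \big(\langle v^+, \mathbf{Z}_j\rangle - \langle v^-, \mathbf{Z}_j\rangle\big) - \sum_{j=1}^t \big(\langle v_j^+, \mathbf{Z}_j\rangle - \langle v_j^-, \mathbf{Z}_j\rangle\big) \tag{174}$$

$$= \max_{(v^+, v^-) \in \mathcal{S}_{2k}} \sum_{j=1}^t \langle v^+ - v^-, \mathbf{Z}_j\rangle - \sum_{j=1}^t \langle v_j^+ - v_j^-, \mathbf{Z}_j\rangle \tag{175}$$

$$= \max_{\|u\|_1 \le 1} \sum_{j=1}^t \langle u, \mathbf{Z}_j\rangle - \sum_{j=1}^t \langle u_j, \mathbf{Z}_j\rangle, \tag{176}$$

where the last equality holds because $\{v^+ - v^- : (v^+, v^-) \in \mathcal{S}_{2k}\}$ is exactly the unit $\ell_1$ ball: $\|v^+ - v^-\|_1 \le \sum_i (v_i^+ + v_i^-) = 1$, and conversely any $u$ with $\|u\|_1 \le 1$ equals $v^+ - v^-$ for $v^+ = u_+ + s\mathbf{1}$ and $v^- = u_- + s\mathbf{1}$, where $u_\pm$ are the positive and negative parts of $u$ and $s = (1 - \|u\|_1)/(2k)$. Thus,

$$\max_{\|u\|_1 \le 1} \sum_{j=1}^t \langle u, \mathbf{Z}_j\rangle - \sum_{j=1}^t \langle u_j, \mathbf{Z}_j\rangle \le 2\sqrt{\ln(2k)\,t}, \tag{177}$$

which concludes the proof.

Lemma B.6. Fix $A > 0$ and $0 < B < A/e$. Then,

$$t \ge \frac{2}{B}\ln\Big(\frac{A}{B}\Big) \implies \frac{\ln(At)}{t} \le B. \tag{178}$$

Proof. Set

$$L := \ln\Big(\frac{A}{B}\Big) \quad \text{and} \quad y := Bt. \tag{179}$$

Since $At = \frac{A}{B}y = e^L y$, we have

$$\ln(At) = L + \ln y. \tag{180}$$

Therefore,

$$\frac{\ln(At)}{t} = \frac{L + \ln y}{y/B} = B\,\frac{L + \ln y}{y}. \tag{181}$$

To prove the desired inequality, it suffices to show that when $y \ge 2L$ (equivalently, $t \ge \frac{2}{B}\ln\frac{A}{B}$), we have

$$L + \ln y \le y. \tag{182}$$

Since $y \ge 2L$, we have $L \le y/2$. Furthermore, $y > 2$ because $L > 1$ (as $A/B > e$), and for $y > 2$ we have $\ln y < y/2$. Thus,

$$L + \ln y \le \frac{y}{2} + \frac{y}{2} = y. \tag{183}$$

As a result, for $A > 0$ and $0 < B < A/e$, we have $\frac{\ln(At)}{t} \le B$ whenever $t \ge \frac{2}{B}\ln\frac{A}{B}$.

C Algorithms

Algorithm 1 Online Newton Step (ONS)
Input: Stream $(g_t)_{t\ge1} \subseteq [-1, 1]^d$
1: Initialize $\lambda_1 = \mathbf{0} \in \mathbb{R}^d$, $H_0 = I_d$
2: for $j = 1, \ldots$ do
3: $\quad H_j \leftarrow H_{j-1} + \frac{1}{(1 + \langle \lambda_j, g_j\rangle)^2}\, g_j g_j^\top$
4: $\quad \lambda_{j+1} \leftarrow \operatorname{proj}^{H_j}_{\|v\|_1 \le 1/2}\Big(\lambda_j + \frac{2}{2 - \ln 3}\, H_j^{-1}\frac{g_j}{1 + \langle \lambda_j, g_j\rangle}\Big)$, where $\operatorname{proj}^{H_j}_{\|v\|_1 \le 1/2}(y) = \operatorname*{argmin}_{\|v\|_1 \le 1/2} \langle H_j(v - y), v - y\rangle$ (this is a step against the gradient $-g_j/(1 + \langle \lambda_j, g_j\rangle)$ of $-\ln(1 + \langle \lambda, g_j\rangle)$ at $\lambda_j$, consistent with (184))
5: end for

Remark: When the stream $(g_t)_{t\ge1}$ consists of scalars $g_t \in [-1, 1]$, the update rule for $\lambda_{t+1}$ simplifies to [33, 5, 40]:

$$\lambda_{j+1} = \Pi_{[-\frac{1}{2}, \frac{1}{2}]}\Big(\lambda_j - \frac{2}{2 - \ln 3}\,\frac{\nu_j}{1 + \sum_{i=1}^j \nu_i^2}\Big), \quad \text{where } \nu_i = -\frac{g_i}{1 + \lambda_i g_i}, \tag{184}$$

and $\Pi_{[-\frac{1}{2}, \frac{1}{2}]}$ is the projection onto the interval $[-1/2, 1/2]$.

Algorithm 2 Follow the Regularized Leader (FTRL) on Linear Losses
Input: Stream $(g_t)_{t\ge1} \subseteq [-1, 1]^d$
1: for $j = 1, \ldots$ do
2: $\quad v_j \leftarrow \operatorname*{argmin}_{v \in \mathcal{S}_d}\ \frac{\sqrt{j}}{\sqrt{\ln d}}\Big(\sum_{i=1}^d v_i\ln v_i + \ln d\Big) + \sum_{j'=1}^{j-1} -\langle v, g_{j'}\rangle$
3: end for

Note: $\mathcal{S}_d = \{v \in \mathbb{R}^d_{\ge 0} \mid \sum_{i=1}^d v_i = 1\}$ is the $d$-dimensional probability simplex.

D Additional Experimental Details & Results

D.1 Synthetic: k = 25

For $k = 25$ streams, the results for $\phi^{\text{mv-ons}}$ are included as well.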
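The synthetic comparison can be approximated with a short simulation. The sketch below is a minimal illustration under stated assumptions, not the authors' experimental code: it runs the univariate ONS update (184) independently on every stream and forms the Bonferroni wealth $\frac{1}{k}\max_i W_i^{\text{ons}}$, the product, the average, and the balanced combination $\frac{1}{2}W^{\text{ave}} + \frac{1}{2}W^{\text{prod}}$, all in log space to avoid overflow. The data model (each $Z_{i,t} \in \{-1, +1\}$ with mean $\Delta_i$), the hypothetical helper names `ons_log_wealth` and `stopping_times`, and the choice of 5 drifting streams with mean 0.3 are assumptions; the multivariate tests $\phi^{\text{mv-ons}}$ and $\phi^{\text{ftrl}}$ are omitted.

```python
import numpy as np

ONS_STEP = 2.0 / (2.0 - np.log(3.0))  # step-size constant from eq. (184)


def ons_log_wealth(z):
    """Run univariate ONS (eq. 184) on every column of z (shape (T, k), entries in [-1, 1]).

    Returns log W[t-1, i] = sum_{j<=t} log(1 + lambda_{i,j} * z[j-1, i]).
    """
    T, k = z.shape
    lam = np.zeros(k)        # lambda_{i,1} = 0: the first bet is neutral
    grad_sq = np.zeros(k)    # running sum of nu_{i,j}^2
    log_w = np.zeros(k)
    out = np.empty((T, k))
    for t in range(T):
        g = z[t]
        log_w += np.log1p(lam * g)          # bet lambda_t was chosen before g_t is seen
        out[t] = log_w
        nu = -g / (1.0 + lam * g)           # derivative of -log(1 + lambda * g_t) at lambda_t
        grad_sq += nu ** 2
        lam = np.clip(lam - ONS_STEP * nu / (1.0 + grad_sq), -0.5, 0.5)
    return out


def stopping_times(log_w, alpha):
    """First time (1-indexed) each global wealth process reaches 1/alpha; None if never."""
    k = log_w.shape[1]
    log_prod = log_w.sum(axis=1)                                # log prod_i W_i
    log_ave = np.logaddexp.reduce(log_w, axis=1) - np.log(k)    # log (1/k) sum_i W_i
    combined = {
        "bonf": log_w.max(axis=1) - np.log(k),                  # log (1/k) max_i W_i
        "prod": log_prod,
        "ave": log_ave,
        "balance": np.logaddexp(log_ave, log_prod) - np.log(2.0),
    }
    result = {}
    for name, lw in combined.items():
        hits = np.flatnonzero(lw >= np.log(1.0 / alpha))
        result[name] = int(hits[0]) + 1 if hits.size else None
    return result


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k, T, alpha = 25, 300, 0.01
    means = np.zeros(k)
    means[:5] = 0.3                       # hypothetical: 5 of 25 streams drift
    # Z_{i,t} = +1 with probability (1 + mean_i) / 2, else -1, so E[Z_{i,t}] = mean_i
    z = np.where(rng.random((T, k)) < (1.0 + means) / 2.0, 1.0, -1.0)
    print(stopping_times(ons_log_wealth(z), alpha))
```

Working in log space keeps the product of 25 wealth processes finite; `np.logaddexp.reduce` gives the log of the average without exponentiating.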
[Figure B.2, panels (a) $k_1/k = 0.05$, (b) $k_1/k = 0.15$, (c) $k_1/k = 0.30$. Top row, "Distribution of Stopping Times": histograms of stopping times (0 to 300) for $\phi^{\text{bonf}}$, $\phi^{\text{mv-ons}}$, $\phi^{\text{ftrl}}$, $\phi^{\text{prod}}$, $\phi^{\text{ave}}$, $\phi^{\text{balance}}$. Bottom row, "Growth of Wealth Processes": wealth (log scale, $10^{-3}$ to $10^{2}$) versus time (0 to 300) for $\frac{1}{k}\max_{i\in[k]} W_i^{\text{ons}}$, $W^{\text{mv-ons}}$, $W^{\text{ftrl}}$, $W^{\text{prod}}$, $W^{\text{ave}}$, $W^{\text{balance}}$, with threshold $1/\alpha = 100$.]

Figure B.2: Top: Distribution of stopping times, over 1,000 simulations, for various sequential tests across settings with varying proportions of streams with nonzero means. A test rejects when its corresponding wealth process exceeds $1/\alpha$ for $\alpha = 0.01$. The dashed vertical line is the empirical mean of the stopping times. Bottom: Trajectories of various wealth processes across settings with different proportions of streams with nonzero means. Each line represents the median trajectory of a wealth process over 1,000 simulations, with shaded areas indicating the 25% and 75% quantiles. The y-axis is on a logarithmic scale. Wealth processes are clipped to $10^{-3}$ for visualization purposes.
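The bottom-row summaries described in the caption (median trajectory, 25% and 75% quantile band, clipping at $10^{-3}$) reduce to a per-time-step quantile computation across simulations. A minimal NumPy sketch, assuming the wealth paths are stored in an array of shape (number of simulations, $T$); the function name `summarize_trajectories` is hypothetical, and clipping from below before taking quantiles is our reading of "clipped to $10^{-3}$":

```python
import numpy as np


def summarize_trajectories(wealth, floor=1e-3):
    """Per-time-step 25% quantile, median, and 75% quantile of wealth paths.

    wealth: array of shape (n_simulations, T). Values below `floor` are raised
    to `floor` first, so the summaries can be drawn on a log-scale axis.
    """
    clipped = np.maximum(np.asarray(wealth, dtype=float), floor)
    q25, median, q75 = np.quantile(clipped, [0.25, 0.5, 0.75], axis=0)
    return q25, median, q75


# Toy usage: 3 simulated paths observed over 2 time steps.
paths = np.array([[1e-5, 2.0], [1.0, 4.0], [3.0, 6.0]])
low, mid, high = summarize_trajectories(paths)
```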
[Figure B.4, panels (a) $k_1/k = 0.45$, (b) $k_1/k = 0.60$, (c) $k_1/k = 0.75$. Top row, "Distribution of Stopping Times": histograms of stopping times (0 to 300) for $\phi^{\text{bonf}}$, $\phi^{\text{mv-ons}}$, $\phi^{\text{ftrl}}$, $\phi^{\text{prod}}$, $\phi^{\text{ave}}$, $\phi^{\text{balance}}$. Bottom row, "Growth of Wealth Processes": wealth (log scale, $10^{-3}$ to $10^{2}$) versus time (0 to 300) for $\frac{1}{k}\max_{i\in[k]} W_i^{\text{ons}}$, $W^{\text{mv-ons}}$, $W^{\text{ftrl}}$, $W^{\text{prod}}$, $W^{\text{ave}}$, $W^{\text{balance}}$, with threshold $1/\alpha = 100$.]

Figure B.4: Top: Distribution of stopping times, over 1,000 simulations, for various sequential tests across settings with varying proportions of streams with nonzero means. A test rejects when its corresponding wealth process exceeds $1/\alpha$ for $\alpha = 0.01$. The dashed vertical line is the empirical mean of the stopping times. Bottom: Trajectories of various wealth processes across settings with different proportions of streams with nonzero means. Each line represents the median trajectory of a wealth process over 1,000 simulations, with shaded areas indicating the 25% and 75% quantiles. The y-axis is on a logarithmic scale. Wealth processes are clipped to $10^{-3}$ for visualization purposes.

D.2 Synthetic: k = 100

For $k = 100$ streams, we exclude results for $\phi^{\text{mv-ons}}$, as the test becomes prohibitively slow when $k$ is moderately large, which obscures the presentation of results for the other tests.
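For reference, the wealth process $W^{\text{ftrl}}$ of (61), one of the tests retained for $k = 100$, can be sketched in a few lines. This is a minimal sketch under assumptions, not the authors' code: FTRL (Algorithm 2) is run on $\tilde Z_t = (\mathbf{Z}_t, -\mathbf{Z}_t)$ as in Lemma B.5, using the closed-form minimizer of its entropic objective, $v_t \propto \exp\big(\eta_t \sum_{j<t}\tilde Z_j\big)$ with $\eta_t = \sqrt{\ln(2k)/t}$; the direction is $u_t = v_t^+ - v_t^-$; and the scalar bet $\lambda_t$ is taken to be the univariate ONS update (184) applied to $\langle u_t, \mathbf{Z}_t\rangle$, which is our reading of the proof of Theorem 4.2. The function name `ftrl_log_wealth` and the simulated data are hypothetical.

```python
import numpy as np

ONS_STEP = 2.0 / (2.0 - np.log(3.0))  # constant from eq. (184)


def ftrl_log_wealth(z):
    """log W^ftrl_t for t = 1..T, given z of shape (T, k) with entries in [-1, 1]."""
    T, k = z.shape
    d = 2 * k                      # FTRL runs on (Z_t, -Z_t) in [-1, 1]^{2k}
    cum = np.zeros(d)              # sum of past (Z_j, -Z_j), j < t
    lam, grad_sq, log_w = 0.0, 0.0, 0.0
    out = np.empty(T)
    for t in range(1, T + 1):
        # Closed form of Algorithm 2: v_t = softmax(eta_t * cum), eta_t = sqrt(ln d / t)
        logits = np.sqrt(np.log(d) / t) * cum
        v = np.exp(logits - logits.max())
        v /= v.sum()
        u = v[:k] - v[k:]          # u_t = v_t^+ - v_t^-, so ||u_t||_1 <= 1
        g = float(u @ z[t - 1])    # scalar outcome <u_t, Z_t> in [-1, 1]
        log_w += np.log1p(lam * g)
        out[t - 1] = log_w
        # Univariate ONS update (184) for the next bet lambda_{t+1}
        nu = -g / (1.0 + lam * g)
        grad_sq += nu ** 2
        lam = float(np.clip(lam - ONS_STEP * nu / (1.0 + grad_sq), -0.5, 0.5))
        cum += np.concatenate([z[t - 1], -z[t - 1]])
    return out


rng = np.random.default_rng(1)
k, T = 100, 300
means = np.zeros(k)
means[:15] = 0.3                   # hypothetical: 15% of streams have mean 0.3
z = np.where(rng.random((T, k)) < (1.0 + means) / 2.0, 1.0, -1.0)
log_wealth = ftrl_log_wealth(z)
rejections = np.flatnonzero(log_wealth >= np.log(100.0))  # threshold 1/alpha with alpha = 0.01
```

Each step costs $O(k)$ here because the FTRL iterate has a closed form, in contrast to the $k \times k$ matrix and weighted projection in multivariate ONS (Algorithm 1).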
[Figure B.6, panels (a) $k_1/k = 0.05$, (b) $k_1/k = 0.15$, (c) $k_1/k = 0.30$. Top row, "Distribution of Stopping Times": histograms of stopping times (0 to 300) for $\phi^{\text{bonf}}$, $\phi^{\text{ftrl}}$, $\phi^{\text{prod}}$, $\phi^{\text{ave}}$, $\phi^{\text{balance}}$. Bottom row, "Growth of Wealth Processes": wealth (log scale, $10^{-3}$ to $10^{2}$) versus time (0 to 300); legend: $\frac{1}{k}\max_{i\in[k]} W_i^{\text{ons}}$, $W^{\text{mv-ons}}$, $W^{\text{ftrl}}$, $W^{\text{prod}}$, $W^{\text{ave}}$, $W^{\text{balance}}$; threshold $1/\alpha = 100$.]

Figure B.6: Top: Distribution of stopping times, over 1,000 simulations, for various sequential tests across settings with varying proportions of streams with nonzero means. A test rejects when its corresponding wealth process exceeds $1/\alpha$ for $\alpha = 0.01$. The dashed vertical line is the empirical mean of the stopping times. Bottom: Trajectories of various wealth processes across settings with different proportions of streams with nonzero means. Each line represents the median trajectory of a wealth process over 1,000 simulations, with shaded areas indicating the 25% and 75% quantiles. The y-axis is on a logarithmic scale. Wealth processes are clipped to $10^{-3}$ for visualization purposes.
[Figure B.8, panels (a) $k_1/k = 0.45$, (b) $k_1/k = 0.60$, (c) $k_1/k = 0.75$. Top row, "Distribution of Stopping Times": histograms of stopping times (0 to 300) for $\phi^{\text{bonf}}$, $\phi^{\text{ftrl}}$, $\phi^{\text{prod}}$, $\phi^{\text{ave}}$, $\phi^{\text{balance}}$. Bottom row, "Growth of Wealth Processes": wealth (log scale, $10^{-3}$ to $10^{2}$) versus time (0 to 300); legend: $\frac{1}{k}\max_{i\in[k]} W_i^{\text{ons}}$, $W^{\text{mv-ons}}$, $W^{\text{ftrl}}$, $W^{\text{prod}}$, $W^{\text{ave}}$, $W^{\text{balance}}$; threshold $1/\alpha = 100$.]

Figure B.8: Top: Distribution of stopping times, over 1,000 simulations, for various sequential tests across settings with varying proportions of streams with nonzero means. A test rejects when its corresponding wealth process exceeds $1/\alpha$ for $\alpha = 0.01$. The dashed vertical line is the empirical mean of the stopping times. Bottom: Trajectories of various wealth processes across settings with different proportions of streams with nonzero means. Each line represents the median trajectory of a wealth process over 1,000 simulations, with shaded areas indicating the 25% and 75% quantiles. The y-axis is on a logarithmic scale. Wealth processes are clipped to $10^{-3}$ for visualization purposes.

D.3 Zero-shot medical image classification

Table 2: Additional details about the datasets used in the zero-shot medical classification experiment. The first and second classes listed in the table correspond to $Y = 0$ and $Y = 1$, respectively. Most of the information provided in this table is derived from Nie et al. [21].
| Group | Name | Dataset description | Classes ($Y = 0$ / $Y = 1$) |
|---|---|---|---|
| brain CT | Brain Tumor CT | High-resolution CT scans for brain tumor detection. | Healthy / Tumor |
| brain MRI | Brain Tumor MRI | Similar to the Brain Tumor CT dataset but with MRIs. | Healthy / Tumor |
| covid X-ray | Covid-CXR2 | 16,000 chest X-ray images, including 2,300 positive COVID-19 images. | No finding / Covid-19 |
| breast mammogram | Breast Cancer | 3,383 annotated mammogram images focused on breast tumors. | Normal / Tumor |
| breast ultrasound | UBIBC | Ultrasound images related to breast cancer. | Benign / Malignant |
| colon pathology | LC2500 | 10,000 pathology images from colon tissue. | Normal / Adenocarcinomas |
| retinal oct | Retinal OCT | High-quality retinal OCT images. | Normal / Not Normal |
| colon endoscopy | WCE | Curated colon disease images. | Normal / Not Normal |
| lung pathology | LC2500 | 15,000 pathology images from lung tissue. | Normal / Not Normal |
| covid ct | COVIDxCT | CT scans of patients with Covid-19. | Normal / Covid-19 |

Table 3: $\mathbb{E}_{P_i}[Z]$ across different groups defined by modality and anatomical region.

| Group | $\mathbb{E}_{P_i}[Z]$ |
|---|---|
| brain CT | 0.31 |
| brain MRI | 0.07 |
| covid X-ray | 0.28 |
| breast mammogram | -0.19 |
| breast ultrasound | -0.07 |
| colon pathology | 0.09 |
| retinal oct | -0.04 |
| colon endoscopy | -0.10 |
| lung pathology | 0.000 |
| covid ct | -0.30 |
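To illustrate how Table 3 maps onto the quantities in the bounds, the sketch below computes $\Delta_m = \max_i |\mathbb{E}_{P_i}[Z]|$ and $\Delta_s = \sum_i (\mathbb{E}_{P_i}[Z])^2$ for the $k = 10$ groups and evaluates the order-level expressions $T^{\text{prod}}$ from (12) and $T^{\text{bonf}}$ from (15). These expressions drop the absolute constants of Appendix A, so they only indicate which regime dominates and are not predicted stopping times; $\alpha = 0.01$ is an assumption borrowed from the synthetic experiments, and the function names are hypothetical.

```python
import math

# E_{P_i}[Z] for each group, as reported in Table 3.
TABLE3_MEANS = {
    "brain CT": 0.31, "brain MRI": 0.07, "covid X-ray": 0.28,
    "breast mammogram": -0.19, "breast ultrasound": -0.07,
    "colon pathology": 0.09, "retinal oct": -0.04,
    "colon endoscopy": -0.10, "lung pathology": 0.000, "covid ct": -0.30,
}


def signal_summaries(means):
    """Return (k, Delta_m, Delta_s) for a dict of per-stream means."""
    vals = list(means.values())
    delta_m = max(abs(x) for x in vals)      # Delta_m = ||E[Z_1]||_inf
    delta_s = sum(x * x for x in vals)       # Delta_s = sum_i Delta_i^2
    return len(vals), delta_m, delta_s


def order_terms(k, delta_m, delta_s, alpha):
    """Constant-free expressions T^prod (eq. 12) and T^bonf (eq. 15)."""
    t_prod = (math.log(1 / alpha) / delta_s
              + k / delta_s * math.log(k / delta_s)
              + k / delta_s ** 2 * math.log(k / delta_s ** 2))
    t_bonf = math.log(k / (alpha * delta_m ** 2)) / delta_m ** 2
    return t_prod, t_bonf


k, delta_m, delta_s = signal_summaries(TABLE3_MEANS)
t_prod, t_bonf = order_terms(k, delta_m, delta_s, alpha=0.01)
```

For these values $\Delta_m = 0.31$ and $\Delta_s = 0.3301$, and the Bonferroni-type expression comes out smaller (about 96 versus about 532).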
