Testing probability distributions using conditional samples

Authors: Clément L. Canonne¹, Dana Ron², Rocco A. Servedio³

January 19, 2015

Abstract. We study a new framework for property testing of probability distributions, by considering distribution testing algorithms that have access to a conditional sampling oracle. This is an oracle that takes as input a subset S ⊆ [N] of the domain [N] of the unknown probability distribution D and returns a draw from the conditional probability distribution D restricted to S. This new model allows considerable flexibility in the design of distribution testing algorithms; in particular, testing algorithms in this model can be adaptive. We study a wide range of natural distribution testing problems in this new framework and some of its variants, giving both upper and lower bounds on query complexity. These problems include testing whether D is the uniform distribution U; testing whether D = D* for an explicitly provided D*; testing whether two unknown distributions D1 and D2 are equivalent; and estimating the variation distance between D and the uniform distribution. At a high level our main finding is that the new conditional sampling framework we consider is a powerful one: while all the problems mentioned above have Ω(√N) sample complexity in the standard model (and in some cases the complexity must be almost linear in N), we give poly(log N, 1/ǫ)-query algorithms (and in some cases poly(1/ǫ)-query algorithms independent of N) for all these problems in our conditional sampling setting.

¹ ccanonne@cs.columbia.edu, Columbia University. Supported by NSF grants CCF-1115703 and CCF-1319788.
² danaron@post.tau.ac.il, Tel Aviv University. Supported by ISF grants 246/08 and 671/13.
³ rocco@cs.columbia.edu, Columbia University. Supported by NSF grants CCF-0915929 and CCF-1115703.
Contents

1 Introduction
  1.1 Background: Distribution testing in the standard model
  1.2 The conditional sampling model
  1.3 Our results
  1.4 The work of Chakraborty et al. [CFGM13]
2 Preliminaries
  2.1 Definitions
  2.2 Useful tools
3 Some useful procedures
  3.1 The procedure Compare
  3.2 The procedure Estimate-Neighborhood
  3.3 The procedure Approx-Eval-Simulator
4 Algorithms and lower bounds for testing uniformity
  4.1 An Õ(1/ǫ²)-query PCOND algorithm for testing uniformity
  4.2 An Ω(1/ǫ²) lower bound for COND_D algorithms that test uniformity
5 Testing equivalence to a known distribution D*
  5.1 A poly(log N, 1/ǫ)-query PCOND_D algorithm
  5.2 A (log N)^Ω(1) lower bound for PCOND_D
  5.3 A poly(1/ǫ)-query COND_D algorithm
6 Testing equality between two unknown distributions
  6.1 An approach based on PCOND queries
  6.2 An approach based on simulating EVAL
7 An algorithm for estimating the distance to uniformity
  7.1 Finding a reference point
8 An Õ(log³N/ǫ³)-query ICOND_D algorithm for testing uniformity
9 An Ω(log N / log log N) lower bound for ICOND_D algorithms that test uniformity
  9.1 A lower bound against non-adaptive algorithms
  9.2 A lower bound against adaptive algorithms: Outline of the proof of Theorem 16
  9.3 Extended transcripts and drawing D ∼ P_No on the fly
  9.4 Bounding d_TV(A^(k),N,otf, A^(k+1),N,otf)
10 Conclusion

1 Introduction

1.1 Background: Distribution testing in the standard model

One of the most fundamental problem paradigms in statistics is that of inferring some information about an unknown probability distribution D given access to independent samples drawn from it. More than a decade ago, Batu et al. [BFR+00]¹ initiated the study of problems of this type from within the framework of property testing [RS96, GGR98]. In a property testing problem there is an unknown "massive object" that an algorithm can access only by making a small number of local inspections of the object, and the goal is to determine whether the object has a particular property. The algorithm must output ACCEPT if the object has the desired property and output REJECT if the object is far from every object with the property. (See [Fis01, Ron08, Ron10, Gol10] for detailed surveys and overviews of the broad field of property testing; we give precise definitions tailored to our setting in Section 2.) In distribution property testing the "massive object" is an unknown probability distribution D over an N-element set, and the algorithm accesses the distribution by drawing independent samples from it.
A wide range of different properties of probability distributions have been investigated in this setting, and upper and lower bounds on the number of samples required have by now been obtained for many problems. These include testing whether D is uniform [GR00, BFR+10, Pan08], testing whether D is identical to a given known distribution D* [BFF+01], testing whether two distributions D1, D2 (both available via sample access) are identical [BFR+00, Val11], and testing whether D has a monotonically increasing probability mass function [BFRV11], as well as related problems such as estimating the entropy of D [BDKR05, VV11], and estimating its support size [RRSS09, Val11, VV11]. Similar problems have also been studied by researchers in other communities, see e.g., [Ma81, Pan04, Pan08].

One broad insight that has emerged from this past decade of work is that while sublinear-sample algorithms do exist for many distribution testing problems, the number of samples required is in general quite large. Even the basic problem of testing whether D is the uniform distribution U over [N] = {1, ..., N} versus ǫ-far from uniform requires Ω(√N) samples² for constant ǫ, and the other problems mentioned above have sample complexities at least this high, and in some cases almost linear in N [RRSS09, Val11, VV11]. Since such sample complexities could be prohibitively high in real-world settings where N can be extremely large, it is natural to explore problem variants where it may be possible for algorithms to succeed using fewer samples.
Indeed, researchers have studied distribution testing in settings where the unknown distribution is guaranteed to have some special structure, such as being monotone, k-modal or a "k-histogram" over [N] [BKR04, DDS+13, ILR12], or being monotone over {0,1}^n [RS09] or over other posets [BFRV11], and have obtained significantly more sample-efficient algorithms using these additional assumptions.

¹ There is a more recent full version of this work [BFR+10] and we henceforth reference this recent version.
² To verify this, consider the family of all distributions that are uniform over half of the domain, and 0 elsewhere. Each distribution in this family is Θ(1)-far from the uniform distribution. However, it is not possible to distinguish with sufficiently high probability between the uniform distribution and a distribution selected randomly from this family, given a sample of size √N/c (for a sufficiently large constant c > 1). This is the case because for the uniform distribution, as well as for each distribution in this family, the probability of observing the same element more than once is very small. Conditioned on such a collision event not occurring, the samples are distributed identically.

1.2 The conditional sampling model

In this work we pursue a different line of investigation: rather than restricting the class of probability distributions under consideration, we consider testing algorithms that may use a more powerful form of access to the unknown distribution D. This is a conditional sampling oracle, which allows the algorithm to obtain a draw from D_S, the conditional distribution of D restricted to a subset S of the domain (where S is specified by the algorithm). More precisely, we have:

Definition 1. Fix a distribution D over [N].
A COND oracle for D, denoted COND_D, is defined as follows: The oracle is given as input a query set S ⊆ [N], chosen by the algorithm, that has D(S) > 0. The oracle returns an element i ∈ S, where the probability that element i is returned is D_S(i) = D(i)/D(S), independently of all previous calls to the oracle.³

As mentioned earlier, a recent work of Chakraborty et al. [CFGM13] introduced a very similar conditional model; we discuss their results and how they relate to our results in Subsection 1.4.

For compatibility with our COND_D notation we will write SAMP_D to denote an oracle that takes no input and, each time it is invoked, returns an element from [N] drawn according to D independently from all previous draws. This is the sample access to D that is used in the standard model of testing distributions, and it is of course the same as a call to COND_D([N]).

Motivation and Discussion. One purely theoretical motivation for the study of the COND model is that it may further our understanding regarding what forms of information (beyond standard sampling) can be helpful for testing properties of distributions. In both learning and property testing it is generally interesting to understand how much power algorithms can gain by making queries, and COND queries are a natural type of query to investigate in the context of distributions. As we discuss in more detail below, in several of our results we actually consider restricted versions of COND queries that do not require the full power of obtaining conditional samples from arbitrary sets.

A second attractive feature of the COND model is that it enables a new level of richness for algorithms that deal with probability distributions.
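To make Definition 1 concrete, the following is a minimal simulation of COND_D (with SAMP_D as the special case S = [N]) for a distribution given explicitly as a weight vector. This is only an illustration of the oracle's behavior, not a component of the paper's algorithms (which access D solely through the oracle); the class name and 0-indexed domain {0, ..., N−1} are our own conventions.

```python
import random

class CondOracle:
    """Sketch of a COND_D oracle for a distribution D over {0, ..., N-1},
    given explicitly as a weight vector (the paper's domain is {1, ..., N})."""

    def __init__(self, weights):
        total = sum(weights)
        self.D = [w / total for w in weights]

    def cond(self, S):
        """Return i in S with probability D_S(i) = D(i)/D(S)."""
        S = list(S)
        mass = [self.D[i] for i in S]
        if sum(mass) == 0:
            # Matches the paper's convention: the oracle fails on D(S) = 0.
            raise ValueError("FAIL: query set has zero probability")
        return random.choices(S, weights=mass, k=1)[0]

    def samp(self):
        """SAMP_D is simply COND_D([N])."""
        return self.cond(range(len(self.D)))
```

For instance, for weights (1, 2, 3, 4) a call `cond({0, 1})` returns 1 with probability 2/3, reflecting the conditional distribution on that pair.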
In the standard model, where only access to SAMP_D is provided, all algorithms must necessarily be non-adaptive, with the same initial step of simply drawing a sample of points from SAMP_D, and the difference between two algorithms comes only from how they process their samples. In contrast, the essence of the COND model is to allow algorithms to adaptively determine later query sets S based on the outcomes of earlier queries.

A natural question about the COND model is its plausibility: are there settings in which an investigator could actually make conditional samples from a distribution of interest? We feel that the COND framework provides a reasonable first approximation for scenarios that arise in application areas (e.g., in biology or chemistry) where the parameters of an experiment can be adjusted so as to restrict the range of possible outcomes.

³ Note that as described above the behavior of COND_D(S) is undefined if D(S) = 0, i.e., the set S has zero probability under D. While various definitional choices could be made to deal with this, we shall assume that in such a case the oracle (and hence the algorithm) outputs "failure" and terminates. This will not be a problem for us throughout this paper, as (a) our lower bounds deal only with distributions that have D(i) > 0 for all i ∈ [N], and (b) in our algorithms COND_D(S) will only ever be called on sets S which are "guaranteed" to have D(S) > 0. (More precisely, each time an algorithm calls COND_D(S) it will either be on the set S = [N], or on a set S which contains an element i which has been returned as the output of an earlier call to COND_D.)
For example, a scientist growing bacteria or yeast cells in a controlled environment may be able to deliberately introduce environmental factors that allow only cells with certain desired characteristics to survive, thus restricting the distribution of all experimental outcomes to a pre-specified subset. We further note that techniques which are broadly reminiscent of COND sampling have long been employed in statistics and polling design under the name of "stratified sampling" (see e.g. [Wik13, Ney34]). We thus feel that the study of distribution testing in the COND model is well motivated both by theoretical and practical considerations.

Given the above motivations, the central question is whether the COND model enables significantly more efficient algorithms than are possible in the weaker SAMP model. Our results (see Subsection 1.3) show that this is indeed the case. Before detailing our results, we note that several of them will in fact deal with a weaker variant of the COND model, which we now describe. In designing COND-model algorithms it is obviously desirable to have algorithms that only invoke the COND oracle on query sets S which are "simple" in some sense. Of course there are many possible notions of simplicity; in this work we consider the size of a set as a measure of its simplicity, and consider algorithms which only query small sets. More precisely, we consider the following restriction of the general COND model:

PCOND oracle: We define a PCOND (short for "pair-cond") oracle for D as a restricted version of COND_D that only accepts input sets S which are either S = [N] (thus providing the power of a SAMP_D oracle) or S = {i, j} for some i, j ∈ [N], i.e. sets of size two.
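The pair queries above are what the paper's Compare procedure (Section 3.1) is built on: repeated PCOND_D({x, y}) calls estimate the ratio D(x)/D(y). A minimal sketch, where `make_pcond` merely simulates the oracle from an explicit weight vector (the helper names and the fixed query budget are our own illustrative choices):

```python
import random

def make_pcond(weights):
    """Simulate a PCOND_D oracle: on the pair {i, j}, return i with
    probability D(i)/(D(i)+D(j)), else j."""
    total = sum(weights)
    D = [w / total for w in weights]
    def pcond(i, j):
        return i if random.random() < D[i] / (D[i] + D[j]) else j
    return pcond

def estimate_ratio(pcond, x, y, queries=20000):
    """Estimate D(x)/D(y): if p is the fraction of PCOND_D({x, y}) calls
    returning x, then p/(1-p) estimates the ratio."""
    hits = sum(1 for _ in range(queries) if pcond(x, y) == x)
    p = hits / queries
    p = min(max(p, 1e-9), 1 - 1e-9)  # guard against division by zero
    return p / (1 - p)
```

With weights (1, 3), `estimate_ratio(pcond, 1, 0)` concentrates around 3; the number of queries needed to pin the ratio down to a (1 ± ǫ) factor grows like 1/ǫ², which is why pairwise comparisons of this kind appear with 1/ǫ² factors throughout the paper's bounds.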
The PCOND oracle may be viewed as a minimalist variant of COND that essentially permits an algorithm to compare the relative weights of two items under D (and to draw random samples from D, by setting S = [N]).

ICOND oracle: We define an ICOND (short for "interval-cond") oracle for D as a restricted version of COND_D that only accepts input sets S which are intervals S = [a, b] = {a, a+1, ..., b} for some a ≤ b ∈ [N] (note that taking a = 1, b = N this provides the power of a SAMP_D oracle). This is a natural restriction on COND queries in settings where the N points are endowed with a total order.

To motivate the PCOND model (which essentially gives the ability to compare two elements), one may consider a setting in which a human domain expert can provide an estimate of the relative likelihood of two distinct outcomes in a limited-information prediction scenario.

1.3 Our results

We give a detailed study of a range of natural distribution testing problems in the COND model and its variants described above, establishing both upper and lower bounds on their query complexity. Our results show that the ability to do conditional sampling provides a significant amount of power to property testers, enabling polylog(N)-query, or even constant-query, algorithms for problems whose sample complexities in the standard model are N^Ω(1); see Table 1. While we have considered a variety of distribution testing problems in the COND model, our results are certainly not exhaustive, and many directions remain to be explored; we discuss some of these in Section 10.

Is D uniform?
  Our results: COND_D Ω(1/ǫ²); PCOND_D Õ(1/ǫ²); ICOND_D Õ(log³N/ǫ³) and Ω(log N / log log N)
  Standard model: Θ(√N/ǫ²) [GR00, BFR+10, Pan08]

Is D = D* for a known D*?
  Our results: COND_D Õ(1/ǫ⁴); PCOND_D Õ(log⁴N/ǫ⁴) and Ω(√(log N / log log N))
  Standard model: Θ(√N/ǫ²) [BFF+01, Pan08, VV14]

Are D1, D2 (both unknown) equivalent?
  Our results: COND_{D1,D2} Õ(log⁵N/ǫ⁴); PCOND_{D1,D2} Õ(log⁶N/ǫ²¹)
  Standard model: Θ(max(N^{2/3}/ǫ^{4/3}, √N/ǫ²)) [BFR+10, Val11, CDVV14]

How far is D from uniform?
  Our results: PCOND_D Õ(1/ǫ²⁰)
  Standard model: O((1/ǫ²) · N/log N) [VV11, VV10b]; Ω(N/log N) [VV11, VV10a]

Table 1: Comparison between the COND model and the standard model on a variety of distribution testing problems over [N]. The upper bounds for the first three problems are for testing whether the property holds (i.e. d_TV = 0) versus d_TV ≥ ǫ, and for the last problem the upper bound is for estimating the distance to uniformity to within an additive ±ǫ.

1.3.1 Testing distributions over unstructured domains

In this early work on the COND model our main focus has been on the simplest (and, we think, most fundamental) problems in distribution testing, such as testing whether D is the uniform distribution U; testing whether D = D* for an explicitly provided D*; testing whether D1 = D2 given COND_D1 and COND_D2 oracles; and estimating the variation distance between D and the uniform distribution. In what follows d_TV denotes the variation distance.

Testing uniformity. We give a PCOND_D algorithm that tests whether D = U versus d_TV(D, U) ≥ ǫ using Õ(1/ǫ²) calls to PCOND_D, independent of N. We show that this PCOND_D algorithm is nearly optimal by proving that any COND_D tester (which may use arbitrary subsets S ⊆ [N] as its query sets) requires Ω(1/ǫ²) queries for this testing problem.

Testing equivalence to a known distribution. As described above, for the simple problem of testing uniformity we have an essentially optimal PCOND testing algorithm and a matching lower bound.
A more general and challenging problem is that of testing whether D (accessible via a PCOND or COND oracle) is equivalent to D*, where D* is an arbitrary "known" distribution over [N] that is explicitly provided to the testing algorithm at no cost (say as a vector (D*(1), ..., D*(N)) of probability values). For this "known D*" problem, we give a PCOND_D algorithm testing whether D = D* versus d_TV(D, D*) ≥ ǫ using Õ((log N)⁴/ǫ⁴) queries. We further show that the (log N)^Ω(1) query complexity of our PCOND_D algorithm is inherent in the problem, by proving that any PCOND_D algorithm for this problem must use √(log(N)/log log(N)) queries for constant ǫ.

Given these (log N)^Θ(1) upper and lower bounds on the query complexity of PCOND_D-testing equivalence to a known distribution, it is natural to ask whether the full COND_D oracle provides more power for this problem. We show that this is indeed the case, by giving an Õ(1/ǫ⁴)-query algorithm (independent of N) that uses unrestricted COND_D queries.

Testing equivalence between two unknown distributions. We next consider the more challenging problem of testing whether two unknown distributions D1, D2 over [N] (available via COND_D1 and COND_D2 oracles) are identical versus ǫ-far. We give two very different algorithms for this problem. The first uses PCOND oracles and has query complexity Õ((log N)⁶/ǫ²¹), while the second uses COND oracles and has query complexity Õ((log N)⁵/ǫ⁴).
We believe that the proof technique of the second algorithm is of independent interest, since it shows how a COND_D oracle can efficiently simulate an "approximate EVAL_D oracle." (An EVAL_D oracle takes as input a point i ∈ [N] and outputs the probability mass D(i) that D puts on i; we briefly explain our notion of approximating such an oracle in Subsection 1.3.3.)

Estimating the distance to uniformity. We also consider the problem of estimating the variation distance between D and the uniform distribution U over [N], to within an additive error of ±ǫ. In the standard SAMP_D model this is known to be a very difficult problem, with an Ω(N/log N) lower bound established in [VV11, VV10a]. In contrast, we give a PCOND_D algorithm that makes only Õ(1/ǫ²⁰) queries, independent of N.

1.3.2 Testing distributions over structured domains

In the final portion of the paper we view the domain [N] as an ordered set 1 ≤ ... ≤ N. (Note that in all the testing problems and results described previously, the domain could just as well have been viewed as an unstructured set of abstract points x1, ..., xN.) With this perspective it is natural to consider an additional oracle. We define an ICOND (short for "interval-cond") oracle for D as a restricted version of COND_D, which only accepts input sets S that are intervals S = [a, b] = {a, a+1, ..., b} for some a ≤ b ∈ [N] (note that taking a = 1, b = N this provides the power of a SAMP_D oracle). We give an Õ((log N)³/ǫ³)-query ICOND_D algorithm for testing whether D is uniform versus ǫ-far from uniform. We show that a (log N)^Ω(1) query complexity is inherent for uniformity testing using ICOND_D, by proving an Ω(log N / log log N)-query ICOND_D lower bound.
Along the way to establishing our main testing results described above, we develop several powerful tools for analyzing distributions in the COND and PCOND models, which we believe may be of independent interest and utility in subsequent work on these models. These include, as mentioned above, a procedure for approximately simulating an "evaluation oracle", as well as a procedure for estimating the weight of the "neighborhood" of a given point in the domain of the distribution. (See further discussion of these tools in Subsection 1.3.3.)

1.3.3 A high-level discussion of our algorithms

To maintain focus here we describe only the ideas behind our algorithms; intuition for each of our lower bounds can be found in an informal discussion preceding the formal proof, see the beginnings of Sections 4.2, 5.2, and 9. As can be seen in the following discussion, our algorithms share some common themes, though each has its own unique idea/technique, which we emphasize below.

Our simplest testing algorithm is the algorithm for testing whether D is uniform over [N] (using PCOND_D queries). The algorithm is based on the observation that if a distribution is ǫ-far from uniform, then the total weight (according to D) of points y ∈ [N] for which D(y) ≥ (1 + Ω(ǫ))/N is Ω(ǫ), and the fraction of points x ∈ [N] for which D(x) ≤ (1 − Ω(ǫ))/N is Ω(ǫ). If we obtain such a pair of points (x, y), then we can detect this deviation from uniformity by performing Θ(1/ǫ²) PCOND_D queries on the pair. Such a pair can be obtained with high probability by making Θ(1/ǫ) SAMP_D queries (so as to obtain y) as well as selecting Θ(1/ǫ) points uniformly (so as to obtain x). This approach yields an algorithm whose complexity grows like 1/ǫ⁴.
To actually get an algorithm with query complexity Õ(1/ǫ²) (which, as our lower bound shows, is tight), a slightly more refined approach is applied.

When we take the next step to testing equality to an arbitrary (but fully specified) distribution D*, the abovementioned observation generalizes so as to imply that if we sample Θ(1/ǫ) points from D and Θ(1/ǫ) from D*, then with high probability we shall obtain a pair of points (x, y) such that D(x)/D(y) differs by at least (1 ± Ω(ǫ)) from D*(x)/D*(y). Unfortunately, this cannot necessarily be detected by a small number of PCOND_D queries since (as opposed to the uniform case) D*(x)/D*(y) may be very large or very small. However, we show that by sampling from both D and D* and allowing the number of samples to grow with log N, with high probability we either obtain a pair of points as described above for which D*(x)/D*(y) is a constant, or we detect that for some set of points B we have that |D(B) − D*(B)| is relatively large.⁴ As noted previously, we prove a lower bound showing that a polynomial dependence on log N is unavoidable if only PCOND_D queries (in addition to standard sampling) are allowed.

To obtain our more efficient poly(1/ǫ)-query algorithm, which uses more general COND_D queries, we extend the observation from the uniform case in a different way. Specifically, rather than comparing the relative weight of pairs of points, we compare the relative weight of pairs in which one element is a point and the other is a subset of points. Roughly speaking, we show how points can be paired with subsets of points of comparable weight (according to D*) such that the following holds.
If D is far from D*, then by taking Õ(1/ǫ) samples from D and selecting subsets of points in an appropriate manner (depending on D*), we can obtain (with high probability) a point x and a subset Y such that D(x)/D(Y) differs significantly from D*(x)/D*(Y) and D*(x)/D*(Y) is a constant.

In our next step, to testing equality between two unknown distributions D1 and D2, we need to cope with the fact that we no longer "have a hold" on a known distribution. Our PCOND algorithm can be viewed as creating such a hold in the following sense. By sampling from D1 we obtain (with high probability) a (relatively small) set of points R that cover the distribution D1. By "covering" we mean that except for a subset having small weight according to D1, all points y in [N] have a representative r ∈ R, i.e. a point r such that D1(y) is close to D1(r). We then show that if D2 is far from D1, then one of the following must hold: (1) There is relatively large weight, either according to D1 or according to D2, on points y such that for some r ∈ R we have that D1(y) is close to D1(r) but D2(y) is not sufficiently close to D2(r); (2) There exists a point r ∈ R such that the set of points y for which D1(y) is close to D1(r) has significantly different weight according to D2 as compared to D1.

⁴ Here we use B for "Bucket", as we consider a bucketing of the points in [N] based on their weight according to D*. We note that bucketing has been used extensively in the context of testing properties of distributions, see e.g. [BFR+10, BFF+01].
We note that this algorithm can be viewed as a variant of the PCOND algorithm for the case when one of the distributions is known (where the "buckets" B, which were defined by D* in that algorithm (and were disjoint), are now defined by the points in R (and are not necessarily disjoint)).

As noted previously, our (general) COND algorithm for testing the equality of two (unknown) distributions is based on a subroutine that estimates D(x) (to within (1 ± O(ǫ))) for a given point x given access to COND_D. Obtaining such an estimate for every x ∈ [N] cannot be done efficiently for some distributions.⁵ However, we show that if we allow the algorithm to output UNKNOWN on some subset of points with total weight O(ǫ), then the relaxed task can be performed using poly(log N, 1/ǫ) queries, by performing a kind of randomized binary search "with exceptions". This relaxed version, which we refer to as an approximate EVAL oracle, suffices for our needs in distinguishing between the case that D1 and D2 are the same distribution and the case in which they are far from each other. It is possible that this procedure will be useful for other tasks as well.

The algorithm for estimating the distance to uniformity (which uses poly(1/ǫ) PCOND_D queries) is based on a subroutine for finding a reference point x together with an estimate D̂(x) of D(x). A reference point should be such that D(x) is relatively close to 1/N (if such a point cannot be found then this is evidence that D is very far from uniform). Given a reference point x (together with D̂(x)) it is possible to estimate the distance to uniformity by obtaining (using PCOND queries) estimates of the ratio between D(x) and D(y) for poly(1/ǫ) uniformly selected points y.
The procedure for finding a reference point x together with D̂(x) is based on estimating both the weight and the size of a subset of points y such that D(y) is close to D(x). The procedure shares a common subroutine, Estimate-Neighborhood, with the PCOND algorithm for testing equivalence between two unknown distributions.

Finally, the ICOND_D algorithm for testing uniformity is based on a version of the approximate EVAL oracle mentioned previously, which on one hand uses only ICOND_D (rather than general COND_D) queries, and on the other hand exploits the fact that we are dealing with the uniform distribution rather than an arbitrary distribution.

1.4 The work of Chakraborty et al. [CFGM13]

Chakraborty et al. [CFGM13] proposed essentially the same COND model that we study, differing only in what happens on query sets S such that D(S) = 0. In our model such a query causes the COND oracle and algorithm to return FAIL, while in their model such a query returns a uniform random i ∈ S.

Related to testing equality of distributions, [CFGM13] provides an (adaptive) algorithm for testing whether D is equivalent to a specified distribution D* using poly(log* N, 1/ǫ) COND queries. Recall that we give an algorithm for this problem that performs Õ(1/ǫ⁴) COND queries. [CFGM13] also gives a non-adaptive algorithm for this problem that performs poly(log N, 1/ǫ) COND queries.⁶

⁵ As an extreme case consider a distribution D for which D(1) = 1 − φ and D(2) = ... = D(N) = φ/(N − 1) for some very small φ (which in particular may depend on N), and for which we are interested in estimating D(2). This requires Ω(1/φ) queries.
Footnote 6: We note that it is only possible for them to give a non-adaptive algorithm because their model is more permissive than ours (if a query set S is proposed for which D(S) = 0, their model returns a uniform random element of S while our model returns FAIL). In our stricter model, any non-adaptive algorithm which queries a proper subset S ⊊ [N] would output FAIL on some distribution D.

Testing equivalence between two unknown distributions is not considered in [CFGM13], and the same is true for testing in the PCOND model. [CFGM13] also presents additional results for a range of other problems, which we discuss below:

• An (adaptive) algorithm for testing uniformity that performs poly(1/ǫ) queries (Footnote 7). The sets on which the algorithm performs COND queries are of size linear in 1/ǫ. Recall that our algorithm for this problem performs Õ(1/ǫ^2) PCOND queries and that we show that every algorithm must perform Ω(1/ǫ^2) queries (when there is no restriction on the types of queries). We note that their analysis uses the same observation that ours does regarding distributions that are far from uniform (see the discussion in Subsection 1.3.3), but exploits it in a different manner. They also give a non-adaptive algorithm for this problem that performs poly(log N, 1/ǫ) COND queries, and show that Ω(log log N) is a lower bound on the necessary number of queries for non-adaptive algorithms.

• An (adaptive) algorithm for testing any label-invariant (i.e., invariant under permutations of the domain) property that performs poly(log N, 1/ǫ) COND queries. As noted in [CFGM13], this in particular implies an algorithm with this complexity for estimating the distance to uniformity. Recall that we give an algorithm for this estimation problem that performs poly(1/ǫ) PCOND queries. The algorithm for testing any label-invariant property is based on learning a certain approximation of the distribution D, and in this process defining some sort of approximate EVAL oracle. To the best of our understanding, our notion of an approximate EVAL oracle (which is used to obtain one of our results for testing equivalence between two unknown distributions) is quite different. They also show that there exists a label-invariant property for which any adaptive algorithm must perform Ω(√(log log N)) COND queries.

• Finally, they show that there exist general properties that require Ω(N) COND queries.

Footnote 7: The precise polynomial is not specified; we believe it is roughly 1/ǫ^4, as it follows from an application of the identity tester of [BFF+01] with distance Θ(ǫ^2) on a domain of size O(1/ǫ).

2 Preliminaries

2.1 Definitions

Throughout the paper we shall work with discrete distributions over an N-element set whose elements are denoted {1, . . . , N}; we write [N] to denote {1, . . . , N} and [a, b] to denote {a, . . . , b}. For a distribution D over [N] we write D(i) to denote the probability of i under D, and for S ⊆ [N] we write D(S) to denote ∑_{i ∈ S} D(i). For S ⊆ [N] such that D(S) > 0 we write D_S to denote the conditional distribution of D restricted to S, so that D_S(i) = D(i)/D(S) for i ∈ S and D_S(i) = 0 for i ∉ S.
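As a concrete illustration of the conditional-sampling model (not part of the paper), a COND_D oracle for an explicitly given distribution can be simulated as follows. The class and method names here are our own, for illustration only.

```python
import random

class CondOracle:
    """Simulates COND_D access to an explicitly given distribution D over [N].

    D is a list of N probabilities; domain elements are 1, ..., N. A query
    on a set S of domain elements returns a draw from D restricted to S,
    and raises FAIL when D(S) = 0, matching the model used in this paper.
    """

    def __init__(self, probs):
        assert abs(sum(probs) - 1.0) < 1e-9, "probs must sum to 1"
        self.probs = probs

    def query(self, S):
        items = sorted(S)
        weights = [self.probs[i - 1] for i in items]
        if sum(weights) == 0:
            raise RuntimeError("FAIL: D(S) = 0")
        # returns i in S with probability D(i)/D(S) = D_S(i)
        return random.choices(items, weights=weights, k=1)[0]

# SAMP_D is the special case S = [N]; PCOND_D restricts S to pairs {x, y}.
D = [0.5, 0.25, 0.125, 0.125]
oracle = CondOracle(D)
draw = oracle.query({2, 3})   # distributed as D_{{2,3}}: 2 w.p. 2/3, 3 w.p. 1/3
```

Note that a real COND oracle gives no direct access to the probabilities; the explicit list above is only a testing device.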
As is standard in property testing of distributions, throughout this work we measure the distance between two distributions D_1 and D_2 using the total variation distance:

d_TV(D_1, D_2) := (1/2) ‖D_1 − D_2‖_1 = (1/2) ∑_{i ∈ [N]} |D_1(i) − D_2(i)| = max_{S ⊆ [N]} |D_1(S) − D_2(S)|.

We may view a property P of distributions over [N] as a subset of all distributions over [N] (consisting of all distributions that have the property). The distance from D to a property P, denoted d_TV(D, P), is defined as inf_{D′ ∈ P} {d_TV(D, D′)}. We define testing algorithms for properties of distributions over [N] as follows:

Definition 2 Let P be a property of distributions over [N]. Let ORACLE_D be some type of oracle which provides access to D. A q(ǫ, N)-query ORACLE testing algorithm for P is an algorithm T which is given ǫ, N as input parameters and oracle access to an ORACLE_D oracle. For any distribution D over [N], algorithm T makes at most q(ǫ, N) calls to ORACLE_D, and:

• if D ∈ P then with probability at least 2/3 algorithm T outputs ACCEPT;
• if d_TV(D, P) ≥ ǫ then with probability at least 2/3 algorithm T outputs REJECT.

This definition can easily be extended to cover situations in which there are two "unknown" distributions D_1, D_2 that are accessible via ORACLE_{D_1} and ORACLE_{D_2} oracles. In particular, we shall consider algorithms for testing whether D_1 = D_2 versus d_TV(D_1, D_2) ≥ ǫ in such a setting. We sometimes write T^{ORACLE_D} to indicate that T has access to ORACLE_D.

2.2 Useful tools

On several occasions we will use the data processing inequality for variation distance. This fundamental result says that for any two distributions D, D′, applying any (possibly randomized) function to D and D′ can never increase their statistical distance; see e.g.
part (iv) of Lemma 2 of [Rey11] for a proof of this lemma.

Lemma 1 (Data Processing Inequality for Total Variation Distance) Let D, D′ be two distributions over a domain Ω. Fix any randomized function F on Ω (Footnote 8), and let F(D) be the distribution such that a draw from F(D) is obtained by drawing independently x from D and f from F and then outputting f(x) (likewise for F(D′)). Then we have d_TV(F(D), F(D′)) ≤ d_TV(D, D′).

We next give several variants of Chernoff bounds (see e.g. Chapter 4 of [MR95]).

Theorem 1 Let Y_1, . . . , Y_m be m independent random variables that take on values in [0, 1], where E[Y_i] = p_i and ∑_{i=1}^m p_i = P. For any γ ∈ (0, 1] we have

(additive bound)   Pr[∑_{i=1}^m Y_i > P + γm], Pr[∑_{i=1}^m Y_i < P − γm] ≤ exp(−2γ^2 m)   (1)

(multiplicative bound)   Pr[∑_{i=1}^m Y_i > (1 + γ)P] < exp(−γ^2 P/3)   (2)

and

(multiplicative bound)   Pr[∑_{i=1}^m Y_i < (1 − γ)P] < exp(−γ^2 P/2).   (3)

The bound in Equation (2) is derived from the following more general bound, which holds for any γ > 0:

Pr[∑_{i=1}^m Y_i > (1 + γ)P] ≤ ( e^γ / (1 + γ)^{1+γ} )^P,   (4)

and which also implies that for any B > 2eP,

Pr[∑_{i=1}^m Y_i > B] ≤ 2^{−B}.   (5)

The following extension of the multiplicative bound is useful when we only have upper and/or lower bounds on P (see Exercise 1.1 of [DP09]):

Corollary 2 In the setting of Theorem 1, suppose that P_L ≤ P ≤ P_H. Then for any γ ∈ (0, 1], we have

Pr[∑_{i=1}^m Y_i > (1 + γ)P_H] < exp(−γ^2 P_H/3)   (6)

Pr[∑_{i=1}^m Y_i < (1 − γ)P_L] < exp(−γ^2 P_L/2)   (7)

Footnote 8: Which can be seen as a distribution over functions over Ω.

We will also use the following corollary of Theorem 1:

Corollary 3 Let 0 ≤ w_1, . . . , w_m ∈ R be such that w_i ≤ κ for all i ∈ [m], where κ ∈ (0, 1]. Let X_1, . . . , X_m be i.i.d.
Bernoulli random variables with Pr[X_i = 1] = 1/2 for all i, and let X = ∑_{i=1}^m w_i X_i and W = ∑_{i=1}^m w_i. For any γ ∈ (0, 1],

Pr[X > (1 + γ)W/2] < exp(−γ^2 W/(6κ))   and   Pr[X < (1 − γ)W/2] < exp(−γ^2 W/(4κ)),

and for any B > e · W, Pr[X > B] < 2^{−B/κ}.

Proof: Let w′_i = w_i/κ (so that w′_i ∈ [0, 1]), let W′ = ∑_{i=1}^m w′_i = W/κ, and for each i ∈ [m] let Y_i = w′_i X_i, so that Y_i takes on values in [0, 1] and E[Y_i] = w′_i/2. Let X′ = ∑_{i=1}^m w′_i X_i = ∑_{i=1}^m Y_i, so that E[X′] = W′/2. By the definitions of W′ and X′ and by Equation (2), for any γ ∈ (0, 1],

Pr[X > (1 + γ)W/2] = Pr[X′ > (1 + γ)W′/2] < exp(−γ^2 W′/6) = exp(−γ^2 W/(6κ)),   (8)

and similarly, by Equation (3),

Pr[X < (1 − γ)W/2] < exp(−γ^2 W/(4κ)).   (9)

For B > e · W = 2e · (W/2) we apply Equation (5) and get

Pr[X > B] = Pr[X′ > B/κ] < 2^{−B/κ},   (10)

as claimed.

3 Some useful procedures

In this section we describe some procedures that will be used by our algorithms. On a first pass the reader may wish to focus on the explanatory prose and performance guarantees of these procedures (i.e., the statements of Lemma 2 and Lemma 3, as well as Definition 3 and Theorem 4) and otherwise skip to p. 27; the internal details of the proofs are not necessary for the subsequent sections that use these procedures.

3.1 The procedure Compare

We start by describing a procedure that estimates the ratio between the weights of two disjoint sets of points by performing COND queries on the union of the sets. More precisely, it estimates the ratio (to within 1 ± η) if the ratio is not too high and not too low; otherwise, it may output High or Low, accordingly. In the special case when each set is of size one, the queries performed are PCOND queries.
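The ratio-estimation idea can be sketched in a few lines of Python (a simplified rendering of the Compare procedure; `cond_query(S)` is an assumed helper that returns one COND draw from D restricted to S, and the sample-size constant is illustrative, not the paper's):

```python
import math

def compare(cond_query, X, Y, eta, K, delta):
    """Sketch of Compare: estimate D(Y)/D(X) for disjoint sets X, Y
    via COND queries on X ∪ Y. Returns 'Low', 'High', or an estimate rho;
    in the regimes of Lemma 2 the stated guarantee holds with
    probability at least 1 - delta."""
    S = X | Y
    m = math.ceil(20 * K * math.log(2 / delta) / eta ** 2)
    hits = sum(1 for _ in range(m) if cond_query(S) in Y)
    mu = hits / m                     # estimates D(Y) / (D(X) + D(Y))
    if mu < (2 / 3) / (K + 1):
        return "Low"                  # D(Y) is likely below D(X)/K
    if 1 - mu < (2 / 3) / (K + 1):
        return "High"                 # D(Y) is likely above K*D(X)
    return mu / (1 - mu)              # approximates D(Y)/D(X)
```

With X = {x} and Y = {y}, every query is on the pair {x, y}, i.e., a PCOND query, as noted above.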
Algorithm 1: Compare
Input: COND query access to a distribution D over [N], disjoint subsets X, Y ⊂ [N], parameters η ∈ (0, 1], K ≥ 1, and δ ∈ (0, 1/2].
1. Perform Θ(K log(1/δ)/η^2) COND_D queries on the set S = X ∪ Y, and let μ̂ be the fraction of times that a point y ∈ Y is returned.
2. If μ̂ < (2/3) · 1/(K + 1), then return Low.
3. Else, if 1 − μ̂ < (2/3) · 1/(K + 1), then return High.
4. Else return ρ = μ̂/(1 − μ̂).

Lemma 2 Given as input two disjoint subsets of points X, Y together with parameters η ∈ (0, 1], K ≥ 1, and δ ∈ (0, 1/2], as well as COND query access to a distribution D, the procedure Compare (Algorithm 1) either outputs a value ρ > 0 or outputs High or Low, and satisfies the following:

1. If D(X)/K ≤ D(Y) ≤ K · D(X), then with probability at least 1 − δ the procedure outputs a value ρ ∈ [1 − η, 1 + η] · D(Y)/D(X);
2. If D(Y) > K · D(X), then with probability at least 1 − δ the procedure outputs either High or a value ρ ∈ [1 − η, 1 + η] · D(Y)/D(X);
3. If D(Y) < D(X)/K, then with probability at least 1 − δ the procedure outputs either Low or a value ρ ∈ [1 − η, 1 + η] · D(Y)/D(X).

The procedure performs O(K log(1/δ)/η^2) COND queries on the set X ∪ Y.

Proof: The bound on the number of queries performed by the algorithm follows directly from the description of the algorithm, and hence we turn to establishing its correctness. Let w(X) = D(X)/(D(X) + D(Y)) and let w(Y) = D(Y)/(D(X) + D(Y)). Observe that w(Y)/w(X) = D(Y)/D(X), and that for μ̂ as defined in Line 1 of the algorithm, E[μ̂] = w(Y) and E[1 − μ̂] = w(X). Also observe that for any B ≥ 1, if D(Y) ≥ D(X)/B then w(Y) ≥ 1/(B + 1), and if D(Y) ≤ B · D(X) then w(X) ≥ 1/(B + 1).
Let E_1 be the event that μ̂ ∈ [1 − η/3, 1 + η/3] · w(Y) and let E_2 be the event that (1 − μ̂) ∈ [1 − η/3, 1 + η/3] · w(X). Given the number of COND queries performed on the set X ∪ Y, by applying a multiplicative Chernoff bound (see Theorem 1), if w(Y) ≥ 1/(4K) then with probability at least 1 − δ/2 the event E_1 holds, and if w(X) ≥ 1/(4K) then with probability at least 1 − δ/2 the event E_2 holds. We next consider the three cases in the lemma statement.

1. If D(X)/K ≤ D(Y) ≤ K · D(X), then by the discussion above, w(Y) ≥ 1/(K + 1), w(X) ≥ 1/(K + 1), and with probability at least 1 − δ we have that μ̂ ∈ [1 − η/3, 1 + η/3] · w(Y) and (1 − μ̂) ∈ [1 − η/3, 1 + η/3] · w(X). Conditioned on these bounds holding,

μ̂ ≥ (1 − η/3)/(K + 1) ≥ (2/3) · 1/(K + 1)   and   1 − μ̂ ≥ (2/3) · 1/(K + 1).

It follows that the procedure outputs a value ρ = μ̂/(1 − μ̂) ∈ [1 − η, 1 + η] · w(Y)/w(X), as required by Item 1.

2. If D(Y) > K · D(X), then we consider two subcases.

(a) If D(Y) > 3K · D(X), then w(X) < 1/(3K + 1), so that by a multiplicative Chernoff bound (stated in Corollary 2), with probability at least 1 − δ we have that

1 − μ̂ < (1 + η/3)/(3K + 1) ≤ (4/3) · 1/(3K + 1) ≤ (2/3) · 1/(K + 1),

causing the algorithm to output High. Thus Item 2 is established for this subcase.

(b) If K · D(X) < D(Y) ≤ 3K · D(X), then w(X) ≥ 1/(3K + 1) and w(Y) ≥ 1/2, so that the events E_1 and E_2 both hold with probability at least 1 − δ. Assume that these events in fact hold. This implies that μ̂ ≥ (1 − η/3)/2 ≥ (2/3) · 1/(K + 1), and the algorithm either outputs High or outputs ρ = μ̂/(1 − μ̂) ∈ [1 − η, 1 + η] · w(Y)/w(X), so Item 2 is established for this subcase as well.

3.
If D(Y) < D(X)/K, so that D(X) > K · D(Y), then the exact same arguments are applied as in the previous case, just switching the roles of Y and X and the roles of μ̂ and 1 − μ̂, so as to establish Item 3.

We have thus established all items in the lemma.

3.2 The procedure Estimate-Neighborhood

In this subsection we describe a procedure that, given a point x, provides an estimate of the weight of a set of points y such that D(y) is similar to D(x). In order to specify the behavior of the procedure more precisely, we introduce the following notation. For a distribution D over [N], a point x ∈ [N] and a parameter γ ∈ [0, 1], let

U^D_γ(x) := { y ∈ [N] : D(x)/(1 + γ) ≤ D(y) ≤ (1 + γ) · D(x) }   (11)

denote the set of points whose weight is "γ-close" to the weight of x. If we take a sample of points distributed according to D, then the expected fraction of these points that belong to U^D_γ(x) is D(U^D_γ(x)). If this value is not too small, then the actual fraction in the sample is close to the expected value. Hence, if we could efficiently determine for any given point y whether or not it belongs to U^D_γ(x), then we could obtain a good estimate of D(U^D_γ(x)). The difficulty is that it is not possible to perform this task efficiently for "boundary" points y such that D(y) is very close to (1 + γ) · D(x) or to D(x)/(1 + γ). However, for our purposes it is not important that we obtain the weight and size of U^D_γ(x) for a specific γ; rather, it suffices to do so for γ in a given range, as stated in the next lemma.
The parameter β in the lemma is the threshold above which we expect the algorithm to provide an estimate of the weight, while [κ, 2κ) is the range in which γ is permitted to lie; finally, η is the desired (multiplicative) accuracy of the estimate, while δ is a bound on the probability of error allowed to the subroutine.

Lemma 3 Given as input a point x together with parameters κ, β, η, δ ∈ (0, 1/2], as well as PCOND query access to a distribution D, the procedure Estimate-Neighborhood (Algorithm 2) outputs a pair (ŵ, α) ∈ [0, 1] × (κ, 2κ) such that α is uniformly distributed in {κ + iθ}_{i=0}^{κ/θ−1} for θ = κηβδ/64, and such that the following holds:

1. If D(U^D_α(x)) ≥ β, then with probability at least 1 − δ we have ŵ ∈ [1 − η, 1 + η] · D(U^D_α(x)) and D(U^D_{α+θ}(x) \ U^D_α(x)) ≤ ηβ/16;
2. If D(U^D_α(x)) < β, then with probability at least 1 − δ we have ŵ ≤ (1 + η) · β and D(U^D_{α+θ}(x) \ U^D_α(x)) ≤ ηβ/16.

The number of PCOND queries performed by the procedure is O( log(1/δ) · log(log(1/δ)/(δβη^2)) / (κ^2 η^4 β^3 δ^2) ).

Algorithm 2: Estimate-Neighborhood
Input: PCOND query access to a distribution D over [N], a point x ∈ [N], and parameters κ, β, η, δ ∈ (0, 1/2].
1: Set θ = κηβδ/64 and r = κ/θ = 64/(ηβδ).
2: Select a value α ∈ {κ + iθ}_{i=0}^{r−1} uniformly at random.
3: Call the SAMP_D oracle Θ(log(1/δ)/(βη^2)) times and let S be the set of points obtained.
4: For each point y in S call Compare_D({x}, {y}, θ/4, 4, δ/(4|S|)) (if a point y appears more than once in S, then Compare is called only once on y).
5: Let ŵ be the fraction of occurrences of points y in S for which Compare returned a value ρ(y) ∈ [1/(1 + α + θ/2), 1 + α + θ/2]. (That is, S is viewed as a multiset.)
6: Return (ŵ, α).
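The sampling-and-comparison loop of Algorithm 2 can be sketched as follows. Here `samp` stands for the SAMP_D oracle and `compare` for Algorithm 1; the sample-size constant is illustrative, not the paper's exact setting.

```python
import math
import random

def estimate_neighborhood(samp, compare, x, kappa, beta, eta, delta):
    """Sketch of Estimate-Neighborhood: estimate the weight of the set of
    points y whose probability D(y) is within a (1 + alpha) factor of
    D(x), for a randomly shifted alpha in [kappa, 2*kappa).

    samp() draws a point from D; compare(X, Y, eta, K, delta) behaves as
    Algorithm 1, returning 'Low', 'High' or a ratio estimate."""
    theta = kappa * eta * beta * delta / 64
    r = round(kappa / theta)                     # r = 64/(eta*beta*delta)
    alpha = kappa + random.randrange(r) * theta  # uniform over {kappa + i*theta}
    n = math.ceil(math.log(4 / delta) / (beta * eta ** 2))
    S = [samp() for _ in range(n)]               # S is viewed as a multiset
    cache = {}
    for y in set(S):                             # call Compare once per distinct y
        cache[y] = compare({x}, {y}, theta / 4, 4, delta / (4 * n))
    hi = 1 + alpha + theta / 2
    hits = sum(1 for y in S
               if isinstance(cache[y], float) and 1 / hi <= cache[y] <= hi)
    return hits / n, alpha                       # (w_hat, alpha)
```

The random shift of α is what sidesteps the "boundary" points discussed above: only for a few unlucky values of α does a non-negligible weight sit near the thresholds.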
Proof of Lemma 3: The number of PCOND queries performed by Estimate-Neighborhood is the size of S times the number of PCOND queries performed in each call to Compare. By the setting of the parameters in the calls to Compare, the total number of PCOND queries is

O( |S| · log(|S|/δ) / θ^2 ) = O( log(1/δ) · log(log(1/δ)/(δβη^2)) / (κ^2 η^4 β^3 δ^2) ).

We now turn to establishing the correctness of the procedure. Since D and x are fixed, in what follows we shall use the shorthand U_γ for U^D_γ(x). For α ∈ {κ + iθ}_{i=0}^{r−1}, let Δ_α := U_{α+θ} \ U_α. We next define several "desirable" events. In all that follows we view S as a multiset.

1. Let E_1 be the event that D(Δ_α) ≤ 4/(δr). Since there are r disjoint sets Δ_α for α ∈ {κ + iθ}_{i=0}^{r−1}, the probability that E_1 occurs (taken over the uniform choice of α) is at least 1 − δ/4. From this point on we fix α and assume E_1 holds.

2. The event E_2 is that |S ∩ Δ_α|/|S| ≤ 8/(δr) (that is, at most twice the upper bound on the expected value). By applying the multiplicative Chernoff bound, using the fact that |S| = Θ(log(1/δ)/(βη^2)) = Ω(log(1/δ) · δr), we have that Pr_S[E_2] ≥ 1 − δ/4.

3. The event E_3 is defined as follows: If D(U_α) ≥ β, then |S ∩ U_α|/|S| ∈ [1 − η/2, 1 + η/2] · D(U_α), and if D(U_α) < β, then |S ∩ U_α|/|S| < (1 + η/2) · β. Once again applying the multiplicative Chernoff bound (for both cases) and using the fact that |S| = Θ(log(1/δ)/(βη^2)), we have that Pr_S[E_3] ≥ 1 − δ/4.

4. Let E_4 be the event that all calls to Compare return an output as specified in Lemma 2. Given the setting of the confidence parameter in the calls to Compare, we have that Pr[E_4] ≥ 1 − δ/4 as well.

Assume from this point on that events E_1 through E_4 all hold; this occurs with probability at least 1 − δ.
By the definition of Δ_α and E_1 we have that D(U_{α+θ} \ U_α) ≤ 4/(δr) = ηβ/16, as required (in both items of the lemma). Let T be the (multi-)subset of points y in S for which Compare returned a value ρ(y) ∈ [1/(1 + α + θ/2), 1 + α + θ/2] (so that ŵ, as defined in the algorithm, equals |T|/|S|). Note first that conditioned on E_4, for every y ∈ U_{2κ} the output of Compare when called on {x} and {y}, denoted ρ(y), satisfies ρ(y) ∈ [1 − θ/4, 1 + θ/4] · (D(y)/D(x)), while for y ∉ U_{2κ} either Compare outputs High or Low, or it outputs a value ρ(y) ∈ [1 − θ/4, 1 + θ/4] · (D(y)/D(x)). This implies that if y ∈ U_α, then

ρ(y) ≤ (1 + α) · (1 + θ/4) ≤ 1 + α + θ/2   and   ρ(y) ≥ (1 + α)^{−1} · (1 − θ/4) ≥ (1 + α + θ/2)^{−1},

so that S ∩ U_α ⊆ T. On the other hand, if y ∉ U_{α+θ}, then either

ρ(y) > (1 + α + θ) · (1 − θ/4) ≥ 1 + α + θ/2   or   ρ(y) < (1 + α + θ)^{−1} · (1 + θ/4) ≤ (1 + α + θ/2)^{−1},

so that T ⊆ S ∩ U_{α+θ}. Combining the two we have:

S ∩ U_α ⊆ T ⊆ S ∩ U_{α+θ}.   (12)

Recalling that ŵ = |T|/|S|, the left-hand side of Equation (12) implies that

ŵ ≥ |S ∩ U_α|/|S|,   (13)

and by E_1 and E_2, the right-hand side of Equation (12) implies that

ŵ ≤ |S ∩ U_α|/|S| + 8/(δr) ≤ |S ∩ U_α|/|S| + βη/8.   (14)

We consider the two cases stated in the lemma:

1. If D(U_α) ≥ β, then by Equation (13), Equation (14) and (the first part of) E_3, we have that ŵ ∈ [1 − η, 1 + η] · D(U_α).
2. If D(U_α) < β, then by Equation (14) and (the second part of) E_3, we have that ŵ ≤ (1 + η) · β.

The lemma is thus established.

3.3 The procedure Approx-Eval-Simulator

3.3.1 Approximate EVAL oracles.

We begin by defining the notion of an "approximate EVAL oracle" that we will use.
Intuitively, this is an oracle which gives a multiplicatively (1 ± ǫ)-accurate estimate of the value of D(i) for all i in a fixed set of probability weight at least 1 − ǫ under D. More precisely, we have the following definition:

Definition 3 Let D be a distribution over [N]. An (ǫ, δ)-approximate EVAL_D simulator is a randomized procedure ORACLE with the following property: For each 0 < ǫ < 1, there is a fixed set S_{(ǫ,D)} ⊊ [N] with D(S_{(ǫ,D)}) < ǫ for which the following holds. Given as input an element i* ∈ [N], the procedure ORACLE either outputs a value α ∈ [0, 1] or outputs UNKNOWN or FAIL. The following holds for all i* ∈ [N]:

(i) If i* ∉ S_{(ǫ,D)}, then with probability at least 1 − δ the output of ORACLE on input i* is a value α ∈ [0, 1] such that α ∈ [1 − ǫ, 1 + ǫ] · D(i*);

(ii) If i* ∈ S_{(ǫ,D)}, then with probability at least 1 − δ the procedure either outputs UNKNOWN or outputs a value α ∈ [0, 1] such that α ∈ [1 − ǫ, 1 + ǫ] · D(i*).

We note that according to the above definition, different calls to ORACLE on the same input element i* ∈ [N] may return different values. However, the "low-weight" set S_{(ǫ,D)} is an a priori fixed set that does not depend in any way on the input point i* given to the algorithm. The key property of an (ǫ, δ)-approximate EVAL_D oracle is that it reliably gives a multiplicatively (1 ± ǫ)-accurate estimate of the value of D(i) for all i in some fixed set of probability weight at least 1 − ǫ under D.

3.3.2 Constructing an approximate EVAL_D simulator using COND_D

In this subsection we show that a COND_D oracle can be used to obtain an approximate EVAL simulator:

Theorem 4 Let D be any distribution over [N] and let 0 < ǫ, δ < 1.
The algorithm Approx-Eval-Simulator has the following properties: It uses Õ( (log N)^5 · (log(1/δ))^2 / ǫ^3 ) calls to COND_D, and it is an (ǫ, δ)-approximate EVAL_D simulator.

A few notes: First, in the proof of Theorem 4 that we give below, we assume throughout that 0 < ǫ ≤ 1/40. This incurs no loss of generality because if the desired ǫ parameter is in (1/40, 1) then the parameter can simply be set to 1/40. We further note that, in keeping with our requirement on a COND_D algorithm, the algorithm Approx-Eval-Simulator only ever calls the COND_D oracle on sets S which are either S = [N] or else contain at least one element i that has been returned as the output of an earlier call to COND_D. To see this, note that Line 6 is the only line where COND_D queries are performed. In the first execution of the outer "For" loop, clearly all COND queries are on the set S_0 = [N]. In subsequent stages the only way a set S_j is formed is if either (i) S_j is set to {i*} in Line 10, in which case clearly i* was previously received as the response of a COND_D(S_{j−1}) query, or else (ii) a nonzero fraction of the elements i_1, . . . , i_m received as responses to COND_D(S_{j−1}) queries belong to S_j (see Line 19).

A preliminary simplification. Fix a distribution D over [N]. Let Z denote supp(D), i.e. Z = {i ∈ [N] : D(i) > 0}. We first claim that in proving Theorem 4 we may assume without loss of generality that no two distinct elements i, j ∈ Z have D(i) = D(j); in other words, we shall prove the theorem under this assumption on D, and we claim that this implies the general result.
To see this, observe that if Z contains elements i ≠ j with D(i) = D(j), then for any arbitrarily small ξ > 0 and any arbitrarily large M we can perturb the weights of elements in Z to obtain a distribution D′ supported on Z such that (i) no two elements of Z have the same probability under D′, and (ii) for every S ⊆ [N] with S ∩ Z ≠ ∅ we have d_TV(D_S, D′_S) ≤ ξ/M. Since the variation distance between D′_S and D_S is at most ξ/M for an arbitrarily small ξ, the variation distance between (the execution of any M-query COND algorithm run on D) and (the execution of any M-query COND algorithm run on D′) will be at most ξ. Since ξ can be made arbitrarily small, this means that indeed without loss of generality we may work with D′ in what follows. Thus, we henceforth assume that the distribution D has no two elements in supp(D) with the same weight.

For such a distribution we can explicitly describe the set S_{(ǫ,D)} from Definition 3 that our analysis will deal with. Let π : {1, . . . , |Z|} → Z be the bijection such that D(π(1)) > · · · > D(π(|Z|)) (note that the bijection π is uniquely defined by the assumption that D(i) ≠ D(j) for all distinct i, j ∈ Z). Given a value 0 < τ < 1, we define the set L_{τ,D} to be ([N] \ Z) ∪ {π(s), . . . , π(|Z|)}, where s is the smallest index in {1, . . . , |Z|} such that ∑_{j=s}^{|Z|} D(π(j)) < τ (if D(π(|Z|)) itself is at least τ, then we define L_{τ,D} = [N] \ Z). Thus, intuitively, L_{τ,D} contains the τ fraction (w.r.t. D) of [N] consisting of the lightest elements. The desired set S_{(ǫ,D)} is precisely L_{ǫ,D}.

Intuition for the algorithm. The high-level idea of the EVAL_D simulation is the following: Let i* ∈ [N] be the input element given to the EVAL_D simulator. The algorithm works in a sequence of stages.
Before performing the j-th stage, it maintains a set S_{j−1} that contains i*, and it has a high-accuracy estimate D̂(S_{j−1}) of the value of D(S_{j−1}). (The initial set S_0 is simply [N] and the initial estimate D̂(S_0) is of course 1.) In the j-th stage the algorithm attempts to construct a subset S_j of S_{j−1} in such a way that (i) i* ∈ S_j, and (ii) it is possible to obtain a high-accuracy estimate of D(S_j)/D(S_{j−1}) (and thus a high-accuracy estimate of D(S_j)). If the algorithm cannot construct such a set S_j, then it outputs UNKNOWN; otherwise, after at most (essentially) O(log N) stages, it reaches a situation where S_j = {i*}, and so the high-accuracy estimate of D(S_j) = D(i*) is the desired value.

A natural first idea towards implementing this high-level plan is simply to split S_{j−1} randomly into two pieces and use one of them as S_j. However, this simple approach may not work; for example, if S_{j−1} has one or more elements which are very heavy compared to i*, then with a random split it may not be possible to efficiently estimate D(S_j)/D(S_{j−1}) as required in (ii) above. Thus we follow a more careful approach, which first identifies and removes "heavy" elements from S_{j−1} in each stage.

In more detail, during the j-th stage the algorithm first performs COND_D queries on the set S_{j−1} to identify a set H_j ⊆ S_{j−1} of "heavy" elements; this set essentially consists of all elements which individually each contribute at least a κ fraction of the total mass D(S_{j−1}). (Here κ is a "not-too-small" quantity, but it is significantly less than ǫ.) Next, the algorithm performs additional COND_D queries to estimate D(i*)/D(S_{j−1}).
If this fraction exceeds κ/20, then it is straightforward to estimate D(i*)/D(S_{j−1}) to high accuracy, so using D̂(S_{j−1}) it is possible to obtain a high-quality estimate of D(i*), and the algorithm can conclude. However, the typical case is that D(i*)/D(S_{j−1}) < κ/20. In this case, the algorithm next estimates D(H_j)/D(S_{j−1}). If this is larger than 1 − ǫ/10, then the algorithm outputs UNKNOWN (see below for more discussion of this). If D(H_j)/D(S_{j−1}) is less than 1 − ǫ/10, then D(S_{j−1} \ H_j)/D(S_{j−1}) ≥ ǫ/10 (and so D(S_{j−1} \ H_j)/D(S_{j−1}) can be efficiently estimated to high accuracy), but each element k of S_{j−1} \ H_j has

D(k)/D(S_{j−1}) ≤ κ ≪ ǫ/10 ≤ D(S_{j−1} \ H_j)/D(S_{j−1}).

Thus it must be the case that the weight under D of S_{j−1} \ H_j is "spread out" over many "light" elements.

Given that this is the situation, the algorithm next chooses S′_j to be a random subset of S_{j−1} \ (H_j ∪ {i*}), and sets S_j to be S′_j ∪ {i*}. It can be shown that with high probability (over the random choice of S_j) it will be the case that D(S_j) ≥ (1/3) · D(S_{j−1} \ H_j) (this relies crucially on the fact that the weight under D of S_{j−1} \ H_j is "spread out" over many "light" elements). This makes it possible to efficiently estimate D(S_j)/D(S_{j−1} \ H_j); together with the high-accuracy estimate of D(S_{j−1} \ H_j)/D(S_{j−1}) noted above, and the high-accuracy estimate D̂(S_{j−1}) of D(S_{j−1}), this means it is possible to efficiently estimate D(S_j) to high accuracy, as required for the next stage. (We note that after defining S_j, but before proceeding to the next stage, the algorithm actually checks to be sure that S_j contains at least one point that was returned from the COND_D(S_{j−1}) calls made in the past stage.
This check ensures that whenever the algorithm calls COND_D(S) on a set S, it is guaranteed that D(S) > 0, as required by our COND_D model. Our analysis shows that doing this check does not affect correctness of the algorithm, since with high probability the check always passes.)

Intuition for the analysis. We require some definitions to give the intuition for the analysis establishing correctness. Fix a nonempty subset S ⊆ [N]. Let π_S be the bijection mapping {1, . . . , |S|} to S in such a way that D_S(π_S(1)) > · · · > D_S(π_S(|S|)), i.e. π_S(1), . . . , π_S(|S|) is a listing of the elements of S in order from heaviest under D_S to lightest under D_S. Given j ∈ S, we define the S-rank of j, denoted rank_S(j), to be the value ∑_{i : D_S(π_S(i)) ≤ D_S(j)} D_S(π_S(i)), i.e. rank_S(j) is the sum of the weights (under D_S) of all the elements in S that are no heavier than j under D_S. Note that having i* ∉ L_{ǫ,D} implies that rank_{[N]}(i*) ≥ ǫ.

We first sketch the argument for correctness. (It is easy to show that the algorithm only outputs FAIL with very small probability, so we ignore this possibility below.) Suppose first that i* ∉ L_{ǫ,D}. A key lemma shows that if i* ∉ L_{ǫ,D} (and hence rank_{[N]}(i*) ≥ ǫ), then with high probability every set S_{j−1} constructed by the algorithm is such that rank_{S_{j−1}}(i*) ≥ ǫ/2. (In other words, if i* is not initially among the ǫ-fraction (under D) of lightest elements, then it never "falls too far" so as to become part of the ǫ/2-fraction (under D_{S_{j−1}}) of lightest elements of S_{j−1}, for any j.) Given that (with high probability) i* always has rank_{S_{j−1}}(i*) ≥ ǫ/2, though, it must be the case that (with high probability) the procedure does not output UNKNOWN (and hence it must with high probability output a numerical value).
This is because there are only two places where the procedure can output UNKNOWN, in Lines 14 and 19; we consider both cases below.

1. In order for the procedure to output UNKNOWN in Line 14, it must be the case that the elements of H_j, each of which individually has weight at least κ/2 under D_{S_{j−1}}, collectively have weight at least 1 − 3ǫ/20 under D_{S_{j−1}}, by Line 13. But i* has weight at most 3κ/40 under D_{S_{j−1}} (because the procedure did not go to Line 2 in Line 10), and thus i* would need to be among the bottom 3ǫ/20 of the lightest elements, i.e. it would need to have rank_{S_{j−1}}(i*) ≤ 3ǫ/20; but this contradicts rank_{S_{j−1}}(i*) ≥ ǫ/2.

2. In order for the procedure to output UNKNOWN in Line 19, it must be the case that none of the elements i_1, . . . , i_m drawn in Line 6 is chosen for inclusion in S_j. In order for the algorithm to reach Line 19, though, it must be the case that at least (ǫ/10 − κ/20) · m of these draws do not belong to H_j ∪ {i*}; since these draws do not belong to H_j, each one occurs only a small number of times among the m draws, so there must be many distinct values, and hence the probability that none of these distinct values is chosen for inclusion in S′_j is very low.

Thus we have seen that if i* ∉ L_{ǫ,D}, then with high probability the procedure outputs a numerical value; it remains to show that with high probability this value is a high-accuracy estimate of D(i*). However, this follows easily from the fact that we inductively maintain a high-quality estimate of D(S_{j−1}), and the fact that the algorithm ultimately constructs its final estimate D̂(i*) only when it additionally has a high-quality estimate of D(i*)/D(S_{j−1}).
This fact also handles the case in which $i^* \in L_{\epsilon,D}$: in such a case it is allowable for the algorithm to output UNKNOWN, so since the algorithm with high probability outputs a high-accuracy estimate whenever it outputs a numerical value, the algorithm performs as required in Case (ii) of Definition 3.

We now sketch the argument for query complexity. We will show that the heavy elements can be identified in each stage using $\mathrm{poly}(\log N, 1/\epsilon)$ queries. Since the algorithm constructs $S_j$ by taking a random subset of $S_{j-1}$ (together with $i^*$) at each stage, the number of stages is easily bounded by (essentially) $O(\log N)$. Since the final probability estimate for $D(i^*)$ is a product of $O(\log N)$ conditional probabilities, it suffices to estimate each of these conditional probabilities to within a multiplicative factor of $(1 \pm O(\frac{\epsilon}{\log N}))$. We show that each conditional probability can be estimated to this required precision using only $\mathrm{poly}(\log N, 1/\epsilon)$ calls to $\mathrm{COND}_D$; given this, the overall $\mathrm{poly}(\log N, 1/\epsilon)$ query bound follows straightforwardly.

Now we turn to the actual proof. We begin our analysis with a simple but useful lemma about the "heavy" elements identified in Line 7.

Lemma 4. With probability at least $1 - \delta/9$, every set $H_j$ that is ever constructed in Line 7 satisfies the following for all $\ell \in S_{j-1}$: (i) if $D(\ell)/D(S_{j-1}) > \kappa$, then $\ell \in H_j$; (ii) if $D(\ell)/D(S_{j-1}) < \kappa/2$, then $\ell \notin H_j$.

Proof: Fix an iteration $j$. By Line 7 of the algorithm, a point $\ell$ is included in $H_j$ if it appears at least $\frac{3}{4}\kappa m$ times among $i_1, \dots, i_m$ (which are the outputs of $\mathrm{COND}_D$ queries on $S_{j-1}$). For the first item, fix an element $\ell$ such that $D(\ell)/D(S_{j-1}) > \kappa$. Recall that $m = \Omega(M^2 \log(M/\delta)/(\epsilon^2 \kappa)) = \Omega(\log(MN/\delta)/\kappa)$ (since $M = \Omega(\log N)$).
By a multiplicative Chernoff bound, the probability (over the choice of $i_1, \dots, i_m$ in $S_{j-1}$) that $\ell$ appears fewer than $\frac{3}{4}\kappa m$ times among $i_1, \dots, i_m$ (that is, fewer than $3/4$ times the lower bound on the expected value) is at most $\delta/(9MN)$ (for an appropriate constant in the setting of $m$). On the other hand, for each fixed $\ell$ such that $D(\ell)/D(S_{j-1}) < \kappa/2$, the probability that $\ell$ appears at least $\frac{3}{4}\kappa m$ times (that is, at least $3/2$ times the upper bound on the expected value) is at most $\delta/(9MN)$ as well.

Algorithm 3: Approx-Eval-Simulator
Input: access to $\mathrm{COND}_D$; parameters $0 < \epsilon, \delta < 1$; input element $i^* \in [N]$
1: Set $S_0 = [N]$ and $\hat{D}(S_0) = 1$. Set $M = \log N + \log(9/\delta) + 1$. Set $\kappa = \Theta(\epsilon/(M^2 \log(M/\delta)))$.
2: for $j = 1$ to $M$ do
3: if $|S_{j-1}| = 1$ then
4: return $\hat{D}(S_{j-1})$ (and exit)
5: end if
6: Perform $m = \Theta(\max\{M^2 \log(M/\delta)/(\epsilon^2 \kappa),\ \log(M/(\delta\kappa))/\kappa^2\})$ $\mathrm{COND}_D$ queries on $S_{j-1}$ to obtain points $i_1, \dots, i_m \in S_{j-1}$.
7: Let $H_j = \{k \in [N] : k$ appears at least $\frac{3}{4}\kappa m$ times in the list $i_1, \dots, i_m\}$.
8: Let $\hat{D}_{S_{j-1}}(i^*)$ denote the fraction of times that $i^*$ appears in $i_1, \dots, i_m$.
9: if $\hat{D}_{S_{j-1}}(i^*) \ge \frac{\kappa}{20}$ then
10: Set $S_j = \{i^*\}$, set $\hat{D}(S_j) = \hat{D}_{S_{j-1}}(i^*) \cdot \hat{D}(S_{j-1})$, increment $j$, and go to Line 2.
11: end if
12: Let $\hat{D}_{S_{j-1}}(H_j)$ denote the fraction of elements among $i_1, \dots, i_m$ that belong to $H_j$.
13: if $\hat{D}_{S_{j-1}}(H_j) > 1 - \epsilon/10$ then
14: return UNKNOWN (and exit)
15: end if
16: Set $S'_j$ to be a uniform random subset of $S_{j-1} \setminus (H_j \cup \{i^*\})$ and set $S_j$ to be $S'_j \cup \{i^*\}$.
17: Let $\hat{D}_{S_{j-1}}(S_j)$ denote the fraction of elements among $i_1, \dots, i_m$ that belong to $S_j$.
18: if $\hat{D}_{S_{j-1}}(S_j) = 0$ then
19: return UNKNOWN (and exit)
20: end if
21: Set $\hat{D}(S_j) = \hat{D}_{S_{j-1}}(S_j) \cdot \hat{D}(S_{j-1})$
22: end for
23: Output FAIL.
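To make the control flow of Algorithm 3 concrete, here is a short Python sketch of the procedure together with a toy $\mathrm{COND}_D$ oracle. It is purely illustrative: the constants hidden in the $\Theta(\cdot)$'s are arbitrary, the parameters $m$ and $\kappa$ can be overridden for small experiments, and no attempt is made to match the failure probabilities of the analysis.

```python
import math
import random

def make_cond_oracle(D):
    """Return a COND_D oracle for a distribution D given as a dict {point: mass}.
    cond(S) draws from D restricted to S; the model requires D(S) > 0."""
    def cond(S):
        pts = sorted(S)
        weights = [D.get(i, 0.0) for i in pts]
        assert sum(weights) > 0, "COND_D(S) requires D(S) > 0"
        return random.choices(pts, weights)[0]
    return cond

def approx_eval_simulator(cond, N, i_star, eps, delta, m=None, kappa=None):
    """Illustrative sketch of Approx-Eval-Simulator (Algorithm 3).
    Returns an estimate of D(i_star), or None for UNKNOWN/FAIL.
    Defaults for m and kappa follow the Theta(.)'s of the pseudocode with
    arbitrary constants; both can be overridden for small toy runs."""
    M = int(math.log2(N) + math.log2(9 / delta)) + 1
    if kappa is None:
        kappa = eps / (M ** 2 * math.log(M / delta))
    if m is None:
        m = int(max(M ** 2 * math.log(M / delta) / (eps ** 2 * kappa),
                    math.log(M / (delta * kappa)) / kappa ** 2))
    S, D_hat = set(range(N)), 1.0           # S_0 = [N], D_hat(S_0) = 1
    for _ in range(M):
        if len(S) == 1:
            return D_hat
        counts = {}
        for _ in range(m):                  # m COND_D queries on S_{j-1}
            x = cond(S)
            counts[x] = counts.get(x, 0) + 1
        H = {k for k, c in counts.items() if c >= 0.75 * kappa * m}
        frac_star = counts.get(i_star, 0) / m
        if frac_star >= kappa / 20:         # i* itself looks heavy: isolate it
            S, D_hat = {i_star}, frac_star * D_hat
            continue
        if sum(counts[k] for k in H) / m > 1 - eps / 10:
            return None                     # UNKNOWN: heavy elements dominate
        S_new = {i for i in S - H - {i_star} if random.random() < 0.5} | {i_star}
        frac_new = sum(c for k, c in counts.items() if k in S_new) / m
        if frac_new == 0:
            return None                     # UNKNOWN
        S, D_hat = S_new, frac_new * D_hat  # multiply in the conditional estimate
    return None                             # FAIL
```

On a small uniform distribution the sketch typically isolates $i^*$ in the first iteration via the branch of Line 10 and returns an estimate close to $1/N$.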
The lemma follows by taking a union bound over all (at most $N$) points considered above and over all $M$ settings of $j$.

Next we show that with high probability Algorithm Approx-Eval-Simulator returns either UNKNOWN or a numerical value (as opposed to outputting FAIL in Line 23):

Lemma 5. For any $D$, $\epsilon$, $\delta$ and $i^*$, Algorithm Approx-Eval-Simulator outputs FAIL with probability at most $\delta/9$.

Proof: Fix any element $i \ne i^*$. The probability (taken only over the choice of the random subset in each execution of Line 16) that $i$ is placed in $S'_j$ in each of the first $\log N + \log(9/\delta)$ executions of Line 16 is at most $\frac{\delta}{9N}$. Taking a union bound over all $N-1$ points $i \ne i^*$, the probability that any point other than $i^*$ remains in $S_{j-1}$ through all of the first $\log N + \log(9/\delta)$ executions of the outer "for" loop is at most $\frac{\delta}{9}$. Assuming this holds, in the execution of the outer "for" loop with $j = \log N + \log(9/\delta) + 1$ the algorithm returns $\hat{D}(S_{j-1}) = \hat{D}(i^*)$ in Line 4.

For the rest of the analysis it will be helpful to define several "desirable" events and show that they all hold with high probability:

1. Let $E_1$ denote the event that every set $H_j$ that is ever constructed in Line 7 satisfies both properties (i) and (ii) stated in Lemma 4. By Lemma 4 the event $E_1$ holds with probability at least $1 - \delta/9$.

2. Let $E_2$ denote the event that in every execution of Line 9, the estimate $\hat{D}_{S_{j-1}}(i^*)$ is within an additive $\pm\frac{\kappa}{40}$ of the true value of $D(i^*)/D(S_{j-1})$. By the choice of $m$ in Line 6 (i.e., using $m = \Omega(\log(M/\delta)/\kappa^2)$), an additive Chernoff bound, and a union bound over all iterations, the event $E_2$ holds with probability at least $1 - \delta/9$.

3.
Let $E_3$ denote the event that if Line 10 is executed, the resulting value $\hat{D}_{S_{j-1}}(i^*)$ lies in $[1 - \frac{\epsilon}{2M}, 1 + \frac{\epsilon}{2M}] \cdot D(i^*)/D(S_{j-1})$. Assuming that event $E_2$ holds, if Line 10 is reached then the true value of $D(i^*)/D(S_{j-1})$ must be at least $\kappa/40$, and consequently a multiplicative Chernoff bound and the choice of $m$ (i.e., using $m = \Omega(M^2 \log(M/\delta)/(\epsilon^2 \kappa))$) together imply that $\hat{D}_{S_{j-1}}(i^*)$ lies in $[1 - \frac{\epsilon}{2M}, 1 + \frac{\epsilon}{2M}] \cdot D(i^*)/D(S_{j-1})$ except with failure probability at most $\delta/9$.

4. Let $E_4$ denote the event that in every execution of Line 12, the estimate $\hat{D}_{S_{j-1}}(H_j)$ is within an additive error of $\pm\frac{\epsilon}{20}$ of the true value of $D(H_j)/D(S_{j-1})$. By the choice of $m$ in Line 6 (i.e., using $m = \Omega(\log(M/\delta)/\epsilon^2)$) and an additive Chernoff bound, the event $E_4$ holds with probability at least $1 - \delta/9$.

The above arguments show that $E_1$, $E_2$, $E_3$ and $E_4$ all hold with probability at least $1 - 4\delta/9$. Let $E_5$ denote the event that in every execution of Line 16, the set $S'_j$ which is drawn satisfies $D(S'_j)/D(S_{j-1} \setminus (H_j \cup \{i^*\})) \ge 1/3$. The following lemma says that conditioned on $E_1$ through $E_4$ all holding, event $E_5$ holds with high probability:

Lemma 6. Conditioned on $E_1$ through $E_4$, the probability that $E_5$ holds is at least $1 - \delta/9$.

Proof: Fix a value of $j$ and consider the $j$-th iteration of Line 16. Since events $E_2$ and $E_4$ hold, it must be the case that $D(S_{j-1} \setminus (H_j \cup \{i^*\}))/D(S_{j-1}) \ge \epsilon/40$. Since event $E_1$ holds, it must be the case that every $i \in S_{j-1} \setminus (H_j \cup \{i^*\})$ has $D(i)/D(S_{j-1}) \le \kappa$.
Now, since $S'_j$ is chosen by independently including each element of $S_{j-1} \setminus (H_j \cup \{i^*\})$ with probability $1/2$, we can apply the first part of Corollary 3 and get
$\Pr[D(S'_j) < \frac{1}{3} D(S_{j-1} \setminus (H_j \cup \{i^*\}))] \le \exp(-\epsilon/(40 \cdot 9 \cdot 4\kappa)) < \frac{\delta}{9M},$
where the last inequality follows from the setting of $\kappa = \Theta(\epsilon/(M^2 \log(M/\delta)))$. Thus we have established that $E_1$ through $E_5$ all hold with probability at least $1 - 5\delta/9$.

Next, let $E_6$ denote the event that the algorithm never returns UNKNOWN and exits in Line 19. Our next lemma shows that conditioned on events $E_1$ through $E_5$, the probability of $E_6$ is at least $1 - \delta/9$:

Lemma 7. Conditioned on $E_1$ through $E_5$, the probability that $E_6$ holds is at least $1 - \delta/9$.

Proof: Fix any iteration $j$ of the outer "for" loop. In order for the algorithm to reach Line 18 in this iteration, it must be the case (by Lines 9 and 13) that at least $(\epsilon/10 - \kappa/20)m > (\epsilon/20)m$ points in $i_1, \dots, i_m$ do not belong to $H_j \cup \{i^*\}$. Since each point not in $H_j$ appears at most $\frac{3}{4}\kappa m$ times in the list $i_1, \dots, i_m$, there must be at least $\frac{\epsilon}{15\kappa}$ distinct such values. Hence the probability that none of these values is selected to belong to $S'_j$ is at most $1/2^{\epsilon/(15\kappa)} < \delta/(9M)$. A union bound over all (at most $M$) values of $j$ gives that the probability that the algorithm ever returns UNKNOWN and exits in Line 19 is at most $\delta/9$, so the lemma is proved.

Now let $E_7$ denote the event that in every execution of Line 17, the estimate $\hat{D}_{S_{j-1}}(S_j)$ lies in $[1 - \frac{\epsilon}{2M}, 1 + \frac{\epsilon}{2M}] \cdot D(S_j)/D(S_{j-1})$. The following lemma says that conditioned on $E_1$ through $E_5$, event $E_7$ holds with probability at least $1 - \delta/9$:

Lemma 8. Conditioned on $E_1$ through $E_5$, the probability that $E_7$ holds is at least $1 - \delta/9$.
Proof: Fix a value of $j$ and consider the $j$-th iteration of Line 17. The expected value of $\hat{D}_{S_{j-1}}(S_j)$ is precisely
$\frac{D(S_j)}{D(S_{j-1})} = \frac{D(S_j)}{D(S_{j-1} \setminus (H_j \cup \{i^*\}))} \cdot \frac{D(S_{j-1} \setminus (H_j \cup \{i^*\}))}{D(S_{j-1})}.$ (15)
Since events $E_2$ and $E_4$ hold we have that $\frac{D(S_{j-1} \setminus (H_j \cup \{i^*\}))}{D(S_{j-1})} \ge \epsilon/40$, and since event $E_5$ holds we have that $\frac{D(S_j)}{D(S_{j-1} \setminus (H_j \cup \{i^*\}))} \ge 1/3$ (note that $D(S_j) \ge D(S'_j)$). Thus we have that (15) is at least $\epsilon/120$. Recalling the value of $m$ (i.e., using $m = \Omega(M^2 \log(M/\delta)/(\epsilon^2 \kappa)) = \Omega(M^2 \log(M/\delta)/\epsilon^3)$), a multiplicative Chernoff bound gives that indeed $\hat{D}_{S_{j-1}}(S_j) \in [1 - \frac{\epsilon}{2M}, 1 + \frac{\epsilon}{2M}] \cdot D(S_j)/D(S_{j-1})$ with failure probability at most $\delta/(9M)$. A union bound over all $M$ possible values of $j$ finishes the proof.

At this point we have established that events $E_1$ through $E_7$ all hold with probability at least $1 - 7\delta/9$. We can now argue that each estimate $\hat{D}(S_j)$ is indeed a high-accuracy estimate of the true value $D(S_j)$:

Lemma 9. With probability at least $1 - 7\delta/9$, each estimate $\hat{D}(S_j)$ constructed by Approx-Eval-Simulator lies in $[(1 - \frac{\epsilon}{2M})^j, (1 + \frac{\epsilon}{2M})^j] \cdot D(S_j)$.

Proof: We prove the lemma by showing that if all events $E_1$ through $E_7$ hold, then the following claim (denoted (*)) holds: each estimate $\hat{D}(S_j)$ constructed by Approx-Eval-Simulator lies in $[(1 - \frac{\epsilon}{2M})^j, (1 + \frac{\epsilon}{2M})^j] \cdot D(S_j)$. Thus for the rest of the proof we assume that indeed all events $E_1$ through $E_7$ hold.

The claim (*) is clearly true for $j = 0$. We prove (*) by induction on $j$, assuming it holds for $j-1$. The only places in the algorithm where $\hat{D}(S_j)$ may be set are Lines 10 and 21. If $\hat{D}(S_j)$ is set in Line 21, then (*) follows from the inductive claim for $j-1$ and Lemma 8.
If $\hat{D}(S_j)$ is set in Line 10, then (*) follows from the inductive claim for $j-1$ and the fact that event $E_3$ holds. This concludes the proof of the lemma.

Finally, we require the following crucial lemma, which establishes that if $i^* \notin L_{\epsilon,N}$ (and hence the initial rank $\mathrm{rank}_{[N]}(i^*)$ is at least $\epsilon$), then with very high probability the rank of $i^*$ never becomes too low during the execution of the algorithm:

Lemma 10. Suppose $i^* \notin L_{\epsilon,N}$. Then with probability at least $1 - \delta/9$, every set $S_{j-1}$ constructed by the algorithm has $\mathrm{rank}_{S_{j-1}}(i^*) \ge \epsilon/2$.

We prove Lemma 10 in Section 3.3.3 below. With these pieces in place we are ready to prove Theorem 4.

Proof of Theorem 4: It is straightforward to verify that Algorithm Approx-Eval-Simulator has the claimed query complexity. We now argue that Approx-Eval-Simulator meets the two requirements (i) and (ii) of Definition 3. Throughout the discussion below we assume that all the "favorable events" in the above analysis (i.e. events $E_1$ through $E_7$, Lemma 5, and Lemma 10) indeed hold as desired (incurring an overall failure probability of at most $\delta$).

Suppose first that $i^* \notin L_{\epsilon,D}$. We claim that by Lemma 10 it must be the case that the algorithm does not return UNKNOWN in Line 14. To verify this, observe that in order to reach Line 14 it would need to be the case that $D(i^*)/D(S_{j-1}) \le 3\kappa/40$ (so that the algorithm does not instead continue to the next iteration via Line 10). Since by Lemma 4 every element $k \in H_j$ satisfies $D(k)/D(S_{j-1}) \ge \kappa/2$, this means that $i^*$ does not belong to $H_j$. In order to reach Line 14, by event $E_4$ we must have $D(H_j)/D(S_{j-1}) \ge 1 - 3\epsilon/20$. Since every element of $H_j$ has more mass under $D$ (at least $\kappa/2$) than $i^*$ (which has at most $3\kappa/40$), this would imply that $\mathrm{rank}_{S_{j-1}}(i^*) \le 3\epsilon/20$, contradicting Lemma 10.
Furthermore, by Lemma 7 it must be the case that the algorithm does not return UNKNOWN in Line 19. Thus the algorithm terminates by returning an estimate $\hat{D}(S_j) = \hat{D}(i^*)$ which, by Lemma 9, lies in $[(1 - \frac{\epsilon}{2M})^j, (1 + \frac{\epsilon}{2M})^j] \cdot D(i^*)$. Since $j \le M$, this estimate lies in $[1 - \epsilon, 1 + \epsilon] \cdot D(i^*)$ as required.

Now suppose that $i^* \in L_{\epsilon,D}$. By Lemma 5 we may assume that the algorithm outputs either UNKNOWN or a numerical value. As above, Lemma 9 implies that if the algorithm outputs a numerical value then the value lies in $[1 - \epsilon, 1 + \epsilon] \cdot D(i^*)$ as desired. This concludes the proof of Theorem 4.

3.3.3 Proof of Lemma 10.

The key to proving Lemma 10 will be the next lemma. (In the following, for $S$ a set of real numbers we write $\mathrm{sum}(S)$ to denote $\sum_{\alpha \in S} \alpha$.)

Lemma 11. Fix $0 < \epsilon \le 1/40$. Set $\kappa = \Omega(\epsilon/(M^2 \log(1/\delta)))$. Let $T = \{\alpha_1, \dots, \alpha_n\}$ be a set of values $\alpha_1 < \cdots < \alpha_n$ such that $\mathrm{sum}(T) = 1$. Fix $\ell \in [n]$, let $T_L = \{\alpha_1, \dots, \alpha_\ell\}$ and let $T_R = \{\alpha_{\ell+1}, \dots, \alpha_n\}$, so $T_L \cup T_R = T$. Assume that $\mathrm{sum}(T_L) \ge \epsilon/2$ and that $\alpha_\ell \le \kappa/10$. Fix $H$ to be any subset of $T$ satisfying the following two properties: (i) $H$ includes every $\alpha_j$ such that $\alpha_j \ge \kappa$; and (ii) $H$ includes no $\alpha_j$ such that $\alpha_j < \kappa/2$. (Note that consequently $H$ does not intersect $T_L$.) Let $T'$ be a subset of $T \setminus (H \cup \{\alpha_\ell\})$ selected uniformly at random. Let $T'_L = T' \cap T_L$ and let $T'_R = T' \cap T_R$. Then we have the following:

1. If $\mathrm{sum}(T_L) > 20\epsilon$, then with probability at least $1 - \delta/M$ (over the random choice of $T'$) it holds that $\frac{\mathrm{sum}(T'_L \cup \{\alpha_\ell\})}{\mathrm{sum}(T' \cup \{\alpha_\ell\})} \ge 9\epsilon$;

2. If $\epsilon/2 \le \mathrm{sum}(T_L) < 20\epsilon$, then with probability at least $1 - \delta/M$ (over the random choice of $T'$) it holds that $\frac{\mathrm{sum}(T'_L \cup \{\alpha_\ell\})}{\mathrm{sum}(T' \cup \{\alpha_\ell\})} \ge \mathrm{sum}(T_L) \cdot (1 - \rho)$, where $\rho = \frac{\ln 2}{M}$.
Proof of Lemma 10 using Lemma 11: We apply Lemma 11 repeatedly at each iteration $j$ of the outer "for" loop. The set $H$ of Lemma 11 corresponds to the set $H_j$ of "heavy" elements that are removed at a given iteration, the set of values $T$ corresponds to the values $D(i)/D(S_{j-1})$ for $i \in S_{j-1}$, and the element $\alpha_\ell$ of Lemma 11 corresponds to $D(i^*)/D(S_{j-1})$. The value $\mathrm{sum}(T_L)$ corresponds to $\mathrm{rank}_{S_{j-1}}(i^*)$ and the value $\frac{\mathrm{sum}(T'_L \cup \{\alpha_\ell\})}{\mathrm{sum}(T' \cup \{\alpha_\ell\})}$ corresponds to $\mathrm{rank}_{S_j}(i^*)$. Observe that since $i^* \notin L_{\epsilon,N}$ we know that initially $\mathrm{rank}_{[N]}(i^*) \ge \epsilon$, which means that the first time we apply Lemma 11 (with $T = \{D(i) : i \in [N]\}$) we have $\mathrm{sum}(T_L) \ge \epsilon$.

By Lemma 11 the probability of failure in any of the (at most $M$) iterations is at most $\delta/9$, so we assume that there is never a failure. Consequently for all $j$ we have that if $\mathrm{rank}_{S_{j-1}}(i^*) \ge 20\epsilon$ then $\mathrm{rank}_{S_j}(i^*) \ge 9\epsilon$, and if $\epsilon/2 \le \mathrm{rank}_{S_{j-1}}(i^*) < 20\epsilon$ then $\mathrm{rank}_{S_j}(i^*) \ge \mathrm{rank}_{S_{j-1}}(i^*) \cdot (1 - \rho)$. Since $\mathrm{rank}_{S_0}(i^*) \ge \epsilon$, it follows that for all $j \le M$ we have $\mathrm{rank}_{S_j}(i^*) \ge \epsilon \cdot (1 - \rho)^M > \epsilon/2$.

Proof of Lemma 11: We begin with the following claim:

Claim 12. With probability at least $1 - \delta/(2M)$ (over the random choice of $T'$) it holds that $\mathrm{sum}(T'_L) \ge \frac{1}{2} \cdot \mathrm{sum}(T_L) \cdot (1 - \rho/2)$.

Proof: Recall from the setup that every element $\alpha_i \in T_L$ satisfies $\alpha_i \le \kappa/10$, and $\mathrm{sum}(T_L) \ge \epsilon/2$. Also recall that $\kappa = \Omega(\epsilon/(M^2 \log(1/\delta)))$ and that $\rho = \frac{\ln 2}{M}$, so that $\rho^2 \epsilon/(6\kappa) \ge \ln(2M/\delta)$. The claim follows by applying the first part of Corollary 3 (with $\gamma = \rho/2$).

Part (1) of Lemma 11 is an immediate consequence of Claim 12, since in part (1) we have
$\frac{\mathrm{sum}(T'_L \cup \{\alpha_\ell\})}{\mathrm{sum}(T' \cup \{\alpha_\ell\})} \ge \mathrm{sum}(T'_L) \ge \frac{1}{2} \cdot \mathrm{sum}(T_L) \cdot (1 - \frac{\rho}{2}) \ge \frac{1}{2} \cdot 20\epsilon \cdot (1 - \frac{\rho}{2}) \ge 9\epsilon.$
It remains to prove Part (2) of the lemma.
We will do this using the following claim:

Claim 13. Suppose $\epsilon/2 \le \mathrm{sum}(T_L) \le 20\epsilon$. Then with probability at least $1 - \delta/(2M)$ (over the random choice of $T'$) it holds that $\mathrm{sum}(T'_R) \le \frac{1}{2} \mathrm{sum}(T_R) \cdot (1 + \rho/2)$.

Proof: Observe first that $\alpha_i < \kappa$ for each $\alpha_i \in T_R \setminus H$. We consider two cases. If $\mathrm{sum}(T_R \setminus H) \ge 4\epsilon$, then we apply the first part of Corollary 3 to the $\alpha_i$'s in $T_R \setminus H$ and get
$\Pr[\mathrm{sum}(T'_R) > \tfrac{1}{2} \mathrm{sum}(T_R) \cdot (1 + \rho/2)] \le \Pr[\mathrm{sum}(T'_R) > \tfrac{1}{2} \mathrm{sum}(T_R \setminus H) \cdot (1 + \rho/2)] < \exp(-\rho^2 \cdot \mathrm{sum}(T_R \setminus H)/(24\kappa))$ (16)
$\le \exp(-\rho^2 \epsilon/(6\kappa)) \le \frac{\delta}{2M}$ (17)
(recall from the proof of Claim 12 that $\rho^2 \epsilon/(6\kappa) \ge \ln(2M/\delta)$). If $\mathrm{sum}(T_R \setminus H) < 4\epsilon$ (so that the expected value of $\mathrm{sum}(T'_R)$ is less than $2\epsilon$), then we can apply the second part of Corollary 3, as we explain next. Observe that by the premise of the lemma, $\mathrm{sum}(T_R) \ge 1 - 20\epsilon$, which is at least $1/2$ (recalling that $\epsilon$ is at most $1/40$). Consequently, the event "$\mathrm{sum}(T'_R) \ge \frac{1}{2} \cdot \mathrm{sum}(T_R) \cdot (1 + \rho/2)$" implies the event "$\mathrm{sum}(T'_R) \ge \frac{1}{4}$", and by applying the second part of Corollary 3 we get
$\Pr[\mathrm{sum}(T'_R) > \tfrac{1}{2} \mathrm{sum}(T_R) \cdot (1 + \rho/2)] \le \Pr[\mathrm{sum}(T'_R) > \tfrac{1}{4}] < 2^{-1/(4\kappa)} < \frac{\delta}{2M},$ (18)
as required.

Now we can prove Lemma 11. Using Claims 12 and 13, we have that with probability at least $1 - \delta/M$, $\mathrm{sum}(T'_L) \ge \frac{1}{2} \cdot \mathrm{sum}(T_L) \cdot (1 - \rho/2)$ and $\mathrm{sum}(T'_R) \le \frac{1}{2} \mathrm{sum}(T_R) \cdot (1 + \rho/2)$; we assume that both these inequalities hold going forth. Since
$\frac{\mathrm{sum}(T'_L \cup \{\alpha_\ell\})}{\mathrm{sum}(T' \cup \{\alpha_\ell\})} = \frac{\mathrm{sum}(T'_L) + \alpha_\ell}{\mathrm{sum}(T') + \alpha_\ell} > \frac{\mathrm{sum}(T'_L)}{\mathrm{sum}(T')},$
it is sufficient to show that $\frac{\mathrm{sum}(T'_L)}{\mathrm{sum}(T')} \ge \mathrm{sum}(T_L)(1 - \rho)$; we now show this.
As $\mathrm{sum}(T') = \mathrm{sum}(T'_L) + \mathrm{sum}(T'_R)$,
$\frac{\mathrm{sum}(T'_L)}{\mathrm{sum}(T')} = \frac{\mathrm{sum}(T'_L)}{\mathrm{sum}(T'_L) + \mathrm{sum}(T'_R)} = \frac{1}{1 + \frac{\mathrm{sum}(T'_R)}{\mathrm{sum}(T'_L)}} \ge \frac{1}{1 + \frac{(1/2) \cdot \mathrm{sum}(T_R) \cdot (1 + \rho/2)}{(1/2) \cdot \mathrm{sum}(T_L) \cdot (1 - \rho/2)}} = \frac{\mathrm{sum}(T_L) \cdot (1 - \rho/2)}{\mathrm{sum}(T_L) \cdot (1 - \rho/2) + \mathrm{sum}(T_R) \cdot (1 + \rho/2)}$
$\ge \frac{\mathrm{sum}(T_L) \cdot (1 - \rho/2)}{\mathrm{sum}(T_L) \cdot (1 + \rho/2) + \mathrm{sum}(T_R) \cdot (1 + \rho/2)} = \mathrm{sum}(T_L) \cdot \frac{1 - \rho/2}{1 + \rho/2} > \mathrm{sum}(T_L) \cdot (1 - \rho).$
This concludes the proof of Lemma 11.

4 Algorithms and lower bounds for testing uniformity

4.1 A $\tilde{O}(1/\epsilon^2)$-query PCOND algorithm for testing uniformity

In this subsection we present an algorithm $\mathrm{PCOND}_D$-Test-Uniform and prove the following theorem:

Theorem 5. $\mathrm{PCOND}_D$-Test-Uniform is a $\tilde{O}(1/\epsilon^2)$-query $\mathrm{PCOND}_D$ testing algorithm for uniformity, i.e. it outputs ACCEPT with probability at least $2/3$ if $D = U$ and outputs REJECT with probability at least $2/3$ if $d_{TV}(D, U) \ge \epsilon$.

Intuition. For the sake of intuition we first describe a simpler approach that yields a $\tilde{O}(1/\epsilon^4)$-query algorithm, and then build on those ideas to obtain our real algorithm with its improved $\tilde{O}(1/\epsilon^2)$ bound. Fix $D$ to be a distribution over $[N]$ that is $\epsilon$-far from uniform. Let
$H = \{h \in [N] : D(h) \ge \tfrac{1}{N}\}$ and $L = \{\ell \in [N] : D(\ell) < \tfrac{1}{N}\}.$
It is easy to see that since $D$ is $\epsilon$-far from uniform, we have
$\sum_{h \in H} (D(h) - \tfrac{1}{N}) = \sum_{\ell \in L} (\tfrac{1}{N} - D(\ell)) \ge \tfrac{\epsilon}{2}.$ (19)
From this it is not hard to show that:

(i) Many elements of $[N]$ must be "significantly light" in the following sense: define $L' \subseteq L$ to be $L' = \{\ell \in [N] : D(\ell) < \frac{1}{N} - \frac{\epsilon}{4N}\}$. Then it must be the case that $|L'| \ge (\epsilon/4)N$.

(ii) $D$ places significant weight on elements that are "significantly heavy" in the following sense: define $H' \subseteq H$ to be $H' = \{h \in [N] : D(h) \ge \frac{1}{N} + \frac{\epsilon}{4N}\}$. Then it must be the case that $D(H') \ge \epsilon/4$.
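As a quick sanity check of (i) and (ii), the snippet below verifies both bounds on a small, hypothetical distribution that is $\epsilon$-far from uniform. This merely illustrates the claims on one example; the argument in the text shows they hold for every such $D$.

```python
N, eps = 8, 0.5
# Hypothetical toy distribution, eps-far from uniform:
# half the domain carries double-plus mass, the other half carries none.
D = [0.25, 0.25, 0.25, 0.25, 0.0, 0.0, 0.0, 0.0]

# d_TV(D, U) equals the sum over heavy elements of (D(h) - 1/N):
assert sum(p - 1 / N for p in D if p >= 1 / N) >= eps

# "Significantly light" and "significantly heavy" elements:
L_prime = [i for i in range(N) if D[i] < 1 / N - eps / (4 * N)]
H_prime = [i for i in range(N) if D[i] >= 1 / N + eps / (4 * N)]

assert len(L_prime) >= (eps / 4) * N           # claim (i)
assert sum(D[i] for i in H_prime) >= eps / 4   # claim (ii)
```

Here the four empty points form $L'$ (so $|L'| = 4 \ge (\epsilon/4)N = 1$) and the four heavy points form $H'$ with $D(H') = 1$.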
Using (i) and (ii) it is fairly straightforward to give an $O(1/\epsilon^4)$-query $\mathrm{PCOND}_D$ testing algorithm, as follows: we can get a point in $L'$ with high probability by sampling $O(1/\epsilon)$ points uniformly at random from $[N]$, and we can get a point in $H'$ with high probability by drawing $O(1/\epsilon)$ points from $\mathrm{SAMP}_D$. Then at least one of the $O(1/\epsilon^2)$ pairs that have one point from the first sample and one point from the second will have a multiplicative factor difference of $1 + \Omega(\epsilon)$ between the weights under $D$ of the two points, and this can be detected by calling the procedure Compare (see Subsection 3.1). Since there are $O(1/\epsilon^2)$ pairs and for each one the invocation of Compare uses $\tilde{O}(1/\epsilon^2)$ queries, the overall sample complexity of this simple approach is $\tilde{O}(1/\epsilon^4)$.

Our actual algorithm $\mathrm{PCOND}_D$-Test-Uniform for testing uniformity extends the above ideas to get a $\tilde{O}(1/\epsilon^2)$-query algorithm. More precisely, the algorithm works as follows: it first draws a "reference sample" of $O(1)$ points uniformly from $[N]$. Next, repeatedly for $O(\log\frac{1}{\epsilon})$ iterations, the algorithm draws two other samples, one uniformly from $[N]$ and the other from $\mathrm{SAMP}_D$. (These samples have different sizes at different iterations; intuitively, each iteration is meant to deal with a different "scale" of probability mass that points could have under $D$.) At each iteration it then uses Compare to do comparisons between pairs of elements, one from the reference sample and the other from one of the two other samples. If $D$ is $\epsilon$-far from uniform, then with high probability at some iteration the algorithm will either draw a point from $\mathrm{SAMP}_D$ that has "very big" mass under $D$, or draw a point from the uniform distribution over $[N]$ that has "very small" mass under $D$, and this will be detected by the comparisons to the reference points.
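The Compare procedure itself is defined in Subsection 3.1 and is not reproduced here; for singleton sets, one plausible implementation in its spirit is sketched below. The query budget, thresholds, and High/Low cutoffs are illustrative assumptions, not the paper's Lemma 2 guarantees.

```python
import math
import random

def compare_singletons(pcond, x, y, eta, K, delta):
    """Rough sketch of Compare on singleton sets {x}, {y}: estimate the ratio
    D(y)/D(x), or report High/Low if it appears to fall outside roughly [1/K, K].
    pcond(S) draws from D conditioned on S (here |S| = 2). Constants are
    illustrative, not those needed for the formal analysis."""
    m = int(math.ceil(K * math.log(1 / delta) / eta ** 2))
    hits_y = sum(1 for _ in range(m) if pcond({x, y}) == y)
    frac = hits_y / m                 # estimates D(y) / (D(x) + D(y))
    if frac >= K / (K + 1):
        return "High"                 # ratio looks larger than ~K
    if frac <= 1 / (K + 1):
        return "Low"                  # ratio looks smaller than ~1/K
    return frac / (1 - frac)          # point estimate of D(y)/D(x)

# Toy usage with a hypothetical three-point distribution:
random.seed(1)
D = {"x": 0.1, "y": 0.2, "z": 0.7}
pcond = lambda S: random.choices(sorted(S), [D[p] for p in sorted(S)])[0]
est = compare_singletons(pcond, "x", "y", eta=0.1, K=4, delta=0.1)
```

In the toy usage, $D(y)/D(x) = 2$, so the returned estimate concentrates around 2.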
Choosing the sample sizes and parameters for the Compare calls carefully at each iteration yields the improved query bound. Let $m_j$ denote the number of $\mathrm{PCOND}_D$ queries used to run $\mathrm{Compare}_D$ in a given execution of Line 7 during the $j$-th iteration of the outer loop. By the setting of the parameters in each such call and Lemma 2, $m_j = O(\frac{t}{\epsilon^2 2^{2j}})$. It is easy to see that the algorithm only performs $\mathrm{PCOND}_D$ queries, and that the total number of queries that the algorithm performs is
$O(\sum_{j=1}^t q \cdot s_j \cdot m_j) = O(\sum_{j=1}^t 2^j \log(\tfrac{1}{\epsilon}) \cdot \tfrac{\log(1/\epsilon)}{\epsilon^2 2^{2j}}) = O(\tfrac{(\log(1/\epsilon))^2}{\epsilon^2}).$

Algorithm 4: PCOND-Test-Uniform
Input: error parameter $\epsilon > 0$; query access to $\mathrm{PCOND}_D$ oracle
1: Set $t = \log(\frac{4}{\epsilon}) + 1$.
2: Select $q = \Theta(1)$ points $i_1, \dots, i_q$ independently and uniformly from $[N]$.
3: for $j = 1$ to $t$ do
4: Call the $\mathrm{SAMP}_D$ oracle $s_j = \Theta(2^j \cdot t)$ times to obtain points $h_1, \dots, h_{s_j}$ distributed according to $D$.
5: Select $s_j$ points $\ell_1, \dots, \ell_{s_j}$ independently and uniformly from $[N]$.
6: for all pairs $(x, y) = (i_r, h_{r'})$ and $(x, y) = (i_r, \ell_{r'})$ (where $1 \le r \le q$, $1 \le r' \le s_j$) do
7: Call $\mathrm{Compare}_D(\{x\}, \{y\}, \Theta(\epsilon 2^j), 2, \exp(-\Theta(t)))$.
8: if the Compare call does not return a value in $[1 - \frac{2^{j-5}\epsilon}{4}, 1 + \frac{2^{j-5}\epsilon}{4}]$ then
9: output REJECT (and exit).
10: end if
11: end for
12: end for
13: Output ACCEPT

We prove Theorem 5 by arguing completeness and soundness below.

Completeness: Suppose that $D$ is the uniform distribution. Then for any fixed pair of points $(x, y)$, Lemma 2 implies that the call to Compare on $\{x\}, \{y\}$ in Line 7 causes the algorithm to output REJECT in Line 9 with probability at most $e^{-\Theta(t)} = \mathrm{poly}(\epsilon)$. Taking a union bound over all $\mathrm{poly}(1/\epsilon)$ pairs of points considered by the algorithm, the algorithm will accept with probability at least $2/3$, as required.
Soundness: Now suppose that $D$ is $\epsilon$-far from uniform (we assume throughout the analysis that $\epsilon = 1/2^k$ for some integer $k$, which is clearly without loss of generality). We define $H$, $L$ as above and further partition $H$ and $L$ into "buckets" as follows: for $j = 1, \dots, t-1 = \log(\frac{4}{\epsilon})$, let
$H_j \stackrel{\mathrm{def}}{=} \{h : (1 + 2^{j-1} \cdot \tfrac{\epsilon}{4}) \cdot \tfrac{1}{N} \le D(h) < (1 + 2^j \cdot \tfrac{\epsilon}{4}) \cdot \tfrac{1}{N}\},$
and for $j = 1, \dots, t-2$ let
$L_j \stackrel{\mathrm{def}}{=} \{\ell : (1 - 2^j \cdot \tfrac{\epsilon}{4}) \cdot \tfrac{1}{N} < D(\ell) \le (1 - 2^{j-1} \cdot \tfrac{\epsilon}{4}) \cdot \tfrac{1}{N}\}.$
Also define
$H_0 \stackrel{\mathrm{def}}{=} \{h : \tfrac{1}{N} \le D(h) < (1 + \tfrac{\epsilon}{4}) \cdot \tfrac{1}{N}\}$, $L_0 \stackrel{\mathrm{def}}{=} \{\ell : (1 - \tfrac{\epsilon}{4}) \cdot \tfrac{1}{N} < D(\ell) < \tfrac{1}{N}\}$,
and
$H_t \stackrel{\mathrm{def}}{=} \{h : D(h) \ge \tfrac{2}{N}\}$, $L_{t-1} \stackrel{\mathrm{def}}{=} \{\ell : D(\ell) \le \tfrac{1}{2N}\}.$

First observe that by the definition of $H_0$ and $L_0$, we have
$\sum_{h \in H_0} (D(h) - \tfrac{1}{N}) \le \tfrac{\epsilon}{4}$ and $\sum_{\ell \in L_0} (\tfrac{1}{N} - D(\ell)) \le \tfrac{\epsilon}{4}.$
Therefore (by Equation (19)) we have
$\sum_{j=1}^t \sum_{h \in H_j} (D(h) - \tfrac{1}{N}) \ge \tfrac{\epsilon}{4}$ and $\sum_{j=1}^{t-1} \sum_{\ell \in L_j} (\tfrac{1}{N} - D(\ell)) \ge \tfrac{\epsilon}{4}.$
This implies that for some $1 \le j(H) \le t$ and some $1 \le j(L) \le t-1$, we have
$\sum_{h \in H_{j(H)}} (D(h) - \tfrac{1}{N}) \ge \tfrac{\epsilon}{4t}$ and $\sum_{\ell \in L_{j(L)}} (\tfrac{1}{N} - D(\ell)) \ge \tfrac{\epsilon}{4t}.$ (20)

The rest of the analysis is divided into two cases depending on whether $|L| \ge \frac{N}{2}$ or $|H| > \frac{N}{2}$.

Case 1: $|L| \ge \frac{N}{2}$. In this case, with probability at least $99/100$, in Line 2 the algorithm will select at least one point $i_r \in L$. We consider two subcases: $j(H) = t$, and $j(H) \le t-1$.

• $j(H) = t$: In this subcase, by Equation (20) we have that $\sum_{h \in H_{j(H)}} D(h) \ge \frac{\epsilon}{4t}$. This implies that when $j = j(H) = t = \log(\frac{4}{\epsilon}) + 1$, so that $s_j = s_t = \Theta(\frac{t}{\epsilon})$, with probability at least $99/100$ the algorithm selects a point $h_{r'} \in H_t$ in Line 4. Assume that indeed such a point $h_{r'}$ is selected.
Since $D(h_{r'}) \ge \frac{2}{N}$ while $D(i_r) < \frac{1}{N}$, Lemma 2 implies that with probability at least $1 - \mathrm{poly}(\epsilon)$ the Compare call in Line 7 outputs either High or a value that is at least $\frac{7}{12} = \frac{1}{2} + \frac{1}{12}$. Since $\frac{7}{12} > \frac{1}{2} + \frac{2^{j-5}\epsilon}{4}$ for $j = t$, the algorithm will output REJECT in Line 9.

• $j(H) < t$: By Equation (20) and the definition of the buckets, we have
$\sum_{h \in H_{j(H)}} ((1 + \tfrac{2^{j(H)}\epsilon}{4}) \cdot \tfrac{1}{N} - \tfrac{1}{N}) \ge \tfrac{\epsilon}{4t},$
implying that $|H_{j(H)}| \ge \frac{N}{2^{j(H)} t}$, so that $D(H_{j(H)}) \ge \frac{1}{2^{j(H)} t}$. Therefore, when $j = j(H)$, so that $s_j = \Theta(2^{j(H)} t)$, with probability at least $99/100$ the algorithm will get a point $h_{r'} \in H_{j(H)}$ in Line 4. Assume that indeed such a point $h_{r'}$ is selected. Since $D(h_{r'}) \ge (1 + \frac{2^{j(H)-1}\epsilon}{4}) \cdot \frac{1}{N}$ while $D(i_r) \le \frac{1}{N}$, for $\alpha_{j(H)} = \frac{2^{j(H)-1}\epsilon}{4}$ we have $\frac{D(h_{r'})}{D(i_r)} \ge 1 + \alpha_{j(H)}$. Since Compare is called in Line 7 on the pair $\{i_r\}, \{h_{r'}\}$ with the "$\delta$" parameter set to $\Theta(\epsilon 2^j)$, with probability $1 - \mathrm{poly}(\epsilon)$ the algorithm outputs REJECT as a result of this Compare call.

Case 2: $|H| > \frac{N}{2}$. This proceeds similarly to Case 1. In this case we have that with high constant probability the algorithm selects a point $i_r \in H$ in Line 2. Here we consider the subcases $j(L) = t-1$ and $j(L) \le t-2$. In the first subcase we have that $\sum_{\ell \in L_{t-1}} \frac{1}{N} \ge \frac{\epsilon}{4t}$, so that $|L_{t-1}| \ge \frac{\epsilon}{4t} \cdot N$, and in the second subcase we have that $\sum_{\ell \in L_{j(L)}} \frac{2^{j(L)}\epsilon}{4} \cdot \frac{1}{N} \ge \frac{\epsilon}{4t}$, so that $|L_{j(L)}| \ge \frac{N}{2^{j(L)} t}$. The analysis of each subcase is similar to Case 1. This concludes the proof of Theorem 5.
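A compact Python sketch of this tester follows, with the Compare call replaced by direct pairwise PCOND estimates. The constants are toy stand-ins far below what the proof's union bounds require, so only gross deviations from uniformity are detected reliably; the sketch illustrates the structure of Algorithm 4, not its guarantees.

```python
import math
import random

def pcond_test_uniform(pcond, samp, N, eps):
    """Illustrative sketch of PCOND-Test-Uniform (Algorithm 4). pcond(S) is a
    PCOND_D oracle on pairs and samp() draws a point from D. All constants
    are illustrative stand-ins for the Theta(.)'s in the pseudocode."""
    t = int(math.log2(4 / eps)) + 1
    refs = [random.randrange(N) for _ in range(5)]         # reference sample, q = Theta(1)
    for j in range(1, t + 1):
        s_j = 2 * 2 ** j * t                               # s_j = Theta(2^j * t)
        from_D = [samp() for _ in range(s_j)]              # SAMP_D draws
        from_U = [random.randrange(N) for _ in range(s_j)] # uniform draws
        m = int(40 * t / (eps ** 2 * 2 ** (2 * j)) + 100)  # per-pair query budget
        for x in refs:
            for y in from_D + from_U:
                if x == y:
                    continue
                hits = sum(1 for _ in range(m) if pcond({x, y}) == y)
                ratio = hits / max(m - hits, 1)            # crude estimate of D(y)/D(x)
                if abs(ratio - 1) > 2 ** (j - 5) * eps / 4:
                    return "REJECT"
    return "ACCEPT"
```

On a distribution that puts most of its mass on a single point, essentially every compared pair is detected as unbalanced, so the sketch rejects almost immediately.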
4.2 An $\Omega(1/\epsilon^2)$ lower bound for $\mathrm{COND}_D$ algorithms that test uniformity

In this subsection we give a lower bound showing that the query complexity of the $\mathrm{PCOND}_D$ algorithm of the previous subsection is essentially optimal, even for algorithms that may make general $\mathrm{COND}_D$ queries:

Theorem 6. Any $\mathrm{COND}_D$ algorithm for testing whether $D = U$ versus $d_{TV}(D, U) \ge \epsilon$ must make $\Omega(1/\epsilon^2)$ queries.

The high-level idea behind Theorem 6 is to reduce to the well-known fact that distinguishing a fair coin from a $(\frac{1}{2} + 4\epsilon)$-biased coin requires $\Omega(\frac{1}{\epsilon^2})$ coin tosses. We show that any $q$-query $\mathrm{COND}_D$ testing algorithm $A$ can be transformed into an algorithm $A'$ that successfully distinguishes $q$ tosses of a fair coin from $q$ tosses of a $(\frac{1}{2} + 4\epsilon)$-biased coin.

Proof of Theorem 6: First note that we may assume without loss of generality that $0 < \epsilon \le 1/8$. Let $A$ be any $q$-query algorithm that makes $\mathrm{COND}_D$ queries and tests whether $D = U$ versus $d_{TV}(D, U) \ge \epsilon$. We may assume without loss of generality that in every possible execution algorithm $A$ makes precisely $q$ queries (this will be convenient later). Let $D_{\mathrm{No}}$ be the distribution that has $D_{\mathrm{No}}(i) = \frac{1+2\epsilon}{N}$ for each $i \in [1, \frac{N}{2}]$ and has $D_{\mathrm{No}}(i) = \frac{1-2\epsilon}{N}$ for each $i \in [\frac{N}{2}+1, N]$. (This is the "no"-distribution for our lower bound; it is $\epsilon$-far in variation distance from the uniform distribution $U$.) By Definition 2, it must be the case that
$Z := |\Pr[A^{\mathrm{COND}_{D_{\mathrm{No}}}} \text{ outputs ACCEPT}] - \Pr[A^{\mathrm{COND}_U} \text{ outputs ACCEPT}]| \ge 1/3.$

The proof works by showing that, given $A$ as described above, there must exist an algorithm $A'$ with the following properties: $A'$ is given as input a $q$-bit string $(b_1, \dots, b_q) \in \{0,1\}^q$.
Let $D_0$ denote the uniform distribution over $\{0,1\}^q$ and let $D_{4\epsilon}$ denote the distribution over $\{0,1\}^q$ in which each coordinate is independently set to 1 with probability $\frac{1}{2} + 4\epsilon$. Then algorithm $A'$ has
$|\Pr_{b \sim D_0}[A'(b) \text{ outputs ACCEPT}] - \Pr_{b \sim D_{4\epsilon}}[A'(b) \text{ outputs ACCEPT}]| = Z.$ (21)
Given (21), by the data processing inequality for total variation distance (Lemma 1) we have that $Z \le d_{TV}(D_0, D_{4\epsilon})$. It is easy to see that $d_{TV}(D_0, D_{4\epsilon})$ is precisely equal to the variation distance $d_{TV}(\mathrm{Bin}(q, 1/2), \mathrm{Bin}(q, 1/2 + 4\epsilon))$. However, in order for the variation distance between these two binomial distributions to be as large as $1/3$, it must be the case that $q \ge \Omega(1/\epsilon^2)$:

Fact 14 (Distinguishing a Fair Coin from a Biased Coin). Suppose $m \le \frac{c}{\epsilon^2}$, with $c$ a sufficiently small constant and $\epsilon \le 1/8$. Then $d_{TV}(\mathrm{Bin}(m, \frac{1}{2}), \mathrm{Bin}(m, \frac{1}{2} + 4\epsilon)) \le \frac{1}{3}$.

(Fact 14 is well known; it follows, for example, as an immediate consequence of Equations (2.15) and (2.16) of [AJ06].) Thus to prove Theorem 6 it remains only to describe algorithm $A'$ and prove Equation (21).

As suggested above, algorithm $A'$ uses algorithm $A$; in order to do this, it must perfectly simulate the $\mathrm{COND}_D$ oracle that $A$ requires, both in the case when $D = U$ and in the case when $D = D_{\mathrm{No}}$. We show below that when its input $b$ is drawn from $D_0$ then $A'$ can perfectly simulate the execution of $A$ when it is run on the $\mathrm{COND}_U$ oracle, and when $b$ is drawn from $D_{4\epsilon}$ then $A'$ can perfectly simulate the execution of $A$ when it is run on the $\mathrm{COND}_{D_{\mathrm{No}}}$ oracle.

Fix any step $1 \le t \le q$. We now describe how $A'$ perfectly simulates the $t$-th step of the execution of $A$ (i.e. the $t$-th call to $\mathrm{COND}_D$ that $A$ makes, and the response of $\mathrm{COND}_D$). We may inductively assume that $A'$ has perfectly simulated the first $t-1$ steps of the execution of $A$.
For each possible prefix of $t-1$ query-response pairs to COND$_D$,
$$\mathrm{PREFIX} = ((S_1, s_1), \ldots, (S_{t-1}, s_{t-1}))$$
(where each $S_i \subseteq [N]$ and each $s_i \in S_i$), there is some distribution $P_{A,\mathrm{PREFIX}}$ over possible $t$-th query sets $S_t$ that $A$ would make given that its first $t-1$ query-response pairs were PREFIX. So for a set $S_t \subseteq [N]$ and a possible prefix PREFIX, the value $P_{A,\mathrm{PREFIX}}(S_t)$ is the probability that algorithm $A$, having had the transcript of its execution thus far be PREFIX, generates set $S_t$ as its $t$-th query set.

For any query set $S \subseteq [N]$, let us write $S$ as a disjoint union $S = S_0 \amalg S_1$, where $S_0 = S \cap [1, \frac{N}{2}]$ and $S_1 = S \cap [\frac{N}{2}+1, N]$. We may assume that every query $S$ ever used by $A$ has $|S_0|, |S_1| \ge 1$ (for otherwise $A$ could perfectly simulate the response of COND$_D(S)$ whether $D$ were $U$ or $D_{\mathrm{No}}$ by simply choosing a uniform point from $S$, so there would be no need to call COND$_D$ on such an $S$). Thus we may assume that $P_{A,\mathrm{PREFIX}}(S)$ is nonzero only for sets $S$ that have $|S_0|, |S_1| \ge 1$.

Consider the bit $b_t \in \{0,1\}$. As noted above, we inductively have that (whether $D$ is $U$ or $D_{\mathrm{No}}$) the algorithm $A'$ has perfectly simulated the execution of $A$ for its first $t-1$ query-response pairs; in this simulation some prefix $\mathrm{PREFIX} = ((S_1, s_1), \ldots, (S_{t-1}, s_{t-1}))$ of query-response pairs has been constructed. If $b = (b_1, \ldots, b_q)$ is distributed according to $D_0$ then PREFIX is distributed exactly according to the distribution of $A$'s prefixes of length $t-1$ when $A$ is run with COND$_U$, and if $b = (b_1, \ldots, b_q)$ is distributed according to $D_{4\epsilon}$ then the distribution of PREFIX is exactly the distribution of $A$'s prefixes of length $t-1$ when $A$ is run with COND$_{D_{\mathrm{No}}}$. Algorithm $A'$ simulates the $t$-th stage of the execution of $A$ as follows:

1.
Randomly choose a set $S \subseteq [N]$ according to the distribution $P_{A,\mathrm{PREFIX}}$; let $S = S_0 \amalg S_1$ be the set that is selected. Let us write $\alpha(S)$ to denote $|S_1|/|S_0|$ (so $\alpha(S) \in [2/N, N/2]$).

2. If $b_t = 1$ then set the bit $\sigma \in \{0,1\}$ to be 1 with probability $u_t$ and to be 0 with probability $1 - u_t$. If $b_t = 0$ then set $\sigma$ to be 1 with probability $v_t$ and to be 0 with probability $1 - v_t$. (We specify the exact values of $u_t, v_t$ below.)

3. Set $s$ to be a uniform random element of $S_\sigma$. Output the query-response pair $(S_t, s_t) = (S, s)$.

It is clear that Step 1 above perfectly simulates the $t$-th query that algorithm $A$ would make (no matter what the distribution $D$ is). To show that the $t$-th response is simulated perfectly, we must show that

(i) if $b_t$ is uniform random over $\{0,1\}$ then $s$ is distributed exactly as it would be distributed if $A$ were being run on COND$_U$ and had just proposed $S$ as a query to COND$_U$; i.e. we must show that $s$ is a uniform random element of $S_1$ with probability $p(\alpha) \stackrel{\mathrm{def}}{=} \frac{\alpha}{\alpha+1}$ and is a uniform random element of $S_0$ with probability $1 - p(\alpha)$.

(ii) if $b_t \in \{0,1\}$ has $\Pr[b_t = 1] = 1/2 + 4\epsilon$, then $s$ is distributed exactly as it would be distributed if $A$ were being run on COND$_{D_{\mathrm{No}}}$ and had just proposed $S$ as a query to COND$_{D_{\mathrm{No}}}$; i.e. we must show that $s$ is a uniform random element of $S_1$ with probability $q(\alpha) \stackrel{\mathrm{def}}{=} \frac{\alpha}{\alpha + (1+2\epsilon)/(1-2\epsilon)}$ and is a uniform random element of $S_0$ with probability $1 - q(\alpha)$.

By (i), we require that
$$\frac{u_t}{2} + \frac{v_t}{2} = p(\alpha) = \frac{\alpha}{\alpha+1}, \quad (22)$$
and by (ii) we require that
$$\left(\frac12 + 4\epsilon\right) u_t + \left(\frac12 - 4\epsilon\right) v_t = q(\alpha) = \frac{\alpha}{\alpha + \frac{1+2\epsilon}{1-2\epsilon}}. \quad (23)$$
It is straightforward to check that
$$u_t = \frac{\alpha}{\alpha+1}\left(1 - \frac{1}{2((1-2\epsilon)\alpha + 1 + 2\epsilon)}\right), \qquad v_t = \frac{\alpha}{\alpha+1}\left(1 + \frac{1}{2((1-2\epsilon)\alpha + 1 + 2\epsilon)}\right)$$
satisfy the above equations, and that for $0 < \alpha$ and $0 < \epsilon \le 1/8$ we have $0 \le u_t, v_t \le 1$.
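The claimed values of $u_t, v_t$ can be verified numerically; the sketch below checks Equations (22) and (23) and the bounds $0 \le u_t, v_t \le 1$ for a few illustrative values of $\alpha$ and $\epsilon$:

```python
from math import isclose

def uv(alpha, eps):
    """The u_t, v_t from the proof, for a query set with ratio alpha = |S_1|/|S_0|."""
    base = alpha / (alpha + 1)
    delta = 1 / (2 * ((1 - 2 * eps) * alpha + 1 + 2 * eps))
    return base * (1 - delta), base * (1 + delta)

for alpha in [2 / 100, 0.5, 1.0, 3.7, 50.0]:
    for eps in [0.01, 0.05, 0.125]:
        u, v = uv(alpha, eps)
        p = alpha / (alpha + 1)                                   # target in Eq. (22)
        q = alpha / (alpha + (1 + 2 * eps) / (1 - 2 * eps))       # target in Eq. (23)
        assert isclose(u / 2 + v / 2, p)                          # Eq. (22)
        assert isclose((0.5 + 4 * eps) * u + (0.5 - 4 * eps) * v, q)  # Eq. (23)
        assert 0 <= u <= 1 and 0 <= v <= 1
print("Equations (22) and (23) hold for all tested parameters")
```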
So indeed $A'$ perfectly simulates the execution of $A$ in all stages $t = 1, \ldots, q$. Finally, after simulating the $q$-th stage, algorithm $A'$ outputs whatever is output by its simulation of $A$, so Equation (21) indeed holds. This concludes the proof of Theorem 6.

5 Testing equivalence to a known distribution $D^*$

5.1 A poly$(\log N, 1/\epsilon)$-query PCOND$_D$ algorithm

In this subsection we present an algorithm PCOND-Test-Known and prove the following theorem:

Theorem 7  PCOND-Test-Known is an $\tilde{O}((\log N)^4/\epsilon^4)$-query PCOND$_D$ testing algorithm for testing equivalence to a known distribution $D^*$. That is, for every pair of distributions $D, D^*$ over $[N]$ (such that $D^*$ is fully specified and there is PCOND query access to $D$) the algorithm outputs ACCEPT with probability at least $2/3$ if $D = D^*$ and outputs REJECT with probability at least $2/3$ if $d_{\mathrm{TV}}(D, D^*) \ge \epsilon$.

Intuition. Let $D^*$ be a fully specified distribution, and let $D$ be a distribution that may be accessed via a PCOND$_D$ oracle. The high-level idea of the PCOND-Test-Known algorithm is the following: As in the case of testing uniformity, we shall try to "catch" a pair of points $x, y$ such that $\frac{D(x)}{D(y)}$ differs significantly from $\frac{D^*(x)}{D^*(y)}$ (so that calling Compare$_D$ on $\{x\}, \{y\}$ will reveal this difference). In the uniformity case, where $D^*(z) = 1/N$ for every $z$ (so that $\frac{D^*(x)}{D^*(x)+D^*(y)} = 1/2$), to get a poly$(1/\epsilon)$-query algorithm it was sufficient to show that sampling $\Theta(1/\epsilon)$ points uniformly (i.e., according to $D^*$) with high probability yields a point $x$ for which $D(x) < D^*(x) - \Omega(\epsilon/N)$, and that sampling $\Theta(1/\epsilon)$ points from SAMP$_D$ with high probability yields a point $y$ for which $D(y) > D^*(y) + \Omega(\epsilon/N)$. However, for general $D^*$ it is not sufficient to get such a pair, because it is possible that $D^*(y)$ could be much larger than $D^*(x)$.
If this were the case then it might happen that both $\frac{D^*(x)}{D^*(y)}$ and $\frac{D(x)}{D(y)}$ are very small, so calling Compare$_D$ on $\{x\}, \{y\}$ cannot efficiently demonstrate that $\frac{D^*(x)}{D^*(y)}$ differs from $\frac{D(x)}{D(y)}$. To address this issue we partition the points into $O(\log N/\epsilon)$ "buckets" so that within each bucket all points have similar probability according to $D^*$. We show that if $D$ is $\epsilon$-far from $D^*$, then either the probability weight of one of these buckets according to $D$ differs significantly from what it is according to $D^*$ (which can be observed by sampling from $D$), or we can get a pair $\{x, y\}$ that belong to the same bucket and for which $D(x)$ is sufficiently smaller than $D^*(x)$ and $D(y)$ is sufficiently larger than $D^*(y)$. For such a pair, Compare will efficiently give evidence that $D$ differs from $D^*$.

The algorithm and its analysis. We define some quantities that are used in the algorithm and its analysis. Let $\eta \stackrel{\mathrm{def}}{=} \epsilon/c$ for some sufficiently large constant $c$ that will be determined later. As described above, we partition the domain elements $[N]$ into "buckets" according to their probability weight in $D^*$. Specifically, for $j = 1, \ldots, \lceil \log(N/\eta) + 1 \rceil$, we let
$$B_j \stackrel{\mathrm{def}}{=} \{x \in [N] : 2^{j-1} \cdot \eta/N \le D^*(x) < 2^j \cdot \eta/N\} \quad (24)$$
and we let $B_0 \stackrel{\mathrm{def}}{=} \{x \in [N] : D^*(x) < \eta/N\}$. Let $b \stackrel{\mathrm{def}}{=} \lceil \log(N/\eta) + 1 \rceil + 1$ denote the number of buckets. We further define $J_h \stackrel{\mathrm{def}}{=} \{j : D^*(B_j) \ge \eta/b\}$ to denote the set of indices of "heavy" buckets, and let $J_\ell \stackrel{\mathrm{def}}{=} \{j : D^*(B_j) < \eta/b\}$ denote the set of indices of "light" buckets. Note that we have
$$\sum_{j \in J_\ell \cup \{0\}} D^*(B_j) < 2\eta.$$
(25)

The query complexity of the algorithm is dominated by the number of PCOND$_D$ queries performed in the executions of Compare, which by Lemma 2 is upper bounded by
$$O(s^2 \cdot b^2 \cdot (\log s)/\eta^2) = O\!\left(\frac{\left(\log\frac{N}{\epsilon}\right)^4 \cdot \log\!\left(\left(\log\frac{N}{\epsilon}\right)/\epsilon\right)}{\epsilon^4}\right).$$
We argue completeness and soundness below.

Completeness: Suppose that $D = D^*$. Since the expected value of $\hat{D}(B_j)$ (defined in Line 3) is precisely $D^*(B_j)$, for any fixed value of $j \in \{0, \ldots, \lceil \log(N/\eta) + 1 \rceil\}$ an additive Chernoff bound implies that $|D^*(B_j) - \hat{D}(B_j)| > \eta/b$ with probability at most $1/(10b)$. By a union bound over all $b$ values of $j$, the algorithm outputs REJECT in Line 5 with probability at most $1/10$.

Algorithm 5: PCOND$_D$-Test-Known
Input: error parameter $\epsilon > 0$; query access to PCOND$_D$ oracle; explicit description $(D^*(1), \ldots, D^*(N))$ of distribution $D^*$
1: Call the SAMP$_D$ oracle $m = \Theta(b^2 (\log b)/\eta^2)$ times to obtain points $h_1, \ldots, h_m$ distributed according to $D$.
2: for $j = 0$ to $b - 1$ do
3:   Let $\hat{D}(B_j)$ be the fraction of points $h_1, \ldots, h_m$ that lie in $B_j$ (where the buckets $B_j$ are as defined in Equation (24)).
4:   if some $j$ has $|D^*(B_j) - \hat{D}(B_j)| > \eta/b$ then
5:     output REJECT and exit
6:   end if
7: end for
8: Select $s = \Theta(b/\epsilon)$ points $x_1, \ldots, x_s$ independently from $D^*$.
9: Call the SAMP$_D$ oracle $s = \Theta(b/\epsilon)$ times to obtain points $y_1, \ldots, y_s$ distributed according to $D$.
10: for all pairs $(x_i, y_j)$ (where $1 \le i, j \le s$) such that $D^*(x_i)/D^*(y_j) \in [1/2, 2]$ do
11:   Call Compare$(\{x_i\}, \{y_j\}, \eta/(4b), 2, 1/(10s^2))$
12:   if Compare returns Low or a value smaller than $(1 - \eta/(2b)) \cdot D^*(x_i)/D^*(y_j)$ then
13:     output REJECT (and exit)
14:   end if
15: end for
16: output ACCEPT

Later in the algorithm, since $D = D^*$, no matter what points $x_i, y_j$ are sampled from $D^*$ and $D$ respectively, the following holds for each pair $(x_i, y_j)$ such that $D^*(x_i)/D^*(y_j) \in [1/2, 2]$. By Lemma 2 (and the setting of the parameters in the calls to Compare), the probability that Compare returns Low or a value smaller than $(1 - \eta/(2b)) \cdot (D^*(x_i)/D^*(y_j))$ is at most $1/(10s^2)$. A union bound over all (at most $s^2$) pairs $(x_i, y_j)$ for which $D^*(x_i)/D^*(y_j) \in [1/2, 2]$ gives that the probability of outputting REJECT in Line 13 is at most $1/10$. Thus with overall probability at least $8/10$ the algorithm outputs ACCEPT.

Soundness: Now suppose that $d_{\mathrm{TV}}(D, D^*) \ge \epsilon$; our goal is to show that the algorithm rejects with probability at least $2/3$. Since the algorithm rejects if any estimate $\hat{D}(B_j)$ obtained in Line 3 deviates from $D^*(B_j)$ by more than $\pm \eta/b$, we may assume that all these estimates are indeed $\pm \eta/b$-close to the values $D^*(B_j)$ as required. Moreover, by an additive Chernoff bound (as in the completeness analysis), we have that with overall failure probability at most $1/10$, each $j$ has $|\hat{D}(B_j) - D(B_j)| \le \eta/b$; we condition on this event going forth. Thus, for every $0 \le j \le b - 1$,
$$D^*(B_j) - 2\eta/b \le D(B_j) \le D^*(B_j) + 2\eta/b. \quad (26)$$
Recalling the definition of $J_\ell$ and Equation (25), we see that
$$\sum_{j \in J_\ell \cup \{0\}} D(B_j) < 4\eta. \quad (27)$$
Let
$$d_j \stackrel{\mathrm{def}}{=} \sum_{x \in B_j} |D^*(x) - D(x)|, \quad (28)$$
so that $\|D^* - D\|_1 = \sum_j d_j$.
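The bucket decomposition of Equation (24) and the per-bucket distances $d_j$ of Equation (28) can be sketched concretely as follows (the distributions and the value of $\eta$ are illustrative stand-ins):

```python
import math

def bucket_index(p, eta, N):
    """Index j such that a point x with D*(x) = p falls in B_j, per Equation (24)."""
    if p < eta / N:
        return 0
    # j is determined by 2^(j-1) * eta/N <= p < 2^j * eta/N
    return math.floor(math.log2(p * N / eta)) + 1

# Illustrative known distribution D* over [N], and a D that shifts some mass.
N, eta = 8, 0.1
d_star = [1.0 / N] * N                      # uniform, for concreteness
d = [0.075, 0.175] + [1.0 / N] * (N - 2)    # moves mass between two points

b = math.ceil(math.log2(N / eta) + 1) + 1   # number of buckets
buckets = {j: [] for j in range(b)}
for x in range(N):
    buckets[bucket_index(d_star[x], eta, N)].append(x)

# d_j = sum over B_j of |D*(x) - D(x)|; the d_j sum to the L1 distance.
d_j = {j: sum(abs(d_star[x] - d[x]) for x in xs) for j, xs in buckets.items()}
assert math.isclose(sum(d_j.values()),
                    sum(abs(a - c) for a, c in zip(d_star, d)))
print(d_j)
```

Here every domain point has $D^*(x) = 1/8$ and so lands in the same bucket, and the two perturbed points contribute all of the $L_1$ distance $0.1$ to that bucket's $d_j$.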
By Equations (25) and (27), we have
$$\sum_{j \in J_\ell \cup \{0\}} d_j \le \sum_{j \in J_\ell \cup \{0\}} (D^*(B_j) + D(B_j)) \le 6\eta. \quad (29)$$
Since we have (by assumption) that $\|D^* - D\|_1 = 2 d_{\mathrm{TV}}(D^*, D) \ge 2\epsilon$, we get that
$$\sum_{j \in J_h \setminus \{0\}} d_j > 2\epsilon - 6\eta. \quad (30)$$
Let $N_j \stackrel{\mathrm{def}}{=} |B_j|$ and observe that $N_j \le D^*(B_j)/p_j \le 1/p_j$, where $p_j \stackrel{\mathrm{def}}{=} 2^{j-1} \cdot \eta/N$ is the lower bound on the probability (under $D^*$) of all elements in $B_j$. For each $B_j$ such that $j \in J_h \setminus \{0\}$, let $H_j \stackrel{\mathrm{def}}{=} \{x \in B_j : D(x) > D^*(x)\}$ and $L_j \stackrel{\mathrm{def}}{=} \{x \in B_j : D(x) < D^*(x)\}$. Similarly to the "testing uniformity" analysis, we have that
$$\sum_{x \in L_j} (D^*(x) - D(x)) + \sum_{x \in H_j} (D(x) - D^*(x)) = d_j. \quad (31)$$
Equation (26) may be rewritten as
$$\left| \sum_{x \in L_j} (D^*(x) - D(x)) - \sum_{x \in H_j} (D(x) - D^*(x)) \right| \le 2\eta/b, \quad (32)$$
and so we have both
$$\sum_{x \in L_j} (D^*(x) - D(x)) \ge d_j/2 - \eta/b \quad \text{and} \quad \sum_{x \in H_j} (D(x) - D^*(x)) \ge d_j/2 - \eta/b. \quad (33)$$
Also similarly to what we had before, let $H'_j \stackrel{\mathrm{def}}{=} \{x \in B_j : D(x) > D^*(x) + \eta/(bN_j)\}$ and $L'_j \stackrel{\mathrm{def}}{=} \{x \in B_j : D(x) < D^*(x) - \eta/(bN_j)\}$ (recall that $N_j = |B_j|$); these are the elements of $B_j$ that are "significantly heavier" (lighter, respectively) under $D$ than under $D^*$. We have
$$\sum_{x \in L_j \setminus L'_j} (D^*(x) - D(x)) \le \eta/b \quad \text{and} \quad \sum_{x \in H_j \setminus H'_j} (D(x) - D^*(x)) \le \eta/b. \quad (34)$$
By Equation (30), there exists $j^* \in J_h \setminus \{0\}$ for which $d_{j^*} \ge (2\epsilon - 6\eta)/b$. For this index, applying Equations (33) and (34), we get that
$$\sum_{x \in L'_{j^*}} D^*(x) \ge \sum_{x \in L'_{j^*}} (D^*(x) - D(x)) \ge (\epsilon - 5\eta)/b, \quad (35)$$
and similarly,
$$\sum_{x \in H'_{j^*}} D(x) \ge \sum_{x \in H'_{j^*}} (D(x) - D^*(x)) \ge (\epsilon - 5\eta)/b. \quad (36)$$
Recalling that $\eta = \epsilon/c$ and setting the constant $c$ to 6, we have that $(\epsilon - 5\eta)/b = \epsilon/(6b)$.
Since $s = \Theta(b/\epsilon)$, with probability at least $9/10$ it is the case both that some $x_i$ drawn in Line 8 belongs to $L'_{j^*}$ and that some $y_{i'}$ drawn in Line 9 belongs to $H'_{j^*}$. By the definitions of $L'_{j^*}$ and $H'_{j^*}$ and the fact that for each $j > 0$ it holds that $N_j \le 1/p_j$ and $p_j \le D^*(x) < 2p_j$ for each $x \in B_j$, we have that
$$D(x_i) < D^*(x_i) - \frac{\eta}{bN_{j^*}} \le D^*(x_i) - \frac{\eta}{b}\, p_{j^*} \le \left(1 - \frac{\eta}{2b}\right) D^*(x_i) \quad (37)$$
and
$$D(y_{i'}) > D^*(y_{i'}) + \frac{\eta}{bN_{j^*}} \ge D^*(y_{i'}) + \frac{\eta}{b}\, p_{j^*} \ge \left(1 + \frac{\eta}{2b}\right) D^*(y_{i'}). \quad (38)$$
Therefore,
$$\frac{D(x_i)}{D(y_{i'})} < \frac{1 - \eta/(2b)}{1 + \eta/(2b)} \cdot \frac{D^*(x_i)}{D^*(y_{i'})} < \left(1 - \frac{3\eta}{4b}\right) \cdot \frac{D^*(x_i)}{D^*(y_{i'})}. \quad (39)$$
By Lemma 2, with probability at least $1 - 1/(10s^2)$, the output of Compare is either Low or is at most
$$\left(1 - \frac{3\eta}{4b}\right) \cdot \left(1 + \frac{\eta}{4b}\right) < \left(1 - \frac{\eta}{2b}\right)$$
times $D^*(x_i)/D^*(y_{i'})$, causing the algorithm to reject. Thus the overall probability that the algorithm outputs REJECT is at least $8/10 - 1/(10s^2) > 2/3$, and the theorem is proved.

5.2 A $(\log N)^{\Omega(1)}$ lower bound for PCOND$_D$

In this subsection we prove that any PCOND$_D$ algorithm for testing equivalence to a known distribution must have query complexity at least $(\log N)^{\Omega(1)}$:

Theorem 8  Fix $\epsilon = 1/2$. There is a distribution $D^*$ over $[N]$ (described below) such that any PCOND$_D$ algorithm for testing whether $D = D^*$ versus $d_{\mathrm{TV}}(D, D^*) \ge \epsilon$ must make $\Omega\!\left(\sqrt{\frac{\log N}{\log\log N}}\right)$ queries.

The distribution $D^*$. Fix parameters $r = \Theta\!\left(\frac{\log N}{\log\log N}\right)$ and $K = \Theta(\log N)$. We partition $[N]$ from left (1) to right ($N$) into $2r$ consecutive intervals $B_1, \ldots, B_{2r}$, which we henceforth refer to as "buckets." The $i$-th bucket has $|B_i| = K^i$ (we may assume without loss of generality that $N$ is of the form $\sum_{i=1}^{2r} K^i$). The distribution $D^*$ assigns equal probability weight to each bucket, so $D^*(B_i) = 1/(2r)$ for all $1 \le i \le 2r$.
Moreover $D^*$ is uniform within each bucket, so for all $j \in B_i$ we have $D^*(j) = 1/(2rK^i)$. This completes the specification of $D^*$.

To prove the lower bound we construct a probability distribution $\mathcal{P}_{\mathrm{No}}$ over possible "No"-distributions. To define the distribution $\mathcal{P}_{\mathrm{No}}$ it will be useful to have the notion of a "bucket-pair." A bucket-pair $U_i$ is $U_i = B_{2i-1} \cup B_{2i}$, i.e. the union of the $i$-th pair of consecutive buckets. A distribution $D$ drawn from $\mathcal{P}_{\mathrm{No}}$ is obtained by selecting a string $\pi = (\pi_1, \ldots, \pi_r)$ uniformly at random from $\{\downarrow\uparrow, \uparrow\downarrow\}^r$ and setting $D$ to be $D_\pi$, which we now define. The distribution $D_\pi$ is obtained by perturbing $D^*$ in the following way: for each bucket-pair $U_i = (B_{2i-1}, B_{2i})$,

• If $\pi_i = \uparrow\downarrow$ then the weight of $B_{2i-1}$ is uniformly "scaled up" from $1/(2r)$ to $3/(4r)$ (keeping the distribution uniform within $B_{2i-1}$) and the weight of $B_{2i}$ is uniformly "scaled down" from $1/(2r)$ to $1/(4r)$ (likewise keeping the distribution uniform within $B_{2i}$).

• If $\pi_i = \downarrow\uparrow$ then the weight of $B_{2i-1}$ is uniformly "scaled down" from $1/(2r)$ to $1/(4r)$ and the weight of $B_{2i}$ is uniformly "scaled up" from $1/(2r)$ to $3/(4r)$.

Note that for any distribution $D$ in the support of $\mathcal{P}_{\mathrm{No}}$ and any $1 \le i \le r$ we have that $D(U_i) = D^*(U_i) = 1/r$. Every distribution $D$ in the support of $\mathcal{P}_{\mathrm{No}}$ has $d_{\mathrm{TV}}(D^*, D) = 1/2$. Thus Theorem 8 follows immediately from the following:

Theorem 9  Let $A$ be any (possibly adaptive) algorithm which makes at most $q \le \frac13 \sqrt{r}$ calls to PCOND$_D$. Then
$$\left| \Pr_{D \leftarrow \mathcal{P}_{\mathrm{No}}}\!\left[A^{\mathrm{PCOND}_D} \text{ outputs ACCEPT}\right] - \Pr\!\left[A^{\mathrm{PCOND}_{D^*}} \text{ outputs ACCEPT}\right] \right| \le 1/5. \quad (40)$$
Note that in the first probability of Equation (40) the randomness is over the draw of $D$ from $\mathcal{P}_{\mathrm{No}}$, the internal randomness of $A$ in selecting its query sets, and the randomness of the responses to the PCOND$_D$ queries.
In the second probability the randomness is just over the internal coin tosses of $A$ and the randomness of the responses to the PCOND$_{D^*}$ queries.

Intuition for Theorem 9. A very high-level intuition for the lower bound is that PCOND$_D$ queries are only useful for "comparing" points whose probabilities are within a reasonable multiplicative ratio of each other. But $D^*$ and every distribution $D$ in the support of $\mathcal{P}_{\mathrm{No}}$ are such that every two points either have the same probability mass under all of these distributions (so a PCOND$_D$ query is not informative), or else the ratio of their probabilities is so skewed that a small number of PCOND$_D$ queries is not useful for comparing them.

In more detail, we may suppose without loss of generality that in every possible execution, algorithm $A$ first makes $q$ calls to SAMP$_D$ and then makes $q$ (possibly adaptive) calls to PCOND$_D$. The more detailed intuition for the lower bound is as follows: First consider the SAMP$_D$ calls. Since every possible $D$ (whether $D^*$ or a distribution drawn from $\mathcal{P}_{\mathrm{No}}$) puts weight $1/r$ on each bucket-pair $U_1, \ldots, U_r$, a birthday paradox argument implies that in both scenarios, with probability at least $9/10$ (over the randomness in the responses to the SAMP$_D$ queries) no two of the $q \le \frac13\sqrt{r}$ calls to SAMP$_D$ return points from the same bucket-pair. Conditioned on this, the distribution of responses to the SAMP$_D$ queries is exactly the same under $D^*$ and under $D$ where $D$ is drawn randomly from $\mathcal{P}_{\mathrm{No}}$. For the pair queries, the intuition is that in either setting (whether the distribution $D$ is $D^*$ or a randomly chosen distribution from $\mathcal{P}_{\mathrm{No}}$), making $q$ pair queries will with probability $1 - o(1)$ provide no information that the tester could not simulate for itself.
This is because any pair query PCOND$_D(\{x, y\})$ either has $x, y$ in the same bucket $B_i$ or in different buckets $B_i \ne B_j$ with $i < j$. If $x, y$ are both in the same bucket $B_i$ then in either setting PCOND$_D(\{x, y\})$ is equally likely to return $x$ or $y$. If they belong to buckets $B_i, B_j$ with $i < j$ then in either setting PCOND$_D(\{x, y\})$ will return the one that belongs to $B_i$ with probability $1 - 1/\Theta(K^{j-i}) \ge 1 - 1/\Omega(K)$.

Proof of Theorem 9: As described above, we may fix $A$ to be any PCOND$_D$ algorithm that makes exactly $q$ calls to SAMP$_D$ followed by exactly $q$ adaptive calls to PCOND$_D$. A transcript for $A$ is a full specification of the sequence of interactions that $A$ has with the PCOND$_D$ oracle in a given execution. More precisely, it is a pair $(Y, Z)$ where $Y = (s_1, \ldots, s_q) \in [N]^q$ and $Z = ((\{x_1, y_1\}, p_1), \ldots, (\{x_q, y_q\}, p_q))$, where $p_i \in \{x_i, y_i\}$ and $x_i, y_i \in [N]$. The idea is that $Y$ is a possible sequence of responses that $A$ might receive to the initial $q$ SAMP$_D$ queries, $\{x_i, y_i\}$ is a possible pair that could be the input to an $i$-th PCOND$_D$ query, and $p_i$ is a possible response that could be received from that query. We say that a length-$i$ transcript prefix is a pair $(Y, Z_i)$ where $Y$ is as above and $Z_i = ((\{x_1, y_1\}, p_1), \ldots, (\{x_i, y_i\}, p_i))$.

A PCOND algorithm $A$ may be viewed as a collection of distributions over pairs $\{x, y\}$ in the following way: for each length-$i$ transcript prefix $(Y, Z_i)$ ($0 \le i \le q-1$), there is a distribution over pairs $\{x_{i+1}, y_{i+1}\}$ that $A$ would use to select the $(i+1)$-st query pair for PCOND$_D$ given that the length-$i$ transcript prefix of $A$'s execution thus far was $(Y, Z_i)$. We write $T_{(Y, Z_i)}$ to denote this distribution over pairs.
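The skewed-ratio phenomenon driving this intuition can be computed directly: for the bucketed $D^*$ defined above, a pair query across buckets $B_i, B_j$ with $i < j$ returns the element of the lower-indexed bucket with probability exactly $K^{j-i}/(K^{j-i}+1)$. A small sketch with illustrative (tiny) values of $r$ and $K$:

```python
from fractions import Fraction

# Illustrative tiny parameters; the paper takes r = Theta(log N / log log N)
# and K = Theta(log N).
r, K = 3, 100

def mass(i):
    """D*(x) = 1/(2r K^i) for any point x in bucket B_i, i = 1..2r."""
    return Fraction(1, 2 * r * K**i)

def pcond_pair_prob(i, j):
    """Probability that PCOND on {x, y}, x in B_i and y in B_j, returns x."""
    return mass(i) / (mass(i) + mass(j))

for i, j in [(1, 2), (1, 3), (2, 6)]:
    p = pcond_pair_prob(i, j)
    assert p == Fraction(K**(j - i), K**(j - i) + 1)  # heavily skewed toward B_i
    print(i, j, float(p))

# Same-bucket pairs are uninformative: the response is a fair coin flip.
assert pcond_pair_prob(1, 1) == Fraction(1, 2)
```

With $K = 100$ even adjacent buckets give a $100/101$-vs-$1/101$ split, so a small number of pair queries is very unlikely to see the lighter element at all.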
Let $\mathcal{P}^*$ denote the distribution over transcripts induced by running $A$ with oracle PCOND$_{D^*}$. Let $\mathcal{P}^{\mathrm{No}}$ denote the distribution over transcripts induced by first (i) drawing $D$ from $\mathcal{P}_{\mathrm{No}}$, and then (ii) running $A$ with oracle PCOND$_D$. To prove Theorem 9 it is sufficient to prove that the distribution over transcripts of $A$ is statistically close whether the oracle is $D^*$ or is a random $D$ drawn from $\mathcal{P}_{\mathrm{No}}$, i.e. it is sufficient to prove that
$$d_{\mathrm{TV}}(\mathcal{P}^*, \mathcal{P}^{\mathrm{No}}) \le 1/5. \quad (41)$$
For our analysis we will need to consider variants of algorithm $A$ that, rather than making $q$ calls to PCOND$_D$, instead "fake" the final $q - k$ of these PCOND$_D$ queries as described below. For $0 \le k \le q$ we define $A^{(k)}$ to be the algorithm that works as follows:

1. $A^{(k)}$ exactly simulates the execution of $A$ in making an initial $q$ SAMP$_D$ calls and making the first $k$ PCOND$_D$ queries precisely like $A$. Let $(Y, Z_k)$ be the length-$k$ transcript prefix of $A$'s execution thus obtained.

2. Exactly like $A$, algorithm $A^{(k)}$ draws a pair $\{x_{k+1}, y_{k+1}\}$ from $T_{(Y, Z_k)}$. However, instead of calling PCOND$_D(\{x_{k+1}, y_{k+1}\})$ to obtain $p_{k+1}$, algorithm $A^{(k)}$ generates $p_{k+1}$ in the following manner:

(i) If $x_{k+1}$ and $y_{k+1}$ both belong to the same bucket $B_\ell$ then $p_{k+1}$ is chosen uniformly from $\{x_{k+1}, y_{k+1}\}$.

(ii) If one of $\{x_{k+1}, y_{k+1}\}$ belongs to $B_\ell$ and the other belongs to $B_{\ell'}$ for some $\ell < \ell'$, then $p_{k+1}$ is set to be the element of $\{x_{k+1}, y_{k+1}\}$ that belongs to $B_\ell$.

Let $(Y, Z_{k+1})$ be the length-$(k+1)$ transcript prefix obtained by appending $(\{x_{k+1}, y_{k+1}\}, p_{k+1})$ to $Z_k$. Algorithm $A^{(k)}$ continues in this way for a total of $q - k$ stages; i.e.
it next draws $\{x_{k+2}, y_{k+2}\}$ from $T_{(Y, Z_{k+1})}$ and generates $p_{k+2}$ as described above; then $(Y, Z_{k+2})$ is the length-$(k+2)$ transcript prefix obtained by appending $(\{x_{k+2}, y_{k+2}\}, p_{k+2})$ to $Z_{k+1}$; and so on. At the end of the process a transcript $(Y, Z_q)$ has been constructed.

Let $\mathcal{P}^{*,(k)}$ denote the distribution over final transcripts $(Y, Z_q)$ that are obtained by running $A^{(k)}$ on a PCOND$_{D^*}$ oracle. Let $\mathcal{P}^{\mathrm{No},(k)}$ denote the distribution over final transcripts $(Y, Z_q)$ that are obtained by (i) first drawing $D$ from $\mathcal{P}_{\mathrm{No}}$, and then (ii) running $A^{(k)}$ on a PCOND$_D$ oracle. Note that $\mathcal{P}^{*,(q)}$ is identical to $\mathcal{P}^*$ and $\mathcal{P}^{\mathrm{No},(q)}$ is identical to $\mathcal{P}^{\mathrm{No}}$ (since algorithm $A^{(q)}$, which does not fake any queries, is identical to algorithm $A$).

Recall that our goal is to prove Equation (41). Since $\mathcal{P}^{*,(q)} = \mathcal{P}^*$ and $\mathcal{P}^{\mathrm{No},(q)} = \mathcal{P}^{\mathrm{No}}$, Equation (41) is an immediate consequence (using the triangle inequality for total variation distance) of the following two lemmas, which we prove below:

Lemma 15  $d_{\mathrm{TV}}(\mathcal{P}^{*,(0)}, \mathcal{P}^{\mathrm{No},(0)}) \le 1/10$.

Lemma 16  For all $0 \le k < q$, we have $d_{\mathrm{TV}}(\mathcal{P}^{*,(k)}, \mathcal{P}^{*,(k+1)}) \le 1/(20q)$ and $d_{\mathrm{TV}}(\mathcal{P}^{\mathrm{No},(k)}, \mathcal{P}^{\mathrm{No},(k+1)}) \le 1/(20q)$.

Proof of Lemma 15: Define $\mathcal{P}^*_0$ to be the distribution over outcomes of the $q$ calls to SAMP$_D$ (i.e. over length-0 transcript prefixes) when $D = D^*$. Define $\mathcal{P}^{\mathrm{No}}_0$ to be the distribution over outcomes of the $q$ calls to SAMP$_D$ when $D$ is drawn from $\mathcal{P}_{\mathrm{No}}$. We begin by noting that by the data processing inequality for total variation distance (Lemma 1), we have $d_{\mathrm{TV}}(\mathcal{P}^{*,(0)}, \mathcal{P}^{\mathrm{No},(0)}) \le d_{\mathrm{TV}}(\mathcal{P}^*_0, \mathcal{P}^{\mathrm{No}}_0)$ (indeed, after the calls to SAMP$_{D^*}$ and SAMP$_D$ respectively, the same randomized function $F$, which fakes all remaining oracle calls, is applied to the two resulting distributions over length-0 transcript prefixes $\mathcal{P}^*_0$ and $\mathcal{P}^{\mathrm{No}}_0$).
In the rest of the proof we show that $d_{\mathrm{TV}}(\mathcal{P}^*_0, \mathcal{P}^{\mathrm{No}}_0) \le 1/10$.

Let $E$ denote the event that the $q$ calls to SAMP$_D$ yield points $s_1, \ldots, s_q$ such that no bucket-pair $U_i$ contains more than one of these points. Since $D^*(U_i) = 1/r$ for all $i$,
$$\mathcal{P}^*_0(E) = \prod_{j=0}^{q-1}\left(1 - \frac{j}{r}\right) \ge 9/10, \quad (42)$$
where Equation (42) follows from a standard birthday paradox analysis and the fact that $q \le \frac13\sqrt{r}$. Since for each possible outcome of $D$ drawn from $\mathcal{P}_{\mathrm{No}}$ we have $D(U_i) = 1/r$ for all $i$, we further have that also
$$\mathcal{P}^{\mathrm{No}}_0(E) = \prod_{j=0}^{q-1}\left(1 - \frac{j}{r}\right). \quad (43)$$
We moreover claim that the two conditional distributions $(\mathcal{P}^*_0 \mid E)$ and $(\mathcal{P}^{\mathrm{No}}_0 \mid E)$ are identical, i.e.
$$(\mathcal{P}^*_0 \mid E) = (\mathcal{P}^{\mathrm{No}}_0 \mid E). \quad (44)$$
To see this, fix any sequence $(\ell_1, \ldots, \ell_q) \in [r]^q$ such that $\ell_i \ne \ell_j$ for all $i \ne j$. Let $(s_1, \ldots, s_q) \in [N]^q$ denote a draw from $(\mathcal{P}^*_0 \mid E)$. The probability that ($s_i \in U_{\ell_i}$ for all $1 \le i \le q$) is precisely $1/r^q$. Now given that $s_i \in U_{\ell_i}$ for all $i$, it is clear that $s_i$ is equally likely to lie in $B_{2\ell_i - 1}$ and in $B_{2\ell_i}$, and given that it lies in a particular one of the two buckets, it is equally likely to be any element in that bucket. This is true independently for all $1 \le i \le q$.

Now let $(s_1, \ldots, s_q) \in [N]^q$ denote a draw from $(\mathcal{P}^{\mathrm{No}}_0 \mid E)$. Since each distribution $D$ in the support of $\mathcal{P}_{\mathrm{No}}$ has $D(U_i) = 1/r$ for all $i$, we likewise have that the probability that ($s_i \in U_{\ell_i}$ for all $1 \le i \le q$) is precisely $1/r^q$. Now given that $s_i \in U_{\ell_i}$ for all $i$, we have that $s_i$ is equally likely to lie in $B_{2\ell_i - 1}$ and in $B_{2\ell_i}$; this is because $\pi_i$ (recall that $\pi$ determines $D = D_\pi$) is equally likely to be $\uparrow\downarrow$ (in which case $D(B_{2\ell_i - 1}) = 3/(4r)$ and $D(B_{2\ell_i}) = 1/(4r)$) as it is to be $\downarrow\uparrow$ (in which case $D(B_{2\ell_i - 1}) = 1/(4r)$ and $D(B_{2\ell_i}) = 3/(4r)$).
Additionally, given that $s_i$ lies in a particular one of the two buckets, it is equally likely to be any element in that bucket. This is true independently for all $1 \le i \le q$ (because conditioning on $E$ ensures that no two elements of $s_1, \ldots, s_q$ lie in the same bucket-pair, so there is "fresh randomness" for each $i$), and so indeed the two conditional distributions $(\mathcal{P}^*_0 \mid E)$ and $(\mathcal{P}^{\mathrm{No}}_0 \mid E)$ are identical. Finally, the claimed bound $d_{\mathrm{TV}}(\mathcal{P}^*_0, \mathcal{P}^{\mathrm{No}}_0) \le 1/10$ follows directly from Equations (42), (43) and (44).

Proof of Lemma 16: Consider first the claim that $d_{\mathrm{TV}}(\mathcal{P}^{*,(k)}, \mathcal{P}^{*,(k+1)}) \le 1/(20q)$. Fix any $0 \le k < q$. The data processing inequality for total variation distance implies that $d_{\mathrm{TV}}(\mathcal{P}^{*,(k)}, \mathcal{P}^{*,(k+1)})$ is at most the variation distance between random variables $X$ and $X'$, where

• $X$ is the random variable obtained by running $A$ on PCOND$_{D^*}$ to obtain a length-$k$ transcript prefix $(Y, Z_k)$, then drawing $\{x_{k+1}, y_{k+1}\}$ from $T_{(Y, Z_k)}$, then setting $p_{k+1}$ to be the output of PCOND$_{D^*}(\{x_{k+1}, y_{k+1}\})$; and

• $X'$ is the random variable obtained by running $A$ on PCOND$_{D^*}$ to obtain a length-$k$ transcript prefix $(Y, Z_k)$, then drawing $\{x_{k+1}, y_{k+1}\}$ from $T_{(Y, Z_k)}$, then setting $p_{k+1}$ according to the aforementioned rules 2(i) and 2(ii).

Consider any fixed outcome of $(Y, Z_k)$ and $\{x_{k+1}, y_{k+1}\}$. If rule 2(i) is applied ($x_{k+1}$ and $y_{k+1}$ are in the same bucket), then there is zero contribution to the variation distance between $X$ and $X'$, because choosing a uniform element of $\{x_{k+1}, y_{k+1}\}$ is a perfect simulation of PCOND$_{D^*}(\{x_{k+1}, y_{k+1}\})$. If rule 2(ii) is applied, then the contribution is upper bounded by $O(1/K) < 1/(20q)$, because PCOND$_{D^*}(\{x_{k+1}, y_{k+1}\})$ would return a different outcome from rule 2(ii) with probability $1/\Theta(K^{\ell' - \ell}) = O(1/K)$.
Averaging over all possible outcomes of $(Y, Z_k)$ and $\{x_{k+1}, y_{k+1}\}$, we get that the variation distance between $X$ and $X'$ is at most $1/(20q)$ as claimed.

An identical argument shows that similarly $d_{\mathrm{TV}}(\mathcal{P}^{\mathrm{No},(k)}, \mathcal{P}^{\mathrm{No},(k+1)}) \le 1/(20q)$. The key observation is that for any distribution $D$ in the support of $\mathcal{P}_{\mathrm{No}}$, as with $D^*$ it is the case that points in the same bucket have equal probability under $D$, and for a pair of points $\{x, y\}$ such that $x \in B_\ell$ and $y \in B_{\ell'}$ for $\ell' > \ell$, the probability that a call to PCOND$_D(\{x, y\})$ returns $y$ is only $1/\Theta(K^{\ell' - \ell})$. This concludes the proof of Lemma 16 and of Theorem 8.

5.3 A poly$(1/\epsilon)$-query COND$_D$ algorithm

In this subsection we present an algorithm COND-Test-Known and prove the following theorem:

Theorem 10  COND-Test-Known is an $\tilde{O}(1/\epsilon^4)$-query COND$_D$ testing algorithm for testing equivalence to a known distribution $D^*$. That is, for every pair of distributions $D, D^*$ over $[N]$ (such that $D^*$ is fully specified and there is COND query access to $D$), the algorithm outputs ACCEPT with probability at least $2/3$ if $D = D^*$ and outputs REJECT with probability at least $2/3$ if $d_{\mathrm{TV}}(D, D^*) \ge \epsilon$.

This constant-query testing algorithm stands in interesting contrast to the $(\log N)^{\Omega(1)}$-query lower bound for PCOND$_D$ algorithms for this problem.

High-level overview of the algorithm and its analysis: First, we note that by reordering elements of $[N]$ we may assume without loss of generality that $D^*(1) \le \cdots \le D^*(N)$; this will be convenient for us. Our $(\log N)^{\Omega(1)}$-query lower bound for PCOND$_D$ algorithms exploited the intuition that comparing two points using the PCOND$_D$ oracle might not provide much information (e.g. if one of the two points was a priori "known" to be much heavier than the other).
In contrast, with a general COND$_D$ oracle at our disposal, we can compare a given point $j \in [N]$ with any subset of $[N] \setminus \{j\}$. Thus the following definition will be useful:

Definition 4 (comparable points)  Fix $0 < \lambda \le 1$. A point $j \in \mathrm{supp}(D^*)$ is said to be $\lambda$-comparable if there exists a set $S \subseteq ([N] \setminus \{j\})$ such that $D^*(j) \in [\lambda D^*(S), D^*(S)/\lambda]$. Such a set $S$ is then said to be a $\lambda$-comparable-witness for $j$ (according to $D^*$), which is denoted $S \cong^* j$. We say that a set $T \subseteq [N]$ is $\lambda$-comparable if every $i \in T$ is $\lambda$-comparable.

We stress that the notion of being $\lambda$-comparable deals only with the known distribution $D^*$; this will be important later.

Fix $\epsilon_1 = \Theta(\epsilon)$ (we specify $\epsilon_1$ precisely in Equation (47) below). Our analysis and algorithm consider two possible cases for the distribution $D^*$ (where it is not hard to verify, and we provide an explanation subsequently, that one of the two cases must hold):

1. The first case is that for some $i^* \in [N]$ we have
$$D^*(\{1, \ldots, i^*\}) > 2\epsilon_1 \quad \text{but} \quad D^*(\{1, \ldots, i^* - 1\}) \le \epsilon_1. \quad (45)$$
In this case $1 - \epsilon_1$ of the total probability mass of $D^*$ must lie on a set of at most $1/\epsilon_1$ elements, and in such a situation it is easy to efficiently test whether $D = D^*$ using poly$(1/\epsilon)$ queries (see Algorithm COND$_D$-Test-Known-Heavy and Lemma 19).

2. The second case is that there exists an element $k^* \in [N]$ such that
$$\epsilon_1 < D^*(\{1, \ldots, k^*\}) \le 2\epsilon_1 < D^*(\{1, \ldots, k^* + 1\}). \quad (46)$$
This is the more challenging (and typical) case. In this case, it can be shown that every element $j > k^*$ has at least one $\epsilon_1$-comparable-witness within $\{1, \ldots, j\}$. In fact, we show (see Claim 17) that either (a) $\{1, \ldots, j-1\}$ is an $\epsilon_1$-comparable witness for $j$, or (b) the set $\{1, \ldots, j-1\}$ can be partitioned into disjoint sets$^9$ $S_1, \ldots$
, S t suc h that e ach S i , 1 ≤ i ≤ t , is a 1 2 -comparable-witness for j . Case (a) is r elativ ely easy to h andle so we fo cus on (b) in our informal description b elo w. The partition S 1 , . . . , S t is u seful to us for the follo w in g reason: Sup p ose that d TV ( D , D ∗ ) ≥ ǫ. It is n ot difficult to show (see Claim 18 ) that u nless D ( { 1 , . . . , k ∗ } ) > 3 ǫ 1 (whic h can b e easily detected and provides evidence that the tester should reject), a ran d om sample of Θ(1 /ǫ ) dra ws from D will with high p r obabilit y con tain a “hea vy” p oin t j > k ∗ , that is, a p oin t j > k ∗ suc h that D ( j ) ≥ (1 + ǫ 2 ) D ∗ ( j ) (wh ere ǫ 2 = Θ ( ǫ )). Giv en such a p oin t j , there are t w o p ossibilities: 1. The first p ossibility is that a signifi cant fraction of the sets S 1 , . . . , S t ha ve D ( j ) /D ( S i ) “no- ticeably differen t” from D ∗ ( j ) /D ∗ ( S i ) . (Observe that since eac h set S i is a 1 2 -comparable witness for j , it is p ossible to efficien tly c heck wh ether this is th e case.) If th is is the case then our tester sh ould r eject since this is evidence that D 6 = D ∗ . 2. The second p ossibility is that almost every S i has D ( j ) /D ( S i ) ve ry close to D ∗ ( j ) /D ∗ ( S i ). If this is the case, though, th en s ince D ( j ) ≥ (1 + ǫ 2 ) D ∗ ( j ) and the union of S 1 , . . . , S t is { 1 , . . . , j − 1 } , it m ust b e the case that D ( { 1 , . . . , j } ) is “significan tly larger” than D ∗ ( { 1 , . . . , j } ) . This will b e r ev ealed by r andom samp lin g fr om D and thus our testing algorithm can reject in this case as well. Key quantities and useful claims. W e defin e some quantiti es that are used in the algorithm and its analysis. Let ǫ 1 def = ǫ 10 ; ǫ 2 def = ǫ 2 ; ǫ 3 def = ǫ 48 ; ǫ 4 def = ǫ 6 . (47) 9 In fact the sets are interv als (u nder the assumption D ∗ (1) ≤ · · · ≤ D ∗ ( n )), but th at is not really imp ortant for our argumen t s. 
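The right-to-left greedy construction behind case (b), made precise in Claim 17, can be sketched in code. The following rendering is ours and purely illustrative: it assumes a 0-indexed list `Dstar` sorted in non-decreasing order whose prefix before index `j` has total mass greater than `Dstar[j]` (as guaranteed in the claim's setting by Equation (46)):

```python
def half_comparable_partition(Dstar, j):
    """Partition {0, ..., j-1} into intervals S (returned as inclusive
    (lo, hi) pairs) with Dstar(S) in [Dstar[j]/2, 2*Dstar[j]], following
    the right-to-left greedy procedure from the proof of Claim 17.
    Assumes Dstar is sorted non-decreasingly and that the prefix
    {0, ..., j-1} has total mass exceeding Dstar[j]."""
    target = Dstar[j]
    intervals = []
    hi = j - 1
    while hi >= 0:
        remaining = sum(Dstar[:hi + 1])
        if intervals and remaining <= target:
            # Fold the light leftover prefix into the last interval,
            # as in the proof; its mass then stays <= 2 * target.
            _, hi_last = intervals[-1]
            intervals[-1] = (0, hi_last)
            break
        # Greedily extend leftwards while the interval mass stays <= target;
        # by monotonicity of Dstar the resulting mass is >= target / 2.
        mass, lo = 0.0, hi
        while lo >= 0 and mass + Dstar[lo] <= target:
            mass += Dstar[lo]
            lo -= 1
        intervals.append((lo + 1, hi))
        hi = lo
    return intervals
```

The monotonicity of `Dstar` is what guarantees the lower bound on each interval's mass: if the greedy step stopped with mass below `target/2`, the next element to the left would have to weigh more than `target/2`, contradicting that it is no heavier than the elements already taken.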
Claim 17. Suppose there exists an element $k^* \in [N]$ that satisfies Equation (46). Fix any $j > k^*$. Then:

1. If $D^*(j) \ge \epsilon_1$, then $S_1 \stackrel{\rm def}{=} \{1,\ldots,j-1\}$ is an $\epsilon_1$-comparable witness for $j$;
2. If $D^*(j) < \epsilon_1$, then the set $\{1,\ldots,j-1\}$ can be partitioned into disjoint sets $S_1,\ldots,S_t$ such that each $S_i$, $1 \le i \le t$, is a $\frac{1}{2}$-comparable-witness for $j$.

Proof: First consider the case that $D^*(j) \ge \epsilon_1$. In this case $S_1 = \{1,\ldots,j-1\}$ is an $\epsilon_1$-comparable witness for $j$ because
$$D^*(j) \ge \epsilon_1 \ge \epsilon_1 D^*(\{1,\ldots,j-1\})$$
and
$$D^*(j) \le 1 \le \frac{1}{\epsilon_1} D^*(\{1,\ldots,k^*\}) \le \frac{1}{\epsilon_1} D^*(\{1,\ldots,j-1\}),$$
where the last inequality holds since $k^* \le j-1$.

Next, consider the case that $D^*(j) < \epsilon_1$. In this case we build our intervals iteratively from right to left, as follows. Let $j_1 = j-1$ and let $j_2$ be the minimum index in $\{0,\ldots,j_1-1\}$ such that $D^*(\{j_2+1,\ldots,j_1\}) \le D^*(j)$. (Observe that we must have $j_2 \ge 1$, because $D^*(\{1,\ldots,k^*\}) > \epsilon_1 > D^*(j)$.) Since $D^*(\{j_2,\ldots,j_1\}) > D^*(j)$ and the function $D^*(\cdot)$ is monotonically non-decreasing, it must be the case that
$$\frac{1}{2} D^*(j) \le D^*(\{j_2+1,\ldots,j_1\}) \le D^*(j).$$
Thus the interval $S_1 \stackrel{\rm def}{=} \{j_2+1,\ldots,j_1\}$ is a $\frac{1}{2}$-comparable witness for $j$ as desired.

We continue in this fashion from right to left; i.e., if we have defined $j_2,\ldots,j_t$ as above and there is an index $j' \in \{0,\ldots,j_t-1\}$ such that $D^*(\{j'+1,\ldots,j_t\}) > D^*(j)$, then we define $j_{t+1}$ to be the minimum index in $\{0,\ldots,j_t-1\}$ such that $D^*(\{j_{t+1}+1,\ldots,j_t\}) \le D^*(j)$, and we define $S_t$ to be the interval $\{j_{t+1}+1,\ldots,j_t\}$. The argument of the previous paragraph tells us that
$$\frac{1}{2} D^*(j) \le D^*(\{j_{t+1}+1,\ldots,j_t\}) \le D^*(j) \quad (48)$$
and hence $S_t$ is a $\frac{1}{2}$-comparable witness for $j$. At some point, after intervals $S_1 = \{j_2+1,\ldots,j_1\}, \ldots, S_t = \{j_{t+1}+1,\ldots,j_t\}$ have been defined in this way, it will be the case that there is no index $j' \in \{0,\ldots,j_t-1\}$ such that $D^*(\{j'+1,\ldots,j_t\}) > D^*(j)$. At this point there are two possibilities: first, if $j_{t+1}+1 = 1$, then $S_1,\ldots,S_t$ give the desired partition of $\{1,\ldots,j-1\}$. If $j_{t+1}+1 > 1$ then it must be the case that $D^*(\{1,\ldots,j_{t+1}\}) \le D^*(j)$. In this case we simply add the elements $\{1,\ldots,j_{t+1}\}$ to $S_t$, i.e. we redefine $S_t$ to be $\{1,\ldots,j_t\}$. By Equation (48) we have that
$$\frac{1}{2} D^*(j) \le D^*(S_t) \le 2 D^*(j)$$
and thus $S_t$ is a $\frac{1}{2}$-comparable witness for $j$ as desired. This concludes the proof.

Definition 5 (Heavy points). A point $j \in \mathrm{supp}(D^*)$ is said to be $\eta$-heavy if $D(j) \ge (1+\eta) D^*(j)$.

Claim 18. Suppose that $d_{TV}(D,D^*) \ge \epsilon$ and Equation (46) holds. Suppose moreover that $D(\{1,\ldots,k^*\}) \le 4\epsilon_1$. Let $i_1,\ldots,i_\ell$ be i.i.d. points drawn from $D$. Then for $\ell = \Theta(1/\epsilon)$, with probability at least $99/100$ (over the i.i.d. draws of $i_1,\ldots,i_\ell \sim D$) there is some point $i_j \in \{i_1,\ldots,i_\ell\}$ such that $i_j > k^*$ and $i_j$ is $\epsilon_2$-heavy.

Proof: Define $H_1$ to be the set of all $\epsilon_2$-heavy points and $H_2$ to be the set of all "slightly lighter" points, as follows:
$$H_1 = \{i \in [N] \mid D(i) \ge (1+\epsilon_2) D^*(i)\}, \qquad H_2 = \{i \in [N] \mid (1+\epsilon_2) D^*(i) > D(i) \ge D^*(i)\}.$$
By definition of the total variation distance, we have
$$\epsilon \le d_{TV}(D,D^*) = \sum_{i: D(i) \ge D^*(i)} (D(i) - D^*(i)) = (D(H_1) - D^*(H_1)) + (D(H_2) - D^*(H_2)) \le D(H_1) + ((1+\epsilon_2) D^*(H_2) - D^*(H_2)) = D(H_1) + \epsilon_2 D^*(H_2) \le D(H_1) + \epsilon_2 = D(H_1) + \epsilon/2.$$
So it must be the case that $D(H_1) \ge \epsilon/2 = 5\epsilon_1$. Since by assumption we have $D(\{1,\ldots,k^*\}) \le 4\epsilon_1$, it must be the case that $D(H_1 \setminus \{1,\ldots,k^*\}) \ge \epsilon_1$. The claim follows from the definition of $H_1$ and the size, $\ell$, of the sample.

Algorithm 6: COND_D-Test-Known
Input: error parameter $\epsilon > 0$; query access to $\mathrm{COND}_D$ oracle; explicit description $(D^*(1),\ldots,D^*(N))$ of distribution $D^*$ satisfying $D^*(1) \le \cdots \le D^*(N)$
1: Let $i^*$ be the minimum index $i \in [N]$ such that $D^*(\{1,\ldots,i\}) > 2\epsilon_1$.
2: if $D^*(\{1,\ldots,i^*-1\}) \le \epsilon_1$ then
3:   Call algorithm COND_D-Test-Known-Heavy($\epsilon$, COND_D, $D^*$, $i^*$) (and exit)
4: else
5:   Call algorithm COND_D-Test-Known-Main($\epsilon$, COND_D, $D^*$, $i^*-1$) (and exit).
6: end if

5.3.1 Proof of Theorem 10

It is straightforward to verify that the query complexity of COND_D-Test-Known-Heavy is $\tilde O(1/\epsilon^4)$ and the query complexity of COND_D-Test-Known-Main is also $\tilde O(1/\epsilon^4)$, so the overall query complexity of COND-Test-Known is as claimed. By the definition of $i^*$ (in the first line of the algorithm), either Equation (45) holds for this setting of $i^*$, or Equation (46) holds for $k^* = i^*-1$. To prove correctness of the algorithm, we first deal with the simpler case, which is that Equation (45) holds:

Algorithm 7: COND_D-Test-Known-Heavy
Input: error parameter $\epsilon > 0$; query access to $\mathrm{COND}_D$ oracle; explicit description $(D^*(1),\ldots,D^*(N))$ of distribution $D^*$ satisfying $D^*(1) \le \cdots \le D^*(N)$; value $i^* \in [N]$ satisfying $D^*(\{1,\ldots,i^*-1\}) \le \epsilon_1$, $D^*(\{1,\ldots,i^*\}) > 2\epsilon_1$
1: Call the $\mathrm{SAMP}_D$ oracle $m = \Theta((\log(1/\epsilon))/\epsilon^4)$ times. For each $i \in [i^*, N]$ let $\hat D(i)$ be the fraction of the $m$ calls to $\mathrm{SAMP}_D$ that returned $i$. Let $\hat D' = 1 - \sum_{i \in [i^*,N]} \hat D(i)$ be the fraction of the $m$ calls that returned values in $\{1,\ldots,i^*-1\}$.
2: if either (any $i \in [i^*,N]$ has $|\hat D(i) - D^*(i)| > \epsilon_1^2$) or ($\hat D' - D^*(\{1,\ldots,i^*-1\}) > \epsilon_1$) then
3:   output REJECT (and exit)
4: end if
5: Output ACCEPT

Algorithm 8: COND_D-Test-Known-Main
Input: error parameter $\epsilon > 0$; query access to $\mathrm{COND}_D$ oracle; explicit description $(D^*(1),\ldots,D^*(N))$ of distribution $D^*$ satisfying $D^*(1) \le \cdots \le D^*(N)$; value $k^* \in [N]$ satisfying $\epsilon_1 < D^*(\{1,\ldots,k^*\}) \le 2\epsilon_1 < D^*(\{1,\ldots,k^*+1\})$
1: Call the $\mathrm{SAMP}_D$ oracle $\Theta(1/\epsilon^2)$ times and let $\hat D(\{1,\ldots,k^*\})$ denote the fraction of responses that lie in $\{1,\ldots,k^*\}$. If $\hat D(\{1,\ldots,k^*\}) \notin [\frac{\epsilon_1}{2}, \frac{5\epsilon_1}{2}]$ then output REJECT (and exit).
2: Call the $\mathrm{SAMP}_D$ oracle $\ell = \Theta(1/\epsilon)$ times to obtain points $i_1,\ldots,i_\ell$.
3: for all $j \in \{1,\ldots,\ell\}$ such that $i_j > k^*$ do
4:   Call the $\mathrm{SAMP}_D$ oracle $m = \Theta(\log(1/\epsilon)/\epsilon^2)$ times and let $\hat D(\{1,\ldots,i_j\})$ be the fraction of responses that lie in $\{1,\ldots,i_j\}$. If $\hat D(\{1,\ldots,i_j\}) \notin [1-\epsilon_3, 1+\epsilon_3] \cdot D^*(\{1,\ldots,i_j\})$ then output REJECT (and exit).
5:   if $D^*(i_j) \ge \epsilon_1$ then
6:     Run Compare($\{i_j\}$, $\{1,\ldots,i_j-1\}$, $\frac{\epsilon_2}{16}$, $\frac{2}{\epsilon_1}$, $\frac{1}{10\ell}$) and let $v$ denote its output. If $v \notin [1-\frac{\epsilon_2}{8}, 1+\frac{\epsilon_2}{8}] \cdot \frac{D^*(\{1,\ldots,i_j-1\})}{D^*(\{i_j\})}$ then output REJECT (and exit).
7:   else
8:     Let $S_1,\ldots,S_t$ be the partition of $\{1,\ldots,i_j-1\}$ such that each $S_i$ is a $\frac{1}{2}$-comparable witness for $i_j$, which is provided by Claim 17.
9:     Select a list of $h = \Theta(1/\epsilon)$ elements $S_{a_1},\ldots,S_{a_h}$ independently and uniformly from $\{S_1,\ldots,S_t\}$.
10:    For each $S_{a_r}$, $1 \le r \le h$, run Compare($\{i_j\}$, $S_{a_r}$, $\frac{\epsilon_4}{8}$, $4$, $\frac{1}{10\ell h}$) and let $v$ denote its output. If $v \notin [1-\frac{\epsilon_4}{4}, 1+\frac{\epsilon_4}{4}] \cdot \frac{D^*(S_{a_r})}{D^*(\{i_j\})}$ then output REJECT (and exit).
11:   end if
12: end for
13: Output ACCEPT.
Lemma 19. Suppose that $D^*$ is such that $D^*(\{1,\ldots,i^*\}) > 2\epsilon_1$ but $D^*(\{1,\ldots,i^*-1\}) \le \epsilon_1$. Then COND_D-Test-Known-Heavy($\epsilon$, COND_D, $D^*$, $i^*$) returns ACCEPT with probability at least $2/3$ if $D = D^*$, and returns REJECT with probability at least $2/3$ if $d_{TV}(D,D^*) \ge \epsilon$.

Proof: The conditions of Lemma 19, together with the fact that $D^*(\cdot)$ is monotone non-decreasing, imply that each $i \ge i^*$ has $D^*(i) \ge \epsilon_1$. Thus there can be at most $1/\epsilon_1$ many values $i \in \{i^*,\ldots,N\}$, i.e. it must be the case that $i^* \ge N - 1/\epsilon_1 + 1$. Since the expected value of $\hat D(i)$ (defined in Line 1 of COND_D-Test-Known-Heavy) is precisely $D(i)$, for any fixed value of $i \in \{i^*,\ldots,N\}$ an additive Chernoff bound implies that $|D(i) - \hat D(i)| \le \epsilon_1^2$ with failure probability at most $\frac{1}{10(1+1/\epsilon_1)}$. Similarly $|\hat D' - D(\{1,\ldots,i^*-1\})| \le \epsilon_1$ with failure probability at most $\frac{1}{10(1+1/\epsilon_1)}$. A union bound over all failure events gives that with probability at least $9/10$ each value $i \in \{i^*,\ldots,N\}$ has $|D(i) - \hat D(i)| \le \epsilon_1^2$ and additionally $|\hat D' - D(\{1,\ldots,i^*-1\})| \le \epsilon_1$; we refer to this compound event as (*).

If $D^* = D$, then by (*) the algorithm outputs ACCEPT with probability at least $9/10$. Now suppose that $d_{TV}(D,D^*) \ge \epsilon$. With probability at least $9/10$ we have (*), so we suppose that indeed (*) holds. In this case we have
$$\epsilon \le d_{TV}(D,D^*) \le \sum_{i=i^*}^{N} |D(i) - D^*(i)| + D(\{1,\ldots,i^*-1\}).$$
If any $i \in \{i^*,\ldots,N\}$ has $|\hat D(i) - D^*(i)| > \epsilon_1^2$ then the algorithm outputs REJECT, so we may assume that $|\hat D(i) - D^*(i)| \le \epsilon_1^2$ for all $i$. Together with (*), this gives $|D(i) - D^*(i)| \le 2\epsilon_1^2$ for each $i \ge i^*$; since there are at most $1 + 1/\epsilon_1$ such values, $\sum_{i=i^*}^{N} |D(i) - D^*(i)| \le 2\epsilon_1^2 + 2\epsilon_1 \le 3\epsilon_1$. Hence $D(\{1,\ldots,i^*-1\}) \ge \epsilon - 3\epsilon_1 = 7\epsilon_1$, and so by (*) we get $6\epsilon_1 = \frac{6}{10}\epsilon \le \hat D'$; but since $D^*(\{1,\ldots,i^*-1\}) \le \epsilon_1$, the algorithm must REJECT.

Now we turn to the more difficult (and typical) case, that Equation (46) holds (for $k^* = i^*-1$), i.e.
$$\epsilon_1 < D^*(\{1,\ldots,k^*\}) \le 2\epsilon_1 < D^*(\{1,\ldots,k^*+1\}).$$
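The case dispatch in Line 1 of Algorithm 6 is a one-pass prefix-sum computation. The following sketch (ours; 0-indexed, with the hypothetical helper name `choose_case`) finds $i^*$ and decides whether Equation (45) or Equation (46) applies:

```python
from itertools import accumulate

def choose_case(Dstar, eps):
    """Line 1 of Algorithm 6: find the minimum (0-based) index i whose
    prefix mass under Dstar exceeds 2*eps_1, then decide which case
    holds: Equation (45) ("heavy": a few points carry almost all the
    mass) or Equation (46) ("main": the typical case, with
    k* = i* - 1).  Dstar must be sorted non-decreasingly."""
    eps1 = eps / 10
    prefix = list(accumulate(Dstar))
    i_star = next(i for i, p in enumerate(prefix) if p > 2 * eps1)
    before = prefix[i_star - 1] if i_star > 0 else 0.0
    if before <= eps1:
        return ("heavy", i_star)        # Equation (45) holds for i*
    return ("main", i_star - 1)         # Equation (46) holds for k* = i* - 1
```

Minimality of $i^*$ gives the prefix bound $\le 2\epsilon_1$ required by Equation (46) for free, so only the $\le \epsilon_1$ threshold needs an explicit check.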
With the claims we have already established it is straightforward to argue completeness:

Lemma 20. Suppose that $D = D^*$ and Equation (46) holds. Then with probability at least $2/3$ algorithm COND_D-Test-Known-Main outputs ACCEPT.

Proof: We first observe that the expected value of the quantity $\hat D(\{1,\ldots,k^*\})$ defined in Line 1 is precisely $D(\{1,\ldots,k^*\}) = D^*(\{1,\ldots,k^*\})$ and hence lies in $[\epsilon_1, 2\epsilon_1]$ by Equation (46). The additive Chernoff bound implies that the probability the algorithm outputs REJECT in Line 1 is at most $1/10$. Thus we may assume the algorithm continues to Line 2.

In any given execution of Line 4, since the expected value of $\hat D(\{1,\ldots,i_j\})$ is precisely $D(\{1,\ldots,i_j\}) = D^*(\{1,\ldots,i_j\}) > \epsilon_1$, a multiplicative Chernoff bound gives that the algorithm outputs REJECT with probability at most $1/(10\ell)$. Thus the probability that the algorithm outputs REJECT in any execution of Line 4 is at most $1/10$. We henceforth assume that the algorithm never outputs REJECT in this step.

Fix a setting of $j \in \{1,\ldots,\ell\}$ such that $i_j > k^*$. Consider first the case that $D^*(i_j) \ge \epsilon_1$, so the algorithm enters Line 6. By item (1) of Claim 17 and item (1) of Lemma 2, we have that with probability at least $1-\frac{1}{10\ell}$ Compare outputs a value $v$ in the range $[1-\frac{\epsilon_2}{16}, 1+\frac{\epsilon_2}{16}] \cdot \frac{D^*(\{1,\ldots,i_j-1\})}{D^*(\{i_j\})}$ (recall that $D = D^*$), so the algorithm does not output REJECT in Line 6.

Now suppose that $D^*(i_j) < \epsilon_1$, so the algorithm enters Line 8. Fix a value $1 \le r \le h$ in Line 10. By Claim 17 we have that $S_{a_r}$ is a $\frac{1}{2}$-comparable witness for $i_j$. By item (1) of Lemma 2, we have that with probability at least $1-\frac{1}{10\ell h}$ Compare outputs a value $v$ in the range $[1-\frac{\epsilon_4}{4}, 1+\frac{\epsilon_4}{4}] \cdot \frac{D^*(S_{a_r})}{D^*(\{i_j\})}$ (recall that $D = D^*$).
A union bound over all $h$ values of $r$ gives that the algorithm outputs REJECT in Line 10 with probability at most $1/(10\ell)$. So in either case, for this setting of $j$, the algorithm outputs REJECT on that iteration of the outer loop with probability at most $1/(10\ell)$. A union bound over all $\ell$ iterations of the outer loop gives that the probability that the algorithm outputs REJECT in any execution of Line 6 or Line 10 is at most $1/10$. Thus the overall probability that the algorithm outputs REJECT is at most $3/10$, and the lemma is proved.

Next we argue soundness:

Lemma 21. Suppose that $d_{TV}(D,D^*) \ge \epsilon$ and Equation (46) holds. Then with probability at least $2/3$ algorithm COND_D-Test-Known-Main outputs REJECT.

Proof: If $D(\{1,\ldots,k^*\}) \notin [\epsilon_1, 3\epsilon_1]$ then a standard additive Chernoff bound implies that the algorithm outputs REJECT in Line 1 with probability at least $9/10$. Thus we may assume going forward in the argument that $D(\{1,\ldots,k^*\}) \in [\epsilon_1, 3\epsilon_1]$. As a result we may apply Claim 18, and we have that with probability at least $99/100$ there is an element $i_j \in \{i_1,\ldots,i_\ell\}$ such that $i_j > k^*$ and $i_j$ is $\epsilon_2$-heavy, i.e. $D(i_j) \ge (1+\epsilon_2) D^*(i_j)$. We condition on this event going forward (the rest of our analysis will deal with this specific element $i_j$). We now consider two cases:

Case 1: Distribution $D$ has $D(\{1,\ldots,i_j\}) \notin [1-3\epsilon_3, 1+3\epsilon_3] \cdot D^*(\{1,\ldots,i_j\})$. Since the quantity $\hat D(\{1,\ldots,i_j\})$ obtained in Line 4 has expected value $D(\{1,\ldots,i_j\}) \ge D(\{1,\ldots,k^*\}) \ge \epsilon_1$, applying the multiplicative Chernoff bound implies that $\hat D(\{1,\ldots,i_j\}) \in [1-\epsilon_3, 1+\epsilon_3] \cdot D(\{1,\ldots,i_j\})$ except with failure probability at most $\epsilon/10 \le 1/10$. If this failure event does not occur, then since $D(\{1,\ldots,i_j\}) \notin [1-3\epsilon_3, 1+3\epsilon_3] \cdot D^*(\{1,\ldots,i_j\})$ it must hold that $\hat D(\{1,\ldots,i_j\}) \notin [1-\epsilon_3, 1+\epsilon_3] \cdot D^*(\{1,\ldots,i_j\})$, and consequently the algorithm outputs REJECT. Thus in Case 1 the algorithm outputs REJECT with overall probability at least $89/100$.

Case 2: Distribution $D$ has $D(\{1,\ldots,i_j\}) \in [1-3\epsilon_3, 1+3\epsilon_3] \cdot D^*(\{1,\ldots,i_j\})$. This case is divided into two sub-cases depending on the value of $D^*(i_j)$.

Case 2(a): $D^*(i_j) \ge \epsilon_1$. In this case the algorithm reaches Line 6. We use the following claim:

Claim 22. In Case 2(a), suppose that $i_j > k^*$ is such that $D(i_j) \ge (1+\epsilon_2) D^*(i_j)$ and $D(\{1,\ldots,i_j\}) \in [1-3\epsilon_3, 1+3\epsilon_3] \cdot D^*(\{1,\ldots,i_j\})$. Then
$$\frac{D(\{1,\ldots,i_j-1\})}{D(i_j)} \le \left(1-\frac{\epsilon_2}{4}\right) \cdot \frac{D^*(\{1,\ldots,i_j-1\})}{D^*(i_j)}.$$

Proof: To simplify notation we write $a \stackrel{\rm def}{=} D(i_j)$; $b \stackrel{\rm def}{=} D^*(i_j)$; $c \stackrel{\rm def}{=} D(\{1,\ldots,i_j-1\})$; $d \stackrel{\rm def}{=} D^*(\{1,\ldots,i_j-1\})$. We have that
$$a \ge (1+\epsilon_2) b \quad\text{and}\quad a + c \le (1+3\epsilon_3)(b+d). \quad (49)$$
This gives
$$c \le (1+3\epsilon_3)(b+d) - (1+\epsilon_2) b = (1+3\epsilon_3) d + (3\epsilon_3-\epsilon_2) b < (1+3\epsilon_3) d, \quad (50)$$
where in the last inequality we used $\epsilon_2 > 3\epsilon_3$. Recalling that $a \ge (1+\epsilon_2) b$ and using $\epsilon_3 = \epsilon_2/24$, we get
$$\frac{c}{a} < \frac{(1+3\epsilon_3) d}{(1+\epsilon_2) b} = \frac{d}{b} \cdot \frac{1+\epsilon_2/8}{1+\epsilon_2} < \frac{d}{b} \cdot \left(1-\frac{\epsilon_2}{4}\right). \quad (51)$$
This proves the claim.

Applying Claim 22, we get that in Line 6 we have
$$\frac{D(\{1,\ldots,i_j-1\})}{D(i_j)} \le \left(1-\frac{\epsilon_2}{4}\right) \cdot \frac{D^*(\{1,\ldots,i_j-1\})}{D^*(i_j)}. \quad (52)$$
Recalling that by the premise of this case $D^*(i_j) \ge \epsilon_1$, by applying Claim 17 we have that $\{1,\ldots,i_j-1\}$ is an $\epsilon_1$-comparable witness for $i_j$. Therefore, by Lemma 2, with probability at least $1-\frac{1}{10\ell}$ the call to Compare($\{i_j\}$, $\{1,\ldots,i_j-1\}$, $\frac{\epsilon_2}{16}$, $\frac{2}{\epsilon_1}$, $\frac{1}{10\ell}$) in Line 6 either outputs an element of {High, Low} or outputs a value
$$v \le \left(1-\frac{\epsilon_2}{4}\right)\left(1+\frac{\epsilon_2}{16}\right) \frac{D^*(\{1,\ldots,i_j-1\})}{D^*(i_j)} < \left(1-\frac{\epsilon_2}{8}\right) \frac{D^*(\{1,\ldots,i_j-1\})}{D^*(i_j)}.$$
In either case the algorithm outputs REJECT in Line 6, so we are done with Case 2(a).

Case 2(b): $D^*(i_j) < \epsilon_1$. In this case the algorithm reaches Line 10, and by item 2 of Claim 17, we have that $S_1,\ldots,S_t$ is a partition of $\{1,\ldots,i_j-1\}$ and each set $S_1,\ldots,S_t$ is a $\frac{1}{2}$-comparable witness for $i_j$, i.e., for all $i \in \{1,\ldots,t\}$,
$$\frac{1}{2} D^*(i_j) \le D^*(S_i) \le 2 D^*(i_j). \quad (53)$$
We use the following claim:

Claim 23. In Case 2(b), suppose $i_j > k^*$ is such that $D(i_j) \ge (1+\epsilon_2) D^*(i_j)$ and $D(\{1,\ldots,i_j\}) \in [1-3\epsilon_3, 1+3\epsilon_3] \cdot D^*(\{1,\ldots,i_j\})$. Then at least an $(\epsilon_4/8)$-fraction of the sets $S_1,\ldots,S_t$ are such that $D(S_i) \le (1+\epsilon_4) D^*(S_i)$.

Proof: The proof is by contradiction. Let $\rho = 1 - \epsilon_4/8$ and suppose that there are $w$ sets (without loss of generality we call them $S_1,\ldots,S_w$) that satisfy $D(S_i) > (1+\epsilon_4) D^*(S_i)$, where $\rho' = \frac{w}{t} > \rho$. We first observe that the weight of the $w$ subsets $S_1,\ldots,S_w$ under $D^*$, as a fraction of $D^*(\{1,\ldots,i_j-1\})$, is at least
$$\frac{D^*(S_1 \cup \cdots \cup S_w)}{D^*(S_1 \cup \cdots \cup S_w) + (t-w) \cdot 2D^*(i_j)} \ge \frac{\frac{w D^*(i_j)}{2}}{\frac{w D^*(i_j)}{2} + (t-w) \cdot 2D^*(i_j)} = \frac{w}{4t-3w} = \frac{\rho'}{4-3\rho'},$$
where we used the right inequality in Equation (53) on $S_{w+1},\ldots,S_t$ to obtain the leftmost expression above, and the left inequality in Equation (53) (together with the fact that $\frac{x}{x+c}$ is an increasing function of $x$ for all $c > 0$) to obtain the inequality above. This implies that
$$D(\{1,\ldots,i_j-1\}) = \sum_{i=1}^{w} D(S_i) + \sum_{i=w+1}^{t} D(S_i) \ge (1+\epsilon_4) \sum_{i=1}^{w} D^*(S_i) + \sum_{i=w+1}^{t} D(S_i) \ge (1+\epsilon_4) \frac{\rho'}{4-3\rho'} D^*(\{1,\ldots,i_j-1\}) \ge (1+\epsilon_4) \frac{\rho}{4-3\rho} D^*(\{1,\ldots,i_j-1\}). \quad (54)$$
From Equation (54) we have
$$D(\{1,\ldots,i_j\}) \ge (1+\epsilon_4) \frac{\rho}{4-3\rho} D^*(\{1,\ldots,i_j-1\}) + (1+\epsilon_2) D^*(i_j) \ge \left(1+\frac{3\epsilon_4}{8}\right) D^*(\{1,\ldots,i_j-1\}) + (1+\epsilon_2) D^*(i_j),$$
where for the first inequality above we used $D(i_j) \ge (1+\epsilon_2) D^*(i_j)$ and for the second inequality we used $(1+\epsilon_4)\frac{\rho}{4-3\rho} \ge 1+\frac{3\epsilon_4}{8}$. This implies that
$$D(\{1,\ldots,i_j\}) > \left(1+\frac{3\epsilon_4}{8}\right) D^*(\{1,\ldots,i_j-1\}) + \left(1+\frac{3\epsilon_4}{8}\right) D^*(i_j) = \left(1+\frac{3\epsilon_4}{8}\right) D^*(\{1,\ldots,i_j\}),$$
where the inequality follows from $\epsilon_2 > \frac{3\epsilon_4}{8}$. Since $\frac{3\epsilon_4}{8} = 3\epsilon_3$, though, this contradicts the premise $D(\{1,\ldots,i_j\}) \le (1+3\epsilon_3) D^*(\{1,\ldots,i_j\})$, and the claim is proved.

Applying Claim 23, and recalling that $h = \Theta(1/\epsilon) = \Theta(1/\epsilon_4)$ sets are chosen randomly in Line 9, we have that with probability at least $9/10$ there is some $r \in \{1,\ldots,h\}$ such that $D(S_{a_r}) \le (1+\epsilon_4) D^*(S_{a_r})$. Combining this with $D(i_j) \ge (1+\epsilon_2) D^*(i_j)$, we get that
$$\frac{D(S_{a_r})}{D(i_j)} \le \frac{1+\epsilon_4}{1+\epsilon_2} \cdot \frac{D^*(S_{a_r})}{D^*(i_j)} \le \left(1-\frac{\epsilon_4}{2}\right) \cdot \frac{D^*(S_{a_r})}{D^*(i_j)}.$$
By Lemma 2, with probability at least $1-\frac{1}{10\ell h}$ the call to Compare($\{i_j\}$, $S_{a_r}$, $\frac{\epsilon_4}{8}$, $4$, $\frac{1}{10\ell h}$) in Line 10 either outputs an element of {High, Low} or outputs a value
$$v \le \left(1-\frac{\epsilon_4}{2}\right)\left(1+\frac{\epsilon_4}{8}\right) \frac{D^*(S_{a_r})}{D^*(i_j)} < \left(1-\frac{\epsilon_4}{4}\right) \frac{D^*(S_{a_r})}{D^*(i_j)}.$$
In either case the algorithm outputs REJECT in Line 10, so we are done in Case 2(b). This concludes the proof of soundness and the proof of Theorem 10.
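As a quick numeric sanity check of the arithmetic in Claim 22, the following snippet (ours, not the paper's) verifies the claim's conclusion on random instances that satisfy its hypotheses, with $\epsilon_2 = \epsilon/2$ and $\epsilon_3 = \epsilon/48 = \epsilon_2/24$ as in Equation (47):

```python
import random

def claim22_holds(a, b, c, d, eps2):
    """Conclusion of Claim 22, in the proof's shorthand: a = D(i_j),
    b = D*(i_j), c = D({1,...,i_j-1}), d = D*({1,...,i_j-1})."""
    return c / a <= (1 - eps2 / 4) * d / b

# Spot-check on random instances satisfying the hypotheses of Eq. (49).
rng = random.Random(1)
eps2 = 0.25            # eps2 = eps/2 for eps = 1/2
eps3 = eps2 / 24       # matches eps3 = eps/48
for _ in range(1000):
    b = rng.uniform(0.01, 0.2)                 # D*(i_j)
    d = rng.uniform(0.01, 0.7)                 # D*({1,...,i_j-1})
    a = (1 + eps2) * b * rng.uniform(1.0, 1.2) # D(i_j) >= (1 + eps2) b
    c = (1 + 3 * eps3) * (b + d) - a           # largest c allowed by (49)
    if c <= 0:
        continue
    assert claim22_holds(a, b, c, d, eps2)
```

The check exercises the worst case of the derivation, since `c` is taken as large as the second hypothesis in Equation (49) permits.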
6 Testing equality between two unknown distributions

6.1 An approach based on PCOND queries

In this subsection we consider the problem of testing whether two unknown distributions $D_1, D_2$ are identical versus $\epsilon$-far, given PCOND access to these distributions. Although this is known to require $\Omega(N^{2/3})$ many samples in the standard model [BFR+10, Val11], we are able to give a poly$(\log N, 1/\epsilon)$-query algorithm using PCOND queries, by taking advantage of comparisons to perform some sort of clustering of the domain.

On a high level the algorithm works as follows. First it obtains (with high probability) a small set of points $R$ such that almost every element in $[N]$, except possibly for some negligible subset according to $D_1$, has probability weight (under $D_1$) close to some "representative" in $R$. Next, for each representative $r$ in $R$ it obtains an estimate of the weight, according to $D_1$, of a set of points $U(r)$ such that $D_1(u)$ is close to $D_1(r)$ for each $u$ in $U(r)$ (i.e., $r$'s "neighborhood under $D_1$"). This is done using the procedure Estimate-Neighborhood from Subsection 3.2. Note that these neighborhoods can be interpreted roughly as a succinct cover of the support of $D_1$ into (not necessarily disjoint) sets of points, where within each set the points have similar weight (according to $D_1$). Our algorithm is based on the observation that, if $D_1$ and $D_2$ are far from each other, it must be the case that one of these sets, denoted $U(r^*)$, reflects it in one of the following ways: (1) $D_2(U(r^*))$ differs significantly from $D_1(U(r^*))$; (2) $U(r^*)$ contains a subset of points $V(r^*)$ such that $D_2(v)$ differs significantly from $D_2(r^*)$ for each $v$ in $V(r^*)$, and either $D_1(V(r^*))$ is relatively large or $D_2(V(r^*))$ is relatively large.
(This structural result is made precise in Lemma 25.) We thus take additional samples, both from $D_1$ and from $D_2$, and compare the weight (according to both distributions) of each point in these samples to the representatives in $R$ (using the procedure Compare from Subsection 3.1). In this manner we detect (with high probability) that either (1) or (2) holds.

We begin by formalizing the notion of a cover discussed above:

Definition 6 (Weight-Cover). Given a distribution $D$ on $[N]$ and a parameter $\epsilon_1 > 0$, we say that a point $i \in [N]$ is $\epsilon_1$-covered by a set $R = \{r_1,\ldots,r_t\} \subseteq [N]$ if there exists a point $r_j \in R$ such that $D(i) \in [1/(1+\epsilon_1), 1+\epsilon_1] \cdot D(r_j)$. Let the set of points in $[N]$ that are $\epsilon_1$-covered by $R$ be denoted by $U^D_{\epsilon_1}(R)$. We say that $R$ is an $(\epsilon_1,\epsilon_2)$-cover for $D$ if $D([N] \setminus U^D_{\epsilon_1}(R)) \le \epsilon_2$.

For a singleton set $R = \{r\}$ we slightly abuse notation and write $U^D_\epsilon(r)$ to denote $U^D_\epsilon(R)$; note that this aligns with the notation established in (11). The following lemma says that a small sample of points drawn from $D$ gives a cover with high probability:

Lemma 24. Let $D$ be any distribution over $[N]$. Given any fixed $c > 0$, there exists a constant $c' > 0$ such that with probability at least $99/100$, a sample $R$ of size $m = \frac{c' \log(N/\epsilon)}{\epsilon^2} \cdot \log\left(\frac{\log(N/\epsilon)}{\epsilon}\right)$ drawn according to distribution $D$ is an $(\epsilon/c, \epsilon/c)$-cover for $D$.

Proof: Let $t$ denote $\lceil \ln(2cN/\epsilon) \cdot \frac{c}{\epsilon} \rceil$. We define $t$ "buckets" of points with similar weight under $D$ as follows: for $i = 0,1,\ldots,t-1$, define $B_i \subseteq [N]$ to be
$$B_i \stackrel{\rm def}{=} \left\{ x \in [N] : \frac{1}{(1+\epsilon/c)^{i+1}} < D(x) \le \frac{1}{(1+\epsilon/c)^{i}} \right\}.$$
Let $L$ be the set of points $x$ which are not in any of $B_0,\ldots,B_{t-1}$ (because $D(x)$ is too small); since every point in $L$ has $D(x) < \frac{\epsilon}{2cN}$, one can see that $D(L) \le \frac{\epsilon}{2c}$.
It is easy to see that if the sample $R$ contains a point from a bucket $B_j$ then every point $y \in B_j$ is $\frac{\epsilon}{c}$-covered by $R$. We say that bucket $B_i$ is insignificant if $D(B_i) \le \frac{\epsilon}{2ct}$; otherwise bucket $B_i$ is significant. It is clear that the total weight under $D$ of all insignificant buckets is at most $\frac{\epsilon}{2c}$. Thus if we can show that for the claimed sample size, with probability at least $99/100$ every significant bucket has at least one of its points in $R$, we will have established the lemma. This is a simple probabilistic calculation: fix any significant bucket $B_j$. The probability that $m$ random draws from $D$ all miss $B_j$ is at most $(1-\frac{\epsilon}{2ct})^m$, which is at most $\frac{1}{100t}$ for a suitable (absolute constant) choice of $c'$. Thus a union bound over all (at most $t$) significant buckets gives that with probability at least $99/100$, no significant bucket is missed by $R$.

Lemma 25. Suppose $d_{TV}(D_1,D_2) \ge \epsilon$, and let $R = \{r_1,\ldots,r_t\}$ be an $(\tilde\epsilon,\tilde\epsilon)$-cover for $D_1$ where $\tilde\epsilon \le \epsilon/100$. Then there exists $j \in [t]$ such that at least one of the following conditions holds for every $\alpha \in [\tilde\epsilon, 2\tilde\epsilon]$:

1. $D_1(U^{D_1}_\alpha(r_j)) \ge \frac{\tilde\epsilon}{t}$ and $D_2(U^{D_1}_\alpha(r_j)) \notin [1-\tilde\epsilon, 1+\tilde\epsilon] \cdot D_1(U^{D_1}_\alpha(r_j))$, or $D_1(U^{D_1}_\alpha(r_j)) < \frac{\tilde\epsilon}{t}$ and $D_2(U^{D_1}_\alpha(r_j)) > \frac{2\tilde\epsilon}{t}$;
2. $D_1(U^{D_1}_\alpha(r_j)) \ge \frac{\tilde\epsilon}{t}$, and at least an $\tilde\epsilon$-fraction of the points $i$ in $U^{D_1}_\alpha(r_j)$ satisfy $\frac{D_2(i)}{D_2(r_j)} \notin [1/(1+\alpha+\tilde\epsilon), 1+\alpha+\tilde\epsilon]$;
3. $D_1(U^{D_1}_\alpha(r_j)) \ge \frac{\tilde\epsilon}{t}$, and the total weight according to $D_2$ of the points $i$ in $U^{D_1}_\alpha(r_j)$ for which $\frac{D_2(i)}{D_2(r_j)} \notin [1/(1+\alpha+\tilde\epsilon), 1+\alpha+\tilde\epsilon]$ is at least $\frac{\tilde\epsilon^2}{t}$.

Proof: Without loss of generality, we can assume that $\epsilon \le 1/4$. Suppose, contrary to the claim, that for each $r_j$ there exists $\alpha_j \in [\tilde\epsilon, 2\tilde\epsilon]$ such that if we let $U_j \stackrel{\rm def}{=} U^{D_1}_{\alpha_j}(r_j)$, then the following holds:

1. If $D_1(U_j) < \frac{\tilde\epsilon}{t}$, then $D_2(U_j) \le \frac{2\tilde\epsilon}{t}$;
2. If $D_1(U_j) \ge \frac{\tilde\epsilon}{t}$, then:
(a) $D_2(U_j) \in [1-\tilde\epsilon, 1+\tilde\epsilon] \cdot D_1(U_j)$;
(b) Less than an $\tilde\epsilon$-fraction of the points $y$ in $U_j$ satisfy $\frac{D_2(y)}{D_2(r_j)} \notin [1/(1+\alpha_j+\tilde\epsilon), 1+\alpha_j+\tilde\epsilon]$;
(c) The total weight according to $D_2$ of the points $y$ in $U_j$ for which $\frac{D_2(y)}{D_2(r_j)} \notin [1/(1+\alpha_j+\tilde\epsilon), 1+\alpha_j+\tilde\epsilon]$ is at most $\frac{\tilde\epsilon^2}{t}$.

We show that in such a case $d_{TV}(D_1,D_2) < \epsilon$, contrary to the premise of the lemma. Consider each point $r_j \in R$ such that $D_1(U_j) \ge \frac{\tilde\epsilon}{t}$. By the foregoing discussion (point 2(a)), $D_2(U_j) \in [1-\tilde\epsilon, 1+\tilde\epsilon] \cdot D_1(U_j)$. By the definition of $U_j$ (and since $\alpha_j \le 2\tilde\epsilon$),
$$D_1(r_j) \in [1/(1+2\tilde\epsilon), 1+2\tilde\epsilon] \cdot \frac{D_1(U_j)}{|U_j|}. \quad (55)$$
Turning to bound $D_2(r_j)$, on one hand (by 2(b))
$$D_2(U_j) = \sum_{y \in U_j} D_2(y) \ge \tilde\epsilon |U_j| \cdot 0 + (1-\tilde\epsilon) |U_j| \cdot \frac{D_2(r_j)}{1+3\tilde\epsilon}, \quad (56)$$
and so
$$D_2(r_j) \le \frac{(1+3\tilde\epsilon) D_2(U_j)}{(1-\tilde\epsilon) |U_j|} \le (1+6\tilde\epsilon) \frac{D_1(U_j)}{|U_j|}. \quad (57)$$
On the other hand (by 2(c)),
$$D_2(U_j) = \sum_{y \in U_j} D_2(y) \le \frac{\tilde\epsilon^2}{t} + |U_j| \cdot (1+3\tilde\epsilon) D_2(r_j), \quad (58)$$
and so
$$D_2(r_j) \ge \frac{D_2(U_j) - \tilde\epsilon^2/t}{(1+3\tilde\epsilon)|U_j|} \ge \frac{(1-\tilde\epsilon) D_1(U_j) - \tilde\epsilon D_1(U_j)}{(1+3\tilde\epsilon)|U_j|} \ge (1-5\tilde\epsilon) \frac{D_1(U_j)}{|U_j|}. \quad (59)$$
Therefore, for each such $r_j$ we have
$$D_2(r_j) \in [1-8\tilde\epsilon, 1+10\tilde\epsilon] \cdot D_1(r_j). \quad (60)$$

Let $C \stackrel{\rm def}{=} \bigcup_{j=1}^{t} U_j$. We next partition the points in $C$ so that each point $i \in C$ is assigned to some $r_{j(i)}$ such that $i \in U_{j(i)}$. We define the following "bad" subsets of points in $[N]$:

1. $B_1 \stackrel{\rm def}{=} [N] \setminus C$, so that $D_1(B_1) \le \tilde\epsilon$ (we later bound $D_2(B_1)$);
2. $B_2 \stackrel{\rm def}{=} \{i \in C : D_1(U_{j(i)}) < \tilde\epsilon/t\}$, so that $D_1(B_2) \le \tilde\epsilon$ and $D_2(B_2) \le 2\tilde\epsilon$;
3. $B_3 \stackrel{\rm def}{=} \{i \in C \setminus B_2 : D_2(i) \notin [1/(1+3\tilde\epsilon), 1+3\tilde\epsilon] \cdot D_2(r_{j(i)})\}$, so that $D_1(B_3) \le 2\tilde\epsilon$ and $D_2(B_3) \le \tilde\epsilon^2$.

Let $B \stackrel{\rm def}{=} B_1 \cup B_2 \cup B_3$. Observe that for each $i \in [N] \setminus B$ we have that
$$D_2(i) \in [1/(1+3\tilde\epsilon), 1+3\tilde\epsilon] \cdot D_2(r_{j(i)}) \subset [1-15\tilde\epsilon, 1+15\tilde\epsilon] \cdot D_1(r_{j(i)}) \subset [1-23\tilde\epsilon, 1+23\tilde\epsilon] \cdot D_1(i), \quad (61)$$
where the first containment follows from the fact that $i \notin B$, the second follows from Equation (60), and the third from the fact that $i \in U_{j(i)}$. In order to complete the proof we need a bound on $D_2(B_1)$, which we obtain next:
$$D_2(B_1) = 1 - D_2([N] \setminus B_1) \le 1 - D_2([N] \setminus B) \le 1 - (1-23\tilde\epsilon) D_1([N] \setminus B) \le 1 - (1-23\tilde\epsilon)(1-4\tilde\epsilon) \le 27\tilde\epsilon. \quad (62)$$
Therefore,
$$d_{TV}(D_1,D_2) = \frac{1}{2} \sum_{i=1}^{N} |D_1(i) - D_2(i)| \le \frac{1}{2}\left( D_1(B) + D_2(B) + \sum_{i \notin B} 23\tilde\epsilon D_1(i) \right) < \epsilon, \quad (63)$$
and we have reached a contradiction.

Theorem 11. If $D_1 = D_2$, then with probability at least $2/3$ Algorithm PCOND-Test-Equality-Unknown returns ACCEPT, and if $d_{TV}(D_1,D_2) \ge \epsilon$, then with probability at least $2/3$ Algorithm PCOND-Test-Equality-Unknown returns REJECT. The number of PCOND queries performed by the algorithm is $\tilde O\left(\frac{\log^6 N}{\epsilon^{21}}\right)$.

Proof: The number of queries performed by the algorithm is the sum of: (1) $t$ times the number of queries performed in each execution of Estimate-Neighborhood (in Line 3-a), and (2) $t \cdot (s_1+s_2) = O(t \cdot s_2)$ times the number of queries performed in each execution of Compare (in Line 3-e).
By Lemma 3 (and the settings of the p arameters in the calls to Estima te-Ne ighborhood ), the first term is O  t · log(1 /δ ) · log (log(1 /δ ) / ( β η 2 )) κ 2 η 4 β 3 δ 2  = ˜ O  log 6 N ǫ 19  , and by Lemma 2 (and the settings of th e parameters in the calls to Comp are ), th e second term is O  t · s 2 · log( t · s 2 ) θ 2  = ˜ O  log 6 N ǫ 21  , so that w e get the b ound stated in the theorem. W e no w turn to establishing the correctness of th e algorithm. W e shall use the shorth and U j for U D 1 α j ( r j ), and U ′ j for U D 1 α j + θ ( r j ). W e consider the follo wing “desirable” ev en ts. 1. The ev en t E 1 is that the sample R is a (˜ ǫ, ˜ ǫ )-w eight- co ver for D 1 (for ˜ ǫ = ǫ / 100) . By Lemma 24 (and an appropriate constant in the Θ( · ) n otation for the size of R ), the p robabilit y th at E 1 holds is at least 99 / 100. 2. The ev en t E 2 is that all calls to the pr o cedure Estima te -Neighborhood are as sp ecified b y Lemma 3 . By the setting of the confi dence parameter in the calls to the pr o cedure, th e ev ent E 2 holds with probability at least 99 / 100. 3. The ev ent E 3 is that all calls to the pro cedure Comp are are as sp ecified by Lemma 2 . By the setting of the confidence p arameter in the calls to the pro cedu re, the ev en t E 3 holds with probabilit y at least 99 / 100 . 4. The ev en t E 4 is that D 2 ( U ′ j \ U j ) ≤ η β / 16 = ˜ ǫ 2 / (256 t ) for eac h j . I f D 2 = D 1 then this even t follo ws from E 2 . Oth erwise, it holds with probability at least 99 / 100 by the setting of θ and the choice of α j (as shown in the pro of of Lemma 3 in the analysis of the ev en t E 1 there. 54 Algorithm 9: Algorithm PCOND D 1 ,D 2 -Test-Equality-Unknown Input : PCOND query access to distr ib utions D 1 and D 2 and a p arameter ǫ . 1. Set ˜ ǫ = ǫ/ 100. 2. Dra w a sample R of size t = ˜ Θ  log N ǫ 2  from D 1 . 3. 
For each $r_j \in R$:
(a) Call Estimate-Neighborhood$_{D_1}$ on $r_j$ with $\kappa = \tilde\epsilon$, $\eta = \tilde\epsilon/8$, $\beta = \tilde\epsilon/(2t)$, $\delta = 1/(100t)$, and let the output be denoted by $(\hat w^{(1)}_j, \alpha_j)$.
(b) Set $\theta = \kappa\eta\beta\delta/64 = \tilde\Theta(\epsilon^7/\log^2 N)$.
(c) Draw a sample $S_1$ from $D_1$, of size $s_1 = \Theta(t/\tilde\epsilon^2) = \tilde\Theta(\log N/\epsilon^4)$.
(d) Draw a sample $S_2$ from $D_2$, of size $s_2 = \Theta(t \log t/\tilde\epsilon^3) = \tilde\Theta(\log N/\epsilon^5)$.
(e) For each point $i \in S_1 \cup S_2$ call Compare$_{D_1}(\{r_j\}, \{i\}, \theta/4, 4, 1/(200 t (s_1+s_2)))$ and Compare$_{D_2}(\{r_j\}, \{i\}, \theta/4, 4, 1/(200 t (s_1+s_2)))$, and let the outputs be denoted $\rho^{(1)}_{r_j}(i)$ and $\rho^{(2)}_{r_j}(i)$, respectively (where in particular these outputs may be High or Low).
(f) Let $\hat w^{(2)}_j$ be the fraction of occurrences of $i \in S_2$ such that $\rho^{(1)}_{r_j}(i) \in [1/(1+\alpha_j+\theta/2),\, 1+\alpha_j+\theta/2]$.
(g) If ($\hat w^{(1)}_j \le \frac{3\tilde\epsilon}{4t}$ and $\hat w^{(2)}_j > \frac{3\tilde\epsilon}{2t}$) or ($\hat w^{(1)}_j > \frac{3\tilde\epsilon}{4t}$ and $\hat w^{(2)}_j/\hat w^{(1)}_j \notin [1-\tilde\epsilon/2,\, 1+\tilde\epsilon/2]$), then output REJECT.
(h) If there exists $i \in S_1 \cup S_2$ such that $\rho^{(1)}_{r_j}(i) \in [1/(1+\alpha_j+\tilde\epsilon/2),\, 1+\alpha_j+\tilde\epsilon/2]$ and $\rho^{(2)}_{r_j}(i) \notin [1/(1+\alpha_j+3\tilde\epsilon/2),\, 1+\alpha_j+3\tilde\epsilon/2]$, then output REJECT.
4. Output ACCEPT.

5. The event $E_5$ is defined as follows. For each $j$, if $D_2(U_j) \ge \tilde\epsilon/(4t)$, then $|S_2 \cap U_j|/|S_2| \in [1-\tilde\epsilon/10,\, 1+\tilde\epsilon/10] \cdot D_2(U_j)$, and if $D_2(U_j) < \tilde\epsilon/(4t)$, then $|S_2 \cap U_j|/|S_2| < (1+\tilde\epsilon/10) \cdot \tilde\epsilon/(4t)$. This event holds with probability at least $99/100$ by applying a multiplicative Chernoff bound in the first case and Corollary 2 in the second.

6. The event $E_6$ is that for each $j$ we have $|S_2 \cap (U'_j \setminus U_j)|/|S_2| \le \tilde\epsilon^2/(128 t)$. Conditioned on $E_4$, the event $E_6$ holds with probability at least $99/100$ by applying Corollary 2.

From this point on we assume that events $E_1$–$E_6$ all hold. Note that in particular this implies the following:

1.
By $E_2$, for every $j$:
• If $D_1(U_j) \ge \beta = \tilde\epsilon/(2t)$, then $\hat w^{(1)}_j \in [1-\eta,\, 1+\eta]\, D_1(U_j) = [1-\tilde\epsilon/8,\, 1+\tilde\epsilon/8]\, D_1(U_j)$.
• If $D_1(U_j) < \tilde\epsilon/(2t)$, then $\hat w^{(1)}_j \le (1+\tilde\epsilon/8)(\tilde\epsilon/(2t))$.

2. By $E_3$, for every $j$ and for each point $i \in S_1 \cup S_2$:
• If $i \in U_j$, then $\rho^{(1)}_{r_j}(i) \in [1/(1+\alpha_j+\tfrac{\theta}{2}),\, 1+\alpha_j+\tfrac{\theta}{2}]$.
• If $i \notin U'_j$, then $\rho^{(1)}_{r_j}(i) \notin [1/(1+\alpha_j+\tfrac{\theta}{2}),\, 1+\alpha_j+\tfrac{\theta}{2}]$.

3. By the previous item and $E_4$–$E_6$:
• If $D_2(U_j) \ge \tilde\epsilon/(4t)$, then $\hat w^{(2)}_j \ge (1-\tilde\epsilon/10)\, D_2(U_j)$ and $\hat w^{(2)}_j \le (1+\tilde\epsilon/10)\, D_2(U_j) + \tilde\epsilon^2/(128t) \le (1+\tilde\epsilon/8)\, D_2(U_j)$.
• If $D_2(U_j) < \tilde\epsilon/(4t)$, then $\hat w^{(2)}_j \le (1+\tilde\epsilon/10)\,\tilde\epsilon/(4t) + \tilde\epsilon^2/(128t) \le (1+\tilde\epsilon/4)(\tilde\epsilon/(4t))$.

Completeness. Assume $D_1$ and $D_2$ are the same distribution $D$. For each $j$, if $D(U_j) \ge \tilde\epsilon/t$, then by the foregoing discussion $\hat w^{(1)}_j \ge (1-\tilde\epsilon/8)\, D(U_j) > 3\tilde\epsilon/(4t)$ and $\hat w^{(2)}_j/\hat w^{(1)}_j \in [(1-\tilde\epsilon/8)^2,\, (1+\tilde\epsilon/8)^2] \subset [1-\tilde\epsilon/2,\, 1+\tilde\epsilon/2]$, so that the algorithm does not reject in Line 3-g. Otherwise (i.e., $D(U_j) < \tilde\epsilon/t$), we consider two subcases. Either $D(U_j) \le \tilde\epsilon/(2t)$, in which case $\hat w^{(1)}_j \le 3\tilde\epsilon/(4t)$, or $\tilde\epsilon/(2t) < D(U_j) < \tilde\epsilon/t$, and then $\hat w^{(1)}_j \in [1-\tilde\epsilon/8,\, 1+\tilde\epsilon/8]\, D_1(U_j)$. Since in both cases $\hat w^{(2)}_j \le (1+\tilde\epsilon/8)\, D(U_j) \le 3\tilde\epsilon/(2t)$, the algorithm does not reject in Line 3-g. By $E_3$, the algorithm does not reject in Line 3-h either. We next turn to establishing soundness.

Soundness. Assume $d_{\rm TV}(D_1, D_2) \ge \epsilon$. By applying Lemma 25 to $R$ (and using $E_1$), there exists an index $j$ for which one of the items in the lemma holds. We denote this index by $j^*$, and consider the three items in the lemma.

1. If Item 1 holds, then we consider its two cases:
(a) In the first case, $D_1(U_{j^*}) \ge \tilde\epsilon/t$ and $D_2(U_{j^*}) \notin [1-\tilde\epsilon,\, 1+\tilde\epsilon]\, D_1(U_{j^*})$.
Due to the lower bound on $D_1(U_{j^*})$ we have that $\hat w^{(1)}_{j^*} \in [1-\tilde\epsilon/8,\, 1+\tilde\epsilon/8]\, D_1(U_{j^*})$, so that in particular $\hat w^{(1)}_{j^*} > 3\tilde\epsilon/(4t)$. As for $\hat w^{(2)}_{j^*}$, either $\hat w^{(2)}_{j^*} < (1-\tilde\epsilon)(1+\tilde\epsilon/8)\, D_1(U_{j^*})$ (this holds both when $D_2(U_{j^*}) \ge \tilde\epsilon/(4t)$ and when $D_2(U_{j^*}) < \tilde\epsilon/(4t)$), or $\hat w^{(2)}_{j^*} > (1+\tilde\epsilon)(1-\tilde\epsilon/10)\, D_1(U_{j^*})$. In either (sub)case $\hat w^{(2)}_{j^*}/\hat w^{(1)}_{j^*} \notin [1-\tilde\epsilon/2,\, 1+\tilde\epsilon/2]$, causing the algorithm to reject in (the second part of) Line 3-g.

(b) In the second case, $D_1(U_{j^*}) < \tilde\epsilon/t$ and $D_2(U_{j^*}) > 2\tilde\epsilon/t$. Due to the lower bound on $D_2(U_{j^*})$ we have that $\hat w^{(2)}_{j^*} \ge (1-\tilde\epsilon/10)\, D_2(U_{j^*}) > (1-\tilde\epsilon/10)(2\tilde\epsilon/t)$, so that in particular $\hat w^{(2)}_{j^*} > 3\tilde\epsilon/(2t)$. As for $\hat w^{(1)}_{j^*}$: if $D_1(U_{j^*}) \le \tilde\epsilon/(2t)$, then $\hat w^{(1)}_{j^*} \le 3\tilde\epsilon/(4t)$, causing the algorithm to reject in (the first part of) Line 3-g. If $\tilde\epsilon/(2t) < D_1(U_{j^*}) \le \tilde\epsilon/t$, then $\hat w^{(1)}_{j^*} \in [1-\tilde\epsilon/8,\, 1+\tilde\epsilon/8]\, D_1(U_{j^*}) \le (1+\tilde\epsilon/8)(\tilde\epsilon/t)$, so that

$$\frac{\hat w^{(2)}_{j^*}}{\hat w^{(1)}_{j^*}} \ge \frac{(1-\tilde\epsilon/10)(2\tilde\epsilon/t)}{(1+\tilde\epsilon/8)(\tilde\epsilon/t)} > 1+\tilde\epsilon/2,$$

causing the algorithm to reject in (either the first or second part of) Line 3-g.

2. If Item 2 holds, then by the choice of the size of $S_1$, which is $\Theta(t/\tilde\epsilon^2)$, and since all points in $U_{j^*}$ have approximately the same weight according to $D_1$, with probability at least $99/100$ the sample $S_1$ will contain a point $i$ for which $\frac{D_2(i)}{D_2(r_{j^*})} \notin [1/(1+\alpha_{j^*}+\tilde\epsilon),\, 1+\alpha_{j^*}+\tilde\epsilon]$, and by $E_3$ this will be detected in Line 3-h.

3. Similarly, if Item 3 holds, then by the choice of the size of $S_2$, with probability at least $99/100$ the sample $S_2$ will contain a point $i$ for which $\frac{D_2(i)}{D_2(r_{j^*})} \notin [1/(1+\alpha_{j^*}+\tilde\epsilon),\, 1+\alpha_{j^*}+\tilde\epsilon]$, and by $E_3$ this will be detected in Line 3-h.

The theorem is thus established.
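The Compare procedure used throughout the algorithm above estimates the ratio $D(i)/D(r_j)$ from conditional samples on the pair $\{r_j, i\}$. The following Python sketch illustrates the core idea behind such a procedure; it is a simplified stand-in for the paper's Compare of Lemma 2, without its precise accuracy and confidence guarantees. The `pcond` argument is a hypothetical oracle returning one draw from $D$ conditioned on a given set.

```python
import random

def pcond_compare(pcond, r, i, K, num_samples):
    """Estimate D(i)/D(r) from PCOND samples on the pair {r, i}.

    The fraction of draws from D restricted to {r, i} that land on i
    estimates D(i) / (D(r) + D(i)); inverting this fraction gives the
    ratio.  Returns 'High' / 'Low' when the empirical ratio falls
    outside [1/(K+2), K+2], mimicking a Compare-style interface.
    """
    hits = sum(1 for _ in range(num_samples) if pcond({r, i}) == i)
    p = hits / num_samples            # estimate of D(i) / (D(r) + D(i))
    if p >= 1 - 1 / (K + 3):          # empirical ratio at least K + 2
        return 'High'
    if p <= 1 / (K + 3):              # empirical ratio at most 1/(K + 2)
        return 'Low'
    return p / (1 - p)                # estimate of D(i) / D(r)
```

For instance, when $D$ puts equal mass on the two points of the pair, the returned estimate concentrates around 1.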
6.2 An approach based on simulating EVAL

In this subsection we present an alternate approach for testing whether two unknown distributions $D_1, D_2$ are identical versus $\epsilon$-far. We prove the following theorem:

Theorem 12 COND-Test-Equality-Unknown is a $\tilde O((\log N)^5/\epsilon^4)$-query algorithm with the following properties: given COND$_{D_1}$, COND$_{D_2}$ oracles for any two distributions $D_1, D_2$ over $[N]$, it outputs ACCEPT with probability at least $2/3$ if $D_1 = D_2$ and outputs REJECT with probability at least $2/3$ if $d_{\rm TV}(D_1, D_2) \ge \epsilon$.

At the heart of this result is our efficient simulation of an "approximate EVAL$_D$ oracle" using a COND$_D$ oracle. (Recall that an EVAL$_D$ oracle is an oracle which, given as input an element $i \in [N]$, outputs the numerical value $D(i)$.) We feel that this efficient simulation of an approximate EVAL oracle using a COND oracle is of independent interest, since it sheds light on the relative power of the COND and EVAL models.

In more detail, the starting point of our approach to proving Theorem 12 is a simple algorithm from [RS09] that uses an EVAL$_D$ oracle to test equality between $D$ and a known distribution $D^*$. We first show (see Theorem 13) that a modified version of this algorithm, which uses a SAMP oracle and an "approximate" EVAL oracle, can be used to efficiently test equality between two unknown distributions $D_1$ and $D_2$. As we show (in Subsection 3.3.2), the required "approximate" EVAL oracle can be efficiently implemented using a COND oracle, and so Theorem 12 follows straightforwardly by combining Theorems 13 and 4.

6.2.1 Testing equality between $D_1$ and $D_2$ using an approximate EVAL oracle.

We now show how an approximate EVAL$_{D_1}$ oracle, an approximate EVAL$_{D_2}$ oracle, and a SAMP$_{D_1}$ oracle can be used together to test whether $D_1 = D_2$ versus $d_{\rm TV}(D_1, D_2) \ge \epsilon$.
As mentioned earlier, the approach is a simple extension of the EVAL algorithm given in Observation 24 of [RS09].

Theorem 13 Let ORACLE$_1$ be an $(\epsilon/100, \epsilon/100)$-approximate EVAL$_{D_1}$ simulator and let ORACLE$_2$ be an $(\epsilon/100, \epsilon/100)$-approximate EVAL$_{D_2}$ simulator. There is an algorithm Test-Equality-Unknown with the following properties: for any distributions $D_1, D_2$ over $[N]$, algorithm Test-Equality-Unknown makes $O(1/\epsilon)$ queries to ORACLE$_1$, ORACLE$_2$, SAMP$_{D_1}$, SAMP$_{D_2}$, and it outputs ACCEPT with probability at least $7/10$ if $D_1 = D_2$ and outputs REJECT with probability at least $7/10$ if $d_{\rm TV}(D_1, D_2) \ge \epsilon$.

Algorithm 10: Test-Equality-Unknown
Input: query access to ORACLE$_1$, to ORACLE$_2$, and access to SAMP$_{D_1}$, SAMP$_{D_2}$ oracles
1: Call the SAMP$_{D_1}$ oracle $m = 5/\epsilon$ times to obtain points $h_1, \dots, h_m$ distributed according to $D_1$.
2: Call the SAMP$_{D_2}$ oracle $m = 5/\epsilon$ times to obtain points $h_{m+1}, \dots, h_{2m}$ distributed according to $D_2$.
3: for $j = 1$ to $2m$ do
4:   Call ORACLE$_1(h_j)$. If it returns UNKNOWN then output REJECT; otherwise let $v_{1,j} \in [0,1]$ be the value it outputs.
5:   Call ORACLE$_2(h_j)$. If it returns UNKNOWN then output REJECT; otherwise let $v_{2,j} \in [0,1]$ be the value it outputs.
6:   if $v_{1,j} \notin [1-\epsilon/8,\, 1+\epsilon/8] \cdot v_{2,j}$ then
7:     output REJECT and exit
8:   end if
9: end for
10: output ACCEPT

It is clear that Test-Equality-Unknown makes $O(1/\epsilon)$ queries as claimed. To prove Theorem 13 we argue completeness and soundness below.

Completeness: Suppose that $D_1 = D_2$. Since ORACLE$_1$ is an $(\epsilon/100, \epsilon/100)$-approximate EVAL$_{D_1}$ simulator, the probability that any of the $2m = 10/\epsilon$ points $h_1, \dots, h_{2m}$ drawn in Lines 1 and 2 lies in $S_{(\epsilon/100, D_1)}$ is at most $1/10$. Going forth, let us assume that all points $h_i$ indeed lie outside $S_{(\epsilon/100, D_1)}$.
Then for each execution of Line 4 we have that with probability at least $1-\epsilon/100$ the call to ORACLE$_1(h_j)$ yields a value $v_{1,j}$ satisfying $v_{1,j} \in [1-\tfrac{\epsilon}{100},\, 1+\tfrac{\epsilon}{100}]\, D_1(h_j)$. The same holds for each execution of Line 5. Since there are $20/\epsilon$ total executions of Lines 4 and 5, with overall probability at least $7/10$ we have that each $1 \le j \le 2m$ has $v_{1,j}, v_{2,j} \in [1-\tfrac{\epsilon}{100},\, 1+\tfrac{\epsilon}{100}]\, D_1(h_j)$. If this is the case then every pair $v_{1,j}, v_{2,j}$ passes the check in Line 6, and thus the algorithm outputs ACCEPT with overall probability at least $7/10$.

Soundness: Now suppose that $d_{\rm TV}(D_1, D_2) \ge \epsilon$. Let us say that $i \in [N]$ is good if $D_1(i) \in [1-\epsilon/5,\, 1+\epsilon/5]\, D_2(i)$. Let BAD $\subseteq [N]$ denote the set of all $i \in [N]$ that are not good. We have

$$2\, d_{\rm TV}(D_1, D_2) = \sum_{i \text{ is good}} |D_1(i) - D_2(i)| + \sum_{i \text{ is bad}} |D_1(i) - D_2(i)| \ge 2\epsilon.$$

Since

$$\sum_{i \text{ is good}} |D_1(i) - D_2(i)| \le \sum_{i \text{ is good}} \frac{\epsilon}{5}\, D_2(i) \le \frac{\epsilon}{5},$$

we have

$$\sum_{i \text{ is bad}} \big( D_1(i) + D_2(i) \big) \ge \sum_{i \text{ is bad}} |D_1(i) - D_2(i)| \ge \frac{9}{5}\,\epsilon.$$

Consequently, it must be the case that either $D_1(\text{BAD}) \ge \frac{9}{10}\epsilon$ or $D_2(\text{BAD}) \ge \frac{9}{10}\epsilon$. For the rest of the argument we suppose that $D_1(\text{BAD}) \ge \frac{9}{10}\epsilon$ (by the symmetry of the algorithm, an identical argument to the one we give below, but with the roles of $D_1$ and $D_2$ flipped throughout, handles the other case).

Since $D_1(\text{BAD}) \ge \frac{9}{10}\epsilon$, a simple calculation shows that with probability at least $98/100$ at least one of the $5/\epsilon$ points $h_1, \dots, h_m$ drawn in Line 1 belongs to BAD. For the rest of the argument we suppose that indeed (at least) one of these points is in BAD; let $h_{i^*}$ be such a point. Now consider the execution of Line 4 when ORACLE$_1$ is called on $h_{i^*}$.
By Definition 3, whether or not $h_{i^*}$ belongs to $S_{(\epsilon/100, D_1)}$, with probability at least $1-\epsilon/100$ the call to ORACLE$_1$ either causes Test-Equality-Unknown to REJECT in Line 4 (because ORACLE$_1$ returns UNKNOWN) or it returns a value $v_{1,i^*} \in [1-\tfrac{\epsilon}{100},\, 1+\tfrac{\epsilon}{100}]\, D_1(h_{i^*})$. We may suppose that it returns such a value. Similarly, in the execution of Line 5 when ORACLE$_2$ is called on $h_{i^*}$, whether or not $h_{i^*}$ belongs to $S_{(\epsilon/100, D_2)}$, with probability at least $1-\epsilon/100$ the call to ORACLE$_2$ either causes Test-Equality-Unknown to reject in Line 5 or it returns a value $v_{2,i^*} \in [1-\tfrac{\epsilon}{100},\, 1+\tfrac{\epsilon}{100}]\, D_2(h_{i^*})$. Again we may suppose that it returns such a value. But recalling that $h_{i^*} \in \text{BAD}$, an easy calculation shows that the values $v_{1,i^*}$ and $v_{2,i^*}$ must be multiplicatively far enough from each other that the algorithm outputs REJECT in Line 7. Thus with overall probability at least $96/100$ the algorithm outputs REJECT.

7 An algorithm for estimating the distance to uniformity

In this section we describe an algorithm that estimates the distance between a distribution $D$ and the uniform distribution $U$ by performing poly$(1/\epsilon)$ PCOND (and SAMP) queries. We start by giving a high-level description of the algorithm. By the definition of the variation distance (and the uniform distribution),

$$d_{\rm TV}(D, U) = \sum_{i:\, D(i) < 1/N} \left( \frac{1}{N} - D(i) \right). \quad (64)$$

We define the following function over $[N]$:

$$\psi_D(i) = 1 - N \cdot D(i) \ \text{ for } D(i) < \frac{1}{N}, \qquad \text{and} \qquad \psi_D(i) = 0 \ \text{ for } D(i) \ge \frac{1}{N}. \quad (65)$$

Observe that $\psi_D(i) \in [0,1]$ for every $i \in [N]$ and

$$d_{\rm TV}(D, U) = \frac{1}{N} \sum_{i=1}^N \psi_D(i). \quad (66)$$

Thus $d_{\rm TV}(D, U)$ can be viewed as the average value of a function whose range is in $[0,1]$.
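The identity in Equations (64)–(66) is easy to check directly for an explicitly given distribution. As a quick sanity check, the following Python sketch computes the distance to uniformity both ways; the distribution is represented as a plain list of probabilities, and this is illustrative code rather than part of the paper's algorithms.

```python
def psi(D, i):
    """psi_D(i) = 1 - N*D(i) when D(i) < 1/N, and 0 otherwise (Equation (65))."""
    N = len(D)
    return max(0.0, 1.0 - N * D[i])

def dist_to_uniformity(D):
    """Compute d_TV(D, U) both directly (Equation (64)) and as the
    average of psi_D over the domain (Equation (66)); they must agree."""
    N = len(D)
    direct = sum(1.0 / N - p for p in D if p < 1.0 / N)
    via_psi = sum(psi(D, i) for i in range(N)) / N
    assert abs(direct - via_psi) < 1e-12
    return direct
```

For example, for the distribution $(1/2, 1/2, 0, 0)$ over $N = 4$ both expressions evaluate to $1/2$, and for the uniform distribution they evaluate to $0$.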
Since $D$ is fixed throughout this section, we shall use the shorthand $\psi(i)$ instead of $\psi_D(i)$. Suppose we were able to compute $\psi(i)$ exactly for any $i$ of our choice. Then we could obtain an estimate $\hat d$ of $d_{\rm TV}(D, U)$ to within an additive error of $\epsilon/2$ by simply selecting $\Theta(1/\epsilon^2)$ points in $[N]$ uniformly at random and setting $\hat d$ to be the average value of $\psi(\cdot)$ on the sampled points. By an additive Chernoff bound (for an appropriate constant in the $\Theta(\cdot)$ notation), with high constant probability the estimate $\hat d$ would deviate by at most $\epsilon/2$ from $d_{\rm TV}(D, U)$.

Suppose next that instead of being able to compute $\psi(i)$ exactly, we were able to compute an estimate $\hat\psi(i)$ such that $|\hat\psi(i) - \psi(i)| \le \epsilon/2$. By using $\hat\psi(i)$ instead of $\psi(i)$ for each of the $\Theta(1/\epsilon^2)$ sampled points we would incur an additional additive error of at most $\epsilon/2$. Observe first that for $i$ such that $D(i) \le \epsilon/(2N)$ we have $\psi(i) \ge 1-\epsilon/2$, so the estimate $\hat\psi(i) = 1$ meets our requirements. Similarly, for $i$ such that $D(i) \ge 1/N$, any estimate $\hat\psi(i) \in [0, \epsilon/2]$ can be used. Finally, for $i$ such that $D(i) \in [\epsilon/(2N),\, 1/N]$, if we can obtain an estimate $\hat D(i)$ such that $\hat D(i) \in [1-\epsilon/2,\, 1+\epsilon/2]\, D(i)$, then we can use $\hat\psi(i) = 1 - N \cdot \hat D(i)$.

In order to obtain such estimates $\hat\psi(i)$, we shall be interested in finding a reference point $x$. Namely, we shall be interested in finding a pair $(x, \hat D(x))$ such that $\hat D(x) \in [1-\epsilon/c,\, 1+\epsilon/c]\, D(x)$ for some sufficiently large constant $c$, and such that $D(x) = \Omega(\epsilon/N)$ and $D(x) = O(1/(\epsilon N))$. In Subsection 7.1 we describe a procedure for finding such a reference point. More precisely, the procedure is required to find such a reference point (with high constant probability) only under a certain condition on $D$.
It is not hard to verify (and we show this subsequently) that if this condition is not met, then $d_{\rm TV}(D, U)$ is very close to 1. In order to state the lemma we introduce the following notation. For $\gamma \in [0,1]$, let

$$H^D_\gamma \stackrel{\rm def}{=} \left\{ i : D(i) \ge \frac{1}{\gamma N} \right\}. \quad (67)$$

Lemma 26 Given an input parameter $\kappa \in (0, 1/4]$ as well as SAMP and PCOND query access to a distribution $D$, the procedure Find-Reference (Algorithm 12) either returns a pair $(x, \hat D(x))$, where $x \in [N]$ and $\hat D(x) \in [0,1]$, or returns No-Pair. The procedure satisfies the following:
1. If $D(H^D_\kappa) \le 1-\kappa$, then with probability at least $9/10$, the procedure returns a pair $(x, \hat D(x))$ such that $\hat D(x) \in [1-2\kappa,\, 1+3\kappa]\, D(x)$ and $D(x) \in \left[\frac{\kappa}{8},\, \frac{4}{\kappa}\right] \cdot \frac{1}{N}$.
2. If $D(H^D_\kappa) > 1-\kappa$, then with probability at least $9/10$, the procedure either returns No-Pair or it returns a pair $(x, \hat D(x))$ such that $\hat D(x) \in [1-2\kappa,\, 1+3\kappa]\, D(x)$ and $D(x) \in \left[\frac{\kappa}{8},\, \frac{4}{\kappa}\right] \cdot \frac{1}{N}$.
The procedure performs $\tilde O(1/\kappa^{20})$ PCOND and SAMP queries.

Once we have a reference point $x$ we can use it to obtain an estimate $\hat\psi(i)$ for any $i$ of our choice, using the procedure Compare, whose properties are stated in Lemma 2 (see Subsection 3.1).

Theorem 14 With probability at least $2/3$, the estimate $\hat d$ returned by Algorithm 11 satisfies $\hat d = d_{\rm TV}(D, U) \pm O(\epsilon)$. The number of queries performed by the algorithm is $\tilde O(1/\epsilon^{20})$.

Proof: In what follows we shall use the shorthand $H_\gamma$ instead of $H^D_\gamma$. Let $E_0$ denote the event that the procedure Find-Reference (Algorithm 12) obeys the requirements in Lemma 26; by Lemma 26 the event $E_0$ holds with probability at least $9/10$. Conditioned on $E_0$, the algorithm outputs $\hat d = 1$ right after calling the procedure (because the procedure returns No-Pair) only when $D(H_\kappa) > 1-\kappa = 1-\epsilon/8$.
We claim that in this case $d_{\rm TV}(D, U) \ge 1 - 2\epsilon/8 = 1 - \epsilon/4$. To verify this, observe that

$$d_{\rm TV}(D, U) = \sum_{i:\, D(i) > 1/N} \left( D(i) - \frac{1}{N} \right) \ge \sum_{i \in H_\kappa} \left( D(i) - \frac{1}{N} \right) = D(H_\kappa) - \frac{|H_\kappa|}{N} \ge D(H_\kappa) - \kappa. \quad (68)$$

Algorithm 11: Estimating the Distance to Uniformity
Input: PCOND and SAMP query access to a distribution $D$ and a parameter $\epsilon \in [0,1]$.
1. Call the procedure Find-Reference (Algorithm 12) with $\kappa$ set to $\epsilon/8$. If it returns No-Pair, then output $\hat d = 1$ as the estimate for the distance to uniformity. Otherwise, let $(x, \hat D(x))$ be its output.
2. Select a sample $S$ of $\Theta(1/\epsilon^2)$ points uniformly.
3. Let $K = \max\left\{ \frac{2}{N \hat D(x)},\, \frac{\hat D(x)}{\epsilon/(4N)} \right\}$.
4. For each point $y \in S$:
(a) Call Compare$(\{x\}, \{y\}, \kappa, K, \frac{1}{10|S|})$.
(b) If Compare returns High, or it returns a value $\rho(y)$ such that $\rho(y) \cdot \hat D(x) \ge 1/N$, then set $\hat\psi(y) = 0$;
(c) Else, if Compare returns Low, or it returns a value $\rho(y)$ such that $\rho(y) \cdot \hat D(x) \le \epsilon/(4N)$, then set $\hat\psi(y) = 1$;
(d) Else set $\hat\psi(y) = 1 - N \cdot \rho(y) \cdot \hat D(x)$.
5. Output $\hat d = \frac{1}{|S|} \sum_{y \in S} \hat\psi(y)$.

Thus, in this case the estimate $\hat d$ is as required. We turn to the case in which the procedure Find-Reference returns a pair $(x, \hat D(x))$ such that $\hat D(x) \in [1-2\kappa,\, 1+3\kappa]\, D(x)$ and $D(x) \in \left[\frac{\kappa}{8},\, \frac{4}{\kappa}\right] \cdot \frac{1}{N}$.

We start by defining two more "desirable" events, which hold (simultaneously) with high constant probability, and then show that conditioned on these events holding (as well as $E_0$), the output of the algorithm is as required. Let $E_1$ be the event that the sample $S$ satisfies

$$\left| \frac{1}{|S|} \sum_{y \in S} \psi(y) - d_{\rm TV}(D, U) \right| \le \epsilon/2. \quad (69)$$

By an additive Chernoff bound, the event $E_1$ holds with probability at least $9/10$. Next, let $E_2$ be the event that all calls to the procedure Compare return answers as specified in Lemma 2.
Since Compare is called $|S|$ times, and for each call the probability that it does not return an answer as specified in the lemma is at most $1/(10|S|)$, by the union bound the probability that $E_2$ holds is at least $9/10$. From this point on assume events $E_0$, $E_1$ and $E_2$ all occur, which holds with probability at least $1 - 3/10 \ge 2/3$. Since $E_2$ holds, we get the following.

1. When Compare returns High for $y \in S$ (so that $\hat\psi(y)$ is set to 0) we have that

$$D(y) > K \cdot D(x) \ge \frac{2}{N \hat D(x)} \cdot D(x) > \frac{1}{N}, \quad (70)$$

implying that $\hat\psi(y) = \psi(y)$.

2. When Compare returns Low for $y \in S$ (so that $\hat\psi(y)$ is set to 1) we have that

$$D(y) < \frac{D(x)}{K} \le \frac{D(x)}{\hat D(x)/(\epsilon/(4N))} \le \frac{\epsilon}{2N}, \quad (71)$$

implying that $\hat\psi(y) \le \psi(y) + \epsilon/2$ (and clearly $\psi(y) \le \hat\psi(y)$).

3. When Compare returns a value $\rho(y)$, it holds that $\rho(y) \in [1-\kappa,\, 1+\kappa] \cdot (D(y)/D(x))$, so that $\rho(y) \cdot \hat D(x) \in [(1-\kappa)(1-2\kappa),\, (1+\kappa)(1+3\kappa)]\, D(y)$. Since $\kappa = \epsilon/8$: if $\rho(y) \cdot \hat D(x) \ge 1/N$ (so that $\hat\psi(y)$ is set to 0), then $\psi(y) < \epsilon/2$; if $\rho(y) \cdot \hat D(x) \le \epsilon/(4N)$ (so that $\hat\psi(y)$ is set to 1), then $\psi(y) \ge 1-\epsilon/2$; and otherwise $|\hat\psi(y) - \psi(y)| \le \epsilon/2$.

It follows that

$$\hat d = \frac{1}{|S|} \sum_{y \in S} \hat\psi(y) \in \left[ \frac{1}{|S|} \sum_{y \in S} \psi(y) - \frac{\epsilon}{2},\ \frac{1}{|S|} \sum_{y \in S} \psi(y) + \frac{\epsilon}{2} \right] \subseteq [d_{\rm TV}(D, U) - \epsilon,\, d_{\rm TV}(D, U) + \epsilon], \quad (72)$$

as required. The number of queries performed by the algorithm is the number of queries performed by the procedure Find-Reference, which is $\tilde O(1/\epsilon^{20})$, plus $\Theta(1/\epsilon^2)$ times the number of queries performed in each call to Compare. The procedure Compare is called with the parameter $K$, which is bounded by $O(1/\epsilon^2)$, the parameter $\eta = \kappa$, which is $\Omega(\epsilon)$, and the confidence parameter $\delta = 1/(10|S|)$, which is $\Omega(\epsilon^2)$. By Lemma 2, the number of queries performed in each call to Compare is $O(\log(1/\epsilon)/\epsilon^4)$.
The total number of queries performed is hence $\tilde O(1/\epsilon^{20})$.

7.1 Finding a reference point

In this subsection we prove Lemma 26. We start by giving the high-level idea behind the procedure. For a point $x \in [N]$ and $\gamma \in [0,1]$, let $U^D_\gamma(x)$ be as defined in Equation (11). Since $D$ is fixed throughout this subsection, we shall use the shorthand $U_\gamma(x)$ instead of $U^D_\gamma(x)$. Recall that $\kappa$ is a parameter given to the procedure. Assume we had a point $x$ for which $D(U_\kappa(x)) \ge \kappa^{d_1}$ and $|U_\kappa(x)| \ge \kappa^{d_2} N$ for some constants $d_1$ and $d_2$ (so that necessarily $D(x) = \Omega(\kappa^{d_1}/N)$ and $D(x) = O(1/(\kappa^{d_2} N))$). It is not hard to verify (and we show this in detail subsequently) that if $D(H) \le 1-\kappa$, then a sample of size $\Theta(1/{\rm poly}(\kappa))$ distributed according to $D$ will contain such a point $x$ with high constant probability.

Now suppose that we could obtain an estimate $\hat w$ of $D(U_\kappa(x))$ such that $\hat w \in [1-\kappa,\, 1+\kappa]\, D(U_\kappa(x))$, and an estimate $\hat u$ of $|U_\kappa(x)|$ such that $\hat u \in [1-\kappa,\, 1+\kappa]\, |U_\kappa(x)|$. By the definition of $U_\kappa(x)$ we would then have $\hat w/\hat u \in [1-O(\kappa),\, 1+O(\kappa)]\, D(x)$.

Obtaining good estimates of $D(U_\kappa(x))$ and $|U_\kappa(x)|$ (for $x$ such that both $|U_\kappa(x)|$ and $D(U_\kappa(x))$ are sufficiently large) might be infeasible. This is due to the possible existence of many points $y$ for which $D(y)$ is very close to $(1+\kappa) D(x)$ or $D(x)/(1+\kappa)$, the values that define the boundaries of the set $U_\kappa(x)$. For such points it is not possible to efficiently distinguish between those among them that belong to $U_\kappa(x)$ (so that they are within the borders of the set) and those that do not belong to $U_\kappa(x)$ (so that they are just outside the borders of the set).
However, for our purposes it suffices to estimate the weight and size of some set $U_\alpha(x)$ such that $\alpha \ge \kappa$ (so that $U_\kappa(x) \subseteq U_\alpha(x)$) and $\alpha$ is not much larger than $\kappa$ (e.g., $\alpha \le 2\kappa$). To this end we can apply the procedure Estimate-Neighborhood (see Subsection 3.2), which (conditioned on $D(U_\kappa(x))$ being above a certain threshold) returns a pair $(\hat w(x), \alpha)$ such that $\hat w(x)$ is a good estimate of $D(U_\alpha(x))$. Furthermore, $\alpha$ is such that for $\alpha'$ slightly larger than $\alpha$, the weight of $U_{\alpha'}(x) \setminus U_\alpha(x)$ is small, allowing us to obtain also a good estimate $\hat\mu(x)$ of $|U_\alpha(x)|/N$.

Algorithm 12: Procedure Find-Reference
Input: PCOND and SAMP query access to a distribution $D$ and a parameter $\kappa \in (0, 1/4]$
1. Select a sample $X$ of $\Theta(\log(1/\kappa)/\kappa^2)$ points distributed according to $D$.
2. For each $x \in X$ do the following:
(a) Call Estimate-Neighborhood with the parameter $\kappa$ as in the input to Find-Reference, $\beta = \kappa^2/(40 \log(1/\kappa))$, $\eta = \kappa$, and $\delta = 1/(40|X|)$. Let $\theta = \kappa\eta\beta\delta/64 = \Theta(\kappa^6/\log^2(1/\kappa))$ (as in Estimate-Neighborhood).
(b) If Estimate-Neighborhood returns a pair $(\hat w(x), \alpha(x))$ such that $\hat w(x) < \kappa^2/(20 \log(1/\kappa))$, then go to Line 2 and continue with the next $x \in X$.
(c) Select a sample $Y_x$ of size $\Theta(\log^2(1/\kappa)/\kappa^5)$ distributed uniformly.
(d) For each $y \in Y_x$ call Compare$(\{x\}, \{y\}, \theta/4, 4, 1/(40|X||Y_x|))$, and let the output be denoted $\rho_x(y)$.
(e) Let $\hat\mu(x)$ be the fraction of occurrences of $y \in Y_x$ such that $\rho_x(y) \in [1/(1+\alpha+\theta/2),\, 1+\alpha+\theta/2]$.
(f) Set $\hat D(x) = \hat w(x)/(\hat\mu(x) N)$.
3. If for some point $x \in X$ we have $\hat w(x) \ge \kappa^2/(20 \log(1/\kappa))$, $\hat\mu(x) \ge \kappa^3/(20 \log(1/\kappa))$, and $\kappa/(4N) \le \hat D(x) \le 2/(\kappa N)$, then return $(x, \hat D(x))$. Otherwise return No-Pair.

Proof of Lemma 26: We first introduce the following notation.
$$L \stackrel{\rm def}{=} \left\{ i : D(i) < \frac{\kappa}{2N} \right\}, \qquad M \stackrel{\rm def}{=} \left\{ i : \frac{\kappa}{2N} \le D(i) < \frac{1}{\kappa N} \right\}. \quad (73)$$

Let $H = H^D_\kappa$, where $H^D_\kappa$ is as defined in Equation (67). Observe that $D(L) < \kappa/2$, so that if $D(H) \le 1-\kappa$, then $D(M) \ge \kappa/2$. Consider further partitioning the set $M$ of "medium weight" points into buckets $M_1, \dots, M_r$, where $r = \log_{1+\kappa}(2/\kappa^2) = \Theta(\log(1/\kappa)/\kappa)$ and the bucket $M_j$ is defined as follows:

$$M_j \stackrel{\rm def}{=} \left\{ i : (1+\kappa)^{j-1} \cdot \frac{\kappa}{2N} \le D(i) < (1+\kappa)^j \cdot \frac{\kappa}{2N} \right\}. \quad (74)$$

We consider the following "desirable" events.

1. Let $E_1$ be the event that, conditioned on the existence of a bucket $M_j$ such that $D(M_j) \ge \kappa/(2r) = \Omega(\kappa^2/\log(1/\kappa))$, there exists a point $x^* \in X$ that belongs to $M_j$. By the setting of the size of the sample $X$, the (conditional) event $E_1$ holds with probability at least $1 - 1/40$.

2. Let $E_2$ be the event that all calls to Estimate-Neighborhood return an output as specified by Lemma 3. By Lemma 3, the setting of the confidence parameter $\delta$ in each call, and a union bound over all $|X|$ calls, $E_2$ holds with probability at least $1 - 1/40$.

3. Let $E_3$ be the event that for each $x \in X$ we have the following.
(a) If $\frac{|U_{\alpha(x)}(x)|}{N} \ge \frac{\kappa^3}{40 \log(1/\kappa)}$, then $\frac{|Y_x \cap U_{\alpha(x)}(x)|}{|Y_x|} \in [1-\eta/2,\, 1+\eta/2] \cdot \frac{|U_{\alpha(x)}(x)|}{N}$; if $\frac{|U_{\alpha(x)}(x)|}{N} < \frac{\kappa^3}{40 \log(1/\kappa)}$, then $\frac{|Y_x \cap U_{\alpha(x)}(x)|}{|Y_x|} < \frac{\kappa^3}{30 \log(1/\kappa)}$.
(b) Let $\Delta_{\alpha(x),\theta}(x) \stackrel{\rm def}{=} U_{\alpha(x)+\theta}(x) \setminus U_{\alpha(x)}(x)$ (where $\theta$ is as specified by the algorithm). If $\frac{|\Delta_{\alpha(x),\theta}(x)|}{N} \ge \frac{\kappa^4}{240 \log(1/\kappa)}$, then $\frac{|Y_x \cap \Delta_{\alpha(x),\theta}(x)|}{|Y_x|} \le 2 \cdot \frac{|\Delta_{\alpha(x),\theta}(x)|}{N}$; if $\frac{|\Delta_{\alpha(x),\theta}(x)|}{N} < \frac{\kappa^4}{240 \log(1/\kappa)}$, then $\frac{|Y_x \cap \Delta_{\alpha(x),\theta}(x)|}{|Y_x|} < \frac{\kappa^4}{120 \log(1/\kappa)}$.
By the size of each set $Y_x$ and a union bound over all $x \in X$, the event $E_3$ holds with probability at least $1 - 1/40$.

4.
Let $E_4$ be the event that all calls to Compare return an output as specified by Lemma 2. By Lemma 2, the setting of the confidence parameter $\delta$ in each call, and a union bound over all (at most) $|X| \cdot |Y_x|$ calls, $E_4$ holds with probability at least $1 - 1/40$.

Assuming events $E_1$–$E_4$ all hold (which occurs with probability at least $9/10$), we have the following.

1. By $E_2$, for each $x \in X$ such that $\hat w(x) \ge \kappa^2/(20 \log(1/\kappa))$ (so that $x$ may be selected for the output of the procedure) we have that $D(U_{\alpha(x)}(x)) \ge \kappa^2/(40 \log(1/\kappa))$. The event $E_2$ also implies that for each $x \in X$ we have $D(\Delta_{\alpha(x),\theta}(x)) \le \eta\beta/16 \le (\eta/16) \cdot D(U_{\alpha(x)}(x))$, so that

$$\frac{|\Delta_{\alpha(x),\theta}(x)|}{N} \le \frac{\eta\, (1+\alpha(x))(1+\alpha(x)+\theta)}{16} \cdot \frac{|U_{\alpha(x)}(x)|}{N} \le \frac{\eta}{6} \cdot \frac{|U_{\alpha(x)}(x)|}{N}. \quad (75)$$

2. Consider any $x \in X$ such that $\hat w(x) \ge \kappa^2/(20 \log(1/\kappa))$. Let

$$T_x \stackrel{\rm def}{=} \{ y \in Y_x : \rho_x(y) \in [1/(1+\alpha+\theta/2),\, 1+\alpha+\theta/2] \},$$

so that $\hat\mu(x) = |T_x|/|Y_x|$. By $E_4$, for each $y \in Y_x \cap U_{\alpha(x)}(x)$ we have that $\rho_x(y) \le (1+\alpha)(1+\theta/4) \le 1+\alpha+\theta/2$ and $\rho_x(y) \ge (1+\alpha)^{-1}(1-\theta/4) \ge (1+\alpha+\theta/2)^{-1}$, so that $y \in T_x$. On the other hand, for each $y \in Y_x \setminus U_{\alpha(x)+\theta}(x)$ we have that $\rho_x(y) > (1+\alpha+\theta)(1-\theta/4) \ge 1+\alpha+\theta/2$ or $\rho_x(y) < (1+\alpha+\theta)^{-1}(1+\theta/4) < (1+\alpha+\theta/2)^{-1}$, so that $y \notin T_x$. It follows that

$$Y_x \cap U_{\alpha(x)}(x) \subseteq T_x \subseteq Y_x \cap \big( U_{\alpha(x)}(x) \cup \Delta_{\alpha(x),\theta}(x) \big). \quad (76)$$

By $E_3$, when $\hat\mu(x) = |T_x|/|Y_x| \ge \kappa^3/(20 \log(1/\kappa))$, then necessarily $\hat\mu(x) \in [1-\eta,\, 1+\eta] \cdot |U_{\alpha(x)}(x)|/N$. To verify this, consider the following cases.
(a) If $\frac{|U_{\alpha(x)}(x)|}{N} \ge \frac{\kappa^3}{40 \log(1/\kappa)}$, then (by the left-hand side of Equation (76) and the definition of $E_3$) we get that $\hat\mu(x) \ge (1-\eta/2) \frac{|U_{\alpha(x)}(x)|}{N}$, and (by the right-hand side of Equation (76), Equation (75), and $E_3$) we get that

$$\hat\mu(x) \le (1+\eta/2) \frac{|U_{\alpha(x)}(x)|}{N} + 2\,(\eta/6) \frac{|U_{\alpha(x)}(x)|}{N} < (1+\eta) \frac{|U_{\alpha(x)}(x)|}{N}.$$

(b) If $\frac{|U_{\alpha(x)}(x)|}{N} < \frac{\kappa^3}{40 \log(1/\kappa)}$, then (by the right-hand side of Equation (76), Equation (75), and $E_3$) we get that $\hat\mu(x) < \frac{\kappa^3}{30 \log(1/\kappa)} + \frac{\kappa^4}{120 \log(1/\kappa)} < \frac{\kappa^3}{20 \log(1/\kappa)}$.

3. If $D(H) \le 1-\kappa$, so that $D(M) \ge \kappa/2$, then there exists at least one bucket $M_j$ such that $D(M_j) \ge \kappa/(2r) = \Omega(\kappa^2/\log(1/\kappa))$. By $E_1$, the sample $X$ contains a point $x^* \in M_j$. By the definition of the buckets, for this point $x^*$ we have that $D(U_\kappa(x^*)) \ge \kappa/(2r) \ge \kappa^2/(10 \log(1/\kappa))$ and $|U_\kappa(x^*)| \ge (\kappa^2/(2r))\, N \ge (\kappa^3/(10 \log(1/\kappa)))\, N$.

By the first two items above and the setting $\eta = \kappa$, we have that for each $x$ such that $\hat w(x) \ge \kappa^2/(20 \log(1/\kappa))$ and $\hat\mu(x) \ge \kappa^3/(20 \log(1/\kappa))$,

$$\hat D(x) \in \left[ \frac{1-\kappa}{1+\kappa},\, \frac{1+\kappa}{1-\kappa} \right] D(x) \subset [1-2\kappa,\, 1+3\kappa]\, D(x).$$

Thus, if the algorithm outputs a pair $(x, \hat D(x))$, then it satisfies the conditions stated in both items of the lemma. This establishes the second item of the lemma. By combining all three items we get that if $D(H) \le 1-\kappa$ then the algorithm outputs a pair $(x, \hat D(x))$ (where possibly, but not necessarily, $x = x^*$), and the first item is established as well.
Turning to the query complexity, the total number of PCOND queries performed in the $|X| = O(\log(1/\kappa)/\kappa^2)$ calls to Estimate-Neighborhood is

$$O\left( |X| \cdot \frac{\log^2(1/\delta)\, \log(1/(\beta\eta))}{\kappa^2 \eta^4 \beta^3 \delta^2} \right) = \tilde O(1/\kappa^{18}),$$

and the total number of PCOND queries performed in the calls to Compare (over at most all pairs $x \in X$ and $y \in Y_x$) is $\tilde O(1/\kappa^{20})$.

8 A $\tilde O(\log^3 N/\epsilon^3)$-query ICOND$_D$ algorithm for testing uniformity

In this and the next section we consider ICOND algorithms for testing whether an unknown distribution $D$ over $[N]$ is the uniform distribution versus $\epsilon$-far from uniform. Our results show that ICOND algorithms are not as powerful as PCOND algorithms for this basic testing problem; in this section we give a poly$(\log N, 1/\epsilon)$-query ICOND$_D$ algorithm, and in the next section we prove that any ICOND$_D$ algorithm must make $\tilde\Omega(\log N)$ queries. In more detail, in this section we describe an algorithm ICOND$_D$-Test-Uniform and prove the following theorem:

Theorem 15 ICOND$_D$-Test-Uniform is a $\tilde O(\log^3 N/\epsilon^3)$-query ICOND$_D$ testing algorithm for uniformity, i.e., it outputs ACCEPT with probability at least $2/3$ if $D = U$ and outputs REJECT with probability at least $2/3$ if $d_{\rm TV}(D, U) \ge \epsilon$.

Intuition. Recall that, as mentioned in Subsection 4.1, any distribution $D$ which is $\epsilon$-far from uniform must put $\Omega(\epsilon)$ probability mass on "significantly heavy" elements (that is, if we define $H' = \{ h \in [N] : D(h) \ge \frac{1}{N} + \frac{\epsilon}{4N} \}$, it must hold that $D(H') \ge \epsilon/4$). Consequently, a sample of $O(1/\epsilon)$ points drawn from $D$ will contain such a point with high probability. Thus, a natural approach to testing whether $D$ is uniform is to devise a procedure that, given an input point $y$, can distinguish between the case that $y \in H'$ and the case that $D(y) = 1/N$ (as it is when $D = U$).
We give such a procedure, which uses the ICOND$_D$ oracle to perform a sort of binary search over intervals. The procedure successively "weighs" narrower and narrower intervals until it converges on the single point $y$. In more detail, we consider the interval tree whose root is the whole domain $[N]$, with two children $\{1,\dots,N/2\}$ and $\{N/2+1,\dots,N\}$, and so on, with a single point at each of the $N$ leaves. Our algorithm starts at the root of the tree and goes down the path that corresponds to $y$; at each child node it uses Compare to compare the weight of the current node to the weight of its sibling under $D$. If at any point the estimate deviates significantly from the value it should have if $D$ were uniform (namely, the two weights should be essentially equal, with slight deviations because of even/odd issues), then the algorithm rejects. Assuming the algorithm does not reject, it provides a $(1 \pm O(\epsilon))$-accurate multiplicative estimate of $D(y)$, and the algorithm checks whether this estimate is sufficiently close to $1/N$ (rejecting if this is not the case). If no point in a sample of $\Theta(1/\epsilon)$ points (drawn according to $D$) causes rejection, then the algorithm accepts.

Algorithm 13: Binary-Descent
Input: parameter $\epsilon > 0$; integers $1 \le a \le b \le N$; $y \in [a,b]$; query access to ICOND$_D$ oracle
1: if $a = b$ then
2:   return 1
3: end if
4: Let $c = \lfloor\frac{a+b}{2}\rfloor$; $\Delta = (b-a+1)/2$.
5: if $y \le c$ then
6:   Define $I_y = \{a,\dots,c\}$, $I_{\bar y} = \{c+1,\dots,b\}$ and $\rho = \lceil\Delta\rceil / \lfloor\Delta\rfloor$
7: else
8:   Define $I_{\bar y} = \{a,\dots,c\}$, $I_y = \{c+1,\dots,b\}$ and $\rho = \lfloor\Delta\rfloor / \lceil\Delta\rceil$
9: end if
10: Call Compare on $I_y$, $I_{\bar y}$ with parameters $\eta = \frac{\epsilon}{48\log N}$, $K = 2$, $\delta = \frac{\epsilon}{100(1+\log N)}$ to get an estimate $\hat\rho$ of $D(I_y)/D(I_{\bar y})$
11: if $\hat\rho \notin [1 - \frac{\epsilon}{48\log N}, 1 + \frac{\epsilon}{48\log N}] \cdot \rho$ (this includes the case that $\hat\rho$ is High or Low) then
12:   return REJECT
13: end if
14: Call Binary-Descent recursively on input ($\epsilon$, the endpoints of $I_y$, $y$)
15: if Binary-Descent returns a value $\nu$ then
16:   return $\frac{\hat\rho}{1+\hat\rho} \cdot \nu$
17: else
18:   return REJECT
19: end if

The algorithm we use to perform the "binary search" described above is Algorithm 13, Binary-Descent. We begin by proving correctness for it:

Algorithm 14: ICOND$_D$-Test-Uniform
Input: error parameter $\epsilon > 0$; query access to ICOND$_D$ oracle
1: Draw $t = \frac{20}{\epsilon}$ points $y_1, \dots, y_t$ from SAMP$_D$.
2: for $j = 1$ to $t$ do
3:   Call Binary-Descent($\epsilon, 1, N, y_j$) and return REJECT if it rejects; otherwise let $\hat d_j$ be the value it returns as its estimate of $D(y_j)$
4:   if $\hat d_j \notin [1 - \frac{\epsilon}{12}, 1 + \frac{\epsilon}{12}] \cdot \frac{1}{N}$ then
5:     return REJECT
6:   end if
7: end for
8: return ACCEPT

Lemma 27  Suppose the algorithm Binary-Descent is run with inputs $\epsilon \in (0,1]$, $a = 1$, $b = N$, and $y \in [N]$, and is provided ICOND oracle access to a distribution $D$ over $[N]$. It performs $\tilde O(\log^3 N/\epsilon^2)$ queries and either outputs a value $\hat D(y)$ or REJECT, where the following holds:

1. if $D(y) \ge \frac{1}{N} + \frac{\epsilon}{4N}$, then with probability at least $1 - \frac{\epsilon}{100}$ the procedure either outputs a value $\hat D(y) \in [1-\epsilon/12,\, 1+\epsilon/12] \cdot D(y)$ or REJECT;

2. if $D = U$, then with probability at least $1 - \frac{\epsilon}{100}$ the procedure outputs a value $\hat D(y) \in [1-\epsilon/12,\, 1+\epsilon/12] \cdot \frac{1}{N}$.

Proof of Lemma 27:  The claimed query bound is easily verified, since the recursion depth is at most $1 + \log N$ and the only queries made are during calls to Compare, each of which performs $O(\log(1/\delta)/\eta^2) = \tilde O(\log^2 N/\epsilon^2)$ queries.
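To make the recursive structure of Algorithm 13 concrete, here is a minimal Python sketch of the descent (illustrative code, not the authors' implementation): `ratio_oracle` is a hypothetical stand-in for Compare that returns an estimate of $D(I_y)/D(I_{\bar y})$, and the rejection logic is omitted so as to isolate the estimate computation.

```python
def binary_descent(a, b, y, ratio_oracle):
    """Estimate D(y) by walking the interval tree from [a, b] down to {y}.

    ratio_oracle(I, J) stands in for Compare: it returns an estimate of
    D(I)/D(J) for two disjoint intervals I, J given as (lo, hi) pairs.
    """
    if a == b:
        return 1.0
    c = (a + b) // 2
    # child interval containing y, and its sibling
    I_y, I_bar = ((a, c), (c + 1, b)) if y <= c else ((c + 1, b), (a, c))
    rho = ratio_oracle(I_y, I_bar)
    # D(I_y) / (D(I_y) + D(I_bar)) = rho / (1 + rho): the fraction of the
    # parent's weight carried by the child containing y
    return rho / (1.0 + rho) * binary_descent(I_y[0], I_y[1], y, ratio_oracle)

def exact_ratio(D):
    # idealized oracle built from a known weight vector D (1-indexed domain)
    return lambda I, J: sum(D[I[0] - 1:I[1]]) / sum(D[J[0] - 1:J[1]])
```

With an exact oracle the product of per-level fractions telescopes to exactly $D(y)$; when Compare only returns $(1\pm\eta)$-accurate ratios, the output is accurate to $(1\pm\eta)^{O(\log N)}$, which is what motivates the choice $\eta = \epsilon/(48\log N)$ in the actual algorithm.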
Let $E_0$ be the event that all calls to Compare satisfy the conditions in Lemma 2; since each of them succeeds with probability at least $1 - \delta = 1 - \frac{\epsilon}{100(1+\log N)}$, a union bound shows that $E_0$ holds with probability at least $1 - \epsilon/100$. We hereafter condition on $E_0$.

We first prove the second part of the lemma, where $D = U$. Fix any specific recursive call, say the $j$-th, during the execution of the procedure. The intervals $I_y^{(j)}, I_{\bar y}^{(j)}$ used in that execution of the algorithm are easily seen to satisfy $D(I_y^{(j)})/D(I_{\bar y}^{(j)}) \in [1/K, K]$ (for $K = 2$), so by event $E_0$ it must be the case that Compare returns an estimate $\hat\rho_j \in [1 - \frac{\epsilon}{48\log N}, 1 + \frac{\epsilon}{48\log N}] \cdot D(I_y^{(j)})/D(I_{\bar y}^{(j)})$. Since $D = U$, we have that $D(I_y^{(j)})/D(I_{\bar y}^{(j)}) = \rho^{(j)}$, so the overall procedure returns a numerical value rather than REJECT.

Let $M = \lceil \log N \rceil$ be the number of recursive calls (i.e., the number of executions of Line 14). Note that we can write $D(y)$ as a product
$$D(y) = \prod_{j=1}^{M} \frac{D(I_y^{(j)})}{D(I_y^{(j)}) + D(I_{\bar y}^{(j)})} = \prod_{j=1}^{M} \frac{D(I_y^{(j)})/D(I_{\bar y}^{(j)})}{D(I_y^{(j)})/D(I_{\bar y}^{(j)}) + 1}. \qquad (77)$$
We next observe that for any $0 \le \epsilon' < 1$ and $\rho, d > 0$, if $\hat\rho \in [1-\epsilon',\, 1+\epsilon'] \cdot d$ then we have $\frac{\hat\rho}{\hat\rho+1} \in [1-\frac{\epsilon'}{2},\, 1+\epsilon'] \cdot \frac{d}{d+1}$ (by straightforward algebra). Applying this $M$ times, we get
$$\prod_{j=1}^{M} \frac{\hat\rho_j}{\hat\rho_j + 1} \in \left[\left(1 - \frac{\epsilon}{96\log N}\right)^{\!M}, \left(1 + \frac{\epsilon}{48\log N}\right)^{\!M}\right] \cdot \prod_{j=1}^{M} \frac{D(I_y^{(j)})/D(I_{\bar y}^{(j)})}{D(I_y^{(j)})/D(I_{\bar y}^{(j)}) + 1} \subseteq \left[1 - \frac{\epsilon}{12},\, 1 + \frac{\epsilon}{12}\right] \cdot D(y).$$
Since $\prod_{j=1}^{M} \frac{\hat\rho_j}{\hat\rho_j+1}$ is the value that the procedure outputs, the second part of the lemma is proved. The proof of the first part of the lemma is virtually identical.
The only difference is that now it is possible that Compare outputs High or Low at some call (since $D$ is not uniform, it need not be the case that $D(I_y^{(j)})/D(I_{\bar y}^{(j)}) = \rho^{(j)}$), but this is not a problem for item 1, since in that case Binary-Descent would output REJECT.

See Algorithm 14 for a description of the testing algorithm ICOND$_D$-Test-Uniform. We now prove Theorem 15:

Proof of Theorem 15:  Define $E_1$ to be the event that all calls to Binary-Descent satisfy the conclusions of Lemma 27. With a union bound over all these $t = 20/\epsilon$ calls, we have $\Pr[E_1] \ge 8/10$.

Completeness: Suppose $D = U$, and condition again on $E_1$. Since this implies that Binary-Descent will always return a value, the only case in which ICOND$_D$-Test-Uniform might reject is by reaching Line 5. However, since every value $\hat d_j$ returned by the procedure satisfies $\hat d_j \in [1-\epsilon/12,\, 1+\epsilon/12] \cdot \frac{1}{N}$, this can never happen.

Soundness: Suppose $d_{\mathrm{TV}}(D, U) \ge \epsilon$. Let $E_2$ be the event that at least one of the $y_i$'s drawn in Line 1 belongs to $H'$. As $D(H') \ge \epsilon/4$, we have $\Pr[E_2] \ge 1 - (1-\epsilon/4)^{20/\epsilon} \ge 9/10$. Conditioning on both $E_1$ and $E_2$, for such a $y_j$ one of the two cases below holds:

• either the call to Binary-Descent outputs REJECT, and ICOND$_D$-Test-Uniform outputs REJECT;

• or a value $\hat d_j$ is returned, for which $\hat d_j \ge (1 - \frac{\epsilon}{12})(1 + \frac{\epsilon}{4}) \cdot \frac{1}{N} > (1 + \epsilon/12)/N$ (where we used the fact that $E_1$ holds); and ICOND$_D$-Test-Uniform reaches Line 5 and rejects.

Since $\Pr[E_1 \cap E_2] \ge 7/10$, ICOND$_D$-Test-Uniform is correct with probability at least $2/3$. Finally, the claimed query complexity directly follows from the $t = \Theta(1/\epsilon)$ calls to Binary-Descent, each of which makes $\tilde O(\log^3 N/\epsilon^2)$ queries to ICOND$_D$.
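The per-level algebra in the proof of Lemma 27 (a multiplicatively accurate estimate $\hat\rho$ of a ratio $d$ yields a comparably accurate estimate of the child fraction $d/(d+1)$) is easy to sanity-check numerically. The sketch below (illustrative, not from the paper) verifies the slightly weaker containment $\frac{\hat\rho}{\hat\rho+1} \in [1-\epsilon',\, 1+\epsilon'] \cdot \frac{d}{d+1}$ over a grid of values.

```python
def child_fraction(rho):
    # maps a ratio D(I_y)/D(I_bar) to the fraction D(I_y)/(D(I_y)+D(I_bar))
    return rho / (1.0 + rho)

def accuracy_preserved(eps_prime, d):
    # endpoints of the image of [1-eps', 1+eps']*d under child_fraction
    lo = child_fraction((1 - eps_prime) * d)
    hi = child_fraction((1 + eps_prime) * d)
    target = child_fraction(d)
    # the multiplicative error does not grow through the map rho -> rho/(1+rho)
    return (1 - eps_prime) * target <= lo and hi <= (1 + eps_prime) * target
```

Since the error per level does not grow, a $(1\pm\epsilon/(48\log N))$-accurate ratio at each of the $O(\log N)$ levels compounds to only a $(1\pm\epsilon/12)$ overall error, which is exactly the margin used at Line 4 of Algorithm 14.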
9  An $\Omega\!\left(\frac{\log N}{\log\log N}\right)$ lower bound for ICOND$_D$ algorithms that test uniformity

In this section we prove that any ICOND$_D$ algorithm that $\epsilon$-tests uniformity, even for constant $\epsilon$, must have query complexity $\tilde\Omega(\log N)$. This shows that our algorithm from the previous section is not too far from optimal, and sheds light on a key difference between ICOND and PCOND oracles.

Theorem 16  Fix $\epsilon = 1/3$. Any ICOND$_D$ algorithm for testing whether $D = U$ versus $d_{\mathrm{TV}}(D, U) \ge \epsilon$ must make $\Omega\!\left(\frac{\log N}{\log\log N}\right)$ queries.

To prove this lower bound we define a probability distribution $\mathcal{P}_{\mathrm{No}}$ over possible "No"-distributions (i.e. distributions that have variation distance at least $1/3$ from $U$). A distribution drawn from $\mathcal{P}_{\mathrm{No}}$ is constructed as follows: first (assuming without loss of generality that $N$ is a power of 2), we partition $[N]$ into $b = 2^X$ consecutive intervals of the same size $\Delta = \frac{N}{2^X}$, which we refer to as "blocks", where $X$ is a random variable distributed uniformly on the set $\{\frac{1}{3}\log N, \frac{1}{3}\log N + 1, \dots, \frac{2}{3}\log N\}$. Once the block size $\Delta$ is determined, a random offset $y$ is drawn uniformly at random in $[N]$, and all block endpoints are shifted by $y$ modulo $N$ (intuitively, this prevents the testing algorithm from "knowing" a priori that specific points are endpoints of blocks). Finally, independently for each block, a fair coin is thrown to determine its profile: with probability $1/2$, each point in the first half of the block will have probability weight $\frac{1-2\epsilon}{N}$ and each point in the second half will have probability weight $\frac{1+2\epsilon}{N}$ (such a block is said to be a low-high block, with profile $\downarrow\uparrow$). With probability $1/2$ the reverse is true: each point in the first half has probability $\frac{1+2\epsilon}{N}$ and each point in the second half has probability $\frac{1-2\epsilon}{N}$ (a high-low block, $\uparrow\downarrow$).
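The construction of a draw from $\mathcal{P}_{\mathrm{No}}$ can be sketched in a few lines of Python (illustrative code with hypothetical naming; as in the paper, $N$ is assumed to be a power of 2):

```python
import random

def draw_no_distribution(N, eps, rng):
    """Sample D ~ P_No: random block size N/2^X, random cyclic offset,
    and an independent low-high / high-low profile for each block."""
    logN = N.bit_length() - 1            # N is assumed a power of 2
    X = rng.randint(logN // 3, (2 * logN) // 3)
    delta = N >> X                       # block size Delta = N / 2^X
    offset = rng.randrange(N)            # random cyclic shift of the block endpoints
    D = [0.0] * N
    for start in range(0, N, delta):
        low_high = rng.random() < 0.5    # profile: low-high w.p. 1/2, else high-low
        for j in range(delta):
            in_first_half = j < delta // 2
            light = in_first_half if low_high else not in_first_half
            D[(start + j + offset) % N] = (1 - 2 * eps) / N if light else (1 + 2 * eps) / N
    return D
```

Every draw has total mass 1 (each block carries exactly $\Delta/N$ regardless of its profile) and total variation distance exactly $\epsilon$ from uniform, since every point deviates from $1/N$ by exactly $2\epsilon/N$.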
It is clear that each distribution $D$ in the support of $\mathcal{P}_{\mathrm{No}}$ defined in this way indeed has $d_{\mathrm{TV}}(D, U) = \epsilon$. To summarize, each "No"-distribution $D$ in the support of $\mathcal{P}_{\mathrm{No}}$ is parameterized by $(b+2)$ parameters: its block size $\Delta$, offset $y$, and profile vector $\vartheta \in \{\downarrow\uparrow, \uparrow\downarrow\}^b$. Note that regardless of the profile vector, each block always has weight exactly $\Delta/N$.

We note that while there is only one "Yes"-distribution $U$, it will sometimes be convenient for the analysis to think of $U$ as resulting from the same initial process of picking a block size and offset, but without the subsequent choice of a profile vector. We sometimes refer to this as the "fake construction" of the uniform distribution $U$ (the reason for this will be clear later).

The proof of Theorem 16 will be carried out in two steps. First we shall restrict the analysis to non-adaptive algorithms, and prove the lower bound for such algorithms. This result will then be extended to the general setting by introducing (similarly to Subsection 5.2) the notion of a query-faking algorithm, and reducing the behavior of adaptive algorithms to non-adaptive ones through an appropriate sequence of such query-faking algorithms.

Before proceeding, we define the transcript of the interaction between an algorithm and an ICOND$_D$ oracle. Informally, the transcript captures the entire history of interaction between the algorithm and the ICOND$_D$ oracle during the whole sequence of queries.

Definition 7  Fix any (possibly adaptive) testing algorithm $A$ that queries an ICOND$_D$ oracle. The transcript of $A$ is a sequence $\mathcal{T} = (I_\ell, s_\ell)_{\ell \in \mathbb{N}^*}$ of pairs, where $I_\ell$ is the $\ell$-th interval provided by the algorithm as input to ICOND$_D$, and $s_\ell \in I_\ell$ is the response that ICOND$_D$ provides to this query. Given a transcript $\mathcal{T}$, we shall denote by $\mathcal{T}|_k$ the partial transcript induced by the first $k$ queries, i.e.
$\mathcal{T}|_k = (I_\ell, s_\ell)_{1 \le \ell \le k}$.

Equipped with these definitions, we now turn to proving the theorem in the special case of non-adaptive testing algorithms. Observe that there are three different sources of randomness in our arguments: (i) the draw of the "No"-instance from $\mathcal{P}_{\mathrm{No}}$; (ii) the internal randomness of the testing algorithm; and (iii) the random draws from the oracle. Whenever there could be confusion we shall explicitly state which probability space is under discussion.

9.1  A lower bound against non-adaptive algorithms

Throughout this subsection we assume that $A$ is an arbitrary, fixed, non-adaptive, randomized algorithm that makes exactly $q \le \tau \cdot \frac{\log N}{\log\log N}$ queries to ICOND$_D$; here $\tau > 0$ is some absolute constant that will be determined in the course of the analysis. (The assumption that $A$ always makes exactly $q$ queries is without loss of generality, since if in some execution the algorithm makes $q' < q$ queries, it can perform additional "dummy" queries.) In this setting algorithm $A$ corresponds to a distribution $P_A$ over $q$-tuples $\bar I = (I_1, \dots, I_q)$ of query intervals. The following theorem will directly imply Theorem 16 in the case of non-adaptive algorithms:

Theorem 17
$$\left|\, \Pr_{D \sim \mathcal{P}_{\mathrm{No}}}\!\left[A^{\mathrm{ICOND}_D} \text{ outputs ACCEPT}\right] - \Pr\!\left[A^{\mathrm{ICOND}_U} \text{ outputs ACCEPT}\right] \,\right| \le 1/5. \qquad (78)$$

Observe that in the first probability of Equation (78) the randomness is taken over the draw of $D$ from $\mathcal{P}_{\mathrm{No}}$, the draw of $\bar I \sim P_A$ that $A$ performs to select its sequence of query intervals, and the randomness of the ICOND$_D$ oracle. In the second one the randomness is just over the draw of $\bar I$ from $P_A$ and the randomness of the ICOND$_U$ oracle.

Intuition for Theorem 17.
The high-level idea is that the algorithm will not be able to distinguish between the uniform distribution and a "No"-distribution unless it manages to learn something about the "structure" of the blocks in the "No"-case, either by guessing (roughly) the right block size, or by guessing (roughly) the location of a block endpoint and querying a short interval containing such an endpoint. In more detail, we define the following "bad events" (over the choice of $D$ and the points $s_i$) for a fixed sequence $\bar I = (I_1, \dots, I_q)$ of queries (the dependence on $\bar I$ is omitted in the notation for the sake of readability):

$B^N_{\mathrm{size}} = \{\, \exists \ell \in [q] \mid \Delta/\log N \le |I_\ell| \le \Delta \cdot (\log N)^2 \,\}$

$B^N_{\mathrm{boundary}} = \{\, \exists \ell \in [q] \mid |I_\ell| < \Delta/\log N \text{ and } I_\ell \text{ intersects two blocks} \,\}$

$B^N_{\mathrm{middle}} = \{\, \exists \ell \in [q] \mid |I_\ell| < \Delta/\log N \text{ and } I_\ell \text{ intersects both halves of the same block} \,\}$

$B^N_{\ell,\mathrm{outer}} = \{\, \Delta \cdot (\log N)^2 < |I_\ell| \text{ and } s_\ell \text{ belongs to a block not contained entirely in } I_\ell \,\}, \quad \ell \in [q]$

$B^N_{\ell,\mathrm{collide}} = \{\, \Delta \cdot (\log N)^2 < |I_\ell| \text{ and } \exists j < \ell \text{ s.t. } s_\ell \text{ and } s_j \text{ belong to the same block} \,\}, \quad \ell \in [q]$

The first three events depend only on the draw of $D$ from $\mathcal{P}_{\mathrm{No}}$, which determines $\Delta$ and $y$, while the last $2q$ events also depend on the random draws of $s_\ell$ from the ICOND$_D$ oracle. We define in the same fashion the corresponding bad events for the "Yes"-instance (i.e. the uniform distribution $U$): $B^Y_{\mathrm{size}}$, $B^Y_{\mathrm{boundary}}$, $B^Y_{\mathrm{middle}}$, $B^Y_{\ell,\mathrm{outer}}$ and $B^Y_{\ell,\mathrm{collide}}$, using the notion of the "fake construction" of $U$ mentioned above.
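Whether a fixed query interval triggers the first bad events depends only on its length relative to $\Delta$ and on where the shifted block boundaries fall. A small sketch of these membership tests (illustrative Python with hypothetical 1-indexed conventions, not from the paper):

```python
def block_index(x, delta, offset, N):
    # index of the block containing domain point x in {1,...,N}, after the
    # block endpoints have been cyclically shifted by the random offset
    return ((x - 1 - offset) % N) // delta

def bad_size(interval_len, delta, log_N):
    # B_size: the interval length falls in the window [Delta/log N, Delta*(log N)^2]
    return delta / log_N <= interval_len <= delta * log_N ** 2

def bad_boundary(a, b, delta, offset, N, log_N):
    # B_boundary: a short interval that nonetheless straddles two blocks
    return (b - a + 1) < delta / log_N and \
        block_index(a, delta, offset, N) != block_index(b, delta, offset, N)
```

Since there are only $O(\log\log N)$ candidate block sizes in the size window and the offset is uniform over $[N]$, each of these predicates holds with small probability over the draw of $D$, which is what the next paragraphs quantify.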
Events $B^N_{\mathrm{size}}$ and $B^Y_{\mathrm{size}}$ correspond to the possibility, mentioned above, that algorithm $A$ "guesses" essentially the right block size, and events $B^N_{\mathrm{boundary}}$, $B^Y_{\mathrm{boundary}}$ and $B^N_{\mathrm{middle}}$, $B^Y_{\mathrm{middle}}$ correspond to the possibility that algorithm $A$ "guesses" a short interval containing, respectively, a block endpoint or a block midpoint. The final bad events correspond to $A$ guessing a "too-large" block size but "getting lucky" with the sample returned by ICOND, either because the sample belongs to one of the (at most two) outer blocks not entirely contained in the query interval, or because $A$ has already received a sample from the same block as the current sample.

We can now describe the failure events for both the uniform distribution and for a "No"-distribution as the union of the corresponding bad events:
$$B^N(\bar I) = B^N_{\mathrm{size}} \cup B^N_{\mathrm{boundary}} \cup B^N_{\mathrm{middle}} \cup \left(\bigcup_{\ell=1}^q B^N_{\ell,\mathrm{outer}}\right) \cup \left(\bigcup_{\ell=1}^q B^N_{\ell,\mathrm{collide}}\right)$$
$$B^Y(\bar I) = B^Y_{\mathrm{size}} \cup B^Y_{\mathrm{boundary}} \cup B^Y_{\mathrm{middle}} \cup \left(\bigcup_{\ell=1}^q B^Y_{\ell,\mathrm{outer}}\right) \cup \left(\bigcup_{\ell=1}^q B^Y_{\ell,\mathrm{collide}}\right)$$
These failure events can be interpreted, from the point of view of the algorithm $A$, as the "opportunity to potentially learn something"; we shall argue below that if the failure events do not occur then the algorithm gains no information about whether it is interacting with the uniform distribution or with a "No"-distribution.

Structure of the proof of Theorem 17.  First, observe that since the transcript is the result of the interaction of the algorithm and the oracle on a randomly chosen distribution, it is itself a random variable; we will be interested in the distribution over this random variable induced by the draws from the oracle and the choice of $D$. More precisely, for a fixed sequence of query sets $\bar I$, let $Z^N_{\bar I}$ denote the random variable over "No"-transcripts generated when $D$ is drawn from $\mathcal{P}_{\mathrm{No}}$.
Note that this is a random variable over the probability space defined by the random draw of $D$ and the draws of $s_\ell$ by ICOND$_D(I_\ell)$. We define $A^N_{\bar I}$ as the resulting distribution over these "No"-transcripts. Similarly, $Z^Y_{\bar I}$ will be the random variable over "Yes"-transcripts, with corresponding distribution $A^Y_{\bar I}$.

As noted earlier, the non-adaptive algorithm $A$ corresponds to a distribution $P_A$ over $q$-tuples $\bar I$ of query intervals. We define $A^N$ as the distribution over transcripts corresponding to first drawing $\bar I$ from $P_A$ and then making a draw from $A^N_{\bar I}$. Similarly, we define $A^Y$ as the distribution over transcripts corresponding to first drawing $\bar I$ from $P_A$ and then making a draw from $A^Y_{\bar I}$. To prove Theorem 17 it is sufficient to show that the two distributions over transcripts described above are statistically close:

Lemma 28  $d_{\mathrm{TV}}(A^Y, A^N) \le 1/5$.

The proof of this lemma is structured as follows: first, for any fixed sequence of $q$ queries $\bar I$, we bound the probability of the failure events, both for the uniform and the "No"-distributions:

Claim 29  For each fixed sequence $\bar I$ of $q$ query intervals, we have $\Pr[B^Y(\bar I)] \le 1/10$ and $\Pr_{D \sim \mathcal{P}_{\mathrm{No}}}[B^N(\bar I)] \le 1/10$.

(Note that the first probability above is taken over the randomness of the ICOND$_U$ responses and the choice of offset and size in the "fake construction" of $U$, while the second is over the random draw of $D \sim \mathcal{P}_{\mathrm{No}}$ and over the ICOND$_D$ responses.)

Next we show that, provided the failure events do not occur, the distribution over transcripts is exactly the same in both cases:

Claim 30  Fix any sequence $\bar I = (I_1, \dots, I_q)$ of $q$ queries. Then, conditioned on their respective failure events not happening, $Z^N_{\bar I}$ and $Z^Y_{\bar I}$ are identically distributed: for every transcript
$\mathcal{T} = ((I_1, s_1), \dots, (I_q, s_q))$,
$$\Pr\!\left[ Z^N_{\bar I} = \mathcal{T} \,\middle|\, \overline{B^N(\bar I)} \right] = \Pr\!\left[ Z^Y_{\bar I} = \mathcal{T} \,\middle|\, \overline{B^Y(\bar I)} \right].$$

Finally, we combine these two claims to show that the two overall distributions of transcripts are statistically close:

Claim 31  Fix any sequence of $q$ queries $\bar I = (I_1, \dots, I_q)$. Then $d_{\mathrm{TV}}(A^N_{\bar I}, A^Y_{\bar I}) \le 1/5$.

Lemma 28 (and thus Theorem 17) directly follows from Claim 31 since, using the notation $\bar s = (s_1, \dots, s_q)$ for a sequence of $q$ answers to a sequence $\bar I = (I_1, \dots, I_q)$ of $q$ queries, which together define a transcript $\mathcal{T}(\bar I, \bar s) = ((I_1, s_1), \dots, (I_q, s_q))$,
$$d_{\mathrm{TV}}(A^Y, A^N) = \frac12 \sum_{\bar I} \sum_{\bar s} \left| P_A(\bar I) \cdot \Pr\!\left[Z^Y_{\bar I} = \mathcal{T}(\bar I, \bar s)\right] - P_A(\bar I) \cdot \Pr\!\left[Z^N_{\bar I} = \mathcal{T}(\bar I, \bar s)\right] \right|$$
$$= \frac12 \sum_{\bar I} P_A(\bar I) \cdot \sum_{\bar s} \left| \Pr\!\left[Z^Y_{\bar I} = \mathcal{T}(\bar I, \bar s)\right] - \Pr\!\left[Z^N_{\bar I} = \mathcal{T}(\bar I, \bar s)\right] \right| \le \max_{\bar I}\, d_{\mathrm{TV}}\!\left(A^Y_{\bar I}, A^N_{\bar I}\right) \le 1/5. \qquad (79)$$
This concludes the proof of Lemma 28 modulo the proofs of the above claims; we give those proofs in Subsection 9.1.1 below.

9.1.1  Proof of Claims 29 to 31

To prove Claim 29 we bound the probability of each of the bad events separately, starting with the "No"-case.

(i) Defining the event $B^N_{\ell,\mathrm{size}}$ as $B^N_{\ell,\mathrm{size}} = \{ \Delta/\log N \le |I_\ell| \le \Delta \cdot (\log N)^2 \}$, we can use a union bound to get $\Pr[B^N_{\mathrm{size}}] \le \sum_{\ell=1}^q \Pr[B^N_{\ell,\mathrm{size}}]$. For any fixed setting of $I_\ell$ there are $O(\log\log N)$ values of $\Delta \in \{ \frac{N}{2^X} \mid X \in \{\frac13 \log N, \dots, \frac23 \log N\} \}$ for which $\Delta/\log N \le |I_\ell| \le \Delta \cdot (\log N)^2$. Hence we have $\Pr[B^N_{\ell,\mathrm{size}}] = O((\log\log N)/\log N)$, and consequently $\Pr[B^N_{\mathrm{size}}] = O(q(\log\log N)/\log N)$.

(ii) Similarly, defining the event $B^N_{\ell,\mathrm{boundary}}$ as $B^N_{\ell,\mathrm{boundary}} = \{ |I_\ell| < \Delta/\log N \text{ and } I_\ell \text{ intersects two blocks} \}$, we have $\Pr[B^N_{\mathrm{boundary}}] \le \sum_{\ell=1}^q \Pr[B^N_{\ell,\mathrm{boundary}}]$.
For any fixed setting of $I_\ell$, recalling the choice of a uniform random offset $y \in [N]$ for the blocks, we have that $\Pr[B^N_{\ell,\mathrm{boundary}}] \le O(1/\log N)$, and consequently $\Pr[B^N_{\mathrm{boundary}}] = O(q/\log N)$.

(iii) The analysis of $B^N_{\mathrm{middle}}$ is identical (by considering the midpoint of a block instead of its endpoint), yielding directly $\Pr[B^N_{\mathrm{middle}}] = O(q/\log N)$.

(iv) Fix $\ell \in [q]$ and recall that $B^N_{\ell,\mathrm{outer}} = \{ \Delta \cdot (\log N)^2 < |I_\ell| \text{ and } s_\ell \text{ belongs to a block not contained entirely in } I_\ell \}$. Fix any outcome for $\Delta$ such that $\Delta \cdot (\log N)^2 < |I_\ell|$ and let us consider only the randomness over the draw of $s_\ell$ from $I_\ell$. Since there are $\Omega((\log N)^2)$ blocks contained entirely in $I_\ell$, the probability that $s_\ell$ is drawn from a block not contained entirely in $I_\ell$ (there are at most two such blocks, one at each end of $I_\ell$) is $O(1/(\log N)^2)$. Hence we have $\Pr[B^N_{\ell,\mathrm{outer}}] \le O(1)/(\log N)^2$.

(v) Finally, recall that $B^N_{\ell,\mathrm{collide}} = \{ \Delta \cdot (\log N)^2 < |I_\ell| \text{ and } \exists j < \ell \text{ s.t. } s_\ell \text{ and } s_j \text{ belong to the same block} \}$. Fix $\ell \in [q]$ and a query interval $I_\ell$. Let $r_\ell$ be the number of blocks in $I_\ell$ within which resides some previously sampled point $s_j$, $j \in [\ell-1]$. Since there are $\Omega((\log N)^2)$ blocks in $I_\ell$ and $r_\ell \le \ell - 1$, the probability that $s_\ell$ is drawn from a block containing any $s_j$, $j < \ell$, is $O(\ell/(\log N)^2)$. Hence we have $\Pr[B^N_{\ell,\mathrm{collide}}] = O(\ell/(\log N)^2)$.

With these probability bounds for bad events in hand, we can prove Claim 29:

Proof of Claim 29:  Recall that $q \le \tau \cdot \frac{\log N}{\log\log N}$.
Recalling the definition of $B^N(\bar I)$, a union bound yields
$$\Pr[B^N(\bar I)] \le \Pr[B^N_{\mathrm{size}}] + \Pr[B^N_{\mathrm{boundary}}] + \Pr[B^N_{\mathrm{middle}}] + \sum_{\ell=1}^q \Pr[B^N_{\ell,\mathrm{outer}}] + \sum_{\ell=1}^q \Pr[B^N_{\ell,\mathrm{collide}}]$$
$$= O\!\left(q \cdot \frac{\log\log N}{\log N}\right) + O\!\left(\frac{q}{\log N}\right) + O\!\left(\frac{q}{\log N}\right) + \sum_{\ell=1}^q O\!\left(\frac{1}{(\log N)^2}\right) + \sum_{\ell=1}^q O\!\left(\frac{\ell}{(\log N)^2}\right) \le \frac{1}{10},$$
where the last inequality holds for a sufficiently small choice of the absolute constant $\tau$. The same analysis applies unchanged for $\Pr[B^Y_{\mathrm{size}}]$, $\Pr[B^Y_{\mathrm{middle}}]$ and $\Pr[B^Y_{\mathrm{boundary}}]$, using the "fake construction" view of $U$ as described earlier. The arguments for $\Pr[B^Y_{\ell,\mathrm{outer}}]$ and $\Pr[B^Y_{\ell,\mathrm{collide}}]$ go through unchanged as well, and Claim 29 is proved.

Proof of Claim 30:  Fix any $\bar I = (I_1, \dots, I_q)$ and any transcript $\mathcal{T} = ((I_1, s_1), \dots, (I_q, s_q))$. Recall that the length-$\ell$ partial transcript $\mathcal{T}|_\ell$ is defined to be $((I_1, s_1), \dots, (I_\ell, s_\ell))$. We define the random variables $Z^N_{\bar I,\ell}$ and $Z^Y_{\bar I,\ell}$ to be the length-$\ell$ prefixes of $Z^N_{\bar I}$ and $Z^Y_{\bar I}$ respectively. We prove Claim 30 by establishing the following, which we prove by induction on $\ell$:
$$\Pr\!\left[ Z^N_{\bar I,\ell} = \mathcal{T}|_\ell \,\middle|\, \overline{B^N(\bar I)} \right] = \Pr\!\left[ Z^Y_{\bar I,\ell} = \mathcal{T}|_\ell \,\middle|\, \overline{B^Y(\bar I)} \right]. \qquad (80)$$
For the base case, it is clear that (80) holds with $\ell = 0$. For the inductive step, suppose (80) holds for all $k \in [\ell-1]$. When querying $I_\ell$ at the $\ell$-th step, one of the following cases must hold (since we conditioned on the "bad events" not happening):

(1) $I_\ell$ is contained within a half-block (more precisely, either entirely within the first half of a block or entirely within the second half). In this case the "Yes" and "No" distribution oracles behave exactly the same, since both generate $s_\ell$ by sampling uniformly from $I_\ell$.

(2) The point $s_\ell$ belongs to a block, contained entirely in $I_\ell$, which is "fresh" in the sense that it contains no $s_j$, $j < \ell$.
In the "No"-case this block may be either high-low or low-high; but since both outcomes have the same probability, there is another transcript with equal probability in which the two profiles are switched. Consequently (over the randomness in the draw of $D \sim \mathcal{P}_{\mathrm{No}}$) the probability of picking $s_\ell$ in the "No"-distribution case is the same as in the uniform distribution case (i.e., uniform on the fresh blocks contained in $I_\ell$). This concludes the proof of Claim 30.

Proof of Claim 31:  Given Claims 29 and 30, Claim 31 is an immediate consequence of the following basic fact:

Fact 32  Let $D_1, D_2$ be two distributions over the same finite set $X$. Let $E_1, E_2$ be two events such that $D_i[E_i] = \alpha_i \le \alpha$ for $i = 1, 2$, and the conditional distributions $(D_i)_{\overline{E_i}}$ are identical, i.e. $d_{\mathrm{TV}}((D_1)_{\overline{E_1}}, (D_2)_{\overline{E_2}}) = 0$. Then $d_{\mathrm{TV}}(D_1, D_2) \le \alpha$.

Proof:  We first observe that since $(D_2)_{\overline{E_2}}(E_2) = 0$ and $(D_1)_{\overline{E_1}}$ is identical to $(D_2)_{\overline{E_2}}$, it must be the case that $(D_1)_{\overline{E_1}}(E_2) = 0$, and likewise $(D_2)_{\overline{E_2}}(E_1) = 0$. This implies that $D_1(E_2 \setminus E_1) = D_2(E_1 \setminus E_2) = 0$. Now let us write
$$2 d_{\mathrm{TV}}(D_1, D_2) = \sum_{x \in X \setminus (E_1 \cup E_2)} |D_1(x) - D_2(x)| + \sum_{x \in E_1 \cap E_2} |D_1(x) - D_2(x)| + \sum_{x \in E_1 \setminus E_2} |D_1(x) - D_2(x)| + \sum_{x \in E_2 \setminus E_1} |D_1(x) - D_2(x)|.$$
We may upper bound $\sum_{x \in E_1 \cap E_2} |D_1(x) - D_2(x)|$ by $\sum_{x \in E_1 \cap E_2} (D_1(x) + D_2(x)) = D_1(E_1 \cap E_2) + D_2(E_1 \cap E_2)$, and the above discussion gives $\sum_{x \in E_1 \setminus E_2} |D_1(x) - D_2(x)| = D_1(E_1 \setminus E_2)$ and $\sum_{x \in E_2 \setminus E_1} |D_1(x) - D_2(x)| = D_2(E_2 \setminus E_1)$. We thus have
$$2 d_{\mathrm{TV}}(D_1, D_2) \le \sum_{x \in X \setminus (E_1 \cup E_2)} |D_1(x) - D_2(x)| + D_1(E_1) + D_2(E_2) \le \sum_{x \in X \setminus (E_1 \cup E_2)} |D_1(x) - D_2(x)| + \alpha_1 + \alpha_2.$$
Finally, since $d_{\mathrm{TV}}((D_1)_{\overline{E_1}}, (D_2)_{\overline{E_2}}) = 0$, we have
$$\sum_{x \in X \setminus (E_1 \cup E_2)} |D_1(x) - D_2(x)| = \left| D_1(X \setminus (E_1 \cup E_2)) - D_2(X \setminus (E_1 \cup E_2)) \right| = |D_1(E_1) - D_2(E_2)| = |\alpha_1 - \alpha_2|.$$
Thus $2 d_{\mathrm{TV}}(D_1, D_2) \le |\alpha_1 - \alpha_2| + \alpha_1 + \alpha_2 = 2\max\{\alpha_1, \alpha_2\} \le 2\alpha$, and the fact is established. This concludes the proof of Claim 31.

9.2  A lower bound against adaptive algorithms: outline of the proof of Theorem 16

Throughout this subsection $A$ denotes a general adaptive algorithm that makes $q \le \tau \cdot \frac{\log N}{\log\log N}$ queries, where as before $\tau > 0$ is an absolute constant. Theorem 16 is a consequence of the following theorem, which deals with adaptive algorithms:

Theorem 18
$$\left|\, \Pr_{D \sim \mathcal{P}_{\mathrm{No}}}\!\left[A^{\mathrm{ICOND}_D} \text{ outputs ACCEPT}\right] - \Pr\!\left[A^{\mathrm{ICOND}_U} \text{ outputs ACCEPT}\right] \,\right| \le 1/5. \qquad (81)$$

The idea here is to extend the previous analysis for non-adaptive algorithms, and argue that "adaptiveness does not really help" to distinguish between $D = U$ and $D \sim \mathcal{P}_{\mathrm{No}}$ given access to ICOND$_D$. As in the non-adaptive case, in order to prove Theorem 18 it is sufficient to prove that the transcripts for uniform and "No"-distributions are close in total variation distance; i.e., that
$$d_{\mathrm{TV}}(A^Y, A^N) \le 1/5. \qquad (82)$$
The key idea used to prove this will be to introduce a sequence $A^{(k),N}_{\mathrm{otf}}$ of distributions over transcripts (where "otf" stands for "on the fly"), for $0 \le k \le q$, such that (i) $A^{(0),N}_{\mathrm{otf}} = A^N$ and $A^{(q),N}_{\mathrm{otf}} = A^Y$, and (ii) the distance $d_{\mathrm{TV}}(A^{(k),N}_{\mathrm{otf}}, A^{(k+1),N}_{\mathrm{otf}})$ for each $0 \le k \le q-1$ is "small". This will enable us to conclude by the triangle inequality, as
$$d_{\mathrm{TV}}(A^N, A^Y) = d_{\mathrm{TV}}\!\left(A^{(0),N}_{\mathrm{otf}}, A^{(q),N}_{\mathrm{otf}}\right) \le \sum_{k=0}^{q-1} d_{\mathrm{TV}}\!\left(A^{(k),N}_{\mathrm{otf}}, A^{(k+1),N}_{\mathrm{otf}}\right).$$
(83)

To define this sequence, in the next subsection we will introduce the notion of an extended transcript, which in addition to the queries and samples includes additional information about the "local structure" of the distribution at the endpoints of the query intervals and the sample points. Intuitively, this extra information will help us analyze the interaction between the adaptive algorithm and the oracle. We will then describe an alternative process according to which a "faking algorithm" (reminiscent of the similar notion from Subsection 5.2) can interact with an oracle to generate such an extended transcript. More precisely, we shall define a sequence of such faking algorithms, parameterized by "how much faking" they perform. For both the original ("non-faking") algorithm $A$ and for the faking algorithms, we will show how extended transcripts can be generated "on the fly". The aforementioned distributions $A^{(k),N}_{\mathrm{otf}}$ over (regular) transcripts are obtained by truncating the extended transcripts that are generated on the fly (i.e., discarding the extra information), and we shall argue that they satisfy requirements (i) and (ii) above.

Before turning to the precise definitions and the analysis of extended transcripts and faking algorithms, we provide the following variant of Fact 32, which will come in handy when we bound the right-hand side of Equation (83).

Fact 33  Let $D_1, D_2$ be two distributions over the same finite set $X$. Let $E$ be an event such that $D_i[E] = \alpha_i \le \alpha$ for $i = 1, 2$, and the conditional distributions $(D_1)_{\overline E}$ and $(D_2)_{\overline E}$ are statistically close, i.e. $d_{\mathrm{TV}}((D_1)_{\overline E}, (D_2)_{\overline E}) = \beta$. Then $d_{\mathrm{TV}}(D_1, D_2) \le \alpha + \beta$.

Proof:  As in the proof of Fact 32, let us write
$$2 d_{\mathrm{TV}}(D_1, D_2) = \sum_{x \in X \setminus E} |D_1(x) - D_2(x)| + \sum_{x \in E} |D_1(x) - D_2(x)|.$$
We may upper bound $\sum_{x \in E} |D_1(x) - D_2(x)|$ by $\sum_{x \in E} (D_1(x) + D_2(x)) = D_1(E) + D_2(E) = \alpha_1 + \alpha_2$; furthermore,
$$\sum_{x \in \overline E} |D_1(x) - D_2(x)| = \sum_{x \in \overline E} \left| (D_1)_{\overline E}(x) \cdot D_1(\overline E) - (D_2)_{\overline E}(x) \cdot D_2(\overline E) \right|$$
$$\le D_1(\overline E) \cdot \sum_{x \in \overline E} \left| (D_1)_{\overline E}(x) - (D_2)_{\overline E}(x) \right| + \left| D_1(\overline E) - D_2(\overline E) \right| \cdot (D_2)_{\overline E}(\overline E) \le (1 - \alpha_1) \cdot (2\beta) + |\alpha_2 - \alpha_1| \cdot 1 \le 2\beta + |\alpha_2 - \alpha_1|.$$
Thus $2 d_{\mathrm{TV}}(D_1, D_2) \le 2\beta + |\alpha_1 - \alpha_2| + \alpha_1 + \alpha_2 = 2\beta + 2\max\{\alpha_1, \alpha_2\} \le 2(\alpha + \beta)$, and the fact is established.

9.3  Extended transcripts and drawing $D \sim \mathcal{P}_{\mathrm{No}}$ on the fly

Observe that the testing algorithm, seeing only pairs of queries and answers, does not have direct access to all the underlying information, namely, in the case of a "No"-distribution, whether the profile of the block that the sample point comes from is $\downarrow\uparrow$ or $\uparrow\downarrow$. It will be useful for us to consider an "extended" version of the transcripts, which includes this information along with information about the profile of the "boundary" blocks for each queried interval, even though this information is not directly available to the algorithm.

Definition 8  With the same notation as in Definition 7, the extended transcript of a sequence of queries made by $A$ and the corresponding responses is a sequence $\mathcal{E} = (I_\ell, s_\ell, b_\ell)_{\ell \in [q]}$ of triples, where $I_\ell$ and $s_\ell$ are as before, and $b_\ell = (b^L_\ell, b^{\mathrm{samp}}_\ell, b^R_\ell) \in \{\downarrow\uparrow, \uparrow\downarrow\}^3$ is a triple defined as follows. Let $B_{i_L}, \dots, B_{i_R}$ be the blocks that $I_\ell$ intersects, going from left to right. Then:

1. $b^L_\ell$ is the profile of the block $B_{i_L}$;
2. $b^R_\ell$ is the profile of the block $B_{i_R}$;
3. $b^{\mathrm{samp}}_\ell$ is the profile of the block $B_\ell \in \{B_{i_L}, \dots, B_{i_R}\}$ that $s_\ell$ belongs to.

We define $\mathcal{E}|_k$ to be the length-$k$ prefix of an extended transcript $\mathcal{E}$.
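Facts 32 and 33 can be sanity-checked on small examples: take two distributions that agree after conditioning away their respective rare events. The sketch below (illustrative, not from the paper) builds such a pair and confirms that $d_{\mathrm{TV}}(D_1, D_2) = \max\{\alpha_1, \alpha_2\}$, matching the bound of Fact 32 with equality.

```python
def tv(p, q):
    # total variation distance between two distributions given as lists
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

# common conditional distribution outside the events, supported on points {0, 1}
Q = [0.7, 0.3]
a1, a2 = 0.05, 0.08                      # alpha_i = D_i(E_i), with E1 = {2}, E2 = {3}
D1 = [(1 - a1) * Q[0], (1 - a1) * Q[1], a1, 0.0]
D2 = [(1 - a2) * Q[0], (1 - a2) * Q[1], 0.0, a2]
# conditioned on the complements of E1 resp. E2, both distributions equal Q,
# so Fact 32 gives d_TV(D1, D2) <= max(a1, a2); here it holds with equality
```

Fact 33 is the robust version of the same computation: if the two conditionals are only $\beta$-close rather than identical, the bound degrades additively to $\alpha + \beta$.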
As was briefly discussed prior to the current subsection, we shall be interested in considering algorithms that fake some answers to their queries. Specifically, given an adaptive algorithm $A$, we define $A^{(1)}$ as the algorithm that fakes its first query, in the following sense: if the first query made by $A$ to the oracle is some interval $I$, then the algorithm $A^{(1)}$ does not call ICOND on $I$ but instead chooses a point $s$ uniformly at random from $I$, and then behaves exactly as $A$ would behave if the ICOND oracle had returned $s$ in response to the query $I$. More generally, we define $A^{(k)}$ for all $0 \le k \le q$ as the algorithm behaving like $A$ but faking its first $k$ queries (note that $A^{(0)} = A$).

In Subsection 9.3.1 we explain how extended transcripts can be generated for $A^{(0)} = A$ in an "on the fly" fashion, so that the resulting distribution over extended transcripts is the same as the one that would result from first drawing $D$ from $\mathcal{P}_{\mathrm{No}}$ and then running algorithm $A$ on it. It follows that when we remove the extension to the transcript so as to obtain a regular transcript, we get a distribution over transcripts that is identical to $A^N$. In Subsection 9.3.2 we explain how to generate extended transcripts for $A^{(k)}$, where $0 \le k \le q$. We note that for $k \ge 1$ the resulting distribution over extended transcripts is not the same as the one that would result from first drawing $D$ from $\mathcal{P}_{\mathrm{No}}$ and then running algorithm $A^{(k)}$ on it. However, this is not necessary for our purposes: it is sufficient that the distributions corresponding to pairs of consecutive indices $(k, k+1)$ are similar (including the pair $(0,1)$), and that for $k = q$ the distribution over regular transcripts obtained by removing the extension to the transcript is identical to $A^Y$.
9.3.1 Extended transcripts for $A = A^{(0)}$

Our proof of Equation (82) takes advantage of the fact that one can view the draw of a "No"-distribution from $\mathcal{P}_{No}$ as being done "on the fly" during the course of algorithm $A$'s execution. First, the size $\Delta$ and the offset $y$ are drawn at the very beginning, but we may view the profile vector $\vartheta$ as having its components chosen independently, coordinate by coordinate, only as $A$ interacts with ICOND: each time an element $s_\ell$ is obtained in response to the $\ell$-th query $I_\ell$, only then are the elements of the profile vector $\vartheta$ corresponding to the three coordinates of $b_\ell$ chosen (if they were not already completely determined by previous calls to ICOND). More precise details follow.

Consider the $\ell$-th query $I_\ell$ that $A$ makes to $\mathrm{ICOND}_D$. Inductively, some coordinates of $\vartheta$ may have already been set by previous queries. Let $B_{i_L}, \ldots, B_{i_R}$ be the blocks that $I_\ell$ intersects. First, if the coordinate of $\vartheta$ corresponding to block $B_{i_L}$ was not already set by a previous query, a fair coin is tossed to choose a setting from $\{\downarrow\uparrow, \uparrow\downarrow\}$ for this coordinate. Likewise, if the coordinate of $\vartheta$ corresponding to block $B_{i_R}$ was not already set (either by a previous query or because $i_R = i_L$), a fair coin is tossed to choose a setting from $\{\downarrow\uparrow, \uparrow\downarrow\}$ for this coordinate. At this point, the values of $b^L_\ell$ and $b^R_\ell$ have been set.

A simple but important observation is that these outcomes of $b^L_\ell$ and $b^R_\ell$ completely determine the probabilities (call them $\alpha_L$ and $\alpha_R$, respectively) that the block $B_\ell$ from which $s_\ell$ will be chosen is $B_{i_L}$ (is $B_{i_R}$, respectively), as we explain in more detail next. If $i_R = i_L$ then there is no choice to be made, and so assume that $i_R > i_L$.
For $K \in \{L, R\}$, let $\rho^K_1 \cdot \Delta$ be the size of the intersection of $I_\ell$ with the first (left) half of $B_{i_K}$, and let $\rho^K_2 \cdot \Delta$ be the size of the intersection of $I_\ell$ with the second (right) half of $B_{i_K}$. Note that $0 < \rho^K_1 + \rho^K_2 \le 1$, that $\rho^L_1 = 0$ when $\rho^L_2 \le 1/2$, and similarly that $\rho^R_2 = 0$ when $\rho^R_1 \le 1/2$. If $b^K_\ell = \uparrow\downarrow$ then let

$$w_K = \rho^K_1 \cdot (1 + 2\epsilon) + \rho^K_2 \cdot (1 - 2\epsilon) = \rho^K_1 + \rho^K_2 + 2\epsilon(\rho^K_1 - \rho^K_2),$$

and if $b^K_\ell = \downarrow\uparrow$ then let $w_K = \rho^K_1 + \rho^K_2 - 2\epsilon(\rho^K_1 - \rho^K_2)$. We now set

$$\alpha_K = \frac{w_K}{w_L + w_R + (i_R - i_L - 1)}.$$

The block $B_{i_L}$ is selected with probability $\alpha_L$, the block $B_{i_R}$ is selected with probability $\alpha_R$, and for $i_R \ge i_L + 2$, each of the other blocks is selected with equal probability, $\frac{1}{w_L + w_R + (i_R - i_L - 1)}$.

Given the selection of the block $B_\ell$ as described above, the element $s_\ell$ and the profile $b^{samp}_\ell$ of the block to which it belongs are selected as follows. If the coordinate of $\vartheta$ corresponding to $B_\ell$ has already been determined, then $b^{samp}_\ell$ is set to this value and $s_\ell$ is drawn from $B_\ell$ as determined by the $\downarrow\uparrow$ or $\uparrow\downarrow$ setting of $b^{samp}_\ell$. Otherwise, a fair coin is tossed, $b^{samp}_\ell$ is set either to $\downarrow\uparrow$ or to $\uparrow\downarrow$ depending on the outcome, and $s_\ell$ is drawn from $B_\ell$ as in the previous case (as determined by the setting of $b^{samp}_\ell$). Now all of $I_\ell$, $s_\ell$, and $b_\ell = (b^L_\ell, b^{samp}_\ell, b^R_\ell)$ have been determined, and the triple $(I_\ell, s_\ell, b_\ell)$ is taken as the $\ell$-th element of the extended transcript.

We now define $A^{(k),N}_{otf}$ for $k = 0$ as follows. A draw from this distribution over (non-extended) transcripts is obtained by first drawing an extended transcript $(I_1, s_1, b_1), \ldots, (I_q, s_q, b_q)$ from the on-the-fly process described above, and then removing the third element of each triple to yield $(I_1, s_1), \ldots, (I_q, s_q)$.
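For concreteness, the end-block weights $w_K$ and the selection probabilities $\alpha_K$ described above can be sketched as follows. The helper names and the string encodings of the profiles (`"ud"` for $\uparrow\downarrow$, `"du"` for $\downarrow\uparrow$) are hypothetical choices for this illustration, not notation from the paper; the snippet also checks that the resulting block-selection probabilities sum to one.

```python
from fractions import Fraction as F

def end_block_weight(rho1, rho2, profile, eps):
    """Weight w_K of an end block: rho1 (rho2) is the fraction of a block
    covered by the query interval in its left (right) half; profile is
    'ud' for an up-down block or 'du' for a down-up block."""
    if profile == "ud":
        return rho1 * (1 + 2 * eps) + rho2 * (1 - 2 * eps)
    else:  # 'du': equivalently rho1 + rho2 - 2*eps*(rho1 - rho2)
        return rho1 * (1 - 2 * eps) + rho2 * (1 + 2 * eps)

def block_probs(wL, wR, n_inner):
    """Selection probabilities (alpha_L, alpha_R, per-inner-block prob):
    each fully contained inner block contributes weight 1 to the denominator."""
    denom = wL + wR + n_inner
    return wL / denom, wR / denom, F(1, 1) / denom

eps = F(1, 10)
# Left end block: interval covers its full right half plus a quarter of the left.
wL = end_block_weight(F(1, 4), F(1, 2), "ud", eps)
# Right end block: interval covers its full left half plus a third of the right.
wR = end_block_weight(F(1, 2), F(1, 3), "du", eps)
aL, aR, inner = block_probs(wL, wR, 3)   # three fully contained inner blocks
assert aL + aR + 3 * inner == 1          # selection probabilities sum to 1
```

The final assertion reflects the normalization built into the definition of $\alpha_K$: the denominator $w_L + w_R + (i_R - i_L - 1)$ is exactly the total weight of all intersected blocks.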
This is exactly the distribution over transcripts that is obtained by first drawing $D$ from $\mathcal{P}_{No}$ and then running $A$ on it.

9.3.2 Extended transcripts for $A^{(k)}$, $k \ge 0$

In this subsection we define the distribution $A^{(k),N}_{otf}$ for $0 \le k \le q$ (the definition we give below coincides with our definition from the previous subsection for $k = 0$). Here too the size $\Delta$ and the offset $y$ are drawn at the very beginning, and the coordinates of the profile vector $\vartheta$ are chosen on the fly, together with the sample points. For each $\ell > k$, the pair $(s_\ell, b_\ell)$ is selected exactly as was described for $A$, conditioned on the length-$k$ prefix of the extended transcript and the new query $I_\ell$ (as well as the choice of $(\Delta, y)$). It remains to explain how the selection is made for $1 \le \ell \le k$.

Consider a value $1 \le \ell \le k$ and the $\ell$-th query interval $I_\ell$. As in our description of the "on-the-fly" process for $A$, inductively some coordinates of $\vartheta$ may have already been set by previous queries. Let $B_{i_L}, \ldots, B_{i_R}$ be the blocks that $I_\ell$ intersects. As in the process for $A$, if the coordinate of $\vartheta$ corresponding to block $B_{i_L}$ was not already set by a previous query, a fair coin is tossed to choose a setting from $\{\downarrow\uparrow, \uparrow\downarrow\}$ for this coordinate. Likewise, if the coordinate of $\vartheta$ corresponding to block $B_{i_R}$ was not already set (either by a previous query or because $i_L = i_R$), a fair coin is tossed to choose a setting from $\{\downarrow\uparrow, \uparrow\downarrow\}$ for this coordinate. Hence, $b^L_\ell$ and $b^R_\ell$ are set exactly as described for $A$.

We now explain how to set the probabilities $\alpha_L$ and $\alpha_R$ of selecting the block $B_\ell$ (from which $s_\ell$ is chosen) to be $B_{i_L}$ and $B_{i_R}$, respectively. Since the "faking" process should choose $s_\ell$ to be a uniform point from $I_\ell$, the probability $\alpha_L$ is simply $|B_{i_L} \cap I_\ell| / |I_\ell|$, and similarly for $\alpha_R$. (If $i_L = i_R$ we take $\alpha_L = 1$ and $\alpha_R = 0$.
) Thus the values of $\alpha_L$ and $\alpha_R$ are completely determined by the number of blocks that $I_\ell$ intersects and the relative sizes of the intersection of $I_\ell$ with $B_{i_L}$ and with $B_{i_R}$. Now, with probability $\alpha_L$ the block $B_\ell$ is chosen to be $B_{i_L}$, with probability $\alpha_R$ it is chosen to be $B_{i_R}$, and with probability $1 - \alpha_L - \alpha_R$ it is chosen uniformly among $\{B_{i_L+1}, \ldots, B_{i_R-1}\}$.

Given the selection of the block $B_\ell$ as described above, $s_\ell$ is chosen to be a uniform random element of $B_\ell \cap I_\ell$. The profile $b^{samp}_\ell$ of $B_\ell$ is selected as follows:

1. If the coordinate of $\vartheta$ corresponding to $B_\ell$ has already been determined (either by a previous query or because $B_\ell \in \{B_{i_L}, B_{i_R}\}$), then $b^{samp}_\ell$ is set accordingly.

2. Otherwise, the profile of $B_\ell$ was not already set; note that in this case it must hold that $B_\ell \notin \{B_{i_L}, B_{i_R}\}$. We look at the half of $B_\ell$ that $s_\ell$ belongs to, and toss a biased coin to set its profile $b^{samp}_\ell \in \{\downarrow\uparrow, \uparrow\downarrow\}$: if $s_\ell$ belongs to the first half, then the coin toss's probabilities are $((1-2\epsilon)/2, (1+2\epsilon)/2)$; otherwise, they are $((1+2\epsilon)/2, (1-2\epsilon)/2)$.

Let $E^{(k),N}_{otf}$ denote the distribution induced by the above process over extended transcripts, and let $A^{(k),N}_{otf}$ be the corresponding distribution over regular transcripts (that is, the result of removing the profiles from the transcript). As noted in Subsection 9.3.1, for $k = 0$ we have that $A^{(0),N}_{otf} = A_N$. At the other extreme, for $k = q$, since each point $s_\ell$ is selected uniformly in $I_\ell$ (with no dependence on the selected profiles), we have that $A^{(q),N}_{otf} = A_Y$. In the next subsection we bound the total variation distance between $A^{(k),N}_{otf}$ and $A^{(k+1),N}_{otf}$ for every $0 \le k \le q-1$ by bounding the distance between the corresponding distributions $E^{(k),N}_{otf}$ and $E^{(k+1),N}_{otf}$.
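One way to see that the faking process described above does return a uniform point of the queried interval is to compute the induced distribution of the sample point exactly. The sketch below is hypothetical illustration code (the function name, the toy domain, and the block size are our choices, not the paper's): it selects an end block with probability proportional to its intersection with the interval, an inner block uniformly among the inner blocks, and then a uniform point of the chosen block's intersection, and verifies with exact rationals that every point of the interval gets the same probability.

```python
from fractions import Fraction as F

def faked_point_dist(interval, delta, offset=0):
    """Exact distribution of the faked sample point over `interval` (a range),
    when the domain is cut into consecutive blocks of size `delta` starting
    at `offset`. Returns a dict {point: probability}."""
    lo, hi = interval[0], interval[-1]
    size = hi - lo + 1
    iL = (lo - offset) // delta          # index of the leftmost block hit
    iR = (hi - offset) // delta          # index of the rightmost block hit
    inter = lambda i: [x for x in range(offset + i * delta,
                                        offset + (i + 1) * delta)
                       if lo <= x <= hi]
    aL = F(len(inter(iL)), size)         # alpha_L = |B_{iL} ∩ I| / |I|
    aR = F(len(inter(iR)), size) if iR > iL else F(0)
    n_inner = max(iR - iL - 1, 0)
    dist = {}
    for i in range(iL, iR + 1):
        pts = inter(i)
        if i == iL:
            p_block = aL
        elif i == iR:
            p_block = aR
        else:                            # inner blocks share the rest evenly
            p_block = (1 - aL - aR) / n_inner
        for x in pts:                    # uniform within the chosen block
            dist[x] = p_block / len(pts)
    return dist

d = faked_point_dist(range(3, 17), delta=4)    # interval {3, ..., 16}
assert all(p == F(1, 14) for p in d.values())  # uniform over the 14 points
```

The assertion holds because $1 - \alpha_L - \alpha_R$ is exactly the fraction of the interval covered by inner blocks, so each inner block, and hence each of its points, receives its proportional share.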
Roughly speaking, the only difference between the two (for each $0 \le k \le q-1$) is in the distribution over $(s_{k+1}, b^{samp}_{k+1})$. As we argue formally in the next subsection, conditioned on certain events (determined, among other things, by the choice of $(\Delta, y)$), we have that $(s_{k+1}, b^{samp}_{k+1})$ is distributed the same under $E^{(k),N}_{otf}$ and $E^{(k+1),N}_{otf}$.

9.4 Bounding $d_{TV}\big(A^{(k),N}_{otf}, A^{(k+1),N}_{otf}\big)$

As per the foregoing discussion, we can focus on bounding the total variation distance between extended transcripts, $d_{TV}\big(E^{(k),N}_{otf}, E^{(k+1),N}_{otf}\big)$, for an arbitrary fixed $k \in \{0, \ldots, q-1\}$. Before diving into the proof, we start by defining the probability space we shall be working in, as well as explaining the different sources of randomness that are in play and how they fit into the random processes we end up analyzing.

The probability space. Recall the definition of an extended transcript: for notational convenience, we reserve the notation $E = (I_\ell, s_\ell, b_\ell)_{\ell \in [q]}$ for extended-transcript-valued random variables, and will write $E = (\iota_\ell, \sigma_\ell, \pi_\ell)_{\ell \in [q]}$ for a fixed outcome. We denote by $\Sigma$ the space of all such tuples $E$, and by $\Lambda$ the set of all possible outcomes for $(\Delta, y)$. The sample space we are considering is now defined as $X \stackrel{\mathrm{def}}{=} \Sigma \times \Lambda$: that is, an extended transcript along with the underlying choice of block size and offset.$^{10}$ The two probability measures on $X$ we shall consider will be induced by the executions of $A^{(k)}$ and $A^{(k+1)}$, as per the process detailed below.

A key thing to observe is that, as we focus on two "adjacent" faking algorithms $A^{(k)}$ and $A^{(k+1)}$, it will be sufficient to consider the following equivalent view of the way an extended transcript is generated:

1.
up to (and including) stage $k$, the faking algorithm generates on its own both the queries $\iota_\ell$ and the uniformly distributed samples $\sigma_\ell \in \iota_\ell$; it also chooses its $(k+1)$-st query $\iota_{k+1}$;

2. then, at that point only, the choice of $(\Delta, y)$ is made, and the profiles $\pi_\ell$ ($1 \le \ell \le k$) of the previous blocks are decided upon, as described in Subsection 9.3;

3. after this, the sampling and block-profile selection is made exactly according to the previous "on-the-fly process" description.

The reason that we can defer the choice of $(\Delta, y)$ and the setting of the profiles in the manner described above is the following: for both $A^{(k)}$ and $A^{(k+1)}$, the choice of each $\sigma_\ell$ for $1 \le \ell \le k$ depends only on $\iota_\ell$, and the choice of each $\iota_\ell$ for $1 \le \ell \le k+1$ depends only on $(\iota_1, \sigma_1), \ldots, (\iota_{\ell-1}, \sigma_{\ell-1})$. That is, there is no dependence on $(\Delta, y)$ nor on any $\pi_{\ell'}$ for $\ell' \le \ell$. By deferring the choice of the pair $(\Delta, y)$ we may consider the randomness coming from its draw only at the $(k+1)$-st stage (which is the pivotal stage here). Note that, both for $A^{(k)}$ and $A^{(k+1)}$, the resulting distribution over $X$ induced by the description above exactly matches the one from the "on-the-fly" process. In the next paragraph, we go into more detail, and break down further the randomness and choices happening in this new view.

Sources of randomness. To define the probability measure on this space, we describe the process that, up to stage $k+1$, generates the corresponding part of the extended transcript and the pair $(\Delta, y)$ for $A^{(m)}$, where $m \in \{k, k+1\}$ (see the previous subsections for precise descriptions of how the following random choices are made):

(R1) $A^{(m)}$ draws $\iota_1, \sigma_1, \ldots, \iota_k, \sigma_k$ and finally $\iota_{k+1}$ by itself;

(R2) the outcome of $(\Delta, y)$ is chosen: this "retroactively" fixes the partition of the $\iota_\ell$'s ($1 \le \ell \le k+1$) into blocks $B^{(\ell)}_{i_L}, \ldots$
$, B^{(\ell)}_{i_R}$;

(R3) the profiles of $B^{(\ell)}_{i_L}$, $B^{(\ell)}_{i_R}$ and $B_\ell$ (i.e., the values of the triples $\pi_\ell$, for $1 \le \ell \le k$) are drawn;

(R4) the profiles of $B^{(k+1)}_{i_L}$, $B^{(k+1)}_{i_R}$ are chosen;

(R5) the block selection (the choice of the block $B_{k+1}$ to which $\sigma_{k+1}$ will belong) is made:

(a) whether it will be one of the two end blocks, or one of the inner ones (for $A^{(k+1)}$ this is based on the respective sizes of the end blocks, and for $A^{(k)}$ this is based on the weights of the end blocks, using the profiles of the end blocks);

(b) the choice itself:
- if one of the outer ones, it is drawn based on either the respective sizes (for $A^{(k+1)}$) or the respective weights (for $A^{(k)}$, using the profiles of the end blocks);
- if one of the inner ones, uniformly at random among all inner blocks;

(R6) the sample $\sigma_{k+1}$ and the profile $\pi^{samp}_{k+1}$ are chosen;

(R7) the rest of the transcript, for stages $k+1, \ldots, q$, is iteratively chosen (in the same way for $A^{(k)}$ and $A^{(k+1)}$) according to the on-the-fly process discussed before.

$^{10}$ We emphasize that the algorithm, whether faking or not, has access neither to the "extended" part of the transcript nor to the choice of $(\Delta, y)$; however, these elements are part of the events we analyze.

Note that the only differences between the processes for $A^{(k)}$ and $A^{(k+1)}$ lie in steps (R5a), (R5b) and (R6) of the $(k+1)$-st stage.

Bad events and outline of the argument. Let $G(\iota_{k+1})$ (where 'G' stands for 'Good') denote the settings of $(\Delta, y)$ that satisfy the following: either (i) $|\iota_{k+1}| > \Delta \cdot (\log N)^2$, or (ii) $|\iota_{k+1}| < \Delta / \log N$ and $\iota_{k+1}$ is contained entirely within a single half block. We next define three indicator random variables for a given element $\omega = (E, (\Delta, y))$ of the sample space $X$, where $E = ((\iota_1, \sigma_1, \pi_1), \ldots, (\iota_q, \sigma_q, \pi_q))$.
The first, $\Gamma_1$, is zero when $(\Delta, y) \notin G(\iota_{k+1})$. Note that the randomness for $\Gamma_1$ is over the choice of $(\Delta, y)$ and the choice of $\iota_{k+1}$. The second, $\Gamma_2$, is zero when $\iota_{k+1}$ intersects at least two blocks and the block $B_{k+1}$ is one of the two extreme blocks intersected by $\iota_{k+1}$. The third, $\Gamma_3$, is zero when $\iota_{k+1}$ is not contained entirely within a single half block and $B_{k+1}$ is a block whose profile had already been set (either because it contains a selected point $\sigma_\ell$ for $\ell \le k$, or because it belongs to one of the two extreme blocks for some queried interval $\iota_\ell$ with $\ell \le k$). For notational ease we write $\Gamma(E)$ to denote the triple $(\Gamma_1, \Gamma_2, \Gamma_3)$. Observe that these indicator variables are well defined, and correspond to events that are indeed subsets of our space $X$: given any element $\omega \in X$, whether $\Gamma_i(\omega) = 1$ (for $i \in \{1, 2, 3\}$) is fully determined.

Define $D_1, D_2$ as the two distributions over $X$ induced by the executions of $A^{(k)}$ and $A^{(k+1)}$, respectively (in particular, by keeping only the first marginal of $D_1$ we get back $E^{(k),N}_{otf}$). Applying Fact 33 to $D_1$ and $D_2$, we obtain that

$$d_{TV}(D_1, D_2) \le \Pr[\Gamma \ne (1,1,1)] + d_{TV}\big(D_1 \mid \Gamma = (1,1,1),\, D_2 \mid \Gamma = (1,1,1)\big) \le \Pr[\Gamma_1 = 0] + \Pr[\Gamma_2 = 0 \mid \Gamma_1 = 1] + \Pr[\Gamma_3 = 0 \mid \Gamma_1 = \Gamma_2 = 1] + d_{TV}\big(D_1 \mid \Gamma = (1,1,1),\, D_2 \mid \Gamma = (1,1,1)\big). \quad (84)$$

To conclude, we now deal with each of these four summands separately.

Claim 34. We have $\Pr[\Gamma_1 = 0] \le \eta(N)$, where $\eta(N) = O\!\left(\frac{\log\log N}{\log N}\right)$.

Proof: Similarly to the proof of Claim 29, for any fixed setting of $\iota_{k+1}$ there are $O(\log\log N)$ values of $\Delta \in \left\{ N/2^j \right\}_{j \in \{\frac{1}{3}\log N, \ldots, \frac{2}{3}\log N\}}$ for which $\Delta / \log N \le |\iota_{k+1}| \le \Delta \cdot (\log N)^2$. Therefore, the probability that one of these ("bad") values of $\Delta$ is selected is $O\!\left(\frac{\log\log N}{\log N}\right)$.
If the choice of $\Delta$ is such that $|\iota_{k+1}| < \Delta / \log N$, then, by the choice of the random offset $y$, the probability that $\iota_{k+1}$ is not entirely contained within a single half block is $O(1/\log N)$. The claim follows.

Claim 35. We have $\Pr[\Gamma_2 = 0 \mid \Gamma_1 = 1] \le \eta(N)$.

Proof: If $\Gamma_1 = 1$ because $|\iota_{k+1}| < \Delta / \log N$ and $\iota_{k+1}$ is entirely contained within a single half block, then $\Gamma_2 = 1$ (with probability 1). Otherwise, $|\iota_{k+1}| > \Delta \cdot (\log N)^2$, so that $\iota_{k+1}$ intersects at least $(\log N)^2$ blocks. The probability that one of the two extreme blocks is selected is hence $O(1/(\log N)^2)$, and the claim follows.

Claim 36. We have $\Pr[\Gamma_3 = 0 \mid \Gamma_1 = \Gamma_2 = 1] \le \eta(N)$.

Proof: If $\Gamma_1 = 1$ because $|\iota_{k+1}| < \Delta / \log N$ and $\iota_{k+1}$ is entirely contained within a single half block, then $\Gamma_3 = 1$ (with probability 1). Otherwise, $|\iota_{k+1}| > \Delta \cdot (\log N)^2$, so that $\iota_{k+1}$ intersects at least $(\log N)^2$ blocks. Since $\Gamma_2 = 1$, the block $B_{k+1}$ is uniformly selected from among $(\log N)^2 - 2$ non-extreme blocks. Among them there are at most $3k = O\!\left(\frac{\log N}{\log\log N}\right)$ blocks whose profiles were already set. The probability that one of them is selected (so that $\Gamma_3 = 0$) is $O\!\left(\frac{1}{\log N \cdot \log\log N}\right) = O\!\left(\frac{\log\log N}{\log N}\right)$, and the claim follows.

We are left with only the last term, $d_{TV}\big(D_1 \mid \Gamma = (1,1,1),\, D_2 \mid \Gamma = (1,1,1)\big)$. But since we are now ruling out all the "bad events" that would induce a difference between the distributions of the extended transcripts under $A^{(k)}$ and $A^{(k+1)}$, it becomes possible to argue that this distance is actually zero:

Claim 37. $d_{TV}\big(D_1 \mid \Gamma = (1,1,1),\, D_2 \mid \Gamma = (1,1,1)\big) = 0$.
Proof: Unrolling the definition, we can write $d_{TV}\big(D_1 \mid \Gamma = (1,1,1),\, D_2 \mid \Gamma = (1,1,1)\big)$ as

$$\sum_{E, (\Delta, y)} \left| \Pr\!\left[ E^{(k)} = E, Y^{(k)} = (\Delta, y) \,\middle|\, \Gamma = (1,1,1) \right] - \Pr\!\left[ E^{(k+1)} = E, Y^{(k+1)} = (\Delta, y) \,\middle|\, \Gamma = (1,1,1) \right] \right|,$$

where $Y^{(m)}$ denotes the $\Lambda$-valued random variable corresponding to $A^{(m)}$. In order to bound this sum, we will show that each of its terms is zero: i.e., that for any fixed $(E, (\Delta, y)) \in \Sigma \times \Lambda$ we have

$$\Pr\!\left[ E^{(k)} = E, Y^{(k)} = (\Delta, y) \,\middle|\, \Gamma = (1,1,1) \right] = \Pr\!\left[ E^{(k+1)} = E, Y^{(k+1)} = (\Delta, y) \,\middle|\, \Gamma = (1,1,1) \right].$$

We start by observing that, for $m \in \{k, k+1\}$,

$$\Pr\!\left[ E^{(m)} = E, Y^{(m)} = (\Delta, y) \,\middle|\, \Gamma = (1,1,1) \right] = \Pr\!\left[ E^{(m)} = E \,\middle|\, \Gamma = (1,1,1), Y^{(m)} = (\Delta, y) \right] \cdot \Pr\!\left[ Y^{(m)} = (\Delta, y) \,\middle|\, \Gamma = (1,1,1) \right],$$

and that the term $\Pr\!\left[ Y^{(m)} = (\Delta, y) \mid \Gamma = (1,1,1) \right] = \Pr\!\left[ Y^{(m)} = (\Delta, y) \right]$ is identical for $m = k$ and $m = k+1$. Therefore, it is sufficient to show that

$$\Pr\!\left[ E^{(k)} = E \,\middle|\, \Gamma = (1,1,1), Y^{(k)} = (\Delta, y) \right] = \Pr\!\left[ E^{(k+1)} = E \,\middle|\, \Gamma = (1,1,1), Y^{(k+1)} = (\Delta, y) \right].$$

Let $\omega = (E, (\Delta, y)) \in X$ be arbitrary, with $E = ((\iota_1, \sigma_1, \pi_1), \ldots, (\iota_q, \sigma_q, \pi_q)) \in \Sigma$, and let $m \in \{k, k+1\}$. We can express $\Phi^{(m)}(\omega) \stackrel{\mathrm{def}}{=} \Pr\!\left[ E^{(m)} = E \mid \Gamma = (1,1,1), Y^{(m)} = (\Delta, y) \right]$ as the product of the following 5 terms:

(T1) $p^{(m),int,samp}_k(\omega)$, defined as

$$p^{(m),int,samp}_k(\omega) \stackrel{\mathrm{def}}{=} \Pr\!\left[ E^{(m),int,samp}|_k = E^{int,samp}|_k \,\middle|\, \Gamma = (1,1,1), Y^{(m)} = (\Delta, y) \right] = \Pr\!\left[ E^{(m),int,samp}|_k = E^{int,samp}|_k \right],$$

where $E^{int,samp}_\ell$ denotes $(\iota_\ell, \sigma_\ell)$ and $E^{int,samp}|_k$ denotes $(E^{int,samp}_1, \ldots$
$, E^{int,samp}_k)$;

(T2) $p^{(m),prof}_k(\omega)$, defined as

$$p^{(m),prof}_k(\omega) \stackrel{\mathrm{def}}{=} \Pr\!\left[ E^{(m),prof}|_k = E^{prof}|_k \,\middle|\, E^{(m),int,samp}|_k = E^{int,samp}|_k, \Gamma = (1,1,1), Y^{(m)} = (\Delta, y) \right] = \Pr\!\left[ E^{(m),prof}|_k = E^{prof}|_k \,\middle|\, E^{(m),int,samp}|_k = E^{int,samp}|_k, Y^{(m)} = (\Delta, y) \right],$$

where $E^{prof}|_k$ denotes $(\pi_1, \ldots, \pi_k)$;

(T3) $p^{(m),int}_{k+1}(\omega)$, defined as

$$p^{(m),int}_{k+1}(\omega) \stackrel{\mathrm{def}}{=} \Pr\!\left[ I_{k+1} = \iota_{k+1} \,\middle|\, E^{(m),int,samp}|_k = E^{int,samp}|_k, \Gamma = (1,1,1), Y^{(m)} = (\Delta, y) \right] = \Pr\!\left[ I_{k+1} = \iota_{k+1} \,\middle|\, E^{(m),int,samp}|_k = E^{int,samp}|_k \right];$$

(T4) $p^{(m),samp,prof}_{k+1}(\omega)$, defined as

$$p^{(m),samp,prof}_{k+1}(\omega) \stackrel{\mathrm{def}}{=} \Pr\!\left[ (s_{k+1}, b_{k+1}) = (\sigma_{k+1}, \pi_{k+1}) \,\middle|\, I_{k+1} = \iota_{k+1}, E^{(m)}|_k = E|_k, \Gamma = (1,1,1), Y^{(m)} = (\Delta, y) \right];$$

(T5) and the last term $p^{(m)}_{k+2}(\omega)$, defined as

$$p^{(m)}_{k+2}(\omega) \stackrel{\mathrm{def}}{=} \Pr\!\left[ E^{(m)}|_{k+2, \ldots, q} = E|_{k+2, \ldots, q} \,\middle|\, E^{(m)}|_{k+1} = E|_{k+1}, \Gamma = (1,1,1), Y^{(m)} = (\Delta, y) \right],$$

where $E|_{k+2, \ldots, q} = ((\iota_{k+2}, \sigma_{k+2}, \pi_{k+2}), \ldots, (\iota_q, \sigma_q, \pi_q))$.

Note that we could remove the conditioning on $\Gamma = (1,1,1)$ for the first three terms, as they depend only on the length-$k$ prefix of the (extended) transcript and the choice of $\iota_{k+1}$, that is, on the randomness from (R1). The important observation is that the above probabilities are independent of whether $m = k$ or $m = k+1$. We first verify this for (T1), (T2), (T3) and (T5), and then turn to the slightly less straightforward term (T4). This is true for $p^{(m),int,samp}_k(\omega)$ because $A^{(k)}$ and $A^{(k+1)}$ select their interval queries in exactly the same manner, and for $1 \le \ell \le k$, the $\ell$-th sample point is uniformly selected in the $\ell$-th queried interval. Similarly we get that $p^{(k),int}_{k+1}(\omega) = p^{(k+1),int}_{k+1}(\omega)$.
The probabilities $p^{(k),prof}_k(\omega)$ and $p^{(k+1),prof}_k(\omega)$ are induced in the same manner by (R2) and (R3), and $p^{(k)}_{k+2}(\omega) = p^{(k+1)}_{k+2}(\omega)$ since for both $A^{(k)}$ and $A^{(k+1)}$, the pair $(s_\ell, b_\ell)$ is distributed the same for every $\ell \ge k+2$ (conditioned on any length-$(k+1)$ prefix of the (extended) transcript and the choice of $(\Delta, y)$).

Turning to (T4), observe that $\Gamma_1 = \Gamma_2 = \Gamma_3 = 1$ (by conditioning). Consider first the case that $\Gamma_1 = 1$ because $|\iota_{k+1}| < \Delta / \log N$ and $\iota_{k+1}$ is contained entirely within a single half block. For this case there are two subcases. In the first subcase, the profile of the block that contains $\iota_{k+1}$ was already set. This implies that $b_{k+1}$ is fully determined (in the same manner) for both $m = k$ and $m = k+1$. In the second subcase, the profile of the block that contains $\iota_{k+1}$ (which is an extreme block) is set independently and with equal probability to either $\downarrow\uparrow$ or $\uparrow\downarrow$ for both $m = k$ and $m = k+1$. In either subcase, $s_{k+1}$ is uniformly distributed in $\iota_{k+1}$ for both $m = k$ and $m = k+1$.

Next, consider the remaining case, that $\Gamma_1 = 1$ because $|\iota_{k+1}| > \Delta \cdot (\log N)^2$. In this case, since $\Gamma_2 = 1$, the block $B_{k+1}$ is not an extreme block, and since $\Gamma_3 = 1$, the profile of the block $B_{k+1}$ was not previously set. Given this, it follows from the discussion at the end of Subsection 9.3 that the distribution of $(s_{k+1}, b_{k+1})$ is identical whether $m = k$ (and $A^{(m)}$ does not fake the $(k+1)$-st query) or $m = k+1$ (and $A^{(m)}$ fakes the $(k+1)$-st query).
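Before assembling the pieces, it may be instructive to check the counting estimate underlying Claims 34-36 directly: for $\Delta$ ranging over the powers-of-two divisions of $N$, only $O(\log\log N)$ values fall in the "bad" window around a given interval length. The snippet below is a sanity check with illustrative parameters ($N = 2^{60}$ and power-of-two interval lengths are arbitrary choices for the experiment, not values from the paper).

```python
import math

def bad_delta_count(N, interval_len):
    """Count values Delta = N / 2^j, for j in [log N / 3, 2 log N / 3],
    satisfying Delta / log N <= interval_len <= Delta * (log N)^2."""
    logN = math.log2(N)
    count = 0
    for j in range(int(logN / 3), int(2 * logN / 3) + 1):
        delta = N / 2 ** j
        if delta / logN <= interval_len <= delta * logN ** 2:
            count += 1
    return count

N = 2 ** 60
# The bad window spans a factor of (log N)^3 in Delta, i.e. at most about
# 3 * log2(log2(N)) + O(1) consecutive powers of two.
bound = 3 * math.log2(math.log2(N)) + 2
assert all(bad_delta_count(N, 2 ** t) <= bound for t in range(21, 40))
```

Since $\Delta$ must lie between $|\iota_{k+1}| / (\log N)^2$ and $|\iota_{k+1}| \cdot \log N$, the number of admissible exponents $j$ is bounded by $\log_2\!\big((\log N)^3\big) = 3 \log_2 \log N$ up to an additive constant, matching the $O(\log\log N)$ count used in the claims.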
Assembling the pieces, the four claims above together with Equation (84) yield $d_{TV}\big(E^{(k),N}_{otf}, E^{(k+1),N}_{otf}\big) \le d_{TV}(D_1, D_2) \le 3\eta(N)$, and finally

$$d_{TV}\big(A_N, A_Y\big) = d_{TV}\big(A^{(0),N}_{otf}, A^{(q),N}_{otf}\big) \le \sum_{k=0}^{q-1} d_{TV}\big(A^{(k),N}_{otf}, A^{(k+1),N}_{otf}\big) \le \sum_{k=0}^{q-1} d_{TV}\big(E^{(k),N}_{otf}, E^{(k+1),N}_{otf}\big) \le 3q \cdot \eta(N) \le 1/5$$

for a suitable choice of the absolute constant $\tau$.

10 Conclusion

We have introduced a new conditional sampling framework for testing probability distributions and shown that it allows significantly more query-efficient algorithms than the standard framework for a range of problems. This new framework presents many potential directions for future work. One specific goal is to strengthen the upper and lower bounds for problems studied in this paper. As a concrete question along these lines, we conjecture that COND algorithms for testing equality of two unknown distributions $D_1$ and $D_2$ over $[N]$ require $(\log N)^{\Omega(1)}$ queries. A broader goal is to study more properties of distributions beyond those considered in this paper; natural candidates here, which have been well studied in the standard model, are monotonicity (for which we have preliminary results), independence between marginals of a joint distribution, and entropy. Yet another goal is to study distributions over other structured domains such as the Boolean hypercube $\{0,1\}^n$; here it would seem natural to consider "subcube" queries, analogous to the ICOND queries we considered when the structured domain is the linearly ordered set $[N]$. A final broad goal is to study distribution learning (rather than testing) problems in the conditional sampling framework.
Acknowledgements

We are sincerely grateful to the anonymous referees for their close reading of this paper and for their many helpful suggestions, which significantly improved the exposition of the final version.
