Testing Properties of Edge Distributions

Yumou Fei*

Abstract

We initiate the study of distribution testing for probability distributions over the edges of a graph, motivated by the closely related question of “edge-distribution-free” graph property testing. The main results of this paper are nearly-tight bounds on testing bipartiteness, triangle-freeness and square-freeness of edge distributions, whose sample complexities are shown to scale as $\Theta(n)$, $n^{4/3 \pm o(1)}$ and $n^{9/8 \pm o(1)}$, respectively.

The technical core of our paper lies in the proof of the upper bound for testing square-freeness, wherein we develop new techniques based on certain birthday-paradox-type lemmas that may be of independent interest. We will discuss how our techniques fit into the general framework of distribution-free property testing. We will also discuss how our results are conceptually connected with Turán problems and subgraph removal lemmas in extremal combinatorics.

* Department of EECS, Massachusetts Institute of Technology.

Contents

1 Introduction
  1.1 Distribution-Free Testing of Functions
  1.2 Additional Results
  1.3 Related Work
  1.4 Further Motivation for Edge-Distribution-Free Testing
2 Technical Overview
  2.1 General Framework
  2.2 Applying the Framework to Graph Problems
    2.2.1 Subgraph-Removal for Sparse Graphs
3 Preliminaries
  3.1 General Notations
  3.2 Stochastic Domination
4 Testing Bipartiteness
  4.1 Upper Bound
  4.2 Lower Bound
5 Upper Bound for Square-Freeness
  5.1 Birthday Paradox Lemmas
    5.1.1 Birthday Paradox in Grids
    5.1.2 Vertex Cover and Fractional Matching
    5.1.3 Birthday Paradox in Hypergraphs
  5.2 The Case Analysis
    5.2.1 From Edges to Squares to Edges
    5.2.2 The Diluteness Notion
    5.2.3 The Dilute Case
    5.2.4 The Concentrated Case
6 Upper Bound for Tree-Freeness
  6.1 More Birthday Paradox
  6.2 Induction on the Number of Edges
  6.3 Tree-Freeness and Cliques
7 Lower Bounds for Subgraph-Freeness
  7.1 Triangle-Freeness Constructions
  7.2 Square-Freeness Constructions
  7.3 Tree-Freeness Constructions
8 Open Problems
References
A Proof of Proposition 1.4

1 Introduction

Suppose $\Lambda$ is a finite set, and $\mathcal{P}$ is a class of probability distributions on $\Lambda$.
In the standard model of distribution testing [GGR98, BFR+00], given sampling access to an unknown distribution $\mu$ over $\Lambda$, an algorithm should accept with probability at least $2/3$ if $\mu$ belongs to the class $\mathcal{P}$, and reject with probability at least $2/3$ if $\mu$ has total variation distance at least $\varepsilon$ from every distribution in $\mathcal{P}$.

In many central problems such as uniformity testing and identity testing (see e.g. the survey [Can22] and references therein), the domain $\Lambda$ is a general unstructured set, i.e. there are no relations among the elements of $\Lambda$. However, there are also important examples of problems where the domain $\Lambda$ is endowed with a certain structure. For example, in monotonicity testing of distributions (introduced by [BKR04]), the domain is assumed to be a partially ordered set. Some concrete domain structures, such as the hypercube $\Lambda = \{0,1\}^n$, have also been studied in the literature for various problems.

In this paper, we initiate the study of distribution testing with domain
\[ \Lambda = \binom{[n]}{2} = \{\text{two-element subsets of } [n]\}. \]
A distribution over such a domain can be viewed as an edge-weighted graph on $n$ vertices, and a random sample from the distribution is a random edge of the graph, generated with probability proportional to its edge weight.

We focus the present study on properties of distributions that are characterized solely by the support of the unknown distribution $\mu$, i.e.
\[ \operatorname{supp}(\mu) := \{ x \in \Lambda \mid \mu(\{x\}) > 0 \}. \]
If the domain $\Lambda$ is a general unstructured set, then the only (symmetric) information about the support is its cardinality. Indeed, the problem of estimating the support size (or testing whether the support size is at most some value) of distributions has been extensively studied (see e.g. [VV17, FH25]).
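To make the sampling model concrete, the following is a minimal sketch (not taken from the paper) of an edge distribution viewed as an edge-weighted graph. The function names `normalize`, `support` and `sample_edges`, and the dictionary representation of the weights, are illustrative assumptions.

```python
import random


def normalize(weights):
    """Turn nonnegative edge weights {(u, v): w} into a distribution over edges.

    Edges are stored as frozensets so that (u, v) and (v, u) are the same edge.
    (Hypothetical helper, introduced only for this illustration.)
    """
    mu = {}
    for (u, v), w in weights.items():
        if u == v or w < 0:
            raise ValueError("edges need two distinct endpoints and a nonnegative weight")
        edge = frozenset((u, v))
        mu[edge] = mu.get(edge, 0.0) + w
    total = sum(mu.values())
    if total <= 0:
        raise ValueError("at least one edge must have positive weight")
    return {edge: w / total for edge, w in mu.items()}


def support(mu):
    """supp(mu): the set of edges with strictly positive probability."""
    return {edge for edge, prob in mu.items() if prob > 0}


def sample_edges(mu, m, rng):
    """Draw m independent edges, each with probability proportional to its weight."""
    edges = list(mu)
    return rng.choices(edges, weights=[mu[e] for e in edges], k=m)
```

A tester in this model sees only the output of something like `sample_edges`; the graph property it must decide is a property of `support(mu)`.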
Since in our context the support of a distribution is the edge set of a graph, instead of a bare subset of an unstructured domain, one can study much richer properties of the support, such as bipartiteness and subgraph-freeness. For the sake of convenience, we say an edge distribution $\mu$ over $\binom{[n]}{2}$ satisfies a certain graph property (such as triangle-freeness) if the support of $\mu$ satisfies that property. The main results of this paper are nearly-tight bounds on the sample complexities of testing bipartiteness, triangle-freeness and square-freeness of edge distributions:

Theorem 1.1 (Informal). The sample complexities of testing bipartiteness, triangle-freeness and square-freeness of edge distributions on $n$ vertices are $\Theta(n)$, $n^{4/3 \pm o(1)}$ and $n^{9/8 \pm o(1)}$, respectively.

1.1 Distribution-Free Testing of Functions

In this subsection, we show how our results relate to distribution-free property testing (of functions). We first present the following standard formalization of the distribution testing model used in Theorem 1.1.

Definition 1.2. Suppose $\mathcal{H}$ is a nonempty (downward-closed) family of subsets of a finite domain $\Lambda$. For any parameter $\varepsilon \in (0,1)$, we define $\mathrm{dsam}(\mathcal{H}, \varepsilon)$ to be the minimum positive integer $m$ such that the following holds: there exists an algorithm that, for any distribution $\mu$ over $\Lambda$, takes $m$ independent samples from $\mu$ and
(1) accepts with probability at least $2/3$ if $\operatorname{supp}(\mu) \in \mathcal{H}$;
(2) rejects with probability at least $2/3$ if $\|\mu - \nu\|_{\mathrm{TV}} \geq \varepsilon$ for every distribution $\nu$ over $\Lambda$ with $\operatorname{supp}(\nu) \in \mathcal{H}$.

Distribution-free property testing (of functions) was first introduced by Goldreich, Goldwasser and Ron [GGR98] and has been studied extensively. The (sample-based) distribution-free property testing model is defined as follows.

Definition 1.3 ([GGR98, Definition 2.1]). Suppose $\mathcal{H}$ is a nonempty family of Boolean-valued functions on a finite domain $\Lambda$ (see footnote 1).
For any parameter $\varepsilon \in (0,1)$, we define $\mathrm{sam}(\mathcal{H}, \varepsilon)$ to be the minimum positive integer $m$ such that the following holds: there exists an algorithm that, for any distribution $\mu$ over $\Lambda$ and any function $f : \Lambda \to \{0,1\}$, takes $m$ independent $f$-labeled samples
\[ \bigl(x^{(1)}, f(x^{(1)})\bigr), \dots, \bigl(x^{(m)}, f(x^{(m)})\bigr), \]
where each $x^{(i)}$ is drawn independently from $\mu$, and
(1) accepts with probability at least $2/3$ if $f \in \mathcal{H}$;
(2) rejects with probability at least $2/3$ if $\mathbb{P}_{x \sim \mu}[f(x) \neq g(x)] \geq \varepsilon$ for every function $g \in \mathcal{H}$.

It is easy to observe the following relation between Definitions 1.2 and 1.3 (see Section A for a proof).

Proposition 1.4. Suppose $\mathcal{H}$ is a downward-closed family of subsets of a finite domain $\Lambda$. For any parameter $\varepsilon \in (0,1)$, we have
\[ \mathrm{dsam}(\mathcal{H}, \varepsilon) \leq \mathrm{sam}(\mathcal{H}, \varepsilon) \leq \frac{20}{\varepsilon} \cdot \bigl( \mathrm{dsam}(\mathcal{H}, \varepsilon) + 1 \bigr). \]

Therefore, the distribution testing problem in Definition 1.2 can essentially be viewed as the special case of (sample-based) distribution-free property testing of Boolean-valued functions where the property is downward-closed. In the rest of the paper, we will mostly work with Definition 1.3 instead of Definition 1.2. Our main theorem (Theorem 1.1) can then be formalized as follows.

Theorem 1.5 (Formal version of Theorem 1.1). For positive integers $n$, let $\mathcal{G}^{\mathrm{bip}}_n$, $\mathcal{G}^{\mathrm{tri}}_n$ and $\mathcal{G}^{\mathrm{squ}}_n$ be the collections of bipartite, triangle-free and square-free subsets of $\binom{[n]}{2}$, respectively. For any $\varepsilon \in (0, \tfrac{1}{10})$, we have
\[ \Omega(n) \leq \mathrm{sam}\bigl(\mathcal{G}^{\mathrm{bip}}_n, \varepsilon\bigr) \leq O(n/\varepsilon), \tag{1.1} \]
\[ n^{4/3} \exp\bigl(-O\bigl(\sqrt{\log n}\bigr)\bigr) \leq \mathrm{sam}\bigl(\mathcal{G}^{\mathrm{tri}}_n, \varepsilon\bigr) \leq O(n^{4/3}/\varepsilon), \quad\text{and} \tag{1.2} \]
\[ n^{9/8} \exp\bigl(-O\bigl(\sqrt{\log n}\bigr)\bigr) \leq \mathrm{sam}\bigl(\mathcal{G}^{\mathrm{squ}}_n, \varepsilon\bigr) \leq O(n^{9/8}/\varepsilon). \tag{1.3} \]
(On the bounds in (1.1), see footnote 2.)

Remark 1. In addition to the $f$-labeled sampling access to $\mu$ as described in Definition 1.3, one can also allow the distribution-free property tester to query the function $f$ on any input $x \in \Lambda$ and receive the value $f(x)$.
Viewing an $f$-labeled sample as also containing a “query,” the total query complexity of such an algorithm is the sum of the number of samples taken and the number of additional queries made. In this paper, unless otherwise stated, we assume the property testers to be sample-based, i.e. they do not have the power to make oracle queries for function values.

Footnote 1: By identifying a subset of $\Lambda$ with its indicator function, we view a family of Boolean-valued functions interchangeably as a family of subsets of the domain.

Footnote 2: The lower bound $\mathrm{sam}\bigl(\mathcal{G}^{\mathrm{bip}}_n, 1/9\bigr) \geq \Omega(n)$ was already proven by Goldreich and Ron [GR16, Theorem 4.6], even in the case where the unknown distribution over $\binom{[n]}{2}$ is uniform. They also proved an $O(n/\varepsilon)$ upper bound in the uniform distribution case; our result $\mathrm{sam}\bigl(\mathcal{G}^{\mathrm{bip}}_n, \varepsilon\bigr) \leq O(n/\varepsilon)$ extends their upper bound to the distribution-free setting.

1.2 Additional Results

In this subsection, we present a few additional results complementing Theorem 1.5. The first additional result is the following generalization of the bipartiteness-testing result (1.1): the sample complexity of testing any graph-homomorphism property is $\Theta(n)$.

Theorem 1.6. Let $H$ be a fixed simple graph with at least one edge. For positive integers $n$, let $\mathcal{G}^{H\text{-hom}}_n$ be the collection of edge sets $E \subseteq \binom{[n]}{2}$ such that there is a graph homomorphism from the graph $([n], E)$ to $H$. Then there exists a constant $\varepsilon_0 \in (0,1)$ depending only on $H$ such that for any $\varepsilon \in (0, \varepsilon_0]$ we have
\[ \Omega(n) \leq \mathrm{sam}\bigl(\mathcal{G}^{H\text{-hom}}_n, \varepsilon\bigr) \leq O(n/\varepsilon). \]

A generalization of graph-homomorphism properties is the class of semi-homogeneous graph partition properties [FR21], but the $\Theta(n)$ sample complexity in Theorem 1.6 does not generalize to this class of properties. In fact, the property of being a clique belongs to this class, and our next result shows that testing it requires only $\Theta(n^{2/3})$ samples (see footnote 3).
Theorem 1.7. For positive integers $n$, let $\mathcal{G}^{\mathrm{cliq}}_n$ be the collection of subsets of $\binom{[n]}{2}$ that correspond to cliques. For any $\varepsilon \in (0, \tfrac{1}{10})$, we have
\[ \Omega(n^{2/3}) \leq \mathrm{sam}\bigl(\mathcal{G}^{\mathrm{cliq}}_n, \varepsilon\bigr) \leq O(n^{2/3}/\varepsilon). \]

Note that since $\mathcal{G}^{\mathrm{cliq}}_n$ is not downward-closed for $n \geq 3$, Theorem 1.7 cannot be formulated as a “distribution testing” result via Proposition 1.4 (while Theorem 1.6 can).

Turning to subgraph-freeness properties, however, we are unable to determine the sample complexity of testing $H$-freeness for every fixed graph $H$. A relatively simple special case that we are able to solve is when $H$ is a tree:

Theorem 1.8. For any simple graph $H$ with at least one edge and any positive integer $n$, let $\mathcal{G}^{H\text{-free}}_n$ be the collection of $H$-free subsets of $\binom{[n]}{2}$. If $H$ is a fixed tree with $t$ edges, there exists a constant $\varepsilon_0$ depending only on $t$ such that for any $\varepsilon \in (0, \varepsilon_0]$ we have
\[ \Omega\bigl(n^{(t-1)/t}\bigr) \leq \mathrm{sam}\bigl(\mathcal{G}^{H\text{-free}}_n, \varepsilon\bigr) \leq O\bigl(n^{(t-1)/t}/\varepsilon\bigr). \]

1.3 Related Work

Possibly due to the close connection with PAC learning, many studies on distribution-free property testing have focused on functions on the hypercube $\{0,1\}^n$ (e.g. [GS09, DR11, CX16, BFH21, CP22, CFP24]). Nevertheless, distribution-free models have also been considered in the context of graph property testing, as we discuss below.

A few papers [Gol19, GS19] studied a model called “vertex-distribution-free” graph property testing. In that model, the unknown distribution is over the vertices of a graph, and (more critically) distances between graphs are measured with respect to the vertex distribution: if $E_1, E_2 \subseteq \binom{[n]}{2}$ are two edge sets, then the distance between $E_1$ and $E_2$ with respect to a distribution $\mu$ over $[n]$ is
\[ \sum_{\{u,v\} \in E_1 \triangle E_2} \mu(\{u\}) \cdot \mu(\{v\}), \]
where $E_1 \triangle E_2$ is the symmetric difference between $E_1$ and $E_2$.
Footnote 3: The property of being a clique is in fact a homogeneous graph partition property, as defined in [FR21].

This roughly corresponds to taking $\Lambda = \binom{[n]}{2}$ in Definition 1.3 and restricting the unknown distribution $\mu$ over $\binom{[n]}{2}$ to be a “product distribution” (see e.g. [GGR98, Section 10.1.3]). It turns out that such product distributions behave not too differently from the uniform distribution in many important aspects. In particular, subgraph removal lemmas (which are well known in the uniform distribution setting; see e.g. [Sha22, Section 4.4]) still hold [Gol19], implying that any subgraph-freeness property can be tested in constant queries (see footnote 4) in the vertex-distribution-free model.

In contrast, the questions considered in the present work correspond to taking $\Lambda = \binom{[n]}{2}$ in Definition 1.3 but not imposing any restriction on the unknown distribution $\mu$. This setting perhaps should be called the “edge-distribution-free” model. In this model (especially since distances between graphs are no longer measured with respect to a product distribution), subgraph removal lemmas no longer make sense, and many basic properties such as triangle-freeness cannot be tested in constant queries, as already observed in [GGR98, Section 10.1.4]. Retreating from the unattainable constant-query regime, it is still natural to ask whether some properties have query/sample complexities that grow with the size parameter $n$ more slowly than those of other properties, which is exactly what Theorems 1.5 to 1.8 attempt to answer (see footnote 5).

Another type of distribution-free model that has been considered for graphs features unknown distributions over $[n] \times [d]$ (see e.g. [HK08]), where $[n]$ is the vertex set and $d$ is an upper bound on the vertex degrees. This is arguably closer in spirit to the setting of unknown vertex distributions than to that of unknown edge distributions.
1.4 Further Motivation for Edge-Distribution-Free Testing

As mentioned in Section 1.3, it has been more popular to study the case $\Lambda = \{0,1\}^n$ in Definition 1.3 than the case $\Lambda = \binom{[n]}{2}$. However, questions about the latter domain sometimes arise naturally when studying questions about the former. In particular, in the paper [CFP24] on distribution-free testing of decision lists (a class of Boolean functions on the hypercube), it turns out that the “hardest” case for a decision list tester is (roughly speaking) when the unknown distribution over $\{0,1\}^n$ is actually supported on the “weight-2 slice”
\[ \{ x \in \{0,1\}^n \mid \text{the Hamming weight of } x \text{ is } 2 \}, \]
which is clearly equivalent to the domain $\binom{[n]}{2}$. The class of functions $\{0,1\}^n \to \{0,1\}$ that are decision lists, when restricted to the weight-2 slice, becomes a class of functions $\binom{[n]}{2} \to \{0,1\}$, or equivalently a class of graphs known as threshold graphs. In order to show that the property of being a decision list on $\{0,1\}^n$ can be tested in $O(n^{11/12})$ queries, the authors of [CFP24] had to (roughly speaking) first show the following:

Theorem 1.9 (Implicit in [CFP24]). The property of being a threshold graph on $n$ vertices can be tested in the edge-distribution-free model with queries, using at most $O(n^{2/3})$ queries in total (samples and queries combined; see Remark 1).

It seems likely that in order to obtain a query-optimal distribution-free decision list tester, one has to first optimize the query complexity in Theorem 1.9. We note that a query lower bound of $\Omega(\sqrt{n})$ for testing decision lists and (implicitly) for testing threshold graphs was shown in [CFP24].

Footnote 4: By “constant queries,” we mean that the query complexity depends only on the proximity parameter $\varepsilon$ but not on the number of vertices of the graph.
Footnote 5: Indeed, it was asked in [sub] whether one can define, motivate, and prove non-trivial results in “an edge-distribution-free model” for graph property testing.

2 Technical Overview

In this section, we provide an overview of our proof techniques.

2.1 General Framework

We start by describing a general framework for distribution-free property testing.

Definition 2.1. Suppose $\mathcal{H}$ is a nonempty family of Boolean-valued functions on a finite domain $\Lambda$. Fix a function $f : \Lambda \to \{0,1\}$. A subset $S \subseteq \Lambda$ is said to be an $f$-violation of the property $\mathcal{H}$ if there does not exist $h \in \mathcal{H}$ such that $f$ agrees with $h$ on $S$ (i.e. $f(x) = h(x)$ for all $x \in S$). A subset $S \subseteq \Lambda$ is said to be a minimal $f$-violation of $\mathcal{H}$ if $S$ is an $f$-violation of $\mathcal{H}$ but no proper subset of $S$ is an $f$-violation of $\mathcal{H}$. The collection of minimal $f$-violations of $\mathcal{H}$ is the edge set of a hypergraph on the vertex set $\Lambda$ that we call the violation hypergraph of $f$ against $\mathcal{H}$.

The notion of violation hypergraph was formally introduced in [DR11] (see footnote 6), and is inherently important for property testing because of the following observation:

Proposition 2.2. For any distribution $\mu$ over $\Lambda$, if $S$ is a vertex cover of the violation hypergraph of $f$ against $\mathcal{H}$ with minimum possible measure under $\mu$, then
\[ \mu(S) = \min_{h \in \mathcal{H}} \mathbb{P}_{x \sim \mu}[f(x) \neq h(x)]. \tag{2.1} \]

Proof. For any $h \in \mathcal{H}$, the set $\{x \in \Lambda \mid f(x) \neq h(x)\}$ is a vertex cover of the violation hypergraph of $f$ against $\mathcal{H}$, so its measure under $\mu$ is at least $\mu(S)$. Conversely, we claim that there exists some $h \in \mathcal{H}$ such that
\[ \{x \in \Lambda \mid f(x) \neq h(x)\} \subseteq S, \]
which implies that the right-hand side of (2.1) is at most $\mu(S)$. Assume on the contrary that for every $h \in \mathcal{H}$ there exists some $x \in \Lambda \setminus S$ such that $f(x) \neq h(x)$. Then $\Lambda \setminus S$ is an $f$-violation of $\mathcal{H}$, and hence there exists a minimal $f$-violation of $\mathcal{H}$ that is contained in $\Lambda \setminus S$.
This contradicts the assumption that $S$ is a vertex cover of the violation hypergraph.

Note that a one-sided-error tester for the property $\mathcal{H}$ can reject a function $f$ if and only if it has sampled (or queried) all elements of an $f$-violation of $\mathcal{H}$. Therefore, the question of analyzing the one-sided-error sample complexity of testing $\mathcal{H}$ is equivalent to the following: given that the minimum-weight vertex cover of the violation hypergraph of $f$ has weight at least $\varepsilon$ under $\mu$, how many samples from $\mu$ does one need to get a full edge of the violation hypergraph of $f$ with high probability? It turns out that one can prove a fairly general birthday-paradox-type lemma in response to this question (see Lemma 5.4 for a formal version of the following lemma):

Lemma 2.3 ([CFP24, Lemma 2.2], informal). For any $k$-uniform hypergraph on $n$ vertices with vertices weighted by $\mu$, if the minimum-weight vertex cover has weight at least $\varepsilon$, then $O(n^{(k-1)/k}/\varepsilon)$ samples from $\mu$ are sufficient to find a full edge with high probability.

Remark 2. A proof of the $k = 2$ case of Lemma 2.3 was implicit already in [DR11]; it was abstracted into the current form and generalized to $k \geq 3$ by [CFP24]. The proof of Lemma 2.3 in [CFP24] is different from the proof in [DR11]; see Section 2.2 for more discussion.

Footnote 6: In [DR11] the edge set of the violation hypergraph is the collection of $f$-violations, instead of minimal $f$-violations.

As a simple application of Lemma 2.3, consider the question of monotonicity testing over general posets. Suppose $\Lambda$ is a partially ordered set and $f : \Lambda \to \{0,1\}$ is a function, and we want to test the property that $f$ is monotone, i.e. $f(x) \leq f(y)$ for all $x \leq y$. It is easy to see that any minimal $f$-violation of monotonicity must be a pair $\{x, y\} \subseteq \Lambda$ such that $x < y$.
Therefore, the violation hypergraphs are 2-uniform, and applying Lemma 2.3 immediately yields the following result of [BFH21]:

Theorem 2.4 ([BFH21, Theorem 7.9]). For any partially ordered set $\Lambda$ with $n$ elements, if $\mathcal{H}$ is the collection of monotone Boolean-valued functions on $\Lambda$, then $\mathrm{sam}(\mathcal{H}, \varepsilon) \leq O(\sqrt{n}/\varepsilon)$ (see footnote 7).

Remark 3. As discussed earlier, every sample-based property testing problem admits a one-sided-error canonical tester: the tester simply rejects if there is a violation of the property within the observed samples. When the property is downward-closed (more commonly referred to as monotone in the property testing literature), the canonical tester has an even simpler description. Let $\mathcal{H}$ be a downward-closed family of subsets of a finite domain $\Lambda$, and let $\mu$ be a probability distribution over $\Lambda$. Recall from Definition 1.3 that in order to test whether an unknown set $E \subseteq \Lambda$ belongs to $\mathcal{H}$, the algorithm receives samples $e_1, \dots, e_m$ drawn from $\mu$, together with the information of whether $e_i \in E$ for each $i \in [m]$. We call $e_i$ a positive sample if $e_i \in E$. Since $\mathcal{H}$ is downward-closed, it follows that the canonical tester for $\mathcal{H}$ rejects if and only if the set formed by the positive samples does not belong to $\mathcal{H}$.

2.2 Applying the Framework to Graph Problems

The framework in Section 2.1 is especially suitable for analyzing subgraph-freeness properties. It is easy to see that for any graph $H$ and any $f : \binom{[n]}{2} \to \{0,1\}$, the minimal $f$-violations of $\mathcal{G}^{H\text{-free}}_n$ (defined in Theorem 1.8) are the copies of $H$ in the edge set $f^{-1}(1)$. Therefore, for any graph $H$ with $t$ edges, any violation hypergraph against $H$-freeness is $t$-uniform. We may thus apply Lemma 2.3 and immediately get:

Theorem 2.5. For any simple graph $H$ with $t \geq 1$ edges, we have
\[ \mathrm{sam}\bigl(\mathcal{G}^{H\text{-free}}_n, \varepsilon\bigr) \leq O(1/\varepsilon) \cdot \binom{n}{2}^{(t-1)/t} = O\bigl(n^{2(t-1)/t}/\varepsilon\bigr). \]
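As an illustration of the canonical tester of Remark 3 specialized to triangle-freeness (the case $t = 3$ of Theorem 2.5), here is a minimal sketch. It is not the paper's pseudocode: the function names and the representation of labeled samples as `((u, v), label)` pairs are assumptions made for this example.

```python
def contains_triangle(edges):
    """Return True if the given set of edges (vertex pairs) contains a triangle."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    # An edge {u, v} lies in a triangle exactly when u and v share a neighbour.
    return any(adjacency[u] & adjacency[v] for u, v in edges)


def canonical_triangle_freeness_tester(labeled_samples):
    """One-sided canonical tester: True means accept, False means reject.

    labeled_samples is an iterable of ((u, v), label) pairs, where label is 1
    if the sampled edge belongs to the unknown edge set and 0 otherwise
    (a hypothetical encoding of the f-labeled samples of Definition 1.3).
    """
    positive = set()
    for (u, v), label in labeled_samples:
        if label == 1:
            positive.add((min(u, v), max(u, v)))  # store each edge once
    # Triangle-freeness is downward-closed, so only positive samples matter:
    # reject exactly when the positive samples already contain a triangle.
    return not contains_triangle(positive)
```

Because the tester rejects only on seeing a full triangle, it never rejects a triangle-free input; the content of Theorem 2.5 is the number of samples needed for it to catch inputs that are $\varepsilon$-far.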
Footnote 7: In the uniform distribution case, monotonicity can be tested in $O\bigl(\sqrt{n/\varepsilon}\bigr)$ samples [FLN+02].

The upper bound part of (1.2) follows as a special case of Theorem 2.5, by taking $H$ to be a triangle. However, if we apply Theorem 2.5 to the square-freeness property, we only get the upper bound $\mathrm{sam}\bigl(\mathcal{G}^{\mathrm{squ}}_n, \varepsilon\bigr) \leq O(n^{3/2}/\varepsilon)$, failing to reach the optimal bound $n^{9/8 \pm o(1)}$ stated in (1.3). In order to obtain the desired upper bound $O(n^{9/8}/\varepsilon)$, we have to develop new techniques based on Lemma 2.3:

1. We first open up the proof of Lemma 2.3 (due to [CFP24]) as a white box. The main idea of the proof is to use linear programming duality to turn the universally-quantified condition about vertex covers into the existence of a “fractional matching” in the violation hypergraph (Lemma 5.2). In the context of testing square-freeness, the edges of the violation hypergraph are copies of squares in the input graph, so the “fractional matching” translates to a family of “weighted squares” whose convex combination is dominated by the edge distribution (Definition 5.5).

2. The fractional matching effectively allows us to “embed” a classical-birthday-paradox structure into the edge distribution. One can use Carathéodory's theorem to limit the number of squares in the fractional matching, i.e. the number of “birthday slots,” to at most $O(n^2)$. In $O(n^{3/2})$ samples, with high probability there are four people sharing a birthday, i.e. four edges forming a square. This is how Lemma 2.3 is proved in [CFP24] (and it works perfectly well for testing triangle-freeness), but it falls short of the optimal bound for testing square-freeness by a polynomial factor.

3. The main new idea is to delay the application of Carathéodory's theorem, and to try to milk the “fractional matching” for more.
It turns out that there are two different sources of squares that we could hope to reveal by samples. On the one hand, there is the family of weighted squares “planted” into the edge distribution by the fractional matching, which has been our only source of squares so far. On the other hand, if these squares are planted in a sufficiently “dilute” manner, we argue that a huge number of unintended squares will inevitably be created during the process, and these unintended ones will be our second source of squares.

4. We divide into two cases based on the “fractional matching” obtained from linear programming duality. If the fractional matching is “dilute,” we argue that an unintended square will likely show up in as few as $\widetilde{O}(n)$ samples (Lemma 5.12). In the “concentrated” case, we argue that the number of “birthday slots” is effectively reduced to $O(n^{3/2})$, and hence there will likely be “four people sharing a birthday” within $O(n^{9/8})$ samples (Lemma 5.13).

We remark that a nontrivial amount of effort is required to find a formalization of the “diluteness” notion that works smoothly in the proof (see Section 5.2).

The other sample complexity upper bounds proved in this paper (Theorems 1.6 to 1.8) require different techniques, for which we choose not to provide overviews here.

2.2.1 Subgraph-Removal for Sparse Graphs

The discussion above is reminiscent of the celebrated removal lemmas in graph theory (see e.g. the survey [CF13]). Suppose $H$ is a fixed connected simple graph, and consider a graph on $n$ vertices whose edge set consists of $m$ edge-disjoint copies of $H$. How large can $m$ be if no “unintended” copy of $H$ is allowed, i.e. every edge is in exactly one copy of $H$? The subgraph removal lemma implies that for any $H$ with $t \geq 3$ vertices, we must have $m \leq o(n^2)$ if there is no unintended copy of $H$.
Furthermore, if $m = \Omega(n^2)$ edge-disjoint copies of $H$ are planted, then there must be as many as $\Omega(n^t)$ unintended copies of $H$.

For convenience of further discussion, we use the following non-standard notation.

Definition 2.6. Given a simple graph $H$, let $\mathrm{ex}_{=1}(n, H)$ be the maximum number of edges in an $n$-vertex graph where every edge is contained in exactly one copy of $H$.

For certain graphs $H$, one can pack into an $n$-vertex graph as many as $n^{2-o(1)}$ copies of $H$ without creating unintended copies. In the case where $H = C_3$ is a triangle, the celebrated Ruzsa-Szemerédi construction [RS78] (based on Behrend's construction [Beh46] of integer sets without 3-term arithmetic progressions) shows that:

Proposition 2.7 ([Beh46, RS78]). We have $\mathrm{ex}_{=1}(n, C_3) \geq n^2 \exp\bigl(-O\bigl(\sqrt{\log n}\bigr)\bigr)$.

We will use Proposition 2.7 to prove the sample complexity lower bound for testing triangle-freeness stated in (1.2). Indeed, Proposition 2.7 almost immediately implies an $n^{4/3-o(1)}$ sample lower bound for one-sided-error triangle-freeness testers. To extend the lower bound to hold against two-sided-error testers, we use a standard construction technique (see Section 7.1) that has appeared in, for example, lower bounds for triangle-freeness testers in the “general graph model” [AKKR08].

There are also some graphs $H$ for which the bound $\mathrm{ex}_{=1}(n, H) \leq o(n^2)$ provided by the removal lemma can be improved by a polynomial factor in $n$. Recall that the Turán number of $H$, denoted by $\mathrm{ex}(n, H)$, is the maximum number of edges in an $n$-vertex graph with no subgraphs isomorphic to $H$. For any graph in which every edge is contained in exactly one copy of $H$, deleting one edge from every copy of $H$ results in an $H$-free graph, so we have:

Proposition 2.8. For a fixed simple graph $H$ with at least two edges, we have $\mathrm{ex}_{=1}(n, H) \leq 2 \cdot \mathrm{ex}(n, H)$.
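The counting behind Proposition 2.8 can be spelled out as follows. This is a short sketch of the edge-deletion argument described just before the proposition; the symbols $G$, $E$ and $t$ are notation assumed here for the derivation.

```latex
% Assumed notation: G is an n-vertex graph with E edges in which every edge
% lies in exactly one copy of H, and H has t >= 2 edges.
% The copies of H in G are pairwise edge-disjoint and cover all edges,
% so G contains exactly E/t copies of H.
% Deleting one edge from each copy destroys every copy of H and creates no
% new one, leaving an H-free graph with E - E/t edges. Hence
\[
  E\Bigl(1 - \frac{1}{t}\Bigr) \le \mathrm{ex}(n, H)
  \quad\Longrightarrow\quad
  E \le \frac{t}{t-1}\,\mathrm{ex}(n, H) \le 2\,\mathrm{ex}(n, H),
\]
% where the last step uses t/(t-1) <= 2 for every t >= 2.
% Taking the maximum over all such G gives ex_{=1}(n, H) <= 2 ex(n, H).
```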
The Kővári-Sós-Turán theorem [KST54] shows, for any fixed bipartite graph $H$ with $t$ vertices, that $\mathrm{ex}(n, H) \leq O(n^{2-1/t})$, so we also have $\mathrm{ex}_{=1}(n, H) \leq O(n^{2-1/t})$. When $H = C_4$ is a square (i.e. a 4-cycle), the resulting upper bound $\mathrm{ex}_{=1}(n, C_4) \leq O(n^{3/2})$ is the key reason we are able to improve the upper bound on $\mathrm{sam}\bigl(\mathcal{G}^{\mathrm{squ}}_n, \varepsilon\bigr)$ from $O(n^{3/2}/\varepsilon)$ to $O(n^{9/8}/\varepsilon)$, as discussed earlier. That being said, we are not able to use the bound $\mathrm{ex}_{=1}(n, C_4) \leq O(n^{3/2})$ or $\mathrm{ex}(n, C_4) \leq O(n^{3/2})$ as a black box to prove the $O(n^{9/8})$ sample complexity upper bound, and it seems that some careful case analysis (as described earlier) is necessary for proving the latter.

In terms of lower bounds, it was shown by [Bro66, ERTS66] that $\mathrm{ex}(n, C_4) = \Theta(n^{3/2})$, and the following lower bound on $\mathrm{ex}_{=1}(n, C_4)$ is (implicitly) shown by Timmons and Verstraëte [TV15]:

Proposition 2.9 ([TV15]). We have $\mathrm{ex}_{=1}(n, C_4) \geq n^{3/2} \exp\bigl(-O\bigl(\sqrt{\log n}\bigr)\bigr)$.

As in the case of triangle-freeness, we will use Proposition 2.9 to prove the sample complexity lower bound for testing square-freeness stated in (1.3). For the sake of completeness, we will sketch the proof of Proposition 2.9 in Section 7.2.

Remark 4. To the best of the author's knowledge, it is unknown whether $\mathrm{ex}_{=1}(n, C_4) = o(n^{3/2})$ (see footnote 8), and determining the asymptotics of $\mathrm{ex}_{=1}(n, C_3)$ is a major open problem (see e.g. [Sha22]).

3 Preliminaries

3.1 General Notations

In this subsection we summarize general notational conventions used throughout this paper.

Sets. For two subsets $E_1, E_2$ of a domain $\Lambda$, we use $E_1 \triangle E_2 := (E_1 \setminus E_2) \cup (E_2 \setminus E_1)$ to denote the symmetric difference between $E_1$ and $E_2$.

Probability.
For a finite domain $\Lambda$ and a probability distribution $\mu$ over $\Lambda$, we write $\mathbb{E}_{x \sim \mu}[\cdot]$ and $\mathbb{P}_{x \sim \mu}[\cdot]$ to denote expectation and probability, respectively, when $x \in \Lambda$ is a random element following the distribution $\mu$. A probability mass function on $\Lambda$ is a function $f : \Lambda \to [0, +\infty)$ such that $\sum_{x \in \Lambda} f(x) = 1$. A sub-probability mass function on $\Lambda$ is a function $f : \Lambda \to [0, +\infty)$ such that $\sum_{x \in \Lambda} f(x) \leq 1$. Similarly, a probability vector indexed by $\Lambda$ is a vector $p \in [0,1]^{\Lambda}$ such that $\sum_{x \in \Lambda} p_x = 1$, while a vector $p \in [0,1]^{\Lambda}$ is called a sub-probability vector if $\sum_{x \in \Lambda} p_x \leq 1$.

Footnote 8: Indeed, Solymosi [Sol11] conjectured that $\mathrm{ex}_{=1}(n, C_4) = o(n^{3/2})$, while Verstraëte [Ver16] conjectured that $\mathrm{ex}_{=1}(n, C_4) = \Theta(n^{3/2})$.

Sampling. Given a finite domain $\Lambda$, a sample from a sub-probability vector $p \in [0,1]^{\Lambda}$ is a random element $y$ of an extended domain $\Lambda \cup \{\mathsf{nil}\}$ such that $\mathbb{P}_y[y = x] = p_x$ for any $x \in \Lambda$ and
\[ \mathbb{P}_y[y = \mathsf{nil}] = 1 - \sum_{x \in \Lambda} p_x. \]
The special symbol $\mathsf{nil}$ will always be used as an “outside” placeholder element in such contexts. Samples from sub-probability mass functions are similarly defined.

Empirical vectors. Given a sequence of elements $y_1, \dots, y_m \in \Lambda \cup \{\mathsf{nil}\}$, we define the empirical count vector of this sequence to be the vector $w \in \mathbb{N}^{\Lambda} = \{0, 1, 2, \dots\}^{\Lambda}$ where the coordinate $w_x$ equals the number of indices $i \in [m]$ such that $y_i = x$, for each element $x \in \Lambda$. The empirical indicator vector of this sequence is the vector $w' \in \{0,1\}^{\Lambda}$ defined by $w'_x = \mathbb{1}[w_x \geq 1]$ for all $x \in \Lambda$.

Sampling Processes. Suppose $p \in [0,1]^{\Lambda}$ is a sub-probability vector, and $f : \Lambda \to [0,1]$ is the sub-probability mass function associated with $p$ (i.e. $f(x) = p_x$ for all $x \in \Lambda$). Consider the following canonical sampling process:

1. Take a batch of $m$ independent samples $y_1, \dots, y_m$ from $p$.
2. Let $w \in \mathbb{N}^{\Lambda}$ be the empirical count vector of the sequence $y_1, \dots, y_m$, and output $w$.

We use $\mathcal{S}(p, m)$ or $\mathcal{S}(f, m)$ to denote the distribution of the output vector $w$ in the above process (see footnote 9). If the empirical count vector in step 2 of the process is replaced with the empirical indicator vector, the resulting output distribution over $\{0,1\}^{\Lambda}$ is denoted by $\mathcal{S}'(p, m)$ or $\mathcal{S}'(f, m)$.

Combinatorial Structures. For a fixed positive integer $n$, we define various combinatorial structures associated with the edge set $\binom{[n]}{2}$. We define
\[ \mathrm{Square}(n) := \Bigl\{ \bigl\{ \{a,b\}, \{b,c\}, \{c,d\}, \{d,a\} \bigr\} \subseteq \tbinom{[n]}{2} \Bigm| a, b, c, d \text{ are distinct elements of } [n] \Bigr\} \]
to be the collection of all four-edge sets that correspond to squares. Two edges in $\binom{[n]}{2}$ are said to form a wedge if they have exactly one common vertex, and we correspondingly define
\[ \mathrm{Wedge}(n) := \Bigl\{ \bigl\{ \{a,b\}, \{b,c\} \bigr\} \subseteq \tbinom{[n]}{2} \Bigm| a, b, c \text{ are distinct elements of } [n] \Bigr\} \]
to be the collection of wedges on the vertex set $[n]$. A wedge $\bigl\{ \{a,b\}, \{b,c\} \bigr\}$ can also be viewed as an ordered pair $\bigl( \{a,c\}, b \bigr) \in \binom{[n]}{2} \times [n]$. By an abuse of notation, we identify the collection $\mathrm{Wedge}(n)$ with the subset
\[ \mathrm{Wedge}(n) := \Bigl\{ \bigl( \{a,c\}, b \bigr) \in \tbinom{[n]}{2} \times [n] \Bigm| b \notin \{a,c\} \Bigr\} \subseteq \tbinom{[n]}{2} \times [n]. \tag{3.1} \]

Subgraph-Freeness. Given any constant $\varepsilon \in (0,1)$ and a fixed simple graph $H$, a sub-probability vector $p \in [0,1]^{\binom{[n]}{2}}$ is said to be $\varepsilon$-far from $H$-free if for any edge set $E \in \mathcal{G}^{H\text{-free}}_n$ (see the statement of Theorem 1.8), we have
\[ \sum_{e \in \binom{[n]}{2} \setminus E} p_e \geq \varepsilon. \]

Footnote 9: Note that if $p \in [0,1]^{\Lambda}$ is a probability vector, then $\mathcal{S}(p, m)$ is a multinomial distribution. However, if $p$ is only a sub-probability vector, then $\mathcal{S}(p, m)$ may not be supported on the layer $\bigl\{ w \in \mathbb{N}^{\Lambda} \bigm| \sum_{x \in \Lambda} w_x = m \bigr\}$.

3.2 Stochastic Domination

Sampling processes (as formally introduced in Section 3.1) are of central importance in this paper.
In order to meaningfully compare different sampling processes, we make the following definition of stochastic domination. Recall from Section 3.1 that the output of a sampling process is a random element of the space $\mathbb{N}^{\Lambda}$ for some index set $\Lambda$, so it suffices to "compare" distributions over $\mathbb{N}^{\Lambda}$.

Definition 3.1. Let $\Lambda$ be a finite set and let $\mu,\nu$ be probability distributions over $\mathbb{N}^{\Lambda}$. For parameters $\lambda_1,\lambda_2\in(0,1]$, we say $\nu$ is $(\lambda_1,\lambda_2)$-dominated by $\mu$, written as
$$\nu \leqslant_{(\lambda_1,\lambda_2)} \mu ,$$
if there exists a coupling distribution $\rho$ over $\mathbb{N}^{\Lambda}\times\mathbb{N}^{\Lambda}$ such that the following conditions hold:
(1) $\mathbb{P}_{(w,z)\sim\rho}[w \succeq z] \geqslant \lambda_1$. [10]
(2) For any subset $S\subseteq\mathbb{N}^{\Lambda}$, we have $\mathbb{P}_{(w,z)\sim\rho}[w\in S] = \mu(S)$.
(3) For any subset $S\subseteq\mathbb{N}^{\Lambda}$, we have $\lambda_2\cdot\mathbb{P}_{(w,z)\sim\rho}[z\in S] \leqslant \nu(S)$.
When $\lambda_1=\lambda_2=1$, we simply say that $\nu$ is dominated by $\mu$, omitting the $(\lambda_1,\lambda_2)$.

Our definition of stochastic domination has the following basic property.

Proposition 3.2. Let $\Lambda$ be a finite set, and let $\lambda_1,\lambda_2\in(0,1]$ be constants. Suppose $\mu$ and $\nu$ are probability distributions over $\mathbb{N}^{\Lambda}$ such that $\nu$ is $(\lambda_1,\lambda_2)$-dominated by $\mu$. Then for any downward-closed subset $S\subseteq\mathbb{N}^{\Lambda}$, we have
$$\mu(S) \leqslant \lambda_2^{-1}\cdot\nu(S) + (1-\lambda_1).$$

Proof. Let $\rho$ be the coupling from Definition 3.1. Since $S$ is downward-closed, $w\in S$ together with $w\succeq z$ forces $z\in S$, which gives the union bound inequality
$$\mathbb{P}_{(w,z)\sim\rho}[w\in S] \leqslant \mathbb{P}_{(w,z)\sim\rho}[z\in S] + \mathbb{P}_{(w,z)\sim\rho}[w\not\succeq z].$$
It then suffices to replace the three terms in the inequality by the desired quantities, using the three conditions in Definition 3.1.

4 Testing Bipartiteness

The goal of this section is to prove Theorem 1.6. Note that when the graph $H$ is a single edge (on two vertices), the collection $\mathcal{G}^{H\text{-hom}}_n$ is identical to $\mathcal{G}^{\mathrm{bip}}_n$; so the bipartiteness testing result stated in (1.1) is a special case of Theorem 1.6. The proofs of both the upper bound and the lower bound are similar to [GR16, Theorem 4.6].

10 For vectors $w,z\in\mathbb{N}^{\Lambda}$, we write $w\succeq z$ if $w_x\geqslant z_x$ for all $x\in\Lambda$.
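As an informal illustration of the objects used so far, the following Python sketch implements the canonical sampling process of Section 3.1 (samples from a sub-probability vector with the placeholder nil, the empirical count and indicator vectors) together with the coordinate-wise order $\succeq$ of footnote 10. The function names and the dictionary representation of vectors are assumptions made for this sketch only; they do not appear elsewhere in the paper.

```python
import random
from collections import Counter

NIL = None  # stands in for the placeholder element "nil" (assumed encoding)


def sample(p):
    """Draw one sample from a sub-probability vector p (dict: element -> weight).

    Returns NIL with the leftover probability 1 - sum(p.values()).
    """
    u = random.random()
    acc = 0.0
    for x, px in p.items():
        acc += px
        if u < acc:
            return x
    return NIL


def empirical_count_vector(samples, domain):
    """w_x = number of indices i with samples[i] == x (nil samples are ignored)."""
    counts = Counter(y for y in samples if y is not NIL)
    return {x: counts.get(x, 0) for x in domain}


def empirical_indicator_vector(samples, domain):
    """w'_x = 1 if x occurs at least once among the samples, else 0."""
    w = empirical_count_vector(samples, domain)
    return {x: int(w[x] >= 1) for x in domain}


def canonical_sampling(p, m):
    """One draw from S(p, m): the count vector of m independent samples from p."""
    samples = [sample(p) for _ in range(m)]
    return empirical_count_vector(samples, p.keys())


def dominates(w, z):
    """Coordinate-wise order: True iff w_x >= z_x for every x."""
    return all(w[x] >= z[x] for x in w)
```

For a genuine sub-probability vector such as `{'a': 0.5, 'b': 0.25}`, the coordinates of `canonical_sampling(p, m)` may sum to less than `m`, which is the situation described in footnote 9.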
4.1 Upper Bound

Suppose $H$ is a fixed simple graph on the vertex set $[k]$, and for any indices $i,j\in[k]$ we denote
$$H_{ij} = \begin{cases} 1, & \text{if } i\neq j \text{ and } H \text{ contains the edge } \{i,j\}, \\ 0, & \text{otherwise.} \end{cases}$$
Given an assignment map $\tau:[n]\to[k]$, let $F_{\tau,H}$ be the collection of edges $\{a,b\}\in\binom{[n]}{2}$ such that $H_{\tau(a),\tau(b)} = 0$. By definition, an edge set $E\subseteq\binom{[n]}{2}$ belongs to the collection $\mathcal{G}^{H\text{-hom}}_n$ if and only if $E\cap F_{\tau,H} = \emptyset$ for some $\tau:[n]\to[k]$.

Let $\mu$ be a probability distribution over $\binom{[n]}{2}$. For any edge set $E\subseteq\binom{[n]}{2}$, it is easy to see that
$$\min_{E'\in\mathcal{G}^{H\text{-hom}}_n} \mu(E\,\triangle\,E') = \min_{\tau:[n]\to[k]} \mu(E\cap F_{\tau,H}). \tag{4.1}$$
We claim that for any edge set $E\subseteq\binom{[n]}{2}$ that is $\varepsilon$-far from $\mathcal{G}^{H\text{-hom}}_n$ with respect to $\mu$, the canonical tester for $\mathcal{G}^{H\text{-hom}}_n$ (described in Remark 3) rejects $E$ with probability at least $2/3$ after receiving $O(n/\varepsilon)$ labeled samples from $\mu$. By (4.1) and Remark 3, it suffices to prove the following lemma.

Lemma 4.1. Let $\varepsilon\in(0,1)$ be a constant. Suppose $E\subseteq\binom{[n]}{2}$ is an edge set and $\mu$ is a distribution over $\binom{[n]}{2}$ such that $\mu(E\cap F_{\tau,H})\geqslant\varepsilon$ for any map $\tau:[n]\to[k]$. For any integer $m\geqslant\varepsilon^{-1}(2+n\ln k)$, in $m$ independent samples from $\mu$, the probability is at least $2/3$ that for any $\tau:[n]\to[k]$, there is a sampled edge that belongs to $E\cap F_{\tau,H}$.

Proof. For any fixed map $\tau:[n]\to[k]$, the probability that no sample falls in $E\cap F_{\tau,H}$ is at most
$$(1-\varepsilon)^m \leqslant \exp(-\varepsilon m) \leqslant \tfrac{1}{3}\exp(-n\ln k) = \tfrac{1}{3}k^{-n}.$$
By a union bound over all $k^n$ maps $\tau:[n]\to[k]$, the probability that some $\tau$ has no sampled edge in $E\cap F_{\tau,H}$ is at most $1/3$. Hence with probability at least $2/3$, for every $\tau$ there is a sampled edge falling in $E\cap F_{\tau,H}$.

Corollary 4.2. For any fixed simple graph $H$ with at least one edge, we have
$$\mathrm{sam}\big(\mathcal{G}^{H\text{-hom}}_n, \varepsilon\big) \leqslant O(n/\varepsilon).$$

4.2 Lower Bound

In this subsection, we prove the lower bound part of Theorem 1.6. Throughout this subsection, we let $k\geqslant 3$ be a fixed integer.
A basic to ol in the pro of is the fact that a complete regular k -partite graph is far from ( k − 1)-colorable. Lemma 4.3. L et V 1 , . . . , V k b e p airwise disjoint sets, e ach of size n , and let Γ :=  { u, v } : u ∈ V i , v ∈ V j , 1 ⩽ i < j ⩽ k  . Thus Γ is the e dge set of the c omplete k -p artite gr aph with p arts V 1 , . . . , V k . If E ⊆ Γ is such that the gr aph ( V 1 ∪ · · · ∪ V k , E ) is ( k − 1) -c olor able, then | Γ \ E | ⩾ n 2 . Pr o of. Fix a prop er ( k − 1)-coloring of the graph ( V 1 ∪ · · · ∪ V k , E ), and let C 1 , . . . , C k − 1 denote its color classes. F or each i ∈ [ k ] and c ∈ [ k − 1], set w i,c := | V i ∩ C c | . Then w i, 1 + · · · + w i,k − 1 = n for ev ery i ∈ [ k ] . No w ﬁx a color c ∈ [ k − 1] and t w o distinct indices i, j ∈ [ k ]. Every pair of v ertices u ∈ V i ∩ C c and v ∈ V j ∩ C c forms an edge of Γ, but cannot b elong to E , since u and v hav e the same color. Hence all w i,c w j,c suc h edges lie in Γ \ E . Summing o ver all colors and all pairs i < j , w e obtain | Γ \ E | ⩾ k − 1 X c =1 X 1 ⩽ i | I 2 ( b i ) | , dra w a random vertex c ∗ ∈ [ n ] \ { b i } according to the probability v ector  p b i c 2 deg p ( b i )  c ∈ [ n ] \{ b i } ∈ [0 , 1] [ n ] \{ b i } . If c ∗ = a i , let X i = nil . Otherwise let X i = ( { a i , c ∗ } , b i ). (ii) If j ⩽ | I 2 ( b i ) | , let ℓ b e the j -th smallest index in the set I 2 ( b i ). If c ℓ = a i , let X i = nil . Otherwise, let X i = ( { a i , c ℓ } , b i ). Step 3: analysis of coupling. It is easy to see that if ( 5.14 ) are 5 m indep enden t samples drawn from p , the ab ov e pro cedure pro duces m indep enden t samples from Walk [ p, B ]. Indeed, for each w edge ( { a, c } , b ) ∈ Wedge ( n ) suc h that b ∈ B , the probabilit y that ( a i , b i ) = ( a, b ) is p ab / 2, and P  X i = ( { a, c } , b )   ( a i , b i ) = ( a, b )  = p bc 2 deg p ( b ) . 
Therefore we have
$$\begin{aligned}
\mathbb{P}\big[X_i = (\{a,c\},b)\big] &= \mathbb{P}\big[(a_i,b_i)=(a,b)\big]\cdot\mathbb{P}\big[X_i = (\{a,c\},b) \,\big|\, (a_i,b_i)=(a,b)\big] \\
&\quad + \mathbb{P}\big[(a_i,b_i)=(c,b)\big]\cdot\mathbb{P}\big[X_i = (\{a,c\},b) \,\big|\, (a_i,b_i)=(c,b)\big] \\
&= \frac{p_{ab}}{2}\cdot\frac{p_{bc}}{2\deg_p(b)} + \frac{p_{bc}}{2}\cdot\frac{p_{ab}}{2\deg_p(b)} = \mathsf{Walk}[p,B]\big(\{a,c\},b\big).
\end{aligned}$$

Step 4: wrapping up. Given $5m$ independent samples (5.14) drawn from $p$, we let $x\in\mathbb{N}^{\binom{[n]}{2}}$ be the empirical count vector of the $5m$ samples. Define a vector $w\in\mathbb{N}^{\mathsf{Wedge}(n)}$ by letting $w_{ac,b} = x_{ab}x_{bc}$ for all $(\{a,c\},b)\in\mathsf{Wedge}(n)$. We generate $m$ samples $X_1,\dots,X_m$ according to Step 2, and let $z\in\mathbb{N}^{\mathsf{Wedge}(n)}$ be the empirical count vector of the samples $X_1,\dots,X_m$. Finally, let $\rho$ be the joint distribution of the vector pair $(w,z)\in\mathbb{N}^{\mathsf{Wedge}(n)}\times\mathbb{N}^{\mathsf{Wedge}(n)}$. It is clear that the marginal distribution of $\rho$ in the $w$ coordinate is identical to $\mathcal{W}(p,5m)$, while its marginal distribution in the $z$ coordinate is identical to $\mathcal{S}(\mathsf{Walk}[p,B],m)$. Note that whenever $|I_1(b)|\leqslant|I_2(b)|$ holds for every $b\in B$, the case 2(i) in the procedure of Step 2 is never activated, which leads to
$$w_{ac,b} = x_{ab}x_{bc} \geqslant z_{ac,b} \quad\text{for all } (\{a,c\},b)\in\mathsf{Wedge}(n).$$
Therefore, by the concentration inequalities (5.15), (5.16) and a union bound over all $b\in B$, we have
$$\mathbb{P}_{(w,z)\sim\rho}[w\succeq z] \geqslant 1 - \sum_{b\in B}\mathbb{P}\big[|I_1(b)| > |I_2(b)|\big] \geqslant 1 - 2|B|\cdot\frac{\delta}{2n} \geqslant 1-\delta,$$
as desired.

The next lemma is a standard argument showing that vertices with too small degrees can be safely ignored when choosing the middle vertex of a length-2 walk.

Lemma 5.18. Let $\varepsilon\in(0,1)$ be a constant. Suppose $p\in[0,1]^{\binom{[n]}{2}}$ is a sub-probability vector and $B$ is the set of vertices $b\in[n]$ such that $\deg_p(b)\geqslant\varepsilon/(2n)$. Then we have
$$\sum_{\{a,c\}\in\binom{[n]}{2}} \mathsf{HopD}[p](a,c) - \sum_{\{a,c\}\in\binom{[n]}{2}} \mathsf{HopD}[p,B](a,c) \leqslant \frac{\varepsilon}{2}.$$

Proof.
By Definition 5.10, the function $\mathsf{Walk}[p,B]$ is no larger than the function $\mathsf{Walk}[p]$ on any input, so for any $\{a,c\}\in\binom{[n]}{2}$ we have
$$\max_{b\in[n]\setminus\{a,c\}} \mathsf{Walk}[p,B]\big(\{a,c\},b\big) \leqslant \max_{b\in[n]\setminus\{a,c\}} \mathsf{Walk}[p]\big(\{a,c\},b\big).$$
By Definition 5.11, it follows that
$$\mathsf{HopD}[p](a,c) - \mathsf{HopD}[p,B](a,c) \leqslant \mathsf{Hop}[p](a,c) - \mathsf{Hop}[p,B](a,c). \tag{5.17}$$
Expanding and rearranging using Definitions 5.10 and 5.11, we have
$$\begin{aligned}
\sum_{\{a,c\}\in\binom{[n]}{2}} \mathsf{Hop}[p](a,c) - \sum_{\{a,c\}\in\binom{[n]}{2}} \mathsf{Hop}[p,B](a,c)
&= \sum_{b\in[n]\setminus B}\ \sum_{\{a,c\}\in\binom{[n]\setminus\{b\}}{2}} \mathsf{Walk}[p]\big(\{a,c\},b\big) \\
&\leqslant \frac{1}{2}\sum_{b\in[n]\setminus B}\ \sum_{a,c\in[n]\setminus\{b\}} \frac{p_{ab}\,p_{bc}}{2\deg_p(b)} \\
&= \frac{1}{2}\sum_{b\in[n]\setminus B} 2\deg_p(b) \leqslant (n-|B|)\cdot\frac{\varepsilon}{2n} \leqslant \frac{\varepsilon}{2}.
\end{aligned}$$
Combining this with (5.17) immediately yields the conclusion.

We are now ready to prove the dilute case lemma, Lemma 5.12.

Proof of Lemma 5.12. Let $B$ be the set of vertices $b\in[n]$ such that $\deg_p(b)\geqslant\varepsilon/(2n)$. By Lemma 5.18 and the assumption (5.10), we have
$$\sum_{\{a,c\}\in\binom{[n]}{2}} \mathsf{HopD}[p,B](a,c) \geqslant \frac{\varepsilon}{2}.$$
We now apply Lemma 5.1 to the sub-probability mass function $\mathsf{Walk}[p,B]$ over the set of wedges $\mathsf{Wedge}(n)\subseteq\binom{[n]}{2}\times[n]$. It follows that given at least
$$\frac{m}{5} \geqslant 64\left\lceil (\varepsilon/2)^{-1}\log(4/\delta)\sqrt{\tbinom{n}{2}} \right\rceil$$
independent samples from $\mathsf{Walk}[p,B]$, with probability at least $1-\delta/2$ there exist two sampled wedges $(\{a,c\},b)$ and $(\{a,c\},d)$ with the same first coordinate $\{a,c\}\in\binom{[n]}{2}$ and different second coordinates $b,d\in[n]$. Note that since $\mathsf{Walk}[p,B]$ is supported on $\mathsf{Wedge}(n)$, we may assume $a,c,b,d$ are distinct vertices.

We next apply Lemma 5.17 to the probability vector $p$ and the vertex set $B$. Due to the guaranteed lower bound on $m$, the conclusion of Lemma 5.17 yields that
$$\mathcal{S}(\mathsf{Walk}[p,B], m/5) \leqslant_{(1-\delta/2,\,1)} \mathcal{W}(p,m).$$
Since the set
$$S = \Big\{ w\in\mathbb{N}^{\mathsf{Wedge}(n)} \;\Big|\; \text{there are no distinct } a,b,c,d\in[n] \text{ s.t. } w_{ac,b},\, w_{ac,d}\geqslant 1 \Big\}$$
is a downward-closed subset of $\mathbb{N}^{\mathsf{Wedge}(n)}$, it follows from Proposition 3.2 that
$$\mathbb{P}_{w\sim\mathcal{W}(p,m)}[w\in S] \leqslant \mathbb{P}_{w\sim\mathcal{S}(\mathsf{Walk}[p,B],\,m/5)}[w\in S] + \frac{\delta}{2}.$$
By the conclusion of the last paragraph, the first summand on the right-hand side is at most $\delta/2$. Therefore, we conclude that
$$\mathbb{P}_{w\sim\mathcal{W}(p,m)}[w\in S] \leqslant \delta.$$
In other words, with probability at least $1-\delta$, there exist distinct vertices $a,b,c,d\in[n]$ such that all four pairs $\{a,b\},\{b,c\},\{c,d\},\{d,a\}$ appear in a batch of $m$ independent samples from $p$, as desired.

5.2.4 The Concentrated Case

To prove the concentrated case lemma, we need the following result from spectral graph theory.

Proposition 5.19. Let $n$ be a positive integer, and let $S$ be a symmetric subset of $[n]\times[n]$ (i.e. for any $(i,j)\in S$ we have $(j,i)\in S$ as well). For any real numbers $x_1,\dots,x_n$, we have
$$\sqrt{|S|}\cdot\sum_{i=1}^{n} x_i^2 \geqslant \sum_{(i,j)\in S} x_i x_j .$$

Proof. Consider the symmetric matrix $M\in\{0,1\}^{n\times n}$ defined by
$$M_{ij} = \begin{cases} 1, & \text{if } (i,j)\in S, \\ 0, & \text{if } (i,j)\notin S, \end{cases} \qquad\text{for all } (i,j)\in[n]\times[n].$$
We know that $M$ has $n$ real eigenvalues, and we order them decreasingly as $\lambda_1\geqslant\lambda_2\geqslant\dots\geqslant\lambda_n$. The matrix $\lambda_1 I_n - M$ is positive semi-definite, where $I_n$ is the $n\times n$ identity matrix. In particular, we have
$$0 \leqslant \sum_{i=1}^{n}\sum_{j=1}^{n} x_i(\lambda_1 I_n - M)_{ij}x_j = \lambda_1\sum_{i=1}^{n} x_i^2 - \sum_{(i,j)\in S} x_i x_j .$$
The conclusion thus follows from the fact that $\lambda_1\leqslant\sqrt{|S|}$. To see this, note that
$$\sum_{i=1}^{n}\lambda_i^2 = \operatorname{trace}(M^2) = \sum_{i=1}^{n}\sum_{j=1}^{n} M_{ij}M_{ji} = |S|,$$
which clearly implies $\lambda_1\leqslant\sqrt{|S|}$.

We have now arrived at the crux of the proof: showing that (5.11) implies a nontrivial lower bound on the $\ell_4$-norm of the vector $p$.

Proof of Lemma 5.13. For each pair $\{a,c\}\in\binom{[n]}{2}$, we let $h(\{a,c\})$ be an arbitrary vertex in the set of maximizers
$$\operatorname*{argmax}_{b\in[n]\setminus\{a,c\}} \mathsf{Walk}[p]\big(\{a,c\},b\big).$$
For each $b\in[n]$, we define a set $S_b\subseteq[n]\times[n]$ by
$$S_b := \big\{ (a,c) \;\big|\; a,c\in[n]\setminus\{b\} \text{ such that } a\neq c \text{ and } h(\{a,c\}) = b \big\} \cup \big\{ (a,a) \;\big|\; a\in[n]\setminus\{b\} \big\}.$$
Note that
$$\sum_{b=1}^{n}|S_b| = \sum_{\{a,c\}\in\binom{[n]}{2}} 2 + \sum_{b=1}^{n}(n-1) = 2n(n-1). \tag{5.18}$$
For each $b\in[n]$, we define
$$\beta_b := \sum_{a,c\in[n]\setminus\{b\}} p_{ab}\,p_{cb} = (2\deg_p(b))^2 \qquad\text{and}\qquad \gamma_b := \sum_{(a,c)\in S_b} p_{ab}\,p_{cb}.$$
Now by Definitions 5.10 and 5.11 we have
$$\begin{aligned}
\sum_{\{a,c\}\in\binom{[n]}{2}} \mathsf{HopD}[p](a,c)
&= \sum_{\{a,c\}\in\binom{[n]}{2}} \mathsf{Hop}[p](a,c) - \sum_{\{a,c\}\in\binom{[n]}{2}} \max_{b\in[n]\setminus\{a,c\}} \mathsf{Walk}[p]\big(\{a,c\},b\big) \\
&= \sum_{\{a,c\}\in\binom{[n]}{2}}\ \sum_{b\in[n]\setminus\{a,c\}} \frac{p_{ab}\,p_{cb}}{2\deg_p(b)} - \sum_{\{a,c\}\in\binom{[n]}{2}} \mathsf{Walk}[p]\big(\{a,c\},h(\{a,c\})\big) \\
&= \frac{1}{2}\sum_{b=1}^{n}\Bigg( \sum_{a,c\in[n]\setminus\{b\}} \frac{p_{ab}\,p_{cb}}{2\deg_p(b)} - \sum_{(a,c)\in S_b} \frac{p_{ab}\,p_{cb}}{2\deg_p(b)} \Bigg) \\
&= \frac{1}{2}\sum_{b=1}^{n} \frac{\beta_b-\gamma_b}{\sqrt{\beta_b}}
\geqslant \frac{1}{2}\sum_{b=1}^{n} \frac{\beta_b-\gamma_b}{\sqrt{\beta_b}+\sqrt{\gamma_b}}
= \frac{1}{2}\sum_{b=1}^{n}\big(\sqrt{\beta_b}-\sqrt{\gamma_b}\big) \\
&= \frac{1}{2}\sum_{b=1}^{n}\big(2\deg_p(b)-\sqrt{\gamma_b}\big)
= \sum_{\{a,b\}\in\binom{[n]}{2}} p_{ab} - \frac{1}{2}\sum_{b=1}^{n}\sqrt{\gamma_b}.
\end{aligned}$$
Therefore, the assumption (5.11) implies $\sum_{b=1}^{n}\sqrt{\gamma_b}\geqslant 2\varepsilon$. On the other hand, we have
$$\begin{aligned}
\sum_{\{a,b\}\in\binom{[n]}{2}} p_{ab}^4
&= \frac{1}{2}\sum_{b=1}^{n}\ \sum_{a\in[n]\setminus\{b\}} p_{ab}^4 \\
&\geqslant \frac{1}{2}\sum_{b\in[n],\,S_b\neq\emptyset} \Bigg( \frac{1}{\sqrt{|S_b|}}\sum_{(a,c)\in S_b} p_{ab}^2\,p_{cb}^2 \Bigg) && \text{(using Proposition 5.19)} \\
&\geqslant \frac{1}{2}\sum_{b\in[n],\,S_b\neq\emptyset} \Bigg( \frac{1}{|S_b|^{3/2}}\Big(\sum_{(a,c)\in S_b} p_{ab}\,p_{cb}\Big)^{2} \Bigg) && \text{(by Cauchy-Schwarz)} \\
&\geqslant \frac{1}{2}\cdot\frac{\big(\sum_{b=1}^{n}\sqrt{\gamma_b}\big)^4}{\big(\sum_{b=1}^{n}\sqrt{|S_b|}\big)^3}
\geqslant \frac{1}{2}\cdot\frac{\big(\sum_{b=1}^{n}\sqrt{\gamma_b}\big)^4}{\big(n\sum_{b=1}^{n}|S_b|\big)^{3/2}} && \text{(by Hölder's inequality)} \\
&\geqslant \frac{1}{8n^{9/2}}\Big(\sum_{b=1}^{n}\sqrt{\gamma_b}\Big)^{4} \geqslant \frac{2\varepsilon^4}{n^{9/2}}, && \text{(using (5.18))}
\end{aligned}$$
as desired.

6 Upper Bound for Tree-Freeness

The goal of this section is to prove the upper bounds for testing tree-freeness and testing cliques, as stated in Theorems 1.8 and 1.7, respectively.
Although they are seemingly unrelated results, the proofs of these two upper bounds are quite similar, and in particular they rely on the same type of birthday-paradox argument. In Sections 6.1 and 6.2, we develop the birthday-paradox-type lemmas underlying the proofs. The two sample complexity upper bounds will then be proved in Section 6.3.

6.1 More Birthday Paradox

In one formulation of the classical birthday paradox, two batches of samples are drawn from the same probability distribution, and the goal is to show that, with high probability, there exists a common sample appearing in both batches. In this subsection, we take this perspective a step further: we show that the set of common samples of the two batches can, in a rough sense, be viewed as a single batch of samples drawn from the same distribution. The "effective size" of this derived batch depends on the sizes of the two original batches. The classical birthday paradox can then be interpreted as establishing that this effective size is at least 1, and hence that the intersection of the two batches is likely to be non-empty.

The notion of taking the "intersection" of two batches of samples can be formalized as follows.

Definition 6.1. Suppose $w^{(1)},w^{(2)}\in\{0,1\}^n$ are empirical indicator vectors (see Section 3.1). We let $\mathcal{P}(w^{(1)},w^{(2)})$ be the entry-wise product vector $w\in\{0,1\}^n$ defined by $w_b = w^{(1)}_b w^{(2)}_b$ for all $b\in[n]$. If $\mu$ and $\nu$ are probability distributions over $\mathbb{N}^n$, let $\mathcal{P}(\mu,\nu)$ be the distribution of $\mathcal{P}(w^{(1)},w^{(2)})$ where $w^{(1)}\sim\mu$ and $w^{(2)}\sim\nu$ are independent random vectors.

We will also need the following "matrix-vector multiplication" version of Definition 6.1.

Definition 6.2. Suppose $w^{(1)}\in\{0,1\}^n$ and $w^{(2)}\in\{0,1\}^{n\times n}$ are empirical indicator vectors. We let $\mathcal{J}(w^{(1)},w^{(2)})$ be the vector $w\in\{0,1\}^n$ defined by
$$w_b = \begin{cases} 1, & \text{if } w^{(1)}_a = 1 \text{ and } w^{(2)}_{ab} = 1 \text{ for some } a\in[n], \\ 0, & \text{otherwise.} \end{cases}$$
If $\mu$ and $\nu$ are probability distributions over $\mathbb{N}^n$ and $\mathbb{N}^{n\times n}$, respectively, then we let $\mathcal{J}(\mu,\nu)$ be the distribution of $\mathcal{J}(w^{(1)},w^{(2)})$ where $w^{(1)}\sim\mu$ and $w^{(2)}\sim\nu$ are independent random vectors.

To prepare for the main lemma of this subsection, we make the following two standard definitions. The first allows us to take "marginals" of sub-probability mass functions:

Definition 6.3. Let $f:[n]^2\to[0,1]$ be a sub-probability mass function. Define sub-probability mass functions $\pi_1 f,\pi_2 f:[n]\to[0,1]$ by letting
$$\pi_1 f(a) = \sum_{b=1}^{n} f(a,b) \ \text{ for all } a\in[n], \qquad\text{and}\qquad \pi_2 f(b) = \sum_{a=1}^{n} f(a,b) \ \text{ for all } b\in[n].$$

The next definition is motivated by the standard trick of "ignoring elements with too small weights," which has already been used in Section 5 (see, for example, Lemma 5.18) and will continue to come into play frequently in this section.

Definition 6.4. For any finite domain $\Lambda$ and two sub-probability mass functions $f,g:\Lambda\to[0,1]$, we say that $g$ is an $\varepsilon$-pruning of $f$ if $g(x)\leqslant f(x)$ for all $x\in\Lambda$ and $\sum_{x\in\Lambda}(f(x)-g(x))\leqslant\varepsilon$.

We are now ready to state the main lemma of this subsection.

Lemma 6.5. Let $\beta,\gamma,\delta,\varepsilon\in(0,1)$ and $C>0$ be constants such that $\beta+\gamma\leqslant 1$ and $\gamma\delta\varepsilon\cdot C\geqslant 16$. For sufficiently large positive integers $n$ and
$$m_1 = \big\lceil Cn^{1-\beta}\big\rceil, \qquad m_2 = \big\lceil Cn^{1-\gamma}\big\rceil, \qquad\text{and}\qquad m_3 = \big\lceil Cn^{1-\beta-\gamma}\big\rceil,$$
we have (recall the notion of stochastic domination in Definition 3.1):
(1) Any sub-probability mass function $f:[n]\to[0,1]$ has an $\varepsilon$-pruning $g$ such that
$$\mathcal{S}'(g,m_3) \leqslant_{(1-\delta,\,1)} \mathcal{P}\big(\mathcal{S}'(g,m_1),\,\mathcal{S}'(g,m_2)\big). \tag{6.1}$$
(2) Any sub-probability mass function $f:[n]^2\to[0,1]$ has an $\varepsilon$-pruning $g$ such that
$$\mathcal{S}'(\pi_2 g,m_3) \leqslant_{(1-\delta,\,1)} \mathcal{J}\big(\mathcal{S}'(\pi_1 g,m_1),\,\mathcal{S}'(g,m_2)\big).$$
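To make the two combining operations and the three batch sizes concrete, here is a small Python sketch of the entry-wise product of Definition 6.1, the "matrix-vector" operation of Definition 6.2, and the sizes $m_1, m_2, m_3$ from Lemma 6.5. The function names, 0-based indexing and list-based representation are assumptions of this sketch, not notation from the paper.

```python
import math


def entrywise_product(w1, w2):
    """Definition 6.1: the vector w with w[b] = w1[b] * w2[b]."""
    return [x * y for x, y in zip(w1, w2)]


def join(w1, w2):
    """Definition 6.2: w[b] = 1 iff w1[a] == 1 and w2[a][b] == 1 for some a.

    w1 is a 0/1 list of length n, w2 is an n-by-n 0/1 matrix (list of rows).
    """
    n = len(w1)
    return [
        int(any(w1[a] == 1 and w2[a][b] == 1 for a in range(n)))
        for b in range(n)
    ]


def batch_sizes(n, C, beta, gamma):
    """The batch sizes m1, m2, m3 used in Lemma 6.5."""
    m1 = math.ceil(C * n ** (1 - beta))
    m2 = math.ceil(C * n ** (1 - gamma))
    m3 = math.ceil(C * n ** (1 - beta - gamma))
    return m1, m2, m3
```

For instance, `entrywise_product([1, 0, 1], [1, 1, 0])` gives `[1, 0, 0]`: only the first element appears in both batches. Note also that $m_1 m_2 / n$ is of order $C \cdot m_3$, which is the "effective size" heuristic described at the start of this subsection.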
For an element $a\in[n]$ with very small weight $f(a)$ under a sub-probability mass function $f:[n]\to[0,1]$, the probability that $a$ appears in the intersection of two independent batches of samples is roughly proportional to $f(a)^2$, whereas the probability that it appears in a single batch is proportional to $f(a)$. Thus, for small values of $f(a)$, the former is significantly smaller than the latter. Consequently, a key difficulty in establishing the stochastic domination in (6.1) is handling elements with small weight. In particular, if there exist elements with extremely small weight (namely those $a\in[n]$ with $0<f(a)\lesssim 1/(Cn)$), then it is impossible for $\mathcal{P}(\mathcal{S}'(f,m_1),\mathcal{S}'(f,m_2))$ to dominate $\mathcal{S}'(f,m_3)$. This necessitates an $\varepsilon$-pruning step to exclude such elements. For elements with moderately small weights, for instance those $a\in[n]$ with $f(a)\approx 1/n$, the next lemma provides a useful bound on their appearance in a single batch of samples.

Lemma 6.6. Let $\gamma,\varepsilon,\delta\in(0,1)$ and $C\geqslant 1$ be constants. For sufficiently large positive integers $n$, the following statement holds. Suppose $f:[n]\to[0,1]$ is a sub-probability mass function such that for all $a\in[n]$, either $f(a)=0$ or $f(a)\geqslant\varepsilon/n$. Then for $m=\lceil Cn^{1-\gamma}\rceil$, we have
$$\mathbb{P}_{w\sim\mathcal{S}(f,m)}\Big[ w_a \leqslant \frac{2n}{\gamma\varepsilon}\cdot f(a) \text{ for all } a\in[n] \Big] \geqslant 1-\delta.$$

Proof. For each $a\in[n]$ such that $f(a)\neq 0$, the coordinate $w_a$ is the sum of $m$ Bernoulli random variables with mean $f(a)$. By the Chernoff bound, we have
$$\mathbb{P}[w_a\geqslant t] \leqslant \Big(\frac{4\cdot\mathbb{E}[w_a]}{t}\Big)^{t} \quad\text{for any } t\geqslant\mathbb{E}[w_a]. \tag{6.2}$$
Now let $t_a = 2\gamma^{-1}\varepsilon^{-1}n\cdot f(a)$. Since $f(a)\geqslant\varepsilon/n$, we have $t_a\geqslant 2\gamma^{-1}$. Furthermore, we have
$$\frac{\mathbb{E}[w_a]}{t_a} = \frac{m\cdot f(a)}{2\gamma^{-1}\varepsilon^{-1}n\cdot f(a)} \leqslant \frac{C\gamma\varepsilon}{n^{\gamma}}.$$
Plugging into (6.2), it follows that
$$\mathbb{P}[w_a\geqslant t_a] \leqslant \Big(\frac{4C\gamma\varepsilon}{n^{\gamma}}\Big)^{2\gamma^{-1}} = \frac{(4C\gamma\varepsilon)^{2\gamma^{-1}}}{n^2} \leqslant \frac{\delta}{n},$$
where we used the condition that $n$ is sufficiently large in the last transition. Now, taking a union bound over all $a\in[n]$ such that $f(a)\neq 0$ yields the conclusion.

We are now ready to prove Lemma 6.5.

Proof of Lemma 6.5. It is not hard to see that the first statement implies the second statement. In fact, for any sub-probability mass function $f:[n]^2\to[0,1]$, the distribution $\mathcal{S}'(\pi_2 f,m_3)$ equals the output distribution of the following process:
1. Sample $w^{(1)}\sim\mathcal{S}'(\pi_1 f,m_3)$.
2. Initialize $w^{(2)}\in\{0,1\}^n$ to be the all-zero vector. For each $a\in[n]$, repeat the following $w^{(1)}_a$ times:
   • Sample an element $b\in[n]$ with probability proportional to $f(a,b)$.
   • Update $w^{(2)}_b\leftarrow 1$.
3. Output $w^{(2)}$.
If the distribution $\mathcal{S}'(\pi_1 f,m_3)$ in the first step is replaced with $\mathcal{P}\big(\mathcal{S}'(\pi_1 f,m_1),\mathcal{S}'(\pi_1 f,m_2)\big)$, then the distribution of the output in the third step becomes $\mathcal{J}\big(\mathcal{S}'(\pi_1 f,m_1),\mathcal{S}'(f,m_2)\big)$. Therefore, to prove the second statement of Lemma 6.5, we apply the first statement to the sub-probability mass function $\pi_1 f$. This yields an $\varepsilon$-pruning $g_1:[n]\to[0,1]$ of $\pi_1 f$ such that
$$\mathcal{S}'(g_1,m_3) \leqslant_{(1-\delta,\,1)} \mathcal{P}\big(\mathcal{S}'(g_1,m_1),\,\mathcal{S}'(g_1,m_2)\big).$$
Since there clearly exists an $\varepsilon$-pruning $g$ of $f$ such that $\pi_1 g = g_1$, it follows from the argument above that for this $g$, we have
$$\mathcal{S}'(\pi_2 g,m_3) \leqslant_{(1-\delta,\,1)} \mathcal{J}\big(\mathcal{S}'(\pi_1 g,m_1),\,\mathcal{S}'(g,m_2)\big).$$

In the rest of the proof, we prove the first statement of Lemma 6.5. Fix an arbitrary sub-probability mass function $f:[n]\to[0,1]$. We define $g:[n]\to[0,1]$ by
$$g(a) = f(a)\cdot\mathbf{1}\Big[f(a)\geqslant\frac{\varepsilon}{n}\Big] \quad\text{for all } a\in[n].$$
It is clear that $g\leqslant f$ pointwise and
$$\sum_{a=1}^{n}\big(f(a)-g(a)\big) = \sum_{a=1}^{n} f(a)\cdot\mathbf{1}\Big[f(a)<\frac{\varepsilon}{n}\Big] < n\cdot\frac{\varepsilon}{n} = \varepsilon.$$
So $g$ is an $\varepsilon$-pruning of $f$.
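The thresholding step just described is simple enough to check numerically. The following Python sketch builds $g$ from $f$ by zeroing out every weight below $\varepsilon/n$ and verifies the two conditions of Definition 6.4; the function names and the list representation of $f$ are assumptions made for illustration.

```python
def prune(f, eps):
    """Return g with g[a] = f[a] if f[a] >= eps / n, and 0 otherwise."""
    n = len(f)
    threshold = eps / n
    return [fa if fa >= threshold else 0.0 for fa in f]


def is_pruning(g, f, eps):
    """Check Definition 6.4: g <= f pointwise and the removed mass is at most eps."""
    pointwise = all(ga <= fa for ga, fa in zip(g, f))
    removed = sum(fa - ga for ga, fa in zip(g, f))
    return pointwise and removed <= eps


# Example with n = 4 and eps = 0.2, so the threshold eps / n is 0.05.
f = [0.5, 0.3, 0.01, 0.02]
g = prune(f, 0.2)
```

In this example the two light elements are dropped and the removed mass (about 0.03) is well below $\varepsilon = 0.2$, in line with the bound $n\cdot\varepsilon/n$ above.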
Furthermore, for each $a\in[n]$, either $g(a)=0$ or $g(a)\geqslant\varepsilon/n$.

We next define and analyze three sampling processes $\mathsf{P}_1$, $\mathsf{P}_2$ and $\mathsf{P}_2'$. Note that both $\mathsf{P}_2$ and $\mathsf{P}_2'$ operate on the output of $\mathsf{P}_1$.

The process $\mathsf{P}_1$. Let $b_1,\dots,b_{m_2}$ be a sequence of $m_2$ independent samples drawn from $g$.

Analysis of $\mathsf{P}_1$. For each $a\in[n]$, let $I_a$ be the set of indices $i\in[m_2]$ such that $b_i=a$. Let $\mathcal{E}_1$ be the event that
$$|I_a| \leqslant 2\gamma^{-1}\varepsilon^{-1}n\cdot g(a) \quad\text{for any } a\in[n].$$
It follows from Lemma 6.6 that
$$\mathbb{P}_{\mathsf{P}_1}[\mathcal{E}_1] \geqslant 1-\frac{\delta}{2} \tag{6.3}$$
when $n$ is sufficiently large.

The process $\mathsf{P}_2$. If the event $\mathcal{E}_1$ does not happen, output the zero vector in $\mathbb{N}^n$. If the event $\mathcal{E}_1$ happens, run the following procedure:
1. Let $a_1,\dots,a_{m_1}$ be a sequence of $m_1$ independent samples drawn from $g$.
2. For each $i\in[m_1]$, if $a_i=\mathsf{nil}$ then let $r_i=\mathsf{nil}$. If $a_i\neq\mathsf{nil}$ (i.e. $a_i\in[n]$), let $r_i$ be a uniformly random element of $I_{a_i}$ with probability
$$p_i = \frac{\gamma\varepsilon\cdot|I_{a_i}|}{2n\cdot g(a_i)} \in [0,1],$$
and let $r_i=\mathsf{nil}$ with probability $1-p_i$.
3. Output the empirical indicator vector of the sequence $(b_{r_i})_{1\leqslant i\leqslant m_1}$.

Analysis of $\mathsf{P}_2$. We assume $\mathcal{E}_1$ happens. It is easy to see that $r_1,r_2,\dots,r_{m_1}$ are independent random variables, each being a uniformly random element of $[m_2]$ with probability
$$p = \sum_{a=1}^{n} g(a)\cdot\frac{\gamma\varepsilon\cdot|I_a|}{2n\cdot g(a)} = \frac{\gamma\varepsilon m_2}{2n} \leqslant 1,$$
and being $\mathsf{nil}$ with probability $1-p$. Let $S=\{i\in[m_1]\mid r_i\neq\mathsf{nil}\}$, and let $T=\{r_i\mid i\in S\}\subseteq[m_2]$. Let $s=|S|$ and $t=|T|$. Let $\mathcal{E}_2$ be the event that $t\geqslant m_3$. We next show that (when $n$ is sufficiently large)
$$\mathbb{P}_{\mathsf{P}_2}[\mathcal{E}_2\mid\mathcal{E}_1] \geqslant 1-\frac{\delta}{2}. \tag{6.4}$$
Note that when $n$ is sufficiently large,
$$\mathbb{E}[s] = m_1 p = \frac{\gamma\varepsilon}{2}\cdot\frac{m_1 m_2}{n} \geqslant \frac{\gamma\varepsilon}{2}\cdot C^2 n^{1-\beta-\gamma} \geqslant 4m_3,$$
where the last step uses $\gamma\varepsilon C\geqslant 16$. Thus, by the Chernoff bound, we have
$$\mathbb{P}[s\leqslant 3m_3] \leqslant \exp(-m_3/8) \leqslant \frac{\delta}{4}. \tag{6.5}$$
Conditioned on $s=|S|\geqslant 3m_3$, we have
$$\mathbb{E}[t\mid s\geqslant 3m_3] \geqslant m_2\Bigg(1-\Big(1-\frac{1}{m_2}\Big)^{3m_3}\Bigg) \geqslant 2m_3, \tag{6.6}$$
where we used $m_2\geqslant 4m_3$ in the last transition. Conditioned on $S$, the random variables $\mathbf{1}[r\in T]$ (where $r$ ranges in $[m_2]$) are pairwise negatively correlated. So we have
$$\operatorname{Var}[t\mid S] \leqslant \sum_{r=1}^{m_2}\operatorname{Var}\big[\mathbf{1}[r\in T]\,\big|\,S\big] \leqslant \sum_{r=1}^{m_2}\mathbb{E}\big[\mathbf{1}[r\in T]\,\big|\,S\big] = \mathbb{E}[t\mid S]. \tag{6.7}$$
It then follows from Chebyshev's inequality that
$$\begin{aligned}
\mathbb{P}[t\leqslant m_3\mid s\geqslant 3m_3]
&\leqslant \frac{\operatorname{Var}[t\mid s\geqslant 3m_3]}{\big(\mathbb{E}[t\mid s\geqslant 3m_3]-m_3\big)^2} \\
&\leqslant \frac{\mathbb{E}[t\mid s\geqslant 3m_3]}{\big(\mathbb{E}[t\mid s\geqslant 3m_3]-m_3\big)^2} && \text{(using (6.7))} \\
&\leqslant \frac{2m_3}{m_3^2} \leqslant \frac{\delta}{4}. && \text{(using (6.6))}
\end{aligned}$$
Combining the above with (6.5), we obtain (6.4).

The process $\mathsf{P}_2'$. Recall that $b_1,\dots,b_{m_2}\in[n]$ are the samples drawn in the process $\mathsf{P}_1$. Let $T'\subseteq[m_2]$ be a uniformly random subset of size $m_3$, and output the empirical indicator vector of the sequence $(b_r)_{r\in T'}$.

Analysis of $\mathsf{P}_2'$. For fixed samples $b_1,\dots,b_{m_2}$ drawn in the process $\mathsf{P}_1$ such that $\mathcal{E}_1$ happens, we consider the output distributions of $\mathsf{P}_2$ and $\mathsf{P}_2'$ when running on $b_1,\dots,b_{m_2}$. Since the set $T$ defined in the analysis of $\mathsf{P}_2$ is a uniformly random subset of $[m_2]$ with (random) size $t$, it follows that conditioned on $\mathcal{E}_2=\{t\geqslant m_3\}$, the output distribution of $\mathsf{P}_2$ dominates the output distribution of $\mathsf{P}_2'$ (recall Definition 3.1).

Putting things together. We use $\mathsf{P}_2'\circ\mathsf{P}_1$ to denote the output distribution of $\mathsf{P}_2'$ running on the output of $\mathsf{P}_1$. Note that $\mathsf{P}_2'\circ\mathsf{P}_1$ is a distribution over $\{0,1\}^n$. It is easy to see that
$$\mathcal{S}'(g,m_3) = \mathsf{P}_2'\circ\mathsf{P}_1. \tag{6.8}$$
Analogously, we use $\mathsf{P}_2\circ\mathsf{P}_1$ to denote the output distribution of $\mathsf{P}_2$ running on the output of $\mathsf{P}_1$. By the definition of $\mathsf{P}_2$, it is easy to see that
$$\mathsf{P}_2\circ\mathsf{P}_1 \leqslant_{(1,\,1)} \mathcal{P}\big(\mathcal{S}'(g,m_1),\,\mathcal{S}'(g,m_2)\big). \tag{6.9}$$
Furthermore, as we have argued, conditioned on the event $\mathcal{E}_1\cap\mathcal{E}_2$ we have
$$\big(\mathsf{P}_2'\circ\mathsf{P}_1 \,\big|\, \mathcal{E}_1\cap\mathcal{E}_2\big) \leqslant_{(1,\,1)} \big(\mathsf{P}_2\circ\mathsf{P}_1 \,\big|\, \mathcal{E}_1\big). \tag{6.10}$$
Since $\mathbb{P}[\mathcal{E}_1\cap\mathcal{E}_2]\geqslant 1-\delta$ by (6.3) and (6.4), combining (6.8), (6.9) and (6.10) yields
$$\mathcal{S}'(g,m_3) \leqslant_{(1-\delta,\,1)} \mathcal{P}\big(\mathcal{S}'(g,m_1),\,\mathcal{S}'(g,m_2)\big).$$

6.2 Induction on the Number of Edges

Our main idea for proving the sample complexity upper bound in Theorem 1.8 is to induct on the number of edges in the tree. However, we cannot directly use the statement of Theorem 1.8 as an induction hypothesis. Instead, we will formulate a "tree version" of the birthday-paradox-type statement in Lemma 6.5 that is specifically designed to be provable by induction.

For the convenience of the induction argument, we view the edges of a tree as directed edges that converge to a designated root vertex.

Definition 6.7. Let $V$ be a finite set. Given a finite set $T$ of $(|V|-1)$ ordered pairs $(u,v)\in V^2$ and a distinguished element $v^*\in V$, the set $T$ is called a directed rooted tree on $V$ with root $v^*$ if the following hold:
(1) For each $(u,v)\in T$, we have $u\neq v$.
(2) For every $u\in V\setminus\{v^*\}$, there is exactly one $v\in V$ such that $(u,v)\in T$.
(3) Every vertex has a path to $v^*$: for every $v_0\in V$, there is an integer $\ell\geqslant 0$ and elements $v_1,\dots,v_\ell\in V$ such that $v_\ell=v^*$ and $(v_i,v_{i+1})\in T$ for all $i\in\{0,1,\dots,\ell-1\}$.

To formulate a "tree version" of Definition 6.2, we make the following two standard definitions.

Definition 6.8. Fix a directed rooted tree $T$ on a finite set $V$. Given a map $\varphi:T\to[n]^2$ and a vector $y\in[n]^V$, we say $\varphi$ is compatible with $y$ if $\varphi(u,v)=(y_u,y_v)$ for all $(u,v)\in T$.

Definition 6.9. Let $V$ be a finite set, and let $f:[n]^V\to[0,1]$ be a sub-probability mass function. For any subset $U\subseteq V$, we define a sub-probability mass function $\pi_U[f]:[n]^U\to[0,1]$ by letting
$$\pi_U[f](z) = \sum_{y\in[n]^V} \mathbf{1}[y_u=z_u \text{ for all } u\in U]\cdot f(y) \quad\text{for all } z\in[n]^U.$$
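The marginalization operator $\pi_U[f]$ can be sketched in a few lines of Python. In this illustration a sub-probability mass function on $[n]^V$ is stored as a dictionary from tuples (one coordinate per vertex, in the order of the list `V`) to weights; this representation and the function name `marginal` are assumptions of the sketch.

```python
from collections import defaultdict


def marginal(f, V, U):
    """Compute pi_U[f]: sum f(y) over all y that agree with z on the coordinates in U.

    f: dict mapping tuples indexed in the order of V to non-negative weights.
    V: list of vertex labels; U: list of labels, each occurring in V.
    """
    positions = [V.index(u) for u in U]
    out = defaultdict(float)
    for y, weight in f.items():
        z = tuple(y[i] for i in positions)
        out[z] += weight
    return dict(out)


# A sub-probability mass function on three vertices with total mass 0.625.
V = ['u', 'v', 'r']
f = {(1, 2, 3): 0.25, (1, 2, 4): 0.25, (2, 2, 3): 0.125}
```

Here `marginal(f, V, ['u', 'v'])` collects the first two entries into the single pair `(1, 2)`, and the total mass is preserved by every marginal, so a marginal of a sub-probability mass function is again a sub-probability mass function.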
When the cardinality of $U$ is 1 or 2, we slightly abuse the notation as follows. For any two distinct vertices $u,v\in V$, define $\pi_{u,v}f:[n]^2\to[0,1]$ by letting
$$\pi_{u,v}f(a,b) = \sum_{y\in[n]^V}\mathbf{1}[y_u=a \text{ and } y_v=b]\cdot f(y) \quad\text{for all } a,b\in[n].$$
For any single element $v\in V$, analogously define $\pi_v f:[n]\to[0,1]$ by letting
$$\pi_v f(a) = \sum_{y\in[n]^V}\mathbf{1}[y_v=a]\cdot f(y) \quad\text{for all } a\in[n].$$

The "tree version" of Definition 6.2 can now be stated as follows.

Definition 6.10. Suppose $T$ is a directed rooted tree on a finite set $V$ with root $v^*$. Given a sub-probability mass function $f:[n]^V\to[0,1]$ and a positive integer $m$, let $\mathcal{J}^T_{v^*}(f,m)$ be the output distribution of the following process:
1. For each pair $(u,v)\in T$, independently draw $m$ samples from the sub-probability mass function $\pi_{u,v}f$, and let $X_{(u,v)}\subseteq[n]^2$ be the set formed by the $m$ samples.
2. Initialize $w\in\{0,1\}^n$ to be the all-zero vector. For each $b\in[n]$, let $w_b=1$ if there exists a map $\varphi:T\to[n]^2$ such that
   • $\varphi$ is compatible with some vector $y\in[n]^V$ such that $y_{v^*}=b$; and
   • $\varphi(u,v)\in X_{(u,v)}$ for each $(u,v)\in T$.
3. Output the vector $w$.

The next lemma is the "tree version" of Lemma 6.5, and is proved via induction on the number of edges in the tree.

Lemma 6.11. Let $k,t$ be positive integers such that $k\geqslant t$. Let $\delta,\varepsilon\in(0,1)$ and $C>0$ be constants such that $\delta\varepsilon\cdot C\geqslant 16k$. The following statement holds for sufficiently large positive integers $n$. Suppose $T$ is a directed rooted tree on a finite set $V$ with root $v^*$, where $|V|=t+1$. If
$$m_1 = \big\lceil Cn^{(k-1)/k}\big\rceil \qquad\text{and}\qquad m_t = \big\lceil Cn^{(k-t)/k}\big\rceil,$$
then any sub-probability mass function $f:[n]^V\to[0,1]$ has an $\varepsilon t$-pruning $g$ such that
$$\mathcal{S}'(\pi_{v^*}g,m_t) \leqslant_{(1-\delta(t-1),\,1)} \mathcal{J}^T_{v^*}(f,m_1).$$

Proof. We proceed by induction on $t$.
The base case $t=1$ is straightforward: when $T$ consists of a single edge, for any sub-probability mass function $f:[n]^V\to[0,1]$ and $m_1=\lceil Cn^{(k-1)/k}\rceil$ we have
$$\mathcal{S}'(\pi_{v^*}f,m_1) = \mathcal{J}^T_{v^*}(f,m_1).$$
In the following, we assume $t\geqslant 2$ and the statement in the lemma holds for all smaller values of $t$.

Suppose $T$ is a directed rooted tree on $V$ with $t$ edges and a root vertex $v^*$. Let $u^*\in V$ be a vertex such that $(u^*,v^*)\in T$. Then the edge set $T$ can be uniquely partitioned into three sets: a sub-tree $T_1$ rooted at $u^*$, a sub-tree $T_2$ rooted at $v^*$, and the singleton edge $(u^*,v^*)$. Let $|T_1|=t_1$ and $|T_2|=t_2$. Let $V_1$ and $V_2$ be the vertex sets of the sub-trees $T_1$ and $T_2$, respectively. Thus $V$ is the disjoint union of $V_1$ and $V_2$. We also denote
$$m_r = \big\lceil Cn^{(k-r)/k}\big\rceil \quad\text{for each } r\in\{1,2,\dots,t\}.$$

Case 1: $t_2\geqslant 1$. Let $V_3=V_1\cup\{v^*\}$ and $T_3=T_1\cup\{(u^*,v^*)\}$, so $T_3$ is a directed rooted tree on $V_3$ with root $v^*$. Since $1\leqslant|T_3|=t_1+1=t-t_2<t$, we can apply the induction hypothesis to $\pi_{V_3}[f]$ and obtain an $\varepsilon t_1$-pruning $f_3$ of $\pi_{V_3}[f]$ such that
$$\mathcal{S}'(\pi_{v^*}f_3,m_{t_1+1}) \leqslant_{(1-\delta t_1,\,1)} \mathcal{J}^{T_3}_{v^*}\big(\pi_{V_3}[f],m_1\big). \tag{6.11}$$
There clearly exists an $\varepsilon t_1$-pruning $f'$ of $f$ such that $f_3=\pi_{V_3}[f']$. Since $1\leqslant|T_2|=t_2=t-t_1-1<t$, we can apply the induction hypothesis again to $\pi_{V_2}[f']$ and obtain an $\varepsilon(t_2-1)$-pruning $f_2$ of $\pi_{V_2}[f']$ such that
$$\mathcal{S}'(\pi_{v^*}f_2,m_{t_2}) \leqslant_{(1-\delta(t_2-1),\,1)} \mathcal{J}^{T_2}_{v^*}\big(\pi_{V_2}[f'],m_1\big). \tag{6.12}$$
There clearly exists an $\varepsilon(t_2-1)$-pruning $f''$ of $f'$ such that $f_2=\pi_{V_2}[f'']$. We then apply Lemma 6.5(1) to obtain an $\varepsilon$-pruning $f_4$ of $\pi_{v^*}f''$ such that
$$\mathcal{S}'(f_4,m_t) \leqslant_{(1-\delta,\,1)} \mathcal{P}\big(\mathcal{S}'(f_4,m_{t_1+1}),\,\mathcal{S}'(f_4,m_{t_2})\big). \tag{6.13}$$
Combining (6.11), (6.12) and (6.13), it follows that (using $f_4\leqslant\pi_{v^*}f''=\pi_{v^*}f_2\leqslant\pi_{v^*}f'=\pi_{v^*}f_3$)
$$\mathcal{S}'(f_4,m_t) \leqslant_{(1-\delta t,\,1)} \mathcal{P}\Big(\mathcal{J}^{T_3}_{v^*}\big(\pi_{V_3}[f],m_1\big),\,\mathcal{J}^{T_2}_{v^*}\big(\pi_{V_2}[f'],m_1\big)\Big). \tag{6.14}$$
There clearly exists an $\varepsilon$-pruning $g$ of $f''$ such that $f_4=\pi_{v^*}g$. See Figure 1 for an illustration of the relations among the sub-probability mass functions $f$, $f'$, $f''$ and $g$.

Figure 1: Relations between functions in Case 1.

Note that by definition we have
$$\mathcal{P}\Big(\mathcal{J}^{T_3}_{v^*}\big(\pi_{V_3}[f],m_1\big),\,\mathcal{J}^{T_2}_{v^*}\big(\pi_{V_2}[f],m_1\big)\Big) = \mathcal{J}^T_{v^*}(f,m_1). \tag{6.15}$$
Combining (6.14) and (6.15), it follows that (using $f'\leqslant f$)
$$\mathcal{S}'(\pi_{v^*}g,m_t) \leqslant_{(1-\delta t,\,1)} \mathcal{J}^T_{v^*}(f,m_1).$$
Since $g$ is an $\varepsilon t$-pruning of $f$, we conclude the proof in Case 1.

Case 2: $t_2=0$. Since $1\leqslant|T_1|=t_1=t-1$, we can apply the induction hypothesis to $\pi_{V_1}[f]$ and obtain an $\varepsilon(t_1-1)$-pruning $f_1$ of $\pi_{V_1}[f]$ such that
$$\mathcal{S}'(\pi_{u^*}f_1,m_{t_1}) \leqslant_{(1-\delta(t_1-1),\,1)} \mathcal{J}^{T_1}_{u^*}\big(\pi_{V_1}[f],m_1\big). \tag{6.16}$$
There clearly exists an $\varepsilon(t_1-1)$-pruning $f^{\natural}$ of $f$ such that $f_1=\pi_{V_1}[f^{\natural}]$. We then apply Lemma 6.5(2) to obtain an $\varepsilon$-pruning $f_0$ of $\pi_{u^*,v^*}f^{\natural}$ such that
$$\mathcal{S}'(\pi_2 f_0,m_t) \leqslant_{(1-\delta,\,1)} \mathcal{J}\big(\mathcal{S}'(\pi_1 f_0,m_{t_1}),\,\mathcal{S}'(f_0,m_1)\big). \tag{6.17}$$
Combining (6.16) and (6.17), it follows that (using $\pi_1 f_0\leqslant\pi_{u^*}f^{\natural}=\pi_{u^*}f_1$)
$$\mathcal{S}'(\pi_2 f_0,m_t) \leqslant_{(1-\delta t,\,1)} \mathcal{J}\Big(\mathcal{J}^{T_1}_{u^*}\big(\pi_{V_1}[f],m_1\big),\,\mathcal{S}'(f_0,m_1)\Big). \tag{6.18}$$
There clearly exists an $\varepsilon$-pruning $g$ of $f^{\natural}$ such that $f_0=\pi_{u^*,v^*}g$. See Figure 2 for an illustration of the relations among the sub-probability mass functions $f$, $f^{\natural}$ and $g$.
[Figure 2: Relations between functions in Case 2. The diagram shows the chain $f \to f^\natural \to g$ (an $\varepsilon(t_1 - 1)$-pruning followed by an $\varepsilon$-pruning), together with the corresponding prunings $\pi_{V_1}[f] \to f_1$ and $\pi_{u^*,v^*} f^\natural \to f_0$ obtained under the projections $\pi_{V_1}$ and $\pi_{u^*,v^*}$.]

Note that by definition, we have
\[ J\Big(J^{T_1}_{u^*}\big(\pi_{V_1}[f], m_1\big),\, S'(\pi_{u^*,v^*} f, m_1)\Big) = J^T_{v^*}(f, m_1). \tag{6.19} \]
Combining (6.18) and (6.19), it follows that (using $f_0 = \pi_{u^*,v^*} g \leq \pi_{u^*,v^*} f$)
\[ S'(\pi_{v^*} g, m_t) \leq_{(1 - \delta t,\, 1)} J^T_{v^*}(f, m_1). \]
Since $g$ is an $\varepsilon t$-pruning of $f$, we conclude the proof in Case 2.

6.3 Tree-Freeness and Cliques

Lemma 6.11 provides the birthday-paradox tool that we need for proving the upper bounds on testing tree-freeness (Theorem 6.13) and testing cliques (Theorem 6.15). In the proof of Theorem 6.13, we will use the following notation (similar notations have been defined in Section 3.1 and used in Section 5).

Definition 6.12. For a fixed positive integer $n$ and a fixed tree $H$ with $t$ edges, we define $\mathrm{Tree}_H(n)$ to be the collection of all $t$-edge subsets $E \subseteq \binom{[n]}{2}$ such that the graph $(V(E), E)$ is isomorphic to $H$, where $V(E)$ denotes the set of vertices incident to some edge in $E$.

Theorem 6.13. Let $H$ be a fixed tree with $t$ edges, and let $\varepsilon \in (0,1)$ be a constant. Suppose $p \in [0,1]^{\binom{[n]}{2}}$ is a sub-probability vector that is $\varepsilon$-far from $H$-free. Then in $O(n^{(t-1)/t}/\varepsilon)$ independent samples from $p$, with probability at least $2/3$ there exist $t$ sampled edges forming a subgraph isomorphic to $H$.

Proof. We consider a $t$-uniform hypergraph whose vertex set is $\binom{[n]}{2}$ and whose edge set is the collection of all $t$-edge subsets $E \subseteq \binom{[n]}{2}$ such that the subgraph formed by $E$ is isomorphic to $H$. Since $p$ is $\varepsilon$-far from $H$-free, we can apply Lemma 5.2 to this hypergraph and obtain a sub-probability vector $\lambda = (\lambda_E)$, where $E$ ranges over the collection $\mathrm{Tree}_H(n)$, that satisfies the three conditions listed in Lemma 5.2.
(For this proof we only need the second and third conditions.) Let $V$ be the vertex set of $H$. For each $E \in \mathrm{Tree}_H(n)$, let $V(E) \subseteq [n]$ denote the set of vertices incident to $E$, and choose an isomorphism $\psi_E : V(E) \to V$ from the graph $(V(E), E)$ to $H$. Then define a vector $y^{(E)} \in [n]^V$ by letting $y^{(E)}_v = \psi_E^{-1}(v) \in [n]$ for all $v \in V$. It is clear that for any $y \in [n]^V$, there is at most one $E \in \mathrm{Tree}_H(n)$ such that $y = y^{(E)}$. We now define a sub-probability mass function $f : [n]^V \to [0,1]$ by
\[ f(y) = \begin{cases} \lambda_E, & \text{if } y = y^{(E)} \text{ for some } E \in \mathrm{Tree}_H(n), \\ 0, & \text{otherwise.} \end{cases} \]
(Note that $f$ is a sub-probability mass function because $\sum_{y \in [n]^V} f(y) = \sum_{E \in \mathrm{Tree}_H(n)} \lambda_E \leq 1$.) We thus have
\[ \sum_{y \in [n]^V} f(y) = \sum_{E \in \mathrm{Tree}_H(n)} \lambda_E \geq \frac{\varepsilon}{t}, \]
where in the last transition we used the third condition of the conclusion of Lemma 5.2. Furthermore, for any edge $\{a, b\} \in \binom{[n]}{2}$ and any edge $(u, v) \in T$ (with $T$ the directed tree defined in the next paragraph), we have
\[ \pi_{u,v} f(a, b) = \sum_{y \in [n]^V} \mathbb{1}[y_u = a \text{ and } y_v = b] \cdot f(y) \leq \sum_{E \in \mathrm{Tree}_H(n)} \mathbb{1}[\{a, b\} \in E] \cdot \lambda_E \leq p_{ab}, \tag{6.20} \]
where in the last transition we used the second condition of the conclusion of Lemma 5.2.

Now we pick an arbitrary vertex $v^* \in V$ and let $T$ be a directed rooted tree (with root $v^*$) on $V$ such that the edges of $T$ (when viewed as undirected edges) coincide with the edges of $H$. Given a positive integer $m$, let $\mathcal{P}_1(m)$ be the following process:

1. For each pair $(u, v) \in T$, independently draw $m$ samples from $\pi_{u,v} f$, and let $X_{(u,v)} \subseteq [n]^2$ be the set formed by the $m$ samples.
2. Output 1 if there exists a map $\varphi : T \to [n]^2$ such that $\varphi$ is compatible with some vector $y \in [n]^V$, and $\varphi(u, v) \in X_{(u,v)}$ for each $(u, v) \in T$. Otherwise, output 0.
By Lemma 6.11, if $n$ is sufficiently large and
\[ C = \frac{288 t^4}{\varepsilon}, \qquad m = \lceil C n^{(t-1)/t} \rceil, \tag{6.21} \]
there exists an $(\varepsilon/(2t))$-pruning $g$ of $f$ such that
\[ S'(\pi_{v^*} g, \lceil C \rceil) \leq_{(5/6,\, 1)} J^T_{v^*}(f, m). \]
By the definition of $J^T_{v^*}(f, m)$ (Definition 6.10), it follows that
\[ \mathbb{P}\big[\mathcal{P}_1(m) \text{ outputs } 1\big] \geq \mathbb{P}_{w \sim J^T_{v^*}(f, m)}\big[w \neq 0\big] \geq \mathbb{P}_{w \sim S'(\pi_{v^*} g, \lceil C \rceil)}\big[w \neq 0\big] - \frac{1}{6} \geq \frac{2}{3}, \tag{6.22} \]
where we used the fact that $\sum_{a=1}^n \pi_{v^*} g(a) \geq -\varepsilon/(2t) + \sum_{y \in [n]^V} f(y) \geq \varepsilon/(2t)$ in the last transition.

Now consider the following process denoted by $\mathcal{P}_2(m)$:

1. For each pair $(u, v) \in T$, independently draw $m$ samples from the sub-probability vector $p$, and let $Y_{(u,v)} \subseteq \binom{[n]}{2}$ be the set formed by the $m$ samples.
2. Output 1 if there exists a map $\varphi : T \to \binom{[n]}{2}$ such that $\varphi(u, v) \in Y_{(u,v)}$ for each $(u, v) \in T$ and $\{\varphi(u, v) \mid (u, v) \in T\} \in \mathrm{Tree}_H(n)$. Otherwise, output 0.

Due to (6.20), there is an obvious coupling between the processes $\mathcal{P}_1(m)$ and $\mathcal{P}_2(m)$ under which the output of the latter process is always at least the output of the former. By (6.22), this means that $\mathcal{P}_2(m)$ outputs 1 with probability at least $2/3$ if $m$ is chosen as in (6.21). On the other hand, note that $\mathcal{P}_2(m)$ takes a total of $tm$ independent samples from $p$, and whenever it outputs 1, there are $t$ edges among the $tm$ samples that form a subgraph isomorphic to $H$. Therefore, we conclude that when $n$ is sufficiently large, in
\[ tm = t \cdot \Big\lceil \frac{288 t^4}{\varepsilon}\, n^{(t-1)/t} \Big\rceil = O\big(n^{(t-1)/t}/\varepsilon\big) \]
samples from $p$, with probability at least $2/3$ there are $t$ sampled edges forming a subgraph isomorphic to $H$.

Corollary 6.14. For any fixed tree $H$ with $t$ edges, we have $\mathrm{sam}\big(\mathcal{G}^{H\text{-free}}_n, \varepsilon\big) \leq O(n^{(t-1)/t}/\varepsilon)$.

Proof. Theorem 6.13 implies Corollary 6.14 in the same way as Theorem 5.14 implies Corollary 5.15. We refer to the proof of Corollary 5.15 for an outline of the argument.
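Theorem 6.13 suggests a natural one-sided-error tester: draw edge samples and reject exactly when some $t$ of them form a copy of $H$. The Python sketch below illustrates only that check, under stated assumptions. The names `contains_tree_copy` and `canonical_tester`, the encoding of $H$ as a list of (parent, child) pairs of a rooted orientation, and the sampler callback `draw_edge` are hypothetical choices made for this illustration and are not taken from the paper. The search is plain backtracking, which is adequate only for a small fixed tree.

```python
from collections import defaultdict


def contains_tree_copy(sampled_edges, tree_edges):
    """Return True if the sampled edges contain a subgraph isomorphic to the tree.

    tree_edges lists the tree as (parent, child) pairs, ordered so that every
    parent is either the root (the parent of the first pair) or an earlier child.
    """
    adj = defaultdict(set)
    for a, b in sampled_edges:
        adj[a].add(b)
        adj[b].add(a)
    root = tree_edges[0][0]

    def extend(i, image):
        # image maps the tree vertices placed so far to distinct sampled vertices.
        if i == len(tree_edges):
            return True
        parent, child = tree_edges[i]
        for x in adj[image[parent]]:
            if x not in image.values():
                image[child] = x
                if extend(i + 1, image):
                    return True
                del image[child]
        return False

    # Try every sampled vertex as the image of the root.
    return any(extend(0, {root: start}) for start in list(adj))


def canonical_tester(draw_edge, m, tree_edges):
    """Accept (return True) iff m samples contain no copy of the tree."""
    samples = [draw_edge() for _ in range(m)]
    return not contains_tree_copy(samples, tree_edges)


# Assumed example: H is the path with three edges, rooted at one endpoint.
path = [(0, 1), (1, 2), (2, 3)]
found_in_path = contains_tree_copy([(1, 2), (2, 3), (3, 4)], path)
found_in_triangle = contains_tree_copy([(1, 2), (2, 3), (1, 3)], path)
found_in_star = contains_tree_copy([(1, 2), (1, 3), (1, 4)], path)
```

Because the tester rejects only when it has seen an actual copy of $H$, it never rejects an $H$-free edge distribution; the sample size `m` is left as a parameter here, and Theorem 6.13 indicates that on the order of $n^{(t-1)/t}/\varepsilon$ samples suffice.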
Perhaps somewhat surprisingly, the proof of the upper bound for testing cliques follows the same route as the proof of Theorem 6.13. The reason is that in any violation hypergraph against the property $\mathcal{G}^{\mathrm{cliq}}_n$ (see Definition 2.1 for the definition of violation hypergraphs), all hyperedges correspond to length-3 paths in the $n$-vertex complete graph (in particular, violation hypergraphs against $\mathcal{G}^{\mathrm{cliq}}_n$ are always 3-uniform). Since the length-3 path is a tree, the birthday-paradox tools (specifically, Lemma 6.11) we have developed for analyzing the tree-freeness tester are also well-suited for analyzing the clique tester.

Theorem 6.15. We have $\mathrm{sam}\big(\mathcal{G}^{\mathrm{cliq}}_n, \varepsilon\big) \leq O(n^{2/3}/\varepsilon)$.

Proof. It is easy to see that for any $E \subseteq \binom{[n]}{2}$, the minimal $E$-violations (recall Definition 2.1) of $\mathcal{G}^{\mathrm{cliq}}_n$ are exactly the three-edge sets $\{\{a, b\}, \{b, c\}, \{c, d\}\} \subseteq \binom{[n]}{2}$ such that $\{a, b\}, \{c, d\} \in E$ and $\{b, c\} \notin E$. (Note that here $a$ and $d$ are not necessarily distinct.) We refer to such three-edge sets as $E$-alternating paths.

By the discussion in Section 2.1, it suffices to show the following for any fixed $E \subseteq \binom{[n]}{2}$: if $\mu$ is a distribution over $\binom{[n]}{2}$ such that $\mu(E \,\triangle\, E') \geq \varepsilon$ for any $E' \in \mathcal{G}^{\mathrm{cliq}}_n$, then in $O(n^{2/3}/\varepsilon)$ independent samples from $\mu$, with probability at least $2/3$ there are three sampled edges forming an $E$-alternating path.

We apply Lemma 5.2 to the violation hypergraph of $E$ against $\mathcal{G}^{\mathrm{cliq}}_n$. This yields a sub-probability vector $\lambda = (\lambda_P)$, where $P$ ranges over all $E$-alternating paths, that satisfies the three conditions in Lemma 5.2. (As in the proof of Theorem 6.13, we only need the second and third conditions.) For each $E$-alternating path $P = \{\{a, b\}, \{b, c\}, \{c, d\}\}$, define a vector $y^{(P)} \in [n]^4$ by letting
\[ y^{(P)}_1 = a, \quad y^{(P)}_2 = b, \quad y^{(P)}_3 = c, \quad \text{and} \quad y^{(P)}_4 = d. \]
(Here one can order the four vertices either as $a, b, c, d$ or as $d, c, b, a$.) We define a sub-probability mass function $f : [n]^4 \to [0,1]$ by
\[ f(y) = \begin{cases} \lambda_P, & \text{if } y = y^{(P)} \text{ for some } E\text{-alternating path } P, \\ 0, & \text{otherwise.} \end{cases} \]
We thus have
\[ \sum_{y \in [n]^4} f(y) = \sum_{E\text{-alternating paths } P} \lambda_P \geq \frac{\varepsilon}{3}, \]
where in the last transition we used the third condition of the conclusion of Lemma 5.2. Furthermore, for any edge $\{a, b\} \in \binom{[n]}{2}$ and any $j \in \{1, 2, 3\}$, we have
\[ \pi_{j,j+1} f(a, b) = \sum_{y \in [n]^4} \mathbb{1}[y_j = a \text{ and } y_{j+1} = b] \cdot f(y) \leq \sum_{E\text{-alternating paths } P} \mathbb{1}[\{a, b\} \in P] \cdot \lambda_P \leq \mu(\{a, b\}), \]
where in the last transition we used the second condition of the conclusion of Lemma 5.2. The rest of the proof is entirely analogous to the proof of Theorem 6.13 and is thus omitted. (The main idea is to apply Lemma 6.11 to the directed rooted tree $T = \{(1, 2), (2, 3), (3, 4)\}$ with root 4.)

7 Lower Bounds for Subgraph-Freeness

In this section, we prove the sample complexity lower bounds for testing triangle-freeness, square-freeness and tree-freeness, stated in (1.2), (1.3) and Theorem 1.8, respectively. As is the case with the upper bounds (see Section 6), we will also prove the lower bound for testing cliques (stated in Theorem 1.7) in Section 7.3, along with the lower bound for tree-freeness, because their proofs are similar to each other.

7.1 Triangle-Freeness Constructions

As discussed in Section 2.2.1, the lower bound for testing triangle-freeness is proved by combining the Ruzsa-Szemerédi construction (Proposition 2.7) with a standard technique that lifts lower bounds for one-sided-error testers to two-sided-error testers. The technique is reminiscent of that used in Section 4.2. Given an edge set $E \subseteq \binom{[n]}{2}$, we consider the two-fold blow-up of the graph $([n], E)$, in which each vertex $a \in [n]$ is replaced by a pair of copies.
For any two such pairs corresponding to vertices $a, b \in [n]$ with $\{a, b\} \in E$, the blow-up graph contains all four possible edges between the two pairs. The key idea is to retain exactly two of these four edges for each $\{a, b\} \in E$. The structure of the resulting graph can then vary in an interesting way, depending on how the two edges are selected in each case. We formalize this operation in the following definition.

Definition 7.1. For any edge set $E \subseteq \binom{[n]}{2}$ and any vector $y \in \mathbb{F}_2^E$, we define an edge set
\[ R_y(E) = \Big\{ \big\{(a, t), (b, y_{ab} + t)\big\} \;\Big|\; \{a, b\} \in E \text{ and } t \in \mathbb{F}_2 \Big\} \subseteq \binom{[n] \times \mathbb{F}_2}{2} \]
over the vertex set $[n] \times \mathbb{F}_2$.

Note that the vector $y \in \mathbb{F}_2^E$ specifies for each $\{a, b\} \in E$ how two of the four edges between the $a$-copies $(a, 0), (a, 1)$ and the $b$-copies $(b, 0), (b, 1)$ are selected. The main observation is that if every edge in $E$ is contained in exactly one triangle, then we can easily make $R_y(E)$ either triangle-free or far from triangle-free, by picking suitable vectors $y$ for each case.

Definition 7.2. Suppose $E \subseteq \binom{[n]}{2}$ is an edge set such that every edge in $E$ is contained in exactly one triangle. We define two collections of vectors $Y^{\mathrm{yes}}_\triangle(E)$ and $Y^{\mathrm{no}}_\triangle(E)$ by
\[ Y^{\mathrm{yes}}_\triangle(E) = \Big\{ y \in \mathbb{F}_2^E \;\Big|\; y_{ab} + y_{bc} + y_{ca} = 1 \text{ for all triangles } \{\{a, b\}, \{b, c\}, \{c, a\}\} \subseteq E \Big\} \]
and
\[ Y^{\mathrm{no}}_\triangle(E) = \Big\{ y \in \mathbb{F}_2^E \;\Big|\; y_{ab} + y_{bc} + y_{ca} = 0 \text{ for all triangles } \{\{a, b\}, \{b, c\}, \{c, a\}\} \subseteq E \Big\}. \]

Proposition 7.3. Suppose $E \subseteq \binom{[n]}{2}$ is an edge set such that every edge in $E$ is contained in exactly one triangle. We have

(1) For any $y \in Y^{\mathrm{yes}}_\triangle(E)$, the edge set $R_y(E)$ is triangle-free.
(2) For any $y \in Y^{\mathrm{no}}_\triangle(E)$, the edge set $R_y(E)$ is the edge-disjoint union of $2|E|/3$ triangles. Consequently, we have $|R_y(E) \setminus E'| \geq |R_y(E)|/3$ for any triangle-free edge set $E' \subseteq \binom{[n] \times \mathbb{F}_2}{2}$.

Proof. The second statement is obvious.
For the first statement, it suffices to note that for any $y \in \mathbb{F}_2^E$, any triangle in $R_y(E)$ must "project" to a triangle in $E$ under the canonical projection map from the vertex set $[n] \times \mathbb{F}_2$ to the vertex set $[n]$. Following the lifted edges around a triangle $\{\{a, b\}, \{b, c\}, \{c, a\}\} \subseteq E$ from $(a, t)$ leads to $(a, t + y_{ab} + y_{bc} + y_{ca})$, which differs from $(a, t)$ whenever $y \in Y^{\mathrm{yes}}_\triangle(E)$.

We next show that when $y$ is drawn at random from either $Y^{\mathrm{yes}}_\triangle(E)$ or $Y^{\mathrm{no}}_\triangle(E)$, it is impossible to distinguish the two cases if one is only given $o(|E|^{2/3})$ edge samples from $R_y(E)$.

Lemma 7.4. Fix an edge set $E \subseteq \binom{[n]}{2}$ such that every edge in $E$ is contained in exactly one triangle. Suppose there is a randomized map $A : \binom{[n] \times \mathbb{F}_2}{2}^m \to \{0, 1\}$ that satisfies the following.

(1) For a uniformly random $y \in Y^{\mathrm{yes}}_\triangle(E)$ and independent edge samples $e_1, \dots, e_m \in R_y(E)$, we have $\mathbb{P}[A(e_1, \dots, e_m) = 1] \geq 2/3$.
(2) For a uniformly random $y \in Y^{\mathrm{no}}_\triangle(E)$ and independent edge samples $e_1, \dots, e_m \in R_y(E)$, we have $\mathbb{P}[A(e_1, \dots, e_m) = 0] \geq 2/3$.

Then we must have $m \geq |E|^{2/3}$.

Proof. In the two assumptions on $A$ stated in the lemma, the input $(e_1, \dots, e_m)$ to $A$ follows two different distributions. It suffices to show that these two distributions over $\binom{[n] \times \mathbb{F}_2}{2}^m$, which we denote by $\mathcal{D}_{\mathrm{yes}}$ and $\mathcal{D}_{\mathrm{no}}$, respectively, have total variation distance less than $1/3$ if $m < |E|^{2/3}$.

Both $\mathcal{D}_{\mathrm{yes}}$ and $\mathcal{D}_{\mathrm{no}}$ can be alternatively generated by first sampling edges $\{u_1, v_1\}, \dots, \{u_m, v_m\}$ uniformly at random from $E$ and then letting $e_i = \{(u_i, t_i), (v_i, s_i)\}$ for some suitably chosen $s_i, t_i \in \mathbb{F}_2$ for all $i \in [m]$. Note that the first step (choosing the $u_i$'s and $v_i$'s) is identical for $\mathcal{D}_{\mathrm{yes}}$ and $\mathcal{D}_{\mathrm{no}}$, while the second step may be implemented differently for the two. Furthermore, if the collection $\{\{u_1, v_1\}, \dots, \{u_m, v_m\}\}$ sampled in the first step does not contain a triangle, the second step is also identical for $\mathcal{D}_{\mathrm{yes}}$ and $\mathcal{D}_{\mathrm{no}}$. Since $\{\{u_1, v_1\}, \dots, \{u_m, v_m\}\}$ contains a triangle with probability at most (by the union bound)
\[ \frac{|E|}{3} \cdot \frac{m^3}{|E|^3} = \frac{1}{3} m^3 |E|^{-2}, \]
we have
\[ \|\mathcal{D}_{\mathrm{yes}} - \mathcal{D}_{\mathrm{no}}\|_{\mathrm{TV}} \leq \frac{1}{3} m^3 |E|^{-2} < \frac{1}{3} \quad \text{if } m < |E|^{2/3}. \]

Corollary 7.5. We have $\mathrm{sam}\big(\mathcal{G}^{\mathrm{tri}}_{2n}, 1/3\big) \geq n^{4/3} \exp\big(-O(\sqrt{\log n})\big)$.

Proof. We use Proposition 2.7 to obtain an edge set $E \subseteq \binom{[n]}{2}$ in which every edge is contained in exactly one triangle, and $|E| = \mathrm{ex}_{=1}(n, C_3) = n^2 \exp\big(-O(\sqrt{\log n})\big)$. For any $y \in Y^{\mathrm{yes}}_\triangle(E)$, the graph $R_y(E)$ is triangle-free by Proposition 7.3 (1). On the other hand, it follows from Proposition 7.3 (2) that if we let $\mu_y$ denote the uniform distribution over $R_y(E)$ (considered as an edge set over $[2n]$), then
\[ \mu_y\big(R_y(E) \,\triangle\, E'\big) \geq \frac{1}{3} \]
for any $y \in Y^{\mathrm{no}}_\triangle(E)$ and any $E' \in \mathcal{G}^{\mathrm{tri}}_{2n}$. Therefore, any sample-based distribution-free tester for $\mathcal{G}^{\mathrm{tri}}_{2n}$ with proximity parameter $\varepsilon = 1/3$ and sample complexity $m$, when considered as a randomized map $A : \binom{[n] \times \mathbb{F}_2}{2}^m \to \{0, 1\}$, must satisfy the conditions of Lemma 7.4 and hence
\[ m \geq |E|^{2/3} = n^{4/3} \exp\big(-O(\sqrt{\log n})\big). \]

7.2 Square-Freeness Constructions

As in Section 7.1, it suffices to prove for any positive integer $n$ that
\[ \mathrm{sam}\big(\mathcal{G}^{\mathrm{squ}}_{2n}, 1/4\big) \geq \big(\mathrm{ex}_{=1}(n, C_4)\big)^{3/4}. \tag{7.1} \]
The desired lower bound $\mathrm{sam}\big(\mathcal{G}^{\mathrm{squ}}_{2n}, 1/4\big) \geq n^{9/8} \exp\big(-O(\sqrt{\log n})\big)$ then follows by plugging Proposition 2.9 into (7.1).

The proof of (7.1) is essentially the same as the corresponding proof for triangle-freeness in Section 7.1. In particular, for any edge set $E \subseteq \binom{[n]}{2}$ in which every edge is contained in exactly one square, we can define two collections of vectors $Y^{\mathrm{yes}}_\square(E), Y^{\mathrm{no}}_\square(E) \subseteq \mathbb{F}_2^E$ by requiring their members $y$ to satisfy $y_{ab} + y_{bc} + y_{cd} + y_{da} = 1$ (respectively, $= 0$) for all squares $\{\{a, b\}, \{b, c\}, \{c, d\}, \{d, a\}\} \subseteq E$, in line with Definition 7.2.
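The blow-up $R_y(E)$ of Definition 7.1, used for triangles in Section 7.1 and for squares here, can be checked by brute force on tiny instances. The Python sketch below is only an illustration under assumptions: the helper names `blow_up`, `count_triangles` and `count_squares`, and the two small edge sets, are hypothetical and chosen for this example. Elements of $\mathbb{F}_2$ are encoded as the integers 0 and 1.

```python
from itertools import combinations


def blow_up(edges, y):
    """Edge set R_y(E): {(a, t), (b, y_ab + t)} for each {a, b} in E and t in F_2."""
    lifted = set()
    for a, b in edges:
        for t in (0, 1):
            lifted.add(frozenset({(a, t), (b, (y[(a, b)] + t) % 2)}))
    return lifted


def count_triangles(edge_set):
    vertices = sorted({v for e in edge_set for v in e})
    has = lambda u, v: frozenset({u, v}) in edge_set
    return sum(1 for u, v, w in combinations(vertices, 3)
               if has(u, v) and has(v, w) and has(u, w))


def count_squares(edge_set):
    vertices = sorted({v for e in edge_set for v in e})
    has = lambda u, v: frozenset({u, v}) in edge_set
    total = 0
    for a, b, c, d in combinations(vertices, 4):
        # The three distinct 4-cycles on the vertex set {a, b, c, d}.
        for p, q, r, s in ((a, b, c, d), (a, b, d, c), (a, c, b, d)):
            if has(p, q) and has(q, r) and has(r, s) and has(s, p):
                total += 1
    return total


# Two triangles sharing vertex 0: every edge lies in exactly one triangle.
E_tri = [(0, 1), (1, 2), (0, 2), (0, 3), (3, 4), (0, 4)]
y_no = {e: 0 for e in E_tri}            # sum over each triangle is 0
y_yes = dict(y_no)
y_yes[(0, 1)] = 1                       # sum over triangle {0, 1, 2} becomes 1
y_yes[(0, 3)] = 1                       # sum over triangle {0, 3, 4} becomes 1
tri_no = count_triangles(blow_up(E_tri, y_no))
tri_yes = count_triangles(blow_up(E_tri, y_yes))

# A single 4-cycle: every edge lies in exactly one square.
E_sq = [(0, 1), (1, 2), (2, 3), (0, 3)]
z_no = {e: 0 for e in E_sq}             # sum over the square is 0
z_yes = dict(z_no)
z_yes[(0, 1)] = 1                       # sum over the square becomes 1
sq_no = count_squares(blow_up(E_sq, z_no))
sq_yes = count_squares(blow_up(E_sq, z_yes))
```

In these toy instances the "no" vectors produce two lifted copies of every triangle or square of $E$, while the "yes" vectors merge the two lifts into a single longer cycle, which is the dichotomy that Proposition 7.3 describes for triangles.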
The important observation is that for any $y \in \mathbb{F}_2^E$ and any edge set $E \subseteq \binom{[n]}{2}$, any square in $R_y(E)$ must "project" to a square in $E$ under the canonical projection map $[n] \times \mathbb{F}_2 \to [n]$. (Note that this argument would fail if we were considering the property of $C_6$-freeness, because a 6-cycle in $R_y(E)$ does not necessarily project to a 6-cycle in $E$: there may be repeated vertices after the projection.) The rest of the argument is entirely analogous to Section 7.1, and thus we omit the proof of (7.1).

In the rest of this subsection, we sketch the proof of Proposition 2.9 that is implicit in the paper by Timmons and Verstraëte [TV15]. As is the case with the proof of Proposition 2.7 by [RS78], the construction of graphs in which every edge is contained in exactly one square relies on additive combinatorics. While the Ruzsa-Szemerédi construction for $\mathrm{ex}_{=1}(n, C_3)$ is based on integer sets without 3-term arithmetic progressions, Timmons and Verstraëte [TV15] observed that one can similarly obtain constructions for $\mathrm{ex}_{=1}(n, C_4)$ using certain integer sets known as $k$-fold Sidon sets, which were first defined by Lazebnik and Verstraëte [LV03].

Definition 7.6. Let $c_1, \dots, c_r$ be nonzero integers such that $\sum_{i=1}^r c_i = 0$. Given an Abelian group $\Gamma$, a solution $(a_1, \dots, a_r) \in \Gamma^r$ to the equation
\[ c_1 x_1 + \dots + c_r x_r = 0 \]
is called a trivial solution if there exists a partition of $[r]$ into nonempty sets $T_1, \dots, T_m$ such that for every $i \in [m]$, we have $\sum_{j \in T_i} c_j = 0$ and $a_{j_1} = a_{j_2}$ whenever $j_1, j_2 \in T_i$.

Definition 7.7 ([LV03]). Let $k$ be a positive integer and let $\Gamma$ be an Abelian group. A subset $A \subseteq \Gamma$ is called a $k$-fold Sidon set if any solution $(a_1, \dots, a_4) \in A^4$ to any equation of the form
\[ c_1 x_1 + c_2 x_2 + c_3 x_3 + c_4 x_4 = 0, \]
where $c_1, \dots, c_4$ are integers such that $|c_i| \leq k$ for all $i \in [4]$ and $c_1 + c_2 + c_3 + c_4 = 0$, must be trivial.

Proposition 7.8 ([TV15, Theorem 7.1]). Suppose $n$ is a positive integer not divisible by 2 or 3, and $\Gamma$ is an Abelian group of order $n$. If $A \subseteq \Gamma$ is a 3-fold Sidon set, we have
\[ \mathrm{ex}_{=1}(4n, C_4) \geq 4n|A|. \]

Proof. We construct a graph with vertex set $\Gamma \times [4]$ where two vertices $(x, i), (y, j) \in \Gamma \times [4]$ are connected by an edge if and only if
\[ \{i, j\} \in \{\{1, 3\}, \{1, 4\}, \{2, 3\}, \{2, 4\}\} \quad \text{and} \quad y - x = (j - i)a \text{ for some } a \in A. \]
The number of edges in this graph is $4n|A|$. Furthermore, using the condition that $A$ is a 3-fold Sidon set, it is easy to see that every edge in this graph is contained in exactly one square.

In light of Proposition 7.8 and the prime number theorem for arithmetic progressions, to prove Proposition 2.9 it suffices to prove the following lemma:

Lemma 7.9. Suppose $p$ is a prime number such that $p \equiv \pm 5 \pmod{12}$. Then there is a 3-fold Sidon set $A \subseteq \mathbb{F}_p^2$ (here $\mathbb{F}_p^2$ is an Abelian group under addition) of cardinality at least $p \cdot \exp\big(-O(\sqrt{\log p})\big)$.

Proof Sketch. As pointed out in [CT14], this can be proved by adapting Ruzsa's proof of [Ruz93, Theorem 7.3]. For each $a \in \mathbb{F}_p$, let $f(a) = (a, a^2) \in \mathbb{F}_p^2$. For any nonzero integers $c_1, c_2, c_3, c_4 \in [-3, 3]$ such that $c_1 + c_2 + c_3 + c_4 = 0$, consider solutions $(a_1, a_2, a_3, a_4) \in \mathbb{F}_p^4$ to the equation
\[ c_1 f(x_1) + c_2 f(x_2) + c_3 f(x_3) + c_4 f(x_4) = 0. \tag{7.2} \]
A solution $(a_1, a_2, a_3, a_4)$ to (7.2) is said to be a trivial solution if $(f(a_1), f(a_2), f(a_3), f(a_4))$ is a trivial solution to the linear equation $c_1 x_1 + c_2 x_2 + c_3 x_3 + c_4 x_4 = 0$ (as per Definition 7.6). It now suffices to find a set $A \subseteq \mathbb{F}_p$ of cardinality at least $p \cdot \exp\big(-O(\sqrt{\log p})\big)$ such that any equation of the form (7.2) has only trivial solutions in $A$.
For each individual equation of the form
\[ x_1 + x_2 + x_3 = 3x_4, \quad \text{or} \tag{7.3} \]
\[ d_1 x_1 + d_2 x_2 = (d_1 + d_2) x_3, \quad \text{where } d_1, d_2 \in \{1, 2, \dots, 20\}, \tag{7.4} \]
by Behrend's construction [Beh46] there is a set $B \subseteq \mathbb{F}_p$ of cardinality at least $p \cdot \exp\big(-O(\sqrt{\log p})\big)$ in which it has no nontrivial solutions. By taking random translations of all these individual sets $B$ and intersecting them, one gets a (random) set $A \subseteq \mathbb{F}_p$ with (expected) size at least $p \cdot \exp\big(-O(\sqrt{\log p})\big)$ in which no equation of the form (7.3) or (7.4) has nontrivial solutions. We claim that in such sets $A$, equations of the form (7.2) also have no nontrivial solutions.

Case 1: if one of $c_1, c_2, c_3, c_4$ has a different sign from the other three, then since $c_1, c_2, c_3, c_4$ are integers in the range $[-3, 3]$, the equation $c_1 x_1 + c_2 x_2 + c_3 x_3 + c_4 x_4 = 0$ can only be of the form (7.3), which has no nontrivial solutions in $A$.

Case 2: if two of $c_1, c_2, c_3, c_4$ are positive and the other two are negative, without loss of generality assume $c_1, c_2 > 0$ and $c_3, c_4 < 0$. Using the condition that no equation of the form (7.4) has nontrivial solutions in $A$, it is easy to see that for any nontrivial solution $(a_1, a_2, a_3, a_4)$ to (7.2), the elements $a_1, a_2, a_3, a_4$ must be pairwise distinct. Furthermore, we have
\begin{align*}
c_1 c_2 (a_1 - a_2)^2 &= (c_1 a_1^2 + c_2 a_2^2)(c_1 + c_2) - (c_1 a_1 + c_2 a_2)^2 \\
&= (c_3 a_3^2 + c_4 a_4^2)(c_3 + c_4) - (c_3 a_3 + c_4 a_4)^2 = c_3 c_4 (a_3 - a_4)^2.
\end{align*}
This implies that $c_1 c_2 c_3 c_4$ must be a quadratic residue modulo $p$. Since 3 is not a quadratic residue modulo $p$ (due to the condition $p \equiv \pm 5 \pmod{12}$) and since $c_1 c_2 c_3 c_4 \in \{1, 4, 9, 12, 16, 36, 81\}$, it must be the case that $c_1 c_2 c_3 c_4$ is a perfect square.
Thus the quadratic equation $c_1 c_2 (a_1 - a_2)^2 = c_3 c_4 (a_3 - a_4)^2$ in the variables $a_1, a_2, a_3, a_4$ can be factorized into two linear equations. Combining either of the two linear equations with the condition that $c_1 a_1 + c_2 a_2 + c_3 a_3 + c_4 a_4 = 0$, one obtains a linear equation in the variables $a_1, a_2, a_3$. This three-variable equation either reduces to a two-variable equation, which would force two of $a_1, a_2, a_3$ to be equal, or has the form (7.4). We thus reach the conclusion that (7.2) has no nontrivial solutions in $A$.

7.3 Tree-Freeness Constructions

In this subsection, we prove the lower bound part of Theorems 1.7 and 1.8. We first prove the lower bound for testing tree-freeness.

Theorem 7.10. Let $H$ be a fixed tree with $t$ edges. Then there exists a constant $\varepsilon \in (0,1)$ such that $\mathrm{sam}\big(\mathcal{G}^{H\text{-free}}_n, \varepsilon\big) \geq \Omega(n^{(t-1)/t})$.

Proof. The case $t = 1$ is easy; we assume $t \geq 2$ in the following.

Construction. Suppose $H = (V, T)$ is a tree with $|T| = t$. We build two graphs $H^{(0)}$ and $H^{(1)}$ as follows:

1. Initialize $H^{(0)}, H^{(1)}$ to be empty graphs (with empty vertex sets).
2. For each subset $T' \subseteq T$, do the following:
   • If $|T| - |T'|$ is even, add a copy of the graph $(V, T')$ to $H^{(0)}$ (so that $H^{(0)}$ gets $|V| = t + 1$ new vertices and $|T'|$ new edges).
   • If $|T| - |T'|$ is odd, add a copy of the graph $(V, T')$ to $H^{(1)}$ (so that $H^{(1)}$ gets $|V| = t + 1$ new vertices and $|T'|$ new edges).

Since there are exactly $2^{t-1}$ subsets of $T$ with odd (or even) cardinality, both $H^{(0)}$ and $H^{(1)}$ have $2^{t-1}(t + 1)$ vertices. We denote $r = 2^{t-1}(t + 1)$. For each $j \in \{0, 1\}$ and positive integer $n$, let $\mathcal{H}^{(j)}_n$ be the output distribution of the following process:

1. Initialize $G$ to be a graph with the vertex set $[rn]$ and an empty edge set.
2. For each $i \in [n]$, do the following:
   • Pick a random bijection $\varphi$ from the set $\{(i - 1)r + 1, \dots, ir\}$ to the vertex set of $H^{(j)}$.
   • For each edge $\{u, v\}$ in $H^{(j)}$, add to $G$ an edge between $\varphi^{-1}(u)$ and $\varphi^{-1}(v)$.
3. Output $G$.

In words, a random graph $G \sim \mathcal{H}^{(j)}_n$ is the vertex-disjoint union of $n$ copies of $H^{(j)}$, with the vertices of each copy randomly permuted.

Finally, for each $j \in \{0, 1\}$ and positive integers $n, m$, let $\mathcal{D}^{(j)}_{n,m}$ be the output distribution of the following process:

1. Sample a graph $G \sim \mathcal{H}^{(j)}_n$.
2. Sample $m$ edges $e_1, \dots, e_m$ independently and uniformly from the edge set of $G$.
3. Output the sequence $(e_1, \dots, e_m)$.

A sequence $(e_1, \dots, e_m) \in \binom{[rn]}{2}^m$ sampled from $\mathcal{D}^{(j)}_{n,m}$ is said to be well-behaved if for each $i \in [n]$, there are at most $t - 1$ indices $k \in [m]$ such that both endpoints of $e_k$ fall in $\{(i - 1)r + 1, \dots, ir\}$. In other words, the edge sequence $(e_1, \dots, e_m)$ is well-behaved if no $t$ edges come from the same copy of $H^{(j)}$.

Analysis. For any edge $e \in T$, there are exactly $2^{t-2}$ copies of $e$ in both $H^{(0)}$ and $H^{(1)}$. Thus both $H^{(0)}$ and $H^{(1)}$ have $2^{t-2} t$ edges. The main observation is that, for any edge $e \in T$, if we remove all copies of $e$ from $H^{(0)}$ and $H^{(1)}$, the two graphs become isomorphic. From this observation, it is easy to see that the distributions $\mathcal{D}^{(0)}_{1,m}$ and $\mathcal{D}^{(1)}_{1,m}$ are identical if $m \leq t - 1$. Consequently, for any positive integers $n$ and $m$, a random well-behaved sample from $\mathcal{D}^{(0)}_{n,m}$ is indistinguishable from a random well-behaved sample from $\mathcal{D}^{(1)}_{n,m}$. For each $j \in \{0, 1\}$, a random sample $(e_1, \dots, e_m) \sim \mathcal{D}^{(j)}_{n,m}$ is well-behaved with probability at least (using the union bound)
\[ 1 - n \cdot \frac{m^t}{n^t} > \frac{2}{3} \quad \text{if } m < \frac{1}{3} n^{(t-1)/t}. \]
Therefore, we have
\[ \big\| \mathcal{D}^{(0)}_{n,m} - \mathcal{D}^{(1)}_{n,m} \big\|_{\mathrm{TV}} < \frac{1}{3} \quad \text{if } m < \frac{1}{3} n^{(t-1)/t}. \tag{7.5} \]

On the other hand, since $H^{(1)}$ is $H$-free, any graph $G$ in the support of the distribution $\mathcal{H}^{(1)}_n$ is $H$-free.
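The counting claims about $H^{(0)}$ and $H^{(1)}$ can be verified mechanically for a small tree. The Python sketch below is a minimal illustration under assumptions: the function name `build_copies`, the representation of each copy by the edge subset $T'$ it carries, and the three-edge star used as $H$ are hypothetical choices made for this example. Comparing the multisets of labeled copies after deleting $e$ is a sufficient (not necessary) check that the two graphs become isomorphic.

```python
from collections import Counter
from itertools import combinations


def build_copies(tree_edges):
    """Return (copies0, copies1): the edge subsets T' whose copies go to H^(0), H^(1)."""
    t = len(tree_edges)
    copies = ([], [])
    for size in range(t + 1):
        for subset in combinations(tree_edges, size):
            # |T| - |T'| even -> H^(0); odd -> H^(1).
            copies[(t - size) % 2].append(frozenset(subset))
    return copies


tree = [(0, 1), (1, 2), (1, 3)]  # hypothetical example: a star with t = 3 edges
t = len(tree)
H0, H1 = build_copies(tree)

# Each graph consists of 2^(t-1) copies and has 2^(t-2) * t edges in total.
num_copies = (len(H0), len(H1))
num_edges = (sum(len(c) for c in H0), sum(len(c) for c in H1))

# H^(0) contains the full tree; every component of H^(1) has fewer than t edges.
h0_has_full_tree = frozenset(tree) in H0
h1_max_component = max(len(c) for c in H1)

# Deleting all copies of any fixed edge e leaves the same multiset of copies.
same_after_removal = all(
    Counter(c - {e} for c in H0) == Counter(c - {e} for c in H1) for e in tree
)
```

The last check works because $T' \mapsto T' \,\triangle\, \{e\}$ is a bijection between the subsets placed in $H^{(0)}$ and those placed in $H^{(1)}$ that leaves $T' \setminus \{e\}$ unchanged.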
Since $H^{(0)}$ contains a copy of $H$, for any graph $G$ in the support of $\mathcal{H}^{(0)}_n$, at least $n$ edges must be removed from $G$ to make it $H$-free; in other words, the uniform distribution over the edge set of $G$ is $\varepsilon$-far from $H$-free, where $\varepsilon = 2^{-(t-2)} t^{-1}$. Therefore, any sample-based distribution-free tester for $\mathcal{G}^{H\text{-free}}_{rn}$ with proximity parameter $\varepsilon = 2^{-(t-2)} t^{-1}$ and sample complexity $m$ must distinguish $\mathcal{D}^{(0)}_{n,m}$ from $\mathcal{D}^{(1)}_{n,m}$ with probability at least $2/3$. By (7.5), this requires $m \geq n^{(t-1)/t}/3$. We thus conclude that
\[ \mathrm{sam}\big(\mathcal{G}^{H\text{-free}}_{rn},\, 2^{-(t-2)} t^{-1}\big) \geq \frac{1}{3} n^{(t-1)/t}. \]

We next prove the lower bound for testing cliques, using the techniques in the proof of Theorem 7.10.

Theorem 7.11. We have $\mathrm{sam}\big(\mathcal{G}^{\mathrm{cliq}}_{6n}, 1/4\big) \geq n^{2/3}/3$.

Proof. Define two edge sets $E^{(0)}, E^{(1)} \subseteq \binom{[6]}{2}$ as follows:
\[ E^{(0)} = \big\{\{1, 2\}, \{2, 3\}, \{3, 4\}, \{5, 6\}\big\} \quad \text{and} \quad E^{(1)} = \big\{\{1, 2\}, \{2, 3\}, \{4, 5\}, \{5, 6\}\big\}. \]
Let $\mathcal{D}^{\mathrm{no}}_n$ be the output distribution of the following process:

1. For each $i \in [n]$, pick a random bijection $\varphi_i : \{6i - 5, \dots, 6i\} \to \{1, 2, \dots, 6\}$.
2. Define a function $f : \binom{[6n]}{2} \to \{0, 1\}$ as follows: for any $\{a, b\} \in \binom{[6n]}{2}$, let $f(\{a, b\}) = 1$ if and only if $\{a, b\} = \varphi_i^{-1}(\{1, 2\})$ or $\{a, b\} = \varphi_i^{-1}(\{3, 4\})$ for some $i \in [n]$.
3. Let $\mu$ be the uniform distribution over
\[ \bigcup_{i \in [n]} \big\{\varphi_i^{-1}(\{1, 2\}),\, \varphi_i^{-1}(\{2, 3\}),\, \varphi_i^{-1}(\{3, 4\}),\, \varphi_i^{-1}(\{5, 6\})\big\} \subseteq \binom{[6n]}{2}. \]
4. Output the pair $(f, \mu)$.

Let $\mathcal{D}^{\mathrm{yes}}_n$ be the output distribution of the following process:

1. For each $i \in [n]$, pick a random bijection $\varphi_i : \{6i - 5, \dots, 6i\} \to \{1, 2, \dots, 6\}$.
2. Define a function $f : \binom{[6n]}{2} \to \{0, 1\}$ as follows: for any $\{a, b\} \in \binom{[6n]}{2}$, let $f(\{a, b\}) = 1$ if and only if both $a$ and $b$ belong to the vertex set $\bigcup_{i \in [n]} \varphi_i^{-1}(\{1, 2, 4, 5\}) \subseteq [6n]$.
3. Let $\mu$ be the uniform distribution over
\[ \bigcup_{i \in [n]} \big\{\varphi_i^{-1}(\{1, 2\}),\, \varphi_i^{-1}(\{2, 3\}),\, \varphi_i^{-1}(\{4, 5\}),\, \varphi_i^{-1}(\{5, 6\})\big\} \subseteq \binom{[6n]}{2}. \]
4. Output the pair $(f, \mu)$.

For any pair $(f, \mu)$ in the support of $\mathcal{D}^{\mathrm{yes}}_n$, the graph $\big([6n], f^{-1}(1)\big)$ is a clique (on $4n$ vertices) and thus $f \in \mathcal{G}^{\mathrm{cliq}}_{6n}$. On the other hand, it is easy to see that for any pair $(f, \mu)$ in the support of $\mathcal{D}^{\mathrm{no}}_n$, we have
\[ \mathbb{P}_{\{a, b\} \sim \mu}\big[f(\{a, b\}) \neq g(\{a, b\})\big] \geq \frac{1}{4} \quad \text{for any } g \in \mathcal{G}^{\mathrm{cliq}}_{6n}. \]
However, using a birthday-paradox argument similar to the proof of Theorem 7.10, one can show that in order to distinguish the no case $(f, \mu) \sim \mathcal{D}^{\mathrm{no}}_n$ from the yes case $(f, \mu) \sim \mathcal{D}^{\mathrm{yes}}_n$ with probability at least $2/3$, the number of $f$-labeled samples taken from $\mu$ must be at least $n^{2/3}/3$. We can thus conclude that $\mathrm{sam}\big(\mathcal{G}^{\mathrm{cliq}}_{6n}, 1/4\big) \geq n^{2/3}/3$.

8 Open Problems

Let $\mathcal{H}$ be a nonempty family of Boolean-valued functions on a finite domain $\Lambda$. We use $\mathrm{VC}(\mathcal{H})$ to denote the VC-dimension of $\mathcal{H}$. A fundamental result in learning theory (see e.g. [SB14]) is that for any constant $\varepsilon \in (0, 10^{-2})$ the number of $f$-labeled samples needed for PAC-learning a function $f \in \mathcal{H}$ up to error $\varepsilon$ is $\Theta(\mathrm{VC}(\mathcal{H}))$. (Even if queries are allowed (see Remark 1), the query complexity of PAC-learning is still $\Theta(\mathrm{VC}(\mathcal{H}))$ [Tur93].) It was shown in [GGR98, Proposition 3.1.1] that (sample-based distribution-free) testing cannot be harder than learning: we have
\[ \mathrm{sam}(\mathcal{H}, \varepsilon) = O_\varepsilon(\mathrm{VC}(\mathcal{H})) \quad \text{for any constant } \varepsilon \in (0, 1). \]
A natural question is: for which families $\mathcal{H}$ is distribution-free testing much easier than PAC-learning?

For most of the well-studied function families $\mathcal{H}_n$ (indexed by a parameter $n$ growing to infinity), such as linear threshold functions, conjunctions and decision lists on the hypercube $\{0, 1\}^n$, there exists some constant $\varepsilon \in (0, 1)$ such that $\mathrm{sam}(\mathcal{H}_n, \varepsilon) = \widetilde{\Omega}(\mathrm{VC}(\mathcal{H}_n))$ (see [BFH21] and [CFP24, Section 8]). Blais, Ferreira Pinto Jr. and Harms [BFH21, Section 7] also gave two examples of natural function families $\mathcal{H}_n$ for which there exists $c \in (0, 1)$ such that
\[ \mathrm{sam}(\mathcal{H}_n, \varepsilon) = O_\varepsilon\big(\mathrm{VC}(\mathcal{H}_n)^{1-c}\big) \quad \text{for any constant } \varepsilon \in (0, 1). \tag{8.1} \]
Interestingly, in both of the examples given by [BFH21], the reason that testing can be more efficient than learning seems to be the birthday paradox.

Note that for the subgraph-freeness property $\mathcal{G}^{H\text{-free}}_n$ defined in the statement of Theorem 1.8, we have $\mathrm{VC}(\mathcal{G}^{H\text{-free}}_n) = \mathrm{ex}(n, H)$. Theorems 1.5, 1.8 and 2.5 imply that (8.1) holds also for the subgraph-freeness property $\mathcal{H}_n = \mathcal{G}^{H\text{-free}}_n$ if $H$ is a square, a tree with at least 2 edges (it is well-known that $\mathrm{ex}(n, H) = \Theta(n)$ for any tree $H$ with at least 2 edges), or a non-bipartite graph (for non-bipartite graphs $H$ we easily have $\mathrm{ex}(n, H) = \Omega(n^2)$). Furthermore, our proofs seem to suggest that the reason we have (8.1) is again (variants of) the birthday paradox. This motivates the following conjecture:

Conjecture 8.1. For any connected simple graph $H$ with at least 2 edges, there exists a constant $c \in (0, 1)$ such that
\[ \mathrm{sam}\big(\mathcal{G}^{H\text{-free}}_n, \varepsilon\big) = O_\varepsilon\big(\mathrm{ex}(n, H)^{1-c}\big) \quad \text{for any constant } \varepsilon \in (0, 1). \]

As discussed in Section 2.1, if we restrict to sample-based testers with one-sided error, the tester must essentially be the "canonical" one. For two-sided-error testers, it is slightly less clear what the best algorithm is. Can there be a better two-sided-error tester for some properties?

Problem 8.2. Does there exist a connected simple graph $H$ such that testing $H$-freeness of edge distributions is much easier for two-sided-error testers than for one-sided-error testers, in terms of sample complexity?

Another well-studied class of graph properties is the (homogeneous) partition properties [FR21].
Given a symmetric 0/1-matrix $A \in \{0,1\}^{k \times k}$ and a graph $G = ([n], E)$, we say that $G$ has the property $\mathcal{G}^{A\text{-part}}_n$ if there is a partition of the vertex set $\phi : [n] \to [k]$ such that for any $\{a,b\} \in \binom{[n]}{2}$, we have $\{a,b\} \in E$ if and only if $A_{\phi(a),\phi(b)} = 1$. We pose the following question:

Problem 8.3. Determine the sample complexity $\mathsf{sam}\left(\mathcal{G}^{A\text{-part}}_n, \varepsilon\right)$ asymptotically in $n$ for any fixed symmetric 0/1-matrix $A$.

Note that the clique property $\mathcal{G}^{\mathrm{cliq}}_n$ studied in Theorem 1.7 coincides with $\mathcal{G}^{A\text{-part}}_n$ for $A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$.

The power of "query access" in edge-distribution-free property testing has been left unexplored by this work. We pose the following questions:

Problem 8.4. If edge-query is allowed as in Remark 1, can triangle-freeness be tested in $n^{4/3 - \Omega(1)}$ queries? Can bipartiteness be tested in $n^{1 - \Omega(1)}$ queries?

Problem 8.5. If edge-query is allowed as in Remark 1, what is the query complexity of testing threshold graphs (see Section 1.4 for the motivation)?

Acknowledgements

The author would like to thank Ronitt Rubinfeld and Asaf Shapira for many stimulating discussions during the development of this work, especially for bringing the papers [AKKR08] and [TV15] to his attention.

References

[AKKR08] Noga Alon, Tali Kaufman, Michael Krivelevich, and Dana Ron. Testing triangle-freeness in general graphs. SIAM Journal on Discrete Mathematics, 22(2):786–819, 2008.
[Beh46] Felix A Behrend. On sets of integers which contain no three terms in arithmetical progression. Proceedings of the National Academy of Sciences, 32(12):331–332, 1946.
[BFH21] Eric Blais, Renato Ferreira Pinto Jr, and Nathaniel Harms. VC dimension and distribution-free sample-based testing. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, pages 504–517, 2021.
[BFR+00] Tugkan Batu, Lance Fortnow, Ronitt Rubinfeld, Warren D Smith, and Patrick White.
Testing that distributions are close. In Proceedings 41st Annual Symposium on Foundations of Computer Science, pages 259–269. IEEE, 2000.
[BKR04] Tugkan Batu, Ravi Kumar, and Ronitt Rubinfeld. Sublinear algorithms for testing monotone and unimodal distributions. In Proceedings of the Thirty-Sixth Annual ACM Symposium on Theory of Computing, pages 381–390, 2004.
[Bro66] William G Brown. On graphs that do not contain a Thomsen graph. Canadian Mathematical Bulletin, 9(3):281–285, 1966.
[Can22] Clément L Canonne. Topics and techniques in distribution testing: A biased but representative sample. Foundations and Trends® in Communications and Information Theory, 19(6):1032–1198, 2022.
[CF13] David Conlon and Jacob Fox. Graph removal lemmas. Surveys in Combinatorics, 409:1–49, 2013.
[CFP24] Xi Chen, Yumou Fei, and Shyamal Patel. Distribution-free testing of decision lists with a sublinear number of queries. In Proceedings of the 56th Annual ACM Symposium on Theory of Computing, pages 1051–1062, 2024.
[CP22] Xi Chen and Shyamal Patel. Distribution-free testing for halfspaces (almost) requires PAC learning. In Proceedings of the 2022 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1715–1743. SIAM, 2022.
[CT14] Javier Cilleruelo and Craig Timmons. k-fold Sidon sets. The Electronic Journal of Combinatorics, pages P4–12, 2014.
[CX16] Xi Chen and Jinyu Xie. Tight bounds for the distribution-free testing of monotone conjunctions. In Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, pages 54–71. SIAM, 2016.
[DR11] Elya Dolev and Dana Ron. Distribution-free testing for monomials with a sublinear number of queries. Theory of Computing, 7(1):155–176, 2011.
[ERTS66] Pál Erdős, Alfréd Rényi, and Vera T Sós. On a problem of graph theory. Studia Scientiarum Mathematicarum Hungarica, 1:215–235, 1966.
[FH25] Renato Ferreira Pinto Jr and Nathaniel Harms. Testing support size more efficiently than learning histograms. In Proceedings of the 57th Annual ACM Symposium on Theory of Computing, pages 995–1006, 2025.
[FLN+02] Eldar Fischer, Eric Lehman, Ilan Newman, Sofya Raskhodnikova, Ronitt Rubinfeld, and Alex Samorodnitsky. Monotonicity testing over general poset domains. In Proceedings of the Thirty-Fourth Annual ACM Symposium on Theory of Computing, pages 474–483, 2002.
[FR21] Nimrod Fiat and Dana Ron. On efficient distance approximation for graph properties. In Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1618–1637. SIAM, 2021.
[GGR98] Oded Goldreich, Shafi Goldwasser, and Dana Ron. Property testing and its connection to learning and approximation. Journal of the ACM (JACM), 45(4):653–750, 1998.
[Gol19] Oded Goldreich. Testing graphs in vertex-distribution-free models. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pages 527–534, 2019.
[GR16] Oded Goldreich and Dana Ron. On sample-based testers. ACM Transactions on Computation Theory (TOCT), 8(2):1–54, 2016.
[GS09] Dana Glasner and Rocco A Servedio. Distribution-free testing lower bound for basic Boolean functions. Theory of Computing, 5(1):191–216, 2009.
[GS19] Lior Gishboliner and Asaf Shapira. Testing graphs against an unknown distribution. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pages 535–546, 2019.
[HK08] Shirley Halevy and Eyal Kushilevitz. Distribution-free connectivity testing for sparse graphs. Algorithmica, 51(1):24–48, 2008.
[KST54] P Kővári, Vera T Sós, and Pál Turán. On a problem of Zarankiewicz. In Colloquium Mathematicum, volume 3, pages 50–57. Polska Akademia Nauk, 1954.
[LV03] Felix Lazebnik and Jacques Verstraëte. On hypergraphs of girth five.
The Electronic Journal of Combinatorics, pages R25–R25, 2003.
[RS78] Imre Z Ruzsa and Endre Szemerédi. Triple systems with no six points carrying three triangles. Combinatorics (Keszthely, 1976), Coll. Math. Soc. J. Bolyai, 18(939-945):2, 1978.
[Ruz93] Imre Z Ruzsa. Solving a linear equation in a set of integers I. Acta Arithmetica, 65(3):259–282, 1993.
[SB14] Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
[Sha22] Asaf Shapira. Local-vs-global combinatorics. In Proceedings of the International Congress of Mathematicians, volume 6, pages 4682–4708, 2022.
[Sol11] J Solymosi. C4 removal lemma for sparse graphs in: Open problem session, Mathematisches Forschungsinstitut Oberwolfach. Technical report, Report, 2011.
[sub] List of open problems in sublinear algorithms: Problem 99. https://sublinear.info/99.
[Tur93] György Turán. Lower bounds for PAC learning with queries. In Proceedings of the Sixth Annual Conference on Computational Learning Theory, pages 384–391, 1993.
[TV15] Craig Timmons and Jacques Verstraëte. A counterexample to sparse removal. European Journal of Combinatorics, 44:77–86, 2015.
[Ver16] Jacques Verstraëte. Extremal problems for cycles in graphs. In Recent Trends in Combinatorics, pages 83–116. Springer, 2016.
[VV17] Gregory Valiant and Paul Valiant. Estimating the unseen: improved estimators for entropy and other properties. Journal of the ACM (JACM), 64(6):1–41, 2017.

A Proof of Proposition 1.4

Proof of Proposition 1.4. To obtain the first inequality, note that a sample $x$ from $\mu$ is equivalent to an $f$-labeled sample $(x, f(x))$ with $x \sim \mu$, if $f$ is the indicator function of $\mathrm{supp}(\mu)$. This obviously gives a reduction from the distribution testing problem in Definition 1.2 to the function testing problem in Definition 1.3.
To obtain the second inequality, consider an algorithm $\mathcal{A}$ testing whether $\mathrm{supp}(\mu) \in \mathcal{H}$ using $m := \mathsf{dsam}(\mathcal{H}, \varepsilon)$ samples from any $\mu$. To test whether $f^{-1}(1) \in \mathcal{H}$ with respect to $\mu$ in the sense of Definition 1.3, we run the following procedure:

1. Take samples $\left(x^{(i)}, f(x^{(i)})\right)$ for $1 \leqslant i \leqslant m' = \lceil 18m/\varepsilon \rceil$, where each $x^{(i)}$ is drawn from $\mu$.
2. If the number of $i \in [m']$ such that $f(x^{(i)}) \neq 0$ is at most $9m \leqslant \varepsilon m'/2$, we accept.
3. Otherwise, we take the first $9m$ samples $x^{(i)}$ such that $f(x^{(i)}) \neq 0$ and group them into 9 batches of size $m$. These are 9 batches of independent samples from $\mu$ conditioned on the set $f^{-1}(1)$. We then run $\mathcal{A}$ on these 9 batches and take the majority vote to test whether the support of this conditional distribution (denoted by $\mu'$) belongs to $\mathcal{H}$.

Completeness of the reduction: If $f \in \mathcal{H}$, then since $\mathcal{H}$ is downward-closed we have $\mathrm{supp}(\mu') \in \mathcal{H}$, and thus step 3 accepts with probability at least $2/3$.

Soundness of the reduction: If $f$ is $\varepsilon$-far from $\mathcal{H}$ with respect to $\mu$, then since the identically-zero function belongs to $\mathcal{H}$ we have $\mathbb{P}_{x \sim \mu}[f(x) \neq 0] \geqslant \varepsilon$, and thus step 2 passes with probability at most $1/9$. We claim that $\|\mu' - \nu\|_{\mathrm{TV}} \geqslant \varepsilon$ for any distribution $\nu$ over $\Lambda$ such that $\mathrm{supp}(\nu) \in \mathcal{H}$; in that case, step 3 passes with probability at most $1/6$ and the overall rejection probability of our procedure is at least $1 - 1/9 - 1/6 = 13/18 \geqslant 2/3$.

Suppose that $\|\mu' - \nu\|_{\mathrm{TV}} < \varepsilon$ for some $\nu$ with $\mathrm{supp}(\nu) \in \mathcal{H}$. Since $\mathrm{supp}(\mu') \subseteq f^{-1}(1)$ and $\mathcal{H}$ is downward-closed, we can assume $\mathrm{supp}(\nu) \subseteq f^{-1}(1)$. Let $g$ be the indicator function of $\mathrm{supp}(\nu)$. Then
$$\mathbb{P}_{x \sim \mu}[f(x) \neq g(x)] \leqslant \mathbb{P}_{x \sim \mu'}[f(x) \neq g(x)] = \mathbb{P}_{x \sim \mu'}[x \notin \mathrm{supp}(\nu)] \leqslant \|\mu' - \nu\|_{\mathrm{TV}} < \varepsilon,$$
contradicting the assumption that $f$ is $\varepsilon$-far from $\mathcal{H}$ with respect to $\mu$.
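The three-step procedure in the proof above can be sketched in code. The following is a minimal illustration only, under stated assumptions: the names `reduce_to_support_test`, `sample_mu`, `f` and `support_tester` are hypothetical and do not appear in the paper. The callable `support_tester` stands in for the algorithm $\mathcal{A}$, is assumed to take a batch of $m$ samples, and is assumed to return `True` when it accepts. The majority vote is taken as at least 5 of the 9 runs.

```python
import math
from typing import Callable, Hashable, List, Sequence


def reduce_to_support_test(
    sample_mu: Callable[[], Hashable],
    f: Callable[[Hashable], int],
    support_tester: Callable[[Sequence[Hashable]], bool],
    m: int,
    eps: float,
) -> bool:
    """Sketch of the reduction: return True to accept, False to reject.

    sample_mu      -- draws one sample x from mu (hypothetical interface)
    f              -- the 0/1-valued function being tested
    support_tester -- stand-in for algorithm A, run on a batch of m samples
    m              -- number of samples that A needs
    eps            -- proximity parameter, assumed to lie in (0, 1)
    """
    if m < 1 or not (0.0 < eps < 1.0):
        raise ValueError("need m >= 1 and 0 < eps < 1")

    # Step 1: draw m' = ceil(18 m / eps) labeled samples (x, f(x)).
    m_prime = math.ceil(18 * m / eps)
    labeled = [(x, f(x)) for x in (sample_mu() for _ in range(m_prime))]

    # Step 2: accept if at most 9m of the samples have a nonzero label.
    positives: List[Hashable] = [x for x, label in labeled if label != 0]
    if len(positives) <= 9 * m:
        return True

    # Step 3: split the first 9m nonzero-labeled samples into 9 batches of
    # size m, run the support tester on each, and take the majority vote.
    batches = [positives[j * m:(j + 1) * m] for j in range(9)]
    votes = sum(1 for batch in batches if support_tester(batch))
    return votes >= 5
```

As a usage example, one could pass a toy `support_tester` such as `lambda batch: len(set(batch)) <= 2`; this toy carries none of the guarantees required of $\mathcal{A}$ and only exercises the control flow of the three steps.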
