TV over Bernoulli products: the small parameter regime

Ariel Avital, Aryeh Kontorovich, and George Salafatinos

Abstract. We study the total variation distance (TV) between two $n$-fold Bernoulli product measures parametrized by $p = (p_1, \dots, p_n)$ and $q = (q_1, \dots, q_n)$, respectively, in the tiny and small regimes. In the tiny regime, we have $p_i, q_i \lesssim 1/n^2$, and in the small regime, $p_i, q_i \lesssim 1/n$. We discover that in the tiny regime, the TV distance behaves as $\|p - q\|_1$, while in the small regime, it behaves as
\[
\sum_{i=1}^{n} \Bigl| p_i \prod_{j \ne i} (1 - p_j) - q_i \prod_{j \ne i} (1 - q_j) \Bigr| ,
\]
both up to absolute constants. Along the way we discover some identities of possible independent interest.

2020 Mathematics Subject Classification. 60E05; 60C05.
Key words and phrases. total variation distance; Bernoulli product measures; small probabilities; Poisson–binomial.

1. Introduction

For $p, q \in [0,1]^n$, consider the Bernoulli product measures
\[
\mathrm{Ber}(p) := \mathrm{Ber}(p_1) \otimes \cdots \otimes \mathrm{Ber}(p_n), \qquad
\mathrm{Ber}(q) := \mathrm{Ber}(q_1) \otimes \cdots \otimes \mathrm{Ber}(q_n),
\]
on the Hamming cube $\{0,1\}^n$. The total variation distance
\[
\mathrm{TV}(\mathrm{Ber}(p), \mathrm{Ber}(q)) = \frac12 \sum_{x \in \{0,1\}^n} \bigl| \mathrm{Ber}(p)(x) - \mathrm{Ber}(q)(x) \bigr|
\]
is both fundamental and notoriously difficult to compute exactly. From the algorithmic perspective, computing $\mathrm{TV}(\mathrm{Ber}(p), \mathrm{Ber}(q))$ exactly is #P-hard in general [1], motivating a series of efficient approximation schemes for product measures [2, 3, 4]. From the analytic perspective, TV is often bounded by more tractable divergences (KL, Hellinger, $\chi^2$, etc.), whose tensorization properties are well understood; see [5] for a recent discussion in the present setting.

This work continues the program initiated in [5, 6] of approximating the TV over product measures in terms of readily computable elementary functions; the forthcoming paper [7] further builds on the results we prove here. Our point of departure is Theorems 1.1 and 1.2 of [5], which show, respectively, that
\[
\mathrm{TV}(\mathrm{Ber}(p), \mathrm{Ber}(q)) \gtrsim \|p - q\|_2
\]
and
\[
\mathrm{TV}(\mathrm{Ber}(p), \mathrm{Ber}(q)) \le \|p - q\|_2 ,
\]
the latter under the additional assumption that $p, q$ are symmetric about $1/2$ (i.e., $p = 1 - q$). In the Appendix, we extend the argument to quasi-symmetric pairs, obtaining the upper bound with an extra factor $\sqrt{2}$. Taken together, these results indicate that in a neighborhood of $1/2$, TV behaves like $\ell_2$, up to universal constants.

The focus of the present note is complementary: we turn to the opposite extreme of the parameter space, namely $p_i, q_i$ very close to $0$ (and, by symmetry, very close to $1$), and identify regimes in which $\mathrm{TV}(\mathrm{Ber}(p), \mathrm{Ber}(q))$ admits a particularly simple description. A key tool is the slice decomposition: if $\Delta_k(p, q)$ denotes the total absolute discrepancy between the two measures on the $k$-th Hamming slice, then
\[
2\,\mathrm{TV}(\mathrm{Ber}(p), \mathrm{Ber}(q)) = \sum_{k=0}^{n} \Delta_k(p, q), \tag{1.1}
\]
so controlling TV reduces to understanding which slices dominate.

Main results. For a subset $S \subseteq [n]$ and a parameter vector $y \in [0,1]^n$, let $P_S(y)$ denote the mass assigned by $\mathrm{Ber}(y)$ to the atom with ones in $S$ (see (2.1) below), and define the slice discrepancies $\Delta_k(p, q)$ as in Definition 2.1. Our first theorem shows that in the tiny regime, the TV distance is equivalent to $\|p - q\|_1$ up to constants.

Theorem 1.1 (Tiny regime: $\ell_1$ geometry). For $p, q \in [0, 1/n^2]^n$,
\[
\tfrac14 \|p - q\|_1 \le \mathrm{TV}(\mathrm{Ber}(p), \mathrm{Ber}(q)) \le \|p - q\|_1 .
\]

Our second theorem shows that in the small regime, the entire $\ell_1$ distance between the product measures is controlled by the singleton slice.

Theorem 1.2 (Small regime: singletons control TV). For $p, q \in [0, 1/(2n)]^n$ with $n \ge 2$, write
\[
\Delta_1(p, q) = \sum_{i=1}^{n} \Bigl| p_i \prod_{j \ne i} (1 - p_j) - q_i \prod_{j \ne i} (1 - q_j) \Bigr| .
\]
Then
\[
\tfrac12 \Delta_1(p, q) \le \mathrm{TV}(\mathrm{Ber}(p), \mathrm{Ber}(q)) \le \Bigl(2 - \frac1n\Bigr) \Delta_1(p, q) .
\]

2. Definitions and notation

We write $A \lesssim B$ to mean $A \le C B$ for a universal constant $C > 0$, and $A \gtrsim B$ similarly; $A \asymp B$ means both $A \lesssim B$ and $B \lesssim A$. For integer $n \ge 1$, $[n] := \{1, 2, \dots, n\}$. For a parameter vector $y = (y_1, \dots, y_n) \in [0,1]^n$ and a subset $S \subseteq [n]$, define the product Bernoulli mass
\[
P_S(y) := \prod_{i \in S} y_i \prod_{i \notin S} (1 - y_i). \tag{2.1}
\]
In particular, $P_\emptyset(y) = \prod_{i=1}^{n} (1 - y_i)$ is the probability of the all-zeros atom.

Definition 2.1 (Slice discrepancies). For $p, q \in [0,1]^n$ and $S \subseteq [n]$, define
\[
\delta_S(p, q) := P_S(p) - P_S(q).
\]
For each $k \in \{0, 1, \dots, n\}$ define the absolute $k$-slice discrepancy
\[
\Delta_k(p, q) := \sum_{S \subseteq [n] :\, |S| = k} |\delta_S(p, q)| .
\]
We will often abbreviate $\delta_S := \delta_S(p, q)$ and $\Delta_k := \Delta_k(p, q)$ when the pair $(p, q)$ is clear. In particular (1.1) holds. We also write
\[
x := p - q, \qquad x_i := p_i - q_i, \qquad \|p - q\|_1 = \sum_{i=1}^{n} |x_i| .
\]

3. Proof of Theorem 1.1

In this section we assume
\[
p, q \in \Bigl[0, \frac{1}{n^2}\Bigr]^n . \tag{3.1}
\]
Only the lower bound $\tfrac14 \|p - q\|_1 \le \mathrm{TV}(\mathrm{Ber}(p), \mathrm{Ber}(q))$ requires proof; the upper bound $\mathrm{TV}(\mathrm{Ber}(p), \mathrm{Ber}(q)) \le \|p - q\|_1$ is classic [5, Disp. (1.4)]. For each $i \in [n]$, define
\[
P_{-i}(y) := \prod_{j \ne i} (1 - y_j), \qquad \text{so that} \qquad P_{\{i\}}(y) = y_i P_{-i}(y).
\]

Lemma 3.1. Under (3.1), for every $i \in [n]$,
\[
P_{-i}(p) = \prod_{j \ne i} (1 - p_j) \ge \frac34 .
\]

Proof. If $n = 1$, the product is empty and equals $1$. Assume $n \ge 2$. Since $p_j \le 1/n^2$, we have
\[
P_{-i}(p) \ge \Bigl(1 - \frac{1}{n^2}\Bigr)^{n-1} .
\]
For $n = 2$ this equals $3/4$. For $n \ge 3$, Bernoulli's inequality gives
\[
\Bigl(1 - \frac{1}{n^2}\Bigr)^{n-1} \ge 1 - \frac{n-1}{n^2} = \frac{n^2 - n + 1}{n^2} \ge \frac34 ,
\]
since $4(n^2 - n + 1) \ge 3n^2$ is equivalent to $(n - 2)^2 \ge 0$. □

Lemma 3.2.
Under (3.1), for every $i \in [n]$,
\[
|P_{-i}(p) - P_{-i}(q)| \le \sum_{k \ne i} |x_k| .
\]

Proof. If $n = 1$ then $P_{-1}(\cdot) \equiv 1$ and the claim is trivial. Assume $n \ge 2$ and consider the segment $u(t) := q + t(p - q)$. The function $t \mapsto P_{-i}(u(t))$ is differentiable, and by the mean value theorem
\[
P_{-i}(p) - P_{-i}(q) = \sum_{k \ne i} \Bigl( - \prod_{j \ne i, k} (1 - u_j(\xi)) \Bigr) x_k \qquad \text{for some } \xi \in (0, 1).
\]
Taking absolute values and using $0 \le 1 - u_j(\xi) \le 1$ gives
\[
|P_{-i}(p) - P_{-i}(q)| \le \sum_{k \ne i} |x_k| . \qquad \square
\]

Proof of Theorem 1.1. Fix $i \in [n]$. Using $P_{\{i\}}(y) = y_i P_{-i}(y)$ we expand
\[
\begin{aligned}
P_{\{i\}}(p) - P_{\{i\}}(q) &= p_i P_{-i}(p) - q_i P_{-i}(q) \\
&= (p_i - q_i) P_{-i}(p) + q_i \bigl( P_{-i}(p) - P_{-i}(q) \bigr) \\
&= x_i P_{-i}(p) + q_i \bigl( P_{-i}(p) - P_{-i}(q) \bigr).
\end{aligned}
\]
By the reverse triangle inequality,
\[
\bigl| P_{\{i\}}(p) - P_{\{i\}}(q) \bigr| \ge |x_i| \, P_{-i}(p) - q_i \, |P_{-i}(p) - P_{-i}(q)| .
\]
Apply Lemma 3.1 and Lemma 3.2 to obtain
\[
\bigl| P_{\{i\}}(p) - P_{\{i\}}(q) \bigr| \ge \frac34 |x_i| - q_i \sum_{k \ne i} |x_k| .
\]
Summing over $i$ yields
\[
\Delta_1 \ge \frac34 \sum_{i=1}^{n} |x_i| - \sum_{i=1}^{n} q_i \sum_{k \ne i} |x_k|
= \frac34 \|p - q\|_1 - \sum_{k=1}^{n} |x_k| \sum_{i \ne k} q_i .
\]
Because $q_i \le 1/n^2$, for each fixed $k$ we have $\sum_{i \ne k} q_i \le (n-1)/n^2 \le 1/4$. Therefore the last term is at most $\tfrac14 \|p - q\|_1$, and
\[
\Delta_1 \ge \Bigl( \frac34 - \frac14 \Bigr) \|p - q\|_1 = \frac12 \|p - q\|_1 .
\]
Finally,
\[
\mathrm{TV}(\mathrm{Ber}(p), \mathrm{Ber}(q)) = \frac12 \sum_{k=0}^{n} \Delta_k(p, q) \ge \frac12 \Delta_1(p, q) \ge \frac14 \|p - q\|_1 .
\]
The upper bound $\mathrm{TV}(\mathrm{Ber}(p), \mathrm{Ber}(q)) \le \|p - q\|_1$ is standard and holds for all $p, q \in [0,1]^n$. □

4. Proof of Theorem 1.2

In this section we assume $n \ge 2$ and work in the "small" parameter domain
\[
p, q \in [0, \lambda_n]^n, \qquad \lambda_n := \frac{1}{2n} . \tag{4.1}
\]
It is convenient to also introduce
\[
\beta_n := \frac{\lambda_n}{1 - \lambda_n} = \frac{1}{2n - 1} . \tag{4.2}
\]
All slice discrepancies $\Delta_k$ below are understood for the same pair $(p, q)$.

Theorem 4.1 ($\Delta_0$ bound).
For $p, q \in [0, \lambda_n]^n$,
\[
\Delta_0 \le \Bigl(2 - \frac1n\Bigr) \Delta_1 = \frac{2n - 1}{n} \Delta_1 .
\]

Proof. Let $g(y) := P_\emptyset(y) = \prod_{j=1}^{n} (1 - y_j)$ and define the segment $u(t) := q + t(p - q)$. By the fundamental theorem of calculus,
\[
P_\emptyset(p) - P_\emptyset(q) = \int_0^1 \frac{d}{dt} g(u(t)) \, dt .
\]
Write $x_k := p_k - q_k$. Since $\partial g / \partial y_k (y) = - \prod_{j \ne k} (1 - y_j)$,
\[
\frac{d}{dt} g(u(t)) = - \sum_{k=1}^{n} x_k \prod_{j \ne k} (1 - u_j(t)) .
\]
Define
\[
B_k := x_k \int_0^1 \prod_{j \ne k} (1 - u_j(t)) \, dt .
\]
Then $P_\emptyset(p) - P_\emptyset(q) = - \sum_{k=1}^{n} B_k$, so $\Delta_0 = |P_\emptyset(p) - P_\emptyset(q)| \le \sum_k |B_k|$.

Next fix $k \in [n]$ and consider $h_k(y) := P_{\{k\}}(y) = y_k \prod_{j \ne k} (1 - y_j)$. Differentiating along the same segment gives
\[
\delta_{\{k\}} = h_k(p) - h_k(q) = \int_0^1 \frac{d}{dt} h_k(u(t)) \, dt .
\]
A direct computation yields
\[
\frac{d}{dt} h_k(u(t)) = x_k \prod_{j \ne k} (1 - u_j(t)) - \sum_{m \ne k} x_m \, u_k(t) \prod_{j \ne k, m} (1 - u_j(t)) .
\]
Integrating and rearranging shows
\[
B_k = \delta_{\{k\}} + \sum_{m \ne k} x_m \int_0^1 u_k(t) \prod_{j \ne k, m} (1 - u_j(t)) \, dt .
\]
Taking absolute values and summing over $k$ gives
\[
\sum_{k=1}^{n} |B_k| \le \sum_{k=1}^{n} \bigl| \delta_{\{k\}} \bigr| + \sum_{k=1}^{n} \sum_{m \ne k} |x_m| \int_0^1 u_k(t) \prod_{j \ne k, m} (1 - u_j(t)) \, dt .
\]
For $k \ne m$ we use
\[
u_k(t) \prod_{j \ne k, m} (1 - u_j(t)) = \frac{u_k(t)}{1 - u_k(t)} \prod_{j \ne m} (1 - u_j(t)) \le \beta_n \prod_{j \ne m} (1 - u_j(t)) ,
\]
because $u_k(t) \in [0, \lambda_n]$ implies $\frac{u_k(t)}{1 - u_k(t)} \le \beta_n$. Therefore
\[
|x_m| \int_0^1 u_k(t) \prod_{j \ne k, m} (1 - u_j(t)) \, dt \le \beta_n |x_m| \int_0^1 \prod_{j \ne m} (1 - u_j(t)) \, dt = \beta_n |B_m| .
\]
Summing over $k \ne m$ yields
\[
\sum_{k=1}^{n} \sum_{m \ne k} |x_m| \int_0^1 u_k(t) \prod_{j \ne k, m} (1 - u_j(t)) \, dt \le \beta_n (n - 1) \sum_{m=1}^{n} |B_m| .
\]
Hence
\[
\sum_{k=1}^{n} |B_k| \le \Delta_1 + \beta_n (n - 1) \sum_{k=1}^{n} |B_k| .
\]
Since $\beta_n (n - 1) = \frac{n - 1}{2n - 1} < 1$, we can absorb to obtain
\[
\sum_{k=1}^{n} |B_k| \le \frac{1}{1 - \beta_n (n - 1)} \Delta_1 = \frac{2n - 1}{n} \Delta_1 .
\]
Finally $\Delta_0 \le \sum_k |B_k|$ gives the claim. □

Theorem 4.2 ($\Delta_2 / \Delta_1$ bound).
For $p, q \in [0, \lambda_n]^n$,
\[
\Delta_2 \le \frac{3(n - 1)}{2(2n - 1)} \Delta_1 .
\]

Proof. Fix $a < b$. Write $\delta_a := \delta_{\{a\}}$, $\delta_b := \delta_{\{b\}}$, and $\delta_{ab} := \delta_{\{a,b\}}$.

We introduce an auxiliary quantity chosen so that a certain linear combination of singleton and doubleton masses isolates $\delta_{ab}$ cleanly in odds coordinates:
\[
S(y; a, b) := \beta_n \bigl( P_{\{a\}}(y) + P_{\{b\}}(y) \bigr) - 2 P_{\{a,b\}}(y) .
\]
A direct expansion shows
\[
\frac{\beta_n}{2} (\delta_a + \delta_b) - \delta_{ab} = \frac12 \bigl( S(p; a, b) - S(q; a, b) \bigr) =: \frac12 \Delta S_{ab} . \tag{4.3}
\]
By the triangle inequality,
\[
|\delta_{ab}| \le \frac{\beta_n}{2} \bigl( |\delta_a| + |\delta_b| \bigr) + \frac12 |\Delta S_{ab}| . \tag{4.4}
\]
Summing (4.4) over all $a < b$ yields

(4.5) $\Delta_2 \le \frac{\beta_n}{2} (n - 1) \Delta_1 + \sum_{a}$ … $0$, then
\[
\frac{\Delta_k}{\Delta_1} \le \frac{\beta_n (n - k + 1)}{k} \cdot \frac{\Delta_{k-1}}{\Delta_1} + \binom{n-1}{k-1} \lambda_n^{k-1} (1 - \lambda_n)^{n-k-1} \cdot \frac{K(n)}{k} ,
\]
where $K(n)$ is as in Lemma 4.3.

Proof. Fix $S \subseteq [n]$ with $|S| = k$ and fix $i \in S$. Using $P_S(y) = \frac{y_i}{1 - y_i} P_{S \setminus \{i\}}(y)$ we write
\[
\begin{aligned}
\delta_S &= \frac{p_i}{1 - p_i} P_{S \setminus \{i\}}(p) - \frac{q_i}{1 - q_i} P_{S \setminus \{i\}}(q) \\
&= \frac{p_i}{1 - p_i} \bigl( P_{S \setminus \{i\}}(p) - P_{S \setminus \{i\}}(q) \bigr) + \Bigl( \frac{p_i}{1 - p_i} - \frac{q_i}{1 - q_i} \Bigr) P_{S \setminus \{i\}}(q) \\
&= \frac{p_i}{1 - p_i} \delta_{S \setminus \{i\}} + \frac{p_i - q_i}{(1 - p_i)(1 - q_i)} P_{S \setminus \{i\}}(q) .
\end{aligned}
\]
Taking absolute values and summing over $i \in S$ gives
\[
k \, |\delta_S| \le \sum_{i \in S} \Bigl( \frac{p_i}{1 - p_i} \bigl| \delta_{S \setminus \{i\}} \bigr| + \frac{|x_i|}{(1 - p_i)(1 - q_i)} P_{S \setminus \{i\}}(q) \Bigr) .
\]
Summing further over all $S$ with $|S| = k$ yields
\[
k \, \Delta_k \le \sum_{|S| = k} \sum_{i \in S} \frac{p_i}{1 - p_i} \bigl| \delta_{S \setminus \{i\}} \bigr| + \sum_{|S| = k} \sum_{i \in S} \frac{|x_i|}{(1 - p_i)(1 - q_i)} P_{S \setminus \{i\}}(q) .
\]
For the first term, $\frac{p_i}{1 - p_i} \le \beta_n$ on $[0, \lambda_n]$, and a multiplicity count gives
\[
\sum_{|S| = k} \sum_{i \in S} \bigl| \delta_{S \setminus \{i\}} \bigr| = (n - k + 1) \sum_{|T| = k - 1} |\delta_T| = (n - k + 1) \Delta_{k-1} .
\]
Hence the first term is at most $\beta_n (n - k + 1) \Delta_{k-1}$.
For the second term, re-index by $i$ and write $T = S \setminus \{i\}$ (so $|T| = k - 1$ and $i \notin T$):
\[
\sum_{|S| = k} \sum_{i \in S} \frac{|x_i|}{(1 - p_i)(1 - q_i)} P_{S \setminus \{i\}}(q) = \sum_{i=1}^{n} \frac{|x_i|}{(1 - p_i)(1 - q_i)} \sum_{\substack{T \subseteq [n] \setminus \{i\} \\ |T| = k - 1}} P_T(q) .
\]
For each such $T$, the factor $(1 - q_i)$ appears in $P_T(q)$ (since $i \notin T$), so we can cancel it:
\[
\frac{1}{(1 - p_i)(1 - q_i)} P_T(q) = \frac{1}{1 - p_i} \Bigl( \prod_{j \in T} q_j \Bigr) \Bigl( \prod_{\ell \notin T,\, \ell \ne i} (1 - q_\ell) \Bigr) .
\]
Thus
\[
\sum_{\substack{T \subseteq [n] \setminus \{i\} \\ |T| = k - 1}} \frac{1}{(1 - p_i)(1 - q_i)} P_T(q) = \frac{1}{1 - p_i} \, \mathbb{P}\Bigl( \sum_{\ell \ne i} Z_\ell = k - 1 \Bigr) ,
\]
where $Z_\ell \sim \mathrm{Ber}(q_\ell)$ are independent. Since $q_\ell \le \lambda_n$ and $\lambda_n \le 1/n = 1/((n - 1) + 1)$, Lemma 4.4 (with $N = n - 1$, $\lambda = \lambda_n$, and $m = k - 1 \ge 1$) gives
\[
\mathbb{P}\Bigl( \sum_{\ell \ne i} Z_\ell = k - 1 \Bigr) \le \binom{n-1}{k-1} \lambda_n^{k-1} (1 - \lambda_n)^{n-k} .
\]
Moreover $1 - p_i \ge 1 - \lambda_n$, so $\frac{1}{1 - p_i} \le \frac{1}{1 - \lambda_n}$. Therefore the entire second term is bounded by
\[
\binom{n-1}{k-1} \lambda_n^{k-1} (1 - \lambda_n)^{n-k-1} \sum_{i=1}^{n} |x_i| .
\]
Apply Lemma 4.3 to bound $\sum_i |x_i| \le K(n) \Delta_1$. Putting everything together,
\[
k \, \Delta_k \le \beta_n (n - k + 1) \Delta_{k-1} + \binom{n-1}{k-1} \lambda_n^{k-1} (1 - \lambda_n)^{n-k-1} K(n) \, \Delta_1 .
\]
Divide by $k \Delta_1$ (for $\Delta_1 > 0$) to obtain the claim. □

Define a numerical sequence $B_k(n)$ by $B_1(n) := 1$ and, for $k \ge 2$,
\[
B_k(n) := \frac{n - k + 1}{k (2n - 1)} B_{k-1}(n) + \binom{n-1}{k-1} \Bigl( \frac{1}{2n - 1} \Bigr)^{k-1} \frac{2}{k} . \tag{4.6}
\]

Corollary 4.6 (Universal slice bounds). For every $k \in \{2, \dots, n\}$ and every $p, q \in [0, \lambda_n]^n$,
\[
\Delta_k(p, q) \le B_k(n) \, \Delta_1(p, q) .
\]

Proof. If $\Delta_1 = 0$ then $p = q$ by Lemma 4.3, hence $\Delta_k = 0$. If $\Delta_1 > 0$, Lemma 4.5 yields the recursion (4.6) as an upper bound, starting from $B_1(n) = 1$. □

Lemma 4.7 (Closed form for $B_k(n)$). For every integer $k$ with $2 \le k \le n$,
\[
B_k(n) = \frac{2k - 1}{k (k - 1)} \binom{n-2}{k-2} \frac{n - 1}{(2n - 1)^{k-1}} .
\]

Proof. By induction on $k$.
For $k = 2$,
\[
B_2(n) = \frac32 \cdot \binom{n-2}{0} \cdot \frac{n - 1}{2n - 1} = \frac{3(n - 1)}{2(2n - 1)} ,
\]
matching Theorem 4.2. Assume the formula holds for $k - 1 \ge 2$. Using (4.6) and $\binom{n-1}{k-1} = \frac{n-1}{k-1} \binom{n-2}{k-2}$, we compute
\[
\begin{aligned}
B_k(n) &= \frac{n - k + 1}{k (2n - 1)} \cdot \frac{2(k - 1) - 1}{(k - 1)(k - 2)} \binom{n-2}{k-3} \frac{n - 1}{(2n - 1)^{k-2}} + \frac{2}{k} \binom{n-1}{k-1} \frac{1}{(2n - 1)^{k-1}} \\
&= \frac{n - 1}{k (2n - 1)^{k-1}} \Bigl[ \frac{(n - k + 1)(2k - 3)}{(k - 1)(k - 2)} \binom{n-2}{k-3} + \frac{2}{k - 1} \binom{n-2}{k-2} \Bigr] .
\end{aligned}
\]
Using $\binom{n-2}{k-3} = \binom{n-2}{k-2} \cdot \frac{k - 2}{n - k + 1}$, the bracket becomes
\[
\binom{n-2}{k-2} \Bigl[ \frac{2k - 3}{k - 1} + \frac{2}{k - 1} \Bigr] = \binom{n-2}{k-2} \cdot \frac{2k - 1}{k - 1} .
\]
Substituting yields the desired formula. □

Theorem 4.8 (Summation identity). For every $n \ge 2$,
\[
\sum_{k=2}^{n} B_k(n) = \frac{n - 1}{n} .
\]

Proof. Set $t := \frac{1}{2n - 1}$. Using Lemma 4.7 and the change of variables $j = k - 2$, we obtain
\[
\sum_{k=2}^{n} B_k(n) = \sum_{j=0}^{n-2} \frac{2j + 3}{(j + 1)(j + 2)} \binom{n-2}{j} (n - 1) \, t^{j+1} = \frac{t}{n} \sum_{k=2}^{n} \binom{n}{k} (2k - 1) \, t^{k-2} ,
\]
where we used the identity $\frac{1}{(j + 1)(j + 2)} \binom{n-2}{j} = \frac{1}{n(n - 1)} \binom{n}{j+2}$.

Let $A(t) := \sum_{k=0}^{n} \binom{n}{k} t^k = (1 + t)^n$. Then $A'(t) = \sum_{k=1}^{n} k \binom{n}{k} t^{k-1} = n (1 + t)^{n-1}$. A short computation gives
\[
\sum_{k=0}^{n} \binom{n}{k} (2k - 1) \, t^{k-2} = \frac{2n}{t} (1 + t)^{n-1} - \frac{1}{t^2} (1 + t)^n .
\]
Subtracting the $k = 0$ and $k = 1$ terms (equal to $-t^{-2}$ and $n t^{-1}$) yields
\[
\sum_{k=2}^{n} \binom{n}{k} (2k - 1) \, t^{k-2} = \frac{2n}{t} (1 + t)^{n-1} - \frac{1}{t^2} (1 + t)^n + \frac{1}{t^2} - \frac{n}{t} .
\]
Multiplying by $t/n$ gives
\[
\sum_{k=2}^{n} B_k(n) = 2 (1 + t)^{n-1} - 1 - \frac{(1 + t)^n - 1}{n t} .
\]
Finally, with $t = \frac{1}{2n - 1}$ we have $1 + t = \frac{2n}{2n - 1}$ and
\[
\frac{(1 + t)^n - 1}{n t} = \frac{2n - 1}{n} \Bigl[ \Bigl( \frac{2n}{2n - 1} \Bigr)^n - 1 \Bigr] = 2 \Bigl( \frac{2n}{2n - 1} \Bigr)^{n-1} - \frac{2n - 1}{n} .
\]
Substituting cancels the $(1 + t)^{n-1}$ terms and yields
\[
\sum_{k=2}^{n} B_k(n) = -1 + \frac{2n - 1}{n} = \frac{n - 1}{n} . \qquad \square
\]

Theorem 4.9. For every $p, q \in [0, \lambda_n]^n$,
\[
\sum_{k=2}^{n} \Delta_k(p, q) \le \frac{n - 1}{n} \Delta_1(p, q) .
\]

Proof.
If $\Delta_1 = 0$ then $p = q$ by Lemma 4.3, so every $\Delta_k = 0$. Assume $\Delta_1 > 0$. By Corollary 4.6, $\Delta_k / \Delta_1 \le B_k(n)$ for $k = 2, \dots, n$. Summing over $k$ and using Theorem 4.8 gives
\[
\sum_{k=2}^{n} \frac{\Delta_k}{\Delta_1} \le \sum_{k=2}^{n} B_k(n) = \frac{n - 1}{n} .
\]
Multiplying by $\Delta_1$ completes the proof. □

Combining Theorem 4.1 and Theorem 4.9 gives
\[
\sum_{k=0}^{n} \Delta_k \le \Bigl( 1 + \frac{2n - 1}{n} + \frac{n - 1}{n} \Bigr) \Delta_1 = \Bigl( 4 - \frac{2}{n} \Bigr) \Delta_1 ,
\]
which immediately implies Theorem 1.2.

Appendix A. An $\ell_2$ bound for quasi-symmetric pairs

It is shown in [5, Theorem 1.2] that for symmetric pairs satisfying $q = 1 - p$, we have
\[
\mathrm{TV}(\mathrm{Ber}(p), \mathrm{Ber}(q)) \le \|p - q\|_2 .
\]
Here we record a simple extension to a larger class of quasi-symmetric pairs, with an additional factor $\sqrt{2}$.

Definition A.1 (Quasi-symmetric pairs). A pair $(u, v) \in [0,1]^2$ is quasi-symmetric if either $u \le \tfrac12 \le v$ or $v \le \tfrac12 \le u$. Vectors $p, q \in [0,1]^n$ are quasi-symmetric if each coordinate pair $(p_i, q_i)$ is quasi-symmetric.

Theorem A.2. If $p, q \in [0,1]^n$ are quasi-symmetric, then
\[
\mathrm{TV}(\mathrm{Ber}(p), \mathrm{Ber}(q)) \le \sqrt{2} \, \|p - q\|_2 .
\]

Remark: the choice $p = (1, 1)$, $q = (\tfrac12, \tfrac12)$ shows that the optimal constant must be at least $3/\sqrt{8}$.

Lemma A.3 ([8], Disp. (2.20)). For any probability measures $P, Q$ on a finite set $\Omega$,
\[
\mathrm{TV}(P, Q) \le \sqrt{ 1 - \Bigl( \sum_{\omega \in \Omega} \sqrt{P(\omega) Q(\omega)} \Bigr)^2 } .
\]

Lemma A.4 ([8], p. 83). Let $P = \mathrm{Ber}(p)$ and $Q = \mathrm{Ber}(q)$ on $\{0,1\}^n$. Then
\[
\sum_{x \in \{0,1\}^n} \sqrt{P(x) Q(x)} = \prod_{i=1}^{n} \Bigl( \sqrt{p_i q_i} + \sqrt{(1 - p_i)(1 - q_i)} \Bigr) .
\]

Lemma A.5. Let $p, q \in [0,1]$ satisfy $p \ge \tfrac12 \ge q$ and define
\[
b(p, q) := \sqrt{pq} + \sqrt{(1 - p)(1 - q)} .
\]
Then $1 - b(p, q)^2 \le 2 (p - q)^2$.

Proof. A direct expansion shows the identity
\[
1 - b(p, q)^2 = \Bigl( \sqrt{p(1 - q)} - \sqrt{q(1 - p)} \Bigr)^2 . \tag{A.1}
\]
Let $A = p(1 - q)$ and $B = q(1 - p)$. Then $A - B = p - q$. Using
\[
\sqrt{A} - \sqrt{B} = \frac{A - B}{\sqrt{A} + \sqrt{B}} ,
\]
and (A.1), we obtain
\[
1 - b(p, q)^2 = \frac{(p - q)^2}{\Bigl( \sqrt{p(1 - q)} + \sqrt{q(1 - p)} \Bigr)^2} . \tag{A.2}
\]
It remains to lower bound the denominator. Define unit vectors in $\mathbb{R}^2$:
\[
u = (\sqrt{p}, \sqrt{1 - p}), \qquad v = (\sqrt{1 - q}, \sqrt{q}), \qquad \|u\|_2 = \|v\|_2 = 1 .
\]
Then
\[
u \cdot v = \sqrt{p(1 - q)} + \sqrt{q(1 - p)} .
\]
Let $\theta$ and $\phi$ be the angles of $u$ and $v$ from the $x$-axis. Since $p \ge 1/2$, we have $\theta \in [0, \pi/4]$, and since $q \le 1/2$, we have $\phi \in [0, \pi/4]$. Therefore $|\theta - \phi| \le \pi/4$, so
\[
u \cdot v = \cos(\theta - \phi) \ge \cos(\pi/4) = \frac{1}{\sqrt{2}} .
\]
Substituting into (A.2) yields
\[
1 - b(p, q)^2 \le \frac{(p - q)^2}{(1/\sqrt{2})^2} = 2 (p - q)^2 ,
\]
as desired. □

Proof of Theorem A.2. Since any pair $p_i, q_i$ may be simultaneously reflected about $1/2$ without affecting the TV, we may assume, for all $i$,
\[
p_i \ge \tfrac12 \ge q_i .
\]
By Lemmas A.3 and A.4,
\[
\mathrm{TV}(P, Q) \le \sqrt{ 1 - \prod_{i=1}^{n} b_i^2 } , \qquad b_i := \sqrt{p_i q_i} + \sqrt{(1 - p_i)(1 - q_i)} \in [0, 1] .
\]
For numbers $y_i \in [0,1]$ one has $1 - \prod_i y_i \le \sum_i (1 - y_i)$, so with $y_i = b_i^2$,
\[
\mathrm{TV}(P, Q) \le \sqrt{ \sum_{i=1}^{n} (1 - b_i^2) } .
\]
By Lemma A.5 applied coordinatewise (recall $p_i \ge \tfrac12 \ge q_i$), we have $1 - b_i^2 \le 2 (p_i - q_i)^2$. Therefore
\[
\mathrm{TV}(P, Q) \le \sqrt{ \sum_{i=1}^{n} 2 (p_i - q_i)^2 } = \sqrt{2} \, \|p - q\|_2 ,
\]
which completes the proof. □

Acknowledgments. This research was supported in part by the Israel Science Foundation ISF grant 581/25 and the Binational Science Foundation BSF grant 2024243.

References

[1] Arnab Bhattacharyya, Sutanu Gayen, Kuldeep S. Meel, Dimitrios Myrisiotis, A. Pavan, and N. V. Vinodchandran. On Approximating Total Variation Distance. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, IJCAI 2023, 19th-25th August 2023, Macao, SAR, China, pages 3479–3487. ijcai.org, 2023. doi: 10.24963/IJCAI.2023/387.
[2] Weiming Feng, Heng Guo, Mark Jerrum, and Jiaheng Wang. A simple polynomial-time approximation algorithm for the total variation distance between two product distributions. In 2023 Symposium on Simplicity in Algorithms (SOSA), pages 343–347, 2023. doi: 10.1137/1.9781611977585.ch30.
[3] Weiming Feng, Liqiang Liu, and Tianren Liu. On Deterministically Approximating Total Variation Distance. In Proceedings of the 2024 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1766–1791, 2024. doi: 10.1137/1.9781611977912.70.
[4] Arnab Bhattacharyya, Sutanu Gayen, Kuldeep S. Meel, Dimitrios Myrisiotis, A. Pavan, and N. V. Vinodchandran. Total Variation Distance Meets Probabilistic Inference. In Forty-first International Conference on Machine Learning, 2024.
[5] Aryeh Kontorovich. On the tensorization of the variational distance. Electronic Communications in Probability 30: 1–10, 2025. doi: 10.1214/25-ECP680.
[6] Aryeh Kontorovich. TV homogenization inequalities, preprint. math.PR, 2026.
[7] Aryeh Kontorovich and Ariel Avital. Total variation over Bernoulli products: an $O(\sqrt{\log n})$ approximation, in preparation. 2026.
[8] Alexandre B. Tsybakov. Introduction to Nonparametric Estimation. Springer series in statistics. Springer, 2009. doi: 10.1007/B13794.

Email address: avitalq@post.bgu.ac.il
Email address: karyeh@cs.bgu.ac.il
Email address: georgesalafatinos@gmail.com
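
Numerical sanity check. The bounds of Theorems 1.1 and 1.2 and the identity of Theorem 4.8 can be checked by brute force for small $n$. The Python sketch below is not part of the original paper: the function names `atom_mass`, `slice_discrepancies`, `B`, and `check_bounds`, the random sampling scheme, and the floating-point tolerance are all illustrative assumptions. It enumerates the $2^n$ atoms of the cube to compute the slice discrepancies $\Delta_k$, recovers TV through the slice decomposition (1.1), and evaluates the closed form of Lemma 4.7 in exact rational arithmetic. It is a check on small instances under these assumptions, not a substitute for the proofs.

```python
from fractions import Fraction
from itertools import product
from math import comb
import random


def atom_mass(y, x):
    """Mass that Ber(y) assigns to the binary string x, as in (2.1)."""
    mass = 1.0
    for y_i, x_i in zip(y, x):
        mass *= y_i if x_i else 1.0 - y_i
    return mass


def slice_discrepancies(p, q):
    """Return [Delta_0, ..., Delta_n] by enumerating all 2^n atoms."""
    n = len(p)
    delta = [0.0] * (n + 1)
    for x in product((0, 1), repeat=n):
        # sum(x) is the Hamming weight, i.e. the index of the slice.
        delta[sum(x)] += abs(atom_mass(p, x) - atom_mass(q, x))
    return delta


def B(k, n):
    """Closed form of Lemma 4.7, kept exact with rational arithmetic."""
    return Fraction((2 * k - 1) * comb(n - 2, k - 2) * (n - 1),
                    k * (k - 1) * (2 * n - 1) ** (k - 1))


def check_bounds(n, trials=100, seed=0, tol=1e-12):
    """Test the bounds of Theorems 1.1 and 1.2 on random parameter vectors."""
    rng = random.Random(seed)
    for _ in range(trials):
        # Tiny regime: coordinates in [0, 1/n^2].
        p = [rng.uniform(0.0, 1.0 / n**2) for _ in range(n)]
        q = [rng.uniform(0.0, 1.0 / n**2) for _ in range(n)]
        tv = 0.5 * sum(slice_discrepancies(p, q))  # slice decomposition (1.1)
        l1 = sum(abs(a - b) for a, b in zip(p, q))
        if not (0.25 * l1 - tol <= tv <= l1 + tol):
            return False
        # Small regime: coordinates in [0, 1/(2n)].
        p = [rng.uniform(0.0, 1.0 / (2 * n)) for _ in range(n)]
        q = [rng.uniform(0.0, 1.0 / (2 * n)) for _ in range(n)]
        delta = slice_discrepancies(p, q)
        tv = 0.5 * sum(delta)
        if not (0.5 * delta[1] - tol <= tv <= (2 - 1 / n) * delta[1] + tol):
            return False
    return True


if __name__ == "__main__":
    for n in range(2, 8):
        # Theorem 4.8: the constants B_k(n) sum to (n - 1) / n exactly.
        assert sum(B(k, n) for k in range(2, n + 1)) == Fraction(n - 1, n)
        assert check_bounds(n)
    print("all checks passed")
```

The enumeration costs $2^n$ evaluations per pair, so the sketch is only practical for small $n$; exact fractions are used for $B_k(n)$ so that the summation identity is tested as an equality rather than up to rounding.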
