Scalable and Differentially Private Distributed Aggregation in the Shuffled Model


Authors: Badih Ghazi, Rasmus Pagh, Ameya Velingker

Badih Ghazi (Google Research) badihghazi@gmail.com · Rasmus Pagh (Google Research & IT University of Copenhagen) pagh@itu.dk · Ameya Velingker (Google Research) ameyav@google.com

Abstract

Federated learning promises to make machine learning feasible on distributed, private datasets by implementing gradient descent using secure aggregation methods. The idea is to compute a global weight update without revealing the contributions of individual users. Current practical protocols for secure aggregation work in an "honest but curious" setting, where a curious adversary observing all communication to and from the server cannot learn any private information, assuming the server is honest and follows the protocol. A more scalable and robust primitive for privacy-preserving protocols is shuffling of user data, so as to hide the origin of each data item. Highly scalable and secure protocols for shuffling, so-called mixnets, have been proposed as a primitive for privacy-preserving analytics in the Encode-Shuffle-Analyze framework by Bittau et al., which was later analytically studied by Erlingsson et al. and Cheu et al. The recent papers by Cheu et al. and Balle et al. have given protocols for secure aggregation that achieve differential privacy guarantees in this "shuffled model". Their protocols come at a cost, though: either the expected aggregation error or the amount of communication per user scales as a polynomial n^{Ω(1)} in the number of users n. In this paper we propose a simple and more efficient protocol for aggregation in the shuffled model, where communication as well as error increases only polylogarithmically in n. Our new technique is a conceptual "invisibility cloak" that makes users' data almost indistinguishable from random noise while introducing zero distortion on the sum.
1 Introduction

We consider the problem of privately summing n numbers in the shuffled model, which is based on the Encode-Shuffle-Analyze architecture of Bittau et al. [5] and was first analytically studied in [9, 7]. For consistency with the literature we will use the term aggregation for the sum operation. Consider n users with data x_1, ..., x_n ∈ [0, 1]. In the shuffled model, user i applies a randomized encoder algorithm E that maps x_i to a multiset of m messages, E(x_i) = {y_{i,1}, ..., y_{i,m}} ⊆ Y, where m is a parameter. Then a trusted shuffler S takes all nm messages and outputs them in random order. Finally, an analyzer algorithm A maps the shuffled output S(E(x_1), ..., E(x_n)) to an estimate of Σ_i x_i. A protocol in the shuffled model is (ε, δ)-differentially private if S(E(x_1), ..., E(x_n)) is (ε, δ)-differentially private (see the definition in Section 2.1), where probabilities are with respect to the random choices made in the algorithm E and the shuffler S. The privacy claim is justified by the existence of highly scalable protocols for privately implementing the shuffling primitive [5, 7].

Preprint. Under review.

Two protocols for aggregation in the shuffled model were recently suggested by Balle et al. [4] and Cheu et al. [7]. We discuss these further in Section 1.2, but note here that all previously known protocols have either communication or error that grows as n^{Ω(1)}. This is unavoidable for single-message protocols, by the lower bound in [4], but it has been unclear whether such a trade-off is necessary in general. Cheu et al. [7] explicitly mention it as an open problem to investigate this question.

1.1 Our Results

We show that a trade-off is not necessary: it is possible to avoid the n^{Ω(1)} factor in both the error bound and the amount of communication per user.
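The Encode-Shuffle-Analyze pipeline described at the start of this section can be sketched as follows. This is only meant to fix the interfaces of E, S, and A; the identity encoder below is a placeholder with no privacy, not the paper's protocol.

```python
import random

def shuffler(encoded_multisets):
    """Trusted shuffler S: pool all users' messages and output them in
    random order, hiding which user sent which message."""
    pool = [msg for messages in encoded_multisets for msg in messages]
    random.shuffle(pool)
    return pool

def encoder_identity(x):
    """Placeholder encoder E (provides no privacy): one message carrying x."""
    return [x]

def analyzer_sum(pool):
    """Analyzer A: estimate the sum of the inputs from the shuffled messages."""
    return sum(pool)

users = [0.2, 0.5, 0.9]
estimate = analyzer_sum(shuffler([encoder_identity(x) for x in users]))
assert abs(estimate - sum(users)) < 1e-9
```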
The precise results obtained depend on the notion of "neighboring dataset" in the definition of differential privacy. We first consider the standard notion of neighboring dataset in differential privacy, namely that the input of a single user is changed, and show:

Theorem 1. Let ε > 0 and δ ∈ (0, 1) be any real numbers. There exists a protocol in the shuffled model that is (ε, δ)-differentially private under single-user changes, has expected error O((1/ε)·√(log(1/δ))), and where each encoder sends O(log(n/(εδ))) messages of O(log(n/δ)) bits.

We also consider a different notion, similar to the gold standard of secure multi-party computation: two datasets are considered neighboring if their sums (taken after discretization) are identical. This notion turns out to allow much better privacy, even with zero noise in the final sum: the only error in the protocol comes from representing the terms of the sum in bounded precision.

Theorem 2. Let ε > 0 and δ ∈ (0, 1) be any real numbers and let m > 10 log(n/(εδ)). There exists a protocol in the shuffled model that is (ε, δ)-differentially private under sum-preserving changes, has worst-case error 2^{−m}, and where each encoder sends m messages of O(m) bits.

In addition to analyzing the error and privacy of our new protocol, we consider its resilience towards untrusted users that may deviate from the protocol. While the shuffled model is vulnerable to such attacks in general [4], we argue in Section 2.5 that the privacy guarantees of our protocol are robust even to a large fraction of colluding users. For reasons of exposition we show Theorem 2 before Theorem 1. The technical ideas behind our new results are discussed in Section 1.3. Next, we discuss implications for machine learning and the relation to previous work. Concurrently and independently of our work, Balle et al. obtained a result similar to Theorem 1 [3].
Their algorithm is similar to ours and, as they point out, a similar algorithm was used by Ishai et al. in their work on cryptography from anonymity [12]. Our privacy analysis, however, is different from theirs. In particular, they use a different noise distribution, which leads to a better dependence on ε and δ, achieving O(log(n/δ)) messages per user, a message size of O(log n) bits, and an expected error of O(1/ε). We point out that, following the appearance of this work, tighter quantitative bounds for the secure aggregation problem studied in [12] were independently obtained in [10, 2]. Using a reduction of [3], these imply more efficient differentially private protocols for aggregation in the shuffled model. Moreover, [10] obtained near-tight lower bounds on the corresponding secure aggregation problem. We refer to [10] for a comparison of the various follow-up works.

1.2 Discussion of Related Work and Applications

Our protocol is applicable in any setting where secure aggregation is applied. Below we mention some of the most significant examples and compare to existing results in the literature.

Federated Learning. Our main application in a machine learning context is gradient descent-based federated learning [14]. The idea is to avoid collecting user data, and instead compute weight updates in a distributed manner by sending model parameters to users, locally running stochastic gradient descent on private data, and aggregating model updates over all users. Using a secure aggregation protocol (see e.g. [6]) guards against information leakage from the update of a single user, since the server only learns the aggregated model update. A federated learning system based on these principles is currently used by Google to train neural networks on data residing on users' phones [15]. Current practical secure aggregation protocols such as that of Bonawitz et al.
[6] have user computation cost O(n²) and total communication complexity O(n²), where n is the number of users.

Reference        | #messages / n     | Message size     | Expected error                    | Privacy protection
Cheu et al. [7]  | ε√n               | 1                | (1/ε)·√(log(1/δ))                 | Single-user change
Cheu et al. [7]  | m                 | (1/ε)·log(n/δ)   | √n/m + (1/ε)·√(log(1/δ))          | Single-user change
Balle et al. [4] | 1                 | log n            | n^{1/6}·log^{1/3}(1/δ) / ε^{2/3}  | Single-user change
New              | log(n/(εδ))       | log(n/δ)         | (1/ε)·√(log(1/δ))                 | Single-user change
New              | m > log(n/(εδ))   | m                | 2^{−m}                            | Sum-preserving change

Figure 1: Comparison of differentially private aggregation protocols in the shuffled model with (ε, δ)-differential privacy. The number of users is n, and m is an integer parameter. Message sizes are in bits; asymptotic notation is suppressed for readability. We consider two types of privacy protection, corresponding to different notions of "neighboring dataset" in differential privacy: in the first, which was considered in previous papers, datasets are considered neighboring if they differ in the data of a single user; in the latter, datasets are considered neighboring if they have the same sum.

This limits the number of users that can participate in the secure aggregation protocol. In addition, the privacy analysis assumes an "honest but curious" server that does not deviate from the protocol, so some level of trust in the secure aggregation server is required. In contrast, protocols based on shuffling work with much weaker assumptions on the server [5, 7]. In addition to this advantage, the total work and communication of our new protocol scale near-linearly with the number of users.

Differentially Private Aggregation in the Shuffled Model. It is known that gradient descent can work well even if, in order to achieve differential privacy, the data is accessible only in noised form [1].
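The noised-sum access model just mentioned can be illustrated as follows. The function name and the Gaussian noise are illustrative choices for this sketch, not the mechanism of [1] or of this paper; calibrating the noise is the subject of the differential privacy analysis.

```python
import random

def noised_aggregate(user_updates, noise_scale):
    """Return the coordinate-wise sum of per-user updates plus additive noise.
    A DP gradient descent step only ever sees such noised sums. Gaussian noise
    is used purely for illustration."""
    dim = len(user_updates[0])
    totals = [sum(u[j] for u in user_updates) for j in range(dim)]
    return [t + random.gauss(0.0, noise_scale) for t in totals]

# With noise_scale = 0 the exact sum of the updates is recovered.
assert noised_aggregate([[1.0, 2.0], [3.0, 4.0]], 0.0) == [4.0, 6.0]
```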
Note that in order to run gradient descent in a differentially private manner, privacy parameters need to be chosen in such a way that the combined privacy loss over many iterations is limited. Each aggregation protocol shown in Figure 1 represents a different trade-off, optimizing different parameters. Our protocols are the only ones that avoid n^{Ω(1)} factors in both the communication per user and the error.

Private Sketching and Statistical Learning. At first glance it may seem that aggregation is a rather weak primitive for combining data from many sources in order to analyze it. However, research in the area of data stream algorithms has uncovered many non-trivial algorithms that are small linear sketches; see e.g. [8, 16]. Linear sketches over the integers (or over a finite field) can be implemented using secure aggregation by computing linear sketches locally and summing them up over some range that is large enough to hold the sum. This unlocks many differentially private protocols in the shuffled model, e.g. estimation of ℓ_p-norms, quantiles, heavy hitters, and the number of distinct elements. Second, as observed in [7], we can translate any statistical query over a distributed dataset to an aggregation problem over numbers in [0, 1]. That is, every learning problem solvable using a small number of statistical queries [13] can be solved privately and efficiently in the shuffled model.

1.3 Invisibility Cloak Protocol

We use a technique from protocols for secure multi-party aggregation (see e.g. [11]): ensure that the individual numbers passed to the analyzer are fully random by adding random noise terms, but coordinate the noise such that all noise terms cancel and the sum remains the same as the sum of the original data. Our new insight is that in the shuffled model the addition of zero-sum noise can be done without coordination between the users. Instead, each user individually produces numbers y_1, . . .
, y_m that are fully random except that they sum to x_i, and passes them to the shuffler. This is visualized in Figure 2. Conceptually, the noise we introduce acts as an invisibility cloak: the data is still there, possible to aggregate, but it is almost impossible to gain any other information from it.

The details of our encoder are given as Algorithm 1. For parameters N, k, and m to be specified later, it converts each input x_i to a set of random values {y_1, ..., y_m} whose sum, up to scaling and rounding, equals x_i. When the output of all encoders E_{N,k,m}(x_i) is composed with a shuffler, this directly gives differential privacy with respect to sum-preserving changes of data (where the sum is considered after rounding). To achieve differential privacy with respect to single-user changes, the protocol must be combined with a pre-randomizer that adds noise to each x_i with some probability; see the discussion in Section 2.4.

[Figure 2: Diagram of the Invisibility Cloak Protocol for secure multi-party aggregation. Four users (Alice, Bob, Clarice, David) each split their input x_i into messages y_{i,1}, y_{i,2}, y_{i,3}, which pass through the shuffler π to the analyzer.]

Our analyzer is given as Algorithm 2. It computes z̄ as the sum of the inputs (received from the shuffler) modulo N, which by definition of the encoder is guaranteed to equal the sum Σ_i ⌊x_i k⌋ of scaled, rounded inputs. If x_1, ..., x_n ∈ [0, 1], this sum will be in [0, nk], and z̄/k will be within n/k of the true sum Σ_i x_i. In the setting where a pre-randomizer adds noise to some inputs, however, we may have z̄ ∉ [0, nk], in which case we round to the nearest feasible output sum, 0 or n.

Privacy Intuition. The output of each encoder is very close to fully random, in the sense that any m − 1 of its values are independent and uniformly random.
Only by summing exactly the outputs of an encoder (or of several encoders) do we get a value that is not uniformly random. On the other hand, many size-m subsets look like the output of an encoder, in the sense that the sum of their elements corresponds to a feasible value x_i. In fact, something stronger is true: for every possible input with the same sum as the true input (sum taken after scaling and rounding) we can, with high probability, find a splitting of the shuffler's output consistent with that input. Furthermore, the number of such splittings is about the same for each potential input.

Our technique can be compared to the recently proposed "privacy blanket" [4], which introduces uniform random noise to replace some inputs. Since that paper operates in a single-message model, there is no possibility of ensuring perfect noise cancellation, and thus the number of noise terms needs to be kept small, which in turn means that a rather coarse discretization is required for differential privacy. Since the noise we add is zero-sum, we can add much more noise, and thus we do not need a coarse discretization, ultimately resulting in much higher accuracy.

Algorithm 1: Invisibility Cloak Encoder E_{N,k,m}(x)
Input: x ∈ R, integer parameters N, k, m ≥ 4
Output: multiset {y_1, ..., y_m} ⊆ {0, ..., N − 1}
  x̄ ← ⌊xk⌋
  for j = 1, ..., m − 1 do y_j ← Uniform({0, ..., N − 1})
  y_m ← (x̄ − Σ_{j=1}^{m−1} y_j) mod N
  return {y_1, ..., y_m}

Algorithm 2: Analyzer A_{N,k,n}(y_1, ..., y_{mn})
Input: (y_1, ..., y_{mn}) ∈ {0, ..., N − 1}^{mn}, integer parameters k, n, odd N > 3nk
Output: z ∈ [0, n]
  z̄ ← (Σ_i y_i) mod N
  if z̄ > 2nk then return 0
  else if z̄ > nk then return n
  else return z̄/k

2 Analysis

Overview.
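As a concrete reference for the analysis, Algorithms 1 and 2 can be sketched in Python as follows. The parameter values at the bottom are small illustrative choices; the analysis below dictates how N, k, and m must actually be set.

```python
import random

def encoder(x, N, k, m):
    """Algorithm 1: split x_bar = floor(x*k) into m additive shares mod N;
    the first m-1 shares are uniform, the last one makes the sum come out right."""
    x_bar = int(x * k)  # floor, since x >= 0
    shares = [random.randrange(N) for _ in range(m - 1)]
    shares.append((x_bar - sum(shares)) % N)
    return shares

def analyzer(messages, N, k, n):
    """Algorithm 2: the modular sum equals sum_i floor(x_i*k); values outside
    [0, nk] (possible once a pre-randomizer adds noise) are clamped to 0 or n."""
    z_bar = sum(messages) % N
    if z_bar > 2 * n * k:
        return 0.0
    if z_bar > n * k:
        return float(n)
    return z_bar / k

n, k, m = 4, 100, 8
N = 3 * n * k + 1          # odd and > 3nk, as Algorithm 2 requires
xs = [0.25, 0.5, 0.75, 1.0]
pool = [y for x in xs for y in encoder(x, N, k, m)]
random.shuffle(pool)       # the trusted shuffler
assert abs(analyzer(pool, N, k, n) - sum(xs)) <= n / k  # rounding error only
```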
We first consider privacy with respect to sum-preserving changes to the input, arguing that observing the output of the shuffler gives almost no information on the input, apart from the sum. Our proof strategy is to show privacy in the setting of two players and then argue that this implies privacy for n players, essentially because the two-player privacy holds regardless of the behavior of the other players. In the two-player case we first argue that with high probability the outputs of the encoders satisfy a smoothness condition, saying that every potential input x_1, x_2 to the encoders corresponds to roughly the same number of divisions of the 2m shuffler outputs into sets of size m. Finally, we argue that smoothness, in conjunction with the 2m elements being unique, implies privacy.

2.1 Preliminaries

Notation. We use Uniform(R) to denote a value uniformly sampled from a finite set R, and denote by S_t the set of all permutations of {0, ..., t − 1}. Unless stated otherwise, sets in this paper will be multisets. It will be convenient to work with indexed multisets whose elements are identified by indices in some set I. We can represent a multiset M ⊆ R with index set I as a function M : I → R. Multisets M_1 and M_2 with index sets I_1 and I_2 are considered identical if there exists a bijection π : I_1 → I_2 such that M_1(i) = M_2(π(i)) for all i ∈ I_1. For disjoint I_1 and I_2, we define the union of M_1 and M_2 as the function defined on I_1 ∪ I_2 that maps i_1 ∈ I_1 to M_1(i_1) and i_2 ∈ I_2 to M_2(i_2).

Differential Privacy and the Shuffled Model. We consider the established notion of differential privacy, formalizing that the output distribution does not differ much between a given dataset and any "neighboring" dataset.

Definition 1. Let A be a randomized algorithm taking as input a dataset, and let ε ≥ 0 and δ ∈ (0, 1) be given parameters.
Then A is said to be (ε, δ)-differentially private if for all neighboring datasets D_1 and D_2 and for all subsets S of the image of A, it is the case that Pr[A(D_1) ∈ S] ≤ e^ε · Pr[A(D_2) ∈ S] + δ, where the probability is over the randomness used by the algorithm A.

We consider two notions of "neighboring dataset": 1) that the input of a single user is changed, but all other inputs are the same; and 2) that the sum of user inputs is preserved. In the latter case we consider the sum after rounding down to the nearest multiple of 1/k, for a large integer parameter k; i.e., (x_1, ..., x_n) ∈ [0, 1]^n is a neighbor of (x'_1, ..., x'_n) ∈ [0, 1]^n if and only if Σ_i ⌊x_i k⌋ = Σ_i ⌊x'_i k⌋. (Alternatively, just assume that the input is discretized such that each x_i k is an integer.)

In the shuffled model, the algorithm that we want to show differentially private is the composition of the shuffler and the encoder algorithm run on the user inputs. In contrast to the local model of differential privacy, the outputs of encoders do not need to be differentially private. We refer to [7] for details.

2.2 Common Lemmas

Let Y = {0, ..., N − 1}, and consider some indexed multiset E = {y_1, ..., y_{2m}} ⊆ Y that can possibly be obtained as the union of the outputs of two encoders. Further, let I denote the collection of subsets of {1, ..., 2m} of size m. For each I ∈ I define X_I(E) = Σ_{i∈I} y_i mod N. We will be interested in the following property of a given (fixed) multiset E:

Definition 2. A multiset E = {y_1, ..., y_{2m}} is γ-smooth if the distribution of values X_I(E) for I ∈ I is close to uniform, in the sense that Pr_{I∈I}[X_I(E) = x] ∈ [(1 − γ)/N, (1 + γ)/N] for every x ∈ Y.

We name the collection of multisets that are γ-smooth and contain 2m distinct elements:

(Y choose 2m)_{γ-smooth} = { {y_1, ..., y_{2m}} | {y_1, ..., y_{2m}} is γ-smooth and y_1, . . .
, y_{2m} are distinct }.

Given x_1, x_2 ∈ [0, 1] such that x_1 k and x_2 k are integers, consider the multisets E_{N,k,m}(x_1) = {y_1, ..., y_m} and E_{N,k,m}(x_2) = {y_{m+1}, ..., y_{2m}}, and let E(x_1, x_2) = {y_1, ..., y_{2m}} be their multiset union. The multiset E(x_1, x_2) is a random variable due to the random choices made by the encoder algorithm.

Lemma 1. For every m ≥ 4, γ > 6√m/2^{2m}, and for every choice of x_1, x_2 ∈ Y, we have

Pr[E(x_1, x_2) ∉ (Y choose 2m)_{γ-smooth}] < 2m²/N + 18√m·N²/(γ²·2^{2m}).

Proof of Lemma 1. We first upper bound the probability that the multiset E(x_1, x_2) has any duplicate elements. For i ≠ j, consider the event E_{i,j} that y_i = y_j. Since m > 2, every pair of distinct values y_i, y_j is uniform in Y and independent, so Pr(E_{i,j}) = 1/N. A union bound over all C(2m, 2) < 2m² pairs yields an upper bound of 2m²/N on the probability that there is at least one duplicate pair.

Second, we bound the probability that E(x_1, x_2) is not γ-smooth. Let I_1 = {1, ..., m} and I_2 = {m + 1, ..., 2m}. Then, by definition of the encoder, X_{I_1}(E(x_1, x_2)) = x_1 and X_{I_2}(E(x_1, x_2)) = x_2 with probability 1. For each I ∈ I \ {I_1, I_2}, we have that X_I is uniformly random in the range Y, over the randomness of the encoder. Furthermore, observe that the random variables {X_I(E(x_1, x_2))}_{I∈I} are pairwise independent. Let Z_I(x) be the indicator random variable that is 1 if and only if X_I(E(x_1, x_2)) = x. Let I' = I \ {I_1, I_2}. For each x ∈ Y and I ∈ I', we have E[Z_I(x)] = 1/|Y| = 1/N. The sum Z(x) = Σ_{I∈I} Z_I(x) equals the number of sets I ∈ I such that X_I(E(x_1, x_2)) = x. Since Z_{I_1}(x) = 1[x_1 = x] and Z_{I_2}(x) = 1[x_2 = x] are fixed, it will be helpful to disregard these terms in Z(x).
Thus we define Z'(x) = Σ_{I∈I'} Z_I(x), which is a sum of |I| − 2 pairwise independent terms, each with expectation E[Z_I(x)] = 1/N. Define μ = E[Z'(x)] = |I'|/N. We bound the variance of Z'(x):

Var(Z'(x)) = E[ (Σ_{I∈I'} (Z_I(x) − 1/N))² ] = E[ Σ_{I∈I'} (Z_I(x) − 1/N)² ] < E[ Σ_{I∈I'} Z_I(x) ] = μ.

The second equality uses that E[(Z_{I_1}(x) − 1/N)(Z_{I_2}(x) − 1/N)] = 0 for I_1 ≠ I_2, because it is a product of two independent, zero-mean random variables. The inequality holds because Z_I(x) is an indicator variable. By Chebyshev's inequality over the random choices in the encoder, for any σ > 0:

Pr[|Z'(x) − μ| > σμ] < Var(Z'(x))/(σμ)² < 1/(σ²μ).   (1)

For m ≥ 4 we can bound |I| − 2 = C(2m, m) − 2 as follows: 2^{2m−1}/√m < C(2m, m) − 2 < 2^{2m}/√m. Using this to upper and lower bound μ in (1), and choosing σ = γ/3, we get:

Pr[|Z'(x) − μ| > γ·2^{2m}/(3N√m)] < 18√m·N/(γ²·2^{2m}).

A union bound over all x ∈ Y implies that with probability at least 1 − 18√m·N²/(γ²·2^{2m}):

∀x ∈ Y: |Z'(x) − μ| ≤ γ·2^{2m}/(3N√m).   (2)

Conditioned on (2), we have:

Pr_{I∈I}[X_I(E(x_1, x_2)) = x] = Z(x)/|I| ≤ (Z'(x) + 2)/|I| ≤ (μ + 2 + γ·2^{2m}/(3N√m))/|I| ≤ 1/N + (2 + γ·2^{2m}/(3N√m))·√m/2^{2m−1} = 1/N + √m/2^{2m−2} + 2γ/(3N) ≤ (1 + γ)/N.

The final inequality uses the assumption that γ > 6√m/2^{2m}. A similar computation shows that, conditioned on (2), Pr_{I∈I}[X_I(E(x_1, x_2)) = x] ≥ (1 − γ)/N.

Corollary 1. For m ≥ 4 and m = 3⌈log N⌉,

Pr[E(x_1, x_2) ∉ (Y choose 2m)_{N^{−1}-smooth}] < 19⌈log N⌉²/N.

Proof. We invoke Lemma 1 with γ = N^{−1} and m = 3⌈log N⌉. The probability bound is

18⌈log N⌉²/N + 18√(3⌈log N⌉)·N⁴/2^{6⌈log N⌉} < 18⌈log N⌉²/N + 18⌈log N⌉/N²,

where the last step uses 2^{6⌈log N⌉} ≥ N⁶. Because log N ≥ 3 and N ≥ 6, this shows the stated bound.

Denote by E(x_1, x_2; y_1, . . .
, y_{m−1}, y_{m+1}, ..., y_{2m−1}) the sequence obtained by the deterministic encoding for given values y_1, ..., y_{m−1}, y_{m+1}, ..., y_{2m−1} ∈ Y in Algorithm 1. Moreover, we denote by E(x_1, x_2, y_1, ..., y_{m−1}, y_{m+1}, ..., y_{2m−1}) the corresponding multiset.

Lemma 2. For any y* ∈ (Y choose 2m) and for any x_1 and x_2, it is the case that

Pr[E(x_1, x_2) = y*] = (1/|Y|^{2(m−1)}) · Σ_{π∈S_{2m}} 1[E(x_1, x_2; π(y*)_1, ..., π(y*)_{m−1}, π(y*)_{m+1}, ..., π(y*)_{2m−1}) = π(y*)].

Proof of Lemma 2. Using the fact that all the elements in y* are distinct, we have that

Pr[E(x_1, x_2) = y*]
= Σ_{y_1,...,y_{m−1},y_{m+1},...,y_{2m−1}∈Y} (1/|Y|^{2(m−1)}) · 1[E(x_1, x_2; y_1, ..., y_{m−1}, y_{m+1}, ..., y_{2m−1}) = y*]
= (1/|Y|^{2(m−1)}) · Σ_{distinct y_1,...,y_{m−1},y_{m+1},...,y_{2m−1}∈Y} 1[E(x_1, x_2; y_1, ..., y_{m−1}, y_{m+1}, ..., y_{2m−1}) = y*]
= (1/|Y|^{2(m−1)}) · Σ_{distinct y_1,...,y_{m−1},y_{m+1},...,y_{2m−1}∈Y} Σ_{π∈S_{2m}} 1[E(x_1, x_2; y_1, ..., y_{m−1}, y_{m+1}, ..., y_{2m−1}) = π(y*)]
= (1/|Y|^{2(m−1)}) · Σ_{π∈S_{2m}} Σ_{distinct y_1,...,y_{m−1},y_{m+1},...,y_{2m−1}∈Y} 1[E(x_1, x_2; y_1, ..., y_{m−1}, y_{m+1}, ..., y_{2m−1}) = π(y*)]
= (1/|Y|^{2(m−1)}) · Σ_{π∈S_{2m}} 1[E(x_1, x_2; π(y*)_1, ..., π(y*)_{m−1}, π(y*)_{m+1}, ..., π(y*)_{2m−1}) = π(y*)].

2.3 Analysis of Privacy under Sum-Preserving Changes

Lemma 3. For any y* ∈ (Y choose 2m)_{γ-smooth} and for all x_1, x_2, x'_1, x'_2 that are integer multiples of 1/k and that satisfy x_1 + x_2 = x'_1 + x'_2, it is the case that Pr[E(x_1, x_2) = y*] ≤ ((1 + γ)/(1 − γ)) · Pr[E(x'_1, x'_2) = y*].

Proof of Lemma 3. We denote by Σ_i y*_i := Σ_{i∈[2m]} y*_i the sum of all elements in the set y*. We define

B_{y*,x_1} := the number of subsets S of {1, ..., 2m} of size m for which Σ_{i∈S} y*_i = x_1 k mod N.   (3)

We similarly define B_{y*,x'_1} by replacing x_1 in (3) by x'_1.
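The quantity B_{y*,x} from Equation (3), and with it the γ-smoothness of Definition 2, can be checked by brute force for tiny parameters. The multiset below is a hypothetical example, far smaller than what Lemma 1 requires.

```python
from itertools import combinations

def subset_sum_counts(y_star, N, m):
    """counts[x] = number of size-m index subsets S with
    sum_{i in S} y*_i = x (mod N); this is B_{y*,.} of Equation (3),
    with x standing for the target residue x_1*k mod N."""
    counts = [0] * N
    for S in combinations(range(len(y_star)), m):
        counts[sum(y_star[i] for i in S) % N] += 1
    return counts

def smoothness_gamma(y_star, N):
    """Smallest gamma for which y* is gamma-smooth (Definition 2):
    Pr_I[X_I(y*) = x] must lie in [(1-gamma)/N, (1+gamma)/N] for all x."""
    m = len(y_star) // 2
    counts = subset_sum_counts(y_star, N, m)
    total = sum(counts)
    return max(abs(c / total - 1 / N) * N for c in counts)

y_star = [3, 14, 8, 0, 12, 6, 9, 1]  # hypothetical union of two encoder outputs, N = 17
assert sum(subset_sum_counts(y_star, 17, 4)) == 70  # C(8,4) subsets in total
assert smoothness_gamma(y_star, 17) >= 0
```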
Since y* ∈ (Y choose 2m), Lemma 2 implies that

Pr[E(x_1, x_2) = y*] = (1/|Y|^{2(m−1)}) · Σ_{π∈S_{2m}} 1[E(x_1, x_2; π(y*)_1, ..., π(y*)_{m−1}, π(y*)_{m+1}, ..., π(y*)_{2m−1}) = π(y*)] = ((m!)²/|Y|^{2(m−1)}) · B_{y*,x_1} · 1[Σ_i y*_i = x_1 k + x_2 k].   (4)

Similarly, we have that

Pr[E(x'_1, x'_2) = y*] = ((m!)²/|Y|^{2(m−1)}) · B_{y*,x'_1} · 1[Σ_i y*_i = x'_1 k + x'_2 k].   (5)

Since y* is γ-smooth, Definition 2 implies that

B_{y*,x_1}/B_{y*,x'_1} ≤ (1 + γ)/(1 − γ).   (6)

By Equations (4) and (5) and the assumption that x_1 + x_2 = x'_1 + x'_2 (as well as the assumption that x_1, x_2, x'_1, x'_2 are all integer multiples of 1/k), we get that for every γ-smooth y* whose sum is not equal to x_1 k + x_2 k, it is the case that

Pr[E(x_1, x_2) = y*] = Pr[E(x'_1, x'_2) = y*] = 0,   (7)

and for every γ-smooth y* whose sum is equal to x_1 k + x_2 k, the ratio of Equations (4) and (5) along with (6) gives that

Pr[E(x_1, x_2) = y*] ≤ ((1 + γ)/(1 − γ)) · Pr[E(x'_1, x'_2) = y*].   (8)

Lemma 4. Suppose x_1, x_2, ..., x_n ∈ R and x'_1, x'_2, ..., x'_n ∈ R are integer multiples of 1/k satisfying x_i = x'_i for all i ≠ j_1, j_2, where 1 ≤ j_1 ≠ j_2 ≤ n. Moreover, suppose that for any set T consisting of multisets of 2m elements from Y, we have the following guarantee:

Pr[E(x_{j_1}, x_{j_2}) ∈ T] ≤ e^ε · Pr[E(x'_{j_1}, x'_{j_2}) ∈ T] + δ   (9)

for some ε, δ > 0. Then, it follows that for any set S of multisets consisting of mn elements from Y,

Pr[E(x_1, x_2, ..., x_n) ∈ S] ≤ e^ε · Pr[E(x'_1, x'_2, ..., x'_n) ∈ S] + δ.

Proof of Lemma 4. Without loss of generality (by symmetry), assume j_1 = 1 and j_2 = 2. Thus, x_i = x'_i for i = 3, ..., n. For ease of notation, let x = (x_1, x_2, ..., x_n) and x' = (x'_1, x'_2, ..., x'_n). Suppose S is an arbitrary set of multisets of mn elements from Y.
For any A ⊂ Y^m, we let R_{S,A} denote

R_{S,A} = ∪_{T∈S} ( T \ ∪_{a∈A} {a_1, a_2, ..., a_m} ).

Then, we observe that

Pr[E(x) ∈ S]
= Σ_{y_3,...,y_n∈Y^m} Pr[E(x) ∈ S | ∀i > 2: E(x_i) = y_i] · Π_{j=3}^n Pr[E(x_j) = y_j]
= Σ_{y_3,...,y_n} Pr[E(x_1, x_2) ∈ R_{S,{y_3,y_4,...,y_n}}] · Π_{j=3}^n Pr[E(x_j) = y_j]
≤ Σ_{y_3,...,y_n} ( e^ε · Pr[E(x'_1, x'_2) ∈ R_{S,{y_3,y_4,...,y_n}}] + δ ) · Π_{j=3}^n Pr[E(x'_j) = y_j]   (10)
≤ e^ε · Pr[E(x') ∈ S] + δ · Σ_{y_3,...,y_n∈Y^m} Π_{j=3}^n Pr[E(x'_j) = y_j]
≤ e^ε · Pr[E(x') ∈ S] + δ,

where (10) follows from (9) and the fact that x_i = x'_i for i = 3, 4, ..., n. This completes the proof.

Lemma 5. Suppose x_1, x_2, ..., x_n ∈ R and x'_1, x'_2, ..., x'_n ∈ R are such that x_{j_1} + x_{j_2} = x'_{j_1} + x'_{j_2} (each of these being an integer multiple of 1/k) and x_i = x'_i for all i ≠ j_1, j_2, where 1 ≤ j_1 ≠ j_2 ≤ n. Then, for any set S of multisets consisting of mn elements from Y, we have

Pr[E(x_1, x_2, ..., x_n) ∈ S] ≤ ((1 + γ)/(1 − γ)) · Pr[E(x'_1, x'_2, ..., x'_n) ∈ S] + η,

where η = 2m²/N + 18√m·N²/(γ²·2^{2m}) and γ > 6√m/2^{2m}.

Proof of Lemma 5. Without loss of generality, let j_1 = 1 and j_2 = 2. We now consider any set T of multisets of 2m elements from Y. Observe that

Pr[E(x_1, x_2) ∈ T]
≤ Pr[E(x_1, x_2) ∉ (Y choose 2m)_{γ-smooth}] + Pr[E(x_1, x_2) ∈ T ∩ (Y choose 2m)_{γ-smooth}]
≤ η + Σ_{A ∈ T ∩ (Y choose 2m)_{γ-smooth}} Pr[E(x_1, x_2) = A]   (11)
≤ η + Σ_{A ∈ T ∩ (Y choose 2m)_{γ-smooth}} ((1 + γ)/(1 − γ)) · Pr[E(x'_1, x'_2) = A]   (12)
≤ η + ((1 + γ)/(1 − γ)) · Pr[E(x'_1, x'_2) ∈ T],

where (11) and (12) follow from Lemma 1 and Lemma 3, respectively. The desired result now follows from a direct application of Lemma 4.
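Lemma 5 handles a single sum-preserving change of two coordinates. Any two integer vectors with equal sums are connected by a short chain of such changes, which the following greedy sketch illustrates (box constraints on intermediate values are ignored here; the vectors stand for the scaled, discretized inputs):

```python
def swap_chain(x, x_prime):
    """Transform x into x_prime by at most n-1 two-coordinate, sum-preserving
    changes: fix coordinates left to right, pushing the surplus rightward."""
    assert sum(x) == sum(x_prime), "sums must agree"
    x = list(x)
    swaps = []
    for i in range(len(x) - 1):
        d = x_prime[i] - x[i]
        if d != 0:
            x[i] += d          # coordinate i now matches x_prime
            x[i + 1] -= d      # compensate at coordinate i+1: sum unchanged
            swaps.append((i, i + 1, d))
    return x, swaps

final, swaps = swap_chain([3, 1, 4], [2, 2, 4])
assert final == [2, 2, 4] and len(swaps) <= 2
```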
Using Lemma 5 as a building block for analyzing differential privacy guarantees in the context of sum-preserving swaps, we can derive a differential privacy result with respect to general sum-preserving changes.

Lemma 6. Suppose x = (x_1, x_2, ..., x_n) and x' = (x'_1, x'_2, ..., x'_n) have coordinates that are integer multiples of 1/k satisfying x_1 + x_2 + ... + x_n = x'_1 + x'_2 + ... + x'_n, and x' can be obtained from x by a series of t sum-preserving swaps. Then, for any S, we have

Pr[E(x'_1, x'_2, ..., x'_n) ∈ S] ≤ β^t · Pr[E(x_1, x_2, ..., x_n) ∈ S] + η·(β^t − 1)/(β − 1),

where η = 2m²/N + 18√m·N²/(γ²·2^{2m}), γ > 6√m/2^{2m}, and β = (1 + γ)/(1 − γ).

Proof of Lemma 6. We prove the lemma by induction on t. The case t = 1 holds by Lemma 5. Now, for the inductive step, suppose the lemma holds for t = r. We wish to show that it also holds for t = r + 1. Note that there exists some x'' ∈ Y^n such that (1) x'' can be obtained from x by a series of r sum-preserving swaps, and (2) x' can be obtained from x'' by a single sum-preserving swap. By the inductive hypothesis, we have that

Pr[E(x''_1, x''_2, ..., x''_n) ∈ S] ≤ β^r · Pr[E(x_1, x_2, ..., x_n) ∈ S] + η·(β^r − 1)/(β − 1).   (13)

Moreover, by Lemma 5 (applied to the single swap taking x'' to x'), we have that

Pr[E(x'_1, x'_2, ..., x'_n) ∈ S] ≤ β · Pr[E(x''_1, x''_2, ..., x''_n) ∈ S] + η.   (14)

Combining (13) and (14), we note that

Pr[E(x'_1, x'_2, ..., x'_n) ∈ S] ≤ β·(β^r · Pr[E(x_1, x_2, ..., x_n) ∈ S] + η·(β^r − 1)/(β − 1)) + η ≤ β^{r+1} · Pr[E(x_1, x_2, ..., x_n) ∈ S] + η·(β^{r+1} − 1)/(β − 1),

which establishes the claim for t = r + 1.

As a consequence, we obtain the following main theorem, establishing differential privacy of Algorithm 1 with respect to sum-preserving changes in the shuffled model:

Theorem 2. Let ε > 0 and δ ∈ (0, 1) be any real numbers and let m > 10 log(n/(εδ)).
There exists a protocol in the shuffled model that is (ε, δ)-differentially private under sum-preserving changes, has worst-case error 2^{−m}, and where each encoder sends m messages of O(m) bits.

Proof of Theorem 2. In Algorithm 1, each user communicates at most O(m log N) bits, which are sent via m messages. Note that if x = (x_1, ..., x_n) and x' = (x'_1, ..., x'_n) have coordinates that are integer multiples of 1/k satisfying x_1 + ... + x_n = x'_1 + ... + x'_n, then there is a sequence of t ≤ n − 1 sum-preserving swaps that allows us to transform x into x'. Thus, Lemma 6 implies that Algorithm 1 is (ε, δ)-differentially private with respect to sum-preserving changes if

(1 + γ)^{n−1}/(1 − γ)^{n−1} ≤ e^ε and 2m²/N + 18√m·N²/(γ²·2^{2m}) ≤ δ,

for any γ > 6√m/2^{2m} and m ≥ 4. The error in our final estimate (which is due to rounding) is O(n/k) in the worst case. The theorem now follows by choosing m > 10 log(nk/(εδ)), γ = ε/(10n), k = 10n, and N the first odd integer larger than 3kn + 10/δ + 10/ε.

2.4 Analysis of Privacy under Single-User Changes

The main idea is to run Algorithm 1 after having each player add some noise to her input, with some fixed probability, independently of the other players. We need the noise distribution to satisfy three properties: it should be supported on a finite interval, the logarithm of its probability mass function should have a small Lipschitz constant (even under modular arithmetic), and its variance should be small. The following truncated version of the discrete Laplace distribution satisfies all three properties.

Definition 3 (Truncated Discrete Laplace Distribution). Let N be a positive odd integer and p ∈ (0, 1). The probability mass function of the truncated discrete Laplace distribution D_{N,p} is defined by

D_{N,p}[k] = (1 − p) · p^{|k|} / (1 + p − 2p^{(N+1)/2})   (15)

for every integer k in the range {−(N−1)/2, ..., +(N−1)/2}.
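Definition 3 can be implemented directly; the normalizing constant in (15) comes from summing the two-sided geometric series. The inverse-CDF sampler below is an illustrative choice, not part of the paper's protocol description.

```python
import random

def trunc_discrete_laplace_pmf(N, p):
    """PMF of D_{N,p} (Definition 3) on k in {-(N-1)/2, ..., (N-1)/2}."""
    half = (N - 1) // 2
    Z = 1 + p - 2 * p ** (half + 1)  # note (N+1)/2 = half + 1
    return {k: (1 - p) * p ** abs(k) / Z for k in range(-half, half + 1)}

def sample(pmf):
    """Inverse-CDF sampling from a finite PMF."""
    u = random.random()
    acc = 0.0
    for k, q in pmf.items():
        acc += q
        if u <= acc:
            return k
    return max(pmf)  # guard against floating-point slack

pmf = trunc_discrete_laplace_pmf(N=101, p=0.8)
assert abs(sum(pmf.values()) - 1.0) < 1e-9             # a proper distribution
assert abs(sum(k * q for k, q in pmf.items())) < 1e-9  # mean 0, as in Lemma 8
```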
Lemma 7 (Log-Lipschitzness). Let $N$ be a positive odd integer and $p \in (0, 1)$ a real number. Define the interval $I = \{-\frac{N-1}{2}, \dots, +\frac{N-1}{2}\}$. For all $k \in \{0, \dots, N-1\}$ and all $t \in I$, it is the case that
$$p^{|t|} \le \frac{\mathcal{D}_{N,p}[(k+t) \bmod I]}{\mathcal{D}_{N,p}[k \bmod I]} \le p^{-|t|}.$$

Proof of Lemma 7. We start by noting that (15) implies that
$$\frac{\mathcal{D}_{N,p}[(k+t) \bmod I]}{\mathcal{D}_{N,p}[k \bmod I]} = \frac{p^{|(k+t) \bmod I|}}{p^{|k \bmod I|}}. \quad (16)$$
We distinguish six cases depending on the values of $k$ and $k+t$:

Case 1: $0 \le k \le \frac{N-1}{2}$ and $-\frac{N-1}{2} \le k+t \le -1$. (17)
Case 2: $0 \le k \le \frac{N-1}{2}$ and $0 \le k+t \le \frac{N-1}{2}$. (18)
Case 3: $0 \le k \le \frac{N-1}{2}$ and $\frac{N+1}{2} \le k+t \le N-1$. (19)
Case 4: $\frac{N+1}{2} \le k \le N-1$ and $1 \le k+t \le \frac{N-1}{2}$. (20)
Case 5: $\frac{N+1}{2} \le k \le N-1$ and $\frac{N+1}{2} \le k+t \le N-1$. (21)
Case 6: $\frac{N+1}{2} \le k \le N-1$ and $N \le k+t \le N-1+\frac{N-1}{2}$. (22)

In Cases 1, 2, and 3, we have $0 \le k \le \frac{N-1}{2}$, which implies that $|k \bmod I| = k$, and hence the denominator in (16) satisfies
$$p^{|k \bmod I|} = p^{k}. \quad (23)$$
Plugging (23) into (16), we get
$$\frac{\mathcal{D}_{N,p}[(k+t) \bmod I]}{\mathcal{D}_{N,p}[k \bmod I]} = \frac{p^{|(k+t) \bmod I|}}{p^{k}}. \quad (24)$$
We now examine each of these three cases separately.

Case 1. If (17) holds, then $|(k+t) \bmod I| = -k-t$, and the numerator in (24) becomes
$$p^{|(k+t) \bmod I|} = p^{-k-t}. \quad (25)$$
Plugging (25) into (24), we get
$$\frac{\mathcal{D}_{N,p}[(k+t) \bmod I]}{\mathcal{D}_{N,p}[k \bmod I]} = p^{-2k-t}. \quad (26)$$
Using the facts that $k+t < 0$ and $k \ge 0$, and thus that $t < 0$, we get that the quantity in (26) is at most $p^{-|t|}$ and at least $p^{|t|}$.

Case 2. If (18) holds, then $|(k+t) \bmod I| = k+t$, and the numerator in (24) becomes
$$p^{|(k+t) \bmod I|} = p^{k+t}. \quad (27)$$
Plugging (27) into (24), we get
$$\frac{\mathcal{D}_{N,p}[(k+t) \bmod I]}{\mathcal{D}_{N,p}[k \bmod I]} = p^{t}.$$

Case 3. If (19) holds, then $|(k+t) \bmod I| = N-k-t$, and the numerator in (24) becomes
$$p^{|(k+t) \bmod I|} = p^{N-k-t}. \quad (28)$$
Plugging (28) into (24), we get
$$\frac{\mathcal{D}_{N,p}[(k+t) \bmod I]}{\mathcal{D}_{N,p}[k \bmod I]} = p^{N-2k-t}. \quad (29)$$
Using the fact that $k+t \ge \frac{N+1}{2}$, which, along with $k \le \frac{N-1}{2}$, implies that $t > 0$, we get that the quantity in (29) is at most $p^{-|t|}$ and at least $p^{|t|}$.

We now turn to Cases 4, 5, and 6. In these, $\frac{N+1}{2} \le k \le N-1$, which implies that $|k \bmod I| = N-k$, and hence the denominator in (16) satisfies
$$p^{|k \bmod I|} = p^{N-k}. \quad (30)$$
Plugging (30) into (16), we get
$$\frac{\mathcal{D}_{N,p}[(k+t) \bmod I]}{\mathcal{D}_{N,p}[k \bmod I]} = \frac{p^{|(k+t) \bmod I|}}{p^{N-k}}. \quad (31)$$
We now examine each of these three cases separately.

Case 4. If (20) holds, then $|(k+t) \bmod I| = k+t$, and the numerator in (31) becomes
$$p^{|(k+t) \bmod I|} = p^{k+t}. \quad (32)$$
Plugging (32) into (31), we get
$$\frac{\mathcal{D}_{N,p}[(k+t) \bmod I]}{\mathcal{D}_{N,p}[k \bmod I]} = p^{2k+t-N}. \quad (33)$$
Using the facts that $k+t \le \frac{N-1}{2}$ and $k \ge \frac{N+1}{2}$, we deduce that $t < 0$ and that the quantity in (33) is at most $p^{-|t|}$ and at least $p^{|t|}$.

Case 5. If (21) holds, then $|(k+t) \bmod I| = N-k-t$, and the numerator in (31) becomes
$$p^{|(k+t) \bmod I|} = p^{N-k-t}. \quad (34)$$
Plugging (34) into (31), we get
$$\frac{\mathcal{D}_{N,p}[(k+t) \bmod I]}{\mathcal{D}_{N,p}[k \bmod I]} = p^{-t}.$$

Case 6. If (22) holds, then $|(k+t) \bmod I| = k+t-N$, and the numerator in (31) becomes
$$p^{|(k+t) \bmod I|} = p^{k+t-N}. \quad (35)$$
Plugging (35) into (31), we get
$$\frac{\mathcal{D}_{N,p}[(k+t) \bmod I]}{\mathcal{D}_{N,p}[k \bmod I]} = p^{2k+t-2N}. \quad (36)$$
Using the facts that $k < N$ and $k+t \ge N$, we get that $t > 0$ and that the quantity in (36) is at most $p^{-|t|}$ and at least $p^{|t|}$. □

Lemma 8. Let $N$ be a positive odd integer and $p \in (0, 1)$ a real number. Let $X$ be a random variable drawn from the truncated discrete Laplace distribution $\mathcal{D}_{N,p}$. Then the mean and variance of $X$ satisfy $\mathbb{E}[X] = 0$ and $\mathrm{Var}[X] \le \frac{2p(1+p)}{(1-p)^2 (1 + p - 2p^{(N+1)/2})}$.

In order to prove Lemma 8, we will need the simple fact given in Lemma 9.

Lemma 9.
For any $p \in [0, 1)$, it is the case that
$$\sum_{k=1}^{\infty} k^2 p^k = \frac{p(1+p)}{(1-p)^3}.$$

Proof of Lemma 9. For every $p \in [0, 1)$, we consider the geometric series $f(p) := \sum_{k=1}^{\infty} p^k$. Differentiating and multiplying by $p$, we get $p f'(p) = \sum_{k=1}^{\infty} k p^k$. Differentiating a second time and multiplying by $p$, we get
$$p \, (p f'(p))' = \sum_{k=1}^{\infty} k^2 p^k. \quad (37)$$
Using the formula for a convergent geometric series, we have $f(p) = \frac{p}{1-p}$. Plugging this expression into (37) and differentiating, we get
$$\sum_{k=1}^{\infty} k^2 p^k = \frac{p(1+p)}{(1-p)^3}. \qquad □$$

Proof of Lemma 8. We have that
$$\mathbb{E}[X] = \sum_{k=-(N-1)/2}^{(N-1)/2} k \cdot \mathcal{D}_{N,p}[k] = \sum_{k=1}^{(N-1)/2} k \cdot (\mathcal{D}_{N,p}[k] - \mathcal{D}_{N,p}[-k]) = 0, \quad (38)$$
where the last equality follows from the fact that $\mathcal{D}_{N,p}[k] = \mathcal{D}_{N,p}[-k]$ for all $k \in \{1, \dots, (N-1)/2\}$ (which directly follows from (15)). Using this same symmetry along with (38), we also get that
$$\mathrm{Var}[X] = \mathbb{E}[X^2] = \sum_{k=-(N-1)/2}^{(N-1)/2} k^2 \cdot \mathcal{D}_{N,p}[k] = 2 \sum_{k=1}^{(N-1)/2} k^2 \cdot \mathcal{D}_{N,p}[k]. \quad (39)$$
Plugging the definition (15) of $\mathcal{D}_{N,p}[k]$ into (39), we get
$$\mathrm{Var}[X] = \frac{2(1-p)}{1 + p - 2p^{(N+1)/2}} \sum_{k=1}^{(N-1)/2} k^2 p^k \le \frac{2(1-p)}{1 + p - 2p^{(N+1)/2}} \sum_{k=1}^{\infty} k^2 p^k. \quad (40)$$
Applying Lemma 9 in (40) and simplifying, we get that $\mathrm{Var}[X] \le \frac{2p(1+p)}{(1-p)^2 (1 + p - 2p^{(N+1)/2})}$. □

The next lemma will be used to show that our algorithm is differentially private with respect to single-user changes.

Lemma 10. Let $w_1, w_2$ be two independent random variables sampled from the truncated discrete Laplace distribution $\mathcal{D}_{N,p}$, where $N$ is any positive odd integer and $p \in (0, 1)$ is any real number, and let $z_1 = \frac{w_1}{k}$ and $z_2 = \frac{w_2}{k}$.
For any $\gamma$-smooth $y^* \in \binom{\mathcal{Y}}{2m}$ and for all $x_1, x_2, x'_1 \in [0, 1)$, if we denote $\tilde{x}_1 = \frac{\lfloor x_1 k \rfloor}{k}$, $\tilde{x}_2 = \frac{\lfloor x_2 k \rfloor}{k}$, and $\tilde{x}'_1 = \frac{\lfloor x'_1 k \rfloor}{k}$, then
$$\Pr[E(\tilde{x}_1, \tilde{x}_2 + z_2) = y^*] \le \frac{1+\gamma}{1-\gamma} \cdot p^{-k} \cdot \Pr[E(\tilde{x}'_1, \tilde{x}_2 + z_2) = y^*], \quad (41)$$
$$\Pr[E(\tilde{x}_1 + z_1, \tilde{x}_2) = y^*] \le \frac{1+\gamma}{1-\gamma} \cdot p^{-k} \cdot \Pr[E(\tilde{x}'_1 + z_1, \tilde{x}_2) = y^*], \quad (42)$$
and
$$\Pr[E(\tilde{x}_1 + z_1, \tilde{x}_2 + z_2) = y^*] \le \frac{1+\gamma}{1-\gamma} \cdot p^{-k} \cdot \Pr[E(\tilde{x}'_1 + z_1, \tilde{x}_2 + z_2) = y^*]. \quad (43)$$

Proof of Lemma 10. As in Lemma 7, we define the interval $I = \{-\frac{N-1}{2}, \dots, +\frac{N-1}{2}\}$. We define
$$B_{y^*, x_1} := \text{the number of subsets } S \text{ of } \{1, \dots, 2m\} \text{ of size } m \text{ for which } \sum_{i \in S} y^*_i = \lfloor x_1 k \rfloor \bmod N. \quad (44)$$
We similarly define $B_{y^*, x'_1}$ and $B_{y^*, x_2}$ by replacing $x_1$ in (44) by $x'_1$ and $x_2$, respectively.

Proof of Inequality (41). By Lemma 2, we have that
$$\Pr[E(\tilde{x}_1, \tilde{x}_2 + z_2) = y^*] = \frac{1}{|\mathcal{Y}|^{2(m-1)}} \sum_{\pi \in S_{2m}} \mathbb{1}\big[E\big(\tilde{x}_1, \tilde{x}_2 + z_2;\ \pi(y^*)_1, \dots, \pi(y^*)_{m-1}, \pi(y^*)_{m+1}, \dots, \pi(y^*)_{2m-1}\big) = \pi(y^*)\big]$$
$$= \frac{(m!)^2}{|\mathcal{Y}|^{2(m-1)}} \cdot B_{y^*, x_1} \cdot \Pr_{z_2 \sim \mathcal{D}_{N,p}}\Big[z_2 = \Big(\sum_{i \in [2m]} y^*_i - \lfloor x_1 k \rfloor - \lfloor x_2 k \rfloor\Big) \bmod N\Big] \quad (45)$$
$$= \frac{(m!)^2}{|\mathcal{Y}|^{2(m-1)}} \cdot B_{y^*, x_1} \cdot \mathcal{D}_{N,p}\Big[\Big(\sum_{i \in [2m]} y^*_i - \lfloor x_1 k \rfloor - \lfloor x_2 k \rfloor\Big) \bmod I\Big]. \quad (46)$$
By the same two steps, Lemma 2 also gives
$$\Pr[E(\tilde{x}'_1, \tilde{x}_2 + z_2) = y^*] = \frac{(m!)^2}{|\mathcal{Y}|^{2(m-1)}} \cdot B_{y^*, x'_1} \cdot \mathcal{D}_{N,p}\Big[\Big(\sum_{i \in [2m]} y^*_i - \lfloor x'_1 k \rfloor - \lfloor x_2 k \rfloor\Big) \bmod I\Big]. \quad (47)$$
Since $y^*$ is $\gamma$-smooth, Definition 2 implies that
$$\frac{B_{y^*, x_1}}{B_{y^*, x'_1}} \le \frac{1+\gamma}{1-\gamma}. \quad (48)$$
Applying Lemma 7 with $k = \sum_{i \in [2m]} y^*_i - \lfloor x'_1 k \rfloor - \lfloor x_2 k \rfloor$ and $t = \lfloor x'_1 k \rfloor - \lfloor x_1 k \rfloor$, and using the fact that $x_1, x'_1 \in [0, 1)$, gives
$$\frac{\mathcal{D}_{N,p}[(\sum_{i \in [2m]} y^*_i - \lfloor x_1 k \rfloor - \lfloor x_2 k \rfloor) \bmod I]}{\mathcal{D}_{N,p}[(\sum_{i \in [2m]} y^*_i - \lfloor x'_1 k \rfloor - \lfloor x_2 k \rfloor) \bmod I]} \le p^{-|\lfloor x'_1 k \rfloor - \lfloor x_1 k \rfloor|} \le p^{-k}. \quad (49)$$
Dividing (46) by (47) and using (48) and (49), we get Inequality (41).

Proof of Inequality (42). Similarly to (47), we have
$$\Pr[E(\tilde{x}_1 + z_1, \tilde{x}_2) = y^*] = \frac{(m!)^2}{|\mathcal{Y}|^{2(m-1)}} \cdot B_{y^*, x_2} \cdot \mathcal{D}_{N,p}\Big[\Big(\sum_{i \in [2m]} y^*_i - \lfloor x_1 k \rfloor - \lfloor x_2 k \rfloor\Big) \bmod I\Big] \quad (50)$$
and
$$\Pr[E(\tilde{x}'_1 + z_1, \tilde{x}_2) = y^*] = \frac{(m!)^2}{|\mathcal{Y}|^{2(m-1)}} \cdot B_{y^*, x_2} \cdot \mathcal{D}_{N,p}\Big[\Big(\sum_{i \in [2m]} y^*_i - \lfloor x'_1 k \rfloor - \lfloor x_2 k \rfloor\Big) \bmod I\Big]. \quad (51)$$
Dividing (50) by (51) and using (49), we get Inequality (42).

Proof of Inequality (43). By averaging over $z_2$ and applying Inequality (42) with $\tilde{x}_2$ replaced by $\tilde{x}_2 + z_2$ (for every fixed setting of $z_2$), we get Inequality (43). □

Lemma 11. Let $N$ be a positive odd integer and let $p \in (0, 1)$ and $q \in (0, 1]$ be real numbers. Let $b_1, \dots, b_n$ be i.i.d. random variables that are equal to 1 with probability $q$ and to 0 otherwise, let $w_1, \dots, w_n$ be i.i.d. random variables drawn from the truncated discrete Laplace distribution $\mathcal{D}_{N,p}$ independently of $b_1, \dots, b_n$, and let $z_i = \frac{b_i w_i}{k}$ for all $i \in [n]$. Then, for all $j \in [n]$, all $x_1, \dots, x_j, \dots, x_n, x'_j \in [0, 1)$, and all $S$, if we denote $\tilde{x}_i = \frac{\lfloor x_i k \rfloor}{k}$ for all $i \in [n]$ and $\tilde{x}'_j = \frac{\lfloor x'_j k \rfloor}{k}$, then the following inequality holds:
$$\Pr[E(\tilde{x}_1 + z_1, \dots, \tilde{x}_j + z_j, \dots, \tilde{x}_n + z_n) \in S] \le \frac{1+\gamma}{1-\gamma} \cdot \frac{p^{-k}}{1 - e^{-qn}} \cdot \Pr[E(\tilde{x}_1 + z_1, \dots, \tilde{x}'_j + z_j, \dots, \tilde{x}_n + z_n) \in S] + \eta + e^{-qn}, \quad (52)$$
for any $\gamma > \frac{6\sqrt{m}}{2^{2m}}$, $m \ge 4$, and $\eta = \frac{2m^2}{N} + \frac{18\sqrt{m}\,N^2}{\gamma^2 2^{2m}}$, and where the probabilities in (52) are over $z_1, \dots, z_n$ and the internal randomness of $E(\cdot)$.

Proof of Lemma 11. Let $A$ denote the event that there exists at least one $i \in [n]$ for which $b_i = 1$. Then,
$$\Pr[A] = 1 - (1-q)^n \ge 1 - e^{-qn}, \quad (53)$$
where the last inequality follows from the fact that $e^t \ge 1 + t$ for any real number $t$. To prove (52), it suffices to show a similar inequality conditioned on the event $A$, i.e.,
$$\Pr[E(\tilde{x}_1 + z_1, \dots, \tilde{x}_j + z_j, \dots, \tilde{x}_n + z_n) \in S \mid A] \le \frac{1+\gamma}{1-\gamma} \cdot p^{-k} \cdot \Pr[E(\tilde{x}_1 + z_1, \dots, \tilde{x}'_j + z_j, \dots, \tilde{x}_n + z_n) \in S \mid A] + \eta. \quad (54)$$
To see this, denote by $\bar{A}$ the complement of the event $A$ and assume that (54) holds. Then,
$$\Pr[E(\tilde{x}_1 + z_1, \dots, \tilde{x}_j + z_j, \dots, \tilde{x}_n + z_n) \in S]$$
$$= \Pr[A] \cdot \Pr[E(\tilde{x}_1 + z_1, \dots, \tilde{x}_j + z_j, \dots, \tilde{x}_n + z_n) \in S \mid A] + \Pr[\bar{A}] \cdot \Pr[E(\tilde{x}_1 + z_1, \dots, \tilde{x}_j + z_j, \dots, \tilde{x}_n + z_n) \in S \mid \bar{A}]$$
$$\le \Pr[E(\tilde{x}_1 + z_1, \dots, \tilde{x}_j + z_j, \dots, \tilde{x}_n + z_n) \in S \mid A] + e^{-qn} \quad (55)$$
$$\le \frac{1+\gamma}{1-\gamma} \cdot p^{-k} \cdot \Pr[E(\tilde{x}_1 + z_1, \dots, \tilde{x}'_j + z_j, \dots, \tilde{x}_n + z_n) \in S \mid A] + \eta + e^{-qn} \quad (56)$$
$$\le \frac{1+\gamma}{1-\gamma} \cdot p^{-k} \cdot \frac{\Pr[E(\tilde{x}_1 + z_1, \dots, \tilde{x}'_j + z_j, \dots, \tilde{x}_n + z_n) \in S]}{\Pr[A]} + \eta + e^{-qn}$$
$$\le \frac{1+\gamma}{1-\gamma} \cdot \frac{p^{-k}}{1 - e^{-qn}} \cdot \Pr[E(\tilde{x}_1 + z_1, \dots, \tilde{x}'_j + z_j, \dots, \tilde{x}_n + z_n) \in S] + \eta + e^{-qn}, \quad (57)$$
where (55) and (57) follow from (53), and where (56) follows from the assumption that (54) holds.

We thus turn to the proof of (54). Note that it suffices to prove this inequality for any fixed setting of $b_1, \dots, b_n$ satisfying the event $A$, i.e.,
$$\Pr[E(\tilde{x}_1 + z_1, \dots, \tilde{x}_j + z_j, \dots, \tilde{x}_n + z_n) \in S \mid b_1, \dots, b_n] \le \frac{1+\gamma}{1-\gamma} \cdot p^{-k} \cdot \Pr[E(\tilde{x}_1 + z_1, \dots, \tilde{x}'_j + z_j, \dots, \tilde{x}_n + z_n) \in S \mid b_1, \dots, b_n] + \eta, \quad (58)$$
and (54) then follows from (58) by averaging. Henceforth, we fix a setting of $b_1, \dots, b_n$ satisfying the event $A$.
Without loss of generality, we assume that $j = 1$. If $b_j = 0$, then the event $A$ implies that there exists $j_2 \ne j$ such that $b_{j_2} = 1$; without loss of generality, we assume that $j_2 = 2$. In order to show (58) for this setting of $b_1, \dots, b_n$, it suffices to show the same inequality where we also condition on any setting of $w_3, \dots, w_n$, i.e.,
$$\Pr[E(\tilde{x}_1 + z_1, \dots, \tilde{x}_j + z_j, \dots, \tilde{x}_n + z_n) \in S \mid b_1, \dots, b_n, w_3, \dots, w_n] \le \frac{1+\gamma}{1-\gamma} \cdot p^{-k} \cdot \Pr[E(\tilde{x}_1 + z_1, \dots, \tilde{x}'_j + z_j, \dots, \tilde{x}_n + z_n) \in S \mid b_1, \dots, b_n, w_3, \dots, w_n] + \eta. \quad (59)$$
Applying Lemma 4 with $j_1 = j = 1$ and $j_2 = 2$, and with inputs $\tilde{x}_3 + z_3, \dots, \tilde{x}_n + z_n$ for the non-selected players, we get that to prove (59), it suffices to show that for any set $T$, the following inequality holds:
$$\Pr[E(\tilde{x}_1 + z_1, \tilde{x}_2 + z_2) \in T] \le \frac{1+\gamma}{1-\gamma} \cdot p^{-k} \cdot \Pr[E(\tilde{x}'_1 + z_1, \tilde{x}_2 + z_2) \in T] + \eta. \quad (60)$$
We now prove (60):
$$\Pr[E(\tilde{x}_1 + z_1, \tilde{x}_2 + z_2) \in T] \le \Pr\Big[E(\tilde{x}_1 + z_1, \tilde{x}_2 + z_2) \notin \binom{\mathcal{Y}}{2m}_{\gamma\text{-smooth}}\Big] + \Pr\Big[E(\tilde{x}_1 + z_1, \tilde{x}_2 + z_2) \in T \cap \binom{\mathcal{Y}}{2m}_{\gamma\text{-smooth}}\Big]$$
$$\le \eta + \sum_{A \in T \cap \binom{\mathcal{Y}}{2m}_{\gamma\text{-smooth}}} \Pr[E(\tilde{x}_1 + z_1, \tilde{x}_2 + z_2) = A] \quad (61)$$
$$\le \eta + \sum_{A \in T \cap \binom{\mathcal{Y}}{2m}_{\gamma\text{-smooth}}} \frac{1+\gamma}{1-\gamma} \cdot p^{-k} \cdot \Pr[E(\tilde{x}'_1 + z_1, \tilde{x}_2 + z_2) = A] \quad (62)$$
$$\le \eta + \frac{1+\gamma}{1-\gamma} \cdot p^{-k} \cdot \Pr[E(\tilde{x}'_1 + z_1, \tilde{x}_2 + z_2) \in T],$$
with $\eta = \frac{2m^2}{N} + \frac{18\sqrt{m}\,N^2}{\gamma^2 2^{2m}}$, where (61) follows by averaging over all settings of $z_1, z_2$ and invoking Lemma 1, and (62) follows from Lemma 10 and the fact that at least one of $b_1, b_2$ is equal to 1. □

As a consequence, we obtain the following main theorem establishing differential privacy of Algorithm 1 with respect to single-user changes in the shuffled model:

Theorem 1. Let $\varepsilon > 0$ and $\delta \in (0, 1)$ be any real numbers.
There exists a protocol in the shuffled model that is $(\varepsilon, \delta)$-differentially private under single-user changes, has expected error $O(\frac{1}{\varepsilon}\sqrt{\log \frac{1}{\delta}})$, and in which each encoder sends $O(\log(\frac{n}{\varepsilon\delta}))$ messages of $O(\log(\frac{n}{\delta}))$ bits.

Proof of Theorem 1. In Algorithm 1, each user communicates at most $O(m \log N)$ bits, sent via $m$ messages. By Lemma 11, Algorithm 1 is $(\varepsilon, \delta)$-differentially private with respect to single-user changes if
$$\frac{1+\gamma}{1-\gamma} \cdot \frac{p^{-k}}{1 - e^{-qn}} \le e^{\varepsilon} \quad \text{and} \quad \frac{2m^2}{N} + \frac{18\sqrt{m}\,N^2}{\gamma^2 2^{2m}} + e^{-qn} \le \delta,$$
for any $\gamma > \frac{6\sqrt{m}}{2^{2m}}$ and $m \ge 4$. The error in our final estimate consists of two parts: the rounding error, which is $O(n/k)$ in the worst case, and the error due to the added truncated discrete Laplace noise, whose average absolute value is at most $O(\frac{\sqrt{qn}}{1-p})$ (this follows from Lemma 8 along with the facts that variance is additive for independent random variables and that, for any zero-mean random variable $X$, we have $\mathbb{E}[|X|] \le \sqrt{\mathrm{Var}[X]}$). The theorem now follows by choosing $p = 1 - \frac{\varepsilon}{10k}$, $q = \frac{10 \log(1/\delta)}{n}$, $m = 10 \log(\frac{nk}{\varepsilon\delta})$, $\gamma = \frac{\varepsilon}{10}$, $k = 10n$, and $N$ the first odd integer larger than $3kn + \frac{10}{\delta} + \frac{10}{\varepsilon}$. □

2.5 Resilience Against Colluding Users

In this section, we formalize the resilience of Algorithm 1 against a very large fraction of the users colluding with the server (thereby revealing their inputs and messages).

Lemma 12 (Resilient privacy under sum-preserving changes). Let $C \subseteq [n]$ denote the subset of colluding users. Then, for all $x_1, \dots, x_n$ and $x'_1, \dots, x'_n$ that are integer multiples of $1/k$ in the interval $[0, 1)$ and that satisfy $\sum_{j \notin C} x_j = \sum_{j \notin C} x'_j$ and $x'_j = x_j$ for all $j \in C$, and for all subsets $S$, the following inequality holds:
$$\Pr[E(x_1, \dots, x_n) \in S \mid E(x_i)\ \forall i \in C] \le \beta^{n-1} \cdot \Pr[E(x'_1, \dots, x'_n) \in S \mid E(x_i)\ \forall i \in C] + \frac{\beta^{n-1} - 1}{\beta - 1} \cdot \eta, \quad (63)$$
for $\beta = \frac{1+\gamma}{1-\gamma}$, any $\gamma > \frac{6\sqrt{m}}{2^{2m}}$, $m \ge 4$, and $\eta = \frac{2m^2}{N} + \frac{18\sqrt{m}\,N^2}{\gamma^2 2^{2m}}$, and where the probabilities in (63) are over the internal randomness of $E(\cdot)$.

Lemma 13 (Resilient privacy under single-user changes). Let $N$ be a positive odd integer and let $p \in (0, 1)$ and $q \in (0, 1]$ be real numbers. Let $C \subseteq [n]$ denote the subset of colluding users. Let $b_1, \dots, b_n$ be i.i.d. random variables that are equal to 1 with probability $q$ and to 0 otherwise, let $w_1, \dots, w_n$ be i.i.d. random variables drawn from the truncated discrete Laplace distribution $\mathcal{D}_{N,p}$ independently of $b_1, \dots, b_n$, and let $z_i = \frac{b_i w_i}{k}$ for all $i \in [n]$. If $|C| \le 0.9\,n$, then for all $j \notin C$, all $x_1, \dots, x_j, \dots, x_n, x'_j \in [0, 1)$, and all subsets $S$, if we denote $\tilde{x}_i = \frac{\lfloor x_i k \rfloor}{k}$ for all $i \in [n]$ and $\tilde{x}'_j = \frac{\lfloor x'_j k \rfloor}{k}$, then
$$\Pr[E(\tilde{x}_1 + z_1, \dots, \tilde{x}_j + z_j, \dots, \tilde{x}_n + z_n) \in S \mid E(\tilde{x}_i + z_i)\ \forall i \in C] \le \frac{1+\gamma}{1-\gamma} \cdot \frac{p^{-k}}{1 - e^{-q(n - |C|)}} \cdot \Pr[E(\tilde{x}_1 + z_1, \dots, \tilde{x}'_j + z_j, \dots, \tilde{x}_n + z_n) \in S \mid E(\tilde{x}_i + z_i)\ \forall i \in C] + \eta + e^{-q(n - |C|)}, \quad (64)$$
for any $\gamma > \frac{6\sqrt{m}}{2^{2m}}$, $m \ge 4$, and $\eta = \frac{2m^2}{N} + \frac{18\sqrt{m}\,N^2}{\gamma^2 2^{2m}}$, and where the probabilities in (64) are over $z_1, \dots, z_n$ and the internal randomness of $E(\cdot)$.

Proof of Lemma 12. We start by applying Lemma 4 in order to condition on the messages of all the colluding users. This allows us to reduce to the case where the messages of all users in $C$ are fixed and where we would like to prove the differential privacy guarantee with respect to sum-preserving changes on the inputs of the smaller subset $[n] \setminus C$ of (non-colluding) users. The rest of the proof follows along the same lines as the proof of Lemma 6, without any modification in the bounds. □

Proof of Lemma 13.
We start by applying Lemma 4 in order to condition on the messages of all colluding users. This allows us to reduce to the case where the messages of all users in $C$ are fixed and where we would like to prove the differential privacy guarantee with respect to single-user changes on the smaller subset $[n] \setminus C$ of (non-colluding) users. The rest of the proof follows along the same lines as the proof of Lemma 11. Note that the tail probability term $e^{-qn}$ in (52) is replaced by the slightly larger quantity $e^{-q(n - |C|)}$ in (64), as the event $A$ in the proof of Lemma 11 now has to be defined over the smaller set $[n] \setminus C$ of non-colluding users (and consequently the bounds in (53) and (57) are modified accordingly). □

3 Conclusion and Open Problems

Our work provides further evidence that the shuffled model of differential privacy [5, 7] is a fertile "middle ground" between local differential privacy and general multi-party computation, combining the scalability of local DP with the high utility and privacy of MPC. This makes it more feasible to design scalable machine learning systems in a federated setting.

The main open problem that we leave is how many messages $m$ are necessary to achieve differential privacy without a cost of $n^{\Omega(1)}$ in error or communication. It is shown in [4] that $m = 1$ is not enough, but we cannot rule out that $m = O(\log(nk))$ suffices to achieve error $1/k$ under sum-preserving changes, using our protocol unchanged. Another issue is that our current protocol fails to provide privacy with some small probability, for example if all random numbers chosen by the encoder happen to be zero. The question is whether this error probability can be eliminated by somehow changing the protocol, achieving pure differential privacy.

References

[1] M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang. Deep learning with differential privacy.
In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 308–318. ACM, 2016.

[2] B. Balle, J. Bell, A. Gascón, and K. Nissim. Improved summation from shuffling. http://arxiv.org/abs/1909.11225.

[3] B. Balle, J. Bell, A. Gascón, and K. Nissim. Differentially private summation with multi-message shuffling. arXiv e-prints, arXiv:1906.09116, June 2019.

[4] B. Balle, J. Bell, A. Gascón, and K. Nissim. The privacy blanket of the shuffle model. CoRR, abs/1903.02837, 2019.

[5] A. Bittau, Ú. Erlingsson, P. Maniatis, I. Mironov, A. Raghunathan, D. Lie, M. Rudominer, U. Kode, J. Tinnés, and B. Seefeld. Prochlo: Strong privacy for analytics in the crowd. In Proceedings of the 26th Symposium on Operating Systems Principles, Shanghai, China, October 28-31, 2017, pages 441–459. ACM, 2017.

[6] K. Bonawitz, V. Ivanov, B. Kreuter, A. Marcedone, H. B. McMahan, S. Patel, D. Ramage, A. Segal, and K. Seth. Practical secure aggregation for privacy-preserving machine learning. In B. M. Thuraisingham, D. Evans, T. Malkin, and D. Xu, editors, Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, CCS 2017, Dallas, TX, USA, October 30 - November 03, 2017, pages 1175–1191. ACM, 2017.

[7] A. Cheu, A. D. Smith, J. Ullman, D. Zeber, and M. Zhilyaev. Distributed differential privacy via shuffling. In Y. Ishai and V. Rijmen, editors, Advances in Cryptology - EUROCRYPT 2019 - 38th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Darmstadt, Germany, May 19-23, 2019, Proceedings, Part I, volume 11476 of Lecture Notes in Computer Science, pages 375–403. Springer, 2019.

[8] G. Cormode, M. Garofalakis, P. J. Haas, C. Jermaine, et al. Synopses for massive data: Samples, histograms, wavelets, sketches. Foundations and Trends in Databases, 4(1–3):1–294, 2011.

[9] Ú. Erlingsson, V. Feldman, I. Mironov, A. Raghunathan, K. Talwar, and A. Thakurta. Amplification by shuffling: From local to central differential privacy via anonymity. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2468–2479. SIAM, 2019.

[10] B. Ghazi, P. Manurangsi, R. Pagh, and A. Velingker. Private aggregation from fewer anonymous messages. arXiv preprint arXiv:1909.11073, 2019.

[11] S. Goryczka, L. Xiong, and V. Sunderam. Secure multiparty aggregation with differential privacy: A comparative study. In Proceedings of the Joint EDBT/ICDT 2013 Workshops, pages 155–163. ACM, 2013.

[12] Y. Ishai, E. Kushilevitz, R. Ostrovsky, and A. Sahai. Cryptography from anonymity. In 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06), pages 239–248. IEEE, 2006.

[13] M. Kearns. Efficient noise-tolerant learning from statistical queries. Journal of the ACM (JACM), 45(6):983–1006, 1998.

[14] H. B. McMahan, E. Moore, D. Ramage, S. Hampson, et al. Communication-efficient learning of deep networks from decentralized data. arXiv preprint, 2016.

[15] H. B. McMahan and D. Ramage. Federated learning: Collaborative machine learning without centralized training data. Google AI Blog, April 2017. https://ai.googleblog.com/2017/04/federated-learning-collaborative.html.

[16] D. P. Woodruff et al. Sketching as a tool for numerical linear algebra. Foundations and Trends in Theoretical Computer Science, 10(1–2):1–157, 2014.
