Local Privacy and Minimax Bounds: Sharp Rates for Probability Estimation


Authors: John C. Duchi, Michael I. Jordan, Martin J. Wainwright

John C. Duchi† jduchi@eecs.berkeley.edu
Michael I. Jordan†,∗ jordan@stat.berkeley.edu
Martin J. Wainwright†,∗ wainwright@stat.berkeley.edu
∗ Department of Statistics   † Department of Electrical Engineering and Computer Science
University of California, Berkeley
Berkeley, CA 94720

Abstract

We provide a detailed study of the estimation of probability distributions (discrete and continuous) in a stringent setting in which data is kept private even from the statistician. We give sharp minimax rates of convergence for estimation in these locally private settings, exhibiting fundamental tradeoffs between privacy and convergence rate, as well as providing tools to allow movement along the privacy-statistical efficiency continuum. One consequence of our results is that Warner's classical work on randomized response is an optimal way to perform survey sampling while maintaining privacy of the respondents.

1 Introduction

The original motivation for providing privacy in statistical problems, first discussed by Warner [27], was that "for reasons of modesty, fear of being thought bigoted, or merely a reluctance to confide secrets to strangers," respondents to surveys might prefer to be able to answer certain questions non-truthfully, or at least without the interviewer knowing their true response. With this motivation, Warner considered the problem of estimating the fractions of the population belonging to certain strata, which can be viewed as probability estimation within a multinomial model. In this paper, we revisit Warner's probability estimation problem, doing so within a theoretical framework that allows us to characterize optimal estimation under constraints on privacy.
We also apply our theoretical tools to a further probability estimation problem: that of nonparametric density estimation. In the large body of research on privacy and statistical inference [e.g., 27, 18, 12, 13, 19], a major focus has been on the problem of reducing disclosure risk: the probability that a member of a dataset can be identified given released statistics of the dataset. The literature has stopped short, however, of providing a formal treatment of disclosure risk that would permit decision-theoretic tools to be used in characterizing tradeoffs between the utility of achieving privacy and the utility associated with an inferential goal. Recently, a formal treatment of disclosure risk known as "differential privacy" has been proposed and studied in the cryptography, database, and theoretical computer science literatures [15, 2, 14]. Differential privacy has strong semantic privacy guarantees that make it a good candidate for declaring a statistical procedure or data collection mechanism private, and it has been the focus of a growing body of recent work [14, 17, 20, 28, 25, 7, 21, 9, 6, 11].

In this paper, we bring together the formal treatment of disclosure risk provided by differential privacy with the tools of minimax decision theory to provide a theoretical treatment of probability estimation under privacy constraints. Just as in classical minimax theory, we are able to provide lower bounds on the convergence rates of any estimator, in our case under a restriction to estimators that guarantee privacy. We complement these results with matching upper bounds that are achievable using computationally efficient algorithms.
We thus bring classical notions of privacy, as introduced by Warner [27], into contact with differential privacy and statistical decision theory, obtaining quantitative tradeoffs between privacy and statistical efficiency.

1.1 Setting and contributions

Let us develop a bit of basic formalism before describing, at a high level, our main results. We study procedures that receive private views $Z_1, \ldots, Z_n \in \mathcal{Z}$ of an original set of observations $X_1, \ldots, X_n \in \mathcal{X}$, where $\mathcal{X}$ is the (known) sample space. In our setting, $Z_i$ is drawn conditional on $X_i$ via the channel distribution $Q(Z_i \mid X_i = x, Z_j = z_j, j \neq i)$. Note that this channel allows "interactivity" [15], meaning that the distribution of $Z_i$ may depend on $X_i$ as well as the private views $Z_j$ of $X_j$ for $j \neq i$. Allowing interactivity, rather than forcing $Z_i$ to be independent of $Z_j$, in some cases allows more efficient algorithms, and in our setting means that our lower bounds are stronger.

We assume each of these private views $Z_i$ is $\alpha$-differentially private for the original data $X_i$. To give a precise definition of this type of privacy, known as "local privacy," let $\sigma(\mathcal{Z})$ be the $\sigma$-field on $\mathcal{Z}$ over which the channel $Q$ is defined. Then $Q$ provides $\alpha$-local differential privacy if
\[
\sup\bigg\{ \frac{Q(S \mid X_i = x, Z_j = z_j, j \neq i)}{Q(S \mid X_i = x', Z_j = z_j, j \neq i)} \;\bigg|\; S \in \sigma(\mathcal{Z}),\ z_j \in \mathcal{Z},\ \text{and } x, x' \in \mathcal{X} \bigg\} \le \exp(\alpha). \tag{1}
\]
In the non-interactive setting (in which we impose the constraint that the providers of the data release a private view independently of the other data providers), expression (1) simplifies to
\[
\sup_{S \in \sigma(\mathcal{Z})} \sup_{x, x' \in \mathcal{X}} \frac{Q(S \mid X = x)}{Q(S \mid X = x')} \le \exp(\alpha), \tag{2}
\]
a formulation of local privacy first proposed by Evfimievski et al. [17]. Although more complex to analyze, the likelihood ratio bound (1) is attractive for many reasons.
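For intuition, the non-interactive bound (2) can be checked mechanically for any finite channel. The following sketch (our own illustration; the function name and example channel are not from the paper) verifies the bound for a binary randomized-response channel:

```python
import math

def is_locally_private(Q, alpha):
    """Check the non-interactive alpha-local privacy bound (2) for a finite
    channel, where Q[x][z] = Q(Z = z | X = x).  For a finite output space it
    suffices to check singleton sets {z}: a ratio of sums is bounded by the
    largest ratio of its corresponding terms."""
    bound = math.exp(alpha)
    xs = list(Q)
    for z in Q[xs[0]]:
        for x in xs:
            for xp in xs:
                if Q[x][z] > bound * Q[xp][z]:
                    return False
    return True

# A binary randomized-response channel: report the truth w.p. 0.7.
Q = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.3, 1: 0.7}}
print(is_locally_private(Q, alpha=1.0))  # True: likelihood ratio 7/3 < e^1
print(is_locally_private(Q, alpha=0.5))  # False: 7/3 > e^0.5
```

The singleton reduction in the docstring is what makes the check finite; for the interactive definition (1) one would additionally have to quantify over the other participants' private views $z_j$.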
It means that any individual providing data guarantees his or her own privacy (no further processing or mistakes by a collection agency can compromise one's data), and the individual has plausible deniability about taking a value $x$, since any outcome $z$ is nearly as likely to have come from some other initial value $x'$. The likelihood ratio also controls the error rate in tests for the presence of points $x$ in the data [28]. All that is required is that the likelihood ratio (1) be bounded no matter the data provided by other participants.

In the current paper, we study minimax convergence rates when the data provided satisfies the local privacy guarantee (1). Our two main results quantify the penalty that must be paid when local privacy at a level $\alpha$ is provided in multinomial estimation and density estimation problems. At a high level, our first result implies that for estimation of a $d$-dimensional multinomial probability mass function, the effective sample size of any statistical estimation procedure decreases from $n$ to $n\alpha^2/d$ whenever $\alpha$ is a sufficiently small constant. A consequence of our results is that Warner's randomized response procedure [27] enjoys optimal sample complexity; it is interesting to note that even with the recent focus on privacy and statistical inference, the optimal privacy-preserving strategy for problems such as survey collection has been known for almost 50 years.

Our second main result, on density estimation, exhibits an interesting departure from standard minimax estimation results. If the density being estimated has $\beta$ continuous derivatives, then classical results on density estimation [e.g., 30, 29, 26] show that the minimax integrated squared error scales (in the number $n$ of samples) as $n^{-2\beta/(2\beta+1)}$.
In the locally private case, we show that, even when densities are bounded and well-behaved, there is a difference in the polynomial rate of convergence: we obtain a scaling of $(\alpha^2 n)^{-2\beta/(2\beta+2)}$. We give efficiently implementable algorithms that attain sharp upper bounds as companions to our lower bounds, which in some cases exhibit the necessity of non-trivial sampling strategies to guarantee privacy.

Notation: We summarize here the notation used throughout the paper. Given distributions $P$ and $Q$ defined on a space $\mathcal{X}$, each absolutely continuous with respect to a distribution $\mu$ (with corresponding densities $p$ and $q$), the KL divergence between $P$ and $Q$ is defined by
\[
D_{\mathrm{kl}}(P \,\|\, Q) := \int_{\mathcal{X}} dP \log \frac{dP}{dQ} = \int_{\mathcal{X}} p \log \frac{p}{q} \, d\mu.
\]
Letting $\sigma(\mathcal{X})$ denote an appropriate $\sigma$-field on $\mathcal{X}$, the total variation distance between the distributions $P$ and $Q$ on $\mathcal{X}$ is given by
\[
\|P - Q\|_{\mathrm{TV}} := \sup_{S \in \sigma(\mathcal{X})} |P(S) - Q(S)| = \frac{1}{2} \int_{\mathcal{X}} |p(x) - q(x)| \, d\mu(x).
\]
For random vectors $X$ and $Y$, where $X$ is distributed according to the distribution $P$ and $Y \mid X$ is distributed according to $Q(\cdot \mid X)$, let $M(\cdot) = \int Q(\cdot \mid x) \, dP(x)$ denote the marginal distribution of $Y$. The mutual information between $X$ and $Y$ is
\[
I(X; Y) := \mathbb{E}_P\big[ D_{\mathrm{kl}}\big( Q(\cdot \mid X) \,\|\, M(\cdot) \big) \big] = \int D_{\mathrm{kl}}\big( Q(\cdot \mid X = x) \,\|\, M(\cdot) \big) \, dP(x).
\]
A random variable $Y$ has $\mathrm{Laplace}(\alpha)$ distribution if its density is $p_Y(y) = \frac{\alpha}{2} \exp(-\alpha |y|)$, where $\alpha > 0$. For matrices $A, B \in \mathbb{R}^{d \times d}$, the notation $A \preceq B$ means that $B - A$ is positive semidefinite, and $A \prec B$ means that $B - A$ is positive definite. We write $a_n \lesssim b_n$ to denote that $a_n = O(b_n)$, and $a_n \asymp b_n$ to denote that $a_n = O(b_n)$ and $b_n = O(a_n)$. For a convex set $C \subset \mathbb{R}^d$, we let $\Pi_C$ denote the orthogonal projection operator onto $C$, i.e., $\Pi_C(v) := \mathop{\mathrm{argmin}}_{w \in C} \|v - w\|_2$.
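The quantities in this notation section translate directly into code. The following is a minimal NumPy sketch (our own illustration; the helper names are ours, and the simplex projection uses the standard sort-based algorithm, an instance of the operator $\Pi_C$ used repeatedly later in the paper):

```python
import numpy as np

def tv_distance(p, q):
    """Total variation between discrete distributions: (1/2) sum |p - q|."""
    return 0.5 * np.abs(np.asarray(p, float) - np.asarray(q, float)).sum()

def kl_divergence(p, q):
    """D_kl(P || Q) = sum_j p_j log(p_j / q_j), with the convention 0 log 0 = 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def project_simplex(v):
    """Euclidean projection Pi_C(v) onto the probability simplex, via the
    standard O(d log d) sort-and-threshold algorithm."""
    u = np.sort(v)[::-1]                 # coordinates in decreasing order
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u * np.arange(1, v.size + 1) > css)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

# Small demonstration on two distributions over 3 outcomes.
p, q = [0.5, 0.3, 0.2], [0.25, 0.25, 0.5]
print(tv_distance(p, q))                 # 0.3
w = project_simplex(np.array([1.2, -0.3, 0.4]))
print(w, w.sum())                        # a valid probability vector
```

Note that `tv_distance` implements the density form $\frac{1}{2}\int |p - q| \, d\mu$ of the definition above, specialized to counting measure.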
2 Background and Problem Formulation

In this section, we provide the necessary background on the minimax framework used throughout the paper. Further details on minimax techniques can be found in several standard sources [e.g., 3, 29, 30, 26]. We also reference our companion paper on parametric statistical inference under differential privacy constraints [11]; we make use of two theorems from that earlier paper, but in order to keep the current paper self-contained, we restate them in this section.

2.1 Minimax framework

Let $\mathcal{P}$ denote a class of distributions on the sample space $\mathcal{X}$, and let $\theta : \mathcal{P} \to \Theta$ denote a function defined on $\mathcal{P}$. The range $\Theta$ depends on the underlying statistical model; for example, for density estimation, $\Theta$ may consist of the set of probability densities defined on $[0,1]$. We let $\rho$ denote the semi-metric on the space $\Theta$ that we use to measure the error of an estimator for $\theta$, and let $\Phi : \mathbb{R}_+ \to \mathbb{R}_+$ be a non-decreasing function with $\Phi(0) = 0$ (for example, $\Phi(t) = t^2$).

Recalling that $\mathcal{Z}$ is the domain of the private variables $Z_i$, let $\hat{\theta} : \mathcal{Z}^n \to \Theta$ denote an arbitrary estimator for $\theta$. Let $\mathcal{Q}_\alpha$ denote the set of conditional (or channel) distributions guaranteeing $\alpha$-local privacy (1); then for any $Q \in \mathcal{Q}_\alpha$ we can define the minimax rate
\[
\mathfrak{M}_n(\theta(\mathcal{P}), \Phi \circ \rho, Q) := \inf_{\hat{\theta}} \sup_{P \in \mathcal{P}} \mathbb{E}_{P,Q}\Big[ \Phi\big( \rho( \hat{\theta}(Z_1, \ldots, Z_n), \theta(P) ) \big) \Big] \tag{3a}
\]
associated with estimating $\theta$ based on the private samples $(Z_1, \ldots, Z_n)$. In the definition (3a), the expectation is taken both with respect to the distribution $P$ on the variables $X_1, \ldots, X_n$ and the $\alpha$-private channel $Q$.
By taking the infimum over all possible channels $Q \in \mathcal{Q}_\alpha$, we obtain the central object of interest for this paper, the $\alpha$-private minimax rate for the family $\theta(\mathcal{P})$, defined as
\[
\mathfrak{M}_n(\theta(\mathcal{P}), \Phi \circ \rho, \alpha) := \inf_{\hat{\theta},\, Q \in \mathcal{Q}_\alpha} \sup_{P \in \mathcal{P}} \mathbb{E}_{P,Q}\Big[ \Phi\big( \rho( \hat{\theta}(Z_1, \ldots, Z_n), \theta(P) ) \big) \Big]. \tag{3b}
\]
A standard route for lower bounding the minimax risk (3a) is by reducing the estimation problem to the testing problem of identifying a point $\theta \in \Theta$ from a finite collection of well-separated points [30, 29]. Given an index set $\mathcal{V}$ of finite cardinality, the indexed family of distributions $\{P_\nu, \nu \in \mathcal{V}\} \subset \mathcal{P}$ is said to be a $2\delta$-packing of $\Theta$ if $\rho(\theta(P_\nu), \theta(P_{\nu'})) \ge 2\delta$ for all $\nu \neq \nu'$ in $\mathcal{V}$. The setup is that of a standard hypothesis testing problem: nature chooses $V \in \mathcal{V}$ uniformly at random, then data $(X_1, \ldots, X_n)$ are drawn from the $n$-fold conditional product distribution $P_\nu^n$, conditioning on $V = \nu$. The problem is to identify the member $\nu$ of the packing set $\mathcal{V}$.

In this work we have the additional complication that the statistician observes only the private samples $Z_1, \ldots, Z_n$. To that end, if we let $Q^n(\cdot \mid x_{1:n})$ denote the conditional distribution of $Z_1, \ldots, Z_n$ given that $X_1 = x_1, \ldots, X_n = x_n$, we define the marginal channel $M_\nu^n$ via the expression
\[
M_\nu^n(A) := \int Q^n(A \mid x_1, \ldots, x_n) \, dP_\nu(x_1, \ldots, x_n) \quad \text{for } A \in \sigma(\mathcal{Z}^n). \tag{4}
\]
Letting $\psi : \mathcal{Z}^n \to \mathcal{V}$ denote an arbitrary testing procedure (a measurable mapping $\mathcal{Z}^n \to \mathcal{V}$), we have the following minimax risk bound, whose two parts are known as Le Cam's two-point method and Fano's inequality. In the lemma, we let $\mathbb{P}$ denote the joint distribution of the random variable $V$ and the samples $Z_i$.

Lemma 1 (Minimax risk bound).
For the previously described estimation and testing problems, we have the lower bound
\[
\mathfrak{M}_n(\theta(\mathcal{P}), \Phi \circ \rho, Q) \ge \Phi(\delta) \inf_\psi \mathbb{P}\big( \psi(Z_1, \ldots, Z_n) \neq V \big), \tag{5}
\]
where the infimum is taken over all testing procedures. For a binary test specified by $\mathcal{V} = \{\nu, \nu'\}$,
\[
\inf_\psi \mathbb{P}\big( \psi(Z_1, \ldots, Z_n) \neq V \big) = \frac{1}{2} - \frac{1}{2} \big\| M_\nu^n - M_{\nu'}^n \big\|_{\mathrm{TV}}, \tag{6a}
\]
and more generally,
\[
\inf_\psi \mathbb{P}\big( \psi(Z_1, \ldots, Z_n) \neq V \big) \ge 1 - \frac{I(Z_1, \ldots, Z_n; V) + \log 2}{\log |\mathcal{V}|}. \tag{6b}
\]
For Le Cam's inequality (6a), see, e.g., Lemma 1 of Yu [30] or Theorem 2.2 of Tsybakov [26]; for Fano's inequality (6b), see Eq. (1) of Yang and Barron [29] or Chapter 2 of Cover and Thomas [8].

2.2 Information bounds

The main step in proving minimax lower bounds is to control the divergences involved in the lower bounds (6a) and (6b). In our companion paper [11], we present two results, which we now review, in which bounds on $\|M_\nu^n - M_{\nu'}^n\|_{\mathrm{TV}}$ and $I(Z_1, \ldots, Z_n; V)$ are obtained as a function of the amount of privacy provided and the distances between the underlying distributions $P_\nu$. The first result [11, Theorem 1 and Corollary 1] gives control over pairwise KL divergences between the marginals (4), which lends itself to application of Le Cam's method (6a) and simple applications of Fano's inequality (6b). The second result [11, Theorem 2 and Corollary 4] provides a variational upper bound on the mutual information $I(Z_1, \ldots, Z_n; V)$; it is variational in the sense that it requires optimization over the set of functions
\[
\mathcal{G}_\alpha := \Big\{ \gamma \in L^\infty(\mathcal{X}) \;\Big|\; \sup_{x \in \mathcal{X}} |\gamma(x)| \le \tfrac{1}{2}\big( e^\alpha - e^{-\alpha} \big) \Big\}.
\]
Here $L^\infty(\mathcal{X}) := \{ f : \mathcal{X} \to \mathbb{R} \mid \sup_{x \in \mathcal{X}} |f(x)| < \infty \}$ denotes the space of bounded functions on $\mathcal{X}$. Our bounds apply to any channel distribution $Q$ that is $\alpha$-locally private (1). For each $i \in \{1, \ldots, n\}$, let $P_{\nu,i}$ be the distribution of $X_i$ conditional on the random packing element $V = \nu$, and let $M_\nu^n$ be the marginal distribution (4) induced by passing $X_i$ through $Q$. Define the mixture distribution $\bar{P}_i = \frac{1}{|\mathcal{V}|} \sum_{\nu \in \mathcal{V}} P_{\nu,i}$, and the linear functionals $\varphi_{\nu,i} : L^\infty(\mathcal{X}) \to \mathbb{R}$ by
\[
\varphi_{\nu,i}(\gamma) := \int_{\mathcal{X}} \gamma(x) \big( dP_{\nu,i}(x) - d\bar{P}_i(x) \big).
\]
With this notation we can state the following proposition, which summarizes the results that we will need from Duchi et al. [11]:

Proposition 1 (Information bounds).
(a) For all $\alpha \ge 0$,
\[
D_{\mathrm{kl}}\big( M_\nu^n \,\|\, M_{\nu'}^n \big) + D_{\mathrm{kl}}\big( M_{\nu'}^n \,\|\, M_\nu^n \big) \le 4 \big( e^\alpha - 1 \big)^2 \sum_{i=1}^n \big\| P_{\nu,i} - P_{\nu',i} \big\|_{\mathrm{TV}}^2. \tag{7}
\]
(b) For all $\alpha \in \big[0, \log\big( \tfrac{1}{2} + \tfrac{1}{2}\sqrt{3} \big)\big]$,
\[
I(Z_1, \ldots, Z_n; V) \le C_\alpha \sum_{i=1}^n \frac{1}{|\mathcal{V}|} \sup_{\gamma \in \mathcal{G}_\alpha} \sum_{\nu \in \mathcal{V}} \big( \varphi_{\nu,i}(\gamma) \big)^2, \tag{8}
\]
where $C_\alpha := 4 / \big( e^{-\alpha} - 2(e^\alpha - 1) \big)$.

By combining Proposition 1 with Lemma 1, it is possible to derive sharp lower bounds for arbitrary estimation procedures under $\alpha$-local privacy. In particular, we may apply the bound (7) with Le Cam's method (6a), though lower bounds so obtained often lack the dimension dependence we might hope to capture (see Section 3.2 of Duchi et al. [11] for more discussion of this issue). The bound (8), which (up to constants) implies the bound (7), allows more careful control via suitably constructed packing sets $\mathcal{V}$ and application of Fano's inequality (6b), since the supremum controls a more global view of the structure of $\mathcal{V}$. In the rest of this paper, we demonstrate this combination for probability estimation problems.

3 Multinomial Estimation under Local Privacy

In this section we return to the classical problem of avoiding answer bias in surveys, the original motivation for studying local privacy [27].
We provide a detailed study of estimation of a multinomial probability under $\alpha$-local differential privacy, providing sharp upper and lower bounds for the minimax rate of convergence.

3.1 Minimax rates of convergence for multinomial estimation

Consider the probability simplex $\Delta_d := \big\{ \theta \in \mathbb{R}^d \mid \theta \ge 0, \sum_{j=1}^d \theta_j = 1 \big\}$ in $\mathbb{R}^d$. The multinomial estimation problem is defined as follows. Given a vector $\theta \in \Delta_d$, samples $X$ are drawn i.i.d. from a multinomial with parameters $\theta$, where $P_\theta(X = j) = \theta_j$ for $j \in \{1, \ldots, d\}$, and our goal is to estimate the probability vector $\theta$. In one of the earliest evaluations of privacy, Warner [27] studied the Bernoulli variant of this problem and proposed a simple privacy-preserving mechanism known as randomized response: for a given survey question, respondents provide a truthful answer with probability $p > 1/2$, and a lie with probability $1 - p$.

In our setting, we assume that the statistician sees random variables $Z_i$ that are all $\alpha$-locally private (1) for the corresponding samples $X_i$ from the multinomial. In this case, we have the following result, which characterizes the minimax rate of estimation of a multinomial in terms of the mean-squared error $\mathbb{E}[\|\hat{\theta} - \theta\|_2^2]$.

Theorem 1. There exist universal constants $0 < c_\ell \le c_u < 5$ such that for all $\alpha \in [0, 1/4]$, the minimax rate for multinomial estimation satisfies the bounds
\[
c_\ell \max_{k \in \{1, \ldots, d\}} \min\bigg\{ \frac{1}{k}, \frac{k \log(d/k)}{n \alpha^2} \bigg\} \le \mathfrak{M}_n\big( \Delta_d, \|\cdot\|_2^2, \alpha \big) \le c_u \min\bigg\{ 1, \frac{d}{n \alpha^2} \bigg\}. \tag{9}
\]
We provide a proof of the lower bound in Theorem 1 in Section 5. Simple estimation strategies achieve the lower bound, and we believe exploring them is interesting, so we provide them in the next section. Theorem 1 shows that providing local privacy can sometimes be quite detrimental to the quality of statistical estimators.
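To make the effective-sample-size consequence of Theorem 1 concrete, the following minimal sketch (our own illustration; constants $c_\ell, c_u$ are suppressed, so only rates are compared) checks that the private rate at sample size $n$ equals the classical rate at the deflated sample size $n\alpha^2/d$:

```python
def private_rate(n, d, alpha):
    # Rate from Theorem 1, constants suppressed: min{1, d / (n alpha^2)}.
    return min(1.0, d / (n * alpha ** 2))

def classical_rate(n):
    # Non-private maximum-likelihood bound: E||theta_hat - theta||_2^2 < 1/n.
    return 1.0 / n

n, d, alpha = 100_000, 10, 0.25
# Privacy at level alpha acts like shrinking the sample: n -> n * alpha^2 / d.
print(private_rate(n, d, alpha))                   # 0.0016
print(classical_rate(n * alpha ** 2 / d))          # 0.0016
```

For very small $n\alpha^2$ the `min` with 1 kicks in, reflecting that the trivial estimator (any fixed point of the simplex) already achieves constant risk.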
Indeed, let us compare this rate to the classical rate in which there is no privacy. Estimating $\theta$ via proportions (i.e., maximum likelihood), we have
\[
\mathbb{E}\big[ \|\hat{\theta} - \theta\|_2^2 \big] = \sum_{j=1}^d \mathbb{E}\big[ (\hat{\theta}_j - \theta_j)^2 \big] = \frac{1}{n} \sum_{j=1}^d \theta_j (1 - \theta_j) \le \frac{1}{n}\Big( 1 - \frac{1}{d} \Big) < \frac{1}{n}.
\]
On the other hand, an appropriate choice of $k$ in Theorem 1 implies that
\[
\min\bigg\{ 1, \frac{1}{\sqrt{n\alpha^2}}, \frac{d}{n\alpha^2} \bigg\} \lesssim \mathfrak{M}_n\big( \Delta_d, \|\cdot\|_2^2, \alpha \big) \lesssim \min\bigg\{ 1, \frac{d}{n\alpha^2} \bigg\}, \tag{10}
\]
which we show in Section 5. Thus, for suitably large sample sizes $n$, the effect of providing differential privacy at a level $\alpha$ causes a reduction in the effective sample size of $n \mapsto n\alpha^2/d$.

3.2 Private multinomial estimation strategies

An interesting consequence of the lower bound in (9) is the following fact, which we now demonstrate: Warner's classical randomized response mechanism [27] (with minor modification) achieves the optimal convergence rate. Thus, although it was not originally recognized as such, Warner's proposal is actually optimal in a strong sense. There are also other relatively simple estimation strategies that achieve the convergence rate $d/n\alpha^2$; for instance, as we show, the Laplace perturbation approach proposed by Dwork et al. [15] is one such strategy. Nonetheless, the ease of use of randomized response, coupled with our optimality results, provides support for it as a preferred method for private estimation of population probabilities.

Let us now prove that these strategies attain the optimal rate of convergence. Since there is a bijection between multinomial samples $x \in \{1, \ldots, d\}$ and the $d$ standard basis vectors $e_1, \ldots, e_d \in \mathbb{R}^d$, we abuse notation and represent samples $x$ in either form, as convenient, when designing estimation strategies.

Randomized response: In randomized response, we construct the private vector $Z \in \{0,1\}^d$ from a multinomial sample $x \in \{e_1, \ldots, e_d\}$ by sampling $d$ coordinates independently via the procedure
\[
[Z]_j = \begin{cases} x_j & \text{with probability } \dfrac{\exp(\alpha/2)}{1 + \exp(\alpha/2)} \\[1ex] 1 - x_j & \text{with probability } \dfrac{1}{1 + \exp(\alpha/2)}. \end{cases} \tag{11}
\]
We claim that this channel (11) is $\alpha$-differentially private: indeed, note that for any $x, x' \in \Delta_d$ and any vector $z \in \{0,1\}^d$ we have
\[
\frac{Q(Z = z \mid x)}{Q(Z = z \mid x')}
= \frac{ \Big( \frac{\exp(\alpha/2)}{1 + \exp(\alpha/2)} \Big)^{\|z - x\|_1} \Big( \frac{1}{1 + \exp(\alpha/2)} \Big)^{d - \|z - x\|_1} }{ \Big( \frac{\exp(\alpha/2)}{1 + \exp(\alpha/2)} \Big)^{\|z - x'\|_1} \Big( \frac{1}{1 + \exp(\alpha/2)} \Big)^{d - \|z - x'\|_1} }
= \exp\Big( \frac{\alpha}{2} \big( \|z - x\|_1 - \|z - x'\|_1 \big) \Big) \in \big[ \exp(-\alpha), \exp(\alpha) \big],
\]
where we used the triangle inequality to assert that $\big| \|z - x\|_1 - \|z - x'\|_1 \big| \le \|x - x'\|_1 \le 2$. We can compute the expected value and variance of the random variables $Z$; indeed, by the definition (11),
\[
\mathbb{E}[Z \mid x] = \frac{e^{\alpha/2}}{1 + e^{\alpha/2}}\, x + \frac{1}{1 + e^{\alpha/2}} (\mathbf{1} - x) = \frac{e^{\alpha/2} - 1}{e^{\alpha/2} + 1}\, x + \frac{1}{1 + e^{\alpha/2}} \mathbf{1}.
\]
Since the $Z$ are Bernoulli, we obtain the variance bound $\mathbb{E}[\|Z - \mathbb{E}[Z]\|_2^2] < d/4 + 1$. Recalling the definition of the projection $\Pi_{\Delta_d}$ onto the simplex, we arrive at the natural estimator
\[
\hat{\theta}_{\mathrm{part}} := \frac{1}{n} \sum_{i=1}^n \big( Z_i - \mathbf{1}/(1 + e^{\alpha/2}) \big) \frac{e^{\alpha/2} + 1}{e^{\alpha/2} - 1}
\quad \text{and} \quad
\hat{\theta} := \Pi_{\Delta_d}\big( \hat{\theta}_{\mathrm{part}} \big). \tag{12}
\]
The projection of $\hat{\theta}_{\mathrm{part}}$ onto the probability simplex can be done in time linear in the dimension $d$ of the problem [4], so the estimator (12) is efficiently computable. Since projections only decrease distance, vectors in the simplex are at most distance $\sqrt{2}$ apart, and $\mathbb{E}_\theta[\hat{\theta}_{\mathrm{part}}] = \theta$ by construction, we find that
\[
\mathbb{E}\big[ \|\hat{\theta} - \theta\|_2^2 \big] \le \min\Big\{ 2, \mathbb{E}\big[ \|\hat{\theta}_{\mathrm{part}} - \theta\|_2^2 \big] \Big\}
\le \min\Bigg\{ 2, \Big( \frac{d}{4n} + \frac{1}{n} \Big) \bigg( \frac{e^{\alpha/2} + 1}{e^{\alpha/2} - 1} \bigg)^2 \Bigg\}
\lesssim \min\Big\{ 1, \frac{d}{n\alpha^2} \Big\}.
\]

Laplace perturbation: We now turn to the strategy of Dwork et al. [15], in which we add Laplacian noise to the data. Let the vector $W \in \mathbb{R}^d$ have independent coordinates, each distributed as $\mathrm{Laplace}(\alpha/2)$.
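Both strategies of this subsection are short to simulate. The following NumPy sketch (our own illustration, not code from the paper; function names are ours, and `project_simplex` is the sort-based projection cited as [4]) implements the randomized response channel (11) with its estimator (12), alongside the Laplace perturbation variant just introduced (note that the paper's $\mathrm{Laplace}(\alpha/2)$, with density $\frac{\alpha}{4} e^{-(\alpha/2)|w|}$, corresponds to NumPy scale $2/\alpha$):

```python
import numpy as np

def project_simplex(v):
    # Euclidean projection onto the probability simplex (sort-based, cf. [4]).
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u * np.arange(1, v.size + 1) > css)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def randomized_response(X, d, alpha, rng):
    """Channel (11): flip each coordinate of the one-hot vector e_{X_i}
    independently, keeping it with probability exp(a/2)/(1 + exp(a/2))."""
    keep = np.exp(alpha / 2) / (1 + np.exp(alpha / 2))
    E = np.eye(d)[X]                        # one-hot rows e_{X_i}
    flips = rng.random(E.shape) >= keep     # True -> report 1 - x_j
    return np.where(flips, 1 - E, E)

def rr_estimator(Z, alpha):
    """Debiased estimator (12), projected onto the simplex."""
    ea = np.exp(alpha / 2)
    theta_part = (Z.mean(axis=0) - 1 / (1 + ea)) * (ea + 1) / (ea - 1)
    return project_simplex(theta_part)

def laplace_estimator(X, d, alpha, rng):
    """Dwork et al. [15]: Z_i = e_{X_i} + W_i with Laplace(alpha/2) noise."""
    Z = np.eye(d)[X] + rng.laplace(scale=2 / alpha, size=(X.size, d))
    return project_simplex(Z.mean(axis=0))

rng = np.random.default_rng(0)
theta = np.array([0.5, 0.3, 0.2])
X = rng.choice(3, size=20_000, p=theta)
alpha = 1.0
theta_rr = rr_estimator(randomized_response(X, 3, alpha, rng), alpha)
theta_lap = laplace_estimator(X, 3, alpha, rng)
print(theta_rr, theta_lap)                  # both close to theta
```

Both estimators are unbiased before projection, so at this sample size they recover $\theta$ to within a few hundredths, consistent with the $d/(n\alpha^2)$ rate.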
Then $x + W$ is $\alpha$-differentially private for $x \in \Delta_d$: the ratio of the densities $q(\cdot \mid x)$ and $q(\cdot \mid x')$ of $x + W$ and $x' + W$ is
\[
\frac{q(z \mid x)}{q(z \mid x')} = \frac{\exp\big( -(\alpha/2)\|x - z\|_1 \big)}{\exp\big( -(\alpha/2)\|x' - z\|_1 \big)} \in \big[ \exp(-\alpha), \exp(\alpha) \big].
\]
By defining the private data $Z_i = X_i + W_i$, where the $W_i \in \mathbb{R}^d$ are independent, we can define the partial estimator $\hat{\theta}_{\mathrm{part}} = \frac{1}{n} \sum_{i=1}^n Z_i$ and the projected estimator $\hat{\theta} := \Pi_{\Delta_d}(\hat{\theta}_{\mathrm{part}})$, similar to our randomized response construction (12). Then, by computing the variance of the noise samples $W_i$, it is clear that
\[
\mathbb{E}\big[ \|\hat{\theta} - \theta\|_2^2 \big] \le \min\Big\{ 2, \mathbb{E}\big[ \|\hat{\theta}_{\mathrm{part}} - \theta\|_2^2 \big] \Big\} \le \min\Big\{ 2, \frac{1}{n} + \frac{4d}{n\alpha^2} \Big\} \lesssim \min\Big\{ 1, \frac{d}{n\alpha^2} \Big\}.
\]
For small $\alpha$, the Laplace perturbation approach has a somewhat sharper convergence rate in terms of constants than that of the randomized response estimator (12), so in some cases it may be preferred. Nonetheless, the simplicity of explaining the sampling procedure (11) may argue for its use in scenarios such as survey sampling.

4 Density Estimation under Local Privacy

In this section, we turn to studying a nonparametric statistical problem, in which the effects of local differential privacy turn out to be somewhat more severe. We show that for the problem of density estimation, instead of just a multiplicative loss in the effective sample size as in the previous section (see also our paper [11]), imposing local differential privacy leads to a completely different convergence rate. This result holds even though we solve an estimation problem in which both the function estimated and the samples themselves belong to compact spaces.

In more detail, we consider estimation of probability densities $f : \mathbb{R} \to \mathbb{R}_+$, with $\int f(x)\,dx = 1$ and $f \ge 0$, defined on the real line, focusing on a standard family of densities of varying smoothness [e.g., 26].
Throughout this section, we let $\beta \in \mathbb{N}$ denote a fixed positive integer. Roughly, we consider densities that have bounded $\beta$th derivative, and we study density estimation using the squared $L^2$-norm $\|f\|_2^2 := \int f^2(x)\,dx$ as our metric; in formal terms, we impose these constraints in terms of Sobolev classes (e.g., [26, 16]). Let the countable collection of functions $\{\varphi_j\}_{j=1}^\infty$ be an orthonormal basis for $L^2([0,1])$. Then any function $f \in L^2([0,1])$ can be expanded as a sum $\sum_{j=1}^\infty \theta_j \varphi_j$ in terms of the basis coefficients $\theta_j := \int f(x) \varphi_j(x)\,dx$. By Parseval's theorem, we are guaranteed that $\{\theta_j\}_{j=1}^\infty \in \ell^2(\mathbb{N})$. The Sobolev space $\mathcal{F}_{\beta,C}$ is obtained by enforcing a particular decay rate on the coefficients $\theta$:

Definition 1 (Elliptical Sobolev space). For a given orthonormal basis $\{\varphi_j\}$ of $L^2([0,1])$, smoothness parameter $\beta > 1/2$ and radius $C$, the function class $\mathcal{F}_{\beta,C}$ is given by
\[
\mathcal{F}_{\beta,C} := \bigg\{ f \in L^2([0,1]) \;\bigg|\; f = \sum_{j=1}^\infty \theta_j \varphi_j \ \text{such that} \ \sum_{j=1}^\infty j^{2\beta} \theta_j^2 \le C^2 \bigg\}.
\]
If we choose the trigonometric basis as our orthonormal basis, then membership in the class $\mathcal{F}_{\beta,C}$ corresponds to certain smoothness constraints on the derivatives of $f$. More precisely, for $j \in \mathbb{N}$, consider the orthonormal basis for $L^2([0,1])$ of trigonometric functions:
\[
\varphi_0(t) = 1, \quad \varphi_{2j}(t) = \sqrt{2}\cos(2\pi j t), \quad \varphi_{2j+1}(t) = \sqrt{2}\sin(2\pi j t). \tag{13}
\]
Now consider a $\beta$-times almost everywhere differentiable function $f$ for which $|f^{(\beta)}(x)| \le C$ for almost every $x \in [0,1]$, satisfying $f^{(k)}(0) = f^{(k)}(1)$ for $k \le \beta - 1$. It is then known [26, Lemma A.3] that, uniformly for such $f$, there is a universal constant $c$ such that $f \in \mathcal{F}_{\beta,cC}$. Thus, Definition 1 (essentially) captures densities that have Lipschitz-continuous $(\beta - 1)$th derivative. In the sequel, we write $\mathcal{F}_\beta$ when the bound $C$ in $\mathcal{F}_{\beta,C}$ is $O(1)$.
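As a quick sanity check on the basis (13), the following sketch (our own illustration; note that the index 1 is skipped by the numbering in (13), so we test indices 0, 2, 3, 4, 5) numerically verifies orthonormality in $L^2([0,1])$ on a midpoint grid:

```python
import numpy as np

def trig_basis(j, t):
    """Trigonometric basis (13): phi_0 = 1, phi_{2m} = sqrt(2) cos(2 pi m t),
    phi_{2m+1} = sqrt(2) sin(2 pi m t) for m >= 1."""
    t = np.asarray(t, dtype=float)
    if j == 0:
        return np.ones_like(t)
    m = j // 2 if j % 2 == 0 else (j - 1) // 2
    trig = np.cos if j % 2 == 0 else np.sin
    return np.sqrt(2) * trig(2 * np.pi * m * t)

# Gram matrix <phi_i, phi_j> approximated by the midpoint rule; for sampled
# sinusoids on a uniform grid this is exact up to floating point.
t = (np.arange(20_000) + 0.5) / 20_000
Phi = np.stack([trig_basis(j, t) for j in (0, 2, 3, 4, 5)])
G = Phi @ Phi.T / t.size
print(np.round(G, 6))            # approximately the 5 x 5 identity
```

The same grid can be used to compute empirical coefficients $\theta_j$ of a given density and inspect the $j^{-\beta}$-type decay that Definition 1 encodes.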
It is well known [30, 29, 26] that the minimax risk for non-private estimation of densities in the class $\mathcal{F}_\beta$ scales as
\[
\mathfrak{M}_n\big( \mathcal{F}_\beta, \|\cdot\|_2^2, \infty \big) \asymp n^{-\frac{2\beta}{2\beta+1}}. \tag{14}
\]
The goal of this section is to understand how this minimax rate changes when we add an $\alpha$-privacy constraint to the problem. Our main result is to demonstrate that the classical rate (14) is no longer attainable when we require $\alpha$-local differential privacy. In particular, we prove a lower bound that is substantially larger. In Sections 4.2 and 4.3, we show how to achieve this lower bound using histogram and orthogonal series estimators.

4.1 Lower bounds on density estimation

We begin by giving our main lower bound on the minimax rate of estimation of densities when samples from the density must be kept differentially private. We provide the proof of the following theorem in Section 6.1.

Theorem 2. Consider the class of densities $\mathcal{F}_\beta$ defined using the trigonometric basis (13). For some $\alpha \in (0, 1/4]$, suppose that the $Z_i$ are $\alpha$-locally private (1) for the samples $X_i \in [0,1]$. There exists a constant $c > 0$, dependent only on $\beta$, such that
\[
\mathfrak{M}_n\big( \mathcal{F}_\beta, \|\cdot\|_2^2, \alpha \big) \ge c \big( n\alpha^2 \big)^{-\frac{2\beta}{2\beta+2}}. \tag{15}
\]
In comparison with the classical minimax rate (14), the lower bound (15) is substantially different, in that it involves a different polynomial exponent: namely, the exponent is $2\beta/(2\beta+1)$ in the classical case (14), while in the differentially private case (15), the exponent is reduced to $2\beta/(2\beta+2)$. For example, when we estimate Lipschitz densities, we have $\beta = 1$, and the rate degrades from $n^{-2/3}$ to $n^{-1/2}$. Moreover, this degradation occurs even though our samples are drawn from a compact space and the set $\mathcal{F}_\beta$ is also compact.
Interestingly, no estimator based on Laplace (or exponential) perturbation of the samples $X_i$ themselves can attain the rate of convergence (15). This can be established by connecting such a perturbation-based approach to the problem of nonparametric deconvolution. In their study of the deconvolution problem, Carroll and Hall [5] show that if the samples $X_i$ are perturbed by additive noise $W$, where the characteristic function $\phi_W$ of the additive noise has tails behaving as $|\phi_W(t)| = O(|t|^{-a})$ for some $a > 0$, then no estimator can deconvolve the samples $X + W$ and attain a rate of convergence better than $n^{-2\beta/(2\beta + 2a + 1)}$. Since the Laplace distribution's characteristic function has tails decaying as $t^{-2}$, no estimator based on perturbing the samples directly can attain a rate of convergence better than $n^{-2\beta/(2\beta+5)}$. If the lower bound (15) is attainable, we must then study privacy mechanisms that are not simply based on direct perturbation of the samples $\{X_i\}_{i=1}^n$.

4.2 Achievability by histogram estimators

We now turn to the mean-squared errors achieved by specific practical schemes, beginning with the special case of Lipschitz density functions ($\beta = 1$), for which it suffices to consider a private version of a classical histogram estimate. For a fixed positive integer $k \in \mathbb{N}$, let $\{\mathcal{X}_j\}_{j=1}^k$ denote the partition of $\mathcal{X} = [0,1]$ into the intervals $\mathcal{X}_j = [(j-1)/k, j/k)$ for $j = 1, 2, \ldots, k-1$, and $\mathcal{X}_k = [(k-1)/k, 1]$. Any histogram estimate of the density based on these $k$ bins can be specified by a vector $\theta \in k\Delta_k$, where we recall that $\Delta_k \subset \mathbb{R}_+^k$ is the probability simplex. Any such vector defines a density estimate via the sum
\[
f_\theta := \sum_{j=1}^k \theta_j \mathbf{1}_{\mathcal{X}_j},
\]
where $\mathbf{1}_E$ denotes the characteristic (indicator) function of the set $E$.
Let us now describe a mechanism that guarantees $\alpha$-local differential privacy. Given a data set $\{X_1, \ldots, X_n\}$ of samples from the distribution $f$, consider the vectors
\[
Z_i := e_k(X_i) + W_i, \quad \text{for } i = 1, 2, \ldots, n, \tag{16}
\]
where $e_k(X_i) \in \Delta_k$ is the $k$-vector with $j$th entry equal to one if $X_i \in \mathcal{X}_j$ and zeroes in all other entries, and $W_i$ is a random vector with i.i.d. $\mathrm{Laplace}(\alpha/2)$ entries. The variables $\{Z_i\}_{i=1}^n$ so defined are $\alpha$-locally differentially private for $\{X_i\}_{i=1}^n$. Using these private variables, we then form the density estimate $\hat{f} := f_{\hat{\theta}} = \sum_{j=1}^k \hat{\theta}_j \mathbf{1}_{\mathcal{X}_j}$ based on the vector
\[
\hat{\theta} := \Pi_k \bigg( \frac{k}{n} \sum_{i=1}^n Z_i \bigg), \tag{17}
\]
where $\Pi_k$ denotes the Euclidean projection operator onto the set $k\Delta_k$. By construction, we have $\hat{f} \ge 0$ and $\int_0^1 \hat{f}(x)\,dx = 1$, so $\hat{f}$ is a valid density estimate.

Proposition 2. Consider the estimate $\hat{f}$ based on $k = (n\alpha^2)^{1/4}$ bins in the histogram. For any 1-Lipschitz density $f : [0,1] \to \mathbb{R}_+$, we have
\[
\mathbb{E}_f\Big[ \big\| \hat{f} - f \big\|_2^2 \Big] \le 5 \big( \alpha^2 n \big)^{-\frac{1}{2}} + \sqrt{\alpha}\, n^{-3/4}. \tag{18}
\]
For any fixed $\alpha > 0$, the first term in the bound (18) dominates, and the $O((\alpha^2 n)^{-\frac{1}{2}})$ rate matches the minimax lower bound (15) in the case $\beta = 1$. Consequently, we have shown that the privatized histogram estimator is minimax-optimal for Lipschitz densities. This result provides the private analog of the classical result that histogram estimators are minimax-optimal (in the non-private setting) for Lipschitz densities. See Section 6.2 for a proof of Proposition 2. We remark in passing that a randomized response scheme parallel to that of Section 3.2 achieves the same rate of convergence; once again, randomized response is optimal.

4.3 Achievability by orthogonal projection estimators

For higher degrees of smoothness ($\beta > 1$), histogram estimators no longer achieve optimal rates in the classical setting.
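Before developing those estimators, the privatized histogram (16)-(17) of the previous subsection is simple to simulate. A minimal sketch (our own illustration; the uniform target density and random seed are arbitrary choices, and NumPy's Laplace scale $2/\alpha$ corresponds to the paper's $\mathrm{Laplace}(\alpha/2)$):

```python
import numpy as np

def project_scaled_simplex(v, scale):
    # Projection onto scale * Delta_k, via scale * Pi_simplex(v / scale).
    u = np.sort(v / scale)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u * np.arange(1, v.size + 1) > css)[0][-1]
    return scale * np.maximum(v / scale - css[rho] / (rho + 1.0), 0.0)

def private_histogram(X, alpha, rng):
    """Private histogram (16)-(17) with k = (n alpha^2)^(1/4) bins,
    as in Proposition 2.  Returns (theta_hat, k); the density estimate
    is f_hat = sum_j theta_hat[j] * 1_{X_j}."""
    n = X.size
    k = max(1, int(round((n * alpha ** 2) ** 0.25)))
    bins = np.minimum((X * k).astype(int), k - 1)        # bin index of X_i
    Z = np.eye(k)[bins] + rng.laplace(scale=2 / alpha, size=(n, k))   # (16)
    theta = project_scaled_simplex(k / n * Z.sum(axis=0), float(k))   # (17)
    return theta, k

rng = np.random.default_rng(1)
X = rng.random(50_000)                  # samples from the uniform density
theta, k = private_histogram(X, alpha=1.0, rng=rng)
print(k, theta)                         # theta close to the all-ones vector
```

For the uniform density every bin has height 1, so `theta` should be near $\mathbf{1}$ up to the Laplace noise, and it sums to $k$ by the projection onto $k\Delta_k$ (so the estimate integrates to 1).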
Accordingly, we now turn to developing estimators based on orthogonal series expansion, and show that even in the setting of local privacy, they can achieve the lower bound (15) for all orders of smoothness $\beta \ge 1$. Recall the elliptical Sobolev space (Definition 1), in which a function $f$ is represented in terms of its basis expansion $f = \sum_{j=1}^\infty \theta_j \varphi_j$, where $\theta_j = \int f(x)\varphi_j(x)\,dx$. This representation underlies the classical method of orthonormal series estimation: given a data set $\{X_1, X_2, \ldots, X_n\}$ drawn i.i.d. according to a density $f \in L^2([0,1])$, we first compute the empirical basis coefficients
$$\hat\theta_j = \frac{1}{n}\sum_{i=1}^n \varphi_j(X_i) \quad \text{and then set} \quad \hat f = \sum_{j=1}^k \hat\theta_j \varphi_j, \qquad (19)$$
where the value $k \in \mathbb{N}$ is chosen either a priori based on known properties of the estimation problem or adaptively, for example, using cross-validation [16, 26]. In the setting of local privacy, we consider a mechanism that, instead of releasing the vector of coefficients $(\varphi_1(X_i), \ldots, \varphi_k(X_i))$ for each data point, employs a random vector $Z_i = (Z_{i,1}, \ldots, Z_{i,k})$ with the property that $\mathbb{E}[Z_{i,j} \mid X_i] = \varphi_j(X_i)$ for each $j = 1, 2, \ldots, k$. We assume the basis functions are uniformly bounded; i.e., there exists a constant $B_0 = \sup_j \sup_x |\varphi_j(x)| < \infty$. (This boundedness condition holds for many standard bases, including the trigonometric basis underlying the classical Sobolev classes and the Walsh basis.) For a fixed number $B$ strictly larger than $B_0$ (to be specified momentarily), consider the following scheme:

Sampling strategy: Given a vector $\tau \in [-B_0, B_0]^k$, construct $\tilde\tau \in \{-B_0, B_0\}^k$ with coordinates $\tilde\tau_j$ sampled independently from $\{-B_0, B_0\}$ with probabilities $\frac{1}{2} - \frac{\tau_j}{2B_0}$ and $\frac{1}{2} + \frac{\tau_j}{2B_0}$, respectively. Sample $T$ from a Bernoulli($e^\alpha/(e^\alpha + 1)$) distribution.
Then choose $Z \in \{-B, B\}^k$ via
$$Z \sim \begin{cases} \text{Uniform on } \{z \in \{-B,B\}^k : \langle z, \tilde\tau\rangle > 0\} & \text{if } T = 1, \\ \text{Uniform on } \{z \in \{-B,B\}^k : \langle z, \tilde\tau\rangle \le 0\} & \text{if } T = 0. \end{cases} \qquad (20)$$

By inspection, $Z$ is $\alpha$-differentially private for any initial vector in the box $[-B_0, B_0]^k$, and moreover, the samples (20) are efficiently computable (for example by rejection sampling). Starting from the vector $\tau \in \mathbb{R}^k$ with $\tau_j = \varphi_j(X_i)$ in the above sampling strategy, iterated expectation yields
$$\mathbb{E}[[Z]_j \mid X = x] = c_k \frac{B}{B_0\sqrt{k}}\Big(\frac{e^\alpha}{e^\alpha+1} - \frac{1}{e^\alpha+1}\Big)\varphi_j(x) = c_k \frac{B}{B_0\sqrt{k}}\cdot\frac{e^\alpha - 1}{e^\alpha + 1}\varphi_j(x), \qquad (21)$$
for a constant $c_k$ that may depend on $k$ but is $O(1)$ and bounded away from 0. Consequently, to attain the unbiasedness condition $\mathbb{E}[[Z_i]_j \mid X_i] = \varphi_j(X_i)$, it suffices to take $B = O(B_0\sqrt{k}/\alpha)$. The full sampling and inferential scheme is as follows: given a data point $X_i$, we sample $Z_i$ according to the strategy (20), where we start from the vector $\tau = [\varphi_j(X_i)]_{j=1}^k$ and use the bound $B = B_0\sqrt{k}(e^\alpha+1)/(c_k(e^\alpha-1))$, where the constant $c_k$ is as in the expression (21). Defining the density estimator
$$\hat f := \frac{1}{n}\sum_{i=1}^n \sum_{j=1}^k Z_{i,j}\varphi_j, \qquad (22)$$
we obtain the following proposition.

Proposition 3. Let $\{\varphi_j\}$ be a $B_0$-uniformly bounded orthonormal basis for $L^2([0,1])$. There exists a constant $c$ (depending only on $C$ and $B_0$) such that the estimator (22) with $k = (n\alpha^2)^{1/(2\beta+2)}$ satisfies
$$\mathbb{E}_f\big[\|f - \hat f\|_2^2\big] \le c\,(n\alpha^2)^{-\frac{2\beta}{2\beta+2}}$$
for any $f$ in the Sobolev space $\mathcal{F}_{\beta,C}$.

See Section 6.3 for a proof. Propositions 2 and 3 make clear that the minimax lower bound (15) is sharp, as claimed. We have thus attained a variant of the known minimax density estimation rates (14), but with a polynomially worse sample complexity as soon as we require local differential privacy.
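To make the scheme concrete, here is a minimal sketch of the sampling strategy and the draw (20), implemented by rejection sampling as suggested above. The parameter values ($k$, $B_0$, $B$, $\alpha$) are illustrative choices of ours; in particular, we pass $B$ in directly rather than deriving it from the constant $c_k$, which the text defines only implicitly.

```python
import numpy as np

def privatize(tau, B0, B, alpha, rng):
    """One draw of Z from the scheme (20), given tau in [-B0, B0]^k."""
    k = len(tau)
    # Round each coordinate to {-B0, +B0} while preserving its conditional mean.
    p_plus = 0.5 + tau / (2.0 * B0)
    tau_tilde = np.where(rng.uniform(size=k) < p_plus, B0, -B0)
    # Biased coin selecting the halfspace aligned with tau_tilde.
    T = rng.uniform() < np.exp(alpha) / (np.exp(alpha) + 1.0)
    # Rejection sampling: draw uniform corners of {-B, B}^k until the sign matches.
    while True:
        z = np.where(rng.uniform(size=k) < 0.5, B, -B)
        s = float(z @ tau_tilde)
        if (T and s > 0) or (not T and s <= 0):
            return z

rng = np.random.default_rng(0)
tau = np.array([0.8, 0.0, 0.0, 0.0, 0.0])  # stand-in for [phi_j(X_i)]_j with B0 = 1
draws = np.stack([privatize(tau, B0=1.0, B=5.0, alpha=1.0, rng=rng)
                  for _ in range(4000)])
```

Averaging the draws exhibits the unbiasedness structure of (21): the first coordinate's mean is a positive multiple of $\tau_1$, while the remaining coordinates average to roughly zero.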
Before concluding our exposition, we make a few remarks on other potential density estimators. Our orthogonal-series estimator (22) (and sampling scheme (21)), while similar in spirit to that proposed by Wasserman and Zhou [28, Sec. 6], is different in that it is locally private and requires a different noise strategy to obtain both $\alpha$-local privacy and the optimal convergence rate. Finally, we consider the insufficiency of standard Laplace noise addition for estimation in the setting of this section. Consider the vector $[\varphi_j(X_i)]_{j=1}^k \in [-B_0, B_0]^k$. To make this vector $\alpha$-differentially private by adding an independent Laplace noise vector $W \in \mathbb{R}^k$, we must take $W_j \sim \text{Laplace}(\alpha/(B_0 k))$. The natural orthogonal series estimator (e.g. [28]) is to take $Z_i = [\varphi_j(X_i)]_{j=1}^k + W_i$, where the $W_i \in \mathbb{R}^k$ are independent Laplace noise vectors. We then use the density estimator (22), except with the Laplace-perturbed $Z_i$. However, this estimator suffers the following drawback (see Section 6.4):

Observation 1. Let $\hat f = \frac{1}{n}\sum_{i=1}^n \sum_{j=1}^k Z_{i,j}\varphi_j$, where the $Z_i$ are the Laplace-perturbed vectors of the previous paragraph. Assume the orthonormal basis $\{\varphi_j\}$ of $L^2([0,1])$ contains the constant function. There is a constant $c$ such that for any $k \in \mathbb{N}$, there is an $f \in \mathcal{F}_{\beta,2}$ such that
$$\mathbb{E}_f\big[\|f - \hat f\|_2^2\big] \ge c\,(n\alpha^2)^{-\frac{2\beta}{2\beta+3}}.$$

This lower bound shows that standard estimators based on adding Laplace noise to appropriate basis expansions of the data fail: there is a degradation in rate from $n^{-\frac{2\beta}{2\beta+2}}$ to $n^{-\frac{2\beta}{2\beta+3}}$. While this is not a formal proof that no approach based on Laplace perturbation can provide optimal convergence rates in our setting, it does suggest that finding such an estimator is non-trivial.
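The rate gap in Observation 1 is easy to see numerically. The sketch below minimizes the two-term bound $B_0^2 k^3/(n\alpha^2) + (k+1)^{-2\beta}$ over integer $k$ for $\beta = 1$ and fits the log-log slope in the effective sample size $m = n\alpha^2$; the grid of $m$ values is our own choice.

```python
import math

def best_bound(m, beta=1, B0=1.0):
    """Minimize the Observation 1 bound B0^2 k^3 / m + (k+1)^(-2*beta) over integer k."""
    return min(B0**2 * k**3 / m + (k + 1.0) ** (-2 * beta) for k in range(1, 3000))

m_lo, m_hi = 1e6, 1e12
slope = (math.log(best_bound(m_hi)) - math.log(best_bound(m_lo))) / math.log(m_hi / m_lo)
# slope comes out close to -2*beta/(2*beta + 3) = -2/5, the degraded exponent
```

The same brute-force minimization with the variance term $k^2/m$ in place of $k^3/m$ recovers the faster exponent $-2\beta/(2\beta+2)$ instead.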
5 Proof of Theorem 1

At a high level, our proof can be split into three steps, the first of which is relatively standard, while the second two exploit specific aspects of the local privacy set-up:

(1) The first step is a standard reduction, based on Lemma 1, from an estimation problem to a multi-way testing problem that involves discriminating between indices $\nu$ contained within some subset $\mathcal{V}$ of $\mathbb{R}^d$.

(2) The second step is an appropriate construction of the set $\mathcal{V} \subset \mathbb{R}^d$ such that each pair is $\delta$-separated and the resulting set is as large as possible (a maximal $\delta$-packing). In addition, our arguments require that, for a random variable $V$ uniformly distributed over $\mathcal{V}$, the covariance $\mathrm{Cov}(V)$ has relatively small operator norm.

(3) The final step is to apply Proposition 1 in order to control the mutual information associated with the testing problem. To do so, it is necessary to show that controlling the supremum over subsets of $L^\infty(\mathcal{X})$ in the bound (8) can be reduced to bounding the operator norm of $\mathrm{Cov}(V)$.

We have already described the reduction of Step 1 in Section 2.1. Accordingly, we turn to the second step.

Constructing a good packing: The following result on the binary hypercube $\mathcal{H}_d := \{0,1\}^d$ underlies our construction:

Lemma 2. There exist universal constants $c_1, c_2 \in (0,\infty)$ such that for each $k \in \{1, \ldots, d\}$, there is a set $\mathcal{V} \subset \mathcal{H}_d$ with the following properties:
(i) Any $\nu \in \mathcal{V}$ has exactly $k$ non-zero entries.
(ii) For any $\nu, \nu' \in \mathcal{V}$ with $\nu \ne \nu'$, the $\ell_1$-norm is lower bounded as $\|\nu - \nu'\|_1 \ge \max\{\lfloor k/4\rfloor, 1\}$.
(iii) The set $\mathcal{V}$ has cardinality $\mathrm{card}(\mathcal{V}) \ge (d/k)^{c_1 k}$.
(iv) For a random vector $V$ uniformly distributed over $\mathcal{V}$, we have $\mathrm{Cov}(V) \preceq c_2 \frac{k}{d} I_{d\times d}$.
The proof of Lemma 2 is based on the probabilistic method [1]: we show that a certain randomized procedure generates such a packing with strictly positive probability. Along the way, we use matrix Bernstein inequalities [23] and some approximation-theoretic ideas developed by Kühn [22]. We provide details in Appendix A.

We now construct a suitable packing of the unit simplex $\Delta_d$. Given an integer $k \in \{1, \ldots, d\}$, consider the packing $\mathcal{V} \subset \{0,1\}^d$ given by Lemma 2. For a fixed $\delta \in [0,1]$, consider the following family of vectors in $\mathbb{R}^d$:
$$\theta_\nu := \frac{\delta}{k}\nu + \frac{1-\delta}{d}\mathbf{1}, \quad \text{for each } \nu \in \mathcal{V}.$$
By inspection, each of these vectors belongs to the $d$-variate probability simplex (i.e., satisfies $\langle \mathbf{1}, \theta_\nu\rangle = 1$ and $\theta_\nu \ge 0$). Moreover, since the vector $\nu - \nu'$ can have at most $2k$ non-zero entries, we have $\|\nu - \nu'\|_1 \le \sqrt{2k}\,\|\nu - \nu'\|_2$. Combined with property (ii), we conclude that for universal constants $c, c' > 0$,
$$\|\nu - \nu'\|_2 \ge \frac{\|\nu - \nu'\|_1}{\sqrt{2k}} \ge \frac{c' k}{\sqrt{2k}} = c\sqrt{k}.$$
By the definition of $\theta_\nu$, we then have for a universal constant $c$ that
$$\|\theta_\nu - \theta_{\nu'}\|_2^2 = \frac{\delta^2}{k^2}\|\nu - \nu'\|_2^2 \ge \frac{c\,\delta^2}{k}. \qquad (23)$$

Upper bounding the mutual information: Our next step is to upper bound the mutual information $I(Z_1, \ldots, Z_n; V)$. Recall the definition of the linear functionals $\varphi_\nu$ from Proposition 1. Since $\mathcal{X} = \{1, 2, \ldots, d\}$, any element of $L^\infty(\mathcal{X})$ may be identified with a vector $\gamma \in \mathbb{R}^d$. Following this identification, we have
$$\varphi_\nu(\gamma) = \sum_{j=1}^d \theta_{\nu,j}\gamma_j - \frac{1}{|\mathcal{V}|}\sum_{\nu'\in\mathcal{V}}\sum_{j=1}^d \theta_{\nu',j}\gamma_j = \frac{\delta}{k}\langle \gamma, \nu - \mathbb{E}[V]\rangle,$$
where $V$ is a random variable distributed uniformly over $\mathcal{V}$. As a consequence, we have
$$\frac{1}{|\mathcal{V}|}\sum_{\nu\in\mathcal{V}}(\varphi_\nu(\gamma))^2 = \frac{\delta^2}{k^2}\gamma^\top \mathrm{Cov}(V)\gamma \le \frac{c_2\,\delta^2}{dk}\|\gamma\|_2^2,$$
where the final inequality follows from Lemma 2(iv).
For any $\gamma \in \mathcal{G}_\alpha$, we have the upper bound $\|\gamma\|_2^2 \le d(e^\alpha - e^{-\alpha})^2/4$, whence
$$\sup_{\gamma\in\mathcal{G}_\alpha}\frac{1}{|\mathcal{V}|}\sum_{\nu\in\mathcal{V}}(\varphi_\nu(\gamma))^2 \le \frac{c(e^\alpha - e^{-\alpha})^2\delta^2}{k}$$
for some universal constant $c$. Consequently, by applying the information inequality (8), we conclude that there is a universal constant $C$ such that
$$I(Z_1, \ldots, Z_n; V) \le \frac{Cn\alpha^2\delta^2}{k} \quad \text{for all } \alpha \le 1/4. \qquad (24)$$

Applying testing inequalities: The final step is to lower bound the testing error. Since the vectors $\{\theta_\nu, \nu \in \mathcal{V}\}$ are $c/\sqrt{k}$-separated in the $\ell_2$-norm (23) and Lemma 2 implies $\mathrm{card}(\mathcal{V}) \ge (d/k)^{c_2 k}$ for a constant $c_2$, Fano's inequality (6b) implies
$$\mathfrak{M}_n\big(\Delta_d, \|\cdot\|_2^2, \alpha\big) \ge \frac{c_0\delta^2}{k}\Big(1 - \frac{c_1 n\delta^2\alpha^2/k + \log 2}{c_2 k\log(d/k)}\Big), \qquad (25)$$
for universal constants $c_0, c_1, c_2$. We split the remainder of our analysis into cases, depending on the values of $(k, d)$.

Case 1: First, suppose that $(k, d)$ are large enough to guarantee that
$$c_2 k\log(d/k) \ge 3\log 2. \qquad (26)$$
In this case, if we set
$$\delta^2 = \min\Big\{1,\; \frac{c_2 k^2}{2c_1 n\alpha^2}\log\frac{d}{k}\Big\},$$
then we have
$$1 - \frac{c_1 n\delta^2\alpha^2/k + \log 2}{c_2 k\log(d/k)} \ge 1 - \frac{\frac{c_1 n\alpha^2}{k}\cdot\frac{c_2 k^2}{2c_1 n\alpha^2}\log\frac{d}{k} + \log 2}{c_2 k\log\frac{d}{k}} = 1 - \frac{\frac{1}{2}c_2 k\log\frac{d}{k} + \log 2}{c_2 k\log\frac{d}{k}} \ge 1 - \frac{1}{2} - \frac{1}{3} = \frac{1}{6}.$$
Combined with the Fano lower bound (25), this yields the claim (9) under condition (26).

Case 2: Alternatively, when $d$ is small enough that condition (26) is violated, we instead apply Le Cam's inequality (6a) to a two-point hypothesis. For our purposes, it suffices to consider the case $d = 2$, since for the purposes of lower bounds, any higher-dimensional problem is at least as hard as this case. Define the two vectors
$$\theta_1 = \frac{1+\delta}{2}e_1 + \frac{1-\delta}{2}e_2 \quad \text{and} \quad \theta_2 = \frac{1-\delta}{2}e_1 + \frac{1+\delta}{2}e_2.$$
By construction, each of these vectors belongs to the probability simplex in $\mathbb{R}^2$, and moreover, we have $\|\theta_1 - \theta_2\|_2^2 = 2\delta^2$.
Letting $P_j$ denote the multinomial distribution defined by $\theta_j$, we also have $\|P_1 - P_2\|_{\mathrm{TV}} = \|\theta_1 - \theta_2\|_1/2 = \delta$. In terms of the marginal measures $M_\nu^n$ defined in equation (4), Pinsker's inequality (e.g. [26, Lemma 2.5]) implies that
$$\|M_1^n - M_2^n\|_{\mathrm{TV}} \le \sqrt{D_{\mathrm{kl}}(M_1^n \,\|\, M_2^n)/2}.$$
Combined with Le Cam's inequality (6a) and the upper bound on KL divergences from Proposition 1, we find that the minimax risk is lower bounded as
$$\mathfrak{M}_n(\Delta_d, \|\cdot\|_2^2, \alpha) \ge \frac{\delta^2}{2}\Big(\frac{1}{2} - \frac{1}{2}\sqrt{2(e^\alpha-1)^2\, n\,\|P_1 - P_2\|_{\mathrm{TV}}^2}\Big),$$
where $P_i$ denotes the multinomial probability associated with the vector $\theta_i$. Since $\|P_1 - P_2\|_{\mathrm{TV}} = \delta$ by construction and $e^\alpha - 1 \le (5/4)\alpha$ for all $\alpha \in [0, 1/4]$, we have
$$\frac{\delta^2}{2}\Big(\frac{1}{2} - \frac{1}{2}\sqrt{25 n\alpha^2\delta^2/8}\Big) = \frac{\delta^2}{2}\Big(\frac{1}{2} - \frac{5}{4\sqrt{2}}\sqrt{n}\,\alpha\delta\Big).$$
Choosing $\delta = \min\{1, \sqrt{2}/(5\sqrt{n\alpha^2})\}$ guarantees that $\frac{1}{2} - \frac{5\sqrt{n}\alpha\delta}{4\sqrt{2}} \ge \frac{1}{4}$, and hence
$$\mathfrak{M}_n(\Delta_d, \|\cdot\|_2^2, \alpha) \ge \frac{1}{8}\min\Big\{1, \frac{2}{25 n\alpha^2}\Big\},$$
which completes the proof of Theorem 1.

Proof of inequality (10): We conclude by proving inequality (10). We distinguish three cases: (i) $n\alpha^2 < \log d$; (ii) $\log d \le n\alpha^2 \le \frac{1}{2}d^2$; (iii) $n\alpha^2 \ge \frac{1}{2}d^2$. In case (i), taking $k = 1$ in the lower bound (9) yields the lower bound 1. In case (ii), we set $k = \sqrt{n\alpha^2} \in [\sqrt{\log d}, d/\sqrt{2}]$, and we obtain
$$\min\Big\{\frac{1}{k}, \frac{k\log\frac{d}{k}}{n\alpha^2}\Big\} = \min\Big\{\frac{1}{\sqrt{n\alpha^2}},\; \frac{\log d}{\sqrt{n\alpha^2}} - \frac{\frac{1}{2}\log(n\alpha^2)}{\sqrt{n\alpha^2}}\Big\} \ge \frac{\log d}{\sqrt{n\alpha^2}} - \frac{\frac{1}{2}\big(2\log d + \log\frac{1}{2}\big)}{\sqrt{n\alpha^2}} = \frac{\log 2}{2\sqrt{n\alpha^2}}.$$
In the final case (iii), choosing $k = d/2$ yields the bound $\min\{2/d,\; d\log 2/(n\alpha^2)\} \ge d\log 2/(n\alpha^2)$.

Figure 1. Panel (a): illustration of the 1-Lipschitz continuous bump function $g_1$ used to pack $\mathcal{F}_\beta$ when $\beta = 1$. Panel (b): bump function $g_2$ with $|g_2''(x)| \le 1$ used to pack $\mathcal{F}_\beta$ when $\beta = 2$.
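As a quick numerical sanity check on the case analysis, in regime (ii) the maximized packing bound should dominate the $\log 2/(2\sqrt{n\alpha^2})$ floor derived above. The parameter values below are our own.

```python
import math

def packing_bound(d, m):
    """max over k in {1, ..., d//2} of min(1/k, k*log(d/k)/m), where m = n*alpha^2."""
    return max(min(1.0 / k, k * math.log(d / k) / m) for k in range(1, d // 2 + 1))

d, m = 100, 400.0                             # regime (ii): log(d) <= m <= d**2 / 2
value = packing_bound(d, m)                   # brute-force maximization over k
floor = math.log(2.0) / (2.0 * math.sqrt(m))  # claimed lower bound in case (ii)
```

The brute-force maximum sits comfortably above the floor, as the choice $k = \sqrt{n\alpha^2}$ in the argument above predicts.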
6 Proofs of Density Estimation Results

In this section, we provide the proofs of the results stated in Section 4 on density estimation. We defer the proofs of more technical results to the appendices. Throughout all proofs, we use $c$ to denote a universal constant whose value may change from line to line.

6.1 Proof of Theorem 2

As with our previous proof, the argument follows the general outline described at the beginning of Section 5. We remark that our proof is based on a local packing technique, a more classical approach than the metric entropy approach developed by Yang and Barron [29]. We do so because in the setting of local differential privacy we do not expect that global results on metric entropy will be generally useful; rather, we must carefully construct our packing set to control the mutual information, relating the geometry of the packing to the actual information communicated. In comparison with our proof of Theorem 1, the construction of a suitable packing of $\mathcal{F}_\beta$ is somewhat more challenging: the identification of densities with finite-dimensional vectors, which we require for our application of Proposition 1, is not immediately obvious. In all cases, we use the trigonometric basis to prove our lower bounds, so we may work directly with smooth density functions $f$.

Constructing a good packing: We begin by describing the collection of functions we use to prove our lower bound. Our construction and identification of density functions with vectors is essentially standard [26], but we specify some necessary conditions that we use later. First, let $g_\beta$ be a function defined on $[0,1]$ satisfying the following properties:

(a) The function $g_\beta$ is $\beta$-times differentiable, and $0 = g_\beta^{(i)}(0) = g_\beta^{(i)}(1/2) = g_\beta^{(i)}(1)$ for all $i < \beta$.
(b) The function $g_\beta$ is centered with $\int_0^1 g_\beta(x)\,dx = 0$, and there exist constants $c, c_{1/2} > 0$ such that
$$\int_0^{1/2} g_\beta(x)\,dx = -\int_{1/2}^1 g_\beta(x)\,dx = c_{1/2} \quad \text{and} \quad \int_0^1 \big(g_\beta^{(i)}(x)\big)^2\,dx \ge c \;\text{ for all } i < \beta.$$

(c) The function $g_\beta$ is non-negative on $[0, 1/2]$ and non-positive on $[1/2, 1]$, and Lebesgue measure is absolutely continuous with respect to the measures $G_j$, $j = 1, 2$, given by
$$G_1(A) = \int_{A\cap[0,1/2]} g_\beta(x)\,dx \quad \text{and} \quad G_2(A) = -\int_{A\cap[1/2,1]} g_\beta(x)\,dx. \qquad (27)$$

(d) Lastly, for almost every $x \in [0,1]$, we have $|g_\beta^{(\beta)}(x)| \le 1$ and $|g_\beta(x)| \le 1$.

The functions $g_\beta$ are smooth "bumps" that we use as pieces in our general construction; see Figure 1 for an illustration of such functions in the cases $\beta = 1$ and $\beta = 2$. Fix a positive integer $k$ (to be specified momentarily). Our proof makes use of the following result from our previous paper [11, Lemma 7]:

Lemma 3 (Re-stated from the paper [11]). There exists a packing $\mathcal{V}$ of size at least $\exp(c_0 k)$ of the hypercube $\{-1,1\}^k$ such that $\|\nu - \nu'\|_1 \ge c_1 k$ for all $\nu \ne \nu'$, and
$$\frac{1}{|\mathcal{V}|}\sum_{\nu\in\mathcal{V}}\nu\nu^\top \preceq c_2 I_{k\times k},$$
where $(c_0, c_1, c_2)$ are universal positive constants.

We now make use of this packing of the hypercube in order to construct a packing of our density class. For each $j \in \{1, \ldots, k\}$, define the function
$$g_{\beta,j}(x) := \frac{1}{k^\beta}\,g_\beta\Big(k\Big(x - \frac{j-1}{k}\Big)\Big)\,\mathbf{1}\Big(x \in \Big[\frac{j-1}{k}, \frac{j}{k}\Big]\Big).$$
Based on this definition, we define the family of densities
$$\Big\{f_\nu := 1 + \sum_{j=1}^k \nu_j g_{\beta,j} \;\text{ for } \nu \in \mathcal{V}\Big\} \subseteq \mathcal{F}_\beta. \qquad (28)$$
It is a standard fact [30, 26] that for any $\nu \in \mathcal{V}$, the function $f_\nu$ is $\beta$-times differentiable, satisfies $|f^{(\beta)}(x)| \le 1$ for all $x$, and $\|f_\nu - f_{\nu'}\|_2^2 \ge ck^{-2\beta}$. Consequently, the class (28) is a $(ck^{-\beta})$-packing of $\mathcal{F}_\beta$ with cardinality at least $\exp(c_0 k)$.
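For intuition, the packing densities (28) can be built explicitly in the case $\beta = 1$. The tent function below is our own concrete instantiation of $g_1$ consistent with properties (a)–(d) (1-Lipschitz, mean zero, positive on $[0, 1/2]$, negative on $[1/2, 1]$); the text does not fix a specific bump.

```python
import numpy as np

def g1(x):
    """A 1-Lipschitz bump: tent on [0, 1/2], negated mirror image on [1/2, 1]."""
    x = np.asarray(x, dtype=float)
    return np.where(x < 0.5, np.minimum(x, 0.5 - x), -np.minimum(x - 0.5, 1.0 - x))

def f_nu(x, nu):
    """Packing density f_nu = 1 + sum_j nu_j g_{1,j}, with k = len(nu) bumps (beta = 1)."""
    k = len(nu)
    x = np.asarray(x, dtype=float)
    j = np.minimum((x * k).astype(int), k - 1)          # index of the bin containing x
    return 1.0 + np.asarray(nu)[j] / k * g1(k * x - j)  # g_{1,j}(x) = g1(k(x - (j-1)/k)) / k

xs = np.linspace(0.0, 1.0, 20001)
vals = f_nu(xs, [1, -1, 1, 1, -1])                      # one sign pattern nu in {-1, 1}^5
```

Each $f_\nu$ is a genuine density: it is bounded below by $1 - 1/(4k) > 0$, and it integrates to one because each rescaled bump has mean zero.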
Controlling the operator norm of the packing: Having constructed a suitable packing of the space $\mathcal{F}_\beta$, we now turn to bounding the mutual information associated with a certain multi-way hypothesis testing problem. Suppose that an index $V$ is drawn uniformly at random from $\mathcal{V}$, and conditional on $V = \nu$, the data points $X_i$ are drawn i.i.d. according to the density $f_\nu$. The data $\{X_1, \ldots, X_n\}$ are then passed through an $\alpha$-locally private distribution $Q$, yielding the perturbed quantities $\{Z_1, \ldots, Z_n\}$. The following lemma bounds the mutual information between the random index $V$ and the outputs $Z_i$.

Lemma 4. There exists a universal constant $c$ such that for any $\alpha$-locally private (1) conditional distribution $Q$, the mutual information is upper bounded as
$$I(Z_1, \ldots, Z_n; V) \le \frac{cn\alpha^2}{k^{2\beta+1}}.$$

The proof of this claim is fairly involved, so we defer it to Appendix B. We remark, however, that standard mutual information bounds [30, 26] show $I(Z_1, \ldots, Z_n; V) \lesssim n/k^{2\beta}$; our bound is thus essentially a factor of the "dimension" $k$ tighter.

Applying testing inequalities: The remainder of the proof is an application of Fano's inequality. In particular, we apply Lemma 1 with our $k^{-\beta}$-packing of $\mathcal{F}_\beta$ in $\|\cdot\|_2$ of size $\exp(c_0 k)$, and we find that for any $\alpha$-locally private channel $Q$, there are universal constants $c_0, c_1, c_2$ such that
$$\mathfrak{M}_n\big(\mathcal{F}_\beta, \|\cdot\|_2^2, Q\big) \ge \frac{c_0}{k^{2\beta}}\Big(1 - \frac{I(Z_{1:n}; V) + \log 2}{c_1 k}\Big) \ge \frac{c_0}{k^{2\beta}}\Big(1 - \frac{c_2 n\alpha^2 k^{-2\beta-1} + \log 2}{c_1 k}\Big).$$
Choosing $k_{n,\alpha,\beta} = (2c_2 n\alpha^2)^{\frac{1}{2\beta+2}}$ ensures that the quantity inside the parentheses is a strictly positive constant. As a consequence, there are universal constants $c, c' > 0$ such that
$$\mathfrak{M}_n\big(\mathcal{F}_\beta, \|\cdot\|_2^2, \alpha\big) \ge \frac{c}{k_{n,\alpha,\beta}^{2\beta}} = c'\,(n\alpha^2)^{-\frac{2\beta}{2\beta+2}},$$
as claimed.
6.2 Proof of Proposition 2

Note that the operator $\Pi_k$ performs a Euclidean projection of the vector $(k/n)\sum_{i=1}^n Z_i$ onto the scaled probability simplex, thus projecting $\hat f$ onto the set of probability densities. Given the non-expansivity of Euclidean projection, this operation can only decrease the error $\|\hat f - f\|_2^2$. Consequently, it suffices to bound the error of the unprojected estimator; to reduce notational overhead we retain our previous notation $\hat\theta$ for the unprojected version. Using this notation, we have
$$\mathbb{E}\big[\|\hat f - f\|_2^2\big] \le \sum_{j=1}^k \mathbb{E}_f\bigg[\int_{(j-1)/k}^{j/k}\big(f(x) - \hat\theta_j\big)^2\,dx\bigg].$$
By expanding this expression and noting that the independent noise variables $W_{ij} \sim \text{Laplace}(\alpha/2)$ have zero mean, we obtain
$$\mathbb{E}\big[\|\hat f - f\|_2^2\big] \le \sum_{j=1}^k \mathbb{E}_f\bigg[\int_{(j-1)/k}^{j/k}\Big(f(x) - \frac{k}{n}\sum_{i=1}^n[e_k(X_i)]_j\Big)^2 dx\bigg] + \sum_{j=1}^k \int_{(j-1)/k}^{j/k}\mathbb{E}\bigg[\Big(\frac{k}{n}\sum_{i=1}^n W_{ij}\Big)^2\bigg]\,dx = \sum_{j=1}^k \int_{(j-1)/k}^{j/k}\mathbb{E}_f\bigg[\Big(f(x) - \frac{k}{n}\sum_{i=1}^n[e_k(X_i)]_j\Big)^2\bigg]\,dx + k\cdot\frac{1}{k}\cdot\frac{4k^2}{n\alpha^2}. \qquad (29)$$
We bound the error term inside the expectation (29). Defining $p_j := P_f(X \in \mathcal{X}_j) = \int_{\mathcal{X}_j} f(x)\,dx$, we have
$$k\,\mathbb{E}_f\big[[e_k(X)]_j\big] = kp_j = k\int_{\mathcal{X}_j}f(x)\,dx \in \Big[f(x) - \frac{1}{k},\; f(x) + \frac{1}{k}\Big] \quad \text{for any } x \in \mathcal{X}_j,$$
by the Lipschitz continuity of $f$. Thus, expanding the bias and variance of the integrated expectation above, we find that
$$\mathbb{E}_f\bigg[\Big(f(x) - \frac{k}{n}\sum_{i=1}^n[e_k(X_i)]_j\Big)^2\bigg] \le \frac{1}{k^2} + \mathrm{Var}\Big(\frac{k}{n}\sum_{i=1}^n[e_k(X_i)]_j\Big) = \frac{1}{k^2} + \frac{k^2}{n}\mathrm{Var}\big([e_k(X)]_j\big) = \frac{1}{k^2} + \frac{k^2}{n}p_j(1 - p_j).$$
Recalling the inequality (29), we obtain
$$\mathbb{E}_f\big[\|\hat f - f\|_2^2\big] \le \sum_{j=1}^k\int_{(j-1)/k}^{j/k}\Big(\frac{1}{k^2} + \frac{k^2}{n}p_j(1-p_j)\Big)\,dx + \frac{4k^2}{n\alpha^2} = \frac{1}{k^2} + \frac{4k^2}{n\alpha^2} + \frac{k}{n}\sum_{j=1}^k p_j(1-p_j).$$
Since $\sum_{j=1}^k p_j = 1$, we find that
$$\mathbb{E}_f\big[\|\hat f - f\|_2^2\big] \le \frac{1}{k^2} + \frac{4k^2}{n\alpha^2} + \frac{k}{n},$$
and choosing $k = (n\alpha^2)^{1/4}$ yields the claim.
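The mechanism analyzed above is easy to simulate end to end. In the sketch below, the density $f(x) = x + 1/2$ (1-Lipschitz on $[0,1]$), the Laplace scale $2/\alpha$, and the sample size are our own illustrative choices, and the projection $\Pi_k$ is implemented with the standard sort-based simplex projection.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (standard sort-based rule)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - 1.0)[0][-1]
    tau = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - tau, 0.0)

def private_histogram(x, k, alpha, rng):
    """Release Z_i = e_k(X_i) + Laplace noise, then project k * mean(Z) onto k * simplex."""
    n = len(x)
    z = np.zeros((n, k))
    z[np.arange(n), np.minimum((x * k).astype(int), k - 1)] = 1.0  # one-hot e_k(X_i)
    z += rng.laplace(scale=2.0 / alpha, size=(n, k))               # illustrative calibration
    return k * project_simplex(z.mean(axis=0))                     # hat{theta} in k*Delta_k

rng = np.random.default_rng(0)
n, alpha = 200_000, 1.0
k = max(1, int((n * alpha**2) ** 0.25))       # bin count from Proposition 2
u = rng.uniform(size=n)
x = (-1.0 + np.sqrt(1.0 + 8.0 * u)) / 2.0     # inverse-CDF samples from f(x) = x + 1/2
theta_hat = private_histogram(x, k, alpha, rng)
```

By construction $\sum_j \hat\theta_j = k$ and $\hat\theta \ge 0$, so the piecewise-constant $\hat f$ is a valid density, and its per-bin values track the true bin averages $k\int_{\mathcal{X}_j} f$.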
6.3 Proof of Proposition 3

We begin by fixing $k \in \mathbb{N}$; we will optimize the choice of $k$ shortly. Recall that, since $f \in \mathcal{F}_{\beta,C}$, we have $f = \sum_{j=1}^\infty \theta_j\varphi_j$ for $\theta_j = \int f\varphi_j$. Thus we may define $\bar Z_j = \frac{1}{n}\sum_{i=1}^n Z_{i,j}$ for each $j \in \{1, \ldots, k\}$, and we have
$$\|\hat f - f\|_2^2 = \sum_{j=1}^k(\theta_j - \bar Z_j)^2 + \sum_{j=k+1}^\infty \theta_j^2.$$
Since $f \in \mathcal{F}_{\beta,C}$, we are guaranteed that $\sum_{j=1}^\infty j^{2\beta}\theta_j^2 \le C^2$, and hence
$$\sum_{j>k}\theta_j^2 = \sum_{j>k}\frac{j^{2\beta}\theta_j^2}{j^{2\beta}} \le \frac{1}{k^{2\beta}}\sum_{j>k}j^{2\beta}\theta_j^2 \le \frac{C^2}{k^{2\beta}}.$$
For the indices $j \le k$, we note that by assumption, $\mathbb{E}[Z_{i,j}] = \int\varphi_j f = \theta_j$, and since $|Z_{i,j}| \le B$, we have
$$\mathbb{E}\big[(\theta_j - \bar Z_j)^2\big] = \frac{1}{n}\mathrm{Var}(Z_{1,j}) \le \frac{B^2}{n} = \frac{B_0^2\,k}{c_k^2\,n}\Big(\frac{e^\alpha+1}{e^\alpha-1}\Big)^2,$$
where $c_k = \Omega(1)$ is the constant in expression (21). Putting together the pieces, the mean-squared $L^2$-error is upper bounded as
$$\mathbb{E}_f\big[\|\hat f - f\|_2^2\big] \le c\Big(\frac{k^2}{n\alpha^2} + \frac{1}{k^{2\beta}}\Big),$$
where $c$ is a constant depending on $B_0$, $c_k$, and $C$. Choosing $k = (n\alpha^2)^{1/(2\beta+2)}$ completes the proof.

6.4 Proof of Observation 1

We begin by noting that for $f = \sum_j \theta_j\varphi_j$, by definition of $\hat f = \sum_j\hat\theta_j\varphi_j$ we have
$$\mathbb{E}\big[\|f - \hat f\|_2^2\big] = \sum_{j=1}^k\mathbb{E}\big[(\theta_j - \hat\theta_j)^2\big] + \sum_{j\ge k+1}\theta_j^2 = \sum_{j=1}^k\frac{B_0^2 k^2}{n\alpha^2} + \sum_{j\ge k+1}\theta_j^2 = \frac{B_0^2 k^3}{n\alpha^2} + \sum_{j\ge k+1}\theta_j^2.$$
Without loss of generality, let us assume $\varphi_1 = 1$ is the constant function. Then $\int\varphi_j = 0$ for all $j > 1$, and by defining the true function $f = \varphi_1 + (k+1)^{-\beta}\varphi_{k+1}$, we have $f \in \mathcal{F}_{\beta,2}$ and $\int f = 1$, and moreover,
$$\mathbb{E}\big[\|f - \hat f\|_2^2\big] \ge \frac{B_0^2 k^3}{n\alpha^2} + (k+1)^{-2\beta} \ge C_{\beta,B_0}\,(n\alpha^2)^{-\frac{2\beta}{2\beta+3}},$$
where $C_{\beta,B_0}$ is a constant depending on $\beta$ and $B_0$. This final lower bound follows by minimizing over all $k$. (If $(k+1)^{-\beta}B_0 > 1$, we can rescale $\varphi_{k+1}$ by $B_0$ to achieve the same result and guarantee that $f \ge 0$.)
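For the record, the choice of $k$ in the proof of Proposition 3 exactly balances the variance and bias terms $k^2/(n\alpha^2)$ and $k^{-2\beta}$, and the balanced value is the claimed rate. The tiny check below, for $\beta = 1$ and an effective sample size $m = n\alpha^2$ chosen so that $k$ is an integer, is our own.

```python
beta = 1
m = 10_000                                # effective sample size n * alpha**2 (our choice)
k = round(m ** (1.0 / (2 * beta + 2)))    # k = m^{1/4} = 10 for this m
variance_term = k**2 / m                  # private estimation variance, up to constants
bias_term = k ** (-2 * beta)              # Sobolev truncation bias, up to constants
rate = m ** (-2.0 * beta / (2 * beta + 2))  # claimed minimax rate (n*alpha^2)^{-2b/(2b+2)}
```

At the balancing $k$, both terms equal the rate, so the sum is within a factor of two of $m^{-2\beta/(2\beta+2)}$.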
7 Discussion

We have linked minimax analysis from statistical decision theory with differential privacy, bringing some of their respective foundational principles into close contact. Our main technique, in the form of the divergence bounds in Proposition 1, shows that applying differentially private sampling schemes essentially acts as a contraction on distributions, and we think that such results may be more generally applicable. In this paper particularly, we showed how to apply our divergence bounds to obtain sharp bounds on the convergence rate for certain nonparametric problems in addition to standard finite-dimensional settings. With our earlier paper [11], we have developed a set of techniques that show, roughly, that if one can construct a family of distributions $\{P_\nu\}$ on the sample space $\mathcal{X}$ that is not well "correlated" with any member $f \in L^\infty(\mathcal{X})$ for which $f(x) \in \{-1, 1\}$, then providing privacy is costly: the contraction Proposition 1 provides is strong.

By providing (to our knowledge, the first) sharp convergence rates for many standard statistical inference procedures under local differential privacy, we have developed and explored some tools that may be used to better understand privacy-preserving statistical inference and estimation procedures. We have identified a fundamental continuum along which privacy may be traded for utility in the form of accurate statistical estimates, providing a way to adjust statistical procedures to meet the privacy or utility needs of the statistician and the population being sampled. Formally identifying this tradeoff in other statistical problems should allow us to better understand the costs and benefits of privacy; we believe we have laid some of the groundwork for doing so.

Acknowledgments

We thank Guy Rothblum for very helpful discussions.
JCD was supported by a Facebook Graduate Fellowship and an NDSEG fellowship. Our work was supported in part by the U.S. Army Research Laboratory and the U.S. Army Research Office under grant number W911NF-11-1-0391, and by Office of Naval Research MURI grant N00014-11-1-0688.

A Proof of Lemma 2

In the regime $k \in (d/2, d]$, the statement of the lemma follows from Lemma 7 in Duchi et al. [11]; consequently, we prove the claim for $k \le d/2$. If $k \in \{1, 2, 3, 4\}$, taking $\mathcal{V} = \{\nu \in \mathcal{H}_d : \|\nu\|_1 = k\}$ implies that $\|\nu - \nu'\|_1 \ge k/4$ for $\nu \ne \nu'$, that $\mathrm{card}(\mathcal{V}) = \binom{d}{k} \ge (d/k)^{ck}$ for some constant $c > 0$, and that
$$\mathrm{Cov}(V) = \Big(\frac{k}{d} - \frac{k^2}{d^2}\Big)I_{d\times d},$$
from which the claim follows. Accordingly, we focus on $k \in (4, d/2]$. To further simplify the analysis, we claim it suffices to establish the result in the case that $k/4$ is integral (i.e., $k \in 4\mathbb{N}$). Indeed, assume that the result holds for all such integers. Given some $k \notin 4\mathbb{N}$, we may consider a packing $\mathcal{V}'$ of the binary hypercube $\mathcal{H}_{d'}$ with $d' = d - (k - 4\lfloor k/4\rfloor)$ and $\|\nu\|_1 = k' = 4\lfloor k/4\rfloor$ for $\nu \in \mathcal{V}'$. By assumption, there is a packing $\mathcal{V}'$ of $\mathcal{H}_{d'}$ satisfying the lemma. Now to each vector $\nu \in \mathcal{V}'$ we concatenate the $(k - 4\lfloor k/4\rfloor)$-vector $\mathbf{1}$, which gives $[\nu^\top\ \mathbf{1}^\top]^\top \in \{0,1\}^d$ with $\|[\nu^\top\ \mathbf{1}^\top]\|_1 = k$. This concatenation does not increase $\mathrm{Cov}(V)$ (the last $k - 4\lfloor k/4\rfloor$ coordinates have covariance zero), and the rest of the terms in items (i)–(iv) incur only constant-factor changes.

It remains to prove the claim for $k \in 4\mathbb{N}$ over the range $\{5, \ldots, \lfloor d/2\rfloor\}$. To ease notation, we let $\ell = k/4$ belong to the interval $[2, d/8]$. Our proof is based on the probabilistic method [1]: we propose a random construction of a packing, and show that it satisfies the desired properties with strictly positive probability.
Our random construction is straightforward: letting $\mathcal{H}_d = \{0,1\}^d$ denote the Boolean hypercube, we sample $K$ i.i.d. random vectors $U_i$ from the uniform distribution over the set
$$S_\ell := \{\nu \in \mathcal{H}_d \mid \|\nu\|_1 = 4\ell\}. \qquad (30)$$
We claim that for $K = (d/(6\ell))^{3\ell/2}$, the resulting random set $\mathcal{U}_K := \{U_1, \ldots, U_K\}$ satisfies the claimed properties with non-zero probability. We say that $\mathcal{U}_K$ is $\ell$-separated if $\|U_i - U_j\|_1 > \ell$ for all $i \ne j$, and we use $\mathrm{Cov}(\mathcal{U}_K)$ to denote the covariance of a random vector $V$ drawn uniformly at random from $\mathcal{U}_K$. Our proof is based on the following two tail bounds, which we prove shortly: for a universal constant $c < \infty$,
$$P[\mathcal{U}_K \text{ is not } \ell\text{-separated}] \le \binom{K}{2}\Big(\frac{6\ell}{d}\Big)^{3\ell}, \quad \text{and} \qquad (31a)$$
$$P\big[\lambda_{\max}\big(\mathrm{Cov}(\mathcal{U}_K)\big) \ge t\big] \le d\exp\Big(-\frac{Kt^2}{3c\max\{\ell, \ell^3/d\} + ct\ell}\Big) \quad \text{for all } t > 0. \qquad (31b)$$
For the moment, let us assume the validity of these bounds and use them to complete the proof. By the union bound, we have
$$P\big(\mathcal{U}_K \text{ is not } \ell\text{-separated or } \mathrm{Cov}(\mathcal{U}_K) \not\preceq tI\big) \le \binom{K}{2}\Big(\frac{6\ell}{d}\Big)^{3\ell} + d\exp\Big(-\frac{Kt^2}{3c\max\{\ell, \ell^3/d\} + ct\ell}\Big).$$
By choosing $t = C\ell/d$ and recalling that $K = (d/(6\ell))^{3\ell/2}$, we obtain the bound
$$\frac{1}{2} + d\exp\Big(-\frac{C^2\ell^2\,(d/(6\ell))^{3\ell/2}}{3c\max\{d^2\ell, d\ell^3\} + Ccd\ell^2}\Big).$$
If $\ell \ge \ell^3/d$, the second term can easily be seen to be less than $\frac{1}{2}$ for suitably large constants $C$, so assume that $\ell \le \ell^3/d$. Then we have (where $c$ is a constant whose value may change from inequality to inequality)
$$\frac{\ell^2\,(d/(6\ell))^{3\ell/2}}{3c\max\{d^2\ell, d\ell^3\} + Ccd\ell^2} = \frac{\ell^2 d^{3\ell/2}}{(6\ell)^{3\ell/2}\big(3cd\ell^3 + Ccd\ell^2\big)} \ge c\,\frac{d^{3\ell/2}}{(6\ell)^{3\ell/2}\,d\ell} \ge \frac{c}{d\ell}\Big(\frac{d}{6\ell}\Big)^{3\ell/2}.$$
For suitably large $d$ and any $\ell \ge 2$, the final term is greater than $c'\log d$ for some constant $c' > 0$, which implies that with an appropriate choice of the constant $C$ earlier, we have the bound
$$P\big(\mathcal{U}_K \text{ is not } \ell\text{-separated or } \mathrm{Cov}(\mathcal{U}_K) \not\preceq tI\big) < 1.$$
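The scaling $\mathrm{Cov} \preceq O(\ell/d)\,I$ that drives this argument can be observed empirically by drawing vectors uniformly from $S_\ell$; the dimensions and sample count below are our own choices.

```python
import numpy as np

def sample_S_ell(d, ell, n, rng):
    """Draw n uniform vectors from S_ell: {0,1}^d vectors with exactly 4*ell ones."""
    out = np.zeros((n, d))
    for i in range(n):
        out[i, rng.choice(d, size=4 * ell, replace=False)] = 1.0
    return out

rng = np.random.default_rng(1)
d, ell = 40, 2
U = sample_S_ell(d, ell, 4000, rng)
lam_max = np.linalg.eigvalsh(np.cov(U, rowvar=False)).max()
# lam_max concentrates near (4*ell/d) * (1 - 4*ell/d), i.e. on the order of ell/d
```

Since each coordinate is a Bernoulli($4\ell/d$) variable and the coordinates are negatively correlated (the total weight is fixed), the empirical operator norm stays within a constant multiple of $\ell/d$, in line with Lemma 2(iv).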
Consequently, recalling that $k = 4\ell$ by definition, a packing as described in the statement of the lemma must exist. It remains to prove the tail bounds (31a) and (31b). Beginning with the former bound, define the set
$$N(\nu, \ell) := \{\nu' \in \mathcal{H}_d \mid \|\nu - \nu'\|_1 \le \ell\}.$$
Recalling the definition (30) of $S_\ell$, let $U_i$ and $U_j$ be sampled independently and uniformly at random from $S_\ell$. Then
$$P\big(\|U_i - U_j\|_1 \le \ell\big) \le \frac{\mathrm{card}(N(\nu,\ell))}{\mathrm{card}(S_\ell)}.$$
Note that $N(\nu, \ell)$ can be constructed by choosing an arbitrary subset $J \subset \{1, \ldots, d\}$ of size $\ell$, and then setting $\nu'_j = \nu_j$ for $j \notin J$ and $\nu'_j$ arbitrarily otherwise; consequently, its cardinality is upper bounded as $\mathrm{card}(N(\nu,\ell)) \le \binom{d}{\ell}2^\ell$. Since $\mathrm{card}(S_\ell) = \binom{d}{4\ell}$, we find that
$$\frac{\mathrm{card}(N(\nu,\ell))}{\mathrm{card}(S_\ell)} \le \frac{\binom{d}{\ell}2^\ell}{\binom{d}{4\ell}} = 2^\ell\,\frac{(d-4\ell)!\,(4\ell)!}{(d-\ell)!\,\ell!} = 2^\ell\prod_{j=1}^{3\ell}\frac{\ell+j}{d-4\ell+j} \le 2^\ell\Big(\frac{4\ell}{d-\ell}\Big)^{3\ell},$$
where the final inequality follows because the function $h(x) = \frac{\ell+x}{d-4\ell+x}$ is increasing for $x > 0$. Since $\ell \le d/8$ by assumption, we arrive at the upper bound
$$\frac{\mathrm{card}(N(\nu,\ell))}{\mathrm{card}(S_\ell)} \le 2^\ell\Big(\frac{4\ell}{d-\ell}\Big)^{3\ell} = \Big(\frac{4\cdot 2^{1/3}\,\ell}{d-\ell}\Big)^{3\ell} \le \Big(\frac{6\ell}{d}\Big)^{3\ell}.$$
Since we must compare $\binom{K}{2}$ such pairs over the set $\mathcal{U}_K$, the claim (31a) follows from the union bound.

We now turn to establishing the claim (31b), for which we make use of matrix Bernstein inequalities. Letting $U$ be drawn uniformly at random from $S_\ell$, we have
$$\mathbb{E}[UU^\top] = \beta_{\ell,d}\,\mathbf{1}\mathbf{1}^\top + \Big(\frac{4\ell}{d} - \beta_{\ell,d}\Big)I_{d\times d}, \quad \text{where } \beta_{\ell,d} := \binom{4\ell}{2}\binom{d}{2}^{-1}.$$
Consequently, the $d\times d$ random matrix
$$A := UU^\top - \beta_{\ell,d}\,\mathbf{1}\mathbf{1}^\top - \Big(\frac{4\ell}{d} - \beta_{\ell,d}\Big)I_{d\times d}$$
is centered ($\mathbb{E}[A] = 0$), and by definition of our construction, $\mathrm{Cov}(\mathcal{U}_K) = \frac{1}{K}\sum_{i=1}^K A_i$, where the random matrices $\{A_i\}_{i=1}^K$ are drawn i.i.d.
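Before moving on to the matrix Bernstein step, the counting bound (31a) just established can be verified exactly with binomial coefficients; the $(d, \ell)$ pairs below are our own test values satisfying $\ell \le d/8$.

```python
from math import comb

def ratio_bound_holds(d, ell):
    """Check (upper bound on card(N)) / card(S_ell) <= (6*ell/d)^(3*ell) exactly."""
    lhs = comb(d, ell) * 2**ell / comb(d, 4 * ell)  # binom(d, l) 2^l / binom(d, 4l)
    return lhs <= (6.0 * ell / d) ** (3 * ell)

checks = [ratio_bound_holds(d, ell) for d, ell in [(16, 2), (40, 4), (80, 10), (200, 25)]]
```

The exact ratios are in fact several orders of magnitude below $(6\ell/d)^{3\ell}$, reflecting the slack introduced by the chain of inequalities above.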
In order to apply a matrix Bernstein inequality, it remains to bound the operator norm (maximum singular value) of $A$ and its variance. The operator norm of $A$ is upper bounded as
$$|||A||| \le \Big|\Big|\Big|UU^\top - \Big(\frac{4\ell}{d} - \beta_{\ell,d}\Big)I\Big|\Big|\Big| + \beta_{\ell,d}\,|||\mathbf{1}\mathbf{1}^\top||| = 4\ell - \frac{4\ell}{d} + \beta_{\ell,d} + d\beta_{\ell,d} \le 5\ell.$$
Moreover, we claim that there is a universal positive constant $c$ such that
$$\big|\big|\big|\mathbb{E}[A^2]\big|\big|\big| \le c\max\{\ell, \ell^3/d\}. \qquad (32)$$
To establish this claim, we begin by computing
$$\mathbb{E}[A^2] = \mathbb{E}[UU^\top UU^\top] - \Big(\Big(\frac{4\ell}{d} - \beta_{\ell,d}\Big)I_{d\times d} + \beta_{\ell,d}\,\mathbf{1}\mathbf{1}^\top\Big)^2 = 4\ell\Big(\Big(\frac{4\ell}{d} - \beta_{\ell,d}\Big)I_{d\times d} + \beta_{\ell,d}\,\mathbf{1}\mathbf{1}^\top\Big) - \Big(\Big(\frac{4\ell}{d} - \beta_{\ell,d}\Big)I_{d\times d} + \beta_{\ell,d}\,\mathbf{1}\mathbf{1}^\top\Big)^2.$$
Consequently, if we define the constants
$$a_{\ell,d} := 4\ell - \frac{4\ell}{d} + \beta_{\ell,d} \quad \text{and} \quad b_{\ell,d} := 4\ell\beta_{\ell,d} - \frac{8\ell\beta_{\ell,d}}{d} + 2\beta_{\ell,d}^2 - d\beta_{\ell,d}^2,$$
then $\mathbb{E}[A^2] = a_{\ell,d}I_{d\times d} + b_{\ell,d}\mathbf{1}\mathbf{1}^\top$. It is easy to see that $|a_{\ell,d}| \le 4\ell$ and that $|b_{\ell,d}| \le c'\ell^3/d^2$ for some universal constant $c'$, from which the intermediate claim (32) follows. With these pieces in place, the claimed tail bound (31b) follows from a matrix Bernstein inequality (e.g., [23, Corollary 5.2]), applied to the quantity $\mathrm{Cov}(\mathcal{U}_K) = \frac{1}{K}\sum_{i=1}^K A_i$.

B Proof of Lemma 4

This result relies on inequality (8) from Proposition 1, along with a careful argument to understand the extreme points of $\gamma \in L^\infty([0,1])$ that we use when applying the proposition. First, we take a packing $\mathcal{V}$ as guaranteed by Lemma 3 and consider the densities $f_\nu$ for $\nu \in \mathcal{V}$. Overall, our first step is to show that, for the purposes of applying inequality (8), it is no loss of generality to identify $\gamma \in L^\infty([0,1])$ with vectors $\gamma \in \mathbb{R}^{2k}$, where $\gamma$ is constant on intervals of the form $[i/2k, (i+1)/2k]$.
With this identification complete, we can then use the packing set $\mathcal{V}$ from Lemma 3 to provide a bound on the correlation of any $\gamma \in L^\infty([0,1])$ with the densities $f_\nu$, which completes the proof.

With this outline in mind, let the sets $D_i$, $i \in \{1, 2, \dots, 2k\}$, be defined as $D_i = [(i-1)/2k, i/2k)$, except that $D_{2k} = [(2k-1)/2k, 1]$, so the collection $\{D_i\}_{i=1}^{2k}$ forms a partition of the unit interval $[0,1]$. By construction of the densities $f_\nu$, the sign of $f_\nu - 1$ remains constant on each $D_i$. Recalling the linear functionals $\varphi_\nu$ in Proposition 1, we have $\varphi_\nu : L^\infty([0,1]) \to \mathbb{R}$ defined via
\[
\varphi_\nu(\gamma)
= \sum_{i=1}^{2k} \int_{D_i} \gamma(x) \big( f_\nu(x) - \bar f(x) \big) \, dx
= \sum_{i=1}^{2k} \int_{D_i} \gamma(x) \big( f_\nu(x) - 1 - (\bar f(x) - 1) \big) \, dx,
\]
where $\bar f = (1/|\mathcal{V}|) \sum_{\nu \in \mathcal{V}} f_\nu$. Expanding the square, we find that since $\bar f$ is the average, we have
\[
\frac{1}{|\mathcal{V}|} \sum_{\nu \in \mathcal{V}} \varphi_\nu(\gamma)^2
\le \frac{1}{|\mathcal{V}|} \sum_{\nu \in \mathcal{V}} \Big( \sum_{i=1}^{2k} \int_{D_i} \gamma(x) (f_\nu(x) - 1) \, dx \Big)^2.
\]
Since the set $G_\alpha$ from Proposition 1 is compact, convex, and Hausdorff, the Krein–Milman theorem [24, Proposition 1.2] guarantees that it is equal to the convex hull of its extreme points; moreover, since the functionals $\gamma \mapsto \varphi_\nu^2(\gamma)$ are convex, the supremum in Proposition 1 must be attained at the extreme points of $G_\alpha$. As a consequence, when applying the information bound
\[
I(Z_1, \dots, Z_n; V) \le n C_\alpha \frac{1}{|\mathcal{V}|} \sup_{\gamma \in G_\alpha} \sum_{\nu \in \mathcal{V}} \varphi_\nu^2(\gamma), \tag{33}
\]
we can restrict our attention to $\gamma \in L^\infty([0,1])$ for which $\gamma(x) \in \{ e^{-\alpha} - e^{\alpha}, \, e^{\alpha} - e^{-\alpha} \}/2$. Now we argue that it is no loss of generality to assume that $\gamma$, when restricted to $D_i$, is constant (apart from a set of measure zero). Using $\mu$ to denote Lebesgue measure, define the shorthand $\kappa = (e^\alpha - e^{-\alpha})/2$.
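The extreme-point reduction rests on a general fact: a convex function attains its supremum over a compact convex set at an extreme point, so over the box $\|\gamma\|_\infty \le \kappa$ the supremum of $\gamma \mapsto \sum_\nu \varphi_\nu^2(\gamma)$ is attained at a sign pattern $\gamma(x) \in \{\pm\kappa\}$. A small finite-dimensional illustration (with a generic matrix $M$ standing in for the linear functionals; purely illustrative):

```python
import itertools
import numpy as np

# Illustration: the supremum of the convex function g -> ||M g||^2 over
# the box ||g||_inf <= kappa is attained at a vertex (entries +/- kappa).
rng = np.random.default_rng(0)
kappa = 0.5
M = rng.standard_normal((5, 4))  # rows play the role of the phi_nu

def f(g):
    return float(np.sum((M @ g) ** 2))  # convex in g

best_vertex = max(f(kappa * np.array(v))
                  for v in itertools.product([-1.0, 1.0], repeat=4))
# no interior point of the box beats the best vertex
for _ in range(2000):
    g = rng.uniform(-kappa, kappa, size=4)
    assert f(g) <= best_vertex + 1e-9
print("supremum attained at a vertex of the box")
```

The infinite-dimensional argument in the text needs the extra step (the contradiction below) showing the optimal sign pattern is in fact constant on each $D_i$.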
Fix $i \in [2k]$, and assume for the sake of contradiction that there exist sets $B_i, C_i \subset D_i$ such that $\gamma(B_i) = \{\kappa\}$ and $\gamma(C_i) = \{-\kappa\}$, while $\mu(B_i) > 0$ and $\mu(C_i) > 0$. (For a function $f$ and set $A$, the notation $f(A)$ denotes the image $f(A) = \{ f(x) \mid x \in A \}$.) We will construct functions $\gamma^1, \gamma^2 \in G_\alpha$ and a value $\lambda \in (0,1)$ such that
\[
\int_{D_i} \gamma(x)(f_\nu(x) - 1) \, dx
= \lambda \int_{D_i} \gamma^1(x)(f_\nu(x) - 1) \, dx
+ (1 - \lambda) \int_{D_i} \gamma^2(x)(f_\nu(x) - 1) \, dx
\]
simultaneously for all $\nu \in \mathcal{V}$, while on $D_i^c = [0,1] \setminus D_i$ we will have the equivalence $\gamma^1|_{D_i^c} \equiv \gamma^2|_{D_i^c} \equiv \gamma|_{D_i^c}$. Indeed, set $\gamma^1(D_i) = \{\kappa\}$ and $\gamma^2(D_i) = \{-\kappa\}$, otherwise setting $\gamma^1(x) = \gamma^2(x) = \gamma(x)$ for $x \notin D_i$. We define
\[
\lambda := \frac{\int_{B_i} (f_\nu(x) - 1) \, dx}{\int_{D_i} (f_\nu(x) - 1) \, dx},
\quad \text{so} \quad
1 - \lambda = \frac{\int_{C_i} (f_\nu(x) - 1) \, dx}{\int_{D_i} (f_\nu(x) - 1) \, dx}.
\]
By the construction of the function $g_\beta$, the function $f_\nu - 1$ does not change sign on $D_i$, and the absolute continuity conditions on $g_\beta$ specified in equation (27) guarantee $0 < \lambda < 1$, since $\mu(B_i) > 0$ and $\mu(C_i) > 0$. Moreover, the quantity $\lambda$ is constant for all $\nu$ by the construction of the $f_\nu$, since $B_i \subset D_i$ and $C_i \subset D_i$. We thus find that for any $\nu \in \mathcal{V}$,
\[
\int_{D_i} \gamma(x)(f_\nu(x) - 1) \, dx
= \int_{B_i} \gamma^1(x)(f_\nu(x) - 1) \, dx + \int_{C_i} \gamma^2(x)(f_\nu(x) - 1) \, dx
= \kappa \int_{B_i} (f_\nu(x) - 1) \, dx - \kappa \int_{C_i} (f_\nu(x) - 1) \, dx
\]
\[
= \kappa \lambda \int_{D_i} (f_\nu(x) - 1) \, dx - \kappa (1 - \lambda) \int_{D_i} (f_\nu(x) - 1) \, dx
= \lambda \int_{D_i} \gamma^1(x)(f_\nu(x) - 1) \, dx + (1 - \lambda) \int_{D_i} \gamma^2(x)(f_\nu(x) - 1) \, dx.
\]
By linearity and the strong convexity of the function $x \mapsto x^2$, we then find that
\[
\sum_{\nu \in \mathcal{V}} \Big( \sum_{i=1}^{2k} \int_{D_i} \gamma(x)(f_\nu(x) - 1) \, dx \Big)^2
< \lambda \sum_{\nu \in \mathcal{V}} \Big( \sum_{i=1}^{2k} \int_{D_i} \gamma^1(x)(f_\nu(x) - 1) \, dx \Big)^2
+ (1 - \lambda) \sum_{\nu \in \mathcal{V}} \Big( \sum_{i=1}^{2k} \int_{D_i} \gamma^2(x)(f_\nu(x) - 1) \, dx \Big)^2.
\]
Thus one of the functions $\gamma^1, \gamma^2$ must have a larger objective value than $\gamma$.
This is our desired contradiction, which shows that (up to sets of measure zero) any $\gamma$ attaining the supremum in the information bound (33) must be constant on each of the $D_i$. Having shown that $\gamma$ is constant on each of the intervals $D_i$, we conclude that the supremum (33) can be reduced to a finite-dimensional problem over the subset
\[
G_{\alpha, 2k} := \Big\{ u \in \mathbb{R}^{2k} \mid \|u\|_\infty \le \frac{e^\alpha - e^{-\alpha}}{2} \Big\}
\]
of $\mathbb{R}^{2k}$. In terms of this subset, we have the upper bound
\[
\frac{|\mathcal{V}|}{C_\alpha n} I(Z_1, \dots, Z_n; V)
\le \sup_{\gamma \in G_\alpha} \sum_{\nu \in \mathcal{V}} \varphi_\nu(\gamma)^2
\le \sup_{\gamma \in G_{\alpha, 2k}} \sum_{\nu \in \mathcal{V}} \Big( \sum_{i=1}^{2k} \gamma_i \int_{D_i} (f_\nu(x) - 1) \, dx \Big)^2.
\]
By construction of the $f_\nu$ and $g_\beta$, we have the equality
\[
\int_{D_i} (f_\nu(x) - 1) \, dx
= (-1)^{i+1} \nu_i \int_0^{\frac{1}{2k}} g_{\beta, 1}(x) \, dx
= (-1)^{i+1} \nu_i \int_0^{\frac{1}{2k}} \frac{1}{k^\beta} g(kx) \, dx
= (-1)^{i+1} \nu_i \frac{c_{1/2}}{k^{\beta+1}},
\]
which implies that
\[
\frac{|\mathcal{V}|}{C_\alpha n} I(Z_1, \dots, Z_n; V)
\le \sup_{\gamma \in G_{\alpha, 2k}} \sum_{\nu \in \mathcal{V}} \Big( \frac{c_{1/2}}{k^{\beta+1}} \gamma^\top \Big( \begin{bmatrix} 1 \\ -1 \end{bmatrix} \otimes \nu \Big) \Big)^2
= \frac{c_{1/2}^2}{k^{2\beta+2}} \sup_{\gamma \in G_{\alpha, 2k}} \gamma^\top \Big( \sum_{\nu \in \mathcal{V}} \begin{bmatrix} 1 \\ -1 \end{bmatrix} \otimes \nu \nu^\top \otimes \begin{bmatrix} 1 \\ -1 \end{bmatrix}^\top \Big) \gamma, \tag{34}
\]
where $\otimes$ denotes the Kronecker product. By our construction of the packing $\mathcal{V}$ of $\{-1, 1\}^k$, there exists a constant $c$ such that $(1/|\mathcal{V}|) \sum_{\nu \in \mathcal{V}} \nu \nu^\top \preceq c I_{k \times k}$. Moreover, observe that the mapping $A \mapsto \begin{bmatrix} 1 \\ -1 \end{bmatrix} \otimes A \otimes \begin{bmatrix} 1 \\ -1 \end{bmatrix}^\top$ satisfies
\[
\begin{bmatrix} x \\ y \end{bmatrix}^\top \Big( \begin{bmatrix} 1 \\ -1 \end{bmatrix} \otimes A \otimes \begin{bmatrix} 1 \\ -1 \end{bmatrix}^\top \Big) \begin{bmatrix} x \\ y \end{bmatrix}
= (x - y)^\top A (x - y),
\]
whence it is operator monotone ($A \succeq B$ implies $(x - y)^\top A (x - y) \ge (x - y)^\top B (x - y)$). Consequently, by linearity of the Kronecker product $\otimes$ and Lemma 3, there is a universal constant $c$ such that
\[
\frac{1}{|\mathcal{V}|} \sum_{\nu \in \mathcal{V}} \begin{bmatrix} 1 \\ -1 \end{bmatrix} \otimes \nu \nu^\top \otimes \begin{bmatrix} 1 \\ -1 \end{bmatrix}^\top
= \begin{bmatrix} 1 \\ -1 \end{bmatrix} \otimes \Big( \frac{1}{|\mathcal{V}|} \sum_{\nu \in \mathcal{V}} \nu \nu^\top \Big) \otimes \begin{bmatrix} 1 \\ -1 \end{bmatrix}^\top
\preceq c I_{2k \times 2k}.
\]
Combining this bound with our inequality (34), we obtain
\[
I(Z_1, \dots, Z_n; V)
\le \frac{n c}{k^{2\beta+2}} \sup_{\gamma \in G_{\alpha, 2k}} \gamma^\top I \gamma
= \frac{n c (e^\alpha - e^{-\alpha})^2 k}{2 k^{2\beta+2}}
\]
for some universal numerical constant $c$.
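The Kronecker quadratic-form identity above can be verified numerically (an illustration only): with $v = (1, -1)^\top$, the matrix $\operatorname{kron}(v v^\top, A) = \begin{bmatrix} A & -A \\ -A & A \end{bmatrix}$ realizes the quadratic form $(x - y)^\top A (x - y)$ on stacked vectors $(x; y)$.

```python
import numpy as np

# Check of the Kronecker identity used in the proof:
# (x; y)^T kron(v v^T, A) (x; y) == (x - y)^T A (x - y) with v = (1, -1)^T.
rng = np.random.default_rng(1)
k = 3
A = rng.standard_normal((k, k))
A = A + A.T                      # symmetric test matrix
v = np.array([[1.0], [-1.0]])
B = np.kron(v @ v.T, A)          # 2k x 2k block matrix [[A, -A], [-A, A]]

x, y = rng.standard_normal(k), rng.standard_normal(k)
z = np.concatenate([x, y])
assert np.isclose(z @ B @ z, (x - y) @ A @ (x - y))
print("Kronecker quadratic-form identity verified")
```

This is why the map preserves the semidefinite order: the quadratic form of the lifted matrix is a quadratic form of $A$ evaluated at $x - y$.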
Since $\alpha \in [0, 1/4]$, we have $(e^\alpha - e^{-\alpha})^2 \le c' \alpha^2$, which completes the proof.

References

[1] N. Alon and J. H. Spencer. The Probabilistic Method. Wiley-Interscience, second edition, 2000.
[2] B. Barak, K. Chaudhuri, C. Dwork, S. Kale, F. McSherry, and K. Talwar. Privacy, accuracy, and consistency too: A holistic solution to contingency table release. In Proceedings of the 26th ACM Symposium on Principles of Database Systems, 2007.
[3] L. Birgé. Approximation dans les espaces métriques et théorie de l'estimation. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 65:181–238, 1983.
[4] P. Brucker. An O(n) algorithm for quadratic knapsack problems. Operations Research Letters, 3(3):163–166, 1984.
[5] R. Carroll and P. Hall. Optimal rates of convergence for deconvolving a density. Journal of the American Statistical Association, 83(404):1184–1186, 1988.
[6] K. Chaudhuri and D. Hsu. Convergence rates for differentially private statistical estimation. In Proceedings of the 29th International Conference on Machine Learning, 2012.
[7] K. Chaudhuri, C. Monteleoni, and A. D. Sarwate. Differentially private empirical risk minimization. Journal of Machine Learning Research, 12:1069–1109, 2011.
[8] T. M. Cover and J. A. Thomas. Elements of Information Theory, Second Edition. Wiley, 2006.
[9] A. De. Lower bounds in differential privacy. In Proceedings of the Ninth Theory of Cryptography Conference, 2012.
[10] J. C. Duchi, M. I. Jordan, and M. J. Wainwright. Privacy aware learning. arXiv:1210.2085 [stat.ML], 2012. URL http://arxiv.org/abs/1210.2085.
[11] J. C. Duchi, M. I. Jordan, and M. J. Wainwright. Local privacy and statistical minimax rates. arXiv:1302.3203 [math.ST], 2013. URL http://arxiv.org/abs/1302.3203.
[12] G. T. Duncan and D. Lambert. Disclosure-limited data dissemination. Journal of the American Statistical Association, 81(393):10–18, 1986.
[13] G. T. Duncan and D. Lambert. The risk of disclosure for microdata. Journal of Business and Economic Statistics, 7(2):207–217, 1989.
[14] C. Dwork. Differential privacy: a survey of results. In Theory and Applications of Models of Computation, volume 4978 of Lecture Notes in Computer Science, pages 1–19. Springer, 2008.
[15] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In Proceedings of the 3rd Theory of Cryptography Conference, pages 265–284, 2006.
[16] S. Efromovich. Nonparametric Curve Estimation: Methods, Theory, and Applications. Springer-Verlag, 1999.
[17] A. V. Evfimievski, J. Gehrke, and R. Srikant. Limiting privacy breaches in privacy preserving data mining. In Proceedings of the Twenty-Second Symposium on Principles of Database Systems, pages 211–222, 2003.
[18] I. P. Fellegi. On the question of statistical confidentiality. Journal of the American Statistical Association, 67(337):7–18, 1972.
[19] S. E. Fienberg, U. E. Makov, and R. J. Steele. Disclosure limitation using perturbation and related methods for categorical data. Journal of Official Statistics, 14(4):485–502, 1998.
[20] M. Hardt and K. Talwar. On the geometry of differential privacy. In Proceedings of the Forty-Second Annual ACM Symposium on the Theory of Computing, pages 705–714, 2010.
[21] S. P. Kasiviswanathan, H. K. Lee, K. Nissim, S. Raskhodnikova, and A. Smith. What can we learn privately? SIAM Journal on Computing, 40(3):793–826, 2011.
[22] T. Kühn. A lower estimate for entropy numbers. Journal of Approximation Theory, 110:120–124, 2001.
[23] L. W. Mackey, M. I. Jordan, R. Y. Chen, B. Farrell, and J. A. Tropp. Matrix concentration inequalities via the method of exchangeable pairs. arXiv:1201.6002 [math.PR], 2012. URL http://arxiv.org/abs/1201.6002.
[24] R. R. Phelps. Lectures on Choquet's Theorem, Second Edition. Springer, 2001.
[25] A. Smith. Privacy-preserving statistical estimation with optimal convergence rates. In Proceedings of the Forty-Third Annual ACM Symposium on the Theory of Computing, 2011.
[26] A. B. Tsybakov. Introduction to Nonparametric Estimation. Springer, 2009.
[27] S. L. Warner. Randomized response: a survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60(309):63–69, 1965.
[28] L. Wasserman and S. Zhou. A statistical framework for differential privacy. Journal of the American Statistical Association, 105(489):375–389, 2010.
[29] Y. Yang and A. Barron. Information-theoretic determination of minimax rates of convergence. Annals of Statistics, 27(5):1564–1599, 1999.
[30] B. Yu. Assouad, Fano, and Le Cam. In Festschrift for Lucien Le Cam, pages 423–435. Springer-Verlag, 1997.
