A note on privacy preserving iteratively reweighted least squares


Authors: Mijung Park, Max Welling

QUvA Lab, University of Amsterdam
{mijungi.p, welling.max}@gmail.com

Abstract

Iteratively reweighted least squares (IRLS) is a widely-used method in machine learning to estimate the parameters in generalised linear models. In particular, IRLS for L1 minimisation under the linear model provides a closed-form solution in each step, which is a simple multiplication between the inverse of the weighted second moment matrix and the weighted first moment vector. When dealing with privacy-sensitive data, however, developing a privacy preserving IRLS algorithm faces two challenges. First, due to the inversion of the second moment matrix, the usual sensitivity analysis in differential privacy incorporating a single-datapoint perturbation gets complicated and often requires unrealistic assumptions. Second, due to its iterative nature, a significant cumulative privacy loss occurs. However, adding a high level of noise to compensate for the privacy loss hinders getting accurate estimates. Here, we develop a practical algorithm that overcomes these challenges and outputs privatised and accurate IRLS solutions. In our method, we analyse the sensitivity of each moment separately and treat the matrix inversion and multiplication as a post-processing step, which simplifies the sensitivity analysis. Furthermore, we apply the concentrated differential privacy formalism, a more relaxed version of differential privacy, which requires adding significantly less noise for the same level of privacy guarantee, compared to the conventional and advanced compositions of differentially private mechanisms.

1 Introduction

Differential privacy (DP) algorithms provide strong privacy guarantees by typically perturbing some statistics of a given dataset, which appear in the outputs of an algorithm [1].
The amount of noise added for the perturbation is set in order to compensate for any difference in the probability of any outcome of the algorithm when a single individual's datapoint is added to or removed from the data. So, in order to develop a DP algorithm, one first needs to analyse the maximum difference in the probability of any outcome, which is often called the sensitivity, to set the level of additive noise.

In this note, we are interested in developing a privacy preserving iteratively reweighted least squares (IRLS) method. In the compressed sensing literature [2], IRLS is used for solving the L1 minimisation problem, in which a closed-form update of the parameters is available in each step. This IRLS solution in each step is a simple multiplication between the inverse of the weighted second moment matrix and the weighted first moment vector. Due to the inverse of the second moment matrix, analysing the sensitivity becomes challenging. Previous work [3] assumes each feature of each datapoint is drawn i.i.d. from a standard normal distribution, and analysed the sensitivity of the inverse of the second moment matrix. Unfortunately, the assumption that the features are independent is often not realistic.

Another challenge in developing a privacy preserving IRLS method comes from the iterative nature of the IRLS algorithm. The conventional DP composition theorem (Theorem 3.16 in [1]) states that multiple iterations of an $\epsilon$-DP algorithm face linearly degrading privacy, which yields $J\epsilon$-DP after $J$ iterations. A more advanced composition theorem (Theorem 3.20 in [1]) yields $(\sqrt{2J\log(1/\delta)}\,\epsilon + J\epsilon(e^\epsilon - 1), \delta)$-DP. The new variable $\delta$ (stating the mechanism's failure probability) needs to be set to a very small value, which makes the cumulative privacy loss still relatively high.
To compensate for the privacy loss, one needs to add a significant amount of noise to the IRLS solution to avoid revealing any individual information from the output of the algorithm.

In this note, we tackle these challenges as follows. (1) We analyse the sensitivity of the weighted second moment matrix and the weighted first moment vector separately, and perturb each moment by adding noise consistent with its own sensitivity. Then, we multiply the inverse of the perturbed second moment matrix by the perturbed first moment vector. This inversion and multiplication can be viewed as a post-processing step, which doesn't alter the privacy level. Since we perturb each moment separately, this method does not require any restrictive assumptions on the data. In addition, the noise variance naturally scales with the amount of data. (2) We apply the concentrated differential privacy formalism, a more relaxed version of differential privacy, to obtain more accurate estimates for the same cumulative privacy loss, compared to DP and its $(\epsilon, \delta)$-relaxation. In the following, we start by describing our privacy preserving IRLS algorithm.

2 Privacy preserving IRLS

We are given a dataset consisting of $N$ input-output pairs $\{x_i, y_i\}_{i=1}^N$, where we assume $\|x_i\|_2 \le 1$ and $|y_i| \le 1$. The iteratively reweighted least squares solution has the form
$$\hat{\theta}^{(t)}_{irls} = (X^\top S X)^{-1} X^\top S y := B^{-1} A, \quad (1)$$
where $X \in \mathbb{R}^{N \times d}$ is the design matrix in which the $i$th row is the transposed $i$th input $x_i^\top$ (of length $d$), and $y$ is the column vector of outputs. We denote $B = \frac{1}{N} X^\top S X$ and $A = \frac{1}{N} X^\top S y$. Here $S$ is a diagonal matrix with diagonal $s = |y - X\hat{\theta}^{(t-1)}|^{p-2}$. We set $p = 1$ and compute $L_1$-norm constrained least squares. To avoid dividing by 0, we set
$$s_i = \frac{1}{\max\left(1/\delta,\; |y_i - X_i \hat{\theta}^{(t-1)}|\right)}, \quad (2)$$
where $X_i$ is the $i$th row.
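The non-private update in Eqs. (1)–(2) can be sketched in a few lines of numpy; the function name and default settings below are illustrative, not from the paper:

```python
import numpy as np

def irls_l1(X, y, n_iters=20, delta=100.0):
    """Non-private IRLS for L1 regression; delta caps the weights
    s_i as in Eq. (2), so that s_i <= delta."""
    N, d = X.shape
    theta = np.zeros(d)
    for _ in range(n_iters):
        r = np.abs(y - X @ theta)
        s = 1.0 / np.maximum(1.0 / delta, r)   # Eq. (2)
        B = (X * s[:, None]).T @ X / N         # weighted second moment
        A = (X * s[:, None]).T @ y / N         # weighted first moment
        theta = np.linalg.solve(B, A)          # Eq. (1): B^{-1} A
    return theta
```

Each pass recomputes the weights from the current residuals, so small residuals receive large (capped) weights, which is what drives the solution towards the $L_1$ fit.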
Here, $\delta$ sets the sparsity (number of non-zero values) of the IRLS solution. We will perturb each of the statistics $A$ and $B$ by a certain amount, such that each statistic is $\epsilon_0$-differentially private in each iteration.

$\epsilon_0$-differentially private moment $A$ by the Laplace mechanism. For perturbing $A$, we use the Laplace mechanism. To use the Laplace mechanism, we first need to analyse the following $L_1$-sensitivity, where the neighbouring datasets differ in the $k$th datapoint:
$$
\begin{aligned}
\Delta A &:= \max_{\mathcal{D}, \tilde{\mathcal{D}} \in \mathbb{N}^{|\chi|},\, \|\mathcal{D} - \tilde{\mathcal{D}}\|_1 = 1} \left\| \tfrac{1}{N} X^\top S y - \tfrac{1}{N} \tilde{X}^\top \tilde{S} \tilde{y} \right\|_1 = \max_{x_k, \tilde{x}_k} \left\| \tfrac{1}{N} x_k s_k y_k - \tfrac{1}{N} \tilde{x}_k \tilde{s}_k \tilde{y}_k \right\|_1, \\
&\le \tfrac{1}{N} \sum_{l=1}^d |x_{k,l}\, s_k\, y_k| + \tfrac{1}{N} \sum_{l=1}^d |\tilde{x}_{k,l}\, \tilde{s}_k\, \tilde{y}_k|, \quad \text{triangle inequality,} \\
&\le \tfrac{s_k}{N} \sum_{l=1}^d |x_{k,l}| + \tfrac{\tilde{s}_k}{N} \sum_{l=1}^d |\tilde{x}_{k,l}|, \quad \text{since } |y_k| \le 1 \text{ and } |\tilde{y}_k| \le 1, \\
&\le \tfrac{2\delta\sqrt{d}}{N}, \quad \text{since } s_k \le \delta \text{ and } \|x_k\|_1 \le \sqrt{d}. \quad (3)
\end{aligned}
$$
Hence, the following Laplace mechanism produces an $\epsilon_0$-differentially private moment $\tilde{A}$:
$$\tilde{A} = A + (Y_1, \cdots, Y_d), \quad (4)$$
where $Y_i \sim$ i.i.d. Laplace$\left(\tfrac{2\delta\sqrt{d}}{N\epsilon_0}\right)$.

$\epsilon_0$-differentially private moment $A$ by the Gaussian mechanism. One could instead perturb the first moment using the Gaussian mechanism. To use the Gaussian mechanism, one needs to analyse the $L_2$-sensitivity, which is $\Delta_2 A = 2\delta/N$, following straightforwardly from Eq. (3):
$$\tilde{A} = A + (Y_1, \cdots, Y_d), \quad (5)$$
where $Y_i \sim$ i.i.d. Gaussian$(0, \sigma^2)$ with $\sigma \ge c\,\Delta_2 A/\epsilon_0$ for $c^2 \ge 2\log(1.25/\delta)$.

$\epsilon_0$-differentially private moment $B$. We perturb $B$ by adding Wishart noise following [4], which provides strong privacy guarantees and significantly higher utility than other methods (e.g., [5–7]) when perturbing positive definite matrices, as illustrated in [4]. To draw Wishart noise, we first draw Gaussian random variables
$$z_i \sim \mathcal{N}\!\left(0,\; \tfrac{\delta}{2\epsilon_0 N} I_d\right), \quad \text{for } i = 1, \cdots, d+1, \quad (6)$$
and construct the matrix $Z := [z_1, \cdots, z_{d+1}] \in \mathbb{R}^{d \times (d+1)}$. Then
$$\tilde{B} := B + Z Z^\top \quad (7)$$
is an $\epsilon_0$-differentially private second moment matrix.
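A minimal sketch of the two perturbation mechanisms above, assuming the moments $A$ and $B$ have already been computed; the function name and argument layout are ours, not the paper's:

```python
import numpy as np

def privatise_moments(A, B, N, delta, eps0, rng=None):
    """Perturb A with the Laplace mechanism (Eq. 4) and B with
    Wishart noise (Eqs. 6-7); each output is eps0-DP."""
    rng = np.random.default_rng() if rng is None else rng
    d = A.shape[0]
    # Laplace mechanism: scale = L1-sensitivity / eps0 = 2*delta*sqrt(d)/(N*eps0)
    A_tilde = A + rng.laplace(scale=2.0 * delta * np.sqrt(d) / (N * eps0), size=d)
    # Wishart mechanism: d+1 Gaussian vectors with covariance (delta/(2*eps0*N)) I_d
    Z = rng.normal(scale=np.sqrt(delta / (2.0 * eps0 * N)), size=(d, d + 1))
    B_tilde = B + Z @ Z.T
    return A_tilde, B_tilde
```

Note that both noise scales shrink as $1/N$ grows, so the perturbation becomes negligible for large datasets; inverting $\tilde{B}$ and multiplying by $\tilde{A}$ afterwards is post-processing and does not affect the guarantee.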
The proof follows [4]. The matrix $Z Z^\top$ is a sample from a Wishart distribution $\mathcal{W}\!\left(Z Z^\top \,\middle|\, \tfrac{\delta}{2\epsilon_0 N} I_d,\, d+1\right)$. The probability ratio between a noised-up version $\tilde{B}$ given a dataset $\mathcal{D}$ (where $B$ is the exact second moment matrix given $\mathcal{D}$) and given a neighbouring dataset $\mathcal{D}'$ (where $B'$ is the exact second moment matrix given $\mathcal{D}'$) is
$$
\begin{aligned}
\frac{\mathcal{W}\!\left(\tilde{B} - B \,\middle|\, \tfrac{\delta}{2\epsilon_0 N} I_d, d+1\right)}{\mathcal{W}\!\left(\tilde{B} - B' \,\middle|\, \tfrac{\delta}{2\epsilon_0 N} I_d, d+1\right)}
&= \frac{\exp\!\left(-\tfrac{\epsilon_0 N}{\delta} \operatorname{tr}(\tilde{B} - B)\right)}{\exp\!\left(-\tfrac{\epsilon_0 N}{\delta} \operatorname{tr}(\tilde{B} - B')\right)}, \quad (8) \\
&= \exp\!\left(\tfrac{\epsilon_0 N}{\delta} \operatorname{tr}(B - B')\right), \quad (9) \\
&= \exp\!\left(\tfrac{\epsilon_0 N}{\delta} \tfrac{1}{N} \operatorname{tr}(s_k x_k x_k^\top - \tilde{s}_k \tilde{x}_k \tilde{x}_k^\top)\right), \quad (10) \\
&= \exp\!\left(\tfrac{\epsilon_0}{\delta} (s_k x_k^\top x_k - \tilde{s}_k \tilde{x}_k^\top \tilde{x}_k)\right), \quad (11) \\
&\le \exp(\epsilon_0), \quad \text{since } 0 \le x_k^\top x_k \le 1 \text{ and } 0 \le s_k \le \delta.
\end{aligned}
$$

3 Concentrated differential privacy for IRLS

Here we adopt a relaxed version of DP, so-called concentrated differential privacy (CDP), in order to significantly lower the amount of noise added to the moments without compromising the cumulative privacy loss over several iterations. According to Theorem 3.5 in [8], any $\epsilon$-DP algorithm is $(\epsilon(\exp(\epsilon)-1)/2, \epsilon)$-CDP. Furthermore, Theorem 3.4 states that the $J$-fold composition of $(\mu_i, \tau_i)$-CDP mechanisms guarantees $\left(\sum_{i=1}^J \mu_i,\; \sqrt{\sum_{i=1}^J \tau_i^2}\right)$-CDP.

Suppose we perturb some key statistic in each IRLS iteration using the Laplace mechanism. Denote the difference in the statistic given datasets $x$ and $y$ by $\Delta S := S(x) - S(y)$. The conventional composition theorem says that one should add $\mathrm{Lap}(\Delta S\, J/\epsilon)$ noise in each iteration to ensure $\epsilon$-DP after $J$ iterations. Now suppose we perturb the key statistic in each iteration by adding Laplace noise drawn from $\mathrm{Lap}(\Delta S/\epsilon_0)$, which, according to Theorem 3.5 in [8], gives us an $(\epsilon_0(\exp(\epsilon_0)-1)/2, \epsilon_0)$-CDP solution. According to Theorem 3.4 in [8], after $J$ iterations, we obtain a $(J\epsilon_0(\exp(\epsilon_0)-1)/2, \sqrt{J}\epsilon_0)$-CDP solution.
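To see how much the CDP accounting helps, one can compare the per-iteration Laplace scales implied by the two schemes, using the per-iteration budget $\epsilon_0 = \sqrt{2\epsilon/J}$ obtained by matching the expected privacy loss to $\epsilon$ (a back-of-the-envelope sketch; function names are ours):

```python
import math

def scale_conventional(delta_S, eps, J):
    # conventional composition: per-iteration budget eps/J,
    # so the Laplace scale is delta_S * J / eps
    return delta_S * J / eps

def scale_cdp(delta_S, eps, J):
    # CDP accounting: per-iteration budget eps0 = sqrt(2*eps/J),
    # so the Laplace scale is delta_S * sqrt(J) / sqrt(2*eps)
    return delta_S * math.sqrt(J) / math.sqrt(2.0 * eps)
```

The CDP scale grows with $\sqrt{J}$ rather than $J$, which is the source of the accuracy gain over many iterations.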
What we want to ensure is that the expected privacy loss equals our privacy budget $\epsilon$, i.e., $J\epsilon_0(\exp(\epsilon_0)-1)/2 = \epsilon$. Using the Taylor expansion, we can rewrite the left-hand side as $J\epsilon_0\left(1 + \epsilon_0 + \sum_{j=2}^\infty \epsilon_0^j/j! - 1\right)/2 = \epsilon$, which we can lower bound by ignoring the infinite sum: $J\epsilon_0^2/2 \le \epsilon$. Hence, the largest $\epsilon_0$ should be less than or equal to $\sqrt{2\epsilon/J}$. This says that, in each iteration, the key statistic should be perturbed by adding Laplace noise drawn from $\mathrm{Lap}(\sqrt{J}\Delta S/\sqrt{2\epsilon})$, in order to obtain an $(\epsilon, \sqrt{2\epsilon})$-CDP solution after $J$ iterations.

In the IRLS algorithm, we have two statistics to perturb in each iteration. Suppose we perturb each statistic to ensure $\epsilon_0$-DP. Then, we can modify the result above by replacing $J$ with $2J$. Hence, each perturbation should yield an $\epsilon_0$-DP statistic, where $\epsilon_0 := \sqrt{\tfrac{2\epsilon}{2J}} = \sqrt{\tfrac{\epsilon}{J}}$. This gives us the $\epsilon$-CDP IRLS algorithm below.

Algorithm 1: $(\epsilon, \sqrt{2\epsilon})$-CDP IRLS algorithm via moment perturbation
Require: Dataset $\mathcal{D}$
Ensure: $(\epsilon, \sqrt{2\epsilon})$-CDP least squares solution after $J$ iterations
(1) Compute $A = \frac{1}{N} X^\top S y$ and add either Laplace or Gaussian noise by Eq. (4) or Eq. (5)
(2) Compute $B = \frac{1}{N} X^\top S X$ and add Wishart noise by Eq. (7)
(3) Compute the $\epsilon$-CDP least squares solution by $\theta_{cdpirls} := \tilde{B}^{-1}\tilde{A}$.

4 Experiments

Our simulated dataset consists of $N$ datapoints, each with $d$-dimensional covariates, generated using i.i.d. draws $x_i \sim \mathcal{N}(0, I_d)$; we then normalise $X$ such that the largest squared $L_2$-norm is 1. We generated the true parameter $\theta \in \mathbb{R}^d$ from $\mathcal{N}(0, I_d)$, and each observation $y_i$ from $\mathcal{N}(X\theta, \sigma^2 I)$ with $\sigma^2 = 0.01$. We also normalised $y$ such that the largest squared $L_2$-norm is 1.
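Algorithm 1 can be sketched end-to-end as follows; this is an illustrative implementation under the paper's assumptions ($\|x_i\|_2 \le 1$, $|y_i| \le 1$), using the Laplace mechanism for $A$ and our own function names:

```python
import numpy as np

def cdp_irls(X, y, eps, J=10, delta=100.0, seed=0):
    """(eps, sqrt(2*eps))-CDP IRLS: two statistics are released per
    iteration, so the per-statistic budget is eps0 = sqrt(eps / J)."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    eps0 = np.sqrt(eps / J)
    theta = np.zeros(d)
    for _ in range(J):
        r = np.abs(y - X @ theta)
        s = 1.0 / np.maximum(1.0 / delta, r)              # Eq. (2)
        A = (X * s[:, None]).T @ y / N                    # weighted first moment
        B = (X * s[:, None]).T @ X / N                    # weighted second moment
        # Laplace mechanism for A (Eq. 4)
        A = A + rng.laplace(scale=2 * delta * np.sqrt(d) / (N * eps0), size=d)
        # Wishart mechanism for B (Eqs. 6-7)
        Z = rng.normal(scale=np.sqrt(delta / (2 * eps0 * N)), size=(d, d + 1))
        B = B + Z @ Z.T
        theta = np.linalg.solve(B, A)                     # post-processing
    return theta
```

Since $Z Z^\top$ is positive definite almost surely, the perturbed $\tilde{B}$ remains invertible, and only the privatised moments touch the data.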
Figure 1: We tested $(\epsilon, \sqrt{2\epsilon})$-CDP-IRLS (gau: Gaussian mechanism for mean perturbation, lap: Laplace mechanism for mean perturbation), $\epsilon$-DP-IRLS (using the conventional composition theorem), and $(\epsilon, \delta)$-DP-IRLS (using the advanced composition theorem) for $d = 10$ and $\epsilon = 0.9$ with varying $N$, for which we generated 20 independent datasets. For each IRLS solution, we computed the log-likelihood of test data (10% of training data), then divided by the number of test points to show the log-likelihood per test point. CDP-IRLS requires significantly less data than DP-IRLS for the same level of expected privacy.

Acknowledgements

This work is supported by Qualcomm.

References

[1] Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9:211–407, August 2014.
[2] R. Chartrand and Wotao Yin. Iteratively reweighted algorithms for compressive sensing. In 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 3869–3872, March 2008.
[3] Or Sheffet. Private approximations of the 2nd-moment matrix using existing techniques in linear regression. CoRR, abs/1507.00056, 2015.
[4] Hafiz Imtiaz and Anand D. Sarwate. Symmetric matrix perturbation for differentially-private principal component analysis. In ICASSP, 2016.
[5] Cynthia Dwork, Kunal Talwar, Abhradeep Thakurta, and Li Zhang. Analyze gauss: optimal bounds for privacy-preserving principal component analysis. In Symposium on Theory of Computing, STOC 2014, New York, NY, USA, May 31 – June 03, 2014, pages 11–20, 2014.
[6] Moritz Hardt and Eric Price. The noisy power method: A meta algorithm with applications. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 2861–2869. Curran Associates, Inc., 2014.
[7] Kamalika Chaudhuri, Anand Sarwate, and Kaushik Sinha. Near-optimal differentially private principal components. In F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 989–997. Curran Associates, Inc., 2012.
[8] C. Dwork and G. N. Rothblum. Concentrated Differential Privacy. ArXiv e-prints, March 2016.
