Local Information Privacy and Its Application to Privacy-Preserving Data Aggregation


Authors: Bo Jiang, Ming Li, Ravi Tandon

JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015

Local Information Privacy and Its Application to Privacy-Preserving Data Aggregation

Bo Jiang, Student Member, IEEE, Ming Li, Senior Member, IEEE, and Ravi Tandon, Senior Member, IEEE

Abstract—In this paper, we propose local information privacy (LIP), and design LIP-based mechanisms for statistical aggregation that protect users' privacy without relying on a trusted third party. The concept of context-awareness is incorporated in LIP, which can be viewed as exploiting the data prior (both in privatizing and in post-processing) to enhance data utility. We present an optimization framework to minimize the mean square error of data aggregation while protecting the privacy of each user's input data, or of a correlated latent variable, by satisfying LIP constraints. We then study optimal mechanisms under different scenarios of prior uncertainty and of correlation with a latent variable. Three types of mechanisms are studied in this paper, including randomized response (RR), unary encoding (UE), and local hashing (LH), and we derive closed-form solutions for the optimal perturbation parameters, which are prior-dependent. We compare LIP-based mechanisms with those based on LDP, and theoretically show that the former achieve enhanced utility. We then study two applications, (weighted) summation and histogram estimation, and show how the proposed mechanisms can be applied to each application. Finally, we validate our analysis by simulations using both synthetic and real-world data. Results show the impact of different prior distributions, correlations, and input domain sizes on data utility. Results also show that our LIP-based mechanisms provide better utility-privacy tradeoffs than LDP-based ones.
Index Terms—privacy-preserving data aggregation, local information privacy, information-theoretic privacy

• Bo Jiang, Ming Li, and Ravi Tandon are with the Department of Electrical and Computer Engineering, University of Arizona, Tucson, AZ, 85721. E-mail: bjiang@email.arizona.edu, lim@email.arizona.edu, tandonr@email.arizona.edu

1 INTRODUCTION

Privacy issues are crucial in this big-data era, as users' data are collected both intentionally and unintentionally by an increasing number of private and public organizations. Most of the collected data is used to ensure high quality of service, but it may also put one's sensitive information at risk. For instance, when people rate movies, their preferences may be leaked; when users search for a nearby parking spot using a smartphone, their real locations are uploaded and prone to leakage. Besides the cases where the collected data itself is sensitive and causes privacy leakage, releasing non-sensitive data may also enable malicious inference of one's private attributes: whenever there is a correlation between the collected data and a person's private latent attribute, directly releasing the data causes privacy leakage. For instance, heartbeat data collected by a smartwatch may potentially reveal one's heart disease [1]; one can easily infer a target user's home or work location by tracking his daily location trace [2]; smart meters can reveal the activities of people inside a home by tracking their electricity, gas, or water usage frequently over time [3]. It is, therefore, desirable to design privacy-preserving mechanisms that provide privacy guarantees without sacrificing data utility.

Traditional privacy notions such as k-anonymity [4] do not provide rigorous privacy guarantees and are prone to various attacks. Nowadays, Differential Privacy (DP) [5] has become the de facto standard for ensuring data privacy in the database community [6] and was adopted by the U.S. Census in 2020 [7]. The definition of DP assures that each user's data has minimal influence on the output of statistical queries on a database. In the classical DP setting, a trusted server is assumed to hold all users' data and provide noisy answers to queries. However, organizations or companies collecting users' data may not be trustworthy, and the data storage system may not be secure. As a result, local privacy-protection mechanisms have recently gained attention, as the local setting allows data aggregation while protecting each user's data without relying on a trusted third party.

1.1 Local Privacy Notions

In local privacy-preserving data release, individuals perturb their data locally before uploading it. Organizations that want to take advantage of users' data then aggregate over the collected data. The earliest such mechanism is randomized response (RR) [8], which randomly perturbs each user's data. However, the original RR does not have formal privacy guarantees. Later, Local Differential Privacy (LDP) was proposed as a local variant of DP that bounds the privacy leakage in the local setting [9]. Many schemes have been proposed under the notion of LDP, for example, [10]–[12] and Google's RAPPOR [13]. LDP-based data aggregation mechanisms have already been deployed in the real world: in June 2016, Apple announced that it would deploy LDP-based mechanisms for data collection [14]. However, Tang et al. show that although Apple's deployment ensures that the privacy budget^1 of each datum submitted to its servers is 1 or 2, the overall privacy budget permitted by the system can be as high as 16. Wang et al. proposed a variety of LDP protocols for frequency estimation [15] and compared their performance with Google's RAPPOR.

1. The parameter ε ≥ 0 measures the privacy level. A smaller ε corresponds to a higher privacy level.
However, for a given reasonable privacy budget, these protocols provide limited utility. Intuitively, compared with the central DP model, it is more challenging to achieve a good utility-privacy tradeoff in the local setting, for two main reasons. (1) LDP requires introducing noise at a significantly higher level than the central setting. That is, for a summation/count query with an additive-noise privacy-preserving mechanism, a lower bound of Ω(√N) on the noise magnitude is required under LDP in order to defend against potential coalitions of compromised users, where N is the number of users; in contrast, only O(1) is required for central DP [16]. (2) LDP does not assume a neighborhood constraint on the input data, so for data with a large domain, LDP leads to significantly reduced utility [17].

In general, both local and central DP provide strong context-free theoretical guarantees against worst-case adversaries [18]. Context-free means the adversary may possess arbitrary background knowledge of a user's data (except her specific input instance). In other words, the definition of (L)DP is too strong and disregards scenarios where a particular context or prior knowledge of the data is available. Such scenarios exist in many applications. For instance, in the Internet of Things (IoT), the prior distribution of context related to sensor data plays a critical role in distributed data transmission and computation [19]. Another example is location-based services: people are more likely to be at some locations than others; for example, in Paris, people are more likely to be close to the Eiffel Tower than to a nearby coffee shop [20]. In mobile-health data collection, background knowledge such as the likelihood of people having certain diseases is available through previously published medical studies [21]. When background information is available, (L)DP fails to capture the explicit privacy leakage of users or the information gain at the adversary.
On the other hand, for a given utility, (L)DP may not always be feasible depending on the privacy budget [22]. Although approximate (ε, δ)-(L)DP was introduced [23] to realize an achievable mechanism, the non-negative addend δ can be large enough (close to 1) to provide only a limited privacy guarantee.

1.2 Relaxing Local Differential Privacy

There is a trend in the privacy research community of leveraging background knowledge to relax the definition of DP, so that utility can be increased by explicitly modeling the adversary's knowledge. Privacy notions that consider such prior knowledge are called "context-aware" privacy notions. For context-aware privacy notions, besides the privacy budget ε, the amount of required noise also depends on the prior distribution of the data: context-dependent privacy mechanisms add noise selectively according to the data prior, where it is most needed, so that utility can be enhanced. For example, less noise is required to perturb data with higher certainty [18], [24]. In general, the existing context-aware privacy definitions fall into two categories, based on either average-case or worst-case guarantees. All information-theoretic privacy notions belong to the former class [25]–[27]; the latter includes Pufferfish [28], Bayesian DP [29], Membership privacy [30], etc. Average-case notions are generally weaker than worst-case ones, since they cannot bound the leakage for all input and output pairs, which may not be acceptable to privacy-sensitive users. On the other hand, existing context-aware worst-case privacy notions like Pufferfish and Bayesian DP still follow the same structure as (L)DP: the maximum ratio between two likelihoods of a certain output given different input data. Since the relationship with the prior distribution is not directly captured in the definition, context-aware privacy mechanism design becomes challenging (either high complexity or not easily composable).
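As a concrete instance of this selective-noise idea, the following sketch reproduces the arithmetic of the binary example discussed in Fig. 1: binary data with prior (0.2, 0.8), prior-dependent flip probabilities (0.55, 0.1), and privacy budget ε = 0.6. The symmetric LDP baseline is assumed here to be the standard binary randomized response, whose flip probability is 1/(1 + e^ε).

```python
import math

eps = 0.6
prior = [0.2, 0.8]       # Pr(X = 0), Pr(X = 1), from the Fig. 1 example
lip_flip = [0.55, 0.1]   # prior-dependent flip probabilities (Fig. 1)

# Context-free LDP: symmetric binary randomized response.
ldp_flip = 1 / (1 + math.exp(eps))

# Context-aware LIP: average flip probability, weighted by the prior.
lip_avg_flip = sum(p * f for p, f in zip(prior, lip_flip))

print(round(ldp_flip, 2))      # 0.35
print(round(lip_avg_flip, 2))  # 0.19
```

A lower average flip probability translates directly into a lower mean square error for the aggregate, which is the utility gain the figure illustrates.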
1.3 Local Information Privacy

In this paper, we use the maximum ratio of posterior to prior to capture information leakage in the local setting, denoted as local information privacy (LIP). Originally, information privacy (IP) was proposed in a central setting by Calmon et al. [31], which requires a trusted curator. The main obstacle preventing centralized IP from being adopted in practice is that the joint distribution of all users' data is too complex to express or capture, especially for a large dataset. In contrast, LIP requires only the prior distribution of one particular user's data, which can be obtained through many approaches in practice.

An illustrative example of why context-aware privacy notions result in increased utility is shown in Fig. 1, which depicts the perturbation mechanisms of a context-free (LDP) notion and a context-aware (LIP) notion, and compares their mean square errors when collecting private binary data with a specific prior. We illustrate the optimal perturbation probabilities for the same privacy budget (ε = 0.6) under both the LDP and LIP privacy notions. Observe that the perturbation channel of LDP is symmetric, while LIP designs the perturbation parameters according to the prior knowledge. When the data value is quite certain, it has a smaller probability of being flipped, which increases utility; when the data takes a value that has a small probability of occurring, the mechanism protects its privacy with a large perturbation probability (a large amount of additive noise). In this example, the probability of flipping the data value through the LDP mechanism is 0.35, in contrast to 0.2 × 0.55 + 0.8 × 0.1 = 0.19 for the LIP-based mechanism. As a result, LIP leads to enhanced utility over LDP.

1.4 Related Work

In the original paper on differential privacy, Dwork et al.
[32] defined a notion of "semantic" privacy that involves comparing the prior and posterior distributions of the database or of a user's participation. Since then, similar privacy notions have been investigated in the central setting. For example, in [33], ε-Semantic Privacy is studied, which captures the additional information caused by releasing a contingency table; privacy is measured by the absolute distance between the prior-to-posterior ratio and 1, and the ratio is allowed to scale linearly with ε. In [34], semantic privacy is redefined by capturing the statistical difference between two posterior beliefs at the adversary; the posterior probabilities are calculated from the priors of two neighboring datasets and the same output of the mechanism. In [35], privacy is measured by the prior-to-posterior ratio, at the adversary, of the event that one user's tuple belongs to a collection of records. More specifically, in [30], ε-Membership privacy measures the adversary's prior and posterior beliefs on whether the tuple of the target user belongs to the dataset.

Figure 1. LIP increases utility by explicitly designing perturbation parameters according to prior knowledge.

However, the privacy notions described above consider the central setting, and the input of the mechanism is a dataset rather than each individual's data, which makes them inconvenient to adapt to the local setting. To the best of our knowledge, the prior-to-posterior structure has not yet been thoroughly explored in privacy definitions for the local setting, where each individual releases a privatized answer directly to an untrusted third party.

To avoid explicitly modeling the adversary's background knowledge, a more robust and practical way to define privacy is to relax the exact-prior assumption.
In [21], bounded-prior differential privacy is studied, which assumes that the real prior distribution comes from a bounded subset of the probability simplex. In [35], it is assumed that the adversary's belief about the targeted individual's membership is upper bounded. Pufferfish privacy [28] also assumes bounded knowledge of the adversary; the knowledge is captured by a set P, which contains all plausible evolution scenarios of the hidden secret and the input data. As a result, by adjusting the size of P, Pufferfish can be viewed as a generalization of DP that accounts for prior knowledge. In this paper, we also define a bounded set of priors to avoid modeling the adversary's knowledge explicitly.

On the other hand, in many applications, the user's secret information to be protected is different from, but correlated with, the data being collected. To this end, privacy notions leveraging a latent variable, like Pufferfish, enable a variety of definitions of data utility, such as principal inertia components [36], data patterns [37], distribution estimation [38], etc. However, one of the drawbacks of Pufferfish privacy is the difficulty of mechanism design. Recently, in [39], Wang et al. designed a Wasserstein Mechanism that achieves Pufferfish privacy, but it is computationally inefficient, and the mechanism they proposed is approximate. In this work, we combine the bounded prior set and the latent variable into the prior-to-posterior structure, and we show that LIP assumes only that the adversary has access to the statistics of the input data and the correlation with the latent variable, but not to the distribution of the latent variable itself.

To derive the utility-privacy tradeoff, there is a line of work formulating optimization problems to maximize utility subject to certain privacy constraints, or conversely [13], [15], [24], [40], [41].
First, most of these works define utility for specific applications, such as frequency estimation, itemset aggregation, statistic estimation, etc. In this paper, we consider a general type of utility defined by the mean square error of a function of the input and output. We show that, by instantiating it with different functions, the proposed mechanisms can be applied to multiple real-world applications. Second, only a few of the works above provide closed-form optimal solutions for mechanism design. In [15], optimization problems are formulated to increase the accuracy of frequency estimation, and different protocols are studied under LDP; however, the utility provided by these mechanisms is limited because no prior information about the data is incorporated into the mechanism. In [40], a utility-optimized LDP mechanism is proposed, which achieves better utility by exploiting the different sensitivity of each data input, a different type of context than priors.

1.5 Main Contributions

The main contributions of this paper are as follows:

(1) We propose Local Information Privacy (LIP) for local data release (without a trusted third party), which relaxes the notion of LDP by incorporating prior knowledge and introducing latent variables. We formally derive the relationships between existing privacy definitions and LIP.

(2) We apply LIP to privacy-preserving data aggregation: we present a general framework to estimate a function of the collected data and minimize the mean squared error of the estimation while protecting each individual's privacy by satisfying LIP constraints. We consider three perturbation mechanisms: one can be viewed as a general form of RR, while the other two incorporate unary encoding and local hashing. We derive the optimal mechanisms for different scenarios of prior uncertainty and of correlation between the input data and the latent secret.
(3) We consider two real-world applications in this paper: weighted summation and histogram estimation. We demonstrate that considering prior knowledge helps the curator design an unbiased estimator, which significantly improves data utility through post-processing; on the user side, we show how the proposed mechanisms can be applied to these two applications. Compared with LDP-based mechanisms, we show that LIP-based mechanisms provide enhanced utility.

(4) We validate our analysis by simulations on both synthetic and real-world datasets (Kosarak, a website click-stream dataset, and Adult, a survey of census income). We illustrate the impact of data correlation, input data domain, and prior uncertainty on the data utility provided by different mechanisms. Compared to LDP-based mechanisms, LIP-based mechanisms always provide better utility. For input data with a large domain, encoding methods can potentially provide higher utility than RR.

1.6 Paper Organization

The remainder of the paper is organized as follows. In Section 2, we introduce the proposed LIP notion and its relationship with other existing privacy notions. In Section 3, we introduce the system model and problem formulation. In Section 4, we derive the utility-privacy tradeoff, for both a model with a fixed prior and a model with an uncertain prior; under each model, encoding-based mechanisms are studied, and we then compare with the LDP-based model. Finally, we discuss the applications of these models, including weighted summation and histogram estimation. In Section 5, we present the simulation results and compare the utility-privacy tradeoffs provided by the different mechanisms under different data domains, data priors, and data correlations, on different datasets. In Section 6, we offer concluding remarks.
2 PRIVACY DEFINITIONS AND RELATIONSHIPS

In this section, we first recap several existing privacy notions in the local setting. We then introduce LIP and study its relationships with other notions. In this paper, we focus on discrete-valued data.

2.1 Privacy Definitions

Consider a privacy-protection mechanism M that takes input data X and outputs a perturbed version Y. It is assumed that X takes values from a discrete domain 𝒳 with prior distribution θ_X ∈ P_X, where P_X is the set containing all possible prior distributions on X. In the latent-variable setting, denote by G, taking values in 𝒢, the hidden secret that is correlated with X. Denote by θ_XG ∈ P_XG the joint distribution of X and G. Denote by 𝒴 = Range(M) the domain of Y.

The context-free LDP definition states that any two inputs from the data domain 𝒳 result in the same output with similar probabilities.

Definition 1 (ε-Local Differential Privacy (LDP) [10]). M satisfies ε-LDP for some ε ∈ R⁺ if, for all x, x′ ∈ 𝒳 and all y ∈ 𝒴:

Pr(Y = y | X = x) / Pr(Y = y | X = x′) ≤ e^ε.  (1)

LDP provides strong context-free privacy protection, since it provides indistinguishability of the input's data value regardless of the data prior distribution. Context-free notions typically suffer a poor utility-privacy tradeoff. We next introduce context-aware privacy definitions.

Maximal information leakage captures the adversary's ability without assuming a particular accessible prior.

Definition 2 (ε-Maximal Information Leakage (MIL) [42]). The maximal information leakage of M is defined as

L(X → Y) = log Σ_{y ∈ 𝒴} max_{x ∈ 𝒳} Pr(Y = y | X = x),  (2)

and M satisfies maximal-information-leakage privacy if L(X → Y) ≤ ε for some ε ∈ R⁺.

MIL captures the average likelihood over all possible y ∈ 𝒴, given, for each y, the value of x that maximizes this probability.
However, MIL does not provide pairwise protection over all possible values of x and y and hence is relatively weak.

Mutual information privacy measures the average information leakage about X contained in Y:

Definition 3 (ε-Mutual Information Privacy (MIP) [26]). M satisfies ε-MIP for some ε ∈ R⁺ if the mutual information between X and Y satisfies I(X; Y) ≤ ε, where

I(X; Y) = Σ_{x ∈ 𝒳, y ∈ 𝒴} Pr(X = x, Y = y) log [ Pr(X = x, Y = y) / (Pr(X = x) Pr(Y = y)) ].  (3)

Although MIP is context-aware, it provides relatively weak privacy protection, since it only bounds the average information leakage over all possible x and y in the domain.

Another context-aware privacy notion, which does provide pairwise protection over each possible pair of values x and y, is differential identifiability.

Definition 4 (ε-Differential Identifiability (DI) [43]). M satisfies ε-DI for some ε ∈ R⁺ if, for all x, x′ ∈ 𝒳 and all y ∈ 𝒴:

Pr(X = x | Y = y) / Pr(X = x′ | Y = y) ≤ e^ε.  (4)

The operational meaning of DI is that, given the output y, the adversary cannot tell whether the original data(set) is x or x′. DI can be directly adapted to the local setting, and it is context-aware due to its dependence on the data prior:

[Pr(Y = y | X = x) Pr(X = x)] / [Pr(Y = y | X = x′) Pr(X = x′)] ≤ e^ε.

One major drawback of DI is the difficulty of designing practical mechanisms: since DI measures a ratio of posteriors, the likelihood ratio (the perturbation parameters) of any two different inputs depends on the prior ratio. For example, if Pr(X = x)/Pr(X = x′) is small, DI requires Pr(Y = y | X = x′)/Pr(Y = y | X = x) to be large for all y ∈ 𝒴. However, this is infeasible, since Σ_{y ∈ 𝒴} Pr(Y = y | X = x′) = Σ_{y ∈ 𝒴} Pr(Y = y | X = x) = 1, so the ratio cannot be uniformly large.

Pufferfish privacy was originally proposed in the central setting [28]; here we adapt it to the local setting, where X and Y stand for the user's input and output data, respectively.
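Definitions 2 and 3 are straightforward to evaluate for a concrete channel. The sketch below, with purely illustrative values (a binary symmetric channel with flip probability 0.25 and a uniform prior, none of which come from this paper), computes both quantities in nats:

```python
import math

def max_info_leakage(channel):
    """MIL (Def. 2): log of the sum over outputs of the column-wise max likelihood."""
    n_out = len(channel[0])
    return math.log(sum(max(row[y] for row in channel) for y in range(n_out)))

def mutual_info(prior, channel):
    """MIP quantity (Def. 3): I(X;Y) in nats for a discrete prior and channel."""
    n_out = len(channel[0])
    p_y = [sum(prior[x] * channel[x][y] for x in range(len(prior)))
           for y in range(n_out)]
    return sum(prior[x] * channel[x][y] * math.log(channel[x][y] / p_y[y])
               for x in range(len(prior)) for y in range(n_out)
               if channel[x][y] > 0)

# Illustrative binary symmetric channel, flip probability 0.25
W = [[0.75, 0.25], [0.25, 0.75]]
print(max_info_leakage(W))          # log(0.75 + 0.75) ≈ 0.405
print(mutual_info([0.5, 0.5], W))   # ≈ 0.131 nats
```

Note that I(X; Y) is below L(X → Y) here, consistent with the fact that MIP bounds an average while MIL bounds a (column-wise) worst case.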
Definition 5 (Local Pufferfish Privacy). Given a set of potential secrets 𝒢, a set of discriminative pairs 𝒢_pairs, and a set of data evolution scenarios P_XG, M satisfies ε-Pufferfish(𝒢, 𝒢_pairs, P_XG) privacy for some ε ∈ R⁺ if
• for all possible inputs x ∈ 𝒳 and y ∈ 𝒴,
• for all pairs (g_i, g_j) ∈ 𝒢_pairs of potential secrets,
• for all distributions θ_XG ∈ P_XG such that Pr(g_i | θ_XG) ≠ 0 and Pr(g_j | θ_XG) ≠ 0,
the following holds:

e^{-ε} ≤ Pr(M(x) = y | θ_XG, g_i) / Pr(M(x) = y | θ_XG, g_j) ≤ e^ε.  (5)

Note that when the set P_XG spans all possible joint distributions, it includes the case X = G; for this special case, Local Pufferfish becomes equivalent to LDP.

Motivated by central information privacy [31], to provide a pairwise constraint on the information leakage of the secret G through Y in the local setting, we consider a bound on the ratio between the prior and the posterior, which leads to the notion of local information privacy. Denote by θ_X the prior of the input data X, and by T_GX the conditional probabilities Pr(X = x | G = g). Denote by θ_XG = {θ_X, T_GX} a fixed data evolution scenario. Local Information Privacy is defined as follows:

Figure 2. A summary of the different privacy notions and the relationships among them: (a) a summary of different privacy notions; (b) relationships between LIP and other privacy notions.

Definition 6 (ε-Local Information Privacy (LIP)). Given a set of potential secrets 𝒢 and a set of data evolution scenarios P_XG, M satisfies ε-LIP for some ε ∈ R⁺ if, for all g ∈ 𝒢, all θ_XG ∈ P_XG, and all y ∈ 𝒴:

e^{-ε} ≤ Pr(G = g | θ_XG) / Pr(G = g | Y = y, θ_XG) ≤ e^ε.
(6)

There are three cases regarding the range of P_XG:
• When P_XG contains a single given prior distribution, LIP becomes LIP with a fixed prior θ_XG;
• When P_XG includes all possible priors, LIP becomes Worst-Case LIP (WC-LIP);
• When P_XG includes a subset of all possible priors, LIP becomes Bounded-Prior LIP (BP-LIP).

The operational meaning of LIP is that, upon observing any output y, the belief that the latent variable takes any specific value does not increase or decrease too much compared with the prior distribution. Note that when ε is small, this ratio is bounded close to 1, which means the output Y is nearly independent of the latent secret G. Note also that LIP assumes the adversary has access to the statistics of the input data and to the conditional probabilities Pr(X = x | G = g), but not necessarily to the prior of the secret G. This assumption helps avoid modeling the adversary's abilities explicitly; moreover, it enables LIP to protect either a discrete- or a continuous-valued secret G.

LIP also guarantees that any post-processing of the output cannot further increase the privacy leakage.

Lemma 1. If G → X → Y → Z forms a Markov chain and M satisfies ε-LIP for any y ∈ 𝒴 and g ∈ 𝒢, then M also guarantees ε-LIP for any z ∈ 𝒵 and g ∈ 𝒢.

Proof. Since Pr(G = g | Z = z) = Σ_{y ∈ 𝒴} Pr(G = g | Y = y) Pr(Y = y | Z = z), it is bounded between min_{y ∈ 𝒴} Pr(G = g | Y = y) and max_{y ∈ 𝒴} Pr(G = g | Y = y). Since the ratio Pr(G = g)/Pr(G = g | Y = y) is bounded within [e^{-ε}, e^ε] for all g ∈ 𝒢 and y ∈ 𝒴, the ratio Pr(G = g)/Pr(G = g | Z = z) is also bounded within [e^{-ε}, e^ε].

This property enables the data curator to perform further data mining without increasing the privacy leakage.
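Lemma 1 can be sanity-checked numerically. The following sketch (with illustrative numbers, and taking G = X for simplicity) computes the tightest LIP budget of a mechanism before and after its output is passed through a post-processing channel:

```python
import math

def lip_budget(prior, channel):
    """Worst-case |log(Pr(X=x) / Pr(X=x|Y=y))| over all x, y (Def. 6 with G = X)."""
    n_out = len(channel[0])
    p_y = [sum(prior[x] * channel[x][y] for x in range(len(prior)))
           for y in range(n_out)]
    worst = 0.0
    for x in range(len(prior)):
        for y in range(n_out):
            posterior = prior[x] * channel[x][y] / p_y[y]
            worst = max(worst, abs(math.log(prior[x] / posterior)))
    return worst

def compose(channel, post):
    """Channel from X to Z when Y is passed through post-processing channel `post`."""
    return [[sum(row[y] * post[y][z] for y in range(len(post)))
             for z in range(len(post[0]))] for row in channel]

prior = [0.5, 0.5]
W = [[0.7, 0.3], [0.3, 0.7]]   # perturbation mechanism (illustrative)
P = [[0.8, 0.2], [0.2, 0.8]]   # post-processing applied to Y

before = lip_budget(prior, W)
after = lip_budget(prior, compose(W, P))
assert after <= before + 1e-12  # post-processing cannot increase leakage
```

With these numbers the budget shrinks from log(5/3) ≈ 0.51 to about 0.27, matching the lemma's claim that the Markov chain G → X → Y → Z can only contract the prior-to-posterior ratio.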
Compared to other context-aware definitions, LIP (including BP-LIP and WC-LIP) models the attainability of the prior comprehensively, including the scenarios where the prior is uncertain (WC-LIP can be viewed as context-free).

2.2 Relationships with Existing Definitions

2.2.1 LIP vs. LDP

Since LDP does not assume a latent variable, to make a fair comparison between LIP and LDP we assume the input X itself is private, i.e., G = X. Then the following relationship holds between fixed-prior LIP and LDP: ε-LIP implies 2ε-LDP, and ε-LDP implies ε-LIP (the proof is shown in [41]). This implies that ε-LIP is a more relaxed privacy notion than ε-LDP, but stronger than 2ε-LDP. Comparing ε-WC-LIP and ε-LDP, we have that ε-WC-LIP is equivalent to ε-LDP (the proof is shown in [44]). Intuitively, these two definitions are equivalent because both of them assume worst-case (context-free) priors. The relationship between LDP and BP-LIP is then straightforward: ε-BP-LIP is sandwiched between ε-LDP and ε-LIP. As a result, LIP, BP-LIP, and WC-LIP can be viewed as context-aware versions of LDP with different assumptions on the data priors. We further compare the utility-privacy tradeoffs of the two definitions in terms of optimal mechanism design in Sec. 4.4.

2.2.2 LIP vs. Local Pufferfish

We next compare LIP (BP-LIP, WC-LIP) with Local Pufferfish privacy according to the different scenarios of P_XG. The results in the next lemma follow from the proof of the relationship between LIP and LDP.

Lemma 2. The relationship between ε-LIP and ε-Local Pufferfish can be described as follows:
• ε-WC-LIP is equivalent to ε-Local Pufferfish when P_XG includes all possible θ_XG;
• ε-Local Pufferfish implies ε-BP-LIP, and ε-BP-LIP implies 2ε-Local Pufferfish, when P_XG includes a subset of all possible prior distributions θ_XG.
When P_XG includes all possible prior distributions of X and G, ε-Local Pufferfish covers the case X = G (where the leakage is maximized), which is equivalent to ε-LDP. In summary, Local Pufferfish relaxes LDP by defining a bounded set of possible prior distributions. Since the structure of the ratio of two likelihoods in the definition of LDP does not allow for the incorporation of prior knowledge, Pufferfish further extends it with a correlated latent variable. This definition is more general in terms of operational meaning than only protecting the input. However, it also brings difficulties in mechanism design compared to LDP, as each value of Pr(Y = y | G = g) averages over the likelihood probabilities Pr(Y = y | X = x), which are the perturbation parameters.

Table 1. List of symbols

ℛ : the universe of raw data values
R : raw data
θ : prior distribution
G : private latent variable
𝒳 : the universe of input values
X : input random variable
T : correlation with the latent variable
X̄ : set of input data
Y : output random variable
Ȳ : set of output data
N : total number of users
M : privatizing mechanism
q : set of perturbation parameters
f(·) : aggregation function
X̂ : estimator at the curator
Ŝ : aggregated result
ε : privacy budget
U : utility measurement
E : mean square error function
T : feasible region of q

2.2.3 LIP vs. Other Privacy Notions

We next compare LIP with MIP, MIL, and DI. Since these definitions assume neither a latent variable nor a bounded prior set, we simplify the definition of LIP as follows: given the prior θ_X, a mechanism M satisfies ε-LIP for some ε ∈ R⁺ if, for all x ∈ 𝒳 and y ∈ 𝒴:

e^{-ε} ≤ Pr(Y = y) / Pr(Y = y | X = x) ≤ e^ε.  (7)

Then ε-LIP provides a stronger privacy guarantee than ε-MIP, since Pr(X = x, Y = y) / [Pr(X = x) Pr(Y = y)] = Pr(X = x | Y = y) / Pr(X = x) ≤ e^ε.
ε-LIP also implies ε-MIL, since max_{x ∈ 𝒳} Pr(Y = y | X = x) ≤ Pr(Y = y) e^ε for every y. Intuitively, among LIP, MIP, and MIL, only LIP provides pairwise protection over each possible realization of x and y.

To compare LIP with DI, we first define the maximal log-ratio of two prior probabilities of X as D_∞^X = max_{x, x′ ∈ 𝒳} log [Pr(X = x) / Pr(X = x′)]. The relationship between LIP and DI then follows from the next lemma, whose proof is provided in Appendix A of the supplementary document.

Lemma 3. The relationship between LIP and DI is: ε-LIP implies (2ε + D_∞^X)-DI, and ε-DI implies (ε + D_∞^X)-LIP.

The characteristics, relationships, and ordering among the different privacy notions are summarized in Fig. 2. In summary, if a mechanism satisfies ε-LIP, it implies ε-MIP, ε-MIL, 2ε-LDP, 2ε-Pufferfish, and (2ε + D_∞^X)-DI. The main reasons we choose to study LIP instead of the other notions are as follows: (1) LIP is more amenable than other context-aware notions to incorporating prior knowledge into mechanism design. (2) Compared to context-free notions, LIP-based mechanisms achieve much higher utility. In the following sections, we address how to design LIP-based mechanisms according to the prior knowledge, and how LIP-based mechanisms improve the utility-privacy tradeoff for different types of applications.

3 MODELS AND PROBLEM FORMULATION

3.1 System and Threat Models

Consider a data aggregation system with N users and a data curator. Each user possesses discrete-valued data R_i ∈ ℛ, with prior distribution θ_R^i, specified by P_r^i = Pr(R_i = r), where i ∈ {1, 2, ..., N} is the user index. It is assumed that the R_i are independent of each other (and may have different distributions). Note that each R_i may be different from, but correlated with, some private hidden secret G_i ∈ 𝒢.
Denote $T^i_{GR}$ as the conditional probability of $R_i$ given $G_i$, and $\theta^i_{RG} = \{\theta^i_R, T^i_{GR}\}$. Denote $\mathcal{P}^i_{RG}$ as the bounded set including all possible $\theta^i_{RG}$.

Figure 3. System Model of Privacy-Preserving Data Aggregation.

To answer a query, each user locally generates data $X_i$ from $R_i$ via a query-dependent function $f_i$, i.e., $X_i = f_i(R_i)$. We assume $f_i$ is surjective: for any $x \in \mathcal{X}$, there is at least one $r \in \mathcal{R}$ such that $f_i(r) = x$. The prior distribution $\theta^i_X$ can then be calculated from $\theta^i_R$ according to the local function $f_i$, and is specified by $P^i_x = \Pr(X_i = x)$. Similarly, the correlation $T^i_{GX}$ between $X_i$ and $G_i$ can be obtained from $T^i_{GR}$ and $f_i$. Denote $\theta^i_{XG} = \{\theta^i_X, T^i_{GX}\}$, and $\mathcal{P}^i_{XG}$ as the bounded set including all possible $\theta^i_{XG}$. To avoid potential privacy leakage, before publishing $X_i$, each user locally perturbs it with a privacy-preserving mechanism $M_i$; the output is denoted $Y_i \in \mathcal{Y}$. The mechanism maps each possible input to each possible output with a certain probability (perturbation parameter). After receiving the perturbed data, the curator may further estimate and compute a statistical function of the collected data. The system model is depicted in Fig. 3. The curator is considered honest-but-curious due to both internal and external threats: on one hand, users' private data is profitable, and companies may be interested in tracking users or selling their data; on the other hand, data breaches happen from time to time due to hacking activities.
The curator aims at performing accurate estimations using all the information above, but is also interested in inferring each user's hidden secret $G_i$. Denote the true aggregated result by $S = f(\bar{R})$, where $\bar{R} = \{R_1, R_2, ..., R_N\}$. We discuss later the relationship between the $f$ operated at the curator and the $f_i$ computed by each user. The definition of $f(\cdot)$ varies across data aggregation applications. In this paper, two applications are considered:

• Weighted summation: the curator is interested in the summation over users' data, $S = \sum_{i=1}^N (c_i R_i + b_i)$. When every $c_i = 1$ and $b_i = 0$, this reduces to a direct summation, which is useful for computing the average value.
• Histogram estimation: the curator is interested in estimating how many users possess each data category in $\mathcal{R}$, or in classifying users by data value. Here $S$ is a set of "categorized" counts $\{S_1, S_2, ..., S_{|\mathcal{R}|}\}$ such that $\forall k \in \mathcal{R}$, $S_k = \sum_{i=1}^N \mathbb{1}\{R_i = k\}$, where $\mathbb{1}\{a = b\}$ is the indicator function, equal to 1 if $a = b$ and 0 otherwise.

The curator (adversary) observes all users' outputs $\bar{Y} = \{Y_1, Y_2, ..., Y_N\}$ and tries to estimate $S$ with an estimator $\hat{S}$.

In terms of prior availability, multiple scenarios can arise in practice: both the user and the curator may know $\theta^i_R$ exactly; one party may be uncertain about $\theta^i_R$; or the two parties may possess different, possibly inaccurate, prior knowledge. Within this paper's scope, we assume the curator always knows the exact $\theta^i_R$ ($\theta^i_{RG}$), as well as the algorithms/perturbation mechanisms the users deploy to publish their data. In the basic setting, we assume each user also possesses the exact prior (the same as the curator's); later we relax this assumption and consider an uncertain prior at the user.
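The two aggregation functions above can be sketched as follows; the function names are ours, not from the paper's code.

```python
# Sketch of the two aggregation targets f(.): weighted summation and
# histogram estimation over the users' raw data.

def weighted_sum(raw, c, b):
    """S = sum_i (c_i * R_i + b_i)."""
    return sum(ci * ri + bi for ri, ci, bi in zip(raw, c, b))

def histogram(raw, domain):
    """S_k = sum_i 1{R_i = k} for each category k in the raw-data domain."""
    return {k: sum(1 for ri in raw if ri == k) for k in domain}

raw = [1, 0, 2, 1, 1]
assert weighted_sum(raw, c=[1] * 5, b=[0] * 5) == 5      # direct summation
assert histogram(raw, domain=[0, 1, 2]) == {0: 1, 1: 3, 2: 1}
```

With all $c_i = 1$ and $b_i = 0$ the weighted sum reduces to the direct sum, matching the special case noted in the text.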
All the related symbols are listed in Table 1.

3.2 General Privacy and Utility Definitions

The privacy of each user's latent secret is guaranteed by LIP and is parameterized by the privacy budget $\epsilon$ in Definition 6: the smaller $\epsilon$ is, the stronger the privacy guarantee the mechanism provides. For simplicity, we consider $\epsilon$ to be the same for all users; it is straightforward to extend our model and results to scenarios where different users are given different $\epsilon$'s. When the exact prior $\theta^i_{XG}$ is not available to a user, he/she defines $\mathcal{P}^i_{XG}$ as a set of plausible priors containing $\theta^i_{XG}$ (users can always enlarge $\mathcal{P}^i_{XG}$ to ensure it includes $\theta^i_{XG}$). Under LIP, the privacy constraints can be formulated as: $\forall i \in \{1, ..., N\}$, $\forall g \in \mathcal{G}$, $\forall \theta^i_{XG} \in \mathcal{P}^i_{XG}$, and $\forall y \in \mathcal{Y}$,

$e^{-\epsilon} \le \frac{\Pr(G_i = g \mid Y_i = y, \theta^i_{XG})}{\Pr(G_i = g \mid \theta^i_{XG})} \le e^{\epsilon}.$  (8)

Denote $q^i_{xy} \triangleq \Pr(Y_i = y \mid X_i = x)$ and $t^i_{gx} \triangleq \Pr(X_i = x \mid G_i = g)$, $\forall x \in \mathcal{X}, g \in \mathcal{G}, y \in \mathcal{Y}$. By Bayes' rule, the privacy constraints in (8) can be expressed as:

$e^{-\epsilon} \le \frac{\sum_{x \in \mathcal{X}} q^i_{xy} t^i_{gx}}{\sum_{x \in \mathcal{X}} q^i_{xy} P^i_x} \le e^{\epsilon}.$  (9)

Let $q^i$ be the set of perturbation probabilities in $M_i$. When $\epsilon$ and each $\mathcal{P}^i_{XG}$ are given, the set of inequalities in Eq. (8) forms a feasible region $\mathcal{T}_i$ for $q^i$, $\forall i \in \{1, 2, ..., N\}$. The definition of utility depends on the application scenario: in statistical aggregation, estimation accuracy is often measured by absolute error or mean square error [45], [46]; in location tracking, it is typically measured by Euclidean distance [20]; under information-theoretic frameworks, distortion is typically applied [26]. In this paper, we denote the utility by $U(S, \hat{S})$. In general, there is a tradeoff between utility and privacy. We can formulate the following optimization problem to find the mechanism that yields the optimal tradeoff: $\max U(S, \hat{S})$, s.t.
$q^i \in \mathcal{T}_i, \forall i \in \{1, 2, ..., N\}$.  (10)

3.3 Problem Formulation

Focusing on the two applications discussed above, we define utility as the negative of the Mean Square Error (MSE), as adopted in many other works on frequency/histogram estimation [15], [46], [47]: $U(S, \hat{S}) = -E(S, \hat{S})$, where $E(S, \hat{S}) = \mathbb{E}[(S - \hat{S})^2]$. Note that for weighted summation the utility is data-alphabet dependent, while for histogram estimation it is data-alphabet independent; we show how MSE addresses these two different utilities in Sec. 4.5. Note also that the adversary can use the prior distribution of each user's input data for post-processing. From [48], it is well known that the estimator minimizing the mean square error (MMSE) is $\hat{S} = g(\bar{Y}) = \mathbb{E}[S \mid \bar{Y}]$. Since $\mathbb{E}[\mathbb{E}[S \mid \bar{Y}]] = \mathbb{E}[S]$, $\hat{S}$ is an unbiased estimator. We next formulate the problem under two cases: one for a fixed prior, the other for an uncertain prior.

3.3.1 Problem formulation for a fixed prior

Notice that, given each user's prior $\theta^i_{RG}$, the MSE $E(S, \hat{S})$ depends only on each user's perturbation parameters $\{q^i\}_{i=1}^N$, since any estimator $\hat{S}$ depends on the output $\bar{Y}$, whose distribution is a function of $\{q^i\}_{i=1}^N$. Thus, maximizing the utility is equivalent to finding the optimal parameters that minimize the MSE. As a result, (10) becomes:

$\min E(q^1, ..., q^N)$, s.t. $q^i \in \mathcal{T}^f_i, \forall i \in \{1, 2, ..., N\}$,  (11)

where $\mathcal{T}^f_i$ denotes the feasible region of $q^i$ for a fixed prior.

Problem Decomposition: Next, we show that the problem defined in Eq. (11) can be decomposed into local optimization problems, one per user. Since each user's input is independent of the others, all the $f(\cdot)$ functions above can be decomposed into local functions $f_i(\cdot)$ of each $R_i$.
Each of them then results in an MSE in aggregation, denoted by $E_i = \mathbb{E}[(f_i(R_i) - \mathbb{E}[f_i(R_i) \mid Y_i])^2]$ (for the histogram application, denote $E^k_i = \mathbb{E}[(f^k_i(R_i) - \mathbb{E}[f^k_i(R_i) \mid Y_i])^2]$ as the MSE of aggregating the $k$-th category with $R_i$). The utility defined in (11) satisfies the following decomposition theorem, with proof provided in Appendix B of the supplementary document:

Theorem 1. The global optimization problem defined in (11) can be decomposed into $N$ local optimization problems:

$\min_{\{q^i \in \mathcal{T}_i\}_{i=1}^N} E(q^1, ..., q^N) = \sum_{i=1}^N \min_{q^i \in \mathcal{T}_i} E_i(q^i).$  (12)

By Theorem 1, when each local mechanism is optimized, the global MSE of the system achieves its minimum. In addition, each user can perform its local optimization independently of the others, which well suits the local setting. Each local optimization problem incurs an MSE of:

$E_i(q^i) = \mathbb{E}[(f_i(R_i) - \mathbb{E}[f_i(R_i) \mid Y_i])^2] = \mathbb{E}\{\mathbb{E}[(X_i - \mathbb{E}[X_i \mid Y_i])^2 \mid Y_i]\} = \mathbb{E}[\mathrm{Var}(X_i \mid Y_i)] \overset{(a)}{=} \mathrm{Var}[X_i] - \mathrm{Var}[\mathbb{E}(X_i \mid Y_i)]$,  (13)

where (a) follows from the law of total variance.

Utility Gain from Observing $\bar{Y}$: We next compare with the case where no observation of $\bar{Y}$ is available, to show the utility gain of observing $\bar{Y}$. Since the curator possesses each $\theta^i_X$, his MSE-minimizing local estimator without observations is $\mathbb{E}[X_i]$, and each local MSE becomes:

$\mathrm{Var}[X_i] - \mathrm{Var}[\mathbb{E}(X_i)] = \mathrm{Var}[X_i] - \mathbb{E}[\mathbb{E}^2(X_i)] + \mathbb{E}^2[\mathbb{E}(X_i)] = \mathrm{Var}[X_i].$  (14)

Compared with (13), the utility gain of observing $\bar{Y}$ comes from the term $\mathrm{Var}[\mathbb{E}(X_i \mid Y_i)]$: when observations of $Y_i$ are available, this non-negative term helps increase data utility. In short, the MSE of the estimation is a function of the variance of each user's estimator.
Define $\hat{X}_i = \mathbb{E}[X_i \mid Y_i]$ as the local estimator for the $i$-th user; by the user-independence assumption, $\hat{S} = \sum_{i=1}^N \hat{X}_i$. As each $\mathrm{Var}[X_i]$ is a constant, each local optimization problem can be reformulated as:

$\min E_i(q^i) \equiv \max \mathrm{Var}(\hat{X}_i)$, s.t. $q^i \in \mathcal{T}^f_i, \forall i \in \{1, 2, ..., N\}$.  (15)

That is, the optimal solutions maximize the variance of the estimator, subject to the LIP constraints.

3.3.2 Problem formulation for an uncertain prior

Next, we consider the case where each user has uncertainty about $\theta^i_{RG}$/$\theta^i_{XG}$. Note that under the context-aware setting, the curator/adversary is assumed to possess the exact prior distribution. Such scenarios exist when users possess less information about the data and secrets than the curator. For example, the curator may have recorded a full history of a user's previously released data on the server, from which the user's prior can be inferred; the curator may estimate a global prior for all users by observing each user's released data; or a user may be highly correlated with someone (such as a family member or close friend) whose data has been collected or compromised, so that the user's prior can be inferred via the correlations. In the uncertain-prior model, the exact prior $\theta^i_X$ is not available to the user, so the prior-dependent utility function defined in (13) cannot be calculated either. In such a case, each user's local MSE is determined by his/her perturbation parameters as well as the exact prior distribution, i.e., $E_i(q^i)$ in (13) becomes $E_i(\theta^i_X, q^i)$. A feasible minimax strategy for each user is to find the prior $\tilde{\theta}^i_X \in \mathcal{P}^i_{XG}$ that maximizes $E_i(\tilde{\theta}^i_X, q^i)$ and then find the $q^{i*}$ that minimizes $E_i(q^i \mid \tilde{\theta}^i_X)$. Thus the problem for the $i$-th user becomes:

$\min_{q^i} \max_{\tilde{\theta}^i_{XG} \in \mathcal{P}^i_{XG}} E_i(\tilde{\theta}^i_{XG}, q^i)$, s.t. $q^i \in \mathcal{T}^u_i$.
(16)

Note that the feasible region $\mathcal{T}^u_i$ in (16) differs from $\mathcal{T}^f_i$: it uses the definition of BP-LIP, i.e., LIP must be satisfied for a family of priors. The utility function in Eq. (16) is $E_i(\tilde{\theta}^i_X, q^i) = \mathrm{Var}(X_i) - \mathrm{Var}(\hat{X}^{bp}_i)$, where $\hat{X}^{bp}_i$ is the optimal estimator at the curator. As $\mathrm{Var}(X_i)$ depends only on the exact prior $\theta^i_X$, the goal of each user is still to maximize $\mathrm{Var}(\hat{X}^{bp}_i)$. Thus Eq. (16) can be further expressed as:

$\max_{q^i \in \mathcal{T}^u_i} \min_{\tilde{\theta}^i_{XG} \in \mathcal{P}^i_{XG}} \mathrm{Var}[\hat{X}^{bp}_i(\tilde{\theta}^i_{XG}, q^i)].$  (17)

4 MECHANISM DESIGN AND UTILITY-PRIVACY TRADEOFF

In this section, we study utility-privacy tradeoffs under the LIP framework. We start with the generalized RR mechanism for the model with a fixed prior, then extend to the model with an uncertain prior. After that, we study mechanisms with local hashing and unary encoding, followed by a comparison to LDP-based mechanisms. Finally, we show how LIP-based mechanisms can be applied in real-world applications.

4.1 Optimal Mechanism for a Fixed Prior

In general, the closed-form optimal solution of the constrained optimization problem (15) cannot be derived directly, as the number of linear constraints grows quadratically with the dimensions of $X_i$ and $G_i$, and the active constraints depend on the concrete prior and correlation. We present numerical results and the properties of the general model in Sec. 5. We next study some useful properties of problem (15). For the privacy constraints, note that:

$e^{-\epsilon} \le \frac{\min_{x \in \mathcal{X}} q^i_{xy}}{\sum_{x \in \mathcal{X}} q^i_{xy} P^i_x} \le \frac{\sum_{x \in \mathcal{X}} q^i_{xy} t^i_{gx}}{\sum_{x \in \mathcal{X}} q^i_{xy} P^i_x} \le \frac{\max_{x \in \mathcal{X}} q^i_{xy}}{\sum_{x \in \mathcal{X}} q^i_{xy} P^i_x} \le e^{\epsilon}$,  (18)

which means that when $Y_i$ is released satisfying $\epsilon$-LIP with respect to $X_i$, the privacy metric in (9) is satisfied automatically.
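The implication stated after Eq. (18) can be illustrated numerically: once the channel satisfies $\epsilon$-LIP with respect to $X_i$, constraint (9) holds for any correlation with a latent variable. The binary prior and symmetric channel below are our illustrative choices, not from the paper.

```python
import math

# Illustration of Eq. (18): if Y satisfies eps-LIP w.r.t. the input X,
# then constraint (9) on any correlated latent variable G holds as well.

eps = math.log(3)
e = math.exp(eps)
P = [0.5, 0.5]                          # prior of X
q = [[e / (1 + e), 1 / (1 + e)],        # q[x][y] = Pr(Y=y | X=x)
     [1 / (1 + e), e / (1 + e)]]
lam = [sum(P[x] * q[x][y] for x in range(2)) for y in range(2)]

# eps-LIP with respect to X: e^-eps <= q_xy / lambda_y <= e^eps.
for x in range(2):
    for y in range(2):
        assert math.exp(-eps) - 1e-12 <= q[x][y] / lam[y] <= e + 1e-12

# Constraint (9) then holds for ANY row t_g = (Pr(X=0|G=g), Pr(X=1|G=g));
# we spot-check a few, including the fully revealing t_g = (1, 0).
for t in ([1.0, 0.0], [0.2, 0.8], [0.5, 0.5]):
    for y in range(2):
        ratio = sum(q[x][y] * t[x] for x in range(2)) / lam[y]
        assert math.exp(-eps) - 1e-12 <= ratio <= e + 1e-12
```

This matches the sandwich in (18): the averaged ratio in (9) is squeezed between the minimum and maximum per-input ratios, which the $X$-level LIP constraints already bound.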
As a result, protecting the privacy of a latent variable rather than the input data enlarges the feasible region of the perturbation parameters, and hence increased utility can be achieved. We next show that, under some conditions, the privacy requirements are met without introducing any noise.

Proposition 1. For the constrained optimization problem defined in (15), if for some $a \in \mathcal{X}$,

$\max\left(\frac{\max_{g \in \mathcal{G}} t^i_{ga}}{P^i_a}, \frac{P^i_a}{\min_{g \in \mathcal{G}} t^i_{ga}}\right) \le e^{\epsilon}$,  (19)

then the optimal solution is $q^{i*}_{ma} = 0$ and $q^{i*}_{aa} = 1$, $\forall m \ne a$.

Proof. Suppose for some $a \in \mathcal{X}$, $q^i_{aa} = 1$ and $q^i_{ma} = 0$; based on Eq. (9), $\forall g \in \mathcal{G}$ we need $e^{-\epsilon} \le \frac{t^i_{ga}}{P^i_a} \le e^{\epsilon}$. Conversely, if this condition is satisfied, to maximize utility the mechanism decreases $q^i_{ma}$ while increasing $q^i_{mm}$; in the extreme case, $q^{i*}_{ma} = 0$ and $q^i_{aa} = 1$, which means that if $X_i = a$, the mechanism directly releases $Y_i = X_i$.

It is straightforward to extend the result in Proposition 1 to: if $\forall x \in \mathcal{X}$, $\max\left(\frac{\max_{g \in \mathcal{G}} t^i_{gx}}{P^i_x}, \frac{P^i_x}{\min_{g \in \mathcal{G}} t^i_{gx}}\right) \le e^{\epsilon}$, then the mechanism directly releases $Y_i = X_i$. Notice that, $\forall x \in \mathcal{X}$ and $\forall g \in \mathcal{G}$, the bounded ratio in (19) equals 1 when $X_i$ and $G_i$ are independent, which means directly releasing $X_i$ leaks no information about $G_i$. If the ratio is close to 1, with its deviation bounded within $[e^{-\epsilon}, e^{\epsilon}]$, directly releasing $X_i$ also does not violate LIP.

4.1.1 Optimal RR Mechanism under the Binary Model

Next, we derive closed-form optimal solutions for the model with binary input/output, where the input is arbitrarily correlated with a binary latent variable. Denote $\mathcal{B}$ as the binary domain $\{0, 1\}$. The binary model is widely used for surveys, where each individual's data is first mapped to one bit, then randomly perturbed before being published to the curator. In the binary model, $\mathcal{G} = \mathcal{R} = \mathcal{X} = \mathcal{B}$ (shown in Fig. 4(a); we omit $R_i$ for simplicity as $\theta^i_{XG}$ can be calculated
given $\theta^i_R$, $f_i$, and $T^i_{GR}$).

Figure 4. Different perturbation mechanisms considered in this paper: (a) mechanism for the binary model with a latent variable; (b) RR mechanism for the M-ary model; (c) mechanism of LH-LIP; (d) mechanism of UE-LIP (each bit perturbed independently).

$\mathrm{Var}(X_i)$ in (13) becomes $P^i_1(1 - P^i_1)$. Denote the perturbation parameters as $\Pr(Y_i = 1 \mid X_i = 0) = q^i_0$ and $\Pr(Y_i = 0 \mid X_i = 1) = q^i_1$. The local MMSE estimator $\hat{X}^b_i$ (the superscript $b$ denotes the binary model) then becomes:

$\hat{X}^b_i = \mathbb{E}[X_i \mid Y_i] = P^i_1\left(\frac{q^i_1}{\lambda^i_0}(1 - Y_i) + \frac{1 - q^i_1}{\lambda^i_1} Y_i\right)$,  (20)

where $\lambda^i_0 = \Pr(Y_i = 0)$ and $\lambda^i_1 = \Pr(Y_i = 1)$. Then, the utility-privacy tradeoff can be formulated as:

$\max_{(q^i_0, q^i_1) \in \mathcal{T}^f_i} \mathrm{Var}(\hat{X}^b_i).$  (21)

Define $t^{iu}_{g1} = \max_{g \in \mathcal{G}} t^i_{g1}$ and $t^{il}_{g1} = \min_{g \in \mathcal{G}} t^i_{g1}$. The optimal $q^i_1$ and $q^i_0$ are given by the following theorem, with proof provided in Appendix C of the supplementary document.

Theorem 2. The optimal $q^i_0$ and $q^i_1$ of the problem defined in (21) are:

$q^{i*}_0 = \max\left\{0,\; \frac{t^{iu}_{g1} - P^i_1 e^{\epsilon}}{(e^{\epsilon}+1)(t^{iu}_{g1} - P^i_1)},\; \frac{P^i_1 - t^{il}_{g1} e^{\epsilon}}{(e^{\epsilon}+1)(P^i_1 - t^{il}_{g1})}\right\}$,

$q^{i*}_1 = \max\left\{0,\; \frac{1 + t^{iu}_{g1} e^{\epsilon} - e^{\epsilon} - P^i_1}{(e^{\epsilon}+1)(t^{iu}_{g1} - P^i_1)},\; \frac{1 + P^i_1 e^{\epsilon} - e^{\epsilon} - t^{il}_{g1}}{(e^{\epsilon}+1)(P^i_1 - t^{il}_{g1})}\right\}$.
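The closed forms of Theorem 2 can be evaluated and checked against the LIP constraint (9) directly; the concrete prior and correlation below are our illustrative choices (the code assumes the non-degenerate case $t^{il}_{g1} < P^i_1 < t^{iu}_{g1}$, so the denominators are nonzero).

```python
import math

def binary_rr_lip(eps, p1, t_up, t_low):
    """Closed forms of Theorem 2 for the binary model.

    p1    : Pr(X_i = 1)
    t_up  : max_g Pr(X_i = 1 | G_i = g)
    t_low : min_g Pr(X_i = 1 | G_i = g)
    Returns (q0, q1) = (Pr(Y=1|X=0), Pr(Y=0|X=1))."""
    e = math.exp(eps)
    q0 = max(0.0,
             (t_up - p1 * e) / ((e + 1) * (t_up - p1)),
             (p1 - t_low * e) / ((e + 1) * (p1 - t_low)))
    q1 = max(0.0,
             (1 + t_up * e - e - p1) / ((e + 1) * (t_up - p1)),
             (1 + p1 * e - e - t_low) / ((e + 1) * (p1 - t_low)))
    return q0, q1

# Example: Pr(G=1)=0.4, Pr(X=1|G=1)=0.6, Pr(X=1|G=0)=0.1, hence p1 = 0.3.
eps = math.log(2)
q0, q1 = binary_rr_lip(eps, p1=0.3, t_up=0.6, t_low=0.1)

# Verify the eps-LIP constraint (9) on G for both outputs:
# e^-eps <= Pr(Y=y|G=g)/Pr(Y=y) <= e^eps.
for t in (0.6, 0.1):
    for y in (0, 1):
        py_g = t * (1 - q1) + (1 - t) * q0 if y == 1 else t * q1 + (1 - t) * (1 - q0)
        py = 0.3 * (1 - q1) + 0.7 * q0 if y == 1 else 0.3 * q1 + 0.7 * (1 - q0)
        assert math.exp(-eps) - 1e-9 <= py_g / py <= math.exp(eps) + 1e-9
```

In this instance the mechanism only flips 0s ($q_0 = 1/6$, $q_1 = 0$), since the larger gap between $t^{iu}_{g1}$ and $P^i_1$ on one side drives which constraint binds.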
The key insight from the binary model with a latent variable is: when $X$ is highly correlated with $G$ ($t^{iu}_{g1}$ is large and $t^{il}_{g1}$ is small), $X$ should be privatized with more noise in order to protect $G$; when $X$ is almost independent of $G$ ($t^{iu}_{g1}$ and $t^{il}_{g1}$ are close to $P^i_1$), $X$ can be released with only slight perturbation.

4.1.2 Optimal RR Mechanism under the M-ary Model when each $R_i = G_i$

Next, we derive the closed-form optimal solutions for the LIP-based RR mechanism under the M-ary model when each $R_i = G_i$, i.e., the raw data $R_i$ itself is private. We use "M-ary" to denote that the input $X_i$ can take multiple possible values. We start from the case where $f_i$ is a bijective or identity function, i.e., for every $x \in \mathcal{X}$ there is exactly one $r \in \mathcal{R}$ such that $f_i(r) = x$, and later extend the optimal solutions to the case where $f_i$ is surjective. Note that when $f_i$ is bijective, the mapping from $R_i$ to $X_i$ is a permutation, so $P^i_x = P^i_r$ with $r = f_i^{-1}(x)$. Under an RR perturbation mechanism, the perturbation channel and corresponding parameters are shown in Fig. 4(b). Denote $\mathcal{X} = \{a_1, a_2, ..., a_d\}$, $\Pr(X_i = a_m) = P^i_m$ as the prior distribution of $X_i$, and $\Pr(Y_i = a_k) = \lambda^i_k$ as the marginal distribution of $Y_i$. When $f_i$ is bijective, the privacy constraints of (9) become, $\forall m, k \in \{1, 2, ..., d\}$:

$e^{-\epsilon} \le \frac{q^i_{mk}}{\lambda^i_k} \le e^{\epsilon}.$  (22)

In the utility function of (13), $\mathrm{Var}[X_i] = \sum_{m=1}^d a_m^2 P^i_m - \left(\sum_{m=1}^d a_m P^i_m\right)^2$, and the local estimator becomes:

$\hat{X}^m_i = \mathbb{E}[X_i \mid Y_i] = \sum_{m=1}^d a_m \Pr(X_i = a_m \mid Y_i) = \sum_{m=1}^d \sum_{k=1}^d a_m \Pr(X_i = a_m \mid Y_i = a_k)\, \mathbb{1}^i_k$,  (23)

where the superscript $m$ denotes the M-ary model, and $\mathbb{1}^i_k$ is the indicator function $\mathbb{1}^i\{Y_i = a_k\}$. Each $\mathbb{1}^i_k$ can be regarded as a binary random variable with distribution $\Pr(\mathbb{1}^i_k = 1) = \lambda^i_k$ and $\Pr(\mathbb{1}^i_k = 0) = 1 - \lambda^i_k$.
As a result, $\mathrm{Var}[\mathbb{1}^i_k] = \lambda^i_k(1 - \lambda^i_k)$ and $\mathrm{Cov}[\mathbb{1}^i_k, \mathbb{1}^i_l] = -\lambda^i_k \lambda^i_l$. Substituting these into (23):

$\mathrm{Var}(\hat{X}^m_i) = \sum_{m=1}^d \sum_{n=1}^d \sum_{k=1}^d a_m a_n \frac{P^i_m q^i_{mk}}{\lambda^i_k} \frac{P^i_n q^i_{nk}}{\lambda^i_k} \mathrm{Var}[\mathbb{1}^i_k] + \sum_{m=1}^d \sum_{n=1}^d \sum_{k=1}^d \sum_{l=1; l \ne k}^d a_m a_n \frac{P^i_m q^i_{mk}}{\lambda^i_k} \frac{P^i_n q^i_{nl}}{\lambda^i_l} \mathrm{Cov}[\mathbb{1}^i_k, \mathbb{1}^i_l] = \sum_{m=1}^d \sum_{n=1}^d a_m a_n P^i_m P^i_n \left(\sum_{k=1}^d \frac{q^i_{mk} q^i_{nk}}{\lambda^i_k} - 1\right).$  (24)

So far, Eq. (15) can be further expressed as, $\forall m, k \in \{1, 2, ..., d\}$:

$\max \mathrm{Var}(\hat{X}^m_i)$, s.t. $e^{-\epsilon} \le \frac{q^i_{mk}}{\lambda^i_k} \le e^{\epsilon}.$  (25)

The globally optimal solutions are given by the following theorem, with detailed proof provided in Appendix D of the supplementary document.

Theorem 3 (Optimal RR-LIP mechanism under the M-ary model). For the constrained optimization problem defined in (25), the optimal solutions for the $i$-th user are: $q^{i*}_{mm} = 1 - (1 - P^i_m)/e^{\epsilon}$, $q^{i*}_{mk} = P^i_k/e^{\epsilon}$, $\forall m, k \in \{1, 2, ..., d\}, m \ne k$.

The constrained optimization problem defined in (25) can be visualized in Fig. 5 (taking a binary example): the curves are the contours of $\mathrm{Var}(\hat{X}^m_i)$, and the shaded area is the feasible region $\mathcal{T}^f_i$ for a fixed prior and $\epsilon$. The optimal solutions are found at the boundary of the feasible region, at intersections of linear constraints.

Figure 5. Illustration of the optimal solutions to the binary LIP model.

From Theorem 3, as $\epsilon$ increases, all the $q^i_{mm}$'s increase while all the $q^i_{mk}$'s ($m \ne k$) decrease. The values of the $q^i_{mk}$'s are proportional to the $P^i_k$'s,
i.e., the optimal mechanism is more likely to output the values with larger priors.

Figure 6. Illustration of the perturbation parameters under two different $\epsilon$'s and a fixed prior.

Note that the optimal solutions in Theorem 3 are similar to, but different from, a staircase mechanism [10] defined for LDP, in which the likelihood ratio $\frac{\Pr(Y=y \mid X=x)}{\Pr(Y=y \mid X=x')}$ evaluated at any $x, x' \in \mathcal{X}, y \in \mathcal{Y}$ takes values in $\{e^{\epsilon}, 1, e^{-\epsilon}\}$. The similarity lies in that the maximized utility is achieved with parameters that exactly meet the privacy constraints. The difference is that the solutions in Theorem 3 make most constraints meet $e^{\epsilon}$, while some take values within $[e^{-\epsilon}, 1]$. We further illustrate the structure of the optimal mechanism through the following example. Suppose $\mathcal{X} = \{1, 2, 3\}$ and, for the $i$-th user, $P_1 = 0.1$, $P_2 = 0.2$, $P_3 = 0.7$. By Theorem 3, $q^*_{11} = 1 - 0.9/e^{\epsilon}$, $q^*_{22} = 1 - 0.8/e^{\epsilon}$, $q^*_{33} = 1 - 0.3/e^{\epsilon}$, $q^*_{21} = q^*_{31} = 0.1/e^{\epsilon}$, $q^*_{12} = q^*_{32} = 0.2/e^{\epsilon}$, $q^*_{13} = q^*_{23} = 0.7/e^{\epsilon}$. As $\epsilon$ grows, $q^*_{11}$, $q^*_{22}$, and $q^*_{33}$ also increase, meaning $X_i$ is more likely to be published directly ($Y_i = X_i$). When $\epsilon$ is small, since "3" has a larger prior than "1" and "2", the mechanism is more likely to output $Y_i = 3$ when $X_i = 1$ or $X_i = 2$, satisfying the LIP constraints by increasing the posterior $\Pr(X_i = 3 \mid Y_i)$.
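The worked example above, and the closed form of Eq. (24), can both be reproduced in a few lines; this is a sketch assuming a bijective $f_i$, with the alphabet values $a$ chosen by us.

```python
import math

def rr_lip_matrix(P, eps):
    """Optimal RR-LIP channel from Theorem 3: q[m][m] = 1-(1-P_m)/e^eps,
    q[m][k] = P_k/e^eps for k != m."""
    e, d = math.exp(eps), len(P)
    return [[1 - (1 - P[m]) / e if m == k else P[k] / e for k in range(d)]
            for m in range(d)]

def var_estimator(P, q, a):
    """Var(X_hat) via the closed form in Eq. (24)."""
    d = len(P)
    lam = [sum(P[m] * q[m][k] for m in range(d)) for k in range(d)]
    return sum(a[m] * a[n] * P[m] * P[n] *
               (sum(q[m][k] * q[n][k] / lam[k] for k in range(d)) - 1)
               for m in range(d) for n in range(d))

P, a, eps = [0.1, 0.2, 0.7], [1.0, 2.0, 3.0], math.log(2)
q = rr_lip_matrix(P, eps)

# Worked example from the text (eps = ln 2): q11 = 0.55, q33 = 0.85, q13 = 0.35.
assert abs(q[0][0] - 0.55) < 1e-12 and abs(q[2][2] - 0.85) < 1e-12
assert abs(q[0][2] - 0.35) < 1e-12

# Rows are proper distributions, and the output marginal equals the prior,
# so the mechanism is more likely to output the values with larger priors.
for row in q:
    assert abs(sum(row) - 1) < 1e-12
lam = [sum(P[m] * q[m][k] for m in range(3)) for k in range(3)]
assert all(abs(lam[k] - P[k]) < 1e-12 for k in range(3))

# Cross-check Eq. (24) against direct enumeration of Var(E[X|Y]).
xhat = [sum(a[m] * P[m] * q[m][k] for m in range(3)) / lam[k] for k in range(3)]
mean = sum(p * v for p, v in zip(P, a))
direct = sum(lam[k] * xhat[k] ** 2 for k in range(3)) - mean ** 2
assert abs(var_estimator(P, q, a) - direct) < 1e-12
```

A side observation of this instance: under the Theorem-3 parameters, the output marginal coincides with the input prior, which makes the $\epsilon = 0$ endpoint (output drawn from the prior, independent of the input, as in Fig. 6) a natural limit of the same formula.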
The perturbation parameters are illustrated in Fig. 6. We next relax the assumption that $f_i$ is bijective and extend to the case where $f_i$ is surjective. The optimal solution is provided in the following corollary.

Corollary 1. The form of the optimal perturbation parameters $q^i_{xy}$ when $f_i$ is surjective is identical to that shown in Theorem 3. The difference lies in the prior of $X$: in Theorem 3, $P^i_m = P^i_r$ where $f_i(r) = a_m$; when $f_i$ is surjective, $P^i_m = \sum_{r: f_i(r) = a_m} P^i_r$.

Proof. When $f_i$ is surjective, the privacy metric in (9) can be expressed as:

$\frac{\sum_{x \in \mathcal{X}} \Pr(X_i = x \mid R_i = r)\, q^i_{xk}}{\lambda^i_k} = \frac{q^i_{mk}}{\lambda^i_k}$,  (26)

where $f_i(r) = a_m$. For any surjective function $f_i$ and any $r \in \mathcal{R}$, there exists exactly one $a_m \in \mathcal{X}$ such that $f_i(r) = a_m$. Therefore, requiring $\sum_{x \in \mathcal{X}} \Pr(X_i = x \mid R_i = r)\, q^i_{xk} / \lambda^i_k \in [e^{-\epsilon}, e^{\epsilon}]$ for all $r \in \mathcal{R}$ is equivalent to requiring $q^i_{mk}/\lambda^i_k \in [e^{-\epsilon}, e^{\epsilon}]$ for all $x \in \mathcal{X}$. Since the utility definition and the privacy constraints are identical to those of the optimization problem defined in (25), the optimal solutions take the same form.

Optimal Output Range: Next, we discuss the optimal output domain of the RR mechanism. Denote $\mathcal{X} = \{a_1, a_2, ..., a_d\}$ and $\mathcal{Y} = \{a_1, a_2, ..., a_f\}$. The following lemma shows that when $d$ is fixed, the optimal output size is $f^* = d$.

Lemma 4. For the RR mechanism under LIP, to minimize the MSE between each $X_i$ and $\hat{X}_i$, when the input range $d$ is fixed, the optimal output range is $f^* = d$.

The detailed proof is given in Appendix E of the supplementary document. By Lemma 4, neither enlarging nor narrowing the output range can improve the utility of the RR mechanism.

4.2 Utility-Privacy Tradeoff for Bounded Priors when each $R_i = G_i$

The optimal mechanism for each user with bounded priors under the M-ary model depends on the concrete $\mathcal{P}^i_{XG}$ and therefore can only be derived numerically; the comparison results are shown in Sec. 5.
Similar to the setting of Sec. 4.1.2, we next assume each $R_i = G_i$, taking values in the binary domain $\mathcal{B}$, while the prior comes from a bounded set. It is also natural to assume that $f_i$ is bijective for the binary model. Define the prior uncertainty as $P^i_1 = \Pr(X_i = 1) \in [a_i, b_i]$, where $0 \le a_i \le b_i \le 1$. The optimal solutions to the problem defined in Eq. (17) are given by the following proposition:

Proposition 2. For the constrained optimization problem defined in (17) with binary input/output, the optimal solutions for the $i$-th user are:

$q^{i*}_{01} = \frac{b_i}{b_i - a_i + e^{\epsilon}}$ and $q^{i*}_{10} = \frac{1 - a_i}{b_i - a_i + e^{\epsilon}}$.

Proof. From the proof of Theorem 3, for any $\theta^i_X$, the maximal $\mathrm{Var}(\hat{X}^{bp}_i)$ is achieved at the minimum values of $q^i_{01}$ and $q^i_{10}$, which are found at the boundary of the privacy constraints, i.e., when $\max_{P^i_1 \in [a_i, b_i]} \frac{\Pr(Y_i = 1)}{q^i_{01}} = e^{\epsilon}$ and $\max_{P^i_1 \in [a_i, b_i]} \frac{\Pr(Y_i = 0)}{q^i_{10}} = e^{\epsilon}$.

Observing the expressions of $q^{i*}_{01}$ and $q^{i*}_{10}$: when $a_i = b_i = P^i_1$, i.e., the prior knowledge is certain and fixed, we get $q^{i*}_{01} = P^i_1/e^{\epsilon}$ and $q^{i*}_{10} = (1 - P^i_1)/e^{\epsilon}$, identical to the optimal solutions of Theorem 3. When $a_i = 0$ and $b_i = 1$, we obtain the optimal solutions for WC-LIP: $q^{i*}_{01} = q^{i*}_{10} = \frac{1}{1 + e^{\epsilon}}$, which are prior-independent. This result shows that BP-LIP provides a bridge between the notions of LIP and WC-LIP (LDP) by adjusting the prior uncertainty.

4.3 LIP-based Mechanisms with Encoding

We next consider other variants of LIP-based mechanisms that mitigate the impact of a large input domain on data utility.

4.3.1 LIP-based Mechanism with Local Hashing

The first method is LIP with Local Hashing (LH-LIP), which can be described as follows. Denote $\mathcal{H}$ as a universal hash function family such that each $h \in \mathcal{H}$ maps an input $X_i$ to $X'_i \in \mathcal{X}'$.
Each user randomly selects a hash function $h_i$ from $\mathcal{H}$. The prior distribution of $X'_i$ is then calculated by folding the input priors through the hash function:

$\Pr(X' = x') = \sum_{x \in \mathcal{X}} \Pr(X = x)\, \mathbb{1}\{h_i(x) = x'\}.$

Each user then perturbs $X'_i$ with the RR mechanism to obtain $Y_i$, and releases $\langle Y_i, h_i \rangle$ to the curator. The system model is depicted in Fig. 4(c). The curator, after collecting each user's $\langle Y_i, h_i \rangle$, tries to estimate each local $X_i$. When $R_i = G_i$ and $f_i$ is a bijective function, the privacy metric of LH-LIP becomes:

$\frac{\Pr(X_i = x)}{\sum_{x' \in \mathcal{X}'} \Pr(X_i = x \mid X'_i = x') \Pr(X'_i = x' \mid Y_i = y)} = \frac{\Pr(X_i = x)}{\frac{\Pr(X_i = x)}{\Pr(X'_i = h_i(x))} \Pr(X'_i = h_i(x) \mid Y_i = y)} = \frac{\Pr(X'_i = h_i(x))}{\Pr(X'_i = h_i(x) \mid Y_i = y)}.$  (27)

Notice that, for all $x \in \mathcal{X}, y \in \mathcal{Y}$, the ratio in (27) must be bounded within $[e^{-\epsilon}, e^{\epsilon}]$, which is equivalent to requiring that the ratio in (27) be bounded within $[e^{-\epsilon}, e^{\epsilon}]$ for all $x' \in \mathcal{X}', y \in \mathcal{Y}$. After observing $Y_i = y$, the MMSE estimator at the curator becomes:

$\mathbb{E}[X_i \mid Y_i = y, h_i] = \sum_{x \in \mathcal{X}} x \sum_{x' \in \mathcal{X}'} \Pr(X_i = x \mid X'_i = x', h_i) \Pr(X'_i = x' \mid Y_i = y) = \sum_{x \in \mathcal{X}} x \sum_{x' \in \mathcal{X}'} \mathbb{1}\{h_i(x) = x'\} \frac{\Pr(X_i = x)}{\Pr(X'_i = x')} \cdot \frac{\Pr(X'_i = x')\, q^i_{x'y}}{\lambda^i_y} = \sum_{x \in \mathcal{X}} x \sum_{x' \in \mathcal{X}'} \mathbb{1}\{h_i(x) = x'\} \Pr(X_i = x) \frac{q^i_{x'y}}{\lambda^i_y}.$  (28)

Then, the optimization problem for the $i$-th user under LH-LIP can be formulated as:

$\max \mathrm{Var}\{\mathbb{E}[X_i \mid Y_i, h_i]\}$ s.t. $e^{-\epsilon} \le \frac{\Pr(X'_i = h_i(x))}{\Pr(X'_i = h_i(x) \mid Y_i = y)} \le e^{\epsilon}.$  (29)

Proposition 3 (Optimal mechanism for LH-LIP). For the constrained optimization problem defined in (29), the optimal solutions are: $q^{i*}_{x'x'} = 1 - (1 - \hat{P}^i_{x'})/e^{\epsilon}$, $q^{i*}_{x'y} = \hat{P}^i_y/e^{\epsilon}$, where $\hat{P}^i_{x'}$ denotes the hashed prior $\sum_{x: h_i(x) = x'} P^i_x$.
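The LH-LIP pipeline of Proposition 3 amounts to folding the prior through the hash and then running RR-LIP on the hashed domain. The sketch below uses a modulo map as an illustrative stand-in for a draw from a universal hash family; the function names are ours.

```python
import math

def lh_lip(P, eps, d_hash):
    """Fold the prior through a hash, then build the RR-LIP channel of
    Proposition 3 on the hashed domain X'."""
    h = lambda x: x % d_hash                     # hypothetical hash choice
    # hashed prior: Pr(X' = x') = sum of P_x over the colliding inputs x
    P_hash = [sum(p for x, p in enumerate(P) if h(x) == x2)
              for x2 in range(d_hash)]
    e = math.exp(eps)
    q = [[1 - (1 - P_hash[m]) / e if m == k else P_hash[k] / e
          for k in range(d_hash)] for m in range(d_hash)]
    return h, P_hash, q

P = [0.1, 0.2, 0.3, 0.4]
h, P_hash, q = lh_lip(P, eps=math.log(2), d_hash=2)

assert all(abs(a - b) < 1e-12 for a, b in zip(P_hash, [0.4, 0.6]))  # {0,2}, {1,3}
assert all(abs(sum(row) - 1) < 1e-12 for row in q)
```

Note the tension discussed next in the text: shrinking $|\mathcal{X}'|$ reduces perturbation noise but increases the information lost to collisions, and since the hash is deterministic it adds no LIP protection by itself.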
The results presented in Proposition 3 extend directly to the case where $f_i$ is surjective, by derivations similar to (26). Observe that the hashing phase is followed by RR-LIP, but over a smaller input domain. It is worth noting that although the hash function leads to collisions, each hash function is deterministic, so hashing cannot enhance the privacy measured by LIP. For data utility, collisions due to hashing cause information loss and therefore impact utility. Typically, for small $\epsilon$, the information loss due to the mechanism's perturbation is dominant, whereas when $|\mathcal{X}'|$ is small, the collisions in hashing dominate the information loss. In [15], the authors propose Optimal Local Hashing for LDP, which finds the optimal $|\mathcal{X}'|$ under different $\epsilon$'s and $|\mathcal{X}|$. In context-aware mechanisms, however, the perturbation parameters depend on the prior, and $|\mathcal{X}'|$ only indirectly affects the $q^i_{x'y}$ in (28); as a result, there is no closed-form optimal $|\mathcal{X}'|$. We study the optimal $|\mathcal{X}'|$ by simulation in Sec. 6.1.3.

4.3.2 LIP-based Mechanism with Unary Encoding

We next consider another variant of the LIP mechanism based on Unary Encoding (UE). From [15], we know that for histogram estimation, incorporating UE into mechanism design can improve the utility-privacy tradeoff of LDP. The intuition behind this improvement is that UE maps high-dimensional data into a binary vector, reducing the input domain; on the other hand, multiple locations of the output vector can be 1, so the input sensitivity is also relaxed. Next, we study the LIP-based UE mechanism, which can be described as follows. UE maps each user's raw data $R_i = r$ into an $|\mathcal{R}|$-bit binary vector whose $r$-th bit equals 1 and all other bits are 0. Note that such a local operation on the raw data can be viewed as a local function.
To distinguish it from the other functions, denote $\phi(\cdot)$ as the local unary-encoding function, and $\{U^k_i\}_{k=1}^{|\mathcal{R}|} = \phi_i(R_i)$ as the encoded vector (the input data), with $u^{|\mathcal{R}|}_1 \in \mathcal{B}^{|\mathcal{R}|}$ as a vector instance. It is worth noting that the encoded vectors take only $|\mathcal{R}|$ distinct (one-hot) values. Specifically, $U^k_i = u_k \in \{0, 1\}$ denotes the $k$-th bit of $U_i$. The mechanism then perturbs each bit independently through a binary RR perturbation channel and releases $\{Y^k_i\}_{k=1}^{|\mathcal{R}|}$. We denote $q^i_{01}$ as the likelihood $\Pr(Y^k_i = 1 \mid U^k_i = 0)$ and $q^i_{10}$ as $\Pr(Y^k_i = 0 \mid U^k_i = 1)$. Note that this may not be optimal, since we assume different bits are perturbed by the same channel. The utility function becomes:

$\mathbb{E}\left[\left\|\{U^k_i\}_{k=1}^{|\mathcal{R}|} - \mathbb{E}\left[\{U^k_i\}_{k=1}^{|\mathcal{R}|} \,\middle|\, \{Y^k_i\}_{k=1}^{|\mathcal{R}|}\right]\right\|^2\right] = \sum_{k=1}^{|\mathcal{R}|} \left\{\mathrm{Var}[U^k_i] - \mathrm{Var}\left[\mathbb{E}[U^k_i \mid Y^k_i]\right]\right\}$,  (30)

where $\mathrm{Var}[U^k_i] = P^i_k(1 - P^i_k)$ is a constant, and $\mathbb{E}[U^k_i \mid Y^k_i]$ can be expressed as:

$\frac{\Pr(R_i = k)(1 - q^i_{10})}{\Pr(Y^k_i = 1)} \mathbb{1}\{Y^k_i = 1\} + \frac{\Pr(R_i = k)\, q^i_{10}}{\Pr(Y^k_i = 0)} \mathbb{1}\{Y^k_i = 0\}.$  (31)

The metric of the privacy constraints can be expressed as:

$\frac{\Pr\left(\{Y^k_i\}_{k=1}^{|\mathcal{R}|} = y^{|\mathcal{R}|}_1\right)}{\Pr\left(\{Y^k_i\}_{k=1}^{|\mathcal{R}|} = y^{|\mathcal{R}|}_1 \,\middle|\, \{U^k_i\}_{k=1}^{|\mathcal{R}|} = u^{|\mathcal{R}|}_1\right)}.$  (32)

Then, the optimization problem for the $i$-th user under LIP with unary encoding can be formulated as:

$\max_{q^i_{01}, q^i_{10}} \sum_{k=1}^{|\mathcal{R}|} \mathrm{Var}[\mathbb{E}[U^k_i \mid Y^k_i]]$ s.t. $e^{-\epsilon} \le \text{Eq. (32)} \le e^{\epsilon}.$  (33)

The optimal parameters $q^{i*}_{01}$ and $q^{i*}_{10}$ for the above problem are stated in the following theorem. The proof is provided in Appendix F of the supplementary document.

Theorem 4 (Optimal mechanism for UE-LIP). For the constrained optimization problem defined in (33), the optimal solutions for the $i$-th user are: $q^{i*}_{01} = \frac{1 - P^i_{\min}}{e^{\epsilon} - 2P^i_{\min} + 1}$ and $q^{i*}_{10} = \frac{1}{2}$, where $P^i_{\min} = \min_{r \in \mathcal{R}} P^i_r$.
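Theorem 4's closed forms are easy to evaluate; the sketch below also checks them against the context-free OUE-LDP parameter $\bar{q}^{i*}_{01} = 1/(e^{\epsilon}+1)$ used in the comparison with LDP-based mechanisms (the concrete priors are our illustrative choices).

```python
import math

def ue_lip_params(P, eps):
    """Closed forms of Theorem 4: q01* = (1 - P_min)/(e^eps - 2 P_min + 1),
    q10* = 1/2, where P_min = min_r Pr(R = r)."""
    p_min = min(P)
    e = math.exp(eps)
    return (1 - p_min) / (e - 2 * p_min + 1), 0.5

eps = 1.0
e = math.exp(eps)
q01, q10 = ue_lip_params([0.1, 0.2, 0.7], eps)
assert q10 == 0.5

# OUE-LDP uses q01 = 1/(e^eps + 1) regardless of the prior; the LIP-based
# parameter is never larger, and coincides with it when P_min = 0.
q01_ldp = 1 / (e + 1)
assert q01 <= q01_ldp + 1e-12
q01_wc, _ = ue_lip_params([0.0, 0.3, 0.7], eps)
assert abs(q01_wc - q01_ldp) < 1e-12
```

A smaller $q^{i*}_{01}$ means a 0-bit is flipped less often, which is where the utility gain of the prior-aware parameterization comes from.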
Observe that each locally optimal parameter $q^{i*}_{01}$ depends on the data prior and is a monotonically decreasing function of $P^i_{\min} \in [0, 1/|\mathcal{R}|]$. This implies that if the user's prior is uniformly distributed, i.e., $P^i_{\min} = 1/|\mathcal{R}|$, the parameter $q^{i*}_{01}$ achieves its minimum, and the utility can be enhanced. We next extend Theorem 4 to consider prior uncertainty. When there exists uncertainty in $\theta^i_R$, the local optimization problem for the $i$-th user becomes:

$$\max_{q^i_{01},\, q^i_{10}} \; \min_{\theta^i_R \in \mathcal{P}^i_R} \sum_{k=1}^{|\mathcal{R}|} \mathrm{Var}\left[\mathbb{E}[U^k_i \mid Y^k_i, \theta^i_R]\right] \quad \text{s.t.} \quad e^{-\epsilon} \le \text{Eq. (32)} \le e^{\epsilon}, \;\; \forall \theta^i_R \in \mathcal{P}^i_R. \quad (34)$$

The optimal $q^{i*}_{01}$, $q^{i*}_{10}$ are stated in the following corollary:

Corollary 2. For the constrained optimization problem defined in (34), the optimal solutions for the $i$-th user are: $q^{i*}_{01} = \frac{1 - \min P^i_{\min}}{e^{\epsilon} - 2\min P^i_{\min} + 1}$ and $q^{i*}_{10} = \frac{1}{2}$, where $\min P^i_{\min} = \min_{\theta^i_R \in \mathcal{P}^i_R} \min_{r \in \mathcal{R}} P^i_r$.

4.4 Comparison with LDP-based Mechanisms

First, we compare the number of privacy constraints in LIP and LDP. The results are summarized in the following remark:

Remark 1 (Complexity of LDP vs. LIP). LDP involves $|\mathcal{Y}||\mathcal{X}|(|\mathcal{X}| - 1)$ linear constraints, while LIP involves $2|\mathcal{Y}||\mathcal{X}|$ linear constraints. Therefore, when $|\mathcal{X}| > 2$, LDP incurs more privacy constraints than LIP.

Next, we compare the utilities achievable by the optimal mechanisms based on LDP and LIP. It is readily seen that the optimal mechanisms proposed in [15] also apply to the utility functions defined in this paper, and the optimal parameters lie at the boundary of the privacy constraints. In particular, for the RR mechanism, the optimal parameters for LDP are: $\bar{q}^{i*}_{mm} = \frac{e^{\epsilon}}{e^{\epsilon} + |\mathcal{X}| - 1}$, $\bar{q}^{i*}_{mk} = \frac{1}{e^{\epsilon} + |\mathcal{X}| - 1}$, $\forall m, k \in \{1, 2, \ldots, |\mathcal{X}|\}$, $m \neq k$. For LDP with Local Hashing (LH-LDP), $|\mathcal{X}|$ is replaced by $|\mathcal{X}'|$. For LDP with Optimal Unary Encoding (OUE-LDP), $\bar{q}^{i*}_{10} = 1/2$ and $\bar{q}^{i*}_{01} = \frac{1}{e^{\epsilon} + 1}$.
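A key fact used in the comparison below is that the LDP-optimal parameters always lie inside the LIP feasible region. This can be checked numerically for the RR channel (a hedged sketch; the helper names are ours and the tolerance only absorbs floating-point error):

```python
import math

def rr_ldp_channel(eps, d):
    """Optimal RR-LDP channel [15]: keep prob e^eps/(e^eps+d-1), else uniform flip."""
    keep = math.exp(eps) / (math.exp(eps) + d - 1)
    flip = 1 / (math.exp(eps) + d - 1)
    return [[keep if x == y else flip for y in range(d)] for x in range(d)]

def satisfies_lip(channel, prior, eps, tol=1e-9):
    """Check e^-eps <= Pr(Y=y|X=x)/Pr(Y=y) <= e^eps for all x, y."""
    d = len(prior)
    for y in range(d):
        lam = sum(prior[x] * channel[x][y] for x in range(d))  # Pr(Y = y)
        for x in range(d):
            ratio = channel[x][y] / lam
            if not (math.exp(-eps) - tol <= ratio <= math.exp(eps) + tol):
                return False
    return True
```

Because $Pr(Y = y)$ is a convex combination of the channel entries $\{q^i_{xy}\}_x$, whose pairwise ratios are bounded by $e^{\epsilon}$ under LDP, the check passes for every prior, while the same channel can violate a stricter budget.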
Denote $\mathcal{E}^{LIP*}_i$ as the local MSE from collecting the $i$-th user's data under LIP constraints and $\mathcal{E}^{LDP*}_i$ as that under LDP constraints. Comparing $\mathcal{E}^{LIP*}_i$ with $\mathcal{E}^{LDP*}_i$, we have the following proposition:

Proposition 4. Given an arbitrary but fixed prior distribution, $\forall \epsilon \in \mathbb{R}^+$, we have $\mathcal{E}^{LIP*}_i \le \mathcal{E}^{LDP*}_i$.

Proof. Since $\mathcal{E}^{LIP*}_i$ and $\mathcal{E}^{LDP*}_i$ are values of the same objective function evaluated at the optimal solutions under the corresponding privacy constraints, it suffices to show that the optimal perturbation parameters of LDP are within the feasible region of LIP. This holds because $\epsilon$-LDP implies $\epsilon$-LIP, $\forall \epsilon \ge 0$, which means every $q^i$ satisfying LDP automatically satisfies LIP.

Notice that the curator may take advantage of his prior knowledge to refine the estimation. Nevertheless, LDP-based mechanisms suffer decreased utility compared with those based on LIP, because LIP also utilizes the prior knowledge in mechanism design. Also, note that the optimization problems for LIP and LDP differ only in the feasible regions formed by the corresponding privacy constraints: while the feasible region of LDP is fixed for all possible priors, the feasible region of LIP reshapes as the prior changes. In particular, comparing the optimal solutions for mechanisms with UE:

$$q^{i*}_{01} - \bar{q}^{i*}_{01} = \frac{P^i_{\min}(1 - e^{\epsilon})}{(e^{\epsilon} - 2P^i_{\min} + 1)(e^{\epsilon} + 1)} \le 0. \quad (35)$$

The gap diminishes to 0 when $P^i_{\min} = 0$ (the worst case), which means UE-LIP always achieves utility at least as good as OUE-LDP. The same relationship applies to BP-LIP and LDP.

4.5 Real-world Applications of LIP

Next, we discuss how to apply the LIP-based mechanisms described above to the following applications.

(Weighted) Summation: For weighted summation, the aggregated result is $S_{sum} = \sum_{i=1}^N (c_i R_i + b_i)$ with the estimator $\hat{S}_{sum} = \mathbb{E}[S_{sum} \mid \bar{Y}]$. Given any $c_i$ and $b_i$, the MSE becomes:

$$\mathbb{E}[(S_{sum} - \hat{S}_{sum})^2] = \mathbb{E}\left[\left(\sum_{i=1}^N (c_i R_i + b_i) - \mathbb{E}\left[\sum_{i=1}^N (c_i R_i + b_i) \,\Big|\, \bar{Y}\right]\right)^2\right]$$
$$= \mathbb{E}\left[\left(\sum_{i=1}^N (c_i R_i + b_i) - \sum_{i=1}^N \mathbb{E}[(c_i R_i + b_i) \mid Y_i]\right)^2\right]. \quad (36)$$

Denote $X_i = f_i(R_i) = c_i R_i + b_i$ and $\hat{X}^s_i = \mathbb{E}[X_i \mid Y_i]$ ($s$ stands for summation); then (36) becomes:

$$= \mathbb{E}\left[\left(\sum_{i=1}^N X_i - \sum_{i=1}^N \mathbb{E}[X_i \mid Y_i]\right)^2\right] = \mathbb{E}\left[\sum_{i=1}^N (X_i - \hat{X}^s_i)^2 + \sum_{i \neq j} (X_i - \hat{X}^s_i)(X_j - \hat{X}^s_j)\right] \overset{(a)}{=} \sum_{i=1}^N \mathbb{E}[(X_i - \hat{X}^s_i)^2], \quad (37)$$

where (a) follows from the independent-user assumption. Note that, when users have uncertain priors, as long as the curator possesses each accurate $\theta^i_X$, he is able to design each local unbiased estimator accordingly, which makes the global utility $\mathbb{E}[(S_{sum} - \hat{S}_{sum})^2]$ decomposable. Thus, the utility function of weighted summation can be expressed in the form of (15).

Remark 2. Each user's local function for (weighted) summation is $f_i(R_i) = c_i R_i + b_i$. For the curator, after observing $Y_i = y$, for the RR-LIP mechanism, each optimal local estimator is $\hat{X}^s_i = \mathbb{E}[X_i \mid Y_i = y] = \sum_{x \in \mathcal{X}} x\, q^i_{xy} P^i_x / \lambda^i_y$; for the LH-LIP mechanism, $\hat{X}^s_i = \mathbb{E}[X_i \mid Y_i = y, h_i]$ (shown in (28)). Note that UE-LIP, as a binary-encoding-based method, is inherently designed for frequency estimation (data-value-independent), not for value-related functions. Therefore, UE-LIP is not appropriate for summation queries.

Histogram Estimation: A histogram is useful to estimate or compare the popularity or frequency of categories. We can obtain the estimator of the histogram vector, $\hat{S}_{hist} = \{\hat{S}_1, \hat{S}_2, \ldots, \hat{S}_{|\mathcal{R}|}\} = \{\mathbb{E}[S_1 \mid \bar{Y}], \mathbb{E}[S_2 \mid \bar{Y}], \ldots, \mathbb{E}[S_{|\mathcal{R}|} \mid \bar{Y}]\}$, with each entry $\mathbb{E}[S_k \mid \bar{Y}]$:

$$\mathbb{E}\left[\sum_{i=1}^N \mathbb{1}_{\{R_i = a_k\}} \,\Big|\, \bar{Y}\right] = \sum_{i=1}^N Pr(R_i = a_k \mid Y_i). \quad (38)$$

Thus the mean square error of the estimation is

$$\sum_{k=1}^{|\mathcal{R}|} \mathbb{E}\left[\left(\sum_{i=1}^N \left\{\mathbb{1}_{\{R_i = a_k\}} - \mathbb{E}[\mathbb{1}_{\{R_i = a_k\}} \mid Y_i]\right\}\right)^2\right]$$
$$\overset{(a)}{=} \sum_{k=1}^{|\mathcal{R}|} \sum_{i=1}^N \mathbb{E}\left[\left(\mathbb{1}_{\{R_i = a_k\}} - \mathbb{E}[\mathbb{1}_{\{R_i = a_k\}} \mid Y_i]\right)^2\right] = \sum_{k=1}^{|\mathcal{R}|} \sum_{i=1}^N \left\{\mathrm{Var}(\mathbb{1}_{\{R_i = a_k\}}) - \mathrm{Var}\left(\mathbb{E}[\mathbb{1}_{\{R_i = a_k\}} \mid Y_i]\right)\right\}. \quad (39)$$

Step (a) of (39) holds because each user's local error is independent, and the expectation of the unbiased estimator is identical to that of the estimated value. For histogram estimation, $X_i = f_i(R_i) = \{\mathbb{1}_{\{R_i = a_1\}}, \mathbb{1}_{\{R_i = a_2\}}, \ldots, \mathbb{1}_{\{R_i = a_{|\mathcal{R}|}\}}\}$ (the form is identical to that of $\phi$ for unary encoding studied in Section 4.3.2), and (39) can be expressed as:

$$\sum_{i=1}^N \left\{\mathrm{Var}(X_i) - \mathrm{Var}(\mathbb{E}[X_i \mid Y_i])\right\},$$

which is identical to the form in (15).

Remark 3. Each user's local function for histogram estimation is $f_i(R_i) = \phi(R_i) = \{\mathbb{1}_{\{R_i = a_1\}}, \mathbb{1}_{\{R_i = a_2\}}, \ldots, \mathbb{1}_{\{R_i = a_{|\mathcal{R}|}\}}\}$. With RR-LIP or LH-LIP, given $Y_i = y$, each optimal local estimator at the curator is $\hat{X}^h_i = \{Pr(R_i = a_1 \mid Y_i = y), Pr(R_i = a_2 \mid Y_i = y), \ldots, Pr(R_i = a_{|\mathcal{R}|} \mid Y_i = y)\}$; with UE-LIP, given $\{Y^k_i\}_{k=1}^{|\mathcal{R}|} = y_1^{|\mathcal{R}|}$, $\hat{X}^h_i = \{Pr(U^1_i = 1 \mid Y^1_i = y^1), Pr(U^2_i = 1 \mid Y^2_i = y^2), \ldots, Pr(U^{|\mathcal{R}|}_i = 1 \mid Y^{|\mathcal{R}|}_i = y^{|\mathcal{R}|})\}$.

5 EVALUATION

In this section, we simulate with synthetic and real data to validate our analytical results. In the first part, we validate via Monte-Carlo simulation: we examine the impact of the prior distribution, data correlation, and input domain on the utility-privacy tradeoff, and we also consider a model where the utility is measured by Hamming distance instead of MSE. In the second part, we evaluate with real-world datasets: Gowalla (location check-ins) and Census Income (a personal income survey). We evaluate utility by the square root of the average MSE in order to normalize the influence of the user count and to make it comparable to the absolute error. Note that doing so does not affect the optimality of any of our optimization problems.
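The posterior-based histogram estimator of Remark 3 can be sketched as follows (an illustrative sketch with our own helper names; the channel entries are toy numbers, not optimized parameters):

```python
def posterior(channel, prior, y):
    """Pr(X=x | Y=y) = q_xy * P_x / sum_x' q_x'y * P_x'."""
    d = len(prior)
    lam = sum(prior[x] * channel[x][y] for x in range(d))  # Pr(Y = y)
    return [prior[x] * channel[x][y] / lam for x in range(d)]

def histogram_estimate(channel, prior, reports):
    """Eq. (38): hat S_k = sum_i Pr(R_i = a_k | Y_i)."""
    d = len(prior)
    hist = [0.0] * d
    for y in reports:
        post = posterior(channel, prior, y)
        for k in range(d):
            hist[k] += post[k]
    return hist

channel = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]]  # toy RR channel
prior = [0.5, 0.3, 0.2]
hist = histogram_estimate(channel, prior, [0, 1, 2, 1])
```

By the law of total probability, averaging the posterior over the output distribution recovers the prior exactly, which is the unbiasedness property underlying step (a) of (39).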
In addition, since LIP provides a more relaxed privacy guarantee than LDP, it is not straightforward to compare their utilities under the same privacy guarantee. Thus, we compare their optimal utilities under any given privacy budget $\epsilon$. Since the experiments with synthetic data take $X_i$ rather than $R_i$ as the input to each mechanism, we directly generate $X_i$ in the following experiments.

Figure 7. Utility-privacy tradeoff comparison among prior-aware and prior-free models under different prior distributions ($\epsilon$-LDP vs. $\epsilon$-LIP with $\theta_1$ and $\theta_2$).

5.1 Simulation Results with Synthetic Data

To generate synthetic data, we consider 5000 users in the system. We first randomly generate a local prior distribution $\theta^i_X$ for each user and sample each user's input data $X_i$ from $\theta^i_X$. Then, for the model with prior uncertainty, each $\theta^i_X$ is generated multiple times to form the bounded set containing all priors, and the true prior is randomly chosen from this set. Each $X_i$ takes values from the domain $\mathcal{X}$ (the default domain for the M-ary model is $\mathcal{X} = \{0, 1, 2, 3, 4\}$). Each user possesses secret data $G_i$, which also takes values from $\mathcal{X}$ (this can be directly extended to the case where $G_i$ comes from a different domain than $X_i$). We then randomly generate the correlation between $X_i$ and $G_i$ (multiple times, to form the bounded set).

5.1.1 Impact of different prior distributions on the utility-privacy tradeoff

First, we demonstrate the impact of different prior distributions on the utility-privacy tradeoff. Let each user share the same prior distribution, and for each user let $X_i = G_i$. We consider two sets of priors: one uniformly distributed, $\theta_1 = \{0.2, 0.2, 0.2, 0.2, 0.2\}$, and one more skewed, $\theta_2 = \{0.025, 0.025, 0.025, 0.025, 0.9\}$. In addition, we also compare with the $\epsilon$-LDP based mechanism with the prior-independent estimator $\hat{C}$ [15].
This model treats $X_i$ as an instance rather than a random variable:

$$\hat{C} = \frac{\sum_{i=1}^N Y_i - N p_i}{1 - 2p_i}, \quad (40)$$

where $q^i_{mm} = p_i = \frac{e^{\epsilon}}{e^{\epsilon} + |\mathcal{X}| - 1}$ is the optimal perturbation parameter. The context-free estimation results in an MSE of:

$$\mathbb{E}[(S - \hat{C})^2] = \mathrm{Var}[\hat{C}] = \frac{N(|\mathcal{X}| - 2 + e^{\epsilon})}{(e^{\epsilon} - 1)^2}. \quad (41)$$

The comparison is shown in Fig. 7, where $\epsilon$ ranges from 1 to 5 with a step of 0.5. We observe that considering the prior in data perturbation and aggregation largely improves the utility. When each $\theta^i_X = \theta_1$ (the prior is uniformly distributed), the utility achieved by $\epsilon$-LIP is lower than in the case when the prior is more skewed, i.e., each $\theta^i_X = \theta_2$. Intuitively, with a skewed prior, users' inputs are highly certain, so considering the prior in the estimator alone can already yield accurate aggregation. Since the privacy constraints of LIP with both $\theta_1$ and $\theta_2$ are parameterized by the same $\epsilon$, a skewed prior results in higher utility than a uniformly (or close to uniformly) distributed one under the same privacy guarantee.

Figure 8. The impact of the correlation between each $X_i$ and $G_i$ on the utility-privacy tradeoff provided by LIP (RR-$\epsilon$-LIP with $\rho_{gx} = 1$, $0.7$, and $0.3$).

Figure 9. Utility-privacy tradeoff comparison with bounded prior among different privacy notions ($\epsilon/2$-BP-LIP, $\epsilon$-BP-LIP, $\epsilon$-LDP, $\epsilon$-Pufferfish, and no observation).

5.1.2 Utility as a function of correlation with the latent variable

We next consider the model with each $G_i \neq X_i$. We first examine the utility as a function of the correlation between $X_i$ and $G_i$, considering a fixed prior of $X_i$.
The correlation between $X_i$ and $G_i$ is measured by the correlation coefficient $\rho_{gx} = \frac{\sigma_{xg}}{\sigma_x \sigma_g}$; we then find the conditional probability $T^i_{GX}$ by fixing $\rho_{gx}$ to $1$, $0.7$, and $0.3$, respectively (when $\rho_{gx} > 0$, a larger $\rho_{gx}$ implies stronger correlation between $X_i$ and $G_i$). Under each correlation, we derive the utility-privacy tradeoff provided by the RR-$\epsilon$-LIP based mechanism. The result is shown in Fig. 8. Observe that stronger correlation results in decreased utility compared to weaker correlation. The reason is that when the correlation is strong, more noise is needed to privatize the input data $X_i$. When $\rho_{gx} = 1$, i.e., $X_i = G_i$, the mechanism cannot achieve zero MSE. When $\rho_{gx} = 0.3$, for any $\epsilon \ge 2$, the MSE decreases to 0, because no noise needs to be added to perturb $X_i$: the weak correlation between $X_i$ and $G_i$ already makes $G_i$ hard to infer.

5.1.3 Comparison among different privacy notions with latent variable and uncertain prior

Next, we consider the scenario where each user's input data $X_i$ is correlated with $G_i$, with the correlation coming from a bounded set $\mathcal{P}^i$. We then compare the utility provided by the following privacy notions under the RR mechanism: (a) $\epsilon$-LIP ($\epsilon/2$-LIP) with bounded prior; (b) $\epsilon$-Pufferfish privacy; (c) $\epsilon$-LDP; (d) no observation of $\bar{Y}$. Note that $\epsilon$-LDP provides privacy protection against the worst-case prior, including $X_i = G_i$. From the impact of the correlation between $X_i$ and $G_i$ on data utility, we know that, for LDP-based mechanisms, protecting $G_i$ and $X_i$ are equivalent. The utility-privacy tradeoff comparisons are shown in Fig. 9. Observe that the different mechanisms share the same starting point because the prior distribution of each user's input data $X_i$ is fixed and known to the curator.
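For concreteness, the correlation coefficient $\rho_{gx}$ used above can be computed from a joint distribution of $(X_i, G_i)$ as follows (a generic sketch, independent of the paper's specific $T^i_{GX}$ construction):

```python
import math

def correlation_coefficient(joint, xs, gs):
    """rho_gx = sigma_xg / (sigma_x * sigma_g), with joint[i][j] = Pr(X=xs[i], G=gs[j])."""
    pairs = [(i, j) for i in range(len(xs)) for j in range(len(gs))]
    ex = sum(joint[i][j] * xs[i] for i, j in pairs)
    eg = sum(joint[i][j] * gs[j] for i, j in pairs)
    exg = sum(joint[i][j] * xs[i] * gs[j] for i, j in pairs)
    ex2 = sum(joint[i][j] * xs[i] ** 2 for i, j in pairs)
    eg2 = sum(joint[i][j] * gs[j] ** 2 for i, j in pairs)
    cov = exg - ex * eg
    return cov / math.sqrt((ex2 - ex ** 2) * (eg2 - eg ** 2))
```

A fully dependent joint distribution (mass only on the diagonal, i.e., $X = G$) yields $\rho_{gx} = 1$, while an independent one yields $\rho_{gx} = 0$.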
Figure 10. Impact of domain size on data utility and the optimal hashing domain. (a) Utility comparison when $|\mathcal{X}|$ increases from 10 to 50, with $\epsilon$ fixed to 1. (b) Utility comparison for LH-$\epsilon$-LIP with different hashing sizes given a skewed prior, $\epsilon = 1$, $|\mathcal{X}| = 20$.

Even though BP-LIP and Pufferfish privacy have larger feasible regions for the perturbation parameters by considering the bounded set of correlations between $X_i$ and $G_i$, as long as the input data is not independent of the latent variable, the mechanism needs to make $X_i$ and $Y_i$ independent in order to achieve zero privacy leakage. The utility provided by LIP increases faster with $\epsilon$ than Pufferfish and LDP, because the feasible regions of Pufferfish and LDP are contained within that of LIP. Another observation is that the utility of $\epsilon$-LDP is not bounded between $\epsilon/2$- and $\epsilon$-LIP, as we showed in Section 3. This is because LIP further considers the correlation between $X_i$ and $G_i$, while LDP considers the worst-case correlation, which could be $1$, i.e., $X_i = G_i$. Finally, we can observe the utility gain from using the outputs of the privacy-preserving mechanism compared to using only the prior for estimation: for each $\epsilon$, taking no observations results in a constant MSE that equals the variance of the data.

5.1.4 Impact of Domain Size on Models

Next, we compare how the data domain impacts the utility-privacy tradeoff of LIP and LDP. Consider each $X_i$ in the system with a domain size from $|\mathcal{X}| = 10$ to $|\mathcal{X}| = 50$. We fix $\epsilon = 1$ and show the utilities for the different input domain sizes. The goal is to compare the utility provided by RR mechanisms and encoding-based mechanisms. To this end, we also compare with other variants of LDP-based mechanisms, which improve RR-LDP's performance significantly when $|\mathcal{X}| > 3e^{\epsilon} + 2$ [15].
One is LDP with Optimal Unary Encoding, and the other is LDP with Optimal Local Hashing. From [15], the optimal hashing size is $|\mathcal{X}'|^* = e^{\epsilon} + 1$; when $\epsilon = 1$, we have $|\mathcal{X}'|^* = 4$. To make a fair comparison, we consider each $X_i = G_i$, and we compare with LH-$\epsilon$-LIP when $|\mathcal{X}'| = 4$.

Figure 11. Utility-privacy tradeoff comparison from a rate-distortion perspective. (a) Comparison between LIP and LDP when utility is measured by Hamming distance with binary data ($P_{\max} = 0.5$ and $0.2$). (b) Comparison between LIP and LDP when utility is measured by Hamming distance with $|\mathcal{X}| = 5$.

The utility comparison as a function of the input domain is shown in Fig. 10(a), from which we have the following insights: (1) When the correlation between $X_i$ and $G_i$ is not considered, the utility provided by RR-$\epsilon$-LDP is always sandwiched between RR-$\epsilon$-LIP and RR-$\epsilon/2$-LIP under any domain size, because they share the same utility function and the utility depends on the size of the parameters' feasible regions. (2) When $|\mathcal{X}|$ is small ($|\mathcal{X}| < 15$), RR-LIP provides better utility than LH-LIP. (3) For large $|\mathcal{X}|$, UE-LIP outperforms RR-LIP, and the gap enlarges as $|\mathcal{X}|$ increases; LDP-based mechanisms show similar trends. (4) UE-LIP always provides better utility than OUE-LDP, and LH-LIP always outperforms OLH-LDP; the reasons are described in Section 4.4.

Further, we compare the utility provided by LH-LIP with different hashing sizes. We fix $\epsilon = 1$ and consider $|\mathcal{X}| = 20$ with a prior of $[0, 0.005, 0.01, \ldots, 0.095, 0.1]$ (the increment is 0.005). We then range $|\mathcal{X}'|$ from 1 to 20. Given different hashing sizes, there can be multiple hash functions.
When more than 100 hash functions exist, we randomly select 100 of them and calculate their corresponding utilities. In Fig. 10(b), we show the utility comparison among LH-LIP with different hashing sizes. Observe that, under each $|\mathcal{X}'|$, the utilities vary across hash functions, because different hash functions imply different prior combinations. Intuitively, when $X'$ is uniformly distributed, more noise is added in perturbation than when the distribution of $X'$ is skewed. Also, observe that the optimal hashing size is around 12 to 16. However, when $|\mathcal{X}'| \in [12, 16]$, there still exist hash functions that provide poor utility. This observation further confirms that the optimal hashing size cannot be determined under an arbitrary prior.

5.1.5 Comparison between LIP and LDP for Hamming-distance-based utility

Next, we compare $\epsilon$-LIP to $\epsilon$-LDP when the utility is measured by the Hamming distance between each input $X_i$ and output $Y_i$, i.e.,

$$\text{Utility} = -\sum_{i=1}^N \|Y_i - X_i\|_h, \quad (42)$$

where $\|A - B\|_h = 0$ if $A = B$ and $\|A - B\|_h = 1$ if $A \neq B$. The Hamming distance is usually adopted in a rate-distortion framework, where the rate measures the privacy leakage and the distortion captures the data utility. In [10], an optimal mechanism is derived under LDP constraints. We next compare LIP and LDP under two cases: (1) Binary model with uncertain prior: each input $X_i$ is binary and sampled from $\theta^i_X$. Notice that $\theta^i_X$ can be fully specified by $P^i_1$. It is assumed that the exact $P^i_1$ is unknown to each user, but each of them knows that $P^i_1$ is upper bounded by $P_{\max} = \max P^i_1$. Then each user's released data $Y_i$ is generated by an RR mechanism satisfying the $\epsilon$-BP-LIP described in Section 4.2, or by RR-$\epsilon$-LDP. (2) Each $X_i$ takes values from $\mathcal{X}$ ($|\mathcal{X}| = 5$) with a fixed prior, which is assumed to be known by each user.
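Under metric (42), the expected per-user distortion of any RR channel has a simple closed form, $\mathbb{E}\|Y_i - X_i\|_h = \sum_x P_x (1 - q_{xx})$, which can be sketched as follows (helper names are ours; the parameters are illustrative, not the optimal LIP/LDP solutions):

```python
import math

def rr_channel(keep, d):
    """Symmetric RR channel: keep the input with prob `keep`, else flip uniformly."""
    flip = (1 - keep) / (d - 1)
    return [[keep if x == y else flip for y in range(d)] for x in range(d)]

def expected_hamming(channel, prior):
    """E[||Y - X||_h] = sum_x P_x * (1 - Pr(Y = x | X = x))."""
    return sum(p * (1 - channel[x][x]) for x, p in enumerate(prior))

keep_ldp = math.exp(1.0) / (math.exp(1.0) + 1)   # binary RR-LDP keep prob, eps = 1
dist = expected_hamming(rr_channel(keep_ldp, 2), [0.2, 0.8])
```

A mechanism that can afford a larger keep probability (as LIP can under a skewed prior) attains a smaller expected Hamming distortion.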
We consider two scenarios for the data prior: uniformly distributed data and a skewed prior. Then each user's released data $Y_i$ is generated by RR-$\epsilon$-LIP or RR-$\epsilon$-LDP. The utility comparison is shown in Fig. 11. Observe that RR-$\epsilon$-LIP provides better utility than RR-$\epsilon$-LDP in each case, and when the prior is more skewed, the advantage is further enhanced.

5.2 Simulation with Real-world Datasets

5.2.1 Histogram Estimation with a Location Check-In Dataset

In this subsection, we compare the performance of different models on the real-world dataset Gowalla, from a social networking application where users share their locations by checking in. There are 6,442,892 users in this dataset. For each user, a trace of check-in locations is recorded. Denote the $i$-th user's location trace as $(R^1_i, R^2_i, \ldots, R^k_i)$, where the superscript indexes the check-ins, and $k$ can differ across users. With this dataset, we intend to estimate a histogram of users' latest check-in locations. It is assumed that the past location trace $(R^1_i, R^2_i, \ldots, R^{k-1}_i)$ has already been released, and both the users and the curator can use $\{R^1_i, R^2_i, \ldots, R^{k-1}_i\}_{i=1}^N$ to calculate a global prior of the latest check-in location. We first divide the area into $36 \times 36$ districts, then map each user's latest check-in location, denoted as $R^k_i = X_i$, into districts. As studied in Section 4.5, for each user, the latest check-in location is perturbed according to the LIP (LDP) based mechanisms, and a random-vector estimator is used by the curator to estimate the histogram. The results are shown in Fig. 12(a). Observe that the utilities provided by the different mechanisms increase more slowly than the results in Section 5.1.3. This is because each $X_i$ has a
larger domain size in this experiment, and the prior of each district is very small; hence, increasing $\epsilon$ has less influence on the utility than when each data value has a larger prior. Also, note that RR-LIP provides lower utility than LH-LIP and UE-LIP when $\epsilon$ is small. But eventually, as $\epsilon$ increases, RR-LIP outperforms UE-LIP and LH-LIP, because $q^{i*}_{10}$ in UE-LIP is fixed to $1/2$: as $\epsilon$ increases, all 0s in the vector tend to be released directly, but the 1 in the vector still has a one-half probability of being perturbed to 0. Also, in LH-LIP, when $\epsilon$ increases, the information loss at hashing affects the utility more than that at perturbation. Finally, UE-LIP provides better utility than OUE-LDP, but the gap diminishes as $\epsilon$ increases.

Figure 12. Utility-privacy tradeoff comparisons using real-world data. (a) Utility-privacy tradeoffs for location histogram estimation (users are i.i.d. and the domain size is $|\mathcal{X}| = 83$). (b) Utility-privacy tradeoffs for work-class aggregation while protecting annual-income privacy (model with hidden variable, $|\mathcal{R}| = 4$ and $|\mathcal{X}| = 4$).

5.2.2 Latent Variable Privacy with an Annual Income Dataset

Next, we validate our analysis of the model with latent variables by simulation on a real-world dataset: "Census Income" (the Adult dataset), a census survey dataset listing the personal information of 48,842 users with 14 attributes, such as age, work class, marital status, race, gender, education, and annual income, denoted as $\{R^1_i, R^2_i, \ldots, R^{14}_i\}$, respectively. We assume each user's data is published and collected independently. In the field of machine learning, the Adult dataset is usually used to predict whether each user's annual income is over 50k dollars by training on all the personal information (taken as features).
In this experiment, we aggregate users' work classes while protecting their annual incomes. In this dataset, the raw data $R^2_i$, work class, has a domain size of 8: {Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked}. Each user's annual income, $G_i = R^{14}_i$, also has a domain size of 8: {below 20k, 20k-30k, 30k-40k, 40k-50k, 50k-60k, 60k-70k, 70k-80k, over 80k}. We use the numbers 0 to 7 to stand for these values and statistically calculate the frequency of each value as the priors. We then find the correlation between each user's work class and annual income by deep learning (a built-in network of TensorFlow). In this experiment, we consider input data $X_i$ with a smaller domain size than $|\mathcal{R}_i|$, i.e., $f_i$ is surjective but not bijective: let $\mathcal{X}$ be {Private, Self-employed, Government, Never-worked}. The prior of $X_i$ and its correlation with $G_i$ can be calculated from the mapping rule. Then each user publishes his/her $X_i$ via the LIP/LDP-based mechanism with perturbation parameters numerically solved from the optimization problem defined in (15). The comparison is shown in Fig. 12(b), from which we observe that the proposed $\epsilon$-LIP model provides better utility than $\epsilon$-LDP. Compared with the Monte-Carlo simulations, with this dataset each model requires a larger $\epsilon$ for the MSE to diminish to 0, because the latent variable $G$ is highly correlated with $X$. From the experimental results, we have the following insights: a) context-aware privacy notions provide better utility than context-free notions, and when the prior is more skewed, the advantage is further enhanced; b) LIP-based mechanisms achieve better utility than those based on LDP when using the same prior-dependent estimator; the utility gain comes from incorporating the prior knowledge into the privacy notion; c) when the data domain increases, the utility under each notion decreases.
Incorporating encoding in the mechanism improves utility when $\epsilon$ is small; d) utilities of the models with latent variables are higher than those without, because the collected data becomes less sensitive; when the correlation between $X$ and $G$ is weak, for some $\epsilon$, $X$ can be directly published to achieve zero MSE.

6 CONCLUSION

In this paper, the notion of local information privacy is proposed and studied. As a context-aware privacy notion, it provides a more relaxed privacy guarantee than LDP by introducing prior knowledge into the privacy definition, thereby achieving increased utility. We implement the proposed LIP notion in a data aggregation framework and derive the utility-privacy tradeoff, which minimizes the MSE between the input data and the estimate while protecting the privacy of the raw data or of a private latent variable that is correlated with the input data. We consider different scenarios of prior availability (uncertainty) and data correlation. We also incorporate encoding methods into the mechanism to mitigate the influence of a large input data domain. Finally, we use synthetic and real-world data to demonstrate the impact of the data prior, correlation, and data domain, and compare the utility provided by the proposed mechanisms to those based on LDP. Results show that LIP-based mechanisms provide better utility than those based on LDP.

REFERENCES

[1] T. Maddox, "The dark side of wearables: How they're secretly jeopardizing your security and privacy," 2016.
[2] J. Krumm, "Inference attacks on location tracks," in Pervasive Computing, (Berlin, Heidelberg), pp. 127-143, Springer Berlin Heidelberg, 2007.
[3] K. Weaver, "How smart meters invade individual privacy," 2014.
[4] P. Samarati and L. Sweeney, "Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression," tech. rep., 1998.
[5] C.
Dwork, "Differential privacy," in 33rd International Colloquium on Automata, Languages and Programming (ICALP), Part II, pp. 1-12, 2006.
[6] C. Dwork, F. McSherry, and K. Nissim, "Calibrating noise to sensitivity in private data analysis," in Third Theory of Cryptography Conference, pp. 265-284, 2006.
[7] J. Abowd, "The U.S. Census Bureau adopts differential privacy," in 24th International Conference on Knowledge Discovery & Data Mining (ACM SIGKDD), London, UK, pp. 2867-2867, 2018.
[8] S. L. Warner, "Randomized response: A survey technique for eliminating evasive answer bias," Journal of the American Statistical Association, vol. 60, no. 309, pp. 63-69, 1965.
[9] J. Freudiger, R. Shokri, and J.-P. Hubaux, "Evaluating the privacy risk of location-based services," in 15th International Conference on Financial Cryptography and Data Security, FC'11, pp. 31-46, Springer-Verlag, 2012.
[10] P. Kairouz, S. Oh, and P. Viswanath, "Extremal mechanisms for local differential privacy," in Advances in Neural Information Processing Systems 27, pp. 2879-2887, Curran Associates, Inc., 2014.
[11] S. Xiong, A. D. Sarwate, and N. B. Mandayam, "Randomized requantization with local differential privacy," in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2189-2193, March 2016.
[12] A. D. Sarwate and L. Sankar, "A rate-distortion perspective on local differential privacy," in 2014 52nd Annual Allerton Conference on Communication, Control, and Computing, pp. 903-908, Sept. 2014.
[13] Ú. Erlingsson, V. Pihur, and A. Korolova, "RAPPOR: Randomized aggregatable privacy-preserving ordinal response," in 21st ACM Conference on Computer and Communications Security (CCS), 2014.
[14] J. Tang, A. Korolova, X. Bai, X. Wang, and X. Wang, "Privacy loss in Apple's implementation of differential privacy on MacOS 10.12," CoRR, vol. abs/1709.02753, 2017.
[15] T. Wang, J. Blocki, N. Li, and S.
Jha, "Locally differentially private protocols for frequency estimation," in 26th USENIX Security Symposium, pp. 729-745, USENIX Association, 2017.
[16] T.-H. H. Chan, E. Shi, and D. Song, "Optimal lower bound for differentially private multi-party aggregation," in 20th Annual European Symposium on Algorithms, ESA '12, pp. 277-288, 2012.
[17] R. Bassily, K. Nissim, U. Stemmer, and A. Thakurta, "Practical locally private heavy hitters," in Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS'17, pp. 2285-2293, Curran Associates Inc., 2017.
[18] C. Huang, P. Kairouz, X. Chen, L. Sankar, and R. Rajagopal, "Context-aware generative adversarial privacy," Entropy, 2017.
[19] C. Perera, A. Zaslavsky, P. Christen, and D. Georgakopoulos, "Context aware computing for the internet of things: A survey," IEEE Communications Surveys & Tutorials, vol. 16, pp. 414-454, 2014.
[20] M. E. Andrés, N. E. Bordenabe, K. Chatzikokolakis, and C. Palamidessi, "Geo-indistinguishability: Differential privacy for location-based systems," in 2013 ACM SIGSAC Conference on Computer & Communications Security, CCS '13, pp. 901-914, 2013.
[21] F. Tramèr and Z. Huang, "Differential privacy with bounded priors: Reconciling utility and privacy in genome-wide association studies," in 22nd ACM SIGSAC Conference on Computer & Communications Security, CCS '15, pp. 1286-1297, 2015.
[22] D. Kifer and A. Machanavajjhala, "No free lunch in data privacy," in Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, SIGMOD '11, (New York, NY, USA), pp. 193-204, ACM, 2011.
[23] C. Dwork and A. Roth, "The algorithmic foundations of differential privacy," Foundations and Trends in Theoretical Computer Science, vol. 9, no. 3-4, pp. 211-407, 2014.
[24] P. Cuff and L. Yu, "Differential privacy as a mutual information constraint," in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, CCS '16, pp.
43-54, 2016.
[25] S. Asoodeh, F. Alajaji, and T. Linder, "Notes on information-theoretic privacy," in 2014 52nd Annual Allerton Conference, pp. 1272-1278, Sept. 2014.
[26] W. Wang, L. Ying, and J. Zhang, "On the relation between identifiability, differential privacy, and mutual-information privacy," IEEE Transactions on Information Theory, vol. 62, pp. 5018-5029, Sept. 2016.
[27] W. Zhang, B. Jiang, M. Li, R. Tandon, Q. Liu, and H. Li, "Aggregation-based location privacy: An information theoretic approach," Computers & Security, vol. 97, p. 101953, 2020.
[28] D. Kifer and A. Machanavajjhala, "A rigorous and customizable framework for privacy," in Proceedings of the 31st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pp. 77-88, ACM, 2012.
[29] B. Yang, I. Sato, and H. Nakagawa, "Bayesian differential privacy on correlated data," in Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, SIGMOD '15, pp. 747-762, 2015.
[30] N. Li, W. Qardaji, D. Su, Y. Wu, and W. Yang, "Membership privacy: A unifying framework for privacy definitions," in Proceedings of the 2013 ACM SIGSAC Conference on Computer and Communications Security, CCS '13, pp. 889-900, 2013.
[31] F. du Pin Calmon and N. Fawaz, "Privacy against statistical inference," in 2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1401-1408, 2012.
[32] C. Dwork, "Differential privacy: A survey of results," in 5th International Conference on Theory and Applications of Models of Computation (TAMC), pp. 1-19, 2008.
[33] S. U. Nabar and N. Mishra, "Releasing private contingency tables," Journal of Privacy and Confidentiality, vol. 2, Sep. 2010.
[34] S. P. Kasiviswanathan and A. Smith, "On the 'semantics' of differential privacy: A Bayesian formulation," Journal of Privacy and Confidentiality, vol. 6, Jun. 2014.
[35] V. Rastogi, D. Suciu, and S.
Hong, “The boundary between privacy and utility in data publishing,” in Proceedings of the 33rd Interna- tional Conference on V ery Large Data Bases , VLDB ’07, p. 531–542, VLDB Endowment, 2007. [36] F . d. P . Calmon, A. Makhdoumi, M. M ´ edard, M. V aria, M. Chris- tiansen, and K. R. Duffy, “Principal inertia components and ap- plications,” IEEE T ransactions on Information Theory , vol. 63, no. 8, pp. 5011–5038, 2017. [37] Y . Cao, M. Y oshikawa, Y . Xiao, and L. Xiong, “Quantifying differ ential privacy under temporal correlations,” in 33rd IEEE International Conference on Data Engineering (ICDE) , pp. 821–832, April 2017. [38] P . Kairouz, K. A. Bonawitz, and D. Ramage, “Discrete distribu- tion estimation under local privacy ,” in International Conference on Machine Learning (ICML) , 2016. [39] Y . W ang, S. Song, and K. Chaudhuri, “Privacy-preserving analysis of correlated data,” CoRR , vol. abs/1603.03977, 2016. [40] T . Murakami and Y . Kawamoto, “Utility-optimized local differ en- tial privacy mechanisms for distribution estimation,” in Proceed- ings of the 28th USENIX Conference on Security Symposium , SEC’19, (USA), p. 1877–1894, USENIX Association, 2019. [41] B. Jiang, M. Li, and R. T andon, “Context-A ware data aggregation with localized information privacy ,” in 2018 IEEE Conference on Communications and Network Security (CNS) , May 2018. [42] I. Issa, S. Kamath, and A. B. W agner, “An operational measure of information leakage,” in 2016 Annual Conference on Information Science and Systems (CISS) , pp. 234–239, March 2016. [43] J. Lee and C. Clifton, “Differ ential identifiability ,” in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , pp. 1041–1049, 2012. [44] B. Jiang, M. Li, and R. T andon, “Local information privacy with bounded prior ,” in ICC 2019 - 2019 IEEE International Conference on Communications (ICC) , pp. 1–7, May 2019. [45] F . A. S. Asoodeh and T . 
APPENDIX A
PROOF OF LEMMA 3

Proof. When $\epsilon$-LIP is satisfied, the privacy metric of DI can be bounded as:
$$\frac{Pr(Y=y|X=x)\,Pr(X=x)}{Pr(Y=y|X=x')\,Pr(X=x')} \leq \frac{Pr(Y=y)\,Pr(X=x)\,e^{\epsilon}}{Pr(Y=y)\,Pr(X=x')\,e^{-\epsilon}} \leq e^{2\epsilon + D_{\infty}^{X}}.$$

For the other direction, when $\epsilon$-DI holds, we have:
$$\frac{Pr(Y=y|X=x)}{Pr(Y=y|X=x')} \leq e^{\epsilon + D_{\infty}^{X}}.$$
Then:
$$Pr(Y=y) = \sum_{x\in\mathcal{X}} Pr(Y=y|X=x)\,Pr(X=x) \leq \sum_{x\in\mathcal{X}} e^{\epsilon + D_{\infty}^{X}}\,Pr(Y=y|X=x')\,Pr(X=x) = e^{\epsilon + D_{\infty}^{X}}\,Pr(Y=y|X=x').$$
Similarly, $Pr(Y=y) \geq e^{-\epsilon - D_{\infty}^{X}}\,Pr(Y=y|X=x')$. Thus $(\epsilon + D_{\infty}^{X})$-LIP is satisfied.

APPENDIX B
PROOF OF THEOREM 1

Proof. The MMSE estimator $\hat{S}$ can be expressed as:
$$E[S|\bar{Y}] = E[f(\bar{R})|\bar{Y}] = E[f(R_1, R_2, \ldots, R_N)|\bar{Y}] \overset{(a)}{=} E[f_1(R_1)|\bar{Y}] + \cdots + E[f_N(R_N)|\bar{Y}] \overset{(b)}{=} \sum_{i=1}^{N} E[f_i(R_i)|Y_i], \tag{43}$$
where (a) in Eq. (43) follows from the independence of the $R_i$'s, and (b) holds because $R_i$ is only correlated with $Y_i$ in the output sequence. Thus, $\mathcal{E}(S,\hat{S})$ can be derived as:
$$\mathcal{E}(S,\hat{S}) = E\left[\left(\sum_{i=1}^{N}\left\{f_i(R_i) - E[f_i(R_i)|Y_i]\right\}\right)^{2}\right]. \tag{44}$$
Note that, for the histogram application, the errors form an error vector $(S_k,\hat{S}_k)_{k=1}^{d}$. By the definition of the second-order norm, the mean square error in this case is:
$$\mathcal{E}\big((S_k,\hat{S}_k)_{k=1}^{d}\big) = \sum_{k=1}^{d} E\left[\left(\sum_{i=1}^{N}\left\{f_i^{k}(R_i) - E[f_i^{k}(R_i)|Y_i]\right\}\right)^{2}\right],$$
where $f_i^{k}(R_i) = \mathbb{1}\{R_i = k\}$.

We next show that, in general, the total MSE decomposes into the sum of the local MSEs:
$$\mathcal{E}(S,\hat{S}) = \sum_{i=1}^{N} E\big[f_i(R_i) - E[f_i(R_i)|Y_i]\big]^{2} + 2\sum_{j<l} E\big\{(f_j(R_j) - E[f_j(R_j)|Y_j])(f_l(R_l) - E[f_l(R_l)|Y_l])\big\}.$$
The cross terms are $0$ because, for all $j,l \in \{1,\ldots,N\}$ with $j \neq l$:
$$E\big\{(f_j(R_j) - E[f_j(R_j)|Y_j])(f_l(R_l) - E[f_l(R_l)|Y_l])\big\} = E\big[f_j(R_j) - E[f_j(R_j)|Y_j]\big]\,E\big[f_l(R_l) - E[f_l(R_l)|Y_l]\big] = \big[E(f_j(R_j)) - E\{E[f_j(R_j)|Y_j]\}\big]\big[E(f_l(R_l)) - E\{E[f_l(R_l)|Y_l]\}\big],$$
and both factors are $0$ by the law of total expectation (the estimator is unbiased). Thus, $\mathcal{E}(S,\hat{S}) = \sum_{i=1}^{N}\mathcal{E}_i(q^{i})$.

We next show that the globally optimal perturbation parameters satisfy each local privacy constraint. Assume that for each user $i$ the minimized $\mathcal{E}_i(q^{i}) = e_i$ is achieved at $q^{i*} \in T_i$; then $\mathcal{E}(q^{1*},\ldots,q^{N*}) = \sum_{i=1}^{N} e_i$. If some user $k$ instead takes parameters $q^{k} \in T_k$, then by assumption $\mathcal{E}_k(q^{k}) \geq e_k$, and thus
$$\sum_{i=1}^{k-1}\mathcal{E}_i(q^{i*}) + \mathcal{E}_k(q^{k}) + \sum_{i=k+1}^{N}\mathcal{E}_i(q^{i*}) \geq \sum_{i=1}^{N} e_i.$$
That is, the minimum of $\mathcal{E}(q^{1},\ldots,q^{N})$ over $q^{i} \in T_i$, $\forall i \in [1,N]$, is achieved when $q^{i} = q^{i*}$ for each user.

APPENDIX C
PROOF OF THEOREM 2

Proof.
The first step is to show that the minimal MSE is achieved when $q_0$ and $q_1$ are at their minimum values; this follows by taking the derivative of the MSE with respect to the $q$'s and showing that the MSE is increasing in them. The second step is to find the minimum values of the $q$'s allowed by the privacy constraints.

To derive the monotonicity of the privacy metric with respect to the $q$'s, define
$$F_1^{i} = \frac{Pr(G_i = g|Y_i = 0)}{Pr(G_i = g)}, \qquad F_2^{i} = \frac{Pr(G_i = g|Y_i = 1)}{Pr(G_i = g)},$$
which, by Bayes' rule, can be further expressed as
$$F_1^{i} = \frac{Pr(Y_i = 0|G_i = g)}{Pr(Y_i = 0)} = \frac{(1-q_0^{i})T_{g0}^{i} + q_1^{i} t_{g1}^{i}}{q_1^{i} P_1^{i} + (1-q_0^{i})(1-P_1^{i})}; \qquad F_2^{i} = \frac{Pr(Y_i = 1|G_i = g)}{Pr(Y_i = 1)} = \frac{q_0^{i} T_{g0}^{i} + (1-q_1^{i})\, t_{g1}^{i}}{(1-q_1^{i})P_1^{i} + q_0^{i}(1-P_1^{i})}. \tag{45}$$
Taking derivatives with respect to $q_0^{i}$ and $q_1^{i}$, we have:
$$\frac{\partial F_1^{i}}{\partial q_0^{i}} = \frac{(t_{g1}^{i} - P_1^{i})\,q_1^{i}}{\big(q_1^{i} P_1^{i} + (1-q_0^{i})(1-P_1^{i})\big)^{2}}, \qquad \frac{\partial F_1^{i}}{\partial q_1^{i}} = \frac{(t_{g1}^{i} - P_1^{i})(1-q_0^{i})}{\big(q_1^{i} P_1^{i} + (1-q_0^{i})(1-P_1^{i})\big)^{2}},$$
$$\frac{\partial F_2^{i}}{\partial q_0^{i}} = \frac{(P_1^{i} - t_{g1}^{i})(1-q_1^{i})}{\big((1-q_1^{i})P_1^{i} + q_0^{i}(1-P_1^{i})\big)^{2}}, \qquad \frac{\partial F_2^{i}}{\partial q_1^{i}} = \frac{(P_1^{i} - t_{g1}^{i})\,q_0^{i}}{\big((1-q_1^{i})P_1^{i} + q_0^{i}(1-P_1^{i})\big)^{2}}.$$
Hence, when $t_{g1}^{i} > P_1^{i}$, $F_1^{i}$ is monotonically increasing in the $q^{i}$'s whereas $F_2^{i}$ is monotonically decreasing in them, so the minimum $q^{i}$'s are achieved when $F_1^{i} = e^{-\epsilon}$ and $F_2^{i} = e^{\epsilon}$. Solving these equations, we get:
$$q_0^{i} = \frac{t_{g1}^{i} - P_1^{i} e^{\epsilon}}{(e^{\epsilon}+1)(t_{g1}^{i} - P_1^{i})}; \qquad q_1^{i} = \frac{1 + t_{g1}^{i} e^{\epsilon} - e^{\epsilon} - P_1^{i}}{(e^{\epsilon}+1)(t_{g1}^{i} - P_1^{i})}.$$
When $t_{g1}^{i} < P_1^{i}$, $F_1^{i}$ is monotonically decreasing in the $q^{i}$'s whereas $F_2^{i}$ is monotonically increasing in them, so the minimum $q^{i}$'s are achieved when $F_1^{i} = e^{\epsilon}$ and $F_2^{i} = e^{-\epsilon}$. Solving the equations, we get:
$$q_0^{i} = \frac{P_1^{i} - t_{g1}^{i} e^{\epsilon}}{(e^{\epsilon}+1)(P_1^{i} - t_{g1}^{i})}; \qquad q_1^{i} = \frac{1 + P_1^{i} e^{\epsilon} - e^{\epsilon} - t_{g1}^{i}}{(e^{\epsilon}+1)(P_1^{i} - t_{g1}^{i})}.$$
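The closed-form parameters above can be checked numerically. The sketch below is an illustration (not the authors' code) for the case $t_{g1} > P_1$, assuming a binary input with $P_1 = Pr(X_i = 1)$, $t_{g1} = Pr(X_i = 1|G_i = g)$, $T_{g0} = 1 - t_{g1}$, and randomized-response flip probabilities $q_0 = Pr(Y_i = 1|X_i = 0)$, $q_1 = Pr(Y_i = 0|X_i = 1)$; the computed ratios $F_1$ and $F_2$ land exactly on the LIP boundaries $e^{-\epsilon}$ and $e^{\epsilon}$.

```python
import math

def rr_params(P1, tg1, eps):
    """Closed-form flip probabilities for the case t_g1 > P_1 (assumed names)."""
    denom = (math.exp(eps) + 1) * (tg1 - P1)
    q0 = (tg1 - P1 * math.exp(eps)) / denom
    q1 = (1 + tg1 * math.exp(eps) - math.exp(eps) - P1) / denom
    return q0, q1

def privacy_ratios(P1, tg1, q0, q1):
    """F1 = Pr(Y=0|G=g)/Pr(Y=0) and F2 = Pr(Y=1|G=g)/Pr(Y=1), with T_g0 = 1 - t_g1."""
    Tg0 = 1.0 - tg1
    F1 = ((1 - q0) * Tg0 + q1 * tg1) / (q1 * P1 + (1 - q0) * (1 - P1))
    F2 = (q0 * Tg0 + (1 - q1) * tg1) / ((1 - q1) * P1 + q0 * (1 - P1))
    return F1, F2

P1, tg1, eps = 0.3, 0.6, 0.5          # hypothetical prior, posterior, and budget
q0, q1 = rr_params(P1, tg1, eps)
F1, F2 = privacy_ratios(P1, tg1, q0, q1)
print(F1, math.exp(-eps))  # F1 lands on the lower LIP boundary e^{-eps}
print(F2, math.exp(eps))   # F2 lands on the upper LIP boundary e^{eps}
```

With these inputs both flip probabilities are valid (they fall in $[0,1]$), and both constraints bind simultaneously, as the proof requires.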
The final step is to examine $q_0^{i}$ and $q_1^{i}$ as functions of $t_{g1}^{i}$. Taking derivatives of the $q$'s shows that the first set of solutions is monotonically increasing in $t_{g1}^{i}$ and the second set is monotonically decreasing in $t_{g1}^{i}$. Thus, to find a single pair $(q_0, q_1)$ satisfying the constraint for every $g \in \mathcal{G}$, we take the maximum over all candidate values. As the $q$'s are non-negative, another candidate in the max function is $0$.

APPENDIX D
PROOF OF THEOREM 3

Proof. Notice that $Var[X_i]$ is a non-negative constant; thus minimizing the MSE is equivalent to maximizing $Var[\hat{X}_i]$.

Step 1: without the privacy constraints.

Minimizing solution: consider the parameter set $q_{\min}^{i}$ with $q_{nk}^{i} = \lambda_k^{i}$ for all $n,k \in \{1,2,\ldots,d\}$; then $Var[\hat{X}_i] = 0$. Since $Var[\hat{X}_i] \geq 0$, the solution $q_{nk}^{i} = \lambda_k^{i}$ attains the minimum of $Var[\hat{X}_i]$.

Maximizing solution: consider the parameter set $q_{\max}^{i}$ with $q_{kk}^{i} = 1$ for all $k \in \{1,2,\ldots,d\}$ and $q_{kl}^{i} = 0$ for all $l \neq k$. Under this solution, $\lambda_k^{i} = P_k^{i}$ and
$$\sum_{m=1}^{d}\sum_{n=1}^{d}\sum_{k=1}^{d} a_m a_n P_m^{i} P_n^{i}\, q_{mk}^{i}\left(\frac{q_{nk}^{i}}{\lambda_k^{i}} - 1\right) = \sum_{n=1}^{d} a_n^{2} P_n^{i}(1-P_n^{i}) - \sum_{n=1}^{d}\sum_{m\neq n} a_n a_m P_n^{i} P_m^{i} = Var[X_i]. \tag{46}$$
Since $\mathcal{E}_i \geq 0$, $Var[X_i] \geq Var[\hat{X}_i]$. Thus the solution $q_{kk}^{i} = 1$, $q_{kl}^{i} = 0$, $\forall k \in \{1,2,\ldots,d\}$, $l \neq k$, attains the maximum of $Var[\hat{X}_i]$.

Next, we investigate the monotonicity in the region between the minimum and the maximum. Taking the derivative with respect to $q_{lk}^{i}$:
$$\frac{\partial Var[\hat{X}_i]}{\partial q_{lk}^{i}} = \frac{1}{(\lambda_k^{i})^{2}}\left[a_l \lambda_k^{i}\left(2\sum_{m=1}^{d}\big(a_m q_{mk}^{i} - a_l \lambda_k^{i}\big)\right) - P_l^{i}\left(\sum_{m=1}^{d} a_m q_{mk}^{i}\right)^{2}\right] = \frac{a_l q_{lk}^{i}\big(\sum_{m\neq l} a_m q_{mk}^{i}\big)(1-P_k^{i})\big(q_{lk}^{i} - \lambda_k^{i}\big)}{\lambda_k^{i}}. \tag{47}$$
From Eq. (47), we observe that the stationary point of $q_{lk}^{i}$ is $\lambda_k^{i}$, which we know is the minimizer; $Var[\hat{X}_i]$ is monotonically increasing when $q_{lk}^{i} > \lambda_k^{i}$ and monotonically decreasing when $q_{lk}^{i} < \lambda_k^{i}$. As a result, without the privacy constraints, each optimal $q_{mn}^{i}$ is either $0$ or $1$.

We next show that the maximum of $Var[\hat{X}_i]$ can only be achieved by the solutions discussed above. Assume that for data value $l$ there is a subset of indices $S$ such that $q_{lk}^{i} \notin \{0,1\}$ for every $k \in S$. Denote by $\hat{X}$ the estimator using $q_{\max}^{i}$, and by $\hat{X}'$ the estimator using $q_{\max}^{i}$ with the parameters for data value $l$ substituted according to the subset. Regardless of the constraints, comparing $Var[\hat{X}_i]$ with $Var[\hat{X}_i']$:
$$Var[\hat{X}_i] - Var[\hat{X}_i'] = \sum_{k=1}^{n} a_l^{2} P_l^{i}\,\frac{P_l^{i}}{P_l^{i}+P_k^{i}} + \sum_{k=1}^{n} a_k^{2} P_k^{i}\,\frac{P_k^{i}}{P_l^{i}+P_k^{i}} + \sum_{m\notin\{1,\ldots,n\}} a_l P_l^{i}\, a_m P_m^{i} - 2\sum_{k=1}^{n}\frac{a_l a_k P_l^{i} P_k^{i}}{P_l^{i}+P_k^{i}} = \sum_{k=1}^{n}\frac{(a_l P_l^{i} - a_k P_k^{i})^{2}}{P_l^{i}+P_k^{i}} + \sum_{m\notin\{1,\ldots,n\}} a_l P_l^{i}\, a_m P_m^{i} > 0. \tag{48}$$
Thus the form of the optimal solution is unique: for each $k \in \{1,2,\ldots,d\}$, exactly one $q_{kj}^{i}$ equals $1$ and the others equal $0$.

Step 2: with the privacy constraints. As $Var[\hat{X}_i]$ is monotonically increasing when $q_{lk}^{i} > \lambda_k^{i}$ and monotonically decreasing when $q_{lk}^{i} < \lambda_k^{i}$, the optimal solution (with privacy constraints) lies on the boundary of the constraints: $q_{jk}^{i}/\lambda_k^{i} = e^{-\epsilon}$ or $q_{jk}^{i}/\lambda_k^{i} = e^{\epsilon}$ (subject to $q_{jk}^{i} \geq 0$ and $\sum_{n=1}^{d} q_{jn}^{i} = 1$, $\forall j,k \in \{1,2,\ldots,d\}$). When one of the probabilities $q_{m1}^{i}, q_{m2}^{i}, \ldots, q_{md}^{i}$ approaches $1$ and the others approach $0$, there are $d$ possible selections; considering all $m \in \{1,2,\ldots,d\}$, there are $d!$ feasible solutions.
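The Step 1 claims above can be illustrated numerically: for a discrete channel $q$, $Var[\hat{X}_i] = Var(E[X|Y])$ is $0$ when every row equals the output marginal ($q_{nk} = \lambda_k$), equals $Var[X_i]$ for the identity (permutation) channel, and lies in between for any mixed channel. A minimal sketch under these assumptions (the alphabet and prior are hypothetical, not from the paper):

```python
import random

def estimator_variance(a, P, q):
    """Var[E[X|Y]] for prior P over values a and row-stochastic channel q[x][y]."""
    d = len(a)
    lam = [sum(P[x] * q[x][y] for x in range(d)) for y in range(d)]  # output marginals
    EX = sum(p * v for p, v in zip(P, a))
    var = 0.0
    for y in range(d):
        if lam[y] > 0:
            Exy = sum(a[x] * P[x] * q[x][y] for x in range(d)) / lam[y]  # E[X|Y=y]
            var += lam[y] * (Exy - EX) ** 2
    return var

a = [1.0, 2.0, 5.0]    # hypothetical input alphabet
P = [0.2, 0.3, 0.5]    # hypothetical prior
varX = sum(p * v * v for p, v in zip(P, a)) - sum(p * v for p, v in zip(P, a)) ** 2

# Identity (permutation) channel: the estimator variance attains Var[X].
identity = [[1.0 if x == y else 0.0 for y in range(3)] for x in range(3)]

# All rows identical (q_nk = lambda_k): Y is independent of X, variance is 0.
flat = [list(P) for _ in range(3)]

# A random row-stochastic channel lies in between (law of total variance).
random.seed(0)
mixed = []
for _ in range(3):
    w = [random.random() for _ in range(3)]
    mixed.append([x / sum(w) for x in w])
v_mix = estimator_variance(a, P, mixed)
print(estimator_variance(a, P, identity), estimator_variance(a, P, flat), v_mix)
```

This matches the proof's endpoints: the permutation channel maximizes the estimator variance and the prior-matching channel minimizes it.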
We now consider the case where $q_{kk}^{i}$ approaches $1$ for every $k \in \{1,2,\ldots,d\}$ and the other $q_{kj}^{i}$'s approach $0$. For the $q_{kk}^{i}$'s approaching $1$ the upper bounds are active, and for the $q_{kj}^{i}$'s approaching $0$ the lower bounds are active. From the privacy constraints, the upper bound of $q_{kk}^{i}$ is $\lambda_k^{i} e^{\epsilon}$ and the lower bound of $q_{kj}^{i}$ is $\lambda_j^{i} e^{-\epsilon}$. As $q_{kk}^{i} + \sum_{j\neq k} q_{kj}^{i} = 1$, all these parameters move toward their boundaries simultaneously, so they may not reach them at the same time. We next discuss whether the lower bounds or the upper bounds are reached first. When the lower bounds are reached, $q_{jk}^{i} = \lambda_k^{i} e^{-\epsilon}$ for all $j \neq k$; thus $q_{kk}^{i} = 1 - (1-P_k^{i})e^{-\epsilon}$ and $\lambda_k^{i} = P_k^{i}$. We can check whether the $q_{kk}^{i}$'s remain in the feasible region:
$$\frac{\lambda_k^{i}}{q_{kk}^{i}} - e^{-\epsilon} = \frac{e^{\epsilon} P_k^{i}}{e^{\epsilon} + P_k^{i} - 1} - e^{-\epsilon} \geq 0, \tag{49}$$
$$e^{\epsilon} - \frac{\lambda_k^{i}}{q_{kk}^{i}} = e^{\epsilon} - \frac{e^{\epsilon} P_k^{i}}{e^{\epsilon} + P_k^{i} - 1} \geq 0. \tag{50}$$
So, when the $q_{kj}^{i}$'s reach the lower bound, $q_{kk}^{i}$ is still in the feasible region; it is readily seen that when $q_{kk}^{i}$ reaches the upper bound, the $q_{kj}^{i}$'s do not satisfy the privacy constraints.

APPENDIX E
PROOF OF LEMMA 4

Proof. As the MSE is the difference between the variance of the input data and the variance of the estimator, and the variance of the input data is fixed once $d$ is fixed, it is equivalent to show that when $f \neq d$ the variance of the estimator decreases. We know that the optimal parameters for any input $X_i = a_k$ have $q_{kk}^{i}$ approaching $1$ while the other $q_{kj}^{i}$'s approach $0$, so that each input value can be inferred from a particular output: given $Y_i = a_k$, one can infer with high probability that $X_i$ is also $a_k$, and the confidence increases with $\epsilon$. When $f < d$: since $d$ is fixed, $Var(X)$ is also fixed.
Denote by $Var(\hat{X}_i)$ the variance of the estimator with $f = d$, and by $Var(\hat{X}_i')$ the variance of the estimator with $f < d$. Recall that
$$\hat{X}_i = \sum_{j=1}^{d}\sum_{k=1}^{d} a_j\,Pr(X_i = a_j|Y_i = a_k)\,\mathbb{1}_k^{i}, \tag{51}$$
$$\hat{X}_i' = \sum_{j=1}^{d}\sum_{k=1}^{f} a_j\,Pr(X_i = a_j|Y_i = a_k)\,\mathbb{1}_k^{i}. \tag{52}$$
First assume that for each $j \in \{1,\ldots,d\}$ and $k \in \{1,\ldots,f\}$ the parameters of $\hat{X}_i$ and $\hat{X}_i'$ are identical. Since $a_j\,Pr(X_i = a_j|Y_i = a_k) \geq 0$ for each such $j,k$, $Var(\hat{X}_i')$ is monotonically increasing in $f$. Note, however, that the parameters of $\hat{X}_i$ and $\hat{X}_i'$ cannot be identical: for at least one $j$, $q_{kj}^{i}$ must increase for $k \in \{f+1,\ldots,d\}$, $j \in \{1,\ldots,f\}$. This makes each $Pr(X_i = a_k|Y_i = a_j)$ smaller, so $Pr(X_i = a_k|Y_i = a_j) > Pr(X_i' = a_k|Y_i' = a_j)$. As a result, $Var(\hat{X}_i) > Var(\hat{X}_i')$.

When $d < f$: this case can be viewed as a special case of the general model with $P_{d+1}^{i} = P_{d+2}^{i} = \cdots = P_f^{i} = 0$. The optimal solution is then straightforward: $q_{kk}^{i} = 1 - (1-P_k^{i})e^{-\epsilon}$ and $q_{kj}^{i} = P_j^{i} e^{-\epsilon}$ for $k,j \in \{1,\ldots,d\}$, $j \neq k$; and $q_{kj}^{i} = 0$ for $k \in \{1,\ldots,d\}$, $j \in \{d+1,\ldots,f\}$. Hence the optimal solution is equivalent to that of the general model with $d = f$. In summary, the optimal output range is $f = d$.

APPENDIX F
PROOF OF THEOREM 4

The privacy constraints can be expressed as:
$$\frac{Pr\big(\{Y_i^{k}\}_{k=1}^{|\mathcal{R}|} = y_1^{|\mathcal{R}|}\big)}{Pr\big(\{Y_i^{k}\}_{k=1}^{|\mathcal{R}|} = y_1^{|\mathcal{R}|}\,\big|\,\{U_i^{k}\}_{k=1}^{|\mathcal{R}|} = u_1^{|\mathcal{R}|}\big)} = \frac{\sum_{\bar{u}_1^{|\mathcal{R}|}\in\mathcal{B}^{|\mathcal{R}|}} Pr\big(\{U_i^{k}\}_{k=1}^{|\mathcal{R}|} = \bar{u}_1^{|\mathcal{R}|}\big)\prod_{k=1}^{|\mathcal{R}|} Pr(Y_i^{k} = y^{k}|U_i^{k} = \bar{u}^{k})}{\prod_{k=1}^{|\mathcal{R}|} Pr(Y_i^{k} = y^{k}|U_i^{k} = u^{k})}$$
$$= Pr\big(\{U_i^{k}\}_{k=1}^{|\mathcal{R}|} = u_1^{|\mathcal{R}|}\big) + \sum_{\bar{u}_1^{|\mathcal{R}|}\neq u_1^{|\mathcal{R}|}} Pr\big(\{U_i^{k}\}_{k=1}^{|\mathcal{R}|} = \bar{u}_1^{|\mathcal{R}|}\big)\,\frac{\prod_{k=1}^{|\mathcal{R}|} Pr(Y_i^{k} = y^{k}|U_i^{k} = \bar{u}^{k})}{\prod_{k=1}^{|\mathcal{R}|} Pr(Y_i^{k} = y^{k}|U_i^{k} = u^{k})}. \tag{53}$$
Note that for any given output vector $y_1^{|\mathcal{R}|}$, the two products $\prod_{k} Pr(Y_i^{k} = y^{k}|U_i^{k} = \bar{u}^{k})$ and $\prod_{k} Pr(Y_i^{k} = y^{k}|U_i^{k} = u^{k})$ differ in at most two factors, because different values of $R$ differ in only two bits when encoded as unary vectors. To this end, the privacy metric of Eq. (53) is bounded within:
$$\left[P_r^{i} + (1-P_r^{i})\frac{q_{01}^{i}\, q_{10}^{i}}{(1-q_{01}^{i})(1-q_{10}^{i})},\; P_r^{i} + (1-P_r^{i})\frac{(1-q_{01}^{i})(1-q_{10}^{i})}{q_{01}^{i}\, q_{10}^{i}}\right]. \tag{54}$$
As Eq. (54) must fall within $[e^{-\epsilon}, e^{\epsilon}]$ for all $r \in \mathcal{R}$, we have:
$$P_{\min}^{i} + (1-P_{\min}^{i})\frac{q_{01}^{i}\, q_{10}^{i}}{(1-q_{01}^{i})(1-q_{10}^{i})} \geq e^{-\epsilon}, \qquad P_{\min}^{i} + (1-P_{\min}^{i})\frac{(1-q_{01}^{i})(1-q_{10}^{i})}{q_{01}^{i}\, q_{10}^{i}} \leq e^{\epsilon}, \tag{55}$$
where $P_{\min}^{i} = \min_{r\in\mathcal{R}} P_r^{i}$. Then the upper bound of the ratio $\frac{(1-q_{01}^{i})(1-q_{10}^{i})}{q_{01}^{i}\, q_{10}^{i}}$ becomes (when $e^{-\epsilon} - P_{\min}^{i} \geq 0$):
$$\frac{(1-q_{01}^{i})(1-q_{10}^{i})}{q_{01}^{i}\, q_{10}^{i}} \leq \frac{e^{\epsilon} - P_{\min}^{i}}{1 - P_{\min}^{i}}. \tag{56}$$
The privacy constraints are just met when the inequality in Eq. (56) becomes an equality. Note that there are more $0$'s than $1$'s in any input vector $\{U_i^{k}\}_{k=1}^{|\mathcal{R}|}$, and the utility function
$$E\left[\left\|\{U_i^{k}\}_{k=1}^{|\mathcal{R}|} - E\big[\{U_i^{k}\}_{k=1}^{|\mathcal{R}|}\,\big|\,\{Y_i^{k}\}_{k=1}^{|\mathcal{R}|}\big]\right\|^{2}\right] = \sum_{k=1}^{|\mathcal{R}|}\left\{Var[U_i^{k}] - Var\big[E[U_i^{k}|Y_i^{k}]\big]\right\} \tag{57}$$
is a linear combination of the MSEs of all bits. Therefore, to minimize the MSE, we first set $q_{01}^{i}$ to be as small as possible. As a result,
$$q_{10}^{i*} = \frac{1}{2}, \qquad q_{01}^{i*} = \frac{1 - P_{\min}^{i}}{e^{\epsilon} - 2P_{\min}^{i} + 1}.$$
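As a sanity check on the final solution, the sketch below (illustrative variable names, not the authors' code) plugs $q_{10}^{*} = 1/2$ and $q_{01}^{*} = (1 - P_{\min})/(e^{\epsilon} - 2P_{\min} + 1)$ into the ratio of Eq. (56) and confirms that the upper bound in Eq. (55) is met with equality:

```python
import math

def optimal_ue_params(P_min, eps):
    """q10* = 1/2 and q01* from the closed form above (names are illustrative)."""
    q10 = 0.5
    q01 = (1.0 - P_min) / (math.exp(eps) - 2.0 * P_min + 1.0)
    return q01, q10

P_min, eps = 0.1, 1.0                     # hypothetical minimum prior and budget
q01, q10 = optimal_ue_params(P_min, eps)
ratio = (1 - q01) * (1 - q10) / (q01 * q10)
print(ratio, (math.exp(eps) - P_min) / (1 - P_min))  # Eq. (56) holds with equality
print(P_min + (1 - P_min) * ratio, math.exp(eps))    # Eq. (55) upper bound equals e^eps
```

The algebra behind the check: with $q_{10} = 1/2$ the factor $1/2$ cancels in the ratio, so the equality in Eq. (56) reduces to $(1-q_{01})/q_{01} = (e^{\epsilon} - P_{\min})/(1 - P_{\min})$, which rearranges exactly to the stated $q_{01}^{*}$.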
