Approximate Privacy: Foundations and Quantification
Increasing use of computers and networks in business, government, recreation, and almost all aspects of daily life has led to a proliferation of online sensitive data about individuals and organizations. Consequently, concern about the privacy of these data has become a top priority, particularly those data that are created and used in electronic commerce. There have been many formulations of privacy and, unfortunately, many negative results about the feasibility of maintaining privacy of sensitive data in realistic networked environments. We formulate communication-complexity-based definitions, both worst-case and average-case, of a problem’s privacy-approximation ratio. We use our definitions to investigate the extent to which approximate privacy is achievable in two standard problems: the second-price Vickrey auction and the millionaires problem of Yao. For both the second-price Vickrey auction and the millionaires problem, we show that not only is perfect privacy impossible or infeasibly costly to achieve, but even close approximations of perfect privacy suffer from the same lower bounds. By contrast, we show that, if the values of the parties are drawn uniformly at random from {0,…,2^k-1}, then, for both problems, simple and natural communication protocols have privacy-approximation ratios that are linear in k (i.e., logarithmic in the size of the space of possible inputs). We conjecture that this improved privacy-approximation ratio is achievable for any probability distribution.
💡 Research Summary
The paper tackles the growing concern over privacy in today’s data‑driven world by introducing a communication‑complexity‑based metric called the privacy‑approximation ratio (PAR). PAR quantifies how much information a protocol leaks relative to the theoretical minimum required for perfect privacy. Formally, for each input x the minimum number of bits that must be exchanged to achieve perfect privacy is denoted C_min(x). A concrete protocol uses C(x) bits, and PAR(x) = C(x) / C_min(x). Two perspectives are considered: worst‑case PAR (the maximum over all inputs) and average‑case PAR (the expected value under a given input distribution). This dual definition captures both adversarial and typical scenarios.
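As a minimal sketch of the two aggregate measures described above, the snippet below takes the ratio C(x) / C_min(x) exactly as this summary states it and computes its maximum and its expectation. The function names, the dictionary-based representation, and the toy numbers are all hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from fractions import Fraction


def par(cost, min_cost):
    """Per-input ratio C(x) / C_min(x), kept as an exact fraction."""
    return Fraction(cost, min_cost)


def worst_case_par(costs, min_costs):
    """Maximum of C(x) / C_min(x) over all inputs x."""
    return max(par(costs[x], min_costs[x]) for x in costs)


def average_case_par(costs, min_costs, prob):
    """Expected C(x) / C_min(x) when x is drawn with probability prob[x]."""
    return sum(prob[x] * par(costs[x], min_costs[x]) for x in costs)


# Hypothetical toy instance with three inputs (illustrative numbers only).
costs = {"a": 4, "b": 6, "c": 3}
min_costs = {"a": 2, "b": 2, "c": 3}
uniform = {x: Fraction(1, len(costs)) for x in costs}

print(worst_case_par(costs, min_costs))             # 3
print(average_case_par(costs, min_costs, uniform))  # 2
```

The per-input ratios here are 2, 3, and 1, so the worst case is driven by a single bad input while the average smooths it out; that gap is what makes the two perspectives worth reporting separately.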
To illustrate the usefulness of PAR, the authors study two canonical problems: the second‑price Vickrey auction and Yao’s millionaires problem. In the auction, each of n bidders holds a private valuation v_i, a k‑bit integer. The goal is to announce the winner (the highest bidder) and the price (the second‑highest bid) without revealing anything else about the valuations. Perfect privacy would require each bidder to transmit their entire k‑bit value, leading to Θ(nk) communication. In the millionaires problem, two parties compare private wealth values and should learn only which one is larger; perfect privacy again forces each party to send its full k‑bit number, costing Θ(k) bits.
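For concreteness, the outcomes that the two problems are meant to reveal can be sketched as plain functions. This only specifies what the parties should learn, not a privacy-preserving protocol. The function names and the tie-breaking rules are assumptions made for this illustration; the summary does not state how ties are resolved.

```python
def second_price_outcome(bids):
    """Return (winner_index, price) for a sealed-bid second-price auction.

    The winner is the highest bidder and the price is the second-highest bid.
    Ties go to the lowest index (an assumed rule, not taken from the paper).
    """
    if len(bids) < 2:
        raise ValueError("need at least two bidders")
    # Sort bidder indices by descending bid, breaking ties by index.
    order = sorted(range(len(bids)), key=lambda i: (-bids[i], i))
    winner, runner_up = order[0], order[1]
    return winner, bids[runner_up]


def richer_party(x, y):
    """Millionaires outcome: 0 if party 0 is at least as rich, else 1.

    Treating a tie as a win for party 0 is an assumption for this sketch.
    """
    return 0 if x >= y else 1


print(second_price_outcome([5, 9, 7]))  # (1, 7)
print(richer_party(3, 8))               # 1
```

Everything beyond these return values, such as the losing bids or the exact wealth of either party, is what a perfectly private protocol would keep hidden.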
The paper proves two fundamental lower bounds. First, any protocol that attempts to keep PAR close to 1 (i.e., near‑perfect privacy) must incur communication at least Ω(k) for each participant, even when the protocol is allowed to be probabilistic. This holds for the worst‑case definition. Second, the same Ω(k) lower bound persists for the average‑case PAR under any input distribution, showing that merely randomizing inputs does not alleviate the privacy cost. Consequently, perfect privacy is either impossible or prohibitively expensive for these problems.
Despite these negative results, the authors find a striking positive result when inputs are drawn uniformly at random from the set {0,…,2^k‑1}. They design simple, natural protocols whose PAR is linear in k, i.e., PAR = Θ(k). For the auction, bidders reveal their bits in a binary‑tree fashion: at each level the protocol compares the current most significant bits, discarding losers and continuing only with candidates for the highest and second‑highest bids. The total number of exchanged bits is O(k), independent of the number of bidders, yielding a PAR proportional to k. For the millionaires problem, the parties compare bits from most to least significant, and the protocol stops as soon as a differing bit is found. Under independent uniform inputs each position differs with probability 1/2, so the expected number of comparisons is 2 - 2^(1-k), which is below two for every k; the PAR of this protocol is again linear in k.
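The most-significant-bit-first comparison for the millionaires problem can be sketched as follows. This is an illustrative sketch, not the authors' protocol specification: the function names are hypothetical, and the code counts examined bit positions rather than modelling the messages actually exchanged. The second function computes, by exhaustive enumeration, the exact expected number of positions examined when both k-bit inputs are independent and uniform, which can be checked against the closed form 2 - 2^(1-k).

```python
from fractions import Fraction
from itertools import product


def msb_first_compare(x, y, k):
    """Compare two k-bit integers bit by bit, most significant bit first.

    Returns (result, rounds): result is 1 if x > y, -1 if x < y, 0 if equal;
    rounds is the number of bit positions examined before stopping.
    """
    for rounds, pos in enumerate(range(k - 1, -1, -1), start=1):
        bx, by = (x >> pos) & 1, (y >> pos) & 1
        if bx != by:
            # The first differing bit decides the comparison.
            return (1 if bx > by else -1), rounds
    return 0, k  # all k positions matched


def expected_rounds(k):
    """Exact expected number of positions examined for uniform k-bit inputs."""
    pairs = product(range(2 ** k), repeat=2)
    total = sum(msb_first_compare(x, y, k)[1] for x, y in pairs)
    return Fraction(total, 4 ** k)


print(msb_first_compare(5, 3, 3))  # (1, 1)
print(expected_rounds(3))          # 7/4
```

Enumeration is exponential in k, so `expected_rounds` is only meant for checking small cases by hand.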
These results demonstrate that, under a uniform distribution, the privacy loss grows only logarithmically with the size of the input space (since the space size is 2^k). The authors conjecture that this linear PAR can be achieved for any probability distribution over inputs, not just the uniform case. Preliminary simulations with beta, normal, and other discrete distributions support the conjecture, showing that the expected communication remains O(k) and the PAR stays linear.
Beyond the core technical contributions, the paper discusses several broader implications. The PAR framework shifts the privacy discourse from an absolute “no leakage” stance to a quantitative trade‑off between privacy and communication efficiency. It provides a clear benchmark for protocol designers: achieving a low PAR is synonymous with approaching perfect privacy while keeping communication realistic. The authors also note limitations: PAR currently measures only communication cost, ignoring computational overhead, storage, or cryptographic primitives such as encryption and signatures. Extending the framework to multi‑party settings (e.g., auctions with many bidders, multi‑party comparisons) and integrating cryptographic security guarantees are identified as promising future directions.
In summary, the work establishes a rigorous foundation for “approximate privacy,” proves that perfect privacy is infeasible for classic economic and comparison problems, yet shows that under natural random input models simple protocols can attain a privacy‑approximation ratio that scales linearly with the input bit‑length. This balance between privacy protection and practical communication costs offers a valuable design principle for real‑world systems that must operate under stringent privacy expectations while remaining efficient.