$\mathcal{E}\text{psolute}$: Efficiently Querying Databases While Providing Differential Privacy
As organizations struggle with processing vast amounts of information, outsourcing sensitive data to third parties becomes a necessity. To protect the data, various cryptographic techniques are used in outsourced database systems to ensure data privacy while allowing efficient querying. A rich collection of attacks on such systems has emerged. Even with strong cryptography, the communication volume or access pattern alone is enough for an adversary to succeed. In this work we present a model for a differentially private outsourced database system and a concrete construction, $\mathcal{E}\text{psolute}$, that provably conceals the aforementioned leakages while remaining efficient and scalable. In our solution, differential privacy is preserved at the record level even against an untrusted server that controls data and queries. $\mathcal{E}\text{psolute}$ combines Oblivious RAM and differentially private sanitizers to create a generic and efficient construction. We go further and present a set of improvements that bring the solution to the efficiency and practicality necessary for real-world adoption. We describe how to parallelize the operations, minimize the amount of noise, and reduce the number of network requests, while preserving the privacy guarantees. We have run an extensive set of experiments, with dozens of servers processing up to 10 million records, and compiled a detailed analysis of the results demonstrating the efficiency and scalability of our solution. While providing strong security and privacy guarantees, we are less than an order of magnitude slower than range query execution on a non-secure, plain-text, optimized RDBMS such as MySQL or PostgreSQL.
💡 Research Summary
The paper addresses the long‑standing tension between efficiency and privacy in outsourced database services. While encryption and indexing have been used to protect data confidentiality, prior systems such as CryptDB, Cipherbase, StealthDB, and more recent ORAM‑based solutions (ObliDB, Opaque, Oblix) still leak side‑channel information: the access pattern (which encrypted records are accessed) and the communication volume (the exact size of the result set). These leakages enable reconstruction attacks that can recover statistical distributions or even identify individual records.
To close both gaps, the authors propose a new security model, CDP‑ODB (Computational Differentially Private Outsourced Database), and a concrete construction called 𝔈psolute. The core idea is to combine Oblivious RAM (ORAM) with a differentially private sanitization layer. All data and index structures are stored inside an ORAM, guaranteeing that the server cannot see which specific records are accessed (full AP protection). However, ORAM alone does not hide the number of records returned for a query, which reveals the communication volume (CV). To hide CV, 𝔈psolute perturbs the true result size with Laplace‑distributed noise, then pads the response with that many dummy encrypted records. The padded response is fetched through the ORAM, so the server observes only the total number of transferred blocks, which satisfies (ε,δ)‑differential privacy with respect to the entire view (access logs + traffic).
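The padding step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a count sensitivity of 1 and a Laplace distribution shifted to the right so that the sampled noise is negative with probability at most δ; any remaining negative draw is clamped to zero, because a response cannot contain fewer records than the true result. The function names `laplace_shift` and `padded_size` are hypothetical.

```python
import math
import random


def laplace_shift(epsilon, delta, sensitivity=1.0):
    """Mean mu such that Laplace(mu, sensitivity/epsilon) is negative
    with probability at most delta (assumes 0 < delta <= 0.5)."""
    scale = sensitivity / epsilon
    # For mu >= 0, P[noise < 0] = 0.5 * exp(-mu / scale); solve for mu.
    return scale * math.log(1.0 / (2.0 * delta))


def padded_size(true_count, epsilon, delta, rng):
    """Return (total records to fetch, number of dummy records)."""
    scale = 1.0 / epsilon  # sensitivity of a count is assumed to be 1
    mu = laplace_shift(epsilon, delta)
    # The difference of two i.i.d. exponentials is Laplace(0, scale).
    noise = mu + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    dummies = max(0, math.ceil(noise))  # never return fewer than the truth
    return true_count + dummies, dummies


if __name__ == "__main__":
    rng = random.Random(2024)
    total, dummies = padded_size(true_count=120, epsilon=1.0, delta=0.005, rng=rng)
    print(f"fetch {total} records, of which {dummies} are dummies")
```

Under these assumptions the server sees only the padded total, and a smaller ε or δ increases the expected number of dummy records and therefore the traffic.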
The construction is carefully engineered for practicality. First, the noise magnitude is minimized by allocating the global privacy budget across queries based on their sensitivity, reducing the number of dummy records and thus network overhead. Second, the authors introduce a parallel ORAM architecture: the storage is split among multiple ORAM instances, allowing a single query to be serviced concurrently. This requires a compositional privacy analysis to ensure that the overall DP guarantee holds when each sub‑ORAM consumes a fraction of the budget. Third, the index can be kept locally on the client, eliminating extra round‑trips; the index can also be outsourced if needed, but the default design keeps it client‑side for speed.
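The sharding idea behind the parallel architecture can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's design: each "ORAM" is stood in for by a plain Python dict (which offers no obliviousness at all), records are assigned to shards by `record_id % num_orams`, and the privacy budget is split evenly, which under basic sequential composition sums back to the global ε. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(record_ids, num_orams):
    """Group record ids by the shard that holds them (id mod num_orams)."""
    shards = [[] for _ in range(num_orams)]
    for rid in record_ids:
        shards[rid % num_orams].append(rid)
    return shards


def fetch_from_shard(store, ids):
    """Stand-in for an ORAM read; a real ORAM would hide which ids are read."""
    return [store[rid] for rid in ids]


def parallel_fetch(stores, record_ids):
    """Query every shard concurrently and concatenate the results."""
    shards = partition(record_ids, len(stores))
    with ThreadPoolExecutor(max_workers=len(stores)) as pool:
        parts = list(pool.map(fetch_from_shard, stores, shards))
    return [record for part in parts for record in part]


def split_budget(epsilon, num_orams):
    """Even split; the per-shard budgets add up to the global epsilon."""
    return [epsilon / num_orams] * num_orams


if __name__ == "__main__":
    num_orams = 4
    stores = [
        {rid: f"rec{rid}" for rid in range(100) if rid % num_orams == i}
        for i in range(num_orams)
    ]
    print(parallel_fetch(stores, [3, 4, 5, 10]))
    print(split_budget(1.0, num_orams))
```

In a complete system each shard would also add its own noise-driven dummy reads, which is why the summary notes that a compositional analysis is needed.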
Security is formally defined: the server is honest‑but‑curious, controls both data and queries, and can observe encrypted records, ORAM access transcripts, and total traffic. Under CDP‑ODB, the server learns no information about individual records beyond what is allowed by the (ε,δ)‑DP guarantee, and cannot infer exact access patterns. The paper provides rigorous proofs that 𝔈psolute meets these definitions, emphasizing that DP is applied to the full adversarial view, not just the count query output.
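For reference, a guarantee of this kind is usually written in the standard form of computational differential privacy shown below. Whether the paper's CDP‑ODB definition matches it term for term is an assumption; the symbols $\mathcal{A}$, $\mathsf{View}$, $D$, $D'$, and $q_1, \dots, q_m$ are notation chosen here for illustration. For every efficient adversary $\mathcal{A}$, every query sequence $q_1, \dots, q_m$, and every pair of databases $D$, $D'$ that differ in a single record:

$$\Pr\big[\mathcal{A}(\mathsf{View}(D, q_1, \dots, q_m)) = 1\big] \le e^{\varepsilon} \cdot \Pr\big[\mathcal{A}(\mathsf{View}(D', q_1, \dots, q_m)) = 1\big] + \delta$$

Here $\mathsf{View}$ denotes everything the server observes during the protocol (encrypted records, ORAM access transcripts, and traffic volume), which is what distinguishes this guarantee from differential privacy applied only to a query's output.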
Experimental evaluation uses both synthetic workloads and a real‑world medical dataset, scaling up to 10 million records. Benchmarks compare 𝔈psolute against (i) a naïve linear‑scan baseline (optimal security but worst performance), (ii) Shrinkwrap‑style maximal‑padding solutions, and (iii) standard plaintext RDBMSs (MySQL, PostgreSQL). Results show that 𝔈psolute is only 4–8× slower than the plaintext systems for typical range and point queries, while being 18× faster than the linear‑scan baseline. Parallelism across 8 ORAM instances yields near‑linear speed‑up, and the total network traffic is dramatically reduced thanks to the noise‑minimization strategy.
The authors release a high‑quality C++ implementation as open source, together with scripts for reproducing the experiments. This enables the community to extend 𝔈psolute to more complex query operators (joins, aggregates) and to explore alternative DP mechanisms (e.g., Gaussian noise, advanced composition). In summary, 𝔈psolute is the first system that simultaneously provides full access‑pattern obliviousness and differential‑privacy‑based communication‑volume hiding, while achieving performance that is practical for real‑world outsourced database deployments.