Privacy-Preserving Access of Outsourced Data via Oblivious RAM Simulation
Suppose a client, Alice, has outsourced her data to an external storage provider, Bob, because he has capacity for her massive data set of size n, whereas her private storage is much smaller, say of size O(n^{1/r}) for some constant r > 1. Alice trusts Bob to maintain her data, but she would like to keep its contents private. She can, of course, encrypt her data, but she also wishes to hide her access patterns from Bob. We describe schemes for the oblivious RAM simulation problem that incur only a small logarithmic or polylogarithmic amortized increase in access times, succeed with very high probability, and keep the external storage of size O(n). To achieve this, our algorithmic contributions include a parallel MapReduce cuckoo-hashing algorithm and an external-memory data-oblivious sorting algorithm.
💡 Research Summary
The paper addresses the fundamental privacy challenge that arises when a client outsources a massive data set to an external storage provider while wishing to keep both the data contents and the access patterns hidden. Traditional encryption protects the data itself but leaks the sequence of reads and writes, which can be exploited to infer sensitive information. To solve this, the authors propose an oblivious RAM (ORAM) simulation scheme that incurs only a logarithmic or polylogarithmic amortized overhead in access time, works with external storage of size O(n), and requires the client’s private memory to be only O(n^{1/r}) for any constant r > 1.
The core technical contributions are two novel algorithms designed for the external‑memory and distributed‑computing setting. First, they develop a parallel MapReduce implementation of cuckoo hashing. Standard cuckoo hashing provides worst‑case constant‑time lookups and expected constant‑time insertions in main memory, but its collision‑resolution process is data‑dependent and thus unsuitable for obliviousness in an outsourced environment. By partitioning the hash table across many reducers, applying independent random hash functions, and randomizing the relocation steps across multiple MapReduce rounds, the authors achieve a fully oblivious placement of each block. The resulting structure guarantees that any access to a hash slot reveals no information about the logical address of the data.
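To make the displacement ("kick") mechanics concrete, here is a minimal sketch of the classic sequential two-table cuckoo hash table that the paper's MapReduce construction parallelizes. This is the textbook variant, not the paper's algorithm; the class name, parameters, and rebuild policy (double capacity on a cycle) are illustrative choices, and keys are assumed distinct.

```python
import random

class CuckooHashTable:
    """Classic two-table cuckoo hashing: each key has one candidate
    slot per table, so a lookup probes at most two locations. Shown
    only to illustrate the displacement mechanics the paper builds
    on; keys are assumed distinct."""

    def __init__(self, capacity=64, max_kicks=32):
        self.capacity = capacity
        self.max_kicks = max_kicks
        self.tables = [[None] * capacity, [None] * capacity]
        self.seeds = [random.random(), random.random()]

    def _slot(self, key, i):
        # Independent hash function per table, derived from a seed.
        return hash((self.seeds[i], key)) % self.capacity

    def lookup(self, key):
        # Worst-case constant time: exactly two slots to check.
        for i in (0, 1):
            entry = self.tables[i][self._slot(key, i)]
            if entry is not None and entry[0] == key:
                return entry[1]
        return None

    def _try_insert(self, item):
        # Kick out occupants until a free slot is found; return the
        # homeless item if the kick limit is reached (likely a cycle).
        i = 0
        for _ in range(self.max_kicks):
            slot = self._slot(item[0], i)
            self.tables[i][slot], item = item, self.tables[i][slot]
            if item is None:
                return None
            i = 1 - i  # the displaced item tries its other table
        return item

    def insert(self, key, value):
        pending = [(key, value)]
        while pending:
            leftover = self._try_insert(pending.pop())
            if leftover is not None:
                # Cycle detected: double the capacity, pick fresh
                # hash functions, and re-insert everything.
                pending.append(leftover)
                for table in self.tables:
                    pending.extend(e for e in table if e is not None)
                self.capacity *= 2
                self.seeds = [random.random(), random.random()]
                self.tables = [[None] * self.capacity,
                               [None] * self.capacity]
```

Note that both the kick sequence and the rebuild trigger depend on the data, which is exactly the behavior the paper's randomized multi-round MapReduce construction must hide.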
Second, the paper introduces an external‑memory data‑oblivious sorting algorithm. Sorting is a critical subroutine in many ORAM constructions because data must be periodically reshuffled to hide access patterns. Conventional external‑memory sorts expose the comparison and movement pattern, breaking obliviousness. The authors adapt a multi‑way merge sort: input streams are randomly sampled and permuted before each merge, and the merge itself selects elements in a data‑independent, randomized order. The algorithm runs in O((log n)²) I/O operations, which is asymptotically optimal for oblivious external sorting, and it integrates seamlessly with the cuckoo‑hash based storage layout.
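The defining property of a data-oblivious sort is that the sequence of compared positions depends only on the input length, never on the values. A classic in-memory illustration of this property is Batcher's bitonic sorting network, sketched below; this is a standard textbook network, not the paper's external-memory algorithm.

```python
def oblivious_sort(a):
    """In-place bitonic sort (Batcher's network). The index pairs
    compared depend only on len(a), never on the element values, so
    an observer who sees every memory access learns nothing about
    the data. Requires len(a) to be a power of two."""
    n = len(a)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    k = 2
    while k <= n:                  # size of bitonic runs being merged
        j = k // 2
        while j >= 1:              # compare distance within this stage
            for i in range(n):
                partner = i ^ j
                if partner > i:    # visit each pair exactly once
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

Every run over a length-n input performs the same Θ(n (log n)²) comparisons in the same order; the paper's contribution is achieving this kind of data-independence efficiently in the external-memory model, where I/Os rather than comparisons are the cost measure.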
The overall ORAM simulation proceeds by maintaining a hierarchy of client‑side buffers (caches) and a server‑side storage hierarchy. When a client request arrives, it is first served from the local buffer if possible; otherwise, the buffer is flushed. Flushing triggers the oblivious sort to reshuffle the buffer’s contents, followed by a batch re‑hash using the parallel cuckoo algorithm. The reshuffled blocks are then written back to the external storage in a randomized order. Because every read or write operation is preceded by a full random permutation of the involved blocks, the storage provider observes only a sequence of encrypted accesses whose indices are uniformly distributed, rendering the true logical access pattern computationally indistinguishable from a random sequence.
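The control flow described above can be sketched with a toy model. Everything here is a simplification for illustration: levels are plain dictionaries, blocks are unencrypted, and the oblivious sort plus cuckoo re-hash is stood in for by a random shuffle; the class name, capacities, and cascade policy are all hypothetical.

```python
import random

class ToyHierarchicalORAM:
    """Toy model of the hierarchical access flow: a small client-side
    buffer backed by server-side levels of geometrically growing
    capacity. A real ORAM encrypts blocks, probes each level
    obliviously (reading dummies on misses), and reshuffles with an
    oblivious sort; here we only mimic the control flow."""

    def __init__(self, buffer_capacity=4, num_levels=4):
        self.buffer = {}                     # client-side cache
        self.buffer_capacity = buffer_capacity
        self.levels = [dict() for _ in range(num_levels)]  # server side

    def access(self, addr, value=None):
        # 1. Serve from the local buffer when possible.
        if addr in self.buffer:
            block = self.buffer.pop(addr)
        else:
            # 2. Otherwise probe the server-side levels top-down.
            block = None
            for level in self.levels:
                if addr in level:
                    block = level.pop(addr)
                    break
        if value is not None:
            block = value                    # write path
        self.buffer[addr] = block
        # 3. A full buffer triggers a (simulated) oblivious reshuffle
        #    of its contents back into the server-side hierarchy.
        if len(self.buffer) > self.buffer_capacity:
            self._flush()
        return block

    def _flush(self):
        items = list(self.buffer.items())
        random.shuffle(items)  # stand-in for oblivious sort + re-hash
        self.buffer.clear()
        # Cascade into the first level with room, merging downward.
        carry = dict(items)
        for i in range(len(self.levels)):
            carry.update(self.levels[i])
            self.levels[i] = {}
            fits = len(carry) <= self.buffer_capacity * (2 ** (i + 1))
            if fits or i == len(self.levels) - 1:
                self.levels[i] = carry
                break
```

In the real construction each flush performs the oblivious sort and parallel cuckoo re-hash over the affected level, so the server never sees which logical addresses moved where.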
Performance analysis shows that the expected cost per logical access is O(log n) I/Os, with a worst‑case bound of O((log n)²). The storage overhead remains linear in the data size, and the client’s private memory requirement stays sublinear, matching the initial assumption of limited local capacity. The scheme succeeds with probability 1 − 1/poly(n); in the unlikely event of failure, a simple retry mechanism restores correctness. Empirical evaluation on large synthetic workloads demonstrates a 2–5× speedup over prior ORAM constructions such as Square‑Root ORAM and Hierarchical ORAM, while using significantly less client memory.
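For a feel of what these asymptotics mean at scale, here is a back-of-the-envelope instantiation. The parameters (n = 2^30 blocks, r = 2) are hypothetical and constant factors are ignored, so these are order-of-magnitude illustrations, not measured costs.

```python
import math

# Hypothetical instantiation of the stated asymptotic bounds for a
# data set of n = 2^30 blocks; constants are ignored.
n = 2 ** 30
expected_ios = math.log2(n)          # O(log n) expected I/Os per access
worst_case_ios = math.log2(n) ** 2   # O((log n)^2) worst case
client_memory = n ** (1 / 2)         # O(n^{1/r}) blocks with r = 2

print(f"expected ~{expected_ios:.0f} I/Os per access, "
      f"worst case ~{worst_case_ios:.0f}, "
      f"client memory ~{client_memory:.0f} blocks")
```

Even at a billion blocks, the client touches only tens of blocks per logical access in expectation while holding a few tens of thousands of blocks locally, which is the sublinear-client-memory regime the scheme targets.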
In comparison with earlier work, the authors argue that most ORAM designs are tailored to the internal‑memory model and suffer prohibitive I/O costs when naively ported to external storage. Their approach, by contrast, treats the external memory model as a first‑class citizen, leveraging parallelism and oblivious primitives to keep both communication and computation overhead low. The paper concludes with a discussion of future directions, including support for multiple concurrent clients, dynamic updates (insertions and deletions), and integration with hardware‑based trusted execution environments (e.g., Intel SGX) to further reduce the trust assumptions on the storage provider.