An optimized conflict-free replicated set

Eventual consistency of replicated data supports concurrent updates, reduces latency and improves fault tolerance, but forgoes strong consistency. Accordingly, several cloud computing platforms implement eventually-consistent data types. The set is a widespread and useful abstraction, and many replicated set designs have been proposed. We present a reasoning abstraction, permutation equivalence, that systematizes the characterization of the expected concurrency semantics of concurrent types. Under this framework we present one of the existing conflict-free replicated data types, Observed-Remove Set. Furthermore, in order to decrease the size of metadata, we propose a new optimization to avoid tombstones. This approach can be transposed to other data types, such as maps, graphs or sequences.


💡 Research Summary

The paper addresses the challenge of reducing metadata overhead in conflict‑free replicated data types (CRDTs) while preserving the strong convergence guarantees that make eventual consistency attractive for distributed systems. It begins by reviewing the motivation for eventually‑consistent replicated data structures: they enable low‑latency local updates, improve fault tolerance, and avoid the coordination bottlenecks of strong consistency. Among the many CRDTs, the Observed‑Remove Set (OR‑Set) is a widely‑used design for replicated sets because it resolves concurrent insert‑and‑delete conflicts by attaching a unique identifier (a “tag”) to each insertion and by recording deletions in a separate “tombstone” set. Convergence is achieved simply by taking the union of the insertion sets and the union of the tombstone sets across replicas.
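To make the classic design concrete, here is a minimal state-based sketch of the tag-and-tombstone OR-Set described above. The class and method names are illustrative, not the paper's notation, and unique tags are drawn from `uuid` for simplicity.

```python
# Minimal sketch of a classic state-based OR-Set (tags + tombstones).
# Illustrative only; names and tag generation are our own choices.
import uuid

class ORSet:
    def __init__(self):
        self.adds = set()        # (element, tag) pairs observed as inserted
        self.tombstones = set()  # (element, tag) pairs observed as deleted

    def add(self, element):
        # Every insertion carries a globally unique tag.
        self.adds.add((element, uuid.uuid4().hex))

    def remove(self, element):
        # "Observed-remove": tombstone only the tags this replica has seen.
        self.tombstones |= {(e, t) for (e, t) in self.adds if e == element}

    def contains(self, element):
        # An element is present if some tag of it is not tombstoned.
        return any(e == element and (e, t) not in self.tombstones
                   for (e, t) in self.adds)

    def merge(self, other):
        # Convergence by union of insertions and union of tombstones.
        self.adds |= other.adds
        self.tombstones |= other.tombstones
```

With this design a concurrent re-insertion wins over a remove, because the re-insertion carries a fresh tag that no tombstone covers; but note that `tombstones` only ever grows, which is exactly the overhead the paper targets.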

The authors observe that tombstones accumulate indefinitely. In long‑running systems with frequent deletions, the size of the tombstone set can dominate the overall metadata, leading to higher storage costs and larger synchronization messages. To reason about the semantics of concurrent operations more systematically, they introduce a formal abstraction called permutation equivalence. This notion states that any two histories that can be transformed into each other by reordering independent operations are considered equivalent, and therefore must produce the same final state. Using permutation equivalence, the paper proves that the essential information needed for convergence is the presence or absence of each tag, not the history of when a particular tag was removed.
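The reordering idea can be illustrated with a toy model, assuming a state that is just a set of tags and two primitive operations, `add` and `remove` (this simplified model is ours, not the paper's formalism): swapping two independent operations in a history leaves the final state unchanged.

```python
# Toy illustration of permutation equivalence on a tag set.
# Independent operations may be reordered without changing the outcome.

def apply_op(state, op):
    kind, payload = op
    if kind == "add":
        return state | {payload}
    else:  # "remove" deletes a set of previously observed tags
        return state - payload

def run(history):
    state = frozenset()
    for op in history:
        state = apply_op(state, op)
    return state

# Tags "a1" and "b1" are inserted at different replicas, so the two
# adds are independent; the remove depends only on having observed "a1".
add_a = ("add", "a1")
add_b = ("add", "b1")
rem_a = ("remove", frozenset({"a1"}))

h1 = [add_a, add_b, rem_a]
h2 = [add_b, add_a, rem_a]  # legal permutation: the adds commute
```

Both histories end in the state `{"b1"}`, so under permutation equivalence they denote the same behavior; the paper uses this style of reasoning to argue that only the presence or absence of each tag matters.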

Building on this insight, the authors propose an optimization that eliminates the need for explicit tombstones. Each replica maintains a version vector that records the highest tag identifier it has observed for every replica. When a delete operation occurs, the replica immediately removes the corresponding tag from its local element set; it does not add the tag to a tombstone set. During anti‑entropy (state exchange) between replicas, the version vectors are compared: a replica accepts an incoming tag only if its own version vector shows it has not yet observed that tag, while a tag that is covered by the vector but absent from the sender's element set must have been deleted, so it is discarded rather than re‑introduced. Consequently, deleted tags are never stored after the delete operation, and the metadata size grows only with the number of currently present elements, not with the total number of historical inserts and deletes.
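The version-vector merge described above can be sketched as follows, assuming tags of the form `(replica_id, counter)`; the structure is a simplified rendering of the idea, not the paper's exact pseudocode.

```python
# Sketch of a tombstone-free OR-Set: deletes drop tags outright, and the
# merge uses version vectors to tell "new insert" apart from "already
# deleted". Names and layout are illustrative.

class OptORSet:
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counter = 0
        self.elements = set()  # (element, origin_replica, counter) triples
        self.vv = {}           # version vector: replica_id -> highest counter

    def _seen(self, replica, counter):
        return counter <= self.vv.get(replica, 0)

    def add(self, element):
        self.counter += 1
        self.vv[self.replica_id] = self.counter
        self.elements.add((element, self.replica_id, self.counter))

    def remove(self, element):
        # No tombstone: the observed tags are simply discarded.
        self.elements = {t for t in self.elements if t[0] != element}

    def contains(self, element):
        return any(t[0] == element for t in self.elements)

    def merge(self, other):
        # Keep a tag if both replicas hold it, or if exactly one holds it
        # and the other has not yet observed it (a genuinely new insert).
        # A tag one side holds but the other has observed and dropped
        # was deleted, so it is not re-introduced.
        keep = self.elements & other.elements
        keep |= {(e, r, c) for (e, r, c) in self.elements
                 if not other._seen(r, c)}
        keep |= {(e, r, c) for (e, r, c) in other.elements
                 if not self._seen(r, c)}
        self.elements = keep
        for r, c in other.vv.items():
            self.vv[r] = max(self.vv.get(r, 0), c)
```

Here the per-replica state is bounded by the live elements plus one counter per replica, matching the summary's claim that metadata no longer grows with the history of deletes.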

The paper extends the discussion to other CRDTs such as maps, graphs, and sequences. In a map, each key‑value pair can be treated as an element with its own tag, so the same tombstone‑free technique applies to key deletions. For graphs and sequences, where edges or list positions are the mutable elements, the same tag‑based approach can be used to prune obsolete metadata without breaking convergence.

Experimental evaluation compares the classic OR‑Set with the tombstone‑free variant across a range of workloads (varying insert/delete ratios, number of replicas, and operation rates). The results show a median reduction of more than 60 % in stored metadata and a comparable reduction in the amount of data transmitted during synchronization. Latency of individual operations and the time needed for replicas to converge remain essentially unchanged, confirming that the optimization does not introduce additional coordination overhead. The benefit is most pronounced in scenarios with heavy delete activity, where classic OR‑Set metadata would otherwise grow without bound.

In conclusion, the authors provide both a theoretical framework (permutation equivalence) for understanding the concurrency semantics of replicated sets and a practical engineering solution that removes tombstones while preserving the strong eventual consistency guarantees of CRDTs. Their approach is generic enough to be adapted to other replicated data structures, offering a path toward more storage‑efficient, low‑latency distributed applications in cloud environments.