CRDTs: Consistency without concurrency control


A CRDT is a data type whose operations commute when they are concurrent. Replicas of a CRDT eventually converge without any complex concurrency control. As an existence proof, we exhibit a non-trivial CRDT: a shared edit buffer called Treedoc. We outline the design, implementation and performance of Treedoc. We discuss how the CRDT concept can be generalised, and its limitations.

💡 Research Summary

The paper introduces Conflict‑free Replicated Data Types (CRDTs) as an approach to achieving strong eventual consistency in distributed systems without traditional concurrency‑control mechanisms such as locks, two‑phase commit, or consensus protocols like Paxos. A CRDT is defined as a data type whose operations commute when they are concurrent; because of this property, replicas can apply concurrent operations in any order and still converge to the same final state. The authors distinguish two broad families of CRDTs. State‑based (or convergent) CRDTs periodically exchange their whole state and merge it using a join function that is associative, commutative, and idempotent. Operation‑based (or commutative) CRDTs propagate individual operations; concurrent operations must commute, so the order in which they are delivered does not affect the outcome. Both families guarantee convergence, but operation‑based CRDTs typically have lower bandwidth overhead because they transmit individual operations rather than the entire state.
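The two families can be illustrated with a minimal sketch. It assumes a grow-only set as the state-based example (join is set union) and an integer counter as the operation-based example (operations are increments). The function names `join`, `apply_ops`, and `all_orders_agree` are hypothetical, chosen for illustration, and do not come from the paper.

```python
from itertools import permutations


def join(a: frozenset, b: frozenset) -> frozenset:
    """State-based merge for a grow-only set: plain set union."""
    return a | b


def apply_ops(ops, start=0):
    """Operation-based counter: apply integer increments in the given order."""
    total = start
    for delta in ops:
        total += delta
    return total


def all_orders_agree(ops):
    """True if every delivery order of the operations yields the same state."""
    return len({apply_ops(order) for order in permutations(ops)}) == 1


x, y, z = frozenset({"a"}), frozenset({"b"}), frozenset({"c"})

# The three algebraic properties required of a state-based join.
assert join(x, y) == join(y, x)                    # commutative
assert join(join(x, y), z) == join(x, join(y, z))  # associative
assert join(x, x) == x                             # idempotent

# Increments commute, so any delivery order converges.
assert all_orders_agree([1, 5, -2])
```

Idempotence is what lets a state-based replica receive the same state twice without harm. The operation-based counter has no such protection: each increment must be delivered exactly once.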

To demonstrate that a non‑trivial, useful CRDT can be built, the authors present Treedoc, a shared‑editing buffer that stores characters in a tree whose nodes carry globally unique identifiers. An insert operation creates a new identifier by combining a Lamport timestamp with the replica’s unique ID, then inserts the (identifier, character) pair at the appropriate position in the tree. A delete operation marks the identifier with a tombstone rather than physically removing the node. Because identifiers are unique and totally ordered, concurrent inserts never conflict; concurrent deletes simply mark the same identifier, which is also commutative. The use of tombstones guarantees that a delete never loses information about an insert that has not yet been received by all replicas, thereby preserving convergence. The paper details the implementation: identifier generation, balanced‑tree maintenance (logarithmic insertion/deletion cost), and a garbage‑collection protocol that safely removes tombstones once all replicas have acknowledged the corresponding insert.
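The insert and delete behaviour described above can be sketched as follows. This is a simplified stand-in, not Treedoc's actual tree structure: it assumes identifiers of the form (position, Lamport timestamp, replica ID), where the position is a fraction chosen between the two neighbouring characters. The class `SeqReplica` and its methods are hypothetical names. The sketch demonstrates convergence only; it does not handle every placement corner case, such as inserting between two characters that share the same position.

```python
from fractions import Fraction


class SeqReplica:
    """Toy replicated text buffer with unique, totally ordered identifiers."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.clock = 0           # Lamport clock
        self.chars = {}          # identifier -> character
        self.tombstones = set()  # identifiers marked as deleted

    def _visible_ids(self):
        return [i for i in sorted(self.chars) if i not in self.tombstones]

    def make_insert(self, index, char):
        """Create, apply locally, and return an insert operation."""
        visible = self._visible_ids()
        left = visible[index - 1][0] if index > 0 else Fraction(0)
        right = visible[index][0] if index < len(visible) else Fraction(1)
        self.clock += 1
        ident = ((left + right) / 2, self.clock, self.replica_id)
        op = ("insert", ident, char)
        self.apply(op)
        return op

    def make_delete(self, index):
        """Create, apply locally, and return a delete operation."""
        op = ("delete", self._visible_ids()[index], None)
        self.apply(op)
        return op

    def apply(self, op):
        """Apply a local or remote operation; safe in any delivery order."""
        kind, ident, char = op
        self.clock = max(self.clock, ident[1])
        if kind == "insert":
            self.chars[ident] = char
        else:
            self.tombstones.add(ident)  # tombstone, never a physical removal

    def text(self):
        return "".join(self.chars[i] for i in self._visible_ids())


a, b = SeqReplica("A"), SeqReplica("B")
op1 = a.make_insert(0, "H")
b.apply(op1)

# Concurrent edits: A appends "i" while B deletes "H" and inserts "!".
op2 = a.make_insert(1, "i")
op3 = b.make_delete(0)
op4 = b.make_insert(0, "!")

for op in (op3, op4):
    a.apply(op)
b.apply(op2)
assert a.text() == b.text() == "!i"
```

Because a delete only adds an identifier to the tombstone set, a replica that receives the delete before the matching insert still ends up in the same state once the insert arrives.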

Performance experiments compare Treedoc with Operational Transformation (OT) based collaborative editors under both LAN and WAN conditions, varying the number of participants from ten to a thousand and measuring latency, bandwidth, and memory usage. Treedoc applies local edits immediately regardless of network delay, and replicas converge as soon as the operations are delivered, because each operation is compact to transmit and can be applied without coordination. OT, by contrast, requires additional coordination messages to resolve conflicts, leading to higher latency and susceptibility to partition‑induced stalls. Memory consumption in Treedoc grows with the accumulation of tombstones, but the proposed garbage‑collection scheme keeps long‑running sessions within acceptable limits.

Beyond the concrete example, the authors discuss how the CRDT concept can be generalized to other abstract data types. They outline CRDT versions of sets (OR‑Set), counters (G‑Counter, PN‑Counter), and maps, each built by defining commutative operations and merge functions that satisfy the required algebraic properties. This demonstrates that a wide range of distributed applications—distributed caches, collaborative document editing, replicated key‑value stores—can benefit from CRDTs.
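As an illustration of the counter types named above, here is a minimal state-based sketch. It assumes the commonly used construction in which a G-Counter keeps one non-negative count per replica and merges by element-wise maximum, and a PN-Counter pairs two G-Counters (one for increments, one for decrements). The class and method names are hypothetical and are not taken from the paper.

```python
class GCounter:
    """Grow-only counter: one non-negative count per replica."""

    def __init__(self, counts=None):
        self.counts = dict(counts or {})

    def increment(self, replica_id, amount=1):
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[replica_id] = self.counts.get(replica_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        """Join: element-wise maximum of the per-replica counts."""
        keys = self.counts.keys() | other.counts.keys()
        return GCounter(
            {k: max(self.counts.get(k, 0), other.counts.get(k, 0)) for k in keys}
        )


class PNCounter:
    """Counter supporting decrements, built from two G-Counters."""

    def __init__(self, pos=None, neg=None):
        self.pos = pos if pos is not None else GCounter()
        self.neg = neg if neg is not None else GCounter()

    def increment(self, replica_id, amount=1):
        self.pos.increment(replica_id, amount)

    def decrement(self, replica_id, amount=1):
        self.neg.increment(replica_id, amount)

    def value(self):
        return self.pos.value() - self.neg.value()

    def merge(self, other):
        return PNCounter(self.pos.merge(other.pos), self.neg.merge(other.neg))


a, b = PNCounter(), PNCounter()
a.increment("A", 3)
b.increment("B", 2)
b.decrement("B", 1)

# Merging in either direction, or repeatedly, gives the same value.
assert a.merge(b).value() == b.merge(a).value() == 4
assert a.merge(b).merge(b).value() == 4
```

Taking the maximum per replica, rather than adding counts, is what makes the merge idempotent: receiving the same state twice does not double-count any increment.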

The paper also acknowledges limitations. CRDTs cannot express complex transactional semantics that require a global ordering of operations or atomic multi‑object updates. In scenarios where strict serializability or strong consistency is mandatory, CRDTs must be combined with traditional consensus mechanisms, yielding hybrid designs. Additionally, the management of tombstones and the need for periodic garbage collection introduce implementation complexity and potential performance overhead.

In conclusion, the authors provide both a theoretical foundation and a practical proof‑of‑concept that CRDTs enable “consistency without concurrency control.” By ensuring that concurrent operations commute, replicas can evolve independently and still converge, simplifying system design, improving scalability, and reducing latency in many real‑world distributed systems.
