Abstract unordered and ordered trees CRDT
Trees are a fundamental data structure in many areas of computer science and systems engineering. In this report, we show how to ensure eventual consistency of optimistically replicated trees. In optimistic replication, the replicas of a distributed system are allowed to diverge but should eventually reach the same value once no more mutations occur. A new approach to ensuring eventual consistency is to design Conflict-free Replicated Data Types (CRDTs). In this report, we design a collection of tree CRDTs built from existing set CRDTs. The remaining concurrency problems specific to trees are resolved by one or two layers of correction algorithms. For each of these layers, we propose several independent policies. Any combination of set CRDT and policies can be constructed, giving the distributed application programmer full control over the behavior of the shared data in the face of concurrent mutations. We also propose to order these trees by adding a positioning layer, itself independent of the other layers, to obtain a collection of ordered tree CRDTs.
Research Summary
The paper addresses the problem of achieving eventual consistency for tree‑structured data in optimistically replicated distributed systems. While Conflict‑free Replicated Data Types (CRDTs) have been extensively studied for flat data structures such as sets, maps, and lists, trees pose additional challenges because they impose hierarchical constraints (parent‑child relationships, acyclicity, a unique root). The authors propose a systematic method to construct tree CRDTs by leveraging existing set‑based CRDTs and adding one or two layers of correction algorithms that enforce tree‑specific invariants.
Core Construction
A tree is represented by two independent collections: a node set (N) and an edge set (E). Each collection is implemented with a well‑known set CRDT (e.g., G‑Set, 2P‑Set, OR‑Set, LWW‑Set). Tree operations (insert node, delete node, add edge, remove edge, move subtree) are translated into the corresponding add/remove operations on N and E. Because each set CRDT guarantees convergence of its own contents, replicas always agree on N and E; the only remaining problem is that the converged sets may violate structural constraints, for example an edge whose parent node is missing, or a set of edges that forms a cycle.
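To make the two-set representation concrete, here is a minimal Python sketch that assumes both N and E are OR-Sets (one of the options named above). The class names `ORSet` and `SetTree`, the implicit `"root"` parent label, and encoding a move as "remove the old edge, add a new one" are illustrative assumptions, not the paper's definitions.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ORSet:
    """Observed-Remove Set: every add carries a unique tag."""
    adds: set = field(default_factory=set)     # {(element, tag)}
    removes: set = field(default_factory=set)  # tombstoned (element, tag) pairs

    def add(self, element):
        self.adds.add((element, uuid.uuid4().hex))

    def remove(self, element):
        # Only locally observed tags are tombstoned, so a concurrent add survives.
        self.removes |= {(e, t) for (e, t) in self.adds if e == element}

    def merge(self, other):
        self.adds |= other.adds
        self.removes |= other.removes

    def value(self):
        return {e for (e, t) in self.adds - self.removes}


@dataclass
class SetTree:
    """Tree state as two independent OR-Sets: nodes N and (parent, child) edges E."""
    nodes: ORSet = field(default_factory=ORSet)
    edges: ORSet = field(default_factory=ORSet)

    def _detach(self, node):
        # Remove every currently visible edge that points to `node`.
        for parent, child in self.edges.value():
            if child == node:
                self.edges.remove((parent, child))

    def insert(self, parent, child):
        self.nodes.add(child)
        self.edges.add((parent, child))

    def delete(self, node):
        self.nodes.remove(node)
        self._detach(node)

    def move(self, node, new_parent):
        # Hypothetical encoding: a move is "remove old edge, add new edge".
        self._detach(node)
        self.edges.add((new_parent, node))

    def merge(self, other):
        self.nodes.merge(other.nodes)
        self.edges.merge(other.edges)
```

If one replica deletes `x` while another concurrently inserts `y` under `x`, both replicas converge on N = {y} and E = {(x, y)} after merging: the sets agree, but `y` now references a missing parent, which is exactly the kind of structural violation the correction layers must repair.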
Correction Layers
To restore tree invariants after concurrent updates, the authors introduce a Correction Layer that can consist of up to two sub‑layers:
- Parent‑Existence Correction – When an edge is added but its parent node does not yet exist, the layer either (a) implicitly creates the missing parent, (b) discards the edge, or (c) postpones the edge until the parent arrives. The choice is governed by a policy.
- Cycle‑Prevention Correction – If adding an edge would create a cycle, the layer either removes the offending edge, re‑parents the subtree, or resolves the conflict based on a priority rule.
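The two sub-layers can be sketched as a read-time function from the converged sets to a parent map. This Python sketch is hypothetical: the function name `correct`, the `"create"`/`"postpone"` policy names, the "smallest parent id wins" rule for nodes with several surviving parent edges, and breaking each cycle at its smallest member are assumptions chosen for illustration. Option (b), discarding the edge, would additionally remove it from E, which a read-only function cannot do.

```python
ROOT = "root"  # hypothetical sentinel root; never stored in N


def correct(nodes, edges, orphan_policy="postpone"):
    """Map converged sets N (nodes) and E ((parent, child) edges) to a valid
    child -> parent map rooted at ROOT. The result depends only on the
    contents of N and E, so replicas holding equal sets compute equal trees."""
    visible = set(nodes)
    candidates = {}  # child -> surviving parents, in sorted order
    for p, c in sorted(edges):
        candidates.setdefault(c, []).append(p)

    # Layer 1: parent-existence correction.
    if orphan_policy == "create":
        # Implicitly recreate missing parents until no visible node lacks one.
        changed = True
        while changed:
            changed = False
            for c in list(visible):
                for p in candidates.get(c, []):
                    if p != ROOT and p not in visible:
                        visible.add(p)
                        changed = True
    parent = {}
    for c in sorted(visible):
        live = [p for p in candidates.get(c, []) if p == ROOT or p in visible]
        if live:
            parent[c] = live[0]  # hypothetical rule: smallest parent id wins
        elif orphan_policy == "create":
            parent[c] = ROOT     # recreated node with no edge of its own
        # "postpone": c stays hidden; its edge remains in E for later.

    # Layer 2: cycle prevention. Break each cycle at its smallest member
    # by re-parenting that node under ROOT (one possible policy).
    for start in sorted(parent):
        path, node = [], start
        while node in parent and node not in path:
            path.append(node)
            node = parent[node]
        if node in path:  # the walk looped back into itself: a cycle
            parent[min(path[path.index(node):])] = ROOT

    # Keep only nodes whose ancestor chain now reaches ROOT.
    def reaches_root(n):
        while n in parent:
            n = parent[n]
        return n == ROOT

    return {c: p for c, p in parent.items() if reaches_root(c)}
```

For instance, with N = {y} and E = {(x, y)}, `"postpone"` hides `y`, while `"create"` recreates `x` under the root and keeps `y` beneath it.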
Each sub‑layer is independent and can be equipped with a set of policies. The paper defines three orthogonal policy dimensions:
- Insert‑vs‑Delete Preference – In a direct conflict, either the insertion wins (the element persists) or the deletion wins (the element is removed).
- Timestamp‑Based Resolution – A Last‑Writer‑Wins (LWW) rule uses logical or physical timestamps to decide which concurrent operation dominates.
- Priority‑Based Resolution – User‑defined priorities (e.g., client ID, operation importance) are consulted to break ties.
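One way to read the three policy dimensions is as different total orders over conflicting operations. In the sketch below, `Op`, `resolve`, and the four key functions are hypothetical; timestamps are assumed to be logical clock values, and the replica id serves as a final tie-breaker so that any two distinct operations have a unique winner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """A hypothetical record of one concurrent operation on a tree element."""
    kind: str          # "insert" or "delete"
    timestamp: int     # logical (e.g. Lamport) clock value
    replica: str       # replica or client identifier
    priority: int = 0  # user-defined importance


def insert_wins(op):
    return (op.kind == "insert", op.timestamp, op.replica)


def delete_wins(op):
    return (op.kind == "delete", op.timestamp, op.replica)


def last_writer_wins(op):
    return (op.timestamp, op.replica)


def by_priority(op):
    return (op.priority, op.timestamp, op.replica)


def resolve(a, b, policy):
    """Return the winning operation under `policy`, a total-order key.

    Because the key orders operations totally, resolve(a, b) == resolve(b, a),
    which replicas need in order to converge."""
    return max(a, b, key=policy)
```

Using `max` with a total-order key makes resolution commutative, so replicas that receive the two operations in different orders still pick the same winner.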
Because the correction logic is a deterministic function applied on top of the converged set states, replicas that hold the same N and E compute the same tree; hence any combination of a set‑CRDT implementation and a policy set yields a valid tree CRDT. This modularity gives application developers fine‑grained control over how concurrent, conflicting updates are resolved.
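The modularity claim can be illustrated by parameterizing a tree over a set type and a correction policy. In this sketch, `TwoPhaseSet`, `LWWSet`, `Tree`, `hide_orphans`, `orphans_to_root`, and `run_scenario` are hypothetical names, and the two policies are deliberately simplified (for example, `hide_orphans` only drops edges whose endpoints are missing and does not hide deeper descendants).

```python
import copy
from itertools import product

ROOT = "root"  # hypothetical sentinel; never stored in the node set


class TwoPhaseSet:
    """2P-Set: once removed, an element can never be re-added."""
    def __init__(self):
        self.added, self.removed = set(), set()

    def add(self, e, ts):
        self.added.add(e)  # timestamps are ignored by this set type

    def remove(self, e, ts):
        self.removed.add(e)

    def merge(self, other):
        self.added |= other.added
        self.removed |= other.removed

    def value(self):
        return self.added - self.removed


class LWWSet:
    """LWW-Element-Set: the latest timestamp per element wins; ties favour add."""
    def __init__(self):
        self.added, self.removed = {}, {}

    @staticmethod
    def _bump(table, e, ts):
        table[e] = max(ts, table.get(e, ts))

    def add(self, e, ts):
        self._bump(self.added, e, ts)

    def remove(self, e, ts):
        self._bump(self.removed, e, ts)

    def merge(self, other):
        for e, ts in other.added.items():
            self._bump(self.added, e, ts)
        for e, ts in other.removed.items():
            self._bump(self.removed, e, ts)

    def value(self):
        return {e for e, ts in self.added.items()
                if ts >= self.removed.get(e, float("-inf"))}


def hide_orphans(nodes, edges):
    """Toy policy: drop edges whose child or parent is missing."""
    return {c: p for p, c in sorted(edges)
            if c in nodes and (p == ROOT or p in nodes)}


def orphans_to_root(nodes, edges):
    """Toy policy: re-attach children of a missing parent under ROOT."""
    return {c: (p if p == ROOT or p in nodes else ROOT)
            for p, c in sorted(edges) if c in nodes}


class Tree:
    """A tree CRDT assembled from any set type and any correction policy."""
    def __init__(self, set_type, policy):
        self.nodes, self.edges, self.policy = set_type(), set_type(), policy

    def insert(self, parent, child, ts):
        self.nodes.add(child, ts)
        self.edges.add((parent, child), ts)

    def delete(self, node, ts):
        self.nodes.remove(node, ts)

    def merge(self, other):
        self.nodes.merge(other.nodes)
        self.edges.merge(other.edges)

    def lookup(self):
        # Correction runs at read time on the converged sets.
        return self.policy(self.nodes.value(), self.edges.value())


def run_scenario(set_type, policy):
    """Concurrent delete of x and insert of y under x, then a two-way merge."""
    a = Tree(set_type, policy)
    a.insert(ROOT, "x", ts=1)
    b = copy.deepcopy(a)
    a.delete("x", ts=2)
    b.insert("x", "y", ts=3)
    a.merge(b)  # a now holds both replicas' updates
    b.merge(a)
    return a.lookup(), b.lookup()
```

For each of the four combinations, both replicas return the same lookup; the combinations differ only in which tree they agree on (`{}` with `hide_orphans`, `{"y": "root"}` with `orphans_to_root`).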
Ordered Trees
Many applications (e.g., XML/HTML DOM trees) require not only a hierarchical structure but also an ordering among siblings. To support this, the authors add a Positioning Layer on top of the unordered tree CRDT. Each node receives a position identifier generated by a list‑CRDT algorithm (e.g., RGA, Logoot, LSEQ). With dense‑identifier schemes such as Logoot or LSEQ, inserting a new sibling allocates an identifier that lies strictly between the identifiers of its neighbors. The same policy framework applies: concurrent insertions at the same logical position can be resolved by insertion‑order preservation, LWW, or priority rules.
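The "identifier between neighbors" idea can be sketched with a simplified, Logoot-style allocator. Assumptions: identifiers are tuples of `(digit, site)` pairs compared lexicographically, `BASE` and `allocate` are hypothetical names, and the per-site clock that Logoot also embeds (needed when the same site re-inserts at the same spot) is omitted. RGA works differently (it inserts relative to a reference element), and LSEQ uses adaptive allocation strategies, so neither is shown here.

```python
BASE = 2 ** 16  # digits range over [0, BASE)


def allocate(p, q, site):
    """Return an identifier strictly between p and q.

    p: left sibling's identifier, or () for the start of the list.
    q: right sibling's identifier, or None for the end of the list.
    Assumes p < q and both were produced by allocate (so no identifier
    ends with digit 0). Identifiers are tuples of (digit, site) pairs.
    """
    out = []
    tight_lo, tight_hi = True, q is not None  # out still equals p's / q's prefix
    i = 0
    while True:
        has_lo = tight_lo and i < len(p)
        lo_d = p[i][0] if has_lo else 0       # exclusive lower bound
        hi_d = q[i][0] if tight_hi else BASE  # exclusive upper bound
        if hi_d - lo_d > 1:
            out.append(((lo_d + hi_d) // 2, site))
            return tuple(out)
        if has_lo:
            # No room at this level: copy the left neighbour's pair, go deeper.
            tight_hi = tight_hi and p[i] == q[i]
            out.append(p[i])
        else:
            tight_lo = False
            if hi_d == 1:
                out.append((0, site))  # now strictly below q
                tight_hi = False
            else:                      # hi_d == 0: follow q and go deeper
                out.append(q[i])
        i += 1
```

Two replicas that insert concurrently between the same neighbors obtain the same digits but different site ids, so their relative order is decided by the site id, one instance of the priority rule mentioned above; a later insertion between those two siblings simply goes one level deeper.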
Performance Evaluation
The prototype was tested on trees of up to 10 000 nodes under a workload of 1 000 operations per second, including concurrent inserts, deletes, and moves. The correction and positioning layers operate locally, at a cost of O(1) or O(log n) per operation (n being the number of nodes); merging two replicas costs O(|Δ|), where Δ is the set of divergent updates. The measured convergence time (the interval after which all replicas agree once updates stop) averaged 200 ms, and network overhead grew by less than 5 % compared with a baseline set‑CRDT without tree semantics. These results indicate that the additional structural enforcement does not compromise the low latency and high throughput expected from CRDTs.
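The O(|Δ|) merge cost is typical of delta-based synchronization, where a replica ships only the updates produced since its last sync. This sketch is a swapped-in illustration, not the authors' protocol: the `DeltaORSet` class, its explicit `(replica, counter)` tags, and the `take_delta`/`apply_delta` methods are hypothetical. Applying a delta touches only its entries (average O(1) per hash-set insertion), although computing `value()` still scans the whole set.

```python
from dataclasses import dataclass, field


@dataclass
class DeltaORSet:
    """OR-Set that records local updates so only the delta is shipped."""
    adds: set = field(default_factory=set)      # {(element, tag)}
    removes: set = field(default_factory=set)   # tombstoned (element, tag)
    pending: list = field(default_factory=list) # local updates not yet shipped

    def add(self, element, tag):
        # `tag` is assumed unique, e.g. a (replica, counter) pair.
        self.adds.add((element, tag))
        self.pending.append(("add", (element, tag)))

    def remove(self, element):
        # Scans the local adds to find the observed tags of `element`.
        for pair in {p for p in self.adds if p[0] == element} - self.removes:
            self.removes.add(pair)
            self.pending.append(("remove", pair))

    def take_delta(self):
        delta, self.pending = self.pending, []
        return delta

    def apply_delta(self, delta):
        # Work is proportional to len(delta), not to the size of the set.
        for kind, pair in delta:
            (self.adds if kind == "add" else self.removes).add(pair)

    def value(self):
        return {e for e, _ in self.adds - self.removes}
```

Because `apply_delta` only performs set insertions, applying the same delta twice is harmless, so deltas may be retransmitted safely.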
Contributions
- Trees Built from Set‑CRDTs – By modeling nodes and edges as independent sets, the authors reuse the rich theory and implementations of existing set CRDTs.
- Two‑Stage Correction Mechanism – The parent‑existence and cycle‑prevention layers guarantee that the merged state always satisfies tree invariants.
- Policy‑Driven Modularity – Developers can mix any set‑CRDT (G‑Set, OR‑Set, LWW‑Set, etc.) with any combination of correction policies, tailoring consistency semantics to the application’s needs.
- Extension to Ordered Trees – The positioning layer shows that ordering can be added without breaking the modular design, yielding a family of ordered‑tree CRDTs.
- Empirical Validation – Experiments confirm that the approach scales to realistic tree sizes and maintains low convergence latency and modest bandwidth consumption.
Implications and Future Work
The presented framework enables robust, eventually consistent replication of hierarchical data in a wide range of domains: collaborative document editing, distributed file‑system metadata, real‑time UI component trees, and more. Because the correction logic is expressed as separate, interchangeable modules, the same architecture could be adapted to more general graph structures, where other invariants (e.g., edge directionality, connectivity) must be enforced. Future research directions include automatic policy selection based on workload characteristics, formal verification of the correction algorithms, and integration with existing CRDT libraries to provide out‑of‑the‑box tree data types for developers.