Fault Tolerant Consensus Agreement Algorithm
Recently, a new, simple fault-tolerant mechanism was designed for solving the commit-consensus problem. It is based on replicated validation of messages sent between transaction participants and a special dispatcher-validator manager node. This paper presents correctness and safety proofs and a performance analysis of this algorithm.
💡 Research Summary
The paper introduces a novel fault‑tolerant consensus algorithm designed to solve the commit‑consensus problem in distributed transaction processing. The core idea is to insert a special “dispatcher‑validator manager” node that coordinates the validation of messages exchanged between transaction participants and the transaction manager. The system model assumes an asynchronous network where messages may be delayed, reordered, duplicated, or lost but never corrupted, and nodes may stop and later restart.
The protocol proceeds in several phases. First, the transaction manager broadcasts a Begin message to all participants. Upon receipt, each participant locally prepares the transaction and, after successful preparation, sends a Ready message to the dispatcher. The dispatcher validates each Ready locally and forwards it to a set of validator nodes. A Ready is considered Validated once a majority of validators acknowledge it. When all Ready messages of a transaction have been validated, the dispatcher issues a Commit to all participants; otherwise it issues a Rollback. Validators also send back “Validated” acknowledgments to the dispatcher after their local checks.
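The phases above can be sketched in a few lines of Python. This is an illustrative model only, not the paper's implementation: the class names (`Participant`, `Dispatcher`, `Validator`) and the always-successful local checks are assumptions made for the sketch.

```python
# Minimal sketch of the commit phases: participants prepare and send Ready,
# the dispatcher forwards each Ready to validators, and a majority of
# validator acknowledgments marks it Validated.

class Validator:
    def acknowledge(self, msg):
        # Local validation check; assumed always to pass in this sketch.
        return True

class Participant:
    def __init__(self, name):
        self.name = name

    def prepare(self):
        # Local preparation after the Begin broadcast; on success the
        # participant emits a Ready message.
        return ("Ready", self.name)

class Dispatcher:
    def __init__(self, validators):
        self.validators = validators

    def validate(self, ready_msg):
        # A Ready is Validated once a majority of validators acknowledge it.
        acks = sum(1 for v in self.validators if v.acknowledge(ready_msg))
        return acks > len(self.validators) // 2

    def decide(self, ready_msgs):
        # Commit only when every Ready of the transaction is Validated.
        if all(self.validate(m) for m in ready_msgs):
            return "Commit"
        return "Rollback"

validators = [Validator() for _ in range(5)]
dispatcher = Dispatcher(validators)
participants = [Participant(f"p{i}") for i in range(3)]
# Transaction manager broadcasts Begin; each participant prepares and
# sends its Ready to the dispatcher.
readies = [p.prepare() for p in participants]
print(dispatcher.decide(readies))  # -> Commit
```

With all five validators acknowledging, every Ready clears the majority threshold of three, so the dispatcher issues Commit; a failing local check in `acknowledge` would instead drive the decision to Rollback.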
Leader (dispatcher) election is performed using a two‑step process reminiscent of Raft. Initially, every validator generates a random number and broadcasts a Proposal containing that number. The node with the highest number becomes the coordinator for that election round. The coordinator then runs a roulette‑wheel selection using the random numbers received from other nodes; the winner of this selection becomes the new dispatcher and announces its status to the cluster. The paper proves that at most one coordinator can obtain a majority in a given round, guaranteeing a unique dispatcher.
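The two-step election can be sketched as follows. The function name `elect` and the weighting of the roulette wheel directly by the proposal numbers are assumptions of this sketch; the paper specifies only that proposals are random numbers and that the coordinator runs a roulette-wheel selection over them.

```python
# Sketch of the two-step dispatcher election: the highest proposal picks
# the round's coordinator, then a roulette-wheel (fitness-proportionate)
# draw over the proposals picks the new dispatcher.
import random

def elect(proposals):
    """proposals: dict mapping node id -> its random proposal number."""
    # Step 1: the node with the highest number becomes coordinator.
    coordinator = max(proposals, key=proposals.get)
    # Step 2: roulette-wheel selection weighted by the proposal numbers.
    total = sum(proposals.values())
    pick = random.uniform(0, total)
    cumulative = 0.0
    for node, weight in proposals.items():
        cumulative += weight
        if pick <= cumulative:
            return coordinator, node  # node becomes the new dispatcher
    return coordinator, node

random.seed(0)  # fixed seed so the sketch is reproducible
proposals = {"v1": 17, "v2": 42, "v3": 8}
coordinator, new_dispatcher = elect(proposals)
print(coordinator)  # "v2" holds the highest proposal
```

The uniqueness argument from the paper corresponds to the fact that only one coordinator can gather a majority of proposals in a round, so only one roulette-wheel draw is authoritative.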
Fault‑recovery scenarios are explicitly addressed. If a dispatcher fails after validating some Ready messages but before completing the transaction, the algorithm requires that all participants resend any pending Ready messages to the newly elected dispatcher. This ensures that no transaction is left in an indeterminate state. The same mechanism handles the case where the dispatcher crashes before receiving any Ready messages at all.
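The replay rule can be illustrated with a small sketch; the `pending` buffer and the `on_new_dispatcher` hook are names invented here, not the paper's, but the behavior matches the described recovery: every participant resends its undecided Ready messages to the newly elected dispatcher.

```python
# Sketch of dispatcher failover: a participant keeps each Ready it has
# sent until the transaction is decided, and replays the pending ones to
# whichever dispatcher wins the next election.

class Dispatcher:
    def __init__(self):
        self.inbox = []

    def receive(self, msg):
        self.inbox.append(msg)

class Participant:
    def __init__(self, name):
        self.name = name
        self.pending = []  # Ready messages with no Commit/Rollback yet

    def send_ready(self, dispatcher, txn):
        msg = ("Ready", self.name, txn)
        self.pending.append(msg)
        dispatcher.receive(msg)

    def on_new_dispatcher(self, dispatcher):
        # Replay everything still pending so no transaction is left
        # in an indeterminate state.
        for msg in self.pending:
            dispatcher.receive(msg)

old, new = Dispatcher(), Dispatcher()
p = Participant("p1")
p.send_ready(old, txn=1)
# The old dispatcher crashes before deciding; the participant replays
# its pending Ready to the newly elected dispatcher.
p.on_new_dispatcher(new)
print(new.inbox)  # the pending Ready reappears at the new dispatcher
```

The same replay covers the case where the old dispatcher crashed before receiving any Ready at all: the participant's `pending` buffer still holds the message, so the new dispatcher sees it on the first replay.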
Correctness is argued through a formal TLA+ specification. Six definitions are presented: (1) monotonic increase of vote rounds, (2) uniqueness of the coordinator in a round, (3) existence of exactly one dispatcher after election, (4) ability to elect a dispatcher despite up to ⌊N/2⌋ − 1 validator crashes (where N is the total number of validators), (5) ability to commit a transaction despite the same number of validator crashes, and (6) ability to continue committing even if the current dispatcher fails mid‑processing. These definitions collectively establish safety (no two conflicting decisions) and liveness (progress despite failures).
Performance evaluation was conducted on a five‑node cluster, each node running on a separate virtual machine. The consensus latency for a single transaction averaged 235 ms, with a minimum of 140 ms and a maximum of 313 ms. In 90 % of the runs, consensus was reached within 289 ms. The experiment also included more than 1,000 concurrent transactions, demonstrating that the system can sustain high throughput without significant degradation. A histogram of latency distribution is provided in the paper.
The authors conclude that the algorithm is simple, easy to understand, and provably correct. Its main advantage lies in the explicit handling of dispatcher failures through message replay, allowing the system to continue processing pending transactions without manual intervention. However, the paper acknowledges several open issues: the potential for a single physical node to host both dispatcher and validator roles (creating a single point of failure), the additional network overhead caused by the mandatory resend of pending Ready messages after a dispatcher change, and the lack of extensive empirical testing under more adverse conditions such as network partitions or rapid successive dispatcher turnovers. Future work is suggested to address these concerns, to explore scalability to larger clusters, and to provide a concrete implementation that can be verified beyond the TLA+ model.