On the Practicality of 'Practical' Byzantine Fault Tolerance
Byzantine Fault Tolerant (BFT) systems are considered by the systems research community to be state of the art with regard to providing reliability in distributed systems. BFT systems provide safety and liveness guarantees under reasonable assumptions, amongst a set of nodes where at most f nodes display arbitrarily incorrect behaviors, known as Byzantine faults. Despite this, BFT systems are still rarely used in practice. In this paper we describe our experience, from an application developer’s perspective, trying to leverage the publicly available and highly-tuned PBFT middleware (by Castro and Liskov) to provide provable reliability guarantees for an electronic voting application with high security and robustness needs. We describe several obstacles we encountered and drawbacks we identified in the PBFT approach. These include some that we tackled, such as the lack of support for dynamic client management and the fact that state management is left completely up to the application. Others that still remain include the lack of robust handling of non-determinism, the lack of support for web-based applications, and the lack of support for stronger cryptographic primitives, among others. We find that, while many of the obstacles could be overcome with a revised BFT middleware implementation that is tuned specifically for the needs of the particular application, they require significant engineering effort and time, and their performance implications for the end-application are unclear. An application developer is thus unlikely to be willing to invest the time and effort needed to leverage the BFT approach. We conclude that the research community needs to focus on the usability of BFT algorithms for real world applications, from the end-developer perspective, in addition to continuing to improve the BFT middleware performance, robustness and deployment layouts.
💡 Research Summary
The paper presents a developer-centric case study of integrating the open-source Practical Byzantine Fault Tolerance (PBFT) middleware into a real-world electronic voting service. Although BFT protocols promise strong safety and liveness guarantees under the assumption that at most f out of 3f + 1 replicas are faulty, the authors find that the publicly available PBFT implementation is far from ready for production use. Their investigation uncovers several practical obstacles.
First, PBFT assumes a static membership model: both clients and replicas must be known before system start-up, and there is no protocol for dynamic client registration or deregistration. This is unacceptable for Internet-scale services where thousands of users may join or leave at any time. The authors add a lightweight client-registration extension that minimally perturbs the three-phase consensus and demonstrate that the performance impact is negligible.
Second, state management is left entirely to the application. PBFT expects the developer to maintain a raw memory region and to notify the library before mutating it. For a voting system that requires durable, ACID-compliant storage, this forces either a complete re-implementation of database semantics inside the replica or a substantial retrofit of the middleware to interface with an existing relational DBMS. Their prototype that couples PBFT replicas to MySQL shows that, although the integration is functional, throughput for real operations drops by two orders of magnitude compared with the “null-operation” figures reported in most BFT papers.
Third, handling of nondeterminism is rudimentary. Applications that generate timestamps, random numbers, or perform external I/O must log these values and replay them identically on all replicas, adding considerable engineering complexity and latency.
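The nondeterminism problem in the third point can be illustrated with a minimal Python sketch. It assumes one common approach: the primary chooses the nondeterministic value (here a timestamp) once, each replica checks that the proposal is plausible against its own clock, and every replica then executes the operation with the agreed value instead of reading its own clock. All names (`propose_timestamp`, `validate_timestamp`, `VoteStore`, `MAX_SKEW_SECONDS`) are hypothetical and are not part of the PBFT library's API; the quorum check merely stands in for the real agreement protocol.

```python
from dataclasses import dataclass, field

F = 1                     # assumed number of tolerated faulty replicas
QUORUM = 2 * F + 1        # replicas that must accept, out of 3f + 1
MAX_SKEW_SECONDS = 5.0    # assumed tolerance; a real deployment would tune this


def propose_timestamp(primary_clock: float, last_agreed: float) -> float:
    """Primary picks the value once; it must never move backwards."""
    return max(primary_clock, last_agreed)


def validate_timestamp(proposed: float, local_clock: float, last_agreed: float) -> bool:
    """A replica accepts only a monotonic proposal close to its own clock."""
    return proposed >= last_agreed and abs(proposed - local_clock) <= MAX_SKEW_SECONDS


@dataclass
class VoteStore:
    """Replica state: executes operations using only agreed inputs."""
    votes: list = field(default_factory=list)
    last_agreed: float = 0.0

    def execute_cast_vote(self, voter_id: str, choice: str, agreed_ts: float) -> None:
        # The timestamp comes from the agreed request, never from the local
        # clock, so every correct replica records the identical tuple.
        self.votes.append((voter_id, choice, agreed_ts))
        self.last_agreed = agreed_ts


def run_demo() -> list:
    # Four replicas (3f + 1 with f = 1) whose local clocks disagree slightly.
    clocks = [1000.0, 1000.8, 999.5, 1001.2]
    replicas = [VoteStore() for _ in clocks]
    proposed = propose_timestamp(clocks[0], replicas[0].last_agreed)
    accepted = sum(
        validate_timestamp(proposed, clock, replica.last_agreed)
        for clock, replica in zip(clocks, replicas)
    )
    if accepted >= QUORUM:  # stand-in for the agreement protocol
        for replica in replicas:
            replica.execute_cast_vote("voter-1", "option-A", proposed)
    return replicas
```

Calling `run_demo()` leaves all four replicas with the same vote list even though their clocks differ, which is the property the application has to engineer for every nondeterministic input.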
Fourth, the cryptographic primitives used by the reference code are outdated; MAC-based authentication replaces public-key signatures for performance, but modern high-security deployments demand stronger, up-to-date algorithms, which are not easily swapped into the existing codebase.
Fifth, the middleware offers no native support for web-centric communication patterns (HTTP/HTTPS, REST, JSON), forcing developers to build a separate gateway layer to bridge browsers or mobile clients to the PBFT protocol.
The authors also note numerous low-level issues (configuration quirks, logging formats, and network tuning) that, while seemingly minor, can trip up third-party developers. They find that many of the identified drawbacks could be mitigated with a revised BFT middleware tuned to the application, but doing so requires significant engineering effort, and the performance implications of such changes remain uncertain.
The paper concludes that the research community must focus on usability in addition to continued work on performance and robustness: providing dynamic membership APIs, abstracted state management, robust nondeterminism handling, modern cryptography, and seamless web integration. Only by addressing these practical concerns can Byzantine Fault Tolerant systems move from academic prototypes to widely deployed, reliable services.
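The MAC-based authentication mentioned in the fourth point can be sketched in a few lines of Python. The sketch assumes pairwise shared secret keys between a sender and each replica: the sender attaches one MAC per replica, and each replica verifies only its own entry. The key table, the function names, and the choice of HMAC-SHA256 are illustrative assumptions, not the primitives used by the PBFT codebase.

```python
import hashlib
import hmac


def make_authenticator(message: bytes, keys_by_replica: dict) -> dict:
    """Return one HMAC tag per replica, computed with that replica's shared key."""
    return {
        replica_id: hmac.new(key, message, hashlib.sha256).digest()
        for replica_id, key in keys_by_replica.items()
    }


def verify_entry(message: bytes, authenticator: dict, replica_id: int, key: bytes) -> bool:
    """A replica checks only its own entry, using a constant-time comparison."""
    tag = authenticator.get(replica_id)
    if tag is None:
        return False
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)


# Hypothetical session keys shared between one client and four replicas (f = 1).
keys = {i: f"session-key-{i}".encode() for i in range(4)}
request = b"cast-vote:voter-1:option-A"
auth = make_authenticator(request, keys)
```

Unlike a public-key signature, each entry convinces only the replica that holds the matching key and cannot be shown to a third party as proof of origin, which is one reason replacing the authentication scheme is not a mechanical swap.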