Deep Dive into Middleware-based Database Replication: The Gaps between Theory and Practice.
Appears in Proceedings of the ACM SIGMOD Conference, Vancouver, Canada (June 2008)
Middleware-based Database Replication:
The Gaps Between Theory and Practice
Emmanuel Cecchet
EPFL
Lausanne, Switzerland
emmanuel.cecchet@epfl.ch
George Candea
EPFL & Aster Data Systems
Lausanne, Switzerland
george.candea@epfl.ch
Anastasia Ailamaki
EPFL & Carnegie Mellon University
Lausanne, Switzerland
anastasia.ailamaki@epfl.ch
ABSTRACT
The need for high availability and performance in data
management systems has been fueling a long-running interest in
database replication from both academia and industry. However,
academic groups often attack replication problems in isolation,
overlooking the need for completeness in their solutions, while
commercial teams take a holistic approach that often misses
opportunities for fundamental innovation. Over time, this has
created a gap between academic research and industrial practice.
This paper aims to characterize the gap along three axes:
performance, availability, and administration. We build on our
own experience developing and deploying replication systems in
commercial and academic settings, as well as on a large body of
prior related work. We sift through representative examples from
the last decade of open-source, academic, and commercial
database replication systems and combine this material with case
studies from real systems deployed at Fortune 500 customers. We
propose two agendas, one for academic research and one for
industrial R&D, which we believe can bridge the gap within 5-10
years. In this way, we hope to both motivate researchers and help
them make the theory and practice of middleware-based database
replication more relevant to each other.
Categories and Subject Descriptors
C.2.4 [Distributed Systems]: Distributed databases; H.2.4
[Systems]: Distributed databases
General Terms
Performance, Design, Reliability.
Keywords
Middleware, database replication, practice and experience.
1. INTRODUCTION
Despite Gray’s warning on the dangers of replication [18] over a
decade ago, industry and academia have continued building
replication systems for databases. The reason is simply that replication
is the only tried-and-true mechanism for scaling performance and
availability of databases across a wide range of requirements.
There exist replication “solutions” for every major DBMS, from
Oracle RAC™, Streams™ and DataGuard™ to Slony-I for
Postgres, MySQL replication and cluster, and everything
in-between. The naïve observer may conclude that such a variety of
replication systems indicates a solved problem; the reality,
however, is the exact opposite. Replication still falls short of
customer expectations, which explains the continued interest in
developing new approaches, resulting in a dazzling variety of
offerings.
Even the “simple” cases are challenging at large scale. We
deployed a replication system for a large travel ticket brokering
system at a Fortune 500 company faced with a workload where
95% of transactions were read-only. Still, the 5% write workload
resulted in thousands of update requests per second, which
implied that a system using two-phase commit, or any other form of
synchronous replication, would fail to meet customer performance
requirements (thus confirming Gray’s prediction [18]). This
tradeoff between availability and performance has long been a
hurdle to developing efficient replication techniques.
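To make the scaling argument concrete, the following minimal Python sketch models per-replica load under a read-one-write-all scheme, in which reads are spread across replicas but every replica must apply every write. It is an illustrative simplification, not a measurement from the deployment described above: the function name `per_replica_load` and the total rate of 40,000 transactions per second are hypothetical, chosen only so that a 5% write share corresponds to roughly 2,000 updates per second.

```python
def per_replica_load(total_tps: float, write_fraction: float, replicas: int) -> float:
    """Transactions per second that each replica must execute.

    Assumes full replication with read-one-write-all: each read is served
    by exactly one replica, while each write is applied at every replica.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    writes = total_tps * write_fraction   # applied on every replica
    reads = total_tps - writes            # divided among the replicas
    return reads / replicas + writes


if __name__ == "__main__":
    TOTAL_TPS = 40_000.0     # hypothetical total load, not from the paper
    WRITE_FRACTION = 0.05    # the 5% write share quoted in the text
    for n in (1, 2, 4, 8, 16):
        load = per_replica_load(TOTAL_TPS, WRITE_FRACTION, n)
        print(f"{n:>2} replicas: {load:,.0f} tx/s per replica")
```

Under these assumptions, the per-replica load approaches the write rate as replicas are added, so additional replicas relieve the read path but never the write path. The model ignores commit coordination latency and lock contention, so it says nothing about how much further synchronous schemes degrade in practice.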
In practice, the performance/availability tradeoff can be highly
discontinuous. In the same ticket broker system mentioned above,
the difference between a 30-second and a one-minute outage
determines whether travel agents retry their requests or decide to
switch to another broker for the rest of the day (“the competition
is one click away”). Compounded across the hundreds of travel
agencies that connect to the broker system daily for hotel
bookings, airline tickets, car rentals, etc., the impact of one minute
of downtime comes close to that of a day-long outage. The
replication system needs to be mindful of the implied failover
requirements, and obtaining predictable behavior is no mean feat.
Our premise is that, by carefully observing real users’ needs and
transforming them into research goals, the community can bridge
the mismatch between existing replication systems and customers’
expectations within the coming decade. We sift through the last
decade of database replication in academic, industrial, and
open-source projects. Combining this analysis with 45 person-years of
experience building and deploying replicated database systems,
we identify the unanswered challenges of practical replication.
We find that a few “hot topics” (e.g., reliable multicast and lazy
replication [21]) attract the lion’s share of academic interest, while
other equally important aspects (e.g., availability and
management) are often forgotten; this limits the imp
…(Full text truncated)…