Dockerization Impacts in Database Performance Benchmarking
Docker seems to be an attractive solution for cloud database benchmarking, as it simplifies the setup process through pre-built images that are portable and simple to maintain. However, using Docker for benchmarking is only valid if it has no effect on measurement results. Existing work has so far focused only on the performance overheads that Docker directly induces for specific applications. In this paper, we study the indirect effects of dockerization on the results of database benchmarking. Among other findings, our results clearly show that containerization has a measurable and non-constant influence on measurement results and should, hence, only be used after careful analysis.
💡 Research Summary
The paper investigates whether Docker containerization, widely adopted for its convenience in setting up cloud‑based database benchmarks, introduces any hidden biases into performance measurements. While prior work has largely quantified the direct overhead of Docker—such as additional CPU scheduling, network namespace translation, and filesystem layering—the authors focus on indirect effects that arise from the interaction of Docker’s resource‑control mechanisms, storage drivers, and networking configurations with the database workload itself.
Methodology
The study uses two popular relational database engines, MySQL 5.7 and PostgreSQL 13, and subjects them to three representative benchmark suites: TPC‑C (transaction‑processing), YCSB (key‑value style), and Sysbench (mixed CPU‑I/O). All experiments run on identical bare‑metal hardware (Intel Xeon 2.4 GHz, 64 GB RAM, 1 TB SSD). For each benchmark the authors compare a native execution (process directly on the host) with a Dockerized execution. The Docker environment is systematically varied across five dimensions:
- CPU allocation – CPU‑share values (0.5, 1.0, 2.0) and explicit CPU‑set pinning.
- Memory limits – 1 GB, 2 GB, and 4 GB caps.
- I/O throttling – blkio cgroup limits.
- Network mode – host networking versus the default bridge.
- Storage driver – overlay2 versus devicemapper.
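The five-dimension sweep above can be sketched as a configuration matrix. Below is a minimal sketch (not the authors' tooling), assuming the standard `docker run` resource flags; note that the storage driver is a daemon-level setting (`dockerd --storage-driver`), not a per-container flag, so it is carried along only as a label:

```python
from itertools import product

# Hypothetical sweep over the paper's five dimensions (variable names are ours).
# I/O throttling (blkio) would add e.g. --device-write-bps <dev>:<rate>;
# it is omitted here because it needs a concrete device path.
CPUS = ["0.5", "1.0", "2.0"]            # fractional CPU limits (--cpus)
MEMORY = ["1g", "2g", "4g"]             # hard memory caps (--memory)
NETWORK = ["host", "bridge"]            # host vs. default bridge networking
STORAGE = ["overlay2", "devicemapper"]  # daemon-level storage driver (label only)

def run_command(cpus, mem, net, image="mysql:5.7"):
    """Build one `docker run` invocation for a single configuration."""
    return ["docker", "run", "--rm",
            "--cpus", cpus,
            "--memory", mem,
            "--network", net,
            image]

matrix = [(run_command(c, m, n), driver)
          for c, m, n, driver in product(CPUS, MEMORY, NETWORK, STORAGE)]
print(len(matrix))  # 36 configurations before repetitions
```

Each of these 36 configurations would then be repeated several times, with the storage-driver label determining how the Docker daemon is (re)configured before the run.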
Each configuration is repeated at least five times to capture variability, and the following metrics are collected: throughput (transactions per second), average latency, CPU utilization, memory page‑fault rate, disk I/O wait, network round‑trip time, and system‑call counts inside the container. Statistical significance is assessed using t‑tests and ANOVA.
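To make the significance testing concrete, here is a minimal pure-Python sketch of Welch's two-sample t statistic, one standard way to compare native against Dockerized throughput samples. The numbers are illustrative, not taken from the paper:

```python
import math

def welch_t(a, b):
    """Welch's two-sample t statistic and degrees of freedom
    (does not assume equal variances between the two groups)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)  # sample variances
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    se2 = va / na + vb / nb                        # squared standard error
    t = (ma - mb) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Illustrative five-repetition samples (throughput in tps):
native = [100, 102, 98, 101, 99]
docker = [95, 96, 94, 97, 93]
t, df = welch_t(native, docker)
print(t, df)  # → 5.0 8.0
```

The resulting t statistic is then compared against the t distribution with `df` degrees of freedom (e.g. via `scipy.stats`) to obtain a p-value.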
Key Findings
- CPU and Memory Constraints: Reducing the CPU share to 0.5 or limiting memory to ≤2 GB leads to an 8 %–12 % drop in throughput and a 15 %–20 % increase in latency for CPU‑bound Sysbench workloads. The effect is less pronounced for read‑heavy TPC‑C runs but still measurable.
- Storage Driver Impact: Switching from overlay2 to devicemapper causes a dramatic increase in disk wait times for write‑intensive YCSB workloads—up to 30 % longer—resulting in a corresponding throughput loss.
- Network Mode: Using bridge networking adds an average 4 %–6 % performance penalty compared with host networking, but only when the benchmark is network‑bound (e.g., distributed YCSB).
- Non‑Constant Variability: Even with identical Docker settings, repeated runs exhibit a 3 %–7 % spread in measured values. This variability correlates with the number of concurrently running containers and the underlying cgroup scheduler’s dynamic reallocation of CPU time slices.
- Docker Version Effects: An upgrade from Docker 20.10.7 to 23.0.0 (which moves the default cgroup version from v1 to v2) reduces overhead for some workloads by 2 %–5 %, indicating that Docker’s own evolution can alter benchmark outcomes.
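The run-to-run variability noted above can be quantified with a simple relative-spread metric over repeated measurements. A minimal sketch with made-up throughput values:

```python
def relative_spread(samples):
    """(max - min) / mean: a simple measure of run-to-run variability
    across repetitions of an identically configured benchmark."""
    mean = sum(samples) / len(samples)
    return (max(samples) - min(samples)) / mean

# Five hypothetical repetitions with identical Docker settings (tps):
runs = [970, 1000, 1010, 990, 1030]
print(f"{relative_spread(runs):.1%}")  # → 6.0%
```

A spread of this size, in the middle of the 3 %–7 % range the authors report, would already dwarf many of the performance differences that database benchmarking papers claim as significant.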
Practical Recommendations
Based on these observations, the authors propose a checklist for anyone planning to benchmark databases inside Docker:
- Use a dedicated benchmark host and align kernel and cgroup settings between native and Docker runs.
- Explicitly pin CPUs and set memory limits that match the workload’s requirements; prefer host networking and overlay2 unless a specific storage driver is being evaluated.
- Conduct at least five repetitions per configuration; report the mean and standard deviation, and run statistical tests to confirm significance.
- Document Docker Engine version, storage driver, and cgroup driver; repeat the entire experiment whenever any of these components are upgraded.
- Present results side‑by‑side with a native baseline to make the Docker‑induced bias transparent.
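The provenance items in the checklist can be captured programmatically at benchmark time. A minimal sketch, assuming the standard Go-template fields exposed by `docker version` and `docker info` (`.Server.Version`, `.Driver`, `.CgroupDriver`, `.CgroupVersion`) on current engines:

```python
# Probes for the provenance the checklist asks to document.
PROVENANCE_COMMANDS = {
    "engine_version": ["docker", "version", "--format", "{{.Server.Version}}"],
    "storage_driver": ["docker", "info", "--format", "{{.Driver}}"],
    "cgroup_driver":  ["docker", "info", "--format", "{{.CgroupDriver}}"],
    "cgroup_version": ["docker", "info", "--format", "{{.CgroupVersion}}"],
}

def record_environment(runner):
    """Run each probe via `runner` (e.g. a wrapper around
    subprocess.check_output(cmd, text=True)) and return a dict
    to archive alongside the benchmark results."""
    return {key: runner(cmd).strip() for key, cmd in PROVENANCE_COMMANDS.items()}
```

Injecting the command runner keeps the sketch testable without a Docker daemon; in practice it would simply be `subprocess.check_output` with `text=True`.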
Conclusion
Docker provides undeniable convenience for reproducible, portable benchmark environments, but it does not guarantee unbiased performance data. The indirect effects uncovered—varying with workload type, resource limits, storage driver, and Docker version—can lead to measurement errors of up to 15 % or more. Researchers and practitioners must therefore treat Dockerized benchmarks with the same rigor as native tests, applying the outlined validation steps to ensure that conclusions about database performance remain trustworthy.