Bayesian inference for queueing networks and modeling of internet services
Modern Internet services, such as those at Google, Yahoo!, and Amazon, handle billions of requests per day on clusters of thousands of computers. Because these services operate under strict performance requirements, a statistical understanding of their performance is of great practical interest. Such services are modeled by networks of queues, where each queue models one of the computers in the system. A key challenge is that the data are incomplete, because recording detailed information about every request to a heavily used system can require unacceptable overhead. In this paper we develop a Bayesian perspective on queueing models in which the arrival and departure times that are not observed are treated as latent variables. Underlying this viewpoint is the observation that a queueing model defines a deterministic transformation between the data and a set of independent variables called the service times. With this viewpoint in hand, we sample from the posterior distribution over missing data and model parameters using Markov chain Monte Carlo. We evaluate our framework on data from a benchmark Web application. We also present a simple technique for selection among nested queueing models. We are unaware of any previous work that considers inference in networks of queues in the presence of missing data.
💡 Research Summary
The paper addresses the problem of statistically modeling the performance of large‑scale Internet services—such as those operated by Google, Yahoo!, and Amazon—where billions of requests are processed daily on clusters comprising thousands of machines. These services are typically represented as networks of queues, each queue corresponding to a server or a processing stage. A major practical difficulty is that the logging infrastructure cannot capture every request’s complete trajectory (arrival, service start, and departure times) without incurring prohibitive overhead. Consequently, the available data are heavily censored: only a subset of timestamps is recorded, while the rest are missing.
To deal with this incompleteness, the authors adopt a Bayesian perspective. The key observation is that a queueing network defines a deterministic mapping between the observable timestamps and a set of independent latent variables known as service times. Given the service times, one can reconstruct the full set of arrival and departure times, and conversely, if the full timestamps are known, the service times are uniquely determined. By treating the unobserved timestamps as latent variables, the model’s unknown parameters (arrival rates, service rates, and hyper‑parameters governing the distribution of service times) can be endowed with prior distributions, yielding a joint posterior over both missing data and parameters.
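To make the deterministic mapping concrete, here is a minimal sketch for the simplest case of a single FIFO queue (the paper treats general networks; this illustrative example and its function names are not from the paper). A job's departure time is its service time added to whichever is later: its arrival, or the previous job's departure. The same recursion can be inverted to recover service times from full timestamps.

```python
def departures(arrivals, services):
    """FIFO single-server queue: compute departure times from
    arrival times and service times (jobs served in arrival order)."""
    deps = []
    prev_dep = 0.0
    for a, s in zip(arrivals, services):
        start = max(a, prev_dep)  # service begins when the job has arrived and the server is free
        prev_dep = start + s
        deps.append(prev_dep)
    return deps

def services_from_times(arrivals, deps):
    """Invert the mapping: recover the service times from a
    complete set of arrival and departure timestamps."""
    svcs = []
    prev_dep = 0.0
    for a, d in zip(arrivals, deps):
        svcs.append(d - max(a, prev_dep))
        prev_dep = d
    return svcs
```

Round-tripping `services_from_times(arrivals, departures(arrivals, services))` returns the original service times, which is exactly the bijection the Bayesian formulation exploits.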
Because the posterior is high‑dimensional and non‑linear, the authors develop a tailored Markov chain Monte Carlo (MCMC) scheme. The sampler alternates between two blocks: (1) sampling the latent arrival and departure times conditional on the current parameter values, exploiting the fact that service times are independent across jobs; and (2) updating the model parameters conditional on the newly sampled timestamps. For many standard choices (e.g., exponential service times) conjugate priors enable Gibbs updates; for more flexible service‑time distributions (e.g., log‑normal) a Metropolis‑Hastings step is employed. The deterministic transformation between timestamps and service times makes the conditional distributions tractable, allowing efficient block sampling even in networks with dozens of queues.
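For the conjugate case mentioned above, the parameter-update block reduces to a standard Gamma-exponential Gibbs step. The sketch below shows only that one update, with hypothetical prior hyperparameters `a0` and `b0` (the paper's actual sampler interleaves this with resampling the latent timestamps).

```python
import random

def gibbs_service_rate(service_times, a0=1.0, b0=1.0, rng=random):
    """One conjugate Gibbs update for an exponential service rate mu.

    Prior:      mu ~ Gamma(a0, rate=b0)
    Likelihood: s_i ~ Exponential(mu), independent across jobs
    Posterior:  mu ~ Gamma(a0 + n, rate=b0 + sum(s_i))
    """
    n = len(service_times)
    # random.gammavariate takes (shape, scale); scale = 1 / rate.
    return rng.gammavariate(a0 + n, 1.0 / (b0 + sum(service_times)))
```

Given the current sample of completed service times, one such draw replaces the service-rate parameter; a Metropolis-Hastings step would be substituted here for non-conjugate choices such as the log-normal.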
Model selection is addressed through a Bayesian approach that compares nested queueing structures (for example, a single‑server M/M/1 versus a multi‑server M/M/c configuration). Rather than relying on information criteria such as AIC or BIC, the authors compute posterior predictive checks and approximate model evidences using Laplace’s method. By evaluating the average predictive log‑likelihood on held‑out data, they obtain a principled trade‑off between model fit and complexity, enabling automatic selection of the most appropriate queueing topology for a given service.
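The held-out scoring step can be illustrated with a toy comparison (a simplified sketch, not the paper's procedure: it scores two candidate exponential service-time models at fixed fitted rates rather than integrating over the posterior).

```python
import math

def avg_predictive_loglik(held_out, log_density):
    """Average predictive log-likelihood of held-out service times
    under a candidate model's log-density."""
    return sum(log_density(s) for s in held_out) / len(held_out)

def exp_logpdf(mu):
    """Log-density of an Exponential(rate=mu) distribution."""
    return lambda s: math.log(mu) - mu * s
```

Evaluating `avg_predictive_loglik` on the same held-out data under each candidate model, the model with the higher score is preferred; because the score is a proper predictive criterion, an over-complex model gains nothing from extra parameters that do not improve held-out fit.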
The methodology is evaluated on a publicly available benchmark web application (e.g., the WikiBench suite). In the experiments, only 10 % of the request logs contain full timestamps; the remaining 90 % provide only partial timing information. The MCMC sampler is run for tens of thousands of iterations, with convergence diagnosed via Gelman‑Rubin statistics. The posterior means and credible intervals accurately recover the true arrival rates, service rates, and per‑queue waiting‑time distributions, despite the heavy censoring. Moreover, the Bayesian approach quantifies uncertainty through credible intervals, which maximum‑likelihood point estimates do not provide. In the model‑selection study, a multi‑server M/M/3 model receives higher posterior predictive scores than an M/M/1 model, confirming that the data support a more complex queueing structure.
The paper’s contributions are threefold. First, it introduces a novel Bayesian formulation that treats unobserved timestamps as latent variables linked deterministically to independent service times, thereby providing a coherent probabilistic treatment of missing data in queueing networks. Second, it presents an efficient, block‑wise MCMC algorithm capable of sampling from the joint posterior in high‑dimensional, real‑world settings. Third, it offers a practical Bayesian model‑comparison framework for selecting among nested queueing configurations. Together, these advances enable operators of large‑scale Internet services to perform statistically sound performance analysis, capacity planning, and Service Level Agreement (SLA) verification, even when only partial logging data are available. The authors also discuss extensions such as online inference and integration with real‑time monitoring systems, suggesting a broad applicability of their approach to modern cloud‑native architectures.