Correlated Resource Models of Internet End Hosts
Understanding and modelling resources of Internet end hosts is essential for the design of desktop software and Internet-distributed applications. In this paper we develop a correlated resource model of Internet end hosts based on real trace data taken from the SETI@home project. This data covers a 5-year period with statistics for 2.7 million hosts. The resource model is based on statistical analysis of host computational power, memory, and storage as well as how these resources change over time and the correlations between them. We find that resources with few discrete values (core count, memory) are well modeled by exponential laws governing the change of relative resource quantities over time. Resources with a continuous range of values are well modeled with either correlated normal distributions (processor speed for integer operations and floating point operations) or log-normal distributions (available disk space). We validate and show the utility of the models by applying them to a resource allocation problem for Internet-distributed applications, and demonstrate their value over other models. We also make our trace data and tool for automatically generating realistic Internet end hosts publicly available.
💡 Research Summary
The paper addresses the need for realistic models of Internet end‑host resources, which are crucial for designing desktop software and distributed Internet applications. Using a massive five‑year trace from the SETI@home project, the authors analyze 2.7 million hosts, extracting key hardware attributes: CPU core count, integer‑operation speed, floating‑point speed, physical memory, and available disk space.
First, they examine temporal trends. Core count and memory, which take only a few discrete values, exhibit exponential growth over time. By fitting an exponential function (R(t)=R_0 e^{\alpha t}) to yearly aggregates, they obtain growth rates that accurately predict future distributions of these discrete resources.
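The fit described above reduces to linear regression after a log transform, since \ln R(t) = \ln R_0 + \alpha t. A minimal sketch (the yearly values below are synthetic, not the paper's data):

```python
import numpy as np

def fit_exponential_growth(years, quantities):
    """Fit R(t) = R0 * exp(alpha * t) by least squares on log(R)."""
    t = np.asarray(years, dtype=float)
    log_r = np.log(np.asarray(quantities, dtype=float))
    alpha, log_r0 = np.polyfit(t, log_r, 1)  # slope = alpha, intercept = ln R0
    return np.exp(log_r0), alpha

# Synthetic yearly averages: memory growing at ~23% per year from 1 GB
years = np.array([0, 1, 2, 3, 4])
mem_gb = 1.0 * np.exp(0.23 * years)
r0, alpha = fit_exponential_growth(years, mem_gb)
```

Because the fit is linear in log space, noisy yearly aggregates can be handled by the same `polyfit` call without modification.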
For continuous performance metrics, the authors find strong positive correlation (Pearson ρ≈0.78) between integer‑operation speed and floating‑point speed. Both metrics display approximately constant variance while their means increase linearly with time, justifying a bivariate normal model (\mathcal{N}(\mu_t,\Sigma_t)) whose parameters are estimated via least‑squares regression for each year.
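Sampling hosts from such a bivariate normal model amounts to drawing from \mathcal{N}(\mu_t,\Sigma_t) with an off-diagonal covariance term \rho\sigma_1\sigma_2. A sketch with illustrative (not the paper's) means and standard deviations:

```python
import numpy as np

rho = 0.78                                  # correlation between int and fp speed
mu = np.array([2000.0, 1800.0])             # hypothetical yearly means (MIPS, MFLOPS)
sigma = np.array([400.0, 350.0])            # hypothetical standard deviations
cov = np.array([[sigma[0]**2,              rho * sigma[0] * sigma[1]],
                [rho * sigma[0] * sigma[1], sigma[1]**2]])

rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mu, cov, size=100_000)
emp_rho = np.corrcoef(samples.T)[0, 1]      # should recover rho ~ 0.78
```

Regenerating `mu` and `cov` from the per-year regression estimates reproduces the linear drift of the means over time.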
Available disk space spans a wide continuous range and, after a log transformation, follows a near‑normal distribution. Consequently, a log‑normal model (\ln(D)\sim\mathcal{N}(\mu'_t,\sigma'^2_t)) is fitted, with (\mu'_t) also showing linear growth.
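The log-normal fit is equivalent to estimating the mean and standard deviation of \ln(D). A sketch with hypothetical parameters, showing that the fitting step recovers them from sampled disk sizes:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical yearly parameters: ln(disk in GB) ~ N(3.5, 1.2^2)
mu_t, sigma_t = 3.5, 1.2
disk_gb = rng.lognormal(mean=mu_t, sigma=sigma_t, size=50_000)

# The fitting step: estimate the normal parameters of the log-transformed data
log_d = np.log(disk_gb)
mu_hat, sigma_hat = log_d.mean(), log_d.std()
```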
Inter‑resource correlations are quantified: core count ↔ memory and memory ↔ disk space have moderate positive correlations (ρ≈0.45–0.52). To capture these dependencies, the authors construct a multivariate probability model that jointly samples core count, memory, CPU speeds, and disk space, preserving both marginal distributions and the observed covariance structure. Sampling is performed using a Markov‑chain‑based algorithm that respects yearly parameter updates, thereby reproducing the temporal evolution of the host population.
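One standard way to draw jointly correlated samples with heterogeneous marginals is a Gaussian-copula construction: sample correlated standard normals, then push each coordinate through its marginal. This is a simplified stand-in, not the paper's Markov-chain-based sampler, and all distribution parameters below are illustrative:

```python
import math
import numpy as np

rng = np.random.default_rng(2)

# Illustrative correlation structure between (cores, memory, disk)
corr = np.array([[1.00, 0.45, 0.30],
                 [0.45, 1.00, 0.52],
                 [0.30, 0.52, 1.00]])
z = rng.multivariate_normal(np.zeros(3), corr, size=20_000)
u = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2)))  # normal CDF -> uniforms

# Discrete marginal for cores via inverse CDF (hypothetical level probabilities)
core_levels = np.array([1, 2, 4, 8])
core_cdf = np.array([0.35, 0.70, 0.95, 1.0])
cores = core_levels[np.searchsorted(core_cdf, u[:, 0])]

# Log-normal marginals for memory and disk, driven by the same latent normals
memory_gb = np.exp(0.5 + 0.8 * z[:, 1])
disk_gb = np.exp(3.5 + 1.2 * z[:, 2])
```

Because all three coordinates share one latent Gaussian draw, the marginals and the pairwise correlation structure are preserved simultaneously, which is the property the paper's sampler is designed to maintain.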
Model validation is carried out through a resource‑allocation simulation for a typical Internet‑distributed application. The simulated scheduler assigns tasks to generated hosts, aiming to minimize makespan while respecting CPU, memory, and storage constraints. Compared with baseline models that treat each resource independently (e.g., using only average values), the correlated model reduces average makespan by roughly 12 % and improves overall resource utilization by about 9 %. This demonstrates that ignoring cross‑resource correlations can lead to over‑ or under‑provisioning, degrading application performance.
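The validation scenario can be approximated by a minimal greedy list scheduler: each task goes to the feasible host (enough memory and disk) that would finish it earliest. This is a sketch of the general setup, not the paper's actual scheduler:

```python
import numpy as np

def greedy_makespan(hosts, tasks):
    """Assign each task to the feasible host that finishes it earliest.

    hosts: array of rows (speed, mem_gb, disk_gb);
    tasks: iterable of (work, mem_req, disk_req).
    Returns the makespan (time when the last host finishes).
    """
    finish = np.zeros(len(hosts))
    for work, mem_req, disk_req in tasks:
        feasible = (hosts[:, 1] >= mem_req) & (hosts[:, 2] >= disk_req)
        idx = np.flatnonzero(feasible)
        done = finish[idx] + work / hosts[idx, 0]   # completion time per host
        best = np.argmin(done)
        finish[idx[best]] = done[best]
    return finish.max()

# Tiny example: host 1 is faster but has less memory
hosts = np.array([[1.0, 4.0, 100.0],
                  [2.0, 2.0, 100.0]])
tasks = [(2.0, 1.0, 10.0), (2.0, 3.0, 10.0)]
span = greedy_makespan(hosts, tasks)
```

Running this against hosts drawn from the correlated model versus an independent-resource baseline is the kind of comparison behind the reported makespan and utilization gaps.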
Beyond the technical contributions, the authors release the anonymized trace data and an open‑source toolchain for automatic model generation. This enables other researchers to reproduce the results, extend the models to new device classes (e.g., mobile phones, cloud VMs), or integrate them into simulation platforms for large‑scale distributed systems.
In summary, the paper delivers a comprehensive, data‑driven framework for modeling Internet end‑host resources. It combines exponential growth laws for discrete attributes, normal and log‑normal distributions for continuous performance metrics, and a multivariate correlation structure to reflect real‑world hardware co‑evolution. The validated models outperform simpler alternatives in a realistic scheduling scenario, and the publicly available artifacts promote reproducibility and future research on more sophisticated, possibly non‑linear, resource evolution models.