Effects of component-subscription network topology on large-scale data centre performance scaling

Reading time: 6 minutes

📝 Original Info

  • Title: Effects of component-subscription network topology on large-scale data centre performance scaling
  • ArXiv ID: 1004.0728
  • Date: 2010-04 (submission month per arXiv ID 1004.0728)
  • Authors: Ilango Sriram, Dave Cliff (Department of Computer Science, University of Bristol)

📝 Abstract

Modern large-scale data centres, such as those used for cloud computing service provision, are becoming ever-larger as the operators of those data centres seek to maximise the benefits from economies of scale. With these increases in size comes a growth in system complexity, which is usually problematic. There is an increased desire for automated "self-star" configuration, management, and failure-recovery of the data-centre infrastructure, but many traditional techniques scale much worse than linearly as the number of nodes to be managed increases. As the number of nodes in a median-sized data-centre looks set to increase by two or three orders of magnitude in coming decades, it seems reasonable to attempt to explore and understand the scaling properties of the data-centre middleware before such data-centres are constructed. In [1] we presented SPECI, a simulator that predicts aspects of large-scale data-centre middleware performance, concentrating on the influence of status changes such as policy updates or routine node failures. [...]. In [1] we used a first-approximation assumption that such subscriptions are distributed wholly at random across the data centre. In this present paper, we explore the effects of introducing more realistic constraints to the structure of the internal network of subscriptions. We contrast the original results [...] exploring the effects of making the data-centre's subscription network have a regular lattice-like structure, and also semi-random network structures resulting from parameterised network generation functions that create "small-world" and "scale-free" networks. We show that for distributed middleware topologies, the structure and distribution of tasks carried out in the data centre can significantly influence the performance overhead imposed by the middleware.
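
The abstract contrasts four families of subscription network: wholly random, regular lattice, small-world, and scale-free. These are standard graph constructions, and the sketch below (not code from the paper; it uses the networkx package, with placeholder node count N and mean degree K) shows one conventional way to generate an instance of each.

```python
# Sketch only (not code from the paper): one instance of each of the four
# subscription-network topologies contrasted in the abstract, built with
# networkx. N and K are illustrative placeholders, not SPECI parameters.
import networkx as nx

N = 1000  # number of data-centre components (assumed)
K = 8     # target mean number of subscriptions per component (assumed)

topologies = {
    # Wholly-random subscriptions, the first-approximation model of [1]:
    # an Erdos-Renyi graph with expected degree K.
    "random": nx.gnp_random_graph(N, K / (N - 1), seed=1),
    # Regular lattice-like structure: each node linked to its K nearest
    # neighbours on a ring (Watts-Strogatz with zero rewiring).
    "lattice": nx.watts_strogatz_graph(N, K, p=0.0, seed=1),
    # Small-world: the same ring lattice with 10% of links rewired at random.
    "small-world": nx.watts_strogatz_graph(N, K, p=0.1, seed=1),
    # Scale-free: preferential attachment (Barabasi-Albert), giving a
    # heavy-tailed subscription-degree distribution.
    "scale-free": nx.barabasi_albert_graph(N, K // 2, seed=1),
}

for name, g in topologies.items():
    degs = [d for _, d in g.degree()]
    print(f"{name:12s} mean degree {sum(degs) / N:5.2f}  max degree {max(degs)}")
```

Each generator is parameterised to roughly the same mean degree K, so that any difference in simulated middleware overhead can be attributed to the structure of the subscription network rather than to the sheer number of subscriptions.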


📄 Full Content

Effects of component-subscription network topology on large-scale data centre performance scaling

Ilango Sriram & Dave Cliff Department of Computer Science University of Bristol Bristol, UK {ilango, dc} @cs.bris.ac.uk

Abstract— Modern large-scale data centres, such as those used for cloud computing service provision, are becoming ever-larger as the operators of those data centres seek to maximise the benefits from economies of scale. With these increases in size comes a growth in system complexity, which is usually problematic. There is an increased desire for automated "self-star" configuration, management, and failure-recovery of the data-centre infrastructure, but many traditional techniques scale much worse than linearly as the number of nodes to be managed increases. As the number of nodes in a median-sized data-centre looks set to increase by two or three orders of magnitude in coming decades, it seems reasonable to attempt to explore and understand the scaling properties of the data-centre middleware before such data-centres are constructed. In [1] we presented SPECI, a simulator that predicts aspects of large-scale data-centre middleware performance, concentrating on the influence of status changes such as policy updates or routine node failures. The initial version of SPECI was based on the assumption (taken from our industrial sponsor, a major data-centre provider) that within the data-centre there will be components that work together and need to know the status of other components via "subscriptions" to status-updates from those components. In [1] we used a first-approximation assumption that such subscriptions are distributed wholly at random across the data centre. In this present paper, we explore the effects of introducing more realistic constraints to the structure of the internal network of subscriptions. We contrast the original results from SPECI with new results from simulations exploring the effects of making the data-centre's subscription network have a regular lattice-like structure, and also semi-random network structures resulting from parameterised network generation functions that create "small-world" and "scale-free" networks. We show that for distributed middleware topologies, the structure and distribution of tasks carried out in the data centre can significantly influence the performance overhead imposed by the middleware.

Keywords: cloud-scale data centre; normal failure; simulation; small-world networks; scale-free networks

I. INTRODUCTION

Modern large-scale data centres, such as those used for providing cloud computing services, are becoming ever-larger as the operators of those data-centres seek to maximise the benefits from economies of scale. With these increases in size comes a growth in system complexity, which is usually problematic. The growth in complexity manifests itself in two ways. The first is that many conventional management techniques (such as those required for resource-allocation and load-balancing) that work well when controlling a relatively small number of data-centre nodes (a few hundred, say) scale much worse than linearly and hence become impracticable and unworkable when the number of nodes under control is increased by two or three orders of magnitude.
The second is that the very large number of individual independent hardware components in modern data centres means that, even with very reliable components, at any one time it is reasonable to expect there always to be one or more significant component failures (so-called "normal failure"): guaranteed levels of performance and dependability must be maintained despite this normal failure; and the constancy of normal failure in any one data-centre soon leads to situations where the data-centre has a heterogeneous composition (because exact replacements for failed components cannot always be found) and where that heterogeneous composition is itself constantly changing.

For these reasons, the setup and ongoing management of current and future data-centres clearly presents a number of problems that can properly be considered as issues in the engineering of complex computer systems. For an extended discussion of the issues that arise in the design of warehouse-scale data-centres, see [2].

In almost all of current engineering practice, predictive computer simulations are used to evaluate possible designs before they go into production. Simulation studies allow for the rapid exploration and evaluation of design alternatives, and can help to avoid costly mistakes. Computational Fluid Dynamics (CFD) simulations are routinely used to understand and refine the aerodynamics of designs for airplanes, ground vehicles, and structures such as buildings and bridges; and to understand and refine the hydrodynamics of water-vehicles. In microelectronics, the well-known SPICE circuit-simulation system [3] has lon

…(Full text truncated)…
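The "normal failure" argument above lends itself to a quick back-of-envelope check: if each of N independent nodes is failed at a given instant with probability p, the probability of at least one concurrent failure is 1 − (1 − p)^N, which approaches certainty as N grows through the two or three orders of magnitude discussed in the introduction. A minimal sketch (the per-node failure probability is an illustrative assumption, not a figure from the paper):

```python
# Back-of-envelope sketch (illustrative numbers, not from the paper):
# if each of N independent nodes is failed at a given instant with
# probability p, then P(at least one failure) = 1 - (1 - p)**N.
def p_any_failure(n_nodes: int, p_fail: float) -> float:
    """Probability that at least one of n_nodes is currently failed."""
    return 1.0 - (1.0 - p_fail) ** n_nodes

p = 1e-4  # assumed per-node instantaneous failure probability
for n in (1_000, 100_000, 1_000_000):  # spanning 2-3 orders of magnitude
    print(f"N = {n:>9,}: P(>=1 failure) = {p_any_failure(n, p):.4f}, "
          f"expected concurrent failures = {n * p:,.1f}")
```

Even with this optimistic per-node figure, a million-node facility would carry on the order of a hundred concurrently failed components, which is why the middleware must treat failure handling as routine rather than exceptional.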

Reference

This content is AI-processed based on ArXiv data.
