📝 Original Info
- Title: Effects of component-subscription network topology on large-scale data centre performance scaling
- ArXiv ID: 1004.0728
- Date: 2016-11-15
- Authors: Ilango Sriram & Dave Cliff (Department of Computer Science, University of Bristol)
📝 Abstract
Modern large-scale data centres, such as those used for cloud computing service provision, are becoming ever-larger as the operators of those data centres seek to maximise the benefits from economies of scale. With these increases in size comes a growth in system complexity, which is usually problematic. There is an increased desire for automated "self-star" configuration, management, and failure-recovery of the data-centre infrastructure, but many traditional techniques scale much worse than linearly as the number of nodes to be managed increases. As the number of nodes in a median-sized data-centre looks set to increase by two or three orders of magnitude in coming decades, it seems reasonable to attempt to explore and understand the scaling properties of the data-centre middleware before such data-centres are constructed. In [1] we presented SPECI, a simulator that predicts aspects of large-scale data-centre middleware performance, concentrating on the influence of status changes such as policy updates or routine node failures. [...]. In [1] we used a first-approximation assumption that such subscriptions are distributed wholly at random across the data centre. In this present paper, we explore the effects of introducing more realistic constraints to the structure of the internal network of subscriptions. We contrast the original results [...] exploring the effects of making the data-centre's subscription network have a regular lattice-like structure, and also semi-random network structures resulting from parameterised network generation functions that create "small-world" and "scale-free" networks. We show that for distributed middleware topologies, the structure and distribution of tasks carried out in the data centre can significantly influence the performance overhead imposed by the middleware.
📄 Full Content
Effects of component-subscription network topology
on large-scale data centre performance scaling
Ilango Sriram & Dave Cliff
Department of Computer Science
University of Bristol
Bristol, UK
{ilango, dc} @cs.bris.ac.uk
Abstract— Modern large-scale data centres, such as those used for cloud computing service provision, are becoming ever-larger as the operators of those data centres seek to maximise the benefits from economies of scale. With these increases in size comes a growth in system complexity, which is usually problematic. There is an increased desire for automated "self-star" configuration, management, and failure-recovery of the data-centre infrastructure, but many traditional techniques scale much worse than linearly as the number of nodes to be managed increases. As the number of nodes in a median-sized data-centre looks set to increase by two or three orders of magnitude in coming decades, it seems reasonable to attempt to explore and understand the scaling properties of the data-centre middleware before such data-centres are constructed. In [1] we presented SPECI, a simulator that predicts aspects of large-scale data-centre middleware performance, concentrating on the influence of status changes such as policy updates or routine node failures. The initial version of SPECI was based on the assumption (taken from our industrial sponsor, a major data-centre provider) that within the data-centre there will be components that work together and need to know the status of other components via "subscriptions" to status-updates from those components. In [1] we used a first-approximation assumption that such subscriptions are distributed wholly at random across the data centre. In this present paper, we explore the effects of introducing more realistic constraints to the structure of the internal network of subscriptions. We contrast the original results from SPECI with new results from simulations exploring the effects of making the data-centre's subscription network have a regular lattice-like structure, and also semi-random network structures resulting from parameterised network generation functions that create "small-world" and "scale-free" networks. We show that for distributed middleware topologies, the structure and distribution of tasks carried out in the data centre can significantly influence the performance overhead imposed by the middleware.
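The three non-random subscription topologies named above (regular lattice, small-world, scale-free) can be generated by standard parameterised procedures: a ring lattice, Watts-Strogatz-style random rewiring of that lattice, and Barabási-Albert-style preferential attachment. The sketch below is illustrative only and is not the paper's SPECI code; all function names and parameter values are our own, using only the Python standard library.

```python
import random

def ring_lattice(n, k):
    """Regular lattice: each of n nodes subscribes to its k nearest
    neighbours on a ring (k/2 on each side). Returns a set of
    undirected edges represented as frozensets."""
    edges = set()
    for i in range(n):
        for j in range(1, k // 2 + 1):
            edges.add(frozenset((i, (i + j) % n)))
    return edges

def small_world(n, k, p, seed=0):
    """Watts-Strogatz-style generator: start from the ring lattice and
    rewire each edge with probability p, interpolating between a
    regular lattice (p=0) and a near-random network (p=1)."""
    rng = random.Random(seed)
    rewired = set()
    for edge in ring_lattice(n, k):
        a, b = tuple(edge)
        if rng.random() < p:
            b = rng.randrange(n)
            # avoid self-loops and duplicates of already-rewired edges
            while b == a or frozenset((a, b)) in rewired:
                b = rng.randrange(n)
        rewired.add(frozenset((a, b)))
    return rewired

def scale_free(n, m, seed=0):
    """Barabasi-Albert-style preferential attachment: each new node
    subscribes to (up to) m existing nodes chosen with probability
    proportional to their current degree, giving the heavy-tailed
    degree distribution characteristic of scale-free networks."""
    rng = random.Random(seed)
    targets = list(range(m))   # seed nodes for the first attachments
    repeated = []              # node list weighted by degree
    edges = set()
    for src in range(m, n):
        for t in set(targets):
            edges.add(frozenset((src, t)))
        repeated.extend(targets)
        repeated.extend([src] * m)
        targets = [rng.choice(repeated) for _ in range(m)]
    return edges

# A 1000-node lattice with k=4 has exactly 2000 subscription links;
# rewiring ~10% of them introduces the long-range "shortcuts" that
# make the network small-world.
print(len(ring_lattice(1000, 4)), len(small_world(1000, 4, 0.1)))
```

The parameters n, k, p, and m are the knobs the abstract refers to as "parameterised network generation functions"; sweeping p from 0 to 1 covers the lattice-to-random spectrum studied in the paper.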
Keywords: cloud-scale data centre; normal failure; simulation; small-world networks; scale-free networks
I. INTRODUCTION
Modern large-scale data centres, such as those used for providing cloud computing services, are becoming ever-larger as the operators of those data-centres seek to maximise the benefits from economies of scale. With these increases in size comes a growth in system complexity, which is usually problematic. The growth in complexity manifests itself in two ways. The first is that many conventional management techniques (such as those required for resource-allocation and load-balancing) that work well when controlling a relatively small number of data-centre nodes (a few hundred, say) scale much worse than linearly and hence become impracticable and unworkable when the number of nodes under control is increased by two or three orders of magnitude. The second is that the very large number of individual independent hardware components in modern data centres means that, even with very reliable components, at any one time it is reasonable to expect there always to be one or more significant component failures (so-called "normal failure"): guaranteed levels of performance and dependability must be maintained despite this normal failure; and the constancy of normal failure in any one data-centre soon leads to situations where the data-centre has a heterogeneous composition (because exact replacements for failed components cannot always be found) and where that heterogeneous composition is itself constantly changing.
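The "normal failure" argument is a straightforward consequence of the binomial model: with N independent components each failing with probability p over some window, the chance that at least one has failed is 1 - (1 - p)^N, which approaches certainty as N grows. A minimal sketch, with an illustrative per-component rate chosen by us (not a figure from the paper):

```python
def p_any_failure(n, p):
    """Probability that at least one of n independent components fails,
    given per-component failure probability p over the same window."""
    return 1.0 - (1.0 - p) ** n

# Even with highly reliable parts (illustrative rate p = 1e-4 per day),
# a 100,000-node data centre expects n*p = 10 failures per day, and
# failure-free days are essentially impossible:
print(round(p_any_failure(100_000, 1e-4), 6))  # ≈ 0.999955
```

This is why middleware for cloud-scale data centres must treat component failure as a continuous operating condition rather than an exceptional event.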
For these reasons, the setup and ongoing management of current and future data-centres clearly presents a number of problems that can properly be considered as issues in the engineering of complex computer systems. For an extended discussion of the issues that arise in the design of warehouse-scale data-centres, see [2].
In almost all of current engineering practice, predictive computer simulations are used to evaluate possible designs before they go into production. Simulation studies allow for the rapid exploration and evaluation of design alternatives, and can help to avoid costly mistakes. Computational Fluid Dynamics (CFD) simulations are routinely used to understand and refine the aerodynamics of designs for airplanes, ground vehicles, and structures such as buildings and bridges; and to understand and refine the hydrodynamics of water-vehicles. In microelectronics, the well-known SPICE circuit-simulation system [3] has lon
…(Full text truncated)…