Scaling Distributed All-Pairs Algorithms: Manage Computation and Limit Data Replication with Quorums
📝 Abstract
In this paper we propose and prove that cyclic quorum sets can efficiently manage all-pairs computations and data replication. The quorums are O(N/sqrt(P)) in size, up to 50% smaller than the dual N/sqrt(P) array implementations, and significantly smaller than solutions requiring all data. Implementation evaluation demonstrated scalability on real datasets with a 7x speed up on 8 nodes with 1/3rd the memory usage per process. The all-pairs problem requires all data elements to be paired with all other data elements. These all-pair problems occur in many science fields, which has led to their continued interest. Additionally, as datasets grow in size, new methods like these that can reduce memory footprints and distribute work equally across compute nodes will be demanded.
📄 Content
The final publication is available at Springer via http://dx.doi.org/10.1007/978-981-10-0557-2_25
Scaling Distributed All-Pairs Algorithms:
Manage Computation and Limit Data Replication with Quorums
Cory J. Kleinheksel and Arun K. Somani
Department of Electrical and Computer Engineering
Iowa State University, Ames, Iowa 50011
cklein@iastate.edu; arun@iastate.edu
Abstract. In this paper we propose and prove that cyclic quorum sets can
efficiently manage all-pairs computations and data replication. The quorums are
O(N/√P) in size, up to 50% smaller than the dual N/√P array implementations,
and significantly smaller than solutions requiring all data. Implementation
evaluation demonstrated scalability on real datasets with a 7x speed up on 8 nodes
with 1/3rd the memory usage per process.
The all-pairs problem requires all data elements to be paired with all other
data elements. These all-pair problems occur in many science fields, which has
led to their continued interest. Additionally, as datasets grow in size, new
methods like these that can reduce memory footprints and distribute work equally
across compute nodes will be demanded.
1 Introduction
In elementary schools and introductory computer science courses, the popular
“handshake” problem [1] is often taught. It goes something like this: 𝑃 people
attend a party where the customary greeting is a handshake; how many handshakes
take place? After discussion and manipulation, the answer
$\binom{P}{2} = \frac{P(P-1)}{2}$ is derived.
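The closed form P(P−1)/2 can be checked by brute force. The following is a minimal sketch, not part of the original paper; the function names are illustrative. It enumerates every unordered pair of 𝑃 people and compares the count with the formula.

```python
from itertools import combinations


def count_handshakes(p: int) -> int:
    """Count handshakes by enumerating every unordered pair of p people."""
    return sum(1 for _ in combinations(range(p), 2))


def handshake_formula(p: int) -> int:
    """Closed form P(P-1)/2 for the number of unordered pairs."""
    return p * (p - 1) // 2


# The enumeration and the closed form agree for several party sizes.
for p in (2, 5, 10):
    assert count_handshakes(p) == handshake_formula(p)

print(handshake_formula(10))  # 45
```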
This “handshake” problem is not confined to teaching introductory topics. In
databases this manifests as a self-join without a join condition, forcing all tuples to
interact with all other tuples. In physics, the n-body problem predicts the position and
motion of 𝑛 bodies by calculating the total forces every body has on every other body.
In biometrics applications, a similarity matrix can be formed using a set of images
compared with itself using facial recognition [2]. In metagenomics, finding a protein’s
likeness to every other protein is a crucial part of forming the complex graphs used in
protein clustering, which has led to new discoveries of protein functions [3].
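These applications share one computational pattern: apply a pairwise function to every unordered pair of elements. The sketch below illustrates that pattern on a single process. It assumes a symmetric comparison function; `all_pairs_matrix` and `closeness` are hypothetical names chosen for illustration and do not come from any of the cited systems.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def all_pairs_matrix(
    items: Sequence[T], kernel: Callable[[T, T], float]
) -> List[List[float]]:
    """Apply kernel to every unordered pair and store the result symmetrically."""
    n = len(items)
    result = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):  # visit each unordered pair once
            score = kernel(items[i], items[j])
            result[i][j] = score
            result[j][i] = score  # assumption: kernel(a, b) == kernel(b, a)
    return result


def closeness(a: float, b: float) -> float:
    """Toy similarity: 1 for identical values, falling toward 0 with distance."""
    return 1.0 / (1.0 + abs(a - b))


matrix = all_pairs_matrix([1.0, 2.0, 4.0], closeness)
print(matrix[0][1], matrix[0][2])  # 0.5 0.25
```

The inner loop performs N(N−1)/2 kernel calls and needs every element in memory, which is the cost that the distributed approaches discussed in this paper try to spread across processes.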
1.1 Acceleration of Applications
Many of these important applications have been accelerated using multicore
CPUs, FPGAs, GPUs, Intel’s many-core MIC, and distributed clusters. In
[4] the authors provide a generalized framework to solve this all-pairs class of
algorithms and show performance improvements for biometrics and data mining
applications in a distributed system, e.g., a cloud. A different approach was taken for a
bioinformatics application seeking to reconstruct gene co-expression networks. The
PCIT algorithm [5] was chosen to identify significant gene correlations. This method
was optimized for Intel’s multicore Xeon and many-core MIC [6].
Because every element interacts with every other element, a natural approach
is to keep all elements in memory. The generalized framework [4] showed that
efficiently distributing all of the input data to all of the nodes before
execution begins resulted in faster turnaround times than reading from disk on
demand. The PCIT optimization [6] also ran up against the need to hold all of
the data in memory, so its authors created a second optimization strategy that
had longer runtimes but a minimal memory footprint.
1.2 Relaxing the All Elements Present Requirement
N-body problems have a natural all-pairs decomposition called atom-decomposition [7]
that is based on an equal distribution of responsibility for the 𝑁 elements among 𝑃
parallel processes. To address load imbalances and the need to communicate all data
to all processes, the authors proposed force-decomposition, which still requires input
data replication but reduces it to 2 arrays of 𝑁/√𝑃 elements per process. The authors
in [8] showed that the amount of data replication in the system can be a variable (𝑐),
and that when 𝑐 = √𝑃 a lower bound on communication is achieved. When 𝑐 = 1, their
solution behaved similarly to atom-decomposition, although requiring only 2 arrays of
𝑁/𝑃 elements per process. When 𝑐 = √𝑃, their solution behaved similarly to
force-decomposition and still required 2 arrays of 𝑁/√𝑃 elements per process.
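The "2 arrays of 𝑁/√𝑃 elements" figure can be illustrated with a simplified block layout. This is a sketch under stated assumptions, not the exact scheme of [7] or [8]: it assumes 𝑃 is a perfect square, that √𝑃 divides 𝑁, and that processes are arranged in a √𝑃 × √𝑃 grid in which each process holds one row block and one column block of the 𝑁 × 𝑁 interaction matrix. All names are hypothetical.

```python
from math import isqrt


def force_decomposition(n: int, p: int) -> dict:
    """Map each process rank to the (row block, column block) index ranges it holds."""
    root = isqrt(p)
    if root * root != p or n % root != 0:
        raise ValueError("sketch assumes P is a perfect square and sqrt(P) divides N")
    block = n // root  # N / sqrt(P) elements per block
    layout = {}
    for rank in range(p):
        r, c = divmod(rank, root)  # position in the sqrt(P) x sqrt(P) grid
        rows = range(r * block, (r + 1) * block)
        cols = range(c * block, (c + 1) * block)
        layout[rank] = (rows, cols)
    return layout


def covers_all_pairs(n: int, layout: dict) -> bool:
    """Check that the blocks together contain every ordered pair (i, j)."""
    seen = set()
    for rows, cols in layout.values():
        for i in rows:
            for j in cols:
                seen.add((i, j))
    return len(seen) == n * n


layout = force_decomposition(n=12, p=9)
rows, cols = layout[0]
elements_per_process = len(rows) + len(cols)  # 2 * N / sqrt(P)

print(elements_per_process)          # 8, versus 12 if every process held all data
print(covers_all_pairs(12, layout))  # True
```

With 𝑁 = 12 and 𝑃 = 9, each process stores 8 elements instead of 12, and the saving grows with 𝑃 because the per-process footprint scales as 2𝑁/√𝑃.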
Minimizing the amount of data replication in a distributed system, while
maintaining efficient all-pairs algorithm operation, is a recurring theme in this
class of algorithms. Quorum systems are commonly used for coordination and
mutual exclusion in distributed systems [9], [10]. Their decentralized approach and
slow quorum growth rate compared to the system size are two of the reasons that make
them a good tool in managing replicated data [11]. In 1985, quorums of size 𝑂(√𝑃)
were proven using finite projective planes [12]. Rel
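A small example may help make the quorum idea concrete. The sketch below is an illustration chosen for this purpose and is not taken from the paper: it builds cyclic quorums for 𝑃 = 7 by shifting the base set {0, 1, 3} modulo 7 (a perfect difference set corresponding to the smallest finite projective plane) and checks two properties by brute force. The function names are hypothetical.

```python
from itertools import combinations


def cyclic_quorums(base, p):
    """Quorum i is the base set shifted by i modulo p."""
    return [frozenset((i + d) % p for d in base) for i in range(p)]


def quorums_intersect(quorums) -> bool:
    """True if every two quorums share at least one element."""
    return all(a & b for a, b in combinations(quorums, 2))


def pairs_covered(quorums, p) -> bool:
    """True if every unordered pair of elements appears together in some quorum."""
    covered = set()
    for q in quorums:
        covered.update(combinations(sorted(q), 2))
    return len(covered) == p * (p - 1) // 2


P = 7
BASE = (0, 1, 3)  # assumption: illustrative base set, every nonzero difference mod 7 occurs
quorums = cyclic_quorums(BASE, P)

print(sorted(quorums[2]))          # [2, 3, 5]
print(quorums_intersect(quorums))  # True
print(pairs_covered(quorums, P))   # True
```

Each quorum here has 3 elements, close to √7. The second check is the one that matters for all-pairs work: if every pair of elements meets in some quorum, a process holding only its quorum's data can compute the pairs assigned to it. Reading each element as a block of 𝑁/𝑃 data items, a quorum of roughly √𝑃 blocks would hold on the order of 𝑁/√𝑃 items, which is one way to interpret the size quoted in the abstract.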