Scaling Distributed All-Pairs Algorithms: Manage Computation and Limit Data Replication with Quorums

📝 Abstract

In this paper we propose and prove that cyclic quorum sets can efficiently manage all-pairs computations and data replication. The quorums are O(N/sqrt(P)) in size, up to 50% smaller than the dual N/sqrt(P) array implementations, and significantly smaller than solutions requiring all data. Implementation evaluation demonstrated scalability on real datasets with a 7x speed up on 8 nodes with 1/3rd the memory usage per process. The all-pairs problem requires all data elements to be paired with all other data elements. These all-pair problems occur in many science fields, which has led to their continued interest. Additionally, as datasets grow in size, new methods like these that can reduce memory footprints and distribute work equally across compute nodes will be demanded.
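To make the idea of a cyclic quorum set concrete, here is a minimal sketch for P = 7. The base set {0, 1, 3} is a standard perfect difference set modulo 7, chosen here purely for illustration; it is not taken from the paper, and the function names are hypothetical. Each quorum is a cyclic shift of the base set, and the check confirms that every pair of indices shares at least one quorum, which is the property an all-pairs computation needs.

```python
from itertools import combinations


def cyclic_quorums(p: int, base: set[int]) -> list[set[int]]:
    """Build p quorums, where quorum i is the base set shifted by i modulo p."""
    return [{(i + d) % p for d in base} for i in range(p)]


def covers_all_pairs(p: int, quorums: list[set[int]]) -> bool:
    """Return True if every unordered pair of indices appears together in some quorum."""
    return all(
        any(a in q and b in q for q in quorums)
        for a, b in combinations(range(p), 2)
    )


P = 7
BASE = {0, 1, 3}  # assumption: illustrative difference set, not from the paper
quorums = cyclic_quorums(P, BASE)

# Every quorum has the size of the base set, here 3 of the 7 indices.
assert all(len(q) == len(BASE) for q in quorums)
# Every pair of indices is co-located in at least one quorum.
assert covers_all_pairs(P, quorums)
```

If each index is read as one block of the input data (an assumption about how the data would be partitioned), a process holding one quorum stores 3 of the 7 blocks rather than all of them, yet between them the processes can still evaluate every pair.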

📄 Content

The final publication is available at Springer via http://dx.doi.org/10.1007/978-981-10-0557-2_25
Scaling Distributed All-Pairs Algorithms: Manage Computation and Limit Data Replication with Quorums

Cory J. Kleinheksel and Arun K. Somani
Department of Electrical and Computer Engineering, Iowa State University, Ames, Iowa 50011
cklein@iastate.edu; arun@iastate.edu

Abstract. In this paper we propose and prove that cyclic quorum sets can efficiently manage all-pairs computations and data replication. The quorums are O(N/√P) in size, up to 50% smaller than the dual N/√P array implementations, and significantly smaller than solutions requiring all data. Implementation evaluation demonstrated scalability on real datasets with a 7x speed up on 8 nodes with 1/3rd the memory usage per process. The all-pairs problem requires all data elements to be paired with all other data elements. These all-pair problems occur in many science fields, which has led to their continued interest. Additionally, as datasets grow in size, new methods like these that can reduce memory footprints and distribute work equally across compute nodes will be demanded.

1 Introduction

In elementary schools and introductory computer science courses, a popular "handshake" problem [1] is often taught. It goes something like this: P people attend a party where the customary greeting is a handshake; how many handshakes take place? After discussion and manipulation, the answer of $\binom{P}{2} = \frac{P(P-1)}{2}$ is derived.
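The handshake count can be checked directly by enumerating every unordered pair. The sketch below is only an illustration; the function names are hypothetical and not from the paper.

```python
from itertools import combinations


def handshake_count(p: int) -> int:
    """Closed-form number of unordered pairs among p people: p(p-1)/2."""
    return p * (p - 1) // 2


def enumerate_handshakes(p: int) -> list[tuple[int, int]]:
    """List every unordered pair (i, j) with i < j among p people."""
    return list(combinations(range(p), 2))


# The brute-force enumeration agrees with the closed form.
for p in (2, 5, 8):
    assert len(enumerate_handshakes(p)) == handshake_count(p)
```

The quadratic growth of this count is what makes all-pairs workloads expensive as the number of elements increases.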
This “handshake” problem is not reserved for teaching introductory topics. In databases it manifests as a self-join without a join condition, forcing all tuples to interact with all other tuples. In physics, the n-body problem predicts the position and motion of 𝑛 bodies by calculating the total force that every body exerts on every other body.
In biometrics applications, a similarity matrix can be formed using a set of images compared with itself using facial recognition [2]. In metagenomics, finding a protein’s likeness to every other protein is a crucial part of forming the complex graphs used in protein clustering, which has led to new discoveries of protein functions [3].
1.1 Acceleration of Applications

Accelerating the execution of many of these important applications has been done using multicore CPUs, FPGAs, GPUs, Intel’s many-core MIC, and distributed clusters. In [4] the authors provide a generalized framework for solving this all-pairs classification of algorithms and show performance improvements for biometrics and data mining applications in a distributed system, e.g., a cloud. A different approach was taken for a bioinformatics application seeking to reconstruct gene co-expression networks. The PCIT algorithm [5] was chosen to identify significant gene correlations. This method was optimized for Intel’s multicore Xeon and many-core MIC [6].
Every element interacting with every other element leads to a natural result of having all elements present in memory. The generalized framework [4] showed that efficiently distributing all of the input data to all of the nodes prior to beginning execution resulted in faster turnaround times than reading from the disk on demand.
The optimization of the PCIT algorithm [6] also ran into the need to hold all of the data in memory, so its authors created a second optimization strategy that had longer runtimes but a minimal memory footprint.

1.2 Relaxing the All Elements Present Requirement

N-body problems have a natural all-pairs decomposition called atom-decomposition [7] that is based on equal distribution of 𝑁 element responsibilities to 𝑃 parallel processes.
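A minimal sketch of the atom-decomposition idea follows. The contiguous block assignment and the function names are assumptions made for illustration; the excerpt does not specify how [7] assigns elements. Each process owns roughly N/P elements but pairs each of them with every other element, so it touches the entire input.

```python
def atom_decomposition(n: int, p: int) -> list[range]:
    """Split element indices 0..n-1 into p contiguous, near-equal blocks."""
    base, extra = divmod(n, p)
    blocks = []
    start = 0
    for rank in range(p):
        # The first `extra` ranks take one additional element.
        size = base + (1 if rank < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def interactions_for(block: range, n: int) -> list[tuple[int, int]]:
    """Ordered pairs (i, j) a process evaluates: each owned i against every other j."""
    return [(i, j) for i in block for j in range(n) if j != i]


blocks = atom_decomposition(10, 4)
pairs = interactions_for(blocks[0], 10)

# The process that owns only 3 elements still reads all 10 elements.
touched = {j for _, j in pairs} | set(blocks[0])
assert touched == set(range(10))
```

This makes the replication cost visible: ownership is divided evenly, but the data each process must hold is not reduced at all.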
To address load imbalances and the need to communicate all data to all processes, the authors of [7] proposed force-decomposition, which still requires input data replication but reduces it to 2 arrays of N/√P elements per process. The authors in [8] showed that the amount of data replication in the system can be a variable (c), and that when c = √P a lower bound on communication is achieved. When c = 1, their solution behaved similarly to atom-decomposition, although requiring only 2 arrays of N/P elements per process. When c = √P, their solution behaved similarly to force-decomposition and still required 2 arrays of N/√P elements per process. Minimizing the amount of data replication in a distributed system, while maintaining efficient all-pairs algorithm operation, is a recurring theme in this classification of algorithms.

Quorum systems are commonly used for coordination and mutual exclusion in distributed systems [9], [10]. Their decentralized approach and slow quorum growth rate compared to the system size are two of the reasons that make them a good tool for managing replicated data [11]. In 1985, quorums of size O(√P) were shown to be achievable using finite projective planes [12]. Rel

