Improving Spark Application Throughput Via Memory Aware Task Co-location: A Mixture of Experts Approach
Data analytic applications built upon big data processing frameworks such as Apache Spark are an important class of applications. Many of these applications are not latency-sensitive and can therefore run as batch jobs in data centers. By running multiple applications on a single host, task co-location can significantly improve server utilization and system throughput. However, effective task co-location is non-trivial: it requires an understanding of the computing resource requirements of the co-running applications in order to determine which tasks, and how many of them, can be co-located. In this paper, we present a mixture-of-experts approach to model the memory behavior of Spark applications. We achieve this by learning, offline, a range of specialized memory models on a set of typical applications; at runtime, we then determine which of the memory models, or experts, best describes the memory behavior of the target application. We show that by accurately estimating the resource level needed, a co-location scheme can effectively determine how many applications can be co-located on the same host to improve system throughput, taking into consideration the memory and CPU requirements of co-running application tasks. We apply our technique to a set of representative data analytic applications built upon the Apache Spark framework and evaluate it for system throughput and average normalized turnaround time on a multi-core cluster. Our approach achieves over 83.9% of the performance delivered by an ideal memory predictor. On average, we obtain an 8.69x improvement in system throughput and a 49% reduction in turnaround time over executing application tasks in isolation, which translates to a 1.28x improvement in system throughput and a 1.68x improvement in turnaround time over a state-of-the-art co-location scheme.
💡 Research Summary
This paper addresses the problem of improving the overall throughput of Apache Spark clusters that run batch‑oriented data‑analytic jobs. Because Spark executors allocate a fixed amount of heap memory for each application, a single job can monopolize a host’s memory even when its actual usage is far lower, preventing other jobs from being co‑located on the same machine. The authors propose a novel “memory‑aware task co‑location” scheme that first predicts the memory footprint of any Spark application for a given input size, and then uses this prediction together with measured CPU utilization to decide how many tasks can safely share a host without causing out‑of‑memory (OOM) errors or CPU contention.
The core of the approach is a Mixture‑of‑Experts (MoE) model for memory prediction. During an offline training phase the authors profile 44 representative Spark workloads (including sorting, PageRank, machine‑learning algorithms, etc.) with varying input sizes. For each workload they fit several parametric functions—linear, exponential, and logarithmic regressions—that capture the relationship between input size and memory consumption. Each fitted function becomes an “expert” specialized for a particular memory‑growth pattern.
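The offline fitting step can be sketched as follows. This is a minimal illustration, not the paper's implementation: each expert is one of the curve families named above (linear, logarithmic, exponential), fitted here by reducing each family to an ordinary least-squares problem; all function names and the error metric are choices made for this sketch.

```python
import math

def _linreg(xs, ys):
    """Closed-form ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

# Each expert maps input size (e.g. MB) to predicted memory use.
# The log and exp families are fitted by transforming to a linear problem.
def fit_linear(sizes, mem):
    a, b = _linreg(sizes, mem)
    return lambda x: a * x + b

def fit_log(sizes, mem):
    a, b = _linreg([math.log(s) for s in sizes], mem)
    return lambda x: a * math.log(x) + b

def fit_exp(sizes, mem):
    # Assumes positive memory readings, so log-transforming y is valid.
    a, b = _linreg(sizes, [math.log(m) for m in mem])
    return lambda x: math.exp(a * x + b)

EXPERTS = {"linear": fit_linear, "log": fit_log, "exp": fit_exp}

def fit_all(sizes, mem):
    """Fit every candidate curve to one workload's profiling data and
    report each fit's mean squared error, so the best-matching
    growth pattern can be identified offline."""
    out = {}
    for name, fit in EXPERTS.items():
        model = fit(sizes, mem)
        mse = sum((model(s) - m) ** 2 for s, m in zip(sizes, mem)) / len(sizes)
        out[name] = (model, mse)
    return out
```

For a workload whose memory footprint grows linearly with input size, the linear expert will show a near-zero error while the other families fit poorly, which is how each trained expert becomes specialized to one growth pattern.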
At runtime, when a new application arrives, the system runs the job on a small sample of the data (≈100 MB) and collects a handful of hardware performance counters (e.g., L1 data and instruction cache misses). These features are fed to a K‑Nearest‑Neighbour classifier that selects the most appropriate expert from the pre‑trained pool. The chosen expert’s parameters are then calibrated using two additional small‑scale runs on different input sizes, yielding a lightweight but accurate memory model for the specific job and dataset.
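The runtime selection step amounts to a small nearest-neighbour lookup over the profiled feature space. The sketch below assumes a two-dimensional feature vector (e.g. L1 data and instruction cache miss rates); the training values and labels are invented for illustration and are not the paper's data.

```python
import math
from collections import Counter

# Hypothetical training set: hardware-counter features from offline
# profiling runs, each labelled with the expert that best fitted that
# workload's memory growth. All numbers here are made up.
TRAIN = [
    ((0.021, 0.003), "linear"),
    ((0.030, 0.004), "linear"),
    ((0.150, 0.012), "log"),
    ((0.170, 0.015), "log"),
    ((0.480, 0.051), "exp"),
]

def select_expert(counters, k=3):
    """Pick the expert whose labelled training points lie closest
    (Euclidean distance) to the counters observed during the small
    ~100 MB profiling run; ties resolve by majority vote."""
    nearest = sorted(TRAIN, key=lambda t: math.dist(t[0], counters))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Once the expert (i.e. the curve family) is chosen, the two extra small-scale runs described above supply the data points needed to calibrate that curve's coefficients for the specific job.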
With the calibrated memory model and the average CPU usage observed during profiling, the scheduler evaluates each host’s remaining resources. It admits a new executor only if the sum of predicted memory footprints stays within the physical RAM and the aggregate CPU load does not exceed 100 %. If a task would cause an OOM condition, it is re‑executed in isolation (though this never occurred in the experiments). This decision process is applied repeatedly, allowing multiple Spark executors to run concurrently on the same node.
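The admission decision described above can be sketched as a simple per-host check. Field and function names here are illustrative, not taken from the paper:

```python
from dataclasses import dataclass

@dataclass
class Host:
    ram_mb: float           # physical RAM on the node
    cpu_pct: float          # current aggregate CPU load (0-100)
    reserved_mem_mb: float  # sum of predicted footprints of admitted executors

def can_admit(host, predicted_mem_mb, avg_cpu_pct):
    """Admit a new executor only if the summed predicted memory
    footprints fit within physical RAM and the aggregate CPU load
    would not exceed 100%."""
    fits_mem = host.reserved_mem_mb + predicted_mem_mb <= host.ram_mb
    fits_cpu = host.cpu_pct + avg_cpu_pct <= 100.0
    return fits_mem and fits_cpu

def admit(host, predicted_mem_mb, avg_cpu_pct):
    """Apply the check and, on success, reserve the resources so the
    same test can be repeated for the next arriving executor."""
    if can_admit(host, predicted_mem_mb, avg_cpu_pct):
        host.reserved_mem_mb += predicted_mem_mb
        host.cpu_pct += avg_cpu_pct
        return True
    return False
```

Applying `admit` repeatedly to the same `Host` mirrors how the scheduler packs several Spark executors onto one node until either predicted memory or CPU headroom runs out.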
The authors evaluate their scheme on a 40‑node multi‑core cluster (8 cores and 64 GB of RAM per node). The memory predictor achieves an average error of only 5% and attains 83.9% of the performance of an oracle predictor that knows the exact memory requirement. Compared with running each application in isolation, the proposed co‑location strategy yields an average 8.69× increase in system throughput and a 49% reduction in average normalized turnaround time; against a state‑of‑the‑art co‑location scheme, it delivers a 1.28× improvement in throughput and a 1.68× improvement in turnaround time.