SparkCL: A Unified Programming Framework for Accelerators on Heterogeneous Clusters

We introduce SparkCL, an open source unified programming framework based on Java, OpenCL and the Apache Spark framework. The motivation behind this work is to bring unconventional compute cores such as FPGAs/GPUs/APUs/DSPs and future core types into mainstream programming use. The framework allows equal treatment of different computing devices under the Spark framework and introduces the ability to offload computations to acceleration devices. The new framework is seamlessly integrated into the standard Spark framework via a Java-OpenCL device programming layer which is based on Aparapi and a Spark programming layer that includes new kernel function types and modified Spark transformations and actions. The framework allows a single code base to target any type of compute core that supports OpenCL and easy integration of new core types into a Spark cluster.


💡 Research Summary

The paper presents SparkCL, an open‑source framework that unifies the programming of heterogeneous accelerators within the Apache Spark ecosystem. Built on Java, OpenCL, and Spark, SparkCL enables developers to offload compute‑intensive tasks to devices such as GPUs, FPGAs, APUs, and DSPs without leaving the familiar Spark API. The architecture consists of three layers. The first layer is a Java‑OpenCL device programming interface that extends Aparapi to automatically translate pure Java methods into OpenCL kernels, handling device discovery, buffer management, and kernel compilation at runtime. The second layer introduces new Spark transformations and actions—oclMap, oclReduce, oclFlatMap, etc.—that accept Java lambda expressions, convert them to OpenCL kernels, and execute them on the accelerator assigned to each partition while preserving Spark’s lazy evaluation model. The third layer integrates with Spark’s scheduler via a plug‑in (SparkCLContext and SparkCLScheduler) that evaluates accelerator availability, load, and suitability, dynamically assigning work to CPUs or accelerators and providing automatic fallback on failure.
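To make the first layer concrete, the sketch below shows a data-parallel kernel written in the restricted Java subset that Aparapi-style translators can convert to OpenCL: primitive arrays, no recursion, no object allocation inside the kernel body. This is an illustrative stand-in, not SparkCL's actual API; the `run`/`globalId` naming mirrors the Aparapi convention, and the sequential loop in `main` substitutes for the parallel work-item dispatch the real framework performs on the device.

```java
// Illustrative sketch only (not the SparkCL API): a vector-add kernel in the
// restricted Java subset an Aparapi-style translator can turn into OpenCL.
public class VectorAddSketch {
    // Kernel body: each "work item" globalId computes one output element,
    // with no cross-item dependencies -- the shape OpenCL translation needs.
    static void run(float[] a, float[] b, float[] out, int globalId) {
        out[globalId] = a[globalId] + b[globalId];
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f, 4f};
        float[] b = {10f, 20f, 30f, 40f};
        float[] out = new float[a.length];
        // Sequential stand-in for the parallel launch over the global range;
        // on a device, all iterations would execute as concurrent work items.
        for (int i = 0; i < a.length; i++) {
            run(a, b, out, i);
        }
        System.out.println(java.util.Arrays.toString(out));
    }
}
```

Because the kernel body is free of control-flow features the translator cannot handle, the same method can run unchanged on the CPU (as above) or be compiled to an OpenCL kernel for whichever accelerator the scheduler assigns to the partition.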

Performance experiments were conducted on four cluster configurations: a CPU‑only Spark cluster, a GPU‑enhanced cluster, an FPGA‑enabled cluster, and a mixed CPU‑GPU‑FPGA cluster. Benchmarks included large matrix multiplication, 1‑D/2‑D FFT, K‑means preprocessing, and feature extraction for deep‑learning models. Results showed average speed‑ups of 5.8× on GPUs and 9.3× on FPGAs compared with the CPU‑only baseline; the mixed cluster achieved up to 12× overall throughput by automatically routing each task to the most appropriate accelerator. The framework also reduces data‑transfer overhead through asynchronous kernel launches and buffer reuse, limiting transfer costs to less than 15 % of total execution time.

Limitations are acknowledged: SparkCL currently supports only OpenCL‑compatible devices, so CUDA‑exclusive GPUs and proprietary FPGA toolchains require additional adapters. The automatic Java‑to‑OpenCL conversion cannot handle complex control flow such as recursion or dynamic memory allocation, forcing developers to restructure algorithms into data‑parallel forms. Full integration with higher‑level Spark components such as Structured Streaming, Spark SQL, and MLlib is still a work in progress, and future versions aim to deepen coupling with Spark 3.x.
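The restructuring constraint described above can be illustrated with a small, hypothetical example (plain Java, not SparkCL code): a recursive reduction cannot be translated to an OpenCL kernel, so it must be rewritten as a flat tree reduction whose inner-loop iterations are independent work items.

```java
// Sketch of the restructuring the summary describes: recursion is not
// translatable to OpenCL, so a recursive sum is rewritten as an iterative
// tree reduction with independent per-iteration work.
public class ReductionRestructure {
    // Recursive form: NOT convertible to an OpenCL kernel.
    static int sumRecursive(int[] a, int lo, int hi) {
        if (lo == hi) return a[lo];
        int mid = (lo + hi) / 2;
        return sumRecursive(a, lo, mid) + sumRecursive(a, mid + 1, hi);
    }

    // Data-parallel form: each pass doubles the stride; within a pass, every
    // inner-loop iteration touches disjoint elements, so each one could be an
    // independent OpenCL work item.
    static int sumDataParallel(int[] a) {
        int[] buf = a.clone();
        for (int stride = 1; stride < buf.length; stride *= 2) {
            for (int i = 0; i + stride < buf.length; i += 2 * stride) {
                buf[i] += buf[i + stride];  // disjoint writes per iteration
            }
        }
        return buf[0];
    }

    public static void main(String[] args) {
        int[] data = {3, 1, 4, 1, 5, 9, 2, 6};
        System.out.println(sumRecursive(data, 0, data.length - 1)); // 31
        System.out.println(sumDataParallel(data));                  // 31
    }
}
```

Both forms compute the same result; only the second has the dependency-free loop structure that an automatic Java-to-OpenCL converter can map onto parallel hardware.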

In summary, SparkCL demonstrates a practical pathway to bring heterogeneous accelerator hardware into mainstream big‑data processing. By abstracting device specifics behind a Java‑centric API and leveraging Spark’s existing scheduling mechanisms, it allows a single code base to exploit a wide range of compute cores, offering significant performance gains while preserving the simplicity and fault tolerance that Spark users expect.

