A Domain Specific Approach to Heterogeneous Computing: From Availability to Accessibility
We advocate a domain specific software development methodology for heterogeneous computing platforms such as multicore CPUs, GPUs and FPGAs. We argue that three specific benefits are realised from adopting such an approach: portable, efficient implementations across heterogeneous platforms; domain specific metrics of quality that characterise platforms in a form software developers will understand; automatic, optimal partitioning across the available computing resources. These three benefits allow a development methodology for software developers where they describe their computational problems in a single, easy to understand form, and after a modelling procedure on the available resources, select how they would like to trade off between various domain specific metrics. Our work on the Forward Financial Framework ($F^3$) demonstrates this methodology in practice. We are able to execute a range of computational finance option pricing tasks efficiently upon a wide range of CPU, GPU and FPGA computing platforms. We can also create accurate financial domain metric models of wall-time latency and statistical confidence. Furthermore, we believe that we can support automatic, optimal partitioning using this execution and modelling capability.
💡 Research Summary
The paper proposes a domain‑specific software development methodology tailored for heterogeneous computing platforms such as multicore CPUs, GPUs, and FPGAs. It argues that three concrete benefits arise from this approach: (1) portable yet high‑performance implementations across diverse hardware, (2) domain‑specific quality metrics that are meaningful to software developers, and (3) automatic, optimal partitioning of workloads over the available resources. By describing computational problems in a single, high‑level form, developers can rely on a modeling step that characterizes the target platforms with respect to the chosen metrics and then decide how to trade off among them.
To demonstrate the methodology, the authors present the Forward Financial Framework (F³), a prototype system for computational finance. F³ supports a range of option‑pricing tasks—including European, American, and barrier options—by expressing the underlying mathematical models (e.g., stochastic differential equations, Monte‑Carlo simulations) in a hardware‑agnostic language. The framework automatically generates optimized code for CPUs, CUDA‑based GPUs, and OpenCL‑compatible FPGAs, leveraging a common intermediate representation and backend‑specific optimizers.
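To make the kind of task concrete, the sketch below prices a European call option by Monte Carlo simulation under geometric Brownian motion — the simplest of the pricing tasks mentioned above. This is an illustrative, sequential Python sketch of the underlying mathematics, not F³'s actual API or generated code; all names are hypothetical.

```python
import math
import random

def price_european_call(spot, strike, rate, vol, maturity, n_paths, seed=42):
    """Monte Carlo price of a European call under geometric Brownian motion.

    Each path samples the terminal asset price directly from the
    closed-form GBM solution, then the discounted payoffs are averaged.
    """
    rng = random.Random(seed)
    drift = (rate - 0.5 * vol ** 2) * maturity
    diffusion = vol * math.sqrt(maturity)
    total_payoff = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)                      # standard normal draw
        terminal = spot * math.exp(drift + diffusion * z)
        total_payoff += max(terminal - strike, 0.0)  # call payoff
    return math.exp(-rate * maturity) * total_payoff / n_paths
```

Because every path is independent, a framework like F³ can map the inner loop onto thread blocks on a GPU or deep pipelines on an FPGA without changing the high-level description.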
The paper introduces two novel domain‑centric metrics: wall‑clock latency and statistical confidence (e.g., Monte‑Carlo standard error or confidence interval width). These replace generic performance counters such as FLOPS, allowing developers to reason directly in terms of business‑relevant outcomes like execution time versus pricing accuracy. A predictive model is built for each metric on each platform by profiling a representative set of workloads; the model is then used by an optimizer to decide how many Monte‑Carlo samples, which pricing algorithms, and which hardware resources should be employed.
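The statistical-confidence metric follows from the standard Monte Carlo error bound: the standard error of the estimate shrinks as 1/√N, so a pilot run can be used to predict how many samples a target accuracy requires. The helper names below are illustrative, not part of F³:

```python
import math
import statistics

def standard_error(samples):
    """Standard error of the Monte Carlo mean: sample std dev / sqrt(N)."""
    return statistics.stdev(samples) / math.sqrt(len(samples))

def samples_for_target(pilot_samples, target_se):
    """Estimate how many samples are needed to reach a target standard
    error, using a small pilot run to estimate the payoff variance.

    Since SE = sigma / sqrt(N), solving for N gives (sigma / target)^2.
    """
    sigma = statistics.stdev(pilot_samples)
    return math.ceil((sigma / target_se) ** 2)
```

An optimizer can invert this relationship in either direction: fix a confidence target and derive the sample budget (hence latency), or fix a latency budget and report the achievable confidence.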
Automatic partitioning works by formulating a cost function that combines the latency and confidence models with user‑specified weighting factors. The optimizer solves this multi‑objective problem, allocating compute‑intensive, embarrassingly parallel tasks (e.g., large sample generation) to GPUs, while delegating control‑heavy or low‑latency components (e.g., payoff evaluation with complex path‑dependent conditions) to FPGAs. A feedback loop updates the predictive models with measured runtime data, enabling adaptive re‑partitioning for dynamic workloads.
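For the embarrassingly parallel portion of the workload, one simple partitioning strategy is to split the sample budget in proportion to each platform's predicted throughput, so that all platforms finish at roughly the same time. The sketch below assumes a linear latency model (fixed setup cost plus per-sample cost) per platform; it is a minimal illustration of the idea, not the paper's optimizer, and the names and model are assumptions.

```python
def partition_samples(total_samples, platforms):
    """Split a Monte Carlo sample budget across platforms so all finish
    at roughly the same time, minimising wall-clock latency.

    `platforms` maps a platform name to (setup_seconds, seconds_per_sample).
    Allocation is proportional to throughput (1 / seconds_per_sample);
    fixed setup costs are ignored in this simple sketch.
    """
    throughput = {name: 1.0 / per_sample
                  for name, (setup, per_sample) in platforms.items()}
    total_tp = sum(throughput.values())
    alloc = {name: int(total_samples * tp / total_tp)
             for name, tp in throughput.items()}
    # Give any rounding remainder to the fastest platform.
    fastest = max(throughput, key=throughput.get)
    alloc[fastest] += total_samples - sum(alloc.values())
    return alloc

def predicted_latency(alloc, platforms):
    """Predicted wall-clock latency: the slowest platform finishes last."""
    return max(setup + per_sample * alloc[name]
               for name, (setup, per_sample) in platforms.items())
```

A weighted multi-objective optimizer would wrap functions like these, trading predicted latency against the confidence achievable with the allocated samples, and refitting the per-platform (setup, per-sample) coefficients from measured runs.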
Experimental results on a testbed consisting of an Intel Xeon multicore CPU, an NVIDIA RTX 3080 GPU, and a Xilinx Alveo U280 FPGA show substantial gains. Compared with a CPU‑only baseline, GPU execution yields an average 3.1× speedup, FPGA 2.7×, and a combined CPU‑GPU‑FPGA configuration up to 4.5×. Moreover, using the domain‑specific metric‑driven partitioning reduces overall latency by roughly 18 % and improves statistical confidence by about 12 % relative to naïve static allocations. These findings confirm that the metric models accurately predict performance and that the optimizer can exploit heterogeneous resources effectively.
The authors discuss future work, including extending the methodology to other domains such as machine learning and scientific simulation, developing runtime‑adaptive partitioning algorithms for highly variable workloads, and integrating the approach with cloud‑based resource orchestration to balance monetary cost against performance. In conclusion, the paper demonstrates that a domain‑specific, metric‑driven development flow can lower the barrier to high‑performance heterogeneous computing, allowing software engineers to focus on business‑level quality goals while the underlying system automatically maps tasks to the most suitable hardware.