Aquas: Enhancing Domain Specialization through Holistic Hardware-Software Co-Optimization based on MLIR

📝 Abstract

Application-Specific Instruction-Set Processors (ASIPs) built on the RISC-V architecture offer specialization opportunities for various applications. However, existing frameworks from the open-source RISC-V ecosystem suffer from limited performance due to restricted hardware synthesis and rigid compiler support. To address these challenges, we introduce Aquas, a holistic hardware-software codesign framework built upon MLIR. Aquas enhances ASIP synthesis with fast memory access capability via a burst DMA engine and advanced high-level synthesis (HLS) optimizations. On the compiler side, we propose an e-graph based retargetable approach with a novel matching engine for efficient instruction matching. Evaluation demonstrates up to 9.27× speedup on real-world workloads, including point cloud processing and LLM inference.

📄 Content

Edge computing is increasingly driving the need for Application-Specific Instruction-Set Processors (ASIPs), which accelerate applications through hardware specialization while maintaining programmability and generality with instruction set architecture extensions (ISAXs). RISC-V [23]-based ASIPs have been widely adopted in domains including signal processing [17], machine learning inference [2,27], cryptography [5,12,15], and computer graphics [19,20]. The design of high-performance ASIPs presents a significant hardware-software co-design challenge, demanding a framework that can both generate specialized hardware and adapt software to leverage it effectively.

Despite recent advances [14,25], creating a truly holistic ASIP design framework requires overcoming three fundamental challenges. First, a powerful and agile synthesis framework is needed to generate efficient hardware for diverse workloads. Commercial tools like Synopsys ASIP Designer [18] offer fine-grained control but rely on low-level languages such as nML [21], requiring significant expertise and manual effort that hinders agility. Research frameworks have their own limitations. Longnail [14] provides a high-level synthesis (HLS) based hardware synthesis flow but lacks the expressiveness for complex control patterns or custom memories, confining it to simple ISAXs. APS [25] is constrained by microarchitectural limits such as memory bandwidth. It also lacks crucial synthesis optimizations needed to generate highly parallel designs, such as the array partitioning and loop transformations common in HLS frameworks [1,10,26,28].
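Array partitioning, one of the HLS optimizations named above, can be illustrated with a toy cost model. The sketch below makes several assumptions. It uses cyclic partitioning into single-ported memory banks. It also assumes a loop unrolled by a fixed factor whose reads are issued together. All function names are hypothetical, and the code does not reflect how APS or Aquas implement the transformation.

```python
# Hypothetical toy model of cyclic array partitioning (a standard HLS
# optimization). Not APS's or Aquas's implementation.


def cyclic_bank(index: int, num_banks: int) -> tuple[int, int]:
    """Map a flat array index to (bank, offset) under cyclic partitioning."""
    return index % num_banks, index // num_banks


def max_accesses_per_bank(indices: list[int], num_banks: int) -> int:
    """Largest number of accesses that hit the same bank.

    Assumes one read port per bank, so this is the number of cycles
    needed to serve every access in `indices`.
    """
    counts: dict[int, int] = {}
    for i in indices:
        bank, _ = cyclic_bank(i, num_banks)
        counts[bank] = counts.get(bank, 0) + 1
    return max(counts.values())


def cycles_for_loop(n: int, unroll: int, num_banks: int) -> int:
    """Cycles to read a[0..n) when the loop is unrolled `unroll` times.

    Each group of `unroll` consecutive reads is issued together and takes
    as many cycles as its most-contended bank.
    """
    total = 0
    for start in range(0, n, unroll):
        group = list(range(start, min(start + unroll, n)))
        total += max_accesses_per_bank(group, num_banks)
    return total


if __name__ == "__main__":
    n, unroll = 64, 4
    print("1 bank :", cycles_for_loop(n, unroll, num_banks=1))
    print("4 banks:", cycles_for_loop(n, unroll, num_banks=4))
```

Consider 64 elements and an unroll factor of 4. A single bank serializes every read, which takes 64 cycles in this model. Four cyclic banks place each group of four consecutive reads in distinct banks, so the same loop takes 16 cycles. Unrolling alone does not expose parallelism without a matching memory layout.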

Second, a robustly retargetable compiler is essential to leverage these complex ISAXs automatically. The RISC-V ecosystem exhibits tremendous diversity in instruction backends, with countless instruction extensions featuring varying semantics and computational patterns. However, current compiler technologies remain primitive and lack true retargetability: they cannot systematically adapt to this wide variety of custom backends. Longnail [14] offers no compiler support, and APS [25] provides only basic pattern matching, lacking the sophisticated program transformation capabilities required. Even commercial tools [18] require users either to write C code in exactly the form that exposes the ISAX pattern or to call intrinsics explicitly. The fused semantics of modern ISAXs, which often contain internal loops and complex memory access patterns, render trivial pattern-matching techniques insufficient. Furthermore, application code must be structurally transformed to align with the ISAX, and existing frameworks cannot perform such transformations.
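Purely syntactic matching is brittle, as a hypothetical multiply-accumulate ISAX computing `x*y + z` shows. The sketch below implements an exact structural matcher. Expressions are nested tuples and pattern variables are strings prefixed with `?`; all names are hypothetical. The matcher finds the instruction only when the source happens to be written in the pattern's operand order.

```python
# Hypothetical sketch of a purely syntactic (structural) pattern matcher.
# Expressions are nested tuples (op, arg, ...); leaves are strings;
# pattern variables are strings starting with "?".
from typing import Optional, Union

Expr = Union[str, tuple]


def match(pattern: Expr, expr: Expr, binding: Optional[dict] = None) -> Optional[dict]:
    """Return variable bindings if `expr` has exactly the shape of `pattern`."""
    binding = dict(binding or {})
    if isinstance(pattern, str) and pattern.startswith("?"):
        # A pattern variable must bind consistently across the whole pattern.
        if pattern in binding and binding[pattern] != expr:
            return None
        binding[pattern] = expr
        return binding
    if isinstance(pattern, str) or isinstance(expr, str):
        return binding if pattern == expr else None
    if len(pattern) != len(expr) or pattern[0] != expr[0]:
        return None
    for p_arg, e_arg in zip(pattern[1:], expr[1:]):
        binding = match(p_arg, e_arg, binding)
        if binding is None:
            return None
    return binding


# Hypothetical multiply-accumulate ISAX: mac(x, y, z) = x*y + z
MAC_PATTERN = ("add", ("mul", "?x", "?y"), "?z")

written_as_pattern = ("add", ("mul", "a", "b"), "c")  # a*b + c
written_commuted = ("add", "c", ("mul", "a", "b"))    # c + a*b, same value

print(match(MAC_PATTERN, written_as_pattern))  # {'?x': 'a', '?y': 'b', '?z': 'c'}
print(match(MAC_PATTERN, written_commuted))    # None
```

Both expressions compute the same value, yet only the first is recognized. For ISAXs with internal loops and memory accesses, the number of equivalent spellings grows far beyond simple operand order.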

Third, the hardware and software design flows should be integrated within a unified, holistic infrastructure. Separate toolchains prevent iterative co-design and optimization. While Longnail and HLS tools like HECTOR [26] and Dynamatic [10] use the MLIR [11] infrastructure, they do not integrate a complete compiler retargeting flow, thereby missing the opportunity to leverage a unified infrastructure for comprehensive co-optimization.

In this work, we present Aquas, a holistic, MLIR-based framework for automated hardware-software co-design that generates optimized ASIPs and their corresponding compiler support. To overcome hardware synthesis limitations, Aquas provides a complete flow to specify and generate ISAXs with a burst-capable Direct Memory Access (DMA) interface, eliminating memory bottlenecks. By leveraging MLIR's optimization infrastructure, our framework applies advanced HLS techniques to generate highly parallel hardware that fully exploits data-level parallelism. For the compiler, Aquas proposes a novel and robust retargetable approach built on the e-graph [24] data structure. The e-graph supports the joint application of algebraic and control-flow rewrites while compactly maintaining the space of possible program variants, exposing ISAX offloading opportunities. We propose a hybrid skeleton-component matching engine to effectively map application patterns to complex ISAXs. Finally, Aquas unifies the entire ASIP design flow, from hardware specification through synthesis to compiler retargeting, within the MLIR infrastructure. This unified approach enables seamless co-optimization across the hardware-software boundary and facilitates agile design iteration.
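The core e-graph idea can be sketched in a few dozen lines. E-classes are tracked with union-find, and e-nodes are hash-consed on their canonical child ids. Applying a rewrite adds the rewritten form to the same e-class instead of replacing the original. The sketch below makes simplifying assumptions:

- a naive fixpoint rebuild;
- a single commutativity rule;
- no control-flow rewrites;
- no cost-based extraction.

The class and function names are hypothetical, and the code does not reproduce Aquas's skeleton-component matching engine. After one commutativity rewrite, a hypothetical multiply-accumulate pattern `(add (mul ?x ?y) ?z)` matches code written as `c + a*b`.

```python
# Minimal e-graph sketch: union-find + hash-consing + naive rebuild.
# Hypothetical illustration of the data structure only; not Aquas's engine.


class EGraph:
    def __init__(self):
        self.parent: list[int] = []           # union-find over e-class ids
        self.hashcons: dict[tuple, int] = {}  # e-node (op, child ids) -> e-class id

    def find(self, a: int) -> int:
        while self.parent[a] != a:
            self.parent[a] = self.parent[self.parent[a]]  # path halving
            a = self.parent[a]
        return a

    def canon(self, node: tuple) -> tuple:
        op, children = node
        return (op, tuple(self.find(c) for c in children))

    def add(self, op: str, *children: int) -> int:
        node = self.canon((op, children))
        if node not in self.hashcons:
            self.parent.append(len(self.parent))
            self.hashcons[node] = len(self.parent) - 1
        return self.find(self.hashcons[node])

    def merge(self, a: int, b: int) -> None:
        a, b = self.find(a), self.find(b)
        if a != b:
            self.parent[b] = a
            self.rebuild()

    def rebuild(self) -> None:
        # Re-canonicalize all e-nodes; congruent e-nodes (same op, same
        # canonical children) in different classes are merged until fixpoint.
        changed = True
        while changed:
            changed = False
            new: dict[tuple, int] = {}
            for node, cid in self.hashcons.items():
                node, cid = self.canon(node), self.find(cid)
                if node in new and self.find(new[node]) != cid:
                    self.parent[cid] = self.find(new[node])
                    changed = True
                new[node] = self.find(cid)
            self.hashcons = new

    def nodes_in(self, cid: int) -> list[tuple]:
        cid = self.find(cid)
        return [n for n, c in self.hashcons.items() if self.find(c) == cid]


def ematch(eg: EGraph, pattern, cid: int, binding: dict):
    """Yield bindings (pattern variable -> e-class id) matching `pattern` at `cid`."""
    cid = eg.find(cid)
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern not in binding:
            yield {**binding, pattern: cid}
        elif eg.find(binding[pattern]) == cid:
            yield binding
        return
    op, *args = pattern if isinstance(pattern, tuple) else (pattern,)
    for node_op, children in eg.nodes_in(cid):
        if node_op != op or len(children) != len(args):
            continue
        partial = [binding]  # thread bindings through children left to right
        for p_arg, child in zip(args, children):
            partial = [b2 for b in partial for b2 in ematch(eg, p_arg, child, b)]
        yield from partial


def apply_commutativity(eg: EGraph, op: str) -> None:
    """One round of the rewrite (op ?a ?b) => (op ?b ?a); results are unioned."""
    for (node_op, children), cid in list(eg.hashcons.items()):
        if node_op == op and len(children) == 2:
            eg.merge(cid, eg.add(op, children[1], children[0]))


MAC_PATTERN = ("add", ("mul", "?x", "?y"), "?z")  # hypothetical ISAX pattern

eg = EGraph()
a, b, c = eg.add("a"), eg.add("b"), eg.add("c")
root = eg.add("add", c, eg.add("mul", a, b))   # program written as c + a*b
print(list(ematch(eg, MAC_PATTERN, root, {})))  # [] before rewriting
apply_commutativity(eg, "add")
print(list(ematch(eg, MAC_PATTERN, root, {})))  # one match binding ?x, ?y, ?z
```

The original `c + a*b` e-node stays in its e-class alongside the rewritten `a*b + c`. Both variants therefore remain available to later rewrites and matches, which is the compact representation of program variants that the paragraph describes.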

Our contributions are as follows:

• We introduce Aquas, a holistic open-source framework for unified ASIP hardware-software co-design built upon MLIR.
• We propose an enhanced synthesis flow with burst-capable DMA, memory subsystem support, and advanced HLS optimizations that unleash the performance potential of ASIP ISAXs.
• We design a novel e-graph-based retargetable compiler for automatic adoption of ISAXs, featuring a skeleton-component matching engine to map application code to complex instructions.

We demonstrate Aquas using two real-world case studies: point-cloud analysis and large language model (LLM) inference on CPUs. Compared to t

This content is AI-processed based on ArXiv data.