A Soft Processor Overlay with Tightly-coupled FPGA Accelerator

February 23, 2026

Reading time: 6 minutes

📝 Abstract

FPGA overlays are commonly implemented as coarse-grained reconfigurable architectures with the goal of improving designers’ productivity by balancing flexibility and ease of configuration of the underlying fabric. To truly facilitate full application acceleration, it is often necessary to also include a highly efficient processor that integrates and collaborates with the accelerators while maintaining the benefits of being implemented within the same overlay framework. This paper presents an open-source soft processor that is designed to be tightly coupled with FPGA accelerators as part of an overlay framework. RISC-V is chosen as the instruction set for its openness and portability, and the soft processor is designed as a 4-stage pipeline to balance resource consumption and performance when implemented on FPGAs. The processor is generically implemented so as to promote design portability and compatibility across different FPGA platforms. Experimental results show that integrated software-hardware applications using the proposed tightly-coupled architecture achieve performance comparable to that of hardware-only accelerators, while the proposed architecture provides additional run-time flexibility. The processor has been synthesized to both low-end and high-performance FPGA families from different vendors, achieving a highest frequency of 268.67 MHz and resource consumption comparable to existing RISC-V designs.

📄 Content

Ho-Cheung Ng, Cheng Liu, Hayden Kwok-Hay So
Department of Electrical & Electronic Engineering, The University of Hong Kong
{hcng, liucheng, hso}@eee.hku.hk

I. INTRODUCTION

By raising the abstraction level of the underlying configurable fabric, many early works have already demonstrated the promise of using FPGA overlays to improve designers’ productivity in developing hardware accelerators [1], [2].
While such hardware accelerators can often deliver significant performance improvement over their software counterparts, they are often fixed in functionality and lack the flexibility to process irregular input or data that depends on run-time dynamics. To truly take advantage of the performance benefit of hardware accelerators, it is therefore desirable to have an efficient CPU in the overlay tightly coupled with the accelerator to control its operations and to maintain compatibility with the rest of the software system.

To illustrate these intricate hardware-software codesign challenges, Algorithm 1 shows a simple design that accelerates the Sobel edge detection algorithm in such a heterogeneous system. In this implementation, an accelerator that computes 16 × 16 output pixels at a time is implemented in the FPGA. At run time, depending on the size of the user's input image, the software reuses this hardware accelerator for as many complete 16 × 16 blocks of output pixels as possible. The remaining odd pixels, as well as pixels on the boundary of the image where the standard filter kernel cannot readily operate, are handled in software.

    Data: Pixels of size N × N
    #define BUF 16    // HW computes 16x16 output pixels
    for r := 0 to N - 1 do
        for c := 0 to N - 1 do
            if pixel[r, c] is edge then
                SW_SOBEL(pixel, r, c);
            else if ((r - 1) % BUF) == 0 && ((c - 1) % BUF) == 0 then
                HW_SOBEL(pixel, r, c);
            else
                continue;
            end
        end
    end

Algorithm 1: Pseudocode for Sobel edge detector. As the hardware accelerator operates on a fixed 16 × 16 array of output pixels at a time, software passes control to the accelerator only for cases when all 17 × 17 pixels are available. Otherwise, the computation is carried out in software. Assume N - 2 is a multiple of BUF.

While the design of Algorithm 1 may be specific to the particular implementation of Sobel edge detection, it highlights several challenges commonly faced by many real-world hardware-software designers.
First of all, because of the limited flexibility of most hardware accelerators, the controlling software must ensure that the necessary input data are available before the accelerator is launched. Furthermore, unless the hardware accelerator is arbitrarily flexible, software running on the CPU must also be able to process any run-time data that cannot readily be processed by the accelerator.

In view of the above, this paper proposes the use of a small, open-source soft processor to provide fine-grained control for the hardware accelerator in the context of an overlay framework. The core is designed to be tightly coupled with the hardware accelerator in order to minimize the overhead involved in switching control between hardware and software. RISC-V RV32I [3] is chosen as the ISA for its openness and simplicity. Finally, the core is generically designed in order to promote design portability and compatibility.

As such, we consider that the main contribution of this work rests on the demonstration of the benefits of tightly coupling a lightweight CPU with a hardware accelerator to serve within a combined overlay architecture
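
The dispatch pattern of Algorithm 1 can be illustrated with a small Python sketch. It is not the authors' implementation: `hw_sobel` is a hypothetical software stand-in for the accelerator, writing 0 on the image border is an assumed policy (the text only says boundary pixels are handled in software), and the `|Gx| + |Gy|` magnitude is one common Sobel variant chosen here for simplicity.

```python
BUF = 16  # tile edge from Algorithm 1: the accelerator emits BUF x BUF outputs

# Standard 3x3 Sobel kernels.
GX = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
GY = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_pixel(img, r, c):
    """Return |Gx| + |Gy| at an interior pixel (needs a full 3x3 window)."""
    gx = gy = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            p = img[r + dr][c + dc]
            gx += GX[dr + 1][dc + 1] * p
            gy += GY[dr + 1][dc + 1] * p
    return abs(gx) + abs(gy)


def is_edge(n, r, c):
    """True for pixels on the image border, where the kernel does not fit."""
    return r in (0, n - 1) or c in (0, n - 1)


def sw_sobel(img, out, r, c):
    """Software path for one border pixel. Assumption: the output is 0."""
    out[r][c] = 0


def hw_sobel(img, out, r, c):
    """Hypothetical stand-in for the accelerator: fills the BUF x BUF
    output tile whose top-left corner is (r, c)."""
    for rr in range(r, r + BUF):
        for cc in range(c, c + BUF):
            out[rr][cc] = sobel_pixel(img, rr, cc)


def sobel_hybrid(img):
    """Dispatch loop mirroring Algorithm 1; returns the output image and
    the number of software and accelerator invocations."""
    n = len(img)
    assert (n - 2) % BUF == 0, "Algorithm 1 assumes N - 2 is a multiple of BUF"
    out = [[None] * n for _ in range(n)]
    calls = {"sw": 0, "hw": 0}
    for r in range(n):
        for c in range(n):
            if is_edge(n, r, c):
                sw_sobel(img, out, r, c)
                calls["sw"] += 1
            elif (r - 1) % BUF == 0 and (c - 1) % BUF == 0:
                hw_sobel(img, out, r, c)  # (r, c) is a tile origin
                calls["hw"] += 1
            # Otherwise the pixel is already covered by a launched tile.
    return out, calls


if __name__ == "__main__":
    n = 34  # N - 2 = 32 = 2 * BUF
    image = [[(3 * r + 5 * c) % 11 for c in range(n)] for r in range(n)]
    _, counts = sobel_hybrid(image)
    print(counts)
```

Under these assumptions, a 34 × 34 image leads to 4 accelerator launches (a 2 × 2 grid of tiles) and 4N - 4 = 132 border pixels handled in software, which shows how often control passes between the two sides even in a simple kernel.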

This content is AI-processed based on ArXiv data.
