This paper presents a solution to efficiently explore the design space of communication adapters. In most digital signal processing (DSP) applications, the overall architecture of the system is significantly affected by the communication architecture, so designers need specifically optimized adapters. By explicitly modeling these communications within a graph-theoretic model and analysis framework, we automatically generate an optimized architecture, named Space-Time AdapteR (STAR). Our design flow takes as input a C description of the Input/Output data scheduling together with user requirements (throughput, latency, parallelism...), and formalizes the communication constraints through a Resource Constraints Graph (RCG). The properties of the RCG enable an efficient exploration of the architecture space in order to synthesize a STAR component. The proposed approach has been applied to the design of an industrial data-mixing block: an Ultra-Wideband interleaver.
The ever-growing complexity of applications and the shrinking time-to-market lead designers to look for advanced design methodologies. Indeed, to design such complex architectures within a short design time, it is necessary to raise the abstraction level of the design description to the system level, to explore the design space, and finally to automatically generate the hardware register transfer level (RTL) architecture. Nowadays, a widespread solution for handling design complexity is to reuse predesigned heterogeneous IP cores. Unfortunately, the main problem then arises from their integration.
In the multi-processor SoC (MPSoC) context (where IP cores can be processors, memories, buses…), the problems come from the interfaces and protocols of the components. To tackle interfacing and functional problems when designing MPSoC architectures, system integrators can use standard interfaces such as the Virtual Component Interface proposed by VSIA [16] and the Open Core Protocol proposed by the OCP International Partnership [17]. However, in addition to the protocol aspects, SoC designers also have to synchronize components and buffer data in order to ensure correct system behavior and to meet timing constraints. In [7], the authors propose to automatically generate simulation wrappers for MPSoC architectures.
However, in the field of Digital Signal Processing (DSP) applications (e.g. [13]), a multi-processor SoC (MPSoC) architecture may not be a well-suited solution because of design complexity. Optimized hardware accelerators (e.g. filters), composed of a set of computing blocks communicating through point-to-point links, are still needed. From this point of view, designers have to tackle problems such as throughput adaptation, data re-ordering (e.g. row-column), and Input/Output parallelism adaptation. Based on communication templates, [9] presents a generic interface unit architecture for communication synthesis in a platform-based design approach. In [1], a multiplexer/demultiplexer and FIFO-based interface architecture is used. In [6], the authors propose a systematic way of interfacing data-flow hardware accelerators (IP cores) for their integration in a system on chip. Their interface architecture is based on FIFO (queue) storage elements and a Direct Memory Access (DMA) module. They assume that the IPs are data synchronized (i.e. at each clock cycle a data item is presented and read). However, these previous approaches assume that the sequence of produced data is the same as the sequence of consumed data (no re-ordering). Moreover, FIFO sizes are computed by a “set and simulate” approach.
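To make the row-column re-ordering problem concrete, the following minimal Python sketch computes how many data items an adapter has to hold at once when a block is written row by row and read column by column. It is only an illustration under stated assumptions and is not the RCG-based method of this paper: the function names `transpose_schedules` and `peak_storage`, the one-item-per-cycle schedules, and the rule that a location freed in a cycle can be reused in that same cycle are all hypothetical choices made for the example.

```python
def transpose_schedules(rows, cols):
    """Build hypothetical write/read schedules for a row-column re-ordering.

    Item (i, j) is written at cycle i*cols + j (row-major, one item per cycle)
    and read in column-major order, also one item per cycle.
    """
    write = {(i, j): i * cols + j for i in range(rows) for j in range(cols)}
    order = {(i, j): j * rows + i for i in range(rows) for j in range(cols)}
    # Smallest read start offset such that every item is read at least
    # one cycle after it has been written.
    latency = max(write[d] + 1 - order[d] for d in write)
    read = {d: latency + order[d] for d in write}
    return write, read


def peak_storage(write, read):
    """Return the peak number of items stored simultaneously.

    Assumption: an item occupies storage from its write cycle until its
    read cycle, and a location released in a cycle can be reused in it.
    """
    events = []
    for item, w in write.items():
        r = read[item]
        if r < w:
            raise ValueError(f"item {item} is read before it is written")
        events.append((w, +1))  # item becomes live
        events.append((r, -1))  # item is released
    # Sort by cycle; at equal cycles, releases (-1) come before writes (+1).
    events.sort()
    live = peak = 0
    for _, delta in events:
        live += delta
        peak = max(peak, live)
    return peak


if __name__ == "__main__":
    w, r = transpose_schedules(2, 3)
    print(peak_storage(w, r))
```

For a 2x3 block, this sketch reports a peak of 3 live items, whereas a stream consumed in its production order with a one-cycle delay needs a single location. This kind of lifetime counting is one simple way to obtain a storage bound analytically instead of by repeated simulation.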
Obviously, the interfacing of DSP blocks greatly impacts the quality of the system (throughput, area, power consumption…), which is why efficient communication adapter design remains one of the most important points in complex system design. In fact, using Input/Output (I/O) wrappers can introduce unnecessary storage elements. Such wrappers may be needed in order to solve data re-ordering problems that can arise from the IP core integration. In [12], the authors aim at determining at compile time whether a FIFO is sufficient for every producer/consumer pair of a Kahn Process Network. When the sequence of produced data is different from the sequence of consumed data, extra storage and control on the consumer side is proposed [15]. This extra module includes a CAM (Content Addressable Memory) where data are addressed using a hash table. This solution enables the implementation of non-deterministic communications, but there is no optimization of the adapter overhead since overlapping of input and output data is not possible. In [2], a formal technique for hardware interface design is proposed. A generic interface model targeted by the communication synthesis is used. The low-level timing constraints can include strict timing specifications or a data transfer schedule. The interface synthesis is carried out by an allocation procedure of data storage components (FIFO, LIFO and register). However, the size of the storage elements is not computed or even taken into account during the design process. The proposed methodology is based on the NP-complete maximum clique problem. In [14], the authors develop a system-level IP reuse methodology where designs are described in three layers. Data transfer and data storage optimizations are done by reorganizing loop indexing and loop nesting. Unfortunately, the authors do not present the technique they use to produce the RTL component architecture from the algorithm specification.
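The distinction between a FIFO-compatible link and one that needs addressable storage on the consumer side can be illustrated with a short Python sketch. It is a deliberately simplified, untimed model working on explicit sequences: it reproduces neither the compile-time analysis of [12] nor the architecture of [15]. The function names, the `(tag, payload)` token format, and the use of a dictionary as a stand-in for a tag-addressed memory are assumptions made for this example.

```python
def fifo_is_sufficient(produced, consumed):
    """True when tags are consumed in exactly the order they are produced."""
    return [tag for tag, _ in produced] == list(consumed)


def reorder_with_tagged_store(produced, consumed):
    """Deliver payloads in the consumer's order using a tag-addressed store.

    produced: list of (tag, payload) tokens in production order.
    consumed: list of tags in the order the consumer requests them.
    Returns the delivered payloads and the maximum number of tokens that
    were held in the store at once (including the token being requested).
    """
    store = {}  # tag -> payload; hypothetical stand-in for a CAM
    stream = iter(produced)
    delivered = []
    max_parked = 0
    for wanted in consumed:
        # Pull tokens from the producer until the requested one has arrived.
        while wanted not in store:
            token = next(stream, None)
            if token is None:
                raise KeyError(f"tag {wanted!r} is never produced")
            tag, payload = token
            store[tag] = payload
            max_parked = max(max_parked, len(store))
        delivered.append(store.pop(wanted))
    return delivered, max_parked


if __name__ == "__main__":
    produced = [("a", 1), ("b", 2), ("c", 3), ("d", 4)]
    consumed = ["c", "a", "b", "d"]
    print(fifo_is_sufficient(produced, consumed))
    print(reorder_with_tagged_store(produced, consumed))
```

In this example the consumer asks for `c` first, so tokens `a` and `b` have to be parked until they are requested, which a plain FIFO cannot do. The sketch only counts parked tokens and does not model timing, so it says nothing about overlapping input and output data.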
In [4], the authors develop a set of techniques dedicated to the design of DSP algorithms. High-level synthesis of the processing unit is carried out under I/O timing and architectural constraints. The approach leads to an optimized data-path synthesis, but the communication unit still has to be designed.
In [11], the authors propose approaches that use Matlab/Simulink for the system specification and produce a VHDL RTL architecture of the system. Based on hardware macro ge