The complexity of multimedia applications, in terms of computational intensity and heterogeneity of the processed data, has led designers to implement them on multiprocessor systems on chip. The complexity of these systems, on the one hand, and consumer expectations, on the other, make it harder for designers to conceive and deliver robust, successful systems within the shortest deadlines. Designers have to explore the different solutions in the design space and estimate their performance in order to identify the solution that satisfies their design constraints. In this context, we propose a model of one possible solution in the design space: software-to-hardware task migration. The model uses synchronous dataflow graphs to account for the different impacts of migration and to estimate the resulting performance in terms of throughput.
Deep Dive into Performance Analysis of Software to Hardware Task Migration in Codesign.
The development of multimedia applications is reaching a peak, driven by growing consumer needs in all domestic and professional audio and video domains. To meet these increasingly stringent needs, embedded systems are evolving rapidly towards multiprocessor systems on chip (MPSoCs), particularly those that use networks on chip (NoCs) as their communication architecture. The number of processors per chip, the diversity of their types and their communications all complicate MPSoC design, as does the complexity of multimedia applications in terms of computational intensity and the abundance and heterogeneity of data. The principal challenge for designers is therefore to cope with the complexity of NoC-based MPSoC design and to deliver robust systems in the shortest possible time. To deal with these conflicting design challenges, designers have to estimate the principal characteristics of the final system early in the MPSoC design process, so that the final implementation guarantees productivity and quality simultaneously.
Designers must control the ever-growing MPSoC Design Space Exploration (DSE), in which different choices are investigated to determine the one that offers a fair compromise between the conflicting design objectives. Performance estimation is typically an important part of DSE. Different choices of application parallelization, target platform and mapping of the application onto the platform need to be evaluated against different quality criteria. If the constraints set by the designers (energy consumption, throughput, etc.) are not met, the application decomposition, the platform and/or the proposed mapping should be modified in order to find an MPSoC configuration that meets those constraints.
In recognition of the growing need for MPSoC performance estimation, different approaches aim at estimating the overall system performance. In [1], three approaches are defined. First, the simulation-based approach evaluates the system behavior by means of simulation (native execution, Instruction Set Simulator, etc.) at different abstraction levels. Kai Huang et al. [2] use the Simulink platform to simulate multimedia applications on different hardware platforms. The H.264 decoder serves as a case study to validate this work. Its simulation on different platforms (varying the type and number of processors) estimates the number of cycles consumed per processor for execution and communication. In the same way, Teresa Medina Leon [3] proposes simulating the MJPEG decoder on the MiniNoC platform to estimate the time required to decode a frame. The MiniNoC platform, implemented in C++, simulates at the register transfer level a platform composed of four mini MIPS processors placed on four nodes that communicate with each other via a mini NoC composed of four routers.
Second, the trace-based approach consists in collecting application execution traces. Designers run a single simulation at the beginning of the design phase, from which they extract traces of the application execution on the target platform. These traces can be, for instance, the size of the data transferred on the communication platform, the number of transactions between every pair of tasks, the execution times of the different tasks, etc. The collected traces are organized in the form of a Communication Analysis Graph (CAG). Analyzing the CAG allows the designers to produce several statistics about the system performance. The trace-based approach is generally used when the initial simulation is difficult to reproduce. This approach was exploited in [4] to obtain the optimal communication mapping on a predefined target architecture, assuming that the application is already partitioned and mapped. The traces collected in that work are used to calculate, for every edge of the CAG, a weight that reflects the frequency, the volume and the criticality of the transactions between every pair of communicating units.
Finally, the static approach, which tries to avoid computationally prohibitive and exhaustive simulation, makes use of “static” models such as graphs, mathematical equations, UML components [5] and XML tags to estimate MPSoC performance.
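To make the trace-based approach more concrete, the following minimal Python sketch aggregates a list of traced transactions into per-edge statistics of a CAG-like structure and derives one weight per edge. The record layout (`Transfer`), the function names and, in particular, the linear weighting with coefficients `alpha`, `beta` and `gamma` are assumptions chosen for illustration; they are not the data structures or the weight formula used in [4].

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """One traced transaction between two tasks (hypothetical record layout)."""
    src: str
    dst: str
    size_bytes: int
    critical: bool = False


def build_cag(trace):
    """Aggregate traced transfers into per-edge statistics.

    Each edge (src, dst) stores the number of transactions (frequency),
    the total transferred bytes (volume) and the number of transfers
    flagged as critical.
    """
    edges = defaultdict(lambda: {"count": 0, "bytes": 0, "critical": 0})
    for t in trace:
        stats = edges[(t.src, t.dst)]
        stats["count"] += 1
        stats["bytes"] += t.size_bytes
        stats["critical"] += int(t.critical)
    return dict(edges)


def edge_weight(stats, alpha=1.0, beta=1.0, gamma=1.0):
    """Assumed linear combination of frequency, volume and criticality.

    This is an illustrative choice, not the formula from [4].
    """
    return (alpha * stats["count"]
            + beta * stats["bytes"]
            + gamma * stats["critical"])


# Hypothetical trace of three transfers between decoder-like tasks.
trace = [
    Transfer("vld", "iq", 64),
    Transfer("vld", "iq", 64, critical=True),
    Transfer("iq", "idct", 128),
]
cag = build_cag(trace)
weights = {edge: edge_weight(stats) for edge, stats in cag.items()}
```

In practice the three coefficients would be tuned to the design objective, for example by increasing `gamma` when latency-critical transfers should dominate the mapping decision.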
In this paper, we opt for the static approach, specifically using Synchronous DataFlow Graphs (SDFGs) to model applications as well as their mapping onto target platforms. SDFGs are widely used for MPSoC performance estimation since they fit well with the characteristics of streaming multimedia applications. Moreover, they can model many decisions about mapping an application onto a NoC-based MPSoC by adding new actors and edges to the initial SDFG of the application. Among the several design flows [6] [7] [8] that use SDFGs as a model of computation, we focus in this paper on the predictable design flow established by Sander Stuijk [9]. These design flows do not model the migration o
…(Full text truncated)…
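As a small illustration of the SDFG model named above, the sketch below computes the repetition vector of an SDFG, that is, the smallest number of firings of each actor that returns every edge to its initial token count. It solves the standard balance equations (for an edge from actor a to actor b with production rate p and consumption rate c, q[a] * p = q[b] * c). The function name, the edge tuple layout and the three-actor example graph are assumptions made for this sketch and are not taken from the paper; `math.lcm` with several arguments requires Python 3.9 or later.

```python
from fractions import Fraction
from math import lcm


def repetition_vector(actors, edges):
    """Return the smallest integer repetition vector of a connected SDFG.

    actors: list of actor names.
    edges:  list of (src, dst, production_rate, consumption_rate).
    Raises ValueError if the graph is inconsistent or not connected.
    """
    # Undirected adjacency list carrying the firing ratio implied by each
    # balance equation: q[dst] = q[src] * prod / cons.
    adj = {a: [] for a in actors}
    for src, dst, prod, cons in edges:
        adj[src].append((dst, Fraction(prod, cons)))
        adj[dst].append((src, Fraction(cons, prod)))

    # Propagate relative firing rates from the first actor.
    rate = {actors[0]: Fraction(1)}
    stack = [actors[0]]
    while stack:
        a = stack.pop()
        for b, ratio in adj[a]:
            expected = rate[a] * ratio
            if b not in rate:
                rate[b] = expected
                stack.append(b)
            elif rate[b] != expected:
                raise ValueError("inconsistent SDFG: balance equations conflict")

    if len(rate) != len(actors):
        raise ValueError("SDFG is not connected")

    # Scale the fractional rates to the smallest integer solution.
    scale = lcm(*(f.denominator for f in rate.values()))
    return {a: int(f * scale) for a, f in rate.items()}


# Hypothetical three-actor pipeline: A -(2,3)-> B -(1,2)-> C.
q = repetition_vector(
    ["A", "B", "C"],
    [("A", "B", 2, 3), ("B", "C", 1, 2)],
)
```

One graph iteration consists of exactly these firings, which is why throughput analyses of SDFGs are usually expressed per iteration of the repetition vector.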