Deep Dive into 5Gperf: signal processing performance for 5G.
The 5Gperf project was conducted by Huawei research teams in 2016-17. It focused on accelerating signal-processing algorithms for a 5G base-station prototype. It improved on already-optimized SIMD-parallel CPU algorithms and produced a new software tool that raises programmer productivity when converting MATLAB code to optimized C.
As a leading vendor of wireless telecommunication systems, Huawei's Central Research Institute (CRI) Wireless Technology Lab is developing a 5G base-station prototype and has demonstrated its very high performance based on MIMO technology [1,2,3]. Base-station power consumption and throughput depend critically on the efficiency of the signal-processing system. Its algorithms are designed by wireless signal experts, usually in MATLAB, and then have to be converted to high-performance sequential C, a labor-intensive process of up to one person-month per new algorithm or pipeline-module version. The 5Gperf project has been a collaboration with the Paris team of Huawei's Central Software Institute (CSI) to improve key algorithms and to design a software tool that raises programmer productivity for high-performance C code. This paper summarizes the project and its results.
A new 5G base station has been prototyped, built, and tested. Its signal-processing system is a pair of algorithm pipelines that process signal packets, as shown in figure 1. Each stage in a pipeline implements a specialized algorithm, and the system throughput is limited by the speed of the slowest one. The system currently runs on a hardware configuration of five Huawei E9000 blade servers connected by InfiniBand, with one pipeline instance running in continuous mode per CPU core.
Each algorithm has been carefully designed in MATLAB to maximize signal quality. It is then converted to optimized sequential C code to become the compute kernel that implements the corresponding pipeline stage. This process is labor-intensive, and the results are not always optimal because of the complex interplay between the hardware architecture and the relatively small compute kernels. The 5Gperf project has scrutinized some performance-critical algorithms and provided a new software tool that improves the productivity and code performance of the MATLAB-to-C conversion. Turbo decoding has also been re-implemented and tested, yielding acceleration factors of 1.7 to 1.9 for its pipeline stage. The number of CPUs required for parallel multi-stream execution of multiple turbo-decode instances has accordingly been reduced from 20 to 12, with commensurate energy savings.
The 5Gperf project has also designed and implemented a new software tool called the optimizer, so that programmers can convert performance-naive MATLAB code to optimized C in much less time and in a reliable fashion. It provides high programmer productivity, the highest possible performance for the predefined operations it applies, and portability to ARM architectures through Numscale's bSIMD library [4,5]. The optimizer's principle and design are summarized in figure 6. The application developer in charge of the signal-processing pipeline's algorithms can transform a high-level algorithm description into optimized and portable C code by annotating critical portions of their code. Each such code segment corresponds to an algorithm building block available in the optimizer's database. The optimizer then replaces the annotations with the most efficient version of the building block for a given matrix size and target architecture.
The following example is valid input for optimizer.py:
1.  #include <iostream>
2.  #include
3.  /// PRAGMA INCLUDES
4.  /// PRAGMA FUNCTIONS
5.  int main() {
6.    std::cout << "begin" << std::endl;
7.    /// PRAGMA BEGIN algo, b, c,
8.    std::cout << "BAD" << std::endl;
9.    /// g, h
10.   std::cout << "BAD" << std::endl;
11.   /// PRAGMA END
12.   std::cout << "end" << std::endl;
13.   return 0;
14.   }

On the above example the optimizer will replace lines 6-10 with the best algorithm given by the executable bbs/algo with parameters b, c, g and h. The choice of best algorithm depends on vector sizes for SIMD libraries, loops and the target architecture. The parameters are passed to the bbs/algo executable as command-line arguments.
The output will look like this: