Implementing OpenSHMEM for the Adapteva Epiphany RISC Array Processor

Reading time: 5 minutes
...

📝 Original Info

  • Title: Implementing OpenSHMEM for the Adapteva Epiphany RISC Array Processor
  • ArXiv ID: 1604.04205
  • Date: 2016-04-15
  • Authors: James A. Ross, David A. Richie

📝 Abstract

The energy-efficient Adapteva Epiphany architecture exhibits massive many-core scalability in a physically compact 2D array of RISC cores with a fast network-on-chip (NoC). With fully divergent cores capable of MIMD execution, the physical topology and memory-mapped capabilities of the core and network translate well to partitioned global address space (PGAS) parallel programming models. Following an investigation into the use of two-sided communication using threaded MPI, one-sided communication using SHMEM is being explored. Here we present work in progress on the development of an OpenSHMEM 1.2 implementation for the Epiphany architecture.

💡 Deep Analysis

Figure 1: A general overview of the Adapteva Epiphany architecture.

📄 Full Content

The Adapteva Epiphany MIMD architecture [1], currently realized in the inexpensive Parallella platform, remains a challenge to program. Contributing factors include the limited core memory (32 KB shared between instructions and data), low off-chip bandwidth, an unfamiliar proprietary software stack, and the current inability to access host system calls within the asymmetric hybrid platform architecture. A general overview of the Epiphany architecture appears in Figure 1. The vendor-developed multi-core e-lib interface within the Epiphany SDK (eSDK) requires that parallel applications be rewritten to take advantage of underlying hardware features such as the on-chip dual-channel DMA engines and the 2D NoC topology; such applications cannot be reused on other platforms. The e-lib interface also lacks most of the multi-core primitives found in the OpenSHMEM interface, so few direct comparisons are available.

Computer architectures like Epiphany achieve excellent computational energy efficiency. The use of on-chip SRAM instead of off-chip DRAM decreases power consumption and reduces memory latency, while addressing the memory wall problem found in symmetric multiprocessor (SMP) architectures. Bandwidth scales with the number of cores in the same manner that DRAM bandwidth scales with the number of sockets in a distributed CPU cluster. The architecture may also be tiled, so a larger coprocessor can be created by placing additional processors on a circuit board and connecting them without additional glue logic. Conceptually, the greatest challenges to using the Epiphany cores effectively are the limited SRAM and the efficient execution of inter-processor communication primitives. In previous work, we demonstrated the use of a threaded MPI implementation to achieve high performance with a standard parallel programming API on the Epiphany architecture [2], [3]. The OpenSHMEM 1.2 standard provides one-sided communication routines well suited to Epiphany when executed in an SPMD manner. For this architecture, the OpenSHMEM API provides improved data referencing semantics and reduced interface complexity compared to MPI, thus reducing code size and increasing application performance. We present here the status of the ARL OpenSHMEM implementation for Epiphany. Most of the OpenSHMEM routines have been implemented, including all of the accessibility functions, atomics (add and fetch-and-operate), wait operations, reductions, locks, block data copy, and elemental data copy routines.
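To make the programming model concrete, the following is a minimal sketch of the kind of SPMD code these routines support: symmetric allocation, a one-sided put to a neighboring PE, and barriers. It uses only standard OpenSHMEM 1.2 calls and is generic OpenSHMEM usage, not code taken from the ARL implementation.

```c
#include <shmem.h>
#include <stdio.h>

int main(void)
{
    shmem_init();
    int me   = shmem_my_pe();
    int npes = shmem_n_pes();

    /* Symmetric allocation: every PE obtains this buffer at the same
       offset within its symmetric heap. */
    int *dest = (int *)shmem_malloc(sizeof(int));
    *dest = -1;
    shmem_barrier_all();

    /* One-sided put: write this PE's rank into the next PE's buffer
       (ring pattern); the target posts no matching receive. */
    int src = me;
    shmem_int_put(dest, &src, 1, (me + 1) % npes);

    shmem_barrier_all();  /* guarantees delivery before the read below */
    printf("PE %d received %d\n", me, *dest);

    shmem_free(dest);
    shmem_finalize();
    return 0;
}
```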

The eSDK uses a 2D identifier for the row and column of a core within the chip. This API choice restricts applications to rectangular sections of the chip rather than arbitrary or oddly sized work groups. There is no abstraction between a virtual process identifier and the physical core, which is problematic for future architectures that may contain broken or disabled cores within a larger 2D array. With the OpenSHMEM API, the one-dimensional virtual computational topology abstracts away the physical location and memory address. Calculating the physical address of a core within a workgroup requires only trivial logical and integer operations, as the sketch below illustrates. Direct comparisons between the APIs are challenging because the device code within the eSDK library contains only a subset of the routines found in OpenSHMEM: remote address calculation, a global barrier, block data memory copying, and multi-core locks.
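As an illustration of how cheap that mapping can be, the sketch below converts a 1D PE number to a 2D core ID and splices it into a remote pointer, using the fact that the upper 12 bits of a 32-bit Epiphany global address encode the core ID (6-bit mesh row, 6-bit column). The constants are assumptions for a 4x4 workgroup at the Parallella E16's default mesh placement (base row 32, base column 8); a real implementation would derive them from the workgroup configuration.

```c
#include <stdint.h>

#define LOG2_COLS 2u   /* assumed: 4 columns in a 4x4 workgroup */
#define BASE_ROW  32u  /* assumed: Parallella E16 default placement */
#define BASE_COL  8u

/* Map a 1D PE number to the 12-bit physical core ID (row, column). */
static inline uint32_t pe_to_coreid(uint32_t pe)
{
    uint32_t row = BASE_ROW + (pe >> LOG2_COLS);
    uint32_t col = BASE_COL + (pe & ((1u << LOG2_COLS) - 1u));
    return (row << 6) | col;
}

/* A local pointer becomes a remote pointer by splicing the target
   core ID into the upper 12 bits of the 32-bit address. */
static inline void *remote_ptr(const void *local, uint32_t pe)
{
    uint32_t offset = (uint32_t)(uintptr_t)local & 0x000FFFFFu;
    return (void *)(uintptr_t)((pe_to_coreid(pe) << 20) | offset);
}
```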

Compared to the previous MPI work on this architecture, the simplicity of the explicit type specialization of OpenSHMEM routines enables a more compact implementation, saving limited core memory. Additionally, one-sided communication and weaker synchronization requirements reduce the effective code size of an application compared with explicit two-sided threaded MPI routines. Fewer specialized routines are also needed in OpenSHMEM to build out a full implementation. The MPI specification makes no assumptions about symmetric memory allocation, so additional inter-processor coordination may be required for correct remote address calculation.
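The type specialization is visible in the reduction interface, where each pairing of operation and type is a distinct routine with fixed argument types, so no datatype or operation descriptors need to be passed or decoded at run time. The sketch below sums one integer across all PEs using the symmetric work and synchronization arrays required by the OpenSHMEM 1.2 specification; it is generic OpenSHMEM usage, not ARL-specific code.

```c
#include <shmem.h>

/* Reduction scratch space must be symmetric; file-scope statics
   satisfy this. Sizes and initial values come from <shmem.h>. */
static long pSync[SHMEM_REDUCE_SYNC_SIZE];
static int  pWrk[SHMEM_REDUCE_MIN_WRKDATA_SIZE];
static int  src, sum;

int sum_over_all_pes(void)
{
    for (int i = 0; i < SHMEM_REDUCE_SYNC_SIZE; i++)
        pSync[i] = SHMEM_SYNC_VALUE;
    shmem_barrier_all();  /* all PEs see the initialized pSync */

    src = shmem_my_pe();
    /* One fully typed call: active set is all PEs (start 0, stride 0). */
    shmem_int_sum_to_all(&sum, &src, 1, 0, 0, shmem_n_pes(), pWrk, pSync);
    return sum;
}
```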

Symmetric memory management is one of the most challenging aspects of the standard for the Epiphany architecture because there is no translation between the logical and physical addresses of local memory, and no allocation tracking. Provided allocations remain symmetric across cores, however, calculating remote addresses is trivial and requires no inter-core coordination. The symmetric allocation assumptions within SHMEM thus lead to less code and less inter-processor coordination than the MPI implementation. The SHMEM memory management routines are presently implemented using UNIX brk/sbrk semantics for linear, ordered allocation, which imposes rules on the ordering of reallocation and freeing; a sketch of this scheme appears below. We will address this limitation in the future to allow allocation accounting consistent with a conventional malloc.
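A minimal sketch of such a linear (bump) allocator follows, assuming a break pointer initialized to the base of each core's symmetric heap; the names are illustrative, not the ARL implementation. Because every PE issues the identical sequence of calls, the heaps stay symmetric without any inter-core coordination.

```c
#include <stddef.h>

static char *heap_brk;  /* assumed set to the symmetric heap base at startup */

/* brk/sbrk-style allocation: advance the break pointer linearly. */
void *symm_malloc(size_t size)
{
    size = (size + 7u) & ~(size_t)7u;  /* keep 8-byte alignment */
    char *block = heap_brk;
    heap_brk += size;
    return block;
}

/* Ordering rule from the text: only the most recent allocations may be
   freed, in reverse order, so freeing just lowers the break pointer. */
void symm_free(void *ptr)
{
    heap_brk = (char *)ptr;
}
```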

For the fully collective barrier, the Epiphany hardware wait-on-AND (WAND) barrier and an interrupt service routine are used, as outlined in the sketch below.
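A heavily simplified sketch of how such a barrier could be structured follows. It assumes eSDK-style interrupt helpers and a WAND interrupt identifier (named E_WAND_INT here; the exact name varies across eSDK versions), and it glosses over details a real implementation must handle, such as clearing the sticky WAND bit in the core STATUS register between barriers.

```c
#include <e_lib.h>

static volatile int wand_fired;

static void wand_isr(int signum)
{
    (void)signum;
    /* A production ISR must also clear the sticky WAND status bit
       so that the barrier can be reused. */
    wand_fired = 1;
}

void wand_barrier(void)
{
    wand_fired = 0;
    e_irq_attach(E_WAND_INT, wand_isr);  /* install the handler */
    e_irq_mask(E_WAND_INT, E_FALSE);     /* unmask the WAND interrupt */
    __asm__ __volatile__("wand");        /* assert this core's AND input */
    while (!wand_fired)                  /* IRQ fires once every core has */
        ;
}
```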
