FPGA-based Multi-Chip Module for High-Performance Computing

Current integration, architectural design and manufacturing technologies are not suited to the computing density and power efficiency required by Exascale computing. New approaches in hardware architecture are thus needed to overcome the technological barriers preventing the transition to the Exascale era. In that scope, we report the successful fabrication of the first ExaNoDe MCM prototypes dedicated to Exascale computing applications. Each MCM is composed of two Xilinx Zynq UltraScale+ MPSoCs assembled on an advanced 68.5 mm x 55 mm laminate substrate specifically designed and fabricated for the project. Acoustic microscopy, X-ray, cross-section and Thermo-Moire investigations revealed no voids, shorts, delamination, cracks or warpage issues. Two MCMs were mounted on a daughter board by FORTH for testing purposes. The DDR memories on the four SODIMMs of the daughter board were successfully tested by running extensive Xilinx memory tests with clock frequencies of 1866 MHz and 2133 MHz. All four FPGAs were programmed with the Xilinx integrated bit error ratio test (IBERT), tailored for this board, for link testing. All intra-board high-speed links between the FPGAs were stable at 10 Gbps, even under the more demanding 31-bit PRBS (Pseudorandom Binary Sequence) tests.


💡 Research Summary

The paper reports the fabrication and testing of the first ExaNoDe multi‑chip module (MCM) prototypes, a hardware architecture aimed at the density and power‑efficiency requirements of Exascale computing. Arguing that conventional integration, architectural design, and manufacturing technologies are insufficient for the next generation of supercomputers, the authors propose an MCM that places two Xilinx Zynq UltraScale+ MPSoC devices on a custom‑fabricated laminate substrate measuring 68.5 mm × 55 mm.

The design phase focused on maximizing inter‑chip communication speed while minimizing power loss. Each MPSoC combines FPGA programmable logic with Arm Cortex‑A53 application cores and Cortex‑R5 real‑time cores, enabling heterogeneous workloads that span data‑intensive processing and control‑oriented tasks. Placing both devices on a single substrate shortens the inter‑chip traces compared with a board‑to‑board connection, which lowers signal attenuation and improves overall energy efficiency. Both are critical for Exascale systems, where power budgets are tightly constrained.

Manufacturing employed state‑of‑the‑art multilayer laminate technology, fine‑pitch BGA packaging, and precision high‑speed trace routing. To verify structural integrity, the authors performed a suite of inspections: acoustic microscopy, X‑ray imaging, and Thermo‑Moire measurements, which are non‑destructive, plus cross‑section analysis, which is destructive. None of the inspections found voids, shorts, delamination, cracks, or warpage issues, indicating that the assembly process produced structurally sound prototypes.

For functional validation, two MCMs (four MPSoCs in total) were mounted on a daughter board developed by the Foundation for Research and Technology – Hellas (FORTH). The daughter board carries four SODIMMs of DDR memory. Extensive Xilinx memory tests were run at clock frequencies of 1866 MHz and 2133 MHz, and the memories on all four SODIMMs passed, demonstrating that the memory interface design operates correctly at both speeds.
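As a rough illustration of what a read‑back memory test checks, the sketch below runs two classic patterns (walking ones on the data bus, and address‑as‑data across the array) against a simulated memory. This is not the Xilinx memory test used in the paper; `SimulatedMemory`, the 32‑bit word width, and the injected stuck‑at fault are all hypothetical and exist only to show how a failure would surface.

```python
class SimulatedMemory:
    """Hypothetical word-addressable memory; can force one data bit to 0."""

    def __init__(self, words, stuck_low_bit=None):
        self.cells = [0] * words
        self.stuck_low_bit = stuck_low_bit

    def write(self, addr, value):
        if self.stuck_low_bit is not None:
            # Model a data line stuck at 0: the bit never reaches the cell.
            value &= ~(1 << self.stuck_low_bit)
        self.cells[addr] = value & 0xFFFFFFFF

    def read(self, addr):
        return self.cells[addr]


def walking_ones(mem, addr=0, width=32):
    """Write a single 1 in each bit position; return the bits that fail."""
    failed = []
    for bit in range(width):
        pattern = 1 << bit
        mem.write(addr, pattern)
        if mem.read(addr) != pattern:
            failed.append(bit)
    return failed


def address_as_data(mem):
    """Store each address in its own cell, then verify every cell."""
    size = len(mem.cells)
    for addr in range(size):
        mem.write(addr, addr)
    return [addr for addr in range(size) if mem.read(addr) != addr]


good = SimulatedMemory(1024)
faulty = SimulatedMemory(1024, stuck_low_bit=5)  # illustrative fault only

good_report = (walking_ones(good), address_as_data(good))
faulty_bits = walking_ones(faulty)
faulty_addrs = address_as_data(faulty)
```

On the fault‑free model both tests return empty lists, which corresponds to a passing run. With the illustrative fault, the walking‑ones test isolates the failing data bit, and the address‑as‑data test fails at every address that has that bit set.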

High‑speed communication between the four FPGAs on the daughter board was evaluated using Xilinx’s Integrated Bit Error Ratio Test (IBERT), tailored for this board. All intra‑board serial links between the FPGAs were run at 10 Gbps and subjected to a 31‑bit pseudorandom binary sequence (PRBS‑31) test, the more demanding of the PRBS patterns used. The links remained stable throughout, indicating that the transceiver configuration and board‑level channels are sound at this line rate.
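To make the PRBS‑31 test concrete, here is a minimal sketch of a PRBS‑31 generator and a self‑synchronising checker, assuming the commonly used polynomial x^31 + x^28 + 1. It is a software model only and does not reproduce IBERT; inversion and bit‑ordering conventions of real test hardware are not modelled. The last function applies the standard zero‑error confidence bound to estimate how long an error‑free run must last to support a given bit error ratio. The `TARGET_BER` value is a hypothetical example, not a figure from the paper.

```python
import math
from itertools import islice

MASK31 = (1 << 31) - 1  # keep the shift register at 31 bits


def prbs31(seed=MASK31):
    """Yield bits obeying b[n] = b[n-31] XOR b[n-28] (x^31 + x^28 + 1)."""
    state = seed & MASK31
    if state == 0:
        raise ValueError("an all-zero seed locks the LFSR at zero")
    while True:
        # Bit 30 is the bit from 31 steps back, bit 27 the bit from 28 back.
        new_bit = ((state >> 30) ^ (state >> 27)) & 1
        state = ((state << 1) | new_bit) & MASK31
        yield new_bit


def count_mismatches(rx_bits):
    """Predict each bit from the received stream itself and count misses."""
    return sum(
        rx_bits[n] != (rx_bits[n - 31] ^ rx_bits[n - 28])
        for n in range(31, len(rx_bits))
    )


def bits_for_ber_bound(target_ber, confidence=0.95):
    """Error-free bits needed to claim BER <= target_ber at this confidence."""
    # Zero errors in N bits has probability about exp(-N * BER).
    return -math.log(1.0 - confidence) / target_ber


clean = list(islice(prbs31(), 5000))
corrupted = clean.copy()
corrupted[1000] ^= 1  # inject a single bit error

LINE_RATE_BPS = 10e9  # 10 Gbps, as stated in the text
TARGET_BER = 1e-12    # hypothetical target, for illustration only
seconds_needed = bits_for_ber_bound(TARGET_BER) / LINE_RATE_BPS
```

A single injected error produces three mismatches in this checker: once when the bad bit arrives, and twice more as it passes through the two feedback taps. Under these assumptions, roughly 300 seconds of error‑free traffic at 10 Gbps would be needed to bound the BER at 10⁻¹² with 95% confidence, and the required time scales inversely with the target.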

The authors discuss three key insights derived from their work. First, the MCM architecture mitigates the latency and power penalties associated with traditional multi‑board interconnects, offering a scalable path toward higher compute density. Second, the inspection regime, which combines non‑destructive imaging with cross‑sectioning, provides a reproducible methodology for verifying defect‑free assembly, a prerequisite for the volumes required in future supercomputing facilities. Third, the combination of programmable FPGA logic and Arm cores within a single module delivers the flexibility needed to accelerate diverse application domains, ranging from artificial‑intelligence inference to real‑time data analytics.

In conclusion, the study demonstrates that a carefully engineered FPGA‑based MCM can satisfy the core performance, reliability, and power‑efficiency metrics demanded by Exascale computing. By validating both the physical robustness of the substrate and the functional stability of high‑speed links and memory interfaces, the authors provide a solid foundation for further scaling efforts. Future work will explore larger MCM configurations with additional MPSoCs, advanced power‑management schemes, and full‑system benchmarks using representative Exascale workloads, moving the technology closer to commercial adoption in next‑generation supercomputers.

