GCoDE: Efficient Device-Edge Co-Inference for GNNs via Architecture-Mapping Co-Search
Reading time: 5 minutes
📝 Original Info
Title: GCoDE: Efficient Device-Edge Co-Inference for GNNs via Architecture-Mapping Co-Search
ArXiv ID: 2512.11856
Date: 2025-12-05
Authors: Ao Zhou, Jianlei Yang, Tong Qiao, Yingjie Qi, Zhi Yang, Weisheng Zhao, Chunming Hu
📝 Abstract
Graph Neural Networks (GNNs) have emerged as the state-of-the-art graph learning method. However, achieving efficient GNN inference on edge devices poses significant challenges, limiting their application in real-world edge scenarios. This is due to the high computational cost of GNNs and limited hardware resources on edge devices, which prevent GNN inference from meeting real-time and energy requirements. As an emerging paradigm, device-edge co-inference shows potential for improving inference efficiency and reducing energy consumption on edge devices. Despite its potential, research on GNN device-edge co-inference remains scarce, and our findings show that traditional model partitioning methods are ineffective for GNNs. To address this, we propose GCoDE, the first automatic framework for GNN architecture-mapping Co-design and deployment on Device-Edge hierarchies. By abstracting the device communication process into an explicit operation, GCoDE fuses the architecture and mapping scheme in a unified design space for joint optimization. Additionally, GCoDE's system performance awareness enables effective evaluation of architecture efficiency across diverse heterogeneous systems. By analyzing the energy consumption of various GNN operations, GCoDE introduces an energy prediction method that improves energy assessment accuracy and identifies energy-efficient solutions. Using a constraint-based random search strategy, GCoDE identifies the optimal solution in 1.5 hours, balancing accuracy and efficiency. Moreover, the integrated co-inference engine in GCoDE enables efficient deployment and execution of GNN co-inference. Experimental results show that GCoDE can achieve up to 44.9x speedup and 98.2% energy reduction compared to existing approaches across diverse applications and system configurations.
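To make the co-search idea concrete, below is a minimal, hypothetical sketch of the core mechanism the abstract describes: communication is treated as an explicit operation inside the candidate encoding, so a single sampled operation sequence fixes both the GNN architecture and its device-edge mapping, and a constraint-based random search keeps the most accurate candidate that meets latency and energy budgets. All operation names, cost numbers, and the accuracy proxy are illustrative assumptions, not GCoDE's actual implementation.

```python
import random

# Candidate GNN operations; "comm" abstracts the device-to-edge transfer as an
# explicit operation, so one sampled sequence encodes both the architecture and
# the partition point: ops before "comm" run on-device, ops after run on edge.
# Operation names are hypothetical placeholders.
GNN_OPS = ["gcn_conv", "gat_conv", "sage_conv", "mlp", "pool"]
COMM_OP = "comm"

def sample_candidate(num_layers=6):
    """Sample one point from the unified architecture-mapping design space."""
    arch = [random.choice(GNN_OPS) for _ in range(num_layers)]
    cut = random.randrange(num_layers + 1)      # mapping: where to split
    return arch[:cut] + [COMM_OP] + arch[cut:]  # fused architecture + mapping

# Hypothetical per-operation costs standing in for GCoDE's system-performance-
# aware predictors (milliseconds / joules on some device-edge pair).
LAT_MS = {"gcn_conv": 4.0, "gat_conv": 6.0, "sage_conv": 5.0,
          "mlp": 1.0, "pool": 0.5, "comm": 8.0}
ENERGY_J = {op: 0.3 * ms for op, ms in LAT_MS.items()}

def predicted_latency(cand):
    return sum(LAT_MS[op] for op in cand)

def predicted_energy(cand):
    return sum(ENERGY_J[op] for op in cand)

def accuracy_proxy(cand):
    """Placeholder for validation accuracy; in practice a trained model or
    proxy metric would supply this value."""
    return random.uniform(0.70, 0.95)

def constrained_random_search(trials, lat_budget_ms, energy_budget_j):
    """Constraint-based random search: among candidates meeting the latency
    and energy constraints, keep the one with the best accuracy proxy."""
    best, best_acc = None, -1.0
    for _ in range(trials):
        cand = sample_candidate()
        if predicted_latency(cand) > lat_budget_ms:
            continue
        if predicted_energy(cand) > energy_budget_j:
            continue
        acc = accuracy_proxy(cand)
        if acc > best_acc:
            best, best_acc = cand, acc
    return best, best_acc

best, acc = constrained_random_search(trials=1000, lat_budget_ms=30.0,
                                      energy_budget_j=9.0)
print(best, round(acc, 3))
```

The key design point this sketch illustrates is that the partition point is not chosen after the architecture: because "comm" lives in the same encoding as the GNN operations, architecture and mapping are sampled and evaluated jointly.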
📄 Full Content
GCoDE: Efficient Device-Edge Co-Inference for GNNs via Architecture-Mapping Co-Search
Ao Zhou, Jianlei Yang, Senior Member, IEEE, Tong Qiao, Yingjie Qi, Zhi Yang, Weisheng Zhao, Fellow, IEEE, Chunming Hu
Abstract—Graph Neural Networks (GNNs) have emerged as the state-of-the-art graph learning method. However, achieving efficient GNN inference on edge devices poses significant challenges, limiting their application in real-world edge scenarios. This is due to the high computational cost of GNNs and limited hardware resources on edge devices, which prevent GNN inference from meeting real-time and energy requirements. As an emerging paradigm, device-edge co-inference shows potential for improving inference efficiency and reducing energy consumption on edge devices. Despite its potential, research on GNN device-edge co-inference remains scarce, and our findings show that traditional model partitioning methods are ineffective for GNNs. To address this, we propose GCoDE, the first automatic framework for GNN architecture-mapping Co-design and deployment on Device-Edge hierarchies. By abstracting the device communication process into an explicit operation, GCoDE fuses the architecture and mapping scheme in a unified design space for joint optimization. Additionally, GCoDE's system performance awareness enables effective evaluation of architecture efficiency across diverse heterogeneous systems. By analyzing the energy consumption of various GNN operations, GCoDE introduces an energy prediction method that improves energy assessment accuracy and identifies energy-efficient solutions. Using a constraint-based random search strategy, GCoDE identifies the optimal solution in 1.5 hours, balancing accuracy and efficiency. Moreover, the integrated co-inference engine in GCoDE enables efficient deployment and execution of GNN co-inference. Experimental results show that GCoDE can achieve up to 44.9× speedup and 98.2% energy reduction compared to existing approaches across diverse applications and system configurations.
Index Terms—Graph Neural Networks, Device-Edge Co-Inference, Neural Architecture Search, Edge Devices, System Awareness
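The energy prediction method mentioned in the abstract can likewise be sketched. Under the simplifying assumption (mine, not the paper's) that total energy decomposes roughly linearly over per-operation contributions, one can profile a few operation sequences and fit per-operation energy costs by least squares; the operation names and measurements below are hypothetical placeholders, and the paper's actual model is richer and system-aware.

```python
import numpy as np

# Per-operation energy model: profile the energy of whole operation sequences,
# then fit one cost per operation type and predict a candidate's total energy
# from its operation composition. All data here is made up for illustration.
OPS = ["gcn_conv", "gat_conv", "sage_conv", "mlp", "pool", "comm"]

def op_counts(candidate):
    """Encode a candidate op sequence as a per-operation count vector."""
    return np.array([candidate.count(op) for op in OPS], dtype=float)

# Hypothetical profiled runs: (operation sequence, measured energy in joules).
profiled = [
    (["gcn_conv", "comm", "mlp"], 2.4),
    (["gat_conv", "gat_conv", "comm", "pool"], 4.1),
    (["sage_conv", "mlp", "comm", "gcn_conv"], 3.3),
    (["mlp", "pool", "comm"], 1.5),
    (["gcn_conv", "gcn_conv", "sage_conv", "comm"], 3.9),
    (["gat_conv", "comm", "mlp", "pool"], 2.8),
]

X = np.stack([op_counts(seq) for seq, _ in profiled])
y = np.array([energy for _, energy in profiled])

# Least-squares fit of per-operation energy costs.
per_op_energy, *_ = np.linalg.lstsq(X, y, rcond=None)

def predict_energy(candidate):
    """Predict total energy as the sum of learned per-operation costs."""
    return float(op_counts(candidate) @ per_op_energy)

print(predict_energy(["gcn_conv", "gat_conv", "comm", "mlp"]))
```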
This work is supported in part by the National Natural Science Foundation of China (Grant No. 62072019), the Fundamental Research Funds for the Central Universities, the Beijing Natural Science Foundation (Grant No. L243031), and the National Key R&D Program of China (Grant Nos. 2023YFB4503704 and 2024YFB4505601). The corresponding author is Jianlei Yang.
A. Zhou and C. Hu are with the School of Software, Beihang University, Beijing 100191, China.
J. Yang, T. Qiao, and Y. Qi are with the School of Computer Science and Engineering, Beihang University, Beijing 100191, China, and the Qingdao Research Institute, Beihang University, Qingdao 266104, China. Email: jianlei@buaa.edu.cn.
Z. Yang is with the School of Computer Science and Engineering, Peking University, Beijing 100871, China.
W. Zhao is with the School of Integrated Circuits and Engineering, Beihang University, Beijing 100191, China.
Manuscript received in September 2024, revised in August 2025, and accepted in October 2025.
I. INTRODUCTION
As edge devices become more intelligent and deep learning breakthroughs continue, the demand for deploying models to process various collected data in real time on the device side is growing [1, 2]. However, limited hardware resources make it challenging to meet latency and energy requirements when deploying complex deep learning models [3]. In particular, Graph Neural Networks (GNNs) have recently excelled in processing irregular data structures, making them a popular choice for graph-related applications in edge scenarios, such as point cloud processing [4] and natural language processing [5]. Additionally, the rising popularity of various sensors in mobile devices also encourages the deployment of GNNs to the wireless network edge for real-time sensing and interaction. For instance, autonomous drones require immediate obstacle detection from point clouds [6], where the latency of cloud communication is unacceptable. Likewise, executing speech-based interaction locally for smart assistants [7] is crucial for safeguarding user privacy. However, the significant computational cost of GNNs and the limited hardware resources on edge devices pose major challenges to meeting these strict real-time and energy requirements, severely limiting their application in real-world edge scenarios. This is demonstrated by deploying the popular point cloud processing model DGCNN [8] on a Raspberry Pi 3B, which achieves less than 0.3 fps, far below practical requirements (typically above 30 fps [9]).
Research efforts have been made to address the inefficiency of GNNs on edge devices. The authors of [4, 10] reduced GNN computation by manually simplifying the model structure. Meanwhile, HGNAS [11] and [12] adopted a more efficient hardware-aware neural architecture search (NAS) approach to design hardware-friendly GNNs for edge