GCoDE: Efficient Device-Edge Co-Inference for GNNs via Architecture-Mapping Co-Search
Reading time: 5 minutes
📝 Original Info
Title: GCoDE: Efficient Device-Edge Co-Inference for GNNs via Architecture-Mapping Co-Search
ArXiv ID: 2512.11856
Date: 2025-12-05
Authors: Ao Zhou, Jianlei Yang, Tong Qiao, Yingjie Qi, Zhi Yang, Weisheng Zhao, Chunming Hu
📝 Abstract
Graph Neural Networks (GNNs) have emerged as the state-of-the-art graph learning method. However, achieving efficient GNN inference on edge devices poses significant challenges, limiting their application in real-world edge scenarios. This is due to the high computational cost of GNNs and limited hardware resources on edge devices, which prevent GNN inference from meeting real-time and energy requirements. As an emerging paradigm, device-edge co-inference shows potential for improving inference efficiency and reducing energy consumption on edge devices. Despite its potential, research on GNN device-edge co-inference remains scarce, and our findings show that traditional model partitioning methods are ineffective for GNNs. To address this, we propose GCoDE, the first automatic framework for GNN architecture-mapping Co-design and deployment on Device-Edge hierarchies. By abstracting the device communication process into an explicit operation, GCoDE fuses the architecture and mapping scheme in a unified design space for joint optimization. Additionally, GCoDE's system performance awareness enables effective evaluation of architecture efficiency across diverse heterogeneous systems. By analyzing the energy consumption of various GNN operations, GCoDE introduces an energy prediction method that improves energy assessment accuracy and identifies energy-efficient solutions. Using a constraint-based random search strategy, GCoDE identifies the optimal solution in 1.5 hours, balancing accuracy and efficiency. Moreover, the integrated co-inference engine in GCoDE enables efficient deployment and execution of GNN co-inference. Experimental results show that GCoDE can achieve up to 44.9x speedup and 98.2% energy reduction compared to existing approaches across diverse applications and system configurations.
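To make the co-search idea concrete, below is a minimal, hypothetical sketch of the core mechanism the abstract describes: communication is treated as an explicit operation inside the candidate encoding, so a single sampled operation sequence fixes both the GNN architecture and its device-edge mapping, and a constraint-based random search keeps the most accurate candidate that meets latency and energy budgets. All operation names, cost numbers, and the accuracy proxy are illustrative assumptions, not GCoDE's actual implementation.

```python
import random

# Candidate GNN operations; "comm" abstracts the device-to-edge transfer as an
# explicit operation, so one sampled sequence encodes both the architecture and
# the partition point: ops before "comm" run on-device, ops after run on edge.
# Operation names are hypothetical placeholders.
GNN_OPS = ["gcn_conv", "gat_conv", "sage_conv", "mlp", "pool"]
COMM_OP = "comm"

def sample_candidate(num_layers=6):
    """Sample one point from the unified architecture-mapping design space."""
    arch = [random.choice(GNN_OPS) for _ in range(num_layers)]
    cut = random.randrange(num_layers + 1)      # mapping: where to split
    return arch[:cut] + [COMM_OP] + arch[cut:]  # fused architecture + mapping

# Hypothetical per-operation costs standing in for GCoDE's system-performance-
# aware predictors (milliseconds / joules on some device-edge pair).
LAT_MS = {"gcn_conv": 4.0, "gat_conv": 6.0, "sage_conv": 5.0,
          "mlp": 1.0, "pool": 0.5, "comm": 8.0}
ENERGY_J = {op: 0.3 * ms for op, ms in LAT_MS.items()}

def predicted_latency(cand):
    return sum(LAT_MS[op] for op in cand)

def predicted_energy(cand):
    return sum(ENERGY_J[op] for op in cand)

def accuracy_proxy(cand):
    """Placeholder for validation accuracy; in practice a trained model or
    proxy metric would supply this value."""
    return random.uniform(0.70, 0.95)

def constrained_random_search(trials, lat_budget_ms, energy_budget_j):
    """Constraint-based random search: among candidates meeting the latency
    and energy constraints, keep the one with the best accuracy proxy."""
    best, best_acc = None, -1.0
    for _ in range(trials):
        cand = sample_candidate()
        if predicted_latency(cand) > lat_budget_ms:
            continue
        if predicted_energy(cand) > energy_budget_j:
            continue
        acc = accuracy_proxy(cand)
        if acc > best_acc:
            best, best_acc = cand, acc
    return best, best_acc

best, acc = constrained_random_search(trials=1000, lat_budget_ms=30.0,
                                      energy_budget_j=9.0)
print(best, round(acc, 3))
```

The key design point this sketch illustrates is that the partition point is not chosen after the architecture: because "comm" lives in the same encoding as the GNN operations, architecture and mapping are sampled and evaluated jointly.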
📄 Full Content
GCoDE: Efficient Device-Edge Co-Inference for GNNs via Architecture-Mapping Co-Search
Ao Zhou, Jianlei Yang, Senior Member, IEEE, Tong Qiao, Yingjie Qi, Zhi Yang, Weisheng Zhao, Fellow, IEEE, Chunming Hu
Abstract—Graph Neural Networks (GNNs) have emerged as the state-of-the-art graph learning method. However, achieving efficient GNN inference on edge devices poses significant challenges, limiting their application in real-world edge scenarios. This is due to the high computational cost of GNNs and limited hardware resources on edge devices, which prevent GNN inference from meeting real-time and energy requirements. As an emerging paradigm, device-edge co-inference shows potential for improving inference efficiency and reducing energy consumption on edge devices. Despite its potential, research on GNN device-edge co-inference remains scarce, and our findings show that traditional model partitioning methods are ineffective for GNNs. To address this, we propose GCoDE, the first automatic framework for GNN architecture-mapping Co-design and deployment on Device-Edge hierarchies. By abstracting the device communication process into an explicit operation, GCoDE fuses the architecture and mapping scheme in a unified design space for joint optimization. Additionally, GCoDE's system performance awareness enables effective evaluation of architecture efficiency across diverse heterogeneous systems. By analyzing the energy consumption of various GNN operations, GCoDE introduces an energy prediction method that improves energy assessment accuracy and identifies energy-efficient solutions. Using a constraint-based random search strategy, GCoDE identifies the optimal solution in 1.5 hours, balancing accuracy and efficiency. Moreover, the integrated co-inference engine in GCoDE enables efficient deployment and execution of GNN co-inference. Experimental results show that GCoDE can achieve up to 44.9× speedup and 98.2% energy reduction compared to existing approaches across diverse applications and system configurations.
Index Terms—Graph Neural Networks, Device-Edge Co-Inference, Neural Architecture Search, Edge Devices, System Awareness
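The energy prediction method mentioned in the abstract can likewise be sketched. Under the simplifying assumption (mine, not the paper's) that total energy decomposes roughly linearly over per-operation contributions, one can profile a few operation sequences and fit per-operation energy costs by least squares; the operation names and measurements below are hypothetical placeholders, and the paper's actual model is richer and system-aware.

```python
import numpy as np

# Per-operation energy model: profile the energy of whole operation sequences,
# then fit one cost per operation type and predict a candidate's total energy
# from its operation composition. All data here is made up for illustration.
OPS = ["gcn_conv", "gat_conv", "sage_conv", "mlp", "pool", "comm"]

def op_counts(candidate):
    """Encode a candidate op sequence as a per-operation count vector."""
    return np.array([candidate.count(op) for op in OPS], dtype=float)

# Hypothetical profiled runs: (operation sequence, measured energy in joules).
profiled = [
    (["gcn_conv", "comm", "mlp"], 2.4),
    (["gat_conv", "gat_conv", "comm", "pool"], 4.1),
    (["sage_conv", "mlp", "comm", "gcn_conv"], 3.3),
    (["mlp", "pool", "comm"], 1.5),
    (["gcn_conv", "gcn_conv", "sage_conv", "comm"], 3.9),
    (["gat_conv", "comm", "mlp", "pool"], 2.8),
]

X = np.stack([op_counts(seq) for seq, _ in profiled])
y = np.array([energy for _, energy in profiled])

# Least-squares fit of per-operation energy costs.
per_op_energy, *_ = np.linalg.lstsq(X, y, rcond=None)

def predict_energy(candidate):
    """Predict total energy as the sum of learned per-operation costs."""
    return float(op_counts(candidate) @ per_op_energy)

print(predict_energy(["gcn_conv", "gat_conv", "comm", "mlp"]))
```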
This work is supported in part by the National Natural Science Foundation of China (Grant No. 62072019), the Fundamental Research Funds for the Central Universities, the Beijing Natural Science Foundation (Grant No. L243031), and the National Key R&D Program of China (Grant Nos. 2023YFB4503704 and 2024YFB4505601). The corresponding author is Jianlei Yang.
A. Zhou and C. Hu are with the School of Software, Beihang University, Beijing 100191, China.
J. Yang, T. Qiao, and Y. Qi are with the School of Computer Science and Engineering, Beihang University, Beijing 100191, China, and the Qingdao Research Institute, Beihang University, Qingdao 266104, China. Email: jianlei@buaa.edu.cn.
Z. Yang is with the School of Computer Science and Engineering, Peking University, Beijing 100871, China.
W. Zhao is with the School of Integrated Circuits and Engineering, Beihang University, Beijing 100191, China.
Manuscript received in September 2024, revised in August 2025, and accepted in October 2025.
I. INTRODUCTION
As edge devices become more intelligent and deep learning breakthroughs continue, the demand for deploying models to process various collected data in real time on the device side is growing [1, 2]. However, limited hardware resources make it challenging to meet latency and energy requirements when deploying complex deep learning models [3]. In particular, Graph Neural Networks (GNNs) have recently excelled in processing irregular data structures, making them a popular choice for graph-related applications in edge scenarios, such as point cloud processing [4] and natural language processing [5]. Additionally, the rising popularity of various sensors in mobile devices also encourages the deployment of GNNs to the wireless network edge for real-time sensing and interaction. For instance, autonomous drones require immediate obstacle detection from point clouds [6], where the latency of cloud communication is unacceptable. Likewise, executing speech-based interaction locally for smart assistants [7] is crucial for safeguarding user privacy. However, the significant computational cost of GNNs and the limited hardware resources on edge devices pose major challenges to meeting these strict real-time and energy requirements, severely limiting their application in real-world edge scenarios. This is demonstrated by deploying the popular point cloud processing model DGCNN [8] on a Raspberry Pi 3B, which achieves less than 0.3 fps, far below practical requirements (typically above 30 fps [9]).
Research efforts have been made to address the inefficiency of GNNs on edge devices. The authors of [4, 10] reduced GNN computation by manually simplifying the model structure. Meanwhile, HGNAS [11] and [12] adopted a more efficient hardware-aware neural architecture search (NAS) approach to design hardware-friendly GNNs for edge