Graph neural networks (GNNs) work remarkably well in semi-supervised node regression, yet a rigorous theory explaining when and why they succeed remains lacking. To address this gap, we study an aggregate-and-readout model that encompasses several common message-passing architectures: node features are first propagated over the graph and then mapped to responses via a nonlinear function. For least-squares estimation over GNNs with linear graph convolutions and a deep ReLU readout, we prove a sharp non-asymptotic risk bound that separates approximation, stochastic, and optimization errors. The bound makes explicit how performance scales with the fraction of labeled nodes and graph-induced dependence. Approximation guarantees are further derived for graph smoothing followed by smooth nonlinear readouts, yielding convergence rates that recover classical nonparametric behavior under full supervision while characterizing performance when labels are scarce. Numerical experiments corroborate the theory. Together, these results provide a systematic framework for understanding the performance and limitations of GNNs.
Graph Neural Networks (GNNs) have become the default tool for semi-supervised prediction on graphs: given a graph $G = (V, E)$ on $n = |V|$ nodes with features $X_i$, we observe a response variable $Y_i$ on a subset of nodes and aim to predict the rest [35,49,52,66]. A central assumption in this setting is that the graph specifies how information propagates across nodes, which, if efficiently leveraged, can substantially boost prediction. GNNs have achieved strong performance for node-level prediction on interaction graphs such as social or hyperlink networks, where they predict outcomes like website traffic, future engagement, or satisfaction from partial labels [4,12,34]. They are also increasingly used in spatially resolved omics, where nodes are spots or cells connected by spatial neighborhoods and the goal is to predict expensive assays (e.g., gene or protein measurements) from observed modalities by propagating local context through the graph [21,24]; we present real-data case studies in Section 4.
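Concretely (in our notation; the precise model is stated in Section 2), the task can be summarized as
\[
Y_i = f^*(\mathbf{X})_i + \varepsilon_i, \qquad i \in \mathcal{L} \subseteq V, \qquad |\mathcal{L}| \approx p\, n,
\]
where $f^*$ maps the full feature matrix $\mathbf{X}$ to node-level responses, $\mathcal{L}$ is the labeled set, $p$ is the labeled fraction, and the goal is to predict $Y_i$ for the unlabeled nodes $i \in V \setminus \mathcal{L}$.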
Semi-supervised learning on graphs has a long and rich history in the statistics literature. Classical graph semi-supervised learning is dominated by (i) Laplacian-based regularization [3] and (ii) label propagation [65,70]. While both approaches have been extensively studied and enjoy solid theoretical guarantees, they rely on spatial regularization alone and typically ignore node features.
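In schematic form (our notation, not necessarily the exact objectives of [3] or [65,70]), both approaches solve a Laplacian-penalized problem such as
\[
\hat{f} \in \arg\min_{f \in \mathbb{R}^n} \; \sum_{i \in \mathcal{L}} (Y_i - f_i)^2 + \lambda\, f^\top L f, \qquad L = D - A,
\]
where $L$ is the graph Laplacian built from the adjacency matrix $A$ and degree matrix $D$. The penalty enforces smoothness of predictions across edges, but the node features $X_i$ enter, at most, through the construction of the graph itself.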
By contrast, modern GNNs inject node features into the propagation and deliver strong empirical performance across domains. A key ingredient of their success is the use of message-passing layers that perform a localized averaging of features, effectively acting as a learnable low-pass filter on the graph signal [11]. This propagation rule enables the model to learn representations that are smooth across the graph and discriminative in their features. [28] showed empirically that graph convolutional networks (GCNs), one of the earliest types of GNNs, significantly outperform manifold regularization and transductive SVMs, establishing GNNs as the dominant paradigm for graph semi-supervised learning. While a variety of GNN architectures have been proposed (e.g., [60,69]), they largely share the same spirit: an initial aggregation of node information through a message-passing step, followed by a readout step that synthesizes the aggregated information into an output (see Appendix A for an extended discussion of related works; a code sketch of this template follows the contribution list below). Our contributions are threefold.
(i) An oracle inequality under partial labels and graph dependence. We establish a non-asymptotic oracle inequality for least-squares estimation over GNN classes that separates approximation, stochastic, and optimization errors, making explicit the roles of the labeled fraction and of graph-induced dependence.
(ii) Approximation guarantees for propagate-then-readout targets. We derive approximation guarantees for GNN classes targeting regression functions that compose a graph-induced propagation step and a Hölder-smooth synthesizing (readout) function (Lemma 4).
(iii) Rates that expose label-scarcity and graph effects. Combining (i) and (ii), we derive explicit convergence rates for the least-squares GNN estimator (Theorem 2). With a properly chosen architecture (depth and width scaling with graph size), the estimator achieves a convergence rate governed by the smoothness of the underlying regression function and its intrinsic input dimension.
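To make the estimator class concrete, the following is a minimal numpy sketch of the aggregate-and-readout template analyzed here (our illustration, not the exact parameterization used in the proofs): it assumes the symmetrically normalized adjacency $S = D^{-1/2}(A + I_n)D^{-1/2}$ as the propagation operator, $k$ linear propagation steps, and a deep ReLU readout shared across nodes.

```python
import numpy as np

def normalized_adjacency(A):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2}; a common
    propagation operator, assumed here for illustration."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def relu(Z):
    return np.maximum(Z, 0.0)

def aggregate_and_readout(A, X, W_list, k=2):
    """Aggregate: k steps of linear graph convolution, i.e. S^k X.
    Readout: a deep ReLU network applied row-wise (shared across nodes)."""
    S = normalized_adjacency(A)
    H = X
    for _ in range(k):           # linear propagation, no nonlinearity
        H = S @ H
    for W in W_list[:-1]:        # nonlinear readout layers
        H = relu(H @ W)
    return H @ W_list[-1]        # one response per node

# Usage on a random symmetric graph (hypothetical sizes):
rng = np.random.default_rng(0)
n, d = 50, 8
A = (rng.random((n, n)) < 0.1).astype(float)
A = np.triu(A, 1); A = A + A.T            # symmetric, no self-loops
X = rng.normal(size=(n, d))
W_list = [rng.normal(size=(d, 16)), rng.normal(size=(16, 1))]
Y_hat = aggregate_and_readout(A, X, W_list)   # shape (n, 1)
```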
Our analysis builds on the oracle-inequality and approximation-theoretic framework developed for sparse deep ReLU networks in classical nonparametric regression, most notably [46]. The semi-supervised graph setting, however, introduces two core challenges that require novel analytical tools. First, responses are observed only on a random subset of nodes; we incorporate this missingness mechanism directly into the risk decomposition to quantify the effect of limited supervision. Second, graph propagation induces nontrivial statistical dependencies because predictions at each node rely on overlapping neighborhoods; we introduce a bounded receptive field assumption to model this graph-structured dependency and, via graph coloring, develop concentration arguments tailored to localized interactions. Combined with new metric entropy bounds for the graph-convolutional component and approximation guarantees for compositions of graph filters and Hölder-smooth functions, this framework yields explicit non-asymptotic risk bounds and convergence rates for least-squares estimation over GNN classes. Notably, this general rate recovers the optimal minimax rate for standard (non-graph) regression in the special case of full supervision and purely local node responses.
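Schematically, and suppressing constants and logarithmic factors (the precise statement appears in Section 3), the resulting oracle inequality takes the form
\[
\mathbb{E}\, R(\hat{f}) \;\lesssim\; \underbrace{\inf_{f \in \mathcal{F}} \|f - f^*\|^2}_{\text{approximation}} \;+\; \underbrace{\frac{\log \mathcal{N}(\mathcal{F})}{p\, n}}_{\text{stochastic}} \;+\; \underbrace{\Delta(\hat{f})}_{\text{optimization}},
\]
where $\mathcal{F}$ is the GNN class, $\mathcal{N}(\mathcal{F})$ a covering number, $p$ the labeled fraction, and $\Delta(\hat{f})$ the excess empirical risk of the computed estimator; graph dependence enters through the receptive-field and coloring arguments. Under full supervision ($p = 1$), balancing the first two terms for a $\beta$-Hölder target with intrinsic dimension $d$ recovers the classical nonparametric rate $n^{-2\beta/(2\beta + d)}$ up to logarithmic factors.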
This article is structured as follows. In Section 2, we introduce the semi-supervised regression setting with graph-structured data. Section 3 analyzes GNN-based estimation: under a locality condition, we establish an oracle inequality for any given estimator and explicitly characterize the convergence rate of the least-squares estimator in terms of the proportion of labeled nodes and the graph's receptive field. Numerical experiments on both synthetic and real-world datasets are presented in Section 4. We conclude in Section 5; all proofs are provided in the Appendix.
Notation: We set $\mathbb{N}_0 = \{0, 1, 2, \dots\}$, $\mathbb{N} = \{1, 2, \dots\}$, $\mathbb{R}_+ = (0, \infty)$, and $[m] = \{1, \dots, m\}$. In this paper, vectors and matrices are denoted by bold lowercase and bold uppercase letters, respectively.