GraphNNK -- Graph Classification and Interpretability
Graph Neural Networks (GNNs) have become a standard approach for learning from graph-structured data. However, their reliance on parametric classifiers (most often linear soft-max layers) limits interpretability and can hinder generalization. Recent work on interpolation-based methods, particularly Non-Negative Kernel regression (NNK), has shown that predictions can be expressed as convex combinations of similar training examples in the embedding space, yielding both theoretical insight into the local geometry of the learned representation and interpretable, example-based explanations.
💡 Research Summary
The paper “GraphNNK – Graph Classification and Interpretability” investigates whether a non‑parametric, example‑driven classifier can replace the conventional soft‑max layer in graph neural networks (GNNs) without sacrificing performance, while simultaneously providing transparent, example‑based explanations. The authors focus on the Graph Isomorphism Network (GIN) as the backbone for learning graph‑level embeddings and then substitute the final linear classifier with Non‑Negative Kernel regression (NNK), an interpolation method that expresses a query point as a convex combination of a few informative training samples in a reproducing kernel Hilbert space (RKHS).
The background section outlines the prevalence of graph‑structured data across domains such as chemistry, citation networks, and social analysis, and highlights the expressive power of GIN, which uses sum aggregation and a learnable ε parameter to achieve near‑maximum discriminative capability among GNN architectures. However, the authors argue that the typical soft‑max classifier, despite its simplicity, offers limited interpretability because it aggregates all learned features into a single weight matrix, obscuring which training examples influence a particular decision. Existing explainability methods for GNNs (e.g., subgraph identification, attribution techniques, hierarchical pooling) either add computational overhead or compromise predictive performance.
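The GIN update mentioned above can be sketched in a few lines. The toy single-layer MLP, the path graph, and all dimensions below are illustrative stand-ins, not the paper's configuration (which uses five GIN layers with hidden dimension 128):

```python
import numpy as np

def gin_layer(H, adj, eps, mlp):
    # GIN node update: h_v' = MLP((1 + eps) * h_v + sum_{u in N(v)} h_u).
    # Sum aggregation over neighbors (adj @ H) is what gives GIN its
    # near-maximal (Weisfeiler-Lehman-level) discriminative power.
    return mlp((1.0 + eps) * H + adj @ H)

# Toy stand-in for the learnable MLP: one linear map followed by ReLU.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))
mlp = lambda Z: np.maximum(Z @ W, 0.0)

adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)      # path graph on 3 nodes
H = np.eye(3, 4)                              # initial node features (3 nodes, dim 4)
H_next = gin_layer(H, adj, eps=0.1, mlp=mlp)  # updated features, shape (3, 8)
```

In a full model, several such layers are stacked and the node features are pooled (e.g., summed) into a single graph-level embedding.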
NNK, originally proposed for deep learning models, addresses these issues by constructing a locally consistent interpolation in the kernel space. For a query embedding x, a candidate neighborhood S of the k nearest training embeddings is selected (the k-nearest-neighbor search is performed with FAISS). Each neighbor x_i is mapped via a kernel-induced feature map ϕ(·) into the RKHS, and the query is approximated as ϕ(x) ≈ Φ_S θ with θ ≥ 0. The optimal θ is obtained by solving a constrained quadratic program that can be expressed purely in terms of kernel evaluations, avoiding explicit feature maps. The solution is typically sparse: only a subset of neighbors receives non-zero weights, which directly indicates the most influential training graphs. For classification, the one-hot label vectors of the active neighbors are combined using the normalized weights w_i = θ_i / Σ_j θ_j, yielding class probabilities that are convex combinations of real examples. The non-negativity constraint keeps the interpolation within the local convex hull of the data manifold, which contributes to both stability and interpretability. The authors also discuss the geometric condition known as the Kernel Ratio Interval (KRI), which characterizes when two candidates can simultaneously be active NNK neighbors, providing a theoretical lens on the shape of the local polytope.
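This constrained interpolation can be sketched numerically: the RKHS objective ||ϕ(x) − Φ_S θ||² with θ ≥ 0 expands to θᵀKθ − 2θᵀk_x (up to a constant), which via a Cholesky factorization K = LLᵀ of the neighborhood Gram matrix becomes an ordinary non-negative least-squares problem ||Lᵀθ − L⁻¹k_x||². The Gaussian kernel, its bandwidth, and the sparsity tolerance below are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import nnls

def rbf_kernel(A, B, sigma=1.0):
    # Gaussian kernel matrix between rows of A and rows of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def nnk_weights(x, neighbors, sigma=1.0, tol=1e-10):
    """Solve min_{theta >= 0} ||phi(x) - Phi_S theta||^2 in the RKHS.

    With K = L L^T, the objective equals ||L^T theta - L^{-1} k_x||^2
    up to a constant, so a standard NNLS solver applies.
    """
    K = rbf_kernel(neighbors, neighbors, sigma)
    K += 1e-8 * np.eye(len(neighbors))      # jitter for numerical stability
    k_x = rbf_kernel(neighbors, x[None, :], sigma).ravel()
    L = np.linalg.cholesky(K)
    theta, _ = nnls(L.T, np.linalg.solve(L, k_x))
    theta[theta < tol] = 0.0                # keep only active neighbors
    return theta / theta.sum()              # normalized weights w_i

# Toy example: three neighbors near the query, three far away.
rng = np.random.default_rng(0)
neighbors = np.vstack([rng.normal(0.0, 0.1, (3, 2)),
                       rng.normal(5.0, 0.1, (3, 2))])
w = nnk_weights(np.zeros(2), neighbors)
# the weights concentrate on the nearby cluster; distant candidates
# receive (near-)zero weight, illustrating NNK's sparsity
```

The non-zero entries of `w` are exactly the "active neighbors" the paper uses as the explanation for a prediction.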
The proposed architecture consists of two stages. First, a GIN model is trained on the training set to produce fixed-size graph embeddings (five GIN layers, hidden dimension 128, dropout 0.5, learning rate 1e-3, batch size 128). After training, the GIN parameters are frozen and the embeddings are used as inputs to the NNK classifier. For each test graph, FAISS retrieves the 50 nearest training embeddings, the constrained quadratic program (θ ≥ 0) is solved with a Cholesky-based solver, and the resulting weights are used to compute class probabilities. This pipeline adds no learnable parameters beyond the GIN and is computationally efficient because the quadratic problem is low-dimensional (k = 50).
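The second, post-hoc stage can be sketched end to end. Here `query` and `train_emb` stand in for the frozen GIN graph-level embeddings; the Gaussian kernel, its bandwidth, and the toy data are assumptions, and a brute-force neighbor search replaces the FAISS lookup used in the paper:

```python
import numpy as np
from scipy.optimize import nnls

def nnk_classify(query, train_emb, train_labels, n_classes, k=50, sigma=1.0):
    """Post-hoc NNK classification over frozen graph embeddings (sketch)."""
    # 1. Candidate neighborhood: the k nearest training embeddings.
    #    (The paper uses FAISS; brute force suffices for a sketch.)
    idx = np.argsort(((train_emb - query) ** 2).sum(axis=1))[:k]
    S = train_emb[idx]

    # 2. NNK weights: min_{theta >= 0} theta^T K theta - 2 theta^T k_x,
    #    solved via the Cholesky factor of the neighborhood Gram matrix.
    def kern(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    K = kern(S, S) + 1e-6 * np.eye(len(S))   # jitter for numerical stability
    L = np.linalg.cholesky(K)
    theta, _ = nnls(L.T, np.linalg.solve(L, kern(S, query[None, :]).ravel()))
    w = theta / theta.sum()                  # normalized weights w_i

    # 3. Class probabilities: convex combination of neighbor one-hot labels.
    probs = np.bincount(train_labels[idx], weights=w, minlength=n_classes)
    return probs, idx[w > 0]                 # probabilities + influential graphs

# Toy usage: two well-separated clusters of "graph embeddings".
rng = np.random.default_rng(1)
train_emb = np.vstack([rng.normal(0.0, 0.2, (60, 8)),
                       rng.normal(3.0, 0.2, (60, 8))])
train_labels = np.array([0] * 60 + [1] * 60)
probs, support = nnk_classify(rng.normal(0.0, 0.2, 8), train_emb, train_labels, 2)
```

The returned `support` indices are the training graphs that explain the prediction, which is the interpretability payoff of replacing the soft-max layer.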
Experiments are conducted on the NCI1 benchmark from the TU collection, a standard dataset for graph classification where node features are absent and thus each node is assigned its degree as a feature. Two setups are compared under identical training conditions: (1) a supervised baseline with a linear soft‑max layer trained via cross‑entropy, and (2) the proposed NNK classifier applied post‑hoc to the same GIN embeddings. Results are reported for two evaluation points: the best validation checkpoint and the final training snapshot. At the best checkpoint, NNK consistently outperforms the supervised baseline across epochs; for example, at epoch 90 accuracy rises from 0.7786 (soft‑max) to 0.8273 (NNK), and macro‑F1 shows a similar gain. This demonstrates that when the embedding space is well‑structured—i.e., neighbors are geometrically coherent—NNK can exploit local similarity more effectively than a globally trained linear classifier. Conversely, on the final snapshot where the embedding may have over‑fitted or become less stable, NNK’s performance degrades relative to the soft‑max, highlighting its sensitivity to embedding quality.
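Since NCI1 graphs carry no node attributes, the node degree serves as the input feature. A minimal sketch of this preprocessing step (the one-hot encoding and the `max_degree` cap are common-convention assumptions; the paper only states that the degree is used as the feature):

```python
import numpy as np

def degree_features(adj, max_degree):
    # One-hot encode each node's degree as its input feature; degrees
    # above max_degree are clamped into the last bucket.
    deg = adj.sum(axis=1).astype(int)
    feats = np.zeros((len(deg), max_degree + 1))
    feats[np.arange(len(deg)), np.minimum(deg, max_degree)] = 1.0
    return feats

# Triangle graph: every node has degree 2.
adj = np.array([[0, 1, 1],
                [1, 0, 1],
                [1, 1, 0]])
X = degree_features(adj, max_degree=4)  # each row is one-hot at index 2
```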
The authors conclude that integrating NNK with GIN yields a graph classification system that maintains or improves predictive accuracy while providing explicit, example‑based explanations without additional training overhead. The empirical findings suggest that the success of NNK hinges on the geometric consistency of the learned representations; high‑quality embeddings lead to sparse, meaningful neighbor selections and robust predictions. Future work is proposed to explore adaptive non‑parametric classifiers that can dynamically adjust neighbor counts or kernel parameters, and to test the approach on larger, more diverse graph datasets.
In summary, the paper makes three key contributions: (1) it demonstrates that a non‑parametric NNK layer can replace the soft‑max classifier in a GIN‑based graph classification pipeline, (2) it provides a clear, mathematically grounded mechanism for interpreting each prediction through a small set of training graphs, and (3) it empirically validates that, given well‑structured embeddings, NNK can achieve competitive or superior accuracy on a standard benchmark. This work bridges the gap between high‑performing GNN representations and the growing demand for transparent, trustworthy AI in graph‑structured domains.