Less is More: Towards Simple Graph Contrastive Learning
Graph Contrastive Learning (GCL) has shown strong promise for unsupervised graph representation learning, yet its effectiveness on heterophilic graphs, where connected nodes often belong to different classes, remains limited. Most existing methods rely on complex augmentation schemes, intricate encoders, or negative sampling, which raises the question of whether such complexity is truly necessary in this challenging setting. In this work, we revisit the foundations of supervised and unsupervised learning on graphs and uncover a simple yet effective principle for GCL: mitigating node feature noise by aggregating it with structural features derived from the graph topology. This observation suggests that the original node features and the graph structure naturally provide two complementary views for contrastive learning. Building on this insight, we propose an embarrassingly simple GCL model that uses a GCN encoder to capture structural features and an MLP encoder to isolate node feature noise. Our design requires neither data augmentation nor negative sampling, yet achieves state-of-the-art results on heterophilic benchmarks with minimal computational and memory overhead, while also offering advantages on homophilic graphs in terms of complexity, scalability, and robustness. We provide theoretical justification for our approach and validate its effectiveness through extensive experiments, including robustness evaluations against both black-box and white-box adversarial attacks.
💡 Research Summary
The paper tackles the persistent challenge of graph contrastive learning (GCL) on heterophilic graphs, where neighboring nodes often belong to different classes. While many recent GCL methods have turned to increasingly elaborate data‑augmentation pipelines, sophisticated multi‑view encoders, or negative‑sampling strategies, the authors ask whether such complexity is truly required. By revisiting the fundamentals of supervised and unsupervised node classification, they identify a simple yet powerful principle: the performance of contrastive learning hinges on the relationship between two sources of noise—feature noise (the deviation of a node’s raw attributes from its class centroid) and structural noise (the distortion introduced when those attributes are propagated through the graph topology).
Theoretical analysis shows that if two views share highly correlated class centroids but have weakly correlated noise components, the aggregated representation will amplify the signal (larger centroid norm) while cancelling the noise. Proposition 1 formalizes this “centroid‑strength / noise‑cancellation” trade‑off, and Proposition 2, together with Observation 1, demonstrates that applying a k‑hop adjacency operator (i.e., a GCN) transforms raw feature noise into a distinct structural noise that becomes increasingly decorrelated as the number of hops grows. Crucially, when the graph construction is only loosely tied to the original features—as is typical for heterophilic datasets—this decorrelation is pronounced even with just two GCN layers (k = 2).
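The signal‑amplification / noise‑cancellation argument can be made concrete with a simplified one‑node model (the symbols below are illustrative, not the paper's own notation). Suppose both views of a node in class c share the centroid μ_c but carry different noise terms with standard deviations σ_u, σ_v and correlation ρ:

```latex
u = \mu_c + \varepsilon_u, \qquad v = \mu_c + \varepsilon_v
\quad\Longrightarrow\quad
z = u + v = 2\mu_c + (\varepsilon_u + \varepsilon_v),
\qquad
\mathbb{E}\,\lVert \varepsilon_u + \varepsilon_v \rVert^2
 = \sigma_u^2 + \sigma_v^2 + 2\rho\,\sigma_u \sigma_v
 \;<\; (\sigma_u + \sigma_v)^2 \quad \text{whenever } \rho < 1.
```

Aggregation doubles the centroid (signal) norm, while the noise energy grows only sub‑additively when ρ is small, so weakly correlated noise across views yields the best signal‑to‑noise ratio—exactly the trade‑off the propositions formalize.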
Guided by these insights, the authors propose an embarrassingly simple GCL framework: a two‑branch architecture consisting of (1) a multilayer perceptron (MLP) that processes the raw node features alone, thereby preserving pure feature noise, and (2) a shallow (2‑layer) Graph Convolutional Network (GCN) that aggregates information across edges, thus generating structural noise. No data augmentation, edge perturbation, or negative sampling is employed. The two embeddings are aligned using a standard InfoNCE contrastive loss, and the final node representation is obtained by a simple weighted average (β ≈ 0.5) of the two views.
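The two‑branch design can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the single‑weight ReLU layers, dimensions, and the toy random graph are all assumptions standing in for the paper's actual encoders and data.

```python
import numpy as np

def normalize_adj(A):
    """Symmetrically normalized adjacency with self-loops: D^{-1/2}(A+I)D^{-1/2}."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def info_nce(Z1, Z2, tau=0.5):
    """Align node i's two views; all other nodes act as in-batch negatives."""
    Z1 = Z1 / (np.linalg.norm(Z1, axis=1, keepdims=True) + 1e-12)
    Z2 = Z2 / (np.linalg.norm(Z2, axis=1, keepdims=True) + 1e-12)
    logits = Z1 @ Z2.T / tau
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
n, f, d = 8, 16, 4                         # nodes, input features, embedding dim
X = rng.normal(size=(n, f))                # raw node features
A = (rng.random((n, n)) < 0.3).astype(float)
A = np.triu(A, 1); A = A + A.T             # symmetric toy graph, no self-loops
A_norm = normalize_adj(A)

# Branch 1: MLP on raw features only (preserves pure feature noise).
W_mlp = rng.normal(size=(f, d))
Z_mlp = np.maximum(X @ W_mlp, 0.0)

# Branch 2: 2-layer GCN (propagation converts feature noise into structural noise).
W1, W2 = rng.normal(size=(f, d)), rng.normal(size=(d, d))
Z_gcn = A_norm @ np.maximum(A_norm @ X @ W1, 0.0) @ W2

loss = info_nce(Z_mlp, Z_gcn)              # align the two views
beta = 0.5
Z = beta * Z_mlp + (1 - beta) * Z_gcn      # final weighted-average embedding
```

In a real pipeline the InfoNCE loss would be back‑propagated through both branches; here a single forward pass suffices to show how the feature view and the structural view are produced and combined.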
Empirical evaluation spans a wide range of heterophilic benchmarks (Wisconsin, Roman, etc.) and classic homophilic citation graphs (Cora, PubMed, Citeseer). On heterophilic datasets, the proposed GCN‑MLP model outperforms state‑of‑the‑art methods such as GraphACL, PolyGCL, EP‑AGCL, and SDMG, achieving higher classification accuracy while reducing per‑epoch training time by up to 70 % and cutting memory consumption dramatically. Notably, on the large‑scale Roman dataset, competing methods encounter out‑of‑memory failures, whereas the simple model runs comfortably on a single RTX A5000 GPU.
Robustness tests against both black‑box and white‑box adversarial attacks reveal that the GCN‑MLP approach retains higher accuracy under perturbations, suggesting that structural noise contributes to inherent attack resistance. Additional ablation studies confirm that the decorrelation between feature and structural noise is the key driver of performance gains, rather than the depth of the GCN or the specific choice of contrastive loss.
The paper’s contributions are threefold: (1) it introduces a minimalistic, augmentation‑free GCL paradigm grounded in a clear theoretical principle; (2) it provides rigorous justification for why contrasting raw features with graph‑propagated features yields low‑correlation noise and thus superior representations, especially on heterophilic graphs; (3) it demonstrates that this simplicity translates into state‑of‑the‑art accuracy, scalability, and robustness across diverse graph settings.
Future work may explore deeper or alternative propagation operators to further modulate structural noise, extend the two‑view scheme to semi‑supervised or multi‑task scenarios, and devise loss functions that explicitly penalize noise correlation. Overall, the study convincingly argues that “less is more” for graph contrastive learning, offering a practical blueprint for efficient and effective representation learning on complex graph data.