Can TabPFN Compete with GNNs for Node Classification via Graph Tabularization?

Jeongwhan Choi (KAIST, jeongwhan.choi@kaist.ac.kr), Woosung Kang (KAIST, wskang@kaist.ac.kr), Minseo Kim (KAIST, evlingbling@kaist.ac.kr), Jongwoo Kim (KAIST, gsds4885@kaist.ac.kr), Noseong Park (KAIST, noseong@kaist.ac.kr)

Abstract

Foundation models pretrained on large data have demonstrated remarkable zero-shot generalization capabilities across domains. Building on the success of TabPFN for tabular data and its recent extension to time series, we investigate whether graph node classification can be effectively reformulated as a tabular learning problem. We introduce TabPFN-GN, which transforms graph data into tabular features by extracting node attributes, structural properties, positional encodings, and optionally smoothed neighborhood features. This enables TabPFN to perform direct node classification without any graph-specific training or language model dependencies. Our experiments on 12 benchmark datasets reveal that TabPFN-GN achieves competitive performance with GNNs on homophilous graphs and consistently outperforms them on heterophilous graphs. These results demonstrate that principled feature engineering can bridge the gap between tabular and graph domains, providing a practical alternative to task-specific GNN training and LLM-dependent graph foundation models.

1 Introduction

Large-scale pretrained models, such as foundation and large language models (LLMs) [1], have gained popularity across diverse domains, including text [2-4], images [5, 6], and time series [7-9], due to their ability to make accurate predictions with minimal fine-tuning on specific datasets. There have been recent efforts to build graph foundation models, which take the seemingly natural approach of leveraging LLMs [10-13]. Meanwhile, a similar paradigm with a different approach has been proposed in the tabular domain.
TabPFNs [14, 15], trained only on synthetic data generating numerical and categorical features, achieve remarkable performance on tabular tasks without fine-tuning or retraining on the target dataset. This success, particularly the recent extension to time series [16, 17], suggests unexplored potential for graph learning. Graph neural networks (GNNs) require dataset-specific training and architecture design for each new dataset, and, unlike these other fields, the potential of TabPFN to generalize to graph node classification remains untapped.

Motivation 1: Limitations of LLM-dependent graph models. Recent graph foundation models fundamentally rely on LLMs to process node features [12, 18]. This dependency restricts them to text-attributed graphs where each node must have a meaningful textual description [12, 19], or requires textual instructions for prompt engineering [11, 13, 20, 21]. Because of this reliance on LLMs, creating such textual descriptions takes effort, and some graphs contain nodes with only numerical features. Moreover, LLM-based approaches can introduce biases inherited from pretrained language models. The field needs graph learning methods that handle arbitrary feature types without relying on language models.

Preprint. Preliminary work.

Table 1: Analogous feature tabularization strategies.

| Feature Type | TabPFN-TS [16, 17] | TabPFN-GN (Ours) |
|---|---|---|
| Local Patterns | Calendar features | Degree, clustering, triangles |
| Global Patterns | Seasonal features | Centrality (betweenness, PageRank) |
| Position | Temporal index, sine/cosine encoding | LapPE, RWSE |
| Smoothing | Moving average | Linear graph convolution |

Figure 1: TabPFN-GN overview.
Graph nodes are transformed into tabular features comprising node attributes, structural properties, and positional encodings, enabling direct inference via TabPFN.

Motivation 2: Success of tabularization in time series. TabPFN offers an alternative paradigm. By training on millions of synthetic tabular datasets, it learns general classification patterns that transfer to real data without fine-tuning. TabPFN-TS [16, 17] recently demonstrated that this capability extends to time series by encoding temporal patterns in tabular format (see Table 1), achieving competitive forecasting performance. This success demonstrates that structured domains can be "tabularized" via appropriate feature engineering for TabPFN. As shown in Table 1, we propose an analogous transformation for graphs: extracting local structural patterns, global network properties, and positional encodings.

These motivations lead us to ask: "Can we tabularize graph information into tabular features such that TabPFN achieves competitive performance with GNNs without graph-specific training or LLM dependency?"

We propose TabPFN for graph node classification (TabPFN-GN), which systematically transforms graph data into tabular representations for direct node classification, as shown in Fig. 1. By encoding node attributes, structural properties, positional encodings, and optionally smoothed neighborhood features as tabular features, we enable TabPFN to perform node classification. Our experiments demonstrate that TabPFN-GN achieves competitive performance with GNNs on homophilous graphs and consistently outperforms them on heterophilous datasets, where the flexibility to exclude neighborhood aggregation proves advantageous. This success validates that principled tabularization can effectively capture graph structure.

2 Preliminaries & Related Work

Prior-data Fitted Network for Tabular Data. TabPFNv1 [14] presents a new paradigm via the prior-data fitted network (PFN).
It trains a transformer on millions of synthetic tabular datasets for in-context learning. This pretraining enables direct inference on small real-world tabular data by leveraging the learned prior knowledge. TabPFNv2 [15] extends this approach to handle larger datasets. For convenience, we refer to both TabPFNv1 and TabPFNv2 as TabPFN. To our knowledge, the only attempt to apply TabPFN to another domain is time series forecasting: TabPFN-TS [16, 17] analyzes time series via feature engineering and encodes temporal patterns as tabular features. This success motivates our exploration of graph-to-tabular transformation.

Graph Neural Networks for Node Classification. While GNNs [22-25] remain competitive on various graph tasks, they require dataset-specific training and architecture selection. Additionally, the neighborhood aggregation of GNNs shows stable performance on homophilous benchmark datasets but struggles with heterophilous graphs [26]. As GNNs may not dominate on all graphs, leveraging pretrained models such as TabPFN can bypass the need for architecture search. At the same time, we aim to verify their potential for node classification.

Graph Foundation Models. Recent graph foundation models leverage LLMs. GraphGPT [13], GraphLLM [20], and LLaGA [11] convert graphs to text descriptions, while frameworks that use text-attributed graph datasets [12], such as OFA [18], use LLMs to encode node features. These approaches inherit both the strengths and the limitations of LLMs, including their dependency on textual attributes. In contrast, our approach requires no LLMs and works with arbitrary node features.
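The prior-data fitted workflow described above is operationally simple: labeled rows serve as in-context examples, and predictions for new rows come from a single forward pass, with no gradient training on the target data. The sketch below illustrates this fit/predict pattern with a hypothetical nearest-centroid stand-in (`InContextClassifier` is our invention, not part of any library) so the example runs without external dependencies; the actual `TabPFNClassifier` from the `tabpfn` package would be invoked through the same scikit-learn-style fit/predict interface.

```python
# Sketch of the in-context workflow: "fit" only stores labeled context
# rows (no training loop), and "predict" scores query rows against that
# context. A nearest-centroid rule stands in for the TabPFN transformer
# so the sketch stays self-contained.

class InContextClassifier:
    def fit(self, X_train, y_train):
        # Group context rows by label and keep one centroid per class.
        by_class = {}
        for row, label in zip(X_train, y_train):
            by_class.setdefault(label, []).append(row)
        self.centroids_ = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in by_class.items()
        }
        return self

    def predict(self, X_query):
        def dist2(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))
        # Assign each query row to the class with the closest centroid.
        return [
            min(self.centroids_, key=lambda c: dist2(row, self.centroids_[c]))
            for row in X_query
        ]

# Tabularized rows (feature vectors) with class labels as context:
X_train = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]]
y_train = [0, 0, 1, 1]
clf = InContextClassifier().fit(X_train, y_train)
print(clf.predict([[0.1, 0.0], [1.0, 1.0]]))  # [0, 1]
```

The point of the sketch is the interface, not the model: whatever produces the tabular rows (here, our graph tabularization) can hand them to any classifier exposing this convention.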
3 Proposed Method

3.1 Graph Tabularization for TabPFN

We transform graph structure and attributes into tabular features: original node features, structural features capturing connectivity patterns, positional encodings providing topological context, and optionally smoothed features from neighborhood aggregation (see Table 1).

Node Attributes. We preserve original node features when dimensionally feasible. For high-dimensional features that do not satisfy the constraints of TabPFNs, we apply truncated singular value decomposition (SVD) to preserve discriminative information.

Structural Features. We capture graph topology at local and global scales. Local structural features include degree, clustering coefficient, and triangle (i.e., 3-clique) count to quantify neighborhood patterns. Global structural features consist of centrality measures (e.g., betweenness [27], PageRank [28]) that encode network-wide importance.

Positional Encodings. In this study, we use either LapPE [29] or RWSE [29] as features. LapPE uses the first k eigenvectors of the graph Laplacian to provide spectral coordinates. RWSE computes random-walk landing probabilities to encode multi-scale proximity relationships. These encodings distinguish structurally different nodes with similar attributes. More details are provided in Appendix B.

Final Set of Features.
Our final feature representation for each node v combines complementary views extracted from the graph structure G = (V, E) with normalized adjacency matrix Ā:

x_v = [φ_attr(v) ⊕ φ_struct(v, Ā) ⊕ φ_pos(v, Ā) ⊕ φ_smooth(v, Ā)],  (1)

where φ_attr(v) represents the raw node features, φ_struct(v, Ā) captures both local patterns (degree, clustering coefficient, triangle count) and global importance (betweenness, closeness, PageRank) computed from the adjacency matrix, φ_pos(v, Ā) combines Laplacian PE and random walk SE derived from the graph Laplacian, and φ_smooth(v, Ā) optionally aggregates features from neighboring nodes through L-step linear graph convolutions [30] without any weight matrices. This tabularization preserves essential graph information while enabling direct inference through TabPFN.

3.2 Node Classification with TabPFN

We directly input the features described in Sec. 3.1 into TabPFN for classification. Given training nodes with tabularized features X_train = {x_i}_{i∈V_train} and labels y_train = {y_i}_{i∈V_train}, TabPFN performs in-context inference using patterns learned during pretraining. For each test node v ∈ V_test with features x_v, TabPFN outputs an approximate posterior predictive distribution p(y | X_train, y_train, x_v), providing node-specific calibrated class probabilities without training. We follow the standard procedure of TabPFNv2 and apply z-normalization to all features. All other configurations are left at their default values.

4 Experiments

Datasets. We use both homophilous and heterophilous graph benchmark datasets for node classification. For homophilous datasets [22, 31, 32], we use Cora, Citeseer, Pubmed, WikiCS, Amazon-Computer, and Amazon-Photo. For heterophilous datasets [26, 33], we compare against GNNs on Chameleon, Squirrel, Cornell, Texas, Actor, and Wisconsin.

Evaluation Protocol.
For Cora, Citeseer, and Pubmed, we follow the semi-supervised setting of Kipf and Welling [22] for data splits. We adhere to the widely accepted practice of 60%/20%/20% training/validation/test splits and the accuracy metric [32, 34]. Furthermore, we utilize the WikiCS dataset and the splits provided in Rozemberczki et al. [31].

Table 2: Test accuracy on homophilous graph benchmarks. Best and second-best are highlighted.

| Dataset | Cora | Citeseer | Pubmed | WikiCS | Computer | Photo |
|---|---|---|---|---|---|---|
| GCN | 81.60 ± 0.40 | 71.60 ± 0.40 | 78.80 ± 0.60 | 77.47 ± 0.85 | 89.65 ± 0.52 | 92.70 ± 0.20 |
| GraphSAGE | 82.68 ± 0.47 | 71.93 ± 0.85 | 79.41 ± 0.53 | 74.77 ± 0.95 | 91.20 ± 0.29 | 94.59 ± 0.14 |
| GAT | 83.00 ± 0.70 | 72.10 ± 1.10 | 79.00 ± 0.40 | 76.91 ± 0.82 | 90.78 ± 0.13 | 93.87 ± 0.11 |
| GraphGPS | 82.84 ± 1.03 | 72.73 ± 1.23 | 79.94 ± 0.26 | 78.66 ± 0.49 | 91.19 ± 0.54 | 95.06 ± 0.13 |
| TabPFN | 57.30 ± 0.00 | 51.50 ± 0.00 | 65.30 ± 0.00 | 72.08 ± 0.59 | 76.70 ± 0.00 | 93.27 ± 0.00 |
| GraphAny | 79.38 ± 0.16 | 68.10 ± 0.04 | 76.30 ± 0.09 | 74.95 ± 0.61 | 83.04 ± 1.24 | 90.60 ± 0.82 |
| TabPFN-GN | 81.98 ± 0.45 | 72.14 ± 0.58 | 82.74 ± 0.10 | 79.40 ± 0.77 | 92.71 ± 0.03 | 93.55 ± 0.05 |

Table 3: Test accuracy on heterophilous graph benchmarks.

| Dataset | Chameleon | Squirrel | Cornell | Texas | Actor | Wisconsin |
|---|---|---|---|---|---|---|
| GCN | 41.31 ± 3.05 | 38.67 ± 1.84 | 43.78 ± 3.15 | 59.73 ± 9.70 | 25.87 ± 1.21 | 47.65 ± 6.20 |
| GraphSAGE | 37.77 ± 4.14 | 36.09 ± 1.99 | 70.73 ± 6.59 | 60.20 ± 7.21 | 31.24 ± 1.71 | 41.15 ± 5.65 |
| GAT | 39.21 ± 3.08 | 35.62 ± 2.06 | 54.60 ± 7.90 | 60.54 ± 6.22 | 27.82 ± 0.28 | 44.31 ± 8.16 |
| H2GCN | 26.75 ± 3.64 | 35.10 ± 1.15 | 71.62 ± 5.57 | 79.73 ± 3.25 | 36.18 ± 0.45 | 77.57 ± 4.11 |
| GPRGNN | 39.93 ± 3.30 | 38.95 ± 1.99 | 80.27 ± 8.11 | 78.38 ± 4.36 | 35.30 ± 0.80 | 82.66 ± 5.62 |
| TabPFN | 45.16 ± 4.32 | 37.51 ± 1.27 | 72.70 ± 6.33 | 79.19 ± 3.83 | 36.37 ± 1.31 | 82.55 ± 4.15 |
| GraphAny | 39.98 ± 3.12 | 38.74 ± 2.01 | 65.94 ± 1.48 | 72.97 ± 2.71 | 28.60 ± 0.21 | 71.77 ± 5.98 |
| TabPFN-GN | 49.11 ± 4.34 | 46.66 ± 1.43 | 74.05 ± 6.96 | 80.81 ± 4.75 | 37.22 ± 1.08 | 85.10 ± 4.66 |
For Chameleon and Squirrel, we use the splits from Platonov et al. [33], and for the other heterophilous datasets, we use the splits from Pei et al. [26]. More detailed settings are provided in Appendix C.

Baselines. We compare against standard GNNs (GCN [22], GraphSAGE [23], GAT [24]), GraphGPS [35], which combines graph Transformers with local GNNs, and specialized heterophilous models (H2GCN [36], GPRGNN [37]). For GraphAny [38], we use its arxiv-pretrained checkpoint for heterophilous datasets. For a fair comparison, we re-evaluate GraphAny on Cora, Citeseer, and Pubmed using our experimental setup.

Empirical Comparison. As shown in Table 2, TabPFN-GN achieves competitive performance on homophilous benchmarks, ranking first on Pubmed, WikiCS, and Computer. As shown in Table 3, on heterophilous graphs TabPFN-GN achieves the best performance in all cases except Cornell, consistently outperforming even models designed specifically for such graphs, such as H2GCN and GPRGNN. In particular, TabPFN-GN outperforms vanilla TabPFN on all datasets and consistently outperforms GraphAny.

5 Discussion and Conclusion

Limitations. TabPFN's constraint on the number of classes prevents application to datasets like ogbn-arxiv (40 classes) [39]. While TabPFN-GN excels on heterophilous graphs, the synthetic prior lacks explicit graph connectivity patterns, potentially limiting performance on strongly homophilous networks. Future work should explore pretraining with graph-aware synthetic datasets and comprehensive comparisons with LLM-based graph foundation models.

Conclusion. We introduced TabPFN-GN, demonstrating that graph node classification can be effectively reformulated as tabular learning via principled feature engineering that combines positional encodings, structural features, node attributes, and optional neighborhood aggregation.
This tabularization achieves competitive performance without graph-specific training, particularly on heterophilous graphs.

References

[1] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877-1901, 2020.

[2] Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9, 2019.

[3] Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. ALBERT: A lite BERT for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942, 2019.

[4] Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7871-7880, 2020.

[5] Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al. DINOv2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193, 2023.

[6] Yutong Bai, Xinyang Geng, Karttikeya Mangalam, Amir Bar, Alan L Yuille, Trevor Darrell, Jitendra Malik, and Alexei A Efros. Sequential modeling enables scalable learning for large vision models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22861-22872, 2024.

[7] Abhimanyu Das, Weihao Kong, Rajat Sen, and Yichen Zhou.
A decoder-only foundation model for time-series forecasting. In Forty-first International Conference on Machine Learning, 2024.

[8] Xu Liu, Juncheng Liu, Gerald Woo, Taha Aksu, Yuxuan Liang, Roger Zimmermann, Chenghao Liu, Silvio Savarese, Caiming Xiong, and Doyen Sahoo. Moirai-MoE: Empowering time series foundation models with sparse mixture of experts. arXiv preprint arXiv:2410.10469, 2024.

[9] Andreas Auer, Patrick Podest, Daniel Klotz, Sebastian Böck, Günter Klambauer, and Sepp Hochreiter. TiRex: Zero-shot forecasting across long and short horizons with enhanced in-context learning. arXiv preprint arXiv:2505.23719, 2025.

[10] Zhikai Chen, Haitao Mao, Jingzhe Liu, Yu Song, Bingheng Li, Wei Jin, Bahare Fatemi, Anton Tsitsulin, Bryan Perozzi, Hui Liu, et al. Text-space graph foundation models: Comprehensive benchmarks and new insights. Advances in Neural Information Processing Systems, 37:7464-7492, 2024.

[11] Runjin Chen, Tong Zhao, Ajay Jaiswal, Neil Shah, and Zhangyang Wang. LLaGA: Large language and graph assistant. arXiv preprint arXiv:2402.08170, 2024.

[12] Yuhan Li, Peisong Wang, Xiao Zhu, Aochuan Chen, Haiyun Jiang, Deng Cai, Victor W Chan, and Jia Li. GLBench: A comprehensive benchmark for graph with large language models. Advances in Neural Information Processing Systems, 37:42349-42368, 2024.

[13] Jiabin Tang, Yuhao Yang, Wei Wei, Lei Shi, Lixin Su, Suqi Cheng, Dawei Yin, and Chao Huang. GraphGPT: Graph instruction tuning for large language models. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 491-500, 2024.

[14] Noah Hollmann, Samuel Müller, Katharina Eggensperger, and Frank Hutter. TabPFN: A transformer that solves small tabular classification problems in a second. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=cp5PvcI6w8_.

[15] Noah Hollmann, Samuel Müller, Lennart Purucker, Arjun Krishnakumar, Max Körfer, Shi Bin Hoo, Robin Tibor Schirrmeister, and Frank Hutter. Accurate predictions on small data with a tabular foundation model. Nature, 637(8045):319-326, 2025.

[16] Shi Bin Hoo, Samuel Müller, David Salinas, and Frank Hutter. The tabular foundation model TabPFN outperforms specialized time series forecasting models based on simple features. In NeurIPS Workshop on Time Series in the Age of Large Models, 2024.

[17] Shi Bin Hoo, Samuel Müller, David Salinas, and Frank Hutter. From tables to time: How TabPFN-v2 outperforms specialized time series forecasting models. arXiv preprint arXiv:2501.02945, 2025.

[18] Hao Liu, Jiarui Feng, Lecheng Kong, Ningyue Liang, Dacheng Tao, Yixin Chen, and Muhan Zhang. One for all: Towards training one graph model for all classification tasks. In The Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=4IT2pgc9v6.

[19] Xiaoxin He, Xavier Bresson, Thomas Laurent, Adam Perold, Yann LeCun, and Bryan Hooi. Harnessing explanations: LLM-to-LM interpreter for enhanced text-attributed graph representation learning. In The Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=RXFVcynVe1.

[20] Ziwei Chai, Tianjie Zhang, Liang Wu, Kaiqiao Han, Xiaohai Hu, Xuanwen Huang, and Yang Yang. GraphLLM: Boosting graph reasoning ability of large language model. arXiv preprint arXiv:2310.05845, 2023.

[21] Ruosong Ye, Caiqi Zhang, Runhui Wang, Shuyuan Xu, and Yongfeng Zhang. Language is all a graph needs. In EACL (Findings), 2024.

[22] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks.
In Proceedings of the International Conference on Learning Representations (ICLR), 2017.

[23] Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems (NeurIPS), 2017.

[24] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks. In Proceedings of the International Conference on Learning Representations (ICLR), 2018.

[25] Jeongwhan Choi, Seoyoung Hong, Noseong Park, and Sung-Bae Cho. GREAD: Graph neural reaction-diffusion networks. In International Conference on Machine Learning (ICML), pages 5722-5747. PMLR, 2023.

[26] Hongbin Pei, Bingzhe Wei, Kevin Chen-Chuan Chang, Yu Lei, and Bo Yang. Geom-GCN: Geometric graph convolutional networks. arXiv preprint arXiv:2002.05287, 2020.

[27] Ulrik Brandes. A faster algorithm for betweenness centrality. Journal of Mathematical Sociology, 25(2):163-177, 2001.

[28] Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford InfoLab, 1999.

[29] Ladislav Rampášek, Michael Galkin, Vijay Prakash Dwivedi, Anh Tuan Luu, Guy Wolf, and Dominique Beaini. Recipe for a general, powerful, scalable graph transformer. Advances in Neural Information Processing Systems, 35:14501-14515, 2022.

[30] Felix Wu, Tianyi Zhang, Amauri Holanda de Souza, Christopher Fifty, Tao Yu, and Kilian Q. Weinberger. Simplifying graph convolutional networks. In International Conference on Machine Learning (ICML), 2019.

[31] Benedek Rozemberczki, Carl Allen, and Rik Sarkar. Multi-scale attributed node embedding. Journal of Complex Networks, 9(2):cnab014, 2021.

[32] Hamed Shirzad, Ameya Velingker, Balaji Venkatachalam, Danica J Sutherland, and Ali Kemal Sinop. Exphormer: Sparse transformers for graphs.
In International Conference on Machine Learning, pages 31613-31632. PMLR, 2023.

[33] Oleg Platonov, Denis Kuznedelev, Michael Diskin, Artem Babenko, and Liudmila Prokhorenkova. A critical look at the evaluation of GNNs under heterophily: Are we really making progress? arXiv preprint arXiv:2302.11640, 2023.

[34] Chenhui Deng, Zichao Yue, and Zhiru Zhang. Polynormer: Polynomial-expressive graph transformer in linear time. In The Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=hmv1LpNfXa.

[35] Ladislav Rampášek, Michael Galkin, Vijay Prakash Dwivedi, Anh Tuan Luu, Guy Wolf, and Dominique Beaini. Recipe for a general, powerful, scalable graph transformer. In Advances in Neural Information Processing Systems (NeurIPS), volume 35, pages 14501-14515, 2022.

[36] Jiong Zhu, Yujun Yan, Lingxiao Zhao, Mark Heimann, Leman Akoglu, and Danai Koutra. Beyond homophily in graph neural networks: Current limitations and effective designs. Advances in Neural Information Processing Systems, 33:7793-7804, 2020.

[37] Eli Chien, Jianhao Peng, Pan Li, and Olgica Milenkovic. Adaptive universal generalized PageRank graph neural network. In Proceedings of the International Conference on Learning Representations (ICLR), 2021.

[38] Jianan Zhao, Zhaocheng Zhu, Mikhail Galkin, Hesham Mostafa, Michael M. Bronstein, and Jian Tang. Fully-inductive node classification on arbitrary graphs. In The Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=1Qpt43cqhg.

[39] Weihua Hu, Matthias Fey, Hongyu Ren, Maho Nakata, Yuxiao Dong, and Jure Leskovec. OGB-LSC: A large-scale challenge for machine learning on graphs. arXiv preprint arXiv:2103.09430, 2021.

[40] Lianghao Xia and Chao Huang. AnyGraph: Graph foundation model in the wild. arXiv preprint arXiv:2408.10700, 2024.

[41] Haihong Zhao, Aochuan Chen, Xiangguo Sun, Hong Cheng, and Jia Li. All in one and one for all: A simple yet effective method towards cross-domain graph pretraining. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 4443-4454, 2024.

[42] Ben Finkelshtein, İsmail İlkan Ceylan, Michael Bronstein, and Ron Levie. Equivariance everywhere all at once: A recipe for graph foundation models. arXiv preprint arXiv:2506.14291, 2025.

[43] Ruosong Ye, Caiqi Zhang, Runhui Wang, Shuyuan Xu, and Yongfeng Zhang. Language is all a graph needs. arXiv preprint arXiv:2308.07134, 2023.

[44] Jianan Zhao, Le Zhuo, Yikang Shen, Meng Qu, Kai Liu, Michael Bronstein, Zhaocheng Zhu, and Jian Tang. GraphText: Graph reasoning in text space. arXiv preprint arXiv:2310.01089, 2023.

[45] Dmitry Eremeev, Gleb Bazhenov, Oleg Platonov, Artem Babenko, and Liudmila Prokhorenkova. Turning tabular foundation models into graph foundation models. arXiv preprint arXiv:2508.20906, 2025.

[46] Adrian Hayler, Xingyue Huang, İsmail İlkan Ceylan, Michael Bronstein, and Ben Finkelshtein. Of graphs and tables: Zero-shot node classification with tabular foundation models. arXiv preprint arXiv:2509.07143, 2025.

[47] Yuankai Luo, Lei Shi, and Xiao-Ming Wu. Classic GNNs are strong baselines: Reassessing GNNs for node classification. Advances in Neural Information Processing Systems, 37:97650-97669, 2024.

[48] Christopher Morris, Nils M. Kriege, Franka Bause, Kristian Kersting, Petra Mutzel, and Marion Neumann. TUDataset: A collection of benchmark datasets for learning with graphs. In ICML 2020 Workshop on Graph Representation Learning and Beyond (GRL+ 2020), 2020. URL www.graphlearning.io.

[49] Mitchell Keren Taraday, Almog David, and Chaim Baskin. Sequential signal mixing aggregation for message passing graph neural networks.
Advances in Neural Information Processing Systems, 37:93985-94021, 2024.

[50] Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? arXiv preprint, 2018.

[51] Gabriele Corso, Luca Cavalleri, Dominique Beaini, Pietro Liò, and Petar Veličković. Principal neighbourhood aggregation for graph nets. Advances in Neural Information Processing Systems, 33:13260-13271, 2020.

A Dataset Statistics

We list the statistics of the datasets we used in Tables 4 and 5.

Table 4: Homophily dataset statistics for node classification benchmarks.

| | Cora | Citeseer | Pubmed | Computer | Photo | WikiCS |
|---|---|---|---|---|---|---|
| #Nodes | 2,708 | 3,327 | 19,717 | 13,752 | 7,650 | 11,701 |
| #Edges | 5,278 | 4,676 | 44,327 | 245,861 | 119,081 | 216,123 |
| #Features | 1,433 | 3,703 | 500 | 767 | 745 | 300 |
| #Classes | 6 | 7 | 3 | 10 | 8 | 10 |

Table 5: Heterophily dataset statistics for node classification benchmarks.

| | Texas | Wisconsin | Cornell | Actor | Squirrel | Chameleon |
|---|---|---|---|---|---|---|
| #Nodes | 183 | 251 | 183 | 7,600 | 2,223 | 890 |
| #Edges | 295 | 466 | 280 | 26,752 | 46,998 | 8,854 |
| #Features | 1,703 | 1,703 | 1,703 | 931 | 2,089 | 2,325 |
| #Classes | 5 | 5 | 5 | 5 | 5 | 5 |

B Positional Encodings

Laplacian Positional Encoding (LapPE). Given a graph G = (V, E) with adjacency matrix A and degree matrix D, the normalized graph Laplacian is defined as

L = I − D^(−1/2) A D^(−1/2) = U Λ Uᵀ,  (2)

where U = [u_1, u_2, ..., u_n] contains orthonormal eigenvectors and Λ is the diagonal matrix of eigenvalues 0 = λ_1 ≤ λ_2 ≤ ... ≤ λ_n ≤ 2. LapPE [35] uses the first k non-trivial eigenvectors as positional features for node v:

LapPE(v) = [u_2(v), u_3(v), ..., u_{k+1}(v)] ∈ ℝ^k.  (3)

These eigenvectors provide a spectral coordinate system in which geometrically close nodes have similar encodings.

Random Walk Structural Encoding (RWSE). RWSE [35] encodes the return probabilities of random walks starting from each node. Let P = D^(−1) A be the transition matrix.
The probability of a random walk from node v returning to itself in exactly i steps is

p_i(v) = [P^i]_{v,v} = diag(P^i)[v].  (4)

RWSE computes these probabilities for walks of different lengths:

RWSE(v) = [p_1(v), p_2(v), ..., p_k(v)] ∈ ℝ^k.  (5)

This encoding captures multi-scale structural information: p_1(v) reflects immediate neighborhood density (related to degree), while larger values of i capture broader topological patterns and community structures.

C Detailed Experimental Settings

Hardware and Software Specifications. Our implementation is based on PyG and TabPFN. We run the experiments on a single NVIDIA RTX A6000 GPU with CUDA 12.4, NVIDIA Driver 550.54.14, and an i9 CPU.

Hyperparameter Configurations. We conducted experiments with the following hyperparameter search space:

• Truncated SVD dimensions: {None, 16, 32, 64, 128, 256}
• PE type: {Laplacian PE, Random Walk SE}
• PE dimensions: {4, 8, 16, 32, 64}
• Local structural features: degree, clustering coefficient, triangle count
• Global structural features: betweenness centrality, PageRank
• L-step linear graph convolutions: L ∈ {0, 1, 2}

Feature Selection Strategy. Our framework allows flexible feature combination: we can use all feature types comprehensively or selectively choose subsets based on dataset characteristics. For datasets with sufficient original node features (e.g., Citeseer with 3,703 features), we do not apply truncated SVD for dimensionality reduction, preserving the full feature information. For heterophilous datasets, where neighborhood aggregation assumptions are violated, we exclude smoothed features from linear graph convolutions (i.e., set L = 0).

TabPFN-GN Inference Protocol. For TabPFN-GN's inference interface, we strictly maintain the integrity of the train/validation/test split.
Only the training nodes with their labels (X_train, y_train) are provided as context to TabPFN-GN. The validation and test nodes are treated as query nodes, with TabPFN-GN predicting their labels based only on the training context. We ensure no label leakage by never exposing validation or test labels during inference, maintaining a fair comparison with supervised GNN baselines that follow the same data split protocol.

D Additional Related Work

Recent advances in graph foundation models have explored various directions toward generalizable and zero-shot graph learning. AnyGraph [40] addresses distribution shifts in graph data by employing a Mixture-of-Experts (MoE) architecture with dynamic routing, resulting in strong zero-shot performance and fast adaptation to new datasets. GCOPE [41] enables unified pretraining across multiple graph domains by linking datasets with learnable coordinator nodes and aligning features via SVD, which mitigates the negative transfer of isolated pretraining and yields strong few-shot node-classification transfer. The TS-GNNs framework [42] introduces a recipe for building graph foundation models based on a 'triple-symmetry' principle: equivariance to node and label permutations and invariance to feature permutations, thereby achieving strong zero-shot generalization across diverse datasets.

Several studies have explored language-graph integration [12]. Ye et al. [43] proposed InstructGLM, a framework that represents graph structures through flexible, scalable natural language descriptions. Instruction-finetuning an LLM with these descriptive prompts demonstrates superior performance over traditional GNN baselines on node classification and link prediction.
GraphText [44] translates graphs into natural language by constructing a graph-syntax tree over node attributes and relationships and then traversing it to produce a textual prompt that supports training-free reasoning via in-context learning, while also being adaptable to instruction tuning. LLaGA [11] integrates LLMs with graph data by reorganizing nodes into structure-aware sequences and mapping them into the token-embedding space with a versatile projector. This approach allows a single general-purpose model to achieve strong performance across various graph tasks and datasets, even outperforming specialized GNNs in both supervised and zero-shot scenarios. The One-for-All (OFA) framework [18] trains a single GNN by unifying cross-domain graphs as text-attributed graphs and standardizing node, link, and graph tasks via nodes-of-interest subgraphs and their prompt nodes. It also introduces a novel graph-prompting paradigm that enables in-context learning, allowing the model to achieve few-shot and zero-shot capabilities without requiring fine-tuning. Recently, two concurrent works, Eremeev et al. [45] and Hayler et al. [46], share conceptual closeness to our TabPFN-GN by reformulating graph learning as tabular inference.

E Additional Studies

E.1 Comparison with LLM-based Graph Methods on GLBench

Following the experimental setting of recent LLM-based graph methods, we conduct supervised node-classification experiments on all the datasets in GLBench [12]^1. TabPFN-GN achieves competitive or superior performance compared to LLM-based graph foundation models without requiring text descriptions or language-model dependencies. While LLM-based methods leverage pre-trained language knowledge, TabPFN-GN leverages pre-trained patterns from massive synthetic prior data.

Table 6: Accuracy under the supervised setting of GLBench [12].
Best and second-best are highlighted.

Method             Cora    Citeseer  Pubmed  WikiCS
InstructGLM [43]   69.10   51.87     71.26   45.73
GraphText [44]     76.21   59.43     75.11   67.35
LLaGA [11]         74.42   55.73     68.82   73.88
OFA [18]           75.24   73.04     75.61   77.34
TabPFN-GN (Ours)   76.45   63.33     66.74   77.72

E.2 Comparison with Tuned GNNs

The GNNs in the main experiments do not use residual connections or other specific design options explored in Luo et al. [47]. To compare TabPFN-GN against that setting, we report results on Chameleon and Squirrel using tuned GNNs (e.g., GCN*, GraphSAGE*, GAT*) from Luo et al. [47].

Table 7: Comparison with Luo et al. [47]'s setting.

Method        Chameleon      Squirrel
GCN*          46.29 ± 3.40   45.01 ± 1.63
GAT*          44.13 ± 4.17   41.73 ± 2.07
GraphSAGE*    44.81 ± 7.04   40.78 ± 1.47
TabPFN-GN     49.11 ± 4.34   46.66 ± 1.43

E.3 Applicability to Graph Classification

TabPFN-GN extends naturally to graph-level tasks by applying pooling operations (e.g., sum) over node features to obtain graph-level representations. We evaluate on the IMDB-BINARY (1,000 graphs), MUTAG (188 graphs), ENZYMES (600 graphs), and PROTEINS (1,113 graphs) tasks from TUDatasets [48]. We report the mean accuracy and standard deviation over 10-fold cross-validation in Table 8. We follow the same settings as Keren Taraday et al. [49] and reuse their reported baseline results. In Table 8, TabPFN-GN achieves the best performance on all four benchmarks, with particularly strong improvements on ENZYMES.
Table 8: Results on TUDataset [48].

Method       PROTEINS       ENZYMES        MUTAG          IMDB-BINARY
GCN [22]     75.39 ± 4.53   51.00 ± 10.63  84.23 ± 9.86   68.8 ± 3.49
GAT [24]     73.32 ± 3.08   50.67 ± 4.92   75.51 ± 11.72  51.0 ± 6.07
GIN [50]     73.30 ± 5.11   49.50 ± 4.58   86.45 ± 8.17   71.3 ± 3.97
PNA [51]     74.86 ± 4.57   52.50 ± 4.60   84.19 ± 9.44   71.9 ± 4.46
TabPFN-GN    76.80 ± 3.78   61.43 ± 5.76   88.36 ± 6.07   73.3 ± 3.83

^1 https://github.com/NineAbyss/GLBench
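As a minimal sketch of the graph-level extension in E.3, the snippet below sum-pools per-node feature rows into one fixed-length row per graph, so that each graph becomes a single tabular example for TabPFN. The function names and plain-list representation are illustrative assumptions of ours; the actual pipeline operates on PyG data objects.

```python
def sum_pool(node_features):
    """Sum-pool a graph's node-feature rows (equal-length lists) into a
    single graph-level feature vector."""
    dim = len(node_features[0])
    pooled = [0.0] * dim
    for row in node_features:
        for j in range(dim):
            pooled[j] += row[j]
    return pooled

def tabularize_graphs(graphs):
    """Turn a list of graphs (each a list of node-feature rows) into a
    table with one row per graph, ready for TabPFN-style classification."""
    return [sum_pool(g) for g in graphs]

# Two toy graphs with 3 and 2 nodes, each node described by 2 features.
g1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
g2 = [[2.0, 0.5], [0.0, 0.5]]
print(tabularize_graphs([g1, g2]))  # [[2.0, 2.0], [2.0, 1.0]]
```

Other permutation-invariant pooling operations (mean, max) could be substituted the same way; the paper's experiments use sum as the example.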