
📝 Original Info

  • ArXiv ID: 2512.21106

📝 Abstract

Graph-structured data exhibit substantial heterogeneity in where their predictive signals originate: in some domains, node-level semantics dominate, while in others, structural patterns play a central role. This structure-semantics heterogeneity implies that no graph learning model with a fixed inductive bias can generalize optimally across diverse graph domains. However, most existing methods address this challenge from the model side by incrementally injecting new inductive biases, which remains fundamentally limited given the open-ended diversity of real-world graphs. In this work, we take a data-centric perspective and treat node semantics as a task-adaptive variable. We propose a Data-Adaptive Semantic Refinement framework (DAS) for graph representation learning, which couples a fixed graph neural network (GNN) and a large language model (LLM) in a closed feedback loop. The GNN provides implicit supervisory signals to guide the semantic refinement performed by the LLM, and the refined semantics are fed back to update the same graph learner. We evaluate our approach on both text-rich and text-free graphs. Results show consistent improvements on structure-dominated graphs while remaining competitive on semantics-rich graphs, demonstrating the effectiveness of data-centric semantic adaptation under structure-semantics heterogeneity.

📄 Full Content

Graph-structured data (Wu et al., 2020; Zhang et al., 2020) are ubiquitous in the real world, arising in diverse domains such as citation networks, social platforms, molecular interaction systems, and transportation infrastructures. Despite sharing the same graph abstraction, these domains differ fundamentally in where their predictive signals originate. In citation networks, for example, each node represents a scientific document whose topical content and research focus are explicitly encoded in natural language. Here, node-level semantics, captured by titles, abstracts, or full texts, often provide the primary discriminative signal for downstream tasks, while citation links mainly serve as a contextual scaffold that propagates and regularizes semantic information (Greenberg, 2009; Zhao and Strotmann, 2015; Zhang et al., 2019b). By contrast, in domains such as molecular graphs or transportation networks, semantic attributes are weak or even absent (Wu et al., 2018). Instead, node identity and functionality are determined predominantly by structural roles and global topological patterns, such as motifs, connectivity configurations, and relative positional relationships (Chen et al., 2020; Zhang et al., 2024a; Wang et al., 2025c). These examples demonstrate that predictive signals in real-world graphs may be dominated by semantics, dominated by structure, or arise from their intricate interplay.

This observation leads to a fundamental and unavoidable consequence: the balance between semantics and structure is inherently domain-dependent, rather than governed by a universal principle. As a result, no graph learning model with a fixed inductive bias can perform optimally across graph domains with drastically different structure-semantics regimes (Platonov et al., 2023).

However, translating this observation into a practical learning system remains challenging. For a new graph, the dominant source of predictive signal, whether driven by semantics, structure, or their interaction, is unknown a priori, yet both the model and the data representation must commit to specific inductive biases in advance. Modern GNNs encode fixed architectural preferences once chosen, favoring, for example, locality (Veličković et al., 2018), long-range dependencies (Xu et al., 2019; Rampášek et al., 2022), or substructure information (Wang et al., 2025b,d). Meanwhile, node representations, whether feature vectors (Kipf and Welling, 2017), textual embeddings (Wang et al., 2024b), or structural descriptors (Perozzi et al., 2014; Grover and Leskovec, 2016), are typically constructed in a predefined manner and kept fixed throughout training. As a result, the learning system becomes implicitly specialized to a particular structure-semantics regime. When this specialization is mismatched with the true signal distribution of the target graph, performance degrades systematically, and adaptation in practice is often reduced to empirical model and feature selection rather than a principled mechanism.

To balance semantics and structure, most existing methods approach this problem primarily from the model side. One line of work adapts GNN architectures by redesigning message passing (Morris et al., 2019; Zhang et al., 2019a; Fan et al., 2022), incorporating adaptive aggregation (Ying et al., 2018), or injecting positional encodings (Murphy et al., 2019), thereby embedding different inductive biases into the model. Beyond architectural modifications, another line of work introduces external reasoning models (Chen et al., 2024b; Wang et al., 2023a), most notably large language models (LLMs) (Zhao et al., 2023b; Ye et al., 2025), which process graph structures and node attributes in textual form. In parallel, other methods rely on auxiliary models (Chen et al., 2024b) to generate additional semantic signals, such as synthetic attributes (He et al., 2023), that are subsequently consumed by a downstream GNN. Despite their empirical success, these approaches fundamentally rely on incrementally injecting model-level inductive biases, which cannot guarantee universal adaptability across open-ended and structurally diverse graph domains.

In this work, we take a fundamentally different stance on structure-semantics heterogeneity by shifting the adaptation from the model to the data. Instead of continually expanding model-level inductive biases, we treat node semantics as a task-adaptive variable. This shift is motivated by the observation that the balance between structure and semantics is ultimately realized through the input representations consumed by the model, rather than through the architecture alone. As a result, misalignment on new graph domains often arises from fixed node semantics that fail to reflect the graph-specific source of predictive signal.

Building on this perspective, we propose Data-Adaptive Semantic Refinement (DAS), a data-centric, feedback-driven framework for iterative semantic adaptation. Starting from initial node descriptions or structure-derived verbalizations (Wang et al., 2025a), we train a fixed GNN for the downstream task and use its predictions as implicit supervision. A large language model then refines node semantics by conditioning on both structural context and model behavior, and the refined descriptions are fed back to update the same graph learner. By iterating this closed loop for a small number of rounds, DAS progressively aligns node semantics with the structure-semantics regime of the target graph without modifying the underlying model. We evaluate DAS on both text-attributed and text-free graphs, where it consistently improves performance on structure-dominated graphs while remaining competitive on semantics-rich graphs.

We consider a graph $G = (V, E)$ with node set $V$ and edge set $E$. Each node $v \in V$ is associated with an initial description $r_v$, which is either a natural language text (in text-attributed graphs) or a structure-derived verbalization (in text-free graphs). A subset of nodes $V_{\mathrm{train}} \subseteq V$ is labeled with $y_v \in \mathcal{Y}$. Given a GNN $g_\theta$ for node classification, we treat node semantics as adaptive variables. Our objective is to iteratively refine node descriptions $\{d_v\}_{v \in V}$ from the initial inputs $\{r_v\}$, such that the refined semantics better align with both the graph structure and the downstream prediction task. After $T$ refinement steps, the final classifier $g_\theta$ trained on $\{d_v^{(T)}\}$ is used for evaluation.

DAS is a data-centric, feedback-driven framework that adapts node semantics under structural context and task supervision while keeping the graph model fixed. Instead of modifying model architectures to handle heterogeneous structure-semantics regimes, DAS treats node descriptions as adaptive states that are iteratively refined through a closed loop between a GNN and an LLM.

As illustrated in Figure 2, DAS operates over $T$ refinement iterations. At iteration $t$, each node $v \in V$ is associated with a description $d_v^{(t)}$, which is encoded into node features and fed into a fixed GNN $g_\theta$ to produce predictions $p_v^{(t)}$. The current descriptions and predictions are stored in a history buffer $B^{(t)}$, from which an in-graph support set $S_v^{(t)}$ is retrieved for each node. Conditioned on $d_v^{(t)}$ and $S_v^{(t)}$, the LLM refines the node semantics as $d_v^{(t+1)} = \mathcal{M}\big(d_v^{(t)}, S_v^{(t)}\big)$. The refined descriptions are then fed back to the same GNN, completing one refinement iteration. Through this feedback-driven loop, DAS progressively aligns node semantics with the structure-semantics regime of the target graph. Unlike prior LLM-enhanced methods that rely on fixed prompts or exemplars (He et al., 2023; Chen et al., 2024b; Wang et al., 2025a), DAS enables task-conditioned semantic evolution guided by model behavior.

We construct initial node descriptions by expressing structural information in natural language, so that both semantic and structural cues can be processed in a unified textual space. Following Wang et al. (2025a), for each node v ∈ V we compute a small set of structural statistics, including degree, betweenness, closeness, clustering coefficient, and square clustering coefficient (Zhang and Luo, 2017;Saramäki et al., 2007;Zhang et al., 2008). We present a detailed discussion in Appendix C.

To eliminate scale variation across graphs, each statistic is converted into a percentile rank within the graph. These normalized values are then mapped into a concise structural summary $t_v^{\mathrm{struct}}$ via a fixed template, as shown in Appendix I. For text-attributed graphs, we set $d_v^{(0)} = r_v \oplus t_v^{\mathrm{struct}}$, i.e., the original node text concatenated with the structural summary, while for text-free graphs we use $d_v^{(0)} = t_v^{\mathrm{struct}}$. This design expresses both semantic and structural information in a single textual modality, enabling consistent encoding and subsequent refinement.
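The following sketch illustrates this initialization step in Python. The percentile-rank conversion follows the description above; the template wording is illustrative and stands in for the exact Appendix I template, and `raw_stats` is assumed to hold precomputed statistics (e.g., from a graph library; see Appendix C).

```python
from scipy.stats import rankdata

def verbalize_structure(raw_stats: dict[str, dict[int, float]]) -> dict[int, str]:
    """Convert raw per-node statistics into percentile ranks and a templated
    structural summary t_v^struct. The template text is a stand-in."""
    nodes = sorted(next(iter(raw_stats.values())).keys())
    pct = {}
    for name, values in raw_stats.items():
        raw = [values[v] for v in nodes]
        ranks = rankdata(raw) / len(raw) * 100.0  # percentile rank within this graph
        pct[name] = dict(zip(nodes, ranks))

    summaries = {}
    for v in nodes:
        parts = [f"{name} at the {pct[name][v]:.0f}th percentile" for name in raw_stats]
        summaries[v] = "Structural profile: " + ", ".join(parts) + "."
    return summaries
```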

We maintain a model-conditioned memory to explicitly represent how node semantics, graph structure, and task predictions interact during each refinement stage. At iteration $t$, the memory is denoted as $B^{(t)} = \{b_v^{(t)}\}_{v \in V}$, which stores node-level states induced by the current descriptions $\{d_v^{(t)}\}$ under the fixed GNN $g_\theta$. For each node $v \in V$, the memory stores a joint state triple $b_v^{(t)} = \big(d_v^{(t)}, s_v, p_v^{(t)}\big)$, where $d_v^{(t)}$ is the current textual description, $s_v$ denotes a structure-oriented embedding encoding the node's topological role via struc2vec (Ribeiro et al., 2017), and $p_v^{(t)}$ is the predictive distribution produced by the GNN. This triple defines a semantic-structural-predictive state for each node under the current representation.

Memory Construction and Update. At $t = 0$, the initial memory $B^{(0)}$ is constructed from the initial descriptions $\{d_v^{(0)}\}$, the fixed structural embeddings $\{s_v\}$, and the GNN predictions obtained using these features. At each subsequent iteration $t > 0$, the descriptions are updated to $\{d_v^{(t)}\}$; the refreshed descriptions, structural embeddings, and GNN predictions are then used to overwrite the previous memory entries, yielding an updated memory $B^{(t)}$. This rolling update scheme ensures that the memory always reflects the latest alignment between node semantics, graph structure, and task-specific behavior.
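A minimal data-structure sketch of this memory, assuming plain Python containers; the class and field names are ours, not the paper's.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class NodeState:
    """Semantic-structural-predictive state b_v^(t)."""
    description: str        # d_v^(t): current textual description
    struct_emb: np.ndarray  # s_v: fixed structural embedding (e.g., struc2vec)
    pred_dist: np.ndarray   # p_v^(t): GNN predictive distribution over classes

class MemoryBuffer:
    """Model-conditioned memory B^(t) with a rolling (overwrite) update."""

    def __init__(self):
        self.states: dict[int, NodeState] = {}

    def update(self, descriptions, struct_embs, pred_dists):
        # Overwrite every node's state so the buffer always reflects the latest
        # alignment between semantics, structure, and task predictions.
        for v, desc in descriptions.items():
            self.states[v] = NodeState(desc, struct_embs[v], pred_dists[v])
```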

Memory Retrieval. Given the memory $B^{(t)}$, the goal of memory retrieval is to identify, for each target node $v$, a small set of in-graph exemplars $S_v^{(t)}$ that are simultaneously semantically relevant, structurally aligned, and reliable under the current classifier. These exemplars serve as task-aware references for subsequent semantic refinement.

To this end, the memory induces a joint semantic-structural similarity space. Let $t_v^{(t)}$ denote the embedding of the current description $d_v^{(t)}$ produced by the text encoder (Wang et al., 2020), and let $s_v$ denote the structural embedding encoding the topological role of node $v$. For any pair of nodes $(v, u)$, we define the semantic similarity $\mathrm{sim}_t(v, u)$ between the description embeddings $t_v^{(t)}$ and $t_u^{(t)}$, and the structural similarity $\mathrm{sim}_s(v, u)$ between the structural embeddings $s_v$ and $s_u$. These two components are combined into a joint similarity score

$$S(v, u) = \alpha\, \mathrm{sim}_t(v, u) + (1 - \alpha)\, \mathrm{sim}_s(v, u),$$

where $\alpha \in [0, 1]$ controls the trade-off between semantic and structural proximity. This design allows the retriever to adapt to different graph regimes, emphasizing textual semantics in text-rich graphs and structural roles in topology-dominated graphs.
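A small sketch of this joint score; cosine similarity is assumed for both components, which the excerpt does not pin down.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def joint_similarity(t_v, t_u, s_v, s_u, alpha: float = 0.5) -> float:
    """S(v, u) = alpha * sim_t(v, u) + (1 - alpha) * sim_s(v, u)."""
    sim_t = cosine(t_v, t_u)  # semantic similarity between description embeddings
    sim_s = cosine(s_v, s_u)  # structural similarity between structural embeddings
    return alpha * sim_t + (1.0 - alpha) * sim_s
```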

For each target node $v$, all candidate nodes $u \in V \setminus \{v\}$ are ranked according to $S(v, u)$. From the top-ranked candidates, we further incorporate model confidence stored in the predictive state $p_u^{(t)}$ to filter unreliable references. Specifically, nodes with low predictive entropy or consistently correct predictions are preferred. The resulting exemplar set $S_v^{(t)}$ thus consists of in-graph references that are not only close to $v$ in the joint semantic-structural space, but also stable with respect to the current task model.

Formally, for each target node $v$, we first rank all candidate nodes $u \in V \setminus \{v\}$ by the joint similarity score $S(v, u)$. Let $C_v^{(t)}$ denote the top-$K$ candidates under this ranking. We then define a confidence score for each candidate node $u$ based on the predictive distribution $p_u^{(t)}$, for example using the normalized entropy

$$\tilde{H}\big(p_u^{(t)}\big) = \frac{H\big(p_u^{(t)}\big)}{\log |\mathcal{Y}|}.$$

The final exemplar set is selected as

$$S_v^{(t)} = \big\{\, u \in C_v^{(t)} : \tilde{H}\big(p_u^{(t)}\big) \le \tau \,\big\},$$

where $\tau$ is a confidence threshold. This ensures that selected exemplars are both similar to $v$ in the joint semantic-structural space and reliable under the current classifier.
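Under the definitions above, retrieval reduces to a ranking plus an entropy filter. A sketch follows; the exact filtering rule is our reading of the text.

```python
import numpy as np

def normalized_entropy(p: np.ndarray) -> float:
    """Entropy of a predictive distribution, normalized to [0, 1]."""
    p = np.clip(p, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum() / np.log(len(p)))

def retrieve_exemplars(v, nodes, joint_sim, pred_dists, K=10, tau=0.5):
    """Top-K candidates in the joint similarity space, kept only if the
    current GNN predicts them confidently (low normalized entropy)."""
    candidates = [u for u in nodes if u != v]
    candidates.sort(key=lambda u: joint_sim(v, u), reverse=True)
    return [u for u in candidates[:K] if normalized_entropy(pred_dists[u]) <= tau]
```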

Given the history memory $B^{(t)}$ at iteration $t$, DAS updates node semantics through an in-context refinement operator. This operator defines how the current description of each node is locally reshaped under task-aligned, in-graph references.

Semantic Refinement Operator. For each node $v \in V$, an exemplar set $S_v^{(t)}$ is first retrieved based on joint semantic-structural similarity and model stability. The large language model $\mathcal{M}$ is then applied as a conditional refinement operator,

$$d_v^{(t+1)} = \mathcal{M}\big(d_v^{(t)}, S_v^{(t)}\big),$$

where $d_v^{(t+1)}$ denotes the refined semantic description at iteration $t + 1$.

The LLM is instructed to perform semantic reweighting and compression rather than knowledge expansion. Specifically, it reconstructs $d_v^{(t)}$ in light of the retrieved references. Because each $S_v^{(t)}$ is drawn from the same graph and filtered by the current classifier, the refinement is implicitly shaped by both structural context and task supervision.
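A sketch of one application of this operator; `llm_call` is an abstract callable from prompt to text, and the prompt wording is a placeholder rather than the paper's exact prompt (Figure 6).

```python
def refine_description(llm_call, d_v: str, exemplars: list[str]) -> str:
    """One refinement step d_v^(t+1) = M(d_v^(t), S_v^(t))."""
    exemplar_block = "\n\n".join(
        f"Reference node {i + 1}:\n{text}" for i, text in enumerate(exemplars)
    )
    prompt = (
        "Rewrite the target node description so that task-relevant evidence is "
        "emphasized and redundant content is compressed. Do not introduce facts "
        "that are not supported by the target description or the references.\n\n"
        f"{exemplar_block}\n\n"
        f"Target node description:\n{d_v}\n\n"
        "Rewritten description:"
    )
    return llm_call(prompt)
```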

Parallel Update. The refinement in Eq. (10) is applied to all nodes in parallel, yielding $D^{(t+1)} = \{d_v^{(t+1)}\}_{v \in V}$. These updated descriptions are re-encoded as node features for the next training stage of the fixed GNN, which in turn updates the predictive state used to construct the next memory $B^{(t+1)}$.

Iterative Semantic Evolution. Starting from the initialization $\{d_v^{(0)}\}$, DAS alternates between GNN training and LLM-based refinement. Early iterations primarily reflect coarse lexical and structural cues, while later iterations progressively concentrate on task-discriminative semantics shaped by model feedback. After $T$ iterations, the final descriptions $\{d_v^{(T)}\}$ are used for evaluation.

To further explain why DAS benefits from iterative semantic refinement, we provide a theoretical analysis in Appendix B. Specifically, we formalize DAS as a generalized Majorization-Minimization procedure whose iterative refinement loop monotonically decreases a task-adaptive surrogate objective.

3.1 Experimental Setup

Datasets. We evaluate on five graphs following Wang et al. (2025a): two text-attributed citation networks, Cora and Pubmed, and three text-free airport networks, USA, Europe, and Brazil (statistics are given in Table 1). For Cora/Pubmed, nodes are papers (title+abstract), edges are citations, and classes are research topics. For the airport graphs, nodes are airports, edges are flight connections, and classes correspond to activity levels (Ribeiro et al., 2017).

Baselines. For text-attributed graphs, we compare against Raw Feat. (bag-of-words/TF-IDF), Raw Text (the original text), TAPE (He et al., 2024), KEA (Chen et al., 2024b), and TANS (Wang et al., 2025a). For text-free graphs, we compare against handcrafted topology features: Node Degree, Eigenvector (Dwivedi et al., 2023), Random Walk (Dwivedi et al., 2022), and TANS (Wang et al., 2025a). In all text-attributed baselines, generated texts are appended to the original node text and encoded by the same sentence encoder for fairness, while for text-free graphs the generated texts are used directly as node descriptions.

Evaluation Protocol. Unless otherwise specified, we focus on node classification with a GCN backbone (Kipf and Welling, 2017). We also report results with GAT (Veličković et al., 2018) and MLP in the text-attributed setting, following Wang et al. (2025a). For single-graph learning, we adopt the low-label / high-label splits: in the low-label regime, we use 20/30 nodes per class for train/valid on Cora/Pubmed (10/20 for Brazil); in the high-label regime, we use a 60/20/20 train/valid/test split. All reported numbers are averages over 30 random seeds (mean ± standard deviation), selecting models by the best validation accuracy. For the text encoder, we adopt MiniLM (Wang et al., 2020) unless otherwise noted.

3.2 Results on Text-Attributed Graphs

Table 2 reports node classification accuracy on text-attributed Cora and Pubmed under both low- and high-label settings, using GCN, GAT, and MLP backbones.

We evaluate DAS on three text-free airport graphs (USA, Europe, Brazil) using GCN as the backbone; accuracy (%) is reported. In the transfer setting, DAS further increases accuracy in the majority of source-target pairs. For example, when transferring from USA to Europe or Brazil, and from Europe to Brazil, we achieve the best performance among all methods. The performance gains of our method are especially notable on more challenging transfers involving Brazil, where structural roles differ markedly across graphs. These improvements suggest that iteratively refined, structure-aware node texts provide a more transferable representation than synthesized descriptions.

This section analyzes the key mechanisms underlying DAS beyond aggregate performance. We focus on how iterative refinement and history-guided exemplar retrieval shape semantic evolution, and characterize when these mechanisms succeed or fail. We provide additional analysis in Appendix F.

The Role of Iterative Refinement. We analyze whether iterative refinement is necessary beyond a single-round semantic update. We vary the number of refinement iterations T ∈ {1, 2, 3} on a text-attributed graph (Cora) and two text-free graphs (USA, Europe). Figure 3 reports the accuracy improvement over using raw node descriptions. Across all datasets, performance improves monotonically as T increases, indicating that semantic refinement benefits from repeated feedback rather than one-shot rewriting. This trend suggests that successive rounds allow the model-conditioned memory to accumulate more reliable exemplars, which in turn guide the LLM toward increasingly task-aligned descriptions. The marginal gain from T=2 to T=3 is small, indicating that a limited number of iterations suffices to realize most of the benefit, while keeping LLM cost manageable.

The Role of Model-Conditioned Memory. We examine the role of the model-conditioned memory in exemplar retrieval for semantic refinement. To isolate its effect, we compare DAS with three ablated variants: Random exemplar selection, Structure-only retrieval based solely on structural similarity, and Text-only retrieval based solely on semantic similarity. All variants use the same refinement procedure and prompt format.

Results in Table 5 show that joint semantic-structural retrieval in DAS consistently outperforms all ablated variants on both Cora and USA. On Cora, text-only retrieval performs competitively, reflecting the strong semantic signal in raw node texts, while structure-only retrieval is weaker. In contrast, on the text-free USA graph, both text-only and structure-only retrieval degrade performance, indicating that neither modality alone is sufficient for stable refinement. Random exemplar selection performs worst in both cases. These results demonstrate that effective refinement requires task-relevant exemplars that are aligned in both semantic content and structural role. The joint retrieval mechanism is particularly critical when node semantics must be induced from topology, where relying on a single modality can introduce noisy or misleading in-context signals.

Why DAS Succeeds: Semantic Sharpening and Role Abstraction. DAS is most effective when iterative refinement sharpens task-discriminative evidence already latent in the input representations. On text-attributed graphs, successful refinement makes class-consistent technical cues more explicit, leading to reduced predictive entropy and corrections toward the true label. On text-free graphs, DAS succeeds when raw topological statistics are reorganized into a coherent semantic role interpretation (e.g., distinguishing regionally embedded nodes from global connectors based on clustering and betweenness). In both cases, refinement aligns node semantics more closely with the structure-semantics regime of the graph, enabling the classifier to make more confident and accurate predictions. Representative examples are provided in Table 7 in the appendix.

When DAS Fails: Drift and Over-Confidence. Failure cases reveal inherent limitations of LLM-based semantic refinement. One common failure mode is label drift, where rewrites improve fluency without introducing additional discriminative evidence, causing predictions to shift toward a semantically adjacent but incorrect class. Another failure mode is over-confidence, in which refinement reduces predictive entropy while the prediction itself remains incorrect. On text-free graphs, we additionally observe occasional attribute drift, where numeric structural attributes are subtly altered during rewriting, raising faithfulness concerns even when predictive accuracy improves. These failures highlight that effective refinement depends on maintaining a tight coupling between generated semantics and the underlying structural evidence. See Table 8 in the appendix for examples.

We provide a more detailed discussion of related work in Appendix A.

GNNs and Fixed Inductive Bias. Classical GNNs learn representations through local message passing, thereby encoding fixed inductive biases such as locality and homophily (Kipf and Welling, 2017;Hamilton et al., 2017;Veličković et al., 2018;Xu et al., 2019). Structure-only methods based on random walks or structural roles further impose predefined topological assumptions (Perozzi et al., 2014;Grover and Leskovec, 2016;Ribeiro et al., 2017). While effective in specific regimes, these methods rely on fixed priors that do not adapt to heterogeneous structure-semantics distributions across graph domains.

Model-Centric Refinement. Most existing attempts to address structure-semantics heterogeneity remain model-centric. This includes strengthening GNN architectures with richer structural priors (Maron et al., 2018;Murphy et al., 2019;Jin et al., 2022), treating LLMs as direct graph reasoners via graph-to-text serialization (Zhao et al., 2023a;Kong et al., 2024;Chen et al., 2024a), and introducing auxiliary models to enrich node representations with external semantic signals (Yao et al., 2019;He et al., 2023;Yang et al., 2021). Despite their diversity, these approaches all inject additional inductive biases from the model side, and the construction of node representations remains largely static once the model is specified.

Existing data-centric methods primarily focus on graph augmentation, structure learning, or pseudo-labeling for robustness and generalization (You et al., 2020; Jin et al., 2020; Chen et al., 2023; He et al., 2023; Chen et al., 2024b), rather than on adapting node semantics to task-specific structure-semantics regimes. In contrast, our work treats node semantics as task-adaptive variables and proposes a feedback-driven framework for iterative, structure-aware semantic refinement, providing a fundamentally different data-centric perspective for handling structure-semantics heterogeneity.

In this work, we proposed DAS, a data-centric framework for iterative, structure-aware node semantic refinement on graphs. By coupling a fixed GNN with a large language model through a model-conditioned history memory, DAS enables node semantics to be progressively adapted under joint structural context and task feedback. Experiments on both text-rich and text-free graphs show that DAS consistently improves over strong LLM-as-enhancer baselines in structure-dominated settings while remaining competitive in semantics-rich regimes. More broadly, this work highlights a new direction for graph learning, where input representations are treated as dynamic, task-adaptive states rather than static features.

Although DAS provides a flexible data-centric mechanism for adapting node semantics to varying graph regimes, several limitations remain. First, the approach incurs non-trivial computational overhead because each refinement round requires a separate LLM inference pass over all nodes. While our experiments suggest that a small number of refinement iterations suffices on medium-sized graphs, scaling DAS to very large graphs or to settings requiring repeated retraining would be challenging without further optimization or model compression strategies. To this end, we can modify the refinement process to refine only uncertain or representative nodes, offering a potential path toward scalability on larger graphs. Second, the quality and faithfulness of refined descriptions depend on the underlying language model. Weaker LLMs may fail to preserve structural cues, and even strong LLMs can occasionally introduce factual inconsistencies. Although our prompts explicitly discourage hallucination and require strict adherence to the original information, semantic drift may still occur in some cases. Finally, our study focuses primarily on node classification. Whether the same refinement dynamics hold for tasks such as link prediction, clustering, or graph-level classification remains an open question and a promising area for future work.

This work relies on large language models to generate or refine textual node descriptions. As with any LLM-based system, the outputs may inherit or amplify biases present in the pretraining data. When DAS is applied to graphs whose nodes represent people or sensitive entities, refined descriptions could potentially introduce or reinforce demographic stereotypes, even when the graph structure itself does not encode such information. Care should therefore be taken when deploying the method in sensitive domains. DAS also requires repeated LLM calls during iterative refinement. Depending on the model and API used, this may entail substantial computational and environmental cost. Although our experiments rely on a small number of refinement rounds, practitioners should be mindful of the carbon footprint and financial cost associated with large-scale deployment. Finally, DAS modifies node-level textual descriptions during training. While these refined descriptions are not intended for human consumption and are not used to produce new factual knowledge, they may nonetheless be misinterpreted as authoritative explanations of the underlying graph if presented without context. We therefore recommend that systems built on DAS avoid exposing refined descriptions directly to end users unless appropriate disclaimers are provided.

A.1 GNNs with Fixed Inductive Bias

Classical graph representation learning is largely built upon fixed inductive biases that encode how structural and semantic information is propagated and aggregated over the graph. Early message-passing GNNs (Kipf and Welling, 2017; Hamilton et al., 2017; Veličković et al., 2018) propagate input node features through local neighborhood aggregation to learn task-specific representations. By design, these architectures favor locality and homophily (Xu et al., 2019; Luan et al., 2022), thereby imposing a strong but fixed prior on how predictive signals are assumed to distribute over the graph. Complementary to message-passing models, structure-only methods characterize nodes by their positional or role similarity using random walks and structural homophily. Representative approaches include DeepWalk (Perozzi et al., 2014), node2vec (Grover and Leskovec, 2016), and struc2vec (Ribeiro et al., 2017), which learn embeddings purely from graph topology without relying on node semantics. While highly effective in structure-dominated settings, these methods likewise rely on pre-specified structural assumptions that remain fixed across graph domains. Motivated by this limitation, a large body of subsequent work has sought to address structure-semantics mismatch primarily from the model side, by designing new architectures and learning mechanisms with enhanced inductive biases.

Advanced GNN Model Architecture. A prominent line of model-centric approaches seeks to address structure-semantics heterogeneity by directly strengthening the inductive bias of GNN architectures (Xue et al., 2024; Han et al., 2024). These methods go beyond standard message passing by encoding richer structural priors into the model design. For instance, several works (Maron et al., 2018; Murphy et al., 2019) introduce positional and relational encodings to enhance representational power. Another line of work leverages random walk kernels to guide the message passing process, further enriching the inductive bias of existing GNNs (Jin et al., 2022; Tönshoff et al., 2021; Wang et al., 2025b,d). Despite improved expressivity, these models remain fundamentally model-centric: the inductive biases are still explicitly predefined by architecture design. Moreover, many remain theoretically bounded by the k-WL hierarchy (Zhang et al., 2024a), suggesting that architectural enhancement alone cannot offer a principled solution to the open-ended diversity of real-world graphs.

LLMs as Reasoners. Another emerging model-centric paradigm treats LLMs as direct graph reasoners. These methods linearize graph structures and node attributes into natural language prompts and rely on the general reasoning capabilities of LLMs for training-free or lightly supervised graph classification and question answering (Zhao et al., 2023a; Guo et al., 2023; Wang et al., 2023a; Kong et al., 2024; Chen et al., 2024a). For example, Wang et al. (2023a) describe graphs in natural language and apply LLMs to solve basic graph reasoning tasks, while GOFA (Kong et al., 2024) and LLaGA (Chen et al., 2024a) operate over serialized graph representations or graph embeddings for downstream inference. By bypassing explicit message passing, these approaches effectively replace graph-specific inductive biases with the intrinsic reasoning priors of LLMs. However, this paradigm remains model-centric: structural information is processed solely according to the LLM and the serialization scheme, and context length limits together with the loss of explicit topology constrain scalability and long-range structural modeling.

Beyond architectural modification and language-based reasoning, another class of model-centric approaches introduces auxiliary models to enrich the input representations of graph learners. Early work such as TextGCN (Yao et al., 2019) demonstrates the benefit of incorporating external textual semantics into graph learning. More recently, with the advent of LLMs, a growing body of methods leverage LLMs as semantic enhancers to generate or refine node descriptions for downstream GNNs (Chen et al., 2024c; Zhang et al., 2024b; Yan et al., 2023; He et al., 2023; Chen et al., 2024b; Wang et al., 2025a; Yang et al., 2024; Fang et al., 2024). In parallel, some works align auxiliary models with GNNs via joint training or embedding alignment (Yang et al., 2021; Zhao et al., 2022; Wen and Fang, 2023). Despite their effectiveness, these approaches remain model-centric: auxiliary models inject additional semantic or structural inductive biases, while the resulting node representations are typically treated as static inputs by the downstream GNN rather than being refined under task-driven feedback.

Limitations of Model-Centric Methods. Despite their empirical success, the above paradigms share a fundamental commonality: they all address structure-semantics heterogeneity by injecting additional inductive biases from the model side. Whether through architectural design, language-based reasoning, or auxiliary semantic enhancement, the manner in which semantic and structural information is combined is still determined by pre-specified model mechanisms. However, real-world graph distributions are open-ended and structurally diverse, making it fundamentally impossible for any finite collection of model-level biases to guarantee universal adaptability. Moreover, most model-centric approaches construct node representations in a largely static manner with respect to downstream learning dynamics, limiting their ability to adapt data representations to graph-specific structure-semantics regimes.

Beyond modifying graph learning models, another line of research adopts a data-centric perspective by directly manipulating the graph data or input representations. Most existing data-centric approaches are developed primarily for representation robustness, regularization, or generalization, rather than for explicitly addressing structure-semantics heterogeneity.

Graph Data Augmentation. A large body of work focuses on graph data augmentation, where node features or graph structures are perturbed to construct multiple views of the same graph for invariant representation learning (Zhao et al., 2021b; Zhu et al., 2021; Suresh et al., 2021; You et al., 2020, 2021; Wang et al., 2023b, 2024a). For example, GraphCL (You et al., 2020) applies a set of predefined structural and feature augmentations to generate contrastive graph views, enabling a model to capture augmentation-invariant information. These methods are effective for improving robustness and transferability, but the underlying node semantics are not explicitly refined toward task-specific semantic-structural alignment.

Graph Structure Learning. Another line of data-centric work focuses on graph structure learning, which aims to optimize or reconstruct graph connectivity to better support GNN training (Jin et al., 2020; Liu et al., 2022; Zhao et al., 2021a; Zhang et al., 2025; Perozzi et al., 2024). These approaches adapt the graph topology by removing spurious edges or adding task-relevant connections, thereby modifying the structural substrate on which message passing operates. However, they primarily operate at the level of graph structure and do not directly model how node semantics should be adapted under different structure-semantics regimes.

Pseudo-Labeling. In addition, several studies explore pseudo-labeling and self-training schemes to guide representation learning in low-label settings (Chen et al., 2023). While effective for label efficiency, such methods treat node features as fixed inputs and do not address the problem of task-driven semantic adaptation under structural context.

The Position of Our Work. In contrast to the above data-centric paradigms, our work targets a fundamentally different objective. Rather than augmenting data for invariance, modifying graph structure, or propagating pseudo-label supervision, we focus on task-driven refinement of node semantics themselves. Our method treats node semantics as adaptive variables that are progressively reshaped under structural context and predictive feedback from a downstream GNN, enabling direct handling of structure-semantics heterogeneity at the level where the balance between structure and semantics is instantiated.

We provide a theoretical interpretation of DAS as a generalized Majorization-Minimization (MM) procedure (Sun et al., 2016). We show that the iterative refinement loop monotonically decreases a task-adaptive surrogate objective, where the model-conditioned memory induces a tractable majorizer for semantic refinement.

Let $D = \{d_v\}_{v \in V}$ be the set of node descriptions and let $t(d_v)$ denote the sentence embedding of $d_v$. We define the global objective as

$$\mathcal{J}(\theta, D) = \sum_{v \in V_{\mathrm{train}}} \ell\big(g_\theta(d_v), y_v\big) + \mathcal{R}(D), \tag{11}$$

where $\ell$ is the supervised loss and $\mathcal{R}$ is a memory-consistency regularizer defined below.

Implicit Memory-Consistency Regularizer. At iteration $t$, DAS retrieves an exemplar set $S_v^{(t)}$ from the model-conditioned memory $B^{(t)}$. Given a candidate description set $D$, consider the following admissible anchor set for each node:

$$\mathcal{M}_v(D) = \Big\{ \tfrac{1}{|S|} \sum_{u \in S} t(d_u) \;:\; S \subseteq V \setminus \{v\},\ 1 \le |S| \le K \Big\},$$

i.e., anchors induced by averaging embeddings of up to $K$ in-graph exemplars. In practice, $S_v^{(t)}$ is selected by joint similarity and confidence filtering; the definition above abstracts this selection into a feasible set of anchors. We define

$$\mathcal{R}(D) = \sum_{v \in V} \min_{m \in \mathcal{M}_v(D)} \big\| t(d_v) - m \big\|^2. \tag{13}$$

This regularizer encourages each node description to be close (in the embedding space) to a prototype summarizing reliable in-graph exemplars.

At iteration $t$, DAS fixes the retrieved exemplar sets $\{S_v^{(t)}\}$ and defines the corresponding anchors

$$m_v^{(t)} = \frac{1}{|S_v^{(t)}|} \sum_{u \in S_v^{(t)}} t\big(d_u^{(t)}\big).$$

This yields the following surrogate regularizer:

$$\Omega\big(D \mid B^{(t)}\big) = \sum_{v \in V} \big\| t(d_v) - m_v^{(t)} \big\|^2,$$

and the iteration-$t$ surrogate objective

$$\mathcal{J}\big(\theta, D \mid B^{(t)}\big) = \sum_{v \in V_{\mathrm{train}}} \ell\big(g_\theta(d_v), y_v\big) + \Omega\big(D \mid B^{(t)}\big).$$

Lemma B.1 (Majorization and Tightness). Fix $\theta$ and $B^{(t)}$ constructed from $D^{(t)}$. Then for any $D$,

$$\mathcal{J}(\theta, D) \le \mathcal{J}\big(\theta, D \mid B^{(t)}\big), \tag{17}$$

and the bound is tight at $D = D^{(t)}$, i.e., $\mathcal{J}\big(\theta, D^{(t)}\big) = \mathcal{J}\big(\theta, D^{(t)} \mid B^{(t)}\big)$.

Proof. By definition (13), for each $v$ and any feasible anchor $m \in \mathcal{M}_v(D)$,

$$\min_{m' \in \mathcal{M}_v(D)} \big\| t(d_v) - m' \big\|^2 \le \big\| t(d_v) - m \big\|^2.$$

In particular, using the feasible anchor $m = m_v^{(t)}$,

$$\min_{m' \in \mathcal{M}_v(D)} \big\| t(d_v) - m' \big\|^2 \le \big\| t(d_v) - m_v^{(t)} \big\|^2. \tag{19}$$

Summing over $v$ gives $\mathcal{R}(D) \le \Omega(D \mid B^{(t)})$. Plugging into (11) proves (17). Tightness holds at $D = D^{(t)}$ because each $m_v^{(t)}$ is constructed from the current exemplar sets and current embeddings, hence it is one of the anchors considered by $\mathcal{R}$ at the current iterate.

DAS alternates between (i) updating $\theta$ with $D^{(t)}$ fixed, and (ii) refining $D$ with $\theta$ and $B^{(t)}$ fixed.

Theorem B.2 (Monotonic Descent). Assume that at iteration $t$: (1) GNN step: the parameter update satisfies $\mathcal{J}(\theta^{(t+1)}, D^{(t)}) \le \mathcal{J}(\theta^{(t)}, D^{(t)})$, e.g., by performing (approximate) descent on the supervised loss. (2) LLM refinement step: the refinement operator produces $D^{(t+1)}$ such that $\Omega(D^{(t+1)} \mid B^{(t)}) \le \Omega(D^{(t)} \mid B^{(t)})$. Then the global objective is monotonically non-increasing:

$$\mathcal{J}\big(\theta^{(t+1)}, D^{(t+1)}\big) \le \mathcal{J}\big(\theta^{(t)}, D^{(t)}\big). \tag{20}$$

Moreover, if $\mathcal{J}$ is bounded below, the sequence $\{\mathcal{J}(\theta^{(t)}, D^{(t)})\}$ converges to a finite limit.

Proof. By Lemma B.1,

$$\mathcal{J}\big(\theta^{(t+1)}, D^{(t+1)}\big) \le \mathcal{J}\big(\theta^{(t+1)}, D^{(t+1)} \mid B^{(t)}\big). \tag{21}$$

The refinement assumption, together with tightness of the surrogate at $D^{(t)}$, implies $\mathcal{J}(\theta^{(t+1)}, D^{(t+1)} \mid B^{(t)}) \le \mathcal{J}(\theta^{(t+1)}, D^{(t)} \mid B^{(t)}) = \mathcal{J}(\theta^{(t+1)}, D^{(t)})$. Combining the inequalities with the GNN step yields (20). The lower-boundedness follows from $\ell \ge 0$ and $\Omega \ge 0$.

Theorem B.2 formalizes DAS as an MM procedure: $B^{(t)}$ induces a majorizer by freezing the exemplar-induced anchors, and the LLM acts as a (possibly stochastic) descent oracle on the surrogate regularizer in the embedding space.
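A tiny numerical sanity check of the majorization step, under the reconstructed definitions above (the paper's exact objective may differ): at the current iterate, each exemplar-averaged anchor is one of the feasible anchors, so the surrogate upper-bounds the regularizer.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, dim, K = 6, 4, 2
emb = rng.normal(size=(n, dim))  # t(d_v) for the current descriptions D^(t)
# Fixed retrieved exemplar sets S_v^(t) (size K, excluding v itself).
exemplars = {v: list(rng.choice([u for u in range(n) if u != v], size=K, replace=False))
             for v in range(n)}

def R(emb):
    """Exact regularizer: min over anchors = averages of up to K other nodes."""
    total = 0.0
    for v in range(n):
        others = [u for u in range(n) if u != v]
        best = min(np.sum((emb[v] - emb[list(S)].mean(axis=0)) ** 2)
                   for k in range(1, K + 1)
                   for S in itertools.combinations(others, k))
        total += best
    return total

def Omega(emb):
    """Surrogate: distance to the frozen exemplar-averaged anchors m_v^(t)."""
    return sum(np.sum((emb[v] - emb[exemplars[v]].mean(axis=0)) ** 2) for v in range(n))

assert R(emb) <= Omega(emb) + 1e-9  # majorization holds at D = D^(t)
```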

To characterize node-level structural roles in a compact yet informative manner, we employ a set of five widely used graph-theoretic measures. These features are chosen to balance descriptive power and computational efficiency, and are used solely to support structure-aware semantic construction.

Degree. The degree of a node $v$ is defined as the number of its immediate neighbors, reflecting its local connectivity within the graph:

$$\deg(v) = |N(v)|,$$

where $N(v)$ denotes the neighborhood of $v$. Nodes with higher degree typically correspond to locally influential or highly connected entities.

Betweenness Centrality. Betweenness centrality quantifies the extent to which a node lies on shortest paths between other node pairs, thereby capturing its bridging or mediating role in the network:

$$C_B(v) = \sum_{s \ne v \ne t} \frac{\sigma_{st}(v)}{\sigma_{st}},$$

where $\sigma_{st}$ is the total number of shortest paths between $s$ and $t$, and $\sigma_{st}(v)$ counts those paths that pass through $v$.

Closeness Centrality. Closeness centrality measures how close a node is, on average, to all other nodes in the graph. It is defined as

$$C_C(v) = \frac{|V| - 1}{\sum_{u \ne v} d(u, v)},$$

where $d(u, v)$ denotes the shortest-path distance between nodes $u$ and $v$. This measure reflects the global accessibility of a node.

Clustering Coefficient. The clustering coefficient evaluates the degree of local transitivity by measuring whether the neighbors of a node are also connected with each other:

$$C(v) = \frac{2\, T(v)}{\deg(v)\,\big(\deg(v) - 1\big)},$$

where $T(v)$ denotes the number of triangles that include node $v$. This metric captures the strength of tightly connected local neighborhoods.

Square Clustering Coefficient. The square clustering coefficient extends this notion to quadrilaterals, measuring the fraction of possible squares (4-cycles) that include node $v$ (Zhang et al., 2008).
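All five measures are available in standard graph libraries; a quick sketch with networkx (the example graph is arbitrary):

```python
import networkx as nx

G = nx.karate_club_graph()  # any graph; the paper uses citation and airport graphs

degree      = dict(G.degree())              # deg(v) = |N(v)|
betweenness = nx.betweenness_centrality(G)  # fraction of shortest paths through v
closeness   = nx.closeness_centrality(G)    # inverse average shortest-path distance
clustering  = nx.clustering(G)              # triangle-based local transitivity
square_clu  = nx.square_clustering(G)       # 4-cycle analogue of clustering

v = 0
print(degree[v], betweenness[v], closeness[v], clustering[v], square_clu[v])
```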

Algorithm 1 summarizes the overall DAS pipeline. DAS first constructs a structure-aware textual summary for each node and (optionally) concatenates it with the raw node text to initialize descriptions. It then runs a closed-loop refinement for T iterations: (i) encode current descriptions and train the pre-defined GNN, (ii) store semantic-structural-predictive states in a memory buffer, (iii) retrieve an in-graph support set for each node from the buffer, and (iv) refine all node descriptions using an LLM conditioned on the retrieved exemplars. After T iterations, the refined descriptions are used to train the final GNN classifier for evaluation.
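A high-level sketch of this loop; the helper callables (`encode`, `train_gnn`, `refine`) and their signatures are placeholders for the actual components rather than a reference implementation.

```python
def run_das(graph, init_descriptions, struct_embs, encode, train_gnn, refine, T=3):
    """Closed-loop DAS pipeline in the spirit of Algorithm 1."""
    descriptions = dict(init_descriptions)  # d_v^(0)
    for t in range(T):
        # (i) encode descriptions and retrain the fixed GNN architecture
        feats = {v: encode(d) for v, d in descriptions.items()}
        gnn, pred_dists = train_gnn(graph, feats)
        # (ii) store semantic-structural-predictive states in the memory buffer
        memory = {v: (descriptions[v], struct_embs[v], pred_dists[v]) for v in descriptions}
        # (iii) + (iv) retrieve in-graph exemplars and refine all nodes in parallel
        descriptions = {v: refine(v, descriptions[v], memory) for v in descriptions}
    # Final training pass on the refined descriptions d^(T)
    feats = {v: encode(d) for v, d in descriptions.items()}
    final_gnn, _ = train_gnn(graph, feats)
    return final_gnn, descriptions
```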

Let n = |V| and m = |E|. We summarize the cost of DAS at a high level to clarify that the seemingly complex memory loop adds only modest overhead beyond the GNN and LLM calls. We assume bounded description/prompt lengths, fixed embedding dimensions, and fixed support size K.

One-time preprocessing. We compute structural statistics (for verbalized topology) and structure-oriented embeddings (e.g., struc2vec). This graph-dependent cost is incurred once.

Per-iteration cost (one refinement round). Each round consists of three components: encoding and GNN training, retrieval, and LLM refinement with the memory update, i.e., roughly

$$C_{\mathrm{round}} \approx \underbrace{n\,C_{\mathrm{enc}} + C_{\mathrm{gnn}}(n, m)}_{\text{encoding + GNN}} + \underbrace{n\,C_{\mathrm{sent}} + O(n^2)}_{\text{retrieval}} + \underbrace{n\,C_{\mathrm{llm}} + O(n)}_{\text{refinement + memory update}},$$

where $C_{\mathrm{enc}}$ is the description encoding cost, $C_{\mathrm{sent}}$ is the sentence embedding cost used in retrieval, and $C_{\mathrm{llm}}$ is the cost of one LLM call (dominated by prompt+generation tokens). Importantly, the memory construction/update is only $O(n)$ per round; it does not introduce additional message passing or graph traversal beyond the fixed GNN backbone.

Total runtime. After $T$ refinement rounds, we perform one final encoding and GNN training/evaluation on $\{d_v^{(T)}\}$, so the total runtime is the one-time preprocessing cost plus $T$ per-round costs plus this final pass. With bounded text length and fixed $K$, the main costs are (i) the GNN training/inference term $C_{\mathrm{gnn}}(n, m)$, (ii) the retrieval term $O(n^2)$ under brute-force similarity computation, and (iii) the LLM term $n\,C_{\mathrm{llm}}$. All other components introduced by DAS (buffer writes, prompt assembly) are linear in $n$.

F.1 Can DAS Provide Better Initialization for Downstream Learning?

We include an additional pretrain-finetune experiment as a diagnostic study to evaluate a key mechanism of DAS. Specifically, we examine whether node semantics refined by DAS on a source graph provide a better initialization for downstream learning on a related target graph. In this setting, a model is first trained on the source graph and then fine-tuned on the target graph under the high-label split, using a shared textual feature space for all methods.

We analyze the role of the trade-off parameter $\alpha$ in the joint similarity score $S(v, u) = \alpha\, \mathrm{sim}_t(v, u) + (1 - \alpha)\, \mathrm{sim}_s(v, u)$, which controls how semantic and structural cues are balanced during exemplar retrieval. This analysis probes whether retrieval based on a single modality is sufficient, or whether effective refinement requires joint alignment. Figure 4 reports results for Structure-only retrieval ($\alpha = 0$), Text-only retrieval ($\alpha = 1$), and intermediate values. On the text-attributed Cora graph, text-only retrieval performs competitively, reflecting the strong semantic signal in raw node texts, while structure-only retrieval is weaker. In contrast, on the text-free USA graph, both extremes degrade performance, indicating that neither semantic nor structural similarity alone provides reliable in-context guidance. Sweeping $\alpha \in \{0.1, 0.3, 0.5, 0.7, 0.9\}$ further reveals a smooth performance landscape. Higher $\alpha$ values are preferred on Cora, whereas lower values yield better results on USA, suggesting that the optimal balance shifts with the underlying structure-semantics regime. Overall, these results show that joint semantic-structural retrieval is critical for stable refinement, while DAS remains robust to moderate variations in $\alpha$.

DAS relies on a large language model (LLM) to iteratively regenerate node descriptions, and its computational cost is therefore dominated by LLM inference. Specifically, the cost scales with (i) the number of nodes whose descriptions are refined and (ii) the number of refinement iterations. In each refinement round, DAS issues one LLM call per node, resulting in a total of $N \times T$ calls, where $N$ denotes the number of refined nodes and $T$ the number of refinement iterations. We report LLM usage in terms of both API calls and token consumption. As an illustrative example, refining $N = 16$ nodes for a single iteration ($T = 1$) results in 16 API calls, consuming 41,698 input tokens and 3,437 output tokens in total. For larger graphs or additional refinement rounds, the number of calls and token usage increase linearly with $N$ and $T$. In practice, our experiments indicate that a small number of refinement iterations (e.g., $T \le 3$) is sufficient to capture most of the performance gains, offering a favorable trade-off between accuracy and computational cost.
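Since calls and tokens grow linearly in N and T, the cost can be budgeted up front. A trivial helper (per-call token averages are whatever your prompts actually produce; the numbers below just reproduce the reported example):

```python
def estimate_llm_cost(num_nodes: int, iterations: int,
                      tokens_in_per_call: float, tokens_out_per_call: float):
    """Total LLM calls and token usage: everything scales as N * T."""
    calls = num_nodes * iterations
    return calls, calls * tokens_in_per_call, calls * tokens_out_per_call

# Reported example: N = 16, T = 1, 41,698 input / 3,437 output tokens in total,
# i.e., roughly 2,606 input and 215 output tokens per call on average.
print(estimate_llm_cost(16, 1, 41698 / 16, 3437 / 16))  # -> (16, 41698.0, 3437.0)
```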

We present the success cases in Table 7 and failure cases in Table 8.

For each dataset, backbone, and method we perform a random search over architecture and optimization hyperparameters. The candidate values are: hidden dimension {8, 16, 32, 64, 128, 256}, number of layers {1, 2, 3}, normalization layer in {none, batchnorm}, learning rate {5e-2, 1e-2, 5e-3, 1e-3}, weight decay {0.0, 5e-5, 1e-4, 5e-4}, and dropout rate {0.0, 0.1, 0.5, 0.8}. Among the sampled configurations, we choose the setting that achieves the best validation accuracy. DAS introduces additional hyperparameters: the number of refinement iterations T, the entropy threshold τ, and the support-set size K used in exemplar retrieval. We fix K = 10 and τ = 0.5 for all datasets. We use T = 3 refinement iterations on all datasets except Pubmed, where we set T = 1.
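A sketch of this random search over the listed grid; `train_and_eval` and the number of trials are placeholders (the excerpt does not state how many configurations were sampled).

```python
import random

SEARCH_SPACE = {
    "hidden_dim":   [8, 16, 32, 64, 128, 256],
    "num_layers":   [1, 2, 3],
    "norm":         ["none", "batchnorm"],
    "lr":           [5e-2, 1e-2, 5e-3, 1e-3],
    "weight_decay": [0.0, 5e-5, 1e-4, 5e-4],
    "dropout":      [0.0, 0.1, 0.5, 0.8],
}

def random_search(train_and_eval, num_trials: int = 50, seed: int = 0):
    """Sample configurations and keep the one with the best validation accuracy."""
    rng = random.Random(seed)
    best_cfg, best_acc = None, float("-inf")
    for _ in range(num_trials):
        cfg = {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
        acc = train_and_eval(cfg)  # user-supplied: train the GNN, return val accuracy
        if acc > best_acc:
            best_cfg, best_acc = cfg, acc
    return best_cfg, best_acc
```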

Node initialization. For each node, we build an initial description by concatenating (i) an original description and (ii) a topological summary. For citation graphs, the original description is the node text (e.g., paper title and abstract). For datasets without node text (e.g., airport graphs), this component is omitted. The topological summary encodes structural cues using a fixed natural-language schema: we first state the global graph context (graph type, node type, number of nodes, edge type, and number of edges), and then append node-level property statements where each property is reported with its scalar value and its rank among all nodes. The exact verbalization template is shown in Figure 5 and follows a consistent schema adapted from Wang et al. (2025a).

Iterative refinement. Starting from the initialized description above, we refine the target node text using a single unified prompt wrapper. Only the placeholders (highlighted tokens such as GRAPH_TYPE) are swapped per node/dataset. Two blocks are optional: (i) a target history block that summarizes previous rewrite attempts and the GNN feedback signals (top probability and predictive entropy), and (ii) an example block that provides a small set of training nodes as reference for how descriptions behave under the GNN. Figure 6 shows the complete prompt layout, and a simplified sketch of the placeholder mechanism is given below.
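A simplified sketch of such a prompt wrapper; the wording and placeholder names are illustrative stand-ins for the actual Figure 6 prompt.

```python
REFINEMENT_PROMPT = """\
You are refining the description of a node in a {GRAPH_TYPE} graph.

Target node description:
{TARGET_DESCRIPTION}

{HISTORY_BLOCK}{EXAMPLE_BLOCK}Rewrite the description, emphasizing task-relevant \
evidence while preserving all stated facts and numbers."""

def build_prompt(graph_type, target_desc, history=None, examples=None):
    """Fill the placeholder slots; the history and example blocks are optional."""
    history_block = f"Previous attempts and GNN feedback:\n{history}\n\n" if history else ""
    example_block = f"Reference training nodes:\n{examples}\n\n" if examples else ""
    return REFINEMENT_PROMPT.format(
        GRAPH_TYPE=graph_type,
        TARGET_DESCRIPTION=target_desc,
        HISTORY_BLOCK=history_block,
        EXAMPLE_BLOCK=example_block,
    )
```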



