FSD-CAP: Fractional Subgraph Diffusion with Class-Aware Propagation for Graph Feature Imputation
Imputing missing node features in graphs is challenging, particularly under high missing rates. Existing methods based on latent representations or global diffusion often fail to produce reliable estimates and may propagate errors across the graph. We propose FSD-CAP, a two-stage framework designed to improve imputation quality under extreme sparsity. In the first stage, a graph-distance-guided subgraph expansion localizes the diffusion process, and a fractional diffusion operator adjusts propagation sharpness based on local structure. In the second stage, imputed features are refined by class-aware propagation, which incorporates pseudo-labels and neighborhood entropy to promote consistency. We evaluate FSD-CAP on multiple datasets. With 99.5% of features missing across five benchmark datasets, FSD-CAP achieves average node-classification accuracies of 80.06% (structural) and 81.01% (uniform), close to the 81.31% achieved by a standard GCN with full features. For link prediction under the same setting, it reaches AUC scores of 91.65% (structural) and 92.41% (uniform), compared to 95.06% for the fully observed case. Furthermore, FSD-CAP outperforms competing models on both large-scale and heterophilic datasets.
💡 Research Summary
FSD‑CAP (Fractional Subgraph Diffusion with Class‑Aware Propagation) is a two‑stage framework designed to robustly impute missing node features in graphs, even when the missing rate reaches an extreme 99.5 %. The authors first identify two common missing‑data regimes: structural missingness, where entire nodes lack features, and uniform missingness, where individual feature entries are randomly absent. Existing latent‑space or global‑diffusion methods either degrade sharply under such sparsity or propagate errors throughout the graph.
Stage 1 – Fractional Subgraph Diffusion (FSD).
The core of this stage is a fractional diffusion operator $A^{\gamma}$. Starting from the symmetrically normalized adjacency matrix $\tilde A$, each entry is raised to a power $\gamma > 0$ and then row-normalized:

$$A^{\gamma}_{ij} = \frac{(\tilde A_{ij})^{\gamma}}{\sum_k (\tilde A_{ik})^{\gamma}}.$$

When $\gamma < 1$ the operator smooths aggressively, amplifying weak edges; when $\gamma > 1$ it sharpens the diffusion, emphasizing the strongest neighbors. The paper proves that $\gamma \to 0$ yields uniform averaging, while $\gamma \to \infty$ collapses to deterministic routing toward the strongest neighbor, giving a clear theoretical interpretation of the "sharpness" parameter.
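As a concrete illustration, the row-normalized fractional operator can be sketched in a few lines of NumPy. This is a minimal sketch of the definition above, not the paper's implementation; the example matrix and function name are ours.

```python
import numpy as np

def fractional_operator(A_norm: np.ndarray, gamma: float) -> np.ndarray:
    """Raise each entry of a normalized adjacency matrix to the power
    gamma, then row-normalize: A^gamma_ij = A_ij^gamma / sum_k A_ik^gamma.
    (Illustrative sketch of the paper's definition.)"""
    P = np.power(A_norm, gamma)               # element-wise (tilde A_ij)^gamma
    row_sums = P.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0             # guard isolated nodes (all-zero rows)
    return P / row_sums

# Tiny example: node 0 has two neighbors of unequal edge weight.
A = np.array([[0.0, 0.8, 0.2],
              [0.8, 0.0, 0.0],
              [0.2, 0.0, 0.0]])
print(fractional_operator(A, 0.01)[0])   # near-uniform over the two neighbors
print(fractional_operator(A, 10.0)[0])   # nearly all mass on the 0.8 edge
```

The two printed rows show the limiting behaviors described above: a small exponent flattens the weights toward uniform averaging, while a large exponent routes almost deterministically to the strongest neighbor.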
To avoid the error accumulation typical of global diffusion, the authors introduce a graph-distance-guided subgraph expansion. For each feature dimension $\ell$, the set of observed nodes $V^{+}_{\ell}$ forms the initial subgraph $G^{(0)}$. At layer $m$, the subgraph is enlarged to include all nodes within shortest-path distance $m$ of $V^{+}_{\ell}$. Within each layer, fractional diffusion is applied independently to each connected component. Crucially, a retention mechanism blends the current layer's diffusion result with the converged estimate from the previous layer using a coefficient $\lambda$, while observed entries remain fixed via a binary mask $M$. This design stabilizes early, reliable estimates and gradually incorporates more distant, potentially noisy information. The authors provide convergence guarantees (Theorem 2) and show that once the expansion radius reaches the graph diameter, the progressive process converges to the same fixed point as full-graph diffusion (Theorem 3).
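The layered procedure above can be sketched as follows. This is a toy reading of Stage 1 using dense NumPy arrays: the one-hop-per-layer expansion, the per-layer iteration count, and the exact placement of the $\lambda$-blend are our assumptions, and a real implementation would use sparse matrices and per-feature-dimension masks.

```python
import numpy as np

def progressive_diffusion(A_norm, X, mask, gamma=2.0, lam=0.5,
                          n_layers=3, n_iters=20):
    """Toy sketch of Stage 1: expand the active subgraph one hop per layer,
    diffuse with the fractional operator inside it, blend each layer's
    result with the previous estimate via lam, and keep observed entries
    (mask == True) fixed throughout."""
    # Fractional operator: element-wise power, then row normalization.
    P = np.power(A_norm, gamma)
    P /= np.maximum(P.sum(axis=1, keepdims=True), 1e-12)

    active = mask.any(axis=1)            # start from nodes with observed entries
    X_prev = np.where(mask, X, 0.0)      # unobserved entries initialized to zero

    for _ in range(n_layers):
        # Grow the subgraph: add all neighbors of the current active set.
        active = active | (A_norm[:, active].sum(axis=1) > 0)
        # Restrict the operator to the active subgraph and re-normalize rows.
        P_sub = np.where(np.outer(active, active), P, 0.0)
        P_sub /= np.maximum(P_sub.sum(axis=1, keepdims=True), 1e-12)
        # Iterate diffusion toward convergence within this layer.
        X_cur = X_prev.copy()
        for _ in range(n_iters):
            X_cur = np.where(mask, X, P_sub @ X_cur)
        # Retention: blend with the previous layer's converged estimate.
        X_prev = np.where(mask, X, lam * X_prev + (1 - lam) * X_cur)
    return X_prev
```

Note how the mask re-applies the observed values after every diffusion step, which is what keeps reliable early estimates from drifting as the expansion pulls in more distant nodes.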
Stage 2 – Class‑Aware Propagation (CAP).
After obtaining a preliminary imputed matrix $\hat X$, the second stage injects semantic information. A semi-supervised GCN trained on the limited labeled nodes predicts pseudo-labels $\tilde y$ for the unlabeled set. For each class $c$, a synthetic "class node" is added and connected to all nodes whose pseudo-label equals $c$, forming a class-specific subgraph with adjacency $W_c$ and feature matrix $X_c$. Fractional diffusion is then re-applied within each class graph. To mitigate the influence of uncertain pseudo-labels, the authors weight edges by neighborhood entropy (higher entropy reduces the weight), encouraging propagation mainly among confident nodes. This step sharpens intra-class feature cohesion while preserving inter-class separability, directly addressing the over-smoothing that pure diffusion suffers from under extreme sparsity.
Experimental Evaluation.
The method is evaluated on five standard citation benchmarks (Cora, Citeseer, PubMed, ogbn‑arxiv, etc.) and on two large‑scale heterophilic datasets. Two missing‑data setups are considered: (i) structural missingness where entire nodes are unobserved, and (ii) uniform missingness where individual entries are randomly masked. In both cases, 99.5 % of the feature matrix is hidden.
- Node classification: FSD‑CAP achieves 80.06 % (structural) and 81.01 % (uniform) accuracy, compared with 81.31 % for a fully‑observed GCN and just above 75 % for the strongest baselines (R‑Diffusion, SuperDiff, GraphSAGE‑Impute).
- Link prediction: AUC scores of 91.65 % (structural) and 92.41 % (uniform) are reported, only a few points below the 95.06 % obtained with complete features.
- Scalability: Because diffusion is confined to expanding subgraphs, memory consumption is roughly 30 % of that required by a full‑graph diffusion, and runtime remains comparable.
- Heterophily: On datasets with low homophily, the class‑aware refinement yields the most pronounced gains, confirming that semantic guidance is essential when structural similarity is weak.
Ablation and Sensitivity.
Ablation studies show that removing the fractional exponent (setting $\gamma = 1$) degrades performance by roughly 2 % absolute, while omitting the progressive subgraph expansion leads to a 3 % drop, highlighting the complementary role of each component. Sensitivity analysis indicates that moderate values of $\gamma$ (1.5–2.5) and $\lambda$ (0.3–0.6) work well across datasets, though the authors acknowledge that automated hyper‑parameter tuning would further improve robustness.
Limitations and Future Directions.
The framework currently assumes a static, connected graph and relies on a single GCN for pseudo‑label generation. Extending FSD‑CAP to dynamic graphs, incorporating ensemble pseudo‑labeling, or learning $\gamma$ and $\lambda$ end‑to‑end are identified as promising avenues. Moreover, the theoretical guarantees are limited to connected graphs and bounded $\gamma$; broader proofs for disconnected or directed graphs remain open.
Conclusion.
FSD‑CAP introduces a principled combination of (1) a fractional diffusion operator that adaptively controls propagation sharpness, (2) a distance‑guided progressive subgraph diffusion that limits early error spread, and (3) a class‑aware refinement that restores discriminative power under extreme feature sparsity. Empirical results across a wide spectrum of benchmarks demonstrate that the method consistently outperforms state‑of‑the‑art imputation techniques, achieving near‑full‑feature performance even when 99.5 % of the data is missing. The approach is lightweight, scalable, and readily applicable to real‑world domains where missing node attributes are the norm, marking a significant step forward in robust graph representation learning under severe data scarcity.