Identifying Asymptomatic Nodes in Network Epidemics using Graph Neural Networks


Infected individuals in some epidemics can remain asymptomatic while still carrying and transmitting the infection. These individuals contribute to the spread of the epidemic and pose a significant challenge to public health policies. Identifying asymptomatic individuals is critical for measuring and controlling an epidemic, but periodic and widespread testing of healthy individuals is often too costly. This work tackles the problem of identifying asymptomatic individuals under a classic SI (Susceptible-Infected) network epidemic model in which a fraction of the infected nodes are not observed as infected (i.e., their observed state is identical to that of susceptible nodes). To classify apparently healthy nodes as asymptomatic or susceptible, a Graph Neural Network (GNN) model with supervised learning is adopted, where a set of node features is built from the network with observed infected nodes. The approach is evaluated across different network models, network sizes, and fractions of observed infections. Results indicate that the proposed methodology is robust across different scenarios, accurately identifying asymptomatic nodes while also generalizing to different network sizes and fractions of observed infections.


💡 Research Summary

This paper tackles the challenging problem of identifying asymptomatic individuals in network-based epidemic outbreaks, focusing on the classic Susceptible-Infected (SI) model where a fraction of infected nodes are not observed as infected. The authors propose a supervised Graph Neural Network (GNN) framework that uses only a single snapshot of the epidemic (the underlying graph structure and the set of observed infected nodes) to predict which apparently healthy nodes are actually asymptomatic carriers.

To enrich the raw graph input, eight node-level features are engineered: (1) a binary indicator of whether the node is observed as infected; (2) normalized degree; (3-5) the fraction of observed infected nodes at exact graph distance k, one feature for each of k = 1, 2, 3; (6) the fraction of observed infected nodes within distance ≤ 2; (7) traditional betweenness centrality; and (8) "observed betweenness," a variation that counts only shortest paths between pairs of observed infected nodes that pass through the target node. All continuous features are normalized to zero mean and unit variance per graph, while the binary observation flag is left unchanged.
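As a rough illustration of the distance-based features, the sketch below computes six of the eight features (both betweenness variants are omitted) with a depth-limited breadth-first search. Several details are assumptions not stated in the summary: the "fraction at distance k" is read as the share of nodes at that distance that are observed infected, normalized degree is taken as degree / (n - 1), an empty ring scores 0, and the z-score uses the population standard deviation. All function names are hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


def bfs_distances(adj, source, max_depth):
    """Hop distances from `source` to every node at most `max_depth` hops away."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == max_depth:
            continue  # do not expand beyond the depth limit
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def observed_fraction(dist, observed, lo, hi):
    """Share of nodes at distance lo..hi (inclusive) that are observed infected."""
    ring = [v for v, d in dist.items() if lo <= d <= hi]
    if not ring:
        return 0.0  # assumption: an empty ring contributes 0
    return sum(v in observed for v in ring) / len(ring)


def local_features(adj, observed):
    """Six of the eight features per node (betweenness variants omitted)."""
    n = len(adj)
    rows = {}
    for node in adj:
        dist = bfs_distances(adj, node, 3)
        rows[node] = [
            1.0 if node in observed else 0.0,         # observation flag
            len(adj[node]) / (n - 1),                 # assumption: degree / (n - 1)
            observed_fraction(dist, observed, 1, 1),  # exact distance 1
            observed_fraction(dist, observed, 2, 2),  # exact distance 2
            observed_fraction(dist, observed, 3, 3),  # exact distance 3
            observed_fraction(dist, observed, 1, 2),  # within distance <= 2
        ]
    return rows


def zscore_columns(rows, skip=(0,)):
    """Standardise each column per graph, leaving the columns in `skip` unchanged."""
    nodes = list(rows)
    width = len(rows[nodes[0]])
    out = {v: list(rows[v]) for v in nodes}
    for j in range(width):
        if j in skip:
            continue  # the binary flag is not normalised
        col = [rows[v][j] for v in nodes]
        mu, sigma = mean(col), pstdev(col)
        for v in nodes:
            out[v][j] = (rows[v][j] - mu) / sigma if sigma > 0 else 0.0
    return out


# Toy path graph 0-1-2-3-4 with both end nodes observed as infected.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
observed = {0, 4}
features = local_features(adj, observed)
normalised = zscore_columns(features)
print(features[2])
```

For the middle node of the toy path, both nodes at distance 2 are observed infected while neither direct neighbour is, which is the kind of local signal the classifier can pick up.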

These feature vectors are fed into a two‑layer Graph Convolutional Network (GCN). The GCN aggregates information from a node’s neighbourhood, updates its hidden representation, and finally outputs a sigmoid‑scaled score between 0 and 1 that reflects the probability of being asymptomatic. Supervised training is performed on large synthetic datasets generated from two well‑known random graph families: Barabási‑Albert (BA) scale‑free networks and Watts‑Strogatz (WS) small‑world networks. For each graph, an SI epidemic is simulated with infection probability β drawn uniformly from {0.1, 0.3, 0.5}. The process stops when 20 % of the nodes become infected. Then, each infected node is observed with probability θ (the “symptom manifestation” probability), where θ ∈ {0.1, 0.25, 0.5, 0.75, 0.9}. This yields ten distinct training datasets (BA/WS × five θ values), each containing 1,000 epidemic snapshots on graphs of 3,000 nodes (≈600 infected per snapshot).
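The data-generation steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' simulator: the summary does not specify the seeding or the time discretisation, so the sketch assumes a single random seed node and synchronous rounds in which each infected-susceptible contact transmits with probability β. The `ring_lattice` helper is a hypothetical stand-in for the BA and WS generators, and all names are illustrative.

```python
import random


def ring_lattice(n, k):
    """Hypothetical stand-in graph: each node linked to its k nearest ring neighbours."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def simulate_si(adj, beta, target_fraction, rng):
    """Discrete-time SI spread from one random seed until the target share is infected."""
    if beta <= 0:
        raise ValueError("beta must be positive for the epidemic to spread")
    target = max(1, round(target_fraction * len(adj)))
    infected = {rng.choice(sorted(adj))}  # assumption: a single random seed node
    while len(infected) < target:
        newly = set()
        for u in sorted(infected):
            for v in sorted(adj[u]):
                # Each infected-susceptible contact transmits with probability beta.
                if v not in infected and rng.random() < beta:
                    newly.add(v)
        if not any(v not in infected for u in infected for v in adj[u]):
            break  # the seed's connected component is exhausted
        for v in sorted(newly):
            if len(infected) == target:
                break  # stop exactly at the target size
            infected.add(v)
    return infected


def observe(infected, theta, rng):
    """Each infected node is observed (symptomatic) independently with probability theta."""
    return {v for v in sorted(infected) if rng.random() < theta}


rng = random.Random(0)
graph = ring_lattice(200, 4)
infected = simulate_si(graph, beta=0.3, target_fraction=0.2, rng=rng)
observed = observe(infected, theta=0.5, rng=rng)
asymptomatic = infected - observed  # positive class for the classifier
print(len(infected), len(observed), len(asymptomatic))
```

The nodes in `asymptomatic` are the positive labels; every node outside `infected` is a susceptible negative, and observed nodes only serve as input signal.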

Testing follows the same generation pipeline but varies the graph size (1 k, 3 k, 6 k, 12 k nodes) to assess scalability. In total, 40 test datasets are created, ensuring no overlap with training seeds. Performance is measured primarily by the Area Under the ROC Curve (AUC) and classification accuracy, and is compared against the previously proposed “observed betweenness” baseline, which ranks nodes based on a single centrality metric.
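To make the evaluation metrics concrete, here is a small sketch of AUC computed by pairwise comparison (the probability that a random asymptomatic node is scored above a random susceptible one, with ties counted as one half), along with thresholded accuracy. It assumes that only nodes not observed as infected are scored, and the 0.5 accuracy threshold is an assumption as well; the summary does not state either detail. In practice a library routine such as scikit-learn's `roc_auc_score` would normally be used instead of this quadratic-time loop.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Share of nodes whose thresholded score matches the label."""
    hits = sum((s >= threshold) == (y == 1) for y, s in zip(labels, scores))
    return hits / len(labels)


def evaluation_set(nodes, infected, observed, scores):
    """Keep only nodes that look healthy; label 1 = asymptomatic, 0 = susceptible."""
    candidates = [v for v in nodes if v not in observed]
    labels = [1 if v in infected else 0 for v in candidates]
    return labels, [scores[v] for v in candidates]


# Toy snapshot: node 0 is observed infected, nodes 1 and 2 are asymptomatic.
nodes = [0, 1, 2, 3, 4, 5]
infected = {0, 1, 2}
observed = {0}
model_scores = {0: 1.0, 1: 0.9, 2: 0.4, 3: 0.5, 4: 0.2, 5: 0.1}  # made-up scores

labels, scores = evaluation_set(nodes, infected, observed, model_scores)
print(roc_auc(labels, scores), accuracy(labels, scores))
```

Because asymptomatic nodes are a minority among apparently healthy nodes, a ranking metric such as AUC is less sensitive to the class imbalance than accuracy at a fixed threshold.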

Results show that the GCN-based approach outperforms the observed-betweenness baseline in most configurations, with average AUC improvements of 8 % to 12 % relative to the baseline. The gain is especially pronounced on WS graphs, where high clustering and short average path lengths make the engineered features more informative. On BA graphs, performance is only comparable to the baseline in a few settings but remains robust overall. When the observation probability θ is low (i.e., many asymptomatic carriers), the model's AUC declines modestly but stays above 0.75 even at θ = 0.1, demonstrating resilience to sparse observation. Importantly, the model trained on 3 k-node graphs transfers well to larger (6 k, 12 k) and smaller (1 k) graphs without retraining, indicating that the combination of normalized features and the locality-preserving nature of GCNs yields size-invariant representations.
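The locality argument can be illustrated with a toy forward pass. The sketch below implements the standard graph-convolution rule, act(D^(-1/2) (A + I) D^(-1/2) H W), in plain Python and stacks two layers with a sigmoid output. The weights, the two-feature input, the ReLU hidden activation, and the ring graphs are all arbitrary assumptions chosen for illustration; they are not the authors' trained model. Since a two-layer network only sees a node's 2-hop neighbourhood, the same weights give the same score for structurally identical neighbourhoods regardless of graph size.

```python
import math


def gcn_layer(adj, feats, weight, activation):
    """One graph-convolution step with self-loops and symmetric normalisation."""
    deg = {u: len(adj[u]) + 1 for u in adj}  # degrees after adding self-loops
    in_dim, out_dim = len(weight), len(weight[0])
    out = {}
    for u in adj:
        agg = [0.0] * in_dim
        for v in list(adj[u]) + [u]:
            coef = 1.0 / math.sqrt(deg[u] * deg[v])
            for i in range(in_dim):
                agg[i] += coef * feats[v][i]
        out[u] = [
            activation(sum(agg[i] * weight[i][j] for i in range(in_dim)))
            for j in range(out_dim)
        ]
    return out


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def two_layer_gcn(adj, feats, w1, w2):
    """Hidden layer with ReLU (an assumption), then a sigmoid score per node."""
    hidden = gcn_layer(adj, feats, w1, relu)
    scores = gcn_layer(adj, hidden, w2, sigmoid)
    return {u: s[0] for u, s in scores.items()}


def ring(n):
    """Cycle graph on n nodes, used only as a toy topology."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


# Arbitrary illustrative weights: 2 input features -> 3 hidden units -> 1 score.
W1 = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.4]]
W2 = [[0.7], [-0.5], [0.2]]


def scores_on_ring(n):
    # Feature 0 flags node 0 as observed infected; feature 1 is a constant.
    feats = {i: [1.0 if i == 0 else 0.0, 0.5] for i in range(n)}
    return two_layer_gcn(ring(n), feats, W1, W2)


small, large = scores_on_ring(10), scores_on_ring(40)
for u in (0, 1, 2, 5):
    print(u, small[u], large[u])
```

On these rings, each printed node has the same 2-hop neighbourhood in both graphs, so its score is the same at 10 and at 40 nodes. Real graphs of different sizes differ in local structure too, so this only illustrates the mechanism and does not establish the reported transfer.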

The paper’s contributions are threefold: (1) a systematic feature engineering pipeline that extracts epidemic‑relevant information from a single static snapshot, (2) the application of a lightweight two‑layer GCN to the binary classification of asymptomatic versus susceptible nodes, and (3) an extensive empirical evaluation that demonstrates robustness across network topologies, sizes, and observation rates.

Limitations are acknowledged. The SI model assumes permanent infection and a uniform transmission probability, which oversimplifies many real diseases that feature recovery, varying infectiousness, or latency periods. Moreover, the synthetic data lack measurement noise, false positives/negatives, and temporal dynamics that are inevitable in real‑world testing regimes. Consequently, direct deployment on actual epidemiological data would require extensions.

Future work suggested by the authors includes (i) adapting the framework to more realistic compartmental models such as SIR or SEIR, (ii) incorporating temporal sequences via recurrent or temporal graph neural networks to exploit multi‑time‑step observations, and (iii) validating the approach on real contact‑tracing datasets where partial testing results and symptom reports are available. By addressing these avenues, the proposed methodology could become a valuable tool for public health agencies aiming to prioritize testing and containment resources in the face of asymptomatic transmission.

