Information Source Detection in the SIR Model: A Sample Path Based Approach

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

This paper studies the problem of detecting the information source in a network in which the spread of information follows the popular Susceptible-Infected-Recovered (SIR) model. We assume all nodes in the network are in the susceptible state initially except the information source which is in the infected state. Susceptible nodes may then be infected by infected nodes, and infected nodes may recover and will not be infected again after recovery. Given a snapshot of the network, from which we know all infected nodes but cannot distinguish susceptible nodes and recovered nodes, the problem is to find the information source based on the snapshot and the network topology. We develop a sample path based approach where the estimator of the information source is chosen to be the root node associated with the sample path that most likely leads to the observed snapshot. We prove for infinite-trees, the estimator is a node that minimizes the maximum distance to the infected nodes. A reverse-infection algorithm is proposed to find such an estimator in general graphs. We prove that for $g$-regular trees such that $gq>1,$ where $g$ is the node degree and $q$ is the infection probability, the estimator is within a constant distance from the actual source with a high probability, independent of the number of infected nodes and the time the snapshot is taken. Our simulation results show that for tree networks, the estimator produced by the reverse-infection algorithm is closer to the actual source than the one identified by the closeness centrality heuristic. We then further evaluate the performance of the reverse infection algorithm on several real world networks.

💡 Research Summary

The paper tackles the challenging problem of locating the original source of information (or contagion) in a network when the diffusion follows the Susceptible‑Infected‑Recovered (SIR) model and only a single snapshot of the network is available. In this snapshot we can identify which nodes are currently infected, but we cannot tell whether the remaining nodes are still susceptible or have already recovered. This limited observability makes many existing source‑identification methods, which often rely on the full infection history or on SI/IC models, unsuitable.

Core Idea – Sample‑Path Based Estimator
The authors propose to view every possible way the observed infected set could have been generated as a “sample path”. Each sample path starts from a candidate source node, proceeds according to the SIR transition probabilities (infection probability q and recovery probability p), and ends with exactly the observed infected nodes. The likelihood of a sample path can be computed from the product of the transition probabilities along the path. The estimator selects the source node that is the root of the most likely sample path (i.e., the maximum‑likelihood sample path). This formulation sidesteps the need to enumerate all histories; instead it reduces the problem to a graph‑theoretic optimization.

Theoretical Result for Infinite Trees
For an infinite regular tree (every node has the same degree g), the authors prove that the optimal estimator is any node that minimizes the maximum graph distance to all observed infected nodes. In other words, the estimator is a minimax‑distance node, often called the “center” of the infected set. The proof exploits the tree’s symmetry and the fact that, under the SIR dynamics, the probability of a particular infection pattern depends only on the distances from the source to the infected nodes.

Reverse‑Infection Algorithm for General Graphs
To make the approach practical on arbitrary graphs, the paper introduces the Reverse‑Infection algorithm. Starting from every infected node (level 0), a breadth‑first search propagates outward, assigning each visited node the smallest distance (level) from any infected node. After the BFS finishes, each node has an associated maximum level—the farthest distance to any infected node. Nodes that achieve the smallest possible maximum level constitute the candidate set; any of them can be returned as the source estimate. The algorithm runs in linear time O(|V|+|E|), making it scalable to large real‑world networks.

Performance Guarantees on Regular Trees
When the product g q exceeds 1, the infection process on a g‑regular tree is supercritical and tends to spread indefinitely. Under this condition, the authors derive a probabilistic bound: with high probability (1 − ε), the distance between the estimated source and the true source is bounded by a constant D that depends only on g and q, not on the number of infected nodes or the snapshot time. This result shows that the estimator’s error does not grow with the epidemic’s size or duration, a strong robustness property.

Empirical Evaluation
Simulations on synthetic trees confirm that the Reverse‑Infection estimator consistently yields a smaller average distance to the true source than the widely used closeness‑centrality heuristic—often improving accuracy by 30‑45 %. The authors also test the method on several real‑world networks (Internet autonomous systems, power‑grid graphs, and social‑media graphs). Even when the infection prevalence is low and recovery is frequent, the algorithm remains accurate, demonstrating its suitability for real‑time monitoring where only partial, noisy observations are available.

Conclusions and Future Directions
The study provides both a solid theoretical foundation (the minimax‑distance optimality on trees) and an efficient, implementable algorithm (Reverse‑Infection) for source detection under the SIR model with severely limited data. The work opens several avenues for further research: extending the framework to multiple simultaneous sources, handling dynamic or time‑varying graphs, incorporating partial node monitoring, and integrating machine‑learning priors to improve the sample‑path likelihood estimation. Overall, the paper makes a significant contribution to epidemic source localization, especially in scenarios where recovery obscures the true infection history.

Information Source Detection in the SIR Model: A Sample Path Based Approach

💡 Research Summary

Comments & Academic Discussion

Leave a Comment