Reconstructing Network Outbreaks under Group Surveillance

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

A key public health problem during an outbreak is to reconstruct the disease cascade from a partial set of confirmed infections. This has been studied extensively under the Maximum Likelihood Estimation (MLE) formulation, which reduces the problem to finding some type of Steiner subgraph on a network. Group surveillance, such as wastewater or aerosol monitoring, is a form of mass/pooled testing in which samples from multiple individuals are pooled together and tested once. While a single negative test clears multiple individuals, a positive test does not reveal which individuals in the pool are infected. We introduce the POOLCASCADEMLE problem in the setting of a network propagation process, where the goal is to find an MLE cascade subgraph that is consistent with the pooled test outcomes. Previous work on reconstruction assumes that test results are of individuals, i.e., pools of size one, and requires a consistent cascade to connect the positive testing nodes. In POOLCASCADEMLE, a consistent cascade must choose at least one node in each positive pool, adding another combinatorial layer. We show that, under the Independent Cascade (IC) model, POOLCASCADEMLE is NP-hard, and present an approximation algorithm based on a reduction to the Group Steiner Tree problem. We also consider a one-hop version of this problem, in which the disease can spread for one time step after being seeded. We show that even this restricted version is NP-hard, and develop a method using linear programming relaxation and rounding. We evaluate the performance of our methods on real and synthetic contact networks, in terms of missing infection recovery and prevalence estimation. We find that our approach outperforms meaningful baselines, which correspond to pools of size one and use state-of-the-art methods.
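To make the setting concrete, here is a minimal Python sketch (not the authors' code) of one Independent Cascade realization on a small contact network, followed by noise-free pooled testing in which a pool is positive exactly when at least one member is infected. The function names `simulate_ic` and `pooled_outcomes` and the dictionary-based graph representation are hypothetical choices for illustration.

```python
import random


def simulate_ic(adj, p, seeds, rng):
    """Run one Independent Cascade (IC) realization.

    adj:   dict node -> list of neighbours (undirected contact network)
    p:     dict frozenset({u, v}) -> transmission probability of that edge
    seeds: initially infected nodes
    Each newly infected node gets exactly one chance to infect each
    still-susceptible neighbour, independently with probability p_e.
    """
    infected = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                if v not in infected and rng.random() < p[frozenset((u, v))]:
                    infected.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return infected


def pooled_outcomes(pools, infected):
    """Noise-free pooled tests: a pool is positive iff some member is infected."""
    return [any(v in infected for v in pool) for pool in pools]


if __name__ == "__main__":
    adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
    p = {frozenset(e): 0.3 for e in [(0, 1), (0, 2), (1, 3)]}
    infected = simulate_ic(adj, p, seeds={0}, rng=random.Random(42))
    print(sorted(infected), pooled_outcomes([{1, 2}, {3}], infected))
```

Only the pooled outcomes, not the infected set itself, would be visible to the reconstruction problem.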


💡 Research Summary

The paper tackles a pressing public‑health challenge: reconstructing the spread of an infectious disease when only a limited set of pooled test results is available. Traditional cascade‑reconstruction work assumes individual (pool‑size‑one) test outcomes and formulates the problem as maximum‑likelihood estimation (MLE), which reduces to finding a Steiner‑type subgraph that connects the positive nodes. In contrast, the authors introduce POOLCASCADEMLE, a novel formulation that incorporates group surveillance data such as wastewater or aerosol sampling, where each test examines a set of individuals simultaneously. A positive pool indicates that at least one member is infected, while a negative pool clears all its members. The reconstruction must therefore (a) exclude every node appearing in a negative pool and (b) include at least one node from each positive pool, while minimizing the MLE‑derived cost under the Independent Cascade (IC) diffusion model.
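The two consistency conditions (a) and (b) can be checked directly for any candidate reconstruction. The sketch below assumes pools are given as Python sets with a parallel list of boolean outcomes; `is_consistent` is a hypothetical helper, not part of the paper.

```python
def is_consistent(candidate, pools, outcomes):
    """Return True iff `candidate` agrees with every pooled test result.

    candidate: set of nodes the reconstruction marks as infected
    pools:     list of node sets, one per pooled test
    outcomes:  list of booleans, True for a positive pool
    """
    for pool, positive in zip(pools, outcomes):
        hit = not candidate.isdisjoint(pool)
        if positive and not hit:
            return False  # violates (b): a positive pool needs an infected member
        if not positive and hit:
            return False  # violates (a): a negative pool clears all its members
    return True
```

Condition (a) alone is what allows negative-pool nodes to be deleted up front; condition (b) is the "hit every group" requirement that makes the problem a group Steiner problem.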

The authors first formalize the MLE cost as a negative log‑likelihood: a sum of per‑edge terms derived from transmission probabilities (cₑ = −log pₑ for a successful transmission) and failure probabilities (dₑ = −log (1−pₑ) for a failed one). They then define two problem variants: (1) the general POOLCASCADEMLE with an arbitrary number of diffusion steps, and (2) One‑HopCascadeMLE, where infection spreads for only one time step after seeding. Both variants are shown to be NP‑hard. Via a reduction from the Group Steiner Tree (GST) problem, POOLCASCADEMLE inherits the inapproximability of GST: it cannot be approximated within a factor of log²⁻ᵋ k for any ε > 0 under standard complexity assumptions (k = number of positive pools). Similarly, One‑HopCascadeMLE inherits an Ω(log k) hardness of approximation from Minimum Set Cover.
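Because IC transmission attempts are independent, the likelihood of a cascade outcome factorizes over edges, and its negative logarithm is a sum of cₑ terms for successful transmissions plus dₑ terms for failed attempts. The sketch below illustrates this arithmetic; it assumes the caller already knows which edges succeeded and which failed (the summary does not spell out exactly which failed edges the paper charges), and the names `edge_costs` and `cascade_cost` are hypothetical.

```python
import math


def edge_costs(p_e):
    """Return (c_e, d_e) = (-log p_e, -log(1 - p_e)) for 0 < p_e < 1."""
    return -math.log(p_e), -math.log1p(-p_e)


def cascade_cost(p, succeeded, failed):
    """Negative log-likelihood of an IC cascade outcome.

    p:         dict edge -> transmission probability
    succeeded: edges along which infection was transmitted (cost c_e each)
    failed:    edges whose transmission attempt failed (cost d_e each)
    Independence of the attempts turns the product of probabilities into
    a sum of per-edge costs after taking -log.
    """
    total = 0.0
    for e in succeeded:
        total += edge_costs(p[e])[0]
    for e in failed:
        total += edge_costs(p[e])[1]
    return total
```

For example, with one successful edge of probability 0.3 and failed edges of probability 0.2 and 0.5, the cost equals −log(0.3 · 0.8 · 0.5) = −log 0.12.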

To obtain practical algorithms, the paper makes a natural assumption that all edge transmission probabilities are at most ½. Then cₑ = −log pₑ ≥ log 2 ≥ −log (1−pₑ) = dₑ, so the edge weights cₑ − dₑ defined below are non‑negative; cycles can therefore be removed without increasing cost, and some optimal solution is a tree. Under this assumption, the authors construct a node‑ and edge‑weighted graph G′ from the original contact network G by (i) deleting all nodes belonging to negative pools, (ii) assigning each remaining node a weight equal to the sum of dₑ over its incident edges, and (iii) setting edge weights to cₑ − dₑ. Finding a minimum‑cost tree in G′ that hits at least one node from each positive pool is exactly a node‑ and edge‑weighted Group Steiner Tree problem. Using Charikar et al.’s reduction from GST to the Directed Steiner Tree (DST) problem, they apply the best known polynomial‑time DST approximation algorithm, which yields an O(k^ε)‑approximation (for any fixed ε > 0). The resulting algorithm, ApproxCascade, is proved to return a tree whose MLE cost is within an O(k^ε) factor of the optimal.
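The construction of G′ can be sketched as follows. This assumes, as one reading of step (ii), that a node's weight sums dₑ over all of its incident edges in the original graph G, including edges to deleted negative‑pool nodes. The names `build_reduced_graph`, `tree_cost` and `hits_all_pools` are hypothetical, and the sketch only builds G′ and evaluates a given candidate tree; it does not implement the GST/DST approximation itself.

```python
import math


def build_reduced_graph(nodes, edges, p, negative_nodes):
    """Return (node_w, edge_w) for the reduced graph G'.

    nodes:          iterable of all nodes of the contact network G
    edges:          list of undirected edges (u, v) of G
    p:              dict (u, v) -> transmission probability, assumed <= 1/2
    negative_nodes: set of nodes that appear in some negative pool
    """
    # (i) delete negative-pool nodes; every surviving node starts at weight 0
    node_w = {v: 0.0 for v in nodes if v not in negative_nodes}
    edge_w = {}
    for (u, v) in edges:
        pe = p[(u, v)]
        if not 0.0 < pe <= 0.5:
            raise ValueError("the reduction assumes 0 < p_e <= 1/2")
        c, d = -math.log(pe), -math.log1p(-pe)
        # (ii) each surviving endpoint pays d_e for this incident edge
        for x in (u, v):
            if x in node_w:
                node_w[x] += d
        # (iii) edges between surviving nodes get weight c_e - d_e >= 0
        if u in node_w and v in node_w:
            edge_w[(u, v)] = c - d
    return node_w, edge_w


def tree_cost(tree_nodes, tree_edges, node_w, edge_w):
    """Cost of a candidate tree in G': its node weights plus its edge weights."""
    return sum(node_w[v] for v in tree_nodes) + sum(edge_w[e] for e in tree_edges)


def hits_all_pools(tree_nodes, positive_pools):
    """Group Steiner condition: the tree contains a node of every positive pool."""
    return all(not set(pool).isdisjoint(tree_nodes) for pool in positive_pools)
```

The check on pₑ mirrors the paper's assumption: cₑ − dₑ = log((1 − pₑ)/pₑ), which is non‑negative exactly when pₑ ≤ ½.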

For the one‑hop setting, the authors formulate a linear programming (LP) relaxation in which node‑selection variables, binary in the integer program and relaxed to the interval [0, 1] in the LP, indicate whether a node is selected as the infected member of a pool. The LP captures seeding costs, transmission costs, and failure costs. After solving the LP, a rounding scheme converts the fractional solution into an integral one in which every positive pool contains at least one selected node. The resulting algorithm, RoundCascade, returns a solution whose cost is within an O(log k) factor of the LP optimum; since the LP optimum is a lower bound on the optimal integral cost, this is also an O(log k) approximation to the true optimum.
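The summary does not give RoundCascade's exact rounding rule, so the sketch below swaps in the textbook randomized rounding for set‑cover‑type LPs, the usual source of O(log k) guarantees. It assumes a simplified LP in which each node v has a variable x_v ∈ [0, 1] and each positive pool P has the covering constraint Σ_{v∈P} x_v ≥ 1; the fractional x is taken as given (from any LP solver), and `randomized_round` and its parameters are hypothetical.

```python
import math
import random


def randomized_round(x, cost, positive_pools, rng, c=2.0):
    """Round a fractional pool-covering solution to an integral one.

    x:              dict node -> fractional value in [0, 1] (e.g. from an LP solver)
    cost:           dict node -> selection cost
    positive_pools: list of node sets; each must contain a selected node
    Runs ceil(c * ln k) + 1 independent rounds, keeping node v in each round
    with probability x[v]; any pool still unhit afterwards is repaired by
    adding its cheapest node, so the output always covers every pool.
    """
    for pool in positive_pools:
        if sum(x.get(v, 0.0) for v in pool) < 1 - 1e-9:
            raise ValueError("x violates a pool-covering constraint")
    k = max(len(positive_pools), 1)
    rounds = math.ceil(c * math.log(k)) + 1
    chosen = set()
    for _ in range(rounds):
        for v, xv in x.items():
            if rng.random() < xv:
                chosen.add(v)
    for pool in positive_pools:
        if chosen.isdisjoint(pool):
            chosen.add(min(pool, key=lambda v: cost[v]))
    return chosen
```

Each round leaves a given pool unhit with probability at most 1/e, so after about c·ln k rounds a pool is missed with probability at most k^(−c). The repair step is cheap in expectation because the cheapest node of a pool P costs no more than Σ_{v∈P} cost(v)·x_v, which is at most the LP value.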

The experimental evaluation spans three datasets: (i) a real ICU contact network derived from electronic health records at the University of Virginia Hospital (≈2K nodes), (ii) a synthetic city‑scale contact network (≈10K nodes), and (iii) standard synthetic graphs (Barabási‑Albert, Watts‑Strogatz). Pools of size 10–30 % of the population are generated randomly, and IC diffusion probabilities are varied between 0.1 and 0.3. Two performance metrics are reported: (a) the fraction of true infected nodes recovered (recall) and (b) the absolute error in estimating overall prevalence. ApproxCascade and RoundCascade consistently outperform baselines that treat tests as pools of size one (the state‑of‑the‑art MLE method of Mishra et al., 2022) by 12–18 % in recall and achieve prevalence errors below 0.05, a 30 % improvement. The methods remain robust when test outcomes are noisy (up to 5 % false‑positive/negative rates), with only modest degradation.
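The two reported metrics can be computed as below. The sketch assumes prevalence is estimated as the number of nodes marked infected divided by the population size; the summary does not state the exact estimator, and the function names are hypothetical.

```python
def recall(true_infected, predicted):
    """Fraction of truly infected nodes that the reconstruction recovers."""
    if not true_infected:
        return 1.0  # nothing to recover
    return len(true_infected & predicted) / len(true_infected)


def prevalence_error(true_infected, predicted, population_size):
    """Absolute difference between estimated and true prevalence."""
    return abs(len(predicted) - len(true_infected)) / population_size
```

Note that prevalence error can be small even when recall is low, since it only compares counts; this is why the two metrics are reported together.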

The authors also discuss limitations of the pure MLE approach. When pools heavily overlap or transmission probabilities are high, the MLE solution may diverge substantially from the true diffusion path. Moreover, even slight noise in pooled test results can dramatically shift the optimal solution of the NoisyPoolCascadeMLE variant, which allows pooled test outcomes to be erroneous, underscoring the need for accurate error models in practice. Future directions suggested include adaptive pool design, Bayesian extensions that incorporate prior prevalence information, and leveraging temporal sequences of pooled tests.

In summary, this work introduces the first formal treatment of cascade reconstruction under group surveillance, establishes strong hardness results, and delivers the first provable approximation algorithms (ApproxCascade and RoundCascade) that scale to realistic contact networks. The empirical results demonstrate clear advantages over existing individual‑test‑based methods, making a significant contribution to epidemiological modeling and public‑health decision‑making in resource‑constrained settings.
