Inferring global network properties from egocentric data with applications to epidemics
Social networks are rarely observed in full detail. In many situations properties are known for only a sample of the individuals in the network, and it is desirable to infer global properties of the full social network from this “egocentric” network data. In the current paper we study a few different types of egocentric data, and show what global network properties are consistent with those egocentric data. Two global network properties are considered: the size of the largest connected component in the network (the giant), and the possible size of an epidemic outbreak taking place on the network, in which transmission occurs only between network neighbours, with probability $p$. The main conclusion is that in most cases, egocentric data allow for a large range of possible sizes of the giant and the outbreak. However, there is an upper bound for the latter. For the case that the network is selected uniformly among networks with prescribed egocentric data (satisfying some conditions), the asymptotic size of the giant and the outbreak is characterised.
💡 Research Summary
The paper addresses a fundamental problem in network science and epidemiology: how to infer global properties of a large social network when only a limited, egocentric sample of the network is observable. The authors define several levels of egocentric data: (i) degree-only information, (ii) the degrees of each ego's neighbors (joint degree information), and (iii) two-step connectivity (information about whether an ego's neighbors are linked to each other). For each data type they construct the set 𝔾(D) of all simple graphs that satisfy the observed constraints and study what can be said about two global quantities: the size of the giant connected component (GCC) and the potential size of an epidemic outbreak that spreads along edges with transmission probability p.
First, the authors review the classic Molloy-Reed criterion, which links the existence of a GCC to the first and second moments of the degree distribution. They show that when only degree counts are known, the set 𝔾(D) can contain graphs with a wide range of GCC sizes, so the egocentric data alone do not pin down the GCC. By contrast, when joint degree information is available, the admissible graphs are severely restricted. The authors formalize this restriction using a joint degree matrix and prove that the largest eigenvalue λ₁ of the associated “excess degree” matrix determines both the existence and the asymptotic proportion γ(D) of the GCC in a uniformly random graph drawn from 𝔾(D). Specifically, λ₁ > 1 guarantees a positive-fraction GCC, and γ(D) ≈ 1 − 1/λ₁.
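As a minimal sketch of the Molloy-Reed check mentioned above, the Python snippet below computes the ratio E[D²]/E[D] from a degree sequence; in the standard configuration-model setting, a giant component exists asymptotically when this ratio exceeds 2. The function names and the example degree lists are illustrative assumptions, not taken from the paper, and the check uses degree-only data rather than the joint degree matrix.

```python
def molloy_reed_ratio(degrees):
    """Return E[D^2] / E[D] for a degree sequence (hypothetical helper)."""
    n = len(degrees)
    mean_deg = sum(degrees) / n
    if mean_deg == 0:
        raise ValueError("all degrees are zero; the ratio is undefined")
    mean_sq = sum(d * d for d in degrees) / n
    return mean_sq / mean_deg


def has_giant_component(degrees):
    """Molloy-Reed criterion: a giant component is expected when the ratio exceeds 2."""
    return molloy_reed_ratio(degrees) > 2.0


# Illustrative degree sequences (assumed, not from the paper).
print(has_giant_component([3] * 100))  # every node has degree 3
print(has_giant_component([1] * 100))  # every node has degree 1 (isolated pairs)
```

The criterion is a statement about large random graphs with the given degrees, so for a small observed sample it should be read as an indication rather than a guarantee.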
Next, the paper incorporates an epidemic model. Transmission occurs only across existing edges, each with independent probability p. This is equivalent to bond percolation on the underlying graph. The authors derive a percolation threshold p_c that depends on the same spectral quantity λ₁: p_c = 1/λ₁. If p ≤ p_c, any outbreak remains confined to small clusters; if p > p_c, the outbreak can reach a fraction σ(p,D) of the population, where σ(p,D) ≤ 1 − 1/(p λ₁). Thus, the epidemic size is bounded above by a function of the egocentric data and the transmission probability. Importantly, this upper bound exists even when the underlying graph is not fully known; it follows directly from the constraints imposed by the egocentric observations.
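The link between this epidemic model and bond percolation can be illustrated with a short simulation. The sketch below is an assumption-laden illustration rather than the authors' code: it spreads an infection from one seed node, lets each edge transmit at most once with independent probability p, and returns the number of nodes ever infected. The adjacency-dictionary format and the name `outbreak_size` are hypothetical.

```python
import random
from collections import deque


def outbreak_size(adj, seed_node, p, rng):
    """Count nodes reached from seed_node when each edge transmits with probability p.

    adj maps every node to a list of its neighbours (assumed undirected and simple).
    Each edge is tried at most once, so the infected set is the seed's cluster
    under bond percolation with edge-retention probability p.
    """
    infected = {seed_node}
    queue = deque([seed_node])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            # An edge to an already infected node is never tried again.
            if v not in infected and rng.random() < p:
                infected.add(v)
                queue.append(v)
    return len(infected)


# Illustrative example: a path on five nodes (assumed, not from the paper).
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
rng = random.Random(0)
print(outbreak_size(path, 0, 1.0, rng))  # p = 1 infects the whole component
print(outbreak_size(path, 0, 0.0, rng))  # p = 0 infects only the seed
```

Averaging `outbreak_size` over many seeds and random draws gives an empirical outbreak-size distribution for one specific graph; the bounds discussed above concern all graphs consistent with the egocentric data.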
The authors then turn to the probabilistic question of what happens when a graph is selected uniformly at random from 𝔾(D). Using generating‑function techniques and a rigorous switching algorithm, they prove that as the number of nodes n → ∞, the random graph’s GCC size converges almost surely to γ(D) and the epidemic size converges to σ(p,D). This asymptotic characterization holds under mild regularity conditions (e.g., bounded maximum degree, convergence of empirical degree distributions).
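For intuition about the generating-function technique, the sketch below implements the standard degree-only configuration-model calculation, which may differ from the expressions derived in the paper for richer egocentric data. With G0(x) = Σ p_k x^k and G1(x) = G0'(x)/G0'(1), it iterates u ← 1 − p + p·G1(u) from u = 0 to reach the smallest fixed point and returns 1 − G0(u). The function name, the dictionary input format, and the tolerance are assumptions made for illustration.

```python
def giant_fraction(degree_probs, p=1.0, iterations=10_000, tol=1e-12):
    """Asymptotic giant-component fraction after bond percolation with probability p.

    degree_probs maps each degree k to its probability p_k (assumed to sum to 1).
    With p = 1 this is the giant component of the unpercolated graph.
    """
    mean_deg = sum(k * pk for k, pk in degree_probs.items())
    if mean_deg == 0:
        return 0.0

    def g0(x):
        return sum(pk * x ** k for k, pk in degree_probs.items())

    def g1(x):
        return sum(k * pk * x ** (k - 1)
                   for k, pk in degree_probs.items() if k > 0) / mean_deg

    # Iterating upward from 0 converges to the smallest fixed point in [0, 1].
    u = 0.0
    for _ in range(iterations):
        new_u = 1.0 - p + p * g1(u)
        if abs(new_u - u) < tol:
            u = new_u
            break
        u = new_u
    return 1.0 - g0(u)


# Illustrative example: a 3-regular degree distribution (assumed, not from the paper).
print(giant_fraction({3: 1.0}))           # no percolation
print(giant_fraction({3: 1.0}, p=0.75))   # supercritical transmission probability
print(giant_fraction({3: 1.0}, p=0.4))    # subcritical transmission probability
```

Close to the critical transmission probability the fixed-point iteration converges slowly, so results near the threshold should be treated as approximate unless `iterations` is increased.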
To validate the theory, extensive simulations are performed. Synthetic networks with Poisson and power‑law degree distributions are generated, and three levels of egocentric data are extracted. For each data set, thousands of graphs are sampled uniformly from 𝔾(D) using a Markov‑chain switching procedure. The empirical distribution of GCC sizes matches the theoretical predictions: degree‑only data produce a wide spread, while joint‑degree data produce a tight concentration around the predicted γ(D). Epidemic simulations (SIR dynamics with varying p) confirm the existence of the percolation threshold p_c and the upper bound on outbreak size. The simulations also illustrate that adding two‑step connectivity information (i.e., clustering) further narrows the possible GCC and epidemic sizes.
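A Markov-chain switching procedure of the kind mentioned above is commonly realized as a double edge swap. The sketch below is a generic version under stated assumptions, not the authors' implementation: it preserves every node's degree and keeps the graph simple, but it does not enforce joint-degree or two-step constraints, which would need extra acceptance checks. The name `double_edge_swap` and the edge-list format are illustrative.

```python
import random


def double_edge_swap(edges, num_steps, rng):
    """Run num_steps proposals of degree-preserving edge swaps on a simple graph.

    edges is a list of (u, v) pairs with no self-loops or repeated edges.
    A proposal replaces edges a-b and c-d by a-d and c-b; proposals that would
    create a self-loop or a duplicate edge are rejected and the chain stays put.
    """
    edge_list = [tuple(e) for e in edges]
    edge_set = {frozenset(e) for e in edge_list}
    for _ in range(num_steps):
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        # Randomize the orientation of the second edge so both rewirings are possible.
        if rng.random() < 0.5:
            c, d = d, c
        if a == d or c == b:
            continue  # would create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in edge_set or new2 in edge_set:
            continue  # would create a duplicate edge
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {new1, new2}
        edge_list[i] = (a, d)
        edge_list[j] = (c, b)
    return edge_list


# Illustrative example: rewire a ring on ten nodes (assumed, not from the paper).
ring = [(i, (i + 1) % 10) for i in range(10)]
rewired = double_edge_swap(ring, 200, random.Random(1))
print(sorted(tuple(sorted(e)) for e in rewired))
```

Counting rejected proposals as steps, rather than redrawing until a swap succeeds, is the usual choice when the goal is an approximately uniform sample over graphs with the given degrees; how many steps are enough for mixing is not addressed here.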
The paper concludes with several practical implications. First, when only degree information is available, any inference about the risk of a large epidemic must be highly conservative, because the true GCC could be near its maximal feasible size. Second, collecting richer egocentric information, especially joint degree data, dramatically reduces uncertainty, allowing public health officials to estimate both the likely size of the giant component and the maximal possible outbreak more accurately. Third, the spectral framework (λ₁) provides a compact summary statistic that can be estimated from egocentric data and used directly in risk assessments. Finally, the authors note that the methodology extends beyond infectious disease modeling to any process that propagates on networks, such as information diffusion, cascade failures in power grids, or systemic risk in financial networks.
In sum, the study offers a rigorous bridge between limited, locally observed network data and global structural and dynamical properties. It demonstrates that while egocentric data often leave a wide range of possibilities, the addition of modest extra information yields sharp bounds on the size of the giant component and on epidemic outcomes, and it provides explicit formulas for those bounds. This work therefore advances both the theory of network inference and its application to real‑world epidemic preparedness.