Can co-location be used as a proxy for face-to-face contacts?
Technological advances have led to a strong increase in the number of data collection efforts aimed at measuring co-presence of individuals at different spatial resolutions. It is however unclear how much co-presence data can inform us on actual face-to-face contacts, of particular interest to study the structure of a population in social groups or for use in data-driven models of information or epidemic spreading processes. Here, we address this issue by leveraging data sets containing high resolution face-to-face contacts as well as a coarser spatial localisation of individuals, both temporally resolved, in various contexts. The co-presence and the face-to-face contact temporal networks share a number of structural and statistical features, but the former is (by definition) much denser than the latter. We thus consider several down-sampling methods that generate surrogate contact networks from the co-presence signal and compare them with the real face-to-face data. We show that these surrogate networks reproduce some features of the real data but are only partially able to identify the most central nodes of the face-to-face network. We then address the issue of using such down-sampled co-presence data in data-driven simulations of epidemic processes, and in identifying efficient containment strategies. We show that the performance of the various sampling methods strongly varies depending on context. We discuss the consequences of our results with respect to data collection strategies and methodologies.
💡 Research Summary
The paper investigates whether coarse-grained co‑location (co‑presence) data can serve as a proxy for high‑resolution face‑to‑face (FTF) contact data, a question of growing relevance for epidemiological modeling and social network analysis. Using five publicly available SocioPatterns datasets—two office settings (InVS13, InVS15), a hospital ward (LH10), a primary school (LyonSchool), a high school (Thiers13), and a scientific conference (SFHH)—the authors simultaneously record (i) FTF contacts detected by wearable RFID badges within 1.5 m and (ii) co‑presence events inferred from the same badges’ signals received by fixed RFID readers that define spatial zones of roughly 30 m radius. Both data streams are temporally resolved at 20‑second intervals, allowing the construction of two parallel temporal networks: a dense co‑presence network and a sparse FTF network.
The first part of the analysis compares structural and statistical properties of the two networks. Distributions of event duration, inter‑event time, number of events per dyad, and cumulative contact time all exhibit heavy‑tailed, approximately power‑law behavior in both networks, but co‑presence events are generally longer and more numerous, reflecting the looser spatial definition. Consequently, the co‑presence network has a much higher average degree, density, clustering coefficient, and larger cliques. Despite this, group‑level contact matrices (e.g., department‑to‑department or class‑to‑class link densities) are highly similar between the two networks, with cosine similarities ranging from 0.71 to 0.97 across datasets. This suggests that while co‑presence captures the macroscopic community structure, it fails to resolve the microscopic pattern of who actually interacts.
To bridge this gap, the authors propose three simple down‑sampling schemes that aim to transform the dense co‑presence stream into a surrogate contact network with statistical properties closer to the true FTF data. All schemes are calibrated to reproduce the empirical total contact time (T_c) observed in the FTF dataset.
- Sampling 1 (Uniform co‑presence list sampling): Each co‑presence list (the set of individuals present in a zone at a given time) is duplicated proportionally to its size, forming a global pool. Randomly draw (T_c) lists without replacement, then for each selected list pick a random pair of individuals.
- Sampling 2 (Duration‑weighted sampling): Co‑presence events are weighted by their duration; longer events have higher probability of being selected. After selecting an event, a random pair is drawn as in method 1.
- Sampling 3 (Uniform event thinning): Randomly retain a fixed fraction of all co‑presence events, discarding the rest, then randomly pair individuals within each retained event.
For each method, 100 realizations are generated and compared to the real FTF network in terms of average degree, density, clustering, clique number, and centrality rankings (betweenness, PageRank). Results show that no single method universally outperforms the others; performance is context‑dependent. For the hospital dataset, Sampling 2 best reproduces average degree and clustering, while for the high‑school data Sampling 1 yields the highest correlation in node centralities. Overall, the surrogate networks capture the global group structure but only partially recover the most central individuals.
The authors then assess the epidemiological relevance of these surrogate networks using a standard Susceptible‑Infected‑Recovered (SIR) model. Simulations are run on the empirical FTF network, on each down‑sampled surrogate, and on the raw co‑presence network, varying the transmission probability (\beta) and recovery rate (\mu). When (\beta) is low (sub‑critical regime), all networks produce similar outbreak sizes. In the super‑critical regime, however, the raw co‑presence network systematically overestimates the final epidemic size because of its inflated density, while the down‑sampled surrogates give mixed results: some under‑estimate, others over‑estimate depending on the sampling method and dataset. This demonstrates that coarse co‑presence data, even after simple down‑sampling, may not be reliable for precise quantitative epidemic forecasting.
Finally, the paper examines containment strategies based on node centrality. By immunizing (or isolating) the top 5 % of nodes ranked by betweenness or PageRank, the authors compare the reduction in epidemic size across networks. The true FTF network consistently yields the greatest reduction, confirming that accurate identification of superspreaders requires high‑resolution contact data. Among the surrogates, Sampling 3 occasionally approaches the performance of the true network, but the variability across contexts is large.
In conclusion, the study provides a nuanced answer to the initial question. Co‑presence data do encode valuable information about overall community structure and can be transformed into sparser surrogate networks that approximate some global statistics of face‑to‑face contacts. However, the intrinsic loss of spatial resolution leads to systematic biases in node‑level centralities and epidemic outcomes, especially in high‑transmission scenarios. The effectiveness of any down‑sampling approach is highly context‑specific, implying that researchers must tailor their data collection and preprocessing pipelines to the intended application—whether it is coarse‑grained sociological analysis or fine‑grained epidemic modeling. The work highlights both the promise and the limitations of using low‑resolution co‑location as a proxy for detailed contact networks.
Comments & Academic Discussion
Loading comments...
Leave a Comment