Less is More: Optimizing Probe Selection Using Shared Latency Anomalies
Latency anomalies, defined as persistent or transient increases in round-trip time (RTT), are common in residential Internet access networks. When multiple users observe anomalies to the same destination, this may reflect shared infrastructure, routing behavior, or congestion. Inferring such shared behavior is challenging because anomaly magnitudes vary widely across devices, even within the same ISP and geographic area, and detailed network topology information is often unavailable. We study whether devices experiencing a shared latency anomaly observe similar changes in RTT magnitude using a topology-agnostic approach. Using four months of high-frequency RTT measurements from 99 residential probes in Chicago, we detect shared anomalies and analyze their consistency in amplitude and duration without relying on traceroutes or explicit path information. Building on prior change-point detection techniques, we find that many shared anomalies exhibit similar amplitude across users, particularly within the same ISP. Motivated by this observation, we design a sampling algorithm that reduces redundancy by selecting representative devices under user-defined constraints. Our approach captures 95 percent of aggregate anomaly impact using fewer than half of the deployed probes. Compared to two baselines, it identifies significantly more unique anomalies at comparable coverage levels. We further show that geographic diversity remains important when selecting probes within a single ISP, even at city scale. Overall, our results demonstrate that anomaly amplitude and duration provide effective topology-independent signals for scalable monitoring, troubleshooting, and cost-efficient sampling in residential Internet measurement.
💡 Research Summary
The paper tackles the problem of redundant measurements in residential broadband monitoring by investigating whether latency anomalies—periods of elevated round‑trip time (RTT)—experienced by multiple users to the same destination share similar magnitudes. Using a four‑month dataset of high‑frequency RTT measurements from 99 fixed probes deployed in Chicago homes, the authors first detect latency anomalies with an enhanced version of the Jitterbug change‑point detection pipeline. Their modifications include more sensitive Bayesian change‑point detection, sliding‑window processing to avoid edge effects, and additional heuristics to filter out minor fluctuations, thereby capturing a larger set of positive RTT jumps.
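The core idea behind the detection step — flagging upward shifts in an RTT time series — can be illustrated with a much-simplified sliding-window sketch. This is not the authors' modified Jitterbug pipeline (which uses Bayesian change-point detection and additional filtering heuristics); the window size and jump threshold below are purely illustrative.

```python
import statistics

def detect_rtt_jumps(rtts, window=30, min_jump_ms=3.0):
    """Flag indices where the mean RTT over the next `window` samples
    rises relative to the previous `window` samples. A crude stand-in
    for Bayesian change-point detection; only positive (upward) jumps
    are kept, mirroring the paper's focus on RTT increases."""
    jumps = []
    for i in range(window, len(rtts) - window):
        before = statistics.mean(rtts[i - window:i])
        after = statistics.mean(rtts[i:i + window])
        if after - before >= min_jump_ms:
            jumps.append((i, after - before))
    return jumps

# Synthetic series: baseline ~20 ms, then a sustained +10 ms level shift.
series = [20.0] * 60 + [30.0] * 60
peak = max(detect_rtt_jumps(series), key=lambda j: j[1])
print(peak)  # largest jump is at index 60, magnitude 10.0 ms
```

The sliding-window processing the authors add to Jitterbug serves a different purpose — avoiding edge effects when segmenting a long measurement stream — but the before/after comparison above conveys what a detected "positive RTT jump" looks like.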
Detected anomalies are then grouped into “shared anomalies” when they occur on the same destination and overlap in time (minimum five‑minute overlap). For each shared anomaly the authors compute two key attributes: amplitude (the increase in RTT) and duration (how long the elevated RTT persists). The product of these two attributes defines the anomaly’s impact weight. Statistical analysis shows that probes belonging to the same ISP—and especially those located in the same ZIP code—exhibit remarkably low variance in amplitude and duration when they share an anomaly. The larger the temporal overlap, the smaller the amplitude difference, suggesting that a common piece of infrastructure (e.g., a local exchange, last‑mile link) is responsible for the observed degradation. Conversely, probes from different ISPs or distant ZIP codes display higher variance, indicating that shared anomalies are less consistent across broader geographic or provider boundaries.
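The grouping rule and the impact weight described above can be sketched directly. The pairwise check below (same destination, at least five minutes of temporal overlap) and the impact definition (amplitude × duration) follow the paper; the field names and example values are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Anomaly:
    probe: str
    dest: str
    start: float      # minutes since some epoch
    end: float        # minutes since some epoch
    amplitude: float  # RTT increase in ms

    @property
    def impact(self) -> float:
        # Impact weight = amplitude x duration, as defined in the paper.
        return self.amplitude * (self.end - self.start)

def overlap_minutes(a: Anomaly, b: Anomaly) -> float:
    return min(a.end, b.end) - max(a.start, b.start)

def is_shared(a: Anomaly, b: Anomaly, min_overlap: float = 5.0) -> bool:
    """Two anomalies are shared if they target the same destination
    and overlap in time by at least `min_overlap` minutes."""
    return a.dest == b.dest and overlap_minutes(a, b) >= min_overlap

a = Anomaly("probe1", "203.0.113.1", start=0, end=20, amplitude=12.0)
b = Anomaly("probe2", "203.0.113.1", start=10, end=30, amplitude=11.0)
print(is_shared(a, b), a.impact)  # True 240.0
```

In practice the pairwise relation would be used to cluster anomalies into shared-anomaly groups per destination; the sketch only shows the membership test.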
Motivated by this consistency, the authors formulate probe selection as a maximum weighted set‑cover problem: each probe defines a set containing the shared anomalies it observes, and each anomaly carries its impact as a weight. The objective is to choose the smallest subset of probes whose sets together capture a target fraction of total impact (e.g., 95 %). Because the problem is NP‑hard, they propose a greedy heuristic that iteratively selects the probe contributing the largest uncovered impact weight, marks its anomalies as covered, and repeats until the desired coverage is reached.
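The greedy heuristic is the standard one for weighted max-coverage and fits in a few lines. The sketch below assumes a mapping from each probe to the impact-weighted shared anomalies it observes; probe and anomaly names are hypothetical.

```python
def greedy_probe_selection(coverage, target=0.95):
    """coverage: probe -> {anomaly_id: impact_weight}.
    Greedily pick the probe that adds the most uncovered impact until
    `target` fraction of total impact is covered. A shared anomaly has
    a single impact weight, so duplicates across probes are merged."""
    all_weights = {}
    for anomalies in coverage.values():
        all_weights.update(anomalies)
    total = sum(all_weights.values())

    covered, chosen = set(), []
    while sum(all_weights[a] for a in covered) < target * total:
        def gain(p):
            return sum(w for a, w in coverage[p].items() if a not in covered)
        best = max(coverage, key=gain)
        if gain(best) == 0:
            break  # no remaining probe adds impact
        chosen.append(best)
        covered.update(coverage[best])
    return chosen

probes = {
    "p1": {"A": 10.0, "B": 5.0},
    "p2": {"B": 5.0, "C": 1.0},
    "p3": {"C": 1.0},
}
print(greedy_probe_selection(probes))  # ['p1', 'p2']
```

With the toy input, p1 alone covers 15 of 16 impact units (93.75 %), so a second probe is needed to reach the 95 % target; the greedy step then picks p2, whose anomaly C closes the gap.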
Evaluation on the full Chicago deployment reveals that selecting 44 probes (44 % of the total) captures 95 % of the cumulative impact. Compared with two baselines—uniform random selection of 33 probes and a geographically uniform selection of 33 probes—the greedy algorithm discovers 2.2 × and 1.8 × more unique anomalies, respectively, at comparable coverage levels. Moreover, the authors demonstrate that a short historical window (1–2 weeks) is sufficient to train the selection algorithm; probes chosen from this window continue to achieve stable coverage (loss < 3 %) over a subsequent four‑week test period.
The study also highlights the importance of geographic diversity even within a single ISP. When probes are constrained to a single ZIP code, coverage drops noticeably, whereas spreading probes across multiple ZIP codes within the same ISP improves both the number of unique anomalies captured and the overall impact coverage.
In summary, the paper makes three primary contributions: (1) empirical evidence that latency anomalies observed by different residential probes often have similar amplitudes and durations, especially when the probes share the same ISP and locality; (2) a topology‑agnostic probe selection algorithm grounded in maximum weighted set‑cover that efficiently reduces measurement redundancy while preserving the ability to detect impactful anomalies; and (3) a demonstration that modest historical data suffices to maintain high coverage over time, and that geographic diversity remains a key factor even at city‑scale. By relying solely on end‑to‑end RTT statistics, the work offers a practical, cost‑effective approach for large‑scale residential measurement platforms, ISP monitoring systems, and policymakers seeking to understand and mitigate broadband performance degradations without the overhead of extensive traceroute or topology data collection.