Country-in-the-Middle: Measuring Paths between People and their Governments

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Understanding where Internet services are hosted, and how users reach them, has captured the interest of government regulators and others concerned with the privacy of data flows. In this paper we focus on government websites – services which arguably merit a higher expectation of protection against foreign surveillance or interference – and seek to identify countries in the middle (CitMs): countries that are neither the source nor destination in a path for a resident visiting their online government services. Finding these CitMs raises daunting methodological challenges. We propose a framework to identify CitMs and use a pilot study of 149 countries to refine our methodology before conducting an in-depth measurement study of 11 countries. For our focused study, we compile an extensive set of websites hosting government services and analyze over 9,000 IP-level paths from vantage points in those countries to these services. We conduct extensive manual validation to corroborate or discard paths based on the aforementioned challenges, and discuss paths that experience unexpected CitMs.


💡 Research Summary

The paper introduces the concept of “Country‑in‑the‑Middle” (CitM) to identify nations that appear as intermediate hops on the network paths between a country’s residents and its governmental web services. Recognizing that data sovereignty and privacy concerns are increasingly prominent, the authors argue that governments need to understand not only where their services are hosted but also how traffic to those services traverses the global Internet.

To address this, the authors develop a comprehensive framework consisting of four stages: (1) collection of government website domains, (2) selection of in‑country measurement points, (3) IP‑level path measurement, and (4) geolocation and validation of intermediate hops. Each stage confronts substantial methodological challenges.

Domain collection: No single authoritative list exists for most nations. The authors start from the limited United Nations list, augment it with country‑specific sources, and then expand the set by mining TLS certificates in the Censys database. This approach uncovers domains that simple matching on known government suffixes (e.g., “gc.ca” for Canada) would miss. After extracting eTLD+1 domains, they verify liveness by checking DNS resolution and confirming that port 443 is open.
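The eTLD+1 extraction and liveness check described above can be sketched as follows. This is a minimal illustration, not the authors' tooling: `TOY_PUBLIC_SUFFIXES`, `etld_plus_one`, and `is_live` are hypothetical names, and the tiny suffix set stands in for the full Public Suffix List that a real run would need.

```python
import socket

# Hypothetical, tiny stand-in for the Public Suffix List; a real run would
# load the full list instead of hard-coding a few suffixes.
TOY_PUBLIC_SUFFIXES = {"ca", "uk", "gov.uk", "au", "gov.au", "com"}


def etld_plus_one(hostname, suffixes=TOY_PUBLIC_SUFFIXES):
    """Return the registrable domain: the longest public suffix plus one label."""
    labels = hostname.lower().rstrip(".").split(".")
    # Scan from the longest candidate suffix so "gov.uk" wins over "uk".
    for i in range(len(labels)):
        if ".".join(labels[i:]) in suffixes:
            if i == 0:
                return None  # the hostname is itself a public suffix
            return ".".join(labels[i - 1:])
    return None  # no known suffix matched


def is_live(hostname, port=443, timeout=3.0):
    """True if the name resolves and a TCP connection to the port succeeds."""
    try:
        # create_connection performs the DNS lookup and the TCP handshake;
        # resolution failures (socket.gaierror) are subclasses of OSError.
        with socket.create_connection((hostname, port), timeout=timeout):
            return True
    except OSError:
        return False
```

With the toy suffix set, `etld_plus_one("www.canada.gc.ca")` yields `gc.ca`, matching the Canadian example in the text. `is_live` needs network access, so its result depends on where and when it is run.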

Measurement points: RIPE Atlas probes are used as the primary platform. Up to ten probes per country are selected to maximize AS diversity. The authors note that many probes are located in data centers rather than residential networks, which may bias the observed paths. They also discuss known issues with Atlas probe self‑reported locations and apply additional sanity checks.
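One simple way to pick up to ten probes while favoring AS diversity is a greedy pass that takes one probe per AS before reusing any AS. The summary does not state the authors' exact selection rule, so the sketch below is an assumption; the `id` and `asn` keys are hypothetical field names for illustration.

```python
def select_probes(probes, limit=10):
    """Pick up to `limit` probes, taking one probe per AS before reusing an AS."""
    chosen, leftovers, seen_asns = [], [], set()
    for probe in probes:
        if probe["asn"] not in seen_asns:
            seen_asns.add(probe["asn"])
            chosen.append(probe)      # first probe seen in this AS
        else:
            leftovers.append(probe)   # AS already covered; use only if room remains
    return (chosen + leftovers)[:limit]


# Example with made-up probe records (IDs and ASNs are illustrative only).
candidates = [
    {"id": 1, "asn": 64500},
    {"id": 2, "asn": 64500},
    {"id": 3, "asn": 64501},
    {"id": 4, "asn": 64502},
]
picked = select_probes(candidates, limit=3)
```

Here `picked` holds probes 1, 3, and 4, one from each AS, and probe 2 is skipped because its AS is already covered.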

Path measurement: ICMP‑based Paris traceroute is run from each probe to each target IP address (resolved locally on the probe to emulate a real user). The authors discuss the trade‑offs between ICMP and TCP traceroute and justify their choice. They also handle typical traceroute ambiguities: non‑responsive routers, MPLS tunnels that hide hops, and routers that reply with an IP address different from the incoming interface.
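Handling non-responsive routers starts with turning a raw traceroute result into an ordered list of hop addresses. The sketch below assumes a simplified structure modeled on RIPE Atlas traceroute results (a `result` list of hops, each with its own `result` list of replies carrying either a `from` address or an `x` timeout marker). These field names should be checked against the Atlas documentation, and `hop_addresses` is a hypothetical helper.

```python
def hop_addresses(measurement):
    """Return one entry per hop: the first responding IP, or None if all probes timed out."""
    hops = []
    for hop in measurement.get("result", []):
        replies = hop.get("result", [])  # hops that errored may lack this key
        addr = next((r["from"] for r in replies if "from" in r), None)
        hops.append(addr)
    return hops


# Made-up measurement for illustration; addresses are not from a real trace.
sample = {
    "result": [
        {"hop": 1, "result": [{"from": "192.168.1.1", "rtt": 0.9}]},
        {"hop": 2, "result": [{"x": "*"}, {"x": "*"}, {"x": "*"}]},
        {"hop": 3, "result": [{"x": "*"}, {"from": "80.81.192.1", "rtt": 12.4}]},
    ]
}
path = hop_addresses(sample)
```

Keeping `None` placeholders for silent hops preserves hop positions, which makes it easier to see later where a gap in the path may be hiding a border crossing.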

Geolocation and CitM validation: Hop IPs are mapped to countries using the IPinfo database, which the authors claim offers higher accuracy than many alternatives. To reduce noise, they filter out private‑address hops and any hop that appears only once in the entire traceroute. Because geolocation services can be inaccurate for core routers, the authors cross‑validate results manually, especially for hops suspected to be CitMs. Anycast addresses present a special difficulty; the authors use a dedicated anycast geolocation dataset, though it is not contemporaneous with their measurements, so anycast findings are treated as supplementary.
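The private-address filter and the hop-to-country mapping can be sketched with the standard `ipaddress` module. The lookup table below is a hypothetical stand-in for a geolocation database such as IPinfo; its entries are invented for illustration and are not real geolocation results.

```python
import ipaddress


def country_path(hop_ips, ip_to_country):
    """Map hop IPs to a country sequence, skipping private and unmapped hops."""
    path = []
    for ip in hop_ips:
        if ip is None:
            continue  # non-responsive hop
        if ipaddress.ip_address(ip).is_private:
            continue  # RFC 1918 and similar addresses carry no location signal
        country = ip_to_country.get(ip)
        if country is None:
            continue  # no geolocation entry for this hop
        if not path or path[-1] != country:
            path.append(country)  # collapse consecutive hops in the same country
    return path


# Invented mapping for illustration only.
toy_geo = {"80.81.192.1": "DE", "62.115.0.1": "SE", "8.8.8.8": "US"}
hops = ["192.168.1.1", None, "80.81.192.1", "80.81.192.1", "62.115.0.1", "8.8.8.8"]
countries = country_path(hops, toy_geo)
```

The resulting `countries` list is `["DE", "SE", "US"]`. A sketch like this only automates the first pass; the manual cross-validation the authors describe is still needed for hops suspected to be CitMs.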

A pilot study conducted in May 2023 attempted a global‑scale measurement across 149 countries, yielding 71 164 traceroutes. After extensive filtering (removing private hops, single‑appearance hops, and incomplete traces) the usable set shrank to 68 764 traceroutes, a reduction of 2 400 traces (about 3.4 %). The pilot revealed the severity of the challenges and informed the design of a focused study.

Focused study: The authors selected 11 countries for an in‑depth analysis. For each country, up to ten RIPE Atlas probes were used to launch traceroutes toward roughly 100 government domains, resulting in 9 278 successful traceroutes after validation. The authors classify destinations as “convergent” (hosted within the same country) or “divergent” (hosted abroad). CitMs can appear in both cases; paths that leave the source country despite a convergent destination are sometimes called “boomerang” or “tromboning” in prior work.
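Given a country-level path, the destination classification and the CitM definition reduce to a few comparisons. The following is a minimal sketch under the definitions in this summary; the function names are hypothetical and the example path is invented.

```python
def classify_destination(source_country, dest_country):
    """'convergent' if the service is hosted in the user's country, else 'divergent'."""
    return "convergent" if source_country == dest_country else "divergent"


def countries_in_the_middle(path, source_country, dest_country):
    """Countries on the path that are neither the source nor the destination."""
    citms = []
    for country in path:
        if country not in (source_country, dest_country) and country not in citms:
            citms.append(country)  # keep first-seen order, no duplicates
    return citms


# Invented example: a locally hosted service whose path leaves and re-enters
# the source country, the "boomerang" or "tromboning" case.
example_path = ["BR", "US", "BR"]
kind = classify_destination("BR", "BR")
citms = countries_in_the_middle(example_path, "BR", "BR")
```

In this example `kind` is `"convergent"` and `citms` is `["US"]`: the destination is domestic, yet a foreign country still sits on the path.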

Key findings: Approximately 30% of the validated paths contain at least one CitM. Divergent destinations typically involve two to three intermediate countries. Unexpected CitMs were observed; for example, traffic from Germany to a Dutch‑hosted German government site passed through Russia, likely due to ISP‑level traffic engineering rather than geographic proximity. Boomerang effects were common, confirming that even when a service is locally hosted, the underlying backbone topology can force traffic to cross borders. Anycast services complicated the classification of convergent vs. divergent destinations, as the same IP address may be served from multiple geographic locations. MPLS tunnels were present but, according to prior studies, rarely concealed entire segments of the path; most tunnels revealed the routers they traversed.

Implications and recommendations: The study demonstrates that governments may be unaware of the full set of nations that can observe or potentially interfere with traffic to their own services. To improve future measurements, the authors recommend: (1) expanding the number and diversity of measurement probes, especially residential‑type probes; (2) maintaining up‑to‑date anycast geolocation datasets; (3) incorporating BGP‑based tunnel detection to better account for hidden hops; and (4) automating validation steps with machine‑learning‑driven anomaly detection to reduce manual effort.

In conclusion, the paper provides a rigorous methodology for identifying CitMs, validates it through a large‑scale empirical study, and uncovers both expected and surprising patterns of cross‑border traffic to government web services. The work contributes valuable data and methodological guidance for policymakers, regulators, and researchers concerned with Internet sovereignty and the security of public‑sector digital infrastructure.

