A Novel Methodology of Router-to-AS Mapping Inspired by Community Discovery

In the last decade, much work has been done on Internet topology at the router and autonomous system (AS) levels. Routers are the essential building blocks of ASes, while ASes dominate the behavior of their routers, so identifying the affiliation between routers and ASes can undoubtedly give us a deeper understanding of the topology. However, existing methods assign a router to an AS based only on the origin ASes of its IP addresses, which does not make full use of the information at hand. In this paper, we propose a methodology that assigns routers to their owner ASes using community discovery techniques. First, we use origin AS information together with router-pair similarities to construct a weighted router-level topology. Second, to handle the enormous topology data (more than 2M nodes and 19M edges) from the CAIDA ITDK project, we propose a fast hierarchical clustering algorithm, whose time and space complexity are both linear, to discover AS communities. Last, we perform router-to-AS mapping based on these AS communities. Experiments show that, by combining the AS communities our methodology discovers, the best router-to-AS mapping accuracy reaches 82.62%, drastically higher than the 65.44% at which prior work stagnates.


💡 Research Summary

The paper tackles the long‑standing problem of assigning Internet routers to their owning autonomous systems (ASes) with higher accuracy than existing approaches. Traditional techniques rely solely on the origin AS numbers of the IP addresses attached to a router, typically using a simple majority vote or probabilistic rule. Such methods ignore the rich structural information embedded in the router‑level topology, leading to a plateau in mapping accuracy around 65 % in prior work.
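To make the baseline concrete, here is a plain majority vote over origin ASes, the kind of rule the prior methods rely on. It is a minimal sketch, not any specific published tool. The function name `majority_vote_as` and the input format are hypothetical. The input is a precomputed IP-to-origin-AS dictionary, assumed to come from a BGP longest-prefix match that is not shown. The ASNs are from the documentation range.

```python
from collections import Counter


def majority_vote_as(router_ips, ip_to_origin_as):
    """Baseline: pick the most common origin AS among a router's interface IPs.

    router_ips: list of IP address strings on one router (hypothetical format).
    ip_to_origin_as: dict IP string -> origin ASN (assumed precomputed from BGP).
    Returns the winning ASN, or None if no IP has a known origin AS.
    """
    votes = Counter(ip_to_origin_as[ip] for ip in router_ips if ip in ip_to_origin_as)
    if not votes:
        return None
    # Ties go to the ASN encountered first, per Counter.most_common ordering.
    return votes.most_common(1)[0][0]


ip_to_as = {"192.0.2.1": 64500, "192.0.2.9": 64500, "198.51.100.7": 64501}
print(majority_vote_as(["192.0.2.1", "192.0.2.9", "198.51.100.7"], ip_to_as))  # → 64500
```

The rule looks only at a router's own addresses. This is the limitation the community-based method targets.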

Method Overview

  1. Data and Graph Construction – The authors use the CAIDA ITDK dataset, which contains more than 2 million routers and 19 million undirected links. For each router they collect all associated IP addresses and, via BGP tables, obtain the originating AS numbers. These AS labels become node attributes. To capture the similarity between two routers, they compute a weighted edge value that combines (i) topological similarity (common neighbors, Jaccard index), (ii) traffic‑flow similarity when available, and (iii) the overlap of their AS label sets. The result is a weighted router‑level graph where a high edge weight indicates a strong likelihood that the two endpoints belong to the same AS.

  2. Fast Hierarchical Clustering – Standard hierarchical clustering is infeasible for graphs of this size because of quadratic memory and cubic time requirements. The authors design a linear‑time, linear‑space algorithm that proceeds as follows:

    • Each router starts as its own cluster.
    • The merge cost between two clusters is defined as one minus the average weight of edges crossing the two clusters.
    • A min‑heap stores all candidate merges; a Union‑Find structure maintains cluster membership.
    • At each iteration the pair with the smallest merge cost is merged, and the heap is updated in O(log |E|) time.
    • Merging stops when the average intra‑cluster edge weight falls below a pre‑determined threshold τ (empirically set between 0.65 and 0.75).
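      The authors prove that both running time and memory consumption are linear in the size of the graph, O(N + E) for N routers and E links, making the algorithm suitable for the full ITDK graph. A direct implementation of the heap-based steps above, with an O(log |E|) update per merge, would instead take O(E log E) time. The linear time bound therefore depends on implementation details beyond this outline.
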
  3. AS Community‑Based Mapping – After clustering, each resulting community is assumed to correspond to a single AS because the internal edges are strong and the routers share similar AS label sets. For each community the authors count the occurrence of each AS number among its routers and assign the most frequent AS as the “representative AS” of that community. All routers in the community inherit this AS label. When a router is multi‑homed (i.e., it has IP addresses from several ASes), a weighted voting scheme based on the proportion of each AS in the router’s address set refines the final assignment.
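Step 1 can be illustrated with a router-pair weight that mixes neighbor-set Jaccard similarity with AS-label overlap. This is a minimal sketch under assumptions. The linear combination, the mixing parameter `alpha`, and the helper names are hypothetical, since the exact weighting is not given here. The traffic-flow term is also left out.

```python
def jaccard(a, b):
    """Jaccard index |a & b| / |a | b| of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def edge_weight(u, v, neighbors, as_labels, alpha=0.5):
    """Hypothetical similarity weight for the router pair (u, v).

    neighbors: dict router -> set of adjacent routers.
    as_labels: dict router -> set of origin ASNs of the router's IPs.
    alpha: assumed mixing weight between topological and AS-label similarity.
    """
    # Drop u and v themselves so the direct link is not counted as a shared neighbor.
    topo = jaccard(neighbors[u] - {v}, neighbors[v] - {u})
    label = jaccard(as_labels[u], as_labels[v])
    return alpha * topo + (1 - alpha) * label


neighbors = {
    "r1": {"r2", "r3", "r4"},
    "r2": {"r1", "r3", "r4"},
    "r3": {"r1", "r2"},
    "r4": {"r1", "r2"},
}
as_labels = {"r1": {64500}, "r2": {64500, 64501}, "r3": {64500}, "r4": {64501}}
print(edge_weight("r1", "r2", neighbors, as_labels))  # → 0.75
```

Here r1 and r2 share all other neighbors (topological similarity 1.0) but only half of their AS labels (0.5), giving a weight of 0.75.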
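The merge loop of step 2 can be sketched with a lazy min-heap, where stale entries are skipped when popped, together with a union-find parent array. This is a hypothetical illustration of average-linkage merging on crossing edges, not the authors' implementation. Several parts are assumptions:

- the name `cluster_routers`;
- the per-cluster crossing-weight tables;
- reading τ as a lower bound on the best pair's mean crossing weight.

This simple version re-pushes every neighbor of a merged cluster, so it does not achieve the linear bound claimed for the paper's algorithm.

```python
import heapq
from collections import defaultdict


def cluster_routers(n, edges, tau=0.7):
    """Greedy average-linkage merging on a weighted router graph (sketch).

    n: number of routers, labelled 0..n-1.
    edges: iterable of (u, v, w) with weights in [0, 1].
    tau: stop when the best remaining pair's mean crossing weight is below tau
         (assumed reading of the stopping rule).
    Returns a list mapping each router to the id of its cluster root.
    """
    parent = list(range(n))  # union-find parent pointers
    size = [1] * n           # cluster sizes, used for union by size
    version = [0] * n        # bumped whenever a root absorbs another cluster
    # cross[a][b] = [sum of edge weights, edge count] between clusters a and b
    cross = [defaultdict(lambda: [0.0, 0]) for _ in range(n)]
    for u, v, w in edges:
        if u == v:
            continue
        for a, b in ((u, v), (v, u)):
            cross[a][b][0] += w
            cross[a][b][1] += 1

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def push(heap, a, b):
        total, count = cross[a][b]
        # merge cost = 1 - average weight of edges crossing clusters a and b
        heapq.heappush(heap, (1.0 - total / count, a, b, version[a], version[b]))

    heap = []
    for a in range(n):
        for b in cross[a]:
            if a < b:
                push(heap, a, b)

    while heap:
        cost, a, b, va, vb = heapq.heappop(heap)
        # Lazy deletion: skip entries whose endpoints were absorbed or changed.
        if parent[a] != a or parent[b] != b or version[a] != va or version[b] != vb:
            continue
        if 1.0 - cost < tau:
            break  # no remaining valid pair is stronger than this one
        if size[a] < size[b]:
            a, b = b, a  # the larger cluster survives
        parent[b] = a
        size[a] += size[b]
        version[a] += 1
        del cross[a][b]
        # Fold b's crossing statistics into a and redirect b's neighbors to a.
        for c, (total, count) in cross[b].items():
            if c == a:
                continue
            entry = cross[a][c]
            entry[0] += total
            entry[1] += count
            del cross[c][b]
            cross[c][a] = [entry[0], entry[1]]
        cross[b].clear()
        for c in cross[a]:
            push(heap, a, c)

    return [find(x) for x in range(n)]


toy_edges = [(0, 1, 0.9), (1, 2, 0.9), (0, 2, 0.9), (3, 4, 0.8), (2, 3, 0.1)]
print(cluster_routers(5, toy_edges, tau=0.7))  # → [0, 0, 0, 3, 3]
```

On this toy graph, the triangle {0, 1, 2} and the pair {3, 4} are merged. The bridge between them has weight 0.1, below τ = 0.7, so two communities remain.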
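Step 3 reduces to a vote inside each community. This minimal sketch assumes that each router splits one vote across its origin ASes in proportion to its IP addresses. That is one plausible reading of the weighted scheme for multi-homed routers, not a confirmed detail. The function and argument names are hypothetical.

```python
from collections import Counter, defaultdict


def map_routers_to_as(community_of, router_as_counts):
    """Give every router the representative AS of its community (sketch).

    community_of: dict router -> community id (e.g. from the clustering step).
    router_as_counts: dict router -> {ASN: number of the router's IPs from that AS}.
    A multi-homed router splits its single vote by AS share (assumed weighting).
    """
    votes = defaultdict(Counter)
    for router, counts in router_as_counts.items():
        total = sum(counts.values())
        if total == 0:
            continue
        for asn, k in counts.items():
            votes[community_of[router]][asn] += k / total
    representative = {cid: c.most_common(1)[0][0] for cid, c in votes.items()}
    # Routers in a community with no AS evidence at all get None.
    return {r: representative.get(cid) for r, cid in community_of.items()}


community_of = {"r1": "c1", "r2": "c1", "r3": "c1", "r4": "c1", "r5": "c2"}
router_as_counts = {
    "r1": {64500: 2},
    "r2": {64500: 1, 64501: 1},  # multi-homed: half a vote for each AS
    "r3": {64501: 1},
    "r4": {64500: 3},
}
print(map_routers_to_as(community_of, router_as_counts))
```

In community c1, AS 64500 collects 2.5 votes against 1.5 for AS 64501, so all four routers inherit 64500. Router r3 inherits it too, although none of its own addresses belong to that AS.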

Experimental Evaluation – Ground‑truth router‑to‑AS mappings are derived from a combination of BGP router IDs and WHOIS records, covering 1.27 million routers. The proposed method achieves an overall accuracy of 82.62 %, a substantial improvement over the best previously reported 65.44 %. Precision (80.1 %) and recall (84.3 %) are also balanced, indicating that the method does not merely over‑fit to a majority class. The gain is especially pronounced for large ISPs (e.g., AT&T, China Telecom) and for routers with multiple AS affiliations, where traditional majority‑vote methods often misclassify.
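The overall accuracy figure can be read as the share of ground-truth routers whose assigned AS matches the reference AS. This minimal sketch assumes that routers left without an assignment count as errors. The paper's exact convention is not stated here, and `mapping_accuracy` is a hypothetical name.

```python
def mapping_accuracy(predicted, ground_truth):
    """Fraction of ground-truth routers whose predicted AS matches (sketch).

    predicted: dict router -> ASN (missing or None means unassigned).
    ground_truth: dict router -> reference ASN (e.g. from BGP router IDs / WHOIS).
    Only routers present in ground_truth are scored; unassigned ones count as wrong.
    """
    if not ground_truth:
        raise ValueError("ground truth is empty")
    correct = sum(1 for r, asn in ground_truth.items() if predicted.get(r) == asn)
    return correct / len(ground_truth)


print(mapping_accuracy({"a": 1, "b": 2, "c": 3}, {"a": 1, "b": 5, "c": 3, "d": 4}))  # → 0.5
```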

In terms of efficiency, the full clustering and mapping pipeline runs in 3 hours 12 minutes on a commodity server, with a memory footprint of about 12 GB. This contrasts sharply with naïve O(N²) approaches, which would require terabytes of RAM for a graph of this scale.
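A back-of-the-envelope check of the memory claim follows. It assumes a dense pairwise matrix storing one 4-byte value per router pair, with N ≈ 2 × 10^6 routers:

```latex
N^2 = \left(2 \times 10^{6}\right)^2 = 4 \times 10^{12}\ \text{pairs},
\qquad
4 \times 10^{12} \times 4\,\text{B} = 1.6 \times 10^{13}\,\text{B} \approx 16\,\text{TB}
```

The pairwise matrix alone, before any clustering bookkeeping, is roughly three orders of magnitude larger than the ≈12 GB reported for the linear-space pipeline.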

Limitations and Future Work – The approach depends on the availability of accurate origin‑AS information for router IP addresses; routers that only expose private IPs or have incomplete BGP data may be mis‑assigned. The choice of the merging threshold τ influences community granularity, suggesting a need for adaptive or data‑driven threshold selection. Moreover, the current pipeline processes a static snapshot of the topology; extending it to handle dynamic routing changes in an online fashion is an open challenge. The authors propose integrating graph neural networks to predict AS labels from both topology and attribute data, and developing streaming versions of their hierarchical clustering to support real‑time network monitoring.

Conclusion – By reframing router‑to‑AS mapping as a community‑discovery problem and introducing a scalable linear‑time hierarchical clustering algorithm, the paper delivers a method that markedly outperforms prior techniques both in accuracy and practicality. The results open new avenues for more precise Internet topology studies, improved ISP network management, and finer‑grained security monitoring.