Discovery of Important Crossroads in Road Network using Massive Taxi Trajectories

A major problem in road network analysis is the discovery of important crossroads, which can provide useful information for transport planning. However, none of the existing approaches addresses the problem of identifying network-wide important crossroads in a real road network. In this paper, we propose a novel data-driven approach named CRRank to rank important crossroads. Our key innovation is that we model the trip network, which reflects real travel demand, as a tripartite graph, instead of analyzing only the topology of the road network. To compute the importance scores of crossroads accurately, we propose a HITS-like ranking algorithm that propagates scores over this tripartite graph. We evaluate CRRank on a real-world dataset of taxi trajectories, and the experiments verify its utility.


💡 Research Summary

The paper tackles the long‑standing problem of identifying the most important crossroads in an urban road network, a task that is crucial for traffic planning, congestion mitigation, and safety management. Traditional approaches rely almost exclusively on static graph‑theoretic measures such as betweenness centrality, PageRank, or clustering coefficients, which capture only the topological prominence of nodes and ignore actual travel demand. To bridge this gap, the authors introduce CRRank, a data‑driven framework that builds a tripartite graph reflecting real‑world taxi movements. The three layers of the graph consist of (i) crossroads (C), (ii) frequently traversed road segments or “paths” (P), and (iii) origin‑destination (OD) transitions (T). Each transition node is weighted by the number of taxi trips observed for that OD pair, while edges encode the participation of a path in a transition and the inclusion of a crossroads in a path.
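The three layers described above can be illustrated with a minimal sketch. It assumes a hypothetical input format in which each map-matched trip is already a sequence of crossroad identifiers; the function name `build_tripartite`, the choice to treat each whole trip as one path, and the toy trips are illustrative assumptions, not the authors' data structures.

```python
from collections import Counter, defaultdict


def build_tripartite(trips):
    """Build the three layers from trips given as crossroad-id sequences.

    Assumption: each trip is a list of crossroad ids in travel order.
    """
    transition_weight = Counter()               # T node (origin, destination) -> trip count
    path_to_transitions = defaultdict(Counter)  # P node -> T node -> trips using that path
    path_to_crossroads = {}                     # P node -> ordered crossroads (C nodes)
    for trip in trips:
        path = tuple(trip)
        od = (path[0], path[-1])
        transition_weight[od] += 1
        path_to_transitions[path][od] += 1
        path_to_crossroads[path] = path
    return transition_weight, path_to_transitions, path_to_crossroads


# Toy trips over four crossroads (made-up data).
trips = [
    ["A", "B", "C"],
    ["A", "B", "C"],
    ["A", "D", "C"],
    ["B", "C"],
]
transition_weight, path_to_transitions, path_to_crossroads = build_tripartite(trips)
```

In this toy input, the A-to-C transition carries three trips split over two paths, which is the kind of demand information a topology-only measure would not see.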

CRRank employs a HITS‑like iterative score‑propagation algorithm adapted to the tripartite structure. Initially, crossroads and paths receive uniform scores, whereas transition nodes receive normalized trip counts. In each iteration, scores flow from transitions to paths (proportional to the contribution of each path to the transition), then from paths to crossroads (proportional to the path’s importance and the position of the crossroads within the path). After each propagation step, scores are normalized and blended with a damping factor to ensure convergence. The final steady‑state scores of the crossroads constitute their importance ranking.
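One possible reading of this propagation is sketched below. It is not the authors' algorithm: the summary specifies only the forward flow (transitions to paths to crossroads), so the feedback step from crossroads back to paths is an assumption added here to make the iteration HITS-like, and the position-dependent weighting of crossroads within a path is omitted. The function name `crrank_sketch`, the `damping` default of 0.85, and the toy inputs are all hypothetical.

```python
def _normalize(scores):
    """Scale non-negative scores so that they sum to 1."""
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()}


def crrank_sketch(transition_trips, path_share, path_crossroads,
                  damping=0.85, max_iters=100, tol=1e-10):
    """Illustrative HITS-style propagation on a transition/path/crossroad graph.

    transition_trips: {transition: trip count}
    path_share:       {(transition, path): fraction of that transition's trips on the path}
    path_crossroads:  {path: list of crossroads on the path}
    """
    # Transition scores are the normalized trip counts and stay fixed.
    t_score = _normalize(transition_trips)

    # Transitions -> paths: each path collects its share of every transition.
    p_prior = {p: 0.0 for p in path_crossroads}
    for (t, p), share in path_share.items():
        p_prior[p] += t_score[t] * share
    p_prior = _normalize(p_prior)

    # Paths and crossroads start from uniform scores.
    crossroads = sorted({c for cs in path_crossroads.values() for c in cs})
    p_score = {p: 1.0 / len(path_crossroads) for p in path_crossroads}
    c_score = {c: 1.0 / len(crossroads) for c in crossroads}

    for _ in range(max_iters):
        # Paths -> crossroads: a crossroad sums the scores of the paths through it.
        new_c = {c: 0.0 for c in crossroads}
        for p, cs in path_crossroads.items():
            for c in cs:
                new_c[c] += p_score[p]
        new_c = _normalize(new_c)

        # Crossroads -> paths (assumed feedback), blended with the demand prior.
        back = _normalize({p: sum(new_c[c] for c in cs)
                           for p, cs in path_crossroads.items()})
        new_p = {p: damping * back[p] + (1 - damping) * p_prior[p] for p in back}

        delta = sum(abs(new_c[c] - c_score[c]) for c in crossroads)
        c_score, p_score = new_c, new_p
        if delta < tol:
            break
    return c_score


# Toy inputs (made-up): two transitions, three paths, four crossroads.
transition_trips = {("A", "C"): 3, ("B", "C"): 1}
path_crossroads = {"p1": ["A", "B", "C"], "p2": ["A", "D", "C"], "p3": ["B", "C"]}
path_share = {
    (("A", "C"), "p1"): 2 / 3,
    (("A", "C"), "p2"): 1 / 3,
    (("B", "C"), "p3"): 1.0,
}
scores = crrank_sketch(transition_trips, path_share, path_crossroads)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy case crossroad C lies on every path and ranks first, while D lies only on the least-used path and ranks last. A lower `damping` value gives the trip-count prior more influence relative to the structural feedback.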

The authors validate the method using a massive dataset of over ten million GPS traces collected from taxis operating in a major Chinese metropolis during a one‑month period. After map‑matching and cleaning, they extract frequent sub‑paths to define the path layer and compute OD frequencies for the transition layer. CRRank’s rankings are compared against three baseline topological measures (betweenness, PageRank, clustering coefficient). Evaluation uses two criteria: (1) Pearson correlation between the derived rankings and observed traffic congestion levels (derived from sensor data), and (2) the ability of the top‑ranked crossroads to capture locations with high accident rates (precision, recall). CRRank achieves a correlation of 0.68, markedly higher than the baselines (0.42–0.51), and attains a precision of 0.74 in accident hotspot detection versus 0.48 for the best baseline. Visual analysis further shows that CRRank highlights emerging commercial districts that are under‑represented in pure topology‑based rankings.
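The two evaluation criteria can be made concrete with a short sketch. The scores, congestion levels, and accident hotspots below are made-up toy values, not results from the paper, and the helper names `pearson` and `precision_recall_at_k` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def precision_recall_at_k(scores, hotspots, k):
    """Precision and recall of the k highest-scored crossroads against a hotspot set."""
    top_k = sorted(scores, key=scores.get, reverse=True)[:k]
    hits = sum(1 for c in top_k if c in hotspots)
    return hits / k, hits / len(hotspots)


# Toy values (assumptions for illustration only).
importance = {"A": 0.30, "B": 0.25, "C": 0.35, "D": 0.10}
congestion = {"A": 0.6, "B": 0.5, "C": 0.9, "D": 0.2}
hotspots = {"B", "C"}

keys = sorted(importance)
r = pearson([importance[c] for c in keys], [congestion[c] for c in keys])
precision, recall = precision_recall_at_k(importance, hotspots, k=2)
```

With these toy values the top two crossroads are C and A, so one of the two hotspots is recovered and both precision and recall equal 0.5.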

Despite its strengths, the approach has notable limitations. It depends heavily on the availability and quality of large‑scale trajectory data, raising concerns about data collection cost and privacy. The current batch‑processing implementation updates scores only periodically, limiting responsiveness to sudden incidents or temporary events. Moreover, the model is built solely on taxi trips, potentially overlooking patterns of pedestrians, cyclists, and public‑transport users.

Future work suggested by the authors includes developing streaming‑based score updates for near‑real‑time ranking, integrating multimodal mobility data to produce a more comprehensive importance measure, and exploring graph neural networks to capture nonlinear and temporal dependencies. They also propose coupling CRRank with traffic simulation tools to quantify the impact of interventions guided by the identified critical crossroads.

In summary, CRRank demonstrates that incorporating real travel demand through a tripartite graph and a tailored HITS‑style propagation yields a more accurate and actionable assessment of crossroads importance than conventional topology‑only methods. The experimental results on a real‑world taxi dataset substantiate its effectiveness, positioning CRRank as a promising tool for modern urban traffic analysis and smart‑city planning.