On Network Embedding for Machine Learning on Road Networks: A Case Study on the Danish Road Network
Road networks are a type of spatial network, where edges may be associated with qualitative information such as road type and speed limit. Unfortunately, such information is often incomplete; for instance, OpenStreetMap only has speed limits for 13% of all Danish road segments. This is problematic for analysis tasks that rely on such information for machine learning. To enable machine learning in such circumstances, one may consider the application of network embedding methods to extract structural information from the network. However, these methods have so far mostly been used in the context of social networks, which differ significantly from road networks in terms of, e.g., node degree and level of homophily (which are key to the performance of many network embedding methods). We analyze the use of network embedding methods, specifically node2vec, for learning road segment embeddings in road networks. Due to the often limited availability of information on other relevant road characteristics, the analysis focuses on leveraging the spatial network structure. Our results suggest that network embedding methods can indeed be used for deriving relevant network features (that may, e.g., be used for predicting speed limits), but that the qualities of the embeddings differ from embeddings for social networks.
Research Summary
The paper investigates whether general-purpose network embedding techniques, specifically node2vec, can be used to generate useful feature vectors for road-segment-level machine-learning tasks when attribute information is scarce. Using the Danish road network extracted from OpenStreetMap, where only 13% of segments have speed-limit labels, the authors treat each directed road segment as a node (or edge) in a directed graph and apply node2vec’s biased second-order random walks to learn d-dimensional embeddings that preserve node neighborhoods.
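The biased second-order walk mentioned above can be illustrated with a minimal sketch. It follows the general node2vec weighting scheme (1/p for returning to the previous node, 1 for staying at distance one from it, 1/q for moving further away) on an unweighted toy graph. The function name `node2vec_walk`, the adjacency dictionary, and the parameter values are illustrative assumptions, not the authors' code or data.

```python
import random


def node2vec_walk(adj, start, length, p, q, rng):
    """Sample one biased second-order random walk (node2vec-style).

    adj maps each node to a list of its neighbours (unweighted graph).
    p is the return parameter, q the in-out parameter.
    """
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = adj.get(cur, [])
        if not nbrs:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            # The first step has no previous node, so it is uniform.
            walk.append(rng.choice(nbrs))
            continue
        prev = walk[-2]
        prev_nbrs = set(adj.get(prev, []))
        weights = []
        for nxt in nbrs:
            if nxt == prev:
                weights.append(1.0 / p)  # step back to the previous node
            elif nxt in prev_nbrs:
                weights.append(1.0)      # stay at distance 1 from prev
            else:
                weights.append(1.0 / q)  # move away from prev
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk


# Hypothetical toy graph; the node names carry no meaning.
adj = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}
walk = node2vec_walk(adj, "a", 10, p=0.5, q=2.0, rng=random.Random(0))
```

In a full pipeline, many such walks per node would be passed to a skip-gram model to obtain the d-dimensional embeddings; that step is omitted here.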
Two classification tasks are examined: (1) road-category prediction (e.g., highway, residential) and (2) speed-limit prediction. The embeddings are fed to several classifiers (logistic regression, random forest, XGBoost) and evaluated with macro-averaged F1 scores. The baseline is a naïve “most-frequent-class” predictor. With appropriate hyper-parameter settings, classifiers trained on node2vec embeddings reach a macro-F1 of 0.57 for road-category prediction and 0.79 for speed-limit prediction, improvements of up to 8.3× and 11.5× over the baseline, respectively.
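To make the evaluation setup concrete, the following sketch computes the macro-averaged F1 score of a most-frequent-class predictor. The labels are made-up toy values, not data from the paper, and `macro_f1` is a hand-written helper used here only for illustration.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class that is never predicted correctly gets F1 = 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy speed-limit labels (hypothetical, imbalanced on purpose).
y_true = ["50"] * 6 + ["80"] * 3 + ["130"] * 1
majority = Counter(y_true).most_common(1)[0][0]
y_pred = [majority] * len(y_true)

baseline = macro_f1(y_true, y_pred)
print(baseline)  # 0.25: only the majority class has a non-zero F1
```

Because macro-F1 weights every class equally, a majority-class predictor scores poorly whenever there are several classes, which is one reason large multiplicative improvements over such a baseline are possible.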
Key findings include:
- Homophily vs. structural equivalence – Road networks exhibit low homophily (adjacent segments often have different labels) and contain disconnected sub-graphs (e.g., islands). Consequently, embeddings are not linearly separable, and linear classifiers underperform. Setting node2vec’s return parameter p low and its in-out parameter q high keeps each walk close to its start node (breadth-first-like sampling), which in node2vec’s formulation emphasizes structural equivalence rather than homophily. This leads to higher classification performance, suggesting that structural similarity is more informative than pure homophily for road-network tasks.
- Class‑specific homophily effects – Classes with higher intra‑class homophily (e.g., certain highway categories) achieve better F1 scores than low‑homophily classes, confirming that the embedding’s quality depends on the underlying label distribution in the graph.
- Hyper‑parameter sensitivity – Embedding dimension (d ≈ 128–256), number of walks per node (r ≈ 10), and walk length (l ≈ 80) provide a good trade‑off between representation richness and over‑fitting. Larger dimensions improve performance up to a point, after which gains plateau.
- Non‑linear classifiers outperform linear ones – Because the embedding space does not exhibit clear linear separability, tree‑based models (random forest, XGBoost) consistently outperform logistic regression, highlighting the need for non‑linear decision boundaries.
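The homophily findings above can be made tangible with a small sketch. It uses one common measure, the fraction of labelled adjacencies whose two endpoints share a label, both overall and per class. This is an assumption chosen for illustration and not necessarily the measure used in the paper; the segment identifiers and labels are hypothetical.

```python
def edge_homophily(edges, labels):
    """Fraction of labelled adjacencies whose endpoints share a label."""
    labelled = [(u, v) for u, v in edges if u in labels and v in labels]
    if not labelled:
        return float("nan")
    same = sum(labels[u] == labels[v] for u, v in labelled)
    return same / len(labelled)


def class_homophily(edges, labels):
    """Per class: share of edge endpoints of that class whose
    neighbour across the edge has the same class."""
    totals, same = {}, {}
    for u, v in edges:
        if u not in labels or v not in labels:
            continue  # skip adjacencies with a missing label
        match = int(labels[u] == labels[v])
        for node in (u, v):
            c = labels[node]
            totals[c] = totals.get(c, 0) + 1
            same[c] = same.get(c, 0) + match
    return {c: same[c] / totals[c] for c in totals}


# Hypothetical chain of five adjacent road segments.
edges = [("s1", "s2"), ("s2", "s3"), ("s3", "s4"), ("s4", "s5")]
labels = {
    "s1": "motorway",
    "s2": "motorway",
    "s3": "motorway",
    "s4": "residential",
    "s5": "tertiary",
}

overall = edge_homophily(edges, labels)
per_class = class_homophily(edges, labels)
print(overall)    # 0.5
print(per_class)  # {'motorway': 0.8, 'residential': 0.0, 'tertiary': 0.0}
```

In this toy example the motorway class is far more homophilous than the others, mirroring the kind of class-level differences the summary describes.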
The study also situates node2vec among road‑specific embedding approaches such as Road2Vec (which relies on GPS trajectories) and attribute‑augmented random‑walk methods, noting that node2vec’s reliance solely on topology makes it attractive when auxiliary data are unavailable. However, the authors acknowledge that incorporating traffic or trajectory data could further boost performance.
In summary, the paper demonstrates that (i) network embedding is feasible for road‑network analysis despite structural differences from social networks, (ii) careful tuning of node2vec’s parameters to favour structural equivalence yields substantial gains, and (iii) downstream models should be non‑linear to compensate for the lack of linear separability in the learned embeddings. The authors suggest future work on multi‑modal embeddings, scalability to larger national networks, and transferability of the findings to other embedding algorithms.