Layered Label Propagation: A MultiResolution Coordinate-Free Ordering for Compressing Social Networks

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original paper on arXiv.

We continue the line of research on graph compression started with WebGraph, but we move our focus to the compression of social networks in a proper sense (e.g., LiveJournal): the approaches that have been used for a long time to compress web graphs rely on a specific ordering of the nodes (lexicographical URL ordering) whose extension to general social networks is not trivial. In this paper, we propose a solution that mixes clusterings and orders, and devise a new algorithm, called Layered Label Propagation, that builds on previous work on scalable clustering and can be used to reorder very large graphs (billions of nodes). Our implementation uses overdecomposition to perform aggressively on multi-core architectures, making it possible to reorder graphs of more than 600 million nodes in a few hours. Experiments performed on a wide array of web graphs and social networks show that combining the order produced by the proposed algorithm with the WebGraph compression framework provides a major increase in compression with respect to all currently known techniques, both on web graphs and on social networks. These improvements make it possible to analyse in main memory significantly larger graphs.


💡 Research Summary

The paper tackles a fundamental obstacle in extending WebGraph‑style compression to general social networks: the lack of a natural node ordering. In Web graphs the lexicographic ordering of URLs yields highly localized adjacency lists, which can be efficiently encoded with Gap, Reference, and Interval coding. Social networks, however, have no such intrinsic order, and naïve orderings (BFS, DFS, degree‑based) produce much poorer compression.
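To show why locality matters, here is a minimal Python sketch that assumes a simplified encoder. Each successor list is stored as gaps, and every gap is written with an Elias gamma code, whose length grows with the logarithm of the value. WebGraph's real format is richer (ζ codes, reference copying, intervals), and the helper names below are hypothetical, so the bit counts only illustrate the trend.

```python
def elias_gamma_bits(n: int) -> int:
    """Length in bits of the Elias gamma code of a positive integer n."""
    if n < 1:
        raise ValueError("Elias gamma encodes positive integers only")
    return 2 * n.bit_length() - 1


def zigzag(v: int) -> int:
    """Map an integer to a natural number: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return 2 * v if v >= 0 else -2 * v - 1


def gap_encode(node: int, successors: list[int]) -> list[int]:
    """Turn a successor list into natural-number gaps.

    The first successor is stored relative to the source node (it may be
    smaller, hence zigzag); the others as (difference - 1), since the sorted
    successors are distinct.
    """
    succ = sorted(set(successors))
    if not succ:
        return []
    return [zigzag(succ[0] - node)] + [b - a - 1 for a, b in zip(succ, succ[1:])]


def graph_cost_bits(adj: dict[int, list[int]]) -> int:
    """Total bits when every gap g is written as the Elias gamma code of g + 1."""
    return sum(elias_gamma_bits(g + 1)
               for x, succ in adj.items()
               for g in gap_encode(x, succ))


def relabel(adj: dict[int, list[int]], perm: dict[int, int]) -> dict[int, list[int]]:
    """Apply a node renumbering, perm[old_id] = new_id."""
    return {perm[x]: [perm[y] for y in succ] for x, succ in adj.items()}


# Two 4-cliques whose members have interleaved identifiers.
groups = [[0, 2, 4, 6], [1, 3, 5, 7]]
adj = {x: [y for y in g if y != x] for g in groups for x in g}
# Renumber so that each clique gets consecutive identifiers.
perm = {old: new for new, old in enumerate(groups[0] + groups[1])}

print(graph_cost_bits(adj))                 # 104
print(graph_cost_bits(relabel(adj, perm)))  # 56
```

Relabelling the same graph so that each clique occupies consecutive identifiers roughly halves the cost. A good ordering aims for this effect at scale.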

To solve this, the authors introduce Layered Label Propagation (L‑LP), a multi‑resolution reordering algorithm built on the Absolute Potts Model (APM), a variant of classic Label Propagation (LP). Standard LP repeatedly lets each vertex adopt the label most frequent among its neighbours until labels stabilise, but it yields a single partition at an uncontrolled scale. APM adds a resolution parameter γ: a vertex x picks the label λ that maximises k_λ − γ(v_λ − k_λ), where k_λ is the number of neighbours of x labelled λ and v_λ is the number of vertices in the whole graph labelled λ. Larger values of γ therefore produce smaller, more numerous clusters, and γ = 0 gives back plain LP. L‑LP runs APM several times with different values of γ, one per layer. After each run the current order is refined: vertices sharing a label are made contiguous, and inside each cluster they keep the relative order inherited from the previous layer. The final order therefore blends clusterings at several resolutions. Because it uses only the graph structure, it is coordinate‑free: no external information such as URLs is needed.
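The update rule and the layered order refinement can be sketched in plain Python. This is a sequential, small‑scale illustration, not the authors' implementation, and it rests on several assumptions. Vertices adopt only labels already present among their neighbours. v_λ is counted without the moving vertex. Clusters are placed by the position of their first member in the current order. The γ values in the example are arbitrary, and the function names are hypothetical.

```python
import random
from collections import Counter


def apm(adj, gamma, rng, max_sweeps=100):
    """Sequential Absolute Potts Model label propagation (illustrative sketch).

    A vertex x moves to the neighbour label l maximising
    k_l - gamma * (v_l - k_l), where k_l counts neighbours of x labelled l and
    v_l is the size of cluster l without x. gamma = 0 is plain LP.
    """
    label = {x: x for x in adj}        # every vertex starts in its own cluster
    volume = Counter(label.values())   # v_l: current size of each cluster
    nodes = list(adj)
    for _ in range(max_sweeps):
        rng.shuffle(nodes)             # random visiting order, as in standard LP
        changed = False
        for x in nodes:
            k = Counter(label[y] for y in adj[x])
            if not k:
                continue               # isolated vertex keeps its label
            cur = label[x]
            volume[cur] -= 1           # do not count x in its own cluster

            def score(l):
                return k[l] - gamma * (volume[l] - k[l])

            best = cur
            for l in k:                # candidates: labels seen among neighbours
                if score(l) > score(best):  # strict: earlier candidate wins ties
                    best = l
            volume[best] += 1
            if best != cur:
                label[x] = best
                changed = True
        if not changed:                # no vertex moved: labels are stable
            break
    return label


def refine(order, label):
    """Make each cluster contiguous, keeping the previous order inside it.

    Assumption of this sketch: clusters are placed by the position of their
    first member in the current order. sorted() is stable, so members of a
    cluster keep their relative order from the previous layer.
    """
    first = {}
    for pos, x in enumerate(order):
        first.setdefault(label[x], pos)
    return sorted(order, key=lambda x: first[label[x]])


def llp(adj, gammas, seed=0):
    """One APM run per gamma value; each run refines the current order."""
    rng = random.Random(seed)
    order = sorted(adj)                # start from the natural (id) order
    for gamma in gammas:
        order = refine(order, apm(adj, gamma, rng))
    return order


# Two 4-cliques with interleaved ids, joined by the single edge 0-1.
A, B = [0, 2, 4, 6], [1, 3, 5, 7]
adj = {x: [y for y in g if y != x] for g in (A, B) for x in g}
adj[0].append(1)
adj[1].append(0)

order = llp(adj, gammas=[1.0, 0.0])
print(order)  # [0, 2, 4, 6, 1, 3, 5, 7]
```

With γ = 1 the two cliques become separate clusters and are made contiguous. The following γ = 0 layer either keeps that split or merges everything into one cluster. In both cases the stable sort leaves the order unchanged.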

Scalability is achieved through overdecomposition: the work of each label‑propagation sweep is split into many more small tasks (chunks of vertices) than there are cores, and each thread picks up the next unprocessed chunk as soon as it finishes the previous one. Because tasks are small and handed out on demand, uneven per‑vertex costs, such as a few very high‑degree vertices, do not leave cores idle. The authors report that a graph with more than 600 million vertices can be reordered in a few hours, while a 100 million‑vertex graph finishes in under an hour.

Once the ordering is produced, the standard WebGraph compression framework is applied unchanged. Because L‑LP places vertices that belong to the same community next to each other, their neighbours receive nearby identifiers. Gaps between consecutive successors shrink. Vertices that are close in the order also tend to have overlapping successor lists, so more of each list can be copied from the reference list of a nearby vertex. Empirical evaluation on a diverse set of datasets, including LiveJournal, Orkut, Twitter, and several large web crawls, shows compression gains of 30% to 45% over the best previously known orderings. In many cases the compressed size drops below 2.5 bits per edge, which lets the entire graph reside in main memory for downstream analytics.

Performance metrics beyond compression are also presented. Each label‑propagation sweep takes O(m) time (m = number of edges). With a small, bounded number of sweeps per layer and of layers, the whole reordering therefore remains essentially linear in m. Parallel scaling experiments show a 1.8× speed‑up when the number of cores is doubled, i.e., about 90% parallel efficiency.

The main limitation noted is that extremely sparse or irregular graphs may have ambiguous community boundaries, which reduces the quality of the resulting order.
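The overdecomposition scheme from the scalability discussion can be sketched with Python's standard `concurrent.futures` thread pool. A sweep is cut into many more chunks than workers, and the pool hands each idle thread the next chunk. The sketch makes several assumptions. It performs a synchronous sweep, in which every vertex reads a snapshot of the old labels, so the result is deterministic. It uses plain LP with ties broken by the smallest label. All names are hypothetical. In CPython the global interpreter lock means this shows the task structure rather than a real speed‑up.

```python
import os
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def chunks(items, size):
    """Split a list into consecutive chunks of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def lp_step(adj, label, chunk):
    """New labels for the vertices of one chunk, read from the old labels."""
    out = {}
    for x in chunk:
        counts = Counter(label[y] for y in adj[x])
        # most frequent neighbour label; ties broken by the smallest label
        out[x] = min(counts, key=lambda l: (-counts[l], l)) if counts else label[x]
    return out


def parallel_sweep(adj, label, workers=None, tasks_per_worker=16):
    """One synchronous LP sweep, overdecomposed into many small tasks."""
    workers = workers or os.cpu_count() or 1
    nodes = list(adj)
    # Aim for roughly `tasks_per_worker` chunks per thread: the pool hands the
    # next chunk to whichever thread is free, so a chunk that happens to hold
    # high-degree vertices does not leave the other threads idle.
    size = max(1, len(nodes) // (workers * tasks_per_worker))
    new_label = dict(label)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for part in pool.map(lambda c: lp_step(adj, label, c), chunks(nodes, size)):
            new_label.update(part)
    return new_label


adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}
label = {x: x for x in adj}
print(parallel_sweep(adj, label, workers=2, tasks_per_worker=2))
# {0: 1, 1: 0, 2: 0, 3: 2, 4: 3}
```

Because each chunk writes into its own dictionary and the results are merged afterwards, the outcome does not depend on which thread processed which chunk.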

In conclusion, Layered Label Propagation provides a practical, scalable, and highly effective method for generating node orderings that substantially improve WebGraph‑style compression on both web graphs and social networks. Because the compressed graphs are much smaller, far larger graphs can be kept in RAM. This lowers I/O costs and widens the scope of graph‑based machine learning, community detection, and real‑time network analysis.

