Fast and Scalable Analysis of Massive Social Graphs
Graph analysis is a critical component of applications such as online social networks, protein interactions in biological networks, and Internet traffic analysis. The arrival of massive graphs with hundreds of millions of nodes, e.g. social graphs, presents a unique challenge to graph analysis applications. Most of these applications rely on computing distances between node pairs, which for large graphs can take minutes to compute using traditional algorithms such as breadth-first-search (BFS). In this paper, we study ways to enable scalable graph processing on today’s massive graphs. We explore the design space of graph coordinate systems, a new approach that accurately approximates node distances in constant time by embedding graphs into coordinate spaces. We show that a hyperbolic embedding produces relatively low distortion error, and propose Rigel, a hyperbolic graph coordinate system that lends itself to efficient parallelization across a compute cluster. Rigel produces significantly more accurate results than prior systems, and is naturally parallelizable across compute clusters, allowing it to provide accurate results for graphs up to 43 million nodes. Finally, we show that Rigel’s functionality can be easily extended to locate (near-) shortest paths between node pairs. After a one-time preprocessing cost, Rigel answers node-distance queries in 10’s of microseconds, and also produces shortest path results up to 18 times faster than prior shortest-path systems with similar levels of accuracy.
💡 Research Summary
The paper addresses the fundamental scalability problem of computing shortest‑path distances on massive social graphs that contain tens to hundreds of millions of nodes. Traditional algorithms such as breadth‑first‑search (BFS), Dijkstra, or Floyd‑Warshall become prohibitively slow, often taking minutes for a single pairwise query on graphs the size of Facebook or Twitter. To overcome this bottleneck, the authors build upon the concept of Graph Coordinate Systems (GCS), which embed each node of a graph as a point in a geometric space so that the geometric distance between two points approximates the true shortest‑path distance between the corresponding nodes. Their earlier system, Orion, used Euclidean coordinates but suffered from 15–20% average relative error and relied on a centralized embedding phase that did not scale beyond a few million nodes.
The new system, named Rigel, makes two decisive advances. First, it adopts a hyperbolic embedding, specifically the hyperboloid model, because hyperbolic geometry naturally captures the hierarchical, core‑dense structure of social networks and yields substantially lower distortion. The curvature parameter c (≤ 0) can be tuned; when c = 0 the space collapses to Euclidean, while more negative values increase hyperbolic curvature and improve fit. Empirical tests on seven real‑world social graphs (including Facebook regional networks, Flickr, Orkut, LiveJournal, and a 43‑million‑node Renren snapshot) show that hyperbolic embeddings achieve average relative error (ARE) below 0.10, average absolute error (AAE) below 0.50, and superior contraction/expansion ratios compared with Euclidean and spherical alternatives.
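To make the hyperboloid model concrete, the following is a minimal sketch of the textbook distance between two points on the unit hyperboloid. It fixes the curvature at −1 instead of exposing the parameter c, because the summary does not state how c enters the formula; the function name and the choice to store only the spatial coordinates are assumptions made for illustration, not details taken from Rigel.

```python
import math


def hyperboloid_distance(x, y):
    """Distance between two points on the unit hyperboloid (curvature -1).

    x and y hold only the spatial coordinates. The remaining coordinate,
    sqrt(1 + |x|^2), is implied by the hyperboloid constraint.
    """
    if len(x) != len(y):
        raise ValueError("coordinate vectors must have the same dimension")
    x0 = math.sqrt(1.0 + sum(v * v for v in x))
    y0 = math.sqrt(1.0 + sum(v * v for v in y))
    dot = sum(a * b for a, b in zip(x, y))
    # The argument is >= 1 in exact arithmetic; clamp to absorb rounding error.
    return math.acosh(max(1.0, x0 * y0 - dot))


# In one dimension the points sinh(a) and sinh(b) are |a - b| apart.
d = hyperboloid_distance([math.sinh(0.5)], [math.sinh(2.0)])
```

The cost of one evaluation depends only on the embedding dimension, not on the size of the graph, which is what makes a coordinate lookup constant time with respect to the number of nodes.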
Second, Rigel’s embedding pipeline is explicitly designed for parallel execution on a compute cluster. The process begins by selecting a set of anchor nodes and measuring exact shortest‑path distances among them. These anchor distances are used to compute initial coordinates for all nodes. The graph is then partitioned across multiple servers; each server runs a scalable gradient‑descent optimization on its subgraph, periodically exchanging coordinate updates to maintain global consistency. This distributed approach reduces the overall embedding time from hours (in a centralized setting) to tens of minutes even for the 43‑million‑node Renren graph, while preserving linear‑time (O(n)) computational complexity.
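The anchor-based step can be illustrated with a small sketch: run BFS from each anchor to obtain exact hop distances, then place a node so that its embedded distances to the anchors match those hop counts. Everything here is a simplification under stated assumptions. The anchors are taken as already embedded, the curvature is fixed at −1, the optimizer is a plain numerical gradient descent with step halving standing in for whatever optimizer Rigel uses, and the exchange of coordinate updates between servers is not modeled. All function names are hypothetical.

```python
import math
from collections import deque


def bfs_distances(adj, source):
    """Exact hop counts from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def hyperboloid_distance(x, y):
    """Unit-curvature hyperboloid distance between spatial coordinate lists."""
    x0 = math.sqrt(1.0 + sum(v * v for v in x))
    y0 = math.sqrt(1.0 + sum(v * v for v in y))
    dot = sum(a * b for a, b in zip(x, y))
    return math.acosh(max(1.0, x0 * y0 - dot))


def fit_error(point, anchor_coords, targets):
    """Sum of squared gaps between embedded and true anchor distances."""
    return sum((hyperboloid_distance(point, a) - t) ** 2
               for a, t in zip(anchor_coords, targets))


def fit_node(anchor_coords, targets, start, steps=100, h=1e-6):
    """Numerical gradient descent; a step is kept only if it lowers the error."""
    point = list(start)
    err = fit_error(point, anchor_coords, targets)
    for _ in range(steps):
        grad = []
        for i in range(len(point)):
            bumped = point[:i] + [point[i] + h] + point[i + 1:]
            grad.append((fit_error(bumped, anchor_coords, targets) - err) / h)
        step = 1.0
        while step > 1e-8:
            trial = [p - step * g for p, g in zip(point, grad)]
            trial_err = fit_error(trial, anchor_coords, targets)
            if trial_err < err:
                point, err = trial, trial_err
                break
            step /= 2
        else:
            break  # no improving step found; stop early
    return point, err


# Toy path graph 0-1-2-3-4 with anchors 0 and 4 (hypothetical data).
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
anchors = [0, 4]
anchor_coords = [[math.sinh(0.0)], [math.sinh(4.0)]]  # assumed pre-embedded
from_anchor = [bfs_distances(adj, a) for a in anchors]
targets = [dist[2] for dist in from_anchor]  # node 2's true anchor distances
start = [0.5]
initial_err = fit_error(start, anchor_coords, targets)
point, final_err = fit_node(anchor_coords, targets, start)
```

In this simplified form each node is fitted against the anchors alone, so the fits for different nodes do not depend on one another and could be handed to different workers.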
After the one‑time embedding, Rigel can answer distance queries in constant time—typically 10–30 µs—because it merely computes a hyperbolic distance formula between two stored coordinate vectors. Moreover, the authors extend the coordinate framework to approximate shortest‑path reconstruction. By drawing the hyperbolic geodesic between two node coordinates and mapping points along this geodesic back to actual graph nodes (using a limited BFS for verification), Rigel produces a candidate path whose length deviates by less than 5 % from the true shortest path. This path‑reconstruction method is up to 18× faster than prior approximate‑path systems while delivering comparable accuracy.
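As a rough illustration of how a distance estimate can guide path search, the sketch below walks greedily from the source, always moving to the unvisited neighbor whose estimated distance to the target is smallest. This is a deliberately simple stand-in, not the geodesic-mapping procedure described above: the function `greedy_path`, the `estimate` callback, and the toy positions are all hypothetical, and a greedy walk can dead-end or return a longer path than the true shortest one.

```python
def greedy_path(adj, estimate, source, target, max_hops=50):
    """Walk from source toward target, guided by estimate(u, v).

    Returns the list of visited nodes, or None if the walk dead-ends or the
    path grows past max_hops nodes.
    """
    path = [source]
    visited = {source}
    current = source
    while current != target:
        if len(path) > max_hops:
            return None
        if target in adj[current]:
            nxt = target  # the target is one hop away; finish
        else:
            candidates = [v for v in adj[current] if v not in visited]
            if not candidates:
                return None
            nxt = min(candidates, key=lambda v: estimate(v, target))
        visited.add(nxt)
        path.append(nxt)
        current = nxt
    return path


# Toy graph with one-dimensional positions standing in for embedded
# coordinates (hypothetical data).
adj = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b", "e"], "e": ["d"]}
pos = {"a": 0.0, "b": 1.0, "c": -1.0, "d": 2.0, "e": 3.0}


def estimate(u, v):
    return abs(pos[u] - pos[v])


path = greedy_path(adj, estimate, "a", "e")
```

Each hop costs one estimate per neighbor, so the work is bounded by the degrees of the nodes on the path instead of the size of the BFS frontier.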
The evaluation includes a thorough comparison of five embedding spaces (Euclidean, spherical, and three hyperbolic models) using six distortion metrics (ARE, AAE, AER, ACR, ASPD, SD). The hyperboloid model consistently outperforms the others, justifying its selection. Rigel’s scalability is demonstrated on a cluster of commodity machines (16 GB RAM, 8‑core CPUs per node); network traffic during embedding is modest because only anchor distances and coordinate deltas are exchanged.
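Two of the metrics named above can be computed from sampled node pairs as follows. The sketch assumes the usual definitions, which the summary does not spell out: absolute error is |estimate − true| and relative error divides that by the true distance. The function name and the sample numbers are made up for illustration.

```python
def average_errors(pairs):
    """Return (ARE, AAE) for an iterable of (true_distance, estimate) pairs.

    Every true distance must be positive, so pairs of identical nodes
    should be excluded from the sample.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one pair")
    abs_errs = [abs(est - true) for true, est in pairs]
    rel_errs = [abs(est - true) / true for true, est in pairs]
    n = len(pairs)
    return sum(rel_errs) / n, sum(abs_errs) / n


# Hypothetical sample: true hop counts paired with embedded estimates.
are, aae = average_errors([(2, 2.2), (4, 3.0)])
```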
Finally, the paper discusses practical applications such as graph separation metrics, centrality computation, and distance‑ranked social search, showing that Rigel can serve as a backend for a wide range of analytics that previously required costly BFS traversals. Limitations include the upfront preprocessing cost and the need for re‑embedding when the graph changes dramatically; the authors identify dynamic coordinate updates and automatic curvature tuning as future work. In summary, Rigel delivers an accurate, scalable, and fast solution for distance and approximate path queries on today’s largest social graphs, opening the door to real‑time graph‑driven services and research.