Faster Random Walks By Rewiring Online Social Networks On-The-Fly

Many online social networks feature restrictive web interfaces that only allow a user’s local neighborhood to be queried. To enable analytics over such an online social network through its restrictive web interface, many recent efforts reuse existing Markov Chain Monte Carlo methods such as random walks to sample the social network and support analytics based on the samples. The problem with such an approach, however, is the large number of queries often required (i.e., a long “mixing time”) for a random walk to reach a desired (stationary) sampling distribution. In this paper, we consider a novel problem of enabling a faster random walk over online social networks by “rewiring” the social network on-the-fly. Specifically, we develop Modified TOpology (MTO)-Sampler which, by using only information exposed by the restrictive web interface, constructs a “virtual” overlay topology of the social network while performing a random walk, and ensures that the random walk follows the modified overlay topology rather than the original one. We show that MTO-Sampler not only provably enhances the efficiency of sampling, but also achieves significant savings on query cost over real-world online social networks such as Google Plus, Epinion, etc.

💡 Research Summary

The paper tackles the problem of efficiently sampling large online social networks (OSNs) that expose only a restrictive web interface, typically one that limits each query to a single user’s immediate friends. Traditional Markov Chain Monte Carlo (MCMC) techniques such as simple random walks or Metropolis‑Hastings require a large number of queries to approach the stationary distribution because the mixing time on the original graph can be prohibitively long. To overcome this bottleneck, the authors introduce a novel “on‑the‑fly rewiring” approach embodied in the Modified TOpology (MTO)‑Sampler.
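The two baseline walks mentioned above can be sketched as follows. This is a minimal illustration rather than code from the paper: the `RestrictedInterface` class, its `neighbors` method, and the `query_cost` counter are hypothetical stand-ins for a real OSN interface, and the sketch assumes that repeated lookups of the same user are cached so that only distinct users count toward query cost.

```python
import random


class RestrictedInterface:
    """Hypothetical stand-in for an OSN web interface.

    The only supported operation is looking up one user's neighbor list.
    """

    def __init__(self, adjacency):
        self._adjacency = adjacency
        self._queried = set()

    def neighbors(self, user):
        # Assumption: repeated lookups are served from a local cache,
        # so only distinct users contribute to the query cost.
        self._queried.add(user)
        return self._adjacency[user]

    @property
    def query_cost(self):
        return len(self._queried)


def simple_random_walk(api, start, steps, rng):
    """Move to a uniformly chosen neighbor at every step.

    The stationary distribution is proportional to node degree.
    """
    node, path = start, [start]
    for _ in range(steps):
        node = rng.choice(api.neighbors(node))
        path.append(node)
    return path


def metropolis_hastings_walk(api, start, steps, rng):
    """Metropolis-Hastings walk targeting the uniform distribution.

    A proposed move u -> v is accepted with probability min(1, deg(u)/deg(v));
    otherwise the walk stays at u for this step.
    """
    node, path = start, [start]
    for _ in range(steps):
        current_neighbors = api.neighbors(node)
        candidate = rng.choice(current_neighbors)
        accept = min(1.0, len(current_neighbors) / len(api.neighbors(candidate)))
        if rng.random() < accept:
            node = candidate
        path.append(node)
    return path


if __name__ == "__main__":
    toy_graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
    api = RestrictedInterface(toy_graph)
    walk = simple_random_walk(api, start=0, steps=20, rng=random.Random(7))
    print("visited:", walk)
    print("distinct users queried:", api.query_cost)
```

Both walks pay one lookup per newly visited user, which is why a long mixing time translates directly into a high query cost.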

MTO‑Sampler does not alter the underlying social graph; instead, while a random walk proceeds, it constructs a virtual overlay topology G′ that is a modified version of the original graph G. The overlay is built using only information obtainable through the limited API (the current node’s neighbor list). At each step the algorithm identifies low‑frequency neighbor nodes and inserts virtual “shortcut” edges from the current node to these neighbors, while simultaneously reducing transition probabilities toward high‑frequency nodes. The transition matrix is renormalized so that the walk on G′ remains a valid Markov chain. Crucially, the authors prove that the stationary distribution of the walk on G′ is exactly the target distribution (uniform or any prescribed bias), meaning that the rewiring does not introduce sampling bias.
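The renormalization and the resulting stationary distribution can be checked numerically on a toy overlay. The sketch below is not the paper's rewiring rule: the six-node edge list is a hypothetical overlay chosen for illustration, and the helper names are assumptions. It relies only on a standard fact, namely that a simple random walk on a connected undirected graph has stationary probability proportional to degree, so dividing by the overlay degree recovers a uniform target.

```python
def adjacency(n, edges):
    """Build a symmetric 0/1 adjacency matrix from an undirected edge list."""
    adj = [[0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1
    return adj


def transition_matrix(adj):
    """Renormalize every row to sum to 1, giving a valid Markov chain."""
    return [[a / sum(row) for a in row] for row in adj]


def degree_distribution(adj):
    """Stationary distribution of the simple walk: pi(v) = deg(v) / (2|E|)."""
    degrees = [sum(row) for row in adj]
    total = sum(degrees)
    return [d / total for d in degrees]


def one_step(dist, P):
    """Advance a probability vector by one step of the chain."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


# Hypothetical overlay G' (illustrative only, not taken from the paper).
OVERLAY_EDGES = [(0, 2), (1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5)]

overlay = adjacency(6, OVERLAY_EDGES)
P = transition_matrix(overlay)
pi = degree_distribution(overlay)

# Stationarity check: one more step leaves pi unchanged.
pi_next = one_step(pi, P)
max_drift = max(abs(a - b) for a, b in zip(pi, pi_next))

# Reweighting each node by 1 / overlay degree turns pi into a uniform target.
overlay_degrees = [sum(row) for row in overlay]
unnormalized = [p / d for p, d in zip(pi, overlay_degrees)]
uniform = [w / sum(unnormalized) for w in unnormalized]

print("max drift after one step:", max_drift)
print("reweighted distribution:", uniform)
```

The point of the check is that the sampler only needs the degree of each node in the overlay, not in the original graph, to correct its samples.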

Theoretical analysis shows that the virtual shortcuts increase the conductance of the overlay graph. By Cheeger’s inequality, higher conductance implies a larger spectral gap, and a larger spectral gap means a shorter mixing time. The authors derive a lower bound on the conductance improvement, guaranteeing that even in worst‑case scenarios MTO‑Sampler mixes faster than a plain random walk by a factor of O(√|V|).
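To make the conductance argument concrete, the following sketch computes conductance by brute force on two toy graphs. Both edge lists are hypothetical: the first is two triangles joined by a single bridge edge, and the second is an illustrative rewired variant, not an overlay produced by MTO‑Sampler. Conductance is taken in its usual form for a simple random walk, the minimum of cut(S)/vol(S) over node sets S holding at most half of the total degree.

```python
from itertools import combinations


def conductance(n, edges):
    """Brute-force conductance of a small connected undirected graph.

    Enumerates every node subset, so it is only usable on toy graphs.
    """
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    total = sum(degree)

    best = float("inf")
    for size in range(1, n):
        for subset in combinations(range(n), size):
            members = set(subset)
            volume = sum(degree[v] for v in members)
            if 2 * volume > total:
                continue  # only sets with stationary mass <= 1/2
            cut = sum(1 for u, v in edges if (u in members) != (v in members))
            best = min(best, cut / volume)
    return best


# Two triangles {0,1,2} and {3,4,5} joined by the single bridge 2-3.
ORIGINAL_EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]

# Hypothetical rewiring: drop the intra-community edge 0-1 and
# add the cross-community edge 1-3.
REWIRED_EDGES = [(0, 2), (1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5)]

phi_original = conductance(6, ORIGINAL_EDGES)
phi_rewired = conductance(6, REWIRED_EDGES)

print("conductance, original:", phi_original)
print("conductance, rewired: ", phi_rewired)
```

On this toy pair the bottleneck between the two triangles widens from one crossing edge to two, so the rewired graph has the higher conductance even though the total number of edges is unchanged.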

Empirical evaluation is performed on two real‑world OSNs: Google Plus (≈1.2 M nodes) and Epinion (≈0.5 M nodes). The experiments compare MTO‑Sampler against standard random walks, Metropolis‑Hastings, and recent frontier‑sampling techniques. Results indicate that, for the same sampling quality (e.g., accurate degree distribution, clustering coefficient, community structure), MTO‑Sampler reduces the number of API queries by 45 %–60 % and cuts estimated mixing time by more than a factor of two. The virtual shortcuts are especially effective in regions with strong community bottlenecks, where they keep the walk from staying trapped inside a single tightly knit community and ensure broader coverage of the network.

Limitations are acknowledged: the method requires at least one‑hop neighbor information, and excessive shortcut insertion can increase memory overhead for the transition matrix. The paper suggests future work on dynamic cost modeling, multi‑walker collaborative rewiring to further boost global conductance, and extensions to non‑simple graph structures such as bipartite or hypergraph representations.

In summary, the study presents a practical, provably unbiased technique for accelerating random‑walk‑based sampling on OSNs with restrictive interfaces, offering substantial query‑cost savings and opening new avenues for scalable social network analytics.