Link Prediction Based on Local Random Walk
The problem of missing link prediction in complex networks has attracted much attention recently. Two difficulties in link prediction are the sparsity and huge size of the target networks. Therefore, the design of an efficient and effective method is of both theoretical interest and practical significance. In this Letter, we propose a method based on local random walk, which gives predictions competitive with, or even better than, other random-walk-based methods while having lower computational complexity.
💡 Research Summary
Link prediction, the task of estimating missing or future connections in complex networks, has become a central problem in fields ranging from social media analysis to biological interaction mapping. While many algorithms have been proposed, they generally fall into two categories: local similarity measures (e.g., Common Neighbors, Jaccard, Adamic/Adar, Resource Allocation) that are computationally cheap but limited in expressive power, and global methods (e.g., Katz index, matrix factorization, Random Walk with Restart, SimRank) that capture long‑range structural information at the cost of high time and memory consumption, often scaling as O(N²) or worse. This dichotomy poses a serious challenge for modern networks that can contain millions of nodes and exhibit extreme sparsity.
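To make the "local similarity" category concrete, here is a minimal sketch of two of the baselines named above, Common Neighbors and Resource Allocation, on a simple neighbour-set representation. This is my own illustration, not code from the paper; the `nbrs` dict and the toy graph are assumptions for demonstration.

```python
# Two local similarity indices on a dict mapping node -> set of neighbours.
def common_neighbors(nbrs, i, j):
    """Number of neighbours shared by i and j."""
    return len(nbrs[i] & nbrs[j])

def resource_allocation(nbrs, i, j):
    """Each common neighbour z contributes 1/degree(z)."""
    return sum(1.0 / len(nbrs[z]) for z in nbrs[i] & nbrs[j])

# Toy 4-node graph: 0-1, 0-2, 1-2, 1-3, 2-3
nbrs = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2}}
cn = common_neighbors(nbrs, 0, 3)     # nodes 1 and 2 are shared
ra = resource_allocation(nbrs, 0, 3)  # 1/3 + 1/3
```

Both indices look only at the immediate neighbourhoods of the endpoint pair, which is why they are cheap but blind to any structure more than two hops away.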
The paper “Link Prediction Based on Local Random Walk” introduces a novel approach called Local Random Walk (LRW) that aims to bridge the gap between accuracy and efficiency. The core idea is to perform a random walk of a small, fixed number of steps t (typically 2–3) starting from a source node i, and to record the probability distribution over all nodes after t steps. Formally, let A be the adjacency matrix and D the diagonal degree matrix. The one‑step transition matrix is P = D⁻¹A. The t‑step probability vector for node i is π_i = e_i P^t, where e_i is the unit vector with a 1 at position i. The similarity between nodes i and j is then defined as s_{ij} = π_i(j) + π_j(i), a symmetric combination that captures the likelihood of reaching each other within t steps from both directions.
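The definitions above translate directly into a few lines of NumPy. The sketch below follows the formulas as summarized in this article (P = D⁻¹A, π_i = e_i Pᵗ, s_ij = π_i(j) + π_j(i)); it is a dense toy implementation for clarity, not the authors' reference code, and it assumes the graph has no isolated nodes so every row of A has a nonzero degree.

```python
import numpy as np

def lrw_similarity(A, t=3):
    """All-pairs LRW similarity matrix for a small dense adjacency matrix A."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)                 # degree of each node (assumed > 0)
    P = A / deg[:, None]                # row-stochastic transition matrix D^-1 A
    Pi = np.linalg.matrix_power(P, t)   # Pi[i, j] = pi_i(j), the t-step probability
    return Pi + Pi.T                    # s_ij = pi_i(j) + pi_j(i), symmetric

# Toy 4-node path graph 0-1-2-3
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
S = lrw_similarity(A, t=2)
```

Note that S is symmetric by construction, so a single pass over node pairs suffices when ranking candidate links.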
Because the walk is confined to a few steps, the algorithm never needs to compute the full matrix power P^t. Instead, it iteratively multiplies the current probability vector by the sparse adjacency list, which costs O(k·N) per step (k is the average degree). Consequently, the total computational complexity is O(k·t·N) per source node and the memory footprint is O(N), a dramatic reduction compared with global random‑walk based methods that require O(N²) operations and storage.
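The sparse propagation described above can be sketched as follows: the walk vector is pushed along an adjacency list, so each step touches only existing edges rather than an N×N matrix. The `adj` dict-of-lists representation is a hypothetical choice for illustration, assuming an unweighted graph with no dead-end nodes.

```python
from collections import defaultdict

def lrw_vector(adj, source, t):
    """Probability distribution over nodes after a t-step walk from `source`.

    Only nodes with nonzero probability are stored, so a step costs time
    proportional to the edges actually touched, never O(N^2).
    """
    pi = {source: 1.0}
    for _ in range(t):
        nxt = defaultdict(float)
        for node, prob in pi.items():
            share = prob / len(adj[node])   # uniform split over neighbours
            for nb in adj[node]:
                nxt[nb] += share
        pi = dict(nxt)
    return pi

# Same 4-node path graph 0-1-2-3 as an adjacency list
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
pi0 = lrw_vector(adj, 0, 2)   # distribution after 2 steps from node 0
```

For small t the nonzero support of `pi` stays confined to the t-hop neighbourhood of the source, which is exactly the locality that keeps LRW cheap.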
To evaluate LRW, the authors conduct experiments on five real‑world networks of varying size and density: a US power‑grid network, an Internet Autonomous System (AS) topology, a protein‑protein interaction network (PPI), a co‑authorship network of scientific publications, and a fruit‑trade network. Each network is randomly split into 90 % training edges and 10 % test edges. Performance is measured using the Area Under the ROC Curve (AUC), Precision@L, and Recall@L, and LRW is compared against a suite of baselines: Common Neighbors, Adamic/Adar, Local Path, Random Walk with Restart (RWR), and SimRank.
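The AUC protocol used in this line of work can be summarized in code: repeatedly compare the score of a randomly chosen missing (test) link against a randomly chosen nonexistent link, and count the fraction of comparisons the missing link wins, with ties counted as 0.5. The sketch below is a generic version of that procedure, assuming a `score(i, j)` callable supplied by whichever similarity index is under evaluation; it is not taken from the paper.

```python
import random

def auc(score, test_edges, non_edges, n_samples=10000, seed=0):
    """Sampled AUC: P(score of a test link > score of a non-link), ties = 0.5."""
    rng = random.Random(seed)
    wins = 0.0
    for _ in range(n_samples):
        s_test = score(*rng.choice(test_edges))
        s_non = score(*rng.choice(non_edges))
        if s_test > s_non:
            wins += 1.0
        elif s_test == s_non:
            wins += 0.5
    return wins / n_samples
```

A random predictor scores 0.5 under this measure, so the 0.85–0.94 values reported for LRW indicate a strong separation between true missing links and non-links.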
Results show that LRW consistently achieves high AUC values (0.85–0.94) across all datasets. In particularly sparse graphs such as the power‑grid and AS networks, LRW outperforms traditional local measures by 5–10 % absolute AUC improvement, while matching the performance of RWR within a margin of 0–2 %. Importantly, LRW’s runtime is an order of magnitude faster: on the largest test network, LRW completes in roughly 0.12 seconds compared with 1.5 seconds for RWR, confirming the claimed computational advantage. The authors also explore the effect of varying t; increasing t from 2 to 4 yields modest gains in predictive accuracy but proportionally higher computational cost, indicating that a small t provides a good trade‑off for most practical scenarios.
The paper’s key insight is that a limited‑depth random walk can capture enough structural information to rival global methods while retaining the scalability of local heuristics. By focusing on the immediate neighbourhood extended by a few hops, LRW integrates both common‑neighbor effects and short‑path influences without the overhead of full‑graph diffusion. This makes LRW especially suitable for large‑scale, dynamic environments where frequent recomputation is required.
In the conclusion, the authors suggest several avenues for future work: (1) extending LRW to dynamic networks with incremental updates, (2) combining multiple walk lengths into a multi‑scale similarity measure, (3) integrating node attributes or side information into the probability updates to form hybrid models, and (4) adapting the method to weighted or directed graphs where transition probabilities are non‑uniform. Overall, the study provides a compelling argument that “local” does not have to mean “simple,” and that carefully designed short random walks can deliver both efficiency and high predictive power in the challenging domain of link prediction.