Fast matrix computations for pair-wise and column-wise commute times and Katz scores
We first explore methods for approximating the commute time and Katz score between a pair of nodes. These methods are based on the approach of matrices, moments, and quadrature developed in the numerical linear algebra community. They rely on the Lanczos process and provide upper and lower bounds on an estimate of the pair-wise scores. We also explore methods to approximate the commute times and Katz scores from a node to all other nodes in the graph. Here, our approach for the commute times is based on a variation of the conjugate gradient algorithm, and it provides an estimate of all the diagonals of the inverse of a matrix. Our technique for the Katz scores is based on exploiting an empirical localization property of the Katz matrix. We adapt algorithms used for personalized PageRank computation to these Katz scores and show theoretically that this approach converges. We evaluate these methods on 17 real-world graphs ranging in size from 1,000 to 1,000,000 nodes. Our results show that our pair-wise commute time method and column-wise Katz algorithm both have attractive theoretical properties and empirical performance.
💡 Research Summary
The paper addresses the computational bottleneck of evaluating two fundamental graph similarity measures, commute time and Katz score, on large networks. Direct computation requires inverting n × n matrices (I − αA for Katz and L + (1/n)eeᵀ for commute time). Although these matrices are sparse, their inverses are generally dense, so the direct approach costs O(n³) time and O(n²) memory, which is prohibitive for graphs with millions of nodes. The authors propose two families of algorithms that target (1) a single pair of nodes (pair‑wise problem) and (2) a single source node to all others (column‑wise problem), achieving near‑linear runtime and modest memory footprints while also providing rigorous error bounds.
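For reference, the dense baseline that the paper aims to avoid can be sketched in a few lines of NumPy. This is only an illustration for small graphs: the function names are hypothetical, and the convention that Katz scores sum over paths of length at least one (hence the subtraction of I) is an assumption not stated in the summary.

```python
import numpy as np

def dense_katz(A, alpha):
    """Katz scores from a dense inverse; needs alpha < 1 / spectral_radius(A)."""
    n = A.shape[0]
    # Assumed convention: paths of length >= 1 only, hence the "- I".
    return np.linalg.inv(np.eye(n) - alpha * A) - np.eye(n)

def dense_commute_time(A):
    """All-pairs commute times from the inverse of L + (1/n) e e^T."""
    n = A.shape[0]
    deg = A.sum(axis=1)
    L = np.diag(deg) - A                        # graph Laplacian
    M = np.linalg.inv(L + np.ones((n, n)) / n)  # dense n x n inverse
    d = np.diag(M)
    # C_ij = vol(G) * (M_ii + M_jj - 2 M_ij), with vol(G) = sum of degrees
    return deg.sum() * (d[:, None] + d[None, :] - 2.0 * M)

# Path graph 0 - 1 - 2
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
K = dense_katz(A, alpha=0.1)
C = dense_commute_time(A)
```

Both functions store and factor a full n × n matrix, which is exactly the O(n³) time and O(n²) memory cost described above.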
Pair‑wise algorithms
Both commute time and Katz score for a node pair (i, j) can be expressed as a bilinear form uᵀZ⁻¹v, where Z is a symmetric positive‑definite matrix (Z = I − αA for Katz, which requires α < 1/‖A‖₂; Z = L̃ = L + (1/n)eeᵀ for commute time) and u, v are simple vectors built from the standard basis (u = eᵢ, v = eⱼ for Katz; u = v = eᵢ − eⱼ for commute time). The authors exploit the Lanczos process to construct a small tridiagonal matrix Tₖ that captures the action of Z on the Krylov subspace generated by the starting vector. Using the Gauss‑type quadrature theory of Golub and Meurant, the scalar uᵀZ⁻¹v is approximated by a quadrature rule whose nodes and weights are derived from the eigenvalues and eigenvectors of Tₖ. Crucially, this yields provable upper and lower bounds after only a modest number of Lanczos steps (typically 10–30), requiring only matrix‑vector products with Z, i.e., sparse matrix‑vector multiplications. The method thus avoids any explicit matrix inversion and delivers tight deterministic intervals that bracket the exact score.
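A minimal sketch of the bounding idea follows, assuming a Gauss rule for the lower bound and a Gauss‑Radau rule (with one node fixed at a known lower bound on the spectrum of Z) for the upper bound, and a polarization identity for the case u ≠ v. The function names, the dense test graph, and the Gershgorin‑style bound `lam_lower` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def lanczos(matvec, u, k):
    """Up to k Lanczos steps from u; returns diagonal and off-diagonal coefficients."""
    alphas, betas = [], []
    q_prev, q, beta = np.zeros_like(u), u / np.linalg.norm(u), 0.0
    for _ in range(k):
        w = matvec(q) - beta * q_prev
        alpha = q @ w
        w = w - alpha * q
        beta = np.linalg.norm(w)
        alphas.append(alpha)
        betas.append(beta)
        if beta < 1e-12:          # invariant subspace found: T is exact
            break
        q_prev, q = q, w / beta
    return np.array(alphas), np.array(betas)

def quadratic_form_bounds(matvec, u, k, lam_lower):
    """Lower (Gauss) and upper (Gauss-Radau) bounds on u^T Z^{-1} u.

    lam_lower must lie strictly below the smallest eigenvalue of Z.
    """
    alphas, betas = lanczos(matvec, u, k)
    m = len(alphas)
    off = betas[:m - 1]
    T = np.diag(alphas) + np.diag(off, 1) + np.diag(off, -1)
    scale = u @ u
    gauss = scale * np.linalg.solve(T, np.eye(m)[:, 0])[0]
    if betas[-1] < 1e-12:
        return gauss, gauss
    # Gauss-Radau: extend T by one row/column so lam_lower becomes an eigenvalue.
    rhs = np.zeros(m)
    rhs[-1] = betas[-1] ** 2
    delta = np.linalg.solve(T - lam_lower * np.eye(m), rhs)
    T_hat = np.zeros((m + 1, m + 1))
    T_hat[:m, :m] = T
    T_hat[m - 1, m] = T_hat[m, m - 1] = betas[-1]
    T_hat[m, m] = lam_lower + delta[-1]
    radau = scale * np.linalg.solve(T_hat, np.eye(m + 1)[:, 0])[0]
    return gauss, radau

def bilinear_form_bounds(matvec, u, v, k, lam_lower):
    """Bounds on u^T Z^{-1} v via u^T Z^{-1} v = (q(u+v) - q(u-v)) / 4."""
    lo_p, hi_p = quadratic_form_bounds(matvec, u + v, k, lam_lower)
    lo_m, hi_m = quadratic_form_bounds(matvec, u - v, k, lam_lower)
    return 0.25 * (lo_p - hi_m), 0.25 * (hi_p - lo_m)

# Illustrative graph: a 10-node ring with two chords.
n, alpha = 10, 0.2
A = np.zeros((n, n))
for a, b in [(i, (i + 1) % n) for i in range(n)] + [(0, 5), (2, 7)]:
    A[a, b] = A[b, a] = 1.0
Z = np.eye(n) - alpha * A
lam_lower = 1.0 - alpha * A.sum(axis=1).max()   # Gershgorin-style bound
e = np.eye(n)
lo, hi = bilinear_form_bounds(lambda x: Z @ x, e[0], e[3], k=6, lam_lower=lam_lower)
```

Here `lo` and `hi` bracket the (0, 3) entry of (I − αA)⁻¹ using only products with Z. In a large‑graph setting `Z @ x` would be a sparse product, and the small tridiagonal systems would be solved with a dedicated tridiagonal routine rather than a dense solve.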
Column‑wise algorithms
For the column‑wise problem the goal is to compute either the i‑th column of the commute‑time matrix C or the i‑th column of the Katz matrix K.
Commute‑time column: The authors note that the j‑th entry of C eᵢ is vol(G)·(eᵢ − eⱼ)ᵀL̃⁻¹(eᵢ − eⱼ) = vol(G)·((L̃⁻¹)ᵢᵢ + (L̃⁻¹)ⱼⱼ − 2(L̃⁻¹)ᵢⱼ), so the full column requires both the solution L̃⁻¹eᵢ and the diagonal entries of L̃⁻¹. By leveraging the relationship between the Lanczos process and the conjugate‑gradient (CG) algorithm, they run a CG iteration for L̃⁻¹eᵢ while simultaneously extracting estimates of all diagonal elements from the tridiagonal matrices generated during the iteration (following Paige‑Saunders and recent extensions). This yields an estimate of the entire column in O(k·nnz(L̃)) time and O(n) additional memory.
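The idea of obtaining a solve and a diagonal estimate from a single CG run can be illustrated with a simplified sketch. It uses the fact that CG search directions pₖ are conjugate with respect to the system matrix, so the sum of pₖpₖᵀ/(pₖᵀL̃pₖ) is the inverse restricted to the Krylov subspace explored so far. This is an assumption about how such an estimate can be formed, not the exact recurrence used in the paper, and it only gives lower estimates of the diagonal entries. Function names are hypothetical.

```python
import numpy as np

def cg_with_diag(matvec, b, tol=1e-8, max_iter=1000):
    """Plain CG for a symmetric positive-definite system, accumulating a
    running estimate of the diagonal of the inverse."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    diag_est = np.zeros_like(b)
    rs = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs) < tol:
            break
        Ap = matvec(p)
        pAp = p @ Ap
        step = rs / pAp
        x += step * p
        diag_est += p * p / pAp      # diagonal of p p^T / (p^T A p)
        r -= step * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x, diag_est

def commute_time_column(A, i, tol=1e-8):
    """Estimate column i of the commute-time matrix for adjacency matrix A."""
    n = A.shape[0]
    deg = A.sum(axis=1)

    def matvec(x):
        # (L + (1/n) e e^T) x without forming the dense rank-one term
        return deg * x - A @ x + x.sum() / n

    e_i = np.zeros(n)
    e_i[i] = 1.0
    x, diag_est = cg_with_diag(matvec, e_i, tol=tol)
    # x[j] approximates entry (i, j) of the inverse; x[i] is entry (i, i).
    return deg.sum() * (x[i] + diag_est - 2.0 * x)
```

Each iteration costs one matrix‑vector product plus O(n) vector work, which matches the O(k·nnz(L̃)) time and O(n) extra memory quoted above. Because the diagonal estimate only covers the subspace CG has explored, the returned commute times are lower estimates that tighten as more iterations are run.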
Katz column: Empirical observations reveal that the solution vector x = (I − αA)⁻¹eᵢ is highly localized—only a small subset of entries have appreciable magnitude. The authors adapt push‑style algorithms originally designed for personalized PageRank. Starting from the residual vector, they “push” mass only to neighboring nodes whose residual exceeds a threshold, iteratively refining the approximation. The process can be interpreted as a coordinate‑descent method on the quadratic form (1/2)xᵀ(I − αA)x − eᵢᵀx, for which convergence is proven. Because the algorithm touches only a limited frontier of vertices, it runs in time proportional to the number of visited nodes, often orders of magnitude smaller than the full graph size.
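The push procedure described above can be sketched as follows. This is a simplified queue‑based variant written for illustration; the function name, the adjacency‑list input, and the stopping threshold `eps` are assumptions, and the sketch assumes α·d_max < 1 (d_max being the maximum degree), which is a simple sufficient condition for termination rather than the condition proven in the paper.

```python
from collections import deque

def katz_push(adj, source, alpha, eps=1e-8):
    """Approximate x = (I - alpha*A)^{-1} e_source by residual pushing.

    adj maps each node to a list of its neighbours (undirected graph).
    Returns sparse dicts (x, r) with r = e_source - (I - alpha*A) x and
    every entry of r at most eps on exit.
    """
    x = {}
    r = {source: 1.0}
    queue = deque([source])
    queued = {source}
    while queue:
        j = queue.popleft()
        queued.discard(j)
        rj = r.get(j, 0.0)
        if rj <= eps:
            continue
        # Coordinate update: absorb the whole residual of node j into x_j ...
        x[j] = x.get(j, 0.0) + rj
        r[j] = 0.0
        # ... and push alpha * r_j onto each neighbour's residual.
        for k in adj[j]:
            r[k] = r.get(k, 0.0) + alpha * rj
            if r[k] > eps and k not in queued:
                queue.append(k)
                queued.add(k)
    return x, r
```

Only nodes whose residual ever exceeds `eps` appear in `x` or get processed, which is what makes the work proportional to the explored frontier rather than to the whole graph. If Katz scores are defined over paths of length at least one, the score of the source with itself is `x[source] - 1`; that convention is an assumption here.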
Experimental evaluation
The authors test their methods on 17 real‑world graphs ranging from 1 K to 1 M nodes and up to tens of millions of edges. For pair‑wise commute time, the Lanczos‑quadrature approach delivers upper and lower bounds within 10⁻⁴ absolute error in sub‑millisecond time. For pair‑wise Katz, similar accuracy is achieved with comparable speed. In the column‑wise setting, the CG‑based commute‑time column estimator converges in a few CG iterations, producing the full column in seconds for million‑node graphs, while using less than 5 % of the memory required by a full matrix inversion. The push‑based Katz column algorithm identifies the top‑k most related nodes in 2–5 seconds, with negligible error for the returned entries, and its memory usage scales with the size of the explored frontier rather than the whole graph.
Theoretical contributions
- Integration of Lanczos‑based Gauss quadrature into graph similarity estimation, providing deterministic error bounds rarely seen in data‑mining literature.
- Demonstration of the localization property of Katz solutions and its exploitation via a provably convergent push algorithm.
- Extension of the Lanczos‑CG relationship to simultaneous diagonal‑inverse estimation, enabling efficient column‑wise commute‑time computation without forming the inverse explicitly.
Implications and applications
The proposed techniques are directly applicable to link prediction, anomaly detection, recommendation, and clustering tasks where only a subset of similarity scores is needed on demand. The ability to obtain tight bounds for pair‑wise scores enables confidence‑aware decision making, while the column‑wise methods support fast “nearest‑neighbor” queries in massive networks. By reducing both time and space requirements dramatically, the work opens the door to real‑time analytics on graphs that were previously infeasible to process with exact spectral methods.
In summary, the paper delivers a cohesive suite of algorithms that blend sophisticated numerical linear algebra (Lanczos, CG, quadrature) with graph‑specific insights (localization, push dynamics) to compute commute times and Katz scores efficiently at scale, backed by rigorous analysis and extensive empirical validation.