Detecting and Characterizing Small Dense Bipartite-like Subgraphs by the Bipartiteness Ratio Measure

We study the problem of finding and characterizing subgraphs with small \textit{bipartiteness ratio}. We give a bicriteria approximation algorithm \verb|SwpDB| such that if there exists a subset $S$ of volume at most $k$ and bipartiteness ratio $\theta$, then for any $0<\epsilon<1/2$, it finds a set $S'$ of volume at most $2k^{1+\epsilon}$ and bipartiteness ratio at most $4\sqrt{\theta/\epsilon}$. By incorporating a truncation operation, we give a local algorithm \verb|LocDB|, which has asymptotically the same approximation guarantee as \verb|SwpDB| on both the volume and the bipartiteness ratio of the output set, and runs in time $O(\epsilon^2\theta^{-2}k^{1+\epsilon}\ln^3k)$, independent of the size of the graph. Finally, we give a spectral characterization of the small dense bipartite-like subgraphs by using the $k$th \textit{largest} eigenvalue of the Laplacian of the graph.

💡 Research Summary

The paper addresses the problem of locating small, dense subgraphs that are close to bipartite, measured by a quantity called the bipartiteness ratio (BR). For a vertex subset $S$ partitioned into two parts $L$ and $R$, the BR is the ratio of the "non-bipartite" edge count to the volume (sum of degrees) of $L\cup R$: the numerator counts the edges inside $L$ and inside $R$ (each twice, once per endpoint) plus the edges leaving $L\cup R$, while edges crossing between $L$ and $R$ are not counted. A low BR indicates that the subgraph is almost bipartite and well separated from the rest of the graph, i.e., most edges incident to $S$ go across the cut rather than staying inside one side or leaving $S$.
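To make the measure concrete, here is a minimal Python sketch. It assumes the usual Trevisan-style definition, $(2e(L) + 2e(R) + e(S, V\setminus S))/\mathrm{vol}(S)$ with $S = L\cup R$, on a simple undirected graph without self-loops; the function name and the adjacency-set representation are illustrative choices, not taken from the paper.

```python
from typing import Dict, Hashable, Iterable, Set

# Assumed representation: adjacency sets of a simple undirected graph.
Graph = Dict[Hashable, Set[Hashable]]


def bipartiteness_ratio(adj: Graph, left: Iterable[Hashable],
                        right: Iterable[Hashable]) -> float:
    """Return (2*e(L) + 2*e(R) + e(S, rest)) / vol(S) for S = L | R."""
    L, R = set(left), set(right)
    if L & R:
        raise ValueError("L and R must be disjoint")
    S = L | R
    volume = sum(len(adj[v]) for v in S)
    if volume == 0:
        raise ValueError("S must have positive volume")
    bad = 0
    for u in S:
        own_side = L if u in L else R
        for v in adj[u]:
            if v not in S:
                bad += 1   # edge leaving S: seen from one endpoint only
            elif v in own_side:
                bad += 1   # edge inside L or R: seen from both endpoints, so counted twice
            # edges between L and R contribute nothing
    return bad / volume


# A 4-cycle split into its two colour classes is perfectly bipartite.
cycle4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
# A triangle cannot be split without leaving one edge inside a side.
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
print(bipartiteness_ratio(cycle4, {0, 2}, {1, 3}))
print(bipartiteness_ratio(triangle, {0, 1}, {2}))
```

The 4-cycle attains ratio 0, while the triangle attains 1/3 because the single edge inside $L$ is counted twice against a volume of 6.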

The authors propose two algorithms with provable guarantees. The first, SwpDB, is a global bicriteria approximation algorithm that works by computing an eigenvector of the normalized Laplacian, sorting vertices according to the absolute value of the eigenvector entries, and then performing a "sweep" over the sorted order. At each step the algorithm evaluates the current prefix set's volume and BR; among the prefixes that satisfy the volume bound it records the one with the best BR. The analysis shows that if there exists a set $S$ of volume at most $k$ with BR $\theta$, then for any parameter $\epsilon\in(0,1/2)$ SwpDB returns a set $S'$ whose volume is at most $2k^{1+\epsilon}$ and whose BR is at most $4\sqrt{\theta/\epsilon}$. The factor $k^{\epsilon}$ is the price paid for the approximation, and $\epsilon$ controls a trade-off: a smaller $\epsilon$ keeps the volume blow-up modest but weakens the BR bound, which grows as $\sqrt{\theta/\epsilon}$.
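The sweep step described above can be sketched as follows. This is an illustration rather than the paper's SwpDB: it assumes a dense NumPy adjacency matrix `A` with no self-loops or isolated vertices, takes any real vector `x` (for instance a top eigenvector), places each vertex in $L$ or $R$ according to the sign of its entry, and uses the Trevisan-style BR numerator $2e(L)+2e(R)+e(S,V\setminus S)$. The name `sweep_bipartite` and the parameter `max_volume` are hypothetical.

```python
import numpy as np


def sweep_bipartite(A: np.ndarray, x: np.ndarray, max_volume: float):
    """Sweep prefixes of vertices sorted by |x|; return (best_ratio, L, R)."""
    deg = A.sum(axis=1)
    order = np.argsort(-np.abs(x), kind="stable")
    side = np.zeros(len(x), dtype=int)   # +1 = in L, -1 = in R, 0 = not in prefix
    bad = 0.0                            # 2*e(L) + 2*e(R) + e(S, rest), kept up to date
    vol = 0.0
    best = (float("inf"), [], [])
    for v in order:
        if x[v] == 0 or vol + deg[v] > max_volume:
            break                        # stop at zero entries or once the volume bound is hit
        s = 1 if x[v] > 0 else -1
        same = A[v, side == s].sum()     # edges from v to its own side
        other = A[v, side == -s].sum()   # edges from v to the opposite side
        outside = deg[v] - same - other  # edges from v to vertices outside the prefix
        # Edges from the prefix to v were boundary edges (counted once each).
        # After adding v: same-side edges count twice, crossing edges count zero,
        # and v's remaining edges become new boundary edges.
        bad += 2 * same + outside - (same + other)
        vol += deg[v]
        side[v] = s
        ratio = bad / vol
        if ratio < best[0]:
            best = (ratio,
                    np.flatnonzero(side == 1).tolist(),
                    np.flatnonzero(side == -1).tolist())
    return best


# 4-cycle 0-1-2-3-0 with an alternating-sign vector.
C4 = np.array([[0, 1, 0, 1],
               [1, 0, 1, 0],
               [0, 1, 0, 1],
               [1, 0, 1, 0]], dtype=float)
print(sweep_bipartite(C4, np.array([1.0, -1.0, 1.0, -1.0]), max_volume=8))
```

Updating the numerator incrementally keeps each sweep step proportional to the work of inspecting one vertex's row, instead of recomputing the ratio from scratch for every prefix.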

The second algorithm, LocDB, is a local version that avoids scanning the entire graph. It starts from a seed vertex, runs a truncated random walk (or heat diffusion) that repeatedly "renormalizes" the probability vector and discards entries below a threshold. This truncation limits the support of the walk to a region whose size depends only on the target volume $k$ and the desired BR $\theta$. The authors prove that LocDB has asymptotically the same bicriteria guarantees as SwpDB: it outputs a set of volume $O(k^{1+\epsilon})$ and BR $O(\sqrt{\theta/\epsilon})$. Moreover, its running time is $O(\epsilon^{2}\theta^{-2}k^{1+\epsilon}\ln^{3}k)$, which is independent of the total number of vertices or edges in the graph. This makes LocDB suitable for massive networks where only a small portion of the graph can be examined.
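The effect of truncation can be illustrated with a generic truncated lazy random walk. This sketch is not LocDB: the walk operator, the threshold rule (drop entries below `threshold * deg(v)`), and the absence of renormalization are all simplifying assumptions, and the names are hypothetical. It only shows why truncation keeps the work local.

```python
from typing import Dict, Hashable, Set

# Assumed representation: adjacency sets, no isolated vertices.
Graph = Dict[Hashable, Set[Hashable]]


def truncated_walk(adj: Graph, seed: Hashable, steps: int,
                   threshold: float) -> Dict[Hashable, float]:
    """Run lazy random-walk steps from `seed`, truncating small entries after each step."""
    p: Dict[Hashable, float] = {seed: 1.0}
    for _ in range(steps):
        nxt: Dict[Hashable, float] = {}
        for u, mass in p.items():
            nxt[u] = nxt.get(u, 0.0) + mass / 2      # stay put with probability 1/2
            share = mass / (2 * len(adj[u]))
            for v in adj[u]:
                nxt[v] = nxt.get(v, 0.0) + share     # spread the other half over neighbours
        # Truncation: keep only vertices whose mass is large relative to their degree.
        p = {v: m for v, m in nxt.items() if m >= threshold * len(adj[v])}
    return p


# Path on 1000 vertices; the walk only ever touches a neighbourhood of the seed.
n = 1000
path = {i: {j for j in (i - 1, i + 1) if 0 <= j < n} for i in range(n)}
p = truncated_walk(path, seed=500, steps=10, threshold=0.01)
print(len(p), sum(p.values()))
```

Because every surviving vertex holds mass at least `threshold * deg(v)` and the total mass never exceeds 1, the volume of the support is at most `1 / threshold` regardless of the graph size, which is the kind of property a local running-time bound relies on.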

Beyond algorithmic contributions, the paper provides a spectral characterization of small dense bipartite-like subgraphs. Classical Cheeger-type results relate the second smallest eigenvalue of the Laplacian to conductance. Here the authors focus on the $k$-th largest eigenvalue $\lambda_{n-k+1}$ of the normalized Laplacian, whose eigenvalues lie in $[0,2]$. They prove a "bipartiteness Cheeger inequality": a large value of $\lambda_{n-k+1}$ (close to 2) implies the existence of a set of volume at most $k$ whose BR is small, and conversely a set with small BR forces $\lambda_{n-k+1}$ to be large. This duality gives a clean spectral certificate for the presence of almost-bipartite dense clusters, extending the traditional spectral clustering theory to the regime of large eigenvalues.
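The link between large eigenvalues and bipartite structure can be checked numerically on tiny graphs. The sketch below assumes the standard normalized Laplacian $I - D^{-1/2} A D^{-1/2}$ and a dense adjacency matrix with no isolated vertices; the helper name is illustrative.

```python
import numpy as np


def normalized_laplacian_spectrum(A: np.ndarray) -> np.ndarray:
    """Eigenvalues of I - D^{-1/2} A D^{-1/2} in ascending order."""
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    lap = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    return np.linalg.eigvalsh(lap)   # symmetric matrix, so eigvalsh applies


# 4-cycle (bipartite) versus triangle (not bipartite).
C4 = np.array([[0, 1, 0, 1],
               [1, 0, 1, 0],
               [0, 1, 0, 1],
               [1, 0, 1, 0]], dtype=float)
K3 = np.array([[0, 1, 1],
               [1, 0, 1],
               [1, 1, 0]], dtype=float)
print(normalized_laplacian_spectrum(C4))
print(normalized_laplacian_spectrum(K3))
```

The bipartite 4-cycle has largest eigenvalue 2, the maximum possible, whereas the triangle's largest eigenvalue is 1.5, so the top of the spectrum separates the two cases.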

The technical analysis combines several tools. The sweep argument relies on the monotonicity of the sorted eigenvector entries and on bounding the change in BR when a vertex is added. The truncation analysis for LocDB uses concentration bounds for truncated random walks and shows that the probability mass lost by truncation is bounded by a function of $\epsilon$ and $\theta$. Both algorithms are proved to succeed with high probability, assuming the existence of a target set satisfying the volume and BR constraints.

In terms of impact, the work opens a new avenue for community detection and graph mining where the target communities are not merely dense but also exhibit a strong bipartite structure—common in recommendation systems (user‑item bipartite graphs), biological interaction networks, and fraud detection (buyer‑seller graphs). The local algorithm’s independence from the global graph size makes it practical for streaming or distributed settings. Moreover, the spectral characterization provides a diagnostic tool: by inspecting the tail of the Laplacian spectrum one can quickly assess whether a graph contains any small, dense, almost‑bipartite substructures without running a full combinatorial search.

Overall, the paper delivers a coherent theoretical framework: a well‑defined quality measure (bipartiteness ratio), two algorithms (global and local) with matching bicriteria approximation guarantees, and a spectral theorem that ties the existence of such subgraphs to the large eigenvalues of the Laplacian. The results are rigorous, the algorithms are efficient, and the insights have clear relevance to a broad range of applications in network analysis.