Hypergraphs as Weighted Directed Self-Looped Graphs: Spectral Properties, Clustering, Cheeger Inequality
Hypergraphs naturally arise when studying group relations and have been widely used in the field of machine learning. To the best of our knowledge, the recently proposed edge-dependent vertex weights (EDVW) modeling is one of the most generalized modeling methods of hypergraphs, i.e., most existing hypergraph conceptual modeling methods can be generalized as EDVW hypergraphs without information loss. However, the relevant algorithmic developments on EDVW hypergraphs remain nascent: compared to the spectral theories for graphs, its formulations are incomplete, the spectral clustering algorithms are not well-developed, and the hypergraph Cheeger Inequality is not well-defined. To this end, deriving a unified random walk-based formulation, we propose our definitions of hypergraph Rayleigh Quotient, NCut, boundary/cut, volume, and conductance, which are consistent with the corresponding definitions on graphs. Then, we prove that the normalized hypergraph Laplacian is associated with the NCut value, which inspires our proposed HyperClus-G algorithm for spectral clustering on EDVW hypergraphs. Finally, we prove that HyperClus-G can always find an approximately linearly optimal partitioning in terms of both NCut and conductance. Additionally, we provide extensive experiments to validate our theoretical findings from an empirical perspective. Code of HyperClus-G is available at https://github.com/iDEA-iSAIL-Lab-UIUC/HyperClus-G.
💡 Research Summary
This paper develops a comprehensive spectral theory for hypergraphs equipped with edge‑dependent vertex weights (EDVW), a modeling framework that subsumes most existing hypergraph representations without loss of information. The authors begin by formalizing a random‑walk process on an EDVW hypergraph H = (V, E, ω, γ), where ω(e) > 0 is the weight of hyperedge e and γₑ(v) ≥ 0 is the weight assigned to vertex v within that hyperedge. A walk from a vertex u first selects an incident hyperedge e with probability ω(e)/d(u) (where d(u) is the sum of incident hyperedge weights) and then selects a vertex v inside e with probability γₑ(v)/δ(e) (where δ(e) is the sum of γₑ(v) over the vertices of e). This yields a transition matrix
P = D_V⁻¹ W D_E⁻¹ R,
where W is the vertex‑to‑hyperedge weight matrix (W(v, e) = ω(e) for v ∈ e) and R is the hyperedge‑to‑vertex weight matrix (R(e, v) = γₑ(v)), and D_V, D_E are the diagonal vertex‑ and hyperedge‑degree matrices. Under the connectivity assumption, the walk admits a unique stationary distribution ϕ, leading to the diagonal matrix Π = diag(ϕ).
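The two‑step walk and its stationary distribution can be sketched numerically. The hypergraph below is a made‑up toy example (the matrices W and R are ours), and plain power iteration stands in for whatever solver one would use in practice:

```python
import numpy as np

# Hypothetical toy EDVW hypergraph: 4 vertices, 3 hyperedges.
# W[v, e] = ω(e) if v ∈ e, else 0   (vertex → hyperedge step, weights ω)
# R[e, v] = γ_e(v)                  (hyperedge → vertex step, weights γ)
W = np.array([[1.0, 0.0, 2.0],
              [1.0, 3.0, 0.0],
              [0.0, 3.0, 2.0],
              [0.0, 3.0, 2.0]])
R = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.0, 1.0, 2.0, 1.0],
              [1.0, 0.0, 1.0, 2.0]])

d = W.sum(axis=1)       # d(u): total weight of hyperedges incident to u
delta = R.sum(axis=1)   # δ(e): total γ_e mass inside hyperedge e
P = np.diag(1 / d) @ W @ np.diag(1 / delta) @ R   # P = D_V⁻¹ W D_E⁻¹ R

# Each row of P is a probability distribution over next vertices.
# Stationary distribution ϕ by power iteration (the walk is connected and
# has self-loops, hence aperiodic, so iteration converges).
phi = np.ones(len(d)) / len(d)
for _ in range(1000):
    phi = phi @ P
Pi = np.diag(phi)
```

The row‑stochasticity of P and the fixed‑point property ϕP = ϕ follow directly from the two‑step construction.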
Using Π and P, the authors define a random‑walk‑based Laplacian
L = Π − (ΠP + PᵀΠ)/2,
which they prove shares the same eigenstructure as the classical graph Laplacian. This enables the introduction of a Rayleigh quotient
R(x) = (xᵀLx)/(xᵀΠx).
For any bipartition (S, S̄) of V, they construct a vector x that takes values √(vol(S̄)/vol(S)) on S and −√(vol(S)/vol(S̄)) on S̄, where vol(S) = ∑_{u∈S}d(u). They show that
R(x) = 2·NCut(S, S̄),
where NCut(S, S̄) = |∂S|/vol(S) + |∂S|/vol(S̄) and |∂S| is the probability mass crossing the cut under the random walk. This mirrors the well‑known relationship in ordinary graphs and provides a solid foundation for spectral clustering on hypergraphs.
A major theoretical contribution is the proof of a Cheeger inequality for EDVW hypergraphs. Let λ₂ be the second smallest eigenvalue of the normalized Laplacian Π^(−1/2) L Π^(−1/2) and define the conductance Φ(H) = min_{S⊂V} |∂S|/min(vol(S), vol(S̄)). The authors establish
Φ(H)² / 2 ≤ λ₂ ≤ 2 Φ(H).
This result corrects and extends the earlier, incomplete Cheeger bound presented by Chitra & Raphael (2019). It guarantees that the spectral gap directly controls the quality of a cut, just as in the graph case.
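On a small instance, both sides of the Cheeger inequality can be verified by brute force: compute λ₂ and enumerate every bipartition. The data is the same toy hypergraph used above, and the one‑directional boundary mass Σ_{u∈S, v∉S} ϕ(u)P(u,v) is our convention for |∂S|:

```python
import numpy as np

# Toy EDVW walk (hypothetical data, as in the earlier sketches).
W = np.array([[1.0, 0.0, 2.0], [1.0, 3.0, 0.0],
              [0.0, 3.0, 2.0], [0.0, 3.0, 2.0]])
R = np.array([[0.5, 0.5, 0.0, 0.0], [0.0, 1.0, 2.0, 1.0],
              [1.0, 0.0, 1.0, 2.0]])
P = np.diag(1 / W.sum(axis=1)) @ W @ np.diag(1 / R.sum(axis=1)) @ R
phi = np.ones(4) / 4
for _ in range(1000):
    phi = phi @ P
Pi = np.diag(phi)
L = Pi - (Pi @ P + P.T @ Pi) / 2

# λ₂ of the normalized Laplacian Π^(−1/2) L Π^(−1/2)
Pis = np.diag(phi ** -0.5)
lam2 = np.sort(np.linalg.eigvalsh(Pis @ L @ Pis))[1]

# Brute-force conductance over all 2^4 − 2 bipartitions.
n = 4
Phi_H = np.inf
for mask in range(1, 2 ** n - 1):
    S = np.array([(mask >> i) & 1 == 1 for i in range(n)])
    boundary = (phi[S][:, None] * P[np.ix_(S, ~S)]).sum()  # mass crossing S → S̄
    Phi_H = min(Phi_H, boundary / min(phi[S].sum(), phi[~S].sum()))

# Cheeger inequality: Φ(H)²/2 ≤ λ₂ ≤ 2·Φ(H)
assert Phi_H ** 2 / 2 - 1e-9 <= lam2 <= 2 * Phi_H + 1e-9
```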
Building on these foundations, the paper introduces HyperClus‑G, a spectral clustering algorithm for EDVW hypergraphs. The algorithm proceeds as follows: (1) construct P and Π, (2) compute L, (3) obtain the second eigenvector f of the normalized Laplacian (using Lanczos or power iteration), (4) partition vertices according to the sign of f (or by applying k‑means to f for multi‑way clustering). The authors analyze the computational complexity, noting that building the transition matrix costs O(m) (m = total hyperedge‑vertex incidences) and eigenvector computation can be performed in O(|V|³) in the worst case, but practical implementations using sparse solvers are far more efficient.
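The four steps above can be sketched end to end. This is a minimal illustration on the toy matrices from earlier, not the paper's implementation: the function name and API are ours, and a dense eigensolver replaces the sparse Lanczos solver one would use at scale.

```python
import numpy as np

def hyperclus_g_sketch(W, R, iters=1000):
    """Two-way spectral cut of an EDVW hypergraph (illustrative sketch)."""
    n = W.shape[0]
    # (1) transition matrix and stationary distribution
    P = np.diag(1 / W.sum(axis=1)) @ W @ np.diag(1 / R.sum(axis=1)) @ R
    phi = np.ones(n) / n
    for _ in range(iters):
        phi = phi @ P
    Pi = np.diag(phi)
    # (2) random-walk Laplacian
    L = Pi - (Pi @ P + P.T @ Pi) / 2
    # (3) second eigenvector of the normalized Laplacian (dense eigh on a
    #     toy instance; a sparse Lanczos solver is the practical choice)
    Pis = np.diag(phi ** -0.5)
    _, vecs = np.linalg.eigh(Pis @ L @ Pis)
    f = Pis @ vecs[:, 1]   # map back to the vertex domain
    # (4) split vertices by the sign of f
    return f >= 0

W = np.array([[1.0, 0.0, 2.0], [1.0, 3.0, 0.0],
              [0.0, 3.0, 2.0], [0.0, 3.0, 2.0]])
R = np.array([[0.5, 0.5, 0.0, 0.0], [0.0, 1.0, 2.0, 1.0],
              [1.0, 0.0, 1.0, 2.0]])
labels = hyperclus_g_sketch(W, R)
print(labels)   # boolean cluster assignment per vertex
```

Because the second eigenvector is Π‑orthogonal to the constant vector, f always takes both signs, so the sign split never returns an empty cluster.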
Theoretical guarantees are provided: Theorem 2 proves that the partition N returned by HyperClus‑G satisfies
NCut(N) ≤ O(NCut(N*)), Φ(N) ≤ O(Φ(N*)),
where N* denotes the optimal bipartition, so NCut(N*) and Φ(N*) are the optimal NCut and conductance values. Thus HyperClus‑G achieves a linear approximation factor for both objectives, mirroring the classic Cheeger‑based guarantees for graph spectral clustering.
Empirical validation is extensive. Synthetic experiments vary hyperedge sizes, edge weights, and vertex weight distributions to test the tightness of the Cheeger bound and the robustness of the algorithm. Real‑world datasets include co‑authorship networks (where each paper is a hyperedge and authors have different contribution weights), protein‑chemical interaction graphs, and recommendation‑system data (users, items, and group interactions). HyperClus‑G is compared against: (a) graph‑conversion spectral clustering, (b) prior EDVW‑based methods, and (c) recent hypergraph neural network clustering approaches. Evaluation metrics comprise NCut, conductance, Normalized Mutual Information (NMI), accuracy, and F1‑score. Across the board, HyperClus‑G attains lower NCut and conductance values and higher clustering quality, especially when vertex‑specific weights vary significantly—situations where traditional methods lose information.
The paper concludes by emphasizing that EDVW hypergraphs provide a truly expressive representation, unifying earlier models while enabling rigorous spectral analysis. Future directions suggested include extending the theory to multi‑way partitions with provable guarantees, developing online/streaming versions for dynamic hypergraphs, and integrating EDVW‑aware layers into hypergraph neural networks.
In sum, this work delivers a mathematically solid, algorithmically practical, and empirically validated framework for spectral clustering on the most general class of hypergraphs currently studied, filling a notable gap in the literature and opening avenues for advanced hypergraph‑based machine learning.