Learning Graph Representations by Dendrograms
Hierarchical graph clustering is a common technique to reveal the multi-scale structure of complex networks. We propose a novel metric for assessing the quality of a hierarchical clustering. This metric reflects the ability to reconstruct the graph from the dendrogram, which encodes the hierarchy. The optimal representation of the graph defines a class of reducible linkages leading to regular dendrograms by greedy agglomerative clustering.
💡 Research Summary
The paper introduces a novel framework for representing a weighted, undirected graph as a hierarchical dendrogram and evaluates the quality of such a representation by its ability to reconstruct the original graph. The authors start by defining a probability distribution over node pairs, p(u,v) = w(u,v)/w, where w(u,v) is the edge weight and w is the total weight of all edges. They also introduce a prior distribution π over nodes, which can be uniform (no prior knowledge) or equal to the node‑sampling distribution p(u) = Σ_v p(u,v), the marginal of p (perfect knowledge of node weights).
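These two distributions are straightforward to compute. A minimal sketch, using a hypothetical toy edge list (not from the paper):

```python
from collections import defaultdict

# Hypothetical toy graph: weighted undirected edges (u, v, weight).
edges = [("a", "b", 3.0), ("a", "c", 2.0), ("b", "c", 1.0), ("c", "d", 2.0)]

w_total = sum(wt for _, _, wt in edges)

# Edge-sampling distribution p(u, v) = w(u, v) / w.
p = {(u, v): wt / w_total for u, v, wt in edges}

# Node weights: w(u) = sum of weights of edges incident to u.
node_weight = defaultdict(float)
for u, v, wt in edges:
    node_weight[u] += wt
    node_weight[v] += wt

# Two choices of prior pi over nodes:
pi_uniform = {u: 1.0 / len(node_weight) for u in node_weight}   # no knowledge
pi_weighted = {u: node_weight[u] / (2 * w_total) for u in node_weight}  # marginal of p
```

Both priors sum to one; `pi_weighted` is the node-sampling distribution induced by drawing an edge from p and then one of its endpoints.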
A dendrogram is modeled as a rooted binary tree whose leaves correspond to the graph’s vertices. Each internal node i is assigned a height d(i); the ultrametric distance between any two vertices u and v is defined as the height of their lowest common ancestor, d(u,v). Interpreting distance as inverse similarity, the authors construct a reconstructed graph ˆG with edge weights ˆw(u,v) = π(u)π(v)/d(u,v) (for u≠v), so that weights decrease as the ultrametric distance grows. The corresponding sampling distribution ˆp(u,v) = ˆw(u,v)/ˆw is then compared to the original p(u,v) using the Kullback‑Leibler (KL) divergence D(p‖ˆp).
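A minimal sketch of this construction, with a hypothetical toy dendrogram (leaves are node names, internal nodes carry a height): d(u,v) is read off as the height of the lowest common ancestor, and reconstructed weights are inversely proportional to it.

```python
# A leaf is a node name; an internal node is (height, left_subtree, right_subtree).
# Toy dendrogram with hypothetical heights:
dendro = (4.0, (1.0, "a", "b"), (2.0, "c", "d"))

def ultrametric(tree, dist=None):
    """Return (leaves, {(u, v): d(u, v)}) with d(u, v) = height of the LCA."""
    if dist is None:
        dist = {}
    if isinstance(tree, str):            # leaf
        return [tree], dist
    height, left, right = tree
    left_leaves, _ = ultrametric(left, dist)
    right_leaves, _ = ultrametric(right, dist)
    for u in left_leaves:                # this node is the LCA of every
        for v in right_leaves:           # cross pair (u, v)
            dist[tuple(sorted((u, v)))] = height
    return left_leaves + right_leaves, dist

_, d = ultrametric(dendro)

# Reconstructed graph with a uniform prior: w_hat(u, v) = pi(u) pi(v) / d(u, v).
pi = {u: 0.25 for u in "abcd"}
w_hat = {pair: pi[u] * pi[v] / d[pair] for pair in d for u, v in [pair]}
total = sum(w_hat.values())
p_hat = {pair: wt / total for pair, wt in w_hat.items()}   # sampling distribution
```

Close pairs (small LCA height) thus get large reconstructed weight, matching the inverse-similarity reading of distance.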
Minimizing D(p‖ˆp) leads to the cost function
J(d) = Σ_{u≠v} p(u,v) log d(u,v) + log Σ_{u≠v} π(u)π(v)/d(u,v),
which is invariant under scaling of the ultrametric. By expressing d(u,v) in terms of internal nodes (A,B) of the dendrogram, the cost can be rewritten as
J(d) = Σ_{(A,B)∈I} p(A,B) log d(A,B) + log Σ_{(A,B)∈I} π(A)π(B)/d(A,B),
where I is the set of internal nodes and p(A,B), π(A), π(B) are the aggregated probabilities for the two sub‑clusters attached to node (A,B).
Assuming the tree topology T is fixed, differentiating J(d) with respect to each height yields the optimal distance at every internal node:
d(A,B) = λ π(A)π(B)/p(A,B),
where λ > 0 is an arbitrary scale factor: since Σ_{(A,B)∈I} p(A,B) = 1, the stationarity condition holds for every λ, mirroring the scale invariance of J, so one may simply take λ = 1. The optimal distance is thus inversely proportional to the similarity p(A,B)/(π(A)π(B)) between the two sub‑clusters.
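The derivation above can be sketched end to end for a fixed topology. The toy graph and tree below are hypothetical, λ is set to 1 (which the scale invariance of J permits), and the cost uses the scale-invariant form with π(A)π(B)/d(A,B) in the second sum:

```python
import math

# Hypothetical toy graph and uniform prior.
edges = {("a", "b"): 3.0, ("a", "c"): 2.0, ("b", "c"): 1.0, ("c", "d"): 2.0}
w = sum(edges.values())
p = {pair: wt / w for pair, wt in edges.items()}
pi = {u: 0.25 for u in "abcd"}

def p_pair(A, B):
    """Aggregated probability p(A, B) = sum of p(u, v) over u in A, v in B."""
    return sum(p.get(tuple(sorted((u, v))), 0.0) for u in A for v in B)

def optimal_heights(tree):
    """For a fixed topology (nested 2-tuples, leaves are names), return
    (leaves, entries) where each entry is (A, B, optimal d(A, B))."""
    if isinstance(tree, str):
        return [tree], []
    left, right = tree
    A, ents_left = optimal_heights(left)
    B, ents_right = optimal_heights(right)
    pi_A, pi_B = sum(pi[u] for u in A), sum(pi[v] for v in B)
    d = pi_A * pi_B / p_pair(A, B)      # stationary point of J, lambda = 1
    return A + B, ents_left + ents_right + [(A, B, d)]

_, heights = optimal_heights((("a", "b"), ("c", "d")))

# J(d) = sum p(A,B) log d(A,B) + log sum pi(A) pi(B) / d(A,B); at the optimum
# with lambda = 1 the second sum equals sum p(A,B) = 1, so its log vanishes.
J = (sum(p_pair(A, B) * math.log(d) for A, B, d in heights)
     + math.log(sum(sum(pi[u] for u in A) * sum(pi[v] for v in B) / d
                    for A, B, d in heights)))
```

With this topology the merge of {a} and {b} (high similarity) gets a small height, while the top merge of {a,b} and {c,d} gets a larger one, as expected of an ultrametric fitted to the graph.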