Structural patterns in complex systems using multidendrograms


Complex systems are usually represented as an intricate set of relations between their components, forming a complex graph or network. The understanding of their functioning and emergent properties is strongly related to their structural properties. Finding structural patterns is of utmost importance to reduce the problem of understanding the structure-function relationships. Here we propose the analysis of similarity measures between nodes using hierarchical clustering methods. The discrete nature of networks usually leads to a small set of different similarity values, making standard hierarchical clustering algorithms ambiguous. We propose the use of “multidendrograms”, an algorithm that computes agglomerative hierarchical clusterings implementing a variable-group technique that solves the non-uniqueness problem found in the standard pair-group algorithm. This problem arises when there are more than two clusters separated by the same maximum similarity (or minimum distance) during the agglomerative process. Forcing binary trees in this case means breaking ties in some way, thus giving rise to different output clusterings depending on the criterion used. Multidendrograms solves this problem by grouping more than two clusters at the same time when ties occur.


💡 Research Summary

The paper addresses a fundamental limitation of hierarchical clustering when applied to complex networks: the “ties in proximity” problem. In many network analyses the similarity (or distance) between nodes takes on only a limited set of discrete values, which often leads to several pairs of clusters sharing exactly the same minimum distance (or maximum similarity) at a given agglomeration step. Traditional agglomerative algorithms based on the pair‑group (binary) merging rule must arbitrarily break these ties, producing different binary dendrograms depending on the order of the data or on the tie‑breaking criterion. This non‑uniqueness hampers reproducibility and makes the interpretation of the resulting hierarchy ambiguous.
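
The order dependence described above can be reproduced with a toy example. The following is a minimal sketch, not the authors' code: `pair_group_complete` is a hypothetical helper that implements plain pair-group clustering with complete linkage and breaks ties by taking the first minimal pair it encounters, which is an assumed (but common) tie-breaking rule.

```python
from itertools import combinations


def pair_group_complete(dist, order):
    """Pair-group agglomeration with complete linkage (illustrative only).

    Ties are broken by taking the first minimal pair in iteration order,
    so the result can depend on the order in which points are presented.
    """
    clusters = [frozenset([i]) for i in order]
    merges = []

    def linkage(pair):
        a, b = pair
        # Complete linkage: largest point-to-point distance between clusters.
        return max(dist[i][j] for i in a for j in b)

    while len(clusters) > 1:
        # min() returns the first pair that attains the minimum distance.
        a, b = min(combinations(clusters, 2), key=linkage)
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append(sorted(a | b))
    return merges


# Three points on a line: d(0,1) = d(1,2) = 1 is a tie, d(0,2) = 2.
dist = [
    [0, 1, 2],
    [1, 0, 1],
    [2, 1, 0],
]

forward = pair_group_complete(dist, [0, 1, 2])
backward = pair_group_complete(dist, [2, 1, 0])
print(forward)   # first merge joins points 0 and 1
print(backward)  # first merge joins points 1 and 2
```

The same distance matrix yields two different binary trees, which is exactly the ambiguity that a tie-aware method is meant to remove.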

To solve this, the authors propose a variable‑group agglomerative scheme implemented in the “multidendrogram” algorithm. Instead of selecting a single pair of clusters when a tie occurs, the algorithm identifies the whole set of clusters that share the current extremal distance and merges them simultaneously into a single super‑cluster. The distance (or similarity) between this new super‑cluster and the remaining clusters is then recomputed using any standard linkage method (e.g., Unweighted Average, Weighted Average, Complete Linkage). When no ties are present, the method reduces to the ordinary pair‑group result; when ties exist, the resulting dendrogram is non‑binary, explicitly showing multi‑branch nodes that correspond to the simultaneous merging of more than two clusters. This guarantees a unique hierarchical solution that is independent of input ordering or arbitrary tie‑breaking rules.
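
The variable-group idea can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the MultiDendrograms implementation: it uses complete linkage only, it interprets "clusters that share the current extremal distance" as the connected components of the graph whose edges are the tied minimum-distance pairs, and the function name `variable_group_clustering` and the tolerance `tol` are hypothetical.

```python
from itertools import combinations


def variable_group_clustering(dist, tol=1e-12):
    """Tie-aware agglomeration with complete linkage (illustrative only).

    At each step every pair of clusters at the minimum distance is found,
    and clusters connected through such tied pairs are merged together.
    Returns a list of (height, merged_member_clusters) tuples.
    """
    clusters = [frozenset([i]) for i in range(len(dist))]
    merges = []

    def linkage(a, b):
        # Complete linkage computed from the original point distances.
        return max(dist[i][j] for i in a for j in b)

    while len(clusters) > 1:
        pair_d = {(a, b): linkage(a, b) for a, b in combinations(clusters, 2)}
        dmin = min(pair_d.values())

        # Union-find over clusters; tied pairs are the edges.
        parent = {c: c for c in clusters}

        def find(c):
            while parent[c] != c:
                c = parent[c]
            return c

        for (a, b), d in pair_d.items():
            if d - dmin <= tol:
                parent[find(a)] = find(b)

        # Collect connected components and merge each one in a single step.
        groups = {}
        for c in clusters:
            groups.setdefault(find(c), []).append(c)

        new_clusters = []
        for members in groups.values():
            if len(members) > 1:
                merges.append((dmin, sorted(sorted(m) for m in members)))
            new_clusters.append(frozenset().union(*members))
        clusters = new_clusters
    return merges


# Points 0, 1, 2 are mutually at distance 1 (a three-way tie); point 3 is far.
dist = [
    [0, 1, 1, 3],
    [1, 0, 1, 3],
    [1, 1, 0, 3],
    [3, 3, 3, 0],
]
result = variable_group_clustering(dist)
print(result)
```

In this example the three tied points are joined in one multi-branch merge at height 1, and the result does not depend on the order of the rows. The published algorithm also handles other linkage methods, which this sketch does not attempt.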

The algorithm is packaged in a publicly available tool called MultiDendrograms. The software offers a graphical user interface for data import, choice of distance/similarity matrices, selection of linkage criteria, and extensive visual customization. It also provides a command‑line mode for batch processing, outputs in text, Newick, and image formats, and computes quality measures such as the cophenetic correlation coefficient, normalized mean squared error, and normalized mean absolute error.
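
As a rough illustration of the first quality measure, the cophenetic correlation coefficient is commonly computed as the Pearson correlation between the original pairwise distances and the distances implied by the tree. The sketch below assumes that standard definition; the helper `pearson` and the sample values are hypothetical, and the software may treat multi-branch merges differently.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Pairwise distances for 4 points, in the order
# (0,1), (0,2), (0,3), (1,2), (1,3), (2,3).
original = [1.0, 2.0, 3.0, 2.5, 3.5, 4.0]

# Heights at which each pair is first joined in a complete-linkage tree
# built from the distances above (merges at 1.0, 2.5 and 4.0).
cophenetic = [1.0, 2.5, 4.0, 2.5, 4.0, 4.0]

ccc = pearson(original, cophenetic)
print(f"cophenetic correlation: {ccc:.3f}")
```

Values close to 1 indicate that the hierarchy preserves the original distances well.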

Three case studies illustrate the practical impact of the approach:

  1. Vertex similarity in networks – Using Jaccard and Leicht similarity measures, the authors cluster (i) a synthetic 25‑node hierarchical network (Barabási‑Ravasz model) and (ii) Zachary’s karate club network. Both datasets contain many identical similarity values, leading to numerous ties. Multidendrograms correctly group tied clusters and recover the known community split in the karate club, as well as reveal symmetric nodes (e.g., nodes 15, 16, 19, 21, 23) that would be hidden in binary trees.

  2. Modular node similarity – The authors define a similarity based on the fraction of resolution parameters (r) for which two nodes belong to the same community in a modularity‑optimisation framework. Applying this to a synthetic H13‑4 network (256 nodes, two hierarchical levels) yields a similarity matrix that, when processed with multidendrograms, reproduces the four underlying groups at a single agglomeration level. Conventional binary dendrograms would generate several incompatible trees, failing to capture the true hierarchical organization.

  3. Distance similarity in complete weighted networks – A real‑world example uses a genetic distance matrix of 78 Spanish grape varieties. Because the distances are rounded to three, four, or five decimal places, many ties appear. Binary hierarchical clustering with Unweighted or Weighted Average linkage can produce up to 17,900 distinct binary trees for the same data, depending on precision. Multidendrograms, however, produce a single, interpretable hierarchy and visually highlight the tie‑rich regions, demonstrating how precision influences the number of possible trees.
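
To see why vertex similarities such as the one in the first case study produce so many ties, consider the Jaccard index of two nodes' neighbour sets. The sketch below assumes the standard definition (shared neighbours divided by the union of neighbours); whether the paper includes a node in its own neighbourhood is not stated here, so that detail is an assumption, and `jaccard_similarities` is a hypothetical helper.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def jaccard_similarities(adjacency):
    """Jaccard similarity of neighbour sets for every pair of nodes.

    `adjacency` maps each node to the set of its neighbours.
    Exact fractions are used so equal values compare as equal.
    """
    sims = {}
    for u, v in combinations(sorted(adjacency), 2):
        union = adjacency[u] | adjacency[v]
        common = adjacency[u] & adjacency[v]
        sims[(u, v)] = Fraction(len(common), len(union)) if union else Fraction(0)
    return sims


# Star graph: hub 0 connected to leaves 1..4.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}

sims = jaccard_similarities(star)
value_counts = Counter(sims.values())
print(len(sims), "pairs,", len(value_counts), "distinct similarity values")
```

All ten node pairs fall on just two similarity values, so any pair-group algorithm applied to this matrix has to break ties at its very first step.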

Overall, the multidendrogram approach eliminates the ambiguity caused by ties, provides a unique and reproducible hierarchical representation, and offers a clear visual cue of where ties occur. This makes it especially valuable for fields where similarity‑based clustering of network nodes is common, such as systems biology, social network analysis, and computational chemistry. The authors suggest that future work could explore new network descriptors derived from tie‑aware hierarchies and scale the algorithm to very large datasets.

