Normalized Mutual Information to evaluate overlapping community finding algorithms

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Given the increasing popularity of algorithms for overlapping clustering, in particular in social network analysis, quantitative measures are needed to assess the accuracy of a method. Given a set of true clusters, and the set of clusters found by an algorithm, these sets of clusters must be compared to see how similar or different the sets are. A normalized measure is desirable in many contexts, for example assigning a value of 0 where the two sets are totally dissimilar, and 1 where they are identical. A measure based on normalized mutual information, [1], has recently become popular. We demonstrate unintuitive behaviour of this measure, and show how this can be corrected by using a more conventional normalization. We compare the results to that of other measures, such as the Omega index [2].


💡 Research Summary

The paper addresses a critical shortcoming in the evaluation of overlapping community detection algorithms, a problem that has become increasingly relevant as social‑network analysis and other fields routinely encounter nodes belonging to multiple groups. The authors focus on the overlapping variant of Normalized Mutual Information (NMI) introduced in [1] and demonstrate that its unconventional normalization—which scores each cluster only against its best‑matching cluster in the other clustering, rather than dividing the mutual information by a standard entropy term—can produce unintuitive and misleading scores when comparing overlapping clusterings.

Through a series of synthetic examples, the authors show that clusterings which differ substantially can still receive surprisingly high scores under this measure. Because each cluster is evaluated only against its single best‑matching counterpart, the measure never weighs the shared information against the full entropy of either clustering; as a result, enlarging one clustering with additional communities can inflate the score even when the two clusterings share little structure.

To remedy this, the paper proposes a more conventional information‑theoretic normalization:

\[
\mathrm{NMI}_{\max}(X, Y) \;=\; \frac{I(X : Y)}{\max\bigl(H(X),\, H(Y)\bigr)}
\]
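The max‑normalization can be illustrated with a minimal sketch. The snippet below computes it for ordinary disjoint partitions only; the paper's actual measure extends the same idea to overlapping covers via per‑node binary membership variables, which this sketch does not attempt. The function name and label encoding are illustrative, not from the paper.

```python
import math
from collections import Counter

def nmi_max(labels_a, labels_b):
    """Max-normalized NMI for two disjoint partitions of the same node set.

    Illustrative sketch only: the overlapping-cover version in the paper
    replaces these partition entropies with entropies of per-node
    membership vectors.
    """
    n = len(labels_a)
    count_a = Counter(labels_a)              # cluster sizes in partition A
    count_b = Counter(labels_b)              # cluster sizes in partition B
    joint = Counter(zip(labels_a, labels_b))  # co-occurrence counts

    def entropy(counts):
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    # Mutual information I(A:B) from the joint and marginal distributions
    mi = sum(
        (c / n) * math.log((c / n) / ((count_a[a] / n) * (count_b[b] / n)))
        for (a, b), c in joint.items()
    )
    denom = max(entropy(count_a), entropy(count_b))
    return mi / denom if denom > 0 else 1.0

# Identical partitions score 1; statistically independent ones score 0.
print(nmi_max([0, 0, 1, 1], [0, 0, 1, 1]))  # -> 1.0
print(nmi_max([0, 0, 1, 1], [0, 1, 0, 1]))  # -> 0.0
```

Normalizing by the larger of the two entropies keeps the score in [0, 1] and ensures that a clustering cannot raise its score merely by growing its own entropy, which is the behaviour the paper identifies as problematic in the measure of [1].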

