Information Distance in Multiples


Information distance is a parameter-free similarity measure based on compression, used in pattern recognition, data mining, phylogeny, clustering, and classification. The notion of information distance is extended from pairs to multiples (finite lists). We study maximal overlap, metricity, universality, minimal overlap, additivity, and normalized information distance in multiples. We use the theoretical notion of Kolmogorov complexity, which for practical purposes is approximated by the length of the compressed version of the file involved, using a real-world compression program.

Index Terms: Information distance, multiples, pattern recognition, data mining, similarity, Kolmogorov complexity


💡 Research Summary

The paper “Information Distance in Multiples” extends the well‑known concept of information distance—from pairs of objects to finite lists (multiples). Classical information distance, rooted in Kolmogorov complexity, measures the length of the shortest program that transforms one object into another, i.e., D(x, y) = max{K(x|y), K(y|x)}. This pairwise distance underlies practical similarity measures such as the Normalized Compression Distance (NCD), which have been successfully applied to pattern recognition, data mining, phylogeny, clustering, and classification.
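The pairwise machinery can be sketched with an off-the-shelf compressor standing in for Kolmogorov complexity, as the abstract itself suggests. The sketch below uses Python's zlib and the standard NCD formula; the helper names are ours, not the paper's.

```python
import zlib

def C(data: bytes) -> int:
    """Approximate Kolmogorov complexity K(x) by the compressed length of x."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance: (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = C(x), C(y)
    return (C(x + y) - min(cx, cy)) / max(cx, cy)

a = b"the quick brown fox jumps over the lazy dog. " * 20
b = b"colorless green ideas sleep furiously tonight. " * 20
print(ncd(a, a), ncd(a, b))  # a should be far closer to itself than to b
```

Because a compressor exploits redundancy between the concatenated inputs, similar objects yield a low NCD and unrelated objects a value near 1.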

The authors first define a "multiple information distance" for a list L = (x₁, …, xₙ). They consider the Kolmogorov complexity of the whole list, K(L), and the conditional complexities K(xᵢ | L \ {xᵢ})—the amount of information needed to reconstruct each element given the rest of the list. Two natural formulations arise: (1) a max-based distance Dₘₐₓ(L) = maxᵢ K(xᵢ | L \ {xᵢ}), and (2) a sum-based distance Dₛᵤₘ(L) = Σᵢ K(xᵢ | L \ {xᵢ}). Both capture the intuition that the distance reflects the "extra" information each element contributes beyond the common core of the list.
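Both list-level formulations can be approximated with a compressor. Estimating K(xᵢ | L \ {xᵢ}) by the compressed-length difference C(L) − C(L \ {xᵢ}) is our own heuristic (loosely motivated by symmetry of information), not a construction from the paper:

```python
import zlib

def C(data: bytes) -> int:
    """Compressed length as a stand-in for Kolmogorov complexity."""
    return len(zlib.compress(data, 9))

def cond_complexity(items: list[bytes], i: int) -> int:
    """Estimate K(x_i | L \\ {x_i}) as C(L) - C(L \\ {x_i})."""
    rest = b"".join(x for j, x in enumerate(items) if j != i)
    return C(rest + items[i]) - C(rest)

def d_max(items: list[bytes]) -> int:
    """Max-based multiple distance: max_i K(x_i | L \\ {x_i})."""
    return max(cond_complexity(items, i) for i in range(len(items)))

def d_sum(items: list[bytes]) -> int:
    """Sum-based multiple distance: sum_i K(x_i | L \\ {x_i})."""
    return sum(cond_complexity(items, i) for i in range(len(items)))
```

On a list of near-identical items each conditional term is close to zero, so both distances collapse; on heterogeneous lists each item contributes its own information.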

A central contribution is the analysis of maximal overlap, the shared information among all items in the list. Formally, the overlap shows up in the gap between K(L) and Dₛᵤₘ(L): when the conditional complexities K(xᵢ | L \ {xᵢ}) are small relative to K(L), most of the list's information is common to all items, so the items are highly redundant—useful for detecting tight clusters or common motifs. Conversely, minimal overlap concerns the unique information of each item: the conditional complexity K(xᵢ | L \ {xᵢ}) measures what remains of xᵢ once the rest of the list is given, while the gap between the unconditional complexity K(xᵢ) and this conditional counterpart measures what is shared. Items with large conditional complexity carry distinct features, a property valuable for feature selection and discriminative classification.
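The per-item shared information, K(xᵢ) − K(xᵢ | L \ {xᵢ}), can likewise be estimated with compressed lengths. The compression-difference approximation of the conditional term is again our assumption, not the paper's construction:

```python
import zlib

def C(data: bytes) -> int:
    """Compressed length as a stand-in for Kolmogorov complexity."""
    return len(zlib.compress(data, 9))

def shared_info(items: list[bytes], i: int) -> int:
    """Estimate K(x_i) - K(x_i | L \\ {x_i}): how much of x_i is
    redundant given the rest of the list."""
    rest = b"".join(x for j, x in enumerate(items) if j != i)
    cond = C(rest + items[i]) - C(rest)  # stand-in for K(x_i | rest)
    return C(items[i]) - cond
```

A list whose items all contain a common motif yields large shared-information values; items with unrelated content yield values near zero.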

The paper rigorously proves that the multiple distance satisfies the metric axioms: non-negativity, identity of indiscernibles, symmetry (which holds trivially because a list is unordered), and a triangle inequality extended to lists, D(A∪B) + D(B∪C) ≥ D(A∪C). This metricity guarantees that distance-based algorithms (e.g., nearest-neighbor, hierarchical clustering) remain well-behaved when applied to lists of objects rather than pairs.

Universality is another key property: for any admissible computable distance d, the proposed distance satisfies D(L) ≤ d(L) + O(1), i.e., it minorizes every such distance up to an additive constant. This establishes the multiple distance as the most "information-theoretically efficient" similarity measure, analogous to the universality of the pairwise information distance.

The authors also discuss additivity: when two disjoint lists L₁ and L₂ are concatenated, the distance of the union never exceeds the sum of the individual distances, i.e., D(L₁∪L₂) ≤ D(L₁) + D(L₂). This property underpins scalable computation—large datasets can be partitioned, processed independently, and then combined without loss of theoretical guarantees.

To make the theory applicable, the authors introduce a normalized multiple information distance (NID). Extending the pairwise NID, D(x,y)/max{K(x), K(y)}, the multiple version is defined as NID(L) = D(L)/K(L) (or equivalently using the sum formulation). This yields a dimensionless similarity score in the interval [0, 1], where values near 0 indicate highly similar items and values near 1 indicate items sharing almost no information.
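A compression-based stand-in for this normalized score, using the Dₘₐₓ(L)/K(L) form with compressed lengths replacing Kolmogorov complexities (the conditional-complexity approximation is our assumption), can be sketched as:

```python
import zlib

def C(data: bytes) -> int:
    """Compressed length as a stand-in for Kolmogorov complexity."""
    return len(zlib.compress(data, 9))

def nid(items: list[bytes]) -> float:
    """Approximate NID(L) = D_max(L) / K(L) with compressed lengths."""
    def cond(i: int) -> int:
        # estimate K(x_i | L \ {x_i}) by a compressed-length difference
        rest = b"".join(x for j, x in enumerate(items) if j != i)
        return C(rest + items[i]) - C(rest)
    whole = b"".join(items)
    return max(cond(i) for i in range(len(items))) / C(whole)
```

Lists of near-duplicates score close to 0, while lists of mutually unrelated items score markedly higher, matching the normalized score's intended reading.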

