Information Distance

While Kolmogorov complexity is the accepted absolute measure of information content in an individual finite object, a similarly absolute notion is needed for the information distance between two individual objects, for example, two pictures. We give several natural definitions of a universal information metric, based on length of shortest programs for either ordinary computations or reversible (dissipationless) computations. It turns out that these definitions are equivalent up to an additive logarithmic term. We show that the information distance is a universal cognitive similarity distance. We investigate the maximal correlation of the shortest programs involved, the maximal uncorrelation of programs (a generalization of the Slepian-Wolf theorem of classical information theory), and the density properties of the discrete metric spaces induced by the information distances. A related distance measures the amount of nonreversibility of a computation. Using the physical theory of reversible computation, we give an appropriate (universal, anti-symmetric, and transitive) measure of the thermodynamic work required to transform one object into another object by the most efficient process. Information distance between individual objects is needed in pattern recognition where one wants to express effective notions of “pattern similarity” or “cognitive similarity” between individual objects and in thermodynamics of computation where one wants to analyse the energy dissipation of a computation from a particular input to a particular output.


💡 Research Summary

The paper “Information Distance” addresses a fundamental gap in algorithmic information theory: while Kolmogorov complexity gives an absolute measure of the information content of a single finite object, there has been no equally absolute notion for the informational relationship between two objects. The authors propose several natural definitions of a universal information metric, analyze their equivalence, and explore a wide range of theoretical and practical consequences.

Core Definitions

  1. Plain (non‑reversible) distance E(x, y): based on conditional Kolmogorov complexity, where K(y|x) is the length of the shortest binary program that, given x as input to a universal Turing machine, outputs y. The distance is taken as max{K(y|x), K(x|y)}, which (up to an additive logarithmic term) equals the length of a single shortest program that converts x to y and y to x.
  2. Reversible distance E_rev(x, y): the length of the shortest program that transforms x into y in a reversible (energy‑conserving) computational model. This model respects the physical constraint that no information is erased, thus linking the metric to thermodynamic considerations.
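Since Kolmogorov complexity is uncomputable, these distances can only be approximated in practice. A common heuristic (an assumption of this sketch, not a construction from the paper) replaces K with the output length of a real compressor such as zlib, using C(xy) − C(y) as a stand-in for K(x|y):

```python
import zlib

def c(data: bytes) -> int:
    """Approximate Kolmogorov complexity by compressed length (zlib, level 9)."""
    return len(zlib.compress(data, 9))

def approx_information_distance(x: bytes, y: bytes) -> int:
    """Approximate E(x, y) = max{K(x|y), K(y|x)}.

    Uses the heuristic K(x|y) ~ C(xy) - C(y): the extra compressed bits
    needed to describe x once y is already described.
    """
    cx, cy, cxy = c(x), c(y), c(x + y)
    return max(cxy - cx, cxy - cy)
```

On identical inputs the approximate distance is near zero, while unrelated inputs pay roughly the full compressed length of the second string.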

The authors prove (Theorem 1) that the two distances differ by at most an additive logarithmic term, i.e., |E(x, y) − E_rev(x, y)| ≤ O(log max{|x|,|y|}). Consequently, any of the definitions can serve as a canonical “information distance.”

Universality as a Cognitive Similarity Metric
A central claim (Theorem 2) is that the information distance is universal among all admissible similarity measures: for any upper‑semicomputable distance d satisfying a natural normalization (density) condition, there exists a constant c such that E(x, y) ≤ d(x, y) + c for all x, y. In other words, the information distance minorizes every admissible metric: whenever any such measure deems two objects similar, E deems them similar as well. This establishes E as the most discriminating similarity measure, making it a natural tool for pattern‑recognition tasks where a notion of “how similar two pictures are” is required.

Correlation and Uncorrelation of Shortest Programs
The paper investigates the structure of the optimal programs themselves. The maximal correlation problem asks how much the shortest program p_xy (converting x to y) can share with p_yx (converting y to x). The authors show that the overlap can be made maximal: the shorter of the two programs can essentially be taken to be part of the longer one, which is why a single program of length max{K(x|y), K(y|x)} (up to a logarithmic term) suffices to convert in both directions.

Conversely, the maximal uncorrelation result (a generalization of the Slepian‑Wolf theorem) demonstrates that for any pair (x, y) there exist shortest programs that are essentially independent: the mutual information between p_xy and p_yx can be made arbitrarily small compared with their lengths. This has direct implications for distributed source coding and for designing compression schemes where side information is available at the decoder but not at the encoder.
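The flavor of decoding with side information behind the Slepian‑Wolf connection can be illustrated with a toy syndrome‑coding sketch (an illustration, not the paper's construction): the encoder holds a 7‑bit x but not y, and transmits only a 3‑bit Hamming syndrome; the decoder, holding a y that differs from x in at most one bit, recovers x exactly.

```python
def syndrome(bits):
    """Hamming(7,4) syndrome: XOR the (1-indexed) positions of the set bits.

    Column j of the parity-check matrix is the binary representation of j,
    so a single bit flip at position j yields syndrome value j.
    """
    s = 0
    for j, b in enumerate(bits, start=1):
        if b:
            s ^= j
    return s

def sw_encode(x):
    """Encoder: transmit only the 3-bit syndrome of the 7-bit block x."""
    return syndrome(x)

def sw_decode(s_x, y):
    """Decoder: side information y differs from x in at most one position."""
    diff = syndrome(y) ^ s_x     # = syndrome of the error pattern x XOR y
    x_hat = list(y)
    if diff:                     # a nonzero syndrome names the flipped position
        x_hat[diff - 1] ^= 1
    return x_hat
```

Three bits suffice instead of seven because the decoder's side information already pins x down to a small neighborhood of y, the same phenomenon the Slepian‑Wolf bound captures.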

Metric‑Space Density Properties
Using the defined distance, the authors study the induced discrete metric space (𝔹ⁿ, E). They prove (Theorem 4) that the number of strings within radius r of any given string grows as 2^{r−O(log r)}. This exponential “density” shows that the space is richly populated, a fact that underpins the feasibility of clustering algorithms based on information distance: even in high‑dimensional settings, there are always many near‑neighbors.

Non‑Reversibility and Thermodynamic Work
A novel contribution is the introduction of a non‑reversibility distance NR(x, y), defined as the excess length required when a reversible transformation is forced to be irreversible. Leveraging Landauer’s principle, the authors translate NR into a lower bound on the thermodynamic work needed to convert x into y:

 W_min = kT ln 2 · NR(x, y)

where k is Boltzmann’s constant and T the ambient temperature. This bridges algorithmic information theory with physical limits of computation, offering a quantitative tool for evaluating energy consumption of specific input‑output transformations in low‑power or quantum devices.
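A quick numerical sketch of this bound, assuming room temperature T = 300 K and Landauer's kT ln 2 cost per irreversibly erased bit (the helper name is illustrative):

```python
import math

K_B = 1.380649e-23   # Boltzmann constant in J/K (exact under the 2019 SI)
T_ROOM = 300.0       # assumed ambient temperature in kelvin

def landauer_min_work(bits_erased, temperature=T_ROOM):
    """Lower bound (joules) on work to irreversibly erase `bits_erased` bits."""
    return K_B * temperature * math.log(2) * bits_erased
```

At 300 K each erased bit costs at least about 2.87 × 10⁻²¹ J, so the bound only becomes macroscopically significant for very large NR(x, y).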

Applications

  • Pattern Recognition: By measuring the information distance between images, audio clips, or DNA sequences, one obtains a parameter‑free similarity score that automatically adapts to the intrinsic structure of the data.
  • Data Compression & Distributed Coding: The maximal uncorrelation theorem provides a theoretical foundation for designing codes that approach the Slepian‑Wolf bound without requiring explicit statistical models.
  • Thermodynamics of Computation: The work‑lower‑bound formula can guide the design of reversible circuits, adiabatic logic, or even biological information processes where energy efficiency is paramount.
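The parameter‑free similarity score mentioned under pattern recognition is, in practice, usually realized as the normalized compression distance (NCD), a compressor‑based instantiation of information distance developed in follow‑up work by Cilibrasi and Vitányi; a minimal sketch with zlib as the compressor:

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 for identical inputs,
    near 1 for unrelated ones."""
    cx = len(zlib.compress(x, 9))
    cy = len(zlib.compress(y, 9))
    cxy = len(zlib.compress(x + y, 9))
    return (cxy - min(cx, cy)) / max(cx, cy)
```

Pairwise NCD values can feed directly into standard clustering algorithms, with no feature engineering or statistical model of the data required.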

Conclusion
The paper delivers a comprehensive theory of information distance, showing that several seemingly different definitions converge up to a negligible logarithmic term. It establishes the metric as a universal measure of cognitive similarity, analyses the correlation structure of the shortest programs involved, characterizes the geometry of the induced metric space, and connects the abstract notion to concrete thermodynamic costs. By unifying concepts from algorithmic information theory, coding theory, and physics, the work opens new avenues for both theoretical investigation and practical system design in areas ranging from machine learning to low‑energy computing.

