An additivity theorem for plain Kolmogorov complexity
We prove the formula C(a,b) = K(a|C(a,b)) + C(b|a,C(a,b)) + O(1) that expresses the plain complexity of a pair in terms of prefix and plain conditional complexities of its components.
💡 Research Summary
The paper investigates the relationship between plain (non‑prefix) Kolmogorov complexity C and prefix‑free Kolmogorov complexity K when describing a pair of binary strings (a, b). The classical Kolmogorov-Levin theorem states that the joint plain complexity satisfies C(a,b)=C(a)+C(b|a)+O(log n) for strings a, b of length at most n, and the O(log n) term cannot be eliminated. Levin later showed an O(1)‑precise formula for the prefix‑free version: K(a,b)=K(a)+K(b|a,K(a))+O(1). The authors bridge the gap by proving an O(1)‑precise additivity theorem for plain complexity that involves both K and C:
Theorem 1. C(a,b)=K(a | C(a,b)) + C(b | a, C(a,b)) + O(1).
The first term is the minimal length of a self‑delimiting program that reconstructs a when the value of C(a,b) is given; the second term is the minimal length of an ordinary (non‑self‑delimiting) program that reconstructs b given both a and C(a,b).
The proof consists of two inequalities.
Upper bound (≤): Write n=C(a,b). Let p be a prefix‑free program of length K(a | n) that computes a from n, and let q be a (possibly non‑self‑delimiting) program of length C(b | a, n) that computes b from a and n. Because p is self‑delimiting, the concatenation pq can be parsed uniquely once n is known, so pq is a description of (a,b) given n, of length |p|+|q|. To remove the dependence on n, suppose the inequality failed by d, that is, n=|p|+|q|+d with d>0. Encoding d in O(log d) self‑delimiting bits and prefixing it to pq gives an unconditional description of (a,b), because n can be recomputed as |pq|+d. Hence n ≤ |p|+|q|+O(log d) = n−d+O(log d), which forces d=O(1).
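The unique-parsing step can be illustrated with a small Python sketch. It is only a toy: the doubling code used here is a hypothetical stand-in for a shortest prefix‑free program p (it is far from optimal in length), and the function names are illustrative, not taken from the paper. What it shows is that a self‑delimiting first part lets a decoder split pq without any separator, while q itself needs no delimiter because it runs to the end of the input.

```python
from typing import Tuple


def self_delimit(bits: str) -> str:
    """Encode a bit string so that its end can be detected from the code itself.

    Each payload bit is doubled ("0" -> "00", "1" -> "11") and the pair "01"
    marks the end. The codewords form a prefix-free set, at a cost of
    2 * len(bits) + 2 bits (a toy code, not an optimal one).
    """
    return "".join(b + b for b in bits) + "01"


def split_concatenation(stream: str) -> Tuple[str, str]:
    """Recover (p, q) from self_delimit(p) + q by reading two bits at a time."""
    p_bits = []
    i = 0
    while True:
        pair = stream[i:i + 2]
        i += 2
        if pair == "01":          # terminator: the first part ends here
            break
        if pair not in ("00", "11"):
            raise ValueError("stream does not start with a valid codeword")
        p_bits.append(pair[0])
    # Everything after the terminator is the plain (non-self-delimiting) part.
    return "".join(p_bits), stream[i:]


p, q = "1011", "0100"
recovered = split_concatenation(self_delimit(p) + q)
```

Swapping the roles (plain part first, self‑delimiting part second) would not work with this decoder, since nothing would mark where the plain part ends.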
Lower bound (≥): Fix n=C(a,b). Enumerate all pairs (x,y) with C(x,y)≤n; there are fewer than 2^{n+1} such pairs. For each fixed x, assign weight 2^{−n−1} to each y that appears with x. This defines a conditional semimeasure P(x|n)=N_x·2^{−n−1}, where N_x is the number of y’s with C(x,y)≤n; the weights sum to less than 1, and P can be approximated from below by running the enumeration. By the standard coding theorem, K(a|n)≤−log P(a|n)+O(1)=n−log N_a+O(1). On the other hand, given a and n we can enumerate the N_a candidates for y and specify b by its index in that enumeration, so C(b|a,n)≤log N_a+O(1). Adding the two inequalities yields K(a|n)+C(b|a,n)≤n+O(1), which is the desired inequality C(a,b) ≥ K(a | C(a,b)) + C(b | a, C(a,b)) − O(1) after substituting n=C(a,b). Together with the upper bound this gives Theorem 1.
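The bookkeeping in this counting argument can be checked on a toy example. In the Python sketch below, the list `pairs` is a hypothetical finite set standing in for {(x,y) : C(x,y)≤n} (the real set is only enumerable, not computable), and the helper names are illustrative. The sketch confirms that the weights N_x·2^{−n−1} sum to at most 1 and that the two code lengths, −log P(x|n) for x and log N_x for the index of y, always add up to n+1.

```python
from collections import Counter
from fractions import Fraction
from math import log2


def semimeasure(pairs, n):
    """Return {x: N_x * 2^-(n+1)}, where N_x counts the y paired with x."""
    counts = Counter(x for x, _ in set(pairs))
    weight = Fraction(1, 2 ** (n + 1))
    return {x: c * weight for x, c in counts.items()}


def two_part_length(pairs, n, x):
    """Ideal code length for x under P plus log N_x bits for the index of y."""
    n_x = sum(1 for u, _ in set(pairs) if u == x)
    p_x = semimeasure(pairs, n)[x]
    return -log2(float(p_x)) + log2(n_x)


# Hypothetical toy data: 7 pairs, fewer than 2^(n+1) = 16 for n = 3.
n = 3
pairs = [("a", "0"), ("a", "1"), ("a", "00"), ("a", "01"),
         ("b", "0"), ("b", "1"),
         ("c", "0")]

P = semimeasure(pairs, n)
total = sum(P.values())                       # must not exceed 1
lengths = {x: two_part_length(pairs, n, x) for x in P}
```

The log N_x terms cancel exactly, which is why the sum of the two bounds does not depend on how the pairs are distributed among the different x.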
From Theorem 1 several known O(1)‑precise identities follow immediately:
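- C(a)=K(a | C(a))+O(1) (take b to be the empty string) and C(b)=C(b | C(b))+O(1) (take a to be the empty string), which together give C(u | C(u))=K(u | C(u))+O(1).
- If a is a computable function of b, say a=f(b), then C(a,b)=C(b)+O(1) and hence C(b)=K(f(b) | C(b))+C(b | f(b),C(b))+O(1).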
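For readers who want the intermediate steps, the specialisations can be written out as follows. This is a sketch of the routine argument rather than a quotation from the paper; the symbol Λ for the empty string is notation introduced here, and the steps use the standard facts C(a,Λ)=C(a)+O(1), C(Λ | ·)=O(1), and K(Λ | ·)=O(1).

```latex
\begin{align*}
b=\Lambda:\quad & C(a) = K(a \mid C(a)) + \underbrace{C(\Lambda \mid a, C(a))}_{O(1)} + O(1)
                 = K(a \mid C(a)) + O(1), \\
a=\Lambda:\quad & C(b) = \underbrace{K(\Lambda \mid C(b))}_{O(1)} + C(b \mid C(b)) + O(1)
                 = C(b \mid C(b)) + O(1), \\
a=f(b):\quad    & C(f(b), b) = C(b) + O(1)
                 \;\Longrightarrow\;
                 C(b) = K(f(b) \mid C(b)) + C(b \mid f(b), C(b)) + O(1).
\end{align*}
```

In the last line the condition C(f(b),b) has been replaced by C(b); since the two numbers differ by O(1), each conditional complexity changes by at most O(1).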
The authors also show that swapping the roles of C and K in the two terms cannot yield a true O(1) formula. They construct infinitely many pairs (x,y) with C(x,y)≥C(x)+K(y|x)+log n−2 log log n−O(1) (n=|x|+|y|), demonstrating that C(x,y)≤C(x)+K(y|x)+O(1) fails in general.
Finally, they prove a fixed‑point property: for every pair (x,y) there exists a unique (k,l) (up to O(1) precision) such that C(x|l)=k and C(y|x,k)=l. This pair satisfies C(x,y|k,l)=k+l, showing that the information distance between (k,l) and (k′,l′) is only logarithmic in the coordinate differences. The result implies that K(y|x) cannot be defined simply as the minimal prefix‑free program length mapping x to y, because such a definition would contradict the constructed lower bound on C(x,y).
In summary, the paper delivers a clean, O(1)‑precise additive decomposition of plain Kolmogorov complexity using a hybrid of prefix‑free and plain conditional complexities. It refines the classical symmetry‑of‑information theorem and clarifies the limits of mixing C and K in such formulas.