Memoryless Near-Collisions, Revisited

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

In this paper we discuss the problem of generically finding near-collisions for cryptographic hash functions in a memoryless way. A common approach is to truncate several output bits of the hash function and to look for collisions of this modified function. In two recent papers, an enhancement to this approach was introduced which is based on classical cycle-finding techniques and covering codes. This paper investigates two aspects of the problem of memoryless near-collisions. Firstly, we give a full treatment of the trade-off between the number of truncated bits and the success probability of the truncation-based approach. Secondly, we demonstrate the limits of cycle-finding methods for finding near-collisions by showing that, as opposed to the collision case, a memoryless variant cannot match the query complexity of the “memory-full” birthday-like near-collision finding method.


💡 Research Summary

This paper revisits the problem of finding near‑collisions for cryptographic hash functions without storing large tables of intermediate values. A near‑collision is defined as a pair of distinct inputs x and y such that the Hamming distance between H(x) and H(y) does not exceed a prescribed threshold ε. The classic birthday‑type attack for near‑collisions stores O(2^{(n‑ε)/2}) hash outputs (where n is the output length) and looks for two values whose distance is ≤ ε. While this “memory‑full” approach achieves optimal query complexity, it is impractical in environments where memory is scarce.
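The near-collision condition defined above can be checked directly. The following is a minimal sketch, assuming SHA-256 as the hash H; the helper names `hamming_distance` and `is_near_collision` are illustrative and do not come from the paper.

```python
import hashlib


def hamming_distance(a: bytes, b: bytes) -> int:
    """Number of bit positions in which two equal-length byte strings differ."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def is_near_collision(x: bytes, y: bytes, eps: int) -> bool:
    """True if x != y and the SHA-256 digests differ in at most eps bits.

    SHA-256 stands in for H here; this is an assumption for illustration.
    """
    if x == y:
        return False
    hx = hashlib.sha256(x).digest()
    hy = hashlib.sha256(y).digest()
    return hamming_distance(hx, hy) <= eps
```

For a 256-bit digest, two unrelated inputs are expected to differ in about 128 bits, so a small ε makes the condition rare for randomly chosen pairs.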

The authors focus on two memory‑less strategies that have appeared in recent literature. The first is the truncation method: one discards t output bits of the hash, computes the reduced function f_t(x)=Trunc_t(H(x)), and searches for ordinary collisions of f_t. Any collision of f_t guarantees that the original hash values differ in at most t bits, i.e., a (t‑near)‑collision. The paper provides a complete probabilistic analysis of the trade‑off between t and the success probability. Assuming independent random outputs, the probability that a collision appears after q queries is
 P(t,q)=1−exp(−q(q−1)/(2·2^{n‑t})).
From this expression the authors derive the optimal t for a target success probability α and a given query budget q, yielding t≈n−log₂(q²/(−2·ln(1−α))). They also give an algorithm that, given n, α and a maximum allowable number of queries, computes the exact t that minimizes the expected work. Experimental data for SHA‑256 confirm that the theoretical model predicts the observed success rates with high accuracy.
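The two expressions above can be evaluated numerically. This is a small sketch under the same independence assumption as the summary; the function names are hypothetical, and the second function uses the approximation q(q−1) ≈ q² exactly as the closed form for t does.

```python
import math


def collision_probability(n: int, t: float, q: int) -> float:
    """P(t, q) = 1 - exp(-q(q-1) / (2 * 2^(n-t))) from the summary."""
    return 1.0 - math.exp(-q * (q - 1) / (2.0 * 2.0 ** (n - t)))


def truncation_for_target(n: int, q: int, alpha: float) -> float:
    """t ~ n - log2(q^2 / (-2 ln(1 - alpha))); the result is not rounded."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return n - math.log2(q * q / (-2.0 * math.log(1.0 - alpha)))


# Toy parameters (chosen for illustration, not taken from the paper).
p = collision_probability(n=32, t=8, q=2 ** 12)
t_back = truncation_for_target(n=32, q=2 ** 12, alpha=p)
```

Feeding the computed probability back into the second function recovers t up to the small error introduced by replacing q(q−1) with q². In practice t would be rounded to an integer number of bits.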

The second line of investigation concerns cycle‑finding techniques (Floyd’s tortoise‑and‑hare, Brent’s algorithm) combined with covering codes. The idea is to iterate f_t repeatedly, generating a sequence x, f_t(x), f_t²(x), … until a cycle is detected. If two distinct positions in the sequence share the same truncated value, the original hashes are guaranteed to be within t bits of each other. The authors model the iteration as a Markov chain and compute the expected length of the pre‑period and the cycle. They prove that, unlike the pure collision case, the expected number of queries needed to obtain a genuine ε‑near‑collision is lower‑bounded by Ω(2^{(n‑t)/2}·√t). Consequently, even with the most favorable choice of t, a memory‑less cycle‑finding attack cannot match the O(2^{(n‑ε)/2}) queries required by the memory‑full birthday method.
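The basic iterate-and-detect idea can be sketched with Floyd's algorithm on a toy instance. The sketch below assumes a 24-bit "hash" built from the first three bytes of SHA-256 and drops t = 8 bits; `toy_hash`, `f` and `floyd_collision` are hypothetical names, and the covering-code enhancement is not included.

```python
import hashlib
from typing import Optional, Tuple

N_BYTES = 3     # toy hash output length: n = 24 bits (assumption)
KEEP_BYTES = 2  # bits kept after truncation: n - t = 16, so t = 8


def toy_hash(x: bytes) -> bytes:
    """Stand-in for H: the first N_BYTES bytes of SHA-256."""
    return hashlib.sha256(x).digest()[:N_BYTES]


def f(x: bytes) -> bytes:
    """Truncated function f_t: drop the last t bits of the toy hash."""
    return toy_hash(x)[:KEEP_BYTES]


def floyd_collision(start: bytes) -> Optional[Tuple[bytes, bytes]]:
    """Return distinct x, y with f(x) == f(y), storing only two values."""
    # Phase 1: tortoise moves one step, hare two, until they meet on the cycle.
    tortoise, hare = f(start), f(f(start))
    while tortoise != hare:
        tortoise = f(tortoise)
        hare = f(f(hare))
    # Phase 2: restart the tortoise; both now advance one step at a time.
    tortoise = start
    if tortoise == hare:
        return None  # the start lies on the cycle, so there is no tail
    while True:
        next_t, next_h = f(tortoise), f(hare)
        if next_t == next_h:
            return tortoise, hare  # two different preimages of one value
        tortoise, hare = next_t, next_h


pair = floyd_collision(b"seed0")
```

Because the two returned inputs agree on the kept 16 bits, their 24-bit toy hashes can differ only in the 8 discarded bits, which is the t-near-collision guarantee described above.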

Covering codes are introduced as a way to reduce the effective search space. By partitioning the n‑bit space into codewords of minimum distance d (where d≈ε), each partition can be processed independently with its own cycle‑finding routine. The paper shows that the number of partitions grows roughly as 2^{n−log₂|C|}, while the per‑partition query cost grows with the covering radius. The net effect is a trade‑off: decreasing the covering radius (i.e., tightening the near‑collision bound) dramatically increases the required queries, offsetting any memory savings.
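The role of the covering radius can be illustrated with a small, well-known code. The summary does not say which codes the authors use, so the choice of the Hamming(7,4) code below is purely an assumption for illustration, and `nearest_codeword` is a hypothetical helper. This code has covering radius 1: every 7-bit word lies within one bit flip of a codeword, so two words mapped to the same codeword differ in at most 2 bits.

```python
def nearest_codeword(word: int) -> int:
    """Map a 7-bit word to its nearest Hamming(7,4) codeword.

    Bit position p (1..7) is stored in bit p-1 of the integer. The syndrome
    is the XOR of the positions of all set bits; a nonzero syndrome names
    the single position to flip.
    """
    if not 0 <= word < 128:
        raise ValueError("word must fit in 7 bits")
    syndrome = 0
    for pos in range(1, 8):
        if (word >> (pos - 1)) & 1:
            syndrome ^= pos
    if syndrome:
        word ^= 1 << (syndrome - 1)
    return word


# Group all 128 words by the codeword they are mapped to.
buckets = {}
for w in range(128):
    buckets.setdefault(nearest_codeword(w), []).append(w)
```

Searching for collisions of "hash followed by this mapping" instead of plain truncation is one way such a code could be used: a collision after the mapping implies that the two 7-bit blocks are within twice the covering radius of each other.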

The authors validate their theoretical claims with extensive simulations. For a 256‑bit hash and ε=64, the truncation method with t=40–48 bits reaches a 90 % success probability after roughly 2^{20}–2^{22} queries, using virtually no memory. In contrast, the cycle‑finding approach with the same parameters needs at least 2^{30} queries to achieve comparable success, which is worse than the memory‑full birthday attack that succeeds around 2^{28} queries. The covering‑code variant shows similar behavior: modest memory reductions are possible, but the query overhead grows super‑linearly with the desired ε.

In summary, the paper makes two principal contributions. First, it delivers a precise analytical framework for selecting the truncation length t that balances query effort against success probability in memory‑less near‑collision attacks. Second, it establishes a fundamental limitation of cycle‑finding and covering‑code based memory‑less techniques: they cannot attain the optimal query complexity of the birthday‑type method for near‑collisions. These results provide clear guidance for practitioners who must choose between memory consumption and computational effort when assessing the security of hash functions against near‑collision attacks.

