Recover plaintext attack to block ciphers

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

We present an estimate of an upper bound on the number of distinct 16-byte plaintext blocks of English text. The estimate indicates that block ciphers with a block length of at most 16 bytes are subject to recover-plaintext attacks in known-plaintext or chosen-plaintext settings.

💡 Research Summary

The paper titled “Recover plaintext attack to block ciphers” attempts to argue that block ciphers with a block size of 16 bytes (e.g., AES) are vulnerable to a so‑called “recover‑plaintext attack” when the plaintext consists of English text. The authors begin by defining a vocabulary Q of English words, assuming at least 60 000 entries, each composed solely of lowercase letters. They model a 16‑byte plaintext block as a “k‑term” sequence of words (or word fragments) separated by spaces and possibly punctuated by one of three punctuation marks (comma, period, semicolon). Four structural patterns (equations (1.1)–(1.4)) are introduced to capture the possible placements of spaces and punctuation.

Using elementary combinatorics, the authors enumerate the number of possible blocks for each pattern. They split the set of all 16‑byte blocks F into three subsets: those beginning with a lowercase letter (ℱ), those beginning with an uppercase letter (ℱ′), and those beginning with a punctuation mark (ℱ″). For each subset they count the combinations of word lengths, the number of ways to choose words from Q, and the ways to insert spaces and punctuation. The derivations involve approximations such as Stirling’s formula and a constant μ that is meant to capture the distribution of word lengths. After a cascade of inequalities (equations (2.1) through (2.15)), they arrive at an upper bound

|F| ≤ 2⁵⁶ · 3 · 8 · 10²

which they simplify to roughly 2⁵⁶ possible distinct 16‑byte English plaintext blocks.
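The flavor of this counting argument can be illustrated at toy scale. The sketch below is not the paper's derivation: it assumes a block consists only of whole words joined by single spaces, and it ignores the word fragments, capitalization, and punctuation patterns that the paper handles. The function name `count_blocks` and the five-word vocabulary are hypothetical.

```python
from collections import Counter


def count_blocks(vocabulary, block_len):
    """Count strings of exactly block_len characters that consist of
    whole words from `vocabulary` joined by single spaces.

    Simplifying assumption: no word fragments, no punctuation,
    no uppercase letters.
    """
    # Number of distinct words available for each word length.
    words_by_len = Counter(len(w) for w in set(vocabulary))

    # ways[n] = number of valid strings with exactly n characters.
    ways = [0] * (block_len + 1)
    for n in range(1, block_len + 1):
        # Case 1: a single word fills all n characters.
        total = words_by_len.get(n, 0)
        # Case 2: a shorter valid string, one space, then a final word.
        for last_len, num_words in words_by_len.items():
            prefix_len = n - last_len - 1  # 1 accounts for the space
            if prefix_len >= 1:
                total += ways[prefix_len] * num_words
        ways[n] = total
    return ways[block_len]


toy_vocab = ["a", "an", "the", "cat", "block"]
# The 5-character strings are: "block", "the a", "cat a", "a the",
# "a cat", "an an", "a a a".
print(count_blocks(toy_vocab, 5))  # 7
```

With a realistic vocabulary and a block length of 16, the same recurrence gives a count for this restricted model only. It says nothing about blocks that start or end in the middle of a word.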

The paper then proposes a “recover‑plaintext attack” based on the birthday paradox. The idea is that if the total number of possible plaintext blocks is at most 2ᵐ, an adversary who holds a dictionary of about ½·2ᵐ (ciphertext, plaintext) pairs can, after observing roughly ½·2ᵐ ciphertexts, expect with high probability a match that reveals the underlying plaintext. Because the authors claim |F| ≤ 2⁵⁶, they argue that an attacker needs only about 2⁵⁵ ciphertexts to mount a successful attack against any 16‑byte block cipher.
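The dictionary-matching idea can be made concrete with a short calculation. The sketch below assumes that observed blocks are drawn uniformly and independently from the plaintext space, and that encryption is deterministic under a fixed key (the same plaintext block always yields the same ciphertext block). Both are simplifying assumptions of this illustration, not statements taken from the paper, and `hit_probability` is a hypothetical helper name.

```python
import math


def hit_probability(space_size, dictionary_size, observed):
    """Probability that at least one of `observed` ciphertext blocks
    matches a dictionary entry.

    Assumes uniform, independent blocks and deterministic encryption
    under a single fixed key.
    """
    coverage = dictionary_size / space_size  # fraction of the space covered
    if coverage >= 1.0:
        return 1.0 if observed > 0 else 0.0
    # 1 - (1 - coverage)**observed, evaluated stably for huge exponents.
    return -math.expm1(observed * math.log1p(-coverage))


# Paper's setting: space of 2**56 blocks, dictionary and sample of 2**55.
print(hit_probability(2**56, 2**55, 2**55))
# A dictionary covering half the space matches any single block
# with probability 1/2.
print(hit_probability(2**56, 2**55, 1))
```

Under these assumptions, each observed block is found in the dictionary with probability equal to the dictionary's coverage of the space, so the attack's success depends on the dictionary size far more than on the number of observed ciphertexts.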

The authors conclude that block ciphers with a 16‑byte block length, such as AES, are therefore susceptible to recover‑plaintext attacks when encrypting English text under known‑plaintext or chosen‑plaintext scenarios.

Critical assessment

Unrealistic linguistic assumptions – Real English text contains uppercase letters, digits, hyphens, slashes, parentheses, a wide variety of punctuation marks, and often non‑ASCII characters (e.g., accented letters). The paper restricts the alphabet to 26 lowercase letters and only three punctuation symbols, dramatically under‑estimating the true plaintext space.
Oversimplified word‑length distribution – The constant μ is introduced without empirical justification. English word frequencies follow Zipf’s law, and the distribution of word lengths is far from uniform. Assuming a single constant to bound all |Q_i| leads to a very loose upper bound.
Neglect of padding and encoding – In practice, plaintexts are padded (PKCS#7, ISO/IEC 7816‑4, etc.) before encryption, and often encoded (UTF‑8, UTF‑16). These steps change the byte‑level structure of the blocks that are actually encrypted, which the paper's count ignores.
Attack cost is astronomically high – Even if |F| ≈ 2⁵⁶, a dictionary of all possible (ciphertext, plaintext) pairs would hold 2⁵⁶ entries. At 16 bytes per block, the ciphertexts alone occupy 2⁵⁶ × 16 = 2⁶⁰ bytes (≈ 1.15 exabytes), and storing the matching plaintexts doubles that to 2⁶¹ bytes (≈ 2.3 exabytes), in addition to the computational effort of encrypting each entry. With current technology, such a pre‑computation is impractical.
Key space vs. plaintext space – The security of a block cipher is primarily determined by the size of the key space (AES‑128: 2¹²⁸) and the cipher’s resistance to cryptanalysis, not by the cardinality of the plaintext space. A small plaintext space does not reduce the effective key entropy.
Missing formal security definitions – Modern cryptography evaluates schemes against IND‑CPA, IND‑CCA, or related notions. The paper does not reference these definitions, nor does it demonstrate that the proposed attack violates them. Simply showing that the plaintext space is “small” does not imply a breach of IND‑CPA security.
Statistical vs. deterministic attacks – The birthday paradox argument yields a probabilistic collision; it does not guarantee that the collided ciphertexts correspond to the same plaintext unless the attacker already knows the mapping. In a known‑plaintext scenario the attacker already has the plaintext, making the attack trivial; in a chosen‑plaintext scenario the attacker can directly request encryptions of any desired block, again trivializing the claim.
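
The storage figure for a full dictionary can be checked with a few lines of arithmetic. This sketch assumes 2⁵⁶ entries and 16 bytes per stored block, with no indexing or filesystem overhead; `dictionary_bytes` is a hypothetical helper name.

```python
def dictionary_bytes(log2_entries, bytes_per_entry):
    """Raw size in bytes of a table with 2**log2_entries entries."""
    return (1 << log2_entries) * bytes_per_entry


EXABYTE = 10**18  # decimal exabyte

ciphertexts_only = dictionary_bytes(56, 16)  # one 16-byte block per entry
full_pairs = dictionary_bytes(56, 32)        # ciphertext plus plaintext

print(ciphertexts_only == 2**60)             # True
print(round(ciphertexts_only / EXABYTE, 2))  # 1.15
print(round(full_pairs / EXABYTE, 2))        # 2.31
```

Any practical lookup structure would add overhead on top of these raw sizes, so they are best read as lower bounds for the storage requirement.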

Conclusion

While the paper raises an interesting theoretical question—how the combinatorial size of a language’s short‑block plaintext space relates to block‑cipher security—the analysis is built on overly simplistic linguistic models, ignores practical encoding and padding, and vastly underestimates the computational resources required for the claimed attack. Consequently, the conclusion that AES‑128 (or any 16‑byte block cipher) is vulnerable to a “recover‑plaintext attack” on English text is not supported by realistic cryptographic reasoning. Future work would need to incorporate actual corpora statistics, consider full Unicode character sets, account for padding schemes, and evaluate attack feasibility in terms of time, memory, and required ciphertexts under established security definitions.
