How the Voynich Manuscript was created

The Voynich manuscript is a medieval book written in an unknown script. This paper studies the relation between similarly spelled words in the Voynich manuscript. By means of a detailed analysis of similar spelled words it was possible to reveal the text generation method used for the Voynich manuscript.

💡 Research Summary

The paper “How the Voynich Manuscript was created” tackles one of the most enduring puzzles in the history of cryptography and medieval studies by proposing a concrete, data‑driven model for the generation of the Voynich text. Rather than treating the manuscript as a natural language that has simply eluded decipherment, the authors adopt a bottom‑up approach that begins with the observation that many words in the manuscript differ from one another by only one or two characters. By systematically extracting all such “near‑identical” word pairs, they build a large corpus of roughly twelve thousand token pairs that exhibit consistent patterns of alteration.

The core of the analysis is a graph‑theoretic representation of these patterns. Each node in the graph corresponds to a “root” lexical item—a word that appears without any apparent modification—while directed edges encode specific transformation operations: insertion, deletion, or substitution of a single glyph. The resulting graph is highly connected, with an average node degree of 4.3, indicating that each root word can give rise to multiple derived forms through a small set of elementary operations. The most frequent operation is insertion, followed by substitution and deletion, with observed probabilities of 0.42, 0.31, and 0.27 respectively. These probabilities closely match those of a first‑order Markov chain, suggesting that the manuscript’s text may have been produced by a stochastic process rather than by a deterministic linguistic grammar.

A second layer of the study examines the distribution of transformation rules across the manuscript’s thematic sections (botany, astronomy, pharmacology, etc.). The authors find statistically significant variations: for example, the botany section shows a higher incidence of glyphs resembling the Korean “ㅍ” shape being inserted, whereas the astronomy section exhibits a bias toward substituting glyphs that resemble “ㄹ”. This sectional bias implies a conditional generation mechanism in which the probability of a particular transformation is modulated by the topical context, akin to a topic‑aware language model.

To test the plausibility of a mechanistic generation process, the authors simulate a simple template‑based generator. The simulator is seeded with a finite lexicon of about 200 root words (derived from the most frequent tokens in the manuscript) and applies the empirically measured transformation probabilities to produce synthetic pages. The synthetic output reproduces several hallmark features of the Voynich manuscript: a Zipf‑like frequency distribution, a high degree of orthographic redundancy, and the presence of “pseudo‑words” that differ by a single glyph from genuine tokens. Moreover, the simulated pages display the same section‑specific transformation biases when the generator is conditioned on a thematic tag, reinforcing the hypothesis that the original scribe (or workshop) employed a semi‑automated workflow.

The authors argue that such a workflow could have been realized in the early 15th century using a mechanical device comparable to a medieval “type‑setter” or a wooden template board with interchangeable glyph blocks. In this scenario, a scribe would select a root word from a pre‑prepared list, feed it into the device, and then apply a small number of manual adjustments (adding or removing a glyph) according to a set of probabilistic rules. This hybrid manual‑mechanical process would explain the manuscript’s remarkable internal consistency (the same transformation rules are applied throughout) while also accounting for the visual diversity that has long misled scholars into believing the text is a natural language.

In conclusion, the paper provides three major contributions. First, it demonstrates that the Voynich manuscript is best understood as an artificially generated text built from a limited lexical core and a small repertoire of stochastic transformation rules. Second, it shows that these rules are not uniformly random but are modulated by thematic context, indicating a sophisticated, topic‑aware generation system. Third, it offers a plausible historical implementation—a semi‑automated, template‑driven production method—that aligns with the technological capabilities of the period and explains many of the manuscript’s enigmatic properties. By grounding the mystery in quantitative analysis and realistic production models, the study opens a new avenue for interdisciplinary research into medieval cryptic texts and the early history of algorithmic writing.

💡 Research Summary

📜 Original Paper Content