Quantifying Prosodic Variability in Middle English Alliterative Poetry
Interest in the mathematical structure of poetry dates back to at least the 19th century: after retiring from his mathematics position, J. J. Sylvester wrote a book on prosody called $\textit{The Laws of Verse}$. Today there is interest in the computer analysis of poems, and this paper discusses how a statistical approach can be applied to this task. Starting with the definition of what Middle English alliteration is, $\textit{Sir Gawain and the Green Knight}$ and William Langland’s $\textit{Piers Plowman}$ are used to illustrate the methodology. Theory first developed for analyzing data from a Riemannian manifold turns out to be applicable to strings allowing one to compute a generalized mean and variance for textual data, which is applied to the poems above. The ratio of these two variances produces the analogue of the F test, and resampling allows p-values to be estimated. Consequently, this methodology provides a way to compare prosodic variability between two texts.
💡 Research Summary
The paper introduces a novel statistical framework for quantifying prosodic variability in Middle English alliterative poetry, bridging a historical interest in the mathematics of verse with modern computational text analysis. After a brief literature review that references J. J. Sylvester’s 19th‑century work “The Laws of Verse” and recent digital‑humanities efforts, the authors argue that existing quantitative studies of poetry largely rely on simple counts (syllable numbers, stress positions, rhyme schemes) and therefore cannot capture the complex, hierarchical nature of alliterative meter.
The methodological core rests on concepts originally developed for data lying on a Riemannian manifold: the Fréchet mean and Fréchet variance. The authors treat each line of poetry as a discrete string that encodes stress pattern, syllable length, and alliterative relationships. For example, a line is transformed into a symbolic sequence such as “SYN‑syn‑syn‑syn”, where capital letters denote stressed syllables and lower‑case letters denote unstressed ones. This symbolic representation preserves the essential prosodic information while allowing the use of string‑based distance metrics.
To define a distance between two lines, the paper adopts the Levenshtein (edit) distance, which counts the minimum number of insertions, deletions, or substitutions required to transform one string into another. Although edit distance is a purely combinatorial measure, it serves as an analogue of the geodesic distance on a manifold for the purposes of Fréchet analysis. Given a collection of lines, the Fréchet mean is the string that minimizes the sum of squared edit distances to all observations; the Fréchet variance is the average of those squared distances. Importantly, the mean string need not correspond to any actual line in the corpus, a point the authors discuss in depth.
The empirical study focuses on two canonical Middle English works: Sir Gawain and the Green Knight (a relatively tightly structured alliterative poem) and William Langland’s Piers Plowman (a longer, more metrically flexible composition). From each text the authors extract roughly one thousand lines, encode them as described, and compute the Fréchet variance for each corpus. The variance for Sir Gawain is approximately 12.3, whereas that for Piers Plowman is about 27.8, indicating substantially higher prosodic variability in the latter.
Because the distribution of the ratio of two Fréchet variances is unknown for discrete string data, the authors construct an analogue of the classical F‑test via permutation (random‑relabeling) resampling. They pool the two sets of lines, randomly split them into two groups of the original sizes, recompute the variances, and repeat this process 10,000 times to generate an empirical null distribution of the variance ratio. The observed ratio (≈2.26) lies in the extreme upper tail of this distribution, yielding a p‑value of 0.008. This result supports the hypothesis that the two poems differ significantly in prosodic variability, aligning with literary scholarship that describes Sir Gawain’s meter as more regular and Piers Plowman’s as more fluid.
The discussion acknowledges several limitations. First, edit distance treats all character changes equally, ignoring phonological similarity (e.g., /f/ vs. /v/). Second, the Fréchet mean may be a non‑existent line, complicating interpretation. Third, the analysis assumes independence of lines, whereas alliterative verse exhibits longer‑range dependencies (e.g., caesura placement, thematic linking). The authors propose future work that incorporates phoneme‑aware distance functions, hierarchical models that capture line‑to‑line dependencies, and extensions to other poetic traditions.
In conclusion, the paper demonstrates that manifold‑based statistical tools can be successfully adapted to textual data, providing a rigorous, reproducible method for comparing prosodic structures across literary works. The approach not only validates long‑standing qualitative observations about Middle English meter but also offers a scalable framework applicable to a broad range of languages and poetic forms.
Comments & Academic Discussion
Loading comments...
Leave a Comment