Segmentation and Context of Literary and Musical Sequences
We test a segmentation algorithm, based on the calculation of the Jensen-Shannon divergence between probability distributions, on two symbolic sequences of literary and musical origin. The first sequence represents the successive appearance of characters in a theatrical play, and the second represents the succession of tones from the twelve-tone scale in a keyboard sonata. The algorithm divides the sequences into segments of maximal compositional divergence between them. For the play, these segments are related to changes in the frequency of appearance of different characters and in the geographical setting of the action. For the sonata, the segments correspond to tonal domains and reveal in detail the characteristic tonal progression of this kind of musical composition.
💡 Research Summary
The paper presents a unified segmentation method based on the Jensen-Shannon divergence (JSD) and demonstrates its applicability to two very different symbolic sequences: the chronological list of characters appearing in a theatrical play and the succession of pitch classes, drawn from the twelve-tone scale, in a keyboard sonata. The core idea is simple: slide a fixed-size window along the sequence, estimate the empirical probability distribution of symbols (characters or pitch classes) inside each window, and compute the JSD between adjacent windows. Because JSD is a symmetrized, bounded variant of the Kullback-Leibler divergence, it stays finite even when a symbol has zero frequency in one of the windows, which makes it suitable for sparse categorical data. A segmentation point is placed where the JSD reaches a local maximum, indicating that the two neighboring windows differ most strongly in composition.
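The windowed procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`distribution`, `entropy`, `jsd`, `jsd_profile`, `local_maxima`), the equal weighting of the two windows, the `threshold` parameter, and the toy sequence are all hypothetical choices made for the example.

```python
import math
from collections import Counter


def distribution(window, alphabet):
    """Empirical symbol frequencies of one window, in a fixed alphabet order."""
    counts = Counter(window)
    return [counts[s] / len(window) for s in alphabet]


def entropy(p):
    """Shannon entropy in bits; zero-probability symbols contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def jsd(p, q):
    """Jensen-Shannon divergence with equal weights (assumption), at most 1 bit."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return entropy(m) - (entropy(p) + entropy(q)) / 2


def jsd_profile(seq, w):
    """Map each candidate cut i to the JSD between seq[i-w:i] and seq[i:i+w]."""
    alphabet = sorted(set(seq))
    return {
        i: jsd(distribution(seq[i - w:i], alphabet),
               distribution(seq[i:i + w], alphabet))
        for i in range(w, len(seq) - w + 1)
    }


def local_maxima(profile, threshold):
    """Cuts whose JSD is at least `threshold` and peaks relative to both neighbours."""
    keys = sorted(profile)
    return [
        i for prev, i, nxt in zip(keys, keys[1:], keys[2:])
        if profile[i] >= threshold
        and profile[i] > profile[prev]
        and profile[i] >= profile[nxt]
    ]


toy = "A" * 20 + "B" * 20  # toy sequence with one obvious change of composition
print(local_maxima(jsd_profile(toy, w=5), threshold=0.5))
```

On the toy sequence the only reported cut is index 20, where the left window contains only `A` and the right window only `B`, so the divergence reaches its maximum of 1 bit. On real data the threshold and window length would need tuning.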
For the dramatic text, each distinct character (that is, each member of the cast, not each letter) is encoded as a unique token. The authors experimented with window lengths of 200–300 tokens, a range that captures enough interactions to reflect meaningful narrative shifts while preserving temporal resolution. When the algorithm identifies a high-JSD boundary, the authors examine the character-frequency profiles on either side. The resulting segments correspond closely to conventional literary markers: the introduction of a new protagonist, the departure of a major figure, or a change of setting (e.g., moving from Venice to Rome). In other words, the statistical segmentation recovers boundaries close to the act and scene divisions that a literary scholar would note, but does so automatically and quantitatively.
In the musical domain, the twelve pitch classes (C, C♯, …, B) are mapped to integers 0–11, and the sonata is reduced to a sequence of these integers, ignoring rhythmic values. A window of 30–40 notes is used, roughly matching the length of a phrase or thematic unit. Again, the JSD between neighboring windows is computed, and peaks signal a tonal shift. The algorithm successfully isolates the conventional sections of sonata form—exposition, development, and recapitulation—without any prior knowledge of the piece’s structure. Within the development section, the method reveals a cascade of tonal domains, reflecting the composer’s characteristic progression through related keys. This demonstrates that JSD‑based segmentation can capture the hierarchical tonal architecture that music theorists describe qualitatively.
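The encoding step for the sonata can be illustrated with a short Python sketch. The note-name format used here (a letter, optional sharps or flats, and an optional octave digit that is discarded) is an assumption made for the example; the summary does not say how the score was actually transcribed, and `pitch_class` and `encode` are hypothetical helper names.

```python
# Pitch classes of the seven natural notes, with C mapped to 0.
NATURAL_PITCH_CLASS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def pitch_class(note):
    """Map a note name such as 'C#4' or 'B♭' to an integer 0-11.

    Octave digits are ignored, so all Cs collapse to 0 (assumed input format).
    """
    pc = NATURAL_PITCH_CLASS[note[0].upper()]
    for ch in note[1:]:
        if ch in ("#", "♯"):
            pc += 1          # each sharp raises the pitch by one semitone
        elif ch in ("b", "♭"):
            pc -= 1          # each flat lowers the pitch by one semitone
        elif not ch.isdigit():
            raise ValueError(f"unrecognised symbol {ch!r} in {note!r}")
    return pc % 12


def encode(notes):
    """Reduce a list of note names to a pitch-class sequence, dropping rhythm."""
    return [pitch_class(n) for n in notes]


print(encode(["C4", "E4", "G4", "B♭4", "C♯5"]))  # prints [0, 4, 7, 10, 1]
```

The resulting integer sequence is the kind of symbolic input on which window-by-window frequency counts can then be computed.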
The authors highlight several strengths of their approach. First, the method is parameter‑driven (window size, minimum segment length), allowing researchers to explore multiple temporal scales: small windows expose fine‑grained changes, while larger windows reveal macro‑structural transitions. Second, because the algorithm relies only on symbol frequencies, it can be applied to any categorical time series without domain‑specific preprocessing, making it attractive for digital humanities and computational musicology pipelines. Third, the use of JSD ensures robustness to zero‑frequency events, a common issue when dealing with rare characters or seldom‑used pitch classes.
Nevertheless, the study acknowledges limitations. The choice of window size strongly influences sensitivity; an inappropriate size can either miss genuine boundaries or generate spurious ones. Very short segments suffer from unreliable frequency estimates, leading to potential false positives. Moreover, the current implementation disregards sequential dependencies beyond simple frequency counts—it does not model n‑gram contexts, rhythmic information, dynamics, or harmonic progressions that could enrich the analysis, especially for music. The authors propose future work that integrates multi‑scale analysis, Bayesian change‑point detection, and richer feature sets (e.g., inter‑onset intervals, articulation marks) to overcome these constraints.
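The warning about short windows can be made concrete with a small numerical experiment. The sketch below is not taken from the paper: it assumes symbols drawn uniformly at random from a twelve-symbol alphabet, so the two windows always share the same underlying distribution and any measured divergence is pure sampling noise. The names `jsd_windows` and `mean_null_jsd`, the number of trials, and the seed are hypothetical.

```python
import math
import random
from collections import Counter


def jsd_windows(left, right):
    """Equal-weight JSD in bits between the symbol frequencies of two windows."""
    cl, cr = Counter(left), Counter(right)
    total = 0.0
    for s in set(cl) | set(cr):
        p, q = cl[s] / len(left), cr[s] / len(right)
        m = (p + q) / 2
        if p > 0:
            total += 0.5 * p * math.log2(p / m)
        if q > 0:
            total += 0.5 * q * math.log2(q / m)
    return total


def mean_null_jsd(w, alphabet_size=12, trials=200, seed=0):
    """Average JSD between two windows of length w drawn from the SAME source."""
    rng = random.Random(seed)
    symbols = list(range(alphabet_size))
    values = []
    for _ in range(trials):
        left = rng.choices(symbols, k=w)
        right = rng.choices(symbols, k=w)
        values.append(jsd_windows(left, right))
    return sum(values) / trials


for w in (10, 40, 200):
    print(w, round(mean_null_jsd(w), 4))
```

Even though no real boundary exists in this setup, the average divergence is clearly positive and grows as the window shrinks. A peak found with very short windows therefore needs to be compared against such a baseline before it is read as a structural change.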
In conclusion, the paper demonstrates that a single statistical tool—Jensen‑Shannon divergence—can effectively segment both literary and musical sequences, revealing meaningful structural boundaries that align with established scholarly interpretations. By providing a quantitative, language‑agnostic framework, the work opens avenues for large‑scale, automated analysis of cultural artifacts, bridging the gap between traditional qualitative criticism and modern data‑driven methods.