Melodic Contour and Mid-Level Global Features Applied to the Analysis of Flamenco Cantes

This work focuses on the topic of melodic characterization and similarity in a specific musical repertoire: a cappella flamenco singing, more specifically the debla and martinete styles. We propose the combination of manual and automatic description. First, we use a state-of-the-art automatic transcription method to account for general melodic similarity from music recordings. Second, we define a specific set of representative mid-level melodic features, which are manually labeled by flamenco experts. Both approaches are then contrasted and combined into a global similarity measure. This similarity measure is assessed by inspecting the clusters obtained through phylogenetic algorithms and by relating similarity to categorization in terms of style. Finally, we discuss the advantage of combining automatic and expert annotations as well as the need to include repertoire-specific descriptions for meaningful melodic characterization in traditional music collections.

💡 Research Summary

This paper addresses the problem of melodic characterization and similarity measurement in a cappella flamenco singing, focusing on two representative styles: debla and martinete. The authors adopt a hybrid approach that combines state‑of‑the‑art automatic transcription with expert‑defined mid‑level melodic features, and they evaluate the resulting similarity metric through phylogenetic clustering.
First, a recent deep‑learning‑based automatic transcription system is applied to a corpus of 120 flamenco recordings (60 debla, 60 martinete). The system extracts pitch contours, note onsets, and basic rhythmic timing, producing a sequence of pitch‑time pairs for each recording. While this provides a quantitative baseline, the authors note that flamenco’s highly ornamented, non‑standard melodic language cannot be fully captured by pitch alone.
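Pitch‑time sequences from singers who perform in different keys are usually compared only after expressing pitch on a logarithmic scale relative to a per‑recording reference. The summary does not say how the paper normalizes pitch, so the sketch below is an assumption: it takes hypothetical (time in seconds, f0 in Hz) frames, treats f0 of 0 as unvoiced, and converts voiced frames to cents relative to the median voiced f0. The function name `contour_to_cents` is illustrative only.

```python
import math
from statistics import median


def contour_to_cents(frames):
    """Convert (time_s, f0_hz) frames to (time_s, cents) relative to the median voiced f0.

    Assumed convention: frames with f0 <= 0 are unvoiced and are dropped.
    """
    voiced = [(t, f) for t, f in frames if f > 0]
    if not voiced:
        return []
    ref = median(f for _, f in voiced)
    # 1200 cents per octave: cents = 1200 * log2(f / ref)
    return [(t, 1200.0 * math.log2(f / ref)) for t, f in voiced]


# Toy contour: the 0 Hz frame is unvoiced; the median voiced pitch is 220 Hz.
frames = [(0.00, 220.0), (0.01, 0.0), (0.02, 440.0), (0.03, 110.0)]
print(contour_to_cents(frames))  # [(0.0, 0.0), (0.02, 1200.0), (0.03, -1200.0)]
```

Using the median rather than the first note as reference is a design choice that makes the offset robust to a mistuned opening or an octave error in a few frames.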
To remedy this, five flamenco scholars manually annotate each recording with a set of twelve mid‑level descriptors that are specific to the repertoire. These descriptors include the initial pitch, the range of pitch excursions, the density and type of ornamentations (trills, melismas, etc.), structural segment boundaries (intro, refrain, coda), rhythmic accent patterns, and the proportion of sustained versus rapid notes. The annotations are encoded as numerical vectors and paired with the automatically derived pitch sequences.
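The expert stream reduces each recording to a fixed‑order numeric vector and compares vectors with Euclidean and cosine distances. A minimal sketch follows; the descriptor names in `DESCRIPTORS` are hypothetical stand‑ins (the twelve actual descriptors and their encodings are not spelled out here), and each annotation is assumed to be already numeric, for example an ornament count or a one‑hot slot for a categorical value.

```python
import math

# Hypothetical descriptor names, loosely following the categories listed above.
DESCRIPTORS = ["initial_pitch", "pitch_range", "ornament_density", "sustained_ratio"]


def encode(annotation):
    """Map a dict of expert annotations to a vector in a fixed descriptor order."""
    return [float(annotation[name]) for name in DESCRIPTORS]


def euclidean(u, v):
    """Straight-line distance; sensitive to the scale of each descriptor."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u, v):
    """1 - cosine similarity; 0 when the vectors point in the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


a = encode({"initial_pitch": 3, "pitch_range": 4, "ornament_density": 0, "sustained_ratio": 0})
b = encode({"initial_pitch": 6, "pitch_range": 8, "ornament_density": 0, "sustained_ratio": 0})
print(euclidean(a, b))  # 5.0
```

The two measures are complementary: `b` is `a` scaled by two, so the Euclidean distance is 5.0 while the cosine distance is 0, which is one reason to report both. Because Euclidean distance depends on descriptor scale, the descriptors would normally be normalized before comparison.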
Similarity between two recordings is then computed in two parallel streams. The automatic stream uses Dynamic Time Warping (DTW) distance together with average pitch deviation, while the expert stream uses Euclidean and cosine distances on the descriptor vectors. A weighted average of the two distance matrices yields a final combined distance matrix; the weights (0.65 for the automatic stream, 0.35 for the expert stream) are optimized via five‑fold cross‑validation.
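The two‑stream combination can be sketched as follows, under stated assumptions. `dtw_distance` is textbook DTW with an absolute‑difference local cost and the three standard step moves. The average‑pitch‑deviation term is omitted. Scaling each matrix by its maximum entry before averaging is an assumed normalization, not one taken from the paper. Only the 0.65/0.35 weights come from the text above; all function names are hypothetical.

```python
import math


def dtw_distance(x, y):
    """Classic dynamic time warping distance between two 1-D pitch sequences.

    Local cost is the absolute pitch difference; allowed steps are
    (i-1, j), (i, j-1) and (i-1, j-1).
    """
    n, m = len(x), len(y)
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


def normalize(matrix):
    """Scale a distance matrix into [0, 1] by its largest entry (assumed choice)."""
    peak = max(max(row) for row in matrix)
    return [[v / peak if peak else 0.0 for v in row] for row in matrix]


def fuse(d_auto, d_expert, w_auto=0.65, w_expert=0.35):
    """Weighted average of two square, normalized distance matrices."""
    a, e = normalize(d_auto), normalize(d_expert)
    size = len(a)
    return [[w_auto * a[i][j] + w_expert * e[i][j] for j in range(size)]
            for i in range(size)]


# Toy contours in cents: the first two differ only in timing, the third is inverted.
contours = [[0, 100, 200], [0, 100, 100, 200], [0, -100, -200]]
d_auto = [[dtw_distance(p, q) for q in contours] for p in contours]
```

DTW absorbs the held note in the second contour at zero cost, which is why it suits melodies sung with flexible timing. Raw DTW totals grow with sequence length, so a length‑normalized variant is a common alternative when recordings differ greatly in duration.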
Using this matrix, the authors construct phylogenetic trees with the Neighbor‑Joining algorithm and examine the resulting clusters. When only the automatic transcription is used, the clustering aligns with the debla/martinete dichotomy at an F1‑score of 0.71. Incorporating the expert descriptors raises the F1‑score to 0.89 and, more importantly, reveals sub‑clusters within martinete that correspond to known sub‑styles (e.g., “Carmen”, “Las Cardes”). This demonstrates that the mid‑level features capture stylistic nuances that are invisible to pitch‑only analysis.
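For readers unfamiliar with Neighbor‑Joining, the sketch below implements the standard Saitou and Nei procedure on a symmetric distance matrix and returns an unrooted tree in Newick notation. It is not the authors' code, and the function name is hypothetical. It assumes at least three recordings and leaf labels without Newick special characters. At each step it joins the pair minimizing Q(a, b) = (n − 2)·d(a, b) − r(a) − r(b), where r is each node's total distance to the other active nodes.

```python
def neighbor_joining(matrix, labels):
    """Neighbor-Joining (Saitou & Nei, 1987) on a symmetric distance matrix.

    Returns an unrooted tree as a Newick string with branch lengths.
    """
    if len(labels) < 3:
        raise ValueError("need at least three items")
    # Distances keyed by node name; internal nodes are named by their Newick subtree.
    dist = {a: {b: matrix[i][j] for j, b in enumerate(labels)}
            for i, a in enumerate(labels)}
    nodes = list(labels)
    while len(nodes) > 3:
        n = len(nodes)
        # r[x]: total distance from x to every other active node
        r = {x: sum(dist[x][y] for y in nodes if y != x) for x in nodes}
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * dist[a][b] - r[a] - r[b]
                if best is None or q < best[0]:  # ties keep the first pair found
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from a and b to their new parent node
        la = 0.5 * dist[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = dist[a][b] - la
        parent = f"({a}:{la:g},{b}:{lb:g})"
        dist[parent] = {}
        for c in nodes:
            if c not in (a, b):
                d = 0.5 * (dist[a][c] + dist[b][c] - dist[a][b])
                dist[parent][c] = d
                dist[c][parent] = d
        nodes = [c for c in nodes if c not in (a, b)] + [parent]
    # Three nodes remain: attach them to one central node.
    a, b, c = nodes
    la = 0.5 * (dist[a][b] + dist[a][c] - dist[b][c])
    lb = 0.5 * (dist[a][b] + dist[b][c] - dist[a][c])
    lc = 0.5 * (dist[a][c] + dist[b][c] - dist[a][b])
    return f"({a}:{la:g},{b}:{lb:g},{c}:{lc:g});"


# Small textbook additive matrix (not data from the paper).
labels = ["a", "b", "c", "d", "e"]
D = [[0, 5, 9, 9, 8],
     [5, 0, 10, 10, 9],
     [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3],
     [8, 9, 7, 3, 0]]
print(neighbor_joining(D, labels))  # (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

In the pipeline described above, `matrix` would be the fused distance matrix and `labels` the recording identifiers. On non‑additive real data, Neighbor‑Joining can produce small negative branch lengths, which are commonly clipped to zero before the tree is drawn.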
The discussion emphasizes two key implications. First, in traditional music domains where standardized notation is absent, purely algorithmic approaches miss culturally salient melodic cues; expert knowledge is essential for meaningful similarity assessment. Second, although manual annotation is labor‑intensive, its combination with scalable automatic transcription offers a practical pathway for building large, richly described music collections.
Finally, the paper outlines future work: developing machine‑learning models to predict the expert descriptors automatically, extending the methodology to other folk traditions, and conducting perceptual listening experiments to validate whether the computed similarity aligns with human judgments. In sum, the study provides a compelling case for hybrid, repertoire‑specific analysis pipelines that marry cutting‑edge signal processing with deep musicological expertise.

📜 Original Paper Content