An Open Research Dataset of the 1932 Cairo Congress of Arab Music

This paper introduces ORD-CC32, an open research dataset derived from the 1932 Cairo Congress of Arab Music recordings, a historically significant collection representing diverse Arab musical traditions. The dataset includes structured metadata, melodic and rhythmic mode tags (maqam and iqa‘), manually labeled tonic information, and acoustic features extracted using state-of-the-art pitch detection methods. These resources support computational studies of tuning, temperament, and regional variations in Arab music. A case study using pitch histograms demonstrates the potential for data-driven analysis of microtonal differences across regions. By making this dataset openly available, we aim to enable interdisciplinary research in computational ethnomusicology, music information retrieval (MIR), cultural studies, and digital heritage preservation. ORD-CC32 is shared on Zenodo with tools for feature extraction and metadata retrieval.

💡 Research Summary

The paper presents ORD‑CC32, an open‑access research dataset derived from the recordings made at the 1932 Cairo Congress of Arab Music, a seminal event that documented a wide spectrum of Arab musical traditions. The authors first contextualize the historical importance of the Congress, noting that while scholarly work has long relied on written reports and a limited set of audio excerpts, no comprehensive, machine‑readable collection of the original recordings has been publicly available.

To fill this gap, the team digitized the original 78 rpm shellac discs and early microfilm recordings at high resolution (96 kHz, 24-bit). Each of the roughly 100 tracks is described by a structured metadata record covering recording date and venue, performer and instrument, geographic origin, and, crucially, expert-annotated maqam (melodic mode) and iqa‘ (rhythmic mode) tags. A separate manual annotation step identified the tonic (root pitch) of every piece, providing a reference point for microtonal analysis. Three Arab music scholars cross-validated each label and resolved disagreements through discussion, a procedure intended to improve inter-annotator reliability.
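Because every piece carries a manually labeled tonic, pitch values can be expressed in cents relative to that tonic, c = 1200 · log2(f / f_tonic), which makes recordings in different registers directly comparable. The sketch below pairs a minimal metadata record with that conversion. The class name `TrackRecord` and its field names are hypothetical illustrations of the schema described above, not the dataset's actual column names.

```python
import math
from dataclasses import dataclass


@dataclass
class TrackRecord:
    # Hypothetical field names mirroring the metadata schema described in the text.
    track_id: str
    performer: str
    instrument: str
    origin: str        # geographic origin, e.g. "Egypt"
    maqam: str
    iqa: str
    tonic_hz: float    # manually annotated tonic frequency in Hz

    def to_cents(self, f0_hz):
        """Convert f0 values (Hz) to cents relative to the annotated tonic.

        Unvoiced frames (f0 is None or <= 0) are mapped to None.
        """
        out = []
        for f in f0_hz:
            if f is None or f <= 0:
                out.append(None)
            else:
                out.append(1200.0 * math.log2(f / self.tonic_hz))
        return out
```

For example, with a tonic of 150 Hz, a frame at 300 Hz maps to 1200 cents (one octave above the tonic), and a frame at 150 Hz maps to 0 cents.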

For acoustic feature extraction, the authors employed a hybrid pitch-detection pipeline that combines the state-of-the-art deep-learning model CREPE with the classic YIN algorithm to improve accuracy in low-frequency regions. Pitch contours were sampled every 10 ms. From these contours and the underlying audio, more than 30 quantitative descriptors were computed, including contour-based features such as pitch histograms and mean pitch deviation, and spectral features such as spectral centroid and mel-frequency cepstral coefficients (MFCCs). All feature extraction scripts are written in Python, build on the librosa library, and are distributed alongside the dataset.
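The summary does not specify how the CREPE and YIN estimates are merged. One plausible rule, shown here purely as an assumption, is confidence gating: keep CREPE's estimate on frames where its confidence is high and fall back to YIN elsewhere. The function names `fuse_f0` and `frame_times` and the 0.5 threshold are hypothetical; the inputs are assumed to be frame-aligned arrays on the same 10 ms grid.

```python
import numpy as np

HOP_SECONDS = 0.010  # 10 ms frame hop, as stated in the text


def fuse_f0(crepe_f0, crepe_conf, yin_f0, threshold=0.5):
    """Merge two frame-aligned f0 tracks (Hz) by confidence gating.

    Hypothetical rule: use CREPE where its confidence >= threshold,
    otherwise use YIN. Non-positive or NaN values become NaN (unvoiced).
    """
    crepe_f0 = np.asarray(crepe_f0, dtype=float)
    crepe_conf = np.asarray(crepe_conf, dtype=float)
    yin_f0 = np.asarray(yin_f0, dtype=float)
    if not (crepe_f0.shape == crepe_conf.shape == yin_f0.shape):
        raise ValueError("all inputs must be frame-aligned")
    fused = np.where(crepe_conf >= threshold, crepe_f0, yin_f0)
    # NaN > 0 evaluates to False, so NaN frames stay NaN as well.
    fused[~(fused > 0)] = np.nan
    return fused


def frame_times(n_frames, hop=HOP_SECONDS):
    """Timestamps (s) of each frame on the fixed hop grid."""
    return np.arange(n_frames) * hop
```

In practice the two input tracks could come from the `crepe` package (`crepe.predict(..., step_size=10)`, which returns time, frequency, confidence, and activation arrays) and from `librosa.yin` with a hop length equivalent to 10 ms; frame alignment between the two should be checked before fusing.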

The paper showcases a case study that uses the pitch histograms to investigate regional tuning variation. Grouping pitch data by maqam and geographic origin, the authors report that the same maqam can be tuned 10–15 cents differently depending on whether the performance comes from Egypt, Syria, Lebanon, or North Africa. They also report that the maqam “Ushaq” is, on average, about 23 cents flatter than the equal-tempered reference, a result consistent with long-standing musicological hypotheses about non-uniform temperaments in Arab music.
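A tonic-normalized, octave-folded pitch histogram is the basic object behind this kind of comparison. The sketch below builds one at 1-cent resolution and measures how far the histogram peak near a scale degree sits from a nominal equal-tempered position. The function names, the 1-cent bins, and the ±40-cent search window are assumptions for illustration; the paper's actual binning and peak-picking may differ.

```python
import numpy as np


def folded_histogram(cents, bin_width=1.0):
    """Octave-folded, normalized histogram of pitch values in cents.

    NaN (unvoiced) frames are dropped; values are folded into [0, 1200).
    Returns (bin_centers, density), where density sums to 1.
    """
    c = np.asarray(cents, dtype=float)
    c = np.mod(c[~np.isnan(c)], 1200.0)
    edges = np.arange(0.0, 1200.0 + bin_width, bin_width)
    counts, _ = np.histogram(c, bins=edges)
    density = counts / max(counts.sum(), 1)
    centers = edges[:-1] + bin_width / 2
    return centers, density


def peak_deviation(centers, density, nominal_cents, window=40.0):
    """Signed offset (cents) of the histogram peak from a nominal degree.

    Searches only bins within +/- window cents of nominal_cents
    (no wrap-around at the octave boundary in this sketch).
    Negative values mean the peak is flat of the nominal position.
    """
    mask = np.abs(centers - nominal_cents) <= window
    if not mask.any():
        raise ValueError("window contains no bins")
    idx = np.flatnonzero(mask)[np.argmax(density[mask])]
    return centers[idx] - nominal_cents
```

For instance, `nominal_cents=350.0` is the 24-tone equal-tempered position of a neutral third. Averaging `peak_deviation` per track within each region would yield region-by-region offsets of the kind reported in the case study.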

ORD-CC32 is released on Zenodo under a CC-BY-4.0 license, accompanied by Python utilities for metadata retrieval and feature extraction, plus Jupyter notebook examples that reproduce the case study. This open-source ecosystem enables researchers in music information retrieval (MIR), computational ethnomusicology, cultural heritage preservation, and related fields to conduct reproducible experiments on tasks such as automatic maqam classification, tuning estimation, rhythm pattern mining, and cross-regional comparative studies.
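Since the dataset is distributed through Zenodo, its file list can in principle be obtained from Zenodo's public REST API at `https://zenodo.org/api/records/<record_id>`. The record ID is not given here, so `RECORD_ID` below is a placeholder, and the helper names are hypothetical. The sketch assumes the JSON response holds a `files` list whose entries carry a `key` (filename) and a `links.self` download URL; verify this against the current Zenodo API documentation, and prefer the authors' own metadata-retrieval utilities where they apply.

```python
import json
import urllib.request

ZENODO_API = "https://zenodo.org/api/records/"
RECORD_ID = "<record_id>"  # placeholder: the ORD-CC32 record ID is not stated here


def fetch_record(record_id, timeout=30):
    """Download a Zenodo record's JSON metadata (requires network access)."""
    with urllib.request.urlopen(ZENODO_API + record_id, timeout=timeout) as resp:
        return json.load(resp)


def list_files(record):
    """Return (filename, download_url) pairs from a record's JSON.

    Assumes the layout described above: record["files"] is a list of
    dicts with a "key" entry and a "links" -> "self" entry.
    """
    return [(f["key"], f["links"]["self"]) for f in record.get("files", [])]
```

Only `fetch_record` needs network access; `list_files` works on any already-downloaded record JSON, which keeps the parsing step testable offline.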

The authors conclude by outlining future work: expanding the annotation effort through crowdsourced validation, integrating lyrical and contextual information for multimodal analysis, and linking the dataset with contemporary Arab music corpora to explore diachronic changes in tuning and modal practice. By making ORD‑CC32 publicly available, the paper provides a foundational resource that bridges historical musicology and modern computational methods, fostering interdisciplinary collaboration and advancing the scientific study of Arab musical heritage.


📜 Original Paper Content