Learning to Discover: A Generalized Framework for Raga Identification without Forgetting

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Raga identification in Indian Art Music (IAM) remains challenging due to the presence of numerous rarely performed Ragas that are not represented in available training datasets. Traditional classification models struggle in this setting, as they assume a closed set of known categories and therefore fail to recognize or meaningfully group previously unseen Ragas. Recent works have tried categorizing unseen Ragas, but they suffer from catastrophic forgetting, in which knowledge of previously seen Ragas is lost. To address this problem, we adopt a unified learning framework that leverages both labeled and unlabeled audio, enabling the model to discover coherent categories corresponding to the unseen Ragas while retaining knowledge of previously known ones. We test our model on benchmark Raga identification datasets and demonstrate its performance in categorizing previously seen, unseen, and all Raga classes. The proposed approach surpasses the previous NCD-based pipeline even at discovering the unseen Raga categories, offering new insights into representation learning for IAM tasks.


💡 Research Summary

The paper tackles a practical limitation of raga identification in Indian Art Music (IAM): many rarely performed ragas are absent from training collections. Conventional closed-set classifiers cannot recognize or meaningfully group such unseen ragas, and recent Novel Class Discovery (NCD) approaches, while able to cluster unlabeled data, suffer from catastrophic forgetting of previously learned ragas when the model adapts to new categories. To overcome this, the authors adopt the Generalized Category Discovery (GCD) paradigm, which jointly optimizes supervised and unsupervised contrastive objectives within a shared embedding space.

First, a CNN-LSTM feature extractor is trained on labeled data (12 ragas) using cross-entropy loss; its final softmax layer is removed to produce fixed-length embeddings. These embeddings are then refined by a transformer-based self-attention encoder, trained with a weighted sum of (i) a supervised contrastive loss, which pulls together embeddings of the same labeled raga and pushes apart hard negatives, and (ii) an unsupervised contrastive loss, which treats the whole dataset as unlabeled but constructs positive pairs from 30-second clips sharing the same source recording (an implicit raga label) and selects hard negatives based on lowest cosine similarity. A balance parameter λ controls the contribution of each term.

After training, the combined embeddings are clustered using either K-Means (assuming a known number of classes) or a cosine-similarity threshold method. Evaluation on two public datasets, PIM (191 h, 135 ragas) and Saraga (43 h, 61 ragas), covers three scenarios: previously seen ragas (Old), unseen ragas (New), and the union of both (All). Metrics include clustering accuracy (ACC), Normalized Mutual Information (NMI), Adjusted Rand Index (ARI), and Silhouette Score.
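The weighted joint objective described above can be sketched as a combination of two InfoNCE-style contrastive terms. The following is a minimal NumPy illustration, not the authors' implementation; the function names, toy temperature, and per-anchor formulation are assumptions for clarity.

```python
import numpy as np

def normalize(x):
    """L2-normalize rows so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: the positive is pulled in, negatives are
    pushed away. All vectors are assumed L2-normalized; `negatives` is (n, d)."""
    candidates = np.vstack([positive[None, :], negatives])  # positive at index 0
    logits = (candidates @ anchor) / temperature
    logits -= logits.max()  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])

def joint_contrastive_loss(sup_losses, unsup_losses, lam=0.5):
    """Weighted sum of the two objectives; `lam` plays the role of the paper's λ."""
    return lam * np.mean(sup_losses) + (1.0 - lam) * np.mean(unsup_losses)
```

In this sketch, supervised anchors would pair clips sharing the same raga label, while unsupervised anchors would pair 30-second clips cut from the same source recording, matching the positive-pair construction the summary describes.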
Results show that the proposed joint‑learning method (M2) retains high ACC on known ragas while substantially improving NMI and ARI on unseen ragas compared with the NCD baseline, and achieves the best Silhouette Score in the All setting. A purely unsupervised contrastive variant (M1) performs poorly, confirming the necessity of supervised signals for capturing fine‑grained raga distinctions. The study demonstrates that GCD‑style contrastive learning provides a robust, scalable solution for open‑set music information retrieval tasks, and suggests future extensions to multimodal inputs and real‑time raga discovery applications.
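For the open-set case where the number of raga classes is unknown, the cosine-similarity threshold clustering mentioned above can be sketched as a greedy pass over the embeddings. This is an illustrative assumption about how such a method might operate; the threshold `tau` and the running-centroid update are not taken from the paper.

```python
import numpy as np

def threshold_cluster(embeddings, tau=0.8):
    """Greedy cosine-threshold clustering: assign each embedding to the most
    similar existing centroid if similarity >= tau, else open a new cluster."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    centroids, labels = [], []
    for v in unit:
        if centroids:
            sims = np.array([c @ v for c in centroids])
            best = int(sims.argmax())
            if sims[best] >= tau:
                labels.append(best)
                merged = centroids[best] + v                        # running centroid,
                centroids[best] = merged / np.linalg.norm(merged)   # kept unit-length
                continue
        centroids.append(v)                # no centroid is similar enough:
        labels.append(len(centroids) - 1)  # this clip seeds a new (possibly unseen) raga
    return np.array(labels)
```

When the total class count is known, K-Means would instead fix the number of clusters to the combined count of Old and New ragas, as in the paper's other evaluation setting.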

