Inter Genre Similarity Modelling For Automatic Music Genre Classification
Music genre classification is an essential tool for music information retrieval systems and has found critical applications across various media platforms. Two important problems in automatic music genre classification are feature extraction and classifier design. This paper investigates inter-genre similarity modelling (IGS) to improve the performance of automatic music genre classification. Inter-genre similarity information is extracted over the misclassified feature population. Once the inter-genre similarity is modelled, eliminating it reduces inter-genre confusion and improves identification rates. IGS modelling is further improved with iterative IGS modelling (IIGS) and score modelling for IGS elimination (SMIGS). Experimental results with promising classification improvements are provided.
💡 Research Summary
The paper addresses a persistent challenge in automatic music genre classification: the confusion caused by similarity between different genres. While many prior works have focused on improving feature extraction (e.g., MFCCs, spectral descriptors) or designing more powerful classifiers (SVMs, GMM‑HMMs, deep neural networks), they often overlook the fact that some musical excerpts share acoustic characteristics across genre boundaries, leading to systematic misclassifications. To tackle this, the authors propose an Inter‑Genre Similarity (IGS) modeling framework that explicitly learns the statistical patterns of those “confusing” samples and then suppresses their influence during classification.
The methodology proceeds in three stages. First, a conventional GMM‑based classifier is trained on the full training set and used to label every frame of the training data. Frames that are incorrectly assigned to a genre are collected into an “IGS pool.” Separate Gaussian mixture models are then estimated for each true genre and for the IGS pool, yielding probability density functions that capture both genuine genre characteristics and the shared, ambiguous patterns. During testing, each incoming frame receives two scores: the log‑likelihood under the target genre model and the log‑likelihood under the IGS model. If the IGS score exceeds a predefined threshold, the frame is either discarded or down‑weighted, thereby reducing the chance that ambiguous content will bias the final decision.
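The three-stage pipeline above can be sketched in a few lines of numpy. This is a minimal toy illustration, not the paper's implementation: single diagonal-covariance Gaussians stand in for the per-genre GMMs, the two "genres" are synthetic 2-D point clouds, and the exact thresholding rule is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_diag_gauss(X):
    """Fit a single diagonal-covariance Gaussian (a stand-in for a GMM)."""
    return X.mean(axis=0), X.var(axis=0) + 1e-6

def loglik(X, mu, var):
    """Per-frame log-likelihood under a diagonal Gaussian."""
    return -0.5 * (np.log(2 * np.pi * var) + (X - mu) ** 2 / var).sum(axis=1)

# Toy training data: two "genres" of 2-D frames with an overlapping region.
X0 = rng.normal([0, 0], 1.0, size=(500, 2))
X1 = rng.normal([3, 3], 1.0, size=(500, 2))
genres = [fit_diag_gauss(X0), fit_diag_gauss(X1)]

def classify(X):
    """Baseline frame-level decision: argmax over per-genre log-likelihoods."""
    return np.stack([loglik(X, mu, v) for mu, v in genres]).argmax(axis=0)

# Stage 1-2: collect misclassified training frames into the IGS pool
# and fit the IGS model on that pool.
pool = np.vstack([X0[classify(X0) != 0], X1[classify(X1) != 1]])
igs = fit_diag_gauss(pool)

def classify_igs(X, threshold=0.0):
    """Stage 3: drop frames whose IGS score comes within `threshold`
    of their best genre score, then majority-vote over the kept frames."""
    genre_scores = np.stack([loglik(X, mu, v) for mu, v in genres])
    keep = genre_scores.max(axis=0) - loglik(X, *igs) > threshold
    votes = genre_scores[:, keep].argmax(axis=0)
    return np.bincount(votes, minlength=len(genres)).argmax()
```

Frames that land in the overlap between the two clouds score well under the IGS model and are excluded, so the final vote rests on frames that are unambiguously genre-typical.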
Recognizing that a single pass may not capture all sources of confusion, the authors introduce Iterative IGS (IIGS). After the first IGS model is built, the classifier is re‑run on the training data, and any newly mis‑classified frames are added to the IGS pool. The IGS model is updated and the process repeats until the proportion of new errors falls below a small constant. This iterative refinement expands the IGS model’s coverage of overlapping feature space, leading to progressively better discrimination.
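The iterative refinement can be sketched as the loop below. Again this is a hedged toy version: single Gaussians replace GMMs, the 1 % stopping constant and the rule that IGS-claimed frames are excluded from re-labelling are assumptions about the procedure, and a hard iteration cap is added for safety.

```python
import numpy as np

rng = np.random.default_rng(1)

def fit(X):
    """Diagonal Gaussian fit (stand-in for a GMM)."""
    return X.mean(axis=0), X.var(axis=0) + 1e-6

def loglik(X, mu, var):
    return -0.5 * (np.log(2 * np.pi * var) + (X - mu) ** 2 / var).sum(axis=1)

# Two overlapping toy "genres".
data = [rng.normal([0, 0], 1.2, size=(400, 2)),
        rng.normal([2.5, 2.5], 1.2, size=(400, 2))]
models = [fit(X) for X in data]

pool = np.empty((0, 2))
epsilon = 0.01                     # stop once new errors fall below 1% of frames
n_total = sum(len(X) for X in data)

for _ in range(10):                # safety cap on iterations
    new_errors = []
    for label, X in enumerate(data):
        scores = np.stack([loglik(X, mu, v) for mu, v in models])
        if len(pool):
            # Frames the current IGS model already claims are not re-counted.
            mu_p, v_p = fit(pool)
            claimed = loglik(X, mu_p, v_p) > scores.max(axis=0)
        else:
            claimed = np.zeros(len(X), bool)
        wrong = (scores.argmax(axis=0) != label) & ~claimed
        new_errors.append(X[wrong])
    n_new = sum(len(e) for e in new_errors)
    pool = np.vstack([pool] + new_errors)  # grow the IGS pool, refit next pass
    if n_new < epsilon * n_total:
        break
```

Each pass folds the remaining confusions into the pool, so the IGS model's coverage of the overlapping region grows until the stream of new errors dries up.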
A further refinement, Score Modeling for IGS Elimination (SMIGS), operates at the frame‑level score domain. For each frame, the difference between the genre log‑likelihood and the IGS log‑likelihood is computed, producing a “similarity score.” Frames whose scores are lower than an empirically determined cutoff are omitted from the aggregation that produces the final genre decision. SMIGS is particularly effective for short excerpts or pieces with rapid stylistic changes, where a single ambiguous segment can dominate the overall likelihood.
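The score-domain filtering can be written as a small pure function. The inputs are per-frame log-likelihood matrices such as the earlier stages would produce; the sum-based aggregation and the fallback when every frame is ambiguous are assumptions of this sketch, not details confirmed by the paper.

```python
import numpy as np

def smigs_decision(genre_loglik, igs_loglik, cutoff=0.0):
    """
    genre_loglik: (n_genres, n_frames) per-frame genre log-likelihoods.
    igs_loglik:   (n_frames,) per-frame log-likelihoods under the IGS model.
    Frames whose best genre score does not exceed the IGS score by `cutoff`
    are omitted before the per-genre scores are aggregated.
    """
    sim = genre_loglik.max(axis=0) - igs_loglik   # per-frame similarity score
    keep = sim > cutoff
    if not keep.any():                            # all frames ambiguous:
        keep = np.ones_like(keep)                 # fall back to using all
    return genre_loglik[:, keep].sum(axis=1).argmax()
```

A tiny worked case shows the point of the refinement: if one ambiguous frame strongly favours the wrong genre, summing over all frames can flip the excerpt-level decision, while dropping that frame restores the correct label.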
The experimental evaluation uses two widely cited benchmark collections: GTZAN (1,000 tracks across 10 genres) and ISMIR2004 (1,458 tracks across 6 genres). Features include 13‑dimensional MFCCs, spectral roll‑off, zero‑crossing rate, energy, and their first‑order deltas, yielding a 20‑dimensional vector per 25 ms frame with 10 ms overlap. Baseline classifiers consist of a GMM‑HMM system (10 mixtures per genre) and a multi‑class SVM with an RBF kernel. The authors report results for four configurations: baseline, baseline + IGS, baseline + IIGS, and baseline + IIGS + SMIGS.
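Appending first-order deltas to a static per-frame feature matrix can be sketched as below. This uses a simple frame-to-frame difference, a common lightweight variant; the paper may use a windowed delta-regression formula instead, so treat this as an assumption.

```python
import numpy as np

def add_deltas(features):
    """
    features: (n_frames, n_dims) static per-frame features
    (e.g. MFCCs, spectral roll-off, zero-crossing rate, energy).
    Returns (n_frames, 2 * n_dims): statics with first-order deltas
    appended, computed as a frame-to-frame difference. The first row's
    delta is zero because the first frame is duplicated as padding.
    """
    deltas = np.diff(features, axis=0, prepend=features[:1])
    return np.hstack([features, deltas])
```

For example, a (4, 3) feature matrix becomes (4, 6) after the call, with the delta columns holding the per-dimension change between consecutive frames.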
On GTZAN, the GMM‑HMM baseline achieves 78.5 % accuracy. Adding IGS raises it to 81.2 %; IIGS further improves to 83.0 %; and the full SMIGS pipeline reaches 84.2 %. Similar gains are observed with the SVM baseline (79.3 % → 85.0 % after the full pipeline). Notably, genres that are traditionally hard to separate—such as Blues vs. Jazz and Classical vs. Folk—show the most pronounced error reductions, confirming that the IGS models are indeed capturing the overlapping acoustic space.
The authors discuss several practical considerations. First, the quality of the initial classifier directly influences the composition of the IGS pool; a very weak baseline could produce a noisy IGS model, limiting benefits. Second, the threshold for IGS elimination and the score cutoff for SMIGS are dataset‑specific and may require cross‑validation. Third, the current implementation relies on GMMs, which, while interpretable, may be less expressive than modern deep learning embeddings. The paper suggests future work integrating IGS concepts with convolutional or recurrent neural networks, exploring online or streaming scenarios where the IGS model must adapt in real time, and extending the framework to user‑personalized genre taxonomies.
In summary, the study demonstrates that explicitly modeling inter‑genre similarity—rather than merely improving feature extraction or classifier capacity—offers a low‑complexity yet effective route to higher genre classification accuracy. By iteratively refining the similarity model and applying score‑based frame exclusion, the proposed IGS, IIGS, and SMIGS techniques achieve consistent improvements across multiple classifiers and datasets, establishing a solid foundation for further research in similarity‑aware music information retrieval.