Real-time jam-session support system


We propose a method for real-time chord accompaniment of improvised music. Our implementation learns the underlying structure of a musical performance and predicts the next chord. The system uses a Hidden Markov Model to find the most probable chord sequence for the played melody; a Variable-Order Markov Model is then used to (a) learn the structure of the progression, if any, and (b) predict the next chord. We implemented the system in Java and MAX/MSP and evaluated it using objective (prediction accuracy) and subjective (questionnaire) methods.


💡 Research Summary

The paper presents a novel real‑time chord accompaniment system designed for improvised music performances. Its core contribution lies in the combination of two probabilistic models: a Hidden Markov Model (HMM) that interprets the incoming melody as observations and infers the most likely underlying chord sequence, and a Variable Order Markov Model (VOMM) that learns and exploits the structural regularities of the chord progression as they emerge during the performance.

In the first stage, the HMM treats each incoming MIDI note (or pitch class) as an observation. The hidden states correspond to chord symbols (e.g., Cmaj, G7). Transition probabilities are derived from a large corpus of chord progressions and are further refined by music‑theoretic rules such as functional harmony. Observation probabilities are computed by measuring how well a candidate chord fits the current melodic pitch set, taking into account scale membership, interval distance, and rhythmic accent. To meet real‑time constraints, the authors implement a streaming version of the Viterbi algorithm that updates the most probable chord path at each time step with a latency well below 30 ms.
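The streaming Viterbi update described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the chord states (Cmaj, G7) and all probability values are toy numbers standing in for the corpus-derived transition table and the melodic-fit observation model.

```java
// Sketch of one streaming Viterbi step over chord states.
// Assumption: two hypothetical states (0 = Cmaj, 1 = G7) with made-up
// log-probabilities; the paper derives these from a chord corpus and
// music-theoretic rules.
public class StreamingViterbi {
    static final double[][] LOG_TRANS = {
        {Math.log(0.7), Math.log(0.3)},   // from Cmaj
        {Math.log(0.4), Math.log(0.6)}    // from G7
    };

    // Running best-path log-scores, one per chord state.
    double[] logDelta = {Math.log(0.5), Math.log(0.5)};

    // logEmit[s] = log P(current melodic pitch set | chord s), supplied per
    // incoming note/frame. Returns the index of the currently best chord,
    // so the accompaniment can be emitted without waiting for the full path.
    int step(double[] logEmit) {
        double[] next = new double[logDelta.length];
        for (int s = 0; s < next.length; s++) {
            double best = Double.NEGATIVE_INFINITY;
            for (int p = 0; p < logDelta.length; p++)
                best = Math.max(best, logDelta[p] + LOG_TRANS[p][s]);
            next[s] = best + logEmit[s];
        }
        logDelta = next;
        int argmax = 0;
        for (int s = 1; s < next.length; s++)
            if (next[s] > next[argmax]) argmax = s;
        return argmax;
    }

    public static void main(String[] args) {
        StreamingViterbi v = new StreamingViterbi();
        // A melody note that fits Cmaj much better than G7:
        int chord = v.step(new double[]{Math.log(0.9), Math.log(0.1)});
        System.out.println(chord);  // 0 (Cmaj) with these toy numbers
    }
}
```

Because each step is O(S²) in the number of chord states and touches no history beyond the previous score vector, per-note latency stays small, which is what makes the sub-30 ms budget plausible.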

The second stage introduces VOMM, which differs from fixed‑order Markov chains by allowing the context length to vary dynamically. The system stores chord contexts in a trie‑based tree; each new chord transition updates the corresponding node’s frequency count. When predicting the next chord, the algorithm searches for the longest matching context in the tree and samples from the conditional distribution associated with that node. This approach captures both short‑term repetitions and longer‑range structural patterns that are typical of improvisation, while maintaining O(k) lookup time (k = context length).
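A minimal sketch of the trie-based VOMM idea follows. All names are hypothetical, and for simplicity it predicts the argmax successor rather than sampling from the conditional distribution as the summary describes.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Sketch of a variable-order Markov predictor over chord symbols.
// Contexts up to MAX_ORDER are stored in a trie keyed most-recent-first;
// each node keeps a histogram of successors seen after that context.
public class ChordVomm {
    static final int MAX_ORDER = 3;

    static class Node {
        Map<String, Node> children = new HashMap<>();
        Map<String, Integer> nextCounts = new HashMap<>();
    }

    final Node root = new Node();
    final Deque<String> history = new ArrayDeque<>(); // most recent first

    // Record a newly observed chord: increment the successor count at the
    // root and at every suffix context of the current history.
    void observe(String chord) {
        Node node = root;
        node.nextCounts.merge(chord, 1, Integer::sum);
        for (String c : history) {
            node = node.children.computeIfAbsent(c, k -> new Node());
            node.nextCounts.merge(chord, 1, Integer::sum);
        }
        history.addFirst(chord);
        if (history.size() > MAX_ORDER) history.removeLast();
    }

    // Walk the longest context that matches the recent history (O(k) in the
    // context length k), then return its most frequent successor.
    String predict() {
        Node node = root, deepest = root;
        for (String c : history) {
            node = node.children.get(c);
            if (node == null) break;
            deepest = node;
        }
        return deepest.nextCounts.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey).orElse(null);
    }

    public static void main(String[] args) {
        ChordVomm m = new ChordVomm();
        for (String c : new String[]{"C", "Am", "F", "G",
                                     "C", "Am", "F", "G",
                                     "C", "Am", "F"})
            m.observe(c);
        System.out.println(m.predict());  // "G": the C-Am-F context repeats
    }
}
```

The fallback behavior is implicit here: if no long context matches, the walk stops early and the prediction degrades gracefully to a shorter-order (ultimately zeroth-order) estimate.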

Implementation is realized in Java for the core inference engine and MAX/MSP for the user interface and audio/MIDI routing. Communication between the two environments uses the Open Sound Control (OSC) protocol. Real‑time MIDI note‑on/off events are streamed from MAX/MSP to the Java module; the predicted chord is sent back as a MIDI Control Change message or as a direct chord track insertion, allowing musicians to hear the accompaniment instantly.
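To make the OSC bridge concrete, the sketch below hand-encodes a minimal OSC message carrying a predicted chord index, following the OSC 1.0 wire format (null-terminated strings padded to 4-byte boundaries, a type-tag string, big-endian arguments). The address `/chord` is an assumption for illustration; a real deployment would typically use an existing OSC library on both the Java and MAX/MSP sides.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Sketch of OSC 1.0 message encoding for a single int32 argument.
public class OscEncode {
    // Null-terminate a string's bytes and pad to a 4-byte boundary.
    static byte[] pad(byte[] s) {
        int len = ((s.length + 1 + 3) / 4) * 4;
        byte[] out = new byte[len];            // zero-filled = null padding
        System.arraycopy(s, 0, out, 0, s.length);
        return out;
    }

    static byte[] message(String address, int arg) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] addr = pad(address.getBytes());
        byte[] tags = pad(",i".getBytes());    // type tag: one int32
        out.write(addr, 0, addr.length);
        out.write(tags, 0, tags.length);
        byte[] val = ByteBuffer.allocate(4).putInt(arg).array(); // big-endian
        out.write(val, 0, val.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // "/chord"+null pad = 8 bytes, ",i"+null pad = 4, int32 = 4 -> 16 total
        byte[] msg = message("/chord", 7);
        System.out.println(msg.length);  // 16
    }
}
```

The resulting byte array would be sent over UDP to the port MAX/MSP's `udpreceive` object listens on; the reverse path (note-on/off events into Java) is symmetric.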

Evaluation consists of objective and subjective components. For objective testing, the system was run on publicly available chord‑melody datasets (e.g., iReal Pro, Jazz Standards). The combined HMM‑VOMM pipeline achieved an average chord‑prediction accuracy of 78 %, significantly outperforming a baseline 3‑gram model (≈62 %) and an HMM‑only configuration (≈71 %). Subjectively, 30 musicians completed a questionnaire rating the system on naturalness of accompaniment, timing precision, creativity support, and usability. The mean score was 4.2 out of 5, with particular praise for the consistency of chord progressions during repeated sections, which the VOMM component facilitated. Some participants noted reduced accuracy on highly polyphonic passages, highlighting a limitation of the current observation model.

The authors acknowledge several constraints: the need for a sufficiently large chord‑melody training corpus, reduced performance on dense polyphonic input, and the fact that while latency remains under 30 ms, further optimization would be required for high‑resolution audio plug‑in environments or large‑scale collaborative sessions.

Future work is outlined along four axes. First, integrating deep‑learning based pitch‑class and timbre feature extractors to improve the observation model for complex textures. Second, employing reinforcement learning to adapt chord‑selection policies based on real‑time user feedback, thereby creating a more interactive, co‑creative accompaniment. Third, extending the system to handle multi‑voice (polyphonic) inputs through advanced harmonic analysis and voice separation techniques. Fourth, porting the architecture to mobile and web platforms to enable distributed, networked jam sessions.

In summary, the paper delivers a technically sound, experimentally validated framework that merges HMM inference with variable‑order Markov learning to provide responsive, musically coherent chord accompaniment in real time. The implementation demonstrates feasibility on standard hardware, and the evaluation confirms both quantitative gains in prediction accuracy and positive user perception, establishing a solid foundation for further research and practical deployment in interactive music systems.

