Investigating the impact of stereo processing -- a study for extending the Open Dataset of Audio Quality (ODAQ)

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv paper.

In this paper, we present an initial study for extending the Open Dataset of Audio Quality (ODAQ) to cover the impact of stereo processing. Monaural artifacts from ODAQ were adapted in combination with left-right (LR) and mid-side (MS) stereo processing, across stimuli including solo instruments, typical wide stereo mixes, and hard-panned mixes. Listening tests in different presentation contexts (with and without direct comparison of MS and LR conditions) were conducted to collect subjective data beyond monaural artifacts while also scrutinizing the listening test methodology. ODAQ is extended with new material along with subjective scores from 16 expert listeners. The listening test results show substantial influences of the stimuli's spatial characteristics as well as of the presentation context. Notably, several significant disparities between LR and MS occur only when the conditions are presented in direct comparison. The findings suggest that listeners primarily assess timbral impairments when spatial characteristics are consistent and focus on the stereo image only when timbral quality is similar. The rating of an additional mono anchor was overall consistent across different stereo characteristics, averaging 65 on the MUSHRA scale, further corroborating that listeners prioritize timbral over spatial impressions.

💡 Research Summary

This paper extends the Open Dataset of Audio Quality (ODAQ) by adding stereo‑processed material and a corresponding set of expert listening scores. Two widely used stereo coding techniques were investigated: conventional left‑right (LR) processing, where the same monaural artifact is applied independently to each channel, and mid‑side (MS) processing, where the artifact is inserted into the mid (M) and side (S) signals before reconstruction. Two monaural artifact types—quantization noise (QN) and spectral holes (SH)—were generated at five quality levels using parameters identical to the original ODAQ (QN controlled by noise‑to‑mask ratio, SH by hole probability). For each artifact, six audio items covering three spatial categories (center‑located solo instruments, wide‑stereo music mixes, and artificially hard‑panned dialogue) were processed, yielding a total of 60 test samples.
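The difference between the two stereo modes can be illustrated with a minimal Python sketch. It assumes the common normalization M = (L + R)/2 and S = (L − R)/2 with reconstruction L = M + S and R = M − S; the paper's exact scaling is not stated here. All function names are hypothetical, and `make_spectral_holes` (which zeros each FFT bin of the whole signal with probability `prob`) is only a simplified stand-in for ODAQ's SH artifact, not its actual implementation.

```python
import numpy as np

def lr_process(left, right, artifact):
    """LR processing: apply the monaural artifact to each channel independently."""
    return artifact(left), artifact(right)

def ms_process(left, right, artifact):
    """MS processing: apply the artifact to mid and side, then reconstruct L/R."""
    mid = 0.5 * (left + right)   # assumed normalization M = (L + R) / 2
    side = 0.5 * (left - right)  # assumed normalization S = (L - R) / 2
    mid_d, side_d = artifact(mid), artifact(side)
    return mid_d + side_d, mid_d - side_d  # L = M + S, R = M - S

def make_spectral_holes(prob, seed=0):
    """Hypothetical SH stand-in: zero each rFFT bin with probability `prob`."""
    rng = np.random.default_rng(seed)  # shared, so each call draws a new hole pattern

    def artifact(x):
        spec = np.fft.rfft(x)
        keep = rng.random(spec.shape) >= prob  # True means the bin survives
        return np.fft.irfft(spec * keep, n=len(x))

    return artifact

# Usage: a two-tone stereo test signal processed in both modes.
fs = 48000
t = np.arange(fs) / fs
left = np.sin(2 * np.pi * 440 * t)
right = 0.5 * np.sin(2 * np.pi * 660 * t)
holes = make_spectral_holes(prob=0.3, seed=42)
l_lr, r_lr = lr_process(left, right, holes)
l_ms, r_ms = ms_process(left, right, holes)
```

Because the artifact's random generator is shared across calls, L and R (or M and S) receive independent hole patterns, mirroring the idea that the artifact is applied to each processed signal separately.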

A MUSHRA listening test was conducted with 16 trained music‑media students at Ball State University. The experiment featured two presentation contexts: (1) “separated” trials, where LR and MS conditions appeared on different screens, allowing only indirect comparison, and (2) “mixed” trials, where LR and MS versions of the same artifact and quality level were presented side by side, enabling direct comparison. Each trial contained five processed variants, a hidden reference, and two low‑pass anchors, plus a mono anchor that was identical across all items. An additional, smaller group of eight expert listeners evaluated only the hard‑panned items for further insight.

Results show that the quality scale was well covered for QN (0–100 points) but compressed for SH (≈20–80 points), indicating that the current SH parameters may need adjustment to span a broader quality range. Crucially, significant differences between LR and MS emerged only in the mixed (direct‑comparison) condition. For hard‑panned items, LR consistently outperformed MS, likely because MS coding introduces inter‑channel crosstalk that degrades the extreme spatial separation. For wide‑stereo mixes, MS received higher scores, suggesting that MS better preserves natural spatial cues in broadly distributed content. Solo‑instrument items showed little LR/MS difference, implying that timbral distortion dominates perception when spatial cues are minimal.
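The crosstalk explanation for hard-panned items can be checked numerically with a minimal sketch. It assumes a hard-panned source (signal only in L, silence in R) and a hypothetical noise artifact, `signal_scaled_noise`, that adds white noise scaled to the RMS of whatever signal it processes. This is a crude proxy for a masking-relative noise level, not ODAQ's NMR-controlled QN artifact. Under LR processing the silent right channel receives no noise; under MS processing M = S = L/2, so the reconstructed R = M' − S' reduces to the difference of the two independent noise signals and no longer cancels.

```python
import numpy as np

def signal_scaled_noise(rel_level, seed=0):
    """Hypothetical QN stand-in: add white noise at `rel_level` times the input RMS."""
    rng = np.random.default_rng(seed)  # shared, so M and S get independent noise

    def artifact(x):
        rms = np.sqrt(np.mean(x ** 2))  # zero for a silent channel, so no noise added
        return x + rel_level * rms * rng.standard_normal(x.shape)

    return artifact

def process(left, right, artifact, mode):
    """Apply `artifact` in LR mode or in MS mode (M = (L+R)/2, S = (L-R)/2)."""
    if mode == "LR":
        return artifact(left), artifact(right)
    mid, side = 0.5 * (left + right), 0.5 * (left - right)
    mid_d, side_d = artifact(mid), artifact(side)
    return mid_d + side_d, mid_d - side_d

fs = 48000
t = np.arange(fs) / fs
left = np.sin(2 * np.pi * 440 * t)  # hard-panned: source only in the left channel
right = np.zeros_like(left)         # right channel is silent

leak_db = {}
for mode in ("LR", "MS"):
    _, r_out = process(left, right, signal_scaled_noise(0.1, seed=1), mode)
    # Energy that leaked into the silent channel (tiny floor avoids log10(0)).
    leak_db[mode] = 10 * np.log10(np.mean(r_out ** 2) + 1e-20)

print(leak_db)
```

With these assumed settings, the LR run leaves the right channel at exactly zero, while the MS run places a noise floor in the nominally silent channel. The sketch deliberately models only the hard-panned case; it says nothing about wide-stereo mixes, where the summary reports that MS scored higher.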

The mono anchor received an average rating of 65 points, relatively high and consistent across all stimuli, even those with pronounced stereo spread. This corroborates earlier findings that timbral quality accounts for roughly 70% of overall perceived audio quality, while spatial quality contributes around 30%. Listeners therefore prioritize timbre over the stereo image as long as the latter shows no obvious artifacts.
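The 70/30 split can be turned into a back-of-the-envelope estimate. Assuming, purely for illustration, that overall quality Q is a linear combination of a timbral score T and a spatial score S on the same 0–100 scale, and that a mono downmix leaves timbre intact (T = 100) while scoring zero on spatial quality (S = 0), the quoted weights give:

```latex
Q \approx 0.7\,T + 0.3\,S
\qquad\Rightarrow\qquad
Q_{\text{mono}} \approx 0.7 \cdot 100 + 0.3 \cdot 0 = 70
```

This rough estimate of about 70 points lies close to the observed 65-point mono-anchor average; the linear form and the extreme values chosen for T and S are simplifying assumptions, not results reported in the study.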

The study highlights three practical implications. First, the choice between LR and MS processing must consider both the spatial characteristics of the source material and the testing context; direct comparison can reveal differences that remain hidden in indirect assessments. Second, the SH artifact parameters should be refined to achieve a more uniform scale coverage, facilitating better discrimination in future studies. Third, the high mono‑anchor scores suggest that future ODAQ extensions should treat timbral and spatial dimensions as partially independent factors, possibly by incorporating dedicated spatial‑only distortions or multichannel formats (e.g., 5.1 surround).

In conclusion, the authors successfully augment ODAQ with stereo‑related data, demonstrate that listeners focus primarily on timbral impairments, and show that stereo image differences become salient only when timbral quality is comparable and when listeners can directly compare processing methods. These findings provide a solid foundation for developing objective quality metrics that account for both monaural and spatial aspects of audio, and they point to promising directions for further dataset expansion and methodological refinement.
