Is there a relationship between Mean Opinion Score (MOS) and Just Noticeable Difference (JND)?
Evaluating perceived video quality is essential for ensuring high Quality of Experience (QoE) in modern streaming applications. While existing subjective datasets and Video Quality Metrics (VQMs) cover a broad quality range, many practical use cases, especially for premium users, focus on high-quality scenarios requiring finer granularity. Just Noticeable Difference (JND) has emerged as a key concept for modeling perceptual thresholds in these high-end regions and plays an important role in perceptual bitrate ladder construction. However, the relationship between JND and the more widely used Mean Opinion Score (MOS) remains unclear. In this paper, we conduct a Degradation Category Rating (DCR) subjective study based on an existing JND dataset to examine how MOS corresponds to the 75% Satisfied User Ratio (SUR) points of the 1st and 2nd JNDs. We find that while MOS values at JND points generally align with theoretical expectations (e.g., 4.75 for the 75% SUR of the 1st JND), the reverse mapping from MOS to JND is ambiguous due to overlapping confidence intervals across PVS indices. Statistical significance analysis further shows that DCR studies with limited participants may not detect meaningful differences between reference and JND videos.
💡 Research Summary
The paper investigates the quantitative relationship between Mean Opinion Score (MOS) and Just Noticeable Difference (JND) in the context of high‑quality video streaming, where premium users demand perceptually lossless playback. Existing video quality datasets and objective metrics (VQMs) cover a broad quality spectrum but lack the granularity needed to distinguish subtle differences at the top end. JND, defined as the point at which a viewer first perceives a difference relative to an anchor (usually the pristine source), has become a key concept for constructing perceptual bitrate ladders. However, the mapping between JND and the widely used MOS has never been explicitly studied.
To fill this gap, the authors built a Degradation Category Rating (DCR) subjective test on top of the publicly available HD‑VJND dataset. The HD‑VJND set contains 180 source videos (10 s, 1080p), each encoded at CRF values from 17 to 31 in 0.25 steps, at both 1080p and 720p. JND points were obtained using Robust Binary Search (RBS), and the Satisfied User Ratio (SUR) was introduced to aggregate individual JNDs into a group‑level probability curve. A 75 % SUR threshold is commonly used as the anchor for the next JND level.
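The SUR aggregation described above can be sketched in a few lines: for each candidate CRF, the SUR is the fraction of viewers whose individual JND lies at or beyond that CRF (i.e., who still perceive no difference), and the 75 % anchor is the highest CRF at which the SUR stays at or above 0.75. The per-viewer JND values below are illustrative placeholders, not data from HD‑VJND.

```python
import numpy as np

# Hypothetical per-viewer JND points: the CRF at which each viewer first
# notices a difference from the pristine source (illustrative values only).
jnd_crfs = np.array([21.0, 22.5, 23.0, 23.5, 24.0, 24.5, 25.0, 26.0, 27.0, 28.0])

def sur(crf, jnd_points):
    """Satisfied User Ratio: fraction of viewers whose JND is at or above
    this CRF, i.e. who still see no difference at this quality level."""
    return float(np.mean(jnd_points >= crf))

# Scan the CRF range used by HD-VJND (17..31 in 0.25 steps) and pick the
# highest CRF whose SUR is still >= 75% -- the anchor for the next JND level.
crf_grid = np.arange(17.0, 31.25, 0.25)
sur_curve = np.array([sur(c, jnd_crfs) for c in crf_grid])
crf_75 = crf_grid[sur_curve >= 0.75].max()
```

With the illustrative JNDs above, `crf_75` comes out at 23.0: eight of ten viewers (80 %) have a JND at or above CRF 23.0, but only seven of ten (70 %) at CRF 23.25.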
In the DCR experiment, 20 participants (10 households, two per household) evaluated nine Processed Video Sequences (PVS) per source: the reference (PVS 0), the 75 % SUR point of the 1st JND (PVS 1), the 75 % SUR points of the 2nd JND at 1080p (PVS 2) and 720p (PVS 3), four intermediate quality levels between PVS 1 and PVS 2, and two boundary points (CRF − 1 of PVS 1 and CRF + 1 of PVS 3). Participants used the DCR five‑point scale ranging from “Imperceptible” (5) to “Very annoying” (1). MOS values were normalized so that the reference received a score of 5.
Theoretical expectations: for a video at the 75 % SUR of the 1st JND, 75 % of observers should rate it “Imperceptible” (5) and 25 % “Perceptible but not annoying” (4), yielding an ideal MOS of 4.75. For the 2nd JND, because the reference remains the pristine source, the ideal MOS drops by the same 0.25 to 4.50.
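The expected MOS at a 75 % SUR point follows directly from the rating split: under the idealized assumption that every satisfied viewer rates 5 (“Imperceptible”) and every unsatisfied viewer rates one step lower at 4 (“Perceptible but not annoying”), the expectation is a simple weighted average. A minimal check of that arithmetic:

```python
def expected_mos(sur, score_satisfied=5, score_unsatisfied=4):
    """Idealized MOS at a given SUR: a weighted average of the rating
    given by satisfied viewers and the next rating down for the rest."""
    return sur * score_satisfied + (1 - sur) * score_unsatisfied

mos_1st_jnd = expected_mos(0.75)   # 0.75*5 + 0.25*4 = 4.75
# The reference stays the pristine source for the 2nd JND, so the ideal
# score drops by the same 0.25 step, to 4.50.
mos_2nd_jnd = mos_1st_jnd - 0.25
```

This makes explicit why 4.75 and 4.50 are the theoretical anchors the empirical MOS values are compared against.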
Empirical results: the measured MOS for the 1st JND point was 4.79 (95 % CI