Is there a relationship between Mean Opinion Score (MOS) and Just Noticeable Difference (JND)?

Reading time: 5 minutes

📝 Original Info

  • Title: Is there a relationship between Mean Opinion Score (MOS) and Just Noticeable Difference (JND)?
  • ArXiv ID: 2602.17010
  • Date: 2026-02-19
  • Authors: Not specified in the provided paper metadata (see the original paper for author names and affiliations).

📝 Abstract

Evaluating perceived video quality is essential for ensuring high Quality of Experience (QoE) in modern streaming applications. While existing subjective datasets and Video Quality Metrics (VQMs) cover a broad quality range, many practical use cases, especially for premium users, focus on high-quality scenarios requiring finer granularity. Just Noticeable Difference (JND) has emerged as a key concept for modeling perceptual thresholds in these high-end regions and plays an important role in perceptual bitrate ladder construction. However, the relationship between JND and the more widely used Mean Opinion Score (MOS) remains unclear. In this paper, we conduct a Degradation Category Rating (DCR) subjective study based on an existing JND dataset to examine how MOS corresponds to the 75% Satisfied User Ratio (SUR) points of the 1st and 2nd JNDs. We find that while MOS values at JND points generally align with theoretical expectations (e.g., 4.75 for the 75% SUR of the 1st JND), the reverse mapping from MOS to JND is ambiguous due to overlapping confidence intervals across PVS indices. Statistical significance analysis further shows that DCR studies with limited participants may not detect meaningful differences between reference and JND videos.

📄 Full Content

Evaluating human-perceived video quality is important for the streaming industry to ensure a good Quality of Experience (QoE) for end-users [1]. Extensive research has been conducted to collect subjective datasets for video quality assessment [2], [3], [4], [5], and advanced objective Video Quality Metrics (VQMs) [6], [7], [8], [9], [10], [11] have been developed using learning-based methods trained on these datasets. These metrics aim to estimate human perception of quality in a computationally efficient and scalable manner, and are widely used in encoding optimization and adaptive streaming systems.

Subjective tests, in which human viewers rate the quality of videos under controlled conditions, provide the ground truth against which objective metrics are evaluated. The effectiveness of a VQM is typically measured by how well its predictions correlate with subjective scores, using statistical measures such as Pearson or Spearman correlation coefficients. High-performing methods aim to achieve strong correlation with human opinion scores, thereby ensuring their reliability in predicting user-perceived quality. As such, the design and benchmarking of objective VQMs remain closely tied to the availability and quality of subjective datasets, which capture nuanced perceptual effects that are difficult to model directly.
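To make this benchmarking step concrete, here is a minimal sketch (not from the paper) of how a VQM's predictions are correlated against subjective scores. The MOS and prediction arrays are hypothetical placeholders, not values from any published dataset.

```python
# Minimal sketch of benchmarking an objective VQM against subjective scores.
# The MOS and prediction values below are hypothetical placeholders.
import numpy as np
from scipy.stats import pearsonr, spearmanr

mos = np.array([4.8, 4.5, 4.1, 3.6, 3.0, 2.4, 1.9])             # subjective scores
vqm_pred = np.array([92.0, 88.5, 81.0, 74.2, 63.0, 55.1, 40.3])  # metric outputs

plcc, _ = pearsonr(vqm_pred, mos)    # Pearson: linear prediction accuracy
srocc, _ = spearmanr(vqm_pred, mos)  # Spearman: rank-order monotonicity

print(f"PLCC  = {plcc:.3f}")
print(f"SROCC = {srocc:.3f}")
```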

Most existing datasets are designed to cover the full range of quality levels. As shown in Table I, a typical question in a subjective video quality study asks participants to rate the video quality using scales ranging from “Excellent” to “Bad” following the ACR-HR (Absolute Category Rating with Hidden Reference) method for single stimuli, or from “Imperceptible” to “Very annoying” according to the DCR (Degradation Category Rating) method for paired stimuli, as recommended by the ITU [12]. The resulting MOS or DMOS (Differential Mean Opinion Score) values serve as ground truth for training and evaluating objective video quality metrics, optimizing encoding pipelines, and more. End-users, especially premium subscribers, expect to enjoy the best possible visual quality on their devices. These users are primarily concerned with whether the delivered video is perceptually indistinguishable from the original, rather than with degradations at the mid or low end of the quality spectrum. Traditional subjective quality assessment methods such as ACR and DCR evaluate across the entire quality range (Table I), from “Bad” to “Excellent” or from “Very annoying” to “Imperceptible.” While effective for general-purpose benchmarking, these methods often lack the granularity required to make fine distinctions among high-quality representations. As a result, multiple high-quality versions may receive similarly high MOS values (e.g., between 4.5 and 5.0), offering limited guidance when optimizing for perceptual transparency or efficient bitrate usage at the top end of the quality ladder.
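As a minimal illustration of how such ratings are aggregated, the sketch below computes a MOS and an ACR-HR-style DMOS from hypothetical per-subject ratings. The rating vectors and the simple clipping choice are assumptions for illustration; real studies additionally apply subject screening before averaging.

```python
# Sketch of MOS / DMOS aggregation on the 5-point ITU scales.
# Ratings are hypothetical; real studies also screen outlier subjects first.
import numpy as np

def mos(ratings):
    """Mean Opinion Score: the average of individual ratings for one PVS."""
    return float(np.mean(np.asarray(ratings, dtype=float)))

def dmos_acr_hr(pvs_ratings, ref_ratings):
    """ACR-HR differential score: DV = V(PVS) - V(REF) + 5 per subject, then
    averaged. Values outside the 1..5 scale are clipped here for simplicity."""
    dv = np.asarray(pvs_ratings, float) - np.asarray(ref_ratings, float) + 5
    return float(np.clip(dv, 1, 5).mean())

ref_scores = [5, 5, 4, 5, 5, 4, 5, 5]   # hidden-reference ratings (hypothetical)
pvs_scores = [4, 5, 4, 4, 5, 4, 4, 5]   # processed-video ratings (hypothetical)

print(f"MOS(PVS)      = {mos(pvs_scores):.2f}")
print(f"DMOS (ACR-HR) = {dmos_acr_hr(pvs_scores, ref_scores):.2f}")
```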

To address this limitation, the concept of Just Noticeable Difference (JND) has gained traction in recent research [13], [14], [15], [16]. JND defines the perceptual threshold at which a viewer just begins to notice a difference compared to an anchor, typically the pristine or reference video for the 1st JND [17], [18]. This is particularly important for applications such as bitrate ladder construction in adaptive streaming systems, where it is critical to determine the point at which increasing the bitrate no longer yields a perceptual benefit [19], [20]. By identifying the boundary between perceptually lossless and lossy representations, JND allows providers to reduce the bitrate of the highest-quality representation without sacrificing visual quality. This not only improves compression efficiency but also reduces bandwidth usage and CDN costs. In this context, JND complements MOS by providing finer resolution around high-quality regions, enabling better-informed decisions in encoding, quality monitoring, and user experience optimization.
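As one illustrative way of putting this ladder argument into code, the sketch below caps the top rung of a candidate bitrate ladder at a hypothetical 1st-JND bitrate. Both the ladder and the JND value are invented for illustration, not taken from the paper.

```python
# Illustrative sketch: cap the top rung of a bitrate ladder at the 1st-JND point.
# Both the candidate ladder and the first-JND bitrate are hypothetical values.

def cap_ladder_at_jnd(ladder_kbps, first_jnd_kbps):
    """Drop rungs above the first-JND bitrate: beyond that point, extra bits no
    longer yield a perceptual benefit over the reference for most viewers."""
    kept = [b for b in sorted(ladder_kbps) if b <= first_jnd_kbps]
    return kept or [min(ladder_kbps)]  # always keep at least the lowest rung

ladder = [1500, 3000, 4500, 6000, 8000, 12000]  # candidate bitrates (kbps)
first_jnd = 6200                                 # hypothetical 1st-JND bitrate

print(cap_ladder_at_jnd(ladder, first_jnd))      # [1500, 3000, 4500, 6000]
```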

One research question we aim to explore in this paper is the relationship between MOS and JND. Specifically, what is the typical MOS value for videos at 1st or 2nd JND quality levels? Should it be 5, 4.5, or another value? (see Fig. 1) To the best of our knowledge, no prior study has explicitly examined this relationship. To address this gap, we design a subjective study based on an existing JND video dataset [21], with the goal of better understanding how MOS and JND relate to each other. Defining a relationship between the two is an open challenge, as it is unclear what MOS score corresponds to a video that sits exactly at the perceptual threshold defined by the 1st JND.
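One way to read the theoretical expectation of 4.75 mentioned in the abstract (our interpretation, not a derivation stated in the text) is as a weighted average of DCR scale values: at the 75% SUR point of the 1st JND, three quarters of viewers see no difference and would rate “Imperceptible” (5), while the rest have just crossed their threshold and would rate “Perceptible but not annoying” (4).

```python
# Back-of-the-envelope check of the expected DCR MOS at the 75% SUR point of the
# 1st JND, under the interpretation described above (an assumption, not a
# derivation given in this section).
satisfied_ratio = 0.75       # viewers who cannot see a difference -> rate 5
just_noticing_ratio = 0.25   # viewers at or just past their 1st JND -> rate 4

expected_mos = satisfied_ratio * 5 + just_noticing_ratio * 4
print(expected_mos)          # 4.75
```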

This section provides an overview of the subjective study design used in this work. We begin by describing the HD-VJND dataset, which forms the foundation of our analysis. We then introduce the concept of the Satisfied User Ratio (SUR), which aggregates individual JND points across observers to yield a group-level measure.
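To make the SUR aggregation concrete, the following sketch computes an empirical SUR curve from hypothetical per-observer 1st-JND points (expressed as QP values) and locates the 75% SUR point. The observer data are invented for illustration, and real analyses commonly fit a parametric model rather than using the raw empirical ratio shown here.

```python
# Minimal sketch of a Satisfied User Ratio (SUR) curve built from per-observer
# 1st-JND points. The QP values are hypothetical placeholders.
import numpy as np

# QP at which each observer first notices a difference from the reference
# (higher QP = stronger compression).
observer_jnd_qp = np.array([26, 28, 29, 30, 31, 32, 34, 36])

def sur(qp, jnd_points):
    """Fraction of observers whose 1st JND lies above this QP, i.e. who still
    cannot distinguish the encoded video from the reference."""
    return float(np.mean(jnd_points > qp))

def qp_at_sur(target, jnd_points):
    """Largest QP at which the SUR still meets the target ratio."""
    qps = range(int(jnd_points.min()) - 1, int(jnd_points.max()) + 1)
    return max(q for q in qps if sur(q, jnd_points) >= target)

print(f"SUR at QP 30:  {sur(30, observer_jnd_qp):.2f}")          # 0.50
print(f"75% SUR point: QP {qp_at_sur(0.75, observer_jnd_qp)}")   # QP 28
```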


This content is AI-processed based on open access ArXiv data.
