Subjective and Objective Quality Assessment of Image: A Survey

With the increasing demand for image-based applications, efficient and reliable evaluation of image quality has grown in importance. Measuring image quality is fundamental to numerous image processing applications, where the goal of image quality assessment (IQA) methods is to automatically evaluate the quality of images in agreement with human quality judgments. Numerous IQA methods have been proposed over the past years to fulfill this goal. In this paper, a survey of quality assessment methods for conventional image signals, as well as newly emerged ones, which include high dynamic range (HDR) and 3-D images, is presented. A comprehensive explanation of subjective and objective IQA and their classification is provided. Six widely used subjective quality datasets and several performance measures are reviewed. Emphasis is given to full-reference image quality assessment (FR-IQA) methods, and nine often-used quality measures (including mean squared error (MSE), structural similarity index (SSIM), multi-scale structural similarity index (MS-SSIM), visual information fidelity (VIF), most apparent distortion (MAD), feature similarity measure (FSIM), feature similarity measure for color images (FSIMC), dynamic range independent measure (DRIM), and tone-mapped image quality index (TMQI)) are carefully described, and their performance and computation time on four subjective quality datasets are evaluated. Furthermore, a brief introduction to 3-D IQA is provided, and the issues related to this area of research are reviewed.


💡 Research Summary

The surveyed paper provides a comprehensive overview of image quality assessment (IQA), covering both subjective and objective methodologies and extending the discussion to emerging domains such as high‑dynamic‑range (HDR) and three‑dimensional (3‑D) imaging. It begins by emphasizing the growing importance of reliable IQA in modern image‑centric applications and outlines the fundamental goal: to develop automatic metrics that correlate well with human visual perception.

In the subjective assessment section, the authors describe the standard experimental protocols for gathering human judgments, including the use of Mean Opinion Score (MOS) and Differential MOS (DMOS) as quantitative descriptors. They discuss critical design choices—number of observers, display calibration, rating scales, and statistical analysis—and review six widely used benchmark datasets (e.g., LIVE, TID2013, CSIQ, IVC, VCL‑A, VCL‑B). Performance on these datasets is typically reported using Pearson’s correlation coefficient (PCC), Spearman’s rank correlation coefficient (SRCC), and Kendall’s rank correlation coefficient (KRCC).
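As an illustrative sketch (not code from the paper), the three correlation measures can be computed with SciPy's `pearsonr`, `spearmanr`, and `kendalltau`; the score arrays below are hypothetical metric predictions and MOS values, used purely for demonstration:

```python
# Sketch: comparing objective metric scores against subjective MOS values
# using the three correlation measures named above. The arrays are
# hypothetical, for illustration only.
from scipy.stats import pearsonr, spearmanr, kendalltau

mos = [3.1, 4.5, 2.2, 3.8, 1.9, 4.9]           # subjective mean opinion scores
metric = [0.62, 0.91, 0.40, 0.75, 0.35, 0.97]  # objective metric predictions

pcc, _ = pearsonr(metric, mos)     # linearity of the relationship
srcc, _ = spearmanr(metric, mos)   # monotonicity (rank order)
krcc, _ = kendalltau(metric, mos)  # pairwise rank agreement

print(f"PCC={pcc:.3f}  SRCC={srcc:.3f}  KRCC={krcc:.3f}")
```

SRCC and KRCC depend only on rank order, so they are insensitive to any monotonic nonlinearity between metric scores and MOS, while PCC rewards a linear fit.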

The objective assessment portion classifies IQA algorithms into three categories: full‑reference (FR), reduced‑reference (RR), and no‑reference (NR). The paper concentrates on FR‑IQA because it offers the most direct comparison between a pristine reference and a distorted test image. Nine representative FR metrics are examined in depth:

  1. Mean Squared Error (MSE) / Peak Signal‑to‑Noise Ratio (PSNR) – simple pixel‑wise error measures that ignore perceptual factors.
  2. Structural Similarity Index (SSIM) – evaluates luminance, contrast, and structural similarity, aligning more closely with human perception.
  3. Multi‑Scale SSIM (MS‑SSIM) – extends SSIM across multiple spatial scales to capture both fine‑grained and coarse structures.
  4. Visual Information Fidelity (VIF) – grounded in information theory, quantifies the amount of visual information that can be extracted from the distorted image relative to the reference.
  5. Most Apparent Distortion (MAD) – a hybrid model that switches between detection‑based and appearance‑based sub‑models depending on distortion visibility.
  6. Feature Similarity Index (FSIM) – leverages phase congruency and gradient magnitude to assess edge and texture fidelity.
  7. FSIM for Color (FSIM‑C) – incorporates chromatic information to improve assessment of color images.
  8. Dynamic Range Independent Measure (DRIM) – designed for HDR content; it evaluates perceptual errors independent of absolute luminance levels.
  9. Tone‑Mapped Image Quality Index (TMQI) – specifically targets tone‑mapped HDR images, balancing structural fidelity and naturalness.
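To make the contrast between pixel-wise and structural measures concrete, here is a minimal NumPy sketch of MSE/PSNR alongside a simplified SSIM computed from global image statistics. Real SSIM implementations compute local statistics over a sliding Gaussian window and average the resulting map, so this single-window variant is an illustration of the formula only:

```python
import numpy as np

def mse(ref, dist):
    """Pixel-wise mean squared error; ignores perceptual factors."""
    return np.mean((ref.astype(np.float64) - dist.astype(np.float64)) ** 2)

def psnr(ref, dist, peak=255.0):
    """Peak signal-to-noise ratio in dB, derived directly from MSE."""
    err = mse(ref, dist)
    return float("inf") if err == 0 else 10.0 * np.log10(peak ** 2 / err)

def ssim_global(ref, dist, peak=255.0):
    """Simplified SSIM from global statistics instead of the usual
    local sliding window -- illustration only, not the full metric."""
    x, y = ref.astype(np.float64), dist.astype(np.float64)
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2  # standard stabilizers
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

rng = np.random.default_rng(0)
ref = rng.integers(0, 256, (64, 64)).astype(np.float64)
noisy = np.clip(ref + rng.normal(0, 10, ref.shape), 0, 255)
print(f"PSNR={psnr(ref, noisy):.2f} dB  SSIM~{ssim_global(ref, noisy):.3f}")
```

The three SSIM factors (luminance via the means, contrast via the variances, structure via the covariance) are visible in the numerator/denominator pairing, which is precisely what PSNR lacks.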

For each metric, the authors present the underlying mathematical formulation, discuss how it models aspects of the human visual system (HVS), and note its computational complexity. They then conduct empirical evaluations on four well‑known subjective datasets (LIVE, TID2013, CSIQ, IVC). Correlation results show that SSIM, MS‑SSIM, VIF, and FSIM consistently achieve the highest agreement with human scores, while DRIM excels primarily on HDR‑specific content. In terms of runtime, simple error‑based metrics (MSE/PSNR) are fastest, whereas FSIM‑C incurs the highest computational load, making it more suitable for offline analysis.
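The runtime ordering can be sensed with a rough, self-contained sketch: a pixel-wise MSE pass is a single vectorized subtraction, whereas SSIM-family metrics additionally need several sliding-window statistics passes over the image. Absolute numbers depend entirely on hardware and image size; only the relative cost is the point here:

```python
import time
import numpy as np
from scipy.ndimage import uniform_filter

rng = np.random.default_rng(1)
img_a = rng.random((512, 512))
img_b = rng.random((512, 512))

t0 = time.perf_counter()
_ = np.mean((img_a - img_b) ** 2)  # pixel-wise MSE: one pass
t_mse = time.perf_counter() - t0

t0 = time.perf_counter()
# The kind of local statistics SSIM-family metrics require: local means,
# variances, and covariance, each a separate filtering pass.
mu_a = uniform_filter(img_a, size=11)
mu_b = uniform_filter(img_b, size=11)
var_a = uniform_filter(img_a ** 2, size=11) - mu_a ** 2
cov = uniform_filter(img_a * img_b, size=11) - mu_a * mu_b
t_win = time.perf_counter() - t0

print(f"MSE pass: {t_mse*1e3:.2f} ms; windowed statistics: {t_win*1e3:.2f} ms")
```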

The final section offers a concise introduction to 3‑D IQA. The authors point out that most existing 3‑D quality metrics are extensions of 2‑D methods, often aggregating the quality of left‑ and right‑eye views. However, true 3‑D perception involves additional factors such as disparity, depth cues, and binocular rivalry, which are not captured by straightforward 2‑D extensions. Current research is exploring binocular HVS models, depth‑sensitive error metrics, and perceptual weighting schemes that reflect how viewers integrate stereoscopic information. The paper highlights the need for unified frameworks that can simultaneously address HDR, 3‑D, and emerging immersive formats (e.g., VR/AR).
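A minimal sketch of the naive 2-D-extension approach the authors critique, scoring each eye's view independently with a 2-D metric and averaging, is shown below. Per-view PSNR is used as the base metric for simplicity; the point is what the scheme ignores, not the specific metric:

```python
import numpy as np

def psnr(ref, dist, peak=255.0):
    err = np.mean((ref.astype(np.float64) - dist.astype(np.float64)) ** 2)
    return float("inf") if err == 0 else 10.0 * np.log10(peak ** 2 / err)

def stereo_psnr(ref_l, ref_r, dist_l, dist_r):
    """Naive stereo score: average of the two monocular scores.
    Ignores disparity, depth cues, and binocular rivalry entirely."""
    return 0.5 * (psnr(ref_l, dist_l) + psnr(ref_r, dist_r))

rng = np.random.default_rng(2)
left = rng.integers(0, 256, (64, 64)).astype(np.float64)
right = np.roll(left, 4, axis=1)  # crude horizontal-disparity stand-in
noisy_l = np.clip(left + rng.normal(0, 5, left.shape), 0, 255)
noisy_r = np.clip(right + rng.normal(0, 5, right.shape), 0, 255)
print(f"stereo PSNR ~ {stereo_psnr(left, right, noisy_l, noisy_r):.2f} dB")
```

Because the score is a symmetric average, it cannot distinguish a distortion split evenly across both views from one concentrated in a single view, even though binocular rivalry makes those two cases look very different to a human observer.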

Overall, the survey serves as a valuable reference for researchers and practitioners seeking to understand the landscape of IQA techniques, their theoretical foundations, practical performance, and the challenges that lie ahead in assessing increasingly complex visual media.