Prominence-Aware Artifact Detection and Dataset for Image Super-Resolution


Generative single-image super-resolution (SISR) is advancing rapidly, yet even state-of-the-art models produce visual artifacts: unnatural patterns and texture distortions that degrade perceived quality. These defects vary widely in perceptual impact: some are barely noticeable, while others are highly disturbing. Existing detection methods nonetheless treat them all equally. We propose characterizing artifacts by their prominence to human observers rather than as uniform binary defects. We present a novel dataset of 1,302 artifact examples from 11 SISR methods annotated with crowdsourced prominence scores, and provide prominence annotations for 593 existing artifacts from the DeSRA dataset, revealing that 48% of them go unnoticed by most viewers. Building on this data, we train a lightweight regressor that produces spatial prominence heatmaps. We demonstrate that our method outperforms existing detectors and effectively guides SR model fine-tuning for artifact suppression. Our dataset and code are available at https://tinyurl.com/2u9zxtyh.


💡 Research Summary

The paper addresses a critical gap in the evaluation of generative single‑image super‑resolution (SISR) models: existing artifact detectors treat all visual defects as equally important, while human observers perceive a wide range of prominence. To bridge this gap, the authors introduce the concept of “artifact prominence” – a measure of how noticeable an artifact is to a typical viewer – and build the first large‑scale dataset annotated with such scores.

Dataset construction: Starting from 2,101 images drawn from the Open Images collection, the authors applied eleven state‑of‑the‑art SISR methods (including GFPGAN, SwinIR, SUPIR, RLFN, among others) to generate 23,111 upscaled images. From these, 1,302 artifact instances were identified using a combination of automatic quality metrics (DISTS, LPIPS, LDL) and manual screening. Each instance was presented to 30 crowdworkers on Toloka.ai, who answered whether the highlighted region contained a distortion. The proportion of “yes” votes defines the prominence score (0–100 %). In addition, the authors re‑annotated all 593 artifacts from the existing DeSRA dataset, discovering that nearly half of those binary‑mask artifacts have prominence below 50 %, i.e., they are largely unnoticed.
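The vote-to-score mapping described above is simple to make concrete. The sketch below computes a prominence score as the fraction of "yes" (distortion visible) votes; the function name and input format are illustrative, not taken from the paper's released code.

```python
def prominence_score(votes):
    """Fraction of "yes" (distortion visible) votes, as a percentage.

    `votes` is one boolean per crowdworker; the paper collects
    30 votes per artifact instance via Toloka.ai.
    """
    if not votes:
        raise ValueError("need at least one vote")
    return 100.0 * sum(votes) / len(votes)

# Example: 18 of 30 workers noticed the artifact.
votes = [True] * 18 + [False] * 12
score = prominence_score(votes)  # 60.0 -> "largely noticed"
```

Under this scheme, the re-annotated DeSRA artifacts with scores below 50% are the ones the summary describes as "largely unnoticed."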

To make the binary masks more interpretable for crowdworkers, a simple morphological post‑processing pipeline (open → dilate → close) is applied, which modestly improves the consistency of prominence ratings.
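The open → dilate → close pipeline can be sketched with standard morphology routines. The version below uses `scipy.ndimage` with default structuring elements and illustrative iteration counts; the paper's exact kernel sizes are not specified here, so treat these settings as placeholders.

```python
import numpy as np
from scipy import ndimage

def refine_mask(mask, dilate_iters=2):
    """Clean a binary artifact mask with an open -> dilate -> close pipeline.

    Opening removes isolated false-positive pixels, dilation merges nearby
    fragments into one contiguous highlighted region, and closing fills
    remaining holes, yielding a blob that is easier for crowdworkers to judge.
    """
    m = ndimage.binary_opening(mask, iterations=1)
    m = ndimage.binary_dilation(m, iterations=dilate_iters)
    m = ndimage.binary_closing(m, iterations=1)
    return m

mask = np.zeros((16, 16), dtype=bool)
mask[4:8, 4:8] = True   # genuine artifact region, survives refinement
mask[12, 12] = True     # single-pixel noise, removed by opening
refined = refine_mask(mask)
```

The design choice matters for annotation quality: a speckled raw mask invites inconsistent answers, whereas one solid highlighted region gives every worker the same target.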

The core technical contribution is a lightweight prominence‑prediction model. Three complementary features are extracted on a block‑wise basis: (1) DISTS, a perceptual similarity metric sensitive to texture anomalies; (2) ssm_jup, an adaptation of the LDL‑based small‑color‑artifact detector extended to all RGB channels; and (3) bd_jup, a weighted combination of LPIPS and ERQA, capturing both perceptual fidelity and edge preservation. These three scalar maps are fed into a shallow multilayer perceptron (MLP) with architecture 3‑128‑128‑1, producing a spatial prominence heatmap for any SR output given only the low‑resolution input. The model is extremely compact (~0.2 M parameters) and runs in real time.
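The per-location regression can be sketched as a plain forward pass: each spatial position contributes a 3-vector (DISTS, ssm_jup, bd_jup) that the 3-128-128-1 MLP maps to a scalar. The NumPy version below uses random untrained weights as a stand-in for the learned model, and the sigmoid output squashing is an assumption, not confirmed by the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Randomly initialised weights for a 3-128-128-1 MLP (untrained stand-in).
W1, b1 = rng.normal(0, 0.1, (3, 128)),   np.zeros(128)
W2, b2 = rng.normal(0, 0.1, (128, 128)), np.zeros(128)
W3, b3 = rng.normal(0, 0.1, (128, 1)),   np.zeros(1)

def prominence_heatmap(dists, ssm_jup, bd_jup):
    """Map three HxW feature maps to one HxW prominence heatmap.

    Each location's 3-vector is pushed through the shallow MLP
    independently; a sigmoid keeps the output in [0, 1].
    """
    x = np.stack([dists, ssm_jup, bd_jup], axis=-1)  # (H, W, 3)
    h = np.maximum(x @ W1 + b1, 0)                   # ReLU
    h = np.maximum(h @ W2 + b2, 0)                   # ReLU
    y = h @ W3 + b3                                  # (H, W, 1)
    return 1.0 / (1.0 + np.exp(-y[..., 0]))          # sigmoid -> (H, W)

# Example: 32x32 feature maps yield a 32x32 heatmap.
heat = prominence_heatmap(*(rng.random((32, 32)) for _ in range(3)))
```

Because the MLP sees only three scalars per location, nearly all of the compute sits in the feature extraction, which is consistent with the summary's claim of a compact, real-time model.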

Extensive evaluation shows that the proposed method outperforms prior artifact detectors (LDL, DeSRA, PAL4VST) in terms of Pearson and Spearman correlation with human prominence scores, achieving improvements of 0.12–0.15 on average. Subjective user studies confirm higher agreement with ground‑truth prominence. Moreover, the heatmaps can be incorporated as spatial weighting in the loss function of an SR network; fine‑tuning with this guidance reduces highly prominent artifacts while preserving overall perceptual quality, as measured by LPIPS and DISTS.
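One simple way to use the heatmap as a spatial loss weight during fine-tuning is shown below. The linear `1 + alpha * heatmap` weighting and the L1 base loss are illustrative choices, not the paper's exact fine-tuning objective.

```python
import numpy as np

def prominence_weighted_l1(sr, hr, heatmap, alpha=4.0):
    """L1 loss where each pixel is up-weighted by predicted prominence.

    `heatmap` is a prominence map in [0, 1]; `alpha` controls how much
    extra penalty highly prominent regions receive.
    """
    weights = 1.0 + alpha * heatmap
    return float(np.mean(weights * np.abs(sr - hr)))

sr = np.full((8, 8), 0.5)
hr = np.zeros((8, 8))
flat  = prominence_weighted_l1(sr, hr, np.zeros((8, 8)))  # nothing flagged
spiky = prominence_weighted_l1(sr, hr, np.ones((8, 8)))   # all flagged
```

With identical pixel errors, the fully flagged image incurs a strictly larger loss, so gradient descent preferentially suppresses errors in regions viewers actually notice.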

A systematic analysis of the eleven SR methods using the new prominence metric reveals that even the latest high‑performance models (e.g., SUPIR) frequently generate noticeable artifacts, challenging the assumption that strong scores on metrics such as PSNR or LPIPS automatically imply better visual fidelity.

The authors release the 1,302‑sample prominence‑annotated dataset, the re‑annotated DeSRA set, the post‑processing code, and the trained MLP model at the provided URL. By shifting the focus from binary artifact detection to human‑centric prominence estimation, this work offers a more nuanced evaluation tool and a practical pathway for artifact‑aware SR model improvement.

