Beyond Cosine Similarity: Magnitude-Aware CLIP for No-Reference Image Quality Assessment

Reading time: 2 minutes

📝 Original Info

  • Title: Beyond Cosine Similarity: Magnitude-Aware CLIP for No-Reference Image Quality Assessment
  • ArXiv ID: 2511.09948
  • Date: 2025-11-13
  • Authors: Not listed in the source data.

📝 Abstract

Recent efforts have repurposed the Contrastive Language-Image Pre-training (CLIP) model for No-Reference Image Quality Assessment (NR-IQA) by measuring the cosine similarity between the image embedding and textual prompts such as "a good photo" or "a bad photo." However, this semantic similarity overlooks a critical yet underexplored cue: the magnitude of the CLIP image features, which we empirically find to exhibit a strong correlation with perceptual quality. In this work, we introduce a novel adaptive fusion framework that complements cosine similarity with a magnitude-aware quality cue. Specifically, we first extract the absolute CLIP image features and apply a Box-Cox transformation to statistically normalize the feature distribution and mitigate semantic sensitivity. The resulting scalar summary serves as a semantically-normalized auxiliary cue that complements cosine-based prompt matching. To integrate both cues effectively, we further design a confidence-guided fusion scheme that adaptively weighs each term according to its relative strength. Extensive experiments on multiple benchmark IQA datasets demonstrate that our method consistently outperforms standard CLIP-based IQA and state-of-the-art baselines, without any task-specific training.
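The pipeline described above can be sketched in a few lines. This is a minimal illustration assuming NumPy/SciPy: the magnitude cue takes the absolute values of a CLIP image embedding, applies a Box-Cox transformation, and reduces to a scalar (the mean is an assumption; the abstract does not name the summary statistic), and the fusion step uses softmax weighting as one plausible reading of "adaptively weighs each term according to its relative strength." The exact formulas are in the paper, not here.

```python
import numpy as np
from scipy.stats import boxcox


def magnitude_cue(image_feat, eps=1e-8):
    """Scalar magnitude cue from a CLIP image embedding.

    Follows the abstract's recipe: absolute feature values, Box-Cox
    transform to normalize the distribution, then a scalar summary
    (mean here -- an assumption; the abstract does not specify it).
    """
    abs_feat = np.abs(np.asarray(image_feat, dtype=float)) + eps  # Box-Cox requires positive input
    transformed, _lmbda = boxcox(abs_feat)  # lambda fitted by maximum likelihood
    return float(transformed.mean())


def confidence_fusion(cos_score, mag_score, temperature=1.0):
    """Hypothetical confidence-guided fusion: softmax weights each cue
    by its relative strength, then returns the weighted sum."""
    scores = np.array([cos_score, mag_score], dtype=float)
    weights = np.exp(scores / temperature)
    weights /= weights.sum()
    return float(weights @ scores)


# Stand-in inputs: a random vector in place of a real CLIP embedding,
# and a fixed value in place of cosine(image, "a good photo").
rng = np.random.default_rng(0)
feat = rng.normal(size=512)
cos = 0.31
quality = confidence_fusion(cos, magnitude_cue(feat))
```

Because the softmax weights are a convex combination, the fused score always lies between the two individual cues, so the magnitude term can only shift the cosine-based score, never dominate it arbitrarily.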

💡 Deep Analysis

Figure 1

📄 Full Content


Reference

This content is AI-processed based on open access ArXiv data.
