ConceptScope: Characterizing Dataset Bias via Disentangled Visual Concepts
📝 Original Info
- Title: ConceptScope: Characterizing Dataset Bias via Disentangled Visual Concepts
- ArXiv ID: 2510.26186
- Date: 2025-10-30
- Authors: Not provided (no author information is listed in this summary).
📝 Abstract
Dataset bias, where data points are skewed toward certain concepts, is ubiquitous in machine learning datasets. Yet systematically identifying these biases is challenging without costly, fine-grained attribute annotations. We present ConceptScope, a scalable and automated framework for analyzing visual datasets by discovering and quantifying human-interpretable concepts using Sparse Autoencoders trained on representations from vision foundation models. ConceptScope categorizes concepts into target, context, and bias types based on their semantic relevance and statistical correlation to class labels, enabling class-level dataset characterization, bias identification, and robustness evaluation through concept-based subgrouping. We validate that ConceptScope captures a wide range of visual concepts, including objects, textures, backgrounds, facial attributes, emotions, and actions, through comparisons with annotated datasets. Furthermore, we show that concept activations produce spatial attributions that align with semantically meaningful image regions. ConceptScope reliably detects known biases (e.g., background bias in Waterbirds) and uncovers previously unannotated ones (e.g., co-occurring objects in ImageNet), offering a practical tool for dataset auditing and model diagnostics.
💡 Deep Analysis
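The abstract's categorization step — typing each discovered concept as target, context, or bias from its semantic relevance and its statistical correlation with class labels — can be illustrated with a minimal sketch. This is not the paper's implementation: the function name, the pre-computed `relevant_to_class` flags (a stand-in for the paper's semantic-relevance judgment), and the correlation threshold are all assumptions for illustration; the SAE activations are assumed to be given.

```python
import numpy as np

def categorize_concepts(activations, labels, relevant_to_class, corr_threshold=0.5):
    """Toy sketch of ConceptScope-style concept typing (hypothetical API).

    activations: (n_samples, n_concepts) SAE concept activations
    labels: (n_samples,) binary membership in the class of interest
    relevant_to_class: (n_concepts,) bool stand-in for semantic relevance
    """
    labels = labels.astype(float)
    categories = []
    for j in range(activations.shape[1]):
        a = activations[:, j]
        # Pearson correlation between concept activation and class label
        corr = 0.0 if a.std() == 0 else float(np.corrcoef(a, labels)[0, 1])
        if relevant_to_class[j]:
            categories.append("target")    # semantically tied to the class
        elif abs(corr) >= corr_threshold:
            categories.append("bias")      # correlated but not semantically relevant
        else:
            categories.append("context")   # present but neither relevant nor correlated
    return categories
```

In this toy framing, a "bias" concept is exactly one that co-occurs with the class label without being part of the class definition — e.g., a water background firing on waterbird images.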
Note
This content is AI-processed based on open access ArXiv data.