Domain Adaptive Hash Retrieval via Prototype-Based Semantic Consistency Alignment
📝 Abstract
Domain adaptive retrieval aims to transfer knowledge from a labeled source domain to an unlabeled target domain, enabling effective retrieval while mitigating domain discrepancies. However, existing methods encounter several fundamental limitations: 1) neglecting class-level semantic alignment and excessively pursuing pair-wise sample alignment; 2) lacking either pseudo-label reliability consideration or geometric guidance for assessing label correctness; 3) directly quantizing original features affected by domain shift, undermining the quality of learned hash codes. In view of these limitations, we propose Prototype-Based Semantic Consistency Alignment (PSCA), a two-stage framework for effective domain adaptive retrieval. In the first stage, a set of orthogonal prototypes directly establishes class-level semantic connections, maximizing inter-class separability while gathering intra-class samples. During prototype learning, geometric proximity provides a reliability indicator for semantic consistency alignment through adaptive weighting of pseudo-label confidences. The resulting membership matrix and prototypes facilitate feature reconstruction, ensuring quantization on reconstructed rather than original features, thereby improving subsequent hash coding quality and seamlessly connecting both stages. In the second stage, domain-specific quantization functions process the reconstructed features under mutual approximation constraints, generating unified binary hash codes across domains. Extensive experiments validate PSCA’s superior performance across multiple datasets.
📄 Content
Hashing receives extensive attention in the field of image retrieval due to its merits of compact storage and computational efficiency. The main purpose of hashing is to develop effective hash functions that preserve similarity relationships of original data in binary Hamming space. Several methods, such as Spectral hashing (SH) (Weiss, Torralba, and Fergus 2008), Density Sensitive Hashing (DSH) (Liu et al. 2016) and Scalable Graph Hashing (SGH) (Jiang and Li 2015) endeavor to preserve pair-wise similarity of original data within the Hamming space. Ordinal Constraint Hashing (OCH) (Liu et al. 2018) introduces the ordinal relation in learning to hash. Iterative Quantization (ITQ) (Gong et al. 2012) focuses on maintaining the locality structure by improving the consistency between generated discrete codes and their corresponding continuous representations.
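To make the goal of similarity-preserving hashing concrete, here is a minimal sketch using random hyperplane projections, a classic baseline rather than any of the cited methods; all variable names, dimensions, and data are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def lsh_hash(X, W):
    """Sign-of-random-projection hashing: bit k of x is sign(x . w_k)."""
    return (X @ W > 0).astype(np.uint8)

def hamming(u, v):
    """Hamming distance between two binary codes."""
    return int(np.sum(u != v))

# Toy data: two nearby points and one unrelated point in 32-D.
d, n_bits = 32, 64
W = rng.standard_normal((d, n_bits))   # random hyperplanes
a = rng.standard_normal(d)
b = a + 0.05 * rng.standard_normal(d)  # a slight perturbation of a
c = rng.standard_normal(d)             # an unrelated point

codes = lsh_hash(np.stack([a, b, c]), W)

# Nearby points collide on most bits; unrelated ones differ on roughly half.
print(hamming(codes[0], codes[1]), hamming(codes[0], codes[2]))
```

The learned-hashing methods above (SH, DSH, SGH, ITQ) replace these random hyperplanes with data-dependent projections, but the evaluation criterion is the same: small distances in the original space should map to small Hamming distances.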
Nonetheless, these methods assume that queries and retrieved images share consistent data distributions, limiting their applicability in complex real-world scenarios. For instance, online shopping platforms showcase product images shot under ideal conditions, whereas user-submitted query photos typically contain cluttered backgrounds. To bridge this non-negligible domain gap (Hu et al. 2025a), Domain Adaptation (DA) (Zhang et al. 2023c,b) is combined with hashing, giving rise to a promising research field, i.e., Domain Adaptive Retrieval (DAR).
DAR encompasses two retrieval scenarios, i.e., single-domain retrieval and cross-domain retrieval. The former supposes that both queries and retrieved samples originate from the target domain; in cross-domain retrieval, the source domain serves as the retrieved dataset while queries stem from the target domain. Recently, several DAR methods have been proposed. Probability Weighted Compact Feature (PWCF) (Huang et al. 2020) utilizes a focal-triplet constraint to mitigate the domain gap in a domain-invariant subspace. Domain Adaptation Preconceived Hashing (DAPH) (Huang, Zhang, and Gao 2021) introduces Maximum Mean Discrepancy (MMD) (Gretton et al. 2012) to promote domain marginal distribution alignment. These geometry-oriented methods lack consideration of semantic relationships between features and labels, resulting in suboptimal performance when significant semantic misalignment exists. Consequently, subsequent methods shift their focus toward incorporating semantic guidance. Two-Step Strategy (TSS) (Chen et al. 2023) proposes a discriminative semantic fusion for hash learning. Semantic Guided Hashing Learning (SGHL) (Zhang et al. 2023a) and Dynamic Confidence Sampling and Label Semantic Guidance (DCS-LSG) (Zhang et al. 2024) further align the cross-domain conditional distributions by integrating category labels.
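The MMD criterion that DAPH uses for marginal distribution alignment can be sketched as follows. This is a generic (biased) RBF-kernel estimator of squared MMD, not DAPH's exact formulation; the bandwidth and toy feature distributions are assumptions chosen so the domain shift is visible:

```python
import numpy as np

def mmd2_rbf(X, Y, gamma=0.05):
    """Biased estimator of squared MMD between samples X and Y
    under an RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    def k(A, B):
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq)
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(1)
src = rng.normal(0.0, 1.0, size=(200, 8))       # "source" features
tgt_same = rng.normal(0.0, 1.0, size=(200, 8))  # same distribution
tgt_shift = rng.normal(2.0, 1.0, size=(200, 8)) # shifted "target" domain

# MMD is near zero for matched distributions and grows with domain shift,
# which is why minimizing it pulls the two domains' features together.
print(mmd2_rbf(src, tgt_same), mmd2_rbf(src, tgt_shift))
```

Alignment methods built on MMD treat this quantity as a training loss over learned features, driving it toward zero so that source and target marginals become indistinguishable under the kernel.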
Despite their promising performance, we identify certain critical limitations of current DAR methods: 1) excessive focus on pair-wise sample alignment. Specifically, PWCF, TSS, SGHL and DCS-LSG primarily minimize distribution discrepancies between semantically consistent sample pairs, an approach that suffers from computational inefficiency and limited distributional coverage of the data (Yuan et al. 2020). 2) inadequate handling of pseudo-label reliability. Pseudo-labeling predicts the latent semantic associations between classes and unlabeled data, thereby providing fully annotated data to facilitate knowledge transfer. However, existing methods typically adopt off-the-shelf strategies without correction mechanisms for erroneous predictions, which inevitably leads to biased domain alignment and degraded hash code quality. Although the most recent method, DCS-LSG, considers pseudo-label noise, it relies solely on semantic consensus between dual projections, without incorporating geometric knowledge for reliability assessment. 3) directly mapping original features with imperfect domain alignment to Hamming space, resulting in high quantization errors and limited discriminative power of the generated codes.
To systematically tackle the limitations mentioned above, we propose the Prototype-Based Semantic Consistency Alignment (PSCA) framework. The core innovation lies in a semantic consistency alignment that evaluates pseudo-label reliability by comparing geometric proximity with semantic predictions, adaptively weighting the pseudo-labels. To be precise, in the first stage, PSCA establishes orthogonal class prototypes within a domain-shared subspace, where the semantic consistency alignment performs as follows: when geometric and semantic indicators agree, semantic weights are adjusted based on decision margins, as larger margins reflect stronger prediction confidence; when they conflict, semantic contribution is reduced proportionally to the disagreement level. This process derives a soft membership matrix that guides prototype learning in turn, thereby capturing more reliable semantic connections that mitigate error propagation.
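The agreement/conflict weighting rule described above can be illustrated with a hypothetical sketch. The paper does not give its exact formulas in this excerpt, so the specific choices below (softmax margin as the confidence signal, down-weighting by the probability mass on the geometric class under conflict) are illustrative assumptions, not PSCA's actual implementation:

```python
import numpy as np

def pseudo_label_weights(feats, prototypes, sem_probs):
    """Illustrative reliability weights for pseudo-labels.

    feats:      (n, d) target-domain features
    prototypes: (c, d) class prototypes in the shared subspace
    sem_probs:  (n, c) softmax outputs of a semantic classifier

    Sketched rule: if the geometrically nearest prototype agrees with
    the semantic argmax, weight the pseudo-label by its decision margin
    (top-1 minus top-2 probability), since larger margins indicate
    stronger confidence; on conflict, reduce the weight to the (small)
    probability the classifier assigns to the geometric class.
    """
    dists = ((feats[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)
    geo = dists.argmin(1)                       # geometric label
    sem = sem_probs.argmax(1)                   # semantic label
    sorted_p = np.sort(sem_probs, axis=1)
    margin = sorted_p[:, -1] - sorted_p[:, -2]  # decision margin
    agree = geo == sem
    w = np.where(agree, margin,
                 sem_probs[np.arange(len(feats)), geo])
    return geo, sem, w
```

Under this kind of rule, samples whose geometric and semantic evidence disagree contribute little to prototype learning, which is the error-mitigation behavior the paragraph describes.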
After stage one, the membership matrix and prototypes reconstruct the features on which the second-stage quantization operates.
This content is AI-processed based on ArXiv data.