Beyond Pixels: A Training-Free, Text-to-Text Framework for Remote Sensing Image Retrieval


📝 Original Info

  • Title: Beyond Pixels: A Training-Free, Text-to-Text Framework for Remote Sensing Image Retrieval
  • ArXiv ID: 2512.10596
  • Date: 2025-12-11
  • Authors: J. Xiao, Y. Guo, X. Zi, K. Thiyagarajan, C. Moreira, M. Prasad

📝 Abstract

Semantic retrieval of remote sensing (RS) images is a critical task fundamentally challenged by the "semantic gap": the discrepancy between a model's low-level visual features and high-level human concepts. While large Vision-Language Models (VLMs) offer a promising path to bridge this gap, existing methods often rely on costly, domain-specific training, and there is a lack of benchmarks to evaluate the practical utility of VLM-generated text in a zero-shot retrieval context. To address this research gap, we introduce the Remote Sensing Rich Text (RSRT) dataset, a new benchmark featuring multiple structured captions per image. Based on this dataset, we propose a fully training-free, text-only retrieval framework called TRSLLaVA. Our methodology reformulates cross-modal retrieval as a text-to-text (T2T) matching problem, leveraging rich text descriptions as queries against a database of VLM-generated captions within a unified textual embedding space. This approach completely bypasses model training or fine-tuning. Experiments on the RSITMD and RSICD benchmarks show our training-free method is highly competitive with state-of-the-art supervised models. For instance, on RSITMD, our method achieves a mean ...
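The T2T reformulation described above can be illustrated with a minimal sketch: each database image is represented only by its VLM-generated caption, and retrieval becomes ranking those captions against a text query by similarity in a shared textual space. The paper uses a pretrained text embedding model; the toy version below substitutes simple term-frequency vectors and cosine similarity purely for illustration, and all image IDs and captions are hypothetical.

```python
# Toy sketch of training-free text-to-text (T2T) retrieval:
# rank VLM-generated captions against a text query.
# The real method embeds texts with a pretrained encoder; here we use
# bag-of-words term-frequency vectors as a stand-in embedding space.
import math
from collections import Counter

def embed(text):
    """Stand-in 'textual embedding': term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, caption_db):
    """Return image IDs ranked by caption similarity to the query."""
    q = embed(query)
    ranked = sorted(caption_db.items(),
                    key=lambda kv: cosine(q, embed(kv[1])),
                    reverse=True)
    return [img_id for img_id, _ in ranked]

# Hypothetical database: image ID -> VLM-generated caption.
captions = {
    "img_001": "a dense residential area with small houses and narrow roads",
    "img_002": "an airport with two runways and parked airplanes",
    "img_003": "a harbor with many boats docked along the pier",
}
print(retrieve("airplanes parked near a runway", captions))
```

Because both the query and the database entries are plain text, no image encoder is involved at retrieval time, which is what makes the pipeline fully training-free once the captions exist.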

📄 Full Content

...(The full text is omitted here due to its length. Please see the original site for the complete article.)
