Siamese-Driven Optimization for Low-Resolution Image Latent Embedding in Image Captioning

📝 Original Info

  • Title: Siamese-Driven Optimization for Low-Resolution Image Latent Embedding in Image Captioning
  • ArXiv ID: 2512.08873
  • Date: 2025-12-09
  • Authors: Jing Jie Tan, Anissa Mokraoui, Ban-Hoe Kwan, Danny Wee-Kiat Ng, Yan-Chai Hum

📝 Abstract

Image captioning is essential in many fields, including assisting visually impaired individuals, improving content management systems, and enhancing human-computer interaction. However, a recent challenge in this domain is dealing with low-resolution images (LRIs). While performance can be improved by using larger models such as transformers for encoding, these models are typically heavyweight, demanding significant computational resources and memory, which makes retraining difficult. To address this, the proposed SOLI (Siamese-Driven Optimization for Low-Resolution Image Latent Embedding in Image Captioning) approach presents a solution specifically designed for lightweight, low-resolution image captioning. It employs a Siamese network archi...

📄 Full Content

In recent years, computer vision has advanced remarkably, particularly in image captioning [1]. Image captioning, which involves generating textual descriptions for visual content, has numerous applications, including accessibility for the visually impaired, content-based image retrieval, and automatic image annotation [2]. However, the quality of captions generated for low-resolution images remains a significant challenge due to the reduced availability of salient features and finer details [3].

…(Content truncated for length.)

📸 Image Gallery

architecture.png encoder_distance_normal.png encoder_distance_siamese.png sample_resolution.png
