RSAgent: Iterative Reasoning for Precision Text-Guided Segmentation

Reading time: 2 minutes

📝 Original Paper Info

- Title: RSAgent: Learning to Reason and Act for Text-Guided Segmentation via Multi-Turn Tool Invocations
- ArXiv ID: 2512.24023
- Date: 2025-12-30
- Authors: Xingqi He, Yujie Zhang, Shuyong Gao, Wenjie Li, Lingyi Hong, Mingxi Chen, Kaixun Jiang, Jiyuan Fu, Wenqiang Zhang

📝 Abstract

Text-guided object segmentation requires both cross-modal reasoning and pixel grounding abilities. Most recent methods treat text-guided segmentation as one-shot grounding, where the model predicts pixel prompts in a single forward pass to drive an external segmentor, which limits verification, refocusing and refinement when initial localization is wrong. To address this limitation, we propose RSAgent, an agentic Multimodal Large Language Model (MLLM) which interleaves reasoning and action for segmentation via multi-turn tool invocations. RSAgent queries a segmentation toolbox, observes visual feedback, and revises its spatial hypothesis using historical observations to re-localize targets and iteratively refine masks. We further build a data pipeline to synthesize multi-turn reasoning segmentation trajectories, and train RSAgent with a two-stage framework: cold-start supervised fine-tuning followed by agentic reinforcement learning with fine-grained, task-specific rewards. Extensive experiments show that RSAgent achieves a zero-shot performance of 66.5% gIoU on ReasonSeg test, improving over Seg-Zero-7B by 9%, and reaches 81.5% cIoU on RefCOCOg, demonstrating state-of-the-art performance on both in-domain and out-of-domain benchmarks.
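The interleaved reason-and-act loop described in the abstract can be pictured with a small sketch. This is only an illustration under stated assumptions, not the authors' implementation: the names `Action`, `Episode`, `run_episode`, `toy_tool`, and `toy_policy` are hypothetical, the "segmentation toolbox" is reduced to a single point-prompt function, and the policy is a hand-written stand-in for the MLLM.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Point = Tuple[int, int]          # (row, col) click location
Mask = List[List[int]]           # binary mask as nested lists


@dataclass
class Action:
    kind: str                    # "segment" (call the tool) or "finish"
    point: Optional[Point] = None


@dataclass
class Episode:
    history: List[Tuple[Action, Mask]] = field(default_factory=list)
    final_mask: Optional[Mask] = None


def run_episode(
    policy: Callable[[List[Tuple[Action, Mask]]], Action],
    segment_tool: Callable[[Point], Mask],
    max_turns: int = 5,
) -> Episode:
    """Alternate between choosing an action and observing tool feedback."""
    episode = Episode()
    for _ in range(max_turns):
        action = policy(episode.history)        # reason over past observations
        if action.kind == "finish":
            break
        mask = segment_tool(action.point)       # act: invoke the tool
        episode.history.append((action, mask))  # observe the visual feedback
    if episode.history:
        episode.final_mask = episode.history[-1][1]
    return episode


# Toy stand-ins (assumptions for the demo only).
OBJECT_CELLS = {(1, 1), (1, 2), (2, 1), (2, 2)}


def toy_tool(point: Point) -> Mask:
    """Return the object mask if the click lands on the object, else an empty mask."""
    hit = point in OBJECT_CELLS
    return [[1 if hit and (r, c) in OBJECT_CELLS else 0 for c in range(4)]
            for r in range(4)]


def toy_policy(history: List[Tuple[Action, Mask]]) -> Action:
    """Stop once the last mask is non-empty; otherwise try the next candidate click."""
    if history and any(any(row) for row in history[-1][1]):
        return Action("finish")
    candidates = [(0, 0), (1, 1), (3, 3)]
    return Action("segment", candidates[min(len(history), len(candidates) - 1)])


episode = run_episode(toy_policy, toy_tool)
```

In this toy run the first click misses, the policy sees the empty mask and relocates, and the second click recovers the object. That is the kind of verification and re-localization that a single forward pass cannot perform.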

💡 Summary & Analysis

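### Explanation

1. **Multi-Turn Instead of One-Shot Grounding**: Most recent methods predict pixel prompts in a single forward pass to drive an external segmentor, so a wrong initial localization cannot be verified, refocused, or refined. RSAgent instead interleaves reasoning and action across multiple tool invocations.
2. **Feedback-Driven Refinement**: RSAgent queries a segmentation toolbox, observes the visual feedback, and revises its spatial hypothesis using historical observations to re-localize targets and iteratively refine masks.
3. **Two-Stage Training on Synthesized Trajectories**: A data pipeline synthesizes multi-turn reasoning segmentation trajectories. Training starts with cold-start supervised fine-tuning, followed by agentic reinforcement learning with fine-grained, task-specific rewards. The resulting model reaches 66.5% gIoU zero-shot on ReasonSeg test and 81.5% cIoU on RefCOCOg.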
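The abstract reports gIoU on ReasonSeg and cIoU on RefCOCOg. As commonly defined for these benchmarks, gIoU is the mean of per-image IoUs and cIoU is the cumulative intersection over the cumulative union across all images. The sketch below shows the difference on toy masks. The function names are hypothetical, and scoring an empty prediction against an empty ground truth as 1.0 is an assumption of this sketch.

```python
from typing import Sequence, Tuple

Mask = Sequence[Sequence[int]]  # binary mask as nested sequences


def intersection_and_union(pred: Mask, gt: Mask) -> Tuple[int, int]:
    """Count pixels in the intersection and union of two same-shaped masks."""
    inter = union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    return inter, union


def giou(preds: Sequence[Mask], gts: Sequence[Mask]) -> float:
    """Mean of per-image IoUs: every image has equal weight."""
    ious = []
    for pred, gt in zip(preds, gts):
        inter, union = intersection_and_union(pred, gt)
        ious.append(inter / union if union else 1.0)  # assumption: empty vs empty = 1.0
    return sum(ious) / len(ious)


def ciou(preds: Sequence[Mask], gts: Sequence[Mask]) -> float:
    """Cumulative intersection over cumulative union: large objects weigh more."""
    total_inter = total_union = 0
    for pred, gt in zip(preds, gts):
        inter, union = intersection_and_union(pred, gt)
        total_inter += inter
        total_union += union
    return total_inter / total_union if total_union else 1.0


# Image A: small object, half-correct prediction (IoU = 1/2).
# Image B: larger object, perfect prediction (IoU = 4/4).
preds = [[[1, 1]], [[1, 1, 1, 1]]]
gts = [[[1, 0]], [[1, 1, 1, 1]]]

g = giou(preds, gts)   # (0.5 + 1.0) / 2
c = ciou(preds, gts)   # (1 + 4) / (2 + 4)
```

Because cIoU pools pixels before dividing, it is biased toward large objects, while gIoU treats each image equally. This is one reason the two numbers are not directly comparable across benchmarks.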


📊 Paper Figures

Figure 1

Figure 2

Figure 3

Figure 4

Figure 5

A Note of Gratitude

The copyright of this content belongs to the respective researchers. We deeply appreciate their hard work and contribution to the advancement of human civilization.
