FastV-RAG: Towards Fast and Fine-Grained Video QA with Retrieval-Augmented Generation

February 09, 2026

📝 Original Info

Title: FastV-RAG: Towards Fast and Fine-Grained Video QA with Retrieval-Augmented Generation
ArXiv ID: 2601.01513
Date: 2026-01-04
Authors: Gen Li, Peiyu Liu

📝 Abstract

Vision-Language Models (VLMs) excel at visual reasoning but still struggle with integrating external knowledge. Retrieval-Augmented Generation (RAG) is a promising solution, but current methods remain inefficient and often fail to maintain high answer quality. To address these challenges, we propose VideoSpeculateRAG, an efficient VLM-based RAG framework built on two key ideas. First, we introduce a speculative decoding pipeline: a lightweight draft model quickly generates multiple answer candidates, which are then verified and refined by a more accurate heavyweight model, substantially reducing inference latency without sacrificing correctness. Second, we identify a major source of error, incorrect entity recognition in retrieved knowledge, and mitigate it with a simple yet effective similarity-based filtering strategy that improves entity alignment and boosts overall answer accuracy. Experiments demonstrate that VideoSpeculateRAG achieves comparable or higher accuracy than standard RAG approaches while accelerating inference by approximately 2×. Our framework highlights the potential of combining speculative decoding with retrieval-augmented reasoning to enhance efficiency and reliability in complex, knowledge-intensive multimodal tasks. The code is available at https://github.com/FastVRAG/Fast-VRAG.
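
The two ideas in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the cosine-similarity measure, the 0.5 threshold, and the `draft`/`verify` callables are all assumptions made for illustration, and the sketch only selects the best-scoring candidate, leaving out the refinement step the abstract mentions.

```python
import math
from typing import Callable, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filter_by_similarity(
    query_vec: Sequence[float],
    candidates: List[Tuple[str, Sequence[float]]],
    threshold: float = 0.5,  # hypothetical cutoff, not taken from the paper
) -> List[str]:
    """Keep retrieved entities whose embedding is close enough to the query embedding."""
    return [name for name, vec in candidates if cosine(query_vec, vec) >= threshold]


def draft_then_verify(
    question: str,
    contexts: List[str],
    draft: Callable[[str, str], str],     # stand-in for the lightweight draft model
    verify: Callable[[str, str], float],  # stand-in for the heavyweight model's score
) -> str:
    """Draft one candidate answer per context, then return the one the verifier scores highest."""
    if not contexts:
        raise ValueError("at least one retrieved context is required")
    candidates = [draft(question, ctx) for ctx in contexts]
    return max(candidates, key=lambda answer: verify(question, answer))
```

In this sketch the cheap `draft` call runs once per retrieved context while the expensive `verify` call only scores short candidates; that division of labor is the general intuition behind draft-and-verify speedups, under the assumptions stated above.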

📄 Full Content

...(The full text is long and has been omitted here. Please see the site for the complete text.)
