Explicit Abstention Knobs for Predictable Reliability in Video Question Answering

Original Info

  • Title: Explicit Abstention Knobs for Predictable Reliability in Video Question Answering
  • ArXiv ID: 2601.00138
  • Date: 2025-12-31
  • Authors: Jorge Ortiz

Abstract

High-stakes deployment of vision-language models (VLMs) requires selective prediction, where systems abstain when uncertain rather than risk costly errors. We investigate whether confidence-based abstention provides reliable control over error rates in video question answering, and whether that control remains robust under distribution shift. Using NExT-QA and Gemini 2.0 Flash, we establish two findings. First, confidence thresholding provides mechanistic control in-distribution. Sweeping threshold ε produces smooth risk-coverage tradeoffs, reducing error rates from 23.6% to 9.4% at 63.7% coverage with well-calibrated predictions (ECE = 0.018). Second, this control is not epistemic. Under evidence degradation (18 frames reduced to 6), the model's confidence distribution contracts only modestly. Evaluating the same frozen question instances under both evidence conditions, median self-reported confidence remains 0.9 in both regimes despite a 3× reduction in visual information. We corroborate this finding with logprob-derived confidence (p_max), obtained via a separate prompt interface on matched question instances; this signal exhibits the same failure mode. The model does not "know when it does not know" under shift. These results motivate warrant-based selec...
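
To make the quantities in the abstract concrete, the following is a minimal sketch of confidence thresholding, the resulting risk-coverage tradeoff, and expected calibration error (ECE). It is not the paper's code: the function names `risk_coverage` and `expected_calibration_error`, the choice of 10 equal-width bins, and the toy confidence values are all assumptions made for illustration, and the numbers it prints are unrelated to the NExT-QA results reported above.

```python
from typing import List, Tuple


def risk_coverage(conf: List[float], correct: List[bool],
                  threshold: float) -> Tuple[float, float]:
    """Answer only when confidence >= threshold.

    Returns (coverage, selective risk), where coverage is the fraction of
    questions answered and risk is the error rate among answered questions.
    """
    answered = [ok for c, ok in zip(conf, correct) if c >= threshold]
    coverage = len(answered) / len(conf)
    # Convention (an assumption): risk is 0.0 when the system abstains on everything.
    risk = sum(1 for ok in answered if not ok) / len(answered) if answered else 0.0
    return coverage, risk


def expected_calibration_error(conf: List[float], correct: List[bool],
                               n_bins: int = 10) -> float:
    """Equal-width-bin ECE: size-weighted mean of |accuracy - mean confidence|."""
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for c, ok in zip(conf, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((c, ok))
    total = len(conf)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        ece += (len(b) / total) * abs(accuracy - mean_conf)
    return ece


# Hypothetical toy data: per-question confidence and whether the answer was correct.
toy_conf = [0.95, 0.9, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
toy_correct = [True, True, True, True, False, True, False, False]

if __name__ == "__main__":
    # Sweep the abstention threshold to trace a risk-coverage curve.
    for eps in (0.0, 0.5, 0.75, 0.9):
        cov, risk = risk_coverage(toy_conf, toy_correct, eps)
        print(f"threshold={eps:.2f} coverage={cov:.3f} risk={risk:.3f}")
    print(f"ECE={expected_calibration_error(toy_conf, toy_correct):.3f}")
```

Raising the threshold trades coverage for lower risk only if confidence actually tracks correctness. The second finding in the abstract is that this link weakens under evidence degradation: if confidence barely moves when frames are removed, the same threshold admits roughly the same questions even though more of the answers may be wrong.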

Full Content

...(The full text is long and has been omitted here. Please see the site for the complete article.)