
SR-MCR: A Step-wise Reasoning Alignment Framework Leveraging Self-Referential Signals
Multimodal LLMs often produce fluent yet unreliable reasoning, exhibiting weak step-to-step coherence and insufficient visual grounding, largely because existing alignment approaches supervise only the final answer while ignoring the reliability of the intermediate reasoning process. We introduce SR
