Smaller Models, Smarter Rewards: A Two-Sided Approach to Process and Outcome Rewards
📝 Original Info
- Title: Smaller Models, Smarter Rewards: A Two-Sided Approach to Process and Outcome Rewards
- ArXiv ID: 2510.23083
- Date: 2025-10-27
- Authors: Not provided (the paper does not list author information).
📝 Abstract
Generating high-quality code remains a challenge for Large Language Models (LLMs). For the evolution of reasoning models on this task, reward models are a necessary intermediate step; they judge either final outcomes or intermediate steps. Decoder-only transformer models can be turned into reward models by introducing a regression layer and applying supervised fine-tuning. While it is known that reflection capabilities generally increase with model size, we investigate whether state-of-the-art small language models such as the Phi-4 family can be turned into usable reward models that blend process rewards and outcome rewards. To this end, we construct a dataset of code samples with correctness labels derived from the APPS coding challenge benchmark. We then train a value-head model to estimate the success probability of intermediate outputs. Our evaluation shows that small LLMs are capable of serving as effective reward models or code evaluation critics, successfully identifying correct solutions among multiple candidates. Using this critic, we achieve an improvement of over 20% in selecting the most accurate code from multiple generations.
💡 Deep Analysis
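The recipe summarized in the abstract (a decoder-only backbone extended with a regression layer, supervised fine-tuning on APPS-derived 0/1 correctness labels, and best-of-N selection with the resulting critic) can be sketched in a few lines. The snippet below is a minimal illustration, assuming PyTorch and Hugging Face transformers; the checkpoint name "microsoft/phi-4", the last-token pooling, the BCE loss, and the helper names `train_step` and `pick_best_candidate` are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch: decoder-only LM + scalar value head as a code-correctness critic.
# Assumptions (not from the paper): PyTorch, Hugging Face transformers,
# "microsoft/phi-4" as an illustrative backbone, last-token pooling, BCE loss.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer


class ValueHeadCritic(nn.Module):
    def __init__(self, base_model_name: str = "microsoft/phi-4"):
        super().__init__()
        # Backbone: hidden states of a decoder-only transformer.
        self.backbone = AutoModel.from_pretrained(base_model_name)
        hidden_size = self.backbone.config.hidden_size
        # Regression layer ("value head"): one scalar per sequence.
        self.value_head = nn.Linear(hidden_size, 1)

    def forward(self, input_ids, attention_mask):
        out = self.backbone(input_ids=input_ids, attention_mask=attention_mask)
        hidden = out.last_hidden_state                        # (batch, seq, dim)
        # Pool the hidden state of the last real (non-padded) token;
        # assumes right padding so the index arithmetic is valid.
        last = attention_mask.sum(dim=1) - 1                  # (batch,)
        pooled = hidden[torch.arange(hidden.size(0)), last]   # (batch, dim)
        # Sigmoid maps the scalar to an estimated success probability.
        return torch.sigmoid(self.value_head(pooled)).squeeze(-1)


def train_step(model, optimizer, input_ids, attention_mask, labels):
    """One supervised step on (prompt + candidate code, 0/1 correctness) pairs."""
    probs = model(input_ids, attention_mask)
    loss = nn.functional.binary_cross_entropy(probs, labels.float())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


@torch.no_grad()
def pick_best_candidate(model, tokenizer, problem, candidates, device="cpu"):
    """Best-of-N selection: score every candidate and return the top-ranked one."""
    texts = [f"{problem}\n\n{code}" for code in candidates]
    batch = tokenizer(texts, return_tensors="pt", padding=True,
                      truncation=True).to(device)
    scores = model(batch["input_ids"], batch["attention_mask"])
    return candidates[int(scores.argmax())], scores.tolist()
```

In practice the tokenizer may need a pad token assigned for batched scoring (reusing the EOS token is a common choice), and the backbone would typically be loaded in reduced precision; both details are omitted here for brevity. The same critic can also score partial generations (prefixes of a solution) to provide process-level reward signals, which matches the paper's goal of estimating the success probability of intermediate outputs.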