SecCodePRM: A Process Reward Model for Code Security

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the [Original Paper Viewer] below or the original arXiv source.

Large Language Models are rapidly becoming core components of modern software development workflows, yet ensuring code security remains challenging. Existing vulnerability detection pipelines either rely on static analyzers or use LLM/GNN-based detectors trained with coarse program-level supervision. Both families often require complete context, provide sparse end-of-completion feedback, and can degrade as code length grows, making them ill-suited for real-time, prefix-level assessment during interactive coding and streaming generation. We propose SecCodePRM, a security-oriented process reward model that assigns a context-aware, step-level security score along a code trajectory. To train the model, we derive step-level supervision labels from static analyzers and expert annotations, allowing the model to attend more precisely to fine-grained regions associated with inter-procedural vulnerabilities. SecCodePRM has three applications: full-code vulnerability detection (VD), partial-code VD, and secure code generation (CG). For VD, SecCodePRM uses risk-sensitive aggregation that emphasizes high-risk steps; for CG, SecCodePRM supports inference-time scaling by ranking candidate continuations and favoring higher cumulative reward. This design yields dense, real-time feedback that scales to long-horizon generation. Empirically, SecCodePRM outperforms prior approaches in all three settings, while preserving code functional correctness, suggesting improved security without a safety-utility tradeoff.
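The abstract describes two scoring mechanisms but not their exact formulas: a risk-sensitive aggregation that emphasizes high-risk steps for vulnerability detection, and a cumulative-reward ranking of candidate continuations for secure generation. A minimal sketch of one plausible instantiation is below; the function names (`risk_sensitive_score`, `rank_candidates`) and the softmin-weighted form with temperature `beta` are illustrative assumptions, not the paper's exact definitions.

```python
import math

def risk_sensitive_score(step_rewards, beta=5.0):
    """Aggregate per-step security rewards into one program-level score,
    weighting low-reward (high-risk) steps more heavily via a softmin.
    (Illustrative form; the paper's exact aggregation may differ.)"""
    weights = [math.exp(-beta * r) for r in step_rewards]
    total = sum(weights)
    return sum(w * r for w, r in zip(weights, step_rewards)) / total

def rank_candidates(candidates_step_rewards):
    """Rank candidate continuations by cumulative step reward, as used
    for inference-time scaling in secure code generation."""
    return sorted(range(len(candidates_step_rewards)),
                  key=lambda i: sum(candidates_step_rewards[i]),
                  reverse=True)
```

With `beta > 0`, a single strongly negative step reward dominates the aggregate, so one risky step flags the whole program even when the surrounding steps look safe; plain averaging would wash that signal out.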


💡 Research Summary

SecCodePRM introduces a novel process‑reward modeling approach for software security that operates at the granularity of individual code steps rather than whole programs. The authors observe that current vulnerability detection pipelines fall into two categories: static‑analysis‑based tools that require complete program context and are computationally heavy, and learned detectors (LLM‑ or GNN‑based) that are trained with coarse, program‑level binary labels. Both families struggle to provide real‑time, prefix‑level feedback during interactive coding or streaming generation, limiting their usefulness in modern LLM‑assisted development workflows.

To bridge this gap, SecCodePRM constructs step‑level supervision by automatically extracting safety annotations from static analysis tools such as CodeQL and from expert human reviews. The source code is first split into logical steps using a designated separator (e.g., a double newline). Each step sₜ receives a binary safety label yₜ (safe = +, vulnerable = −). The model, built on top of a code‑oriented pre‑trained LLM (e.g., CodeBERT or a code‑tuned LLaMA), processes each step together with its preceding context τ_{<t}. A classification head outputs logits hₜ for the two classes; the step reward rₜ is defined as the softmax margin rₜ = softmax(hₜ⁺) − softmax(hₜ⁻), yielding a value in (−1, 1), where positive values indicate a safer step.
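The step splitting and the softmax-margin reward above can be sketched directly. The function names are illustrative, but the logic follows the description: split on double newlines, then take the difference of the two class probabilities, which by construction lies in (−1, 1).

```python
import math

def split_steps(code: str):
    """Split source code into logical steps at blank lines
    (the double-newline separator described above)."""
    return [s for s in code.split("\n\n") if s.strip()]

def step_reward(logit_safe: float, logit_vuln: float) -> float:
    """Softmax margin over the two class logits:
    r_t = p(safe) - p(vulnerable), a value in (-1, 1)."""
    m = max(logit_safe, logit_vuln)      # subtract max for numerical stability
    e_s = math.exp(logit_safe - m)
    e_v = math.exp(logit_vuln - m)
    return (e_s - e_v) / (e_s + e_v)
```

Equal logits give a reward of 0 (the model is undecided), while a large positive margin toward the safe class pushes the reward toward +1.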

