Counterfactual Self-Questioning for Stable Policy Optimization in Language Models
...
📝 Original Info
- Title: Counterfactual Self-Questioning for Stable Policy Optimization in Language Models
- ArXiv ID: 2601.00885
- Date: 2025-12-31
- Authors: Mandar Parab