VCRL: Variance-based Curriculum Reinforcement Learning for Large Language Models Paper • 2509.19803 • Published 3 days ago
PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning Paper • 2508.21104 • Published about 1 month ago
FlashThink: An Early Exit Method For Efficient Reasoning Paper • 2505.13949 • Published May 20