Towards General Agentic Intelligence via Environment Scaling • Paper • 2509.13311 • Published 22 days ago • 69
Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation • Paper • 2509.25849 • Published 9 days ago • 42
Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models • Paper • 2509.26628 • Published 8 days ago • 12