Submitted by iseesaw · 88 · A Survey of Reinforcement Learning for Large Reasoning Models · 39 authors · 884 · 1
Submitted by taesiri · 17 · AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning · 23 authors · 106 · 1
Submitted by spermwhale · 5 · The Majority is not always right: RL training for solution aggregation · 6 authors · 1
Submitted by memyprokotow · 3 · <think> So let's replace this phrase with insult... </think> Lessons learned from generation of toxic texts with LLMs · 3 authors · 1
Submitted by taesiri · - · HumanAgencyBench: Scalable Evaluation of Human Agency Support in AI Assistants · 4 authors · 1