Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training Paper • 2509.03403 • Published 10 days ago • 20
LMEnt: A Suite for Analyzing Knowledge in Language Models from Pretraining Data to Representations Paper • 2509.03405 • Published 10 days ago • 19
SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs Paper • 2509.00930 • Published 13 days ago • 3
Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth Paper • 2509.03867 • Published 9 days ago • 199
Towards a Unified View of Large Language Model Post-Training Paper • 2509.04419 • Published 9 days ago • 67
Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions? Paper • 2509.04292 • Published 9 days ago • 54
Delta Activations: A Representation for Finetuned Large Language Models Paper • 2509.04442 • Published 9 days ago • 5
Set Block Decoding is a Language Model Inference Accelerator Paper • 2509.04185 • Published 9 days ago • 47
On Robustness and Reliability of Benchmark-Based Evaluation of LLMs Paper • 2509.04013 • Published 9 days ago • 3
Reverse-Engineered Reasoning for Open-Ended Generation Paper • 2509.06160 • Published 6 days ago • 138
Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models Paper • 2509.06949 • Published 5 days ago • 50
Reinforcement Learning Foundations for Deep Research Systems: A Survey Paper • 2509.06733 • Published 5 days ago • 28
Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers Paper • 2509.06493 • Published 5 days ago • 9
SFR-DeepResearch: Towards Effective Reinforcement Learning for Autonomously Reasoning Single Agents Paper • 2509.06283 • Published 5 days ago • 8
Test-Time Scaling in Reasoning Models Is Not Effective for Knowledge-Intensive Tasks Yet Paper • 2509.06861 • Published 5 days ago • 7
R²AI: Towards Resistant and Resilient AI in an Evolving World Paper • 2509.06786 • Published 5 days ago • 3
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning Paper • 2509.07980 • Published 4 days ago • 87
Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing Paper • 2509.08721 • Published 3 days ago • 443
Staying in the Sweet Spot: Responsive Reasoning Evolution via Capability-Adaptive Hint Scaffolding Paper • 2509.06923 • Published 5 days ago • 17
Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning Paper • 2509.03646 • Published 10 days ago • 23
From Noise to Narrative: Tracing the Origins of Hallucinations in Transformers Paper • 2509.06938 • Published 5 days ago • 2
A Survey of Reinforcement Learning for Large Reasoning Models Paper • 2509.08827 • Published 3 days ago • 128
CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models Paper • 2509.09675 • Published 2 days ago • 24
The Majority is not always right: RL training for solution aggregation Paper • 2509.06870 • Published 5 days ago • 8