Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning Paper • 2505.24726 • Published May 30 • 271
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models Paper • 2506.06395 • Published Jun 5 • 130
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning Paper • 2505.17667 • Published May 23 • 89
Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning Paper • 2507.16784 • Published Jul 22 • 119
MUR: Momentum Uncertainty guided Reasoning for Large Language Models Paper • 2507.14958 • Published Jul 20 • 46
LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization Paper • 2507.15758 • Published Jul 21 • 34
SONAR-LLM: Autoregressive Transformer that Thinks in Sentence Embeddings and Speaks in Tokens Paper • 2508.05305 • Published 30 days ago • 45
PRELUDE: A Benchmark Designed to Require Global Comprehension and Reasoning over Long Contexts Paper • 2508.09848 • Published 24 days ago • 66
Train Long, Think Short: Curriculum Learning for Efficient Reasoning Paper • 2508.08940 • Published 25 days ago • 25
Learning to Align, Aligning to Learn: A Unified Approach for Self-Optimized Alignment Paper • 2508.07750 • Published 26 days ago • 19
Pruning the Unsurprising: Efficient Code Reasoning via First-Token Surprisal Paper • 2508.05988 • Published 29 days ago • 19
Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning Paper • 2508.07101 • Published 27 days ago • 13
Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning Paper • 2508.09726 • Published 24 days ago • 13
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens Paper • 2508.01191 • Published Aug 2 • 234
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification Paper • 2508.05629 • Published 29 days ago • 170
Story2Board: A Training-Free Approach for Expressive Storyboard Generation Paper • 2508.09983 • Published 23 days ago • 67
Don't Overthink It: A Survey of Efficient R1-style Large Reasoning Models Paper • 2508.02120 • Published Aug 4 • 19
Beyond the Trade-off: Self-Supervised Reinforcement Learning for Reasoning Models' Instruction Following Paper • 2508.02150 • Published Aug 4 • 36
Can Large Multimodal Models Actively Recognize Faulty Inputs? A Systematic Evaluation Framework of Their Input Scrutiny Ability Paper • 2508.04017 • Published Aug 6 • 11
PaperRegister: Boosting Flexible-grained Paper Search via Hierarchical Register Indexing Paper • 2508.11116 • Published 22 days ago • 22
Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning Paper • 2508.20751 • Published 9 days ago • 85
UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning Paper • 2508.18756 • Published 11 days ago • 35
StepWiser: Stepwise Generative Judges for Wiser Reasoning Paper • 2508.19229 • Published 10 days ago • 19
ThinkDial: An Open Recipe for Controlling Reasoning Effort in Large Language Models Paper • 2508.18773 • Published 11 days ago • 14
Neither Valid nor Reliable? Investigating the Use of LLMs as Judges Paper • 2508.18076 • Published 11 days ago • 5
InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles Paper • 2508.16072 • Published 15 days ago • 3
Servant, Stalker, Predator: How An Honest, Helpful, And Harmless (3H) Agent Unlocks Adversarial Skills Paper • 2508.19500 • Published 10 days ago • 1
CARFT: Boosting LLM Reasoning via Contrastive Learning with Annotated Chain-of-Thought-based Reinforced Fine-Tuning Paper • 2508.15868 • Published 16 days ago • 3
Jailbreaking Commercial Black-Box LLMs with Explicitly Harmful Prompts Paper • 2508.10390 • Published 23 days ago • 1
LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model Paper • 2509.00676 • Published 6 days ago • 72