A Survey of Reinforcement Learning for Large Reasoning Models Paper • 2509.08827 • Published 3 days ago • 131
A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence Paper • 2507.21046 • Published Jul 28 • 81
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning Paper • 2506.24119 • Published Jun 30 • 50
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning Paper • 2506.01939 • Published Jun 2 • 180
Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space Paper • 2505.13308 • Published May 19 • 27
Absolute Zero: Reinforced Self-play Reasoning with Zero Data Paper • 2505.03335 • Published May 6 • 183
CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning Paper • 2504.13820 • Published Apr 18 • 16
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? Paper • 2504.13837 • Published Apr 18 • 133
ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation Paper • 2502.18364 • Published Feb 25 • 37
Building A Proof-Oriented Programmer That Is 64% Better Than GPT-4o Under Data Scarsity Paper • 2502.11901 • Published Feb 17 • 6
DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution Paper • 2411.02359 • Published Nov 4, 2024 • 13
How Far is Video Generation from World Model: A Physical Law Perspective Paper • 2411.02385 • Published Nov 4, 2024 • 34
LLM-based Optimization of Compound AI Systems: A Survey Paper • 2410.16392 • Published Oct 21, 2024 • 17
Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing Paper • 2407.08770 • Published Jul 11, 2024 • 21
Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models Paper • 2406.11230 • Published Jun 17, 2024 • 34
DiveR-CT: Diversity-enhanced Red Teaming with Relaxing Constraints Paper • 2405.19026 • Published May 29, 2024 • 8
A Mixture of Surprises for Unsupervised Reinforcement Learning Paper • 2210.06702 • Published Oct 13, 2022 • 1
Augmenting Unsupervised Reinforcement Learning with Self-Reference Paper • 2311.09692 • Published Nov 16, 2023 • 1