-
Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
Paper • 2509.08721 • Published • 665 -
A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code
Paper • 2508.18106 • Published • 341 -
VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model
Paper • 2509.09372 • Published • 221 -
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Paper • 2509.02547 • Published • 212
Collections
Discover the best community collections!
Collections including paper arxiv:2509.13313
-
FastVLM: Efficient Vision Encoding for Vision Language Models
Paper • 2412.13303 • Published • 70 -
rStar2-Agent: Agentic Reasoning Technical Report
Paper • 2508.20722 • Published • 110 -
AgentScope 1.0: A Developer-Centric Framework for Building Agentic Applications
Paper • 2508.16279 • Published • 51 -
OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
Paper • 2509.12201 • Published • 103
-
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning
Paper • 2504.07128 • Published • 86 -
BM25S: Orders of magnitude faster lexical search via eager sparse scoring
Paper • 2407.03618 • Published • 13 -
Deep Think with Confidence
Paper • 2508.15260 • Published • 85 -
R-Zero: Self-Evolving Reasoning LLM from Zero Data
Paper • 2508.05004 • Published • 126
-
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization
Paper • 2410.08815 • Published • 47 -
SKETCH: Structured Knowledge Enhanced Text Comprehension for Holistic Retrieval
Paper • 2412.15443 • Published • 10 -
RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement
Paper • 2412.12881 • Published • 2 -
AR-RAG: Autoregressive Retrieval Augmentation for Image Generation
Paper • 2506.06962 • Published • 28
-
WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization
Paper • 2507.15061 • Published • 58 -
WebDancer: Towards Autonomous Information Seeking Agency
Paper • 2505.22648 • Published • 32 -
ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization
Paper • 2509.13313 • Published • 76 -
WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning
Paper • 2509.13305 • Published • 86
-
R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning
Paper • 2508.21113 • Published • 108 -
Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning
Paper • 2508.16949 • Published • 22 -
EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control
Paper • 2508.21112 • Published • 75 -
UItron: Foundational GUI Agent with Advanced Perception and Planning
Paper • 2508.21767 • Published • 12
-
Think Before Recommend: Unleashing the Latent Reasoning Power for Sequential Recommendation
Paper • 2503.22675 • Published • 36 -
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback
Paper • 2503.22230 • Published • 45 -
ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization
Paper • 2509.13313 • Published • 76 -
WebResearcher: Unleashing unbounded reasoning capability in Long-Horizon Agents
Paper • 2509.13309 • Published • 67
-
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models
Paper • 2410.02740 • Published • 54 -
From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging
Paper • 2410.01215 • Published • 40 -
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
Paper • 2409.17146 • Published • 121 -
EuroLLM: Multilingual Language Models for Europe
Paper • 2409.16235 • Published • 28
-
Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
Paper • 2509.08721 • Published • 665 -
A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code
Paper • 2508.18106 • Published • 341 -
VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model
Paper • 2509.09372 • Published • 221 -
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Paper • 2509.02547 • Published • 212
-
WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization
Paper • 2507.15061 • Published • 58 -
WebDancer: Towards Autonomous Information Seeking Agency
Paper • 2505.22648 • Published • 32 -
ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization
Paper • 2509.13313 • Published • 76 -
WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning
Paper • 2509.13305 • Published • 86
-
FastVLM: Efficient Vision Encoding for Vision Language Models
Paper • 2412.13303 • Published • 70 -
rStar2-Agent: Agentic Reasoning Technical Report
Paper • 2508.20722 • Published • 110 -
AgentScope 1.0: A Developer-Centric Framework for Building Agentic Applications
Paper • 2508.16279 • Published • 51 -
OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
Paper • 2509.12201 • Published • 103
-
R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning
Paper • 2508.21113 • Published • 108 -
Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning
Paper • 2508.16949 • Published • 22 -
EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control
Paper • 2508.21112 • Published • 75 -
UItron: Foundational GUI Agent with Advanced Perception and Planning
Paper • 2508.21767 • Published • 12
-
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning
Paper • 2504.07128 • Published • 86 -
BM25S: Orders of magnitude faster lexical search via eager sparse scoring
Paper • 2407.03618 • Published • 13 -
Deep Think with Confidence
Paper • 2508.15260 • Published • 85 -
R-Zero: Self-Evolving Reasoning LLM from Zero Data
Paper • 2508.05004 • Published • 126
-
Think Before Recommend: Unleashing the Latent Reasoning Power for Sequential Recommendation
Paper • 2503.22675 • Published • 36 -
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback
Paper • 2503.22230 • Published • 45 -
ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization
Paper • 2509.13313 • Published • 76 -
WebResearcher: Unleashing unbounded reasoning capability in Long-Horizon Agents
Paper • 2509.13309 • Published • 67
-
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization
Paper • 2410.08815 • Published • 47 -
SKETCH: Structured Knowledge Enhanced Text Comprehension for Holistic Retrieval
Paper • 2412.15443 • Published • 10 -
RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement
Paper • 2412.12881 • Published • 2 -
AR-RAG: Autoregressive Retrieval Augmentation for Image Generation
Paper • 2506.06962 • Published • 28
-
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models
Paper • 2410.02740 • Published • 54 -
From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging
Paper • 2410.01215 • Published • 40 -
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
Paper • 2409.17146 • Published • 121 -
EuroLLM: Multilingual Language Models for Europe
Paper • 2409.16235 • Published • 28