Collections

Collections including paper arxiv:2508.16949

FLAME: Factuality-Aware Alignment for Large Language Models

Paper • 2405.01525 • Published May 2, 2024 • 29
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data

Paper • 2405.14333 • Published May 23, 2024 • 42
Transformers Can Do Arithmetic with the Right Embeddings

Paper • 2405.17399 • Published May 27, 2024 • 55
EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture

Paper • 2405.18991 • Published May 29, 2024 • 12

R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning

Paper • 2508.21113 • Published 8 days ago • 103
Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning

Paper • 2508.16949 • Published 14 days ago • 22
EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control

Paper • 2508.21112 • Published 8 days ago • 72
UItron: Foundational GUI Agent with Advanced Perception and Planning

Paper • 2508.21767 • Published 8 days ago • 12

about 14 hours ago

lusxvr/nanoVLM-222M

Image-Text-to-Text • 0.2B • Updated May 8 • 281 • 95
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Paper • 2503.09516 • Published Mar 12 • 35
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Paper • 2505.24863 • Published May 30 • 98
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

Paper • 2505.17667 • Published May 23 • 89

Reinforcement learning

Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning

Paper • 2407.20798 • Published Jul 30, 2024 • 25
Offline Reinforcement Learning for LLM Multi-Step Reasoning

Paper • 2412.16145 • Published Dec 20, 2024 • 39
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models

Paper • 2501.03262 • Published Jan 4 • 100
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution

Paper • 2502.18449 • Published Feb 25 • 76

Reasoning Papers

Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization

Paper • 2508.07629 • Published 26 days ago • 39
Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning

Paper • 2508.07101 • Published 27 days ago • 13
Compressing Chain-of-Thought in LLMs via Step Entropy

Paper • 2508.03346 • Published Aug 5 • 7
Train Long, Think Short: Curriculum Learning for Efficient Reasoning

Paper • 2508.08940 • Published 25 days ago • 25

Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning

Paper • 2508.16949 • Published 14 days ago • 22
Diffusion Language Models Know the Answer Before Decoding

Paper • 2508.19982 • Published 10 days ago • 22
ThinkDial: An Open Recipe for Controlling Reasoning Effort in Large Language Models

Paper • 2508.18773 • Published 11 days ago • 14
Intern-S1: A Scientific Multimodal Foundation Model

Paper • 2508.15763 • Published 15 days ago • 243

PotentialApplication

Let LLMs Break Free from Overthinking via Self-Braking Tuning

Paper • 2505.14604 • Published May 20 • 23
AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios

Paper • 2505.16944 • Published May 22 • 8
Training Step-Level Reasoning Verifiers with Formal Verification Tools

Paper • 2505.15960 • Published May 21 • 7
The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning

Paper • 2505.15134 • Published May 21 • 6

