5

laner ten

that113

AI & ML interests

None yet

Organizations

None yet

Collections 2

Scaling RL to Long Videos

Paper • 2507.07966 • Published Jul 10 • 157
Group Sequence Policy Optimization

Paper • 2507.18071 • Published Jul 24 • 299
CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning

Paper • 2507.14111 • Published Jul 18 • 23
MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge

Paper • 2507.21183 • Published Jul 27 • 14

Running

3.27k

The Ultra-Scale Playbook

🌌

The ultimate guide to training LLM on large GPU Clusters
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme

Paper • 2504.02587 • Published Apr 3 • 32
RLHF Workflow: From Reward Modeling to Online RLHF

Paper • 2405.07863 • Published May 13, 2024 • 71
microsoft/Magma-8B

Image-Text-to-Text • 9B • Updated May 13 • 3.59k • 411

models 0

None public yet

datasets 0

None public yet
