laner ten
that113
AI & ML interests
None yet
Organizations
None yet
d
-
Running • 3.27k
The Ultra-Scale Playbook
🌌 The ultimate guide to training LLM on large GPU Clusters
-
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme
Paper • 2504.02587 • Published • 32
-
RLHF Workflow: From Reward Modeling to Online RLHF
Paper • 2405.07863 • Published • 71
-
microsoft/Magma-8B
Image-Text-to-Text • 9B • Updated • 3.59k • 411
re paper
-
Scaling RL to Long Videos
Paper • 2507.07966 • Published • 157
-
Group Sequence Policy Optimization
Paper • 2507.18071 • Published • 299
-
CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning
Paper • 2507.14111 • Published • 23
-
MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge
Paper • 2507.21183 • Published • 14
models
0
None public yet
datasets
0
None public yet