kaicheng001

https://kaicheng001.github.io/

AI & ML interests

None yet

Organizations

None yet

Recent Activity

upvoted a paper about 21 hours ago

Agent Learning via Early Experience

Paper • 2510.08558 • Published 3 days ago • 161

upvoted a paper 22 days ago

FlowRL: Matching Reward Distributions for LLM Reasoning

Paper • 2509.15207 • Published 24 days ago • 104

upvoted a paper about 2 months ago

Intern-S1: A Scientific Multimodal Foundation Model

Paper • 2508.15763 • Published Aug 21 • 254

upvoted 2 papers 2 months ago

On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification

Paper • 2508.05629 • Published Aug 7 • 177

Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published Jul 26 • 153

upvoted 3 papers 3 months ago

upvoted an article 3 months ago

SmolLM3: smol, multilingual, long-context reasoner

Article • and 22 others • Jul 8 • 694

upvoted 4 papers 5 months ago

Reward Reasoning Model

Paper • 2505.14674 • Published May 20 • 37

Parallel Scaling Law for Language Models

Paper • 2505.10475 • Published May 15 • 83

Absolute Zero: Reinforced Self-play Reasoning with Zero Data

Paper • 2505.03335 • Published May 6 • 185

Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models

Paper • 2505.04921 • Published May 8 • 185

upvoted an article 5 months ago

I trained a Language Model to schedule events with GRPO!

Article • Apr 29 • 89

upvoted a paper 6 months ago

ToolRL: Reward is All Tool Learning Needs

Paper • 2504.13958 • Published Apr 16 • 47

upvoted an article 6 months ago

Mixture of Experts Explained

Article • and 5 others • Dec 11, 2023 • 930

upvoted 2 papers 6 months ago

Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model

Paper • 2504.08685 • Published Apr 11 • 130

DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning

Paper • 2504.07128 • Published Apr 2 • 86

upvoted 2 papers 7 months ago

Transformers without Normalization

Paper • 2503.10622 • Published Mar 13 • 169

LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL

Paper • 2503.07536 • Published Mar 10 • 88
