Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math • Paper • 2504.21233 • Published 10 days ago • 38
Reinforcement Learning for Reasoning in Large Language Models with One Training Example • Paper • 2504.20571 • Published 11 days ago • 90
Improving Autonomous AI Agents with Reflective Tree Search and Self-Learning • Paper • 2410.02052 • Published Oct 2, 2024 • 9
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing • Paper • 2404.12253 • Published Apr 18, 2024 • 56
Teaching Language Models to Self-Improve through Interactive Demonstrations • Paper • 2310.13522 • Published Oct 20, 2023 • 12
Stabilizing RLHF through Advantage Model and Selective Rehearsal • Paper • 2309.10202 • Published Sep 18, 2023 • 11