Collections including paper arxiv:2510.02245

Learning from examples - training/inference

about 12 hours ago

ExGRPO: Learning to Reason from Experience

Paper • 2510.02245 • Published 7 days ago • 70
A Practitioner's Guide to Multi-turn Agentic Reinforcement Learning

Paper • 2510.01132 • Published 8 days ago • 5
Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models

Paper • 2510.04618 • Published 3 days ago • 41
MixReasoning: Switching Modes to Think

Paper • 2510.06052 • Published 2 days ago • 19

about 17 hours ago

HalluGuard: Evidence-Grounded Small Reasoning Models to Mitigate Hallucinations in Retrieval-Augmented Generation

Paper • 2510.00880 • Published 8 days ago
Position: Privacy Is Not Just Memorization!

Paper • 2510.01645 • Published 7 days ago • 1
Less LLM, More Documents: Searching for Improved RAG

Paper • 2510.02657 • Published 6 days ago • 2
ExGRPO: Learning to Reason from Experience

Paper • 2510.02245 • Published 7 days ago • 70

What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective

Paper • 2410.23743 • Published Oct 31, 2024 • 63
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level

Paper • 2411.03562 • Published Nov 5, 2024 • 67
Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models

Paper • 2411.03884 • Published Nov 6, 2024 • 28
MM-IQ: Benchmarking Human-Like Abstraction and Reasoning in Multimodal Models

Paper • 2502.00698 • Published Feb 2 • 24

Model collections trained using ExGRPO.

rzzhan/ExGRPO-Qwen2.5-Math-7B-Zero

8B • Updated 7 days ago • 11
rzzhan/ExGRPO-LUFFY-7B-Continual

8B • Updated 7 days ago • 11
rzzhan/ExGRPO-Qwen2.5-7B-Instruct

8B • Updated 7 days ago • 12
rzzhan/ExGRPO-Qwen2.5-Math-1.5B-Zero

2B • Updated 7 days ago • 11

ExGRPO: Learning to Reason from Experience

Paper • 2510.02245 • Published 7 days ago • 70
A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems

Paper • 2508.07407 • Published Aug 10 • 96
rStar2-Agent: Agentic Reasoning Technical Report

Paper • 2508.20722 • Published Aug 28 • 111
Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning

Paper • 2508.19828 • Published Aug 27 • 6

LLM Pruning and Distillation in Practice: The Minitron Approach

Paper • 2408.11796 • Published Aug 21, 2024 • 57
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering

Paper • 2408.09174 • Published Aug 17, 2024 • 52
To Code, or Not To Code? Exploring Impact of Code in Pre-training

Paper • 2408.10914 • Published Aug 20, 2024 • 43
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications

Paper • 2408.11878 • Published Aug 20, 2024 • 63

