Fathom-DeepResearch: Unlocking Long Horizon Information Retrieval and Synthesis for SLMs
Abstract
Fathom-DeepResearch, an agentic system with specialized models for web search and report synthesis, achieves state-of-the-art performance on open-ended information-seeking tasks and diverse reasoning tasks.
Tool-integrated reasoning has emerged as a key focus for enabling agentic applications. Among these, DeepResearch Agents have gained significant attention for their strong performance on complex, open-ended information-seeking tasks. We introduce Fathom-DeepResearch, an agentic system composed of two specialized models. The first is Fathom-Search-4B, a DeepSearch model trained from Qwen3-4B and optimized for evidence-based investigation through live web search and targeted webpage querying. Its training combines three advances: (i) DUETQA, a 5K-sample dataset generated via multi-agent self-play that enforces strict web-search dependence and heterogeneous source grounding; (ii) RAPO, a zero-overhead extension of GRPO that stabilizes multi-turn Reinforcement Learning with Verifiable Rewards through curriculum pruning, reward-aware advantage scaling, and per-prompt replay buffers; and (iii) a steerable step-level reward that classifies each tool call by cognitive behavior and marginal utility, enabling explicit control over search trajectory breadth, depth, and horizon. These improvements enable reliable extension of tool-calling beyond 20 calls when warranted. The second is Fathom-Synthesizer-4B, trained from Qwen3-4B, which converts multi-turn DeepSearch traces into structured, citation-dense DeepResearch Reports for comprehensive synthesis. Evaluated on DeepSearch benchmarks (SimpleQA, FRAMES, WebWalker, Seal0, MuSiQue) and DeepResearch-Bench, the system achieves state-of-the-art performance in the open-weights category while demonstrating strong generalization to diverse reasoning tasks including HLE, AIME-25, GPQA-Diamond, and MedQA.
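As a rough illustration of the training ideas named in the abstract, the sketch below shows the standard GRPO group-relative advantage and why groups with identical rewards yield no learning signal. The two helpers after it are hypothetical readings of "curriculum pruning" and "per-prompt replay buffers"; their names, the 0.9 threshold, and the replacement rule are assumptions for illustration, not details taken from the paper. Reward-aware advantage scaling is omitted because the abstract does not specify its form.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize rewards within one prompt's rollout group.

    Population standard deviation is used here; implementations differ on this choice.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def prune_saturated_prompts(solve_rates, threshold=0.9):
    """Hypothetical curriculum pruning: keep prompts whose solve rate is below the threshold.

    `solve_rates` maps prompt id -> fraction of rollouts that were correct.
    The threshold value is an assumption.
    """
    return [prompt for prompt, rate in solve_rates.items() if rate < threshold]


def fill_from_replay(rewards, replay_reward):
    """Hypothetical per-prompt replay: if every rollout failed and a past
    success is stored for this prompt, swap it in for the last rollout."""
    if replay_reward is not None and max(rewards) <= 0.0:
        return rewards[:-1] + [replay_reward]
    return rewards


# A group where every rollout fails gives all-zero advantages (no gradient signal).
failed_group = [0.0, 0.0, 0.0, 0.0]
print(group_relative_advantages(failed_group))

# Swapping in a stored success restores a non-zero signal for that prompt.
repaired_group = fill_from_replay(failed_group, replay_reward=1.0)
print(group_relative_advantages(repaired_group))
```

The design point is that group-normalized advantages vanish whenever all rollouts for a prompt receive the same reward, which is common in long multi-turn search tasks. Mechanisms such as pruning already-solved prompts and replaying past successes are plausible ways to keep the signal non-degenerate.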
Community
We present Fathom-DeepResearch, an agentic system that addresses critical gaps in open-source deep research capabilities through two specialized 4B models: Fathom-Search-4B for multi-turn web search and reasoning, and Fathom-Synthesizer-4B for structured report synthesis.
We are open-sourcing everything: model weights, research report, training recipe, and data!
🤗Fathom-Search-4B: https://huggingface.co/FractalAIResearch/Fathom-Search-4B
📜 Research Paper: https://huggingface.co/papers/2509.24107
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL (2025)
- Reinforcement Learning Foundations for Deep Research Systems: A Survey (2025)
- Open Data Synthesis For Deep Research (2025)
- InfoAgent: Advancing Autonomous Information-Seeking Agents (2025)
- WebResearcher: Unleashing unbounded reasoning capability in Long-Horizon Agents (2025)
- Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL (2025)
- WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning (2025)