arxiv:2509.24107

Fathom-DeepResearch: Unlocking Long Horizon Information Retrieval and Synthesis for SLMs

Published on Sep 28

· Submitted by

Kunal Singh on Oct 8

#3 Paper of the day

Fractal AI Research

Upvote

Authors:

Shreyas Singh ,

Kunal Singh ,

Abstract

Fathom-DeepResearch, an agentic system with specialized models for web search and report synthesis, achieves state-of-the-art performance on open-ended information-seeking tasks and diverse reasoning tasks.

AI-generated summary

Tool-integrated reasoning has emerged as a key focus for enabling agentic applications. Among these, DeepResearch Agents have gained significant attention for their strong performance on complex, open-ended information-seeking tasks. We introduce Fathom-DeepResearch, an agentic system composed of two specialized models. The first is Fathom-Search-4B, a DeepSearch model trained from Qwen3-4B and optimized for evidence-based investigation through live web search and targeted webpage querying. Its training combines three advances: (i) DUETQA, a 5K-sample dataset generated via multi-agent self-play that enforces strict web-search dependence and heterogeneous source grounding; (ii) RAPO, a zero-overhead extension of GRPO that stabilizes multi-turn Reinforcement Learning with Verifiable Rewards through curriculum pruning, reward-aware advantage scaling, and per-prompt replay buffers; and (iii) a steerable step-level reward that classifies each tool call by cognitive behavior and marginal utility, enabling explicit control over search trajectory breadth, depth, and horizon. These improvements enable reliable extension of tool-calling beyond 20 calls when warranted. The second is Fathom-Synthesizer-4B, trained from Qwen3-4B, which converts multi-turn DeepSearch traces into structured, citation-dense DeepResearch Reports for comprehensive synthesis. Evaluated on DeepSearch benchmarks (SimpleQA, FRAMES, WebWalker, Seal0, MuSiQue) and DeepResearch-Bench, the system achieves state-of-the-art performance in the open-weights category while demonstrating strong generalization to diverse reasoning tasks including HLE, AIME-25, GPQA-Diamond, and MedQA.

View arXiv page View PDF GitHub 31 Add to collection

Community

Ogkunal

Paper author Paper submitter 5 days ago

We present Fathom-DeepResearch, an agentic system that addresses critical gaps in open-source deep
research capabilities through two specialized 4B models: Fathom-Search-4B for multi-turn web search and
reasoning, and Fathom-Synthesizer-4B for structured report synthesis.

We are open-sourcing everything ->
model weights, research report, training recipe and data !

🤗Fathom-Search-4B: https://huggingface.co/FractalAIResearch/Fathom-Search-4B

📜 Research Paper: https://huggingface.co/papers/2509.24107