Tongyi DeepResearch: 30B Agent Model Beats GPT and Claude on Search Benchmarks

September 18, published in Large Language Models (LLMs)

Tongyi DeepResearch, developed by Tongyi Lab, is an agentic large language model optimized for extensive, in-depth information retrieval. The model has 30.5 billion total parameters, of which 3.3 billion are activated per token. It is built on a fully automated synthetic data pipeline covering agent pretraining, supervised fine-tuning, and reinforcement learning. Continual pretraining on large volumes of agent-interaction data is used to keep the model's knowledge current and to strengthen its complex reasoning.

Tongyi DeepResearch employs end-to-end reinforcement learning via a custom Group Relative Policy Optimization (GRPO) framework utilizing token-level policy gradients. The model supports two distinct inference modes: ReAct, designed for rigorous evaluation of core capabilities, and IterResearch "Heavy," optimized for extracting peak performance during complex research tasks.

  • Automated Data Pipeline: Uses end-to-end synthetic data to drive the agent pretraining, supervised fine-tuning, and reinforcement learning stages.
  • Large-Scale Continual Pretraining: Leverages diverse agent-interaction datasets to expand the model's capabilities and refresh its knowledge base.
  • End-to-End Reinforcement Learning: Uses the GRPO framework with token-level policy gradients to optimize agent behavior directly.
  • Dual Reasoning Modes: Offers ReAct mode for evaluating the model's core capabilities in isolation and IterResearch "Heavy" mode for maximum depth and performance.
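ReAct generally refers to a loop in which the model alternates between a thought, a tool action, and the resulting observation until it commits to an answer. The sketch below shows that loop with a stub policy standing in for the model and a stub search tool. All names (`react_loop`, `toy_policy`, `toy_tools`) are hypothetical, and the project's actual inference code may be organized quite differently.

```python
from typing import Callable, Dict, List, Tuple

# A policy maps the transcript so far to (thought, action, argument).
Step = Tuple[str, str, str]

def react_loop(question: str,
               policy: Callable[[List[str]], Step],
               tools: Dict[str, Callable[[str], str]],
               max_steps: int = 8) -> str:
    """Run thought/action/observation cycles until the policy answers."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        thought, action, arg = policy(transcript)
        transcript.append(f"Thought: {thought}")
        if action == "answer":
            return arg
        observation = tools[action](arg)
        transcript.append(f"Action: {action}[{arg}]")
        transcript.append(f"Observation: {observation}")
    return ""  # step budget exhausted without a final answer

def toy_policy(transcript: List[str]) -> Step:
    """Stub policy: search once, then answer with the last observation."""
    prefix = "Observation: "
    observations = [line for line in transcript if line.startswith(prefix)]
    if not observations:
        return ("I should look this up.", "search", "capital of France")
    return ("The observation answers the question.", "answer",
            observations[-1][len(prefix):])

toy_tools = {"search": lambda query: "Paris"}  # stand-in for a web search tool
result = react_loop("What is the capital of France?", toy_policy, toy_tools)
```

The `max_steps` budget is a design choice of this sketch: it bounds the number of tool calls, so a policy that never answers still terminates.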

Tongyi DeepResearch demonstrates industry-leading performance across major agentic search benchmarks:

Benchmark               Score
Humanity's Last Exam    32.9
BrowseComp              43.4
BrowseComp-ZH           46.7
GAIA                    70.9
xbench-DeepSearch       75.0
WebWalkerQA             72.2
FRAMES                  90.6

Evaluations indicate the model outperforms several leading competitors, including GLM 4.5, DeepSeek V3.1, Claude-4-Sonnet, and OpenAI o3.

DeepResearch Setup

1. Environment

conda create -n react_infer_env python=3.10.0
conda activate react_infer_env

2. Install Dependencies

pip install -r requirements.txt

3. Prepare Data

  • Create an eval_data/ directory.
  • Add JSONL files containing one QA pair per line: {"question": "...", "answer": "..."}
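As an illustration of the data format above, the following sketch writes a QA file in JSONL form and reads it back, checking that every line carries both keys. The helper names and the file name `example.jsonl` are hypothetical. The sample is written to a temporary directory so that running it does not touch a real `eval_data/` folder.

```python
import json
import os
import tempfile

def write_qa_jsonl(path, pairs):
    """Write (question, answer) pairs as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for question, answer in pairs:
            record = {"question": question, "answer": answer}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

def load_qa_jsonl(path):
    """Read a JSONL file and verify each record has both required keys."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            missing = {"question", "answer"} - set(record)
            if missing:
                raise ValueError(f"{path}:{line_no} is missing {sorted(missing)}")
            records.append(record)
    return records

# Demo in a throwaway directory; in practice the target would be eval_data/.
workdir = tempfile.mkdtemp()
data_dir = os.path.join(workdir, "eval_data")
os.makedirs(data_dir, exist_ok=True)
sample_path = os.path.join(data_dir, "example.jsonl")  # hypothetical file name
write_qa_jsonl(sample_path, [("What is 2 + 2?", "4")])
records = load_qa_jsonl(sample_path)
```

Validating the file before a long inference run is inexpensive and catches malformed lines early.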

4. Configure the Script

Edit run_react_infer.sh and set the following variables:

  • MODEL_PATH: Location of the model weights.
  • DATASET: Dataset identifier.
  • OUTPUT_PATH: Directory for results.
  • API Keys: Required for web search and external services.
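A small pre-flight check can catch unset variables before a run starts. The sketch below assumes the three named settings are also exported as environment variables, which the source does not state (it only says they are set inside run_react_infer.sh). The API key names are not given in this article, so they are left out. `REQUIRED_SETTINGS`, `missing_settings`, and the example values are hypothetical.

```python
import os

# Names taken from the list above; API key names are unknown and omitted.
REQUIRED_SETTINGS = ("MODEL_PATH", "DATASET", "OUTPUT_PATH")

def missing_settings(env):
    """Return the required names that are unset or blank in the mapping."""
    return [name for name in REQUIRED_SETTINGS
            if not env.get(name, "").strip()]

# Illustrative mapping with placeholder values; OUTPUT_PATH is left blank.
example_env = {
    "MODEL_PATH": "/path/to/model",
    "DATASET": "example",
    "OUTPUT_PATH": "",
}
problems = missing_settings(example_env)
```

Calling `missing_settings(os.environ)` applies the same check to the current shell environment.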

5. Run Inference

bash run_react_infer.sh

Model Download

Model                          Sources                     Size       Context Length
Tongyi-DeepResearch-30B-A3B    HuggingFace / ModelScope    30B-A3B    128K

Research Agent Family

Tongyi DeepResearch serves as the foundation for a broader suite of research agents. This family includes specialized projects such as WebWalker, WebDancer, WebSailor, and WebShaper, which address vision-language tasks, long-horizon reasoning, dynamic outline construction, and other research-intensive applications.