The ability to search for and use external information is a key capability for large language models (LLMs). Alibaba's ZeroSearch is a reinforcement learning (RL) framework that develops this capability without requiring the model to interact with a real search engine during training.
ZeroSearch first applies lightweight supervised fine-tuning (SFT) to turn an LLM into a retrieval module that, given a query, generates either relevant documents or deliberately noisy ones. During RL training, a curriculum learning mechanism gradually degrades the quality of the generated documents. This exposes the policy model to progressively harder retrieval scenarios, so it learns to reason over and filter the results step by step.
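To make the curriculum idea concrete, here is a minimal shell sketch. It assumes that the START_THRESHOLD and END_THRESHOLD values passed to the training scripts (0.25 and 0.5 in the examples in this guide) are the starting and final probabilities that a simulated document is noisy, and that this probability rises linearly over TOTAL_STEPS. Both the interpretation and the linear curve are assumptions for illustration; ZeroSearch's own schedule may follow a different shape. The noise_prob function is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: probability that a simulated document is noisy at a
# given step, rising linearly from START to END over TOTAL training steps.
# ZeroSearch's actual curriculum may use a different (non-linear) schedule.
noise_prob() {
  local step=$1 total=$2 start=$3 end=$4
  awk -v s="$step" -v t="$total" -v a="$start" -v b="$end" \
    'BEGIN { if (s > t) s = t; printf "%.4f\n", a + (b - a) * s / t }'
}

# Print the schedule at the start, middle, and end of a 203-step run.
for step in 0 101 203; do
  echo "step $step: noise probability $(noise_prob "$step" 203 0.25 0.5)"
done
```

With these inputs the loop prints 0.2500, 0.3744, and 0.5000 for steps 0, 101, and 203; steps past the total are clamped to the final value.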
Experiments on both in-domain and out-of-domain datasets show that ZeroSearch outperforms models trained against a live search engine while incurring no search API costs. The framework generalizes well across base and instruction-tuned models and is compatible with multiple RL algorithms.
Create and activate a virtual environment
Use Conda to create an environment named zerosearch with Python 3.9, then activate it.
conda create -n zerosearch python=3.9
conda activate zerosearch
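Before installing anything, it can help to confirm that the right environment is active. The check_env function below is a hypothetical helper, not part of ZeroSearch; it relies on conda setting CONDA_DEFAULT_ENV to the name of the active environment.

```shell
# Hypothetical sanity check: fail early if the zerosearch environment is not
# active or its interpreter is not Python 3.9.
check_env() {
  if [ "${CONDA_DEFAULT_ENV:-}" != "zerosearch" ]; then
    echo "expected conda env 'zerosearch', got '${CONDA_DEFAULT_ENV:-none}'" >&2
    return 1
  fi
  # Exit status 0 only when the interpreter reports version 3.9.x.
  if ! python -c 'import sys; sys.exit(0 if sys.version_info[:2] == (3, 9) else 1)'; then
    echo "the zerosearch environment is not running Python 3.9" >&2
    return 1
  fi
  echo "environment OK"
}
```

Running check_env in the activated shell should print environment OK; in any other environment it prints the reason and returns a non-zero status.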
Install dependencies
Install the required libraries: PyTorch (the CUDA 12.1 build), vLLM, WandB, and SerpApi, followed by verl, Flash Attention 2, and SGLang.
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
pip install vllm==0.6.3
pip install wandb
pip install serpapi
# Install verl from the current directory (the root of the cloned ZeroSearch repository)
pip install -e .
# Install Flash Attention 2
pip3 install flash-attn --no-build-isolation
# Install SGLang
pip install sglang
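A quick way to confirm the installation is to import each package. The sketch below is a hypothetical helper; it assumes the import names torch, vllm, wandb, serpapi, verl, flash_attn, and sglang for the packages installed above, and uses the python on your PATH unless PYTHON is set.

```shell
# Hypothetical helper: try to import each named module with the active
# interpreter and report any that fail. Returns non-zero if one is missing.
check_imports() {
  local py=${PYTHON:-python} missing=0 mod
  for mod in "$@"; do
    if ! "$py" -c "import $mod" 2>/dev/null; then
      echo "missing or broken: $mod" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Import names assumed for the packages installed above.
check_imports torch vllm wandb serpapi verl flash_attn sglang && echo "all imports OK"
```

Importing each module in a fresh interpreter is slower than a single combined import, but it reports every broken package instead of stopping at the first one.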
Download the training dataset
Use huggingface-cli to download the ZeroSearch dataset into a local ZeroSearch_dataset directory.
huggingface-cli download --repo-type dataset --resume-download sunhaonlp/ZeroSearch_dataset --local-dir ZeroSearch_dataset
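An interrupted download can leave an empty or partial directory behind. The check_download function below is a hypothetical helper that confirms a --local-dir target exists and holds at least one file. It skips the .cache subfolder on the assumption that recent huggingface_hub versions store download metadata there.

```shell
# Hypothetical helper: check that a --local-dir target exists and contains
# at least one file outside its .cache metadata folder.
check_download() {
  local dir=${1%/} count
  if [ ! -d "$dir" ]; then
    echo "missing directory: $dir" >&2
    return 1
  fi
  # Count regular files, pruning the .cache subtree.
  count=$(find "$dir" -path "$dir/.cache" -prune -o -type f -print | wc -l | tr -d ' ')
  if [ "$count" -eq 0 ]; then
    echo "no files found in: $dir" >&2
    return 1
  fi
  echo "$dir: $count file(s)"
}

check_download ZeroSearch_dataset
```

The same function works for the simulation model directories, for example check_download SearchSimulation_14B.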
Download the simulation LLM
Three sizes are available (3B, 7B, and 14B); download only the one that fits your hardware. The examples in this guide use the 14B model.
huggingface-cli download --resume-download sunhaonlp/SearchSimulation_3B --local-dir SearchSimulation_3B
huggingface-cli download --resume-download sunhaonlp/SearchSimulation_7B --local-dir SearchSimulation_7B
huggingface-cli download --resume-download sunhaonlp/SearchSimulation_14B --local-dir SearchSimulation_14B
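If you script the download, a small wrapper can select one of the three repositories above by size and reject anything else. The sim_repo and download_sim names are hypothetical; the repository IDs and the huggingface-cli flags are the ones used in the commands above.

```shell
# Hypothetical helper: map a simulation-model size to its repository ID,
# rejecting sizes other than the three published checkpoints.
sim_repo() {
  case "$1" in
    3B|7B|14B) echo "sunhaonlp/SearchSimulation_$1" ;;
    *) echo "unsupported size: $1 (use 3B, 7B, or 14B)" >&2; return 1 ;;
  esac
}

# Download a single size into ./SearchSimulation_<size>.
download_sim() {
  local repo
  repo=$(sim_repo "$1") || return 1
  huggingface-cli download --resume-download "$repo" --local-dir "SearchSimulation_$1"
}
```

For example, download_sim 7B fetches only the 7B checkpoint into SearchSimulation_7B.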
Launch a local simulation server
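Choose either prompt-based simulation, which serves an instruction-tuned model (here Qwen2.5-14B-Instruct, assumed to be downloaded to a local directory of that name), or fine-tuned simulation, which serves one of the SearchSimulation checkpoints downloaded above (the 14B model in this example). Both commands expose the server on port 6001 and use --tp 2 --dp 2 (tensor parallelism across two GPUs, replicated twice with data parallelism), so they expect four GPUs.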
# Prompt-based simulation
python -m sglang.launch_server --model-path Qwen2.5-14B-Instruct --host 0.0.0.0 --tp 2 --dp 2 --port 6001
# Fine-tuned simulation
python -m sglang.launch_server --model-path SearchSimulation_14B --host 0.0.0.0 --tp 2 --dp 2 --port 6001
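Loading a 14B model can take a while, so training should start only once the server responds. This sketch polls the SGLang server's /health endpoint on port 6001 with curl; the wait_for_server name, the 600-second default timeout, and the 5-second polling interval are illustrative choices, not part of ZeroSearch.

```shell
# Hypothetical helper: poll the SGLang server's /health endpoint until it
# answers successfully or the timeout (in seconds) runs out.
wait_for_server() {
  local url=${1:-http://localhost:6001} timeout=${2:-600} waited=0
  until curl -sf --max-time 5 "$url/health" >/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "no healthy server at $url after ${timeout}s" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
  echo "server at $url is ready"
}
```

Run wait_for_server http://localhost:6001 900 in a second terminal (or before the training command in a script) to block until the simulation server is up.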
Execute reinforcement learning training
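The following example trains Llama-3.2-3B as the policy model on four GPUs. Set your Google Search API key (provided through SerpApi), then start GRPO or PPO training with the provided scripts, using either the prompt-based or the fine-tuned simulation server launched above.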
# Activate the conda environment
conda activate zerosearch
# Set your Google Search API key
export SER_API_KEY=your_api_key
# Run prompt-based simulation training
bash train_grpo.sh NUM_GPUS_PER_NODE 4 MODEL_PATH Llama-3.2-3B DATA_PATH ZeroSearch_dataset TOTAL_STEPS 203 IP localhost SEARCH_MODE simulate_prompt SIMULATION_LLM Qwen2.5-14B-Instruct START_THRESHOLD 0.25 END_THRESHOLD 0.5
bash train_ppo.sh NUM_GPUS_PER_NODE 4 MODEL_PATH Llama-3.2-3B DATA_PATH ZeroSearch_dataset TOTAL_STEPS 203 IP localhost SEARCH_MODE simulate_prompt SIMULATION_LLM Qwen2.5-14B-Instruct START_THRESHOLD 0.25 END_THRESHOLD 0.5
# Run fine-tuned simulation training
bash train_grpo.sh NUM_GPUS_PER_NODE 4 MODEL_PATH Llama-3.2-3B DATA_PATH ZeroSearch_dataset TOTAL_STEPS 203 IP localhost SEARCH_MODE simulate_sft SIMULATION_LLM SearchSimulation_14B START_THRESHOLD 0.25 END_THRESHOLD 0.5
bash train_ppo.sh NUM_GPUS_PER_NODE 4 MODEL_PATH Llama-3.2-3B DATA_PATH ZeroSearch_dataset TOTAL_STEPS 203 IP localhost SEARCH_MODE simulate_sft SIMULATION_LLM SearchSimulation_14B START_THRESHOLD 0.25 END_THRESHOLD 0.5
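The four training commands differ only in the algorithm and the simulation mode. The hypothetical wrapper below builds the matching command from those two choices, hard-coding the other values (4 GPUs, Llama-3.2-3B, 203 steps, thresholds 0.25 and 0.5) from the examples above, and refuses to start if SER_API_KEY is unset.

```shell
# Hypothetical helper: print the train_*.sh command for an algorithm
# (grpo or ppo) and a simulation mode (prompt or sft).
build_train_cmd() {
  local algo=$1 mode=$2 search_mode sim_llm
  case "$algo" in
    grpo|ppo) ;;
    *) echo "algo must be grpo or ppo" >&2; return 1 ;;
  esac
  case "$mode" in
    prompt) search_mode=simulate_prompt; sim_llm=Qwen2.5-14B-Instruct ;;
    sft)    search_mode=simulate_sft;    sim_llm=SearchSimulation_14B ;;
    *) echo "mode must be prompt or sft" >&2; return 1 ;;
  esac
  echo "bash train_${algo}.sh NUM_GPUS_PER_NODE 4 MODEL_PATH Llama-3.2-3B" \
    "DATA_PATH ZeroSearch_dataset TOTAL_STEPS 203 IP localhost" \
    "SEARCH_MODE $search_mode SIMULATION_LLM $sim_llm" \
    "START_THRESHOLD 0.25 END_THRESHOLD 0.5"
}

# Hypothetical helper: check the API key, then run the selected command.
run_train() {
  if [ -z "${SER_API_KEY:-}" ]; then
    echo "export SER_API_KEY before training" >&2
    return 1
  fi
  local cmd
  cmd=$(build_train_cmd "$1" "$2") || return 1
  echo "running: $cmd"
  bash -c "$cmd"
}
```

For example, run_train grpo sft runs the same command as the fine-tuned GRPO example above, and build_train_cmd alone lets you inspect a command without launching it.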