ZeroSearch: Training LLMs to Search Without Real-World Search Engines

5月10日 Published inMachine Learning

Search capabilities represent a critical frontier in the development of large language models (LLMs). Alibaba’s ZeroSearch introduces a novel reinforcement learning (RL) framework designed to cultivate these capabilities without necessitating direct interaction with real-world search engines.

ZeroSearch employs supervised fine-tuning (SFT) to transform an LLM into a retrieval module. This module generates relevant documents—interspersed with synthetic noise—based on a given query. To optimize performance, the researchers implemented a curriculum learning mechanism. This approach incrementally increases the complexity of retrieval scenarios, pushing the model to refine its reasoning and filtering abilities step by step.

Empirical tests across both in-domain and out-of-domain datasets demonstrate that ZeroSearch outperforms models dependent on live search engines while simultaneously eliminating API overhead. The framework exhibits strong generalization across various base and instruction-tuned models and remains compatible with multiple RL algorithms.

ZeroSearch Setup Guide

  1. Create and activate a virtual environment
    Use Conda to create an environment named zerosearch with Python 3.9, then activate it.

    conda create -n zerosearch python=3.9
    conda activate zerosearch
    
  2. Install dependencies
    Install the necessary libraries, including Torch, vLLM, WandB, and SerpApi.

    pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
    pip install vllm==0.6.3
    pip install wandb
    pip install serpapi
    # Install verl
    pip install -e .
    # Install Flash Attention 2
    pip3 install flash-attn --no-build-isolation
    # Install SGLang
    pip install sglang
    
  3. Download the training dataset
    Use the huggingface-cli to fetch the ZeroSearch dataset.

    huggingface-cli download --repo-type dataset --resume-download sunhaonlp/ZeroSearch_dataset --local-dir ZeroSearch_dataset
    
  4. Download the simulation LLM
    Select the model size appropriate for your hardware.

    huggingface-cli download --resume-download sunhaonlp/SearchSimulation_3B --local-dir SearchSimulation_3B
    huggingface-cli download --resume-download sunhaonlp/SearchSimulation_7B --local-dir SearchSimulation_7B
    huggingface-cli download --resume-download sunhaonlp/SearchSimulation_14B --local-dir SearchSimulation_14B
    
  5. Launch a local simulation server
    You may choose between prompt-based simulation or a fine-tuned simulation.

    # Prompt-based simulation
    python -m sglang.launch_server --model-path Qwen2.5-14B-Instruct --host 0.0.0.0 --tp 2 --dp 2 --port 6001
    # Fine-tuned simulation
    python -m sglang.launch_server --model-path SearchSimulation_14B --host 0.0.0.0 --tp 2 --dp 2 --port 6001
    
  6. Execute reinforcement learning training
    The following example uses Llama-3.2-3B. Set your Google Search API key, then initiate GRPO or PPO training using the provided scripts.

    # Activate the conda environment
    conda activate zerosearch
    # Set your Google Search API key
    export SER_API_KEY=your_api_key
    # Run prompt-based simulation training
    bash train_grpo.sh NUM_GPUS_PER_NODE 4 MODEL_PATH Llama-3.2-3B DATA_PATH ZeroSearch_dataset TOTAL_STEPS 203 IP localhost SEARCH_MODE simulate_prompt SIMULATION_LLM Qwen2.5-14B-Instruct START_THRESHOLD 0.25 END_THRESHOLD 0.5
    bash train_ppo.sh NUM_GPUS_PER_NODE 4 MODEL_PATH Llama-3.2-3B DATA_PATH ZeroSearch_dataset TOTAL_STEPS 203 IP localhost SEARCH_MODE simulate_prompt SIMULATION_LLM Qwen2.5-14B-Instruct START_THRESHOLD 0.25 END_THRESHOLD 0.5
    # Run fine-tuned simulation training
    bash train_grpo.sh NUM_GPUS_PER_NODE 4 MODEL_PATH Llama-3.2-3B DATA_PATH ZeroSearch_dataset TOTAL_STEPS 203 IP localhost SEARCH_MODE simulate_sft SIMULATION_LLM SearchSimulation_14B START_THRESHOLD 0.25 END_THRESHOLD 0.5
    bash train_ppo.sh NUM_GPUS_PER_NODE 4 MODEL_PATH Llama-3.2-3B DATA_PATH ZeroSearch_dataset TOTAL_STEPS 203 IP localhost SEARCH_MODE simulate_sft SIMULATION_LLM SearchSimulation_14B START_THRESHOLD 0.25 END_THRESHOLD 0.5