MAS-Zero: Developing Self-Evolving Multi-Agent Systems Without Human Labels

Published June 10 in AI Agent Tools

MAS-Zero is a multi-agent framework capable of autonomous self-improvement. It requires no human labels and no validation sets. Instead, it relies on a meta-agent that handles design, evaluation, and selection in real time.

The framework operates through two primary stages:

Meta-Iteration
- MAS-Design: The meta-agent decomposes a task into modular components. For each component, it proposes a specialized sub-agent team and translates that design into functional code.
- MAS-Feedback: The system executes the generated code. Intermediate outputs serve as a diagnostic signal that reveals whether the design is effective. The meta-agent assesses whether each sub-agent can solve its assigned sub-task and whether the interactions between sub-agents are logically sound.
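The MAS-Design and MAS-Feedback cycle above can be sketched as a loop. This is a minimal sketch under stated assumptions, not MAS-Zero's implementation: every name (SubTask, Design, propose_design, execute, assess, meta_iterate) is hypothetical, the LLM calls are replaced by stubs so the control flow runs end to end, and the early-stop rule (stop once the feedback step reports no issues) is our assumption. The block names reuse the values passed to --blocks in the quick-start command.

```python
"""Hypothetical sketch of a MAS-Zero-style meta-iteration loop.

All names are illustrative stand-ins, not the MAS-Zero API. LLM calls are
replaced by trivial stubs so the design -> execute -> assess cycle can run.
"""
from dataclasses import dataclass, field


@dataclass
class SubTask:
    description: str
    agent_block: str  # e.g. "COT", "COT_SC", "Reflexion", "LLM_debate"


@dataclass
class Design:
    subtasks: list[SubTask]
    feedback: list[str] = field(default_factory=list)


def propose_design(task: str, feedback: list[str]) -> Design:
    # MAS-Design stub: a real meta-agent would prompt an LLM to decompose the
    # task and emit code for each sub-agent team. Here, any feedback simply
    # triggers a finer split into two sub-tasks.
    blocks = ["COT", "LLM_debate"] if feedback else ["COT"]
    return Design([SubTask(f"{task} / part {i}", b) for i, b in enumerate(blocks)])


def execute(design: Design) -> list[str]:
    # Stub: run each sub-agent and collect its intermediate output.
    return [f"output of {s.agent_block} on {s.description}" for s in design.subtasks]


def assess(design: Design, outputs: list[str]) -> list[str]:
    # MAS-Feedback stub: a real meta-agent would judge from the outputs whether
    # each sub-task is solvable by its agent and whether the parts fit together.
    if len(design.subtasks) < 2:
        return ["decomposition too coarse; split the task further"]
    return []  # no issues found


def meta_iterate(task: str, n_iterations: int = 5) -> list[Design]:
    """Return every candidate design produced during meta-iteration."""
    candidates: list[Design] = []
    feedback: list[str] = []
    for _ in range(n_iterations):
        design = propose_design(task, feedback)
        outputs = execute(design)
        feedback = assess(design, outputs)
        design.feedback = feedback
        candidates.append(design)
        if not feedback:  # assumed stop rule: no issues left to fix
            break
    return candidates
```

Calling meta_iterate("hexagon problem") returns the list of candidate designs (two with these stubs); that pool is what the self-verification step chooses from.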
Self-Verification
- The meta-iteration process produces several candidate systems. Self-verification then selects the most robust candidate based exclusively on internal signals.
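Selection from internal signals alone can be illustrated with a small function. The scoring rule here (majority vote over the candidates' final answers, ties broken by fewer open feedback issues) is a swapped-in assumption for illustration; the description above only says that the choice relies on internal signals. Candidate, open_issues, and self_verify are hypothetical names.

```python
"""Hypothetical self-verification step: pick one candidate using only
internal signals, with no ground-truth labels.

Assumed scoring rule (not necessarily MAS-Zero's):
  1. prefer answers that more candidates agree on (majority vote);
  2. break ties by fewer unresolved feedback issues.
"""
from collections import Counter
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    answer: str
    open_issues: int  # issues still flagged by the meta-agent's feedback


def self_verify(candidates: list[Candidate]) -> Candidate:
    if not candidates:
        raise ValueError("need at least one candidate")
    votes = Counter(c.answer for c in candidates)
    # Highest vote count wins; among equals, fewer open issues wins.
    return max(candidates, key=lambda c: (votes[c.answer], -c.open_issues))


pool = [
    Candidate("mas_1", answer="A", open_issues=2),
    Candidate("mas_2", answer="B", open_issues=0),
    Candidate("mas_3", answer="A", open_issues=0),
]
print(self_verify(pool).name)  # mas_3
```

Answer "A" has two votes, so mas_1 and mas_3 tie on agreement; mas_3 wins because it has no open issues.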

This entire process occurs at inference time. There is no separate training phase; the system continuously refines itself through these meta-level feedback loops.

A Practical Reasoning Challenge

Consider a hard geometry problem: ABCDEF is a convex equilateral hexagon in which all pairs of opposite sides are parallel. The triangle whose sides are the extensions of segments AB, CD, and EF has side lengths 200, 240, and 300. Find the side length of the hexagon.

Solving it takes several distinct reasoning steps. MAS-Zero approaches it by instantiating agents tailored to specific sub-problems; for example, one agent may handle the geometric constraints while another carries out the algebraic calculations.
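To make the two kinds of sub-problem concrete, here is one way to solve it (our own worked check, not MAS-Zero output). Geometrically, the segments BC, DE, and FA each cut a small triangle off a corner of the big triangle; because opposite sides of the hexagon are parallel, each small triangle is similar to the big one, scaled by s divided by the big side it is parallel to. Summing the pieces along any side of the big triangle gives 1/s = 1/a + 1/b + 1/c, which turns the algebraic step into reciprocal arithmetic:

```python
from fractions import Fraction


def hexagon_side(a: int, b: int, c: int) -> Fraction:
    """Side s of a convex equilateral hexagon with opposite sides parallel,
    given side lengths a, b, c of the triangle formed by extending AB, CD, EF.

    Along a triangle side of length L (the other two sides being M and N),
    the hexagon side plus one side of each adjacent corner triangle gives
    L = s + L*s/M + L*s/N. Dividing by L*s yields 1/s = 1/a + 1/b + 1/c.
    """
    # Exact rational arithmetic avoids floating-point rounding.
    return 1 / (Fraction(1, a) + Fraction(1, b) + Fraction(1, c))


print(hexagon_side(200, 240, 300))  # 80
```

Here 1/200 + 1/240 + 1/300 = 15/1200 = 1/80, so the side length is 80. A regular hexagon of side 1, whose extended alternate sides form an equilateral triangle of side 3, gives hexagon_side(3, 3, 3) == 1, a quick sanity check of the formula.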

Performance Snapshot

Across all three backbone LLMs and all three benchmarks, MAS-Zero outperforms the CoT baseline without any external supervision.

The framework was evaluated on three benchmarks:

- AIME24 (mathematical reasoning)
- GPQA (graduate-level science questions)
- SWE (software engineering tasks)

The following table compares MAS-Zero with the standard Chain-of-Thought (CoT) baseline on three backbone LLMs (scores in %):

LLM / Method	AIME24	GPQA	SWE	Avg
CoT (GPT-4o)	8.33	45.78	9.17	23.26
MAS-Zero (GPT-4o)	33.33	50.60	25.83	35.81
CoT (LLaMA3.3-70B)	16.67	50.60	2.92	22.09
MAS-Zero (LLaMA3.3-70B)	37.50	52.41	16.74	31.67
CoT (Qwen2.5-32B)	12.50	50.00	45.26	35.92
MAS-Zero (Qwen2.5-32B)	29.17	51.81	48.95	43.31

With GPT-4o, MAS-Zero raised AIME24 accuracy from 8.33% to 33.33%. The same pattern holds for LLaMA3.3-70B and Qwen2.5-32B, where MAS-Zero beats CoT on every benchmark, showing that these gains are achievable without relying on a pre-defined validation set.
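The per-benchmark gains behind this claim are simple differences between table rows. The short script below recomputes them in percentage points (for example, +25.00 on AIME24 with GPT-4o). It uses only the three benchmark columns copied from the table and is an illustrative check, not part of MAS-Zero.

```python
# AIME24 / GPQA / SWE scores (in %) copied from the table above.
scores = {
    "GPT-4o": {"CoT": (8.33, 45.78, 9.17), "MAS-Zero": (33.33, 50.60, 25.83)},
    "LLaMA3.3-70B": {"CoT": (16.67, 50.60, 2.92), "MAS-Zero": (37.50, 52.41, 16.74)},
    "Qwen2.5-32B": {"CoT": (12.50, 50.00, 45.26), "MAS-Zero": (29.17, 51.81, 48.95)},
}
benchmarks = ("AIME24", "GPQA", "SWE")


def gains(model: str) -> dict[str, float]:
    """Percentage-point improvement of MAS-Zero over CoT for one model."""
    base, ours = scores[model]["CoT"], scores[model]["MAS-Zero"]
    # round() hides float noise such as 24.999999999999996
    return {b: round(o - c, 2) for b, c, o in zip(benchmarks, base, ours)}


for model in scores:
    print(model, gains(model))
```

Every difference is positive, which is the sense in which the trend holds across all three backbone models; the size of the gain varies widely, from under 2 points on GPQA to 25 points on AIME24.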

Installation and Quick Start

Environment Setup

Run the following from the repository root:

conda create -n mas_zero python=3.12
conda activate mas_zero
pip install anthropic openai backoff together datasets jinja2
pip install -e human-eval
pip install -r requirements.txt

Running a Search Task

export OPENAI_API_KEY={your_key}
export TOGETHER_API_KEY={your_key}
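Before launching a search, a quick preflight check can catch a missing key or package. The helper below is hypothetical and not part of MAS-Zero: it checks the two variables exported above (also flagging values still set to the {your_key} placeholder) and uses importlib.util.find_spec to confirm that the pip-installed packages are importable. The human-eval package is left out because its import name is not stated here.

```python
"""Hypothetical preflight check before running main_question.py.

Not part of MAS-Zero. Variable names come from the export lines above, and
the package list mirrors the pip install line (human-eval is omitted).
"""
import importlib.util
import os
from collections.abc import Iterable, Mapping

REQUIRED_ENV = ("OPENAI_API_KEY", "TOGETHER_API_KEY")
REQUIRED_PACKAGES = ("anthropic", "openai", "backoff", "together", "datasets", "jinja2")
PLACEHOLDER = "{your_key}"


def missing_prerequisites(env: Mapping[str, str] = os.environ,
                          packages: Iterable[str] = REQUIRED_PACKAGES) -> list[str]:
    """Return a list of problems; an empty list means ready to run."""
    problems = [
        f"env var {name} is unset or still a placeholder"
        for name in REQUIRED_ENV
        if env.get(name, PLACEHOLDER) in ("", PLACEHOLDER)
    ]
    # For top-level names, find_spec returns None if the module is absent,
    # without importing it.
    problems += [
        f"package {pkg} is not importable"
        for pkg in packages
        if importlib.util.find_spec(pkg) is None
    ]
    return problems


# Example: print(missing_prerequisites() or "ready")
```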

python main_question.py \
  --dataset workflow_search/aime24 \
  --option plan \
  --meta_model gpt-4o_chatgpt \
  --node_model gpt-4o_chatgpt \
  --verifier_model gpt-4o_chatgpt \
  --blocks COT COT_SC Reflexion LLM_debate \
  --use_oracle_verifier \
  --defer_verifier \
  --n_generation 5

To target GPQA or SWE-bench instead of AIME24, change the --dataset value. The --meta_model and --node_model flags support various backends, including GPT and Claude models.
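When sweeping over datasets or backends, it can help to build the search command programmatically. The hypothetical helper below reproduces the quick-start command using only the flags shown there; the dataset identifiers for GPQA and SWE-bench are not given in this article, so check the repository before substituting them.

```python
"""Hypothetical helper that rebuilds the quick-start search command.

Not part of MAS-Zero; it only emits flags that appear in the command above.
"""
import shlex

DEFAULT_BLOCKS = ("COT", "COT_SC", "Reflexion", "LLM_debate")


def search_command(dataset: str, model: str = "gpt-4o_chatgpt",
                   n_generation: int = 5,
                   blocks: tuple[str, ...] = DEFAULT_BLOCKS) -> list[str]:
    """Return an argv list for main_question.py (same model for every role)."""
    return [
        "python", "main_question.py",
        "--dataset", dataset,
        "--option", "plan",
        "--meta_model", model,
        "--node_model", model,
        "--verifier_model", model,
        "--blocks", *blocks,
        "--use_oracle_verifier",
        "--defer_verifier",
        "--n_generation", str(n_generation),
    ]


cmd = search_command("workflow_search/aime24")
print(shlex.join(cmd))  # the quick-start command on a single line
```

Passing cmd to subprocess.run(cmd, check=True) would launch the search. Using one model for all three roles mirrors the quick-start example; the helper could just as well take a separate argument for each of --meta_model, --node_model, and --verifier_model.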

Verifying Results

python main_judge.py \
  --dataset aime24 \
  --judge_method self \
  --baseline workflow_search \
  --model gpt-4o_chatgpt \
  --min_sample 0 \
  --max_sample 30 \
  --max_response_per_sample 5

MAS-Zero removes the need for external feedback such as human labels or a validation set. It shows that a meta-agent, driven by an iterative refinement loop, can build effective multi-agent systems on its own.
