MAS-Zero is a multi-agent framework capable of autonomous self-improvement. It requires no human labels and no validation sets. Instead, it relies on a meta-agent that handles design, evaluation, and selection in real time.
The framework operates through two primary stages:
Meta-Iteration: The meta-agent repeatedly designs, runs, and evaluates candidate multi-agent systems, using the feedback from each round to refine the next design.
Self-Verification: Meta-iteration produces several candidate systems; self-verification then selects the most robust one using only internal signals.
This entire process occurs at inference time. There is no separate training phase; the system continuously refines itself through these meta-level feedback loops.
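The two stages can be pictured as a small loop. The sketch below is illustrative only and is not MAS-Zero's code: `Candidate`, `meta_iterate`, `self_verify`, and `toy_propose` are hypothetical names, the stub replaces real LLM calls with canned answers, and majority agreement among candidate answers is a stand-in internal signal, since the framework's actual verifier is not described here.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Candidate:
    """One candidate multi-agent system and the answer it produced."""
    design: str
    answer: str


def meta_iterate(
    question: str,
    propose: Callable[[str, str, int], Tuple[str, str]],
    n_iterations: int,
) -> List[Candidate]:
    """Meta-iteration: design a system, run it, and feed the outcome forward."""
    candidates: List[Candidate] = []
    feedback = ""
    for step in range(n_iterations):
        design, answer = propose(question, feedback, step)
        candidates.append(Candidate(design, answer))
        # The next design sees what the previous one did (no labels involved).
        feedback = f"previous design {design!r} answered {answer!r}"
    return candidates


def self_verify(candidates: List[Candidate]) -> Candidate:
    """Select a candidate from internal signals only (here: answer agreement)."""
    counts = Counter(c.answer for c in candidates)
    best_answer, _ = counts.most_common(1)[0]
    return next(c for c in candidates if c.answer == best_answer)


def toy_propose(question: str, feedback: str, step: int) -> Tuple[str, str]:
    """Hypothetical stub standing in for the meta-agent and its LLM calls."""
    designs = ["COT", "COT_SC", "Reflexion", "LLM_debate", "COT_SC + Reflexion"]
    answers = ["72", "80", "80", "96", "80"]  # canned, assumed outputs
    return designs[step % len(designs)], answers[step % len(answers)]


candidates = meta_iterate("hexagon side length?", toy_propose, n_iterations=5)
chosen = self_verify(candidates)
print(chosen.design, chosen.answer)
```

Nothing in the loop consults a ground-truth label or a validation set; the selection depends only on what the candidates themselves produced.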
A Practical Reasoning Challenge
Consider a complex geometry problem: ABCDEF is a convex equilateral hexagon with opposite sides parallel. The triangle formed by extending AB, CD, and EF has sides of 200, 240, and 300. Find the hexagon's side length.
Solving it takes several distinct reasoning steps. MAS-Zero approaches such a problem by instantiating agents tailored to specific sub-problems. For example, one agent may manage geometric constraints while another focuses on algebraic calculations.
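To make the split between a geometric step and an algebraic step concrete, here is a worked check of the hexagon problem above. It is a hand derivation, not output from MAS-Zero. The geometric step: because opposite sides are parallel, each corner that the hexagon cuts off the large triangle is a smaller triangle similar to it, and adding up the pieces along any one side of the large triangle gives s/a + s/b + s/c = 1, where s is the hexagon's side and a, b, c are the triangle's sides. The algebraic step is then a single line. The function name `hexagon_side` is an assumption made for illustration.

```python
from fractions import Fraction


def hexagon_side(a: int, b: int, c: int) -> Fraction:
    """Solve s/a + s/b + s/c = 1 for the hexagon side length s."""
    return 1 / (Fraction(1, a) + Fraction(1, b) + Fraction(1, c))


side = hexagon_side(200, 240, 300)
print(side)  # 80
```

Exact fractions avoid floating-point rounding, so the result is the integer 80 rather than an approximation of it.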
Performance Snapshot
Across the models and benchmarks reported here, MAS-Zero improves on the Chain-of-Thought (CoT) baseline in every score, without external supervision.
The table below reports results on AIME24, GPQA, and SWE, comparing MAS-Zero against the standard CoT method on the same underlying LLM:
| LLM / Method | AIME24 | GPQA | SWE | Avg |
|---|---|---|---|---|
| CoT (GPT-4o) | 8.33 | 45.78 | 9.17 | 23.26 |
| MAS-Zero (GPT-4o) | 33.33 | 50.60 | 25.83 | 35.81 |
| CoT (LLaMA3.3-70B) | 16.67 | 50.60 | 2.92 | 22.09 |
| MAS-Zero (LLaMA3.3-70B) | 37.50 | 52.41 | 16.74 | 31.67 |
| CoT (Qwen2.5-32B) | 12.50 | 50.00 | 45.26 | 35.92 |
| MAS-Zero (Qwen2.5-32B) | 29.17 | 51.81 | 48.95 | 43.31 |
When using GPT-4o, MAS-Zero increased AIME24 accuracy from 8.33% to 33.33%. The same trend holds for LLaMA3.3-70B (16.67% to 37.50%) and Qwen2.5-32B (12.50% to 29.17%), showing that substantial gains are achievable without relying on a pre-defined validation set.
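The per-benchmark gains can be read off the table with a few lines of arithmetic. The snippet below copies the scores from the table and subtracts the CoT baseline from the MAS-Zero score for each model; the names `scores` and `gains` are just illustrative.

```python
# Scores copied from the table above, in the order (AIME24, GPQA, SWE).
scores = {
    "GPT-4o": {"CoT": (8.33, 45.78, 9.17), "MAS-Zero": (33.33, 50.60, 25.83)},
    "LLaMA3.3-70B": {"CoT": (16.67, 50.60, 2.92), "MAS-Zero": (37.50, 52.41, 16.74)},
    "Qwen2.5-32B": {"CoT": (12.50, 50.00, 45.26), "MAS-Zero": (29.17, 51.81, 48.95)},
}
benchmarks = ("AIME24", "GPQA", "SWE")


def gains(model: str) -> dict:
    """Absolute improvement of MAS-Zero over CoT, in percentage points."""
    cot = scores[model]["CoT"]
    mas = scores[model]["MAS-Zero"]
    return {b: round(m - c, 2) for b, c, m in zip(benchmarks, cot, mas)}


for model in scores:
    print(model, gains(model))
```

The largest gains are on AIME24 for all three models, while the GPQA gains are the smallest.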
Installation and Quick Start
Environment Setup
conda create -n mas_zero python=3.12 && conda activate mas_zero
pip install anthropic openai backoff together datasets jinja2 -e human-eval
cd ./ && pip install -r requirements.txt
Running a Search Task
export OPENAI_API_KEY={your_key}
export TOGETHER_API_KEY={your_key}
python main_question.py \
--dataset workflow_search/aime24 \
--option plan \
--meta_model gpt-4o_chatgpt \
--node_model gpt-4o_chatgpt \
--verifier_model gpt-4o_chatgpt \
--blocks COT COT_SC Reflexion LLM_debate \
--use_oracle_verifier \
--defer_verifier \
--n_generation 5
The dataset parameter can be swapped for GPQA or SWE-Bench. The meta_model and node_model flags support various backends, including GPT and Claude.
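When switching datasets or backends often, it can help to assemble the command programmatically. The following is a minimal sketch, assuming the script name and flags are exactly those shown in the command above; `build_search_command` and `run_search` are hypothetical helpers, not part of the MAS-Zero repository, and the only dataset and model identifiers used are the ones that appear in this article.

```python
import shlex
import subprocess
from typing import List


def build_search_command(
    dataset: str = "workflow_search/aime24",
    model: str = "gpt-4o_chatgpt",
    n_generation: int = 5,
) -> List[str]:
    """Assemble the main_question.py invocation shown above as an argv list."""
    return [
        "python", "main_question.py",
        "--dataset", dataset,
        "--option", "plan",
        "--meta_model", model,
        "--node_model", model,
        "--verifier_model", model,
        "--blocks", "COT", "COT_SC", "Reflexion", "LLM_debate",
        "--use_oracle_verifier",
        "--defer_verifier",
        "--n_generation", str(n_generation),
    ]


def run_search(cmd: List[str], dry_run: bool = True) -> None:
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


cmd = build_search_command()
run_search(cmd)  # dry run: prints the command without launching it
```

Passing the arguments as a list avoids shell quoting problems, and the dry-run default makes it easy to inspect a command before spending API credits on it.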
Verifying Results
python main_judge.py \
--dataset aime24 \
--judge_method self \
--baseline workflow_search \
--model gpt-4o_chatgpt \
--min_sample 0 \
--max_sample 30 \
--max_response_per_sample 5
MAS-Zero eliminates the need for external feedback. It demonstrates that a meta-agent, supported by an iterative refinement process, can independently construct highly competent multi-agent systems.