Xiaomi trained the MiMo-7B series from scratch, designing it specifically for complex reasoning tasks. In Xiaomi's experiments, MiMo-7B-Base shows reasoning potential well beyond its size, surpassing many 32-billion-parameter models. After reinforcement learning (RL) on a cold-started supervised fine-tuning (SFT) model, MiMo-7B-RL performs strongly on mathematics and code reasoning, matching the performance of OpenAI o1-mini.
The MiMo-7B series consists of the base model, an SFT model, an RL model trained directly from the base, and an RL checkpoint derived from SFT.
Pre-training: A Base Model Built for Reasoning
Xiaomi refined its pre-training pipeline by optimizing data preprocessing, upgrading text extraction tools, and filtering data across multiple dimensions. The team also embedded a higher density of reasoning patterns into the training set, utilizing several methods to generate a vast and diverse volume of reasoning-specific data.
Pre-training followed a three-stage data mixing strategy. MiMo-7B-Base was trained on approximately 25 trillion tokens, with Multiple-Token Prediction (MTP) as an additional training objective. MTP improves model performance and also speeds up inference.
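To make the idea of an additional MTP objective concrete, here is a minimal sketch of how such a loss could be combined with the usual next-token loss. It is an illustration under assumptions, not Xiaomi's training code: the function names, the single extra prediction head (predicting the token two positions ahead), and the `mtp_weight` value are all hypothetical.

```python
import math


def cross_entropy(probs, target):
    """Negative log-likelihood of `target` under one probability distribution."""
    return -math.log(probs[target])


def mtp_training_loss(next_probs, mtp_probs, tokens, mtp_weight=0.3):
    """Combine the main next-token loss with an auxiliary MTP loss.

    next_probs[t]: distribution over the vocabulary for token t+1 (main head)
    mtp_probs[t]:  distribution over the vocabulary for token t+2 (MTP head)
    mtp_weight:    hypothetical weighting factor, not taken from the source
    """
    n = len(tokens)
    if n < 3:
        raise ValueError("need at least 3 tokens to form an MTP target")
    # Main objective: position t predicts token t+1.
    main_terms = [cross_entropy(next_probs[t], tokens[t + 1]) for t in range(n - 1)]
    # Auxiliary objective: position t also predicts token t+2.
    mtp_terms = [cross_entropy(mtp_probs[t], tokens[t + 2]) for t in range(n - 2)]
    main_loss = sum(main_terms) / len(main_terms)
    mtp_loss = sum(mtp_terms) / len(mtp_terms)
    return main_loss + mtp_weight * mtp_loss
```

The same extra head that supplies this training signal can later draft tokens at inference time, which is where the speed benefit comes from.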
Fine-tuning: A New Path to Reasoning Models
The team curated 130,000 math and coding problems, all checkable by rule-based verifiers, to serve as RL training data. Each problem was cleaned and assessed for difficulty to maintain quality. To prevent "reward hacking," the researchers relied strictly on rule-based accuracy rewards.
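A rule-based accuracy reward for math problems can be sketched as follows. This is only an illustration: the convention that the final answer appears in `\boxed{...}`, the normalization rules, and the function names are assumptions, not details from Xiaomi's verifier.

```python
import re
from fractions import Fraction


def extract_final_answer(response):
    """Return the content of the last \\boxed{...} in a response, or None."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", response)
    return matches[-1].strip() if matches else None


def normalize(answer):
    """Map numeric strings to exact fractions; keep anything else as stripped text."""
    text = answer.replace(",", "").strip()
    try:
        return Fraction(text)
    except (ValueError, ZeroDivisionError):
        return text


def accuracy_reward(response, reference):
    """Return 1.0 if the extracted answer matches the reference, else 0.0."""
    predicted = extract_final_answer(response)
    if predicted is None:
        return 0.0  # no parseable answer earns nothing
    return 1.0 if normalize(predicted) == normalize(reference) else 0.0
```

Because the reward depends only on a deterministic check against a reference, the policy has no learned reward model to exploit, which is the point of avoiding reward hacking.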
To handle complex coding problems where rewards are often sparse, they implemented a test-difficulty-driven reward mechanism. This system scores individual test cases based on difficulty, providing dense reward signals that help optimize the policy more effectively. For simpler problems, data resampling was used to boost sampling efficiency and stabilize policy updates during the later stages of RL.
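The difference between a sparse pass/fail reward and a difficulty-aware one can be sketched like this. The weighting below (one minus a test's historical pass rate) is a hypothetical choice made for illustration; the source does not specify how difficulty is turned into a score.

```python
def difficulty_weighted_reward(passed, pass_rates):
    """Dense reward in [0, 1] that weights each test case by its difficulty.

    passed:     list of booleans, one per test case, for the current solution
    pass_rates: fraction of earlier rollouts that passed each test case
                (hypothetical difficulty signal; a lower rate means a harder test)
    """
    if len(passed) != len(pass_rates):
        raise ValueError("passed and pass_rates must have the same length")
    if not passed:
        return 0.0
    weights = [1.0 - rate for rate in pass_rates]  # harder tests are worth more
    total = sum(weights)
    if total == 0:
        # Every test is trivially easy: fall back to the plain pass fraction.
        return sum(passed) / len(passed)
    earned = sum(w for w, ok in zip(weights, passed) if ok)
    return earned / total
```

A solution that passes two of three tests would score 0 under an all-or-nothing reward, whereas here it earns partial credit proportional to how hard the passed tests are.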
RL Infrastructure
Xiaomi developed a high-efficiency rollout engine to accelerate RL training and validation. The engine integrates continuous rollout, asynchronous reward computation, and early termination to minimize GPU idle time. As a result, training speed increased by 2.29x, and validation speed improved by 1.96x.
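Of the three techniques named above, asynchronous reward computation is the easiest to illustrate in isolation. The sketch below is not Xiaomi's engine: `generate_rollouts` and `compute_reward` are hypothetical stand-ins, and a thread pool substitutes for whatever scheduler the real system uses.

```python
from concurrent.futures import ThreadPoolExecutor


def generate_rollouts(prompts):
    """Stand-in for the rollout engine: yields one (prompt, response) at a time."""
    for prompt in prompts:
        yield prompt, f"response to {prompt}"


def compute_reward(response):
    """Stand-in for a slow reward job, such as running unit tests on generated code."""
    return float(len(response) % 2)


def rollout_with_async_rewards(prompts, max_workers=4):
    """Overlap generation and reward computation instead of alternating them."""
    samples, futures = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for prompt, response in generate_rollouts(prompts):
            samples.append((prompt, response))
            # Submit and move on: generation continues while rewards are computed.
            futures.append(pool.submit(compute_reward, response))
        rewards = [future.result() for future in futures]
    return [(p, r, reward) for (p, r), reward in zip(samples, rewards)]
```

The design goal is the same as in the text: the generator never waits for a reward to finish, so the expensive resource stays busy.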
Furthermore, the team added MTP support to vLLM and improved the robustness of the inference engine within the RL system.
The MTP layers in MiMo-7B are trained during the pre-training and SFT phases and frozen during RL. When a single MTP layer is used for speculative decoding, the acceptance rate is approximately 90%.
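As a rough guide to what a 90% acceptance rate means, the following sketch estimates how many tokens each forward pass of the main model yields. It assumes every draft token is accepted independently with the same probability and ignores the cost of running the MTP layer, so it is an idealized upper bound rather than a measured speedup.

```python
def expected_tokens_per_step(acceptance_rate, num_draft_tokens=1):
    """Expected tokens emitted per forward pass of the main model.

    Assumes each draft token is accepted independently with the same
    probability and that drafting stops at the first rejection.
    """
    a, k = acceptance_rate, num_draft_tokens
    # The main model always contributes one token (the a**0 term);
    # draft token i only counts if all i drafts up to it were accepted.
    return sum(a ** i for i in range(k + 1))
```

With one draft token and an acceptance rate of 0.9, this gives about 1.9 tokens per step.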
The MiMo-7B series is available for download on HuggingFace and ModelScope:
| Model | Description | HuggingFace Link | ModelScope Link |
|---|---|---|---|
| MiMo-7B-Base | Base model with high reasoning potential | [🤗 XiaomiMiMo/MiMo-7B-Base](https://huggingface.co/XiaomiMiMo/MiMo-7B-Base) | [🤖️ XiaomiMiMo/MiMo-7B-Base](https://www.modelscope.cn/models/XiaomiMiMo/MiMo-7B-Base) |
| MiMo-7B-RL-Zero | RL model trained directly from base | [🤗 XiaomiMiMo/MiMo-7B-RL-Zero](https://huggingface.co/XiaomiMiMo/MiMo-7B-RL-Zero) | [🤖️ XiaomiMiMo/MiMo-7B-RL-Zero](https://www.modelscope.cn/models/XiaomiMiMo/MiMo-7B-RL-Zero) |
| MiMo-7B-SFT | SFT model trained from base | [🤗 XiaomiMiMo/MiMo-7B-SFT](https://huggingface.co/XiaomiMiMo/MiMo-7B-SFT) | [🤖️ XiaomiMiMo/MiMo-7B-SFT](https://www.modelscope.cn/models/XiaomiMiMo/MiMo-7B-SFT) |
| MiMo-7B-RL | RL model from SFT; rivals OpenAI o1-mini | [🤗 XiaomiMiMo/MiMo-7B-RL](https://huggingface.co/XiaomiMiMo/MiMo-7B-RL) | [🤖️ XiaomiMiMo/MiMo-7B-RL](https://www.modelscope.cn/models/XiaomiMiMo/MiMo-7B-RL) |
MiMo-7B-RL delivers strong results in math and code reasoning across several industry benchmarks:
| Benchmark | GPT-4o-0513 | Claude-3.5-Sonnet-1022 | OpenAI o1-mini | QwQ-32B-Preview | R1-Distill-Qwen-14B | R1-Distill-Qwen-7B | MiMo-7B-RL |
|---|---|---|---|---|---|---|---|
| General | | | | | | | |
| GPQA Diamond (Pass@1) | 49.9 | 65.0 | 60.0 | 54.5 | 59.1 | 49.1 | 54.4 |
| SuperGPQA (Pass@1) | 42.4 | 48.2 | 45.2 | 43.6 | 40.6 | 28.9 | 40.5 |
| DROP (3-shot F1) | 83.7 | 88.3 | 83.9 | 71.2 | 85.5 | 77.0 | 78.7 |
| MMLU-Pro (EM) | 72.6 | 78.0 | 80.3 | 52.0 | 68.8 | 53.5 | 58.6 |
| IF-Eval (Prompt Strict) | 84.3 | 86.5 | 84.8 | 40.4 | 78.3 | 60.5 | 61.0 |
| Mathematics | | | | | | | |
| MATH-500 (Pass@1) | 74.6 | 78.3 | 90.0 | 90.6 | 93.9 | 92.8 | 95.8 |
| AIME 2024 (Pass@1) | 9.3 | 16.0 | 63.6 | 50.0 | 69.7 | 55.5 | 68.2 |
| AIME 2025 (Pass@1) | 11.6 | 7.4 | 50.7 | 32.4 | 48.2 | 38.8 | 55.4 |
| Code | | | | | | | |
| LiveCodeBench v5 (Pass@1) | 32.9 | 38.9 | 53.8 | 41.9 | 53.1 | 37.6 | 57.8 |
| LiveCodeBench v6 (Pass@1) | 30.9 | 37.2 | 46.8 | 39.1 | 31.9 | 23.9 | 49.3 |
Internal comparison within the MiMo-7B series:
| Benchmark | MiMo-7B-Base | MiMo-7B-RL-Zero | MiMo-7B-SFT | MiMo-7B-RL |
|---|---|---|---|---|
| Mathematics | | | | |
| MATH-500 (Pass@1) | 37.4 | 93.6 | 93.0 | 95.8 |
| AIME 2024 (Pass@1) | 32.9 | 56.4 | 58.7 | 68.2 |
| AIME 2025 (Pass@1) | 24.3 | 46.3 | 44.3 | 55.4 |
| Code | | | | |
| LiveCodeBench v5 (Pass@1) | 32.9 | 49.1 | 52.3 | 57.8 |
| LiveCodeBench v6 (Pass@1) | 29.1 | 42.9 | 45.5 | 49.3 |
Evaluations were conducted at a temperature of 0.6. AIME 2024 and AIME 2025 scores are averaged over 32 runs. LiveCodeBench v5 (Aug 2024 – Feb 2025), LiveCodeBench v6 (Feb 2025 – May 2025), GPQA Diamond, and IF-Eval scores are averaged over 8 runs. MATH-500 and SuperGPQA results come from a single run.
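Averaging over repeated runs matters most for small benchmarks such as AIME, where a single problem shifts the score by several points. A minimal sketch of the averaging, assuming each run is recorded as a list of per-problem pass/fail outcomes (the data layout and function name are hypothetical):

```python
from statistics import mean


def averaged_pass_at_1(run_results):
    """Average Pass@1 over repeated sampling runs, as a percentage.

    run_results: list of runs, each a list of booleans (one per problem).
    """
    per_run = [100.0 * sum(run) / len(run) for run in run_results]
    return mean(per_run)
```

For example, two runs over the same two problems that score 50% and 100% average to 75%.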
SGLang Inference
SGLang supports MiMo thanks to a contribution from the SGLang team; MTP support is expected in a future update.
Use the following script to install and launch the SGLang server:
# Install the latest SGLang from the main branch
python3 -m uv pip install "sglang[all] @ git+https://github.com/sgl-project/sglang.git/@main#egg=sglang&subdirectory=python"
# Launch the SGLang server
python3 -m sglang.launch_server --model-path XiaomiMiMo/MiMo-7B-RL --host 0.0.0.0 --trust-remote-code
Refer to the SGLang documentation for more information.
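Once the server is running, it can be queried through SGLang's OpenAI-compatible HTTP API. The sketch below assumes the server listens on SGLang's default port 30000 on the local machine and that the model is addressed by its path; the helper names are hypothetical. The temperature and empty system prompt mirror the settings used elsewhere in this article.

```python
import json
import urllib.request


def build_chat_request(prompt, model="XiaomiMiMo/MiMo-7B-RL", temperature=0.6):
    """Build an OpenAI-style chat payload with an empty system prompt."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": ""},
            {"role": "user", "content": prompt},
        ],
        "temperature": temperature,
    }


def chat(prompt, base_url="http://localhost:30000"):
    """Send one chat request to a running SGLang server and return the reply text."""
    body = json.dumps(build_chat_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    return reply["choices"][0]["message"]["content"]
```

Calling `chat("What is 17 * 23?")` returns the generated text, provided the server from the launch command above is reachable at `base_url`.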
vLLM Inference
Xiaomi recommends using its fork of vLLM, which is based on vLLM 0.7.3, for MiMo MTP inference.
Example implementation:
from vllm import LLM, SamplingParams

model_path = "/path/to/MiMo"

# num_speculative_tokens=1 drafts one token per step (speculative decoding with MTP)
llm = LLM(
    model=model_path,
    trust_remote_code=True,
    num_speculative_tokens=1,
    disable_log_stats=False
)
sampling_params = SamplingParams(temperature=0.6)

# The system prompt is left empty, as recommended
conversation = [
    {
        "role": "system",
        "content": ""
    },
    {
        "role": "user",
        "content": "Write an essay about the importance of higher education."
    },
]

outputs = llm.chat(conversation,
                   sampling_params=sampling_params,
                   use_tqdm=False)

# Print the prompt and the first completion for each request
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

print("=" * 80)
You can also register MiMo with vLLM without loading MTP parameters. Copy registry/register_mimo_in_vllm.py to your local directory, then import it:
import register_mimo_in_vllm

from vllm import LLM, SamplingParams

model_path = "/path/to/MiMo"
llm = LLM(
    model=model_path,
    trust_remote_code=True,
    # num_speculative_tokens=1,
    disable_log_stats=False
)
sampling_params = SamplingParams(temperature=0.6)
HuggingFace Inference
Example script for standard inference:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "XiaomiMiMo/MiMo-7B-RL"

# trust_remote_code=True is required to load the custom MiMo model code
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Tokenize a short prompt and generate up to 100 new tokens
inputs = tokenizer(["Today is"], return_tensors='pt')
output = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(output.tolist()[0]))
Xiaomi recommends using its fork of vLLM, which is based on vLLM 0.7.3, and leaving the system prompt empty.