Xiaomi MiMo-7B: Built From Scratch for Math and Code Reasoning

May 9 · Published in AI Models

Xiaomi built the MiMo-7B series from the ground up, specifically for complex reasoning tasks. According to Xiaomi's reported results, MiMo-7B-Base performs well above its weight class, surpassing many 32-billion-parameter models in reasoning potential. After reinforcement learning (RL) applied to a cold-start supervised fine-tuning (SFT) model, MiMo-7B-RL shows strong performance in mathematics and programming, matching OpenAI o1-mini on these tasks.

The MiMo-7B series consists of four models: the base model (MiMo-7B-Base), an SFT model (MiMo-7B-SFT), an RL model trained directly from the base (MiMo-7B-RL-Zero), and an RL model trained from the SFT checkpoint (MiMo-7B-RL).

Pre-training: A Base Model Built for Reasoning

Xiaomi refined its pre-training pipeline by optimizing data preprocessing, upgrading its text extraction tools, and applying multi-dimensional data filtering. The team also increased the density of reasoning patterns in the training data and used several methods to synthesize a large, diverse body of reasoning-specific data.

Pre-training followed a three-stage data mixing strategy. MiMo-7B-Base was trained on approximately 25 trillion tokens, with Multiple-Token Prediction (MTP) as an additional training objective, which both improves model performance and accelerates inference.
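To illustrate what an auxiliary MTP objective adds, the sketch below combines the usual next-token cross-entropy with a weighted loss from a single hypothetical MTP head that predicts the token two positions ahead. The one-extra-head layout, the toy logits, and the `mtp_weight` value are assumptions for illustration only; the article does not describe MiMo's exact loss formulation or weighting.

```python
import math
from typing import List, Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-likelihood of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def mtp_loss(
    main_logits: List[List[float]],
    mtp_logits: List[List[float]],
    tokens: List[int],
    mtp_weight: float = 0.3,  # hypothetical weighting, not MiMo's actual value
) -> float:
    """Next-token loss plus a weighted auxiliary loss for predicting token t+2.

    main_logits[t] is the main head's prediction for tokens[t + 1];
    mtp_logits[t] is the (assumed) MTP head's prediction for tokens[t + 2].
    """
    n = len(tokens)
    if n < 3:
        raise ValueError("need at least 3 tokens to form both targets")
    main_terms = [cross_entropy(main_logits[t], tokens[t + 1]) for t in range(n - 1)]
    mtp_terms = [cross_entropy(mtp_logits[t], tokens[t + 2]) for t in range(n - 2)]
    main = sum(main_terms) / len(main_terms)
    aux = sum(mtp_terms) / len(mtp_terms)
    return main + mtp_weight * aux
```

With uniform logits over a 4-token vocabulary, every cross-entropy term equals log 4, so the combined loss is (1 + 0.3) · log 4. At inference time the extra head is what makes speculative decoding possible, since it already produces a guess for the following token.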

Fine-tuning: A New Path to Reasoning Models

The team curated 130,000 math and coding problems as RL training data, all of which can be checked by rule-based verifiers. Each problem was cleaned and assessed for difficulty to maintain quality. To avoid reward hacking, the researchers used only rule-based accuracy rewards.
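To make the idea of a rule-based accuracy reward concrete, here is a minimal sketch for math problems: extract the final answer from a completion and award 1.0 only if it matches the reference after light normalization. The `\boxed{...}` answer convention and the normalization rules are assumptions; production verifiers handle many more answer formats (fractions, equivalent expressions, units).

```python
import re
from typing import Optional

# Assumed answer format: the final answer appears inside \boxed{...}
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in the completion, or None."""
    matches = BOXED.findall(completion)
    return matches[-1] if matches else None


def normalize(answer: str) -> str:
    # Drop all whitespace and a trailing period; a real verifier would do much more.
    return re.sub(r"\s+", "", answer).rstrip(".")


def accuracy_reward(completion: str, reference: str) -> float:
    """Binary rule-based reward: 1.0 if the extracted answer matches, else 0.0."""
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0
    return 1.0 if normalize(answer) == normalize(reference) else 0.0
```

Because the reward depends only on whether the final answer is correct, there is little for the policy to exploit other than getting the answer right, which is one reason rule-based rewards are less prone to reward hacking than learned reward models.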

To address sparse rewards on hard coding problems, they introduced a test-difficulty-driven code reward. Assigning fine-grained scores to test cases of varying difficulty provides a denser reward signal, which helps optimize the policy more effectively. For easy problems, a data resampling strategy improves rollout sampling efficiency and stabilizes policy updates, particularly in the later stages of RL training.
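One way to picture a test-difficulty-driven reward is as partial credit weighted by test difficulty. In the minimal sketch below, each test case carries a hypothetical difficulty weight and the reward is the weighted fraction of passed tests, contrasted with a sparse all-or-nothing baseline. This weighting scheme is an illustrative assumption, not the exact formula Xiaomi uses.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CaseResult:
    passed: bool
    difficulty: float  # hypothetical weight, e.g. higher for tests that fewer models pass


def difficulty_weighted_reward(results: List[CaseResult]) -> float:
    """Dense reward in [0, 1]: the difficulty-weighted fraction of passed tests."""
    total = sum(r.difficulty for r in results)
    if total == 0:
        return 0.0
    earned = sum(r.difficulty for r in results if r.passed)
    return earned / total


def all_or_nothing_reward(results: List[CaseResult]) -> float:
    """Sparse baseline: 1.0 only when every test passes."""
    return 1.0 if results and all(r.passed for r in results) else 0.0
```

For a solution that passes tests weighted 1.0 and 2.0 but fails one weighted 3.0, the all-or-nothing reward is 0 and gives the policy no signal, while the weighted reward is 0.5.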

RL Infrastructure

Xiaomi built a high-efficiency rollout engine, the Seamless Rollout Engine, to accelerate RL training and validation. It integrates continuous rollout, asynchronous reward computation, and early termination to minimize GPU idle time, yielding 2.29x faster training and 1.96x faster validation.
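The toy sketch below shows the general shape of two of these ideas using Python's `concurrent.futures`: each rollout is submitted for reward scoring as soon as it is produced, so scoring overlaps with generation, and a simplified form of early termination stops pulling rollouts once a target count is reached. The generator standing in for an inference engine, the `target` parameter, and the function names are hypothetical; this is not Xiaomi's engine.

```python
import concurrent.futures as cf
import itertools
from typing import Callable, Iterable, List, Optional, Tuple


def collect_scored_rollouts(
    rollouts: Iterable[str],
    reward_fn: Callable[[str], float],
    target: Optional[int] = None,
    max_workers: int = 4,
) -> List[Tuple[str, float]]:
    """Score rollouts concurrently while they are still being generated.

    `rollouts` stands in for an inference engine that yields completions one
    at a time; `reward_fn` stands in for a (possibly slow) verifier.
    """
    if target is not None:
        # Simplified early termination: stop consuming the generator once
        # `target` rollouts have been produced.
        rollouts = itertools.islice(rollouts, target)
    with cf.ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit each rollout for scoring the moment it is yielded, so reward
        # computation runs in parallel with the remaining generation.
        pending = [(text, pool.submit(reward_fn, text)) for text in rollouts]
        return [(text, fut.result()) for text, fut in pending]
```

A thread pool suits this sketch because the stand-in verifier is cheap; a real system that runs heavy reward computation (for example executing code tests) would more likely use separate processes or machines.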

The team also added MTP support to vLLM and improved the robustness of the inference engine within the RL system.

MiMo-7B Model Details

The MTP layers in MiMo-7B are fine-tuned during the pre-training and SFT phases but are frozen during RL. When using a single MTP layer for speculative decoding, the model achieves an acceptance rate of approximately 90%.
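A back-of-the-envelope calculation shows what a roughly 90% acceptance rate means. Under the simplifying assumption that each draft token is accepted independently with probability α, standard speculative decoding analysis gives (1 − α^(k+1)) / (1 − α) tokens per main-model forward pass for k draft tokens. With one MTP layer (k = 1) and α ≈ 0.9, that is about 1.9 tokens per pass. Treat this as an upper bound on the speedup, since it ignores the cost of running the MTP layer itself.

```python
def expected_tokens_per_step(acceptance_rate: float, num_draft_tokens: int) -> float:
    """Expected tokens emitted per verification pass of the main model.

    Assumes each draft token is accepted independently with probability
    `acceptance_rate` and drafting stops at the first rejection. The verifying
    pass always contributes one token (a correction or a bonus token).
    """
    a, k = acceptance_rate, num_draft_tokens
    if a == 1.0:
        return float(k + 1)
    return (1 - a ** (k + 1)) / (1 - a)


print(f"{expected_tokens_per_step(0.9, 1):.2f}")  # one MTP layer at ~90% acceptance
```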

The MiMo-7B series is available for download on HuggingFace and ModelScope:

| Model | Description | HuggingFace | ModelScope |
| --- | --- | --- | --- |
| MiMo-7B-Base | Base model with high reasoning potential | [🤗 XiaomiMiMo/MiMo-7B-Base](https://huggingface.co/XiaomiMiMo/MiMo-7B-Base) | [🤖️ XiaomiMiMo/MiMo-7B-Base](https://www.modelscope.cn/models/XiaomiMiMo/MiMo-7B-Base) |
| MiMo-7B-RL-Zero | RL model trained directly from base | [🤗 XiaomiMiMo/MiMo-7B-RL-Zero](https://huggingface.co/XiaomiMiMo/MiMo-7B-RL-Zero) | [🤖️ XiaomiMiMo/MiMo-7B-RL-Zero](https://www.modelscope.cn/models/XiaomiMiMo/MiMo-7B-RL-Zero) |
| MiMo-7B-SFT | SFT model trained from base | [🤗 XiaomiMiMo/MiMo-7B-SFT](https://huggingface.co/XiaomiMiMo/MiMo-7B-SFT) | [🤖️ XiaomiMiMo/MiMo-7B-SFT](https://www.modelscope.cn/models/XiaomiMiMo/MiMo-7B-SFT) |
| MiMo-7B-RL | RL model trained from SFT; rivals OpenAI o1-mini | [🤗 XiaomiMiMo/MiMo-7B-RL](https://huggingface.co/XiaomiMiMo/MiMo-7B-RL) | [🤖️ XiaomiMiMo/MiMo-7B-RL](https://www.modelscope.cn/models/XiaomiMiMo/MiMo-7B-RL) |

Evaluation Results

MiMo-7B-RL delivers strong results in math and code reasoning across several industry benchmarks:

| Benchmark | GPT-4o-0513 | Claude-3.5-Sonnet-1022 | OpenAI o1-mini | QwQ-32B-Preview | R1-Distill-Qwen-14B | R1-Distill-Qwen-7B | MiMo-7B-RL |
| --- | --- | --- | --- | --- | --- | --- | --- |
| **General** | | | | | | | |
| GPQA Diamond (Pass@1) | 49.9 | 65.0 | 60.0 | 54.5 | 59.1 | 49.1 | 54.4 |
| SuperGPQA (Pass@1) | 42.4 | 48.2 | 45.2 | 43.6 | 40.6 | 28.9 | 40.5 |
| DROP (3-shot F1) | 83.7 | 88.3 | 83.9 | 71.2 | 85.5 | 77.0 | 78.7 |
| MMLU-Pro (EM) | 72.6 | 78.0 | 80.3 | 52.0 | 68.8 | 53.5 | 58.6 |
| IF-Eval (Prompt Strict) | 84.3 | 86.5 | 84.8 | 40.4 | 78.3 | 60.5 | 61.0 |
| **Mathematics** | | | | | | | |
| MATH-500 (Pass@1) | 74.6 | 78.3 | 90.0 | 90.6 | 93.9 | 92.8 | 95.8 |
| AIME 2024 (Pass@1) | 9.3 | 16.0 | 63.6 | 50.0 | 69.7 | 55.5 | 68.2 |
| AIME 2025 (Pass@1) | 11.6 | 7.4 | 50.7 | 32.4 | 48.2 | 38.8 | 55.4 |
| **Code** | | | | | | | |
| LiveCodeBench v5 (Pass@1) | 32.9 | 38.9 | 53.8 | 41.9 | 53.1 | 37.6 | 57.8 |
| LiveCodeBench v6 (Pass@1) | 30.9 | 37.2 | 46.8 | 39.1 | 31.9 | 23.9 | 49.3 |

Internal comparison within the MiMo-7B series:

| Benchmark | MiMo-7B-Base | MiMo-7B-RL-Zero | MiMo-7B-SFT | MiMo-7B-RL |
| --- | --- | --- | --- | --- |
| **Mathematics** | | | | |
| MATH-500 (Pass@1) | 37.4 | 93.6 | 93.0 | 95.8 |
| AIME 2024 (Pass@1) | 32.9 | 56.4 | 58.7 | 68.2 |
| AIME 2025 (Pass@1) | 24.3 | 46.3 | 44.3 | 55.4 |
| **Code** | | | | |
| LiveCodeBench v5 (Pass@1) | 32.9 | 49.1 | 52.3 | 57.8 |
| LiveCodeBench v6 (Pass@1) | 29.1 | 42.9 | 45.5 | 49.3 |

Evaluations were conducted at a temperature of 0.6. For AIME24 and AIME25, scores are an average over 32 runs. LiveCodeBench v5 (Aug 2024 – Feb 2025), v6 (Feb 2025 – May 2025), GPQA-Diamond, and IF-Eval scores are averages over 8 runs. MATH500 and SuperGPQA results are based on a single run.
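For readers reproducing these numbers, an "average over N runs" Pass@1 is commonly computed as the per-run accuracy averaged across runs, which equals the mean correctness over all (run, problem) pairs when every run covers the same problems. The helper below is a hypothetical illustration of that bookkeeping, not Xiaomi's evaluation harness.

```python
from statistics import mean
from typing import List


def averaged_pass_at_1(results: List[List[bool]]) -> float:
    """Pass@1 averaged over repeated runs, as a percentage.

    results[r][p] is True if problem p was solved in run r, with one sampled
    answer per problem per run (for example at temperature 0.6).
    """
    per_run = [100.0 * sum(run) / len(run) for run in results]
    return mean(per_run)
```

For example, four runs over two problems that score 50%, 100%, 0%, and 50% average to a Pass@1 of 50.0. Averaging many sampled runs reduces the variance that a single temperature-0.6 run would have on small benchmarks such as AIME.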

Deploying MiMo-7B

SGLang Inference

SGLang supports MiMo on its main branch, thanks to contributions from the SGLang team; MTP support is expected in a future update.

Use the following script to install and launch the SGLang server:

# Install the latest SGLang from the main branch
python3 -m uv pip install "sglang[all] @ git+https://github.com/sgl-project/sglang.git/@main#egg=sglang&subdirectory=python"
# Launch the SGLang server
python3 -m sglang.launch_server --model-path XiaomiMiMo/MiMo-7B-RL --host 0.0.0.0 --trust-remote-code

Refer to the SGLang documentation for more information.
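Once the server is running, it can be queried over HTTP. The sketch below builds a chat request using only the standard library, assuming SGLang's default port (30000) and its OpenAI-compatible `/v1/chat/completions` route; adjust `HOST` and `PORT` if you pass different launch flags. The helper names are hypothetical, and `send` requires a live server.

```python
import json
import urllib.request

# Assumptions: SGLang's default port and its OpenAI-compatible chat route.
HOST = "http://localhost"
PORT = 30000


def build_chat_request(prompt: str, temperature: float = 0.6) -> urllib.request.Request:
    """Build (but do not send) a chat completion request for MiMo-7B-RL."""
    payload = {
        "model": "XiaomiMiMo/MiMo-7B-RL",
        "messages": [
            {"role": "system", "content": ""},  # empty system prompt, as Xiaomi recommends
            {"role": "user", "content": prompt},
        ],
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{HOST}:{PORT}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request to a running server and return the assistant's reply."""
    with urllib.request.urlopen(request) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server up, `print(send(build_chat_request("What is 2 + 2?")))` prints the model's reply. Keeping request construction separate from sending makes the payload easy to inspect before any network call.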

vLLM Inference

Xiaomi recommends its own fork of vLLM, based on vLLM 0.7.3, for MiMo inference with MTP.

Example implementation:

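from vllm import LLM, SamplingParams

model_path = "/path/to/MiMo"
llm = LLM(
    model=model_path,
    trust_remote_code=True,
    num_speculative_tokens=1,  # speculative decoding with MiMo's MTP layer (Xiaomi's vLLM fork)
    disable_log_stats=False
)
# vLLM's SamplingParams defaults to max_tokens=16, which truncates long answers;
# the limit below is an illustrative value, adjust it to your use case.
sampling_params = SamplingParams(temperature=0.6, max_tokens=8192)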

conversation = [
    {
        "role": "system",
        "content": ""
    },
    {
        "role": "user",
        "content": "Write an essay about the importance of higher education."
    },
]

outputs = llm.chat(conversation,
                   sampling_params=sampling_params,
                   use_tqdm=False)

for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

print("=" * 80)

You can also register a MiMo loader with vLLM that skips the MTP parameters. Copy registry/register_mimo_in_vllm.py from the MiMo repository into your working directory, then import it before creating the engine:

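import register_mimo_in_vllm  # registers a MiMo loader with vLLM that skips MTP weights
from vllm import LLM, SamplingParams

model_path = "/path/to/MiMo"
llm = LLM(
    model=model_path,
    trust_remote_code=True,
    # num_speculative_tokens=1,  # leave disabled: MTP parameters are not loaded
    disable_log_stats=False
)
sampling_params = SamplingParams(temperature=0.6, max_tokens=8192)  # max_tokens value is illustrative
# Generate with llm.chat(...) as in the previous example.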

HuggingFace Inference

Example script for standard inference:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "XiaomiMiMo/MiMo-7B-RL"
# trust_remote_code is required because the MiMo repository ships its own modeling code
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer(["Today is"], return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(output[0]))

Xiaomi recommends using its vLLM fork (based on vLLM 0.7.3) for best performance and leaving the system prompt empty.