---
license: mit
---

<div align="center">
  <picture>
    <source srcset="https://github.com/XiaomiMiMo/MiMo/raw/main/figures/Xiaomi_MiMo_darkmode.png?raw=true" media="(prefers-color-scheme: dark)">

<h3 align="center">
  <b>
    <span>─────────────────────────</span>
    <br/>
    Unlocking the Reasoning Potential of Language Model<br/>From Pretraining to Posttraining
    <br/>
    <span>─────────────────────────</span>
    <br/>
  </b>
</h3>

  <a href="https://huggingface.co/XiaomiMiMo" target="_blank">🤗 HuggingFace</a>
  <a href="https://www.modelscope.cn/organization/XiaomiMiMo" target="_blank">🤖️ ModelScope</a>
  <a href="https://github.com/XiaomiMiMo/MiMo/blob/main/MiMo-7B-Technical-Report.pdf" target="_blank">📔 Technical Report</a>
  <br/>

</p>

We open-source the MiMo-7B series, including checkpoints of the base model, the SFT model, the RL model trained from the base model, and the RL model trained from the SFT model.
We believe this report, along with the models, will provide valuable insights for developing powerful reasoning LLMs that benefit the larger community.

### 🌟 Highlights

- **Pre-Training: Base Model Born for Reasoning**
  - We optimize the data preprocessing pipeline, enhancing text extraction toolkits and applying multi-dimensional data filtering to increase reasoning pattern density in pre-training data. We also employ multiple strategies to generate massive diverse synthetic reasoning data.
  - We adopt a three-stage data mixture strategy for pre-training. Overall, MiMo-7B-Base is pre-trained on approximately 25 trillion tokens.
  - We incorporate Multiple-Token Prediction as an additional training objective, which enhances model performance and accelerates inference.

  - To mitigate the sparse reward issue for challenging code problems, we introduce a test difficulty driven code reward. By assigning fine-grained scores to test cases of varying difficulty levels, the policy can be optimized more effectively via a dense reward signal (see the illustrative sketch after this list).
  - We implement a data re-sampling strategy for easy problems to enhance rollout sampling efficiency and stabilize policy updates, particularly in the later phases of RL training.

- **RL Infrastructure**
  - We develop a Seamless Rollout Engine to accelerate RL training and validation. Our design integrates continuous rollout, asynchronous reward computation, and early termination to minimize GPU idle time, achieving $2.29\times$ faster training and $1.96\times$ faster validation.
  - We support MTP in vLLM and enhance the robustness of the inference engine in the RL system.
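
To make the test-difficulty-driven reward concrete, here is a minimal illustrative sketch (our own simplification, not the exact scoring rule from the technical report): each test case carries a difficulty weight, and the reward is the weighted fraction of passed tests, so a partially correct solution on a hard problem still receives a non-zero, dense signal.

```py
# Illustrative sketch of a difficulty-weighted code reward.
# The weights and difficulty levels below are hypothetical examples.

def difficulty_weighted_reward(test_results, difficulty_weights):
    """test_results: list of (passed: bool, difficulty: int) pairs.
    difficulty_weights: mapping from difficulty level to score weight."""
    total = sum(difficulty_weights[d] for _, d in test_results)
    earned = sum(difficulty_weights[d] for passed, d in test_results if passed)
    return earned / total if total > 0 else 0.0

# A solution that passes the easy and medium tests but fails the hard one
# still earns a partial reward instead of an all-or-nothing sparse signal.
reward = difficulty_weighted_reward(
    [(True, 1), (True, 2), (False, 3)],
    difficulty_weights={1: 1.0, 2: 2.0, 3: 4.0},
)
print(reward)  # 3.0 / 7.0 ≈ 0.43
```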

## II. Model Details

The MTP layers of MiMo-7B are tuned during pre-training and SFT and frozen during RL. With one MTP layer used for speculative decoding, the acceptance rate is about 90%.
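
As a rough back-of-the-envelope estimate (our own arithmetic, not a figure from the report): if the single MTP layer drafts one extra token per decoding step and that draft is accepted with probability $\alpha \approx 0.9$, the expected number of tokens emitted per forward pass of the main model is

$$
\mathbb{E}[\text{tokens per step}] = 1 + \alpha \approx 1.9,
$$

so decoding needs roughly half as many full forward passes, ignoring the comparatively small cost of the MTP layer and verification.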

<p align="center">
  <img width="80%" src="https://github.com/XiaomiMiMo/MiMo/raw/main/figures/architecture.png?raw=true">
</p>

> Models are available at [https://huggingface.co/XiaomiMiMo](https://huggingface.co/XiaomiMiMo) and [https://www.modelscope.cn/organization/XiaomiMiMo](https://www.modelscope.cn/organization/XiaomiMiMo)

| **Model** | **Description** | **Download (HuggingFace)** | **Download (ModelScope)** |
| :-------------: | :---------------------------------------------------------------------------: | :-------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------: |
| MiMo-7B-Base | Base model with extraordinary reasoning potential | [🤗 XiaomiMiMo/MiMo-7B-Base](https://huggingface.co/XiaomiMiMo/MiMo-7B-Base) | [🤖️ XiaomiMiMo/MiMo-7B-Base](https://www.modelscope.cn/models/XiaomiMiMo/MiMo-7B-Base) |
| MiMo-7B-RL-Zero | RL model trained from base model | [🤗 XiaomiMiMo/MiMo-7B-RL-Zero](https://huggingface.co/XiaomiMiMo/MiMo-7B-RL-Zero) | [🤖️ XiaomiMiMo/MiMo-7B-RL-Zero](https://www.modelscope.cn/models/XiaomiMiMo/MiMo-7B-RL-Zero) |
| MiMo-7B-SFT | SFT model trained from base model | [🤗 XiaomiMiMo/MiMo-7B-SFT](https://huggingface.co/XiaomiMiMo/MiMo-7B-SFT) | [🤖️ XiaomiMiMo/MiMo-7B-SFT](https://www.modelscope.cn/models/XiaomiMiMo/MiMo-7B-SFT) |
| MiMo-7B-RL | RL model trained from SFT model, superior performance matching OpenAI o1-mini | [🤗 XiaomiMiMo/MiMo-7B-RL](https://huggingface.co/XiaomiMiMo/MiMo-7B-RL) | [🤖️ XiaomiMiMo/MiMo-7B-RL](https://www.modelscope.cn/models/XiaomiMiMo/MiMo-7B-RL) |
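
If it helps, here is a minimal snippet for pulling one of the checkpoints listed above from the Hugging Face Hub; the use of `huggingface_hub` and the choice of MiMo-7B-RL are just illustrative, and any repo ID from the table works.

```py
# Download a MiMo checkpoint from the Hugging Face Hub.
# Requires: pip install huggingface_hub
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="XiaomiMiMo/MiMo-7B-RL")
print(f"Checkpoint downloaded to: {local_dir}")
```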

## III. Evaluation Results

| Benchmark | GPT-4o-0513 | Claude-3.5-Sonnet-1022 | OpenAI o1-mini | QwQ-32B-Preview | R1-Distill-Qwen-14B | R1-Distill-Qwen-7B | MiMo-7B-RL |
| ----------------------------- | :---------: | :--------------------: | :------------: | :-------------: | :-----------------: | :----------------: | :--------: |
| **General** | | | | | | | |
| GPQA Diamond<br/>(Pass@1) | 49.9 | 65.0 | 60.0 | 54.5 | 59.1 | 49.1 | 54.4 |
| SuperGPQA<br/>(Pass@1) | 42.4 | 48.2 | 45.2 | 43.6 | 40.6 | 28.9 | 40.5 |
| DROP<br/>(3-shot F1) | 83.7 | 88.3 | 83.9 | 71.2 | 85.5 | 77.0 | 78.7 |
| MMLU-Pro<br/>(EM) | 72.6 | 78.0 | 80.3 | 52.0 | 68.8 | 53.5 | 58.6 |
| IF-Eval<br/>(Prompt Strict) | 84.3 | 86.5 | 84.8 | 40.4 | 78.3 | 60.5 | 61.0 |
| **Mathematics** | | | | | | | |
| MATH-500<br/>(Pass@1) | 74.6 | 78.3 | 90.0 | 90.6 | 93.9 | 92.8 | 95.8 |
| AIME 2024<br/>(Pass@1) | 9.3 | 16.0 | 63.6 | 50.0 | 69.7 | 55.5 | 68.2 |
| AIME 2025<br/>(Pass@1) | 11.6 | 7.4 | 50.7 | 32.4 | 48.2 | 38.8 | 55.4 |
| **Code** | | | | | | | |
| LiveCodeBench v5<br/>(Pass@1) | 32.9 | 38.9 | 53.8 | 41.9 | 53.1 | 37.6 | 57.8 |
| LiveCodeBench v6<br/>(Pass@1) | 30.9 | 37.2 | 46.8 | 39.1 | 31.9 | 23.9 | 49.3 |
MiMo-7B series
|
104 |
|
105 |
+
| Benchmark | MiMo-7B-Base | MiMo-7B-RL-Zero | MiMo-7B-SFT | MiMo-7B-RL |
|
106 |
+
| ----------------------------- | :----------: | :-------------: | :---------: | :--------: |
|
107 |
+
| **Mathematics** | | | | |
|
108 |
+
| MATH500<br/>(Pass@1) | 37.4 | 93.6 | 93.0 | 95.8 |
|
109 |
+
| AIME 2024<br/>(Pass@1) | 32.9 | 56.4 | 58.7 | 68.2 |
|
110 |
+
| AIME 2025<br/>(Pass@1) | 24.3 | 46.3 | 44.3 | 55.4 |
|
111 |
+
| **Code** | | | | |
|
112 |
+
| LiveCodeBench v5<br/>(Pass@1) | 32.9 | 49.1 | 52.3 | 57.8 |
|
113 |
+
| LiveCodeBench v6<br/>(Pass@1) | 29.1 | 42.9 | 45.5 | 49.3 |
|
114 |
|
115 |
> [!IMPORTANT]
|
116 |
+
> The evaluations are conducted with `temperature=0.6`.
|
117 |
>
|
118 |
> AIME24 and AIME25 are with averaged score of 32 repetitions. LiveCodeBench v5 (20240801-20250201), LiveCodeBench v6 (20250201-20250501), GPQA-Diamond and IF-Eval are with averaged score of 8 repetitions. MATH500 and SuperGPQA are with a single run.
|
119 |
|
120 |
## IV. Deployment
|
121 |
|
122 |
+
### SGLang Inference
|
123 |
+
|
124 |
+
Thanks to the [contribution](https://github.com/sgl-project/sglang/pull/5921) from the SGLang team, we supported MiMo in SGLang mainstream within 24h with MTP coming soon.
|
125 |
+
|
126 |
+
Example Script
|
127 |
+
|
128 |
+
```bash
|
129 |
+
# Install the latest SGlang from main branch
|
130 |
+
python3 -m uv pip install "sglang[all] @ git+https://github.com/sgl-project/sglang.git/@main#egg=sglang&subdirectory=python"
|
131 |
+
|
132 |
+
# Launch SGLang Server
|
133 |
+
python3 -m sglang.launch_server --model-path XiaomiMiMo/MiMo-7B-RL --host 0.0.0.0 --trust-remote-code
|
134 |
+
```
|
135 |
+
|
136 |
+
Detailed usage can be found in [SGLang documents](https://docs.sglang.ai/backend/send_request.html). MTP will also be supported in 24h.
|
137 |
+
|
138 |
### vLLM inference
|
139 |
|
140 |
+
1. [Recommended] We officially support inference with MiMo-MTP using [our fork of vLLM](https://github.com/XiaomiMiMo/vllm/tree/feat_mimo_mtp_stable_073).
|
141 |
|
142 |
Example script
|
143 |
|
|
|
202 |
```py
|
203 |
from transformers import AutoModel, AutoModelForCausalLM, AutoTokenizer
|
204 |
|
205 |
+
model_id = "XiaomiMiMo/MiMo-7B-Base"
|
206 |
+
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
|
207 |
+
tokenizer = AutoTokenizer.from_pretrained(model_id)
|
208 |
inputs = tokenizer(["Today is"], return_tensors='pt')
|
209 |
output = model.generate(**inputs, max_new_tokens = 100)
|
210 |
print(tokenizer.decode(output.tolist()[0]))
|
|
|
212 |
|
213 |
### Recommended environment and prompts
|
214 |
|
215 |
+
- We recommend using [our fork of vLLM](https://github.com/XiaomiMiMo/vllm/tree/feat_mimo_mtp_stable_073) which is developed based on vLLM 0.7.3.
|
216 |
- We recommend using empty system prompt.
|
217 |
|
218 |
> We haven't verified MiMo with other inference engines and welcome contributions based on the model definition in the Huggingface repo π».
|
|
|
232 |
|
233 |
## VI. Contact
|
234 |
|
235 |
+
Please contact us at [mimo@xiaomi.com](mailto:mimo@xiaomi.com) or open an issue if you have any questions.
|