MiMo-Admin committed
Commit 9c071e0 · verified · 1 Parent(s): 8c8a496

Update README.md

Files changed (1)
  1. README.md +69 -47
README.md CHANGED
@@ -1,7 +1,6 @@
 ---
 license: mit
 ---
-
 <div align="center">
 <picture>
 <source srcset="https://github.com/XiaomiMiMo/MiMo/raw/main/figures/Xiaomi_MiMo_darkmode.png?raw=true" media="(prefers-color-scheme: dark)">
@@ -11,11 +10,11 @@ license: mit
 
 <h3 align="center">
 <b>
- <span>━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━</span>
 <br/>
 Unlocking the Reasoning Potential of Language Model<br/>From Pretraining to Posttraining
 <br/>
- <span>━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━</span>
 <br/>
 </b>
 </h3>
@@ -26,6 +25,8 @@ license: mit
 |
 <a href="https://huggingface.co/XiaomiMiMo" target="_blank">🤗 HuggingFace</a>
 &nbsp;|
 <a href="https://github.com/XiaomiMiMo/MiMo/blob/main/MiMo-7B-Technical-Report.pdf" target="_blank">📔 Technical Report</a>
 &nbsp;|
 <br/>
@@ -46,12 +47,12 @@ In this work, we present MiMo-7B, a series of models trained from scratch and bo
 </p>
 
 We open-source MiMo-7B series, including checkpoints of the base model, SFT model, RL model trained from base model, and RL model trained from the SFT model.
- We believe this report along with the models will provides valuable insights to develop powerful reasoning LLM that benefit the larger community.
 
 ### 🌟 Highlights
 
 - **Pre-Training: Base Model Born for Reasoning**
-   - We optimize data preprocessing pipeline, enhancing text extraction toolkits and applying multi-dimensional data filtering to increase reasoning pattern density in pre-training data. We also employ multiple strategies to generate massive diverse synthetic reasoning data.
   - We adopt a three-stage data mixture strategy for pre-training. Overall, MiMo-7B-Base is pre-trained on approximately 25 trillion tokens.
   - We incorporate Multiple-Token Prediction as an additional training objective, which enhances model performance and accelerates inference.
@@ -60,62 +61,83 @@ We believe this report along with the models will provides valuable insights to
   - To mitigate the sparse reward issue for challenging code problems, we introduce a test difficulty driven code reward. By assigning fine-grained scores for test cases with varying difficulty levels, the policy can be more effectively optimized via dense reward signal.
   - We implement a data re-sampling strategy for easy problems to enhance rollout sampling efficiency and stabilize policy updates, particularly in the later phases of RL training.
 
- - **RL Infrastructures**
-   - We develop a Seamless Rollout Engine to accelerate RL training and validation. Our design integrates continuous rollout, asynchronous reward computation, and early termination to minimize GPU idle time, achieving 2.29 \\(\times\\) faster training and 1.96 \\(\times\\) faster validation.
-   - We support MTP in vLLM and enhance the robustness of the inference engine in RL system.
-
 
 ## II. Model Details
 
- > Models are avaliable at [https://huggingface.co/XiaomiMiMo](https://huggingface.co/XiaomiMiMo)
 
- | **Model** | **Description** | **Download** |
- | :-------------: | :---------------------------------------------------------------------------: | :-------------------------------------------------------------------------------: |
- | MiMo-7B-Base | Base model with extraordinary reasoning potential | [🤗 XiaomiMiMo/MiMo-7B-Base](https://huggingface.co/XiaomiMiMo/MiMo-7B-Base) |
- | MiMo-7B-RL-Zero | RL model trained from base model | [🤗 XiaomiMiMo/MiMo-7B-RL-Zero](https://huggingface.co/XiaomiMiMo/MiMo-7B-RL-Zero) |
- | MiMo-7B-SFT | SFT model trained from base model | [🤗 XiaomiMiMo/MiMo-7B-SFT](https://huggingface.co/XiaomiMiMo/MiMo-7B-SFT) |
- | **MiMo-7B-RL** | RL model trained from SFT model, superior performance matching OpenAI o1-mini | [🤗 XiaomiMiMo/MiMo-7B-RL](https://huggingface.co/XiaomiMiMo/MiMo-7B-RL) |
 
  ## III. Evaluation Results
 
- | Benchmark | GPT-4o-0513 | Claude-3.5-Sonnet-1022 | OpenAI o1-mini | QwQ-32B-Preview | R1-Distill-Qwen-14B | R1-Distill-Qwen-7B | **MiMo-7B-RL** |
- | ----------------------------- | :---------: | :--------------------: | :------------: | :-------------: | :-----------------: | :----------------: | :------------: |
- | **General** | | | | | | | |
- | GPQA Diamond<br/>(Pass@1) | 49.9 | 65.0 | 60.0 | 54.5 | 59.1 | 49.1 | 54.4 |
- | SuperGPQA<br/>(Pass@1) | 42.4 | 48.2 | 45.2 | 43.6 | 40.6 | 28.9 | 40.5 |
- | DROP<br/>(3-shot F1) | 83.7 | 88.3 | 83.9 | 71.2 | 85.5 | 77.0 | 78.7 |
- | MMLU-Pro<br/>(EM) | 72.6 | 78.0 | 80.3 | 52.0 | 68.8 | 53.5 | 58.6 |
- | IF-Eval<br/>(Prompt Strict) | 84.3 | 86.5 | 84.8 | 40.4 | 78.3 | 60.5 | 61.0 |
- | **Mathematics** | | | | | | | |
- | MATH-500<br/>(Pass@1) | 74.6 | 78.3 | 90.0 | 90.6 | 93.9 | 92.8 | 95.8 |
- | AIME 2024<br/>(Pass@1) | 9.3 | 16.0 | 63.6 | 50.0 | 69.7 | 55.5 | 68.2 |
- | AIME 2025<br/>(Pass@1) | 11.6 | 7.4 | 50.7 | 32.4 | 48.2 | 38.8 | 55.4 |
- | **Code** | | | | | | | |
- | LiveCodeBench v5<br/>(Pass@1) | 32.9 | 38.9 | 53.8 | 41.9 | 53.1 | 37.6 | 57.8 |
- | LiveCodeBench v6<br/>(Pass@1) | 30.9 | 37.2 | 46.8 | 39.1 | 31.9 | 23.9 | 49.3 |
 
 MiMo-7B series
 
- | Benchmark | MiMo-7B-Base | MiMo-7B-RL-Zero | MiMo-7B-SFT | **MiMo-7B-RL** |
- | ----------------------------- | :----------: | :-------------: | :---------: | :------------: |
- | **Mathematics** | | | | |
- | MATH500<br/>(Pass@1) | 37.4 | 93.6 | 93.0 | 95.8 |
- | AIME 2024<br/>(Pass@1) | 32.9 | 56.4 | 58.7 | 68.2 |
- | AIME 2025<br/>(Pass@1) | 24.3 | 46.3 | 44.3 | 55.4 |
- | **Code** | | | | |
- | LiveCodeBench v5<br/>(Pass@1) | 32.9 | 49.1 | 52.3 | 57.8 |
- | LiveCodeBench v6<br/>(Pass@1) | 29.1 | 42.9 | 45.5 | 49.3 |
 
 > [!IMPORTANT]
- > The evaluation are conducted with `temperature=0.6`.
 >
 > AIME24 and AIME25 are with averaged score of 32 repetitions. LiveCodeBench v5 (20240801-20250201), LiveCodeBench v6 (20250201-20250501), GPQA-Diamond and IF-Eval are with averaged score of 8 repetitions. MATH500 and SuperGPQA are with a single run.
 
  ## IV. Deployment
 
 ### vLLM inference
 
- 1. [Recommended] We official support inference with MiMo-MTP using [our fork of vLLM](https://github.com/XiaomiMiMo/vllm/tree/feat_mimo_mtp).
 
 Example script
 
@@ -180,9 +202,9 @@ Example script
 ```py
 from transformers import AutoModel, AutoModelForCausalLM, AutoTokenizer
 
- model_path = "/path/to/MiMo"
- model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
- tokenizer = AutoTokenizer.from_pretrained(model_path)
 inputs = tokenizer(["Today is"], return_tensors='pt')
 output = model.generate(**inputs, max_new_tokens = 100)
 print(tokenizer.decode(output.tolist()[0]))
@@ -190,7 +212,7 @@ print(tokenizer.decode(output.tolist()[0]))
 
 ### Recommended environment and prompts
 
- - We recommend using [our fork of vLLM](https://github.com/XiaomiMiMo/vllm/tree/feat_mimo_mtp) which is developed based on vLLM 0.7.3.
 - We recommend using empty system prompt.
 
 > We haven't verified MiMo with other inference engines and welcome contributions based on the model definition in the Huggingface repo 💻.
@@ -210,4 +232,4 @@ print(tokenizer.decode(output.tolist()[0]))
 
 ## VI. Contact
 
- Please contact us at [mimo@xiaomi.com](mailto:mimo@xiaomi.com) or open an issue if you have any questions.
 
 ---
 license: mit
 ---
 <div align="center">
 <picture>
 <source srcset="https://github.com/XiaomiMiMo/MiMo/raw/main/figures/Xiaomi_MiMo_darkmode.png?raw=true" media="(prefers-color-scheme: dark)">
 
 
 <h3 align="center">
 <b>
+ <span>━━━━━━━━━━━━━━━━━━━━━━━━━</span>
 <br/>
 Unlocking the Reasoning Potential of Language Model<br/>From Pretraining to Posttraining
 <br/>
+ <span>━━━━━━━━━━━━━━━━━━━━━━━━━</span>
 <br/>
 </b>
 </h3>
 
 |
 <a href="https://huggingface.co/XiaomiMiMo" target="_blank">🤗 HuggingFace</a>
 &nbsp;|
+ <a href="https://www.modelscope.cn/organization/XiaomiMiMo" target="_blank">🤖️ ModelScope</a>
+ &nbsp;|
 <a href="https://github.com/XiaomiMiMo/MiMo/blob/main/MiMo-7B-Technical-Report.pdf" target="_blank">📔 Technical Report</a>
 &nbsp;|
 <br/>
 
 </p>
 
 We open-source MiMo-7B series, including checkpoints of the base model, SFT model, RL model trained from base model, and RL model trained from the SFT model.
+ We believe this report along with the models will provide valuable insights to develop powerful reasoning LLMs that benefit the larger community.
 
 ### 🌟 Highlights
 
 - **Pre-Training: Base Model Born for Reasoning**
+   - We optimize the data preprocessing pipeline, enhancing text extraction toolkits and applying multi-dimensional data filtering to increase reasoning pattern density in pre-training data. We also employ multiple strategies to generate massive diverse synthetic reasoning data.
   - We adopt a three-stage data mixture strategy for pre-training. Overall, MiMo-7B-Base is pre-trained on approximately 25 trillion tokens.
   - We incorporate Multiple-Token Prediction as an additional training objective, which enhances model performance and accelerates inference.
 
 
   - To mitigate the sparse reward issue for challenging code problems, we introduce a test difficulty driven code reward. By assigning fine-grained scores for test cases with varying difficulty levels, the policy can be more effectively optimized via a dense reward signal (a toy illustration follows this list).
   - We implement a data re-sampling strategy for easy problems to enhance rollout sampling efficiency and stabilize policy updates, particularly in the later phases of RL training.
 
+ - **RL Infrastructure**
+   - We develop a Seamless Rollout Engine to accelerate RL training and validation. Our design integrates continuous rollout, asynchronous reward computation, and early termination to minimize GPU idle time, achieving $2.29\times$ faster training and $1.96\times$ faster validation.
+   - We support MTP in vLLM and enhance the robustness of the inference engine in the RL system.
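
The test-difficulty-driven reward above can be pictured with a small toy function; the weighting scheme below is purely illustrative and not the exact formulation used for MiMo:

```py
def difficulty_weighted_reward(passed, difficulty):
    """Toy dense reward: weight each passed test case by its difficulty.

    `passed` is a list of booleans (one per test case) and `difficulty` a list
    of positive weights; both are hypothetical inputs used only to illustrate
    the idea of fine-grained, difficulty-aware scoring.
    """
    total = sum(difficulty)
    earned = sum(d for ok, d in zip(passed, difficulty) if ok)
    return earned / total if total else 0.0

# A hard problem where only the easier tests pass still receives a graded,
# nonzero reward instead of an all-or-nothing 0/1 score.
print(difficulty_weighted_reward([True, True, False], [1.0, 2.0, 5.0]))  # 0.375
```

The point is that partially correct solutions on hard problems still produce learning signal, which is what makes the reward dense.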
 
  ## II. Model Details
 
+ The MTP layers of MiMo-7B are tuned during pretraining and SFT and frozen during RL. With one MTP layer used for speculative decoding, the acceptance rate is about 90%.
+
+ <p align="center">
+ <img width="80%" src="https://github.com/XiaomiMiMo/MiMo/raw/main/figures/architecture.png?raw=true">
+ </p>
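
To make the speculative-decoding claim concrete, here is a conceptual, greedy-verification sketch of how a single draft token from an MTP head can yield up to two tokens per main-model forward pass. The stub models below are random toys, not MiMo's actual implementation:

```py
import numpy as np

VOCAB = 32  # toy vocabulary size

def mtp_head(prefix):
    """Stub MTP draft head: returns next-token logits (random toy)."""
    rng = np.random.default_rng(len(prefix))
    return rng.normal(size=VOCAB)

def main_model(tokens):
    """Stub main model: returns next-token logits for every position (random toy)."""
    rng = np.random.default_rng(len(tokens) + 1)
    return rng.normal(size=(len(tokens), VOCAB))

def speculative_step(prefix):
    draft = int(np.argmax(mtp_head(prefix)))   # cheap draft token from the MTP layer
    logits = main_model(prefix + [draft])      # one main-model pass over prefix + draft
    verified = int(np.argmax(logits[-2]))      # what the main model wants at the draft's slot
    bonus = int(np.argmax(logits[-1]))         # token after the draft, usable only if accepted
    if draft == verified:
        return [draft, bonus]                  # accepted: two tokens from one pass
    return [verified]                          # rejected: keep the main model's token

print(speculative_step([1, 2, 3]))
```

With an acceptance rate around 90%, most steps emit two tokens for the cost of a single full forward pass, which is where the inference speedup comes from.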
 
+ > Models are available at [https://huggingface.co/XiaomiMiMo](https://huggingface.co/XiaomiMiMo) and [https://www.modelscope.cn/organization/XiaomiMiMo](https://www.modelscope.cn/organization/XiaomiMiMo)
+
+ | **Model** | **Description** | **Download (HuggingFace)** | **Download (ModelScope)** |
+ | :-------------: | :---------------------------------------------------------------------------: | :-------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------: |
+ | MiMo-7B-Base | Base model with extraordinary reasoning potential | [🤗 XiaomiMiMo/MiMo-7B-Base](https://huggingface.co/XiaomiMiMo/MiMo-7B-Base) | [🤖️ XiaomiMiMo/MiMo-7B-Base](https://www.modelscope.cn/models/XiaomiMiMo/MiMo-7B-Base) |
+ | MiMo-7B-RL-Zero | RL model trained from base model | [🤗 XiaomiMiMo/MiMo-7B-RL-Zero](https://huggingface.co/XiaomiMiMo/MiMo-7B-RL-Zero) | [🤖️ XiaomiMiMo/MiMo-7B-RL-Zero](https://www.modelscope.cn/models/XiaomiMiMo/MiMo-7B-RL-Zero) |
+ | MiMo-7B-SFT | SFT model trained from base model | [🤗 XiaomiMiMo/MiMo-7B-SFT](https://huggingface.co/XiaomiMiMo/MiMo-7B-SFT) | [🤖️ XiaomiMiMo/MiMo-7B-SFT](https://www.modelscope.cn/models/XiaomiMiMo/MiMo-7B-SFT) |
+ | MiMo-7B-RL | RL model trained from SFT model, superior performance matching OpenAI o1-mini | [🤗 XiaomiMiMo/MiMo-7B-RL](https://huggingface.co/XiaomiMiMo/MiMo-7B-RL) | [🤖️ XiaomiMiMo/MiMo-7B-RL](https://www.modelscope.cn/models/XiaomiMiMo/MiMo-7B-RL) |
 
 ## III. Evaluation Results
 
+ | Benchmark | GPT-4o-0513 | Claude-3.5-Sonnet-1022 | OpenAI o1-mini | QwQ-32B-Preview | R1-Distill-Qwen-14B | R1-Distill-Qwen-7B | MiMo-7B-RL |
+ | ----------------------------- | :---------: | :--------------------: | :------------: | :-------------: | :-----------------: | :----------------: | :--------: |
+ | **General** | | | | | | | |
+ | GPQA Diamond<br/>(Pass@1) | 49.9 | 65.0 | 60.0 | 54.5 | 59.1 | 49.1 | 54.4 |
+ | SuperGPQA<br/>(Pass@1) | 42.4 | 48.2 | 45.2 | 43.6 | 40.6 | 28.9 | 40.5 |
+ | DROP<br/>(3-shot F1) | 83.7 | 88.3 | 83.9 | 71.2 | 85.5 | 77.0 | 78.7 |
+ | MMLU-Pro<br/>(EM) | 72.6 | 78.0 | 80.3 | 52.0 | 68.8 | 53.5 | 58.6 |
+ | IF-Eval<br/>(Prompt Strict) | 84.3 | 86.5 | 84.8 | 40.4 | 78.3 | 60.5 | 61.0 |
+ | **Mathematics** | | | | | | | |
+ | MATH-500<br/>(Pass@1) | 74.6 | 78.3 | 90.0 | 90.6 | 93.9 | 92.8 | 95.8 |
+ | AIME 2024<br/>(Pass@1) | 9.3 | 16.0 | 63.6 | 50.0 | 69.7 | 55.5 | 68.2 |
+ | AIME 2025<br/>(Pass@1) | 11.6 | 7.4 | 50.7 | 32.4 | 48.2 | 38.8 | 55.4 |
+ | **Code** | | | | | | | |
+ | LiveCodeBench v5<br/>(Pass@1) | 32.9 | 38.9 | 53.8 | 41.9 | 53.1 | 37.6 | 57.8 |
+ | LiveCodeBench v6<br/>(Pass@1) | 30.9 | 37.2 | 46.8 | 39.1 | 31.9 | 23.9 | 49.3 |
 
 MiMo-7B series
 
+ | Benchmark | MiMo-7B-Base | MiMo-7B-RL-Zero | MiMo-7B-SFT | MiMo-7B-RL |
+ | ----------------------------- | :----------: | :-------------: | :---------: | :--------: |
+ | **Mathematics** | | | | |
+ | MATH500<br/>(Pass@1) | 37.4 | 93.6 | 93.0 | 95.8 |
+ | AIME 2024<br/>(Pass@1) | 32.9 | 56.4 | 58.7 | 68.2 |
+ | AIME 2025<br/>(Pass@1) | 24.3 | 46.3 | 44.3 | 55.4 |
+ | **Code** | | | | |
+ | LiveCodeBench v5<br/>(Pass@1) | 32.9 | 49.1 | 52.3 | 57.8 |
+ | LiveCodeBench v6<br/>(Pass@1) | 29.1 | 42.9 | 45.5 | 49.3 |
 
 > [!IMPORTANT]
+ > The evaluations are conducted with `temperature=0.6`.
 >
 > AIME24 and AIME25 scores are averaged over 32 repetitions. LiveCodeBench v5 (20240801-20250201), LiveCodeBench v6 (20250201-20250501), GPQA-Diamond, and IF-Eval scores are averaged over 8 repetitions. MATH500 and SuperGPQA are single runs.
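
In other words, an "averaged score of N repetitions" means each benchmark is sampled N independent times at the stated temperature and the per-run Pass@1 scores are averaged. A minimal sketch of that aggregation, where `solve` is a hypothetical stand-in for the actual evaluation harness:

```py
import random

def averaged_pass_at_1(problems, solve, n_repetitions=32):
    """Average per-run Pass@1 accuracy over n independent sampled runs."""
    per_run = []
    for _ in range(n_repetitions):
        correct = sum(1 for p in problems if solve(p))
        per_run.append(correct / len(problems))
    return sum(per_run) / n_repetitions

# Toy check with a stochastic "solver" that is right about 70% of the time.
print(averaged_pass_at_1(list(range(30)), lambda p: random.random() < 0.7))
```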
 
  ## IV. Deployment
 
+ ### SGLang Inference
+
+ Thanks to the [contribution](https://github.com/sgl-project/sglang/pull/5921) from the SGLang team, MiMo was supported in SGLang mainline within 24 hours, with MTP support coming soon.
+
+ Example script
+
+ ```bash
+ # Install the latest SGLang from the main branch
+ python3 -m uv pip install "sglang[all] @ git+https://github.com/sgl-project/sglang.git/@main#egg=sglang&subdirectory=python"
+
+ # Launch the SGLang server
+ python3 -m sglang.launch_server --model-path XiaomiMiMo/MiMo-7B-RL --host 0.0.0.0 --trust-remote-code
+ ```
+
+ Detailed usage can be found in the [SGLang documentation](https://docs.sglang.ai/backend/send_request.html). MTP support will also be added within 24 hours.
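
Once the server is up, requests can go through its OpenAI-compatible API. The port below is SGLang's default (30000) and the payload is only an illustration; see the linked documentation for the authoritative interface:

```py
import requests

# Assumes the server launched above is listening on SGLang's default port 30000.
response = requests.post(
    "http://localhost:30000/v1/chat/completions",
    json={
        "model": "XiaomiMiMo/MiMo-7B-RL",
        "messages": [
            {"role": "system", "content": ""},  # empty system prompt, as recommended below
            {"role": "user", "content": "Compute 17 * 24 and explain your steps."},
        ],
        "temperature": 0.6,
    },
)
print(response.json()["choices"][0]["message"]["content"])
```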
+
  ### vLLM inference
 
+ 1. [Recommended] We officially support inference with MiMo-MTP using [our fork of vLLM](https://github.com/XiaomiMiMo/vllm/tree/feat_mimo_mtp_stable_073).
 
 Example script
 
 ```py
 from transformers import AutoModel, AutoModelForCausalLM, AutoTokenizer
 
+ model_id = "XiaomiMiMo/MiMo-7B-Base"
+ model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
 inputs = tokenizer(["Today is"], return_tensors='pt')
 output = model.generate(**inputs, max_new_tokens = 100)
 print(tokenizer.decode(output.tolist()[0]))
 
 
 ### Recommended environment and prompts
 
+ - We recommend using [our fork of vLLM](https://github.com/XiaomiMiMo/vllm/tree/feat_mimo_mtp_stable_073) which is developed based on vLLM 0.7.3.
 - We recommend using an empty system prompt.
 
 > We haven't verified MiMo with other inference engines and welcome contributions based on the model definition in the Huggingface repo 💻.
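
Putting these recommendations together, offline inference with the fork might look roughly like the sketch below; treat `num_speculative_tokens=1` as an assumption about how the fork exposes the MTP draft layer rather than a documented flag:

```py
from vllm import LLM, SamplingParams

# Hypothetical sketch: MiMo-7B-RL with one speculative (MTP) token and an
# empty system prompt, per the recommendations above.
llm = LLM(
    model="XiaomiMiMo/MiMo-7B-RL",
    trust_remote_code=True,
    num_speculative_tokens=1,  # assumption: how the fork enables MTP drafting
)
sampling_params = SamplingParams(temperature=0.6, max_tokens=512)

conversation = [
    {"role": "system", "content": ""},  # empty system prompt
    {"role": "user", "content": "Prove that the square root of 2 is irrational."},
]
outputs = llm.chat(conversation, sampling_params=sampling_params)
print(outputs[0].outputs[0].text)
```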
 
 
 ## VI. Contact
 
+ Please contact us at [mimo@xiaomi.com](mailto:mimo@xiaomi.com) or open an issue if you have any questions.