Update README.md
README.md CHANGED
@@ -169,7 +169,7 @@ A large variety of training data was used for the knowledge distillation phase b
 
 The data for the multi-stage post-training phases for improvements in Code, Math, and Reasoning is a compilation of SFT and RL data that supports improvements of math, code, general reasoning, and instruction following capabilities of the original Llama instruct model.
 
-In conjunction with this model release, NVIDIA has released 30M samples of post-training data, as public and permissive. [Llama-Nemotron-Postraining
+In conjunction with this model release, NVIDIA has released 30M samples of post-training data, as public and permissive. Please see [Llama-Nemotron-Postraining-Dataset-v1](https://huggingface.co/datasets/nvidia/Llama-Nemotron-Post-Training-Dataset-v1).
 
 Distribution of the domains is as follows:
 
@@ -184,17 +184,6 @@ Distribution of the domains is as follows:
 
 Prompts have been sourced from either public and open corpus or synthetically generated. Responses were synthetically generated by a variety of models, with some prompts containing responses for both reasoning on and off modes, to train the model to distinguish between two modes.
 
-Models that were used in the creation of this dataset:
-
-* Llama-3.3-70B-Instruct
-* Llama-3.1-Nemotron-70B-Instruct
-* Llama-3.3-Nemotron-70B-Feedback/Edit/Select
-* Mixtral-8x22B-Instruct-v0.1
-* DeepSeek-R1
-* Qwen-2.5-Math-7B-Instruct
-* Qwen-2.5-Coder-32B-Instruct
-* Qwen-2.5-72B-Instruct
-* Qwen-2.5-32B-Instruct
 
 **Data Collection for Training Datasets:**
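For readers who want a quick look at the dataset this change links to, below is a minimal sketch of streaming a few samples with the Hugging Face `datasets` library. The subset/split names (`SFT`, `code`) and the `reasoning` field used to tell reasoning-on from reasoning-off responses are assumptions based on common dataset-card conventions, not details stated in this commit; consult the dataset page for the actual schema.

```python
# Minimal sketch (not part of the commit): peek at a few samples from the
# post-training dataset released alongside this model. The subset/split
# names and the "reasoning" field below are assumptions; check the
# dataset card for the actual schema.
from datasets import load_dataset

# Stream to avoid downloading the full ~30M-sample release up front.
ds = load_dataset(
    "nvidia/Llama-Nemotron-Post-Training-Dataset-v1",
    "SFT",            # assumed subset name
    split="code",     # assumed split name
    streaming=True,
)

for i, sample in enumerate(ds):
    # Some prompts carry responses for both reasoning modes; a "reasoning"
    # field (assumed name) marks whether a response is reasoning-on or -off.
    mode = sample.get("reasoning", "unknown")
    prompt = str(sample.get("input", ""))[:80]
    print(f"[reasoning={mode}] {prompt}")
    if i >= 4:  # show only the first five samples
        break
```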