igitman committed on
Commit 3cf5314 · verified · 1 Parent(s): f091ea1

Update README.md

Files changed (1)
  1. README.md +2 -0
README.md CHANGED
@@ -263,6 +263,8 @@ Data Labeling for Evaluation Datasets:
  ## Evaluation Results
  We evaluate the model using temperature=`0.6`, top_p=`0.95`, and 64k sequence length. We run the benchmarks up to 16 times and average the scores to be more accurate.
 
+ All evaluations were done using [NeMo-Skills](https://github.com/NVIDIA/NeMo-Skills). We published a [tutorial](https://nvidia.github.io/NeMo-Skills/tutorials/2025/08/15/reproducing-llama-nemotron-super-49b-v15-evals/) with all details necessary to reproduce our evaluation results.
+
  ### MATH500
 
  | Reasoning Mode | pass@1 (avg. over 4 runs) |