Lin-K76 committed · Commit b16ac79 · verified · 1 Parent(s): a64da78

Update README.md

Files changed (1): README.md (+10 -10)
README.md CHANGED
@@ -13,7 +13,7 @@ tags:
 * <h3 style="display: inline;">Model Developers:</h3> Neural Magic
 
 Phi-3-mini-128k-instruct quantized to FP8 weights and activations using per-tensor quantization through the [AutoFP8 repository](https://github.com/neuralmagic/AutoFP8), ready for inference with vLLM >= 0.5.0.
-Calibrated with 10 repeats of each token in the tokenizer in random order to achieve 99% performance recovery on the Open LLM Benchmark evaluations.
+Calibrated with 1 repeat of each token in the tokenizer in random order to achieve ~100% performance recovery on the Open LLM Benchmark evaluations.
 Reduces space on disk by ~50%.
 Part of the [FP8 LLMs for vLLM collection](https://huggingface.co/collections/neuralmagic/fp8-llms-for-vllm-666742ed2b78b7ac8df13127).
 
@@ -34,7 +34,7 @@ final_model_dir = MODEL_DIR.split("/")[-1]
 
 CONTEXT_LENGTH = 4096
 NUM_SAMPLES = 512
-NUM_REPEATS = 10
+NUM_REPEATS = 1
 
 pretrained_model_dir = MODEL_DIR
 tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir, use_fast=True, model_max_length=CONTEXT_LENGTH)
@@ -948,11 +948,11 @@ Evaluated on the Open LLM Leaderboard evaluations through vLLM.
 ### Open LLM Leaderboard evaluation scores
 |                      | Phi-3-mini-128k-instruct-FP8 | neuralmagic/Phi-3-mini-128k-instruct-FP8<br>(this model) |
 | :------------------: | :----------------------: | :------------------------------------------------: |
-| arc-c<br>25-shot | 63.65 | 63.31 |
-| hellaswag<br>10-shot | 79.76 | 79.44 |
-| mmlu<br>5-shot | 68.10 | 68.08 |
-| truthfulqa<br>0-shot | 53.97 | 53.76 |
-| winogrande<br>5-shot | 73.72 | 72.45 |
-| gsm8k<br>5-shot | 75.59 | 72.86 |
-| **Average<br>Accuracy** | **69.13** | **68.32** |
-| **Recovery** | **100%** | **98.82%** |
+| arc-c<br>25-shot | 63.65 | 64.33 |
+| hellaswag<br>10-shot | 79.76 | 79.61 |
+| mmlu<br>5-shot | 68.10 | 67.78 |
+| truthfulqa<br>0-shot | 53.97 | 52.95 |
+| winogrande<br>5-shot | 73.72 | 73.40 |
+| gsm8k<br>5-shot | 75.59 | 74.22 |
+| **Average<br>Accuracy** | **69.13** | **68.72** |
+| **Recovery** | **100%** | **99.40%** |
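The averages and the recovery figure in the updated table can be checked directly from the per-task rows; recovery is the quantized model's average score as a percentage of the baseline average:

```python
# Per-task scores copied from the updated table: baseline column vs. this FP8 model.
baseline = [63.65, 79.76, 68.10, 53.97, 73.72, 75.59]
fp8_model = [64.33, 79.61, 67.78, 52.95, 73.40, 74.22]

baseline_avg = sum(baseline) / len(baseline)   # ~69.13
fp8_avg = sum(fp8_model) / len(fp8_model)      # ~68.72
recovery = 100 * fp8_avg / baseline_avg        # ~99.40

print(round(baseline_avg, 2), round(fp8_avg, 2), round(recovery, 2))
```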
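The calibration recipe this commit switches to (every token in the tokenizer, `NUM_REPEATS = 1`, shuffled into random order) can be sketched as below. This is a hypothetical reconstruction, not the actual AutoFP8 calibration script: `VOCAB_SIZE` is a toy stand-in for the Phi-3 tokenizer's real vocabulary size, and the shuffled ids are simply packed into at most `NUM_SAMPLES` sequences of `CONTEXT_LENGTH` tokens.

```python
import random

# Assumed parameters mirroring the committed script; VOCAB_SIZE is a toy
# stand-in for len(tokenizer) of the Phi-3 model (assumption).
VOCAB_SIZE = 32_000
CONTEXT_LENGTH = 4096
NUM_SAMPLES = 512
NUM_REPEATS = 1

def build_calibration_ids(vocab_size: int, num_repeats: int, seed: int = 0) -> list[int]:
    """Repeat every token id num_repeats times, then shuffle into random order."""
    ids = [tok for tok in range(vocab_size) for _ in range(num_repeats)]
    random.Random(seed).shuffle(ids)
    return ids

def pack_samples(ids: list[int], context_length: int, num_samples: int) -> list[list[int]]:
    """Pack the shuffled ids into at most num_samples fixed-length sequences."""
    chunks = [ids[i:i + context_length] for i in range(0, len(ids), context_length)]
    return chunks[:num_samples]

ids = build_calibration_ids(VOCAB_SIZE, NUM_REPEATS)
samples = pack_samples(ids, CONTEXT_LENGTH, NUM_REPEATS and NUM_SAMPLES)
```

Note that with `NUM_REPEATS = 1` and a ~32k vocabulary this yields only eight 4k-token sequences (the last one partial), so `NUM_SAMPLES = 512` acts as an upper bound rather than a target.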