Update README.md

README.md CHANGED
@@ -13,7 +13,7 @@ tags:
 * <h3 style="display: inline;">Model Developers:</h3> Neural Magic
 
 Phi-3-mini-128k-instruct quantized to FP8 weights and activations using per-tensor quantization through the [AutoFP8 repository](https://github.com/neuralmagic/AutoFP8), ready for inference with vLLM >= 0.5.0.
-Calibrated with
+Calibrated with 1 repeat of each token in the tokenizer in random order to achieve ~100% performance recovery on the Open LLM Benchmark evaluations.
 Reduces space on disk by ~50%.
 Part of the [FP8 LLMs for vLLM collection](https://huggingface.co/collections/neuralmagic/fp8-llms-for-vllm-666742ed2b78b7ac8df13127).
 

@@ -34,7 +34,7 @@ final_model_dir = MODEL_DIR.split("/")[-1]
 
 CONTEXT_LENGTH = 4096
 NUM_SAMPLES = 512
-NUM_REPEATS =
+NUM_REPEATS = 1
 
 pretrained_model_dir = MODEL_DIR
 tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir, use_fast=True, model_max_length=CONTEXT_LENGTH)

@@ -948,11 +948,11 @@ Evaluated on the Open LLM Leaderboard evaluations through vLLM.
 ### Open LLM Leaderboard evaluation scores
 |                      | Phi-3-mini-128k-instruct | neuralmagic/Phi-3-mini-128k-instruct-FP8<br>(this model) |
 | :------------------: | :----------------------: | :------------------------------------------------: |
-| arc-c<br>25-shot | 63.65 |
-| hellaswag<br>10-shot | 79.76 | 79.
-| mmlu<br>5-shot | 68.10 |
-| truthfulqa<br>0-shot | 53.97 |
-| winogrande<br>5-shot | 73.72 |
-| gsm8k<br>5-shot | 75.59 |
-| **Average<br>Accuracy** | **69.13** | **68.
-| **Recovery** | **100%** | **
+| arc-c<br>25-shot | 63.65 | 64.33 |
+| hellaswag<br>10-shot | 79.76 | 79.61 |
+| mmlu<br>5-shot | 68.10 | 67.78 |
+| truthfulqa<br>0-shot | 53.97 | 52.95 |
+| winogrande<br>5-shot | 73.72 | 73.40 |
+| gsm8k<br>5-shot | 75.59 | 74.22 |
+| **Average<br>Accuracy** | **69.13** | **68.72** |
+| **Recovery** | **100%** | **99.40%** |