Lin-K76 committed · Commit b16ac79 · verified · 1 Parent(s): a64da78

Update README.md

Files changed (1): README.md (+10 -10)
README.md CHANGED
@@ -13,7 +13,7 @@ tags:
 * <h3 style="display: inline;">Model Developers:</h3> Neural Magic
 
 Phi-3-mini-128k-instruct quantized to FP8 weights and activations using per-tensor quantization through the [AutoFP8 repository](https://github.com/neuralmagic/AutoFP8), ready for inference with vLLM >= 0.5.0.
-Calibrated with 10 repeats of each token in the tokenizer in random order to achieve 99% performance recovery on the Open LLM Benchmark evaluations.
+Calibrated with 1 repeat of each token in the tokenizer in random order to achieve ~100% performance recovery on the Open LLM Benchmark evaluations.
 Reduces space on disk by ~50%.
 Part of the [FP8 LLMs for vLLM collection](https://huggingface.co/collections/neuralmagic/fp8-llms-for-vllm-666742ed2b78b7ac8df13127).
 
@@ -34,7 +34,7 @@ final_model_dir = MODEL_DIR.split("/")[-1]
 
 CONTEXT_LENGTH = 4096
 NUM_SAMPLES = 512
-NUM_REPEATS = 10
+NUM_REPEATS = 1
 
 pretrained_model_dir = MODEL_DIR
 tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir, use_fast=True, model_max_length=CONTEXT_LENGTH)
@@ -948,11 +948,11 @@ Evaluated on the Open LLM Leaderboard evaluations through vLLM.
 ### Open LLM Leaderboard evaluation scores
 |                      | Phi-3-mini-128k-instruct-FP8 | neuralmagic/Phi-3-mini-128k-instruct-FP8<br>(this model) |
 | :------------------: | :----------------------: | :------------------------------------------------: |
-| arc-c<br>25-shot | 63.65 | 63.31 |
-| hellaswag<br>10-shot | 79.76 | 79.44 |
-| mmlu<br>5-shot | 68.10 | 68.08 |
-| truthfulqa<br>0-shot | 53.97 | 53.76 |
-| winogrande<br>5-shot | 73.72 | 72.45 |
-| gsm8k<br>5-shot | 75.59 | 72.86 |
-| **Average<br>Accuracy** | **69.13** | **68.32** |
-| **Recovery** | **100%** | **98.82%** |
+| arc-c<br>25-shot | 63.65 | 64.33 |
+| hellaswag<br>10-shot | 79.76 | 79.61 |
+| mmlu<br>5-shot | 68.10 | 67.78 |
+| truthfulqa<br>0-shot | 53.97 | 52.95 |
+| winogrande<br>5-shot | 73.72 | 73.40 |
+| gsm8k<br>5-shot | 75.59 | 74.22 |
+| **Average<br>Accuracy** | **69.13** | **68.72** |
+| **Recovery** | **100%** | **99.40%** |
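The averages and the recovery figure in the updated table can be checked directly from the per-task rows; recovery is the quantized model's average score as a percentage of the baseline average:

```python
# Per-task scores copied from the updated table: baseline column vs. this FP8 model.
baseline = [63.65, 79.76, 68.10, 53.97, 73.72, 75.59]
fp8_model = [64.33, 79.61, 67.78, 52.95, 73.40, 74.22]

baseline_avg = sum(baseline) / len(baseline)   # ~69.13
fp8_avg = sum(fp8_model) / len(fp8_model)      # ~68.72
recovery = 100 * fp8_avg / baseline_avg        # ~99.40

print(round(baseline_avg, 2), round(fp8_avg, 2), round(recovery, 2))
```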
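The calibration recipe this commit switches to (every token in the tokenizer, `NUM_REPEATS = 1`, shuffled into random order) can be sketched as below. This is a hypothetical reconstruction, not the actual AutoFP8 calibration script: `VOCAB_SIZE` is a toy stand-in for the Phi-3 tokenizer's real vocabulary size, and the shuffled ids are simply packed into at most `NUM_SAMPLES` sequences of `CONTEXT_LENGTH` tokens.

```python
import random

# Assumed parameters mirroring the committed script; VOCAB_SIZE is a toy
# stand-in for len(tokenizer) of the Phi-3 model (assumption).
VOCAB_SIZE = 32_000
CONTEXT_LENGTH = 4096
NUM_SAMPLES = 512
NUM_REPEATS = 1

def build_calibration_ids(vocab_size: int, num_repeats: int, seed: int = 0) -> list[int]:
    """Repeat every token id num_repeats times, then shuffle into random order."""
    ids = [tok for tok in range(vocab_size) for _ in range(num_repeats)]
    random.Random(seed).shuffle(ids)
    return ids

def pack_samples(ids: list[int], context_length: int, num_samples: int) -> list[list[int]]:
    """Pack the shuffled ids into at most num_samples fixed-length sequences."""
    chunks = [ids[i:i + context_length] for i in range(0, len(ids), context_length)]
    return chunks[:num_samples]

ids = build_calibration_ids(VOCAB_SIZE, NUM_REPEATS)
samples = pack_samples(ids, CONTEXT_LENGTH, NUM_REPEATS and NUM_SAMPLES)
```

Note that with `NUM_REPEATS = 1` and a ~32k vocabulary this yields only eight 4k-token sequences (the last one partial), so `NUM_SAMPLES = 512` acts as an upper bound rather than a target.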