Update README.md
README.md CHANGED
@@ -7,12 +7,12 @@ tags:
 # Phi-3-mini-128k-instruct-FP8
 
 ## Model Overview
-* <h3 style="display: inline;">Model Architecture:</h3> Based on and identical to the Phi-3-mini-128k-instruct
+* <h3 style="display: inline;">Model Architecture:</h3> Based on and identical to the Phi-3-mini-128k-instruct architecture
 * <h3 style="display: inline;">Model Optimizations:</h3> Weights and activations quantized to FP8
 * <h3 style="display: inline;">Release Date:</h3> June 29, 2024
 * <h3 style="display: inline;">Model Developers:</h3> Neural Magic
 
-Phi-3-mini-128k-instruct
+Phi-3-mini-128k-instruct quantized to FP8 weights and activations using per-tensor quantization through the [AutoFP8 repository](https://github.com/neuralmagic/AutoFP8), ready for inference with vLLM >= 0.5.0.
 Calibrated with 10 repeats of each token in the tokenizer in random order to achieve 99% performance recovery on the Open LLM Benchmark evaluations.
 Reduces space on disk by ~50%.
 Part of the [FP8 LLMs for vLLM collection](https://huggingface.co/collections/neuralmagic/fp8-llms-for-vllm-666742ed2b78b7ac8df13127).
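Since the updated description points at vLLM >= 0.5.0 for inference, a minimal sketch follows. The hub id `neuralmagic/Phi-3-mini-128k-instruct-FP8` is an assumption inferred from the model name and developer, not stated in the diff:

```python
# Minimal vLLM inference sketch. The hub id below is an assumption based on
# the model name and developer; adjust it to the actual repository.
# Requires vLLM >= 0.5.0 and FP8-capable hardware (e.g. NVIDIA H100).
from vllm import LLM, SamplingParams

llm = LLM(model="neuralmagic/Phi-3-mini-128k-instruct-FP8")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain FP8 quantization in one sentence."], params)
print(outputs[0].outputs[0].text)
```

The description also names per-tensor quantization through the AutoFP8 repository; a sketch of how such a checkpoint could be produced, following the usage pattern in that repository's README (the single calibration sample below is a placeholder, not the tokenizer-repeat calibration described above):

```python
# Sketch of static per-tensor FP8 quantization with AutoFP8
# (https://github.com/neuralmagic/AutoFP8). API names follow its README
# and may change between releases.
from transformers import AutoTokenizer
from auto_fp8 import AutoFP8ForCausalLM, BaseQuantizeConfig

pretrained_model_dir = "microsoft/Phi-3-mini-128k-instruct"
quantized_model_dir = "Phi-3-mini-128k-instruct-FP8"

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir)
# Static activation scales need calibration data; one placeholder sample is
# shown here in place of the full calibration set described above.
examples = tokenizer(["auto_fp8 is an easy-to-use model quantization library"],
                     return_tensors="pt").to("cuda")

quantize_config = BaseQuantizeConfig(quant_method="fp8", activation_scheme="static")
model = AutoFP8ForCausalLM.from_pretrained(pretrained_model_dir, quantize_config=quantize_config)
model.quantize(examples)
model.save_quantized(quantized_model_dir)
```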