Update README.md
README.md CHANGED
@@ -7,12 +7,12 @@ tags:
 # Phi-3-mini-128k-instruct-FP8
 
 ## Model Overview
-* <h3 style="display: inline;">Model Architecture:</h3> Based on and identical to the Phi-3-mini-128k-instruct
+* <h3 style="display: inline;">Model Architecture:</h3> Based on and identical to the Phi-3-mini-128k-instruct architecture
 * <h3 style="display: inline;">Model Optimizations:</h3> Weights and activations quantized to FP8
 * <h3 style="display: inline;">Release Date:</h3> June 29, 2024
 * <h3 style="display: inline;">Model Developers:</h3> Neural Magic
 
-Phi-3-mini-128k-instruct
+Phi-3-mini-128k-instruct quantized to FP8 weights and activations using per-tensor quantization through the [AutoFP8 repository](https://github.com/neuralmagic/AutoFP8), ready for inference with vLLM >= 0.5.0.
 Calibrated with 10 repeats of each token in the tokenizer in random order to achieve 99% performance recovery on the Open LLM Benchmark evaluations.
 Reduces space on disk by ~50%.
 Part of the [FP8 LLMs for vLLM collection](https://huggingface.co/collections/neuralmagic/fp8-llms-for-vllm-666742ed2b78b7ac8df13127).
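Since the updated description points at vLLM >= 0.5.0 for inference, a minimal sketch follows. The hub id `neuralmagic/Phi-3-mini-128k-instruct-FP8` is an assumption inferred from the model name and developer, not stated in the diff:

```python
# Minimal vLLM inference sketch. The hub id below is an assumption based on
# the model name and developer; adjust it to the actual repository.
# Requires vLLM >= 0.5.0 and FP8-capable hardware (e.g. NVIDIA H100).
from vllm import LLM, SamplingParams

llm = LLM(model="neuralmagic/Phi-3-mini-128k-instruct-FP8")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain FP8 quantization in one sentence."], params)
print(outputs[0].outputs[0].text)
```

The description also names per-tensor quantization through the AutoFP8 repository; a sketch of how such a checkpoint could be produced, following the usage pattern in that repository's README (the single calibration sample below is a placeholder, not the tokenizer-repeat calibration described above):

```python
# Sketch of static per-tensor FP8 quantization with AutoFP8
# (https://github.com/neuralmagic/AutoFP8). API names follow its README
# and may change between releases.
from transformers import AutoTokenizer
from auto_fp8 import AutoFP8ForCausalLM, BaseQuantizeConfig

pretrained_model_dir = "microsoft/Phi-3-mini-128k-instruct"
quantized_model_dir = "Phi-3-mini-128k-instruct-FP8"

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir)
# Static activation scales need calibration data; one placeholder sample is
# shown here in place of the full calibration set described above.
examples = tokenizer(["auto_fp8 is an easy-to-use model quantization library"],
                     return_tensors="pt").to("cuda")

quantize_config = BaseQuantizeConfig(quant_method="fp8", activation_scheme="static")
model = AutoFP8ForCausalLM.from_pretrained(pretrained_model_dir, quantize_config=quantize_config)
model.quantize(examples)
model.save_quantized(quantized_model_dir)
```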