Commit 038b223 (verified) by jerryzh168
1 Parent(s): b861310

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -17,7 +17,7 @@ tags:
 - conversational
 ---
 
-This repository hosts the **Phi4-mini-instruct** model quantized with [torchao](https://huggingface.co/docs/transformers/main/en/quantization/torchao) using int4 weight-only quantization and the [hqq](https://mobiusml.github.io/hqq_blog/) algorithm. This work is brought to you by the PyTorch team. This model can be used directly or served using [vLLM](https://docs.vllm.ai/en/latest/) for significant VRAM reduction and speedup on A100 GPUs.
+This repository hosts the **Phi4-mini-instruct** model quantized with [torchao](https://huggingface.co/docs/transformers/main/en/quantization/torchao) using int4 weight-only quantization and the [hqq](https://mobiusml.github.io/hqq_blog/) algorithm. This work is brought to you by the PyTorch team. This model can be used directly or served using [vLLM](https://docs.vllm.ai/en/latest/) for 67% VRAM reduction (2.98 GB needed) and speedup on A100 GPUs.
 
 ---
 
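
The two figures in the added line can be sanity-checked with a little arithmetic. The sketch below assumes that "67% VRAM reduction" is measured against an unquantized baseline and that 2.98 GB is the footprint of the quantized model. The diff does not state the baseline, so the value derived here is an inference from those two numbers, not a reported measurement; the function name `implied_baseline_gb` is hypothetical.

```python
def implied_baseline_gb(quantized_gb: float, reduction: float) -> float:
    """Return the baseline size implied by a quantized size and a fractional reduction.

    If quantized = baseline * (1 - reduction), then
    baseline = quantized / (1 - reduction).
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be a fraction in [0, 1)")
    return quantized_gb / (1.0 - reduction)


# Figures taken from the README line added in this commit.
quantized_gb = 2.98  # "2.98 GB needed"
reduction = 0.67     # "67% VRAM reduction"

# Assumption: both figures refer to the same measurement (peak VRAM).
baseline_gb = implied_baseline_gb(quantized_gb, reduction)
print(f"Implied baseline: {baseline_gb:.2f} GB")
```

Under these assumptions the implied baseline is roughly 9 GB, which is a plausible order of magnitude for a 16-bit checkpoint of a model this size, so the two numbers in the new wording are at least mutually consistent.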