FP8 Dynamic/W8A16 Quants Please

#44
by rjmehta - opened


Mistral AI_ org

You can use this model in FP8 with the latest vLLM nightly: https://huggingface.co/nm-testing/Mistral-Small-3.1-24B-Instruct-2503-FP8-dynamic
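For example, a minimal sketch (assumes a recent vLLM nightly with FP8 support is already installed; the tool-call flags are optional and only needed for function calling):

# Serve the FP8-dynamic checkpoint via the OpenAI-compatible API (default port 8000).
vllm serve nm-testing/Mistral-Small-3.1-24B-Instruct-2503-FP8-dynamic \
    --tool-call-parser mistral --enable-auto-tool-choice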

The chat template is broken in the nm-testing repo. See also https://github.com/vllm-project/vllm/pull/15505#issuecomment-2768873223.

Mistral AI_ org

It has been updated now, thanks!

Thanks! It seems that with the nm-testing repo one can only use the default settings when hosting it on vLLM 0.8.3, i.e. "vllm serve nm-testing/Mistral-Small-3.1-24B-Instruct-2503-FP8-dynamic --tool-call-parser mistral --enable-auto-tool-choice". The flags "--tokenizer_mode mistral --config_format mistral --load_format mistral" are not allowed, since params.json is missing from this version. The difference is that nm-testing uses the transformers-based tokenizer, while Mistral-Small-3.1-24B-Instruct-2503 uses V7-Tekken. Will there be a significant difference in function-calling performance between the two versions?
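For reference, this is roughly how I exercise function calling against each server to compare them. A minimal sketch: it assumes the serve command above is running on localhost:8000, and get_weather is a hypothetical example tool, not something from either repo:

# Send a chat request with a tool definition to the OpenAI-compatible endpoint
# and check whether the model emits a well-formed tool call.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nm-testing/Mistral-Small-3.1-24B-Instruct-2503-FP8-dynamic",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }],
    "tool_choice": "auto"
  }'

A well-behaved setup should return a tool_calls entry naming get_weather with the city in its arguments; running the same request against both versions gives a quick side-by-side check.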
