wikeeyang committed · verified
Commit 8c75e73 · Parent(s): f0e6dc0

Update README.md

Files changed (1):
  README.md +1 -1
README.md CHANGED
@@ -9,7 +9,7 @@ pipeline_tag: any-to-any
 ---
 ===================================================================================
 
-This model is a quantized version of the official Alibaba ModelScope model https://huggingface.co/modelscope/Nexus-GenV2: the Qwen-VL part uses NF4 quantization, while the fine-tuned generation_decoder and edit_decoder parts use float8_e4m3fn quantization. Using the official code with a few code adjustments to the model loading method, users can load this model directly for inference without quantization; download traffic and disk space usage are greatly reduced.
+This model is a quantized version of the official Alibaba ModelScope model https://huggingface.co/modelscope/Nexus-GenV2: the Qwen-VL part uses NF4 quantization, while the fine-tuned generation_decoder and edit_decoder parts use float8_e4m3fn quantization. Using the official code with only a slight adjustment to the model loading code, users can load this model directly for inference without quantization, greatly reducing model download traffic and disk space usage.
 
 This model is a quantized version of the official Alibaba ModelScope model https://huggingface.co/modelscope/Nexus-GenV2. The Qwen-VL part uses NF4 quantization, and the fine-tuned generation_decoder and edit_decoder parts use float8_e4m3fn quantization. Through the official code, with a small adjustment to the model loading method, users can load this model directly for inference without quantizing it themselves, which significantly reduces download traffic and disk space usage.
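As a rough illustration of what that loading adjustment might look like, here is a minimal Python sketch. It assumes only the standard transformers / bitsandbytes / safetensors APIs; the repository path, the `generation_decoder.safetensors` file name, and the commented `load_state_dict` hookup are hypothetical placeholders, since the actual model construction and loading code lives in the official Nexus-Gen repository.

```python
# A minimal, hypothetical sketch of the loading adjustment described above.
# The repo path, file name, and decoder wiring are placeholders, not the
# official Nexus-Gen API.
import torch
from safetensors.torch import load_file
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

REPO = "path/to/this-quantized-checkpoint"  # placeholder path

# Qwen-VL part: the weights in this repo are already NF4, so a matching
# BitsAndBytesConfig lets transformers load them directly instead of
# quantizing full-precision weights at load time.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
qwen_vl = AutoModelForCausalLM.from_pretrained(
    REPO,
    quantization_config=bnb_config,
    device_map="auto",
)

# Decoder parts: stored as float8_e4m3fn tensors. Upcast each tensor to the
# compute dtype while reading the state dict (file name is assumed).
decoder_state = load_file("generation_decoder.safetensors")
decoder_state = {k: v.to(torch.bfloat16) for k, v in decoder_state.items()}
# generation_decoder.load_state_dict(decoder_state)  # decoder built by the official code
```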