Vibevoice 7B GGUF?

#1
by dan9070 - opened

I really appreciate the work you did with the 1.5b model, but is there any chance you are doing the 7B model as well?

gguf org

can't find the 7b from microsoft; where did you see it?

It's because Microsoft instantly removed it.
Someone saved it here. https://huggingface.co/aoi-ot/VibeVoice-Large/

I do worry that the 7b model will have memory issues on my end, even with 64gb of ram and 16gb vram.
I'm on an intel arc a770 and for the last week I've been trying to find methods of running this model fully on my xpu.
Problem is, I can't code. I've tried using LLMs to edit scripts but I simply do not know enough on the terminology of what to ask in order to get it to operate.

I only say this because I've tried to quantize the 7b model to Q8 myself, but the dequant takes a very long time and I haven't been able to get anything out of inference.

gguf org

too large; it's actually more than 10B in size; no surprise, since the "1.5b" is actually 3B

the 4bit lowvram version of 7B from devparker works pretty well, maybe try that one.

any chance we still target a q8 on 7B?

> the 4bit lowvram version of 7B from devparker works pretty well, maybe try that one.

Cannot. Outputs are garbled for me on XPU.
I instead used sdnq https://github.com/Disty0/sdnq to dynamically convert it to int8

Weird, maybe you've got the wrong attention setting?

No. I had tried both SDPA and eager; both failed with bnb 4-bit for me.
This is most likely because bnb is still rather new territory on intel arc GPUs (XPUs).

this so-called "7b" is too bulky (~20GB) for tts; that's probably why microsoft removed it. try the new updated chatterbox gguf: the overall size is less than 1GB, with 22-23 languages supported, super crazy

Sounds interesting, how can I use the GGUF models with Chatterbox exactly?

just follow the instructions in the model card here; you need gguf-connector first (pip install gguf-connector), then run ggc c3 in a console/terminal and pick the gguf files downloaded to your current directory
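The steps above, as a minimal sketch (assumes you've already downloaded the chatterbox gguf files from the model card into the directory you run this from):

```shell
# install the connector tool
pip install gguf-connector

# place the chatterbox gguf file(s) in the current directory first,
# then launch the chatterbox runner and pick the gguf when prompted
ggc c3
```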

I saw the page, but I have no idea what to do or where to apply it. I am not a coder or anything. I have Chatterbox on Pinokio, can I make it work that way? Or via ComfyUI?

> this so called "7b" is too bulky (~20GB) for tts, that's why microsoft removed it for a reason; try the new updated chatterbox gguf, the overall size less than 1GB, 22-23 languages support, super crazy

Too bulky? I got it running with SDNQ 8-BIT. It fits on my card fully now, and runs at 1.78s/it.

The quality does seem quite a lot higher than its 1.5b counterpart's.

gguf org

file size? how long does it take to generate this wav file

> file size? how long does it take to generate this wav file

It's around 12x slower than realtime: that 39-second file took around 7-8 minutes.
File size is the same, since SDNQ quantizes on loading, but I use around 10-12GB of VRAM.

However I'm on an Arc A770 LE (Intel GPU) with 16gb of VRAM.
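As a quick sanity check on the numbers above: a 39-second clip taking 7-8 minutes does work out to roughly 12x slower than realtime.

```python
# Real-time factor (RTF) check for the figures quoted above:
# a 39-second wav that takes ~7-8 minutes to generate.
clip_seconds = 39
generation_seconds = 7.5 * 60  # midpoint of the reported 7-8 minutes

rtf = generation_seconds / clip_seconds  # how many seconds of compute per second of audio
print(round(rtf, 1))  # ~11.5, i.e. roughly "12x slower than realtime"
```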

gguf org

> However I'm on an Arc A770 LE (Intel GPU) with 16gb of VRAM.

your hardware is very good; seems these SDNQ or SVDQ tools convert the tensors back to float16 instead of bfloat16, and don't require cuda
10-12GB suggests fp8 or a similar 8-bit format, since the f16 file is around 20GB; makes sense
will look into that; thanks
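The size reasoning above can be sketched with back-of-envelope arithmetic (assuming roughly 10B parameters, as mentioned earlier in the thread, and ignoring non-weight overhead):

```python
# Why a ~10B-parameter model that is ~20GB in f16 lands around
# 10GB of weights in an 8-bit format (fp8/int8).
def model_weight_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight size in GB (using 1 GB = 1e9 bytes)."""
    return params_billions * bytes_per_param

fp16_gb = model_weight_gb(10, 2.0)  # float16/bfloat16: 2 bytes per weight
int8_gb = model_weight_gb(10, 1.0)  # int8/fp8: 1 byte per weight

print(fp16_gb)  # 20.0 -> matches the ~20GB f16 file
print(int8_gb)  # 10.0 -> matches the low end of the 10-12GB VRAM observed
```

The observed 10-12GB being slightly above the 10GB of raw weights is consistent with activations, KV cache, and runtime overhead sitting on top of the quantized weights.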
