Vibevoice 7B GGUF?

#1
by dan9070 - opened

I really appreciate the work you did with the 1.5b model, but is there any chance you are doing the 7B model as well?

gguf org

can't find the 7b from microsoft; where did you see it?

It's because Microsoft instantly removed it.
Someone saved it here. https://huggingface.co/aoi-ot/VibeVoice-Large/

I do worry that the 7b model will have memory issues on my end, even with 64gb of ram and 16gb vram.
I'm on an intel arc a770 and for the last week I've been trying to find methods of running this model fully on my xpu.
Problem is, I can't code. I've tried using LLMs to edit scripts but I simply do not know enough on the terminology of what to ask in order to get it to operate.

I only say this because I've tried to quantize the 7b model to Q8 myself, but the dequant takes a very long time and I haven't been able to get anything out of inference.

gguf org

too large; it's actually more than 10B in size; no surprise, since the "1.5b" is actually 3B

the 4bit lowvram version of 7B from devparker works pretty well, maybe try that one.

any chance we still target a q8 on 7B?

> the 4bit lowvram version of 7B from devparker works pretty well, maybe try that one.

Cannot. Outputs are garbled for me on XPU.
I instead used sdnq https://github.com/Disty0/sdnq to dynamically convert it to int8

Weird, maybe you've got the wrong attention setting?

No. I had tried both SDPA and eager; both failed with bnb 4-bit for me.
This is most likely because bnb is still rather new territory on intel arc GPUs (XPUs).

this so-called "7b" is too bulky (~20GB) for tts; that's probably why microsoft removed it. try the new updated chatterbox gguf: the overall size is less than 1GB, with 22-23 languages supported, super crazy

Sounds interesting, how can I use the GGUF models with Chatterbox exactly?

just follow the instructions in the model card here; you need gguf-connector first (pip install gguf-connector), then run ggc c3 in a console/terminal and pick the gguf files downloaded to your current directory
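The steps above, as a minimal sketch (assumes you've already downloaded the chatterbox gguf files from the model card into the directory you run this from):

```shell
# install the connector tool
pip install gguf-connector

# place the chatterbox gguf file(s) in the current directory first,
# then launch the chatterbox runner and pick the gguf when prompted
ggc c3
```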

I saw the page, but I have no idea what to do or where to apply it. I am not a coder or anything. I have Chatterbox on Pinokio, can I make it work that way? Or via ComfyUI?

> this so called "7b" is too bulky (~20GB) for tts, that's why microsoft removed it for a reason; try the new updated chatterbox gguf, the overall size less than 1GB, 22-23 languages support, super crazy

Too bulky? I got it running with SDNQ 8-BIT. It fits on my card fully now, and runs at 1.78s/it.

The quality does seem quite a lot higher than its 1.5b counterpart's.

gguf org

file size? how long does it take to generate this wav file

> file size? how long does it take to generate this wav file

It's around 12x slower than realtime: that 39-second file took around 7-8 minutes.
File size is the same, since SDNQ quantizes on loading, but I use around 10-12GB of VRAM.

However I'm on an Arc A770 LE (Intel GPU) with 16gb of VRAM.
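As a quick sanity check on the numbers above: a 39-second clip taking 7-8 minutes does work out to roughly 12x slower than realtime.

```python
# Real-time factor (RTF) check for the figures quoted above:
# a 39-second wav that takes ~7-8 minutes to generate.
clip_seconds = 39
generation_seconds = 7.5 * 60  # midpoint of the reported 7-8 minutes

rtf = generation_seconds / clip_seconds  # how many seconds of compute per second of audio
print(round(rtf, 1))  # ~11.5, i.e. roughly "12x slower than realtime"
```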

gguf org

> However I'm on an Arc A770 LE (Intel GPU) with 16gb of VRAM.

your hardware is very good; seems these SDNQ or SVDQ tools convert the tensors back to float16 instead of bfloat16, and don't require cuda
10-12GB suggests fp8 or a similar 8-bit format, since the f16 file is around 20GB; makes sense
will look into that; thanks
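The size reasoning above can be sketched with back-of-envelope arithmetic (assuming roughly 10B parameters, as mentioned earlier in the thread, and ignoring non-weight overhead):

```python
# Why a ~10B-parameter model that is ~20GB in f16 lands around
# 10GB of weights in an 8-bit format (fp8/int8).
def model_weight_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight size in GB (using 1 GB = 1e9 bytes)."""
    return params_billions * bytes_per_param

fp16_gb = model_weight_gb(10, 2.0)  # float16/bfloat16: 2 bytes per weight
int8_gb = model_weight_gb(10, 1.0)  # int8/fp8: 1 byte per weight

print(fp16_gb)  # 20.0 -> matches the ~20GB f16 file
print(int8_gb)  # 10.0 -> matches the low end of the 10-12GB VRAM observed
```

The observed 10-12GB being slightly above the 10GB of raw weights is consistent with activations, KV cache, and runtime overhead sitting on top of the quantized weights.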
