VibeVoice 7B GGUF?
I really appreciate the work you did with the 1.5B model, but is there any chance you're doing the 7B model as well?
Can't find the 7B from Microsoft; where did you see it?
It's because Microsoft instantly removed it.
Someone saved it here: https://huggingface.co/aoi-ot/VibeVoice-Large/
I do worry that the 7B model will have memory issues on my end, even with 64 GB of RAM and 16 GB of VRAM.
I'm on an Intel Arc A770, and for the last week I've been trying to find a way to run this model fully on my XPU.
Problem is, I can't code. I've tried using LLMs to edit scripts, but I simply don't know enough of the terminology to ask for the right thing.
I mention this because I've tried quantizing the 7B model to Q8 myself, but the dequant takes a very long time and I haven't been able to get any output from inference.
Too large; it's actually more than 10B parameters. No surprise, since the "1.5B" is actually 3B.
the 4bit lowvram version of 7B from devparker works pretty well, maybe try that one.
Any chance we still get a Q8 of the 7B?
Cannot. Outputs are garbled for me on XPU.
I instead used SDNQ (https://github.com/Disty0/sdnq) to dynamically convert it to int8.
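For intuition, this is roughly what an 8-bit weight quantization does per tensor: pick a scale from the largest magnitude, round each weight to int8, and dequantize on the fly at inference. A toy sketch in plain Python, not SDNQ's actual implementation (check its repo for the real API):

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: w ~= q * scale, q in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights at inference time."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.03, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)  # close to the original weights
```

The weights are stored at 1 byte each instead of 2 (fp16), which is why the ~20 GB model fits in roughly half the memory.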
Weird, maybe you've got the wrong attention setting?
No. I had tried both SDPA and eager. Both failed with bnb 4-bit for me.
This is most likely because bnb is fairly new territory on Intel Arc GPUs (XPUs).
This so-called "7B" is too bulky (~20 GB) for TTS; that's probably why Microsoft removed it. Try the newly updated Chatterbox GGUF: the overall size is under 1 GB, with 22-23 languages supported. Super crazy.
Sounds interesting, how can I use the GGUF models with Chatterbox exactly?
Just follow the instructions in the model card there; you need gguf-connector first (pip install gguf-connector), then execute ggc c3
in a console/terminal and select the GGUF files you downloaded to your current directory.
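The steps above as shell commands (the `ggc c3` entry point is taken from this thread; check the model card for the current invocation):

```shell
# Install the connector package
pip install gguf-connector
# Run from the directory containing the downloaded .gguf files,
# then pick the Chatterbox files when prompted
ggc c3
```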
I saw the page, but I have no idea what to do or where to apply it. I'm not a coder or anything. I have Chatterbox on Pinokio; can I make it work that way, or via ComfyUI?
Too bulky? I got it running with SDNQ 8-bit. It fits fully on my card now and runs at 1.78 s/it.
The quality does seem quite a lot higher than its 1.5B counterpart.
What's the file size? And how long does it take to generate this WAV file?
It's around 12x slower than realtime; that 39-second file took around 7-8 minutes.
The file size is the same, since SDNQ quantizes at load time, but I use around 10-12 GB of VRAM.
However, I'm on an Arc A770 LE (Intel GPU) with 16 GB of VRAM.
Your hardware is very good. It seems SDNQ or SVDQ converts the tensors back to float16 instead of bfloat16, and doesn't require CUDA.
10-12 GB suggests fp8 or a similar 8-bit format, since the f16 file is around 20 GB; that makes sense.
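The arithmetic matches the thread: a ~20 GB fp16 file implies roughly 10.7B parameters at 2 bytes each, which lands at about 10 GiB in an 8-bit format, consistent with the 10-12 GB of VRAM reported above. A quick sanity check (the ~20 GB figure is from this discussion, not an official spec):

```python
# Sanity-check the sizes quoted in this thread.
fp16_bytes = 20 * 1024**3        # checkpoint size at 2 bytes/param
n_params = fp16_bytes // 2       # ~10.7 billion parameters ("more than 10b")
int8_gib = n_params / 1024**3    # 1 byte/param at 8-bit -> exactly 10.0 GiB
```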
will look into that; thanks