8-bit Quantization

#5
by Leobuilt - opened

Looks like this model is BF16. Altogether it needs ~36 GB of VRAM to run. Maybe with 8-bit weights and a slightly shorter context it could fit in 24 GB of VRAM?
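
For a rough sanity check of that guess, here is a back-of-the-envelope sketch. It assumes the ~36 GB splits into weights plus everything else (KV cache, activations), that weight memory scales linearly with bits per weight, and that the rest scales with context length. The function name, the `weight_fraction` parameter, and the split itself are my own assumptions, not numbers from the model card, and it ignores the small extra overhead of quantization scales.

```python
def estimate_vram_gb(bf16_total_gb, weight_fraction, bits=8, context_scale=1.0):
    """Rough VRAM estimate after weight-only quantization.

    bf16_total_gb   -- observed footprint with BF16 (16-bit) weights
    weight_fraction -- ASSUMED share of that footprint taken by weights (0..1)
    bits            -- target bits per weight
    context_scale   -- factor applied to the non-weight part (KV cache etc.),
                       e.g. 0.5 for half the context length
    """
    weights_gb = bf16_total_gb * weight_fraction
    other_gb = bf16_total_gb - weights_gb
    # Weights shrink in proportion to bits per weight; the rest shrinks with context.
    return weights_gb * bits / 16 + other_gb * context_scale


# If the whole 36 GB were weights, 8-bit would halve it:
print(estimate_vram_gb(36, 1.0))  # 18.0

# Hypothetical split: 75% weights, 25% cache, with the context cut in half.
print(estimate_vram_gb(36, 0.75, bits=8, context_scale=0.5))
```

Under these assumptions, even a pessimistic split leaves some headroom below 24 GB, but the real number depends on the actual parameter count and on how much memory the quantization backend adds on top.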
