Specific Model Information

Dolphin-Mistral-24B-Venice-Edition_Q8_0.gguf
This model combines Dolphin 3.0 and Mistral-24B-Venice-Edition, with 24 billion parameters, quantized to 8 bits (Q8_0).
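
If you want to try the file quickly, here is a minimal sketch using the llama-cpp-python bindings. The local file path, context size, and GPU layer count below are assumptions; adjust them to your hardware.

```python
# Minimal sketch: load the Q8_0 GGUF and run one prompt with llama-cpp-python.
# Assumptions: the file has been downloaded locally, and you have enough
# RAM/VRAM for a ~24 GB model; tune n_ctx and n_gpu_layers for your machine.
from llama_cpp import Llama

llm = Llama(
    model_path="./Dolphin-Mistral-24B-Venice-Edition_Q8_0.gguf",  # local path (assumption)
    n_ctx=4096,        # context window; raise it if you have memory to spare
    n_gpu_layers=-1,   # offload all layers to GPU; use 0 for CPU-only
)

output = llm("Explain quantization in one sentence.", max_tokens=64)
print(output["choices"][0]["text"])
```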

General Information

What is Quantization?

Think of it like image resolution. A super-high-resolution photo looks fantastic, but it takes up a lot of space on your phone. Quantization is like saving that photo at a lower resolution, going from high definition to standard definition: you lose some detail, but the file gets considerably smaller. In this analogy, the photo is the large language model (LLM), and the file size is the model's footprint in memory (RAM) and in storage on disk.
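
To make the analogy concrete, here is a toy sketch of the block-wise idea behind Q8_0: weights are split into small blocks, and each block stores one shared scale factor plus 8-bit integers. This illustrates the concept only, not llama.cpp's actual implementation.

```python
# Toy sketch of Q8_0-style block quantization: 32 float weights per block,
# one shared scale, int8 values. Conceptual only, not llama.cpp's exact code.
import numpy as np

def quantize_q8_block(weights: np.ndarray):
    """Quantize one block of 32 float32 weights to int8 plus a scale."""
    amax = float(np.abs(weights).max())
    scale = amax / 127.0 if amax > 0 else 1.0  # avoid divide-by-zero
    q = np.round(weights / scale).astype(np.int8)
    return scale, q

def dequantize_q8_block(scale: float, q: np.ndarray) -> np.ndarray:
    """Reconstruct approximate float weights from the quantized block."""
    return scale * q.astype(np.float32)

block = np.random.randn(32).astype(np.float32)
scale, q = quantize_q8_block(block)
restored = dequantize_q8_block(scale, q)
print("max error:", np.abs(block - restored).max())  # small, but nonzero
```

The round-trip loses a little precision per weight (the "resolution" of the photo), which is why quantized models are smaller but not bit-identical to the originals.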

[Image: table of generalized speed and size estimates for the different quantization levels]

Extremely Important Caveats (Read This!)

Keep in mind that the table of estimates and ranges above is very generalized. Speed varies widely with your hardware, your software, the specific model, and other variables not listed here, so your mileage may vary. Have fun, be a computer scientist: try the different models, make your own observations and notes, evaluate them, and draw your own conclusions.
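
One easy observation to start with is raw generation speed. Below is a rough tokens-per-second measurement sketch (llama-cpp-python again; the path and settings are assumptions):

```python
# Rough tokens-per-second measurement for a GGUF model; results will vary
# with hardware, context size, and offload settings, as the caveats above say.
import time
from llama_cpp import Llama

llm = Llama(model_path="./Dolphin-Mistral-24B-Venice-Edition_Q8_0.gguf",
            n_ctx=2048, n_gpu_layers=-1, verbose=False)

start = time.perf_counter()
out = llm("Write a short paragraph about the ocean.", max_tokens=128)
elapsed = time.perf_counter() - start

generated = out["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.1f} tok/s")
```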

This model is based on the amazing model(s) and work at https://huggingface.co/cognitivecomputations

llama_model_quantize_impl: model size = 44961.58 MB
llama_model_quantize_impl: quant size = 23886.58 MB

main: quantize time = 204766.03 ms
main: total time = 204766.04 ms
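
Those two size lines let you sanity-check the compression: the file shrinks from about 44.9 GB to about 23.9 GB, which matches going from 16 bits per weight (FP16) to Q8_0's effective 8.5 bits per weight (8 data bits plus a shared 16-bit scale per 32-weight block):

```python
# Sanity-check the quantization log: bits per parameter before and after.
params = 23.6e9                      # parameter count from the GGUF metadata
orig_mb, quant_mb = 44961.58, 23886.58

for label, mb in [("FP16", orig_mb), ("Q8_0", quant_mb)]:
    bits = mb * 1024**2 * 8 / params
    print(f"{label}: {bits:.2f} bits/weight")
# FP16: ~15.98 bits/weight; Q8_0: ~8.49 bits/weight
print(f"compression ratio: {orig_mb / quant_mb:.2f}x")  # ~1.88x
```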

GGUF metadata: 23.6B params, llama architecture, 8-bit (Q8_0) quantization.
