Specific Model Information
Dolphin-Mistral-24B-Venice-Edition_Q8_0.gguf
This model combines Dolphin 3.0 and Mistral-24B-Venice-Edition. It has 24 billion parameters and is quantized to 8 bits.
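If you want to try the file locally, here is a minimal sketch using the llama-cpp-python bindings. The model path, context size, and prompt are placeholder assumptions; adjust them to your setup.

```python
# pip install llama-cpp-python
from llama_cpp import Llama

# The path below is an assumption: point it at wherever you saved the GGUF.
llm = Llama(
    model_path="./Dolphin-Mistral-24B-Venice-Edition_Q8_0.gguf",
    n_ctx=4096,  # context window; lower this if you are tight on RAM
)

output = llm("Explain 8-bit quantization in one sentence.", max_tokens=64)
print(output["choices"][0]["text"])
```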
General Information
What is Quantization? Think of it like image resolution. A super high-resolution photo looks fantastic, but it takes up tons of space on your phone. Quantization is like saving that photo at a lower resolution, going from high definition to standard definition: you lose some detail, but the file gets considerably smaller. In this analogy, the photo is a large language model (LLM), and the file size is the model's footprint in memory (RAM) and in storage on disk. The sketch below illustrates the idea numerically.
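To make the analogy concrete, here is a small, self-contained Python sketch of simple absmax 8-bit quantization with NumPy. It is illustrative only: llama.cpp's Q8_0 format works block-wise with per-block scales, which this toy example does not reproduce.

```python
import numpy as np

# Toy example: quantize a small "weight" tensor from 32-bit floats
# to 8-bit integers using absmax scaling, then dequantize it again.
weights = np.random.randn(8).astype(np.float32)

scale = np.abs(weights).max() / 127.0          # map the largest value onto the int8 range
q = np.round(weights / scale).astype(np.int8)  # the 8-bit "standard definition" copy
recovered = q.astype(np.float32) * scale       # approximate reconstruction

print("original: ", weights)
print("quantized:", q)
print("recovered:", recovered)
print("max error:", np.abs(weights - recovered).max())
print("size: %d bytes -> %d bytes" % (weights.nbytes, q.nbytes))
```

Running this shows the trade-off directly: the recovered values are close to, but not exactly, the originals, while the storage drops from 4 bytes per value to 1.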
Extremely Important Caveats (Read This!) Keep in mind that these estimates and ranges are very generalized. Speed varies widely with hardware, software, the specific model used, and other variables not listed here, so your mileage may vary. Have fun, be a computer scientist: try the different models, make your observations and notes, evaluate them, and draw your own conclusions.
This model is based on the amazing model(s) and work at https://huggingface.co/cognitivecomputations
```
llama_model_quantize_impl: model size = 44961.58 MB
llama_model_quantize_impl: quant size = 23886.58 MB

main: quantize time = 204766.03 ms
main:    total time = 204766.04 ms
```
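As a sanity check on the log above: 23886.58 / 44961.58 ≈ 0.53, so the quantized file is a bit over half the original size. That matches what one would expect if the source weights were 16-bit and Q8_0 stores roughly 8.5 bits per weight (8 bits per value plus a small shared scale per block of weights in llama.cpp), since 8.5 / 16 ≈ 0.53.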
Model tree for ProfessorH/Dolphin-Mistral-24B-Venice-Edition_Q8_0.gguf
Base model: mistralai/Mistral-Small-24B-Base-2501