Specific Model Information

Dolphin-Mistral-24B-Venice-Edition_Q8_0.gguf
This model combines Dolphin 3.0 and Mistral-24B-Venice-Edition, with 24 billion parameters, quantized to 8 bits (Q8_0).
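
If you want to try the file quickly, here is a minimal sketch using the llama-cpp-python bindings. The local file path, context size, and GPU layer count below are assumptions; adjust them to your hardware.

```python
# Minimal sketch: load the Q8_0 GGUF and run one prompt with llama-cpp-python.
# Assumptions: the file has been downloaded locally, and you have enough
# RAM/VRAM for a ~24 GB model; tune n_ctx and n_gpu_layers for your machine.
from llama_cpp import Llama

llm = Llama(
    model_path="./Dolphin-Mistral-24B-Venice-Edition_Q8_0.gguf",  # local path (assumption)
    n_ctx=4096,        # context window; raise it if you have memory to spare
    n_gpu_layers=-1,   # offload all layers to GPU; use 0 for CPU-only
)

output = llm("Explain quantization in one sentence.", max_tokens=64)
print(output["choices"][0]["text"])
```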

General Information

What is Quantization?

Think of it like image resolution. A super-high-resolution photo looks fantastic, but it takes up a lot of space on your phone. Quantization is like saving that photo at a lower resolution, going from high definition to standard definition: you lose some detail, but the file gets considerably smaller. In this analogy, the photo is the large language model (LLM), and the file size is the model's footprint in memory (RAM) and in storage on disk.
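
To make the analogy concrete, here is a toy sketch of the block-wise idea behind Q8_0: weights are split into small blocks, and each block stores one shared scale factor plus 8-bit integers. This illustrates the concept only, not llama.cpp's actual implementation.

```python
# Toy sketch of Q8_0-style block quantization: 32 float weights per block,
# one shared scale, int8 values. Conceptual only, not llama.cpp's exact code.
import numpy as np

def quantize_q8_block(weights: np.ndarray):
    """Quantize one block of 32 float32 weights to int8 plus a scale."""
    amax = float(np.abs(weights).max())
    scale = amax / 127.0 if amax > 0 else 1.0  # avoid divide-by-zero
    q = np.round(weights / scale).astype(np.int8)
    return scale, q

def dequantize_q8_block(scale: float, q: np.ndarray) -> np.ndarray:
    """Reconstruct approximate float weights from the quantized block."""
    return scale * q.astype(np.float32)

block = np.random.randn(32).astype(np.float32)
scale, q = quantize_q8_block(block)
restored = dequantize_q8_block(scale, q)
print("max error:", np.abs(block - restored).max())  # small, but nonzero
```

The round-trip loses a little precision per weight (the "resolution" of the photo), which is why quantized models are smaller but not bit-identical to the originals.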

[Image: table of generalized speed and size estimates for the different quantization levels]

Extremely Important Caveats (Read This!)

Keep in mind that the table of estimates and ranges above is very generalized. Speed varies widely with your hardware, your software, the specific model, and other variables not listed here, so your mileage may vary. Have fun, be a computer scientist: try the different models, make your own observations and notes, evaluate them, and draw your own conclusions.
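
One easy observation to start with is raw generation speed. Below is a rough tokens-per-second measurement sketch (llama-cpp-python again; the path and settings are assumptions):

```python
# Rough tokens-per-second measurement for a GGUF model; results will vary
# with hardware, context size, and offload settings, as the caveats above say.
import time
from llama_cpp import Llama

llm = Llama(model_path="./Dolphin-Mistral-24B-Venice-Edition_Q8_0.gguf",
            n_ctx=2048, n_gpu_layers=-1, verbose=False)

start = time.perf_counter()
out = llm("Write a short paragraph about the ocean.", max_tokens=128)
elapsed = time.perf_counter() - start

generated = out["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.1f} tok/s")
```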

This model is based on the amazing model(s) and work at https://huggingface.co/cognitivecomputations

llama_model_quantize_impl: model size = 44961.58 MB
llama_model_quantize_impl: quant size = 23886.58 MB

main: quantize time = 204766.03 ms
main: total time = 204766.04 ms
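
Those two size lines let you sanity-check the compression: the file shrinks from about 44.9 GB to about 23.9 GB, which matches going from 16 bits per weight (FP16) to Q8_0's effective 8.5 bits per weight (8 data bits plus a shared 16-bit scale per 32-weight block):

```python
# Sanity-check the quantization log: bits per parameter before and after.
params = 23.6e9                      # parameter count from the GGUF metadata
orig_mb, quant_mb = 44961.58, 23886.58

for label, mb in [("FP16", orig_mb), ("Q8_0", quant_mb)]:
    bits = mb * 1024**2 * 8 / params
    print(f"{label}: {bits:.2f} bits/weight")
# FP16: ~15.98 bits/weight; Q8_0: ~8.49 bits/weight
print(f"compression ratio: {orig_mb / quant_mb:.2f}x")  # ~1.88x
```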

GGUF metadata: 23.6B params, llama architecture, 8-bit (Q8_0) quantization.
