---
base_model:
- nvidia/Llama-3_1-Nemotron-Ultra-253B-v1
pipeline_tag: text-generation
---
Big thanks to ymcki for updating the llama.cpp code to support the 'dummy' layers. If PR https://github.com/ggml-org/llama.cpp/pull/12843 has not been merged yet, build llama.cpp from that PR's branch.
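A minimal sketch of fetching and building the PR branch locally, assuming a standard CMake-based llama.cpp build (the local branch name `pr-12843` is just an illustrative choice):

```shell
# Clone llama.cpp and fetch the PR head into a local branch.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
git fetch origin pull/12843/head:pr-12843   # GitHub's read-only PR refspec
git checkout pr-12843

# Standard CMake release build of llama.cpp.
cmake -B build
cmake --build build --config Release
```

Once the PR is merged upstream, a regular clone of the main branch works and these steps are unnecessary.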
Note: the imatrix data used for the IQ quants was generated from the Q4 quant.
'Make knowledge free for everyone'
Quantized version of: nvidia/Llama-3_1-Nemotron-Ultra-253B-v1