Having trouble running quantization with llama.cpp
#8 · opened by bobchenyx
First of all, thanks for all the amazing work!

I pulled the BF16 weights and imatrix_unsloth.dat from unsloth/DeepSeek-V3-0324-GGUF-UD and tried to play around with llama.cpp quantization. However, I ran into the issue below (`tensor cols 128 x 512 are not divisible by 256`):
```
load_imatrix: imatrix dataset='unsloth_calibration_DeepSeek-V3-0324.txt'
load_imatrix: loaded 720 importance matrix entries from /home/user1/workspace/llm-work/unsloth/DeepSeek-V3-0324-GGUF-UD/imatrix_unsloth.dat computed on 60 chunks
prepare_imatrix: have 720 importance matrix entries
================================ Have weights data with 720 entries
[   1/1086]                    output.weight - [ 7168, 129280,     1,     1], type = bf16,
====== llama_model_quantize_impl: did not find weights for output.weight
converting to q8_0 .. size = 1767.50 MiB -> 938.98 MiB
[   2/1086]               output_norm.weight - [ 7168,     1,     1,     1], type = f32, size = 0.027 MB
[   3/1086]                token_embd.weight - [ 7168, 129280,     1,     1], type = bf16,
====== llama_model_quantize_impl: did not find weights for token_embd.weight
converting to q8_0 .. size = 1767.50 MiB -> 938.98 MiB
[   4/1086]            blk.0.attn_k_b.weight - [  128,   512,   128,     1], type = bf16,
llama_tensor_get_type : tensor cols 128 x 512 are not divisible by 256, required for iq1_m - using fallback quantization iq4_nl
====== llama_model_quantize_impl: imatrix size 128 is different from tensor size 16384 for blk.0.attn_k_b.weight
llama_model_quantize: failed to quantize: imatrix size 128 is different from tensor size 16384 for blk.0.attn_k_b.weight
main: failed to quantize model from '/home/user1/workspace/llm-work/unsloth/DeepSeek-V3-0324-GGUF-UD/BF16/DeepSeek-V3-0324-BF16-00001-of-00030.gguf'
```
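For anyone who wants to double-check which side the mismatch is on, here is a minimal sketch for dumping the entry names and value counts from the .dat file. It assumes the legacy (pre-GGUF) imatrix binary layout written by llama.cpp's `llama-imatrix` (an int32 entry count, then per entry: int32 name length, name bytes, int32 ncall, int32 nval, and nval float32 values); the script name and everything in it is just an illustrative helper, not part of llama.cpp, and the trailer fields after the entries are ignored.

```python
# inspect_imatrix.py -- hypothetical helper, not part of llama.cpp.
# Assumes the legacy imatrix.dat layout: int32 n_entries, then per entry
# int32 name_len, name bytes, int32 ncall, int32 nval, nval float32 values.
import struct
import sys

def read_i32(f):
    return struct.unpack("<i", f.read(4))[0]

path = sys.argv[1] if len(sys.argv) > 1 else "imatrix_unsloth.dat"
with open(path, "rb") as f:
    n_entries = read_i32(f)
    print(f"{n_entries} entries")
    for _ in range(n_entries):
        name = f.read(read_i32(f)).decode("utf-8")
        ncall = read_i32(f)
        nval = read_i32(f)
        f.seek(4 * nval, 1)  # skip the float values; only the count matters here
        if "attn_k_b" in name:
            print(f"{name}: nval={nval}, ncall={ncall}")
```

If I read the quantize code right, for a 3D tensor llama.cpp expects ne[0] x ne[2] imatrix values (here 128 x 128 = 16384), so if the file stores only 128 values for `blk.*.attn_k_b.weight`, the imatrix was presumably computed against a different attn_k_b layout than the one my build expects.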
I would like to kindly ask: is this a llama.cpp issue, or am I not using things correctly?

Here's my command for reference:
```sh
build/bin/llama-quantize \
    --imatrix unsloth/DeepSeek-V3-0324-GGUF-UD/imatrix_unsloth.dat \
    --token-embedding-type Q8_0 \
    --output-tensor-type Q8_0 \
    unsloth/DeepSeek-V3-0324-GGUF-UD/BF16/DeepSeek-V3-0324-BF16-00001-of-00030.gguf \
    DeepSeek-V3-0324-IQ1_M/DeepSeek-V3-0324-IQ1_M.gguf \
    IQ1_M \
    48 2>&1 | tee DeepSeek-V3-0324-IQ1_M.log
```
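One workaround worth trying (untested; it assumes `llama-quantize`'s `--exclude-weights` drops matching imatrix entries before the size check is reached, as its substring-matching behavior suggests) is to leave the attn_k_b tensors out of the imatrix entirely:

```sh
build/bin/llama-quantize \
    --imatrix unsloth/DeepSeek-V3-0324-GGUF-UD/imatrix_unsloth.dat \
    --exclude-weights attn_k_b \
    --token-embedding-type Q8_0 \
    --output-tensor-type Q8_0 \
    unsloth/DeepSeek-V3-0324-GGUF-UD/BF16/DeepSeek-V3-0324-BF16-00001-of-00030.gguf \
    DeepSeek-V3-0324-IQ1_M/DeepSeek-V3-0324-IQ1_M.gguf \
    IQ1_M \
    48 2>&1 | tee DeepSeek-V3-0324-IQ1_M.log
```

Since the log already shows attn_k_b falling back to iq4_nl (which, unlike iq1_m, does not strictly require an imatrix), those tensors should still quantize, just without imatrix guidance.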
bobchenyx changed discussion status to closed