Having Trouble playing quantize with llama.cpp

#8
by bobchenyx - opened

First of all, thanks for all the amazing work!

I pulled the BF16 weights and `imatrix_unsloth.dat` from unsloth/DeepSeek-V3-0324-GGUF-UD

and tried to play around with llama.cpp quantizations.

However, I ran into the issue below: `tensor cols 128 x 512 are not divisible by 256`.

```
================================ Have weights data with 720 entries
[   1/1086]                        output.weight - [ 7168, 129280,     1,     1], type =   bf16,
====== llama_model_quantize_impl: did not find weights for output.weight
converting to q8_0 .. load_imatrix: imatrix dataset='unsloth_calibration_DeepSeek-V3-0324.txt'
load_imatrix: loaded 720 importance matrix entries from /home/user1/workspace/llm-work/unsloth/DeepSeek-V3-0324-GGUF-UD/imatrix_unsloth.dat computed on 60 chunks
prepare_imatrix: have 720 importance matrix entries
size =  1767.50 MiB ->   938.98 MiB
[   2/1086]                   output_norm.weight - [ 7168,     1,     1,     1], type =    f32, size =    0.027 MB
[   3/1086]                    token_embd.weight - [ 7168, 129280,     1,     1], type =   bf16,
====== llama_model_quantize_impl: did not find weights for token_embd.weight
converting to q8_0 .. size =  1767.50 MiB ->   938.98 MiB
[   4/1086]                blk.0.attn_k_b.weight - [  128,   512,   128,     1], type =   bf16,

llama_tensor_get_type : tensor cols 128 x 512 are not divisible by 256, required for iq1_m - using fallback quantization iq4_nl

====== llama_model_quantize_impl: imatrix size 128 is different from tensor size 16384 for blk.0.attn_k_b.weight
llama_model_quantize: failed to quantize: imatrix size 128 is different from tensor size 16384 for blk.0.attn_k_b.weight
main: failed to quantize model from '/home/user1/workspace/llm-work/unsloth/DeepSeek-V3-0324-GGUF-UD/BF16/DeepSeek-V3-0324-BF16-00001-of-00030.gguf'
```

I'd like to kindly ask: is this a llama.cpp issue, or am I not using things correctly?
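For what it's worth, the numbers in the error line themselves are consistent: `blk.0.attn_k_b.weight` is a 3D tensor of shape `[128, 512, 128]`, and 128 × 128 = 16384, which is exactly the "tensor size 16384" that the quantizer expects, while the imatrix file only stored 128 entries for that tensor. Here's a minimal sketch of that arithmetic (my assumption about what the length check compares, inferred from the log above, not the actual llama.cpp code):

```python
# Shape of blk.0.attn_k_b.weight, copied from the quantize log.
ne = [128, 512, 128, 1]

# Assumption: for a 3D tensor the quantizer wants one importance value
# per column per slice, i.e. ne[0] * ne[2] entries.
expected = ne[0] * ne[2]

# Entries actually present in imatrix_unsloth.dat for this tensor,
# per the error message ("imatrix size 128").
stored = 128

print(expected)            # 16384 -- matches "tensor size 16384" in the error
print(expected == stored)  # False -> the mismatch that aborts quantization
```

So the imatrix appears to have been computed against a differently-shaped version of this tensor, which is why quantization aborts rather than falling back.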

Here's my command for reference:

```shell
build/bin/llama-quantize \
    --imatrix unsloth/DeepSeek-V3-0324-GGUF-UD/imatrix_unsloth.dat \
    --token-embedding-type Q8_0 \
    --output-tensor-type Q8_0 \
    unsloth/DeepSeek-V3-0324-GGUF-UD/BF16/DeepSeek-V3-0324-BF16-00001-of-00030.gguf \
    DeepSeek-V3-0324-IQ1_M/DeepSeek-V3-0324-IQ1_M.gguf \
    IQ1_M \
    48 2>&1 | tee DeepSeek-V3-0324-IQ1_M.log
```
bobchenyx changed discussion status to closed
