fail
🥲 Failed to load the model
Failed to load model
error loading model: missing tensor 'blk.0.ffn_down_exps.weight'
I'm facing the same issue when running on a GH200 using the CUDA Deep Learning image. https://catalog.ngc.nvidia.com/orgs/nvidia/containers/cuda-dl-base
the same error
$ ollama run mixtral:8x7b-ins-v0.1-q8 "hi"
Error: llama runner process has terminated: error loading model: missing tensor 'blk.0.ffn_down_exps.weight'
$ ollama show mixtral:8x7b-ins-v0.1-q8
  Model
    architecture        llama
    parameters          46.7B
    context length      32768
    embedding length    4096
    quantization        Q8_0

  Parameters
    stop    "[INST]"
    stop    "[/INST]"
$ ollama --version
ollama version is 0.6.2
same error
llama_model_load: error loading model: missing tensor 'blk.0.ffn_down_exps.weight'
Model mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf (blk.0.ffn_down_exps.weight)
Maybe a corrupt download? I'll try downloading it again then, 3080 Ti here. Will update if it works.
Git LFS Details
SHA256: 9193684683657e90707087bd1ed19fd0b277ab66358d19edeadc26d6fdec4f53
Pointer size: 136 Bytes
Size of remote file: 26.4 GB
C:\Users\User>certutil -hashfile "C:\AI\text-generation-webui-main\user_data\models\mixtral.gguf" SHA256
SHA256 hash of C:\AI\text-generation-webui-main\user_data\models\mixtral.gguf:
9193684683657e90707087bd1ed19fd0b277ab66358d19edeadc26d6fdec4f53
CertUtil: -hashfile command completed successfully.
My hash is the same, damn, what is wrong with this...
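(For anyone who wants to run the same check outside Windows, here is a minimal Python equivalent; the file path is a placeholder, and the expected checksum is the one from the Git LFS details above:)

import hashlib

# Compute the SHA256 of the downloaded GGUF in chunks and compare it
# against the checksum shown on the Hugging Face "Git LFS Details" page.
path = "mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf"  # placeholder path
expected = "9193684683657e90707087bd1ed19fd0b277ab66358d19edeadc26d6fdec4f53"

h = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1024 * 1024), b""):
        h.update(chunk)

print("match" if h.hexdigest() == expected else "mismatch", h.hexdigest())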
This quant is more than a year old, and there was a breaking change in how llama.cpp handles MoE models about 5 months ago; see this post for more details:
https://huggingface.co/posts/bartowski/894091265291588
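If you want to confirm that this is what's happening with your file, you can list its tensor names with the gguf Python package (pip install gguf). A minimal sketch, assuming the package is installed and the file path is passed as an argument; an old MoE quant will show per-expert tensors such as blk.0.ffn_down.0.weight instead of the merged blk.0.ffn_down_exps.weight that current llama.cpp is looking for:

import sys
from gguf import GGUFReader  # ships with llama.cpp's gguf-py (pip install gguf)

# Print the MoE feed-forward tensor names so you can see which layout the file uses.
reader = GGUFReader(sys.argv[1])  # path to the .gguf file
for tensor in reader.tensors:
    if "ffn_down" in tensor.name or "ffn_up" in tensor.name or "ffn_gate" in tensor.name:
        print(tensor.name, list(tensor.shape))

If it prints the old per-expert names, the fix is to download a quant made with a recent llama.cpp (or re-convert the original weights yourself) rather than fighting the loader.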