Introduction
Use https://github.com/im0qianqian/llama.cpp to quantize.
For model inference, please download our release package from this url https://github.com/im0qianqian/llama.cpp/releases .
Quick start
# Use a local model file
llama-cli -m my_model.gguf
# Launch OpenAI-compatible API server
llama-server -m my_model.gguf
Demo
PR
Let's look forward to the following PR being merged:
- Downloads last month
- 1,696
Hardware compatibility
Log In
to view the estimation
2-bit
4-bit
6-bit
8-bit
Inference Providers
NEW
This model isn't deployed by any Inference Provider.
๐
Ask for provider support
Model tree for inclusionAI/Ling-flash-2.0-GGUF
Base model
inclusionAI/Ling-flash-base-2.0
Finetuned
inclusionAI/Ling-flash-2.0