---
license: apache-2.0
language:
- en
base_model:
- mistralai/Mistral-Small-3.1-24B-Base-2503
---
|
|
|
# Model Card for Private-BitSix-Mistral-Small-3.1-24B-Instruct-2503

Building upon Mistral Small 3 (2501) and Mistral Small 3.1 (2503), Private-BitSix-Mistral-Small-3.1-24B-Instruct-2503 introduces 6-bit quantization to improve efficiency and reduce memory usage without sacrificing performance. With 24 billion parameters, the model supports long contexts of up to 128k tokens and maintains top-tier performance in both text and vision tasks.

This model is an instruction-finetuned and 6-bit quantized version of Mistral-Small-3.1-24B-Base-2503.

# CPP Agent (Usage): Private-BitSix-Mistral-Small-3.1-24B-Instruct-2503

https://huggingface.co/spaces/ginigen/Private-BitSix-Mistral
|
|
|
|
|
# Key Features

- **6-Bit Quantization:** Improved memory efficiency, allowing the model to run on lower-end hardware without compromising quality.
- **Knowledge-Dense Architecture:** Fits within a single RTX 4090 or a 32GB RAM MacBook once quantized.
- **Enhanced Long Context Understanding:** Supports up to 128k tokens, providing superior performance for extended documents.
- **State-of-the-Art Vision Understanding:** Optimized for tasks involving both text and visual comprehension.
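To make the memory claim concrete, here is a back-of-envelope sketch of the weight footprint under 6-bit quantization. It assumes llama.cpp's Q6_K format, which stores roughly 6.5625 effective bits per weight including block overhead; the 24B parameter count comes from the model card above.

```shell
# Rough weight-memory estimate for a 24B model quantized to ~6.56 bits/weight (Q6_K).
params=24000000000   # 24B parameters
bpw=6.5625           # approximate effective bits per weight for Q6_K
awk -v p="$params" -v b="$bpw" 'BEGIN { printf "~%.1f GB for weights\n", p * b / 8 / 1e9 }'
# prints: ~19.7 GB for weights
```

This is why the quantized model fits on a single RTX 4090 (24 GB) or a 32GB MacBook, whereas the same weights in f16 would need roughly 48 GB.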
|
|
|
# Ideal Use Cases

- Fast-Response Conversational Agents
- Low-Latency Function Calling
- Subject Matter Experts via Fine-Tuning
- Local Inference for Hobbyists and Organizations Handling Sensitive Data
- Programming and Math Reasoning
- Long Document Understanding
- Visual Understanding
|
|
|
# Deployment

This model can be deployed locally with 6-bit quantization, ensuring both high performance and efficiency on compatible hardware.
|
|
|
# Model Capabilities

- **Vision:** Vision capabilities enable the model to analyze images and provide insights based on visual content in addition to text.
- **Multilingual:** Supports dozens of languages, including English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Swedish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, and Farsi.
- **Agent-Centric:** Offers best-in-class agentic capabilities with native function calling and JSON output.
- **Advanced Reasoning:** State-of-the-art conversational and reasoning capabilities.
- **Apache 2.0 License:** Open license allowing usage and modification for both commercial and non-commercial purposes.
- **Context Window:** 128k tokens.
- **System Prompt:** Maintains strong adherence and support for system prompts.
- **Tokenizer:** Uses a Tekken tokenizer with a 131k vocabulary size.
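Note that a 128k context window has a memory cost of its own beyond the quantized weights: the KV cache grows linearly with context length. The sketch below shows the arithmetic with f16 KV entries; the layer, head, and dimension values are illustrative placeholders, not published figures for this model.

```shell
# Illustrative KV-cache sizing at full 128k context:
# bytes = 2 (K and V) * layers * kv_heads * head_dim * ctx * bytes_per_element
awk 'BEGIN {
  layers = 40; kv_heads = 8; head_dim = 128   # placeholder dimensions
  ctx = 131072                                 # 128k tokens
  bytes_per = 2                                # f16 KV entries
  printf "~%.1f GB KV cache at 128k context\n", 2 * layers * kv_heads * head_dim * ctx * bytes_per / 1e9
}'
# prints: ~21.5 GB KV cache at 128k context
```

In practice, llama.cpp's `-c` flag (as in the server example below, `-c 2048`) caps the context length and therefore the KV-cache allocation, so short-context deployments pay only a small fraction of this.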
|
|
|
# GGUF Format Conversion

This model was converted to GGUF format from Private-BitSix-Mistral-Small-3.1-24B-Instruct-2503 using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.
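If you download the GGUF file manually rather than via the `--hf-repo` flags below, a quick integrity check is possible because every GGUF file begins with the 4-byte ASCII magic `GGUF`. A small sketch (the file path is illustrative):

```shell
# Check whether a file carries the GGUF magic bytes.
check_gguf() {
  if [ "$(head -c 4 "$1")" = "GGUF" ]; then
    echo "valid GGUF magic"
  else
    echo "not a GGUF file"
  fi
}

check_gguf private-bitsix-mistral-small-3.1-24b-instruct-2503.gguf
```

This only verifies the header, not the full file; a truncated download with an intact header would still pass.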
|
|
|
# Use with llama.cpp
|
|
|
Install llama.cpp via Homebrew (works on macOS and Linux):
|
|
|
```bash
brew install llama.cpp
```
|
|
|
Invoke the llama.cpp server or the CLI.

CLI:
|
|
|
```bash
llama-cli --hf-repo ginigen/Private-BitSix-Mistral-Small-3.1-24B-Instruct-2503 --hf-file private-bitsix-mistral-small-3.1-24b-instruct-2503.gguf -p "The meaning to life and the universe is"
```
|
|
|
Server:
|
|
|
```bash
llama-server --hf-repo openfree/Private-BitSix-Mistral-Small-3.1-24B-Instruct-2503 --hf-file private-bitsix-mistral-small-3.1-24b-instruct-2503.gguf -c 2048
```
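Once the server is running, it exposes an OpenAI-compatible chat endpoint; by default llama-server listens on `http://localhost:8080`. A minimal request sketch (the prompt text is just an example):

```shell
# Query the running llama-server via its OpenAI-compatible endpoint.
payload='{"messages":[{"role":"user","content":"Explain 6-bit quantization in one sentence."}],"max_tokens":64}'
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "$payload" || echo "request failed (is llama-server running?)"
```

Because the endpoint follows the OpenAI chat-completions shape, existing OpenAI client libraries can also be pointed at the local server by overriding their base URL.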
|
|
|
# Setup Instructions

Step 1: Clone llama.cpp from GitHub.
|
|
|
```bash
git clone https://github.com/ggerganov/llama.cpp
```
|
|
|
Step 2: Move into the llama.cpp folder and build it with the LLAMA_CURL=1 flag, along with any hardware-specific flags (e.g., LLAMA_CUDA=1 for Nvidia GPUs on Linux).
|
|
|
```bash
cd llama.cpp && LLAMA_CURL=1 make
```
|
|
|
Step 3: Run inference through the main binary.

```bash
./llama-cli --hf-repo openfree/Private-BitSix-Mistral-Small-3.1-24B-Instruct-2503 --hf-file private-bitsix-mistral-small-3.1-24b-instruct-2503.gguf -p "The meaning to life and the universe is"
```
|
|