|
--- |
|
base_model: prithivMLmods/PocketThinker-QwQ-3B-Instruct |
|
datasets: |
|
- amphora/QwQ-LongCoT-130K |
|
- amphora/QwQ-LongCoT-130K-2 |
|
- amphora/verfiable-25k |
|
- amphora/m-math500 |
|
language: |
|
- en |
|
- zh |
|
library_name: transformers |
|
license: apache-2.0 |
|
pipeline_tag: text-generation |
|
tags: |
|
- Math |
|
- Code |
|
- Thinker |
|
- Reasoning |
|
- 3B |
|
- QwQ |
|
- Mini |
|
- text-generation-inference |
|
- SFT |
|
- llama-cpp |
|
- gguf-my-repo |
|
--- |
|
|
|
# Triangle104/PocketThinker-QwQ-3B-Instruct-Q4_K_S-GGUF |
|
This model was converted to GGUF format from [`prithivMLmods/PocketThinker-QwQ-3B-Instruct`](https://huggingface.co/prithivMLmods/PocketThinker-QwQ-3B-Instruct) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
|
Refer to the [original model card](https://huggingface.co/prithivMLmods/PocketThinker-QwQ-3B-Instruct) for more details on the model. |
|
|
|
--- |
|
## PocketThinker-QwQ-3B-Instruct
|
|
|
PocketThinker-QwQ-3B-Instruct is based on the Qwen2.5-3B-Instruct architecture and is designed as a lightweight, efficient reasoning assistant. It serves as the pocket-sized version of QwQ-LCoT-7B-Instruct, optimized for fast inference while maintaining strong problem-solving and computational capabilities. The model is fine-tuned for enhanced structured reasoning, minimal token wastage, and high-quality technical responses.
|
|
|
## Key Improvements
|
|
|
- **Optimized for Coding**: Specializes in generating structured, efficient code with minimal redundancy for smooth execution.
- **Compact yet Powerful**: Maintains strong problem-solving capabilities within a smaller 3B-parameter architecture, ensuring accessibility on resource-limited devices.
- **Advanced Reasoning Capabilities**: Excels in algorithmic problem-solving, mathematical reasoning, and structured technical explanations.
- **Efficient Memory Utilization**: Reduces computational overhead while maintaining high-quality outputs.
- **Focused Output Generation**: Avoids unnecessary token generation, ensuring concise and relevant responses.
|
|
|
|
|
## Intended Use
|
- **Code Generation & Optimization**: Supports developers in writing, refining, and optimizing code across multiple programming languages.
- **Algorithm & Mathematical Problem Solving**: Delivers precise solutions and structured explanations for complex problems.
- **Technical Documentation & Explanation**: Assists in generating well-structured documentation for libraries, APIs, and coding concepts.
- **Debugging Assistance**: Helps identify and correct errors in code snippets.
- **Educational Support**: Simplifies programming topics for students and learners with clear explanations.
- **Structured Data Processing**: Generates structured outputs such as JSON, XML, and tables for data science applications (see the sketch after this list).
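As a rough illustration of the structured-output use case, the prompt below asks the model to emit JSON through `llama-cli` (the repo and file flags match this quantization; the prompt itself and the `-n 256` token cap are illustrative assumptions, not a tested recipe):

```bash
# Illustrative only: ask the model to answer with JSON and nothing else.
# -n 256 caps the number of generated tokens.
llama-cli --hf-repo Triangle104/PocketThinker-QwQ-3B-Instruct-Q4_K_S-GGUF \
  --hf-file pocketthinker-qwq-3b-instruct-q4_k_s.gguf \
  -n 256 \
  -p "Respond with a JSON object only: {\"algorithm\": ..., \"time_complexity\": ...} for binary search."
```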
|
|
|
## Limitations
|
|
|
- **Hardware Constraints**: Although lighter than larger models, it still requires a moderately powerful GPU or TPU for optimal performance.
- **Potential Bias in Responses**: Outputs may reflect biases present in the training data.
- **Limited Creativity**: Results on non-technical, creative tasks can be inconsistent.
- **No Real-Time Awareness**: Lacks access to real-world events beyond its training cutoff.
- **Error Propagation in Long Responses**: Minor mistakes early in an output may affect the overall coherence of lengthy responses.
- **Prompt Sensitivity**: The effectiveness of responses depends on well-structured prompts.
|
|
|
--- |
|
## Use with llama.cpp |
|
Install llama.cpp through brew (works on Mac and Linux):
|
|
|
```bash
brew install llama.cpp
```
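To sanity-check the install, recent llama.cpp builds can report their version (assuming the `--version` flag is present in your build):

```bash
# Print llama.cpp build/version info to verify the installation.
llama-cli --version
```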
|
Invoke the llama.cpp server or the CLI. |
|
|
|
### CLI: |
|
```bash
llama-cli --hf-repo Triangle104/PocketThinker-QwQ-3B-Instruct-Q4_K_S-GGUF --hf-file pocketthinker-qwq-3b-instruct-q4_k_s.gguf -p "The meaning to life and the universe is"
```
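For interactive, multi-turn use, `llama-cli` also offers a conversation mode that applies the model's chat template; a minimal sketch, assuming a recent build where the `-cnv` flag is available:

```bash
# Start an interactive chat session using the model's chat template.
llama-cli --hf-repo Triangle104/PocketThinker-QwQ-3B-Instruct-Q4_K_S-GGUF \
  --hf-file pocketthinker-qwq-3b-instruct-q4_k_s.gguf \
  -cnv
```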
|
|
|
### Server: |
|
```bash
llama-server --hf-repo Triangle104/PocketThinker-QwQ-3B-Instruct-Q4_K_S-GGUF --hf-file pocketthinker-qwq-3b-instruct-q4_k_s.gguf -c 2048
```
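Once the server is running (it listens on `localhost:8080` by default), you can query its OpenAI-compatible chat endpoint; a minimal sketch, assuming the default host and port:

```bash
# Send a chat request to llama-server's OpenAI-compatible endpoint.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Write a Python function that checks if a number is prime."}
    ],
    "max_tokens": 256
  }'
```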
|
|
|
Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the llama.cpp repo.
|
|
|
Step 1: Clone llama.cpp from GitHub. |
|
```bash
git clone https://github.com/ggerganov/llama.cpp
```
|
|
|
Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (e.g. `LLAMA_CUDA=1` for NVIDIA GPUs on Linux).
|
```bash
cd llama.cpp && LLAMA_CURL=1 make
```
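Note that recent llama.cpp checkouts have deprecated the Makefile in favor of CMake, so if `make` fails, a rough equivalent is the following (assuming a recent checkout; for NVIDIA GPUs, `-DGGML_CUDA=ON` replaces the old `LLAMA_CUDA=1`):

```bash
# CMake-based build; the resulting binaries land in build/bin/.
cmake -B build
cmake --build build --config Release
```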
|
|
|
Step 3: Run inference through the main binary. |
|
```bash
./llama-cli --hf-repo Triangle104/PocketThinker-QwQ-3B-Instruct-Q4_K_S-GGUF --hf-file pocketthinker-qwq-3b-instruct-q4_k_s.gguf -p "The meaning to life and the universe is"
```
|
or |
|
```bash
./llama-server --hf-repo Triangle104/PocketThinker-QwQ-3B-Instruct-Q4_K_S-GGUF --hf-file pocketthinker-qwq-3b-instruct-q4_k_s.gguf -c 2048
```
|
|