Update README.md
README.md CHANGED
@@ -55,21 +55,21 @@ cp llama.cpp/build/bin/llama-* llama.cpp
 
 from huggingface_hub import snapshot_download
 snapshot_download(
-    repo_id = "unsloth/
-    local_dir = "
-    allow_patterns = ["*
+    repo_id = "unsloth/r1-1776-GGUF",
+    local_dir = "r1-1776-GGUF",
+    allow_patterns = ["*Q4_K_M*"], # Select quant type Q4_K_M for 4.5bit
 )
 ```
 5. Example with a Q4_0-quantized K cache. **Notice: -no-cnv disables auto conversation mode.**
 ```bash
 ./llama.cpp/llama-cli \
-    --model
+    --model r1-1776-GGUF/Q4_K_M/r1-1776-Q4_K_M-00001-of-00009.gguf \
     --cache-type-k q4_0 \
     --threads 12 -no-cnv --prio 2 \
     --temp 0.6 \
     --ctx-size 8192 \
     --seed 3407 \
-    --prompt "<|User|>Create a Flappy Bird game in Python.<|Assistant
+    --prompt "<|User|>Create a Flappy Bird game in Python.<|Assistant|><think>\n"
 ```
 Example output:
 
@@ -85,19 +85,19 @@ snapshot_download(
 6. If you have a GPU with 24GB of VRAM (an RTX 4090, for example), you can offload multiple layers to it for faster processing. With multiple GPUs, you can probably offload more layers.
 ```bash
 ./llama.cpp/llama-cli \
-    --model
+    --model r1-1776-GGUF/Q4_K_M/r1-1776-Q4_K_M-00001-of-00009.gguf \
     --cache-type-k q4_0 \
     --threads 12 -no-cnv --prio 2 \
     --n-gpu-layers 7 \
     --temp 0.6 \
     --ctx-size 8192 \
     --seed 3407 \
-    --prompt "<|User|>Create a Flappy Bird game in Python.<|Assistant
+    --prompt "<|User|>Create a Flappy Bird game in Python.<|Assistant|><think>\n"
 ```
 7. If you want to merge the split weights into a single file, use this command:
 ```
 ./llama.cpp/llama-gguf-split --merge \
-
+    r1-1776-GGUF/Q4_K_M/r1-1776-Q4_K_M-00001-of-00009.gguf \
     merged_file.gguf
 ```
 
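The snapshot_download call in step 4 pulls nine Q4_K_M shards. Below is a minimal sanity check to run before launching llama-cli; it is a sketch that assumes the r1-1776-GGUF/Q4_K_M/ layout implied by the --model path above and uses only the Python standard library.

```python
# Sanity check after step 4: confirm all nine Q4_K_M shards are present
# before pointing llama-cli at the first one. Paths mirror the README's
# --model flag; adjust if you chose a different quant or local_dir.
from pathlib import Path

shards = sorted(Path("r1-1776-GGUF/Q4_K_M").glob("*.gguf"))
for shard in shards:
    print(f"{shard.name}: {shard.stat().st_size / 1e9:.1f} GB")

# The -of-00009 suffix in the filenames says how many shards to expect.
assert len(shards) == 9, f"expected 9 shards, found {len(shards)}"
```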
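Because -no-cnv disables llama-cli's own conversation handling, the --prompt strings in steps 5 and 6 hand-roll the R1-style chat template. The helper below sketches that format; build_prompt is a hypothetical name, not part of any library.

```python
# Hypothetical helper mirroring the --prompt format used in steps 5 and 6:
# R1-style tags around the user turn, with "<think>\n" appended so the
# model starts by emitting its reasoning trace.
def build_prompt(user_message: str) -> str:
    return f"<|User|>{user_message}<|Assistant|><think>\n"

print(build_prompt("Create a Flappy Bird game in Python."))
```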
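If you script these runs, the step 6 invocation wraps cleanly in subprocess. run_llama below is a hypothetical wrapper, not a llama.cpp Python API; the flag values simply mirror the README commands.

```python
# Sketch of launching the step 6 command from Python. run_llama is a
# hypothetical wrapper; every flag value comes straight from the README.
import subprocess

def run_llama(prompt: str, n_gpu_layers: int = 7) -> None:
    subprocess.run(
        [
            "./llama.cpp/llama-cli",
            "--model", "r1-1776-GGUF/Q4_K_M/r1-1776-Q4_K_M-00001-of-00009.gguf",
            "--cache-type-k", "q4_0",
            "--threads", "12", "-no-cnv", "--prio", "2",
            "--n-gpu-layers", str(n_gpu_layers),  # drop for CPU-only runs (step 5)
            "--temp", "0.6",
            "--ctx-size", "8192",
            "--seed", "3407",
            "--prompt", prompt,
        ],
        check=True,
    )

run_llama("<|User|>Create a Flappy Bird game in Python.<|Assistant|><think>\n")
```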
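Step 7 can be scripted the same way. As in the README command, llama-gguf-split --merge takes the first shard and the output path; this sketch assumes the tool locates the remaining shards from the first split's metadata.

```python
# Sketch of step 7: merge the nine shards into a single GGUF file.
# llama-gguf-split is given only the first shard and the output path,
# exactly as in the README command above.
import subprocess

subprocess.run(
    [
        "./llama.cpp/llama-gguf-split", "--merge",
        "r1-1776-GGUF/Q4_K_M/r1-1776-Q4_K_M-00001-of-00009.gguf",
        "merged_file.gguf",
    ],
    check=True,
)
```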