danielhanchen committed
Commit 8a7a0c8 · verified · 1 Parent(s): 0c4407f

Update README.md

Files changed (1):
  1. README.md +8 -8
README.md CHANGED
@@ -55,21 +55,21 @@ cp llama.cpp/build/bin/llama-* llama.cpp
 
  from huggingface_hub import snapshot_download
  snapshot_download(
- repo_id = "unsloth/DeepSeek-R1-GGUF",
- local_dir = "DeepSeek-R1-GGUF",
- allow_patterns = ["*UD-IQ1_S*"], # Select quant type UD-IQ1_S for 1.58bit
+ repo_id = "unsloth/r1-1776-GGUF",
+ local_dir = "r1-1776-GGUF",
+ allow_patterns = ["*Q4_K_M*"], # Select quant type Q4_K_M for 4.5bit
  )
  ```
  5. Example with Q4_0 K quantized cache **Notice -no-cnv disables auto conversation mode**
  ```bash
  ./llama.cpp/llama-cli \
- --model DeepSeek-R1-GGUF/DeepSeek-R1-UD-IQ1_S/DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf \
+ --model r1-1776-GGUF/Q4_K_M/r1-1776-Q4_K_M-00001-of-00009.gguf \
  --cache-type-k q4_0 \
  --threads 12 -no-cnv --prio 2 \
  --temp 0.6 \
  --ctx-size 8192 \
  --seed 3407 \
- --prompt "<|User|>Create a Flappy Bird game in Python.<|Assistant|>"
+ --prompt "<|User|>Create a Flappy Bird game in Python.<|Assistant|><think>\n"
  ```
  Example output:
 
@@ -85,19 +85,19 @@ snapshot_download(
  6. If you have a GPU (RTX 4090 for example) with 24GB, you can offload multiple layers to the GPU for faster processing. If you have multiple GPUs, you can probably offload more layers.
  ```bash
  ./llama.cpp/llama-cli \
- --model DeepSeek-R1-GGUF/DeepSeek-R1-UD-IQ1_S/DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf \
+ --model r1-1776-GGUF/Q4_K_M/r1-1776-Q4_K_M-00001-of-00009.gguf \
  --cache-type-k q4_0 \
  --threads 12 -no-cnv --prio 2 \
  --n-gpu-layers 7 \
  --temp 0.6 \
  --ctx-size 8192 \
  --seed 3407 \
- --prompt "<|User|>Create a Flappy Bird game in Python.<|Assistant|>"
+ --prompt "<|User|>Create a Flappy Bird game in Python.<|Assistant|><think>\n"
  ```
  7. If you want to merge the weights together, use this script:
  ```
  ./llama.cpp/llama-gguf-split --merge \
- DeepSeek-R1-GGUF/DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf \
+ r1-1776-GGUF/Q4_K_M/r1-1776-Q4_K_M-00001-of-00009.gguf \
  merged_file.gguf
  ```
 
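For reference, the download and launch steps touched by this commit can also be driven from a single Python script. This is a minimal sketch, not part of the commit itself: it assumes llama.cpp is already built at `./llama.cpp/llama-cli`, that the Q4_K_M shards land under `r1-1776-GGUF/Q4_K_M/` as in the README, and simply composes the same flags shown in the diff above.

```python
# Minimal end-to-end sketch of the updated README steps (download, then run).
# Paths and flags mirror the diff above; adjust them if your layout differs.
import subprocess
from huggingface_hub import snapshot_download

# Download only the Q4_K_M shards of the repo (~4.5 bits per weight).
snapshot_download(
    repo_id = "unsloth/r1-1776-GGUF",
    local_dir = "r1-1776-GGUF",
    allow_patterns = ["*Q4_K_M*"],
)

# The trailing <think>\n pre-fills the start of the model's reasoning block,
# matching the prompt format introduced by this commit.
prompt = "<|User|>Create a Flappy Bird game in Python.<|Assistant|><think>\n"

# Pointing --model at the first shard is enough; llama.cpp loads the rest.
subprocess.run([
    "./llama.cpp/llama-cli",
    "--model", "r1-1776-GGUF/Q4_K_M/r1-1776-Q4_K_M-00001-of-00009.gguf",
    "--cache-type-k", "q4_0",
    "--threads", "12", "-no-cnv", "--prio", "2",
    "--temp", "0.6",
    "--ctx-size", "8192",
    "--seed", "3407",
    "--prompt", prompt,
], check = True)
```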