hellork committed
Commit cf0cd85 · verified · 1 Parent(s): 0a4784b

Update README.md

Files changed (1)
  1. README.md +21 -0
README.md CHANGED
@@ -19,6 +19,8 @@ tags:
 - gguf-my-repo
 ---
 
+# TESTING...TESTING! The quantization used on this model may reduce quality, but it is hopefully faster and may be usable with 4GB of VRAM. TESTING...
+
 # hellork/BlenderLLM-IQ3_XXS-GGUF
 This model was converted to GGUF format from [`FreedomIntelligence/BlenderLLM`](https://huggingface.co/FreedomIntelligence/BlenderLLM) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
 Refer to the [original model card](https://huggingface.co/FreedomIntelligence/BlenderLLM) for more details on the model.
@@ -30,6 +32,25 @@ Install llama.cpp through brew (works on Mac and Linux)
 brew install llama.cpp
 
 ```
+
+# Compile to take advantage of `Nvidia CUDA` hardware:
+
+```bash
+git clone https://github.com/ggerganov/llama.cpp.git
+cd llama*
+# Look at the docs for other hardware builds, or to make sure none of this has changed.
+
+cmake -B build -DGGML_CUDA=ON
+cmake --build build --config Release # optional: add -j6, or any number less than your core count
+
+# If your version of gcc is newer than 12 and the build gives errors, use conda to install gcc-12 and activate it.
+# Then run the above cmake commands again.
+# Finally, run `conda deactivate` and re-run the last line once more to link the build outside of conda.
+
+# Add the -ngl 33 flag to the commands below to take advantage of all the GPU layers.
+# If it uses too much GPU memory and crashes, use a lower number.
+```
+
 Invoke the llama.cpp server or the CLI.
 
 ### CLI:
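
As a hedged sketch of how the `-ngl 33` flag mentioned in the build comments above fits into the llama.cpp CLI invocation; the `--hf-file` value and the prompt are placeholders, not taken from this repo:

```bash
# Hypothetical invocation; check the repo's file list for the actual GGUF filename.
llama-cli --hf-repo hellork/BlenderLLM-IQ3_XXS-GGUF \
  --hf-file <model-file>.gguf \
  -p "Write a Blender Python script that models a simple chair." \
  -ngl 33   # offload all layers to the GPU; lower this number if it crashes
```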
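
For the gcc-12 workaround mentioned in the build comments above, a minimal sketch assuming conda-forge packages; the package names and environment name are assumptions, not part of the original instructions:

```bash
# Assumption: conda-forge ships gcc/gxx 12 packages for your platform; names may differ.
conda create -n gcc12 -c conda-forge gcc=12 gxx=12
conda activate gcc12
rm -rf build                                   # drop the old CMake cache so the new compiler is picked up
CC=gcc CXX=g++ cmake -B build -DGGML_CUDA=ON   # re-run the configure step with gcc 12 active
cmake --build build --config Release
conda deactivate
cmake --build build --config Release           # run the build once more outside conda, as noted above
```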