How to input an image using llama-cpp-python
```python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="mradermacher/NuMarkdown-8B-Thinking-GGUF",
    filename="NuMarkdown-8B-Thinking.Q4_K_M.gguf",
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "https://blog.lipsumhub.com/wp-content/uploads/2024/07/lorem-ipsum-meaning-in-english-lipsumhub.jpg"},
        {"type": "text", "text": "Describe the image"},
    ],
}]

response = llm.create_chat_completion(
    messages=messages
)
response
```
Hi, I am initializing the model with llama-cpp-python, but I can't find any reference on how to pass in the image to be converted to markdown.
I am following this documentation: https://huggingface.co/docs/transformers/en/tasks/image_text_to_text
The code above does not work (the model says the image is BLANK). Could you point me to the correct code?
Thank you
For the model to have vision capabilities, you need to specify not only the GGUF containing the LLM stack but also the mmproj file containing the vision stack, which for this model is either NuMarkdown-8B-Thinking.mmproj-Q8_0.gguf or NuMarkdown-8B-Thinking.mmproj-f16.gguf.
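Here is a minimal sketch of how loading both files can look with llama-cpp-python's multimodal chat handlers. NuMarkdown-8B-Thinking is Qwen2.5-VL-based, so this assumes the Qwen25VLChatHandler available in recent llama-cpp-python releases (an older build would need a different handler). Note also that llama-cpp-python expects the OpenAI-style image_url content part, not the transformers-style {"type": "image", ...} part from your snippet:

```python
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Qwen25VLChatHandler  # assumes a recent llama-cpp-python

# Vision stack: the mmproj file from the same repo.
chat_handler = Qwen25VLChatHandler.from_pretrained(
    repo_id="mradermacher/NuMarkdown-8B-Thinking-GGUF",
    filename="NuMarkdown-8B-Thinking.mmproj-f16.gguf",
)

# LLM stack: the quantized text model, wired to the vision handler.
llm = Llama.from_pretrained(
    repo_id="mradermacher/NuMarkdown-8B-Thinking-GGUF",
    filename="NuMarkdown-8B-Thinking.Q4_K_M.gguf",
    chat_handler=chat_handler,
    n_ctx=8192,  # leave room for the image embedding in the context
)

# llama-cpp-python uses the OpenAI-style "image_url" content part.
response = llm.create_chat_completion(
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://blog.lipsumhub.com/wp-content/uploads/2024/07/lorem-ipsum-meaning-in-english-lipsumhub.jpg"}},
            {"type": "text", "text": "Describe the image"},
        ],
    }]
)
print(response["choices"][0]["message"]["content"])
```

mmproj-Q8_0.gguf works the same way if you want a smaller download; f16 keeps the vision projector at full precision.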