How to input an image using llama-cpp-python

Michelangiolo:
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="mradermacher/NuMarkdown-8B-Thinking-GGUF",
    filename="NuMarkdown-8B-Thinking.Q4_K_M.gguf",
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "https://blog.lipsumhub.com/wp-content/uploads/2024/07/lorem-ipsum-meaning-in-english-lipsumhub.jpg"},
        {"type": "text", "text": "Describe the image"},
    ],
}]

response = llm.create_chat_completion(messages=messages)
response

Hi, I am initializing the model with llama_cpp, but I can't find any reference for how to pass in the image that should be converted to Markdown.
I am following this documentation: https://huggingface.co/docs/transformers/en/tasks/image_text_to_text
The code above does not work (the model says the image is BLANK). Could you point me to the correct code?

Thank you

For the model to have vision capabilities, you need to specify not only the GGUF containing the LLM stack but also the mmproj file containing the vision stack, which for this model is either NuMarkdown-8B-Thinking.mmproj-Q8_0.gguf or NuMarkdown-8B-Thinking.mmproj-f16.gguf.
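
In llama-cpp-python, that means loading the mmproj through a vision chat handler and passing it to Llama. Below is a minimal sketch, assuming a recent llama-cpp-python that ships Qwen25VLChatHandler (NuMarkdown-8B-Thinking is a Qwen2.5-VL fine-tune; on older versions you would need a handler matching the model's vision architecture). Note also that create_chat_completion expects the OpenAI-style "image_url" content part rather than "image":

from llama_cpp import Llama
from llama_cpp.llama_chat_format import Qwen25VLChatHandler  # assumes a recent llama-cpp-python

# Vision stack: load the mmproj GGUF through the chat handler
chat_handler = Qwen25VLChatHandler.from_pretrained(
    repo_id="mradermacher/NuMarkdown-8B-Thinking-GGUF",
    filename="NuMarkdown-8B-Thinking.mmproj-Q8_0.gguf",
)

# LLM stack: load the text-model GGUF with the handler attached
llm = Llama.from_pretrained(
    repo_id="mradermacher/NuMarkdown-8B-Thinking-GGUF",
    filename="NuMarkdown-8B-Thinking.Q4_K_M.gguf",
    chat_handler=chat_handler,
    n_ctx=8192,  # image tokens consume context; the default 512 is far too small
)

messages = [{
    "role": "user",
    "content": [
        # llama-cpp-python uses the OpenAI-style "image_url" content type
        {"type": "image_url", "image_url": {"url": "https://blog.lipsumhub.com/wp-content/uploads/2024/07/lorem-ipsum-meaning-in-english-lipsumhub.jpg"}},
        {"type": "text", "text": "Describe the image"},
    ],
}]

response = llm.create_chat_completion(messages=messages)
print(response["choices"][0]["message"]["content"])

Without the chat_handler, Llama loads only the text weights, so the model never sees the image and responds as if it were blank.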
