ibm-granite-docling-258M-GGUF

This is the GGUF version of the ibm-granite/granite-docling-258M model.

Model Information

  • Model Name: granite-docling-258M
  • Base Model: ibm-granite/granite-docling-258M
  • License: Apache-2.0
  • Pipeline Tag: image-text-to-text
  • Language: English
  • Model Size: 258M
  • Model Format: GGUF

Description

Granite Docling is an instruction-tuned model designed for document understanding tasks. It is fine-tuned on a diverse set of tasks including document classification, information extraction, and question answering, and is optimized for document-centric workloads across a variety of document formats and layouts.

Usage

This model requires a llama.cpp build with Granite Docling support. You can either run the prebuilt Docker image or build llama.cpp from the fork referenced below.

Run with docker:

docker run -p 8080:8080 ghcr.io/danchev/llama.cpp:docling \
  --server \
  -hf danchev/ibm-granite-docling-258M-GGUF \
  --host 0.0.0.0 \
  --port 8080

Build from source:

git clone git@github.com:gabe-l-hart/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build --config Release -j $(nproc)

Once you have llama.cpp set up, you can use the following command to run the model:

./build/bin/llama-server -hf danchev/ibm-granite-docling-258M-GGUF
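Before sending requests, you can check that the server has finished loading the model. A minimal sketch using only the Python standard library, assuming llama-server's default /health endpoint and the default port 8080:

```python
import urllib.request
from urllib.error import URLError


def server_ready(base_url: str = "http://localhost:8080") -> bool:
    """Return True if llama-server's /health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
            return resp.status == 200
    except (URLError, OSError):
        # Connection refused, DNS failure, or timeout: server not ready.
        return False
```

Polling this in a loop is handy in scripts, since the server returns an error until the model weights are fully loaded.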

Example Request

You can then send requests to the server using curl. Here is an example request:

curl -X POST "http://localhost:8080/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "ibm-granite/granite-docling-258M",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
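The same request can be issued from Python. Below is a minimal stdlib-only sketch (the helper names are illustrative, not part of any API) that builds the same OpenAI-style chat-completions payload as the curl example and posts it to the server started above:

```python
import json
import urllib.request

SERVER_URL = "http://localhost:8080/v1/chat/completions"


def build_payload(prompt: str, image_url: str) -> dict:
    """Build an OpenAI-style chat-completions payload with one text part and one image part."""
    return {
        "model": "ibm-granite/granite-docling-258M",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def describe_image(prompt: str, image_url: str) -> str:
    """POST the payload to llama-server and return the assistant's reply text."""
    data = json.dumps(build_payload(prompt, image_url)).encode("utf-8")
    req = urllib.request.Request(
        SERVER_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(
        describe_image(
            "Describe this image in one sentence.",
            "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
        )
    )
```

The response parsing assumes the standard OpenAI chat-completions response shape (`choices[0].message.content`), which is what llama-server's `/v1/chat/completions` endpoint emits.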