ds4sd/SmolDocling-256M-preview (Image-Text-to-Text)


asnassar committed on Feb 13

Commit 848bd04 (verified), parent ee67c58

Update README.md

Files changed (1)

README.md +4 -5


@@ -24,7 +24,7 @@ SmolDocling is a multimodal Image-Text-to-Text model designed for efficient docu
 - 📝 **Caption Correspondence** – Links captions to relevant images and figures.
 - 📜 **List Grouping** – Organizes and structures list elements correctly.
 - 📄 **Full-Page Conversion** – Processes entire pages for comprehensive document transformation.
-- 📂 **General Document Processing** – Optimized for non-scientific documents.
+- 📂 **General Document Processing** – Trained for non-scientific documents and scientific.
 - 🔄 **Seamless Docling Integration** – Import into **Docling** and export in multiple formats.
 - 📚 **Multi-Page & Full Document Conversion** – *Coming soon!* 🚧
@@ -33,7 +33,7 @@ SmolDocling is a multimodal Image-Text-to-Text model designed for efficient docu
 **Demo [optional]:** [More Information Needed]
-## Model Summary
+#### Model Summary
 - **Developed by:** Docling Team
 - **Model type:** Multi-modal model (image+text)
@@ -42,12 +42,11 @@ SmolDocling is a multimodal Image-Text-to-Text model designed for efficient docu
 - **Finetuned from model:** Based on [Idefics3](https://huggingface.co/HuggingFaceM4/Idefics3-8B-Llama3) (see technical summary)
-### How to get started
+## How to get started
 You can use transformers or docling to perform inference:
-# Transformers:
+#### Transformers:
 ```python
 import torch
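The hunk headers (`@@ -old_start,old_count +new_start,new_count @@`) explain why the file stat reads +4 -5 even though only four removed lines are visible. Within a hunk, the new count minus the old count equals lines added minus lines removed, so summing that difference over the hunks shown should match the stat line (4 - 5 = -1). Below is a minimal Python sketch, assuming the standard unified-diff header syntax, in which an omitted count means a one-line hunk. `parse_hunk_header` is a hypothetical helper for illustration, not part of any Hugging Face or git tooling.

```python
import re

# Unified-diff hunk header: "@@ -old_start[,old_count] +new_start[,new_count] @@".
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line):
    """Return (old_start, old_count, new_start, new_count) for a hunk header line."""
    m = HUNK_RE.match(line)
    if m is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_count, new_start, new_count = m.groups()
    # An omitted count means the hunk spans exactly one line.
    return int(old_start), int(old_count or 1), int(new_start), int(new_count or 1)


# The three hunk headers from this commit's README.md diff.
headers = [
    "@@ -24,7 +24,7 @@",
    "@@ -33,7 +33,7 @@",
    "@@ -42,12 +42,11 @@",
]

# Per hunk, new_count - old_count == added - removed; summing gives the net change.
net = sum(new_count - old_count for _, old_count, _, new_count in map(parse_hunk_header, headers))
print(net)  # prints -1
```

Only the third hunk changes size (12 lines to 11), which is consistent with a fifth removed line that falls outside the truncated view.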