Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22, 2024 • 131
Article From Files to Chunks: Improving Hugging Face Storage Efficiency Nov 20, 2024 • 59
Article From Chunks to Blocks: Accelerating Uploads and Downloads on the Hub Feb 12 • 64
Nomic Embed Multimodal Collection Multimodal models allowing you to search over interleaved text, PDFs, charts, and images! • 15 items • Updated 22 days ago • 20
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20 • 143
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4 • 229
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published Jan 8 • 277
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion Paper • 2412.04424 • Published Dec 5, 2024 • 63
PaliGemma 2: A Family of Versatile VLMs for Transfer Paper • 2412.03555 • Published Dec 4, 2024 • 135