Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization Paper • 2411.10442 • Published Nov 15, 2024 • 85
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models Paper • 2402.03300 • Published Feb 5, 2024 • 125
Article Reachy Mini - The Open-Source Robot for Today's and Tomorrow's AI Builders By thomwolf and 1 other • 24 days ago • 623
Article SmolLM3: smol, multilingual, long-context reasoner By loubnabnl and 22 others • 25 days ago • 602
Article cocogold: training Marigold for text-grounded segmentation By pcuenq • 25 days ago • 28
Article Efficient MultiModal Data Pipeline By ariG23498 and 4 others • 25 days ago • 52
How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks Paper • 2507.01955 • Published about 1 month ago • 34
Article Tiny Agents in Python: a MCP-powered agent in ~70 lines of code By celinah and 3 others • May 23 • 152
Article Kimi-VL-A3B-Thinking-2506: A Quick Navigation By moonshotai and 1 other • Jun 21 • 66
Article Universal Image Segmentation with Mask2Former and OneFormer By nielsr and 2 others • Jan 19, 2023 • 14
Article Common Pitfalls in Sharing Open Source Models on Hugging Face (and How to Dodge Them) By FriendliAI and 2 others • Jul 1 • 21
Article Gemma 3n fully available in the open-source ecosystem! By ariG23498 and 7 others • Jun 26 • 113
Article Groq on Hugging Face Inference Providers By sbrandeis and 4 others • Jun 16 • 41
MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction Paper • 2502.11663 • Published Feb 17 • 41
Article Holo1: New family of GUI automation VLMs powering GUI agent Surfer-H By Hcompany and 1 other • Jun 3 • 70