Article SmolLM3: smol, multilingual, long-context reasoner By loubnabnl and 22 others • Jul 8 • 666
BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-scale Pretraining Paper • 2508.10975 • Published 29 days ago • 57
Right Answer, Wrong Score: Uncovering the Inconsistencies of LLM Evaluation in Multiple-Choice Question Answering Paper • 2503.14996 • Published Mar 19 • 3
ZEBRA: Zero-Shot Example-Based Retrieval Augmentation for Commonsense Question Answering Paper • 2410.05077 • Published Oct 7, 2024 • 5
Reward Bench 2 Collection Datasets, spaces, and models for Reward Bench 2 benchmark and paper! • 11 items • Updated Jun 3 • 14
CLIPPER Collection Models and datasets for CLIPPER: Compression enables long-context synthetic data generation • 6 items • Updated Feb 20 • 5
Truth or Mirage? Towards End-to-End Factuality Evaluation with LLM-OASIS Paper • 2411.19655 • Published Nov 29, 2024 • 20
Minerva LLMs Collection The first family of LLMs pretrained from scratch on Italian. • 6 items • Updated Dec 7, 2024 • 35