GeoPQA: Bridging the Visual Perception Gap in MLLMs for Geometric Reasoning • Paper • 2509.17437 • Published 6 days ago • 17
EpiCache: Episodic KV Cache Management for Long Conversational Question Answering • Paper • 2509.17396 • Published 6 days ago • 18
SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks? • Paper • 2509.16941 • Published 7 days ago • 19
FlagEval Findings Report: A Preliminary Evaluation of Large Reasoning Models on Automatically Verifiable Textual and Visual Questions • Paper • 2509.17177 • Published 7 days ago • 13