xVerify: Efficient Answer Verifier for Reasoning Model Evaluations Paper • 2504.10481 • Published 14 days ago
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models Paper • 2504.15279 • Published 7 days ago