V-GameGym: Visual Game Generation for Code Large Language Models Paper • 2509.20136 • Published 14 days ago • 9 • 2
T2R-bench: A Benchmark for Generating Article-Level Reports from Real World Industrial Tables Paper • 2508.19813 • Published Aug 27 • 25 • 4
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL Paper • 2508.13167 • Published Aug 6 • 127
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings Paper • 2501.01257 • Published Jan 2 • 52
Evaluating and Aligning CodeLLMs on Human Preference Paper • 2412.05210 • Published Dec 6, 2024 • 50 • 2
LongIns: A Challenging Long-context Instruction-based Exam for LLMs Paper • 2406.17588 • Published Jun 25, 2024 • 23