EmbRACE-3K: Embodied Reasoning and Action in Complex Environments Paper • 2507.10548 • Published Jul 14 • 36
OmniEAR: Benchmarking Agent Reasoning in Embodied Tasks Paper • 2508.05614 • Published about 1 month ago • 19
MCPEval: Automatic MCP-based Deep Evaluation for AI Agent Models Paper • 2507.12806 • Published Jul 17 • 19
DeepPHY: Benchmarking Agentic VLMs on Physical Reasoning Paper • 2508.05405 • Published about 1 month ago • 63