BindWeave: Subject-Consistent Video Generation via Cross-Modal Integration Paper • 2510.00438 • Published 8 days ago • 2
LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training Paper • 2509.23661 • Published 11 days ago • 39
Advancing Speech Understanding in Speech-Aware Language Models with GRPO Paper • 2509.16990 • Published 18 days ago • 17
T2R-bench: A Benchmark for Generating Article-Level Reports from Real World Industrial Tables Paper • 2508.19813 • Published Aug 27 • 25
PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning Paper • 2508.21104 • Published Aug 28 • 32
Spatial Reasoning with Vision-Language Models in Ego-Centric Multi-View Scenes Paper • 2509.06266 • Published Sep 8 • 10
VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model Paper • 2509.09372 • Published 28 days ago • 223
Reasoning over Boundaries: Enhancing Specification Alignment via Test-time Delibration Paper • 2509.14760 • Published 21 days ago • 52
MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision Paper • 2505.13427 • Published May 19 • 26
Self-Generated In-Context Examples Improve LLM Agents for Sequential Decision-Making Tasks Paper • 2505.00234 • Published May 1 • 26
Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models Paper • 2505.02847 • Published May 1 • 28
LightLab: Controlling Light Sources in Images with Diffusion Models Paper • 2505.09608 • Published May 14 • 36
A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency Paper • 2505.01658 • Published May 3 • 39
KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models Paper • 2505.16707 • Published May 22 • 45