Submitted by AlexCuadron 59 The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks · 16 authors 2
Submitted by turrf 56 Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model · 115 authors 3
Submitted by taesiri 44 ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models · 23 authors 46 5
Submitted by yifanzhang114 35 MM-RLHF: The Next Step Forward in Multimodal LLM Alignment · 20 authors 5
Submitted by rinong 21 ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation · 4 authors 87 2
Submitted by Shengkun 18 DarwinLM: Evolutionary Structured Pruning of Large Language Models · 5 authors 7
Submitted by deqing 15 FoNE: Precise Single-Token Number Embeddings via Fourier Features · 5 authors 3
Submitted by lukasz-staniszewski 12 Precise Parameter Localization for Textual Generation in Diffusion Models · 5 authors 2
Submitted by abenechehab 9 AdaPTS: Adapting Univariate Foundation Models to Probabilistic Multivariate Time Series Forecasting · 6 authors 2
Submitted by DGurgurov 9 Small Models, Big Impact: Efficient Corpus and Graph-Based Adaptation of Small Multilingual Language Models for Low-Resource Languages · 4 authors 2
Submitted by Asaf-Yehudai 9 Selective Self-to-Supervised Fine-Tuning for Generalization in Large Language Models · 6 authors 2
Submitted by xuxw98 6 Text-guided Sparse Voxel Pruning for Efficient 3D Visual Grounding · 6 authors 2
Submitted by SP4595 6 STMA: A Spatio-Temporal Memory Agent for Long-Horizon Embodied Task Planning · 7 authors 2
Submitted by nielsr 5 Cluster and Predict Latent Patches for Improved Masked Image Modeling · 5 authors 2
Submitted by z-hb 5 MRS: A Fast Sampler for Mean Reverting Diffusion based on ODE and SDE Solvers · 6 authors 3 2
Submitted by akhaliq 4 CLaMP 3: Universal Music Information Retrieval Across Unaligned Modalities and Unseen Languages · 10 authors 2
Submitted by cmhungsteve 4 V2V-LLM: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multi-Modal Large Language Models · 6 authors 3
Submitted by mjbuehler 3 Agentic End-to-End De Novo Protein Design for Tailored Dynamics Using a Language Diffusion Model · 2 authors 2