Submitted by omriav 41 Story2Board: A Training-Free Approach for Expressive Storyboard Generation · 5 authors 15 2
Submitted by weidawang 29 Mol-R1: Towards Explicit Long-CoT Reasoning in Molecule Discovery · 9 authors 8
Submitted by RichardQRQ 27 Stand-In: A Lightweight and Plug-and-Play Identity Control for Video Generation · 5 authors 181 3
Submitted by chengle 22 AWorld: Dynamic Multi-Agent System with Stable Maneuvering for Robust GAIA Problem Solving · 5 authors 561 2
Submitted by UnhurriedDawn 22 Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion Forcing · 6 authors 39 3
Submitted by hyc2026 21 Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory · 8 authors 64 1
Submitted by CaraJ 16 Echo-4o: Harnessing the Power of GPT-4o Synthetic Images for Improved Image Generation · 12 authors 40 2
Submitted by whw199833 14 Learning to Align, Aligning to Learn: A Unified Approach for Self-Optimized Alignment · 15 authors 2
Submitted by junfeng0288 11 MathReal: We Keep It Real! A Real Scene Benchmark for Evaluating Math Reasoning in Multimodal Large Language Models · 8 authors 6 2
Submitted by yanyc 10 Cooper: Co-Optimizing Policy and Reward Models in Reinforcement Learning for Large Language Models · 8 authors 12 2
Submitted by shyamgopal 6 Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models · 5 authors 2
Submitted by Duke-de-Artois 6 IAG: Input-aware Backdoor Attack on VLMs for Visual Grounding · 3 authors 2
Submitted by lingjie23 4 VisCodex: Unified Multimodal Code Generation via Merging Vision and Coding Models · 6 authors 3 2
Submitted by vshrivas 3 Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning · 6 authors 2
Submitted by vinthony 3 GSFixer: Improving 3D Gaussian Splatting with Reference-Guided Video Diffusion Priors · 9 authors 2
Submitted by vaynexie 3 CannyEdit: Selective Canny Control and Dual-Prompt Guidance for Training-Free Image Editing · 7 authors 4 5
Submitted by mdhaini 2 Can LLM-Generated Textual Explanations Enhance Model Classification Performance? An Empirical Study · 5 authors 0 2
Submitted by Yuqunyang 2 ASM-UNet: Adaptive Scan Mamba Integrating Group Commonalities and Individual Variations for Fine-Grained Segmentation · 9 authors 3 2
Submitted by jackzeng-robotics 2 Decentralized Aerial Manipulation of a Cable-Suspended Load using Multi-Agent Reinforcement Learning · 5 authors 16 2
Submitted by JJ-TMT 1 AMFT: Aligning LLM Reasoners by Meta-Learning the Optimal Imitation-Exploration Balance · 3 authors 2 2
Submitted by hallisky - The Surprising Effectiveness of Membership Inference with Simple N-Gram Coverage · 10 authors 1
Submitted by abhilekhborah - ObfusQAte: A Proposed Framework to Evaluate LLM Robustness on Obfuscated Factual Question Answering · 4 authors 2