Submitted by kuznetsoffandrey 130 Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models · 5 authors 21
Submitted by wujie10 101 Seedance 1.0: Exploring the Boundaries of Video Generation Models · 44 authors 9
Submitted by Hanyuezhuohua 55 Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation · 5 authors 2
Submitted by imryanxu 53 ComfyUI-R1: Exploring Reasoning Models for Workflow Generation · 8 authors 3.22k 4
Submitted by akhaliq 47 Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation · 9 authors 2
Submitted by hassid 32 Auto-Regressive vs Flow-Matching: a Comparative Study of Modeling Paradigms for Text-to-Music Generation · 3 authors 2
Submitted by LongMountain 23 SeerAttention-R: Sparse Attention Adaptation for Long Reasoning · 15 authors 156 2
Submitted by jy-yuan 18 Give Me FP32 or Give Me Death? Challenges and Solutions for Reproducible Reasoning · 10 authors 35 2
Submitted by Lemoncoke 18 SWE-Flow: Synthesizing Software Engineering Data in a Test-Driven Manner · 9 authors 29 3
Submitted by zhenzhiwang 14 InterActHuman: Multi-Concept Human Animation with Layout-Aligned Audio Conditions · 8 authors 4
Submitted by niveck 14 Time to Talk: LLM Agents for Asynchronous Group Communication in Mafia Games · 3 authors 8 2
Submitted by WaltonFuture 9 Vision Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning · 7 authors 57 2
Submitted by guqiao 8 SAFE: Multitask Failure Detection for Vision-Language-Action Models · 7 authors 2
Submitted by ashawkey 7 Efficient Part-level 3D Object Generation via Dual Volume Packing · 10 authors 745 2
Submitted by taesiri 7 Hidden in plain sight: VLMs overlook their visual representations · 4 authors 1
Submitted by NikV09 5 UFM: A Simple Path towards Unified Dense Correspondence with Flow · 12 authors 246 2
Submitted by sungwon95 5 Cross-Frame Representation Alignment for Fine-Tuning Video Diffusion Models · 5 authors 11 2
Submitted by wy1iu 5 Reparameterized LLM Training via Orthogonal Equivalence Transformation · 6 authors 12 2
Submitted by Zory 4 Can Vision Language Models Infer Human Gaze Direction? A Controlled Study · 10 authors 9 2
Submitted by j-morano 3 MIRAGE: Multimodal foundation model and benchmark for comprehensive retinal OCT image analysis · 10 authors 17 2
Submitted by SushantGautam 1 Kvasir-VQA-x1: A Multimodal Dataset for Medical Reasoning and Robust MedVQA in Gastrointestinal Endoscopy · 3 authors 0 2
Submitted by fangwu97 1 When to Trust Context: Self-Reflective Debates for Context Reliability · 8 authors 5 2
Submitted by Prakamya - TTT-Bench: A Benchmark for Evaluating Reasoning Ability with Simple and Novel Tic-Tac-Toe-style Games · 6 authors 2
Submitted by TreeForest - A Call for Collaborative Intelligence: Why Human-Agent Systems Should Precede AI Autonomy · 13 authors 142 2