Submitted by Juanxi · 87 upvotes · MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization · 11 authors · 7 comments
Submitted by Howe666 · 66 upvotes · AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction · 5 authors · 2 comments
Submitted by akhaliq · 65 upvotes · DreamActor-M1: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance · 6 authors · 7 comments
Submitted by hanyang-21 · 40 upvotes · VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step · 4 authors · 2 comments
Submitted by wenhu · 40 upvotes · ScholarCopilot: Training Large Language Models for Academic Writing with Accurate Citations · 10 authors · 2 comments
Submitted by 8ruceLi · 39 upvotes · Towards Physically Plausible Video Generation via VLM Planning · 11 authors · 3 comments
Submitted by akhaliq · 24 upvotes · Articulated Kinematics Distillation from Video Diffusion Models · 7 authors · 3 comments
Submitted by huangrh9 · 23 upvotes · ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement · 11 authors · 4 comments
Submitted by AdinaY · 22 upvotes · Boost Your Own Human Image Generation Model via Direct Preference Optimization with AI Feedback · 3 authors · 3 comments
Submitted by Jarvis1111 · 13 upvotes · Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks · 7 authors · 2 comments
Submitted by YanNeu · 12 upvotes · DASH: Detection and Assessment of Systematic Hallucinations of VLMs · 3 authors · 2 comments
Submitted by nielsr · 12 upvotes · MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis · 14 authors · 2 comments
Submitted by hychiang · 10 upvotes · Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models · 6 authors · 2 comments
Submitted by Jiuzhouh · 6 upvotes · VerifiAgent: a Unified Verification Agent in Language Model Reasoning · 3 authors · 2 comments
Submitted by mawjdgus · 4 upvotes · Enhanced OoD Detection through Cross-Modal Alignment of Multi-Modal Representations · 2 authors · 1 comment