Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content Paper • 2410.08260 • Published Oct 10, 2024
SPF-Portrait: Towards Pure Portrait Customization with Semantic Pollution-Free Fine-tuning Paper • 2504.00396 • Published Apr 1 • 4
HumanAesExpert: Advancing a Multi-Modality Foundation Model for Human Image Aesthetic Assessment Paper • 2503.23907 • Published Mar 31 • 2
Position: Interactive Generative Video as Next-Generation Game Engine Paper • 2503.17359 • Published Mar 21 • 62
FullDiT: Multi-Task Video Generative Foundation Model with Full Attention Paper • 2503.19907 • Published Mar 25 • 8
Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation Paper • 2503.24379 • Published Mar 31 • 76
DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers Paper • 2503.14487 • Published Mar 18 • 27
SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs Paper • 2408.11813 • Published Aug 21, 2024 • 12
MotionGPT-2: A General-Purpose Motion-Language Model for Motion Generation and Understanding Paper • 2410.21747 • Published Oct 29, 2024
SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints Paper • 2412.07760 • Published Dec 10, 2024 • 56
3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation Paper • 2412.07759 • Published Dec 10, 2024 • 18
StyleMaster: Stylize Your Video with Artistic Generation and Translation Paper • 2412.07744 • Published Dec 10, 2024 • 19
VIVID-10M: A Dataset and Baseline for Versatile and Interactive Video Local Editing Paper • 2411.15260 • Published Nov 22, 2024