Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models Paper • 2504.02821 • Published Apr 3 • 9
TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos Paper • 2504.17343 • Published Apr 24 • 12
ViSMaP: Unsupervised Hour-long Video Summarisation by Meta-Prompting Paper • 2504.15921 • Published Apr 22 • 7
Distilling semantically aware orders for autoregressive image generation Paper • 2504.17069 • Published Apr 23 • 7
VideoDeepResearch: Long Video Understanding With Agentic Tool Using Paper • 2506.10821 • Published Jun 12 • 19
Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models Paper • 2506.07177 • Published Jun 8 • 22
Hyper-Bagel: A Unified Acceleration Framework for Multimodal Understanding and Generation Paper • 2509.18824 • Published 16 days ago • 21
SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer Paper • 2509.24695 • Published 10 days ago • 38
DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder Paper • 2509.25182 • Published 9 days ago • 33