Intuitive physics understanding emerges from self-supervised pretraining on natural videos Paper • 2502.11831 • Published Feb 17 • 19
Perception Encoder: The best visual embeddings are not at the output of the network Paper • 2504.13181 • Published 12 days ago • 31
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20 • 143
Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model Paper • 2502.10248 • Published Feb 14 • 56