Qwen2.5-VL • Collection • Vision-language model series based on Qwen2.5 • 11 items • Updated about 9 hours ago • 453
SlideAVSR: A Dataset of Paper Explanation Videos for Audio-Visual Speech Recognition • Paper • 2401.09759 • Published Jan 18, 2024 • 2