UItron: Foundational GUI Agent with Advanced Perception and Planning Paper • 2508.21767 • Published 9 days ago • 12
Turning the Spell Around: Lightweight Alignment Amplification via Rank-One Safety Injection Paper • 2508.20766 • Published 10 days ago • 14
Train Long, Think Short: Curriculum Learning for Efficient Reasoning Paper • 2508.08940 • Published 26 days ago • 25
FastTD3: Simple, Fast, and Capable Reinforcement Learning for Humanoid Control Paper • 2505.22642 • Published May 28 • 3
OmniResponse: Online Multimodal Conversational Response Generation in Dyadic Interactions Paper • 2505.21724 • Published May 27 • 4
Beyond the Last Answer: Your Reasoning Trace Uncovers More than You Think Paper • 2504.20708 • Published Apr 29 • 23
Packing Input Frame Context in Next-Frame Prediction Models for Video Generation Paper • 2504.12626 • Published Apr 17 • 52
4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding Paper • 2503.17827 • Published Mar 22 • 8
Vivid-ZOO: Multi-View Video Generation with Diffusion Model Paper • 2406.08659 • Published Jun 12, 2024 • 8
SCTN: Sparse Convolution-Transformer Network for Scene Flow Estimation Paper • 2105.04447 • Published May 10, 2021 • 1