MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge Paper • 2206.08853 • Published Jun 17, 2022 • 1
AlphaPose: Whole-Body Regional Multi-Person Pose Estimation and Tracking in Real-Time Paper • 2211.03375 • Published Nov 7, 2022
SPA: 3D Spatial-Awareness Enables Effective Embodied Representation Paper • 2410.08208 • Published Oct 10, 2024
PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm Paper • 2310.08586 • Published Oct 12, 2023
VQ-VLA: Improving Vision-Language-Action Models via Scaling Vector-Quantized Action Tokenizers Paper • 2507.01016 • Published Jul 1, 2025 • 1
UniPAD: A Universal Pre-training Paradigm for Autonomous Driving Paper • 2310.08370 • Published Oct 12, 2023
$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning Paper • 2507.13347 • Published Jul 17, 2025 • 64
WinT3R: Window-Based Streaming Reconstruction with Camera Token Pool Paper • 2509.05296 • Published Sep 2025 • 7
OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling Paper • 2509.12201 • Published Sep 2025 • 103