Visual Representation Alignment for Multimodal Large Language Models • Paper • 2509.07979 • Published 2 days ago • 64
Aligned Novel View Image and Geometry Synthesis via Cross-modal Attention Instillation • Paper • 2506.11924 • Published Jun 13 • 34
Fine-Grained Perturbation Guidance via Attention Head Selection • Paper • 2506.10978 • Published Jun 12 • 26
Revisit What You See: Disclose Language Prior in Vision Tokens for Efficient Guided Decoding of LVLMs • Paper • 2506.09522 • Published Jun 11 • 20