HoliTom: Holistic Token Merging for Fast Video Large Language Models Paper • 2505.21334 • Published May 27 • 20
Memory-Efficient Visual Autoregressive Modeling with Scale-Aware KV Cache Compression Paper • 2505.19602 • Published May 26 • 13
Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps Paper • 2505.18675 • Published May 24 • 24
VeriThinker: Learning to Verify Makes Reasoning Model Efficient Paper • 2505.17941 • Published May 23 • 25
AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning Paper • 2505.16400 • Published May 22 • 34
Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models Paper • 2504.03624 • Published Apr 4 • 13
Efficient Hybrid Language Model Compression through Group-Aware SSM Pruning Paper • 2504.11409 • Published Apr 15 • 10
MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models Paper • 2409.17481 • Published Sep 26, 2024 • 48
AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising Paper • 2406.06911 • Published Jun 11, 2024 • 12