StateX: Enhancing RNN Recall via Post-training State Expansion Paper • 2509.22630 • Published 9 days ago • 2
BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity Paper • 2507.08771 • Published Jul 11 • 9
Article Enhance Your Models in 5 Minutes with the Hugging Face Kernel Hub By drbh and 6 others • Jun 12 • 144
Cost-Optimal Grouped-Query Attention for Long-Context LLMs Paper • 2503.09579 • Published Mar 12 • 5 • 2
MARS: Unleashing the Power of Variance Reduction for Training Large Models Paper • 2411.10438 • Published Nov 15, 2024 • 13
Sparsing Law: Towards Large Language Models with Greater Activation Sparsity Paper • 2411.02335 • Published Nov 4, 2024 • 11