SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published 28 days ago • 179
Article LLM Inference on Edge: A Fun and Easy Guide to run LLMs via React Native on your Phone! Mar 7 • 53
Domino: Eliminating Communication in LLM Training via Generic Tensor Slicing and Overlapping Paper • 2409.15241 • Published Sep 23, 2024 • 1
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published Apr 22, 2024 • 257
Small-scale proxies for large-scale Transformer training instabilities Paper • 2309.14322 • Published Sep 25, 2023 • 21
Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets Paper • 2201.02177 • Published Jan 6, 2022 • 2
Article A failed experiment: Infini-Attention, and why we should keep trying? Aug 14, 2024 • 62
Grokfast: Accelerated Grokking by Amplifying Slow Gradients Paper • 2405.20233 • Published May 30, 2024 • 6
Transformer Explainer: Interactive Learning of Text-Generative Models Paper • 2408.04619 • Published Aug 8, 2024 • 162