neuralink

AI & ML interests

nanotron @ hf

Recent Activity

liked a model 5 days ago

Qwen/Qwen3-235B-A22B

upvoted an article 24 days ago

You could have designed state of the art positional encoding

upvoted an article 24 days ago

Welcome Llama 4 Maverick & Scout on Hugging Face!

neuralink's activity

upvoted 2 articles 24 days ago

You could have designed state of the art positional encoding

Article • Nov 25, 2024 • 240

Welcome Llama 4 Maverick & Scout on Hugging Face!

Article • about 1 month ago • 142

upvoted a paper 27 days ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published 28 days ago • 179

upvoted an article about 2 months ago

LLM Inference on Edge: A Fun and Easy Guide to run LLMs via React Native on your Phone!

Article • Mar 7 • 53

upvoted an article 2 months ago

Open-source DeepResearch – Freeing our search agents

Article • Feb 4 • 1.24k

upvoted an article 3 months ago

Open-R1: a fully open reproduction of DeepSeek-R1

Article • Jan 28 • 849

upvoted 2 papers 4 months ago

Domino: Eliminating Communication in LLM Training via Generic Tensor Slicing and Overlapping

Paper • 2409.15241 • Published Sep 23, 2024 • 1

Scaling Laws for Floating Point Quantization Training

Paper • 2501.02423 • Published Jan 5 • 27

upvoted a paper 6 months ago

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Paper • 2404.14219 • Published Apr 22, 2024 • 257

upvoted a paper 8 months ago

Small-scale proxies for large-scale Transformer training instabilities

Paper • 2309.14322 • Published Sep 25, 2023 • 21

upvoted an article 8 months ago

How NuminaMath Won the 1st AIMO Progress Prize

Article • Jul 11, 2024 • 120

upvoted a paper 8 months ago

Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets

Paper • 2201.02177 • Published Jan 6, 2022 • 2

upvoted an article 9 months ago

A failed experiment: Infini-Attention, and why we should keep trying?

Article • Aug 14, 2024 • 62

upvoted 2 papers 9 months ago

Grokfast: Accelerated Grokking by Amplifying Slow Gradients

Paper • 2405.20233 • Published May 30, 2024 • 6

Transformer Explainer: Interactive Learning of Text-Generative Models

Paper • 2408.04619 • Published Aug 8, 2024 • 162

upvoted a paper 10 months ago

What matters when building vision-language models?

Paper • 2405.02246 • Published May 3, 2024 • 104

upvoted a paper 11 months ago

DiPaCo: Distributed Path Composition

Paper • 2403.10616 • Published Mar 15, 2024 • 13

upvoted an article 11 months ago

Putting RL back in RLHF

Article • Jun 12, 2024 • 88