Hi there! Congrats on the great work, really appreciate seeing discussions like these in the open ✨
Just one question: in the long-context extension phase you mention using an extra 100B tokens - where do you source them from? Are they drawn from the same sources as the pretraining data, just with different upsampling weights?
In general, I would really appreciate it if you could point me to some resource/inspiration regarding what data to use for the long-context extension!
Tommaso Bonomo
tommasobonomo