Address discrepancies in the languages supported by the Mistral Small 3.1 2503
Hi Mistral team
The languages listed as supported by Mistral Small 3.1 2503, for both the base and instruct versions, seem to be inconsistent across the different model card providers. This makes it harder to get a complete picture of what you support and is likely to limit adoption (people often search for the languages a model supports).
For example, taking only Dutch and Finnish:
- on the Mistral website release page https://mistral.ai/news/mistral-small-3-1 I see no information on supported languages,
- on this HF model card, 24 languages are listed, but neither Dutch nor Finnish,
- on the Kaggle model card, also published by the Mistral org, https://www.kaggle.com/models/mistral-ai/mistral-small-3.1/ Dutch and Finnish are mentioned,
- on the GCP Vertex doc https://console.cloud.google.com/vertex-ai/publishers/mistralai/model-garden/mistral-small-2503?invt=Abt7lA many languages are listed, including Dutch and Finnish.
See screenshot from GCP below.
When I tested this model on simple tasks in Dutch and Finnish, it seemed to understand both languages and the answers were relevant, so I guess the model actually supports those languages, among others.
@mgoin
I propose this PR to update the supported-language list on the HF model card, since it’s the go-to source of truth for comparing which languages models support. If that works for you, I propose we also update the card for the base model of Small 3.1 2503.
Thanks for sharing the model, I'm really impressed by its performance. Bonus ask: could you share a percentage breakdown by language of your training set (pre-training)?
Let me know if any clarification is needed on this @mgoin @patrickvonplaten, I'd be happy to adjust the PR to the expected format.