---
title: AI Language Monitor
emoji: π
colorFrom: purple
colorTo: pink
sdk: gradio
license: cc-by-sa-4.0
short_description: Evaluating LLM performance across all human languages.
datasets:
- openlanguagedata/flores_plus
- google/fleurs
- mozilla-foundation/common_voice_1_0
models:
- meta-llama/Llama-3.3-70B-Instruct
- mistralai/Mistral-Small-24B-Instruct-2501
- deepseek-ai/DeepSeek-V3
- microsoft/phi-4
- openai/whisper-large-v3
- google/gemma-3-27b-it
tags:
- leaderboard
- submission:manual
- test:public
- judge:auto
- modality:text
- modality:artefacts
- eval:generation
- language:English
- language:German
---

<!--
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
For tag meaning, see https://huggingface.co/spaces/leaderboards/LeaderboardsExplorer
-->

[](https://huggingface.co/spaces/datenlabor-bmz/ai-language-monitor)

# AI Language Monitor π

Benchmarking all big AI models on all benchmarkable languages.

```bash
uv run evals/main.py
```