library_name: transformers
license: apache-2.0
base_model: openai/whisper-large-v3-turbo
tags:
- automatic-speech-recognition
- whisper
- urdu
- mozilla-foundation/common_voice_17_0
- hf-asr-leaderboard
datasets:
- mozilla-foundation/common_voice_17_0
metrics:
- wer
- cer
- bleu
- chrf
model-index:
- name: whisper-large-v3-turbo-urdu
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: Common Voice 17.0 (Urdu)
      type: mozilla-foundation/common_voice_17_0
      config: ur
      split: test
      args: ur
    metrics:
    - type: wer
      value: 26.234
      name: WER
    - type: cer
      value: 8.795
      name: CER
    - type: bleu
      value: 58.032
      name: BLEU
    - type: chrf
      value: 81.636
      name: ChrF
language:
- ur
pipeline_tag: automatic-speech-recognition
Whisper large V3 Turbo Urdu ASR Model 🥇
This model is a fine-tuned version of openai/whisper-large-v3-turbo on the Urdu (ur) subset of the mozilla-foundation/common_voice_17_0 dataset.
At the final training step (1500), it achieves the following results on the evaluation set:
- Loss: 0.3534
- Wer: 25.7842
Quick Usage
from transformers import pipeline

transcriber = pipeline(
    "automatic-speech-recognition",
    model="kingabzpro/whisper-large-v3-turbo-urdu",
)

transcriber.model.generation_config.forced_decoder_ids = None  # clear any forced decoder prompt
transcriber.model.generation_config.language = "ur"  # transcribe in Urdu

transcription = transcriber("audio2.mp3")  # path to a local audio file
print(transcription)
{'text': 'دیکھیے پانی کب تک بہتا اور مچھلی کب تک تیرتی ہے'}
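For longer recordings, segment-level timestamps can be useful. The following is a minimal sketch that is not part of the original usage example. It assumes the pipeline is called with `return_timestamps=True`, in which case the result also carries a `chunks` list of `{"timestamp": (start, end), "text": ...}` entries. The helper names `format_chunks` and `transcribe_with_timestamps` are hypothetical.

```python
def _fmt_time(value):
    # A boundary can be None when the model could not predict it
    # (for example, when the audio is cut off mid-segment).
    return f"{value:.2f}" if value is not None else "?"


def format_chunks(result):
    """Render timestamped ASR pipeline output as one line per segment."""
    lines = []
    for chunk in result.get("chunks", []):
        start, end = chunk["timestamp"]
        text = chunk["text"].strip()
        lines.append(f"[{_fmt_time(start)} - {_fmt_time(end)}] {text}")
    return lines


def transcribe_with_timestamps(audio_path):
    """Hypothetical wrapper: transcribe a file and return formatted segments."""
    # Imported lazily so format_chunks can be used without loading transformers.
    from transformers import pipeline

    transcriber = pipeline(
        "automatic-speech-recognition",
        model="kingabzpro/whisper-large-v3-turbo-urdu",
    )
    transcriber.model.generation_config.forced_decoder_ids = None
    transcriber.model.generation_config.language = "ur"
    result = transcriber(audio_path, return_timestamps=True)
    return format_chunks(result)
```

Calling `transcribe_with_timestamps("audio2.mp3")` downloads the model on first use; `format_chunks` can be checked on its own with a hand-built result dictionary.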
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1500
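To make the schedule concrete, here is a rough sketch of how the learning rate would evolve under the settings listed above. It assumes the common form of the cosine scheduler: linear warmup from 0 to the peak rate, followed by a single half-cosine decay to 0. The card does not state the number of cosine cycles, so that part is an assumption, and `lr_at` is a hypothetical helper name.

```python
import math

BASE_LR = 2e-5       # learning_rate
WARMUP_STEPS = 100   # lr_scheduler_warmup_steps
TOTAL_STEPS = 1500   # training_steps


def lr_at(step):
    """Learning rate after `step` updates: linear warmup, then cosine decay."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    # Fraction of the post-warmup phase that has elapsed, from 0.0 to 1.0.
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    # Assumption: one half-cosine cycle, decaying from BASE_LR to 0.
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 50, 100, 800, 1500):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the rate peaks at 2e-05 at step 100, is half of that at step 800 (the midpoint of the decay phase), and reaches zero at step 1500.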
Training results
Training Loss | Epoch | Step | Validation Loss | Wer |
---|---|---|---|---|
0.6764 | 0.2545 | 300 | 0.6244 | 44.9776 |
0.5881 | 0.5089 | 600 | 0.5089 | 37.6214 |
0.4662 | 0.7634 | 900 | 0.4349 | 32.1322 |
0.3661 | 1.0178 | 1200 | 0.3634 | 26.5683 |
0.2293 | 1.2723 | 1500 | 0.3534 | 25.7842 |
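The table can be turned into a few derived figures. The sketch below estimates the number of optimizer steps per epoch from the Epoch and Step columns and computes the relative WER reduction between the first and last checkpoints. The estimate of examples per epoch additionally assumes a single device and no gradient accumulation, neither of which is stated in the card.

```python
# (step, epoch, validation WER) rows copied from the training results table.
rows = [
    (300, 0.2545, 44.9776),
    (600, 0.5089, 37.6214),
    (900, 0.7634, 32.1322),
    (1200, 1.0178, 26.5683),
    (1500, 1.2723, 25.7842),
]
TRAIN_BATCH_SIZE = 8  # train_batch_size from the hyperparameters


def steps_per_epoch(step, epoch):
    """Estimate optimizer steps in one full pass over the training data."""
    return step / epoch


def relative_reduction(first, last):
    """Fractional decrease from `first` to `last`."""
    return (first - last) / first


last_step, last_epoch, last_wer = rows[-1]
est_steps = steps_per_epoch(last_step, last_epoch)
# Assumption: one device, no gradient accumulation, so each step sees 8 clips.
est_examples = est_steps * TRAIN_BATCH_SIZE
wer_drop = relative_reduction(rows[0][2], last_wer)

print(f"steps per epoch    ~ {est_steps:.0f}")
print(f"examples per epoch ~ {est_examples:.0f}")
print(f"relative WER drop  = {wer_drop:.1%}")
```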
Framework versions
- Transformers 4.53.1
- Pytorch 2.8.0.dev20250319+cu128
- Datasets 3.6.0
- Tokenizers 0.21.2
Evaluation
Urdu ASR Evaluation on Common Voice 17.0 (Test Split).
Metric | Value | Description |
---|---|---|
WER | 26.234% | Word Error Rate (lower is better) |
CER | 8.795% | Character Error Rate (lower is better) |
BLEU | 58.032 | BLEU score, 0-100 scale (higher is better) |
ChrF | 81.636 | Character n-gram F-score, 0-100 scale (higher is better) |
👉 Review the testing script: Testing Whisper Large V3 Turbo Urdu
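For readers unfamiliar with the two error-rate metrics, the sketch below shows the standard definitions: the edit distance between reference and hypothesis, divided by the reference length, counted in words for WER and in characters for CER. This is an illustration on toy English strings, not the testing script itself, which may apply text normalization before scoring. The function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference item missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis item)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits over reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


reference = "the cat sat on the mat"
hypothesis = "the cat sit on mat"
print(f"WER = {wer(reference, hypothesis):.3f}")
print(f"CER = {cer(reference, hypothesis):.3f}")
```

In this toy pair, one substituted word and one dropped word give 2 errors over 6 reference words, while at the character level the same mistakes cost 5 edits over 22 characters. This is why CER is typically much lower than WER for the same output.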
Summary
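The Word Error Rate (WER) of 26.23% is respectable: the model makes about one word-level error (substitution, deletion, or insertion) for every four words in the reference, so roughly three out of every four words are transcribed correctly. While there is room for improvement, this is a functional level of accuracy.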
The model is stronger at the character level, with a low Character Error Rate (CER) of 8.80% and a high ChrF score of 81.64. The gap between WER and CER suggests that many word-level errors involve only a few characters rather than entirely wrong words. The BLEU score of 58.03 likewise indicates substantial n-gram overlap between the generated transcriptions and the reference text.
In summary, the model produces largely accurate and intelligible Urdu transcriptions on the Common Voice 17.0 test split, though a WER of about 26% leaves clear room for improvement.