metadata
library_name: transformers
license: apache-2.0
base_model: openai/whisper-large-v3-turbo
tags:
  - automatic-speech-recognition
  - whisper
  - urdu
  - mozilla-foundation/common_voice_17_0
  - hf-asr-leaderboard
datasets:
  - mozilla-foundation/common_voice_17_0
metrics:
  - wer
  - cer
  - bleu
  - chrf
model-index:
  - name: whisper-large-v3-turbo-urdu
    results:
      - task:
          type: automatic-speech-recognition
          name: Automatic Speech Recognition
        dataset:
          name: Common Voice 17.0 (Urdu)
          type: mozilla-foundation/common_voice_17_0
          config: ur
          split: test
          args: ur
        metrics:
          - type: wer
            value: 26.234
            name: WER
          - type: cer
            value: 8.795
            name: CER
          - type: bleu
            value: 58.032
            name: BLEU
          - type: chrf
            value: 81.636
            name: ChrF
language:
  - ur
pipeline_tag: automatic-speech-recognition

Whisper large V3 Turbo Urdu ASR Model 🥇

This model is a fine-tuned version of openai/whisper-large-v3-turbo on the Urdu (ur) subset of the mozilla-foundation/common_voice_17_0 dataset.

It achieves the following results on the evaluation set used during training (final checkpoint, step 1500):

  • Loss: 0.3534
  • WER: 25.7842

Quick Usage

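from transformers import pipeline

# Load the fine-tuned checkpoint as a speech-recognition pipeline
transcriber = pipeline(
    "automatic-speech-recognition",
    model="kingabzpro/whisper-large-v3-turbo-urdu",
)

# Clear any forced decoder prompt and set the transcription language to Urdu
transcriber.model.generation_config.forced_decoder_ids = None
transcriber.model.generation_config.language = "ur"

# Transcribe a local audio file (replace "audio2.mp3" with your own path)
transcription = transcriber("audio2.mp3")
print(transcription)
# Output:
# {'text': 'دیکھیے پانی کب تک بہتا اور مچھلی کب تک تیرتی ہے'}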

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 8
  • eval_batch_size: 4
  • seed: 42
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1500
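To make the schedule concrete, here is a minimal sketch of a cosine schedule with linear warmup, using the learning rate, warmup steps, and total steps listed above. It assumes a single half-cosine decay to zero with no restarts, which is how `get_cosine_schedule_with_warmup` in Transformers behaves with default arguments; the card does not state this explicitly, so treat the shape as an assumption. The function name `cosine_lr` is hypothetical.

```python
import math

PEAK_LR = 2e-5        # learning_rate from the list above
WARMUP_STEPS = 100    # lr_scheduler_warmup_steps
TOTAL_STEPS = 1500    # training_steps


def cosine_lr(step: int) -> float:
    """Learning rate at a given optimizer step (assumed schedule shape)."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 to the peak learning rate
        return PEAK_LR * step / WARMUP_STEPS
    # Fraction of the decay phase completed, from 0.0 to 1.0
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    # Half-cosine decay from the peak learning rate down to zero
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for step in (0, 50, 100, 300, 800, 1500):
    print(f"step {step:>4}: lr = {cosine_lr(step):.3e}")
```

Under these assumptions the learning rate peaks at step 100 and is already at half its peak by step 800, the midpoint of the decay phase.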

Training results

| Training Loss | Epoch | Step | Validation Loss | WER |
|---|---|---|---|---|
| 0.6764 | 0.2545 | 300 | 0.6244 | 44.9776 |
| 0.5881 | 0.5089 | 600 | 0.5089 | 37.6214 |
| 0.4662 | 0.7634 | 900 | 0.4349 | 32.1322 |
| 0.3661 | 1.0178 | 1200 | 0.3634 | 26.5683 |
| 0.2293 | 1.2723 | 1500 | 0.3534 | 25.7842 |
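One way to read the training results is to look at the relative WER reduction between consecutive checkpoints. The sketch below only re-uses the step and WER values reported above; the helper `relative_reduction` is a hypothetical name introduced for illustration.

```python
# (step, validation WER in %) copied from the training results above
checkpoints = [
    (300, 44.9776),
    (600, 37.6214),
    (900, 32.1322),
    (1200, 26.5683),
    (1500, 25.7842),
]


def relative_reduction(before: float, after: float) -> float:
    """Relative reduction in percent when going from `before` to `after`."""
    return 100.0 * (before - after) / before


# Compare each checkpoint with the one before it
for (prev_step, prev_wer), (step, wer) in zip(checkpoints, checkpoints[1:]):
    change = relative_reduction(prev_wer, wer)
    print(f"steps {prev_step:>4} -> {step:>4}: "
          f"WER {prev_wer:.2f} -> {wer:.2f} ({change:.1f}% relative reduction)")

# Overall change from the first to the last reported checkpoint
total = relative_reduction(checkpoints[0][1], checkpoints[-1][1])
print(f"overall: {total:.1f}% relative reduction")
```

The last interval contributes only about a 3% relative reduction, compared with roughly 16% for the first, which may indicate diminishing returns from further training under the same settings.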

Framework versions

  • Transformers 4.53.1
  • Pytorch 2.8.0.dev20250319+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.2

Evaluation

Urdu ASR Evaluation on Common Voice 17.0 (Test Split).

| Metric | Value | Description |
|---|---|---|
| WER | 26.234% | Word Error Rate (lower is better) |
| CER | 8.795% | Character Error Rate (lower is better) |
| BLEU | 58.032 | BLEU score (higher is better) |
| ChrF | 81.636 | Character n-gram F-score (higher is better) |

👉 Review the testing script: Testing Whisper Large V3 Turbo Urdu
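For readers unfamiliar with the two error rates above, here is a minimal sketch of how WER and CER are commonly defined: the edit distance (substitutions, deletions, insertions) between reference and hypothesis, divided by the reference length in words or characters. This is not the linked testing script. It applies no text normalization and counts spaces as characters for CER, both of which are assumptions, and the example sentences are invented.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Minimum number of substitutions, deletions and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference items yields an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference item
                curr[j - 1] + 1,     # insertion of a hypothesis item
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over the reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance over the reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


# Invented example: one substituted word and one deleted word
reference = "the fish swims in water"
hypothesis = "the fish swam water"
print(f"WER = {wer(reference, hypothesis):.2f}")
print(f"CER = {cer(reference, hypothesis):.2f}")
```

Because insertions also count as errors, WER is not simply the complement of "words transcribed correctly" and can exceed 100% for very poor hypotheses.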

Summary

The Word Error Rate (WER) of 26.23% on the test split is respectable: it corresponds to roughly one word-level error (substitution, deletion, or insertion) for every four reference words. While there is room for improvement, this is a functional level of accuracy.

The model is stronger at the character level, with a Character Error Rate (CER) of 8.80% and a ChrF score of 81.64. The gap between WER and CER suggests that many word-level errors differ from the reference by only a few characters. The BLEU score of 58.03 likewise indicates substantial n-gram overlap between the generated transcriptions and the reference text.

In summary, the model produces largely accurate and intelligible Urdu transcriptions on the Common Voice 17.0 test split, with word-level accuracy as the main area for improvement.