---
license: apache-2.0
base_model: facebook/wav2vec2-large-lv60
tags:
  - generated_from_trainer
  - phoneme-recognition
model-index:
  - name: wav2vec2-large-lv60_phoneme-timit_english_timit-4k_simplified
    results:
      - task:
          type: phoneme-recognition
          name: Phoneme Recognition
        dataset:
          name: TIMIT
          type: timit-asr/timit_asr
          split: test
          args: simplified phoneme set
        metrics:
          - name: Phone Error Rate
            type: per
            value: 0.0838
datasets:
  - timit-asr/timit_asr
language:
  - en
metrics:
  - per
library_name: transformers
---

# wav2vec2-large-lv60_phoneme-timit_english_timit-4k_simplified

This model is a fine-tuned version of [facebook/wav2vec2-large-lv60](https://huggingface.co/facebook/wav2vec2-large-lv60) on the TIMIT dataset. It achieves the following results on the evaluation set:

- Loss: 0.2796
- Phone Error Rate (PER): 0.0838 (8.38%)
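
As a minimal inference sketch (assuming the repo id below is correct and that the repo bundles a `Wav2Vec2Processor` with the phone vocabulary; `speech.wav` is a placeholder file name):

```python
import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_id = "excalibur12/wav2vec2-large-lv60_phoneme-timit_english_timit-4k_simplified"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

waveform, sr = torchaudio.load("speech.wav")
if sr != 16_000:  # wav2vec 2.0 expects 16 kHz mono audio
    waveform = torchaudio.functional.resample(waveform, sr, 16_000)

inputs = processor(waveform.squeeze().numpy(), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

pred_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(pred_ids))  # space-separated phone sequence
```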

## Model description

The model is trained on a simplified version of the TIMIT phone set, in which several of the original phones are merged into coarser classes (see Merged Phonemes below).

## Intended uses & limitations

The model maps 16 kHz English speech to phone sequences over the simplified TIMIT phone set; it is not a word-level transcriber.

### Merged Phonemes

The merges are based on a per-phoneme error analysis of the original TIMIT phoneme set; see this repo for the detailed analysis. The following phones were merged:

| Original phone | Merged into |
|:--------------:|:-----------:|
| ax-h           | ax          |
| axr            | er          |
| ix             | ih          |
| ux             | uw          |
| zh             | z           |
| em             | m           |
| en             | n           |
| eng            | ng          |
| nx             | n           |
| hv             | hh          |
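
As an illustration of applying this mapping during preprocessing, here is a sketch over the `timit_asr` dataset; `MERGE_MAP` and `simplify_phones` are hypothetical names, not taken from the author's training script:

```python
from datasets import load_dataset

# Hypothetical name for the merge table above (not from the training script).
MERGE_MAP = {
    "ax-h": "ax", "axr": "er", "ix": "ih", "ux": "uw", "zh": "z",
    "em": "m", "en": "n", "eng": "ng", "nx": "n", "hv": "hh",
}

def simplify_phones(example):
    # TIMIT's phonetic_detail field carries the per-phone labels.
    phones = example["phonetic_detail"]["utterance"]
    example["phonetic_detail"]["utterance"] = [MERGE_MAP.get(p, p) for p in phones]
    return example

# TIMIT is licensed, so the loader needs a local copy via data_dir.
timit = load_dataset("timit_asr", data_dir="path/to/TIMIT")
timit = timit.map(simplify_phones)
```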

## Training and evaluation data

The model is fine-tuned on the TIMIT corpus and evaluated on its test split, using the simplified phone set described above.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.0001
- train_batch_size: 16
- eval_batch_size: 1
- seed: 42
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 300
- training_steps: 3000
- mixed_precision_training: Native AMP
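
The card does not include the training script; as a sketch, the hyperparameters above map onto `transformers.TrainingArguments` roughly as follows (evaluating every 300 steps is inferred from the results table below; Adam's betas and epsilon are the library defaults):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="wav2vec2-large-lv60_phoneme-timit_english_timit-4k_simplified",
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=1,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=300,
    max_steps=3000,
    fp16=True,                    # Native AMP mixed-precision training
    evaluation_strategy="steps",  # assumption: eval every 300 steps, per the table below
    eval_steps=300,
)
```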

### Training results

| Training Loss | Epoch | Step | Validation Loss | Phone Error Rate |
|:-------------:|:-----:|:----:|:---------------:|:----------------:|
| 7.3185        | 1.04  | 300  | 3.6437          | 0.9617           |
| 2.5644        | 2.08  | 600  | 0.7668          | 0.1559           |
| 0.6782        | 3.11  | 900  | 0.3794          | 0.1231           |
| 0.4542        | 4.15  | 1200 | 0.3278          | 0.1164           |
| 0.3834        | 5.19  | 1500 | 0.3043          | 0.1151           |
| 0.3407        | 6.23  | 1800 | 0.2872          | 0.1119           |
| 0.3179        | 7.27  | 2100 | 0.2842          | 0.1110           |
| 0.2988        | 8.3   | 2400 | 0.2834          | 0.1102           |
| 0.2834        | 9.34  | 2700 | 0.2826          | 0.1100           |
| 0.2814        | 10.38 | 3000 | 0.2796          | 0.1100           |
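
For reference, PER is the same edit-distance error rate as WER, computed over space-separated phone tokens rather than words; a small sketch with the `evaluate` library (the phone strings below are invented examples):

```python
import evaluate

# WER over phone tokens is the Phone Error Rate.
wer = evaluate.load("wer")
per = wer.compute(
    predictions=["dh ih s ih z ax t eh s t"],
    references=["dh ih s ih z ah t eh s t"],
)
print(f"PER: {per:.4f}")  # one substitution out of ten phones -> 0.1000
```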

### Framework versions

- Transformers 4.38.1
- PyTorch 2.0.1
- Datasets 2.16.1
- Tokenizers 0.15.2