---
license: unknown
language:
- en
metrics:
- wer
tags:
- whisper
- speech processing
- nlp
- asr
- domain adaptation
---
# Whispered TIA

Whispered TIA is a fine-tuned ASR model based on Whisper. It is adapted to the software <a href="https://www.siemens.com/de/de/produkte/automatisierung/industrie-software/automatisierungs-software/tia-portal.html">TIA (Totally Integrated Automation)</a> from Siemens AG and is able to recognize domain-specific terms and transcribe them correctly.

# Base Model Whisper

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. It was proposed in the paper [Robust Speech Recognition via Large-Scale Weak Supervision](https://arxiv.org/abs/2212.04356) by Alec Radford et al. from OpenAI. The original code repository can be found [here](https://github.com/openai/whisper).

# Training Results

The False HallucER metric indicates how many hallucinations and deletions the model produced.

<table>
  <tr>
    <th>WER</th>
    <th>False HallucER</th>
    <th>Runtime</th>
    <th>Batch Size</th>
    <th>Memory Usage</th>
  </tr>
  <tr>
    <td>1.6</td>
    <td>499.76</td>
    <td>1.72</td>
    <td>64</td>
    <td>20049</td>
  </tr>
</table>

Comparison of prediction and reference lengths:

- Predictions &gt; References: 34%
- Predictions &lt; References: 30%
- Predictions = References: 35%

# Dataset

The underlying dataset is <a href="https://huggingface.co/datasets/vimey/whispered_TIA_normal">dataset: normal</a>.

# Inference

```python
import librosa
import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration

# Path to the audio file to transcribe
file = "/path/to/audio"

# Load the audio and resample it to 16 kHz (Whisper's expected sampling rate)
arr, sampling_rate = librosa.load(file, sr=16000)

# Load the Whisper processor and the fine-tuned model
processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("vimey/whispered_TIA_small_ad_tokenization_encoder_freezing_normal")

# Preprocessing: convert the waveform to log-Mel spectrogram input features
input_features = processor(arr, return_tensors="pt", sampling_rate=sampling_rate).input_features

# Prediction: force English transcription to match the model's language
forced_decoder_ids = processor.get_decoder_prompt_ids(language="en", task="transcribe")
predicted_ids = model.generate(input_features, forced_decoder_ids=forced_decoder_ids)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)

print(transcription)
```
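
Whisper's feature extractor pads or truncates its input to a 30-second window, so recordings longer than 30 s should be split before being fed through the snippet above. A minimal chunking sketch (the `chunk_audio` helper is illustrative and not part of this model card):

```python
import numpy as np

def chunk_audio(arr, sr=16000, window_s=30.0):
    """Split a waveform into consecutive windows of at most `window_s` seconds."""
    n = int(sr * window_s)  # samples per window (480000 at 16 kHz)
    return [arr[i:i + n] for i in range(0, len(arr), n)]

# 70 s of silence as a stand-in for a real recording
audio = np.zeros(16000 * 70, dtype=np.float32)
chunks = chunk_audio(audio)
print(len(chunks), len(chunks[-1]))  # 3 windows; the last holds the remaining 10 s
```

Each window is then passed through the processor and model separately, and the per-window transcriptions are concatenated.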