masters-thesis-vm
/

whispered_TIA_small_ad_tokenization_encoder_freezing_normal

 ---
 license: unknown
+language:
+- en
+metrics:
+- wer
+tags:
+- whisper
+- speech processing
+- nlp
+- asr
+- domain adaptation
 ---
+# Whispered TIA
+Whispered TIA is a fine-tuned ASR model based on Whisper. It is adapted to
+<a href="https://www.siemens.com/de/de/produkte/automatisierung/industrie-software/automatisierungs-software/tia-portal.html">TIA (Totally Integrated Automation)</a>, automation software from Siemens AG, so that it recognizes domain-specific terms and transcribes them correctly.
+# Base Model Whisper
+Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation.
+Whisper was proposed in the paper [Robust Speech Recognition via Large-Scale Weak Supervision](https://arxiv.org/abs/2212.04356)
+by Alec Radford et al. from OpenAI. The original code repository can be found [here](https://github.com/openai/whisper).
+# Training Results
+The False HallucER indicates how many hallucinations and deletions were produced.
+<!DOCTYPE html>
+<html>
+<head>
+<style>
+    table {
+        width: 100%;
+        border-collapse: collapse;
+    }
+    th, td {
+        padding: 8px;
+        text-align: left;
+        border-bottom: 1px solid #ddd;
+    }
+    th {
+        background-color: #f2f2f2;
+    }
+</style>
+</head>
+<body>
+<table>
+    <tr>
+        <th>WER</th>
+        <th>False HallucER</th>
+        <th>Runtime</th>
+        <th>Batch Size</th>
+        <th>Memory Usage</th>
+    </tr>
+    <tr>
+        <td>1.6</td>
+        <td>499.76</td>
+        <td>1.72</td>
+        <td>64</td>
+        <td>20049</td>
+    </tr>
+    <tr>
+        <td>~</td>
+        <td>~</td>
+        <td>Predictions &gt; References: 34%</td>
+        <td>~</td>
+        <td>~</td>
+    </tr>
+    <tr>
+        <td>~</td>
+        <td>~</td>
+        <td>Predictions &lt; References: 30%</td>
+        <td>~</td>
+        <td>~</td>
+    </tr>
+    <tr>
+        <td>~</td>
+        <td>~</td>
+        <td>Predictions = References: 35%</td>
+        <td>~</td>
+        <td>~</td>
+    </tr>
+</table>
+</body>
+</html>
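The two kinds of figures in the table can be illustrated with a minimal sketch. It computes a word error rate as word-level edit distance divided by the reference length, and the share of predictions that contain more, fewer, or equally many words than their reference. That the "Predictions > References" rows compare word counts is an assumption, as are the function names `word_error_rate` and `length_comparison` and the sample sentences. The sketch is not the evaluation script behind the reported numbers, and it does not reproduce the False HallucER metric.

```python
def word_error_rate(reference: str, prediction: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), prediction.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words seen so far
    # and the first j predicted words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word is missing
                curr[j - 1] + 1,     # insertion: extra predicted word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


def length_comparison(references, predictions):
    """Share of predictions that are longer, shorter, or equal in word count."""
    counts = {"longer": 0, "shorter": 0, "equal": 0}
    for ref, hyp in zip(references, predictions):
        n_ref, n_hyp = len(ref.split()), len(hyp.split())
        if n_hyp > n_ref:
            counts["longer"] += 1
        elif n_hyp < n_ref:
            counts["shorter"] += 1
        else:
            counts["equal"] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one reference/prediction pair is required")
    return {key: value / total for key, value in counts.items()}


# Hypothetical sample data, not taken from the dataset.
references = ["open the TIA portal", "add a new device", "compile the project"]
predictions = ["open the TIA portal", "add new device", "compile the the project"]

for ref, hyp in zip(references, predictions):
    print(f"{word_error_rate(ref, hyp):.2f}  {hyp!r}")
print(length_comparison(references, predictions))
```

A prediction with more words than its reference points to inserted (possibly hallucinated) words, while a shorter one points to deletions, which is why the length comparison is a useful companion to WER.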
+# Dataset
+The underlying dataset is <a href="https://huggingface.co/datasets/vimey/whispered_TIA_normal">vimey/whispered_TIA_normal</a>.
+# Inference
+```python
+import librosa
+import torch
+from transformers import WhisperProcessor, WhisperForConditionalGeneration
+# Path to the audio file
+file = "/path/to/audio"
+# Load the waveform and resample it to 16 kHz, the rate Whisper expects
+arr, sampling_rate = librosa.load(file, sr=16000)
+# Load whisper model and processor
+processor = WhisperProcessor.from_pretrained("openai/whisper-small")
+model = WhisperForConditionalGeneration.from_pretrained("vimey/whispered_TIA_small_ad_tokenization_encoder_freezing_normal")
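+# Preprocessing: convert the waveform to log-Mel input features
+input_features = processor(arr, return_tensors="pt", sampling_rate=sampling_rate).input_features
+# Prediction: the model card declares English, so force English transcription
+forced_decoder_ids = processor.get_decoder_prompt_ids(language="en", task="transcribe")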
+predicted_ids = model.generate(input_features, forced_decoder_ids=forced_decoder_ids)
+transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
+print(transcription)
+```