metadata
language:
- fa
tags:
- speech
- emotion-recognition
- persian
- SER
- ECAPA-TDNN
- speechbrain
library_name: speechbrain
license: mit
datasets:
- SHEMO
pipeline_tag: audio-classification
base_model: speechbrain/emotion-recognition-wav2vec2-IEMOCAP
🎙️ Persian Speech Emotion Recognition with SpeechBrain (ShEMO)
This repository provides an ECAPA-TDNN model for speech emotion recognition in Persian, developed using the SpeechBrain toolkit.
The model has been trained on the ShEMO dataset, which includes annotated emotional speech in Persian.
ECAPA-TDNN was originally designed for speaker verification and is also widely used for emotion classification.
Model Overview
- Architecture: ECAPA-TDNN (Emphasized Channel Attention, Propagation and Aggregation in TDNN)
- Features: 80‑dim Mel‑filterbank, input normalization
- Loss: Additive Angular Margin (AAM‑Softmax)
- Classes: 6 emotions: anger, surprise, happiness, sadness, neutral, fear
- Trained on: ShEMO dataset (≈3,000 Persian utterances, 87 speakers)
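To make the loss listed above more concrete, here is a minimal pure-Python sketch of how an additive angular margin changes the logits for a single example. It is not the training code of this model: the `margin` and `scale` values and the example cosine scores are illustrative assumptions, and production implementations also handle the case where the angle plus the margin exceeds π, which is omitted here.

```python
import math

def aam_softmax_loss(cosines, target, margin=0.2, scale=30.0):
    """Cross-entropy over AAM-adjusted logits for one example.

    cosines: cosine similarity between the embedding and each class weight.
    target:  index of the true class.
    margin and scale are illustrative values, not this model's settings.
    """
    logits = []
    for i, c in enumerate(cosines):
        c = max(-1.0, min(1.0, c))  # keep acos in its valid domain
        if i == target:
            # Add the angular margin to the true-class angle only
            c = math.cos(math.acos(c) + margin)
        logits.append(scale * c)
    # Numerically stable log-sum-exp, then negative log-likelihood
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target]

cosines = [0.7, 0.1, -0.2, 0.3, 0.0, 0.2]  # hypothetical scores, one per class
print(aam_softmax_loss(cosines, target=0))              # with margin
print(aam_softmax_loss(cosines, target=0, margin=0.0))  # plain scaled softmax
```

The margin lowers the true-class logit, so the same embedding incurs a larger loss than under plain softmax and the network is pushed toward tighter class clusters.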
Installation
pip install torch torchaudio speechbrain
(Tested with SpeechBrain ≥1.0 and torch ≥2.0.)
Quickstart Inference
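from speechbrain.inference.classifiers import EncoderClassifier

# Download the model files from the Hugging Face Hub and cache them locally
classifier = EncoderClassifier.from_hparams(
    source="mobina1380/speechbrain-persian-ser",
    savedir="tmp_cache",
)

# Classify a WAV file; classify_file returns (scores, best score, index, label)
out_prob, score, index, text_lab = classifier.classify_file("your_audio.wav")
print("Predicted emotion:", text_lab[0])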
No manual Git‑LFS setup or snapshot_download calls are needed: one import and one from_hparams call are enough.
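SpeechBrain's `classify_file` returns a tuple of per-class scores, the best score, its index, and the decoded label. If you want the full ranking instead of only the top label, a small helper like the one below may be useful. This is a sketch: `rank_emotions` is a hypothetical helper, and the label order and score values are made up for illustration.

```python
def rank_emotions(scores, labels):
    """Pair each class score with its label and sort from best to worst."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    return sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)

# Hypothetical label order and scores, for illustration only. In practice the
# order comes from the model's label encoder and the scores from
# out_prob.squeeze(0).tolist() after calling classifier.classify_file(...).
labels = ["anger", "surprise", "happiness", "sadness", "neutral", "fear"]
scores = [0.12, -0.05, 0.08, 0.61, 0.30, 0.02]

for label, score in rank_emotions(scores, labels):
    print(f"{label:10s} {score:+.2f}")
```

A small gap between the top two scores is a hint that the prediction is uncertain.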
File Structure
README.md
hyperparams.yaml
embedding_model.ckpt
classifier.ckpt
normalizer.ckpt
label_encoder.txt
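Among these files, label_encoder.txt holds the mapping between class indices and emotion names. The sketch below parses it without loading SpeechBrain. It assumes the text layout written by SpeechBrain's CategoricalEncoder (one `'label' => index` line per class, then a separator line of `=` characters and bookkeeping entries); `parse_label_encoder` is a hypothetical helper and the sample contents are illustrative, so check them against the actual file.

```python
import ast

def parse_label_encoder(text):
    """Return {index: label} from CategoricalEncoder-style text.

    Assumed format: one "'label' => index" line per class, then a separator
    line made of "=" characters followed by metadata entries that are skipped.
    """
    index_to_label = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if set(line) == {"="}:
            break  # everything after the separator is encoder metadata
        label, _, index = line.partition(" => ")
        index_to_label[ast.literal_eval(index)] = ast.literal_eval(label)
    return index_to_label

# Hypothetical file contents; the real label order may differ.
sample = """'anger' => 0
'surprise' => 1
'happiness' => 2
'sadness' => 3
'neutral' => 4
'fear' => 5
================
'starting_index' => 0
"""
print(parse_label_encoder(sample))
```

To use it on the real file, read it with `open("label_encoder.txt", encoding="utf-8").read()` and pass the result to the function.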
Limitations
- 📉 Accuracy drops on subtle emotional expressions and on noisy recordings