metadata

language:
  - fa
tags:
  - speech
  - emotion-recognition
  - persian
  - SER
  - ECAPA-TDNN
  - speechbrain
library_name: speechbrain
license: mit
datasets:
  - SHEMO
pipeline_tag: audio-classification
base_model: speechbrain/emotion-recognition-wav2vec2-IEMOCAP

🎙️ Persian Speech Emotion Recognition with SpeechBrain (ShEMO)

This repository provides an ECAPA-TDNN model for speech emotion recognition in Persian, developed using the SpeechBrain toolkit.

The model was trained on ShEMO, a dataset of emotion-annotated Persian speech.
The ECAPA-TDNN architecture is commonly used for speaker recognition and emotion classification.

Model Overview

Architecture: ECAPA-TDNN (Emphasized Channel Attention, Propagation and Aggregation in TDNN)
Features: 80‑dim Mel‑filterbank, input normalization
Loss: Additive Angular Margin (AAM‑Softmax)
Classes: 6 emotions – anger, surprise, happiness, sadness, neutral, fear
Trained on: ShEMO dataset (≈3 k Persian utterances, 87 speakers)
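
The AAM-Softmax loss listed above adds a fixed angular margin to the target class before the softmax, which encourages embeddings of the same emotion to cluster more tightly. A minimal pure-Python sketch for a single example follows. The margin of 0.2 and scale of 30.0 are illustrative assumptions, not values taken from this model's hyperparams.yaml, and `aam_softmax_loss` is a hypothetical helper rather than a SpeechBrain function.

```python
import math

def aam_softmax_loss(cosines, target, margin=0.2, scale=30.0):
    """Cross-entropy with an additive angular margin on the target class.

    cosines: cosine similarity between the embedding and each class weight.
    target:  index of the true class.
    margin and scale are illustrative defaults (assumptions).
    """
    logits = []
    for i, cos_theta in enumerate(cosines):
        if i == target:
            # Clamp before acos for numerical safety, then add the margin to the angle.
            # Production implementations also guard the case theta + margin > pi.
            theta = math.acos(max(-1.0, min(1.0, cos_theta)))
            cos_theta = math.cos(theta + margin)
        logits.append(scale * cos_theta)
    # Numerically stable negative log-softmax of the target logit.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]

# Six classes, matching the six emotions; class 0 is the true class here.
cosines = [0.8, 0.1, 0.0, -0.2, 0.3, 0.1]
loss_with_margin = aam_softmax_loss(cosines, target=0)
loss_without_margin = aam_softmax_loss(cosines, target=0, margin=0.0)
print(loss_with_margin, loss_without_margin)
```

For the same cosine scores, the loss with a margin is larger than without one, so the network has to separate the classes by more than a plain softmax would require.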

Installation

pip install torch torchaudio speechbrain

(Tested with SpeechBrain ≥1.0 and torch ≥2.0.)

Quickstart Inference

from speechbrain.inference.classifiers import EncoderClassifier

classifier = EncoderClassifier.from_hparams(
    source="mobina1380/speechbrain-persian-ser",
    savedir="tmp_cache",          
)

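# Classify a WAV file. classify_file returns a tuple:
# (per-class scores, best score, best class index, text labels)
out_prob, score, index, text_lab = classifier.classify_file("your_audio.wav")
print("Predicted emotion:", text_lab[0])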

No manual Git-LFS setup or snapshot_download calls are needed: from_hparams downloads the model files and caches them in savedir.
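
Besides the top label, classify_file also returns one score per class. A small sketch of ranking all six emotions from such a score vector follows. The label order and score values here are made up for illustration (the real order comes from label_encoder.txt), and `rank_emotions` is a hypothetical helper.

```python
def rank_emotions(scores, labels):
    """Return (label, score) pairs sorted from highest to lowest score."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    return sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)

# Hypothetical label order and scores for a single utterance.
labels = ["anger", "surprise", "happiness", "sadness", "neutral", "fear"]
scores = [0.12, -0.30, 0.05, 0.61, 0.44, -0.10]

ranking = rank_emotions(scores, labels)
top_label, top_score = ranking[0]
print(top_label, top_score)
```

With the real model, the scores for the first file in a batch could be obtained with something like out_prob[0].tolist(), assuming out_prob is the usual [batch, classes] tensor.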

File Structure

README.md                  
hyperparams.yaml           
embedding_model.ckpt       
classifier.ckpt            
normalizer.ckpt            
label_encoder.txt
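
label_encoder.txt maps class indices to emotion names. The sketch below parses such a file without loading the model, assuming it follows the plain-text layout that SpeechBrain's CategoricalEncoder normally writes: one `'label' => index` line per class, then a separator line, then extra properties. The sample content and label names are hypothetical and not copied from this repository, and `parse_label_encoder` is an illustrative helper.

```python
import ast

def parse_label_encoder(text):
    """Parse "'label' => index" lines into an {index: label} dict."""
    index_to_label = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("="):
            # Separator: the lines after it hold encoder properties, not classes.
            break
        label, index = line.rsplit("=>", 1)
        # Both sides are Python literals (a quoted string and an int).
        index_to_label[ast.literal_eval(index.strip())] = ast.literal_eval(label.strip())
    return index_to_label

# Hypothetical file content, for illustration only.
sample = """'anger' => 0
'happiness' => 1
'sadness' => 2
================
'starting_index' => 0
"""

index_to_label = parse_label_encoder(sample)
print(index_to_label)
```

For a real file, the text could be read with open("label_encoder.txt", encoding="utf-8").read() and passed to the same function.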

Limitations

📉 Accuracy drops on subtly expressed emotions and on noisy recordings