🎙️ Persian Speech Emotion Recognition with SpeechBrain (ShEMO)

This repository provides an ECAPA-TDNN model for speech emotion recognition in Persian, developed using the SpeechBrain toolkit.

The model was trained on the ShEMO dataset, a corpus of Persian speech annotated with emotion labels.
It uses the ECAPA-TDNN architecture, which was originally designed for speaker verification and is also widely used for emotion classification.

Supported Emotion Classes

The model predicts one of six emotion classes: anger, sadness, neutral, surprise, happiness, fear.

📦 How to use this model locally

You can run inference using the included Python script. Here's how:

1️⃣ Clone the repository

git lfs install
git clone https://huggingface.co/mobina1380/speechbrain-persian-ser
cd speechbrain-persian-ser

2️⃣ Install required libraries

pip install speechbrain torchaudio

3️⃣ Run inference on your audio file

Place your Persian speech file (WAV, mono, 16 kHz) in the repository folder, then run:

from inference import predict
emotion = predict("your_audio.wav")
print("Predicted emotion:", emotion)
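
Because the script expects mono 16 kHz WAV input, it can help to check a file before passing it to predict. The following is a minimal sketch that uses only Python's standard wave module. The helper name check_wav and the fields it returns are illustrative assumptions and are not part of this repository.

```python
import wave


def check_wav(path, expected_rate=16000, expected_channels=1):
    """Return (ok, info) describing whether a WAV file matches the expected format.

    Hypothetical helper for illustration; it is not part of inference.py.
    """
    # wave.open raises wave.Error if the file is not a PCM WAV file.
    with wave.open(path, "rb") as wav_file:
        info = {
            "channels": wav_file.getnchannels(),
            "sample_rate": wav_file.getframerate(),
            "duration_sec": wav_file.getnframes() / wav_file.getframerate(),
        }
    ok = (
        info["channels"] == expected_channels
        and info["sample_rate"] == expected_rate
    )
    return ok, info
```

If ok comes back False, the file likely needs to be downmixed to mono or resampled to 16 kHz with an audio tool of your choice before running inference.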

📁 Repository Structure

speechbrain-persian-ser/
├── inference.py            # Inference logic
├── hyperparams.yaml        # Model configuration
├── custom.yaml             # Optional training config
├── save/                   # Folder with checkpoints
│   └── CKPT+...            # Fine-tuned weights
└── README.md               # You're reading it!
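
If the clone ran without Git LFS, the files under save/ may be small text pointers rather than real weights. The sketch below checks a local clone against the layout listed above and flags any file under save/ that still begins with the Git LFS pointer header. The function name find_problems and its message strings are hypothetical, and the checkpoint folder name is deliberately not assumed.

```python
from pathlib import Path

# Every Git LFS pointer file begins with this line (per the Git LFS spec).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_problems(repo_dir):
    """List missing files and unresolved Git LFS pointers in a local clone.

    Hypothetical helper for illustration; it is not part of this repository.
    """
    repo = Path(repo_dir)
    problems = []
    # File names taken from the repository structure listed above.
    for name in ("inference.py", "hyperparams.yaml"):
        if not (repo / name).is_file():
            problems.append(f"missing: {name}")
    save_dir = repo / "save"
    if not save_dir.is_dir():
        problems.append("missing: save/")
        return problems
    # Checkpoint folder names are not assumed; every file under save/ is scanned.
    for path in sorted(save_dir.rglob("*")):
        if path.is_file():
            with path.open("rb") as handle:
                head = handle.read(len(LFS_POINTER_PREFIX))
            if head == LFS_POINTER_PREFIX:
                relative = path.relative_to(repo).as_posix()
                problems.append(f"LFS pointer not downloaded: {relative}")
    return problems
```

An empty list suggests the expected files are present and the weights were actually downloaded. If pointers are reported, running git lfs install followed by git lfs pull inside the clone usually fetches the real files.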

📄 License

Model: MIT License

Dataset: ShEMO (see the dataset's original license for its terms)

📬 Contact

If you have any questions, feedback, or would like to collaborate, feel free to reach out:

📧 Email: esmaeilimobina98@gmail.com
🤗 Hugging Face: mobina1380
