clapAI/mmBERT-small-multilingual-sentiment
Introduction
mmBERT-small-multilingual-sentiment is a multilingual sentiment classification model, part of the Multilingual-Sentiment collection.
The model is fine-tuned from jhu-clsp/mmBERT-small using the multilingual sentiment dataset clapAI/MultiLingualSentiment.
The model supports multilingual sentiment classification across 16+ languages, including English, Vietnamese, Chinese, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Arabic, and more.
Key Highlights
📈 Improved accuracy: Achieves an F1-score of 82.2 on the test split of clapAI/MultiLingualSentiment.
📜 Long context support: Handles sequences up to 8192 tokens.
🪶 Efficient size: Only 140M parameters, about half the size of the XLM-RoBERTa-base model (278M), with a higher F1-score (82.2 vs. 81.8).
⚡ FlashAttention-2 support: Enables much faster inference on modern GPUs.
Evaluation & Performance
Results on the test split of clapAI/MultiLingualSentiment
Model | Pretrained model | Parameters | Context length (tokens) | F1-score |
---|---|---|---|---|
clapAI/mmBERT-small-multilingual-sentiment | jhu-clsp/mmBERT-small | 140M | 8192 | 82.2 |
modernBERT-base-multilingual-sentiment | ModernBERT-base | 150M | 8192 | 80.16 |
roberta-base-multilingual-sentiment | XLM-RoBERTa-base | 278M | 512 | 81.8 |
How to use
Installation
pip install torch==2.8
pip install transformers==4.55.0
Optional: accelerate inference with FlashAttention-2 (if supported by your GPU):
pip install packaging==25.0 ninja==1.13.0
MAX_JOBS=4 pip install flash-attn==2.8.3 --no-build-isolation
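Whether FlashAttention-2 is usable depends on both the installed package and the GPU. The helper below is a minimal sketch of one way to decide at runtime. The function name `pick_attn_implementation` is hypothetical, the `"sdpa"` fallback is an assumption about a reasonable default, and the compute-capability check is a heuristic for NVIDIA GPUs (FlashAttention-2 targets Ampere and newer), not an official compatibility test.

```python
import importlib.util


def pick_attn_implementation() -> str:
    """Return "flash_attention_2" when it looks usable, otherwise "sdpa".

    Hypothetical helper: the checks are heuristics, not guarantees.
    """
    # The flash-attn package must be installed.
    if importlib.util.find_spec("flash_attn") is None:
        return "sdpa"
    # Without PyTorch there is nothing to accelerate.
    if importlib.util.find_spec("torch") is None:
        return "sdpa"

    import torch

    # FlashAttention-2 only runs on a GPU.
    if not torch.cuda.is_available():
        return "sdpa"

    # Assumption: compute capability 8.0 (Ampere) or newer is required.
    major, _minor = torch.cuda.get_device_capability()
    return "flash_attention_2" if major >= 8 else "sdpa"


print(pick_attn_implementation())
```

The returned string can be passed as `attn_implementation=...` to `from_pretrained`. FlashAttention-2 also expects the model to be loaded in float16 or bfloat16.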
Example Usage
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model_id = "clapAI/mmBERT-small-multilingual-sentiment"

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Use half precision on GPU (bfloat16 where supported); keep float32 on CPU
if device.type == "cuda":
    dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
else:
    dtype = torch.float32

model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    torch_dtype=dtype,
    # Uncomment if device supports FA2
    # attn_implementation="flash_attention_2"
)
model.to(device)
model.eval()

# Retrieve labels from the model's configuration
id2label = model.config.id2label

texts = [
    "I absolutely love the new design of this app!",  # English
    "الخدمة كانت سيئة للغاية.",  # Arabic
    "Ich bin sehr zufrieden mit dem Kauf.",  # German
    "El producto llegó roto y no funciona.",  # Spanish
    "J'adore ce restaurant, la nourriture est délicieuse!",  # French
    "Makanannya benar-benar tidak enak.",  # Indonesian
    "この製品は本当に素晴らしいです!",  # Japanese
    "고객 서비스가 정말 실망스러웠어요.",  # Korean
    "Этот фильм просто потрясающий!",  # Russian
    "Tôi thực sự yêu thích sản phẩm này!",  # Vietnamese
    "质量真的很差。",  # Chinese
]

# Classify each text and print the predicted label
for text in texts:
    inputs = tokenizer(text, return_tensors="pt").to(device)
    with torch.inference_mode():
        outputs = model(**inputs)
    prediction = id2label[outputs.logits.argmax(dim=-1).item()]
    print(f"Text: {text} | Prediction: {prediction}")
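The loop above classifies one text per forward pass. For larger workloads, batching is usually faster. The following is a minimal sketch, assuming the `tokenizer`, `model`, and `texts` objects from the example above. The helper names `batched` and `predict_batch` and the default `batch_size=8` are illustrative choices, not part of the model's API. Inputs are truncated to the 8192-token context length, and a softmax over the logits gives a confidence score for each prediction.

```python
def batched(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def predict_batch(texts, tokenizer, model, batch_size=8, max_length=8192):
    """Return (text, label, probability) tuples for each input text.

    Hypothetical helper; `tokenizer` and `model` are the objects
    loaded in the example above.
    """
    import torch  # imported lazily so `batched` works without PyTorch

    results = []
    for chunk in batched(texts, batch_size):
        # Pad to the longest text in the batch; truncate at the context limit.
        inputs = tokenizer(
            chunk,
            padding=True,
            truncation=True,
            max_length=max_length,
            return_tensors="pt",
        ).to(model.device)
        with torch.inference_mode():
            logits = model(**inputs).logits
        # Softmax in float32 for numerically stable probabilities.
        probs = logits.float().softmax(dim=-1)
        scores, ids = probs.max(dim=-1)
        for text, score, idx in zip(chunk, scores.tolist(), ids.tolist()):
            results.append((text, model.config.id2label[idx], score))
    return results


# Usage (requires the model loaded as in the example above):
# for text, label, score in predict_batch(texts, tokenizer, model):
#     print(f"Text: {text} | Prediction: {label} ({score:.2f})")
```

Larger batches improve throughput but use more memory, especially with long inputs, so the batch size is worth tuning per device.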
Citation
If you use this model, please consider citing:
@misc{clapAI_mmbert_small_multilingual_sentiment,
title={mmBERT-small-multilingual-sentiment: A Multilingual Sentiment Classification Model},
author={clapAI},
howpublished={\url{https://huggingface.co/clapAI/mmBERT-small-multilingual-sentiment}},
year={2025},
}