|
--- |
|
license: mit |
|
datasets: |
|
- tyqiangz/multilingual-sentiments |
|
- cardiffnlp/tweet_sentiment_multilingual |
|
- mteb/tweet_sentiment_multilingual |
|
- Sp1786/multiclass-sentiment-analysis-dataset |
|
- stanfordnlp/sst2 |
|
- statmt/cc100 |
|
language: |
|
- en |
|
- de |
|
- es |
|
- fr |
|
- ja |
|
- zh |
|
- id |
|
- ar |
|
- hi |
|
- it |
|
- ms |
|
- pt |
|
metrics: |
|
- accuracy |
|
- f1 |
|
base_model: |
|
- microsoft/mdeberta-v3-base |
|
tags: |
|
- sentiment |
|
--- |
|
|
|
# Model |
|
|
|
A multilingual sentiment classification model built on Microsoft's multilingual [mDeBERTa-v3 base model](https://huggingface.co/microsoft/mdeberta-v3-base).
|
The base model was originally pre-trained on the CC100 multilingual dataset, which covers more than 100 languages. This repository provides a version fine-tuned for multilingual sentiment analysis.

The model was fine-tuned on multiple datasets in multiple languages, with additional class weights over the sentiment categories (negative, neutral, positive).

The following datasets were used for training (a loading sketch follows the list):
|
- tyqiangz/multilingual-sentiments |
|
- cardiffnlp/tweet_sentiment_multilingual |
|
- mteb/tweet_sentiment_multilingual |
|
- Sp1786/multiclass-sentiment-analysis-dataset |
|
- ABSC Amazon reviews
|
- SST2 |
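
For illustration, here is a hedged sketch of how such sources could be pooled into one training corpus with a shared label scheme; the config names and label mappings below are assumptions, so verify them against each dataset card before reusing this:

```python
from datasets import concatenate_datasets, load_dataset

# Illustrative sketch, not the exact training script: pool two of the listed
# datasets into one {text, label} corpus using the shared label scheme
# {0: negative, 1: neutral, 2: positive}.
tweets = load_dataset("cardiffnlp/tweet_sentiment_multilingual", "english", split="train")
sst2 = load_dataset("stanfordnlp/sst2", split="train")

# SST-2 is binary {0: negative, 1: positive}; remap positive to index 2 so it
# fits the three-class scheme (assumed mapping).
sst2 = sst2.map(
    lambda ex: {"text": ex["sentence"], "label": 2 if ex["label"] == 1 else 0},
    remove_columns=sst2.column_names,
)
sst2 = sst2.cast_column("label", tweets.features["label"])

train_corpus = concatenate_datasets(
    [tweets.select_columns(["text", "label"]), sst2]
).shuffle(seed=42)
```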
|
|
|
# Model parameters |
|
|
|
The training arguments were defined as follows:
|
```python |
|
TrainingArguments(
    label_smoothing_factor=0.1,       # label smoothing
    evaluation_strategy="epoch",
    greater_is_better=True,
    weight_decay=0.02,                # weight decay for regularization
    num_train_epochs=10,
    learning_rate=5e-6,               # lowered from 1e-5
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-6,
    max_grad_norm=0.5,                # gradient clipping, lowered from 1.0
    lr_scheduler_type='cosine',
    per_device_train_batch_size=48,
    per_device_eval_batch_size=48,
    gradient_accumulation_steps=1,
    gradient_checkpointing=True,
    warmup_ratio=0.1,
    fp16=False,
    logging_strategy="epoch",
    save_strategy="epoch",
    metric_for_best_model="f1",
    save_total_limit=3,
)
|
``` |
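
For context, here is a minimal sketch of how these arguments are typically wired into a `Trainer`; the dataset variables and the `compute_metrics` helper are placeholders, not code shipped with this repository:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, Trainer

tokenizer = AutoTokenizer.from_pretrained("microsoft/mdeberta-v3-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/mdeberta-v3-base", num_labels=3
)

trainer = Trainer(
    model=model,
    args=training_args,               # the TrainingArguments shown above
    train_dataset=train_dataset,      # placeholder: tokenized training split
    eval_dataset=eval_dataset,        # placeholder: tokenized validation split
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,  # placeholder: returns {"accuracy": ..., "f1": ...}
)
trainer.train()
```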
|
Additionally, the dropout rates were changed to:
|
```python |
|
model.config.classifier_dropout = 0.3 # Set classifier dropout rate |
|
model.config.hidden_dropout_prob = 0.2 # Add hidden layer dropout |
|
model.config.attention_probs_dropout_prob = 0.2 # Add attention dropout |
|
``` |
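
Since dropout probabilities are read when the model's layers are constructed, one way to make sure such overrides take effect is to apply them to the config before loading the weights; a minimal sketch:

```python
from transformers import AutoConfig, AutoModelForSequenceClassification

# Apply the same dropout overrides at load time via the config object, so the
# layers are built with these probabilities from the start.
config = AutoConfig.from_pretrained("microsoft/mdeberta-v3-base", num_labels=3)
config.classifier_dropout = 0.3
config.hidden_dropout_prob = 0.2
config.attention_probs_dropout_prob = 0.2
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/mdeberta-v3-base", config=config
)
```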
|
|
|
Also, to improve model generalization, we override the trainer's `compute_loss` with a focal loss that uses pre-computed class weights:
|
```python |
|
import torch

def compute_loss(self, model, inputs, return_outputs=False, num_items_in_batch=None):
    labels = inputs.pop("labels").to(model.device)
    # Forward pass
    outputs = model(**inputs)
    logits = outputs.logits.float().to(model.device)
    # Per-sample cross-entropy, weighted by the pre-computed class weights
    criterion = torch.nn.CrossEntropyLoss(weight=self.tensor_class_w, reduction='none').to(model.device)
    loss = criterion(logits.view(-1, self.model.config.num_labels), labels.view(-1))
    if self.tensor_class_w is not None:
        # In case of imbalanced data, apply the focal modulation (1 - pt)^gamma
        pt = torch.exp(-loss)
        loss = ((1 - pt) ** self.gamma * loss).mean()
    else:
        loss = loss.mean()
    return (loss, outputs) if return_outputs else loss
|
``` |
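
The snippet above assumes a `Trainer` subclass carrying `tensor_class_w` and `gamma`. Below is a hedged sketch of that wiring; the inverse-frequency weighting and the `gamma=2.0` value are assumptions, not values published with this model:

```python
import numpy as np
import torch
from transformers import Trainer

# Inverse-frequency class weights from the training labels (placeholder array).
counts = np.bincount(np.asarray(train_corpus["label"]), minlength=3)
class_w = counts.sum() / (len(counts) * counts)

class FocalLossTrainer(Trainer):
    def __init__(self, *args, tensor_class_w=None, gamma=2.0, **kwargs):
        super().__init__(*args, **kwargs)
        self.tensor_class_w = tensor_class_w
        self.gamma = gamma

    compute_loss = compute_loss  # the method defined above

trainer = FocalLossTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # placeholder: tokenized training split
    tensor_class_w=torch.tensor(class_w, dtype=torch.float32),
    gamma=2.0,  # standard focal-loss focusing parameter (assumed value)
)
```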
|
|
|
# Usage |
|
|
|
```python |
|
from transformers import pipeline
|
model = pipeline(task='sentiment-analysis', model='alexander-sh/mDeBERTa-v3-multi-sent', device='cuda') |
|
model('Keep your face always toward the sunshine—and shadows will fall behind you.') |
|
>>> [{'label': 'positive', 'score': 0.6478521227836609}] |
|
model('I am not coming with you.') |
|
>>> [{'label': 'neutral', 'score': 0.790919840335846}] |
|
model("I am hating that my transformer model don't work properly.") |
|
>>> [{'label': 'negative', 'score': 0.7474458813667297}] |
|
``` |
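
For batched inference, the model can also be called directly instead of through the pipeline wrapper; a short sketch, assuming the checkpoint's `id2label` mapping is set (which the pipeline outputs above suggest):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("alexander-sh/mDeBERTa-v3-multi-sent")
model = AutoModelForSequenceClassification.from_pretrained(
    "alexander-sh/mDeBERTa-v3-multi-sent"
).eval()

texts = ["C'est magnifique !", "Das war eine Katastrophe."]
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    probs = model(**batch).logits.softmax(dim=-1)
for text, p in zip(texts, probs):
    idx = int(p.argmax())
    print(text, model.config.id2label[idx], float(p[idx]))
```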
|
|
|
# Evaluation and comparison with the vanilla base model and GPT-4o
|
|
|
| Dataset          | Model   | F1     | Accuracy |
|------------------|---------|--------|----------|
| **sst2**         | Vanilla | 0.0000 | 0.0000   |
|                  | Our     | 0.6161 | 0.9231   |
|                  | GPT-4o  | 0.6113 | 0.8605   |
| **sent-eng**     | Vanilla | 0.2453 | 0.5820   |
|                  | Our     | 0.6289 | 0.6470   |
|                  | GPT-4o  | 0.4611 | 0.5870   |
| **sent-twi**     | Vanilla | 0.0889 | 0.1538   |
|                  | Our     | 0.3368 | 0.3488   |
|                  | GPT-4o  | 0.5049 | 0.5385   |
| **mixed**        | Vanilla | 0.0000 | 0.0000   |
|                  | Our     | 0.5644 | 0.7786   |
|                  | GPT-4o  | 0.5336 | 0.6863   |
| **absc-laptop**  | Vanilla | 0.1475 | 0.2842   |
|                  | Our     | 0.5513 | 0.6682   |
|                  | GPT-4o  | 0.6679 | 0.7642   |
| **absc-rest**    | Vanilla | 0.1045 | 0.1858   |
|                  | Our     | 0.6149 | 0.7726   |
|                  | GPT-4o  | 0.7057 | 0.8385   |
| **stanford**     | Vanilla | 0.1455 | 0.2791   |
|                  | Our     | 0.8352 | 0.8353   |
|                  | GPT-4o  | 0.8045 | 0.8032   |
| **amazon-var**   | Vanilla | 0.0000 | 0.0000   |
|                  | Our     | 0.6432 | 0.9647   |
|                  | GPT-4o  | -----  | 0.9450   |
|
|
|
F1 scores are computed with macro averaging.
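
A sketch of the corresponding metric computation; `y_true` and `y_pred` are placeholder arrays of integer class labels:

```python
from sklearn.metrics import accuracy_score, f1_score

f1 = f1_score(y_true, y_pred, average="macro")  # macro-averaged F1, as in the table
accuracy = accuracy_score(y_true, y_pred)
```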
|
|
|
# Source code |
|
[GitHub Repository](https://github.com/alexdrk14/mDeBERTa-v3-multi-sent) |