---
license: mit
datasets:
- tyqiangz/multilingual-sentiments
- cardiffnlp/tweet_sentiment_multilingual
- mteb/tweet_sentiment_multilingual
- Sp1786/multiclass-sentiment-analysis-dataset
- stanfordnlp/sst2
- statmt/cc100
language:
- en
- de
- es
- fr
- ja
- zh
- id
- ar
- hi
- it
- ms
- pt
metrics:
- accuracy
- f1
base_model:
- microsoft/mdeberta-v3-base
tags:
- sentiment
---

# Model 

A multilingual sentiment classification model built on Microsoft's multilingual [mDeBERTa-v3 base model](https://huggingface.co/microsoft/mdeberta-v3-base). The base model was originally pre-trained on the CC100 multilingual dataset, which covers more than 100 languages; this repository provides a version fine-tuned for multilingual sentiment analysis.
The model was trained on multiple datasets in multiple languages, with additional class weights over the sentiment categories (Negative, Positive, Neutral); a sketch of how such weights can be derived follows the dataset list below. The following datasets were used for training:
 - tyqiangz/multilingual-sentiments
 - cardiffnlp/tweet_sentiment_multilingual
 - mteb/tweet_sentiment_multilingual
 - Sp1786/multiclass-sentiment-analysis-dataset
 - ABSC Amazon review
 - SST2
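
The exact class weights are not published with the card; a common way to derive them (a sketch, not necessarily the authors' procedure) is scikit-learn's `compute_class_weight`:
```python
import numpy as np
import torch
from sklearn.utils.class_weight import compute_class_weight

# `train_labels` stands in for the concatenated label column of the merged
# training datasets; the id order (0=negative, 1=neutral, 2=positive) is
# illustrative, not taken from the card.
train_labels = np.array([0, 0, 0, 1, 2, 2])

weights = compute_class_weight(
    class_weight="balanced",
    classes=np.unique(train_labels),
    y=train_labels,
)
# Tensor handed to the weighted loss shown in the "Model parameters" section
tensor_class_w = torch.tensor(weights, dtype=torch.float)
```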

# Model parameters

Defined training arguments (the `output_dir` value below is a placeholder, added because `TrainingArguments` expects one):
```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./checkpoints",   # placeholder output directory
    label_smoothing_factor=0.1,   # label smoothing
    evaluation_strategy="epoch",
    greater_is_better=True,
    weight_decay=0.02,            # weight decay for regularization
    num_train_epochs=10,
    learning_rate=5e-6,           # lowered from 1e-5
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-6,
    max_grad_norm=0.5,            # gradient clipping, lowered from 1.0
    lr_scheduler_type="cosine",
    per_device_train_batch_size=48,
    per_device_eval_batch_size=48,
    gradient_accumulation_steps=1,
    gradient_checkpointing=True,
    warmup_ratio=0.1,
    fp16=False,
    logging_strategy="epoch",
    save_strategy="epoch",
    metric_for_best_model="f1",
    save_total_limit=3,
)
```
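
The card does not show the `compute_metrics` callback that `metric_for_best_model="f1"` relies on; a minimal sketch with scikit-learn, consistent with the macro-averaged F1 reported in the evaluation section:
```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def compute_metrics(eval_pred):
    # Passed to the Trainer; the "f1" key matches metric_for_best_model
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1": f1_score(labels, preds, average="macro"),
    }
```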
Additionally, the dropout rates were changed to:
```python
model.config.classifier_dropout = 0.3  # Set classifier dropout rate
model.config.hidden_dropout_prob = 0.2  # Add hidden layer dropout
model.config.attention_probs_dropout_prob = 0.2  # Add attention dropout
```
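
The loading code for `model` is not included in the card; a plausible instantiation of the base checkpoint with a three-class head (an assumption) would be:
```python
from transformers import AutoModelForSequenceClassification

# Assumed label set: negative / neutral / positive
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/mdeberta-v3-base",
    num_labels=3,
)
# ...followed by the dropout overrides shown above.
```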

Also, to improve generalization on imbalanced data, we compute a custom loss: a focal loss built on top of a cross-entropy loss with pre-computed class weights:
```python
def compute_loss(self, model, inputs, return_outputs=False, num_items_in_batch=None):
    labels = inputs.pop("labels").to(model.device)
    # Forward pass
    outputs = model(**inputs)
    logits = outputs.logits.float().to(model.device)
    # Weighted cross-entropy, kept per-sample so the focal term can be applied
    loss_fct = torch.nn.CrossEntropyLoss(
        weight=self.tensor_class_w, reduction="none"
    ).to(model.device)  # moves the class-weight buffer onto the model device
    loss = loss_fct(logits.view(-1, self.model.config.num_labels), labels.view(-1))
    if self.tensor_class_w is not None:
        # In case of imbalanced data, compute the focal loss:
        # down-weight well-classified samples by (1 - p_t) ** gamma
        pt = torch.exp(-loss)
        loss = (1 - pt) ** self.gamma * loss
    loss = loss.mean()
    return (loss, outputs) if return_outputs else loss
```
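
The method above overrides `Trainer.compute_loss` and applies the focal term $(1 - p_t)^\gamma$ to the per-sample weighted cross-entropy. A minimal sketch of the surrounding subclass; the class name, constructor, and default $\gamma$ are assumptions, since only the method is published:
```python
import torch
from transformers import Trainer

class FocalLossTrainer(Trainer):  # hypothetical name
    def __init__(self, *args, tensor_class_w=None, gamma=2.0, **kwargs):
        super().__init__(*args, **kwargs)
        self.tensor_class_w = tensor_class_w  # pre-computed class weights
        self.gamma = gamma                    # focal-loss focusing parameter

    # compute_loss(...) exactly as defined above
```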

# Usage

```python
from transformers import pipeline
model = pipeline(task='sentiment-analysis', model='alexander-sh/mDeBERTa-v3-multi-sent', device='cuda')
model('Keep your face always toward the sunshine—and shadows will fall behind you.')
>>> [{'label': 'positive', 'score': 0.6478521227836609}]
model('I am not coming with you.')
>>> [{'label': 'neutral', 'score': 0.790919840335846}]
model("I am hating that my transformer model don't work properly.")
>>> [{'label': 'negative', 'score': 0.7474458813667297}]
```
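
For batch scoring or custom post-processing, the checkpoint can also be used without the pipeline wrapper. This is the standard `transformers` pattern rather than code from the card:
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "alexander-sh/mDeBERTa-v3-multi-sent"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name).eval()

texts = ["Keep your face always toward the sunshine.", "I am not coming with you."]
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)
for text, p in zip(texts, probs):
    label = model.config.id2label[int(p.argmax())]
    print(f"{text!r} -> {label} ({p.max().item():.4f})")
```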

# Evaluation and comparison with the vanilla model and GPT-4o

| Dataset          | Model   | F1 (macro) | Accuracy |
|------------------|---------|------------|----------|
| **sst2**         | Vanilla | 0.0000     | 0.0000   |
|                  | Our     | 0.6161     | 0.9231   |
|                  | GPT-4o  | 0.6113     | 0.8605   |
| **sent-eng**     | Vanilla | 0.2453     | 0.5820   |
|                  | Our     | 0.6289     | 0.6470   |
|                  | GPT-4o  | 0.4611     | 0.5870   |
| **sent-twi**     | Vanilla | 0.0889     | 0.1538   |
|                  | Our     | 0.3368     | 0.3488   |
|                  | GPT-4o  | 0.5049     | 0.5385   |
| **mixed**        | Vanilla | 0.0000     | 0.0000   |
|                  | Our     | 0.5644     | 0.7786   |
|                  | GPT-4o  | 0.5336     | 0.6863   |
| **absc-laptop**  | Vanilla | 0.1475     | 0.2842   |
|                  | Our     | 0.5513     | 0.6682   |
|                  | GPT-4o  | 0.6679     | 0.7642   |
| **absc-rest**    | Vanilla | 0.1045     | 0.1858   |
|                  | Our     | 0.6149     | 0.7726   |
|                  | GPT-4o  | 0.7057     | 0.8385   |
| **stanford**     | Vanilla | 0.1455     | 0.2791   |
|                  | Our     | 0.8352     | 0.8353   |
|                  | GPT-4o  | 0.8045     | 0.8032   |
| **amazon-var**   | Vanilla | 0.0000     | 0.0000   |
|                  | Our     | 0.6432     | 0.9647   |
|                  | GPT-4o  | n/a        | 0.9450   |

F1 scores are computed with macro averaging.
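
That is, F1 is computed per class and averaged without class-frequency weighting, so each of the three sentiment classes contributes equally:

$$
\mathrm{F1}_{\mathrm{macro}} = \frac{1}{C}\sum_{c=1}^{C}\frac{2\,P_c\,R_c}{P_c + R_c}
$$

where $P_c$ and $R_c$ are the precision and recall of class $c$, and $C$ is the number of classes.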

# Source code
[GitHub Repository](https://github.com/alexdrk14/mDeBERTa-v3-multi-sent)