---
license: mit
datasets:
- tyqiangz/multilingual-sentiments
- cardiffnlp/tweet_sentiment_multilingual
- mteb/tweet_sentiment_multilingual
- Sp1786/multiclass-sentiment-analysis-dataset
- stanfordnlp/sst2
- statmt/cc100
language:
- en
- de
- es
- fr
- ja
- zh
- id
- ar
- hi
- it
- ms
- pt
metrics:
- accuracy
- f1
base_model:
- microsoft/mdeberta-v3-base
tags:
- sentiment
---

# Model

Multilingual sentiment classification model built on Microsoft's [mDeBERTa-v3 base model](https://huggingface.co/microsoft/mdeberta-v3-base), which was originally pre-trained on the CC100 corpus covering more than 100 languages. This repository provides a version fine-tuned for multilingual sentiment analysis. The model was trained on multiple datasets across multiple languages, with additional class weights over the three sentiment categories (negative, neutral, positive). The following datasets were used for training:

- tyqiangz/multilingual-sentiments
- cardiffnlp/tweet_sentiment_multilingual
- mteb/tweet_sentiment_multilingual
- Sp1786/multiclass-sentiment-analysis-dataset
- ABSC amazon review
- SST2

# Model parameters

The training arguments were defined as follows:

```python
TrainingArguments(
    label_smoothing_factor=0.1,       # label smoothing
    evaluation_strategy="epoch",
    greater_is_better=True,
    weight_decay=0.02,                # weight decay for regularization
    num_train_epochs=10,
    learning_rate=5e-6,               # previously 1e-5
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-6,
    max_grad_norm=0.5,                # gradient clipping; previously 1.0
    lr_scheduler_type='cosine',
    per_device_train_batch_size=48,
    per_device_eval_batch_size=48,
    gradient_accumulation_steps=1,
    gradient_checkpointing=True,
    warmup_ratio=0.1,
    fp16=False,
    logging_strategy="epoch",
    save_strategy="epoch",
    metric_for_best_model="f1",
    save_total_limit=3,
)
```

Additionally, the dropout rates were changed to:

```python
model.config.classifier_dropout = 0.3             # classifier dropout rate
model.config.hidden_dropout_prob = 0.2            # hidden-layer dropout
model.config.attention_probs_dropout_prob = 0.2   # attention dropout
```

To further improve generalization, we use a custom `compute_loss` that combines a focal loss with pre-computed class weights (a sketch of how this override can be wired into a `Trainer` subclass follows the Usage section below):

```python
def compute_loss(self, model, inputs, return_outputs=False, num_items_in_batch=None):
    labels = inputs.pop("labels").to(model.device)
    # Forward pass
    outputs = model(**inputs)
    logits = outputs.logits.float()
    # Per-example cross-entropy with pre-computed class weights
    loss_fct = torch.nn.CrossEntropyLoss(weight=self.tensor_class_w,
                                         reduction='none').to(model.device)
    loss = loss_fct(logits.view(-1, self.model.config.num_labels), labels.view(-1))
    if self.tensor_class_w is not None:
        # For imbalanced data, down-weight easy examples with the focal term
        pt = torch.exp(-loss)
        loss = (1 - pt) ** self.gamma * loss
    loss = loss.mean()
    return (loss, outputs) if return_outputs else loss
```

# Usage

```python
from transformers import pipeline

model = pipeline(task='sentiment-analysis',
                 model='alexander-sh/mDeBERTa-v3-multi-sent',
                 device='cuda')

model('Keep your face always toward the sunshine—and shadows will fall behind you.')
>>> [{'label': 'positive', 'score': 0.6478521227836609}]
model('I am not coming with you.')
>>> [{'label': 'neutral', 'score': 0.790919840335846}]
model("I am hating that my transformer model don't work properly.")
>>> [{'label': 'negative', 'score': 0.7474458813667297}]
```
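The `compute_loss` override shown under Model parameters assumes a `Trainer` subclass that exposes `tensor_class_w` (pre-computed class weights) and `gamma` (the focal-loss focusing parameter). Below is a minimal sketch of that wiring; the subclass name, the default `gamma=2.0`, and the scikit-learn `'balanced'` weight heuristic are our assumptions, not confirmed details of the actual training script:

```python
import numpy as np
import torch
from sklearn.utils.class_weight import compute_class_weight
from transformers import Trainer

class FocalLossTrainer(Trainer):
    """Hypothetical subclass carrying the fields used by the compute_loss override."""

    def __init__(self, *args, tensor_class_w=None, gamma=2.0, **kwargs):
        super().__init__(*args, **kwargs)
        self.tensor_class_w = tensor_class_w  # pre-computed class weights, or None
        self.gamma = gamma                    # focal focusing parameter (assumed 2.0)

    # compute_loss: place the override from the "Model parameters" section here

# Class weights inversely proportional to label frequency (assumed heuristic):
y_train = np.array([0, 0, 0, 1, 2, 2])  # placeholder labels: 0=neg, 1=neu, 2=pos
weights = compute_class_weight('balanced', classes=np.array([0, 1, 2]), y=y_train)
tensor_class_w = torch.tensor(weights, dtype=torch.float32)
```

The `'balanced'` heuristic sets each class weight to `n_samples / (n_classes * count(class))`, so rarer sentiment classes contribute more to the loss.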
# Evaluation and comparison with Vanilla and GPT-4o models

*Vanilla* denotes the mDeBERTa-v3 base model without sentiment fine-tuning.

| Dataset | Model | F1 | Accuracy |
|------------------|---------|--------|----------|
| **sst2** | Vanilla | 0.0000 | 0.0000 |
| | Our | 0.6161 | 0.9231 |
| | GPT-4o | 0.6113 | 0.8605 |
| **sent-eng** | Vanilla | 0.2453 | 0.5820 |
| | Our | 0.6289 | 0.6470 |
| | GPT-4o | 0.4611 | 0.5870 |
| **sent-twi** | Vanilla | 0.0889 | 0.1538 |
| | Our | 0.3368 | 0.3488 |
| | GPT-4o | 0.5049 | 0.5385 |
| **mixed** | Vanilla | 0.0000 | 0.0000 |
| | Our | 0.5644 | 0.7786 |
| | GPT-4o | 0.5336 | 0.6863 |
| **absc-laptop** | Vanilla | 0.1475 | 0.2842 |
| | Our | 0.5513 | 0.6682 |
| | GPT-4o | 0.6679 | 0.7642 |
| **absc-rest** | Vanilla | 0.1045 | 0.1858 |
| | Our | 0.6149 | 0.7726 |
| | GPT-4o | 0.7057 | 0.8385 |
| **stanford** | Vanilla | 0.1455 | 0.2791 |
| | Our | 0.8352 | 0.8353 |
| | GPT-4o | 0.8045 | 0.8032 |
| **amazon-var** | Vanilla | 0.0000 | 0.0000 |
| | Our | 0.6432 | 0.9647 |
| | GPT-4o | ----- | 0.9450 |

F1 scores are macro-averaged.

# Source code

[GitHub Repository](https://github.com/alexdrk14/mDeBERTa-v3-multi-sent)
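For reference, the macro-averaged scores in the table above can be computed with scikit-learn. A minimal sketch with placeholder data; the actual evaluation splits are in the repository linked above:

```python
from sklearn.metrics import accuracy_score, f1_score
from transformers import pipeline

clf = pipeline(task='sentiment-analysis',
               model='alexander-sh/mDeBERTa-v3-multi-sent', device='cuda')

texts = ['I love it.', 'It is fine.', 'I hate it.']   # placeholder test texts
gold = ['positive', 'neutral', 'negative']            # placeholder gold labels

# Pipeline returns a list of {'label': ..., 'score': ...} dicts
pred = [p['label'] for p in clf(texts)]
print('accuracy:', accuracy_score(gold, pred))
print('macro F1:', f1_score(gold, pred, average='macro'))
```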