In order to train the model, the following datasets were used (a loading sketch is shown below):
- ABSC amazon review
- SST2
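
Both corpora can be pulled in with the Hugging Face `datasets` library; a minimal sketch (the ABSC Amazon review files are placeholders, since this README does not give their exact source):

```python
from datasets import load_dataset

# SST-2 is distributed as part of the GLUE benchmark on the Hugging Face Hub
sst2 = load_dataset("glue", "sst2")

# Placeholder for the ABSC Amazon review data: adjust the paths to your local copy
absc = load_dataset(
    "csv",
    data_files={"train": "absc_amazon_train.csv", "test": "absc_amazon_test.csv"},
)
```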
# Model parameters

Defined training arguments:

```python
TrainingArguments(
    label_smoothing_factor=0.1,  # add label smoothing
    evaluation_strategy="epoch",
    greater_is_better=True,
    weight_decay=0.02,  # add weight decay
    num_train_epochs=10,
    learning_rate=5e-6,  # 1e-5,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-6,
    max_grad_norm=0.5,  # 1.0, gradient clipping
    lr_scheduler_type="cosine",
    per_device_train_batch_size=48,
    per_device_eval_batch_size=48,
    gradient_accumulation_steps=1,
    gradient_checkpointing=True,
    warmup_ratio=0.1,
    fp16=False,
    logging_strategy="epoch",
    save_strategy="epoch",
    metric_for_best_model="f1",
    save_total_limit=3,
)
```
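
These arguments are passed to a standard Hugging Face `Trainer`; a minimal sketch, assuming the block above is bound to a variable `training_args` and that the model, the tokenized splits, and a `compute_metrics` function (which must return an `"f1"` key, since `metric_for_best_model="f1"`) are defined elsewhere:

```python
from transformers import Trainer

trainer = Trainer(
    model=model,                      # sequence-classification model (dropout settings below)
    args=training_args,               # the TrainingArguments shown above
    train_dataset=train_dataset,      # placeholder: tokenized training split
    eval_dataset=eval_dataset,        # placeholder: tokenized validation split
    compute_metrics=compute_metrics,  # placeholder: returns {"f1": ..., "accuracy": ...}
)
trainer.train()
```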

Additionally, the dropout rates were changed to:

```python
model.config.classifier_dropout = 0.3            # classifier head dropout
model.config.hidden_dropout_prob = 0.2           # hidden layer dropout
model.config.attention_probs_dropout_prob = 0.2  # attention dropout
```
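
For reference, the same rates can also be supplied when the model is loaded, so that the dropout modules are constructed with these values from the start; a sketch, assuming a BERT-style base checkpoint (the checkpoint name and label count are placeholders, as the README does not name the base model):

```python
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",   # placeholder checkpoint
    num_labels=2,          # placeholder label count
    classifier_dropout=0.3,
    hidden_dropout_prob=0.2,
    attention_probs_dropout_prob=0.2,
)
```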

Also, in order to improve model generalization, we use a custom `compute_loss` with a focal loss function and pre-computed class weights:

```python
def compute_loss(self, model, inputs, return_outputs=False, num_items_in_batch=None):
    labels = inputs.pop("labels")
    labels = labels.to(model.device)
    # forward pass
    outputs = model(**inputs)
    logits = outputs.logits.float().to(model.device)
    # per-sample weighted cross-entropy (no reduction yet)
    criterion = torch.nn.CrossEntropyLoss(weight=self.tensor_class_w, reduction="none")
    criterion = criterion.to(model.device)
    loss = criterion(logits.view(-1, self.model.config.num_labels), labels.view(-1))
    if self.tensor_class_w is not None:
        # in case of imbalanced data, apply the focal modulation (1 - pt) ** gamma
        pt = torch.exp(-loss)
        loss = (1 - pt) ** self.gamma * loss
    loss = loss.mean()
    return (loss, outputs) if return_outputs else loss
```
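
`self.tensor_class_w` and `self.gamma` refer to attributes of the custom `Trainer` subclass whose `compute_loss` is overridden above; a minimal sketch of how it might be wired up, with class weights pre-computed from the training labels (the class name, `gamma` value, and dataset variables are illustrative):

```python
import numpy as np
import torch
from sklearn.utils.class_weight import compute_class_weight
from transformers import Trainer

class FocalLossTrainer(Trainer):
    def __init__(self, *args, tensor_class_w=None, gamma=2.0, **kwargs):
        super().__init__(*args, **kwargs)
        self.tensor_class_w = tensor_class_w
        self.gamma = gamma

    # compute_loss as defined above goes here

# pre-compute balanced class weights from the (placeholder) training split
train_labels = np.array(train_dataset["label"])
class_weights = compute_class_weight("balanced", classes=np.unique(train_labels), y=train_labels)

trainer = FocalLossTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tensor_class_w=torch.tensor(class_weights, dtype=torch.float),
    gamma=2.0,
)
```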

# Evaluation and comparison with Vanilla and GPT-4o models
| Dataset | Model | F1 | Accuracy |