In order to train the model, the following datasets were used (a loading sketch is shown below):
- ABSC amazon review
- SST2
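
Both corpora can be pulled in with the Hugging Face `datasets` library; a minimal sketch (the ABSC Amazon review files are placeholders, since this README does not give their exact source):

```python
from datasets import load_dataset

# SST-2 is distributed as part of the GLUE benchmark on the Hugging Face Hub
sst2 = load_dataset("glue", "sst2")

# Placeholder for the ABSC Amazon review data: adjust the paths to your local copy
absc = load_dataset(
    "csv",
    data_files={"train": "absc_amazon_train.csv", "test": "absc_amazon_test.csv"},
)
```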
# Model parameters

Defined training arguments:

```python
TrainingArguments(
    label_smoothing_factor=0.1,  # add label smoothing
    evaluation_strategy="epoch",
    greater_is_better=True,
    weight_decay=0.02,  # add weight decay
    num_train_epochs=10,
    learning_rate=5e-6,  # 1e-5,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-6,
    max_grad_norm=0.5,  # 1.0, gradient clipping
    lr_scheduler_type="cosine",
    per_device_train_batch_size=48,
    per_device_eval_batch_size=48,
    gradient_accumulation_steps=1,
    gradient_checkpointing=True,
    warmup_ratio=0.1,
    fp16=False,
    logging_strategy="epoch",
    save_strategy="epoch",
    metric_for_best_model="f1",
    save_total_limit=3,
)
```
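
These arguments are passed to a standard Hugging Face `Trainer`; a minimal sketch, assuming the block above is bound to a variable `training_args` and that the model, the tokenized splits, and a `compute_metrics` function (which must return an `"f1"` key, since `metric_for_best_model="f1"`) are defined elsewhere:

```python
from transformers import Trainer

trainer = Trainer(
    model=model,                      # sequence-classification model (dropout settings below)
    args=training_args,               # the TrainingArguments shown above
    train_dataset=train_dataset,      # placeholder: tokenized training split
    eval_dataset=eval_dataset,        # placeholder: tokenized validation split
    compute_metrics=compute_metrics,  # placeholder: returns {"f1": ..., "accuracy": ...}
)
trainer.train()
```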

Additionally, the dropout rates were changed to:

```python
model.config.classifier_dropout = 0.3            # classifier head dropout
model.config.hidden_dropout_prob = 0.2           # hidden layer dropout
model.config.attention_probs_dropout_prob = 0.2  # attention dropout
```
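
For reference, the same rates can also be supplied when the model is loaded, so that the dropout modules are constructed with these values from the start; a sketch, assuming a BERT-style base checkpoint (the checkpoint name and label count are placeholders, as the README does not name the base model):

```python
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",   # placeholder checkpoint
    num_labels=2,          # placeholder label count
    classifier_dropout=0.3,
    hidden_dropout_prob=0.2,
    attention_probs_dropout_prob=0.2,
)
```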

Also, in order to improve model generalization, we use a custom `compute_loss` with a focal loss function and pre-computed class weights:

```python
def compute_loss(self, model, inputs, return_outputs=False, num_items_in_batch=None):
    labels = inputs.pop("labels")
    labels = labels.to(model.device)
    # forward pass
    outputs = model(**inputs)
    logits = outputs.logits.float().to(model.device)
    # per-sample weighted cross-entropy (no reduction yet)
    criterion = torch.nn.CrossEntropyLoss(weight=self.tensor_class_w, reduction="none")
    criterion = criterion.to(model.device)
    loss = criterion(logits.view(-1, self.model.config.num_labels), labels.view(-1))
    if self.tensor_class_w is not None:
        # in case of imbalanced data, apply the focal modulation (1 - pt) ** gamma
        pt = torch.exp(-loss)
        loss = (1 - pt) ** self.gamma * loss
    loss = loss.mean()
    return (loss, outputs) if return_outputs else loss
```
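
`self.tensor_class_w` and `self.gamma` refer to attributes of the custom `Trainer` subclass whose `compute_loss` is overridden above; a minimal sketch of how it might be wired up, with class weights pre-computed from the training labels (the class name, `gamma` value, and dataset variables are illustrative):

```python
import numpy as np
import torch
from sklearn.utils.class_weight import compute_class_weight
from transformers import Trainer

class FocalLossTrainer(Trainer):
    def __init__(self, *args, tensor_class_w=None, gamma=2.0, **kwargs):
        super().__init__(*args, **kwargs)
        self.tensor_class_w = tensor_class_w
        self.gamma = gamma

    # compute_loss as defined above goes here

# pre-compute balanced class weights from the (placeholder) training split
train_labels = np.array(train_dataset["label"])
class_weights = compute_class_weight("balanced", classes=np.unique(train_labels), y=train_labels)

trainer = FocalLossTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tensor_class_w=torch.tensor(class_weights, dtype=torch.float),
    gamma=2.0,
)
```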

# Evaluation and comparison with Vanilla and GPT-4o models
| Dataset | Model | F1 | Accuracy |