alexander-sh committed on
Commit c05e354 · verified · 1 Parent(s): d24534d

Update README.md

Files changed (1)
  1. README.md +57 -0
README.md CHANGED
@@ -42,6 +42,63 @@ In order to train the model, the following datasets were used:
  - ABSC amazon review
  - SST2

+ # Model parameters
+
+ The training arguments were defined as follows:
+ ```python
+ from transformers import TrainingArguments
+
+ training_args = TrainingArguments(
+     label_smoothing_factor=0.1,  # add label smoothing
+     evaluation_strategy="epoch",
+     greater_is_better=True,
+     weight_decay=0.02,  # add weight decay
+     num_train_epochs=10,
+     learning_rate=5e-6,  # 1e-5,
+     optim="adamw_torch",
+     adam_beta1=0.9,
+     adam_beta2=0.999,
+     adam_epsilon=1e-6,
+     max_grad_norm=0.5,  # 1.0, gradient clipping
+     lr_scheduler_type="cosine",
+     per_device_train_batch_size=48,
+     per_device_eval_batch_size=48,
+     gradient_accumulation_steps=1,
+     gradient_checkpointing=True,
+     warmup_ratio=0.1,
+     fp16=False,
+     logging_strategy="epoch",
+     save_strategy="epoch",
+     metric_for_best_model="f1",
+     save_total_limit=3,
+ )
+ ```
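+
+ Since `metric_for_best_model="f1"` drives checkpoint selection, the `Trainer` needs a `compute_metrics` function that reports an `f1` key. A minimal sketch, assuming F1 from `sklearn.metrics` (the weighted averaging mode is an assumption):
+ ```python
+ import numpy as np
+ from sklearn.metrics import accuracy_score, f1_score
+
+ def compute_metrics(eval_pred):
+     logits, labels = eval_pred
+     preds = np.argmax(logits, axis=-1)  # predicted class per example
+     return {
+         "f1": f1_score(labels, preds, average="weighted"),  # assumed averaging
+         "accuracy": accuracy_score(labels, preds),
+     }
+ ```
+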
+ Additionally, the dropout rates were changed to:
+ ```python
+ model.config.classifier_dropout = 0.3  # set classifier dropout rate
+ model.config.hidden_dropout_prob = 0.2  # add hidden-layer dropout
+ model.config.attention_probs_dropout_prob = 0.2  # add attention dropout
+ ```
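+
+ Note that, depending on the `transformers` version, editing `model.config` after the model is built may not update already-instantiated dropout layers. The overrides can instead be passed to `from_pretrained` so the layers are constructed with the new rates; a minimal sketch (the checkpoint name is a placeholder):
+ ```python
+ from transformers import AutoModelForSequenceClassification
+
+ # "roberta-base" is a placeholder; substitute the actual base checkpoint
+ model = AutoModelForSequenceClassification.from_pretrained(
+     "roberta-base",
+     classifier_dropout=0.3,
+     hidden_dropout_prob=0.2,
+     attention_probs_dropout_prob=0.2,
+ )
+ ```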
+
+ To improve model generalization on imbalanced data, we also use a custom `compute_loss` with a focal loss function and pre-computed class weights:
+ ```python
+ import torch
+
+ def compute_loss(self, model, inputs, return_outputs=False, num_items_in_batch=None):
+     labels = inputs.pop("labels").to(model.device)
+     # Forward pass
+     outputs = model(**inputs)
+     logits = outputs.logits.float().to(model.device)
+     # Per-sample cross-entropy, weighted by the pre-computed class weights
+     loss_fct = torch.nn.CrossEntropyLoss(
+         weight=self.tensor_class_w, reduction="none"
+     ).to(model.device)
+     per_sample_loss = loss_fct(
+         logits.view(-1, self.model.config.num_labels), labels.view(-1)
+     )
+     if self.tensor_class_w is not None:
+         # In case of imbalanced data, add the focal term to down-weight easy examples
+         pt = torch.exp(-per_sample_loss)
+         loss = ((1 - pt) ** self.gamma * per_sample_loss).mean()
+     else:
+         loss = per_sample_loss.mean()
+     return (loss, outputs) if return_outputs else loss
+ ```
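+
+ For context, a minimal sketch of how this override might be wired up: a hypothetical `FocalLossTrainer` subclass that stores the pre-computed class weights (here derived with `sklearn.utils.class_weight.compute_class_weight`; `train_labels`, `train_dataset`, and `eval_dataset` are assumed to exist) and the focal `gamma` (2.0 is an assumed value):
+ ```python
+ import numpy as np
+ import torch
+ from sklearn.utils.class_weight import compute_class_weight
+ from transformers import Trainer
+
+ class FocalLossTrainer(Trainer):
+     """Hypothetical subclass: stores class weights and gamma used by compute_loss."""
+
+     def __init__(self, *args, tensor_class_w=None, gamma=2.0, **kwargs):
+         super().__init__(*args, **kwargs)
+         self.tensor_class_w = tensor_class_w
+         self.gamma = gamma
+
+     # compute_loss from the snippet above goes here
+
+ # Balanced class weights from the training labels (train_labels is assumed)
+ weights = compute_class_weight("balanced", classes=np.unique(train_labels), y=train_labels)
+
+ trainer = FocalLossTrainer(
+     model=model,
+     args=training_args,
+     train_dataset=train_dataset,  # assumed dataset objects
+     eval_dataset=eval_dataset,
+     compute_metrics=compute_metrics,
+     tensor_class_w=torch.tensor(weights, dtype=torch.float),
+     gamma=2.0,  # assumed focusing parameter
+ )
+ trainer.train()
+ ```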
+
  # Evaluation and comparison with Vanilla and GPT-4o model:

  | Dataset | Model | F1 | Accuracy |