train_qnli_1744902606

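This model is a fine-tuned version of google/gemma-3-1b-it on the qnli dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0371 (the lowest validation loss in the training results, reached at step 11800; the validation loss at the final step, 40000, is 0.0814)
  • Num Input Tokens Seen: 73102784 (cumulative total at the final step, 40000)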

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • training_steps: 40000
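
The relationship between these values can be sketched in a few lines of Python. This is an illustrative sketch, not the training code: the single-device setting (`NUM_DEVICES = 1`) is an assumption chosen because it makes 4 × 4 match the reported total_train_batch_size of 16, and the absence of warmup and of a minimum learning rate is assumed because neither is listed above. The function names are hypothetical.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 5e-05
TRAIN_BATCH_SIZE = 4
GRADIENT_ACCUMULATION_STEPS = 4
TRAINING_STEPS = 40000

# Assumption: one device, which makes 4 * 4 equal the reported total of 16.
NUM_DEVICES = 1


def effective_batch_size() -> int:
    """Number of examples consumed per optimizer step."""
    return TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * NUM_DEVICES


def cosine_lr(step: int, warmup_steps: int = 0) -> float:
    """Half-cosine decay from LEARNING_RATE to 0.

    Assumption: no warmup and no learning-rate floor, since the card lists neither.
    """
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, TRAINING_STEPS - warmup_steps)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size())
    for step in (0, 10000, 20000, 30000, 40000):
        print(step, f"{cosine_lr(step):.3e}")
```

Under these assumptions the learning rate is at half its peak value midway through training (step 20000) and reaches zero at step 40000.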

Training results

Training Loss Epoch Step Validation Loss Input Tokens Seen
0.0898 0.0339 200 0.0801 367200
0.0562 0.0679 400 0.0682 737312
0.0644 0.1018 600 0.0694 1102816
0.0614 0.1358 800 0.0580 1468736
0.0318 0.1697 1000 0.0552 1829952
0.0671 0.2037 1200 0.0516 2199200
0.0582 0.2376 1400 0.0509 2565536
0.0508 0.2716 1600 0.0522 2930336
0.0514 0.3055 1800 0.0492 3297216
0.0578 0.3395 2000 0.0466 3666880
0.0529 0.3734 2200 0.0446 4036544
0.0403 0.4073 2400 0.0459 4400256
0.0463 0.4413 2600 0.0544 4765408
0.065 0.4752 2800 0.0456 5130336
0.0477 0.5092 3000 0.0427 5495328
0.057 0.5431 3200 0.0415 5857280
0.0669 0.5771 3400 0.0421 6221504
0.0378 0.6110 3600 0.0428 6589568
0.0461 0.6450 3800 0.0434 6959584
0.0348 0.6789 4000 0.0434 7323712
0.051 0.7129 4200 0.0424 7690880
0.0322 0.7468 4400 0.0418 8053632
0.0414 0.7808 4600 0.0406 8417216
0.0495 0.8147 4800 0.0406 8782624
0.0455 0.8486 5000 0.0439 9145728
0.0537 0.8826 5200 0.0403 9513920
0.04 0.9165 5400 0.0409 9877152
0.0324 0.9505 5600 0.0397 10240128
0.0535 0.9844 5800 0.0392 10606272
0.0408 1.0183 6000 0.0403 10971744
0.0453 1.0523 6200 0.0423 11335648
0.0401 1.0862 6400 0.0424 11702592
0.0442 1.1202 6600 0.0476 12070112
0.0372 1.1541 6800 0.0390 12437088
0.0326 1.1881 7000 0.0391 12802848
0.0412 1.2220 7200 0.0413 13171040
0.0292 1.2560 7400 0.0397 13539968
0.0753 1.2899 7600 0.0427 13904864
0.0577 1.3238 7800 0.0402 14272512
0.0361 1.3578 8000 0.0381 14634880
0.0222 1.3917 8200 0.0386 15002592
0.0266 1.4257 8400 0.0429 15369600
0.0196 1.4596 8600 0.0382 15731008
0.0247 1.4936 8800 0.0390 16092896
0.0682 1.5275 9000 0.0400 16458208
0.049 1.5615 9200 0.0387 16823328
0.0613 1.5954 9400 0.0383 17185120
0.0385 1.6294 9600 0.0374 17551488
0.0497 1.6633 9800 0.0376 17914752
0.0469 1.6972 10000 0.0383 18281888
0.0365 1.7312 10200 0.0401 18645120
0.0498 1.7651 10400 0.0381 19010848
0.0379 1.7991 10600 0.0379 19377344
0.0237 1.8330 10800 0.0391 19739232
0.0261 1.8670 11000 0.0378 20107584
0.0447 1.9009 11200 0.0381 20470912
0.0389 1.9349 11400 0.0377 20832736
0.032 1.9688 11600 0.0378 21199808
0.0346 2.0027 11800 0.0371 21568384
0.0255 2.0367 12000 0.0423 21931424
0.0267 2.0706 12200 0.0384 22294816
0.0299 2.1046 12400 0.0427 22655968
0.0243 2.1385 12600 0.0440 23020896
0.0381 2.1724 12800 0.0394 23383104
0.0146 2.2064 13000 0.0394 23746656
0.0214 2.2403 13200 0.0388 24110208
0.0479 2.2743 13400 0.0372 24476544
0.0325 2.3082 13600 0.0385 24841440
0.0293 2.3422 13800 0.0392 25206624
0.0163 2.3761 14000 0.0403 25573280
0.033 2.4101 14200 0.0405 25939392
0.0217 2.4440 14400 0.0400 26303968
0.0319 2.4780 14600 0.0408 26666944
0.0511 2.5119 14800 0.0389 27035136
0.0265 2.5458 15000 0.0468 27406144
0.027 2.5798 15200 0.0383 27772832
0.0183 2.6137 15400 0.0413 28134848
0.0104 2.6477 15600 0.0381 28505504
0.0182 2.6816 15800 0.0379 28870784
0.0489 2.7156 16000 0.0384 29233952
0.0451 2.7495 16200 0.0381 29603328
0.0274 2.7835 16400 0.0406 29968768
0.0247 2.8174 16600 0.0424 30334496
0.0182 2.8514 16800 0.0401 30703616
0.038 2.8853 17000 0.0383 31068224
0.029 2.9193 17200 0.0405 31438688
0.02 2.9532 17400 0.0377 31802368
0.0345 2.9871 17600 0.0389 32165728
0.0076 3.0210 17800 0.0448 32528896
0.0214 3.0550 18000 0.0438 32897376
0.0316 3.0889 18200 0.0436 33262688
0.0128 3.1229 18400 0.0466 33623616
0.0179 3.1568 18600 0.0442 33989920
0.0272 3.1908 18800 0.0457 34354528
0.0171 3.2247 19000 0.0471 34724672
0.0327 3.2587 19200 0.0459 35092288
0.026 3.2926 19400 0.0445 35458048
0.0206 3.3266 19600 0.0477 35826240
0.0286 3.3605 19800 0.0470 36191232
0.0131 3.3944 20000 0.0457 36553088
0.0068 3.4284 20200 0.0459 36917376
0.0211 3.4623 20400 0.0442 37284512
0.0231 3.4963 20600 0.0501 37649248
0.019 3.5302 20800 0.0492 38012256
0.0074 3.5642 21000 0.0467 38378592
0.0128 3.5981 21200 0.0504 38743328
0.0086 3.6321 21400 0.0448 39111200
0.0135 3.6660 21600 0.0474 39473536
0.0162 3.7000 21800 0.0456 39836704
0.0172 3.7339 22000 0.0436 40202176
0.012 3.7679 22200 0.0445 40568544
0.0169 3.8018 22400 0.0455 40932032
0.0253 3.8357 22600 0.0437 41296544
0.0123 3.8697 22800 0.0454 41661472
0.0147 3.9036 23000 0.0460 42031616
0.0102 3.9376 23200 0.0453 42395200
0.0168 3.9715 23400 0.0451 42760960
0.003 4.0054 23600 0.0480 43128480
0.008 4.0394 23800 0.0485 43492288
0.0094 4.0733 24000 0.0513 43859360
0.0151 4.1073 24200 0.0591 44222400
0.0028 4.1412 24400 0.0590 44585632
0.0037 4.1752 24600 0.0627 44956064
0.0035 4.2091 24800 0.0609 45323456
0.0077 4.2431 25000 0.0584 45688544
0.0164 4.2770 25200 0.0547 46054272
0.0164 4.3109 25400 0.0616 46420608
0.0127 4.3449 25600 0.0559 46787232
0.0224 4.3788 25800 0.0569 47151008
0.002 4.4128 26000 0.0547 47516064
0.0043 4.4467 26200 0.0612 47880960
0.0038 4.4807 26400 0.0576 48244480
0.0251 4.5146 26600 0.0549 48612352
0.0054 4.5486 26800 0.0569 48977376
0.0139 4.5825 27000 0.0543 49343328
0.0043 4.6165 27200 0.0558 49712064
0.0215 4.6504 27400 0.0580 50076832
0.003 4.6843 27600 0.0612 50439616
0.0027 4.7183 27800 0.0604 50803552
0.0148 4.7522 28000 0.0574 51165472
0.0049 4.7862 28200 0.0568 51527808
0.0091 4.8201 28400 0.0593 51895200
0.0187 4.8541 28600 0.0556 52259648
0.016 4.8880 28800 0.0572 52628032
0.0128 4.9220 29000 0.0587 52997024
0.0294 4.9559 29200 0.0565 53364352
0.0075 4.9899 29400 0.0543 53730624
0.0027 5.0238 29600 0.0597 54094208
0.0005 5.0577 29800 0.0667 54461312
0.0025 5.0917 30000 0.0701 54825216
0.0008 5.1256 30200 0.0630 55189504
0.0117 5.1595 30400 0.0680 55553280
0.0037 5.1935 30600 0.0669 55917792
0.0082 5.2274 30800 0.0647 56282176
0.0081 5.2614 31000 0.0695 56643104
0.0009 5.2953 31200 0.0686 57005120
0.0081 5.3293 31400 0.0704 57373152
0.0262 5.3632 31600 0.0708 57735872
0.0038 5.3972 31800 0.0743 58101536
0.006 5.4311 32000 0.0705 58472288
0.0041 5.4651 32200 0.0664 58840960
0.007 5.4990 32400 0.0684 59204992
0.0105 5.5329 32600 0.0679 59570752
0.0128 5.5669 32800 0.0753 59937728
0.0017 5.6008 33000 0.0706 60306240
0.008 5.6348 33200 0.0724 60675168
0.0103 5.6687 33400 0.0732 61042176
0.014 5.7027 33600 0.0721 61409120
0.0045 5.7366 33800 0.0715 61775168
0.0008 5.7706 34000 0.0717 62143616
0.0134 5.8045 34200 0.0742 62507552
0.0127 5.8385 34400 0.0731 62872928
0.0041 5.8724 34600 0.0733 63234816
0.0006 5.9064 34800 0.0747 63599616
0.0028 5.9403 35000 0.0746 63966688
0.0006 5.9742 35200 0.0752 64332704
0.0091 6.0081 35400 0.0762 64693664
0.0006 6.0421 35600 0.0755 65053728
0.0099 6.0760 35800 0.0791 65419648
0.003 6.1100 36000 0.0794 65786464
0.0011 6.1439 36200 0.0783 66152416
0.0004 6.1779 36400 0.0806 66522528
0.0093 6.2118 36600 0.0813 66888512
0.0016 6.2458 36800 0.0806 67255840
0.009 6.2797 37000 0.0809 67620416
0.0139 6.3137 37200 0.0815 67983360
0.024 6.3476 37400 0.0813 68348480
0.0015 6.3816 37600 0.0812 68715840
0.0013 6.4155 37800 0.0813 69081536
0.0094 6.4494 38000 0.0820 69446208
0.0022 6.4834 38200 0.0818 69813728
0.0011 6.5173 38400 0.0818 70182464
0.0014 6.5513 38600 0.0817 70547904
0.0007 6.5852 38800 0.0813 70911456
0.0009 6.6192 39000 0.0811 71277536
0.0126 6.6531 39200 0.0811 71642624
0.0108 6.6871 39400 0.0812 72006592
0.0007 6.7210 39600 0.0812 72370176
0.0008 6.7550 39800 0.0813 72737088
0.0011 6.7889 40000 0.0814 73102784
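
A few quantities can be derived from the table above. The sketch below copies a handful of rows (roughly one per epoch) and computes the checkpoint with the lowest validation loss, the implied number of optimizer steps per epoch, and the average number of input tokens per step. The helper names are hypothetical, and the derived figures are estimates from the logged values rather than numbers reported by the training run.

```python
# (step, epoch, validation_loss, input_tokens_seen), copied from rows of the table above.
ROWS = [
    (200, 0.0339, 0.0801, 367200),
    (5800, 0.9844, 0.0392, 10606272),
    (11800, 2.0027, 0.0371, 21568384),
    (17600, 2.9871, 0.0389, 32165728),
    (23400, 3.9715, 0.0451, 42760960),
    (29400, 4.9899, 0.0543, 53730624),
    (35200, 5.9742, 0.0752, 64332704),
    (40000, 6.7889, 0.0814, 73102784),
]


def best_row(rows):
    """Row with the lowest validation loss."""
    return min(rows, key=lambda row: row[2])


def steps_per_epoch(rows):
    """Estimate optimizer steps per epoch from the last logged row."""
    step, epoch, _, _ = rows[-1]
    return step / epoch


def tokens_per_step(rows):
    """Average input tokens per optimizer step over the whole run."""
    step, _, _, tokens = rows[-1]
    return tokens / step


if __name__ == "__main__":
    step, epoch, val_loss, _ = best_row(ROWS)
    print(f"best validation loss {val_loss} at step {step} (epoch {epoch})")
    print(f"about {steps_per_epoch(ROWS):.0f} steps per epoch")
    print(f"about {tokens_per_step(ROWS):.1f} input tokens per step")
```

In the full table the validation loss bottoms out at step 11800, shortly after the second epoch ends, and then rises while the training loss keeps falling. This pattern is consistent with overfitting in the later epochs.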

Framework versions

  • PEFT 0.15.1
  • Transformers 4.51.3
  • Pytorch 2.6.0+cu124
  • Datasets 3.5.0
  • Tokenizers 0.21.1
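
Because PEFT is listed among the framework versions, the weights are presumably published as an adapter to be applied on top of the base model. A minimal loading sketch follows. The adapter repository id `rbelanec/train_qnli_1744902606` is taken from the model page and is an assumption, as is the use of `AutoModelForCausalLM`; the function name `load_adapter` is hypothetical. The card does not document a prompt format, so none is shown.

```python
BASE_MODEL_ID = "google/gemma-3-1b-it"
# Assumption: Hub repository id of the adapter, as shown on the model page.
ADAPTER_ID = "rbelanec/train_qnli_1744902606"


def load_adapter(base_model_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the fine-tuned adapter weights.

    The heavy dependencies are imported lazily, so this module can be
    imported without transformers or peft installed.
    """
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_model_id)
    # Wrap the base model with the adapter weights downloaded from the Hub.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


if __name__ == "__main__":
    # Downloads both the base model and the adapter; requires network access.
    tokenizer, model = load_adapter()
    print(type(model).__name__)
```

Calling `load_adapter()` downloads the base model as well as the adapter, and access to the base model may require accepting its license on the Hub first.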
