mt5-lora-hf

This model is a LoRA adapter (trained with PEFT) fine-tuned from google/mt5-small on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 4.8417
  • Rouge1: 4.6911
  • Rouge2: 0.0143
  • Rougel: 4.5972
  • Rougelsum: 4.5916
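The ROUGE-1/2/L scores above measure unigram, bigram, and longest-common-subsequence overlap between generated and reference text. As a minimal illustration of what ROUGE-1 F1 computes, here is a simplified sketch (whitespace tokens only, no stemming or normalization, unlike the `rouge_score` package typically used for evaluation):

```python
from collections import Counter

def rouge1_f1(prediction: str, reference: str) -> float:
    """ROUGE-1 F1: unigram overlap between prediction and reference.

    Simplified sketch: whitespace tokenization only; the real
    `rouge_score` implementation also normalizes and (optionally) stems.
    """
    pred_counts = Counter(prediction.split())
    ref_counts = Counter(reference.split())
    # Clipped overlap: each unigram counts at most min(pred, ref) times.
    overlap = sum((pred_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)
```

ROUGE-2 is the same computation over bigrams, and ROUGE-L uses the longest common subsequence instead of clipped n-gram counts.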

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 6
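The hyperparameters above map onto a Transformers training configuration roughly as follows. This is a sketch, not the exact script used: `output_dir` is a placeholder, and `predict_with_generate` is an assumption (it is typically needed to compute ROUGE during evaluation but is not stated in the card):

```python
from transformers import Seq2SeqTrainingArguments

# Mirrors the hyperparameters listed above; output_dir is a placeholder.
training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-lora-hf",
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",
    lr_scheduler_type="linear",
    num_train_epochs=6,
    predict_with_generate=True,  # assumption: required for ROUGE eval
)
```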

Training results

Training Loss Epoch Step Validation Loss Rouge1 Rouge2 Rougel Rougelsum
26.7014 0.0251 5 17.9690 0.4959 0.0229 0.4821 0.4821
22.8828 0.0503 10 17.8948 0.4330 0.0229 0.4394 0.4427
25.3054 0.0754 15 17.7656 0.4335 0.0229 0.4399 0.4431
24.2626 0.1005 20 17.6523 0.4614 0.0229 0.4498 0.4541
26.9164 0.1256 25 17.4744 0.4291 0.0229 0.4298 0.4340
26.7442 0.1508 30 17.3540 0.4528 0.0229 0.4507 0.4560
22.8846 0.1759 35 17.1714 0.4395 0.0229 0.4407 0.4423
23.0382 0.2010 40 17.0225 0.4176 0.0229 0.4200 0.4272
24.5139 0.2261 45 16.9602 0.3881 0.0229 0.3958 0.4018
23.5225 0.2513 50 16.8832 0.4163 0.0229 0.4241 0.4257
24.7283 0.2764 55 16.7457 0.4156 0.0229 0.4235 0.4254
24.5206 0.3015 60 16.4804 0.4285 0.0229 0.4337 0.4356
22.8146 0.3266 65 16.2983 0.4012 0.0229 0.4064 0.4090
21.381 0.3518 70 16.0329 0.3890 0.0229 0.3922 0.3988
23.4543 0.3769 75 15.8419 0.3890 0.0229 0.3922 0.3988
20.3948 0.4020 80 15.6935 0.4034 0.0229 0.4058 0.4152
23.315 0.4271 85 15.5347 0.3758 0.0229 0.3724 0.3848
20.4828 0.4523 90 15.4258 0.3758 0.0229 0.3724 0.3848
22.6177 0.4774 95 15.2736 0.3821 0.0229 0.3830 0.3916
22.7118 0.5025 100 15.0719 0.3408 0.0229 0.3414 0.3506
24.145 0.5276 105 14.8205 0.3314 0.0229 0.3350 0.3426
21.6796 0.5528 110 14.5375 0.3310 0.0229 0.3348 0.3413
21.0313 0.5779 115 14.2323 0.3313 0.0229 0.3349 0.3425
20.1509 0.6030 120 13.9040 0.3306 0.0229 0.3340 0.3412
20.8036 0.6281 125 13.5365 0.3903 0.0493 0.3953 0.3977
18.9977 0.6533 130 13.2251 0.3972 0.0493 0.3996 0.4052
21.1749 0.6784 135 13.0306 0.3848 0.0493 0.3826 0.3906
18.1424 0.7035 140 12.7417 0.3568 0.0493 0.3589 0.3579
17.3758 0.7286 145 12.5348 0.3567 0.0493 0.3590 0.3577
21.9186 0.7538 150 12.2754 0.3735 0.0493 0.3773 0.3764
18.0612 0.7789 155 11.9966 0.3858 0.0493 0.3916 0.3914
16.8307 0.8040 160 11.8330 0.3154 0.0493 0.3172 0.3202
17.5778 0.8291 165 11.6714 0.2914 0.0493 0.2913 0.2967
18.1477 0.8543 170 11.4841 0.2913 0.0493 0.2913 0.2964
15.9704 0.8794 175 11.3850 0.3049 0.0493 0.3017 0.3097
17.1034 0.9045 180 11.2041 0.2991 0.0619 0.2978 0.3027
19.7897 0.9296 185 11.0869 0.3345 0.0662 0.3265 0.3309
16.608 0.9548 190 10.9445 0.3345 0.0662 0.3265 0.3309
17.1781 0.9799 195 10.8493 0.3345 0.0662 0.3264 0.3308
14.5506 1.0050 200 10.7443 0.3557 0.0662 0.3475 0.3524
15.1794 1.0302 205 10.6058 0.3699 0.0662 0.3608 0.3684
14.1433 1.0553 210 10.4693 0.3699 0.0662 0.3608 0.3684
15.3501 1.0804 215 10.2548 0.3697 0.0662 0.3600 0.3678
14.2343 1.1055 220 10.0423 0.3992 0.0662 0.3840 0.3902
13.6561 1.1307 225 9.8362 0.3637 0.0594 0.3538 0.3593
14.1522 1.1558 230 9.6526 0.4294 0.0662 0.4139 0.4260
12.2793 1.1809 235 9.4753 0.4294 0.0662 0.4139 0.4260
12.999 1.2060 240 9.3080 0.4818 0.0662 0.4527 0.4640
12.6114 1.2312 245 9.1514 0.5574 0.0843 0.5411 0.5470
12.6979 1.2563 250 9.0131 0.5651 0.0843 0.5186 0.5269
11.6085 1.2814 255 8.8960 0.5875 0.0761 0.5317 0.5412
11.9352 1.3065 260 8.7639 0.6295 0.1163 0.5637 0.5745
12.1973 1.3317 265 8.6219 0.6309 0.1163 0.5660 0.5754
11.8386 1.3568 270 8.5034 0.6541 0.1163 0.5877 0.5997
11.1427 1.3819 275 8.3913 0.7308 0.1302 0.6666 0.6784
11.6074 1.4070 280 8.2927 0.7541 0.1298 0.6892 0.7015
11.879 1.4322 285 8.1927 0.7048 0.0662 0.6537 0.6663
12.5362 1.4573 290 8.0887 0.7321 0.0662 0.6832 0.6939
11.1034 1.4824 295 7.9519 0.7805 0.0742 0.7167 0.7279
10.742 1.5075 300 7.7980 0.9069 0.1048 0.8357 0.8377
10.3591 1.5327 305 7.6581 1.1354 0.2100 0.9999 1.0172
11.0729 1.5578 310 7.5439 1.2665 0.2698 1.1212 1.1410
11.0979 1.5829 315 7.4250 1.2760 0.2269 1.1590 1.1749
10.7504 1.6080 320 7.3066 1.0842 0.1685 1.0249 1.0238
11.2598 1.6332 325 7.2042 1.1594 0.1687 1.0796 1.0846
10.4366 1.6583 330 7.1160 1.1594 0.1687 1.0796 1.0846
10.0824 1.6834 335 7.0343 1.1730 0.1684 1.0898 1.0982
9.9589 1.7085 340 6.9468 1.1306 0.1686 1.0485 1.0545
10.3309 1.7337 345 6.8646 1.1736 0.1824 1.0809 1.0876
9.6166 1.7588 350 6.7984 1.1165 0.1690 1.0478 1.0482
9.3742 1.7839 355 6.7329 1.2620 0.2014 1.1354 1.1476
9.853 1.8090 360 6.6669 1.2627 0.2014 1.1455 1.1582
10.1404 1.8342 365 6.6068 1.3100 0.2156 1.2083 1.2155
9.3509 1.8593 370 6.5692 1.2194 0.2020 1.1225 1.1152
8.8801 1.8844 375 6.5346 1.1335 0.1431 1.0131 1.0107
9.3656 1.9095 380 6.5026 1.1119 0.1291 1.0170 1.0141
9.0491 1.9347 385 6.4711 1.2375 0.1293 1.1102 1.1153
9.6425 1.9598 390 6.4447 1.2243 0.1409 1.1258 1.1284
8.8074 1.9849 395 6.4136 1.3684 0.2034 1.2323 1.2465
8.6168 2.0101 400 6.3833 1.4884 0.1787 1.3723 1.3762
8.9557 2.0352 405 6.3572 1.4520 0.1638 1.2990 1.2947
9.101 2.0603 410 6.3413 1.6343 0.1641 1.4365 1.4336
8.438 2.0854 415 6.3290 1.6232 0.1505 1.4573 1.4618
8.6262 2.1106 420 6.3048 1.6377 0.1245 1.5029 1.5195
8.9535 2.1357 425 6.2767 1.6318 0.1749 1.5081 1.5353
8.3392 2.1608 430 6.2523 1.6524 0.1743 1.5073 1.5367
8.6226 2.1859 435 6.2420 1.6597 0.1743 1.5340 1.5374
8.5399 2.2111 440 6.2366 1.6552 0.1743 1.5178 1.5262
8.4814 2.2362 445 6.2269 1.6997 0.1599 1.5507 1.5492
8.402 2.2613 450 6.2206 1.7743 0.1729 1.6229 1.6207
8.1715 2.2864 455 6.2157 1.7005 0.1731 1.5580 1.5519
8.3982 2.3116 460 6.2149 1.7912 0.1731 1.6301 1.6303
8.2935 2.3367 465 6.2051 1.8151 0.1813 1.6772 1.6708
8.1023 2.3618 470 6.1992 1.7187 0.1813 1.6023 1.5988
8.4083 2.3869 475 6.1925 1.7018 0.1816 1.5594 1.5567
8.3179 2.4121 480 6.1839 1.7161 0.1941 1.5709 1.5685
7.8477 2.4372 485 6.1699 1.6932 0.1816 1.5782 1.5846
7.9573 2.4623 490 6.1573 1.7737 0.1941 1.6399 1.6485
8.3412 2.4874 495 6.1501 1.6200 0.1553 1.5142 1.5160
8.2275 2.5126 500 6.1420 1.5973 0.1553 1.4930 1.4994
7.7802 2.5377 505 6.1371 1.5843 0.1553 1.4519 1.4570
8.2208 2.5628 510 6.1301 1.5734 0.1552 1.4672 1.4746
7.988 2.5879 515 6.1250 1.6141 0.1552 1.5196 1.5243
8.0406 2.6131 520 6.1216 1.5612 0.1317 1.4781 1.4902
7.6537 2.6382 525 6.1177 1.5042 0.1174 1.4341 1.4512
7.7706 2.6633 530 6.1124 1.5480 0.1110 1.4766 1.4870
7.7587 2.6884 535 6.1041 1.6054 0.0975 1.5301 1.5384
7.5912 2.7136 540 6.0947 1.6413 0.0975 1.5722 1.5747
7.6195 2.7387 545 6.0872 1.6897 0.0975 1.6322 1.6231
7.9719 2.7638 550 6.0840 1.6390 0.0980 1.5458 1.5456
7.5861 2.7889 555 6.0818 1.7055 0.0984 1.6106 1.6002
7.3751 2.8141 560 6.0693 1.7887 0.0984 1.7099 1.6969
7.4287 2.8392 565 6.0521 1.9438 0.0984 1.8707 1.8458
7.8715 2.8643 570 6.0418 1.9864 0.0742 1.9169 1.9069
7.5668 2.8894 575 6.0371 2.0494 0.0742 1.9751 1.9491
7.5644 2.9146 580 6.0284 2.0795 0.0742 2.0033 1.9955
7.5837 2.9397 585 6.0187 2.0435 0.0593 1.9903 1.9873
7.8794 2.9648 590 6.0076 2.1343 0.0593 2.0976 2.0879
7.4229 2.9899 595 5.9940 2.1421 0.0593 2.1090 2.0987
7.3116 3.0151 600 5.9697 2.2915 0.0593 2.2485 2.2434
7.237 3.0402 605 5.9432 2.2761 0.0593 2.2335 2.2297
7.5251 3.0653 610 5.9066 2.3241 0.0593 2.2857 2.2795
7.5311 3.0905 615 5.8749 2.3968 0.0593 2.3618 2.3526
7.3948 3.1156 620 5.8503 2.4292 0.0722 2.3934 2.3859
7.4102 3.1407 625 5.8441 2.5045 0.0722 2.4562 2.4443
7.3152 3.1658 630 5.8373 2.5838 0.0722 2.5309 2.5271
7.2793 3.1910 635 5.8287 2.5969 0.0722 2.5425 2.5405
7.2854 3.2161 640 5.8204 2.6641 0.0722 2.6240 2.6098
7.2151 3.2412 645 5.8081 2.7296 0.0722 2.6823 2.6686
7.1616 3.2663 650 5.7995 2.8340 0.0721 2.7816 2.7651
7.2671 3.2915 655 5.7911 2.9706 0.0721 2.9038 2.8953
7.3364 3.3166 660 5.7806 3.0656 0.0721 2.9978 2.9924
7.345 3.3417 665 5.7695 3.1378 0.0721 3.0744 3.0664
7.3118 3.3668 670 5.7532 3.1238 0.0722 3.0706 3.0618
7.4469 3.3920 675 5.7453 3.1653 0.0722 3.1293 3.1149
7.2567 3.4171 680 5.7376 3.1821 0.0722 3.1314 3.1141
7.1828 3.4422 685 5.7268 3.2855 0.0848 3.2291 3.1953
7.3317 3.4673 690 5.7070 3.3113 0.0848 3.2480 3.2158
7.1762 3.4925 695 5.6925 3.3213 0.0848 3.2595 3.2338
7.0286 3.5176 700 5.6794 3.3345 0.0848 3.2713 3.2443
7.1958 3.5427 705 5.6638 3.3834 0.0848 3.3169 3.2835
7.2112 3.5678 710 5.6573 3.4198 0.0744 3.3372 3.3030
7.0299 3.5930 715 5.6404 3.5031 0.0744 3.4199 3.3898
7.4005 3.6181 720 5.6231 3.5545 0.0744 3.4659 3.4403
7.2407 3.6432 725 5.6160 3.6875 0.0744 3.6277 3.6033
7.1189 3.6683 730 5.6075 3.7917 0.0744 3.7315 3.7144
7.0044 3.6935 735 5.5928 3.9431 0.0744 3.8972 3.8828
7.0864 3.7186 740 5.5823 3.9375 0.0593 3.8940 3.8878
7.3772 3.7437 745 5.5713 3.9630 0.0593 3.9203 3.9155
7.0098 3.7688 750 5.5583 4.1243 0.0744 4.0620 4.0602
6.8234 3.7940 755 5.5445 4.1046 0.0593 4.0478 4.0421
7.1442 3.8191 760 5.5222 4.0768 0.0593 4.0170 4.0034
6.9834 3.8442 765 5.5042 4.0013 0.0 3.9340 3.9286
7.0864 3.8693 770 5.4785 4.0012 0.0 3.9335 3.9288
6.863 3.8945 775 5.4519 4.0494 0.0 3.9829 3.9836
6.8511 3.9196 780 5.4235 4.1077 0.0 4.0382 4.0420
6.8788 3.9447 785 5.3975 4.2017 0.0 4.1333 4.1334
6.6429 3.9698 790 5.3782 4.2539 0.0 4.2007 4.1935
6.8546 3.9950 795 5.3626 4.3106 0.0 4.2576 4.2563
6.8145 4.0201 800 5.3410 4.3560 0.0 4.3242 4.3170
6.7826 4.0452 805 5.3263 4.3417 0.0 4.3236 4.3164
6.9502 4.0704 810 5.3144 4.3921 0.0 4.3602 4.3603
6.6682 4.0955 815 5.3021 4.3912 0.0 4.3575 4.3582
6.7195 4.1206 820 5.2895 4.3898 0.0 4.3564 4.3573
6.8389 4.1457 825 5.2787 4.3896 0.0 4.3561 4.3573
6.9199 4.1709 830 5.2683 4.4240 0.0 4.3939 4.3847
6.8859 4.1960 835 5.2588 4.4237 0.0 4.3934 4.3838
6.7521 4.2211 840 5.2438 4.5064 0.0 4.4407 4.4426
6.7168 4.2462 845 5.2284 4.5129 0.0 4.4405 4.4425
7.2573 4.2714 850 5.2204 4.4857 0.0 4.4131 4.4064
6.5104 4.2965 855 5.2126 4.5147 0.0 4.4269 4.4225
6.6178 4.3216 860 5.2034 4.5454 0.0 4.4468 4.4405
6.5719 4.3467 865 5.1942 4.5156 0.0 4.4389 4.4297
6.7698 4.3719 870 5.1824 4.5155 0.0 4.4383 4.4294
6.5936 4.3970 875 5.1708 4.5155 0.0 4.4383 4.4294
6.6705 4.4221 880 5.1564 4.5968 0.0 4.4961 4.5006
6.8366 4.4472 885 5.1465 4.5968 0.0 4.4961 4.5006
6.8101 4.4724 890 5.1397 4.5968 0.0 4.4960 4.5006
6.6961 4.4975 895 5.1324 4.5962 0.0 4.4958 4.5003
6.8763 4.5226 900 5.1243 4.6352 0.0 4.5400 4.5474
6.7891 4.5477 905 5.1166 4.6352 0.0 4.5400 4.5474
6.6563 4.5729 910 5.1063 4.6352 0.0 4.5400 4.5474
6.7201 4.5980 915 5.0973 4.6352 0.0 4.5395 4.5464
6.6108 4.6231 920 5.0880 4.6472 0.0 4.5387 4.5463
6.5714 4.6482 925 5.0786 4.6379 0.0 4.5388 4.5456
6.5445 4.6734 930 5.0695 4.6765 0.0 4.5586 4.5678
6.6799 4.6985 935 5.0614 4.6765 0.0 4.5586 4.5678
6.568 4.7236 940 5.0531 4.6765 0.0 4.5579 4.5675
6.2814 4.7487 945 5.0416 4.6892 0.0 4.5728 4.5799
6.8206 4.7739 950 5.0313 4.6892 0.0 4.5728 4.5799
6.5936 4.7990 955 5.0224 4.6957 0.0 4.5869 4.5870
6.662 4.8241 960 5.0125 4.6957 0.0 4.5869 4.5870
6.6761 4.8492 965 5.0031 4.6957 0.0 4.5869 4.5870
6.8252 4.8744 970 4.9960 4.6957 0.0 4.5868 4.5869
6.4136 4.8995 975 4.9888 4.6957 0.0 4.5868 4.5869
6.6854 4.9246 980 4.9813 4.6957 0.0 4.5868 4.5869
6.3622 4.9497 985 4.9766 4.6957 0.0 4.5868 4.5869
6.6554 4.9749 990 4.9696 4.6960 0.0 4.5868 4.5870
6.4508 5.0 995 4.9637 4.7116 0.0 4.6024 4.6028
6.701 5.0251 1000 4.9589 4.7116 0.0 4.6024 4.6028
6.2751 5.0503 1005 4.9519 4.7116 0.0 4.6024 4.6029
6.5376 5.0754 1010 4.9458 4.7116 0.0 4.6024 4.6029
6.4412 5.1005 1015 4.9417 4.7115 0.0 4.6020 4.6028
6.5644 5.1256 1020 4.9374 4.6979 0.0 4.6030 4.6028
6.1549 5.1508 1025 4.9295 4.6909 0.0 4.5909 4.5974
6.4149 5.1759 1030 4.9197 4.6675 0.0 4.5708 4.5716
6.5379 5.2010 1035 4.9123 4.6675 0.0 4.5585 4.5552
6.3613 5.2261 1040 4.9056 4.6791 0.0 4.5692 4.5656
6.5305 5.2513 1045 4.9001 4.6791 0.0 4.5692 4.5656
6.5593 5.2764 1050 4.8975 4.6661 0.0143 4.5695 4.5663
6.529 5.3015 1055 4.8906 4.6550 0.0143 4.5524 4.5508
6.4264 5.3266 1060 4.8837 4.6550 0.0143 4.5524 4.5508
6.679 5.3518 1065 4.8797 4.6444 0.0143 4.5417 4.5448
6.4163 5.3769 1070 4.8762 4.6444 0.0143 4.5417 4.5448
6.5349 5.4020 1075 4.8723 4.6444 0.0143 4.5417 4.5448
6.469 5.4271 1080 4.8693 4.6444 0.0143 4.5417 4.5448
6.3743 5.4523 1085 4.8668 4.6444 0.0143 4.5417 4.5448
6.3293 5.4774 1090 4.8648 4.6442 0.0143 4.5415 4.5447
6.3905 5.5025 1095 4.8615 4.6442 0.0143 4.5415 4.5447
6.6543 5.5276 1100 4.8589 4.6442 0.0143 4.5415 4.5447
6.2526 5.5528 1105 4.8557 4.6442 0.0143 4.5415 4.5447
6.5861 5.5779 1110 4.8521 4.6586 0.0143 4.5602 4.5559
6.6042 5.6030 1115 4.8498 4.6586 0.0143 4.5602 4.5559
6.5273 5.6281 1120 4.8485 4.6586 0.0143 4.5602 4.5559
6.3963 5.6533 1125 4.8477 4.6586 0.0143 4.5602 4.5559
6.3541 5.6784 1130 4.8468 4.6586 0.0143 4.5602 4.5559
6.2128 5.7035 1135 4.8460 4.6586 0.0143 4.5602 4.5559
6.6066 5.7286 1140 4.8454 4.6586 0.0143 4.5602 4.5559
6.366 5.7538 1145 4.8448 4.6590 0.0143 4.5612 4.5562
6.4843 5.7789 1150 4.8440 4.6590 0.0143 4.5612 4.5562
6.8107 5.8040 1155 4.8434 4.6590 0.0143 4.5612 4.5562
6.2873 5.8291 1160 4.8429 4.6590 0.0143 4.5612 4.5562
6.5391 5.8543 1165 4.8426 4.6911 0.0143 4.5972 4.5916
6.7077 5.8794 1170 4.8425 4.6911 0.0143 4.5972 4.5916
6.5323 5.9045 1175 4.8424 4.6911 0.0143 4.5972 4.5916
6.1429 5.9296 1180 4.8422 4.6911 0.0143 4.5972 4.5916
6.457 5.9548 1185 4.8419 4.6911 0.0143 4.5972 4.5916
6.1296 5.9799 1190 4.8417 4.6911 0.0143 4.5972 4.5916

Framework versions

  • PEFT 0.14.0
  • Transformers 4.49.0
  • Pytorch 2.6.0+cu124
  • Datasets 3.3.2
  • Tokenizers 0.21.0

Model tree for benitoals/mt5-lora-hf

  • Base model: google/mt5-small
  • This model is one of 13 adapters fine-tuned from the base model.