End of training
README.md CHANGED

@@ -16,13 +16,13 @@ This student model is distilled from the teacher model [gpt2](https://huggingface.co/gpt2)
 The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
 
 It achieves the following results on the evaluation set:
-- eval_enwikippl:
-- eval_frwikippl:
-- eval_zhwikippl:
-- eval_loss: 1.
-- eval_runtime:
-- eval_samples_per_second:
-- eval_steps_per_second: 7.
+- eval_enwikippl: 215.4055
+- eval_frwikippl: 1190.7479
+- eval_zhwikippl: 547.2146
+- eval_loss: 1.2012
+- eval_runtime: 86.3928
+- eval_samples_per_second: 57.875
+- eval_steps_per_second: 7.234
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment.
@@ -45,7 +45,7 @@ More information needed
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
-- distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=
+- distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=2.0, loss_fn=mse, layer_mapper=None, projector=None))
 - train_embeddings: True
 - learning_rate: 4e-05
 - train_batch_size: 8
@@ -56,75 +56,75 @@ The following hyperparameters were used during training:
 - num_epochs: 1.0
 
 ### Resource Usage
-Peak GPU Memory:
+Peak GPU Memory: 8.2206 GB
 
 ### Eval-Phase Metrics
 | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | zhwikippl |
 | --- | --- | --- | --- | --- | --- | --- | --- | --- |
 | **teacher eval** | | 30.2086 | 57.2728 | | | | | 18.1784 |
-| 0 | 0 |
-| 1000 | 0.0162 |
-| 2000 | 0.0323 |
-| 3000 | 0.0485 |
-| 4000 | 0.0646 |
-| 5000 | 0.0808 |
-| 6000 | 0.0970 |
-| 7000 | 0.1131 |
-| 8000 | 0.1293 |
-| 9000 | 0.1455 |
-| 10000 | 0.1616 |
-| 11000 | 0.1778 |
-| 12000 | 0.1939 |
-| 13000 | 0.2101 |
-| 14000 | 0.2263 |
-| 15000 | 0.2424 |
-| 16000 | 0.2586 | 139.
-| 17000 | 0.2747 | 136.
-| 18000 | 0.2909 |
-| 19000 | 0.3071 |
-| 20000 | 0.3232 | 128.
-| 21000 | 0.3394 |
-| 22000 | 0.3556 |
-| 23000 | 0.3717 |
-| 24000 | 0.3879 |
-| 25000 | 0.4040 |
-| 26000 | 0.4202 |
-| 27000 | 0.4364 |
-| 28000 | 0.4525 |
-| 29000 | 0.4687 |
-| 30000 | 0.4848 |
-| 31000 | 0.5010 |
-| 32000 | 0.5172 |
-| 33000 | 0.5333 |
-| 34000 | 0.5495 | 116.
-| 35000 | 0.5657 |
-| 36000 | 0.5818 |
-| 37000 | 0.5980 | 115.
-| 38000 | 0.6141 |
-| 39000 | 0.6303 | 113.
-| 40000 | 0.6465 | 113.
-| 41000 | 0.6626 |
-| 42000 | 0.6788 | 112.
-| 43000 | 0.6949 |
-| 44000 | 0.7111 |
-| 45000 | 0.7273 |
-| 46000 | 0.7434 |
-| 47000 | 0.7596 |
-| 48000 | 0.7758 | 112.
-| 49000 | 0.7919 |
-| 50000 | 0.8081 |
-| 51000 | 0.8242 | 110.
-| 52000 | 0.8404 | 109.
-| 53000 | 0.8566 |
-| 54000 | 0.8727 |
-| 55000 | 0.8889 |
-| 56000 | 0.9051 |
-| 57000 | 0.9212 |
-| 58000 | 0.9374 |
-| 59000 | 0.9535 | 110.
-| 60000 | 0.9697 |
-| 61000 | 0.9859 |
-| 61875 | 1.0 | 109.
+| 0 | 0 | 56314.7695 | 59887.2773 | 5.8256 | 86.2439 | 57.975 | 7.247 | 59033.8086 |
+| 1000 | 0.0162 | 707.4770 | 4242.8809 | 1.8516 | 86.1491 | 58.039 | 7.255 | 11038.7695 |
+| 2000 | 0.0323 | 507.7405 | 3239.7178 | 1.6796 | 86.186 | 58.014 | 7.252 | 1887.9902 |
+| 3000 | 0.0485 | 425.0894 | 2858.4150 | 1.5756 | 86.0159 | 58.129 | 7.266 | 841.8765 |
+| 4000 | 0.0646 | 361.4349 | 2351.2927 | 1.4943 | 86.0626 | 58.097 | 7.262 | 1237.3851 |
+| 5000 | 0.0808 | 320.2736 | 1811.6420 | 1.4160 | 86.3077 | 57.932 | 7.242 | 941.8109 |
+| 6000 | 0.0970 | 279.4263 | 1586.2935 | 1.3478 | 86.3392 | 57.911 | 7.239 | 744.3502 |
+| 7000 | 0.1131 | 252.5366 | 1452.6782 | 1.2903 | 86.3844 | 57.881 | 7.235 | 651.1284 |
+| 8000 | 0.1293 | 229.7639 | 1333.1338 | 1.2422 | 86.4019 | 57.869 | 7.234 | 586.1718 |
+| 9000 | 0.1455 | 215.4055 | 1190.7479 | 1.2012 | 86.3928 | 57.875 | 7.234 | 547.2146 |
+| 10000 | 0.1616 | 195.7073 | 1147.2347 | 1.1512 | 86.3689 | 57.891 | 7.236 | 673.5028 |
+| 11000 | 0.1778 | 181.4088 | 1060.8735 | 1.1073 | 86.4921 | 57.809 | 7.226 | 521.8091 |
+| 12000 | 0.1939 | 164.0534 | 896.9886 | 1.0636 | 86.3399 | 57.911 | 7.239 | 488.8237 |
+| 13000 | 0.2101 | 157.4142 | 890.0587 | 1.0357 | 86.4286 | 57.851 | 7.231 | 510.7101 |
+| 14000 | 0.2263 | 148.5198 | 793.2602 | 1.0069 | 86.4451 | 57.84 | 7.23 | 415.8904 |
+| 15000 | 0.2424 | 143.5310 | 728.5455 | 0.9844 | 86.5014 | 57.803 | 7.225 | 414.5595 |
+| 16000 | 0.2586 | 139.7042 | 766.6470 | 0.9726 | 86.5584 | 57.764 | 7.221 | 539.9557 |
+| 17000 | 0.2747 | 136.4025 | 723.4780 | 0.9594 | 86.3816 | 57.883 | 7.235 | 877.2245 |
+| 18000 | 0.2909 | 133.8320 | 733.1834 | 0.9461 | 86.4657 | 57.826 | 7.228 | 582.4266 |
+| 19000 | 0.3071 | 130.4055 | 720.7795 | 0.9391 | 86.5854 | 57.746 | 7.218 | 564.7347 |
+| 20000 | 0.3232 | 128.2763 | 679.3307 | 0.9259 | 86.469 | 57.824 | 7.228 | 364.2420 |
+| 21000 | 0.3394 | 126.0545 | 666.4741 | 0.9208 | 86.3084 | 57.932 | 7.241 | 392.6297 |
+| 22000 | 0.3556 | 126.3289 | 618.9599 | 0.9146 | 86.2819 | 57.95 | 7.244 | 383.1512 |
+| 23000 | 0.3717 | 125.7710 | 652.6170 | 0.9106 | 86.3709 | 57.89 | 7.236 | 382.0272 |
+| 24000 | 0.3879 | 121.7352 | 649.1292 | 0.9010 | 86.4132 | 57.862 | 7.233 | 407.5338 |
+| 25000 | 0.4040 | 121.2164 | 677.1313 | 0.8985 | 86.5605 | 57.763 | 7.22 | 378.4727 |
+| 26000 | 0.4202 | 121.4331 | 604.5543 | 0.8920 | 86.6149 | 57.727 | 7.216 | 400.5201 |
+| 27000 | 0.4364 | 121.4896 | 636.5748 | 0.8898 | 86.977 | 57.486 | 7.186 | 344.3297 |
+| 28000 | 0.4525 | 120.0641 | 614.8710 | 0.8867 | 86.9971 | 57.473 | 7.184 | 385.8209 |
+| 29000 | 0.4687 | 121.5085 | 662.3517 | 0.8855 | 86.6921 | 57.675 | 7.209 | 386.8527 |
+| 30000 | 0.4848 | 121.3954 | 620.4891 | 0.8915 | 86.9396 | 57.511 | 7.189 | 805.0448 |
+| 31000 | 0.5010 | 119.1724 | 604.0428 | 0.8831 | 87.0473 | 57.44 | 7.18 | 382.2313 |
+| 32000 | 0.5172 | 118.1496 | 632.1021 | 0.8800 | 87.0169 | 57.46 | 7.183 | 377.2617 |
+| 33000 | 0.5333 | 116.5277 | 597.8567 | 0.8738 | 86.7512 | 57.636 | 7.205 | 322.2620 |
+| 34000 | 0.5495 | 116.1844 | 591.6924 | 0.8734 | 87.2311 | 57.319 | 7.165 | 431.3317 |
+| 35000 | 0.5657 | 115.5994 | 565.9454 | 0.8686 | 86.8167 | 57.593 | 7.199 | 336.3313 |
+| 36000 | 0.5818 | 115.9320 | 609.9918 | 0.8674 | 87.1488 | 57.373 | 7.172 | 253.6102 |
+| 37000 | 0.5980 | 115.0621 | 595.2911 | 0.8660 | 87.1004 | 57.405 | 7.176 | 323.4260 |
+| 38000 | 0.6141 | 115.5635 | 590.6086 | 0.8654 | 86.9067 | 57.533 | 7.192 | 282.2412 |
+| 39000 | 0.6303 | 113.5796 | 546.1489 | 0.8586 | 86.5012 | 57.803 | 7.225 | 306.1125 |
+| 40000 | 0.6465 | 113.4385 | 558.4144 | 0.8583 | 86.6261 | 57.719 | 7.215 | 246.7947 |
+| 41000 | 0.6626 | 112.7097 | 563.5562 | 0.8558 | 86.9289 | 57.518 | 7.19 | 263.4834 |
+| 42000 | 0.6788 | 112.6048 | 556.9202 | 0.8573 | 86.8975 | 57.539 | 7.192 | 287.7979 |
+| 43000 | 0.6949 | 112.9025 | 569.7087 | 0.8534 | 86.3213 | 57.923 | 7.24 | 295.2722 |
+| 44000 | 0.7111 | 111.3180 | 584.7252 | 0.8534 | 86.7833 | 57.615 | 7.202 | 311.5563 |
+| 45000 | 0.7273 | 112.7623 | 589.8597 | 0.8520 | 85.8832 | 58.219 | 7.277 | 452.9366 |
+| 46000 | 0.7434 | 111.0763 | 583.6953 | 0.8497 | 86.9028 | 57.536 | 7.192 | 323.7285 |
+| 47000 | 0.7596 | 110.0631 | 570.5529 | 0.8481 | 86.1396 | 58.045 | 7.256 | 278.4229 |
+| 48000 | 0.7758 | 112.4039 | 498.8431 | 0.8470 | 86.0091 | 58.133 | 7.267 | 315.6181 |
+| 49000 | 0.7919 | 111.2748 | 564.9885 | 0.8465 | 86.4014 | 57.869 | 7.234 | 261.0319 |
+| 50000 | 0.8081 | 111.5950 | 594.9554 | 0.8454 | 87.4501 | 57.175 | 7.147 | 240.7725 |
+| 51000 | 0.8242 | 110.0546 | 563.8345 | 0.8446 | 85.9134 | 58.198 | 7.275 | 320.1174 |
+| 52000 | 0.8404 | 109.2966 | 548.4256 | 0.8428 | 86.5788 | 57.751 | 7.219 | 318.7099 |
+| 53000 | 0.8566 | 109.3136 | 539.9846 | 0.8395 | 86.3394 | 57.911 | 7.239 | 340.8982 |
+| 54000 | 0.8727 | 110.7834 | 561.4149 | 0.8436 | 86.2011 | 58.004 | 7.25 | 361.5285 |
+| 55000 | 0.8889 | 110.2941 | 576.0907 | 0.8421 | 86.733 | 57.648 | 7.206 | 297.2107 |
+| 56000 | 0.9051 | 109.5600 | 571.4385 | 0.8433 | 86.2508 | 57.97 | 7.246 | 370.3730 |
+| 57000 | 0.9212 | 109.7474 | 566.3444 | 0.8457 | 86.5407 | 57.776 | 7.222 | 900.0065 |
+| 58000 | 0.9374 | 109.4155 | 621.2332 | 0.8426 | 86.3669 | 57.893 | 7.237 | 493.3487 |
+| 59000 | 0.9535 | 110.1230 | 581.3542 | 0.8391 | 86.4324 | 57.849 | 7.231 | 272.2826 |
+| 60000 | 0.9697 | 108.2997 | 582.5030 | 0.8340 | 86.046 | 58.108 | 7.264 | 323.5555 |
+| 61000 | 0.9859 | 109.2711 | 566.8240 | 0.8381 | 86.749 | 57.638 | 7.205 | 312.4312 |
+| 61875 | 1.0 | 109.1439 | 575.3599 | 0.8346 | 86.8825 | 57.549 | 7.194 | 265.7449 |
 
 ### Framework versions
 - Distily 0.2.0
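The `distillation_objective` in the diff above pairs a KL-divergence loss on the student/teacher logits (weight 1) with an MSE loss on the attention maps (weight 2.0); hidden-state distillation is disabled (weight 0). A minimal PyTorch sketch of such an objective, assuming both models are run with `output_attentions=True` and have matching layer/head shapes (consistent with `layer_mapper=None`, `projector=None`) -- this is an illustration, not Distily's actual implementation:

```python
import torch.nn.functional as F

def distillation_loss(student_out, teacher_out,
                      logits_weight=1.0, attn_weight=2.0):
    # KL divergence between student and teacher next-token distributions
    # (logits_loss_component: weight=1, loss_fn=kl).
    logits_loss = F.kl_div(
        F.log_softmax(student_out.logits, dim=-1),
        F.softmax(teacher_out.logits, dim=-1),
        reduction="batchmean",
    )
    # MSE between per-layer attention maps (attn_loss_component:
    # weight=2.0, loss_fn=mse); assumes one-to-one layer correspondence.
    attn_loss = sum(
        F.mse_loss(s, t)
        for s, t in zip(student_out.attentions, teacher_out.attentions)
    ) / len(student_out.attentions)
    return logits_weight * logits_loss + attn_weight * attn_loss
```

Per the listed hyperparameters, a loss of this shape would be minimized for one epoch at learning rate 4e-05 with batch size 8, with the student's embeddings also trained.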
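The `*_wikippl` columns are perplexities on English, French, and Chinese Wikipedia samples (lower is better; compare the teacher's 30.2086 enwikippl with the student's final 109.1439). Perplexity is the exponential of the mean next-token cross-entropy; a sketch of how such a metric can be computed with `transformers` (the exact corpus, sample size, and windowing used for these numbers are not shown in this card, and `"distilled-gpt2"` is a placeholder for this repo's model id):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

@torch.no_grad()
def perplexity(model, tokenizer, text):
    # exp of the mean next-token cross-entropy; the HF causal-LM
    # forward pass shifts the labels internally.
    enc = tokenizer(text, return_tensors="pt").to(model.device)
    loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()

model = AutoModelForCausalLM.from_pretrained("distilled-gpt2")
tokenizer = AutoTokenizer.from_pretrained("distilled-gpt2")
print(perplexity(model, tokenizer, "Some English Wikipedia passage..."))
```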
logs/attn_loss_fn=mse, attn_weight=2.0/events.out.tfevents.1723724446.93d6cbb3ad53 ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:85185b325f8578284a12e9f098013cf4c3d5ccf2e4cbd9c6fd55cd8961a0e644
+size 529
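The added file is a Git LFS pointer to a TensorBoard event log for this run (`attn_loss_fn=mse, attn_weight=2.0`). Once the real file has been fetched (e.g. via `git lfs pull`), its scalars can be read back with TensorBoard's event accumulator; the card does not list the logged tag names, so they are discovered at runtime here:

```python
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

# Point at the run directory containing the events.out.tfevents.* file.
acc = EventAccumulator("logs/attn_loss_fn=mse, attn_weight=2.0")
acc.Reload()
scalar_tags = acc.Tags()["scalars"]   # e.g. training/eval losses
for ev in acc.Scalars(scalar_tags[0]):
    print(ev.step, ev.value)
```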