Text Classification
Transformers
TensorBoard
Safetensors
modernbert
Commit 63dad66 (verified) by wissamantoun · 1 parent: 53e394a

Upload folder using huggingface_hub

README.md ADDED
@@ -0,0 +1,185 @@
1
+ ---
2
+ library_name: transformers
3
+ datasets:
4
+ - WebOrganizer/FormatAnnotations-Llama-3.1-8B
5
+ - WebOrganizer/FormatAnnotations-Llama-3.1-405B-FP8
6
+ base_model:
7
+ - answerdotai/ModernBERT-base
8
+ ---
9
+ # wissamantoun/WebOrganizer-FormatClassifier-ModernBERT
10
+
11
+ [[Paper](https://arxiv.org/abs/2502.10341)] [[Website](https://weborganizer.allenai.org)] [[GitHub](https://github.com/CodeCreator/WebOrganizer)]
12
+
13
+ *All credit goes to the original authors of the model and dataset. This is a retraining of the original model with a different base model.*
14
+
15
+ The FormatClassifier organizes web content into 24 categories based on the URL and text contents of web pages.
16
+ The model is a [ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) with 149M parameters fine-tuned on the following training data:
17
+
18
+ 1. [WebOrganizer/FormatAnnotations-Llama-3.1-8B](https://huggingface.co/datasets/WebOrganizer/FormatAnnotations-Llama-3.1-8B): 1M documents annotated by Llama-3.1-8B (first-stage training)
19
+ 2. [WebOrganizer/FormatAnnotations-Llama-3.1-405B-FP8](https://huggingface.co/datasets/WebOrganizer/FormatAnnotations-Llama-3.1-405B-FP8): 100K documents annotated by Llama-3.1-405B-FP8 (second-stage training)
20
+
21
+ #### All Domain Classifiers
22
+ - [wissamantoun/WebOrganizer-FormatClassifier-ModernBERT](https://huggingface.co/wissamantoun/WebOrganizer-FormatClassifier-ModernBERT) *← you are here!*
23
+ - [wissamantoun/WebOrganizer-TopicClassifier-ModernBERT](https://huggingface.co/wissamantoun/WebOrganizer-TopicClassifier-ModernBERT)
24
+
25
+ ## Usage
26
+
27
+ This classifier expects input in the following format:
28
+ ```
29
+ {url}
30
+
31
+ {text}
32
+ ```
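The template above can be filled in with a small helper. This is only a sketch: the function name `format_input` and the whitespace stripping are assumptions for illustration and are not part of the model repository.

```python
def format_input(url: str, text: str) -> str:
    """Join a page URL and its text with one blank line between them."""
    # Stripping surrounding whitespace is an assumption, not a documented requirement.
    return f"{url.strip()}\n\n{text.strip()}"


example = format_input(
    "http://www.example.com",
    "How to build a computer from scratch? Here are the components you need...",
)
print(example)
```

The resulting string can be passed to the tokenizer in place of a hand-written triple-quoted literal.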
33
+
34
+ Example:
35
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+
+ model_id = "wissamantoun/WebOrganizer-FormatClassifier-ModernBERT"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ # ModernBERT is supported natively by transformers, so no remote code is needed.
+ model = AutoModelForSequenceClassification.from_pretrained(model_id)
+
+ # Input format: URL, blank line, then the page text.
+ web_page = """http://www.example.com
+
+ How to build a computer from scratch? Here are the components you need..."""
+
+ inputs = tokenizer([web_page], return_tensors="pt")
+ outputs = model(**inputs)
+
+ probs = outputs.logits.softmax(dim=-1)
+ pred_id = probs.argmax(dim=-1).item()
+ # Map the predicted index to its format label via the model config.
+ print(pred_id, model.config.id2label[pred_id])
+ ```
55
+
56
+ You can convert the `logits` of the model with a softmax to obtain a probability distribution over the following 24 categories (in order of labels, also see `id2label` and `label2id` in the model config):
57
+ 1. Academic Writing
+ 2. Content Listing
+ 3. Creative Writing
+ 4. Customer Support
+ 5. Comment Section
+ 6. FAQ
+ 7. Truncated
+ 8. Knowledge Article
+ 9. Legal Notices
+ 10. Listicle
+ 11. News Article
+ 12. Nonfiction Writing
+ 13. About (Org.)
+ 14. News (Org.)
+ 15. About (Pers.)
+ 16. Personal Blog
+ 17. Product Page
+ 18. Q&A Forum
+ 19. Spam / Ads
+ 20. Structured Data
+ 21. Documentation
+ 22. Audio Transcript
+ 23. Tutorial
+ 24. User Review
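To make the softmax step concrete without downloading the model, here is a minimal pure-Python sketch. The label names are copied from `id2label` in this commit's `config.json` (label ids 0 to 23); the helper names `softmax` and `top_k` and the dummy logits are made up for illustration and do not come from a real forward pass.

```python
import math

# Labels copied from `id2label` in config.json, ordered by label id (0..23).
FORMAT_LABELS = [
    "Academic Writing", "Content Listing", "Creative Writing", "Customer Support",
    "Comment Section", "FAQ", "Truncated", "Knowledge Article", "Legal Notices",
    "Listicle", "News Article", "Nonfiction Writing", "About (Org.)", "News (Org.)",
    "About (Pers.)", "Personal Blog", "Product Page", "Q&A Forum", "Spam / Ads",
    "Structured Data", "Documentation", "Audio Transcript", "Tutorial", "User Review",
]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, k=3):
    """Return the k most probable (label, probability) pairs."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(FORMAT_LABELS[i], probs[i]) for i in ranked[:k]]


# Made-up logits for illustration only: label id 22 gets the highest score.
dummy_logits = [0.0] * 24
dummy_logits[22] = 4.0
dummy_logits[5] = 2.0
for label, p in top_k(dummy_logits):
    print(f"{label}: {p:.3f}")
```

With a loaded model, `model.config.id2label` is the authoritative mapping and should be preferred over a hard-coded list.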
81
+
82
+ The full definitions of the categories can be found in the [taxonomy config](https://github.com/CodeCreator/WebOrganizer/blob/main/define_domains/taxonomies/formats.yaml).
83
+
84
+ ## Scores
85
+ ```
86
+ ***** pred metrics *****
87
+ test_accuracy = 0.8154
88
+ test_accuracy__0 = 0.855
89
+ test_accuracy__1 = 0.7558
90
+ test_accuracy__10 = 0.9071
91
+ test_accuracy__11 = 0.6869
92
+ test_accuracy__12 = 0.8055
93
+ test_accuracy__13 = 0.7897
94
+ test_accuracy__14 = 0.8592
95
+ test_accuracy__15 = 0.8541
96
+ test_accuracy__16 = 0.8788
97
+ test_accuracy__17 = 0.7733
98
+ test_accuracy__18 = 0.7286
99
+ test_accuracy__19 = 0.6989
100
+ test_accuracy__2 = 0.7474
101
+ test_accuracy__20 = 0.7609
102
+ test_accuracy__21 = 0.7807
103
+ test_accuracy__22 = 0.7703
104
+ test_accuracy__23 = 0.7931
105
+ test_accuracy__3 = 0.6351
106
+ test_accuracy__4 = 0.871
107
+ test_accuracy__5 = 0.8333
108
+ test_accuracy__6 = 0.6125
109
+ test_accuracy__7 = 0.6416
110
+ test_accuracy__8 = 0.78
111
+ test_accuracy__9 = 0.7668
112
+ test_accuracy_conf50 = 0.8312
113
+ test_accuracy_conf50__0 = 0.8852
114
+ test_accuracy_conf50__1 = 0.7651
115
+ test_accuracy_conf50__10 = 0.9167
116
+ test_accuracy_conf50__11 = 0.7168
117
+ test_accuracy_conf50__12 = 0.8256
118
+ test_accuracy_conf50__13 = 0.7996
119
+ test_accuracy_conf50__14 = 0.8696
120
+ test_accuracy_conf50__15 = 0.8684
121
+ test_accuracy_conf50__16 = 0.8878
122
+ test_accuracy_conf50__17 = 0.7838
123
+ test_accuracy_conf50__18 = 0.7663
124
+ test_accuracy_conf50__19 = 0.7276
125
+ test_accuracy_conf50__2 = 0.7609
126
+ test_accuracy_conf50__20 = 0.7907
127
+ test_accuracy_conf50__21 = 0.8
128
+ test_accuracy_conf50__22 = 0.7927
129
+ test_accuracy_conf50__23 = 0.7904
130
+ test_accuracy_conf50__3 = 0.6617
131
+ test_accuracy_conf50__4 = 0.877
132
+ test_accuracy_conf50__5 = 0.8571
133
+ test_accuracy_conf50__6 = 0.6299
134
+ test_accuracy_conf50__7 = 0.6786
135
+ test_accuracy_conf50__8 = 0.7755
136
+ test_accuracy_conf50__9 = 0.7796
137
+ test_accuracy_conf75 = 0.9003 <--- Metric from the paper
138
+ test_accuracy_conf75__0 = 0.9412
139
+ test_accuracy_conf75__1 = 0.8318
140
+ test_accuracy_conf75__10 = 0.9542
141
+ test_accuracy_conf75__11 = 0.8478
142
+ test_accuracy_conf75__12 = 0.8841
143
+ test_accuracy_conf75__13 = 0.8724
144
+ test_accuracy_conf75__14 = 0.914
145
+ test_accuracy_conf75__15 = 0.9345
146
+ test_accuracy_conf75__16 = 0.9316
147
+ test_accuracy_conf75__17 = 0.8667
148
+ test_accuracy_conf75__18 = 0.8446
149
+ test_accuracy_conf75__19 = 0.8209
150
+ test_accuracy_conf75__2 = 0.8333
151
+ test_accuracy_conf75__20 = 0.9333
152
+ test_accuracy_conf75__21 = 0.8587
153
+ test_accuracy_conf75__22 = 0.8708
154
+ test_accuracy_conf75__23 = 0.8309
155
+ test_accuracy_conf75__3 = 0.7292
156
+ test_accuracy_conf75__4 = 0.9357
157
+ test_accuracy_conf75__5 = 0.9032
158
+ test_accuracy_conf75__6 = 0.7816
159
+ test_accuracy_conf75__7 = 0.8011
160
+ test_accuracy_conf75__8 = 0.8409
161
+ test_accuracy_conf75__9 = 0.8592
162
+ test_accuracy_label_average = 0.7744
163
+ test_accuracy_label_average_conf50 = 0.7919
164
+ test_accuracy_label_average_conf75 = 0.8676
165
+ test_accuracy_label_min = 0.6125 <--- Metric from the paper
166
+ test_accuracy_label_min_conf75 = 0.7292
167
+ test_loss = 0.6023
168
+ test_proportion_conf50 = 0.9638
169
+ test_proportion_conf75 = 0.7951
170
+ test_runtime = 0:00:08.38
171
+ test_samples_per_second = 1192.262
172
+ test_steps_per_second = 37.318
173
+ ```
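The `conf50` and `conf75` metrics come in pairs: an accuracy and a `proportion`. The README does not define them, so the sketch below rests on an assumption: that accuracy is computed only on examples whose confidence score reaches the threshold, and that `proportion` is the fraction of examples kept. Whose confidence is used (annotator or classifier) and whether the comparison is strict are not stated in the source; the function name and toy data are hypothetical.

```python
def accuracy_at_confidence(preds, labels, confidences, threshold):
    """Return (accuracy, proportion) over examples with confidence >= threshold."""
    kept = [(p, y) for p, y, c in zip(preds, labels, confidences) if c >= threshold]
    if not kept:
        return float("nan"), 0.0
    accuracy = sum(p == y for p, y in kept) / len(kept)
    proportion = len(kept) / len(labels)
    return accuracy, proportion


# Toy data for illustration only.
preds = [0, 1, 2, 2]
labels = [0, 1, 1, 2]
confidences = [0.9, 0.8, 0.6, 0.4]

print(accuracy_at_confidence(preds, labels, confidences, 0.50))
print(accuracy_at_confidence(preds, labels, confidences, 0.75))
```

Under this reading, raising the threshold trades coverage for accuracy, which is consistent with the reported numbers: `test_proportion_conf75 = 0.7951` alongside `test_accuracy_conf75 = 0.9003`, versus `test_accuracy = 0.8154` on all examples.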
174
+
175
+
176
+
177
+ ## Citation
178
+ ```bibtex
179
+ @article{wettig2025organize,
180
+ title={Organize the Web: Constructing Domains Enhances Pre-Training Data Curation},
181
+ author={Alexander Wettig and Kyle Lo and Sewon Min and Hannaneh Hajishirzi and Danqi Chen and Luca Soldaini},
182
+ journal={arXiv preprint arXiv:2502.10341},
183
+ year={2025}
184
+ }
185
+ ```
all_results.json ADDED
@@ -0,0 +1,182 @@
1
+ {
2
+ "epoch": 4.9728,
3
+ "eval_validation.parquet_accuracy": 0.8279,
4
+ "eval_validation.parquet_accuracy__0": 0.831081081081081,
5
+ "eval_validation.parquet_accuracy__1": 0.7971830985915493,
6
+ "eval_validation.parquet_accuracy__10": 0.9108589951377634,
7
+ "eval_validation.parquet_accuracy__11": 0.7390396659707724,
8
+ "eval_validation.parquet_accuracy__12": 0.8164556962025317,
9
+ "eval_validation.parquet_accuracy__13": 0.7906542056074767,
10
+ "eval_validation.parquet_accuracy__14": 0.875,
11
+ "eval_validation.parquet_accuracy__15": 0.8706088992974239,
12
+ "eval_validation.parquet_accuracy__16": 0.8806916426512968,
13
+ "eval_validation.parquet_accuracy__17": 0.8229166666666666,
14
+ "eval_validation.parquet_accuracy__18": 0.7022222222222222,
15
+ "eval_validation.parquet_accuracy__19": 0.7072368421052632,
16
+ "eval_validation.parquet_accuracy__2": 0.717948717948718,
17
+ "eval_validation.parquet_accuracy__20": 0.6097560975609756,
18
+ "eval_validation.parquet_accuracy__21": 0.7575757575757576,
19
+ "eval_validation.parquet_accuracy__22": 0.7551020408163265,
20
+ "eval_validation.parquet_accuracy__23": 0.8111111111111111,
21
+ "eval_validation.parquet_accuracy__3": 0.751937984496124,
22
+ "eval_validation.parquet_accuracy__4": 0.874251497005988,
23
+ "eval_validation.parquet_accuracy__5": 0.8125,
24
+ "eval_validation.parquet_accuracy__6": 0.623574144486692,
25
+ "eval_validation.parquet_accuracy__7": 0.6719367588932806,
26
+ "eval_validation.parquet_accuracy__8": 0.8928571428571429,
27
+ "eval_validation.parquet_accuracy__9": 0.8181818181818182,
28
+ "eval_validation.parquet_accuracy_conf50": 0.8428349715468184,
29
+ "eval_validation.parquet_accuracy_conf50__0": 0.8540145985401459,
30
+ "eval_validation.parquet_accuracy_conf50__1": 0.8159057437407953,
31
+ "eval_validation.parquet_accuracy_conf50__10": 0.9169407894736842,
32
+ "eval_validation.parquet_accuracy_conf50__11": 0.7593818984547461,
33
+ "eval_validation.parquet_accuracy_conf50__12": 0.8377483443708609,
34
+ "eval_validation.parquet_accuracy_conf50__13": 0.8062015503875969,
35
+ "eval_validation.parquet_accuracy_conf50__14": 0.8918918918918919,
36
+ "eval_validation.parquet_accuracy_conf50__15": 0.8822822822822823,
37
+ "eval_validation.parquet_accuracy_conf50__16": 0.887905604719764,
38
+ "eval_validation.parquet_accuracy_conf50__17": 0.8229166666666666,
39
+ "eval_validation.parquet_accuracy_conf50__18": 0.7183098591549296,
40
+ "eval_validation.parquet_accuracy_conf50__19": 0.7353951890034365,
41
+ "eval_validation.parquet_accuracy_conf50__2": 0.7333333333333333,
42
+ "eval_validation.parquet_accuracy_conf50__20": 0.6578947368421053,
43
+ "eval_validation.parquet_accuracy_conf50__21": 0.7653061224489796,
44
+ "eval_validation.parquet_accuracy_conf50__22": 0.7808641975308642,
45
+ "eval_validation.parquet_accuracy_conf50__23": 0.8202247191011236,
46
+ "eval_validation.parquet_accuracy_conf50__3": 0.775,
47
+ "eval_validation.parquet_accuracy_conf50__4": 0.8803030303030303,
48
+ "eval_validation.parquet_accuracy_conf50__5": 0.8125,
49
+ "eval_validation.parquet_accuracy_conf50__6": 0.6470588235294118,
50
+ "eval_validation.parquet_accuracy_conf50__7": 0.7205240174672489,
51
+ "eval_validation.parquet_accuracy_conf50__8": 0.8928571428571429,
52
+ "eval_validation.parquet_accuracy_conf50__9": 0.8224852071005917,
53
+ "eval_validation.parquet_accuracy_conf75": 0.9034977352793155,
54
+ "eval_validation.parquet_accuracy_conf75__0": 0.8947368421052632,
55
+ "eval_validation.parquet_accuracy_conf75__1": 0.8790613718411552,
56
+ "eval_validation.parquet_accuracy_conf75__10": 0.9492619926199262,
57
+ "eval_validation.parquet_accuracy_conf75__11": 0.8811188811188811,
58
+ "eval_validation.parquet_accuracy_conf75__12": 0.8793774319066148,
59
+ "eval_validation.parquet_accuracy_conf75__13": 0.8762135922330098,
60
+ "eval_validation.parquet_accuracy_conf75__14": 0.9235294117647059,
61
+ "eval_validation.parquet_accuracy_conf75__15": 0.9338181818181818,
62
+ "eval_validation.parquet_accuracy_conf75__16": 0.9240506329113924,
63
+ "eval_validation.parquet_accuracy_conf75__17": 0.9240506329113924,
64
+ "eval_validation.parquet_accuracy_conf75__18": 0.7861635220125787,
65
+ "eval_validation.parquet_accuracy_conf75__19": 0.8385650224215246,
66
+ "eval_validation.parquet_accuracy_conf75__2": 0.7627118644067796,
67
+ "eval_validation.parquet_accuracy_conf75__20": 0.7777777777777778,
68
+ "eval_validation.parquet_accuracy_conf75__21": 0.88,
69
+ "eval_validation.parquet_accuracy_conf75__22": 0.8735632183908046,
70
+ "eval_validation.parquet_accuracy_conf75__23": 0.8656716417910447,
71
+ "eval_validation.parquet_accuracy_conf75__3": 0.8390804597701149,
72
+ "eval_validation.parquet_accuracy_conf75__4": 0.9387755102040817,
73
+ "eval_validation.parquet_accuracy_conf75__5": 0.8888888888888888,
74
+ "eval_validation.parquet_accuracy_conf75__6": 0.7543859649122807,
75
+ "eval_validation.parquet_accuracy_conf75__7": 0.7965116279069767,
76
+ "eval_validation.parquet_accuracy_conf75__8": 0.94,
77
+ "eval_validation.parquet_accuracy_conf75__9": 0.9015151515151515,
78
+ "eval_validation.parquet_accuracy_label_average": 0.7850284202694991,
79
+ "eval_validation.parquet_accuracy_label_average_conf50": 0.801551906216693,
80
+ "eval_validation.parquet_accuracy_label_average_conf75": 0.8712012342178551,
81
+ "eval_validation.parquet_accuracy_label_min": 0.6097560975609756,
82
+ "eval_validation.parquet_accuracy_label_min_conf50": 0.6470588235294118,
83
+ "eval_validation.parquet_accuracy_label_min_conf75": 0.7543859649122807,
84
+ "eval_validation.parquet_loss": 0.6012852787971497,
85
+ "eval_validation.parquet_proportion_conf50": 0.9665,
86
+ "eval_validation.parquet_proportion_conf75": 0.7948,
87
+ "eval_validation.parquet_runtime": 8.4553,
88
+ "eval_validation.parquet_samples_per_second": 1182.688,
89
+ "eval_validation.parquet_steps_per_second": 37.018,
90
+ "num_input_tokens_seen": 1949274656,
91
+ "test_accuracy": 0.8154,
92
+ "test_accuracy__0": 0.8549618320610687,
93
+ "test_accuracy__1": 0.7558139534883721,
94
+ "test_accuracy__10": 0.9070830159939071,
95
+ "test_accuracy__11": 0.6868686868686869,
96
+ "test_accuracy__12": 0.8054607508532423,
97
+ "test_accuracy__13": 0.7896749521988528,
98
+ "test_accuracy__14": 0.8591549295774648,
99
+ "test_accuracy__15": 0.8541416566626651,
100
+ "test_accuracy__16": 0.8787535410764873,
101
+ "test_accuracy__17": 0.7733333333333333,
102
+ "test_accuracy__18": 0.7286432160804021,
103
+ "test_accuracy__19": 0.6988847583643123,
104
+ "test_accuracy__2": 0.7473684210526316,
105
+ "test_accuracy__20": 0.7608695652173914,
106
+ "test_accuracy__21": 0.7807017543859649,
107
+ "test_accuracy__22": 0.7703488372093024,
108
+ "test_accuracy__23": 0.7931034482758621,
109
+ "test_accuracy__3": 0.6351351351351351,
110
+ "test_accuracy__4": 0.8709677419354839,
111
+ "test_accuracy__5": 0.8333333333333334,
112
+ "test_accuracy__6": 0.6125461254612546,
113
+ "test_accuracy__7": 0.6415770609318996,
114
+ "test_accuracy__8": 0.78,
115
+ "test_accuracy__9": 0.7668393782383419,
116
+ "test_accuracy_conf50": 0.8311890433699938,
117
+ "test_accuracy_conf50__0": 0.8852459016393442,
118
+ "test_accuracy_conf50__1": 0.7650602409638554,
119
+ "test_accuracy_conf50__10": 0.9167315175097276,
120
+ "test_accuracy_conf50__11": 0.7167755991285403,
121
+ "test_accuracy_conf50__12": 0.8256227758007118,
122
+ "test_accuracy_conf50__13": 0.7995991983967936,
123
+ "test_accuracy_conf50__14": 0.8695652173913043,
124
+ "test_accuracy_conf50__15": 0.8683559950556242,
125
+ "test_accuracy_conf50__16": 0.8877964141122036,
126
+ "test_accuracy_conf50__17": 0.7837837837837838,
127
+ "test_accuracy_conf50__18": 0.7663043478260869,
128
+ "test_accuracy_conf50__19": 0.7276264591439688,
129
+ "test_accuracy_conf50__2": 0.7608695652173914,
130
+ "test_accuracy_conf50__20": 0.7906976744186046,
131
+ "test_accuracy_conf50__21": 0.8,
132
+ "test_accuracy_conf50__22": 0.7926829268292683,
133
+ "test_accuracy_conf50__23": 0.7904191616766467,
134
+ "test_accuracy_conf50__3": 0.6616541353383458,
135
+ "test_accuracy_conf50__4": 0.8770491803278688,
136
+ "test_accuracy_conf50__5": 0.8571428571428571,
137
+ "test_accuracy_conf50__6": 0.6299212598425197,
138
+ "test_accuracy_conf50__7": 0.6785714285714286,
139
+ "test_accuracy_conf50__8": 0.7755102040816326,
140
+ "test_accuracy_conf50__9": 0.7795698924731183,
141
+ "test_accuracy_conf75": 0.9002641177210414,
142
+ "test_accuracy_conf75__0": 0.9411764705882353,
143
+ "test_accuracy_conf75__1": 0.831758034026465,
144
+ "test_accuracy_conf75__10": 0.9541850220264317,
145
+ "test_accuracy_conf75__11": 0.8477508650519031,
146
+ "test_accuracy_conf75__12": 0.8841201716738197,
147
+ "test_accuracy_conf75__13": 0.8724489795918368,
148
+ "test_accuracy_conf75__14": 0.9139784946236559,
149
+ "test_accuracy_conf75__15": 0.9345238095238095,
150
+ "test_accuracy_conf75__16": 0.9315687540348612,
151
+ "test_accuracy_conf75__17": 0.8666666666666667,
152
+ "test_accuracy_conf75__18": 0.8445945945945946,
153
+ "test_accuracy_conf75__19": 0.8208955223880597,
154
+ "test_accuracy_conf75__2": 0.8333333333333334,
155
+ "test_accuracy_conf75__20": 0.9333333333333333,
156
+ "test_accuracy_conf75__21": 0.8586956521739131,
157
+ "test_accuracy_conf75__22": 0.8708487084870848,
158
+ "test_accuracy_conf75__23": 0.8308823529411765,
159
+ "test_accuracy_conf75__3": 0.7291666666666666,
160
+ "test_accuracy_conf75__4": 0.935672514619883,
161
+ "test_accuracy_conf75__5": 0.9032258064516129,
162
+ "test_accuracy_conf75__6": 0.7816091954022989,
163
+ "test_accuracy_conf75__7": 0.8011363636363636,
164
+ "test_accuracy_conf75__8": 0.8409090909090909,
165
+ "test_accuracy_conf75__9": 0.8591549295774648,
166
+ "test_accuracy_label_average": 0.7743985594889748,
167
+ "test_accuracy_label_average_conf50": 0.7919398223613178,
168
+ "test_accuracy_label_average_conf75": 0.8675681388467735,
169
+ "test_accuracy_label_min": 0.6125461254612546,
170
+ "test_accuracy_label_min_conf50": 0.6299212598425197,
171
+ "test_accuracy_label_min_conf75": 0.7291666666666666,
172
+ "test_loss": 0.6023229956626892,
173
+ "test_proportion_conf50": 0.9638,
174
+ "test_proportion_conf75": 0.7951,
175
+ "test_runtime": 8.3874,
176
+ "test_samples_per_second": 1192.262,
177
+ "test_steps_per_second": 37.318,
178
+ "train_loss": 2.327354235526843,
179
+ "train_runtime": 577.2472,
180
+ "train_samples_per_second": 692.944,
181
+ "train_steps_per_second": 1.351
182
+ }
config.json ADDED
@@ -0,0 +1,98 @@
1
+ {
2
+ "architectures": [
3
+ "ModernBertForSequenceClassification"
4
+ ],
5
+ "attention_bias": false,
6
+ "attention_dropout": 0.0,
7
+ "bos_token_id": 50281,
8
+ "classifier_activation": "gelu",
9
+ "classifier_bias": false,
10
+ "classifier_dropout": 0.0,
11
+ "classifier_pooling": "mean",
12
+ "cls_token_id": 50281,
13
+ "decoder_bias": true,
14
+ "deterministic_flash_attn": false,
15
+ "embedding_dropout": 0.0,
16
+ "eos_token_id": 50282,
17
+ "global_attn_every_n_layers": 3,
18
+ "global_rope_theta": 160000.0,
19
+ "gradient_checkpointing": false,
20
+ "hidden_activation": "gelu",
21
+ "hidden_size": 768,
22
+ "id2label": {
23
+ "0": "Academic Writing",
24
+ "1": "Content Listing",
25
+ "10": "News Article",
26
+ "11": "Nonfiction Writing",
27
+ "12": "About (Org.)",
28
+ "13": "News (Org.)",
29
+ "14": "About (Pers.)",
30
+ "15": "Personal Blog",
31
+ "16": "Product Page",
32
+ "17": "Q&A Forum",
33
+ "18": "Spam / Ads",
34
+ "19": "Structured Data",
35
+ "2": "Creative Writing",
36
+ "20": "Documentation",
37
+ "21": "Audio Transcript",
38
+ "22": "Tutorial",
39
+ "23": "User Review",
40
+ "3": "Customer Support",
41
+ "4": "Comment Section",
42
+ "5": "FAQ",
43
+ "6": "Truncated",
44
+ "7": "Knowledge Article",
45
+ "8": "Legal Notices",
46
+ "9": "Listicle"
47
+ },
48
+ "initializer_cutoff_factor": 2.0,
49
+ "initializer_range": 0.02,
50
+ "intermediate_size": 1152,
51
+ "label2id": {
52
+ "About (Org.)": 12,
53
+ "About (Pers.)": 14,
54
+ "Academic Writing": 0,
55
+ "Audio Transcript": 21,
56
+ "Comment Section": 4,
57
+ "Content Listing": 1,
58
+ "Creative Writing": 2,
59
+ "Customer Support": 3,
60
+ "Documentation": 20,
61
+ "FAQ": 5,
62
+ "Knowledge Article": 7,
63
+ "Legal Notices": 8,
64
+ "Listicle": 9,
65
+ "News (Org.)": 13,
66
+ "News Article": 10,
67
+ "Nonfiction Writing": 11,
68
+ "Personal Blog": 15,
69
+ "Product Page": 16,
70
+ "Q&A Forum": 17,
71
+ "Spam / Ads": 18,
72
+ "Structured Data": 19,
73
+ "Truncated": 6,
74
+ "Tutorial": 22,
75
+ "User Review": 23
76
+ },
77
+ "layer_norm_eps": 1e-05,
78
+ "local_attention": 128,
79
+ "local_rope_theta": 10000.0,
80
+ "max_position_embeddings": 8192,
81
+ "mlp_bias": false,
82
+ "mlp_dropout": 0.0,
83
+ "model_type": "modernbert",
84
+ "norm_bias": false,
85
+ "norm_eps": 1e-05,
86
+ "num_attention_heads": 12,
87
+ "num_hidden_layers": 22,
88
+ "pad_token_id": 50283,
89
+ "position_embedding_type": "absolute",
90
+ "reference_compile": true,
91
+ "repad_logits_with_grad": false,
92
+ "sep_token_id": 50282,
93
+ "sparse_pred_ignore_index": -100,
94
+ "sparse_prediction": false,
95
+ "torch_dtype": "bfloat16",
96
+ "transformers_version": "4.50.0",
97
+ "vocab_size": 50368
98
+ }
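Because `id2label` and `label2id` are stored separately (and `id2label` keys are JSON strings sorted lexicographically, so `"10"` precedes `"2"`), a quick consistency check can be useful. The sketch below uses a three-label excerpt of the config above; the function name `mappings_are_inverse` is hypothetical.

```python
import json

# Small excerpt of config.json for illustration; the real file has 24 labels.
config_excerpt = json.loads("""
{
  "id2label": {"0": "Academic Writing", "22": "Tutorial", "5": "FAQ"},
  "label2id": {"Academic Writing": 0, "FAQ": 5, "Tutorial": 22}
}
""")


def mappings_are_inverse(config):
    """Check that label2id maps every id2label entry back to its id."""
    id2label = {int(k): v for k, v in config["id2label"].items()}
    label2id = config["label2id"]
    return len(id2label) == len(label2id) and all(
        label2id.get(label) == idx for idx, label in id2label.items()
    )


print(mappings_are_inverse(config_excerpt))
```

To run this on the full file, replace the excerpt with `json.load(open("config.json"))` after downloading it locally.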
eval_results.json ADDED
@@ -0,0 +1,91 @@
1
+ {
2
+ "epoch": 4.9728,
3
+ "eval_validation.parquet_accuracy": 0.8279,
4
+ "eval_validation.parquet_accuracy__0": 0.831081081081081,
5
+ "eval_validation.parquet_accuracy__1": 0.7971830985915493,
6
+ "eval_validation.parquet_accuracy__10": 0.9108589951377634,
7
+ "eval_validation.parquet_accuracy__11": 0.7390396659707724,
8
+ "eval_validation.parquet_accuracy__12": 0.8164556962025317,
9
+ "eval_validation.parquet_accuracy__13": 0.7906542056074767,
10
+ "eval_validation.parquet_accuracy__14": 0.875,
11
+ "eval_validation.parquet_accuracy__15": 0.8706088992974239,
12
+ "eval_validation.parquet_accuracy__16": 0.8806916426512968,
13
+ "eval_validation.parquet_accuracy__17": 0.8229166666666666,
14
+ "eval_validation.parquet_accuracy__18": 0.7022222222222222,
15
+ "eval_validation.parquet_accuracy__19": 0.7072368421052632,
16
+ "eval_validation.parquet_accuracy__2": 0.717948717948718,
17
+ "eval_validation.parquet_accuracy__20": 0.6097560975609756,
18
+ "eval_validation.parquet_accuracy__21": 0.7575757575757576,
19
+ "eval_validation.parquet_accuracy__22": 0.7551020408163265,
20
+ "eval_validation.parquet_accuracy__23": 0.8111111111111111,
21
+ "eval_validation.parquet_accuracy__3": 0.751937984496124,
22
+ "eval_validation.parquet_accuracy__4": 0.874251497005988,
23
+ "eval_validation.parquet_accuracy__5": 0.8125,
24
+ "eval_validation.parquet_accuracy__6": 0.623574144486692,
25
+ "eval_validation.parquet_accuracy__7": 0.6719367588932806,
26
+ "eval_validation.parquet_accuracy__8": 0.8928571428571429,
27
+ "eval_validation.parquet_accuracy__9": 0.8181818181818182,
28
+ "eval_validation.parquet_accuracy_conf50": 0.8428349715468184,
29
+ "eval_validation.parquet_accuracy_conf50__0": 0.8540145985401459,
30
+ "eval_validation.parquet_accuracy_conf50__1": 0.8159057437407953,
31
+ "eval_validation.parquet_accuracy_conf50__10": 0.9169407894736842,
32
+ "eval_validation.parquet_accuracy_conf50__11": 0.7593818984547461,
33
+ "eval_validation.parquet_accuracy_conf50__12": 0.8377483443708609,
34
+ "eval_validation.parquet_accuracy_conf50__13": 0.8062015503875969,
35
+ "eval_validation.parquet_accuracy_conf50__14": 0.8918918918918919,
36
+ "eval_validation.parquet_accuracy_conf50__15": 0.8822822822822823,
37
+ "eval_validation.parquet_accuracy_conf50__16": 0.887905604719764,
38
+ "eval_validation.parquet_accuracy_conf50__17": 0.8229166666666666,
39
+ "eval_validation.parquet_accuracy_conf50__18": 0.7183098591549296,
40
+ "eval_validation.parquet_accuracy_conf50__19": 0.7353951890034365,
41
+ "eval_validation.parquet_accuracy_conf50__2": 0.7333333333333333,
42
+ "eval_validation.parquet_accuracy_conf50__20": 0.6578947368421053,
43
+ "eval_validation.parquet_accuracy_conf50__21": 0.7653061224489796,
44
+ "eval_validation.parquet_accuracy_conf50__22": 0.7808641975308642,
45
+ "eval_validation.parquet_accuracy_conf50__23": 0.8202247191011236,
46
+ "eval_validation.parquet_accuracy_conf50__3": 0.775,
47
+ "eval_validation.parquet_accuracy_conf50__4": 0.8803030303030303,
48
+ "eval_validation.parquet_accuracy_conf50__5": 0.8125,
49
+ "eval_validation.parquet_accuracy_conf50__6": 0.6470588235294118,
50
+ "eval_validation.parquet_accuracy_conf50__7": 0.7205240174672489,
51
+ "eval_validation.parquet_accuracy_conf50__8": 0.8928571428571429,
52
+ "eval_validation.parquet_accuracy_conf50__9": 0.8224852071005917,
53
+ "eval_validation.parquet_accuracy_conf75": 0.9034977352793155,
54
+ "eval_validation.parquet_accuracy_conf75__0": 0.8947368421052632,
55
+ "eval_validation.parquet_accuracy_conf75__1": 0.8790613718411552,
56
+ "eval_validation.parquet_accuracy_conf75__10": 0.9492619926199262,
57
+ "eval_validation.parquet_accuracy_conf75__11": 0.8811188811188811,
58
+ "eval_validation.parquet_accuracy_conf75__12": 0.8793774319066148,
59
+ "eval_validation.parquet_accuracy_conf75__13": 0.8762135922330098,
60
+ "eval_validation.parquet_accuracy_conf75__14": 0.9235294117647059,
61
+ "eval_validation.parquet_accuracy_conf75__15": 0.9338181818181818,
62
+ "eval_validation.parquet_accuracy_conf75__16": 0.9240506329113924,
63
+ "eval_validation.parquet_accuracy_conf75__17": 0.9240506329113924,
64
+ "eval_validation.parquet_accuracy_conf75__18": 0.7861635220125787,
65
+ "eval_validation.parquet_accuracy_conf75__19": 0.8385650224215246,
66
+ "eval_validation.parquet_accuracy_conf75__2": 0.7627118644067796,
67
+ "eval_validation.parquet_accuracy_conf75__20": 0.7777777777777778,
68
+ "eval_validation.parquet_accuracy_conf75__21": 0.88,
69
+ "eval_validation.parquet_accuracy_conf75__22": 0.8735632183908046,
70
+ "eval_validation.parquet_accuracy_conf75__23": 0.8656716417910447,
71
+ "eval_validation.parquet_accuracy_conf75__3": 0.8390804597701149,
72
+ "eval_validation.parquet_accuracy_conf75__4": 0.9387755102040817,
73
+ "eval_validation.parquet_accuracy_conf75__5": 0.8888888888888888,
74
+ "eval_validation.parquet_accuracy_conf75__6": 0.7543859649122807,
75
+ "eval_validation.parquet_accuracy_conf75__7": 0.7965116279069767,
76
+ "eval_validation.parquet_accuracy_conf75__8": 0.94,
77
+ "eval_validation.parquet_accuracy_conf75__9": 0.9015151515151515,
78
+ "eval_validation.parquet_accuracy_label_average": 0.7850284202694991,
79
+ "eval_validation.parquet_accuracy_label_average_conf50": 0.801551906216693,
80
+ "eval_validation.parquet_accuracy_label_average_conf75": 0.8712012342178551,
81
+ "eval_validation.parquet_accuracy_label_min": 0.6097560975609756,
82
+ "eval_validation.parquet_accuracy_label_min_conf50": 0.6470588235294118,
83
+ "eval_validation.parquet_accuracy_label_min_conf75": 0.7543859649122807,
84
+ "eval_validation.parquet_loss": 0.6012852787971497,
85
+ "eval_validation.parquet_proportion_conf50": 0.9665,
86
+ "eval_validation.parquet_proportion_conf75": 0.7948,
87
+ "eval_validation.parquet_runtime": 8.4553,
88
+ "eval_validation.parquet_samples_per_second": 1182.688,
89
+ "eval_validation.parquet_steps_per_second": 37.018,
90
+ "num_input_tokens_seen": 1949274656
91
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a6c578b7042f23c1bac76a95f56e5db721ebb02f8d4d59b1b55b12f4ab462a8d
3
+ size 299260928
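The pointer above records only the hash and byte size of the weights, but the size gives a rough parameter estimate. The sketch below assumes every tensor is stored as bfloat16 (2 bytes), as `torch_dtype` in `config.json` suggests; the file also contains a small JSON header, so the result is a slight overestimate. The function name `parse_lfs_pointer` is hypothetical.

```python
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:a6c578b7042f23c1bac76a95f56e5db721ebb02f8d4d59b1b55b12f4ab462a8d
size 299260928
"""


def parse_lfs_pointer(text):
    """Parse 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])
    return fields


pointer = parse_lfs_pointer(POINTER)
bytes_per_param = 2  # assumption: all weights stored as bfloat16
approx_params = pointer["size"] // bytes_per_param
print(f"about {approx_params / 1e6:.1f}M parameters (upper bound)")
```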
pred_results.json ADDED
@@ -0,0 +1,89 @@
1
+ {
2
+ "test_accuracy": 0.8154,
3
+ "test_accuracy__0": 0.8549618320610687,
4
+ "test_accuracy__1": 0.7558139534883721,
5
+ "test_accuracy__10": 0.9070830159939071,
6
+ "test_accuracy__11": 0.6868686868686869,
7
+ "test_accuracy__12": 0.8054607508532423,
8
+ "test_accuracy__13": 0.7896749521988528,
9
+ "test_accuracy__14": 0.8591549295774648,
10
+ "test_accuracy__15": 0.8541416566626651,
11
+ "test_accuracy__16": 0.8787535410764873,
12
+ "test_accuracy__17": 0.7733333333333333,
13
+ "test_accuracy__18": 0.7286432160804021,
14
+ "test_accuracy__19": 0.6988847583643123,
15
+ "test_accuracy__2": 0.7473684210526316,
16
+ "test_accuracy__20": 0.7608695652173914,
17
+ "test_accuracy__21": 0.7807017543859649,
18
+ "test_accuracy__22": 0.7703488372093024,
19
+ "test_accuracy__23": 0.7931034482758621,
20
+ "test_accuracy__3": 0.6351351351351351,
21
+ "test_accuracy__4": 0.8709677419354839,
22
+ "test_accuracy__5": 0.8333333333333334,
23
+ "test_accuracy__6": 0.6125461254612546,
24
+ "test_accuracy__7": 0.6415770609318996,
25
+ "test_accuracy__8": 0.78,
26
+ "test_accuracy__9": 0.7668393782383419,
27
+ "test_accuracy_conf50": 0.8311890433699938,
28
+ "test_accuracy_conf50__0": 0.8852459016393442,
29
+ "test_accuracy_conf50__1": 0.7650602409638554,
30
+ "test_accuracy_conf50__10": 0.9167315175097276,
31
+ "test_accuracy_conf50__11": 0.7167755991285403,
32
+ "test_accuracy_conf50__12": 0.8256227758007118,
33
+ "test_accuracy_conf50__13": 0.7995991983967936,
34
+ "test_accuracy_conf50__14": 0.8695652173913043,
35
+ "test_accuracy_conf50__15": 0.8683559950556242,
36
+ "test_accuracy_conf50__16": 0.8877964141122036,
37
+ "test_accuracy_conf50__17": 0.7837837837837838,
38
+ "test_accuracy_conf50__18": 0.7663043478260869,
39
+ "test_accuracy_conf50__19": 0.7276264591439688,
40
+ "test_accuracy_conf50__2": 0.7608695652173914,
41
+ "test_accuracy_conf50__20": 0.7906976744186046,
42
+ "test_accuracy_conf50__21": 0.8,
43
+ "test_accuracy_conf50__22": 0.7926829268292683,
44
+ "test_accuracy_conf50__23": 0.7904191616766467,
45
+ "test_accuracy_conf50__3": 0.6616541353383458,
46
+ "test_accuracy_conf50__4": 0.8770491803278688,
47
+ "test_accuracy_conf50__5": 0.8571428571428571,
48
+ "test_accuracy_conf50__6": 0.6299212598425197,
49
+ "test_accuracy_conf50__7": 0.6785714285714286,
50
+ "test_accuracy_conf50__8": 0.7755102040816326,
51
+ "test_accuracy_conf50__9": 0.7795698924731183,
52
+ "test_accuracy_conf75": 0.9002641177210414,
53
+ "test_accuracy_conf75__0": 0.9411764705882353,
54
+ "test_accuracy_conf75__1": 0.831758034026465,
55
+ "test_accuracy_conf75__10": 0.9541850220264317,
56
+ "test_accuracy_conf75__11": 0.8477508650519031,
57
+ "test_accuracy_conf75__12": 0.8841201716738197,
58
+ "test_accuracy_conf75__13": 0.8724489795918368,
59
+ "test_accuracy_conf75__14": 0.9139784946236559,
60
+ "test_accuracy_conf75__15": 0.9345238095238095,
61
+ "test_accuracy_conf75__16": 0.9315687540348612,
62
+ "test_accuracy_conf75__17": 0.8666666666666667,
63
+ "test_accuracy_conf75__18": 0.8445945945945946,
64
+ "test_accuracy_conf75__19": 0.8208955223880597,
65
+ "test_accuracy_conf75__2": 0.8333333333333334,
66
+ "test_accuracy_conf75__20": 0.9333333333333333,
67
+ "test_accuracy_conf75__21": 0.8586956521739131,
68
+ "test_accuracy_conf75__22": 0.8708487084870848,
69
+ "test_accuracy_conf75__23": 0.8308823529411765,
70
+ "test_accuracy_conf75__3": 0.7291666666666666,
71
+ "test_accuracy_conf75__4": 0.935672514619883,
72
+ "test_accuracy_conf75__5": 0.9032258064516129,
73
+ "test_accuracy_conf75__6": 0.7816091954022989,
74
+ "test_accuracy_conf75__7": 0.8011363636363636,
75
+ "test_accuracy_conf75__8": 0.8409090909090909,
76
+ "test_accuracy_conf75__9": 0.8591549295774648,
77
+ "test_accuracy_label_average": 0.7743985594889748,
78
+ "test_accuracy_label_average_conf50": 0.7919398223613178,
79
+ "test_accuracy_label_average_conf75": 0.8675681388467735,
80
+ "test_accuracy_label_min": 0.6125461254612546,
81
+ "test_accuracy_label_min_conf50": 0.6299212598425197,
82
+ "test_accuracy_label_min_conf75": 0.7291666666666666,
83
+ "test_loss": 0.6023229956626892,
84
+ "test_proportion_conf50": 0.9638,
85
+ "test_proportion_conf75": 0.7951,
86
+ "test_runtime": 8.3874,
87
+ "test_samples_per_second": 1192.262,
88
+ "test_steps_per_second": 37.318
89
+ }
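The metric names above suggest a confidence-thresholded evaluation. The following is a minimal sketch of how such numbers could be computed, not the authors' evaluation code. It assumes that `conf50` and `conf75` mean "keep only predictions whose top class probability is at least 0.5 or 0.75", that `proportion` is the fraction of examples kept, and that per-label accuracy is grouped by gold label. The function name `thresholded_metrics` and the toy inputs are hypothetical. The reported values are at least consistent with `label_min` being the minimum over per-label accuracies: `test_accuracy_label_min_conf50` equals `test_accuracy_conf50__6`, and `test_accuracy_label_min_conf75` equals `test_accuracy_conf75__3`.

```python
from collections import defaultdict


def thresholded_metrics(probs, labels, threshold):
    """Accuracy restricted to predictions whose top probability >= threshold.

    probs:  list of per-class probability lists (one list per example)
    labels: list of integer gold labels
    Assumption: per-label accuracy is grouped by the gold label.
    """
    kept_total = 0
    kept_correct = 0
    per_label = defaultdict(lambda: [0, 0])  # gold label -> [correct, total]

    for p, y in zip(probs, labels):
        pred = max(range(len(p)), key=p.__getitem__)  # argmax class
        if p[pred] < threshold:
            continue  # low-confidence prediction is excluded
        hit = int(pred == y)
        kept_total += 1
        kept_correct += hit
        per_label[y][0] += hit
        per_label[y][1] += 1

    label_acc = {y: c / n for y, (c, n) in per_label.items()}
    nan = float("nan")
    return {
        "proportion": kept_total / len(labels),
        "accuracy": kept_correct / kept_total if kept_total else nan,
        "label_average": sum(label_acc.values()) / len(label_acc) if label_acc else nan,
        "label_min": min(label_acc.values()) if label_acc else nan,
    }


# Toy data (hypothetical): four examples, two classes.
toy_probs = [[0.9, 0.1], [0.6, 0.4], [0.45, 0.55], [0.2, 0.8]]
toy_labels = [0, 1, 1, 1]
conf50 = thresholded_metrics(toy_probs, toy_labels, 0.5)
conf75 = thresholded_metrics(toy_probs, toy_labels, 0.75)
```

Under this reading, raising the threshold trades coverage for accuracy, which matches the direction of the reported numbers: `test_proportion_conf75` (0.7951) is lower than `test_proportion_conf50` (0.9638), while `test_accuracy_conf75` (about 0.900) is higher than the conf50 per-label values listed above.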
predictions.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:579f40424a84ed1df4a5e78c7c04e408b8937ad0dd6590975736131a235f7de3
+ size 1920210
runs/May03_13-07-30_jzxh358/events.out.tfevents.1746270461.jzxh358.1141829.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7993f79ee9b696b057f484f9c4e0f199da3871de81840641bc8c9a71f343ecc8
+ size 45705
runs/May03_13-07-30_jzxh358/events.out.tfevents.1746271047.jzxh358.1141829.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4edb4d74806a2ba1d9d081c4e9fcf15f818a05843c03afd33df2fa670cfde16f
+ size 7120
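The three files above are Git LFS pointers rather than the binary payloads themselves: each holds a spec version, a content hash, and a byte size. As a small illustration of that layout, the sketch below parses one pointer into its fields. The helper name `parse_lfs_pointer` is hypothetical, and the sketch only handles the three keys that appear in this commit.

```python
def parse_lfs_pointer(text):
    """Parse the space-separated key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value

    # The oid line has the form "<hash algorithm>:<hex digest>".
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer text copied from the second TensorBoard event file in this commit.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:4edb4d74806a2ba1d9d081c4e9fcf15f818a05843c03afd33df2fa670cfde16f
size 7120
"""
info = parse_lfs_pointer(pointer)
```

The `size` field is the size of the real file in bytes, so the pointers alone show that `predictions.pkl` is roughly 1.9 MB while the two event files are much smaller.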
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": true,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,945 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "|||IP_ADDRESS|||",
5
+ "lstrip": false,
6
+ "normalized": true,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": false
10
+ },
11
+ "1": {
12
+ "content": "<|padding|>",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "50254": {
20
+ "content": " ",
21
+ "lstrip": false,
22
+ "normalized": true,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": false
26
+ },
27
+ "50255": {
28
+ "content": " ",
29
+ "lstrip": false,
30
+ "normalized": true,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": false
34
+ },
35
+ "50256": {
36
+ "content": " ",
37
+ "lstrip": false,
38
+ "normalized": true,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": false
42
+ },
43
+ "50257": {
44
+ "content": " ",
45
+ "lstrip": false,
46
+ "normalized": true,
47
+ "rstrip": false,
48
+ "single_word": false,
49
+ "special": false
50
+ },
51
+ "50258": {
52
+ "content": " ",
53
+ "lstrip": false,
54
+ "normalized": true,
55
+ "rstrip": false,
56
+ "single_word": false,
57
+ "special": false
58
+ },
59
+ "50259": {
60
+ "content": " ",
61
+ "lstrip": false,
62
+ "normalized": true,
63
+ "rstrip": false,
64
+ "single_word": false,
65
+ "special": false
66
+ },
67
+ "50260": {
68
+ "content": " ",
69
+ "lstrip": false,
70
+ "normalized": true,
71
+ "rstrip": false,
72
+ "single_word": false,
73
+ "special": false
74
+ },
75
+ "50261": {
76
+ "content": " ",
77
+ "lstrip": false,
78
+ "normalized": true,
79
+ "rstrip": false,
80
+ "single_word": false,
81
+ "special": false
82
+ },
83
+ "50262": {
84
+ "content": " ",
85
+ "lstrip": false,
86
+ "normalized": true,
87
+ "rstrip": false,
88
+ "single_word": false,
89
+ "special": false
90
+ },
91
+ "50263": {
92
+ "content": " ",
93
+ "lstrip": false,
94
+ "normalized": true,
95
+ "rstrip": false,
96
+ "single_word": false,
97
+ "special": false
98
+ },
99
+ "50264": {
100
+ "content": " ",
101
+ "lstrip": false,
102
+ "normalized": true,
103
+ "rstrip": false,
104
+ "single_word": false,
105
+ "special": false
106
+ },
107
+ "50265": {
108
+ "content": " ",
109
+ "lstrip": false,
110
+ "normalized": true,
111
+ "rstrip": false,
112
+ "single_word": false,
113
+ "special": false
114
+ },
115
+ "50266": {
116
+ "content": " ",
117
+ "lstrip": false,
118
+ "normalized": true,
119
+ "rstrip": false,
120
+ "single_word": false,
121
+ "special": false
122
+ },
123
+ "50267": {
124
+ "content": " ",
125
+ "lstrip": false,
126
+ "normalized": true,
127
+ "rstrip": false,
128
+ "single_word": false,
129
+ "special": false
130
+ },
131
+ "50268": {
132
+ "content": " ",
133
+ "lstrip": false,
134
+ "normalized": true,
135
+ "rstrip": false,
136
+ "single_word": false,
137
+ "special": false
138
+ },
139
+ "50269": {
140
+ "content": " ",
141
+ "lstrip": false,
142
+ "normalized": true,
143
+ "rstrip": false,
144
+ "single_word": false,
145
+ "special": false
146
+ },
147
+ "50270": {
148
+ "content": " ",
149
+ "lstrip": false,
150
+ "normalized": true,
151
+ "rstrip": false,
152
+ "single_word": false,
153
+ "special": false
154
+ },
155
+ "50271": {
156
+ "content": " ",
157
+ "lstrip": false,
158
+ "normalized": true,
159
+ "rstrip": false,
160
+ "single_word": false,
161
+ "special": false
162
+ },
163
+ "50272": {
164
+ "content": " ",
165
+ "lstrip": false,
166
+ "normalized": true,
167
+ "rstrip": false,
168
+ "single_word": false,
169
+ "special": false
170
+ },
171
+ "50273": {
172
+ "content": " ",
173
+ "lstrip": false,
174
+ "normalized": true,
175
+ "rstrip": false,
176
+ "single_word": false,
177
+ "special": false
178
+ },
179
+ "50274": {
180
+ "content": " ",
181
+ "lstrip": false,
182
+ "normalized": true,
183
+ "rstrip": false,
184
+ "single_word": false,
185
+ "special": false
186
+ },
187
+ "50275": {
188
+ "content": " ",
189
+ "lstrip": false,
190
+ "normalized": true,
191
+ "rstrip": false,
192
+ "single_word": false,
193
+ "special": false
194
+ },
195
+ "50276": {
196
+ "content": " ",
197
+ "lstrip": false,
198
+ "normalized": true,
199
+ "rstrip": false,
200
+ "single_word": false,
201
+ "special": false
202
+ },
203
+ "50277": {
204
+ "content": "|||EMAIL_ADDRESS|||",
205
+ "lstrip": false,
206
+ "normalized": true,
207
+ "rstrip": false,
208
+ "single_word": false,
209
+ "special": false
210
+ },
211
+ "50278": {
212
+ "content": "|||PHONE_NUMBER|||",
213
+ "lstrip": false,
214
+ "normalized": true,
215
+ "rstrip": false,
216
+ "single_word": false,
217
+ "special": false
218
+ },
219
+ "50279": {
220
+ "content": "<|endoftext|>",
221
+ "lstrip": false,
222
+ "normalized": false,
223
+ "rstrip": false,
224
+ "single_word": false,
225
+ "special": true
226
+ },
227
+ "50280": {
228
+ "content": "[UNK]",
229
+ "lstrip": false,
230
+ "normalized": false,
231
+ "rstrip": false,
232
+ "single_word": false,
233
+ "special": true
234
+ },
235
+ "50281": {
236
+ "content": "[CLS]",
237
+ "lstrip": false,
238
+ "normalized": false,
239
+ "rstrip": false,
240
+ "single_word": false,
241
+ "special": true
242
+ },
243
+ "50282": {
244
+ "content": "[SEP]",
245
+ "lstrip": false,
246
+ "normalized": false,
247
+ "rstrip": false,
248
+ "single_word": false,
249
+ "special": true
250
+ },
251
+ "50283": {
252
+ "content": "[PAD]",
253
+ "lstrip": false,
254
+ "normalized": false,
255
+ "rstrip": false,
256
+ "single_word": false,
257
+ "special": true
258
+ },
259
+ "50284": {
260
+ "content": "[MASK]",
261
+ "lstrip": true,
262
+ "normalized": false,
263
+ "rstrip": false,
264
+ "single_word": false,
265
+ "special": true
266
+ },
267
+ "50285": {
268
+ "content": "[unused0]",
269
+ "lstrip": false,
270
+ "normalized": true,
271
+ "rstrip": false,
272
+ "single_word": false,
273
+ "special": false
274
+ },
275
+ "50286": {
276
+ "content": "[unused1]",
277
+ "lstrip": false,
278
+ "normalized": true,
279
+ "rstrip": false,
280
+ "single_word": false,
281
+ "special": false
282
+ },
283
+ "50287": {
284
+ "content": "[unused2]",
285
+ "lstrip": false,
286
+ "normalized": true,
287
+ "rstrip": false,
288
+ "single_word": false,
289
+ "special": false
290
+ },
291
+ "50288": {
292
+ "content": "[unused3]",
293
+ "lstrip": false,
294
+ "normalized": true,
295
+ "rstrip": false,
296
+ "single_word": false,
297
+ "special": false
298
+ },
299
+ "50289": {
300
+ "content": "[unused4]",
301
+ "lstrip": false,
302
+ "normalized": true,
303
+ "rstrip": false,
304
+ "single_word": false,
305
+ "special": false
306
+ },
307
+ "50290": {
308
+ "content": "[unused5]",
309
+ "lstrip": false,
310
+ "normalized": true,
311
+ "rstrip": false,
312
+ "single_word": false,
313
+ "special": false
314
+ },
315
+ "50291": {
316
+ "content": "[unused6]",
317
+ "lstrip": false,
318
+ "normalized": true,
319
+ "rstrip": false,
320
+ "single_word": false,
321
+ "special": false
322
+ },
323
+ "50292": {
324
+ "content": "[unused7]",
325
+ "lstrip": false,
326
+ "normalized": true,
327
+ "rstrip": false,
328
+ "single_word": false,
329
+ "special": false
330
+ },
331
+ "50293": {
332
+ "content": "[unused8]",
333
+ "lstrip": false,
334
+ "normalized": true,
335
+ "rstrip": false,
336
+ "single_word": false,
337
+ "special": false
338
+ },
339
+ "50294": {
340
+ "content": "[unused9]",
341
+ "lstrip": false,
342
+ "normalized": true,
343
+ "rstrip": false,
344
+ "single_word": false,
345
+ "special": false
346
+ },
347
+ "50295": {
348
+ "content": "[unused10]",
349
+ "lstrip": false,
350
+ "normalized": true,
351
+ "rstrip": false,
352
+ "single_word": false,
353
+ "special": false
354
+ },
355
+ "50296": {
356
+ "content": "[unused11]",
357
+ "lstrip": false,
358
+ "normalized": true,
359
+ "rstrip": false,
360
+ "single_word": false,
361
+ "special": false
362
+ },
363
+ "50297": {
364
+ "content": "[unused12]",
365
+ "lstrip": false,
366
+ "normalized": true,
367
+ "rstrip": false,
368
+ "single_word": false,
369
+ "special": false
370
+ },
371
+ "50298": {
372
+ "content": "[unused13]",
373
+ "lstrip": false,
374
+ "normalized": true,
375
+ "rstrip": false,
376
+ "single_word": false,
377
+ "special": false
378
+ },
379
+ "50299": {
380
+ "content": "[unused14]",
381
+ "lstrip": false,
382
+ "normalized": true,
383
+ "rstrip": false,
384
+ "single_word": false,
385
+ "special": false
386
+ },
387
+ "50300": {
388
+ "content": "[unused15]",
389
+ "lstrip": false,
390
+ "normalized": true,
391
+ "rstrip": false,
392
+ "single_word": false,
393
+ "special": false
394
+ },
395
+ "50301": {
396
+ "content": "[unused16]",
397
+ "lstrip": false,
398
+ "normalized": true,
399
+ "rstrip": false,
400
+ "single_word": false,
401
+ "special": false
402
+ },
403
+ "50302": {
404
+ "content": "[unused17]",
405
+ "lstrip": false,
406
+ "normalized": true,
407
+ "rstrip": false,
408
+ "single_word": false,
409
+ "special": false
410
+ },
411
+ "50303": {
412
+ "content": "[unused18]",
413
+ "lstrip": false,
414
+ "normalized": true,
415
+ "rstrip": false,
416
+ "single_word": false,
417
+ "special": false
418
+ },
419
+ "50304": {
420
+ "content": "[unused19]",
421
+ "lstrip": false,
422
+ "normalized": true,
423
+ "rstrip": false,
424
+ "single_word": false,
425
+ "special": false
426
+ },
427
+ "50305": {
428
+ "content": "[unused20]",
429
+ "lstrip": false,
430
+ "normalized": true,
431
+ "rstrip": false,
432
+ "single_word": false,
433
+ "special": false
434
+ },
435
+ "50306": {
436
+ "content": "[unused21]",
437
+ "lstrip": false,
438
+ "normalized": true,
439
+ "rstrip": false,
440
+ "single_word": false,
441
+ "special": false
442
+ },
443
+ "50307": {
444
+ "content": "[unused22]",
445
+ "lstrip": false,
446
+ "normalized": true,
447
+ "rstrip": false,
448
+ "single_word": false,
449
+ "special": false
450
+ },
451
+ "50308": {
452
+ "content": "[unused23]",
453
+ "lstrip": false,
454
+ "normalized": true,
455
+ "rstrip": false,
456
+ "single_word": false,
457
+ "special": false
458
+ },
459
+ "50309": {
460
+ "content": "[unused24]",
461
+ "lstrip": false,
462
+ "normalized": true,
463
+ "rstrip": false,
464
+ "single_word": false,
465
+ "special": false
466
+ },
467
+ "50310": {
468
+ "content": "[unused25]",
469
+ "lstrip": false,
470
+ "normalized": true,
471
+ "rstrip": false,
472
+ "single_word": false,
473
+ "special": false
474
+ },
475
+ "50311": {
476
+ "content": "[unused26]",
477
+ "lstrip": false,
478
+ "normalized": true,
479
+ "rstrip": false,
480
+ "single_word": false,
481
+ "special": false
482
+ },
483
+ "50312": {
484
+ "content": "[unused27]",
485
+ "lstrip": false,
486
+ "normalized": true,
487
+ "rstrip": false,
488
+ "single_word": false,
489
+ "special": false
490
+ },
491
+ "50313": {
492
+ "content": "[unused28]",
493
+ "lstrip": false,
494
+ "normalized": true,
495
+ "rstrip": false,
496
+ "single_word": false,
497
+ "special": false
498
+ },
499
+ "50314": {
500
+ "content": "[unused29]",
501
+ "lstrip": false,
502
+ "normalized": true,
503
+ "rstrip": false,
504
+ "single_word": false,
505
+ "special": false
506
+ },
507
+ "50315": {
508
+ "content": "[unused30]",
509
+ "lstrip": false,
510
+ "normalized": true,
511
+ "rstrip": false,
512
+ "single_word": false,
513
+ "special": false
514
+ },
515
+ "50316": {
516
+ "content": "[unused31]",
517
+ "lstrip": false,
518
+ "normalized": true,
519
+ "rstrip": false,
520
+ "single_word": false,
521
+ "special": false
522
+ },
523
+ "50317": {
524
+ "content": "[unused32]",
525
+ "lstrip": false,
526
+ "normalized": true,
527
+ "rstrip": false,
528
+ "single_word": false,
529
+ "special": false
530
+ },
531
+ "50318": {
532
+ "content": "[unused33]",
533
+ "lstrip": false,
534
+ "normalized": true,
535
+ "rstrip": false,
536
+ "single_word": false,
537
+ "special": false
538
+ },
539
+ "50319": {
540
+ "content": "[unused34]",
541
+ "lstrip": false,
542
+ "normalized": true,
543
+ "rstrip": false,
544
+ "single_word": false,
545
+ "special": false
546
+ },
547
+ "50320": {
548
+ "content": "[unused35]",
549
+ "lstrip": false,
550
+ "normalized": true,
551
+ "rstrip": false,
552
+ "single_word": false,
553
+ "special": false
554
+ },
555
+ "50321": {
556
+ "content": "[unused36]",
557
+ "lstrip": false,
558
+ "normalized": true,
559
+ "rstrip": false,
560
+ "single_word": false,
561
+ "special": false
562
+ },
563
+ "50322": {
564
+ "content": "[unused37]",
565
+ "lstrip": false,
566
+ "normalized": true,
567
+ "rstrip": false,
568
+ "single_word": false,
569
+ "special": false
570
+ },
571
+ "50323": {
572
+ "content": "[unused38]",
573
+ "lstrip": false,
574
+ "normalized": true,
575
+ "rstrip": false,
576
+ "single_word": false,
577
+ "special": false
578
+ },
579
+ "50324": {
580
+ "content": "[unused39]",
581
+ "lstrip": false,
582
+ "normalized": true,
583
+ "rstrip": false,
584
+ "single_word": false,
585
+ "special": false
586
+ },
587
+ "50325": {
588
+ "content": "[unused40]",
589
+ "lstrip": false,
590
+ "normalized": true,
591
+ "rstrip": false,
592
+ "single_word": false,
593
+ "special": false
594
+ },
595
+ "50326": {
596
+ "content": "[unused41]",
597
+ "lstrip": false,
598
+ "normalized": true,
599
+ "rstrip": false,
600
+ "single_word": false,
601
+ "special": false
602
+ },
603
+ "50327": {
604
+ "content": "[unused42]",
605
+ "lstrip": false,
606
+ "normalized": true,
607
+ "rstrip": false,
608
+ "single_word": false,
609
+ "special": false
610
+ },
611
+ "50328": {
612
+ "content": "[unused43]",
613
+ "lstrip": false,
614
+ "normalized": true,
615
+ "rstrip": false,
616
+ "single_word": false,
617
+ "special": false
618
+ },
619
+ "50329": {
620
+ "content": "[unused44]",
621
+ "lstrip": false,
622
+ "normalized": true,
623
+ "rstrip": false,
624
+ "single_word": false,
625
+ "special": false
626
+ },
627
+ "50330": {
628
+ "content": "[unused45]",
629
+ "lstrip": false,
630
+ "normalized": true,
631
+ "rstrip": false,
632
+ "single_word": false,
633
+ "special": false
634
+ },
635
+ "50331": {
636
+ "content": "[unused46]",
637
+ "lstrip": false,
638
+ "normalized": true,
639
+ "rstrip": false,
640
+ "single_word": false,
641
+ "special": false
642
+ },
643
+ "50332": {
644
+ "content": "[unused47]",
645
+ "lstrip": false,
646
+ "normalized": true,
647
+ "rstrip": false,
648
+ "single_word": false,
649
+ "special": false
650
+ },
651
+ "50333": {
652
+ "content": "[unused48]",
653
+ "lstrip": false,
654
+ "normalized": true,
655
+ "rstrip": false,
656
+ "single_word": false,
657
+ "special": false
658
+ },
659
+ "50334": {
660
+ "content": "[unused49]",
661
+ "lstrip": false,
662
+ "normalized": true,
663
+ "rstrip": false,
664
+ "single_word": false,
665
+ "special": false
666
+ },
667
+ "50335": {
668
+ "content": "[unused50]",
669
+ "lstrip": false,
670
+ "normalized": true,
671
+ "rstrip": false,
672
+ "single_word": false,
673
+ "special": false
674
+ },
675
+ "50336": {
676
+ "content": "[unused51]",
677
+ "lstrip": false,
678
+ "normalized": true,
679
+ "rstrip": false,
680
+ "single_word": false,
681
+ "special": false
682
+ },
683
+ "50337": {
684
+ "content": "[unused52]",
685
+ "lstrip": false,
686
+ "normalized": true,
687
+ "rstrip": false,
688
+ "single_word": false,
689
+ "special": false
690
+ },
691
+ "50338": {
692
+ "content": "[unused53]",
693
+ "lstrip": false,
694
+ "normalized": true,
695
+ "rstrip": false,
696
+ "single_word": false,
697
+ "special": false
698
+ },
699
+ "50339": {
700
+ "content": "[unused54]",
701
+ "lstrip": false,
702
+ "normalized": true,
703
+ "rstrip": false,
704
+ "single_word": false,
705
+ "special": false
706
+ },
707
+ "50340": {
708
+ "content": "[unused55]",
709
+ "lstrip": false,
710
+ "normalized": true,
711
+ "rstrip": false,
712
+ "single_word": false,
713
+ "special": false
714
+ },
715
+ "50341": {
716
+ "content": "[unused56]",
717
+ "lstrip": false,
718
+ "normalized": true,
719
+ "rstrip": false,
720
+ "single_word": false,
721
+ "special": false
722
+ },
723
+ "50342": {
724
+ "content": "[unused57]",
725
+ "lstrip": false,
726
+ "normalized": true,
727
+ "rstrip": false,
728
+ "single_word": false,
729
+ "special": false
730
+ },
731
+ "50343": {
732
+ "content": "[unused58]",
733
+ "lstrip": false,
734
+ "normalized": true,
735
+ "rstrip": false,
736
+ "single_word": false,
737
+ "special": false
738
+ },
739
+ "50344": {
740
+ "content": "[unused59]",
741
+ "lstrip": false,
742
+ "normalized": true,
743
+ "rstrip": false,
744
+ "single_word": false,
745
+ "special": false
746
+ },
747
+ "50345": {
748
+ "content": "[unused60]",
749
+ "lstrip": false,
750
+ "normalized": true,
751
+ "rstrip": false,
752
+ "single_word": false,
753
+ "special": false
754
+ },
755
+ "50346": {
756
+ "content": "[unused61]",
757
+ "lstrip": false,
758
+ "normalized": true,
759
+ "rstrip": false,
760
+ "single_word": false,
761
+ "special": false
762
+ },
763
+ "50347": {
764
+ "content": "[unused62]",
765
+ "lstrip": false,
766
+ "normalized": true,
767
+ "rstrip": false,
768
+ "single_word": false,
769
+ "special": false
770
+ },
771
+ "50348": {
772
+ "content": "[unused63]",
773
+ "lstrip": false,
774
+ "normalized": true,
775
+ "rstrip": false,
776
+ "single_word": false,
777
+ "special": false
778
+ },
779
+ "50349": {
780
+ "content": "[unused64]",
781
+ "lstrip": false,
782
+ "normalized": true,
783
+ "rstrip": false,
784
+ "single_word": false,
785
+ "special": false
786
+ },
787
+ "50350": {
788
+ "content": "[unused65]",
789
+ "lstrip": false,
790
+ "normalized": true,
791
+ "rstrip": false,
792
+ "single_word": false,
793
+ "special": false
794
+ },
795
+ "50351": {
796
+ "content": "[unused66]",
797
+ "lstrip": false,
798
+ "normalized": true,
799
+ "rstrip": false,
800
+ "single_word": false,
801
+ "special": false
802
+ },
803
+ "50352": {
804
+ "content": "[unused67]",
805
+ "lstrip": false,
806
+ "normalized": true,
807
+ "rstrip": false,
808
+ "single_word": false,
809
+ "special": false
810
+ },
811
+ "50353": {
812
+ "content": "[unused68]",
813
+ "lstrip": false,
814
+ "normalized": true,
815
+ "rstrip": false,
816
+ "single_word": false,
817
+ "special": false
818
+ },
819
+ "50354": {
820
+ "content": "[unused69]",
821
+ "lstrip": false,
822
+ "normalized": true,
823
+ "rstrip": false,
824
+ "single_word": false,
825
+ "special": false
826
+ },
827
+ "50355": {
828
+ "content": "[unused70]",
829
+ "lstrip": false,
830
+ "normalized": true,
831
+ "rstrip": false,
832
+ "single_word": false,
833
+ "special": false
834
+ },
835
+ "50356": {
836
+ "content": "[unused71]",
837
+ "lstrip": false,
838
+ "normalized": true,
839
+ "rstrip": false,
840
+ "single_word": false,
841
+ "special": false
842
+ },
843
+ "50357": {
844
+ "content": "[unused72]",
845
+ "lstrip": false,
846
+ "normalized": true,
847
+ "rstrip": false,
848
+ "single_word": false,
849
+ "special": false
850
+ },
851
+ "50358": {
852
+ "content": "[unused73]",
853
+ "lstrip": false,
854
+ "normalized": true,
855
+ "rstrip": false,
856
+ "single_word": false,
857
+ "special": false
858
+ },
859
+ "50359": {
860
+ "content": "[unused74]",
861
+ "lstrip": false,
862
+ "normalized": true,
863
+ "rstrip": false,
864
+ "single_word": false,
865
+ "special": false
866
+ },
867
+ "50360": {
868
+ "content": "[unused75]",
869
+ "lstrip": false,
870
+ "normalized": true,
871
+ "rstrip": false,
872
+ "single_word": false,
873
+ "special": false
874
+ },
875
+ "50361": {
876
+ "content": "[unused76]",
877
+ "lstrip": false,
878
+ "normalized": true,
879
+ "rstrip": false,
880
+ "single_word": false,
881
+ "special": false
882
+ },
883
+ "50362": {
884
+ "content": "[unused77]",
885
+ "lstrip": false,
886
+ "normalized": true,
887
+ "rstrip": false,
888
+ "single_word": false,
889
+ "special": false
890
+ },
891
+ "50363": {
892
+ "content": "[unused78]",
893
+ "lstrip": false,
894
+ "normalized": true,
895
+ "rstrip": false,
896
+ "single_word": false,
897
+ "special": false
898
+ },
899
+ "50364": {
900
+ "content": "[unused79]",
901
+ "lstrip": false,
902
+ "normalized": true,
903
+ "rstrip": false,
904
+ "single_word": false,
905
+ "special": false
906
+ },
907
+ "50365": {
908
+ "content": "[unused80]",
909
+ "lstrip": false,
910
+ "normalized": true,
911
+ "rstrip": false,
912
+ "single_word": false,
913
+ "special": false
914
+ },
915
+ "50366": {
916
+ "content": "[unused81]",
917
+ "lstrip": false,
918
+ "normalized": true,
919
+ "rstrip": false,
920
+ "single_word": false,
921
+ "special": false
922
+ },
923
+ "50367": {
924
+ "content": "[unused82]",
925
+ "lstrip": false,
926
+ "normalized": true,
927
+ "rstrip": false,
928
+ "single_word": false,
929
+ "special": false
930
+ }
931
+ },
932
+ "clean_up_tokenization_spaces": true,
933
+ "cls_token": "[CLS]",
934
+ "extra_special_tokens": {},
935
+ "mask_token": "[MASK]",
936
+ "model_input_names": [
937
+ "input_ids",
938
+ "attention_mask"
939
+ ],
940
+ "model_max_length": 8192,
941
+ "pad_token": "[PAD]",
942
+ "sep_token": "[SEP]",
943
+ "tokenizer_class": "PreTrainedTokenizer",
944
+ "unk_token": "[UNK]"
945
+ }
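To make the structure of `tokenizer_config.json` easier to read, here is a small sketch that pulls the special tokens out of `added_tokens_decoder` and resolves the named roles (`cls_token`, `pad_token`, and so on) to their ids. It works on a hand-copied fragment of the file above rather than the full 945 lines, and the helper name `special_token_ids` is hypothetical.

```python
import json

# Fragment copied from the diff above; only "content" and "special" are kept.
config = json.loads("""
{
  "added_tokens_decoder": {
    "1":     {"content": "<|padding|>",         "special": true},
    "50277": {"content": "|||EMAIL_ADDRESS|||", "special": false},
    "50279": {"content": "<|endoftext|>",       "special": true},
    "50280": {"content": "[UNK]",               "special": true},
    "50281": {"content": "[CLS]",               "special": true},
    "50282": {"content": "[SEP]",               "special": true},
    "50283": {"content": "[PAD]",               "special": true},
    "50284": {"content": "[MASK]",              "special": true},
    "50285": {"content": "[unused0]",           "special": false}
  },
  "cls_token": "[CLS]",
  "mask_token": "[MASK]",
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "unk_token": "[UNK]",
  "model_max_length": 8192
}
""")


def special_token_ids(cfg):
    """Map the content of every token flagged special to its integer id."""
    return {
        entry["content"]: int(token_id)
        for token_id, entry in cfg["added_tokens_decoder"].items()
        if entry["special"]
    }


ids = special_token_ids(config)

# Resolve the named roles declared at the end of the file to token ids.
roles = ("cls_token", "mask_token", "pad_token", "sep_token", "unk_token")
role_ids = {role: ids[config[role]] for role in roles}
```

In the file as uploaded, the PII placeholders such as `|||EMAIL_ADDRESS|||` and the `[unusedN]` entries are listed with `"special": false`, so they drop out of this mapping.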
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+   "epoch": 4.9728,
+   "num_input_tokens_seen": 1949274656,
+   "train_loss": 2.327354235526843,
+   "train_runtime": 577.2472,
+   "train_samples_per_second": 692.944,
+   "train_steps_per_second": 1.351
+ }
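The throughput fields in `train_results.json` can be cross-checked with simple arithmetic. The sketch below assumes that `train_runtime` is in seconds and that the per-second rates are averages over that runtime. The derived variable names are hypothetical.

```python
# Values copied from train_results.json above.
train_results = {
    "epoch": 4.9728,
    "num_input_tokens_seen": 1949274656,
    "train_loss": 2.327354235526843,
    "train_runtime": 577.2472,
    "train_samples_per_second": 692.944,
    "train_steps_per_second": 1.351,
}

runtime = train_results["train_runtime"]  # assumed to be seconds

# Rate multiplied by time recovers approximate totals.
approx_samples = runtime * train_results["train_samples_per_second"]
approx_steps = runtime * train_results["train_steps_per_second"]

# Average token throughput over the run.
tokens_per_second = train_results["num_input_tokens_seen"] / runtime

print(f"samples processed ~ {approx_samples:,.0f}")
print(f"optimizer steps   ~ {approx_steps:,.0f}")
print(f"tokens per second ~ {tokens_per_second:,.0f}")
```

The step estimate comes out at about 780, which agrees with the `global_step` value recorded in `trainer_state.json`.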
trainer_state.json ADDED
@@ -0,0 +1,560 @@
1
+ {
2
+ "best_global_step": 471,
3
+ "best_metric": 0.6097560975609756,
4
+ "best_model_checkpoint": "/linkhome/rech/genini01/udd26kf/scratch/weborganizer/models/runs/answerdotai--ModernBERT-base_FormatAnnotations-Llama-3.1-8B_bsz512_lr1e-4_epochs5_warmup0.1_url1_FormatAnnotations-Llama-3.1-405B-FP8_bsz512_lr1e-4_epochs5_warmup0.1_url1/checkpoint-471",
5
+ "epoch": 4.9728,
6
+ "eval_steps": 500,
7
+ "global_step": 780,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 0.64,
14
+ "grad_norm": 23.25,
15
+ "learning_rate": 9.686609686609687e-05,
16
+ "loss": 3.0414,
17
+ "num_input_tokens_seen": 249204064,
18
+ "step": 100
19
+ },
20
+ {
21
+ "epoch": 1.0,
22
+ "eval_validation.parquet_accuracy": 0.8204,
23
+ "eval_validation.parquet_accuracy__0": 0.8175675675675675,
24
+ "eval_validation.parquet_accuracy__1": 0.7605633802816901,
25
+ "eval_validation.parquet_accuracy__10": 0.9124797406807131,
26
+ "eval_validation.parquet_accuracy__11": 0.6826722338204593,
27
+ "eval_validation.parquet_accuracy__12": 0.7278481012658228,
28
+ "eval_validation.parquet_accuracy__13": 0.794392523364486,
29
+ "eval_validation.parquet_accuracy__14": 0.8229166666666666,
30
+ "eval_validation.parquet_accuracy__15": 0.8694379391100703,
31
+ "eval_validation.parquet_accuracy__16": 0.8818443804034583,
32
+ "eval_validation.parquet_accuracy__17": 0.7916666666666666,
33
+ "eval_validation.parquet_accuracy__18": 0.76,
34
+ "eval_validation.parquet_accuracy__19": 0.7697368421052632,
35
+ "eval_validation.parquet_accuracy__2": 0.7435897435897436,
36
+ "eval_validation.parquet_accuracy__20": 0.5609756097560976,
37
+ "eval_validation.parquet_accuracy__21": 0.696969696969697,
38
+ "eval_validation.parquet_accuracy__22": 0.7434402332361516,
39
+ "eval_validation.parquet_accuracy__23": 0.8166666666666667,
40
+ "eval_validation.parquet_accuracy__3": 0.751937984496124,
41
+ "eval_validation.parquet_accuracy__4": 0.8727544910179641,
42
+ "eval_validation.parquet_accuracy__5": 0.78125,
43
+ "eval_validation.parquet_accuracy__6": 0.6387832699619772,
44
+ "eval_validation.parquet_accuracy__7": 0.6719367588932806,
45
+ "eval_validation.parquet_accuracy__8": 0.8928571428571429,
46
+ "eval_validation.parquet_accuracy__9": 0.7727272727272727,
47
+ "eval_validation.parquet_accuracy_conf50": 0.8353854112778065,
48
+ "eval_validation.parquet_accuracy_conf50__0": 0.8394160583941606,
49
+ "eval_validation.parquet_accuracy_conf50__1": 0.7776141384388807,
50
+ "eval_validation.parquet_accuracy_conf50__10": 0.9185855263157895,
51
+ "eval_validation.parquet_accuracy_conf50__11": 0.7041942604856513,
52
+ "eval_validation.parquet_accuracy_conf50__12": 0.7582781456953642,
53
+ "eval_validation.parquet_accuracy_conf50__13": 0.810077519379845,
54
+ "eval_validation.parquet_accuracy_conf50__14": 0.8432432432432433,
55
+ "eval_validation.parquet_accuracy_conf50__15": 0.8792792792792793,
56
+ "eval_validation.parquet_accuracy_conf50__16": 0.8890855457227138,
57
+ "eval_validation.parquet_accuracy_conf50__17": 0.7916666666666666,
58
+ "eval_validation.parquet_accuracy_conf50__18": 0.7699530516431925,
59
+ "eval_validation.parquet_accuracy_conf50__19": 0.7903780068728522,
60
+ "eval_validation.parquet_accuracy_conf50__2": 0.76,
61
+ "eval_validation.parquet_accuracy_conf50__20": 0.6052631578947368,
62
+ "eval_validation.parquet_accuracy_conf50__21": 0.7040816326530612,
63
+ "eval_validation.parquet_accuracy_conf50__22": 0.7716049382716049,
64
+ "eval_validation.parquet_accuracy_conf50__23": 0.8258426966292135,
65
+ "eval_validation.parquet_accuracy_conf50__3": 0.775,
66
+ "eval_validation.parquet_accuracy_conf50__4": 0.8803030303030303,
67
+ "eval_validation.parquet_accuracy_conf50__5": 0.78125,
68
+ "eval_validation.parquet_accuracy_conf50__6": 0.6680672268907563,
69
+ "eval_validation.parquet_accuracy_conf50__7": 0.7205240174672489,
70
+ "eval_validation.parquet_accuracy_conf50__8": 0.8928571428571429,
71
+ "eval_validation.parquet_accuracy_conf50__9": 0.7810650887573964,
72
+ "eval_validation.parquet_accuracy_conf75": 0.8959486663311524,
73
+ "eval_validation.parquet_accuracy_conf75__0": 0.8859649122807017,
74
+ "eval_validation.parquet_accuracy_conf75__1": 0.8411552346570397,
75
+ "eval_validation.parquet_accuracy_conf75__10": 0.9511070110701108,
76
+ "eval_validation.parquet_accuracy_conf75__11": 0.8146853146853147,
77
+ "eval_validation.parquet_accuracy_conf75__12": 0.8326848249027238,
78
+ "eval_validation.parquet_accuracy_conf75__13": 0.8713592233009708,
79
+ "eval_validation.parquet_accuracy_conf75__14": 0.8823529411764706,
80
+ "eval_validation.parquet_accuracy_conf75__15": 0.9323636363636364,
81
+ "eval_validation.parquet_accuracy_conf75__16": 0.9233844103930713,
82
+ "eval_validation.parquet_accuracy_conf75__17": 0.9113924050632911,
83
+ "eval_validation.parquet_accuracy_conf75__18": 0.8301886792452831,
84
+ "eval_validation.parquet_accuracy_conf75__19": 0.8878923766816144,
85
+ "eval_validation.parquet_accuracy_conf75__2": 0.7796610169491526,
86
+ "eval_validation.parquet_accuracy_conf75__20": 0.7407407407407407,
87
+ "eval_validation.parquet_accuracy_conf75__21": 0.8266666666666667,
88
+ "eval_validation.parquet_accuracy_conf75__22": 0.8582375478927203,
89
+ "eval_validation.parquet_accuracy_conf75__23": 0.8656716417910447,
90
+ "eval_validation.parquet_accuracy_conf75__3": 0.8390804597701149,
91
+ "eval_validation.parquet_accuracy_conf75__4": 0.9332096474953617,
92
+ "eval_validation.parquet_accuracy_conf75__5": 0.8518518518518519,
93
+ "eval_validation.parquet_accuracy_conf75__6": 0.7719298245614035,
94
+ "eval_validation.parquet_accuracy_conf75__7": 0.7790697674418605,
95
+ "eval_validation.parquet_accuracy_conf75__8": 0.94,
96
+ "eval_validation.parquet_accuracy_conf75__9": 0.8863636363636364,
97
+ "eval_validation.parquet_accuracy_label_average": 0.7722922880043742,
98
+ "eval_validation.parquet_accuracy_label_average_conf50": 0.7890679322442429,
99
+ "eval_validation.parquet_accuracy_label_average_conf75": 0.8598755738060326,
100
+ "eval_validation.parquet_accuracy_label_min": 0.5609756097560976,
101
+ "eval_validation.parquet_accuracy_label_min_conf50": 0.6052631578947368,
102
+ "eval_validation.parquet_accuracy_label_min_conf75": 0.7407407407407407,
103
+ "eval_validation.parquet_loss": 0.6205468773841858,
104
+ "eval_validation.parquet_proportion_conf50": 0.9665,
105
+ "eval_validation.parquet_proportion_conf75": 0.7948,
106
+ "eval_validation.parquet_runtime": 10.5515,
107
+ "eval_validation.parquet_samples_per_second": 947.73,
108
+ "eval_validation.parquet_steps_per_second": 29.664,
109
+ "num_input_tokens_seen": 390215936,
110
+ "step": 157
+ },
+ {
+ "epoch": 1.2752,
+ "grad_norm": 11.8125,
+ "learning_rate": 8.262108262108262e-05,
+ "loss": 2.4648,
+ "num_input_tokens_seen": 499147424,
+ "step": 200
+ },
+ {
+ "epoch": 1.9152,
+ "grad_norm": 15.375,
+ "learning_rate": 6.837606837606838e-05,
+ "loss": 2.3634,
+ "num_input_tokens_seen": 751160992,
+ "step": 300
+ },
+ {
+ "epoch": 2.0,
130
+ "eval_validation.parquet_accuracy": 0.824,
131
+ "eval_validation.parquet_accuracy__0": 0.777027027027027,
132
+ "eval_validation.parquet_accuracy__1": 0.780281690140845,
133
+ "eval_validation.parquet_accuracy__10": 0.9149108589951378,
134
+ "eval_validation.parquet_accuracy__11": 0.8225469728601252,
135
+ "eval_validation.parquet_accuracy__12": 0.7436708860759493,
136
+ "eval_validation.parquet_accuracy__13": 0.7158878504672898,
137
+ "eval_validation.parquet_accuracy__14": 0.8229166666666666,
138
+ "eval_validation.parquet_accuracy__15": 0.8852459016393442,
139
+ "eval_validation.parquet_accuracy__16": 0.8818443804034583,
140
+ "eval_validation.parquet_accuracy__17": 0.84375,
141
+ "eval_validation.parquet_accuracy__18": 0.6711111111111111,
142
+ "eval_validation.parquet_accuracy__19": 0.756578947368421,
143
+ "eval_validation.parquet_accuracy__2": 0.717948717948718,
144
+ "eval_validation.parquet_accuracy__20": 0.6097560975609756,
145
+ "eval_validation.parquet_accuracy__21": 0.7777777777777778,
146
+ "eval_validation.parquet_accuracy__22": 0.7725947521865889,
147
+ "eval_validation.parquet_accuracy__23": 0.7888888888888889,
148
+ "eval_validation.parquet_accuracy__3": 0.7364341085271318,
149
+ "eval_validation.parquet_accuracy__4": 0.8622754491017964,
150
+ "eval_validation.parquet_accuracy__5": 0.78125,
151
+ "eval_validation.parquet_accuracy__6": 0.5513307984790875,
152
+ "eval_validation.parquet_accuracy__7": 0.6640316205533597,
153
+ "eval_validation.parquet_accuracy__8": 0.8928571428571429,
154
+ "eval_validation.parquet_accuracy__9": 0.8181818181818182,
155
+ "eval_validation.parquet_accuracy_conf50": 0.8392136575271598,
156
+ "eval_validation.parquet_accuracy_conf50__0": 0.8102189781021898,
157
+ "eval_validation.parquet_accuracy_conf50__1": 0.7997054491899853,
158
+ "eval_validation.parquet_accuracy_conf50__10": 0.9210526315789473,
159
+ "eval_validation.parquet_accuracy_conf50__11": 0.8410596026490066,
160
+ "eval_validation.parquet_accuracy_conf50__12": 0.7715231788079471,
161
+ "eval_validation.parquet_accuracy_conf50__13": 0.7344961240310077,
162
+ "eval_validation.parquet_accuracy_conf50__14": 0.8486486486486486,
163
+ "eval_validation.parquet_accuracy_conf50__15": 0.8942942942942943,
164
+ "eval_validation.parquet_accuracy_conf50__16": 0.8896755162241888,
165
+ "eval_validation.parquet_accuracy_conf50__17": 0.84375,
166
+ "eval_validation.parquet_accuracy_conf50__18": 0.6854460093896714,
167
+ "eval_validation.parquet_accuracy_conf50__19": 0.7766323024054983,
168
+ "eval_validation.parquet_accuracy_conf50__2": 0.7333333333333333,
169
+ "eval_validation.parquet_accuracy_conf50__20": 0.6578947368421053,
170
+ "eval_validation.parquet_accuracy_conf50__21": 0.7857142857142857,
171
+ "eval_validation.parquet_accuracy_conf50__22": 0.7932098765432098,
172
+ "eval_validation.parquet_accuracy_conf50__23": 0.797752808988764,
173
+ "eval_validation.parquet_accuracy_conf50__3": 0.7666666666666667,
174
+ "eval_validation.parquet_accuracy_conf50__4": 0.8696969696969697,
175
+ "eval_validation.parquet_accuracy_conf50__5": 0.78125,
176
+ "eval_validation.parquet_accuracy_conf50__6": 0.5756302521008403,
177
+ "eval_validation.parquet_accuracy_conf50__7": 0.7161572052401747,
178
+ "eval_validation.parquet_accuracy_conf50__8": 0.8928571428571429,
179
+ "eval_validation.parquet_accuracy_conf50__9": 0.8224852071005917,
180
+ "eval_validation.parquet_accuracy_conf75": 0.8993457473578259,
181
+ "eval_validation.parquet_accuracy_conf75__0": 0.8771929824561403,
182
+ "eval_validation.parquet_accuracy_conf75__1": 0.8646209386281588,
183
+ "eval_validation.parquet_accuracy_conf75__10": 0.9566420664206642,
184
+ "eval_validation.parquet_accuracy_conf75__11": 0.9405594405594405,
185
+ "eval_validation.parquet_accuracy_conf75__12": 0.8326848249027238,
186
+ "eval_validation.parquet_accuracy_conf75__13": 0.8106796116504854,
187
+ "eval_validation.parquet_accuracy_conf75__14": 0.8823529411764706,
188
+ "eval_validation.parquet_accuracy_conf75__15": 0.944,
189
+ "eval_validation.parquet_accuracy_conf75__16": 0.9227181878747501,
190
+ "eval_validation.parquet_accuracy_conf75__17": 0.9493670886075949,
191
+ "eval_validation.parquet_accuracy_conf75__18": 0.7672955974842768,
192
+ "eval_validation.parquet_accuracy_conf75__19": 0.8834080717488789,
193
+ "eval_validation.parquet_accuracy_conf75__2": 0.7627118644067796,
194
+ "eval_validation.parquet_accuracy_conf75__20": 0.7777777777777778,
195
+ "eval_validation.parquet_accuracy_conf75__21": 0.8933333333333333,
196
+ "eval_validation.parquet_accuracy_conf75__22": 0.8735632183908046,
197
+ "eval_validation.parquet_accuracy_conf75__23": 0.8432835820895522,
198
+ "eval_validation.parquet_accuracy_conf75__3": 0.8390804597701149,
199
+ "eval_validation.parquet_accuracy_conf75__4": 0.9294990723562152,
200
+ "eval_validation.parquet_accuracy_conf75__5": 0.8518518518518519,
201
+ "eval_validation.parquet_accuracy_conf75__6": 0.6900584795321637,
202
+ "eval_validation.parquet_accuracy_conf75__7": 0.7674418604651163,
203
+ "eval_validation.parquet_accuracy_conf75__8": 0.94,
204
+ "eval_validation.parquet_accuracy_conf75__9": 0.9090909090909091,
205
+ "eval_validation.parquet_accuracy_label_average": 0.7745458110341109,
206
+ "eval_validation.parquet_accuracy_label_average_conf50": 0.7920479675168944,
207
+ "eval_validation.parquet_accuracy_label_average_conf75": 0.8628839233572584,
208
+ "eval_validation.parquet_accuracy_label_min": 0.5513307984790875,
209
+ "eval_validation.parquet_accuracy_label_min_conf50": 0.5756302521008403,
210
+ "eval_validation.parquet_accuracy_label_min_conf75": 0.6900584795321637,
211
+ "eval_validation.parquet_loss": 0.6139147281646729,
212
+ "eval_validation.parquet_proportion_conf50": 0.9665,
213
+ "eval_validation.parquet_proportion_conf75": 0.7948,
214
+ "eval_validation.parquet_runtime": 8.3891,
215
+ "eval_validation.parquet_samples_per_second": 1192.017,
216
+ "eval_validation.parquet_steps_per_second": 37.31,
217
+ "num_input_tokens_seen": 783399104,
218
+ "step": 314
+ },
+ {
+ "epoch": 2.5504,
+ "grad_norm": 9.25,
+ "learning_rate": 5.413105413105414e-05,
+ "loss": 2.2093,
+ "num_input_tokens_seen": 999700736,
+ "step": 400
+ },
+ {
+ "epoch": 3.0,
230
+ "eval_validation.parquet_accuracy": 0.8279,
231
+ "eval_validation.parquet_accuracy__0": 0.831081081081081,
232
+ "eval_validation.parquet_accuracy__1": 0.7971830985915493,
233
+ "eval_validation.parquet_accuracy__10": 0.9108589951377634,
234
+ "eval_validation.parquet_accuracy__11": 0.7390396659707724,
235
+ "eval_validation.parquet_accuracy__12": 0.8164556962025317,
236
+ "eval_validation.parquet_accuracy__13": 0.7906542056074767,
237
+ "eval_validation.parquet_accuracy__14": 0.875,
238
+ "eval_validation.parquet_accuracy__15": 0.8706088992974239,
239
+ "eval_validation.parquet_accuracy__16": 0.8806916426512968,
240
+ "eval_validation.parquet_accuracy__17": 0.8229166666666666,
241
+ "eval_validation.parquet_accuracy__18": 0.7022222222222222,
242
+ "eval_validation.parquet_accuracy__19": 0.7072368421052632,
243
+ "eval_validation.parquet_accuracy__2": 0.717948717948718,
244
+ "eval_validation.parquet_accuracy__20": 0.6097560975609756,
245
+ "eval_validation.parquet_accuracy__21": 0.7575757575757576,
246
+ "eval_validation.parquet_accuracy__22": 0.7551020408163265,
247
+ "eval_validation.parquet_accuracy__23": 0.8111111111111111,
248
+ "eval_validation.parquet_accuracy__3": 0.751937984496124,
249
+ "eval_validation.parquet_accuracy__4": 0.874251497005988,
250
+ "eval_validation.parquet_accuracy__5": 0.8125,
251
+ "eval_validation.parquet_accuracy__6": 0.623574144486692,
252
+ "eval_validation.parquet_accuracy__7": 0.6719367588932806,
253
+ "eval_validation.parquet_accuracy__8": 0.8928571428571429,
254
+ "eval_validation.parquet_accuracy__9": 0.8181818181818182,
255
+ "eval_validation.parquet_accuracy_conf50": 0.8428349715468184,
256
+ "eval_validation.parquet_accuracy_conf50__0": 0.8540145985401459,
257
+ "eval_validation.parquet_accuracy_conf50__1": 0.8159057437407953,
258
+ "eval_validation.parquet_accuracy_conf50__10": 0.9169407894736842,
259
+ "eval_validation.parquet_accuracy_conf50__11": 0.7593818984547461,
260
+ "eval_validation.parquet_accuracy_conf50__12": 0.8377483443708609,
261
+ "eval_validation.parquet_accuracy_conf50__13": 0.8062015503875969,
262
+ "eval_validation.parquet_accuracy_conf50__14": 0.8918918918918919,
263
+ "eval_validation.parquet_accuracy_conf50__15": 0.8822822822822823,
264
+ "eval_validation.parquet_accuracy_conf50__16": 0.887905604719764,
265
+ "eval_validation.parquet_accuracy_conf50__17": 0.8229166666666666,
266
+ "eval_validation.parquet_accuracy_conf50__18": 0.7183098591549296,
267
+ "eval_validation.parquet_accuracy_conf50__19": 0.7353951890034365,
268
+ "eval_validation.parquet_accuracy_conf50__2": 0.7333333333333333,
269
+ "eval_validation.parquet_accuracy_conf50__20": 0.6578947368421053,
270
+ "eval_validation.parquet_accuracy_conf50__21": 0.7653061224489796,
271
+ "eval_validation.parquet_accuracy_conf50__22": 0.7808641975308642,
272
+ "eval_validation.parquet_accuracy_conf50__23": 0.8202247191011236,
273
+ "eval_validation.parquet_accuracy_conf50__3": 0.775,
274
+ "eval_validation.parquet_accuracy_conf50__4": 0.8803030303030303,
275
+ "eval_validation.parquet_accuracy_conf50__5": 0.8125,
276
+ "eval_validation.parquet_accuracy_conf50__6": 0.6470588235294118,
277
+ "eval_validation.parquet_accuracy_conf50__7": 0.7205240174672489,
278
+ "eval_validation.parquet_accuracy_conf50__8": 0.8928571428571429,
279
+ "eval_validation.parquet_accuracy_conf50__9": 0.8224852071005917,
280
+ "eval_validation.parquet_accuracy_conf75": 0.9034977352793155,
281
+ "eval_validation.parquet_accuracy_conf75__0": 0.8947368421052632,
282
+ "eval_validation.parquet_accuracy_conf75__1": 0.8790613718411552,
283
+ "eval_validation.parquet_accuracy_conf75__10": 0.9492619926199262,
284
+ "eval_validation.parquet_accuracy_conf75__11": 0.8811188811188811,
285
+ "eval_validation.parquet_accuracy_conf75__12": 0.8793774319066148,
286
+ "eval_validation.parquet_accuracy_conf75__13": 0.8762135922330098,
287
+ "eval_validation.parquet_accuracy_conf75__14": 0.9235294117647059,
288
+ "eval_validation.parquet_accuracy_conf75__15": 0.9338181818181818,
289
+ "eval_validation.parquet_accuracy_conf75__16": 0.9240506329113924,
290
+ "eval_validation.parquet_accuracy_conf75__17": 0.9240506329113924,
291
+ "eval_validation.parquet_accuracy_conf75__18": 0.7861635220125787,
292
+ "eval_validation.parquet_accuracy_conf75__19": 0.8385650224215246,
293
+ "eval_validation.parquet_accuracy_conf75__2": 0.7627118644067796,
294
+ "eval_validation.parquet_accuracy_conf75__20": 0.7777777777777778,
295
+ "eval_validation.parquet_accuracy_conf75__21": 0.88,
296
+ "eval_validation.parquet_accuracy_conf75__22": 0.8735632183908046,
297
+ "eval_validation.parquet_accuracy_conf75__23": 0.8656716417910447,
298
+ "eval_validation.parquet_accuracy_conf75__3": 0.8390804597701149,
299
+ "eval_validation.parquet_accuracy_conf75__4": 0.9387755102040817,
300
+ "eval_validation.parquet_accuracy_conf75__5": 0.8888888888888888,
301
+ "eval_validation.parquet_accuracy_conf75__6": 0.7543859649122807,
302
+ "eval_validation.parquet_accuracy_conf75__7": 0.7965116279069767,
303
+ "eval_validation.parquet_accuracy_conf75__8": 0.94,
304
+ "eval_validation.parquet_accuracy_conf75__9": 0.9015151515151515,
305
+ "eval_validation.parquet_accuracy_label_average": 0.7850284202694991,
306
+ "eval_validation.parquet_accuracy_label_average_conf50": 0.801551906216693,
307
+ "eval_validation.parquet_accuracy_label_average_conf75": 0.8712012342178551,
308
+ "eval_validation.parquet_accuracy_label_min": 0.6097560975609756,
309
+ "eval_validation.parquet_accuracy_label_min_conf50": 0.6470588235294118,
310
+ "eval_validation.parquet_accuracy_label_min_conf75": 0.7543859649122807,
311
+ "eval_validation.parquet_loss": 0.6012852787971497,
312
+ "eval_validation.parquet_proportion_conf50": 0.9665,
313
+ "eval_validation.parquet_proportion_conf75": 0.7948,
314
+ "eval_validation.parquet_runtime": 8.4045,
315
+ "eval_validation.parquet_samples_per_second": 1189.832,
316
+ "eval_validation.parquet_steps_per_second": 37.242,
317
+ "num_input_tokens_seen": 1176307328,
318
+ "step": 471
+ },
+ {
+ "epoch": 3.1856,
+ "grad_norm": 11.125,
+ "learning_rate": 3.988603988603989e-05,
+ "loss": 2.1824,
+ "num_input_tokens_seen": 1250925472,
+ "step": 500
+ },
+ {
+ "epoch": 3.8256,
+ "grad_norm": 10.3125,
+ "learning_rate": 2.564102564102564e-05,
+ "loss": 2.1124,
+ "num_input_tokens_seen": 1499507040,
+ "step": 600
+ },
+ {
+ "epoch": 4.0,
338
+ "eval_validation.parquet_accuracy": 0.8296,
339
+ "eval_validation.parquet_accuracy__0": 0.8378378378378378,
340
+ "eval_validation.parquet_accuracy__1": 0.8140845070422535,
341
+ "eval_validation.parquet_accuracy__10": 0.9181523500810372,
342
+ "eval_validation.parquet_accuracy__11": 0.7536534446764092,
343
+ "eval_validation.parquet_accuracy__12": 0.8037974683544303,
344
+ "eval_validation.parquet_accuracy__13": 0.7738317757009345,
345
+ "eval_validation.parquet_accuracy__14": 0.8645833333333334,
346
+ "eval_validation.parquet_accuracy__15": 0.8600702576112412,
347
+ "eval_validation.parquet_accuracy__16": 0.8737752161383285,
348
+ "eval_validation.parquet_accuracy__17": 0.8229166666666666,
349
+ "eval_validation.parquet_accuracy__18": 0.7466666666666667,
350
+ "eval_validation.parquet_accuracy__19": 0.7138157894736842,
351
+ "eval_validation.parquet_accuracy__2": 0.782051282051282,
352
+ "eval_validation.parquet_accuracy__20": 0.6097560975609756,
353
+ "eval_validation.parquet_accuracy__21": 0.7474747474747475,
354
+ "eval_validation.parquet_accuracy__22": 0.7900874635568513,
355
+ "eval_validation.parquet_accuracy__23": 0.7722222222222223,
356
+ "eval_validation.parquet_accuracy__3": 0.751937984496124,
357
+ "eval_validation.parquet_accuracy__4": 0.8937125748502994,
358
+ "eval_validation.parquet_accuracy__5": 0.78125,
359
+ "eval_validation.parquet_accuracy__6": 0.596958174904943,
360
+ "eval_validation.parquet_accuracy__7": 0.6996047430830039,
361
+ "eval_validation.parquet_accuracy__8": 0.8928571428571429,
362
+ "eval_validation.parquet_accuracy__9": 0.8181818181818182,
363
+ "eval_validation.parquet_accuracy_conf50": 0.8443869632695292,
364
+ "eval_validation.parquet_accuracy_conf50__0": 0.8540145985401459,
365
+ "eval_validation.parquet_accuracy_conf50__1": 0.8306332842415317,
366
+ "eval_validation.parquet_accuracy_conf50__10": 0.9243421052631579,
367
+ "eval_validation.parquet_accuracy_conf50__11": 0.7748344370860927,
368
+ "eval_validation.parquet_accuracy_conf50__12": 0.8278145695364238,
369
+ "eval_validation.parquet_accuracy_conf50__13": 0.7906976744186046,
370
+ "eval_validation.parquet_accuracy_conf50__14": 0.8864864864864865,
371
+ "eval_validation.parquet_accuracy_conf50__15": 0.8714714714714715,
372
+ "eval_validation.parquet_accuracy_conf50__16": 0.8825958702064897,
373
+ "eval_validation.parquet_accuracy_conf50__17": 0.8229166666666666,
374
+ "eval_validation.parquet_accuracy_conf50__18": 0.755868544600939,
375
+ "eval_validation.parquet_accuracy_conf50__19": 0.738831615120275,
376
+ "eval_validation.parquet_accuracy_conf50__2": 0.8,
377
+ "eval_validation.parquet_accuracy_conf50__20": 0.6578947368421053,
378
+ "eval_validation.parquet_accuracy_conf50__21": 0.7551020408163265,
379
+ "eval_validation.parquet_accuracy_conf50__22": 0.808641975308642,
380
+ "eval_validation.parquet_accuracy_conf50__23": 0.7808988764044944,
381
+ "eval_validation.parquet_accuracy_conf50__3": 0.775,
382
+ "eval_validation.parquet_accuracy_conf50__4": 0.9,
383
+ "eval_validation.parquet_accuracy_conf50__5": 0.78125,
384
+ "eval_validation.parquet_accuracy_conf50__6": 0.6176470588235294,
385
+ "eval_validation.parquet_accuracy_conf50__7": 0.7510917030567685,
386
+ "eval_validation.parquet_accuracy_conf50__8": 0.8928571428571429,
387
+ "eval_validation.parquet_accuracy_conf50__9": 0.8284023668639053,
388
+ "eval_validation.parquet_accuracy_conf75": 0.903749370910921,
389
+ "eval_validation.parquet_accuracy_conf75__0": 0.9035087719298246,
390
+ "eval_validation.parquet_accuracy_conf75__1": 0.8916967509025271,
391
+ "eval_validation.parquet_accuracy_conf75__10": 0.9529520295202952,
392
+ "eval_validation.parquet_accuracy_conf75__11": 0.8811188811188811,
393
+ "eval_validation.parquet_accuracy_conf75__12": 0.8793774319066148,
394
+ "eval_validation.parquet_accuracy_conf75__13": 0.8640776699029126,
395
+ "eval_validation.parquet_accuracy_conf75__14": 0.9176470588235294,
396
+ "eval_validation.parquet_accuracy_conf75__15": 0.928,
397
+ "eval_validation.parquet_accuracy_conf75__16": 0.9187208527648234,
398
+ "eval_validation.parquet_accuracy_conf75__17": 0.9240506329113924,
399
+ "eval_validation.parquet_accuracy_conf75__18": 0.8238993710691824,
400
+ "eval_validation.parquet_accuracy_conf75__19": 0.8385650224215246,
401
+ "eval_validation.parquet_accuracy_conf75__2": 0.8305084745762712,
402
+ "eval_validation.parquet_accuracy_conf75__20": 0.7777777777777778,
403
+ "eval_validation.parquet_accuracy_conf75__21": 0.8666666666666667,
404
+ "eval_validation.parquet_accuracy_conf75__22": 0.8850574712643678,
405
+ "eval_validation.parquet_accuracy_conf75__23": 0.835820895522388,
406
+ "eval_validation.parquet_accuracy_conf75__3": 0.8390804597701149,
407
+ "eval_validation.parquet_accuracy_conf75__4": 0.948051948051948,
408
+ "eval_validation.parquet_accuracy_conf75__5": 0.8518518518518519,
409
+ "eval_validation.parquet_accuracy_conf75__6": 0.7309941520467836,
410
+ "eval_validation.parquet_accuracy_conf75__7": 0.813953488372093,
411
+ "eval_validation.parquet_accuracy_conf75__8": 0.94,
412
+ "eval_validation.parquet_accuracy_conf75__9": 0.9090909090909091,
413
+ "eval_validation.parquet_accuracy_label_average": 0.7883033152009263,
414
+ "eval_validation.parquet_accuracy_label_average_conf50": 0.8045538843588002,
415
+ "eval_validation.parquet_accuracy_label_average_conf75": 0.8730195236776117,
416
+ "eval_validation.parquet_accuracy_label_min": 0.596958174904943,
417
+ "eval_validation.parquet_accuracy_label_min_conf50": 0.6176470588235294,
418
+ "eval_validation.parquet_accuracy_label_min_conf75": 0.7309941520467836,
419
+ "eval_validation.parquet_loss": 0.5987765789031982,
420
+ "eval_validation.parquet_proportion_conf50": 0.9665,
421
+ "eval_validation.parquet_proportion_conf75": 0.7948,
422
+ "eval_validation.parquet_runtime": 8.4305,
423
+ "eval_validation.parquet_samples_per_second": 1186.174,
424
+ "eval_validation.parquet_steps_per_second": 37.127,
425
+ "num_input_tokens_seen": 1566401088,
426
+ "step": 628
+ },
+ {
+ "epoch": 4.4608,
+ "grad_norm": 7.875,
+ "learning_rate": 1.1396011396011397e-05,
+ "loss": 2.0751,
+ "num_input_tokens_seen": 1745927840,
+ "step": 700
+ },
+ {
+ "epoch": 4.9728,
438
+ "eval_validation.parquet_accuracy": 0.8297,
439
+ "eval_validation.parquet_accuracy__0": 0.8378378378378378,
440
+ "eval_validation.parquet_accuracy__1": 0.8014084507042254,
441
+ "eval_validation.parquet_accuracy__10": 0.9173419773095624,
442
+ "eval_validation.parquet_accuracy__11": 0.755741127348643,
443
+ "eval_validation.parquet_accuracy__12": 0.8006329113924051,
444
+ "eval_validation.parquet_accuracy__13": 0.7700934579439253,
445
+ "eval_validation.parquet_accuracy__14": 0.8489583333333334,
446
+ "eval_validation.parquet_accuracy__15": 0.8676814988290398,
447
+ "eval_validation.parquet_accuracy__16": 0.8731988472622478,
448
+ "eval_validation.parquet_accuracy__17": 0.8229166666666666,
449
+ "eval_validation.parquet_accuracy__18": 0.72,
450
+ "eval_validation.parquet_accuracy__19": 0.743421052631579,
451
+ "eval_validation.parquet_accuracy__2": 0.7564102564102564,
452
+ "eval_validation.parquet_accuracy__20": 0.6341463414634146,
453
+ "eval_validation.parquet_accuracy__21": 0.7575757575757576,
454
+ "eval_validation.parquet_accuracy__22": 0.7871720116618076,
455
+ "eval_validation.parquet_accuracy__23": 0.7944444444444444,
456
+ "eval_validation.parquet_accuracy__3": 0.751937984496124,
457
+ "eval_validation.parquet_accuracy__4": 0.8907185628742516,
458
+ "eval_validation.parquet_accuracy__5": 0.8125,
459
+ "eval_validation.parquet_accuracy__6": 0.596958174904943,
460
+ "eval_validation.parquet_accuracy__7": 0.6996047430830039,
461
+ "eval_validation.parquet_accuracy__8": 0.8928571428571429,
462
+ "eval_validation.parquet_accuracy__9": 0.8125,
463
+ "eval_validation.parquet_accuracy_conf50": 0.844593895499224,
464
+ "eval_validation.parquet_accuracy_conf50__0": 0.8540145985401459,
465
+ "eval_validation.parquet_accuracy_conf50__1": 0.8188512518409425,
466
+ "eval_validation.parquet_accuracy_conf50__10": 0.9235197368421053,
467
+ "eval_validation.parquet_accuracy_conf50__11": 0.7770419426048565,
468
+ "eval_validation.parquet_accuracy_conf50__12": 0.8278145695364238,
469
+ "eval_validation.parquet_accuracy_conf50__13": 0.7868217054263565,
470
+ "eval_validation.parquet_accuracy_conf50__14": 0.8702702702702703,
471
+ "eval_validation.parquet_accuracy_conf50__15": 0.8786786786786787,
472
+ "eval_validation.parquet_accuracy_conf50__16": 0.8814159292035398,
473
+ "eval_validation.parquet_accuracy_conf50__17": 0.8229166666666666,
474
+ "eval_validation.parquet_accuracy_conf50__18": 0.7370892018779343,
475
+ "eval_validation.parquet_accuracy_conf50__19": 0.7697594501718213,
476
+ "eval_validation.parquet_accuracy_conf50__2": 0.7733333333333333,
477
+ "eval_validation.parquet_accuracy_conf50__20": 0.6578947368421053,
478
+ "eval_validation.parquet_accuracy_conf50__21": 0.7653061224489796,
479
+ "eval_validation.parquet_accuracy_conf50__22": 0.8055555555555556,
480
+ "eval_validation.parquet_accuracy_conf50__23": 0.8033707865168539,
481
+ "eval_validation.parquet_accuracy_conf50__3": 0.775,
482
+ "eval_validation.parquet_accuracy_conf50__4": 0.896969696969697,
483
+ "eval_validation.parquet_accuracy_conf50__5": 0.8125,
484
+ "eval_validation.parquet_accuracy_conf50__6": 0.6176470588235294,
485
+ "eval_validation.parquet_accuracy_conf50__7": 0.7510917030567685,
486
+ "eval_validation.parquet_accuracy_conf50__8": 0.8928571428571429,
487
+ "eval_validation.parquet_accuracy_conf50__9": 0.8224852071005917,
488
+ "eval_validation.parquet_accuracy_conf75": 0.903749370910921,
489
+ "eval_validation.parquet_accuracy_conf75__0": 0.9035087719298246,
490
+ "eval_validation.parquet_accuracy_conf75__1": 0.8844765342960289,
491
+ "eval_validation.parquet_accuracy_conf75__10": 0.9520295202952029,
492
+ "eval_validation.parquet_accuracy_conf75__11": 0.8846153846153846,
493
+ "eval_validation.parquet_accuracy_conf75__12": 0.8793774319066148,
494
+ "eval_validation.parquet_accuracy_conf75__13": 0.8640776699029126,
495
+ "eval_validation.parquet_accuracy_conf75__14": 0.9058823529411765,
496
+ "eval_validation.parquet_accuracy_conf75__15": 0.9330909090909091,
497
+ "eval_validation.parquet_accuracy_conf75__16": 0.9173884077281812,
498
+ "eval_validation.parquet_accuracy_conf75__17": 0.9240506329113924,
499
+ "eval_validation.parquet_accuracy_conf75__18": 0.8050314465408805,
500
+ "eval_validation.parquet_accuracy_conf75__19": 0.8609865470852018,
501
+ "eval_validation.parquet_accuracy_conf75__2": 0.7966101694915254,
502
+ "eval_validation.parquet_accuracy_conf75__20": 0.7777777777777778,
503
+ "eval_validation.parquet_accuracy_conf75__21": 0.88,
504
+ "eval_validation.parquet_accuracy_conf75__22": 0.8812260536398467,
505
+ "eval_validation.parquet_accuracy_conf75__23": 0.8507462686567164,
506
+ "eval_validation.parquet_accuracy_conf75__3": 0.8390804597701149,
507
+ "eval_validation.parquet_accuracy_conf75__4": 0.9461966604823747,
508
+ "eval_validation.parquet_accuracy_conf75__5": 0.8888888888888888,
509
+ "eval_validation.parquet_accuracy_conf75__6": 0.7309941520467836,
510
+ "eval_validation.parquet_accuracy_conf75__7": 0.8081395348837209,
511
+ "eval_validation.parquet_accuracy_conf75__8": 0.94,
512
+ "eval_validation.parquet_accuracy_conf75__9": 0.9090909090909091,
513
+ "eval_validation.parquet_accuracy_label_average": 0.7894190658762755,
514
+ "eval_validation.parquet_accuracy_label_average_conf50": 0.805091889381846,
515
+ "eval_validation.parquet_accuracy_label_average_conf75": 0.873469436832182,
516
+ "eval_validation.parquet_accuracy_label_min": 0.596958174904943,
517
+ "eval_validation.parquet_accuracy_label_min_conf50": 0.6176470588235294,
518
+ "eval_validation.parquet_accuracy_label_min_conf75": 0.7309941520467836,
519
+ "eval_validation.parquet_loss": 0.5989147424697876,
520
+ "eval_validation.parquet_proportion_conf50": 0.9665,
521
+ "eval_validation.parquet_proportion_conf75": 0.7948,
522
+ "eval_validation.parquet_runtime": 8.5755,
523
+ "eval_validation.parquet_samples_per_second": 1166.109,
524
+ "eval_validation.parquet_steps_per_second": 36.499,
525
+ "num_input_tokens_seen": 1949274656,
526
+ "step": 780
+ },
+ {
+ "epoch": 4.9728,
+ "num_input_tokens_seen": 1949274656,
+ "step": 780,
+ "total_flos": 1.297523316772307e+18,
+ "train_loss": 2.327354235526843,
+ "train_runtime": 577.2472,
+ "train_samples_per_second": 692.944,
+ "train_steps_per_second": 1.351
+ }
+ ],
+ "logging_steps": 100,
+ "max_steps": 780,
+ "num_input_tokens_seen": 1949274656,
+ "num_train_epochs": 5,
+ "save_steps": 500,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": true
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 1.297523316772307e+18,
+ "train_batch_size": 32,
+ "trial_name": null,
+ "trial_params": null
+ }
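
The state file above interleaves training-loss entries with per-epoch evaluation entries, which makes it hard to compare checkpoints by eye. A minimal sketch for pulling one metric out of the log follows. It assumes the array that closes above is stored under the key `log_history` and that the file is named `trainer_state.json`; neither name is visible in this part of the diff. The helper names (`load_state`, `eval_history`, `best_checkpoint`) are hypothetical, and the inline sample copies only a few values from the entries above.

```python
import json


def load_state(path):
    """Read a Trainer state file from disk (the file name is an assumption)."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def eval_history(state, metric="eval_validation.parquet_accuracy"):
    """Return (step, value) pairs for every log entry that reports `metric`."""
    # "log_history" is assumed to be the key holding the array of log entries.
    return [
        (entry["step"], entry[metric])
        for entry in state.get("log_history", [])
        if metric in entry
    ]


def best_checkpoint(state, metric="eval_validation.parquet_accuracy",
                    higher_is_better=True):
    """Return the (step, value) pair with the best value of `metric`, or None."""
    pairs = eval_history(state, metric)
    if not pairs:
        return None
    pick = max if higher_is_better else min
    return pick(pairs, key=lambda pair: pair[1])


# Inline sample built from a handful of values in the log above.
sample_state = {
    "log_history": [
        {"epoch": 1.2752, "loss": 2.4648, "step": 200},
        {"epoch": 2.0, "step": 314,
         "eval_validation.parquet_accuracy": 0.824,
         "eval_validation.parquet_loss": 0.6139147281646729},
        {"epoch": 3.0, "step": 471,
         "eval_validation.parquet_accuracy": 0.8279,
         "eval_validation.parquet_loss": 0.6012852787971497},
        {"epoch": 4.0, "step": 628,
         "eval_validation.parquet_accuracy": 0.8296,
         "eval_validation.parquet_loss": 0.5987765789031982},
        {"epoch": 4.9728, "step": 780,
         "eval_validation.parquet_accuracy": 0.8297,
         "eval_validation.parquet_loss": 0.5989147424697876},
    ]
}

best_by_accuracy = best_checkpoint(sample_state)
best_by_loss = best_checkpoint(
    sample_state, "eval_validation.parquet_loss", higher_is_better=False
)
```

On this sample, accuracy peaks at step 780 while validation loss is lowest at step 628, so the choice of metric changes which evaluation step looks best. To run it on the real file, something like `best_checkpoint(load_state("trainer_state.json"))` should work if the assumptions above hold.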
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:48e0b28b20cfc2427b664a74c37f3f8f8589db64657a770b1a4f827b83141f23
+ size 6840