dongdaxiang committed on
Commit 080a3a8 · verified · 1 Parent(s): 33bc18f

Upload folder using huggingface_hub
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+ example/scene_ocr.png filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
LICENSE ADDED
@@ -0,0 +1,151 @@
+ SPDX-License-Identifier: MIT AND LicenseRef-Meta-Llama-3.1-Community
+
+ Composite License: MIT (for Original Contributions) + Llama 3.1 Community License (for Llama Materials)
+
+ === Section A — MIT License (applies to Original Contributions by the Project) ===
+
+ MIT License
+
+ Copyright (c) 2025 Qianfan
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
+
+
+ === Section B — Llama 3.1 Community License (applies to any Meta Llama 3.1 materials included here or derivatives thereof) ===
+
+ LLAMA 3.1 COMMUNITY LICENSE AGREEMENT
+ Llama 3.1 Version Release Date: July 23, 2024
+
+ “Agreement” means the terms and conditions for use, reproduction, distribution and modification of the
+ Llama Materials set forth herein.
+
+ “Documentation” means the specifications, manuals and documentation accompanying Llama 3.1
+ distributed by Meta at https://llama.meta.com/doc/overview.
+
+ “Licensee” or “you” means you, or your employer or any other person or entity (if you are entering into
+ this Agreement on such person or entity’s behalf), of the age required under applicable laws, rules or
+ regulations to provide legal consent and that has legal authority to bind your employer or such other
+ person or entity if you are entering in this Agreement on their behalf.
+
+ “Llama 3.1” means the foundational large language models and software and algorithms, including
+ machine-learning model code, trained model weights, inference-enabling code, training-enabling code,
+ fine-tuning enabling code and other elements of the foregoing distributed by Meta at
+ https://llama.meta.com/llama-downloads.
+
+ “Llama Materials” means, collectively, Meta’s proprietary Llama 3.1 and Documentation (and any
+ portion thereof) made available under this Agreement.
+
+ “Meta” or “we” means Meta Platforms Ireland Limited (if you are located in or, if you are an entity, your
+ principal place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if you are located
+ outside of the EEA or Switzerland).
+
+ By clicking “I Accept” below or by using or distributing any portion or element of the Llama Materials,
+ you agree to be bound by this Agreement.
+
+ 1. License Rights and Redistribution.
+
+ a. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable and royalty-free
+ limited license under Meta’s intellectual property or other rights owned by Meta embodied in the Llama
+ Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the
+ Llama Materials.
+
+ b. Redistribution and Use.
+
+ i. If you distribute or make available the Llama Materials (or any derivative works
+ thereof), or a product or service (including another AI model) that contains any of them, you shall (A)
+ provide a copy of this Agreement with any such Llama Materials; and (B) prominently display “Built with
+ Llama” on a related website, user interface, blogpost, about page, or product documentation. If you use
+ the Llama Materials or any outputs or results of the Llama Materials to create, train, fine tune, or
+ otherwise improve an AI model, which is distributed or made available, you shall also include “Llama” at
+ the beginning of any such AI model name.
+
+ ii. If you receive Llama Materials, or any derivative works thereof, from a Licensee as part
+ of an integrated end user product, then Section 2 of this Agreement will not apply to you.
+
+ iii. You must retain in all copies of the Llama Materials that you distribute the following
+ attribution notice within a “Notice” text file distributed as a part of such copies: “Llama 3.1 is
+ licensed under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc. All Rights
+ Reserved.”
+
+ iv. Your use of the Llama Materials must comply with applicable laws and regulations
+ (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Llama
+ Materials (available at https://llama.meta.com/llama3_1/use-policy), which is hereby incorporated by
+ reference into this Agreement.
+
+ 2. Additional Commercial Terms. If, on the Llama 3.1 version release date, the monthly active users
+ of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700
+ million monthly active users in the preceding calendar month, you must request a license from Meta,
+ which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the
+ rights under this Agreement unless or until Meta otherwise expressly grants you such rights.
+
+ 3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY
+ OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF
+ ANY KIND, AND META DISCLAIMS ALL WARRANTIES OF ANY KIND, BOTH EXPRESS AND IMPLIED,
+ INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT,
+ MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR
+ DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND
+ ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND
+ RESULTS.
+
+ 4. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF
+ LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING
+ OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL,
+ INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED
+ OF THE POSSIBILITY OF ANY OF THE FOREGOING.
+
+ 5. Intellectual Property.
+
+ a. No trademark licenses are granted under this Agreement, and in connection with the Llama
+ Materials, neither Meta nor Licensee may use any name or mark owned by or associated with the other
+ or any of its affiliates, except as required for reasonable and customary use in describing and
+ redistributing the Llama Materials or as set forth in this Section 5(a). Meta hereby grants you a license to
+ use “Llama” (the “Mark”) solely as required to comply with the last sentence of Section 1.b.i. You will
+ comply with Meta’s brand guidelines (currently accessible at
+ https://about.meta.com/brand/resources/meta/company-brand/ ). All goodwill arising out of your use
+ of the Mark will inure to the benefit of Meta.
+
+ b. Subject to Meta’s ownership of Llama Materials and derivatives made by or for Meta, with
+ respect to any derivative works and modifications of the Llama Materials that are made by you, as
+ between you and Meta, you are and will be the owner of such derivative works and modifications.
+
+ c. If you institute litigation or other proceedings against Meta or any entity (including a
+ cross-claim or counterclaim in a lawsuit) alleging that the Llama Materials or Llama 3.1 outputs or
+ results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other
+ rights owned or licensable by you, then any licenses granted to you under this Agreement shall
+ terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold
+ harmless Meta from and against any claim by any third party arising out of or related to your use or
+ distribution of the Llama Materials.
+
+ 6. Term and Termination. The term of this Agreement will commence upon your acceptance of this
+ Agreement or access to the Llama Materials and will continue in full force and effect until terminated in
+ accordance with the terms and conditions herein. Meta may terminate this Agreement if you are in
+ breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete
+ and cease use of the Llama Materials. Sections 3, 4 and 7 shall survive the termination of this
+ Agreement.
+
+ 7. Governing Law and Jurisdiction. This Agreement will be governed and construed under the laws of
+ the State of California without regard to choice of law principles, and the UN Convention on Contracts
+ for the International Sale of Goods does not apply to this Agreement. The courts of California shall have
+ exclusive jurisdiction of any dispute arising out of this Agreement.
+
+
+ === Scope Clarification (Non‑operative summary) ===
+ - Section A (MIT) covers only the Project’s original contributions authored by Qianfan.
+ - Section B (Llama 3.1 Community License) governs any included Llama Materials and any derivatives thereof (e.g., fine‑tuned weights).
+ - In the event of any conflict, the applicable license for the relevant component controls (MIT for original contributions; Llama 3.1 for Llama Materials).
NOTICE ADDED
@@ -0,0 +1,3 @@
+ Llama 3.1 is licensed under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.
+
+ Built with Llama
config.json ADDED
@@ -0,0 +1,199 @@
+ {
+   "_commit_hash": null,
+   "architectures": [
+     "QianfanVLChatModel"
+   ],
+   "auto_map": {
+     "AutoConfig": "configuration_qianfanvl_chat.QianfanVLChatConfig",
+     "AutoModel": "modeling_qianfanvl_chat.QianfanVLChatModel",
+     "AutoModelForCausalLM": "modeling_qianfanvl_chat.QianfanVLChatModel"
+   },
+   "downsample_ratio": 0.5,
+   "dynamic_image_size": true,
+   "force_image_size": 448,
+   "hidden_size": 4096,
+   "llm_config": {
+     "_name_or_path": "",
+     "add_cross_attention": false,
+     "architectures": [
+       "LlamaForCausalLM"
+     ],
+     "attention_bias": false,
+     "attention_dropout": 0.0,
+     "attn_implementation": "eager",
+     "bad_words_ids": null,
+     "begin_suppress_tokens": null,
+     "bos_token_id": 181887,
+     "chunk_size_feed_forward": 0,
+     "cross_attention_hidden_size": null,
+     "decoder_start_token_id": null,
+     "diversity_penalty": 0.0,
+     "do_sample": false,
+     "early_stopping": false,
+     "encoder_no_repeat_ngram_size": 0,
+     "eos_token_id": 181888,
+     "exponential_decay_length_penalty": null,
+     "finetuning_task": null,
+     "forced_bos_token_id": null,
+     "forced_eos_token_id": null,
+     "hidden_act": "silu",
+     "hidden_size": 4096,
+     "id2label": {
+       "0": "LABEL_0",
+       "1": "LABEL_1"
+     },
+     "initializer_range": 0.02,
+     "intermediate_size": 14336,
+     "is_decoder": false,
+     "is_encoder_decoder": false,
+     "label2id": {
+       "LABEL_0": 0,
+       "LABEL_1": 1
+     },
+     "length_penalty": 1.0,
+     "max_length": 20,
+     "max_position_embeddings": 32768,
+     "min_length": 0,
+     "model_type": "llama",
+     "no_repeat_ngram_size": 0,
+     "num_attention_heads": 32,
+     "num_beam_groups": 1,
+     "num_beams": 1,
+     "num_hidden_layers": 32,
+     "num_key_value_heads": 8,
+     "num_return_sequences": 1,
+     "output_attentions": false,
+     "output_hidden_states": false,
+     "output_scores": false,
+     "pad_token_id": null,
+     "prefix": null,
+     "pretraining_tp": 1,
+     "problem_type": null,
+     "pruned_heads": {},
+     "remove_invalid_values": false,
+     "repetition_penalty": 1.0,
+     "return_dict": true,
+     "return_dict_in_generate": false,
+     "rms_norm_eps": 1e-05,
+     "rope_scaling": null,
+     "rope_theta": 5000000.0,
+     "sep_token_id": null,
+     "suppress_tokens": null,
+     "task_specific_params": null,
+     "temperature": 1.0,
+     "tf_legacy_loss": false,
+     "tie_encoder_decoder": false,
+     "tie_word_embeddings": false,
+     "tokenizer_class": null,
+     "top_k": 50,
+     "top_p": 1.0,
+     "torch_dtype": "bfloat16",
+     "torchscript": false,
+     "transformers_version": "4.37.2",
+     "typical_p": 1.0,
+     "use_bfloat16": false,
+     "use_cache": true,
+     "vocab_size": 182025
+   },
+   "max_dynamic_patch": 12,
+   "min_dynamic_patch": 1,
+   "model_type": "qianfanvl_chat",
+   "pad2square": false,
+   "ps_version": "v2",
+   "select_layer": -1,
+   "template": "qianfanvl",
+   "tie_word_embeddings": false,
+   "torch_dtype": "bfloat16",
+   "transformers_version": null,
+   "use_backbone_lora": 0,
+   "use_llm_lora": 0,
+   "use_thumbnail": true,
+   "vision_config": {
+     "_name_or_path": "",
+     "add_cross_attention": false,
+     "architectures": [
+       "InternVisionModel"
+     ],
+     "attention_dropout": 0.0,
+     "auto_map": {
+       "AutoConfig": "configuration_intern_vit.InternVisionConfig",
+       "AutoModel": "modeling_intern_vit.InternVisionModel"
+     },
+     "bad_words_ids": null,
+     "begin_suppress_tokens": null,
+     "bos_token_id": null,
+     "chunk_size_feed_forward": 0,
+     "cross_attention_hidden_size": null,
+     "decoder_start_token_id": null,
+     "diversity_penalty": 0.0,
+     "do_sample": false,
+     "drop_path_rate": 0.0,
+     "dropout": 0.0,
+     "early_stopping": false,
+     "encoder_no_repeat_ngram_size": 0,
+     "eos_token_id": null,
+     "exponential_decay_length_penalty": null,
+     "finetuning_task": null,
+     "forced_bos_token_id": null,
+     "forced_eos_token_id": null,
+     "hidden_act": "gelu",
+     "hidden_size": 1024,
+     "id2label": {
+       "0": "LABEL_0",
+       "1": "LABEL_1"
+     },
+     "image_size": 448,
+     "initializer_factor": 1.0,
+     "initializer_range": 0.02,
+     "intermediate_size": 4096,
+     "is_decoder": false,
+     "is_encoder_decoder": false,
+     "label2id": {
+       "LABEL_0": 0,
+       "LABEL_1": 1
+     },
+     "layer_norm_eps": 1e-06,
+     "length_penalty": 1.0,
+     "max_length": 20,
+     "min_length": 0,
+     "model_type": "intern_vit_6b",
+     "no_repeat_ngram_size": 0,
+     "norm_type": "layer_norm",
+     "num_attention_heads": 16,
+     "num_beam_groups": 1,
+     "num_beams": 1,
+     "num_channels": 3,
+     "num_hidden_layers": 24,
+     "num_return_sequences": 1,
+     "output_attentions": false,
+     "output_hidden_states": false,
+     "output_scores": false,
+     "pad_token_id": null,
+     "patch_size": 14,
+     "prefix": null,
+     "problem_type": null,
+     "pruned_heads": {},
+     "qk_normalization": false,
+     "qkv_bias": true,
+     "remove_invalid_values": false,
+     "repetition_penalty": 1.0,
+     "return_dict": true,
+     "return_dict_in_generate": false,
+     "sep_token_id": null,
+     "suppress_tokens": null,
+     "task_specific_params": null,
+     "temperature": 1.0,
+     "tf_legacy_loss": false,
+     "tie_encoder_decoder": false,
+     "tie_word_embeddings": true,
+     "tokenizer_class": null,
+     "top_k": 50,
+     "top_p": 1.0,
+     "torch_dtype": "bfloat16",
+     "torchscript": false,
+     "transformers_version": "4.37.2",
+     "typical_p": 1.0,
+     "use_bfloat16": false,
+     "use_flash_attn": false
+   }
+ }
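As a side note on the config above: `force_image_size`, `patch_size` (from `vision_config`), and `downsample_ratio` together determine how many visual tokens each image tile contributes. The sketch below assumes the InternVL-style pixel-shuffle formula (an assumption — this commit does not show the modeling code that consumes these values):

```python
# Hypothetical sketch: visual tokens per 448x448 tile from the config values above,
# assuming downsample_ratio is applied per spatial dimension via pixel shuffle.
force_image_size = 448
patch_size = 14
downsample_ratio = 0.5

patches_per_side = force_image_size // patch_size  # 448 / 14 = 32
num_image_tokens = int((patches_per_side ** 2) * (downsample_ratio ** 2))  # 1024 * 0.25
print(num_image_tokens)  # 256
```

With `max_dynamic_patch` set to 12 (plus an optional thumbnail), a single image can therefore expand to several thousand visual tokens under this assumed formula.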
configuration_intern_vit.py ADDED
@@ -0,0 +1,120 @@
+ # --------------------------------------------------------
+ # InternVL
+ # Copyright (c) 2024 OpenGVLab
+ # Licensed under The MIT License [see LICENSE for details]
+ # --------------------------------------------------------
+
+ import os
+ from typing import Union
+
+ from transformers.configuration_utils import PretrainedConfig
+ from transformers.utils import logging
+
+ logger = logging.get_logger(__name__)
+
+
+ class InternVisionConfig(PretrainedConfig):
+     r"""
+     This is the configuration class to store the configuration of a [`InternVisionModel`]. It is used to
+     instantiate a vision encoder according to the specified arguments, defining the model architecture.
+
+     Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
+     documentation from [`PretrainedConfig`] for more information.
+
+     Args:
+         num_channels (`int`, *optional*, defaults to 3):
+             Number of color channels in the input images (e.g., 3 for RGB).
+         patch_size (`int`, *optional*, defaults to 14):
+             The size (resolution) of each patch.
+         image_size (`int`, *optional*, defaults to 224):
+             The size (resolution) of each image.
+         qkv_bias (`bool`, *optional*, defaults to `False`):
+             Whether to add a bias to the queries and values in the self-attention layers.
+         hidden_size (`int`, *optional*, defaults to 3200):
+             Dimensionality of the encoder layers and the pooler layer.
+         num_attention_heads (`int`, *optional*, defaults to 25):
+             Number of attention heads for each attention layer in the Transformer encoder.
+         intermediate_size (`int`, *optional*, defaults to 12800):
+             Dimensionality of the "intermediate" (i.e., feed-forward) layer in the Transformer encoder.
+         qk_normalization (`bool`, *optional*, defaults to `True`):
+             Whether to normalize the queries and keys in the self-attention layers.
+         num_hidden_layers (`int`, *optional*, defaults to 48):
+             Number of hidden layers in the Transformer encoder.
+         use_flash_attn (`bool`, *optional*, defaults to `True`):
+             Whether to use the flash attention mechanism.
+         hidden_act (`str` or `function`, *optional*, defaults to `"gelu"`):
+             The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
+             `"relu"`, `"selu"` and `"gelu_new"` are supported.
+         layer_norm_eps (`float`, *optional*, defaults to 1e-6):
+             The epsilon used by the layer normalization layers.
+         dropout (`float`, *optional*, defaults to 0.0):
+             The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
+         drop_path_rate (`float`, *optional*, defaults to 0.0):
+             Dropout rate for stochastic depth.
+         attention_dropout (`float`, *optional*, defaults to 0.0):
+             The dropout ratio for the attention probabilities.
+         initializer_range (`float`, *optional*, defaults to 0.02):
+             The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
+         initializer_factor (`float`, *optional*, defaults to 0.1):
+             A factor for layer scale.
+     """
+
+     model_type = 'intern_vit_6b'
+
+     def __init__(
+             self,
+             num_channels=3,
+             patch_size=14,
+             image_size=224,
+             qkv_bias=False,
+             hidden_size=3200,
+             num_attention_heads=25,
+             intermediate_size=12800,
+             qk_normalization=True,
+             num_hidden_layers=48,
+             use_flash_attn=True,
+             hidden_act='gelu',
+             norm_type='rms_norm',
+             layer_norm_eps=1e-6,
+             dropout=0.0,
+             drop_path_rate=0.0,
+             attention_dropout=0.0,
+             initializer_range=0.02,
+             initializer_factor=0.1,
+             **kwargs,
+     ):
+         super().__init__(**kwargs)
+
+         self.hidden_size = hidden_size
+         self.intermediate_size = intermediate_size
+         self.dropout = dropout
+         self.drop_path_rate = drop_path_rate
+         self.num_hidden_layers = num_hidden_layers
+         self.num_attention_heads = num_attention_heads
+         self.num_channels = num_channels
+         self.patch_size = patch_size
+         self.image_size = image_size
+         self.initializer_range = initializer_range
+         self.initializer_factor = initializer_factor
+         self.attention_dropout = attention_dropout
+         self.layer_norm_eps = layer_norm_eps
+         self.hidden_act = hidden_act
+         self.norm_type = norm_type
+         self.qkv_bias = qkv_bias
+         self.qk_normalization = qk_normalization
+         self.use_flash_attn = use_flash_attn
+
+     @classmethod
+     def from_pretrained(cls, pretrained_model_name_or_path: Union[str, os.PathLike], **kwargs) -> 'PretrainedConfig':
+         config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
+
+         if 'vision_config' in config_dict:
+             config_dict = config_dict['vision_config']
+
+         if 'model_type' in config_dict and hasattr(cls, 'model_type') and config_dict['model_type'] != cls.model_type:
+             logger.warning(
+                 f"You are using a model of type {config_dict['model_type']} to instantiate a model of type "
+                 f'{cls.model_type}. This is not supported for all configurations of models and can yield errors.'
+             )
+
+         return cls.from_dict(config_dict, **kwargs)
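The interesting part of `from_pretrained` above is the nested-config extraction: when handed the full composite checkpoint config, only the `vision_config` sub-dict is kept before building the class. A dependency-free sketch of that branch (the sample dicts are illustrative):

```python
def extract_vision_config(config_dict: dict) -> dict:
    # Mirror of the branch in InternVisionConfig.from_pretrained: when handed a
    # full composite checkpoint config, keep only the "vision_config" sub-dict;
    # a plain vision config passes through unchanged.
    if 'vision_config' in config_dict:
        config_dict = config_dict['vision_config']
    return config_dict

composite = {
    'model_type': 'qianfanvl_chat',
    'vision_config': {'model_type': 'intern_vit_6b', 'patch_size': 14},
}
print(extract_vision_config(composite)['model_type'])  # intern_vit_6b
```

This is what lets the vision encoder be loaded directly from the repository root, where config.json describes the whole chat model rather than the encoder alone.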
configuration_qianfanvl_chat.py ADDED
@@ -0,0 +1,84 @@
+ # Copyright (c) 2025 Qianfan
+ # Licensed under the MIT License. See LICENSE file in the project root for full license information.
+ import copy
+
+ from transformers import AutoConfig, LlamaConfig
+ from transformers.configuration_utils import PretrainedConfig
+ from transformers.utils import logging
+
+ from .configuration_intern_vit import InternVisionConfig
+
+ logger = logging.get_logger(__name__)
+
+
+ class QianfanVLChatConfig(PretrainedConfig):
+     model_type = 'qianfanvl_chat'
+     is_composition = True
+
+     def __init__(
+             self,
+             vision_config=None,
+             llm_config=None,
+             use_backbone_lora=0,
+             use_llm_lora=0,
+             pad2square=False,
+             select_layer=-1,
+             force_image_size=None,
+             downsample_ratio=0.5,
+             template=None,
+             dynamic_image_size=False,
+             use_thumbnail=False,
+             ps_version='v1',
+             min_dynamic_patch=1,
+             max_dynamic_patch=6,
+             **kwargs):
+         super().__init__(**kwargs)
+         if vision_config is None:
+             vision_config = {'architectures': ['InternVisionModel']}
+         if llm_config is None:
+             llm_config = {'architectures': ['LlamaForCausalLM']}
+         self.vision_config = InternVisionConfig(**vision_config)
+         self.llm_config = LlamaConfig(**llm_config)
+         self.text_config = self.llm_config.get_text_config()
+         self.use_backbone_lora = use_backbone_lora
+         self.use_llm_lora = use_llm_lora
+         self.pad2square = pad2square
+         self.select_layer = select_layer
+         self.force_image_size = force_image_size
+         self.downsample_ratio = downsample_ratio
+         self.template = template
+         self.dynamic_image_size = dynamic_image_size
+         self.use_thumbnail = use_thumbnail
+         self.ps_version = ps_version  # pixel shuffle version
+         self.min_dynamic_patch = min_dynamic_patch
+         self.max_dynamic_patch = max_dynamic_patch
+
+         self.hidden_size = self.llm_config.hidden_size
+         self.tie_word_embeddings = False
+         self.llm_config.tie_word_embeddings = self.tie_word_embeddings
+
+     def to_dict(self):
+         """
+         Serializes this instance to a Python dictionary. Override the default [`~PretrainedConfig.to_dict`].
+
+         Returns:
+             `Dict[str, any]`: Dictionary of all the attributes that make up this configuration instance.
+         """
+         output = copy.deepcopy(self.__dict__)
+         output['vision_config'] = self.vision_config.to_dict()
+         output['llm_config'] = self.llm_config.to_dict()
+         output['text_config'] = self.text_config.to_dict()
+         output['model_type'] = self.__class__.model_type
+         output['use_backbone_lora'] = self.use_backbone_lora
+         output['use_llm_lora'] = self.use_llm_lora
+         output['select_layer'] = self.select_layer
+         output['force_image_size'] = self.force_image_size
+         output['downsample_ratio'] = self.downsample_ratio
+         output['template'] = self.template
+         output['dynamic_image_size'] = self.dynamic_image_size
+         output['use_thumbnail'] = self.use_thumbnail
+         output['ps_version'] = self.ps_version
+         output['min_dynamic_patch'] = self.min_dynamic_patch
+         output['max_dynamic_patch'] = self.max_dynamic_patch
+
+         return output
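The `to_dict` override in this file follows the usual composite-config pattern: copy the parent's attributes, then replace each sub-config object with its own dict form so the result is fully JSON-serializable. A minimal stand-in (plain classes, no transformers dependency; `SubConfig`/`CompositeConfig` are hypothetical names for illustration):

```python
import copy


class SubConfig:
    """Stand-in for a transformers sub-config (hypothetical, for illustration)."""

    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

    def to_dict(self):
        return dict(self.__dict__)


class CompositeConfig:
    """Stand-in for QianfanVLChatConfig: two sub-configs plus scalar fields."""

    def __init__(self):
        self.vision_config = SubConfig(model_type='intern_vit_6b')
        self.llm_config = SubConfig(model_type='llama')
        self.downsample_ratio = 0.5

    def to_dict(self):
        # Copy scalar attributes, then expand each sub-config into a plain dict,
        # mirroring the shape of QianfanVLChatConfig.to_dict above.
        output = copy.deepcopy(
            {k: v for k, v in self.__dict__.items() if not isinstance(v, SubConfig)}
        )
        output['vision_config'] = self.vision_config.to_dict()
        output['llm_config'] = self.llm_config.to_dict()
        return output


d = CompositeConfig().to_dict()
print(d['vision_config']['model_type'], d['downsample_ratio'])  # intern_vit_6b 0.5
```

Serializing sub-configs through their own `to_dict` (rather than deep-copying the objects) is what keeps the saved config.json round-trippable back through `__init__`, which rebuilds the sub-config objects from those dicts.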
conversation.py ADDED
@@ -0,0 +1,358 @@
+ """
+ Conversation prompt templates.
+
+ We kindly request that you import fastchat instead of copying this file if you wish to use it.
+ If you have changes in mind, please contribute back so the community can benefit collectively and continue to maintain these valuable templates.
+
+ Modified from https://github.com/lm-sys/FastChat/blob/main/fastchat/conversation.py
+ """
+
+ import dataclasses
+ from enum import IntEnum, auto
+ from typing import Any, Dict, List, Tuple, Union
+
+
+ class SeparatorStyle(IntEnum):
+     """Separator styles."""
+
+     ADD_COLON_SINGLE = auto()
+     ADD_COLON_TWO = auto()
+     ADD_COLON_SPACE_SINGLE = auto()
+     NO_COLON_SINGLE = auto()
+     NO_COLON_TWO = auto()
+     ADD_NEW_LINE_SINGLE = auto()
+     LLAMA2 = auto()
+     CHATGLM = auto()
+     CHATML = auto()
+     CHATINTERN = auto()
+     DOLLY = auto()
+     RWKV = auto()
+     PHOENIX = auto()
+     ROBIN = auto()
+     FALCON_CHAT = auto()
+     CHATGLM3 = auto()
+     INTERNVL_ZH = auto()
+     MPT = auto()
+
+
+ @dataclasses.dataclass
+ class Conversation:
+     """A class that manages prompt templates and keeps all conversation history."""
+
+     # The name of this template
+     name: str
+     # The template of the system prompt
+     system_template: str = '{system_message}'
+     # The system message
+     system_message: str = ''
+     # The names of two roles
+     roles: Tuple[str] = ('USER', 'ASSISTANT')
+     # All messages. Each item is (role, message).
+     messages: List[List[str]] = ()
+     # The number of few shot examples
+     offset: int = 0
+     # The separator style and configurations
+     sep_style: SeparatorStyle = SeparatorStyle.ADD_COLON_SINGLE
+     sep: str = '\n'
+     sep2: str = None
+     # Stop criteria (the default one is EOS token)
+     stop_str: Union[str, List[str]] = None
+     # Stops generation if meeting any token in this list
+     stop_token_ids: List[int] = None
+
+     def get_prompt(self) -> str:
+         """Get the prompt for generation."""
+         system_prompt = self.system_template.format(system_message=self.system_message)
+         if self.sep_style == SeparatorStyle.ADD_COLON_SINGLE:
+             ret = system_prompt + self.sep
+             for role, message in self.messages:
+                 if message:
+                     ret += role + ': ' + message + self.sep
+                 else:
+                     ret += role + ':'
+             return ret
+         elif self.sep_style == SeparatorStyle.ADD_COLON_TWO:
+             seps = [self.sep, self.sep2]
+             ret = system_prompt + seps[0]
+             for i, (role, message) in enumerate(self.messages):
+                 if message:
+                     ret += role + ': ' + message + seps[i % 2]
+                 else:
+                     ret += role + ':'
+             return ret
+         elif self.sep_style == SeparatorStyle.ADD_COLON_SPACE_SINGLE:
+             ret = system_prompt + self.sep
+             for role, message in self.messages:
+                 if message:
+                     ret += role + ': ' + message + self.sep
+                 else:
+                     ret += role + ': '  # must end with a space
+             return ret
+         elif self.sep_style == SeparatorStyle.ADD_NEW_LINE_SINGLE:
+             ret = '' if system_prompt == '' else system_prompt + self.sep
+             for role, message in self.messages:
+                 if message:
+                     ret += role + '\n' + message + self.sep
+                 else:
+                     ret += role + '\n'
+             return ret
+         elif self.sep_style == SeparatorStyle.NO_COLON_SINGLE:
+             ret = system_prompt
+             for role, message in self.messages:
+                 if message:
+                     ret += role + message + self.sep
+                 else:
+                     ret += role
+             return ret
+         elif self.sep_style == SeparatorStyle.NO_COLON_TWO:
+             seps = [self.sep, self.sep2]
+             ret = system_prompt
+             for i, (role, message) in enumerate(self.messages):
+                 if message:
+                     ret += role + message + seps[i % 2]
+                 else:
+                     ret += role
+             return ret
+         elif self.sep_style == SeparatorStyle.RWKV:
+             ret = system_prompt
+             for i, (role, message) in enumerate(self.messages):
+                 if message:
+                     ret += (
+                         role
+                         + ': '
+                         + message.replace('\r\n', '\n').replace('\n\n', '\n')
+                     )
+                     ret += '\n\n'
+                 else:
+                     ret += role + ':'
+             return ret
+         elif self.sep_style == SeparatorStyle.LLAMA2:
+             seps = [self.sep, self.sep2]
+             if self.system_message:
+                 ret = system_prompt
+             else:
+                 ret = '[INST] '
+             for i, (role, message) in enumerate(self.messages):
+                 tag = self.roles[i % 2]
+                 if message:
+                     if i == 0:
+                         ret += message + ' '
+                     else:
+                         ret += tag + ' ' + message + seps[i % 2]
+                 else:
+                     ret += tag
+             return ret
+         elif self.sep_style == SeparatorStyle.CHATGLM:
+             # source: https://huggingface.co/THUDM/chatglm-6b/blob/1d240ba371910e9282298d4592532d7f0f3e9f3e/modeling_chatglm.py#L1302-L1308
+             # source2: https://huggingface.co/THUDM/chatglm2-6b/blob/e186c891cf64310ac66ef10a87e6635fa6c2a579/modeling_chatglm.py#L926
+             round_add_n = 1 if self.name == 'chatglm2' else 0
+             if system_prompt:
+                 ret = system_prompt + self.sep
+             else:
+                 ret = ''
+
+             for i, (role, message) in enumerate(self.messages):
+                 if i % 2 == 0:
+                     ret += f'[Round {i//2 + round_add_n}]{self.sep}'
+
+                 if message:
+                     ret += f'{role}:{message}{self.sep}'
+                 else:
+                     ret += f'{role}:'
+             return ret
+         elif self.sep_style == SeparatorStyle.CHATML:
+             ret = '' if system_prompt == '' else system_prompt + self.sep + '\n'
+             for role, message in self.messages:
+                 if message:
+                     ret += role + '\n' + message + self.sep + '\n'
+                 else:
+                     ret += role + '\n'
+             return ret
+         elif self.sep_style == SeparatorStyle.CHATGLM3:
+             ret = ''
+             if self.system_message:
+                 ret += system_prompt
+             for role, message in self.messages:
+                 if message:
+                     ret += role + '\n' + ' ' + message
+                 else:
+                     ret += role
+             return ret
+         elif self.sep_style == SeparatorStyle.CHATINTERN:
+             # source: https://huggingface.co/internlm/internlm-chat-7b-8k/blob/bd546fa984b4b0b86958f56bf37f94aa75ab8831/modeling_internlm.py#L771
+             seps = [self.sep, self.sep2]
+             ret = system_prompt
+             for i, (role, message) in enumerate(self.messages):
+                 # if i % 2 == 0:
+                 #     ret += "<s>"
+                 if message:
+                     ret += role + ':' + message + seps[i % 2] + '\n'
+                 else:
+                     ret += role + ':'
+             return ret
+         elif self.sep_style == SeparatorStyle.DOLLY:
+             seps = [self.sep, self.sep2]
+             ret = system_prompt
+             for i, (role, message) in enumerate(self.messages):
+                 if message:
+                     ret += role + ':\n' + message + seps[i % 2]
+                     if i % 2 == 1:
+                         ret += '\n\n'
+                 else:
+                     ret += role + ':\n'
+             return ret
+         elif self.sep_style == SeparatorStyle.PHOENIX:
+             ret = system_prompt
+             for role, message in self.messages:
+                 if message:
+                     ret += role + ': ' + '<s>' + message + '</s>'
+                 else:
+                     ret += role + ': ' + '<s>'
211
+ return ret
212
+ elif self.sep_style == SeparatorStyle.ROBIN:
213
+ ret = system_prompt + self.sep
214
+ for role, message in self.messages:
215
+ if message:
216
+ ret += role + ':\n' + message + self.sep
217
+ else:
218
+ ret += role + ':\n'
219
+ return ret
220
+ elif self.sep_style == SeparatorStyle.FALCON_CHAT:
221
+ ret = ''
222
+ if self.system_message:
223
+ ret += system_prompt + self.sep
224
+ for role, message in self.messages:
225
+ if message:
226
+ ret += role + ': ' + message + self.sep
227
+ else:
228
+ ret += role + ':'
229
+
230
+ return ret
231
+ elif self.sep_style == SeparatorStyle.INTERNVL_ZH:
232
+ seps = [self.sep2, self.sep]
233
+ ret = self.system_message + seps[0]
234
+ for i, (role, message) in enumerate(self.messages):
235
+ if message:
236
+ ret += role + ': ' + message + seps[i % 2]
237
+ else:
238
+ ret += role + ':'
239
+ return ret
240
+ elif self.sep_style == SeparatorStyle.MPT:
241
+ ret = system_prompt + self.sep
242
+ for role, message in self.messages:
243
+ if message:
244
+ if type(message) is tuple:
245
+ message, _, _ = message
246
+ ret += role + message + self.sep
247
+ else:
248
+ ret += role
249
+ return ret
250
+ else:
251
+ raise ValueError(f'Invalid style: {self.sep_style}')
252
+
253
+ def set_system_message(self, system_message: str):
254
+ """Set the system message."""
255
+ self.system_message = system_message
256
+
257
+ def append_message(self, role: str, message: str):
258
+ """Append a new message."""
259
+ self.messages.append([role, message])
260
+
261
+ def update_last_message(self, message: str):
262
+ """Update the last output.
263
+
264
+ The last message is typically set to be None when constructing the prompt,
265
+ so we need to update it in-place after getting the response from a model.
266
+ """
267
+ self.messages[-1][1] = message
268
+
269
+ def to_gradio_chatbot(self):
270
+ """Convert the conversation to gradio chatbot format."""
271
+ ret = []
272
+ for i, (role, msg) in enumerate(self.messages[self.offset :]):
273
+ if i % 2 == 0:
274
+ ret.append([msg, None])
275
+ else:
276
+ ret[-1][-1] = msg
277
+ return ret
278
+
279
+ def to_openai_api_messages(self):
280
+ """Convert the conversation to OpenAI chat completion format."""
281
+ ret = [{'role': 'system', 'content': self.system_message}]
282
+
283
+ for i, (_, msg) in enumerate(self.messages[self.offset :]):
284
+ if i % 2 == 0:
285
+ ret.append({'role': 'user', 'content': msg})
286
+ else:
287
+ if msg is not None:
288
+ ret.append({'role': 'assistant', 'content': msg})
289
+ return ret
290
+
291
+ def copy(self):
292
+ return Conversation(
293
+ name=self.name,
294
+ system_template=self.system_template,
295
+ system_message=self.system_message,
296
+ roles=self.roles,
297
+ messages=[[x, y] for x, y in self.messages],
298
+ offset=self.offset,
299
+ sep_style=self.sep_style,
300
+ sep=self.sep,
301
+ sep2=self.sep2,
302
+ stop_str=self.stop_str,
303
+ stop_token_ids=self.stop_token_ids,
304
+ )
305
+
306
+ def dict(self):
307
+ return {
308
+ 'template_name': self.name,
309
+ 'system_message': self.system_message,
310
+ 'roles': self.roles,
311
+ 'messages': self.messages,
312
+ 'offset': self.offset,
313
+ }
314
+
315
+
316
+ # A global registry for all conversation templates
317
+ conv_templates: Dict[str, Conversation] = {}
318
+
319
+
320
+ def register_conv_template(template: Conversation, override: bool = False):
321
+ """Register a new conversation template."""
322
+ if not override:
323
+ assert (
324
+ template.name not in conv_templates
325
+ ), f'{template.name} has been registered.'
326
+
327
+ conv_templates[template.name] = template
328
+
329
+
330
+ def get_conv_template(name: str) -> Conversation:
331
+ """Get a conversation template."""
332
+ #print("conv_templates", conv_templates)
333
+ return conv_templates[name].copy()
334
+
335
+ register_conv_template(
336
+ Conversation(
337
+ name='qianfanvl',
338
+ system_template='<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{system_message}',
339
+ system_message='你是Qianfan-VL,由百度智能云千帆团队研发的多模态大模型。',
340
+ roles=(
341
+ '<|start_header_id|>user<|end_header_id|>\n\n',
342
+ '<|start_header_id|>assistant<|end_header_id|>\n\n',
343
+ ),
344
+ sep_style=SeparatorStyle.MPT,
345
+ sep='<|eot_id|>',
346
+ )
347
+ )
348
+
349
+ register_conv_template(
350
+ Conversation(
351
+ name='qwen',
352
+ system_template='<|im_start|>system<|im_end|>\n\n{system_message}',
353
+ system_message='你是Qianfan-VL,由百度智能云千帆团队研发的多模态大模型。',
354
+ roles=('<|im_start|>user\n', '<|im_start|>assistant\n'),
355
+ sep_style=SeparatorStyle.MPT,
356
+ sep='<|im_end|>',
357
+ )
358
+ )
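
For readers who want to see what the MPT-style template above actually produces, here is a minimal standalone sketch of the `SeparatorStyle.MPT` assembly for the `qianfanvl` template. `build_mpt_prompt` is a simplified re-implementation written for illustration, not the repo's `Conversation` class:

```python
# Standalone sketch of the SeparatorStyle.MPT branch shown in the diff:
# each turn is role + message + sep; a trailing role with message=None
# leaves an open assistant header for the model to continue from.

def build_mpt_prompt(system_template, system_message, sep, messages):
    ret = system_template.format(system_message=system_message) + sep
    for role, message in messages:
        if message:
            ret += role + message + sep
        else:
            ret += role  # open header, no message yet
    return ret

USER = '<|start_header_id|>user<|end_header_id|>\n\n'
ASSISTANT = '<|start_header_id|>assistant<|end_header_id|>\n\n'

prompt = build_mpt_prompt(
    system_template='<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{system_message}',
    system_message='你是Qianfan-VL,由百度智能云千帆团队研发的多模态大模型。',
    sep='<|eot_id|>',
    messages=[(USER, 'Describe the image.'), (ASSISTANT, None)],
)
print(prompt)
```

The final assistant header is deliberately left unterminated, matching how `get_prompt` is used before generation.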
example/scene_ocr.png ADDED

Git LFS Details

  • SHA256: c4d66dfbe18b3367295a6f19dee42981065cb4d39f4e5bd790570e9c4809fb4f
  • Pointer size: 132 Bytes
  • Size of remote file: 1.17 MB
generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "_from_model_config": true,
+   "transformers_version": "4.37.2",
+   "repetition_penalty": 1.05,
+   "temperature": 0.000001
+ }
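
The `temperature` of `0.000001` in this config makes sampling effectively greedy: dividing logits by a near-zero temperature concentrates essentially all softmax mass on the arg-max token. A quick self-contained illustration (not code from this repo):

```python
import math

def softmax_with_temperature(logits, temperature):
    # Scale logits by 1/T, then apply a numerically stable softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
p_default = softmax_with_temperature(logits, 1.0)    # spread-out distribution
p_config = softmax_with_temperature(logits, 1e-6)    # collapses onto arg-max
```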
model-00001-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c316574b419d679d54fa2bf2138c4c6ada9b2633214716143b804272b389787c
+ size 4980960824
model-00002-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4f9188a265ff77cdbe2b26ab0d3dc05642a32d2707be4259012e546b9d9774fa
+ size 4915920528
model-00003-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a8b95f8096a848313af81211a24549d69a71c8040820b66959b96baa545d96c8
+ size 4947407000
model-00004-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6347b44abed47fe0994f0f26c74ee04598101238fe21b4904388106dfdfcbfb7
+ size 2772438696
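
The three-line pointer files above follow the Git LFS pointer format (`version`, `oid sha256:<hex>`, `size`). A sketch of how a downloaded payload could be checked against such a pointer; `verify_lfs_payload` is a hypothetical helper, not part of this repo:

```python
import hashlib

def verify_lfs_payload(payload: bytes, pointer_text: str) -> bool:
    """Check a payload's size and sha256 against a Git LFS pointer file."""
    fields = dict(
        line.split(' ', 1) for line in pointer_text.strip().splitlines()
    )
    oid = fields['oid'].split(':', 1)[1]  # "sha256:<hex>" -> "<hex>"
    size = int(fields['size'])
    return len(payload) == size and hashlib.sha256(payload).hexdigest() == oid

# Build a matching pointer for a toy payload to demonstrate the check.
data = b'hello safetensors'
pointer = (
    'version https://git-lfs.github.com/spec/v1\n'
    f'oid sha256:{hashlib.sha256(data).hexdigest()}\n'
    f'size {len(data)}\n'
)
ok = verify_lfs_payload(data, pointer)
```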
model.safetensors.index.json ADDED
@@ -0,0 +1,676 @@
+ {
+   "metadata": {
+     "total_size": 17616644096
+   },
+   "weight_map": {
+     "language_model.lm_head.weight": "model-00003-of-00004.safetensors",
+     "language_model.model.embed_tokens.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.0.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.0.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.0.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.0.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.0.self_attn.rotary_emb.inv_freq": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.0.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.1.input_layernorm.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.1.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.1.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.1.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.1.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.1.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.1.self_attn.rotary_emb.inv_freq": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.1.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.10.input_layernorm.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.10.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.10.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.10.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.10.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.10.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.10.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.10.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.10.self_attn.rotary_emb.inv_freq": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.10.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.11.input_layernorm.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.11.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.11.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.11.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.11.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.11.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.11.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.11.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.11.self_attn.rotary_emb.inv_freq": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.11.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.12.input_layernorm.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.12.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.12.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.12.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.12.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.12.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.12.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.12.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.12.self_attn.rotary_emb.inv_freq": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.12.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.13.input_layernorm.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.13.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.13.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.13.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.13.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.13.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.13.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.13.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.13.self_attn.rotary_emb.inv_freq": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.13.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.14.input_layernorm.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.14.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.14.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.14.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.14.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.14.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.14.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.14.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.14.self_attn.rotary_emb.inv_freq": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.14.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.15.input_layernorm.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.15.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.15.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.15.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.15.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.15.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.15.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.15.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.15.self_attn.rotary_emb.inv_freq": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.15.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.16.input_layernorm.weight": "model-00001-of-00004.safetensors",
+     "language_model.model.layers.16.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.16.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.16.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.16.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.16.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.16.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.16.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.16.self_attn.rotary_emb.inv_freq": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.16.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.17.input_layernorm.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.17.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.17.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.17.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.17.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.17.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.17.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.17.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.17.self_attn.rotary_emb.inv_freq": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.17.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.18.input_layernorm.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.18.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.18.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.18.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.18.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.18.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.18.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.18.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.18.self_attn.rotary_emb.inv_freq": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.18.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.19.input_layernorm.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.19.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.19.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.19.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.19.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.19.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.19.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.19.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.19.self_attn.rotary_emb.inv_freq": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.19.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.2.input_layernorm.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.2.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.2.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.2.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.2.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.2.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.2.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.2.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.2.self_attn.rotary_emb.inv_freq": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.2.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.20.input_layernorm.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.20.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.20.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.20.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.20.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.20.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.20.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.20.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.20.self_attn.rotary_emb.inv_freq": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.20.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.21.input_layernorm.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.21.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.21.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.21.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.21.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.21.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.21.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.21.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.21.self_attn.rotary_emb.inv_freq": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.21.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.22.input_layernorm.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.22.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.22.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.22.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.22.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.22.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.22.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.22.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.22.self_attn.rotary_emb.inv_freq": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.22.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.23.input_layernorm.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.23.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.23.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.23.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.23.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.23.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.23.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.23.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.23.self_attn.rotary_emb.inv_freq": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.23.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.24.input_layernorm.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.24.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.24.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.24.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.24.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.24.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.24.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.24.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.24.self_attn.rotary_emb.inv_freq": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.24.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.25.input_layernorm.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.25.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.25.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.25.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.25.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.25.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.25.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.25.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.25.self_attn.rotary_emb.inv_freq": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.25.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.26.input_layernorm.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.26.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+     "language_model.model.layers.26.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+     "language_model.model.layers.26.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+     "language_model.model.layers.26.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+     "language_model.model.layers.26.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+     "language_model.model.layers.26.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+     "language_model.model.layers.26.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+     "language_model.model.layers.26.self_attn.rotary_emb.inv_freq": "model-00003-of-00004.safetensors",
+     "language_model.model.layers.26.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+     "language_model.model.layers.27.input_layernorm.weight": "model-00003-of-00004.safetensors",
+     "language_model.model.layers.27.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
+     "language_model.model.layers.27.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
+     "language_model.model.layers.27.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
+     "language_model.model.layers.27.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+     "language_model.model.layers.27.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+     "language_model.model.layers.27.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+     "language_model.model.layers.27.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+     "language_model.model.layers.27.self_attn.rotary_emb.inv_freq": "model-00003-of-00004.safetensors",
+     "language_model.model.layers.27.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+     "language_model.model.layers.28.input_layernorm.weight": "model-00004-of-00004.safetensors",
+     "language_model.model.layers.28.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
+     "language_model.model.layers.28.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
+     "language_model.model.layers.28.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
+     "language_model.model.layers.28.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
+     "language_model.model.layers.28.self_attn.k_proj.weight": "model-00004-of-00004.safetensors",
+     "language_model.model.layers.28.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
+     "language_model.model.layers.28.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
+     "language_model.model.layers.28.self_attn.rotary_emb.inv_freq": "model-00004-of-00004.safetensors",
+     "language_model.model.layers.28.self_attn.v_proj.weight": "model-00004-of-00004.safetensors",
+     "language_model.model.layers.29.input_layernorm.weight": "model-00004-of-00004.safetensors",
+     "language_model.model.layers.29.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
+     "language_model.model.layers.29.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
+     "language_model.model.layers.29.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
+     "language_model.model.layers.29.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
+     "language_model.model.layers.29.self_attn.k_proj.weight": "model-00004-of-00004.safetensors",
+     "language_model.model.layers.29.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
+     "language_model.model.layers.29.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
+     "language_model.model.layers.29.self_attn.rotary_emb.inv_freq": "model-00004-of-00004.safetensors",
+     "language_model.model.layers.29.self_attn.v_proj.weight": "model-00004-of-00004.safetensors",
+     "language_model.model.layers.3.input_layernorm.weight": "model-00003-of-00004.safetensors",
+     "language_model.model.layers.3.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+     "language_model.model.layers.3.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+     "language_model.model.layers.3.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+     "language_model.model.layers.3.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+     "language_model.model.layers.3.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+     "language_model.model.layers.3.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+     "language_model.model.layers.3.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+     "language_model.model.layers.3.self_attn.rotary_emb.inv_freq": "model-00003-of-00004.safetensors",
+     "language_model.model.layers.3.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+     "language_model.model.layers.30.input_layernorm.weight": "model-00004-of-00004.safetensors",
+     "language_model.model.layers.30.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
+     "language_model.model.layers.30.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
+     "language_model.model.layers.30.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
+     "language_model.model.layers.30.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
+     "language_model.model.layers.30.self_attn.k_proj.weight": "model-00004-of-00004.safetensors",
+     "language_model.model.layers.30.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
255
+ "language_model.model.layers.30.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
256
+ "language_model.model.layers.30.self_attn.rotary_emb.inv_freq": "model-00004-of-00004.safetensors",
257
+ "language_model.model.layers.30.self_attn.v_proj.weight": "model-00004-of-00004.safetensors",
258
+ "language_model.model.layers.31.input_layernorm.weight": "model-00004-of-00004.safetensors",
259
+ "language_model.model.layers.31.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
260
+ "language_model.model.layers.31.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
261
+ "language_model.model.layers.31.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
262
+ "language_model.model.layers.31.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
263
+ "language_model.model.layers.31.self_attn.k_proj.weight": "model-00004-of-00004.safetensors",
264
+ "language_model.model.layers.31.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
265
+ "language_model.model.layers.31.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
266
+ "language_model.model.layers.31.self_attn.rotary_emb.inv_freq": "model-00004-of-00004.safetensors",
267
+ "language_model.model.layers.31.self_attn.v_proj.weight": "model-00004-of-00004.safetensors",
268
+ "language_model.model.layers.4.input_layernorm.weight": "model-00003-of-00004.safetensors",
269
+ "language_model.model.layers.4.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
270
+ "language_model.model.layers.4.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
271
+ "language_model.model.layers.4.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
272
+ "language_model.model.layers.4.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
273
+ "language_model.model.layers.4.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
274
+ "language_model.model.layers.4.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
275
+ "language_model.model.layers.4.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
276
+ "language_model.model.layers.4.self_attn.rotary_emb.inv_freq": "model-00003-of-00004.safetensors",
277
+ "language_model.model.layers.4.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
278
+ "language_model.model.layers.5.input_layernorm.weight": "model-00003-of-00004.safetensors",
279
+ "language_model.model.layers.5.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
280
+ "language_model.model.layers.5.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
281
+ "language_model.model.layers.5.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
282
+ "language_model.model.layers.5.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
283
+ "language_model.model.layers.5.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
284
+ "language_model.model.layers.5.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
285
+ "language_model.model.layers.5.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
286
+ "language_model.model.layers.5.self_attn.rotary_emb.inv_freq": "model-00003-of-00004.safetensors",
287
+ "language_model.model.layers.5.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
288
+ "language_model.model.layers.6.input_layernorm.weight": "model-00003-of-00004.safetensors",
289
+ "language_model.model.layers.6.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
290
+ "language_model.model.layers.6.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
291
+ "language_model.model.layers.6.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
292
+ "language_model.model.layers.6.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
293
+ "language_model.model.layers.6.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
294
+ "language_model.model.layers.6.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
295
+ "language_model.model.layers.6.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
296
+ "language_model.model.layers.6.self_attn.rotary_emb.inv_freq": "model-00003-of-00004.safetensors",
297
+ "language_model.model.layers.6.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
298
+ "language_model.model.layers.7.input_layernorm.weight": "model-00003-of-00004.safetensors",
299
+ "language_model.model.layers.7.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
300
+ "language_model.model.layers.7.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
301
+ "language_model.model.layers.7.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
302
+ "language_model.model.layers.7.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
303
+ "language_model.model.layers.7.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
304
+ "language_model.model.layers.7.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
305
+ "language_model.model.layers.7.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
306
+ "language_model.model.layers.7.self_attn.rotary_emb.inv_freq": "model-00003-of-00004.safetensors",
307
+ "language_model.model.layers.7.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
308
+ "language_model.model.layers.8.input_layernorm.weight": "model-00003-of-00004.safetensors",
309
+ "language_model.model.layers.8.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
310
+ "language_model.model.layers.8.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
311
+ "language_model.model.layers.8.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
312
+ "language_model.model.layers.8.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
313
+ "language_model.model.layers.8.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
314
+ "language_model.model.layers.8.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
315
+ "language_model.model.layers.8.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
316
+ "language_model.model.layers.8.self_attn.rotary_emb.inv_freq": "model-00003-of-00004.safetensors",
317
+ "language_model.model.layers.8.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
318
+ "language_model.model.layers.9.input_layernorm.weight": "model-00003-of-00004.safetensors",
319
+ "language_model.model.layers.9.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
320
+ "language_model.model.layers.9.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
321
+ "language_model.model.layers.9.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
322
+ "language_model.model.layers.9.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
323
+ "language_model.model.layers.9.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
324
+ "language_model.model.layers.9.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
325
+ "language_model.model.layers.9.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
326
+ "language_model.model.layers.9.self_attn.rotary_emb.inv_freq": "model-00003-of-00004.safetensors",
327
+ "language_model.model.layers.9.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
328
+ "language_model.model.norm.weight": "model-00004-of-00004.safetensors",
+ "mlp1.0.bias": "model-00004-of-00004.safetensors",
+ "mlp1.0.weight": "model-00004-of-00004.safetensors",
+ "mlp1.1.bias": "model-00004-of-00004.safetensors",
+ "mlp1.1.weight": "model-00004-of-00004.safetensors",
+ "mlp1.3.bias": "model-00004-of-00004.safetensors",
+ "mlp1.3.weight": "model-00004-of-00004.safetensors",
+ "vision_model.embeddings.class_embedding": "model-00004-of-00004.safetensors",
+ "vision_model.embeddings.patch_embedding.bias": "model-00004-of-00004.safetensors",
+ "vision_model.embeddings.patch_embedding.weight": "model-00004-of-00004.safetensors",
+ "vision_model.embeddings.position_embedding": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.0.attn.proj.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.0.attn.proj.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.0.attn.qkv.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.0.attn.qkv.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.0.ls1": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.0.ls2": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.0.mlp.fc1.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.0.mlp.fc1.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.0.mlp.fc2.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.0.mlp.fc2.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.0.norm1.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.0.norm1.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.0.norm2.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.0.norm2.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.1.attn.proj.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.1.attn.proj.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.1.attn.qkv.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.1.attn.qkv.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.1.ls1": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.1.ls2": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.1.mlp.fc1.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.1.mlp.fc1.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.1.mlp.fc2.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.1.mlp.fc2.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.1.norm1.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.1.norm1.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.1.norm2.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.1.norm2.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.10.attn.proj.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.10.attn.proj.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.10.attn.qkv.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.10.attn.qkv.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.10.ls1": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.10.ls2": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.10.mlp.fc1.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.10.mlp.fc1.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.10.mlp.fc2.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.10.mlp.fc2.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.10.norm1.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.10.norm1.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.10.norm2.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.10.norm2.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.11.attn.proj.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.11.attn.proj.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.11.attn.qkv.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.11.attn.qkv.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.11.ls1": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.11.ls2": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.11.mlp.fc1.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.11.mlp.fc1.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.11.mlp.fc2.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.11.mlp.fc2.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.11.norm1.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.11.norm1.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.11.norm2.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.11.norm2.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.12.attn.proj.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.12.attn.proj.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.12.attn.qkv.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.12.attn.qkv.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.12.ls1": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.12.ls2": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.12.mlp.fc1.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.12.mlp.fc1.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.12.mlp.fc2.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.12.mlp.fc2.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.12.norm1.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.12.norm1.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.12.norm2.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.12.norm2.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.13.attn.proj.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.13.attn.proj.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.13.attn.qkv.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.13.attn.qkv.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.13.ls1": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.13.ls2": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.13.mlp.fc1.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.13.mlp.fc1.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.13.mlp.fc2.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.13.mlp.fc2.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.13.norm1.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.13.norm1.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.13.norm2.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.13.norm2.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.14.attn.proj.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.14.attn.proj.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.14.attn.qkv.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.14.attn.qkv.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.14.ls1": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.14.ls2": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.14.mlp.fc1.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.14.mlp.fc1.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.14.mlp.fc2.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.14.mlp.fc2.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.14.norm1.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.14.norm1.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.14.norm2.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.14.norm2.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.15.attn.proj.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.15.attn.proj.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.15.attn.qkv.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.15.attn.qkv.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.15.ls1": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.15.ls2": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.15.mlp.fc1.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.15.mlp.fc1.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.15.mlp.fc2.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.15.mlp.fc2.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.15.norm1.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.15.norm1.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.15.norm2.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.15.norm2.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.16.attn.proj.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.16.attn.proj.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.16.attn.qkv.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.16.attn.qkv.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.16.ls1": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.16.ls2": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.16.mlp.fc1.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.16.mlp.fc1.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.16.mlp.fc2.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.16.mlp.fc2.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.16.norm1.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.16.norm1.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.16.norm2.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.16.norm2.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.17.attn.proj.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.17.attn.proj.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.17.attn.qkv.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.17.attn.qkv.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.17.ls1": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.17.ls2": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.17.mlp.fc1.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.17.mlp.fc1.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.17.mlp.fc2.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.17.mlp.fc2.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.17.norm1.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.17.norm1.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.17.norm2.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.17.norm2.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.18.attn.proj.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.18.attn.proj.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.18.attn.qkv.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.18.attn.qkv.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.18.ls1": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.18.ls2": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.18.mlp.fc1.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.18.mlp.fc1.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.18.mlp.fc2.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.18.mlp.fc2.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.18.norm1.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.18.norm1.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.18.norm2.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.18.norm2.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.19.attn.proj.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.19.attn.proj.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.19.attn.qkv.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.19.attn.qkv.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.19.ls1": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.19.ls2": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.19.mlp.fc1.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.19.mlp.fc1.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.19.mlp.fc2.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.19.mlp.fc2.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.19.norm1.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.19.norm1.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.19.norm2.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.19.norm2.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.2.attn.proj.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.2.attn.proj.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.2.attn.qkv.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.2.attn.qkv.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.2.ls1": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.2.ls2": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.2.mlp.fc1.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.2.mlp.fc1.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.2.mlp.fc2.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.2.mlp.fc2.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.2.norm1.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.2.norm1.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.2.norm2.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.2.norm2.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.20.attn.proj.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.20.attn.proj.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.20.attn.qkv.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.20.attn.qkv.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.20.ls1": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.20.ls2": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.20.mlp.fc1.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.20.mlp.fc1.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.20.mlp.fc2.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.20.mlp.fc2.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.20.norm1.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.20.norm1.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.20.norm2.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.20.norm2.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.21.attn.proj.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.21.attn.proj.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.21.attn.qkv.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.21.attn.qkv.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.21.ls1": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.21.ls2": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.21.mlp.fc1.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.21.mlp.fc1.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.21.mlp.fc2.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.21.mlp.fc2.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.21.norm1.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.21.norm1.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.21.norm2.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.21.norm2.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.22.attn.proj.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.22.attn.proj.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.22.attn.qkv.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.22.attn.qkv.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.22.ls1": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.22.ls2": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.22.mlp.fc1.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.22.mlp.fc1.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.22.mlp.fc2.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.22.mlp.fc2.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.22.norm1.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.22.norm1.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.22.norm2.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.22.norm2.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.23.attn.proj.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.23.attn.proj.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.23.attn.qkv.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.23.attn.qkv.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.23.ls1": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.23.ls2": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.23.mlp.fc1.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.23.mlp.fc1.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.23.mlp.fc2.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.23.mlp.fc2.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.23.norm1.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.23.norm1.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.23.norm2.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.23.norm2.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.3.attn.proj.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.3.attn.proj.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.3.attn.qkv.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.3.attn.qkv.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.3.ls1": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.3.ls2": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.3.mlp.fc1.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.3.mlp.fc1.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.3.mlp.fc2.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.3.mlp.fc2.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.3.norm1.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.3.norm1.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.3.norm2.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.3.norm2.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.4.attn.proj.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.4.attn.proj.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.4.attn.qkv.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.4.attn.qkv.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.4.ls1": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.4.ls2": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.4.mlp.fc1.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.4.mlp.fc1.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.4.mlp.fc2.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.4.mlp.fc2.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.4.norm1.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.4.norm1.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.4.norm2.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.4.norm2.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.5.attn.proj.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.5.attn.proj.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.5.attn.qkv.bias": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.5.attn.qkv.weight": "model-00004-of-00004.safetensors",
+ "vision_model.encoder.layers.5.ls1": "model-00004-of-00004.safetensors",
610
+ "vision_model.encoder.layers.5.ls2": "model-00004-of-00004.safetensors",
611
+ "vision_model.encoder.layers.5.mlp.fc1.bias": "model-00004-of-00004.safetensors",
612
+ "vision_model.encoder.layers.5.mlp.fc1.weight": "model-00004-of-00004.safetensors",
613
+ "vision_model.encoder.layers.5.mlp.fc2.bias": "model-00004-of-00004.safetensors",
614
+ "vision_model.encoder.layers.5.mlp.fc2.weight": "model-00004-of-00004.safetensors",
615
+ "vision_model.encoder.layers.5.norm1.bias": "model-00004-of-00004.safetensors",
616
+ "vision_model.encoder.layers.5.norm1.weight": "model-00004-of-00004.safetensors",
617
+ "vision_model.encoder.layers.5.norm2.bias": "model-00004-of-00004.safetensors",
618
+ "vision_model.encoder.layers.5.norm2.weight": "model-00004-of-00004.safetensors",
619
+ "vision_model.encoder.layers.6.attn.proj.bias": "model-00004-of-00004.safetensors",
620
+ "vision_model.encoder.layers.6.attn.proj.weight": "model-00004-of-00004.safetensors",
621
+ "vision_model.encoder.layers.6.attn.qkv.bias": "model-00004-of-00004.safetensors",
622
+ "vision_model.encoder.layers.6.attn.qkv.weight": "model-00004-of-00004.safetensors",
623
+ "vision_model.encoder.layers.6.ls1": "model-00004-of-00004.safetensors",
624
+ "vision_model.encoder.layers.6.ls2": "model-00004-of-00004.safetensors",
625
+ "vision_model.encoder.layers.6.mlp.fc1.bias": "model-00004-of-00004.safetensors",
626
+ "vision_model.encoder.layers.6.mlp.fc1.weight": "model-00004-of-00004.safetensors",
627
+ "vision_model.encoder.layers.6.mlp.fc2.bias": "model-00004-of-00004.safetensors",
628
+ "vision_model.encoder.layers.6.mlp.fc2.weight": "model-00004-of-00004.safetensors",
629
+ "vision_model.encoder.layers.6.norm1.bias": "model-00004-of-00004.safetensors",
630
+ "vision_model.encoder.layers.6.norm1.weight": "model-00004-of-00004.safetensors",
631
+ "vision_model.encoder.layers.6.norm2.bias": "model-00004-of-00004.safetensors",
632
+ "vision_model.encoder.layers.6.norm2.weight": "model-00004-of-00004.safetensors",
633
+ "vision_model.encoder.layers.7.attn.proj.bias": "model-00004-of-00004.safetensors",
634
+ "vision_model.encoder.layers.7.attn.proj.weight": "model-00004-of-00004.safetensors",
635
+ "vision_model.encoder.layers.7.attn.qkv.bias": "model-00004-of-00004.safetensors",
636
+ "vision_model.encoder.layers.7.attn.qkv.weight": "model-00004-of-00004.safetensors",
637
+ "vision_model.encoder.layers.7.ls1": "model-00004-of-00004.safetensors",
638
+ "vision_model.encoder.layers.7.ls2": "model-00004-of-00004.safetensors",
639
+ "vision_model.encoder.layers.7.mlp.fc1.bias": "model-00004-of-00004.safetensors",
640
+ "vision_model.encoder.layers.7.mlp.fc1.weight": "model-00004-of-00004.safetensors",
641
+ "vision_model.encoder.layers.7.mlp.fc2.bias": "model-00004-of-00004.safetensors",
642
+ "vision_model.encoder.layers.7.mlp.fc2.weight": "model-00004-of-00004.safetensors",
643
+ "vision_model.encoder.layers.7.norm1.bias": "model-00004-of-00004.safetensors",
644
+ "vision_model.encoder.layers.7.norm1.weight": "model-00004-of-00004.safetensors",
645
+ "vision_model.encoder.layers.7.norm2.bias": "model-00004-of-00004.safetensors",
646
+ "vision_model.encoder.layers.7.norm2.weight": "model-00004-of-00004.safetensors",
647
+ "vision_model.encoder.layers.8.attn.proj.bias": "model-00004-of-00004.safetensors",
648
+ "vision_model.encoder.layers.8.attn.proj.weight": "model-00004-of-00004.safetensors",
649
+ "vision_model.encoder.layers.8.attn.qkv.bias": "model-00004-of-00004.safetensors",
650
+ "vision_model.encoder.layers.8.attn.qkv.weight": "model-00004-of-00004.safetensors",
651
+ "vision_model.encoder.layers.8.ls1": "model-00004-of-00004.safetensors",
652
+ "vision_model.encoder.layers.8.ls2": "model-00004-of-00004.safetensors",
653
+ "vision_model.encoder.layers.8.mlp.fc1.bias": "model-00004-of-00004.safetensors",
654
+ "vision_model.encoder.layers.8.mlp.fc1.weight": "model-00004-of-00004.safetensors",
655
+ "vision_model.encoder.layers.8.mlp.fc2.bias": "model-00004-of-00004.safetensors",
656
+ "vision_model.encoder.layers.8.mlp.fc2.weight": "model-00004-of-00004.safetensors",
657
+ "vision_model.encoder.layers.8.norm1.bias": "model-00004-of-00004.safetensors",
658
+ "vision_model.encoder.layers.8.norm1.weight": "model-00004-of-00004.safetensors",
659
+ "vision_model.encoder.layers.8.norm2.bias": "model-00004-of-00004.safetensors",
660
+ "vision_model.encoder.layers.8.norm2.weight": "model-00004-of-00004.safetensors",
661
+ "vision_model.encoder.layers.9.attn.proj.bias": "model-00004-of-00004.safetensors",
662
+ "vision_model.encoder.layers.9.attn.proj.weight": "model-00004-of-00004.safetensors",
663
+ "vision_model.encoder.layers.9.attn.qkv.bias": "model-00004-of-00004.safetensors",
664
+ "vision_model.encoder.layers.9.attn.qkv.weight": "model-00004-of-00004.safetensors",
665
+ "vision_model.encoder.layers.9.ls1": "model-00004-of-00004.safetensors",
666
+ "vision_model.encoder.layers.9.ls2": "model-00004-of-00004.safetensors",
667
+ "vision_model.encoder.layers.9.mlp.fc1.bias": "model-00004-of-00004.safetensors",
668
+ "vision_model.encoder.layers.9.mlp.fc1.weight": "model-00004-of-00004.safetensors",
669
+ "vision_model.encoder.layers.9.mlp.fc2.bias": "model-00004-of-00004.safetensors",
670
+ "vision_model.encoder.layers.9.mlp.fc2.weight": "model-00004-of-00004.safetensors",
671
+ "vision_model.encoder.layers.9.norm1.bias": "model-00004-of-00004.safetensors",
672
+ "vision_model.encoder.layers.9.norm1.weight": "model-00004-of-00004.safetensors",
673
+ "vision_model.encoder.layers.9.norm2.bias": "model-00004-of-00004.safetensors",
674
+ "vision_model.encoder.layers.9.norm2.weight": "model-00004-of-00004.safetensors"
675
+ }
676
+ }
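The `weight_map` above is the standard safetensors sharded-checkpoint index: each parameter name maps to the shard file that stores it. As a minimal sketch (the index path and tensor name below are illustrative), looking up which shard holds a given tensor is a plain dictionary access:

```python
import json


def shard_for(index_path: str, tensor_name: str) -> str:
    """Return the shard filename that stores `tensor_name`, per the index's weight_map."""
    with open(index_path) as f:
        index = json.load(f)
    return index["weight_map"][tensor_name]
```

Loaders such as `transformers` use exactly this mapping to open only the shards a given module needs.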
modeling_intern_vit.py ADDED
@@ -0,0 +1,428 @@
+ # --------------------------------------------------------
+ # InternVL
+ # Copyright (c) 2024 OpenGVLab
+ # Licensed under The MIT License [see LICENSE for details]
+ # --------------------------------------------------------
+
+ from typing import Optional, Tuple, Union
+
+ import torch
+ import torch.nn.functional as F
+ import torch.utils.checkpoint
+ from einops import rearrange
+ from timm.models.layers import DropPath
+ from torch import nn
+ from transformers.activations import ACT2FN
+ from transformers.modeling_outputs import (BaseModelOutput,
+                                            BaseModelOutputWithPooling)
+ from transformers.modeling_utils import PreTrainedModel
+ from transformers.utils import logging
+
+ from .configuration_intern_vit import InternVisionConfig
+
+ try:
+     from flash_attn.bert_padding import pad_input, unpad_input
+     from flash_attn.flash_attn_interface import \
+         flash_attn_varlen_qkvpacked_func
+     has_flash_attn = True
+ except ImportError:
+     print('FlashAttention2 is not installed.')
+     has_flash_attn = False
+
+ logger = logging.get_logger(__name__)
+
+
+ class FlashAttention(nn.Module):
+     """Implement the scaled dot product attention with softmax.
+     Arguments
+     ---------
+         softmax_scale: The temperature to use for the softmax attention.
+                        (default: 1/sqrt(d_keys) where d_keys is computed at
+                        runtime)
+         attention_dropout: The dropout rate to apply to the attention
+                            (default: 0.0)
+     """
+
+     def __init__(self, softmax_scale=None, attention_dropout=0.0, device=None, dtype=None):
+         super().__init__()
+         self.softmax_scale = softmax_scale
+         self.dropout_p = attention_dropout
+
+     def forward(self, qkv, key_padding_mask=None, causal=False, cu_seqlens=None,
+                 max_s=None, need_weights=False):
+         """Implements the multihead softmax attention.
+         Arguments
+         ---------
+             qkv: The tensor containing the query, key, and value. (B, S, 3, H, D) if key_padding_mask is None
+                 if unpadded: (nnz, 3, h, d)
+             key_padding_mask: a bool tensor of shape (B, S)
+         """
+         assert not need_weights
+         assert qkv.dtype in [torch.float16, torch.bfloat16]
+         assert qkv.is_cuda
+
+         if cu_seqlens is None:
+             batch_size = qkv.shape[0]
+             seqlen = qkv.shape[1]
+             if key_padding_mask is None:
+                 qkv = rearrange(qkv, 'b s ... -> (b s) ...')
+                 max_s = seqlen
+                 cu_seqlens = torch.arange(0, (batch_size + 1) * seqlen, step=seqlen, dtype=torch.int32,
+                                           device=qkv.device)
+                 output = flash_attn_varlen_qkvpacked_func(
+                     qkv, cu_seqlens, max_s, self.dropout_p if self.training else 0.0,
+                     softmax_scale=self.softmax_scale, causal=causal
+                 )
+                 output = rearrange(output, '(b s) ... -> b s ...', b=batch_size)
+             else:
+                 nheads = qkv.shape[-2]
+                 x = rearrange(qkv, 'b s three h d -> b s (three h d)')
+                 x_unpad, indices, cu_seqlens, max_s = unpad_input(x, key_padding_mask)
+                 x_unpad = rearrange(x_unpad, 'nnz (three h d) -> nnz three h d', three=3, h=nheads)
+                 output_unpad = flash_attn_varlen_qkvpacked_func(
+                     x_unpad, cu_seqlens, max_s, self.dropout_p if self.training else 0.0,
+                     softmax_scale=self.softmax_scale, causal=causal
+                 )
+                 output = rearrange(pad_input(rearrange(output_unpad, 'nnz h d -> nnz (h d)'),
+                                              indices, batch_size, seqlen),
+                                    'b s (h d) -> b s h d', h=nheads)
+         else:
+             assert max_s is not None
+             output = flash_attn_varlen_qkvpacked_func(
+                 qkv, cu_seqlens, max_s, self.dropout_p if self.training else 0.0,
+                 softmax_scale=self.softmax_scale, causal=causal
+             )
+
+         return output, None
+
+
+
+ class InternRMSNorm(nn.Module):
+     def __init__(self, hidden_size, eps=1e-6):
+         super().__init__()
+         self.weight = nn.Parameter(torch.ones(hidden_size))
+         self.variance_epsilon = eps
+
+     def forward(self, hidden_states):
+         input_dtype = hidden_states.dtype
+         hidden_states = hidden_states.to(torch.float32)
+         variance = hidden_states.pow(2).mean(-1, keepdim=True)
+         hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
+         return self.weight * hidden_states.to(input_dtype)
+
+ # try:
+ #     from apex.normalization import FusedRMSNorm
+
+ #     InternRMSNorm = FusedRMSNorm  # noqa
+
+ # except ImportError:
+ #     # using the normal InternRMSNorm
+ #     pass
+ # except Exception:
+ #     pass
+
+
+ NORM2FN = {
+     'rms_norm': InternRMSNorm,
+     'layer_norm': nn.LayerNorm,
+ }
+
+
+ class InternVisionEmbeddings(nn.Module):
+     def __init__(self, config: InternVisionConfig):
+         super().__init__()
+         self.config = config
+         self.embed_dim = config.hidden_size
+         self.image_size = config.image_size
+         self.patch_size = config.patch_size
+
+         self.class_embedding = nn.Parameter(
+             torch.randn(1, 1, self.embed_dim),
+         )
+
+         self.patch_embedding = nn.Conv2d(
+             in_channels=3, out_channels=self.embed_dim, kernel_size=self.patch_size, stride=self.patch_size
+         )
+
+         self.num_patches = (self.image_size // self.patch_size) ** 2
+         self.num_positions = self.num_patches + 1
+
+         self.position_embedding = nn.Parameter(torch.randn(1, self.num_positions, self.embed_dim))
+
+     def _get_pos_embed(self, pos_embed, H, W):
+         target_dtype = pos_embed.dtype
+         pos_embed = pos_embed.float().reshape(
+             1, self.image_size // self.patch_size, self.image_size // self.patch_size, -1).permute(0, 3, 1, 2)
+         pos_embed = F.interpolate(pos_embed, size=(H, W), mode='bicubic', align_corners=False). \
+             reshape(1, -1, H * W).permute(0, 2, 1).to(target_dtype)
+         return pos_embed
+
+     def forward(self, pixel_values: torch.FloatTensor) -> torch.Tensor:
+         target_dtype = self.patch_embedding.weight.dtype
+         patch_embeds = self.patch_embedding(pixel_values)  # shape = [*, channel, width, height]
+         batch_size, _, height, width = patch_embeds.shape
+         patch_embeds = patch_embeds.flatten(2).transpose(1, 2)
+         class_embeds = self.class_embedding.expand(batch_size, 1, -1).to(target_dtype)
+         embeddings = torch.cat([class_embeds, patch_embeds], dim=1)
+         position_embedding = torch.cat([
+             self.position_embedding[:, :1, :],
+             self._get_pos_embed(self.position_embedding[:, 1:, :], height, width)
+         ], dim=1)
+         embeddings = embeddings + position_embedding.to(target_dtype)
+         return embeddings
+
+
+ class InternAttention(nn.Module):
+     """Multi-headed attention from 'Attention Is All You Need' paper"""
+
+     def __init__(self, config: InternVisionConfig):
+         super().__init__()
+         self.config = config
+         self.embed_dim = config.hidden_size
+         self.num_heads = config.num_attention_heads
+         self.use_flash_attn = config.use_flash_attn and has_flash_attn
+         if config.use_flash_attn and not has_flash_attn:
+             print('Warning: Flash Attention is not available, use_flash_attn is set to False.')
+         self.head_dim = self.embed_dim // self.num_heads
+         if self.head_dim * self.num_heads != self.embed_dim:
+             raise ValueError(
+                 f'embed_dim must be divisible by num_heads (got `embed_dim`: {self.embed_dim} and `num_heads`:'
+                 f' {self.num_heads}).'
+             )
+
+         self.scale = self.head_dim ** -0.5
+         self.qkv = nn.Linear(self.embed_dim, 3 * self.embed_dim, bias=config.qkv_bias)
+         self.attn_drop = nn.Dropout(config.attention_dropout)
+         self.proj_drop = nn.Dropout(config.dropout)
+
+         self.qk_normalization = config.qk_normalization
+
+         if self.qk_normalization:
+             self.q_norm = InternRMSNorm(self.embed_dim, eps=config.layer_norm_eps)
+             self.k_norm = InternRMSNorm(self.embed_dim, eps=config.layer_norm_eps)
+
+         if self.use_flash_attn:
+             self.inner_attn = FlashAttention(attention_dropout=config.attention_dropout)
+         self.proj = nn.Linear(self.embed_dim, self.embed_dim)
+
+     def _naive_attn(self, x):
+         B, N, C = x.shape
+         qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4)
+         q, k, v = qkv.unbind(0)  # make torchscript happy (cannot use tensor as tuple)
+
+         if self.qk_normalization:
+             B_, H_, N_, D_ = q.shape
+             q = self.q_norm(q.transpose(1, 2).flatten(-2, -1)).view(B_, N_, H_, D_).transpose(1, 2)
+             k = self.k_norm(k.transpose(1, 2).flatten(-2, -1)).view(B_, N_, H_, D_).transpose(1, 2)
+
+         attn = ((q * self.scale) @ k.transpose(-2, -1))
+         attn = attn.softmax(dim=-1)
+         attn = self.attn_drop(attn)
+
+         x = (attn @ v).transpose(1, 2).reshape(B, N, C)
+         x = self.proj(x)
+         x = self.proj_drop(x)
+         return x
+
+     def _flash_attn(self, x, key_padding_mask=None, need_weights=False):
+         qkv = self.qkv(x)
+         qkv = rearrange(qkv, 'b s (three h d) -> b s three h d', three=3, h=self.num_heads)
+
+         if self.qk_normalization:
+             q, k, v = qkv.unbind(2)
+             q = self.q_norm(q.flatten(-2, -1)).view(q.shape)
+             k = self.k_norm(k.flatten(-2, -1)).view(k.shape)
+             qkv = torch.stack([q, k, v], dim=2)
+
+         context, _ = self.inner_attn(
+             qkv, key_padding_mask=key_padding_mask, need_weights=need_weights, causal=False
+         )
+         outs = self.proj(rearrange(context, 'b s h d -> b s (h d)'))
+         outs = self.proj_drop(outs)
+         return outs
+
+     def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
+         x = self._naive_attn(hidden_states) if not self.use_flash_attn else self._flash_attn(hidden_states)
+         return x
+
+
+ class InternMLP(nn.Module):
+     def __init__(self, config: InternVisionConfig):
+         super().__init__()
+         self.config = config
+         self.act = ACT2FN[config.hidden_act]
+         self.fc1 = nn.Linear(config.hidden_size, config.intermediate_size)
+         self.fc2 = nn.Linear(config.intermediate_size, config.hidden_size)
+
+     def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
+         hidden_states = self.fc1(hidden_states)
+         hidden_states = self.act(hidden_states)
+         hidden_states = self.fc2(hidden_states)
+         return hidden_states
+
+
+ class InternVisionEncoderLayer(nn.Module):
+     def __init__(self, config: InternVisionConfig, drop_path_rate: float):
+         super().__init__()
+         self.embed_dim = config.hidden_size
+         self.intermediate_size = config.intermediate_size
+         self.norm_type = config.norm_type
+
+         self.attn = InternAttention(config)
+         self.mlp = InternMLP(config)
+         self.norm1 = NORM2FN[self.norm_type](self.embed_dim, eps=config.layer_norm_eps)
+         self.norm2 = NORM2FN[self.norm_type](self.embed_dim, eps=config.layer_norm_eps)
+
+         self.ls1 = nn.Parameter(config.initializer_factor * torch.ones(self.embed_dim))
+         self.ls2 = nn.Parameter(config.initializer_factor * torch.ones(self.embed_dim))
+         self.drop_path1 = DropPath(drop_path_rate) if drop_path_rate > 0. else nn.Identity()
+         self.drop_path2 = DropPath(drop_path_rate) if drop_path_rate > 0. else nn.Identity()
+
+     def forward(
+             self,
+             hidden_states: torch.Tensor,
+     ) -> Tuple[torch.FloatTensor, Optional[torch.FloatTensor], Optional[Tuple[torch.FloatTensor]]]:
+         """
+         Args:
+             hidden_states (`Tuple[torch.FloatTensor, Optional[torch.FloatTensor]]`): input to the layer of shape `(batch, seq_len, embed_dim)`
+         """
+         hidden_states = hidden_states + self.drop_path1(self.attn(self.norm1(hidden_states).to(hidden_states.dtype)) * self.ls1)
+
+         hidden_states = hidden_states + self.drop_path2(self.mlp(self.norm2(hidden_states).to(hidden_states.dtype)) * self.ls2)
+
+         return hidden_states
+
+
+ class InternVisionEncoder(nn.Module):
+     """
+     Transformer encoder consisting of `config.num_hidden_layers` self attention layers. Each layer is a
+     [`InternEncoderLayer`].
+
+     Args:
+         config (`InternConfig`):
+             The corresponding vision configuration for the `InternEncoder`.
+     """
+
+     def __init__(self, config: InternVisionConfig):
+         super().__init__()
+         self.config = config
+         # stochastic depth decay rule
+         dpr = [x.item() for x in torch.linspace(0, config.drop_path_rate, config.num_hidden_layers)]
+         self.layers = nn.ModuleList([
+             InternVisionEncoderLayer(config, dpr[idx]) for idx in range(config.num_hidden_layers)])
+         self.gradient_checkpointing = True
+
+     def forward(
+             self,
+             inputs_embeds,
+             output_hidden_states: Optional[bool] = None,
+             return_dict: Optional[bool] = None,
+     ) -> Union[Tuple, BaseModelOutput]:
+         r"""
+         Args:
+             inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`):
+                 Embedded representation of the inputs. Should be float, not int tokens.
+             output_hidden_states (`bool`, *optional*):
+                 Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors
+                 for more detail.
+             return_dict (`bool`, *optional*):
+                 Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.
+         """
+         output_hidden_states = (
+             output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
+         )
+         return_dict = return_dict if return_dict is not None else self.config.use_return_dict
+
+         encoder_states = () if output_hidden_states else None
+         hidden_states = inputs_embeds
+
+         for idx, encoder_layer in enumerate(self.layers):
+             if output_hidden_states:
+                 encoder_states = encoder_states + (hidden_states,)
+             if self.gradient_checkpointing and self.training:
+                 layer_outputs = torch.utils.checkpoint.checkpoint(
+                     encoder_layer,
+                     hidden_states)
+             else:
+                 layer_outputs = encoder_layer(
+                     hidden_states,
+                 )
+             hidden_states = layer_outputs
+
+         if output_hidden_states:
+             encoder_states = encoder_states + (hidden_states,)
+
+         if not return_dict:
+             return tuple(v for v in [hidden_states, encoder_states] if v is not None)
+         return BaseModelOutput(
+             last_hidden_state=hidden_states, hidden_states=encoder_states
+         )
+
+
+ class InternVisionModel(PreTrainedModel):
+     main_input_name = 'pixel_values'
+     _supports_flash_attn_2 = True
+     config_class = InternVisionConfig
+     _no_split_modules = ['InternVisionEncoderLayer']
+
+     def __init__(self, config: InternVisionConfig):
+         super().__init__(config)
+         self.config = config
+
+         self.embeddings = InternVisionEmbeddings(config)
+         self.encoder = InternVisionEncoder(config)
+
+     def resize_pos_embeddings(self, old_size, new_size, patch_size):
+         pos_emb = self.embeddings.position_embedding
+         _, num_positions, embed_dim = pos_emb.shape
+         cls_emb = pos_emb[:, :1, :]
+         pos_emb = pos_emb[:, 1:, :].reshape(1, old_size // patch_size, old_size // patch_size, -1).permute(0, 3, 1, 2)
+         pos_emb = F.interpolate(pos_emb.float(), size=new_size // patch_size, mode='bicubic', align_corners=False)
+         pos_emb = pos_emb.to(cls_emb.dtype).reshape(1, embed_dim, -1).permute(0, 2, 1)
+         pos_emb = torch.cat([cls_emb, pos_emb], dim=1)
+         self.embeddings.position_embedding = nn.Parameter(pos_emb)
+         self.embeddings.image_size = new_size
+         logger.info('Resized position embeddings from {} to {}'.format(old_size, new_size))
+
+     def get_input_embeddings(self):
+         return self.embeddings
+
+     def forward(
+             self,
+             pixel_values: Optional[torch.FloatTensor] = None,
+             output_hidden_states: Optional[bool] = None,
+             return_dict: Optional[bool] = None,
+             pixel_embeds: Optional[torch.FloatTensor] = None,
+     ) -> Union[Tuple, BaseModelOutputWithPooling]:
+         output_hidden_states = (
+             output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
+         )
+         return_dict = return_dict if return_dict is not None else self.config.use_return_dict
+
+         if pixel_values is None and pixel_embeds is None:
+             raise ValueError('You have to specify pixel_values or pixel_embeds')
+
+         if pixel_embeds is not None:
+             hidden_states = pixel_embeds
+         else:
+             if len(pixel_values.shape) == 4:
+                 hidden_states = self.embeddings(pixel_values)
+             else:
+                 raise ValueError(f'wrong pixel_values size: {pixel_values.shape}')
+         encoder_outputs = self.encoder(
+             inputs_embeds=hidden_states,
+             output_hidden_states=output_hidden_states,
+             return_dict=return_dict,
+         )
+         # the encoder returns a plain tuple when return_dict is False
+         last_hidden_state = encoder_outputs.last_hidden_state if return_dict else encoder_outputs[0]
+         pooled_output = last_hidden_state[:, 0, :]
+
+         if not return_dict:
+             return (last_hidden_state, pooled_output) + encoder_outputs[1:]
+
+         return BaseModelOutputWithPooling(
+             last_hidden_state=last_hidden_state,
+             pooler_output=pooled_output,
+             hidden_states=encoder_outputs.hidden_states,
+             attentions=encoder_outputs.attentions,
+         )
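`InternRMSNorm` above normalizes each hidden vector by its root mean square (computed in float32) and then rescales by a learned per-channel weight. A minimal dependency-free sketch of the same formula, for illustration only:

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm over one hidden vector: weight * x / sqrt(mean(x_i^2) + eps)."""
    # variance here is the mean of squares over the hidden dimension (no mean-centering)
    variance = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(variance + eps)
    return [w * v * inv_rms for w, v in zip(weight, x)]
```

Unlike LayerNorm, there is no mean subtraction and no bias term, which is why the module only carries a `weight` parameter.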
modeling_qianfanvl_chat.py ADDED
@@ -0,0 +1,394 @@
+ # Copyright (c) 2025 Qianfan
+ # Licensed under the MIT License. See LICENSE file in the project root for full license information.
+ import warnings
+ from typing import List, Optional, Tuple, Union
+ from transformers import AutoTokenizer
+ import torch.distributed as dist
+ import torch.utils.checkpoint
+ import transformers
+ from .conversation import get_conv_template
+ from torch import nn
+ from torch.nn import CrossEntropyLoss
+ from transformers import (AutoModel, GenerationConfig, LlamaForCausalLM,
+                           LlamaTokenizer)
+ from transformers.modeling_outputs import CausalLMOutputWithPast
+ from transformers.modeling_utils import PreTrainedModel
+ from transformers.utils import ModelOutput, logging
+
+ from .configuration_qianfanvl_chat import QianfanVLChatConfig
+ from .modeling_intern_vit import InternVisionModel, has_flash_attn
+ import difflib
+
+ logger = logging.get_logger(__name__)
+
+
+ def version_cmp(v1, v2, op='eq'):
+     import operator
+
+     from packaging import version
+     op_func = getattr(operator, op)
+     return op_func(version.parse(v1), version.parse(v2))
+
+
+ class QianfanVLChatModel(PreTrainedModel):
+     config_class = QianfanVLChatConfig
+     main_input_name = 'pixel_values'
+     base_model_prefix = 'language_model'
+     _no_split_modules = ['InternVisionModel', 'LlamaDecoderLayer']
+     _supports_flash_attn_2 = True
+     supports_gradient_checkpointing = True
+
+     def __init__(self, config: QianfanVLChatConfig, vision_model=None, language_model=None, use_flash_attn=True):
+         super().__init__(config)
+
+         assert version_cmp(transformers.__version__, '4.37.0', 'ge')
+         image_size = config.force_image_size or config.vision_config.image_size
+         patch_size = config.vision_config.patch_size
+         self.patch_size = patch_size
+         self.select_layer = config.select_layer
+         self.template = config.template
+         self.num_image_token = int((image_size // patch_size) ** 2 * (config.downsample_ratio ** 2))
+         self.downsample_ratio = config.downsample_ratio
+         self.ps_version = config.ps_version
+         self.llm_arch_name = config.llm_config.architectures[0]
+         # Flash Attention is force-disabled here; both the vision tower and the LLM fall back to eager attention.
+         use_flash_attn = False  # use_flash_attn if has_flash_attn else False
+         config.vision_config.use_flash_attn = True if use_flash_attn else False
+         config.llm_config.attn_implementation = 'flash_attention_2' if use_flash_attn else 'eager'
+
+         if vision_model is not None:
+             self.vision_model = vision_model
+         else:
+             self.vision_model = InternVisionModel(config.vision_config)
+         if language_model is not None:
+             self.language_model = language_model
+         else:
+             self.language_model = LlamaForCausalLM(config.llm_config)
+
+         vit_hidden_size = config.vision_config.hidden_size
+         llm_hidden_size = config.llm_config.hidden_size
+
+         self.mlp1 = nn.Sequential(
+             nn.LayerNorm(vit_hidden_size * int(1 / self.downsample_ratio) ** 2),
+             nn.Linear(vit_hidden_size * int(1 / self.downsample_ratio) ** 2, llm_hidden_size),
+             nn.GELU(),
+             nn.Linear(llm_hidden_size, llm_hidden_size)
+         )
+
+         self.img_context_token_id = None
+         self.conv_template = get_conv_template(self.template)
+         if hasattr(config, 'system_message'):
+             self.system_message = config.system_message
+         else:
+             self.system_message = self.conv_template.system_message
+         self.num_samples = 0
+
+     def forward(
+             self,
+             pixel_values: torch.FloatTensor,
+             input_ids: torch.LongTensor = None,
+             attention_mask: Optional[torch.Tensor] = None,
+             position_ids: Optional[torch.LongTensor] = None,
+             image_flags: Optional[torch.LongTensor] = None,
+             past_key_values: Optional[List[torch.FloatTensor]] = None,
+             labels: Optional[torch.LongTensor] = None,
+             use_cache: Optional[bool] = None,
+             output_attentions: Optional[bool] = None,
+             output_hidden_states: Optional[bool] = None,
+             return_dict: Optional[bool] = None,
+             statistics: Optional[torch.LongTensor] = None,
+             loss_weight: Optional[List] = None,
+             loss_reduction_all_gather: Optional[bool] = False,
+     ) -> Union[Tuple, CausalLMOutputWithPast]:
+         return_dict = return_dict if return_dict is not None else self.config.use_return_dict
+
+         image_flags = image_flags.squeeze(-1)
+         input_embeds = self.language_model.get_input_embeddings()(input_ids).clone()
+         vit_embeds = self.extract_feature(pixel_values)
+         vit_embeds = vit_embeds[image_flags == 1]
+         vit_batch_size = pixel_values.shape[0]
+         B, N, C = input_embeds.shape
+         input_embeds = input_embeds.reshape(B * N, C)
+
+         if torch.distributed.is_initialized() and torch.distributed.get_rank() == 0:
+             print(f'dynamic ViT batch size: {vit_batch_size}, images per sample: {vit_batch_size / B}, dynamic token length: {N}')
+             if statistics is not None:
+                 num_samples, num_padding_tokens, num_padding_images = statistics.tolist()
+                 self.num_samples += num_samples
+                 print(f'total_samples={self.num_samples}, {num_samples=}, {num_padding_tokens=}, {num_padding_images=}')
+
+         input_ids = input_ids.reshape(B * N)
+         selected = (input_ids == self.img_context_token_id)
+         try:
+             input_embeds[selected] = input_embeds[selected] * 0.0 + vit_embeds.reshape(-1, C)
+             ignore_flag = False
+         except Exception as e:
+             vit_embeds = vit_embeds.reshape(-1, C)
+             print(f'warning: {e}, input_embeds[selected].shape={input_embeds[selected].shape}, '
+                   f'vit_embeds.shape={vit_embeds.shape}')
+             n_token = selected.sum()
+             input_embeds[selected] = input_embeds[selected] * 0.0 + vit_embeds[:n_token]
+             ignore_flag = True
+
+         input_embeds = input_embeds.reshape(B, N, C)
+
+         outputs = self.language_model(
+             inputs_embeds=input_embeds,
+             attention_mask=attention_mask,
+             position_ids=position_ids,
+             past_key_values=past_key_values,
+             use_cache=use_cache,
+             output_attentions=output_attentions,
+             output_hidden_states=output_hidden_states,
+             return_dict=return_dict,
+         )
+         logits = outputs.logits
+
+         loss = None
+         if labels is not None and loss_weight is not None:
+             loss_weight = torch.tensor(loss_weight, dtype=torch.float32, device=labels.device)
+             # Shift so that tokens < n predict n
+             shift_logits = logits[..., :-1, :].contiguous()
+             shift_labels = labels[..., 1:].contiguous()
+             shift_weights = loss_weight[..., 1:].contiguous()
+             # Flatten the tokens
+             loss_fct = CrossEntropyLoss(reduction='none')
+             shift_logits = shift_logits.view(-1, self.language_model.config.vocab_size)
+             shift_labels = shift_labels.view(-1)
+             shift_weights = shift_weights.view(-1)
+             # Enable model parallelism
+             shift_labels = shift_labels.to(shift_logits.device)
+             shift_weights = shift_weights.to(shift_logits.device)
+             loss = loss_fct(shift_logits, shift_labels)
+
+             shift_weights_sum = shift_weights.sum()
+             if loss_reduction_all_gather:
+                 dist.all_reduce(shift_weights_sum, op=dist.ReduceOp.AVG)
+
+             loss = loss * shift_weights
+             loss = loss.sum() / shift_weights_sum
+             if ignore_flag:
+                 loss = loss * 0.0
+         elif labels is not None:
+             # Shift so that tokens < n predict n
+             shift_logits = logits[..., :-1, :].contiguous()
175
+ shift_labels = labels[..., 1:].contiguous()
176
+ # Flatten the tokens
177
+ loss_fct = CrossEntropyLoss()
178
+ shift_logits = shift_logits.view(-1, self.language_model.config.vocab_size)
179
+ shift_labels = shift_labels.view(-1)
180
+ # Enable model parallelism
181
+ shift_labels = shift_labels.to(shift_logits.device)
182
+ loss = loss_fct(shift_logits, shift_labels)
183
+ if ignore_flag:
184
+ loss = loss * 0.0
185
+
186
+ if not return_dict:
187
+ output = (logits,) + outputs[1:]
188
+ return (loss,) + output if loss is not None else output
189
+
190
+ return CausalLMOutputWithPast(
191
+ loss=loss,
192
+ logits=logits,
193
+ past_key_values=outputs.past_key_values,
194
+ hidden_states=outputs.hidden_states,
195
+ attentions=outputs.attentions,
196
+ )
197
+
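The weighted branch of the loss above reduces per-token cross-entropy values with per-token weights, so padding and prompt tokens can be zeroed out. A minimal pure-Python sketch of that reduction (toy numbers, no torch, purely illustrative):

```python
def weighted_token_loss(per_token_losses, weights):
    """Reduce per-token losses as sum(loss * weight) / sum(weight)."""
    total_weight = sum(weights)
    return sum(l * w for l, w in zip(per_token_losses, weights)) / total_weight

# Tokens with weight 0 (e.g. padding or prompt) do not contribute.
losses = [2.0, 1.0, 4.0, 3.0]
weights = [0.0, 1.0, 1.0, 0.0]
print(weighted_token_loss(losses, weights))  # (1.0 + 4.0) / 2 = 2.5
```

With `loss_reduction_all_gather` enabled, the denominator is additionally averaged across ranks before dividing, which this sketch omits.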
+    def pixel_shuffle(self, x, scale_factor=0.5):
+        n, w, h, c = x.size()
+        # N, W, H, C --> N, W, H * scale, C // scale
+        x = x.view(n, w, int(h * scale_factor), int(c / scale_factor))
+        # N, W, H * scale, C // scale --> N, H * scale, W, C // scale
+        x = x.permute(0, 2, 1, 3).contiguous()
+        # N, H * scale, W, C // scale --> N, H * scale, W * scale, C // (scale ** 2)
+        x = x.view(n, int(h * scale_factor), int(w * scale_factor),
+                   int(c / (scale_factor * scale_factor)))
+        if self.ps_version == 'v1':
+            warnings.warn("In ps_version 'v1', the height and width have not been swapped back, "
+                          'which results in a transposed image.')
+        else:
+            x = x.permute(0, 2, 1, 3).contiguous()
+        return x
+
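A quick shape check of the pixel shuffle with `scale_factor=0.5`: an `(n, w, h, c)` grid becomes `(n, w/2, h/2, 4c)`, trading spatial resolution for channel depth so 4x fewer tokens carry the same information. A pure-Python sketch with illustrative sizes (the 32x32x1024 grid is an assumption matching a typical ViT feature map):

```python
def pixel_shuffle_shape(n, w, h, c, scale_factor=0.5):
    """Output shape of the pixel_shuffle above (ps_version 'v2',
    i.e. with the final height/width swap applied)."""
    return (n,
            int(w * scale_factor),
            int(h * scale_factor),
            int(c / (scale_factor * scale_factor)))

print(pixel_shuffle_shape(1, 32, 32, 1024))  # (1, 16, 16, 4096)
```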
+    def extract_feature(self, pixel_values):
+        if self.select_layer == -1:
+            vit_embeds = self.vision_model(
+                pixel_values=pixel_values,
+                output_hidden_states=False,
+                return_dict=True).last_hidden_state
+        else:
+            vit_embeds = self.vision_model(
+                pixel_values=pixel_values,
+                output_hidden_states=True,
+                return_dict=True).hidden_states[self.select_layer]
+        vit_embeds = vit_embeds[:, 1:, :]
+        h = w = int(vit_embeds.shape[1] ** 0.5)
+        vit_embeds = vit_embeds.reshape(vit_embeds.shape[0], h, w, -1)
+        vit_embeds = self.pixel_shuffle(vit_embeds, scale_factor=self.downsample_ratio)
+        vit_embeds = vit_embeds.reshape(vit_embeds.shape[0], -1, vit_embeds.shape[-1])
+        vit_embeds = self.mlp1(vit_embeds)
+        return vit_embeds
+
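`extract_feature` drops the CLS token, folds the remaining patch tokens into a square grid, and downsamples with `pixel_shuffle` before projecting through `mlp1`. The token arithmetic for a hypothetical 448x448 tile with 14x14 patches (sizes assumed, not read from this config):

```python
patches_per_side = 448 // 14                 # 32 patches per side
num_patch_tokens = patches_per_side ** 2     # 1024 tokens after dropping CLS
h = w = int(num_patch_tokens ** 0.5)         # folded into a 32 x 32 grid
downsample_ratio = 0.5
tokens_per_tile = int(h * downsample_ratio) * int(w * downsample_ratio)
print(tokens_per_tile)  # 256 visual tokens per tile
```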
+    def batch_chat(self, tokenizer, pixel_values, questions, generation_config, num_patches_list=None,
+                   history=None, return_history=False, IMG_START_TOKEN='<img>', IMG_END_TOKEN='</img>',
+                   IMG_CONTEXT_TOKEN='<IMG_CONTEXT>', verbose=False, image_counts=None):
+        if history is not None or return_history:
+            print('Multi-turn chat is not yet supported in batch_chat.')
+            raise NotImplementedError
+
+        if image_counts is not None:
+            num_patches_list = image_counts
+            print('Warning: `image_counts` is deprecated. Please use `num_patches_list` instead.')
+
+        img_context_token_id = tokenizer.convert_tokens_to_ids(IMG_CONTEXT_TOKEN)
+        self.img_context_token_id = img_context_token_id
+
+        if verbose and pixel_values is not None:
+            image_bs = pixel_values.shape[0]
+            print(f'dynamic ViT batch size: {image_bs}')
+
+        queries = []
+        for idx, num_patches in enumerate(num_patches_list):
+            question = questions[idx]
+            if pixel_values is not None and '<image>' not in question:
+                question = '<image>' + question
+            template = get_conv_template(self.template)
+            template.system_message = self.system_message
+            template.append_message(template.roles[0], question)
+            template.append_message(template.roles[1], None)
+            query = template.get_prompt()
+
+            image_tokens = IMG_START_TOKEN + IMG_CONTEXT_TOKEN * self.num_image_token * num_patches + IMG_END_TOKEN
+            query = query.replace('<image>', image_tokens, 1)
+            queries.append(query)
+
+        tokenizer.padding_side = 'left'
+        model_inputs = tokenizer(queries, return_tensors='pt', padding=True)
+        device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+        input_ids = model_inputs['input_ids'].to(device)
+        attention_mask = model_inputs['attention_mask'].to(device)
+        eos_token_id = tokenizer.convert_tokens_to_ids(template.sep.strip())
+        generation_config['eos_token_id'] = eos_token_id
+        # generation_config['eos_token_id'] = 181896
+        generation_output = self.generate(
+            pixel_values=pixel_values,
+            input_ids=input_ids,
+            attention_mask=attention_mask,
+            **generation_config
+        )
+        responses = tokenizer.batch_decode(generation_output, skip_special_tokens=False)
+        responses = [response.split(template.sep.strip())[0].strip() for response in responses]
+        return responses
+
+    def chat(self, tokenizer, pixel_values, question, generation_config, history=None, return_history=False,
+             num_patches_list=None, IMG_START_TOKEN='<img>', IMG_END_TOKEN='</img>', IMG_CONTEXT_TOKEN='<IMG_CONTEXT>',
+             verbose=False, debug=False):
+
+        if history is None and pixel_values is not None and '<image>' not in question:
+            question = '<image>' + question
+
+        if num_patches_list is None:
+            num_patches_list = [pixel_values.shape[0]] if pixel_values is not None else []
+        assert pixel_values is None or len(pixel_values) == sum(num_patches_list)
+
+        img_context_token_id = tokenizer.convert_tokens_to_ids(IMG_CONTEXT_TOKEN)
+        self.img_context_token_id = img_context_token_id
+
+        template = get_conv_template(self.template)
+        eos_token_id = tokenizer.convert_tokens_to_ids(template.sep.strip())
+
+        history = [] if history is None else history
+        for (old_question, old_answer) in history:
+            template.append_message(template.roles[0], old_question)
+            template.append_message(template.roles[1], old_answer)
+        template.append_message(template.roles[0], question)
+        template.append_message(template.roles[1], None)
+        query = template.get_prompt()
+
+        if verbose and pixel_values is not None:
+            image_bs = pixel_values.shape[0]
+            print(f'dynamic ViT batch size: {image_bs}')
+
+        for num_patches in num_patches_list:
+            image_tokens = IMG_START_TOKEN + IMG_CONTEXT_TOKEN * self.num_image_token * num_patches + IMG_END_TOKEN
+            query = query.replace('<image>', image_tokens, 1)
+
+        model_inputs = tokenizer(query, return_tensors='pt')
+        device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+        input_ids = model_inputs['input_ids'].to(device)
+        attention_mask = model_inputs['attention_mask'].to(device)
+        generation_config['eos_token_id'] = eos_token_id
+
+        generation_output = self.generate(
+            pixel_values=pixel_values,
+            input_ids=input_ids,
+            attention_mask=attention_mask,
+            **generation_config
+        )
+
+        if debug:
+            return input_ids, generation_output
+        response = tokenizer.batch_decode(generation_output, skip_special_tokens=False)[0]
+        response = response.split(template.sep.strip())[0].strip()
+        history.append((question, response))
+        if return_history:
+            return response, history
+        else:
+            query_to_print = query.replace(IMG_CONTEXT_TOKEN, '')
+            query_to_print = query_to_print.replace(f'{IMG_START_TOKEN}{IMG_END_TOKEN}', '<image>')
+            if verbose:
+                print(query_to_print, response)
+            return response
+
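Both chat paths expand each `<image>` placeholder in the prompt into an `<img>...</img>` span of repeated `<IMG_CONTEXT>` tokens, one per visual token, which `generate` later overwrites with ViT embeddings. A standalone sketch of that expansion (256 tokens per patch is an assumed value, not read from this config):

```python
IMG_START_TOKEN = '<img>'
IMG_END_TOKEN = '</img>'
IMG_CONTEXT_TOKEN = '<IMG_CONTEXT>'
num_image_token = 256  # assumed visual tokens per patch

def expand_image_placeholder(query, num_patches):
    """Replace the first '<image>' with num_image_token * num_patches context slots."""
    image_tokens = (IMG_START_TOKEN
                    + IMG_CONTEXT_TOKEN * num_image_token * num_patches
                    + IMG_END_TOKEN)
    return query.replace('<image>', image_tokens, 1)

q = expand_image_placeholder('<image>Describe the scene.', num_patches=2)
print(q.count(IMG_CONTEXT_TOKEN))  # 512 context slots for 2 patches
```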
+    @torch.no_grad()
+    def generate(
+            self,
+            pixel_values: Optional[torch.FloatTensor] = None,
+            input_ids: Optional[torch.FloatTensor] = None,
+            attention_mask: Optional[torch.LongTensor] = None,
+            visual_features: Optional[torch.FloatTensor] = None,
+            generation_config: Optional[GenerationConfig] = None,
+            output_hidden_states: Optional[bool] = None,
+            **generate_kwargs,
+    ):
+
+        assert self.img_context_token_id is not None
+        if pixel_values is not None:
+            if visual_features is not None:
+                vit_embeds = visual_features
+            else:
+                vit_embeds = self.extract_feature(pixel_values)
+            # print("ViTEmbedding shape:", vit_embeds.shape)
+            input_embeds = self.language_model.get_input_embeddings()(input_ids)
+            B, N, C = input_embeds.shape
+            input_embeds = input_embeds.reshape(B * N, C)
+            input_ids = input_ids.reshape(B * N)
+            selected = (input_ids == self.img_context_token_id)
+            assert selected.sum() != 0
+            input_embeds[selected] = vit_embeds.reshape(-1, C).to(input_embeds.device)
+
+            input_embeds = input_embeds.reshape(B, N, C)
+        else:
+            input_embeds = self.language_model.get_input_embeddings()(input_ids)
+
+        outputs = self.language_model.generate(
+            inputs_embeds=input_embeds,
+            attention_mask=attention_mask,
+            generation_config=generation_config,
+            output_hidden_states=output_hidden_states,
+            use_cache=True,
+            **generate_kwargs,
+        )
+
+        return outputs
+
+    @property
+    def lm_head(self):
+        return self.language_model.get_output_embeddings()
+
+    def get_input_embeddings(self):
+        return self.language_model.get_input_embeddings()
+
+    def get_output_embeddings(self):
+        return self.language_model.get_output_embeddings()
special_tokens_map.json ADDED
@@ -0,0 +1,17 @@
+{
+  "bos_token": {
+    "content": "<|begin_of_text|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<|end_of_text|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": "<|end_of_text|>"
+}
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:bc9a1340b0cfcda70bee91abbf2adff4dc0dd4c9b2f5f9b9e86926fa0fe92fc0
+size 12264545
tokenizer_config.json ADDED
@@ -0,0 +1,1120 @@
+{
+  "add_eos_token": false,
+  "added_tokens_decoder": {
+    "181887": {
+      "content": "<|begin_of_text|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181888": {
+      "content": "<|end_of_text|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181889": {
+      "content": "<|reserved_special_token_0|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181890": {
+      "content": "<|reserved_special_token_1|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181891": {
+      "content": "<|reserved_special_token_2|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181892": {
+      "content": "<|reserved_special_token_3|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181893": {
+      "content": "<|start_header_id|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181894": {
+      "content": "<|end_header_id|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181895": {
+      "content": "<|reserved_special_token_4|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181896": {
+      "content": "<|eot_id|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181897": {
+      "content": "<|reserved_special_token_5|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181898": {
+      "content": "<|reserved_special_token_6|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181899": {
+      "content": "<|reserved_special_token_7|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181900": {
+      "content": "<|reserved_special_token_8|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181901": {
+      "content": "<|reserved_special_token_9|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181902": {
+      "content": "<|reserved_special_token_10|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181903": {
+      "content": "<|reserved_special_token_11|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181904": {
+      "content": "<|reserved_special_token_12|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181905": {
+      "content": "<|reserved_special_token_13|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181906": {
+      "content": "<|reserved_special_token_14|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181907": {
+      "content": "<|reserved_special_token_15|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181908": {
+      "content": "<|reserved_special_token_16|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181909": {
+      "content": "<|reserved_special_token_17|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181910": {
+      "content": "<|reserved_special_token_18|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181911": {
+      "content": "<|reserved_special_token_19|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181912": {
+      "content": "<|reserved_special_token_20|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181913": {
+      "content": "<|reserved_special_token_21|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181914": {
+      "content": "<|reserved_special_token_22|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181915": {
+      "content": "<|reserved_special_token_23|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181916": {
+      "content": "<|reserved_special_token_24|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181917": {
+      "content": "<|reserved_special_token_25|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181918": {
+      "content": "<|reserved_special_token_26|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181919": {
+      "content": "<|reserved_special_token_27|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181920": {
+      "content": "<|reserved_special_token_28|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181921": {
+      "content": "<|reserved_special_token_29|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181922": {
+      "content": "<|reserved_special_token_30|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181923": {
+      "content": "<|reserved_special_token_31|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181924": {
+      "content": "<|reserved_special_token_32|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181925": {
+      "content": "<|reserved_special_token_33|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181926": {
+      "content": "<|reserved_special_token_34|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181927": {
+      "content": "<|reserved_special_token_35|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181928": {
+      "content": "<|reserved_special_token_36|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181929": {
+      "content": "<|reserved_special_token_37|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181930": {
+      "content": "<|reserved_special_token_38|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181931": {
+      "content": "<|reserved_special_token_39|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181932": {
+      "content": "<|reserved_special_token_40|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181933": {
+      "content": "<|reserved_special_token_41|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181934": {
+      "content": "<|reserved_special_token_42|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181935": {
+      "content": "<|reserved_special_token_43|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181936": {
+      "content": "<|reserved_special_token_44|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181937": {
+      "content": "<|reserved_special_token_45|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181938": {
+      "content": "<|reserved_special_token_46|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181939": {
+      "content": "<|reserved_special_token_47|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181940": {
+      "content": "<|reserved_special_token_48|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181941": {
+      "content": "<|reserved_special_token_49|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181942": {
+      "content": "<|reserved_special_token_50|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181943": {
+      "content": "<|reserved_special_token_51|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181944": {
+      "content": "<|reserved_special_token_52|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181945": {
+      "content": "<|reserved_special_token_53|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181946": {
+      "content": "<|reserved_special_token_54|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181947": {
+      "content": "<|reserved_special_token_55|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181948": {
+      "content": "<|reserved_special_token_56|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181949": {
+      "content": "<|reserved_special_token_57|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181950": {
+      "content": "<|reserved_special_token_58|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181951": {
+      "content": "<|reserved_special_token_59|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181952": {
+      "content": "<|reserved_special_token_60|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181953": {
+      "content": "<|reserved_special_token_61|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181954": {
+      "content": "<|reserved_special_token_62|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181955": {
+      "content": "<|reserved_special_token_63|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181956": {
+      "content": "<|reserved_special_token_64|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181957": {
+      "content": "<|reserved_special_token_65|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181958": {
+      "content": "<|reserved_special_token_66|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181959": {
+      "content": "<|reserved_special_token_67|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181960": {
+      "content": "<|reserved_special_token_68|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181961": {
+      "content": "<|reserved_special_token_69|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181962": {
+      "content": "<|reserved_special_token_70|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181963": {
+      "content": "<|reserved_special_token_71|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181964": {
+      "content": "<|reserved_special_token_72|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181965": {
+      "content": "<|reserved_special_token_73|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181966": {
+      "content": "<|reserved_special_token_74|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181967": {
+      "content": "<|reserved_special_token_75|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181968": {
+      "content": "<|reserved_special_token_76|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181969": {
+      "content": "<|reserved_special_token_77|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181970": {
+      "content": "<|reserved_special_token_78|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181971": {
+      "content": "<|reserved_special_token_79|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181972": {
+      "content": "<|reserved_special_token_80|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181973": {
+      "content": "<|reserved_special_token_81|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181974": {
+      "content": "<|reserved_special_token_82|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181975": {
+      "content": "<|reserved_special_token_83|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181976": {
+      "content": "<|reserved_special_token_84|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181977": {
+      "content": "<|reserved_special_token_85|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181978": {
+      "content": "<|reserved_special_token_86|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181979": {
+      "content": "<|reserved_special_token_87|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181980": {
+      "content": "<|reserved_special_token_88|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181981": {
+      "content": "<|reserved_special_token_89|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181982": {
+      "content": "<|reserved_special_token_90|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181983": {
+      "content": "<|reserved_special_token_91|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181984": {
+      "content": "<|reserved_special_token_92|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181985": {
+      "content": "<|reserved_special_token_93|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "181986": {
+      "content": "<|reserved_special_token_94|>",
+      "lstrip": false,
+ "normalized": false,
800
+ "rstrip": false,
801
+ "single_word": false,
802
+ "special": true
803
+ },
804
+ "181987": {
805
+ "content": "<|reserved_special_token_95|>",
806
+ "lstrip": false,
807
+ "normalized": false,
808
+ "rstrip": false,
809
+ "single_word": false,
810
+ "special": true
811
+ },
812
+ "181988": {
813
+ "content": "<|reserved_special_token_96|>",
814
+ "lstrip": false,
815
+ "normalized": false,
816
+ "rstrip": false,
817
+ "single_word": false,
818
+ "special": true
819
+ },
820
+ "181989": {
821
+ "content": "<|reserved_special_token_97|>",
822
+ "lstrip": false,
823
+ "normalized": false,
824
+ "rstrip": false,
825
+ "single_word": false,
826
+ "special": true
827
+ },
828
+ "181990": {
829
+ "content": "<|reserved_special_token_98|>",
830
+ "lstrip": false,
831
+ "normalized": false,
832
+ "rstrip": false,
833
+ "single_word": false,
834
+ "special": true
835
+ },
836
+ "181991": {
837
+ "content": "<|reserved_special_token_99|>",
838
+ "lstrip": false,
839
+ "normalized": false,
840
+ "rstrip": false,
841
+ "single_word": false,
842
+ "special": true
843
+ },
844
+ "181992": {
845
+ "content": "<|reserved_special_token_100|>",
846
+ "lstrip": false,
847
+ "normalized": false,
848
+ "rstrip": false,
849
+ "single_word": false,
850
+ "special": true
851
+ },
852
+ "181993": {
853
+ "content": "<|reserved_special_token_101|>",
854
+ "lstrip": false,
855
+ "normalized": false,
856
+ "rstrip": false,
857
+ "single_word": false,
858
+ "special": true
859
+ },
860
+ "181994": {
861
+ "content": "<|reserved_special_token_102|>",
862
+ "lstrip": false,
863
+ "normalized": false,
864
+ "rstrip": false,
865
+ "single_word": false,
866
+ "special": true
867
+ },
868
+ "181995": {
869
+ "content": "<|reserved_special_token_103|>",
870
+ "lstrip": false,
871
+ "normalized": false,
872
+ "rstrip": false,
873
+ "single_word": false,
874
+ "special": true
875
+ },
876
+ "181996": {
877
+ "content": "<|reserved_special_token_104|>",
878
+ "lstrip": false,
879
+ "normalized": false,
880
+ "rstrip": false,
881
+ "single_word": false,
882
+ "special": true
883
+ },
884
+ "181997": {
885
+ "content": "<|reserved_special_token_105|>",
886
+ "lstrip": false,
887
+ "normalized": false,
888
+ "rstrip": false,
889
+ "single_word": false,
890
+ "special": true
891
+ },
892
+ "181998": {
893
+ "content": "<|reserved_special_token_106|>",
894
+ "lstrip": false,
895
+ "normalized": false,
896
+ "rstrip": false,
897
+ "single_word": false,
898
+ "special": true
899
+ },
900
+ "181999": {
901
+ "content": "<|reserved_special_token_107|>",
902
+ "lstrip": false,
903
+ "normalized": false,
904
+ "rstrip": false,
905
+ "single_word": false,
906
+ "special": true
907
+ },
908
+ "182000": {
909
+ "content": "<|reserved_special_token_108|>",
910
+ "lstrip": false,
911
+ "normalized": false,
912
+ "rstrip": false,
913
+ "single_word": false,
914
+ "special": true
915
+ },
916
+ "182001": {
917
+ "content": "<|reserved_special_token_109|>",
918
+ "lstrip": false,
919
+ "normalized": false,
920
+ "rstrip": false,
921
+ "single_word": false,
922
+ "special": true
923
+ },
924
+ "182002": {
925
+ "content": "<|reserved_special_token_110|>",
926
+ "lstrip": false,
927
+ "normalized": false,
928
+ "rstrip": false,
929
+ "single_word": false,
930
+ "special": true
931
+ },
932
+ "182003": {
933
+ "content": "<|reserved_special_token_111|>",
934
+ "lstrip": false,
935
+ "normalized": false,
936
+ "rstrip": false,
937
+ "single_word": false,
938
+ "special": true
939
+ },
940
+ "182004": {
941
+ "content": "<|reserved_special_token_112|>",
942
+ "lstrip": false,
943
+ "normalized": false,
944
+ "rstrip": false,
945
+ "single_word": false,
946
+ "special": true
947
+ },
948
+ "182005": {
949
+ "content": "<|reserved_special_token_113|>",
950
+ "lstrip": false,
951
+ "normalized": false,
952
+ "rstrip": false,
953
+ "single_word": false,
954
+ "special": true
955
+ },
956
+ "182006": {
957
+ "content": "<|reserved_special_token_114|>",
958
+ "lstrip": false,
959
+ "normalized": false,
960
+ "rstrip": false,
961
+ "single_word": false,
962
+ "special": true
963
+ },
964
+ "182007": {
965
+ "content": "<|reserved_special_token_115|>",
966
+ "lstrip": false,
967
+ "normalized": false,
968
+ "rstrip": false,
969
+ "single_word": false,
970
+ "special": true
971
+ },
972
+ "182008": {
973
+ "content": "<|reserved_special_token_116|>",
974
+ "lstrip": false,
975
+ "normalized": false,
976
+ "rstrip": false,
977
+ "single_word": false,
978
+ "special": true
979
+ },
980
+ "182009": {
981
+ "content": "<|reserved_special_token_117|>",
982
+ "lstrip": false,
983
+ "normalized": false,
984
+ "rstrip": false,
985
+ "single_word": false,
986
+ "special": true
987
+ },
988
+ "182010": {
989
+ "content": "<|reserved_special_token_118|>",
990
+ "lstrip": false,
991
+ "normalized": false,
992
+ "rstrip": false,
993
+ "single_word": false,
994
+ "special": true
995
+ },
996
+ "182011": {
997
+ "content": "<|reserved_special_token_119|>",
998
+ "lstrip": false,
999
+ "normalized": false,
1000
+ "rstrip": false,
1001
+ "single_word": false,
1002
+ "special": true
1003
+ },
1004
+ "182012": {
1005
+ "content": "<|reserved_special_token_120|>",
1006
+ "lstrip": false,
1007
+ "normalized": false,
1008
+ "rstrip": false,
1009
+ "single_word": false,
1010
+ "special": true
1011
+ },
1012
+ "182013": {
1013
+ "content": "<|reserved_special_token_121|>",
1014
+ "lstrip": false,
1015
+ "normalized": false,
1016
+ "rstrip": false,
1017
+ "single_word": false,
1018
+ "special": true
1019
+ },
1020
+ "182014": {
1021
+ "content": "<|reserved_special_token_122|>",
1022
+ "lstrip": false,
1023
+ "normalized": false,
1024
+ "rstrip": false,
1025
+ "single_word": false,
1026
+ "special": true
1027
+ },
1028
+ "182015": {
1029
+ "content": "<|reserved_special_token_123|>",
1030
+ "lstrip": false,
1031
+ "normalized": false,
1032
+ "rstrip": false,
1033
+ "single_word": false,
1034
+ "special": true
1035
+ },
1036
+ "182016": {
1037
+ "content": "<img>",
1038
+ "lstrip": false,
1039
+ "normalized": false,
1040
+ "rstrip": false,
1041
+ "single_word": false,
1042
+ "special": true
1043
+ },
1044
+ "182017": {
1045
+ "content": "</img>",
1046
+ "lstrip": false,
1047
+ "normalized": false,
1048
+ "rstrip": false,
1049
+ "single_word": false,
1050
+ "special": true
1051
+ },
1052
+ "182018": {
1053
+ "content": "<IMG_CONTEXT>",
1054
+ "lstrip": false,
1055
+ "normalized": false,
1056
+ "rstrip": false,
1057
+ "single_word": false,
1058
+ "special": true
1059
+ },
1060
+ "182019": {
1061
+ "content": "<quad>",
1062
+ "lstrip": false,
1063
+ "normalized": false,
1064
+ "rstrip": false,
1065
+ "single_word": false,
1066
+ "special": true
1067
+ },
1068
+ "182020": {
1069
+ "content": "</quad>",
1070
+ "lstrip": false,
1071
+ "normalized": false,
1072
+ "rstrip": false,
1073
+ "single_word": false,
1074
+ "special": true
1075
+ },
1076
+ "182021": {
1077
+ "content": "<ref>",
1078
+ "lstrip": false,
1079
+ "normalized": false,
1080
+ "rstrip": false,
1081
+ "single_word": false,
1082
+ "special": true
1083
+ },
1084
+ "182022": {
1085
+ "content": "</ref>",
1086
+ "lstrip": false,
1087
+ "normalized": false,
1088
+ "rstrip": false,
1089
+ "single_word": false,
1090
+ "special": true
1091
+ },
1092
+ "182023": {
1093
+ "content": "<box>",
1094
+ "lstrip": false,
1095
+ "normalized": false,
1096
+ "rstrip": false,
1097
+ "single_word": false,
1098
+ "special": true
1099
+ },
1100
+ "182024": {
1101
+ "content": "</box>",
1102
+ "lstrip": false,
1103
+ "normalized": false,
1104
+ "rstrip": false,
1105
+ "single_word": false,
1106
+ "special": true
1107
+ }
1108
+ },
1109
+ "bos_token": "<|begin_of_text|>",
1110
+ "chat_template": "{% set loop_messages = messages %}{% if loop_messages[0]['role'] != 'system' %}{% set loop_messages = [{'role': 'system', 'content': '你是Qianfan-VL,由百度智能云千帆团队研发的多模态大模型。'}] + loop_messages %}{% endif %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n' + message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}",
1111
+ "clean_up_tokenization_spaces": true,
1112
+ "eos_token": "<|eot_id|>",
1113
+ "model_input_names": [
1114
+ "input_ids",
1115
+ "attention_mask"
1116
+ ],
1117
+ "model_max_length": 32768,
1118
+ "pad_token": "<|end_of_text|>",
1119
+ "tokenizer_class": "PreTrainedTokenizerFast"
1120
+ }
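
The `chat_template` in this config encodes Llama-3-style turn formatting: a default system message is prepended when the caller supplies none, `bos_token` is attached only to the first segment, and the prompt ends with an open assistant header. A minimal plain-Python sketch of what that Jinja template renders (the placeholder `SYSTEM_PROMPT` stands in for the config's Chinese default system prompt; `build_prompt` is a hypothetical helper, not part of the tokenizer API):

```python
# Sketch of the chat_template logic in plain Python (assumption: mirrors the
# Jinja template above; SYSTEM_PROMPT is a stand-in for the default prompt).
BOS = "<|begin_of_text|>"
SYSTEM_PROMPT = "SYSTEM_PROMPT"

def build_prompt(messages):
    # Prepend the default system message when the caller supplies none.
    if messages[0]["role"] != "system":
        messages = [{"role": "system", "content": SYSTEM_PROMPT}] + messages
    parts = []
    for i, m in enumerate(messages):
        # Each turn: header, blank line, trimmed content, end-of-turn token.
        seg = (f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
               f"{m['content'].strip()}<|eot_id|>")
        if i == 0:
            seg = BOS + seg  # bos_token is attached to the first segment only
        parts.append(seg)
    # Leave an open assistant header for the model to continue from.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)
```

In practice `tokenizer.apply_chat_template(messages, add_generation_prompt=True)` performs this rendering from the stored template; the sketch only makes the turn layout explicit.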