spow12 committed on
Commit 936cfec · verified · 1 Parent(s): e3d6e36

Update README.md

Files changed (1): README.md (+155, −185)

README.md CHANGED
@@ -1,199 +1,169 @@
  ---
  library_name: transformers
- tags: []
  ---

- # Model Card for Model ID

- <!-- Provide a quick summary of what the model is/does. -->


  ## Model Details

  ### Model Description

- <!-- Provide a longer summary of what this model is. -->
-
- This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-
- - **Developed by:** [More Information Needed]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]
-
- ### Model Sources [optional]
-
- <!-- Provide the basic links for the model. -->
-
- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]
-
- ## Uses
-
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
- ### Direct Use
-
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-
- [More Information Needed]
-
- ### Downstream Use [optional]
-
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
- [More Information Needed]
-
- ### Out-of-Scope Use
-
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-
- [More Information Needed]
-
- ## Bias, Risks, and Limitations
-
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
-
- [More Information Needed]
-
- ### Recommendations
-
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-
- ## How to Get Started with the Model
-
- Use the code below to get started with the model.
-
- [More Information Needed]
-
- ## Training Details
-
- ### Training Data
-
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
- [More Information Needed]
-
- ### Training Procedure
-
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
- #### Preprocessing [optional]
-
- [More Information Needed]
-
-
- #### Training Hyperparameters
-
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-
- #### Speeds, Sizes, Times [optional]
-
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-
- [More Information Needed]
-
- ## Evaluation
-
- <!-- This section describes the evaluation protocols and provides the results. -->
-
- ### Testing Data, Factors & Metrics
-
- #### Testing Data
-
- <!-- This should link to a Dataset Card if possible. -->
-
- [More Information Needed]
-
- #### Factors
-
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
- [More Information Needed]
-
- #### Metrics
-
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
- [More Information Needed]
-
- ### Results
-
- [More Information Needed]
-
- #### Summary
-
-
-
- ## Model Examination [optional]
-
- <!-- Relevant interpretability work for the model goes here -->
-
- [More Information Needed]
-
- ## Environmental Impact
-
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-
- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]
-
- ## Technical Specifications [optional]
-
- ### Model Architecture and Objective
-
- [More Information Needed]
-
- ### Compute Infrastructure
-
- [More Information Needed]
-
- #### Hardware
-
- [More Information Needed]
-
- #### Software
-
- [More Information Needed]
-
- ## Citation [optional]
-
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
- **BibTeX:**
-
- [More Information Needed]
-
- **APA:**
-
- [More Information Needed]
-
- ## Glossary [optional]
-
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
- [More Information Needed]
-
- ## More Information [optional]
-
- [More Information Needed]
-
- ## Model Card Authors [optional]
-
- [More Information Needed]

- ## Model Card Contact

- [More Information Needed]
  ---
+ language:
+ - en
+ - ko
+ license: cc-by-nc-4.0
  library_name: transformers
+ tags:
+ - mergekit
+ - merge
+ base_model:
+ - mistral-community/pixtral-12b
+ pipeline_tag: image-text-to-text
  ---

+ # Pixtral-12b-korean-preview

+ Fine-tuned on Korean and English data to improve Korean performance.

+ ## Merge Details

+ Merged model created with [mergekit](https://github.com/arcee-ai/mergekit/tree/main/mergekit).
+
+ This model hasn't been fully tested, so your feedback will be invaluable in improving it.
+
+ ## Merge Configuration
+
+ ```yaml
+ models:
+   - model: spow12/Pixtral-12b-korean-base
+     layer_range: [0, 40]
+   - model: mistral-community/pixtral-12b
+     layer_range: [0, 40]
+ merge_method: slerp
+ base_model: mistral-community/pixtral-12b
+ parameters:
+   t:
+     - filter: self_attn
+       value: [0, 0.5, 0.3, 0.7, 1]
+     - filter: mlp
+       value: [1, 0.5, 0.7, 0.3, 0]
+     - value: 0.5 # fallback for rest of tensors
+ dtype: bfloat16
+ ```
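For intuition, here is a minimal NumPy sketch of what the configuration describes, not mergekit's actual implementation: SLERP between corresponding weight tensors of the two models, with the five anchor values of the `t` schedule spread across the 40 layers (even spacing is assumed here as a simplification).

```python
import numpy as np

def slerp(t, a, b, eps=1e-8):
    """Spherical linear interpolation between two flattened weight tensors."""
    a_n = a / (np.linalg.norm(a) + eps)
    b_n = b / (np.linalg.norm(b) + eps)
    dot = np.clip(np.dot(a_n, b_n), -1.0, 1.0)
    theta = np.arccos(dot)
    if theta < eps:  # nearly parallel tensors: fall back to linear interpolation
        return (1 - t) * a + t * b
    return (np.sin((1 - t) * theta) * a + np.sin(t * theta) * b) / np.sin(theta)

# Five anchor values (the self_attn schedule above), interpolated over layers 0..39.
anchors = [0, 0.5, 0.3, 0.7, 1]
layers = np.arange(40)
t_per_layer = np.interp(layers, np.linspace(0, 39, len(anchors)), anchors)

print(t_per_layer[0], t_per_layer[-1])  # endpoints follow the anchor list: 0.0 1.0
```

At `t = 0` a layer keeps one parent's weights, at `t = 1` the other's, and intermediate values blend along the great circle between the two tensors.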

  ## Model Details

  ### Model Description

+ - **Developed by:** spow12 (yw_nam)
+ - **Shared by:** spow12 (yw_nam)
+ - **Model type:** LLaVA
+ - **Language(s) (NLP):** Korean, English
+ - **Finetuned from model:** [mistral-community/pixtral-12b](https://huggingface.co/mistral-community/pixtral-12b)
+
+ ## Usage
+
+ ### Single image inference

+ ![image](https://encrypted-tbn2.gstatic.com/images?q=tbn:ANd9GcSXVmCeFm5GRrciuGCM502uv9xXVSrS9zDJZ1umCfoMero2MLxT)

+ ```python
+ import requests
+ import torch
+ from PIL import Image
+ from transformers import AutoProcessor, AutoModelForVision2Seq
+
+ model_id = 'spow12/Pixtral-12b-korean-preview'
+ model = AutoModelForVision2Seq.from_pretrained(
+     model_id,
+     device_map='auto',
+     torch_dtype=torch.bfloat16,
+ ).eval()
+ model.tie_weights()
+ processor = AutoProcessor.from_pretrained(model_id)
+
+ system = "You are a helpful assistant created by Yw nam"
+
+ chat = [
+     {
+         'role': 'system',
+         'content': system
+     },
+     {
+         "role": "user",
+         "content": [
+             {"type": "image"},
+             # "Describe the scenery shown in this image."
+             {"type": "text", "content": "이 이미지에 나와있는 풍경을 설명해줘"},
+         ]
+     }
+ ]
+ url = "https://encrypted-tbn2.gstatic.com/images?q=tbn:ANd9GcSXVmCeFm5GRrciuGCM502uv9xXVSrS9zDJZ1umCfoMero2MLxT"
+ image = Image.open(requests.get(url, stream=True).raw)
+
+ images = [[image]]
+ prompt = processor.apply_chat_template(chat, tokenize=False)
+
+ inputs = processor(text=prompt, images=images, return_tensors="pt").to(model.device)
+ generate_ids = model.generate(**inputs, max_new_tokens=500, do_sample=True, min_p=0.1, temperature=0.9)
+ output = processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)
+ print(output[0])
+
+ # Output (Korean description of a white, red-roofed lighthouse on a small rocky island):
+ """이 이미지는 바위 해안에 위치한 작은 섬에 위치한 고요한 해안 경치를 보여줍니다. 이 섬은 푸른 물로 둘러싸여 있으며, 그 위에는 붉은 지붕이 있는 하얀 등대가 서 있습니다. 등대는 섬의 중앙에 위치해 있으며, 바위 절벽과 연결된 돌다리가 이어져 있어 접근할 수 있습니다. 등대 주변의 바위 절벽은 파도가 부딪히며 장면에 역동적인 요소를 더합니다. 등대 너머로는 하늘이 맑고 푸르며, 전체적인 장면은 평화롭고 고요한 분위기를 자아냅니다."""
+ ```
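Both generation calls above use `min_p` sampling. As a rough sketch of the rule (not the transformers implementation): a token survives filtering only if its probability is at least `min_p` times the probability of the most likely token, and the surviving mass is then renormalized before sampling.

```python
import numpy as np

def min_p_filter(probs, min_p=0.1):
    """Sketch of min-p filtering: keep tokens with prob >= min_p * max prob,
    zero out the rest, and renormalize the survivors."""
    probs = np.asarray(probs, dtype=float)
    keep = probs >= min_p * probs.max()
    out = np.where(keep, probs, 0.0)
    return out / out.sum()

# With min_p=0.1 and a top probability of 0.70, the cutoff is 0.07:
# the 0.06 and 0.04 tokens are dropped, the rest renormalized.
p = min_p_filter([0.70, 0.20, 0.06, 0.04], min_p=0.1)
```

Unlike a fixed `top_p` cutoff, the threshold scales with the model's confidence: a sharply peaked distribution prunes aggressively, a flat one keeps more candidates.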
+
+ ### Multi image inference
+
+ <p align="center">
+ <img src="https://cloud.shopback.com/c_fit,h_750,w_750/store-service-tw/assets/20185/0476e480-b6c3-11ea-b541-2ba549204a69.png" width="300" style="display:inline-block;"/>
+ <img src="https://pbs.twimg.com/profile_images/1268196215587397634/sgD5ZWuO_400x400.png" width="300" style="display:inline-block;"/>
+ </p>
+
+ ```python
+ url_apple = "https://cloud.shopback.com/c_fit,h_750,w_750/store-service-tw/assets/20185/0476e480-b6c3-11ea-b541-2ba549204a69.png"
+ image_1 = Image.open(requests.get(url_apple, stream=True).raw)
+ url_microsoft = "https://pbs.twimg.com/profile_images/1268196215587397634/sgD5ZWuO_400x400.png"
+ image_2 = Image.open(requests.get(url_microsoft, stream=True).raw)
+ chat = [
+     {
+         'role': 'system',
+         'content': system
+     },
+     {
+         "role": "user",
+         "content": [
+             {"type": "image"},
+             {"type": "image"},
+             # "Tell me what you know about these two companies."
+             {"type": "text", "content": "두 기업에 대해서 아는걸 설명해줘."},
+         ]
+     }
+ ]
+
+ images = [[image_1, image_2]]
+ prompt = processor.apply_chat_template(chat, tokenize=False)
+ inputs = processor(text=prompt, images=images, return_tensors="pt").to(model.device)
+ generate_ids = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.7, min_p=0.1)
+ output = processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)
+ print(output[0])
+
+ # Output (Korean overview of Apple and Microsoft: founders, founding years, flagship products):
+ """두 기업은 각각 Apple과 Microsoft입니다.
+
+ 1. 애플:
+ 애플은 1976년에 스티브 잡스, 스티브 워즈니악, 로널드 웨인에게 설립된 미국의 다국적 기술 기업입니다. 애플의 주요 제품으로는 iPhone, iPad, Mac, Apple Watch가 있습니다. 이 회사는 혁신적인 디자인, 사용자 친화적인 인터페이스, 고품질의 하드웨어로 유명합니다. 애플은 또한 Apple Music, iCloud, App Store와 같은 다양한 소프트웨어 서비스와 플랫폼을 제공합니다. 애플은 혁신적인 제품과 강력한 브랜드로 잘 알려져 있으며, 2010년대 이후 세계에서 가장 가치 있는 기업 중 하나로 자리매김했습니다.
+
+ 2. 마이크로소프트:
+ 마이크로소프트는 1975년에 빌 게이츠와 폴 알렌에 의해 설립된 미국의 다국적 기술 기업입니다. 이 회사는 운영 체제, 소프트웨어, 개인용 컴퓨터, 전자제품 개발에 중점을 둡니다. 마이크로소프트의 주요 제품으로는 Windows 운영 체제, Microsoft Office 제품군, Xbox 게임 콘솔이 있습니다. 이 회사는 소프트웨어 개발, 클라우드 컴퓨팅, 인공지능 연구와 같은 분야에서도 중요한 역할을 하고 있습니다. 마이크로소프트는 혁신적인 기술과 강력한 비즈니스 솔루션으로 잘 알려져 있으며, 세계에서 가장 가치 있는 기업 중 하나로 자리매김했습니다"""
+ ```
+
+ ## Limitation
+
+ Overall, performance seems reasonable. However, it declines when processing images that contain text in languages other than English, likely because the model was trained primarily on English text and landscape images. Adding Korean data in future training is expected to improve performance.
+
+ ## Citation
+
+ ```bibtex
+ @misc{spow12/Pixtral-12b-korean-preview,
+   author    = { YoungWoo Nam },
+   title     = { spow12/Pixtral-12b-korean-preview },
+   year      = 2024,
+   url       = { https://huggingface.co/spow12/Pixtral-12b-korean-preview },
+   publisher = { Hugging Face }
+ }
+ ```