qqc1989 commited on
Commit
d383d62
·
verified ·
1 Parent(s): 3cb8955

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +195 -3
README.md CHANGED
@@ -1,3 +1,195 @@
1
- ---
2
- license: mit
3
- ---
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: mit
3
+ language:
4
+ - en
5
+ - zh
6
+ base_model:
7
+ - Qwen/Qwen2.5-VL-3B-Instruct
8
+ pipeline_tag: image-text-to-text
9
+ library_name: transformers
10
+ tags:
11
+ - Qwen2.5-VL
12
+ - Qwen2.5-VL-3B-Instruct
13
+ - Int8
14
+ - VLM
15
+ ---
16
+
17
+ # Qwen2.5-VL-3B-Instruct
18
+
19
+ This version of Qwen2.5-VL-3B-Instruct has been converted to run on the Axera NPU using **w8a16** quantization.
20
+
21
+ This model has been optimized with the following LoRA:
22
+
23
+ Compatible with Pulsar2 version: 3.4
24
+
25
+ ## Convert tools links:
26
+
27
+ For those who are interested in model conversion, you can try to export axmodel through the original repo :
28
+ https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct
29
+
30
+ [Pulsar2 Link, How to Convert LLM from Huggingface to axmodel](https://pulsar2-docs.readthedocs.io/en/latest/appendix/build_llm.html)
31
+
32
+ [AXera NPU HOST LLM Runtime](https://github.com/AXERA-TECH/Qwen2.5-VL-3B-Instruct.axera)
33
+
34
+
35
+ ## Support Platform
36
+
37
+ - AX650
38
+ - AX650N DEMO Board
39
+ - [M4N-Dock(爱芯派Pro)](https://wiki.sipeed.com/hardware/zh/maixIV/m4ndock/m4ndock.html)
40
+ - [M.2 Accelerator card](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html)
41
+
42
+ **Image Process**
43
+ |Chips| input size | image num | image encoder | ttft(320 tokens) | w8a16 | DDR | Flash |
44
+ |--|--|--|--|--|--|--|--|
45
+ |AX650| 448*448 | 1 | 780 ms | 420 ms | 6.2 tokens/sec| 4.3 GiB | 4.6 GiB |
46
+
47
+ **Video Process**
48
+ |Chips| input size | image num | image encoder |ttft(512 tokens) | w8a16 | DDR | Flash |
49
+ |--|--|--|--|--|--|--|--|
50
+ |AX650| 308*308 | 8 | 1400 ms | 5400 ms | 6.1 tokens/sec| 4.4 GiB | 4.7 GiB |
51
+
52
+
53
+ ## How to use
54
+
55
+ Download all files from this repository to the device
56
+
57
+ **If you using AX650 Board**
58
+ ```
59
+ root@ax650:/mnt/qtang/llm-test/qwen2.5-vl-3b# tree -L 2
60
+ .
61
+ ├── image
62
+ │   └── ssd_car.jpg
63
+ ├── main
64
+ ├── python
65
+ │   ├── cv_resize.py
66
+ │   ├── infer_image.py
67
+ │   ├── infer_text.py
68
+ │   ├── infer_video.py
69
+ │   ├── preprocess.py
70
+ │   └── utils.py
71
+ ├── qwen2_5-vl-3b-image-ax650
72
+ │   ├── Qwen2.5-VL-3B-Instruct_vision_nchw448.axmodel
73
+ │   ├── model.embed_tokens.weight.bfloat16.bin
74
+ │   ├── qwen2_5_vl_p320_l0_together.axmodel
75
+ ......
76
+ │   ├── qwen2_5_vl_p320_l9_together.axmodel
77
+ │   └── qwen2_5_vl_post.axmodel
78
+ ├── qwen2_5-vl-3b-video-ax650
79
+ │   ├── Qwen2.5-VL-3B-Instruct_vision_nhwc.axmodel
80
+ │   ├── model.embed_tokens.weight.bfloat16.bin
81
+ │   ├── qwen2_5_vl_p512_l0_together.axmodel
82
+ ......
83
+ │   ├── qwen2_5_vl_p512_l9_together.axmodel
84
+ │   └── qwen2_5_vl_post.axmodel
85
+ ├── qwen2_5-vl-tokenizer
86
+ │   ├── chat_template.json
87
+ │   ├── config.json
88
+ │   ├── generation_config.json
89
+ │   ├── merges.txt
90
+ │   ├── model.safetensors.index.json
91
+ │   ├── preprocessor_config.json
92
+ │   ├── tokenizer.json
93
+ │   ├── tokenizer_config.json
94
+ │   └── vocab.json
95
+ ├── qwen2_tokenizer_image_448.py
96
+ ├── qwen2_tokenizer_video_308.py
97
+ ├── run_qwen2_5_vl_image.sh
98
+ ├── run_qwen2_5_vl_video.sh
99
+ └── video
100
+ ├── frame_0075.jpg
101
+ ......
102
+ └── frame_0089.jpg
103
+
104
+ ```
105
+
106
+ #### Install transformer
107
+
108
+ ```
109
+ pip install transformers==4.41.1
110
+ ```
111
+
112
+ #### Start the Tokenizer service
113
+
114
+ **If you using image process**
115
+
116
+ - input text
117
+
118
+ ```
119
+ 描述下图片
120
+ ```
121
+
122
+ - input image
123
+
124
+ ![](./image/ssd_car.jpg)
125
+
126
+ ```
127
+ root@ax650:/mnt/qtang/llm-test/qwen2.5-vl-3b# ./run_qwen2_5_vl_image.sh
128
+ [I][ Init][ 129]: LLM init start
129
+ bos_id: -1, eos_id: 151645
130
+ 2% | █ | 1 / 40 [0.01s<0.24s, 166.67 count/s] tokenizer init ok
131
+ [I][ Init][ 26]: LLaMaEmbedSelector use mmap
132
+ 100% | ████████████████████████████████ | 40 / 40 [38.23s<38.23s, 1.05 count/s] init vpm axmodel ok,remain_cmm(7600 MB)
133
+ [I][ Init][ 277]: max_token_len : 1023
134
+ [I][ Init][ 282]: kv_cache_size : 256, kv_cache_num: 1023
135
+ [I][ Init][ 290]: prefill_token_num : 320
136
+ [I][ Init][ 292]: vpm_height : 1024,vpm_width : 392
137
+ [I][ Init][ 301]: LLM init ok
138
+ Type "q" to exit, Ctrl+c to stop current running
139
+
140
+ prompt >> who are you?
141
+ image >>
142
+ [I][ Run][ 638]: ttft: 2854.47 ms
143
+ I am a large language model created by Alibaba Cloud. I am called Qwen.
144
+
145
+ [N][ Run][ 779]: hit eos,avg 6.05 token/s
146
+
147
+ prompt >> 描述下图片
148
+ image >> image/ssd_car.jpg
149
+ [I][ Encode][ 416]: image encode time : 795.614014 ms, size : 524288
150
+ [I][ Run][ 638]: ttft: 2856.88 ms
151
+ 这张图片展示了一条繁忙的城市街道。前景中,一名女子站在人行道上,她穿着黑色外套,面带微笑。她旁边是一辆红色的��层巴士,巴士上有一个广告,
152
+ 上面写着“THINGS GET MORE EXITING WHEN YOU SAY ‘YES’”。巴士的车牌号是“L15”。巴士旁边停着一辆黑色的小型货车。背景中可以看到一些商店和行人,
153
+ 街道两旁的建筑物是现代的玻璃幕墙建筑。整体氛围显得繁忙而充满活力。
154
+
155
+ [N][ Run][ 779]: hit eos,avg 5.96 token/s
156
+ ```
157
+
158
+ **If you using video process**
159
+
160
+ ```
161
+ root@ax650:/mnt/qtang/llm-test/qwen2.5-vl-3b# ./run_qwen2_5_vl_video.sh
162
+ [I][ Init][ 129]: LLM init start
163
+ bos_id: -1, eos_id: 151645
164
+ 2% | █ | 1 / 40 [0.00s<0.12s, 333.33 count/s] tokenizer init ok
165
+ [I][ Init][ 26]: LLaMaEmbedSelector use mmap
166
+ 100% | ████████████████████████████████ | 40 / 40 [40.05s<40.05s, 1.00 count/s] init vpm axmodel ok,remain_cmm(7680 MB)
167
+ [I][ Init][ 277]: max_token_len : 1023
168
+ [I][ Init][ 282]: kv_cache_size : 256, kv_cache_num: 1023
169
+ [I][ Init][ 290]: prefill_token_num : 512
170
+ [I][ Init][ 292]: vpm_height : 484,vpm_width : 392
171
+ [I][ Init][ 301]: LLM init ok
172
+ Type "q" to exit, Ctrl+c to stop current running
173
+
174
+ prompt >> 描述这个视频
175
+ image >> video
176
+ video/frame_0075.jpg
177
+ video/frame_0077.jpg
178
+ video/frame_0079.jpg
179
+ video/frame_0081.jpg
180
+ video/frame_0083.jpg
181
+ video/frame_0085.jpg
182
+ video/frame_0087.jpg
183
+ video/frame_0089.jpg
184
+ [I][ Encode][ 416]: image encode time : 1488.392944 ms, size : 991232
185
+ [I][ Run][ 638]: ttft: 5487.22 ms
186
+ 视频显示的是一个城市街道的场景。时间戳显示为2月26日,地点是xxx。视频中,一名穿着深色外套和牛仔裤的男子正在推着一个行李箱。
187
+ 突然,他似乎被什么东西绊倒,随后他摔倒在地。背景中可以看到一个广告牌,上面有一个绿色的图案,旁边停着一辆电动车。街道两旁有建筑物和树木,天气看起来有些阴沉。
188
+
189
+ [N][ Run][ 779]: hit eos,avg 5.94 token/s
190
+ ```
191
+
192
+ #### Inference with M.2 Accelerator card
193
+ What is M.2 Accelerator card?, Show this DEMO based on Raspberry PI 5.
194
+
195
+ TODO