AXERA-TECH/Qwen2.5-VL-3B-Instruct

Image-Text-to-Text


qqc1989 committed on Apr 6

Commit e7bfdf2 · verified · 1 Parent(s): 11b6418

Update README.md

Files changed (1)

README.md +22 -4


@@ -42,7 +42,7 @@ https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct
 **Image Process**
 |Chips| input size | image num | image encoder | ttft(320 tokens) | w8a16 | DDR | Flash |
 |--|--|--|--|--|--|--|--|
-|AX650| 448*448 | 1 | 780 ms | 420 ms | 6.2 tokens/sec| 4.3 GiB |  4.6 GiB  |
 **Video Process**
 |Chips| input size | image num | image encoder |ttft(512 tokens) | w8a16 | DDR | Flash |
@@ -104,15 +104,25 @@ root@ax650:/mnt/qtang/llm-test/qwen2.5-vl-3b# tree -L 2
 ```
 #### Install transformer
 ```
 pip install transformers==4.41.1
 ```
-#### Start the Tokenizer service
-**If you using image process**
 - input text
@@ -156,10 +166,18 @@ image >> image/ssd_car.jpg
 [N][                             Run][ 779]: hit eos,avg 5.96 token/s
 ```
-**If you using video process**
 Please pre-process the image of the video file into a 308x308 size picture
 ```
 root@ax650:/mnt/qtang/llm-test/qwen2.5-vl-3b# ./run_qwen2_5_vl_video.sh
 [I][                            Init][ 129]: LLM init start

 **Image Process**
 |Chips| input size | image num | image encoder | ttft(320 tokens) | w8a16 | DDR | Flash |
 |--|--|--|--|--|--|--|--|
+|AX650| 448*448 | 1 | 780 ms | 2857 ms | 6.2 tokens/sec| 4.3 GiB |  4.6 GiB  |
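
As a rough reading aid for the updated row above, the figures can be combined into an end-to-end latency estimate. This is only a sketch under stated assumptions: it treats the image encoder time and the ttft as sequential and additive, and it treats the decode rate as constant, none of which the README states. The function name `estimate_latency_ms` is hypothetical.

```python
def estimate_latency_ms(encoder_ms: float, ttft_ms: float,
                        tokens_per_sec: float, new_tokens: int) -> float:
    """Rough end-to-end latency in milliseconds.

    Assumption (not stated in the README): the image encoder, the prefill
    (ttft) and the decode phase run one after another with no overlap,
    and decoding proceeds at a constant rate.
    """
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    if new_tokens < 0:
        raise ValueError("new_tokens must be non-negative")
    decode_ms = new_tokens / tokens_per_sec * 1000.0
    return encoder_ms + ttft_ms + decode_ms


if __name__ == "__main__":
    # Values from the AX650 image row: 780 ms encoder, 2857 ms ttft,
    # 6.2 tokens/sec. The 100 generated tokens are an arbitrary example.
    total = estimate_latency_ms(780, 2857, 6.2, 100)
    print(f"estimated total: {total / 1000:.1f} s")
```

With these assumptions the decode phase dominates for long answers, so the ttft correction in this commit matters most for short replies.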
 **Video Process**
 |Chips| input size | image num | image encoder |ttft(512 tokens) | w8a16 | DDR | Flash |
 ```
+### Prepare tokenizer server
 #### Install transformer
 ```
 pip install transformers==4.41.1
 ```
+### Demo Run
+#### Image understanding demo
+##### Start the tokenizer server for the image understanding demo
+```
+python3 qwen2_tokenizer_image_448.py --port 12345
+```
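
Before launching the demo binary, it can help to confirm that the tokenizer server is actually up. The check below is a minimal sketch that assumes the server accepts TCP connections on the port passed via `--port` and runs on the same machine. The README only shows the `--port 12345` flag, so the host and the helper name `tokenizer_server_reachable` are assumptions.

```python
import socket


def tokenizer_server_reachable(host: str = "127.0.0.1", port: int = 12345,
                               timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    Assumption: the tokenizer server listens on TCP at the port given
    with --port. This only checks that the port is open, not that the
    service behind it is the tokenizer.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if tokenizer_server_reachable():
        print("tokenizer server port is open")
    else:
        print("nothing is listening on 127.0.0.1:12345")
```

A failed check usually means the tokenizer script has not finished starting or was launched with a different port.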
+##### Run the image understanding demo
 - input text
 [N][                             Run][ 779]: hit eos,avg 5.96 token/s
 ```
+#### Video understanding demo
 Please pre-process the image of the video file into a 308x308 size picture
+##### Start the tokenizer server for the video understanding demo
+```
+python qwen2_tokenizer_video_308.py --port 12345
+```
+##### Run the video understanding demo
 ```
 root@ax650:/mnt/qtang/llm-test/qwen2.5-vl-3b# ./run_qwen2_5_vl_video.sh
 [I][                            Init][ 129]: LLM init start