qqc1989 committed on
Commit
9b90330
·
verified ·
1 Parent(s): 958ab25

Update README.md

  1. README.md +143 -0
README.md CHANGED
@@ -37,3 +37,146 @@ For those who are interested in model conversion, you can try to export axmodel
37
  |Chips|w8a16|w4a16|
38
  |--|--|--|
39
  |AX650| 2.7 tokens/sec|5 tokens/sec|
40
+
41
+
42
+ ## How to use
43
+
44
+ Download all files from this repository to the device. The working directory should then look like this:
45
+
46
+ ```
47
+ root@ax650:/mnt/qtang/llm-test/deepseek-r1-7b# tree -L 1
48
+ .
49
+ ├── deepseek-r1-7b-gptq-int4-ax650
50
+ ├── deepseek-r1_tokenizer
51
+ ├── deepseek-r1_tokenizer.py
52
+ ├── main_axcl_aarch64
53
+ ├── main_axcl_x86
54
+ ├── main_prefill
55
+ ├── post_config.json
56
+ ├── run_deepseek-r1_7b_gptq_int4_ax650.sh
57
+ ├── run_deepseek-r1_7b_gptq_int4_axcl_aarch64.sh
58
+ └── run_deepseek-r1_7b_gptq_int4_axcl_x86.sh
59
+ ```
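Before starting anything, it can help to confirm that the download is complete. The following is a minimal sketch, not part of the repository: `missing_entries` is a hypothetical helper, and the names it checks are copied from the `tree -L 1` listing above.

```python
from pathlib import Path

# Entry names copied from the `tree -L 1` listing in the README.
EXPECTED_ENTRIES = [
    "deepseek-r1-7b-gptq-int4-ax650",
    "deepseek-r1_tokenizer",
    "deepseek-r1_tokenizer.py",
    "main_axcl_aarch64",
    "main_axcl_x86",
    "main_prefill",
    "post_config.json",
    "run_deepseek-r1_7b_gptq_int4_ax650.sh",
    "run_deepseek-r1_7b_gptq_int4_axcl_aarch64.sh",
    "run_deepseek-r1_7b_gptq_int4_axcl_x86.sh",
]


def missing_entries(root):
    """Return the expected names that do not exist under `root` (hypothetical helper)."""
    root = Path(root)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_entries(".")
    print("all files present" if not missing else f"missing: {missing}")
```

Run it from the directory that holds the downloaded files; an empty result means every entry from the listing is in place.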
60
+
61
+ #### Start the Tokenizer service
62
+
63
+ ```
64
+ root@ax650:/mnt/qtang/llm-test/deepseek-r1-7b# python deepseek-r1_tokenizer.py --port 12345
65
+ 151646 <|begin▁of▁sentence|> 151643 <|end▁of▁sentence|>
66
+ <|begin▁of▁sentence|>You are DeepSeek-R1, You are a helpful assistant.<|User|>hello world<|Assistant|>
67
+ [151646, 151646, 2610, 525, 18183, 39350, 10911, 16, 11, 1446, 525, 264, 10950, 17847, 13, 151644, 14990, 1879, 151645]
68
+ http://localhost:12345
69
+ ```
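The run scripts expect the tokenizer service to be up before inference starts. As a rough sketch, the check below only tests whether something accepts TCP connections on the port passed via `--port`; it makes no assumption about the service's HTTP API, and `is_port_open` is a hypothetical helper name.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 12345 matches the --port value used when starting deepseek-r1_tokenizer.py.
    print("tokenizer reachable:", is_port_open("localhost", 12345))
```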
70
+
71
+ #### Inference on an AX650 host, such as the M4N-Dock (AXera-Pi Pro) or the AX650N demo board
72
+
73
+ Open another terminal and run `run_deepseek-r1_7b_gptq_int4_ax650.sh`:
74
+
75
+ ```
76
+ root@ax650:/mnt/qtang/llm-test/deepseek-r1-7b# ./run_deepseek-r1_7b_gptq_int4_ax650.sh
77
+ [I][ Init][ 125]: LLM init start
78
+ bos_id: 151646, eos_id: 151643
79
+ 3% | ██ | 1 / 31 [0.00s<0.09s, 333.33 count/s] tokenizer init ok
80
+ 100% | ████████████████████████████████ | 31 / 31 [45.25s<45.25s, 0.69 count/s] init post axmodel ok,remain_cmm(7664 MB)[I][
81
+ [I][ Init][ 246]: kv_cache_size : 512, kv_cache_num: 1024
82
+ [I][ Init][ 254]: prefill_token_num : 128
83
+ [I][ load_config][ 281]: load config:
84
+ {
85
+ "enable_repetition_penalty": false,
86
+ "enable_temperature": true,
87
+ "enable_top_k_sampling": true,
88
+ "enable_top_p_sampling": false,
89
+ "penalty_window": 20,
90
+ "repetition_penalty": 1.2,
91
+ "temperature": 0.9,
92
+ "top_k": 10,
93
+ "top_p": 0.8
94
+ }
95
+
96
+ [I][ Init][ 268]: LLM init ok
97
+ Type "q" to exit, Ctrl+c to stop current running
98
+ <think>
99
+ I'm DeepSeek-R1, an AI assistant created exclusively by the Chinese Company DeepSeek.
100
+ I specialize in helping you tackle complex mathematical, coding, and logical challenges. I'll do my best to assist you.
101
+ </think>
102
+ I'm DeepSeek-R1, an AI assistant created exclusively by the Chinese Company DeepSeek.
103
+ I specialize in helping you tackle complex mathematical, coding, and logical challenges. I'll do my best to assist you.
104
+ [N][ Run][ 605]: hit eos,avg 4.52 token/s
105
+ ```
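The config printed in the log enables temperature scaling and top-k sampling (`temperature` 0.9, `top_k` 10) and leaves top-p and the repetition penalty off. The sketch below illustrates the generic technique those two settings usually refer to. It is not the runtime's actual implementation, and `sample_top_k` is a hypothetical name.

```python
import math
import random


def sample_top_k(logits, top_k=10, temperature=0.9, rng=random):
    """Sample a token id from the top_k highest logits after temperature scaling."""
    # Keep only the indices of the top_k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Temperature-scaled softmax weights, shifted by the max for numerical stability.
    scaled = [logits[i] / temperature for i in top]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    # random.choices normalises the weights itself.
    return rng.choices(top, weights=weights, k=1)[0]


if __name__ == "__main__":
    fake_logits = [random.gauss(0.0, 1.0) for _ in range(100)]  # stand-in for model output
    print("sampled token id:", sample_top_k(fake_logits))
```

A temperature below 1 sharpens the distribution toward the most likely tokens, and `top_k` caps how many candidates can be drawn at all.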
106
+
107
+ #### Inference with the M.2 accelerator card
108
+
109
+ [What is the M.2 accelerator card?](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html) This demo runs on a Raspberry Pi 5.
110
+
111
+ Open another terminal and run `run_deepseek-r1_7b_gptq_int4_axcl_aarch64.sh`:
112
+
113
+ ```
114
+ (base) axera@raspberrypi:~/samples/deepseek-r1-7b-gptq-int4 $ ./run_deepseek-r1_7b_gptq_int4_axcl_aarch64.sh
115
+ build time: Feb 13 2025 15:15:07
116
+ [I][ Init][ 111]: LLM init start
117
+ bos_id: 151646, eos_id: 151643
118
+ 100% | ████████████████████████████████ | 31 / 31 [67.43s<67.43s, 0.46 count/s] init post axmodel okremain_cmm(2739 MB)
119
+ [I][ Init][ 226]: max_token_len : 1024
120
+ [I][ Init][ 231]: kv_cache_size : 512, kv_cache_num: 1024
121
+ [I][ load_config][ 282]: load config:
122
+ {
123
+ "enable_repetition_penalty": false,
124
+ "enable_temperature": true,
125
+ "enable_top_k_sampling": true,
126
+ "enable_top_p_sampling": false,
127
+ "penalty_window": 20,
128
+ "repetition_penalty": 1.2,
129
+ "temperature": 0.9,
130
+ "top_k": 10,
131
+ "top_p": 0.8
132
+ }
133
+
134
+ [I][ Init][ 288]: LLM init ok
135
+ Type "q" to exit, Ctrl+c to stop current running
136
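+ >> The two legs of a right triangle are 3 and 4. What is the hypotenuse? Think briefly.
137
+ <think>
138
+ First, I need to find the length of the hypotenuse of a right triangle. The two legs are known to have lengths 3 and 4.
139
+ According to the Pythagorean theorem, the square of the hypotenuse equals the sum of the squares of the two legs. So the square of the hypotenuse is 3 squared plus 4 squared, that is, 9 plus 16, which equals 25.
140
+ Then, taking the square root of 25, I get a hypotenuse length of 5.
141
+ In the end, the length of the hypotenuse is 5.
142
+ </think>
143
+
144
+ To find the length of the hypotenuse of a right triangle whose two legs have lengths 3 and 4, we can use the Pythagorean theorem.
145
+ The Pythagorean theorem is expressed as:
146
+ \[
147
+ c = \sqrt{a^2 + b^2}
148
+ \]
149
+ where:
150
+ - \( c \) is the length of the hypotenuse,
151
+ - \( a \) and \( b \) are the lengths of the two legs.
152
+ Substituting the known values into the formula:
153
+ \[
154
+ c = \sqrt{3^2 + 4^2} = \sqrt{9 + 16} = \sqrt{25} = 5
155
+ \]
156
+ Therefore, the length of the hypotenuse is: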
157
+ \[
158
+ \boxed{5}
159
+ \]
160
+ [N][ Run][ 605]: hit eos,avg 4.64 token/s
161
+ >> q
162
+
163
+ (base) axera@raspberrypi:~ $ axcl-smi
164
+ +------------------------------------------------------------------------------------------------+
165
+ | AXCL-SMI V2.26.0_20250206225448 Driver V2.26.0_20250206225448 |
166
+ +-----------------------------------------+--------------+---------------------------------------+
167
+ | Card Name Firmware | Bus-Id | Memory-Usage |
168
+ | Fan Temp Pwr:Usage/Cap | CPU NPU | CMM-Usage |
169
+ |=========================================+==============+=======================================|
170
+ +-----------------------------------------+--------------+---------------------------------------+
171
+ | 0 AX650N V2.26.0 | 0000:05:00.0 | 175 MiB / 945 MiB |
172
+ | -- 61C -- / -- | 0% 0% | 4301 MiB / 7040 MiB |
173
+ +-----------------------------------------+--------------+---------------------------------------+
174
+
175
+ +------------------------------------------------------------------------------------------------+
176
+ | Processes: |
177
+ | Card PID Process Name NPU Memory Usage |
178
+ |================================================================================================|
179
+ | 0 63118 /home/axera/samples/deepseek-r1-7b-gptq-int4/main_axcl_aarch64 4316448 KiB |
180
+ +------------------------------------------------------------------------------------------------+
181
+
182
+ ```
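The figures in the log and the `axcl-smi` table are easier to compare after some light post-processing. The sketch below assumes only the line format shown above (`hit eos,avg 4.64 token/s`); `parse_avg_tokens_per_sec` and `kib_to_mib` are hypothetical helper names.

```python
import re

# Matches the throughput figure in lines such as "... hit eos,avg 4.64 token/s".
AVG_RE = re.compile(r"avg\s+([0-9]+(?:\.[0-9]+)?)\s+token/s")


def parse_avg_tokens_per_sec(line):
    """Return the throughput reported in a 'hit eos' log line, or None if absent."""
    match = AVG_RE.search(line)
    return float(match.group(1)) if match else None


def kib_to_mib(kib):
    """Convert KiB to MiB."""
    return kib / 1024


if __name__ == "__main__":
    print(parse_avg_tokens_per_sec("[N][ Run][ 605]: hit eos,avg 4.64 token/s"))  # 4.64
    # Process NPU memory from the axcl-smi process table, in MiB.
    print(round(kib_to_mib(4316448), 1))  # 4215.3
    # Share of the card's CMM in use: 4301 MiB of 7040 MiB.
    print(round(100 * 4301 / 7040, 1))  # 61.1
```

Read this way, the `main_axcl_aarch64` process accounts for about 4215 MiB of the 4301 MiB of CMM in use, which is roughly 61% of the card's 7040 MiB.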