---
library_name: transformers
license: bsd-3-clause
base_model:
- jakiAJK/DeepSeek-R1-Distill-Qwen-7B_GPTQ-int4
tags:
- DeepSeek
- DeepSeek-R1-Distill-Qwen-7B
- DeepSeek-R1-Distill-Qwen-7B-GPTQ-int4
- GPTQ
- Int4
---

# DeepSeek-R1-Distill-Qwen-7B-GPTQ-Int4

This version of DeepSeek-R1-Distill-Qwen-7B has been converted to run on the Axera NPU using **w4a16** quantization.
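As a rough illustration of what w4a16 means (4-bit integer weights, 16-bit floating-point activations), the sketch below quantizes a small group of weights to signed int4 with one per-group scale, then dequantizes them on the fly before the multiply-accumulate. It uses plain round-to-nearest quantization, not the GPTQ algorithm that produced the base checkpoint. It also says nothing about how Pulsar2 or the NPU actually pack or compute int4 weights. The function names are hypothetical, and Python floats stand in for float16.

```python
# Minimal sketch of w4a16: int4 weight codes plus a per-group scale, float activations.
# Round-to-nearest only; GPTQ additionally compensates rounding error using calibration data.

def quantize_int4(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-group quantization of one weight group to signed int4 codes in [-8, 7]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale

def dequantize_int4(codes: list[int], scale: float) -> list[float]:
    """Map int4 codes back to floating point (float16 on the NPU, Python floats here)."""
    return [c * scale for c in codes]

def dot_w4a16(codes: list[int], scale: float, activations: list[float]) -> float:
    """Dequantize the weights and accumulate them against floating-point activations."""
    return sum(w * a for w, a in zip(dequantize_int4(codes, scale), activations))

if __name__ == "__main__":
    w = [0.12, -0.53, 0.07, 0.91, -0.25, 0.33, -0.78, 0.02]
    x = [1.0, 0.5, -1.0, 0.25, 2.0, -0.5, 0.75, 1.5]
    codes, scale = quantize_int4(w)
    print("int4 codes:", codes)
    print("exact dot:", sum(a * b for a, b in zip(w, x)))
    print("w4a16 dot:", dot_w4a16(codes, scale, x))
```

Each weight is stored in 4 bits instead of 16, at the cost of a rounding error of at most half a scale step per weight.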

This model has been optimized with the following LoRA: 

Compatible with Pulsar2 version 3.4 (not yet released).

## Conversion tool links

If you are interested in model conversion, you can try exporting the axmodel from the original repo: https://huggingface.co/jakiAJK/DeepSeek-R1-Distill-Qwen-7B_GPTQ-int4

[Pulsar2: How to convert an LLM from Hugging Face to axmodel](https://pulsar2-docs.readthedocs.io/en/latest/appendix/build_llm.html) 

[AXera NPU LLM Runtime](https://github.com/AXERA-TECH/ax-llm) 

## Supported Platforms

- AX650
  - AX650N DEMO Board
  - [M4N-Dock(爱芯派Pro)](https://wiki.sipeed.com/hardware/zh/maixIV/m4ndock/m4ndock.html)
  - [M.2 Accelerator card](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html)
 
| Chip | w8a16 | w4a16 |
|--|--|--|
| AX650 | 2.7 tokens/sec | 5 tokens/sec |
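To put the table in perspective, the sketch below converts the measured decode rates into per-token latency and an approximate wait for a 500-token reply. It also estimates the raw weight footprint at 8 and 4 bits. The 7-billion parameter count is a nominal assumption; the exact count, embeddings, quantization scales and KV cache are all ignored, so treat the memory figures as order-of-magnitude only.

```python
# Back-of-the-envelope numbers for the AX650 throughput table.
# Assumption: nominal 7e9 parameters; scales, embeddings and KV cache are ignored.

THROUGHPUT_TOKENS_PER_S = {"w8a16": 2.7, "w4a16": 5.0}  # from the table above
WEIGHT_BITS = {"w8a16": 8, "w4a16": 4}
NOMINAL_PARAMS = 7e9

def seconds_for_reply(tokens: int, tokens_per_s: float) -> float:
    """Time to decode `tokens` tokens at a constant rate."""
    return tokens / tokens_per_s

def weight_gib(params: float, bits: int) -> float:
    """Raw weight storage in GiB for `params` weights of `bits` bits each."""
    return params * bits / 8 / 2**30

for scheme, tps in THROUGHPUT_TOKENS_PER_S.items():
    print(
        f"{scheme}: {1000 / tps:.0f} ms/token, "
        f"~{seconds_for_reply(500, tps):.0f} s for 500 tokens, "
        f"~{weight_gib(NOMINAL_PARAMS, WEIGHT_BITS[scheme]):.1f} GiB of weights"
    )
```

With these assumptions, w4a16 stores half as many weight bytes as w8a16 and is about 1.85x faster (5 / 2.7) in the table.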


## How to use

Download all files from this repository to the device:

```
root@ax650:/mnt/qtang/llm-test/deepseek-r1-7b# tree -L 1
.
├── deepseek-r1-7b-gptq-int4-ax650
├── deepseek-r1_tokenizer
├── deepseek-r1_tokenizer.py
├── main_axcl_aarch64
├── main_axcl_x86
├── main_prefill
├── post_config.json
├── run_deepseek-r1_7b_gptq_int4_ax650.sh
├── run_deepseek-r1_7b_gptq_int4_axcl_aarch64.sh
└── run_deepseek-r1_7b_gptq_int4_axcl_x86.sh
```

#### Start the Tokenizer service

```
root@ax650:/mnt/qtang/llm-test/deepseek-r1-7b# python deepseek-r1_tokenizer.py --port 12345
151646 <|begin▁of▁sentence|> 151643 <|end▁of▁sentence|>
<|begin▁of▁sentence|>You are DeepSeek-R1, You are a helpful assistant.<|User|>hello world<|Assistant|>
[151646, 151646, 2610, 525, 18183, 39350, 10911, 16, 11, 1446, 525, 264, 10950, 17847, 13, 151644, 14990, 1879, 151645]
http://localhost:12345
```
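The log above shows the prompt layout the tokenizer service builds for a test message: a begin-of-sentence token, the system prompt, then `<|User|>`, the user text and `<|Assistant|>`. The sketch below reproduces that string. The function name is hypothetical, and the special-token spellings are copied from the log; the tokenizer files may encode them with different characters. It does not show how `deepseek-r1_tokenizer.py` and the runtime actually communicate over HTTP.

```python
# Hypothetical helper that rebuilds the single-turn prompt printed by the tokenizer service.
BOS = "<|begin▁of▁sentence|>"  # printed with id 151646 in the log
USER = "<|User|>"
ASSISTANT = "<|Assistant|>"
DEFAULT_SYSTEM = "You are DeepSeek-R1, You are a helpful assistant."

def build_prompt(user_text: str, system: str = DEFAULT_SYSTEM) -> str:
    """Return BOS + system prompt + user turn, ending where the model starts its answer."""
    return f"{BOS}{system}{USER}{user_text}{ASSISTANT}"

print(build_prompt("hello world"))
```

Ending the prompt with `<|Assistant|>` leaves the model positioned to generate the assistant turn, which, as the logs below show, begins with a `<think>` block.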

#### Inference on an AX650 host, such as M4N-Dock (爱芯派Pro) or the AX650N DEMO Board

Open another terminal (leaving the tokenizer service running) and run `run_deepseek-r1_7b_gptq_int4_ax650.sh`:

```
root@ax650:/mnt/qtang/llm-test/deepseek-r1-7b# ./run_deepseek-r1_7b_gptq_int4_ax650.sh
[I][                            Init][ 125]: LLM init start
bos_id: 151646, eos_id: 151643
  3% | ██                                |   1 /  31 [0.00s<0.09s, 333.33 count/s] tokenizer init ok
100% | ████████████████████████████████ |  31 /  31 [45.25s<45.25s, 0.69 count/s] init post axmodel ok,remain_cmm(7664 MB)[I][
[I][                            Init][ 246]: kv_cache_size : 512, kv_cache_num: 1024
[I][                            Init][ 254]: prefill_token_num : 128
[I][                     load_config][ 281]: load config:
{
    "enable_repetition_penalty": false,
    "enable_temperature": true,
    "enable_top_k_sampling": true,
    "enable_top_p_sampling": false,
    "penalty_window": 20,
    "repetition_penalty": 1.2,
    "temperature": 0.9,
    "top_k": 10,
    "top_p": 0.8
}

[I][                            Init][ 268]: LLM init ok
Type "q" to exit, Ctrl+c to stop current running
<think>
I'm DeepSeek-R1, an AI assistant created exclusively by the Chinese Company DeepSeek.
I specialize in helping you tackle complex mathematical, coding, and logical challenges. I'll do my best to assist you.
</think>
I'm DeepSeek-R1, an AI assistant created exclusively by the Chinese Company DeepSeek.
 I specialize in helping you tackle complex mathematical, coding, and logical challenges. I'll do my best to assist you.
[N][                             Run][ 605]: hit eos,avg 4.52 token/s
```
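The `load config` block printed at startup enables temperature scaling and top-k sampling (temperature 0.9, top_k 10) and disables top-p and the repetition penalty; these values presumably come from `post_config.json`. As an illustration only, the sketch below applies those settings to a vector of logits in pure Python. It is a generic temperature/top-k/top-p sampler written for this note, not the ax-llm implementation. The repetition-penalty rule (divide positive logits, multiply negative ones) is an assumed common convention.

```python
import math
import random

# Settings mirrored from the "load config" output above.
CONFIG = {
    "enable_repetition_penalty": False,
    "enable_temperature": True,
    "enable_top_k_sampling": True,
    "enable_top_p_sampling": False,
    "penalty_window": 20,
    "repetition_penalty": 1.2,
    "temperature": 0.9,
    "top_k": 10,
    "top_p": 0.8,
}

def sample_next_token(logits, history, cfg=CONFIG, rng=random):
    """Pick a token id from raw logits using the enabled sampling options (illustrative sketch)."""
    scores = list(logits)
    if cfg["enable_repetition_penalty"]:
        # Assumed convention: penalize tokens seen within the last `penalty_window` steps.
        p = cfg["repetition_penalty"]
        for tok in set(history[-cfg["penalty_window"]:]):
            scores[tok] = scores[tok] / p if scores[tok] > 0 else scores[tok] * p
    if cfg["enable_temperature"]:
        scores = [s / cfg["temperature"] for s in scores]
    # Candidate ids sorted by score, highest first.
    candidates = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    if cfg["enable_top_k_sampling"]:
        candidates = candidates[: cfg["top_k"]]
    # Softmax over the remaining candidates (subtract the max for numerical stability).
    top = scores[candidates[0]]
    weights = [math.exp(scores[i] - top) for i in candidates]
    total = sum(weights)
    probs = [w / total for w in weights]
    if cfg["enable_top_p_sampling"]:
        # Keep the smallest prefix whose cumulative probability reaches top_p.
        kept, cum = 0, 0.0
        for p in probs:
            kept += 1
            cum += p
            if cum >= cfg["top_p"]:
                break
        candidates, probs = candidates[:kept], probs[:kept]
    return rng.choices(candidates, weights=probs, k=1)[0]

rng = random.Random(0)
logits = [0.1 * i for i in range(50)]
print(sample_next_token(logits, history=[], rng=rng))
```

With the printed config, only the 10 highest-scoring ids can ever be chosen, and a temperature below 1 sharpens the distribution among them. Setting `top_k` to 1 would make decoding greedy.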

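#### Inference with the M.2 Accelerator card

[What is the M.2 Accelerator card?](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html) This demo runs on a Raspberry Pi 5 host.

Open another terminal (leaving the tokenizer service running) and run `run_deepseek-r1_7b_gptq_int4_axcl_aarch64.sh`. The prompt in the session below is Chinese for "The two legs of a right triangle are 3 and 4; what is the hypotenuse? Think briefly." The model reasons (also in Chinese) with the Pythagorean theorem, c = √(3² + 4²) = √25 = 5, and answers 5. Afterwards, `axcl-smi` reports the card's memory usage.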

```
(base) axera@raspberrypi:~/samples/deepseek-r1-7b-gptq-int4 $ ./run_deepseek-r1_7b_gptq_int4_axcl_aarch64.sh
build time: Feb 13 2025 15:15:07
[I][                            Init][ 111]: LLM init start
bos_id: 151646, eos_id: 151643
100% | ████████████████████████████████ |  31 /  31 [67.43s<67.43s, 0.46 count/s] init post axmodel okremain_cmm(2739 MB)
[I][                            Init][ 226]: max_token_len : 1024
[I][                            Init][ 231]: kv_cache_size : 512, kv_cache_num: 1024
[I][                     load_config][ 282]: load config:
{
    "enable_repetition_penalty": false,
    "enable_temperature": true,
    "enable_top_k_sampling": true,
    "enable_top_p_sampling": false,
    "penalty_window": 20,
    "repetition_penalty": 1.2,
    "temperature": 0.9,
    "top_k": 10,
    "top_p": 0.8
}

[I][                            Init][ 288]: LLM init ok
Type "q" to exit, Ctrl+c to stop current running
>> 直角三角形两直角边是3和4,斜边是多少?简单思考
<think>
首先,我需要找到一个直角三角形的斜边长度。已知两条直角边的长度分别是3和4。
根据勾股定理,斜边的平方等于两条直角边的平方之和。因此,我可以计算出斜边长度的平方为3的平方加上4的平方,即9加上16,等于25。
然后,通过对25的平方根运算,我得到斜边的长度是5。
最终,斜边的长度是5。
</think>

要找到直角三角形的斜边长度,已知两条直角边的长度分别为3和4。我们可以使用勾股定理来计算斜边长度。
勾股定理的表达式是:
\[
c = \sqrt{a^2 + b^2}
\]
其中:
- \( c \) 是斜边的长度,
- \( a \) 和 \( b \) 是两条直角边的长度。
将已知数值代入公式:
\[
c = \sqrt{3^2 + 4^2} = \sqrt{9 + 16} = \sqrt{25} = 5
\]
因此,斜边的长度是:
\[
\boxed{5}
\]
[N][                             Run][ 605]: hit eos,avg 4.64 token/s
>> q

(base) axera@raspberrypi:~ $ axcl-smi
+------------------------------------------------------------------------------------------------+
| AXCL-SMI  V2.26.0_20250206225448                                Driver  V2.26.0_20250206225448 |
+-----------------------------------------+--------------+---------------------------------------+
| Card  Name                     Firmware | Bus-Id       |                          Memory-Usage |
| Fan   Temp                Pwr:Usage/Cap | CPU      NPU |                             CMM-Usage |
|=========================================+==============+=======================================|
+-----------------------------------------+--------------+---------------------------------------+
|    0  AX650N                    V2.26.0 | 0000:05:00.0 |                175 MiB /      945 MiB |
|   --   61C                      -- / -- | 0%        0% |               4301 MiB /     7040 MiB |
+-----------------------------------------+--------------+---------------------------------------+

+------------------------------------------------------------------------------------------------+
| Processes:                                                                                     |
| Card      PID  Process Name                                                   NPU Memory Usage |
|================================================================================================|
|    0    63118  /home/axera/samples/deepseek-r1-7b-gptq-int4/main_axcl_aarch64      4316448 KiB |
+------------------------------------------------------------------------------------------------+

```
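Both sessions show the model first emitting its reasoning between `<think>` and `</think>`, then the final answer. If you consume the runtime's text output in your own program, a small helper like the hypothetical one below can separate the two. It assumes the tags appear literally in the decoded text, as in the logs above. It also handles two edge cases: output that contains only the closing tag (for example, if a prompt template pre-fills `<think>`), and reasoning cut off before `</think>`.

```python
def split_reasoning(text: str) -> tuple[str, str]:
    """Split DeepSeek-R1 style output into (reasoning, answer). Hypothetical helper."""
    open_tag, close_tag = "<think>", "</think>"
    start = text.find(open_tag)
    # If there is no opening tag, assume any reasoning starts at the beginning of the text.
    body_start = 0 if start == -1 else start + len(open_tag)
    end = text.find(close_tag, body_start)
    if end == -1:
        if start == -1:
            return "", text.strip()  # no tags at all: the whole text is the answer
        return text[body_start:].strip(), ""  # reasoning was cut off before </think>
    return text[body_start:end].strip(), text[end + len(close_tag):].strip()

sample = "<think>\n3^2 + 4^2 = 25, sqrt(25) = 5.\n</think>\n\nThe hypotenuse is 5."
reasoning, answer = split_reasoning(sample)
print(answer)  # The hypotenuse is 5.
```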