360Zhinao3 (360智脑)
Feel free to visit 360Zhinao's official website https://ai.360.com to try the models.
Introduction
🎉🎉🎉 Qihoo 360 has open-sourced 360Zhinao3-7B, an upgraded version of its self-developed 7B-parameter model. It is now available in the 360zhinao3 repository on GitHub and is free for commercial use. The model's capabilities have improved across the board: among models with fewer than 10B parameters, 360Zhinao3-7B ranks first on multiple benchmarks.
- 360Zhinao3-7B
- 360Zhinao3-7B-Instruct
- 360Zhinao3-7B-O1.5
Notable features of our 360Zhinao3 models are:
360Zhinao3-7B is obtained by continued pre-training of 360Zhinao2-7B on 700B high-quality tokens. The two models have exactly the same architecture; the performance gain comes mainly from higher-quality training data.
News and Updates
- [2025.04.14] 🔥🔥🔥 We released the 360Zhinao3 series and open-sourced 360Zhinao3-7B, 360Zhinao3-7B-Instruct, and the long chain-of-thought model 360Zhinao3-7B-O1.5.
- [2024.11.18] We released 360Zhinao2-7B, providing access to both the Base model and Chat models with context lengths of 4K, 32K, and 360K.
- [2024.05.23] We released two models, 360Zhinao-search and 360Zhinao-1.8B-Reranking, which ranked first in the Retrieval and Reranking tasks of the C-MTEB Leaderboard, respectively.
- [2024.05.20] We extended llama3 and released llama3-8B-360Zhinao-360k-Instruct🤗
- [2024.04.12] We released 360Zhinao-7B v1.0, including the base model and three chat models with context lengths of 4K, 32K, and 360K. The technical report is on arXiv.
Download URL
Model Evaluation
Base Model
We used the open-source tool OpenCompass to evaluate the model across multiple dimensions. Its average benchmark score ranks first among models with fewer than 10B parameters, making it competitive at its size.
Type | Datasets | Language | glm4-9b | Qwen2.5-7B | internlm2.5-7b | Yi1.5-9B | gemma2-9b | Llama3.1-8B | 360Zhinao2-7B | 360Zhinao3-7B |
---|---|---|---|---|---|---|---|---|---|---|
Exam | ceval | zh | 75.83 | 81.41 | 77.71 | 73.51 | 56.36 | 51.67 | 83.04 | 84.7 |
Exam | mmlu | en | 75.5 | 75.5 | 71.55 | 71.43 | 72.22 | 66.75 | 67.84 | 75.42 |
Exam | cmmlu | zh | 74.24 | 81.79 | 78.77 | 74.2 | 58.89 | 52.49 | 73.8 | 82.17 |
Exam | ARC-c | en | 94.92 | 80 | 85.08 | 87.46 | 77.63 | 80.68 | 87.12 | 88.14 |
Exam | ARC-e | en | 98.41 | 84.83 | 95.24 | 94.53 | 78.84 | 89.77 | 92.77 | 94 |
Language | WiC | en | 51.57 | 52.82 | 50.78 | 50.63 | 50.47 | 50 | 49.84 | 50.31 |
Language | WSC | en | 68.27 | 68.27 | 69.23 | 66.35 | 68.27 | 67.31 | 65.38 | 71.15 |
Knowledge | BoolQ | en | 81.8 | 83.88 | 89.51 | 84.46 | 85.6 | 82.2 | 88.29 | 88.38 |
Knowledge | commonsense_qa | en | 71.17 | 73.22 | 68.55 | 71.58 | 68.47 | 71.25 | 69.78 | 71.33 |
Understanding | C3 | zh | 91.51 | 92 | 93.04 | 85.86 | 81.64 | 83.51 | 93.26 | 92.77 |
Understanding | race-middle | en | 91.99 | 91.02 | 92.06 | 91.16 | 88.09 | 81.69 | 90.46 | 90.04 |
Understanding | race-high | en | 90.71 | 87.91 | 90.08 | 88.34 | 82.08 | 78.73 | 86.74 | 85.96 |
Understanding | lcsts | zh | 18.29 | 15.82 | 15.96 | 16.49 | 10.62 | 17.29 | 18.61 | 18.85 |
Understanding | eprstmt-dev | zh | 91.88 | 86.88 | 91.25 | 91.88 | 48.12 | 83.12 | 90 | 92.50 |
Understanding | lambada | en | 71.67 | 71.14 | 69.98 | 70.64 | 75.43 | 74.23 | 72.56 | 68.17 |
Reasoning | hellaswag | en | 70.25 | 72.76 | 70.38 | 71.55 | 66.83 | 74.65 | 71.49 | 73.61 |
Reasoning | siqa | en | 81.73 | 72.52 | 78.97 | 76.2 | 58.96 | 64.18 | 77.12 | 79.02 |
Reasoning | bbh | en | 73.68 | 54.63 | 59.43 | 67.86 | 68.45 | 59.9 | 46.54 | 73.74 |
Code | humaneval | en | 69.51 | 75 | 60.37 | 26.22 | 5.49 | 27.44 | 60.98 | 64.63 |
Code | mbpp | en | 60 | 60 | 43.6 | 56.8 | 51.2 | 42.6 | 54 | 67.80 |
Math | math | en | 26.86 | 38 | 27.14 | 27.06 | 28.52 | 15.32 | 38.34 | 37.60 |
Math | gsm8k | en | 78.54 | 79.76 | 52.54 | 71.11 | 73.09 | 56.25 | 75.51 | 78.77 |
Overall | avg_zh | | 70.35 | 71.58 | 71.35 | 68.39 | 51.13 | 57.62 | 71.74 | 74.20 |
Overall | avg_all | | 73.11 | 71.78 | 69.60 | 68.88 | 61.60 | 62.32 | 70.61 | 74.83 |
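As a quick sanity check on the table, the avg_zh row appears to be the unweighted mean of the five zh datasets (ceval, cmmlu, C3, lcsts, eprstmt-dev). That definition is inferred from the numbers, not stated in this README, so treat the sketch below as an observation. The same simple mean over all rows does not reproduce avg_all, so no formula is assumed for that row.

```python
from statistics import mean

# Scores copied from the zh rows of the table above, in the order:
# ceval, cmmlu, C3, lcsts, eprstmt-dev.
zh_scores = {
    "glm4-9b": [75.83, 74.24, 91.51, 18.29, 91.88],
    "Qwen2.5-7B": [81.41, 81.79, 92.0, 15.82, 86.88],
    "360Zhinao2-7B": [83.04, 73.8, 93.26, 18.61, 90.0],
    "360Zhinao3-7B": [84.7, 82.17, 92.77, 18.85, 92.50],
}


def avg_zh(scores):
    """Unweighted mean of the zh scores, rounded to two decimals like the table."""
    return round(mean(scores), 2)


for model, scores in zh_scores.items():
    print(f"{model}: {avg_zh(scores):.2f}")
```

For these four models the printed values match the avg_zh row.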
Instruct Model
We evaluated and compared the 360Zhinao3-7B-Instruct model on three popular benchmarks: IFEval, MT-bench, and CFBench. It ranks first on both MT-bench and CFBench among open-source models of the same size class. On IFEval (strict prompt), it is second only to GLM4-9B-Chat and has the highest score among 7B models.
Model | MT-bench | IFEval (strict prompt) | CFBench CSR | CFBench ISR | CFBench PSR |
---|---|---|---|---|---|
Qwen2.5-7B-Instruct | 8.07 | 0.556 | 0.81 | 0.46 | 0.57 |
Yi-9B-16k-Chat | 7.44 | 0.455 | 0.75 | 0.4 | 0.52 |
GLM4-9B-Chat | 8.08 | 0.634 | 0.82 | 0.48 | 0.61 |
InternLM2.5-7B-Chat | 7.39 | 0.540 | 0.78 | 0.4 | 0.54 |
360Zhinao2-7B-Chat-4k | 7.86 | 0.577 | 0.8 | 0.44 | 0.57 |
360Zhinao3-7B-Instruct | 8.17 | 0.626 | 0.83 | 0.52 | 0.64 |
Long COT Model
We continued training 360Zhinao3-7B-Instruct with Zhinao's previously open-sourced Light-R1 method, applying Long COT fine-tuning as well as RFT and GRPO. A gap remains relative to the latest OpenThinker2-7B, but the model surpasses all earlier models built on the general-purpose Qwen2.5-7B-Instruct.
Model | Date | Base Model | AIME24 | AIME25 | GPQA Diamond |
---|---|---|---|---|---|
OpenThinker2-7B | 25.4.3 | Qwen2.5-7B-Instruct | 50 | 33.3 | 49.3 |
OpenThinker-7B | 25.1.28 | Qwen2.5-7B-Instruct | 31.3 | 23.3 | 42.4 |
360Zhinao3-7B-O1.5 | 25.4.14 | 360Zhinao3-7B-Instruct | 54.2 | 36.3 | 40.0 |
OpenR1-Qwen-7B | 25.2.11 | Qwen2.5-Math-7B-Instruct | 48.7 | 34.7 | 21.2 |
DeepSeek-R1-Distill-Qwen-7B | 25.1.20 | Qwen2.5-Math-7B-Instruct | 57.3 | 33.3 | 47.3 |
Light-R1-7B-DS | 25.3.12 | DeepSeek-R1-Distill-Qwen-7B | 59.1 | 44.3 | 49.4 |
Areal-boba-RL-7B | 25.3.31 | DeepSeek-R1-Distill-Qwen-7B | 61.9 | 48.3 | 47.6 |
Quickstart
Simple examples showing how to quickly use 360Zhinao3-7B, 360Zhinao3-7B-Instruct, and 360Zhinao3-7B-O1.5 with 🤗 Transformers.
🤗 Transformers
Demonstration of Base Model Inference
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers.generation import GenerationConfig
MODEL_NAME_OR_PATH = "qihoo360/360Zhinao3-7B"
tokenizer = AutoTokenizer.from_pretrained(
MODEL_NAME_OR_PATH,
trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
MODEL_NAME_OR_PATH,
trust_remote_code=True).cuda()
generation_config = GenerationConfig.from_pretrained(
MODEL_NAME_OR_PATH,
trust_remote_code=True)
generation_config.max_new_tokens = 1024
inputs = tokenizer('中国二十四节气\n1. 立春\n2. 雨水\n3. 惊蛰\n4. 春分\n5. 清明\n', return_tensors='pt')
inputs = inputs.to(model.device)
pred = model.generate(input_ids=inputs["input_ids"], generation_config=generation_config)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
Demonstration of Instruct Model Inference
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers.generation import GenerationConfig
MODEL_NAME_OR_PATH = "qihoo360/360Zhinao3-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(
MODEL_NAME_OR_PATH,
trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
MODEL_NAME_OR_PATH,
trust_remote_code=True).cuda()
generation_config = GenerationConfig.from_pretrained(
MODEL_NAME_OR_PATH,
trust_remote_code=True)
generation_config.max_new_tokens = 2048
messages = []
#round-1
print(f"user: 简单介绍一下刘德华")
messages.append({"role": "user", "content": "简单介绍一下刘德华"})
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)
pred = model.generate(input_ids=input_ids, generation_config=generation_config)
response = tokenizer.decode(pred.cpu()[0][len(input_ids[0]):], skip_special_tokens=True)
messages.append({"role": "assistant", "content": response})
print(f"gpt: {response}")
#round-2
print(f"user: 他有什么代表作?")
messages.append({"role": "user", "content": "他有什么代表作?"})
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)
pred = model.generate(input_ids=input_ids, generation_config=generation_config)
response = tokenizer.decode(pred.cpu()[0][len(input_ids[0]):], skip_special_tokens=True)
messages.append({"role": "assistant", "content": response})
print(f"gpt: {response}")
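The two rounds above repeat the same append, generate, append pattern. A minimal sketch of that pattern as a reusable helper follows. The names `chat_round`, `generate_fn`, and `echo_generate` are hypothetical and not part of the 360Zhinao3 code; `echo_generate` is only a stand-in so the sketch runs without a GPU.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def chat_round(messages: List[Message], user_text: str,
               generate_fn: Callable[[List[Message]], str]) -> str:
    """Append a user turn, generate a reply, and record it in the history."""
    messages.append({"role": "user", "content": user_text})
    response = generate_fn(messages)
    messages.append({"role": "assistant", "content": response})
    return response


# Stand-in generator (assumption): replace it with a function that calls
# tokenizer.apply_chat_template and model.generate as in the example above.
def echo_generate(messages: List[Message]) -> str:
    return f"(reply to: {messages[-1]['content']})"


history: List[Message] = []
for question in ["简单介绍一下刘德华", "他有什么代表作?"]:
    print("user:", question)
    print("gpt:", chat_round(history, question, echo_generate))
```

Because the full history is passed on every call, the second question can refer back to the first answer.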
Demonstration of Long COT Model Inference
import re
import json
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers.generation import GenerationConfig
MODEL_NAME_OR_PATH = "qihoo360/360Zhinao3-7B-O1.5"
tokenizer = AutoTokenizer.from_pretrained(
MODEL_NAME_OR_PATH,
trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
MODEL_NAME_OR_PATH,
trust_remote_code=True).cuda()
generation_config = GenerationConfig.from_pretrained(
MODEL_NAME_OR_PATH,
trust_remote_code=True)
generation_config.max_new_tokens = 2048
def extract_thinking_and_answer(input_string):
    thinking, answer = "", ""
    # Extract the answer: everything after the last </think> tag
    pattern_answer = r'.*</think>(.*)$'
    match_answer = re.search(pattern_answer, input_string, re.S)
    if match_answer:
        answer = match_answer.group(1)
    else:
        # No </think> tag: treat the whole output as the answer
        return thinking, input_string
    # Extract the thinking process between <think> and </think>
    pattern_thinking = r'<think>(.*?)</think>'
    match_thinking = re.search(pattern_thinking, input_string, re.S)
    if match_thinking:
        thinking = match_thinking.group(1)
    return thinking, answer
messages = []
messages.append({"role": "user", "content": "现有一笼子,里面有鸡和兔子若干只,数一数,共有头14个,腿38条,求鸡和兔子各有多少只?"})
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)
pred = model.generate(input_ids=input_ids, generation_config=generation_config)
response = tokenizer.decode(pred.cpu()[0][len(input_ids[0]):], skip_special_tokens=True)
thinking, answer = extract_thinking_and_answer(response)
messages.append({"role": "assistant", "content": answer, "reasoning_content": thinking})
print(json.dumps(messages, ensure_ascii=False, indent=4))
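To see how the thinking/answer split behaves without loading the model, here is a small self-contained sketch. `split_reasoning` is a hypothetical, more compact formulation of the same logic using `str.rpartition`, and the sample strings are made up for illustration.

```python
import re


def split_reasoning(text):
    """Return (thinking, answer), following the same rules as the example above."""
    # Everything after the last </think> is the answer.
    _, sep, answer = text.rpartition("</think>")
    if not sep:
        # No closing tag: the whole text is returned as the answer.
        return "", text
    # The first <think>...</think> span is the thinking process.
    match = re.search(r"<think>(.*?)</think>", text, re.S)
    return (match.group(1) if match else ""), answer


complete = "<think>x + y = 14 and 2x + 4y = 38, so y = 5, x = 9.</think>9 chickens and 5 rabbits."
truncated = "<think>x + y = 14 and 2x + 4y = 38, so"

print(split_reasoning(complete))
print(split_reasoning(truncated))
```

Note the second case: if generation stops before `</think>` is emitted (for example because `max_new_tokens` is reached), the whole output, including the opening tag, comes back as the answer with empty thinking.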
Model Inference
Deployment
vLLM Installation
We recommend using vllm==0.6.0.
If you are using CUDA 12.1 and PyTorch 2.1, you can install vLLM directly with:
pip install vllm==0.6.0
Otherwise, please refer to the official vLLM Installation Instructions.
After installation, perform the following steps:
Copy vllm/zhinao.py into vllm/model_executor/models in your vllm installation directory (inside your python/conda env). Then add the following line to vllm/model_executor/models/__init__.py:
"ZhinaoForCausalLM": ("zhinao", "ZhinaoForCausalLM"),
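If you are unsure where the vllm installation directory is, the following sketch locates it using only the standard library. The helper name `vllm_models_dir` is hypothetical, and the `model_executor/models` layout is taken from the instructions above rather than checked against your installed version.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def vllm_models_dir(package: str = "vllm") -> Optional[Path]:
    """Return <package>/model_executor/models, or None if the package is not installed."""
    # find_spec locates a top-level package without importing it.
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None
    package_root = Path(next(iter(spec.submodule_search_locations)))
    return package_root / "model_executor" / "models"


models_dir = vllm_models_dir()
if models_dir is None:
    print("vllm is not installed in this environment")
else:
    print("copy vllm/zhinao.py to:", models_dir / "zhinao.py")
    print("then edit:", models_dir / "__init__.py")
```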
vLLM Service Start
Start the service:
python -m vllm.entrypoints.openai.api_server \
--model qihoo360/360Zhinao3-7B-O1.5 \
--served-model-name 360Zhinao3-7B-O1.5 \
--port 8360 \
--host 0.0.0.0 \
--dtype bfloat16 \
--tensor-parallel-size 4 \
--gpu-memory-utilization 0.8 \
--trust-remote-code
Use curl to request the service:
curl http://localhost:8360/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "360Zhinao3-7B-O1.5",
"max_tokens": 200,
"top_k": -1,
"top_p": 0.8,
"temperature": 1.0,
"presence_penalty": 0.0,
"frequency_penalty": 0.0,
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "你好"}
],
"stop": [
"<eod>",
"<|im_end|>",
"<|im_start|>"
]
}'
Use python to request the service:
from openai import OpenAI
# Set OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8360/v1"
client = OpenAI(
api_key=openai_api_key,
base_url=openai_api_base,
)
chat_response = client.chat.completions.create(
model="360Zhinao3-7B-O1.5",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "你好"},
],
stop=[
"<eod>",
"<|im_end|>",
"<|im_start|>"
],
presence_penalty=0.0,
frequency_penalty=0.0
)
print("Chat response:", chat_response)
If you need to enable repetition penalty, we recommend setting presence_penalty and frequency_penalty instead of repetition_penalty.
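To make the difference between these parameters concrete, here is a sketch of how the two kinds of penalty are commonly defined: presence and frequency penalties are subtracted from the logits of already-generated tokens (as described in the OpenAI API documentation), while a repetition penalty rescales them multiplicatively (the rule used by Hugging Face's `repetition_penalty`). This is an illustration under those assumptions, not vLLM's actual implementation, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List


def apply_additive_penalties(logits: List[float], output_counts: Dict[int, int],
                             presence_penalty: float = 0.0,
                             frequency_penalty: float = 0.0) -> List[float]:
    """Subtract a flat presence term and a count-scaled frequency term."""
    adjusted = list(logits)
    for token_id, count in output_counts.items():
        if count > 0:
            adjusted[token_id] -= presence_penalty + frequency_penalty * count
    return adjusted


def apply_repetition_penalty(logits: List[float], seen_token_ids: Iterable[int],
                             repetition_penalty: float = 1.0) -> List[float]:
    """Divide positive logits and multiply negative logits of seen tokens."""
    adjusted = list(logits)
    for token_id in seen_token_ids:
        value = adjusted[token_id]
        adjusted[token_id] = value / repetition_penalty if value > 0 else value * repetition_penalty
    return adjusted


logits = [2.0, -1.0, 0.5]
additive = apply_additive_penalties(logits, {0: 2, 2: 1},
                                    presence_penalty=0.5, frequency_penalty=0.25)
multiplicative = apply_repetition_penalty(logits, {0, 1}, repetition_penalty=2.0)
print(additive)        # [1.0, -1.0, -0.25]
print(multiplicative)  # [1.0, -2.0, 0.5]
```

The additive form grows with how often a token has appeared, whereas the multiplicative form applies the same factor regardless of count.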
Model Finetune
Training data
Training Data: data/training_data_sample.json. This example data has 10,000 rows sampled from multiturn_chat_0.8M and converted to the format below.
Data Format:
[
{
"id": 1,
"conversations": [
{
"from": "system",
"value": "You are a helpful assistant."
},
{
"from": "user",
"value": "您好啊"
},
{
"from": "assistant",
"value": "你好!我今天能为您做些什么?有什么问题或需要帮助吗? 我在这里为您提供服务。"
}
]
}
]
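Before launching a long run, it can help to validate your own data against this format. The sketch below checks one record and converts it to the role/content message list used in the inference examples. The allowed role set is an assumption inferred from the sample above, and `to_messages` is a hypothetical helper, not part of finetune.py.

```python
import json

# Assumption: only the three roles that appear in the sample are valid.
ALLOWED_ROLES = {"system", "user", "assistant"}


def to_messages(record):
    """Validate one training record and convert it to role/content messages."""
    if "conversations" not in record:
        raise ValueError(f"record {record.get('id')!r} has no 'conversations'")
    messages = []
    for turn in record["conversations"]:
        role, value = turn.get("from"), turn.get("value")
        if role not in ALLOWED_ROLES:
            raise ValueError(f"record {record.get('id')!r}: unexpected role {role!r}")
        if not isinstance(value, str):
            raise ValueError(f"record {record.get('id')!r}: 'value' must be a string")
        messages.append({"role": role, "content": value})
    return messages


sample = json.loads("""
[
  {
    "id": 1,
    "conversations": [
      {"from": "system", "value": "You are a helpful assistant."},
      {"from": "user", "value": "您好啊"},
      {"from": "assistant", "value": "你好!我今天能为您做些什么?"}
    ]
  }
]
""")

for record in sample:
    print(to_messages(record))
```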
Finetuning scripts
set -x
HOSTFILE=hostfile
DS_CONFIG=./finetune/ds_config_zero2.json
# PARAMS
LR=5e-6
EPOCHS=3
MAX_LEN=32768
BATCH_SIZE=4
NUM_NODES=1
NUM_GPUS=8
MASTER_PORT=29500
IS_CONCAT=False # Whether to concatenate to maximum length (MAX_LEN)
DATA_PATH="./data/training_data_sample.json"
MODEL_PATH="qihoo360/360Zhinao3-7B-Instruct"
OUTPUT_DIR="./outputs/"
deepspeed --hostfile ${HOSTFILE} \
--master_port ${MASTER_PORT} \
--num_nodes ${NUM_NODES} \
--num_gpus ${NUM_GPUS} \
finetune.py \
--report_to "tensorboard" \
--data_path ${DATA_PATH} \
--model_name_or_path ${MODEL_PATH} \
--output_dir ${OUTPUT_DIR} \
--model_max_length ${MAX_LEN} \
--num_train_epochs ${EPOCHS} \
--per_device_train_batch_size ${BATCH_SIZE} \
--gradient_accumulation_steps 1 \
--save_strategy steps \
--save_steps 200 \
--learning_rate ${LR} \
--lr_scheduler_type cosine \
--adam_beta1 0.9 \
--adam_beta2 0.95 \
--adam_epsilon 1e-8 \
--max_grad_norm 1.0 \
--weight_decay 0.1 \
--warmup_ratio 0.01 \
--gradient_checkpointing True \
--bf16 True \
--tf32 True \
--deepspeed ${DS_CONFIG} \
--is_concat ${IS_CONCAT} \
--logging_steps 1 \
--log_on_each_node False
Launch finetuning with:
bash finetune/ds_finetune.sh
- Configuring HOSTFILE switches between single-machine and multi-machine training.
- Configuring ds_config switches between zero1, zero2 and zero3.
- fp16 and bf16 configure mixed-precision training. bf16 is recommended, to be consistent with the pretrained model.
- is_concat configures whether the training data is concatenated or not.
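When adjusting these settings, it is useful to know the effective global batch size implied by the script. A minimal sketch of the arithmetic follows, assuming standard data parallelism and that NUM_GPUS counts GPUs per node; the helper name is hypothetical.

```python
def global_batch_size(per_device_batch: int, num_gpus: int,
                      num_nodes: int = 1, grad_accum_steps: int = 1) -> int:
    """Sequences consumed per optimizer step across all devices."""
    return per_device_batch * num_gpus * num_nodes * grad_accum_steps


# Values from the script above: BATCH_SIZE=4, NUM_GPUS=8, NUM_NODES=1,
# gradient_accumulation_steps=1, MAX_LEN=32768.
batch = global_batch_size(per_device_batch=4, num_gpus=8, num_nodes=1, grad_accum_steps=1)
max_tokens_per_step = batch * 32768  # upper bound, reached only if every sequence is MAX_LEN long

print(batch)                # 32
print(max_tokens_per_step)  # 1048576
```

If you change NUM_NODES or NUM_GPUS, you may want to adjust BATCH_SIZE or the gradient accumulation steps to keep this product roughly constant.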
License
The source code of this repository follows the open-source license Apache 2.0.
360Zhinao3 open-source models support free commercial use. It is not necessary for you to submit a request for commercial usage.