360Zhinao3 (360智脑)

🤗 HuggingFace   |    💬 WeChat (微信)  

Visit 360Zhinao's official website, https://ai.360.com, to try more of its features.


Introduction

🎉🎉🎉 Qihoo 360 has recently open-sourced 360Zhinao3-7B, an upgrade of its self-developed 7B-parameter model. The model is available in the 360zhinao3 GitHub repository and is free for commercial use. Its capabilities have improved across the board: among models with fewer than 10B parameters, 360Zhinao3-7B ranks first on multiple benchmarks. This release includes the following models:

  • 360Zhinao3-7B
  • 360Zhinao3-7B-Instruct
  • 360Zhinao3-7B-O1.5

Notable features of our 360Zhinao3 models are:

360Zhinao3-7B was obtained by continued pre-training of 360Zhinao2-7B on 700B high-quality tokens. The two models share exactly the same architecture; the performance gain comes mainly from higher-quality training data.


News and Updates

  • [2025.04.14] 🔥🔥🔥 We released the 360Zhinao3 series and open-sourced 360Zhinao3-7B, 360Zhinao3-7B-Instruct, and the long chain-of-thought model 360Zhinao3-7B-O1.5.
  • [2024.11.18] We released 360Zhinao2-7B, providing access to both the Base model and Chat models with context lengths of 4K, 32K, and 360K.
  • [2024.05.23] We released two models, 360Zhinao-search and 360Zhinao-1.8B-Reranking, which ranked first in the Retrieval and Reranking tasks, respectively, of the C-MTEB Leaderboard.
  • [2024.05.20] We extended llama3 and released llama3-8B-360Zhinao-360k-Instruct🤗
  • [2024.04.12] We released 360Zhinao-7B v1.0, including the base model and three chat models with context lengths of 4K, 32K and 360K. The technical report is on arXiv.


Download URL

Size Model BF16
7B 360Zhinao3-7B 🤗
7B 360Zhinao3-7B-Instruct 🤗
7B 360Zhinao3-7B-O1.5 🤗

Model Evaluation

Base Model

We used the open-source tool OpenCompass to evaluate the model across multiple dimensions. Its average benchmark score ranks first among models with fewer than 10B parameters, making it competitive for its size.

Type Dataset Language glm4-9b Qwen2.5-7B internlm2.5-7b Yi1.5-9B gemma2-9b Llama3.1-8B 360Zhinao2-7B 360Zhinao3-7B
Exam ceval zh 75.83 81.41 77.71 73.51 56.36 51.67 83.04 84.7
Exam mmlu en 75.5 75.5 71.55 71.43 72.22 66.75 67.84 75.42
Exam cmmlu zh 74.24 81.79 78.77 74.2 58.89 52.49 73.8 82.17
Exam ARC-c en 94.92 80 85.08 87.46 77.63 80.68 87.12 88.14
Exam ARC-e en 98.41 84.83 95.24 94.53 78.84 89.77 92.77 94
Language WiC en 51.57 52.82 50.78 50.63 50.47 50 49.84 50.31
Language WSC en 68.27 68.27 69.23 66.35 68.27 67.31 65.38 71.15
Knowledge BoolQ en 81.8 83.88 89.51 84.46 85.6 82.2 88.29 88.38
Knowledge commonsense_qa en 71.17 73.22 68.55 71.58 68.47 71.25 69.78 71.33
Understanding C3 zh 91.51 92 93.04 85.86 81.64 83.51 93.26 92.77
Understanding race-middle en 91.99 91.02 92.06 91.16 88.09 81.69 90.46 90.04
Understanding race-high en 90.71 87.91 90.08 88.34 82.08 78.73 86.74 85.96
Understanding lcsts zh 18.29 15.82 15.96 16.49 10.62 17.29 18.61 18.85
Understanding eprstmt-dev zh 91.88 86.88 91.25 91.88 48.12 83.12 90 92.50
Understanding lambada en 71.67 71.14 69.98 70.64 75.43 74.23 72.56 68.17
Reasoning hellaswag en 70.25 72.76 70.38 71.55 66.83 74.65 71.49 73.61
Reasoning siqa en 81.73 72.52 78.97 76.2 58.96 64.18 77.12 79.02
Reasoning bbh en 73.68 54.63 59.43 67.86 68.45 59.9 46.54 73.74
Code humaneval en 69.51 75 60.37 26.22 5.49 27.44 60.98 64.63
Code mbpp en 60 60 43.6 56.8 51.2 42.6 54 67.80
Math math en 26.86 38 27.14 27.06 28.52 15.32 38.34 37.60
Math gsm8k en 78.54 79.76 52.54 71.11 73.09 56.25 75.51 78.77
Overall avg_zh - 70.35 71.58 71.35 68.39 51.13 57.62 71.74 74.20
Overall avg_all - 73.11 71.78 69.60 68.88 61.60 62.32 70.61 74.83
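
The avg_zh row appears to be the plain arithmetic mean of the five Chinese-language datasets (ceval, cmmlu, C3, lcsts, eprstmt-dev). The source does not state how the averages are computed, so this is an assumption; the sketch below only checks that it reproduces the reported numbers for two of the models. The helper name `mean_score` is illustrative and not part of any evaluation tool.

```python
# Scores copied from the table above, in the order:
# ceval, cmmlu, C3, lcsts, eprstmt-dev (all the "zh" rows).
zh_scores = {
    "glm4-9b": [75.83, 74.24, 91.51, 18.29, 91.88],
    "360Zhinao3-7B": [84.7, 82.17, 92.77, 18.85, 92.50],
}


def mean_score(scores):
    """Arithmetic mean rounded to two decimals (assumed averaging rule)."""
    return round(sum(scores) / len(scores), 2)


for model_name, scores in zh_scores.items():
    print(model_name, mean_score(scores))
```

Under this assumption the results match the reported avg_zh values of 70.35 and 74.20.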

Instruct Model

We evaluated the 360Zhinao3-7B-Instruct model on three popular benchmarks: IFEval, MT-bench, and CFBench. On MT-bench and CFBench it ranks first among the open-source models of comparable size listed below. On IFEval (strict prompt) it is second only to GLM4-9B-Chat and has the highest score among the 7B models.

Model MT-bench IFEval(strict prompt) CFBench-CSR CFBench-ISR CFBench-PSR
Qwen2.5-7B-Instruct 8.07 0.556 0.81 0.46 0.57
Yi-9B-16k-Chat 7.44 0.455 0.75 0.4 0.52
GLM4-9B-Chat 8.08 0.634 0.82 0.48 0.61
InternLM2.5-7B-Chat 7.39 0.540 0.78 0.4 0.54
360Zhinao2-7B-Chat-4k 7.86 0.577 0.8 0.44 0.57
360Zhinao3-7B-Instruct 8.17 0.626 0.83 0.52 0.64

Long COT Model

We continued fine-tuning 360Zhinao3-7B-Instruct for long chain-of-thought (Long COT) reasoning using Zhinao's previously open-sourced Light-R1 method, together with RFT and GRPO. A gap remains relative to the latest OpenThinker2-7B, but the model surpasses all earlier models built on the general-purpose Qwen2.5-7B-Instruct.

Model Date Base Model AIME24 AIME25 GPQA Diamond
OpenThinker2-7B 25.4.3 Qwen2.5-7B-Instruct 50 33.3 49.3
OpenThinker-7B 25.1.28 Qwen2.5-7B-Instruct 31.3 23.3 42.4
360Zhinao3-7B-O1.5 25.4.14 360Zhinao3-7B-Instruct 54.2 36.3 40.0
OpenR1-Qwen-7B 25.2.11 Qwen2.5-Math-7B-Instruct 48.7 34.7 21.2
DeepSeek-R1-Distill-Qwen-7B 25.1.20 Qwen2.5-Math-7B-Instruct 57.3 33.3 47.3
Light-R1-7B-DS 25.3.12 DeepSeek-R1-Distill-Qwen-7B 59.1 44.3 49.4
Areal-boba-RL-7B 25.3.31 DeepSeek-R1-Distill-Qwen-7B 61.9 48.3 47.6

Quickstart

The following simple examples show how to use 360Zhinao3-7B, 360Zhinao3-7B-Instruct, and 360Zhinao3-7B-O1.5 with 🤗 Transformers.

🤗 Transformers

Demonstration of Base Model Inference

from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers.generation import GenerationConfig

MODEL_NAME_OR_PATH = "qihoo360/360Zhinao3-7B"

tokenizer = AutoTokenizer.from_pretrained(
    MODEL_NAME_OR_PATH, 
    trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True).cuda()

generation_config = GenerationConfig.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True)
generation_config.max_new_tokens = 1024

inputs = tokenizer('中国二十四节气\n1. 立春\n2. 雨水\n3. 惊蛰\n4. 春分\n5. 清明\n', return_tensors='pt')
inputs = inputs.to(model.device)

pred = model.generate(input_ids=inputs["input_ids"], generation_config=generation_config)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))

Demonstration of Instruct Model Inference

from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers.generation import GenerationConfig

MODEL_NAME_OR_PATH = "qihoo360/360Zhinao3-7B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True).cuda()

generation_config = GenerationConfig.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True)
generation_config.max_new_tokens = 2048

messages = []

#round-1
print(f"user: 简单介绍一下刘德华")
messages.append({"role": "user", "content": "简单介绍一下刘德华"})
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)
pred = model.generate(input_ids=input_ids, generation_config=generation_config)
response = tokenizer.decode(pred.cpu()[0][len(input_ids[0]):], skip_special_tokens=True)
messages.append({"role": "assistant", "content": response})
print(f"gpt: {response}")


#round-2
print(f"user: 他有什么代表作?")
messages.append({"role": "user", "content": "他有什么代表作?"})
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)
pred = model.generate(input_ids=input_ids, generation_config=generation_config)
response = tokenizer.decode(pred.cpu()[0][len(input_ids[0]):], skip_special_tokens=True)
messages.append({"role": "assistant", "content": response})
print(f"gpt: {response}")

Demonstration of Long COT Model Inference

import re
import json
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers.generation import GenerationConfig

MODEL_NAME_OR_PATH = "qihoo360/360Zhinao3-7B-O1.5"

tokenizer = AutoTokenizer.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True).cuda()

generation_config = GenerationConfig.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True)
generation_config.max_new_tokens = 2048


def extract_thinking_and_answer(input_string):
    thinking, answer = "", ""
    # Extract the answer (everything after the last </think>)
    pattern_answer = r'.*</think>(.*)$'
    match_answer = re.search(pattern_answer, input_string, re.S)
    if match_answer:
        answer = match_answer.group(1)
    else:
        return thinking, input_string

    # Extract the thinking process (text between <think> and </think>)
    pattern_thinking = r'<think>(.*?)</think>'
    match_thinking = re.search(pattern_thinking, input_string, re.S)
    if match_thinking:
        thinking = match_thinking.group(1)

    return thinking, answer


messages = []
messages.append({"role": "user", "content": "现有一笼子,里面有鸡和兔子若干只,数一数,共有头14个,腿38条,求鸡和兔子各有多少只?"})
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)
pred = model.generate(input_ids=input_ids, generation_config=generation_config)
response = tokenizer.decode(pred.cpu()[0][len(input_ids[0]):], skip_special_tokens=True)
thinking, answer = extract_thinking_and_answer(response)
messages.append({"role": "assistant", "content": answer, "reasoning_content": thinking})
print(json.dumps(messages, ensure_ascii=False, indent=4))
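
To see what extract_thinking_and_answer returns without loading the model, the sketch below runs the same helper on hand-written strings. The sample responses are made up for illustration and are not real model output; whether the model emits the opening <think> tag itself depends on its chat template, which is not shown here.

```python
import re


def extract_thinking_and_answer(input_string):
    """Split a response into (thinking, answer) around the </think> tag."""
    thinking, answer = "", ""
    # The answer is everything after the last </think>.
    match_answer = re.search(r'.*</think>(.*)$', input_string, re.S)
    if match_answer:
        answer = match_answer.group(1)
    else:
        # No </think> tag at all: treat the whole string as the answer.
        return thinking, input_string

    # The thinking part is the text between <think> and </think>.
    match_thinking = re.search(r'<think>(.*?)</think>', input_string, re.S)
    if match_thinking:
        thinking = match_thinking.group(1)
    return thinking, answer


# Made-up sample strings covering the three cases the helper handles.
full = "<think>14 heads, 38 legs.</think>9 chickens and 5 rabbits."
no_open_tag = "14 heads, 38 legs.</think>9 chickens and 5 rabbits."
plain = "9 chickens and 5 rabbits."

print(extract_thinking_and_answer(full))
print(extract_thinking_and_answer(no_open_tag))
print(extract_thinking_and_answer(plain))
```

Note that when the opening <think> tag is missing from the decoded text, the helper still recovers the answer but returns an empty thinking string.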

Model Inference

Deployment

vLLM Installation

We recommend using vllm==0.6.0.

If you are using CUDA 12.1 and PyTorch 2.1, you can install vLLM directly with:

pip install vllm==0.6.0

Otherwise, please refer to the official vLLM Installation Instructions.

After installation, perform the following steps:

  1. Copy vllm/zhinao.py into vllm/model_executor/models in your vllm installation directory (in python/conda env).

  2. Then add a line in vllm/model_executor/models/__init__.py

    "ZhinaoForCausalLM": ("zhinao", "ZhinaoForCausalLM"),
    

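Step 1 requires knowing where vLLM is installed in the active environment. A minimal sketch for locating the target directory with the standard library is shown below; the helper name `find_models_dir` is hypothetical, and the `model_executor/models` sub-path is taken from the steps above rather than checked against the installed version.

```python
import importlib.util
from pathlib import Path


def find_models_dir(package="vllm"):
    """Return <package dir>/model_executor/models, or None if the package is missing."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None
    package_dir = Path(list(spec.submodule_search_locations)[0])
    return package_dir / "model_executor" / "models"


models_dir = find_models_dir()
if models_dir is None:
    print("vllm is not installed in this environment")
else:
    # Copy zhinao.py here, then edit __init__.py in the same directory.
    print(models_dir)
```
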
vLLM Service Start

Start the service:

python -m vllm.entrypoints.openai.api_server \
    --model qihoo360/360Zhinao3-7B-O1.5 \
    --served-model-name 360Zhinao3-7B-O1.5 \
    --port 8360 \
    --host 0.0.0.0 \
    --dtype bfloat16 \
    --tensor-parallel-size 4 \
    --gpu-memory-utilization 0.8 \
    --trust-remote-code

Use curl to request the service:

curl http://localhost:8360/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
    "model": "360Zhinao3-7B-O1.5",
    "max_tokens": 200,
    "top_k": -1,
    "top_p": 0.8,
    "temperature": 1.0,
    "presence_penalty": 0.0,
    "frequency_penalty": 0.0,
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "你好"}
    ],
    "stop": [
        "<eod>",
        "<|im_end|>",
        "<|im_start|>"
    ]
}'

Use python to request the service:

from openai import OpenAI
# Set OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8360/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

chat_response = client.chat.completions.create(
    model="360Zhinao3-7B-O1.5",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "你好"},
    ],
    stop=[
        "<eod>",
        "<|im_end|>",
        "<|im_start|>"
    ],
    presence_penalty=0.0,
    frequency_penalty=0.0
)
print("Chat response:", chat_response)

If you need to enable repetition penalty, we recommend setting presence_penalty and frequency_penalty instead of repetition_penalty.
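
Where the openai package is not available, the same request can be sketched with the Python standard library alone. The helper name `build_chat_request` is hypothetical; the URL, model name, stop tokens, and penalty fields mirror the curl example above, and sending the request assumes the vLLM server from "vLLM Service Start" is running.

```python
import json
import urllib.request


def build_chat_request(user_text,
                       base_url="http://localhost:8360/v1",
                       model="360Zhinao3-7B-O1.5"):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_text},
        ],
        "stop": ["<eod>", "<|im_end|>", "<|im_start|>"],
        # Used in place of repetition_penalty, as recommended above.
        "presence_penalty": 0.0,
        "frequency_penalty": 0.0,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request("你好")

# Sending requires a running server, so it is left commented out:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```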


Model Finetune

Training data

Training Data: data/training_data_sample.json. This sample contains 10,000 rows drawn from multiturn_chat_0.8M and converted to the format below.

Data Format:

[
  {
    "id": 1,
    "conversations": [
        {
            "from": "system",
            "value": "You are a helpful assistant."
        },
        {
            "from": "user",
            "value": "您好啊"
        },
        {
            "from": "assistant",
            "value": "你好!我今天能为您做些什么?有什么问题或需要帮助吗? 我在这里为您提供服务。"
        }
    ]
  }
]
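
The training format uses "from"/"value" keys, while the Quickstart examples pass "role"/"content" messages to apply_chat_template. A small sketch of converting one record between the two is shown below; the helper name `to_messages` and the set of allowed roles are assumptions based only on the sample above, not on finetune.py.

```python
# Roles seen in the sample record; other values are rejected (assumption).
ALLOWED_ROLES = {"system", "user", "assistant"}


def to_messages(record):
    """Convert one training record into a list of role/content messages."""
    messages = []
    for turn in record["conversations"]:
        role = turn["from"]
        if role not in ALLOWED_ROLES:
            raise ValueError(f"unexpected role: {role!r}")
        messages.append({"role": role, "content": turn["value"]})
    return messages


record = {
    "id": 1,
    "conversations": [
        {"from": "system", "value": "You are a helpful assistant."},
        {"from": "user", "value": "您好啊"},
        {"from": "assistant", "value": "你好!"},
    ],
}

for message in to_messages(record):
    print(message["role"], ":", message["content"])
```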

Finetuning scripts

set -x

HOSTFILE=hostfile
DS_CONFIG=./finetune/ds_config_zero2.json

# PARAMS
LR=5e-6
EPOCHS=3
MAX_LEN=32768
BATCH_SIZE=4
NUM_NODES=1
NUM_GPUS=8
MASTER_PORT=29500

IS_CONCAT=False # Whether to concatenate to maximum length (MAX_LEN)

DATA_PATH="./data/training_data_sample.json"
MODEL_PATH="qihoo360/360Zhinao3-7B-Instruct"
OUTPUT_DIR="./outputs/"

deepspeed --hostfile ${HOSTFILE} \
        --master_port ${MASTER_PORT} \
        --num_nodes ${NUM_NODES} \
        --num_gpus ${NUM_GPUS} \
        finetune.py \
        --report_to "tensorboard" \
        --data_path ${DATA_PATH} \
        --model_name_or_path ${MODEL_PATH} \
        --output_dir ${OUTPUT_DIR} \
        --model_max_length ${MAX_LEN} \
        --num_train_epochs ${EPOCHS} \
        --per_device_train_batch_size ${BATCH_SIZE} \
        --gradient_accumulation_steps 1 \
        --save_strategy steps \
        --save_steps 200 \
        --learning_rate ${LR} \
        --lr_scheduler_type cosine \
        --adam_beta1 0.9 \
        --adam_beta2 0.95 \
        --adam_epsilon 1e-8 \
        --max_grad_norm 1.0 \
        --weight_decay 0.1 \
        --warmup_ratio 0.01 \
        --gradient_checkpointing True \
        --bf16 True \
        --tf32 True \
        --deepspeed ${DS_CONFIG} \
        --is_concat ${IS_CONCAT} \
        --logging_steps 1 \
        --log_on_each_node False

Launch fine-tuning with:

bash finetune/ds_finetune.sh

  • Setting HOSTFILE switches between single-machine and multi-machine training.
  • Setting ds_config switches between zero1, zero2 and zero3.
  • Setting fp16 or bf16 enables mixed-precision training; bf16 is recommended for consistency with the pretrained model.
  • is_concat controls whether the training data is concatenated up to the maximum length.
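
When adjusting these parameters it helps to know the effective global batch size. Under the usual data-parallel convention it is the per-device batch size times the gradient accumulation steps times the total number of GPUs; that convention is assumed here to apply to this script, and the function name is illustrative.

```python
def global_batch_size(per_device_batch, grad_accum_steps, gpus_per_node, num_nodes):
    """Sequences consumed per optimizer step under plain data parallelism."""
    return per_device_batch * grad_accum_steps * gpus_per_node * num_nodes


# Values from the script above: BATCH_SIZE=4, gradient_accumulation_steps=1,
# NUM_GPUS=8, NUM_NODES=1.
print(global_batch_size(4, 1, 8, 1))
```

With the default settings this gives 32 sequences per step; doubling NUM_NODES or the accumulation steps doubles it, which may call for revisiting the learning rate.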

License

The source code of this repository follows the open-source license Apache 2.0.

360Zhinao3 open-source models support free commercial use. You do not need to submit a request for commercial usage.
