This is the GPTQ-v2 4-bit quantized version of the model kxdw2580/DeepSeek-R1-0528-Qwen3-8B-catgirl-v2.5.

During quantization, layers 30–35 exhibited high loss, which can be reviewed in the detailed GPTQ-v2 quantization log. Despite this anomaly, internal small-sample benchmarking indicates that the model's overall performance remains acceptable.

For more information about the base model, please refer to the original README.


kxdw2580/DeepSeek-R1-0528-Qwen3-8B-catgirl-v2.5

This new model series integrates updated datasets, base architectures, and fine-tuning methodologies. Based on Qwen3, it includes models with parameter counts of 8B and 1.7B.

Key updates focus on daily conversations, creative generation, basic mathematics, and code generation. Leveraging Qwen3's architecture, the model also supports reasoning mode switching.

🔍 Fine-tuning records are available on SwanLab:

  1. First Fine-tuning
  2. Second Fine-tuning
  3. Third Fine-tuning

Evaluation

Due to the model's unique characteristics, we employed human evaluation for daily conversations and DeepSeek-R1 scoring (with reference answers provided in advance) for other domains to ensure character consistency and response validity.

Key Improvements (vs. internal test models "0501" and "0531-test-all"):

  • Stronger detail-awareness in casual dialogue
  • More coherent storytelling in creative tasks
  • Deeper reasoning during thinking mode
  • Better persona adherence in long-form conversations without explicit prompts
  • Significant gains in math/code domains (internal 20-question benchmark):
| Model | Math (Single Attempt) | Code (Single Attempt) |
| --- | --- | --- |
| Internal Test Model-0501 | 10% | 0% |
| DeepSeek-R1-0528-Qwen3-8B-Catgirl-0531-test-all | 30% | 20% |
| DeepSeek-R1-0528-Qwen3-8B-Catgirl-v2.5 | 70% | 60% |

Usage Guidelines

Recommended Parameters:

  • temperature: 0.7 (reasoning mode) / 0.6 (standard mode)
  • top_p: 0.95
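
The recommended settings above can be collected in a small helper. This is an illustrative sketch, not an official API: the `reasoning` flag and the `sampling_params` function are hypothetical names for choosing between the two modes.

```python
# Hedged sketch: return the README-recommended sampling parameters.
# The "reasoning" flag and function name are illustrative, not part of
# any official API for this model.
def sampling_params(reasoning: bool) -> dict:
    """Recommended settings: temperature 0.7 (reasoning) / 0.6 (standard), top_p 0.95."""
    return {
        "temperature": 0.7 if reasoning else 0.6,
        "top_p": 0.95,
    }

print(sampling_params(reasoning=True))   # {'temperature': 0.7, 'top_p': 0.95}
print(sampling_params(reasoning=False))  # {'temperature': 0.6, 'top_p': 0.95}
```

The resulting dict can be passed as keyword arguments to most generation APIs (e.g. `model.generate(**inputs, **sampling_params(True))` in Transformers).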

Critical Notes:

  • Avoid feeding the model's reasoning chains back into the conversation context
  • The model inherits the base model's tendency toward lengthy reasoning in some cases – allow it to finish even if intermediate steps seem unusual
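
To keep reasoning chains out of the conversation context, the reply can be cleaned before it is appended to the chat history. The sketch below assumes the model wraps its reasoning in `<think>...</think>` tags, as the DeepSeek-R1 distilled family does; verify this against your actual outputs.

```python
import re

# Hedged sketch: strip the reasoning chain from a reply before reusing it
# as conversation context. Assumes <think>...</think> delimiters, which is
# the DeepSeek-R1 convention but should be checked against real outputs.
def strip_reasoning(reply: str) -> str:
    return re.sub(r"<think>.*?</think>", "", reply, flags=re.DOTALL).strip()

reply = "<think>2 + 2... carry nothing...</think>Nya~ the answer is 4!"
print(strip_reasoning(reply))  # Nya~ the answer is 4!
```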

English Mode:

Add this system prompt for English responses:

You are a catgirl. Please speak English.  
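
In a chat-style API, that prompt goes in the system turn. A minimal sketch (the user message is illustrative):

```python
# Hedged sketch: prepend the English-mode system prompt from the README
# to a standard chat-messages list.
messages = [
    {"role": "system", "content": "You are a catgirl. Please speak English."},
    {"role": "user", "content": "Hello! How are you today?"},  # example turn
]
```

This `messages` list can then be passed to `tokenizer.apply_chat_template(messages, ...)` or an OpenAI-compatible chat endpoint.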

Acknowledgments

Special thanks to:

  • LLaMA-Factory (fine-tuning framework)
  • Qwen Team (base model provider)
  • DeepSeek Team (DeepSeek-R1 evaluation support)