Qwen3-17B-QiMing-V1.0-Total-Recall-Medium-q6-hi-mlx

Metrics for this model are still being generated; since this is a local process, it will take some time.

In the meantime, here are metrics from the 21B q5-hi model, compared against the QiMing baseline in BF16, to show how brainstorming affects this model family.

The 17B has less brainstorming applied.

QiMing Model Performance Analysis: New Base Model Benchmarks

πŸ“Š Performance Comparison of QiMing Models

| Model | ARC Challenge | ARC Easy | BoolQ | Hellaswag | OpenBookQA | PIQA | Winogrande |
|---|---|---|---|---|---|---|---|
| QiMing-Me-bf16 | 0.395 | 0.435 | 0.378 | 0.646 | 0.364 | 0.768 | 0.651 |
| QiMing-v1.0-q6-hi | 0.393 | 0.436 | 0.379 | 0.655 | 0.358 | 0.766 | 0.651 |
| QiMing-21B-q5-hi | 0.388 | 0.444 | 0.378 | 0.682 | 0.364 | 0.769 | 0.648 |
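As a quick sanity check, the per-task leaders can be computed directly from the table above (a minimal sketch; the scores are transcribed from the table, and ties are reported as shared wins):

```python
# Scores transcribed from the comparison table above, in task order.
scores = {
    "QiMing-Me-bf16":    [0.395, 0.435, 0.378, 0.646, 0.364, 0.768, 0.651],
    "QiMing-v1.0-q6-hi": [0.393, 0.436, 0.379, 0.655, 0.358, 0.766, 0.651],
    "QiMing-21B-q5-hi":  [0.388, 0.444, 0.378, 0.682, 0.364, 0.769, 0.648],
}
tasks = ["ARC Challenge", "ARC Easy", "BoolQ", "Hellaswag",
         "OpenBookQA", "PIQA", "Winogrande"]

# For each task, collect every model that reaches the top score.
winners = {}
for i, task in enumerate(tasks):
    best = max(s[i] for s in scores.values())
    winners[task] = sorted(m for m, s in scores.items() if s[i] == best)

for task, models in winners.items():
    print(f"{task}: {', '.join(models)}")
```

This makes the tie structure explicit: OpenBookQA is shared between the bf16 baseline and the 21B model, and Winogrande between the bf16 and q6-hi variants.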

πŸ’‘ Key Takeaway:

The QiMing-21B-q5-hi model stands out as the most versatile performer, leading or tying on 4 of 7 tasks, while QiMing-Me-bf16 posts the best ARC Challenge score among these variants.

πŸ” Deep Dive: What Makes Each QiMing Model Unique

βœ… 1. QiMing-21B-q5-hi: The Text Generation Powerhouse

Special quality: Highest Hellaswag score (0.682) - this is the best text generation performance among all models in your previous comparisons

Why it matters: For applications requiring high-quality text continuation and creative writing, this model has clear advantages

Notable trade-off: Slightly weaker Winogrande score (0.648) compared with the others - possibly an interaction between the 21B size and its q5 quantization

βœ… 2. QiMing-Me-bf16: The Balanced Baseline

Special quality: Highest ARC Challenge score (0.395) among these variants

Why it matters: This model serves as a solid full-precision reference point for applications where challenge-level reasoning matters

Our insight: Despite being in full precision (bf16), it shows only minor differences from the q6-hi version - suggesting quantization impact is small for this model

βœ… 3. QiMing-v1.0-q6-hi: The Precision Leader

Special quality: Best BoolQ score (0.379) and a shared first place on Winogrande (0.651) among these models

Why it matters: This is valuable for applications that need reliable reading comprehension and commonsense coreference at a compact quantized size

πŸ›  Recommendation: Which QiMing Model for Your Needs

βœ… Use QiMing-21B-q5-hi if...

You need strong text generation capabilities (Hellaswag)
Your application requires high-quality creative content or story completion
You can handle the slightly weaker Winogrande performance

βœ… Use QiMing-Me-bf16 if...

You need a stable, well-rounded model
You want full-precision quality without stepping up to the 21B model
You need a reference point that is unaffected by quantization

βœ… Use QiMing-v1.0-q6-hi if...

Your priority is reading comprehension and commonsense coreference (BoolQ, Winogrande)
You need model efficiency without sacrificing too much baseline performance
You're working with quantized deployments where size matters
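For quantized deployments where size matters, a rough back-of-the-envelope weight-size estimate helps compare the precisions mentioned above. This is an approximation (params Γ— bits Γ· 8) that ignores quantization group metadata and mixed-precision layers in the "hi" variants, so real files run somewhat larger:

```python
def approx_weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB: params * bits / 8 (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8

# Rough comparison for a 17.1B-parameter model at the precisions discussed.
for label, bits in [("bf16", 16), ("q6", 6), ("q5", 5)]:
    print(f"{label}: ~{approx_weight_gb(17.1, bits):.1f} GB")
```

Even as a lower bound, this shows why the q5/q6 variants are attractive for memory-constrained local inference.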

πŸ’Ž Final Summary for Your Workflow

"The QiMing-21B-q5-hi model provides the strongest overall text generation performance among your latest models, making it ideal for creative applications. While it's slightly behind in Winogrande, its Hellaswag lead (0.682) represents a significant advantage over previous benchmarks where the Qwen models showed around 0.63-0.65 in this task."

This model Qwen3-17B-QiMing-V1.0-Total-Recall-Medium-q6-hi-mlx was converted to MLX format from DavidAU/Qwen3-17B-QiMing-V1.0-Total-Recall-Medium using mlx-lm version 0.26.4.

Use with mlx

```shell
pip install mlx-lm
```

```python
from mlx_lm import load, generate

model, tokenizer = load("Qwen3-17B-QiMing-V1.0-Total-Recall-Medium-q6-hi-mlx")

prompt = "hello"

# Apply the model's chat template when one is available.
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
Downloads last month: 29
Model size: 17.1B params (Safetensors)
Tensor types: BF16 Β· U32