Qwen3-17B-QiMing-V1.0-Total-Recall-Medium-q6-hi-mlx

Metrics for this model are still being generated; since this is a local process, it will take some time.

In the meantime, here are metrics from the 21B q5-hi model, compared against the QiMing baseline in BF16, to show how brainstorming affects this model family.

The 17B has less brainstorming applied.

QiMing Model Performance Analysis: New Base Model Benchmarks

πŸ“Š Performance Comparison of QiMing Models

| Model | ARC Challenge | ARC Easy | BoolQ | Hellaswag | OpenBookQA | PIQA | Winogrande |
|---|---|---|---|---|---|---|---|
| QiMing-Me-bf16 | 0.395 | 0.435 | 0.378 | 0.646 | 0.364 | 0.768 | 0.651 |
| QiMing-v1.0-q6-hi | 0.393 | 0.436 | 0.379 | 0.655 | 0.358 | 0.766 | 0.651 |
| QiMing-21B-q5-hi | 0.388 | 0.444 | 0.378 | 0.682 | 0.364 | 0.769 | 0.648 |
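As a quick sanity check, the per-task leaders can be computed directly from the table above (a minimal sketch; the scores are transcribed from the table, and ties are reported as shared wins):

```python
# Scores transcribed from the comparison table above, in task order.
scores = {
    "QiMing-Me-bf16":    [0.395, 0.435, 0.378, 0.646, 0.364, 0.768, 0.651],
    "QiMing-v1.0-q6-hi": [0.393, 0.436, 0.379, 0.655, 0.358, 0.766, 0.651],
    "QiMing-21B-q5-hi":  [0.388, 0.444, 0.378, 0.682, 0.364, 0.769, 0.648],
}
tasks = ["ARC Challenge", "ARC Easy", "BoolQ", "Hellaswag",
         "OpenBookQA", "PIQA", "Winogrande"]

# For each task, collect every model that reaches the top score.
winners = {}
for i, task in enumerate(tasks):
    best = max(s[i] for s in scores.values())
    winners[task] = sorted(m for m, s in scores.items() if s[i] == best)

for task, models in winners.items():
    print(f"{task}: {', '.join(models)}")
```

This makes the tie structure explicit: OpenBookQA is shared between the bf16 baseline and the 21B model, and Winogrande between the bf16 and q6-hi variants.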

πŸ’‘ Key Takeaway:

The QiMing-21B-q5-hi model stands out as the most versatile performer, leading or tying on 4 of 7 tasks, while QiMing-Me-bf16 posts the best ARC Challenge score among these variants.

πŸ” Deep Dive: What Makes Each QiMing Model Unique

βœ… 1. QiMing-21B-q5-hi: The Text Generation Powerhouse

Special quality: Highest Hellaswag score (0.682) - this is the best text generation performance among all models in your previous comparisons

Why it matters: For applications requiring high-quality text continuation and creative writing, this model has clear advantages

Notable trade-off: Slightly weaker Winogrande score (0.648) compared with the others - possibly an interaction between the 21B size and its q5 quantization

βœ… 2. QiMing-Me-bf16: The Balanced Baseline

Special quality: Highest ARC Challenge score (0.395) among these variants

Why it matters: This model serves as a solid full-precision reference point for applications where challenge-level reasoning matters

Our insight: Despite being in full precision (bf16), it shows only minor differences from the q6-hi version - suggesting quantization impact is small for this model

βœ… 3. QiMing-v1.0-q6-hi: The Precision Leader

Special quality: Best BoolQ score (0.379) and a shared first place on Winogrande (0.651) among these models

Why it matters: This is valuable for applications that need reliable reading comprehension and commonsense coreference at a compact quantized size

πŸ›  Recommendation: Which QiMing Model for Your Needs

βœ… Use QiMing-21B-q5-hi if...

You need strong text generation capabilities (Hellaswag)
Your application requires high-quality creative content or story completion
You can handle the slightly weaker Winogrande performance

βœ… Use QiMing-Me-bf16 if...

You need a stable, well-rounded model
You want full-precision quality without stepping up to the 21B model
You need a reference point that is unaffected by quantization

βœ… Use QiMing-v1.0-q6-hi if...

Your priority is reading comprehension and commonsense coreference (BoolQ, Winogrande)
You need model efficiency without sacrificing too much baseline performance
You're working with quantized deployments where size matters
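For quantized deployments where size matters, a rough back-of-the-envelope weight-size estimate helps compare the precisions mentioned above. This is an approximation (params Γ— bits Γ· 8) that ignores quantization group metadata and mixed-precision layers in the "hi" variants, so real files run somewhat larger:

```python
def approx_weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB: params * bits / 8 (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8

# Rough comparison for a 17.1B-parameter model at the precisions discussed.
for label, bits in [("bf16", 16), ("q6", 6), ("q5", 5)]:
    print(f"{label}: ~{approx_weight_gb(17.1, bits):.1f} GB")
```

Even as a lower bound, this shows why the q5/q6 variants are attractive for memory-constrained local inference.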

πŸ’Ž Final Summary for Your Workflow

"The QiMing-21B-q5-hi model provides the strongest overall text generation performance among your latest models, making it ideal for creative applications. While it's slightly behind in Winogrande, its Hellaswag lead (0.682) represents a significant advantage over previous benchmarks where the Qwen models showed around 0.63-0.65 in this task."

This model Qwen3-17B-QiMing-V1.0-Total-Recall-Medium-q6-hi-mlx was converted to MLX format from DavidAU/Qwen3-17B-QiMing-V1.0-Total-Recall-Medium using mlx-lm version 0.26.4.

Use with mlx

```shell
pip install mlx-lm
```

```python
from mlx_lm import load, generate

model, tokenizer = load("Qwen3-17B-QiMing-V1.0-Total-Recall-Medium-q6-hi-mlx")

prompt = "hello"

# Apply the model's chat template when one is available.
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
Downloads last month: 29
Model size: 17.1B params (Safetensors)
Tensor types: BF16 Β· U32