UD-Q4_K_XL matches bf16 with 60.9% vs 61.8% on Aider Polyglot benchmark

#8, opened by Fernanda24

UD-Q4_K_XL results:

```
  test_cases: 225
  model: openai/unsloth/Qwen3-Coder-480B-A35B-Instruct-GGUF
  edit_format: diff
  commit_hash: f38200c
  pass_rate_1: 29.8
  pass_rate_2: 60.9    # <- final score: 60.9%
  pass_num_1: 67
  pass_num_2: 137
  percent_cases_well_formed: 94.7
  error_outputs: 12
  num_malformed_responses: 12
  num_with_malformed_responses: 12
  user_asks: 106
  lazy_comments: 0
  syntax_errors: 0
  indentation_errors: 0
  exhausted_context_windows: 0
  prompt_tokens: 2896195
  completion_tokens: 456367
  test_timeouts: 1
  total_tests: 225
  command: aider --model openai/unsloth/Qwen3-Coder-480B-A35B-Instruct-GGUF
  date: 2025-07-23
  versions: 0.85.3.dev
  seconds_per_case: 120.3
```
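As a quick sanity check, you can recompute the headline percentages from the raw counts above. This is a minimal sketch that assumes each rate is simply 100 × count / total_tests, rounded to one decimal place. That formula is consistent with the reported numbers, but it is my assumption and was not taken from Aider's source. The `pct` helper is hypothetical, and the 61.8% bf16 figure is the one quoted in the thread title.

```python
# Sanity check: recompute the reported percentages from the raw counts.
# ASSUMPTION: rate = 100 * count / total_tests, rounded to 1 decimal.
# This matches the numbers above but is not taken from Aider's code.

def pct(count: int, total: int) -> float:
    """Hypothetical helper: count/total as a percentage, 1 decimal place."""
    return round(100 * count / total, 1)

total_tests = 225
pass_num_1 = 67                     # passes on the first attempt
pass_num_2 = 137                    # passes after the second attempt
num_with_malformed_responses = 12   # cases with at least one malformed reply

pass_rate_1 = pct(pass_num_1, total_tests)
pass_rate_2 = pct(pass_num_2, total_tests)
well_formed = pct(total_tests - num_with_malformed_responses, total_tests)

# Gap to the bf16 score from the thread title, in percentage points
# and converted back into an approximate number of test cases.
bf16_pass_rate_2 = 61.8
gap_pp = round(bf16_pass_rate_2 - pass_rate_2, 1)
gap_cases = gap_pp / 100 * total_tests

print(pass_rate_1, pass_rate_2, well_formed)
print(gap_pp, round(gap_cases, 1))
```

Put differently, the 0.9-point gap to bf16 corresponds to about 2 of the 225 test cases.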
Fernanda24 changed discussion title from UD-Q4_K_XL matches fp8 with 60.9% vs 61.8% on Aider Polyglot benchmark to UD-Q4_K_XL matches bf16 with 60.9% vs 61.8% on Aider Polyglot benchmark
Unsloth AI org

Thanks for sharing your results! Pretty damn cool!

Isn't that for the 235B model? I can't find any Aider results for the original 480B Coder model. (Great results, of course, but it would be nice to know exactly where the 480B sits.)

Unsloth AI org

> Isn't that for the 235B model? I can't find any Aider results for the original 480B Coder model. (Great results, of course, but it would be nice to know exactly where the 480B sits.)

No, it's for the 480B model. The results are usually posted in the Aider Discord.
