UD-Q4_K_XL matches bf16 with 60.9% vs 61.8% on Aider Polyglot benchmark

by Fernanda24 - opened Jul 25

UD-Q4_K_XL:

```
  test_cases: 225
  model: openai/unsloth/Qwen3-Coder-480B-A35B-Instruct-GGUF
  edit_format: diff
  commit_hash: f38200c
  pass_rate_1: 29.8
  pass_rate_2: 60.9                               # <-- this is the final score: 60.9%
  pass_num_1: 67
  pass_num_2: 137
  percent_cases_well_formed: 94.7
  error_outputs: 12
  num_malformed_responses: 12
  num_with_malformed_responses: 12
  user_asks: 106
  lazy_comments: 0
  syntax_errors: 0
  indentation_errors: 0
  exhausted_context_windows: 0
  prompt_tokens: 2896195
  completion_tokens: 456367
  test_timeouts: 1
  total_tests: 225
  command: aider --model openai/unsloth/Qwen3-Coder-480B-A35B-Instruct-GGUF
  date: 2025-07-23
  versions: 0.85.3.dev
  seconds_per_case: 120.3
```
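For anyone who wants to sanity-check the percentages, here is a minimal sketch that recomputes them from the raw counts in the report. Two things are assumptions rather than facts from the report: that the well-formed percentage is derived from `num_with_malformed_responses`, and that the 61.8% bf16 reference score was measured on the same 225 test cases. The helper name `pct` is made up for illustration.

```python
# Counts copied from the report above.
test_cases = 225
pass_num_1 = 67
pass_num_2 = 137
num_with_malformed_responses = 12


def pct(numerator, denominator):
    """Return a percentage rounded to one decimal place, as in the report."""
    return round(100 * numerator / denominator, 1)


pass_rate_1 = pct(pass_num_1, test_cases)
pass_rate_2 = pct(pass_num_2, test_cases)

# Assumption: well-formed cases = all cases minus those with a malformed response.
well_formed = pct(test_cases - num_with_malformed_responses, test_cases)

# Assumption: the bf16 reference score (61.8%) used the same 225 test cases.
bf16_rate = 61.8
bf16_passes = round(bf16_rate / 100 * test_cases)
gap_in_cases = bf16_passes - pass_num_2

print("pass_rate_1:", pass_rate_1)
print("pass_rate_2:", pass_rate_2)
print("percent_cases_well_formed:", well_formed)
print("estimated bf16 passes:", bf16_passes)
print("gap in test cases:", gap_in_cases)
```

Under those assumptions, the 0.9-point gap between the two scores works out to roughly two test cases out of 225.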

Fernanda24 changed discussion status to closed Jul 25

Fernanda24 changed discussion status to open Jul 25

Fernanda24 changed discussion title from UD-Q4_K_XL matches fp8 with 60.9% vs 61.8% on Aider Polyglot benchmark to UD-Q4_K_XL matches bf16 with 60.9% vs 61.8% on Aider Polyglot benchmark Jul 25

shimmyshimmer

Unsloth AI org Jul 25

Thanks for sharing your results! Pretty damn cool!

freegheist

4 days ago

Isn't that for the 235B model? I can't find any Aider results for the original 480B Coder model. (Great results of course, but it would be nice to know exactly where the 480B sits.)

shimmyshimmer

Unsloth AI org 1 day ago

> Isn't that for the 235B model? I can't find any Aider results for the original 480B Coder model. (Great results of course, but it would be nice to know exactly where the 480B sits.)

No, it's for the 480B model. Results are usually posted in the Aider Discord.
