UD-Q4_K_XL matches bf16 with 60.9% vs 61.8% on Aider Polyglot benchmark
#8, opened by Fernanda24
UD-Q4_K_XL results:
```
test_cases: 225
model: openai/unsloth/Qwen3-Coder-480B-A35B-Instruct-GGUF
edit_format: diff
commit_hash: f38200c
pass_rate_1: 29.8
pass_rate_2: 60.9   <-- this is the final score (60.9%)
pass_num_1: 67
pass_num_2: 137
percent_cases_well_formed: 94.7
error_outputs: 12
num_malformed_responses: 12
num_with_malformed_responses: 12
user_asks: 106
lazy_comments: 0
syntax_errors: 0
indentation_errors: 0
exhausted_context_windows: 0
prompt_tokens: 2896195
completion_tokens: 456367
test_timeouts: 1
total_tests: 225
command: aider --model openai/unsloth/Qwen3-Coder-480B-A35B-Instruct-GGUF
date: 2025-07-23
versions: 0.85.3.dev
seconds_per_case: 120.3
```
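As a quick sanity check, the percentages in the report can be recomputed from the raw counts. This is a minimal sketch assuming each rate is simply `count / test_cases * 100` rounded to one decimal (pass_rate_1 from pass_num_1, pass_rate_2 from pass_num_2, and the well-formed rate from test_cases minus num_with_malformed_responses). The `bf16_cases` line is purely illustrative and assumes the bf16 run used the same 225 test cases, which the post does not state.

```python
# Recompute the summary percentages from the raw counts in the report.
# Assumption: each rate is count / test_cases * 100, rounded to one decimal.

TEST_CASES = 225


def rate(count: int, total: int = TEST_CASES) -> float:
    """Return `count` as a percentage of `total`, rounded to one decimal place."""
    return round(100 * count / total, 1)


pass_rate_1 = rate(67)               # pass_num_1: solved on the first attempt
pass_rate_2 = rate(137)              # pass_num_2: solved after the second attempt (final score)
well_formed = rate(TEST_CASES - 12)  # cases without a malformed response

# Illustrative only: the number of cases a 61.8% score would correspond to,
# ASSUMING the bf16 run was scored on the same 225 test cases.
bf16_cases = round(0.618 * TEST_CASES)

print(pass_rate_1, pass_rate_2, well_formed)  # 29.8 60.9 94.7
print(bf16_cases, bf16_cases - 137)           # 139 2
```

Under that assumption, the 60.9% vs 61.8% gap comes down to roughly two of the 225 test cases.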
Fernanda24 changed the discussion title from "UD-Q4_K_XL matches fp8 with 60.9% vs 61.8% on Aider Polyglot benchmark" to "UD-Q4_K_XL matches bf16 with 60.9% vs 61.8% on Aider Polyglot benchmark".
Thanks for sharing your results! Pretty damn cool!
Isn't that for the 235B model? I can't find any Aider results for the original 480B Coder model. (Great results of course, but it would be nice to know accurately where the 480B sits.)
No, it's for the 480B model. The results are usually posted in the Aider Discord.