Snowflake/Qwen-2.5-coder-Arctic-ExCoT-32B


jeffra committed on Mar 27

Commit d71c180 (verified) · 1 parent: 64b9559

Update README.md

Files changed (1)

README.md +17 -0


@@ -20,3 +20,20 @@ For more details about ExCoT and how to use it:
 * 📝 [ExCoT: Optimizing Reasoning for Text-to-SQL with Execution Feedback (arxiv)](https://arxiv.org/pdf/2503.19988)
 * 🚀 [Getting started guide using ArcticTraining](https://github.com/snowflakedb/ArcticTraining/tree/main/projects/excot_dpo)

+## Evaluation results
+| Model                                 | Dev EX (%) | Test EX (%) |
+|---------------------------------------|------------|-------------|
+| Arctic-ExCoT-70B (LLaMA 3.1 70B)      | **68.51**  | 68.53       |
+| Arctic-ExCoT-32B (Qwen-2.5-Coder 32B) | 68.25      | 68.19       |
+| XiYanSQL-QwenCoder*                   | 67.01      | **69.03**   |
+| OpenAI GPT-4o                         | 54.04      | –           |
+| OpenAI GPT-4                          | 46.35      | 54.89       |
+| Anthropic Claude 3.5-Sonnet           | 50.13      | –           |
+| Claude-2                              | 42.70      | 49.02       |
+| OpenAI o1-mini                        | 52.41      | –           |
+| OpenAI o3-mini                        | 53.72      | –           |
+| Mistral-large-2407 (123B)             | 53.52      | 55.84       |
+| DeepSeek-V2 (236B)                    | 56.13      | 56.68       |
+Top single-model, single-inference results on the BIRD leaderboard (as of March 25, 2025). *XiYanSQL-QwenCoder: reproducing the reported numbers has proven challenging; see [[1]](https://github.com/XGenerationLab/XiYanSQL-QwenCoder/issues/4) and [[2]](https://modelscope.cn/models/XGenerationLab/XiYanSQL-QwenCoder-32B-2412/feedback/issueDetail/22708).