Update README.md
README.md
CHANGED
@@ -20,3 +20,20 @@ For more details about ExCoT and how to use it:
 * [ExCoT: Optimizing Reasoning for Text-to-SQL with Execution Feedback (arXiv)](https://arxiv.org/pdf/2503.19988)
 * [Getting started guide using ArcticTraining](https://github.com/snowflakedb/ArcticTraining/tree/main/projects/excot_dpo)
 
+## Evaluation results
+
+| Model                                 | Ex% Dev   | Ex% Test  |
+|---------------------------------------|-----------|-----------|
+| Arctic-ExCoT-70B (LLaMA 3.1 70B)      | **68.51** | 68.53     |
+| Arctic-ExCoT-32B (Qwen-2.5-Coder 32B) | 68.25     | 68.19     |
+| XiYanSQL-QwenCoder\*                  | 67.01     | **69.03** |
+| OpenAI GPT-4o                         | 54.04     | -         |
+| OpenAI GPT-4                          | 46.35     | 54.89     |
+| Anthropic Claude 3.5-Sonnet           | 50.13     | -         |
+| Claude-2                              | 42.70     | 49.02     |
+| OpenAI o1-mini                        | 52.41     | -         |
+| OpenAI o3-mini                        | 53.72     | -         |
+| Mistral-large-2407 (123B)             | 53.52     | 55.84     |
+| DeepSeek-V2 (236B)                    | 56.13     | 56.68     |
+
+Top Single-Model, Single-Inference Results on the BIRD Leaderboard (as of March 25, 2025). \*XiYanSQL-QwenCoder: reproducing these numbers has been challenging [[1]](https://github.com/XGenerationLab/XiYanSQL-QwenCoder/issues/4)[[2]](https://modelscope.cn/models/XGenerationLab/XiYanSQL-QwenCoder-32B-2412/feedback/issueDetail/22708).