hrw committed on
Commit 897542f · verified · 1 Parent(s): 2ec52c3

Update README.md

Files changed (1): README.md (+7 -6)
README.md CHANGED
@@ -1,17 +1,18 @@
 ---
 license: mit
 ---
-**SWE-Dev-9B is trained from [glm-4-9B-chat](https://huggingface.co/THUDM/glm-4-9b-chat/)**
 
-🚀 SWE-Dev, a groundbreaking open-source Software Engineering Agent (SWE Agent)!
+🚀 SWE-Dev, an open-source Agent for Software Engineering tasks!
 
-📚 We have built a high-quality dataset and significantly improved the model’s performance on SWE tasks through rejection sampling. We also explored the feasibility of various offline algorithms on SWE through extensive experiments.
+💡 We develop a comprehensive pipeline for creating developer-oriented datasets from GitHub repositories, including issue tracking, code localization, test case generation, and evaluation.
 
-🔧 Using only open-source frameworks and models, SWE-Dev-7B and 32B achieved solve rates of 23.4% and 36.6% on SWE-bench-Verified, respectively, even approaching the performance of closed-source models like GPT-4o.
+🔧 Based on open-source frameworks (OpenHands) and models, SWE-Dev-7B and 32B achieved solve rates of 23.4% and 36.6% on SWE-bench-Verified, respectively, even approaching the performance of GPT-4o.
 
-🛠 No need for complex prompt engineering or expensive multi-round evaluations; performance breakthroughs can be achieved with simplified inference scaling! We discovered that increasing interaction rounds significantly boosts model performance. For instance, DeepSeek-V3’s solve rate improved from 37.4% at 30 rounds to 41.2% at 75 rounds. Context extension also proved highly effective for short-text-trained models!
-
-💡 We further explored the scaling laws between data size, interaction rounds, and model performance, demonstrating that smaller, high-quality datasets are sufficient to support top-tier performance.
+📚 We find that training data scaling and inference scaling can both effectively boost the performance of models on SWE-bench. Moreover, higher data quality further improves this trend when combined with reinforcement fine-tuning (RFT). For inference scaling specifically, SWE-Dev’s solve rate increased from 34.0% at 30 rounds to 36.6% at 75 rounds.
+
+
+SWE-Dev-9B is trained from [glm-4-9B-chat](https://huggingface.co/THUDM/glm-4-9b-chat/)
 
 Notion Link: https://ubecwang.notion.site/1bc32cf963e080b2a01df2895f66021f?v=1bc32cf963e0810ca07e000c86c4c1e1
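
The updated 📚 bullet attributes part of the inference-scaling gain to a larger interaction-round budget (30 → 75 rounds). As a rough illustration of what that budget controls, here is a toy sketch of an agent-environment loop; it is not SWE-Dev's or OpenHands' actual API, and every name in it is hypothetical:

```python
from dataclasses import dataclass


@dataclass
class ToyTask:
    """Stand-in environment: the task counts as 'solved' after enough productive turns."""
    steps_needed: int
    steps_taken: int = 0

    def apply(self, action: str) -> None:
        # Pretend every agent action makes one unit of progress.
        self.steps_taken += 1

    def solved(self) -> bool:
        return self.steps_taken >= self.steps_needed


def run_episode(task: ToyTask, max_rounds: int) -> bool:
    """Allow up to `max_rounds` agent-environment turns; return True on solve."""
    for _ in range(max_rounds):
        task.apply("propose an edit or shell command")  # agent's turn (stubbed out)
        if task.solved():  # e.g. all tests pass
            return True
    return False  # round budget exhausted -> counted as a failure


# A task that needs 50 turns fails under a 30-round budget but solves under 75:
print(run_episode(ToyTask(steps_needed=50), max_rounds=30))  # False
print(run_episode(ToyTask(steps_needed=50), max_rounds=75))  # True
```

The point of the toy: episodes that need more turns than the budget allows are scored as failures, so raising the cap converts some of them into solves at the cost of extra inference.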
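
For trying the checkpoint itself, below is a minimal inference sketch with Hugging Face transformers, assuming SWE-Dev-9B follows the same loading convention as its base glm-4-9b-chat (custom modeling code plus a chat template). The repo id is a placeholder, not confirmed by this commit:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "THUDM/SWE-Dev-9B"  # placeholder repo id; substitute the actual one

# GLM-4-based checkpoints ship custom modeling code, hence trust_remote_code.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Fix the failing test in utils/parser.py."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Decode only the newly generated tokens, not the prompt.
output = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))
```

Note that reported SWE-bench-Verified numbers come from running the model inside an agent scaffold (OpenHands), not from single-turn generation like this.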