Efficient Multi-turn RL for GUI Agents via Decoupled Training and Adaptive Data Curation
Abstract
DART, a decoupled reinforcement learning framework for GUI agents, improves efficiency and learning effectiveness through asynchronous modules and adaptive data curation, achieving high task success rates on the OSWorld benchmark.
Vision-language model (VLM) based GUI agents show promise for automating complex desktop and mobile tasks, but face significant challenges in applying reinforcement learning (RL): (1) slow multi-turn interactions with GUI environments for policy rollout, and (2) insufficient high-quality agent-environment interactions for policy learning. To address these challenges, we propose DART, a Decoupled Agentic RL Training framework for GUI agents, which coordinates heterogeneous modules in a highly decoupled manner. DART separates the training system into four asynchronous modules: environment cluster, rollout service, data manager, and trainer. This design enables non-blocking communication, asynchronous training, rollout-wise trajectory sampling, and per-worker model synchronization, significantly improving system efficiency: 1.6× higher GPU utilization for rollout, 1.9× training throughput, and 5.5× environment utilization. To facilitate effective learning from abundant samples, we introduce an adaptive data curation scheme: (1) pre-collecting successful trajectories for challenging tasks to supplement sparse successes in online sampling; (2) dynamically adjusting rollout numbers and trajectory lengths based on task difficulty; (3) training selectively on high-entropy steps to prioritize critical decisions; (4) stabilizing learning via truncated importance sampling, which corrects the policy mismatch between rollout and update. On the OSWorld benchmark, DART-GUI-7B achieves a 42.13% task success rate, a 14.61% absolute gain over the base model and 7.34% higher than the open-source SOTA. We will fully open-source our training framework, data, and model checkpoints via computer-use-agents.github.io/dart-gui, which we believe is a timely contribution to the open-source community of agentic RL training.
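To make point (4) concrete, the core of truncated importance sampling can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the clipping threshold `clip_c`, and the per-step REINFORCE-style surrogate are all assumptions; the idea is only that the importance ratio between the updated policy and the stale rollout policy is clipped from above so a large policy mismatch cannot blow up gradient variance.

```python
import math

def truncated_is_pg_loss(logp_new, logp_old, advantages, clip_c=2.0):
    """Illustrative policy-gradient surrogate with truncated importance sampling.

    logp_new: log-probs of the taken actions under the current (update) policy
    logp_old: log-probs of the same actions under the stale rollout policy
    advantages: per-step advantage estimates
    clip_c: truncation threshold for the importance ratio (assumed value)
    """
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)   # pi_update(a|s) / pi_rollout(a|s)
        rho = min(ratio, clip_c)    # truncate from above to bound variance
        total += rho * ln * adv     # REINFORCE-style weighted log-prob term
    return -total / len(logp_new)   # negate: a loss to be minimized
```

In an autograd framework the truncated ratio would be treated as a detached coefficient, with gradients flowing only through the current policy's log-probabilities.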
Community
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning (2025)
- MobileRL: Online Agentic Reinforcement Learning for Mobile GUI Agents (2025)
- ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents (2025)
- CRAFT-GUI: Curriculum-Reinforced Agent For GUI Tasks (2025)
- GUI-Shepherd: Reliable Process Reward and Verification for Long-Sequence GUI Tasks (2025)
- SWIRL: A Staged Workflow for Interleaved Reinforcement Learning in Mobile GUI Control (2025)
- D-Artemis: A Deliberative Cognitive Framework for Mobile GUI Multi-Agents (2025)