Commit ad604a4
Parent(s): 5105bab

simplify content

content.py: +3 -10
content.py
CHANGED
@@ -9,18 +9,11 @@ LINKS = """
 """
 
 INTRODUCTION_TEXT = """
-Online Mind2Web is a benchmark designed to evaluate real-world performance of web agents on
-
-
-## Tasks
-Online Mind2Web includes 300 tasks from 136 popular websites across various domains. It covers a diverse set of user tasks, to evaluate agents' performance in real-world environments.
-
-Tasks are categorized into three difficulty levels based on the steps human annotators need:
-- Easy: 1 - 5
-- Medium: 6 - 10
-- Hard: 11 +
+Online Mind2Web is a benchmark designed to evaluate the real-world performance of web agents on live websites, featuring 300 tasks across 136 popular sites in diverse domains. Based on the number of steps required by human annotators, tasks are divided into three difficulty levels: Easy (1–5 steps), Medium (6–10 steps), and Hard (11+ steps).
 
 ## Leaderboard
+
+We maintain two leaderboards: one for automated evaluation, conducted internally using participant-submitted trajectories, and another for human evaluation—agents will be included in the human-eval leaderboard after submitted results successfully pass our validation process.
 """
 
 SUBMISSION_TEXT = """