WeijianQi1999 committed on
Commit
ad604a4
·
1 Parent(s): 5105bab

simplify content

Files changed (1)
  1. content.py +3 -10
content.py CHANGED
@@ -9,18 +9,11 @@ LINKS = """
 """
 
 INTRODUCTION_TEXT = """
-Online Mind2Web is a benchmark designed to evaluate real-world performance of web agents on online websites.
-
-
-## Tasks
-Online Mind2Web includes 300 tasks from 136 popular websites across various domains. It covers a diverse set of user tasks, to evaluate agents' performance in real-world environments.
-
-Tasks are categorized into three difficulty levels based on the steps human annotators need:
-- Easy: 1 - 5
-- Medium: 6 - 10
-- Hard: 11 +
+Online Mind2Web is a benchmark designed to evaluate the real-world performance of web agents on live websites, featuring 300 tasks across 136 popular sites in diverse domains. Based on the number of steps required by human annotators, tasks are divided into three difficulty levels: Easy (1–5 steps), Medium (6–10 steps), and Hard (11+ steps).
 
 ## Leaderboard
+
+We maintain two leaderboards: one for automated evaluation, conducted internally using participant-submitted trajectories, and another for human evaluation—agents will be included in the human-eval leaderboard after submitted results successfully pass our validation process.
 """
 
 SUBMISSION_TEXT = """