Commit ad604a4
Parent(s): 5105bab

simplify content

content.py: +3 -10
content.py
CHANGED
@@ -9,18 +9,11 @@ LINKS = """
 """
 
 INTRODUCTION_TEXT = """
-Online Mind2Web is a benchmark designed to evaluate real-world performance of web agents on
-
-
-## Tasks
-Online Mind2Web includes 300 tasks from 136 popular websites across various domains. It covers a diverse set of user tasks, to evaluate agents' performance in real-world environments.
-
-Tasks are categorized into three difficulty levels based on the steps human annotators need:
-- Easy: 1 - 5
-- Medium: 6 - 10
-- Hard: 11 +
+Online Mind2Web is a benchmark designed to evaluate the real-world performance of web agents on live websites, featuring 300 tasks across 136 popular sites in diverse domains. Based on the number of steps required by human annotators, tasks are divided into three difficulty levels: Easy (1–5 steps), Medium (6–10 steps), and Hard (11+ steps).
 
 ## Leaderboard
+
+We maintain two leaderboards: one for automated evaluation, conducted internally using participant-submitted trajectories, and another for human evaluation—agents will be included in the human-eval leaderboard after submitted results successfully pass our validation process.
 """
 
 SUBMISSION_TEXT = """