Update utils.py
utils.py CHANGED
@@ -16,27 +16,27 @@ TASKS_V2 = ["V2-Overall", "V-CLS", "V-QA", "V-RET", "V-MRET", "VisDoc"]
 COLUMN_NAMES = BASE_COLS + TASKS_V1 + TASKS_V2

 DATA_TITLE_TYPE = ['number', 'markdown', 'str', 'markdown'] + \
-    ['number'] *
+    ['number'] * len(TASKS_V1 + TASKS_V2)

 LEADERBOARD_INTRODUCTION = """
-# MMEB
+# 🏆 **MMEB LEADERBOARD (V1 & V2)**

 ## Introduction
-We introduce a novel benchmark, MMEB (Massive Multimodal Embedding Benchmark)
+We introduce a novel benchmark, **MMEB-V1 (Massive Multimodal Embedding Benchmark)**,
 which includes 36 datasets spanning four meta-task categories: classification, visual question answering, retrieval, and visual grounding. MMEB provides a comprehensive framework for training
 and evaluating embedding models across various combinations of text and image modalities.
 All tasks are reformulated as ranking tasks, where the model follows instructions, processes a query, and selects the correct target from a set of candidates. The query and target can be an image, text,
-or a combination of both. MMEB is divided into 20 in-distribution datasets, which can be used for
+or a combination of both. MMEB-V1 is divided into 20 in-distribution datasets, which can be used for
 training, and 16 out-of-distribution datasets, reserved for evaluation.

-Building upon on **MMEB**, **MMEB-V2** expands the evaluation scope to include five new tasks: four video-based tasks
+Building upon **MMEB-V1**, **MMEB-V2** expands the evaluation scope to include five new tasks: four video-based tasks
 – Video Retrieval, Moment Retrieval, Video Classification, and Video Question Answering – and one task focused on visual documents, Visual Document Retrieval.
 This comprehensive suite enables robust evaluation of multimodal embedding models across static, temporal, and structured visual data settings.

-| [
+| [**🌐Overview**](https://tiger-ai-lab.github.io/VLM2Vec/) | [**Github**](https://github.com/TIGER-AI-Lab/VLM2Vec)
 | [**📖MMEB-V2/VLM2Vec-V2 Paper (TBA)**](https://arxiv.org/abs/2410.05160)
 | [**📖MMEB-V1/VLM2Vec-V1 Paper**](https://arxiv.org/abs/2410.05160)
-| [
+| [**🤗Hugging Face**](https://huggingface.co/datasets/TIGER-Lab/MMEB-V2) |
 """

 TABLE_INTRODUCTION = """"""
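For context, a minimal sketch of the alignment this change restores: `DATA_TITLE_TYPE` needs one column type per entry of `COLUMN_NAMES`, and the old `['number'] *` expression was left incomplete. The values of `BASE_COLS` and `TASKS_V1` below are assumed placeholders (only `TASKS_V2` is visible in the hunk header), and the `gr.Dataframe` note is likewise an assumption about how the Space consumes these constants.

```python
# Minimal sketch (not the Space's actual file): how COLUMN_NAMES and
# DATA_TITLE_TYPE stay the same length after this change.

BASE_COLS = ["Rank", "Models", "Model Size(B)", "Data Source"]           # assumed placeholders
TASKS_V1 = ["V1-Overall", "I-CLS", "I-QA", "I-RET", "I-VG"]              # assumed placeholders
TASKS_V2 = ["V2-Overall", "V-CLS", "V-QA", "V-RET", "V-MRET", "VisDoc"]  # from the hunk header

COLUMN_NAMES = BASE_COLS + TASKS_V1 + TASKS_V2

# Four explicit types for the base columns, then one 'number' type per task
# column; len(TASKS_V1 + TASKS_V2) keeps both lists in sync if tasks change.
DATA_TITLE_TYPE = ['number', 'markdown', 'str', 'markdown'] + \
    ['number'] * len(TASKS_V1 + TASKS_V2)

assert len(DATA_TITLE_TYPE) == len(COLUMN_NAMES)

# Presumably these constants feed a Gradio table elsewhere in the Space,
# roughly along the lines of:
#   import gradio as gr
#   gr.Dataframe(headers=COLUMN_NAMES, datatype=DATA_TITLE_TYPE)
```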