MINGYISU committed (verified)
Commit df126cf · 1 Parent(s): 22c598e

Update utils.py

Files changed (1)
  1. utils.py +7 -7
utils.py CHANGED
@@ -16,27 +16,27 @@ TASKS_V2 = ["V2-Overall", "V-CLS", "V-QA", "V-RET", "V-MRET", "VisDoc"]
 COLUMN_NAMES = BASE_COLS + TASKS_V1 + TASKS_V2

 DATA_TITLE_TYPE = ['number', 'markdown', 'str', 'markdown'] + \
- ['number'] * (len(TASKS_V1) + len(TASKS_V2))
+ ['number'] * len(TASKS_V1 + TASKS_V2)

 LEADERBOARD_INTRODUCTION = """
-# MMEB Leaderboard
+# 📊 **MMEB LEADERBOARD (V1 & V2)**

 ## Introduction
-We introduce a novel benchmark, MMEB (Massive Multimodal Embedding Benchmark),
+We introduce a novel benchmark, **MMEB-V1 (Massive Multimodal Embedding Benchmark)**,
 which includes 36 datasets spanning four meta-task categories: classification, visual question answering, retrieval, and visual grounding. MMEB provides a comprehensive framework for training
 and evaluating embedding models across various combinations of text and image modalities.
 All tasks are reformulated as ranking tasks, where the model follows instructions, processes a query, and selects the correct target from a set of candidates. The query and target can be an image, text,
-or a combination of both. MMEB is divided into 20 in-distribution datasets, which can be used for
+or a combination of both. MMEB-V1 is divided into 20 in-distribution datasets, which can be used for
 training, and 16 out-of-distribution datasets, reserved for evaluation.

-Building upon on **MMEB**, **MMEB-V2** expands the evaluation scope to include five new tasks: four video-based tasks
+Building upon on **MMEB-V1**, **MMEB-V2** expands the evaluation scope to include five new tasks: four video-based tasks
 — Video Retrieval, Moment Retrieval, Video Classification, and Video Question Answering — and one task focused on visual documents, Visual Document Retrieval.
 This comprehensive suite enables robust evaluation of multimodal embedding models across static, temporal, and structured visual data settings.

-| [**Overview**](https://tiger-ai-lab.github.io/VLM2Vec/) | [**Github**](https://github.com/TIGER-AI-Lab/VLM2Vec)
+| [**📈Overview**](https://tiger-ai-lab.github.io/VLM2Vec/) | [**Github**](https://github.com/TIGER-AI-Lab/VLM2Vec)
 | [**📖MMEB-V2/VLM2Vec-V2 Paper (TBA)**](https://arxiv.org/abs/2410.05160)
 | [**📖MMEB-V1/VLM2Vec-V1 Paper**](https://arxiv.org/abs/2410.05160)
-| [**Hugging Face**](https://huggingface.co/datasets/TIGER-Lab/MMEB-V2)
+| [**🤗Hugging Face**](https://huggingface.co/datasets/TIGER-Lab/MMEB-V2) |
 """

 TABLE_INTRODUCTION = """"""
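
The only functional change in this commit is the `DATA_TITLE_TYPE` continuation line; the rest edits the `LEADERBOARD_INTRODUCTION` docstring. Below is a minimal sketch, not part of the commit, illustrating why the refactor is behavior-preserving (for Python lists, `len(a + b) == len(a) + len(b)`) and that the per-column type list stays aligned with `COLUMN_NAMES`. `BASE_COLS` and `TASKS_V1` are hypothetical placeholders here; only `TASKS_V2` is visible in the hunk header.

```python
# Sketch only: BASE_COLS and TASKS_V1 are hypothetical placeholders,
# chosen so that BASE_COLS matches the four non-numeric title types.
BASE_COLS = ["Rank", "Models", "Model Size(B)", "Data Source"]           # hypothetical
TASKS_V1 = ["V1-Overall", "I-CLS", "I-QA", "I-RET", "I-VG"]              # hypothetical
TASKS_V2 = ["V2-Overall", "V-CLS", "V-QA", "V-RET", "V-MRET", "VisDoc"]  # from the hunk header

COLUMN_NAMES = BASE_COLS + TASKS_V1 + TASKS_V2

old = ['number'] * (len(TASKS_V1) + len(TASKS_V2))  # expression before the commit
new = ['number'] * len(TASKS_V1 + TASKS_V2)         # expression after the commit
assert old == new  # length of a concatenated list equals the sum of the lengths

DATA_TITLE_TYPE = ['number', 'markdown', 'str', 'markdown'] + new
# One type entry per leaderboard column, assuming BASE_COLS has exactly four entries.
assert len(DATA_TITLE_TYPE) == len(COLUMN_NAMES)
```

Either spelling yields the same list of `'number'` entries, one per task column; the new form is simply more compact.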