Update utils.py
utils.py CHANGED
@@ -16,27 +16,27 @@ TASKS_V2 = ["V2-Overall", "V-CLS", "V-QA", "V-RET", "V-MRET", "VisDoc"]
 COLUMN_NAMES = BASE_COLS + TASKS_V1 + TASKS_V2

 DATA_TITLE_TYPE = ['number', 'markdown', 'str', 'markdown'] + \
-    ['number'] *
+    ['number'] * len(TASKS_V1 + TASKS_V2)

 LEADERBOARD_INTRODUCTION = """
-# MMEB
+# 🏆 **MMEB LEADERBOARD (V1 & V2)**

 ## Introduction
-We introduce a novel benchmark, MMEB (Massive Multimodal Embedding Benchmark)
+We introduce a novel benchmark, **MMEB-V1 (Massive Multimodal Embedding Benchmark)**,
 which includes 36 datasets spanning four meta-task categories: classification, visual question answering, retrieval, and visual grounding. MMEB provides a comprehensive framework for training
 and evaluating embedding models across various combinations of text and image modalities.
 All tasks are reformulated as ranking tasks, where the model follows instructions, processes a query, and selects the correct target from a set of candidates. The query and target can be an image, text,
-or a combination of both. MMEB is divided into 20 in-distribution datasets, which can be used for
+or a combination of both. MMEB-V1 is divided into 20 in-distribution datasets, which can be used for
 training, and 16 out-of-distribution datasets, reserved for evaluation.

-Building upon on **MMEB**, **MMEB-V2** expands the evaluation scope to include five new tasks: four video-based tasks
+Building upon **MMEB-V1**, **MMEB-V2** expands the evaluation scope to include five new tasks: four video-based tasks
 – Video Retrieval, Moment Retrieval, Video Classification, and Video Question Answering – and one task focused on visual documents, Visual Document Retrieval.
 This comprehensive suite enables robust evaluation of multimodal embedding models across static, temporal, and structured visual data settings.

-| [
+| [**🌐Overview**](https://tiger-ai-lab.github.io/VLM2Vec/) | [**Github**](https://github.com/TIGER-AI-Lab/VLM2Vec)
 | [**📖MMEB-V2/VLM2Vec-V2 Paper (TBA)**](https://arxiv.org/abs/2410.05160)
 | [**📖MMEB-V1/VLM2Vec-V1 Paper**](https://arxiv.org/abs/2410.05160)
-| [
+| [**🤗Hugging Face**](https://huggingface.co/datasets/TIGER-Lab/MMEB-V2) |
 """

 TABLE_INTRODUCTION = """"""
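For context, a minimal sketch of the alignment this change restores: `DATA_TITLE_TYPE` needs one column type per entry of `COLUMN_NAMES`, and the old `['number'] *` expression was left incomplete. The values of `BASE_COLS` and `TASKS_V1` below are assumed placeholders (only `TASKS_V2` is visible in the hunk header), and the `gr.Dataframe` note is likewise an assumption about how the Space consumes these constants.

```python
# Minimal sketch (not the Space's actual file): how COLUMN_NAMES and
# DATA_TITLE_TYPE stay the same length after this change.

BASE_COLS = ["Rank", "Models", "Model Size(B)", "Data Source"]           # assumed placeholders
TASKS_V1 = ["V1-Overall", "I-CLS", "I-QA", "I-RET", "I-VG"]              # assumed placeholders
TASKS_V2 = ["V2-Overall", "V-CLS", "V-QA", "V-RET", "V-MRET", "VisDoc"]  # from the hunk header

COLUMN_NAMES = BASE_COLS + TASKS_V1 + TASKS_V2

# Four explicit types for the base columns, then one 'number' type per task
# column; len(TASKS_V1 + TASKS_V2) keeps both lists in sync if tasks change.
DATA_TITLE_TYPE = ['number', 'markdown', 'str', 'markdown'] + \
    ['number'] * len(TASKS_V1 + TASKS_V2)

assert len(DATA_TITLE_TYPE) == len(COLUMN_NAMES)

# Presumably these constants feed a Gradio table elsewhere in the Space,
# roughly along the lines of:
#   import gradio as gr
#   gr.Dataframe(headers=COLUMN_NAMES, datatype=DATA_TITLE_TYPE)
```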