---
license: mit
---

We adopted the official [LLaVA-NeXT](https://github.com/LLaVA-VL/LLaVA-NeXT) framework and the official training dataset [LLaVA-NeXT-Data](https://huggingface.co/datasets/lmms-lab/LLaVA-NeXT-Data) to evaluate these foundational vision models.

| Vision Tower                 | RoPE2D | ChartQA   | DocVQA    | InfoVQA   | OCRBench   | MMMU      |
| :--------------------------- | :----: | :-------- | :-------- | :-------- | :--------- | :-------- |
| CLIP (ViT-L-14-336px)        |   ×    | 66.52     | 75.21     | 38.88     | 525.00     | 44.20     |
| SigLIP (ViT-SO400M-384px)    |   ×    | 69.28     | 76.71     | 41.38     | 554.00     | 46.78     |
| DFN5B (ViT-H-14-378px)       |   ×    | 64.36     | 70.87     | 38.59     | 473.00     | **48.00** |
| **MLCD (ViT-L-14-336px)**    |   ×    | 67.84     | 76.46     | 43.48     | 531.00     | 44.30     |
| **MLCD (ViT-bigG-14-336px)** |   √    | **71.07** | **79.63** | **44.38** | **572.00** | 46.78     |
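
As a reference point, the sketch below shows how a vision tower such as MLCD can be loaded as a standalone encoder with Hugging Face Transformers and queried for patch features, assuming the checkpoint is published in a CLIP-compatible format. The repo id used here is an assumption for illustration, not a pointer to an official release.

```python
import torch
from PIL import Image
from transformers import CLIPImageProcessor, CLIPVisionModel

# Assumed checkpoint id; replace with the MLCD vision tower actually under evaluation.
MODEL_ID = "DeepGlint-AI/mlcd-vit-large-patch14-336"

processor = CLIPImageProcessor.from_pretrained(MODEL_ID)
model = CLIPVisionModel.from_pretrained(MODEL_ID).eval()

# A dummy 336x336 RGB image keeps the sketch self-contained.
image = Image.new("RGB", (336, 336))
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# LLaVA-style projectors usually consume the penultimate layer's patch tokens,
# dropping the leading CLS token.
patch_features = outputs.hidden_states[-2][:, 1:, :]
print(patch_features.shape)  # (1, num_patches, hidden_size)
```

Within the LLaVA-NeXT codebase itself, the encoder is normally selected through the training configuration (e.g. a `--vision_tower` argument pointing at such a checkpoint); see the linked repository for the exact training recipe.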