Vintern-1B-v2-ViTable-docvqa

Report Link 👁️

Vintern-1B-v2-ViTable-docvqa is a fine-tuned version of the 5CD-AI/Vintern-1B-v2 multimodal model for Vietnamese document visual question answering (DocVQA) on table data.

Benchmarks

| Model | ANLS | Semantic Similarity | MLLM-as-judge (Gemini) |
|---|---|---|---|
| Gemini 1.5 Flash | 0.35 | 0.56 | 0.40 |
| Vintern-1B-v2 | 0.04 | 0.45 | 0.50 |
| Vintern-1B-v2-ViTable-docvqa | 0.50 | 0.71 | 0.59 |
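
For readers unfamiliar with the ANLS column, the sketch below shows how Average Normalized Levenshtein Similarity is commonly computed in DocVQA evaluations. This card does not say which implementation produced the scores above, so the 0.5 threshold, the lowercasing, and the whitespace stripping are assumptions taken from the usual definition. The function names are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def anls(predictions, references, threshold=0.5):
    """Average, over questions, of the best similarity to any accepted answer.

    predictions: list of predicted answer strings.
    references:  list of lists of accepted ground-truth answers.
    threshold:   assumed cut-off; a normalized distance at or above it scores 0.
    """
    scores = []
    for pred, answers in zip(predictions, references):
        p = pred.strip().lower()  # assumed normalization
        best = 0.0
        for gt in answers:
            g = gt.strip().lower()
            longest = max(len(p), len(g))
            distance = levenshtein(p, g) / longest if longest else 0.0
            similarity = 1.0 - distance if distance < threshold else 0.0
            best = max(best, similarity)
        scores.append(best)
    return sum(scores) / len(scores) if scores else 0.0
```

Under this definition a prediction must be character-for-character close to a reference to score at all. This is one plausible reason a model can score low on ANLS while scoring higher on the semantic-similarity and judge-based columns.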

Usage

Check out this 🤗 HF Demo, or open it in Colab:
Open In Colab
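
For local use, a minimal sketch follows. It assumes this checkpoint keeps the InternVL-style remote code of the base Vintern-1B-v2 model, which exposes a `chat` method when loaded with `trust_remote_code=True`. This card does not confirm that interface. The single 448x448 tile with ImageNet normalization is also a simplification of the base model's tiled preprocessing, so dense tables may need the full pipeline from the demo or Colab notebook. The helper names are illustrative.

```python
REPO_ID = "YuukiAsuna/Vintern-1B-v2-ViTable-docvqa"


def build_prompt(question: str) -> str:
    """Prefix the question with the image placeholder (assumed InternVL convention)."""
    return "<image>\n" + question.strip()


def load_pixel_values(image_path: str, size: int = 448):
    """Load one image as a single normalized tile (simplified preprocessing)."""
    # Heavy dependencies are imported lazily so build_prompt works without them.
    import torch
    import torchvision.transforms as T
    from PIL import Image

    transform = T.Compose([
        T.Resize((size, size), interpolation=T.InterpolationMode.BICUBIC),
        T.ToTensor(),
        T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
    ])
    image = Image.open(image_path).convert("RGB")
    # Add a batch dimension and match the BF16 weights.
    return transform(image).unsqueeze(0).to(torch.bfloat16)


def answer(image_path: str, question: str, max_new_tokens: int = 512) -> str:
    """Ask one question about one document image."""
    import torch
    from transformers import AutoModel, AutoTokenizer

    model = AutoModel.from_pretrained(
        REPO_ID, torch_dtype=torch.bfloat16, trust_remote_code=True
    ).eval()
    tokenizer = AutoTokenizer.from_pretrained(
        REPO_ID, trust_remote_code=True, use_fast=False
    )
    pixel_values = load_pixel_values(image_path)
    generation_config = dict(max_new_tokens=max_new_tokens, do_sample=False)
    # `chat` comes from the model's remote code (assumed, not a transformers API).
    return model.chat(tokenizer, pixel_values, build_prompt(question), generation_config)
```

Calling `answer("table.png", "Tổng doanh thu là bao nhiêu?")` with a hypothetical image path would return the model's answer as a string. On a GPU, move both the model and `pixel_values` to the device before calling `chat`.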

Citation:

@misc{doan2024vintern1befficientmultimodallarge,
      title={Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese}, 
      author={Khang T. Doan and Bao G. Huynh and Dung T. Hoang and Thuc D. Pham and Nhat H. Pham and Quan T. M. Nguyen and Bang Q. Vo and Suong N. Hoang},
      year={2024},
      eprint={2408.12480},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2408.12480}, 
}
Model size: 938M params (Safetensors, tensor type BF16)