license: apache-2.0
language:
- en
tags:
- Multimodal
- StableLM
datasets:
- LDJnr/LessWrong-Amplify-Instruct
- LDJnr/Pure-Dove
- LDJnr/Verified-Camel
Obsidian: Multimodal LLM for Everyone
Model Name: Obsidian-3B-V0.5
Obsidian is a brand new series of Multimodal Language Models. The current version is effectively a multimodal version of Capybara-3B, as LLaVA is to Vicuna.
Obsidian-3B-V0.5 is a LLaVA-style model based on Capybara-3B-V1.9, which was built on top of StableLM-3B-4e1t. Capybara-3B-V1.9 achieves state-of-the-art performance compared to models of similar size, and even beats some 7B models.
Current finetuning and inference code is available on our GitHub repo: Here
Acknowledgement
Obsidian-3B-V0.5 was developed and finetuned by Nous Research in collaboration with Virtual Interactive. Special thanks to LDJ for the wonderful Capybara dataset, and to qnguyen3 for the model training procedure.
Model Training
Obsidian-3B-V0.5 followed the same training procedure as LLaVA 1.5.
Prompt Format
The model follows the ChatML format, but with ### as the separator:
<|im_start|>user
What is this sign about?\n<image>
###
<|im_start|>assistant
The sign is about bullying, and it is placed on a black background with a red background.
###
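The prompt layout above can be assembled with a small helper. This is a minimal sketch, not code from the Obsidian repo: the names `build_prompt` and `extract_reply` are hypothetical, the newline placement is inferred from the example (reading the `\n` before `<image>` as a line break), and cutting the output at the first `###` is an assumption based on the separator closing each turn.

```python
SEPARATOR = "###"        # turn separator described in the prompt format
IMAGE_TOKEN = "<image>"  # image placeholder shown in the example prompt


def build_prompt(question: str, with_image: bool = True) -> str:
    """Format a single-turn prompt (hypothetical helper, layout inferred)."""
    # The example places the image placeholder on the line after the question.
    content = f"{question}\n{IMAGE_TOKEN}" if with_image else question
    return (
        f"<|im_start|>user\n{content}\n{SEPARATOR}\n"
        f"<|im_start|>assistant\n"
    )


def extract_reply(generated: str) -> str:
    """Keep only the text before the first separator (assumed stop marker)."""
    return generated.split(SEPARATOR, 1)[0].strip()


if __name__ == "__main__":
    print(build_prompt("What is this sign about?"))
```

The string returned by `build_prompt` would be passed to whatever inference code is used, and `extract_reply` trims the generated text at the separator.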
Benchmarks
Coming Soon!