lbourdois committed · Commit 568cef6 · verified · 1 Parent(s): 5b09759

Improve language tag

Hi! Since the model is multilingual, this PR adds languages other than English to the language tag to improve discoverability. Note that 29 languages are announced in the README, but only 13 are explicitly listed, so I was only able to add those 13.
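
As a quick check that the tag change pays off: once merged, the model should surface in language-filtered searches on the Hub. The snippet below is only a sketch; it assumes a recent `huggingface_hub` release in which `list_models()` exposes a `language` filter, and it reuses the `fra` code exactly as written in the card metadata.

```python
# Sketch: verify the model is discoverable by language tag after the merge.
# Assumes a recent huggingface_hub where list_models() accepts `language`.
from huggingface_hub import HfApi

api = HfApi()
for model in api.list_models(language="fra", author="Emova-ollm", limit=20):
    print(model.id)
```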

Files changed (1)
  1. README.md +65 -51
README.md CHANGED
@@ -1,52 +1,66 @@
- ---
- library_name: transformers
- license: apache-2.0
- base_model:
- - Qwen/Qwen2.5-7B-Instruct
- ---
-
- # Qwen2.5-7B-Instruct-Add-Speech-Token-4096-Nostrip
-
- ## Introduction
-
- This repo contains the **Qwen2.5-7B-Instruct-Add-Speech-Token-4096-Nostrip** model utilized to train the [EMOVA](https://huggingface.co/collections/Emova-ollm/emova-models-67779d377bb8261e6057a320) series of models. Based on the original [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) checkpoint, we further insert speech tokens into its vocabulary for end-to-end omni-modal alignment as follows. The total number of speech tokens in the [EMOVA speech tokenizer](https://huggingface.co/Emova-ollm/emova_speech_tokenizer_hf) is 4096. Therefore, it should be used as the initialization for **Stage 2: Omni-modal text-centric alignment** of EMOVA training.
-
- ```bash
- # Source code can be found at https://github.com/emova-ollm/EMOVA#insert-speech-tokens-into-llm-vocabulary
- python scripts/insert_speech_token.py \
-   --origin_model_path Qwen/Qwen2.5-7B-Instruct \
-   --saved_model_path ./Qwen2.5-7B-Instruct_add_speech_token_4096_nostrip \
-   --num_speech_tokens 4096
- ```
-
- ## Usage
-
- To train EMOVA with Qwen2.5-7B-Instruct_add_speech_token_4096_nostrip, we need to create a new model config and set the **language_model** parameters as follows. An example is provided [here](https://github.com/emova-ollm/EMOVA/blob/main/configs/_base_/models/qwen2_5_qwen2vit.py). See our [GitHub repo](https://github.com/emova-ollm/EMOVA#training-emova) for more details on training EMOVA.
-
-
- ```python
- language_model=dict(
-     type='EmovaQwen2ForCausalLM',  # Wrapper class type for EMOVA
-     pretrained_model_name_or_path='Emova-ollm/Qwen2.5-7B-Instruct_add_speech_token_4096_nostrip',  # HuggingFace repo of the pre-trained LLM
-     attn_implementation="flash_attention_2",  # Attention implementation
-     from_pretrained=True,  # Load pre-trained weights
- ),
- ```
-
- ## Citation
-
- ```bibtex
- @article{chen2024emova,
-   title={Emova: Empowering language models to see, hear and speak with vivid emotions},
-   author={Chen, Kai and Gou, Yunhao and Huang, Runhui and Liu, Zhili and Tan, Daxin and Xu, Jing and Wang, Chunwei and Zhu, Yi and Zeng, Yihan and Yang, Kuo and others},
-   journal={arXiv preprint arXiv:2409.18042},
-   year={2024}
- }
-
- @article{qwen2.5,
-   title = {Qwen2.5 Technical Report},
-   author = {An Yang and Baosong Yang and Beichen Zhang and Binyuan Hui and Bo Zheng and Bowen Yu and Chengyuan Li and Dayiheng Liu and Fei Huang and Haoran Wei and Huan Lin and Jian Yang and Jianhong Tu and Jianwei Zhang and Jianxin Yang and Jiaxi Yang and Jingren Zhou and Junyang Lin and Kai Dang and Keming Lu and Keqin Bao and Kexin Yang and Le Yu and Mei Li and Mingfeng Xue and Pei Zhang and Qin Zhu and Rui Men and Runji Lin and Tianhao Li and Tingyu Xia and Xingzhang Ren and Xuancheng Ren and Yang Fan and Yang Su and Yichang Zhang and Yu Wan and Yuqiong Liu and Zeyu Cui and Zhenru Zhang and Zihan Qiu},
-   journal = {arXiv preprint arXiv:2412.15115},
-   year = {2024}
- }
+ ---
+ library_name: transformers
+ license: apache-2.0
+ base_model:
+ - Qwen/Qwen2.5-7B-Instruct
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ ---
+
+ # Qwen2.5-7B-Instruct-Add-Speech-Token-4096-Nostrip
+
+ ## Introduction
+
+ This repo contains the **Qwen2.5-7B-Instruct-Add-Speech-Token-4096-Nostrip** model utilized to train the [EMOVA](https://huggingface.co/collections/Emova-ollm/emova-models-67779d377bb8261e6057a320) series of models. Based on the original [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) checkpoint, we further insert speech tokens into its vocabulary for end-to-end omni-modal alignment as follows. The total number of speech tokens in the [EMOVA speech tokenizer](https://huggingface.co/Emova-ollm/emova_speech_tokenizer_hf) is 4096. Therefore, it should be used as the initialization for **Stage 2: Omni-modal text-centric alignment** of EMOVA training.
+
+ ```bash
+ # Source code can be found at https://github.com/emova-ollm/EMOVA#insert-speech-tokens-into-llm-vocabulary
+ python scripts/insert_speech_token.py \
+   --origin_model_path Qwen/Qwen2.5-7B-Instruct \
+   --saved_model_path ./Qwen2.5-7B-Instruct_add_speech_token_4096_nostrip \
+   --num_speech_tokens 4096
+ ```
+
+ ## Usage
+
+ To train EMOVA with Qwen2.5-7B-Instruct_add_speech_token_4096_nostrip, we need to create a new model config and set the **language_model** parameters as follows. An example is provided [here](https://github.com/emova-ollm/EMOVA/blob/main/configs/_base_/models/qwen2_5_qwen2vit.py). See our [GitHub repo](https://github.com/emova-ollm/EMOVA#training-emova) for more details on training EMOVA.
+
+
+ ```python
+ language_model=dict(
+     type='EmovaQwen2ForCausalLM',  # Wrapper class type for EMOVA
+     pretrained_model_name_or_path='Emova-ollm/Qwen2.5-7B-Instruct_add_speech_token_4096_nostrip',  # HuggingFace repo of the pre-trained LLM
+     attn_implementation="flash_attention_2",  # Attention implementation
+     from_pretrained=True,  # Load pre-trained weights
+ ),
+ ```
+
+ ## Citation
+
+ ```bibtex
+ @article{chen2024emova,
+   title={Emova: Empowering language models to see, hear and speak with vivid emotions},
+   author={Chen, Kai and Gou, Yunhao and Huang, Runhui and Liu, Zhili and Tan, Daxin and Xu, Jing and Wang, Chunwei and Zhu, Yi and Zeng, Yihan and Yang, Kuo and others},
+   journal={arXiv preprint arXiv:2409.18042},
+   year={2024}
+ }
+
+ @article{qwen2.5,
+   title = {Qwen2.5 Technical Report},
+   author = {An Yang and Baosong Yang and Beichen Zhang and Binyuan Hui and Bo Zheng and Bowen Yu and Chengyuan Li and Dayiheng Liu and Fei Huang and Haoran Wei and Huan Lin and Jian Yang and Jianhong Tu and Jianwei Zhang and Jianxin Yang and Jiaxi Yang and Jingren Zhou and Junyang Lin and Kai Dang and Keming Lu and Keqin Bao and Kexin Yang and Le Yu and Mei Li and Mingfeng Xue and Pei Zhang and Qin Zhu and Rui Men and Runji Lin and Tianhao Li and Tingyu Xia and Xingzhang Ren and Xuancheng Ren and Yang Fan and Yang Su and Yichang Zhang and Yu Wan and Yuqiong Liu and Zeyu Cui and Zhenru Zhang and Zihan Qiu},
+   journal = {arXiv preprint arXiv:2412.15115},
+   year = {2024}
+ }
  ```
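
For context, here is what the vocabulary-insertion step described in this README roughly boils down to in plain `transformers`. This is an unofficial sketch, not the project's `scripts/insert_speech_token.py` (linked in the README); the placeholder token strings (`<|speech_0|>` … `<|speech_4095|>`) and reading the "nostrip" suffix as adding tokens with `lstrip`/`rstrip` disabled are assumptions on my part.

```python
# Unofficial sketch: insert 4096 discrete speech tokens into the
# Qwen2.5-7B-Instruct vocabulary. Token naming and the "nostrip"
# interpretation are assumptions; see scripts/insert_speech_token.py
# in the EMOVA repo for the actual implementation.
from transformers import AddedToken, AutoModelForCausalLM, AutoTokenizer

base = "Qwen/Qwen2.5-7B-Instruct"
out_dir = "./Qwen2.5-7B-Instruct_add_speech_token_4096_nostrip"
num_speech_tokens = 4096

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Hypothetical placeholder strings for the speech tokenizer's 4096 units;
# lstrip=False / rstrip=False keeps surrounding whitespace intact ("nostrip").
speech_tokens = [
    AddedToken(f"<|speech_{i}|>", lstrip=False, rstrip=False)
    for i in range(num_speech_tokens)
]

num_added = tokenizer.add_tokens(speech_tokens)
model.resize_token_embeddings(len(tokenizer))  # grow the embedding matrix
print(f"added {num_added} tokens, new vocab size: {len(tokenizer)}")

tokenizer.save_pretrained(out_dir)
model.save_pretrained(out_dir)
```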