lbourdois committed · Commit 568cef6 · verified · 1 Parent(s): 5b09759

Improve language tag

Hi! Since the model is multilingual, this PR adds languages other than English to the language tag to improve discoverability. Note that 29 languages are announced in the README, but only 13 are explicitly listed, so I was only able to add those 13.
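
As a quick check that the tag change pays off: once merged, the model should surface in language-filtered searches on the Hub. The snippet below is only a sketch; it assumes a recent `huggingface_hub` release in which `list_models()` exposes a `language` filter, and it reuses the `fra` code exactly as written in the card metadata.

```python
# Sketch: verify the model is discoverable by language tag after the merge.
# Assumes a recent huggingface_hub where list_models() accepts `language`.
from huggingface_hub import HfApi

api = HfApi()
for model in api.list_models(language="fra", author="Emova-ollm", limit=20):
    print(model.id)
```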

Files changed (1)
  1. README.md +65 -51
README.md CHANGED
@@ -1,52 +1,66 @@
- ---
- library_name: transformers
- license: apache-2.0
- base_model:
- - Qwen/Qwen2.5-7B-Instruct
- ---
-
- # Qwen2.5-7B-Instruct-Add-Speech-Token-4096-Nostrip
-
- ## Introduction
-
- This repo contains the **Qwen2.5-7B-Instruct-Add-Speech-Token-4096-Nostrip** model utilized to train the [EMOVA](https://huggingface.co/collections/Emova-ollm/emova-models-67779d377bb8261e6057a320) series of models. Based on the original [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) checkpoint, we further insert speech tokens into its vocabulary for end-to-end omni-modal alignment as follows. The total number of speech tokens in the [EMOVA speech tokenizer](https://huggingface.co/Emova-ollm/emova_speech_tokenizer_hf) is 4096. Therefore, it should be used as the initialization for **Stage 2: Omni-modal text-centric alignment** of EMOVA training.
-
- ```bash
- # Source code can be found at https://github.com/emova-ollm/EMOVA#insert-speech-tokens-into-llm-vocabulary
- python scripts/insert_speech_token.py \
-   --origin_model_path Qwen/Qwen2.5-7B-Instruct \
-   --saved_model_path ./Qwen2.5-7B-Instruct_add_speech_token_4096_nostrip \
-   --num_speech_tokens 4096
- ```
-
- ## Usage
-
- To train EMOVA with Qwen2.5-7B-Instruct_add_speech_token_4096_nostrip, we need to create a new model config and set the **language_model** parameters as follows. An example is provided [here](https://github.com/emova-ollm/EMOVA/blob/main/configs/_base_/models/qwen2_5_qwen2vit.py). See our [GitHub repo](https://github.com/emova-ollm/EMOVA#training-emova) for more details on training EMOVA.
-
-
- ```python
- language_model=dict(
-     type='EmovaQwen2ForCausalLM',  # Wrapper class type for EMOVA
-     pretrained_model_name_or_path='Emova-ollm/Qwen2.5-7B-Instruct_add_speech_token_4096_nostrip',  # HuggingFace repo of the pre-trained LLM
-     attn_implementation="flash_attention_2",  # Attention implementation
-     from_pretrained=True,  # Load pre-trained weights
- ),
- ```
-
- ## Citation
-
- ```bibtex
- @article{chen2024emova,
-   title={Emova: Empowering language models to see, hear and speak with vivid emotions},
-   author={Chen, Kai and Gou, Yunhao and Huang, Runhui and Liu, Zhili and Tan, Daxin and Xu, Jing and Wang, Chunwei and Zhu, Yi and Zeng, Yihan and Yang, Kuo and others},
-   journal={arXiv preprint arXiv:2409.18042},
-   year={2024}
- }
-
- @article{qwen2.5,
-   title = {Qwen2.5 Technical Report},
-   author = {An Yang and Baosong Yang and Beichen Zhang and Binyuan Hui and Bo Zheng and Bowen Yu and Chengyuan Li and Dayiheng Liu and Fei Huang and Haoran Wei and Huan Lin and Jian Yang and Jianhong Tu and Jianwei Zhang and Jianxin Yang and Jiaxi Yang and Jingren Zhou and Junyang Lin and Kai Dang and Keming Lu and Keqin Bao and Kexin Yang and Le Yu and Mei Li and Mingfeng Xue and Pei Zhang and Qin Zhu and Rui Men and Runji Lin and Tianhao Li and Tingyu Xia and Xingzhang Ren and Xuancheng Ren and Yang Fan and Yang Su and Yichang Zhang and Yu Wan and Yuqiong Liu and Zeyu Cui and Zhenru Zhang and Zihan Qiu},
-   journal = {arXiv preprint arXiv:2412.15115},
-   year = {2024}
- }
+ ---
+ library_name: transformers
+ license: apache-2.0
+ base_model:
+ - Qwen/Qwen2.5-7B-Instruct
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ ---
+
+ # Qwen2.5-7B-Instruct-Add-Speech-Token-4096-Nostrip
+
+ ## Introduction
+
+ This repo contains the **Qwen2.5-7B-Instruct-Add-Speech-Token-4096-Nostrip** model utilized to train the [EMOVA](https://huggingface.co/collections/Emova-ollm/emova-models-67779d377bb8261e6057a320) series of models. Based on the original [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) checkpoint, we further insert speech tokens into its vocabulary for end-to-end omni-modal alignment as follows. The total number of speech tokens in the [EMOVA speech tokenizer](https://huggingface.co/Emova-ollm/emova_speech_tokenizer_hf) is 4096. Therefore, it should be used as the initialization for **Stage 2: Omni-modal text-centric alignment** of EMOVA training.
+
+ ```bash
+ # Source code can be found at https://github.com/emova-ollm/EMOVA#insert-speech-tokens-into-llm-vocabulary
+ python scripts/insert_speech_token.py \
+   --origin_model_path Qwen/Qwen2.5-7B-Instruct \
+   --saved_model_path ./Qwen2.5-7B-Instruct_add_speech_token_4096_nostrip \
+   --num_speech_tokens 4096
+ ```
+
+ ## Usage
+
+ To train EMOVA with Qwen2.5-7B-Instruct_add_speech_token_4096_nostrip, we need to create a new model config and set the **language_model** parameters as follows. An example is provided [here](https://github.com/emova-ollm/EMOVA/blob/main/configs/_base_/models/qwen2_5_qwen2vit.py). See our [GitHub repo](https://github.com/emova-ollm/EMOVA#training-emova) for more details on training EMOVA.
+
+
+ ```python
+ language_model=dict(
+     type='EmovaQwen2ForCausalLM',  # Wrapper class type for EMOVA
+     pretrained_model_name_or_path='Emova-ollm/Qwen2.5-7B-Instruct_add_speech_token_4096_nostrip',  # HuggingFace repo of the pre-trained LLM
+     attn_implementation="flash_attention_2",  # Attention implementation
+     from_pretrained=True,  # Load pre-trained weights
+ ),
+ ```
+
+ ## Citation
+
+ ```bibtex
+ @article{chen2024emova,
+   title={Emova: Empowering language models to see, hear and speak with vivid emotions},
+   author={Chen, Kai and Gou, Yunhao and Huang, Runhui and Liu, Zhili and Tan, Daxin and Xu, Jing and Wang, Chunwei and Zhu, Yi and Zeng, Yihan and Yang, Kuo and others},
+   journal={arXiv preprint arXiv:2409.18042},
+   year={2024}
+ }
+
+ @article{qwen2.5,
+   title = {Qwen2.5 Technical Report},
+   author = {An Yang and Baosong Yang and Beichen Zhang and Binyuan Hui and Bo Zheng and Bowen Yu and Chengyuan Li and Dayiheng Liu and Fei Huang and Haoran Wei and Huan Lin and Jian Yang and Jianhong Tu and Jianwei Zhang and Jianxin Yang and Jiaxi Yang and Jingren Zhou and Junyang Lin and Kai Dang and Keming Lu and Keqin Bao and Kexin Yang and Le Yu and Mei Li and Mingfeng Xue and Pei Zhang and Qin Zhu and Rui Men and Runji Lin and Tianhao Li and Tingyu Xia and Xingzhang Ren and Xuancheng Ren and Yang Fan and Yang Su and Yichang Zhang and Yu Wan and Yuqiong Liu and Zeyu Cui and Zhenru Zhang and Zihan Qiu},
+   journal = {arXiv preprint arXiv:2412.15115},
+   year = {2024}
+ }
  ```
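
For context, here is what the vocabulary-insertion step described in this README roughly boils down to in plain `transformers`. This is an unofficial sketch, not the project's `scripts/insert_speech_token.py` (linked in the README); the placeholder token strings (`<|speech_0|>` … `<|speech_4095|>`) and reading the "nostrip" suffix as adding tokens with `lstrip`/`rstrip` disabled are assumptions on my part.

```python
# Unofficial sketch: insert 4096 discrete speech tokens into the
# Qwen2.5-7B-Instruct vocabulary. Token naming and the "nostrip"
# interpretation are assumptions; see scripts/insert_speech_token.py
# in the EMOVA repo for the actual implementation.
from transformers import AddedToken, AutoModelForCausalLM, AutoTokenizer

base = "Qwen/Qwen2.5-7B-Instruct"
out_dir = "./Qwen2.5-7B-Instruct_add_speech_token_4096_nostrip"
num_speech_tokens = 4096

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Hypothetical placeholder strings for the speech tokenizer's 4096 units;
# lstrip=False / rstrip=False keeps surrounding whitespace intact ("nostrip").
speech_tokens = [
    AddedToken(f"<|speech_{i}|>", lstrip=False, rstrip=False)
    for i in range(num_speech_tokens)
]

num_added = tokenizer.add_tokens(speech_tokens)
model.resize_token_embeddings(len(tokenizer))  # grow the embedding matrix
print(f"added {num_added} tokens, new vocab size: {len(tokenizer)}")

tokenizer.save_pretrained(out_dir)
model.save_pretrained(out_dir)
```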