---
datasets:
- homebrewltd/instruction-speech-whispervq-v2
language:
- en
license: apache-2.0
tags:
- sound language model
---

## Model Details

We have developed and released the [llama3s](https://huggingface.co/collections/homebrew-research/llama3-s-669df2139f0576abc6eb7405) model family, which natively understands both audio and text input.

We continued supervised fine-tuning of our last checkpoint [homebrewltd/...](...), using WhisperVQ as the tokenizer for audio files, on 2B tokens from the [Instruction Speech WhisperVQ v2](https://huggingface.co/datasets/homebrewltd/instruction-speech-whispervq-v2) dataset.

**Model developers** Homebrew Research.

**Input** Text and sound.

**Output** Text.

**Model Architecture** Llama-3.

**Language(s)** English.

## Intended Use

**Intended Use Cases** This family is primarily intended for research applications. This version aims to further improve the LLM's sound-understanding capabilities.

**Out-of-scope** The use of llama3-s in any manner that violates applicable laws or regulations is strictly prohibited.

## How to Get Started with the Model

First, we need to convert the audio file into sound tokens.

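The conversion snippet was left as a placeholder in this card, so the following is a minimal sketch. It assumes the [WhisperSpeech](https://github.com/collabora/WhisperSpeech) `vq_stoks` module and a pretrained WhisperVQ quantizer; the checkpoint name and the `<|sound_NNNN|>` token format are assumptions and may not match the exact tokenizer used for this model.

```python
import torch
import torchaudio
from whisperspeech.vq_stoks import RQBottleneckTransformer

device = "cuda" if torch.cuda.is_available() else "cpu"

# Assumed WhisperVQ checkpoint name; substitute the quantizer actually used for this model.
vq_model = RQBottleneckTransformer.load_model(
    "whisper-vq-stoks-medium-en+pl.model"
).to(device)

def audio_to_sound_tokens(audio_path: str) -> str:
    """Convert a speech file into a string of discrete sound tokens."""
    wav, sr = torchaudio.load(audio_path)
    if sr != 16000:
        # WhisperVQ expects 16 kHz audio.
        wav = torchaudio.functional.resample(wav, sr, 16000)
    with torch.no_grad():
        codes = vq_model.encode_audio(wav.to(device))
    codes = codes[0].cpu().tolist()
    # Assumed textual format of the sound tokens used during fine-tuning.
    tokens = "".join(f"<|sound_{c:04d}|>" for c in codes)
    return f"<|sound_start|>{tokens}<|sound_end|>"

sound_tokens = audio_to_sound_tokens("sample.wav")
```
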
Then, we can run inference on the model the same way as with any other LLM.

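The inference snippet was also left as a placeholder; below is a minimal sketch using the Hugging Face `transformers` text-generation pipeline. The repository id is taken from the citation URL at the bottom of this card and is an assumption about which checkpoint to load; adjust the generation settings as needed.

```python
import torch
from transformers import pipeline

# Assumed repository id (from the citation URL); replace with the checkpoint you want to run.
model_id = "homebrewltd/llama3.1-s-2024-08-15"

pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# `sound_tokens` is the string produced by the conversion step above.
messages = [{"role": "user", "content": sound_tokens}]

outputs = pipe(messages, max_new_tokens=256, do_sample=False)
print(outputs[0]["generated_text"][-1]["content"])
```
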
## Training Process

**Training Metrics Image**: Below is a snapshot of the training loss curve.

![training_loss](https://cdn-uploads.huggingface.co/production/uploads/65713d70f56f9538679e5a56/Mo_FGQvhkcHl3y1REf76f.png)

### Hardware

**GPU Configuration**: Cluster of 8x NVIDIA H100-SXM-80GB.
**GPU Usage**:
- **Continual Training**: 6 hours.

### Training Arguments

We utilize the [torchtune](https://github.com/pytorch/torchtune) library for its latest FSDP2 training code implementation.

| Parameter               | Continual Training |
|-------------------------|--------------------|
| **Epoch**               | 1                  |
| **Global batch size**   | 128                |
| **Learning Rate**       | 0.5e-4             |
| **Learning Scheduler**  | Cosine with warmup |
| **Optimizer**           | Adam torch fused   |
| **Warmup Ratio**        | 0.01               |
| **Weight Decay**        | 0.005              |
| **Max Sequence Length** | 1024               |

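For reference, here is a rough sketch of how these optimizer and scheduler settings map onto plain PyTorch code. The actual run used torchtune's FSDP2 recipe, so this is only illustrative, and the step count is an estimate (2B tokens ÷ (128 global batch × 1024 max sequence length) ≈ 15k steps).

```python
import torch
from transformers import AutoModelForCausalLM, get_cosine_schedule_with_warmup

# Base checkpoint being fine-tuned (see Acknowledgement); fused Adam requires CUDA parameters.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B-Instruct", torch_dtype=torch.bfloat16
).to("cuda")

# Estimated optimizer steps: 2B tokens / (128 * 1024 tokens per step).
total_steps = 15_000
warmup_steps = int(0.01 * total_steps)  # Warmup Ratio = 0.01

# "Adam torch fused" with Learning Rate = 0.5e-4 and Weight Decay = 0.005.
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=0.5e-4,
    weight_decay=0.005,
    fused=True,
)

# "Cosine with warmup" learning-rate schedule.
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=warmup_steps,
    num_training_steps=total_steps,
)
```
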
## Examples

1. Good example:

<details>
<summary>Click to toggle Example 1</summary>

```

```
</details>

<details>
<summary>Click to toggle Example 2</summary>

```

```
</details>

2. Misunderstanding example:

<details>
<summary>Click to toggle Example 3</summary>

```

```
</details>

3. Off-track example:

<details>
<summary>Click to toggle Example 4</summary>

```

```
</details>

## Citation Information

**BibTeX:**

```
@article{llama3s2024,
  title={Llama3-S: Sound Instruction Language Model},
  author={Homebrew Research},
  year={2024},
  month={August},
  url={https://huggingface.co/homebrewltd/llama3.1-s-2024-08-15}
}
```

## Acknowledgement

- **[WhisperSpeech](https://github.com/collabora/WhisperSpeech)**

- **[Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)**