---
title: UnlimitedMusicGen
emoji: 🎼
colorFrom: gray
colorTo: red
sdk: gradio
sdk_version: 5.39.0
python_version: 3.12.8
app_file: app.py
pinned: true
license: creativeml-openrail-m
tags:
- mcp-server-track
- musicgen
- unlimited
- user history
- metadata
hf_oauth: true
disable_embedding: true
short_description: 'unlimited Audio generation with a few added features '
thumbnail: >-
 https://cdn-uploads.huggingface.co/production/uploads/6346595c9e5f0fe83fc60444/Z8E8OaKV84zuVAvvGpMDJ.png
---

[arxiv]: https://arxiv.org/abs/2306.05284
[musicgen_samples]: https://ai.honu.io/papers/musicgen/

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# UnlimitedMusicGen
Charles Fettinger's modification of the Audiocraft project to enable unlimited audio generation. I have added a few features to the original project to enable this, and a few features to the Gradio interface to make it easier to use.

Please review my other AI-related spaces at https://huggingface.co/Surn

Check your video's generative metadata with https://mediaarea.net/en/MediaInfo

Also note that I wrote an extension to Gradio for the waveform in the video after v4.48.0 removed it.

The key update here is in the extend utility. We segment the melody input and then condition the next segment with the current tensors and the tensors from the corresponding time in the conditioning melody file.
This allows us to follow the arrangement of the original melody.

**Thank you Hugging Face for the community grant to run this project!**

## Key Features

- **Unlimited Audio Generation**: Generate music of any length by seamlessly stitching together segments
- **User History**: Save and manage your generated music and access it later
- **File Storage**: Generated files are automatically stored in a Hugging Face repository with shareable URLs
- **Rich Metadata**: Each generated file includes detailed metadata about the generation parameters
- **API Access**: Generate music programmatically using the REST API
- **Background Customization**: Use custom images and settings for your music videos
- **Melody Conditioning**: Use existing music to guide the generation process

# Audiocraft
![docs badge](https://github.com/facebookresearch/audiocraft/workflows/audiocraft_docs/badge.svg)
![linter badge](https://github.com/facebookresearch/audiocraft/workflows/audiocraft_linter/badge.svg)
![tests badge](https://github.com/facebookresearch/audiocraft/workflows/audiocraft_tests/badge.svg)

Audiocraft is a PyTorch library for deep learning research on audio generation. At the moment, it contains the code for MusicGen, a state-of-the-art controllable text-to-music model.

## MusicGen

Audiocraft provides the code and models for MusicGen, [a simple and controllable model for music generation][arxiv]. MusicGen is a single-stage auto-regressive
Transformer model trained over a 32kHz <a href="https://github.com/facebookresearch/encodec">EnCodec tokenizer</a> with 4 codebooks sampled at 50 Hz. Unlike existing methods like [MusicLM](https://arxiv.org/abs/2301.11325), MusicGen doesn't require a self-supervised semantic representation, and it generates
all 4 codebooks in one pass. By introducing a small delay between the codebooks, we show we can predict
them in parallel, thus having only 50 auto-regressive steps per second of audio.
Check out our [sample page][musicgen_samples] or test the available demo!

<a target="_blank" href="https://colab.research.google.com/drive/1-Xe9NCdIs2sCUbiSmwHXozK6AAhMm7_i?usp=sharing">
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>
<a target="_blank" href="https://huggingface.co/spaces/facebook/MusicGen">
<img src="https://huggingface.co/datasets/huggingface/badges/raw/main/open-in-hf-spaces-sm.svg" alt="Open in Hugging Face"/>
</a>
<br>

We use 20K hours of licensed music to train MusicGen. Specifically, we rely on an internal dataset of 10K high-quality music tracks, and on the ShutterStock and Pond5 music data.

## Installation
Audiocraft requires Python 3.9, PyTorch 2.1.0, and a GPU with at least 16 GB of memory (for the medium-sized model). To install Audiocraft, you can run the following:

```shell
# Best to make sure you have torch installed first, in particular before installing xformers.
# Don't run this if you already have PyTorch installed.
pip install 'torch>=2.1'
# Then proceed to one of the following
pip install -U audiocraft  # stable release
pip install -U git+https://git@github.com/facebookresearch/audiocraft#egg=audiocraft  # bleeding edge
pip install -e .  # or if you cloned the repo locally
```
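
As an optional sanity check (assuming the install succeeded and a CUDA-capable GPU is present), you can confirm the package imports and that PyTorch sees the GPU:

```python
# Optional sanity check after installation.
import torch
import audiocraft

print(audiocraft.__version__)     # installed audiocraft version
print(torch.cuda.is_available())  # True if a CUDA GPU is visible
```
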
## Usage
We offer a number of ways to interact with MusicGen:
1. A demo is available on the [UnlimitedMusicGen Hugging Face Space](https://huggingface.co/spaces/Surn/UnlimitedMusicGen) (huge thanks to all the HF team for their support).
2. You can run the Gradio demo in Colab: [colab notebook](https://colab.research.google.com/drive/1-Xe9NCdIs2sCUbiSmwHXozK6AAhMm7_i?usp=sharing).
3. You can use the Gradio demo locally by running `python app.py`.
4. You can play with MusicGen by running the jupyter notebook at [`demo.ipynb`](./demo.ipynb) locally (if you have a GPU).
5. Check out the [@camenduru Colab page](https://github.com/camenduru/MusicGen-colab), which is regularly
updated with contributions from @camenduru and the community.
6. Finally, MusicGen is available in 🤗 Transformers from v4.31.0 onwards, see section [🤗 Transformers Usage](#-transformers-usage) below.

### Advanced Usage

#### Programmatic Generation via API

The `predict_simple` API endpoint allows generating music without using the UI:

```python
import requests

# Example API call
response = requests.post(
    "https://huggingface.co/spaces/Surn/UnlimitedMusicGen/api/predict_simple",
    json={
        "model": "stereo-medium",  # Choose your model
        "text": "Epic orchestral soundtrack with dramatic strings and percussion",
        "duration": 60,            # Duration in seconds
        "topk": 250,
        "topp": 0,                 # 0 means use topk instead
        "temperature": 0.8,
        "cfg_coef": 4.0,
        "seed": 42,                # Use -1 for a random seed
        "overlap": 2,              # Seconds of overlap between segments
        "video_orientation": "Landscape"  # or "Portrait"
    }
)

# URLs to the generated content
video_url, audio_url, seed = response.json()
```
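
Assuming the returned `video_url` and `audio_url` are direct download links, the files can be saved locally with a couple of extra lines:

```python
# Illustrative only: save the returned files locally.
for url, filename in [(video_url, "generation.mp4"), (audio_url, "generation.wav")]:
    with open(filename, "wb") as f:
        f.write(requests.get(url, timeout=120).content)
```
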

#### Custom Background Images

You can use your own background images for the music video:

1. Upload an image through the UI
2. Or specify an image URL in the API call:

```python
response = requests.post(
    "https://huggingface.co/spaces/Surn/UnlimitedMusicGen/api/predict_simple",
    json={
        # ... other parameters
        "background": "https://example.com/your-image.jpg",
        "video_orientation": "Landscape"
    }
)
```
### More info about Top-k, Top-p, Temperature and Classifier-Free Guidance (from ChatGPT)

**Top-k**: Top-k determines the number of most likely next tokens to consider at each step of the generation process. The model ranks all possible tokens by their predicted probabilities, keeps the top k, and samples the next token from that reduced set. A smaller value of k results in a more focused and deterministic output, while a larger value of k allows for more diversity in the generated music.

**Top-p (nucleus sampling)**: Instead of keeping a fixed number of tokens like top-k, top-p works on the cumulative probability distribution of the ranked tokens. It selects the smallest set of tokens whose cumulative probability exceeds the threshold p and samples from that set. Because the number of candidate tokens varies with the shape of the distribution, this approach balances diversity and coherence.

**Temperature**: Temperature controls the randomness of the sampling step. A higher temperature produces more random and diverse outputs, while a lower temperature leads to more deterministic and focused outputs. In music generation, a higher temperature can introduce more variability and creativity, but may also produce less coherent or structured compositions; a lower temperature tends toward more repetitive and predictable music.

144
+ Classifier-Free Guidance: Classifier-Free Guidance refers to a technique used in some music generation models where a separate classifier network is trained to provide guidance or control over the generated music. This classifier is trained on labeled data to recognize specific musical characteristics or styles. During the generation process, the output of the generator model is evaluated by the classifier, and the generator is encouraged to produce music that aligns with the desired characteristics or style. This approach allows for more fine-grained control over the generated music, enabling users to specify certain attributes they want the model to capture.
145
+
146
+ These parameters, such as top-k, top-p, temperature, and classifier-free guidance, provide different ways to influence the output of a music generation model and strike a balance between creativity, diversity, coherence, and control. The specific values for these parameters can be tuned based on the desired outcome and user preferences.
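
The sketch below (illustrative only, not code from this repository or Audiocraft) shows how top-k, top-p, and temperature typically interact when sampling a single token from a model's output logits:

```python
import torch

def sample_next_token(logits, topk=250, topp=0.0, temperature=1.0):
    # Temperature rescales the logits: values < 1.0 sharpen the distribution,
    # values > 1.0 flatten it (more randomness).
    probs = torch.softmax(logits / temperature, dim=-1)

    if topp > 0.0:
        # Top-p / nucleus sampling: keep the smallest set of tokens whose
        # cumulative probability reaches topp, then renormalize.
        sorted_probs, sorted_idx = torch.sort(probs, descending=True, dim=-1)
        cumulative = torch.cumsum(sorted_probs, dim=-1)
        keep = (cumulative - sorted_probs) < topp  # tokens before the threshold is crossed
        sorted_probs = sorted_probs * keep
        probs = torch.zeros_like(probs).scatter(-1, sorted_idx, sorted_probs)
    else:
        # Top-k: keep only the k most likely tokens.
        topk_vals, topk_idx = torch.topk(probs, k=min(topk, probs.shape[-1]), dim=-1)
        probs = torch.zeros_like(probs).scatter(-1, topk_idx, topk_vals)

    probs = probs / probs.sum(dim=-1, keepdim=True)
    return torch.multinomial(probs, num_samples=1)

logits = torch.randn(1, 2048)  # pretend vocabulary of 2048 codebook entries
token = sample_next_token(logits, topk=250, topp=0.0, temperature=0.8)
```
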
147
+
148
+ ## API and Storage Integration
149
+
150
+ UnlimitedMusicGen now offers enhanced API capabilities and file storage integration with Hugging Face repositories:
151
+
152
+ ### REST API Access
153
+
154
+ The application exposes a simple REST API endpoint through Gradio that allows you to generate music programmatically:
155
+ import requests
156
+
157
+ # Basic API call example
158
+ response = requests.post(
159
+ "https://your-app-url/api/predict_simple",
160
+ json={
161
+ "model": "medium",
162
+ "text": "4/4 120bpm electronic music with driving bass",
163
+ "duration": 30,
164
+ "temperature": 0.7,
165
+ "cfg_coef": 3.75,
166
+ "title": "My API Generated Track"
167
+ }
168
+ )
169
+
170
+ # The response contains URLs to the generated audio/video
171
+ video_url, audio_url, seed = response.json()
172
+ print(f"Generated music video: {video_url}")
173
+ print(f"Generated audio file: {audio_url}")
174
+ print(f"Seed used: {seed}")

### File Storage

Generated files are automatically uploaded to a Hugging Face dataset repository, providing:

- Persistent storage of your generated audio and video files
- Shareable URLs for easy distribution
- Organization by user, timestamp, and metadata
- Automatic handling of file paths and naming

The storage system supports various file types, including audio (.wav, .mp3), video (.mp4), and images (.png, .jpg).
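
For reference, a generated file can be pushed to a dataset repository with the `huggingface_hub` client roughly as follows; the repository name and paths below are placeholders, not this Space's actual configuration:

```python
# Sketch only: upload a generated file to a Hugging Face dataset repo.
from huggingface_hub import HfApi

api = HfApi()  # uses your cached token or the HF_TOKEN environment variable
api.upload_file(
    path_or_fileobj="output/generation.wav",                  # local file produced by the app
    path_in_repo="user123/2024-01-01_120000/generation.wav",  # organized by user and timestamp
    repo_id="your-username/generated-music",                  # placeholder dataset repo
    repo_type="dataset",
)
# The file is then reachable at:
# https://huggingface.co/datasets/your-username/generated-music/resolve/main/user123/2024-01-01_120000/generation.wav
```
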

### Background Image Support

You can now provide custom background images for your music videos:
- Upload from your device
- Use URL links to images (automatically downloaded and processed)
- Choose between landscape and portrait orientations
- Add title and generation settings overlay with customizable fonts and colors

## Python API

We provide a simple API and 10 pre-trained models. The pre-trained models are:
- `small`: 300M model, text to music only - [🤗 Hub](https://huggingface.co/facebook/musicgen-small)
- `medium`: 1.5B model, text to music only - [🤗 Hub](https://huggingface.co/facebook/musicgen-medium)
- `melody`: 1.5B model, text to music and text+melody to music - [🤗 Hub](https://huggingface.co/facebook/musicgen-melody)
- `large`: 3.3B model, text to music only - [🤗 Hub](https://huggingface.co/facebook/musicgen-large)
- `melody large`: 3.3B model, text to music and text+melody to music - [🤗 Hub](https://huggingface.co/facebook/musicgen-melody-large)
- `small stereo`: 300M model, text to music only - [🤗 Hub](https://huggingface.co/facebook/musicgen-stereo-small)
- `medium stereo`: 1.5B model, text to music only - [🤗 Hub](https://huggingface.co/facebook/musicgen-stereo-medium)
- `melody stereo`: 1.5B model, text to music and text+melody to music - [🤗 Hub](https://huggingface.co/facebook/musicgen-stereo-melody)
- `large stereo`: 3.3B model, text to music only - [🤗 Hub](https://huggingface.co/facebook/musicgen-stereo-large)
- `melody large stereo`: 3.3B model, text to music and text+melody to music - [🤗 Hub](https://huggingface.co/facebook/musicgen-stereo-melody-large)

We observe the best trade-off between quality and compute with the `medium` or `melody` model.
In order to use MusicGen locally **you must have a GPU**. We recommend 16GB of memory, but smaller
GPUs will be able to generate short sequences, or longer sequences with the `small` model.
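
If you are unsure which model your GPU can handle, a quick check of total GPU memory (assuming PyTorch with CUDA is installed) looks like this:

```python
# Quick check of total GPU memory before picking a model size.
import torch

if torch.cuda.is_available():
    total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"GPU memory: {total_gb:.1f} GB")  # ~16 GB is comfortable for the medium models
else:
    print("No CUDA GPU detected; generation will be very slow or may fail.")
```
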

**Note**: Please make sure to have [ffmpeg](https://ffmpeg.org/download.html) installed when using a newer version of `torchaudio`. You can install it with:

```shell
apt-get install ffmpeg
```

See below for a quick example of using the API.
```python
import torchaudio
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained('melody')
model.set_generation_params(duration=8)  # generate 8 seconds.
wav = model.generate_unconditional(4)    # generates 4 unconditional audio samples
descriptions = ['happy rock', 'energetic EDM', 'sad jazz']
wav = model.generate(descriptions)       # generates 3 samples.

melody, sr = torchaudio.load('./assets/bach.mp3')
# generates using the melody from the given audio and the provided descriptions.
wav = model.generate_with_chroma(descriptions, melody[None].expand(3, -1, -1), sr)

for idx, one_wav in enumerate(wav):
    # Will save under {idx}.wav, with loudness normalization at -14 dB LUFS.
    audio_write(f'{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness", loudness_compressor=True)
```

## 🤗 Transformers Usage

MusicGen is available in the 🤗 Transformers library from version 4.31.0 onwards, with minimal additional dependencies. Steps to get started:

1. First install the 🤗 [Transformers library](https://github.com/huggingface/transformers) from main:

   ```shell
   pip install git+https://github.com/huggingface/transformers.git
   ```

2. Run the following Python code to generate text-conditional audio samples:

   ```python
   from transformers import AutoProcessor, MusicgenForConditionalGeneration

   processor = AutoProcessor.from_pretrained("facebook/musicgen-small")
   model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small")

   inputs = processor(
       text=["80s pop track with bassy drums and synth", "90s rock song with loud guitars and heavy drums"],
       padding=True,
       return_tensors="pt",
   )

   audio_values = model.generate(**inputs, max_new_tokens=256)
   ```

3. Listen to the audio samples either in an ipynb notebook:

   ```python
   from IPython.display import Audio

   sampling_rate = model.config.audio_encoder.sampling_rate
   Audio(audio_values[0].numpy(), rate=sampling_rate)
   ```

   Or save them as a `.wav` file using a third-party library, e.g. `scipy`:

   ```python
   import scipy

   sampling_rate = model.config.audio_encoder.sampling_rate
   scipy.io.wavfile.write("musicgen_out.wav", rate=sampling_rate, data=audio_values[0, 0].numpy())
   ```

For more details on using the MusicGen model for inference using the 🤗 Transformers library, refer to the
[MusicGen docs](https://huggingface.co/docs/transformers/main/en/model_doc/musicgen) or the hands-on
[Google Colab](https://colab.research.google.com/github/sanchit-gandhi/notebooks/blob/main/MusicGen.ipynb).

## User History

User History is a plugin that you can add to your Spaces to cache generated images for your users.

Key features:
- 🤗 Sign in with Hugging Face
- Save generated image, video, audio and document files with their metadata: prompts, timestamp, hyper-parameters, etc.
- Export your history as a zip file.
- Delete your history to respect privacy.
- Compatible with Persistent Storage for long-term storage.
- Admin panel to check configuration and disk usage.

Useful links:
- Demo: https://huggingface.co/spaces/Wauplin/gradio-user-history
- README: https://huggingface.co/spaces/Wauplin/gradio-user-history/blob/main/README.md
- Source file: https://huggingface.co/spaces/Wauplin/gradio-user-history/blob/main/user_history.py
- Discussions: https://huggingface.co/spaces/Wauplin/gradio-user-history/discussions

![Image preview](./assets/screenshot.png)

## Model Card

See [the model card page](./MODEL_CARD.md).

## FAQ

#### Will the training code be released?

Yes. We will soon release the training code for MusicGen and EnCodec.

#### I need help on Windows

@FurkanGozukara made a complete tutorial for [Audiocraft/MusicGen on Windows](https://youtu.be/v-YpvPkhdO4).

#### I need help for running the demo on Colab

Check out [@camenduru's tutorial on YouTube](https://www.youtube.com/watch?v=EGfxuTy9Eeo).

## Citation

```
@article{copet2023simple,
    title={Simple and Controllable Music Generation},
    author={Jade Copet and Felix Kreuk and Itai Gat and Tal Remez and David Kant and Gabriel Synnaeve and Yossi Adi and Alexandre Défossez},
    year={2023},
    journal={arXiv preprint arXiv:2306.05284},
}
```

## License
* The code in this repository is released under the MIT license as found in the [LICENSE file](LICENSE).
* The weights in this repository are released under the CC-BY-NC 4.0 license as found in the [LICENSE_weights file](LICENSE_weights).