
Jasper's RAVE Models

A collection of models that I trained with RAVE (Realtime Audio Variational autoEncoder) from acids-ircam. Paper: https://arxiv.org/abs/2111.05011

The models are used in some of my own projects, including the nn.terrain latent generator, the "soundwalking the latent space" workshop, and the sketch-to-sound demo.

All models work with the interactive prior nn.terrain~. All models expose encode and decode methods and have been tested with the Max external nn~.
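Outside Max, the exported .ts files can also be opened from Python, since they are TorchScript modules. The following is a minimal sketch rather than a tested recipe for these specific files: `padded_length` and `roundtrip` are hypothetical helper names, the `(batch, 1, samples)` input layout and the returned shapes are assumptions based on how RAVE exports usually behave, and padding the input to a multiple of the block size is a precaution, not a documented requirement.

```python
def padded_length(num_samples: int, block_size: int = 2048) -> int:
    """Round num_samples up to the next multiple of block_size."""
    if num_samples < 0 or block_size <= 0:
        raise ValueError("num_samples must be >= 0 and block_size must be > 0")
    return -(-num_samples // block_size) * block_size


def roundtrip(model_path: str, audio, block_size: int = 2048):
    """Encode mono audio to latents, then decode the latents back to audio.

    `audio` is assumed to be a 1-D float torch tensor already resampled to
    the model's sample rate (44.1kHz for the models listed here).
    """
    # Imported lazily so that padded_length works without PyTorch installed.
    import torch

    model = torch.jit.load(model_path).eval()

    # Zero-pad at the end so the length is a whole number of blocks.
    pad = padded_length(audio.shape[-1], block_size) - audio.shape[-1]
    x = torch.nn.functional.pad(audio, (0, pad)).reshape(1, 1, -1)

    with torch.no_grad():
        z = model.encode(x)  # assumed shape: (1, latent_dims, frames)
        y = model.decode(z)  # assumed shape: (1, 1, samples)
    return z, y
```

For example, `roundtrip("guitar_picking_dm_b2048_r44100_z8_causal.ts", audio)` would be expected to return a latent tensor with 8 channels, under the assumptions above.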

Important Note

During export, the Gaussian noise in the encoder's reparametrize layer is removed, so that encoding the same audio sample always yields the same latent trajectory. This benefits the nn.terrain~ prior, which uses the latent trajectory as training data.
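The effect of removing the noise can be illustrated with a toy version of the reparametrization step. This is a sketch of the general technique (z = mean + std * noise), not the actual RAVE code: `reparametrize`, its `add_noise` flag, and the example values are all made up for illustration.

```python
import random


def reparametrize(mean, std, add_noise, rng):
    """Toy reparametrization: return mean + std * eps, or just mean.

    With add_noise=False the output depends only on the input, which is
    the behaviour described above for the exported models.
    """
    if not add_noise:
        return list(mean)
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mean, std)]


rng = random.Random(0)
mean, std = [0.1, -0.3], [0.5, 0.2]  # illustrative values only

# Without noise, encoding the same input twice gives identical latents.
deterministic_a = reparametrize(mean, std, add_noise=False, rng=rng)
deterministic_b = reparametrize(mean, std, add_noise=False, rng=rng)

# With noise, each call draws a fresh sample around the mean.
noisy_a = reparametrize(mean, std, add_noise=True, rng=rng)
noisy_b = reparametrize(mean, std, add_noise=True, rng=rng)
```

Here `deterministic_a` and `deterministic_b` are equal, while `noisy_a` and `noisy_b` differ, which is why a noisy encoder would give a different training trajectory on every pass over the same audio.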

Model Descriptions

See the description of each model below for its training dataset, training configuration, and export configuration.

guitar_picking_dm_b2048_r44100_z8_causal.ts
Training data
Acoustic and electric guitar fingerpicking in the D minor scale, collected from released and unreleased dance and electronic tracks I made under the name Alaska Winter. Mostly self-recorded, with some additional samples from the Splice sound library. Unmixed, recorded dry, approximately 2h in total after trimming silence.
Configuration
RAVE v2, 44.1kHz, block size 2048, causal, variational regularization. Exported with 8 latent dimensions and streaming enabled.

gtsinger_b2048_r44100_z16_noncausal.ts
Training data
A subset of the singing voice recordings in the GTSinger dataset. Recordings of spoken lyrics were removed; all other recordings (with/without singing techniques, control groups) were included. Approximately 26h after trimming silence.
Configuration
RAVE v2, 44.1kHz, block size 2048, noncausal, Wasserstein regularization. Exported with 16 latent dimensions and streaming enabled.

aam_drum_b2048_r44100_z16_noncausal.ts
Training data
All drum recordings in the Artificial Audio Multitracks dataset. Approximately 104h in total after trimming silence.
Configuration
RAVE v2, 44.1kHz, block size 2048, noncausal, Wasserstein regularization, capacity 128. Exported with 16 latent dimensions and streaming enabled.

aam_bass_b2048_r44100_z16_noncausal.ts
Training data
All bass recordings in the Artificial Audio Multitracks dataset. Approximately 114h in total after trimming silence.
Configuration
RAVE v2, 44.1kHz, block size 2048, noncausal, Wasserstein regularization, capacity 128. Exported with 16 latent dimensions and streaming enabled.

aam_string_b2048_r44100_z16_noncausal.ts
Training data
All string recordings in the Artificial Audio Multitracks dataset, including viola, cello, erhu, violin, ukulele, guitar, balalaika, sitar, and jinghu. Approximately 144h in total after trimming silence.
Configuration
RAVE v2, 44.1kHz, block size 2048, noncausal, Wasserstein regularization. Exported with 16 latent dimensions and streaming enabled.

librispeech100_b2048_r44100_z8_causal.ts
Training data
LibriSpeech train-clean-100.tar.gz: 100 hours of 16kHz read English speech.
Configuration
RAVE v2, 44.1kHz, block size 2048, causal, variational regularization. Exported with 8 latent dimensions and streaming enabled.

librispeech100_b2048_r44100_z8_noncausal.ts
Training data
LibriSpeech train-clean-100.tar.gz: 100 hours of 16kHz read English speech.
Configuration
RAVE v2, 44.1kHz, block size 2048, noncausal, variational regularization. Exported with 8 latent dimensions and streaming enabled.

aam_brass_sax_b2048_r44100_z8_noncausal.ts
Training data
All brass and sax recordings in the Artificial Audio Multitracks dataset. Approximately 60h in total after trimming silence.
Configuration
RAVE v2, 44.1kHz, block size 2048, noncausal, variational regularization. Exported with 8 latent dimensions and streaming enabled.
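The file names above appear to follow a common pattern, `<name>_b<block size>_r<sample rate>_z<latent dimensions>_<causal|noncausal>.ts`, which matches the configurations listed for each model. The sketch below parses that pattern so a script can pick up the settings from the file name alone. `ExportInfo` and `parse_export_name` are hypothetical names, and the latent frame rate is computed on the assumption that the block size equals the number of audio samples per latent frame.

```python
import re
from dataclasses import dataclass


@dataclass
class ExportInfo:
    name: str
    block_size: int
    sample_rate: int
    latent_dims: int
    causal: bool

    @property
    def latent_rate_hz(self) -> float:
        """Latent frames per second, assuming one frame per block."""
        return self.sample_rate / self.block_size


# Pattern inferred from the file names listed in this model card.
_PATTERN = re.compile(
    r"^(?P<name>.+)_b(?P<block>\d+)_r(?P<rate>\d+)"
    r"_z(?P<latent>\d+)_(?P<mode>causal|noncausal)\.ts$"
)


def parse_export_name(filename: str) -> ExportInfo:
    """Extract the export settings encoded in a model file name."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unrecognised model file name: {filename!r}")
    return ExportInfo(
        name=match["name"],
        block_size=int(match["block"]),
        sample_rate=int(match["rate"]),
        latent_dims=int(match["latent"]),
        causal=match["mode"] == "causal",
    )


info = parse_export_name("guitar_picking_dm_b2048_r44100_z8_causal.ts")
```

Under that assumption, a block size of 2048 at 44.1kHz corresponds to roughly 21.5 latent frames per second, or about 46 ms of audio per frame.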

Info

Variational vs. Wasserstein regularization (taken from the tutorial on the IRCAM forum): Wasserstein regularization may provide better reconstruction results, at the price of a messier latent space (no smoothness in latent exploration).

Citation

@misc{shuoyang_jasper_zheng_2025,
    author       = { Shuoyang Jasper Zheng },
    title        = { jaspers-rave-models (Revision 45e2ea8) },
    year         = 2025,
    url          = { https://huggingface.co/shuoyang-zheng/jaspers-rave-models },
    doi          = { 10.57967/hf/5589 },
    publisher    = { Hugging Face }
}