---
language:
- en
library: xvasynth
tags:
- audio
- text-to-speech
- speech-to-speech
- voice conversion
- tts
pipeline_tag: text-to-speech
---

GitHub project, inference Windows/Electron app: https://github.com/DanRuta/xVA-Synth

Fine-tuning app: https://github.com/DanRuta/xva-trainer

The base model for training other [🤗 xVASynth](https://huggingface.co/spaces/Pendrokar/xVASynth-TTS) FastPitch 1.1-type models (v2). It is meant for fine-tuning models with the xVATrainer TTS model training app, not for inference.

All created by Dan ["@dr00392"](https://huggingface.co/dr00392) Ruta.

v3 models are called [xVAPitch](https://huggingface.co/Pendrokar/xvapitch) and are not based on FastPitch.

There are hundreds of fine-tuned models on the web, but most of them use non-permissive datasets.

## xVASynth Editor v2 walkthrough video ▶:

[![Video](https://img.youtube.com/vi/W-9SFoNuTtM/hqdefault.jpg)](https://www.youtube.com/watch?v=W-9SFoNuTtM)

## xVATrainer v1 walkthrough video ▶:

[![Video](https://img.youtube.com/vi/PXv_SeTWk2M/hqdefault.jpg)](https://www.youtube.com/watch?v=PXv_SeTWk2M)

## References

- [1] [FastPitch: Parallel Text-to-Speech with Pitch Prediction](https://arxiv.org/abs/2006.06873)
- [2] [One TTS Alignment To Rule Them All](https://arxiv.org/abs/2108.10447)

Used datasets: unknown / non-permissive data