---
language:
- en
library: xvasynth
tags:
- audio
- text-to-speech
- speech-to-speech
- voice conversion
- tts
pipeline_tag: text-to-speech
---

GitHub project, inference Windows/Electron app: https://github.com/DanRuta/xVA-Synth

Fine-tuning app: https://github.com/DanRuta/xva-trainer

The base model for training other [🤗 xVASynth](https://huggingface.co/spaces/Pendrokar/xVASynth-TTS) FastPitch 1.1-type models (v2). It is meant for fine-tuning models with the xVATrainer TTS model training app, not for inference.

All created by Dan ["@dr00392"](https://huggingface.co/dr00392) Ruta.

v3 models are called [xVAPitch](https://huggingface.co/Pendrokar/xvapitch) and are not based on FastPitch.

There are hundreds of fine-tuned models on the web, but most of them use non-permissive datasets.

## xVASynth Editor v2 walkthrough video ▶:

[![Video](https://img.youtube.com/vi/W-9SFoNuTtM/hqdefault.jpg)](https://www.youtube.com/watch?v=W-9SFoNuTtM)

## xVATrainer v1 walkthrough video ▶:

[![Video](https://img.youtube.com/vi/PXv_SeTWk2M/hqdefault.jpg)](https://www.youtube.com/watch?v=PXv_SeTWk2M)

## References

- [1] [FastPitch: Parallel Text-to-Speech with Pitch Prediction](https://arxiv.org/abs/2006.06873)
- [2] [One TTS Alignment To Rule Them All](https://arxiv.org/abs/2108.10447)

Used datasets: unknown / non-permissive data