---
language:
  - en
  - de
  - es
  - it
  - nl
  - pt
  - pl
  - ro
  - sv
  - da
  - fi
  - hu
  - el
  - fr
  - ru
  - uk
  - tr
  - ar
  - hi
  - ja
  - ko
  - zh
  - vi
  - la
  - ha
  - sw
  - yo
  - wo
library_name: xvasynth
tags:
  - emotion
  - audio
  - text-to-speech
  - speech-to-speech
  - voice conversion
  - tts
pipeline_tag: text-to-speech
---

GitHub project (inference Windows/Electron app): https://github.com/DanRuta/xVA-Synth

Fine-tuning app: https://github.com/DanRuta/xva-trainer

The base model for training other 🤗 xVASynth "xVAPitch" (v3) models. The model itself is used by the xVATrainer TTS model-training app, not for inference. All created by Dan "@dr00392" Ruta.
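
As a rough illustration of fetching this base checkpoint before fine-tuning with xVATrainer, here is a minimal sketch; the repository id and checkpoint filename are assumptions and may not match the actual files in this repo:

```python
# Minimal sketch: download the base xVAPitch checkpoint from the Hub so that
# xVATrainer can be pointed at it for fine-tuning.
# NOTE: repo_id and filename are assumptions -- check the repository's file
# listing for the real names before running this.
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="Pendrokar/xvapitch",      # assumed repository id
    filename="xVAPitch_5820651.pt",    # assumed checkpoint filename
)
print("Base checkpoint downloaded to:", ckpt_path)
```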

> The v3 model now uses a slightly custom tweaked VITS/YourTTS model. Tweaks including larger capacity, bigger lang embedding, custom symbol set (a custom spec of ARPAbet with some more phonemes to cover other languages), and I guess a different training script. - Dan Ruta
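
To get a feel for those tweaks (capacity, language-embedding size, symbol set) from a downloaded checkpoint, the sketch below loads it as a plain PyTorch file and prints embedding-like tensor shapes. The checkpoint layout and key names here are assumptions, not the documented xVAPitch format:

```python
# Minimal sketch: inspect a checkpoint's embedding tables to see the symbol-set
# and language-embedding sizes mentioned in the quote above.
# Assumes an ordinary PyTorch checkpoint (possibly wrapped in a dict under a
# "model" key); adjust the lookup to whatever the real file contains.
import torch

ckpt_path = "xVAPitch_5820651.pt"  # assumed filename, see the download sketch above
ckpt = torch.load(ckpt_path, map_location="cpu")
state_dict = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt

for name, tensor in state_dict.items():
    # Embedding tables (text symbols, languages, speakers) are 2-D weight matrices.
    if hasattr(tensor, "shape") and "emb" in name.lower():
        print(f"{name}: {tuple(tensor.shape)}")
```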

When used in the xVASynth editor, it is an American adult male voice. The default pacing is too fast and has to be adjusted.

xVAPitch_5820651 model sample:

There are hundreds of fine-tuned models on the web, but most of them use non-permissive datasets.

xVASynth Editor v3 walkthrough video ▶:

xVATrainer v1 walkthrough video ▶:

Papers:

Referenced papers within code:

Used datasets: unknown / non-permissive data