Pendrokar committed (verified)
Commit be6c16d
1 Parent(s): 869e063

link to HF space

Files changed (1)
1. README.md +3 -1
README.md CHANGED
@@ -41,7 +41,7 @@ pipeline_tag: text-to-speech

  GitHub project: https://github.com/DanRuta/xVA-Synth

- The base model for training other xVASynth's "xVAPitch" type models (v3). Model itself is used by the xVATrainer TTS model training app and not for inference. All created by Dan ["@dr00392"](https://huggingface.co/dr00392) Ruta.
+ The base model for training other [🤗 xVASynth's](https://huggingface.co/spaces/Pendrokar/xVASynth-TTS) "xVAPitch" type models (v3). Model itself is used by the xVATrainer TTS model training app and not for inference. All created by Dan ["@dr00392"](https://huggingface.co/dr00392) Ruta.

  `The v3 model now uses a slightly custom tweaked VITS/YourTTS model. Tweaks including larger capacity, bigger lang embedding, custom symbol set (a custom spec of ARPAbet with some more phonemes to cover other languages), and I guess a different training script.` - Dan Ruta

@@ -52,6 +52,8 @@ xVAPitch_5820651 model sample: <audio controls>
  Your browser does not support the audio element.
  </audio>

+ There are hundreds of fine-tuned models on the web. But most of them use non-permissive datasets.
+
  Papers:
  - VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech - https://arxiv.org/abs/2106.06103
  - YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for Everyone - https://arxiv.org/abs/2112.02418
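
For readers skimming the diff, the v3 tweaks quoted in the README amount to three config-level changes: more model capacity, a wider language embedding, and an ARPAbet-derived symbol set extended with extra phonemes. Below is a minimal sketch of what such a config could look like; every name and value in it (`XVAPitchLikeConfig`, `hidden_channels = 256`, the `EXTRA_PHONEMES` entries) is a hypothetical assumption for illustration, not xVASynth's or xVATrainer's actual configuration.

```python
# Illustrative sketch only: hypothetical names and values, not the real
# xVASynth/xVAPitch configuration. It mirrors the tweaks quoted above:
# larger capacity, a bigger language embedding, and a custom ARPAbet-based
# symbol set extended to cover non-English phonemes.
from dataclasses import dataclass, field

# Standard two-letter ARPAbet phoneme codes (English).
ARPABET = [
    "AA", "AE", "AH", "AO", "AW", "AY", "B", "CH", "D", "DH",
    "EH", "ER", "EY", "F", "G", "HH", "IH", "IY", "JH", "K",
    "L", "M", "N", "NG", "OW", "OY", "P", "R", "S", "SH",
    "T", "TH", "UH", "UW", "V", "W", "Y", "Z", "ZH",
]
# Hypothetical extensions for sounds English ARPAbet lacks
# (e.g. a trilled R and front rounded vowels).
EXTRA_PHONEMES = ["RR", "OE", "UE"]

@dataclass
class XVAPitchLikeConfig:
    """Toy stand-in for a VITS/YourTTS-style model config."""
    symbols: list = field(default_factory=lambda: ARPABET + EXTRA_PHONEMES)
    hidden_channels: int = 256     # "larger capacity" (plain VITS uses 192)
    lang_embedding_dim: int = 64   # "bigger lang embedding"
    n_languages: int = 8           # multilingual coverage the symbol set enables

cfg = XVAPitchLikeConfig()
print(f"{len(cfg.symbols)} symbols, hidden={cfg.hidden_channels}, "
      f"lang_emb={cfg.lang_embedding_dim}")
```

In practice the real values would come from the config bundled with the checkpoint rather than guesses like these.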