Pendrokar committed (verified)
Commit be6c16d
1 Parent(s): 869e063

link to HF space

Files changed (1)
1. README.md +3 -1
README.md CHANGED
@@ -41,7 +41,7 @@ pipeline_tag: text-to-speech

  GitHub project: https://github.com/DanRuta/xVA-Synth

- The base model for training other xVASynth's "xVAPitch" type models (v3). Model itself is used by the xVATrainer TTS model training app and not for inference. All created by Dan ["@dr00392"](https://huggingface.co/dr00392) Ruta.
+ The base model for training other [🤗 xVASynth's](https://huggingface.co/spaces/Pendrokar/xVASynth-TTS) "xVAPitch" type models (v3). Model itself is used by the xVATrainer TTS model training app and not for inference. All created by Dan ["@dr00392"](https://huggingface.co/dr00392) Ruta.

  `The v3 model now uses a slightly custom tweaked VITS/YourTTS model. Tweaks including larger capacity, bigger lang embedding, custom symbol set (a custom spec of ARPAbet with some more phonemes to cover other languages), and I guess a different training script.` - Dan Ruta

@@ -52,6 +52,8 @@ xVAPitch_5820651 model sample: <audio controls>
  Your browser does not support the audio element.
  </audio>

+ There are hundreds of fine-tuned models on the web. But most of them use non-permissive datasets.
+
  Papers:
  - VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech - https://arxiv.org/abs/2106.06103
  - YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for Everyone - https://arxiv.org/abs/2112.02418
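
For readers skimming the diff, the v3 tweaks quoted in the README amount to three config-level changes: more model capacity, a wider language embedding, and an ARPAbet-derived symbol set extended with extra phonemes. Below is a minimal sketch of what such a config could look like; every name and value in it (`XVAPitchLikeConfig`, `hidden_channels = 256`, the `EXTRA_PHONEMES` entries) is a hypothetical assumption for illustration, not xVASynth's or xVATrainer's actual configuration.

```python
# Illustrative sketch only: hypothetical names and values, not the real
# xVASynth/xVAPitch configuration. It mirrors the tweaks quoted above:
# larger capacity, a bigger language embedding, and a custom ARPAbet-based
# symbol set extended to cover non-English phonemes.
from dataclasses import dataclass, field

# Standard two-letter ARPAbet phoneme codes (English).
ARPABET = [
    "AA", "AE", "AH", "AO", "AW", "AY", "B", "CH", "D", "DH",
    "EH", "ER", "EY", "F", "G", "HH", "IH", "IY", "JH", "K",
    "L", "M", "N", "NG", "OW", "OY", "P", "R", "S", "SH",
    "T", "TH", "UH", "UW", "V", "W", "Y", "Z", "ZH",
]
# Hypothetical extensions for sounds English ARPAbet lacks
# (e.g. a trilled R and front rounded vowels).
EXTRA_PHONEMES = ["RR", "OE", "UE"]

@dataclass
class XVAPitchLikeConfig:
    """Toy stand-in for a VITS/YourTTS-style model config."""
    symbols: list = field(default_factory=lambda: ARPABET + EXTRA_PHONEMES)
    hidden_channels: int = 256     # "larger capacity" (plain VITS uses 192)
    lang_embedding_dim: int = 64   # "bigger lang embedding"
    n_languages: int = 8           # multilingual coverage the symbol set enables

cfg = XVAPitchLikeConfig()
print(f"{len(cfg.symbols)} symbols, hidden={cfg.hidden_channels}, "
      f"lang_emb={cfg.lang_embedding_dim}")
```

In practice the real values would come from the config bundled with the checkpoint rather than guesses like these.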