ggmbr committed
Commit 00bcf21 · 1 Parent(s): bbef05c
Files changed (1)
  1. README.md +25 -5
README.md CHANGED
@@ -14,13 +14,33 @@ datasets:
  ---

  # Non-timbral Embeddings extractor
- This model has been derived from the self-supervised pretrained model WavLM-large [link]. It produces embeddings that represent the non-timbral traits (prosody, accent, ...) of a speaker,
- which can be used the same way as for a classical ASV (automatic speaker verification) embeddings, except that only the non-timbral traits are compared.
+ This model produces embeddings that represent the non-timbral traits (prosody, accent, ...) of a speaker's voice. These embeddings can be used in the same way as in classical
+ speaker verification (ASV): to compare two voice signals, extract an embedding for each of them and compute the cosine similarity between the two embeddings.
+ The main difference from classical ASV embeddings is that here only the non-timbral traits are compared.

- See section below for an explanation on how to use these embeddings.
+ The model has been derived from the self-supervised pretrained model [WavLM-large](https://huggingface.co/microsoft/wavlm-large).

- # Citation
- paper
+ See section below for an explanation on how to compute the non-timbral embeddings.
+
+ # Publication
+ Details about the method used to build this model have been published at Interspeech 2024 in the paper entitled
+ [Disentangling prosody and timbre embeddings via voice conversion](https://www.isca-archive.org/interspeech_2024/gengembre24_interspeech.pdf).
+
+ ## Citation
+ Gengembre, N., Le Blouch, O., Gendrot, C. (2024) Disentangling prosody and timbre embeddings via voice conversion. Proc. Interspeech 2024, 2765-2769, doi: 10.21437/Interspeech.2024-207
+
+ ## BibTeX citation
+ '''
+ @inproceedings{gengembre24_interspeech,
+ title = {Disentangling prosody and timbre embeddings via voice conversion},
+ author = {Nicolas Gengembre and Olivier {Le Blouch} and Cédric Gendrot},
+ year = {2024},
+ booktitle = {Interspeech 2024},
+ pages = {2765--2769},
+ doi = {10.21437/Interspeech.2024-207},
+ issn = {2958-1796},
+ }
+ '''

  # Usage
  code
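
The README text added in this commit describes the comparison step (extract one embedding per voice signal, then compute the cosine similarity between the two), while the Usage section is still a placeholder. The following is a minimal sketch of that comparison step only, assuming the embeddings have already been extracted and are available as plain lists of floats. The helper `cosine_similarity` and the vectors `emb_a` and `emb_b` are hypothetical illustrations, not part of the model's code, and the extraction step itself is not shown because the commit does not specify it.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine similarity between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical placeholder vectors standing in for the non-timbral
# embeddings of two voice signals (real embeddings would come from the model).
emb_a = [0.2, -0.1, 0.7, 0.4]
emb_b = [0.25, -0.05, 0.6, 0.5]

score = cosine_similarity(emb_a, emb_b)
print(f"non-timbral similarity: {score:.3f}")
```

A score close to 1 would indicate similar non-timbral traits; any decision threshold would have to be chosen on held-out data, since the commit does not state one.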