Models
======

End-to-end ASR models typically follow an encoder-decoder design. The encoder performs acoustic
modeling, i.e., it converts the speech waveform into features, and the decoder converts those features
into text. The encoder contains the bulk of the trainable parameters and is usually the focus of SSL in
ASR. Thus, any architecture that can be used as an encoder in ASR models can be pre-trained using SSL.
For an overview of the model architectures currently supported in NeMo's ASR collection, refer
to `ASR Models <../models.html>`__. Note that SSL models also follow the encoder-decoder style. During
downstream fine-tuning, the encoder is retained, whereas the decoder (used during SSL) is replaced
with a downstream task-specific module. Refer to `checkpoints <./results.html>`__ to see how this is
accomplished in NeMo.
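
The retain-and-replace step can be illustrated with a minimal, framework-free Python sketch. It is not
NeMo's API: ``Module``, ``EncoderDecoderModel``, and ``make_finetune_model`` are hypothetical names,
and plain Python callables stand in for neural network layers. The sketch shows that the fine-tuned
model reuses the very same encoder object as the SSL model, while its decoder is swapped for a
task-specific head.

.. code-block:: python

   from dataclasses import dataclass
   from typing import Callable, List

   Vector = List[float]


   @dataclass
   class Module:
       """Hypothetical stand-in for a neural network module."""
       name: str
       forward: Callable[[Vector], Vector]


   @dataclass
   class EncoderDecoderModel:
       encoder: Module
       decoder: Module

       def __call__(self, features: Vector) -> Vector:
           # Encoder: acoustic modeling (speech features -> hidden representations).
           hidden = self.encoder.forward(features)
           # Decoder: head that maps hidden representations to the task output.
           return self.decoder.forward(hidden)


   def make_finetune_model(ssl_model: EncoderDecoderModel, task_decoder: Module) -> EncoderDecoderModel:
       """Reuse the SSL-pre-trained encoder and swap in a downstream decoder (hypothetical helper)."""
       return EncoderDecoderModel(encoder=ssl_model.encoder, decoder=task_decoder)


   # Toy SSL model: the encoder doubles each value; the SSL decoder sums them.
   encoder = Module("encoder", lambda x: [v * 2 for v in x])
   ssl_model = EncoderDecoderModel(encoder, Module("ssl_decoder", lambda h: [sum(h)]))

   # Fine-tuning: keep the encoder, replace the decoder with a task-specific head.
   asr_model = make_finetune_model(ssl_model, Module("asr_decoder", lambda h: [max(h)]))

   print(asr_model.encoder is ssl_model.encoder)  # True
   print(asr_model.decoder.name)                  # asr_decoder
   print(asr_model([1.0, 2.0]))                   # [4.0]

Because the encoder object is shared rather than rebuilt, whatever it learned during SSL carries over to
the downstream model; in a real framework the new decoder would typically start from fresh weights and
the combined model would then be fine-tuned on labeled data.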