Models
======
End-to-end ASR models are typically of the encoder-decoder style, where the encoder performs acoustic
modeling, i.e., converting the speech waveform into features, and the decoder converts those features into
text. The encoder contains the bulk of the trainable parameters and is usually the focus of SSL in ASR.
Thus, any architecture that can be used as an encoder in ASR models can be pre-trained using SSL. For an
overview of the model architectures currently supported in NeMo's ASR collection, refer
to `ASR Models <../models.html>`__. Note that SSL also uses encoder-decoder style models. During
downstream fine-tuning, the encoder is retained, whereas the decoder used during SSL is replaced
with a module specific to the downstream task. Refer to `checkpoints <./results.html>`__ to see how this is
accomplished in NeMo.
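
The encoder hand-off described above can be illustrated with a minimal, framework-free sketch. It is not NeMo's API: the ``transfer_encoder`` helper, the parameter names, and the values are all hypothetical, and plain dictionaries stand in for model state. It only shows the idea of keeping the pre-trained encoder parameters while the SSL decoder is dropped and a new task-specific decoder starts from its own initialization.

```python
# Hypothetical sketch: parameter names and values are made up for illustration
# and do not reflect NeMo's actual checkpoint layout or loading API.

def transfer_encoder(ssl_state, downstream_state, prefix="encoder."):
    """Return a copy of downstream_state whose encoder entries come from ssl_state."""
    merged = dict(downstream_state)
    for name, value in ssl_state.items():
        if name.startswith(prefix):
            if name not in merged:
                # The downstream encoder must have the same architecture.
                raise KeyError(f"downstream model has no parameter {name!r}")
            merged[name] = value
    return merged


# State after SSL pre-training: an encoder plus an SSL-specific decoder.
ssl_state = {
    "encoder.layer0.weight": [0.12, -0.40],
    "encoder.layer1.weight": [0.95, 0.07],
    "decoder.reconstruction.weight": [0.33, 0.33],
}

# Freshly initialized downstream ASR model: same encoder, new decoder.
asr_state = {
    "encoder.layer0.weight": [0.0, 0.0],
    "encoder.layer1.weight": [0.0, 0.0],
    "decoder.ctc.weight": [0.0, 0.0, 0.0],
}

# Encoder weights are retained; the SSL decoder is not carried over.
finetune_init = transfer_encoder(ssl_state, asr_state)
```

In this sketch, ``finetune_init`` holds the pre-trained encoder values next to the untouched ``decoder.ctc.weight``, and the SSL-only ``decoder.reconstruction.weight`` is absent. Raising an error on a missing encoder parameter is a design choice of the sketch, so that an architecture mismatch is surfaced instead of being silently ignored.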