Models
======

End-to-end ASR models are typically encoder-decoder architectures: the encoder performs acoustic
modeling, i.e., it converts the speech waveform into features, and the decoder converts those
features into text. The encoder contains the bulk of the trainable parameters and is usually the
focus of SSL in ASR. Thus, any architecture that can serve as the encoder of an ASR model can be
pre-trained using SSL. For an overview of the model architectures currently supported in NeMo's ASR
collection, refer to `ASR Models <../models.html>`__. Note that SSL also uses encoder-decoder
models. During downstream fine-tuning, the encoder is retained, whereas the decoder used during SSL
is replaced with a module specific to the downstream task. Refer to
`checkpoints <./results.html>`__ to see how this is accomplished in NeMo.
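
The encoder-retained, decoder-replaced pattern described above can be sketched in plain Python. This is a toy illustration only, not NeMo's API: the class names ``Encoder``, ``Decoder``, ``EncoderDecoderModel`` and the helper ``build_finetune_model`` are hypothetical, and the "weights" are random numbers standing in for trained parameters.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Encoder:
    """Hypothetical stand-in for an acoustic encoder (holds most parameters)."""
    weights: list = field(default_factory=lambda: [random.random() for _ in range(8)])

    def __call__(self, waveform):
        # Toy "acoustic modeling": produce one feature per weight.
        return [w * sum(waveform) for w in self.weights]


@dataclass
class Decoder:
    """Hypothetical stand-in for a task-specific head (SSL or downstream)."""
    task: str

    def __call__(self, features):
        return f"{self.task} output from {len(features)} features"


@dataclass
class EncoderDecoderModel:
    encoder: Encoder
    decoder: Decoder

    def __call__(self, waveform):
        return self.decoder(self.encoder(waveform))


def build_finetune_model(ssl_model, task):
    # Keep the pre-trained encoder; drop the SSL decoder and attach a new one.
    return EncoderDecoderModel(encoder=ssl_model.encoder, decoder=Decoder(task=task))


# Pre-training model: encoder plus an SSL-specific decoder.
ssl_model = EncoderDecoderModel(Encoder(), Decoder(task="ssl"))

# Fine-tuning model: same encoder, new decoder for the downstream task.
asr_model = build_finetune_model(ssl_model, task="asr")
```

Here the fine-tuning model reuses the same encoder object, which is the simplest way to show that its parameters carry over. In a real framework the encoder weights would more likely be loaded from a saved checkpoint into a freshly constructed model.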