Self-Supervised Learning
=================================

Self-Supervised Learning (SSL) refers to the problem of learning without explicit labels. As
any learning process requires feedback, and explicit labels are not available, SSL derives supervisory signals from
the data itself. The general idea of SSL is to predict a hidden part (or property) of the input
from the observed part of the input (e.g., filling in the blanks in a sentence or predicting whether
an image is upright or inverted).
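
As a minimal, self-contained illustration of deriving a supervisory signal from the data itself
(the feature shapes and the helper function below are purely illustrative), a span of an unlabeled
feature sequence can be hidden and its original values used as the prediction target:

.. code-block:: python

    import torch

    def make_masked_example(features: torch.Tensor, mask_start: int, mask_len: int):
        """Hide a span of an unlabeled feature sequence and use the hidden values
        as the training target; no external labels are involved."""
        target = features[mask_start:mask_start + mask_len].clone()
        masked = features.clone()
        masked[mask_start:mask_start + mask_len] = 0.0  # hide the span
        return masked, target

    # e.g., 100 frames of 80-dim acoustic features; hide frames 40-49
    feats = torch.randn(100, 80)
    model_input, target = make_masked_example(feats, mask_start=40, mask_len=10)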

SSL for speech/audio understanding broadly falls into either contrastive or reconstruction
based approaches. In contrastive methods, models learn by distinguishing between true and distractor
tokens (or latents); Contrastive Predictive Coding (CPC) and Masked Language Modeling (MLM) are
examples of contrastive approaches. In reconstruction methods, models learn by directly estimating
the missing (intentionally left out) portions of the input; Masked Reconstruction and Autoregressive
Predictive Coding (APC) are a few examples.
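
The two families differ mainly in their training objective. The sketch below (illustrative only;
the shapes, the distractor sampling, and the choice of an L1 loss are assumptions rather than the
exact objectives of any specific model, and APC in particular predicts future frames from past
frames) contrasts an InfoNCE-style contrastive loss with a masked-reconstruction loss:

.. code-block:: python

    import torch
    import torch.nn.functional as F

    def contrastive_loss(pred, positive, distractors, temperature=0.1):
        """InfoNCE-style objective: pick the true latent among distractors
        (as in CPC / wav2vec-style training).

        pred:        (batch, dim) predictions at masked/future positions
        positive:    (batch, dim) true latents at those positions
        distractors: (batch, num_negatives, dim) latents from other positions
        """
        candidates = torch.cat([positive.unsqueeze(1), distractors], dim=1)  # (B, 1+K, D)
        logits = F.cosine_similarity(pred.unsqueeze(1), candidates, dim=-1) / temperature
        labels = torch.zeros(pred.size(0), dtype=torch.long, device=pred.device)  # index 0 = true latent
        return F.cross_entropy(logits, labels)

    def reconstruction_loss(pred, target, mask):
        """Masked-reconstruction objective: directly estimate the hidden frames
        (as in masked reconstruction / APC-style training).

        pred, target: (batch, time, feat_dim); mask: (batch, time) bool, True at hidden frames
        """
        return F.l1_loss(pred[mask], target[mask])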

In the recent past, SSL has contributed significantly to improving Acoustic Modeling (AM), i.e., the
encoder module of neural ASR models. Here, too, the majority of the SSL effort is focused on improving the AM.
While the AM is the most common target of SSL in ASR, SSL can also be used to improve other parts of
ASR models (e.g., the predictor module in transducer-based ASR models).
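
As a rough sketch of the resulting fine-tuning workflow (the toy ``Encoder`` below is a placeholder,
not an API of this collection), an SSL-pretrained encoder can be used to initialize the AM of a
supervised ASR model, while the remaining components (decoder, transducer predictor/joint, etc.)
keep their random initialization:

.. code-block:: python

    import torch.nn as nn

    class Encoder(nn.Module):
        """Toy stand-in for the acoustic encoder; the real module is whatever the ASR model uses."""

        def __init__(self, feat_dim: int = 80, hidden: int = 256):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden))

        def forward(self, x):
            return self.net(x)

    ssl_encoder = Encoder()  # assume its weights come from SSL pre-training
    asr_encoder = Encoder()  # encoder (AM) inside the supervised ASR model

    # Transfer the SSL-pretrained weights into the ASR encoder before supervised fine-tuning;
    # the rest of the ASR model keeps its random initialization.
    asr_encoder.load_state_dict(ssl_encoder.state_dict())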

The full documentation tree is as follows:

.. toctree::
   :maxdepth: 8

   models
   datasets
   results
   configs
   api
   resources

.. include:: resources.rst