The Impact of Token Granularity on the Predictive Power of Language Model Surprisal

Introduction

This is the model repository for the paper The Impact of Token Granularity on the Predictive Power of Language Model Surprisal, featuring Mamba-2 language models trained on the English training section of the Wiki-40B dataset. Models of three different sizes (6_8_256, 12_16_512, 24_24_768) were trained on the same data tokenized with 11 different unigram language model tokenizers (vocabulary sizes of 256, 512, 1k, 2k, 4k, 8k, 16k, 32k, 48k, 64k, 128k), resulting in a total of 33 models. For each model, weights are released both at initialization (suffix "_0") and after training (suffix "_10063").
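As a quick orientation, the sketch below enumerates the model-size, vocabulary-size, and checkpoint-suffix combinations described above, and lists the files actually present in this repository via the public Hugging Face Hub API. It is a minimal illustration, not the authors' code; the exact checkpoint directory names should be taken from the repository listing itself.

```python
# Minimal sketch: enumerate the released configurations described above and
# inspect the repository contents to see how the checkpoints are actually named.
from itertools import product

from huggingface_hub import list_repo_files

sizes = ["6_8_256", "12_16_512", "24_24_768"]
vocabs = ["256", "512", "1k", "2k", "4k", "8k", "16k", "32k", "48k", "64k", "128k"]
steps = ["0", "10063"]  # "_0" = initialization, "_10063" = after training

configs = list(product(sizes, vocabs, steps))
print(len(configs) // 2)  # 33 trained models, each released at two checkpoints

# List the files in this repository to map configurations to concrete paths.
for path in list_repo_files("byungdoh/ssm-token-granularity"):
    print(path)
```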

Companion Repository

Please refer to the companion GitHub repository for further instructions on how to load and use these models.
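For convenience, the following sketch shows one way to download a subset of the checkpoints with the standard Hugging Face Hub API before following the companion repository's loading instructions. The `allow_patterns` filter is a hypothetical example; adjust it to the checkpoint directory names actually present in the repository.

```python
# Minimal download sketch, assuming only the public Hugging Face Hub API.
# The companion GitHub repository remains the authoritative reference for
# loading the downloaded checkpoints into a Mamba-2 model.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="byungdoh/ssm-token-granularity",
    allow_patterns=["*12_16_512*"],  # hypothetical filter: medium-sized models only
)
print(local_dir)  # local path containing the downloaded checkpoint files
```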

Questions

For questions or concerns, please contact Byung-Doh Oh (oh.b@nyu.edu).
