|
--- |
|
license: apache-2.0 |
|
datasets: |
|
- allenai/dolma |
|
language: |
|
- en |
|
base_model: |
|
- mistralai/Mistral-7B-v0.3 |
|
--- |
|
# TESS 2 v0.3 Base |
|
|
|
This model is the diffusion-adapted TESS 2 base: a simplex-based diffusion language model adapted from Mistral 7B v0.3 and further trained on Dolma 1.7.

For more details, please check out our paper [TESS-2: A Large-Scale, Generalist Diffusion Language Model](https://arxiv.org/abs/2502.13917).
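As a rough illustration of the simplex representation that underlies this model family, the sketch below maps token ids to near-one-hot logit vectors and applies a forward noising step. This is a minimal, illustrative sketch of the general simplex-diffusion idea, not this codebase's actual implementation; the function names, the scale `k`, and the noise schedule value are assumptions.

```python
import math
import torch

def tokens_to_simplex(token_ids, vocab_size, k=5.0):
    # Represent each token as an almost-one-hot logit vector over the
    # vocabulary: +k at the token's index, -k everywhere else.
    # (k is an illustrative scale, not the value used by TESS 2.)
    simplex = torch.full((*token_ids.shape, vocab_size), -k)
    simplex.scatter_(-1, token_ids.unsqueeze(-1), k)
    return simplex

def add_noise(simplex, alpha_bar_t):
    # Forward diffusion step: interpolate between the clean simplex
    # logits and Gaussian noise according to the cumulative schedule
    # value alpha_bar_t (illustrative variance-preserving form).
    noise = torch.randn_like(simplex)
    return math.sqrt(alpha_bar_t) * simplex + math.sqrt(1.0 - alpha_bar_t) * noise

# Example: two tokens over a toy 5-word vocabulary.
ids = torch.tensor([[1, 3]])
clean = tokens_to_simplex(ids, vocab_size=5)
noisy = add_noise(clean, alpha_bar_t=0.99)
```

During training, a model would be asked to recover the clean simplex (and hence the tokens, via an argmax over the vocabulary dimension) from `noisy`; see the paper for the actual formulation.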
|
|
|
**This is the diffusion-adapted base model, which has not yet undergone instruction tuning. We recommend further tuning this model on your dataset of interest, or checking out the [instruction tuned version](https://huggingface.co/hamishivi/tess2).** |
|
|
|
This model only works with our custom codebase, found [here](https://github.com/hamishivi/tess-2) -- please see that repository for details on how to run training.
|
|
|
## Citation |
|
|
|
If you find this work useful, please cite it as follows:
|
|
|
```bibtex |
|
@misc{taeivison2025tess2,
  title={{TESS 2: A Large-Scale Generalist Diffusion Language Model}},
  author={Jaesung Tae and Hamish Ivison and Sachin Kumar and Arman Cohan},
  year={2025},
  eprint={2502.13917},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2502.13917},
}
|
``` |