|
--- |
|
license: apache-2.0 |
|
datasets: |
|
- allenai/dolma |
|
language: |
|
- en |
|
base_model: |
|
- mistralai/Mistral-7B-v0.3 |
|
--- |
|
# TESS 2 v0.3 Base |
|
|
|
This model is the diffusion-adapted TESS 2 base: a simplex-based diffusion language model adapted from Mistral 7B v0.3 and further trained on Dolma 1.7.

For more details, please check out our paper [TESS-2: A Large-Scale, Generalist Diffusion Language Model](https://arxiv.org/abs/2502.13917).
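As a rough illustration of the simplex representation that underlies this model family, the sketch below maps token ids to near-one-hot logit vectors and applies a forward noising step. This is a minimal, illustrative sketch of the general simplex-diffusion idea, not this codebase's actual implementation; the function names, the scale `k`, and the noise schedule value are assumptions.

```python
import math
import torch

def tokens_to_simplex(token_ids, vocab_size, k=5.0):
    # Represent each token as an almost-one-hot logit vector over the
    # vocabulary: +k at the token's index, -k everywhere else.
    # (k is an illustrative scale, not the value used by TESS 2.)
    simplex = torch.full((*token_ids.shape, vocab_size), -k)
    simplex.scatter_(-1, token_ids.unsqueeze(-1), k)
    return simplex

def add_noise(simplex, alpha_bar_t):
    # Forward diffusion step: interpolate between the clean simplex
    # logits and Gaussian noise according to the cumulative schedule
    # value alpha_bar_t (illustrative variance-preserving form).
    noise = torch.randn_like(simplex)
    return math.sqrt(alpha_bar_t) * simplex + math.sqrt(1.0 - alpha_bar_t) * noise

# Example: two tokens over a toy 5-word vocabulary.
ids = torch.tensor([[1, 3]])
clean = tokens_to_simplex(ids, vocab_size=5)
noisy = add_noise(clean, alpha_bar_t=0.99)
```

During training, a model would be asked to recover the clean simplex (and hence the tokens, via an argmax over the vocabulary dimension) from `noisy`; see the paper for the actual formulation.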
|
|
|
**This is the diffusion-adapted base model, which has not yet undergone instruction tuning. We recommend further tuning this model on your dataset of interest, or checking out the [instruction tuned version](https://huggingface.co/hamishivi/tess2).** |
|
|
|
This model only works with our custom codebase, found [here](https://github.com/hamishivi/tess-2) -- please see that repository for details on how to run training.
|
|
|
## Citation |
|
|
|
If you find this work useful, please cite it as follows:
|
|
|
```bibtex |
|
@misc{taeivison2025tess2,
  title={{TESS 2: A Large-Scale Generalist Diffusion Language Model}},
  author={Jaesung Tae and Hamish Ivison and Sachin Kumar and Arman Cohan},
  year={2025},
  eprint={2502.13917},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2502.13917},
}
|
``` |