---
base_model:
- Qwen/Qwen2.5-1.5B-Instruct
datasets:
- jjzha/fs1-2708
language:
- en
library_name: transformers
license: mit
pipeline_tag: text-generation
tags:
- en
- factuality
- thinking
- reasoning
---

## Model Details

**Qwen2.5-1.5B-Instruct-fs1-2708** is a 1.5B-parameter language model for English text generation. It builds upon [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) and is further fine-tuned on the [jjzha/fs1-2708](https://huggingface.co/datasets/jjzha/fs1-2708) dataset, with a focus on enhancing factual reasoning in generated text.

### Model Developers

This model was fine-tuned by independent contributors using the Hugging Face Transformers library.

### Variations

This is a fine-tuned version of the `Qwen2.5-1.5B-Instruct` model. No additional variants or intermediate checkpoints are currently provided.

### Input

Text only.

### Output

Text only.

### Model Architecture

The model is an auto-regressive, transformer-based language model, fine-tuned with supervised learning to improve instruction-following and reasoning capabilities in English.

### Model Dates

Fine-tuning was performed between February and April 2025. The underlying instruction-tuned base model was originally released by the Qwen team.

### License

This model is released under the [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0).

### Research Paper

[Scaling Reasoning can Improve Factuality in Large Language Models](https://huggingface.co/papers/2505.11140)

## Intended Use & Limitations

### Intended Use Cases

This model is intended for English-language text generation tasks that require improved factual accuracy and reasoning. It is suitable for research, experimentation, and development of assistant-like chat applications. The instruction-tuned base model follows the Qwen instruction format, and this fine-tuned version preserves that behavior.

### Limitations

Despite the improvements, the model may still produce factually incorrect or logically inconsistent outputs. It is not recommended for high-stakes decision-making applications without human oversight. Always verify generated content before relying on it in critical scenarios.

## Hardware and Software

### Training Factors

Fine-tuning was performed with the Hugging Face Transformers library and PyTorch FSDP on a multi-node, multi-GPU setup of AMD MI250x GPUs.

### Carbon Footprint

We only have aggregate statistics covering all fine-tuned models and inference runs. A cumulative 6,500 GPU hours of computation was performed on AMD MI250x GPU modules, each with a TDP of 500 W. The experiments were run from February to April 2025, during which the average carbon efficiency in Finland was 0.085 kg/kWh. This corresponds to roughly 276 kg of CO₂ equivalent (6,500 GPU h × 0.5 kW × 0.085 kg/kWh ≈ 276 kg).

## Training Data

### Overview

Fine-tuning was performed on the [jjzha/fs1-2708](https://huggingface.co/datasets/jjzha/fs1-2708) dataset, which focuses on enhancing reasoning and factual accuracy.

## Evaluation Results

See the paper for results.

## Citation

```
@misc{zhang2025scalingreasoningimprovefactuality,
      title={Scaling Reasoning can Improve Factuality in Large Language Models},
      author={Mike Zhang and Johannes Bjerva and Russa Biswas},
      year={2025},
      eprint={2505.11140},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.11140},
}
```

Code: https://github.com/jjzha/fs1
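
## Example Usage

The snippet below is a minimal usage sketch with the Transformers library, not part of the original release; the repository ID shown is an assumption and should be replaced with the actual Hub ID of this checkpoint. Because the model preserves the Qwen instruction format, the tokenizer's built-in chat template can be used directly.

```python
# Minimal sketch: load the fine-tuned checkpoint and generate a response.
# NOTE: the model ID below is assumed for illustration; adjust as needed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jjzha/Qwen2.5-1.5B-Instruct-fs1-2708"  # assumed Hub ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "user", "content": "Which element has the atomic number 26?"}
]

# Apply the Qwen chat template and move the input IDs to the model device.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)

# Strip the prompt tokens and decode only the newly generated text.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```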