---
base_model:
- Qwen/Qwen2.5-1.5B-Instruct
datasets:
- jjzha/fs1-2708
language:
- en
library_name: transformers
license: mit
pipeline_tag: text-generation
tags:
- en
- factuality
- thinking
- reasoning
---

## Model Details

**Qwen2.5-1.5B-Instruct-fs1-2708** is a 1.5B-parameter language model for English text generation. It builds upon [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) and is further fine-tuned on the [jjzha/fs1-2708](https://huggingface.co/datasets/jjzha/fs1-2708) dataset, with a focus on enhancing factual reasoning in generated text.

### Model Developers

This model was fine-tuned by independent contributors using the Hugging Face Transformers library.

### Variations

This is a fine-tuned version of the `Qwen2.5-1.5B-Instruct` model. No additional variants or intermediate checkpoints are currently provided.

### Input

Text only.

### Output

Text only.

### Model Architecture

The model is an auto-regressive, transformer-based language model, fine-tuned with supervised learning to improve instruction-following and reasoning capabilities in English.

### Model Dates

Fine-tuning was performed between February and April 2025. The underlying instruction-tuned base model was originally released by the Qwen team.

### License

This model is released under the [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0).

### Research Paper

[Scaling Reasoning can Improve Factuality in Large Language Models](https://huggingface.co/papers/2505.11140)

## Intended Use & Limitations

### Intended Use Cases

This model is intended for English-language text generation tasks that require improved factual accuracy and reasoning. It is suitable for research, experimentation, and development of assistant-like chat applications. The instruction-tuned base model follows the Qwen instruction format, and this fine-tuned version preserves that behavior.

### Limitations

Despite the improvements, the model may still produce factually incorrect or logically inconsistent outputs. It is not recommended for high-stakes decision-making applications without human oversight. Always verify generated content before relying on it in critical scenarios.

## Hardware and Software

### Training Factors

Fine-tuning was performed with the Hugging Face Transformers library and PyTorch FSDP on a multi-node, multi-GPU setup of AMD MI250x GPUs.

### Carbon Footprint

We only have aggregate statistics covering all fine-tuned models and inference runs. A cumulative 6,500 GPU hours of computation was performed on AMD MI250x GPU modules, each with a TDP of 500 W. The experiments were run from February to April 2025, during which the average carbon efficiency in Finland was 0.085 kg/kWh. This corresponds to roughly 276 kg of CO₂ equivalent (6,500 GPU h × 0.5 kW × 0.085 kg/kWh ≈ 276 kg).

## Training Data

### Overview

Fine-tuning was performed on the [jjzha/fs1-2708](https://huggingface.co/datasets/jjzha/fs1-2708) dataset, which focuses on enhancing reasoning and factual accuracy.

## Evaluation Results

See the paper for results.

## Citation

```
@misc{zhang2025scalingreasoningimprovefactuality,
      title={Scaling Reasoning can Improve Factuality in Large Language Models},
      author={Mike Zhang and Johannes Bjerva and Russa Biswas},
      year={2025},
      eprint={2505.11140},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.11140},
}
```

Code: https://github.com/jjzha/fs1
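
## Example Usage

The snippet below is a minimal usage sketch with the Transformers library, not part of the original release; the repository ID shown is an assumption and should be replaced with the actual Hub ID of this checkpoint. Because the model preserves the Qwen instruction format, the tokenizer's built-in chat template can be used directly.

```python
# Minimal sketch: load the fine-tuned checkpoint and generate a response.
# NOTE: the model ID below is assumed for illustration; adjust as needed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jjzha/Qwen2.5-1.5B-Instruct-fs1-2708"  # assumed Hub ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "user", "content": "Which element has the atomic number 26?"}
]

# Apply the Qwen chat template and move the input IDs to the model device.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)

# Strip the prompt tokens and decode only the newly generated text.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```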