Model Card for Qwen2.5-0.5B-Instruct-GSM8K-SFT-full

Model Description

This model is a fine-tuned version of Qwen/Qwen2.5-0.5B-Instruct specifically optimized for mathematical reasoning tasks on the GSM8K benchmark.

The model was trained using a Reasoning Distillation methodology. A powerful teacher model, Gemini-2.5-Flash, was used to generate high-quality, step-by-step (Chain-of-Thought) solutions for the GSM8K dataset. This model was then trained via Supervised Fine-Tuning (SFT) on this enriched data to directly learn the teacher's reasoning patterns.

This SFT w/ Full FT variant (SFT with full parameter updates) is the best-performing model from the series of experiments, showing a notable improvement over the baseline model in mathematical problem solving.

How to Get Started

You can use this model with the transformers library. For best results, please use the prompt format the model was trained on.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kkh27/Qwen2.5-0.5B-Instruct-GSM8K-SFT-full"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

# Example math problem from GSM8K
question = "Lloyd has an egg farm. His chickens produce 252 eggs per day and he sells them for $2 per dozen. How much does Lloyd make on eggs per week?"

# Format the prompt using the specified template
prompt = f"Question: {question}\nAnswer:"

model_inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

# Clean up the response to show only the generated answer
answer = response.split("Answer:")[1].strip()
print(answer)
# Expected output similar to:
# Step 1: 252 eggs / 12 = 21 dozen eggs per day.
# Step 2: 21 * 7 = 147 dozen eggs per week.
# Step 3: 147 * $2/dozen = $294 per week.
# #### 294
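
The generated solution ends with a final line of the form "#### <number>", which is the GSM8K answer convention the training data follows. If you want to extract just the final numeric answer programmatically, a small helper such as the hypothetical one below (not part of this repository) works under that assumption.

import re

def extract_final_answer(text):
    # GSM8K-style solutions end with a line like "#### 294";
    # capture the number, allowing commas, a sign, and decimals.
    match = re.search(r"####\s*(-?[\d,]+(?:\.\d+)?)", text)
    return match.group(1).replace(",", "") if match else None

print(extract_final_answer(answer))  # e.g. "294"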

Training Details

Training Data

The model was trained on the GSM8K dataset. The original answers were replaced with detailed, step-by-step solutions generated by the Gemini-2.5-Flash teacher model. This process of knowledge transfer is known as Reasoning Distillation.
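
The exact generation pipeline is not released with this card, but conceptually it replaces each GSM8K answer with a teacher-written solution. The sketch below assumes the GSM8K train split from the Hugging Face Hub and uses a placeholder teacher_generate function standing in for a call to the Gemini-2.5-Flash API.

from datasets import load_dataset

def teacher_generate(question):
    # Placeholder for a call to the Gemini-2.5-Flash teacher model; the real
    # implementation returns a step-by-step solution ending in "#### <answer>".
    return "Step 1: ...\n#### 0"

gsm8k_train = load_dataset("gsm8k", "main", split="train")

def distill(example):
    # Keep the question, replace the original answer with the teacher's
    # Chain-of-Thought solution.
    return {"question": example["question"], "answer": teacher_generate(example["question"])}

distilled_train = gsm8k_train.map(distill)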

Prompt Template

The model was trained using a simple question-answer format. All training examples were structured as follows:

Question: {question}\nAnswer:
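
For reference, each training example would therefore be rendered into a single string by filling this template and appending the teacher solution as the target. A minimal sketch follows; the exact separator between "Answer:" and the solution is an assumption.

def format_example(example):
    # Prompt uses the training template; the completion is the
    # teacher-generated step-by-step solution.
    prompt = f"Question: {example['question']}\nAnswer:"
    return {"text": prompt + " " + example["answer"]}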

Training Procedure

The model was trained using a standard Supervised Fine-Tuning (SFT) process with full parameter updates. The key hyperparameters are listed below, followed by an illustrative configuration sketch.

  • num_train_epochs: 2
  • per_device_train_batch_size: 2
  • gradient_accumulation_steps: 4
  • learning_rate: 5e-5
  • lr_scheduler_type: "cosine"
  • weight_decay: 0.01
  • Effective Batch Size: 8
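
The exact training script is not published with this card. As a rough illustration, the listed hyperparameters map onto Hugging Face TrainingArguments as shown below; output_dir and bf16 are assumptions, the latter inferred from the bfloat16 usage example above.

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-instruct-gsm8k-sft-full",  # assumed name
    num_train_epochs=2,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,   # effective batch size 2 x 4 = 8
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    weight_decay=0.01,
    bf16=True,                       # assumption, consistent with the usage example
)

These arguments would typically be passed, together with the formatted dataset, to an SFT trainer such as TRL's SFTTrainer.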

Evaluation

The model was evaluated on the GSM8K benchmark (5-shot) and achieved the highest performance among all tested training pipelines.

Model                          Flexible-extract Exact Match (±stderr)   Strict-match Exact Match (±stderr)
Base (Qwen2.5-0.5B)            0.3381 ± 0.0130                          0.3131 ± 0.0128
SFT w/ Full FT (This Model)    0.3525 ± 0.0132                          0.3586 ± 0.0132
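
The flexible-extract and strict-match metric names match the gsm8k task in EleutherAI's lm-evaluation-harness. Assuming that harness was used, a comparable 5-shot run can be launched as in the sketch below; the exact harness version and settings behind the reported numbers are not stated.

import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=kkh27/Qwen2.5-0.5B-Instruct-GSM8K-SFT-full,dtype=bfloat16",
    tasks=["gsm8k"],
    num_fewshot=5,
)
print(results["results"]["gsm8k"])  # flexible-extract / strict-match exact-match scores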