Model Card for Qwen2.5-0.5B-Instruct-GSM8K-SFT-full
Model Description
This model is a fine-tuned version of Qwen/Qwen2.5-0.5B-Instruct, specifically optimized for mathematical reasoning tasks on the GSM8K benchmark.
The model was trained using a Reasoning Distillation methodology. A powerful teacher model, Gemini-2.5-Flash, was used to generate high-quality, step-by-step (Chain-of-Thought) solutions for the GSM8K dataset. This model was then trained via Supervised Fine-Tuning (SFT) on this enriched data to directly learn the teacher's reasoning patterns.
This SFT w/ Full FT (full fine-tuning) version is the best-performing model from a series of experiments. On GSM8K (5-shot) it improves on the baseline in both flexible-extract exact match (0.3381 to 0.3525) and strict-match exact match (0.3131 to 0.3586).
How to Get Started
You can use this model with the transformers library. For best results, use the prompt format the model was trained on.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "kkh27/Qwen2.5-0.5B-Instruct-GSM8K-SFT-full"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
# Example math problem from GSM8K
question = "Lloyd has an egg farm. His chickens produce 252 eggs per day and he sells them for $2 per dozen. How much does Lloyd make on eggs per week?"
# Format the prompt using the specified template
prompt = f"Question: {question}\nAnswer:"
model_inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
)
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
# Clean up the response to show only the generated answer
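# Split only on the first "Answer:" so the text is not cut short if the model repeats the marker
answer = response.split("Answer:", 1)[1].strip()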
print(answer)
# Expected output similar to:
# Step 1: 252 eggs / 12 = 21 dozen eggs per day.
# Step 2: 21 * 7 = 147 dozen eggs per week.
# Step 3: 147 * $2/dozen = $294 per week.
# #### 294
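The expected output above ends with a `#### <number>` line, which is the GSM8K convention for the final answer. A minimal sketch of how that number could be pulled out of the generated text is shown below. The helper name `extract_final_answer` is hypothetical and not part of this model's repository, and the pattern assumes the answer is a plain number, optionally preceded by a dollar sign.

```python
import re
from typing import Optional

# Hypothetical helper, not part of the model's repository.
# Matches "####", optional whitespace, an optional "$", then a number
# that may contain thousands separators and a decimal part.
_FINAL_ANSWER = re.compile(r"####\s*\$?(-?[\d,]*\.?\d+)")


def extract_final_answer(text: str) -> Optional[str]:
    """Return the first '#### <number>' answer in text, or None if absent."""
    match = _FINAL_ANSWER.search(text)
    if match is None:
        return None
    # Drop thousands separators so "1,234" becomes "1234".
    return match.group(1).replace(",", "")


sample = (
    "Step 1: 252 eggs / 12 = 21 dozen eggs per day.\n"
    "Step 2: 21 * 7 = 147 dozen eggs per week.\n"
    "Step 3: 147 * $2/dozen = $294 per week.\n"
    "#### 294"
)
print(extract_final_answer(sample))
```

The first match is used rather than the last, so that any text the model generates after its answer does not override the result.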
Training Details
Training Data
The model was trained on the GSM8K dataset. The original answers were replaced with detailed, step-by-step solutions generated by the Gemini-2.5-Flash teacher model. This process of knowledge transfer is known as Reasoning Distillation.
Prompt Template
The model was trained using a simple question-answer format. All training examples were structured as follows:
Question: {question}\nAnswer:
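As a rough illustration of the template, the sketch below builds the inference prompt and a full training string. The function names are hypothetical, and the single space placed between `Answer:` and the teacher solution is an assumption: the card does not state how the solution was appended to the prompt.

```python
def build_prompt(question: str) -> str:
    """Format a question with the template given in this card."""
    return f"Question: {question}\nAnswer:"


def build_training_text(question: str, solution: str) -> str:
    """Join the prompt and a teacher-written solution into one string.

    Assumption: a single space separates "Answer:" from the solution.
    The separator actually used during training is not documented.
    """
    return f"{build_prompt(question)} {solution.strip()}"


example = build_training_text(
    "What is 2 + 2?",
    "Step 1: 2 + 2 = 4.\n#### 4\n",
)
print(example)
```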
Training Procedure
The model was trained using a standard Supervised Fine-Tuning (SFT) process with full parameter updates. The key hyperparameters are listed below.
- num_train_epochs: 2
- per_device_train_batch_size: 2
- gradient_accumulation_steps: 4
- learning_rate: 5e-5
- lr_scheduler_type: "cosine"
- weight_decay: 0.01
- Effective Batch Size: 8
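The effective batch size follows from the per-device batch size and the number of gradient accumulation steps. The sketch below collects the listed hyperparameters in a plain dictionary and recomputes it. The device count of 1 is an assumption: it is the value consistent with an effective batch size of 8, but the card does not state the hardware used.

```python
# Hyperparameters as listed in this card.
hyperparameters = {
    "num_train_epochs": 2,
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 4,
    "learning_rate": 5e-5,
    "lr_scheduler_type": "cosine",
    "weight_decay": 0.01,
}

# Assumption: training ran on a single device (not stated in the card).
NUM_DEVICES = 1


def effective_batch_size(hp: dict, num_devices: int) -> int:
    """Number of examples contributing to each optimizer step."""
    return (
        hp["per_device_train_batch_size"]
        * hp["gradient_accumulation_steps"]
        * num_devices
    )


print(effective_batch_size(hyperparameters, NUM_DEVICES))
```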
Evaluation
The model was evaluated on the GSM8K benchmark (5-shot) and achieved the highest performance among all tested training pipelines.
| Model | Flexible-extract Exact Match (±stderr) | Strict-match Exact Match (±stderr) |
|---|---|---|
| Base (Qwen2.5-0.5B) | 0.3381 ± 0.0130 | 0.3131 ± 0.0128 |
| SFT w/ Full FT (This Model) | 0.3525 ± 0.0132 | 0.3586 ± 0.0132 |
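One rough way to read the ± stderr columns is to compare each improvement against the combined standard error of the two scores. The sketch below does this with the values from the table. It treats the two runs as independent samples, which is a simplifying assumption: both models were scored on the same test questions, so a paired comparison would be more precise. The function name `z_score` is illustrative only.

```python
import math


def z_score(base: float, base_se: float, tuned: float, tuned_se: float) -> float:
    """Difference between two scores in units of their combined stderr.

    Assumption: the two scores are treated as independent estimates.
    """
    return (tuned - base) / math.sqrt(base_se**2 + tuned_se**2)


# (base score, base stderr, fine-tuned score, fine-tuned stderr) from the table.
results = {
    "flexible-extract": (0.3381, 0.0130, 0.3525, 0.0132),
    "strict-match": (0.3131, 0.0128, 0.3586, 0.0132),
}

for name, (base, base_se, tuned, tuned_se) in results.items():
    delta = tuned - base
    z = z_score(base, base_se, tuned, tuned_se)
    print(f"{name}: delta={delta:+.4f}, z={z:.2f}")
```

A z value above roughly 2 is conventionally read as a difference that is larger than the reported measurement noise.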