Educational Story Outcome Predictor
Version: 1.0 | Status: Stable | Release Date: September 16, 2025
GitHub repo with modelling code
Model Description
A fine-tuned DistilBERT model that predicts educational intervention outcomes from dual-sequence inputs: situation context + solution approach → success/failure prediction.
This model analyzes educational scenarios and predicts the long-term effectiveness of interventions based on reported classroom outcomes, not on theoretical or moral judgments about the intervention approaches. It learned from real teacher experiences which intervention patterns tend to lead to successful versus unsuccessful outcomes.
Model Details
- Base Model: distilbert-base-uncased
- Model Type: Text Classification (Binary)
- Language: English
- License: Apache 2.0
- Parameters: ~67M
- Dataset: MU-NLPC/Edustories-en (1,492 educational stories)
- Input Format: Two text sequences (situation + solution)
- Output: Binary classification with confidence scores
Performance
| Metric | Score |
|---|---|
| Accuracy | 74.18% |
| F1 Score | 82.18% |
| Precision | 71.80% |
| Recall | 96.05% |
Baseline Performance: 61.96% accuracy (always predicting the most frequent class)
Improvement: +12.22 percentage points over baseline
Note the high recall (96.05%) paired with lower precision (71.80%): the model rarely misses a genuinely successful intervention, but it also labels a fair share of failures as successes.
Comparison with RoBERTa
DistilBERT slightly outperforms RoBERTa (73.91% accuracy, 80.08% F1) while training roughly 2.4x faster, making it the more practical choice for production deployment.
Intended Use
Primary Applications
- Educational Research: Analyze intervention effectiveness patterns
- Decision Support: Inform evidence-based educational choices
- Content Analysis: Automatically categorize educational narratives
- Bias Detection: Identify patterns in educational expectations
Out-of-Scope Uses
- High-stakes educational decisions without human oversight
- Medical or clinical decision making
- General text classification outside educational domain
- Real-time assessment of individual students
How to Use
Quick Start
from transformers import pipeline

# Load the model
classifier = pipeline(
    "text-classification",
    model="polkas/educational-story-outcome-predictor"
)

# Example prediction (combine situation and solution)
situation = "Student struggling with reading comprehension in grade 3"
solution = "Teacher implements guided reading sessions with peer support"
combined_text = f"{situation} {solution}"

result = classifier(combined_text)
print(f"Prediction: {result[0]['label']} (confidence: {result[0]['score']:.2f})")
Real Examples Demo
This repository includes example_usage.py and real_examples.json with 10 real examples from the MU-NLPC/Edustories-en dataset:
# Run the example script to see predictions on real data
python example_usage.py
The examples cover both successful and failed interventions; the model reaches ~80% accuracy on these real cases.
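If you prefer to score the bundled examples yourself, a minimal sketch follows. The field names situation, solution, and label are assumptions about the JSON schema, so inspect real_examples.json for the actual keys:

import json
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="polkas/educational-story-outcome-predictor"
)

with open("real_examples.json") as f:
    examples = json.load(f)

for ex in examples:
    # Field names below are assumed; adjust them to the actual schema
    pred = classifier(f"{ex['situation']} {ex['solution']}")[0]
    print(f"expected={ex.get('label')} predicted={pred['label']} ({pred['score']:.2f})")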
Advanced Usage
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "polkas/educational-story-outcome-predictor"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Prepare dual-sequence input
situation = "Your educational situation description..."
solution = "Your intervention solution description..."

# Tokenize as a sequence pair
inputs = tokenizer(
    situation, solution,
    return_tensors="pt",
    truncation=True,
    padding=True,
    max_length=512
)

# Get predictions
with torch.no_grad():
    outputs = model(**inputs)

predictions = torch.softmax(outputs.logits, dim=-1)
predicted_class = torch.argmax(predictions, dim=-1).item()
confidence = torch.max(predictions, dim=-1)[0].item()

# Map to labels
labels = ['Failure', 'Success']
result = labels[predicted_class]
print(f"Prediction: {result} (confidence: {confidence:.3f})")
Training Details
Dataset
- Source: MU-NLPC/Edustories-en
- Total Examples: 1,471 (after cleaning)
- Training Set: 882 examples (60%)
- Validation Set: 221 examples (15%)
- Test Set: 368 examples (25%) (see the split sketch below)
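The stratified 60/15/25 split can be reproduced along these lines with scikit-learn; this is a sketch with placeholder data, not the actual training script:

from sklearn.model_selection import train_test_split

texts = [...]   # combined situation/solution texts from the dataset
labels = [...]  # 0 = Failure, 1 = Success

# Carve out the 25% test set first, then split the remainder;
# 0.2 of the remaining 75% equals 15% of the full dataset
rest_x, test_x, rest_y, test_y = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=42
)
train_x, val_x, train_y, val_y = train_test_split(
    rest_x, rest_y, test_size=0.20, stratify=rest_y, random_state=42
)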
Training Configuration
- Epochs: 4
- Batch Size: 8
- Learning Rate: 2e-5
- Optimizer: AdamW with weight decay 0.01
- Hardware: Apple Silicon (MPS acceleration)
- Training Time: ~5 minutes
- Framework: Transformers 4.30+, PyTorch 2.0+ (see the TrainingArguments sketch below)
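In Transformers terms, this configuration corresponds roughly to the following TrainingArguments; a sketch of the reported hyperparameters, not the exact script (which lives in the linked GitHub repo):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="educational-story-outcome-predictor",
    num_train_epochs=4,
    per_device_train_batch_size=8,
    learning_rate=2e-5,
    weight_decay=0.01,  # applied by the default AdamW optimizer
)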
Data Preprocessing
- Input: Combined situation (description + anamnesis) + solution text (see the sketch after this list)
- Labels: Binary mapping from original multi-class annotations
- Label Distribution: 62% Success, 38% Failure
- Max Sequence Length: 512 tokens
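A sketch of the preprocessing described above. The column names used here (description, anamnesis, solution, outcome) and the success_labels set are assumptions, so check the MU-NLPC/Edustories-en schema before reusing it:

def to_example(row, success_labels):
    """Build one (situation, solution, label) example from a raw dataset row."""
    # Combine description and anamnesis into the situation sequence
    situation = f"{row['description']} {row['anamnesis']}".strip()
    solution = row["solution"]
    # Collapse the original multi-class outcome annotation to binary Success/Failure
    label = 1 if row["outcome"] in success_labels else 0
    return situation, solution, label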
Evaluation
The model was evaluated on a stratified 25% holdout test set with the following results:
- Test Set Size: 368 examples
- Evaluation Metrics: Accuracy, F1, Precision, Recall (see the sketch below)
- Baseline Comparison: Most frequent class predictor
- Cross-validation: Not applied (single train/test split)
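The reported numbers can be recomputed from test-set predictions with scikit-learn; here y_true and y_pred are placeholders for the 368 test labels and the model's predictions:

from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

y_true = [1, 0, 1, 1]  # placeholder test labels (1 = Success, 0 = Failure)
y_pred = [1, 0, 1, 0]  # placeholder model predictions

print(f"Accuracy:  {accuracy_score(y_true, y_pred):.4f}")
print(f"F1:        {f1_score(y_true, y_pred):.4f}")
print(f"Precision: {precision_score(y_true, y_pred):.4f}")
print(f"Recall:    {recall_score(y_true, y_pred):.4f}")

# Most-frequent-class baseline accuracy
baseline = max(y_true.count(0), y_true.count(1)) / len(y_true)
print(f"Baseline:  {baseline:.4f}")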
Limitations and Bias
Known Limitations
- Language: English only, performance on other languages not evaluated
- Domain Specificity: Trained only on educational narratives
- Binary Classification: Only predicts Success/Failure (no nuanced outcomes)
- Sequence Length: Limited to 512 tokens (longer texts are truncated)
- Temporal Context: Static training data may not capture evolving practices
Potential Biases
- Representation Bias: Training data may not represent all educational contexts
- Annotation Bias: Human-labeled outcomes may reflect annotator perspectives
- Historical Bias: May perpetuate existing inequities in educational systems
- Cultural Bias: Model trained primarily on specific cultural/linguistic contexts
Recommendations
- Use as research tool, not for high-stakes decisions
- Validate predictions with domain experts
- Monitor for discriminatory patterns across different groups
- Consider cultural and contextual factors in deployment
- Implement human oversight for sensitive applications
Environmental Impact
- Training Emissions: Minimal (~5 minutes of training on energy-efficient Apple Silicon hardware)
- Model Size: 67M parameters (~250MB)
- Inference Efficiency: Optimized for deployment on consumer hardware
- Energy Usage: Low inference energy requirements
Technical Requirements
For Inference
- Python: 3.8+
- PyTorch: 2.0+
- Transformers: 4.30+
- Memory: 4GB+ RAM recommended
- Hardware: CPU sufficient, GPU optional
For Training
- Memory: 8GB+ RAM recommended
- Hardware: Apple Silicon (MPS) or CUDA GPU for efficient training
- Time: ~5 minutes for full training
Citation
If you use this model in your research, please cite:
@misc{educational-story-outcome-predictor-2025,
  title={Educational Story Outcome Predictor: A DistilBERT Model for Educational Intervention Analysis},
  author={Maciej Nasinski},
  year={2025},
  url={https://huggingface.co/polkas/educational-story-outcome-predictor},
  note={Fine-tuned DistilBERT model for binary classification of educational intervention outcomes}
}
Dataset Citation
@misc{edustories-en-2024,
  title={MU-NLPC/Edustories-en},
  author={MU-NLPC},
  year={2024},
  url={https://huggingface.co/datasets/MU-NLPC/Edustories-en}
}
Model Card Authors
- Model Development: Maciej Nasinski
- Model Card: Maciej Nasinski
- Contact: [Insert your contact information]
Acknowledgments
- Base model: Hugging Face DistilBERT team
- Dataset: MU-NLPC research group
- Framework: Hugging Face Transformers library
This model card follows the guidelines from Mitchell et al. (2019) and Hugging Face Model Card Guidelines.