Educational Story Outcome Predictor

Version: 1.0 | Status: Stable | Release Date: September 16, 2025

GitHub repo with modelling code

Model Description

A fine-tuned DistilBERT model that predicts educational intervention outcomes from dual-sequence inputs: situation context + solution approach → success/failure prediction.

This model analyzes real educational scenarios and predicts the long-term effectiveness of interventions based on actual classroom outcomes, not theoretical or moral judgments about intervention approaches. The model learned from real teacher experiences to identify which intervention patterns tend to lead to successful vs. unsuccessful outcomes.

Model Details

  • Base Model: distilbert-base-uncased
  • Model Type: Text Classification (Binary)
  • Language: English
  • License: Apache 2.0
  • Parameters: ~67M
  • Dataset: MU-NLPC/Edustories-en (1,492 educational stories)
  • Input Format: Two text sequences (situation + solution)
  • Output: Binary classification with confidence scores

Performance

Metric      Score
----------  -------
Accuracy    74.18%
F1 Score    82.18%
Precision   71.80%
Recall      96.05%

Baseline Performance: 61.96% accuracy (most frequent class)
Improvement: +12.22 percentage points over baseline

Comparison with RoBERTa

DistilBERT slightly outperforms RoBERTa (73.91% accuracy, 80.08% F1) while training 2.4x faster, making it the more practical choice for production deployment.

Intended Use

Primary Applications

  • Educational Research: Analyze intervention effectiveness patterns
  • Decision Support: Inform evidence-based educational choices
  • Content Analysis: Automatically categorize educational narratives
  • Bias Detection: Identify patterns in educational expectations

Out-of-Scope Uses

  • High-stakes educational decisions without human oversight
  • Medical or clinical decision making
  • General text classification outside educational domain
  • Real-time assessment of individual students

How to Use

Quick Start

from transformers import pipeline

# Load the model
classifier = pipeline(
    "text-classification",
    model="polkas/educational-story-outcome-predictor"
)

# Example prediction (combine situation and solution)
situation = "Student struggling with reading comprehension in grade 3"
solution = "Teacher implements guided reading sessions with peer support"
combined_text = f"{situation} {solution}"

result = classifier(combined_text)
print(f"Prediction: {result[0]['label']} (confidence: {result[0]['score']:.2f})")

Real Examples Demo

This repository includes example_usage.py and real_examples.json with 10 real examples from the MU-NLPC/Edustories-en dataset:

# Run the example script to see predictions on real data
python example_usage.py

The examples cover both successful and failed interventions; the model predicts roughly 80% of these real cases correctly.
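
If you want to score the bundled examples yourself, a minimal sketch follows; the field names (situation, solution, label) are assumptions about the layout of real_examples.json, so adjust them to the actual file:

import json
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="polkas/educational-story-outcome-predictor",
)

# Field names below are assumptions about real_examples.json
with open("real_examples.json") as f:
    examples = json.load(f)

correct = 0
for ex in examples:
    pred = classifier(f"{ex['situation']} {ex['solution']}")[0]
    correct += int(pred["label"] == ex["label"])

print(f"Accuracy on bundled examples: {correct / len(examples):.0%}")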

Advanced Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "polkas/educational-story-outcome-predictor"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Prepare dual-sequence input
situation = "Your educational situation description..."
solution = "Your intervention solution description..."

# Tokenize
inputs = tokenizer(
    situation, solution,
    return_tensors="pt",
    truncation=True,
    padding=True,
    max_length=512
)

# Get predictions
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(predictions, dim=-1).item()
    confidence = torch.max(predictions, dim=-1)[0].item()

# Map to labels
labels = ['Failure', 'Success']
result = labels[predicted_class]
print(f"Prediction: {result} (confidence: {confidence:.3f})")

Training Details

Dataset

  • Source: MU-NLPC/Edustories-en
  • Total Examples: 1,471 (after cleaning)
  • Training Set: 882 examples (60%)
  • Validation Set: 221 examples (15%)
  • Test Set: 368 examples (25%)
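
The splits can be recreated along these lines with the datasets library; the seed and the assumption that the raw data arrives as a single train split are illustrative, and the cleaning step is not reproduced, so counts will differ slightly:

from datasets import load_dataset

# Load the source dataset from the Hub
dataset = load_dataset("MU-NLPC/Edustories-en")

# 60/15/25: carve off 25% for test, then 20% of the remaining 75%
# (i.e. 15% of the total) for validation. seed=42 is an illustrative
# choice, not the value used for the released model.
split = dataset["train"].train_test_split(test_size=0.25, seed=42)
train_val = split["train"].train_test_split(test_size=0.2, seed=42)

train_ds = train_val["train"]
val_ds = train_val["test"]
test_ds = split["test"]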

Training Configuration

  • Epochs: 4
  • Batch Size: 8
  • Learning Rate: 2e-5
  • Optimizer: AdamW with weight decay 0.01
  • Hardware: Apple Silicon (MPS acceleration)
  • Training Time: ~5 minutes
  • Framework: Transformers 4.30+, PyTorch 2.0+
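
A minimal Trainer sketch with these hyperparameters (the tokenized splits and the tokenizer are assumed to exist from the surrounding sections):

from transformers import (
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

# Hyperparameters mirror the configuration listed above;
# AdamW is the Trainer's default optimizer.
args = TrainingArguments(
    output_dir="outcome-predictor",
    num_train_epochs=4,
    per_device_train_batch_size=8,
    learning_rate=2e-5,
    weight_decay=0.01,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,  # tokenized splits from above
    eval_dataset=val_ds,
    tokenizer=tokenizer,     # enables dynamic padding
)
trainer.train()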

Data Preprocessing

  • Input: Combined situation (description + anamnesis) + solution text
  • Labels: Binary mapping from original multi-class annotations
  • Label Distribution: 62% Success, 38% Failure
  • Max Sequence Length: 512 tokens
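
A sketch of this preprocessing, reusing the tokenizer from above; the column names (description, anamnesis, solution, outcome) and the success-category set are assumptions about the dataset schema, not verified against it:

SUCCESS_CATEGORIES = {"improved", "resolved"}  # hypothetical category names

def preprocess(example):
    # Combine description and anamnesis into the situation sequence,
    # keep the solution as the second sequence, and binarize the label
    situation = f"{example['description']} {example['anamnesis']}".strip()
    encoded = tokenizer(
        situation,
        example["solution"],
        truncation=True,
        max_length=512,
    )
    encoded["label"] = int(example["outcome"] in SUCCESS_CATEGORIES)
    return encoded

train_ds = train_ds.map(preprocess)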

Evaluation

The model was evaluated on a stratified 25% holdout test set with the following results:

  • Test Set Size: 368 examples
  • Evaluation Metrics: Accuracy, F1, Precision, Recall
  • Baseline Comparison: Most frequent class predictor
  • Cross-validation: Not applied (single train/test split)
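
These metrics can be computed with a standard compute_metrics function (assuming scikit-learn is available), which can also be passed to the Trainer above via compute_metrics=...:

import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    # Turn logits into hard predictions, then score against gold labels
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="binary"
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }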

Limitations and Bias

Known Limitations

  1. Language: English only, performance on other languages not evaluated
  2. Domain Specificity: Trained only on educational narratives
  3. Binary Classification: Only predicts Success/Failure (no nuanced outcomes)
  4. Sequence Length: Limited to 512 tokens (longer texts are truncated)
  5. Temporal Context: Static training data may not capture evolving practices

Potential Biases

  • Representation Bias: Training data may not represent all educational contexts
  • Annotation Bias: Human-labeled outcomes may reflect annotator perspectives
  • Historical Bias: May perpetuate existing inequities in educational systems
  • Cultural Bias: Model trained primarily on specific cultural/linguistic contexts

Recommendations

  • Use as research tool, not for high-stakes decisions
  • Validate predictions with domain experts
  • Monitor for discriminatory patterns across different groups
  • Consider cultural and contextual factors in deployment
  • Implement human oversight for sensitive applications

Environmental Impact

  • Training Emissions: Minimal (efficient Apple Silicon hardware)
  • Model Size: 67M parameters (~250MB)
  • Inference Efficiency: Optimized for deployment on consumer hardware
  • Energy Usage: Low inference energy requirements

Technical Requirements

For Inference

  • Python: 3.8+
  • PyTorch: 2.0+
  • Transformers: 4.30+
  • Memory: 4GB+ RAM recommended
  • Hardware: CPU sufficient, GPU optional

For Training

  • Memory: 8GB+ RAM recommended
  • Hardware: Apple Silicon (MPS) or CUDA GPU for efficient training
  • Time: ~5 minutes for full training

Citation

If you use this model in your research, please cite:

@misc{educational-story-outcome-predictor-2025,
  title={Educational Story Outcome Predictor: A DistilBERT Model for Educational Intervention Analysis},
  author={Maciej Nasinski},
  year={2025},
  url={https://huggingface.co/polkas/educational-story-outcome-predictor},
  note={Fine-tuned DistilBERT model for binary classification of educational intervention outcomes}
}

Dataset Citation

@misc{edustories-en-2024,
  title={MU-NLPC/Edustories-en},
  author={MU-NLPC},
  year={2024},
  url={https://huggingface.co/datasets/MU-NLPC/Edustories-en}
}

Model Card Authors

  • Model Development: Maciej Nasinski
  • Model Card: Maciej Nasinski
  • Contact: [Insert your contact information]

Acknowledgments

  • Base model: Hugging Face DistilBERT team
  • Dataset: MU-NLPC research group
  • Framework: Hugging Face Transformers library

This model card follows the guidelines from Mitchell et al. (2019) and Hugging Face Model Card Guidelines.
