# T5-small Fine-tuned on News Summarization (Custom 25K Dataset)
This is a custom fine-tuned `t5-small` model for abstractive news summarization, trained on a dataset of 25,000 news articles paired with human-written summaries.
## Model Details

- **Base Model:** `t5-small` (by Google)
- **Architecture Modified:** only 2 encoder and 2 decoder layers were trained, with the remaining layers frozen (see the freezing sketch below this list)
- **Fine-tuned for:** abstractive summarization of news articles
- **Dataset Size:** 25K article-summary pairs
- **Precision:** mixed precision (`fp16`)
- **Training Epochs:** 10
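Only part of the network was updated during fine-tuning. The sketch below shows one way to do such partial freezing with Hugging Face Transformers; which specific blocks were trained is an assumption, since the card only states that 2 encoder and 2 decoder layers were left trainable.

```python
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Freeze every parameter first
for param in model.parameters():
    param.requires_grad = False

# Unfreeze the last 2 encoder blocks and the last 2 decoder blocks
# (assumed choice of layers; the card does not say which 2+2 were trained)
for block in list(model.encoder.block[-2:]) + list(model.decoder.block[-2:]):
    for param in block.parameters():
        param.requires_grad = True
```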
## Training Configuration

Using Hugging Face's `Seq2SeqTrainingArguments`:
```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./results",
    eval_strategy="epoch",
    save_strategy="epoch",
    logging_strategy="steps",
    learning_rate=2e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    weight_decay=0.01,
    save_total_limit=3,
    num_train_epochs=10,
    predict_with_generate=False,
    generation_max_length=128,
    generation_num_beams=4,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    logging_first_step=True,
    fp16=True,
    gradient_accumulation_steps=3,
    dataloader_num_workers=4,
    report_to=None,
    eval_accumulation_steps=3,
    label_smoothing_factor=0.1,
)
```
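With `per_device_train_batch_size=8` and `gradient_accumulation_steps=3`, the effective training batch size is 24 examples per device. For context, a minimal sketch of how these arguments would typically be passed to a `Seq2SeqTrainer` follows; the dataset variables and collator setup are illustrative assumptions, not details taken from this card.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, DataCollatorForSeq2Seq, Seq2SeqTrainer

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Dynamically pads inputs and labels to the longest sequence in each batch
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,              # the Seq2SeqTrainingArguments defined above
    train_dataset=tokenized_train,   # hypothetical: the 25K tokenized article-summary pairs
    eval_dataset=tokenized_eval,     # hypothetical: a held-out validation split
    tokenizer=tokenizer,
    data_collator=data_collator,
)
trainer.train()
```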
## Training Progress

### ROUGE Scores Over Epochs

*(Figure: ROUGE scores over the training epochs.)*

### Evaluation Metrics
| Metric | Value (approx.) |
|---|---|
| ROUGE-1 | ~32.2 |
| ROUGE-2 | ~11.0 |
| ROUGE-L | ~18.2 |
| ROUGE-Lsum | ~28.7 |
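These values are approximate. One way to reproduce such scores with the `evaluate` library is sketched below, assuming `predictions` and `references` are lists of generated and reference summaries (illustrative names, not from the card):

```python
import evaluate

rouge = evaluate.load("rouge")

# `predictions` and `references` are hypothetical lists of summary strings
scores = rouge.compute(predictions=predictions, references=references, use_stemmer=True)

# evaluate returns F-measures in [0, 1]; scale by 100 to match the table above
print({name: round(value * 100, 1) for name, value in scores.items()})
```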
## How to Use
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("your-username/your-model-name")
model = AutoModelForSeq2SeqLM.from_pretrained("your-username/your-model-name")

text = "Your news article goes here."

# T5 expects the task prefix "summarize: " in front of the input text
inputs = tokenizer("summarize: " + text, return_tensors="pt", max_length=512, truncation=True)
summary_ids = model.generate(inputs.input_ids, max_length=128, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```
## Notes

- The model is lightweight and suited to lower-compute environments, since only a subset of layers was trained.
- It is well suited to real-time or edge applications that need summarization of news, reports, and factual narratives.
- The generation length (set via `generation_max_length` during training) can be tuned at inference time through `max_length` for better results on long articles; see the example below.
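An illustrative way to adjust generation settings at inference time, reusing `model`, `tokenizer`, and `inputs` from the How to Use snippet above (the specific values are assumptions, not tuned recommendations):

```python
# Reuses `model`, `tokenizer`, and `inputs` from the "How to Use" example above
summary_ids = model.generate(
    inputs.input_ids,
    max_length=200,          # raised from the 128 used above for longer articles
    min_length=30,
    num_beams=4,
    length_penalty=1.2,      # >1.0 nudges beam search toward longer summaries
    no_repeat_ngram_size=3,  # reduces repeated phrases
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```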
## Intended Use

- News summarization
- Document compression
- Low-resource abstractive summarization tasks