DistilBERT RSS Advertisement Detection
A DistilBERT-based model for classifying RSS article titles as advertisements or legitimate news content.
Model Description
This model is fine-tuned from distilbert-base-uncased for binary text classification. It can distinguish between:
- Advertisement: Promotional content, deals, sales, sponsored content
- News: Legitimate news articles, editorial content, research findings
Intended Use
- Primary: Filtering RSS feeds to separate advertisements from news
- Secondary: Content moderation, spam detection, content categorization
- Research: Text classification, advertisement detection studies
Performance
- Accuracy: ~95%
- F1 Score: ~94%
- Precision: ~93%
- Recall: ~94%
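As a quick consistency check on the figures above, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the approximate values listed; the helper name `f1_from` is illustrative and not part of this model's code.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Approximate precision and recall as reported in the Performance list.
f1 = f1_from(0.93, 0.94)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.935"
```

The result is close to the approximate F1 listed, so the three figures are mutually consistent within rounding.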
Training Data
- Source: 75+ RSS feeds from major tech news outlets
- Articles: 1,600+ RSS articles
- Labeled: 1,000+ manually labeled examples
- Outlets include: TechCrunch, WIRED, The Verge, Ars Technica, OpenAI, Google AI, and others
Usage
from transformers import pipeline

# Load the model
classifier = pipeline(
    "text-classification",
    model="SoroushXYZ/distilbert-rss-ad-detection",
)

# Classify examples
examples = [
    "Apple Announces New iPhone with Advanced AI Features",
    "50% OFF - Limited Time Offer on Premium Headphones!",
    "Scientists Discover New Method for Carbon Capture",
    "Buy Now! Get Free Shipping on All Electronics Today Only!",
]

for text in examples:
    result = classifier(text)
    print(f"{text} -> {result[0]['label']} ({result[0]['score']:.3f})")
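For the primary use case of filtering a feed, the classifier output can be wrapped in a small helper. The following is a minimal sketch, not part of the released model: the function name `filter_news`, the `threshold` parameter, and the label string `"advertisement"` are all assumptions. The model card does not state the exact label strings, so check the model's config (they may be generic names such as `LABEL_0` and `LABEL_1`).

```python
from typing import Callable, Dict, List

Prediction = Dict[str, object]


def filter_news(
    titles: List[str],
    classify: Callable[[str], List[Prediction]],
    ad_label: str = "advertisement",  # assumed label name; verify against the model config
    threshold: float = 0.5,           # hypothetical cutoff for dropping a title
) -> List[str]:
    """Keep titles that are not confidently classified as advertisements."""
    kept = []
    for title in titles:
        # A text-classification pipeline returns a list holding the top prediction.
        top = classify(title)[0]
        is_ad = top["label"] == ad_label and float(top["score"]) >= threshold
        if not is_ad:
            kept.append(title)
    return kept
```

With the pipeline loaded as in the usage snippet, a call would look like `filter_news(examples, classifier)`. Raising `threshold` keeps more borderline titles, which may be preferable when dropping a real news item is costlier than letting an advertisement through.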
Model Architecture
- Base Model: distilbert-base-uncased
- Task: Binary text classification
- Input: Text (max 128 tokens)
- Output: Class probabilities (news, advertisement)
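To illustrate what "class probabilities" means here: assuming the usual two-logit sequence-classification head, the raw scores are passed through a softmax so that the two outputs sum to 1. The sketch below uses made-up logits, and the (news, advertisement) ordering is an assumption rather than something stated in this card.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for (news, advertisement); values are illustrative only.
probs = softmax([2.0, -1.0])
print([round(p, 3) for p in probs])  # prints [0.953, 0.047]
```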
Training Details
- Epochs: 3
- Batch Size: 16
- Learning Rate: 5e-5
- Optimizer: AdamW
- Framework: PyTorch + Transformers
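The hyperparameters above imply a fairly short training run. As a rough sketch, the number of optimizer steps can be estimated as follows, assuming exactly 1,000 training examples (the card only says "1,000+" and does not describe a train/validation split), a single device, and no gradient accumulation.

```python
import math

# Hyperparameters taken from the Training Details list.
epochs = 3
batch_size = 16

# Assumption: 1,000 training examples with no held-out split.
num_examples = 1000

steps_per_epoch = math.ceil(num_examples / batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # prints "63 189"
```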
Limitations
- Trained primarily on tech news content
- May not generalize well to other domains
- Performance depends on title quality and clarity
- Limited to English language content
Citation
If you use this model, please cite:
@misc{distilbert-rss-ad-detection,
title={DistilBERT RSS Advertisement Detection},
author={Your Name},
year={2024},
url={https://huggingface.co/SoroushXYZ/distilbert-rss-ad-detection}
}