Scam SMS Detection Model (Llama 3.2 1B Fine-tuned)
A fine-tuned Llama 3.2 1B model specifically designed to detect and classify scam SMS messages in Hong Kong, with support for both Traditional Chinese and English text.
Model Overview
This model is based on Meta's Llama 3.2 1B and has been fine-tuned using the MLX framework on a carefully curated dataset of SMS messages collected in Hong Kong. The model can effectively distinguish between legitimate and fraudulent SMS messages in both Traditional Chinese and English.
Key Features
- Bilingual Support: Traditional Chinese and English
- Lightweight: 1B parameters for efficient deployment
- Cross-Platform: GGUF format optimized for llama.cpp deployment
- Local Processing: No internet connection required for inference
Model Details
| Specification | Details |
|---|---|
| Base Model | Meta Llama 3.2 1B |
| Fine-tuning Framework | MLX |
| Model Format | GGUF |
| Languages | Traditional Chinese, English |
| Training Data | Self-collected Hong Kong SMS samples |
| Model Size | ~2.5GB |
| Context Length | 8,192 tokens |
Requirements
Software Dependencies
- llama.cpp (inference engine)
- Python 3.8+ (for preprocessing scripts)
Hardware Requirements
- Minimum RAM: 8GB
- Recommended RAM: 16GB+
- Storage: 3GB free space
📱 Installation & Deployment
Desktop/Server Deployment
Install llama.cpp
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
make
Download the model
# Download your model file (replace with actual download link)
wget [MODEL_DOWNLOAD_URL] -O scam_sms_detector.gguf
Run inference
./main -m scam_sms_detector.gguf -p "Classify this SMS: 恭喜您中獎了！請點擊鏈接領取獎金" -n 50
🔧 Usage Examples
Basic Classification
# English SMS
./main -m scam_sms_detector.gguf -p "Classify: Congratulations! You've won $10,000. Click here to claim your prize!" -n 30
# Traditional Chinese SMS
./main -m scam_sms_detector.gguf -p "分類此短信：您的銀行賬戶已被凍結，請立即點擊鏈接驗證身份" -n 30
Batch Processing
import subprocess

def classify_sms(text):
    cmd = [
        "./main",
        "-m", "scam_sms_detector.gguf",
        "-p", f"Classify this SMS as SCAM or LEGITIMATE: {text}",
        "-n", "10"
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout.strip()

# Example usage
messages = [
    "Your package is ready for delivery. Track: https://bit.ly/track123",
    "Meeting scheduled for 3 PM tomorrow in conference room A",
    "恭喜！您已被選中獲得免費iPhone，請點擊領取"
]

for msg in messages:
    classification = classify_sms(msg)
    print(f"Message: {msg}")
    print(f"Classification: {classification}\n")
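The raw completion from llama.cpp typically echoes the prompt and may add extra tokens around the label, so a small post-processing helper (a hypothetical addition, not part of the model card) makes the result machine-readable:

```python
import re

def parse_label(raw_output: str) -> str:
    """Map a free-form model completion to SCAM, LEGITIMATE, or UNKNOWN."""
    text = raw_output.upper()
    # Take the first occurrence of either label token, if any.
    match = re.search(r"\b(SCAM|LEGITIMATE)\b", text)
    return match.group(1) if match else "UNKNOWN"
```

This can wrap the `classify_sms` helper above, e.g. `parse_label(classify_sms(msg))`.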
API Integration
# Simple Flask API wrapper
from flask import Flask, request, jsonify
import subprocess

app = Flask(__name__)

@app.route('/classify', methods=['POST'])
def classify_sms():
    data = request.json
    sms_text = data.get('text', '')
    cmd = [
        "./main",
        "-m", "scam_sms_detector.gguf",
        "-p", f"Classify: {sms_text}",
        "-n", "20"
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return jsonify({
        'text': sms_text,
        'classification': result.stdout.strip(),
        'confidence': 'high'  # You may want to implement confidence scoring
    })

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
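The `confidence` field above is hard-coded. One simple way to implement it is majority voting over repeated samples: classify the same message several times and report the fraction of runs that agreed. The sketch below (my addition, with a deterministic stub standing in for the llama.cpp-backed classifier) shows the idea:

```python
from collections import Counter

def vote_confidence(classify, text, runs=5):
    """Classify `text` several times; return the majority label and the
    fraction of runs that agreed with it (a crude confidence proxy)."""
    labels = [classify(text) for _ in range(runs)]
    label, count = Counter(labels).most_common(1)[0]
    return label, count / runs

# Stub classifier for illustration; the real one would shell out to ./main.
def fake_classify(text):
    return "SCAM" if "prize" in text.lower() else "LEGITIMATE"

label, conf = vote_confidence(fake_classify, "Claim your prize now!")
```

This only yields meaningful spread when sampling is non-deterministic (e.g. temperature above zero); with greedy decoding every run returns the same label.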
Performance & Capabilities
Language Support
- Traditional Chinese: Optimized for Hong Kong usage patterns
- English: Standard international English
- Mixed Language: Can handle code-switching between Chinese and English
Expected Performance
- Memory Usage: ~3GB RAM during inference
⚠️ Limitations
- Regional Specificity: Optimized for Hong Kong SMS patterns; may need retraining for other regions
- Language Support: Limited to Traditional Chinese and English
- Context Dependency: May require additional context for borderline cases
- Update Frequency: Scam patterns evolve; periodic retraining recommended
- Legal Compliance: Users responsible for compliance with local privacy laws
🤝 Contributing
Contributions to improve the model are welcome:
- Data Collection: Help expand the training dataset
- Bug Reports: Report issues or false classifications
- Feature Requests: Suggest improvements or new capabilities
Acknowledgments
- Meta AI for the Llama 3.2 base model
- Apple MLX team for the fine-tuning framework
- Georgi Gerganov for llama.cpp
Model tree for Rainnighttram/Scam_Detection
Base model: meta-llama/Llama-3.2-1B