Scam SMS Detection Model (Llama 3.2 1B Fine-tuned)

A fine-tuned Llama 3.2 1B model specifically designed to detect and classify scam SMS messages in Hong Kong, with support for both Traditional Chinese and English text.

πŸš€ Model Overview

This model is based on Meta's Llama 3.2 1B and has been fine-tuned with the MLX framework on a curated dataset of SMS messages collected in Hong Kong. It distinguishes between legitimate and fraudulent SMS messages in both Traditional Chinese and English.

Key Features

  • Bilingual Support: Traditional Chinese and English
  • Lightweight: 1B parameters for efficient deployment
  • Cross-Platform: GGUF format optimized for llama.cpp deployment
  • Local Processing: No internet connection required for inference

πŸ“Š Model Details

Specification          Details
Base Model             Meta Llama 3.2 1B
Fine-tuning Framework  MLX
Model Format           GGUF
Languages              Traditional Chinese, English
Training Data          Self-collected Hong Kong SMS samples
Model Size             ~2.5 GB
Context Length         8,192 tokens

πŸ›  Requirements

Software Dependencies

  • llama.cpp (inference engine)
  • Python 3.8+ (for preprocessing scripts)

Hardware Requirements

  • Minimum RAM: 8GB
  • Recommended RAM: 16GB+
  • Storage: 3GB free space

πŸ“± Installation & Deployment

Desktop/Server Deployment

  1. Install llama.cpp

    git clone https://github.com/ggerganov/llama.cpp.git
    cd llama.cpp
    make
    # Note: recent llama.cpp releases build with CMake instead:
    #   cmake -B build && cmake --build build --config Release

  2. Download the model

    # Download the model file (replace with the actual download link)
    wget [MODEL_DOWNLOAD_URL] -O scam_sms_detector.gguf

  3. Run inference

    # Prompt: "Congratulations, you've won! Please click the link to claim the prize"
    # (on recent llama.cpp builds the binary is named llama-cli rather than main)
    ./main -m scam_sms_detector.gguf -p "Classify this SMS: ζ­ε–œζ‚¨δΈ­ηŽδΊ†οΌθ«‹ι»žζ“ŠιˆζŽ₯ι ˜ε–ηŽι‡‘" -n 50
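Before running inference, it can be worth confirming the download completed correctly: every GGUF file begins with the 4-byte ASCII magic `GGUF`. A minimal sketch of that check (the filename matches the wget step above; the helper name `is_gguf` is illustrative):

```python
import os

def is_gguf(path):
    """Return True if the file begins with the 4-byte GGUF magic."""
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"

# Only check if the model has actually been downloaded
if os.path.exists("scam_sms_detector.gguf"):
    print(is_gguf("scam_sms_detector.gguf"))
```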
    

πŸ”§ Usage Examples

Basic Classification

# English SMS
./main -m scam_sms_detector.gguf -p "Classify: Congratulations! You've won $10,000. Click here to claim your prize!" -n 30

# Traditional Chinese SMS ("Your bank account has been frozen; click the link immediately to verify your identity")
./main -m scam_sms_detector.gguf -p "εˆ†ι‘žζ­€ηŸ­δΏ‘οΌšζ‚¨ηš„ιŠ€θ‘Œθ³¬ζˆΆε·²θ’«ε‡η΅οΌŒθ«‹η«‹ε³ι»žζ“ŠιˆζŽ₯ι©—θ­‰θΊ«δ»½" -n 30

Batch Processing

import subprocess

def classify_sms(text):
    """Run the model via llama.cpp and return its raw output."""
    cmd = [
        "./main",
        "-m", "scam_sms_detector.gguf",
        "-p", f"Classify this SMS as SCAM or LEGITIMATE: {text}",
        "-n", "10",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout.strip()

# Example usage
messages = [
    "Your package is ready for delivery. Track: https://bit.ly/track123",
    "Meeting scheduled for 3 PM tomorrow in conference room A",
    "ζ­ε–œοΌζ‚¨ε·²θ’«ιΈδΈ­η²εΎ—ε…θ²»iPhoneοΌŒθ«‹ι»žζ“Šι ˜ε–",  # "Congratulations! You've been selected for a free iPhone; click to claim"
]

for msg in messages:
    classification = classify_sms(msg)
    print(f"Message: {msg}")
    print(f"Classification: {classification}\n")
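The raw stdout from llama.cpp echoes the prompt before the generated answer, and the prompt itself contains both label words, so a small post-processing step helps turn the output into a stable label. A sketch, assuming the model's answer contains the word SCAM or LEGITIMATE (the function name `normalize_label` is illustrative):

```python
def normalize_label(raw_output, prompt=""):
    """Reduce llama.cpp stdout to SCAM, LEGITIMATE, or UNKNOWN.

    llama.cpp echoes the prompt, which itself contains both labels,
    so the echoed prompt is stripped before matching.
    """
    answer = raw_output
    if prompt and prompt in raw_output:
        answer = raw_output.split(prompt, 1)[1]
    answer = answer.upper()
    if "SCAM" in answer and "LEGITIMATE" not in answer:
        return "SCAM"
    if "LEGITIMATE" in answer and "SCAM" not in answer:
        return "LEGITIMATE"
    return "UNKNOWN"  # ambiguous or empty output

print(normalize_label("SCAM"))  # SCAM
```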

API Integration

# Simple Flask API wrapper
from flask import Flask, request, jsonify
import subprocess

app = Flask(__name__)

@app.route('/classify', methods=['POST'])
def classify_sms():
    data = request.get_json(silent=True) or {}
    sms_text = data.get('text', '')
    if not sms_text:
        return jsonify({'error': 'missing "text" field'}), 400

    cmd = [
        "./main",
        "-m", "scam_sms_detector.gguf",
        "-p", f"Classify: {sms_text}",
        "-n", "20",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)

    return jsonify({
        'text': sms_text,
        'classification': result.stdout.strip(),
        'confidence': 'high'  # You may want to implement confidence scoring
    })

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
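The `confidence` field above is hard-coded. One possible stand-in, until proper model-derived scoring exists, is a purely keyword-based heuristic; the cue list and thresholds below are illustrative assumptions, not output from the model:

```python
# Illustrative keyword-based confidence heuristic; the cue list and
# thresholds are assumptions, not derived from the model itself.
SCAM_CUES = ["click", "prize", "won", "verify", "frozen", "urgent",
             "ι»žζ“Š", "δΈ­ηŽ", "ε…θ²»"]

def heuristic_confidence(text):
    """Return 'high', 'medium', or 'low' based on scam-cue count."""
    lowered = text.lower()
    hits = sum(1 for cue in SCAM_CUES if cue in lowered)
    if hits >= 2:
        return "high"
    if hits == 1:
        return "medium"
    return "low"
```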

πŸ“ˆ Performance & Capabilities

Language Support

  • Traditional Chinese: Optimized for Hong Kong usage patterns
  • English: Standard international English
  • Mixed Language: Can handle code-switching between Chinese and English

Expected Performance

  • Memory Usage: ~3GB RAM during inference

⚠️ Limitations

  1. Regional Specificity: Optimized for Hong Kong SMS patterns; may need retraining for other regions
  2. Language Support: Limited to Traditional Chinese and English
  3. Context Dependency: May require additional context for borderline cases
  4. Update Frequency: Scam patterns evolve; periodic retraining recommended
  5. Legal Compliance: Users are responsible for compliance with local privacy laws
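Since scam patterns drift, the periodic retraining mentioned above matters in practice. With MLX's LoRA fine-tuning tooling (mlx-lm), training data is commonly supplied as JSONL files (e.g. `train.jsonl` and `valid.jsonl`) with a `"text"` field per example; the prompt/label template below is an assumption, chosen only to match the classification prompts used earlier in this card:

```jsonl
{"text": "Classify this SMS as SCAM or LEGITIMATE: Congratulations! You've won $10,000. Click here to claim your prize! -> SCAM"}
{"text": "Classify this SMS as SCAM or LEGITIMATE: Meeting scheduled for 3 PM tomorrow in conference room A -> LEGITIMATE"}
```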

🀝 Contributing

Contributions to improve the model are welcome:

  1. Data Collection: Help expand the training dataset
  2. Bug Reports: Report issues or false classifications
  3. Feature Requests: Suggest improvements or new capabilities

Acknowledgments

  • Meta AI for the Llama 3.2 base model
  • Apple MLX team for the fine-tuning framework
  • Georgi Gerganov for llama.cpp