Scam SMS Detection Model (Llama 3.2 1B Fine-tuned)

A fine-tuned Llama 3.2 1B model specifically designed to detect and classify scam SMS messages in Hong Kong, with support for both Traditional Chinese and English text.

πŸš€ Model Overview

This model is based on Meta's Llama 3.2 1B and has been fine-tuned with the MLX framework on a curated dataset of SMS messages collected in Hong Kong. It distinguishes between legitimate and fraudulent SMS messages in both Traditional Chinese and English.

Key Features

  • Bilingual Support: Traditional Chinese and English
  • Lightweight: 1B parameters for efficient deployment
  • Cross-Platform: GGUF format optimized for llama.cpp deployment
  • Local Processing: No internet connection required for inference

πŸ“Š Model Details

Specification          Details
Base Model             Meta Llama 3.2 1B
Fine-tuning Framework  MLX
Model Format           GGUF
Languages              Traditional Chinese, English
Training Data          Self-collected Hong Kong SMS samples
Model Size             ~2.5 GB
Context Length         8,192 tokens

πŸ›  Requirements

Software Dependencies

  • llama.cpp (inference engine)
  • Python 3.8+ (for preprocessing scripts)

Hardware Requirements

  • Minimum RAM: 8GB
  • Recommended RAM: 16GB+
  • Storage: 3GB free space

πŸ“± Installation & Deployment

Desktop/Server Deployment

  1. Install llama.cpp

    git clone https://github.com/ggerganov/llama.cpp.git
    cd llama.cpp
    make
    # Note: recent llama.cpp releases build with CMake instead:
    #   cmake -B build && cmake --build build --config Release

  2. Download the model

    # Download the model file (replace with the actual download link)
    wget [MODEL_DOWNLOAD_URL] -O scam_sms_detector.gguf

  3. Run inference

    # Prompt: "Congratulations, you've won! Please click the link to claim the prize"
    # (on recent llama.cpp builds the binary is named llama-cli rather than main)
    ./main -m scam_sms_detector.gguf -p "Classify this SMS: ζ­ε–œζ‚¨δΈ­ηŽδΊ†οΌθ«‹ι»žζ“ŠιˆζŽ₯ι ˜ε–ηŽι‡‘" -n 50
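Before running inference, it can be worth confirming the download completed correctly: every GGUF file begins with the 4-byte ASCII magic `GGUF`. A minimal sketch of that check (the filename matches the wget step above; the helper name `is_gguf` is illustrative):

```python
import os

def is_gguf(path):
    """Return True if the file begins with the 4-byte GGUF magic."""
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"

# Only check if the model has actually been downloaded
if os.path.exists("scam_sms_detector.gguf"):
    print(is_gguf("scam_sms_detector.gguf"))
```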
    

πŸ”§ Usage Examples

Basic Classification

# English SMS
./main -m scam_sms_detector.gguf -p "Classify: Congratulations! You've won $10,000. Click here to claim your prize!" -n 30

# Traditional Chinese SMS ("Your bank account has been frozen; click the link immediately to verify your identity")
./main -m scam_sms_detector.gguf -p "εˆ†ι‘žζ­€ηŸ­δΏ‘οΌšζ‚¨ηš„ιŠ€θ‘Œθ³¬ζˆΆε·²θ’«ε‡η΅οΌŒθ«‹η«‹ε³ι»žζ“ŠιˆζŽ₯ι©—θ­‰θΊ«δ»½" -n 30

Batch Processing

import subprocess

def classify_sms(text):
    """Run the model via llama.cpp and return its raw output."""
    cmd = [
        "./main",
        "-m", "scam_sms_detector.gguf",
        "-p", f"Classify this SMS as SCAM or LEGITIMATE: {text}",
        "-n", "10",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout.strip()

# Example usage
messages = [
    "Your package is ready for delivery. Track: https://bit.ly/track123",
    "Meeting scheduled for 3 PM tomorrow in conference room A",
    "ζ­ε–œοΌζ‚¨ε·²θ’«ιΈδΈ­η²εΎ—ε…θ²»iPhoneοΌŒθ«‹ι»žζ“Šι ˜ε–",  # "Congratulations! You've been selected for a free iPhone; click to claim"
]

for msg in messages:
    classification = classify_sms(msg)
    print(f"Message: {msg}")
    print(f"Classification: {classification}\n")
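The raw stdout from llama.cpp echoes the prompt before the generated answer, and the prompt itself contains both label words, so a small post-processing step helps turn the output into a stable label. A sketch, assuming the model's answer contains the word SCAM or LEGITIMATE (the function name `normalize_label` is illustrative):

```python
def normalize_label(raw_output, prompt=""):
    """Reduce llama.cpp stdout to SCAM, LEGITIMATE, or UNKNOWN.

    llama.cpp echoes the prompt, which itself contains both labels,
    so the echoed prompt is stripped before matching.
    """
    answer = raw_output
    if prompt and prompt in raw_output:
        answer = raw_output.split(prompt, 1)[1]
    answer = answer.upper()
    if "SCAM" in answer and "LEGITIMATE" not in answer:
        return "SCAM"
    if "LEGITIMATE" in answer and "SCAM" not in answer:
        return "LEGITIMATE"
    return "UNKNOWN"  # ambiguous or empty output

print(normalize_label("SCAM"))  # SCAM
```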

API Integration

# Simple Flask API wrapper
from flask import Flask, request, jsonify
import subprocess

app = Flask(__name__)

@app.route('/classify', methods=['POST'])
def classify_sms():
    data = request.get_json(silent=True) or {}
    sms_text = data.get('text', '')
    if not sms_text:
        return jsonify({'error': 'missing "text" field'}), 400

    cmd = [
        "./main",
        "-m", "scam_sms_detector.gguf",
        "-p", f"Classify: {sms_text}",
        "-n", "20",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)

    return jsonify({
        'text': sms_text,
        'classification': result.stdout.strip(),
        'confidence': 'high'  # You may want to implement confidence scoring
    })

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
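The `confidence` field above is hard-coded. One possible stand-in, until proper model-derived scoring exists, is a purely keyword-based heuristic; the cue list and thresholds below are illustrative assumptions, not output from the model:

```python
# Illustrative keyword-based confidence heuristic; the cue list and
# thresholds are assumptions, not derived from the model itself.
SCAM_CUES = ["click", "prize", "won", "verify", "frozen", "urgent",
             "ι»žζ“Š", "δΈ­ηŽ", "ε…θ²»"]

def heuristic_confidence(text):
    """Return 'high', 'medium', or 'low' based on scam-cue count."""
    lowered = text.lower()
    hits = sum(1 for cue in SCAM_CUES if cue in lowered)
    if hits >= 2:
        return "high"
    if hits == 1:
        return "medium"
    return "low"
```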

πŸ“ˆ Performance & Capabilities

Language Support

  • Traditional Chinese: Optimized for Hong Kong usage patterns
  • English: Standard international English
  • Mixed Language: Can handle code-switching between Chinese and English

Expected Performance

  • Memory Usage: ~3GB RAM during inference

⚠️ Limitations

  1. Regional Specificity: Optimized for Hong Kong SMS patterns; may need retraining for other regions
  2. Language Support: Limited to Traditional Chinese and English
  3. Context Dependency: May require additional context for borderline cases
  4. Update Frequency: Scam patterns evolve; periodic retraining recommended
  5. Legal Compliance: Users are responsible for compliance with local privacy laws
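Since scam patterns drift, the periodic retraining mentioned above matters in practice. With MLX's LoRA fine-tuning tooling (mlx-lm), training data is commonly supplied as JSONL files (e.g. `train.jsonl` and `valid.jsonl`) with a `"text"` field per example; the prompt/label template below is an assumption, chosen only to match the classification prompts used earlier in this card:

```jsonl
{"text": "Classify this SMS as SCAM or LEGITIMATE: Congratulations! You've won $10,000. Click here to claim your prize! -> SCAM"}
{"text": "Classify this SMS as SCAM or LEGITIMATE: Meeting scheduled for 3 PM tomorrow in conference room A -> LEGITIMATE"}
```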

🀝 Contributing

Contributions to improve the model are welcome:

  1. Data Collection: Help expand the training dataset
  2. Bug Reports: Report issues or false classifications
  3. Feature Requests: Suggest improvements or new capabilities

Acknowledgments

  • Meta AI for the Llama 3.2 base model
  • Apple MLX team for the fine-tuning framework
  • Georgi Gerganov for llama.cpp