# 🧠❤️ Gemma-3-1B – Empathetic Mental Health Assistant
A fine-tuned, lightweight conversational model designed for supportive and empathetic responses in the mental health domain.
## 📌 Overview

This model is a fine-tuned version of Unsloth's Gemma-3-1B-IT, specifically adapted to provide helpful, empathetic, and non-judgmental responses for mental health-related queries.
Optimized for low-resource environments, it leverages:
- Unsloth 4-bit LoRA fine-tuning for efficiency
- GGUF format for a small memory footprint
- Fast inference on desktop, mobile, and edge devices
The adapter model is available at [MeWan2808/scaai_gemma_1b](https://huggingface.co/MeWan2808/scaai_gemma_1b).
## ✨ Key Features

- ✅ **Empathetic Responses** – Tailored for supportive mental health conversations
- ✅ **Lightweight** – 1B parameters, quantized to 4-bit for minimal resource use
- ✅ **Cross-Platform** – Compatible with `llama.cpp`, `ollama`, React Native, and the Hugging Face Inference API
- ✅ **Low Memory Footprint** – Runs on devices with less than 8 GB of RAM (see the back-of-envelope estimate below)
- ✅ **Domain-Specific** – Fine-tuned on the first 1,000 rows of the `Riyazmk/mentalhealth` dataset
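
As a rough sanity check on that memory claim, here is a back-of-envelope sketch (illustrative numbers only; real usage also depends on context length, KV cache, and runtime overhead):

```python
# Back-of-envelope weight memory for a 4-bit quantized 1B-parameter model
params = 1_000_000_000
bits_per_weight = 4.5  # q4_0 stores 4-bit weights plus a per-block fp16 scale (~0.5 extra bits)
weights_gb = params * bits_per_weight / 8 / 1024**3
print(f"~{weights_gb:.2f} GB of weights")  # ≈ 0.52 GB, well under an 8 GB budget
```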
## ⚠️ Ethical Notice & Limitations

🚨 **This AI is NOT a substitute for professional mental health care.**

It is not designed for:
- Crisis intervention
- Diagnosing mental health conditions
- Prescribing treatments or medications

📢 If you're in a crisis, please reach out to professionals:
- 📞 988 Suicide and Crisis Lifeline (US)
- 📞 Samaritans (UK) – 116 123
- 📞 AASRA (India) – +91-9820466726
## 🛠️ Technical Details

- **Base Model:** `unsloth/gemma-3-1b-it`
- **Fine-Tuning Method:** LoRA with Unsloth (4-bit QLoRA)
- **Dataset:** `Riyazmk/mentalhealth` (first 1000 rows)
- **Language:** English (`en`)
- **Quantization:** 4-bit (`q4_0`) in GGUF format
- **Adapter Model:** `MeWan2808/scaai_gemma_1b`
### Example Training Command

```python
from unsloth import FastLanguageModel

# from_pretrained returns a (model, tokenizer) tuple
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-3-1b-it",
    load_in_4bit=True,
)
# Attach LoRA adapters (values are illustrative, not the exact training config)
model = FastLanguageModel.get_peft_model(
    model, r=16, lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```
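
From there, a standard TRL supervised fine-tuning loop completes the picture. A minimal sketch, assuming the dataset exposes a plain `text` column (hyperparameters are illustrative, not the recorded training configuration):

```python
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

# First 1000 rows, matching the dataset slice described above
dataset = load_dataset("Riyazmk/mentalhealth", split="train[:1000]")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",  # assumption: adjust to the dataset's actual text column
    max_seq_length=1024,
    args=TrainingArguments(per_device_train_batch_size=2, num_train_epochs=1, output_dir="outputs"),
)
trainer.train()

# Export to GGUF for llama.cpp / Ollama (method name assumed to match the q4_0 artifact)
model.save_pretrained_gguf("gemma_1b", tokenizer, quantization_method="q4_0")
```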
The inference pipeline, end to end:

```mermaid
graph LR
    A[User Input] --> B[Tokenizer<br>Converts to Tokens]
    B --> C[Gemma-3-1B Model<br>LoRA Fine-Tuned, 4-bit]
    C --> D[Response Generation<br>"Try deep breathing..."]
    D --> E[User Output<br>Empathetic Response]
```

---
## 🚀 How to Use
### **1️⃣ Python with Transformers and PEFT**
Install dependencies:
```bash
pip install transformers peft torch
```
Run inference:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base_model = "unsloth/gemma-3-1b-it"
adapter_model = "MeWan2808/scaai_gemma_1b"

# Load base tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Load base model
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Apply the LoRA adapter
model = PeftModel.from_pretrained(model, adapter_model)

# Question
question = "I've been feeling anxious lately. What are some healthy coping strategies?"

# Encode
inputs = tokenizer(question, return_tensors="pt").to(model.device)

# Generate safely
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=200,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
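
Gemma-IT checkpoints are chat-tuned, so wrapping the question in the tokenizer's chat template usually produces better-formed answers. A hedged variant of the generation call above:

```python
# Optional: format the question with the model's chat template before generating
messages = [{"role": "user", "content": question}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    outputs = model.generate(input_ids, max_new_tokens=200, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```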
### **2️⃣ Python with Unsloth**

Install dependencies:

```bash
pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
pip install --no-deps xformers "trl<0.9.0" peft accelerate bitsandbytes
```
Run inference:
```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    "MeWan2808/scaai_gemma_1b",  # adapter repo; Unsloth resolves the 4-bit base model automatically
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # switch to fast inference mode

prompt = "I'm feeling very anxious lately. Can you give me some coping tips?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### **3️⃣ llama.cpp (GGUF)**

```bash
llama-cli -m ./gemma_1b_q4_0.gguf -p "How can I manage stress before an exam?"
```
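
The same GGUF file can also be driven from Python via the llama-cpp-python bindings (an addition for convenience; the package is not part of the original instructions):

```python
# pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama(model_path="./gemma_1b_q4_0.gguf", n_ctx=2048)
out = llm(
    "How can I manage stress before an exam?",
    max_tokens=200,
    temperature=0.7,
)
print(out["choices"][0]["text"])
```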
### **4️⃣ Ollama**

```bash
ollama create gemma-mental-health -f ./Modelfile
ollama run gemma-mental-health
```
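
The `Modelfile` referenced above is not included in the file listing. A minimal sketch, assuming the GGUF file sits in the same directory (the system prompt is illustrative, not shipped with the model):

```
FROM ./gemma_1b_q4_0.gguf
PARAMETER temperature 0.7
SYSTEM "You are a supportive, empathetic mental health assistant. You are not a substitute for professional care."
```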
### **5️⃣ React Native / Mobile**

- Bundle `gemma_1b_q4_0.gguf` in app assets
- Use `llama.cpp` mobile bindings or `llama-rn` for local inference
- Process user input offline for privacy and performance
## 💬 Example Interaction

**Prompt:**

> I've been feeling down lately and don't know why.

**Response:**

> I'm really sorry you're going through this – it's tough when those feelings creep in without a clear cause. Try jotting down your thoughts in a journal to spot any patterns, or treat yourself to small moments of care, like a relaxing walk or deep breathing. You're not alone, and your feelings matter.
## 📂 Available Files

- `gemma_1b_q4_0.gguf` – Quantized model for local inference (download snippet below)
- `config.json`, `tokenizer.json` – Model and tokenizer configurations
- `README.md` – This documentation
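
A hedged snippet for fetching the GGUF artifact programmatically (assuming it is hosted in the same repository as the adapter):

```python
from huggingface_hub import hf_hub_download

# Assumption: the GGUF file lives in the MeWan2808/scaai_gemma_1b repository
gguf_path = hf_hub_download(
    repo_id="MeWan2808/scaai_gemma_1b",
    filename="gemma_1b_q4_0.gguf",
)
print(gguf_path)
```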
## 📝 Changelog

- v1.0 – Initial release with LoRA fine-tuning and 4-bit quantization
## 🙏 Acknowledgements

- Google for the Gemma 3 model family
- Unsloth for the `unsloth/gemma-3-1b-it` base model and efficient fine-tuning tools
- Contributors to the `Riyazmk/mentalhealth` dataset
## 📜 License

Licensed under Apache 2.0 – free to use, modify, and distribute with proper attribution.