tags:
- symptomchecker
- nlp
- healthcare
---
# AI-Powered Symptom Checker 🏥🤖

This model predicts potential medical conditions based on user-reported symptoms. Built on **BERT** and fine-tuned on the **MedText dataset**, it helps users get preliminary symptom insights.
## Model Details

- **Model Type:** Text Classification
- **Base Model:** BERT (`bert-base-uncased`)
- **Dataset:** MedText (1.4k medical cases)
- **Metrics:** Accuracy: `96.5%`, F1-score: `95.1%`
- **Intended Use:** Assist users in identifying possible conditions based on their symptoms
- **Limitations:** Not a replacement for professional medical diagnosis
## Usage Example

```python
from transformers import pipeline

# Load the fine-tuned symptom-checker model from the Hugging Face Hub
model = pipeline("text-classification", model="Lech-Iyoko/bert-symptom-checker")

# Classify a free-text symptom description
result = model("I have a severe headache and nausea.")
print(result)  # [{'label': <predicted condition>, 'score': <confidence>}]
```
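To see scores for every candidate condition rather than only the top label, recent versions of `transformers` accept a `top_k` argument on the pipeline call:

```python
# top_k=None returns a score for each condition label instead of just the best one
all_scores = model("I have a severe headache and nausea.", top_k=None)
print(all_scores)
```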
## Limitations & Ethical Considerations

- This model should not be used for medical diagnosis. Always consult a healthcare professional.
## Training Hyperparameters

- Preprocessing: Lowercasing, tokenisation, stopword removal
- Training Framework: Hugging Face `transformers` (see the sketch after this list)
- Training Regime: fp32 (full-precision training for stability)
- Batch Size: 16
- Learning Rate: 3e-5
- Epochs: 5
- Optimiser: AdamW
- Scheduler: Linear with warmup
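The card does not include the training script, but a minimal sketch of how these settings map onto the `Trainer` API might look as follows. Here `train_ds`, `eval_ds`, `num_labels=10`, and `warmup_ratio=0.1` are placeholders, since the exact label set, data loading, and warmup size are not published:

```python
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=10,  # placeholder: the real label count comes from MedText
)

args = TrainingArguments(
    output_dir="bert-symptom-checker",
    per_device_train_batch_size=16,  # Batch Size: 16
    learning_rate=3e-5,              # Learning Rate: 3e-5
    num_train_epochs=5,              # Epochs: 5
    optim="adamw_torch",             # AdamW optimiser
    lr_scheduler_type="linear",      # linear decay...
    warmup_ratio=0.1,                # ...with warmup (ratio assumed, not stated on the card)
    # no fp16/bf16 flags: fp32 full-precision training is the default
)

# train_ds / eval_ds are assumed to be pre-tokenised MedText splits (not shown here)
trainer = Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=eval_ds)
trainer.train()
```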
## ⏱ Speeds, Sizes, Times

- Model Checkpoint Size: 4.5GB
- Training Duration: ~3-4 hours on Google Colab
- Throughput: ~1,200 samples per minute
## 🧪 Evaluation

### Testing Data

- Dataset: MedText (1.4k samples)
- Dataset Type: Medical symptom descriptions → condition prediction
### Splits

- Train: 80% (1,120 cases)
- Test: 20% (280 cases)
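An 80/20 division like this can be reproduced with the `datasets` library's `train_test_split`; the dataset identifier below is a placeholder, since the card does not name a Hub path for MedText:

```python
from datasets import load_dataset

# "path/to/medtext" is a placeholder; substitute the actual MedText source
ds = load_dataset("path/to/medtext", split="train")
splits = ds.train_test_split(test_size=0.2, seed=42)  # 1,120 train / 280 test of 1.4k
train_ds, test_ds = splits["train"], splits["test"]
```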
### Metrics

- Accuracy: 96.5% (measures overall correctness)
- F1-Score: 95.1% (harmonic mean of precision & recall)
- Precision: 94.7% (correct condition predictions out of all predicted)
- Recall: 95.5% (correct condition predictions out of all actual)
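As a quick consistency check, the reported F1 follows from the precision and recall above:

$$
F_1 = \frac{2 \cdot P \cdot R}{P + R} = \frac{2 \times 0.947 \times 0.955}{0.947 + 0.955} \approx 0.951
$$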
## Results

| Metric    | Score |
|-----------|-------|
| Accuracy  | 96.5% |
| F1-Score  | 95.1% |
| Precision | 94.7% |
| Recall    | 95.5% |
## Summary

- Strengths: High recall ensures most conditions are correctly identified.
- Weaknesses: The model may struggle with rare conditions, given the limited size of the training data (1.4k cases).
## ⚙️ Model Architecture & Objective

- Architecture: BERT (`bert-base-uncased`) fine-tuned for medical text classification.
- Objective: Predict potential conditions/outcomes based on patient symptom descriptions.
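Concretely, this is `bert-base-uncased` with a sequence-classification head on top. A minimal sketch of direct (non-pipeline) inference, assuming the checkpoint ships the usual `id2label` mapping:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Lech-Iyoko/bert-symptom-checker")
model = AutoModelForSequenceClassification.from_pretrained("Lech-Iyoko/bert-symptom-checker")

inputs = tokenizer("I have a severe headache and nausea.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: [1, num_labels]

pred = logits.argmax(dim=-1).item()
print(model.config.id2label[pred])   # predicted condition label
```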
## 💻 Compute Infrastructure

### Hardware

- Training: Google Colab (NVIDIA T4 GPU, 16GB RAM)
- Inference: Hugging Face Inference API (optimised for CPU/GPU use)

### Software

- Python Version: 3.8
- Deep Learning Framework: PyTorch (`transformers` library)
- Tokeniser: BERT WordPiece tokenizer
- Preprocessing Libraries: nltk, spacy, textacy
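Since nltk is listed among the preprocessing libraries, the lowercasing, tokenisation, and stopword-removal step described under Training Hyperparameters might look roughly like this; it is a sketch of the described pipeline, not the published preprocessing code:

```python
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download("punkt")
nltk.download("stopwords")

def preprocess(text: str) -> str:
    """Lowercase, tokenise, and drop English stopwords (assumed pipeline)."""
    tokens = word_tokenize(text.lower())
    stops = set(stopwords.words("english"))
    return " ".join(t for t in tokens if t.isalpha() and t not in stops)

print(preprocess("I have a severe headache and nausea."))  # -> "severe headache nausea"
```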