tags:
- symptomchecker
- nlp
- healthcare
---
# AI-Powered Symptom Checker 🏥🤖

This model predicts potential medical conditions based on user-reported symptoms. Built on **BERT** and fine-tuned on the **MedText dataset**, it helps users get preliminary symptom insights.
## Model Details

- **Model Type:** Text Classification
- **Base Model:** BERT (`bert-base-uncased`)
- **Dataset:** MedText (1.4k medical cases)
- **Metrics:** Accuracy: `96.5%`, F1-score: `95.1%`
- **Intended Use:** Assist users in identifying possible conditions based on their symptoms
- **Limitations:** Not a replacement for professional medical diagnosis
## Usage Example

```python
from transformers import pipeline

# Load the fine-tuned symptom-checker model from the Hugging Face Hub
model = pipeline("text-classification", model="Lech-Iyoko/bert-symptom-checker")

# Classify a free-text symptom description
result = model("I have a severe headache and nausea.")
print(result)  # [{'label': <predicted condition>, 'score': <confidence>}]
```
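To see scores for every candidate condition rather than only the top label, recent versions of `transformers` accept a `top_k` argument on the pipeline call:

```python
# top_k=None returns a score for each condition label instead of just the best one
all_scores = model("I have a severe headache and nausea.", top_k=None)
print(all_scores)
```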
## Limitations & Ethical Considerations

- This model should not be used for medical diagnosis. Always consult a healthcare professional.
## Training Hyperparameters

- Preprocessing: Lowercasing, tokenisation, stopword removal
- Training Framework: Hugging Face `transformers` (see the sketch after this list)
- Training Regime: fp32 (full-precision training for stability)
- Batch Size: 16
- Learning Rate: 3e-5
- Epochs: 5
- Optimiser: AdamW
- Scheduler: Linear with warmup
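The card does not include the training script, but a minimal sketch of how these settings map onto the `Trainer` API might look as follows. Here `train_ds`, `eval_ds`, `num_labels=10`, and `warmup_ratio=0.1` are placeholders, since the exact label set, data loading, and warmup size are not published:

```python
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=10,  # placeholder: the real label count comes from MedText
)

args = TrainingArguments(
    output_dir="bert-symptom-checker",
    per_device_train_batch_size=16,  # Batch Size: 16
    learning_rate=3e-5,              # Learning Rate: 3e-5
    num_train_epochs=5,              # Epochs: 5
    optim="adamw_torch",             # AdamW optimiser
    lr_scheduler_type="linear",      # linear decay...
    warmup_ratio=0.1,                # ...with warmup (ratio assumed, not stated on the card)
    # no fp16/bf16 flags: fp32 full-precision training is the default
)

# train_ds / eval_ds are assumed to be pre-tokenised MedText splits (not shown here)
trainer = Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=eval_ds)
trainer.train()
```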
## ⏱ Speeds, Sizes, Times

- Model Checkpoint Size: 4.5GB
- Training Duration: ~3-4 hours on Google Colab
- Throughput: ~1,200 samples per minute
## 🧪 Evaluation

### Testing Data

- Dataset: MedText (1.4k samples)
- Dataset Type: Medical symptom descriptions → condition prediction
### Splits

- Train: 80% (1,120 cases)
- Test: 20% (280 cases)
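An 80/20 division like this can be reproduced with the `datasets` library's `train_test_split`; the dataset identifier below is a placeholder, since the card does not name a Hub path for MedText:

```python
from datasets import load_dataset

# "path/to/medtext" is a placeholder; substitute the actual MedText source
ds = load_dataset("path/to/medtext", split="train")
splits = ds.train_test_split(test_size=0.2, seed=42)  # 1,120 train / 280 test of 1.4k
train_ds, test_ds = splits["train"], splits["test"]
```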
### Metrics

- Accuracy: 96.5% (measures overall correctness)
- F1-Score: 95.1% (harmonic mean of precision & recall)
- Precision: 94.7% (correct condition predictions out of all predicted)
- Recall: 95.5% (correct condition predictions out of all actual)
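As a quick consistency check, the reported F1 follows from the precision and recall above:

$$
F_1 = \frac{2 \cdot P \cdot R}{P + R} = \frac{2 \times 0.947 \times 0.955}{0.947 + 0.955} \approx 0.951
$$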
## Results

| Metric    | Score |
|-----------|-------|
| Accuracy  | 96.5% |
| F1-Score  | 95.1% |
| Precision | 94.7% |
| Recall    | 95.5% |
## Summary

- Strengths: High recall ensures most conditions are correctly identified.
- Weaknesses: The model may struggle with rare conditions, given the limited size of the training data (1.4k cases).
## ⚙️ Model Architecture & Objective

- Architecture: BERT (`bert-base-uncased`) fine-tuned for medical text classification.
- Objective: Predict potential conditions/outcomes based on patient symptom descriptions.
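Concretely, this is `bert-base-uncased` with a sequence-classification head on top. A minimal sketch of direct (non-pipeline) inference, assuming the checkpoint ships the usual `id2label` mapping:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Lech-Iyoko/bert-symptom-checker")
model = AutoModelForSequenceClassification.from_pretrained("Lech-Iyoko/bert-symptom-checker")

inputs = tokenizer("I have a severe headache and nausea.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: [1, num_labels]

pred = logits.argmax(dim=-1).item()
print(model.config.id2label[pred])   # predicted condition label
```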
## 💻 Compute Infrastructure

### Hardware

- Training: Google Colab (NVIDIA T4 GPU, 16GB RAM)
- Inference: Hugging Face Inference API (optimised for CPU/GPU use)

### Software

- Python Version: 3.8
- Deep Learning Framework: PyTorch (`transformers` library)
- Tokeniser: BERT WordPiece tokenizer
- Preprocessing Libraries: nltk, spacy, textacy
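Since nltk is listed among the preprocessing libraries, the lowercasing, tokenisation, and stopword-removal step described under Training Hyperparameters might look roughly like this; it is a sketch of the described pipeline, not the published preprocessing code:

```python
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download("punkt")
nltk.download("stopwords")

def preprocess(text: str) -> str:
    """Lowercase, tokenise, and drop English stopwords (assumed pipeline)."""
    tokens = word_tokenize(text.lower())
    stops = set(stopwords.words("english"))
    return " ".join(t for t in tokens if t.isalpha() and t not in stops)

print(preprocess("I have a severe headache and nausea."))  # -> "severe headache nausea"
```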