Lech-Iyoko commited on
Commit
44b1cae
Β·
verified Β·
1 Parent(s): 1bd6281

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +81 -1
README.md CHANGED
@@ -16,4 +16,84 @@ tags:
16
  - symptomchecker
17
  - nlp
18
  - healthcare
19
- ---
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
16
  - symptomchecker
17
  - nlp
18
  - healthcare
19
+ ---
20
+ # AI-Powered Symptom Checker πŸ₯πŸ€–
21
+ This model predicts potential medical conditions based on user-reported symptoms. Built using **BERT** and fine-tuned on the **MedText dataset**, it helps users get preliminary symptom insights.
22
+
23
+ ## πŸ” Model Details
24
+ - **Model Type:** Text Classification
25
+ - **Base Model:** BERT (`bert-base-uncased`)
26
+ - **Dataset:** MedText (1.4k medical cases)
27
+ - **Metrics:** Accuracy: `96.5%`, F1-score: `95.1%`
28
+ - **Intended Use:** Assist users in identifying possible conditions based on symptoms
29
+ - **Limitations:** Not a replacement for professional medical diagnosis
30
+
31
+ ## πŸ“– Usage Example
32
+ ```python
33
+ from transformers import pipeline
34
+
35
+ model = pipeline("text-classification", model="Lech-Iyoko/bert-symptom-checker")
36
+ result = model("I have a severe headache and nausea.")
37
+ print(result)
38
+
39
+ ## πŸ“Œ Limitations & Ethical Considerations
40
+ - This model should not be used for medical diagnosis. Always consult a healthcare professional.
41
+
42
+ ## πŸ“ Training Hyperparameters
43
+ - Preprocessing: Lowercasing, tokenisation, stopword removal
44
+ - Training Framework: Hugging Face transformers
45
+ - Training Regime: fp32 (full precision training for stability)
46
+ - Batch Size: 16
47
+ - Learning Rate: 3e-5
48
+ - Epochs: 5
49
+ - Optimiser: AdamW
50
+ - Scheduler: Linear with warmup
51
+
52
+ ## ⏱ Speeds, Sizes, Times
53
+ - Model Checkpoint Size: 4.5GB
54
+ - Training Duration: ~3-4 hours on Google Colab
55
+ - Throughput: 1200 samples per minute
56
+
57
+ ## πŸ§ͺ Evaluation
58
+ - Testing Data, Factors & Metrics
59
+ - Testing Data
60
+ - Dataset: MedText (1.4k samples)
61
+ - Dataset Type: Medical symptom descriptions β†’ condition prediction
62
+
63
+ ## Splits:
64
+ - Train: 80% (1,120 cases)
65
+ - Test: 20% (280 cases)
66
+
67
+ ## Metrics
68
+ - Accuracy: 96.5% (measures overall correctness)
69
+ - F1-Score: 95.1% (harmonic mean of precision & recall)
70
+ - Precision: 94.7% (correct condition predictions out of all predicted)
71
+ - Recall: 95.5% (correct condition predictions out of all actual)
72
+
73
+ ## πŸ“Š Results
74
+ - Metric Score
75
+ - Accuracy 96.5%
76
+ - F1-Score 95.1%
77
+ - Precision 94.7%
78
+ - Recall 95.5%
79
+
80
+ ## Summary
81
+ - Strengths: High recall ensures most conditions are correctly identified.
82
+ - Weaknesses: Model might struggle with rare conditions due to dataset limitations.
83
+
84
+ ## βš™οΈ Model Architecture & Objective
85
+ - Architecture: BERT (bert-base-uncased) fine-tuned for medical text classification.
86
+ - Objective: Predict potential conditions/outcomes based on patient symptom descriptions.
87
+
88
+ πŸ’» Compute Infrastructure
89
+ Hardware
90
+ - Training: Google Colab (NVIDIA T4 GPU, 16GB RAM)
91
+ - Inference: Hugging Face Inference API (optimised for CPU/GPU use)
92
+
93
+ Software
94
+ - Python Version: 3.8
95
+ - Deep Learning Framework: PyTorch (transformers library)
96
+ - Tokeniser: BERT WordPiece Tokenizer
97
+ - Preprocessing Libraries: nltk, spacy, textacy
98
+
99
+