---
library_name: transformers
tags:
- medical-ner
- biobert
- healthcare
- disease-extraction
- named-entity-recognition
- huggingface
- ncbi-disease-dataset
- biomedical-ner
- healthcare-ai
license: apache-2.0
datasets:
- ncbi/ncbi_disease
language:
- en
metrics:
- f1
- precision
- recall
base_model:
- dmis-lab/biobert-base-cased-v1.1
pipeline_tag: token-classification
---
# BioBERT Disease NER Model
A disease named-entity recognition (NER) model fine-tuned from BioBERT (`dmis-lab/biobert-base-cased-v1.1`) on the **NCBI Disease dataset**.
It reaches **98.64% accuracy** and an **F1-score of 89.04%** on disease extraction.
It is designed to identify **disease mentions** in clinical and biomedical text.
## Model Performance
- **Precision:** 86.80%
- **Recall:** 91.39%
- **F1-Score:** 89.04%
- **Accuracy:** 98.64%

✅ Fine-tuned on **6,800+ annotated examples** for **5 epochs**, achieving consistently high validation scores.
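
As a quick sanity check, the reported F1-score can be recomputed from the precision and recall listed above, since F1 is their harmonic mean. The snippet below is only an arithmetic sketch; the variable names are illustrative and it does not re-run any evaluation.

```python
# Precision and recall as reported in this model card.
precision = 0.8680
recall = 0.9139

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"F1 = {f1:.2%}")  # F1 = 89.04%
```

The result agrees with the reported 89.04% to two decimal places.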
## Intended Use
- Extract disease mentions from clinical and biomedical documents.
- Support healthcare AI systems and medical research automation.
## Training Data
This model was trained on the [NCBI Disease dataset](https://huggingface.co/datasets/ncbi_disease), which consists of 793 PubMed abstracts with 6,892 disease mentions.
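
To make the annotation format concrete, here is a minimal sketch of a BIO-tagged record and how disease mentions can be counted from it. The field names `tokens` and `ner_tags` are an assumption about how the dataset is exposed, and the sentence is hand-written for illustration rather than copied from the corpus. The tag ids follow the mapping given in this card (0 = O, 1 = B-Disease, 2 = I-Disease).

```python
# Tag ids follow the BIO scheme described in this card.
ID2TAG = {0: "O", 1: "B-Disease", 2: "I-Disease"}

# Hypothetical record, written by hand for illustration (not taken from the dataset).
example = {
    "tokens": ["Mutations", "in", "BRCA1", "cause", "breast", "cancer",
               "and", "ovarian", "cancer", "."],
    "ner_tags": [0, 0, 0, 0, 1, 2, 0, 1, 2, 0],
}


def tagged_tokens(record):
    """Pair every token with its readable BIO tag."""
    return [(tok, ID2TAG[tag]) for tok, tag in zip(record["tokens"], record["ner_tags"])]


def count_mentions(ner_tags):
    """Count disease mentions: each B-Disease tag opens exactly one mention."""
    return sum(1 for tag in ner_tags if ID2TAG[tag] == "B-Disease")


print(tagged_tokens(example)[4:6])          # [('breast', 'B-Disease'), ('cancer', 'I-Disease')]
print(count_mentions(example["ner_tags"]))  # 2
```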
## How to Use
*Note:* LABEL_0 corresponds to "O" (Outside), LABEL_1 to "B-Disease", and LABEL_2 to "I-Disease" following the BIO tagging format.
You can use this model with the Hugging Face Transformers library:
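```python
from transformers import pipeline

# Load the fine-tuned model and tokenizer from the Hugging Face Hub.
# aggregation_strategy="simple" groups adjacent tokens that share a label.
nlp = pipeline(
    "ner",
    model="Ishan0612/biobert-ner-disease-ncbi",
    tokenizer="Ishan0612/biobert-ner-disease-ncbi",
    aggregation_strategy="simple",
)

text = "The patient has signs of diabetes mellitus and chronic obstructive pulmonary disease."
results = nlp(text)

# Print each grouped span with its predicted label.
print("Extracted Medical Entities:")
for entity in results:
    print(f"{entity['word']} - ({entity['entity_group']})")
```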
This should output:

    Extracted Medical Entities:
    the patient has signs of - (LABEL_0)
    diabetes - (LABEL_1)
    mellitus - (LABEL_2)
    and - (LABEL_0)
    chronic - (LABEL_1)
    obstructive pulmonary disease - (LABEL_2)
    . - (LABEL_0)

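
Because the B and I parts of a mention come back as separate groups, a small post-processing step is usually needed to recover whole disease names. The following is a minimal sketch, not part of the model or the Transformers library: `merge_disease_mentions` and `LABEL2TAG` are hypothetical names, the sample list is typed in from the output shown above, and joining words with a single space is a simplifying assumption (the `start` and `end` offsets returned by the pipeline would give a more faithful reconstruction).

```python
# Map the generic pipeline labels to the BIO tags described in this card.
LABEL2TAG = {"LABEL_0": "O", "LABEL_1": "B-Disease", "LABEL_2": "I-Disease"}


def merge_disease_mentions(results):
    """Join B-Disease / I-Disease groups into full disease mentions."""
    mentions, current = [], []
    for item in results:
        tag = LABEL2TAG[item["entity_group"]]
        if tag == "B-Disease":
            # A new mention starts; close any mention that is still open.
            if current:
                mentions.append(" ".join(current))
            current = [item["word"]]
        elif tag == "I-Disease":
            # Continue the open mention (or leniently start one if none is open).
            current.append(item["word"])
        else:
            # An "O" group ends the open mention, if any.
            if current:
                mentions.append(" ".join(current))
            current = []
    if current:
        mentions.append(" ".join(current))
    return mentions


# Sample input typed in from the pipeline output shown above.
sample = [
    {"word": "the patient has signs of", "entity_group": "LABEL_0"},
    {"word": "diabetes", "entity_group": "LABEL_1"},
    {"word": "mellitus", "entity_group": "LABEL_2"},
    {"word": "and", "entity_group": "LABEL_0"},
    {"word": "chronic", "entity_group": "LABEL_1"},
    {"word": "obstructive pulmonary disease", "entity_group": "LABEL_2"},
    {"word": ".", "entity_group": "LABEL_0"},
]

print(merge_disease_mentions(sample))
# ['diabetes mellitus', 'chronic obstructive pulmonary disease']
```

In practice the same function can be called on the `results` list returned by the pipeline.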
## License
This model is licensed under the **Apache 2.0 License**, the same license as the base BioBERT model (`dmis-lab/biobert-base-cased-v1.1`).
## Citation

    @article{lee2020biobert,
      title={BioBERT: a pre-trained biomedical language representation model for biomedical text mining},
      author={Lee, Jinhyuk and Yoon, Wonjin and Kim, Sungdong and Kim, Donghyeon and So, Chan Ho and Kang, Jaewoo},
      journal={Bioinformatics},
      volume={36},
      number={4},
      pages={1234--1240},
      year={2020},
      publisher={Oxford University Press}
    }