BioBERT Disease NER Model
A disease named entity recognition (NER) model, fine-tuned from BioBERT on the NCBI Disease dataset.
It reaches 98.64% accuracy and an F1-score of 89.04% on disease extraction.
It is designed to identify disease mentions, tagged with a single Disease class, in clinical and biomedical text.
Model Performance
- Precision: 86.80%
- Recall: 91.39%
- F1-Score: 89.04%
- Accuracy: 98.64%
Fine-tuned over 6,800+ annotated examples for 5 epochs, achieving consistently high validation scores.
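As a quick consistency check, the reported F1-score can be recomputed from the precision and recall listed above, since F1 is the harmonic mean of the two. The sketch below uses only the figures from this section; the variable names are illustrative.

```python
# Reported metrics from this model card, expressed as fractions.
precision = 0.8680
recall = 0.9139

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"F1-score: {f1 * 100:.2f}%")  # prints "F1-score: 89.04%"
```

The accuracy figure is much higher than the F1-score. A plausible reason, stated here as an assumption because the card does not say how accuracy is measured, is that accuracy is computed per token and most tokens carry the "O" label.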
Intended Use
- Extract disease mentions from clinical and biomedical documents.
- Support healthcare AI systems and medical research automation.
Training Data
This model was trained on the NCBI Disease dataset, which consists of 793 PubMed abstracts with 6,892 disease mentions.
How to Use
The model emits generic label names: LABEL_0 corresponds to "O" (Outside), LABEL_1 to "B-Disease", and LABEL_2 to "I-Disease", following the BIO tagging format.
You can use this model with the Hugging Face Transformers library:
```python
from transformers import pipeline

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub.
nlp = pipeline(
    "ner",
    model="Ishan0612/biobert-ner-disease-ncbi",
    tokenizer="Ishan0612/biobert-ner-disease-ncbi",
    aggregation_strategy="simple",  # group adjacent tokens that share a label
)

text = "The patient has signs of diabetes mellitus and chronic obstructive pulmonary disease."
results = nlp(text)

# Print each grouped span with its label and confidence score.
for entity in results:
    print(f"{entity['word']} ({entity['entity_group']}) - Confidence: {entity['score']:.2f}")
```
This should produce output similar to:
```
the patient has signs of (LABEL_0) - Confidence: 1.00
diabetes (LABEL_1) - Confidence: 1.00
mellitus (LABEL_2) - Confidence: 1.00
and (LABEL_0) - Confidence: 1.00
chronic (LABEL_1) - Confidence: 1.00
obstructive pulmonary disease (LABEL_2) - Confidence: 1.00
. (LABEL_0) - Confidence: 1.00
```
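Because the labels follow the BIO scheme, a single disease mention can be split across a LABEL_1 (B-Disease) group and one or more LABEL_2 (I-Disease) groups, as with "diabetes" and "mellitus" in the example sentence. The sketch below shows one possible post-processing step that joins them back into full mentions. It is not part of the model or of Transformers: `LABEL_NAMES`, `merge_disease_spans`, and `fake_group` are hypothetical helpers, and the sample entities are hand-written to mirror the example sentence rather than taken from a real model run. It assumes each pipeline result carries `start` and `end` character offsets.

```python
# Hypothetical mapping from the model's generic labels to BIO tags.
LABEL_NAMES = {"LABEL_0": "O", "LABEL_1": "B-Disease", "LABEL_2": "I-Disease"}


def merge_disease_spans(entities, text):
    """Join each B-Disease group with the I-Disease groups that follow it."""
    spans = []
    current = None  # [start, end] offsets of the mention being built
    for ent in entities:
        tag = LABEL_NAMES.get(ent["entity_group"], "O")
        if tag == "B-Disease":
            # A new mention begins; close any mention still open.
            if current is not None:
                spans.append(current)
            current = [ent["start"], ent["end"]]
        elif tag == "I-Disease":
            if current is not None:
                current[1] = ent["end"]  # extend the open mention
            else:
                # Stray I-Disease without a B-Disease: treat it as a new mention.
                current = [ent["start"], ent["end"]]
        else:
            # An "O" group ends the open mention, if any.
            if current is not None:
                spans.append(current)
                current = None
    if current is not None:
        spans.append(current)
    return [text[start:end] for start, end in spans]


text = "The patient has signs of diabetes mellitus and chronic obstructive pulmonary disease."


def fake_group(label, phrase):
    """Build an illustrative pipeline-style result for a phrase in `text`."""
    start = text.index(phrase)
    return {"entity_group": label, "word": phrase, "start": start, "end": start + len(phrase)}


# Hand-written stand-in for `nlp(text)`; scores are omitted for brevity.
sample = [
    fake_group("LABEL_0", "The patient has signs of"),
    fake_group("LABEL_1", "diabetes"),
    fake_group("LABEL_2", "mellitus"),
    fake_group("LABEL_0", "and"),
    fake_group("LABEL_1", "chronic"),
    fake_group("LABEL_2", "obstructive pulmonary disease"),
    fake_group("LABEL_0", "."),
]

mentions = merge_disease_spans(sample, text)
print(mentions)  # ['diabetes mellitus', 'chronic obstructive pulmonary disease']
```

Slicing the original text by character offsets, rather than concatenating the `word` fields, keeps the original casing and spacing of each mention.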
License
This model is licensed under the Apache 2.0 License, the same license as the original BioBERT (dmis-lab/biobert-base-cased-v1.1).
Citation
```bibtex
@article{lee2020biobert,
  title={BioBERT: a pre-trained biomedical language representation model for biomedical text mining},
  author={Lee, Jinhyuk and Yoon, Wonjin and Kim, Sungdong and Kim, Donghyeon and So, Chan Ho and Kang, Jaewoo},
  journal={Bioinformatics},
  volume={36},
  number={4},
  pages={1234--1240},
  year={2020},
  publisher={Oxford University Press}
}
```