BioBERT Disease NER Model

A disease named entity recognition (NER) model, fine-tuned from BioBERT on the NCBI Disease dataset.
It achieves 98.64% accuracy and an F1-score of 89.04% on disease extraction.

Optimized for precise identification of diseases, symptoms, and medical conditions in clinical and biomedical texts.

Model Performance

Precision: 86.80%
Recall: 91.39%
F1-Score: 89.04%
Accuracy: 98.64%
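
As a quick consistency check, the F1-score is the harmonic mean of precision and recall. The minimal sketch below recomputes it from the two figures above; the function name `f1_score` is illustrative and not part of the model's code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as fractions)."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported above.
f1 = f1_score(0.8680, 0.9139)
print(f"F1: {f1:.2%}")
```

The result agrees with the reported F1-score to two decimal places.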

✅ Fine-tuned on 6,800+ annotated examples for 5 epochs, achieving consistently high validation scores.

Intended Use

Extract disease mentions from clinical and biomedical documents.
Support healthcare AI systems and medical research automation.

Training Data

This model was trained on the NCBI Disease dataset, which consists of 793 PubMed abstracts with 6,892 disease mentions.

How to Use

Note: LABEL_0 corresponds to "O" (Outside), LABEL_1 to "B-Disease", and LABEL_2 to "I-Disease", following the BIO tagging format.

You can use this model with the Hugging Face Transformers library:

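from transformers import pipeline

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub.
nlp = pipeline(
    "ner",
    model="Ishan0612/biobert-ner-disease-ncbi",
    tokenizer="Ishan0612/biobert-ner-disease-ncbi",
    # "simple" groups consecutive tokens that share a label into one entity.
    aggregation_strategy="simple"
)

text = "The patient has signs of diabetes mellitus and chronic obstructive pulmonary disease."

results = nlp(text)

# Print each grouped entity with its label and confidence score.
print("Extracted Medical Entities:")
for entity in results:
    print(f"{entity['word']} ({entity['entity_group']}) - Confidence: {entity['score']:.2f}")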

This should output:

Extracted Medical Entities:
the patient has signs of (LABEL_0) - Confidence: 1.00
diabetes (LABEL_1) - Confidence: 1.00
mellitus (LABEL_2) - Confidence: 1.00
and (LABEL_0) - Confidence: 1.00
chronic (LABEL_1) - Confidence: 1.00
obstructive pulmonary disease (LABEL_2) - Confidence: 1.00
. (LABEL_0) - Confidence: 1.00
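
Because the pipeline returns generic LABEL_n groups, the B-Disease and I-Disease parts of a single mention arrive as separate entries. The sketch below joins them back into whole disease mentions using the label mapping from the note above. The helper `merge_disease_spans` and the hard-coded `sample` list (copied from the sample output, with scores omitted) are illustrative assumptions, not part of the model or of Transformers; in practice you would pass in `results` from the pipeline.

```python
# Label mapping taken from the note above.
LABEL_MAP = {"LABEL_0": "O", "LABEL_1": "B-Disease", "LABEL_2": "I-Disease"}


def merge_disease_spans(entities):
    """Join B-Disease/I-Disease groups into whole disease mentions."""
    mentions, current = [], []
    for entity in entities:
        tag = LABEL_MAP.get(entity["entity_group"], "O")
        if tag == "B-Disease":
            # A new mention starts; close any mention still open.
            if current:
                mentions.append(" ".join(current))
            current = [entity["word"]]
        elif tag == "I-Disease" and current:
            # Continue the mention that is currently open.
            current.append(entity["word"])
        else:
            # "O" (or a stray I- tag with nothing open) ends the mention.
            if current:
                mentions.append(" ".join(current))
            current = []
    if current:
        mentions.append(" ".join(current))
    return mentions


# Hypothetical stand-in for `results`, mirroring the sample output above.
sample = [
    {"entity_group": "LABEL_0", "word": "the patient has signs of"},
    {"entity_group": "LABEL_1", "word": "diabetes"},
    {"entity_group": "LABEL_2", "word": "mellitus"},
    {"entity_group": "LABEL_0", "word": "and"},
    {"entity_group": "LABEL_1", "word": "chronic"},
    {"entity_group": "LABEL_2", "word": "obstructive pulmonary disease"},
    {"entity_group": "LABEL_0", "word": "."},
]

print(merge_disease_spans(sample))
```

Joining on spaces is a simplification: if character positions matter, the `start` and `end` offsets that the pipeline returns for each entity are a more reliable basis for rebuilding spans from the original text.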

License

This model is licensed under the Apache 2.0 License, the same license as the original BioBERT (dmis-lab/biobert-base-cased-v1.1).

Citation

@article{lee2020biobert,
  title={BioBERT: a pre-trained biomedical language representation model for biomedical text mining},
  author={Lee, Jinhyuk and Yoon, Wonjin and Kim, Sungdong and Kim, Donghyeon and So, Chan Ho and Kang, Jaewoo},
  journal={Bioinformatics},
  volume={36},
  number={4},
  pages={1234--1240},
  year={2020},
  publisher={Oxford University Press}
}
