---
library_name: transformers
tags:
- medical-ner
- biobert
- healthcare
- disease-extraction
- named-entity-recognition
- huggingface
- ncbi-disease-dataset
- biomedical-ner
- healthcare-ai
license: apache-2.0
datasets:
- ncbi/ncbi_disease
language:
- en
metrics:
- f1
- precision
- recall
base_model:
- dmis-lab/biobert-base-cased-v1.1
pipeline_tag: token-classification
---
|
# BioBERT Disease NER Model
A disease named-entity recognition (NER) model, fine-tuned from BioBERT on the **NCBI Disease dataset**.
It achieves **98.64% accuracy** and an **F1-score of 89.04%** on disease extraction.

It is tuned to identify mentions of **diseases**, **symptoms**, and **medical conditions** in clinical and biomedical text, all tagged under a single `Disease` entity type.
|
## Model Performance
- **Precision:** 86.80%
- **Recall:** 91.39%
- **F1-Score:** 89.04%
- **Accuracy:** 98.64%

✅ Fine-tuned over **6,800+ annotated examples** for **5 epochs**, achieving consistently high validation scores.
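
As a quick consistency check, the reported F1-score can be recomputed from the precision and recall above as their harmonic mean. This is a minimal sketch, not the evaluation script used for this model; the variable names are illustrative.

```python
# Reported metrics from the list above, in percent.
precision = 86.80
recall = 91.39

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"F1 = {f1:.2f}%")  # prints "F1 = 89.04%"
```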
|
## Intended Use
- Extract disease mentions from clinical and biomedical documents.
- Support healthcare AI systems and medical research automation.
|
## Training Data
This model was trained on the [NCBI Disease dataset](https://huggingface.co/datasets/ncbi_disease), which consists of 793 PubMed abstracts with 6,892 disease mentions.
|
## How to Use
*Note:* The model emits generic label names. `LABEL_0` corresponds to "O" (Outside), `LABEL_1` to "B-Disease", and `LABEL_2` to "I-Disease", following the BIO tagging format.

You can use this model with the Hugging Face Transformers library:
```python
from transformers import pipeline

# Token-classification pipeline; "simple" aggregation groups adjacent tokens that share a label.
nlp = pipeline(
    "ner",
    model="Ishan0612/biobert-ner-disease-ncbi",
    tokenizer="Ishan0612/biobert-ner-disease-ncbi",
    aggregation_strategy="simple",
)

text = "The patient has signs of diabetes mellitus and chronic obstructive pulmonary disease."

results = nlp(text)

print("Extracted Medical Entities:")
for entity in results:
    # entity_group is LABEL_0, LABEL_1, or LABEL_2 (see the note above).
    print(f"{entity['word']} - ({entity['entity_group']})")
```
This should output:

Extracted Medical Entities:

the patient has signs of - (LABEL_0)

diabetes - (LABEL_1)

mellitus - (LABEL_2)

and - (LABEL_0)

chronic - (LABEL_1)

obstructive pulmonary disease - (LABEL_2)

. - (LABEL_0)
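
Because the label names are generic, `aggregation_strategy="simple"` leaves a multi-word disease split into a `LABEL_1` group followed by a `LABEL_2` group, as in the output above. The sketch below shows one possible way to map the labels back to BIO tags and join each mention into a single span. `LABEL_TO_BIO`, `merge_disease_spans`, `make_group`, and the hand-built `results` list are illustrative assumptions, not part of the model or the Transformers API; the `start`/`end` character offsets mirror the keys the pipeline returns for aggregated entities.

```python
# Assumed mapping, copied from the note in "How to Use" (not read from the model config).
LABEL_TO_BIO = {"LABEL_0": "O", "LABEL_1": "B-Disease", "LABEL_2": "I-Disease"}


def merge_disease_spans(text, results):
    """Join each B-Disease group with the I-Disease groups that directly follow it."""
    spans = []        # [start, end] character offsets of each disease mention
    current = None    # the span currently being extended, if any
    for group in results:
        tag = LABEL_TO_BIO.get(group["entity_group"], "O")
        if tag == "B-Disease" or (tag == "I-Disease" and current is None):
            # Start a new mention (also tolerates an I- tag with no preceding B-).
            current = [group["start"], group["end"]]
            spans.append(current)
        elif tag == "I-Disease":
            # Extend the open mention to the end of this group.
            current[1] = group["end"]
        else:
            # An "O" group closes any open mention.
            current = None
    # Slice the original text so casing and spacing are preserved.
    return [text[start:end] for start, end in spans]


text = "The patient has signs of diabetes mellitus and chronic obstructive pulmonary disease."


def make_group(label, phrase):
    """Hypothetical helper: build a pipeline-style dict for a phrase found in `text`."""
    start = text.index(phrase)
    return {"entity_group": label, "word": phrase, "start": start, "end": start + len(phrase)}


# Hand-built stand-in for `nlp(text)`, mirroring the grouping shown above.
results = [
    make_group("LABEL_0", "The patient has signs of"),
    make_group("LABEL_1", "diabetes"),
    make_group("LABEL_2", "mellitus"),
    make_group("LABEL_0", "and"),
    make_group("LABEL_1", "chronic"),
    make_group("LABEL_2", "obstructive pulmonary disease"),
    make_group("LABEL_0", "."),
]

print(merge_disease_spans(text, results))
# ['diabetes mellitus', 'chronic obstructive pulmonary disease']
```

With the real model, pass `nlp(text)` in place of the hand-built `results` list.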
|
## License
This model is licensed under the **Apache 2.0 License**, the same license as the original BioBERT (`dmis-lab/biobert-base-cased-v1.1`).
|
## Citation
@article{lee2020biobert,
  title={BioBERT: a pre-trained biomedical language representation model for biomedical text mining},
  author={Lee, Jinhyuk and Yoon, Wonjin and Kim, Sungdong and Kim, Donghyeon and So, Chan Ho and Kang, Jaewoo},
  journal={Bioinformatics},
  volume={36},
  number={4},
  pages={1234--1240},
  year={2020},
  publisher={Oxford University Press}
}