README.md · Ishan0612/biobert-ner-disease-ncbi at main


	---
	library_name: transformers
	tags:
	- medical-ner
	- biobert
	- healthcare
	- disease-extraction
	- named-entity-recognition
	- huggingface
	- ncbi-disease-dataset
	- biomedical-ner
	- healthcare-ai
	license: apache-2.0
	datasets:
	- ncbi/ncbi_disease
	language:
	- en
	metrics:
	- f1
	- precision
	- recall
	base_model:
	- dmis-lab/biobert-base-cased-v1.1
	pipeline_tag: token-classification
	---

	# BioBERT Disease NER Model
	A disease named-entity recognition (NER) model, fine-tuned from BioBERT on the widely used NCBI Disease dataset.
	It achieves 98.64% accuracy and an F1-score of 89.04%, making it well suited to disease extraction tasks.

	Designed to identify disease mentions, including disorders and other medical conditions, in biomedical and clinical text.

	## Model Performance
	- Precision: 86.80%
	- Recall: 91.39%
	- F1-Score: 89.04%
	- Accuracy: 98.64%
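F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R), so the reported figures can be cross-checked against each other. A minimal sketch; `f1_from` is an illustrative helper, and because its inputs are the rounded percentages above, agreement is expected only to about two decimal places:

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded precision and recall from the Model Performance list.
reported_f1 = f1_from(0.8680, 0.9139)
print(f"{reported_f1:.2%}")  # 89.04%
```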

	✅ Fine-tuned for 5 epochs on the NCBI Disease corpus (6,800+ annotated disease mentions), with consistently high validation scores.
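BioBERT's WordPiece tokenizer splits many disease names into several subword tokens, so word-level BIO labels have to be aligned to tokens before fine-tuning. The sketch below follows the common Hugging Face token-classification recipe: label only the first subword of each word and mask the remaining subwords and special tokens with -100 so the loss ignores them. It is an assumption that this model was trained this way. The `word_ids` list is written by hand to mimic what a fast tokenizer's `word_ids()` returns, and the subword split of "hemophilia" is hypothetical.

```python
IGNORE_INDEX = -100  # PyTorch's cross-entropy loss ignores this label by default


def align_labels_with_tokens(word_labels, word_ids, label_all_subwords=False):
    """Map word-level BIO label ids onto subword tokens.

    word_labels: one label id per word (0 = O, 1 = B-Disease, 2 = I-Disease).
    word_ids: for each token, the index of the word it came from, or None
              for special tokens such as [CLS] and [SEP].
    """
    aligned = []
    previous_word = None
    for word_id in word_ids:
        if word_id is None:
            # Special tokens carry no label.
            aligned.append(IGNORE_INDEX)
        elif word_id != previous_word:
            # The first subword of a word gets the word's label.
            aligned.append(word_labels[word_id])
        elif label_all_subwords:
            # Continuation subword: a B- tag becomes the matching I- tag.
            label = word_labels[word_id]
            aligned.append(2 if label == 1 else label)
        else:
            # Continuation subword is masked out of the loss.
            aligned.append(IGNORE_INDEX)
        previous_word = word_id
    return aligned


# Words: ["Patients", "with", "hemophilia"] tagged O, O, B-Disease.
# Hypothetical tokenization: [CLS] patients with hem ##op ##hilia [SEP]
word_labels = [0, 0, 1]
word_ids = [None, 0, 1, 2, 2, 2, None]
print(align_labels_with_tokens(word_labels, word_ids))
# [-100, 0, 0, 1, -100, -100, -100]
print(align_labels_with_tokens(word_labels, word_ids, label_all_subwords=True))
# [-100, 0, 0, 1, 2, 2, -100]
```

Masking continuation subwords keeps each word counted once in the loss; the `label_all_subwords` variant trains on every subword instead.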

	## Intended Use
	- Extract disease mentions from clinical and biomedical documents.
	- Support healthcare AI systems and medical research automation.

	## Training Data
	This model was trained on the [NCBI disease dataset](https://huggingface.co/datasets/ncbi_disease), which consists of 793 PubMed abstracts with 6892 disease mentions.
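In the Hugging Face copy of the dataset, each example is assumed to hold word-level `tokens` and integer `ner_tags` (0 = O, 1 = B-Disease, 2 = I-Disease, the same order as the model's LABEL_0/1/2). The sketch below decodes such a tag sequence back into mention strings; `bio_to_mentions` and the example sentence are illustrative, not taken from the corpus. A stray I- tag with no preceding B- tag is treated as the start of a new mention, a common lenient convention.

```python
ID2LABEL = {0: "O", 1: "B-Disease", 2: "I-Disease"}


def bio_to_mentions(tokens, tag_ids):
    """Group BIO-tagged word tokens into disease mention strings."""
    mentions = []
    current = []
    for token, tag_id in zip(tokens, tag_ids):
        tag = ID2LABEL[tag_id]
        if tag == "B-Disease" or (tag == "I-Disease" and not current):
            # Start a new mention, closing any mention still open.
            if current:
                mentions.append(" ".join(current))
            current = [token]
        elif tag == "I-Disease":
            # Continue the open mention.
            current.append(token)
        else:
            # "O" closes any open mention.
            if current:
                mentions.append(" ".join(current))
            current = []
    if current:
        mentions.append(" ".join(current))
    return mentions


tokens = ["Mutations", "in", "BRCA1", "increase", "the", "risk", "of",
          "breast", "cancer", "and", "ovarian", "cancer", "."]
tags = [0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 1, 2, 0]
print(bio_to_mentions(tokens, tags))
# ['breast cancer', 'ovarian cancer']
```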

	## How to Use
	You can use this model with the Hugging Face Transformers library:

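	Note: the model's labels carry generic names. LABEL_0 corresponds to "O" (outside any entity), LABEL_1 to "B-Disease" (beginning of a disease mention), and LABEL_2 to "I-Disease" (inside a disease mention), following the BIO tagging scheme.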
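	```python
	from transformers import pipeline

	# Load the model and tokenizer into a token-classification ("ner") pipeline.
	# aggregation_strategy="simple" groups adjacent tokens with the same label into one entry.
	nlp = pipeline(
	    "ner",
	    model="Ishan0612/biobert-ner-disease-ncbi",
	    tokenizer="Ishan0612/biobert-ner-disease-ncbi",
	    aggregation_strategy="simple",
	)

	text = "The patient has signs of diabetes mellitus and chronic obstructive pulmonary disease."

	results = nlp(text)

	print("Extracted Medical Entities:")
	for entity in results:
	    print(f"{entity['word']} - ({entity['entity_group']})")
	```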
	This should output:

	```
	Extracted Medical Entities:
	the patient has signs of - (LABEL_0)
	diabetes - (LABEL_1)
	mellitus - (LABEL_2)
	and - (LABEL_0)
	chronic - (LABEL_1)
	obstructive pulmonary disease - (LABEL_2)
	. - (LABEL_0)
	```
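Because the labels are named LABEL_0/1/2 rather than O/B-Disease/I-Disease, the pipeline's `simple` aggregation cannot see the BIO prefixes: it keeps a B- word and the I- words after it as separate groups (as with "chronic" and "obstructive pulmonary disease" above) and does not drop the LABEL_0 groups. A small post-processing sketch that merges each LABEL_1 group with the LABEL_2 groups directly after it, using the `start`/`end` character offsets the pipeline returns with each group. `merge_disease_groups` is a hypothetical helper, and `example_groups` is a hand-written stand-in for real pipeline output, with offsets computed for the sample sentence.

```python
def merge_disease_groups(groups, text):
    """Return disease mention strings from aggregated pipeline output.

    groups: dicts with "entity_group", "start" and "end" keys, as produced by
            a token-classification pipeline with aggregation_strategy="simple".
    text: the original input string, used to slice out each merged span.
    """
    spans = []  # [start, end] character offsets of merged mentions
    in_mention = False
    for group in groups:
        label = group["entity_group"]
        if label == "LABEL_1" or (label == "LABEL_2" and not in_mention):
            # B-Disease (or a stray I-Disease) opens a new mention.
            spans.append([group["start"], group["end"]])
            in_mention = True
        elif label == "LABEL_2":
            # I-Disease extends the open mention to this group's end.
            spans[-1][1] = group["end"]
        else:
            # LABEL_0 ("O") closes any open mention.
            in_mention = False
    return [text[start:end] for start, end in spans]


text = "The patient has signs of diabetes mellitus and chronic obstructive pulmonary disease."
# Hand-written stand-in for nlp(text); only the keys used above are included.
example_groups = [
    {"entity_group": label, "start": start, "end": end}
    for start, end, label in [
        (0, 24, "LABEL_0"), (25, 33, "LABEL_1"), (34, 42, "LABEL_2"),
        (43, 46, "LABEL_0"), (47, 54, "LABEL_1"), (55, 84, "LABEL_2"),
        (84, 85, "LABEL_0"),
    ]
]
print(merge_disease_groups(example_groups, text))
# ['diabetes mellitus', 'chronic obstructive pulmonary disease']
```

With a live pipeline, the same helper would be called as `merge_disease_groups(nlp(text), text)`. Slicing the original text by offsets, rather than joining the `word` fields, preserves the input's casing and spacing.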

	## License
	This model is licensed under the Apache 2.0 License, same as the original BioBERT (`dmis-lab/biobert-base-cased-v1.1`).

	## Citation
	```bibtex
	@article{lee2020biobert,
	  title={BioBERT: a pre-trained biomedical language representation model for biomedical text mining},
	  author={Lee, Jinhyuk and Yoon, Wonjin and Kim, Sungdong and Kim, Donghyeon and Kim, Sunkyu and So, Chan Ho and Kang, Jaewoo},
	  journal={Bioinformatics},
	  volume={36},
	  number={4},
	  pages={1234--1240},
	  year={2020},
	  publisher={Oxford University Press}
	}
	```