---
language: en
tags:
- bert
- pytorch
- tensorflow-converted
- uncased
license: apache-2.0
model-index:
- name: uncased_L-6_H-768_A-12
  results: []
---

# BERT uncased_L-6_H-768_A-12

This model is a PyTorch conversion of the original TensorFlow BERT checkpoint `uncased_L-6_H-768_A-12`: a 6-layer, 768-hidden, 12-head BERT model for uncased English text.

## Model Details

- **Model Type**: BERT (Bidirectional Encoder Representations from Transformers)
- **Language**: English (uncased)
- **Architecture**:
  - Layers: 6
  - Hidden size: 768
  - Attention heads: 12
  - Vocabulary size: 30522
  - Max position embeddings: 512
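
These values can be checked programmatically against the configuration that ships with the checkpoint. A minimal sketch, assuming the repository id `bansalaman18/uncased_L-6_H-768_A-12` used in the Usage section below:

```python
from transformers import BertConfig

# Fetch the configuration stored alongside this checkpoint on the Hub.
config = BertConfig.from_pretrained('bansalaman18/uncased_L-6_H-768_A-12')

# These should match the architecture summary above.
print(config.num_hidden_layers)        # 6
print(config.hidden_size)              # 768
print(config.num_attention_heads)      # 12
print(config.vocab_size)               # 30522
print(config.max_position_embeddings)  # 512
```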

## Model Configuration

```json
{
  "vocab_size": 30522,
  "hidden_size": 768,
  "num_hidden_layers": 6,
  "num_attention_heads": 12,
  "intermediate_size": 3072,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "attention_probs_dropout_prob": 0.1,
  "max_position_embeddings": 512,
  "type_vocab_size": 2,
  "initializer_range": 0.02
}
```
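
As a rough sketch of what this configuration amounts to, the JSON above can be turned into a randomly initialised model of the same shape. The parameter count printed below is approximate and assumes the default BERT pre-training heads:

```python
from transformers import BertConfig, BertForPreTraining

# Instantiate the architecture described by the configuration above
# (weights here are random; see Usage below for loading the trained checkpoint).
config = BertConfig(
    vocab_size=30522,
    hidden_size=768,
    num_hidden_layers=6,
    num_attention_heads=12,
    intermediate_size=3072,
    max_position_embeddings=512,
    type_vocab_size=2,
    hidden_act="gelu",
    hidden_dropout_prob=0.1,
    attention_probs_dropout_prob=0.1,
    initializer_range=0.02,
)
model = BertForPreTraining(config)

# Total parameter count: roughly 67M for this 6-layer configuration.
print(sum(p.numel() for p in model.parameters()))
```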

## Usage

```python
from transformers import BertForPreTraining, BertTokenizer

# Load the converted model and its tokenizer from the Hub
model = BertForPreTraining.from_pretrained('bansalaman18/uncased_L-6_H-768_A-12')
tokenizer = BertTokenizer.from_pretrained('bansalaman18/uncased_L-6_H-768_A-12')

# Run a forward pass on a sample sentence
text = "Hello, this is a sample text for BERT."
inputs = tokenizer(text, return_tensors='pt')
outputs = model(**inputs)  # prediction_logits (MLM head) and seq_relationship_logits (NSP head)
```
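
`BertForPreTraining` returns raw logits for the masked-language-modelling and next-sentence-prediction heads. For a quick qualitative check, the same weights can be loaded into `BertForMaskedLM` to predict a masked token; this is a minimal sketch, assuming the checkpoint includes the MLM head (as the original BERT checkpoints do):

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

model_id = 'bansalaman18/uncased_L-6_H-768_A-12'
model = BertForMaskedLM.from_pretrained(model_id)
tokenizer = BertTokenizer.from_pretrained(model_id)

text = "The capital of France is [MASK]."
inputs = tokenizer(text, return_tensors='pt')

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and take the highest-scoring token there.
mask_index = (inputs['input_ids'][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
predicted_id = logits[0, mask_index].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```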

## Training Data

This model was originally trained on the same data as the standard BERT models:

- English Wikipedia (2,500M words)
- BookCorpus (800M words)

## Conversion Details

This model was converted from the original TensorFlow checkpoint to PyTorch format using a custom conversion script built on the Hugging Face Transformers library.
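
The exact script is not reproduced here, but conversions of this kind typically follow the pattern below, using the `load_tf_weights_in_bert` helper from Transformers (the file paths are placeholders, and TensorFlow must be installed for the weight loading step):

```python
from transformers import BertConfig, BertForPreTraining, load_tf_weights_in_bert

# Placeholder paths: point these at the original TF checkpoint and its config.
tf_checkpoint_path = "uncased_L-6_H-768_A-12/bert_model.ckpt"
bert_config_file = "uncased_L-6_H-768_A-12/bert_config.json"
pytorch_dump_path = "pytorch_model/"

# Build an empty PyTorch model with the architecture from the TF config.
config = BertConfig.from_json_file(bert_config_file)
model = BertForPreTraining(config)

# Copy the TensorFlow variables into the PyTorch module.
load_tf_weights_in_bert(model, config, tf_checkpoint_path)

# Save in Hugging Face format (config.json plus the model weights).
model.save_pretrained(pytorch_dump_path)
```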

## Citation

```bibtex
@article{devlin2018bert,
  title={BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding},
  author={Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina},
  journal={arXiv preprint arXiv:1810.04805},
  year={2018}
}
```