---
language: en
tags:
- bert
- pytorch
- tensorflow-converted
- uncased
license: apache-2.0
model-index:
- name: bert-uncased_L-10_H-256_A-4
  results: []
---

# BERT bert-uncased_L-10_H-256_A-4

This model is a PyTorch conversion of the original TensorFlow BERT checkpoint.

## Model Details

- **Model Type**: BERT (Bidirectional Encoder Representations from Transformers)
- **Language**: English (uncased)
- **Architecture**:
  - Layers: 10
  - Hidden Size: 256
  - Attention Heads: 4
  - Vocabulary Size: 30522
  - Max Position Embeddings: 512

## Model Configuration

```json
{
  "hidden_size": 256,
  "hidden_act": "gelu",
  "initializer_range": 0.02,
  "vocab_size": 30522,
  "hidden_dropout_prob": 0.1,
  "num_attention_heads": 4,
  "type_vocab_size": 2,
  "max_position_embeddings": 512,
  "num_hidden_layers": 10,
  "intermediate_size": 1024,
  "attention_probs_dropout_prob": 0.1
}
```

## Usage

```python
from transformers import BertForPreTraining, BertTokenizer

# Load the model and tokenizer
model = BertForPreTraining.from_pretrained('bansalaman18/bert-uncased_L-10_H-256_A-4')
tokenizer = BertTokenizer.from_pretrained('bansalaman18/bert-uncased_L-10_H-256_A-4')

# Example usage
text = "Hello, this is a sample text for BERT."
inputs = tokenizer(text, return_tensors='pt')
outputs = model(**inputs)
```

## Training Data

This model was originally trained on the same data as the standard BERT models:

- English Wikipedia (2,500M words)
- BookCorpus (800M words)

## Conversion Details

This model was converted from the original TensorFlow checkpoint to PyTorch format using a custom conversion script with the Hugging Face Transformers library.

## Citation

```bibtex
@article{devlin2018bert,
  title={BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding},
  author={Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina},
  journal={arXiv preprint arXiv:1810.04805},
  year={2018}
}
```
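
## Example: Masked Token Prediction

As a further illustration of the Usage section above, the sketch below asks the pre-training (masked-language-modelling) head to fill in a masked token. It loads the same `bansalaman18/bert-uncased_L-10_H-256_A-4` checkpoint; the sample sentence and the top-5 cutoff are arbitrary choices for illustration, not part of the original card.

```python
import torch
from transformers import BertForPreTraining, BertTokenizer

model_id = 'bansalaman18/bert-uncased_L-10_H-256_A-4'
tokenizer = BertTokenizer.from_pretrained(model_id)
model = BertForPreTraining.from_pretrained(model_id)
model.eval()

# Mask one token and score candidate replacements with the MLM head.
text = "The capital of France is [MASK]."
inputs = tokenizer(text, return_tensors='pt')

with torch.no_grad():
    outputs = model(**inputs)

# outputs.prediction_logits has shape [batch, seq_len, vocab_size]
mask_positions = (inputs['input_ids'][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
top_ids = outputs.prediction_logits[0, mask_positions[0]].topk(5).indices
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))
```

`BertForPreTraining` keeps both pre-training heads (masked-token and next-sentence prediction); if only masked-token prediction is needed, `BertForMaskedLM` can be loaded from the same checkpoint in the same way.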
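
## Conversion Sketch

The Conversion Details section notes that a custom script was used; that script is not reproduced here. The snippet below is only a minimal sketch of how such a TensorFlow-to-PyTorch BERT conversion is typically done with the Transformers library's `load_tf_weights_in_bert` helper. The file paths are placeholders, not the paths actually used for this model.

```python
# Minimal conversion sketch; requires both torch and tensorflow to be installed.
import torch
from transformers import BertConfig, BertForPreTraining, load_tf_weights_in_bert

# Placeholder paths to the original TensorFlow checkpoint and its config.
tf_checkpoint_path = "uncased_L-10_H-256_A-4/bert_model.ckpt"
bert_config_file = "uncased_L-10_H-256_A-4/bert_config.json"
pytorch_dump_path = "pytorch_model.bin"

# Build an empty PyTorch BERT with the architecture described in the TF config,
# copy the TensorFlow variables into it, then save the PyTorch state dict.
config = BertConfig.from_json_file(bert_config_file)
model = BertForPreTraining(config)
load_tf_weights_in_bert(model, config, tf_checkpoint_path)
torch.save(model.state_dict(), pytorch_dump_path)
```

These are the same steps performed by the `convert_bert_original_tf_checkpoint_to_pytorch.py` script that ships with the Transformers library.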