bansalaman18/bert-uncased_L-10_H-256_A-4

+---
+language: en
+tags:
+- bert
+- pytorch
+- tensorflow-converted
+- uncased
+license: apache-2.0
+model-index:
+- name: uncased_L-10_H-256_A-4
+  results: []
+---
+# BERT uncased_L-10_H-256_A-4
+This model is a PyTorch conversion of the original TensorFlow BERT checkpoint.
+## Model Details
+- **Model Type**: BERT (Bidirectional Encoder Representations from Transformers)
+- **Language**: English (uncased)
+- **Architecture**:
+  - Layers: 10
+  - Hidden Size: 256
+  - Attention Heads: 4
+  - Vocabulary Size: 30522
+  - Max Position Embeddings: 512
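The checkpoint name appears to encode the first three values above (`L` for layers, `H` for hidden size, `A` for attention heads). The following is a minimal sketch of reading them back from the name. `parse_bert_name` is a hypothetical helper written for illustration, not part of Transformers or any other library.

```python
import re


def parse_bert_name(name: str) -> dict:
    """Extract layer count, hidden size and head count from a name
    such as 'uncased_L-10_H-256_A-4' (hypothetical helper)."""
    match = re.search(r"L-(\d+)_H-(\d+)_A-(\d+)", name)
    if match is None:
        raise ValueError(f"unrecognized checkpoint name: {name!r}")
    layers, hidden, heads = (int(group) for group in match.groups())
    return {
        "num_hidden_layers": layers,
        "hidden_size": hidden,
        "num_attention_heads": heads,
    }


print(parse_bert_name("uncased_L-10_H-256_A-4"))
```

The keys are chosen to match the field names used in the model configuration, so the result can be compared against it directly.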
+## Model Configuration
+```json
+{
+  "hidden_size": 256,
+  "hidden_act": "gelu",
+  "initializer_range": 0.02,
+  "vocab_size": 30522,
+  "hidden_dropout_prob": 0.1,
+  "num_attention_heads": 4,
+  "type_vocab_size": 2,
+  "max_position_embeddings": 512,
+  "num_hidden_layers": 10,
+  "intermediate_size": 1024,
+  "attention_probs_dropout_prob": 0.1
+}
+```
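To give a feel for what this configuration implies, the sketch below estimates the parameter count and per-head dimension from the values above. It assumes the standard BERT encoder layout (embeddings, identical Transformer layers, and a pooler) and leaves out the pre-training heads. `count_bert_params` is a hypothetical helper, not a library function.

```python
def count_bert_params(cfg: dict) -> int:
    """Count weights and biases in a BERT encoder with pooler,
    assuming the standard layout and excluding pre-training heads."""
    h = cfg["hidden_size"]
    i = cfg["intermediate_size"]
    # Embeddings: word, position and token-type tables, plus one LayerNorm (weight + bias).
    table_rows = cfg["vocab_size"] + cfg["max_position_embeddings"] + cfg["type_vocab_size"]
    embeddings = table_rows * h + 2 * h
    # Self-attention: Q, K, V and output projections (h*h weights + h biases each), plus LayerNorm.
    attention = 4 * (h * h + h) + 2 * h
    # Feed-forward: h -> i and i -> h dense layers, plus LayerNorm.
    feed_forward = (h * i + i) + (i * h + h) + 2 * h
    # Pooler: one dense layer applied to the first token's hidden state.
    pooler = h * h + h
    return embeddings + cfg["num_hidden_layers"] * (attention + feed_forward) + pooler


config = {
    "hidden_size": 256,
    "vocab_size": 30522,
    "num_attention_heads": 4,
    "type_vocab_size": 2,
    "max_position_embeddings": 512,
    "num_hidden_layers": 10,
    "intermediate_size": 1024,
}

# Each attention head works on an equal slice of the hidden vector.
head_dim = config["hidden_size"] // config["num_attention_heads"]
total_params = count_bert_params(config)
print(f"head_dim={head_dim}, encoder parameters={total_params:,}")
```

Under these assumptions the encoder comes to roughly 15.9M parameters, about half of which sit in the word embedding table.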
+## Usage
+```python
+import torch
+from transformers import BertForPreTraining, BertTokenizer
+# Load the model and tokenizer (repo ID as shown in this repository's header)
+repo_id = 'bansalaman18/bert-uncased_L-10_H-256_A-4'
+model = BertForPreTraining.from_pretrained(repo_id)
+tokenizer = BertTokenizer.from_pretrained(repo_id)
+# Example usage
+text = "Hello, this is a sample text for BERT."
+inputs = tokenizer(text, return_tensors='pt')
+# Gradients are not needed for a plain forward pass
+with torch.no_grad():
+    outputs = model(**inputs)
+```
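`BertForPreTraining` returns logits for both pre-training objectives: masked language modeling and next sentence prediction. The sketch below shows the expected output shapes without downloading anything. It builds a randomly initialized model from the configuration values in this card, so the weights are not the converted checkpoint, and the token IDs are made-up stand-ins for tokenizer output.

```python
import torch
from transformers import BertConfig, BertForPreTraining

# Values copied from the card's configuration; weights are random (assumption for illustration).
config = BertConfig(
    vocab_size=30522,
    hidden_size=256,
    num_hidden_layers=10,
    num_attention_heads=4,
    intermediate_size=1024,
    max_position_embeddings=512,
    type_vocab_size=2,
)
model = BertForPreTraining(config)
model.eval()  # disable dropout

# Hypothetical token IDs standing in for tokenizer output: batch of 1, 8 tokens.
input_ids = torch.randint(0, config.vocab_size, (1, 8))
with torch.no_grad():
    outputs = model(input_ids=input_ids)

# Masked-LM logits: one score per vocabulary entry at every position.
mlm_shape = tuple(outputs.prediction_logits.shape)        # (batch, seq_len, vocab_size)
# Next-sentence-prediction logits: two classes per sequence.
nsp_shape = tuple(outputs.seq_relationship_logits.shape)  # (batch, 2)
print(mlm_shape, nsp_shape)
```

If only the hidden states are needed, for example for feature extraction, loading the checkpoint with `BertModel` instead avoids computing the two heads.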
+## Training Data
+The original TensorFlow checkpoint was pre-trained on the same corpora as the standard BERT models:
+- English Wikipedia (2,500M words)
+- BookCorpus (800M words)
+## Conversion Details
+This model was converted from the original TensorFlow checkpoint to PyTorch format with a custom conversion script built on the Hugging Face Transformers library.
+## Citation
+```bibtex
+@article{devlin2018bert,
+  title={BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding},
+  author={Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina},
+  journal={arXiv preprint arXiv:1810.04805},
+  year={2018}
+}
+```