---
language: en
tags:
- bert
- pytorch
- tensorflow-converted
- uncased
license: apache-2.0
model-index:
- name: bert-uncased_L-10_H-256_A-4
  results: []
---

# BERT bert-uncased_L-10_H-256_A-4

This model is a PyTorch conversion of the original TensorFlow BERT checkpoint.

## Model Details

- **Model Type**: BERT (Bidirectional Encoder Representations from Transformers)
- **Language**: English (uncased)
- **Architecture**:
  - Layers: 10
  - Hidden Size: 256
  - Attention Heads: 4
  - Vocabulary Size: 30522
  - Max Position Embeddings: 512

## Model Configuration

```json
{
  "hidden_size": 256,
  "hidden_act": "gelu",
  "initializer_range": 0.02,
  "vocab_size": 30522,
  "hidden_dropout_prob": 0.1,
  "num_attention_heads": 4,
  "type_vocab_size": 2,
  "max_position_embeddings": 512,
  "num_hidden_layers": 10,
  "intermediate_size": 1024,
  "attention_probs_dropout_prob": 0.1
}
```

## Usage

```python
from transformers import BertForPreTraining, BertTokenizer

# Load the model and tokenizer
model = BertForPreTraining.from_pretrained('bansalaman18/bert-uncased_L-10_H-256_A-4')
tokenizer = BertTokenizer.from_pretrained('bansalaman18/bert-uncased_L-10_H-256_A-4')

# Example usage
text = "Hello, this is a sample text for BERT."
inputs = tokenizer(text, return_tensors='pt')
outputs = model(**inputs)
```

## Training Data

This model was originally trained on the same data as the standard BERT models:

- English Wikipedia (2,500M words)
- BookCorpus (800M words)

## Conversion Details

This model was converted from the original TensorFlow checkpoint to PyTorch format using a custom conversion script with the Hugging Face Transformers library.

## Citation

```bibtex
@article{devlin2018bert,
  title={BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding},
  author={Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina},
  journal={arXiv preprint arXiv:1810.04805},
  year={2018}
}
```
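
## Example: Masked Token Prediction

As a further illustration of the Usage section above, the sketch below asks the pre-training (masked-language-modelling) head to fill in a masked token. It loads the same `bansalaman18/bert-uncased_L-10_H-256_A-4` checkpoint; the sample sentence and the top-5 cutoff are arbitrary choices for illustration, not part of the original card.

```python
import torch
from transformers import BertForPreTraining, BertTokenizer

model_id = 'bansalaman18/bert-uncased_L-10_H-256_A-4'
tokenizer = BertTokenizer.from_pretrained(model_id)
model = BertForPreTraining.from_pretrained(model_id)
model.eval()

# Mask one token and score candidate replacements with the MLM head.
text = "The capital of France is [MASK]."
inputs = tokenizer(text, return_tensors='pt')

with torch.no_grad():
    outputs = model(**inputs)

# outputs.prediction_logits has shape [batch, seq_len, vocab_size]
mask_positions = (inputs['input_ids'][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
top_ids = outputs.prediction_logits[0, mask_positions[0]].topk(5).indices
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))
```

`BertForPreTraining` keeps both pre-training heads (masked-token and next-sentence prediction); if only masked-token prediction is needed, `BertForMaskedLM` can be loaded from the same checkpoint in the same way.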
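
## Conversion Sketch

The Conversion Details section notes that a custom script was used; that script is not reproduced here. The snippet below is only a minimal sketch of how such a TensorFlow-to-PyTorch BERT conversion is typically done with the Transformers library's `load_tf_weights_in_bert` helper. The file paths are placeholders, not the paths actually used for this model.

```python
# Minimal conversion sketch; requires both torch and tensorflow to be installed.
import torch
from transformers import BertConfig, BertForPreTraining, load_tf_weights_in_bert

# Placeholder paths to the original TensorFlow checkpoint and its config.
tf_checkpoint_path = "uncased_L-10_H-256_A-4/bert_model.ckpt"
bert_config_file = "uncased_L-10_H-256_A-4/bert_config.json"
pytorch_dump_path = "pytorch_model.bin"

# Build an empty PyTorch BERT with the architecture described in the TF config,
# copy the TensorFlow variables into it, then save the PyTorch state dict.
config = BertConfig.from_json_file(bert_config_file)
model = BertForPreTraining(config)
load_tf_weights_in_bert(model, config, tf_checkpoint_path)
torch.save(model.state_dict(), pytorch_dump_path)
```

These are the same steps performed by the `convert_bert_original_tf_checkpoint_to_pytorch.py` script that ships with the Transformers library.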