---
language: en
tags:
- bert
- pytorch
- tensorflow-converted
- uncased
license: apache-2.0
model-index:
- name: uncased_L-6_H-768_A-12
  results: []
---

# BERT uncased_L-6_H-768_A-12

This model is a PyTorch conversion of the original TensorFlow BERT checkpoint `uncased_L-6_H-768_A-12`: a 6-layer, 768-hidden, 12-head BERT model for uncased English text.

## Model Details

- **Model Type**: BERT (Bidirectional Encoder Representations from Transformers)
- **Language**: English (uncased)
- **Architecture**:
  - Layers: 6
  - Hidden size: 768
  - Attention heads: 12
  - Vocabulary size: 30522
  - Max position embeddings: 512
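
These values can be checked programmatically against the configuration that ships with the checkpoint. A minimal sketch, assuming the repository id `bansalaman18/uncased_L-6_H-768_A-12` used in the Usage section below:

```python
from transformers import BertConfig

# Fetch the configuration stored alongside this checkpoint on the Hub.
config = BertConfig.from_pretrained('bansalaman18/uncased_L-6_H-768_A-12')

# These should match the architecture summary above.
print(config.num_hidden_layers)        # 6
print(config.hidden_size)              # 768
print(config.num_attention_heads)      # 12
print(config.vocab_size)               # 30522
print(config.max_position_embeddings)  # 512
```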

## Model Configuration

```json
{
  "vocab_size": 30522,
  "hidden_size": 768,
  "num_hidden_layers": 6,
  "num_attention_heads": 12,
  "intermediate_size": 3072,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "attention_probs_dropout_prob": 0.1,
  "max_position_embeddings": 512,
  "type_vocab_size": 2,
  "initializer_range": 0.02
}
```
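
As a rough sketch of what this configuration amounts to, the JSON above can be turned into a randomly initialised model of the same shape. The parameter count printed below is approximate and assumes the default BERT pre-training heads:

```python
from transformers import BertConfig, BertForPreTraining

# Instantiate the architecture described by the configuration above
# (weights here are random; see Usage below for loading the trained checkpoint).
config = BertConfig(
    vocab_size=30522,
    hidden_size=768,
    num_hidden_layers=6,
    num_attention_heads=12,
    intermediate_size=3072,
    max_position_embeddings=512,
    type_vocab_size=2,
    hidden_act="gelu",
    hidden_dropout_prob=0.1,
    attention_probs_dropout_prob=0.1,
    initializer_range=0.02,
)
model = BertForPreTraining(config)

# Total parameter count: roughly 67M for this 6-layer configuration.
print(sum(p.numel() for p in model.parameters()))
```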

## Usage

```python
from transformers import BertForPreTraining, BertTokenizer

# Load the converted model and its tokenizer from the Hub
model = BertForPreTraining.from_pretrained('bansalaman18/uncased_L-6_H-768_A-12')
tokenizer = BertTokenizer.from_pretrained('bansalaman18/uncased_L-6_H-768_A-12')

# Run a forward pass on a sample sentence
text = "Hello, this is a sample text for BERT."
inputs = tokenizer(text, return_tensors='pt')
outputs = model(**inputs)  # prediction_logits (MLM head) and seq_relationship_logits (NSP head)
```
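
`BertForPreTraining` returns raw logits for the masked-language-modelling and next-sentence-prediction heads. For a quick qualitative check, the same weights can be loaded into `BertForMaskedLM` to predict a masked token; this is a minimal sketch, assuming the checkpoint includes the MLM head (as the original BERT checkpoints do):

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

model_id = 'bansalaman18/uncased_L-6_H-768_A-12'
model = BertForMaskedLM.from_pretrained(model_id)
tokenizer = BertTokenizer.from_pretrained(model_id)

text = "The capital of France is [MASK]."
inputs = tokenizer(text, return_tensors='pt')

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and take the highest-scoring token there.
mask_index = (inputs['input_ids'][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
predicted_id = logits[0, mask_index].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```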

## Training Data

This model was originally trained on the same data as the standard BERT models:

- English Wikipedia (2,500M words)
- BookCorpus (800M words)

## Conversion Details

This model was converted from the original TensorFlow checkpoint to PyTorch format using a custom conversion script built on the Hugging Face Transformers library.
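
The exact script is not reproduced here, but conversions of this kind typically follow the pattern below, using the `load_tf_weights_in_bert` helper from Transformers (the file paths are placeholders, and TensorFlow must be installed for the weight loading step):

```python
from transformers import BertConfig, BertForPreTraining, load_tf_weights_in_bert

# Placeholder paths: point these at the original TF checkpoint and its config.
tf_checkpoint_path = "uncased_L-6_H-768_A-12/bert_model.ckpt"
bert_config_file = "uncased_L-6_H-768_A-12/bert_config.json"
pytorch_dump_path = "pytorch_model/"

# Build an empty PyTorch model with the architecture from the TF config.
config = BertConfig.from_json_file(bert_config_file)
model = BertForPreTraining(config)

# Copy the TensorFlow variables into the PyTorch module.
load_tf_weights_in_bert(model, config, tf_checkpoint_path)

# Save in Hugging Face format (config.json plus the model weights).
model.save_pretrained(pytorch_dump_path)
```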

## Citation

```bibtex
@article{devlin2018bert,
  title={BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding},
  author={Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina},
  journal={arXiv preprint arXiv:1810.04805},
  year={2018}
}
```