Commit 1e7cd09 · 1 Parent(s): 42335c5
Create README.md
README.md
ADDED
@@ -0,0 +1,39 @@
+---
+language: es
+tags:
+- GPT-2
+- text-generation
+datasets:
+- oscar
+widget:
+- text: "Érase una vez "
+---
+
+# Spanish GPT-2
+
+GPT-2 model trained from scratch on the Spanish portion of [OSCAR](https://huggingface.co/datasets/viewer/?dataset=oscar).
+The model was trained with Flax on TPUs sponsored by Google as part of the
+[Flax/Jax Community Week](https://discuss.huggingface.co/t/open-to-the-community-community-week-using-jax-flax-for-nlp-cv/7104)
+organised by Hugging Face.
+
+## Model description
+
+The model architecture is [OpenAI's GPT-2](https://openai.com/blog/better-language-models/), introduced in the paper ["Language Models are Unsupervised Multitask Learners"](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) by Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei and Ilya Sutskever.
+
+The original GPT-2 model is available in the 🤗 [Model Hub](https://huggingface.co/gpt2).
+
+## Training data
+
+The Spanish portion of OSCAR (**O**pen **S**uper-large **C**rawled **A**LMAnaCH co**R**pus), a huge multilingual corpus obtained by language classification and filtering of the [Common Crawl](https://commoncrawl.org/) corpus using the [goclassy](https://github.com/pjox/goclassy) architecture.
+
+This corpus is available in the 🤗 [Datasets](https://huggingface.co/datasets/oscar) library.
+
+## Team members
+- Manuel Romero ([mrm8488](https://huggingface.co/mrm8488))
+- María Grandury ([mariagrandury](https://huggingface.co/))
+- Pablo González de Prado ([Pablogps](https://huggingface.co/Pablogps))
+- Daniel Vera ([daveni](https://huggingface.co/daveni))
+- Sri Lakshmi ([srisweet](https://huggingface.co/srisweet))
+- José Posada ([jdposa](https://huggingface.co/jdposa))
+- Santiago Hincapie ([shpotes](https://huggingface.co/shpotes))
+- Jorge ([jorgealro](https://huggingface.co/jorgealro))
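
The "Training data" section of the README points to the 🤗 Datasets library. As a rough illustration, a few Spanish OSCAR documents might be streamed as sketched below. This is only a sketch under stated assumptions: the README does not say which OSCAR variant was used for training, so the config name `unshuffled_deduplicated_es` is an assumption, and the helper names `oscar_config_name` and `sample_spanish_oscar` are hypothetical. Depending on the installed `datasets` version, loading this script-based dataset may need extra flags or may not be supported at all.

```python
from itertools import islice


def oscar_config_name(language: str = "es", deduplicated: bool = True) -> str:
    """Build an OSCAR config name such as 'unshuffled_deduplicated_es'.

    Hypothetical helper; the README does not state which variant was used.
    """
    variant = "deduplicated" if deduplicated else "original"
    return f"unshuffled_{variant}_{language}"


def sample_spanish_oscar(n: int = 3) -> list:
    """Return the text of the first n documents of the Spanish OSCAR split.

    Streaming avoids downloading the whole corpus before reading from it.
    """
    # Imported lazily so the helper above works without `datasets` installed.
    from datasets import load_dataset

    stream = load_dataset(
        "oscar",
        oscar_config_name("es"),  # assumed variant, see the note above
        split="train",
        streaming=True,
    )
    # Each OSCAR record is expected to carry its document under the "text" key.
    return [example["text"] for example in islice(stream, n)]
```

Calling `sample_spanish_oscar(3)` requires network access and the `datasets` package; `oscar_config_name()` runs on its own.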