Commit b2921d6 · Parent: 5852898 · Update README.md

README.md (CHANGED)
[…] as well as the English [MNLI dataset](https://huggingface.co/datasets/multi_nli).
The main advantage of distilled models is that they are smaller (faster inference, lower memory requirements) than their teachers (XLM-RoBERTa-large). The disadvantage is that they lose some of the performance of their larger teachers.

For highest inference speed, I recommend using the [6-layer model](https://huggingface.co/MoritzLaurer/multilingual-MiniLMv2-L6-mnli-xnli) (the model on this page has 12 layers and is slower). For higher performance, I recommend [mDeBERTa-v3-base-mnli-xnli](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) (as of 14.02.2023).
### How to use the model
#### Simple zero-shot classification pipeline
```python
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="MoritzLaurer/multilingual-MiniLMv2-L12-mnli-xnli")

# German example: "Angela Merkel is a politician in Germany and chairwoman of the CDU"
sequence_to_classify = "Angela Merkel ist eine Politikerin in Deutschland und Vorsitzende der CDU"
candidate_labels = ["politics", "economy", "entertainment", "environment"]
output = classifier(sequence_to_classify, candidate_labels, multi_label=False)
print(output)
```
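Since the model was trained on multilingual NLI, the candidate labels and the hypothesis template can also be phrased in the text's own language. A minimal variation on the call above (the German template wording and labels are my own illustrative choices, not from the model card):

```python
# Hypothetical variation: German candidate labels and a German hypothesis template.
# The pipeline turns each label into an NLI hypothesis via hypothesis_template.format(label).
output_de = classifier(
    sequence_to_classify,
    candidate_labels=["Politik", "Wirtschaft", "Unterhaltung", "Umwelt"],
    hypothesis_template="In diesem Beispiel geht es um {}.",
)
print(output_de["labels"][0], output_de["scores"][0])  # top-ranked label and its score
```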
[…]

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

model_name = "MoritzLaurer/multilingual-MiniLMv2-L12-mnli-xnli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
```
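The diff cuts this snippet off right after the model is loaded. For completeness, a minimal sketch of how NLI scoring typically continues from here; the premise/hypothesis strings are illustrative, and the [entailment, neutral, contradiction] label order is the convention these XNLI-style checkpoints use (an assumption worth confirming via model.config.id2label):

```python
model.to(device)  # keep the model and the inputs on the same device

premise = "Angela Merkel ist eine Politikerin in Deutschland und Vorsitzende der CDU"
hypothesis = "Angela Merkel ist Politikerin"  # illustrative hypothesis

# Tokenize the premise/hypothesis pair and run a single forward pass.
inputs = tokenizer(premise, hypothesis, truncation=True, return_tensors="pt").to(device)
with torch.no_grad():
    logits = model(**inputs).logits

# Assumed label order; check model.config.id2label to confirm.
label_names = ["entailment", "neutral", "contradiction"]
probs = torch.softmax(logits[0], dim=-1).tolist()
print({name: round(p, 3) for name, p in zip(label_names, probs)})
```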
[…] and significantly reduces training costs.

### Training procedure
The model was trained using the Hugging Face trainer with the following hyperparameters. The exact underlying model is [mMiniLMv2-L12-H384-distilled-from-XLMR-Large](https://huggingface.co/nreimers/mMiniLMv2-L12-H384-distilled-from-XLMR-Large).
```
training_args = TrainingArguments(
    num_train_epochs=3,              # total number of training epochs
    # ... remaining hyperparameters truncated in the diff ...
)
```
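These arguments plug into the standard `Trainer` API. A sketch of the wiring, with placeholder datasets (the card's actual training script is not shown in this diff):

```python
from transformers import Trainer

# Placeholders: in the card's setup these would be the tokenized
# multilingual XNLI + English MNLI training and evaluation splits.
train_dataset = ...
eval_dataset = ...

trainer = Trainer(
    model=model,                 # the sequence classification model loaded above
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
trainer.train()
```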
[…] XLM-RoBERTa-large instead of -base (multilingual-MiniLM-L6-v2).

|Datasets|avg_xnli|ar|bg|de|el|en|es|fr|hi|ru|sw|th|tr|ur|vi|zh|
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|Accuracy|0.75|0.73|0.78|0.762|0.754|0.821|0.779|0.775|0.724|0.76|0.689|0.738|0.732|0.7|0.762|0.751|
|Speed (text/sec, A100 GPU, eval_batch=120)|4535.0|4629.0|4417.0|4500.0|3938.0|4959.0|4634.0|4152.0|4190.0|4368.0|4630.0|4698.0|4929.0|4291.0|4420.0|5275.0|

|Datasets|mnli_m|mnli_mm|
| :---: | :---: | :---: |
|Accuracy|0.818|0.831|
|Speed (text/sec, A100 GPU, eval_batch=120)|2912.0|2902.0|
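The speed rows are plain throughput numbers, so they can be reproduced approximately with a timing loop. A rough sketch of such a measurement (not the card's actual benchmark code; the batch size comes from the table header, while the corpus is a stand-in):

```python
import time

texts = [sequence_to_classify] * 1200  # stand-in corpus; a real benchmark would use XNLI texts
start = time.perf_counter()
classifier(texts, candidate_labels, batch_size=120)  # eval_batch=120, per the table header
elapsed = time.perf_counter() - start
print(f"{len(texts) / elapsed:.0f} texts/sec")
```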