MoritzLaurer committed
Commit b2921d6 · 1 Parent(s): 5852898

Update README.md

Files changed (1)
  1. README.md +8 -4
README.md CHANGED
@@ -48,12 +48,15 @@ as well as the English [MNLI dataset](https://huggingface.co/datasets/multi_nli)
  The main advantage of distilled models is that they are smaller (faster inference, lower memory requirements) than their teachers (XLM-RoBERTa-large).
  The disadvantage is that they lose some of the performance of their larger teachers.

+ For highest inference speed, I recommend using the [6-layer model](https://huggingface.co/MoritzLaurer/multilingual-MiniLMv2-L6-mnli-xnli)
+ (the model on this page has 12 layers and is slower). For higher performance I recommend
+ [mDeBERTa-v3-base-mnli-xnli](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) (as of 14.02.2023).

  ### How to use the model
  #### Simple zero-shot classification pipeline
  ```python
  from transformers import pipeline
- classifier = pipeline("zero-shot-classification", model="MoritzLaurer/xlm-v-base-mnli-xnli")
+ classifier = pipeline("zero-shot-classification", model="MoritzLaurer/multilingual-MiniLMv2-L12-mnli-xnli")

  sequence_to_classify = "Angela Merkel ist eine Politikerin in Deutschland und Vorsitzende der CDU"
  candidate_labels = ["politics", "economy", "entertainment", "environment"]
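The hunk ends at the diff boundary before the classifier is actually called. As a minimal usage sketch continuing the pipeline code above (the `multi_label=False` argument and the `print` call are assumptions, not lines from this commit):

```python
# Minimal usage sketch continuing the pipeline code in the hunk above.
# multi_label=False assumes exactly one label applies per text (assumption, not part of this diff).
output = classifier(sequence_to_classify, candidate_labels, multi_label=False)
print(output)
# -> {'sequence': ..., 'labels': [labels sorted by score], 'scores': [descending probabilities]}
```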
@@ -66,7 +69,7 @@ from transformers import AutoTokenizer, AutoModelForSequenceClassification
  import torch
  device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

- model_name = "MoritzLaurer/xlm-v-base-mnli-xnli"
+ model_name = "MoritzLaurer/multilingual-MiniLMv2-L12-mnli-xnli"
  tokenizer = AutoTokenizer.from_pretrained(model_name)
  model = AutoModelForSequenceClassification.from_pretrained(model_name)

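This NLI hunk stops right after the checkpoint is loaded. A minimal sketch of how such an NLI model is then queried (the example hypothesis is illustrative, and the label names are read from the model config rather than assumed):

```python
# Minimal NLI inference sketch continuing the tokenizer/model loaded in the hunk above.
premise = "Angela Merkel ist eine Politikerin in Deutschland und Vorsitzende der CDU"
hypothesis = "Dieses Beispiel handelt von Politik."  # "This example is about politics." (illustrative, not from this commit)

model.to(device)
inputs = tokenizer(premise, hypothesis, truncation=True, return_tensors="pt").to(device)
with torch.no_grad():
    logits = model(**inputs).logits

# Convert logits to probabilities and label them via the config's id2label mapping.
probs = torch.softmax(logits[0], dim=-1).tolist()
print({model.config.id2label[i]: round(p * 100, 1) for i, p in enumerate(probs)})
```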
 
@@ -94,6 +97,7 @@ and significantly reduces training costs.

  ### Training procedure
  The model was trained using the Hugging Face trainer with the following hyperparameters.
+ The exact underlying model is [mMiniLMv2-L12-H384-distilled-from-XLMR-Large](https://huggingface.co/nreimers/mMiniLMv2-L12-H384-distilled-from-XLMR-Large).
  ```
  training_args = TrainingArguments(
      num_train_epochs=3,              # total number of training epochs
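The training hunk is cut off after the first hyperparameter. Purely as an illustrative sketch of how such `TrainingArguments` feed into the Hugging Face `Trainer` (everything below except `num_train_epochs=3` is an assumption, including the stand-in dataset, and is not taken from this commit):

```python
# Illustrative Trainer wiring only; hyperparameters beyond num_train_epochs=3 are assumptions.
from datasets import load_dataset
from transformers import Trainer, TrainingArguments

# Stand-in data: English XNLI as a placeholder for the actual MNLI + XNLI training mix.
dataset = load_dataset("xnli", "en")

def tokenize(batch):
    # NLI models are trained on (premise, hypothesis) pairs.
    return tokenizer(batch["premise"], batch["hypothesis"], truncation=True)

dataset = dataset.map(tokenize, batched=True)

training_args = TrainingArguments(
    output_dir="./results",   # assumption: any output directory
    num_train_epochs=3,       # total number of training epochs (from the hunk above)
)

trainer = Trainer(
    model=model,                        # loaded in the earlier hunk
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    tokenizer=tokenizer,                # enables dynamic padding via DataCollatorWithPadding
)
trainer.train()
```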
@@ -121,12 +125,12 @@ XLM-RoBERTa-large instead of -base (multilingual-MiniLM-L6-v2).
  |Datasets|avg_xnli|ar|bg|de|el|en|es|fr|hi|ru|sw|th|tr|ur|vi|zh|
  | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
  |Accuracy|0.75|0.73|0.78|0.762|0.754|0.821|0.779|0.775|0.724|0.76|0.689|0.738|0.732|0.7|0.762|0.751|
- |Speed (text/sec)|4535.0|4629.0|4417.0|4500.0|3938.0|4959.0|4634.0|4152.0|4190.0|4368.0|4630.0|4698.0|4929.0|4291.0|4420.0|5275.0|
+ |Speed text/sec (A100 GPU, eval_batch=120)|4535.0|4629.0|4417.0|4500.0|3938.0|4959.0|4634.0|4152.0|4190.0|4368.0|4630.0|4698.0|4929.0|4291.0|4420.0|5275.0|

  |Datasets|mnli_m|mnli_mm|
  | :---: | :---: | :---: |
  |Accuracy|0.818|0.831|
- |Speed (text/sec)|2912.0|2902.0|
+ |Speed text/sec (A100 GPU, eval_batch=120)|2912.0|2902.0|
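The new table headers specify the measurement setup (A100 GPU, eval_batch=120), but the benchmarking script itself is not part of this diff. A rough sketch of how a texts/sec figure of this kind could be measured (entirely an assumption, not the author's code):

```python
# Rough throughput sketch (assumption: not the benchmarking code behind the table above).
# Reuses model, tokenizer and device from the earlier hunk; batch size mirrors eval_batch=120.
import time
import torch

pairs = [("Angela Merkel ist eine Politikerin in Deutschland und Vorsitzende der CDU",
          "Dieses Beispiel handelt von Politik.")] * 1200  # synthetic premise/hypothesis pairs

model.to(device).eval()
start = time.time()
with torch.no_grad():
    for i in range(0, len(pairs), 120):
        batch = pairs[i:i + 120]
        inputs = tokenizer([p for p, _ in batch], [h for _, h in batch],
                           truncation=True, padding=True, return_tensors="pt").to(device)
        model(**inputs)
print(f"{len(pairs) / (time.time() - start):.1f} texts/sec")
```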
 
 