recoilme's picture
Adding the Open Portuguese LLM Leaderboard Evaluation Results (#3)
8955cd3 verified
---
license: apache-2.0
library_name: transformers
model-index:
- name: Gemma-2-Ataraxy-Gemmasutra-9B-slerp
results:
- task:
type: text-generation
name: Text Generation
dataset:
name: IFEval (0-Shot)
type: HuggingFaceH4/ifeval
args:
num_few_shot: 0
metrics:
- type: inst_level_strict_acc and prompt_level_strict_acc
value: 76.49
name: strict accuracy
source:
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: BBH (3-Shot)
type: BBH
args:
num_few_shot: 3
metrics:
- type: acc_norm
value: 42.25
name: normalized accuracy
source:
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MATH Lvl 5 (4-Shot)
type: hendrycks/competition_math
args:
num_few_shot: 4
metrics:
- type: exact_match
value: 1.74
name: exact match
source:
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: GPQA (0-shot)
type: Idavidrein/gpqa
args:
num_few_shot: 0
metrics:
- type: acc_norm
value: 10.74
name: acc_norm
source:
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MuSR (0-shot)
type: TAUR-Lab/MuSR
args:
num_few_shot: 0
metrics:
- type: acc_norm
value: 12.39
name: acc_norm
source:
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MMLU-PRO (5-shot)
type: TIGER-Lab/MMLU-Pro
config: main
split: test
args:
num_few_shot: 5
metrics:
- type: acc
value: 35.63
name: accuracy
source:
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: ENEM Challenge (No Images)
type: eduagarcia/enem_challenge
split: train
args:
num_few_shot: 3
metrics:
- type: acc
value: 75.65
name: accuracy
source:
url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
name: Open Portuguese LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: BLUEX (No Images)
type: eduagarcia-temp/BLUEX_without_images
split: train
args:
num_few_shot: 3
metrics:
- type: acc
value: 64.26
name: accuracy
source:
url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
name: Open Portuguese LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: OAB Exams
type: eduagarcia/oab_exams
split: train
args:
num_few_shot: 3
metrics:
- type: acc
value: 53.76
name: accuracy
source:
url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
name: Open Portuguese LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: Assin2 RTE
type: assin2
split: test
args:
num_few_shot: 15
metrics:
- type: f1_macro
value: 93.21
name: f1-macro
source:
url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
name: Open Portuguese LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: Assin2 STS
type: eduagarcia/portuguese_benchmark
split: test
args:
num_few_shot: 15
metrics:
- type: pearson
value: 80.91
name: pearson
source:
url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
name: Open Portuguese LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: FaQuAD NLI
type: ruanchaves/faquad-nli
split: test
args:
num_few_shot: 15
metrics:
- type: f1_macro
value: 77.39
name: f1-macro
source:
url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
name: Open Portuguese LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: HateBR Binary
type: ruanchaves/hatebr
split: test
args:
num_few_shot: 25
metrics:
- type: f1_macro
value: 87.61
name: f1-macro
source:
url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
name: Open Portuguese LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: PT Hate Speech Binary
type: hate_speech_portuguese
split: test
args:
num_few_shot: 25
metrics:
- type: f1_macro
value: 66.84
name: f1-macro
source:
url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
name: Open Portuguese LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: tweetSentBR
type: eduagarcia/tweetsentbr_fewshot
split: test
args:
num_few_shot: 25
metrics:
- type: f1_macro
value: 66.14
name: f1-macro
source:
url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
name: Open Portuguese LLM Leaderboard
---
# Gemma-2-Ataraxy-Gemmasutra-9B-slerp
## 💻 Usage
```python
!pip install -qU transformers accelerate
from transformers import AutoTokenizer
import transformers
import torch
model = "recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp"
messages = [{"role": "user", "content": "What is a large language model?"}]
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
pipeline = transformers.pipeline(
"text-generation",
model=model,
torch_dtype=torch.float16,
device_map="auto",
)
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/recoilme__Gemma-2-Ataraxy-Gemmasutra-9B-slerp-details)
| Metric |Value|
|-------------------|----:|
|Avg. |29.87|
|IFEval (0-Shot) |76.49|
|BBH (3-Shot) |42.25|
|MATH Lvl 5 (4-Shot)| 1.74|
|GPQA (0-shot) |10.74|
|MuSR (0-shot) |12.39|
|MMLU-PRO (5-shot) |35.63|
# Open Portuguese LLM Leaderboard Evaluation Results
Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
| Metric | Value |
|--------------------------|---------|
|Average |**73.97**|
|ENEM Challenge (No Images)| 75.65|
|BLUEX (No Images) | 64.26|
|OAB Exams | 53.76|
|Assin2 RTE | 93.21|
|Assin2 STS | 80.91|
|FaQuAD NLI | 77.39|
|HateBR Binary | 87.61|
|PT Hate Speech Binary | 66.84|
|tweetSentBR | 66.14|