---
library_name: transformers
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb-edu
language:
- en
---

# Model Details

This is a 1B-parameter Llama 3 model pretrained from scratch with torchtitan on fineweb-edu using the C_AdamW (cautious AdamW) optimizer. It was trained on 100B tokens.

# How to use

```
import torch
from transformers import pipeline

# Load the checkpoint as a text-generation pipeline
pipe = pipeline(
    "text-generation",
    model="kz919/llama3_1b_cautious_100B_token_8222025",
)

print(pipe("The key to life is"))
```

# Downstream Eval

## ARC, HellaSwag, Lambada_OpenAI, OpenBookQA, PIQA

```
lm_eval --model hf --model_args pretrained=kz919/llama3_1b_cautious_100B_token_8222025,dtype="bfloat16",add_bos_token=True --tasks lambada_openai,hellaswag,piqa,arc_easy,arc_challenge,openbookqa --device cuda:7 --batch_size 8
```

|     Tasks    |Version|Filter|n-shot|  Metric  |   | Value |   |Stderr|
|--------------|------:|------|-----:|----------|---|------:|---|-----:|
|arc_challenge |      1|none  |     0|acc       |↑  | 0.3183|±  |0.0136|
|              |       |none  |     0|acc_norm  |↑  | 0.3379|±  |0.0138|
|arc_easy      |      1|none  |     0|acc       |↑  | 0.6650|±  |0.0097|
|              |       |none  |     0|acc_norm  |↑  | 0.6061|±  |0.0100|
|hellaswag     |      1|none  |     0|acc       |↑  | 0.3999|±  |0.0049|
|              |       |none  |     0|acc_norm  |↑  | 0.5025|±  |0.0050|
|lambada_openai|      1|none  |     0|acc       |↑  | 0.3912|±  |0.0068|
|              |       |none  |     0|perplexity|↓  |23.8709|±  |0.8855|
|openbookqa    |      1|none  |     0|acc       |↑  | 0.2580|±  |0.0196|
|              |       |none  |     0|acc_norm  |↑  | 0.3740|±  |0.0217|
|piqa          |      1|none  |     0|acc       |↑  | 0.7116|±  |0.0106|
|              |       |none  |     0|acc_norm  |↑  | 0.7149|±  |0.0105|

## MMLU

|      Groups      |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu              |      2|none  |      |acc   |↑  |0.2519|±  |0.0037|
| - humanities     |      2|none  |      |acc   |↑  |0.2540|±  |0.0064|
| - other          |      2|none  |      |acc   |↑  |0.2527|±  |0.0078|
| - social sciences|      2|none  |      |acc   |↑  |0.2480|±  |0.0078|
| - stem           |      2|none  |      |acc   |↑  |0.2518|±  |0.0077|
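
The exact command used for the MMLU run is not given in this card. A plausible invocation that mirrors the one above is sketched below; the `mmlu` task group name and the default few-shot setting are assumptions, not confirmed by the card.

```
lm_eval --model hf --model_args pretrained=kz919/llama3_1b_cautious_100B_token_8222025,dtype="bfloat16",add_bos_token=True --tasks mmlu --device cuda:7 --batch_size 8
```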