xp1992slz committed on
Commit b0e2a56 · 1 Parent(s): 2bb1830
Llama-3.1-8B-UltraLong-4M-Instruct.png ADDED
README.md CHANGED
@@ -1,3 +1,80 @@
- ---
- license: llama3.1
- ---
+ ---
+ library_name: transformers
+ language:
+ - en
+ ---
+
+ # Model Information
+
+ We introduce **UltraLong-8B**, a series of ultra-long context language models designed to process extensive sequences of text (up to 1M, 2M, and 4M tokens) while maintaining competitive performance on standard benchmarks. Built on Llama-3.1, UltraLong-8B leverages a systematic training recipe that combines efficient continued pretraining with instruction tuning to enhance long-context understanding and instruction-following capabilities. This approach enables our models to scale their context windows efficiently without sacrificing general performance.
+
+ ## The UltraLong Models
+
+ - [ultralong/Llama-3.1-8B-UltraLong-1M-Instruct](https://huggingface.co/ultralong/Llama-3.1-8B-UltraLong-1M-Instruct)
+ - [ultralong/Llama-3.1-8B-UltraLong-2M-Instruct](https://huggingface.co/ultralong/Llama-3.1-8B-UltraLong-2M-Instruct)
+ - [ultralong/Llama-3.1-8B-UltraLong-4M-Instruct](https://huggingface.co/ultralong/Llama-3.1-8B-UltraLong-4M-Instruct)
+
+ ## Uses
+
+ Starting with `transformers >= 4.43.0`, you can run conversational inference using the Transformers `pipeline` abstraction or by leveraging the Auto classes with the `generate()` function.
+
+ Make sure to update your transformers installation via `pip install --upgrade transformers`.
+
+ ```python
+ import transformers
+ import torch
+
+ model_id = "ultralong/Llama-3.1-8B-UltraLong-4M-Instruct"
+
+ pipeline = transformers.pipeline(
+     "text-generation",
+     model=model_id,
+     model_kwargs={"torch_dtype": torch.bfloat16},
+     device_map="auto",
+ )
+
+ messages = [
+     {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
+     {"role": "user", "content": "Who are you?"},
+ ]
+
+ outputs = pipeline(
+     messages,
+     max_new_tokens=256,
+ )
+ print(outputs[0]["generated_text"][-1])
+ ```
+
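+ For lower-level control, the same conversation can be run through the Auto classes with `generate()`. The sketch below is a minimal illustration (not from the original card) using the standard `AutoTokenizer`/`AutoModelForCausalLM` pattern and the model's chat template:
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "ultralong/Llama-3.1-8B-UltraLong-4M-Instruct"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+ )
+
+ messages = [
+     {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
+     {"role": "user", "content": "Who are you?"},
+ ]
+
+ # Build the prompt with the model's chat template and tokenize it.
+ input_ids = tokenizer.apply_chat_template(
+     messages, add_generation_prompt=True, return_tensors="pt"
+ ).to(model.device)
+
+ outputs = model.generate(input_ids, max_new_tokens=256)
+ # Decode only the newly generated tokens, skipping the prompt.
+ print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
+ ```
+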
+ ## Model Card
+
+ * Base model: [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
+ * Continued pretraining: 1B tokens of per-source upsampled SlimPajama data at a 4M sequence length.
+ * Supervised fine-tuning (SFT): 1B tokens on open-source instruction datasets across general, mathematics, and code domains.
+ * Maximum context window: 4M tokens
+
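+ As a quick sanity check (not part of the original card), the advertised context window can be read from the model configuration; this sketch assumes the repo's `config.json` uses the standard Llama-style `max_position_embeddings` field:
+
+ ```python
+ from transformers import AutoConfig
+
+ config = AutoConfig.from_pretrained("ultralong/Llama-3.1-8B-UltraLong-4M-Instruct")
+ # For Llama-style configs the context window is stored in max_position_embeddings;
+ # a 4M-token window corresponds to 4194304.
+ print(config.max_position_embeddings)
+ ```
+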
+ ## Evaluation Results
+
+ We evaluate UltraLong-8B on a diverse set of benchmarks, including long-context tasks (e.g., RULER, LV-Eval, and InfiniteBench) and standard tasks (e.g., MMLU, MATH, GSM-8K, and HumanEval). UltraLong-8B achieves superior performance on ultra-long context tasks while maintaining competitive results on standard benchmarks.
+
+ ### Needle in a Haystack
+
+ <img width="80%" alt="image" src="Llama-3.1-8B-UltraLong-4M-Instruct.png">
+
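+ To make the test concrete, here is a minimal needle-in-a-haystack style probe (illustrative only, not the harness used for the figure above): a known fact is hidden at a chosen depth inside filler text, and the model is asked to retrieve it.
+
+ ```python
+ import transformers
+ import torch
+
+ pipeline = transformers.pipeline(
+     "text-generation",
+     model="ultralong/Llama-3.1-8B-UltraLong-4M-Instruct",
+     model_kwargs={"torch_dtype": torch.bfloat16},
+     device_map="auto",
+ )
+
+ needle = "The magic number is 4194304."
+ filler = "The grass is green. The sky is blue. " * 2000  # scale up toward the target context length
+ depth = 0.5  # place the needle halfway through the haystack
+ cut = int(len(filler) * depth)
+ haystack = filler[:cut] + " " + needle + " " + filler[cut:]
+
+ messages = [
+     {"role": "user", "content": haystack + "\n\nWhat is the magic number? Answer with the number only."},
+ ]
+ out = pipeline(messages, max_new_tokens=32)
+ print(out[0]["generated_text"][-1])
+ ```
+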
+ ### Long context evaluation
+
+ <img width="80%" alt="image" src="long_benchmark.png">
+
+ ### Standard capability evaluation
+
+ <img width="80%" alt="image" src="standard_benchmark.png">
+
+ ## Correspondence to
+ Chejian Xu (chejian2@illinois.edu), Wei Ping (wping@nvidia.com)
+
+ ## Citation
+
+ <pre>
+
+ </pre>
config.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:641ab10f80c5bbfff70b497475004217984631805653efb48bbbeef274c1bfca
+ size 897
generation_config.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4abcf8ce084298b021f1d6f42b4c08d141d1897eff20428e987a9e8dbcfd2e42
+ size 121
long_benchmark.png ADDED
model-00001-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4c58de27585b804eb04ddbd447ea968cb4bce7c837354f9544cde422bae07b55
+ size 4899049080
model-00002-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:230bd87502ae97083c7036db362a93735583e5cbbb8caadd45f366379e591e0a
+ size 4832007448
model-00003-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0c104438497e27fb91430d9bf29dfe677fe589e8c6926ed08a8b43fb2c6a21f6
+ size 4999813112
model-00004-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e6d9084bbc917e3f3cdee6a3d495553dc8ba7f9e967556516fe619c193a18b3b
+ size 4999813128
model-00005-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7d914d8ff6b625a230019165305daec4d6184aed6234e7a9612469f2be9fde7f
+ size 4832007496
model-00006-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c42a592d6b95cc58d743282535d659a2890b433a79a079fdf65b38c242b09eb6
+ size 4999813120
model-00007-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cc74bff74003032a5df43e041a5a903c6cab652a039c2d26042d79c3c9c65741
+ size 2583741096
model.safetensors.index.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0fde2fcc0ec44f067e7429e647bfbeeb159a649a8fd9f209ff315ec5411af6b4
+ size 23950
special_tokens_map.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6f38c73729248f6c127296386e3cdde96e254636cc58b4169d3fd32328d9a8ec
+ size 296
standard_benchmark.png ADDED
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6b9e4e7fb171f92fd137b777cc2714bf87d11576700a1dcd7a399e7bbe39537b
+ size 17209920
tokenizer_config.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9c5965bf8c62811c97bff003a2198c7ccb5c0589ae9310e9c796a008a336e1c9
+ size 55377