Add paper link to model card (#5)
- Add paper link to model card (797ecea8a3e604c7070e169fc1354b49fce51349)
Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>
README.md CHANGED
@@ -1,15 +1,15 @@
 ---
-license: apache-2.0
-tags:
-- finetuned
-- chat
 language:
 - en
 - ko
 - ja
 - zh
-pipeline_tag: text-generation
 library_name: transformers
+license: apache-2.0
+pipeline_tag: text-generation
+tags:
+- finetuned
+- chat
 ---
 
 # Trillion-7B-preview
@@ -22,7 +22,7 @@ library_name: transformers
 
 ## Introduction
 
-We introduce Trillion-7B-preview, a preview of our latest large language model designed to push the boundaries of multilingual scalability and performance.
+We introduce Trillion-7B-preview, a preview of our latest large language model designed to push the boundaries of multilingual scalability and performance. This model is presented in the paper: [Trillion-7B-preview](https://huggingface.co/papers/2504.15431).
 
 
 When comparing performance to training FLOPs for Trillion-7B-preview with competitive models, our model pushes the Pareto frontier, achieving around 66.5% average performance while using significantly fewer compute (~9.3×10²² FLOPs). It outperforms models like Mistral-7B-Instruct-v0.3 and SOLAR-10.7B-Instruct-v1.0 while remaining competitive with models requiring 3-8× more compute such as Qwen2.5-7B-Instruct and EXAONE-3.5-7.8B-Instruct. For full benchmark results, see tables below.
@@ -240,4 +240,4 @@ This model repository is licensed under the Apache-2.0 License.
 }
 ```
 ## Contact
-For inquiries, please contact: info@trillionlabs.co
+For inquiries, please contact: info@trillionlabs.co
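For context, the updated front matter declares `library_name: transformers` and `pipeline_tag: text-generation`, so the card can be exercised with the standard `transformers` pipeline. The sketch below is a minimal, assumed usage example; the repo id `trillionlabs/Trillion-7B-preview` does not appear in this diff and is an assumption.

```python
# Minimal sketch of using the model per the card metadata
# (library_name: transformers, pipeline_tag: text-generation).
# The repo id below is assumed, not taken from this diff.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="trillionlabs/Trillion-7B-preview",  # assumed repo id
    device_map="auto",
)

# The card lists en/ko/ja/zh, so prompts in any of those languages should work.
output = generator("Introduce yourself in one sentence.", max_new_tokens=64)
print(output[0]["generated_text"])
```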