picocreator committed · Commit 650a941 · verified · 1 Parent(s): f42f284

Update README.md

Files changed (1):
  1. README.md +13 -11
README.md CHANGED
@@ -6,16 +6,8 @@ library_name: transformers
 
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/633e85093a17ab61de8d9073/OufWyNMKYRozfC8j8S-M8.png)
 
-> Psst! Try out the model on [![Featherless](https://img.shields.io/badge/featherless--ai%2FQwerky--QwQ--32B-Dummy?style=flat&label=Featherless&color=facc15)](https://featherless.ai/models/featherless-ai/Qwerky-QwQ-32B)
-> Our launch article can be found on here! [![Substack](https://img.shields.io/badge/Substack-Dummy?style=flat&color=facc15)](https://substack.recursal.ai/p/qwerky-72b-and-32b-training-large)
-
-Linear models offer a promising approach to significantly reduce computational costs at scale, particularly for large context lengths. Enabling a >1000x improvement in inference costs, enabling o1 inference time thinking and wider AI accessibility.
-
-As demonstrated with our Qwerky-72B-Preview and prior models such as QRWKV6-32B Instruct Preview, we have successfully converted Qwen 2.5 QwQ 32B into a RWKV variant without requiring a pretrain on the base model or retraining the model from scratch. Enabling us to test and validate the more efficient RWKV Linear attention with a much smaller budget. Since our preview, we have continued to refine our technique and managed to improve the model over the preview model iteration.
-
-As with our previous models, the model's inherent knowledge and dataset training are inherited from its "parent" model. Consequently, unlike previous RWKV models trained on over 100+ languages, the QRWKV model is limited to approximately 30 languages supported by the Qwen line of models.
-
-You may find our details of the process from our previous release, [here](https://huggingface.co/recursal/QRWKV6-32B-Instruct-Preview-v0.1).
+- Try out the model on [![Featherless](https://img.shields.io/badge/featherless--ai%2FQwerky--QwQ--32B-Dummy?style=flat&label=Featherless&color=facc15)](https://featherless.ai/models/featherless-ai/Qwerky-QwQ-32B)
+- Model details can be found in our blog post here! [![Substack](https://img.shields.io/badge/Substack-Dummy?style=flat&color=facc15)](https://substack.recursal.ai/p/qwerky-72b-and-32b-training-large)
 
 Benchmarks are as follows for both the Qwerky-QwQ-32B and Qwerky-72B models:
 
@@ -80,4 +72,14 @@ generated_ids = [
 ]
 
 response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
-```
+```
+
+## Model notes
+
+Linear models offer a promising approach to significantly reducing computational costs at scale, particularly for large context lengths, enabling a >1000x improvement in inference costs, o1-style inference-time thinking, and wider AI accessibility.
+
+As demonstrated with our Qwerky-72B-Preview and prior models such as QRWKV6-32B Instruct Preview, we have successfully converted Qwen 2.5 QwQ 32B into an RWKV variant without pretraining the base model or retraining it from scratch, allowing us to test and validate the more efficient RWKV linear attention on a much smaller budget. Since the preview, we have continued to refine our technique and have improved on the preview iteration of the model.
+
+As with our previous models, the model's inherent knowledge and training data are inherited from its "parent" model. Consequently, unlike previous RWKV models trained on 100+ languages, the QRWKV model is limited to the approximately 30 languages supported by the Qwen line of models.
+
+You can find details of the conversion process in our previous release, [here](https://huggingface.co/recursal/QRWKV6-32B-Instruct-Preview-v0.1).
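For reference, the second hunk's context lines show only the tail of the README's inference snippet. Below is a minimal, self-contained sketch of the standard transformers chat pattern that snippet follows; the repo id is taken from the Featherless badge above, and `trust_remote_code=True` is an assumption for the custom RWKV-based architecture, not something the diff confirms:

```python
# Minimal inference sketch (assumed repo id; adjust to the actual checkpoint).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "featherless-ai/Qwerky-QwQ-32B"  # assumption: taken from the Featherless badge
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,  # assumption: custom RWKV modeling code ships with the repo
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

messages = [{"role": "user", "content": "Give me a short introduction to large language models."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(**model_inputs, max_new_tokens=512)
# Strip the prompt tokens so only the completion is decoded.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

The final two statements match the context lines visible in the hunk above.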
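The scaling claim in the "Model notes" section rests on linear attention replacing softmax attention's per-token scan over all T cached keys and values with a fixed-size recurrent state. The following is a schematic sketch of that difference using a generic kernelized linear-attention recurrence, not RWKV's actual formulation:

```python
# Schematic only: contrasts the per-token cost of softmax attention (grows with
# context length T) against a generic linear-attention recurrence (constant,
# O(d^2)). RWKV's real update rule differs; this shows the general principle.
import numpy as np

d = 8  # toy head dimension

def softmax_attention_step(q, K, V):
    """One decoded token with softmax attention: revisits all T cached keys/values."""
    scores = K @ q / np.sqrt(d)              # (T,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V                       # (d,)

def linear_attention_step(q, k, v, S, z):
    """One decoded token with linear attention: the history lives in a (d, d) state."""
    S = S + np.outer(k, v)                   # fold this step's key/value into the state
    z = z + k                                # running key sum for normalization
    out = (q @ S) / (q @ z + 1e-9)
    return out, S, z

# Usage: the linear variant never materializes a growing KV cache.
S, z = np.zeros((d, d)), np.zeros(d)
for _ in range(1000):                        # 1000-token context, constant memory
    q, k, v = (np.random.rand(d) for _ in range(3))
    out, S, z = linear_attention_step(q, k, v, S, z)
```

Because the state S has a fixed shape, per-token compute and memory stay constant as the context grows, which is where the large-context savings come from.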