---
thumbnail: https://cdn-uploads.huggingface.co/production/uploads/633e85093a17ab61de8d9073/OufWyNMKYRozfC8j8S-M8.png
license: apache-2.0
library_name: transformers
---

- Try out the model on [Featherless.ai](https://featherless.ai/models/featherless-ai/Qwerky-QwQ-32B)
- Model details from [our blog post here](https://substack.recursal.ai/p/qwerky-72b-and-32b-training-large)!

Benchmarks are as follows for both the Qwerky-QwQ-32B and Qwerky-72B models:

| Tasks | Metric | Qwerky-QwQ-32B | Qwen/QwQ-32B | Qwerky-72B | Qwen2.5-72B-Instruct |
|:---:|:---:|:---:|:---:|:---:|:---:|
| arc_challenge | acc_norm | **0.5640** | 0.5563 | **0.6382** | 0.6323 |
| arc_easy | acc_norm | 0.7837 | **0.7866** | **0.8443** | 0.8329 |
| hellaswag | acc_norm | 0.8303 | **0.8407** | 0.8573 | **0.8736** |
| lambada_openai | acc | 0.6621 | **0.6683** | **0.7539** | 0.7506 |
| piqa | acc | **0.8036** | 0.7976 | 0.8248 | **0.8357** |
| sciq | acc | **0.9630** | **0.9630** | 0.9670 | **0.9740** |
| winogrande | acc | **0.7324** | 0.7048 | **0.7956** | 0.7632 |
| mmlu | acc | 0.7431 | **0.7985** | 0.7746 | **0.8338** |

> *Note: All benchmarks except MMLU are 0-shot and Version 1; for MMLU, it's Version 2.*
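
The task names, metrics, and "Version" labels above follow the conventions of EleutherAI's lm-evaluation-harness. Assuming that harness was used (an assumption on our part, not stated in this card), a run along the following lines should produce comparable numbers; `simple_evaluate` is the harness's public Python entrypoint:

```py
# Minimal sketch for re-running the 0-shot tasks above with
# EleutherAI's lm-evaluation-harness (pip install lm-eval).
# Assumption: these scores came from this harness; adjust batch size
# and device settings for your hardware.
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",
    model_args="pretrained=featherless-ai/Qwerky-QwQ-32B,trust_remote_code=True,dtype=auto",
    tasks=["arc_challenge", "arc_easy", "hellaswag", "lambada_openai",
           "piqa", "sciq", "winogrande"],
    num_fewshot=0,
)
print(results["results"])
```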

## Running with `transformers`

Since this model's architecture is not yet part of the `transformers` library, you will have to enable remote code with the following line.

```py
# ...

model = AutoModelForCausalLM.from_pretrained(
    "featherless-ai/Qwerky-QwQ-32B", trust_remote_code=True
)

# ...
```

Other than enabling remote code, you can run the model like any regular `transformers` model, as shown below.

```py
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "featherless-ai/Qwerky-72B"

# trust_remote_code is required because the architecture ships with the
# checkpoint rather than with the transformers library itself.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = """There is a very famous song that I recall by the singer's surname as Astley.
I can't remember the name or the youtube URL that people use to link as an example url.
What's the song name?"""
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(**model_inputs, max_new_tokens=512)
# Strip the prompt tokens so only the newly generated text is decoded.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
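
If you would rather see tokens as they are produced instead of waiting for the full completion, the `TextStreamer` built into `transformers` plugs straight into `generate`. A minimal sketch, reusing `model`, `tokenizer`, and `model_inputs` from the example above:

```py
from transformers import TextStreamer

# Prints decoded tokens to stdout as they are generated; skip_prompt
# avoids echoing the chat template back to the console.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(**model_inputs, max_new_tokens=512, streamer=streamer)
```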

## Model notes

Linear models offer a promising approach to significantly reducing computational costs at scale, particularly for large context lengths. They enable a >1000x improvement in inference cost, opening the door to o1-style inference-time thinking and wider AI accessibility.
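
To make the cost claim concrete: softmax attention compares every new token against the whole key/value cache, so decoding N tokens costs O(N^2) overall, while a linear-attention layer folds history into a fixed-size recurrent state, making each decode step O(1) in context length. The toy recurrence below illustrates the idea only; real RWKV layers add decay, gating, and normalization on top of it:

```py
import torch

def linear_attention_step(state, k, v, q):
    # Fold the new token's key/value into a fixed d x d state, then read
    # it out with the current query. Work per step is independent of how
    # many tokens came before; that is the source of the linear cost.
    state = state + torch.outer(k, v)
    return state, q @ state

d = 64
state = torch.zeros(d, d)
for _ in range(1000):  # 1000 decode steps, constant work per step
    k, v, q = torch.randn(3, d)
    state, out = linear_attention_step(state, k, v, q)
```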

As demonstrated with Qwerky-72B-Preview and prior models such as QRWKV6-32B Instruct Preview, we have successfully converted Qwen 2.5 QwQ 32B into an RWKV variant without pretraining the base model or retraining it from scratch. This allows us to test and validate the more efficient RWKV linear attention on a much smaller budget. Since the preview, we have continued to refine our technique and have improved the model over the preview iteration.
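
Conceptually, the conversion keeps the parent transformer's embeddings and feed-forward weights and swaps only the attention sub-modules for RWKV-style blocks, which are then trained while the rest of the network stays frozen. The sketch below is a hypothetical illustration of that swap, not the actual conversion code: `RWKVTimeMix` and `swap_attention_for_rwkv` are made-up names, and a real replacement must match the attention block's full input/output interface for the model to run:

```py
import torch.nn as nn

class RWKVTimeMix(nn.Module):
    """Hypothetical placeholder for the RWKV replacement block."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, hidden_states, **kwargs):
        return self.proj(hidden_states)

def swap_attention_for_rwkv(model):
    # Embeddings and MLPs stay as-is; only the attention layers are
    # replaced, so the new blocks can be trained on a far smaller budget
    # than a from-scratch pretrain.
    for layer in model.model.layers:
        layer.self_attn = RWKVTimeMix(model.config.hidden_size)
    return model
```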

As with our previous models, the model's inherent knowledge and training data are inherited from its "parent" model. Consequently, unlike previous RWKV models trained on 100+ languages, the QRWKV model is limited to the approximately 30 languages supported by the Qwen line of models.

You can find details of the conversion process in our previous release, [here](https://huggingface.co/recursal/QRWKV6-32B-Instruct-Preview-v0.1).