Update README.md
README.md
CHANGED
@@ -31,6 +31,7 @@ The responses are shorter depending on the 'prompt'
 Responds better to Spanish and is more proficient in cryptocurrency news
 knows more about interpreting what the user asks for in Spanish
 the previous model 'NickyNicky/Llama-1B-base-GRPO-RAG-NEWS-SPANISH' towards inference of more tokens. The v2 model responds shorter but with what the user requests.
+controlling sharp rises 'kl', 'loss' and 'grad_norm with 'bleu_reward_func'
 ```
 
 <!-- https://wandb.ai/multimodal_master/huggingface/runs/3fl5oqaw/workspace?nw=nwusertrainllms -->
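
Note on the added line: the diff names `bleu_reward_func` but does not show it. The following is a minimal sketch, not the author's implementation, of what a BLEU-based reward could look like. It assumes the general shape of a GRPO reward function (a list of completions in, one float per completion out), completions passed as plain strings, and a hypothetical `reference` dataset column holding the target answer. The smoothing and whitespace tokenization are also assumptions. A reward bounded in [0, 1] is one plausible reason such a function might help limit spikes in 'kl', 'loss' and 'grad_norm', though the README does not explain the mechanism.

```python
import math
from collections import Counter


def _ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Smoothed sentence-level BLEU in [0, 1] on whitespace tokens."""
    cand, ref = candidate.split(), reference.split()
    if not cand or not ref:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = _ngrams(cand, n), _ngrams(ref, n)
        # Clipped matches: an n-gram counts at most as often as it occurs in the reference.
        overlap = sum((cand_counts & ref_counts).values())
        total = sum(cand_counts.values())
        # Add-one smoothing (an assumption) keeps short outputs from collapsing to zero.
        log_precisions.append(math.log((overlap + 1) / (total + 1)))
    # Brevity penalty lowers the score of completions shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


def bleu_reward_func(completions, reference, **kwargs):
    """Hypothetical reward: one bounded BLEU score per completion."""
    return [sentence_bleu(c, r) for c, r in zip(completions, reference)]
```

Called as `bleu_reward_func(["el bitcoin sube hoy"], reference=["el bitcoin sube hoy"])`, an exact match scores 1.0, and shorter or off-target completions score lower.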