Trained without ChatML templating. This model uses a pattern of:

The primary utility of this model is to synthesize rejected / lower-quality preference data from pre-existing SFT data (i.e., the general pretraining corpus).

This is useful in the context of teaching a reward model **generalized preferences** from lower-quality, subtly incoherent, base-model-esque completions, which are trivial to produce compared to human annotations.
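As an illustrative sketch (not the exact pipeline used to train or apply this model), the snippet below pairs an existing SFT completion as the `chosen` response with a completion sampled from this model as the `rejected` response. The model id, example SFT rows, and sampling parameters are placeholders; prompts should follow the no-ChatML pattern described above.

```python
# Sketch: build (prompt, chosen, rejected) preference pairs for reward model /
# DPO-style training. "chosen" = original SFT answer, "rejected" = completion
# sampled from this model. Model id and data below are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/this-model"  # placeholder: substitute the actual repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

sft_rows = [  # placeholder rows; in practice, stream your SFT corpus here
    {"prompt": "Explain what a reward model does.",
     "response": "A reward model assigns a scalar score to a candidate response..."},
]

pairs = []
for row in sft_rows:
    # No ChatML templating: the raw prompt is fed using the pattern shown above.
    inputs = tokenizer(row["prompt"], return_tensors="pt").to(model.device)
    output = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,   # sampling preserves the subtly incoherent character
        temperature=1.0,
        top_p=0.95,
    )
    # Strip the prompt tokens so only the generated continuation remains.
    rejected = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                                skip_special_tokens=True)
    pairs.append({
        "prompt": row["prompt"],
        "chosen": row["response"],  # human / SFT completion, preferred
        "rejected": rejected,       # cheaper, lower-quality completion
    })
```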
## Acknowledgements
Trained on 8xH200s provided free of charge by [Deepshard](https://github.com/deepshard) for research & open source experimentation. Big McThankies.