Trained without ChatML templating. This model uses a pattern of:

The primary utility of this model is to synthesize rejected / lower-quality preference data from pre-existing SFT data (i.e., the general pretraining corpus).

This is useful in the context of teaching a reward model **generalized preferences** from lower-quality, subtly incoherent, base-model-esque completions, which are trivial to produce compared to human annotations.
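As an illustrative sketch (not the exact pipeline used to train or apply this model), the snippet below pairs an existing SFT completion as the `chosen` response with a completion sampled from this model as the `rejected` response. The model id, example SFT rows, and sampling parameters are placeholders; prompts should follow the no-ChatML pattern described above.

```python
# Sketch: build (prompt, chosen, rejected) preference pairs for reward model /
# DPO-style training. "chosen" = original SFT answer, "rejected" = completion
# sampled from this model. Model id and data below are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/this-model"  # placeholder: substitute the actual repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

sft_rows = [  # placeholder rows; in practice, stream your SFT corpus here
    {"prompt": "Explain what a reward model does.",
     "response": "A reward model assigns a scalar score to a candidate response..."},
]

pairs = []
for row in sft_rows:
    # No ChatML templating: the raw prompt is fed using the pattern shown above.
    inputs = tokenizer(row["prompt"], return_tensors="pt").to(model.device)
    output = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,   # sampling preserves the subtly incoherent character
        temperature=1.0,
        top_p=0.95,
    )
    # Strip the prompt tokens so only the generated continuation remains.
    rejected = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                                skip_special_tokens=True)
    pairs.append({
        "prompt": row["prompt"],
        "chosen": row["response"],  # human / SFT completion, preferred
        "rejected": rejected,       # cheaper, lower-quality completion
    })
```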
## Acknowledgements
Trained on 8xH200s provided free of charge by [Deepshard](https://github.com/deepshard) for research & open source experimentation. Big McThankies.