Thanks.
Hi,
We primarily follow RLHFlow's recipe, except that we train for 2 epochs.