virtuoussy/Qwen2.5-7B-Instruct-RLVR


virtuoussy committed on Apr 2

Commit 920335c · verified · 1 Parent(s): 1680393

Update README.md

Files changed (1)

README.md +21 -3

@@ -9,13 +9,13 @@ base_model:
 - Qwen/Qwen2.5-7B-Instruct
 ---
-## Model Details
+Model Details
 The generative reward model used in paper "Expanding RL with Verifiable Rewards Across Diverse Domains".
 Inputting the question, label and the response to be evaluated, the model will judge if the response is right.
-Demo:
+## **Quick start**
 > ```python
 > # Load model directly
@@ -65,4 +65,22 @@ Demo:
 > output=model.generate(input_ids,do_sample=False)
 > judgement=tokenizer.decode(output[0][input_ids.shape[1]:],skip_special_tokens=True)
 > print("Model judgement: ",judgement)
-> ```
+> ```
+## Use as a remote reward
+```bash
+# launch a remote reward
+bash launch_reward.sh {MODEL_PATH} {ANSWER_PATH} {METRIC}
+# MODEL_PATH: the path of our generative reward model.
+# ANSWER_PATH: the path of the training data.
+# METRIC: greedy/prob
+# This will launch a reward at http://127.0.0.1:8000/get_reward
+# train
+bash train.sh {METHOD} {PRETRAIN_PATH} {DATA_PATH} {REWARD_API}
+# Both train.sh and launch_reward.sh can be found in the model directory.
+# We will release our github repo soon!
+```
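
The added section lists `METRIC: greedy/prob` without saying how the two modes differ. One plausible reading, offered only as a sketch, is that `greedy` turns the model's decoded judgement into a binary 0/1 reward, while `prob` uses the probability the model assigns to the positive judgement token as a soft reward. The helper `reward_from_logits` and its parameters are hypothetical; they are not taken from `launch_reward.sh`, whose actual behavior may differ.

```python
import math


def reward_from_logits(logit_yes: float, logit_no: float, metric: str = "greedy") -> float:
    """Hypothetical helper: map two judgement-token logits to a scalar reward.

    Assumption: the reward model's first generated token is either a
    "correct" or an "incorrect" judgement, and we are given the logits
    of those two candidate tokens.
    """
    if metric == "greedy":
        # Binary reward: 1.0 only if the positive token would be decoded greedily.
        return 1.0 if logit_yes > logit_no else 0.0
    if metric == "prob":
        # Soft reward: softmax restricted to the two candidate tokens,
        # shifted by the max logit for numerical stability.
        m = max(logit_yes, logit_no)
        e_yes = math.exp(logit_yes - m)
        e_no = math.exp(logit_no - m)
        return e_yes / (e_yes + e_no)
    raise ValueError(f"unknown metric: {metric!r}")
```

Under this reading, `prob` yields a graded signal for uncertain judgements (for example, equal logits give 0.5), whereas `greedy` collapses every judgement to 0 or 1.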