Update README.md
README.md
CHANGED
@@ -9,13 +9,13 @@ base_model:
|
|
9 |
- Qwen/Qwen2.5-7B-Instruct
|
10 |
---
|
11 |
|
12 |
-
|
13 |
|
14 |
The generative reward model used in the paper "Expanding RL with Verifiable Rewards Across Diverse Domains".
|
15 |
|
16 |
Given the question, the label, and the response to be evaluated, the model judges whether the response is correct.
|
17 |
|
18 |
-
|
19 |
|
20 |
> ```python
|
21 |
> # Load model directly
|
@@ -65,4 +65,22 @@ Demo:
|
|
65 |
> output=model.generate(input_ids,do_sample=False)
|
66 |
> judgement=tokenizer.decode(output[0][input_ids.shape[1]:],skip_special_tokens=True)
|
67 |
> print("Model judgement: ",judgement)
|
68 |
-
> ```
|
9 |
- Qwen/Qwen2.5-7B-Instruct
|
10 |
---
|
11 |
|
12 |
+
Model Details
|
13 |
|
14 |
The generative reward model used in the paper "Expanding RL with Verifiable Rewards Across Diverse Domains".
|
15 |
|
16 |
Given the question, the label, and the response to be evaluated, the model judges whether the response is correct.
|
17 |
|
18 |
+
## **Quick start**
|
19 |
|
20 |
> ```python
|
21 |
> # Load model directly
|
|
|
65 |
> output=model.generate(input_ids,do_sample=False)
|
66 |
> judgement=tokenizer.decode(output[0][input_ids.shape[1]:],skip_special_tokens=True)
|
67 |
> print("Model judgement: ",judgement)
|
68 |
+
> ```
|
69 |
+
|
70 |
+
## Use as a remote reward
|
71 |
+
|
72 |
+
```bash
|
73 |
+
# Launch the remote reward server
|
74 |
+
bash launch_reward.sh {MODEL_PATH} {ANSWER_PATH} {METRIC}
|
75 |
+
|
76 |
+
# MODEL_PATH: the path of our generative reward model.
|
77 |
+
# ANSWER_PATH: the path of the training data.
|
78 |
+
# METRIC: greedy/prob
|
79 |
+
# This launches a reward server at http://127.0.0.1:8000/get_reward
|
80 |
+
|
81 |
+
# train
|
82 |
+
bash train.sh {METHOD} {PRETRAIN_PATH} {DATA_PATH} {REWARD_API}
|
83 |
+
|
84 |
+
# Both train.sh and launch_reward.sh can be found in the model directory.
|
85 |
+
# We will release our GitHub repo soon!
|
86 |
+
```
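
Once the reward server is running, a training script needs some way to call the `/get_reward` endpoint. The README does not document the request or response format, so the following is only a minimal sketch: the helper names `build_request` and `get_reward` and the `{"query": [...]}` payload shape are assumptions made for illustration, not the confirmed API. Check `launch_reward.sh` for the fields the server actually expects.

```python
import json
import urllib.request

# Endpoint reported by launch_reward.sh in the README.
REWARD_URL = "http://127.0.0.1:8000/get_reward"


def build_request(queries, url=REWARD_URL):
    """Build a JSON POST request for the reward server.

    ASSUMPTION: the {"query": [...]} payload shape is hypothetical;
    the real server may expect different field names.
    """
    body = json.dumps({"query": list(queries)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def get_reward(queries, url=REWARD_URL, timeout=60):
    """Send the request and return the decoded JSON response unchanged.

    The response schema is not documented in the README, so no
    particular key is read here.
    """
    request = build_request(queries, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server from `launch_reward.sh` running locally, `get_reward(["<text to score>"])` would return whatever JSON the server sends back. Building the request separately from sending it keeps the payload easy to inspect and adjust once the real schema is known.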
|