Safetensors
qwen2
virtuoussy committed on
Commit 920335c · verified · 1 Parent(s): 1680393

Update README.md

Files changed (1):

1. README.md +21 -3
README.md CHANGED
```diff
@@ -9,13 +9,13 @@ base_model:
  - Qwen/Qwen2.5-7B-Instruct
  ---
 
- ## Model Details
+ Model Details
 
  The generative reward model used in paper "Expanding RL with Verifiable Rewards Across Diverse Domains".
 
  Inputting the question, label and the response to be evaluated, the model will judge if the response is right.
 
- Demo:
+ ## **Quick start**
 
  > ```python
  > # Load model directly
@@ -65,4 +65,22 @@ Demo:
  > output=model.generate(input_ids,do_sample=False)
  > judgement=tokenizer.decode(output[0][input_ids.shape[1]:],skip_special_tokens=True)
  > print("Model judgement: ",judgement)
- > ```
+ > ```
+
+ ## Use as a remote reward
+
+ ```bash
+ # launch a remote reward
+ bash launch_reward.sh {MODEL_PATH} {ANSWER_PATH} {METRIC}
+
+ # MODEL_PATH: the path of our generative reward model.
+ # ANSWER_PATH: the path of the training data.
+ # METRIC: greedy/prob
+ # This will launch a reward at http://127.0.0.1:8000/get_reward
+
+ # train
+ bash train.sh {METHOD} {PRETRAIN_PATH} {DATA_PATH} {REWARD_API}
+
+ # Both train.sh and launch_reward.sh can be found in the model directory.
+ # We will release our github repo soon!
+ ```
```
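The new README section only states that `launch_reward.sh` serves a reward at `http://127.0.0.1:8000/get_reward`; the request and response schema are not documented in this commit. Below is a minimal Python client sketch, assuming an OpenRLHF-style remote reward protocol: a JSON POST whose `query` field is a list of strings (prompt plus response) and a JSON reply carrying a `rewards` list. Under that assumption the server would look up each reference label itself, which would be consistent with `launch_reward.sh` taking `ANSWER_PATH`. The field names and the helpers `build_payload`, `parse_rewards` and `get_rewards` are hypothetical, not the documented interface of `launch_reward.sh`.

```python
import json
import urllib.request
from typing import Any, Dict, List, Sequence

# Endpoint that the README says launch_reward.sh starts.
REWARD_URL = "http://127.0.0.1:8000/get_reward"


def build_payload(queries: Sequence[str]) -> bytes:
    """Encode the request body.

    ASSUMPTION: the server expects an OpenRLHF-style body {"query": [...]},
    where each entry is a prompt+response string to be judged.
    """
    return json.dumps({"query": list(queries)}).encode("utf-8")


def parse_rewards(body: bytes) -> List[float]:
    """Decode the reply. ASSUMPTION: it is JSON with a "rewards" list."""
    data: Dict[str, Any] = json.loads(body.decode("utf-8"))
    return [float(r) for r in data["rewards"]]


def get_rewards(queries: Sequence[str], url: str = REWARD_URL,
                timeout: float = 60.0) -> List[float]:
    """POST a batch of queries to the remote reward; return one score per query."""
    request = urllib.request.Request(
        url,
        data=build_payload(queries),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # The reward server runs locally, so bypass any proxy set in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=timeout) as response:
        return parse_rewards(response.read())
```

With a server started by `launch_reward.sh`, `get_rewards([...])` would return one float per query, provided the server really follows the assumed schema; if it does not, only `build_payload` and `parse_rewards` need to change.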
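`METRIC: greedy/prob` is listed without further explanation. One plausible reading, given only as an illustration and not as what `launch_reward.sh` actually does: `greedy` maps the decoded judgement to a hard 0/1 score, while `prob` uses the model's probability of a positive verdict as a soft score. The sketch assumes the verdict words are `YES`/`NO` and that the caller already has the logits of those two tokens at the first generated position; the verdict vocabulary and the helper names `greedy_reward` and `prob_reward` are hypothetical.

```python
import math


def greedy_reward(judgement: str, positive: str = "YES") -> float:
    """Hard reward: 1.0 if the decoded judgement starts with the positive verdict.

    ASSUMPTION: the reward model answers with a verdict word such as "YES"/"NO".
    """
    return 1.0 if judgement.strip().upper().startswith(positive) else 0.0


def prob_reward(logit_yes: float, logit_no: float) -> float:
    """Soft reward: P(YES) renormalised over the two verdict tokens only.

    A two-way softmax equals a sigmoid of the logit difference; both branches
    below are the same sigmoid, written to avoid overflow in math.exp.
    """
    diff = logit_yes - logit_no
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    z = math.exp(diff)
    return z / (1.0 + z)
```

For example, `greedy_reward("YES")` gives `1.0` and `prob_reward(2.0, 2.0)` gives `0.5`. Renormalising over just the two verdict tokens, instead of using the raw probability over the whole vocabulary, is a choice made in this sketch: it keeps the soft score in [0, 1] and makes it symmetric between the two verdicts even when some probability mass goes to other tokens.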