TRL documentation
Reward Functions
You are viewing main version, which requires installation from source. If you'd like
regular pip install, checkout the latest stable version (v0.18.1).
Reward Functions
This module contains some useful reward functions, primarily intended for use with the GRPOTrainer.
Format rewards
think_format_reward
trl.rewards.think_format_reward
< source >( completions: list **kwargs ) → list[float]
Parameters
- completions (
list[list[dict[str, str]]]
) — List of completions to be evaluated. Each completion must be a list of one message, i.e. a dictionary containing the key"content"
with the value being the text of the completion. - **kwargs — Additional keyword arguments. This function does not use them, but they are required in the function signature to ensure compatibility with trainers like GRPOTrainer.
Returns
list[float]
A list of rewards, where each reward is 1.0 if the completion matches the expected format, otherwise 0.0.
Reward function that checks if the reasoning process is enclosed within "<think>"
and "</think>"
tags. The
function returns a reward of 1.0 if the format is correct, otherwise 0.0.