Original reward space
#15 · opened by anjaa
Hi, I am a bit confused: what is the original reward space?
It seems the rewards are transformed to the range of -0.5 to 4.5.
The actual rewards of this example from the HelpSteer dataset
are [3, 3, 4, 2, 2] for the five HelpSteer objectives
(helpfulness, correctness, coherence, complexity, verbosity).
We can linearly transform our predicted rewards to the
original reward space to compare with the ground truth:
```python
helpsteer_rewards_pred = multi_obj_rewards[0, :5] * 5 - 0.5
print(helpsteer_rewards_pred)
# [2.78125   2.859375  3.484375  1.3847656 1.296875 ]
```
I linearly transformed the HelpSteer rewards from [0, 4] to [0.1, 0.9] for training the model, using x -> (x + 0.5) / 5. So, to invert back to the original scale [0, 4], I applied x -> x * 5 - 0.5.
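For concreteness, here is a minimal sketch of the two transforms (forward for training, inverse for reading predictions on the original scale), applied to the ground-truth labels from the example above; the helper names `to_training_scale` and `to_helpsteer_scale` are just illustrative:

```python
import torch

# Forward transform used for training: HelpSteer labels in [0, 4] -> [0.1, 0.9]
def to_training_scale(x):
    return (x + 0.5) / 5

# Inverse transform: model-scale values -> original [0, 4] HelpSteer scale
def to_helpsteer_scale(x):
    return x * 5 - 0.5

# Ground-truth HelpSteer labels from the example above
helpsteer_rewards = torch.tensor([3.0, 3.0, 4.0, 2.0, 2.0])

scaled = to_training_scale(helpsteer_rewards)
print(scaled)                      # tensor([0.7000, 0.7000, 0.9000, 0.5000, 0.5000])
print(to_helpsteer_scale(scaled))  # tensor([3., 3., 4., 2., 2.])
```

The round trip recovers the original labels, which is why `multi_obj_rewards[0, :5] * 5 - 0.5` in the snippet above can be compared directly against the HelpSteer ground truth.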
Haoxiang-Wang changed discussion status to closed.