Original reward space
#15 · opened by anjaa
Hi, I am a bit confused: what is the original reward space?
It seems the rewards are transformed to the range of -0.5 to 4.5.
The actual rewards of this example from the HelpSteer dataset
are [3, 3, 4, 2, 2] for the five HelpSteer objectives
(helpfulness, correctness, coherence, complexity, verbosity).
We can linearly transform our predicted rewards to the
original reward space to compare with the ground truth:
```python
helpsteer_rewards_pred = multi_obj_rewards[0, :5] * 5 - 0.5
print(helpsteer_rewards_pred)
# [2.78125   2.859375  3.484375  1.3847656 1.296875 ]
```
I linearly transformed the HelpSteer rewards from [0, 4] to [0.1, 0.9] for training the model, using x -> (x + 0.5) / 5. So, to invert back to the original scale [0, 4], I applied x -> x * 5 - 0.5.
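For concreteness, here is a minimal sketch of the two transforms (forward for training, inverse for reading predictions on the original scale), applied to the ground-truth labels from the example above; the helper names `to_training_scale` and `to_helpsteer_scale` are just illustrative:

```python
import torch

# Forward transform used for training: HelpSteer labels in [0, 4] -> [0.1, 0.9]
def to_training_scale(x):
    return (x + 0.5) / 5

# Inverse transform: model-scale values -> original [0, 4] HelpSteer scale
def to_helpsteer_scale(x):
    return x * 5 - 0.5

# Ground-truth HelpSteer labels from the example above
helpsteer_rewards = torch.tensor([3.0, 3.0, 4.0, 2.0, 2.0])

scaled = to_training_scale(helpsteer_rewards)
print(scaled)                      # tensor([0.7000, 0.7000, 0.9000, 0.5000, 0.5000])
print(to_helpsteer_scale(scaled))  # tensor([3., 3., 4., 2., 2.])
```

The round trip recovers the original labels, which is why `multi_obj_rewards[0, :5] * 5 - 0.5` in the snippet above can be compared directly against the HelpSteer ground truth.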
Haoxiang-Wang changed discussion status to closed.