Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF • Paper 2410.04612 • Published Oct 6