Secrets of RLHF in Large Language Models Part I: PPO Paper • 2307.04964 • Published Jul 11, 2023 • 28
Safe RLHF: Safe Reinforcement Learning from Human Feedback Paper • 2310.12773 • Published Oct 19, 2023 • 28
Stabilizing RLHF through Advantage Model and Selective Rehearsal Paper • 2309.10202 • Published Sep 18, 2023 • 9
Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment Paper • 2310.00212 • Published Sep 30, 2023 • 2