mnoukhov/summarize_from_feedback_oai_preprocessing_1706381144_relabel_pythia6.9b • Updated Jun 20 • 177k • 61
vwxyzjn/summarize_from_feedback_tldr_3_filtered_oai_preprocessing_1706381144 • Updated Jan 27 • 130k • 1.01k
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models Paper • 2410.18252 • Published 13 days ago • 5