RLHF - a Vigneshwaran Collection

RLHF

updated 6 days ago

ORPO: Monolithic Preference Optimization without Reference Model

Paper • 2403.07691 • Published Mar 12 • 62
sDPO: Don't Use Your Data All at Once

Paper • 2403.19270 • Published Mar 28 • 39
Teaching Large Language Models to Reason with Reinforcement Learning

Paper • 2403.04642 • Published Mar 7 • 46
Best Practices and Lessons Learned on Synthetic Data for Language Models

Paper • 2404.07503 • Published Apr 11 • 29
Rho-1: Not All Tokens Are What You Need

Paper • 2404.07965 • Published Apr 11 • 84
Learn Your Reference Model for Real Good Alignment

Paper • 2404.09656 • Published Apr 15 • 82
Dataset Reset Policy Optimization for RLHF

Paper • 2404.08495 • Published Apr 12 • 8
Insights into Alignment: Evaluating DPO and its Variants Across Multiple Tasks

Paper • 2404.14723 • Published Apr 23 • 10
RLHF Workflow: From Reward Modeling to Online RLHF

Paper • 2405.07863 • Published May 13 • 67
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework

Paper • 2405.11143 • Published May 20 • 34
Mixtures of Experts Unlock Parameter Scaling for Deep RL

Paper • 2402.08609 • Published Feb 13 • 34
Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms

Paper • 2406.02900 • Published Jun 5 • 11
Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs

Paper • 2402.14740 • Published Feb 22 • 10
HelpSteer2: Open-source dataset for training top-performing reward models

Paper • 2406.08673 • Published Jun 12 • 16
Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback

Paper • 2406.09279 • Published Jun 13 • 1
Understanding the performance gap between online and offline alignment algorithms

Paper • 2405.08448 • Published May 14 • 14
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision

Paper • 2312.09390 • Published Dec 14, 2023 • 32
Theoretical guarantees on the best-of-n alignment policy

Paper • 2401.01879 • Published Jan 3
Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint

Paper • 2312.11456 • Published Dec 18, 2023 • 1
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment

Paper • 2304.06767 • Published Apr 13, 2023 • 2
Self-Play Preference Optimization for Language Model Alignment

Paper • 2405.00675 • Published May 1 • 24
Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs

Paper • 2406.10216 • Published Jun 14 • 2
Scaling Laws for Reward Model Overoptimization

Paper • 2210.10760 • Published Oct 19, 2022
AgentInstruct: Toward Generative Teaching with Agentic Flows

Paper • 2407.03502 • Published Jul 3 • 48
Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment

Paper • 2405.17931 • Published May 28
Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning

Paper • 2405.00451 • Published May 1
Foundations of Reinforcement Learning and Interactive Decision Making

Paper • 2312.16730 • Published Dec 27, 2023
Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents

Paper • 2408.07199 • Published Aug 13 • 20
Disentangling Length from Quality in Direct Preference Optimization

Paper • 2403.19159 • Published Mar 28
Imitating Language via Scalable Inverse Reinforcement Learning

Paper • 2409.01369 • Published Sep 2
Contrastive Preference Learning: Learning from Human Feedback without RL

Paper • 2310.13639 • Published Oct 20, 2023 • 24
D2PO: Discriminator-Guided DPO with Response Evaluation Models

Paper • 2405.01511 • Published May 2
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment

Paper • 2408.06266 • Published Aug 12 • 9
Training Language Models to Self-Correct via Reinforcement Learning

Paper • 2409.12917 • Published Sep 19 • 134
The Perfect Blend: Redefining RLHF with Mixture of Judges

Paper • 2409.20370 • Published Sep 30 • 4
HelpSteer2-Preference: Complementing Ratings with Preferences

Paper • 2410.01257 • Published Oct 2 • 19
A Critical Evaluation of AI Feedback for Aligning Large Language Models

Paper • 2402.12366 • Published Feb 19 • 3
Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning

Paper • 2410.08146 • Published Oct 10
RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning

Paper • 2410.02089 • Published Oct 2 • 11
SALSA: Soup-based Alignment Learning for Stronger Adaptation in RLHF

Paper • 2411.01798 • Published 13 days ago • 8