Transforming and Combining Rewards for Aligning Large Language Models • arXiv:2402.00742 • Published Feb 1, 2024
Leverage the Average: an Analysis of KL Regularization in RL • arXiv:2003.14089 • Published Mar 31, 2020
Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward • arXiv:2404.01258 • Published Apr 1, 2024
UltraFeedback: Boosting Language Models with High-quality Feedback • arXiv:2310.01377 • Published Oct 2, 2023
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization • arXiv:2404.09956 • Published Apr 15, 2024
Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment • arXiv:2404.12318 • Published Apr 18, 2024