Collections
Collections including paper arxiv:2404.01954
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 48
- HyperCLOVA X Technical Report
  Paper • 2404.01954 • Published • 19
- Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization
  Paper • 2404.09956 • Published • 11
- Learn Your Reference Model for Real Good Alignment
  Paper • 2404.09656 • Published • 82