ORPO: Monolithic Preference Optimization without Reference Model
Paper • arXiv:2403.07691
Models and datasets for aligning LLMs with Odds Ratio Preference Optimization (ORPO). Recipes are available at https://github.com/huggingface/alignment-handbook