yale-nlp/RefDPO
Model and data collection for our work "Understanding Reference Policies in Direct Preference Optimization" (https://arxiv.org/abs/2407.13709)
Note Datasets