Question about cDPO
#2 by athirdpath - opened
Hello, and thank you for all your work on MergeKit.
I'm currently using my Iambe model to produce an uncensored role-playing dataset of DPO pairs, and I'm up to ~3k examples. When you say cDPO, I assume you're referring to this mini-paper? If so, is there an open-source repo out there that supports it? I understand the broad strokes and like what I see, but couldn't implement it myself.
Hi! Glad you're finding it useful - your experiments with 20b models are quite interesting.
Yep, that's the mini-paper in question. TRL added support for the cDPO loss function in commit c84e591. You can enable it by passing the label_smoothing argument to DPOTrainer.
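For anyone who wants to see what the label_smoothing value does to the loss, here is a minimal per-example sketch in plain Python. It is not TRL's code: the function names, the argument names, and the default beta of 0.1 are assumptions made for illustration. It follows the conservative DPO idea of treating each preference label as flipped with probability label_smoothing, so the loss is a weighted mix of the usual DPO term and the same term with chosen and rejected swapped.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def cdpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,             # assumed default, for illustration only
    label_smoothing: float = 0.0,  # assumed probability that a label is flipped
) -> float:
    """Per-example conservative DPO loss (illustrative sketch)."""
    # Difference between the policy's and the reference model's
    # chosen-vs-rejected log-probability margins.
    logits = (policy_chosen_logp - policy_rejected_logp) - (
        ref_chosen_logp - ref_rejected_logp
    )
    # Weighted mix of the standard DPO term and its label-flipped counterpart.
    return (
        -(1.0 - label_smoothing) * log_sigmoid(beta * logits)
        - label_smoothing * log_sigmoid(-beta * logits)
    )


# With label_smoothing=0.0 this is the plain DPO sigmoid loss.
plain = cdpo_loss(-1.0, -5.0, -3.0, -3.0, label_smoothing=0.0)
# A small positive value penalizes over-confident margins.
smoothed = cdpo_loss(-1.0, -5.0, -3.0, -3.0, label_smoothing=0.1)
```

With label_smoothing set to 0.0 the second term vanishes and plain DPO is recovered. In TRL itself you only set the argument; where exactly it is passed may differ between versions, so check the documentation for the release you have installed.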
Thank you!
athirdpath changed discussion status to closed