Question about cDPO

by athirdpath - opened Dec 5, 2023


athirdpath

Dec 5, 2023


edited Dec 5, 2023

Hello, and thank you for all your work on MergeKit.

I'm using my Iambe model to produce an uncensored role-playing DPO pairs dataset at the moment, and I'm up to ~3k examples. When you say cDPO, I assume you're referring to this mini-paper? If so, is there an open-source repo out there that supports it? I understand the broad strokes and like what I see, but couldn't implement it myself.

chargoddard

Owner Dec 5, 2023

Hi! Glad you're finding it useful. Your experiments with 20B models are quite interesting.

Yep, that's the mini-paper in question. TRL added support for the cDPO loss function in commit c84e591. You can enable it by passing the label_smoothing argument to DPOTrainer.
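For anyone else reading, here is a minimal pure-Python sketch of the per-pair loss that label_smoothing switches on, assuming it follows the mini-paper's idea: each preference label is treated as flipped with probability ε (label_smoothing), so the loss mixes the usual DPO term with the term for the opposite preference. The helper names log_sigmoid and cdpo_loss are hypothetical, the inputs are assumed to be summed log-probabilities of the chosen and rejected responses under the policy and the frozen reference model, and this is an illustration, not TRL's actual code. As I understand it, ε is meant to stay below 0.5.

```python
import math


def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def cdpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
              ref_chosen_logp: float, ref_rejected_logp: float,
              beta: float = 0.1, label_smoothing: float = 0.0) -> float:
    # Hypothetical helper: conservative DPO loss for a single preference pair.
    # Implicit reward margin: how much more the policy prefers "chosen" over
    # "rejected", relative to the reference model.
    logits = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    # With probability (1 - label_smoothing) the label is assumed correct,
    # with probability label_smoothing it is assumed flipped.
    return (-(1.0 - label_smoothing) * log_sigmoid(beta * logits)
            - label_smoothing * log_sigmoid(-beta * logits))
```

With label_smoothing=0 this reduces to the standard DPO loss. With ε > 0, a pair the policy already separates by a large margin still contributes a noticeable loss, which discourages the model from becoming overconfident on labels that may be noisy.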

athirdpath

Dec 5, 2023

Thank you!

athirdpath changed discussion status to closed Dec 5, 2023
