@davanstrien on Hugging Face: "Could more DPO-style preference data be crucial for enhancing open LLMs across…"

Could more DPO-style preference data be crucial for enhancing open LLMs across different languages?

Using a 7k preference dataset, Argilla ( @alvarobartt ), Hugging Face ( @lewtun ) and Kaist AI ( @JW17 & @nlee-208 ) applied Kaist AI's recently introduced ORPO technique (ORPO: Monolithic Preference Optimization without Reference Model, 2403.07691) to the latest MistralAI MoE model to create a very high-performing open LLM: HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1

Since ORPO doesn't require a separate SFT stage, all that is needed is a strong base model + high-quality DPO-style datasets.
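
To make "DPO-style" concrete, here is a minimal sketch of what one preference record looks like and how an ORPO-style objective scores it, based on the description in the ORPO paper. The field names, the example record, the function names, and the weight `lam=0.1` are illustrative assumptions, not the code used to train the Zephyr model. The loss adds an odds-ratio penalty to the usual SFT loss on the chosen response, which is why no separate SFT stage or reference model is needed.

```python
import math

# Hypothetical DPO-style record: one prompt, a preferred and a dispreferred answer.
EXAMPLE_RECORD = {
    "prompt": "¿Cuál es la capital de Francia?",
    "chosen": "La capital de Francia es París.",
    "rejected": "Francia es un país de Europa.",
}

def log_odds(avg_logp: float) -> float:
    """log(p / (1 - p)) with p = exp(avg_logp); requires avg_logp < 0."""
    return avg_logp - math.log1p(-math.exp(avg_logp))

def orpo_loss(chosen_avg_logp: float, rejected_avg_logp: float, lam: float = 0.1) -> float:
    """Sketch of an ORPO-style loss for a single preference pair.

    Inputs are length-normalized (per-token average) log-probabilities that
    the model assigns to the chosen and rejected responses.
    lam is an assumed weighting, not a recommended setting.
    """
    # SFT term: negative log-likelihood of the chosen response.
    sft_term = -chosen_avg_logp
    # Log odds ratio between the chosen and rejected responses.
    log_odds_ratio = log_odds(chosen_avg_logp) - log_odds(rejected_avg_logp)
    # -log(sigmoid(x)) written as log(1 + exp(-x)).
    or_term = math.log1p(math.exp(-log_odds_ratio))
    return sft_term + lam * or_term
```

In a real training run the two log-probabilities would come from a forward pass of the base model over `prompt + chosen` and `prompt + rejected`. The loss is lower when the model already favors the chosen response.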

Currently, non-English DPO datasets are scarce. Filling this gap could substantially improve open LLMs in many languages.

You can get an overview of the current state of DPO datasets across different languages here: DIBT/preference_data_by_language
