Could more DPO-style preference data be crucial for enhancing open LLMs across different languages?
Leveraging a 7k preference dataset, Argilla ( @alvarobartt ), Hugging Face ( @lewtun ), and Kaist AI ( @JW17 & @nlee-208 ) used Kaist AI's recently introduced ORPO technique (ORPO: Monolithic Preference Optimization without Reference Model, 2403.07691) with the latest MistralAI MoE model to create a very high-performing open LLM: HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1
Since ORPO doesn't require a separate SFT stage, all that is needed is a strong base model + high-quality DPO-style datasets.
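To make the "no separate SFT stage" point concrete, here is a minimal sketch of the ORPO objective for a single preference pair, following the formulation in the ORPO paper: a standard negative log-likelihood term on the chosen response plus a weighted odds-ratio penalty. It is an illustration, not the training code used for the Zephyr model. The function names, the `lam=0.1` weight, and the `prompt`/`chosen`/`rejected` field names in the sample record are assumptions chosen for this example.

```python
import math

# A DPO-style record pairs one prompt with a preferred and a dispreferred answer.
# The field names below are a common convention, assumed here for illustration.
example_record = {
    "prompt": "Translate 'good morning' into Spanish.",
    "chosen": "Buenos días.",
    "rejected": "Buenas noches.",
}

def log_odds(avg_logp: float) -> float:
    """Return log(p / (1 - p)) given log p, where p is the length-normalized
    likelihood of a response. Requires avg_logp < 0 (that is, p < 1)."""
    return avg_logp - math.log1p(-math.exp(avg_logp))

def orpo_loss(avg_logp_chosen: float, avg_logp_rejected: float, lam: float = 0.1) -> float:
    """ORPO loss for one pair: SFT term + lam * odds-ratio term.

    lam is the weighting hyperparameter; 0.1 is only an illustrative value.
    """
    # SFT term: negative log-likelihood of the chosen response.
    sft_term = -avg_logp_chosen
    # Odds-ratio term: -log sigmoid(log odds(chosen) - log odds(rejected)),
    # written as a numerically stable softplus of the negated difference.
    diff = log_odds(avg_logp_chosen) - log_odds(avg_logp_rejected)
    or_term = max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))
    return sft_term + lam * or_term

# The loss is lower when the model already prefers the chosen response.
good = orpo_loss(math.log(0.6), math.log(0.2))
bad = orpo_loss(math.log(0.2), math.log(0.6))
print(good < bad)
```

Because the SFT term and the preference term live in one loss, a single training pass over preference pairs covers both instruction tuning and alignment, and no frozen reference model is needed.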
Currently, non-English DPO datasets are scarce. Filling this gap could substantially improve open LLMs in those languages.
You can get an overview of the current state of DPO datasets across different languages here: DIBT/preference_data_by_language