---
tags:
- finetune
- fine tune
- dpo
- sft
- yi
license: other
---

THIS MODEL IS EXPERIMENTAL AND MIGHT BE BUGGY; I HAVEN'T PERFECTED THE STRENGTH OF DPO AND SFT YET.

Yi-34B-200K trained via DPO on RAWrr_v1 at ctx 200 (lora_r 4, lora_alpha 8), then via SFT at ctx 1400 (lora_r 16, lora_alpha 32) on AEZAKMI_v2.

It's less prone to refusals than Yi-34B-200K-AEZAKMI-v2, but this is still a work in progress - I want to redo DPO with a higher LoRA rank and ctx, and then repeat the SFT training. I haven't tested it much, but from what I've seen so far, it's a good model.

If you want to reproduce this model by merging LoRAs, start by downloading Yi-34B-200K-Llamafied. \
Then merge it with https://huggingface.co/adamo1139/Yi-34B-200K-rawrr1-LORA-DPO-experimental-r2 \
Then merge the resulting model with https://huggingface.co/adamo1139/yi-34b-200k-aezakmi-v2-rawrr-v1-run1-experimental-LoRA \
(a rough sketch of this two-step merge is shown below)

License: yi-license + non-commercial use only
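
A minimal sketch of that two-step merge using transformers + peft, assuming you have the llamafied base model downloaded locally; the base-model path, dtype, device map, and output directory are placeholders, not tested commands:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Placeholder path to your local copy of Yi-34B-200K-Llamafied (assumption, adjust as needed)
BASE = "path/to/Yi-34B-200K-Llamafied"
DPO_LORA = "adamo1139/Yi-34B-200K-rawrr1-LORA-DPO-experimental-r2"
SFT_LORA = "adamo1139/yi-34b-200k-aezakmi-v2-rawrr-v1-run1-experimental-LoRA"
OUT = "Yi-34B-200K-rawrr-aezakmi-merged"  # placeholder output directory

# Load the llamafied base model
model = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, device_map="auto"
)

# Step 1: apply the RAWrr DPO LoRA and merge it into the base weights
model = PeftModel.from_pretrained(model, DPO_LORA)
model = model.merge_and_unload()

# Step 2: apply the AEZAKMI v2 SFT LoRA on top of the result and merge again
model = PeftModel.from_pretrained(model, SFT_LORA)
model = model.merge_and_unload()

# Save the fully merged model and tokenizer
model.save_pretrained(OUT)
AutoTokenizer.from_pretrained(BASE).save_pretrained(OUT)
```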