This is a TEST. It was made with a custom Orthogonal Activation Steering (OAS) script I shared here: https://huggingface.co/posts/Undi95/318385306588047#663609dc1818d469455c0222 (but be ready to get your hands into some fucked up code, bro).
Steps:
- First, I took Unholy (a fine-tune of Llama 3 on the Toxic dataset)
- Then I trained 2 epochs of DPO on top, with the SAME dataset (https://wandb.ai/undis95/Uncensored8BDPO/runs/3rg4rz13/workspace?nw=nwuserundis95)
- Finally, I applied OAS on top, brute-forcing over the layers to find the one that gave the best result (I don't really understand all of this, sorry)
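
For anyone curious what the OAS step roughly looks like, here is a minimal NumPy sketch on toy data. It is NOT the script linked above, just an illustration under assumptions: the steering direction is taken as the difference of mean activations between two prompt sets, the "best" layer is picked by a stand-in score, and every name here (`steering_direction`, `orthogonalize`, `pick_best_layer`, `toy_score`) is hypothetical. In a real run the activations would come from the model and the score from actually evaluating its outputs.

```python
import numpy as np

def steering_direction(acts_a, acts_b):
    """Unit vector along the difference of mean activations of two prompt sets."""
    diff = acts_a.mean(axis=0) - acts_b.mean(axis=0)
    return diff / np.linalg.norm(diff)

def orthogonalize(weight, direction):
    """Remove the component along `direction` from the outputs of `weight`.

    Assumes y = weight @ x with weight of shape (d_model, d_in), so the
    result can no longer write anything along `direction`.
    """
    r = direction / np.linalg.norm(direction)
    return weight - np.outer(r, r) @ weight

def pick_best_layer(acts_a_by_layer, acts_b_by_layer, score_fn):
    """Brute-force over layers: compute a direction per layer, keep the best-scoring one."""
    best_layer, best_dir, best_score = None, None, -np.inf
    for layer, (a, b) in enumerate(zip(acts_a_by_layer, acts_b_by_layer)):
        r = steering_direction(a, b)
        score = score_fn(layer, r)
        if score > best_score:
            best_layer, best_dir, best_score = layer, r, score
    return best_layer, best_dir

# Toy data (assumption): 4 layers, 16-dim activations, 32 prompts per set.
# Layer 2 gets a large artificial shift along dimension 0.
rng = np.random.default_rng(0)
d_model, n_layers, n_prompts = 16, 4, 32
shift_strength = [0.0, 0.0, 5.0, 0.0]
acts_b = [rng.normal(size=(n_prompts, d_model)) for _ in range(n_layers)]
acts_a = []
for strength in shift_strength:
    a = rng.normal(size=(n_prompts, d_model))
    a[:, 0] += strength
    acts_a.append(a)

def toy_score(layer, r):
    # Stand-in score: gap between the two sets' mean projections onto r.
    return float((acts_a[layer] @ r).mean() - (acts_b[layer] @ r).mean())

best_layer, best_dir = pick_best_layer(acts_a, acts_b, toy_score)

# Apply the direction to a random stand-in weight matrix.
W = rng.normal(size=(d_model, d_model))
W_new = orthogonalize(W, best_dir)
print("best layer:", best_layer)
print("max |r^T W_new|:", np.abs(best_dir @ W_new).max())
```

The idea is that once a weight matrix is orthogonalized against the direction, nothing it outputs has a component along that direction anymore, which is why the last printed value should be numerically zero.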