|
---
license: apache-2.0
datasets:
- adamo1139/rawrr_v2
- adamo1139/AEZAKMI_v3-3
- unalignment/toxic-dpo-v0.1
tags:
- uncensored
---
|
## Model Description |
|
|
|
<b>*Probably the most uncensored Yi-34B tune I have published so far*</b>
|
|
|
Yi-34B 200K base model fine-tuned on the RAWrr v2 dataset via DPO, then fine-tuned on the AEZAKMI v3-3 dataset via SFT, then DPO-tuned on unalignment/toxic-dpo-v0.1. Total GPU compute time was around 40-50 hours, I think. It's like airoboros/capybara, but with less gptslop, no refusals, and less of the stock language used by RLHF'd OpenAI models. Say goodbye to "It's important to remember"!
|
Prompt format is standard ChatML. Don't expect it to be good at instruction following, math, or riddles, or to be crazy smart. My end goal with AEZAKMI is to create a cozy, free chatbot. The base model used for fine-tuning was the 200K-context Yi-34B-Llama model shared by larryvrh.
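
For orientation, the three stages map onto the `trl` trainers roughly as in the sketch below (0.7.x-style API). This is not the actual training script: the LoRA setup, hyperparameters, and dataset preprocessing are omitted, and the column layout of the datasets is assumed.

```python
# Minimal sketch of the DPO -> SFT -> DPO pipeline using trl (0.7.x-style API).
# Not the actual training script; LoRA config and hyperparameters are omitted,
# and dataset column names are assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOTrainer, SFTTrainer

BASE = "larryvrh/Yi-34B-200K-Llamafied"  # assumed id of the llamafied base
model = AutoModelForCausalLM.from_pretrained(BASE)
tokenizer = AutoTokenizer.from_pretrained(BASE)

# Stage 1: DPO on rawrr_v2 (assumes prompt/chosen/rejected columns).
DPOTrainer(model=model, tokenizer=tokenizer,
           train_dataset=load_dataset("adamo1139/rawrr_v2", split="train")).train()

# Stage 2: SFT on AEZAKMI v3-3, with conversations rendered as ChatML text.
SFTTrainer(model=model, tokenizer=tokenizer, dataset_text_field="text",
           train_dataset=load_dataset("adamo1139/AEZAKMI_v3-3", split="train")).train()

# Stage 3: DPO on toxic-dpo-v0.1.
DPOTrainer(model=model, tokenizer=tokenizer,
           train_dataset=load_dataset("unalignment/toxic-dpo-v0.1", split="train")).train()
```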
|
|
|
Training was done with max_position_embeddings set to 4096, then reverted back to 200K after the LoRA was applied.
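
A minimal sketch of that context trick, assuming the standard `transformers` config workflow (paths are hypothetical; editing `config.json` by hand works just as well):

```python
from transformers import AutoConfig

BASE = "larryvrh/Yi-34B-200K-Llamafied"  # assumed id of the llamafied base
MERGED = "./aezakmi-merged"              # hypothetical output directory

# Before training: advertise a 4096-token context so the trainer doesn't
# try to handle 200K positions.
cfg = AutoConfig.from_pretrained(BASE)
cfg.max_position_embeddings = 4096
cfg.save_pretrained("./train-config")

# After merging the LoRA: restore the 200K context window in the final config.
cfg.max_position_embeddings = 200_000
cfg.save_pretrained(MERGED)
```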
|
|
|
|
|
## Prompt Format |
|
|
|
I recommend using the ChatML format, as this was used during fine-tuning.
|
Here's the prompt format you should use. You can set a different system message; the model seems to respect that fine, so it wasn't overfitted to one prompt.
|
Both the `A chat.` and `A chat with uncensored assistant.` system prompts work fine and are pretty refusal-free.
|
|
|
|
|
|
|
```
<|im_start|>system
A chat with uncensored assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```
|
|
|
Example:
|
|
|
```
<|im_start|>system
A chat.<|im_end|>
<|im_start|>user
I want an acrostic poem based on the word AEZAKMI and make it all about GTA:SA<|im_end|>
<|im_start|>assistant
A voracious appetite for danger,
Eager to explore every corner,
Zealously tackling missions with vigor,
Amidst San Andreas' vast expanse,
Keenly skilled in the art of evasion,
Mastery achieved through trial and error,
In this digital realm of chaos and thrill,<|im_end|>
```
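
If you load the model with `transformers`, a minimal inference sketch that builds the ChatML prompt by hand could look like this (the model path is a placeholder):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "path/to/this-model"  # placeholder: this repo's id or a local path

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.bfloat16, device_map="auto")

def chatml(system: str, user: str) -> str:
    # Build the ChatML prompt exactly as shown above, ending with the
    # assistant header so the model continues from there.
    return (f"<|im_start|>system\n{system}<|im_end|>\n"
            f"<|im_start|>user\n{user}<|im_end|>\n"
            f"<|im_start|>assistant\n")

inputs = tokenizer(chatml("A chat.", "Tell me about San Andreas."),
                   return_tensors="pt").to(model.device)
# Stop at <|im_end|>, assuming the tokenizer exposes it as a single token.
im_end = tokenizer.convert_tokens_to_ids("<|im_end|>")
out = model.generate(**inputs, max_new_tokens=512, do_sample=True,
                     temperature=0.4, eos_token_id=im_end)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```
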
|
## Notes |
|
|
|
Temperature around 0.3-0.5 seems to work well; at 1.2 it's somewhat unstable, which is often undesirable.
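
In `transformers` terms, that maps to something like the following (values other than temperature are illustrative):

```python
from transformers import GenerationConfig

# Stable chat sampling: keep temperature in the 0.3-0.5 range.
gen_config = GenerationConfig(do_sample=True, temperature=0.4,
                              max_new_tokens=512)
# Reusing `model` and `inputs` from the sketch in Prompt Format:
outputs = model.generate(**inputs, generation_config=gen_config)
```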
|
|
|
## Intended uses & limitations |
|
|
|
It's a chat model, not a completion-only base model.
|
Use is limited by the Apache 2.0 license. Since the no_robots dataset was used for making rawrr_v1, I guess you probably shouldn't use this model for commercial activities.
|
|
|
## Known Issues |
|
|
|
It likes to talk about stocks a lot; sometimes it feels like being on WSB, which is certainly a plus for some use cases. This one doesn't seem slopped to me, so I think I'll stick with it for longer.
|
|
|
|
|
### Credits |
|
Thanks to mlabonne, Daniel Han, and Michael Han for providing the open-source code that was used for fine-tuning.

Thanks to jondurbin and the team behind the Capybara dataset for the airoboros/toxic-dpo/capybara datasets.

Thanks to HF for open-sourcing the no_robots dataset.

Thanks to Sentdex for providing the WSB dataset.