SuperHOT Prototype 2 w/ 8K Context
This is a second prototype of SuperHOT, an NSFW-focused LoRA, this time 7B with 8K context and no RLHF, using the same technique described in the GitHub blog.
Looking for Merged & Quantized Models?
Make some please :)
Using the monkey-patch?
You will NEED to apply the monkey-patch or, if you are already using the monkey-patch, change the scaling factor to 0.25 and the maximum sequence length to 8192.

The monkey-patch is only necessary if you are using a front-end/back-end that does not already support scaling and said front-end/back-end is Python-based (i.e. Huggingface Transformers). To apply the patch, you will need to copy `llama_rope_scaled_monkey_patch.py` into your working directory and call the exported function `replace_llama_rope_with_scaled_rope` at the very start of your Python program. It will modify the Transformers library's implementation of RoPE to properly apply the scaling factor.
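For example, if you load the model through Huggingface Transformers, the patch call goes before any model is constructed. This is a minimal sketch, not an exact loading recipe: the model paths are placeholders, and the PEFT step stands in for however you normally attach the LoRA to the base model.

```python
# Minimal sketch, assuming llama_rope_scaled_monkey_patch.py has been copied
# into the working directory as described above.
from llama_rope_scaled_monkey_patch import replace_llama_rope_with_scaled_rope

# Patch Transformers' RoPE implementation *before* any model is loaded,
# so the 0.25 scaling factor is applied when the rotary embeddings are built.
replace_llama_rope_with_scaled_rope()

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Placeholder paths -- substitute your own base model and this LoRA.
base = AutoModelForCausalLM.from_pretrained("path/to/llama-7b")
model = PeftModel.from_pretrained(base, "path/to/superhot-8k-lora")
tokenizer = AutoTokenizer.from_pretrained("path/to/llama-7b")
```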
Using Oobabooga with Exllama?
Switch your loader to `exllama` or `exllama_hf` and add the arguments `max_seq_len 8192` and `compress_pos_emb 4`. While the model may work well with `compress_pos_emb 2`, it was trained on 4, so that is what I recommend you use.
Example in the command-line:
python server.py --max_seq_len 8192 --compress_pos_emb 4 --loader exllama_hf
In the UI, you will see the loader option in the Models tab. Once you select either `exllama` or `exllama_hf`, the `max_seq_len` and `compress_pos_emb` settings will appear.
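For reference, `compress_pos_emb` is simply the reciprocal of the monkey-patch's scaling factor, so 4 here corresponds to the 0.25 above. A small sketch of the arithmetic:

```python
# compress_pos_emb divides the position indices; the monkey-patch expresses the
# same idea as a multiplicative scaling factor. Values mirror the settings above.
compress_pos_emb = 4
scaling_factor = 1 / compress_pos_emb       # 0.25, the value used with the monkey-patch
native_ctx = 2048                           # LLaMA's original context window
max_seq_len = native_ctx * compress_pos_emb # 8192
print(scaling_factor, max_seq_len)          # 0.25 8192
```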
Training Details
I trained the LoRA with the following configuration (a rough peft-style mapping is sketched after the list):
- 1200 samples (~400 samples over 2048 sequence length)
- learning rate of 3e-4
- 3 epochs
- The exported modules are:
- q_proj
- k_proj
- v_proj
- o_proj
- no bias
- Rank = 4
- Alpha = 8
- no dropout
- weight decay of 0.1
- AdamW with beta1 of 0.9, beta2 of 0.99, and epsilon of 1e-5
- Trained on 4-bit base model
- Cutoff length: 4096
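Roughly, these settings map onto a peft `LoraConfig` and Transformers `TrainingArguments` as sketched below. This is illustrative only, not the exact training script; dataset handling and 4-bit base-model loading are omitted, and the output path is a placeholder.

```python
from peft import LoraConfig
from transformers import TrainingArguments

# Illustrative mapping of the hyperparameters listed above; not the exact script.
lora_config = LoraConfig(
    r=4,                      # Rank = 4
    lora_alpha=8,             # Alpha = 8
    lora_dropout=0.0,         # no dropout
    bias="none",              # no bias
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="superhot-lora-out",  # placeholder path
    learning_rate=3e-4,
    num_train_epochs=3,
    weight_decay=0.1,
    adam_beta1=0.9,
    adam_beta2=0.99,
    adam_epsilon=1e-5,
    optim="adamw_torch",
)
```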