SnakyMcSnekFace committed
Commit f327f9c • Parent: 9ef4b89
Upload README.md

README.md CHANGED
```diff
@@ -174,6 +174,8 @@ Half of the samples was generated by this model where prompts contained the adve
 
 [KTO](https://arxiv.org/abs/2402.01306) trainer from [Hugging Face TRL library](https://huggingface.co/docs/trl/en/kto_trainer) was employed for performing preference alignment. The LoRA adapter from the previous training stages was merged into the model, and a new LoRA adapter was created for the KTO training. The quantized base model serves as a reference.
 
+During the alignment, the model was encouraged to respect player's actions and agency, construct a coherent narrative, and use evocative language to describe the world and the outcome of the player's actions.
+
 #### QLoRa adapter configuration
 
 - Rank: 16
@@ -210,7 +212,7 @@ The model's performance in Adventure Mode has improved substantially. The writin
 ![Gradient Norm](img/kto_grad_norm.png)
 ![Learning rate](img/kto_learning_rate.png)
 ![Rewards](img/kto_train_rewards.png)
-![Log probabilities](img/
+![Log probabilities](img/kto_train_logps.png)
 ![KL divergence](img/kto_train_kl_divergence.png)
 
 
```
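For context on the KTO setup the diff describes: unlike DPO, TRL's `KTOTrainer` consumes *unpaired* examples, where each completion carries its own boolean desirability label rather than belonging to a chosen/rejected pair. A minimal sketch of assembling records in that shape — the field names follow TRL's documented unpaired preference format, but the helper function and sample texts are illustrative, not from this commit:

```python
def build_kto_records(prompt, desirable, undesirable):
    """Assemble unpaired KTO examples: each completion is labeled
    True (desirable) or False (undesirable) on its own, with no
    pairing between good and bad completions for a prompt."""
    records = []
    for completion in desirable:
        records.append({"prompt": prompt, "completion": completion, "label": True})
    for completion in undesirable:
        records.append({"prompt": prompt, "completion": completion, "label": False})
    return records

# Hypothetical Adventure Mode examples: the desirable completion
# advances the narrative; the undesirable one breaks character.
records = build_kto_records(
    "> open the door",
    desirable=["The door creaks open onto a torchlit hall."],
    undesirable=["As an AI, I cannot open doors."],
)
print(len(records))  # one record per labeled completion
```

A dataset of such records (e.g. via `datasets.Dataset.from_list(records)`) is what gets handed to the trainer, with the merged, quantized base model serving as the frozen reference for the KTO loss.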