EpistemeAI/Fireball-Alpaca-Llama3.1.08-8B-Philos-C-R1-KTO-beta

legolasyiu committed on Sep 16

Commit 5dc0892 • 1 Parent(s): a6da892

Update README.md

Files changed (1)

README.md +4 -2

README.md CHANGED

@@ -9,6 +9,8 @@ tags:
 - unsloth
 - llama
 - trl
+datasets:
+- argilla/distilabel-intel-orca-kto
 ---
 <img src="https://huggingface.co/EpistemeAI/Fireball-Llama-3.1-8B-v1dpo/resolve/main/fireball-llama.JPG" width="200"/>
@@ -104,7 +106,7 @@ Where to send questions or comments about the model Instructions on how to provi
 ## Training
 **KTO Fine tuning**:
-Experimental: KTO fine tuning
+Experimental: KTO fine tuning with dataset- argilla/distilabel-intel-orca-kto
 KTO - Kahneman-Tversky Optimization (KTO) that makes it easier and cheaper than ever before to align LLMs on your data without compromising performance
@@ -225,4 +227,4 @@ But Llama 3.1 is a new technology, and like any new technology, there are risks
 This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
-[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
+[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)