qgallouedec (HF staff) committed on

Commit 40e0089
1 parent: f3e4f46

End of training

Files changed (1):

1. README.md +31 -2

README.md CHANGED
```diff
@@ -28,7 +28,7 @@ print(output["generated_text"][1]["content"])
 
 ## Training procedure
 
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/huggingface/huggingface/runs/9fvfu3z2)
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/huggingface/huggingface/runs/pya9ndl2)
 
 This model was trained with XPO, a method introduced in [Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF](https://huggingface.co/papers/2405.21046).
 
@@ -38,4 +38,33 @@ This model was trained with XPO, a method introduced in [Exploratory Preference
 - Transformers: 4.45.0.dev0
 - Pytorch: 2.4.1
 - Datasets: 3.0.0
-- Tokenizers: 0.19.1
+- Tokenizers: 0.19.1
+
+## Citations
+
+Cite XPO as:
+
+```bibtex
+@article{jung2024binary,
+title = {{Binary Classifier Optimization for Large Language Model Alignment}},
+author = {Seungjae Jung and Gunsoo Han and Daniel Wontae Nam and Kyoung{-}Woon On},
+year = 2024,
+eprint = {arXiv:2404.04656}
+}
+
+```
+
+Cite TRL as:
+
+```bibtex
+
+@misc{vonwerra2022trl,
+title = {{TRL: Transformer Reinforcement Learning}},
+author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
+year = 2020,
+journal = {GitHub repository},
+publisher = {GitHub},
+howpublished = {\url{https://github.com/huggingface/trl}}
+
+}
+```
```
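The second hunk keeps the library versions recorded for this training run (Transformers 4.45.0.dev0, PyTorch 2.4.1, Datasets 3.0.0, Tokenizers 0.19.1). Below is a minimal sketch for checking a local environment against that list. It assumes the packages were installed from PyPI under the distribution names `transformers`, `torch`, `datasets`, and `tokenizers`, with "Pytorch" mapped to `torch`. The helper names `installed_version` and `compare_versions` are hypothetical and used only for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the README list in this diff. The PyPI distribution
# names are an assumption; in particular "Pytorch" is mapped to "torch".
EXPECTED = {
    "transformers": "4.45.0.dev0",
    "torch": "2.4.1",
    "datasets": "3.0.0",
    "tokenizers": "0.19.1",
}


def installed_version(dist_name):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def compare_versions(expected, lookup=installed_version):
    """Return {name: (expected, installed, matches)} for each expected package.

    A local build suffix such as "+cu121" is dropped before comparing, so
    "2.4.1+cu121" counts as matching "2.4.1". No other normalization is done.
    """
    report = {}
    for name, want in expected.items():
        have = lookup(name)
        public = have.split("+", 1)[0] if have else None
        report[name] = (want, have, public == want)
    return report


if __name__ == "__main__":
    for name, (want, have, ok) in compare_versions(EXPECTED).items():
        status = "match" if ok else "differs"
        print(f"{name:<13} expected {want:<12} installed {have or 'missing':<12} {status}")
```

The comparison is intentionally strict apart from the local-build suffix. A pre-release such as `4.45.0.dev0` only matches the same pre-release tag, not the final `4.45.0` release.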