Commit 9c2d8f2 by angelahzyuan (parent: ce3e22a): Update README.md

README.md CHANGED
@@ -8,10 +8,11 @@ pipeline_tag: text-generation
 ---
 Self-Play Preference Optimization for Language Model Alignment (https://arxiv.org/abs/2405.00675)
 
-# Gemma-2-9B-It-SPPO-
+# Gemma-2-9B-It-SPPO-Iter1
 
 This model was developed using [Self-Play Preference Optimization](https://arxiv.org/abs/2405.00675) at iteration 1, using [google/gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it) as the starting point. We utilized the prompt sets from the [openbmb/UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback) dataset, split into 3 parts for 3 iterations following [snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset](https://huggingface.co/datasets/snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset). All responses used are synthetic.
 
+**Terms of Use**: [Terms](https://www.kaggle.com/models/google/gemma/license/consent/verify/huggingface?returnModelRepoId=google/gemma-2-9b-it)
 
 ## Links to Other Models
 - [Gemma-2-9B-It-SPPO-Iter1](https://huggingface.co/UCLA-AGI/Gemma-2-9B-It-SPPO-Iter1)
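As a usage note (not part of the original card): a minimal sketch of loading and querying the model with the Hugging Face `transformers` text-generation pipeline. The repository id `UCLA-AGI/Gemma-2-9B-It-SPPO-Iter1` is taken from the link above; the prompt, dtype, and generation settings are illustrative assumptions, not settings specified by the card.

```python
# Minimal sketch, assuming the standard transformers chat-style pipeline API.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="UCLA-AGI/Gemma-2-9B-It-SPPO-Iter1",  # repo id from "Links to Other Models"
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Gemma-2-It is an instruction-tuned chat model, so we pass message dicts;
# the pipeline applies the tokenizer's chat template automatically.
messages = [{"role": "user", "content": "Summarize self-play preference optimization in one sentence."}]
output = generator(messages, max_new_tokens=128, do_sample=False)
print(output[0]["generated_text"][-1]["content"])
```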