angelahzyuan
committed on
Commit 8201064 • 1 Parent(s): 7380dd4
Update README.md
README.md
CHANGED
@@ -37,10 +37,9 @@ This model was developed using [Self-Play Preference Optimization](https://arxiv
 | Mistral7B-PairRM-SPPO Iter 1 | 24.79 | 23.51 | 1855 |
 | Mistral7B-PairRM-SPPO Iter 2 | 26.89 | 27.62 | 2019 |
 | Mistral7B-PairRM-SPPO Iter 3 | 28.53 | 31.02 | 2163 |
-| Mistral7B-PairRM-SPPO Iter 1 (best-of-16) |
-| Mistral7B-PairRM-SPPO Iter 2 (best-of-16) |
-| Mistral7B-PairRM-SPPO Iter 3 (best-of-16) |
-
+| Mistral7B-PairRM-SPPO Iter 1 (best-of-16) | 28.71 | 27.77 | 1901 |
+| Mistral7B-PairRM-SPPO Iter 2 (best-of-16) | 31.23 | 32.12 | 2035 |
+| Mistral7B-PairRM-SPPO Iter 3 (best-of-16) | 32.13 | 34.94 | 2174 |
 ## [Arena-Hard Evaluation Results](https://github.com/lm-sys/arena-hard)
 
 Model | Score | 95% CI | average \# Tokens |