UCLA-AGI/Mistral7B-PairRM-SPPO-Iter2

angelahzyuan committed on May 6

Commit 8201064 (1 parent: 7380dd4)

Update README.md

Files changed (1)

README.md +3 -4

README.md CHANGED

@@ -37,10 +37,9 @@ This model was developed using [Self-Play Preference Optimization](https://arxiv
 | Mistral7B-PairRM-SPPO Iter 1              |     24.79    |   23.51  |     1855    |
 | Mistral7B-PairRM-SPPO Iter 2              |     26.89    |   27.62  |     2019    |
 | Mistral7B-PairRM-SPPO Iter 3              |     28.53    |   31.02  |     2163    |
-| Mistral7B-PairRM-SPPO Iter 1 (best-of-16) |     31.23    |   32.12  |     2035    |
-| Mistral7B-PairRM-SPPO Iter 2 (best-of-16) |     32.13    |   34.94  |     2174    |
-| Mistral7B-PairRM-SPPO Iter 3 (best-of-16) |     31.07    |   31.86  |     2036    |
+| Mistral7B-PairRM-SPPO Iter 1 (best-of-16) |     28.71    |   27.77  |     1901    |
+| Mistral7B-PairRM-SPPO Iter 2 (best-of-16) |     31.23    |   32.12  |     2035    |
+| Mistral7B-PairRM-SPPO Iter 3 (best-of-16) |     32.13    |   34.94  |     2174    |
 ## [Arena-Hard Evaluation Results](https://github.com/lm-sys/arena-hard)
 Model | Score | 95% CI | average \# Tokens |
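In this hunk, the old Iter 1 and Iter 2 best-of-16 values reappear unchanged as the new Iter 2 and Iter 3 values. The edit therefore appears to shift those rows down by one, drop the old Iter 3 values, and add a new Iter 1 row. The minimal Python sketch below checks this using only numbers copied from the diff. `find_shift`, `old_rows` and `new_rows` are hypothetical names chosen for illustration. The three numeric columns are kept positional because their headers fall outside this hunk.

```python
# Best-of-16 rows copied from the diff, keyed by iteration number.
# Assumption: the column headers are not visible in this hunk, so each row is
# kept as a positional tuple of its three numeric cells.
old_rows = {
    1: (31.23, 32.12, 2035),
    2: (32.13, 34.94, 2174),
    3: (31.07, 31.86, 2036),
}
new_rows = {
    1: (28.71, 27.77, 1901),
    2: (31.23, 32.12, 2035),
    3: (32.13, 34.94, 2174),
}


def find_shift(old, new):
    """Return the first offset k (searched from most negative) such that
    old[i] == new[i + k] for every i whose target key exists in new, or None
    if no single offset explains the overlapping rows."""
    for k in range(-len(old) + 1, len(old)):
        pairs = [(i, i + k) for i in old if (i + k) in new]
        if pairs and all(old[i] == new[j] for i, j in pairs):
            return k
    return None


shift = find_shift(old_rows, new_rows)

# Rows in the new table with no counterpart in the old one, and vice versa.
added = {j: v for j, v in new_rows.items() if (j - shift) not in old_rows}
dropped = {i: v for i, v in old_rows.items() if (i + shift) not in new_rows}

print(shift)    # 1
print(added)    # {1: (28.71, 27.77, 1901)}
print(dropped)  # {3: (31.07, 31.86, 2036)}
```

The check compares exact float literals, which is safe here only because both tables are transcribed from the same text; values parsed from different sources would need a tolerance.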