chrisociepa committed
Commit 4dee57c
Parent(s): 6b53686

Update README.md

README.md CHANGED
@@ -118,8 +118,7 @@ Models have been evaluated on [Open PL LLM Leaderboard](https://huggingface.co/s
 - Reader (Generator) - open book question answering task, commonly used in RAG
 - Perplexity (lower is better) - as a bonus, does not correlate with other scores and should not be used for model comparison
 
-Current scores of pretrained and continuously pretrained models according to Open PL LLM Leaderboard:
-
+As of April 3, 2024, the following table showcases the current scores of pretrained and continuously pretrained models according to the Open PL LLM Leaderboard, evaluated in a 5-shot setting:
 
 | | Average | RAG Reranking | RAG Reader | Perplexity |
 |--------------------------------------------------------------------------------------|----------:|--------------:|-----------:|-----------:|
@@ -132,7 +131,7 @@ Current scores of pretrained and continuously pretrained models according to Ope
 | mistralai/Mistral-7B-v0.1 | 30.67 | 60.35 | 85.39 | 857.32 |
 | internlm/internlm2-7b | 33.03 | 69.39 | 73.63 | 5498.23 |
 | alpindale/Mistral-7B-v0.2-hf | 33.05 | 60.23 | 85.21 | 932.60 |
-| speakleash/mistral-apt3-7B/spi-e0_hf
+| speakleash/mistral-apt3-7B/spi-e0_hf (experimental) | **35.50** | **62.14** | 87.48 | 132.78 |
 | | | | | |
 | **Models with different sizes:** | | | | |
 | sdadas/polish-gpt2-xl (1.7B) | -23.22 | 48.07 | 3.04 | 160.95 |
@@ -148,7 +147,10 @@ Current scores of pretrained and continuously pretrained models according to Ope
 | [Bielik-7B-Instruct-v0.1](https://huggingface.co/speakleash/Bielik-7B-Instruct-v0.1) | 39.28 | 61.89 | 86.00 | 277.92 |
 
 
-As you can see, Bielik-7B-v0.1 does not have the best Average score, but it has some clear advantages, e.g. the best score in the RAG Reader task.
+As you can see, Bielik-7B-v0.1 does not have the best Average score, but it has some clear advantages, e.g. the best score in the RAG Reader task.
+
+The results in the above table were obtained without utilizing instruction templates for instructional models, instead treating them like base models.
+This approach could skew the results, as instructional models are optimized with specific instructions in mind.
 
 
 ## Limitations and Biases
@@ -201,7 +203,7 @@ The model could not have been created without the commitment and work of the ent
 [Piotr Rybak](https://www.linkedin.com/in/piotrrybak/)
 and many other wonderful researchers and enthusiasts of the AI world.
 
-Members of the ACK Cyfronet AGH team:
+Members of the ACK Cyfronet AGH team providing valuable support and expertise:
 [Szymon Mazurek](https://www.linkedin.com/in/sz-mazurek-ai/).
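The Perplexity column in the diff above is a raw language-modeling score rather than a task metric, which is why the changed README notes it should not be used for model comparison. For context, a minimal sketch of how such a score is commonly computed with the Hugging Face `transformers` API; this is not the leaderboard's actual evaluation code, and the example sentence is an illustrative assumption:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative choice; any causal LM from the table could be scored the same way.
model_id = "speakleash/Bielik-7B-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.eval()

text = "Warszawa jest stolicą Polski."  # any Polish evaluation text
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing the input ids as labels makes the model return the mean
    # cross-entropy over next-token predictions (shifting is handled internally).
    loss = model(**inputs, labels=inputs["input_ids"]).loss

print(f"perplexity: {torch.exp(loss).item():.2f}")  # lower is better
```

Because perplexity depends on the tokenizer and the training distribution, two models can differ wildly on this score while performing similarly on downstream tasks, which is consistent with the caveat in the bullet list.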
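The added note about instruction templates points at the gap between how instruction-tuned models are normally prompted and how they were scored here. A sketch of that difference, assuming the standard `transformers` chat-template API and assuming the instruct model's tokenizer ships a chat template; the question string is a made-up example:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("speakleash/Bielik-7B-Instruct-v0.1")

question = "Jakie mamy pory roku?"  # "What seasons do we have?"

# Base-model treatment (as in the table): the bare text is the whole prompt.
base_prompt = question

# Instruction treatment: the same question wrapped in the model's own template.
instruct_prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": question}],
    tokenize=False,
    add_generation_prompt=True,
)

print(repr(base_prompt))
print(repr(instruct_prompt))  # includes the control tokens the model was tuned on
```

An instruction-tuned model scored on `base_prompt` never sees the control tokens it was fine-tuned around, which is the skew the added paragraph warns about.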