pszemraj and leaderboard-pr-bot committed
Commit d7e029a
1 Parent(s): 8845b1d

Adding Evaluation Results (#2)


- Adding Evaluation Results (9183dd10def4d6381f87e5a06fe52484263e35e4)


Co-authored-by: Open LLM Leaderboard PR Bot <[email protected]>

Files changed (1)
  1. README.md +106 -0
README.md CHANGED
@@ -154,6 +154,98 @@ model-index:
     source:
       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-GQA
       name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: HuggingFaceH4/ifeval
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 23.86
+      name: strict accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-GQA
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: BBH
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 3.04
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-GQA
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: hendrycks/competition_math
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 0.0
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-GQA
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 0.78
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-GQA
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 9.07
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-GQA
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 1.66
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-GQA
+      name: Open LLM Leaderboard
 ---
 
 
@@ -194,3 +286,17 @@ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-le
 |Winogrande (5-shot) |50.99|
 |GSM8k (5-shot) | 0.68|
 
+
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_BEE-spoke-data__smol_llama-220M-GQA)
+
+| Metric |Value|
+|-------------------|----:|
+|Avg. | 6.62|
+|IFEval (0-Shot) |23.86|
+|BBH (3-Shot) | 3.04|
+|MATH Lvl 5 (4-Shot)| 0.00|
+|GPQA (0-shot) | 0.78|
+|MuSR (0-shot) | 9.07|
+|MMLU-PRO (5-shot) | 1.66|
+
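
The model-index block added above is machine-readable, so the same scores can be pulled back out of the model card programmatically. A minimal sketch, assuming the `huggingface_hub` `ModelCard` API and its `EvalResult` fields (verify the exact attribute names against your installed version):

```python
# Minimal sketch: read the evaluation results added by this PR back out of
# the model card's model-index metadata.
# Assumes `pip install huggingface_hub`, a version that parses model-index
# entries into EvalResult objects; field names may differ across versions.
from huggingface_hub import ModelCard

card = ModelCard.load("BEE-spoke-data/smol_llama-220M-GQA")

# One EvalResult per metric entry in the model-index block.
for result in card.data.eval_results or []:
    print(f"{result.dataset_name}: {result.metric_type} = {result.metric_value}")
```

Run against this commit, it should print lines like `IFEval (0-Shot): inst_level_strict_acc and prompt_level_strict_acc = 23.86`.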