Felladrin leaderboard-pr-bot committed on
Commit 9f472ac • 1 Parent(s): e7f5066

Adding Evaluation Results (#3)

- Adding Evaluation Results (12841743f988b039ee65cab970f5c04835f88bbb)


Co-authored-by: Open LLM Leaderboard PR Bot <[email protected]>

Files changed (1)
  1. README.md +162 -53
README.md CHANGED
@@ -12,59 +12,62 @@ datasets:
  - databricks/databricks-dolly-15k
  - THUDM/webglm-qa
  widget:
- - messages:
- - role: system
- content: You are a helpful assistant, who answers with empathy.
- - role: user
- content: Got a question for you!
- - role: assistant
- content: "Sure! What's it?"
- - role: user
- content: Why do you love cats so much!? 🐈
- - messages:
- - role: system
- content: "You are a helpful assistant who answers user's questions with empathy."
- - role: user
- content: Who is Mona Lisa?
- - messages:
- - role: system
- content: You are a helpful assistant who provides concise responses.
- - role: user
- content: Heya!
- - role: assistant
- content: Hi! How may I help you today?
- - role: user
- content: I need to build a simple website. Where should I start learning about web development?
- - messages:
- - role: user
- content: Invited some friends to come home today. Give me some ideas for games to play with them!
- - messages:
- - role: system
- content: "You are a helpful assistant who answers user's questions with details and curiosity."
- - role: user
- content: What are some potential applications for quantum computing?
- - messages:
- - role: system
- content: You are a helpful assistant who gives creative responses.
- - role: user
- content: Write the specs of a game about mages in a fantasy world.
- - messages:
- - role: system
- content: "You are a helpful assistant who answers user's questions with details."
- - role: user
- content: Tell me about the pros and cons of social media.
- - messages:
- - role: system
- content: "You are a helpful assistant who answers user's questions with confidence."
- - role: user
- content: What is a dog?
- - role: assistant
- content: 'A dog is a four-legged, domesticated animal that is a member of the class Mammalia,
- which includes all mammals. Dogs are known for their loyalty, playfulness, and
- ability to be trained for various tasks. They are also used for hunting, herding,
- and as service animals.'
- - role: user
- content: What is the color of an apple?
+ - messages:
+ - role: system
+ content: You are a helpful assistant, who answers with empathy.
+ - role: user
+ content: Got a question for you!
+ - role: assistant
+ content: Sure! What's it?
+ - role: user
+ content: Why do you love cats so much!? 🐈
+ - messages:
+ - role: system
+ content: You are a helpful assistant who answers user's questions with empathy.
+ - role: user
+ content: Who is Mona Lisa?
+ - messages:
+ - role: system
+ content: You are a helpful assistant who provides concise responses.
+ - role: user
+ content: Heya!
+ - role: assistant
+ content: Hi! How may I help you today?
+ - role: user
+ content: I need to build a simple website. Where should I start learning about
+ web development?
+ - messages:
+ - role: user
+ content: Invited some friends to come home today. Give me some ideas for games
+ to play with them!
+ - messages:
+ - role: system
+ content: You are a helpful assistant who answers user's questions with details
+ and curiosity.
+ - role: user
+ content: What are some potential applications for quantum computing?
+ - messages:
+ - role: system
+ content: You are a helpful assistant who gives creative responses.
+ - role: user
+ content: Write the specs of a game about mages in a fantasy world.
+ - messages:
+ - role: system
+ content: You are a helpful assistant who answers user's questions with details.
+ - role: user
+ content: Tell me about the pros and cons of social media.
+ - messages:
+ - role: system
+ content: You are a helpful assistant who answers user's questions with confidence.
+ - role: user
+ content: What is a dog?
+ - role: assistant
+ content: A dog is a four-legged, domesticated animal that is a member of the class
+ Mammalia, which includes all mammals. Dogs are known for their loyalty, playfulness,
+ and ability to be trained for various tasks. They are also used for hunting,
+ herding, and as service animals.
+ - role: user
+ content: What is the color of an apple?
  inference:
  parameters:
  max_new_tokens: 250
@@ -174,6 +177,98 @@ model-index:
  source:
  url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Llama-160M-Chat-v1
  name: Open LLM Leaderboard
+ - task:
+ type: text-generation
+ name: Text Generation
+ dataset:
+ name: IFEval (0-Shot)
+ type: HuggingFaceH4/ifeval
+ args:
+ num_few_shot: 0
+ metrics:
+ - type: inst_level_strict_acc and prompt_level_strict_acc
+ value: 15.75
+ name: strict accuracy
+ source:
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Felladrin/Llama-160M-Chat-v1
+ name: Open LLM Leaderboard
+ - task:
+ type: text-generation
+ name: Text Generation
+ dataset:
+ name: BBH (3-Shot)
+ type: BBH
+ args:
+ num_few_shot: 3
+ metrics:
+ - type: acc_norm
+ value: 3.17
+ name: normalized accuracy
+ source:
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Felladrin/Llama-160M-Chat-v1
+ name: Open LLM Leaderboard
+ - task:
+ type: text-generation
+ name: Text Generation
+ dataset:
+ name: MATH Lvl 5 (4-Shot)
+ type: hendrycks/competition_math
+ args:
+ num_few_shot: 4
+ metrics:
+ - type: exact_match
+ value: 0.0
+ name: exact match
+ source:
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Felladrin/Llama-160M-Chat-v1
+ name: Open LLM Leaderboard
+ - task:
+ type: text-generation
+ name: Text Generation
+ dataset:
+ name: GPQA (0-shot)
+ type: Idavidrein/gpqa
+ args:
+ num_few_shot: 0
+ metrics:
+ - type: acc_norm
+ value: 1.01
+ name: acc_norm
+ source:
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Felladrin/Llama-160M-Chat-v1
+ name: Open LLM Leaderboard
+ - task:
+ type: text-generation
+ name: Text Generation
+ dataset:
+ name: MuSR (0-shot)
+ type: TAUR-Lab/MuSR
+ args:
+ num_few_shot: 0
+ metrics:
+ - type: acc_norm
+ value: 3.17
+ name: acc_norm
+ source:
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Felladrin/Llama-160M-Chat-v1
+ name: Open LLM Leaderboard
+ - task:
+ type: text-generation
+ name: Text Generation
+ dataset:
+ name: MMLU-PRO (5-shot)
+ type: TIGER-Lab/MMLU-Pro
+ config: main
+ split: test
+ args:
+ num_few_shot: 5
+ metrics:
+ - type: acc
+ value: 1.51
+ name: accuracy
+ source:
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Felladrin/Llama-160M-Chat-v1
+ name: Open LLM Leaderboard
  ---
 
  # A Llama Chat Model of 160M Parameters
@@ -255,3 +350,17 @@ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-le
  |TruthfulQA (0-shot) |44.16|
  |Winogrande (5-shot) |51.30|
  |GSM8k (5-shot) | 0.00|
+
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Felladrin__Llama-160M-Chat-v1)
+
+ | Metric |Value|
+ |-------------------|----:|
+ |Avg. | 4.10|
+ |IFEval (0-Shot) |15.75|
+ |BBH (3-Shot) | 3.17|
+ |MATH Lvl 5 (4-Shot)| 0.00|
+ |GPQA (0-shot) | 1.01|
+ |MuSR (0-shot) | 3.17|
+ |MMLU-PRO (5-shot) | 1.51|
+
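
The `Avg.` row in the added results table can be sanity-checked against the six benchmark rows. The sketch below assumes `Avg.` is the unweighted mean of those six values; the diff itself does not say how the average is derived, and the names `SCORES` and `average_score` are hypothetical helpers introduced only for this illustration.

```python
from statistics import fmean

# Values copied from the "Open LLM Leaderboard Evaluation Results" table
# added by this commit.
SCORES = {
    "IFEval (0-Shot)": 15.75,
    "BBH (3-Shot)": 3.17,
    "MATH Lvl 5 (4-Shot)": 0.00,
    "GPQA (0-shot)": 1.01,
    "MuSR (0-shot)": 3.17,
    "MMLU-PRO (5-shot)": 1.51,
}


def average_score(scores):
    """Return the unweighted mean of the benchmark values.

    Assumption: the table's 'Avg.' row is a plain arithmetic mean.
    """
    return fmean(scores.values())


if __name__ == "__main__":
    # 24.61 / 6 = 4.1016..., which rounds to the 4.10 shown in the table.
    print(f"Avg. = {average_score(SCORES):.2f}")
```

Under that assumption the computed mean agrees with the `4.10` reported in the table to two decimal places.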