Commit
c24ebad
โ€ข
1 Parent(s): 071ac3c

Adding Evaluation Results (#2)

Browse files

- Adding Evaluation Results (a79c60cc5102075969f59803d9733633b733bc46)


Co-authored-by: Open LLM Leaderboard PR Bot <[email protected]>

Files changed (1) hide show
  1. README.md +221 -66
README.md CHANGED
@@ -1,81 +1,222 @@
1
  ---
2
- license: apache-2.0
3
  language:
4
- - en
 
5
  tags:
6
- - text-generation
7
  base_model: JackFram/llama-160m
8
  datasets:
9
- - ehartford/wizard_vicuna_70k_unfiltered
10
- - totally-not-an-llm/EverythingLM-data-V3
11
- - Open-Orca/SlimOrca-Dedup
12
- - databricks/databricks-dolly-15k
13
- - THUDM/webglm-qa
14
  widget:
15
- - text: |-
16
- <|im_start|>system
17
- You are a helpful assistant, who answers with empathy.<|im_end|>
18
- <|im_start|>user
19
- Got a question for you!<|im_end|>
20
- <|im_start|>assistant
21
- Sure! What's it?<|im_end|>
22
- <|im_start|>user
23
- Why do you love cats so much!? ๐Ÿˆ<|im_end|>
24
- <|im_start|>assistant
25
- - text: |-
26
- <|im_start|>system
27
- You are a helpful assistant who answers user's questions with empathy.<|im_end|>
28
- <|im_start|>user
29
- Who is Mona Lisa?<|im_end|>
30
- <|im_start|>assistant
31
- - text: |-
32
- <|im_start|>system
33
- You are a helpful assistant who provides concise responses.<|im_end|>
34
- <|im_start|>user
35
- Heya!<|im_end|>
36
- <|im_start|>assistant
37
- Hi! How may I help you today?<|im_end|>
38
- <|im_start|>user
39
- I need to build a simple website. Where should I start learning about web development?<|im_end|>
40
- <|im_start|>assistant
41
- - text: |-
42
- <|im_start|>user
43
- Invited some friends to come home today. Give me some ideas for games to play with them!<|im_end|>
44
- <|im_start|>assistant
45
- - text: |-
46
- <|im_start|>system
47
- You are a helpful assistant who answers user's questions with details and curiosity.<|im_end|>
48
- <|im_start|>user
49
- What are some potential applications for quantum computing?<|im_end|>
50
- <|im_start|>assistant
51
- - text: |-
52
- <|im_start|>system
53
- You are a helpful assistant who gives creative responses.<|im_end|>
54
- <|im_start|>user
55
- Write the specs of a game about mages in a fantasy world.<|im_end|>
56
- <|im_start|>assistant
57
- - text: |-
58
- <|im_start|>system
59
- You are a helpful assistant who answers user's questions with details.<|im_end|>
60
- <|im_start|>user
61
- Tell me about the pros and cons of social media.<|im_end|>
62
- <|im_start|>assistant
63
- - text: |-
64
- <|im_start|>system
65
- You are a helpful assistant who answers user's questions with confidence.<|im_end|>
66
- <|im_start|>user
67
- What is a dog?<|im_end|>
68
- <|im_start|>assistant
69
- A dog is a four-legged, domesticated animal that is a member of the class Mammalia, which includes all mammals. Dogs are known for their loyalty, playfulness, and ability to be trained for various tasks. They are also used for hunting, herding, and as service animals.<|im_end|>
70
- <|im_start|>user
71
- What is the color of an apple?<|im_end|>
72
- <|im_start|>assistant
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
73
  inference:
74
  parameters:
75
  max_new_tokens: 250
76
  penalty_alpha: 0.5
77
  top_k: 4
78
  repetition_penalty: 1.01
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
79
  ---
80
 
81
  # A Llama Chat Model of 160M Parameters
@@ -143,3 +284,17 @@ output = generate(
143
 
144
  print(output[0]["generated_text"])
145
  ```
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  ---
 
2
  language:
3
+ - en
4
+ license: apache-2.0
5
  tags:
6
+ - text-generation
7
  base_model: JackFram/llama-160m
8
  datasets:
9
+ - ehartford/wizard_vicuna_70k_unfiltered
10
+ - totally-not-an-llm/EverythingLM-data-V3
11
+ - Open-Orca/SlimOrca-Dedup
12
+ - databricks/databricks-dolly-15k
13
+ - THUDM/webglm-qa
14
  widget:
15
+ - text: '<|im_start|>system
16
+
17
+ You are a helpful assistant, who answers with empathy.<|im_end|>
18
+
19
+ <|im_start|>user
20
+
21
+ Got a question for you!<|im_end|>
22
+
23
+ <|im_start|>assistant
24
+
25
+ Sure! What''s it?<|im_end|>
26
+
27
+ <|im_start|>user
28
+
29
+ Why do you love cats so much!? ๐Ÿˆ<|im_end|>
30
+
31
+ <|im_start|>assistant'
32
+ - text: '<|im_start|>system
33
+
34
+ You are a helpful assistant who answers user''s questions with empathy.<|im_end|>
35
+
36
+ <|im_start|>user
37
+
38
+ Who is Mona Lisa?<|im_end|>
39
+
40
+ <|im_start|>assistant'
41
+ - text: '<|im_start|>system
42
+
43
+ You are a helpful assistant who provides concise responses.<|im_end|>
44
+
45
+ <|im_start|>user
46
+
47
+ Heya!<|im_end|>
48
+
49
+ <|im_start|>assistant
50
+
51
+ Hi! How may I help you today?<|im_end|>
52
+
53
+ <|im_start|>user
54
+
55
+ I need to build a simple website. Where should I start learning about web development?<|im_end|>
56
+
57
+ <|im_start|>assistant'
58
+ - text: '<|im_start|>user
59
+
60
+ Invited some friends to come home today. Give me some ideas for games to play
61
+ with them!<|im_end|>
62
+
63
+ <|im_start|>assistant'
64
+ - text: '<|im_start|>system
65
+
66
+ You are a helpful assistant who answers user''s questions with details and curiosity.<|im_end|>
67
+
68
+ <|im_start|>user
69
+
70
+ What are some potential applications for quantum computing?<|im_end|>
71
+
72
+ <|im_start|>assistant'
73
+ - text: '<|im_start|>system
74
+
75
+ You are a helpful assistant who gives creative responses.<|im_end|>
76
+
77
+ <|im_start|>user
78
+
79
+ Write the specs of a game about mages in a fantasy world.<|im_end|>
80
+
81
+ <|im_start|>assistant'
82
+ - text: '<|im_start|>system
83
+
84
+ You are a helpful assistant who answers user''s questions with details.<|im_end|>
85
+
86
+ <|im_start|>user
87
+
88
+ Tell me about the pros and cons of social media.<|im_end|>
89
+
90
+ <|im_start|>assistant'
91
+ - text: '<|im_start|>system
92
+
93
+ You are a helpful assistant who answers user''s questions with confidence.<|im_end|>
94
+
95
+ <|im_start|>user
96
+
97
+ What is a dog?<|im_end|>
98
+
99
+ <|im_start|>assistant
100
+
101
+ A dog is a four-legged, domesticated animal that is a member of the class Mammalia,
102
+ which includes all mammals. Dogs are known for their loyalty, playfulness, and
103
+ ability to be trained for various tasks. They are also used for hunting, herding,
104
+ and as service animals.<|im_end|>
105
+
106
+ <|im_start|>user
107
+
108
+ What is the color of an apple?<|im_end|>
109
+
110
+ <|im_start|>assistant'
111
  inference:
112
  parameters:
113
  max_new_tokens: 250
114
  penalty_alpha: 0.5
115
  top_k: 4
116
  repetition_penalty: 1.01
117
+ model-index:
118
+ - name: Llama-160M-Chat-v1
119
+ results:
120
+ - task:
121
+ type: text-generation
122
+ name: Text Generation
123
+ dataset:
124
+ name: AI2 Reasoning Challenge (25-Shot)
125
+ type: ai2_arc
126
+ config: ARC-Challenge
127
+ split: test
128
+ args:
129
+ num_few_shot: 25
130
+ metrics:
131
+ - type: acc_norm
132
+ value: 24.74
133
+ name: normalized accuracy
134
+ source:
135
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Llama-160M-Chat-v1
136
+ name: Open LLM Leaderboard
137
+ - task:
138
+ type: text-generation
139
+ name: Text Generation
140
+ dataset:
141
+ name: HellaSwag (10-Shot)
142
+ type: hellaswag
143
+ split: validation
144
+ args:
145
+ num_few_shot: 10
146
+ metrics:
147
+ - type: acc_norm
148
+ value: 35.29
149
+ name: normalized accuracy
150
+ source:
151
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Llama-160M-Chat-v1
152
+ name: Open LLM Leaderboard
153
+ - task:
154
+ type: text-generation
155
+ name: Text Generation
156
+ dataset:
157
+ name: MMLU (5-Shot)
158
+ type: cais/mmlu
159
+ config: all
160
+ split: test
161
+ args:
162
+ num_few_shot: 5
163
+ metrics:
164
+ - type: acc
165
+ value: 26.13
166
+ name: accuracy
167
+ source:
168
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Llama-160M-Chat-v1
169
+ name: Open LLM Leaderboard
170
+ - task:
171
+ type: text-generation
172
+ name: Text Generation
173
+ dataset:
174
+ name: TruthfulQA (0-shot)
175
+ type: truthful_qa
176
+ config: multiple_choice
177
+ split: validation
178
+ args:
179
+ num_few_shot: 0
180
+ metrics:
181
+ - type: mc2
182
+ value: 44.16
183
+ source:
184
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Llama-160M-Chat-v1
185
+ name: Open LLM Leaderboard
186
+ - task:
187
+ type: text-generation
188
+ name: Text Generation
189
+ dataset:
190
+ name: Winogrande (5-shot)
191
+ type: winogrande
192
+ config: winogrande_xl
193
+ split: validation
194
+ args:
195
+ num_few_shot: 5
196
+ metrics:
197
+ - type: acc
198
+ value: 51.3
199
+ name: accuracy
200
+ source:
201
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Llama-160M-Chat-v1
202
+ name: Open LLM Leaderboard
203
+ - task:
204
+ type: text-generation
205
+ name: Text Generation
206
+ dataset:
207
+ name: GSM8k (5-shot)
208
+ type: gsm8k
209
+ config: main
210
+ split: test
211
+ args:
212
+ num_few_shot: 5
213
+ metrics:
214
+ - type: acc
215
+ value: 0.0
216
+ name: accuracy
217
+ source:
218
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Llama-160M-Chat-v1
219
+ name: Open LLM Leaderboard
220
  ---
221
 
222
  # A Llama Chat Model of 160M Parameters
 
284
 
285
  print(output[0]["generated_text"])
286
  ```
287
+
288
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
289
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Felladrin__Llama-160M-Chat-v1)
290
+
291
+ | Metric |Value|
292
+ |---------------------------------|----:|
293
+ |Avg. |30.27|
294
+ |AI2 Reasoning Challenge (25-Shot)|24.74|
295
+ |HellaSwag (10-Shot) |35.29|
296
+ |MMLU (5-Shot) |26.13|
297
+ |TruthfulQA (0-shot) |44.16|
298
+ |Winogrande (5-shot) |51.30|
299
+ |GSM8k (5-shot) | 0.00|
300
+