leaderboard-pr-bot committed on
Commit 7f45761
1 Parent(s): fddb2c5

Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)
  1. README.md +187 -65
README.md CHANGED
@@ -1,73 +1,78 @@
  ---
  license: apache-2.0
  base_model: Felladrin/Minueza-32M-Base
  pipeline_tag: text-generation
- language:
- - en
- datasets:
- - databricks/databricks-dolly-15k
- - Felladrin/ChatML-databricks-dolly-15k
- - euclaise/reddit-instruct-curated
- - Felladrin/ChatML-reddit-instruct-curated
- - THUDM/webglm-qa
- - Felladrin/ChatML-WebGLM-QA
- - starfishmedical/webGPT_x_dolly
- - Felladrin/ChatML-webGPT_x_dolly
- - LDJnr/Capybara
- - Felladrin/ChatML-Capybara
- - Open-Orca/SlimOrca-Dedup
- - Felladrin/ChatML-SlimOrca-Dedup
- - HuggingFaceH4/ultrachat_200k
- - Felladrin/ChatML-ultrachat_200k
- - nvidia/HelpSteer
- - Felladrin/ChatML-HelpSteer
- - sablo/oasst2_curated
- - Felladrin/ChatML-oasst2_curated
- - CohereForAI/aya_dataset
- - Felladrin/ChatML-aya_dataset
- - argilla/distilabel-capybara-dpo-7k-binarized
- - Felladrin/ChatML-distilabel-capybara-dpo-7k-binarized
- - argilla/distilabel-intel-orca-dpo-pairs
- - Felladrin/ChatML-distilabel-intel-orca-dpo-pairs
- - argilla/ultrafeedback-binarized-preferences
- - Felladrin/ChatML-ultrafeedback-binarized-preferences
- - sablo/oasst2_dpo_pairs_en
- - Felladrin/ChatML-oasst2_dpo_pairs_en
- - NeuralNovel/Neural-DPO
- - Felladrin/ChatML-Neural-DPO
  widget:
- - messages:
-   - role: system
-     content: >-
-       You are a career counselor. The user will provide you with an individual looking for guidance in their professional life, and your task is to assist them in determining what careers they are most suited for based on their skills, interests, and experience. You should also conduct research into the various options available, explain the job market trends in different industries, and advice on which qualifications would be beneficial for pursuing particular fields.
-   - role: user
-     content: Heya!
-   - role: assistant
-     content: Hi! How may I help you?
-   - role: user
-     content: >-
-       I am interested in developing a career in software engineering. What
-       would you recommend me to do?
- - messages:
-   - role: system
-     content: You are a highly knowledgeable assistant. Help the user as much as you can.
-   - role: user
-     content: How can I become a healthier person?
- - messages:
-   - role: system
-     content: You are a helpful assistant who gives creative responses.
-   - role: user
-     content: Write the specs of a game about mages in a fantasy world.
- - messages:
-   - role: system
-     content: You are a helpful assistant who answers user's questions with details.
-   - role: user
-     content: Tell me about the pros and cons of social media.
- - messages:
-   - role: system
-     content: You are a helpful assistant who answers user's questions with details and curiosity.
-   - role: user
-     content: What are some potential applications for quantum computing?
  inference:
    parameters:
      max_new_tokens: 250
@@ -76,6 +81,109 @@ inference:
      top_p: 0.55
      top_k: 35
      repetition_penalty: 1.176
  ---

  # Minueza-32M-Chat: A chat model with 32 million parameters
@@ -182,3 +290,17 @@ For Direct Preference Optimization:
  | weight_decay | 0 |
  | warmup_ratio | 0.02 |
  | beta | 0.1 |
  ---
+ language:
+ - en
  license: apache-2.0
+ datasets:
+ - databricks/databricks-dolly-15k
+ - Felladrin/ChatML-databricks-dolly-15k
+ - euclaise/reddit-instruct-curated
+ - Felladrin/ChatML-reddit-instruct-curated
+ - THUDM/webglm-qa
+ - Felladrin/ChatML-WebGLM-QA
+ - starfishmedical/webGPT_x_dolly
+ - Felladrin/ChatML-webGPT_x_dolly
+ - LDJnr/Capybara
+ - Felladrin/ChatML-Capybara
+ - Open-Orca/SlimOrca-Dedup
+ - Felladrin/ChatML-SlimOrca-Dedup
+ - HuggingFaceH4/ultrachat_200k
+ - Felladrin/ChatML-ultrachat_200k
+ - nvidia/HelpSteer
+ - Felladrin/ChatML-HelpSteer
+ - sablo/oasst2_curated
+ - Felladrin/ChatML-oasst2_curated
+ - CohereForAI/aya_dataset
+ - Felladrin/ChatML-aya_dataset
+ - argilla/distilabel-capybara-dpo-7k-binarized
+ - Felladrin/ChatML-distilabel-capybara-dpo-7k-binarized
+ - argilla/distilabel-intel-orca-dpo-pairs
+ - Felladrin/ChatML-distilabel-intel-orca-dpo-pairs
+ - argilla/ultrafeedback-binarized-preferences
+ - Felladrin/ChatML-ultrafeedback-binarized-preferences
+ - sablo/oasst2_dpo_pairs_en
+ - Felladrin/ChatML-oasst2_dpo_pairs_en
+ - NeuralNovel/Neural-DPO
+ - Felladrin/ChatML-Neural-DPO
  base_model: Felladrin/Minueza-32M-Base
  pipeline_tag: text-generation
  widget:
+ - messages:
+   - role: system
+     content: You are a career counselor. The user will provide you with an individual
+       looking for guidance in their professional life, and your task is to assist
+       them in determining what careers they are most suited for based on their skills,
+       interests, and experience. You should also conduct research into the various
+       options available, explain the job market trends in different industries, and
+       advice on which qualifications would be beneficial for pursuing particular fields.
+   - role: user
+     content: Heya!
+   - role: assistant
+     content: Hi! How may I help you?
+   - role: user
+     content: I am interested in developing a career in software engineering. What
+       would you recommend me to do?
+ - messages:
+   - role: system
+     content: You are a highly knowledgeable assistant. Help the user as much as you
+       can.
+   - role: user
+     content: How can I become a healthier person?
+ - messages:
+   - role: system
+     content: You are a helpful assistant who gives creative responses.
+   - role: user
+     content: Write the specs of a game about mages in a fantasy world.
+ - messages:
+   - role: system
+     content: You are a helpful assistant who answers user's questions with details.
+   - role: user
+     content: Tell me about the pros and cons of social media.
+ - messages:
+   - role: system
+     content: You are a helpful assistant who answers user's questions with details
+       and curiosity.
+   - role: user
+     content: What are some potential applications for quantum computing?
  inference:
    parameters:
      max_new_tokens: 250
 
      top_p: 0.55
      top_k: 35
      repetition_penalty: 1.176
+ model-index:
+ - name: Minueza-32M-Chat
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: AI2 Reasoning Challenge (25-Shot)
+       type: ai2_arc
+       config: ARC-Challenge
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: acc_norm
+       value: 20.39
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Minueza-32M-Chat
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HellaSwag (10-Shot)
+       type: hellaswag
+       split: validation
+       args:
+         num_few_shot: 10
+     metrics:
+     - type: acc_norm
+       value: 26.54
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Minueza-32M-Chat
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU (5-Shot)
+       type: cais/mmlu
+       config: all
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 25.75
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Minueza-32M-Chat
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: TruthfulQA (0-shot)
+       type: truthful_qa
+       config: multiple_choice
+       split: validation
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: mc2
+       value: 47.27
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Minueza-32M-Chat
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Winogrande (5-shot)
+       type: winogrande
+       config: winogrande_xl
+       split: validation
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 50.99
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Minueza-32M-Chat
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GSM8k (5-shot)
+       type: gsm8k
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 0.0
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Minueza-32M-Chat
+       name: Open LLM Leaderboard
  ---

  # Minueza-32M-Chat: A chat model with 32 million parameters
 
  | weight_decay | 0 |
  | warmup_ratio | 0.02 |
  | beta | 0.1 |
+
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Felladrin__Minueza-32M-Chat)
+
+ |             Metric              |Value|
+ |---------------------------------|----:|
+ |Avg.                             |28.49|
+ |AI2 Reasoning Challenge (25-Shot)|20.39|
+ |HellaSwag (10-Shot)              |26.54|
+ |MMLU (5-Shot)                    |25.75|
+ |TruthfulQA (0-shot)              |47.27|
+ |Winogrande (5-shot)              |50.99|
+ |GSM8k (5-shot)                   | 0.00|
+
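
The `Avg.` row in the table above is consistent with an unweighted mean of the six benchmark scores. The following is a minimal Python sketch of that check. The names `scores` and `leaderboard_average` are illustrative and not part of this PR, and rounding to two decimals is an assumption about how the value is reported.

```python
from statistics import mean

# Scores copied from the evaluation table added in this PR.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 20.39,
    "HellaSwag (10-Shot)": 26.54,
    "MMLU (5-Shot)": 25.75,
    "TruthfulQA (0-shot)": 47.27,
    "Winogrande (5-shot)": 50.99,
    "GSM8k (5-shot)": 0.00,
}


def leaderboard_average(values):
    """Unweighted mean of the scores, rounded to two decimals (assumed convention)."""
    return round(mean(values), 2)


average = leaderboard_average(list(scores.values()))
print(average)  # 28.49, matching the Avg. row
```

Note that the GSM8k score of 0.00 is included in the mean, which pulls the average well below the TruthfulQA and Winogrande scores.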