leaderboard-pr-bot committed on
Commit
ffe72f9
1 Parent(s): e21f570

Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)
  1. README.md +36 -6
README.md CHANGED
@@ -1,5 +1,10 @@
  ---
  license: apache-2.0
  model-index:
  - name: OpenHermes-2.5-neural-chat-v3-3-Slerp
  results:
@@ -17,6 +22,9 @@ model-index:
  - type: acc_norm
  value: 68.09
  name: normalized accuracy
  - task:
  type: text-generation
  name: Text Generation
@@ -30,6 +38,9 @@ model-index:
  - type: acc_norm
  value: 86.2
  name: normalized accuracy
  - task:
  type: text-generation
  name: Text Generation
@@ -44,6 +55,9 @@ model-index:
  - type: acc
  value: 64.26
  name: accuracy
  - task:
  type: text-generation
  name: Text Generation
@@ -57,6 +71,8 @@ model-index:
  metrics:
  - type: mc2
  value: 62.78
  - task:
  type: text-generation
  name: Text Generation
@@ -71,6 +87,9 @@ model-index:
  - type: acc
  value: 79.16
  name: accuracy
  - task:
  type: text-generation
  name: Text Generation
@@ -85,11 +104,9 @@ model-index:
  - type: acc
  value: 67.78
  name: accuracy
- tags:
- - merge
- base_model:
- - teknium/OpenHermes-2.5-Mistral-7B
- - Intel/neural-chat-7b-v3-3
  ---
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6468ce47e134d050a58aa89c/x44nNbPTpv0zGTqA1Jb2q.png)
@@ -176,4 +193,17 @@ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-le
  If you would like to support me:

- [☕ Buy Me a Coffee](https://www.buymeacoffee.com/weyaxi)
 
  ---
  license: apache-2.0
+ tags:
+ - merge
+ base_model:
+ - teknium/OpenHermes-2.5-Mistral-7B
+ - Intel/neural-chat-7b-v3-3
  model-index:
  - name: OpenHermes-2.5-neural-chat-v3-3-Slerp
  results:
 
  - type: acc_norm
  value: 68.09
  name: normalized accuracy
+ - type: acc_norm
+ value: 68.09
+ name: normalized accuracy
  - task:
  type: text-generation
  name: Text Generation
 
  - type: acc_norm
  value: 86.2
  name: normalized accuracy
+ - type: acc_norm
+ value: 86.2
+ name: normalized accuracy
  - task:
  type: text-generation
  name: Text Generation
 
  - type: acc
  value: 64.26
  name: accuracy
+ - type: acc
+ value: 64.26
+ name: accuracy
  - task:
  type: text-generation
  name: Text Generation
 
  metrics:
  - type: mc2
  value: 62.78
+ - type: mc2
+ value: 62.78
  - task:
  type: text-generation
  name: Text Generation
 
  - type: acc
  value: 79.16
  name: accuracy
+ - type: acc
+ value: 79.16
+ name: accuracy
  - task:
  type: text-generation
  name: Text Generation
 
  - type: acc
  value: 67.78
  name: accuracy
+ - type: acc
+ value: 67.78
+ name: accuracy
  ---
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6468ce47e134d050a58aa89c/x44nNbPTpv0zGTqA1Jb2q.png)
 
  If you would like to support me:

+ [☕ Buy Me a Coffee](https://www.buymeacoffee.com/weyaxi)
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_PulsarAI__OpenHermes-2.5-neural-chat-v3-3-Slerp)
+
+ |             Metric              |Value|
+ |---------------------------------|----:|
+ |Avg.                             |71.38|
+ |AI2 Reasoning Challenge (25-Shot)|68.09|
+ |HellaSwag (10-Shot)              |86.20|
+ |MMLU (5-Shot)                    |64.26|
+ |TruthfulQA (0-shot)              |62.78|
+ |Winogrande (5-shot)              |79.16|
+ |GSM8k (5-shot)                   |67.78|
+
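
The `Avg.` row in the added table is consistent with a plain arithmetic mean of the six benchmark scores. A minimal Python sketch to check this is below. The scores are copied from the table; the `scores` dict and the `leaderboard_average` helper are illustrative names, not part of the bot's code.

```python
# Benchmark scores copied from the table added by this PR.
# The dict and function names are illustrative assumptions.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 68.09,
    "HellaSwag (10-Shot)": 86.20,
    "MMLU (5-Shot)": 64.26,
    "TruthfulQA (0-shot)": 62.78,
    "Winogrande (5-shot)": 79.16,
    "GSM8k (5-shot)": 67.78,
}


def leaderboard_average(benchmark_scores: dict[str, float]) -> float:
    """Return the unweighted mean of the scores, rounded to two decimals."""
    return round(sum(benchmark_scores.values()) / len(benchmark_scores), 2)


print(leaderboard_average(scores))  # 71.38
```

The result matches the `Avg.` value reported in the table, assuming every benchmark is weighted equally.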