Commit cd98338
Parent: 4ae7957

Adding Evaluation Results (#2)


- Adding Evaluation Results (5a2444834232e41353e466b99c404d93a8db1261)


Co-authored-by: Open LLM Leaderboard PR Bot <[email protected]>

Files changed (1)
  1. README.md +41 -34
README.md CHANGED
@@ -1,29 +1,29 @@
 ---
-license: llama2
 language:
-- en
-tags:
-- mistral
-- merge
+- en
+license: llama2
 library_name: transformers
+tags:
+- mistral
+- merge
+datasets:
+- stingning/ultrachat
+- garage-bAInd/Open-Platypus
+- Open-Orca/OpenOrca
+- TIGER-Lab/MathInstruct
+- OpenAssistant/oasst_top1_2023-08-25
+- teknium/openhermes
+- meta-math/MetaMathQA
+- Open-Orca/SlimOrca
 pipeline_tag: text-generation
 base_model:
-- Weyaxi/OpenHermes-2.5-neural-chat-v3-3-openchat-3.5-1210-Slerp
-- ehartford/dolphin-2.1-mistral-7b
-- Open-Orca/Mistral-7B-OpenOrca
-- bhenrym14/mistral-7b-platypus-fp16
-- ehartford/samantha-1.2-mistral-7b
-- iteknium/CollectiveCognition-v1.1-Mistral-7B
-- HuggingFaceH4/zephyr-7b-alpha
-datasets:
-- stingning/ultrachat
-- garage-bAInd/Open-Platypus
-- Open-Orca/OpenOrca
-- TIGER-Lab/MathInstruct
-- OpenAssistant/oasst_top1_2023-08-25
-- teknium/openhermes
-- meta-math/MetaMathQA
-- Open-Orca/SlimOrca
+- Weyaxi/OpenHermes-2.5-neural-chat-v3-3-openchat-3.5-1210-Slerp
+- ehartford/dolphin-2.1-mistral-7b
+- Open-Orca/Mistral-7B-OpenOrca
+- bhenrym14/mistral-7b-platypus-fp16
+- ehartford/samantha-1.2-mistral-7b
+- iteknium/CollectiveCognition-v1.1-Mistral-7B
+- HuggingFaceH4/zephyr-7b-alpha
 model-index:
 - name: sethuiyer/SynthIQ-7b
   results:
@@ -42,8 +42,7 @@ model-index:
       value: 65.87
       name: normalized accuracy
     source:
-      url: >-
-        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=sethuiyer/SynthIQ-7b
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=sethuiyer/SynthIQ-7b
       name: Open LLM Leaderboard
   - task:
       type: text-generation
@@ -59,8 +58,7 @@ model-index:
       value: 85.82
       name: normalized accuracy
     source:
-      url: >-
-        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=sethuiyer/SynthIQ-7b
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=sethuiyer/SynthIQ-7b
       name: Open LLM Leaderboard
   - task:
       type: text-generation
@@ -77,8 +75,7 @@ model-index:
       value: 64.75
       name: accuracy
     source:
-      url: >-
-        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=sethuiyer/SynthIQ-7b
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=sethuiyer/SynthIQ-7b
       name: Open LLM Leaderboard
   - task:
       type: text-generation
@@ -94,8 +91,7 @@ model-index:
     - type: mc2
       value: 57
     source:
-      url: >-
-        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=sethuiyer/SynthIQ-7b
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=sethuiyer/SynthIQ-7b
       name: Open LLM Leaderboard
   - task:
       type: text-generation
@@ -112,8 +108,7 @@ model-index:
       value: 78.69
       name: accuracy
     source:
-      url: >-
-        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=sethuiyer/SynthIQ-7b
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=sethuiyer/SynthIQ-7b
       name: Open LLM Leaderboard
   - task:
       type: text-generation
@@ -130,8 +125,7 @@ model-index:
       value: 64.06
       name: accuracy
    source:
-      url: >-
-        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=sethuiyer/SynthIQ-7b
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=sethuiyer/SynthIQ-7b
       name: Open LLM Leaderboard
 ---
 
@@ -220,4 +214,17 @@ License is LLama2 license as uukuguy/speechless-mistral-six-in-one-7b is llama2
 Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_sethuiyer__SynthIQ-7b)
 
 # [Nous Benchmark Evalation Results](https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard)
-Detailed results can be found [here](https://gist.github.com/sethuiyer/f47dee388a4e95d46181c98d37d66a58)
+Detailed results can be found [here](https://gist.github.com/sethuiyer/f47dee388a4e95d46181c98d37d66a58)
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_sethuiyer__SynthIQ-7b)
+
+| Metric                          |Value|
+|---------------------------------|----:|
+|Avg.                             |69.37|
+|AI2 Reasoning Challenge (25-Shot)|65.87|
+|HellaSwag (10-Shot)              |85.82|
+|MMLU (5-Shot)                    |64.75|
+|TruthfulQA (0-shot)              |57.00|
+|Winogrande (5-shot)              |78.69|
+|GSM8k (5-shot)                   |64.06|
+
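As a sanity check on the table added by this commit, the Open LLM Leaderboard's "Avg." row is, by the leaderboard's convention, the unweighted arithmetic mean of the six benchmark scores. A minimal sketch (the dictionary below just restates the table's values):

```python
# Scores from the table added in this commit (Open LLM Leaderboard results).
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 65.87,
    "HellaSwag (10-Shot)": 85.82,
    "MMLU (5-Shot)": 64.75,
    "TruthfulQA (0-shot)": 57.00,
    "Winogrande (5-shot)": 78.69,
    "GSM8k (5-shot)": 64.06,
}

# Unweighted arithmetic mean of the six benchmarks.
avg = sum(scores.values()) / len(scores)
print(avg)  # ≈ 69.365, consistent with the table's reported Avg. of 69.37
```

The computed mean agrees with the reported average to two decimal places.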