Weyaxi leaderboard-pr-bot committed on
Commit
403c2e2
1 Parent(s): c84ec95

Adding Evaluation Results (#1)

- Adding Evaluation Results (7173a4b2b60baff6590a7c81770187e241bae629)


Co-authored-by: Open LLM Leaderboard PR Bot <[email protected]>

Files changed (1)
  1. README.md +117 -1
README.md CHANGED
@@ -57,6 +57,109 @@ datasets:
  - HuggingFaceH4/no_robots
  - OpenAssistant/oasst_top1_2023-08-25
  - WizardLM/WizardLM_evol_instruct_70k
  ---
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6468ce47e134d050a58aa89c/5s12oq859qLfDkkTNam_C.png)
 
@@ -288,4 +391,17 @@ Thanks to all open source AI community.
 
  If you would like to support me:
 
- [☕ Buy Me a Coffee](https://www.buymeacoffee.com/weyaxi)
  - HuggingFaceH4/no_robots
  - OpenAssistant/oasst_top1_2023-08-25
  - WizardLM/WizardLM_evol_instruct_70k
+ model-index:
+ - name: Einstein-v6.1-Llama3-8B
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: AI2 Reasoning Challenge (25-Shot)
+       type: ai2_arc
+       config: ARC-Challenge
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: acc_norm
+       value: 62.46
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v6.1-Llama3-8B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HellaSwag (10-Shot)
+       type: hellaswag
+       split: validation
+       args:
+         num_few_shot: 10
+     metrics:
+     - type: acc_norm
+       value: 82.41
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v6.1-Llama3-8B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU (5-Shot)
+       type: cais/mmlu
+       config: all
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 66.19
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v6.1-Llama3-8B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: TruthfulQA (0-shot)
+       type: truthful_qa
+       config: multiple_choice
+       split: validation
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: mc2
+       value: 55.1
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v6.1-Llama3-8B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Winogrande (5-shot)
+       type: winogrande
+       config: winogrande_xl
+       split: validation
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 79.32
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v6.1-Llama3-8B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GSM8k (5-shot)
+       type: gsm8k
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 66.11
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v6.1-Llama3-8B
+       name: Open LLM Leaderboard
  ---
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6468ce47e134d050a58aa89c/5s12oq859qLfDkkTNam_C.png)
 
 
 
  If you would like to support me:
 
+ [☕ Buy Me a Coffee](https://www.buymeacoffee.com/weyaxi)
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Weyaxi__Einstein-v6.1-Llama3-8B)
+
+ |             Metric              |Value|
+ |---------------------------------|----:|
+ |Avg.                             |68.60|
+ |AI2 Reasoning Challenge (25-Shot)|62.46|
+ |HellaSwag (10-Shot)              |82.41|
+ |MMLU (5-Shot)                    |66.19|
+ |TruthfulQA (0-shot)              |55.10|
+ |Winogrande (5-shot)              |79.32|
+ |GSM8k (5-shot)                   |66.11|
+
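
As a quick sanity check on the table added by this commit, the `Avg.` row can be reproduced from the six benchmark rows. The sketch below assumes `Avg.` is the unweighted arithmetic mean rounded to two decimals; the commit itself does not state how the value is computed, and the helper name `leaderboard_average` is hypothetical.

```python
# Scores copied from the results table added in this commit.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 62.46,
    "HellaSwag (10-Shot)": 82.41,
    "MMLU (5-Shot)": 66.19,
    "TruthfulQA (0-shot)": 55.10,
    "Winogrande (5-shot)": 79.32,
    "GSM8k (5-shot)": 66.11,
}


def leaderboard_average(values):
    """Unweighted arithmetic mean, rounded to two decimals (assumed formula)."""
    values = list(values)
    return round(sum(values) / len(values), 2)


average = leaderboard_average(scores.values())
print(f"Avg. {average:.2f}")  # prints "Avg. 68.60"
```

Under that assumption the computed mean matches the `68.60` reported in the `Avg.` row.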