MaziyarPanahi committed on
Commit 2aebce5 · 1 Parent(s): e85021f

Update README.md (#6)

- Update README.md (b738e7d482fd845942a2f9ddf144acbe6a2c318b)

Files changed (1)
  1. README.md +11 -16
README.md CHANGED
@@ -135,10 +135,19 @@ This model is suitable for a wide range of applications, including but not limit
 
 Coming soon
 
-# 🏆 [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+# 🏆 [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_MaziyarPanahi__calme-2.3-qwen2-72b)
 
+| Metric |Value|
+|-------------------|----:|
+|Avg. |30.17|
+|IFEval (0-Shot) |38.50|
+|BBH (3-Shot) |51.23|
+|MATH Lvl 5 (4-Shot)|14.73|
+|GPQA (0-shot) |16.22|
+|MuSR (0-shot) |11.24|
+|MMLU-PRO (5-shot) |49.10|
 
-Leaderboard 2: coming soon!
 
 
 | Tasks |Version|Filter|n-shot|Metric|Value | |Stderr|
@@ -202,17 +211,3 @@ model = AutoModelForCausalLM.from_pretrained("MaziyarPanahi/calme-2.3-qwen2-72b"
 
 As with any large language model, users should be aware of potential biases and limitations. We recommend implementing appropriate safeguards and human oversight when deploying this model in production environments.
 
-
-# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
-Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_MaziyarPanahi__calme-2.3-qwen2-72b)
-
-| Metric |Value|
-|-------------------|----:|
-|Avg. |30.17|
-|IFEval (0-Shot) |38.50|
-|BBH (3-Shot) |51.23|
-|MATH Lvl 5 (4-Shot)|14.73|
-|GPQA (0-shot) |16.22|
-|MuSR (0-shot) |11.24|
-|MMLU-PRO (5-shot) |49.10|
-
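
Reviewer note: as a sanity check on the results table this commit moves, the short Python sketch below recomputes the `Avg.` row from the six benchmark scores. It assumes `Avg.` is the unweighted arithmetic mean of the six rows, which the README does not state explicitly. The names `scores` and `mean_score` are illustrative and do not come from the repository.

```python
# Scores copied from the table in this commit (values as reported, in percent).
scores = {
    "IFEval (0-Shot)": 38.50,
    "BBH (3-Shot)": 51.23,
    "MATH Lvl 5 (4-Shot)": 14.73,
    "GPQA (0-shot)": 16.22,
    "MuSR (0-shot)": 11.24,
    "MMLU-PRO (5-shot)": 49.10,
}


def mean_score(values):
    """Unweighted arithmetic mean, rounded to two decimals.

    Assumption: the README's 'Avg.' row is a plain mean of the six tasks.
    """
    values = list(values)
    return round(sum(values) / len(values), 2)


average = mean_score(scores.values())
print(f"Avg. = {average:.2f}")
```

Under that assumption the recomputed value agrees with the `30.17` reported in the table, so the moved block is internally consistent.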