Commit 41f55d6, committed by leaderboard-pt-pr-bot
1 Parent(s): 0b29f9a
Adding the Open Portuguese LLM Leaderboard Evaluation Results
This is an automated PR created with https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard
The purpose of this PR is to add evaluation results from the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard) to your model card.
If you encounter any issues, please report them to https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard/discussions
README.md CHANGED
@@ -13,9 +13,9 @@ tags:
 - preference
 - ultrafeedback
 - moe
+base_model: mistralai/Mixtral-8x7B-Instruct-v0.1
 datasets:
 - argilla/ultrafeedback-binarized-preferences-cleaned
-base_model: mistralai/Mixtral-8x7B-Instruct-v0.1
 pipeline_tag: text-generation
 model-index:
 - name: notux-8x7b-v1
@@ -108,3 +108,21 @@ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-le
 |Winogrande (5-shot) |81.61|
 |GSM8k (5-shot) |61.64|
 
+
+# Open Portuguese LLM Leaderboard Evaluation Results
+
+Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/argilla/notux-8x7b-v1) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+
+| Metric | Value |
+|--------------------------|--------|
+|Average |**73.1**|
+|ENEM Challenge (No Images)| 70.96|
+|BLUEX (No Images) | 60.22|
+|OAB Exams | 49.52|
+|Assin2 RTE | 92.66|
+|Assin2 STS | 82.40|
+|FaQuAD NLI | 79.85|
+|HateBR Binary | 77.91|
+|PT Hate Speech Binary | 73.30|
+|tweetSentBR | 71.08|
+
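
Reviewer note: the reported Average can be sanity-checked against the nine per-task scores added by this PR. The sketch below assumes the leaderboard average is a plain unweighted mean of the task scores; the PR does not state this. The names `scores` and `average` are illustrative only and are not part of any leaderboard tooling.

```python
# Per-task scores copied from the table added to README.md in this PR.
scores = {
    "ENEM Challenge (No Images)": 70.96,
    "BLUEX (No Images)": 60.22,
    "OAB Exams": 49.52,
    "Assin2 RTE": 92.66,
    "Assin2 STS": 82.40,
    "FaQuAD NLI": 79.85,
    "HateBR Binary": 77.91,
    "PT Hate Speech Binary": 73.30,
    "tweetSentBR": 71.08,
}

# Assumption: the "Average" row is the unweighted mean of all nine tasks.
average = sum(scores.values()) / len(scores)

# Compare against the value reported in the model card (73.1).
print(f"Recomputed average: {average:.2f}")
```

Under that assumption, the recomputed mean rounds to the 73.1 shown in the Average row, so the table is internally consistent.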