leaderboard-pt-pr-bot committed on
Commit 6b85ca3 • 1 Parent(s): f06ca5c

Adding the Open Portuguese LLM Leaderboard Evaluation Results


This is an automated PR created with https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard

The purpose of this PR is to add evaluation results from the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard) to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard/discussions
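
Because the scores land in the card's YAML front matter as a `model-index` block, they stay machine-readable. As a minimal sketch (assuming the `huggingface_hub` library and the standard `model-index` schema; this is not part of the PR itself), the added metadata can be read back like this:

```python
# Minimal sketch: read the metadata this PR adds back out of the model card.
# Assumes huggingface_hub is installed; the repo id is the model this PR targets.
from huggingface_hub import ModelCard

card = ModelCard.load("adalbertojunior/Llama-3-8B-Dolphin-Portuguese-v0.3")
metadata = card.data.to_dict()

# Each model-index entry pairs a task/dataset with the metric reported on the leaderboard.
for result in metadata.get("model-index", [{}])[0].get("results", []):
    dataset = result["dataset"]["name"]
    metric = result["metrics"][0]
    print(f"{dataset}: {metric['type']} = {metric['value']}")
```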

Files changed (1)
  1. README.md +167 -1
README.md CHANGED
@@ -1,6 +1,153 @@
 ---
 library_name: transformers
 tags: []
+ model-index:
+ - name: Llama-3-8B-Dolphin-Portuguese-v0.3
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: ENEM Challenge (No Images)
+       type: eduagarcia/enem_challenge
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 68.86
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=adalbertojunior/Llama-3-8B-Dolphin-Portuguese-v0.3
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BLUEX (No Images)
+       type: eduagarcia-temp/BLUEX_without_images
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 57.86
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=adalbertojunior/Llama-3-8B-Dolphin-Portuguese-v0.3
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: OAB Exams
+       type: eduagarcia/oab_exams
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 61.91
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=adalbertojunior/Llama-3-8B-Dolphin-Portuguese-v0.3
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 RTE
+       type: assin2
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 93.05
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=adalbertojunior/Llama-3-8B-Dolphin-Portuguese-v0.3
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 STS
+       type: eduagarcia/portuguese_benchmark
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: pearson
+       value: 76.48
+       name: pearson
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=adalbertojunior/Llama-3-8B-Dolphin-Portuguese-v0.3
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: FaQuAD NLI
+       type: ruanchaves/faquad-nli
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 76.78
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=adalbertojunior/Llama-3-8B-Dolphin-Portuguese-v0.3
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HateBR Binary
+       type: ruanchaves/hatebr
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 83.25
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=adalbertojunior/Llama-3-8B-Dolphin-Portuguese-v0.3
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: PT Hate Speech Binary
+       type: hate_speech_portuguese
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 68.85
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=adalbertojunior/Llama-3-8B-Dolphin-Portuguese-v0.3
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: tweetSentBR
+       type: eduagarcia/tweetsentbr_fewshot
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 71.3
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=adalbertojunior/Llama-3-8B-Dolphin-Portuguese-v0.3
+       name: Open Portuguese LLM Leaderboard
 ---
 
 # Model Card for Model ID
@@ -196,4 +343,23 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
 
 ## Model Card Contact
 
- [More Information Needed]
+ [More Information Needed]
+
+
+ # Open Portuguese LLM Leaderboard Evaluation Results
+
+ Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/adalbertojunior/Llama-3-8B-Dolphin-Portuguese-v0.3) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+
+ | Metric | Value |
+ |--------------------------|---------|
+ |Average |**73.15**|
+ |ENEM Challenge (No Images)| 68.86|
+ |BLUEX (No Images) | 57.86|
+ |OAB Exams | 61.91|
+ |Assin2 RTE | 93.05|
+ |Assin2 STS | 76.48|
+ |FaQuAD NLI | 76.78|
+ |HateBR Binary | 83.25|
+ |PT Hate Speech Binary | 68.85|
+ |tweetSentBR | 71.30|
+
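
The **Average** row added to the table matches the plain arithmetic mean of the nine per-task scores. A quick sketch to reproduce it from the values above:

```python
# Sanity check: the leaderboard "Average" is the mean of the nine per-task scores.
scores = [68.86, 57.86, 61.91, 93.05, 76.48, 76.78, 83.25, 68.85, 71.30]
average = sum(scores) / len(scores)
print(f"{average:.2f}")  # 73.15
```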