Commit
4dc8fd2
1 Parent(s): 863ba2a

Adding the Open Portuguese LLM Leaderboard Evaluation Results (#1)

- Adding the Open Portuguese LLM Leaderboard Evaluation Results (899003d0138b2d37ccaf9a1954ba8b766722239f)
- Fixing some errors of the leaderboard evaluation results in the ModelCard yaml (bc5b8a3f949b51ad6cd681ceaf2de906bc861c38)


Co-authored-by: Open PT LLM Leaderboard PR Bot <[email protected]>

Files changed (1)
  1. README.md +173 -10
README.md CHANGED
@@ -1,18 +1,165 @@
 ---
-tags:
-- text-generation
-- pytorch
-- LLM
-- Portuguese
-- Llama 2
-inference: false
-license: llama2
 language:
 - pt
-pipeline_tag: text-generation
+license: llama2
 library_name: transformers
+tags:
+- text-generation
+- pytorch
+- LLM
+- Portuguese
+- Llama 2
 datasets:
 - dominguesm/CC-MAIN-2023-23
+inference: false
+pipeline_tag: text-generation
+model-index:
+- name: Canarim-7B-Instruct
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: ENEM Challenge (No Images)
+      type: eduagarcia/enem_challenge
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 27.5
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=dominguesm/Canarim-7B-Instruct
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BLUEX (No Images)
+      type: eduagarcia-temp/BLUEX_without_images
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 26.15
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=dominguesm/Canarim-7B-Instruct
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: OAB Exams
+      type: eduagarcia/oab_exams
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 29.93
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=dominguesm/Canarim-7B-Instruct
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Assin2 RTE
+      type: assin2
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: f1_macro
+      value: 75.74
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=dominguesm/Canarim-7B-Instruct
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Assin2 STS
+      type: eduagarcia/portuguese_benchmark
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: pearson
+      value: 12.08
+      name: pearson
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=dominguesm/Canarim-7B-Instruct
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: FaQuAD NLI
+      type: ruanchaves/faquad-nli
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: f1_macro
+      value: 43.92
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=dominguesm/Canarim-7B-Instruct
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HateBR Binary
+      type: ruanchaves/hatebr
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 79.57
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=dominguesm/Canarim-7B-Instruct
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: PT Hate Speech Binary
+      type: hate_speech_portuguese
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 64.01
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=dominguesm/Canarim-7B-Instruct
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: tweetSentBR
+      type: eduagarcia-temp/tweetsentbr
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 66.0
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=dominguesm/Canarim-7B-Instruct
+      name: Open Portuguese LLM Leaderboard
 ---
 
 <p align="center">
@@ -94,4 +241,20 @@ Glória, e sua governanta, a governanta Josefa. No entanto, no outono de
 Capitu, uma moça de 14 anos, que se tornará sua companheira por muitos anos.
 ```
 
-**NOTE**: README under construction
+**NOTE**: README under construction
+# [Open Portuguese LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/dominguesm/Canarim-7B-Instruct)
+
+|          Metric          |  Value  |
+|--------------------------|---------|
+|Average                   |**47.21**|
+|ENEM Challenge (No Images)|    27.50|
+|BLUEX (No Images)         |    26.15|
+|OAB Exams                 |    29.93|
+|Assin2 RTE                |    75.74|
+|Assin2 STS                |    12.08|
+|FaQuAD NLI                |    43.92|
+|HateBR Binary             |    79.57|
+|PT Hate Speech Binary     |    64.01|
+|tweetSentBR               |       66|
+
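
Review note: the `Average` row in the added table can be checked against the nine per-task scores. The sketch below assumes the leaderboard average is the unweighted arithmetic mean of those scores; the commit does not state how it is computed, and the `scores` dictionary is a hypothetical name used only for this check.

```python
# Per-task scores copied from the table added to README.md in this commit.
scores = {
    "ENEM Challenge (No Images)": 27.50,
    "BLUEX (No Images)": 26.15,
    "OAB Exams": 29.93,
    "Assin2 RTE": 75.74,
    "Assin2 STS": 12.08,
    "FaQuAD NLI": 43.92,
    "HateBR Binary": 79.57,
    "PT Hate Speech Binary": 64.01,
    "tweetSentBR": 66.0,
}

# Assumption: the reported average is the plain mean over all tasks.
average = sum(scores.values()) / len(scores)
print(round(average, 2))  # 47.21
```

Under that assumption the computed mean agrees with the reported value of 47.21.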