Commit 34bbd8c by leaderboard-pt-pr-bot • 1 Parent(s): b759072

Adding the Open Portuguese LLM Leaderboard Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard

The purpose of this PR is to add evaluation results from the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard) to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard/discussions

Files changed (1)
  1. README.md +167 -2
README.md CHANGED
@@ -1,10 +1,157 @@
  ---
+ language:
+ - en
  license: gemma
  datasets:
  - openbmb/UltraFeedback
- language:
- - en
  pipeline_tag: text-generation
+ model-index:
+ - name: Gemma-2-9B-It-SPPO-Iter2
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: ENEM Challenge (No Images)
+       type: eduagarcia/enem_challenge
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 73.69
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=UCLA-AGI/Gemma-2-9B-It-SPPO-Iter2
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BLUEX (No Images)
+       type: eduagarcia-temp/BLUEX_without_images
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 63.0
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=UCLA-AGI/Gemma-2-9B-It-SPPO-Iter2
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: OAB Exams
+       type: eduagarcia/oab_exams
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 53.12
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=UCLA-AGI/Gemma-2-9B-It-SPPO-Iter2
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 RTE
+       type: assin2
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 94.07
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=UCLA-AGI/Gemma-2-9B-It-SPPO-Iter2
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 STS
+       type: eduagarcia/portuguese_benchmark
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: pearson
+       value: 78.28
+       name: pearson
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=UCLA-AGI/Gemma-2-9B-It-SPPO-Iter2
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: FaQuAD NLI
+       type: ruanchaves/faquad-nli
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 77.46
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=UCLA-AGI/Gemma-2-9B-It-SPPO-Iter2
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HateBR Binary
+       type: ruanchaves/hatebr
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 87.65
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=UCLA-AGI/Gemma-2-9B-It-SPPO-Iter2
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: PT Hate Speech Binary
+       type: hate_speech_portuguese
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 71.13
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=UCLA-AGI/Gemma-2-9B-It-SPPO-Iter2
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: tweetSentBR
+       type: eduagarcia/tweetsentbr_fewshot
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 69.4
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=UCLA-AGI/Gemma-2-9B-It-SPPO-Iter2
+       name: Open Portuguese LLM Leaderboard
  ---
  Self-Play Preference Optimization for Language Model Alignment (https://arxiv.org/abs/2405.00675)
 
@@ -72,3 +219,21 @@ The following hyperparameters were used during training:
  }
  ```
 
+
+ # Open Portuguese LLM Leaderboard Evaluation Results
+
+ Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/UCLA-AGI/Gemma-2-9B-It-SPPO-Iter2) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+
+ | Metric | Value |
+ |--------------------------|--------|
+ |Average |**74.2**|
+ |ENEM Challenge (No Images)| 73.69|
+ |BLUEX (No Images) | 63|
+ |OAB Exams | 53.12|
+ |Assin2 RTE | 94.07|
+ |Assin2 STS | 78.28|
+ |FaQuAD NLI | 77.46|
+ |HateBR Binary | 87.65|
+ |PT Hate Speech Binary | 71.13|
+ |tweetSentBR | 69.40|
+
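
The PR does not say how the Average row in the added results table is computed. A minimal sketch, assuming it is the unweighted arithmetic mean of the nine per-task scores copied from the table; the `scores` dictionary and variable names are illustrative and not part of the leaderboard tooling:

```python
from statistics import fmean

# Per-task scores copied from the results table added by this PR.
scores = {
    "ENEM Challenge (No Images)": 73.69,
    "BLUEX (No Images)": 63.0,
    "OAB Exams": 53.12,
    "Assin2 RTE": 94.07,
    "Assin2 STS": 78.28,
    "FaQuAD NLI": 77.46,
    "HateBR Binary": 87.65,
    "PT Hate Speech Binary": 71.13,
    "tweetSentBR": 69.40,
}

# Assumption: the reported average is a plain mean, rounded to one decimal.
average = fmean(scores.values())
print(f"Average over {len(scores)} tasks: {average:.1f}")
```

Under that assumption the result agrees with the 74.2 reported in the table. Note that the scores mix accuracy, macro-F1, and Pearson correlation, so the average is a summary across different metric types.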